Hi all,
this is an attempt to align and finally merge Linaro's and ARM's versions of rt-app.
v2 should address Vincent's comments on v1, plus I added a patch about tracing (13/14) that I thought needed fixing while I was doing more debugging.
Please let me know if this new version of the set is OK for merging.
Thanks a lot,
- Juri
[1] https://github.com/scheduler-tools/rt-app/commits/master
Juri Lelli (14): doc: clarify example for config step rt-app: fix compile warnings rt-app: add a t_zero to which every task has to refer rt-app: add wake-up latency to the log metrics rt-app: add slack to log metrics rt-app: fix batch tasks rt-app: add 'configured period' to log metrics rt-app: write logs only if continue_running is set rt-app: refactor log data to reduce the number of parameters of run() rt-app: move wu_lat to the end of the log rt-app: make c_prd c_period and enlarge it rt-app: add configured run (c_run) to log metrics rt-app: let caller manage tracing start/stop rt-app: add delay thread property
doc/tutorial.txt | 28 +++++++++++ libdl/dl_syscalls.h | 4 +- src/rt-app.c | 121 +++++++++++++++++++++++++++++++++++----------- src/rt-app_args.h | 3 -- src/rt-app_parse_config.c | 5 ++ src/rt-app_parse_config.h | 2 + src/rt-app_types.h | 15 ++++++ src/rt-app_utils.c | 27 +++++++++-- src/rt-app_utils.h | 6 +++ 9 files changed, 175 insertions(+), 36 deletions(-)
Add detailed example for static cross-compile of rt-app.
Signed-off-by: Juri Lelli juri.lelli@arm.com Acked-by: Vincent Guittot vincent.guittot@linaro.org --- doc/tutorial.txt | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+)
diff --git a/doc/tutorial.txt b/doc/tutorial.txt index 100a2df068c6..78cd1fe0e2e2 100644 --- a/doc/tutorial.txt +++ b/doc/tutorial.txt @@ -43,6 +43,31 @@ export ac_cv_lib_json_c_json_object_from_file=yes ./configure --host=aarch64-linux-gnu LDFLAGS=" --static -L<path to parent of json repo>/json-c/. libs/" CFLAGS="-I<path to parent of json repo>" --with-deadline make
+e.g., with a directory structure like the following: + +$ tree -d +. +├── autom4te.cache +├── build +├── build-aux +├── doc +│ └── examples +│ ├── cpufreq_governor_efficiency +│ ├── merge +│ └── tutorial +├── json-c +│ ├── autom4te.cache +│ ├── doc +│ │ └── html +│ └── tests +├── libdl +├── m4 +└── src + +the configure step becomes + +./configure --host=aarch64-linux-gnu LDFLAGS="--static -L./../json-c/." CFLAGS="-I./.." --with-deadline + **** json file skeleton ****
The json file that describes a workload is made of 3 main objects: tasks,
Fix various compilation warnings in one go.
Signed-off-by: Juri Lelli juri.lelli@arm.com --- libdl/dl_syscalls.h | 4 +++- src/rt-app.c | 4 ++-- src/rt-app_args.h | 3 --- src/rt-app_parse_config.h | 2 ++ src/rt-app_utils.c | 13 +++++++++++-- src/rt-app_utils.h | 3 +++ 6 files changed, 21 insertions(+), 8 deletions(-)
diff --git a/libdl/dl_syscalls.h b/libdl/dl_syscalls.h index 27cbfe16882d..ca07503ea744 100644 --- a/libdl/dl_syscalls.h +++ b/libdl/dl_syscalls.h @@ -17,7 +17,9 @@ #define __DL_SYSCALLS__
#include <linux/kernel.h> -#include <linux/unistd.h> +#define _GNU_SOURCE +#include <unistd.h> +#include <sys/syscall.h> #include <time.h> #include <linux/types.h>
diff --git a/src/rt-app.c b/src/rt-app.c index 5283ac7fd668..211dd77da98f 100644 --- a/src/rt-app.c +++ b/src/rt-app.c @@ -170,7 +170,7 @@ int calibrate_cpu_cycles(int clock)
}
-static inline loadwait(unsigned long exec) +static inline unsigned long loadwait(unsigned long exec) { unsigned long load_count;
@@ -358,7 +358,7 @@ int run(int ind, phase_data_t *pdata, unsigned long *duration, for (i = 0; i < nbevents; i++) { if (!continue_running && !lock) - return; + return perf;
log_debug("[%d] runs events %d type %d ", ind, i, events[i].type); if (opts.ftrace) diff --git a/src/rt-app_args.h b/src/rt-app_args.h index 9bbde384328b..23ba50c5f343 100644 --- a/src/rt-app_args.h +++ b/src/rt-app_args.h @@ -36,10 +36,7 @@ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. #ifdef DLSCHED #include "dl_syscalls.h" #endif - -#ifdef JSON #include "rt-app_parse_config.h" -#endif
#define DEFAULT_THREAD_PRIORITY 10 #define PATH_LENGTH 256 diff --git a/src/rt-app_parse_config.h b/src/rt-app_parse_config.h index 023cabd2ca62..9b0e5faf425d 100644 --- a/src/rt-app_parse_config.h +++ b/src/rt-app_parse_config.h @@ -45,5 +45,7 @@ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
void parse_config(const char *filename, rtapp_options_t *opts); +void +parse_config_stdin(rtapp_options_t *opts);
#endif // _RTAPP_PARSE_CONFIG_H diff --git a/src/rt-app_utils.c b/src/rt-app_utils.c index c4840db89625..887bfc8ef79e 100644 --- a/src/rt-app_utils.c +++ b/src/rt-app_utils.c @@ -19,6 +19,8 @@ along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. */
+#include <errno.h> +#include <string.h> #include "rt-app_utils.h"
unsigned long @@ -253,7 +255,7 @@ resource_to_string(resource_t resource, char *resource_name) void ftrace_write(int mark_fd, const char *fmt, ...) { va_list ap; - int n, size = BUF_SIZE; + int n, size = BUF_SIZE, ret; char *tmp, *ntmp;
if (mark_fd < 0) { @@ -274,8 +276,15 @@ void ftrace_write(int mark_fd, const char *fmt, ...)
/* If it worked return success */ if (n > -1 && n < size) { - write(mark_fd, tmp, n); + ret = write(mark_fd, tmp, n); free(tmp); + if (ret < 0) { + log_error("Cannot write mark_fd: %s\n", + strerror(errno)); + exit(EXIT_FAILURE); + } else if (ret < n) { + log_debug("Cannot write all bytes at once into mark_fd\n"); + } return; }
diff --git a/src/rt-app_utils.h b/src/rt-app_utils.h index 64f97f60aaab..a41d8ad205aa 100644 --- a/src/rt-app_utils.h +++ b/src/rt-app_utils.h @@ -28,6 +28,9 @@ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. #include <string.h> #include <stdarg.h> #include <stdlib.h> +#define _GNU_SOURCE +#include <unistd.h> +#include <sys/syscall.h> #include "rt-app_types.h"
#ifndef LOG_PREFIX
On 17 November 2016 at 17:34, Juri Lelli juri.lelli@arm.com wrote:
Fix various compilation warnings in one go.
Signed-off-by: Juri Lelli juri.lelli@arm.com
Acked-by: Vincent Guittot vincent.guittot@linaro.org
libdl/dl_syscalls.h | 4 +++- src/rt-app.c | 4 ++-- src/rt-app_args.h | 3 --- src/rt-app_parse_config.h | 2 ++ src/rt-app_utils.c | 13 +++++++++++-- src/rt-app_utils.h | 3 +++ 6 files changed, 21 insertions(+), 8 deletions(-)
diff --git a/libdl/dl_syscalls.h b/libdl/dl_syscalls.h index 27cbfe16882d..ca07503ea744 100644 --- a/libdl/dl_syscalls.h +++ b/libdl/dl_syscalls.h @@ -17,7 +17,9 @@ #define __DL_SYSCALLS__
#include <linux/kernel.h> -#include <linux/unistd.h> +#define _GNU_SOURCE +#include <unistd.h> +#include <sys/syscall.h> #include <time.h> #include <linux/types.h>
diff --git a/src/rt-app.c b/src/rt-app.c index 5283ac7fd668..211dd77da98f 100644 --- a/src/rt-app.c +++ b/src/rt-app.c @@ -170,7 +170,7 @@ int calibrate_cpu_cycles(int clock)
}
-static inline loadwait(unsigned long exec) +static inline unsigned long loadwait(unsigned long exec) { unsigned long load_count;
@@ -358,7 +358,7 @@ int run(int ind, phase_data_t *pdata, unsigned long *duration, for (i = 0; i < nbevents; i++) { if (!continue_running && !lock)
return;
return perf; log_debug("[%d] runs events %d type %d ", ind, i, events[i].type); if (opts.ftrace)
diff --git a/src/rt-app_args.h b/src/rt-app_args.h index 9bbde384328b..23ba50c5f343 100644 --- a/src/rt-app_args.h +++ b/src/rt-app_args.h @@ -36,10 +36,7 @@ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. #ifdef DLSCHED #include "dl_syscalls.h" #endif
-#ifdef JSON #include "rt-app_parse_config.h" -#endif
#define DEFAULT_THREAD_PRIORITY 10 #define PATH_LENGTH 256 diff --git a/src/rt-app_parse_config.h b/src/rt-app_parse_config.h index 023cabd2ca62..9b0e5faf425d 100644 --- a/src/rt-app_parse_config.h +++ b/src/rt-app_parse_config.h @@ -45,5 +45,7 @@ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
void parse_config(const char *filename, rtapp_options_t *opts); +void +parse_config_stdin(rtapp_options_t *opts);
#endif // _RTAPP_PARSE_CONFIG_H diff --git a/src/rt-app_utils.c b/src/rt-app_utils.c index c4840db89625..887bfc8ef79e 100644 --- a/src/rt-app_utils.c +++ b/src/rt-app_utils.c @@ -19,6 +19,8 @@ along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. */
+#include <errno.h> +#include <string.h> #include "rt-app_utils.h"
unsigned long @@ -253,7 +255,7 @@ resource_to_string(resource_t resource, char *resource_name) void ftrace_write(int mark_fd, const char *fmt, ...) { va_list ap;
int n, size = BUF_SIZE;
int n, size = BUF_SIZE, ret; char *tmp, *ntmp; if (mark_fd < 0) {
@@ -274,8 +276,15 @@ void ftrace_write(int mark_fd, const char *fmt, ...)
/* If it worked return success */ if (n > -1 && n < size) {
write(mark_fd, tmp, n);
ret = write(mark_fd, tmp, n); free(tmp);
if (ret < 0) {
log_error("Cannot write mark_fd: %s\n",
strerror(errno));
exit(EXIT_FAILURE);
} else if (ret < n) {
log_debug("Cannot write all bytes at once into mark_fd\n");
} return; }
diff --git a/src/rt-app_utils.h b/src/rt-app_utils.h index 64f97f60aaab..a41d8ad205aa 100644 --- a/src/rt-app_utils.h +++ b/src/rt-app_utils.h @@ -28,6 +28,9 @@ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. #include <string.h> #include <stdarg.h> #include <stdlib.h> +#define _GNU_SOURCE +#include <unistd.h> +#include <sys/syscall.h> #include "rt-app_types.h"
#ifndef LOG_PREFIX
2.10.0
Also a barrier is added so that execution starts at the same instant for everybody.
Signed-off-by: Juri Lelli juri.lelli@arm.com --- src/rt-app.c | 33 ++++++++++++++++++++++++++------- 1 file changed, 26 insertions(+), 7 deletions(-)
diff --git a/src/rt-app.c b/src/rt-app.c index 211dd77da98f..442771f553f9 100644 --- a/src/rt-app.c +++ b/src/rt-app.c @@ -35,6 +35,8 @@ static int nthreads; static volatile sig_atomic_t running_threads; static int p_load; rtapp_options_t opts; +static struct timespec t_zero; +static pthread_barrier_t threads_barrier;
static ftrace_data_t ft_data = { .debugfs = "/sys/kernel/debug", @@ -217,7 +219,8 @@ static void memload(unsigned long count, struct _rtapp_iomem_buf *iomem) }
static int run_event(event_data_t *event, int dry_run, - unsigned long *perf, unsigned long *duration, rtapp_resource_t *resources) + unsigned long *perf, unsigned long *duration, rtapp_resource_t *resources, + struct timespec *t_first) { rtapp_resource_t *rdata = &(resources[event->res]); rtapp_resource_t *ddata = &(resources[event->dep]); @@ -303,7 +306,7 @@ static int run_event(event_data_t *event, int dry_run,
if (rdata->res.timer.init == 0) { rdata->res.timer.init = 1; - clock_gettime(CLOCK_MONOTONIC, &rdata->res.timer.t_next); + rdata->res.timer.t_next = *t_first; }
rdata->res.timer.t_next = timespec_add(&rdata->res.timer.t_next, &t_period); @@ -348,7 +351,8 @@ static int run_event(event_data_t *event, int dry_run, }
int run(int ind, phase_data_t *pdata, unsigned long *duration, - rtapp_resource_t *resources) + rtapp_resource_t *resources, + struct timespec *t_first) { event_data_t *events = pdata->events; int nbevents = pdata->nbevents; @@ -365,7 +369,7 @@ int run(int ind, phase_data_t *pdata, unsigned long *duration, log_ftrace(ft_data.marker_fd, "[%d] executing %d", ind, i); - lock += run_event(&events[i], !continue_running, &perf, duration, resources); + lock += run_event(&events[i], !continue_running, &perf, duration, resources, t_first); }
return perf; @@ -411,7 +415,7 @@ void *thread_body(void *arg) thread_data_t *data = (thread_data_t*) arg; phase_data_t *pdata; struct sched_param param; - struct timespec t_start, t_end; + struct timespec t_start, t_end, t_first; unsigned long t_start_usec; unsigned long perf, duration; timing_point_t *curr_timing; @@ -536,6 +540,21 @@ void *thread_body(void *arg) } }
+ if (data->ind == 0) { + /* + * Only first thread sets t_zero. Other threads sync with this + * timestamp. + */ + clock_gettime(CLOCK_MONOTONIC, &t_zero); + if (opts.ftrace) + log_ftrace(ft_data.marker_fd, + "[%d] sets zero time", + data->ind); + } + + pthread_barrier_wait(&threads_barrier); + t_first = t_zero; + log_notice("[%d] starting thread ...\n", data->ind);
fprintf(data->log_handler, "#idx\tperf\trun\tperiod\tstart\t\tend\t\trel_st\n"); @@ -571,7 +590,7 @@ void *thread_body(void *arg)
duration = 0; clock_gettime(CLOCK_MONOTONIC, &t_start); - perf = run(data->ind, pdata, &duration, *(data->resources)); + perf = run(data->ind, pdata, &duration, *(data->resources), &t_first); clock_gettime(CLOCK_MONOTONIC, &t_end);
if (timings) @@ -660,7 +679,7 @@ int main(int argc, char* argv[]) /* allocated threads */ nthreads = opts.nthreads; threads = malloc(nthreads * sizeof(pthread_t)); - running_threads = 0; + pthread_barrier_init(&threads_barrier, NULL, nthreads);
/* install a signal handler for proper shutdown */ signal(SIGQUIT, shutdown);
On 17 November 2016 at 17:34, Juri Lelli juri.lelli@arm.com wrote:
Also a barrier is added so that execution starts at the same instant for everybody.
Signed-off-by: Juri Lelli juri.lelli@arm.com
Acked-by: Vincent Guittot vincent.guittot@linaro.org
src/rt-app.c | 33 ++++++++++++++++++++++++++------- 1 file changed, 26 insertions(+), 7 deletions(-)
diff --git a/src/rt-app.c b/src/rt-app.c index 211dd77da98f..442771f553f9 100644 --- a/src/rt-app.c +++ b/src/rt-app.c @@ -35,6 +35,8 @@ static int nthreads; static volatile sig_atomic_t running_threads; static int p_load; rtapp_options_t opts; +static struct timespec t_zero; +static pthread_barrier_t threads_barrier;
static ftrace_data_t ft_data = { .debugfs = "/sys/kernel/debug", @@ -217,7 +219,8 @@ static void memload(unsigned long count, struct _rtapp_iomem_buf *iomem) }
static int run_event(event_data_t *event, int dry_run,
unsigned long *perf, unsigned long *duration, rtapp_resource_t *resources)
unsigned long *perf, unsigned long *duration, rtapp_resource_t *resources,
struct timespec *t_first)
{ rtapp_resource_t *rdata = &(resources[event->res]); rtapp_resource_t *ddata = &(resources[event->dep]); @@ -303,7 +306,7 @@ static int run_event(event_data_t *event, int dry_run,
if (rdata->res.timer.init == 0) { rdata->res.timer.init = 1;
clock_gettime(CLOCK_MONOTONIC, &rdata->res.timer.t_next);
rdata->res.timer.t_next = *t_first; } rdata->res.timer.t_next = timespec_add(&rdata->res.timer.t_next, &t_period);
@@ -348,7 +351,8 @@ static int run_event(event_data_t *event, int dry_run, }
int run(int ind, phase_data_t *pdata, unsigned long *duration,
rtapp_resource_t *resources)
rtapp_resource_t *resources,
struct timespec *t_first)
{ event_data_t *events = pdata->events; int nbevents = pdata->nbevents; @@ -365,7 +369,7 @@ int run(int ind, phase_data_t *pdata, unsigned long *duration, log_ftrace(ft_data.marker_fd, "[%d] executing %d", ind, i);
lock += run_event(&events[i], !continue_running, &perf, duration, resources);
lock += run_event(&events[i], !continue_running, &perf, duration, resources, t_first); } return perf;
@@ -411,7 +415,7 @@ void *thread_body(void *arg) thread_data_t *data = (thread_data_t*) arg; phase_data_t *pdata; struct sched_param param;
struct timespec t_start, t_end;
struct timespec t_start, t_end, t_first; unsigned long t_start_usec; unsigned long perf, duration; timing_point_t *curr_timing;
@@ -536,6 +540,21 @@ void *thread_body(void *arg) } }
if (data->ind == 0) {
/*
* Only first thread sets t_zero. Other threads sync with this
* timestamp.
*/
clock_gettime(CLOCK_MONOTONIC, &t_zero);
if (opts.ftrace)
log_ftrace(ft_data.marker_fd,
"[%d] sets zero time",
data->ind);
}
pthread_barrier_wait(&threads_barrier);
t_first = t_zero;
log_notice("[%d] starting thread ...\n", data->ind); fprintf(data->log_handler, "#idx\tperf\trun\tperiod\tstart\t\tend\t\trel_st\n");
@@ -571,7 +590,7 @@ void *thread_body(void *arg)
duration = 0; clock_gettime(CLOCK_MONOTONIC, &t_start);
perf = run(data->ind, pdata, &duration, *(data->resources));
perf = run(data->ind, pdata, &duration, *(data->resources), &t_first); clock_gettime(CLOCK_MONOTONIC, &t_end); if (timings)
@@ -660,7 +679,7 @@ int main(int argc, char* argv[]) /* allocated threads */ nthreads = opts.nthreads; threads = malloc(nthreads * sizeof(pthread_t));
running_threads = 0;
pthread_barrier_init(&threads_barrier, NULL, nthreads); /* install a signal handler for proper shutdown */ signal(SIGQUIT, shutdown);
-- 2.10.0
Add wake-up latency metric (a la cyclictest).
Signed-off-by: Juri Lelli juri.lelli@arm.com --- src/rt-app.c | 29 +++++++++++++++++++---------- src/rt-app_types.h | 1 + src/rt-app_utils.c | 5 +++-- 3 files changed, 23 insertions(+), 12 deletions(-)
diff --git a/src/rt-app.c b/src/rt-app.c index 442771f553f9..d23c3074f057 100644 --- a/src/rt-app.c +++ b/src/rt-app.c @@ -220,7 +220,7 @@ static void memload(unsigned long count, struct _rtapp_iomem_buf *iomem)
static int run_event(event_data_t *event, int dry_run, unsigned long *perf, unsigned long *duration, rtapp_resource_t *resources, - struct timespec *t_first) + struct timespec *t_first, unsigned long *wu_latency) { rtapp_resource_t *rdata = &(resources[event->res]); rtapp_resource_t *ddata = &(resources[event->dep]); @@ -299,7 +299,7 @@ static int run_event(event_data_t *event, int dry_run, break; case rtapp_timer: { - struct timespec t_period, t_now; + struct timespec t_period, t_now, t_wu; log_debug("timer %d ", event->duration);
t_period = usec_to_timespec(event->duration); @@ -311,10 +311,15 @@ static int run_event(event_data_t *event, int dry_run,
rdata->res.timer.t_next = timespec_add(&rdata->res.timer.t_next, &t_period); clock_gettime(CLOCK_MONOTONIC, &t_now); - if (timespec_lower(&t_now, &rdata->res.timer.t_next)) + if (timespec_lower(&t_now, &rdata->res.timer.t_next)) { clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &rdata->res.timer.t_next, NULL); - else + clock_gettime(CLOCK_MONOTONIC, &t_now); + t_wu = timespec_sub(&t_now, &rdata->res.timer.t_next); + *wu_latency = timespec_to_usec(&t_wu); + } else { clock_gettime(CLOCK_MONOTONIC, &rdata->res.timer.t_next); + *wu_latency = 0UL; + } } break; case rtapp_suspend: @@ -352,7 +357,8 @@ static int run_event(event_data_t *event, int dry_run,
int run(int ind, phase_data_t *pdata, unsigned long *duration, rtapp_resource_t *resources, - struct timespec *t_first) + struct timespec *t_first, + unsigned long *wu_latency) { event_data_t *events = pdata->events; int nbevents = pdata->nbevents; @@ -369,7 +375,7 @@ int run(int ind, phase_data_t *pdata, unsigned long *duration, log_ftrace(ft_data.marker_fd, "[%d] executing %d", ind, i); - lock += run_event(&events[i], !continue_running, &perf, duration, resources, t_first); + lock += run_event(&events[i], !continue_running, &perf, duration, resources, t_first, wu_latency); }
return perf; @@ -417,7 +423,7 @@ void *thread_body(void *arg) struct sched_param param; struct timespec t_start, t_end, t_first; unsigned long t_start_usec; - unsigned long perf, duration; + unsigned long perf, duration, wu_latency; timing_point_t *curr_timing; timing_point_t *timings; timing_point_t tmp_timing; @@ -557,7 +563,9 @@ void *thread_body(void *arg)
log_notice("[%d] starting thread ...\n", data->ind);
- fprintf(data->log_handler, "#idx\tperf\trun\tperiod\tstart\t\tend\t\trel_st\n"); + fprintf(data->log_handler, "%s %8s %8s %8s %15s %15s %15s %10s\n", + "#idx", "perf", "run", "period", + "start", "end", "rel_st", "wu_lat");
if (opts.ftrace) log_ftrace(ft_data.marker_fd, "[%d] starts", data->ind); @@ -588,9 +596,9 @@ void *thread_body(void *arg) log_ftrace(ft_data.marker_fd, "[%d] begins loop %d phase %d step %d", data->ind, i, j, loop); log_debug("[%d] begins loop %d phase %d step %d", data->ind, i, j, loop);;
- duration = 0; + duration = wu_latency = 0; clock_gettime(CLOCK_MONOTONIC, &t_start); - perf = run(data->ind, pdata, &duration, *(data->resources), &t_first); + perf = run(data->ind, pdata, &duration, *(data->resources), &t_first, &wu_latency); clock_gettime(CLOCK_MONOTONIC, &t_end);
if (timings) @@ -608,6 +616,7 @@ void *thread_body(void *arg) curr_timing->period = timespec_to_usec(&t_diff); curr_timing->duration = duration; curr_timing->perf = perf; + curr_timing->wu_latency = wu_latency;
if (opts.logsize && !timings) log_timing(data->log_handler, curr_timing); diff --git a/src/rt-app_types.h b/src/rt-app_types.h index 57fb80d363d5..4c20f43cb627 100644 --- a/src/rt-app_types.h +++ b/src/rt-app_types.h @@ -191,6 +191,7 @@ typedef struct _timing_point_t { unsigned long perf; unsigned long duration; unsigned long period; + unsigned long wu_latency; __u64 start_time; __u64 end_time; __u64 rel_start_time; diff --git a/src/rt-app_utils.c b/src/rt-app_utils.c index 887bfc8ef79e..28896c9a76e6 100644 --- a/src/rt-app_utils.c +++ b/src/rt-app_utils.c @@ -134,14 +134,15 @@ void log_timing(FILE *handler, timing_point_t *t) { fprintf(handler, - "%d\t%lu\t%lu\t%lu\t%llu\t%llu\t%llu", + "%4d %8lu %8lu %8lu %15llu %15llu %15llu %10lu", t->ind, t->perf, t->duration, t->period, t->start_time, t->end_time, - t->rel_start_time + t->rel_start_time, + t->wu_latency ); fprintf(handler, "\n"); }
On 17 November 2016 at 17:34, Juri Lelli juri.lelli@arm.com wrote:
Add wake-up latency metric (a la cyclictest).
Signed-off-by: Juri Lelli juri.lelli@arm.com
src/rt-app.c | 29 +++++++++++++++++++---------- src/rt-app_types.h | 1 + src/rt-app_utils.c | 5 +++-- 3 files changed, 23 insertions(+), 12 deletions(-)
diff --git a/src/rt-app.c b/src/rt-app.c index 442771f553f9..d23c3074f057 100644 --- a/src/rt-app.c +++ b/src/rt-app.c @@ -220,7 +220,7 @@ static void memload(unsigned long count, struct _rtapp_iomem_buf *iomem)
static int run_event(event_data_t *event, int dry_run, unsigned long *perf, unsigned long *duration, rtapp_resource_t *resources,
struct timespec *t_first)
struct timespec *t_first, unsigned long *wu_latency)
{ rtapp_resource_t *rdata = &(resources[event->res]); rtapp_resource_t *ddata = &(resources[event->dep]); @@ -299,7 +299,7 @@ static int run_event(event_data_t *event, int dry_run, break; case rtapp_timer: {
struct timespec t_period, t_now;
struct timespec t_period, t_now, t_wu; log_debug("timer %d ", event->duration); t_period = usec_to_timespec(event->duration);
@@ -311,10 +311,15 @@ static int run_event(event_data_t *event, int dry_run,
rdata->res.timer.t_next = timespec_add(&rdata->res.timer.t_next, &t_period); clock_gettime(CLOCK_MONOTONIC, &t_now);
if (timespec_lower(&t_now, &rdata->res.timer.t_next))
if (timespec_lower(&t_now, &rdata->res.timer.t_next)) { clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &rdata->res.timer.t_next, NULL);
else
clock_gettime(CLOCK_MONOTONIC, &t_now);
t_wu = timespec_sub(&t_now, &rdata->res.timer.t_next);
*wu_latency = timespec_to_usec(&t_wu);
} else { clock_gettime(CLOCK_MONOTONIC, &rdata->res.timer.t_next);
*wu_latency = 0UL;
} } break; case rtapp_suspend:
@@ -352,7 +357,8 @@ static int run_event(event_data_t *event, int dry_run,
int run(int ind, phase_data_t *pdata, unsigned long *duration, rtapp_resource_t *resources,
struct timespec *t_first)
struct timespec *t_first,
unsigned long *wu_latency)
{ event_data_t *events = pdata->events; int nbevents = pdata->nbevents; @@ -369,7 +375,7 @@ int run(int ind, phase_data_t *pdata, unsigned long *duration, log_ftrace(ft_data.marker_fd, "[%d] executing %d", ind, i);
lock += run_event(&events[i], !continue_running, &perf, duration, resources, t_first);
lock += run_event(&events[i], !continue_running, &perf, duration, resources, t_first, wu_latency); } return perf;
@@ -417,7 +423,7 @@ void *thread_body(void *arg) struct sched_param param; struct timespec t_start, t_end, t_first; unsigned long t_start_usec;
unsigned long perf, duration;
unsigned long perf, duration, wu_latency; timing_point_t *curr_timing; timing_point_t *timings; timing_point_t tmp_timing;
@@ -557,7 +563,9 @@ void *thread_body(void *arg)
log_notice("[%d] starting thread ...\n", data->ind);
fprintf(data->log_handler, "#idx\tperf\trun\tperiod\tstart\t\tend\t\trel_st\n");
fprintf(data->log_handler, "%s %8s %8s %8s %15s %15s %15s %10s\n",
replacing tab with space breaks the cpufreq_governor_efficiency script
"#idx", "perf", "run", "period",
"start", "end", "rel_st", "wu_lat"); if (opts.ftrace) log_ftrace(ft_data.marker_fd, "[%d] starts", data->ind);
@@ -588,9 +596,9 @@ void *thread_body(void *arg) log_ftrace(ft_data.marker_fd, "[%d] begins loop %d phase %d step %d", data->ind, i, j, loop); log_debug("[%d] begins loop %d phase %d step %d", data->ind, i, j, loop);;
duration = 0;
duration = wu_latency = 0; clock_gettime(CLOCK_MONOTONIC, &t_start);
perf = run(data->ind, pdata, &duration, *(data->resources), &t_first);
perf = run(data->ind, pdata, &duration, *(data->resources), &t_first, &wu_latency); clock_gettime(CLOCK_MONOTONIC, &t_end); if (timings)
@@ -608,6 +616,7 @@ void *thread_body(void *arg) curr_timing->period = timespec_to_usec(&t_diff); curr_timing->duration = duration; curr_timing->perf = perf;
curr_timing->wu_latency = wu_latency; if (opts.logsize && !timings) log_timing(data->log_handler, curr_timing);
diff --git a/src/rt-app_types.h b/src/rt-app_types.h index 57fb80d363d5..4c20f43cb627 100644 --- a/src/rt-app_types.h +++ b/src/rt-app_types.h @@ -191,6 +191,7 @@ typedef struct _timing_point_t { unsigned long perf; unsigned long duration; unsigned long period;
unsigned long wu_latency; __u64 start_time; __u64 end_time; __u64 rel_start_time;
diff --git a/src/rt-app_utils.c b/src/rt-app_utils.c index 887bfc8ef79e..28896c9a76e6 100644 --- a/src/rt-app_utils.c +++ b/src/rt-app_utils.c @@ -134,14 +134,15 @@ void log_timing(FILE *handler, timing_point_t *t) { fprintf(handler,
"%d\t%lu\t%lu\t%lu\t%llu\t%llu\t%llu",
"%4d %8lu %8lu %8lu %15llu %15llu %15llu %10lu", t->ind, t->perf, t->duration, t->period, t->start_time, t->end_time,
t->rel_start_time
t->rel_start_time,
t->wu_latency ); fprintf(handler, "\n");
}
2.10.0
On 18/11/16 16:34, Vincent Guittot wrote:
On 17 November 2016 at 17:34, Juri Lelli juri.lelli@arm.com wrote:
[...]
@@ -557,7 +563,9 @@ void *thread_body(void *arg)
log_notice("[%d] starting thread ...\n", data->ind);
fprintf(data->log_handler, "#idx\tperf\trun\tperiod\tstart\t\tend\t\trel_st\n");
fprintf(data->log_handler, "%s %8s %8s %8s %15s %15s %15s %10s\n",
replacing tab with space breaks the cpufreq_governor_efficiency script
I'm not very familiar with the script, but does this fix it?
--- doc/examples/cpufreq_governor_efficiency/dvfs.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/doc/examples/cpufreq_governor_efficiency/dvfs.sh b/doc/examples/cpufreq_governor_efficiency/dvfs.sh index 7caee7d437f7..ac69a53086e2 100755 --- a/doc/examples/cpufreq_governor_efficiency/dvfs.sh +++ b/doc/examples/cpufreq_governor_efficiency/dvfs.sh @@ -32,7 +32,7 @@ if [ $1 ] ; then sum=0 loop=0 overrun=0 - for i in $(cat rt-app_$1_run$3us_sleep$4us.log | sed '1d;n;d' | sed '1d' |cut -f 3); do + for i in $(cat rt-app_$1_run$3us_sleep$4us.log | sed '1d;n;d' | sed '1d' | awk '{print $3}'); do loop=$(expr $loop + 1) sum=$(expr $sum + $i) if [ $4 -le $i ] ; then
On 18 November 2016 at 17:26, Juri Lelli juri.lelli@arm.com wrote:
On 18/11/16 16:34, Vincent Guittot wrote:
On 17 November 2016 at 17:34, Juri Lelli juri.lelli@arm.com wrote:
[...]
@@ -557,7 +563,9 @@ void *thread_body(void *arg)
log_notice("[%d] starting thread ...\n", data->ind);
fprintf(data->log_handler, "#idx\tperf\trun\tperiod\tstart\t\tend\t\trel_st\n");
fprintf(data->log_handler, "%s %8s %8s %8s %15s %15s %15s %10s\n",
replacing tab with space breaks the cpufreq_governor_efficiency script
I'm not very familiar with the script, but does this fix it?
doc/examples/cpufreq_governor_efficiency/dvfs.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/doc/examples/cpufreq_governor_efficiency/dvfs.sh b/doc/examples/cpufreq_governor_efficiency/dvfs.sh index 7caee7d437f7..ac69a53086e2 100755 --- a/doc/examples/cpufreq_governor_efficiency/dvfs.sh +++ b/doc/examples/cpufreq_governor_efficiency/dvfs.sh @@ -32,7 +32,7 @@ if [ $1 ] ; then sum=0 loop=0 overrun=0
for i in $(cat rt-app_$1_run$3us_sleep$4us.log | sed '1d;n;d' | sed '1d' |cut -f 3); do
for i in $(cat rt-app_$1_run$3us_sleep$4us.log | sed '1d;n;d' | sed '1d' | awk '{print $3}'); do
yes, it is
loop=$(expr $loop + 1) sum=$(expr $sum + $i) if [ $4 -le $i ] ; then
On 18/11/16 17:33, Vincent Guittot wrote:
On 18 November 2016 at 17:26, Juri Lelli juri.lelli@arm.com wrote:
On 18/11/16 16:34, Vincent Guittot wrote:
On 17 November 2016 at 17:34, Juri Lelli juri.lelli@arm.com wrote:
[...]
@@ -557,7 +563,9 @@ void *thread_body(void *arg)
log_notice("[%d] starting thread ...\n", data->ind);
fprintf(data->log_handler, "#idx\tperf\trun\tperiod\tstart\t\tend\t\trel_st\n");
fprintf(data->log_handler, "%s %8s %8s %8s %15s %15s %15s %10s\n",
replacing tab with space breaks the cpufreq_governor_efficiency script
I'm not very familiar with the script, but does this fix it?
doc/examples/cpufreq_governor_efficiency/dvfs.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/doc/examples/cpufreq_governor_efficiency/dvfs.sh b/doc/examples/cpufreq_governor_efficiency/dvfs.sh index 7caee7d437f7..ac69a53086e2 100755 --- a/doc/examples/cpufreq_governor_efficiency/dvfs.sh +++ b/doc/examples/cpufreq_governor_efficiency/dvfs.sh @@ -32,7 +32,7 @@ if [ $1 ] ; then sum=0 loop=0 overrun=0
for i in $(cat rt-app_$1_run$3us_sleep$4us.log | sed '1d;n;d' | sed '1d' |cut -f 3); do
for i in $(cat rt-app_$1_run$3us_sleep$4us.log | sed '1d;n;d' | sed '1d' | awk '{print $3}'); do
yes, it is
OK, good. I think I'll split the modification of the header (together with this fix) and the $SUBJECT path.
On 18-Nov 16:47, Juri Lelli wrote:
On 18/11/16 17:33, Vincent Guittot wrote:
On 18 November 2016 at 17:26, Juri Lelli juri.lelli@arm.com wrote:
On 18/11/16 16:34, Vincent Guittot wrote:
On 17 November 2016 at 17:34, Juri Lelli juri.lelli@arm.com wrote:
[...]
@@ -557,7 +563,9 @@ void *thread_body(void *arg)
log_notice("[%d] starting thread ...\n", data->ind);
fprintf(data->log_handler, "#idx\tperf\trun\tperiod\tstart\t\tend\t\trel_st\n");
fprintf(data->log_handler, "%s %8s %8s %8s %15s %15s %15s %10s\n",
replacing tab with space breaks the cpufreq_governor_efficiency script
I'm not very familiar with the script, but does this fix it?
doc/examples/cpufreq_governor_efficiency/dvfs.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/doc/examples/cpufreq_governor_efficiency/dvfs.sh b/doc/examples/cpufreq_governor_efficiency/dvfs.sh index 7caee7d437f7..ac69a53086e2 100755 --- a/doc/examples/cpufreq_governor_efficiency/dvfs.sh +++ b/doc/examples/cpufreq_governor_efficiency/dvfs.sh @@ -32,7 +32,7 @@ if [ $1 ] ; then sum=0 loop=0 overrun=0
for i in $(cat rt-app_$1_run$3us_sleep$4us.log | sed '1d;n;d' | sed '1d' |cut -f 3); do
for i in $(cat rt-app_$1_run$3us_sleep$4us.log | sed '1d;n;d' | sed '1d' | awk '{print $3}'); do
This can be further simplified using a single awk command to: a) skip all header line (assuming they start by #) b) printing the third column of all the following pair numbered lines
which is something like:
LOGFILE=rt-app_$1_run$3us_sleep$4us.log for i in $(awk '/^#/{HC++; next;} {SN=NR-HC+1; if(SN%2){print $3}}' $LOGFILE)
where I've used: HC (HeaderCount) : counter of header line SN (SampleNumber) : counter of samples lines NR (Line Number) : awk's predefined line number variable
On 17 November 2016 at 17:34, Juri Lelli juri.lelli@arm.com wrote:
Add wake-up latency metric (a la cyclictest).
Signed-off-by: Juri Lelli juri.lelli@arm.com
src/rt-app.c | 29 +++++++++++++++++++---------- src/rt-app_types.h | 1 + src/rt-app_utils.c | 5 +++-- 3 files changed, 23 insertions(+), 12 deletions(-)
diff --git a/src/rt-app.c b/src/rt-app.c index 442771f553f9..d23c3074f057 100644 --- a/src/rt-app.c +++ b/src/rt-app.c @@ -220,7 +220,7 @@ static void memload(unsigned long count, struct _rtapp_iomem_buf *iomem)
static int run_event(event_data_t *event, int dry_run, unsigned long *perf, unsigned long *duration, rtapp_resource_t *resources,
struct timespec *t_first)
struct timespec *t_first, unsigned long *wu_latency)
{ rtapp_resource_t *rdata = &(resources[event->res]); rtapp_resource_t *ddata = &(resources[event->dep]); @@ -299,7 +299,7 @@ static int run_event(event_data_t *event, int dry_run, break; case rtapp_timer: {
struct timespec t_period, t_now;
struct timespec t_period, t_now, t_wu; log_debug("timer %d ", event->duration); t_period = usec_to_timespec(event->duration);
@@ -311,10 +311,15 @@ static int run_event(event_data_t *event, int dry_run,
rdata->res.timer.t_next = timespec_add(&rdata->res.timer.t_next, &t_period); clock_gettime(CLOCK_MONOTONIC, &t_now);
if (timespec_lower(&t_now, &rdata->res.timer.t_next))
if (timespec_lower(&t_now, &rdata->res.timer.t_next)) { clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &rdata->res.timer.t_next, NULL);
else
clock_gettime(CLOCK_MONOTONIC, &t_now);
t_wu = timespec_sub(&t_now, &rdata->res.timer.t_next);
*wu_latency = timespec_to_usec(&t_wu);
You should add the same wake up latency metric for the sleep event
} else { clock_gettime(CLOCK_MONOTONIC, &rdata->res.timer.t_next);
*wu_latency = 0UL;
} } break; case rtapp_suspend:
@@ -352,7 +357,8 @@ static int run_event(event_data_t *event, int dry_run,
int run(int ind, phase_data_t *pdata, unsigned long *duration, rtapp_resource_t *resources,
struct timespec *t_first)
struct timespec *t_first,
unsigned long *wu_latency)
{ event_data_t *events = pdata->events; int nbevents = pdata->nbevents; @@ -369,7 +375,7 @@ int run(int ind, phase_data_t *pdata, unsigned long *duration, log_ftrace(ft_data.marker_fd, "[%d] executing %d", ind, i);
lock += run_event(&events[i], !continue_running, &perf, duration, resources, t_first);
lock += run_event(&events[i], !continue_running, &perf, duration, resources, t_first, wu_latency); } return perf;
@@ -417,7 +423,7 @@ void *thread_body(void *arg) struct sched_param param; struct timespec t_start, t_end, t_first; unsigned long t_start_usec;
unsigned long perf, duration;
unsigned long perf, duration, wu_latency; timing_point_t *curr_timing; timing_point_t *timings; timing_point_t tmp_timing;
@@ -557,7 +563,9 @@ void *thread_body(void *arg)
log_notice("[%d] starting thread ...\n", data->ind);
fprintf(data->log_handler, "#idx\tperf\trun\tperiod\tstart\t\tend\t\trel_st\n");
fprintf(data->log_handler, "%s %8s %8s %8s %15s %15s %15s %10s\n",
"#idx", "perf", "run", "period",
"start", "end", "rel_st", "wu_lat"); if (opts.ftrace) log_ftrace(ft_data.marker_fd, "[%d] starts", data->ind);
@@ -588,9 +596,9 @@ void *thread_body(void *arg) log_ftrace(ft_data.marker_fd, "[%d] begins loop %d phase %d step %d", data->ind, i, j, loop); log_debug("[%d] begins loop %d phase %d step %d", data->ind, i, j, loop);;
duration = 0;
duration = wu_latency = 0; clock_gettime(CLOCK_MONOTONIC, &t_start);
perf = run(data->ind, pdata, &duration, *(data->resources), &t_first);
perf = run(data->ind, pdata, &duration, *(data->resources), &t_first, &wu_latency); clock_gettime(CLOCK_MONOTONIC, &t_end); if (timings)
@@ -608,6 +616,7 @@ void *thread_body(void *arg) curr_timing->period = timespec_to_usec(&t_diff); curr_timing->duration = duration; curr_timing->perf = perf;
curr_timing->wu_latency = wu_latency; if (opts.logsize && !timings) log_timing(data->log_handler, curr_timing);
diff --git a/src/rt-app_types.h b/src/rt-app_types.h index 57fb80d363d5..4c20f43cb627 100644 --- a/src/rt-app_types.h +++ b/src/rt-app_types.h @@ -191,6 +191,7 @@ typedef struct _timing_point_t { unsigned long perf; unsigned long duration; unsigned long period;
unsigned long wu_latency; __u64 start_time; __u64 end_time; __u64 rel_start_time;
diff --git a/src/rt-app_utils.c b/src/rt-app_utils.c index 887bfc8ef79e..28896c9a76e6 100644 --- a/src/rt-app_utils.c +++ b/src/rt-app_utils.c @@ -134,14 +134,15 @@ void log_timing(FILE *handler, timing_point_t *t) { fprintf(handler,
"%d\t%lu\t%lu\t%lu\t%llu\t%llu\t%llu",
"%4d %8lu %8lu %8lu %15llu %15llu %15llu %10lu", t->ind, t->perf, t->duration, t->period, t->start_time, t->end_time,
t->rel_start_time
t->rel_start_time,
t->wu_latency ); fprintf(handler, "\n");
}
2.10.0
On 21 November 2016 at 10:28, Vincent Guittot vincent.guittot@linaro.org wrote:
On 17 November 2016 at 17:34, Juri Lelli juri.lelli@arm.com wrote:
Add wake-up latency metric (a la cyclictest).
Signed-off-by: Juri Lelli juri.lelli@arm.com
src/rt-app.c | 29 +++++++++++++++++++---------- src/rt-app_types.h | 1 + src/rt-app_utils.c | 5 +++-- 3 files changed, 23 insertions(+), 12 deletions(-)
diff --git a/src/rt-app.c b/src/rt-app.c index 442771f553f9..d23c3074f057 100644 --- a/src/rt-app.c +++ b/src/rt-app.c @@ -220,7 +220,7 @@ static void memload(unsigned long count, struct _rtapp_iomem_buf *iomem)
static int run_event(event_data_t *event, int dry_run, unsigned long *perf, unsigned long *duration, rtapp_resource_t *resources,
struct timespec *t_first)
struct timespec *t_first, unsigned long *wu_latency)
{ rtapp_resource_t *rdata = &(resources[event->res]); rtapp_resource_t *ddata = &(resources[event->dep]); @@ -299,7 +299,7 @@ static int run_event(event_data_t *event, int dry_run, break; case rtapp_timer: {
struct timespec t_period, t_now;
struct timespec t_period, t_now, t_wu; log_debug("timer %d ", event->duration); t_period = usec_to_timespec(event->duration);
@@ -311,10 +311,15 @@ static int run_event(event_data_t *event, int dry_run,
rdata->res.timer.t_next = timespec_add(&rdata->res.timer.t_next, &t_period); clock_gettime(CLOCK_MONOTONIC, &t_now);
if (timespec_lower(&t_now, &rdata->res.timer.t_next))
if (timespec_lower(&t_now, &rdata->res.timer.t_next)) { clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &rdata->res.timer.t_next, NULL);
else
clock_gettime(CLOCK_MONOTONIC, &t_now);
t_wu = timespec_sub(&t_now, &rdata->res.timer.t_next);
*wu_latency = timespec_to_usec(&t_wu);
You should add the same wake up latency metric for the sleep event
} else { clock_gettime(CLOCK_MONOTONIC, &rdata->res.timer.t_next);
*wu_latency = 0UL;
} } break; case rtapp_suspend:
@@ -352,7 +357,8 @@ static int run_event(event_data_t *event, int dry_run,
int run(int ind, phase_data_t *pdata, unsigned long *duration, rtapp_resource_t *resources,
struct timespec *t_first)
struct timespec *t_first,
unsigned long *wu_latency)
{ event_data_t *events = pdata->events; int nbevents = pdata->nbevents; @@ -369,7 +375,7 @@ int run(int ind, phase_data_t *pdata, unsigned long *duration, log_ftrace(ft_data.marker_fd, "[%d] executing %d", ind, i);
lock += run_event(&events[i], !continue_running, &perf, duration, resources, t_first);
lock += run_event(&events[i], !continue_running, &perf, duration, resources, t_first, wu_latency); } return perf;
@@ -417,7 +423,7 @@ void *thread_body(void *arg) struct sched_param param; struct timespec t_start, t_end, t_first; unsigned long t_start_usec;
unsigned long perf, duration;
unsigned long perf, duration, wu_latency; timing_point_t *curr_timing; timing_point_t *timings; timing_point_t tmp_timing;
@@ -557,7 +563,9 @@ void *thread_body(void *arg)
log_notice("[%d] starting thread ...\n", data->ind);
fprintf(data->log_handler, "#idx\tperf\trun\tperiod\tstart\t\tend\t\trel_st\n");
fprintf(data->log_handler, "%s %8s %8s %8s %15s %15s %15s %10s\n",
"#idx", "perf", "run", "period",
"start", "end", "rel_st", "wu_lat"); if (opts.ftrace) log_ftrace(ft_data.marker_fd, "[%d] starts", data->ind);
@@ -588,9 +596,9 @@ void *thread_body(void *arg) log_ftrace(ft_data.marker_fd, "[%d] begins loop %d phase %d step %d", data->ind, i, j, loop); log_debug("[%d] begins loop %d phase %d step %d", data->ind, i, j, loop);;
duration = 0;
duration = wu_latency = 0; clock_gettime(CLOCK_MONOTONIC, &t_start);
perf = run(data->ind, pdata, &duration, *(data->resources), &t_first);
perf = run(data->ind, pdata, &duration, *(data->resources), &t_first, &wu_latency); clock_gettime(CLOCK_MONOTONIC, &t_end); if (timings)
@@ -608,6 +616,7 @@ void *thread_body(void *arg) curr_timing->period = timespec_to_usec(&t_diff); curr_timing->duration = duration; curr_timing->perf = perf;
curr_timing->wu_latency = wu_latency;
What happen if we have several timer in one phase ? we will only get the last wake up latency, isn't it ?
if (opts.logsize && !timings) log_timing(data->log_handler, curr_timing);
diff --git a/src/rt-app_types.h b/src/rt-app_types.h index 57fb80d363d5..4c20f43cb627 100644 --- a/src/rt-app_types.h +++ b/src/rt-app_types.h @@ -191,6 +191,7 @@ typedef struct _timing_point_t { unsigned long perf; unsigned long duration; unsigned long period;
unsigned long wu_latency; __u64 start_time; __u64 end_time; __u64 rel_start_time;
diff --git a/src/rt-app_utils.c b/src/rt-app_utils.c index 887bfc8ef79e..28896c9a76e6 100644 --- a/src/rt-app_utils.c +++ b/src/rt-app_utils.c @@ -134,14 +134,15 @@ void log_timing(FILE *handler, timing_point_t *t) { fprintf(handler,
"%d\t%lu\t%lu\t%lu\t%llu\t%llu\t%llu",
"%4d %8lu %8lu %8lu %15llu %15llu %15llu %10lu", t->ind, t->perf, t->duration, t->period, t->start_time, t->end_time,
t->rel_start_time
t->rel_start_time,
t->wu_latency ); fprintf(handler, "\n");
}
2.10.0
On 21/11/16 10:33, Vincent Guittot wrote:
On 21 November 2016 at 10:28, Vincent Guittot vincent.guittot@linaro.org wrote:
On 17 November 2016 at 17:34, Juri Lelli juri.lelli@arm.com wrote:
[...]
case rtapp_timer: {
struct timespec t_period, t_now;
struct timespec t_period, t_now, t_wu; log_debug("timer %d ", event->duration); t_period = usec_to_timespec(event->duration);
@@ -311,10 +311,15 @@ static int run_event(event_data_t *event, int dry_run,
rdata->res.timer.t_next = timespec_add(&rdata->res.timer.t_next, &t_period); clock_gettime(CLOCK_MONOTONIC, &t_now);
if (timespec_lower(&t_now, &rdata->res.timer.t_next))
if (timespec_lower(&t_now, &rdata->res.timer.t_next)) { clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &rdata->res.timer.t_next, NULL);
else
clock_gettime(CLOCK_MONOTONIC, &t_now);
t_wu = timespec_sub(&t_now, &rdata->res.timer.t_next);
*wu_latency = timespec_to_usec(&t_wu);
You should add the same wake up latency metric for the sleep event
OK, will look into it.
} else { clock_gettime(CLOCK_MONOTONIC, &rdata->res.timer.t_next);
*wu_latency = 0UL;
} } break;
[...]
@@ -608,6 +616,7 @@ void *thread_body(void *arg) curr_timing->period = timespec_to_usec(&t_diff); curr_timing->duration = duration; curr_timing->perf = perf;
curr_timing->wu_latency = wu_latency;
What happen if we have several timer in one phase ? we will only get the last wake up latency, isn't it ?
Not sure I get where the problem is. Can you give me an example (and intended usage) of a phase with multiple timers? I don't seem to understand how that is supposed to work. :)
Le Monday 21 Nov 2016 à 12:26:03 (+0000), Juri Lelli a écrit :
On 21/11/16 10:33, Vincent Guittot wrote:
On 21 November 2016 at 10:28, Vincent Guittot vincent.guittot@linaro.org wrote:
On 17 November 2016 at 17:34, Juri Lelli juri.lelli@arm.com wrote:
[...]
case rtapp_timer: {
struct timespec t_period, t_now;
struct timespec t_period, t_now, t_wu; log_debug("timer %d ", event->duration); t_period = usec_to_timespec(event->duration);
@@ -311,10 +311,15 @@ static int run_event(event_data_t *event, int dry_run,
rdata->res.timer.t_next = timespec_add(&rdata->res.timer.t_next, &t_period); clock_gettime(CLOCK_MONOTONIC, &t_now);
if (timespec_lower(&t_now, &rdata->res.timer.t_next))
if (timespec_lower(&t_now, &rdata->res.timer.t_next)) { clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &rdata->res.timer.t_next, NULL);
else
clock_gettime(CLOCK_MONOTONIC, &t_now);
t_wu = timespec_sub(&t_now, &rdata->res.timer.t_next);
*wu_latency = timespec_to_usec(&t_wu);
You should add the same wake up latency metric for the sleep event
OK, will look into it.
} else { clock_gettime(CLOCK_MONOTONIC, &rdata->res.timer.t_next);
*wu_latency = 0UL;
} } break;
[...]
@@ -608,6 +616,7 @@ void *thread_body(void *arg) curr_timing->period = timespec_to_usec(&t_diff); curr_timing->duration = duration; curr_timing->perf = perf;
curr_timing->wu_latency = wu_latency;
What happen if we have several timer in one phase ? we will only get the last wake up latency, isn't it ?
Not sure I get where the problem is. Can you give me an example (and intended usage) of a phase with multiple timers? I don't seem to understand how that is supposed to work. :)
Below is an example of a timer that is used twice in the same phase: "tasks" : { "thread0" : { "instance" : 1, "loop" : -1, "run0" : 10000, "timer0" : { "ref" : "timerA", "period" : 20000 }, "run1" : 20000, "timer1" : { "ref" : "timerA", "period" : 40000 }, } },
More generally speaking, the log timings are saved per phase so the associated metrics should be per phase as well. With the example above, the fields c_period and c_run only give the last part of the phase.
Then, if we also record the wake up latency of sleep event, there are even more use case where several sleep and timer are used in a phase
On 21/11/16 15:08, Vincent Guittot wrote:
Le Monday 21 Nov 2016 à 12:26:03 (+0000), Juri Lelli a écrit :
On 21/11/16 10:33, Vincent Guittot wrote:
On 21 November 2016 at 10:28, Vincent Guittot vincent.guittot@linaro.org wrote:
On 17 November 2016 at 17:34, Juri Lelli juri.lelli@arm.com wrote:
[...]
case rtapp_timer: {
struct timespec t_period, t_now;
struct timespec t_period, t_now, t_wu; log_debug("timer %d ", event->duration); t_period = usec_to_timespec(event->duration);
@@ -311,10 +311,15 @@ static int run_event(event_data_t *event, int dry_run,
rdata->res.timer.t_next = timespec_add(&rdata->res.timer.t_next, &t_period); clock_gettime(CLOCK_MONOTONIC, &t_now);
if (timespec_lower(&t_now, &rdata->res.timer.t_next))
if (timespec_lower(&t_now, &rdata->res.timer.t_next)) { clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &rdata->res.timer.t_next, NULL);
else
clock_gettime(CLOCK_MONOTONIC, &t_now);
t_wu = timespec_sub(&t_now, &rdata->res.timer.t_next);
*wu_latency = timespec_to_usec(&t_wu);
You should add the same wake up latency metric for the sleep event
OK, will look into it.
} else { clock_gettime(CLOCK_MONOTONIC, &rdata->res.timer.t_next);
*wu_latency = 0UL;
} } break;
[...]
@@ -608,6 +616,7 @@ void *thread_body(void *arg) curr_timing->period = timespec_to_usec(&t_diff); curr_timing->duration = duration; curr_timing->perf = perf;
curr_timing->wu_latency = wu_latency;
What happen if we have several timer in one phase ? we will only get the last wake up latency, isn't it ?
Not sure I get where the problem is. Can you give me an example (and intended usage) of a phase with multiple timers? I don't seem to understand how that is supposed to work. :)
Below is an example of a timer that is used twice in the same phase: "tasks" : { "thread0" : { "instance" : 1, "loop" : -1, "run0" : 10000, "timer0" : { "ref" : "timerA", "period" : 20000 }, "run1" : 20000, "timer1" : { "ref" : "timerA", "period" : 40000 }, } },
Mmm, but these are IMHO two distinct phases and we should describe them as such (maybe add a check to disallow such a json description). It seems that we only make confusing descriptions possible (and we complicate handling from a code PoV).
More generally speaking, the log timing are saved per phase so the associated metrics should be per phase as well. With the example above, the fields c_period and the c_run give the last part of the phase.
Then, if we also record the wake up latency of sleep event, there are even more use case where several sleep and timer are used in a phase
On 21 November 2016 at 15:43, Juri Lelli juri.lelli@arm.com wrote:
On 21/11/16 15:08, Vincent Guittot wrote:
Le Monday 21 Nov 2016 à 12:26:03 (+0000), Juri Lelli a écrit :
On 21/11/16 10:33, Vincent Guittot wrote:
On 21 November 2016 at 10:28, Vincent Guittot vincent.guittot@linaro.org wrote:
On 17 November 2016 at 17:34, Juri Lelli juri.lelli@arm.com wrote:
[...]
case rtapp_timer: {
struct timespec t_period, t_now;
struct timespec t_period, t_now, t_wu; log_debug("timer %d ", event->duration); t_period = usec_to_timespec(event->duration);
@@ -311,10 +311,15 @@ static int run_event(event_data_t *event, int dry_run,
rdata->res.timer.t_next = timespec_add(&rdata->res.timer.t_next, &t_period); clock_gettime(CLOCK_MONOTONIC, &t_now);
if (timespec_lower(&t_now, &rdata->res.timer.t_next))
if (timespec_lower(&t_now, &rdata->res.timer.t_next)) { clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &rdata->res.timer.t_next, NULL);
else
clock_gettime(CLOCK_MONOTONIC, &t_now);
t_wu = timespec_sub(&t_now, &rdata->res.timer.t_next);
*wu_latency = timespec_to_usec(&t_wu);
You should add the same wake up latency metric for the sleep event
OK, will look into it.
} else { clock_gettime(CLOCK_MONOTONIC, &rdata->res.timer.t_next);
*wu_latency = 0UL;
} } break;
[...]
@@ -608,6 +616,7 @@ void *thread_body(void *arg) curr_timing->period = timespec_to_usec(&t_diff); curr_timing->duration = duration; curr_timing->perf = perf;
curr_timing->wu_latency = wu_latency;
What happen if we have several timer in one phase ? we will only get the last wake up latency, isn't it ?
Not sure I get where the problem is. Can you give me an example (and intended usage) of a phase with multiple timers? I don't seem to understand how that is supposed to work. :)
Below is an example of a timer that is used twice in the same phase: "tasks" : { "thread0" : { "instance" : 1, "loop" : -1, "run0" : 10000, "timer0" : { "ref" : "timerA", "period" : 20000 }, "run1" : 20000, "timer1" : { "ref" : "timerA", "period" : 40000 }, } },
Mmm, but these are IMHO two distinct phases and we should describe them as such (maybe add a check to disallow such a json description). It
I don't think that we should disallow this kind of description. IMHO, it is more readable than splitting it into several phases
Then, what about this: "phase0" : { "loop" : 2, "run0" : 10000, "timer0" : { "ref" : "timerA", "period" : 20000 }, "run1" : 20000, "timer1" : { "ref" : "timerA", "period" : 40000 }, }, "phase1" : { "loop" : 1, "run0" : 20000, "timer0" : { "ref" : "timerA", "period" : 400000 }, "run1" : 10000, "timer1" : { "ref" : "timerA", "period" : 20000 }, }
I can probably come up with other descriptions involving different timers as well. Just to say that we should not restrict how to use events in a phase.
seems that we only make confusing descriptions possible (and we complicate handling from a code PoV).
It's just about an addition instead of overwriting the value. That's already the behavior for duration, period and perf and that will probably be the case for c_run as well
More generally speaking, the log timing are saved per phase so the associated metrics should be per phase as well. With the example above, the fields c_period and the c_run give the last part of the phase.
Then, if we also record the wake up latency of sleep event, there are even more use case where several sleep and timer are used in a phase
On 21/11/16 16:13, Vincent Guittot wrote:
On 21 November 2016 at 15:43, Juri Lelli juri.lelli@arm.com wrote:
On 21/11/16 15:08, Vincent Guittot wrote:
Le Monday 21 Nov 2016 à 12:26:03 (+0000), Juri Lelli a écrit :
On 21/11/16 10:33, Vincent Guittot wrote:
On 21 November 2016 at 10:28, Vincent Guittot vincent.guittot@linaro.org wrote:
On 17 November 2016 at 17:34, Juri Lelli juri.lelli@arm.com wrote:
[...]
Not sure I get where the problem is. Can you give me an example (and intended usage) of a phase with multiple timers? I don't seem to understand how that is supposed to work. :)
Below is an example of a timer that is used twice in the same phase: "tasks" : { "thread0" : { "instance" : 1, "loop" : -1, "run0" : 10000, "timer0" : { "ref" : "timerA", "period" : 20000 }, "run1" : 20000, "timer1" : { "ref" : "timerA", "period" : 40000 }, } },
Mmm, but these are IMHO two distinct phases and we should describe them as such (maybe add a check to disallow such a json description). It
I don't think that we should disallow this kind of description. IMHO, it is more readable than splitting it in several phase
Then, what about this: "phase0" : { "loop" : 2, "run0" : 10000, "timer0" : { "ref" : "timerA", "period" : 20000 }, "run1" : 20000, "timer1" : { "ref" : "timerA", "period" : 40000 }, }, "phase1" : { "loop" : 1, "run0" : 20000, "timer0" : { "ref" : "timerA", "period" : 400000 }, "run1" : 10000, "timer1" : { "ref" : "timerA", "period" : 20000 }, }
I can probably come we other description involving different timers as well. Just to say that we should not restrict how to use events in a phase.
seems that we only make confusion descriptions possible (and we complicate handling from a code pow).
It's just about an addition instead of overwriting the value. That's already the behavior for duration, period and perf and that will probably be the case c_run as well
Not convinced yet. :)
IMHO, the following is more clear (while I agree that it is also more verbose :).
"tasks" : { "thread0" : { "instance" : 1, "loop" : 1, "delay" : 1000000, "cpus" : [1], "phases" : { "phase00" : { "loop" : 1, "run" : 10000, "timer" : { "ref" : "timerA", "period" : 20000 }, }, "phase01" : { "loop" : 1, "run" : 20000, "timer" : { "ref" : "timerA", "period" : 40000 }, }, "phase02" : { "loop" : 1, "run" : 10000, "timer" : { "ref" : "timerA", "period" : 20000 }, }, "phase03" : { "loop" : 1, "run" : 20000, "timer" : { "ref" : "timerA", "period" : 40000 }, }, "phase10" : { "loop" : 1, "run" : 20000, "timer" : { "ref" : "timerA", "period" : 400000 }, }, "phase11" : { "loop" : 1, "run" : 10000, "timer" : { "ref" : "timerA", "period" : 20000 }, } } } },
The difference is also that with your expression we get one row in the log for each phase (composed of two nested phases), and it seems that's already the behaviour without my modifications. With the latter approach we instead get one entry for each different phase, which seems more inline with what one might expect, IMHO.
# cat rt-app-thread0-0.log # Policy : SCHED_OTHER priority : 0 #idx perf run period start end rel_st slack c_run c_period wu_lat 0 46082 10324 19981 275642529810 275642549791 1000786 9447 10000 20000 185 0 92165 20683 39976 275642549821 275642589796 1020796 19079 20000 40000 191 0 46082 10419 19968 275642589825 275642609793 1060801 9338 10000 20000 187 0 92165 20464 39972 275642609821 275642649793 1080797 19298 20000 40000 187 0 92165 20739 399976 275642649821 275643049797 1120797 379022 20000 400000 191 0 46082 10147 19967 275643049826 275643069793 1520802 9609 10000 20000 187
I agree that it's more cumbersome to write the second json expression, but we can easily rely on additional tooling to generate easily such json files (like we already do in LISA).
On 21 November 2016 at 18:27, Juri Lelli juri.lelli@arm.com wrote:
On 21/11/16 16:13, Vincent Guittot wrote:
On 21 November 2016 at 15:43, Juri Lelli juri.lelli@arm.com wrote:
On 21/11/16 15:08, Vincent Guittot wrote:
Le Monday 21 Nov 2016 à 12:26:03 (+0000), Juri Lelli a écrit :
On 21/11/16 10:33, Vincent Guittot wrote:
On 21 November 2016 at 10:28, Vincent Guittot vincent.guittot@linaro.org wrote: > On 17 November 2016 at 17:34, Juri Lelli juri.lelli@arm.com wrote:
[...]
Not sure I get where the problem is. Can you give me an example (and intended usage) of a phase with multiple timers? I don't seem to understand how that is supposed to work. :)
Below is an example of a timer that is used twice in the same phase: "tasks" : { "thread0" : { "instance" : 1, "loop" : -1, "run0" : 10000, "timer0" : { "ref" : "timerA", "period" : 20000 }, "run1" : 20000, "timer1" : { "ref" : "timerA", "period" : 40000 }, } },
Mmm, but these are IMHO two distinct phases and we should describe them as such (maybe add a check to disallow such a json description). It
I don't think that we should disallow this kind of description. IMHO, it is more readable than splitting it in several phase
Then, what about this: "phase0" : { "loop" : 2, "run0" : 10000, "timer0" : { "ref" : "timerA", "period" : 20000 }, "run1" : 20000, "timer1" : { "ref" : "timerA", "period" : 40000 }, }, "phase1" : { "loop" : 1, "run0" : 20000, "timer0" : { "ref" : "timerA", "period" : 400000 }, "run1" : 10000, "timer1" : { "ref" : "timerA", "period" : 20000 }, }
I can probably come we other description involving different timers as well. Just to say that we should not restrict how to use events in a phase.
seems that we only make confusion descriptions possible (and we complicate handling from a code pow).
It's just about an addition instead of overwriting the value. That's already the behavior for duration, period and perf and that will probably be the case c_run as well
Not convinced yet. :)
IMHO, the following is more clear (while I agree that it is also more verbose :).
Not sure that it's more clear because it really depends on what you want to emulate. While trying to emulate a use case, you often apply a phase to each of the main steps of the use case. Then, I can probably come up with more complex use cases that could not be easily unrolled.
"tasks" : { "thread0" : { "instance" : 1, "loop" : 1, "delay" : 1000000, "cpus" : [1], "phases" : { "phase00" : { "loop" : 1, "run" : 10000, "timer" : { "ref" : "timerA", "period" : 20000 }, }, "phase01" : { "loop" : 1, "run" : 20000, "timer" : { "ref" : "timerA", "period" : 40000 }, }, "phase02" : { "loop" : 1, "run" : 10000, "timer" : { "ref" : "timerA", "period" : 20000 }, }, "phase03" : { "loop" : 1, "run" : 20000, "timer" : { "ref" : "timerA", "period" : 40000 }, }, "phase10" : { "loop" : 1, "run" : 20000, "timer" : { "ref" : "timerA", "period" : 400000 }, }, "phase11" : { "loop" : 1, "run" : 10000, "timer" : { "ref" : "timerA", "period" : 20000 }, } } } },
The difference is also that with your expression we get one row in the log for each phase (composed of two nested phases), and it seems that's already the behaviour without my modifications. With the latter approach
Yes i agree that it's the current approach and i want to keep the behavior possible as i don't want to put any constraint on how to use events in phases.
Log timings are saved per phase so all metrics should also be set per phase, whatever is put in that phase. This doesn't prevent someone from putting only one event per phase if he wants to get metrics per event, which seems to be what you want, but IMO we should not prevent others from putting several identical events in the same phase and still having metrics that reflect the content of the phase.
There also the c_run metrics in another patch, does it mean that you want only one run event per phase ?
we instead get one entry for each different phase, which seems more inline with what one might expect, IMHO.
I think that both should be possible
# cat rt-app-thread0-0.log # Policy : SCHED_OTHER priority : 0 #idx perf run period start end rel_st slack c_run c_period wu_lat 0 46082 10324 19981 275642529810 275642549791 1000786 9447 10000 20000 185 0 92165 20683 39976 275642549821 275642589796 1020796 19079 20000 40000 191 0 46082 10419 19968 275642589825 275642609793 1060801 9338 10000 20000 187 0 92165 20464 39972 275642609821 275642649793 1080797 19298 20000 40000 187 0 92165 20739 399976 275642649821 275643049797 1120797 379022 20000 400000 191 0 46082 10147 19967 275643049826 275643069793 1520802 9609 10000 20000 187
I agree that it's more cumbersome to write the second json expression, but we can easily rely on additional tooling to generate easily such json files (like we already do in LISA).
On 22/11/16 10:41, Vincent Guittot wrote:
On 21 November 2016 at 18:27, Juri Lelli juri.lelli@arm.com wrote:
On 21/11/16 16:13, Vincent Guittot wrote:
On 21 November 2016 at 15:43, Juri Lelli juri.lelli@arm.com wrote:
On 21/11/16 15:08, Vincent Guittot wrote:
Le Monday 21 Nov 2016 à 12:26:03 (+0000), Juri Lelli a écrit :
On 21/11/16 10:33, Vincent Guittot wrote: > On 21 November 2016 at 10:28, Vincent Guittot > vincent.guittot@linaro.org wrote: > > On 17 November 2016 at 17:34, Juri Lelli juri.lelli@arm.com wrote:
[...]
Not sure I get where the problem is. Can you give me an example (and intended usage) of a phase with multiple timers? I don't seem to understand how that is supposed to work. :)
Below is an example of a timer that is used twice in the same phase: "tasks" : { "thread0" : { "instance" : 1, "loop" : -1, "run0" : 10000, "timer0" : { "ref" : "timerA", "period" : 20000 }, "run1" : 20000, "timer1" : { "ref" : "timerA", "period" : 40000 }, } },
Mmm, but these are IMHO two distinct phases and we should describe them as such (maybe add a check to disallow such a json description). It
I don't think that we should disallow this kind of description. IMHO, it is more readable than splitting it in several phase
Then, what about this: "phase0" : { "loop" : 2, "run0" : 10000, "timer0" : { "ref" : "timerA", "period" : 20000 }, "run1" : 20000, "timer1" : { "ref" : "timerA", "period" : 40000 }, }, "phase1" : { "loop" : 1, "run0" : 20000, "timer0" : { "ref" : "timerA", "period" : 400000 }, "run1" : 10000, "timer1" : { "ref" : "timerA", "period" : 20000 }, }
I can probably come we other description involving different timers as well. Just to say that we should not restrict how to use events in a phase.
seems that we only make confusion descriptions possible (and we complicate handling from a code pow).
It's just about an addition instead of overwriting the value. That's already the behavior for duration, period and perf and that will probably be the case c_run as well
Not convinced yet. :)
IMHO, the following is more clear (while I agree that it is also more verbose :).
Not sure that it's more clear because it really depends on what you want to emulate. While trying to emulate a use case, you often apply a phase to each of the main steps of the use case.
That is probably the point on which I don't quite agree. To me the fact that there is a "timer" inside a phase means that that phase is periodic and thus the period is unique. Maybe my view comes from the fact that I never had to describe something different. Can you (or Leo maybe?) give a real modeling use case where this is useful instead?
Then, I can probably come up with more complex use cases that could not be easily unrolled.
"tasks" : { "thread0" : { "instance" : 1, "loop" : 1, "delay" : 1000000, "cpus" : [1], "phases" : { "phase00" : { "loop" : 1, "run" : 10000, "timer" : { "ref" : "timerA", "period" : 20000 }, }, "phase01" : { "loop" : 1, "run" : 20000, "timer" : { "ref" : "timerA", "period" : 40000 }, }, "phase02" : { "loop" : 1, "run" : 10000, "timer" : { "ref" : "timerA", "period" : 20000 }, }, "phase03" : { "loop" : 1, "run" : 20000, "timer" : { "ref" : "timerA", "period" : 40000 }, }, "phase10" : { "loop" : 1, "run" : 20000, "timer" : { "ref" : "timerA", "period" : 400000 }, }, "phase11" : { "loop" : 1, "run" : 10000, "timer" : { "ref" : "timerA", "period" : 20000 }, } } } },
The difference is also that with your expression we get one row in the log for each phase (composed of two nested phases), and it seems that's already the behaviour without my modifications. With the latter approach
Yes i agree that it's the current approach and i want to keep the behavior possible as i don't want to put any constraint on how to use events in phases.
Log timing is saved per phase so all metrics should be also set per phase whatever is put in this phase. This doesn't prevent someone to put only one event per phase in he wants to get metrics per event which seems to be what you want but IMO we should not prevent others to put several identical events in the same phase and still having metrics that reflect the content of the phase.
There also the c_run metrics in another patch, does it mean that you want only one run event per phase ?
Nope. That is broken in my current set. I'll fix that as you suggested (summing up the durations).
we instead get one entry for each different phase, which seems more inline with what one might expect, IMHO.
I think that both should be possible
# cat rt-app-thread0-0.log # Policy : SCHED_OTHER priority : 0 #idx perf run period start end rel_st slack c_run c_period wu_lat 0 46082 10324 19981 275642529810 275642549791 1000786 9447 10000 20000 185 0 92165 20683 39976 275642549821 275642589796 1020796 19079 20000 40000 191 0 46082 10419 19968 275642589825 275642609793 1060801 9338 10000 20000 187 0 92165 20464 39972 275642609821 275642649793 1080797 19298 20000 40000 187 0 92165 20739 399976 275642649821 275643049797 1120797 379022 20000 400000 191 0 46082 10147 19967 275643049826 275643069793 1520802 9609 10000 20000 187
I agree that it's more cumbersome to write the second json expression, but we can easily rely on additional tooling to generate easily such json files (like we already do in LISA).
On 22/11/16 10:21, Juri Lelli wrote:
On 22/11/16 10:41, Vincent Guittot wrote:
[...]
There also the c_run metrics in another patch, does it mean that you want only one run event per phase ?
Nope. That is broken in my current set. I'll fix that as you suggested (summing up the durations).
Which basically should result in the following right?
--->8--- src/rt-app.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/rt-app.c b/src/rt-app.c index 655c43e06a7d..bec511121d77 100644 --- a/src/rt-app.c +++ b/src/rt-app.c @@ -287,7 +287,7 @@ static int run_event(event_data_t *event, int dry_run, { struct timespec t_start, t_end; log_debug("run %d ", event->duration); - ldata->c_run = event->duration; + ldata->c_run += event->duration; clock_gettime(CLOCK_MONOTONIC, &t_start); *perf += loadwait(event->duration); clock_gettime(CLOCK_MONOTONIC, &t_end); --->8---
I was actually thinking that we should also sum up durations of all event that specify a duration (sleep, runtime, etc.) and maybe modify the log header as s/c_run/c_duration/ ?
On 22 November 2016 at 11:41, Juri Lelli juri.lelli@arm.com wrote:
On 22/11/16 10:21, Juri Lelli wrote:
On 22/11/16 10:41, Vincent Guittot wrote:
[...]
There also the c_run metrics in another patch, does it mean that you want only one run event per phase ?
Nope. That is broken in my current set. I'll fix that as you suggested (summing up the durations).
Which basically should result in the following right?
--->8--- src/rt-app.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/rt-app.c b/src/rt-app.c index 655c43e06a7d..bec511121d77 100644 --- a/src/rt-app.c +++ b/src/rt-app.c @@ -287,7 +287,7 @@ static int run_event(event_data_t *event, int dry_run, { struct timespec t_start, t_end; log_debug("run %d ", event->duration);
ldata->c_run = event->duration;
ldata->c_run += event->duration; clock_gettime(CLOCK_MONOTONIC, &t_start); *perf += loadwait(event->duration); clock_gettime(CLOCK_MONOTONIC, &t_end);
--->8---
I was actually thinking that we should also sum up durations of all event that specify a duration (sleep, runtime, etc.) and maybe modify the log header as s/c_run/c_duration/ ?
IMO adding runtime makes perfect sense, as commented in the related patch. But not sleep, which is an idle state just like timer
On 22/11/16 11:50, Vincent Guittot wrote:
On 22 November 2016 at 11:41, Juri Lelli juri.lelli@arm.com wrote:
On 22/11/16 10:21, Juri Lelli wrote:
On 22/11/16 10:41, Vincent Guittot wrote:
[...]
There also the c_run metrics in another patch, does it mean that you want only one run event per phase ?
Nope. That is broken in my current set. I'll fix that as you suggested (summing up the durations).
Which basically should result in the following right?
--->8--- src/rt-app.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/rt-app.c b/src/rt-app.c index 655c43e06a7d..bec511121d77 100644 --- a/src/rt-app.c +++ b/src/rt-app.c @@ -287,7 +287,7 @@ static int run_event(event_data_t *event, int dry_run, { struct timespec t_start, t_end; log_debug("run %d ", event->duration);
ldata->c_run = event->duration;
ldata->c_run += event->duration; clock_gettime(CLOCK_MONOTONIC, &t_start); *perf += loadwait(event->duration); clock_gettime(CLOCK_MONOTONIC, &t_end);
--->8---
I was actually thinking that we should also sum up durations of all event that specify a duration (sleep, runtime, etc.) and maybe modify the log header as s/c_run/c_duration/ ?
IMO adding runtime makes perfectly sense as commented in the related patch But not sleep which is a idle state just like timer
True.. but I was thinking that the best case slack will be affected also by sleep phases, so then if we compared the measured slack with the best one we should probably account for sleep events ?
e.g.,
|--- run 5000 ---|-- sleep 4000 --|--- run 5000 ---| |-----------------------------period 20000 -------------------| |----------| measured slack 6000
If we factor sleep duration in, we will compare 6000 (best case) vs 6000. If we don't, we will compare 10000 (best case) vs 6000, which ends up telling us that we ran slower for some reason, while instead we simply slept by construction?
On 22 November 2016 at 12:00, Juri Lelli juri.lelli@arm.com wrote:
On 22/11/16 11:50, Vincent Guittot wrote:
On 22 November 2016 at 11:41, Juri Lelli juri.lelli@arm.com wrote:
On 22/11/16 10:21, Juri Lelli wrote:
On 22/11/16 10:41, Vincent Guittot wrote:
[...]
There also the c_run metrics in another patch, does it mean that you want only one run event per phase ?
Nope. That is broken in my current set. I'll fix that as you suggested (summing up the durations).
Which basically should result in the following right?
--->8--- src/rt-app.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/rt-app.c b/src/rt-app.c index 655c43e06a7d..bec511121d77 100644 --- a/src/rt-app.c +++ b/src/rt-app.c @@ -287,7 +287,7 @@ static int run_event(event_data_t *event, int dry_run, { struct timespec t_start, t_end; log_debug("run %d ", event->duration);
ldata->c_run = event->duration;
ldata->c_run += event->duration; clock_gettime(CLOCK_MONOTONIC, &t_start); *perf += loadwait(event->duration); clock_gettime(CLOCK_MONOTONIC, &t_end);
--->8---
I was actually thinking that we should also sum up durations of all event that specify a duration (sleep, runtime, etc.) and maybe modify the log header as s/c_run/c_duration/ ?
IMO adding runtime makes perfectly sense as commented in the related patch But not sleep which is a idle state just like timer
True.. but I was thinking that the best case slack will be affected also by sleep phases, so then if we compared the measured slack with the best one we should probably account for sleep events ?
e.g.,
|--- run 5000 ---|-- sleep 4000 --|--- run 5000 ---| |-----------------------------period 20000 -------------------| |----------| measured slack 6000
If we factor sleep duration in, we will compare 6000 (best case) vs 6000. If we don't, we will compare 10000 (best case) vs 6000, which ends up telling us that we ran slower for some reason, while instead we simply slept by constrution?
IIUC the slack metric represents the real "idle/sleep" time compared to what has been configured (timer duration minus duration of events in the time slot). Can't we also add a slack value for sleep events and add it to the slack value of timers in a phase to stay consistent with the other metrics?
so in the example above, the slack time will be 4000+6000 like the best case
On 22/11/16 14:17, Vincent Guittot wrote:
On 22 November 2016 at 12:00, Juri Lelli juri.lelli@arm.com wrote:
On 22/11/16 11:50, Vincent Guittot wrote:
On 22 November 2016 at 11:41, Juri Lelli juri.lelli@arm.com wrote:
On 22/11/16 10:21, Juri Lelli wrote:
On 22/11/16 10:41, Vincent Guittot wrote:
[...]
There also the c_run metrics in another patch, does it mean that you want only one run event per phase ?
Nope. That is broken in my current set. I'll fix that as you suggested (summing up the durations).
Which basically should result in the following right?
--->8--- src/rt-app.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/rt-app.c b/src/rt-app.c index 655c43e06a7d..bec511121d77 100644 --- a/src/rt-app.c +++ b/src/rt-app.c @@ -287,7 +287,7 @@ static int run_event(event_data_t *event, int dry_run, { struct timespec t_start, t_end; log_debug("run %d ", event->duration);
ldata->c_run = event->duration;
ldata->c_run += event->duration; clock_gettime(CLOCK_MONOTONIC, &t_start); *perf += loadwait(event->duration); clock_gettime(CLOCK_MONOTONIC, &t_end);
--->8---
I was actually thinking that we should also sum up durations of all event that specify a duration (sleep, runtime, etc.) and maybe modify the log header as s/c_run/c_duration/ ?
IMO adding runtime makes perfectly sense as commented in the related patch But not sleep which is a idle state just like timer
True.. but I was thinking that the best case slack will be affected also by sleep phases, so then if we compared the measured slack with the best one we should probably account for sleep events ?
e.g.,
|--- run 5000 ---|-- sleep 4000 --|--- run 5000 ---| |-----------------------------period 20000 -------------------| |----------| measured slack 6000
If we factor sleep duration in, we will compare 6000 (best case) vs 6000. If we don't, we will compare 10000 (best case) vs 6000, which ends up telling us that we ran slower for some reason, while instead we simply slept by constrution?
IIUC the slack metric represents the real "idle/sleep" time compared to what have been configured (timer duration minus duration of events in the time slot) Can't we also add a slack value for sleep events and add it slack value of timers in a phase to stay consistent with other metrics ?
so in the example above, the sack time will be 4000+6000 like the best case
Mmm, to me the slack is the time left before the next activation (of a periodic task) after all the events of the current activation have been completed. So, I'd say that we actually want to report 6000 in the example above. A 'sleep' event is still something that we want to accomplish inside the current period, right? So it's like a 'runtime' (the freq invariant version of 'run').
On 22 November 2016 at 14:53, Juri Lelli juri.lelli@arm.com wrote:
On 22/11/16 14:17, Vincent Guittot wrote:
On 22 November 2016 at 12:00, Juri Lelli juri.lelli@arm.com wrote:
On 22/11/16 11:50, Vincent Guittot wrote:
On 22 November 2016 at 11:41, Juri Lelli juri.lelli@arm.com wrote:
On 22/11/16 10:21, Juri Lelli wrote:
On 22/11/16 10:41, Vincent Guittot wrote:
[...]
> There also the c_run metrics in another patch, does it mean that you > want only one run event per phase ? >
Nope. That is broken in my current set. I'll fix that as you suggested (summing up the durations).
Which basically should result in the following right?
--->8--- src/rt-app.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/rt-app.c b/src/rt-app.c index 655c43e06a7d..bec511121d77 100644 --- a/src/rt-app.c +++ b/src/rt-app.c @@ -287,7 +287,7 @@ static int run_event(event_data_t *event, int dry_run, { struct timespec t_start, t_end; log_debug("run %d ", event->duration);
ldata->c_run = event->duration;
ldata->c_run += event->duration; clock_gettime(CLOCK_MONOTONIC, &t_start); *perf += loadwait(event->duration); clock_gettime(CLOCK_MONOTONIC, &t_end);
--->8---
I was actually thinking that we should also sum up durations of all event that specify a duration (sleep, runtime, etc.) and maybe modify the log header as s/c_run/c_duration/ ?
IMO adding runtime makes perfectly sense as commented in the related patch But not sleep which is a idle state just like timer
True.. but I was thinking that the best case slack will be affected also by sleep phases, so then if we compared the measured slack with the best one we should probably account for sleep events ?
e.g.,
|--- run 5000 ---|-- sleep 4000 --|--- run 5000 ---| |-----------------------------period 20000 -------------------| |----------| measured slack 6000
If we factor sleep duration in, we will compare 6000 (best case) vs 6000. If we don't, we will compare 10000 (best case) vs 6000, which ends up telling us that we ran slower for some reason, while instead we simply slept by constrution?
IIUC the slack metric represents the real "idle/sleep" time compared to what have been configured (timer duration minus duration of events in the time slot) Can't we also add a slack value for sleep events and add it slack value of timers in a phase to stay consistent with other metrics ?
so in the example above, the sack time will be 4000+6000 like the best case
Mmm, too me the slack it the time left before the next activation (of a periodic task) after all the events of the current activation have been completed. So, I'd say that we actually want to report 6000 in the example above. 'sleep' event is still something that we want to accomplish inside the current period, right? So it's like a 'runtime' (the freq invariant version of 'run').
ok so I'm not sure I understand what you want to measure with slack. Your description sounds like something that you want to compare to deadline task properties
Your patchset changes the log file to the following: #idx: just an increasing index of the phase perf: number of busy loop that has been executed in the phase run: duration used to execute the run/runtime events period: duration of the phase start: absolute start time end: absolute end time rel_st: relative start time of the phase slack: time between end of the last event and the end of the phase ? c_run: sum of configured run/runtime duration c_period: sum of the timer period ? wu_lat: sum of wakeup latency ?
On 22/11/16 17:53, Vincent Guittot wrote:
On 22 November 2016 at 14:53, Juri Lelli juri.lelli@arm.com wrote:
On 22/11/16 14:17, Vincent Guittot wrote:
On 22 November 2016 at 12:00, Juri Lelli juri.lelli@arm.com wrote:
[...]
True.. but I was thinking that the best case slack will be affected also by sleep phases, so then if we compared the measured slack with the best one we should probably account for sleep events ?
e.g.,
|--- run 5000 ---|-- sleep 4000 --|--- run 5000 ---| |-----------------------------period 20000 -------------------| |----------| measured slack 6000
If we factor sleep duration in, we will compare 6000 (best case) vs 6000. If we don't, we will compare 10000 (best case) vs 6000, which ends up telling us that we ran slower for some reason, while instead we simply slept by constrution?
IIUC the slack metric represents the real "idle/sleep" time compared to what have been configured (timer duration minus duration of events in the time slot) Can't we also add a slack value for sleep events and add it slack value of timers in a phase to stay consistent with other metrics ?
so in the example above, the sack time will be 4000+6000 like the best case
Mmm, too me the slack it the time left before the next activation (of a periodic task) after all the events of the current activation have been completed. So, I'd say that we actually want to report 6000 in the example above. 'sleep' event is still something that we want to accomplish inside the current period, right? So it's like a 'runtime' (the freq invariant version of 'run').
ok so I'm not sure to understand what you want to measure with slack. Your description sounds like something that you want to compare to a deadline task properties
If the deadline corresponds to the period the definition matches yes. I think for periodic tasks it is usually the case, right? If deadline == period, the slack gives us an indication about how close to the deadline an activation completed its work.
Your patchset changes the log file to the following: #idx: just an increasing index of the phase perf: number of busy loop that has been executed in the phase run: duration used to execute the run/runtime events period: duration of the phase start: absolute start time end: absolute end time rel_st: relative start time of the phase slack: time between end of the last event and the end of the phase ?
If the phase is periodic yes (see the example above). I can add such example in the documentation together with the patch introducing the new column.
c_run: sum of configured run/runtime duration c_period: sum of the timer period ?
Yeah, I guess we can go with summing the periods.
wu_lat: sum of wakeup latency ?
Huh, yep. I guess we can add up wu_lat(s) experienced during a phase (multiple timers).
On 23 November 2016 at 09:29, Juri Lelli juri.lelli@arm.com wrote:
On 22/11/16 17:53, Vincent Guittot wrote:
On 22 November 2016 at 14:53, Juri Lelli juri.lelli@arm.com wrote:
On 22/11/16 14:17, Vincent Guittot wrote:
On 22 November 2016 at 12:00, Juri Lelli juri.lelli@arm.com wrote:
[...]
True.. but I was thinking that the best case slack will be affected also by sleep phases, so then if we compared the measured slack with the best one we should probably account for sleep events ?
e.g.,
|--- run 5000 ---|-- sleep 4000 --|--- run 5000 ---| |-----------------------------period 20000 -------------------| |----------| measured slack 6000
If we factor sleep duration in, we will compare 6000 (best case) vs 6000. If we don't, we will compare 10000 (best case) vs 6000, which ends up telling us that we ran slower for some reason, while instead we simply slept by constrution?
IIUC the slack metric represents the real "idle/sleep" time compared to what have been configured (timer duration minus duration of events in the time slot) Can't we also add a slack value for sleep events and add it slack value of timers in a phase to stay consistent with other metrics ?
so in the example above, the sack time will be 4000+6000 like the best case
Mmm, too me the slack it the time left before the next activation (of a periodic task) after all the events of the current activation have been completed. So, I'd say that we actually want to report 6000 in the example above. 'sleep' event is still something that we want to accomplish inside the current period, right? So it's like a 'runtime' (the freq invariant version of 'run').
ok so I'm not sure to understand what you want to measure with slack. Your description sounds like something that you want to compare to a deadline task properties
If the deadline corresponds to the period the definition matches yes. I think for periodic tasks it is usually the case, right? If deadline == period, the slack gives us an indication about how close to the deadline an activation completed its work.
Your patchset changes the log file to the following: #idx: just an increasing index of the phase perf: number of busy loop that has been executed in the phase run: duration used to execute the run/runtime events period: duration of the phase start: absolute start time end: absolute end time rel_st: relative start time of the phase slack: time between end of the last event and the end of the phase ?
If the phase is periodic yes (see the example above). I can add such example in the documentation together with the patch introducing the new column.
yes. it would be good to update the documentation to explain what the new metrics are
c_run: sum of configured run/runtime duration c_period: sum of the timer period ?
Yeah, I guess we can go with summing the periods.
wu_lat: sum of wakeup latency ?
Huh, yep. I guess we can add up wu_lat(s) experienced during a phase (multiple timers).
Hi Vincent,
On 22-Nov 10:41, Vincent Guittot wrote:
On 21 November 2016 at 18:27, Juri Lelli juri.lelli@arm.com wrote:
On 21/11/16 16:13, Vincent Guittot wrote:
On 21 November 2016 at 15:43, Juri Lelli juri.lelli@arm.com wrote:
On 21/11/16 15:08, Vincent Guittot wrote:
Le Monday 21 Nov 2016 à 12:26:03 (+0000), Juri Lelli a écrit :
On 21/11/16 10:33, Vincent Guittot wrote: > On 21 November 2016 at 10:28, Vincent Guittot > vincent.guittot@linaro.org wrote: > > On 17 November 2016 at 17:34, Juri Lelli juri.lelli@arm.com wrote:
[...]
Not sure I get where the problem is. Can you give me an example (and intended usage) of a phase with multiple timers? I don't seem to understand how that is supposed to work. :)
Below is an example of a timer that is used twice in the same phase: "tasks" : { "thread0" : { "instance" : 1, "loop" : -1, "run0" : 10000, "timer0" : { "ref" : "timerA", "period" : 20000 }, "run1" : 20000, "timer1" : { "ref" : "timerA", "period" : 40000 }, } },
Mmm, but these are IMHO two distinct phases and we should describe them as such (maybe add a check to disallow such a json description). It
I don't think that we should disallow this kind of description. IMHO, it is more readable than splitting it into several phases
Then, what about this: "phase0" : { "loop" : 2, "run0" : 10000, "timer0" : { "ref" : "timerA", "period" : 20000 }, "run1" : 20000, "timer1" : { "ref" : "timerA", "period" : 40000 }, }, "phase1" : { "loop" : 1, "run0" : 20000, "timer0" : { "ref" : "timerA", "period" : 400000 }, "run1" : 10000, "timer1" : { "ref" : "timerA", "period" : 20000 }, }
I can probably come up with other descriptions involving different timers as well. Just to say that we should not restrict how to use events in a phase.
seems that we only make confusion descriptions possible (and we complicate handling from a code pow).
It's just about an addition instead of overwriting the value. That's already the behavior for duration, period and perf and that will probably be the case c_run as well
Not convinced yet. :)
IMHO, the following is more clear (while I agree that it is also more verbose :).
Not sure that it's more clear because it really depends on what you want to emulate. While trying to emulate a use case, you often apply a phase to the different main steps of the use case. Then, I can probably come up with more complex use cases that could not be easily unrolled.
In the example you provided before I cannot really see what we are trying to model. I understand that's a dummy example however it's hard to parse to me.
The first point seems like we want to define the time for the next activation of run0; this time is 20ms. Then we immediately run run1 and we reset the time to activate run0 after 40ms, completely overriding the previous configuration to start it every 20ms. Isn't that the expected semantic?
If that's the case, IMO this semantic is quite confusing and it should be at least avoided or perhaps also forbidden. I'm not convinced that, in general, it can make sense at all to: 1. keep reprogramming the same timer within the same phase 2. execute two (or more) run commands back to back
Do we have more realistic use-cases where we found it useful having the flexibility you are proposing?
"tasks" : { "thread0" : { "instance" : 1, "loop" : 1, "delay" : 1000000, "cpus" : [1], "phases" : { "phase00" : { "loop" : 1, "run" : 10000, "timer" : { "ref" : "timerA", "period" : 20000 }, }, "phase01" : { "loop" : 1, "run" : 20000, "timer" : { "ref" : "timerA", "period" : 40000 }, }, "phase02" : { "loop" : 1, "run" : 10000, "timer" : { "ref" : "timerA", "period" : 20000 }, }, "phase03" : { "loop" : 1, "run" : 20000, "timer" : { "ref" : "timerA", "period" : 40000 }, }, "phase10" : { "loop" : 1, "run" : 20000, "timer" : { "ref" : "timerA", "period" : 400000 }, }, "phase11" : { "loop" : 1, "run" : 10000, "timer" : { "ref" : "timerA", "period" : 20000 }, } } } },
The difference is also that with your expression we get one row in the log for each phase (composed of two nested phases), and it seems that's already the behaviour without my modifications. With the latter approach
Yes i agree that it's the current approach and i want to keep the behavior possible as i don't want to put any constraint on how to use events in phases.
But if some constraints makes it more readable the workload description without compromising the possibility to model the real behaviors, where is the problem?
Do you think that the "extended form" does not allow representing some use-cases?
Log timing is saved per phase so all metrics should be also set per phase whatever is put in this phase. This doesn't prevent someone to put only one event per phase in he wants to get metrics per event which seems to be what you want but IMO we should not prevent others to put several identical events in the same phase and still having metrics that reflect the content of the phase.
I agree on flexibility, but provided it does not reduce readability of the described scenarios. I would like it to be easy for a human to read and understand what will happen.
There also the c_run metrics in another patch, does it mean that you want only one run event per phase ?
This is something we need to extend and Juri has a proposal.
we instead get one entry for each different phase, which seems more inline with what one might expect, IMHO.
I think that both should be possible
I'm more for the "extended form" proposed by Juri, which puts some constraints. Unless we convince ourselves that it is kind of limiting for the description of realistic use-cases which cannot be expressed in this form.
On 22 November 2016 at 11:43, Patrick Bellasi patrick.bellasi@arm.com wrote:
Hi Vincent,
On 22-Nov 10:41, Vincent Guittot wrote:
On 21 November 2016 at 18:27, Juri Lelli juri.lelli@arm.com wrote:
On 21/11/16 16:13, Vincent Guittot wrote:
On 21 November 2016 at 15:43, Juri Lelli juri.lelli@arm.com wrote:
On 21/11/16 15:08, Vincent Guittot wrote:
Le Monday 21 Nov 2016 à 12:26:03 (+0000), Juri Lelli a écrit : > On 21/11/16 10:33, Vincent Guittot wrote: > > On 21 November 2016 at 10:28, Vincent Guittot > > vincent.guittot@linaro.org wrote: > > > On 17 November 2016 at 17:34, Juri Lelli juri.lelli@arm.com wrote:
[...]
> Not sure I get where the problem is. Can you give me an example (and > intended usage) of a phase with multiple timers? I don't seem to > understand how that is supposed to work. :)
Below is an example of a timer that is used twice in the same phase: "tasks" : { "thread0" : { "instance" : 1, "loop" : -1, "run0" : 10000, "timer0" : { "ref" : "timerA", "period" : 20000 }, "run1" : 20000, "timer1" : { "ref" : "timerA", "period" : 40000 }, } },
Mmm, but these are IMHO two distinct phases and we should describe them as such (maybe add a check to disallow such a json description). It
I don't think that we should disallow this kind of description. IMHO, it is more readable than splitting it in several phase
Then, what about this: "phase0" : { "loop" : 2, "run0" : 10000, "timer0" : { "ref" : "timerA", "period" : 20000 }, "run1" : 20000, "timer1" : { "ref" : "timerA", "period" : 40000 }, }, "phase1" : { "loop" : 1, "run0" : 20000, "timer0" : { "ref" : "timerA", "period" : 400000 }, "run1" : 10000, "timer1" : { "ref" : "timerA", "period" : 20000 }, }
I can probably come we other description involving different timers as well. Just to say that we should not restrict how to use events in a phase.
seems that we only make confusion descriptions possible (and we complicate handling from a code pow).
It's just about an addition instead of overwriting the value. That's already the behavior for duration, period and perf and that will probably be the case c_run as well
Not convinced yet. :)
IMHO, the following is more clear (while I agree that it is also more verbose :).
Not sure that it's more clear because it really depends of what you what to emulate. While trying to emulate a use case, you often apply phase to the different main steps of the use case. Then, I can probably come with more complex use cases that could not be easily unrolled.
In the example you provided before I cannot really see what we are trying to model. I understand that's a dummy example however it's hard to parse to me.
he first point seems like we want to define the time for the next activation of run0, this time is 20ms. Then we run immediately run1 and we reset the time to activate run0 after 40ms, completely overriding the previous configuration to start it every 20ms. Isn't that the expected semantic?
The example above is: start run 1 40ms after the start of run 0 and start next event 20ms after the start of run1 so you will not be impacted by the real duration of run0 and run1 (because of freq scaling or micro arch). Do this sequence 2 times then start the next sequence/phase
Maybe the confusion comes from how timers behave. Timer has been created as an extension of sleep. The main drawback of sleep is that it starts only when the previous event (like a run) has finished, whereas we sometimes want to start an event at a fixed point compared to the others, whatever the duration of the previous run. So the timer event has been created. But nothing prevents someone from using one or several timers one or several times in a phase.
I would say that timer is for sleep what runtime is for run. Maybe I should have named it fix_sleep instead of timer
If that's the case, IMO this semantic is quite confusing and it should be at least avoided or perhaps also forbidden. I'm not convinced that, in general, it can make sense at all to:
- keep reprogramming the same timer within the same phase
- execute two (or more) run commands back to back
Do we have more realistic use-cases where we found it useful having the flexibility you are proposing?
"tasks" : { "thread0" : { "instance" : 1, "loop" : 1, "delay" : 1000000, "cpus" : [1], "phases" : { "phase00" : { "loop" : 1, "run" : 10000, "timer" : { "ref" : "timerA", "period" : 20000 }, }, "phase01" : { "loop" : 1, "run" : 20000, "timer" : { "ref" : "timerA", "period" : 40000 }, }, "phase02" : { "loop" : 1, "run" : 10000, "timer" : { "ref" : "timerA", "period" : 20000 }, }, "phase03" : { "loop" : 1, "run" : 20000, "timer" : { "ref" : "timerA", "period" : 40000 }, }, "phase10" : { "loop" : 1, "run" : 20000, "timer" : { "ref" : "timerA", "period" : 400000 }, }, "phase11" : { "loop" : 1, "run" : 10000, "timer" : { "ref" : "timerA", "period" : 20000 }, } } } },
The difference is also that with your expression we get one row in the log for each phase (composed of two nested phases), and it seems that's already the behaviour without my modifications. With the latter approach
Yes i agree that it's the current approach and i want to keep the behavior possible as i don't want to put any constraint on how to use events in phases.
But if some constraints makes it more readable the workload description without compromising the possibility to model the real behaviors, where is the problem?
Do you think that the "extended form" does not allow to represent some use-casese?
Log timing is saved per phase so all metrics should be also set per phase whatever is put in this phase. This doesn't prevent someone to put only one event per phase in he wants to get metrics per event which seems to be what you want but IMO we should not prevent others to put several identical events in the same phase and still having metrics that reflect the content of the phase.
I agree on flexibility but provided it does not reduce readability of the described scenarios. I would like to be easy for a human to read and understand what will happen.
There also the c_run metrics in another patch, does it mean that you want only one run event per phase ?
This is something we need to extend and Juri has a proposal.
we instead get one entry for each different phase, which seems more inline with what one might expect, IMHO.
I think that both should be possible
I'm more for the "extended form" proposed by Juri, which put some constraints. Unless we convince ourself that is kind of limiting for the description of realistic use-cases which cannot be expressed in this form.
-- #include <best/regards.h>
Patrick Bellasi
On 22-Nov 12:01, Vincent Guittot wrote:
On 22 November 2016 at 11:43, Patrick Bellasi patrick.bellasi@arm.com wrote:
Hi Vincent,
On 22-Nov 10:41, Vincent Guittot wrote:
On 21 November 2016 at 18:27, Juri Lelli juri.lelli@arm.com wrote:
On 21/11/16 16:13, Vincent Guittot wrote:
On 21 November 2016 at 15:43, Juri Lelli juri.lelli@arm.com wrote:
On 21/11/16 15:08, Vincent Guittot wrote: > Le Monday 21 Nov 2016 à 12:26:03 (+0000), Juri Lelli a écrit : > > On 21/11/16 10:33, Vincent Guittot wrote: > > > On 21 November 2016 at 10:28, Vincent Guittot > > > vincent.guittot@linaro.org wrote: > > > > On 17 November 2016 at 17:34, Juri Lelli juri.lelli@arm.com wrote:
[...]
> > Not sure I get where the problem is. Can you give me an example (and > > intended usage) of a phase with multiple timers? I don't seem to > > understand how that is supposed to work. :) > > Below is an example of a timer that is used twice in the same phase: > "tasks" : { > "thread0" : { > "instance" : 1, > "loop" : -1, > "run0" : 10000, > "timer0" : { "ref" : "timerA", "period" : 20000 }, > "run1" : 20000, > "timer1" : { "ref" : "timerA", "period" : 40000 }, > } > }, >
Mmm, but these are IMHO two distinct phases and we should describe them as such (maybe add a check to disallow such a json description). It
I don't think that we should disallow this kind of description. IMHO, it is more readable than splitting it into several phases.
Then, what about this: "phase0" : { "loop" : 2, "run0" : 10000, "timer0" : { "ref" : "timerA", "period" : 20000 }, "run1" : 20000, "timer1" : { "ref" : "timerA", "period" : 40000 }, }, "phase1" : { "loop" : 1, "run0" : 20000, "timer0" : { "ref" : "timerA", "period" : 400000 }, "run1" : 10000, "timer1" : { "ref" : "timerA", "period" : 20000 }, }
I can probably come up with other descriptions involving different timers as well. Just to say that we should not restrict how to use events in a phase.
seems that we only make confusing descriptions possible (and we complicate handling from a code PoV).
It's just about an addition instead of overwriting the value. That's already the behavior for duration, period and perf and that will probably be the case c_run as well
Not convinced yet. :)
IMHO, the following is more clear (while I agree that it is also more verbose :).
Not sure that it's more clear because it really depends on what you want to emulate. While trying to emulate a use case, you often apply a phase to each of the different main steps of the use case. Then, I can probably come up with more complex use cases that could not be easily unrolled.
In the example you provided before I cannot really see what we are trying to model. I understand that's a dummy example however it's hard to parse to me.
The first point seems like we want to define the time for the next activation of run0; this time is 20ms. Then we run immediately run1 and we reset the time to activate run0 after 40ms, completely overriding the previous configuration to start it every 20ms. Isn't that the expected semantic?
The example above is: start run 1 40ms after the start of run 0
I would say run1 starts 20ms after the start of run0, isn't it?
and start next event 20ms after the start of run1
mmm... not following here.
so you will not be impacted by the real duration of run0 and run1 (because of freq scaling or micro arch). Do this sequence 2 times then start the next sequence/phase
To me this is the expected behavior:
+-----+ +----------+ +-----+ +----------+ |r0 | |r1 | |r0 | |r1 | | | | | | | | | +-----------------------------------------------------------> 0 10 20 30 40 50 60 70 80 100 120 ^ ^ | | + + Timer0 Timer1
Maybe the confusion comes from how timers behave. Timer has been created as an extension of sleep. The main drawback of sleep is that it starts only when the previous event (like a run) has finished, whereas we sometimes want to start an event at a fixed point in time relative to the others, whatever the duration of the previous run. So the timer event has been created.
Which make sense.
But nothing prevents someone to use one or several timers one or several times on a phase.
I would say that timer is for sleep what runtime is for run. may be i should have name it fix_sleep instead of timer
Ok, let's say we want to keep this flexibility; then we have to agree on how to report the statistics. We usually reported stats per-phase... but perhaps if we allow multiple runs within a single phase then we should evaluate the possibility to report stats for each single run within a phase. For example, in the previous scenario it can be interesting to know if we are exceeding the 60ms timeout because of r0 or r1 being delayed.
If that's the case, IMO this semantic is quite confusing and it should be at least avoided or perhaps also forbidden. I'm not convinced that, in general, it can make sense at all to:
- keep reprogramming the same timer within the same phase
- execute two (or more) run commands back to back
Do we have more realistic use-cases where we found it useful having the flexibility you are proposing?
"tasks" : { "thread0" : { "instance" : 1, "loop" : 1, "delay" : 1000000, "cpus" : [1], "phases" : { "phase00" : { "loop" : 1, "run" : 10000, "timer" : { "ref" : "timerA", "period" : 20000 }, }, "phase01" : { "loop" : 1, "run" : 20000, "timer" : { "ref" : "timerA", "period" : 40000 }, }, "phase02" : { "loop" : 1, "run" : 10000, "timer" : { "ref" : "timerA", "period" : 20000 }, }, "phase03" : { "loop" : 1, "run" : 20000, "timer" : { "ref" : "timerA", "period" : 40000 }, }, "phase10" : { "loop" : 1, "run" : 20000, "timer" : { "ref" : "timerA", "period" : 400000 }, }, "phase11" : { "loop" : 1, "run" : 10000, "timer" : { "ref" : "timerA", "period" : 20000 }, } } } },
The difference is also that with your expression we get one row in the log for each phase (composed of two nested phases), and it seems that's already the behaviour without my modifications. With the latter approach
Yes i agree that it's the current approach and i want to keep the behavior possible as i don't want to put any constraint on how to use events in phases.
But if some constraints makes it more readable the workload description without compromising the possibility to model the real behaviors, where is the problem?
Do you think that the "extended form" does not allow representing some use-cases?
Log timing is saved per phase so all metrics should also be set per phase, whatever is put in this phase. This doesn't prevent someone from putting only one event per phase if he wants to get metrics per event, which seems to be what you want; but IMO we should not prevent others from putting several identical events in the same phase and still having metrics that reflect the content of the phase.
I agree on flexibility but provided it does not reduce readability of the described scenarios. I would like to be easy for a human to read and understand what will happen.
There also the c_run metrics in another patch, does it mean that you want only one run event per phase ?
This is something we need to extend and Juri has a proposal.
we instead get one entry for each different phase, which seems more inline with what one might expect, IMHO.
I think that both should be possible
I'm more for the "extended form" proposed by Juri, which put some constraints. Unless we convince ourself that is kind of limiting for the description of realistic use-cases which cannot be expressed in this form.
-- #include <best/regards.h>
Patrick Bellasi
On 22/11/16 11:54, Patrick Bellasi wrote:
On 22-Nov 12:01, Vincent Guittot wrote:
[...]
The example above is: start run 1 40ms after the start of run 0
I would say run1 starts 20ms after the start of run0, isn't it?
and start next event 20ms after the start of run1
mmm... not following here.
so you will not be impacted by the real duration of run0 and run1 (because of freq scaling or micro arch). Do this sequence 2 times then start the next sequence/phase
To me this is the expected behavior:
+-----+ +----------+ +-----+ +----------+ |r0 | |r1 | |r0 | |r1 | | | | | | | | | +-----------------------------------------------------------> 0 10 20 30 40 50 60 70 80 100 120 ^ ^ | | + + Timer0 Timer1
So, if we really need this level of flexibility we can add up periods as well (as Vincent was already saying) so that we end up we a row in the log for each phase reporting a "super-period" of 60ms.
If we agree that this makes sense, what we do for the slack? One possibility could be to start adding up there as well, so that the subslack of the first timer and the subslack of the second one are reflected in the slack reported in the per-phase row.
One thing that might be counter-intuitive looking at traces, though, is that if, for example, the first run executes for more than the first period (say 25ms, e.g. frequency was low) it will report a -5ms slack which adds up with the second sub-slack of r1 (which instead recovered and effectively executed for 20ms). So, now we report a 10ms overall slack, while in the trace we see a 15ms gap before the next event.
+------------+----------+ +-----+ +----------+ |r0 |r1 | |r0 | |r1 | | | | | | | | +-----------------------------------------------------------> 0 10 20 30 40 50 60 70 80 100 120 ^ ^ | | + + Timer0 Timer1
Oh, BTW, I also think that the following change is actually needed, otherwise the first run overshooting will influence the super-period.
Agree?
--->8---
From 52492c9ba118b686b76fcda05389a3a9c7c73120 Mon Sep 17 00:00:00 2001
From: Juri Lelli juri.lelli@arm.com Date: Tue, 22 Nov 2016 12:29:27 +0000 Subject: [PATCH] rt-app: don't let overshooting run phases affect timers
A run phase that overshoots might run for more than the next timer event. If that happens we still want the timers to not be affected by this happening.
Don't use current clock for setting the next timer event, always refer to t_first.
Signed-off-by: Juri Lelli juri.lelli@arm.com --- src/rt-app.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/src/rt-app.c b/src/rt-app.c index bec511121d77..36013758f718 100644 --- a/src/rt-app.c +++ b/src/rt-app.c @@ -338,7 +338,6 @@ static int run_event(event_data_t *event, int dry_run, t_wu = timespec_sub(&t_now, &rdata->res.timer.t_next); ldata->wu_latency = timespec_to_usec(&t_wu); } else { - clock_gettime(CLOCK_MONOTONIC, &rdata->res.timer.t_next); ldata->wu_latency = 0UL; } }
On 22 November 2016 at 13:32, Juri Lelli juri.lelli@arm.com wrote:
On 22/11/16 11:54, Patrick Bellasi wrote:
On 22-Nov 12:01, Vincent Guittot wrote:
[...]
The example above is: start run 1 40ms after the start of run 0
I would say run1 starts 20ms after the start of run0, isn't it?
and start next event 20ms after the start of run1
mmm... not following here.
so you will not be impacted by the real duration of run0 and run1 (because of freq scaling or micro arch). Do this sequence 2 times then start the next sequence/phase
To me this is the expected behavior:
+-----+ +----------+ +-----+ +----------+ |r0 | |r1 | |r0 | |r1 | | | | | | | | | +-----------------------------------------------------------> 0 10 20 30 40 50 60 70 80 100 120 ^ ^ | | + + Timer0 Timer1
So, if we really need this level of flexibility we can add up periods as well (as Vincent was already saying) so that we end up we a row in the log for each phase reporting a "super-period" of 60ms.
If we agree that this makes sense, what we do for the slack? One possibility could be to start adding up there as well, so that the subslack of the first timer and the subslack of the second one are reflected in the slack reported in the per-phase row.
One thing that might be counter-intuitive looking at traces, though, is that if, for example, the first run executes for more than the first period (say 25ms, e.g. frequency was low) it will report a -5ms slack which adds up with the second sub-slack of r1 (which instead recovered and effectively executed for 20ms). So, now we report a 10ms overall slack, while in the trace we see a 15ms gap before the next event.
+------------+----------+ +-----+ +----------+ |r0 |r1 | |r0 | |r1 | | | | | | | | +-----------------------------------------------------------> 0 10 20 30 40 50 60 70 80 100 120 ^ ^ | | + + Timer0 Timer1
Oh, BTW, I also think that the following change is actually needed, otherwise the first run overshooting will influence the super-period.
Agree?
--->8--- From 52492c9ba118b686b76fcda05389a3a9c7c73120 Mon Sep 17 00:00:00 2001 From: Juri Lelli juri.lelli@arm.com Date: Tue, 22 Nov 2016 12:29:27 +0000 Subject: [PATCH] rt-app: don't let overshooting run phases affect timers
A run phase that overshoots might run for more that the next timer event. If that happens we still want for timers to not be affect by this happening.
Don't use current clock for setting the next timer event, always refer to t_first.
Signed-off-by: Juri Lelli juri.lelli@arm.com
src/rt-app.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/src/rt-app.c b/src/rt-app.c index bec511121d77..36013758f718 100644 --- a/src/rt-app.c +++ b/src/rt-app.c @@ -338,7 +338,6 @@ static int run_event(event_data_t *event, int dry_run, t_wu = timespec_sub(&t_now, &rdata->res.timer.t_next); ldata->wu_latency = timespec_to_usec(&t_wu); } else {
clock_gettime(CLOCK_MONOTONIC, &rdata->res.timer.t_next);
The 1st implementation didn't update t_next when the timer had already expired. But then the following timers were also noted as expired, because the previous period had stolen time whereas the run time was correct. We faced the issue while testing a frequency scaling policy: the governor was slow to react to the 1st period, and the overshoot during this 1st period was so huge compared to the margin of the use case that the next periods were already expired when the run had finished, even though the governor had reacted and increased the frequency to max. In fact, the delay generated by the 1st period was so huge that it was not compensated afterwards; the result was meaningless because 1 period was really wrong but the following ones were also seen as wrong.
As an example: run 9ms in a period of 10ms. Your platform starts at the lowest OPP, which is 10 times slower than the max OPP. If the 1st run event takes something like 50ms (the time for the governor to react), we have an overshoot of 40ms. This means that even if we are now at max OPP and the run event really takes 9ms, the next 40 periods will be seen as wrong because we will set the timer in the past.
ldata->wu_latency = 0UL; } }
On 22/11/16 14:51, Vincent Guittot wrote:
On 22 November 2016 at 13:32, Juri Lelli juri.lelli@arm.com wrote:
On 22/11/16 11:54, Patrick Bellasi wrote:
On 22-Nov 12:01, Vincent Guittot wrote:
[...]
The example above is: start run 1 40ms after the start of run 0
I would say run1 starts 20ms after the start of run0, isn't it?
and start next event 20ms after the start of run1
mmm... not following here.
so you will not be impacted by the real duration of run0 and run1 (because of freq scaling or micro arch). Do this sequence 2 times then start the next sequence/phase
To me this is the expected behavior:
+-----+ +----------+ +-----+ +----------+ |r0 | |r1 | |r0 | |r1 | | | | | | | | | +-----------------------------------------------------------> 0 10 20 30 40 50 60 70 80 100 120 ^ ^ | | + + Timer0 Timer1
So, if we really need this level of flexibility we can add up periods as well (as Vincent was already saying) so that we end up we a row in the log for each phase reporting a "super-period" of 60ms.
If we agree that this makes sense, what we do for the slack? One possibility could be to start adding up there as well, so that the subslack of the first timer and the subslack of the second one are reflected in the slack reported in the per-phase row.
On thing that might be counter-intuitive looking at traces though is that if, for example, the first run executes for more that the first period (say 25 ms, e.g. frequency was low) it will report a -5ms slack which adds up with the second sub slack of r1 (which instead recovered and effectively executed for 20ms. So, now we report a 10ms overall slack, while in the trace we see a 15ms gap before the next event.
+------------+----------+ +-----+ +----------+ |r0 |r1 | |r0 | |r1 | | | | | | | | +-----------------------------------------------------------> 0 10 20 30 40 50 60 70 80 100 120 ^ ^ | | + + Timer0 Timer1
Oh, BTW, I also think that the following change is actually needed, otherwise the first run overshooting will influence the super-period.
Agree?
--->8--- From 52492c9ba118b686b76fcda05389a3a9c7c73120 Mon Sep 17 00:00:00 2001 From: Juri Lelli juri.lelli@arm.com Date: Tue, 22 Nov 2016 12:29:27 +0000 Subject: [PATCH] rt-app: don't let overshooting run phases affect timers
A run phase that overshoots might run for more that the next timer event. If that happens we still want for timers to not be affect by this happening.
Don't use current clock for setting the next timer event, always refer to t_first.
Signed-off-by: Juri Lelli juri.lelli@arm.com
src/rt-app.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/src/rt-app.c b/src/rt-app.c index bec511121d77..36013758f718 100644 --- a/src/rt-app.c +++ b/src/rt-app.c @@ -338,7 +338,6 @@ static int run_event(event_data_t *event, int dry_run, t_wu = timespec_sub(&t_now, &rdata->res.timer.t_next); ldata->wu_latency = timespec_to_usec(&t_wu); } else {
clock_gettime(CLOCK_MONOTONIC, &rdata->res.timer.t_next);
The 1st implementation didn't update t_next when timer has already expired. But then following timers were also noted as expired because the prev period has stolen time whereas the run time was correct. We faced the issue while testing frequency scaling policy: the governor was slow to react to the 1st period, the overshoot during this 1st period was so huge compared the margin of the use case than the next periods were already expired when the run has finished whereas the governor has reacted and increased freq to max. In fact, the delay generated by the 1st period was so huge that it was not compensated after the result was meaningless because 1 period was really wrong but the following one were also seen as wrong
As an example: run 9ms in a period of 10ms. Your platform starts at lowest OPP which is 10 time slower than max OPP. If the 1st run event takes something like 50ms (the time for the governor to react), we have an overshoot of 40ms. This means that even if we are now at max OPP and the run event really takes 9ms, the next 40 periods will be seen as wrong because we will set timer in the past.
OK, right. I think we already discussed this point in the past (and didn't agree on a common solution :). Since now we have actually imposed and used the "cumulative" approach (where successive activations get penalized, potentially infinitely), it models a stricter behaviour by always keeping an absolute (t_first) reference no matter what happens to run phases.
Since I think that both approaches might be useful to implement, why don't we add a "mode" : [absolute|relative] to timers, and then we check the mode when executing the event to decide if we want to use the current clock for t_next or keep t_first as reference?
On 22/11/16 14:02, Juri Lelli wrote:
[...]
Since I think that both approaches might be useful to implement, why don't we add a "mode" : [absolute|relative] to timers and than we check the mode when executing the event to decide if we want to use current clock for t_next or keep t_first as reference?
Something like the following (on top of previous patch) maybe? --->8--- src/rt-app.c | 2 ++ src/rt-app_parse_config.c | 6 ++++++ src/rt-app_types.h | 1 + 3 files changed, 9 insertions(+)
diff --git a/src/rt-app.c b/src/rt-app.c index 36013758f718..e28d4f9e8ddc 100644 --- a/src/rt-app.c +++ b/src/rt-app.c @@ -338,6 +338,8 @@ static int run_event(event_data_t *event, int dry_run, t_wu = timespec_sub(&t_now, &rdata->res.timer.t_next); ldata->wu_latency = timespec_to_usec(&t_wu); } else { + if (rdata->res.timer.relative) + clock_gettime(CLOCK_MONOTONIC, &rdata->res.timer.t_next); ldata->wu_latency = 0UL; } } diff --git a/src/rt-app_parse_config.c b/src/rt-app_parse_config.c index b0f302e2ba25..fbc4612b2257 100644 --- a/src/rt-app_parse_config.c +++ b/src/rt-app_parse_config.c @@ -181,6 +181,7 @@ static int init_timer_resource(rtapp_resource_t *data, const rtapp_options_t *op { log_info(PIN3 "Init: %s timer", data->name); data->res.timer.init = 0; + data->res.timer.relative = 1; }
static int init_cond_resource(rtapp_resource_t *data, const rtapp_options_t *opts) @@ -489,6 +490,11 @@ parse_thread_event_data(char *name, struct json_object *obj, rdata = &(opts->resources[data->res]); ddata = &(opts->resources[data->dep]);
+ tmp = get_string_value_from(obj, "mode", TRUE, "relative"); + if (!strncmp(tmp, "absolute", strlen("absolute"))) + rdata->res.timer.relative = 0; + free(tmp); + log_info(PIN2 "type %d target %s [%d] period %d", data->type, rdata->name, rdata->index, data->duration); return; } diff --git a/src/rt-app_types.h b/src/rt-app_types.h index 60bd1e85d671..c9227b8ed6a1 100644 --- a/src/rt-app_types.h +++ b/src/rt-app_types.h @@ -87,6 +87,7 @@ struct _rtapp_signal { struct _rtapp_timer { struct timespec t_next; int init; + int relative; };
struct _rtapp_iomem_buf {
On 22 November 2016 at 15:34, Juri Lelli juri.lelli@arm.com wrote:
On 22/11/16 14:02, Juri Lelli wrote:
[...]
Since I think that both approaches might be useful to implement, why don't we add a "mode" : [absolute|relative] to timers and than we check the mode when executing the event to decide if we want to use current clock for t_next or keep t_first as reference?
Yes being able to select between relative and absolute can make sense
Something like the following (on top of previous patch) maybe? --->8--- src/rt-app.c | 2 ++ src/rt-app_parse_config.c | 6 ++++++ src/rt-app_types.h | 1 + 3 files changed, 9 insertions(+)
diff --git a/src/rt-app.c b/src/rt-app.c index 36013758f718..e28d4f9e8ddc 100644 --- a/src/rt-app.c +++ b/src/rt-app.c @@ -338,6 +338,8 @@ static int run_event(event_data_t *event, int dry_run, t_wu = timespec_sub(&t_now, &rdata->res.timer.t_next); ldata->wu_latency = timespec_to_usec(&t_wu); } else {
if (rdata->res.timer.relative)
clock_gettime(CLOCK_MONOTONIC, &rdata->res.timer.t_next); ldata->wu_latency = 0UL; } }
diff --git a/src/rt-app_parse_config.c b/src/rt-app_parse_config.c index b0f302e2ba25..fbc4612b2257 100644 --- a/src/rt-app_parse_config.c +++ b/src/rt-app_parse_config.c @@ -181,6 +181,7 @@ static int init_timer_resource(rtapp_resource_t *data, const rtapp_options_t *op { log_info(PIN3 "Init: %s timer", data->name); data->res.timer.init = 0;
data->res.timer.relative = 1;
}
static int init_cond_resource(rtapp_resource_t *data, const rtapp_options_t *opts) @@ -489,6 +490,11 @@ parse_thread_event_data(char *name, struct json_object *obj, rdata = &(opts->resources[data->res]); ddata = &(opts->resources[data->dep]);
tmp = get_string_value_from(obj, "mode", TRUE, "relative");
Ok so the timer event will look like absolute mode: "timer1" : { "ref" : "timerA", "period" : 20000, "mode" : "absolute" }
relative mode: "timer1" : { "ref" : "timerA", "period" : 20000, "mode" : "relative" } or "timer1" : { "ref" : "timerA", "period" : 20000 } for keeping current behavior
Looks ok to me
if (!strncmp(tmp, "absolute", strlen("absolute")))
rdata->res.timer.relative = 0;
free(tmp);
log_info(PIN2 "type %d target %s [%d] period %d", data->type, rdata->name, rdata->index, data->duration); return; }
diff --git a/src/rt-app_types.h b/src/rt-app_types.h index 60bd1e85d671..c9227b8ed6a1 100644 --- a/src/rt-app_types.h +++ b/src/rt-app_types.h @@ -87,6 +87,7 @@ struct _rtapp_signal { struct _rtapp_timer { struct timespec t_next; int init;
int relative;
};
struct _rtapp_iomem_buf {
On 22/11/16 16:01, Vincent Guittot wrote:
On 22 November 2016 at 15:34, Juri Lelli juri.lelli@arm.com wrote:
On 22/11/16 14:02, Juri Lelli wrote:
[...]
Since I think that both approaches might be useful to implement, why don't we add a "mode" : [absolute|relative] to timers and than we check the mode when executing the event to decide if we want to use current clock for t_next or keep t_first as reference?
Yes being able to select between relative and absolute can make sense
Something like the following (on top of previous patch) maybe? --->8--- src/rt-app.c | 2 ++ src/rt-app_parse_config.c | 6 ++++++ src/rt-app_types.h | 1 + 3 files changed, 9 insertions(+)
diff --git a/src/rt-app.c b/src/rt-app.c index 36013758f718..e28d4f9e8ddc 100644 --- a/src/rt-app.c +++ b/src/rt-app.c @@ -338,6 +338,8 @@ static int run_event(event_data_t *event, int dry_run, t_wu = timespec_sub(&t_now, &rdata->res.timer.t_next); ldata->wu_latency = timespec_to_usec(&t_wu); } else {
if (rdata->res.timer.relative)
clock_gettime(CLOCK_MONOTONIC, &rdata->res.timer.t_next); ldata->wu_latency = 0UL; } }
diff --git a/src/rt-app_parse_config.c b/src/rt-app_parse_config.c index b0f302e2ba25..fbc4612b2257 100644 --- a/src/rt-app_parse_config.c +++ b/src/rt-app_parse_config.c @@ -181,6 +181,7 @@ static int init_timer_resource(rtapp_resource_t *data, const rtapp_options_t *op { log_info(PIN3 "Init: %s timer", data->name); data->res.timer.init = 0;
data->res.timer.relative = 1;
}
static int init_cond_resource(rtapp_resource_t *data, const rtapp_options_t *opts) @@ -489,6 +490,11 @@ parse_thread_event_data(char *name, struct json_object *obj, rdata = &(opts->resources[data->res]); ddata = &(opts->resources[data->dep]);
tmp = get_string_value_from(obj, "mode", TRUE, "relative");
Ok so the timer event will look like absolute mode: "timer1" : { "ref" : "timerA", "period" : 20000, "mode" : "absolute" }
relative mode: "timer1" : { "ref" : "timerA", "period" : 20000, "mode" : "relative" } or "timer1" : { "ref" : "timerA", "period" : 20000 } for keeping current behavior
Looks ok to me
Yep. Will add the patch in v3.
On 22 November 2016 at 12:54, Patrick Bellasi patrick.bellasi@arm.com wrote:
On 22-Nov 12:01, Vincent Guittot wrote:
On 22 November 2016 at 11:43, Patrick Bellasi patrick.bellasi@arm.com wrote:
Hi Vincent,
On 22-Nov 10:41, Vincent Guittot wrote:
On 21 November 2016 at 18:27, Juri Lelli juri.lelli@arm.com wrote:
On 21/11/16 16:13, Vincent Guittot wrote:
On 21 November 2016 at 15:43, Juri Lelli juri.lelli@arm.com wrote: > On 21/11/16 15:08, Vincent Guittot wrote: >> Le Monday 21 Nov 2016 à 12:26:03 (+0000), Juri Lelli a écrit : >> > On 21/11/16 10:33, Vincent Guittot wrote: >> > > On 21 November 2016 at 10:28, Vincent Guittot >> > > vincent.guittot@linaro.org wrote: >> > > > On 17 November 2016 at 17:34, Juri Lelli juri.lelli@arm.com wrote:
[...]
>> > Not sure I get where the problem is. Can you give me an example (and >> > intended usage) of a phase with multiple timers? I don't seem to >> > understand how that is supposed to work. :) >> >> Below is an example of a timer that is used twice in the same phase: >> "tasks" : { >> "thread0" : { >> "instance" : 1, >> "loop" : -1, >> "run0" : 10000, >> "timer0" : { "ref" : "timerA", "period" : 20000 }, >> "run1" : 20000, >> "timer1" : { "ref" : "timerA", "period" : 40000 }, >> } >> }, >> > > Mmm, but these are IMHO two distinct phases and we should describe them > as such (maybe add a check to disallow such a json description). It
I don't think that we should disallow this kind of description. IMHO, it is more readable than splitting it in several phase
Then, what about this: "phase0" : { "loop" : 2, "run0" : 10000, "timer0" : { "ref" : "timerA", "period" : 20000 }, "run1" : 20000, "timer1" : { "ref" : "timerA", "period" : 40000 }, }, "phase1" : { "loop" : 1, "run0" : 20000, "timer0" : { "ref" : "timerA", "period" : 400000 }, "run1" : 10000, "timer1" : { "ref" : "timerA", "period" : 20000 }, }
I can probably come we other description involving different timers as well. Just to say that we should not restrict how to use events in a phase.
> seems that we only make confusion descriptions possible (and we > complicate handling from a code pow).
It's just about an addition instead of overwriting the value. That's already the behavior for duration, period and perf and that will probably be the case c_run as well
Not convinced yet. :)
IMHO, the following is more clear (while I agree that it is also more verbose :).
Not sure that it's more clear because it really depends of what you what to emulate. While trying to emulate a use case, you often apply phase to the different main steps of the use case. Then, I can probably come with more complex use cases that could not be easily unrolled.
In the example you provided before I cannot really see what we are trying to model. I understand that's a dummy example however it's hard to parse to me.
he first point seems like we want to define the time for the next activation of run0, this time is 20ms. Then we run immediately run1 and we reset the time to activate run0 after 40ms, completely overriding the previous configuration to start it every 20ms. Isn't that the expected semantic?
The example above is: start run 1 40ms after the start of run 0
I would say run1 starts 20ms after the start of run0, isn't it?
Yes. I mixed 20ms and 40ms
and start next event 20ms after the start of run1
mmm... not following here.
so you will not be impacted by the real duration of run0 and run1 (because of freq scaling or micro arch). Do this sequence 2 times then start the next sequence/phase
To me this is the expected behavior:
+-----+ +----------+ +-----+ +----------+ |r0 | |r1 | |r0 | |r1 | | | | | | | | | +-----------------------------------------------------------> 0 10 20 30 40 50 60 70 80 100 120 ^ ^ | | + + Timer0 Timer1
May be the confusion comes from how timer behave. Timer has been created as an extension of sleep. The main drawback of sleep is that it starts only when the previous event (like a run ) has finished whereas we sometime wants to start an even at a fixed point compared to the others whatever the duration of the previous run. So timer event has been created.
Which make sense.
But nothing prevents someone to use one or several timers one or several times on a phase.
I would say that timer is for sleep what runtime is for run. may be i should have name it fix_sleep instead of timer
Ok, let say we want to keep this flexibility than we have to agree on how to report the statistics. We usually reported stats per-phase... but
yes. right now statistic is per phase
perhaps if we allow multiple runs within a single phase than we should evaluate the possibility to report stats per each single run within a phase. For example, in the previous scenario it can be interesting to know if we are exceedin the 60ms timeout because of r0 or r1 being delayed.
I'm afraid of the size and the overhead of having per event statistics vs using other tools like ftrace
If that's the case, IMO this semantic is quite confusing and it should be at least avoided or perhaps also forbidden. I'm not convinced that, in general, it can make sense at all to:
- keep reprogramming the same timer within the same phase
- execute two (or more) run commands back to back
Do we have more realistic use-cases where we found it useful having the flexibility you are proposing?
"tasks" : { "thread0" : { "instance" : 1, "loop" : 1, "delay" : 1000000, "cpus" : [1], "phases" : { "phase00" : { "loop" : 1, "run" : 10000, "timer" : { "ref" : "timerA", "period" : 20000 }, }, "phase01" : { "loop" : 1, "run" : 20000, "timer" : { "ref" : "timerA", "period" : 40000 }, }, "phase02" : { "loop" : 1, "run" : 10000, "timer" : { "ref" : "timerA", "period" : 20000 }, }, "phase03" : { "loop" : 1, "run" : 20000, "timer" : { "ref" : "timerA", "period" : 40000 }, }, "phase10" : { "loop" : 1, "run" : 20000, "timer" : { "ref" : "timerA", "period" : 400000 }, }, "phase11" : { "loop" : 1, "run" : 10000, "timer" : { "ref" : "timerA", "period" : 20000 }, } } } },
The difference is also that with your expression we get one row in the log for each phase (composed of two nested phases), and it seems that's already the behaviour without my modifications. With the latter approach
Yes i agree that it's the current approach and i want to keep the behavior possible as i don't want to put any constraint on how to use events in phases.
But if some constraints makes it more readable the workload description without compromising the possibility to model the real behaviors, where is the problem?
Do you think that the "extended form" does not allow to represent some use-casese?
Log timing is saved per phase so all metrics should be also set per phase whatever is put in this phase. This doesn't prevent someone to put only one event per phase in he wants to get metrics per event which seems to be what you want but IMO we should not prevent others to put several identical events in the same phase and still having metrics that reflect the content of the phase.
I agree on flexibility but provided it does not reduce readability of the described scenarios. I would like to be easy for a human to read and understand what will happen.
There also the c_run metrics in another patch, does it mean that you want only one run event per phase ?
This is something we need to extend and Juri has a proposal.
we instead get one entry for each different phase, which seems more inline with what one might expect, IMHO.
I think that both should be possible
I'm more for the "extended form" proposed by Juri, which put some constraints. Unless we convince ourself that is kind of limiting for the description of realistic use-cases which cannot be expressed in this form.
-- #include <best/regards.h>
Patrick Bellasi
-- #include <best/regards.h>
Patrick Bellasi
Add measured slack to log metrics; it is measured as the interval of time between the end of the current activation and the beginning of the next one.
Signed-off-by: Juri Lelli juri.lelli@arm.com --- src/rt-app.c | 23 +++++++++++++++-------- src/rt-app_types.h | 1 + src/rt-app_utils.c | 11 +++++++++-- src/rt-app_utils.h | 3 +++ 4 files changed, 28 insertions(+), 10 deletions(-)
diff --git a/src/rt-app.c b/src/rt-app.c index d23c3074f057..bce004bc91a3 100644 --- a/src/rt-app.c +++ b/src/rt-app.c @@ -220,7 +220,7 @@ static void memload(unsigned long count, struct _rtapp_iomem_buf *iomem)
static int run_event(event_data_t *event, int dry_run, unsigned long *perf, unsigned long *duration, rtapp_resource_t *resources, - struct timespec *t_first, unsigned long *wu_latency) + struct timespec *t_first, unsigned long *wu_latency, long *slack) { rtapp_resource_t *rdata = &(resources[event->res]); rtapp_resource_t *ddata = &(resources[event->dep]); @@ -299,7 +299,7 @@ static int run_event(event_data_t *event, int dry_run, break; case rtapp_timer: { - struct timespec t_period, t_now, t_wu; + struct timespec t_period, t_now, t_wu, t_slack; log_debug("timer %d ", event->duration);
t_period = usec_to_timespec(event->duration); @@ -311,6 +311,8 @@ static int run_event(event_data_t *event, int dry_run,
rdata->res.timer.t_next = timespec_add(&rdata->res.timer.t_next, &t_period); clock_gettime(CLOCK_MONOTONIC, &t_now); + t_slack = timespec_sub(&rdata->res.timer.t_next, &t_now); + *slack = timespec_to_usec_long(&t_slack); if (timespec_lower(&t_now, &rdata->res.timer.t_next)) { clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &rdata->res.timer.t_next, NULL); clock_gettime(CLOCK_MONOTONIC, &t_now); @@ -358,7 +360,8 @@ static int run_event(event_data_t *event, int dry_run, int run(int ind, phase_data_t *pdata, unsigned long *duration, rtapp_resource_t *resources, struct timespec *t_first, - unsigned long *wu_latency) + unsigned long *wu_latency, + long *slack) { event_data_t *events = pdata->events; int nbevents = pdata->nbevents; @@ -375,7 +378,8 @@ int run(int ind, phase_data_t *pdata, unsigned long *duration, log_ftrace(ft_data.marker_fd, "[%d] executing %d", ind, i); - lock += run_event(&events[i], !continue_running, &perf, duration, resources, t_first, wu_latency); + lock += run_event(&events[i], !continue_running, &perf, + duration, resources, t_first, wu_latency, slack); }
return perf; @@ -424,6 +428,7 @@ void *thread_body(void *arg) struct timespec t_start, t_end, t_first; unsigned long t_start_usec; unsigned long perf, duration, wu_latency; + long slack; timing_point_t *curr_timing; timing_point_t *timings; timing_point_t tmp_timing; @@ -563,9 +568,9 @@ void *thread_body(void *arg)
log_notice("[%d] starting thread ...\n", data->ind);
- fprintf(data->log_handler, "%s %8s %8s %8s %15s %15s %15s %10s\n", + fprintf(data->log_handler, "%s %8s %8s %8s %15s %15s %15s %10s %10s\n", "#idx", "perf", "run", "period", - "start", "end", "rel_st", "wu_lat"); + "start", "end", "rel_st", "wu_lat", "slack");
if (opts.ftrace) log_ftrace(ft_data.marker_fd, "[%d] starts", data->ind); @@ -596,9 +601,10 @@ void *thread_body(void *arg) log_ftrace(ft_data.marker_fd, "[%d] begins loop %d phase %d step %d", data->ind, i, j, loop); log_debug("[%d] begins loop %d phase %d step %d", data->ind, i, j, loop);;
- duration = wu_latency = 0; + duration = wu_latency = slack = 0; clock_gettime(CLOCK_MONOTONIC, &t_start); - perf = run(data->ind, pdata, &duration, *(data->resources), &t_first, &wu_latency); + perf = run(data->ind, pdata, &duration, *(data->resources), + &t_first, &wu_latency, &slack); clock_gettime(CLOCK_MONOTONIC, &t_end);
if (timings) @@ -617,6 +623,7 @@ void *thread_body(void *arg) curr_timing->duration = duration; curr_timing->perf = perf; curr_timing->wu_latency = wu_latency; + curr_timing->slack = slack;
if (opts.logsize && !timings) log_timing(data->log_handler, curr_timing); diff --git a/src/rt-app_types.h b/src/rt-app_types.h index 4c20f43cb627..0570e7b5ad3e 100644 --- a/src/rt-app_types.h +++ b/src/rt-app_types.h @@ -192,6 +192,7 @@ typedef struct _timing_point_t { unsigned long duration; unsigned long period; unsigned long wu_latency; + long slack; __u64 start_time; __u64 end_time; __u64 rel_start_time; diff --git a/src/rt-app_utils.c b/src/rt-app_utils.c index 28896c9a76e6..a1a7c05d202e 100644 --- a/src/rt-app_utils.c +++ b/src/rt-app_utils.c @@ -35,6 +35,12 @@ timespec_to_usec_ull(struct timespec *ts) return llround((ts->tv_sec * 1E9 + ts->tv_nsec) / 1000.0); }
+long +timespec_to_usec_long(struct timespec *ts) +{ + return round((ts->tv_sec * 1E9 + ts->tv_nsec) / 1000.0); +} + #ifdef DLSCHED __u64 timespec_to_nsec(struct timespec *ts) @@ -134,7 +140,7 @@ void log_timing(FILE *handler, timing_point_t *t) { fprintf(handler, - "%4d %8lu %8lu %8lu %15llu %15llu %15llu %10lu", + "%4d %8lu %8lu %8lu %15llu %15llu %15llu %10lu %10ld", t->ind, t->perf, t->duration, @@ -142,7 +148,8 @@ log_timing(FILE *handler, timing_point_t *t) t->start_time, t->end_time, t->rel_start_time, - t->wu_latency + t->wu_latency, + t->slack ); fprintf(handler, "\n"); } diff --git a/src/rt-app_utils.h b/src/rt-app_utils.h index a41d8ad205aa..6a8d921d1321 100644 --- a/src/rt-app_utils.h +++ b/src/rt-app_utils.h @@ -92,6 +92,9 @@ timespec_to_usec(struct timespec *ts); unsigned long long timespec_to_usec_ull(struct timespec *ts);
+long +timespec_to_usec_long(struct timespec *ts); + struct timespec usec_to_timespec(unsigned long usec);
A big run duration can cause load_count to overflow. Fix this issue by running the workload in bursts of 1 sec each.
Signed-off-by: Juri Lelli juri.lelli@arm.com --- src/rt-app.c | 22 ++++++++++++++++++++-- 1 file changed, 20 insertions(+), 2 deletions(-)
diff --git a/src/rt-app.c b/src/rt-app.c index bce004bc91a3..dc84e2cc22ff 100644 --- a/src/rt-app.c +++ b/src/rt-app.c @@ -48,10 +48,11 @@ static ftrace_data_t ft_data = { * Function: to do some useless operation. * TODO: improve the waste loop with more heavy functions */ -void waste_cpu_cycles(int load_loops) +void waste_cpu_cycles(unsigned long long load_loops) { double param, result; - double n, i; + double n; + unsigned long long i;
param = 0.95; n = 4; @@ -175,7 +176,24 @@ int calibrate_cpu_cycles(int clock) static inline unsigned long loadwait(unsigned long exec) { unsigned long load_count; + unsigned long secs; + int i; + + /* + * If exec is still too big, let's run it in bursts + * so that we don't overflow load_count. + */ + secs = exec / 1000000;
+ for (i = 0; i < secs; i++) { + load_count = 1000000000/p_load; + waste_cpu_cycles(load_count); + exec -= 1000000; + } + + /* + * Run for the remainig exec (if any). + */ load_count = (exec * 1000)/p_load; waste_cpu_cycles(load_count);
On 17 November 2016 at 17:34, Juri Lelli juri.lelli@arm.com wrote:
A big run duration can cause load_count to overflow. Fix this issue by running the workload in bursts of 1 sec each.
Signed-off-by: Juri Lelli juri.lelli@arm.com
Acked-by: Vincent Guittot vincent.guittot@linaro.org
src/rt-app.c | 22 ++++++++++++++++++++-- 1 file changed, 20 insertions(+), 2 deletions(-)
diff --git a/src/rt-app.c b/src/rt-app.c index bce004bc91a3..dc84e2cc22ff 100644 --- a/src/rt-app.c +++ b/src/rt-app.c @@ -48,10 +48,11 @@ static ftrace_data_t ft_data = {
- Function: to do some useless operation.
- TODO: improve the waste loop with more heavy functions
*/ -void waste_cpu_cycles(int load_loops) +void waste_cpu_cycles(unsigned long long load_loops) { double param, result;
double n, i;
double n;
unsigned long long i; param = 0.95; n = 4;
@@ -175,7 +176,24 @@ int calibrate_cpu_cycles(int clock) static inline unsigned long loadwait(unsigned long exec) { unsigned long load_count;
unsigned long secs;
int i;
/*
* If exec is still too big, let's run it in bursts
* so that we don't overflow load_count.
*/
secs = exec / 1000000;
for (i = 0; i < secs; i++) {
load_count = 1000000000/p_load;
waste_cpu_cycles(load_count);
exec -= 1000000;
}
/*
* Run for the remainig exec (if any).
*/ load_count = (exec * 1000)/p_load; waste_cpu_cycles(load_count);
-- 2.10.0
It is handy to have the configured period directly in logs. Add it there.
Signed-off-by: Juri Lelli juri.lelli@arm.com --- src/rt-app.c | 22 ++++++++++++++-------- src/rt-app_types.h | 1 + src/rt-app_utils.c | 5 +++-- 3 files changed, 18 insertions(+), 10 deletions(-)
diff --git a/src/rt-app.c b/src/rt-app.c index dc84e2cc22ff..3dc4f6e1adaf 100644 --- a/src/rt-app.c +++ b/src/rt-app.c @@ -238,7 +238,8 @@ static void memload(unsigned long count, struct _rtapp_iomem_buf *iomem)
static int run_event(event_data_t *event, int dry_run, unsigned long *perf, unsigned long *duration, rtapp_resource_t *resources, - struct timespec *t_first, unsigned long *wu_latency, long *slack) + struct timespec *t_first, unsigned long *wu_latency, long *slack, + unsigned long *c_period) { rtapp_resource_t *rdata = &(resources[event->res]); rtapp_resource_t *ddata = &(resources[event->dep]); @@ -321,6 +322,7 @@ static int run_event(event_data_t *event, int dry_run, log_debug("timer %d ", event->duration);
t_period = usec_to_timespec(event->duration); + *c_period = event->duration;
if (rdata->res.timer.init == 0) { rdata->res.timer.init = 1; @@ -379,7 +381,8 @@ int run(int ind, phase_data_t *pdata, unsigned long *duration, rtapp_resource_t *resources, struct timespec *t_first, unsigned long *wu_latency, - long *slack) + long *slack, + unsigned long *c_period) { event_data_t *events = pdata->events; int nbevents = pdata->nbevents; @@ -397,7 +400,8 @@ int run(int ind, phase_data_t *pdata, unsigned long *duration, "[%d] executing %d", ind, i); lock += run_event(&events[i], !continue_running, &perf, - duration, resources, t_first, wu_latency, slack); + duration, resources, t_first, wu_latency, slack, + c_period); }
return perf; @@ -445,7 +449,7 @@ void *thread_body(void *arg) struct sched_param param; struct timespec t_start, t_end, t_first; unsigned long t_start_usec; - unsigned long perf, duration, wu_latency; + unsigned long perf, duration, wu_latency, c_period; long slack; timing_point_t *curr_timing; timing_point_t *timings; @@ -586,9 +590,10 @@ void *thread_body(void *arg)
log_notice("[%d] starting thread ...\n", data->ind);
- fprintf(data->log_handler, "%s %8s %8s %8s %15s %15s %15s %10s %10s\n", + fprintf(data->log_handler, "%s %8s %8s %8s %15s %15s %15s %10s %10s %8s\n", "#idx", "perf", "run", "period", - "start", "end", "rel_st", "wu_lat", "slack"); + "start", "end", "rel_st", "wu_lat", "slack", + "c_prd");
if (opts.ftrace) log_ftrace(ft_data.marker_fd, "[%d] starts", data->ind); @@ -619,10 +624,10 @@ void *thread_body(void *arg) log_ftrace(ft_data.marker_fd, "[%d] begins loop %d phase %d step %d", data->ind, i, j, loop); log_debug("[%d] begins loop %d phase %d step %d", data->ind, i, j, loop);;
- duration = wu_latency = slack = 0; + duration = wu_latency = slack = c_period = 0; clock_gettime(CLOCK_MONOTONIC, &t_start); perf = run(data->ind, pdata, &duration, *(data->resources), - &t_first, &wu_latency, &slack); + &t_first, &wu_latency, &slack, &c_period); clock_gettime(CLOCK_MONOTONIC, &t_end);
if (timings) @@ -642,6 +647,7 @@ void *thread_body(void *arg) curr_timing->perf = perf; curr_timing->wu_latency = wu_latency; curr_timing->slack = slack; + curr_timing->c_period = c_period;
if (opts.logsize && !timings) log_timing(data->log_handler, curr_timing); diff --git a/src/rt-app_types.h b/src/rt-app_types.h index 0570e7b5ad3e..0ec8781562ee 100644 --- a/src/rt-app_types.h +++ b/src/rt-app_types.h @@ -191,6 +191,7 @@ typedef struct _timing_point_t { unsigned long perf; unsigned long duration; unsigned long period; + unsigned long c_period; unsigned long wu_latency; long slack; __u64 start_time; diff --git a/src/rt-app_utils.c b/src/rt-app_utils.c index a1a7c05d202e..e7cd8a83cec7 100644 --- a/src/rt-app_utils.c +++ b/src/rt-app_utils.c @@ -140,7 +140,7 @@ void log_timing(FILE *handler, timing_point_t *t) { fprintf(handler, - "%4d %8lu %8lu %8lu %15llu %15llu %15llu %10lu %10ld", + "%4d %8lu %8lu %8lu %15llu %15llu %15llu %10lu %10ld %8lu", t->ind, t->perf, t->duration, @@ -149,7 +149,8 @@ log_timing(FILE *handler, timing_point_t *t) t->end_time, t->rel_start_time, t->wu_latency, - t->slack + t->slack, + t->c_period ); fprintf(handler, "\n"); }
On 17 November 2016 at 17:34, Juri Lelli juri.lelli@arm.com wrote:
It is handy to have the configured period directly in logs. Add it there.
Signed-off-by: Juri Lelli juri.lelli@arm.com
src/rt-app.c | 22 ++++++++++++++-------- src/rt-app_types.h | 1 + src/rt-app_utils.c | 5 +++-- 3 files changed, 18 insertions(+), 10 deletions(-)
diff --git a/src/rt-app.c b/src/rt-app.c index dc84e2cc22ff..3dc4f6e1adaf 100644 --- a/src/rt-app.c +++ b/src/rt-app.c @@ -238,7 +238,8 @@ static void memload(unsigned long count, struct _rtapp_iomem_buf *iomem)
static int run_event(event_data_t *event, int dry_run, unsigned long *perf, unsigned long *duration, rtapp_resource_t *resources,
struct timespec *t_first, unsigned long *wu_latency, long *slack)
struct timespec *t_first, unsigned long *wu_latency, long *slack,
unsigned long *c_period)
{ rtapp_resource_t *rdata = &(resources[event->res]); rtapp_resource_t *ddata = &(resources[event->dep]); @@ -321,6 +322,7 @@ static int run_event(event_data_t *event, int dry_run, log_debug("timer %d ", event->duration);
t_period = usec_to_timespec(event->duration);
*c_period = event->duration;
Like for c_run, we can have several periods in a phase and the c_period is per phase not per event so we should probably add them and reset the period when starting a new phase
if (rdata->res.timer.init == 0) { rdata->res.timer.init = 1;
@@ -379,7 +381,8 @@ int run(int ind, phase_data_t *pdata, unsigned long *duration, rtapp_resource_t *resources, struct timespec *t_first, unsigned long *wu_latency,
long *slack)
long *slack,
unsigned long *c_period)
{ event_data_t *events = pdata->events; int nbevents = pdata->nbevents; @@ -397,7 +400,8 @@ int run(int ind, phase_data_t *pdata, unsigned long *duration, "[%d] executing %d", ind, i); lock += run_event(&events[i], !continue_running, &perf,
duration, resources, t_first, wu_latency, slack);
duration, resources, t_first, wu_latency, slack,
c_period); } return perf;
@@ -445,7 +449,7 @@ void *thread_body(void *arg) struct sched_param param; struct timespec t_start, t_end, t_first; unsigned long t_start_usec;
unsigned long perf, duration, wu_latency;
unsigned long perf, duration, wu_latency, c_period; long slack; timing_point_t *curr_timing; timing_point_t *timings;
@@ -586,9 +590,10 @@ void *thread_body(void *arg)
log_notice("[%d] starting thread ...\n", data->ind);
fprintf(data->log_handler, "%s %8s %8s %8s %15s %15s %15s %10s %10s\n",
fprintf(data->log_handler, "%s %8s %8s %8s %15s %15s %15s %10s %10s %8s\n", "#idx", "perf", "run", "period",
"start", "end", "rel_st", "wu_lat", "slack");
"start", "end", "rel_st", "wu_lat", "slack",
"c_prd"); if (opts.ftrace) log_ftrace(ft_data.marker_fd, "[%d] starts", data->ind);
@@ -619,10 +624,10 @@ void *thread_body(void *arg) log_ftrace(ft_data.marker_fd, "[%d] begins loop %d phase %d step %d", data->ind, i, j, loop); log_debug("[%d] begins loop %d phase %d step %d", data->ind, i, j, loop);;
duration = wu_latency = slack = 0;
duration = wu_latency = slack = c_period = 0; clock_gettime(CLOCK_MONOTONIC, &t_start); perf = run(data->ind, pdata, &duration, *(data->resources),
&t_first, &wu_latency, &slack);
&t_first, &wu_latency, &slack, &c_period); clock_gettime(CLOCK_MONOTONIC, &t_end); if (timings)
@@ -642,6 +647,7 @@ void *thread_body(void *arg) curr_timing->perf = perf; curr_timing->wu_latency = wu_latency; curr_timing->slack = slack;
curr_timing->c_period = c_period; if (opts.logsize && !timings) log_timing(data->log_handler, curr_timing);
diff --git a/src/rt-app_types.h b/src/rt-app_types.h index 0570e7b5ad3e..0ec8781562ee 100644 --- a/src/rt-app_types.h +++ b/src/rt-app_types.h @@ -191,6 +191,7 @@ typedef struct _timing_point_t { unsigned long perf; unsigned long duration; unsigned long period;
unsigned long c_period; unsigned long wu_latency; long slack; __u64 start_time;
diff --git a/src/rt-app_utils.c b/src/rt-app_utils.c index a1a7c05d202e..e7cd8a83cec7 100644 --- a/src/rt-app_utils.c +++ b/src/rt-app_utils.c @@ -140,7 +140,7 @@ void log_timing(FILE *handler, timing_point_t *t) { fprintf(handler,
"%4d %8lu %8lu %8lu %15llu %15llu %15llu %10lu %10ld",
"%4d %8lu %8lu %8lu %15llu %15llu %15llu %10lu %10ld %8lu", t->ind, t->perf, t->duration,
@@ -149,7 +149,8 @@ log_timing(FILE *handler, timing_point_t *t) t->end_time, t->rel_start_time, t->wu_latency,
t->slack
t->slack,
t->c_period ); fprintf(handler, "\n");
}
2.10.0
This is to avoid cases where the last row can contain bogus values.
Signed-off-by: Juri Lelli juri.lelli@arm.com --- src/rt-app.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/rt-app.c b/src/rt-app.c index 3dc4f6e1adaf..182110b9c3ec 100644 --- a/src/rt-app.c +++ b/src/rt-app.c @@ -649,7 +649,7 @@ void *thread_body(void *arg) curr_timing->slack = slack; curr_timing->c_period = c_period;
- if (opts.logsize && !timings) + if (opts.logsize && !timings && continue_running) log_timing(data->log_handler, curr_timing);
if (opts.ftrace)
On 17 November 2016 at 17:34, Juri Lelli juri.lelli@arm.com wrote:
This is to avoid cases where the last row can contain bogus values.
Signed-off-by: Juri Lelli juri.lelli@arm.com
Acked-by: Vincent Guittot vincent.guittot@linaro.org
src/rt-app.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/rt-app.c b/src/rt-app.c index 3dc4f6e1adaf..182110b9c3ec 100644 --- a/src/rt-app.c +++ b/src/rt-app.c @@ -649,7 +649,7 @@ void *thread_body(void *arg) curr_timing->slack = slack; curr_timing->c_period = c_period;
if (opts.logsize && !timings)
if (opts.logsize && !timings && continue_running) log_timing(data->log_handler, curr_timing); if (opts.ftrace)
-- 2.10.0
Just keep log related variables on a struct to reduce the number of parameters we pass to run[_event]().
Signed-off-by: Juri Lelli juri.lelli@arm.com --- src/rt-app.c | 49 +++++++++++++++++++++++-------------------------- src/rt-app_types.h | 8 ++++++++ 2 files changed, 31 insertions(+), 26 deletions(-)
diff --git a/src/rt-app.c b/src/rt-app.c index 182110b9c3ec..6af272b4b109 100644 --- a/src/rt-app.c +++ b/src/rt-app.c @@ -237,9 +237,8 @@ static void memload(unsigned long count, struct _rtapp_iomem_buf *iomem) }
static int run_event(event_data_t *event, int dry_run, - unsigned long *perf, unsigned long *duration, rtapp_resource_t *resources, - struct timespec *t_first, unsigned long *wu_latency, long *slack, - unsigned long *c_period) + unsigned long *perf, rtapp_resource_t *resources, + struct timespec *t_first, log_data_t *ldata) { rtapp_resource_t *rdata = &(resources[event->res]); rtapp_resource_t *ddata = &(resources[event->dep]); @@ -293,7 +292,7 @@ static int run_event(event_data_t *event, int dry_run, *perf += loadwait(event->duration); clock_gettime(CLOCK_MONOTONIC, &t_end); t_end = timespec_sub(&t_end, &t_start); - *duration += timespec_to_usec(&t_end); + ldata->duration += timespec_to_usec(&t_end); } break; case rtapp_runtime: @@ -313,7 +312,7 @@ static int run_event(event_data_t *event, int dry_run, } while ((diff_ns / 1000) < event->duration);
t_end = timespec_sub(&t_end, &t_start); - *duration += timespec_to_usec(&t_end); + ldata->duration += timespec_to_usec(&t_end); } break; case rtapp_timer: @@ -322,7 +321,7 @@ static int run_event(event_data_t *event, int dry_run, log_debug("timer %d ", event->duration);
t_period = usec_to_timespec(event->duration); - *c_period = event->duration; + ldata->c_period = event->duration;
if (rdata->res.timer.init == 0) { rdata->res.timer.init = 1; @@ -332,15 +331,15 @@ static int run_event(event_data_t *event, int dry_run, rdata->res.timer.t_next = timespec_add(&rdata->res.timer.t_next, &t_period); clock_gettime(CLOCK_MONOTONIC, &t_now); t_slack = timespec_sub(&rdata->res.timer.t_next, &t_now); - *slack = timespec_to_usec_long(&t_slack); + ldata->slack = timespec_to_usec_long(&t_slack); if (timespec_lower(&t_now, &rdata->res.timer.t_next)) { clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &rdata->res.timer.t_next, NULL); clock_gettime(CLOCK_MONOTONIC, &t_now); t_wu = timespec_sub(&t_now, &rdata->res.timer.t_next); - *wu_latency = timespec_to_usec(&t_wu); + ldata->wu_latency = timespec_to_usec(&t_wu); } else { clock_gettime(CLOCK_MONOTONIC, &rdata->res.timer.t_next); - *wu_latency = 0UL; + ldata->wu_latency = 0UL; } } break; @@ -377,12 +376,11 @@ static int run_event(event_data_t *event, int dry_run, return lock; }
-int run(int ind, phase_data_t *pdata, unsigned long *duration, - rtapp_resource_t *resources, - struct timespec *t_first, - unsigned long *wu_latency, - long *slack, - unsigned long *c_period) +int run(int ind, + phase_data_t *pdata, + rtapp_resource_t *resources, + struct timespec *t_first, + log_data_t *ldata) { event_data_t *events = pdata->events; int nbevents = pdata->nbevents; @@ -400,8 +398,7 @@ int run(int ind, phase_data_t *pdata, unsigned long *duration, "[%d] executing %d", ind, i); lock += run_event(&events[i], !continue_running, &perf, - duration, resources, t_first, wu_latency, slack, - c_period); + resources, t_first, ldata); }
return perf; @@ -446,10 +443,10 @@ void *thread_body(void *arg) { thread_data_t *data = (thread_data_t*) arg; phase_data_t *pdata; + log_data_t ldata; struct sched_param param; struct timespec t_start, t_end, t_first; unsigned long t_start_usec; - unsigned long perf, duration, wu_latency, c_period; long slack; timing_point_t *curr_timing; timing_point_t *timings; @@ -624,10 +621,10 @@ void *thread_body(void *arg) log_ftrace(ft_data.marker_fd, "[%d] begins loop %d phase %d step %d", data->ind, i, j, loop); log_debug("[%d] begins loop %d phase %d step %d", data->ind, i, j, loop);;
- duration = wu_latency = slack = c_period = 0; + memset(&ldata, 0, sizeof(ldata)); clock_gettime(CLOCK_MONOTONIC, &t_start); - perf = run(data->ind, pdata, &duration, *(data->resources), - &t_first, &wu_latency, &slack, &c_period); + ldata.perf = run(data->ind, pdata, *(data->resources), + &t_first, &ldata); clock_gettime(CLOCK_MONOTONIC, &t_end);
if (timings) @@ -643,11 +640,11 @@ void *thread_body(void *arg) curr_timing->start_time = timespec_to_usec_ull(&t_start); curr_timing->end_time = timespec_to_usec_ull(&t_end); curr_timing->period = timespec_to_usec(&t_diff); - curr_timing->duration = duration; - curr_timing->perf = perf; - curr_timing->wu_latency = wu_latency; - curr_timing->slack = slack; - curr_timing->c_period = c_period; + curr_timing->duration = ldata.duration; + curr_timing->perf = ldata.perf; + curr_timing->wu_latency = ldata.wu_latency; + curr_timing->slack = ldata.slack; + curr_timing->c_period = ldata.c_period;
if (opts.logsize && !timings && continue_running) log_timing(data->log_handler, curr_timing); diff --git a/src/rt-app_types.h b/src/rt-app_types.h index 0ec8781562ee..276bce02d1e6 100644 --- a/src/rt-app_types.h +++ b/src/rt-app_types.h @@ -160,6 +160,14 @@ typedef struct _ftrace_data_t { int marker_fd; } ftrace_data_t;
+typedef struct _log_data_t { + unsigned long perf; + unsigned long duration; + unsigned long wu_latency; + unsigned long c_period; + long slack; +} log_data_t; + typedef struct _rtapp_options_t { int lock_pages;
Move wu_lat to the end of the log, so that can be easily removed in the future if not needed anymore.
Signed-off-by: Juri Lelli juri.lelli@arm.com --- src/rt-app.c | 6 +++--- src/rt-app_utils.c | 6 +++--- 2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/src/rt-app.c b/src/rt-app.c index 6af272b4b109..dbfc1f5063a1 100644 --- a/src/rt-app.c +++ b/src/rt-app.c @@ -587,10 +587,10 @@ void *thread_body(void *arg)
log_notice("[%d] starting thread ...\n", data->ind);
- fprintf(data->log_handler, "%s %8s %8s %8s %15s %15s %15s %10s %10s %8s\n", + fprintf(data->log_handler, "%s %8s %8s %8s %15s %15s %15s %10s %8s %10s\n", "#idx", "perf", "run", "period", - "start", "end", "rel_st", "wu_lat", "slack", - "c_prd"); + "start", "end", "rel_st", "slack", + "c_prd", "wu_lat");
if (opts.ftrace) log_ftrace(ft_data.marker_fd, "[%d] starts", data->ind); diff --git a/src/rt-app_utils.c b/src/rt-app_utils.c index e7cd8a83cec7..747ac2b7d83d 100644 --- a/src/rt-app_utils.c +++ b/src/rt-app_utils.c @@ -140,7 +140,7 @@ void log_timing(FILE *handler, timing_point_t *t) { fprintf(handler, - "%4d %8lu %8lu %8lu %15llu %15llu %15llu %10lu %10ld %8lu", + "%4d %8lu %8lu %8lu %15llu %15llu %15llu %10ld %8lu %10lu", t->ind, t->perf, t->duration, @@ -148,9 +148,9 @@ log_timing(FILE *handler, timing_point_t *t) t->start_time, t->end_time, t->rel_start_time, - t->wu_latency, t->slack, - t->c_period + t->c_period, + t->wu_latency ); fprintf(handler, "\n"); }
As the name is more understandable.
Signed-off-by: Juri Lelli juri.lelli@arm.com --- src/rt-app.c | 4 ++-- src/rt-app_utils.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/src/rt-app.c b/src/rt-app.c index dbfc1f5063a1..cd3483f25deb 100644 --- a/src/rt-app.c +++ b/src/rt-app.c @@ -587,10 +587,10 @@ void *thread_body(void *arg)
log_notice("[%d] starting thread ...\n", data->ind);
- fprintf(data->log_handler, "%s %8s %8s %8s %15s %15s %15s %10s %8s %10s\n", + fprintf(data->log_handler, "%s %8s %8s %8s %15s %15s %15s %10s %10s %10s\n", "#idx", "perf", "run", "period", "start", "end", "rel_st", "slack", - "c_prd", "wu_lat"); + "c_period", "wu_lat");
if (opts.ftrace) log_ftrace(ft_data.marker_fd, "[%d] starts", data->ind); diff --git a/src/rt-app_utils.c b/src/rt-app_utils.c index 747ac2b7d83d..ea5724d8c523 100644 --- a/src/rt-app_utils.c +++ b/src/rt-app_utils.c @@ -140,7 +140,7 @@ void log_timing(FILE *handler, timing_point_t *t) { fprintf(handler, - "%4d %8lu %8lu %8lu %15llu %15llu %15llu %10ld %8lu %10lu", + "%4d %8lu %8lu %8lu %15llu %15llu %15llu %10ld %10lu %10lu", t->ind, t->perf, t->duration,
Add c_run (the run value that comes from json configuration file). This is handy for post-processing logs.
Signed-off-by: Juri Lelli juri.lelli@arm.com --- src/rt-app.c | 6 ++++-- src/rt-app_types.h | 2 ++ src/rt-app_utils.c | 3 ++- 3 files changed, 8 insertions(+), 3 deletions(-)
diff --git a/src/rt-app.c b/src/rt-app.c index cd3483f25deb..2354d423bcd2 100644 --- a/src/rt-app.c +++ b/src/rt-app.c @@ -288,6 +288,7 @@ static int run_event(event_data_t *event, int dry_run, { struct timespec t_start, t_end; log_debug("run %d ", event->duration); + ldata->c_run = event->duration; clock_gettime(CLOCK_MONOTONIC, &t_start); *perf += loadwait(event->duration); clock_gettime(CLOCK_MONOTONIC, &t_end); @@ -587,10 +588,10 @@ void *thread_body(void *arg)
log_notice("[%d] starting thread ...\n", data->ind);
- fprintf(data->log_handler, "%s %8s %8s %8s %15s %15s %15s %10s %10s %10s\n", + fprintf(data->log_handler, "%s %8s %8s %8s %15s %15s %15s %10s %10s %10s %10s\n", "#idx", "perf", "run", "period", "start", "end", "rel_st", "slack", - "c_period", "wu_lat"); + "c_run", "c_period", "wu_lat");
if (opts.ftrace) log_ftrace(ft_data.marker_fd, "[%d] starts", data->ind); @@ -645,6 +646,7 @@ void *thread_body(void *arg) curr_timing->wu_latency = ldata.wu_latency; curr_timing->slack = ldata.slack; curr_timing->c_period = ldata.c_period; + curr_timing->c_run = ldata.c_run;
if (opts.logsize && !timings && continue_running) log_timing(data->log_handler, curr_timing); diff --git a/src/rt-app_types.h b/src/rt-app_types.h index 276bce02d1e6..5a1d87c24e78 100644 --- a/src/rt-app_types.h +++ b/src/rt-app_types.h @@ -164,6 +164,7 @@ typedef struct _log_data_t { unsigned long perf; unsigned long duration; unsigned long wu_latency; + unsigned long c_run; unsigned long c_period; long slack; } log_data_t; @@ -199,6 +200,7 @@ typedef struct _timing_point_t { unsigned long perf; unsigned long duration; unsigned long period; + unsigned long c_run; unsigned long c_period; unsigned long wu_latency; long slack; diff --git a/src/rt-app_utils.c b/src/rt-app_utils.c index ea5724d8c523..c40eed55109d 100644 --- a/src/rt-app_utils.c +++ b/src/rt-app_utils.c @@ -140,7 +140,7 @@ void log_timing(FILE *handler, timing_point_t *t) { fprintf(handler, - "%4d %8lu %8lu %8lu %15llu %15llu %15llu %10ld %10lu %10lu", + "%4d %8lu %8lu %8lu %15llu %15llu %15llu %10ld %10lu %10lu %10lu", t->ind, t->perf, t->duration, @@ -149,6 +149,7 @@ log_timing(FILE *handler, timing_point_t *t) t->end_time, t->rel_start_time, t->slack, + t->c_run, t->c_period, t->wu_latency );
On 17 November 2016 at 17:34, Juri Lelli juri.lelli@arm.com wrote:
Add c_run (the run value that comes from json configuration file). This is handy for post-processing logs.
Signed-off-by: Juri Lelli juri.lelli@arm.com
src/rt-app.c | 6 ++++-- src/rt-app_types.h | 2 ++ src/rt-app_utils.c | 3 ++- 3 files changed, 8 insertions(+), 3 deletions(-)
diff --git a/src/rt-app.c b/src/rt-app.c index cd3483f25deb..2354d423bcd2 100644 --- a/src/rt-app.c +++ b/src/rt-app.c @@ -288,6 +288,7 @@ static int run_event(event_data_t *event, int dry_run, { struct timespec t_start, t_end; log_debug("run %d ", event->duration);
ldata->c_run = event->duration;
The events are run by phase, so you should add the duration of all run events of the phase and reset it when we start a new phase
should we add the runtime event as well?
clock_gettime(CLOCK_MONOTONIC, &t_start); *perf += loadwait(event->duration); clock_gettime(CLOCK_MONOTONIC, &t_end);
@@ -587,10 +588,10 @@ void *thread_body(void *arg)
log_notice("[%d] starting thread ...\n", data->ind);
fprintf(data->log_handler, "%s %8s %8s %8s %15s %15s %15s %10s %10s %10s\n",
fprintf(data->log_handler, "%s %8s %8s %8s %15s %15s %15s %10s %10s %10s %10s\n", "#idx", "perf", "run", "period", "start", "end", "rel_st", "slack",
"c_period", "wu_lat");
"c_run", "c_period", "wu_lat"); if (opts.ftrace) log_ftrace(ft_data.marker_fd, "[%d] starts", data->ind);
@@ -645,6 +646,7 @@ void *thread_body(void *arg) curr_timing->wu_latency = ldata.wu_latency; curr_timing->slack = ldata.slack; curr_timing->c_period = ldata.c_period;
curr_timing->c_run = ldata.c_run; if (opts.logsize && !timings && continue_running) log_timing(data->log_handler, curr_timing);
diff --git a/src/rt-app_types.h b/src/rt-app_types.h index 276bce02d1e6..5a1d87c24e78 100644 --- a/src/rt-app_types.h +++ b/src/rt-app_types.h @@ -164,6 +164,7 @@ typedef struct _log_data_t { unsigned long perf; unsigned long duration; unsigned long wu_latency;
unsigned long c_run; unsigned long c_period; long slack;
} log_data_t; @@ -199,6 +200,7 @@ typedef struct _timing_point_t { unsigned long perf; unsigned long duration; unsigned long period;
unsigned long c_run; unsigned long c_period; unsigned long wu_latency; long slack;
diff --git a/src/rt-app_utils.c b/src/rt-app_utils.c index ea5724d8c523..c40eed55109d 100644 --- a/src/rt-app_utils.c +++ b/src/rt-app_utils.c @@ -140,7 +140,7 @@ void log_timing(FILE *handler, timing_point_t *t) { fprintf(handler,
"%4d %8lu %8lu %8lu %15llu %15llu %15llu %10ld %10lu %10lu",
"%4d %8lu %8lu %8lu %15llu %15llu %15llu %10ld %10lu %10lu %10lu", t->ind, t->perf, t->duration,
@@ -149,6 +149,7 @@ log_timing(FILE *handler, timing_point_t *t) t->end_time, t->rel_start_time, t->slack,
t->c_run, t->c_period, t->wu_latency );
-- 2.10.0
Don't explicitly manage start/stop of tracing, but assume that the caller does that before starting rt-app. This helps if the caller wants more padding before/after an rt-app execution.
In case tracing is not started and ftrace is enabled in the json configuration, rt-app will fail to start.
Signed-off-by: Juri Lelli juri.lelli@arm.com --- src/rt-app.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/src/rt-app.c b/src/rt-app.c index 2354d423bcd2..2c6df5251ebe 100644 --- a/src/rt-app.c +++ b/src/rt-app.c @@ -430,9 +430,8 @@ shutdown(int sig) }
if (opts.ftrace) { - log_notice("stopping ftrace"); log_ftrace(ft_data.marker_fd, "main ends\n"); - log_ftrace(ft_data.trace_fd, "0"); + log_notice("deconfiguring ftrace"); close(ft_data.trace_fd); close(ft_data.marker_fd); } @@ -745,7 +744,6 @@ int main(int argc, char* argv[]) exit(EXIT_FAILURE); }
- log_ftrace(ft_data.trace_fd, "1"); log_ftrace(ft_data.marker_fd, "main creates threads\n"); }
@@ -937,7 +935,6 @@ int main(int argc, char* argv[]) if (opts.ftrace) { log_notice("stopping ftrace"); log_ftrace(ft_data.marker_fd, "main ends\n"); - log_ftrace(ft_data.trace_fd, "0"); close(ft_data.trace_fd); close(ft_data.marker_fd); }
On 17 November 2016 at 17:34, Juri Lelli juri.lelli@arm.com wrote:
Don't explicitly manage start/stop of tracing, but assume that the caller does that before starting rt-app. This helps if the caller wants more padding before/after an rt-app execution.
In case tracing is not started and ftrace is enabled in the json configuration, rt-app will fail to start.
Signed-off-by: Juri Lelli juri.lelli@arm.com
src/rt-app.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/src/rt-app.c b/src/rt-app.c index 2354d423bcd2..2c6df5251ebe 100644 --- a/src/rt-app.c +++ b/src/rt-app.c @@ -430,9 +430,8 @@ shutdown(int sig) }
if (opts.ftrace) {
log_notice("stopping ftrace"); log_ftrace(ft_data.marker_fd, "main ends\n");
log_ftrace(ft_data.trace_fd, "0");
log_notice("deconfiguring ftrace");
close(ft_data.trace_fd);
you can remove it as well as the open and all reference to trace_fd which is not used anymore AFAICT
close(ft_data.marker_fd); }
@@ -745,7 +744,6 @@ int main(int argc, char* argv[]) exit(EXIT_FAILURE); }
log_ftrace(ft_data.trace_fd, "1"); log_ftrace(ft_data.marker_fd, "main creates threads\n"); }
@@ -937,7 +935,6 @@ int main(int argc, char* argv[]) if (opts.ftrace) { log_notice("stopping ftrace");
you can remove this log
log_ftrace(ft_data.marker_fd, "main ends\n");
log_ftrace(ft_data.trace_fd, "0"); close(ft_data.trace_fd);
ditto
close(ft_data.marker_fd); }
-- 2.10.0
On 18/11/16 16:43, Vincent Guittot wrote:
On 17 November 2016 at 17:34, Juri Lelli juri.lelli@arm.com wrote:
Don't explicitly manage start/stop of tracing, but assume that the caller does that before starting rt-app. This helps if the caller wants more padding before/after an rt-app execution.
In case tracing is not started and ftrace is enabled in the json configuration, rt-app will fail to start.
Signed-off-by: Juri Lelli juri.lelli@arm.com
src/rt-app.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/src/rt-app.c b/src/rt-app.c index 2354d423bcd2..2c6df5251ebe 100644 --- a/src/rt-app.c +++ b/src/rt-app.c @@ -430,9 +430,8 @@ shutdown(int sig) }
if (opts.ftrace) {
log_notice("stopping ftrace"); log_ftrace(ft_data.marker_fd, "main ends\n");
log_ftrace(ft_data.trace_fd, "0");
log_notice("deconfiguring ftrace");
close(ft_data.trace_fd);
you can remove it as well as the open and all reference to trace_fd which is not used anymore AFAICT
close(ft_data.marker_fd); }
@@ -745,7 +744,6 @@ int main(int argc, char* argv[]) exit(EXIT_FAILURE); }
log_ftrace(ft_data.trace_fd, "1"); log_ftrace(ft_data.marker_fd, "main creates threads\n"); }
@@ -937,7 +935,6 @@ int main(int argc, char* argv[]) if (opts.ftrace) { log_notice("stopping ftrace");
you can remove this log
log_ftrace(ft_data.marker_fd, "main ends\n");
log_ftrace(ft_data.trace_fd, "0"); close(ft_data.trace_fd);
ditto
Right. I'll update accordingly.
Best,
- Juri
Add a new thread property called 'delay' with which the initial starting time of a thread can be delayed. Parameter is expressed in usec.
Usage is:
"tasks" : { "thread0" : { ... "delay" : 1000000, ... } }
Signed-off-by: Juri Lelli juri.lelli@arm.com --- doc/tutorial.txt | 3 +++ src/rt-app.c | 10 ++++++++++ src/rt-app_parse_config.c | 5 +++++ src/rt-app_types.h | 2 ++ 4 files changed, 20 insertions(+)
diff --git a/doc/tutorial.txt b/doc/tutorial.txt index 78cd1fe0e2e2..f88c69f29c1d 100644 --- a/doc/tutorial.txt +++ b/doc/tutorial.txt @@ -200,6 +200,9 @@ class. Default value is period. The unit is usec. * cpus: Array of Integer. Define the CPU affinity of the thread. Default value is all CPUs of the system. An example : "cpus" : [0, 2, 3]
+* delay: Integer. Initial delay before a thread starts execution. The unit +is usec. + * phases: Object. The phases object described TBF
If there is only 1 phase, the sequence of events can be directly put in the diff --git a/src/rt-app.c b/src/rt-app.c index 2c6df5251ebe..4eaa693f7053 100644 --- a/src/rt-app.c +++ b/src/rt-app.c @@ -612,6 +612,16 @@ void *thread_body(void *arg) } } #endif + + if (data->delay > 0) { + struct timespec delay = usec_to_timespec(data->delay); + + log_debug("initial delay %d ", data->delay); + t_first = timespec_add(&t_first, &delay); + clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &t_first, + NULL); + } + i = j = loop = idx = 0;
while (continue_running && (i != data->loop)) { diff --git a/src/rt-app_parse_config.c b/src/rt-app_parse_config.c index c59b56290bbf..b0f302e2ba25 100644 --- a/src/rt-app_parse_config.c +++ b/src/rt-app_parse_config.c @@ -689,6 +689,11 @@ parse_thread_data(char *name, struct json_object *obj, int index, } log_info(PIN "key: cpus %s", data->cpuset_str);
+ /* initial delay */ + data->delay = get_int_value_from(obj, "delay", TRUE, 0); + if (data->delay < 0) + data->delay = 0; + /* Get phases */ phases_obj = get_in_object(obj, "phases", TRUE); if (phases_obj) { diff --git a/src/rt-app_types.h b/src/rt-app_types.h index 5a1d87c24e78..60bd1e85d671 100644 --- a/src/rt-app_types.h +++ b/src/rt-app_types.h @@ -149,6 +149,8 @@ typedef struct _thread_data_t { char sched_policy_descr[RTAPP_POLICY_DESCR_LENGTH]; int sched_prio;
+ unsigned long delay; + #ifdef DLSCHED struct sched_attr dl_params; #endif
On 17 November 2016 at 17:34, Juri Lelli juri.lelli@arm.com wrote:
Add a new thread property called 'delay' with which the initial starting time of a thread can be delayed. Parameter is expressed in usec.
Usage is:
"tasks" : { "thread0" : { ... "delay" : 1000000, ... } }
Signed-off-by: Juri Lelli juri.lelli@arm.com
Acked-by: Vincent Guittot vincent.guittot@linaro.org
doc/tutorial.txt | 3 +++ src/rt-app.c | 10 ++++++++++ src/rt-app_parse_config.c | 5 +++++ src/rt-app_types.h | 2 ++ 4 files changed, 20 insertions(+)
diff --git a/doc/tutorial.txt b/doc/tutorial.txt index 78cd1fe0e2e2..f88c69f29c1d 100644 --- a/doc/tutorial.txt +++ b/doc/tutorial.txt @@ -200,6 +200,9 @@ class. Default value is period. The unit is usec.
- cpus: Array of Integer. Define the CPU affinity of the thread. Default
value is all CPUs of the system. An example : "cpus" : [0, 2, 3]
+* delay: Integer. Initial delay before a thread starts execution. The unit +is usec.
- phases: Object. The phases object described TBF
If there is only 1 phase, the sequence of events can be directly put in the diff --git a/src/rt-app.c b/src/rt-app.c index 2c6df5251ebe..4eaa693f7053 100644 --- a/src/rt-app.c +++ b/src/rt-app.c @@ -612,6 +612,16 @@ void *thread_body(void *arg) } } #endif
if (data->delay > 0) {
struct timespec delay = usec_to_timespec(data->delay);
log_debug("initial delay %d ", data->delay);
t_first = timespec_add(&t_first, &delay);
clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &t_first,
NULL);
}
i = j = loop = idx = 0; while (continue_running && (i != data->loop)) {
diff --git a/src/rt-app_parse_config.c b/src/rt-app_parse_config.c index c59b56290bbf..b0f302e2ba25 100644 --- a/src/rt-app_parse_config.c +++ b/src/rt-app_parse_config.c @@ -689,6 +689,11 @@ parse_thread_data(char *name, struct json_object *obj, int index, } log_info(PIN "key: cpus %s", data->cpuset_str);
/* initial delay */
data->delay = get_int_value_from(obj, "delay", TRUE, 0);
if (data->delay < 0)
data->delay = 0;
/* Get phases */ phases_obj = get_in_object(obj, "phases", TRUE); if (phases_obj) {
diff --git a/src/rt-app_types.h b/src/rt-app_types.h index 5a1d87c24e78..60bd1e85d671 100644 --- a/src/rt-app_types.h +++ b/src/rt-app_types.h @@ -149,6 +149,8 @@ typedef struct _thread_data_t { char sched_policy_descr[RTAPP_POLICY_DESCR_LENGTH]; int sched_prio;
unsigned long delay;
#ifdef DLSCHED struct sched_attr dl_params;
#endif
2.10.0