This patchset collects fixes and cleanups for rt-app. The patches mainly
fix typos or move code around and should not change the behavior of
rt-app.
changes from v1:
- split style fix patches into smaller ones
- clarify some changelogs
- remove patches not suitable for this patchset
These patches are also available in the branch:
https://git.linaro.org/people/picheng.chen/rt-app.git fixes_v2
Chris Muller (1):
Update thread name
Vincent Guittot (10):
fix deadline print format
consolidate trace and debug point
update .gitignore
fix inconsistency in delay unit
fix cpu affinity string info
deadline: set deadline field to deadline parameter
reorder the start sequence of threads
remove the yaml example as we don't support it
remove useless json_object_put
rt-app: remove use of deprecated json interface
pi-cheng.chen (9):
remove useless space and add blank lines to make the code more
readable
fix some comments
align parameters and indents
some style fixes
Rename variable to improve readability
do not sleep if we have run longer than expected
fix debugfs path
remove unused function
add missed code snip to get deadline parameter
.gitignore | 8 ++
doc/taskset.yml | 53 -------------
src/rt-app.c | 195 +++++++++++++++++++++++++++-------------------
src/rt-app_args.c | 57 +++++++-------
src/rt-app_parse_config.c | 49 +++++++-----
src/rt-app_utils.c | 3 +
6 files changed, 185 insertions(+), 180 deletions(-)
delete mode 100644 doc/taskset.yml
--
1.9.1
From: Xunlei Pang <pang.xunlei(a)linaro.org>
DVFS adds latency to the execution of a task because the governor takes
time to decide to move to the max frequency. We need to measure this
latency and check that the governor stays within an acceptable range.
When workgen runs a json file, a log file is created for each thread.
This log file records the number of loops that have been executed and
the duration of these loops (per phase). We can use these figures to
evaluate the latency that is added by a cpufreq governor and its
"performance efficiency".
We use a run+sleep pattern to do the measurement. For the run time per
loop, the performance governor should give the expected duration, as the
CPU stays at max freq. At the opposite end, the powersave governor will
give us the longest duration (as it stays at the lowest OPP). Other
governors will fall somewhere between these two durations, as they use
several OPPs and go back to max frequency after a delay that depends on
their monitoring period.
The formula:
duration of powersave gov - duration of the gov
-------------------------------------------------------- x 100%
duration of powersave gov - duration of performance gov
will give the efficiency of the governor. 100% means as efficient as
the perf governor and 0% means as efficient as the powersave governor.
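As a worked example of the formula, here is a minimal shell sketch. The
function name governor_efficiency and the sample durations are assumptions
made for illustration and are not part of this patch. Like test.sh, it uses
integer arithmetic and reports a governor slower than powersave as 0%.

```shell
#!/bin/bash
# governor_efficiency <powersave> <performance> <governor>
# All three arguments are run durations in the same unit.
# Prints the efficiency as an integer percentage (division truncates).
governor_efficiency() {
    local powersave=$1
    local performance=$2
    local governor=$3
    local denominator numerator

    denominator=$((powersave - performance))
    if [ "$denominator" -le 0 ]; then
        echo "powersave must be slower than performance" >&2
        return 1
    fi

    numerator=$(( (powersave - governor) * 100 ))
    # Clamp: a governor slower than powersave counts as 0%.
    if [ "$numerator" -lt 0 ]; then
        numerator=0
    fi

    echo $((numerator / denominator))
}

# Hypothetical durations in us, chosen only to illustrate the arithmetic:
# (200000 - 105000) * 100 / (200000 - 100000) = 95
governor_efficiency 200000 100000 105000
```

A governor that matches the performance duration yields 100, and one that
matches the powersave duration yields 0.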
This patch provides json files and shell scripts to do the measurement.
Usage: ./test.sh <cpus> <runtime> <sleeptime>
cpus: number of cpus in the CPU0's frequency domain
runtime: running time in ms per loop of the workload pattern
sleeptime: sleeping time in ms per loop of the workload pattern
Example:
"./test.sh 4 100 1000" means
CPU0~CPU3 sharing frequency, "100ms run + 1000ms sleep" workload pattern.
test result on my machine:
~#./test.sh 4 100 1000
Frequency domain CPU0~CPU3, run 100ms, sleep 1000ms:
conservative efficiency: 28%
ondemand efficiency: 95%
NOTE: Make sure the "sed", "cut", "grep" and "rt-app" tools are available on
your test machine, and run the script with root privileges.
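A pre-flight check for the requirements in the NOTE could look like the
sketch below. The helper names missing_tools and is_root are hypothetical
and are not part of the scripts in this patch; the sketch relies only on the
standard "command -v" and "id -u" facilities.

```shell
#!/bin/bash
# missing_tools <tool>...: print each tool that is not found in PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" > /dev/null 2>&1 || echo "$tool"
    done
}

# is_root: succeed only when the effective UID is 0.
is_root() {
    [ "$(id -u)" -eq 0 ]
}

# Report problems without aborting, so the caller decides what to do.
missing=$(missing_tools sed cut grep rt-app)
if [ -n "$missing" ]; then
    echo "missing tools:" $missing >&2
fi
if ! is_root; then
    echo "not running as root; writing scaling_governor will likely fail" >&2
fi
```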
Signed-off-by: Xunlei Pang <pang.xunlei(a)linaro.org>
---
.../cpufreq_governor_efficiency/calibration.json | 27 ++++++++
.../cpufreq_governor_efficiency/calibration.sh | 9 +++
doc/examples/cpufreq_governor_efficiency/dvfs.json | 27 ++++++++
doc/examples/cpufreq_governor_efficiency/dvfs.sh | 38 ++++++++++++
doc/examples/cpufreq_governor_efficiency/test.sh | 71 ++++++++++++++++++++++
5 files changed, 172 insertions(+)
create mode 100644 doc/examples/cpufreq_governor_efficiency/calibration.json
create mode 100755 doc/examples/cpufreq_governor_efficiency/calibration.sh
create mode 100644 doc/examples/cpufreq_governor_efficiency/dvfs.json
create mode 100755 doc/examples/cpufreq_governor_efficiency/dvfs.sh
create mode 100755 doc/examples/cpufreq_governor_efficiency/test.sh
diff --git a/doc/examples/cpufreq_governor_efficiency/calibration.json b/doc/examples/cpufreq_governor_efficiency/calibration.json
new file mode 100644
index 0000000..4377990
--- /dev/null
+++ b/doc/examples/cpufreq_governor_efficiency/calibration.json
@@ -0,0 +1,27 @@
+{
+ "tasks" : {
+ "thread" : {
+ "instance" : 1,
+ "cpus" : [0],
+ "loop" : 1,
+ "phases" : {
+ "run" : {
+ "loop" : 1,
+ "run" : 200000,
+ },
+ "sleep" : {
+ "loop" : 1,
+ "sleep" : 200000,
+ }
+ }
+ }
+ },
+ "global" : {
+ "default_policy" : "SCHED_FIFO",
+ "calibration" : "CPU0",
+ "lock_pages" : true,
+ "ftrace" : true,
+ "logdir" : "./",
+ }
+}
+
diff --git a/doc/examples/cpufreq_governor_efficiency/calibration.sh b/doc/examples/cpufreq_governor_efficiency/calibration.sh
new file mode 100755
index 0000000..89fe5de
--- /dev/null
+++ b/doc/examples/cpufreq_governor_efficiency/calibration.sh
@@ -0,0 +1,9 @@
+#!/bin/bash
+
+echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
+
+sleep 1
+
+pLoad=`rt-app calibration.json 2>&1 |grep pLoad |sed 's/.*= \(.*\)ns.*/\1/'`
+sed 's/"calibration" : .*,/"calibration" : '$pLoad',/' -i dvfs.json
+
diff --git a/doc/examples/cpufreq_governor_efficiency/dvfs.json b/doc/examples/cpufreq_governor_efficiency/dvfs.json
new file mode 100644
index 0000000..b413156
--- /dev/null
+++ b/doc/examples/cpufreq_governor_efficiency/dvfs.json
@@ -0,0 +1,27 @@
+{
+ "tasks" : {
+ "thread" : {
+ "instance" : 1,
+ "cpus" : [0],
+ "loop" : 5,
+ "phases" : {
+ "running" : {
+ "loop" : 1,
+ "run" : 100000,
+ },
+ "sleeping" : {
+ "loop" : 1,
+ "sleep" : 1000000,
+ }
+ }
+ }
+ },
+ "global" : {
+ "default_policy" : "SCHED_OTHER",
+ "calibration" : 90,
+ "lock_pages" : true,
+ "ftrace" : true,
+ "logdir" : "./",
+ }
+}
+
diff --git a/doc/examples/cpufreq_governor_efficiency/dvfs.sh b/doc/examples/cpufreq_governor_efficiency/dvfs.sh
new file mode 100755
index 0000000..1772041
--- /dev/null
+++ b/doc/examples/cpufreq_governor_efficiency/dvfs.sh
@@ -0,0 +1,38 @@
+#!/bin/bash
+
+#echo $1 $2 $3
+
+if [ $1 ] && [ $2 ] ; then
+ for i in `seq 0 1 $[$2-1]`
+ do
+ echo $1 > /sys/devices/system/cpu/cpu$i/cpufreq/scaling_governor
+ #cat /sys/devices/system/cpu/cpu$i/cpufreq/scaling_governor
+ done
+
+ sleep 3
+fi
+
+if [ $3 ] ; then
+ sed 's/"run" : .*,/"run" : '$3',/' -i dvfs.json
+fi
+
+if [ $4 ] ; then
+ sed 's/"sleep" : .*,/"sleep" : '$4',/' -i dvfs.json
+fi
+
+#cat dvfs.json
+
+rt-app dvfs.json 2> /dev/null
+
+if [ $1 ] ; then
+ mv rt-app-thread-0.log rt-app_$1_run$3us_sleep$4us.log
+
+ declare -i sum
+ sum=0
+ for i in `cat rt-app_$1_run$3us_sleep$4us.log | sed '1~2!d' | sed '1d' |cut -d " " -f 3`;
+ do sum=sum+$i
+ done
+ sum=sum/5
+ echo $sum
+fi
+
diff --git a/doc/examples/cpufreq_governor_efficiency/test.sh b/doc/examples/cpufreq_governor_efficiency/test.sh
new file mode 100755
index 0000000..6f41a1b
--- /dev/null
+++ b/doc/examples/cpufreq_governor_efficiency/test.sh
@@ -0,0 +1,71 @@
+#!/bin/bash
+
+function set_calibration
+{
+ calibration.sh
+}
+
+function test_efficiency
+{
+ declare -i performance
+ declare -i powersave
+ declare -i conservative
+ declare -i ondemand
+ declare -i denominator
+ declare -i numerator
+
+ FILENAME="results_$RANDOM$$.txt"
+
+ dvfs.sh performance $1 $2 $3 > $FILENAME
+ dvfs.sh powersave $1 $2 $3 >> $FILENAME
+ dvfs.sh conservative $1 $2 $3 >> $FILENAME
+ dvfs.sh ondemand $1 $2 $3 >> $FILENAME
+
+ performance=`cat $FILENAME |sed -n '1p'`
+ powersave=`cat $FILENAME |sed -n '2p'`
+ conservative=`cat $FILENAME |sed -n '3p'`
+ ondemand=`cat $FILENAME |sed -n '4p'`
+
+ rm -f $FILENAME
+
+ denominator=$powersave-$performance
+ numerator=($powersave-$conservative)*100
+
+ if [ $denominator -le 0 ] ; then
+ echo "Probably not all the given cpus are in the same frequency domain"
+ exit
+ fi
+
+ if [ $numerator -lt 0 ] ; then
+ numerator=0
+ fi
+
+ conservative=$numerator/$denominator
+ echo "conservative efficiency: $conservative%"
+
+ numerator=($powersave-$ondemand)*100
+ if [ $numerator -lt 0 ] ; then
+ numerator=0
+ fi
+
+ ondemand=$numerator/$denominator
+ echo -e "ondemand efficiency: $ondemand%\n"
+}
+
+if [ $# -lt 3 ]; then
+ echo "Usage: ./test.sh <cpus> <runtime> <sleeptime>"
+ echo "cpus: number of cpus in the CPU0's frequency domain"
+ echo "runtime: running time in ms per loop of the workload pattern"
+ echo "sleeptime: sleeping time in ms per loop of the workload pattern"
+ echo -e "\nExample: \n\"./test.sh 4 100 1000\" means\nCPU0~CPU3 sharing frequency, \"100ms run + 1000ms sleep\" workload pattern.\n"
+ exit
+fi
+
+echo "Frequency domain CPU0~CPU$[$1-1], run $2ms, sleep $3ms:"
+
+PATH=$PATH:.
+set_calibration
+test_efficiency $1 $[$2*1000] $[$3*1000]
+
+sleep 5
+
--
1.9.1
Add a template.json file that can be used by tune_json.py to create use cases
with various types of load.
Signed-off-by: Vincent Guittot <vincent.guittot(a)linaro.org>
---
doc/examples/template.json | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
create mode 100644 doc/examples/template.json
diff --git a/doc/examples/template.json b/doc/examples/template.json
new file mode 100644
index 0000000..5ed3215
--- /dev/null
+++ b/doc/examples/template.json
@@ -0,0 +1,28 @@
+{
+ /*
+ * Simple use case which creates 10% load
+ * for 6 seconds.
+ * A "sleep" : 0 has been added so the file can be used by tune_json.py to
+ * use a sleep event instead of the timer. In this latter case, you need
+ * to set the timer's period to 0
+ */
+ "tasks" : {
+ "thread0" : {
+ "instance" : 1,
+ "loop" : -1,
+ "run" : 10000,
+ "sleep" : 0,
+ "timer" : { "ref" : "unique", "period" : 100000 }
+ }
+ },
+ "global" : {
+ "duration" : 6,
+ "calibration" : "CPU0",
+ "default_policy" : "SCHED_OTHER",
+ "pi_enabled" : false,
+ "lock_pages" : false,
+ "logdir" : "./",
+ "log_basename" : "rt-app2",
+ "gnuplot" : true
+ }
+}
--
1.9.1
The calibration sequence fails to get the right ns per loop value when the
calibration is done on the A57 cluster of the mt8173evb.
A new calibration method has been added that inserts a sleep period between
each calibration loop. This idle phase enables rt-app to get the right ns per
loop value on this kind of HW.
The calibration sequence finally uses the lowest value returned by the
calibration methods, which corresponds to the highest compute capacity.
Reported-by: Koan-Sin Tan <freedom.tan(a)linaro.org>
Signed-off-by: Vincent Guittot <vincent.guittot(a)linaro.org>
---
src/rt-app.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 75 insertions(+), 5 deletions(-)
diff --git a/src/rt-app.c b/src/rt-app.c
index 97aee0f..37e9892 100644
--- a/src/rt-app.c
+++ b/src/rt-app.c
@@ -62,12 +62,14 @@ void waste_cpu_cycles(int load_loops)
}
/*
-* calibrate_cpu_cycles()
-* collects the time that waste_cycles runs.
+* calibrate_cpu_cycles_1()
+* 1st method to calibrate the ns per loop value
+* We alternate idle periods and run periods in order not to trigger some hw
+* protection mechanism like thermal mitigation
*/
-int calibrate_cpu_cycles(int clock)
+int calibrate_cpu_cycles_1(int clock)
{
- struct timespec start, stop;
+ struct timespec start, stop, sleep;
int max_load_loop = 10000;
unsigned int diff;
int nsec_per_loop, avg_per_loop = 0;
@@ -75,6 +77,10 @@ int calibrate_cpu_cycles(int clock)
while (cal_trial) {
cal_trial--;
+ sleep.tv_sec = 1;
+ sleep.tv_nsec = 0;
+
+ clock_nanosleep(CLOCK_MONOTONIC, 0, &sleep, NULL);
clock_gettime(clock, &start);
waste_cpu_cycles(max_load_loop);
@@ -100,6 +106,69 @@ int calibrate_cpu_cycles(int clock)
return 0;
}
+/*
+* calibrate_cpu_cycles_2()
+* 2nd method to calibrate the ns per loop value
+* We continuously run a load to ensure that the CPU is set to max freq by the
+* governor
+*/
+int calibrate_cpu_cycles_2(int clock)
+{
+ struct timespec start, stop, sleep;
+ int max_load_loop = 10000;
+ unsigned int diff;
+ int nsec_per_loop, avg_per_loop = 0;
+ int ret, cal_trial = 1000;
+
+ while (cal_trial) {
+ cal_trial--;
+
+ clock_gettime(clock, &start);
+ waste_cpu_cycles(max_load_loop);
+ clock_gettime(clock, &stop);
+
+ diff = (int)timespec_sub_to_ns(&stop, &start);
+ nsec_per_loop = diff / max_load_loop;
+ avg_per_loop = (avg_per_loop + nsec_per_loop) >> 1;
+
+ /* collect a critical mass of samples.*/
+ if ((abs(nsec_per_loop - avg_per_loop) * 50) < avg_per_loop)
+ return avg_per_loop;
+
+ /*
+ * use several loop durations in order to be sure not to
+ * lock onto a specific platform period
+ * (like the cpufreq period)
+ */
+ /* change the number of loops and recheck, up to 1000 times */
+ max_load_loop += 33333;
+ max_load_loop %= 1000000;
+ }
+ return 0;
+}
+
+/*
+* calibrate_cpu_cycles()
+* Use several methods to calibrate the ns per loop and get the min value, which
+* corresponds to the highest achievable compute capacity.
+*/
+int calibrate_cpu_cycles(int clock)
+{
+ int calib1, calib2;
+
+ /* Run 1st method */
+ calib1 = calibrate_cpu_cycles_1(clock);
+
+ /* Run 2nd method */
+ calib2 = calibrate_cpu_cycles_2(clock);
+
+ if (calib1 < calib2)
+ return calib1;
+ else
+ return calib2;
+
+}
+
static inline loadwait(unsigned long exec)
{
unsigned long load_count;
@@ -531,13 +600,14 @@ int main(int argc, char* argv[])
/* Needs to calibrate 'calib_cpu' core */
if (opts.calib_ns_per_loop == 0) {
+ log_notice("Calibrate ns per loop");
cpu_set_t calib_set;
CPU_ZERO(&calib_set);
CPU_SET(opts.calib_cpu, &calib_set);
sched_getaffinity(0, sizeof(cpu_set_t), &orig_set);
sched_setaffinity(0, sizeof(cpu_set_t), &calib_set);
- p_load = calibrate_cpu_cycles(CLOCK_THREAD_CPUTIME_ID);
+ p_load = calibrate_cpu_cycles(CLOCK_MONOTONIC);
sched_setaffinity(0, sizeof(cpu_set_t), &orig_set);
log_notice("pLoad = %dns : calib_cpu %d", p_load, opts.calib_cpu);
} else {
--
1.9.1