Hi Dietmar,
Thanks a lot for reviewing; please see my comments below.
On Thu, Sep 17, 2015 at 06:09:43PM +0100, Dietmar Eggemann wrote:
On 07/09/15 06:50, Leo Yan wrote:
Hi all,
[...]
I have also enclosed these two patches for review.
Let's discuss these patches on LKML since you have sent out emails to LKML discussing these changes.
Yeah, I will look into Morten's comments for the related patches.
[...]
Software Environment
Kernel (4.2 + EAS RFCv5) + extra two patches [1]
ARM-TF [2]
Enable CPUIdle with PSCI
Enable CPUFreq with cpufreq-dt driver
Profiling scripts:
- calc_idle_diff.py [3]: calculate C-state differences between configurations
- calc_pstate_time.py [4]: calculate P-state differences between configurations
- calc_sched_preformance.py [5]: calculate scheduler performance
I saw that you saved an x86_64 idlestat binary on your github utility/profile_eas project.
I use the x86_64 idlestat to compare the trace logs and get the difference in the idle duty cycle. For example, I take the rt-app 6% trace log files for mainline and EAS (ndm), then I use the command below on the host PC to get the difference in the CPUs' idle duty cycle:
./idlestat --import -f eas_ndm_trace.log -b mainline_trace.log -r comparison >> idlestat_compare.txt
So finally I can summarize the idle duty cycles for the different configurations.
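For what it's worth, the comparison step boils down to a per-C-state diff of the residency numbers; below is a rough Python sketch of the idea (the state names and numbers are placeholders, and this is not the actual calc_idle_diff.py script):

# Sketch: compare per-C-state residency between two configurations.
# The state names and residency numbers below are placeholders; in
# practice they come from the idlestat reports, not from this script.

mainline = {"WFI": 12.3, "CPU_OFF": 45.6, "CLUSTER_OFF": 30.1}   # seconds
eas_ndm  = {"WFI": 10.8, "CPU_OFF": 48.2, "CLUSTER_OFF": 28.7}

for state in sorted(mainline):
    diff = eas_ndm.get(state, 0.0) - mainline[state]
    print("%-12s mainline %7.2fs  eas(ndm) %7.2fs  diff %+7.2fs"
          % (state, mainline[state], eas_ndm.get(state, 0.0), diff))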
I thought so far we have to run idlestat on the target so it can retrieve the target idle state names like WFI, C2 or M2?
Yes, I run idlestat on the target with the commands below:
./idlestat --trace -f ./result/mp3/trace.log -t 30 -p -c -w -o ./result/mp3/report.log -- ./rt-app ./doc/examples/mp3-long.json
./idlestat --trace -f ./result/rt-app-6/trace.log -t 30 -p -c -w -o ./result/rt-app-6/report.log -- ./rt-app ./doc/examples/rt-app-6.json
There is this energy model (EM) feature in idlestat (-e energy_model_file) which calculates energy consumption per trace file.
example on TC2:
# idlestat --trace -f trace.dat -t T -e energy_model_arm_tc2
Parsed energy model file successfully
...
ClusterA Energy Caps   22027  (2.202654e+04)
ClusterA Energy Idle      57  (5.740462e+01)
ClusterA Energy Index  22084  (2.208395e+04)
ClusterB Energy Caps    3236  (3.235515e+03)
ClusterB Energy Idle      40  (4.041970e+01)
ClusterB Energy Index   3276  (3.275935e+03)

Total Energy Index     25360  (2.535988e+04)
The current idlestat code only has this ARM TC2-specific EM file, but it should be easy for you to create one for your HiKey board.
Thanks for pointing this out; I did not know about it before. I just gave it a quick try and it works on my side. But I found that if I add the "WFI" state it reports the error below, and if I remove the "WFI" state the error goes away. The "WFI" state should not be ignored, so I will check idlestat's source code.
Error: parse_energy_model: too many C states specified for cluster in energy_model_hikey can't parse energy model file
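For my own understanding, the energy index is essentially the per-state residency weighted by the per-state power taken from the EM file; below is a rough Python sketch of that idea (made-up frequencies, states and power numbers, not idlestat's actual implementation and not a real HiKey model):

# Sketch: energy index as per-state residency weighted by per-state power.
# The frequencies, state names and power numbers are made up for
# illustration; they are NOT taken from idlestat or from a HiKey EM file.

cap_power  = {1200000: 400.0, 800000: 200.0}   # freq (kHz) -> power (mW)
idle_power = {"WFI": 5.0, "CPU_OFF": 1.0}      # C-state    -> power (mW)

cap_time   = {1200000: 3.2, 800000: 10.5}      # residency in seconds
idle_time  = {"WFI": 6.0, "CPU_OFF": 10.3}

energy_caps = sum(cap_power[f] * t for f, t in cap_time.items())
energy_idle = sum(idle_power[s] * t for s, t in idle_time.items())

print("Energy Caps  %.1f" % energy_caps)
print("Energy Idle  %.1f" % energy_idle)
print("Energy Index %.1f" % (energy_caps + energy_idle))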
Profiling: performance
sysbench --test=cpu --num-threads=1 --max-time=10 run
rt-app performance is calculated with the formula below:
task performance = slack / (c_period - c_run) * 1024
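A small worked example of this formula, with placeholder values roughly matching the rt-app 6% configuration (2000us period, 120us run); the slack value is made up:

# Sketch: rt-app task performance index, per the formula above:
#   performance = slack / (c_period - c_run) * 1024
# All values are in microseconds; the slack value is a placeholder.

def perf_index(slack_us, c_period_us, c_run_us):
    return slack_us / float(c_period_us - c_run_us) * 1024

# 2000us period, 120us run (~6% duty cycle), 1200us of slack left over.
print(perf_index(slack_us=1200, c_period_us=2000, c_run_us=120))  # ~653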
energy        mainline (ndm)   noeas (ndm)   eas (ndm)   eas (sched)
              prf              prf           prf         prf
sysbench      100              100           100          92
rt-app 6%     662              665           393         615
rt-app 13%    648              645           465         394
rt-app 19%    610              648           479          57
rt-app 25%    649              664           306         518
rt-app 31%    600              585           596         366
rt-app 38%    576              584           259        -166
rt-app 44%    466              487            30        -349
rt-app 50%    583              602           598         612
Seeing these performance numbers, have you calibrated your json files for your HiKey board?
ARM TC2 example, calibrated against A15:
# cat wl_test.json | grep calibration
    "calibration": 141,
No, I still use "CPU0" for calibration. Below is my rt-app-6.json file; could you check whether there is anything else I missed?
{ "tasks": { "thread0": { "instance": 5, "loop": -1, "run": 120, "sleep": 0, "timer": { "ref": "unique", "period": 2000 } } }, "global": { "duration": 20, "calibration": "CPU0", "default_policy": "SCHED_OTHER", "pi_enabled": false, "lock_pages": false, "logdir": "./", "log_basename": "rt-app-6", "gnuplot": true } }
Summary
- After applying the two extra patches, the profiling results are consistent and stable for EAS (ndm) and EAS (sched). The tasks will be placed into the first cluster for LITTLE.LITTLE; so EAS (ndm) and EAS (sched) are much [...]
Shouldn’t we call it an SMP system instead of LITTLE.LITTLE?
From the CPU topology point of view, LITTLE.LITTLE is somewhat different from an SMP system with only one cluster. Later I will directly use "SMP".
Thanks,
Leo Yan