Hi all,
This is the second round of profiling for EASv5 on the Hikey board. I refined the energy model data according to Dietmar's suggestions (many thanks to Dietmar for the detailed review and guidance). Please help review; I am looking forward to any comments and suggestions.
* Overview
This is the second round of profiling for the EASv5 patches on the Hikey board. Following the review comments from the eas-dev mailing list, I refined the energy model for Hikey and also developed several Python/shell scripts for automatic analysis.

The profiling data is divided into three parts:
- C-state profiling data
- P-state profiling data
- Scheduler performance profiling data
* Hardware Environment
- Platform: 96boards Hikey
- SoC: Hi6220, 2 clusters, 4xCA53 CPUs in each cluster
- CPU clock: all 8 CPUs in the 2 clusters share the same clock source, which supports 208MHz/432MHz/729MHz/960MHz/1200MHz
- Supports CPU and cluster level low power modes
* Software Environment
- Kernel (4.2 + EAS RFCv5) [1]
  https://github.com/Leo-Yan/linux/tree/profile_easv5_hikey
- ARM-TF [2]
  https://github.com/96boards/arm-trusted-firmware/tree/hikey
- Enable CPUIdle with PSCI
- Enable CPUFreq with cpufreq-dt driver
- Profiling scripts:
  calc_idle_diff.py [3]: calculates the C-state differences between configurations
  calc_pstate_time.py [4]: calculates the P-state differences between configurations
  calc_sched_preformance.py [5]: calculates scheduler performance
* Conventions
The following conventions are used in the tables below:

CLS0: Cluster 0
CLS1: Cluster 1
WFI:  CPU WFI state
C2:   CPU power down state
M2:   Cluster power down state
DC:   Duty cycle
Configuration  | Mainline  EASv5  Enable        CPUFreq   CPUFreq
               |                  ENERGY_AWARE  ondemand  sched
-------------- | --------  -----  ------------  --------  -------
Mainline (ndm) | Yes       No     No            Yes       No
noEAS (ndm)    | Yes       Yes    No            Yes       No
EAS (ndm)      | Yes       Yes    Yes           Yes       No
EAS (sched)    | Yes       Yes    Yes           No        Yes
* Profiling: C-state & P-state
The detailed profiling results have been uploaded to GitHub [6]; I manually reformatted the statistics below to make them more readable.
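The C-state tables in this section mix s, ms and us units, which makes the rows awkward to compare by eye. Below is a minimal sketch for normalising a cell to seconds and combining a baseline with a delta. It assumes that the noEAS/EAS columns are differences relative to the Mainline (ndm) column; the helper names are hypothetical and are not taken from calc_idle_diff.py.

```python
import re

# Unit suffixes as they appear in the C-state tables, mapped to seconds.
_UNITS = {"s": 1.0, "ms": 1e-3, "us": 1e-6}


def parse_duration(text):
    """Convert a table cell such as '445.06ms', '-83us' or '+2.5s' to seconds."""
    match = re.fullmatch(r"([+-]?\d+(?:\.\d+)?)(s|ms|us)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNITS[unit]


def absolute_duty_cycle(baseline, delta):
    """Return baseline + delta in seconds (assumes delta is relative to baseline)."""
    return parse_duration(baseline) + parse_duration(delta)


# CLS1 M2 for the MP3 case: Mainline (ndm) 19.24s, EAS (sched) +10.7s.
print(f"{absolute_duty_cycle('19.24s', '+10.7s'):.2f} s")
```

Because the deltas in the tables are rounded, a result computed this way is only as precise as the coarsest cell involved.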
- Case MP3:
C-state DC   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
CLS0: WFI    1.17s            -1.0s         -1.1s       -1.1s
CLS0: C2     2.10s            -1.3s         -1.7s       -1.9s
CLS0: M2     25.01s           +2.5s         +3.3s       -4.9s
CLS1: WFI    445.06ms         -179.3ms      -384.1ms    -445.1ms
CLS1: C2     182.95ms         +552.2ms      -183.6ms    -185.9ms
CLS1: M2     19.24s           -1.5s         +3.7s       +10.7s
P-State Statistics

Cluster Level (ms)
-------------------------------------------------------------------------------------------------------
| Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
-------------------------------------------------------------------------------------------------------
| MAINLINE  |      1075.99 |      4803.60 |       105.06 |        37.31 |      5860.00 |   9443367.46 |
| EAS DIS   |      -77.20% |      +57.87% |     +734.47% |     +468.29% |      -32.79% |       -5.80% |
| EAS NDM   |      -83.61% |      -71.82% |     +264.43% |     +947.55% |       +1.84% |      -10.65% |
| EAS SCHED |     +795.42% |     -100.00% |     -100.00% |     -100.00% |      -98.56% |      -77.71% |
-------------------------------------------------------------------------------------------------------

CPU Level (ms)
-------------------------------------------------------------------------------------------------------
| Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
-------------------------------------------------------------------------------------------------------
| MAINLINE  |      1131.04 |      4814.53 |       116.39 |        29.15 |      5931.09 |   9545269.85 |
| EAS DIS   |      -74.57% |      +59.29% |     +655.51% |     +632.27% |      -32.44% |       -5.43% |
| EAS NDM   |      -80.69% |      -71.87% |     +235.64% |    +1215.77% |       +0.69% |      -11.48% |
| EAS SCHED |     +755.65% |     -100.00% |     -100.00% |     -100.00% |      -98.57% |      -77.84% |
-------------------------------------------------------------------------------------------------------
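The Cycles column in the P-state table above appears to be the sum of the time spent at each OPP multiplied by that OPP's frequency. The sketch below is not taken from calc_pstate_time.py; it is only a check under that assumption, and it reproduces the MAINLINE cluster level value of 9443367.46 for the MP3 case. The function name is hypothetical.

```python
# OPP frequencies in MHz, from the Hardware Environment section.
FREQS_MHZ = [208, 432, 729, 960, 1200]


def cycles(residency_ms):
    """Sum of (time at OPP in ms) * (OPP frequency in MHz).

    ms * MHz equals thousands of cycles, so the result is in kilocycles.
    """
    return sum(t * f for t, f in zip(residency_ms, FREQS_MHZ))


# MAINLINE row, cluster level, MP3 case.
mp3_cluster = [1075.99, 4803.60, 105.06, 37.31, 5860.00]
print(f"{cycles(mp3_cluster):.2f}")
```

Under this reading the column is a frequency-weighted busy time, so shifting the same residency to a higher OPP raises it even when the total run time stays the same.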
- Case rt-app 6%:
C-state DC   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
CLS0: WFI    7.87s            +204.7ms      +1.1s       +286.0ms
CLS0: C2     3.96s            -272.9ms      -4.0s       -675.7ms
CLS0: M2     9.29ms           -83us         -6.8ms      -4.5ms
CLS1: WFI    8.80s            +50.1ms       +260.2ms    -1.6s
CLS1: C2     1.16s            +38.5ms       -1.1s       -1.1s
CLS1: M2     73.76ms          +51.4ms       +49.7ms     +127.9ms
P-State Statistics

Cluster Level (ms)
-------------------------------------------------------------------------------------------------------
| Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
-------------------------------------------------------------------------------------------------------
| MAINLINE  |        16.58 |     18030.00 |       220.34 |         0.00 |        57.73 |   8022312.50 |
| EAS DIS   |       -1.09% |       +0.89% |      -96.88% |    +1972.00% |      +88.79% |       -0.08% |
| EAS NDM   |       -0.42% |      -99.69% |    +9857.34% |    +2738.00% |      +87.08% |     +101.66% |
| EAS SCHED |     +118.58% |      -97.74% |    +9394.42% |        0.00% |     +310.84% |      +95.94% |
-------------------------------------------------------------------------------------------------------

CPU Level (ms)
-------------------------------------------------------------------------------------------------------
| Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
-------------------------------------------------------------------------------------------------------
| MAINLINE  |        54.72 |     40950.00 |       402.41 |         0.00 |        61.01 |  18068347.05 |
| EAS DIS   |       +4.08% |       +2.27% |      -94.49% |     +988.60% |      +85.60% |       +1.09% |
| EAS NDM   |       -3.58% |      -99.72% |    +5357.03% |    +1942.60% |      +84.48% |      -10.21% |
| EAS SCHED |      +38.23% |      -98.75% |    +5370.45% |        0.00% |     +340.36% |       -8.09% |
-------------------------------------------------------------------------------------------------------
- Case rt-app 13%:
C-state DC   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
CLS0: WFI    8.79s            -100.1ms      -3.7s       -4.3s
CLS0: C2     2.76s            -1.3s         -2.8s       -2.7s
CLS0: M2     9.71ms           -4.8ms        -3.0ms      -1.6ms
CLS1: WFI    8.87s            +19.0ms       -3.9s       -8.5s
CLS1: C2     1.57s            -918.8ms      -1.5s       -1.5s
CLS1: M2     72.32ms          +7.0ms        +120.8ms    +20.2s
P-State Statistics

Cluster Level (ms)
-------------------------------------------------------------------------------------------------------
| Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
-------------------------------------------------------------------------------------------------------
| MAINLINE  |        17.03 |      2730.00 |     15310.00 |         0.00 |        59.01 |  12414704.24 |
| EAS DIS   |       -2.52% |     +382.78% |      -52.91% |        0.00% |       +2.14% |      -11.19% |
| EAS NDM   |       -0.29% |      -99.27% |      -98.28% | +2703000.00% |    +4628.01% |     +137.63% |
| EAS SCHED |    +3068.94% |      -91.27% |      -84.86% | +1078619.00% |    +6612.15% |      +37.03% |
-------------------------------------------------------------------------------------------------------

CPU Level (ms)
-------------------------------------------------------------------------------------------------------
| Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
-------------------------------------------------------------------------------------------------------
| MAINLINE  |        55.77 |      8880.00 |     37850.00 |         0.00 |        66.17 |  31519814.16 |
| EAS DIS   |       +0.22% |     +378.04% |      -52.76% |        0.00% |       +5.32% |       -0.16% |
| EAS NDM   |       -0.02% |      -99.13% |      -99.02% | +2705652.00% |    +4169.43% |       -5.84% |
| EAS SCHED |     +937.83% |      -91.40% |      -85.80% | +1954704.00% |    +9605.80% |       -2.15% |
-------------------------------------------------------------------------------------------------------
- Case rt-app 19%:
C-state DC   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
CLS0: WFI    9.33s            -422.5ms      -5.1s       -5.2s
CLS0: C2     312.17ms         +29.7ms       -301.8ms    -303.5ms
CLS0: M2     8.01ms           +5us          -3.7ms      -5.5ms
CLS1: WFI    8.50s            -200.8ms      -4.1s       -7.8s
CLS1: C2     675.16ms         -194.5ms      -659.0ms    -563.9ms
CLS1: M2     120.55ms         +2.9ms        -41.3ms     +18.7s
P-State Statistics

Cluster Level (ms)
-------------------------------------------------------------------------------------------------------
| Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
-------------------------------------------------------------------------------------------------------
| MAINLINE  |        16.57 |      3170.00 |     18020.00 |        19.87 |       110.11 |  14660673.76 |
| EAS DIS   |       +1.51% |      +57.41% |       -6.66% |     +802.47% |       +4.98% |       +0.49% |
| EAS NDM   |       +4.59% |     -100.00% |      -85.46% |  +143432.96% |     +218.05% |     +102.67% |
| EAS SCHED |     +517.32% |      -95.68% |      -80.73% |   +57897.68% |    +1565.83% |       +8.29% |
-------------------------------------------------------------------------------------------------------

CPU Level (ms)
-------------------------------------------------------------------------------------------------------
| Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
-------------------------------------------------------------------------------------------------------
| MAINLINE  |        54.95 |     12210.00 |     51480.00 |         9.84 |       120.12 |  42968660.00 |
| EAS DIS   |       +4.55% |      +55.61% |       -5.48% |    +5073.68% |      +13.30% |       +3.20% |
| EAS NDM   |       -1.69% |     -100.00% |      -90.22% |  +403660.16% |     +310.75% |       -1.29% |
| EAS SCHED |     +437.62% |      -96.49% |      -79.72% |  +307381.50% |    +3399.13% |       -2.38% |
-------------------------------------------------------------------------------------------------------
- Case rt-app 25%:
C-state DC   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
CLS0: WFI    9.37s            +55.2ms       -3.4s       -4.6s
CLS0: C2     13.79ms          -3.4ms        +23.4ms     -2.4ms
CLS0: M2     10.33ms          -2.8ms        -6.4ms      -8.5ms
CLS1: WFI    9.28s            +109.2ms      -3.3s       -8.3s
CLS1: C2     15.87ms          -12.4ms       +49.4ms     +77.8ms
CLS1: M2     73.39ms          +51.7ms       +99.5ms     +17.2s
P-State Statistics

Cluster Level (ms)
-------------------------------------------------------------------------------------------------------
| Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
-------------------------------------------------------------------------------------------------------
| MAINLINE  |        16.56 |         0.00 |     21330.00 |        10.37 |        57.41 |  15631861.68 |
| EAS DIS   |       -1.75% |    +1926.00% |       -0.80% |       +1.54% |     +110.80% |       -0.25% |
| EAS NDM   |      +10.69% |        0.00% |      -95.58% |  +233651.21% |    +5021.06% |      +75.86% |
| EAS SCHED |     +293.18% |   +10204.00% |      -99.76% |  +120440.02% |    +8984.65% |      +17.41% |
-------------------------------------------------------------------------------------------------------

CPU Level (ms)
-------------------------------------------------------------------------------------------------------
| Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
-------------------------------------------------------------------------------------------------------
| MAINLINE  |        55.26 |         0.00 |     74930.00 |        29.43 |        58.58 |  54734011.68 |
| EAS DIS   |       +0.11% |    +6195.50% |       +2.00% |      -63.97% |     +153.11% |       +2.21% |
| EAS NDM   |      -11.35% |        0.00% |      -97.28% |  +162964.90% |    +9599.76% |       -0.64% |
| EAS SCHED |      +90.25% |   +40570.00% |      -99.73% |  +125517.36% |   +24717.17% |       -2.66% |
-------------------------------------------------------------------------------------------------------
- Case rt-app 31%:
C-state DC   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
CLS0: WFI    8.39s            +238.5ms      +192.0ms    -4.3s
CLS0: C2     3.20ms           -762us        +1.6ms      +82.3ms
CLS0: M2     7.33ms           -3.7ms        -965us      -2.8ms
CLS1: WFI    8.39s            +246.4ms      +207.0ms    -4.7s
CLS1: C2     2.48ms           -2.1ms        -491us      +458.0ms
CLS1: M2     122.45ms         +150.1ms      -2.9ms      +9.9s
P-State Statistics

Cluster Level (ms)
-------------------------------------------------------------------------------------------------------
| Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
-------------------------------------------------------------------------------------------------------
| MAINLINE  |        16.80 |         0.00 |     23140.00 |        86.87 |       109.07 |  17086833.60 |
| EAS DIS   |       -1.19% |        0.00% |       -1.82% |      -70.00% |     +136.66% |       -1.09% |
| EAS NDM   |       -2.44% |        0.00% |       -1.38% |      -88.81% |       +1.65% |       -1.79% |
| EAS SCHED |    +3041.90% |        0.00% |      -85.91% |     +114.55% |   +18860.30% |      +60.83% |
-------------------------------------------------------------------------------------------------------

CPU Level (ms)
-------------------------------------------------------------------------------------------------------
| Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
-------------------------------------------------------------------------------------------------------
| MAINLINE  |        56.27 |         0.00 |     90510.00 |       260.88 |       119.29 |  66387086.96 |
| EAS DIS   |       -1.28% |        0.00% |       -0.99% |      -77.14% |     +117.20% |       -1.03% |
| EAS NDM   |       -1.76% |        0.00% |       -0.53% |      -96.21% |       +3.29% |       -0.88% |
| EAS SCHED |     +909.24% |        0.00% |      -89.73% |     +101.70% |   +41110.50% |       +0.00% |
-------------------------------------------------------------------------------------------------------
- Case rt-app 38%:
C-state DC   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
CLS0: WFI    7.50s            -233.6ms      -3.1s       -4.1s
CLS0: C2     2.04ms           -1.0ms        +20.4ms     +33.3ms
CLS0: M2     7.73ms           -2.3ms        -2.9ms      -1.1ms
CLS1: WFI    7.51s            -232.5ms      -2.9s       -3.4s
CLS1: C2     1.65ms           +108us        +26.0ms     +1.5s
CLS1: M2     70.35ms          +48.7ms       +4.5ms      +7.3s
P-State Statistics

Cluster Level (ms)
-------------------------------------------------------------------------------------------------------
| Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
-------------------------------------------------------------------------------------------------------
| MAINLINE  |        16.86 |         0.00 |     13610.00 |     11410.00 |        60.01 |  20950808.88 |
| EAS DIS   |       -4.74% |        0.00% |       +1.98% |       +1.67% |      +85.52% |       +2.10% |
| EAS NDM   |       -1.25% |        0.00% |      -26.67% |      +24.80% |   +11348.09% |      +39.34% |
| EAS SCHED |    +3510.50% |        0.00% |      -57.83% |      -99.39% |   +33561.06% |      +36.59% |
-------------------------------------------------------------------------------------------------------

CPU Level (ms)
-------------------------------------------------------------------------------------------------------
| Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
-------------------------------------------------------------------------------------------------------
| MAINLINE  |        55.54 |         0.00 |     53470.00 |     44830.00 |        71.27 |  82113506.32 |
| EAS DIS   |       -0.32% |        0.00% |       +0.97% |       +1.43% |      +83.58% |       +1.30% |
| EAS NDM   |       -2.57% |        0.00% |      -42.40% |       -9.99% |   +25170.10% |       +0.85% |
| EAS SCHED |    +1070.72% |        0.00% |      -66.71% |      -99.53% |   +77534.35% |       -2.93% |
-------------------------------------------------------------------------------------------------------
- Case rt-app 44%:
C-state DC   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
CLS0: WFI    5.58s            +148.0ms      -2.1s       -2.0s
CLS0: C2     3.00ms           +462us        +3.4ms      +31.7ms
CLS0: M2     8.36ms           +1.4ms        -4.4ms      +84us
CLS1: WFI    5.58s            +161.4ms      -685.8ms    -148.4ms
CLS1: C2     3.66ms           -3.7ms        +6.2ms      +896.1ms
CLS1: M2     66.94ms          +8.2ms        +60.1ms     +2.8s
P-State Statistics

Cluster Level (ms)
-------------------------------------------------------------------------------------------------------
| Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
-------------------------------------------------------------------------------------------------------
| MAINLINE  |        16.31 |         0.00 |     15990.00 |     12870.00 |        63.64 |  24091670.48 |
| EAS DIS   |       -2.58% |        0.00% |       -2.44% |       +0.62% |      +11.94% |       -0.82% |
| EAS NDM   |       +0.86% |        0.00% |      -80.86% |      +96.27% |    +5258.27% |      +26.91% |
| EAS SCHED |    +2343.72% |    +5931.20% |      -75.42% |      -82.34% |   +36213.64% |      +36.51% |
-------------------------------------------------------------------------------------------------------

CPU Level (ms)
-------------------------------------------------------------------------------------------------------
| Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
-------------------------------------------------------------------------------------------------------
| MAINLINE  |        54.46 |         0.00 |     63200.00 |     50700.00 |        84.69 |  94857755.68 |
| EAS DIS   |       -3.43% |        0.00% |       -3.34% |       -0.10% |      +23.86% |       -1.65% |
| EAS NDM   |       +0.28% |        0.00% |      -81.79% |      +51.87% |   +10561.96% |       -1.79% |
| EAS SCHED |     +707.29% |    +5931.20% |      -79.65% |      -87.02% |   +73887.48% |       -4.07% |
-------------------------------------------------------------------------------------------------------
- Case rt-app 50%:
C-state DC   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
CLS0: WFI    6.28s            +113.9ms      -197.9ms    +301.9ms
CLS0: C2     1.40ms           +2.3ms        +1.7ms      +3.9ms
CLS0: M2     8.25ms           -5.1ms        -7.6ms      -5.2ms
CLS1: WFI    6.29s            +111.4ms      -120.5ms    +297.2ms
CLS1: C2     2.47ms           -2.5ms        -2.2ms      -2.5ms
CLS1: M2     118.94ms         -44.1ms       -116.6ms    +538.7ms
P-State Statistics

Cluster Level (ms)
-------------------------------------------------------------------------------------------------------
| Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
-------------------------------------------------------------------------------------------------------
| MAINLINE  |        16.58 |         0.00 |         0.57 |     27440.00 |       113.26 |  26482179.82 |
| EAS DIS   |       -4.34% |        0.00% |    +6781.74% |       -0.98% |      -31.93% |       -1.04% |
| EAS NDM   |       +3.08% |        0.00% |   +13155.65% |       +0.91% |      +21.72% |       +1.23% |
| EAS SCHED |     +433.29% |        0.00% |     -100.00% |       -7.51% |    +1694.80% |       +1.29% |
-------------------------------------------------------------------------------------------------------

CPU Level (ms)
-------------------------------------------------------------------------------------------------------
| Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
-------------------------------------------------------------------------------------------------------
| MAINLINE  |        55.02 |         0.00 |        10.64 |    108480.00 |       135.34 | 104322408.72 |
| EAS DIS   |       -3.56% |        0.00% |     +818.70% |       -1.13% |       -2.61% |       -1.08% |
| EAS NDM   |       -3.13% |        0.00% |    +2034.87% |       -1.62% |     +160.34% |       -1.22% |
| EAS SCHED |     +131.86% |        0.00% |     -100.00% |       -8.49% |    +4018.46% |       -2.21% |
-------------------------------------------------------------------------------------------------------
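The very large percentages in the P-state tables are easier to judge once converted back to milliseconds. The sketch below assumes that each percentage row is a change relative to the MAINLINE row of the same table. It deliberately returns nothing for a zero baseline, because how the original script derives a percentage in that case is not stated here. The function name is hypothetical.

```python
def absolute_from_delta(baseline, delta_percent):
    """Recover an absolute value from a MAINLINE value and a relative delta.

    Returns None when the baseline is zero, because a percentage of zero
    carries no recoverable absolute value.
    """
    if baseline == 0:
        return None
    return baseline * (1.0 + delta_percent / 100.0)


# MP3 case, cluster level, 208MHz: MAINLINE 1075.99 ms, EAS SCHED +795.42%.
print(absolute_from_delta(1075.99, 795.42))
# rt-app 13%, cluster level, 960MHz: MAINLINE 0.00 ms, so nothing to recover.
print(absolute_from_delta(0.00, 2703000.00))
```

For the MP3 example this gives roughly 9.6 s at 208MHz for EAS SCHED, which fits the observation in the summary that EAS (sched) mostly sits at the lowest OPP in that case.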
* Profiling: performance
sysbench --test=cpu --num-threads=1 --max-time=30 run:
Response Time   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
min             1.67ms           +00.00ms      +00.00ms    +00.00ms
avg             1.68ms           +00.00ms      +00.00ms    +00.00ms
max             2.35ms           +04.54ms      +05.32ms    +7.77ms
approx 95%      1.70ms           +00.00ms      +00.00ms    -00.02ms
rt-app performance is calculated with the formula below:

  task performance = slack / (c_period - c_run) * 1024
Case         Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
             prf              prf           prf         prf
rt-app 6%    665              663           630         606
rt-app 13%   696              634           474         348
rt-app 19%   665              663           630         606
rt-app 25%   650              627           391         243
rt-app 31%   587              599           596         396
rt-app 38%   574              562           403         212
rt-app 44%   458              487           399         177
rt-app 50%   478              590           572         616
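The rt-app performance formula above can be written as a small helper. This is a minimal sketch that assumes slack, c_period and c_run all come from the rt-app log in the same time unit; the sample numbers are hypothetical and are not taken from these measurements.

```python
def task_performance(slack, c_period, c_run):
    """Performance index: slack / (c_period - c_run) * 1024."""
    idle_budget = c_period - c_run
    if idle_budget <= 0:
        raise ValueError("c_period must be greater than c_run")
    return slack / idle_budget * 1024


# Hypothetical sample, in microseconds: a 100 ms period with 25 ms of
# configured run time and 50 ms of measured slack.
print(round(task_performance(slack=50_000, c_period=100_000, c_run=25_000)))
```

With this definition a task that finishes exactly within its configured run time scores 1024, and the score drops as the task takes longer and its slack shrinks.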
* Summary
- With EAS (sched), the CPUIdle duty cycle is much better than with the other configurations. Comparing the CPU idle duty cycle against the P-state statistics shows that when CPU frequency scaling is driven by the scheduler, the policy for increasing frequency is more aggressive. CPUs therefore finish tasks more quickly, which in turn gives a better CPU idle duty cycle.
- EAS (sched) also shows a much higher cycle count at cluster level. This can be explained by the scheduler placing tasks on a single CPU rather than spreading them: the CPUs still need to run for almost the same total time, but the cluster runs for longer because the tasks run sequentially on fewer CPUs.
- With EAS (sched), the CPUs mainly run at the lowest OPP (208MHz) or at the higher OPPs (960MHz or 1.2GHz); with the mainline kernel and the ondemand governor, the CPUs often run at the middle OPPs (432MHz or 729MHz).
- EAS (ndm) spends much less time in the cluster level idle state than EAS (sched). The detailed CPU level idle data shows that EAS (ndm) spreads tasks across the two clusters, whereas EAS (sched) keeps tasks on a single cluster as far as possible. So EAS (sched) can power off cluster 1 most of the time, while EAS (ndm) spreads tasks to two CPUs that sit in separate clusters.
- EAS (ndm) and EAS (sched) give a much better CPU level idle duty cycle than noEAS (ndm). After EAS_FEATURE is enabled, the CPUs stay in low power modes for much longer than with noEAS (ndm), although the cluster level idle duty cycle does not show this. EAS (ndm) and EAS (sched) also let the CPUs run at higher OPPs than noEAS (ndm).
- The rt-app 6% case is special: EAS (ndm) and EAS (sched) spread tasks across the two clusters, so there is no improvement in the cluster level idle duty cycle.
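The point in the summary about packing tasks on fewer CPUs can be illustrated with a toy model. The sketch assumes that a cluster counts as busy whenever at least one of its CPUs is busy, so cluster busy time is the union of the per-CPU busy intervals; the workload numbers are hypothetical and are not taken from the traces.

```python
def union_length(intervals):
    """Total length covered by a list of (start, end) intervals."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Disjoint from the running interval: close it and start anew.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or touching: extend the running interval.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


def busy_times(per_cpu_intervals):
    """Return (sum of per-CPU busy time, cluster busy time)."""
    cpu_total = sum(union_length(iv) for iv in per_cpu_intervals.values())
    all_intervals = [i for iv in per_cpu_intervals.values() for i in iv]
    return cpu_total, union_length(all_intervals)


# Hypothetical workload: two tasks of 10 ms each.
spread = {"cpu0": [(0, 10)], "cpu1": [(0, 10)]}   # run in parallel
packed = {"cpu0": [(0, 10), (10, 20)]}            # run back to back
print(busy_times(spread))
print(busy_times(packed))
```

In both layouts the summed CPU busy time is the same, but the packed layout keeps the cluster busy for twice as long, which matches the direction of the cluster level numbers for EAS (sched).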
[1] https://github.com/Leo-Yan/linux/tree/profile_easv5_hikey
[2] https://github.com/96boards/arm-trusted-firmware/tree/hikey
[3] https://github.com/Leo-Yan/utility/blob/master/profile_eas/calc_idle_diff.py
[4] https://github.com/Leo-Yan/utility/blob/master/profile_eas/calc_pstate_time....
[5] https://github.com/Leo-Yan/utility/blob/master/profile_eas/calc_sched_prefor...
[6] https://github.com/Leo-Yan/utility/tree/master/profile_eas/hikey_easv5_round...

Thanks,
Leo Yan