Hi all,
This is the third round of profiling for the EASv5 patches on the Hikey board (6th Sept). Any comments and suggestions are welcome.
* Overview
- 3rd round vs 2nd round:
Added two patches on top of EASv5. The first patch selects the group/CPU with the smaller index when the groups have the same capacity, so tasks are placed into the first cluster in the LITTLE.LITTLE case. The second patch fixes the case where the sched domain is already the highest level: its group must be used directly to calculate the shared capacity and the energy difference.
Both patches are enclosed for review.
- 2nd round vs 1st round:
Following the review comments from the eas-dev mailing list, refined the energy model for Hikey and developed several Python/shell scripts for automatic analysis.
The profiling data is therefore divided into three parts:
  - C-state profiling data
  - P-state profiling data
  - Scheduler performance profiling data
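The tie-break described for the first extra patch can be pictured with a small Python sketch. This is only an illustration, not the kernel code: the `Group` type, the `pick_group` helper and the rule "prefer the lowest capacity, then the lowest CPU index" are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    """Hypothetical stand-in for a sched group (not a kernel structure)."""
    first_cpu: int   # lowest CPU index in the group
    capacity: int    # group capacity, e.g. 1024 per cluster


def pick_group(groups):
    """Pick a target group; on equal capacity, the lower CPU index wins.

    Assumption: lower capacity is preferred first, and the CPU index is
    only used as the tie-break.
    """
    return min(groups, key=lambda g: (g.capacity, g.first_cpu))


# LITTLE.LITTLE: both clusters have the same capacity, so the first
# cluster (CPUs 0-3) is chosen regardless of the order in the list.
clusters = [Group(first_cpu=4, capacity=1024), Group(first_cpu=0, capacity=1024)]
print(pick_group(clusters).first_cpu)
```

With two clusters of equal capacity the choice is deterministic, which matches the intent of always packing onto the first cluster.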
* Hardware Environment
- Platform: 96boards Hikey
- SoC: Hi6220, 2 clusters, 4xCA53 CPUs in each cluster
- CPU clock: both clusters (8 CPUs) share the same clock source and support 208MHz/432MHz/729MHz/960MHz/1200MHz
- Supports CPU-level and cluster-level low power modes
* Software Environment
- Kernel (4.2 + EAS RFCv5) + two extra patches [1]
- ARM-TF [2]
- Enable CPUIdle with PSCI
- Enable CPUFreq with cpufreq-dt driver
- Profiling scripts:
  calc_idle_diff.py [3]: calculates the C-state difference between configurations
  calc_pstate_time.py [4]: calculates the P-state difference between configurations
  calc_sched_preformance.py [5]: calculates scheduler performance
* Conventions
The tables below use the following conventions:

CLS0: Cluster 0
CLS1: Cluster 1
WFI:  CPU WFI state
C2:   CPU power down state
M2:   Cluster power down state
DC:   Duty cycle

Configuration   | Mainline  EASv5  Enable        CPUFreq   CPUFreq
                |                  ENERGY_AWARE  ondemand  sched
--------------- | --------  -----  ------------  --------  -------
Mainline (ndm)  | Yes       No     No            Yes       No
noEAS (ndm)     | Yes       Yes    No            Yes       No
EAS (ndm)       | Yes       Yes    Yes           Yes       No
EAS (sched)     | Yes       Yes    Yes           No        Yes
* Profiling: C-state & P-state
The detailed profiling results have been uploaded to GitHub [6].
- Case MP3:
C-state DC     Mainline (ndm)  noEAS (ndm)  EAS (ndm)  EAS (sched)
clusterA: WFI        104.68ms      -35.3ms    +36.6ms      -65.0ms
clusterA: C2            2.11s        -2.0s      -1.8s     -973.0ms
clusterA: M2           26.45s        -6.7s      -4.5s       -15.7s
clusterB: WFI          3.07ms       -3.1ms     -3.1ms       -3.1ms
clusterB: C2          98.88ms      -56.4ms    -98.9ms      -88.8ms
clusterB: M2           19.52s        +9.0s     +10.4s       +10.3s
P-State Statistics
-------------------------------------------------------------------------------------------------------
| Cluster Level (ms)                                                                                  |
-------------------------------------------------------------------------------------------------------
| Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
-------------------------------------------------------------------------------------------------------
| MAINLINE  |       870.39 |      4872.34 |        93.79 |        27.43 |      5880.00 |   9436597.71 |
| EAS DIS   |       +2.87% |       -6.05% |      -14.43% |       -1.53% |        0.00% |       -1.40% |
| EAS NDM   |      -91.80% |      -78.45% |     +146.91% |    +1384.29% |       -0.16% |      -14.46% |
| EAS SCHED |    +1771.55% |      -79.00% |     -100.00% |     -100.00% |      -84.15% |      -47.56% |
-------------------------------------------------------------------------------------------------------
| CPU Level (ms)                                                                                      |
-------------------------------------------------------------------------------------------------------
| Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
-------------------------------------------------------------------------------------------------------
| MAINLINE  |       919.34 |      4882.11 |        95.01 |        27.20 |      5888.56 |   9461934.73 |
| EAS DIS   |       +6.26% |       -3.38% |      -11.19% |       +0.18% |       +0.94% |       -0.01% |
| EAS NDM   |      -88.31% |      -78.24% |     +155.38% |    +1374.49% |       -0.23% |      -14.47% |
| EAS SCHED |    +1684.96% |      -78.96% |     -100.00% |     -100.00% |      -84.14% |      -47.40% |
-------------------------------------------------------------------------------------------------------
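A note on reading the "Cycles" column: judging from the numbers alone (this is inferred from the table, not taken from calc_pstate_time.py), the MAINLINE cluster-level value appears to be the sum of residency (ms) times frequency (MHz) over all OPPs, which gives units of 1000 cycles. The sketch below redoes that sum for the MP3 cluster-level MAINLINE row so it can be compared with the 9436597.71 printed above. The name `total_kcycles` is made up for the example; the CPU-level rows only match this sum approximately.

```python
# Hikey OPP frequencies in MHz, as listed in the hardware section.
FREQS_MHZ = (208, 432, 729, 960, 1200)


def total_kcycles(residency_ms):
    """Sum residency (ms) x frequency (MHz); 1 ms x 1 MHz = 1000 cycles."""
    if len(residency_ms) != len(FREQS_MHZ):
        raise ValueError("expected one residency value per OPP")
    return sum(t * f for t, f in zip(residency_ms, FREQS_MHZ))


# MAINLINE row, MP3 case, cluster level (values in ms, copied from the table).
mainline_mp3_cluster = (870.39, 4872.34, 93.79, 27.43, 5880.00)
print(f"{total_kcycles(mainline_mp3_cluster):.2f}")  # compare with the Cycles column
```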
- Case rt-app 6%:
C-state DC     Mainline (ndm)  noEAS (ndm)  EAS (ndm)  EAS (sched)
clusterA: WFI           8.99s     -425.4ms      -4.9s     -462.3ms
clusterA: C2            2.18s     +810.8ms      -2.2s        -2.2s
clusterA: M2           9.20ms       -5.6ms     -6.0ms       -4.4ms
clusterB: WFI           8.69s     +135.2ms      -8.6s        -8.7s
clusterB: C2            1.46s     -229.9ms      -1.5s        -1.4s
clusterB: M2         191.43ms       -304us     +19.8s       +21.2s
P-State Statistics
-------------------------------------------------------------------------------------------------------
| Cluster Level (ms)                                                                                  |
-------------------------------------------------------------------------------------------------------
| Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
-------------------------------------------------------------------------------------------------------
| MAINLINE  |        16.73 |     18640.00 |        42.41 |         9.79 |       176.25 |   8307775.13 |
| EAS DIS   |       -2.99% |       -1.29% |      -76.22% |       +1.71% |       +0.16% |       -1.53% |
| EAS NDM   |      -26.18% |      -99.72% |      +35.65% |  +161267.72% |      -62.97% |      +84.31% |
| EAS SCHED |    +2939.63% |     -100.00% |   +26837.28% |     +150.46% |     +486.62% |      +16.74% |
-------------------------------------------------------------------------------------------------------
| CPU Level (ms)                                                                                      |
-------------------------------------------------------------------------------------------------------
| Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
-------------------------------------------------------------------------------------------------------
| MAINLINE  |        56.28 |     42010.00 |        76.46 |         9.86 |       181.47 |  18442998.06 |
| EAS DIS   |       -4.92% |       -1.33% |      -85.96% |       -2.70% |       +0.50% |       -1.57% |
| EAS NDM   |      -59.28% |      -99.66% |      +14.01% |  +160382.48% |      -61.11% |      -16.44% |
| EAS SCHED |     +879.42% |     -100.00% |   +28572.37% |     +355.98% |     +489.88% |       -5.52% |
-------------------------------------------------------------------------------------------------------
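The very large percentages in these tables are easier to judge as absolute times. Assuming the non-MAINLINE rows are percentage deltas against the MAINLINE row of the same table (an assumption based on the layout, not confirmed by the scripts), a tiny baseline turns into a huge percentage: +161267.72% on a 9.79 ms baseline is roughly 15.8 s. The helper name `absolute_residency_ms` below is made up for the example. Columns whose MAINLINE value is printed as 0.00 cannot be converted this way, since the baseline is lost to rounding.

```python
def absolute_residency_ms(baseline_ms, delta_percent):
    """Convert a percentage delta against the MAINLINE row into milliseconds.

    Assumes the delta is relative to baseline_ms: value = base * (1 + pct/100).
    """
    return baseline_ms * (1.0 + delta_percent / 100.0)


# rt-app 6%, cluster level, 960MHz: MAINLINE 9.79 ms, EAS NDM +161267.72%.
print(f"{absolute_residency_ms(9.79, 161267.72):.1f} ms")

# rt-app 6%, cluster level, 432MHz: MAINLINE 18640.00 ms, EAS NDM -99.72%.
print(f"{absolute_residency_ms(18640.00, -99.72):.1f} ms")
```

Read this way, the rt-app 6% EAS (ndm) run appears to move most of its time from 432MHz to 960MHz, rather than spending thousands of times longer running.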
- Case rt-app 13%:
C-state DC     Mainline (ndm)  noEAS (ndm)  EAS (ndm)  EAS (sched)
clusterA: WFI           7.08s      -25.3ms      -3.3s        -2.3s
clusterA: C2            4.11s      -15.6ms      -4.1s        -4.1s
clusterA: M2           6.80ms       -3.4ms     -695us      +11.0ms
clusterB: WFI           6.47s     -329.8ms      -6.5s        -6.5s
clusterB: C2            4.13s     +497.4ms      -4.1s        -4.1s
clusterB: M2          71.32ms     +121.8ms     +20.0s       +21.3s
P-State Statistics
-------------------------------------------------------------------------------------------------------
| Cluster Level (ms)                                                                                  |
-------------------------------------------------------------------------------------------------------
| Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
-------------------------------------------------------------------------------------------------------
| MAINLINE  |        16.74 |      9730.00 |      8510.00 |         0.00 |        58.70 |  10481071.92 |
| EAS DIS   |       -2.03% |      -10.89% |      +10.69% |    +1864.00% |     +190.19% |       +3.41% |
| EAS NDM   |       +0.78% |      -99.77% |      -98.47% | +1602036.00% |     +222.13% |      +49.94% |
| EAS SCHED |    +3304.36% |      -81.57% |      -79.87% | +1004144.00% |    +4555.06% |      +43.69% |
-------------------------------------------------------------------------------------------------------
| CPU Level (ms)                                                                                      |
-------------------------------------------------------------------------------------------------------
| Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
-------------------------------------------------------------------------------------------------------
| MAINLINE  |        52.98 |     32160.00 |     24740.00 |         0.00 |        64.87 |  32017443.84 |
| EAS DIS   |       +3.59% |       -6.69% |      +10.02% |    +3186.00% |     +164.31% |       +3.24% |
| EAS NDM   |       +3.68% |      -99.78% |      -98.58% | +3109608.80% |     +275.04% |       -4.92% |
| EAS SCHED |    +1057.87% |      -78.78% |      -75.17% | +1954311.10% |    +5900.74% |       -3.22% |
-------------------------------------------------------------------------------------------------------
- Case rt-app 19%:
C-state DC     Mainline (ndm)  noEAS (ndm)  EAS (ndm)  EAS (sched)
clusterA: WFI           8.59s     +730.0ms      -5.2s        -4.6s
clusterA: C2         795.08ms     -479.8ms   -792.0ms     -791.9ms
clusterA: M2           6.67ms       +1.4ms     -5.0ms       -510us
clusterB: WFI           6.78s        +1.7s      -6.7s        -6.8s
clusterB: C2            2.77s        -2.1s      -2.8s        -2.8s
clusterB: M2          73.12ms       +2.9ms     +20.0s       +21.3s
P-State Statistics
-------------------------------------------------------------------------------------------------------
| Cluster Level (ms)                                                                                  |
-------------------------------------------------------------------------------------------------------
| Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
-------------------------------------------------------------------------------------------------------
| MAINLINE  |        16.91 |      6330.00 |     14440.00 |       321.43 |        58.54 |  13643658.08 |
| EAS DIS   |       -1.54% |      -42.34% |      +21.88% |     -100.00% |       -0.07% |       +6.14% |
| EAS NDM   |       -3.08% |     -100.00% |      -99.19% |    +5041.93% |     +206.22% |      +18.52% |
| EAS SCHED |    +3458.49% |      -98.23% |      -80.94% |    +3764.90% |    +2791.53% |      +18.27% |
-------------------------------------------------------------------------------------------------------
| CPU Level (ms)                                                                                      |
-------------------------------------------------------------------------------------------------------
| Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
-------------------------------------------------------------------------------------------------------
| MAINLINE  |        55.47 |     23900.00 |     45480.00 |       944.74 |        64.38 |  44475464.16 |
| EAS DIS   |       +0.25% |      -40.50% |       +9.78% |     -100.00% |       +2.95% |       -4.14% |
| EAS NDM   |       -3.62% |     -100.00% |      -99.21% |    +4529.65% |     +185.78% |       -4.48% |
| EAS SCHED |    +1466.81% |      -98.46% |      -78.07% |    +3391.27% |    +4333.30% |       -4.00% |
-------------------------------------------------------------------------------------------------------
- Case rt-app 25%:
C-state DC     Mainline (ndm)  noEAS (ndm)  EAS (ndm)  EAS (sched)
clusterA: WFI           9.41s      +80.7ms      -5.2s        -3.1s
clusterA: C2           6.62ms       +2.5ms     -2.6ms       -386us
clusterA: M2           9.02ms       -5.5ms     -1.8ms      +12.0ms
clusterB: WFI           9.35s      +72.1ms      -9.3s        -9.3s
clusterB: C2          11.49ms       +208us     +2.0ms      +40.2ms
clusterB: M2         121.57ms      +67.8ms     +19.8s       +20.4s
P-State Statistics
-------------------------------------------------------------------------------------------------------
| Cluster Level (ms)                                                                                  |
-------------------------------------------------------------------------------------------------------
| Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
-------------------------------------------------------------------------------------------------------
| MAINLINE  |        16.86 |         0.00 |     21240.00 |         9.90 |       109.76 |  15628682.88 |
| EAS DIS   |       -5.69% |    +1592.00% |       -0.85% |      +95.05% |      +64.24% |       -0.20% |
| EAS NDM   |       +3.80% |        0.00% |      -62.71% |   +27272.12% |    +4797.52% |       -5.11% |
| EAS SCHED |    +3999.88% |      +62.70% |      -97.79% |  +132936.06% |      -57.80% |      -15.63% |
-------------------------------------------------------------------------------------------------------
| CPU Level (ms)                                                                                      |
-------------------------------------------------------------------------------------------------------
| Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
-------------------------------------------------------------------------------------------------------
| MAINLINE  |        55.41 |         0.00 |     76010.00 |         9.64 |       122.04 |  55578519.60 |
| EAS DIS   |       -3.66% |    +6163.00% |       -2.84% |     +318.59% |      +66.44% |       -2.56% |
| EAS NDM   |       +0.43% |        0.00% |      -59.18% |   +91362.04% |   +16748.26% |       +0.34% |
| EAS SCHED |    +1258.89% |     +159.70% |      -99.03% |  +136755.93% |      -34.05% |      -75.79% |
-------------------------------------------------------------------------------------------------------
- Case rt-app 31%:
C-state DC     Mainline (ndm)  noEAS (ndm)  EAS (ndm)  EAS (sched)
clusterA: WFI           8.64s     -193.4ms     -9.0ms        -6.2s
clusterA: C2           2.23ms       -1.0ms     +3.8ms       +2.2ms
clusterA: M2           8.42ms       +1.2ms     -1.4ms       -5.0ms
clusterB: WFI           8.64s     -210.4ms    -11.2ms        -7.7s
clusterB: C2             n.a.       +1.6ms     +4.9ms      +79.7ms
clusterB: M2         190.32ms       -439us   -119.2ms       +18.9s
P-State Statistics
-------------------------------------------------------------------------------------------------------
| Cluster Level (ms)                                                                                  |
-------------------------------------------------------------------------------------------------------
| Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
-------------------------------------------------------------------------------------------------------
| MAINLINE  |        16.49 |         0.00 |     22710.00 |        18.56 |       181.44 |  16794565.52 |
| EAS DIS   |       +0.12% |        0.00% |       +1.81% |      -46.77% |       +0.23% |       +1.73% |
| EAS NDM   |       +2.55% |        0.00% |       +0.18% |     -100.00% |      -61.71% |       -0.73% |
| EAS SCHED |     +320.07% |        0.00% |      -95.70% |    +4179.96% |    +9229.78% |      +29.82% |
-------------------------------------------------------------------------------------------------------
| CPU Level (ms)                                                                                      |
-------------------------------------------------------------------------------------------------------
| Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
-------------------------------------------------------------------------------------------------------
| MAINLINE  |        55.15 |         0.00 |     89550.00 |         9.60 |       205.88 |  65549695.12 |
| EAS DIS   |       -3.25% |        0.00% |       +1.82% |       +0.33% |       +0.66% |       +1.81% |
| EAS NDM   |       -0.25% |        0.00% |       +0.45% |     -100.00% |      -51.82% |       +0.24% |
| EAS SCHED |     +102.30% |        0.00% |      -96.24% |   +30805.83% |   +23883.23% |       -1.48% |
-------------------------------------------------------------------------------------------------------
- Case rt-app 38%:
C-state DC     Mainline (ndm)  noEAS (ndm)  EAS (ndm)  EAS (sched)
clusterA: WFI           7.38s      -68.4ms      -3.2s        -4.7s
clusterA: C2           3.81ms       +3.0ms     +7.7ms       +4.4ms
clusterA: M2           8.30ms       -6.5ms     -5.4ms       -2.6ms
clusterB: WFI           7.38s       -6.7ms      -2.6s        -3.6s
clusterB: C2            655us       -655us   +191.1ms     +447.0ms
clusterB: M2          71.73ms      +52.2ms      +8.7s       +10.5s
P-State Statistics
-------------------------------------------------------------------------------------------------------
| Cluster Level (ms)                                                                                  |
-------------------------------------------------------------------------------------------------------
| Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
-------------------------------------------------------------------------------------------------------
| MAINLINE  |        16.43 |         0.00 |     13740.00 |     11510.00 |        70.25 |  21153777.44 |
| EAS DIS   |       +1.77% |        0.00% |       +0.44% |       +0.09% |      +70.23% |       +0.53% |
| EAS NDM   |       -0.79% |        0.00% |      -90.94% |      -72.37% |   +25593.95% |      +21.13% |
| EAS SCHED |    +2751.25% |        0.00% |      -74.16% |      -99.45% |   +29707.83% |      +31.77% |
-------------------------------------------------------------------------------------------------------
| CPU Level (ms)                                                                                      |
-------------------------------------------------------------------------------------------------------
| Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
-------------------------------------------------------------------------------------------------------
| MAINLINE  |        55.33 |         0.00 |     53790.00 |     44650.00 |       103.91 |  82213110.64 |
| EAS DIS   |       +0.56% |        0.00% |        0.00% |       -1.81% |      +50.12% |       -0.87% |
| EAS NDM   |       -1.66% |        0.00% |      -91.35% |      -75.88% |   +53561.82% |       -1.89% |
| EAS SCHED |     +821.76% |        0.00% |      -77.45% |      -99.53% |   +55525.06% |       -4.51% |
-------------------------------------------------------------------------------------------------------
- Case rt-app 44%:
C-state DC     Mainline (ndm)  noEAS (ndm)  EAS (ndm)  EAS (sched)
clusterA: WFI           5.62s     +123.1ms      -4.0s        -1.7s
clusterA: C2           4.22ms       +933us     -606us       -1.6ms
clusterA: M2           3.77ms        +86us     -697us       +2.6ms
clusterB: WFI           5.61s      +87.6ms    -46.5ms     -859.0ms
clusterB: C2           1.36ms       -1.4ms    +48.7ms      +80.8ms
clusterB: M2          71.19ms      +10.7ms      +4.1s        +2.4s
P-State Statistics
-------------------------------------------------------------------------------------------------------
| Cluster Level (ms)                                                                                  |
-------------------------------------------------------------------------------------------------------
| Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
-------------------------------------------------------------------------------------------------------
| MAINLINE  |        15.51 |        19.62 |     15660.00 |     12720.00 |       451.61 |  24180973.92 |
| EAS DIS   |       +5.74% |     -100.00% |       +0.19% |       -5.82% |     +115.02% |       -0.30% |
| EAS NDM   |      +13.73% |     -100.00% |      -74.27% |      +44.65% |    +1343.72% |      +17.57% |
| EAS SCHED |     +210.57% |     -100.00% |      +30.46% |      -90.64% |    +1664.80% |       +5.91% |
-------------------------------------------------------------------------------------------------------
| CPU Level (ms)                                                                                      |
-------------------------------------------------------------------------------------------------------
| Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
-------------------------------------------------------------------------------------------------------
| MAINLINE  |        51.45 |        76.43 |     61390.00 |     50060.00 |      1482.15 |  94633209.36 |
| EAS DIS   |       +6.03% |     -100.00% |       -1.24% |       -6.07% |     +129.76% |       -1.26% |
| EAS NDM   |      +12.05% |     -100.00% |      -74.51% |      +15.60% |    +1314.79% |       -2.64% |
| EAS SCHED |      +72.01% |     -100.00% |      +27.51% |      -91.65% |    +1547.61% |       -4.47% |
-------------------------------------------------------------------------------------------------------
- Case rt-app 50%:
C-state DC     Mainline (ndm)  noEAS (ndm)  EAS (ndm)  EAS (sched)
clusterA: WFI           6.34s     +227.7ms    +67.6ms     +304.1ms
clusterA: C2           3.35ms       -1.5ms     -2.0ms       -1.7ms
clusterA: M2           7.73ms       -1.6ms     -1.9ms       -3.1ms
clusterB: WFI           6.34s     +231.2ms    +69.2ms     +308.0ms
clusterB: C2             n.a.         n.a.     +2.0ms       +2.2ms
clusterB: M2         188.71ms      -64.3ms   -120.2ms        +1.3s
P-State Statistics
-------------------------------------------------------------------------------------------------------
| Cluster Level (ms)                                                                                  |
-------------------------------------------------------------------------------------------------------
| Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
-------------------------------------------------------------------------------------------------------
| MAINLINE  |        16.81 |         0.00 |         9.95 |     27340.00 |       169.39 |  26460416.57 |
| EAS DIS   |       +0.54% |        0.00% |     +208.91% |       -1.79% |      -26.08% |       -1.92% |
| EAS NDM   |       -3.51% |        0.00% |      +98.23% |       -0.55% |      -63.03% |       -1.00% |
| EAS SCHED |    +2926.71% |        0.00% |     -100.00% |       -2.93% |     +587.69% |       +1.97% |
-------------------------------------------------------------------------------------------------------
| CPU Level (ms)                                                                                      |
-------------------------------------------------------------------------------------------------------
| Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
-------------------------------------------------------------------------------------------------------
| MAINLINE  |        55.44 |         0.00 |        10.62 |    108060.00 |       170.45 | 103961414.23 |
| EAS DIS   |       +0.41% |        0.00% |     +743.42% |       -1.43% |       +2.25% |       -1.37% |
| EAS NDM   |       -1.75% |        0.00% |      +90.32% |       -0.92% |      -50.47% |       -1.01% |
| EAS SCHED |     +894.37% |        0.00% |     -100.00% |       -3.08% |     +865.46% |       -1.28% |
-------------------------------------------------------------------------------------------------------
* Profiling: performance
sysbench --test=cpu --num-threads=1 --max-time=10 run
rt-app performance is calculated with the formula below:
  task performance = slack / (c_period - c_run) * 1024
energy        mainline (ndm)  noeas (ndm)  eas (ndm)  eas (sched)
                         prf          prf        prf          prf
sysbench                 100          100        100           92

rt-app 6%                662          665        393          615
rt-app 13%               648          645        465          394
rt-app 19%               610          648        479           57
rt-app 25%               649          664        306          518
rt-app 31%               600          585        596          366
rt-app 38%               576          584        259         -166
rt-app 44%               466          487         30         -349
rt-app 50%               583          602        598          612
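The rt-app formula above can be written as a few lines of Python. This is only a sketch of the arithmetic, not the code in calc_sched_preformance.py; the function name `rt_app_performance` is made up, and it assumes slack, c_period and c_run are per-activation values in the same time unit.

```python
def rt_app_performance(slack, c_period, c_run):
    """task performance = slack / (c_period - c_run) * 1024.

    All three arguments must use the same time unit (assumed here).
    """
    idle_budget = c_period - c_run
    if idle_budget <= 0:
        raise ValueError("c_period must be larger than c_run")
    return slack / idle_budget * 1024


# The task finishes with all of its spare time left: full score.
print(rt_app_performance(slack=600, c_period=1000, c_run=400))

# Negative slack gives a negative score.
print(rt_app_performance(slack=-100, c_period=1000, c_run=600))
```

Under this formula a score of 1024 means the task kept its whole idle budget. A negative score needs negative slack, which would be consistent with the -166 and -349 entries for EAS (sched) if those runs overran their period.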
* Summary
- After applying the two extra patches, the profiling results are consistent and stable for EAS (ndm) and EAS (sched). Tasks are placed into the first cluster in the LITTLE.LITTLE case, so EAS (ndm) and EAS (sched) achieve a much better cluster-level idle duty cycle than noEAS (ndm).
- If tasks are placed on only one cluster, the CPU-level cycles do not change much, but the cluster-level cycles increase considerably with EAS (ndm) and EAS (sched); after packing tasks into one cluster, that cluster runs for a longer time.
- The "sched" governor is more aggressive than the "ondemand" governor: CPUs readily run at the highest OPP (1.2GHz) and the lowest OPP (208MHz). With the "ondemand" governor, CPUs more often run at the middle OPPs (729MHz or 960MHz).
- The rt-app 31% case needs investigation: its CPU/cluster idle duty cycle is abnormal.
- The rt-app 25% case needs investigation: it is abnormal for the "ondemand" governor, which stays at the 1.2GHz OPP much longer than in the other configurations.
[1] https://github.com/Leo-Yan/linux/tree/profile_easv5_hikey_round3
[2] https://github.com/96boards/arm-trusted-firmware/tree/hikey
[3] https://github.com/Leo-Yan/utility/blob/master/profile_eas/calc_idle_diff.py
[4] https://github.com/Leo-Yan/utility/blob/master/profile_eas/calc_pstate_time....
[5] https://github.com/Leo-Yan/utility/blob/master/profile_eas/calc_sched_prefor...
[6] https://github.com/Leo-Yan/utility/tree/master/profile_eas/hikey_easv5_round...

Thanks,
Leo Yan