Hi Leo,
Interesting analysis. You didn't explain what EAS (ndm) and EAS (sched) stand for. :)
Also, regarding the clock topology, can you confirm that each cluster of 4 CPUs can be scaled (DVFS) independently?
Regards, Amit
On Thu, Aug 13, 2015 at 10:26 AM, Leo Yan leo.yan@linaro.org wrote:
Hi all,
Below is my first pass at profiling EAS; please help review, and any suggestions or questions are welcome.
Purpose
This is the first round of profiling for the EASv5 patches on the Hikey board. From it we get the following results and feedback for EAS development:
- Created the profiling environment for ARM64
- Collected behavior data after applying the EASv5 patches on an SoC with two CA53 clusters
- I cannot measure hardware power consumption, so currently I _ONLY_ check the CPU duty cycle to compare scheduler behavior
Hardware Environment
- Platform: 96boards Hikey
- SoC: Hi6220, 2 clusters, 4xCA53 CPUs in each cluster
- CPU clock: the 2 clusters (8 CPUs) share a coupled clock source and support 208MHz/432MHz/729MHz/960MHz/1200MHz
- Support CPU and cluster level low power mode
Software Environment
- Kernel: 4.2-rc4 + EAS RFCv5
- ARM-TF: [1]
- Enabled CPUIdle with PSCI
- Enabled CPUFreq with cpufreq-dt driver
Profiling Data
CLS0:   Cluster 0
CLS1:   Cluster 1
CPU_PD: CPU power-down state
CLS_PD: Cluster power-down state
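The residency figures in the tables below mix units (us, ms and s). For anyone who wants to post-process these numbers, a small helper like the following normalizes them to milliseconds; this is an illustrative sketch I am adding for convenience, not part of the measurement setup:

```python
# Sketch: convert the mixed-unit residency values quoted in the tables
# (e.g. '+2.9s', '-000.41ms', '-613us', 'n.a.') into milliseconds so
# rows can be compared directly.  Illustrative helper only.

def to_ms(value):
    """Return the value in milliseconds, or None for 'n.a.'."""
    value = value.strip()
    if value == 'n.a.':
        return None
    # Check 'us' and 'ms' before the bare 's' suffix, since 's' would
    # otherwise match all three.
    for suffix, scale in (('us', 1e-3), ('ms', 1.0), ('s', 1e3)):
        if value.endswith(suffix):
            return float(value[:-len(suffix)]) * scale
    raise ValueError('unknown unit in %r' % value)

for sample in ('+2.9s', '-000.41ms', '-613us', 'n.a.'):
    print(sample, '->', to_ms(sample))
```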
- Case Sysbench: sysbench --test=cpu --cpu-max-prime=20000 run
  Response Time   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
  min             4.19ms           +00.00ms      +00.00ms    +00.00ms
  avg             4.21ms           +00.00ms      +00.00ms    -00.01ms
  max             6.86ms           +00.09ms      +00.04ms    +13.61ms
  approx 95%      4.23ms           +00.00ms      +00.00ms    -00.02ms
  Idle Dutycycle   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
  CLS0: WFI        000.20ms         +000.35ms     +000.34ms   +000.89ms
  CLS0: CPU_PD     001.58ms         +000.76ms     +000.88ms   +001.40ms
  CLS0: CLS_PD     001.82ms         -000.41ms     -000.12ms   +001.10ms
  CLS1: WFI        n.a.             n.a.          +2.9s       +000.07ms
  CLS1: CPU_PD     n.a.             n.a.          +001.30ms   +6.7s
  CLS1: CLS_PD     42.11s           +003.8ms      -2.9s       -6.8s
- Case MP3: ./idlestat --trace -f mp3_trace.log -t 30 -p -c -w -o mp3_report.log -- rt-app ./doc/examples/mp3-long.json
  Idle Dutycycle   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
  CLS0: WFI        067.31ms         -022.10ms     -022.20ms   +019.70ms
  CLS0: CPU_PD     887.74ms         +919.80ms     -292.40ms   -316.90ms
  CLS0: CLS_PD     17.08s           -444.80ms     +895.30ms   +5.3s
  CLS1: WFI        000.59ms         +002.30ms     +000.28ms   +196.70ms
  CLS1: CPU_PD     n.a.             +000.26ms     n.a.        +004.00ms
  CLS1: CLS_PD     28.80s           +189.10ms     -269.40ms   -10.6s
- Case rt-app 6%: ./idlestat --trace -f trace.log -t 30 -p -c -w -o report.log -- rt-app ./doc/examples/rt-app-6.json
  Idle Dutycycle   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
  CLS0: WFI        7.82s            +037.40ms     -7.1s       -7.3s
  CLS0: CPU_PD     4.26s            -154.20ms     -3.5s       -4.2s
  CLS0: CLS_PD     005.76ms         n.a.          +17.6s      +18.9s
  CLS1: WFI        6.46s            +2.0s         +2.4s       -155.90ms
  CLS1: CPU_PD     6.01s            -1.9s         -5.3s       -6.0s
  CLS1: CLS_PD     123.76ms         -118.50ms     -121.90ms   +1.2s
- Case rt-app 13%: ./idlestat --trace -f trace.log -t 30 -p -c -w -o report.log -- rt-app ./doc/examples/rt-app-13.json
  Idle Dutycycle   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
  CLS0: WFI        9.26s            -304.70ms     -8.8s       -4.2s
  CLS0: CPU_PD     1.11s            -695.20ms     -275.2ms    -1.1s
  CLS0: CLS_PD     003.49ms         -001.30ms     +18.3s      -613us
  CLS1: WFI        8.32s            -8.3s         -3.4s       -8.3s
  CLS1: CPU_PD     2.65s            -2.6s         -2.6s       -2.6s
  CLS1: CLS_PD     123.07ms         +19.9s        -121.6ms    +20.3s
- Case rt-app 19%: ./idlestat --trace -f trace.log -t 30 -p -c -w -o report.log -- rt-app ./doc/examples/rt-app-19.json
  Idle Dutycycle   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
  CLS0: WFI        8.91s            -256.70ms     -473.4ms    -2.7s
  CLS0: CPU_PD     428.42ms         +000.86ms     -425.5ms    -356.20ms
  CLS0: CLS_PD     002.28ms         +000.45ms     +1.1ms      n.a.
  CLS1: WFI        6.20s            +224.80ms     -6.2s       -5.2s
  CLS1: CPU_PD     4.02s            -193.20ms     -4.0s       -4.0s
  CLS1: CLS_PD     073.93ms         +1.3s         +20.0s      +039.80ms
- Case rt-app 25%: ./idlestat --trace -f trace.log -t 30 -p -c -w -o report.log -- rt-app ./doc/examples/rt-app-25.json
  Idle Dutycycle   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
  CLS0: WFI        9.60s            -170.90ms     -3.5s       -3.4s
  CLS0: CPU_PD     025.02ms         -018.90ms     -023.20ms   -023.50ms
  CLS0: CLS_PD     004.43ms         -001.80ms     -003.50ms   +004.40ms
  CLS1: WFI        9.72s            -1.1s         -9.6s       -9.2s
  CLS1: CPU_PD     239.13ms         +1.7s         -237.90ms   -152.00ms
  CLS1: CLS_PD     075.45ms         +001.00ms     +19.9s      +20.1s
- Case rt-app 31%: ./idlestat --trace -f trace.log -t 30 -p -c -w -o report.log -- rt-app ./doc/examples/rt-app-31.json
  Idle Dutycycle   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
  CLS0: WFI        8.54s            +108.10ms     -5.6s       -189.20ms
  CLS0: CPU_PD     001.97ms         +001.70ms     -001.40ms   +005.50ms
  CLS0: CLS_PD     003.24ms         -002.30ms     +003.00ms   +001.90ms
  CLS1: WFI        8.56s            +108.60ms     -8.6s       -260.00ms
  CLS1: CPU_PD     n.a.             +001.70ms     n.a.        +199.60ms
  CLS1: CLS_PD     189.15ms         -000.28ms     +19.9s      +449.60ms
- Case rt-app 38%: ./idlestat --trace -f trace.log -t 30 -p -c -w -o report.log -- rt-app ./doc/examples/rt-app-38.json
  Idle Dutycycle   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
  CLS0: WFI        8.77s            +695.00ms     -7.0s       -4.9s
  CLS0: CPU_PD     000.96ms         +001.50ms     +001.70ms   -000.01ms
  CLS0: CLS_PD     003.06ms         -000.59ms     -002.50ms   +002.30ms
  CLS1: WFI        8.79s            +913.70ms     -8.8s       -8.8s
  CLS1: CPU_PD     001.71ms         +151.30ms     -001.70ms   -000.45ms
  CLS1: CLS_PD     123.32ms         -120.60ms     +19.6s      +20.4s
- Case rt-app 44%: ./idlestat --trace -f trace.log -t 30 -p -c -w -o report.log -- rt-app ./doc/examples/rt-app-44.json
  Idle Dutycycle   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
  CLS0: WFI        5.76s            +101.20ms     -4.6s       -4.5s
  CLS0: CPU_PD     003.07ms         -000.18ms     -001.70ms   -001.80ms
  CLS0: CLS_PD     002.02ms         -001.20ms     +000.23ms   +001.30ms
  CLS1: WFI        6.01s            +108.60ms     -5.9s       -5.9s
  CLS1: CPU_PD     001.14ms         +001.20ms     +000.07ms   +000.65ms
  CLS1: CLS_PD     190.75ms         -115.50ms     +19.6s      +19.6s
- Case rt-app 50%: ./idlestat --trace -f trace.log -t 30 -p -c -w -o report.log -- rt-app ./doc/examples/rt-app-50.json
  Idle Dutycycle   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
  CLS0: WFI        6.62s            -267.20ms     -6.6s       -6.6s
  CLS0: CPU_PD     n.a.             +001.80ms     +001.50ms   +000.72ms
  CLS0: CLS_PD     001.47ms         +001.30ms     -000.33ms   +005.10ms
  CLS1: WFI        6.89s            -348.20ms     -6.9s       -6.9s
  CLS1: CPU_PD     001.63ms         -001.50ms     -001.60ms   -001.60ms
  CLS1: CLS_PD     001.87ms         +123.50ms     +19.9s      +20.2s
Summary
For this case of two identical clusters, EAS (ndm) shows the best results of the four configurations. We get the most benefit in CPU/cluster duty cycle with EAS (ndm): in almost all cases it greatly increases the cluster power-down time, which means the scheduler has optimized load balancing within one cluster rather than spreading tasks across both clusters.
EAS (sched) is not consistent across the cases; for the MP3 case it is even worse than mainline. For the rt-app 6%/13%/25%/38%/44%/50% cases it behaves almost the same as EAS (ndm), but it is not stable for the rt-app 19%/31% cases.
EAS (sched) introduces very high latency for sysbench: the max response time reaches 20.47ms (6.86ms + 13.61ms), much higher than the other three configurations.
EAS (sched) and EAS (ndm) both affect idle-state selection in CPUIdle, increasing the likelihood of entering CPU-level states rather than cluster-level states; this result comes from the sysbench case.
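The shift from cluster-level to CPU-level idle states can be quantified by computing each state's share of a cluster's total idle residency. A minimal sketch (the numbers below are illustrative, taken from the sysbench CLS0 mainline column; substitute the idlestat figures for the configuration of interest):

```python
# Sketch: given per-state idle residencies for one cluster (in ms),
# compute each state's share of the cluster's total idle time.
# Input values here are only an example; feed in idlestat output.

def idle_shares(residency_ms):
    """Map each idle state to its fraction of total idle residency."""
    total = sum(residency_ms.values())
    return {state: t / total for state, t in residency_ms.items()}

shares = idle_shares({'WFI': 0.20, 'CPU_PD': 1.58, 'CLS_PD': 1.82})
for state in ('WFI', 'CPU_PD', 'CLS_PD'):
    print('%s: %.1f%%' % (state, 100 * shares[state]))
```

Comparing these shares between configurations makes the CPU-level vs cluster-level trend visible without eyeballing absolute residencies.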
[1] https://github.com/Leo-Yan/arm-trusted-firmware/tree/hikey_enable_low_power_...
Thanks,
Leo Yan

_______________________________________________
eas-dev mailing list
eas-dev@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/eas-dev