eas-dev

eas-dev@lists.linaro.org


Re: [Eas-dev] Per CPU deadline scheduler bandwidth

by Steve Muckle

On 10/12/2015 06:55 AM, Vincent Guittot wrote:
>> For the RT story, are you thinking to use rq->rt_avg in some way?
>
> Yes that was my goal but the deadline is also accounted in rq->rt_avg
> which is then used in scale_rt_capacity. I was planning to remove the
> deadline class from rq->rt_avg and to use the per cpu deadline
> bandwidth in scale_rt_capacity instead

If the deadline contribution to rq->rt_avg is left in place, can RT and
deadline both be accounted for via that one mechanism for the purposes
of sched-dvfs?

I'm not sure how the accuracy/behavior of the deadline accounting to
rq->rt_avg differs from the forthcoming patchset Juri mentioned, but if
it's not as good, perhaps the mechanisms could be combined so that
rt_avg (and by extension the per-CPU capacity adjustments in CFS)
benefit as well.

9 years, 8 months

Thoughts and Questions For EAS Energy Model

by Leo Yan

Hi all,

Below are some thoughts and questions after reviewing EAS's energy model. My purpose is to get the energy model clear from the user's perspective, so the questions below focus _ONLY_ on the model and do not dig into the implementation.

This email is relatively long, but I think formulas will get us on the same page easily. So I list the energy model's formulas first, then try to match them against TC2's power data and raise some questions. I look forward to your suggestions and comments.

* Basic Energy and Power Calculation Formulas

From Documentation/scheduler/sched-energy.txt, the energy can be calculated with:

  Energy [J] = Power [W] * Time [s]                                   (F.1)

Assume there is a piece of code with a fixed number of instructions to be executed on the CPU; the execution time depends on the CPU's pipeline and the CPU's frequency. So F.1 can be converted to F.2:

                                Code [instructions]
  Energy [J] = Power [W] * ------------------------------
                            (Inst Per Cycle) * Frequency

                                Code [instructions]
             = Power [W] * ------------------------------             (F.2)
                                      MIPS(f)
                                        `-> 'f' is factor of frequency

Because MIPS(f) can be normalized as the CPU's capacity at the corresponding OPP, F.2 simply converts to F.3:

                                Code [instructions]
  Energy [J] = Power [W] * ------------------------------             (F.3)
                                  CPU_Capacity(f)

Power [W] can be broken down into two parts, static power (leakage) and dynamic power:

  Power [W] = Ps [W] + Pd [W]                                         (F.4)

Static power can be calculated with the formula below:

  Ps [W] = i * V [V]                                                  (F.5)
           `-> 'i' is a coefficient depending on the silicon process
               V [V] is the voltage for the OPP

Dynamic power can be calculated with the formula below:

  Pd [W] = b * V [V] * V [V] * frequency                              (F.6)
           `-> 'b' is a coefficient depending on the silicon process
               V [V] is the voltage for the OPP

There are two special cases. If the island's clock is gated, then Pd [W] = 0, so:

  Power [W] = Ps [W]                                                  (F.7)

If the island is powered off, then Ps [W] = 0 and Pd [W] = 0, so:

  Power [W] = 0                                                       (F.8)

So the energy can be calculated as (from F.3 and F.4):

                                     Code [instructions]
  Energy [J] = (Ps [W] + Pd [W]) * ----------------------             (F.9)
                                       CPU_Capacity(f)

* Formulas for duty cycle

We separate the logic (cluster or CPU) into two kinds of state: P-state and C-state. They have different power data, because once the logic enters a C-state it is clock gated or powered off. So if we look at a relatively long stretch of the time axis, we need to calculate the CPU's utilization percentage (if the CPU is running all the time, util = 100%).

Let's simplify the ratio between "Code [instructions]" and "CPU_Capacity(f)" as the utilization, so the energy calculation can be depicted as:

              Code [instructions]
  Util(f) = -----------------------                                   (F.10)
                CPU_Capacity(f)

  Energy [J] = Power_Pstate [W] * Util(f) +
               Power_Cstate [W] * (1 - Util(f))                       (F.11)

  Energy [J] = Sum(i=0..MAX_OPP)(Power_Pstate [W](i) * Util_OPP(i)) +
               Sum(i=0..MAX_IDL)(Power_Cstate [W](i) * Util_IDL(i))   (F.12)

  where Sum(i=0..MAX_OPP)Util_OPP(i) + Sum(i=0..MAX_IDL)Util_IDL(i) = 1

* Formulas for clusters

  Energy [J] = Energy_cluster [J] +
               Sum(i=0..MAX_CPU_PER_CLUSTER)Energy_cpu(i) [J]         (F.13)

  Energy_cluster [J] =
               Sum(i=0..MAX_OPP)(Power_Pstate [W](i) * Util_OPP(i)) +
               Sum(i=WFI, ClusterOff)(Power_Cstate [W](i) * Util_IDL(i))  (F.14)

  Energy_cpu [J] =
               Sum(i=0..MAX_OPP)(Power_Pstate [W](i) * Util_OPP(i)) +
               Sum(i=WFI, CPUOff)(Power_Cstate [W](i) * Util_IDL(i))  (F.15)

* Thoughts and Questions

- Let's summarize EAS's energy model as below:

  CPU::capacity_state::power : CPU's power [W] for a specific OPP
    Power(OPP) = Ps [W] + Pd [W]

  CPU::idle_state::power : CPU's power [W] for a specific idle state
    Power(IDLE_WFI)    = Ps [W]
    Power(IDLE_CPUOff) = 0

  The CPU's IDLE_WFI means the CPU is clock gated, so it has static
  leakage but no dynamic power.

  CLUSTER::capacity_state::power : Cluster's power [W] for a specific OPP
    Power(OPP) = Ps [W] + Pd [W]

  CLUSTER::idle_state::power : Cluster's power [W] for a specific idle state
    Power(IDLE_WFI)    = Ps [W] + Pd [W]
    Power(IDLE_CLSOff) = 0

  The cluster's IDLE_WFI is quite special: it means all CPUs in the
  cluster have been powered off, but the cluster's logic (L2$, SCU, etc.)
  is powered on and its clock is enabled, so it includes the cluster
  level's static power and dynamic power.

  Do these formulas match the original design?

- TC2's data for the cluster's sleep states:

  static struct idle_state idle_states_cluster_a7[] = {
          { .power = 25 }, /* WFI */
          { .power = 10 }, /* cluster-sleep-l */
  };

  static struct idle_state idle_states_cluster_a15[] = {
          { .power = 70 }, /* WFI */
          { .power = 25 }, /* cluster-sleep-b */
  };

  For the cluster-level sleep state, the clock is gated and the domain is
  powered off, so both the dynamic power and the static leakage should be
  zero, right?

- TC2's data for the CPU's idle states:

  static struct idle_state idle_states_core_a7[] = {
          { .power = 0 }, /* WFI */
  };

  static struct idle_state idle_states_core_a15[] = {
          { .power = 0 }, /* WFI */
  };

  The CPU has two idle states, 'WFI' and 'C2'. For the 'WFI' state the
  power should not be zero: 'WFI' means internal clock gating, so
  according to F.7 there should still be static leakage.

  BTW, for TC2 there is no idle state corresponding to 'C2', which is
  weird. Could you confirm it has been deliberately removed?

- TC2's data for P-states:

  static struct capacity_state cap_states_cluster_a7[] = {
          /* Cluster only power */
          { .cap = 150, .power = 2967, }, /* 350 MHz */
          [...]
  };

  static struct capacity_state cap_states_core_a7[] = {
          /* Power per cpu */
          { .cap = 150, .power = 187, }, /* 350 MHz */
          [...]
  };

  From previous experience, the CPU-level power is much higher than the
  cluster-level power. For example, for CA7, if only the cluster is
  powered on (all CPUs in the cluster are powered off), the power delta is
  ~10mA@156MHz; if one CPU is powered on, the power delta is about
  30mA@156MHz. I also checked the data for CA53, and it shows a similar
  result.

  So this conflicts with TC2's power data, where the cluster-level power
  is quite high (almost 15 times the CPU level). This would mean we can
  hardly get any benefit from the CPU-level low power states, because the
  cluster level contributes most of the power consumption. This does not
  make sense.

- From formula F.4, power is the combination of static leakage and
  dynamic power; IPA also uses static/dynamic power to describe its energy
  model. But EAS uses another way, which provides power data for every OPP
  and idle state. That means on one platform we need to provide two kinds
  of power data.

  IMHO, the static/dynamic split is simpler, because we usually use
  mW/MHz to describe the power efficiency of a specific CPU. mW/MHz is not
  very accurate for power consumption when the voltage changes (see
  formula F.6; usually the voltage is increased at higher frequencies).
  But if we use mW/MHz, the calculation becomes very simple: we only need
  to multiply it by the frequency to get the dynamic power.

  So we would only need to provide the parameters below:

  P-state: static leakage, power efficiency (mW/MHz), capacity (DMIPS/MHz);
  C-state: static leakage, power efficiency (mW/MHz);

  What are your thoughts on unifying the energy model?

Thanks,
Leo Yan
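
For readers who want to play with the formulas above, here is a minimal Python sketch of F.4 to F.8 and of the duty-cycle sum F.12. It is only an illustration, not code from the EAS patches: the function names are made up, and the 25% utilization in the example is an arbitrary assumption. The power numbers in the example are the TC2 A7 values quoted in the mail (350 MHz OPP and WFI).

```python
def static_power(i_coeff, volt):
    """F.5: Ps = i * V."""
    return i_coeff * volt


def dynamic_power(b_coeff, volt, freq):
    """F.6: Pd = b * V * V * frequency."""
    return b_coeff * volt * volt * freq


def total_power(i_coeff, b_coeff, volt, freq,
                clock_gated=False, powered_off=False):
    """F.4, including the special cases F.7 (clock gated) and F.8 (off)."""
    if powered_off:
        return 0.0
    ps = static_power(i_coeff, volt)
    if clock_gated:
        return ps
    return ps + dynamic_power(b_coeff, volt, freq)


def duty_cycle_energy(pstates, cstates):
    """F.12: both arguments are lists of (power, utilization) pairs.

    The utilizations of all P-states and C-states together must sum to 1.
    """
    total_util = sum(u for _, u in pstates) + sum(u for _, u in cstates)
    if abs(total_util - 1.0) > 1e-9:
        raise ValueError("utilizations must sum to 1")
    return (sum(p * u for p, u in pstates) +
            sum(p * u for p, u in cstates))


# TC2 A7 numbers quoted in the mail; the 25% / 75% split is hypothetical.
cpu = duty_cycle_energy([(187, 0.25)], [(0, 0.75)])
cluster = duty_cycle_energy([(2967, 0.25)], [(25, 0.75)])
print(cpu, cluster)
```

With these inputs the cluster term dominates the per-CPU term, which is the imbalance the mail questions.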

9 years, 9 months

EASv5 Profiling On Hikey (6th Sept)

by Leo Yan

Hi all,

This is the third round of profiling for the EASv5 patches on the Hikey board (6th Sept). Any comments and suggestions are welcome.

* Overview

- 3rd round vs 2nd round: Added two patches on top of EASv5. The first
  patch selects the smaller group/cpu if the groups have the same
  capacity, so it places tasks into the first cluster in the
  LITTLE.LITTLE case. The second patch fixes the case where the sched
  domain is already the highest level: its group needs to be used directly
  to calculate the shared capacity and the energy difference. These two
  patches are also enclosed for review.

- 2nd round vs 1st round: According to review comments from the eas-dev
  mailing list, refined the energy model for Hikey, and also developed
  several python/shell scripts for automatic analysis.

So the profiling data is divided into three parts:
- C-state profiling data
- P-state profiling data
- Scheduler performance profiling data

* Hardware Environment

- Platform: 96boards Hikey
- SoC: Hi6220, 2 clusters, 4xCA53 CPUs in each cluster
- CPU clock: the 2 clusters with 8 CPUs share the same clock source and
  support 208MHz/432MHz/729MHz/960MHz/1200MHz
- Supports CPU and cluster level low power modes

* Software Environment

- Kernel (4.2 + EAS RFCv5) + two extra patches [1]
- ARM-TF [2]
- Enable CPUIdle with PSCI
- Enable CPUFreq with cpufreq-dt driver
- Profiling scripts:
  calc_idle_diff.py [3]: calculate C-state's difference for different configurations
  calc_pstate_time.py [4]: calculate P-state's difference for different configurations
  calc_sched_preformance.py [5]: calculate scheduler performance

* Conventions

Below are some conventions used in the tables:

  CLS0: Cluster 0
  CLS1: Cluster 1
  WFI:  CPU WFI state
  C2:   CPU power down state
  M2:   Cluster power down state
  DC:   Duty cycle

  Configuration  | Mainline  EASv5  Enable        CPUFreq   CPUFreq
                 |                  ENERGY_AWARE  ondemand  sched
  -------------- | --------  -----  ------------  --------  -------
  Mainline (ndm) | Yes       No     No            Yes       No
  noEAS (ndm)    | Yes       Yes    No            Yes       No
  EAS (ndm)      | Yes       Yes    Yes           Yes       No
  EAS (sched)    | Yes       Yes    Yes           No        Yes

* Profiling: C-state & P-state

The detailed profiling results have been uploaded to GitHub [6].

- Case MP3:

  C-state DC      Mainline (ndm)  noEAS (ndm)  EAS (ndm)  EAS (sched)
  clusterA: WFI   104.68ms        -35.3ms      +36.6ms    -65.0ms
  clusterA: C2    2.11s           -2.0s        -1.8s      -973.0ms
  clusterA: M2    26.45s          -6.7s        -4.5s      -15.7s
  clusterB: WFI   3.07ms          -3.1ms       -3.1ms     -3.1ms
  clusterB: C2    98.88ms         -56.4ms      -98.9ms    -88.8ms
  clusterB: M2    19.52s          +9.0s        +10.4s     +10.3s

  P-State Statistics, Cluster Level (ms)
  | Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
  |-----------|--------------|--------------|--------------|--------------|--------------|--------------|
  | MAINLINE  |       870.39 |      4872.34 |        93.79 |        27.43 |      5880.00 |   9436597.71 |
  | EAS DIS   |       +2.87% |       -6.05% |      -14.43% |       -1.53% |        0.00% |       -1.40% |
  | EAS NDM   |      -91.80% |      -78.45% |     +146.91% |    +1384.29% |       -0.16% |      -14.46% |
  | EAS SCHED |    +1771.55% |      -79.00% |     -100.00% |     -100.00% |      -84.15% |      -47.56% |

  P-State Statistics, CPU Level (ms)
  | Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
  |-----------|--------------|--------------|--------------|--------------|--------------|--------------|
  | MAINLINE  |       919.34 |      4882.11 |        95.01 |        27.20 |      5888.56 |   9461934.73 |
  | EAS DIS   |       +6.26% |       -3.38% |      -11.19% |       +0.18% |       +0.94% |       -0.01% |
  | EAS NDM   |      -88.31% |      -78.24% |     +155.38% |    +1374.49% |       -0.23% |      -14.47% |
  | EAS SCHED |    +1684.96% |      -78.96% |     -100.00% |     -100.00% |      -84.14% |      -47.40% |

- Case rt-app 6%:

  C-state DC      Mainline (ndm)  noEAS (ndm)  EAS (ndm)  EAS (sched)
  clusterA: WFI   8.99s           -425.4ms     -4.9s      -462.3ms
  clusterA: C2    2.18s           +810.8ms     -2.2s      -2.2s
  clusterA: M2    9.20ms          -5.6ms       -6.0ms     -4.4ms
  clusterB: WFI   8.69s           +135.2ms     -8.6s      -8.7s
  clusterB: C2    1.46s           -229.9ms     -1.5s      -1.4s
  clusterB: M2    191.43ms        -304us       +19.8s     +21.2s

  P-State Statistics, Cluster Level (ms)
  | Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
  |-----------|--------------|--------------|--------------|--------------|--------------|--------------|
  | MAINLINE  |        16.73 |     18640.00 |        42.41 |         9.79 |       176.25 |   8307775.13 |
  | EAS DIS   |       -2.99% |       -1.29% |      -76.22% |       +1.71% |       +0.16% |       -1.53% |
  | EAS NDM   |      -26.18% |      -99.72% |      +35.65% |  +161267.72% |      -62.97% |      +84.31% |
  | EAS SCHED |    +2939.63% |     -100.00% |   +26837.28% |     +150.46% |     +486.62% |      +16.74% |

  P-State Statistics, CPU Level (ms)
  | Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
  |-----------|--------------|--------------|--------------|--------------|--------------|--------------|
  | MAINLINE  |        56.28 |     42010.00 |        76.46 |         9.86 |       181.47 |  18442998.06 |
  | EAS DIS   |       -4.92% |       -1.33% |      -85.96% |       -2.70% |       +0.50% |       -1.57% |
  | EAS NDM   |      -59.28% |      -99.66% |      +14.01% |  +160382.48% |      -61.11% |      -16.44% |
  | EAS SCHED |     +879.42% |     -100.00% |   +28572.37% |     +355.98% |     +489.88% |       -5.52% |

- Case rt-app 13%:

  C-state DC      Mainline (ndm)  noEAS (ndm)  EAS (ndm)  EAS (sched)
  clusterA: WFI   7.08s           -25.3ms      -3.3s      -2.3s
  clusterA: C2    4.11s           -15.6ms      -4.1s      -4.1s
  clusterA: M2    6.80ms          -3.4ms       -695us     +11.0ms
  clusterB: WFI   6.47s           -329.8ms     -6.5s      -6.5s
  clusterB: C2    4.13s           +497.4ms     -4.1s      -4.1s
  clusterB: M2    71.32ms         +121.8ms     +20.0s     +21.3s

  P-State Statistics, Cluster Level (ms)
  | Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
  |-----------|--------------|--------------|--------------|--------------|--------------|--------------|
  | MAINLINE  |        16.74 |      9730.00 |      8510.00 |         0.00 |        58.70 |  10481071.92 |
  | EAS DIS   |       -2.03% |      -10.89% |      +10.69% |    +1864.00% |     +190.19% |       +3.41% |
  | EAS NDM   |       +0.78% |      -99.77% |      -98.47% | +1602036.00% |     +222.13% |      +49.94% |
  | EAS SCHED |    +3304.36% |      -81.57% |      -79.87% | +1004144.00% |    +4555.06% |      +43.69% |

  P-State Statistics, CPU Level (ms)
  | Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
  |-----------|--------------|--------------|--------------|--------------|--------------|--------------|
  | MAINLINE  |        52.98 |     32160.00 |     24740.00 |         0.00 |        64.87 |  32017443.84 |
  | EAS DIS   |       +3.59% |       -6.69% |      +10.02% |    +3186.00% |     +164.31% |       +3.24% |
  | EAS NDM   |       +3.68% |      -99.78% |      -98.58% | +3109608.80% |     +275.04% |       -4.92% |
  | EAS SCHED |    +1057.87% |      -78.78% |      -75.17% | +1954311.10% |    +5900.74% |       -3.22% |

- Case rt-app 19%:

  C-state DC      Mainline (ndm)  noEAS (ndm)  EAS (ndm)  EAS (sched)
  clusterA: WFI   8.59s           +730.0ms     -5.2s      -4.6s
  clusterA: C2    795.08ms        -479.8ms     -792.0ms   -791.9ms
  clusterA: M2    6.67ms          +1.4ms       -5.0ms     -510us
  clusterB: WFI   6.78s           +1.7s        -6.7s      -6.8s
  clusterB: C2    2.77s           -2.1s        -2.8s      -2.8s
  clusterB: M2    73.12ms         +2.9ms       +20.0s     +21.3s

  P-State Statistics, Cluster Level (ms)
  | Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
  |-----------|--------------|--------------|--------------|--------------|--------------|--------------|
  | MAINLINE  |        16.91 |      6330.00 |     14440.00 |       321.43 |        58.54 |  13643658.08 |
  | EAS DIS   |       -1.54% |      -42.34% |      +21.88% |     -100.00% |       -0.07% |       +6.14% |
  | EAS NDM   |       -3.08% |     -100.00% |      -99.19% |    +5041.93% |     +206.22% |      +18.52% |
  | EAS SCHED |    +3458.49% |      -98.23% |      -80.94% |    +3764.90% |    +2791.53% |      +18.27% |

  P-State Statistics, CPU Level (ms)
  | Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
  |-----------|--------------|--------------|--------------|--------------|--------------|--------------|
  | MAINLINE  |        55.47 |     23900.00 |     45480.00 |       944.74 |        64.38 |  44475464.16 |
  | EAS DIS   |       +0.25% |      -40.50% |       +9.78% |     -100.00% |       +2.95% |       -4.14% |
  | EAS NDM   |       -3.62% |     -100.00% |      -99.21% |    +4529.65% |     +185.78% |       -4.48% |
  | EAS SCHED |    +1466.81% |      -98.46% |      -78.07% |    +3391.27% |    +4333.30% |       -4.00% |

- Case rt-app 25%:

  C-state DC      Mainline (ndm)  noEAS (ndm)  EAS (ndm)  EAS (sched)
  clusterA: WFI   9.41s           +80.7ms      -5.2s      -3.1s
  clusterA: C2    6.62ms          +2.5ms       -2.6ms     -386us
  clusterA: M2    9.02ms          -5.5ms       -1.8ms     +12.0ms
  clusterB: WFI   9.35s           +72.1ms      -9.3s      -9.3s
  clusterB: C2    11.49ms         +208us       +2.0ms     +40.2ms
  clusterB: M2    121.57ms        +67.8ms      +19.8s     +20.4s

  P-State Statistics, Cluster Level (ms)
  | Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
  |-----------|--------------|--------------|--------------|--------------|--------------|--------------|
  | MAINLINE  |        16.86 |         0.00 |     21240.00 |         9.90 |       109.76 |  15628682.88 |
  | EAS DIS   |       -5.69% |    +1592.00% |       -0.85% |      +95.05% |      +64.24% |       -0.20% |
  | EAS NDM   |       +3.80% |        0.00% |      -62.71% |   +27272.12% |    +4797.52% |       -5.11% |
  | EAS SCHED |    +3999.88% |      +62.70% |      -97.79% |  +132936.06% |      -57.80% |      -15.63% |

  P-State Statistics, CPU Level (ms)
  | Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
  |-----------|--------------|--------------|--------------|--------------|--------------|--------------|
  | MAINLINE  |        55.41 |         0.00 |     76010.00 |         9.64 |       122.04 |  55578519.60 |
  | EAS DIS   |       -3.66% |    +6163.00% |       -2.84% |     +318.59% |      +66.44% |       -2.56% |
  | EAS NDM   |       +0.43% |        0.00% |      -59.18% |   +91362.04% |   +16748.26% |       +0.34% |
  | EAS SCHED |    +1258.89% |     +159.70% |      -99.03% |  +136755.93% |      -34.05% |      -75.79% |

- Case rt-app 31%:

  C-state DC      Mainline (ndm)  noEAS (ndm)  EAS (ndm)  EAS (sched)
  clusterA: WFI   8.64s           -193.4ms     -9.0ms     -6.2s
  clusterA: C2    2.23ms          -1.0ms       +3.8ms     +2.2ms
  clusterA: M2    8.42ms          +1.2ms       -1.4ms     -5.0ms
  clusterB: WFI   8.64s           -210.4ms     -11.2ms    -7.7s
  clusterB: C2    n.a.            +1.6ms       +4.9ms     +79.7ms
  clusterB: M2    190.32ms        -439us       -119.2ms   +18.9s

  P-State Statistics, Cluster Level (ms)
  | Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
  |-----------|--------------|--------------|--------------|--------------|--------------|--------------|
  | MAINLINE  |        16.49 |         0.00 |     22710.00 |        18.56 |       181.44 |  16794565.52 |
  | EAS DIS   |       +0.12% |        0.00% |       +1.81% |      -46.77% |       +0.23% |       +1.73% |
  | EAS NDM   |       +2.55% |        0.00% |       +0.18% |     -100.00% |      -61.71% |       -0.73% |
  | EAS SCHED |     +320.07% |        0.00% |      -95.70% |    +4179.96% |    +9229.78% |      +29.82% |

  P-State Statistics, CPU Level (ms)
  | Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
  |-----------|--------------|--------------|--------------|--------------|--------------|--------------|
  | MAINLINE  |        55.15 |         0.00 |     89550.00 |         9.60 |       205.88 |  65549695.12 |
  | EAS DIS   |       -3.25% |        0.00% |       +1.82% |       +0.33% |       +0.66% |       +1.81% |
  | EAS NDM   |       -0.25% |        0.00% |       +0.45% |     -100.00% |      -51.82% |       +0.24% |
  | EAS SCHED |     +102.30% |        0.00% |      -96.24% |   +30805.83% |   +23883.23% |       -1.48% |

- Case rt-app 38%:

  C-state DC      Mainline (ndm)  noEAS (ndm)  EAS (ndm)  EAS (sched)
  clusterA: WFI   7.38s           -68.4ms      -3.2s      -4.7s
  clusterA: C2    3.81ms          +3.0ms       +7.7ms     +4.4ms
  clusterA: M2    8.30ms          -6.5ms       -5.4ms     -2.6ms
  clusterB: WFI   7.38s           -6.7ms       -2.6s      -3.6s
  clusterB: C2    655us           -655us       +191.1ms   +447.0ms
  clusterB: M2    71.73ms         +52.2ms      +8.7s      +10.5s

  P-State Statistics, Cluster Level (ms)
  | Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
  |-----------|--------------|--------------|--------------|--------------|--------------|--------------|
  | MAINLINE  |        16.43 |         0.00 |     13740.00 |     11510.00 |        70.25 |  21153777.44 |
  | EAS DIS   |       +1.77% |        0.00% |       +0.44% |       +0.09% |      +70.23% |       +0.53% |
  | EAS NDM   |       -0.79% |        0.00% |      -90.94% |      -72.37% |   +25593.95% |      +21.13% |
  | EAS SCHED |    +2751.25% |        0.00% |      -74.16% |      -99.45% |   +29707.83% |      +31.77% |

  P-State Statistics, CPU Level (ms)
  | Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
  |-----------|--------------|--------------|--------------|--------------|--------------|--------------|
  | MAINLINE  |        55.33 |         0.00 |     53790.00 |     44650.00 |       103.91 |  82213110.64 |
  | EAS DIS   |       +0.56% |        0.00% |        0.00% |       -1.81% |      +50.12% |       -0.87% |
  | EAS NDM   |       -1.66% |        0.00% |      -91.35% |      -75.88% |   +53561.82% |       -1.89% |
  | EAS SCHED |     +821.76% |        0.00% |      -77.45% |      -99.53% |   +55525.06% |       -4.51% |

- Case rt-app 44%:

  C-state DC      Mainline (ndm)  noEAS (ndm)  EAS (ndm)  EAS (sched)
  clusterA: WFI   5.62s           +123.1ms     -4.0s      -1.7s
  clusterA: C2    4.22ms          +933us       -606us     -1.6ms
  clusterA: M2    3.77ms          +86us        -697us     +2.6ms
  clusterB: WFI   5.61s           +87.6ms      -46.5ms    -859.0ms
  clusterB: C2    1.36ms          -1.4ms       +48.7ms    +80.8ms
  clusterB: M2    71.19ms         +10.7ms      +4.1s      +2.4s

  P-State Statistics, Cluster Level (ms)
  | Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
  |-----------|--------------|--------------|--------------|--------------|--------------|--------------|
  | MAINLINE  |        15.51 |        19.62 |     15660.00 |     12720.00 |       451.61 |  24180973.92 |
  | EAS DIS   |       +5.74% |     -100.00% |       +0.19% |       -5.82% |     +115.02% |       -0.30% |
  | EAS NDM   |      +13.73% |     -100.00% |      -74.27% |      +44.65% |    +1343.72% |      +17.57% |
  | EAS SCHED |     +210.57% |     -100.00% |      +30.46% |      -90.64% |    +1664.80% |       +5.91% |

  P-State Statistics, CPU Level (ms)
  | Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
  |-----------|--------------|--------------|--------------|--------------|--------------|--------------|
  | MAINLINE  |        51.45 |        76.43 |     61390.00 |     50060.00 |      1482.15 |  94633209.36 |
  | EAS DIS   |       +6.03% |     -100.00% |       -1.24% |       -6.07% |     +129.76% |       -1.26% |
  | EAS NDM   |      +12.05% |     -100.00% |      -74.51% |      +15.60% |    +1314.79% |       -2.64% |
  | EAS SCHED |      +72.01% |     -100.00% |      +27.51% |      -91.65% |    +1547.61% |       -4.47% |

- Case rt-app 50%:

  C-state DC      Mainline (ndm)  noEAS (ndm)  EAS (ndm)  EAS (sched)
  clusterA: WFI   6.34s           +227.7ms     +67.6ms    +304.1ms
  clusterA: C2    3.35ms          -1.5ms       -2.0ms     -1.7ms
  clusterA: M2    7.73ms          -1.6ms       -1.9ms     -3.1ms
  clusterB: WFI   6.34s           +231.2ms     +69.2ms    +308.0ms
  clusterB: C2    n.a.            n.a.         +2.0ms     +2.2ms
  clusterB: M2    188.71ms        -64.3ms      -120.2ms   +1.3s

  P-State Statistics, Cluster Level (ms)
  | Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
  |-----------|--------------|--------------|--------------|--------------|--------------|--------------|
  | MAINLINE  |        16.81 |         0.00 |         9.95 |     27340.00 |       169.39 |  26460416.57 |
  | EAS DIS   |       +0.54% |        0.00% |     +208.91% |       -1.79% |      -26.08% |       -1.92% |
  | EAS NDM   |       -3.51% |        0.00% |      +98.23% |       -0.55% |      -63.03% |       -1.00% |
  | EAS SCHED |    +2926.71% |        0.00% |     -100.00% |       -2.93% |     +587.69% |       +1.97% |

  P-State Statistics, CPU Level (ms)
  | Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
  |-----------|--------------|--------------|--------------|--------------|--------------|--------------|
  | MAINLINE  |        55.44 |         0.00 |        10.62 |    108060.00 |       170.45 | 103961414.23 |
  | EAS DIS   |       +0.41% |        0.00% |     +743.42% |       -1.43% |       +2.25% |       -1.37% |
  | EAS NDM   |       -1.75% |        0.00% |      +90.32% |       -0.92% |      -50.47% |       -1.01% |
  | EAS SCHED |     +894.37% |        0.00% |     -100.00% |       -3.08% |     +865.46% |       -1.28% |

* Profiling: performance

The sysbench test is run with:

  sysbench --test=cpu --num-threads=1 --max-time=10 run

rt-app performance is calculated with the formula below:

  task performance = slack/(c_period - c_run) * 1024

  energy       mainline (ndm)  noeas (ndm)  eas (ndm)  eas (sched)
               prf             prf          prf        prf
  sysbench     100             100          100        92
  rt-app 6%    662             665          393        615
  rt-app 13%   648             645          465        394
  rt-app 19%   610             648          479        57
  rt-app 25%   649             664          306        518
  rt-app 31%   600             585          596        366
  rt-app 38%   576             584          259        -166
  rt-app 44%   466             487          30         -349
  rt-app 50%   583             602          598        612

* Summary

- After applying the two extra patches, the profiling results are
  consistent and stable for EAS (ndm) and EAS (sched). The tasks are
  placed into the first cluster for LITTLE.LITTLE, so EAS (ndm) and EAS
  (sched) are much better than noEAS (ndm) for the cluster-level idle duty
  cycle.

- If the tasks are placed on only one cluster, the CPU-level cycles do not
  change much, but the cluster-level cycles increase considerably with EAS
  (ndm) and EAS (sched). So after packing tasks into one cluster, the
  cluster level runs for a longer time.

- The "sched" governor is more aggressive than the "ondemand" governor:
  the CPUs easily run at the highest OPP (1.2GHz) and the lowest OPP
  (208MHz). With the "ondemand" governor, the CPUs often run at the middle
  OPPs (729MHz or 960MHz).

- Need to investigate the rt-app 31% case; its cpu/cluster idle duty cycle
  is abnormal.

- Need to investigate the rt-app 25% case; it is abnormal for the
  "ondemand" governor, which stays at OPP 1.2GHz much longer than the
  other configurations.

[1] https://github.com/Leo-Yan/linux/tree/profile_easv5_hikey_round3
[2] https://github.com/96boards/arm-trusted-firmware/tree/hikey
[3] https://github.com/Leo-Yan/utility/blob/master/profile_eas/calc_idle_diff.py
[4] https://github.com/Leo-Yan/utility/blob/master/profile_eas/calc_pstate_time…
[5] https://github.com/Leo-Yan/utility/blob/master/profile_eas/calc_sched_prefo…
[6] https://github.com/Leo-Yan/utility/tree/master/profile_eas/hikey_easv5_roun…

Thanks,
Leo Yan
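
As a reading aid for the statistics in this mail, here is a small Python sketch. It is not taken from the referenced scripts; the function names and the sample slack/period/run values are made up. It shows two things: the "Cycles" column of the MP3 mainline cluster-level row is consistent with the sum of residency [ms] times frequency [MHz] over the five OPPs, and how the quoted rt-app performance formula behaves.

```python
# OPP frequencies in MHz, as listed in the Hardware Environment section.
OPP_MHZ = (208, 432, 729, 960, 1200)


def cycles(residency_ms):
    """Sum of residency [ms] * frequency [MHz] over all OPPs.

    The result is in units of 1000 cycles (ms * MHz).
    """
    return sum(t * f for t, f in zip(residency_ms, OPP_MHZ))


def rtapp_performance(slack, c_period, c_run):
    """task performance = slack / (c_period - c_run) * 1024 (from the mail).

    All three arguments must use the same time unit.
    """
    return slack / (c_period - c_run) * 1024


# MP3 case, MAINLINE row, cluster level (values copied from the mail).
mp3_mainline_cluster = (870.39, 4872.34, 93.79, 27.43, 5880.00)
mp3_cycles = cycles(mp3_mainline_cluster)
print(f"{mp3_cycles:.2f}")  # compare with the reported 9436597.71

# Hypothetical task: 10000us period, 2000us run time, 6000us slack.
print(rtapp_performance(6000, 10000, 2000))
# A task that misses its deadline has negative slack, so the index goes
# negative, as in the rt-app 38% and 44% rows for EAS (sched).
print(rtapp_performance(-1000, 10000, 2000))
```

Under this formula, 1024 corresponds to a task whose slack equals all the time left in the period after its run time.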

9 years, 9 months

EASv5 Profiling On Hikey (2nd Round)

by Leo Yan

Hi all, This is second round profiling for EASv5 on Hikey board. I refined the energy model data according to Dietmar's suggestion (also appreciate for Dietmar's detailed review and guidance). Please help review and looking forward any comment and suggestion. * Overview This is the second round profiling for EASv5 patches on Hikey board. According to review comments from eas-dev mailing list, refined the energy model for Hikey, also developed several python/shell scripts for automatic analysis. So profiling data will be devided into three parts: - C-state profiling data - P-state profiling data - Scheduler performance profiling data * Hardware Environment - Platform: 96boards Hikey - SoC: Hi6220, 2 clusters, 4xCA53 CPUs in each cluster - CPU clock: 2 clusters with 8 CPUs have same clock source and support 208mHz/432mHz/729mHz/960mHz/1200mHz - Support CPU and cluster level low power mode * Software Environment - Kernel (4.2 + EAS RFCv5) [1] https://github.com/Leo-Yan/linux/tree/profile_easv5_hikey - ARM-TF [2] https://github.com/96boards/arm-trusted-firmware/tree/hikey - Enable CPUIdle with PSCI - Enable CPUFreq with cpufreq-dt driver - Profiling scritps: calc_idle_diff.py [3]: calculate C-state's difference for different configurations calc_pstate_time.py [4]: calculate P-state's difference for different configurations calc_sched_preformance.py [5]: calculate scheduler performance * Conventions Below are some conventions which used in below tables: CLS0: Cluster 0 CLS1: Cluster 1 WFI: CPU WFI state C2: CPU power down state M2: Cluster power down state DC: Duty cycle Configuration | Mailine EASv5 Enable CPUFreq CPUFreq | ENERGY_AWARE ondemand sched ------------- | ------- ----- ------------ -------- ------- Mainline (ndm) | Yes No No Yes No noEAS (ndm) | Yes Yes No Yes No EAS (ndm) | Yes Yes Yes Yes No EAS (sched) | Yes Yes Yes No Yes * Profiling: C-state & P-state The detailed profiling result have been uploaded to git-hub [6]; i manually adjust with more readable 
format for below statistics.

- Case MP3:

  C-state DC   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
  CLS0: WFI    1.17s            -1.0s         -1.1s       -1.1s
  CLS0: C2     2.10s            -1.3s         -1.7s       -1.9s
  CLS0: M2     25.01s           +2.5s         +3.3s       -4.9s
  CLS1: WFI    445.06ms         -179.3ms      -384.1ms    -445.1ms
  CLS1: C2     182.95ms         +552.2ms      -183.6ms    -185.9ms
  CLS1: M2     19.24s           -1.5s         +3.7s       +10.7s

  P-State Statistics

  Cluster Level (ms)
  | Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
  |-----------|--------------|--------------|--------------|--------------|--------------|--------------|
  | MAINLINE  |      1075.99 |      4803.60 |       105.06 |        37.31 |      5860.00 |   9443367.46 |
  | EAS DIS   |      -77.20% |      +57.87% |     +734.47% |     +468.29% |      -32.79% |       -5.80% |
  | EAS NDM   |      -83.61% |      -71.82% |     +264.43% |     +947.55% |       +1.84% |      -10.65% |
  | EAS SCHED |     +795.42% |     -100.00% |     -100.00% |     -100.00% |      -98.56% |      -77.71% |

  CPU Level (ms)
  | Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
  |-----------|--------------|--------------|--------------|--------------|--------------|--------------|
  | MAINLINE  |      1131.04 |      4814.53 |       116.39 |        29.15 |      5931.09 |   9545269.85 |
  | EAS DIS   |      -74.57% |      +59.29% |     +655.51% |     +632.27% |      -32.44% |       -5.43% |
  | EAS NDM   |      -80.69% |      -71.87% |     +235.64% |    +1215.77% |       +0.69% |      -11.48% |
  | EAS SCHED |     +755.65% |     -100.00% |     -100.00% |     -100.00% |      -98.57% |      -77.84% |

- Case rt-app 6%:

  C-state DC   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
  CLS0: WFI    7.87s            +204.7ms      +1.1s       +286.0ms
  CLS0: C2     3.96s            -272.9ms      -4.0s       -675.7ms
  CLS0: M2     9.29ms           -83us         -6.8ms      -4.5ms
  CLS1: WFI    8.80s            +50.1ms       +260.2ms    -1.6s
  CLS1: C2     1.16s            +38.5ms       -1.1s       -1.1s
  CLS1: M2     73.76ms          +51.4ms       +49.7ms     +127.9ms

  P-State Statistics

  Cluster Level (ms)
  | Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
  |-----------|--------------|--------------|--------------|--------------|--------------|--------------|
  | MAINLINE  |        16.58 |     18030.00 |       220.34 |         0.00 |        57.73 |   8022312.50 |
  | EAS DIS   |       -1.09% |       +0.89% |      -96.88% |    +1972.00% |      +88.79% |       -0.08% |
  | EAS NDM   |       -0.42% |      -99.69% |    +9857.34% |    +2738.00% |      +87.08% |     +101.66% |
  | EAS SCHED |     +118.58% |      -97.74% |    +9394.42% |        0.00% |     +310.84% |      +95.94% |

  CPU Level (ms)
  | Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
  |-----------|--------------|--------------|--------------|--------------|--------------|--------------|
  | MAINLINE  |        54.72 |     40950.00 |       402.41 |         0.00 |        61.01 |  18068347.05 |
  | EAS DIS   |       +4.08% |       +2.27% |      -94.49% |     +988.60% |      +85.60% |       +1.09% |
  | EAS NDM   |       -3.58% |      -99.72% |    +5357.03% |    +1942.60% |      +84.48% |      -10.21% |
  | EAS SCHED |      +38.23% |      -98.75% |    +5370.45% |        0.00% |     +340.36% |       -8.09% |

- Case rt-app 13%:

  C-state DC   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
  CLS0: WFI    8.79s            -100.1ms      -3.7s       -4.3s
  CLS0: C2     2.76s            -1.3s         -2.8s       -2.7s
  CLS0: M2     9.71ms           -4.8ms        -3.0ms      -1.6ms
  CLS1: WFI    8.87s            +19.0ms       -3.9s       -8.5s
  CLS1: C2     1.57s            -918.8ms      -1.5s       -1.5s
  CLS1: M2     72.32ms          +7.0ms        +120.8ms    +20.2s

  P-State Statistics

  Cluster Level (ms)
  | Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
  |-----------|--------------|--------------|--------------|--------------|--------------|--------------|
  | MAINLINE  |        17.03 |      2730.00 |     15310.00 |         0.00 |        59.01 |  12414704.24 |
  | EAS DIS   |       -2.52% |     +382.78% |      -52.91% |        0.00% |       +2.14% |      -11.19% |
  | EAS NDM   |       -0.29% |      -99.27% |      -98.28% | +2703000.00% |    +4628.01% |     +137.63% |
  | EAS SCHED |    +3068.94% |      -91.27% |      -84.86% | +1078619.00% |    +6612.15% |      +37.03% |

  CPU Level (ms)
  | Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
  |-----------|--------------|--------------|--------------|--------------|--------------|--------------|
  | MAINLINE  |        55.77 |      8880.00 |     37850.00 |         0.00 |        66.17 |  31519814.16 |
  | EAS DIS   |       +0.22% |     +378.04% |      -52.76% |        0.00% |       +5.32% |       -0.16% |
  | EAS NDM   |       -0.02% |      -99.13% |      -99.02% | +2705652.00% |    +4169.43% |       -5.84% |
  | EAS SCHED |     +937.83% |      -91.40% |      -85.80% | +1954704.00% |    +9605.80% |       -2.15% |

- Case rt-app 19%:

  C-state DC   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
  CLS0: WFI    9.33s            -422.5ms      -5.1s       -5.2s
  CLS0: C2     312.17ms         +29.7ms       -301.8ms    -303.5ms
  CLS0: M2     8.01ms           +5us          -3.7ms      -5.5ms
  CLS1: WFI    8.50s            -200.8ms      -4.1s       -7.8s
  CLS1: C2     675.16ms         -194.5ms      -659.0ms    -563.9ms
  CLS1: M2     120.55ms         +2.9ms        -41.3ms     +18.7s

  P-State Statistics

  Cluster Level (ms)
  | Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
  |-----------|--------------|--------------|--------------|--------------|--------------|--------------|
  | MAINLINE  |        16.57 |      3170.00 |     18020.00 |        19.87 |       110.11 |  14660673.76 |
  | EAS DIS   |       +1.51% |      +57.41% |       -6.66% |     +802.47% |       +4.98% |       +0.49% |
  | EAS NDM   |       +4.59% |     -100.00% |      -85.46% |  +143432.96% |     +218.05% |     +102.67% |
  | EAS SCHED |     +517.32% |      -95.68% |      -80.73% |   +57897.68% |    +1565.83% |       +8.29% |

  CPU Level (ms)
  | Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
  |-----------|--------------|--------------|--------------|--------------|--------------|--------------|
  | MAINLINE  |        54.95 |     12210.00 |     51480.00 |         9.84 |       120.12 |  42968660.00 |
  | EAS DIS   |       +4.55% |      +55.61% |       -5.48% |    +5073.68% |      +13.30% |       +3.20% |
  | EAS NDM   |       -1.69% |     -100.00% |      -90.22% |  +403660.16% |     +310.75% |       -1.29% |
  | EAS SCHED |     +437.62% |      -96.49% |      -79.72% |  +307381.50% |    +3399.13% |       -2.38% |

- Case rt-app 25%:

  C-state DC   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
  CLS0: WFI    9.37s            +55.2ms       -3.4s       -4.6s
  CLS0: C2     13.79ms          -3.4ms        +23.4ms     -2.4ms
  CLS0: M2     10.33ms          -2.8ms        -6.4ms      -8.5ms
  CLS1: WFI    9.28s            +109.2ms      -3.3s       -8.3s
  CLS1: C2     15.87ms          -12.4ms       +49.4ms     +77.8ms
  CLS1: M2     73.39ms          +51.7ms       +99.5ms     +17.2s

  P-State Statistics

  Cluster Level (ms)
  | Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
  |-----------|--------------|--------------|--------------|--------------|--------------|--------------|
  | MAINLINE  |        16.56 |         0.00 |     21330.00 |        10.37 |        57.41 |  15631861.68 |
  | EAS DIS   |       -1.75% |    +1926.00% |       -0.80% |       +1.54% |     +110.80% |       -0.25% |
  | EAS NDM   |      +10.69% |        0.00% |      -95.58% |  +233651.21% |    +5021.06% |      +75.86% |
  | EAS SCHED |     +293.18% |   +10204.00% |      -99.76% |  +120440.02% |    +8984.65% |      +17.41% |

  CPU Level (ms)
  | Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
  |-----------|--------------|--------------|--------------|--------------|--------------|--------------|
  | MAINLINE  |        55.26 |         0.00 |     74930.00 |        29.43 |        58.58 |  54734011.68 |
  | EAS DIS   |       +0.11% |    +6195.50% |       +2.00% |      -63.97% |     +153.11% |       +2.21% |
  | EAS NDM   |      -11.35% |        0.00% |      -97.28% |  +162964.90% |    +9599.76% |       -0.64% |
  | EAS SCHED |      +90.25% |   +40570.00% |      -99.73% |  +125517.36% |   +24717.17% |       -2.66% |

- Case rt-app 31%:

  C-state DC   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
  CLS0: WFI    8.39s            +238.5ms      +192.0ms    -4.3s
  CLS0: C2     3.20ms           -762us        +1.6ms      +82.3ms
  CLS0: M2     7.33ms           -3.7ms        -965us      -2.8ms
  CLS1: WFI    8.39s            +246.4ms      +207.0ms    -4.7s
  CLS1: C2     2.48ms           -2.1ms        -491us      +458.0ms
  CLS1: M2     122.45ms         +150.1ms      -2.9ms      +9.9s

  P-State Statistics

  Cluster Level (ms)
  | Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
  |-----------|--------------|--------------|--------------|--------------|--------------|--------------|
  | MAINLINE  |        16.80 |         0.00 |     23140.00 |        86.87 |       109.07 |  17086833.60 |
  | EAS DIS   |       -1.19% |        0.00% |       -1.82% |      -70.00% |     +136.66% |       -1.09% |
  | EAS NDM   |       -2.44% |        0.00% |       -1.38% |      -88.81% |       +1.65% |       -1.79% |
  | EAS SCHED |    +3041.90% |        0.00% |      -85.91% |     +114.55% |   +18860.30% |      +60.83% |

  CPU Level (ms)
  | Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
  |-----------|--------------|--------------|--------------|--------------|--------------|--------------|
  | MAINLINE  |        56.27 |         0.00 |     90510.00 |       260.88 |       119.29 |  66387086.96 |
  | EAS DIS   |       -1.28% |        0.00% |       -0.99% |      -77.14% |     +117.20% |       -1.03% |
  | EAS NDM   |       -1.76% |        0.00% |       -0.53% |      -96.21% |       +3.29% |       -0.88% |
  | EAS SCHED |     +909.24% |        0.00% |      -89.73% |     +101.70% |   +41110.50% |       +0.00% |

- Case rt-app 38%:

  C-state DC   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
  CLS0: WFI    7.50s            -233.6ms      -3.1s       -4.1s
  CLS0: C2     2.04ms           -1.0ms        +20.4ms     +33.3ms
  CLS0: M2     7.73ms           -2.3ms        -2.9ms      -1.1ms
  CLS1: WFI    7.51s            -232.5ms      -2.9s       -3.4s
  CLS1: C2     1.65ms           +108us        +26.0ms     +1.5s
  CLS1: M2     70.35ms          +48.7ms       +4.5ms      +7.3s

  P-State Statistics

  Cluster Level (ms)
  | Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
  |-----------|--------------|--------------|--------------|--------------|--------------|--------------|
  | MAINLINE  |        16.86 |         0.00 |     13610.00 |     11410.00 |        60.01 |  20950808.88 |
  | EAS DIS   |       -4.74% |        0.00% |       +1.98% |       +1.67% |      +85.52% |       +2.10% |
  | EAS NDM   |       -1.25% |        0.00% |      -26.67% |      +24.80% |   +11348.09% |      +39.34% |
  | EAS SCHED |    +3510.50% |        0.00% |      -57.83% |      -99.39% |   +33561.06% |      +36.59% |

  CPU Level (ms)
  | Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
  |-----------|--------------|--------------|--------------|--------------|--------------|--------------|
  | MAINLINE  |        55.54 |         0.00 |     53470.00 |     44830.00 |        71.27 |  82113506.32 |
  | EAS DIS   |       -0.32% |        0.00% |       +0.97% |       +1.43% |      +83.58% |       +1.30% |
  | EAS NDM   |       -2.57% |        0.00% |      -42.40% |       -9.99% |   +25170.10% |       +0.85% |
  | EAS SCHED |    +1070.72% |        0.00% |      -66.71% |      -99.53% |   +77534.35% |       -2.93% |

- Case rt-app 44%:

  C-state DC   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
  CLS0: WFI    5.58s            +148.0ms      -2.1s       -2.0s
  CLS0: C2     3.00ms           +462us        +3.4ms      +31.7ms
  CLS0: M2     8.36ms           +1.4ms        -4.4ms      +84us
  CLS1: WFI    5.58s            +161.4ms      -685.8ms    -148.4ms
  CLS1: C2     3.66ms           -3.7ms        +6.2ms      +896.1ms
  CLS1: M2     66.94ms          +8.2ms        +60.1ms     +2.8s

  P-State Statistics

  Cluster Level (ms)
  | Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
  |-----------|--------------|--------------|--------------|--------------|--------------|--------------|
  | MAINLINE  |        16.31 |         0.00 |     15990.00 |     12870.00 |        63.64 |  24091670.48 |
  | EAS DIS   |       -2.58% |        0.00% |       -2.44% |       +0.62% |      +11.94% |       -0.82% |
  | EAS NDM   |       +0.86% |        0.00% |      -80.86% |      +96.27% |    +5258.27% |      +26.91% |
  | EAS SCHED |    +2343.72% |    +5931.20% |      -75.42% |      -82.34% |   +36213.64% |      +36.51% |

  CPU Level (ms)
  | Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
  |-----------|--------------|--------------|--------------|--------------|--------------|--------------|
  | MAINLINE  |        54.46 |         0.00 |     63200.00 |     50700.00 |        84.69 |  94857755.68 |
  | EAS DIS   |       -3.43% |        0.00% |       -3.34% |       -0.10% |      +23.86% |       -1.65% |
  | EAS NDM   |       +0.28% |        0.00% |      -81.79% |      +51.87% |   +10561.96% |       -1.79% |
  | EAS SCHED |     +707.29% |    +5931.20% |      -79.65% |      -87.02% |   +73887.48% |       -4.07% |

- Case rt-app 50%:

  C-state DC   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
  CLS0: WFI    6.28s            +113.9ms      -197.9ms    +301.9ms
  CLS0: C2     1.40ms           +2.3ms        +1.7ms      +3.9ms
  CLS0: M2     8.25ms           -5.1ms        -7.6ms      -5.2ms
  CLS1: WFI    6.29s            +111.4ms      -120.5ms    +297.2ms
  CLS1: C2     2.47ms           -2.5ms        -2.2ms      -2.5ms
  CLS1: M2     118.94ms         -44.1ms       -116.6ms    +538.7ms

  P-State Statistics

  Cluster Level (ms)
  | Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
  |-----------|--------------|--------------|--------------|--------------|--------------|--------------|
  | MAINLINE  |        16.58 |         0.00 |         0.57 |     27440.00 |       113.26 |  26482179.82 |
  | EAS DIS   |       -4.34% |        0.00% |    +6781.74% |       -0.98% |      -31.93% |       -1.04% |
  | EAS NDM   |       +3.08% |        0.00% |   +13155.65% |       +0.91% |      +21.72% |       +1.23% |
  | EAS SCHED |     +433.29% |        0.00% |     -100.00% |       -7.51% |    +1694.80% |       +1.29% |

  CPU Level (ms)
  | Item      |       208MHz |       432MHz |       729MHz |       960MHz |       1.2GHz |       Cycles |
  |-----------|--------------|--------------|--------------|--------------|--------------|--------------|
  | MAINLINE  |        55.02 |         0.00 |        10.64 |    108480.00 |       135.34 | 104322408.72 |
  | EAS DIS   |       -3.56% |        0.00% |     +818.70% |       -1.13% |       -2.61% |       -1.08% |
  | EAS NDM   |       -3.13% |        0.00% |    +2034.87% |       -1.62% |     +160.34% |       -1.22% |
  | EAS SCHED |     +131.86% |        0.00% |     -100.00% |       -8.49% |    +4018.46% |       -2.21% |

* Profiling: performance

sysbench --test=cpu --num-threads=1 --max-time=30 run:

  Response Time   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
  min             1.67ms           +00.00ms      +00.00ms    +00.00ms
  avg             1.68ms           +00.00ms      +00.00ms    +00.00ms
  max             2.35ms           +04.54ms      +05.32ms    +7.77ms
  approx 95%      1.70ms           +00.00ms      +00.00ms    -00.02ms

rt-app performance is calculated with the formula below:

  task performance = slack / (c_period - c_run) * 1024

  energy        mainline (ndm)   noeas (ndm)   eas (ndm)   eas (sched)
                prf              prf           prf         prf
  rt-app 6%     665              663           630         606
  rt-app 13%    696              634           474         348
  rt-app 19%    665              663           630         606
  rt-app 25%    650              627           391         243
  rt-app 31%    587              599           596         396
  rt-app 38%    574              562           403         212
  rt-app 44%    458              487           399         177
  rt-app 50%    478              590           572         616

* Summary

- With EAS (sched), the CPUIdle duty cycle is much better than with the other configurations. Comparing the CPU idle duty cycle with the P-state statistics shows that when CPU frequency scaling is driven by the scheduler, the policy raises the frequency more aggressively. The CPU therefore finishes tasks more quickly, which in the end gives a better CPU idle duty cycle.

- With EAS (sched), the cluster-level cycle count is also much higher. This can be explained by the scheduler placing tasks on a single CPU rather than spreading them: the CPU still needs to run for almost the same time, but the cluster runs for longer because the tasks run sequentially on fewer CPUs.

- With EAS (sched), the CPUs mainly run at the lowest OPP (208MHz) or at the higher OPPs (960MHz or 1.2GHz); with the mainline kernel and the ondemand governor, the CPUs often run at the middle OPPs (432MHz or 729MHz).

- EAS (ndm) spends much less time in the cluster-level idle state than EAS (sched). The detailed CPU-level idle data shows why: EAS (sched) keeps tasks on only one cluster as far as possible, so it can power off cluster 1 most of the time, whereas EAS (ndm) spreads tasks onto two CPUs that are located in separate clusters.

- EAS (ndm) and EAS (sched) are much better than noEAS (ndm) for the CPU-level idle duty cycle. After enabling EAS_FEATURE, the CPU spends much more time in low power modes than with noEAS (ndm), though the cluster-level idle duty cycle does not show this. EAS (ndm) and EAS (sched) also let the CPU run at higher OPPs than noEAS (ndm).

- The rt-app 6% case is special: EAS (ndm) and EAS (sched) spread tasks across two clusters, so there is no improvement in the cluster-level idle duty cycle.

[1] https://github.com/Leo-Yan/linux/tree/profile_easv5_hikey
[2] https://github.com/96boards/arm-trusted-firmware/tree/hikey
[3] https://github.com/Leo-Yan/utility/blob/master/profile_eas/calc_idle_diff.py
[4] https://github.com/Leo-Yan/utility/blob/master/profile_eas/calc_pstate_time…
[5] https://github.com/Leo-Yan/utility/blob/master/profile_eas/calc_sched_prefo…
[6] https://github.com/Leo-Yan/utility/tree/master/profile_eas/hikey_easv5_roun…

Thanks,
Leo Yan
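The rt-app performance formula quoted in this mail (task performance = slack / (c_period - c_run) * 1024) can be sketched in a few lines of Python. This is only an illustration of the arithmetic and not the code from calc_sched_preformance.py; the function name is hypothetical, and it assumes slack, c_period and c_run are taken from the rt-app log in the same time unit (for example microseconds).

```python
def rtapp_perf_index(slack, c_period, c_run):
    """Performance index from the mail: slack / (c_period - c_run) * 1024.

    All three arguments are assumed to use the same time unit.
    The function name and argument names are hypothetical.
    """
    budget = c_period - c_run  # idle time available in one period
    if budget <= 0:
        raise ValueError("c_period must be larger than c_run")
    return slack / budget * 1024


if __name__ == "__main__":
    # Made-up numbers: 10000us period, 2000us run, 6000us measured slack.
    print(rtapp_perf_index(6000, 10000, 2000))
```

With this definition a task that keeps all of its idle budget as slack scores 1024, and lower values mean the task finished later within its period.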

9 years, 10 months

[RFC] sched: Central, scheduler-driven, power-perfomance control

by Patrick Bellasi

Hi eas-dev,

as pre-announced by Morten on that ML, last week we posted on LKML an RFC introducing SchedTune, a proposal for a central tuning knob. The current posting addresses DVFS biasing only. There will be a follow up posting to deal with task placement via integration with EAS as the upstream discussions move along.

The full cover letter can be found here:
http://www.kernelhub.org/?msg=819910&p=2

For your convenience the patch set is available here:
git://www.linux-arm.com/linux-power eas/stune/rfcv1

Cheers Patrick

--
#include <best/regards.h>

Patrick Bellasi

9 years, 10 months

EAS Profiling On Hikey (1st Round)

by Leo Yan

Hi all,

Below is my attempt at profiling EAS; please help review, and any suggestions or questions are welcome.

* Purpose

This is the first round of profiling for the EASv5 patches on the Hikey board. Profiling the EASv5 patches on Hikey provides the following information and feedback for EAS development:

- Created the profiling environment for ARM64
- Collected the behavior after applying the EASv5 patches on an SoC with two CA53 clusters
- I cannot measure hardware power consumption, so currently I _ONLY_ check the CPU duty cycle to compare scheduler behavior

* Hardware Environment

- Platform: 96boards Hikey
- SoC: Hi6220, 2 clusters, 4xCA53 CPUs in each cluster
- CPU clock: the 2 clusters with 8 CPUs have a coupled clock source and support 208MHz/432MHz/729MHz/960MHz/1200MHz
- Supports CPU and cluster level low power modes

* Software Environment

- Kernel: 4.2rc4 + EAS RFCv5
- ARM-TF: [1]
- Enabled CPUIdle with PSCI
- Enabled CPUFreq with cpufreq-dt driver

* Profiling Data

  CLS0:   Cluster 0
  CLS1:   Cluster 1
  CPU_PD: CPU power down state
  CLS_PD: Cluster power down state

- Case Sysbench: sysbench --test=cpu --cpu-max-prime=20000 run

  Response Time    Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
  min              4.19ms           +00.00ms      +00.00ms    +00.00ms
  avg              4.21ms           +00.00ms      +00.00ms    -00.01ms
  max              6.86ms           +00.09ms      +00.04ms    +13.61ms
  approx 95%       4.23ms           +00.00ms      +00.00ms    -00.02ms

  Idle Dutycycle   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
  CLS0: WFI        000.20ms         +000.35ms     +000.34ms   +000.89ms
  CLS0: CPU_PD     001.58ms         +000.76ms     +000.88ms   +001.40ms
  CLS0: CLS_PD     001.82ms         -000.41ms     -000.12ms   +001.10ms
  CLS1: WFI        n.a.             n.a.          +2.9s       +000.07ms
  CLS1: CPU_PD     n.a.             n.a.          +001.30ms   +6.7s
  CLS1: CLS_PD     42.11s           +003.8ms      -2.9s       -6.8s

- Case MP3: ./idlestat --trace -f mp3_trace.log -t 30 -p -c -w -o mp3_report.log -- rt-app ./doc/examples/mp3-long.json

  Idle Dutycycle   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
  CLS0: WFI        067.31ms         -022.10ms     -022.20ms   +019.70ms
  CLS0: CPU_PD     887.74ms         +919.80ms     -292.40ms   -316.90ms
  CLS0: CLS_PD     17.08s           -444.80ms     +895.30ms   +5.3s
  CLS1: WFI        000.59ms         +002.30ms     +000.28ms   +196.70ms
  CLS1: CPU_PD     n.a.             +000.26ms     n.a.        +004.00ms
  CLS1: CLS_PD     28.80s           +189.10ms     -269.40ms   -10.6s

- Case rt-app 6%: ./idlestat --trace -f trace.log -t 30 -p -c -w -o report.log -- rt-app ./doc/examples/rt-app-6.json

  Idle Dutycycle   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
  CLS0: WFI        7.82s            +037.40ms     -7.1s       -7.3s
  CLS0: CPU_PD     4.26s            -154.20ms     -3.5s       -4.2s
  CLS0: CLS_PD     005.76ms         n.a.          +17.6s      +18.9s
  CLS1: WFI        6.46s            +2.0s         +2.4s       -155.90ms
  CLS1: CPU_PD     6.01s            -1.9s         -5.3s       -6.0s
  CLS1: CLS_PD     123.76ms         -118.50ms     -121.90ms   +1.2s

- Case rt-app 13%: ./idlestat --trace -f trace.log -t 30 -p -c -w -o report.log -- rt-app ./doc/examples/rt-app-13.json

  Idle Dutycycle   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
  CLS0: WFI        9.26s            -304.70ms     -8.8s       -4.2s
  CLS0: CPU_PD     1.11s            -695.20ms     -275.2ms    -1.1s
  CLS0: CLS_PD     003.49ms         -001.30ms     +18.3s      -613us
  CLS1: WFI        8.32s            -8.3s         -3.4s       -8.3s
  CLS1: CPU_PD     2.65s            -2.6s         -2.6s       -2.6s
  CLS1: CLS_PD     123.07ms         +19.9s        -121.6ms    +20.3s

- Case rt-app 19%: ./idlestat --trace -f trace.log -t 30 -p -c -w -o report.log -- rt-app ./doc/examples/rt-app-19.json

  Idle Dutycycle   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
  CLS0: WFI        8.91s            -256.70ms     -473.4ms    -2.7s
  CLS0: CPU_PD     428.42ms         +000.86ms     -425.5ms    -356.20ms
  CLS0: CLS_PD     002.28ms         +000.45ms     +1.1ms      n.a.
  CLS1: WFI        6.20s            +224.80ms     -6.2s       -5.2s
  CLS1: CPU_PD     4.02s            -193.20ms     -4.0s       -4.0s
  CLS1: CLS_PD     073.93ms         +1.3s         +20.0s      +039.80ms

- Case rt-app 25%: ./idlestat --trace -f trace.log -t 30 -p -c -w -o report.log -- rt-app ./doc/examples/rt-app-25.json

  Idle Dutycycle   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
  CLS0: WFI        9.60s            -170.90ms     -3.5s       -3.4s
  CLS0: CPU_PD     025.02ms         -018.90ms     -023.20ms   -023.50ms
  CLS0: CLS_PD     004.43ms         -001.80ms     -003.50ms   +004.40ms
  CLS1: WFI        9.72s            -1.1s         -9.6s       -9.2s
  CLS1: CPU_PD     239.13ms         +1.7s         -237.90ms   -152.00ms
  CLS1: CLS_PD     075.45ms         +001.00ms     +19.9s      +20.1s

- Case rt-app 31%: ./idlestat --trace -f trace.log -t 30 -p -c -w -o report.log -- rt-app ./doc/examples/rt-app-31.json

  Idle Dutycycle   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
  CLS0: WFI        8.54s            +108.10ms     -5.6s       -189.20ms
  CLS0: CPU_PD     001.97ms         +001.70ms     -001.40ms   +005.50ms
  CLS0: CLS_PD     003.24ms         -002.30ms     +003.00ms   +001.90ms
  CLS1: WFI        8.56s            +108.60ms     -8.6s       -260.00ms
  CLS1: CPU_PD     n.a.             +001.70ms     n.a.        +199.60ms
  CLS1: CLS_PD     189.15ms         -000.28ms     +19.9s      +449.60ms

- Case rt-app 38%: ./idlestat --trace -f trace.log -t 30 -p -c -w -o report.log -- rt-app ./doc/examples/rt-app-38.json

  Idle Dutycycle   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
  CLS0: WFI        8.77s            +695.00ms     -7.0s       -4.9s
  CLS0: CPU_PD     000.96ms         +001.50ms     +001.70ms   -000.01ms
  CLS0: CLS_PD     003.06ms         -000.59ms     -002.50ms   +002.30ms
  CLS1: WFI        8.79s            +913.70ms     -8.8s       -8.8s
  CLS1: CPU_PD     001.71ms         +151.30ms     -001.70ms   -000.45ms
  CLS1: CLS_PD     123.32ms         -120.60ms     +19.6s      +20.4s

- Case rt-app 44%: ./idlestat --trace -f trace.log -t 30 -p -c -w -o report.log -- rt-app ./doc/examples/rt-app-44.json

  Idle Dutycycle   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
  CLS0: WFI        5.76s            +101.20ms     -4.6s       -4.5s
  CLS0: CPU_PD     003.07ms         -000.18ms     -001.70ms   -001.80ms
  CLS0: CLS_PD     002.02ms         -001.20ms     +000.23ms   +001.30ms
  CLS1: WFI        6.01s            +108.60ms     -5.9s       -5.9s
  CLS1: CPU_PD     001.14ms         +001.20ms     +000.07ms   +000.65ms
  CLS1: CLS_PD     190.75ms         -115.50ms     +19.6s      +19.6s

- Case rt-app 50%: ./idlestat --trace -f trace.log -t 30 -p -c -w -o report.log -- rt-app ./doc/examples/rt-app-50.json

  Idle Dutycycle   Mainline (ndm)   noEAS (ndm)   EAS (ndm)   EAS (sched)
  CLS0: WFI        6.62s            -267.20ms     -6.6s       -6.6s
  CLS0: CPU_PD     n.a.             +001.80ms     +001.50ms   +000.72ms
  CLS0: CLS_PD     001.47ms         +001.30ms     -000.33ms   +005.10ms
  CLS1: WFI        6.89s            -348.20ms     -6.9s       -6.9s
  CLS1: CPU_PD     001.63ms         -001.50ms     -001.60ms   -001.60ms
  CLS1: CLS_PD     001.87ms         +123.50ms     +19.9s      +20.2s

* Summary

- On this SoC with two identical clusters, EAS (ndm) gives the best results of all configurations and the most benefit for the CPU/cluster duty cycle. In almost all cases EAS (ndm) greatly increases the cluster power-down time, which means the scheduler has optimized load balancing within one cluster rather than spreading tasks across two clusters.

- EAS (sched) is not consistent across cases: for case MP3 it is even worse than mainline; for cases rt-app 6%/13%/25%/38%/44%/50% it behaves almost the same as EAS (ndm), but it is not stable for cases rt-app 19%/31%.

- EAS (sched) introduces very high latency for sysbench: the max response time reaches 20.47ms (6.86ms + 13.61ms), which is much higher than with the other three configs.

- EAS (sched) and EAS (ndm) affect idle state selection in CPUIdle, making CPU-level states more likely than cluster-level states; this result comes from the sysbench case.

[1] https://github.com/Leo-Yan/arm-trusted-firmware/tree/hikey_enable_low_power…

Thanks,
Leo Yan
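In the tables of this mail, the Mainline (ndm) column holds absolute values and the other columns appear to be signed deltas against it; this reading matches the 20.47ms max response time quoted in the summary (6.86ms + 13.61ms). A minimal Python sketch for turning a delta cell back into an absolute value follows. The helper names are hypothetical, and the handling of "n.a." (returned as None) is an assumption rather than something the mail specifies.

```python
import re

# Conversion factors to milliseconds for the units used in the tables.
_UNIT_TO_MS = {"us": 1e-3, "ms": 1.0, "s": 1e3}
_DURATION = re.compile(r"^([+-]?\d+(?:\.\d+)?)(us|ms|s)$")


def to_ms(cell):
    """Convert a cell such as '6.86ms', '-2.9s' or '-613us' to milliseconds.

    Returns None for 'n.a.' (assumption: no value was recorded).
    """
    cell = cell.strip()
    if cell == "n.a.":
        return None
    match = _DURATION.match(cell)
    if match is None:
        raise ValueError("unrecognised duration: %r" % cell)
    value, unit = match.groups()
    return float(value) * _UNIT_TO_MS[unit]


def absolute_ms(baseline, delta):
    """Add a delta cell to the Mainline baseline cell; None if either is 'n.a.'."""
    base, diff = to_ms(baseline), to_ms(delta)
    if base is None or diff is None:
        return None
    return base + diff


if __name__ == "__main__":
    # sysbench max response time for EAS (sched): baseline 6.86ms, delta +13.61ms
    print(round(absolute_ms("6.86ms", "+13.61ms"), 2))
```

Keeping everything in milliseconds avoids mixing the s/ms/us units that the idlestat-derived tables use side by side.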

9 years, 10 months

[RFCv5] Energy cost model for energy-aware scheduling

by Morten Rasmussen

Hi eas-dev,

In case you don't follow LKML, we have posted RFCv5 of the EAS patch set for review and comments. The main addition in this posting is scheduler-driven DVFS along with a number of smaller fixes and improvements.

The full cover letter can be found here:
https://lkml.org/lkml/2015/7/7/754

For your convenience the patch set is available here:
http://www.linux-arm.org/git?p=linux-power.git
branch: energy_model_rfc_v5

Additional patches introducing a central tuning knob that influences both task placement and DVFS will be posted later for discussion.

Thanks,
Morten

9 years, 11 months

Re: [Eas-dev] [RESEND PATCH v2] doc: measure the efficiency of cpufreq governors

by Amit Kucheria

(This might be of interest to folks on eas-dev, adding them to cc) On Wed, Jun 24, 2015 at 11:53 AM, Vincent Guittot <vincent.guittot(a)linaro.org> wrote: > On 24 June 2015 at 03:41, <pang.xunlei(a)zte.com.cn> wrote: >> Hi Vincent, >> >> Vincent Guittot <vincent.guittot(a)linaro.org> wrote 2015-06-23 PM 09:43:55: >>> Re: [RESEND PATCH v2] doc: measure the efficiency of cpufreq governors >>> >>> Hi Xunlei, >>> >>> I have run the bench on a quad A15 with sched-dvfs (but without eas >>> patches unlike you) >>> #sudo ./test.sh 4 100 1000 >>> Frequency domain CPU0~CPU3, run 100ms, sleep 1000ms: >>> powersave efficiency: 0% >>> performance efficiency: 100% >>> conservative efficiency: 18% >>> ondemand efficiency: 48% >>> cfs efficiency: 24% >>> >>> #sudo ./test.sh 4 200 1000 >>> Frequency domain CPU0~CPU3, run 200ms, sleep 1000ms: >>> powersave efficiency: 0% >>> performance efficiency: 100% >>> conservative efficiency: 40% >>> ondemand efficiency: 68% >>> cfs efficiency: 30% >>> >>> $ sudo ./test.sh 4 50 1000 >>> Frequency domain CPU0~CPU3, run 50ms, sleep 1000ms: >>> powersave efficiency: 0% >>> performance efficiency: 100% >>> conservative efficiency: 0% >>> ondemand efficiency: 25% >>> cfs efficiency: 19% >>> >>> As an example, here is the result when ondemand parameter are tuned >>> for the platform >>> sudo ./test.sh 4 100 1000 >>> Frequency domain CPU0~CPU3, run 100ms, sleep 1000ms: >>> powersave efficiency: 0% >>> performance efficiency: 100% >>> conservative efficiency: 19% >>> ondemand efficiency: 94% >>> cfs efficiency: 23% >>> >>> Beside these results, i have seen variation in the results that >>> confirm the interest of having more statistics like, min, man stdev >> >> I guess the hardware environment has something to do with this, on my >> platform, there're 11 available freuencies in total: 1200Mhz~2200Mhz, >> the step size is 100Mhz. > > Yes for sure, it was just to give some figures with a different platform. 
> Regarding the variation, i have seen these variations for the same > platform with the same SW. Nevertheless, this variations are somewhat > normal if we consider the default sampling rate of 164ms for some > governor compared to a run duration of 100ms > > Regards, > Vincent > >> >> Also it may get different results when running with different workload >> loops. >> >> -Xunlei >> >>> >>> Regards, >>> Vincent >>> >>> >>> >>> On 18 June 2015 at 13:35, <pang.xunlei(a)zte.com.cn> wrote: >>> > Just tested on my Intel EAS test environment(implemented x86 frequency >>> > invariant hook). >>> > With EAS disabled and sched-dvfs enabled. >>> > >>> > #./test.sh 3 100 1000 >>> > Frequency domain CPU0~CPU2, run 100ms, sleep 1000ms: >>> > powersave efficiency: 0% >>> > performance efficiency: 100% >>> > conservative efficiency: 92% >>> > ondemand efficiency: 97% >>> > cfs efficiency: 79% >>> > >>> > #./test.sh 3 200 1000 >>> > Frequency domain CPU0~CPU2, run 200ms, sleep 1000ms: >>> > powersave efficiency: 0% >>> > performance efficiency: 100% >>> > conservative efficiency: 97% >>> > ondemand efficiency: 99% >>> > cfs efficiency: 89% >>> > >>> > #./test.sh 3 50 1000 >>> > Frequency domain CPU0~CPU2, run 50ms, sleep 1000ms: >>> > powersave efficiency: 0% >>> > performance efficiency: 100% >>> > conservative efficiency: 93% >>> > ondemand efficiency: 96% >>> > cfs efficiency: 58% >>> > >>> > #./test.sh 3 1000 100 >>> > Frequency domain CPU0~CPU2, run 1000ms, sleep 100ms: >>> > powersave efficiency: 0% >>> > performance efficiency: 100% >>> > conservative efficiency: 99% >>> > ondemand efficiency: 99% >>> > cfs efficiency: 97% >>> > >>> > Seems sched-dvfs is computing inefficient at low cpu usage(implies >> power >>> > efficient), >>> > but computing efficient at high cpu usage. 
>>> > >>> > -Xunlei >>> > >>> > Xunlei Pang <xlpang(a)126.com> wrote 2015-06-18 PM 05:06:07: >>> >> [RESEND PATCH v2] doc: measure the efficiency of cpufreq governors >>> >> >>> >> From: Xunlei Pang <pang.xunlei(a)linaro.org> >>> >> >>> >> DVFS adds a latency in the execution of task because of the time to >>> >> decide to move at max freq. We need to measure this latency and check >>> >> that the governor stays in an acceptable range. >>> >> >>> >> When workgen runs a json file, a log file is created for each thread. >>> >> This log file records the number of loop that has been executed and >>> >> the duration for executing these loops (per phase). We can use these >>> >> figures to evaluate to latency that is added by a cpufreq governor >>> >> and its "performance efficiency". >>> >> >>> >> We use the run+sleep pattern to do the measurement, for the run time >> per >>> >> loop, the performance governor should run the expected duration as >> the >>> >> CPU stays a max freq. At the opposite, the powersave governor will >> give >>> >> use the longest duration (as it stays at lowest OPP). Other governor >>> > will >>> >> be somewhere between the 2 previous duration as they will use several >>> > OPP >>> >> and will go back to max frequency after a defined duration which >> depends >>> >> on its monitoring period. >>> >> >>> >> The formula: >>> >> >>> >> duration of powersave gov - duration of the gov >>> >> -------------------------------------------------------- x 100% >>> >> duration of powersave gov - duration of performance gov >>> >> >>> >> will give the efficiency of the governor. 100% means as efficient as >>> >> the perf governor and 0% means as efficient as the powersave >> governor. 
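The efficiency formula in the quoted patch description can be written out as a small Python sketch. It is only an illustration of the arithmetic; the actual patch computes this in shell scripts that are not reproduced here, and the function and argument names below are hypothetical. The durations are assumed to be the measured run time per loop under each governor, all in the same unit.

```python
def governor_efficiency(powersave, performance, governor):
    """Efficiency in percent, following the formula quoted above:

        (powersave - governor) / (powersave - performance) * 100

    100% means as fast as the performance governor, 0% means as slow
    as the powersave governor. Names are hypothetical.
    """
    span = powersave - performance
    if span <= 0:
        raise ValueError("powersave duration must exceed performance duration")
    return 100.0 * (powersave - governor) / span


if __name__ == "__main__":
    # Made-up durations in ms: powersave 200, performance 100, governor 105.
    print(governor_efficiency(200.0, 100.0, 105.0))
```

A governor can land slightly outside the 0..100 range on a noisy run, so the sketch does not clamp the result.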
>>> >> >>> >> This patch offers json files and shell scripts to do the measurement, >>> >> >>> >> Usage: ./test.sh <cpus> <runtime> <sleeptime> >>> >> cpus: number of cpus in the CPU0's frequency domain >>> >> runtime: running time in ms per loop of the workload pattern >>> >> sleeptime: sleeping time in ms per loop of the workload pattern >>> >> >>> >> Example: >>> >> "./test.sh 4 100 1000" means >>> >> CPU0~CPU3 sharing frequency, "100ms run + 1000ms sleep" workload >>> > pattern. >>> >> >>> >> test result on my machine: >>> >> ~#./test.sh 4 100 1000 >>> >> Frequency domain CPU0~CPU3, run 100ms, sleep 1000ms: >>> >> powersave efficiency: 0% >>> >> performance efficiency: 100% >>> >> conservative efficiency: 28% >>> >> ondemand efficiency: 95% >>> >> >>> >> NOTE: Make sure there are "sed", "cut", "grep", "rt-app", etc tools >> on >>> >> your test machine, and run the script under root privilege. >>> >> >>> >> Signed-off-by: Xunlei Pang <pang.xunlei(a)linaro.org> >>> >> --- >>> >> doc/examples/cpufreq_governor_efficiency/README | 54 >> ++++++++++++++ >>> >> .../cpufreq_governor_efficiency/calibration.json | 27 +++++++ >>> >> .../cpufreq_governor_efficiency/calibration.sh | 11 +++ >>> >> doc/examples/cpufreq_governor_efficiency/dvfs.json | 27 +++++++ >>> >> doc/examples/cpufreq_governor_efficiency/dvfs.sh | 38 ++++++++++ >>> >> doc/examples/cpufreq_governor_efficiency/test.sh | 82 +++++++++++ >>> >> +++++++++++ >>> >> 6 files changed, 239 insertions(+) >>> >> create mode 100644 doc/examples/cpufreq_governor_efficiency/README >>> >> create mode 100644 >>> > doc/examples/cpufreq_governor_efficiency/calibration.json >>> >> create mode 100755 >>> > doc/examples/cpufreq_governor_efficiency/calibration.sh >>> >> create mode 100644 >> doc/examples/cpufreq_governor_efficiency/dvfs.json >>> >> create mode 100755 doc/examples/cpufreq_governor_efficiency/dvfs.sh >>> >> create mode 100755 doc/examples/cpufreq_governor_efficiency/test.sh >>> >> >>> >> diff --git 
a/doc/examples/cpufreq_governor_efficiency/README b/doc/ >>> >> examples/cpufreq_governor_efficiency/README >>> >> new file mode 100644 >>> >> index 0000000..cc8efe1 >>> >> --- /dev/null >>> >> +++ b/doc/examples/cpufreq_governor_efficiency/README >>> >> @@ -0,0 +1,54 @@ >>> >> +Measure the efficiency of cpufreq governors using rt-app >>> >> + >>> >> +BACKGROUND: >>> >> + DVFS adds a latency in the execution of task because of the time >> to >>> >> + decide to move at max freq. We need to measure this latency and >>> > check >>> >> + that the governor stays in an acceptable range. >>> >> + >>> >> + When workgen runs a json file, a log file is created for each >>> > thread. >>> >> + This log file records the number of loop that has been executed >> and >>> >> + the duration for executing these loops (per phase). We can use >>> > these >>> >> + figures to evaluate to latency that is added by a cpufreq >> governor >>> >> + and its "performance efficiency". >>> >> + >>> >> + We use the run+sleep pattern to do the measurement, for the run >>> > time per >>> >> + loop, the performance governor should run the expected duration >> as >>> > the >>> >> + CPU stays a max freq. At the opposite, the powersave governor >> will >>> > give >>> >> + use the longest duration (as it stays at lowest OPP). Other >>> > governor will >>> >> + be somewhere between the 2 previous duration as they will use >>> > several OPP >>> >> + and will go back to max frequency after a defined duration which >>> > depends >>> >> + on its monitoring period. >>> >> + >>> >> + The formula: >>> >> + >>> >> + duration of powersave gov - duration of the gov >>> >> + -------------------------------------------------------- x 100% >>> >> + duration of powersave gov - duration of performance gov >>> >> + >>> >> + will give the efficiency of the governor. 100% means as >> efficient >>> > as >>> >> + the perf governor and 0% means as efficient as the powersave >>> > governor. 
>>> >> + >>> >> + This test offers json files and shell scripts to do the >>> > measurement, >>> >> + >>> >> +USAGE: >>> >> + ./test.sh <cpus> <runtime> <sleeptime> >>> >> + cpus: number of cpus in the CPU0's frequency domain >>> >> + runtime: running time in ms per loop of the workload pattern >>> >> + sleeptime: sleeping time in ms per loop of the workload pattern >>> >> + >>> >> +Example: >>> >> + "./test.sh 4 100 1000" means >>> >> + CPU0~CPU3 sharing frequency, "100ms run + 1000ms sleep" workload >>> > pattern. >>> >> + >>> >> + test result on an Intel machine: >>> >> + ~#./test.sh 4 100 1000 >>> >> + Frequency domain CPU0~CPU3, run 100ms, sleep 1000ms: >>> >> + powersave efficiency: 0% >>> >> + performance efficiency: 100% >>> >> + conservative efficiency: 28% >>> >> + ondemand efficiency: 95% >>> >> + >>> >> +NOTE: >>> >> + Make sure there are "sed", "cut", "grep", "rt-app", etc tools >>> >> on your test >>> >> + machine, and run the script under root privilege. >>> >> + >>> >> diff --git a/doc/examples/cpufreq_governor_efficiency/ >>> >> calibration.json >>> > b/doc/examples/cpufreq_governor_efficiency/calibration.json >>> >> new file mode 100644 >>> >> index 0000000..4377990 >>> >> --- /dev/null >>> >> +++ b/doc/examples/cpufreq_governor_efficiency/calibration.json >>> >> @@ -0,0 +1,27 @@ >>> >> +{ >>> >> + "tasks" : { >>> >> + "thread" : { >>> >> + "instance" : 1, >>> >> + "cpus" : [0], >>> >> + "loop" : 1, >>> >> + "phases" : { >>> >> + "run" : { >>> >> + "loop" : 1, >>> >> + "run" : 200000, >>> >> + }, >>> >> + "sleep" : { >>> >> + "loop" : 1, >>> >> + "sleep" : 200000, >>> >> + } >>> >> + } >>> >> + } >>> >> + }, >>> >> + "global" : { >>> >> + "default_policy" : "SCHED_FIFO", >>> >> + "calibration" : "CPU0", >>> >> + "lock_pages" : true, >>> >> + "ftrace" : true, >>> >> + "logdir" : "./", >>> >> + } >>> >> +} >>> >> + >>> >> diff --git a/doc/examples/cpufreq_governor_efficiency/calibration.sh >>> >> 
b/doc/examples/cpufreq_governor_efficiency/calibration.sh >>> >> new file mode 100755 >>> >> index 0000000..d10e644 >>> >> --- /dev/null >>> >> +++ b/doc/examples/cpufreq_governor_efficiency/calibration.sh >>> >> @@ -0,0 +1,11 @@ >>> >> +#!/bin/sh >>> >> + >>> >> +set -e >>> >> + >>> >> +echo performance > >>> > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor >>> >> + >>> >> +sleep 1 >>> >> + >>> >> +pLoad=$(rt-app calibration.json 2>&1 |grep pLoad |sed 's/.*= >>> > $.*$ns.*/\1/') >>> >> +sed 's/"calibration" : .*,/"calibration" : '$pLoad',/' -i dvfs.json >>> >> + >>> >> diff --git a/doc/examples/cpufreq_governor_efficiency/dvfs.json b/ >>> >> doc/examples/cpufreq_governor_efficiency/dvfs.json >>> >> new file mode 100644 >>> >> index 0000000..b413156 >>> >> --- /dev/null >>> >> +++ b/doc/examples/cpufreq_governor_efficiency/dvfs.json >>> >> @@ -0,0 +1,27 @@ >>> >> +{ >>> >> + "tasks" : { >>> >> + "thread" : { >>> >> + "instance" : 1, >>> >> + "cpus" : [0], >>> >> + "loop" : 5, >>> >> + "phases" : { >>> >> + "running" : { >>> >> + "loop" : 1, >>> >> + "run" : 100000, >>> >> + }, >>> >> + "sleeping" : { >>> >> + "loop" : 1, >>> >> + "sleep" : 1000000, >>> >> + } >>> >> + } >>> >> + } >>> >> + }, >>> >> + "global" : { >>> >> + "default_policy" : "SCHED_OTHER", >>> >> + "calibration" : 90, >>> >> + "lock_pages" : true, >>> >> + "ftrace" : true, >>> >> + "logdir" : "./", >>> >> + } >>> >> +} >>> >> + >>> >> diff --git a/doc/examples/cpufreq_governor_efficiency/dvfs.sh b/doc/ >>> >> examples/cpufreq_governor_efficiency/dvfs.sh >>> >> new file mode 100755 >>> >> index 0000000..8591fc7 >>> >> --- /dev/null >>> >> +++ b/doc/examples/cpufreq_governor_efficiency/dvfs.sh >>> >> @@ -0,0 +1,38 @@ >>> >> +#!/bin/sh >>> >> + >>> >> +#echo $1 $2 $3 >>> >> +set -e >>> >> + >>> >> +if [ $1 ] && [ $2 ] ; then >>> >> + for i in $(seq 0 1 $(expr $2 - 1)); do >>> >> + echo $1 > >> /sys/devices/system/cpu/cpu$i/cpufreq/scaling_governor >>> >> + #cat 
/sys/devices/system/cpu/cpu$i/cpufreq/scaling_governor >>> >> + done >>> >> + >>> >> + sleep 3 >>> >> +fi >>> >> + >>> >> +if [ $3 ] ; then >>> >> + sed 's/"run" : .*,/"run" : '$3',/' -i dvfs.json >>> >> +fi >>> >> + >>> >> +if [ $4 ] ; then >>> >> + sed 's/"sleep" : .*,/"sleep" : '$4',/' -i dvfs.json >>> >> +fi >>> >> + >>> >> +#cat dvfs.json >>> >> + >>> >> +rt-app dvfs.json 2> /dev/null >>> >> + >>> >> +if [ $1 ] ; then >>> >> + mv -f rt-app-thread-0.log rt-app_$1_run$3us_sleep$4us.log >>> >> + >>> >> + sum=0 >>> >> + for i in $(cat rt-app_$1_run$3us_sleep$4us.log | sed 'n;d' | sed >>> >> '1d' |cut -f 3); do >>> >> + sum=$(expr $sum + $i) >>> >> + done >>> >> + sum=$(expr $sum / 5) >>> >> + echo $sum >>> >> + rm -f rt-app_$1_run$3us_sleep$4us.log >>> >> +fi >>> >> + >>> >> diff --git a/doc/examples/cpufreq_governor_efficiency/test.sh b/doc/ >>> >> examples/cpufreq_governor_efficiency/test.sh >>> >> new file mode 100755 >>> >> index 0000000..d72fc6a >>> >> --- /dev/null >>> >> +++ b/doc/examples/cpufreq_governor_efficiency/test.sh >>> >> @@ -0,0 +1,82 @@ >>> >> +#!/bin/sh >>> >> + >>> >> +set -e >>> >> + >>> >> +set_calibration() { >>> >> + calibration.sh >>> >> +} >>> >> + >>> >> +test_efficiency() { >>> >> + >>> >> + FILENAME="results_$RANDOM$$.txt" >>> >> + >>> >> + if [ -e /sys/devices/system/cpu/cpu0/cpufreq/ >>> >> scaling_available_governors ]; then >>> >> + for i in $(cat /sys/devices/system/cpu/cpu0/cpufreq/ >>> >> scaling_available_governors); do >>> >> + export gov_$i=$(echo $i) >>> >> + done >>> >> + else >>> >> + echo "cpufreq is not available!" >>> >> + exit >>> >> + fi >>> >> + >>> >> + if [ ! $gov_performance ] ; then >>> >> + echo "Can't find performance governor!" >>> >> + exit >>> >> + fi >>> >> + >>> >> + if [ ! $gov_powersave ] ; then >>> >> + echo "Can't find powersave governor!" 
>>> >> + exit >>> >> + fi >>> >> + >>> >> + # Get powersave data >>> >> + dvfs.sh powersave $1 $2 $3 > $FILENAME >>> >> + powersave=$(cat $FILENAME |sed -n '1p') >>> >> + >>> >> + # Get performance data >>> >> + dvfs.sh performance $1 $2 $3 > $FILENAME >>> >> + performance=$(cat $FILENAME |sed -n '1p') >>> >> + >>> >> + if [ $performance -ge $powersave ] ; then >>> >> + echo "Error! Probably not input all the cpus in the same >>> >> frequency domain" >>> >> + exit >>> >> + fi >>> >> + >>> >> + denominator=$(expr $powersave - $performance) >>> >> + echo "powersave efficiency: 0%" >>> >> + echo "performance efficiency: 100%" >>> >> + >>> >> + # Calcuate other governors data >>> >> + for gov_next in $gov_conservative $gov_ondemand $gov_cfs; do >>> >> + if [ "$gov_next" != "" ] ; then >>> >> + dvfs.sh $gov_next $1 $2 $3 > $FILENAME >>> >> + data=$(cat $FILENAME |sed -n '1p'); >>> >> + numerator=$(expr $powersave - $data) >>> >> + numerator=$(expr $numerator \* 100) >>> >> + if [ $numerator -lt 0 ] ; then >>> >> + let numerator=0 >>> >> + fi >>> >> + data=$(expr $numerator / $denominator) >>> >> + echo "$gov_next efficiency: $data%" >>> >> + fi >>> >> + done >>> >> + >>> >> + rm -f $FILENAME >>> >> +} >>> >> + >>> >> +if [ $# -lt 3 ]; then >>> >> + echo "Usage: ./test.sh <cpus> <runtime> <sleeptime>" >>> >> + echo "cpus: number of cpus in the CPU0's frequency domain" >>> >> + echo "runtime: running time in ms per loop of the workload >> pattern" >>> >> + echo "sleeptime: sleeping time in ms per loop of the workload >>> > pattern" >>> >> + echo -e "\nExample: \n\"./test.sh 4 100 1000\" means\nCPU0~CPU3 >>> >> sharing frequency, \"100ms run + 1000ms sleep\" workload pattern.\n" >>> >> + exit >>> >> +fi >>> >> + >>> >> +echo "Frequency domain CPU0~CPU$(expr $1 - 1), run $2ms, sleep >> $3ms:" >>> >> + >>> >> +sleep 1 >>> >> +PATH=$PATH:. >>> >> +set_calibration >>> >> +test_efficiency $1 $(expr $2 \* 1000) $(expr $3 \* 1000) >>> >> + >>> >> -- >>> >> 1.9.1 >>> >>
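For readers who want to check the efficiency formula from the quoted patch by hand, here is a minimal sketch in POSIX shell. The function name `efficiency` and the durations in the usage lines are illustrative assumptions, not values taken from the patch or from any measurement; only the formula and the clamping of negative numerators follow the quoted test.sh.

```shell
#!/bin/sh
# Hypothetical helper, not part of the quoted patch.
# Arguments: average run duration per loop (same unit for all three)
#   $1: duration under the powersave governor
#   $2: duration under the performance governor
#   $3: duration under the governor being evaluated
# Prints the efficiency as an integer percentage.
efficiency() {
    eff_powersave=$1
    eff_performance=$2
    eff_governor=$3

    # The formula only makes sense if powersave is slower than performance.
    eff_denominator=$((eff_powersave - eff_performance))
    if [ "$eff_denominator" -le 0 ]; then
        echo "powersave duration must exceed performance duration" >&2
        return 1
    fi

    # (powersave - governor) x 100, clamped at 0 as in the quoted test.sh.
    eff_numerator=$(( (eff_powersave - eff_governor) * 100 ))
    if [ "$eff_numerator" -lt 0 ]; then
        eff_numerator=0
    fi

    echo $((eff_numerator / eff_denominator))
}

# Made-up durations for illustration only.
efficiency 2000 1000 1050
efficiency 2000 1000 1720
```

With these made-up inputs, a governor whose run phase takes almost as little time as under the performance governor scores close to 100, and one that stays close to the powersave duration scores close to 0. The integer division truncates, as `expr` does in the quoted script.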

10 years

eas-test branch ci failed

by Alex Shi

Hi Fathi, I just updated the eas-test branch on https://git.linaro.org/kernel/eas-backports.git, but the dashboard says that deploying the Linaro kernel failed: https://validation.linaro.org/dashboard/streams/public/team/linaro/eas/bund… I also could not find EAS on https://ci.linaro.org/view/kernel-ci/ or in lsk-ci. I guess I am missing something; could you point me to the right CI or testing URL? Thanks & Regards! Alex

10 years

Updates to cpu capacity consolidation patches in the backports tree

by Amit Kucheria

Hi Alex, In 4.1-rc1, several patches (see 36ee28e4 onwards) related to cpu capacity consolidation were merged. It would be a good idea to refresh the eas-backport tree so that these patches are cherry-picked directly from mainline into the stable/sched-upstream branch and their equivalent versions in stable/sched-core are removed. Regards, Amit

10 years
