- eas-dev - lists.linaro.org

[PATCH RFCv3 00/18] EASv5.2+: Power and Performance Optimization

by Leo Yan

This patch series optimizes both power and performance on big.LITTLE systems. Most of the optimization methodology is the same as in RFCv2, so please refer to the detailed description in [1].

The new performance enhancement in this series is to spread tasks across more clusters when the highest-capacity cores are detected to be busy. The criterion for "big core busy" is that the big core is not idle. This works because the patch "sched/fair: avoid small task to migrate to higher capacity CPU" ensures that only tasks with a relatively big load migrate to big cores, so if a task is running on a big core, that core's utilization is not small. If all big cores have tasks running, the system is usually quite busy, so we fall back to selecting the idlest CPU instead of following "want_affine". This is mainly done by the patches "sched/fair: avoid migrate single task to busy big CPU" and "sched/fair: select idle CPUs when big cluster is busy".

This patch series also optimizes power for both PELT and WALT signals; this is done by the patch "sched/fair: save power for when use walt signals".

[1] https://lists.linaro.org/pipermail/eas-dev/2016-July/000522.html

Leo Yan (18):
  sched/fair: optimize to more chance to select previous CPU
  sched/fair: select CPU based on using lowest capacity
  sched/fair: support to spread task in lowest schedule domain
  sched/fair: add path to migrate to higher capacity CPU
  sched/fair: force idle balance when busiest group is overloaded
  sched/fair: avoid small task to migrate to higher capacity CPU
  sched/fair: set imbalance for too many tasks on rq
  sched/fair: kick nohz idle balance for misfit task
  sched/fair: consider over utilized only for CPU is not idle
  sched/fair: filter task for energy aware path
  sched/fair: replace capacity_of by capacity_orig_of
  sched/fair: refine when task is allowed only run one CPU
  Documentation: EAS performance tunning for sysfs
  sched/fair: avoid migrate single task to busy big CPU
  sched/fair: fix building error for schedtune_task_margin
  sched/fair: save power for when use walt signals
  sched/fair: check task boosted value on destination CPU
  sched/fair: select idle CPUs when big cluster is busy

 Documentation/scheduler/sched-energy.txt |  87 ++++++++++
 kernel/sched/fair.c                      | 286 ++++++++++++++++++++++++++++---
 2 files changed, 348 insertions(+), 25 deletions(-)

--
1.9.1
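
The fallback described in the cover letter above can be sketched roughly as follows. This is only a standalone illustration under stated assumptions, not the code from the patches: `struct cpu_state`, `all_big_cpus_busy()` and `select_target_cpu()` are hypothetical names, and "busy" is modelled simply as "not idle", as the cover letter states.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU snapshot; this is not a kernel structure. */
struct cpu_state {
	bool is_big;		/* belongs to the highest-capacity cluster */
	bool is_idle;		/* no task currently running */
	unsigned int util;	/* current utilization, arbitrary units */
};

/* "Big cluster busy" as described above: every big CPU is non-idle. */
static bool all_big_cpus_busy(const struct cpu_state *cpus, size_t n)
{
	bool seen_big = false;

	for (size_t i = 0; i < n; i++) {
		if (!cpus[i].is_big)
			continue;
		seen_big = true;
		if (cpus[i].is_idle)
			return false;
	}
	return seen_big;
}

/*
 * If the big cluster is busy, ignore the affine (previous) CPU and pick an
 * idle CPU, or failing that the least utilized one; otherwise keep prev_cpu.
 */
static size_t select_target_cpu(const struct cpu_state *cpus, size_t n,
				size_t prev_cpu)
{
	size_t best = prev_cpu;

	if (!all_big_cpus_busy(cpus, n))
		return prev_cpu;

	for (size_t i = 0; i < n; i++) {
		if (cpus[i].is_idle)
			return i;
		if (cpus[i].util < cpus[best].util)
			best = i;
	}
	return best;
}
```

With two little CPUs (one idle) and two non-idle big CPUs, a task whose previous CPU is a big one is redirected to the idle little CPU; as soon as one big CPU goes idle, the sketch keeps the previous CPU again.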

9 years, 8 months

1
18
0 0

[RFC PATCH v1 0/3] sched: Introduce Window Assisted Load Tracking

by markivx＠codeaurora.org

This patch series implements an alternative window assisted load tracking mechanism in lieu of PELT based cpu utilization tracking. Testing has shown that a window based non-decaying metric such as WALT guiding cpu frequency and task placement decisions can improve performance/power especially when running workloads more commonly found on mobile devices.

The aim of this series is to incorporate WALT accounting into the scheduler and feed WALT statistics to schedutil in order to guide cpu frequency selection. The implementation is detailed in the commit text of Patch 1. The eventual goal is to also guide placement decisions based on WALT statistics.

WALT has existed in out-of-tree kernels for ARM/ARM64 commercialized devices for a few years. This is an effort to bring WALT to mainline as well as to test on multiple architectures and with varied workloads.

This RFC version is mainly to preview what the code will look like on mainline. Future RFC revisions will include a theoretical discussion and benchmark results.

Tested on an Intel x86_64 machine (on top of 4.7-rc6). (Benchmark results will be sent out separately and as part of this message in the next RFC version).

Patch 1: Adds WALT tracking to the scheduler
Patches 2-3: Temporary patches to bring in EAS/sched-freq like capacity table and to use Intel PMC counters for more accurate frequency invariant load tracking on X86. Included for completeness but not meant for merging.

 include/linux/sched.h            |  35 ++++++++++
 include/linux/sched/sysctl.h     |   2 +
 include/trace/events/sched.h     |  76 +++++++++++++++++++++
 init/Kconfig                     |   9 +++
 kernel/sched/Makefile            |   1 +
 kernel/sched/core.c              |  29 ++++++++-
 kernel/sched/cpufreq_schedutil.c |  44 ++++++++++++-
 kernel/sched/cputime.c           |  11 +++-
 kernel/sched/debug.c             |  10 +++
 kernel/sched/fair.c              |   7 +-
 kernel/sched/sched.h             |  13 ++++
 kernel/sched/walt.c              | 580 ++++++++++++++++++++++++++++++++++
 kernel/sched/walt.h              |  75 +++++++++++++++++++++
 kernel/sysctl.c                  |  18 +++++
 14 files changed, 904 insertions(+), 6 deletions(-)

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project
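
To make the idea of a "window based non-decaying metric" more concrete, here is a minimal sketch of window accounting. It is not taken from the posted kernel/sched/walt.c: the window length, the history depth, the "maximum of recent windows" policy and all names (`window_stats`, `roll_windows`, `account_busy`, `window_util`) are assumptions made for illustration only.

```c
#include <stdint.h>

#define WINDOW_NS	20000000ULL	/* assumed 20 ms window */
#define NUM_WINDOWS	5		/* assumed history depth */

struct window_stats {
	uint64_t window_start;		/* start of the current window, ns */
	uint64_t curr_busy;		/* busy time accumulated in it, ns */
	uint64_t hist[NUM_WINDOWS];	/* busy time of completed windows */
	unsigned int idx;		/* next history slot to overwrite */
};

/* Close every window that ended at or before 'now' (time is monotonic). */
static void roll_windows(struct window_stats *ws, uint64_t now)
{
	while (now - ws->window_start >= WINDOW_NS) {
		ws->hist[ws->idx] = ws->curr_busy;
		ws->idx = (ws->idx + 1) % NUM_WINDOWS;
		ws->curr_busy = 0;
		ws->window_start += WINDOW_NS;
	}
}

/* Account a busy interval [start, end), splitting it at window boundaries. */
static void account_busy(struct window_stats *ws, uint64_t start, uint64_t end)
{
	roll_windows(ws, start);
	while (start < end) {
		uint64_t window_end = ws->window_start + WINDOW_NS;
		uint64_t chunk_end = end < window_end ? end : window_end;

		ws->curr_busy += chunk_end - start;
		start = chunk_end;
		roll_windows(ws, start);
	}
}

/* Utilization estimate on a 0..1024 scale: max over the recorded windows. */
static unsigned int window_util(const struct window_stats *ws)
{
	uint64_t max = 0;

	for (unsigned int i = 0; i < NUM_WINDOWS; i++)
		if (ws->hist[i] > max)
			max = ws->hist[i];
	return (unsigned int)(max * 1024 / WINDOW_NS);
}
```

In this sketch a fully busy window reports 1024 as soon as it completes, and that value does not shrink gradually afterwards: it only disappears when the window drops out of the history. A geometrically decayed average such as PELT behaves differently on both the ramp-up and the ramp-down side.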

9 years, 8 months

1
0
0 0

[RFC PATCH v1 0/3] sched: Introduce Window Assisted Load Tracking to track CPU utilization

by markivx＠codeaurora.org

This patch series implements an alternative window assisted load tracking mechanism in lieu of PELT based cpu utilization tracking. Testing has shown that a window based non-decaying metric such as WALT guiding cpu frequency and task placement decisions can improve performance/power especially when running workloads more commonly found on mobile devices.

The aim of this series is to incorporate WALT accounting into the scheduler and feed WALT statistics to schedutil in order to guide cpu frequency selection. The implementation is detailed in the commit text of Patch 1. The eventual goal is to also guide placement decisions based on WALT statistics.

WALT has existed in out-of-tree kernels for ARM/ARM64 commercialized devices for a few years. This is an effort to bring WALT to mainline as well as to test on multiple architectures and with varied workloads.

This RFC version is mainly to preview what the code will look like on mainline. Future RFC revisions will include a theoretical discussion and benchmark results.

Tested on an Intel x86_64 machine (on top of 4.7-rc6). (Benchmark results will be sent out separately and as part of this message in the next RFC version).

Patch 1: Adds WALT tracking to the scheduler
Patches 2 and 3: Temporary patches to bring in EAS/sched-freq like capacity table and to use Intel PMC counters for more accurate frequency invariant load tracking on X86. Included for completeness but not meant for merging.

 include/linux/sched.h            |  35 ++++++++++
 include/linux/sched/sysctl.h     |   2 +
 include/trace/events/sched.h     |  76 +++++++++++++++++++++
 init/Kconfig                     |   9 +++
 kernel/sched/Makefile            |   1 +
 kernel/sched/core.c              |  29 ++++++++-
 kernel/sched/cpufreq_schedutil.c |  44 ++++++++++++-
 kernel/sched/cputime.c           |  11 +++-
 kernel/sched/debug.c             |  10 +++
 kernel/sched/fair.c              |   7 +-
 kernel/sched/sched.h             |  13 ++++
 kernel/sched/walt.c              | 580 ++++++++++++++++++++++++++++++++++
 kernel/sched/walt.h              |  75 +++++++++++++++++++++
 kernel/sysctl.c                  |  18 +++++
 14 files changed, 904 insertions(+), 6 deletions(-)

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project

9 years, 8 months

1
0
0 0

[PATCH RFCv2 0/8] EASv5.2: Performance Optimization

by Leo Yan

This patch series optimizes performance and refines the patches according to review comments.

- Patch 0001 gives a task more chance to select its previous CPU, which is likely to be cache hot.

- In the EAS code, the critical path is task wakeup through energy_aware_wake_cpu(); this function selects the candidate CPU that saves the most energy. It covers two underlying functions: the first is to select the most power-efficient CPU for the task within one cluster; the second is to migrate the task from a big core to a little core if the little core can meet the performance requirement.

  For the first function, EAS prefers a non-idle CPU, so it packs tasks onto one CPU as far as possible. This is not optimal for two reasons: first, it introduces long scheduling latency when multiple tasks share the same rq; second, it easily ends up packing small tasks onto one CPU running at a higher operating point. This is the foremost issue observed with multiple tasks: neither power nor performance reaches an optimal result. Patch 0002 addresses this by trying to select a CPU that can stay at the lowest possible OPP.

- The current code has no mechanism to spread tasks throughout the little cluster, so tasks are packed on one CPU as long as that CPU is not "over-utilized". In this case only one CPU is very busy while the other CPUs in the same cluster are idle. Patch 0003 spreads tasks in the lowest schedule domain (at cluster level) by adding an intermediate state named "half-utilized". This may be a temporary solution; a better solution is likely to unify it with the "over-utilized" flag.

- In CFS, PELT signals take a long time to rise to a high value and to decay to a small value; in addition, EAS does not take the load_avg value (runnable time) into account and focuses only on the util_avg value (running time). These issues therefore depend on the fundamental signals, and we hope for a more advanced method to accelerate PELT signals and remove the issue introduced by long runnable time. Patch 0004 can be taken as a temporary solution: when there is a big difference between load_avg and util_avg, it switches to the inflated value, which can also reflect runnable time.

  Patch 0004 also has a side effect on the misfit flag. If any CPU has a "misfit" task on it, EAS sets the imbalance value to the CPU capacity and migrates that load from the little core to a big core. So "misfit" works well when there is only one big task on the little CPU and the CPU cannot meet the task's performance requirement according to task_fits_max(p, rq->cpu); but if there are two tasks on the little CPU, each task's utilization is only about half of the CPU capacity, so EAS concludes that the CPU can meet the task's requirement. With patch 0004 the misfit flag is set to true more easily:

      rq->misfit_task = !task_fits_max(p, rq->cpu)

- In energy_aware_wake_cpu() it is possible to migrate a task directly from a little core to a big core, but the conditions are rigid: condition 1 is that the CPU capacity cannot meet the task's requirement; condition 2 is that the source CPU is "over-utilized". If the source CPU is not "over-utilized", then even if the little CPU cannot meet the task's requirement, EAS compares CPU energy and in the end still selects the previous little CPU. Patch 0005 adds an extra path to migrate a task directly from a little core to a big core.

- For very heavy multi-threaded workloads, we observed that tasks are not migrated within the big cluster, and tasks are also hard to migrate from the big cluster to the little cluster even when the little cluster has idle CPUs available. So EAS needs to handle this case, likely by falling back to CFS behaviour. Patches 0006 and 0008 fix these issues.

- SMP load balance may migrate a small task onto a big core at a time when we are only expecting big tasks to migrate; this hurts both power and performance. Patch 0007 prevents small tasks from migrating to a higher-capacity CPU, which gives real big tasks more chance to migrate there.

Leo Yan (8):
  sched/fair: optimize to more chance to select previous CPU
  sched/fair: select CPU based on using lowest capacity
  sched/fair: support to spread task in lowest schedule domain
  sched/fair: use load metrics to replace util when have big difference
  sched/fair: add path to migrate to higher capacity CPU
  sched/fair: force idle balance when busiest group is overloaded
  sched/fair: avoid small task to migrate to higher capacity CPU
  sched/fair: set imbalance for too many tasks on rq

 kernel/sched/fair.c | 193 ++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 173 insertions(+), 20 deletions(-)

--
1.9.1
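
The misfit case discussed in the cover letter above (one big task versus two tasks sharing a little CPU) can be checked with a few lines of arithmetic. The sketch below is an assumption-laden illustration: `task_fits()` is a hypothetical stand-in for the kernel's task_fits_max(), and both the 1280/1024 headroom margin and the little-CPU capacity of 430 are example values, not numbers from the patches.

```c
#include <stdbool.h>

#define CAPACITY_MARGIN	1280	/* assumed headroom factor: 1280 / 1024 */

/*
 * Hypothetical fit check: a task fits if its utilization, scaled by the
 * headroom margin, stays below the CPU capacity.
 */
static bool task_fits(unsigned long task_util, unsigned long cpu_capacity)
{
	return cpu_capacity * 1024 > task_util * CAPACITY_MARGIN;
}
```

With a capacity of 430, a single task with utilization 400 does not fit (400 * 1280 = 512000 exceeds 430 * 1024 = 440320), so it would be flagged as misfit. Two tasks that share the same CPU and each show a utilization of about 200 both fit (256000 is below 440320), so no misfit is reported even though the CPU is just as saturated. That is the gap the cover letter describes.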

9 years, 10 months

1
8
0 0

[PATCH RFC 0/8] EASv5.2: Optimize Performance

by Leo Yan

This patch series optimizes performance.

Patch 0001 optimizes the CPU selection flow so that a task has more chance to stay on its previous CPU.

Patch 0002 is a big change to EAS's CPU selection policy: it tries to select an idle CPU whenever possible. Profiling results show that 0002 is effective at spreading tasks out when many tasks are running at the same time.

Patches 0003~0004 optimize the single-thread scenario. In this case the thread has a relatively high utilization value, but the value cannot easily cross the tipping point. So patch 0004 sets criteria under which load_avg is used rather than util_avg, to boost the single thread.

Patch 0005 optimizes the flow for spreading tasks within the big cluster.

Patches 0006~0007 fix the avg_load signal.

Leo Yan (8):
  sched/fair: optimize to more chance to select previous CPU
  sched/fair: select idle CPU for waken up task
  sched/fair: add path to migrate to higher capacity CPU
  sched/fair: use load to replace util when have big difference
  sched/fair: spread tasks in cluster when over tipping point
  sched/fair: correct avg_load as CPU average load
  sched/fair: fix to calculate average load cross cluster
  sched/fair: set imbn to 1 for too many tasks on rq

 include/linux/sched.h |  1 +
 kernel/sched/fair.c   | 93 +++++++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 84 insertions(+), 10 deletions(-)

--
1.9.1

9 years, 10 months

6
44
0 0

SchedTune: define payoff parameters for platform

by Leo Yan

Hi Patrick,

[ + eas-dev ]

I have a general question about how to define the SchedTune threshold array for payoff. Basically I want to check the following:

- Every CGroup has its own perf_boost_idx for the PB region and perf_constrain_idx for the PC region. Do you have a suggestion or guideline for defining these indexes? And for different CGroups such as "background", "foreground" or "performance", should every CGroup have its own dedicated index, or can the platform share the same index value?

- How should the array values for "threshold_gains" be defined? IIUC this array is platform dependent, but what is a reasonable method to generate this table? Is there any suggested testing for generating it? Or is my understanding wrong and this array is fixed, so that adjusting perf_boost_idx/perf_constrain_idx per platform is enough?

- So far we cannot set these payoff parameters (perf_boost_idx/perf_constrain_idx and threshold_gains) dynamically from sysfs, so how can we initialize these values for a specific platform? I suppose for now we can only set them in the kernel's init flow, right?

Thanks,
Leo Yan
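
For readers unfamiliar with the parameters being asked about, the sketch below shows how an index into a gain table changes the payoff value. The payoff expression follows the one quoted in the "fix payoff calculation" thread in this archive (nrg_delta * cap_gain minus cap_delta * nrg_gain); everything else is an assumption: the table contents are invented example values, and `payoff_for()` is a hypothetical helper, not a SchedTune function.

```c
/* Hypothetical gain table; real values would be a platform decision. */
struct threshold_params {
	int nrg_gain;
	int cap_gain;
};

static const struct threshold_params threshold_gains[] = {
	{ 0, 5 },
	{ 1, 5 },
	{ 2, 5 },
	{ 5, 5 },
	{ 5, 2 },
	{ 5, 1 },
	{ 5, 0 },
};

/* Payoff for one (energy delta, capacity delta) pair at a given index. */
static int payoff_for(int nrg_delta, int cap_delta, int idx)
{
	int payoff = nrg_delta * threshold_gains[idx].cap_gain;

	payoff -= cap_delta * threshold_gains[idx].nrg_gain;
	return payoff;
}
```

For the same deltas (nrg_delta = 10, cap_delta = 20), index 0 of this example table gives a payoff of 50 while index 3 gives -50, which is why the choice of perf_boost_idx and perf_constrain_idx per CGroup matters as much as the table itself.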

9 years, 10 months

3
9
0 0

[PATCH v2] idlestat: Initialize CPU's idle state properly

by Leo Yan

In the current code the CPU's idle state cpufreq_pstates::idle is initialized to '-1'; only when the first "cpu_idle" event for the CPU is parsed is the idle state set to '0' or '1', corresponding to active or idle.

This causes an error in the P-state statistics: from the beginning of the trace to the first "cpu_idle" event the CPU's idle state is '-1', so cpu_change_pstate() always treats the call as the first update and discards the elapsed time. This introduces a very big error if the CPU is always running and never enters an idle state.

This patch fixes the issue by initializing each CPU's C-state and P-state:

- First gather every CPU's starting frequency and time stamp;
- Then gather each CPU's idle state from its first cpu_idle log: if the first cpu_idle state is '-1', the CPU has been in an idle state from the beginning; if it is any other value, the CPU is active.
- With this information, initialize every CPU's C-state and P-state before analysing the trace logs.

One thing to note: when a CPU is idle at the beginning we don't know the exact idle state, so we just assume idle state 0. This has little impact on the statistics, because idlestat usually wakes up all CPUs at the beginning, so it introduces only a very small deviation.

Signed-off-by: Leo Yan <leo.yan(a)linaro.org>
---
 tracefile_idlestat.c | 123 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 123 insertions(+)

diff --git a/tracefile_idlestat.c b/tracefile_idlestat.c
index 3430693..d0cd366 100644
--- a/tracefile_idlestat.c
+++ b/tracefile_idlestat.c
@@ -152,6 +152,127 @@ int load_text_data_line(char *buffer, struct cpuidle_datas *datas, char *format,
 
 	return get_wakeup_irq(datas, buffer);
 }
+
+/**
+ * init_cpu_idle_state - Init CPU's idle state according to first cpu_idle log.
+ * For a specific cpu_idle event, its state is '-1' then that means from the
+ * beginning the CPU is stayed on idle state; Otherwise means the CPU is active.
+ * So initilize per-CPU idle flag to get more accurate time.
+ *
+ * @datas: structure for P-state and C-state's statistics
+ * @f: the file handle of the idlestat trace file
+ */
+void init_cpu_idle_state(struct cpuidle_datas *datas, FILE *f)
+{
+	char buffer[BUFSIZE];
+	int state, cpu;
+	double time;
+	struct cpufreq_pstates *ps;
+
+	unsigned long *cpu_start_idle;
+	int *cpu_start_freq;
+	double cpu_start_time;
+
+	fseek(f, 0, SEEK_SET);
+
+	cpu_start_freq = malloc(sizeof(int) * datas->nrcpus);
+	for (cpu = 0; cpu < datas->nrcpus; cpu++)
+		cpu_start_freq[cpu] = 0xdeadbeef;
+
+	/*
+	 * Find the start time stamp and the CPU's frequency at beginning;
+	 * So we can use these info to add dummy info.
+	 */
+	while (fgets(buffer, BUFSIZE, f)) {
+
+		if (strstr(buffer, "cpu_frequency")) {
+			if (sscanf(buffer, TRACE_FORMAT, &time, &state, &cpu)
+			    != 3) {
+				fprintf(stderr, "warning: Unrecognized cpuidle "
+					"record. The result of analysis might "
+					"be wrong.\n");
+				return;
+			}
+		} else
+			continue;
+
+		if (cpu_start_freq[cpu] != 0xdeadbeef)
+			continue;
+
+		if (cpu == 0)
+			cpu_start_time = time;
+
+		cpu_start_freq[cpu] = state;
+
+		break;
+	}
+
+	/* After traverse file, reset offset */
+	fseek(f, 0, SEEK_SET);
+
+	/*
+	 * Find the CPU's idle state at beginning
+	 */
+	cpu_start_idle = malloc(sizeof(long) * datas->nrcpus);
+	for (cpu = 0; cpu < datas->nrcpus; cpu++)
+		cpu_start_idle[cpu] = 0xdeadbeef;
+
+	while (fgets(buffer, BUFSIZE, f)) {
+
+		if (strstr(buffer, "cpu_idle")) {
+			if (sscanf(buffer, TRACE_FORMAT, &time, &state, &cpu)
+			    != 3) {
+				fprintf(stderr, "warning: Unrecognized cpuidle "
+					"record. The result of analysis might "
+					"be wrong.\n");
+				return;
+			}
+		} else
+			continue;
+
+		/* CPU's state has been initialized, skip it */
+		if (cpu_start_idle[cpu] != 0xdeadbeef)
+			continue;
+
+		/*
+		 * The CPU's first cpu_idle is '-1', means CPU is staying in
+		 * idle state and exit from idle until first cpu_idle event.
+		 * Otherwise, means the CPU is active at beginning.
+		 */
+		if (state == -1)
+			cpu_start_idle[cpu] = 0;
+		else
+			cpu_start_idle[cpu] = 4294967295;
+	}
+
+	/* After traverse file, reset offset */
+	fseek(f, 0, SEEK_SET);
+
+	/* Initialize every CPU's cstate and pstate */
+	for (cpu = 0; cpu < datas->nrcpus; cpu++) {
+
+		ps = &(datas->pstates[cpu]);
+
+		if (cpu_start_idle[cpu] == 0) {
+			/*
+			 * CPU is idle at beginning, init cstate;
+			 *
+			 * here don't know exact idle state, so just assume CPU
+			 * is in idle state 0; but this will not impace too much
+			 * for statistics, due usually idlestat will wakeup all
+			 * CPUs at the beginning.
+			 */
+			ps->idle = 1;
+			store_data(cpu_start_time, 0, cpu, datas);
+		} else {
+			/* CPU is busy at beginning, init pstate */
+			ps->idle = 0;
+			cpu_change_pstate(datas, cpu, cpu_start_freq[cpu],
+					  cpu_start_time);
+		}
+	}
+}
+
 void load_text_data_lines(FILE *f, char *buffer, struct cpuidle_datas *datas)
 {
 	double begin = 0, end = 0;
@@ -159,6 +280,8 @@ void load_text_data_lines(FILE *f, char *buffer, struct cpuidle_datas *datas)
 
 	setup_topo_states(datas);
 
+	init_cpu_idle_state(datas, f);
+
 	do {
 		if (load_text_data_line(buffer, datas, TRACE_FORMAT,
 					&begin, &end, &start) != -1) {
--
1.9.1

9 years, 11 months

3
4
0 0

[PATCH v3] idlestat: Initialize CPU's idle state properly

by Leo Yan

In the current code the CPU's idle state cpufreq_pstates::idle is initialized to '-1'; only when the first "cpu_idle" event for the CPU is parsed is the idle state set to '0' or '1', corresponding to active or idle.

This causes an error in the P-state statistics: from the beginning of the trace to the first "cpu_idle" event the CPU's idle state is '-1', so cpu_change_pstate() always treats the call as the first update and discards the elapsed time. This introduces a very big error if the CPU is always running and never enters an idle state.

This patch fixes the issue by initializing each CPU's C-state and P-state:

- First gather every CPU's starting frequency and time stamp;
- Then gather each CPU's idle state from its first cpu_idle log: if the first cpu_idle state is '-1', the CPU has been in an idle state from the beginning; if it is any other value, the CPU is active.
- With this information, initialize every CPU's C-state and P-state before analysing the trace logs.

One thing to note: when a CPU is idle at the beginning we don't know the exact idle state, so we just assume idle state 0. This has little impact on the statistics, because idlestat usually wakes up all CPUs at the beginning, so it introduces only a very small deviation.

Signed-off-by: Leo Yan <leo.yan(a)linaro.org>
---
 tracefile_idlestat.c | 152 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 152 insertions(+)

diff --git a/tracefile_idlestat.c b/tracefile_idlestat.c
index 3430693..2674478 100644
--- a/tracefile_idlestat.c
+++ b/tracefile_idlestat.c
@@ -152,6 +152,153 @@ int load_text_data_line(char *buffer, struct cpuidle_datas *datas, char *format,
 
 	return get_wakeup_irq(datas, buffer);
 }
+
+/**
+ * init_cpu_idle_state - Init CPU's idle state according to first cpu_idle log.
+ * For a specific cpu_idle event, its state is '-1' then that means from the
+ * beginning the CPU is stayed on idle state; Otherwise means the CPU is active.
+ * So initilize per-CPU idle flag to get more accurate time.
+ *
+ * @datas: structure for P-state and C-state's statistics
+ * @f: the file handle of the idlestat trace file
+ */
+int init_cpu_idle_state(struct cpuidle_datas *datas, FILE *f)
+{
+	char buffer[BUFSIZE];
+	int state, cpu;
+	double time;
+	struct cpufreq_pstates *ps;
+
+	unsigned long *cpu_start_idle;
+	int *cpu_start_freq;
+	double cpu_start_time;
+
+	int ret;
+
+	ret = fseek(f, 0, SEEK_SET);
+	if (ret < 0) {
+		fprintf(stderr, "failed to set the start file position\n");
+		return ret;
+	}
+
+	cpu_start_freq = malloc(sizeof(int) * datas->nrcpus);
+	if (!cpu_start_freq) {
+		fprintf(stderr, "failed to alloc for start frequency states\n");
+		return -1;
+	}
+
+	for (cpu = 0; cpu < datas->nrcpus; cpu++)
+		cpu_start_freq[cpu] = 0xdeadbeef;
+
+	/*
+	 * Find the start time stamp and the CPU's frequency at beginning;
+	 * So we can use these info to add dummy info.
+	 */
+	while (fgets(buffer, BUFSIZE, f)) {
+
+		if (strstr(buffer, "cpu_frequency")) {
+			if (sscanf(buffer, TRACE_FORMAT, &time, &state, &cpu)
+			    != 3) {
+				fprintf(stderr, "warning: Unrecognized cpuidle "
+					"record. The result of analysis might "
+					"be wrong.\n");
+				return -1;
+			}
+		} else
+			continue;
+
+		if (cpu_start_freq[cpu] != 0xdeadbeef)
+			continue;
+
+		if (cpu == 0)
+			cpu_start_time = time;
+
+		cpu_start_freq[cpu] = state;
+
+		break;
+	}
+
+	/* After traverse file, reset offset */
+	ret = fseek(f, 0, SEEK_SET);
+	if (ret < 0) {
+		fprintf(stderr, "failed to set the start file position\n");
+		return ret;
+	}
+
+	/*
+	 * Find the CPU's idle state at beginning
+	 */
+	cpu_start_idle = malloc(sizeof(long) * datas->nrcpus);
+	if (!cpu_start_idle) {
+		fprintf(stderr, "failed to alloc for start idle states\n");
+		return -1;
+	}
+
+	for (cpu = 0; cpu < datas->nrcpus; cpu++)
+		cpu_start_idle[cpu] = 0xdeadbeef;
+
+	while (fgets(buffer, BUFSIZE, f)) {
+
+		if (strstr(buffer, "cpu_idle")) {
+			if (sscanf(buffer, TRACE_FORMAT, &time, &state, &cpu)
+			    != 3) {
+				fprintf(stderr, "warning: Unrecognized cpuidle "
+					"record. The result of analysis might "
+					"be wrong.\n");
+				return -1;
+			}
+		} else
+			continue;
+
+		/* CPU's state has been initialized, skip it */
+		if (cpu_start_idle[cpu] != 0xdeadbeef)
+			continue;
+
+		/*
+		 * The CPU's first cpu_idle is '-1', means CPU is staying in
+		 * idle state and exit from idle until first cpu_idle event.
+		 * Otherwise, means the CPU is active at beginning.
+		 */
+		if (state == -1)
+			cpu_start_idle[cpu] = 0;
+		else
+			cpu_start_idle[cpu] = 4294967295;
+	}
+
+	/* After traverse file, reset offset */
+	ret = fseek(f, 0, SEEK_SET);
+	if (ret < 0) {
+		fprintf(stderr, "failed to set the start file position\n");
+		return ret;
+	}
+
+	/* Initialize every CPU's cstate and pstate */
+	for (cpu = 0; cpu < datas->nrcpus; cpu++) {
+
+		ps = &(datas->pstates[cpu]);
+
+		if (cpu_start_idle[cpu] == 0) {
+			/*
+			 * CPU is idle at beginning, init cstate;
+			 *
+			 * here don't know exact idle state, so just assume CPU
+			 * is in idle state 0; but this will not impace too much
+			 * for statistics, due usually idlestat will wakeup all
+			 * CPUs at the beginning.
+			 */
+			ps->idle = 1;
+			store_data(cpu_start_time, 0, cpu, datas);
+		} else {
+			/* CPU is busy at beginning, init pstate */
+			ps->idle = 0;
+			cpu_change_pstate(datas, cpu, cpu_start_freq[cpu],
+					  cpu_start_time);
+		}
+	}
+
+	return 0;
+}
+
 void load_text_data_lines(FILE *f, char *buffer, struct cpuidle_datas *datas)
 {
 	double begin = 0, end = 0;
@@ -159,6 +306,11 @@ void load_text_data_lines(FILE *f, char *buffer, struct cpuidle_datas *datas)
 
 	setup_topo_states(datas);
 
+	if (init_cpu_idle_state(datas, f) < 0) {
+		fprintf(stderr, "failed to initlized cpu states\n");
+		exit(-1);
+	}
+
 	do {
 		if (load_text_data_line(buffer, datas, TRACE_FORMAT,
 					&begin, &end, &start) != -1) {
--
1.9.1
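
The rule the commit message describes (a CPU whose first cpu_idle event has state '-1' must have been idle when tracing started) can be isolated from the file parsing. The following is a simplified sketch of that rule only: `struct idle_event`, `STATE_UNKNOWN` and `infer_start_idle()` are hypothetical names, and events are passed in as an array rather than read with the TRACE_FORMAT scanning used in the patch.

```c
#include <stddef.h>

#define STATE_UNKNOWN	(-2)	/* hypothetical marker: no event seen yet */

struct idle_event {
	int cpu;
	int state;	/* -1: exit from idle, >= 0: idle state entered */
};

/*
 * Fill start_idle[cpu] with 1 if the CPU was idle when tracing started,
 * 0 if it was active, or STATE_UNKNOWN if the trace has no event for it.
 */
static void infer_start_idle(const struct idle_event *ev, size_t nr_events,
			     int *start_idle, int nr_cpus)
{
	for (int cpu = 0; cpu < nr_cpus; cpu++)
		start_idle[cpu] = STATE_UNKNOWN;

	for (size_t i = 0; i < nr_events; i++) {
		int cpu = ev[i].cpu;

		if (cpu < 0 || cpu >= nr_cpus)
			continue;
		/* Only the first event per CPU matters. */
		if (start_idle[cpu] != STATE_UNKNOWN)
			continue;
		/* An "exit idle" as first event means the CPU started idle. */
		start_idle[cpu] = (ev[i].state == -1) ? 1 : 0;
	}
}
```

Using a dedicated "unknown" marker and a plain int array is a deliberate simplification; it also sidesteps the mixed signed and unsigned sentinel comparisons in the patch, but it is only one possible way to structure this.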

9 years, 11 months

1
0
0 0

subscribe

by Zhifei Yang

subscribe

9 years, 11 months

1
0
0 0

[PATCH] sched/tune: Fix building when CGROUPS not enable

by Jon Medhurst (Tixy)

If we enable CONFIG_SCHED_TUNE without CONFIG_CGROUPS we get the following errors:

kernel/sched/fair.c: In function 'energy_diff_evaluate':
kernel/sched/fair.c:4795:2: error: implicit declaration of function 'schedtune_normalize_energy' [-Werror=implicit-function-declaration]
  nrg_delta = schedtune_normalize_energy(eenv->nrg.diff);
  ^
kernel/sched/fair.c:4798:2: error: implicit declaration of function 'schedtune_accept_deltas' [-Werror=implicit-function-declaration]
  eenv->payoff = schedtune_accept_deltas(
  ^

Fix this by making sure the dummy version of these functions are defined if the real ones aren't.

Signed-off-by: Jon Medhurst <tixy(a)linaro.org>
---

This is another build fix for the 3.18 backport of EAS [1] but I'm not sure if the missing functions are actually meant to do something in the case when we have SCHED_TUNE without CGROUPS?

[1] http://git.linaro.org/arm/eas/kernel.git/shortlog/refs/heads/linux-3.18-eas…

 kernel/sched/tune.h | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/tune.h b/kernel/sched/tune.h
index da1f7b2..3410a1d 100644
--- a/kernel/sched/tune.h
+++ b/kernel/sched/tune.h
@@ -1,6 +1,3 @@
-
-#ifdef CONFIG_SCHED_TUNE
-
 #ifdef CONFIG_CGROUP_SCHEDTUNE
 
 int schedtune_cpu_boost(int cpu);
@@ -13,14 +10,7 @@ int schedtune_normalize_energy(int energy);
 int schedtune_accept_deltas(int nrg_delta, int cap_delta,
 		struct task_struct *task);
 
-#else /* CONFIG_CGROUP_SCHEDTUNE */
-
-#define schedtune_enqueue_task(task, cpu) do { } while (0)
-#define schedtune_dequeue_task(task, cpu) do { } while (0)
-
-#endif /* CONFIG_CGROUP_SCHEDTUNE */
-
-#else /* CONFIG_SCHED_TUNE */
+#else
 
 #define schedtune_enqueue_task(task, cpu) do { } while (0)
 #define schedtune_dequeue_task(task, cpu) do { } while (0)
@@ -28,4 +18,4 @@ int schedtune_accept_deltas(int nrg_delta, int cap_delta,
 #define schedtune_normalize_energy(energy) energy
 #define schedtune_accept_deltas(nrg_delta, cap_delta, task) nrg_delta
 
-#endif /* CONFIG_SCHED_TUNE */
+#endif
--
2.1.4
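
The pattern this fix relies on (callers always see either a real declaration or a pass-through stub) can be reproduced in a standalone file. The sketch below is illustrative only: it reuses the names schedtune_normalize_energy and energy_diff_evaluate from the error output above, but the function bodies are invented placeholders, and the "real" implementation shown in the first branch is not the kernel's.

```c
/* Build with or without -DCONFIG_CGROUP_SCHEDTUNE to exercise both paths. */
#ifdef CONFIG_CGROUP_SCHEDTUNE

/* Stand-in for the real implementation; the arithmetic is a placeholder. */
static int schedtune_normalize_energy(int energy)
{
	return energy / 2;
}

#else /* !CONFIG_CGROUP_SCHEDTUNE */

/* Pass-through stub, mirroring the fallback macros in the patch above. */
#define schedtune_normalize_energy(energy)	(energy)

#endif

/* Simplified caller: takes the energy difference directly as an int. */
static int energy_diff_evaluate(int nrg_diff)
{
	/* Compiles in both configurations: a definition always exists. */
	return schedtune_normalize_energy(nrg_diff);
}
```

Without the option defined, the stub passes the value through unchanged. The build error in the commit message arises when neither branch is visible to the caller, which is what collapsing the two #ifdef levels into one avoids.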

9 years, 11 months

4
11
0 0

[PATCH 0/3] Fixes for EAS backport to 3.18

by Jon Medhurst

Hi

I found some bugs when integrating the 3.18 EAS backport [1] into the LSK 3.18 based kernel I look after for ARM's Juno and Versatile Express boards. These patches are my fixes for those bugs; I don't know whether they are useful or relevant to other versions of EAS.

If they are OK, I guess I should at least add them to [1]?

----------------------------------------------------------------
Jon Medhurst (3):
  arm: Fix build error "conflicting types for 'scale_cpu_capacity'"
  arm: Fix #if/#ifdef mixup in topology.c
  sched/tune: Avoid null pointer dereference in schedtune_add_cluster_nrg

 arch/arm/include/asm/topology.h | 1 +
 arch/arm/kernel/topology.c      | 2 +-
 kernel/sched/tune.c             | 2 +-
 3 files changed, 3 insertions(+), 2 deletions(-)

[1] http://git.linaro.org/arm/eas/kernel.git/shortlog/refs/heads/linux-3.18-eas…

9 years, 11 months

5
11
0 0

Re: [Eas-dev] [PATCH] sched/tune: fix payoff calculation for performance constraint region

by toby

On 26 May 2016 17:44, eas-dev-request(a)lists.linaro.org wrote:
> Message: 1
> Date: Wed, 25 May 2016 18:53:28 +0100
> From: Patrick Bellasi <patrick.bellasi(a)arm.com>
> To: Leo Yan <leo.yan(a)linaro.org>
> Cc: eas-dev(a)lists.linaro.org
> Subject: Re: [Eas-dev] [PATCH] sched/tune: fix payoff calculation for
>     performance constraint region
> Message-ID: <20160525175328.GA15730@e105326-lin>
>
> Hi Leo,
> thanks a lot for reviewing SchedTune and pointing out this issue.
>
> Actually, going through the code I've noticed another big issue
> related to the definition of the acceptable regions, which I comment
> on inline below. Basically, with the current implementation we were
> getting a correct C region "by chance", while the acceptance for the
> B region was completely wrong.
>
> Attached is a new version of the patch; please have a look and let
> me know if you have any doubts and/or suggestions.
> If the patch is OK for you, let me also know if it's OK for you to add
> your sign-off.
>
> Cheers Patrick
>
> On 23-May 21:47, Leo Yan wrote:
>> On Fri, May 20, 2016 at 12:24:49AM +0800, Leo Yan wrote:
>>> When calculating the payoff criteria for the performance constraint
>>> region, the inequality formula is wrong:
>>>
>>>   cap_delta / nrg_delta > cap_gain / nrg_gain
>>>
>>> Here nrg_delta < 0, so multiplying both sides by nrg_delta inverts
>>> the inequality:
>>>
>>>   nrg_delta * cap_gain > cap_delta * nrg_gain
>>>
>>> So finally we get a unified formula for both the performance
>>> constraint region and the performance boost region. This patch
>>> unifies the calculation after fixing the inequality formula.
>>>
>>> Signed-off-by: Leo Yan <leo.yan(a)linaro.org>
>>> ---
>>>  kernel/sched/tune.c | 54 ++++++++++++++++++++++++++++------------------------
>>>  1 file changed, 29 insertions(+), 25 deletions(-)
>>>
>>> diff --git a/kernel/sched/tune.c b/kernel/sched/tune.c
>>> index 9d8eeb4..1da85a8 100644
>>> --- a/kernel/sched/tune.c
>>> +++ b/kernel/sched/tune.c
>>> @@ -58,36 +58,40 @@ __schedtune_accept_deltas(int nrg_delta, int cap_delta,
>>>                  int perf_boost_idx, int perf_constrain_idx)
>>>  {
>>>          int payoff = -INT_MAX;
>>> +        int idx = -1;
>>>
>>>          /* Performance Boost (B) region */
>>> -        if (nrg_delta > 0 && cap_delta > 0) {
>>> -                /*
>>> -                 * Evaluate "Performance Boost" vs "Energy Increase"
>>> -                 * payoff criteria:
>>> -                 *    cap_delta / nrg_delta < cap_gain / nrg_gain
>>> -                 * which is:
>>> -                 *    nrg_delta * cap_gain > cap_delta * nrg_gain
>>> -                 */
>>> -                payoff = nrg_delta * threshold_gains[perf_boost_idx].cap_gain;
>>> -                payoff -= cap_delta * threshold_gains[perf_boost_idx].nrg_gain;
>>> -                return payoff;
>>> -        }
>>> -
>>> +        if (nrg_delta > 0 && cap_delta > 0)
>
> Looking at this more closely, I think it's worth also accepting
> points on the P axis, thus:
>
>     if (nrg_delta >= 0 && cap_delta > 0)
>
>>> +                idx = perf_boost_idx;
>>>          /* Performance Constraint (C) region */
>>> -        if (nrg_delta < 0 && cap_delta < 0) {
>>> -                /*
>>> -                 * Evaluate "Performance Boost" vs "Energy Increase"
>>> -                 * payoff criteria:
>>> -                 *    cap_delta / nrg_delta > cap_gain / nrg_gain
>>> -                 * which is:
>>> -                 *    cap_delta * nrg_gain > nrg_delta * cap_gain
>>> -                 */
>>> -                payoff = cap_delta * threshold_gains[perf_constrain_idx].nrg_gain;
>>> -                payoff -= nrg_delta * threshold_gains[perf_constrain_idx].cap_gain;
>>> -                return payoff;
>>> -        }
>>> +        else if (nrg_delta < 0 && cap_delta < 0)
>
> For the same reason we should also accept points on the E axis,
> thus:
>
>     else if (nrg_delta < 0 && cap_delta <= 0)
>
>>> +                idx = perf_constrain_idx;
>>>
>>>          /* Default: reject schedule candidate */
>>> +        if (idx == -1)
>>> +                return payoff;
>>> +
>>> +        /*
>>> +         * Evaluate "Performance Boost" vs "Energy Increase"
>>> +         *
>>> +         * - Performance Boost (B) region
>>> +         *
>>> +         *   Condition: nrg_delta > 0 && cap_delta > 0
>>> +         *   Payoff criteria:
>>> +         *     cap_delta / nrg_delta < cap_gain / nrg_gain =
>

It would be better to handle the "nrg_delta == 0" and "cap_delta == 0"
cases in the function "accept_deltas", which avoids taking the RCU lock
and making further function calls, thus:

    /* Optimal (O) region */
    if ((nrg_delta < 0 && cap_delta >= 0) ||
        (nrg_delta <= 0 && cap_delta > 0)) {
            trace_sched_tune_filter(nrg_delta, cap_delta, 0, 0, 1, 0);
            return INT_MAX;
    }

> Here the inequality has the wrong direction.
> The schedule candidates acceptable in the B region are those for which:
>
>     cap_gain / nrg_gain < cap_delta / nrg_delta
>
> which represents points in the "upper cut".
>
> Thus:
>>> +         *     nrg_delta * cap_gain > cap_delta * nrg_gain
>
> has to be:
>     cap_gain * nrg_delta < cap_delta * nrg_gain
>
> Which results in a "positive accept" payoff defined as:
>
>     payoff = (cap_delta * nrg_gain) - (cap_gain * nrg_delta)
>
>>> +         *     (note: nrg_delta > 0, nrg_gain > 0)
>>> +         *
>>> +         * - Performance Constraint (C) region
>>> +         *
>>> +         *   Condition: nrg_delta < 0 && cap_delta < 0
>>> +         *   payoff criteria:
>>> +         *     cap_delta / nrg_delta > cap_gain / nrg_gain =
>>> +         *     nrg_delta * cap_gain > cap_delta * nrg_gain
>
> In the C region we have both a wrong definition and the sign error you
> reported, which together happened to produce a "by chance" correct
> implementation, which is:
>
>     cap_gain / nrg_gain > cap_delta / nrg_delta =
>     cap_gain * nrg_delta < cap_delta * nrg_gain
>
> Which results in a "positive accept" payoff defined as:
>
>     payoff = (cap_delta * nrg_gain) - (cap_gain * nrg_delta)
>
> The same as for the B region...
>
>>> +         *     (note: nrg_delta < 0, nrg_gain > 0)
>>> +         */
>>> +        payoff = nrg_delta * threshold_gains[perf_boost_idx].cap_gain;
>>> +        payoff -= cap_delta * threshold_gains[perf_boost_idx].nrg_gain;
>>
>> Sorry, here should be:
>> +        payoff = nrg_delta * threshold_gains[idx].cap_gain;
>> +        payoff -= cap_delta * threshold_gains[idx].nrg_gain;
>
> ... which means that these two operations have to be inverted:
>
>     payoff = cap_delta * threshold_gains[gain_idx].nrg_gain;
>     payoff -= nrg_delta * threshold_gains[gain_idx].cap_gain;
>
>>
>>>          return payoff;
>>>  }
>>>
>>> --
>>> 1.9.1
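The sign argument running through this thread can be written out explicitly. The derivation below is only a sketch: it assumes nrg_gain > 0 (as the patch comments note), writes the deltas as \Delta and the gains as G, and uses the acceptance conditions that Patrick states for each region. In both regions the ratio inequality is multiplied by \Delta_{nrg} G_{nrg}, which is positive in the B region and negative in the C region, so the direction flips only in the C region.

```latex
\begin{align*}
\text{B region } (\Delta_{nrg} > 0,\ \Delta_{cap} > 0):\quad
  \frac{\Delta_{cap}}{\Delta_{nrg}} > \frac{G_{cap}}{G_{nrg}}
  &\iff \Delta_{cap}\, G_{nrg} > G_{cap}\, \Delta_{nrg} \\
\text{C region } (\Delta_{nrg} < 0,\ \Delta_{cap} < 0):\quad
  \frac{\Delta_{cap}}{\Delta_{nrg}} < \frac{G_{cap}}{G_{nrg}}
  &\iff \Delta_{cap}\, G_{nrg} > G_{cap}\, \Delta_{nrg} \\
\text{accept when}\quad
  \text{payoff} &= \Delta_{cap}\, G_{nrg} - G_{cap}\, \Delta_{nrg} > 0
\end{align*}
```

Both regions therefore reduce to the same payoff expression, which is the unification the patch is aiming for.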

9 years, 11 months

[PATCH] sched/tune: fix payoff calculation for performance constraint region

by Leo Yan

When calculating the payoff criteria for the performance constraint
region, the inequality formula is wrong:

  cap_delta / nrg_delta > cap_gain / nrg_gain

Here nrg_delta < 0, so multiplying both sides by nrg_delta inverts
the inequality:

  nrg_delta * cap_gain > cap_delta * nrg_gain

So finally we get a unified formula for both the performance
constraint region and the performance boost region. This patch
unifies the calculation after fixing the inequality formula.

Signed-off-by: Leo Yan <leo.yan(a)linaro.org>
---
 kernel/sched/tune.c | 54 ++++++++++++++++++++++++++++------------------------
 1 file changed, 29 insertions(+), 25 deletions(-)

diff --git a/kernel/sched/tune.c b/kernel/sched/tune.c
index 9d8eeb4..1da85a8 100644
--- a/kernel/sched/tune.c
+++ b/kernel/sched/tune.c
@@ -58,36 +58,40 @@ __schedtune_accept_deltas(int nrg_delta, int cap_delta,
                 int perf_boost_idx, int perf_constrain_idx)
 {
         int payoff = -INT_MAX;
+        int idx = -1;

         /* Performance Boost (B) region */
-        if (nrg_delta > 0 && cap_delta > 0) {
-                /*
-                 * Evaluate "Performance Boost" vs "Energy Increase"
-                 * payoff criteria:
-                 *    cap_delta / nrg_delta < cap_gain / nrg_gain
-                 * which is:
-                 *    nrg_delta * cap_gain > cap_delta * nrg_gain
-                 */
-                payoff = nrg_delta * threshold_gains[perf_boost_idx].cap_gain;
-                payoff -= cap_delta * threshold_gains[perf_boost_idx].nrg_gain;
-                return payoff;
-        }
-
+        if (nrg_delta > 0 && cap_delta > 0)
+                idx = perf_boost_idx;
         /* Performance Constraint (C) region */
-        if (nrg_delta < 0 && cap_delta < 0) {
-                /*
-                 * Evaluate "Performance Boost" vs "Energy Increase"
-                 * payoff criteria:
-                 *    cap_delta / nrg_delta > cap_gain / nrg_gain
-                 * which is:
-                 *    cap_delta * nrg_gain > nrg_delta * cap_gain
-                 */
-                payoff = cap_delta * threshold_gains[perf_constrain_idx].nrg_gain;
-                payoff -= nrg_delta * threshold_gains[perf_constrain_idx].cap_gain;
-                return payoff;
-        }
+        else if (nrg_delta < 0 && cap_delta < 0)
+                idx = perf_constrain_idx;

         /* Default: reject schedule candidate */
+        if (idx == -1)
+                return payoff;
+
+        /*
+         * Evaluate "Performance Boost" vs "Energy Increase"
+         *
+         * - Performance Boost (B) region
+         *
+         *   Condition: nrg_delta > 0 && cap_delta > 0
+         *   Payoff criteria:
+         *     cap_delta / nrg_delta < cap_gain / nrg_gain =
+         *     nrg_delta * cap_gain > cap_delta * nrg_gain
+         *     (note: nrg_delta > 0, nrg_gain > 0)
+         *
+         * - Performance Constraint (C) region
+         *
+         *   Condition: nrg_delta < 0 && cap_delta < 0
+         *   payoff criteria:
+         *     cap_delta / nrg_delta > cap_gain / nrg_gain =
+         *     nrg_delta * cap_gain > cap_delta * nrg_gain
+         *     (note: nrg_delta < 0, nrg_gain > 0)
+         */
+        payoff = nrg_delta * threshold_gains[perf_boost_idx].cap_gain;
+        payoff -= cap_delta * threshold_gains[perf_boost_idx].nrg_gain;
         return payoff;
 }

--
1.9.1

9 years, 11 months

Re: [Eas-dev] Lisa Testcase - two levels' task group

by Leo Yan

Hi Dietmar,

[ + eas-dev ]

On Sun, May 22, 2016 at 06:06:47PM +0100, Dietmar Eggemann wrote:
> On 05/21/2016 06:25 AM, Leo Yan wrote:
> >On Fri, May 20, 2016 at 08:08:16PM +0100, Dietmar Eggemann wrote:
> >>Hi Leo,
> >>
> >>try the attached testcase in LISA which runs the workloads in '/tg_1/tg_11'.
> >>You can create different tg level-hierarchies in rfc_tg.config if you wish.
> >>
> >>08:05:17 DEBUG : sudo -- sh -c '/root/devlib-target/bin/shutils cgroups_run_into /tg_1/tg_11 '\''/root/devlib-target/bin/rt-app /root/devlib-target/run_dir/06_pct_00.json'\'''
> >
> >I really appreciate the test case.
>
> I thought about this again and maybe you want to test task migration of
> a task running in a task group? This would make much more sense than
> only running a task in a task group in case you want to test the PELT signals.

After enabling EAS, I can see that a task running in a task group is migrated between different CPUs when it is woken up.

> I added some functionality to rt-app which lets you restrict the cpu
> affinity of a task per phase of its run so you can create a task inside
> a task group which alternates between two cpus while running. This
> migration is done by the running task (so it's
> sched_setaffinity()->__set_cpus_allowed_ptr()->stop_one_cpu(...,
> migration_cpu_stop, ...)->__migrate_task()->move_queued_task()
>
> So if you are interested in this just ask me on eas-dev so I can share the
> rt-app functionality and a how-to for building rt-app on the list for a
> broader audience.

Yes, this is another path we should test for task migration. So could you share this on the mailing list? We can also consider integrating this into rt-app's repo.

Thanks,
Leo Yan
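The self-migration path described above starts in user space with sched_setaffinity(). The following is a minimal sketch of a task that alternates its own affinity between two CPUs. It is not the rt-app feature mentioned in the thread; the helper names `single_cpu_mask()`, `migrate_self_to()` and `bounce_between()` are hypothetical, and the sketch assumes Linux with glibc.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Build a CPU mask that contains only `cpu`. */
static void single_cpu_mask(int cpu, cpu_set_t *set)
{
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(set);
    CPU_SET(cpu, set);
}

/*
 * Restrict the calling thread to `cpu`. If the thread is currently running
 * on another CPU, the kernel has to migrate it. Returns 0 on success and
 * -1 with errno set on failure, for example when `cpu` is offline or is
 * not allowed by the task's cpuset.
 */
static int migrate_self_to(int cpu)
{
    cpu_set_t set;

    single_cpu_mask(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Alternate between two CPUs `rounds` times; stop at the first error. */
static int bounce_between(int cpu_a, int cpu_b, int rounds)
{
    for (int i = 0; i < rounds; i++) {
        if (migrate_self_to((i % 2) ? cpu_b : cpu_a) != 0)
            return -1;
    }
    return 0;
}
```

A test program would call, say, `bounce_between(0, 1, 100)` from inside the task group and do some work between the migrations so that the load-tracking signals have something to carry across.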

9 years, 11 months

[PATCH] idlestat: Initialize CPU's idle state properly

by Leo Yan

In the current code the CPU's idle state, cpufreq_pstates::idle, is
initialized to '-1'; only when the first "cpu_idle" event for the CPU
is parsed is the idle state set to '0' or '1', corresponding to active
or idle.

This causes an error in the P-state statistics: from the beginning of
the trace to the first "cpu_idle" event the CPU's idle state is '-1',
so cpu_change_pstate() always treats the event as the first update and
discards the preceding time. This introduces a very large error if the
CPU is always running and never enters an idle state.

This patch fixes the issue by initializing the CPU's idle state before
parsing the P-state and C-state times. The idle state is initialized
according to the first cpu_idle log:

- If the CPU's first cpu_idle state is '-1', the CPU has been in an
  idle state from the beginning;
- If the CPU's first cpu_idle state is any other value, the CPU is
  active.

Signed-off-by: Leo Yan <leo.yan(a)linaro.org>
---
 tracefile_idlestat.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/tracefile_idlestat.c b/tracefile_idlestat.c
index 3430693..fb0b3d8 100644
--- a/tracefile_idlestat.c
+++ b/tracefile_idlestat.c
@@ -152,6 +152,55 @@ int load_text_data_line(char *buffer, struct cpuidle_datas *datas, char *format,
         return get_wakeup_irq(datas, buffer);
 }

+/**
+ * init_cpu_idle_state - Init CPU's idle state according to first cpu_idle log.
+ * For a specific cpu_idle event, its state is '-1' then that means from the
+ * beginning the CPU is stayed on idle state; Otherwise means the CPU is active.
+ * So initialize per-CPU idle flag to get more accurate time.
+ *
+ * @datas: structure for P-state and C-state's statistics
+ * @f: the file handle of the idlestat trace file
+ */
+void init_cpu_idle_state(struct cpuidle_datas *datas, FILE *f)
+{
+        struct cpufreq_pstates *ps;
+        unsigned int state, freq, cpu;
+        double time;
+        char buffer[BUFSIZE];
+
+        do {
+                if (strstr(buffer, "cpu_idle")) {
+                        if (sscanf(buffer, TRACE_FORMAT, &time, &state, &cpu)
+                            != 3) {
+                                fprintf(stderr, "warning: Unrecognized cpuidle "
+                                        "record. The result of analysis might "
+                                        "be wrong.\n");
+                                return -1;
+                        }
+                }
+
+                ps = &(datas->pstates[cpu]);
+
+                /* CPU's state has been initialized, skip it */
+                if (ps->idle != -1)
+                        continue;
+
+                /*
+                 * The CPU's first cpu_idle is '-1', means CPU is staying in
+                 * idle state and exit from idle until first cpu_idle event.
+                 * Otherwise, means the CPU is active at beginning.
+                 */
+                if (state == -1)
+                        ps->idle = 1;
+                else
+                        ps->idle = 0;
+
+        } while (fgets(buffer, BUFSIZE, f));
+
+        /* After traverse file, reset offset */
+        fseek(f, 0, SEEK_SET);
+}
+
 void load_text_data_lines(FILE *f, char *buffer, struct cpuidle_datas *datas)
 {
         double begin = 0, end = 0;
@@ -159,6 +208,8 @@ void load_text_data_lines(FILE *f, char *buffer, struct cpuidle_datas *datas)

         setup_topo_states(datas);

+        init_cpu_idle_state(datas, f);
+
         do {
                 if (load_text_data_line(buffer, datas, TRACE_FORMAT,
                                 &begin, &end, &start) != -1) {
--
1.9.1
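The pre-scan idea can be tried in isolation with the stand-alone sketch below. It is not idlestat code: `struct cpu_pstate_sketch` and `init_cpu_idle_state_sketch()` are hypothetical, and the line format it parses ("cpu_idle: state=<n> cpu_id=<n>", with the idle-exit state printed as 4294967295) is an assumption rather than idlestat's TRACE_FORMAT. Compared with the posted patch, the sketch reads a line before inspecting the buffer, ignores lines that are not cpu_idle events, and returns a count instead of returning a value from a void function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, reduced stand-in for idlestat's per-CPU P-state data. */
struct cpu_pstate_sketch {
    int idle;   /* -1: unknown, 0: active, 1: idle */
};

/*
 * Scan the whole trace once and set each CPU's initial idle flag from its
 * first cpu_idle event, then rewind the stream for the real parsing pass.
 * Returns the number of CPUs whose flag was initialised.
 */
static int init_cpu_idle_state_sketch(struct cpu_pstate_sketch *ps,
                                      unsigned int ncpus, FILE *f)
{
    char line[512];
    int initialised = 0;

    assert(ps != NULL && f != NULL);

    while (fgets(line, sizeof(line), f)) {
        const char *ev = strstr(line, "cpu_idle: state=");
        unsigned int state, cpu;

        if (!ev)
            continue;   /* not a cpu_idle event */
        if (sscanf(ev, "cpu_idle: state=%u cpu_id=%u", &state, &cpu) != 2)
            continue;   /* unrecognised record */
        if (cpu >= ncpus || ps[cpu].idle != -1)
            continue;   /* out of range, or already initialised */

        /* A first event of "exit idle" means the CPU started out idle. */
        ps[cpu].idle = (state == (unsigned int)-1) ? 1 : 0;
        initialised++;
    }

    rewind(f);
    return initialised;
}
```

The caller is expected to set every `idle` field to -1 before the scan, mirroring the initial value described in the commit message.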

10 years

Question: set SchedTune's CGroup with Android non-root user

by Leo Yan

Hi Patrick,

[ + eas-dev ]

As a non-root user on Android, I cannot add a PID to SchedTune's cgroup. At first I thought it was related to the permissions of the cgroup's file node, so I used the "root" user to change the permissions with "a+rwx"; even so, a non-root user still cannot write to the cgroup's node.

  hikey:/ $ su
  hikey:/ # chmod a+rwx /sys/fs/cgroup/stune/performance/cgroup.procs
  hikey:/ # exit
  hikey:/ $ echo 1937 > /sys/fs/cgroup/stune/performance/cgroup.procs
  hikey:/ $ cat /sys/fs/cgroup/stune/performance/cgroup.procs

Do you have a suggestion for the proper method of adding a PID to SchedTune's cgroup as a non-root user?

Thanks,
Leo Yan
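The shell session above does not show why the write had no effect. One way to narrow it down is to perform the write from a small program and look at the error code. The sketch below is only a diagnostic aid under that assumption: `write_pid_to_procs()` is a hypothetical helper, and it does not by itself answer the permission question.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Write `pid` to the cgroup.procs-style file at `path`.
 * Returns 0 on success or a negative errno value, so the caller can tell
 * a permission problem (-EACCES, -EPERM) from a missing file (-ENOENT).
 */
static int write_pid_to_procs(const char *path, long pid)
{
    char buf[32];
    int len, fd, ret = 0;
    ssize_t n;

    assert(path != NULL);

    len = snprintf(buf, sizeof(buf), "%ld\n", pid);

    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -errno;

    n = write(fd, buf, (size_t)len);
    if (n < 0)
        ret = -errno;   /* the kernel refused the write */
    else if (n != len)
        ret = -EIO;     /* short write */

    close(fd);
    return ret;
}
```

Called as `write_pid_to_procs("/sys/fs/cgroup/stune/performance/cgroup.procs", 1937)`, the return value distinguishes a failure at open() from a failure at write().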

10 years

[PATCH RFC] sched/fair: let cpu's cfs_rq to reflect task migration

by Leo Yan

When a task is migrated from CPU_A to CPU_B, the scheduler subtracts
the task's load/util from the cfs_rq it leaves and adds them to the
cfs_rq it migrates to. But if the kernel enables
CONFIG_FAIR_GROUP_SCHED, this cfs_rq is not the same as the CPU's
cfs_rq. As a result, after the task is migrated to CPU_B, CPU_A still
holds the task's stale load/util values, while CPU_B does not reflect
the new load/util introduced by the task.

So this patch also applies the task's load/util to the CPU's cfs_rq,
so that the CPU's cfs_rq really reflects the task's migration.

Signed-off-by: Leo Yan <leo.yan(a)linaro.org>
---
 kernel/sched/fair.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0fe30e6..10ca1a9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2825,12 +2825,24 @@ static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq);
 static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
 {
         struct sched_avg *sa = &cfs_rq->avg;
+        struct sched_avg *cpu_sa = NULL;
         int decayed, removed = 0;
+        int cpu = cpu_of(rq_of(cfs_rq));
+
+        if (&cpu_rq(cpu)->cfs != cfs_rq)
+                cpu_sa = &cpu_rq(cpu)->cfs.avg;

         if (atomic_long_read(&cfs_rq->removed_load_avg)) {
                 s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
                 sa->load_avg = max_t(long, sa->load_avg - r, 0);
                 sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
+
+                if (cpu_sa) {
+                        cpu_sa->load_avg = max_t(long, cpu_sa->load_avg - r, 0);
+                        cpu_sa->load_sum = max_t(s64,
+                                        cpu_sa->load_sum - r * LOAD_AVG_MAX, 0);
+                }
+
                 removed = 1;
         }

@@ -2838,6 +2850,12 @@ static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
                 long r = atomic_long_xchg(&cfs_rq->removed_util_avg, 0);
                 sa->util_avg = max_t(long, sa->util_avg - r, 0);
                 sa->util_sum = max_t(s32, sa->util_sum - r * LOAD_AVG_MAX, 0);
+
+                if (cpu_sa) {
+                        cpu_sa->util_avg = max_t(long, cpu_sa->util_avg - r, 0);
+                        cpu_sa->util_sum = max_t(s64,
+                                        cpu_sa->util_sum - r * LOAD_AVG_MAX, 0);
+                }
         }

         decayed = __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
@@ -2896,6 +2914,8 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)

 static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
+        int cpu = cpu_of(rq_of(cfs_rq));
+
         if (!sched_feat(ATTACH_AGE_LOAD))
                 goto skip_aging;

@@ -2919,6 +2939,13 @@ skip_aging:
         cfs_rq->avg.load_sum += se->avg.load_sum;
         cfs_rq->avg.util_avg += se->avg.util_avg;
         cfs_rq->avg.util_sum += se->avg.util_sum;
+
+        if (&cpu_rq(cpu)->cfs != cfs_rq) {
+                cpu_rq(cpu)->cfs.avg.load_avg += se->avg.load_avg;
+                cpu_rq(cpu)->cfs.avg.load_sum += se->avg.load_sum;
+                cpu_rq(cpu)->cfs.avg.util_avg += se->avg.util_avg;
+                cpu_rq(cpu)->cfs.avg.util_sum += se->avg.util_sum;
+        }
 }

 static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
--
1.9.1
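The effect the commit message describes can be illustrated with a small user-space model. This is a toy sketch, not kernel code: `struct cfs_rq_sketch`, `attach_sketch()` and `detach_sketch()` are hypothetical, only the `*_avg` fields are modelled, and removal is applied immediately instead of being deferred through `removed_load_avg` as in the kernel. Each group cfs_rq carries a pointer to its CPU's root cfs_rq, and every attach or detach on a group cfs_rq is mirrored on that root.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct avg_sketch {
    long load_avg;
    long util_avg;
};

struct cfs_rq_sketch {
    struct avg_sketch avg;
    struct cfs_rq_sketch *root;  /* the CPU's own cfs_rq; itself for a root */
};

/* Subtract without going below zero, like max_t(long, a - b, 0). */
static long sub_floor0(long a, long b)
{
    return a > b ? a - b : 0;
}

/* Add a task's contribution to its cfs_rq and mirror it on the CPU's root. */
static void attach_sketch(struct cfs_rq_sketch *cfs_rq, const struct avg_sketch *se)
{
    assert(cfs_rq != NULL && cfs_rq->root != NULL && se != NULL);

    cfs_rq->avg.load_avg += se->load_avg;
    cfs_rq->avg.util_avg += se->util_avg;
    if (cfs_rq->root != cfs_rq) {
        cfs_rq->root->avg.load_avg += se->load_avg;
        cfs_rq->root->avg.util_avg += se->util_avg;
    }
}

/* Remove a task's contribution from its cfs_rq and from the CPU's root. */
static void detach_sketch(struct cfs_rq_sketch *cfs_rq, const struct avg_sketch *se)
{
    assert(cfs_rq != NULL && cfs_rq->root != NULL && se != NULL);

    cfs_rq->avg.load_avg = sub_floor0(cfs_rq->avg.load_avg, se->load_avg);
    cfs_rq->avg.util_avg = sub_floor0(cfs_rq->avg.util_avg, se->util_avg);
    if (cfs_rq->root != cfs_rq) {
        cfs_rq->root->avg.load_avg = sub_floor0(cfs_rq->root->avg.load_avg, se->load_avg);
        cfs_rq->root->avg.util_avg = sub_floor0(cfs_rq->root->avg.util_avg, se->util_avg);
    }
}
```

Without the two `root` branches, detaching a task from CPU_A's group cfs_rq and attaching it to CPU_B's would leave both root cfs_rqs unchanged, which is the stale state the patch is trying to avoid.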

10 years, 1 month

[PATCH] DEBUG: sched: always use CPU's cfs_rq for load tracking

by Leo Yan

When logging the CPU's load and utilization, the CPU's cfs_rq should be
used directly for tracking. Using the task's cfs_rq may log wrong
values, because that can be the task_group's cfs_rq rather than the
real CPU's cfs_rq.

Signed-off-by: Leo Yan <leo.yan(a)linaro.org>
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6ac9ea3..26f3f2d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2825,7 +2825,7 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)

         if (entity_is_task(se))
                 trace_sched_load_avg_task(task_of(se), &se->avg);
-        trace_sched_load_avg_cpu(cpu, cfs_rq);
+        trace_sched_load_avg_cpu(cpu, &cpu_rq(cpu)->cfs);
 }

 static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
--
1.9.1

10 years, 1 month
