Hi Viresh,
On 06/07/2013 07:23 PM, Viresh Kumar wrote:
Hi Chanwoo,
On 5 June 2013 13:41, Chanwoo Choi cw00.choi@samsung.com wrote:
This patch add new sysfs file to show previous accumulated data of CPU load
Please mention we are only accumulating latest 20 values.
It should be modified so that user could enter a proper number of data for accumulating data. I'll fix it by using Kconfig.
as following path. This sysfs file is used to judge the correct system state or determine suitable system resource on user-space.
Please write it as:
load_table will be used to judge how many cpus would be sufficient for managing current load.
I define 'busy_cpu_threshold' as following way: - busy_cpu_threshold = (current cpu load * current cpu frequency ) / maximum cpu frequency and then users can define busy_cpu_threshold according to various cpu hotplug policy or environment dependent on h/w.
If busy_cpu_threshold is 30%, show each busy_cpu_threshold with cpu frequency:
Frequency | busy_cpu_threshold 1600MHz | 30% 1400MHz | 34% 1200MHz | 40% 1000MHz | 48% 800MHz | 60% 500MHz | 96%
So, I use 'busy_cpu_threshold' value to judge whether cpu is busy or not. If specific cpu exceeds defined busy_cpu_threshold, I determine to need additional cpu resource. and then turn on additional CPU.
For example, I expalin how to get a numer of busy cpu on current cpu load.
$ cat /sys/devices/system/cpu/cpu0/cpufreq/stats/load_table ...
1301500082290 800000 61 11 1 43
...
When 1301500082290 ns: cpu0's busy_cpu_threshold : 32 = 64 * (800000/1600000) cpu1's busy_cpu_threshold : 5.5 = 11 * (800000/1600000) cpu2's busy_cpu_threshold : 0 = 1 * (800000/1600000) cpu3's busy_cpu_threshold : 16 = 43 * (800000/1600000)
A number of busy cpu is only 1 because cpu0's busy_cpu_threshold(32) is larger than defined busy_cpu_threshold(30).
I can get a number of busy cpu through load_table sysfs.
- /sys/devices/system/cpu/cpu0/cpufreq/stats/load_table
This sysfs file include following data:
- Measurement point of time
- CPU frequency
- Per-CPU load
Signed-off-by: Chanwoo Choi cw00.choi@samsung.com Signed-off-by: Myungjoo Ham myungjoo.ham@samsung.com Signed-off-by: Kyungmin Park kyungmin.park@samsung.com
Additionally, I explain an example using 'load_table' sysfs entry.
Exynos4412 series has Quad-core and all cores share the power-line. I cann't set diffent voltage/frequency to each CPU. To reduce power- consumption, I certainly have to turn on/off CPU online state according to CPU load on runtime. As a result, I peridically need to monitor current cpu state to determine a proper amount of system resource(necessary number of online cpu) and to delete wasted power. So, I need 'load_table' sysfs file to monitor current cpu state.
I add a table which show power consumption of target based on Exynos4412 SoC. This table indicate the difference power-consumption according to a number of online core and with same number of running task.
[Environment of power estimation]
- cpufreq governor used performance mode to estimate power-consumption on each frequency step.
- Use infinite-loop test program which execute while statement infinitely.
- Always measure the power consumption under same temperature during 1 minutes.
- Unit is mA.
A number of Online core | Core 1 | Core 2 | Core 3 | Core 4 A number of nr_running | 0 1 | 0 1 2 | 0 1 2 3 | 0 1 2 3 4
CPU Frequency | 800 MHz | 293 330 | 295 338 379 | 300 339 386 433 | 303 341 390 412 482 1000 MHz | 312 371 | 316 377 435 | 318 383 454 522 | 322 391 462 526 596 1200 MHz | 323 404 | 328 418 504 | 336 423 521 615 | 343 433 499 639 748 1600 MHz | 380 525 | 388 556 771 | 399 575 771 1011 | 412 597 822 1172 1684
For example, The case A/B/C have the same condition except for a number of online core.
- case A: Online core is 2, 1000MHz and nr_running is 1 : 377mA
- case B: Online core is 3, 1000MHz and nr_running is 1 : 383mA
- case C: Online core is 4, 1000Mz and nr_running is 1 : 391mA
If system has only one running task, cpu hotplug policy, by monitoring cpu state through 'load_table' sysfs file on user-space, will determine 'case A' state for reducing power-consumption.
Show the result of reading 'load_table sysfs file as following:
- cpufreq governor is ondemand_org governor.
$ cat /sys/devices/system/cpu/cpu0/cpufreq/stats/load_table Time Frequency cpu0 cpu1 cpu2 cpu3 1300500043122 1600000 32 6 0 26 1300600079080 800000 63 10 2 45 1300700065288 800000 51 9 1 42 1300800228747 800000 51 9 1 43 1300900182997 800000 78 11 3 47 1301000106163 800000 96 26 6 48 1301100056247 1600000 45 7 1 27 1301200071373 1000000 55 9 1 37 1301300096082 800000 54 10 0 45 1301400132832 800000 70 11 2 46 1301500082290 800000 61 11 1 43 1301600236415 800000 61 9 2 43 1301700071498 800000 59 10 2 43 1301800159290 800000 55 9 1 42 1301900076332 800000 66 10 2 43 1302000102165 800000 47 9 0 43 1302100086623 800000 75 11 2 50 1302200101082 800000 66 10 4 46 1302300108873 800000 53 10 1 44 1302400373373 600000 63 12 1 54
How are you getting loads different for all your cpus? I believe you are just recording these values for policy->cpu and all cpus share same policy on your platform.
I got the Per-CPU load by using cpufreq_notify_transition(). when cpufreq governor call dbs_check_cpu().
Please let me know some opinion about this patch.
Best regards and Thanks, Chanwoo Choi
drivers/cpufreq/cpufreq.c | 4 +++ drivers/cpufreq/cpufreq_governor.c | 21 ++++++++++-- drivers/cpufreq/cpufreq_stats.c | 70 ++++++++++++++++++++++++++++++++++++++ include/linux/cpufreq.h | 6 ++++ 4 files changed, 99 insertions(+), 2 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 2d53f47..cbaaff0 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -292,6 +292,10 @@ void __cpufreq_notify_transition(struct cpufreq_policy *policy, if (likely(policy) && likely(policy->cpu == freqs->cpu)) policy->cur = freqs->new; break;
case CPUFREQ_LOADCHECK:
srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
CPUFREQ_LOADCHECK, freqs);
break; }
} /** diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index 5af40ad..bc50624 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -23,12 +23,17 @@ #include <linux/kernel_stat.h> #include <linux/mutex.h> #include <linux/slab.h> +#include <linux/sched.h> #include <linux/tick.h> #include <linux/types.h> #include <linux/workqueue.h>
#include "cpufreq_governor.h"
+#ifdef CONFIG_CPU_FREQ_STAT_DETAILS
struct cpufreq_freqs freqs;
+#endif
Why do you need this to be global?
I'll remove global variable and move 'freqs' in some structure.
static struct kobject *get_governor_parent_kobj(struct cpufreq_policy *policy) { if (have_governor_per_policy()) @@ -143,11 +148,17 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu) idle_time += jiffies_to_usecs(cur_nice_jiffies); }
if (unlikely(!wall_time || wall_time < idle_time))
if (unlikely(!wall_time || wall_time < idle_time)) {
+#ifdef CONFIG_CPU_FREQ_STAT_DETAILS
freqs.load[j] = 0;
+#endif continue;
} load = 100 * (wall_time - idle_time) / wall_time;
+#ifdef CONFIG_CPU_FREQ_STAT_DETAILS
freqs.load[j] = load;
+#endif if (dbs_data->cdata->governor == GOV_ONDEMAND) { int freq_avg = __cpufreq_driver_getavg(policy, j); if (freq_avg <= 0) @@ -160,6 +171,12 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu) max_load = load; }
+#ifdef CONFIG_CPU_FREQ_STAT_DETAILS
freqs.time = ktime_to_ns(ktime_get());
freqs.old = freqs.new = policy->cur;
cpufreq_notify_transition(policy, &freqs, CPUFREQ_LOADCHECK);
+#endif dbs_data->cdata->gov_check_cpu(cpu, max_load); } EXPORT_SYMBOL_GPL(dbs_check_cpu); diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c index fb65dec..2379b1d 100644 --- a/drivers/cpufreq/cpufreq_stats.c +++ b/drivers/cpufreq/cpufreq_stats.c @@ -22,6 +22,8 @@ #include <linux/notifier.h> #include <asm/cputime.h>
+#define CPUFREQ_LOAD_TABLE_MAX 20
static spinlock_t cpufreq_stats_lock;
struct cpufreq_stats { @@ -35,6 +37,10 @@ struct cpufreq_stats { unsigned int *freq_table; #ifdef CONFIG_CPU_FREQ_STAT_DETAILS unsigned int *trans_table;
struct cpufreq_freqs *load_table;
unsigned int load_last_index;
unsigned int load_max_index;
#endif };
@@ -131,6 +137,38 @@ static ssize_t show_trans_table(struct cpufreq_policy *policy, char *buf) return len; } cpufreq_freq_attr_ro(trans_table);
+static ssize_t show_load_table(struct cpufreq_policy *policy, char *buf) +{
struct cpufreq_stats *stat = per_cpu(cpufreq_stats_table, policy->cpu);
struct cpufreq_freqs *load_table = stat->load_table;
ssize_t len = 0;
int i;
int j;
merge above two lines.
OK.
len += sprintf(buf + len, "%11s %10s", "Time", "Frequency");
for (j = 0; j < NR_CPUS; j++)
len += sprintf(buf + len, " %3s%d", "cpu", j);
len += sprintf(buf + len, "\n");
i = stat->load_last_index;
do {
len += sprintf(buf + len, "%lld %9d",
load_table[i].time,
load_table[i].old);
for (j = 0; j < NR_CPUS; j++)
len += sprintf(buf + len, " %4d",
load_table[i].load[j]);
len += sprintf(buf + len, "\n");
if (++i == stat->load_max_index)
i = 0;
} while (i != stat->load_last_index);
You want/need some locking to protect addition to this list while we are reading from it?
OK, I'll consider locking method and implement it on next version patch.
return len;
+} +cpufreq_freq_attr_ro(load_table); #endif
cpufreq_freq_attr_ro(total_trans); @@ -141,6 +179,7 @@ static struct attribute *default_attrs[] = { &time_in_state.attr, #ifdef CONFIG_CPU_FREQ_STAT_DETAILS &trans_table.attr,
&load_table.attr,
#endif NULL }; @@ -167,6 +206,9 @@ static void cpufreq_stats_free_table(unsigned int cpu)
if (stat) { pr_debug("%s: Free stat table\n", __func__);
+#ifdef CONFIG_CPU_FREQ_STAT_DETAILS
kfree(stat->load_table);
+#endif kfree(stat->time_in_state); kfree(stat); per_cpu(cpufreq_stats_table, cpu) = NULL; @@ -244,6 +286,16 @@ static int cpufreq_stats_create_table(struct cpufreq_policy *policy,
#ifdef CONFIG_CPU_FREQ_STAT_DETAILS stat->trans_table = stat->freq_table + count;
stat->load_max_index = CPUFREQ_LOAD_TABLE_MAX;
stat->load_last_index = 0;
alloc_size = sizeof(struct cpufreq_freqs) * stat->load_max_index;
We aren't using this variable multiple times so get rid of it and also you need to do: sizeof(*stat->load_table).
OK, I'll fix it.
stat->load_table = kzalloc(alloc_size, GFP_KERNEL);
if (!stat->load_table) {
ret = -ENOMEM;
goto error_out;
}
#endif j = 0; for (i = 0; table[i].frequency != CPUFREQ_TABLE_END; i++) { @@ -312,13 +364,31 @@ static int cpufreq_stat_notifier_trans(struct notifier_block *nb, struct cpufreq_stats *stat; int old_index, new_index;
+#ifdef CONFIG_CPU_FREQ_STAT_DETAILS
if (val != CPUFREQ_POSTCHANGE && val != CPUFREQ_LOADCHECK)
+#else if (val != CPUFREQ_POSTCHANGE) +#endif return 0;
stat = per_cpu(cpufreq_stats_table, freq->cpu); if (!stat) return 0;
+#ifdef CONFIG_CPU_FREQ_STAT_DETAILS
if (val == CPUFREQ_LOADCHECK) {
struct cpufreq_freqs *dest_freq;
dest_freq = &stat->load_table[stat->load_last_index];
memcpy(dest_freq, freq, sizeof(struct cpufreq_freqs));
again sizeof()...
You don't need to copy full structure probably.
OK, I'll fix it.
if (++stat->load_last_index == stat->load_max_index)
stat->load_last_index = 0;
return 0;
}
+#endif
old_index = stat->last_index; new_index = freq_table_get_index(stat, freq->new);
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index 037d36a..f780645 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -140,12 +140,18 @@ static inline bool policy_is_shared(struct cpufreq_policy *policy) #define CPUFREQ_POSTCHANGE (1) #define CPUFREQ_RESUMECHANGE (8) #define CPUFREQ_SUSPENDCHANGE (9) +#define CPUFREQ_LOADCHECK (10)
struct cpufreq_freqs { unsigned int cpu; /* cpu nr */ unsigned int old; unsigned int new; u8 flags; /* flags of cpufreq_driver, see below. */
+#ifdef CONFIG_CPU_FREQ_STAT_DETAILS
int64_t time;
unsigned int load[NR_CPUS];
+#endif };
Other wise it looks good mostly.
Thanks for your comment.
PS: I have cc'd you for a patch of mine which will get rid of most of the CONFIG_*** you used in your code.. But wait for it to be applied to change your code accordingly..
OK, I will resend this patch after applied below patch. - [PATCH] cpufreq: stats: Remove CONFIG_CPU_FREQ_STAT_DETAILS
Best Regards, Chanwoo Choi