Hi Rafael,
I finally started looking at how to make cpufreq-core somewhat less complex with respect to locking, its state machine, etc..
The first thing I had to get into was cpufreq-stats and so this series. Mostly cleanups to make things better..
Please see if they make any sense at all.
I know we are at Merge window right now and you wouldn't apply it even to linux-next. So, if they look fine, please don't hesitate in dropping a mail to confirm they look fine, so that I can send the next set of cleanups atleast for reviews. We can get these applied one by one then, once we are a bit confident of them.
Pushed here: git://git.linaro.org/people/viresh.kumar/linux.git cpufreq/stats/cleanups
Viresh Kumar (13): cpufreq: stats: don't break strings into multiple lines cpufreq: stats: return -EEXIST when stats are already allocated cpufreq: stats: don't check for freq table while freeing stats cpufreq: stats: pass 'stat' to cpufreq_stats_update() cpufreq: stats: get rid of per-cpu cpufreq_stats_table cpufreq: Remove (now) unused 'last_cpu' from struct cpufreq_policy cpufreq: stats: remove cpufreq_stats_update_policy_cpu() cpufreq: remove CPUFREQ_UPDATE_POLICY_CPU notifications cpufreq: stats: create sysfs group once we are ready cpufreq: stats: don't update stats from show_trans_table() cpufreq: stats: don't update stats on false notifiers cpufreq: stats: Fix locking cpufreq: stats: call cpufreq_stats_update() with locks held
drivers/cpufreq/cpufreq.c | 6 -- drivers/cpufreq/cpufreq_stats.c | 140 ++++++++++++++++++---------------------- include/linux/cpufreq.h | 10 +-- 3 files changed, 69 insertions(+), 87 deletions(-)
The MODULE_DESCRIPTION() string is broken into multiple lines just to make checkpatch happy. Its less readable now and prone to errors. Fix it.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq_stats.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c index 0cd9b4dcef99..81be4d637ab4 100644 --- a/drivers/cpufreq/cpufreq_stats.c +++ b/drivers/cpufreq/cpufreq_stats.c @@ -374,8 +374,7 @@ static void __exit cpufreq_stats_exit(void) }
MODULE_AUTHOR("Zou Nan hai nanhai.zou@intel.com"); -MODULE_DESCRIPTION("'cpufreq_stats' - A driver to export cpufreq stats " - "through sysfs filesystem"); +MODULE_DESCRIPTION("'cpufreq_stats' - exports cpufreq stats through sysfs filesystem"); MODULE_LICENSE("GPL");
module_init(cpufreq_stats_init);
__cpufreq_stats_create_table() is called from: - policy create notifier: stats will always be NULL here - cpufreq_stats_init() calls it for all CPUs as cpufreq-stats can be initialized after cpufreq driver. Because CPUs share clock lines, 'stats' will be NULL here for the first cpu only and will return back for others.
While we return for other CPUs, we don't return the right error value. We must return -EEXIST, as that is the case here.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq_stats.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c index 81be4d637ab4..403671b1a5ee 100644 --- a/drivers/cpufreq/cpufreq_stats.c +++ b/drivers/cpufreq/cpufreq_stats.c @@ -192,8 +192,10 @@ static int __cpufreq_stats_create_table(struct cpufreq_policy *policy) if (unlikely(!table)) return 0;
+ /* stats already initialized */ if (per_cpu(cpufreq_stats_table, cpu)) - return -EBUSY; + return -EEXIST; + stat = kzalloc(sizeof(*stat), GFP_KERNEL); if ((stat) == NULL) return -ENOMEM;
While we allocate stats, we do need to check if freq-table is present or not as we need to use it then. But while freeing stats, all we need to know is if stats holds a valid pointer value. There is no use of testing if cpufreq table is present or not.
Don't check it.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq_stats.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c index 403671b1a5ee..62b6133f06aa 100644 --- a/drivers/cpufreq/cpufreq_stats.c +++ b/drivers/cpufreq/cpufreq_stats.c @@ -174,8 +174,7 @@ static void cpufreq_stats_free_table(unsigned int cpu) if (!policy) return;
- if (cpufreq_frequency_get_table(policy->cpu)) - __cpufreq_stats_free_table(policy); + __cpufreq_stats_free_table(policy);
cpufreq_cpu_put(policy); }
cpufreq_stats_update() needs only 'stat' pointer for its use and nothing else. Currently it works fine as we have a per-cpu variable for storing stats. But later commits would remove it and hence we better pass 'stat' to cpufreq_stats_update().
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq_stats.c | 13 +++++-------- 1 file changed, 5 insertions(+), 8 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c index 62b6133f06aa..ebd9e4c5c124 100644 --- a/drivers/cpufreq/cpufreq_stats.c +++ b/drivers/cpufreq/cpufreq_stats.c @@ -38,14 +38,11 @@ struct cpufreq_stats_attribute { ssize_t(*show) (struct cpufreq_stats *, char *); };
-static int cpufreq_stats_update(unsigned int cpu) +static int cpufreq_stats_update(struct cpufreq_stats *stat) { - struct cpufreq_stats *stat; - unsigned long long cur_time; + unsigned long long cur_time = get_jiffies_64();
- cur_time = get_jiffies_64(); spin_lock(&cpufreq_stats_lock); - stat = per_cpu(cpufreq_stats_table, cpu); if (stat->time_in_state) stat->time_in_state[stat->last_index] += cur_time - stat->last_time; @@ -70,7 +67,7 @@ static ssize_t show_time_in_state(struct cpufreq_policy *policy, char *buf) struct cpufreq_stats *stat = per_cpu(cpufreq_stats_table, policy->cpu); if (!stat) return 0; - cpufreq_stats_update(stat->cpu); + cpufreq_stats_update(stat); for (i = 0; i < stat->state_num; i++) { len += sprintf(buf + len, "%u %llu\n", stat->freq_table[i], (unsigned long long) @@ -88,7 +85,7 @@ static ssize_t show_trans_table(struct cpufreq_policy *policy, char *buf) struct cpufreq_stats *stat = per_cpu(cpufreq_stats_table, policy->cpu); if (!stat) return 0; - cpufreq_stats_update(stat->cpu); + cpufreq_stats_update(stat); len += snprintf(buf + len, PAGE_SIZE - len, " From : To\n"); len += snprintf(buf + len, PAGE_SIZE - len, " : "); for (i = 0; i < stat->state_num; i++) { @@ -313,7 +310,7 @@ static int cpufreq_stat_notifier_trans(struct notifier_block *nb, if (old_index == -1 || new_index == -1) return 0;
- cpufreq_stats_update(freq->cpu); + cpufreq_stats_update(stat);
if (old_index == new_index) return 0;
Because all CPUs in a policy share clock line, they share stats as well. But cpufreq-stats maintains a per-cpu list of tables for managing this. Get rid of it by assigning a special field for stats in 'struct cpufreq_policy'.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq_stats.c | 51 ++++++++++++++++++----------------------- include/linux/cpufreq.h | 3 +++ 2 files changed, 25 insertions(+), 29 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c index ebd9e4c5c124..106c89ccef30 100644 --- a/drivers/cpufreq/cpufreq_stats.c +++ b/drivers/cpufreq/cpufreq_stats.c @@ -31,8 +31,6 @@ struct cpufreq_stats { #endif };
-static DEFINE_PER_CPU(struct cpufreq_stats *, cpufreq_stats_table); - struct cpufreq_stats_attribute { struct attribute attr; ssize_t(*show) (struct cpufreq_stats *, char *); @@ -53,20 +51,17 @@ static int cpufreq_stats_update(struct cpufreq_stats *stat)
static ssize_t show_total_trans(struct cpufreq_policy *policy, char *buf) { - struct cpufreq_stats *stat = per_cpu(cpufreq_stats_table, policy->cpu); - if (!stat) - return 0; - return sprintf(buf, "%d\n", - per_cpu(cpufreq_stats_table, stat->cpu)->total_trans); + struct cpufreq_stats *stat = policy->stats_data; + + return sprintf(buf, "%d\n", stat->total_trans); }
static ssize_t show_time_in_state(struct cpufreq_policy *policy, char *buf) { + struct cpufreq_stats *stat = policy->stats_data; ssize_t len = 0; int i; - struct cpufreq_stats *stat = per_cpu(cpufreq_stats_table, policy->cpu); - if (!stat) - return 0; + cpufreq_stats_update(stat); for (i = 0; i < stat->state_num; i++) { len += sprintf(buf + len, "%u %llu\n", stat->freq_table[i], @@ -79,12 +74,10 @@ static ssize_t show_time_in_state(struct cpufreq_policy *policy, char *buf) #ifdef CONFIG_CPU_FREQ_STAT_DETAILS static ssize_t show_trans_table(struct cpufreq_policy *policy, char *buf) { + struct cpufreq_stats *stat = policy->stats_data; ssize_t len = 0; int i, j;
- struct cpufreq_stats *stat = per_cpu(cpufreq_stats_table, policy->cpu); - if (!stat) - return 0; cpufreq_stats_update(stat); len += snprintf(buf + len, PAGE_SIZE - len, " From : To\n"); len += snprintf(buf + len, PAGE_SIZE - len, " : "); @@ -150,8 +143,9 @@ static int freq_table_get_index(struct cpufreq_stats *stat, unsigned int freq)
static void __cpufreq_stats_free_table(struct cpufreq_policy *policy) { - struct cpufreq_stats *stat = per_cpu(cpufreq_stats_table, policy->cpu); + struct cpufreq_stats *stat = policy->stats_data;
+ /* Already freed */ if (!stat) return;
@@ -160,7 +154,7 @@ static void __cpufreq_stats_free_table(struct cpufreq_policy *policy) sysfs_remove_group(&policy->kobj, &stats_attr_group); kfree(stat->time_in_state); kfree(stat); - per_cpu(cpufreq_stats_table, policy->cpu) = NULL; + policy->stats_data = NULL; }
static void cpufreq_stats_free_table(unsigned int cpu) @@ -189,7 +183,7 @@ static int __cpufreq_stats_create_table(struct cpufreq_policy *policy) return 0;
/* stats already initialized */ - if (per_cpu(cpufreq_stats_table, cpu)) + if (policy->stats_data) return -EEXIST;
stat = kzalloc(sizeof(*stat), GFP_KERNEL); @@ -201,7 +195,7 @@ static int __cpufreq_stats_create_table(struct cpufreq_policy *policy) goto error_out;
stat->cpu = cpu; - per_cpu(cpufreq_stats_table, cpu) = stat; + policy->stats_data = stat;
cpufreq_for_each_valid_entry(pos, table) count++; @@ -236,7 +230,7 @@ static int __cpufreq_stats_create_table(struct cpufreq_policy *policy) sysfs_remove_group(&policy->kobj, &stats_attr_group); error_out: kfree(stat); - per_cpu(cpufreq_stats_table, cpu) = NULL; + policy->stats_data = NULL; return ret; }
@@ -259,14 +253,8 @@ static void cpufreq_stats_create_table(unsigned int cpu)
static void cpufreq_stats_update_policy_cpu(struct cpufreq_policy *policy) { - struct cpufreq_stats *stat = per_cpu(cpufreq_stats_table, - policy->last_cpu); - - pr_debug("Updating stats_table for new_cpu %u from last_cpu %u\n", - policy->cpu, policy->last_cpu); - per_cpu(cpufreq_stats_table, policy->cpu) = per_cpu(cpufreq_stats_table, - policy->last_cpu); - per_cpu(cpufreq_stats_table, policy->last_cpu) = NULL; + struct cpufreq_stats *stat = policy->stats_data; + stat->cpu = policy->cpu; }
@@ -293,15 +281,19 @@ static int cpufreq_stat_notifier_trans(struct notifier_block *nb, unsigned long val, void *data) { struct cpufreq_freqs *freq = data; + struct cpufreq_policy *policy = cpufreq_cpu_get(freq->cpu); struct cpufreq_stats *stat; int old_index, new_index;
if (val != CPUFREQ_POSTCHANGE) return 0;
- stat = per_cpu(cpufreq_stats_table, freq->cpu); - if (!stat) - return 0; + if (!policy) { + pr_err("%s: No policy found\n", __func__); + return -ENODEV; + } + + stat = policy->stats_data;
old_index = stat->last_index; new_index = freq_table_get_index(stat, freq->new); @@ -322,6 +314,7 @@ static int cpufreq_stat_notifier_trans(struct notifier_block *nb, #endif stat->total_trans++; spin_unlock(&cpufreq_stats_lock); + cpufreq_cpu_put(policy); return 0; }
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index 4d078cebafd2..3cb097b86d20 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -113,6 +113,9 @@ struct cpufreq_policy { wait_queue_head_t transition_wait; struct task_struct *transition_task; /* Task which is doing the transition */
+ /* cpufreq-stats data */ + void *stats_data; + /* For cpufreq driver's internal use */ void *driver_data; };
'last_cpu' was used only from cpufreq-stats and isn't used anymore. Get rid of it.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq.c | 3 --- include/linux/cpufreq.h | 2 -- 2 files changed, 5 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index a09a29c312a9..b88894578fd1 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1091,10 +1091,7 @@ static int update_policy_cpu(struct cpufreq_policy *policy, unsigned int cpu, }
down_write(&policy->rwsem); - - policy->last_cpu = policy->cpu; policy->cpu = cpu; - up_write(&policy->rwsem);
blocking_notifier_call_chain(&cpufreq_policy_notifier_list, diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index 3cb097b86d20..2d99f12e4199 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -66,8 +66,6 @@ struct cpufreq_policy { unsigned int shared_type; /* ACPI: ANY or ALL affected CPUs should set cpufreq */ unsigned int cpu; /* cpu nr of CPU managing this policy */ - unsigned int last_cpu; /* cpu nr of previous CPU that managed - * this policy */ struct clk *clk; struct cpufreq_cpuinfo cpuinfo;/* see above */
cpufreq_stats_update_policy_cpu() doesn't do any useful stuff anymore and must be removed. Along with it, also remove 'cpu' field from 'struct cpufreq_stats' as that isn't used as well.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq_stats.c | 14 -------------- 1 file changed, 14 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c index 106c89ccef30..f458fdbc5bfd 100644 --- a/drivers/cpufreq/cpufreq_stats.c +++ b/drivers/cpufreq/cpufreq_stats.c @@ -18,7 +18,6 @@ static spinlock_t cpufreq_stats_lock;
struct cpufreq_stats { - unsigned int cpu; unsigned int total_trans; unsigned long long last_time; unsigned int max_state; @@ -194,7 +193,6 @@ static int __cpufreq_stats_create_table(struct cpufreq_policy *policy) if (ret) goto error_out;
- stat->cpu = cpu; policy->stats_data = stat;
cpufreq_for_each_valid_entry(pos, table) @@ -251,24 +249,12 @@ static void cpufreq_stats_create_table(unsigned int cpu) cpufreq_cpu_put(policy); }
-static void cpufreq_stats_update_policy_cpu(struct cpufreq_policy *policy) -{ - struct cpufreq_stats *stat = policy->stats_data; - - stat->cpu = policy->cpu; -} - static int cpufreq_stat_notifier_policy(struct notifier_block *nb, unsigned long val, void *data) { int ret = 0; struct cpufreq_policy *policy = data;
- if (val == CPUFREQ_UPDATE_POLICY_CPU) { - cpufreq_stats_update_policy_cpu(policy); - return 0; - } - if (val == CPUFREQ_CREATE_POLICY) ret = __cpufreq_stats_create_table(policy); else if (val == CPUFREQ_REMOVE_POLICY)
CPUFREQ_UPDATE_POLICY_CPU notifications were used only from cpufreq-stats which doesn't use it anymore. Remove them.
This also fixes the value of other notification macros, hopefully all users are using macro's instead of direct values and nothing will break :)
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq.c | 3 --- include/linux/cpufreq.h | 5 ++--- 2 files changed, 2 insertions(+), 6 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index b88894578fd1..6d97dffeaaf7 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1094,9 +1094,6 @@ static int update_policy_cpu(struct cpufreq_policy *policy, unsigned int cpu, policy->cpu = cpu; up_write(&policy->rwsem);
- blocking_notifier_call_chain(&cpufreq_policy_notifier_list, - CPUFREQ_UPDATE_POLICY_CPU, policy); - return 0; }
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index 2d99f12e4199..43013eac26fd 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -368,9 +368,8 @@ static inline void cpufreq_resume(void) {} #define CPUFREQ_INCOMPATIBLE (1) #define CPUFREQ_NOTIFY (2) #define CPUFREQ_START (3) -#define CPUFREQ_UPDATE_POLICY_CPU (4) -#define CPUFREQ_CREATE_POLICY (5) -#define CPUFREQ_REMOVE_POLICY (6) +#define CPUFREQ_CREATE_POLICY (4) +#define CPUFREQ_REMOVE_POLICY (5)
#ifdef CONFIG_CPU_FREQ int cpufreq_register_notifier(struct notifier_block *nb, unsigned int list);
Userspace is free to read value of any file from cpufreq/stats directory once they are created. __cpufreq_stats_create_table() is creating the sysfs files first and then allocating resources for them. Though it would be quite difficult to trigger the racy situation here, but for the sake of keeping sensible code lets create sysfs entries only after we are ready to go.
This also does some makeup to the routine to make it look better.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq_stats.c | 44 +++++++++++++++++++++++------------------ 1 file changed, 25 insertions(+), 19 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c index f458fdbc5bfd..dd55d57655ce 100644 --- a/drivers/cpufreq/cpufreq_stats.c +++ b/drivers/cpufreq/cpufreq_stats.c @@ -171,12 +171,13 @@ static void cpufreq_stats_free_table(unsigned int cpu)
static int __cpufreq_stats_create_table(struct cpufreq_policy *policy) { - unsigned int i, count = 0, ret = 0; + unsigned int i = 0, count = 0, ret = -ENOMEM; struct cpufreq_stats *stat; unsigned int alloc_size; unsigned int cpu = policy->cpu; struct cpufreq_frequency_table *pos, *table;
+ /* We need cpufreq table for creating stats table */ table = cpufreq_frequency_get_table(cpu); if (unlikely(!table)) return 0; @@ -186,15 +187,10 @@ static int __cpufreq_stats_create_table(struct cpufreq_policy *policy) return -EEXIST;
stat = kzalloc(sizeof(*stat), GFP_KERNEL); - if ((stat) == NULL) + if (!stat) return -ENOMEM;
- ret = sysfs_create_group(&policy->kobj, &stats_attr_group); - if (ret) - goto error_out; - - policy->stats_data = stat; - + /* Find total allocation size */ cpufreq_for_each_valid_entry(pos, table) count++;
@@ -203,32 +199,42 @@ static int __cpufreq_stats_create_table(struct cpufreq_policy *policy) #ifdef CONFIG_CPU_FREQ_STAT_DETAILS alloc_size += count * count * sizeof(int); #endif - stat->max_state = count; + + /* Allocate memory for time_in_state/freq_table/trans_table in one go */ stat->time_in_state = kzalloc(alloc_size, GFP_KERNEL); - if (!stat->time_in_state) { - ret = -ENOMEM; - goto error_alloc; - } + if (!stat->time_in_state) + goto free_stat; + stat->freq_table = (unsigned int *)(stat->time_in_state + count);
#ifdef CONFIG_CPU_FREQ_STAT_DETAILS stat->trans_table = stat->freq_table + count; #endif - i = 0; + + stat->max_state = count; + + /* Find valid-unique entries */ cpufreq_for_each_valid_entry(pos, table) if (freq_table_get_index(stat, pos->frequency) == -1) stat->freq_table[i++] = pos->frequency; stat->state_num = i; + spin_lock(&cpufreq_stats_lock); stat->last_time = get_jiffies_64(); stat->last_index = freq_table_get_index(stat, policy->cur); spin_unlock(&cpufreq_stats_lock); - return 0; -error_alloc: - sysfs_remove_group(&policy->kobj, &stats_attr_group); -error_out: - kfree(stat); + + policy->stats_data = stat; + ret = sysfs_create_group(&policy->kobj, &stats_attr_group); + if (!ret) + return 0; + + /* We failed, release resources */ policy->stats_data = NULL; + kfree(stat->time_in_state); +free_stat: + kfree(stat); + return ret; }
cpufreq_stats_update() updates time_in_state and nothing else. It should ideally be updated only in two cases: - User requested for the current value of time_in_state. - We have switched states and so need to update time for the last state.
Currently, we are also doing this while user asks for the transition table of frequencies. It wouldn't do any harm, but no good as well. Its useless here.
Remove it.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq_stats.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c index dd55d57655ce..c0c2ddece8fe 100644 --- a/drivers/cpufreq/cpufreq_stats.c +++ b/drivers/cpufreq/cpufreq_stats.c @@ -77,7 +77,6 @@ static ssize_t show_trans_table(struct cpufreq_policy *policy, char *buf) ssize_t len = 0; int i, j;
- cpufreq_stats_update(stat); len += snprintf(buf + len, PAGE_SIZE - len, " From : To\n"); len += snprintf(buf + len, PAGE_SIZE - len, " : "); for (i = 0; i < stat->state_num; i++) {
No idea when 'old_index == new_index' in transition notifier (probably never?), but if this happens, we aren't supposed to update time_in_state by calling cpufreq_stats_update(). Its only relevant when we are changing frequencies.
So, move the call after matching indexes.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq_stats.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c index c0c2ddece8fe..7701532b32c8 100644 --- a/drivers/cpufreq/cpufreq_stats.c +++ b/drivers/cpufreq/cpufreq_stats.c @@ -293,11 +293,11 @@ static int cpufreq_stat_notifier_trans(struct notifier_block *nb, if (old_index == -1 || new_index == -1) return 0;
- cpufreq_stats_update(stat); - if (old_index == new_index) return 0;
+ cpufreq_stats_update(stat); + spin_lock(&cpufreq_stats_lock); stat->last_index = new_index; #ifdef CONFIG_CPU_FREQ_STAT_DETAILS
Locking wasn't handled properly at places and this patch is an attempt to fix it. Specially while creating/removing stat tables.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq_stats.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c index 7701532b32c8..e18735816908 100644 --- a/drivers/cpufreq/cpufreq_stats.c +++ b/drivers/cpufreq/cpufreq_stats.c @@ -149,10 +149,12 @@ static void __cpufreq_stats_free_table(struct cpufreq_policy *policy)
pr_debug("%s: Free stat table\n", __func__);
+ spin_lock(&cpufreq_stats_lock); sysfs_remove_group(&policy->kobj, &stats_attr_group); kfree(stat->time_in_state); kfree(stat); policy->stats_data = NULL; + spin_unlock(&cpufreq_stats_lock); }
static void cpufreq_stats_free_table(unsigned int cpu) @@ -181,13 +183,17 @@ static int __cpufreq_stats_create_table(struct cpufreq_policy *policy) if (unlikely(!table)) return 0;
+ spin_lock(&cpufreq_stats_lock); + /* stats already initialized */ - if (policy->stats_data) - return -EEXIST; + if (policy->stats_data) { + ret = -EEXIST; + goto unlock; + }
stat = kzalloc(sizeof(*stat), GFP_KERNEL); if (!stat) - return -ENOMEM; + goto unlock;
/* Find total allocation size */ cpufreq_for_each_valid_entry(pos, table) @@ -218,21 +224,21 @@ static int __cpufreq_stats_create_table(struct cpufreq_policy *policy) stat->freq_table[i++] = pos->frequency; stat->state_num = i;
- spin_lock(&cpufreq_stats_lock); stat->last_time = get_jiffies_64(); stat->last_index = freq_table_get_index(stat, policy->cur); - spin_unlock(&cpufreq_stats_lock);
policy->stats_data = stat; ret = sysfs_create_group(&policy->kobj, &stats_attr_group); if (!ret) - return 0; + goto unlock;
/* We failed, release resources */ policy->stats_data = NULL; kfree(stat->time_in_state); free_stat: kfree(stat); +unlock: + spin_unlock(&cpufreq_stats_lock);
return ret; }
On 18 December 2014 at 17:30, Viresh Kumar viresh.kumar@linaro.org wrote:
Locking wasn't handled properly at places and this patch is an attempt to fix it. Specially while creating/removing stat tables.
Because we were required to allocated memory from within locks, we had to use mutex here. So, just consider a mutex instead of spinlock everywhere in this file.
On 12/23/2014 01:06 AM, Viresh Kumar wrote:
On 18 December 2014 at 17:30, Viresh Kumar viresh.kumar@linaro.org wrote:
Locking wasn't handled properly at places and this patch is an attempt to fix it. Specially while creating/removing stat tables.
Because we were required to allocated memory from within locks, we had to use mutex here. So, just consider a mutex instead of spinlock everywhere in this file.
There is a subtle locking issue here that I want to make sure you're aware of (mostly because it has bitten me in the past a few times).
diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c index 7701532b32c8..e18735816908 100644 --- a/drivers/cpufreq/cpufreq_stats.c +++ b/drivers/cpufreq/cpufreq_stats.c @@ -149,10 +149,12 @@ static void __cpufreq_stats_free_table(struct cpufreq_policy *policy) pr_debug("%s: Free stat table\n", __func__);
- spin_lock(&cpufreq_stats_lock); sysfs_remove_group(&policy->kobj, &stats_attr_group);
^^^ this line is not guaranteed to have removed the group files by the time it returns. That is important in this code IMO, as we are adding and removing the stats entries rapidly. You may hit the dreaded "sysfs file already exists" WARNING if this is done too fast, or if sysfs is locked for some other reason and delays the removal of the group. In the worst case, it is possible you hit a NULL pointer due to stale data access (because you kfree and set to a NULL pointer below).
I've never found a good method to block this sort of add/remove sysfs file race from happening. Perhaps someone else may have a suggestion. In the past I've thought about adding a usage count to the sysfs file handlers themselves but that seems to lead to blocking on userspace access...
kfree(stat->time_in_state); kfree(stat); policy->stats_data = NULL;
- spin_unlock(&cpufreq_stats_lock);
}
There may not be an easy way around this ...
P.
On 23 December 2014 at 16:17, Prarit Bhargava prarit@redhat.com wrote:
sysfs_remove_group(&policy->kobj, &stats_attr_group);
^^^ this line is not guaranteed to have removed the group files by the time it
Strange .. Under what conditions can the above statement be true?
returns. That is important in this code IMO, as we are adding and removing the stats entries rapidly. You may hit the dreaded "sysfs file already exists" WARNING if this is done too fast, or if sysfs is locked for some other reason and delays the removal of the group. In the worst case, it is possible you hit a NULL pointer due to stale data access (because you kfree and set to a NULL pointer below).
Hmm, I have seen such stuff yet for sure. But anyway, that would be out of scope of this patch. As w or w/o locking, it is broken :)
I've never found a good method to block this sort of add/remove sysfs file race from happening. Perhaps someone else may have a suggestion. In the past I've thought about adding a usage count to the sysfs file handlers themselves but that seems to lead to blocking on userspace access...
Some more details and we can drag Greg-kh into this, but lets be sure of what we are asking..
kfree(stat->time_in_state); kfree(stat); policy->stats_data = NULL;
spin_unlock(&cpufreq_stats_lock);
}
There may not be an easy way around this ...
You were just completing the earlier horror story or is this a new comment which I couldn't understand ?
Which IRC do you hangout on? Just in case I need..
-- viresh
On 12/23/2014 05:55 AM, Viresh Kumar wrote:
On 23 December 2014 at 16:17, Prarit Bhargava prarit@redhat.com wrote:
sysfs_remove_group(&policy->kobj, &stats_attr_group);
^^^ this line is not guaranteed to have removed the group files by the time it
Strange .. Under what conditions can the above statement be true?
Suppose the following happens:
1. CPU A holds a file descriptor in userspace for one of the cpu stats files.
2. CPU B is removing the group
3. CPU B calls sysfs_remove_group, which does not remove the file because there is a use count on it (because CPU A holds the file open).
4. CPU B now brings up stats again ... and we hit the a collision in the sysfs space.
FWIW: I've hit this scenario in all types of hotplug (memory, pci, and cpu). I understand why the sysfs_group and sysfs_file remove functions do NOT block on use counts because it is entirely possible that userspace has done something braindead which would then prevent the kernel from making progress, or in the case of an error return (ex, -EBUSY) prevent hotplug from succeeding. I dunno if there is a way out of this :/
returns. That is important in this code IMO, as we are adding and removing the stats entries rapidly. You may hit the dreaded "sysfs file already exists" WARNING if this is done too fast, or if sysfs is locked for some other reason and delays the removal of the group. In the worst case, it is possible you hit a NULL pointer due to stale data access (because you kfree and set to a NULL pointer below).
Hmm, I have seen such stuff yet for sure. But anyway, that would be out of scope of this patch. As w or w/o locking, it is broken :)
Yeah, I agree ... I'm just pointing it out as a problem here. I'm simply trying to make you aware of the conditions on which this locking may fail.
I've never found a good method to block this sort of add/remove sysfs file race from happening. Perhaps someone else may have a suggestion. In the past I've thought about adding a usage count to the sysfs file handlers themselves but that seems to lead to blocking on userspace access...
Some more details and we can drag Greg-kh into this, but lets be sure of what we are asking..
kfree(stat->time_in_state); kfree(stat); policy->stats_data = NULL;
spin_unlock(&cpufreq_stats_lock);
}
There may not be an easy way around this ...
You were just completing the earlier horror story or is this a new comment which I couldn't understand ?
Sorry -- completing the earlier horror story :)
Which IRC do you hangout on? Just in case I need..
You can usually find me in freenode, #fedora-kernel. I'm *really* bad about marking myself away though ;) so don't be annoyed if I don't answer right away.
P.
-- viresh
cpufreq_stats_update() has only two callers, one of them is already taking locks and other one don't.
To make locking efficient for cpufreq_stat_notifier_trans(), lets call cpufreq_stats_update() from within locks. Also update show_time_in_state() to call cpufreq_stats_update() after taking locks.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq_stats.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c index e18735816908..955abb54f9ec 100644 --- a/drivers/cpufreq/cpufreq_stats.c +++ b/drivers/cpufreq/cpufreq_stats.c @@ -39,12 +39,10 @@ static int cpufreq_stats_update(struct cpufreq_stats *stat) { unsigned long long cur_time = get_jiffies_64();
- spin_lock(&cpufreq_stats_lock); if (stat->time_in_state) stat->time_in_state[stat->last_index] += cur_time - stat->last_time; stat->last_time = cur_time; - spin_unlock(&cpufreq_stats_lock); return 0; }
@@ -61,7 +59,10 @@ static ssize_t show_time_in_state(struct cpufreq_policy *policy, char *buf) ssize_t len = 0; int i;
+ spin_lock(&cpufreq_stats_lock); cpufreq_stats_update(stat); + spin_unlock(&cpufreq_stats_lock); + for (i = 0; i < stat->state_num; i++) { len += sprintf(buf + len, "%u %llu\n", stat->freq_table[i], (unsigned long long) @@ -302,9 +303,9 @@ static int cpufreq_stat_notifier_trans(struct notifier_block *nb, if (old_index == new_index) return 0;
+ spin_lock(&cpufreq_stats_lock); cpufreq_stats_update(stat);
- spin_lock(&cpufreq_stats_lock); stat->last_index = new_index; #ifdef CONFIG_CPU_FREQ_STAT_DETAILS stat->trans_table[old_index * stat->max_state + new_index]++;
On Thursday, December 18, 2014 05:30:17 PM Viresh Kumar wrote:
Hi Rafael,
I finally started looking at how to make cpufreq-core somewhat less complex with respect to locking, its state machine, etc..
The first thing I had to get into was cpufreq-stats and so this series. Mostly cleanups to make things better..
Please see if they make any sense at all.
I know we are at Merge window right now and you wouldn't apply it even to linux-next. So, if they look fine, please don't hesitate in dropping a mail to confirm they look fine, so that I can send the next set of cleanups atleast for reviews. We can get these applied one by one then, once we are a bit confident of them.
Well, the timing is pretty bad for various reasons. For one, this isn't 3.19 material, so I'm not even going to look at it tomorrow and I'm officially on vacation from Monday onward pretty much through the first half of January.
If I have some time and really nothing else to do, I'll have a look at it, but I can't promise anything.
It would be good if you could ask poeple to have a look at it and give comments in the meantime.
On 19 December 2014 at 07:45, Rafael J. Wysocki rjw@rjwysocki.net wrote:
On Thursday, December 18, 2014 05:30:17 PM Viresh Kumar wrote:
Hi Rafael,
I finally started looking at how to make cpufreq-core somewhat less complex with respect to locking, its state machine, etc..
The first thing I had to get into was cpufreq-stats and so this series. Mostly cleanups to make things better..
Please see if they make any sense at all.
I know we are at Merge window right now and you wouldn't apply it even to linux-next. So, if they look fine, please don't hesitate in dropping a mail to confirm they look fine, so that I can send the next set of cleanups atleast for reviews. We can get these applied one by one then, once we are a bit confident of them.
Well, the timing is pretty bad for various reasons. For one, this isn't 3.19 material, so I'm not even going to look at it tomorrow and I'm officially on vacation from Monday onward pretty much through the first half of January.
If I have some time and really nothing else to do, I'll have a look at it, but I can't promise anything.
It would be good if you could ask poeple to have a look at it and give comments in the meantime.
Prarit, any more comments about this series ?
Latest stuff is here: git://git.linaro.org/people/viresh.kumar/linux.git cpufreq/stats/cleanups
-- viresh
On 12/28/2014 11:41 PM, Viresh Kumar wrote:
On 19 December 2014 at 07:45, Rafael J. Wysocki rjw@rjwysocki.net wrote:
On Thursday, December 18, 2014 05:30:17 PM Viresh Kumar wrote:
Hi Rafael,
I finally started looking at how to make cpufreq-core somewhat less complex with respect to locking, its state machine, etc..
The first thing I had to get into was cpufreq-stats and so this series. Mostly cleanups to make things better..
Please see if they make any sense at all.
I know we are at Merge window right now and you wouldn't apply it even to linux-next. So, if they look fine, please don't hesitate in dropping a mail to confirm they look fine, so that I can send the next set of cleanups atleast for reviews. We can get these applied one by one then, once we are a bit confident of them.
Well, the timing is pretty bad for various reasons. For one, this isn't 3.19 material, so I'm not even going to look at it tomorrow and I'm officially on vacation from Monday onward pretty much through the first half of January.
If I have some time and really nothing else to do, I'll have a look at it, but I can't promise anything.
It would be good if you could ask poeple to have a look at it and give comments in the meantime.
Prarit, any more comments about this series ?
Latest stuff is here: git://git.linaro.org/people/viresh.kumar/linux.git cpufreq/stats/cleanups
Hey Viresh, seems fine to me. I do want to again point out the issue with the sysfs removal but fixing that type of issue is out-of-scope for this patchset.
Reviewed-by: Prarit Bhargava prarit@redhat.com
P.
-- viresh
linaro-kernel@lists.linaro.org