On 4 February 2013 20:35, Borislav Petkov <bp@alien8.de> wrote:
On Mon, Feb 04, 2013 at 07:51:33PM +0530, Viresh Kumar wrote:
We correlate things with CPUs rather than policies, so the current directory structure, cpu/cpu*/cpufreq/***, is the best-suited one.
Ok, show me the details of that layout. How is that going to look?
I don't have a board with me right now to take a snapshot, but it would look like this:
$ tree /sys/devices/system/cpu/cpu0/cpufreq/
/sys/devices/system/cpu/cpu0/cpufreq/
├── affected_cpus
├── bios_limit
├── cpb
├── cpuinfo_cur_freq
├── cpuinfo_max_freq
├── cpuinfo_min_freq
├── cpuinfo_transition_latency
├── related_cpus
├── scaling_available_frequencies
├── scaling_available_governors
├── scaling_cur_freq
├── scaling_driver
├── scaling_governor
├── scaling_max_freq
├── scaling_min_freq
├── scaling_setspeed
├── stats
│   ├── time_in_state
│   ├── total_trans
│   └── trans_table
└── ondemand
    ├── sampling_rate
    ├── up_threshold
    └── ignore_nice
etc.
One thing I've come to realize with the current interface is that if you want to change stuff, you need to iterate over all cpus instead of writing to a system-wide node.
Not really. This is how the cpu/cpu*/cpufreq directories are created:
For policy->cpu:

    ret = kobject_init_and_add(&policy->kobj, &ktype_cpufreq,
                               &dev->kobj, "cpufreq");

This creates the policy's cpufreq directory under policy->cpu.
For all the other CPUs in policy->cpus, we do:

    ret = sysfs_create_link(&cpu_dev->kobj, &policy->kobj, "cpufreq");

So whatever gets added to the cpu/cpu0/cpufreq directory (with cpu0 being policy->cpu) is reflected in all the other CPUs in policy->cpus.
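To make the effect of those two calls concrete, here is a minimal user-space model. It is not kernel code: the struct, the array and the model_* functions are hypothetical stand-ins for the policy kobject and the per-CPU cpufreq entries. A write made through one CPU's entry is visible through every other CPU whose entry resolves to the same policy object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model, NOT kernel code. It only mimics the
 * effect of the two calls above: policy->cpu owns the real cpufreq
 * directory (kobject_init_and_add()), and every other CPU in
 * policy->cpus gets a "cpufreq" link to that same object
 * (sysfs_create_link()). */
#define MODEL_NR_CPUS 4

struct model_policy {
	unsigned int cpu;              /* policy->cpu: owner of the real directory */
	unsigned int scaling_max_freq; /* one example attribute, in kHz */
};

/* cpufreq_entry[n] stands in for /sys/devices/system/cpu/cpuN/cpufreq */
static struct model_policy *cpufreq_entry[MODEL_NR_CPUS];

/* Attach CPUs first..last to policy p: the first CPU becomes
 * policy->cpu, and every attached CPU resolves to the same object. */
void model_attach(struct model_policy *p, unsigned int first, unsigned int last)
{
	assert(p != NULL && first <= last && last < MODEL_NR_CPUS);
	p->cpu = first;
	for (unsigned int n = first; n <= last; n++)
		cpufreq_entry[n] = p;
}

/* Write an attribute through any CPU's "cpufreq" entry. */
void model_write_max(unsigned int cpu, unsigned int khz)
{
	assert(cpu < MODEL_NR_CPUS && cpufreq_entry[cpu] != NULL);
	cpufreq_entry[cpu]->scaling_max_freq = khz;
}

/* Read an attribute through any CPU's "cpufreq" entry. */
unsigned int model_read_max(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS && cpufreq_entry[cpu] != NULL);
	return cpufreq_entry[cpu]->scaling_max_freq;
}
```

For example, after attaching cpu0-1 to one policy and cpu2-3 to another, a value written through cpu1 reads back through cpu0, while cpu2 still sees its own policy's value.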
And in this case, if you can and need to change the policy per clock domain, I wouldn't make it needlessly fine-grained per CPU.
That's why I'm advocating the cpu/cpufreq/ path.
It's already like this, i.e. per policy or clock domain. The other CPUs just have a link. That's why, in my code, I just add the governor directory in policy->cpu's cpufreq directory, and it gets reflected in the other CPUs of policy->cpus.
That's why I referred to P-states as policy tunables.
Hmm... I'm confused. Consider two systems:
- A dual-core system, with the cores sharing a clock.
- A dual-cluster system (two cores per cluster), with a separate clock per cluster.
Where would you keep the governor directories for each of these configurations?
Easy: as said above, make the policy granularity per clock domain. On systems which have only one set of P-states, as is the case with the overwhelming majority of systems running Linux now, nothing should change.
Currently it's not per policy: only a single instance of any governor is supported, and it lives in cpu/cpufreq. That's why I said earlier that it isn't the right place for the governor's directory. A governor instance is very much tied to a policy or clock domain.
We need to select only one layout, and cpu/cpufreq doesn't suit the second case at all: we need to use the ondemand governor on both clusters, but with separate tunables, so a single cpu/cpufreq/ondemand directory wouldn't solve the issue.
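The difference between a single global ondemand instance and per-policy instances can be sketched as follows. The struct and function names (od_tunables, cluster_policy, tuners_for()) are hypothetical, the field names simply follow the ondemand files in the tree listing, and the numeric values are arbitrary examples; this models the idea only and is not the code of this patch or of the kernel's ondemand governor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, NOT the actual patch. Field names follow the
 * ondemand files in the tree listing; all values are examples. */
struct od_tunables {
	unsigned int sampling_rate; /* microseconds */
	unsigned int up_threshold;  /* percent load */
	unsigned int ignore_nice;   /* 0 or 1 */
};

/* One policy per clock domain (here: one per cluster). */
struct cluster_policy {
	unsigned int cpu;          /* policy->cpu */
	struct od_tunables tuners; /* used only in the per-policy layout */
};

/* Current layout: a single instance behind cpu/cpufreq/ondemand,
 * shared by every policy in the system. */
static struct od_tunables global_tuners = { 20000, 80, 0 };

/* Return the tunables that govern policy p: the shared global
 * instance if per_policy is 0, the policy's own copy otherwise. */
struct od_tunables *tuners_for(struct cluster_policy *p, int per_policy)
{
	assert(p != NULL);
	return per_policy ? &p->tuners : &global_tuners;
}
```

With the global instance, changing up_threshold for one cluster changes it for the other as well; with per-policy instances, the two clusters keep independent values.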
Think of it this way: what is the highest granularity you need per clock domain? If you want to control the policy per clock domain, then cpu/cpufreq/ is what you want. If you want finer-grained control (and you need to think hard about which use cases are sensible for such a finer-grained solution), then you're better off with the cpu/cpu*/ layout.
I want to control it per clock domain, but I can't get that with cpu/cpufreq/, because policies don't have numbers assigned to them.
In both cases, though, having clear examples of why you've come up with the layout you're advocating would help reviewers a lot. If you simply come and say we need this because there might be systems out there that could use it, that is probably not going to get you very far.
So, I am working on ARM's big.LITTLE system, where we have two clusters: one of A15s and the other of A7s. Because of their different power ratings and performance figures, we need a separate set of ondemand tunables for each cluster. Hence this patch, though it is required for any multi-cluster system.
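From user space, the per-policy layout means one write per cluster rather than one per CPU. A minimal sketch, assuming the cpu/cpuN/cpufreq/ondemand/ layout shown in the tree listing and a hypothetical CPU numbering in which cpu0 is in the A15 cluster and cpu2 in the A7 cluster; od_tunable_path() and od_tunable_write() are made-up helper names for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the path of an ondemand tunable for the policy that owns `cpu`,
 * assuming the per-policy layout cpu/cpuN/cpufreq/ondemand/<name>.
 * Any CPU of a cluster works, because the other CPUs' cpufreq entries
 * are links to policy->cpu's directory.
 * Returns 0 on success, -1 on a bad name or a too-small buffer. */
int od_tunable_path(char *buf, size_t len, unsigned int cpu, const char *name)
{
	int n;

	assert(buf != NULL && name != NULL);
	if (name[0] == '\0' || strchr(name, '/') != NULL)
		return -1; /* reject empty names and anything containing '/' */
	n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%u/cpufreq/ondemand/%s",
		     cpu, name);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write one tunable. Needs root and a kernel exposing this layout.
 * Returns 0 on success, -1 on failure. */
int od_tunable_write(unsigned int cpu, const char *name, unsigned int value)
{
	char path[128];
	FILE *f;

	if (od_tunable_path(path, sizeof(path), cpu, name) != 0)
		return -1;
	f = fopen(path, "w");
	if (f == NULL)
		return -1;
	if (fprintf(f, "%u\n", value) < 0) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}
```

Under these assumptions, calling od_tunable_write(0, "up_threshold", ...) and od_tunable_write(2, "up_threshold", ...) with two different values would give the two clusters different thresholds; the values to use are a tuning decision and are not suggested here.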
-- viresh