Now that I have received an verbal Ack from Rob Herring (in a personal conversation) about the bindings, I am showing how the code looks like with these new bindings.
Some part is still now done: - Interface for adding new detailed OPPs from platform code instead of DT - Providing cpufreq helpers for the next OPPs - Providing regulator helpers for the target/min/max ranges
Please provide feedback on how this looks like..
-- viresh
Viresh Kumar (7): OPP: Redefine bindings to overcome shortcomings opp: Relocate few routines OPP: Break _opp_add_dynamic() into smaller functions opp: Parse new (v2) bindings opp: convert device_opp->dev to a list of devices opp: Add helpers for initializing CPU opps cpufreq-dt: Use DT to set policy->cpus/related_cpus
Documentation/devicetree/bindings/power/opp.txt | 407 ++++++++- drivers/base/power/opp.c | 1041 +++++++++++++++++------ drivers/cpufreq/cpufreq-dt.c | 16 +- include/linux/pm_opp.h | 23 + 4 files changed, 1232 insertions(+), 255 deletions(-)
Current OPP (Operating performance point) DT bindings are proven to be insufficient at multiple instances.
There had been multiple band-aid approaches to get them fixed (The latest one being: http://www.mail-archive.com/devicetree@vger.kernel.org/msg53398.html). For obvious reasons Rob rejected them and shown the right path forward.
The shortcomings we are trying to solve here:
- Getting clock sharing information between CPUs. Single shared clock vs independent clock per core vs shared clock per cluster.
- Support for turbo modes
- Support for intermediate frequencies or OPPs we can switch to.
- Other per OPP settings: transition latencies, disabled status, etc.?
- Expandability of OPPs in future.
This patch introduces new bindings "operating-points-v2" to get these problems solved. Refer to the bindings for more details.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- Documentation/devicetree/bindings/power/opp.txt | 407 +++++++++++++++++++++++- 1 file changed, 403 insertions(+), 4 deletions(-)
diff --git a/Documentation/devicetree/bindings/power/opp.txt b/Documentation/devicetree/bindings/power/opp.txt index 74499e5033fc..a64621819d7c 100644 --- a/Documentation/devicetree/bindings/power/opp.txt +++ b/Documentation/devicetree/bindings/power/opp.txt @@ -1,8 +1,407 @@ -* Generic OPP Interface +Generic OPP (Operating Performance Points) Bindings +----------------------------------------------------
-SoCs have a standard set of tuples consisting of frequency and -voltage pairs that the device will support per voltage domain. These -are called Operating Performance Points or OPPs. +Devices work at voltage-frequency pairs and some implementations have the +liberty of choosing these pairs. These pairs are called Operating Performance +Points aka OPPs. This document defines bindings for these OPPs applicable across +wide range of devices. For illustration purpose, this document uses CPU as a +device. + + +* Property: operating-points-v2 + +Devices supporting OPPs must set their "operating-points-v2" property with +phandle to a OPP descriptor in their DT node. The OPP core will use this phandle +to find the operating points for the device. + + +* OPP Descriptor Node + +This describes the OPPs belonging to a device. This node can have following +properties: + +Required properties: +- compatible: Allow OPPs to express their compatibility. It should be: + "operating-points-v2". +- OPP nodes: One or more OPP nodes describing frequency-voltage pairs. Their + name isn't significant but their phandles can be used to reference an OPP. + +Optional properties: +- shared-opp: Indicates that device nodes using this OPP descriptor's phandle + switch their DVFS state together, i.e. they share clock lines. Missing + property means devices have independent clock lines, but they share OPPs. + + +* OPP Node + +This defines frequency-voltage pairs along with other related properties. + +Required properties: +- opp-khz: Frequency in kHz + +Optional properties: +- opp-microvolt: voltage in micro Volts. Its an array with size one or three. + Single entry is for target voltage and three entries are for (in the specified + order) <target min max> voltage. +- clock-latency-ns: Specifies the maximum possible transition latency (in + nanoseconds) for switching to this OPP from any other OPP. +- opp-next: It contains a list of phandles of other OPPs, to which we can switch + directly from this OPP (Explained later with examples). Missing property means + no restriction on switching to other OPPs. +- turbo-mode: Marks the OPP to be used only for turbo modes. +- status: Marks the node enabled/disabled. + +Example 1: Single cluster Dual-core ARM cortex A9, switch DVFS states together. + +/ { + cpus { + #address-cells = <1>; + #size-cells = <0>; + + cpu@0 { + compatible = "arm,cortex-a9"; + reg = <0>; + next-level-cache = <&L2>; + clocks = <&clk_controller 0>; + clock-names = "cpu"; + opp-supply = <&cpu_supply0>; + operating-points-v2 = <&cpu0_opp>; + }; + + cpu@1 { + compatible = "arm,cortex-a9"; + reg = <1>; + next-level-cache = <&L2>; + clocks = <&clk_controller 0>; + clock-names = "cpu"; + opp-supply = <&cpu_supply0>; + operating-points-v2 = <&cpu0_opp>; + }; + }; + + cpu0_opp: opp0 { + compatible = "operating-points-v2"; + shared-opp; + + entry00 { + opp-khz = <1000000>; + opp-microvolt = <970000 975000 985000>; + clock-latency-ns = <300000>; + }; + entry01 { + opp-khz = <1100000>; + opp-microvolt = <980000 1000000 1010000>; + clock-latency-ns = <310000>; + }; + entry02 { + opp-khz = <1200000>; + opp-microvolt = <1025000>; + clock-latency-ns = <290000>; + turbo-mode; + }; + }; +}; + +Example 2: Single cluster, Quad-core Qualcom-krait, switches DVFS states +independently. + +/ { + cpus { + #address-cells = <1>; + #size-cells = <0>; + + cpu@0 { + compatible = "qcom,krait"; + reg = <0>; + next-level-cache = <&L2>; + clocks = <&clk_controller 0>; + clock-names = "cpu"; + opp-supply = <&cpu_supply0>; + operating-points-v2 = <&cpu0_opp>; + }; + + cpu@1 { + compatible = "qcom,krait"; + reg = <1>; + next-level-cache = <&L2>; + clocks = <&clk_controller 1>; + clock-names = "cpu"; + opp-supply = <&cpu_supply1>; + operating-points-v2 = <&cpu0_opp>; + }; + + cpu@2 { + compatible = "qcom,krait"; + reg = <2>; + next-level-cache = <&L2>; + clocks = <&clk_controller 2>; + clock-names = "cpu"; + opp-supply = <&cpu_supply2>; + operating-points-v2 = <&cpu0_opp>; + }; + + cpu@3 { + compatible = "qcom,krait"; + reg = <3>; + next-level-cache = <&L2>; + clocks = <&clk_controller 3>; + clock-names = "cpu"; + opp-supply = <&cpu_supply3>; + operating-points-v2 = <&cpu0_opp>; + }; + }; + + cpu0_opp: opp0 { + compatible = "operating-points-v2"; + + /* + * Missing shared-opp property means CPUs switch DVFS states + * independently. + */ + + entry00 { + opp-khz = <1000000>; + opp-microvolt = <970000 975000 985000>; + clock-latency-ns = <300000>; + }; + entry01 { + opp-khz = <1100000>; + opp-microvolt = <980000 1000000 1010000>; + clock-latency-ns = <310000>; + }; + entry02 { + opp-khz = <1200000>; + opp-microvolt = <1025000>; + clock-latency-ns = <290000>; + turbo-mode; + }; + }; +}; + +Example 3: Dual-cluster, Dual-core per cluster. CPUs within a cluster switch +DVFS state together. + +/ { + cpus { + #address-cells = <1>; + #size-cells = <0>; + + cpu@0 { + compatible = "arm,cortex-a7"; + reg = <0>; + next-level-cache = <&L2>; + clocks = <&clk_controller 0>; + clock-names = "cpu"; + opp-supply = <&cpu_supply0>; + operating-points-v2 = <&cluster0_opp>; + }; + + cpu@1 { + compatible = "arm,cortex-a7"; + reg = <1>; + next-level-cache = <&L2>; + clocks = <&clk_controller 0>; + clock-names = "cpu"; + opp-supply = <&cpu_supply0>; + operating-points-v2 = <&cluster0_opp>; + }; + + cpu@100 { + compatible = "arm,cortex-a15"; + reg = <100>; + next-level-cache = <&L2>; + clocks = <&clk_controller 1>; + clock-names = "cpu"; + opp-supply = <&cpu_supply1>; + operating-points-v2 = <&cluster1_opp>; + }; + + cpu@101 { + compatible = "arm,cortex-a15"; + reg = <101>; + next-level-cache = <&L2>; + clocks = <&clk_controller 1>; + clock-names = "cpu"; + opp-supply = <&cpu_supply1>; + operating-points-v2 = <&cluster1_opp>; + }; + }; + + cluster0_opp: opp0 { + compatible = "operating-points-v2"; + shared-opp; + + entry00 { + opp-khz = <1000000>; + opp-microvolt = <970000 975000 985000>; + clock-latency-ns = <300000>; + }; + entry01 { + opp-khz = <1100000>; + opp-microvolt = <980000 1000000 1010000>; + clock-latency-ns = <310000>; + }; + entry02 { + opp-khz = <1200000>; + opp-microvolt = <1025000>; + clock-latency-ns = <290000>; + turbo-mode; + }; + }; + + cluster1_opp: opp1 { + compatible = "operating-points-v2"; + shared-opp; + + entry10 { + opp-khz = <1300000>; + opp-microvolt = <1045000 1050000 1055000>; + clock-latency-ns = <400000>; + }; + entry11 { + opp-khz = <1400000>; + opp-microvolt = <1075000>; + clock-latency-ns = <400000>; + }; + entry12 { + opp-khz = <1500000>; + opp-microvolt = <1010000 1100000 1110000>; + clock-latency-ns = <400000>; + turbo-mode; + }; + }; +}; + +Example 4: How to use "opp-next" property ? + +1.) Switch to a intermediate OPP (entry00) before switching to any other OPP. + +/ { + cpus { + #address-cells = <1>; + #size-cells = <0>; + + cpu@0 { + compatible = "arm,cortex-a7"; + reg = <0>; + next-level-cache = <&L2>; + clocks = <&clk_controller 0>; + clock-names = "cpu"; + opp-supply = <&cpu_supply0>; + operating-points-v2 = <&cpu0_opp>; + }; + }; + + cpu0_opp: opp0 { + compatible = "operating-points-v2"; + shared-opp; + + opp_next: entry00 { + opp-khz = <500000>; + opp-microvolt = <800000>; + clock-latency-ns = <300000>; + /* Can switch to any OPP from here */ + }; + entry01 { + opp-khz = <600000>; + opp-microvolt = <800000>; + clock-latency-ns = <300000>; + opp-next = <&opp_next>; + }; + entry02 { + opp-khz = <900000>; + opp-microvolt = <970000 975000 985000>; + clock-latency-ns = <300000>; + opp-next = <&opp_next>; + }; + entry03 { + opp-khz = <1000000>; + opp-microvolt = <970000 975000 985000>; + clock-latency-ns = <300000>; + opp-next = <&opp_next>; + }; + entry04 { + opp-khz = <1100000>; + opp-microvolt = <980000 1000000 1010000>; + clock-latency-ns = <310000>; + opp-next = <&opp_next>; + }; + entry05 { + opp-khz = <1200000>; + opp-microvolt = <1025000>; + clock-latency-ns = <290000>; + opp-next = <&opp_next>; + turbo-mode; + }; + }; +}; + +2.) Can only switch to the next or previous OPP directly. + +/ { + cpus { + #address-cells = <1>; + #size-cells = <0>; + + cpu@0 { + compatible = "arm,cortex-a7"; + reg = <0>; + next-level-cache = <&L2>; + clocks = <&clk_controller 0>; + clock-names = "cpu"; + opp-supply = <&cpu_supply0>; + operating-points-v2 = <&cpu0_opp>; + }; + }; + + cpu0_opp: opp0 { + compatible = "operating-points-v2"; + shared-opp; + + opp_next0: entry00 { + opp-khz = <500000>; + opp-microvolt = <800000>; + clock-latency-ns = <300000>; + opp-next = <&opp_next1>; + }; + opp_next1: entry01 { + opp-khz = <600000>; + opp-microvolt = <800000>; + clock-latency-ns = <300000>; + opp-next = <&opp_next0>, <&opp_next2>; + }; + opp_next2: entry02 { + opp-khz = <900000>; + opp-microvolt = <970000 975000 985000>; + clock-latency-ns = <300000>; + opp-next = <&opp_next1>, <&opp_next3>; + }; + opp_next3: entry03 { + opp-khz = <1000000>; + opp-microvolt = <970000 975000 985000>; + clock-latency-ns = <300000>; + opp-next = <&opp_next2>, <&opp_next4>; + }; + opp_next4: entry04 { + opp-khz = <1100000>; + opp-microvolt = <980000 1000000 1010000>; + clock-latency-ns = <310000>; + opp-next = <&opp_next3>, <&opp_next5>; + }; + opp_next5: entry05 { + opp-khz = <1200000>; + opp-microvolt = <1025000>; + clock-latency-ns = <290000>; + opp-next = <&opp_next4>; + turbo-mode; + }; + }; +}; + + + +Deprecated Bindings +-------------------
Properties: - operating-points: An array of 2-tuples items, and each item consists
Viresh Kumar viresh.kumar@linaro.org writes:
Current OPP (Operating performance point) DT bindings are proven to be insufficient at multiple instances.
There had been multiple band-aid approaches to get them fixed (The latest one being: http://www.mail-archive.com/devicetree@vger.kernel.org/msg53398.html). For obvious reasons Rob rejected them and shown the right path forward.
The shortcomings we are trying to solve here:
Getting clock sharing information between CPUs. Single shared clock vs independent clock per core vs shared clock per cluster.
Support for turbo modes
Support for intermediate frequencies or OPPs we can switch to.
Other per OPP settings: transition latencies, disabled status, etc.?
Expandability of OPPs in future.
This patch introduces new bindings "operating-points-v2" to get these problems solved. Refer to the bindings for more details.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
[...]
+* OPP Descriptor Node
+This describes the OPPs belonging to a device. This node can have following +properties:
+Required properties: +- compatible: Allow OPPs to express their compatibility. It should be:
- "operating-points-v2".
+- OPP nodes: One or more OPP nodes describing frequency-voltage pairs. Their
- name isn't significant but their phandles can be used to reference an OPP.
+Optional properties: +- shared-opp: Indicates that device nodes using this OPP descriptor's phandle
- switch their DVFS state together, i.e. they share clock lines.
... or shared voltage rail?
- Missing property means devices have independent clock lines, but they share OPPs.
huh? missing 'shared-opp' property means they share OPPs? -ECONFUSED.
Maybe I missed some of the discussion of why this property is needed, but I'm left wondering why it's needed explicitly. With the OPPs as part of the CPU nodes, shouldnt' the "shared" nature of the OPP be easily derived from the clock and or regulator (opp-supply) property of the CPU nodes? IOW, if two CPU nodes share a clock and/or a regulator, the framework should know it's "shared".
Or, were there other reasons besides clocks/regulators why this property was needed (if so, the documentation doesn't describe them.)
Kevin
On 24 February 2015 at 04:06, Kevin Hilman khilman@kernel.org wrote:
Viresh Kumar viresh.kumar@linaro.org writes:
+Optional properties: +- shared-opp: Indicates that device nodes using this OPP descriptor's phandle
- switch their DVFS state together, i.e. they share clock lines.
... or shared voltage rail?
The point is that they switch their frequencies together. Which means, sharing voltage rail + PLLs .. How do we write it properly ?
- Missing property means devices have independent clock lines, but they share OPPs.
huh? missing 'shared-opp' property means they share OPPs? -ECONFUSED.
Right. s/OPPs/OPP tables ..
Makes sense now ?
Maybe I missed some of the discussion of why this property is needed, but I'm left wondering why it's needed explicitly. With the OPPs as
So that same OPP tables can be shared across CPUs which don't share voltage rails.. For example, Krait. We only need to define single set of tables and all CPUs will point to it. But this property would be missing in that case as CPUs don't change their DVFS state together.
part of the CPU nodes, shouldnt' the "shared" nature of the OPP be easily derived from the clock and or regulator (opp-supply) property of the CPU nodes? IOW, if two CPU nodes share a clock and/or a regulator, the framework should know it's "shared".
So you missed all earlier discussions :), there were lots of concerns around that. And the best solution we found out is to do it this way..
- There can be multiple clocks/regulators present in CPU's DT node and that makes it complex. - There are cases where immediate clock parents of CPUs are different but somewhere higher in the hierarchy they have a common ancestor, which is responsible for rate change. And so it would be difficult to find out if they share clock/regulator or not.. - People wanted it to be some static data instead of trying to find with help of some algorithms..
Or, were there other reasons besides clocks/regulators why this property was needed (if so, the documentation doesn't describe them.)
I am not sure if we need to mention anything else.. But yes, please let me know if it can be improved a bit.
Viresh Kumar viresh.kumar@linaro.org writes:
On 24 February 2015 at 04:06, Kevin Hilman khilman@kernel.org wrote:
Viresh Kumar viresh.kumar@linaro.org writes:
+Optional properties: +- shared-opp: Indicates that device nodes using this OPP descriptor's phandle
- switch their DVFS state together, i.e. they share clock lines.
... or shared voltage rail?
The point is that they switch their frequencies together. Which means, sharing voltage rail + PLLs .. How do we write it properly ?
s/they share clock lines/they share clock and/or voltage lines/
- Missing property means devices have independent clock lines, but they share OPPs.
huh? missing 'shared-opp' property means they share OPPs? -ECONFUSED.
Right. s/OPPs/OPP tables ..
Makes sense now ?
Yes.
Maybe I missed some of the discussion of why this property is needed, but I'm left wondering why it's needed explicitly. With the OPPs as
So that same OPP tables can be shared across CPUs which don't share voltage rails.. For example, Krait. We only need to define single set of tables and all CPUs will point to it. But this property would be missing in that case as CPUs don't change their DVFS state together.
part of the CPU nodes, shouldnt' the "shared" nature of the OPP be easily derived from the clock and or regulator (opp-supply) property of the CPU nodes? IOW, if two CPU nodes share a clock and/or a regulator, the framework should know it's "shared".
So you missed all earlier discussions :), there were lots of concerns around that. And the best solution we found out is to do it this way..
- There can be multiple clocks/regulators present in CPU's DT node and
that makes it complex.
- There are cases where immediate clock parents of CPUs are different
but somewhere higher in the hierarchy they have a common ancestor, which is responsible for rate change. And so it would be difficult to find out if they share clock/regulator or not..
- People wanted it to be some static data instead of trying to find
with help of some algorithms..
OK, fair enough. Looks like it's been thought through.
However, in the end, I don't think it's going to avoid the "help of some algorithms." The flag will tell you it's shared, but not how, and that will likely still need to be determined.
Kevin
On Tuesday 24 February 2015 10:42 PM, Kevin Hilman wrote:
Viresh Kumar viresh.kumar@linaro.org writes:
s/they share clock lines/they share clock and/or voltage lines/
Hmm, will update that.
OK, fair enough. Looks like it's been thought through.
However, in the end, I don't think it's going to avoid the "help of some algorithms." The flag will tell you it's shared, but not how, and that will likely still need to be determined.
What did you mean by 'how' here? You wanted to say which devices share the OPP? Yeah, a small piece of code is required and that's present in my patchset.
In order to prepare for the later commits, this relocates few routines towards the top as they will be used earlier in the code.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/base/power/opp.c | 192 +++++++++++++++++++++++------------------------ 1 file changed, 96 insertions(+), 96 deletions(-)
diff --git a/drivers/base/power/opp.c b/drivers/base/power/opp.c index 15bf29974c31..9e2118284f02 100644 --- a/drivers/base/power/opp.c +++ b/drivers/base/power/opp.c @@ -408,6 +408,102 @@ struct dev_pm_opp *dev_pm_opp_find_freq_floor(struct device *dev, EXPORT_SYMBOL_GPL(dev_pm_opp_find_freq_floor);
/** + * _kfree_opp_rcu() - Free OPP RCU handler + * @head: RCU head + */ +static void _kfree_opp_rcu(struct rcu_head *head) +{ + struct dev_pm_opp *opp = container_of(head, struct dev_pm_opp, rcu_head); + + kfree_rcu(opp, rcu_head); +} + +/** + * _kfree_device_rcu() - Free device_opp RCU handler + * @head: RCU head + */ +static void _kfree_device_rcu(struct rcu_head *head) +{ + struct device_opp *device_opp = container_of(head, struct device_opp, rcu_head); + + kfree_rcu(device_opp, rcu_head); +} + +/** + * _opp_remove() - Remove an OPP from a table definition + * @dev_opp: points back to the device_opp struct this opp belongs to + * @opp: pointer to the OPP to remove + * + * This function removes an opp definition from the opp list. + * + * Locking: The internal device_opp and opp structures are RCU protected. + * It is assumed that the caller holds required mutex for an RCU updater + * strategy. + */ +static void _opp_remove(struct device_opp *dev_opp, + struct dev_pm_opp *opp) +{ + /* + * Notify the changes in the availability of the operable + * frequency/voltage list. + */ + srcu_notifier_call_chain(&dev_opp->srcu_head, OPP_EVENT_REMOVE, opp); + list_del_rcu(&opp->node); + call_srcu(&dev_opp->srcu_head.srcu, &opp->rcu_head, _kfree_opp_rcu); + + if (list_empty(&dev_opp->opp_list)) { + list_del_rcu(&dev_opp->node); + call_srcu(&dev_opp->srcu_head.srcu, &dev_opp->rcu_head, + _kfree_device_rcu); + } +} + +/** + * dev_pm_opp_remove() - Remove an OPP from OPP list + * @dev: device for which we do this operation + * @freq: OPP to remove with matching 'freq' + * + * This function removes an opp from the opp list. + * + * Locking: The internal device_opp and opp structures are RCU protected. + * Hence this function internally uses RCU updater strategy with mutex locks + * to keep the integrity of the internal data structures. Callers should ensure + * that this function is *NOT* called under RCU protection or in contexts where + * mutex cannot be locked. + */ +void dev_pm_opp_remove(struct device *dev, unsigned long freq) +{ + struct dev_pm_opp *opp; + struct device_opp *dev_opp; + bool found = false; + + /* Hold our list modification lock here */ + mutex_lock(&dev_opp_list_lock); + + dev_opp = _find_device_opp(dev); + if (IS_ERR(dev_opp)) + goto unlock; + + list_for_each_entry(opp, &dev_opp->opp_list, node) { + if (opp->rate == freq) { + found = true; + break; + } + } + + if (!found) { + dev_warn(dev, "%s: Couldn't find OPP with freq: %lu\n", + __func__, freq); + goto unlock; + } + + _opp_remove(dev_opp, opp); +unlock: + mutex_unlock(&dev_opp_list_lock); +} +EXPORT_SYMBOL_GPL(dev_pm_opp_remove); + +/** * _add_device_opp() - Allocate a new device OPP table * @dev: device for which we do this operation * @@ -572,102 +668,6 @@ int dev_pm_opp_add(struct device *dev, unsigned long freq, unsigned long u_volt) EXPORT_SYMBOL_GPL(dev_pm_opp_add);
/** - * _kfree_opp_rcu() - Free OPP RCU handler - * @head: RCU head - */ -static void _kfree_opp_rcu(struct rcu_head *head) -{ - struct dev_pm_opp *opp = container_of(head, struct dev_pm_opp, rcu_head); - - kfree_rcu(opp, rcu_head); -} - -/** - * _kfree_device_rcu() - Free device_opp RCU handler - * @head: RCU head - */ -static void _kfree_device_rcu(struct rcu_head *head) -{ - struct device_opp *device_opp = container_of(head, struct device_opp, rcu_head); - - kfree_rcu(device_opp, rcu_head); -} - -/** - * _opp_remove() - Remove an OPP from a table definition - * @dev_opp: points back to the device_opp struct this opp belongs to - * @opp: pointer to the OPP to remove - * - * This function removes an opp definition from the opp list. - * - * Locking: The internal device_opp and opp structures are RCU protected. - * It is assumed that the caller holds required mutex for an RCU updater - * strategy. - */ -static void _opp_remove(struct device_opp *dev_opp, - struct dev_pm_opp *opp) -{ - /* - * Notify the changes in the availability of the operable - * frequency/voltage list. - */ - srcu_notifier_call_chain(&dev_opp->srcu_head, OPP_EVENT_REMOVE, opp); - list_del_rcu(&opp->node); - call_srcu(&dev_opp->srcu_head.srcu, &opp->rcu_head, _kfree_opp_rcu); - - if (list_empty(&dev_opp->opp_list)) { - list_del_rcu(&dev_opp->node); - call_srcu(&dev_opp->srcu_head.srcu, &dev_opp->rcu_head, - _kfree_device_rcu); - } -} - -/** - * dev_pm_opp_remove() - Remove an OPP from OPP list - * @dev: device for which we do this operation - * @freq: OPP to remove with matching 'freq' - * - * This function removes an opp from the opp list. - * - * Locking: The internal device_opp and opp structures are RCU protected. - * Hence this function internally uses RCU updater strategy with mutex locks - * to keep the integrity of the internal data structures. Callers should ensure - * that this function is *NOT* called under RCU protection or in contexts where - * mutex cannot be locked. - */ -void dev_pm_opp_remove(struct device *dev, unsigned long freq) -{ - struct dev_pm_opp *opp; - struct device_opp *dev_opp; - bool found = false; - - /* Hold our list modification lock here */ - mutex_lock(&dev_opp_list_lock); - - dev_opp = _find_device_opp(dev); - if (IS_ERR(dev_opp)) - goto unlock; - - list_for_each_entry(opp, &dev_opp->opp_list, node) { - if (opp->rate == freq) { - found = true; - break; - } - } - - if (!found) { - dev_warn(dev, "%s: Couldn't find OPP with freq: %lu\n", - __func__, freq); - goto unlock; - } - - _opp_remove(dev_opp, opp); -unlock: - mutex_unlock(&dev_opp_list_lock); -} -EXPORT_SYMBOL_GPL(dev_pm_opp_remove); - -/** * _opp_set_availability() - helper to set the availability of an opp * @dev: device for which we do this operation * @freq: OPP frequency to modify availability
Later commits would add support for new OPP bindings and this would be required then. So, lets do it in a separate patch to make it reviewing easy.
Another change worth noticing is INIT_LIST_HEAD(&opp->node). We weren't doing it earlier as we never tried to delete a list node before it is added to list. But this wouldn't be the case anymore. We might try to delete a node (just to reuse the same code paths), without it being getting added to the list.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/base/power/opp.c | 127 ++++++++++++++++++++++++++++------------------- 1 file changed, 75 insertions(+), 52 deletions(-)
diff --git a/drivers/base/power/opp.c b/drivers/base/power/opp.c index 9e2118284f02..904dcf386747 100644 --- a/drivers/base/power/opp.c +++ b/drivers/base/power/opp.c @@ -433,6 +433,7 @@ static void _kfree_device_rcu(struct rcu_head *head) * _opp_remove() - Remove an OPP from a table definition * @dev_opp: points back to the device_opp struct this opp belongs to * @opp: pointer to the OPP to remove + * @notify: OPP_EVENT_REMOVE notification should be sent or not * * This function removes an opp definition from the opp list. * @@ -441,13 +442,14 @@ static void _kfree_device_rcu(struct rcu_head *head) * strategy. */ static void _opp_remove(struct device_opp *dev_opp, - struct dev_pm_opp *opp) + struct dev_pm_opp *opp, bool notify) { /* * Notify the changes in the availability of the operable * frequency/voltage list. */ - srcu_notifier_call_chain(&dev_opp->srcu_head, OPP_EVENT_REMOVE, opp); + if (notify) + srcu_notifier_call_chain(&dev_opp->srcu_head, OPP_EVENT_REMOVE, opp); list_del_rcu(&opp->node); call_srcu(&dev_opp->srcu_head.srcu, &opp->rcu_head, _kfree_opp_rcu);
@@ -497,7 +499,7 @@ void dev_pm_opp_remove(struct device *dev, unsigned long freq) goto unlock; }
- _opp_remove(dev_opp, opp); + _opp_remove(dev_opp, opp, true); unlock: mutex_unlock(&dev_opp_list_lock); } @@ -533,6 +535,63 @@ static struct device_opp *_add_device_opp(struct device *dev) return dev_opp; }
+static struct dev_pm_opp *_allocate_opp(struct device *dev, + struct device_opp **dev_opp) +{ + struct dev_pm_opp *opp; + + /* allocate new OPP node */ + opp = kzalloc(sizeof(*opp), GFP_KERNEL); + if (!opp) + return NULL; + + INIT_LIST_HEAD(&opp->node); + + /* Check for existing list for 'dev' */ + *dev_opp = _find_device_opp(dev); + if (IS_ERR(*dev_opp)) { + *dev_opp = _add_device_opp(dev); + if (!*dev_opp) { + kfree(opp); + return NULL; + } + } + + return opp; +} + +static int _opp_add(struct dev_pm_opp *new_opp, struct device_opp *dev_opp) +{ + struct dev_pm_opp *opp = NULL; + struct list_head *head = &dev_opp->opp_list; + + /* + * Insert new OPP in order of increasing frequency + * and discard if already present + */ + list_for_each_entry_rcu(opp, &dev_opp->opp_list, node) { + if (new_opp->rate <= opp->rate) + break; + + head = &opp->node; + } + + /* Duplicate OPPs ? */ + if (opp && new_opp->rate == opp->rate) { + dev_warn(dev_opp->dev, + "%s: duplicate OPPs detected. Existing: freq: %lu, volt: %lu, enabled: %d. New: freq: %lu, volt: %lu, enabled: %d\n", + __func__, opp->rate, opp->u_volt, opp->available, + new_opp->rate, new_opp->u_volt, new_opp->available); + return opp->available && new_opp->u_volt == opp->u_volt ? + 0 : -EEXIST; + } + + new_opp->dev_opp = dev_opp; + list_add_rcu(&new_opp->node, head); + + return 0; +} + /** * _opp_add_dynamic() - Allocate a dynamic OPP. * @dev: device for which we do this operation @@ -563,66 +622,29 @@ static struct device_opp *_add_device_opp(struct device *dev) static int _opp_add_dynamic(struct device *dev, unsigned long freq, long u_volt, bool dynamic) { - struct device_opp *dev_opp = NULL; - struct dev_pm_opp *opp, *new_opp; - struct list_head *head; + struct device_opp *dev_opp; + struct dev_pm_opp *new_opp; int ret;
- /* allocate new OPP node */ - new_opp = kzalloc(sizeof(*new_opp), GFP_KERNEL); - if (!new_opp) { - dev_warn(dev, "%s: Unable to create new OPP node\n", __func__); - return -ENOMEM; - } - /* Hold our list modification lock here */ mutex_lock(&dev_opp_list_lock);
+ new_opp = _allocate_opp(dev, &dev_opp); + if (!new_opp) { + ret = -ENOMEM; + goto unlock; + } + /* populate the opp table */ new_opp->rate = freq; new_opp->u_volt = u_volt; new_opp->available = true; new_opp->dynamic = dynamic;
- /* Check for existing list for 'dev' */ - dev_opp = _find_device_opp(dev); - if (IS_ERR(dev_opp)) { - dev_opp = _add_device_opp(dev); - if (!dev_opp) { - ret = -ENOMEM; - goto free_opp; - } - - head = &dev_opp->opp_list; - goto list_add; - } - - /* - * Insert new OPP in order of increasing frequency - * and discard if already present - */ - head = &dev_opp->opp_list; - list_for_each_entry_rcu(opp, &dev_opp->opp_list, node) { - if (new_opp->rate <= opp->rate) - break; - else - head = &opp->node; - } - - /* Duplicate OPPs ? */ - if (new_opp->rate == opp->rate) { - ret = opp->available && new_opp->u_volt == opp->u_volt ? - 0 : -EEXIST; - - dev_warn(dev, "%s: duplicate OPPs detected. Existing: freq: %lu, volt: %lu, enabled: %d. New: freq: %lu, volt: %lu, enabled: %d\n", - __func__, opp->rate, opp->u_volt, opp->available, - new_opp->rate, new_opp->u_volt, new_opp->available); + ret = _opp_add(new_opp, dev_opp); + if (ret) goto free_opp; - }
-list_add: - new_opp->dev_opp = dev_opp; - list_add_rcu(&new_opp->node, head); mutex_unlock(&dev_opp_list_lock);
/* @@ -633,8 +655,9 @@ static int _opp_add_dynamic(struct device *dev, unsigned long freq, return 0;
free_opp: + _opp_remove(dev_opp, new_opp, false); +unlock: mutex_unlock(&dev_opp_list_lock); - kfree(new_opp); return ret; }
@@ -922,7 +945,7 @@ void of_free_opp_table(struct device *dev) /* Free static OPPs */ list_for_each_entry_safe(opp, tmp, &dev_opp->opp_list, node) { if (!opp->dynamic) - _opp_remove(dev_opp, opp); + _opp_remove(dev_opp, opp, true); }
mutex_unlock(&dev_opp_list_lock);
This adds support in OPP library to parse and create list of OPPs from operating-points-v2 bindings. It takes care of most of the properties of new bindings (except shared-opp, which will be handled separately).
For backward compatibility, we keep supporting earlier bindings. We try to search for the new bindings first, in case they aren't present we look for the old deprecated ones.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/base/power/opp.c | 342 +++++++++++++++++++++++++++++++++++++++++++---- include/linux/pm_opp.h | 6 + 2 files changed, 323 insertions(+), 25 deletions(-)
diff --git a/drivers/base/power/opp.c b/drivers/base/power/opp.c index 904dcf386747..b7b9c33fbb65 100644 --- a/drivers/base/power/opp.c +++ b/drivers/base/power/opp.c @@ -49,12 +49,21 @@ * are protected by the dev_opp_list_lock for integrity. * IMPORTANT: the opp nodes should be maintained in increasing * order. - * @dynamic: not-created from static DT entries. * @available: true/false - marks if this OPP as available or not + * @dynamic: not-created from static DT entries. + * @turbo: true if turbo (boost) OPP * @rate: Frequency in hertz - * @u_volt: Nominal voltage in microvolts corresponding to this OPP + * @u_volt: Target voltage in microvolts corresponding to this OPP + * @u_volt_min: Minimum voltage in microvolts corresponding to this OPP + * @u_volt_max: Maximum voltage in microvolts corresponding to this OPP + * @clock_latency_ns: Latency (in nanoseconds) of switching to this OPP's + * frequency from any other OPP's frequency. * @dev_opp: points back to the device_opp struct this opp belongs to * @rcu_head: RCU callback head used for deferred freeing + * @np: OPP's device node. + * @next: Array of OPPs we can switch to from this OPP. + * @next_np: Array of device nodes of OPP to which we can switch from this OPP. + * @count: count of 'next' array. * * This structure stores the OPP information for a given device. */ @@ -63,11 +72,20 @@ struct dev_pm_opp {
bool available; bool dynamic; + bool turbo; unsigned long rate; unsigned long u_volt; + unsigned long u_volt_min; + unsigned long u_volt_max; + unsigned long clock_latency_ns;
struct device_opp *dev_opp; struct rcu_head rcu_head; + + struct device_node *np; + struct dev_pm_opp **next; + struct device_node **next_np; + unsigned int next_count; };
/** @@ -444,15 +462,21 @@ static void _kfree_device_rcu(struct rcu_head *head) static void _opp_remove(struct device_opp *dev_opp, struct dev_pm_opp *opp, bool notify) { + /* Free any next OPP nodes */ + kfree(opp->next); + /* * Notify the changes in the availability of the operable * frequency/voltage list. */ if (notify) srcu_notifier_call_chain(&dev_opp->srcu_head, OPP_EVENT_REMOVE, opp); + + /* Free OPP */ list_del_rcu(&opp->node); call_srcu(&dev_opp->srcu_head.srcu, &opp->rcu_head, _kfree_opp_rcu);
+ /* Free device if all OPPs are freed */ if (list_empty(&dev_opp->opp_list)) { list_del_rcu(&dev_opp->node); call_srcu(&dev_opp->srcu_head.srcu, &dev_opp->rcu_head, @@ -662,6 +686,170 @@ static int _opp_add_dynamic(struct device *dev, unsigned long freq, }
/** + * _opp_add_dynamic_v2() - Allocate a dynamic OPP according to 'v2' DT bindings. + * @dev: device for which we do this operation + * @np: device node + * @dynamic: Dynamically added OPPs. + * + * This function adds an opp definition to the opp list and returns status. The + * opp can be controlled using dev_pm_opp_enable/disable functions and may be + * removed by dev_pm_opp_remove. + * + * NOTE: "dynamic" parameter impacts OPPs added by the of_init_opp_table and + * freed by of_free_opp_table. + * + * Locking: The internal device_opp and opp structures are RCU protected. + * Hence this function internally uses RCU updater strategy with mutex locks + * to keep the integrity of the internal data structures. Callers should ensure + * that this function is *NOT* called under RCU protection or in contexts where + * mutex cannot be locked. + * + * Return: + * 0 On success OR + * Duplicate OPPs (both freq and volt are same) and opp->available + * -EEXIST Freq are same and volt are different OR + * Duplicate OPPs (both freq and volt are same) and !opp->available + * -ENOMEM Memory allocation failure + * -EINVAL Failed parsing the OPP node + */ +static int _opp_add_dynamic_v2(struct device *dev, struct device_node *np, + bool dynamic) +{ + struct device_opp *dev_opp; + struct dev_pm_opp *new_opp; + int ret; + + /* Hold our list modification lock here */ + mutex_lock(&dev_opp_list_lock); + + new_opp = _allocate_opp(dev, &dev_opp); + if (!new_opp) { + ret = -ENOMEM; + goto unlock; + } + + ret = of_property_read_u32(np, "opp-khz", (u32 *)&new_opp->rate); + if (ret < 0) { + dev_err(dev, "%s: opp-khz not found\n", __func__); + goto free_opp; + } + new_opp->rate *= 1000; + + if (of_get_property(np, "turbo-mode", NULL)) + new_opp->turbo = true; + + new_opp->np = np; + new_opp->dynamic = dynamic; + new_opp->available = of_device_is_available(np); + of_property_read_u32(np, "clock-latency-ns", + (u32 *)&new_opp->clock_latency_ns); + + /* read opp-microvolt array */ + ret = of_property_count_u32_elems(np, "opp-microvolt"); + if (ret == 1 || ret == 3) { + /* There can be one or three elements here */ + ret = of_property_read_u32_array(np, "opp-microvolt", + (u32 *)&new_opp->u_volt, ret); + if (ret) { + dev_err(dev, "%s: error parsing opp-microvolt: %d\n", + __func__, ret); + goto free_opp; + } + } + + /* Parse OPP phandles */ + ret = of_property_count_u32_elems(np, "opp-next"); + if (ret > 0) { + int i, size = sizeof(*new_opp->next) + sizeof(*new_opp->next_np); + struct device_node **next_np; + + /* Allocate memory for both 'next' and 'next_np' */ + new_opp->next = kmalloc(size * ret, GFP_KERNEL); + if (!new_opp->next) { + ret = -ENOMEM; + goto free_opp; + } + + new_opp->next_count = ret; + new_opp->next_np = (struct device_node **)(new_opp->next + ret); + next_np = new_opp->next_np; + + /* + * Parse OPP phandles for now, we will create the list of OPPs + * once all are available. + */ + for (i = 0; i < ret; i++) { + next_np[i] = of_parse_phandle(np, "opp-next", i); + if (!next_np[i]) { + dev_err(dev, "%s: Failed to get opp phandle\n", + __func__); + ret = -EINVAL; + goto free_opp; + } + } + } + + ret = _opp_add(new_opp, dev_opp); + if (ret) + goto free_opp; + + pr_debug("%s: dynamic:%d turbo:%d rate:%lu uv:%lu uvmin:%lu uvmax:%lu latency:%lu next-count:%u\n", + __func__, new_opp->dynamic, new_opp->turbo, new_opp->rate, + new_opp->u_volt, new_opp->u_volt_min, new_opp->u_volt_max, + new_opp->clock_latency_ns, new_opp->next_count); + + mutex_unlock(&dev_opp_list_lock); + + /* + * Notify the changes in the availability of the operable + * frequency/voltage list. + */ + srcu_notifier_call_chain(&dev_opp->srcu_head, OPP_EVENT_ADD, new_opp); + return 0; + +free_opp: + _opp_remove(dev_opp, new_opp, false); +unlock: + mutex_unlock(&dev_opp_list_lock); + return ret; +} + +static struct dev_pm_opp *_get_opp_from_np(struct device_opp *dev_opp, + struct device_node *np) +{ + struct dev_pm_opp *opp; + + list_for_each_entry_rcu(opp, &dev_opp->opp_list, node) + if (opp->np == np) + return opp; + + return NULL; +} + +static void _opp_fill_opp_next(struct device *dev) +{ + struct device_opp *dev_opp; + struct dev_pm_opp *opp; + int i; + + dev_opp = _find_device_opp(dev); + if (WARN_ON(!dev_opp)) + return; + + list_for_each_entry_rcu(opp, &dev_opp->opp_list, node) { + if (!opp->next) + continue; + + for (i = 0; i < opp->next_count; i++) { + opp->next[i] = _get_opp_from_np(dev_opp, opp->next_np[i]); + if (!opp->next[i]) + dev_err(dev, "%s: Have np but no OPP: %d\n", + __func__, i); + } + } +} + +/** * dev_pm_opp_add() - Add an OPP table from a table definitions * @dev: device for which we do this operation * @freq: Frequency in Hz for this OPP @@ -851,29 +1039,89 @@ struct srcu_notifier_head *dev_pm_opp_get_notifier(struct device *dev) EXPORT_SYMBOL_GPL(dev_pm_opp_get_notifier);
#ifdef CONFIG_OF -/** - * of_init_opp_table() - Initialize opp table from device tree - * @dev: device pointer used to lookup device OPPs. - * - * Register the initial OPP table with the OPP library for given device. - * - * Locking: The internal device_opp and opp structures are RCU protected. - * Hence this function indirectly uses RCU updater strategy with mutex locks - * to keep the integrity of the internal data structures. Callers should ensure - * that this function is *NOT* called under RCU protection or in contexts where - * mutex cannot be locked. - * - * Return: - * 0 On success OR - * Duplicate OPPs (both freq and volt are same) and opp->available - * -EEXIST Freq are same and volt are different OR - * Duplicate OPPs (both freq and volt are same) and !opp->available - * -ENOMEM Memory allocation failure - * -ENODEV when 'operating-points' property is not found or is invalid data - * in device node. - * -ENODATA when empty 'operating-points' property is found - */ -int of_init_opp_table(struct device *dev) +void of_free_opp_table(struct device *dev); + +/* Returns opp descriptor node from its phandle. Caller must do of_node_put() */ +static struct device_node * +_of_get_opp_desc_node_from_prop(struct device *dev, const struct property *prop) +{ + struct device_node *opp_np; + + opp_np = of_find_node_by_phandle(be32_to_cpup(prop->value)); + if (!opp_np) { + dev_err(dev, "%s: Prop: %s contains invalid opp desc phandle\n", + __func__, prop->name); + return ERR_PTR(-EINVAL); + } + + return opp_np; +} + +/* Returns opp descriptor node for a device. Caller must do of_node_put() */ +struct device_node *of_get_opp_desc_node(struct device *dev) +{ + const struct property *prop; + + prop = of_find_property(dev->of_node, "operating-points-v2", NULL); + if (!prop) + return ERR_PTR(-ENODEV); + if (!prop->value) + return ERR_PTR(-ENODATA); + + /* + * There should be only ONE phandle present in "operating-points-v2" + * property. + */ + if (prop->length != sizeof(__be32)) { + dev_err(dev, "%s: Invalid opp desc phandle\n", __func__); + return ERR_PTR(-EINVAL); + } + + return _of_get_opp_desc_node_from_prop(dev, prop); +} +EXPORT_SYMBOL_GPL(of_get_opp_desc_node); + +/* Initializes OPP tables based on new bindings */ +static int _of_init_opp_table_v2(struct device *dev, + const struct property *prop) +{ + struct device_node *opp_np, *np; + int ret = 0, count = 0; + + /* Get opp node */ + opp_np = _of_get_opp_desc_node_from_prop(dev, prop); + if (IS_ERR(opp_np)) + return PTR_ERR(opp_np); + + /* We have opp-list node now, iterate over it and add OPPs */ + for_each_available_child_of_node(opp_np, np) { + count++; + + ret = _opp_add_dynamic_v2(dev, np, false); + if (ret) { + dev_err(dev, "%s: Failed to add OPP, %d\n", __func__, + ret); + break; + } + } + + /* There should be one of more OPP defined */ + if (WARN_ON(!count)) + goto put_opp_np; + + if (ret) + of_free_opp_table(dev); + else + _opp_fill_opp_next(dev); + +put_opp_np: + of_node_put(opp_np); + + return ret; +} + +/* Initializes OPP tables based on old-deprecated bindings */ +static int _of_init_opp_table_v1(struct device *dev) { const struct property *prop; const __be32 *val; @@ -908,6 +1156,50 @@ int of_init_opp_table(struct device *dev)
return 0; } + +/** + * of_init_opp_table() - Initialize opp table from device tree + * @dev: device pointer used to lookup device OPPs. + * + * Register the initial OPP table with the OPP library for given device. + * + * Locking: The internal device_opp and opp structures are RCU protected. + * Hence this function indirectly uses RCU updater strategy with mutex locks + * to keep the integrity of the internal data structures. Callers should ensure + * that this function is *NOT* called under RCU protection or in contexts where + * mutex cannot be locked. + * + * Return: + * 0 On success OR + * Duplicate OPPs (both freq and volt are same) and opp->available + * -EEXIST Freq are same and volt are different OR + * Duplicate OPPs (both freq and volt are same) and !opp->available + * -ENOMEM Memory allocation failure + * -ENODEV when 'operating-points' property is not found or is invalid data + * in device node. + * -ENODATA when empty 'operating-points' property is found + */ +int of_init_opp_table(struct device *dev) +{ + const struct property *prop; + + /* + * OPPs have two version of bindings now. The older one is deprecated, + * try for the new binding first. + */ + prop = of_find_property(dev->of_node, "operating-points-v2", NULL); + if (!prop) { + /* + * Try old-deprecated bindings for backward compatibility with + * older dtbs. + */ + return _of_init_opp_table_v1(dev); + } + + if (!prop->value) + return -ENODATA; + return _of_init_opp_table_v2(dev, prop); +} EXPORT_SYMBOL_GPL(of_init_opp_table);
/** diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h index cec2d4540914..9949d07a93f9 100644 --- a/include/linux/pm_opp.h +++ b/include/linux/pm_opp.h @@ -115,6 +115,7 @@ static inline struct srcu_notifier_head *dev_pm_opp_get_notifier( #if defined(CONFIG_PM_OPP) && defined(CONFIG_OF) int of_init_opp_table(struct device *dev); void of_free_opp_table(struct device *dev); +struct device_node *of_get_opp_desc_node(struct device *dev); #else static inline int of_init_opp_table(struct device *dev) { @@ -124,6 +125,11 @@ static inline int of_init_opp_table(struct device *dev) static inline void of_free_opp_table(struct device *dev) { } + +static inline struct device_node *of_get_opp_desc_node(struct device *dev) +{ + return ERR_PTR(-EINVAL); +} #endif
#endif /* __LINUX_OPP_H__ */
An opp can be shared by multiple devices, for example its very common for CPUs to share the OPPs, i.e. when they share clock lines.
Until now, the OPPs are bound to a single device and there is no way to mark them shared.
This patch adds support to manage a list of devices sharing OPPs. It also senses if the device (we are trying to initialize OPPs for) shares OPPs with an device added earlier and in that case we just update the list of devices managed by OPPs instead of initializing them again.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/base/power/opp.c | 189 +++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 157 insertions(+), 32 deletions(-)
diff --git a/drivers/base/power/opp.c b/drivers/base/power/opp.c index b7b9c33fbb65..24a014b7a68a 100644 --- a/drivers/base/power/opp.c +++ b/drivers/base/power/opp.c @@ -89,16 +89,33 @@ struct dev_pm_opp { };
/** + * struct device_list_opp - devices managed by 'struct device_opp' + * @node: list node + * @dev: device to which the struct object belongs + * @rcu_head: RCU callback head used for deferred freeing + * + * This is an internal data structure maintaining the list of devices that are + * managed by 'struct device_opp'. + */ +struct device_list_opp { + struct list_head node; + struct device *dev; + struct rcu_head rcu_head; +}; + +/** * struct device_opp - Device opp structure * @node: list node - contains the devices with OPPs that * have been registered. Nodes once added are not modified in this * list. * RCU usage: nodes are not modified in the list of device_opp, * however addition is possible and is secured by dev_opp_list_lock - * @dev: device pointer * @srcu_head: notifier head to notify the OPP availability changes. * @rcu_head: RCU callback head used for deferred freeing + * @dev_list: list of devices that share these OPPs * @opp_list: list of opps + * @np: struct device_node pointer for opp's DT node. + * @shared_opp: OPP is shared between multiple devices. * * This is an internal data structure maintaining the link to opps attached to * a device. This structure is not meant to be shared to users as it is @@ -111,10 +128,12 @@ struct dev_pm_opp { struct device_opp { struct list_head node;
- struct device *dev; struct srcu_notifier_head srcu_head; struct rcu_head rcu_head; + struct list_head dev_list; struct list_head opp_list; + struct device_node *np; + bool shared_opp; };
/* @@ -134,6 +153,45 @@ do { \ "dev_opp_list_lock protection"); \ } while (0)
+static struct device_list_opp *_find_list_dev(struct device *dev, + struct device_opp *dev_opp) +{ + struct device_list_opp *list_dev; + + list_for_each_entry(list_dev, &dev_opp->dev_list, node) + if (list_dev->dev == dev) + return list_dev; + + return NULL; +} + +static int _list_dev_count(struct device_opp *dev_opp) +{ + struct device_list_opp *list_dev; + int count = 0; + + list_for_each_entry(list_dev, &dev_opp->dev_list, node) + count++; + + return count; +} + +static struct device_opp *_managed_opp(struct device_node *np) +{ + struct device_opp *dev_opp; + + list_for_each_entry_rcu(dev_opp, &dev_opp_list, node) + if (dev_opp->np == np) { + /* Managed only if the OPPs are shared */ + if (dev_opp->shared_opp) + return dev_opp; + else + return NULL; + } + + return NULL; +} + /** * _find_device_opp() - find device_opp struct using device pointer * @dev: device pointer used to lookup device OPPs @@ -150,21 +208,18 @@ do { \ */ static struct device_opp *_find_device_opp(struct device *dev) { - struct device_opp *tmp_dev_opp, *dev_opp = ERR_PTR(-ENODEV); + struct device_opp *dev_opp;
if (unlikely(IS_ERR_OR_NULL(dev))) { pr_err("%s: Invalid parameters\n", __func__); return ERR_PTR(-EINVAL); }
- list_for_each_entry_rcu(tmp_dev_opp, &dev_opp_list, node) { - if (tmp_dev_opp->dev == dev) { - dev_opp = tmp_dev_opp; - break; - } - } + list_for_each_entry_rcu(dev_opp, &dev_opp_list, node) + if (_find_list_dev(dev, dev_opp)) + return dev_opp;
- return dev_opp; + return ERR_PTR(-ENODEV); }
/** @@ -425,6 +480,38 @@ struct dev_pm_opp *dev_pm_opp_find_freq_floor(struct device *dev, } EXPORT_SYMBOL_GPL(dev_pm_opp_find_freq_floor);
+static struct device_list_opp *_add_list_dev(struct device *dev, + struct device_opp *dev_opp) +{ + struct device_list_opp *list_dev; + + list_dev = kzalloc(sizeof(*list_dev), GFP_KERNEL); + if (!list_dev) + return NULL; + + /* Initialize list-dev */ + list_add_rcu(&list_dev->node, &dev_opp->dev_list); + list_dev->dev = dev; + + return list_dev; +} + +static void _kfree_list_dev_rcu(struct rcu_head *head) +{ + struct device_list_opp *list_dev; + + list_dev = container_of(head, struct device_list_opp, rcu_head); + kfree_rcu(list_dev, rcu_head); +} + +static void _remove_list_dev(struct device_list_opp *list_dev, + struct device_opp *dev_opp) +{ + list_del(&list_dev->node); + call_srcu(&dev_opp->srcu_head.srcu, &list_dev->rcu_head, + _kfree_list_dev_rcu); +} + /** * _kfree_opp_rcu() - Free OPP RCU handler * @head: RCU head @@ -478,6 +565,17 @@ static void _opp_remove(struct device_opp *dev_opp,
/* Free device if all OPPs are freed */ if (list_empty(&dev_opp->opp_list)) { + struct device_list_opp *list_dev; + + list_dev = list_first_entry(&dev_opp->dev_list, + struct device_list_opp, node); + + _remove_list_dev(list_dev, dev_opp); + + /* dev_list must be empty now */ + WARN_ON(!list_empty(&dev_opp->dev_list)); + + /* Free dev-opp */ list_del_rcu(&dev_opp->node); call_srcu(&dev_opp->srcu_head.srcu, &dev_opp->rcu_head, _kfree_device_rcu); @@ -541,6 +639,7 @@ EXPORT_SYMBOL_GPL(dev_pm_opp_remove); static struct device_opp *_add_device_opp(struct device *dev) { struct device_opp *dev_opp; + struct device_list_opp *list_dev;
/* * Allocate a new device OPP table. In the infrequent case where a new @@ -550,7 +649,13 @@ static struct device_opp *_add_device_opp(struct device *dev) if (!dev_opp) return NULL;
- dev_opp->dev = dev; + INIT_LIST_HEAD(&dev_opp->dev_list); + list_dev = _add_list_dev(dev, dev_opp); + if (!list_dev) { + kfree(dev_opp); + return NULL; + } + srcu_init_notifier_head(&dev_opp->srcu_head); INIT_LIST_HEAD(&dev_opp->opp_list);
@@ -584,7 +689,8 @@ static struct dev_pm_opp *_allocate_opp(struct device *dev, return opp; }
-static int _opp_add(struct dev_pm_opp *new_opp, struct device_opp *dev_opp) +static int _opp_add(struct device *dev, struct dev_pm_opp *new_opp, + struct device_opp *dev_opp) { struct dev_pm_opp *opp = NULL; struct list_head *head = &dev_opp->opp_list; @@ -602,7 +708,7 @@ static int _opp_add(struct dev_pm_opp *new_opp, struct device_opp *dev_opp)
/* Duplicate OPPs ? */ if (opp && new_opp->rate == opp->rate) { - dev_warn(dev_opp->dev, + dev_warn(dev, "%s: duplicate OPPs detected. Existing: freq: %lu, volt: %lu, enabled: %d. New: freq: %lu, volt: %lu, enabled: %d\n", __func__, opp->rate, opp->u_volt, opp->available, new_opp->rate, new_opp->u_volt, new_opp->available); @@ -665,7 +771,7 @@ static int _opp_add_dynamic(struct device *dev, unsigned long freq, new_opp->available = true; new_opp->dynamic = dynamic;
- ret = _opp_add(new_opp, dev_opp); + ret = _opp_add(dev, new_opp, dev_opp); if (ret) goto free_opp;
@@ -789,7 +895,7 @@ static int _opp_add_dynamic_v2(struct device *dev, struct device_node *np, } }
- ret = _opp_add(new_opp, dev_opp); + ret = _opp_add(dev, new_opp, dev_opp); if (ret) goto free_opp;
@@ -826,16 +932,11 @@ static struct dev_pm_opp *_get_opp_from_np(struct device_opp *dev_opp, return NULL; }
-static void _opp_fill_opp_next(struct device *dev) +static void _opp_fill_opp_next(struct device_opp *dev_opp, struct device *dev) { - struct device_opp *dev_opp; struct dev_pm_opp *opp; int i;
- dev_opp = _find_device_opp(dev); - if (WARN_ON(!dev_opp)) - return; - list_for_each_entry_rcu(opp, &dev_opp->opp_list, node) { if (!opp->next) continue; @@ -1086,6 +1187,7 @@ static int _of_init_opp_table_v2(struct device *dev, const struct property *prop) { struct device_node *opp_np, *np; + struct device_opp *dev_opp; int ret = 0, count = 0;
/* Get opp node */ @@ -1093,6 +1195,14 @@ static int _of_init_opp_table_v2(struct device *dev, if (IS_ERR(opp_np)) return PTR_ERR(opp_np);
+ dev_opp = _managed_opp(opp_np); + if (dev_opp) { + /* OPPs are already managed */ + if (!_add_list_dev(dev, dev_opp)) + ret = -ENOMEM; + goto put_opp_np; + } + /* We have opp-list node now, iterate over it and add OPPs */ for_each_available_child_of_node(opp_np, np) { count++; @@ -1109,10 +1219,19 @@ static int _of_init_opp_table_v2(struct device *dev, if (WARN_ON(!count)) goto put_opp_np;
- if (ret) + if (ret) { of_free_opp_table(dev); - else - _opp_fill_opp_next(dev); + } else { + /* Update np and shared_opp */ + dev_opp = _find_device_opp(dev); + if (WARN_ON(!dev_opp)) + goto put_opp_np; + + dev_opp->np = opp_np; + if (of_get_property(opp_np, "shared-opp", NULL)) + dev_opp->shared_opp = true; + _opp_fill_opp_next(dev_opp, dev); + }
put_opp_np: of_node_put(opp_np); @@ -1219,6 +1338,9 @@ void of_free_opp_table(struct device *dev) struct device_opp *dev_opp; struct dev_pm_opp *opp, *tmp;
+ /* Hold our list modification lock here */ + mutex_lock(&dev_opp_list_lock); + /* Check for existing list for 'dev' */ dev_opp = _find_device_opp(dev); if (IS_ERR(dev_opp)) { @@ -1228,18 +1350,21 @@ void of_free_opp_table(struct device *dev) IS_ERR_OR_NULL(dev) ? "Invalid device" : dev_name(dev), error); - return; + goto unlock; }
- /* Hold our list modification lock here */ - mutex_lock(&dev_opp_list_lock); - - /* Free static OPPs */ - list_for_each_entry_safe(opp, tmp, &dev_opp->opp_list, node) { - if (!opp->dynamic) - _opp_remove(dev_opp, opp, true); + /* Find if dev_opp manages a single device */ + if (_list_dev_count(dev_opp) == 1) { + /* Free static OPPs */ + list_for_each_entry_safe(opp, tmp, &dev_opp->opp_list, node) { + if (!opp->dynamic) + _opp_remove(dev_opp, opp, true); + } + } else { + _remove_list_dev(_find_list_dev(dev, dev_opp), dev_opp); }
+unlock: mutex_unlock(&dev_opp_list_lock); } EXPORT_SYMBOL_GPL(of_free_opp_table);
This provides helpers to find which CPUs share OPPs and initialize OPPs for them.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/base/power/opp.c | 113 +++++++++++++++++++++++++++++++++++++++++++++++ include/linux/pm_opp.h | 17 +++++++ 2 files changed, 130 insertions(+)
diff --git a/drivers/base/power/opp.c b/drivers/base/power/opp.c index 24a014b7a68a..751f56f323bf 100644 --- a/drivers/base/power/opp.c +++ b/drivers/base/power/opp.c @@ -11,6 +11,7 @@ * published by the Free Software Foundation. */
+#include <linux/cpu.h> #include <linux/kernel.h> #include <linux/errno.h> #include <linux/err.h> @@ -1321,6 +1322,44 @@ int of_init_opp_table(struct device *dev) } EXPORT_SYMBOL_GPL(of_init_opp_table);
+int of_cpumask_init_opp_table(cpumask_var_t cpumask) +{ + struct device *cpu_dev; + int cpu, tcpu, ret = 0; + + if (cpumask_empty(cpumask)) + return -EINVAL; + + for_each_cpu(cpu, cpumask) { + cpu_dev = get_cpu_device(cpu); + if (!cpu_dev) { + pr_err("%s: failed to get cpu%d device\n", __func__, + cpu); + continue; + } + + ret = of_init_opp_table(cpu_dev); + if (ret) { + pr_err("%s: couldn't find opp table for cpu:%d, %d\n", + __func__, cpu, ret); + + /* Free all other OPPs */ + for_each_cpu(tcpu, cpumask) { + if (cpu == tcpu) + break; + + cpu_dev = get_cpu_device(tcpu); + if (cpu_dev) + of_free_opp_table(cpu_dev); + } + break; + } + } + + return ret; +} +EXPORT_SYMBOL_GPL(of_cpumask_init_opp_table); + /** * of_free_opp_table() - Free OPP table entries created from static DT entries * @dev: device pointer used to lookup device OPPs. @@ -1368,4 +1407,78 @@ void of_free_opp_table(struct device *dev) mutex_unlock(&dev_opp_list_lock); } EXPORT_SYMBOL_GPL(of_free_opp_table); + +void of_cpumask_free_opp_table(cpumask_var_t cpumask) +{ + struct device *cpu_dev; + int cpu; + + WARN_ON(cpumask_empty(cpumask)); + + for_each_cpu(cpu, cpumask) { + cpu_dev = get_cpu_device(cpu); + if (!cpu_dev) { + pr_err("%s: failed to get cpu%d device\n", __func__, + cpu); + continue; + } + + of_free_opp_table(cpu_dev); + } +} +EXPORT_SYMBOL_GPL(of_cpumask_free_opp_table); + +/* Works only for OPP v2 bindings */ +int of_get_cpus_sharing_opps(struct device *cpu_dev, cpumask_var_t cpumask) +{ + struct device_node *np, *tmp_np; + struct device *tcpu_dev; + int cpu, ret = 0; + + cpumask_set_cpu(cpu_dev->id, cpumask); + + /* Get OPP descriptor node */ + np = of_get_opp_desc_node(cpu_dev); + if (IS_ERR(np)) { + dev_info(cpu_dev, "%s: Couldn't find opp node: %ld\n", __func__, + PTR_ERR(np)); + return -EINVAL; + } + + /* OPPs are shared ? */ + if (!of_get_property(np, "shared-opp", NULL)) + goto put_cpu_node; + + for_each_possible_cpu(cpu) { + if (cpu == cpu_dev->id) + continue; + + tcpu_dev = get_cpu_device(cpu); + if (!tcpu_dev) { + pr_err("failed to get cpu%d device\n", cpu); + ret = -ENODEV; + goto put_cpu_node; + } + + /* Get OPP descriptor node */ + tmp_np = of_get_opp_desc_node(tcpu_dev); + if (IS_ERR(tmp_np)) { + dev_info(tcpu_dev, "%s: Couldn't find opp node: %ld\n", + __func__, PTR_ERR(np)); + ret = -EINVAL; + goto put_cpu_node; + } + + /* CPUs are sharing opp node */ + if (np == tmp_np) + cpumask_set_cpu(cpu, cpumask); + + of_node_put(tmp_np); + } + +put_cpu_node: + of_node_put(np); + return ret; +} +EXPORT_SYMBOL_GPL(of_get_cpus_sharing_opps); #endif diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h index 9949d07a93f9..41cbc7469b5a 100644 --- a/include/linux/pm_opp.h +++ b/include/linux/pm_opp.h @@ -115,6 +115,9 @@ static inline struct srcu_notifier_head *dev_pm_opp_get_notifier( #if defined(CONFIG_PM_OPP) && defined(CONFIG_OF) int of_init_opp_table(struct device *dev); void of_free_opp_table(struct device *dev); +int of_cpumask_init_opp_table(cpumask_var_t cpumask); +void of_cpumask_free_opp_table(cpumask_var_t cpumask); +int of_get_cpus_sharing_opps(struct device *cpu_dev, cpumask_var_t cpumask); struct device_node *of_get_opp_desc_node(struct device *dev); #else static inline int of_init_opp_table(struct device *dev) @@ -126,6 +129,20 @@ static inline void of_free_opp_table(struct device *dev) { }
+static inline int of_cpumask_init_opp_table(cpumask_var_t cpumask) +{ + return -EINVAL; +} + +static inline void of_cpumask_free_opp_table(cpumask_var_t cpumask) +{ +} + +static inline int of_get_cpus_sharing_opps(struct device *cpu_dev, cpumask_var_t cpumask) +{ + return -EINVAL; +} + static inline struct device_node *of_get_opp_desc_node(struct device *dev) { return ERR_PTR(-EINVAL);
This updates cpufreq-dt driver to read clock sharing information from new OPP bindings. And then initialize OPPs for CPUs with help of new bindings.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/cpufreq/cpufreq-dt.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c index bab67db54b7e..6f307897e17a 100644 --- a/drivers/cpufreq/cpufreq-dt.c +++ b/drivers/cpufreq/cpufreq-dt.c @@ -208,8 +208,14 @@ static int cpufreq_init(struct cpufreq_policy *policy) goto out_put_reg_clk; }
+ pd = cpufreq_get_driver_data(); + if (!pd || !pd->independent_clocks) { + if (of_get_cpus_sharing_opps(cpu_dev, policy->cpus)) + cpumask_setall(policy->cpus); + } + /* OPPs might be populated at runtime, don't check for error here */ - of_init_opp_table(cpu_dev); + of_cpumask_init_opp_table(policy->cpus);
/* * But we need OPP table to function so if it is not there let's @@ -293,10 +299,6 @@ static int cpufreq_init(struct cpufreq_policy *policy)
policy->cpuinfo.transition_latency = transition_latency;
- pd = cpufreq_get_driver_data(); - if (!pd || !pd->independent_clocks) - cpumask_setall(policy->cpus); - of_node_put(np);
return 0; @@ -306,7 +308,7 @@ static int cpufreq_init(struct cpufreq_policy *policy) out_free_priv: kfree(priv); out_free_opp: - of_free_opp_table(cpu_dev); + of_cpumask_free_opp_table(policy->cpus); of_node_put(np); out_put_reg_clk: clk_put(cpu_clk); @@ -322,7 +324,7 @@ static int cpufreq_exit(struct cpufreq_policy *policy)
cpufreq_cooling_unregister(priv->cdev); dev_pm_opp_free_cpufreq_table(priv->cpu_dev, &policy->freq_table); - of_free_opp_table(priv->cpu_dev); + of_cpumask_free_opp_table(policy->related_cpus); clk_put(policy->clk); if (!IS_ERR(priv->cpu_reg)) regulator_put(priv->cpu_reg);
On 02/11, Viresh Kumar wrote:
Now that I have received an verbal Ack from Rob Herring (in a personal conversation) about the bindings, I am showing how the code looks like with these new bindings.
Some part is still now done:
- Interface for adding new detailed OPPs from platform code instead of DT
- Providing cpufreq helpers for the next OPPs
- Providing regulator helpers for the target/min/max ranges
Please provide feedback on how this looks like..
Here's some feedback on how we can't use OPPs (and OPPs in DT) on qcom platforms.
On these platforms the OPPs are not always frequency voltage pairs. Sometimes they're a frequency voltage voltage triplet, or frequency voltage current triplet. I know that the OPP framework in the kernel doesn't support more than frequency voltage pairs, but that doesn't mean it can't in the future, and if it needs to do so the DT bindings shouldn't need a v3 revision.
Furthermore, we have a large number of OPP sets that apply to different speed bins and silicon characteristics of the SoC. In our systems we read some efuses (an eeprom of sorts) that tell us to use a certain set of OPPs because the silicon is so fast or has these certain characteristics. The bootloader is *not* reading these fuses and populating OPPs in DT. So right now we just put all these custom OPPish tables in DT and then pick the right one based on a node name match constructed from the bits we read in the efuses. How can we express this in DT with these bindings?
For example, on msm8974 we have a frequency voltage current triplet and there are 18 different sets of these triplets, each with 23 triplets per set. We could encode all of these tables as a bunch of nodes with compatible = "operating-points-v2" but how do we pick the right one to assign and populate for the CPU devices? Do we need some sort of opp-fuse-map table that encodes the information we want somewhere?
qcom,opp-fuse-map = <speedX binY versionZ &oppN>, ...
but where do we put it? In the cpus node? Or maybe we can keep doing the node name matching? That would required dropping the oppN convention.
Or take msm8916 as another example. On this device the voltage for a few frequencies comes from the efuses and then we interpolate the rest of the frequency voltage pairs. The speed bins are picked from another set of efuses so we can do the interpolation. Unfortunately we don't encode the frequency in the fuses, so we rely on a handful of tables being defined somewhere so that we know speed bin 0 means this set of frequencies and speed bin 1 means this set of frequencies. How do we encode this in DT? Should we have the frequencies as OPPs and leave the voltage part out, filling it in at runtime based on what we read out of the efuses? I assume it's desirable to have the frequency tables in DT but we could also have them in the driver and if we did that there wouldn't be any shared-opp property to set and have the cpufreq-dt driver use to figure out clock sharing.
Also sometimes we need to correlate OPPs between each other. For example on msm8960/apq8064 if the CPU is running at a frequency and voltage, the L2 needs to be running at another frequency, voltage, and voltage (triplet). The L2 is in two power domains but it only has one clock. Can/should this be expressed in DT? It certainly seems that it's at least easier to add it on as a feature because OPPs are nodes instead of an array. But we need to make sure we can support multiple regulators somehow, either through correlated OPPs and multiple OPPs for a single device or by being able to say opp-0-microvolt, opp-1-microvolt. I would guess something similar could happen if there were two clocks and one regulator although I've never seen such a scenario in practice.
On 12 February 2015 at 08:52, Stephen Boyd sboyd@codeaurora.org wrote:
Here's some feedback on how we can't use OPPs (and OPPs in DT) on qcom platforms.
On these platforms the OPPs are not always frequency voltage pairs. Sometimes they're a frequency voltage voltage triplet, or
So, making opp-microvolt an array of <target/min/max>, values should fix this? Do we also need a opp-microvolt-names array as well? or can we reused similar ones from the CPU node where regulator are defined.
frequency voltage current triplet. I know that the OPP framework
so we do need opp-milliamp here..
in the kernel doesn't support more than frequency voltage pairs, but that doesn't mean it can't in the future, and if it needs to do so the DT bindings shouldn't need a v3 revision.
Sure.
Furthermore, we have a large number of OPP sets that apply to different speed bins and silicon characteristics of the SoC. In our systems we read some efuses (an eeprom of sorts) that tell us to use a certain set of OPPs because the silicon is so fast or has these certain characteristics. The bootloader is *not* reading these fuses and populating OPPs in DT. So right now we just put all these custom OPPish tables in DT and then pick the right one based on a node name match constructed from the bits we read in the efuses. How can we express this in DT with these bindings?
What about keeping things as is in DT and disabling the OPPs which you don't support at boot? And only keep enabled the set of OPPs that you want to use based on these efuses ?
For example, on msm8974 we have a frequency voltage current triplet and there are 18 different sets of these triplets, each with 23 triplets per set. We could encode all of these tables as a bunch of nodes with compatible = "operating-points-v2" but how do we pick the right one to assign and populate for the CPU devices? Do we need some sort of opp-fuse-map table that encodes the information we want somewhere?
Probably add all these groups in a single OPP descriptor and enable/disable them at runtime..
qcom,opp-fuse-map = <speedX binY versionZ &oppN>, ...
but where do we put it? In the cpus node? Or maybe we can keep doing the node name matching? That would required dropping the oppN convention.
Probably this might not be required then ?
Or take msm8916 as another example. On this device the voltage for a few frequencies comes from the efuses and then we interpolate the rest of the frequency voltage pairs. The speed bins are picked from another set of efuses so we can do the interpolation. Unfortunately we don't encode the frequency in the fuses, so we rely on a handful of tables being defined somewhere so that we know speed bin 0 means this set of frequencies and speed bin 1 means this set of frequencies. How do we encode this in DT? Should we have the frequencies as OPPs and leave the voltage part out, filling it in at runtime based on what we read
Maybe yes.
out of the efuses? I assume it's desirable to have the frequency tables in DT but we could also have them in the driver and if we did that there wouldn't be any shared-opp property to set and have the cpufreq-dt driver use to figure out clock sharing.
Probably better keep them in DT. But for platforms with dynamic OPPs only, we will surely provide some API to make OPPs shared at runtime too..
Also sometimes we need to correlate OPPs between each other. For example on msm8960/apq8064 if the CPU is running at a frequency and voltage, the L2 needs to be running at another frequency, voltage, and voltage (triplet). The L2 is in two power domains but it only has one clock. Can/should this be expressed in DT? It
Not sure.
certainly seems that it's at least easier to add it on as a feature because OPPs are nodes instead of an array. But we need
It wouldn't be a great idea to make these nodes too large just to support some corner cases, and we better try to find the best way out. But if we need it this way, we need it this way :)
to make sure we can support multiple regulators somehow, either through correlated OPPs and multiple OPPs for a single device or by being able to say opp-0-microvolt, opp-1-microvolt. I would
an array would be better.
guess something similar could happen if there were two clocks and one regulator although I've never seen such a scenario in practice.
Isn't this common? A single regulator voltage supporting multiple clock rates? Or did I misunderstood it :)
We can have separate OPP nodes in this case.
On 02/12, Viresh Kumar wrote:
On 12 February 2015 at 08:52, Stephen Boyd sboyd@codeaurora.org wrote:
Here's some feedback on how we can't use OPPs (and OPPs in DT) on qcom platforms.
On these platforms the OPPs are not always frequency voltage pairs. Sometimes they're a frequency voltage voltage triplet, or
So, making opp-microvolt an array of <target/min/max>, values should fix this? Do we also need a opp-microvolt-names array as well? or can we reused similar ones from the CPU node where regulator are defined.
I don't follow how target/min/max does anything for two different voltages. I suppose something like opp-microvolt-names would work though but I don't know how software would correlate that to the regulator it uses because that information is elsewhere in the device's node. Why not put the information about which clock and regulator is used into the opp node?
Furthermore, we have a large number of OPP sets that apply to different speed bins and silicon characteristics of the SoC. In our systems we read some efuses (an eeprom of sorts) that tell us to use a certain set of OPPs because the silicon is so fast or has these certain characteristics. The bootloader is *not* reading these fuses and populating OPPs in DT. So right now we just put all these custom OPPish tables in DT and then pick the right one based on a node name match constructed from the bits we read in the efuses. How can we express this in DT with these bindings?
What about keeping things as is in DT and disabling the OPPs which you don't support at boot? And only keep enabled the set of OPPs that you want to use based on these efuses ?
Let's look at an example:
speed0 bin0 version0 = /* Hz uV uA */ < 300000000 815000 73 >, < 345600000 825000 85 >, < 422400000 835000 104 >, ....
speed0 bin1 version0 = < 300000000 800000 73 >, < 345600000 810000 85 >, < 422400000 820000 104 >, ...
Each set of fuses has the exact same frequency (as long as the speed is the same) but the bin changes the voltage and sometimes current. Maybe we could make this into:
speed0_bin0_version0_opp: opp0 { compatible = "qcom,speed0-bin0-version0-opp", "operating-points-v2"; entry0 { opp-khz = <300000>; opp-microvolt = <815000>; opp-milliamp = <73>; }; entry1 { opp-khz = <345600>; opp-microvolt = <825000>; opp-milliamp = <85>; }; ... };
speed0_bin1_version0_opp: opp0 { compatible = "qcom,speed0-bin1-version0-opp", "operating-points-v2"; entry0 { opp-khz = <300000>; opp-microvolt = <800000>; opp-milliamp = <73>; }; entry1 { opp-khz = <345600>; opp-microvolt = <810000>; opp-milliamp = <85>; }; ... };
And then we can construct a compatible string to search the cpus node for. I wonder if we shouldn't put all this into an opps node under the cpus node?
cpus { opps { opp0 { entry0 { } ... } opp1 { entry0 { } ... } } }
Or take msm8916 as another example. On this device the voltage for a few frequencies comes from the efuses and then we interpolate the rest of the frequency voltage pairs. The speed bins are picked from another set of efuses so we can do the interpolation. Unfortunately we don't encode the frequency in the fuses, so we rely on a handful of tables being defined somewhere so that we know speed bin 0 means this set of frequencies and speed bin 1 means this set of frequencies. How do we encode this in DT? Should we have the frequencies as OPPs and leave the voltage part out, filling it in at runtime based on what we read
Maybe yes.
out of the efuses? I assume it's desirable to have the frequency tables in DT but we could also have them in the driver and if we did that there wouldn't be any shared-opp property to set and have the cpufreq-dt driver use to figure out clock sharing.
Probably better keep them in DT. But for platforms with dynamic OPPs only, we will surely provide some API to make OPPs shared at runtime too..
That would be good. Hopefully we can do that so that the cpufreq-dt driver doesn't have to open code DT probing to figure out that the OPPs are shared between CPUs.
Also sometimes we need to correlate OPPs between each other. For example on msm8960/apq8064 if the CPU is running at a frequency and voltage, the L2 needs to be running at another frequency, voltage, and voltage (triplet). The L2 is in two power domains but it only has one clock. Can/should this be expressed in DT? It
Not sure.
certainly seems that it's at least easier to add it on as a feature because OPPs are nodes instead of an array. But we need
It wouldn't be a great idea to make these nodes too large just to support some corner cases, and we better try to find the best way out. But if we need it this way, we need it this way :)
to make sure we can support multiple regulators somehow, either through correlated OPPs and multiple OPPs for a single device or by being able to say opp-0-microvolt, opp-1-microvolt. I would
an array would be better.
Sure.
guess something similar could happen if there were two clocks and one regulator although I've never seen such a scenario in practice.
Isn't this common? A single regulator voltage supporting multiple clock rates? Or did I misunderstood it :)
We can have separate OPP nodes in this case.
I was thinking the same device has two clocks that share the same voltage and these two clocks can run at different rates. If two opp nodes under the same device works then it sounds like it will be ok. We probably still need some way to correlate the two so that if one clock is at some freq, the other clock should be at the corresponding freq in the OPP, assuming that the OPP is really two clocks and a voltage (i.e. the clocks need to scale together).
On 12 February 2015 at 16:20, Stephen Boyd sboyd@codeaurora.org wrote:
On 02/12, Viresh Kumar wrote:
On 12 February 2015 at 08:52, Stephen Boyd sboyd@codeaurora.org wrote:
Here's some feedback on how we can't use OPPs (and OPPs in DT) on qcom platforms.
On these platforms the OPPs are not always frequency voltage pairs. Sometimes they're a frequency voltage voltage triplet, or
So, making opp-microvolt an array of <target/min/max>, values should fix this? Do we also need a opp-microvolt-names array as well? or can we reused similar ones from the CPU node where regulator are defined.
I don't follow how target/min/max does anything for two different
<target/min/max> represents a single regulators voltage levels for any OPP, this replaces the existing target+voltage-tolerance stuff..
Now to support multiple regulators, we can have an array here.
voltages. I suppose something like opp-microvolt-names would work though but I don't know how software would correlate that to the regulator it uses because that information is elsewhere in the
I hope the CPU node will have an array of supplies to support multiple regulators, right? If yes, then we can just keep the array of <t/min/max> in the same order. And software should be able to correlate then?
device's node. Why not put the information about which clock and regulator is used into the opp node?
Don't know. I have been asked specifically to keem them out of the OPPs, as they belong to the CPU or device instead.
Furthermore, we have a large number of OPP sets that apply to different speed bins and silicon characteristics of the SoC. In our systems we read some efuses (an eeprom of sorts) that tell us to use a certain set of OPPs because the silicon is so fast or has these certain characteristics. The bootloader is *not* reading these fuses and populating OPPs in DT. So right now we just put all these custom OPPish tables in DT and then pick the right one based on a node name match constructed from the bits we read in the efuses. How can we express this in DT with these bindings?
What about keeping things as is in DT and disabling the OPPs which you don't support at boot? And only keep enabled the set of OPPs that you want to use based on these efuses ?
Let's look at an example:
speed0 bin0 version0 = /* Hz uV uA */ < 300000000 815000 73 >, < 345600000 825000 85 >, < 422400000 835000 104 >, .... speed0 bin1 version0 = < 300000000 800000 73 >, < 345600000 810000 85 >, < 422400000 820000 104 >, ...
Each set of fuses has the exact same frequency (as long as the speed is the same) but the bin changes the voltage and sometimes current. Maybe we could make this into:
speed0_bin0_version0_opp: opp0 { compatible = "qcom,speed0-bin0-version0-opp", "operating-points-v2"; entry0 { opp-khz = <300000>; opp-microvolt = <815000>; opp-milliamp = <73>; };
entry1 { opp-khz = <345600>; opp-microvolt = <825000>; opp-milliamp = <85>; }; ...
};
speed0_bin1_version0_opp: opp0 { compatible = "qcom,speed0-bin1-version0-opp", "operating-points-v2"; entry0 { opp-khz = <300000>; opp-microvolt = <800000>; opp-milliamp = <73>; };
entry1 { opp-khz = <345600>; opp-microvolt = <810000>; opp-milliamp = <85>; }; ...
};
And then we can construct a compatible string to search the cpus
So we can make:
operating-points-v2 = <&cpu0_opp>;
an array instead, and that btw is expandable in future as well. So we may leave it as is right now, and update it later.
node for. I wonder if we shouldn't put all this into an opps node under the cpus node?
Nodes in 'cpus' node are thought of as CPUs and so was asked to put this out at the top.
guess something similar could happen if there were two clocks and one regulator although I've never seen such a scenario in practice.
Isn't this common? A single regulator voltage supporting multiple clock rates? Or did I misunderstood it :)
We can have separate OPP nodes in this case.
I was thinking the same device has two clocks that share the same voltage and these two clocks can run at different rates. If two opp nodes under the same device works then it sounds like it will be ok. We probably still need some way to correlate the two so that if one clock is at some freq, the other clock should be at the corresponding freq in the OPP, assuming that the OPP is really two clocks and a voltage (i.e. the clocks need to scale together).
Oh, I got it wrong. So what this requires is an array of clock-rates like what I am proposing for regulators.
Over that we can make these entries arrays later on as well, as this wouldn't break anything. So, keep life simple for now.
On Tue, Feb 17, 2015 at 01:16:25PM +0530, Viresh Kumar wrote:
On 12 February 2015 at 16:20, Stephen Boyd sboyd@codeaurora.org wrote:
voltages. I suppose something like opp-microvolt-names would work though but I don't know how software would correlate that to the regulator it uses because that information is elsewhere in the
I hope the CPU node will have an array of supplies to support multiple regulators, right? If yes, then we can just keep the array of <t/min/max> in the same order. And software should be able to correlate then?
It's probably not *too* bad for small numbers of supplies but bindings that require that indexes into two different arrays match up (particularly with different step sizes as here) do have legibility issues - it's easy to miscount when reading things.
device's node. Why not put the information about which clock and regulator is used into the opp node?
Don't know. I have been asked specifically to keem them out of the OPPs, as they belong to the CPU or device instead.
The DT should describe the hardware and how it's connected. No power supplies or clocks are connected to an OPP, an OPP is a property of something else.
On 11 February 2015 at 13:46, Viresh Kumar viresh.kumar@linaro.org wrote:
Now that I have received an verbal Ack from Rob Herring (in a personal conversation) about the bindings, I am showing how the code looks like with these new bindings.
Some part is still now done:
s/now/not
- Interface for adding new detailed OPPs from platform code instead of DT
- Providing cpufreq helpers for the next OPPs
- Providing regulator helpers for the target/min/max ranges
Please provide feedback on how this looks like..
Can we have some reviews of the code changes please ?
-- viresh
On 27 February 2015 at 10:55, Viresh Kumar viresh.kumar@linaro.org wrote:
On 11 February 2015 at 13:46, Viresh Kumar viresh.kumar@linaro.org wrote:
Now that I have received an verbal Ack from Rob Herring (in a personal conversation) about the bindings, I am showing how the code looks like with these new bindings.
Some part is still now done:
s/now/not
- Interface for adding new detailed OPPs from platform code instead of DT
- Providing cpufreq helpers for the next OPPs
- Providing regulator helpers for the target/min/max ranges
Please provide feedback on how this looks like..
Can we have some reviews of the code changes please ?
Rob/Rafael/Arnd/Nishanth,
Reviews please? This had been hanging since a long time, can we get this merged in next merge window please?
Bindings are already OK'd by Rob (in personal conversation) and its all about code now. Which can be updated later on as well if we find an issue..
@Rob: Can you please Ack whatever stuff you are okay with ?
-- viresh
On 11 February 2015 at 13:46, Viresh Kumar viresh.kumar@linaro.org wrote:
Now that I have received an verbal Ack from Rob Herring (in a personal conversation) about the bindings, I am showing how the code looks like with these new bindings.
Some part is still now done:
- Interface for adding new detailed OPPs from platform code instead of DT
- Providing cpufreq helpers for the next OPPs
- Providing regulator helpers for the target/min/max ranges
Please provide feedback on how this looks like..
@Rafael: Its been two months now since this was posted. Can we get this merged please? If someone finds issues later, we can go fix them ..
And this *should* not break any existing stuff as this is mostly new stuff..
-- viresh
On Wed, Apr 1, 2015 at 1:22 AM, Viresh Kumar viresh.kumar@linaro.org wrote:
On 11 February 2015 at 13:46, Viresh Kumar viresh.kumar@linaro.org wrote:
Now that I have received an verbal Ack from Rob Herring (in a personal conversation) about the bindings, I am showing how the code looks like with these new bindings.
I have also told you multiple times I want to see others' inputs and acks. So yes it looks fine to me, but I am not a user of it.
Some part is still now done:
- Interface for adding new detailed OPPs from platform code instead of DT
- Providing cpufreq helpers for the next OPPs
- Providing regulator helpers for the target/min/max ranges
Please provide feedback on how this looks like..
@Rafael: Its been two months now since this was posted. Can we get this merged please? If someone finds issues later, we can go fix them ..
If there are users waiting for this, then get their acks first. As you already know, I spoke to Stephen at ELC and he still has concerns about it. As Qualcomm is probably the most complicated case, it needs to work for them (or we accept that it can't and they do everything their own way).
Rob
On 1 April 2015 at 22:13, Rob Herring robherring2@gmail.com wrote:
I have also told you multiple times I want to see others' inputs and acks. So yes it looks fine to me, but I am not a user of it.
Right :) and so I am trying hard to get those people to review it. Probably Stephen will respond now and I have also managed to convince Nishanth to review it :)
@Rafael: Its been two months now since this was posted. Can we get this merged please? If someone finds issues later, we can go fix them ..
So, atleast I got the required attention :)
If there are users waiting for this, then get their acks first. As you
I am one of those for my cpufreq-dt driver. :)
already know, I spoke to Stephen at ELC and he still has concerns about it. As Qualcomm is probably the most complicated case, it needs to work for them (or we accept that it can't and they do everything their own way).
Yeah, so we can't wait for them for ever. They have to respond now if they want to use it. Or we go ahead ...
linaro-kernel@lists.linaro.org