Hi Guys (Specially Stephen),
These are dangling since a long time now and I am really getting discouraged to take these further. Not many people have reviewed them, apart from few nice folks.
One last try ..
I have tried to take care of most of the concerns you had Stephen, please let me know if they workout for you or you don't want these bindings to take care of qcom stuff at all. Then we can keep them simple and carry on. Ofcourse we would like to handle every platform here :)
V3->V4: - Dropped code changes as we are still concerned about bindings. - separated out into three patches, some of which might be NAK'd. :) - The first patch presents basic OPP stuff that was reviewed earlier. It also has support for multiple regulators, with values for both current and voltage. - Second patch is based on a special concern that Stephen had about multiple OPP tables, one of which the parsing code will select at runtime. - Third one separates out 'opp-next' or Intermediate freq support as Mike T. had few concerns over it. He wanted the clock driver to take care of this and so do not want it to be passed by DT and used by cpufreq. Also, there were concerns like the platform may not want to choose intermediate frequency as a target frequency for longer runs, which wasn't prevented in earlier bindings. And so it is kept separate to be NAK'd quietly, without much disturbances.
---------------x-------------------x------------------------
Current OPP (Operating performance point) DT bindings are proven to be insufficient at multiple instances.
The shortcomings we are trying to solve here:
- Getting clock sharing information between CPUs. Single shared clock vs independent clock per core vs shared clock per cluster.
- Support for specifying current levels along with voltages.
- Support for multiple regulators.
- Support for turbo modes.
- Other per OPP settings: transition latencies, disabled status, etc.?
- Expandability of OPPs in future.
This patchset shows a path forward to get all these fixed. Please give some comments this time and lets get this done.
-- viresh
Viresh Kumar (3): OPP: Redefine bindings to overcome shortcomings OPP: Allow multiple OPP tables to be passed via DT OPP: Add 'opp-next' in operating-points-v2 bindings
Documentation/devicetree/bindings/power/opp.txt | 501 +++++++++++++++++++++++- 1 file changed, 497 insertions(+), 4 deletions(-)
Current OPP (Operating performance point) DT bindings are proven to be insufficient at multiple instances.
The shortcomings we are trying to solve here:
- Getting clock sharing information between CPUs. Single shared clock vs independent clock per core vs shared clock per cluster.
- Support for specifying current levels along with voltages.
- Support for multiple regulators.
- Support for turbo modes.
- Other per OPP settings: transition latencies, disabled status, etc.?
- Expandability of OPPs in future.
This patch introduces new bindings "operating-points-v2" to get these problems solved. Refer to the bindings for more details.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- Documentation/devicetree/bindings/power/opp.txt | 366 +++++++++++++++++++++++- 1 file changed, 362 insertions(+), 4 deletions(-)
diff --git a/Documentation/devicetree/bindings/power/opp.txt b/Documentation/devicetree/bindings/power/opp.txt index 74499e5033fc..3b67a5c8d965 100644 --- a/Documentation/devicetree/bindings/power/opp.txt +++ b/Documentation/devicetree/bindings/power/opp.txt @@ -1,8 +1,366 @@ -* Generic OPP Interface +Generic OPP (Operating Performance Points) Bindings +----------------------------------------------------
-SoCs have a standard set of tuples consisting of frequency and -voltage pairs that the device will support per voltage domain. These -are called Operating Performance Points or OPPs. +Devices work at voltage-current-frequency triplets and some implementations have +the liberty of choosing these. These triplets are called Operating Performance +Points aka OPPs. This document defines bindings for these OPPs applicable across +wide range of devices. For illustration purpose, this document uses CPU as a +device. + + +* Property: operating-points-v2 + +Devices supporting OPPs must set their "operating-points-v2" property with +phandle to a OPP descriptor in their DT node. The OPP core will use this phandle +to find the operating points for the device. + + +* OPP Descriptor Node + +This describes the OPPs belonging to a device. This node can have following +properties: + +Required properties: +- compatible: Allow OPPs to express their compatibility. It should be: + "operating-points-v2". +- OPP nodes: One or more OPP nodes describing voltage-current-frequency + triplets. Their name isn't significant but their phandle can be used to + reference an OPP. + +Optional properties: +- shared-opp: Indicates that device nodes using this OPP descriptor's phandle + switch their DVFS state together, i.e. they share clock/voltage/current lines. + Missing property means devices have independent clock/voltage/current lines, + but they share OPP tables. + + +* OPP Node + +This defines voltage-current-frequency triplets along with other related +properties. + +Required properties: +- opp-khz: Frequency in kHz + +Optional properties: +- opp-microvolt: voltage in micro Volts. It can contain entries for multiple + regulators. + + A single regulator's voltage is specified with an array of size one or three. + Single entry is for target voltage and three entries are for <target min max> + voltages. + + Entries for multiple regulators must be present in the same order as + regulators are specified in device's DT node. + +- opp-microamp: current in micro Amperes. It can contain entries for multiple + regulators. + + A single regulator's current is specified with an array of size one or three. + Single entry is for target current and three entries are for <target min max> + currents. + + Entries for multiple regulators must be present in the same order as + regulators are specified in device's DT node. If few regulators don't provide + capability to configure current, then values for then should be marked as + zero. + +- clock-latency-ns: Specifies the maximum possible transition latency (in + nanoseconds) for switching to this OPP from any other OPP. +- turbo-mode: Marks the OPP to be used only for turbo modes. +- status: Marks the node enabled/disabled. + +Example 1: Single cluster Dual-core ARM cortex A9, switch DVFS states together. + +/ { + cpus { + #address-cells = <1>; + #size-cells = <0>; + + cpu@0 { + compatible = "arm,cortex-a9"; + reg = <0>; + next-level-cache = <&L2>; + clocks = <&clk_controller 0>; + clock-names = "cpu"; + opp-supply = <&cpu_supply0>; + operating-points-v2 = <&cpu0_opp>; + }; + + cpu@1 { + compatible = "arm,cortex-a9"; + reg = <1>; + next-level-cache = <&L2>; + clocks = <&clk_controller 0>; + clock-names = "cpu"; + opp-supply = <&cpu_supply0>; + operating-points-v2 = <&cpu0_opp>; + }; + }; + + cpu0_opp: opp0 { + compatible = "operating-points-v2"; + shared-opp; + + entry00 { + opp-khz = <1000000>; + opp-microvolt = <970000 975000 985000>; + opp-microamp = <70000 75000 85000>; + clock-latency-ns = <300000>; + }; + entry01 { + opp-khz = <1100000>; + opp-microvolt = <980000 1000000 1010000>; + opp-microamp = <80000 81000 82000>; + clock-latency-ns = <310000>; + }; + entry02 { + opp-khz = <1200000>; + opp-microvolt = <1025000>; + clock-latency-ns = <290000>; + turbo-mode; + }; + }; +}; + +Example 2: Single cluster, Quad-core Qualcom-krait, switches DVFS states +independently. + +/ { + cpus { + #address-cells = <1>; + #size-cells = <0>; + + cpu@0 { + compatible = "qcom,krait"; + reg = <0>; + next-level-cache = <&L2>; + clocks = <&clk_controller 0>; + clock-names = "cpu"; + opp-supply = <&cpu_supply0>; + operating-points-v2 = <&cpu0_opp>; + }; + + cpu@1 { + compatible = "qcom,krait"; + reg = <1>; + next-level-cache = <&L2>; + clocks = <&clk_controller 1>; + clock-names = "cpu"; + opp-supply = <&cpu_supply1>; + operating-points-v2 = <&cpu0_opp>; + }; + + cpu@2 { + compatible = "qcom,krait"; + reg = <2>; + next-level-cache = <&L2>; + clocks = <&clk_controller 2>; + clock-names = "cpu"; + opp-supply = <&cpu_supply2>; + operating-points-v2 = <&cpu0_opp>; + }; + + cpu@3 { + compatible = "qcom,krait"; + reg = <3>; + next-level-cache = <&L2>; + clocks = <&clk_controller 3>; + clock-names = "cpu"; + opp-supply = <&cpu_supply3>; + operating-points-v2 = <&cpu0_opp>; + }; + }; + + cpu0_opp: opp0 { + compatible = "operating-points-v2"; + + /* + * Missing shared-opp property means CPUs switch DVFS states + * independently. + */ + + entry00 { + opp-khz = <1000000>; + opp-microvolt = <970000 975000 985000>; + opp-microamp = <70000 75000 85000>; + clock-latency-ns = <300000>; + }; + entry01 { + opp-khz = <1100000>; + opp-microvolt = <980000 1000000 1010000>; + opp-microamp = <80000 81000 82000>; + clock-latency-ns = <310000>; + }; + entry02 { + opp-khz = <1200000>; + opp-microvolt = <1025000>; + opp-microamp = <90000; + lock-latency-ns = <290000>; + turbo-mode; + }; + }; +}; + +Example 3: Dual-cluster, Dual-core per cluster. CPUs within a cluster switch +DVFS state together. + +/ { + cpus { + #address-cells = <1>; + #size-cells = <0>; + + cpu@0 { + compatible = "arm,cortex-a7"; + reg = <0>; + next-level-cache = <&L2>; + clocks = <&clk_controller 0>; + clock-names = "cpu"; + opp-supply = <&cpu_supply0>; + operating-points-v2 = <&cluster0_opp>; + }; + + cpu@1 { + compatible = "arm,cortex-a7"; + reg = <1>; + next-level-cache = <&L2>; + clocks = <&clk_controller 0>; + clock-names = "cpu"; + opp-supply = <&cpu_supply0>; + operating-points-v2 = <&cluster0_opp>; + }; + + cpu@100 { + compatible = "arm,cortex-a15"; + reg = <100>; + next-level-cache = <&L2>; + clocks = <&clk_controller 1>; + clock-names = "cpu"; + opp-supply = <&cpu_supply1>; + operating-points-v2 = <&cluster1_opp>; + }; + + cpu@101 { + compatible = "arm,cortex-a15"; + reg = <101>; + next-level-cache = <&L2>; + clocks = <&clk_controller 1>; + clock-names = "cpu"; + opp-supply = <&cpu_supply1>; + operating-points-v2 = <&cluster1_opp>; + }; + }; + + cluster0_opp: opp0 { + compatible = "operating-points-v2"; + shared-opp; + + entry00 { + opp-khz = <1000000>; + opp-microvolt = <970000 975000 985000>; + opp-microamp = <70000 75000 85000>; + clock-latency-ns = <300000>; + }; + entry01 { + opp-khz = <1100000>; + opp-microvolt = <980000 1000000 1010000>; + opp-microamp = <80000 81000 82000>; + clock-latency-ns = <310000>; + }; + entry02 { + opp-khz = <1200000>; + opp-microvolt = <1025000>; + opp-microamp = <90000>; + clock-latency-ns = <290000>; + turbo-mode; + }; + }; + + cluster1_opp: opp1 { + compatible = "operating-points-v2"; + shared-opp; + + entry10 { + opp-khz = <1300000>; + opp-microvolt = <1045000 1050000 1055000>; + opp-microamp = <95000 100000 105000>; + clock-latency-ns = <400000>; + }; + entry11 { + opp-khz = <1400000>; + opp-microvolt = <1075000>; + opp-microamp = <100000>; + clock-latency-ns = <400000>; + }; + entry12 { + opp-khz = <1500000>; + opp-microvolt = <1010000 1100000 1110000>; + opp-microamp = <95000 100000 105000>; + clock-latency-ns = <400000>; + turbo-mode; + }; + }; +}; + +Example 4: Handling multiple regulators + +/ { + cpus { + cpu@0 { + compatible = "arm,cortex-a7"; + ... + + opp-supply = <&cpu_supply0>, <&cpu_supply1>, <&cpu_supply2>; + operating-points-v2 = <&cpu0_opp>; + }; + }; + + cpu0_opp: opp0 { + compatible = "operating-points-v2"; + shared-opp; + + entry00 { + opp-khz = <1000000>; + opp-microvolt = <970000>, /* Supply 0 */ + <960000>, /* Supply 1 */ + <960000>; /* Supply 2 */ + opp-microamp = <70000>, /* Supply 0 */ + <70000>, /* Supply 1 */ + <70000>; /* Supply 2 */ + clock-latency-ns = <300000>; + }; + + /* OR */ + + entry00 { + opp-khz = <1000000>; + opp-microvolt = <970000 975000 985000>, /* Supply 0 */ + <960000 965000 975000>, /* Supply 1 */ + <960000 965000 975000>; /* Supply 2 */ + opp-microamp = <70000 75000 85000>, /* Supply 0 */ + <70000 75000 85000>, /* Supply 1 */ + <70000 75000 85000>; /* Supply 2 */ + clock-latency-ns = <300000>; + }; + + /* OR */ + + entry00 { + opp-khz = <1000000>; + opp-microvolt = <970000 975000 985000>, /* Supply 0 */ + <960000 965000 975000>, /* Supply 1 */ + <960000 965000 975000>; /* Supply 2 */ + opp-microamp = <70000 75000 85000>, /* Supply 0 */ + <0 0 0>, /* Supply 1 doesn't support current change */ + <70000 75000 85000>; /* Supply 2 */ + clock-latency-ns = <300000>; + }; + }; +}; + + +Deprecated Bindings +-------------------
Properties: - operating-points: An array of 2-tuples items, and each item consists
On Thu, Apr 30, 2015 at 05:37:59PM +0530, Viresh Kumar wrote:
+- opp-microamp: current in micro Amperes. It can contain entries for multiple
- regulators.
- A single regulator's current is specified with an array of size one or three.
- Single entry is for target current and three entries are for <target min max>
- currents.
What is this for - are you trying to define OPPs for current regulators? If you are that's worrying, I can't think of a sensible use case. If that's not what's happening then the binding needs to be more specific about what this is supposed to mean.
On 4 May 2015 at 17:42, Mark Brown broonie@kernel.org wrote:
On Thu, Apr 30, 2015 at 05:37:59PM +0530, Viresh Kumar wrote:
+- opp-microamp: current in micro Amperes. It can contain entries for multiple
- regulators.
- A single regulator's current is specified with an array of size one or three.
- Single entry is for target current and three entries are for <target min max>
- currents.
What is this for - are you trying to define OPPs for current regulators? If you are that's worrying, I can't think of a sensible use case. If that's not what's happening then the binding needs to be more specific about what this is supposed to mean.
Its another property of the same voltage regulator we are using in opp-microvolt. I hope that makes sense ?
And that's why I wrote this as well:
+ Entries for multiple regulators must be present in the same order as + regulators are specified in device's DT node. If few regulators don't provide + capability to configure current, then values for then should be marked as + zero.
i.e. this is optional but the voltage regulator isn't..
But, perhaps I need to write it with more clarity? Or this field doesn't make sense ?
FWIW, its an outcome of this request from Stephen:
https://www.marc.info/?l=linux-arm-kernel&m=142370250522589&w=3
On Tue, May 05, 2015 at 04:18:59PM +0530, Viresh Kumar wrote:
On 4 May 2015 at 17:42, Mark Brown broonie@kernel.org wrote:
On Thu, Apr 30, 2015 at 05:37:59PM +0530, Viresh Kumar wrote:
+- opp-microamp: current in micro Amperes. It can contain entries for multiple
- regulators.
- A single regulator's current is specified with an array of size one or three.
- Single entry is for target current and three entries are for <target min max>
- currents.
What is this for - are you trying to define OPPs for current regulators? If you are that's worrying, I can't think of a sensible use case. If that's not what's happening then the binding needs to be more specific about what this is supposed to mean.
Its another property of the same voltage regulator we are using in opp-microvolt. I hope that makes sense ?
No, it doesn't - you're not answering the question about what this is for.
But, perhaps I need to write it with more clarity? Or this field doesn't make sense ?
To know if this makes sense I need to know what you beleive "setting the current" does. If you literally mean setting the current it makes no sense at all. If you mean something else that something else should probably be written into the binding.
On 5 May 2015 at 16:27, Mark Brown broonie@kernel.org wrote:
No, it doesn't - you're not answering the question about what this is for.
I don't know how this information will be used finally. Probably the platform driver will do the configuration based on the volt-cur pair.
To know if this makes sense I need to know what you beleive "setting the current" does. If you literally mean setting the current it makes no sense at all. If you mean something else that something else should probably be written into the binding.
Yeah, that was a wrong statement. We can't configure current separately.
Does this diff make it any better ?
diff --git a/Documentation/devicetree/bindings/power/opp.txt b/Documentation/devicetree/bindings/power/opp.txt index c96dc77121b7..a57e88ab4554 100644 --- a/Documentation/devicetree/bindings/power/opp.txt +++ b/Documentation/devicetree/bindings/power/opp.txt @@ -59,16 +59,18 @@ properties. regulators are specified in device's DT node.
- opp-microamp: current in micro Amperes. It can contain entries for multiple - regulators. + regulators. This can be referenced (along with voltage and freqency) while + programming the regulator.
A single regulator's current is specified with an array of size one or three. Single entry is for target current and three entries are for <target min max> currents.
Entries for multiple regulators must be present in the same order as - regulators are specified in device's DT node. If few regulators don't provide - capability to configure current, then values for then should be marked as - zero. + regulators are specified in device's DT node. If current value for few + regulators isn't required to be passed, then values for such regulators should + be marked as zero. If it isn't required for any regulator, then this property + need not be present.
- clock-latency-ns: Specifies the maximum possible transition latency (in nanoseconds) for switching to this OPP from any other OPP.
(Restoring my laptop after a corrupted disk, and so sending it from gmail, might be a bit corrupted)..
On Tue, May 05, 2015 at 05:13:33PM +0530, Viresh Kumar wrote:
On 5 May 2015 at 16:27, Mark Brown broonie@kernel.org wrote:
No, it doesn't - you're not answering the question about what this is for.
I don't know how this information will be used finally. Probably the platform driver will do the configuration based on the volt-cur pair.
I don't think we should be defining a binding without understanding what it is supposed to mean.
- opp-microamp: current in micro Amperes. It can contain entries for multiple
- regulators.
- regulators. This can be referenced (along with voltage and freqency) while
- programming the regulator.
This still doesn't say what the value is supposed to mean which means it's not possible for someone to write generic code which handles the binding - saying that people can reference the value doesn't tell them how to interpret it. A limit? A nominal constant draw value? A value to regulate to? Those are all different things expressed as a current.
On 5 May 2015 at 22:42, Mark Brown broonie@kernel.org wrote:
This still doesn't say what the value is supposed to mean which means it's not possible for someone to write generic code which handles the binding - saying that people can reference the value doesn't tell them how to interpret it. A limit? A nominal constant draw value? A value to regulate to? Those are all different things expressed as a current.
@Stephen: Can you please answer these queries please, so that we can get the bindings correctly ?
On 05/06, Viresh Kumar wrote:
On 5 May 2015 at 22:42, Mark Brown broonie@kernel.org wrote:
This still doesn't say what the value is supposed to mean which means it's not possible for someone to write generic code which handles the binding - saying that people can reference the value doesn't tell them how to interpret it. A limit? A nominal constant draw value? A value to regulate to? Those are all different things expressed as a current.
@Stephen: Can you please answer these queries please, so that we can get the bindings correctly ?
If you look at the cpufreq/clock/pmic code on our codeaurora.org tree you'll see that it's used to pass a value with uA units through the regulator_set_optimum_mode() API. The call to regulator_set_optimum_mode() is here[1], and the place where we parse the OPP table from DT is here[2]. My understanding is that with recent changes to the regulator framework, we would do a s/regulator_set_optimum_mode/regulator_set_load/ and things would otherwise be the same on the consumer side. On the regulator side, we would move the .get_optimum_mode op to .set_load instead of the hack where we write to the hardware in .get_optimum_mode[3].
So does that mean it corresponds to a limit, or a nominal constant draw value, or a value to regulator to? From what I can tell it's used to figure out how many phases to enable[4]. I suspect that means it's being used to figure out what value it should regulate to. In newer PMICs this is all done automatically, but at least for some of the PMICs we need to do it manually.
[1] https://www.codeaurora.org/cgit/quic/la/kernel/msm-3.10/tree/drivers/clk/qco... [2] https://www.codeaurora.org/cgit/quic/la/kernel/msm-3.10/tree/drivers/clk/qco... [3] https://www.codeaurora.org/cgit/quic/la/kernel/msm-3.10/tree/arch/arm/mach-m... [4] https://www.codeaurora.org/cgit/quic/la/kernel/msm-3.10/tree/arch/arm/mach-m...
On Wed, May 06, 2015 at 10:52:31PM -0700, Stephen Boyd wrote:
If you look at the cpufreq/clock/pmic code on our codeaurora.org tree you'll see that it's used to pass a value with uA units through the regulator_set_optimum_mode() API. The call to regulator_set_optimum_mode() is here[1], and the place where we parse the OPP table from DT is here[2]. My understanding is that
I'm not looking for anyone to explain this to me in e-mail, what I'm looking for is for the binding document to be clear so someone can tell what the binding means by reading the documentation for the binding.
On 05/07, Mark Brown wrote:
On Wed, May 06, 2015 at 10:52:31PM -0700, Stephen Boyd wrote:
If you look at the cpufreq/clock/pmic code on our codeaurora.org tree you'll see that it's used to pass a value with uA units through the regulator_set_optimum_mode() API. The call to regulator_set_optimum_mode() is here[1], and the place where we parse the OPP table from DT is here[2]. My understanding is that
I'm not looking for anyone to explain this to me in e-mail, what I'm looking for is for the binding document to be clear so someone can tell what the binding means by reading the documentation for the binding.
Ok. Perhaps the simplest thing to do then is to reuse wording from the regulator_set_load() API documentation? That's the only usage of this value I'm aware of. Something like:
The current load of the device when using this OPP. Used to set the most efficient regulator operating mode.
We don't need any sort of min/max for this property either, so a single value should be all that's required.
On Thu, May 07, 2015 at 02:18:55PM -0700, Stephen Boyd wrote:
On 05/07, Mark Brown wrote:
I'm not looking for anyone to explain this to me in e-mail, what I'm looking for is for the binding document to be clear so someone can tell what the binding means by reading the documentation for the binding.
Ok. Perhaps the simplest thing to do then is to reuse wording from the regulator_set_load() API documentation? That's the only usage of this value I'm aware of. Something like:
The current load of the device when using this OPP. Used to set the most efficient regulator operating mode.
We don't need any sort of min/max for this property either, so a single value should be all that's required.
That makes sense to me. Perhaps "current drawn by the device" rather than "current load of the device" since current is sadly overloaded :/
On 8 May 2015 at 03:48, Mark Brown broonie@kernel.org wrote:
That makes sense to me. Perhaps "current drawn by the device" rather than "current load of the device" since current is sadly overloaded :/
Thanks Mark. I have reworded it as:
- opp-microamp: Current drawn by the device in micro Amperes. It is used to set the most efficient regulator operating mode.
Should only be set if opp-microvolt is set for the OPP.
Entries for multiple regulators must be present in the same order as regulators are specified in device's DT node. If this property isn't required for few regulators, then this should be marked as zero for them. If it isn't required for any regulator, then this property need not be present.
Please let me know if this looks fine now.
On Fri, May 08, 2015 at 12:17:44PM +0530, Viresh Kumar wrote:
Thanks Mark. I have reworded it as:
opp-microamp: Current drawn by the device in micro Amperes. It is used to set the most efficient regulator operating mode.
Should only be set if opp-microvolt is set for the OPP.
Entries for multiple regulators must be present in the same order as regulators are specified in device's DT node. If this property isn't required for few regulators, then this should be marked as zero for them. If it isn't required for any regulator, then this property need not be present.
Please let me know if this looks fine now.
That should be fine, microamperes is usually one word.
On 8 May 2015 at 16:28, Mark Brown broonie@kernel.org wrote:
That should be fine, microamperes is usually one word.
Will fix that. Thanks a lot Mark.
On 05/08/2015 01:47 AM, Viresh Kumar wrote:
On 8 May 2015 at 03:48, Mark Brown broonie@kernel.org wrote:
That makes sense to me. Perhaps "current drawn by the device" rather than "current load of the device" since current is sadly overloaded :/
Thanks Mark. I have reworded it as:
- opp-microamp: Current drawn by the device in micro Amperes. It is used to set the most efficient regulator operating mode.
just one minor concern being in the SoC end of the world :). In most times, the current consumption for a specific OPP varies depending on the specific location in the process node the die is -> this is even true among a single lot of wafers as well. some SoC vendors use hot, nominal and cold terminology to indicate the characteristics of the specific sample.
it might help state which sample end of the spectrum we are talking about here. reason being: if I put in values based on my board measurement, the results may not be similar to what someone else's sample be. Depending on technology, speed binning strategy used by the vendor, temperature and few other characteristics, these numbers could be widely divergent.
Should only be set if opp-microvolt is set for the OPP.
Entries for multiple regulators must be present in the same order as regulators are specified in device's DT node. If this property isn't required for few regulators, then this should be marked as zero for them. If it isn't required for any regulator, then this property need not be present.
Please let me know if this looks fine now.
On 10-05-15, 20:07, Nishanth Menon wrote:
just one minor concern being in the SoC end of the world :). In most times, the current consumption for a specific OPP varies depending on the specific location in the process node the die is -> this is even true among a single lot of wafers as well. some SoC vendors use hot, nominal and cold terminology to indicate the characteristics of the specific sample.
it might help state which sample end of the spectrum we are talking about here. reason being: if I put in values based on my board measurement, the results may not be similar to what someone else's sample be. Depending on technology, speed binning strategy used by the vendor, temperature and few other characteristics, these numbers could be widely divergent.
I don't have any clue about this.. :(
@Mark/stephen: Any inputs ?
Quoting Viresh Kumar (2015-05-11 22:20:33)
On 10-05-15, 20:07, Nishanth Menon wrote:
just one minor concern being in the SoC end of the world :). In most times, the current consumption for a specific OPP varies depending on the specific location in the process node the die is -> this is even true among a single lot of wafers as well. some SoC vendors use hot, nominal and cold terminology to indicate the characteristics of the specific sample.
it might help state which sample end of the spectrum we are talking about here. reason being: if I put in values based on my board measurement, the results may not be similar to what someone else's sample be. Depending on technology, speed binning strategy used by the vendor, temperature and few other characteristics, these numbers could be widely divergent.
I don't have any clue about this.. :(
@Mark/stephen: Any inputs ?
Nishanth,
I do not think the idea of the mA property is to perfectly model current consumption at a given opp. Instead it is a nominal value that may be useful, e.g. for configuring a regulator in Stephen's case.
The TWLxxxx series of PMICs from TI have configurable SMPS which could possibly benefit from this info as well. Most of the time those are left in an "automatic mode", but there are manual programming steps to achieve higher efficiencies and this property could potentially help you do that.
Regards, Mike
Mike, On 05/12/2015 02:01 PM, Michael Turquette wrote:
Quoting Viresh Kumar (2015-05-11 22:20:33)
On 10-05-15, 20:07, Nishanth Menon wrote:
just one minor concern being in the SoC end of the world :). In most times, the current consumption for a specific OPP varies depending on the specific location in the process node the die is -> this is even true among a single lot of wafers as well. some SoC vendors use hot, nominal and cold terminology to indicate the characteristics of the specific sample.
it might help state which sample end of the spectrum we are talking about here. reason being: if I put in values based on my board measurement, the results may not be similar to what someone else's sample be. Depending on technology, speed binning strategy used by the vendor, temperature and few other characteristics, these numbers could be widely divergent.
I don't have any clue about this.. :(
@Mark/stephen: Any inputs ?
I do not think the idea of the mA property is to perfectly model current consumption at a given opp. Instead it is a nominal value that may be useful, e.g. for configuring a regulator in Stephen's case.
The TWLxxxx series of PMICs from TI have configurable SMPS which could possibly benefit from this info as well. Most of the time those are left in an "automatic mode", but there are manual programming steps to achieve higher efficiencies and this property could potentially help you do that.
While TWLxx series was kind of nascent in it's ability of choosing PWM/PFM or auto mode depending on the current targets, newer PMICs have their own unique techniques -> but, that said, this is a description of power consumption for a given OPP for the "device", How would stephen's case work with a PMIC and 2 devices which have different leakage characteristics (based on which end of the process spectrum they come from), Lets take an example: device X consumes 800mA for OPPx device Y consumes 900mA for OPPy
Taking the simpler example of TWLxxx, the PMIC switches to PWM mode at 850mA for efficiency reasons, below 850mA, it is better to operate for accuracy and efficiency reasons at PFM mode.
by putting 800mA we break efficiency on device Y, and if we choose to put 900mA or 850mA, we break efficiency on device X.
It is a lot more impactful than using relative numbers for other purposes - for example energy aware scheduling as an example -> here the actuals might have better optimization, but hints of relative power numbers by itself is pretty powerful information to help scheduling. The usage, in this case, unlike the usage for a PMIC efficiency selection, is not based on absolutes and is meant more of a hint (closer to usage such as clock transition latency numbers).
This is the reason I suggested to have a clear guidance in the bindings, since we'd very much like bindings to be ABI.
On Tue, May 12, 2015 at 02:14:19PM -0500, Nishanth Menon wrote:
While TWLxx series was kind of nascent in it's ability of choosing PWM/PFM or auto mode depending on the current targets, newer PMICs have their own unique techniques -> but, that said, this is a description of power consumption for a given OPP for the "device", How would stephen's case work with a PMIC and 2 devices which have different leakage characteristics (based on which end of the process spectrum they come from), Lets take an example: device X consumes 800mA for OPPx device Y consumes 900mA for OPPy
The system integrator would need to be somewhat conservative when specifying the currents involved. For plausible applications these are likely to be ballpark figures rather than anything too accurate - if nothing else the instantaneous current draw normally varies very substantially so realistically you're talking about a maximum here. If the corners vary that dramatically then I'd expect you'd see different OPP tables being used anyway.
It is a lot more impactful than using relative numbers for other purposes - for example energy aware scheduling as an example -> here the actuals might have better optimization, but hints of relative power numbers by itself is pretty powerful information to help scheduling. The usage, in this case, unlike the usage for a PMIC efficiency selection, is not based on absolutes and is meant more of a hint (closer to usage such as clock transition latency numbers).
You're not comparing two similar types of object here, you're trying to provide information on an actual physical value to get fed into other actual physical values.
On 05/12/2015 02:41 PM, Mark Brown wrote:
On Tue, May 12, 2015 at 02:14:19PM -0500, Nishanth Menon wrote:
While TWLxx series was kind of nascent in it's ability of choosing PWM/PFM or auto mode depending on the current targets, newer PMICs have their own unique techniques -> but, that said, this is a description of power consumption for a given OPP for the "device", How would stephen's case work with a PMIC and 2 devices which have different leakage characteristics (based on which end of the process spectrum they come from), Lets take an example: device X consumes 800mA for OPPx device Y consumes 900mA for OPPy
uggh... should have been OPPx here.
The system integrator would need to be somewhat conservative when specifying the currents involved. For plausible applications these are likely to be ballpark figures rather than anything too accurate - if nothing else the instantaneous current draw normally varies very substantially so realistically you're talking about a maximum here. If the corners vary that dramatically then I'd expect you'd see different OPP tables being used anyway.
OK - If we state "worst case", then it is quantifiable (if SoC vendors would like to expose such information - I doubt mine ever will ;) ).
- We might be able to quantify it better by stating worst case(under maximum load) steady state current (to avoid including transient spikes which are never representative) at ambient temperature(25C).
Current draw for a given OPP across temperature ranges are substantial enough even for a given chip - most of us in SoC PM world have already seen enough characterization graphs across process and temperature to realize that.
It is a lot more impactful than using relative numbers for other purposes - for example energy aware scheduling as an example -> here the actuals might have better optimization, but hints of relative power numbers by itself is pretty powerful information to help scheduling. The usage, in this case, unlike the usage for a PMIC efficiency selection, is not based on absolutes and is meant more of a hint (closer to usage such as clock transition latency numbers).
You're not comparing two similar types of object here, you're trying to provide information on an actual physical value to get fed into other actual physical values.
True - I was trying to highlight how the same "consumption data" could be interpreted differently. Hence my request for more clarity on the description. By describing the specificity of current consumption, the binding should ideally prevent mis-interpretation in usage elsewhere.
On Tue, May 12, 2015 at 02:57:29PM -0500, Nishanth Menon wrote:
On 05/12/2015 02:41 PM, Mark Brown wrote:
On Tue, May 12, 2015 at 02:14:19PM -0500, Nishanth Menon wrote:
specifying the currents involved. For plausible applications these are likely to be ballpark figures rather than anything too accurate - if nothing else the instantaneous current draw normally varies very substantially so realistically you're talking about a maximum here. If the corners vary that dramatically then I'd expect you'd see different OPP tables being used anyway.
OK - If we state "worst case", then it is quantifiable (if SoC vendors would like to expose such information - I doubt mine ever will ;) ).
- We might be able to quantify it better by stating worst case(under
maximum load) steady state current (to avoid including transient spikes which are never representative) at ambient temperature(25C).
Qualcomm do. It kind of depends on the system how worst case it needs to be - it'll depend on the expected performance of both the system and potentially the regulator. For some systems it may be important to have some accounting for transients. Equally systems would only need to be as accurate as the underlying hardware was able to make use of.
On Wed, May 13, 2015 at 6:54 AM, Mark Brown broonie@kernel.org wrote:
On Tue, May 12, 2015 at 02:57:29PM -0500, Nishanth Menon wrote:
On 05/12/2015 02:41 PM, Mark Brown wrote:
On Tue, May 12, 2015 at 02:14:19PM -0500, Nishanth Menon wrote:
specifying the currents involved. For plausible applications these are likely to be ballpark figures rather than anything too accurate - if nothing else the instantaneous current draw normally varies very substantially so realistically you're talking about a maximum here. If the corners vary that dramatically then I'd expect you'd see different OPP tables being used anyway.
OK - If we state "worst case", then it is quantifiable (if SoC vendors would like to expose such information - I doubt mine ever will ;) ).
- We might be able to quantify it better by stating worst case(under
maximum load) steady state current (to avoid including transient spikes which are never representative) at ambient temperature(25C).
Qualcomm do. It kind of depends on the system how worst case it needs to be - it'll depend on the expected performance of both the system and potentially the regulator. For some systems it may be important to have some accounting for transients. Equally systems would only need to be as accurate as the underlying hardware was able to make use of.
True, some parts of the system like choice in regulator sizing has to account for transients.
How about rephrasing as following:
- opp-microamp: The maximum current drawn by the device in microamperes considering system specific parameters (such as transients, process, aging, etc.) as necessary, at an ambient temperature of 25C. This may be used to set the most efficient regulator operating mode.
Valid only if opp-microvolt is set for the OPP.
Entries for multiple regulators must be present in the same order as regulators are specified in device's DT node. If this property isn't required for few regulators, then this should be marked as zero for them. If it isn't required for any regulator, then this property need not be present.
--- Regards, Nishanth Menon
On Wed, May 13, 2015 at 09:24:57AM -0500, Nishanth Menon wrote:
- opp-microamp: The maximum current drawn by the device in microamperes considering system specific parameters (such as transients, process,
aging, etc.) as necessary, at an ambient temperature of 25C. This may be used to set the most efficient regulator operating mode.
Why specify the temperature? That 25C is going to be terribly inefficient for my snowplow engine control system (or whatever).
On 05/13/2015 10:07 AM, Mark Brown wrote:
On Wed, May 13, 2015 at 09:24:57AM -0500, Nishanth Menon wrote:
- opp-microamp: The maximum current drawn by the device in microamperes considering system specific parameters (such as transients, process,
aging, etc.) as necessary, at an ambient temperature of 25C. This may be used to set the most efficient regulator operating mode.
Why specify the temperature? That 25C is going to be terribly inefficient for my snowplow engine control system (or whatever).
:) was trying to put a constraint since we do know that leakage changes over temperature profile and the fact that operating temperature ranges vary depending on device and market. rephrased again below:
- opp-microamp: The maximum current drawn by the device in microamperes considering system specific parameters (such as transients, process, aging, maximum operating temperature range etc.) as necessary. This may be used to set the most efficient regulator operating mode.
On 7 May 2015 at 11:22, Stephen Boyd sboyd@codeaurora.org wrote:
If you look at the cpufreq/clock/pmic code on our codeaurora.org tree you'll see that it's used to pass a value with uA units through the regulator_set_optimum_mode() API. The call to regulator_set_optimum_mode() is here[1], and the place where we parse the OPP table from DT is here[2]. My understanding is that with recent changes to the regulator framework, we would do a s/regulator_set_optimum_mode/regulator_set_load/ and things would otherwise be the same on the consumer side. On the regulator side, we would move the .get_optimum_mode op to .set_load instead of the hack where we write to the hardware in .get_optimum_mode[3].
So does that mean it corresponds to a limit, or a nominal constant draw value, or a value to regulator to? From what I can tell it's used to figure out how many phases to enable[4]. I suspect that means it's being used to figure out what value it should regulate to. In newer PMICs this is all done automatically, but at least for some of the PMICs we need to do it manually.
By current phases, you probably meant [1], right ?
Also what I couldn't understand is how are you controlling the current here?
- Is this a current only regulator ? Probably not as your code has a uv value as well..
- Now if uV is fixed, how is the target current controlled separately ?
- Should we have a min/max/target value of this current ? Like what we are doing for voltage. Or is this just the 'target' current ?
- And finally how exactly should we write this in bindings to make it clear to all users ?
Thanks.
-- viresh
On 05/07, Viresh Kumar wrote:
On 7 May 2015 at 11:22, Stephen Boyd sboyd@codeaurora.org wrote:
If you look at the cpufreq/clock/pmic code on our codeaurora.org tree you'll see that it's used to pass a value with uA units through the regulator_set_optimum_mode() API. The call to regulator_set_optimum_mode() is here[1], and the place where we parse the OPP table from DT is here[2]. My understanding is that with recent changes to the regulator framework, we would do a s/regulator_set_optimum_mode/regulator_set_load/ and things would otherwise be the same on the consumer side. On the regulator side, we would move the .get_optimum_mode op to .set_load instead of the hack where we write to the hardware in .get_optimum_mode[3].
So does that mean it corresponds to a limit, or a nominal constant draw value, or a value to regulator to? From what I can tell it's used to figure out how many phases to enable[4]. I suspect that means it's being used to figure out what value it should regulate to. In newer PMICs this is all done automatically, but at least for some of the PMICs we need to do it manually.
By current phases, you probably meant [1], right ?
Also what I couldn't understand is how are you controlling the current here?
- Is this a current only regulator ? Probably not as your code has
a uv value as well..
No.
- Now if uV is fixed, how is the target current controlled separately ?
uV is not fixed
- Should we have a min/max/target value of this current ? Like what
we are doing for voltage. Or is this just the 'target' current ?
I don't think so. It's just about setting the load through the regulator_set_load() API when this OPP is used.
- And finally how exactly should we write this in bindings to make
it clear to all users ?
Let's take this up on my reply to Mark.
On 8 May 2015 at 03:00, Stephen Boyd sboyd@codeaurora.org wrote:
On 05/07, Viresh Kumar wrote:
- Now if uV is fixed, how is the target current controlled separately ?
uV is not fixed
I meant uV is fixed for this particular OPP, i.e. min/tar/max value.
What about the other changes? Does these fix the use cases you shared earlier ? I will send a new updated version then..
On 04/30/2015 07:07 AM, Viresh Kumar wrote:
Current OPP (Operating performance point) DT bindings are proven to be insufficient at multiple instances.
The shortcomings we are trying to solve here:
Getting clock sharing information between CPUs. Single shared clock vs independent clock per core vs shared clock per cluster.
Support for specifying current levels along with voltages.
Support for multiple regulators.
Support for turbo modes.
Other per OPP settings: transition latencies, disabled status, etc.?
Expandability of OPPs in future.
This patch introduces new bindings "operating-points-v2" to get these problems solved. Refer to the bindings for more details.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
Documentation/devicetree/bindings/power/opp.txt | 366 +++++++++++++++++++++++- 1 file changed, 362 insertions(+), 4 deletions(-)
diff --git a/Documentation/devicetree/bindings/power/opp.txt b/Documentation/devicetree/bindings/power/opp.txt index 74499e5033fc..3b67a5c8d965 100644 --- a/Documentation/devicetree/bindings/power/opp.txt +++ b/Documentation/devicetree/bindings/power/opp.txt @@ -1,8 +1,366 @@ -* Generic OPP Interface +Generic OPP (Operating Performance Points) Bindings +---------------------------------------------------- -SoCs have a standard set of tuples consisting of frequency and -voltage pairs that the device will support per voltage domain. These -are called Operating Performance Points or OPPs. +Devices work at voltage-current-frequency triplets and some implementations have +the liberty of choosing these. These triplets are called Operating Performance
I suggest we dont call them as triplets either -> note, we have added clk transition latency per OPP, so they are not exactly triplets either.
+Points aka OPPs. This document defines bindings for these OPPs applicable across +wide range of devices. For illustration purpose, this document uses CPU as a +device.
Would be good to point to operating-points (the legacy document) as well. It might be good to state that only one of the properties should be used for the device node OPP description.
+* Property: operating-points-v2
+Devices supporting OPPs must set their "operating-points-v2" property with +phandle to a OPP descriptor in their DT node. The OPP core will use this phandle +to find the operating points for the device.
+* OPP Descriptor Node
+This describes the OPPs belonging to a device. This node can have following +properties:
+Required properties: +- compatible: Allow OPPs to express their compatibility. It should be:
- "operating-points-v2".
+- OPP nodes: One or more OPP nodes describing voltage-current-frequency
- triplets. Their name isn't significant but their phandle can be used to
we might want to use something generic instead of "triplets" here. -> considering that voltage, current are optional properties, we may not even need to describe a triplet.
- reference an OPP.
+Optional properties: +- shared-opp: Indicates that device nodes using this OPP descriptor's phandle
- switch their DVFS state together, i.e. they share clock/voltage/current lines.
- Missing property means devices have independent clock/voltage/current lines,
- but they share OPP tables.
Is'nt this property redundant? by the fact that phandle is reused for cpu1 should indicate shared OPP table, correct?
+* OPP Node
+This defines voltage-current-frequency triplets along with other related +properties.
+Required properties: +- opp-khz: Frequency in kHz
+Optional properties: +- opp-microvolt: voltage in micro Volts. It can contain entries for multiple
- regulators.
- A single regulator's voltage is specified with an array of size one or three.
- Single entry is for target voltage and three entries are for <target min max>
- voltages.
- Entries for multiple regulators must be present in the same order as
- regulators are specified in device's DT node.
+- opp-microamp: current in micro Amperes. It can contain entries for multiple
- regulators.
- A single regulator's current is specified with an array of size one or three.
- Single entry is for target current and three entries are for <target min max>
- currents.
- Entries for multiple regulators must be present in the same order as
- regulators are specified in device's DT node. If few regulators don't provide
- capability to configure current, then values for then should be marked as
- zero.
+- clock-latency-ns: Specifies the maximum possible transition latency (in
- nanoseconds) for switching to this OPP from any other OPP.
+- turbo-mode: Marks the OPP to be used only for turbo modes.
Might be great to add some description for what "turbo mode" means here.
That said, flexibility of this approach is awesome, thanks for doing this, I believe we did have a discussion about "safe OPP" for thermal behavior -> now that we have phandles for OPPs, we can give phandle pointer to the thermal framework. in addition, we can also use phandle for marking throttling information in thermal framework instead of indices as well. which allows a lot of flexibility.
in some systems, we'd have need to mark certain OPPs for entry into suspend(tegra?) or during shutdown (for example) - do we allow for such description as phandle pointer back to the OPP nodes (from cpu0 device) OR do we describe them as additional properties?
+- status: Marks the node enabled/disabled.
One question we'd probably want the new framework to address is the need for OPPs based on Efuse options - Am I correct in believing that the solution that iMX, and TI SoCs probably need is something similar to modifying the status to disabled? or do we have Vendor specfic modifier properties that would be allowed?
One additional behavior need I noticed in various old threads is a need to restrict transition OPPs -> certain SoCs might have constraints on next OPP they can transition from certain OPPs in-order to achieve a new OPP. again, such behavior could be phandle lists OR properties that link the chain up together. Number of such needs did sound less, but still just thought to bring it up.
+Example 1: Single cluster Dual-core ARM cortex A9, switch DVFS states together.
+/ {
- cpus {
#address-cells = <1>;
#size-cells = <0>;
cpu@0 {
compatible = "arm,cortex-a9";
reg = <0>;
next-level-cache = <&L2>;
clocks = <&clk_controller 0>;
clock-names = "cpu";
opp-supply = <&cpu_supply0>;
operating-points-v2 = <&cpu0_opp>;
};
cpu@1 {
compatible = "arm,cortex-a9";
reg = <1>;
next-level-cache = <&L2>;
clocks = <&clk_controller 0>;
clock-names = "cpu";
opp-supply = <&cpu_supply0>;
operating-points-v2 = <&cpu0_opp>;
};
- };
- cpu0_opp: opp0 {
compatible = "operating-points-v2";
shared-opp;
entry00 {
opp-khz = <1000000>;
opp-microvolt = <970000 975000 985000>;
opp-microamp = <70000 75000 85000>;
clock-latency-ns = <300000>;
};
entry01 {
opp-khz = <1100000>;
opp-microvolt = <980000 1000000 1010000>;
opp-microamp = <80000 81000 82000>;
clock-latency-ns = <310000>;
};
entry02 {
opp-khz = <1200000>;
opp-microvolt = <1025000>;
clock-latency-ns = <290000>;
turbo-mode;
};
- };
+};
+Example 2: Single cluster, Quad-core Qualcom-krait, switches DVFS states +independently.
+/ {
- cpus {
#address-cells = <1>;
#size-cells = <0>;
cpu@0 {
compatible = "qcom,krait";
reg = <0>;
next-level-cache = <&L2>;
clocks = <&clk_controller 0>;
clock-names = "cpu";
opp-supply = <&cpu_supply0>;
operating-points-v2 = <&cpu0_opp>;
};
cpu@1 {
compatible = "qcom,krait";
reg = <1>;
next-level-cache = <&L2>;
clocks = <&clk_controller 1>;
clock-names = "cpu";
opp-supply = <&cpu_supply1>;
operating-points-v2 = <&cpu0_opp>;
};
cpu@2 {
compatible = "qcom,krait";
reg = <2>;
next-level-cache = <&L2>;
clocks = <&clk_controller 2>;
clock-names = "cpu";
opp-supply = <&cpu_supply2>;
operating-points-v2 = <&cpu0_opp>;
};
cpu@3 {
compatible = "qcom,krait";
reg = <3>;
next-level-cache = <&L2>;
clocks = <&clk_controller 3>;
clock-names = "cpu";
opp-supply = <&cpu_supply3>;
operating-points-v2 = <&cpu0_opp>;
};
- };
- cpu0_opp: opp0 {
compatible = "operating-points-v2";
/*
* Missing shared-opp property means CPUs switch DVFS states
* independently.
*/
entry00 {
opp-khz = <1000000>;
opp-microvolt = <970000 975000 985000>;
opp-microamp = <70000 75000 85000>;
clock-latency-ns = <300000>;
};
entry01 {
opp-khz = <1100000>;
opp-microvolt = <980000 1000000 1010000>;
opp-microamp = <80000 81000 82000>;
clock-latency-ns = <310000>;
};
entry02 {
opp-khz = <1200000>;
opp-microvolt = <1025000>;
opp-microamp = <90000;
lock-latency-ns = <290000>;
turbo-mode;
};
- };
+};
+Example 3: Dual-cluster, Dual-core per cluster. CPUs within a cluster switch +DVFS state together.
+/ {
- cpus {
#address-cells = <1>;
#size-cells = <0>;
cpu@0 {
compatible = "arm,cortex-a7";
reg = <0>;
next-level-cache = <&L2>;
clocks = <&clk_controller 0>;
clock-names = "cpu";
opp-supply = <&cpu_supply0>;
operating-points-v2 = <&cluster0_opp>;
};
cpu@1 {
compatible = "arm,cortex-a7";
reg = <1>;
next-level-cache = <&L2>;
clocks = <&clk_controller 0>;
clock-names = "cpu";
opp-supply = <&cpu_supply0>;
operating-points-v2 = <&cluster0_opp>;
};
cpu@100 {
compatible = "arm,cortex-a15";
reg = <100>;
next-level-cache = <&L2>;
clocks = <&clk_controller 1>;
clock-names = "cpu";
opp-supply = <&cpu_supply1>;
operating-points-v2 = <&cluster1_opp>;
};
cpu@101 {
compatible = "arm,cortex-a15";
reg = <101>;
next-level-cache = <&L2>;
clocks = <&clk_controller 1>;
clock-names = "cpu";
opp-supply = <&cpu_supply1>;
operating-points-v2 = <&cluster1_opp>;
};
- };
- cluster0_opp: opp0 {
compatible = "operating-points-v2";
shared-opp;
entry00 {
opp-khz = <1000000>;
opp-microvolt = <970000 975000 985000>;
opp-microamp = <70000 75000 85000>;
clock-latency-ns = <300000>;
};
entry01 {
opp-khz = <1100000>;
opp-microvolt = <980000 1000000 1010000>;
opp-microamp = <80000 81000 82000>;
clock-latency-ns = <310000>;
};
entry02 {
opp-khz = <1200000>;
opp-microvolt = <1025000>;
opp-microamp = <90000>;
clock-latency-ns = <290000>;
turbo-mode;
};
- };
- cluster1_opp: opp1 {
compatible = "operating-points-v2";
shared-opp;
entry10 {
opp-khz = <1300000>;
opp-microvolt = <1045000 1050000 1055000>;
opp-microamp = <95000 100000 105000>;
clock-latency-ns = <400000>;
};
entry11 {
opp-khz = <1400000>;
opp-microvolt = <1075000>;
opp-microamp = <100000>;
clock-latency-ns = <400000>;
};
entry12 {
opp-khz = <1500000>;
opp-microvolt = <1010000 1100000 1110000>;
opp-microamp = <95000 100000 105000>;
clock-latency-ns = <400000>;
turbo-mode;
};
- };
+};
+Example 4: Handling multiple regulators
+/ {
- cpus {
cpu@0 {
compatible = "arm,cortex-a7";
...
opp-supply = <&cpu_supply0>, <&cpu_supply1>, <&cpu_supply2>;
operating-points-v2 = <&cpu0_opp>;
};
- };
- cpu0_opp: opp0 {
compatible = "operating-points-v2";
shared-opp;
entry00 {
opp-khz = <1000000>;
opp-microvolt = <970000>, /* Supply 0 */
<960000>, /* Supply 1 */
<960000>; /* Supply 2 */
opp-microamp = <70000>, /* Supply 0 */
<70000>, /* Supply 1 */
<70000>; /* Supply 2 */
clock-latency-ns = <300000>;
};
/* OR */
entry00 {
opp-khz = <1000000>;
opp-microvolt = <970000 975000 985000>, /* Supply 0 */
<960000 965000 975000>, /* Supply 1 */
<960000 965000 975000>; /* Supply 2 */
opp-microamp = <70000 75000 85000>, /* Supply 0 */
<70000 75000 85000>, /* Supply 1 */
<70000 75000 85000>; /* Supply 2 */
clock-latency-ns = <300000>;
};
/* OR */
entry00 {
opp-khz = <1000000>;
opp-microvolt = <970000 975000 985000>, /* Supply 0 */
<960000 965000 975000>, /* Supply 1 */
<960000 965000 975000>; /* Supply 2 */
opp-microamp = <70000 75000 85000>, /* Supply 0 */
<0 0 0>, /* Supply 1 doesn't support current change */
<70000 75000 85000>; /* Supply 2 */
clock-latency-ns = <300000>;
};
- };
+};
+Deprecated Bindings +------------------- Properties:
- operating-points: An array of 2-tuples items, and each item consists
Hi Nishanth,
On 10-05-15, 20:02, Nishanth Menon wrote:
On 04/30/2015 07:07 AM, Viresh Kumar wrote:
+the liberty of choosing these. These triplets are called Operating Performance
I suggest we dont call them as triplets either -> note, we have added clk transition latency per OPP, so they are not exactly triplets either.
Sure.
+Points aka OPPs. This document defines bindings for these OPPs applicable across +wide range of devices. For illustration purpose, this document uses CPU as a +device.
Would be good to point to operating-points (the legacy document) as well. It might be good to state that only one of the properties should be used for the device node OPP description.
Hmm, okay.
+Optional properties: +- shared-opp: Indicates that device nodes using this OPP descriptor's phandle
- switch their DVFS state together, i.e. they share clock/voltage/current lines.
- Missing property means devices have independent clock/voltage/current lines,
- but they share OPP tables.
Is'nt this property redundant? by the fact that phandle is reused for cpu1 should indicate shared OPP table, correct?
There are two things we can share: - OPP tables (And not the clock lines themselves). For example, consider a quad-core platform with independent clock/voltage lines for all CPUs but all the CPUs have same OPP table.
- Clock/voltage rails: This is described by the above property. I thought the examples should have made this clear, anything left there ?
+- turbo-mode: Marks the OPP to be used only for turbo modes.
Might be great to add some description for what "turbo mode" means here.
Okay.
That said, flexibility of this approach is awesome, thanks for doing this, I believe we did have a discussion about "safe OPP" for thermal behavior -> now that we have phandles for OPPs, we can give phandle pointer to the thermal framework. in addition, we can also use phandle for marking throttling information in thermal framework instead of indices as well. which allows a lot of flexibility.
in some systems, we'd have need to mark certain OPPs for entry into suspend(tegra?) or during shutdown (for example) - do we allow for such description as phandle pointer back to the OPP nodes (from cpu0 device) OR do we describe them as additional properties?
Haven't thought about it earlier. In case we need them, probably it should go in the OPP table.
NOTE: Mike T. once told me that we shouldn't over-abuse the bindings by putting every detail there. And I am not sure at all on how to draw that line.
+- status: Marks the node enabled/disabled.
One question we'd probably want the new framework to address is the need for OPPs based on Efuse options - Am I correct in believing that the solution that iMX, and TI SoCs probably need is something similar to modifying the status to disabled? or do we have Vendor specfic modifier properties that would be allowed?
See PATCH 2/3 for that.
One additional behavior need I noticed in various old threads is a need to restrict transition OPPs -> certain SoCs might have constraints on next OPP they can transition from certain OPPs in-order to achieve a new OPP. again, such behavior could be phandle lists OR properties that link the chain up together. Number of such needs did sound less, but still just thought to bring it up.
See 0/3 and 3/3 for that. Its present in my solution but Mike T. is strictly against keeping that in OPP table. And so 3/3 may get Nak'd.
Thanks a lot for your reviews.
Hi Viresh, On 05/12/2015 12:16 AM, Viresh Kumar wrote: [..]
On 10-05-15, 20:02, Nishanth Menon wrote:
On 04/30/2015 07:07 AM, Viresh Kumar wrote:
One additional comment - after re-reading the bindings again..
+- opp-microvolt: voltage in micro Volts. It can contain entries for multiple
- regulators.
- A single regulator's voltage is specified with an array of size one or three.
- Single entry is for target voltage and three entries are for <target min max>
- voltages.
Just curious -> is'nt it better to just have min<->max range? binding as it stands right now is open to interpretation as to what will be attempted and in what sequence? with a valid min, target or max - is'nt it more power efficient always to go for a "min" than a target?
Further, min<->max implies anywhere in that range and is more consistent with "regulator like" description.
That said, flexibility of this approach is awesome, thanks for doing this, I believe we did have a discussion about "safe OPP" for thermal behavior -> now that we have phandles for OPPs, we can give phandle pointer to the thermal framework. in addition, we can also use phandle for marking throttling information in thermal framework instead of indices as well. which allows a lot of flexibility.
in some systems, we'd have need to mark certain OPPs for entry into suspend(tegra?) or during shutdown (for example) - do we allow for such description as phandle pointer back to the OPP nodes (from cpu0 device) OR do we describe them as additional properties?
Haven't thought about it earlier. In case we need them, probably it should go in the OPP table.
NOTE: Mike T. once told me that we shouldn't over-abuse the bindings by putting every detail there. And I am not sure at all on how to draw that line.
True. one option might be to allow for vendor specific property extensions - that will let vendors add in additional quirky data custom to their SoCs as needed.
+- status: Marks the node enabled/disabled.
One question we'd probably want the new framework to address is the need for OPPs based on Efuse options - Am I correct in believing that the solution that iMX, and TI SoCs probably need is something similar to modifying the status to disabled? or do we have Vendor specfic modifier properties that would be allowed?
See PATCH 2/3 for that.
Sorry about that. I had kind of expected all bindings to be a single patch :)..
One additional behavior need I noticed in various old threads is a need to restrict transition OPPs -> certain SoCs might have constraints on next OPP they can transition from certain OPPs in-order to achieve a new OPP. again, such behavior could be phandle lists OR properties that link the chain up together. Number of such needs did sound less, but still just thought to bring it up.
See 0/3 and 3/3 for that. Its present in my solution but Mike T. is strictly against keeping that in OPP table. And so 3/3 may get Nak'd.
missed this as well :(
On 12-05-15, 11:04, Nishanth Menon wrote:
Just curious -> is'nt it better to just have min<->max range? binding as it stands right now is open to interpretation as to what will be attempted and in what sequence? with a valid min, target or max - is'nt it more power efficient always to go for a "min" than a target?
Further, min<->max implies anywhere in that range and is more consistent with "regulator like" description.
It came out after some discussions on the list, you may want to go through that.
https://lists.linaro.org/pipermail/linaro-kernel/2015-January/019844.html
True. one option might be to allow for vendor specific property extensions - that will let vendors add in additional quirky data custom to their SoCs as needed.
Yeah, I am planning to support them.
On 05/13/2015 12:05 AM, Viresh Kumar wrote:
On 12-05-15, 11:04, Nishanth Menon wrote:
Just curious -> is'nt it better to just have min<->max range? binding as it stands right now is open to interpretation as to what will be attempted and in what sequence? with a valid min, target or max - is'nt it more power efficient always to go for a "min" than a target?
Further, min<->max implies anywhere in that range and is more consistent with "regulator like" description.
It came out after some discussions on the list, you may want to go through that.
https://lists.linaro.org/pipermail/linaro-kernel/2015-January/019844.html
I see the thread saying that voltage-tolerance is a crappy idea -> I agree to that.
What I dont see in the thread, and the point I raised here, why have nominal/typical voltage at all? min<->max should be sufficient, correct? If the device cannot function at min, then it should not be documented as part of valid range at all.
On Wed, May 13, 2015 at 10:00:41AM -0500, Nishanth Menon wrote:
What I dont see in the thread, and the point I raised here, why have nominal/typical voltage at all? min<->max should be sufficient, correct? If the device cannot function at min, then it should not be documented as part of valid range at all.
Going for the minimum specified voltage is asking for trouble with regard to tolerances and so on, see also your concern about process corners. If the electrical engineers specify things as X +/- Y the most conservative thing to do is to try to hit X rather than going straight for the limits.
On 05/13/2015 10:16 AM, Mark Brown wrote:
On Wed, May 13, 2015 at 10:00:41AM -0500, Nishanth Menon wrote:
What I dont see in the thread, and the point I raised here, why have nominal/typical voltage at all? min<->max should be sufficient, correct? If the device cannot function at min, then it should not be documented as part of valid range at all.
Going for the minimum specified voltage is asking for trouble with regard to tolerances and so on, see also your concern about process corners. If the electrical engineers specify things as X +/- Y the most conservative thing to do is to try to hit X rather than going straight for the limits.
I've had the same debate with my company's SoC designers as well on various occasions as well. At least for the SoCs I deal with, X +/- Y range specification involves PMIC/Board variations, where X is the least voltage that is attempted to be set on PMIC, X-Y is the min voltage allowed at the SoC ball, due to SMPS noise/IRDrop and other board specific behavior.
I am not saying all SoC vendors do specifications the same way, but our interest from device tree description is the operating voltage range for the device that we can control on the PMIC. if setting (X-Y) is not stable voltage for the device, then it should not be stated in the description.
To illustrate: What does it mean for a driver using regulator API? Attempt X <-> (X+Y), if that fails (example PMIC SMPS max < X), try (X-Y)<->(X)? If yes, then we do expect (X-Y)<->(X+Y) should be stable for all operating conditions for the device, correct? If this is is not stable at (X-Y), then we have a wrong specification for the device for claiming X +/- Y is a valid range for the device.
On Wed, May 13, 2015 at 11:14:07AM -0500, Nishanth Menon wrote:
To illustrate: What does it mean for a driver using regulator API? Attempt X <-> (X+Y), if that fails (example PMIC SMPS max < X), try (X-Y)<->(X)? If yes, then we do expect (X-Y)<->(X+Y) should be stable for all operating conditions for the device, correct? If this is is not stable at (X-Y), then we have a wrong specification for the device for claiming X +/- Y is a valid range for the device.
We have an explicit regulator API call for setting by tolerance which tries to get as close as possible to the target voltage (currently the implementation is very cheap and nasty but ideally it'll get improved).
On 05/13/2015 11:21 AM, Mark Brown wrote:
On Wed, May 13, 2015 at 11:14:07AM -0500, Nishanth Menon wrote:
To illustrate: What does it mean for a driver using regulator API? Attempt X <-> (X+Y), if that fails (example PMIC SMPS max < X), try (X-Y)<->(X)? If yes, then we do expect (X-Y)<->(X+Y) should be stable for all operating conditions for the device, correct? If this is is not stable at (X-Y), then we have a wrong specification for the device for claiming X +/- Y is a valid range for the device.
We have an explicit regulator API call for setting by tolerance which tries to get as close as possible to the target voltage (currently the implementation is very cheap and nasty but ideally it'll get improved).
Thanks - I think the original implementation was triggered by the old "voltage-tolerance" mess, so wont be too surprised.
With respect to the property itself, I still am not completely convinced that having typ actually helps. but having it does not hurt either, since min could be == typ in SoC description if desired.
So, since I dont strongly feel either way here to reject things completely, and supposedly, it does help some other SoC vendors(which I am not convinced how), will leave the discussion alone for the moment.
On Thu, Apr 30, 2015 at 05:37:59PM +0530, Viresh Kumar wrote:
Current OPP (Operating performance point) DT bindings are proven to be insufficient at multiple instances.
The shortcomings we are trying to solve here:
Getting clock sharing information between CPUs. Single shared clock vs independent clock per core vs shared clock per cluster.
Support for specifying current levels along with voltages.
Support for multiple regulators.
Support for turbo modes.
Other per OPP settings: transition latencies, disabled status, etc.?
Expandability of OPPs in future.
This patch introduces new bindings "operating-points-v2" to get these problems
operating-points-v2 is kinda awkward, how about operating-points-extended, much like interrupts-extended ?
Other than this, I very much like the new approach.
Regards
On 12-05-15, 11:19, Felipe Balbi wrote:
operating-points-v2 is kinda awkward, how about operating-points-extended, much like interrupts-extended ?
What about the next version if required ? :)
Other than this, I very much like the new approach.
Thanks.
Quoting Viresh Kumar (2015-04-30 05:07:59)
Current OPP (Operating performance point) DT bindings are proven to be insufficient at multiple instances.
Hi Viresh,
Sorry for chiming in so late in the process. Also, sorry for the long email. Lots of repetition below:
Why should this new binding exist? Is Devicetree really the right place to put all of this data? If DT is the right place for some users, some of the time ... is it always the right place for all users, all of the time?
The shortcomings we are trying to solve here:
Getting clock sharing information between CPUs. Single shared clock vs independent clock per core vs shared clock per cluster.
Support for specifying current levels along with voltages.
Support for multiple regulators.
Support for turbo modes.
Other per OPP settings: transition latencies, disabled status, etc.?
I agree that operating-points-v1 does not do this stuff. There is a need to express this data, but is DT the right place to do so? Why doesn't a driver contain this data?
- Expandability of OPPs in future.
This point above gives me heartburn. It appears that this binding is not meant to be "sub-classed" by vendor-specific compatible strings. Please let me know if I'm wrong on that.
The problem with this approach is that every little problem that someone has with the binding will have to be solved by changing the generic opp binding. I do not see how this can scale given the complexity of data, programming sequences and other stuff associated with operating points and dvfs transitions.
This patch introduces new bindings "operating-points-v2" to get these problems solved. Refer to the bindings for more details.
I like the existing operating points binding. It is very simple. The data can be entirely encoded in DT and then used by the driver for simple dvfs use cases.
Maybe we should just stop there and keep it simple? If the data needed to express an operating point or dvfs transition is very complex, why use DT for that?
We went through the same issue with the clock bindings where some people (myself included) wanted to use DT as a data-driven interface into the kernel. Turns out DT isn't great for that. What we have now is more scalable: clock providers (usually a clock generator IP block) get a node in DT. Clock consumers reference that provider node plus an index value which corresponds to a specific clock signal that the downstream node consumes. That's it. The binding basically only creates connections across devices using phandles.
All of the data about clock rates, parents, programming sequences, register definitions and bitfields, supported operations, etc all belong in drivers.
For very simple clock providers (such as a fixed-rate crystal) we do have a dedicated binding that allows for all of the data to belong in DT (just the clock name and rate is required). I think this is analogous to the existing operating-points-v1 today, which works nicely for the simple case.
For the complex case, why not just create a link between the device that provides the operating points (e.g. a pmic, or system controller, or clock provider, or whatever) and the device that consumes them (e.g. cpufreq driver, gpu driver, io controller driver, etc)? Leave all of the data in C.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
Documentation/devicetree/bindings/power/opp.txt | 366 +++++++++++++++++++++++- 1 file changed, 362 insertions(+), 4 deletions(-)
diff --git a/Documentation/devicetree/bindings/power/opp.txt b/Documentation/devicetree/bindings/power/opp.txt index 74499e5033fc..3b67a5c8d965 100644 --- a/Documentation/devicetree/bindings/power/opp.txt +++ b/Documentation/devicetree/bindings/power/opp.txt @@ -1,8 +1,366 @@ -* Generic OPP Interface +Generic OPP (Operating Performance Points) Bindings +---------------------------------------------------- -SoCs have a standard set of tuples consisting of frequency and -voltage pairs that the device will support per voltage domain. These -are called Operating Performance Points or OPPs. +Devices work at voltage-current-frequency triplets and some implementations have +the liberty of choosing these. These triplets are called Operating Performance +Points aka OPPs. This document defines bindings for these OPPs applicable across +wide range of devices. For illustration purpose, this document uses CPU as a +device.
+* Property: operating-points-v2
+Devices supporting OPPs must set their "operating-points-v2" property with +phandle to a OPP descriptor in their DT node. The OPP core will use this phandle +to find the operating points for the device.
+* OPP Descriptor Node
+This describes the OPPs belonging to a device. This node can have following +properties:
+Required properties: +- compatible: Allow OPPs to express their compatibility. It should be:
- "operating-points-v2".
+- OPP nodes: One or more OPP nodes describing voltage-current-frequency
- triplets. Their name isn't significant but their phandle can be used to
- reference an OPP.
+Optional properties: +- shared-opp: Indicates that device nodes using this OPP descriptor's phandle
- switch their DVFS state together, i.e. they share clock/voltage/current lines.
- Missing property means devices have independent clock/voltage/current lines,
- but they share OPP tables.
What is the behavior of not setting 'shared-opp'? The same table is re-used by multiple consumers/devices?
I think a provider/consumer model works better here. E.g. if we have 4 cpus that scale independently then there would be 4 opp providers, each provider corresponding to the unique frequency and voltage domains per cpu. If multiple cpu nodes consume the same opp phandle then the sharing becomes implicit: those cpus are in the same frequency and power domains.
This is how it works for other resources, such as specifying a clock or regulator in DT. If two device nodes reference that same resource then it clear that they are using the same physical hardware. Having just a table of data in a node that does not clearly map onto hardware (or a software abstraction that provides access to that hardware) is not as nice IMO.
+* OPP Node
+This defines voltage-current-frequency triplets along with other related +properties.
+Required properties: +- opp-khz: Frequency in kHz
+Optional properties: +- opp-microvolt: voltage in micro Volts. It can contain entries for multiple
- regulators.
- A single regulator's voltage is specified with an array of size one or three.
- Single entry is for target voltage and three entries are for <target min max>
- voltages.
- Entries for multiple regulators must be present in the same order as
- regulators are specified in device's DT node.
+- opp-microamp: current in micro Amperes. It can contain entries for multiple
- regulators.
- A single regulator's current is specified with an array of size one or three.
- Single entry is for target current and three entries are for <target min max>
- currents.
- Entries for multiple regulators must be present in the same order as
- regulators are specified in device's DT node. If few regulators don't provide
- capability to configure current, then values for then should be marked as
- zero.
+- clock-latency-ns: Specifies the maximum possible transition latency (in
- nanoseconds) for switching to this OPP from any other OPP.
+- turbo-mode: Marks the OPP to be used only for turbo modes. +- status: Marks the node enabled/disabled.
s/clock/transition/
Scaling the regulator can take longer than the clock. Better to reflect that this value is capturing the total transition time.
+Example 1: Single cluster Dual-core ARM cortex A9, switch DVFS states together.
+/ {
cpus {
#address-cells = <1>;
#size-cells = <0>;
cpu@0 {
compatible = "arm,cortex-a9";
reg = <0>;
next-level-cache = <&L2>;
clocks = <&clk_controller 0>;
clock-names = "cpu";
opp-supply = <&cpu_supply0>;
operating-points-v2 = <&cpu0_opp>;
};
cpu@1 {
compatible = "arm,cortex-a9";
reg = <1>;
next-level-cache = <&L2>;
clocks = <&clk_controller 0>;
clock-names = "cpu";
opp-supply = <&cpu_supply0>;
operating-points-v2 = <&cpu0_opp>;
};
};
cpu0_opp: opp0 {
compatible = "operating-points-v2";
shared-opp;
entry00 {
opp-khz = <1000000>;
opp-microvolt = <970000 975000 985000>;
opp-microamp = <70000 75000 85000>;
clock-latency-ns = <300000>;
};
entry01 {
opp-khz = <1100000>;
opp-microvolt = <980000 1000000 1010000>;
opp-microamp = <80000 81000 82000>;
clock-latency-ns = <310000>;
};
entry02 {
opp-khz = <1200000>;
opp-microvolt = <1025000>;
clock-latency-ns = <290000>;
turbo-mode;
};
};
+};
It seems wrong to me that the clock and supply data is owned by the cpu node, and not the opp descriptor. Everything about the opp transition should belong to a provider node. Then the cpu simply needs to consume that via a phandle.
<snip>
+Deprecated Bindings +------------------- Properties:
- operating-points: An array of 2-tuples items, and each item consists
I think we should keep these. They work for the simple case.
I know that DT bindings are notorious for bike shedding. But in this case I'm not debating the color of the bike shed but instead questioning whether we need the shed at all :-)
Regards, Mike
-- 2.3.0.rc0.44.ga94655d
On 12-05-15, 14:42, Michael Turquette wrote:
Quoting Viresh Kumar (2015-04-30 05:07:59)
Sorry for chiming in so late in the process. Also, sorry for the long email. Lots of repetition below:
You are always welcome..
Why should this new binding exist?
The answer to this particular query is perhaps simple, i.e. we have unsolved problems that we wanted to solve in a generic way.
But probably the bigger question is "Should we really put the OPPs (new or old bindings) in DT".
Lets start by agreeing on what can be kept in DT. AFAIU, anything that describes the device in a OS independent way. Like: - base address - irq line + parent - clocks, regulators, etc.. - What about things like register information? That's not what we do normally, but why?
Perhaps because we want to get rid of redundancy as much as possible. - An implementation of the same device is going to be same across all platforms (unless the platform has tweaked it). And so there is no fun passing the same register information from multiple DT files. - And so we better keep things like register information, bit field descriptions, etc. in driver itself. - For example consider a proprietary controller that has three registers A, B and C. Now the bitwise description/behavior of all the registers in perfectly same for every implementation but the order of these in memory varies per implementation.
Now we can keep the offsets of these registers in either DT or C code. If only few implementations are using it, then we might keep it in C code, but if there are 10-20 implementations using it in order ABC or BAC or CAB, then it will probably be a good option to try keeping these offsets in DT.
So its not really that we just describe connectivity between devices/nodes in DT, it can be anything related to the device.
What if OPPs were kept in C code instead ? - In most of the cases (not all of course), we will end up replicating code for platforms. But yes we can still do it and this is already allowed if you really think your platform is special. - (Copied from booting-without-of.txt): "It also makes it more flexible for board vendors to do minor hardware upgrades without significantly impacting the kernel code or cluttering it with special cases."
Is Devicetree really the right place to put all of this data? If DT is the right place for some users, some of the time ... is it always the right place for all users, all of the time?
This is just an attempt to reuse generic code. And platforms are free to choose DT or non-DT way for keeping this information.
For some users it might be a good place to keep it, while for others it may not be. For example, if a platform really needs to do some stuff from its platform code, then it is free to do so.
- Expandability of OPPs in future.
This point above gives me heartburn. It appears that this binding is not meant to be "sub-classed" by vendor-specific compatible strings. Please let me know if I'm wrong on that.
The problem with this approach is that every little problem that someone has with the binding will have to be solved by changing the generic opp binding. I do not see how this can scale given the complexity of data, programming sequences and other stuff associated with operating points and dvfs transitions.
(I thought about it a bit more after our offline discussion):
If there is something generic enough, which is required by multiple platforms, then of course we can make additions to the bindings.
But I believe we can 'sub-class' this by vendor-specific compatible strings as well.
What I told you earlier (in our offline discussion) was that it isn't allowed to put driver specific strings here to just choose a driver amongst few available at runtime. For example, choosing between cpufreq-dt or arm_big_little or any platform driver.
But if a Vendor do need few bindings just for his platforms, then we can have a compatible sting for that. As the compatible string now will express device's compatibility.
I like the existing operating points binding. It is very simple. The data can be entirely encoded in DT and then used by the driver for simple dvfs use cases.
The new ones are also simple :)
Maybe we should just stop there and keep it simple? If the data needed to express an operating point or dvfs transition is very complex, why use DT for that?
Its simple but not complete. And I don't really agree that things are way too complex here. In some cases, yes they can be.
+Optional properties: +- shared-opp: Indicates that device nodes using this OPP descriptor's phandle
- switch their DVFS state together, i.e. they share clock/voltage/current lines.
- Missing property means devices have independent clock/voltage/current lines,
- but they share OPP tables.
What is the behavior of not setting 'shared-opp'? The same table is re-used by multiple consumers/devices?
Yes.
I think a provider/consumer model works better here. E.g. if we have 4 cpus that scale independently then there would be 4 opp providers, each provider corresponding to the unique frequency and voltage domains per cpu. If multiple cpu nodes consume the same opp phandle then the sharing becomes implicit: those cpus are in the same frequency and power domains.
This is how it works for other resources, such as specifying a clock or regulator in DT. If two device nodes reference that same resource then it clear that they are using the same physical hardware. Having just a table of data in a node that does not clearly map onto hardware (or a software abstraction that provides access to that hardware) is not as nice IMO.
I partially agree to what you are saying. It is written this way to reduce redundancy in DT. For example, in a quad-core system with independently scalable CPUs that have exactly same tables for all CPUs, we do not want to replicate the same set of 20 OPPs, four times.
But perhaps we can improve it, based on your suggestion. The way I have written it today:
/ { cpus { cpu@0 { operating-points-v2 = <&cpu_opp>; };
cpu@1 { operating-points-v2 = <&cpu_opp>; }; };
cpu_opp: opp { compatible = "operating-points-v2"; // shared-opp; Not an shared OPP
entry00 { opp-khz = <1000000>; }; entry01 { opp-khz = <1100000>; }; }; };
And we can rewrite it as:
/ { cpus { cpu@0 { operating-points-v2 = <&opp0_provider>; };
cpu@1 { operating-points-v2 = <&opp1_provider>; }; };
/* Table is shared between providers */ opp_table: opp_table { entry00 { opp-khz = <1000000>; }; entry01 { opp-khz = <1100000>; }; };
opp0_provider: opp0 { compatible = "operating-points-v2"; opp_table = <&opp_table>; };
opp1_provider: opp1 { compatible = "operating-points-v2"; opp_table = <&opp_table>; }; };
But this is similar to what I proposed initially and people objected to it.
+- clock-latency-ns: Specifies the maximum possible transition latency (in
- nanoseconds) for switching to this OPP from any other OPP.
+- turbo-mode: Marks the OPP to be used only for turbo modes. +- status: Marks the node enabled/disabled.
s/clock/transition/
Scaling the regulator can take longer than the clock. Better to reflect that this value is capturing the total transition time.
So this is how we are doing it today and I am not sure if we want to get it from DT now.
ret = regulator_set_voltage_time(cpu_reg, min_uV, max_uV); if (ret > 0) transition_latency += ret * 1000;
It seems wrong to me that the clock and supply data is owned by the cpu node, and not the opp descriptor. Everything about the opp transition should belong to a provider node. Then the cpu simply needs to consume that via a phandle.
https://lists.linaro.org/pipermail/linaro-kernel/2014-December/019505.html
+Deprecated Bindings +------------------- Properties:
- operating-points: An array of 2-tuples items, and each item consists
I think we should keep these. They work for the simple case.
Hmm, maybe.
On Wed, May 13, 2015 at 02:25:28PM +0530, Viresh Kumar wrote:
On 12-05-15, 14:42, Michael Turquette wrote:
Quoting Viresh Kumar (2015-04-30 05:07:59)
Why should this new binding exist?
The answer to this particular query is perhaps simple, i.e. we have unsolved problems that we wanted to solve in a generic way.
But probably the bigger question is "Should we really put the OPPs (new or old bindings) in DT".
And also is trying to do this in a completely generic manner the right way of going about things - do we really understand the problem area well enough to create a completely generic solution for all cases, bearing in mind that once things go into a DT binding they become an ABI?
Quoting Mark Brown (2015-05-13 04:03:57)
On Wed, May 13, 2015 at 02:25:28PM +0530, Viresh Kumar wrote:
On 12-05-15, 14:42, Michael Turquette wrote:
Quoting Viresh Kumar (2015-04-30 05:07:59)
Why should this new binding exist?
The answer to this particular query is perhaps simple, i.e. we have unsolved problems that we wanted to solve in a generic way.
But probably the bigger question is "Should we really put the OPPs (new or old bindings) in DT".
And also is trying to do this in a completely generic manner the right way of going about things - do we really understand the problem area well enough to create a completely generic solution for all cases, bearing in mind that once things go into a DT binding they become an ABI?
No, we don't understand the problem space well enough to form an ABI. And even if we did understand it perfectly today, the constraints and expressions will change tomorrow.
Putting this stuff in C without any philosophical constraints on whether or not we can change it later is the way to go.
Regards, Mike
On Wed, May 13, 2015 at 7:32 PM, Michael Turquette mike.turquette@linaro.org wrote:
Quoting Mark Brown (2015-05-13 04:03:57)
On Wed, May 13, 2015 at 02:25:28PM +0530, Viresh Kumar wrote:
On 12-05-15, 14:42, Michael Turquette wrote:
Quoting Viresh Kumar (2015-04-30 05:07:59)
Why should this new binding exist?
The answer to this particular query is perhaps simple, i.e. we have unsolved problems that we wanted to solve in a generic way.
But probably the bigger question is "Should we really put the OPPs (new or old bindings) in DT".
And also is trying to do this in a completely generic manner the right way of going about things - do we really understand the problem area well enough to create a completely generic solution for all cases, bearing in mind that once things go into a DT binding they become an ABI?
No, we don't understand the problem space well enough to form an ABI. And even if we did understand it perfectly today, the constraints and expressions will change tomorrow.
Putting this stuff in C without any philosophical constraints on whether or not we can change it later is the way to go.
Having a binding does not preclude you from doing your own thing in C. You can choose to ignore it (or parts of it). It also doesn't mean you have to use a generic driver with a generic binding.
While yes, we could do the same thing in C, the reality is that we often don't force common data into common structures. This leads to obfuscating what are the real h/w differences because everyone does things their own way. Conveniently, struct dev_pm_opp is a perfect example here. We already know that just freq and volt alone are not enough, yet that is all the kernel structure has.
The way you avoid the ABI issue is you create a new one which is what we've done here. I'm not saying we want 10 different OPP bindings. If we are back here in a year with v3, then yes you are probably right. But I believe v2 is a framework that can last. Most of the review issues have resulted in simply dropping properties (until needed) without otherwise changing the binding. I think that demonstrates this bindiing is flexible and extendable.
Rob
On 05/13/2015 03:55 AM, Viresh Kumar wrote: [...]
It seems wrong to me that the clock and supply data is owned by the cpu node, and not the opp descriptor. Everything about the opp transition should belong to a provider node. Then the cpu simply needs to consume that via a phandle.
https://lists.linaro.org/pipermail/linaro-kernel/2014-December/019505.html
This argues that clock is an input to the cpu, this is not in-correct, but, it could also be argued that OPP tables are clock dependent.
For example, with multiple clock source options that a device might choose to select from internally(by some means.. lets not just restrict ourselves to just CPUs here for a moment), the tables might be different. We can always debate that this then is the responsibility of the driver handling the description for that device and we might want possibility of vice versa as well - same OPP table used by different clock source selections as well.
On 21-05-15, 01:02, Nishanth Menon wrote:
This argues that clock is an input to the cpu, this is not in-correct, but, it could also be argued that OPP tables are clock dependent.
For example, with multiple clock source options that a device might choose to select from internally(by some means.. lets not just restrict ourselves to just CPUs here for a moment), the tables might be different. We can always debate that this then is the responsibility of the driver handling the description for that device and we might want possibility of vice versa as well - same OPP table used by different clock source selections as well.
@Rob: Any inputs ?
On Fri, May 22, 2015 at 9:04 AM, Viresh Kumar viresh.kumar@linaro.org wrote:
On 21-05-15, 01:02, Nishanth Menon wrote:
This argues that clock is an input to the cpu, this is not in-correct, but, it could also be argued that OPP tables are clock dependent.
What piece of h/w is the clock an input to then?
For example, with multiple clock source options that a device might choose to select from internally(by some means.. lets not just restrict ourselves to just CPUs here for a moment), the tables might be different. We can always debate that this then is the responsibility of the driver handling the description for that device and we might want possibility of vice versa as well - same OPP table used by different clock source selections as well.
@Rob: Any inputs ?
If you are going to describe this clock mux in DT, then that mux should be part of the h/w block that controls it. You could always add entries that describe what parent clock must be used for a given OPP, but that is a new binding and not part of the existing clock binding.
If things get too complicated, then don't try to describe this in DT. That is always an option.
Rob
On 05/22/2015 11:04 AM, Rob Herring wrote:
On Fri, May 22, 2015 at 9:04 AM, Viresh Kumar viresh.kumar@linaro.org wrote:
On 21-05-15, 01:02, Nishanth Menon wrote:
This argues that clock is an input to the cpu, this is not in-correct, but, it could also be argued that OPP tables are clock dependent.
What piece of h/w is the clock an input to then?
The case(from one of the SoCs I deal with) I had in my mind was something as follows: GPU can get it's clock from a tree derived out of DPLL X OR a clock derived out of DPLL Y.
Frequencies a,b,c can only be supported on DPLL X, where as frequencies d,e,f on DPLL Y based tree.
There are many ways to skin the cat in this case.. if the clk mux is glitchless, then a-f can be supported, else, we have to have two opp tables which are selected by driver based on clock parent.
With the case where we cannot switch between the DPLLs, the question was interesting if we decided to hook up the clock into the opp table based on the fact that the frequencies are based of the clock.
The final clock feeding in to GPU, is the mux clock - every thing else gets hidden by CCF, unless the driver is aware of the clock topology (which we dont like to see in driver, since the driver is supposed to work across multiple SoCs) - Now, the OPP tables would obviously be based on which DPLL the source is from.
Our interest was not to have SoC specificity inside the driver and help try and describe everything within DT itself - including the choice of DPLL X based or DPLL Y based selection being made based on board (each board tends to be targetted for some unique performance needs based on usecase the board is being planned to be used for).
Ofcourse, with some platform specific code, this might be very easy to do as well.. so not really very hard reasoning for me to have clock in OPP table description itself.
For example, with multiple clock source options that a device might choose to select from internally(by some means.. lets not just restrict ourselves to just CPUs here for a moment), the tables might be different. We can always debate that this then is the responsibility of the driver handling the description for that device and we might want possibility of vice versa as well - same OPP table used by different clock source selections as well.
@Rob: Any inputs ?
If you are going to describe this clock mux in DT, then that mux should be part of the h/w block that controls it. You could always add entries that describe what parent clock must be used for a given OPP, but that is a new binding and not part of the existing clock binding.
If things get too complicated, then don't try to describe this in DT. That is always an option.
Yes, ofcourse... it is always an option, we just tend to like to have all data in the same place if possible - DT is the preferred approach.
On 22-05-15, 12:42, Nishanth Menon wrote:
Yes, ofcourse... it is always an option, we just tend to like to have all data in the same place if possible - DT is the preferred approach.
If we are done, may I send the next version incorporating your suggestions ?
On 04/30/15 05:07, Viresh Kumar wrote:
Current OPP (Operating performance point) DT bindings are proven to be insufficient at multiple instances.
The shortcomings we are trying to solve here:
Getting clock sharing information between CPUs. Single shared clock vs independent clock per core vs shared clock per cluster.
Support for specifying current levels along with voltages.
Support for multiple regulators.
Support for turbo modes.
Other per OPP settings: transition latencies, disabled status, etc.?
Expandability of OPPs in future.
This patch introduces new bindings "operating-points-v2" to get these problems solved. Refer to the bindings for more details.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
This looks like it covers my usecases. I'm not sure if I'm going to put all of this data in DT because of the philosophical question regarding what should be in DT and the technical decisions that some platforms I'm working on have made to encode parts of the OPPs in fuses, but at least the binding looks to support all the scenarios I have today.
Documentation/devicetree/bindings/power/opp.txt | 366 +++++++++++++++++++++++- 1 file changed, 362 insertions(+), 4 deletions(-)
diff --git a/Documentation/devicetree/bindings/power/opp.txt b/Documentation/devicetree/bindings/power/opp.txt index 74499e5033fc..3b67a5c8d965 100644 --- a/Documentation/devicetree/bindings/power/opp.txt +++ b/Documentation/devicetree/bindings/power/opp.txt @@ -1,8 +1,366 @@ -* Generic OPP Interface +Generic OPP (Operating Performance Points) Bindings +---------------------------------------------------- -SoCs have a standard set of tuples consisting of frequency and -voltage pairs that the device will support per voltage domain. These -are called Operating Performance Points or OPPs. +Devices work at voltage-current-frequency triplets and some implementations have +the liberty of choosing these. These triplets are called Operating Performance +Points aka OPPs. This document defines bindings for these OPPs applicable across +wide range of devices. For illustration purpose, this document uses CPU as a +device.
+* Property: operating-points-v2
+Devices supporting OPPs must set their "operating-points-v2" property with +phandle to a OPP descriptor in their DT node. The OPP core will use this phandle +to find the operating points for the device.
+* OPP Descriptor Node
+This describes the OPPs belonging to a device. This node can have following +properties:
+Required properties: +- compatible: Allow OPPs to express their compatibility. It should be:
- "operating-points-v2".
+- OPP nodes: One or more OPP nodes describing voltage-current-frequency
- triplets. Their name isn't significant but their phandle can be used to
- reference an OPP.
+Optional properties: +- shared-opp: Indicates that device nodes using this OPP descriptor's phandle
- switch their DVFS state together, i.e. they share clock/voltage/current lines.
- Missing property means devices have independent clock/voltage/current lines,
- but they share OPP tables.
+* OPP Node
+This defines voltage-current-frequency triplets along with other related +properties.
+Required properties: +- opp-khz: Frequency in kHz
Can this be in Hz? I thought there were some problems with kHz and Hz before where we wanted to make this into Hz so that rounding problems don't arise?
Also I wonder if all properties should be optional? I don't have this scenario today, but perhaps the frequencies could be encoded in fuses, but the voltages wouldn't be and so we might want to read out the frequencies for a fixed set of voltages. Of course, if there's nothing in the OPP node at all, it's not very useful, so perhaps some statement that at least one of the frequency/voltage/amperage properties should be present.
+Optional properties: +- opp-microvolt: voltage in micro Volts. It can contain entries for multiple
- regulators.
- A single regulator's voltage is specified with an array of size one or three.
- Single entry is for target voltage and three entries are for <target min max>
- voltages.
- Entries for multiple regulators must be present in the same order as
- regulators are specified in device's DT node.
+- opp-microamp: current in micro Amperes. It can contain entries for multiple
- regulators.
- A single regulator's current is specified with an array of size one or three.
- Single entry is for target current and three entries are for <target min max>
- currents.
- Entries for multiple regulators must be present in the same order as
- regulators are specified in device's DT node. If few regulators don't provide
- capability to configure current, then values for then should be marked as
s/then/them/
- zero.
+- clock-latency-ns: Specifies the maximum possible transition latency (in
- nanoseconds) for switching to this OPP from any other OPP.
+- turbo-mode: Marks the OPP to be used only for turbo modes. +- status: Marks the node enabled/disabled.
Please put some space between properties so they can be found quickly:
clock-latency-ns:
turbo-mode:
status:
...
On 19-05-15, 17:51, Stephen Boyd wrote:
This looks like it covers my usecases. I'm not sure if I'm going to put all of this data in DT because of the philosophical question regarding what should be in DT and the technical decisions that some platforms I'm working on have made to encode parts of the OPPs in fuses, but at least the binding looks to support all the scenarios I have today.
Thanks.
+Required properties: +- opp-khz: Frequency in kHz
Can this be in Hz? I thought there were some problems with kHz and Hz before where we wanted to make this into Hz so that rounding problems don't arise?
Yeah, good point.
Also I wonder if all properties should be optional? I don't have this scenario today, but perhaps the frequencies could be encoded in fuses, but the voltages wouldn't be and so we might want to read out the frequencies for a fixed set of voltages. Of course, if there's nothing in the OPP node at all, it's not very useful, so perhaps some statement that at least one of the frequency/voltage/amperage properties should be present.
I am not sure. What we are trying to do (fill partially in DT and partially in platform), is a trick and not the right use of bindings.
Ideally whatever is passed in DT should be complete by itself and doesn't require platform to tweak it (which it can't). For example, the cpufreq-dt driver will try to initialize OPPs from the DT directly and wouldn't know about the platform tweaks. That can work eventually as platform will add OPPs for the same bindings before cpufreq driver will try to do, but that's a trick.
And then its all about frequency in the first place, and so marking that optional looks wrong. Probably not the right use of these bindings.
- Entries for multiple regulators must be present in the same order as
- regulators are specified in device's DT node. If few regulators don't provide
- capability to configure current, then values for then should be marked as
s/then/them/
Thanks. But this part is already re-written and doesn't have this issue anymore.
Please put some space between properties so they can be found quickly:
Sure, thanks.
On 05/19/15 19:07, Viresh Kumar wrote:
On 19-05-15, 17:51, Stephen Boyd wrote:
Also I wonder if all properties should be optional? I don't have this scenario today, but perhaps the frequencies could be encoded in fuses, but the voltages wouldn't be and so we might want to read out the frequencies for a fixed set of voltages. Of course, if there's nothing in the OPP node at all, it's not very useful, so perhaps some statement that at least one of the frequency/voltage/amperage properties should be present.
I am not sure. What we are trying to do (fill partially in DT and partially in platform), is a trick and not the right use of bindings.
Ideally whatever is passed in DT should be complete by itself and doesn't require platform to tweak it (which it can't). For example, the cpufreq-dt driver will try to initialize OPPs from the DT directly and wouldn't know about the platform tweaks. That can work eventually as platform will add OPPs for the same bindings before cpufreq driver will try to do, but that's a trick.
And then its all about frequency in the first place, and so marking that optional looks wrong. Probably not the right use of these bindings.
Ok then I won't be using these bindings on any of the new platforms I have where half the data is in one place, and half in another. But for some of Krait based platforms I have they should be useable.
On 20-05-15, 12:39, Stephen Boyd wrote:
On 05/19/15 19:07, Viresh Kumar wrote:
On 19-05-15, 17:51, Stephen Boyd wrote:
Also I wonder if all properties should be optional? I don't have this scenario today, but perhaps the frequencies could be encoded in fuses, but the voltages wouldn't be and so we might want to read out the frequencies for a fixed set of voltages. Of course, if there's nothing in the OPP node at all, it's not very useful, so perhaps some statement that at least one of the frequency/voltage/amperage properties should be present.
I am not sure. What we are trying to do (fill partially in DT and partially in platform), is a trick and not the right use of bindings.
Ideally whatever is passed in DT should be complete by itself and doesn't require platform to tweak it (which it can't). For example, the cpufreq-dt driver will try to initialize OPPs from the DT directly and wouldn't know about the platform tweaks. That can work eventually as platform will add OPPs for the same bindings before cpufreq driver will try to do, but that's a trick.
And then its all about frequency in the first place, and so marking that optional looks wrong. Probably not the right use of these bindings.
Ok then I won't be using these bindings on any of the new platforms I have where half the data is in one place, and half in another. But for some of Krait based platforms I have they should be useable.
You are not the only one, I have seen other requests (even for the existing bindings) to fill stuff partially in DT as they want freq to come from bootloader.
@Rob: What do you suggest for such platforms? Let them keep (ab)using old or new OPP DT bindings or create another binding to just pass on freq table?
@Stephen: Can you please provide your feedback on the updated version please. Thanks.
On 21-05-15, 10:03, Viresh Kumar wrote:
On 20-05-15, 12:39, Stephen Boyd wrote:
On 05/19/15 19:07, Viresh Kumar wrote:
On 19-05-15, 17:51, Stephen Boyd wrote:
Also I wonder if all properties should be optional? I don't have this scenario today, but perhaps the frequencies could be encoded in fuses, but the voltages wouldn't be and so we might want to read out the frequencies for a fixed set of voltages. Of course, if there's nothing in the OPP node at all, it's not very useful, so perhaps some statement that at least one of the frequency/voltage/amperage properties should be present.
I am not sure. What we are trying to do (fill partially in DT and partially in platform), is a trick and not the right use of bindings.
Ideally whatever is passed in DT should be complete by itself and doesn't require platform to tweak it (which it can't). For example, the cpufreq-dt driver will try to initialize OPPs from the DT directly and wouldn't know about the platform tweaks. That can work eventually as platform will add OPPs for the same bindings before cpufreq driver will try to do, but that's a trick.
And then its all about frequency in the first place, and so marking that optional looks wrong. Probably not the right use of these bindings.
Ok then I won't be using these bindings on any of the new platforms I have where half the data is in one place, and half in another. But for some of Krait based platforms I have they should be useable.
You are not the only one, I have seen other requests (even for the existing bindings) to fill stuff partially in DT as they want freq to come from bootloader.
@Rob: What do you suggest for such platforms? Let them keep (ab)using old or new OPP DT bindings or create another binding to just pass on freq table?
Rob: Can you please comment on this one as well ? Thanks.
On some platforms (Like Qualcomm's SoCs), it is not decided until runtime on what OPPs to use. The OPP tables can be fixed at compile time, but which table to use is found out only after reading some efuses (sort of an eeprom) and knowing characteristics of the SoC.
To support such platform we need to pass multiple OPP tables per device and hardware should be able to choose one and only one table out of those.
Update OPP-v2 bindings to support that.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- Documentation/devicetree/bindings/power/opp.txt | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/Documentation/devicetree/bindings/power/opp.txt b/Documentation/devicetree/bindings/power/opp.txt index 3b67a5c8d965..07959903ec32 100644 --- a/Documentation/devicetree/bindings/power/opp.txt +++ b/Documentation/devicetree/bindings/power/opp.txt @@ -14,6 +14,9 @@ Devices supporting OPPs must set their "operating-points-v2" property with phandle to a OPP descriptor in their DT node. The OPP core will use this phandle to find the operating points for the device.
+Devices may want to choose OPP tables at runtime and so can provide a list of +phandles here. But only *one* of them should be chosen at runtime. +
* OPP Descriptor Node
@@ -28,6 +31,8 @@ This describes the OPPs belonging to a device. This node can have following reference an OPP.
Optional properties: +- opp-name: Name of the OPP table, to uniquely identify it if more than one OPP + table is supplied in "operating-points-v2" property of device. - shared-opp: Indicates that device nodes using this OPP descriptor's phandle switch their DVFS state together, i.e. they share clock/voltage/current lines. Missing property means devices have independent clock/voltage/current lines,
On 04/30/2015 07:08 AM, Viresh Kumar wrote:
On some platforms (Like Qualcomm's SoCs), it is not decided until runtime on what OPPs to use. The OPP tables can be fixed at compile time, but which table to use is found out only after reading some efuses (sort of an eeprom) and knowing characteristics of the SoC.
they are more like prom than eeprom in many instances.
To support such platform we need to pass multiple OPP tables per device and hardware should be able to choose one and only one table out of those.
Update OPP-v2 bindings to support that.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
Documentation/devicetree/bindings/power/opp.txt | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/Documentation/devicetree/bindings/power/opp.txt b/Documentation/devicetree/bindings/power/opp.txt index 3b67a5c8d965..07959903ec32 100644 --- a/Documentation/devicetree/bindings/power/opp.txt +++ b/Documentation/devicetree/bindings/power/opp.txt @@ -14,6 +14,9 @@ Devices supporting OPPs must set their "operating-points-v2" property with phandle to a OPP descriptor in their DT node. The OPP core will use this phandle to find the operating points for the device. +Devices may want to choose OPP tables at runtime and so can provide a list of +phandles here. But only *one* of them should be chosen at runtime.
- OPP Descriptor Node
@@ -28,6 +31,8 @@ This describes the OPPs belonging to a device. This node can have following reference an OPP. Optional properties: +- opp-name: Name of the OPP table, to uniquely identify it if more than one OPP
- table is supplied in "operating-points-v2" property of device.
- shared-opp: Indicates that device nodes using this OPP descriptor's phandle switch their DVFS state together, i.e. they share clock/voltage/current lines. Missing property means devices have independent clock/voltage/current lines,
With some SoCs like AM335x - thanks to some brain dead incompatible hardware design choices, this might end up as a big list or various OPP tables. but overall, I must prefer this approach as well.
Thanks for proposing this. will be great to see examples documented in bindings doc as well.
With no further issues, Acked-by: Nishanth Menon nm@ti.com
On 12-05-15, 11:09, Nishanth Menon wrote:
On 04/30/2015 07:08 AM, Viresh Kumar wrote:
(sort of an eeprom) and knowing characteristics of the SoC.
they are more like prom than eeprom in many instances.
Will replace.
With some SoCs like AM335x - thanks to some brain dead incompatible hardware design choices, this might end up as a big list or various OPP tables. but overall, I must prefer this approach as well.
Thanks for proposing this. will be great to see examples documented in bindings doc as well.
Right.
On 04/30, Viresh Kumar wrote:
On some platforms (Like Qualcomm's SoCs), it is not decided until runtime on what OPPs to use. The OPP tables can be fixed at compile time, but which table to use is found out only after reading some efuses (sort of an eeprom) and knowing characteristics of the SoC.
To support such platform we need to pass multiple OPP tables per device and hardware should be able to choose one and only one table out of those.
Update OPP-v2 bindings to support that.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
Reviewed-by: Stephen Boyd sboyd@codeaurora.org
Many platforms require to switch to a intermediate frequency before switching to a final frequency. Or they can switch to only particular OPPs from any OPP.
For these add another property in OPP-v2, 'opp-next'.
Refer to the bindings for more details.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- Documentation/devicetree/bindings/power/opp.txt | 130 ++++++++++++++++++++++++ 1 file changed, 130 insertions(+)
diff --git a/Documentation/devicetree/bindings/power/opp.txt b/Documentation/devicetree/bindings/power/opp.txt index 07959903ec32..c96dc77121b7 100644 --- a/Documentation/devicetree/bindings/power/opp.txt +++ b/Documentation/devicetree/bindings/power/opp.txt @@ -72,6 +72,9 @@ properties.
- clock-latency-ns: Specifies the maximum possible transition latency (in nanoseconds) for switching to this OPP from any other OPP. +- opp-next: It contains a list of phandles to other OPPs, to which we can switch + directly from this OPP (Explained later with examples). Missing property means + no restriction on switching to other OPPs. - turbo-mode: Marks the OPP to be used only for turbo modes. - status: Marks the node enabled/disabled.
@@ -363,6 +366,133 @@ Example 4: Handling multiple regulators }; };
+Example 5: How to use "opp-next" property ? + +1.) Switch to a intermediate OPP (entry00) before switching to any other OPP. + +/ { + cpus { + #address-cells = <1>; + #size-cells = <0>; + + cpu@0 { + compatible = "arm,cortex-a7"; + reg = <0>; + next-level-cache = <&L2>; + clocks = <&clk_controller 0>; + clock-names = "cpu"; + opp-supply = <&cpu_supply0>; + operating-points-v2 = <&cpu0_opp>; + }; + }; + + cpu0_opp: opp0 { + compatible = "operating-points-v2"; + shared-opp; + + opp_next: entry00 { + opp-khz = <500000>; + opp-microvolt = <800000>; + clock-latency-ns = <300000>; + /* Can switch to any OPP from here */ + }; + entry01 { + opp-khz = <600000>; + opp-microvolt = <800000>; + clock-latency-ns = <300000>; + opp-next = <&opp_next>; + }; + entry02 { + opp-khz = <900000>; + opp-microvolt = <970000 975000 985000>; + clock-latency-ns = <300000>; + opp-next = <&opp_next>; + }; + entry03 { + opp-khz = <1000000>; + opp-microvolt = <970000 975000 985000>; + clock-latency-ns = <300000>; + opp-next = <&opp_next>; + }; + entry04 { + opp-khz = <1100000>; + opp-microvolt = <980000 1000000 1010000>; + clock-latency-ns = <310000>; + opp-next = <&opp_next>; + }; + entry05 { + opp-khz = <1200000>; + opp-microvolt = <1025000>; + clock-latency-ns = <290000>; + opp-next = <&opp_next>; + turbo-mode; + }; + }; +}; + +2.) Can only switch to the next or previous OPP directly. + +/ { + cpus { + #address-cells = <1>; + #size-cells = <0>; + + cpu@0 { + compatible = "arm,cortex-a7"; + reg = <0>; + next-level-cache = <&L2>; + clocks = <&clk_controller 0>; + clock-names = "cpu"; + opp-supply = <&cpu_supply0>; + operating-points-v2 = <&cpu0_opp>; + }; + }; + + cpu0_opp: opp0 { + compatible = "operating-points-v2"; + shared-opp; + + opp_next0: entry00 { + opp-khz = <500000>; + opp-microvolt = <800000>; + clock-latency-ns = <300000>; + opp-next = <&opp_next1>; + }; + opp_next1: entry01 { + opp-khz = <600000>; + opp-microvolt = <800000>; + clock-latency-ns = <300000>; + opp-next = <&opp_next0>, <&opp_next2>; + }; + opp_next2: entry02 { + opp-khz = <900000>; + opp-microvolt = <970000 975000 985000>; + clock-latency-ns = <300000>; + opp-next = <&opp_next1>, <&opp_next3>; + }; + opp_next3: entry03 { + opp-khz = <1000000>; + opp-microvolt = <970000 975000 985000>; + clock-latency-ns = <300000>; + opp-next = <&opp_next2>, <&opp_next4>; + }; + opp_next4: entry04 { + opp-khz = <1100000>; + opp-microvolt = <980000 1000000 1010000>; + clock-latency-ns = <310000>; + opp-next = <&opp_next3>, <&opp_next5>; + }; + opp_next5: entry05 { + opp-khz = <1200000>; + opp-microvolt = <1025000>; + clock-latency-ns = <290000>; + opp-next = <&opp_next4>; + turbo-mode; + }; + }; +}; + +
Deprecated Bindings -------------------
Quoting Viresh Kumar (2015-04-30 05:08:01)
Many platforms require to switch to a intermediate frequency before switching to a final frequency. Or they can switch to only particular OPPs from any OPP.
For these add another property in OPP-v2, 'opp-next'.
Refer to the bindings for more details.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
We should never encode any aspect of the dvfs transition sequence into DT. Raw data such as clock frequency or supply voltage might be OK, but the programming steps should be left entirely to drivers.
So this patch gets a NAK from me.
Regards, Mike
linaro-kernel@lists.linaro.org