Hello,
Some platforms have the capability to configure the performance state of their Power Domains. The performance levels are represented by positive integer values, a lower value represents lower performance state.
We had some discussions about it in the past on the PM list [1], which is followed by discussions during the LPC. The outcome of all that was that we should extend Power Domain framework to support active state power management as well.
The power-domains until now were only concentrating on the idle state management of the device and this needs to change in order to reuse the infrastructure of power domains for active state management.
To get a complete picture of the proposed plan, following is what we need to do: - Create DT bindings to get domain performance state information for the platforms. - Enhance OPP framework to parse these and call into the PM Qos framework with a performance state request. - Enhance PM Qos framework to provide the API to be used by consumers (or OPP framework) and pass it on to the (Generic) Power Domain framework. - Enhance Generic Power Domain framework to accept such requests, accumulate all belonging to a single power domain and call domain driver specific callback with the performance state we want for the domain. - The domain driver shall then, in a platform specific way, set the requested performance level. - Note that these features are applicable to the CPU, GPU and other IIO or non-IIO devices. - There can be cases where a device can choose between multiple power domains based on what performance level we want for the device. In such cases, we should represent the multiplexer with a separate power domain. In effect, the device (or OPP table) will correspond to a single power domain, but the backend driver of that domain shall implement the multiplexing functionality.
This patchset implements the very first part of this chain and introduces a new optional property for the consumers of the power-domains: domain-performance-state. This property can be used directly by the consumer or its OPP table.
-- viresh
[1] https://marc.info/?l=linux-pm&m=147747923708075&w=2
Viresh Kumar (2): PM / Domains: Introduce domain-performance-state binding PM / OPP: Introduce domain-performance-state binding to OPP nodes
Documentation/devicetree/bindings/opp/opp.txt | 57 ++++++++++++++++++++++ .../devicetree/bindings/power/power_domain.txt | 6 +++ 2 files changed, 63 insertions(+)
Some platforms have the capability to configure the performance state of their Power Domains. The performance levels are represented by positive integer values, a lower value represents lower performance state.
The power-domains until now were only concentrating on the idle state management of the device and this needs to change in order to reuse the infrastructure of power domains for active state management.
This patch introduces a new optional property for the consumers of the power-domains: domain-performance-state.
If the consumers don't need the capability of switching to different domain performance states at runtime, then they can simply define their required domain performance state in their node directly. Otherwise the consumers can define their requirements with help of other infrastructure, for example the OPP table.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- Documentation/devicetree/bindings/power/power_domain.txt | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/Documentation/devicetree/bindings/power/power_domain.txt b/Documentation/devicetree/bindings/power/power_domain.txt index e1650364b296..db42eacf8b5c 100644 --- a/Documentation/devicetree/bindings/power/power_domain.txt +++ b/Documentation/devicetree/bindings/power/power_domain.txt @@ -106,6 +106,12 @@ domain provided by the 'parent' power controller. - power-domains : A phandle and PM domain specifier as defined by bindings of the power controller specified by phandle.
+Optional properties: +- domain-performance-state: A positive integer value representing the minimum + performance level (of the parent domain) required by the consumer for its + working. The integer value '1' represents the lowest performance level and the + highest value represents the highest performance level. + Example:
leaky-device@12350000 {
On Fri, Nov 18, 2016 at 02:53:12PM +0530, Viresh Kumar wrote:
Some platforms have the capability to configure the performance state of their Power Domains. The performance levels are represented by positive integer values, a lower value represents lower performance state.
The power-domains until now were only concentrating on the idle state management of the device and this needs to change in order to reuse the infrastructure of power domains for active state management.
This patch introduces a new optional property for the consumers of the power-domains: domain-performance-state.
If the consumers don't need the capability of switching to different domain performance states at runtime, then they can simply define their required domain performance state in their node directly. Otherwise the consumers can define their requirements with help of other infrastructure, for example the OPP table.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
Documentation/devicetree/bindings/power/power_domain.txt | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/Documentation/devicetree/bindings/power/power_domain.txt b/Documentation/devicetree/bindings/power/power_domain.txt index e1650364b296..db42eacf8b5c 100644 --- a/Documentation/devicetree/bindings/power/power_domain.txt +++ b/Documentation/devicetree/bindings/power/power_domain.txt @@ -106,6 +106,12 @@ domain provided by the 'parent' power controller.
- power-domains : A phandle and PM domain specifier as defined by bindings of the power controller specified by phandle.
+Optional properties: +- domain-performance-state: A positive integer value representing the minimum
- performance level (of the parent domain) required by the consumer for its
- working. The integer value '1' represents the lowest performance level and the
- highest value represents the highest performance level.
How does one come up with the range of values? It seems like you are just making up numbers. Couldn't the domain performance level be an OPP in the sense that it is a collection of clock frequencies and voltage settings?
Rob
On 21-11-16, 09:07, Rob Herring wrote:
On Fri, Nov 18, 2016 at 02:53:12PM +0530, Viresh Kumar wrote:
Some platforms have the capability to configure the performance state of their Power Domains. The performance levels are represented by positive integer values, a lower value represents lower performance state.
The power-domains until now were only concentrating on the idle state management of the device and this needs to change in order to reuse the infrastructure of power domains for active state management.
This patch introduces a new optional property for the consumers of the power-domains: domain-performance-state.
If the consumers don't need the capability of switching to different domain performance states at runtime, then they can simply define their required domain performance state in their node directly. Otherwise the consumers can define their requirements with help of other infrastructure, for example the OPP table.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
Documentation/devicetree/bindings/power/power_domain.txt | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/Documentation/devicetree/bindings/power/power_domain.txt b/Documentation/devicetree/bindings/power/power_domain.txt index e1650364b296..db42eacf8b5c 100644 --- a/Documentation/devicetree/bindings/power/power_domain.txt +++ b/Documentation/devicetree/bindings/power/power_domain.txt @@ -106,6 +106,12 @@ domain provided by the 'parent' power controller.
- power-domains : A phandle and PM domain specifier as defined by bindings of the power controller specified by phandle.
+Optional properties: +- domain-performance-state: A positive integer value representing the minimum
- performance level (of the parent domain) required by the consumer for its
- working. The integer value '1' represents the lowest performance level and the
- highest value represents the highest performance level.
How does one come up with the range of values?
Why would we need a range here? The value here represents the minimum 'state' and the assumption is that everything above that level would be fine. So the range is automatically: domain-performance-state -> MAX.
It seems like you are just making up numbers. Couldn't the domain performance level be an OPP in the sense that it is a collection of clock frequencies and voltage settings?
The clock is going to be handled by the device itself (at least for the case we have today) and the performance-state lies with the power-domain which is configured separately. If the performance level includes both clk and voltage, then why would we need to show the clock rates in the DT ? Wouldn't a performance level be enough in such cases?
Viresh Kumar viresh.kumar@linaro.org writes:
On 21-11-16, 09:07, Rob Herring wrote:
On Fri, Nov 18, 2016 at 02:53:12PM +0530, Viresh Kumar wrote:
Some platforms have the capability to configure the performance state of their Power Domains. The performance levels are represented by positive integer values, a lower value represents lower performance state.
The power-domains until now were only concentrating on the idle state management of the device and this needs to change in order to reuse the infrastructure of power domains for active state management.
This patch introduces a new optional property for the consumers of the power-domains: domain-performance-state.
If the consumers don't need the capability of switching to different domain performance states at runtime, then they can simply define their required domain performance state in their node directly. Otherwise the consumers can define their requirements with help of other infrastructure, for example the OPP table.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
Documentation/devicetree/bindings/power/power_domain.txt | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/Documentation/devicetree/bindings/power/power_domain.txt b/Documentation/devicetree/bindings/power/power_domain.txt index e1650364b296..db42eacf8b5c 100644 --- a/Documentation/devicetree/bindings/power/power_domain.txt +++ b/Documentation/devicetree/bindings/power/power_domain.txt @@ -106,6 +106,12 @@ domain provided by the 'parent' power controller.
- power-domains : A phandle and PM domain specifier as defined by bindings of the power controller specified by phandle.
+Optional properties: +- domain-performance-state: A positive integer value representing the minimum
- performance level (of the parent domain) required by the consumer for its
- working. The integer value '1' represents the lowest performance level and the
- highest value represents the highest performance level.
How does one come up with the range of values?
Why would we need a range here? The value here represents the minimum 'state' and the assumption is that everything above that level would be fine. So the range is automatically: domain-performance-state -> MAX.
It seems like you are just making up numbers. Couldn't the domain performance level be an OPP in the sense that it is a collection of clock frequencies and voltage settings?
The clock is going to be handled by the device itself (at least for the case we have today) and the performance-state lies with the power-domain which is configured separately. If the performance level includes both clk and voltage, then why would we need to show the clock rates in the DT ? Wouldn't a performance level be enough in such cases?
I think the question is: what does the performance-level of a domain actually mean? Or, what are the units?
Depending on the SoC, there's probably a few things this could mean. It might mean is that an underlying bus/interconnect can be configured to guarantee a specific bandwidth or throughput. That in turn might mean that that bus/interconnect might have to be set at a specific frequency/voltage.
In your case, IIUC, you're just passing some magic value to some firmware running on a micro-controller, but under the hood that uC is probably configuring a frequency/voltage someplace.
So, if we're going to have a generic DT binding for this, it needs to be something that's useful on platforms that are not using magic numbers managed by a uC as well.
Kevin
On 22 November 2016 at 19:12, Kevin Hilman khilman@baylibre.com wrote:
Viresh Kumar viresh.kumar@linaro.org writes:
On 21-11-16, 09:07, Rob Herring wrote:
On Fri, Nov 18, 2016 at 02:53:12PM +0530, Viresh Kumar wrote:
Some platforms have the capability to configure the performance state of their Power Domains. The performance levels are represented by positive integer values, a lower value represents lower performance state.
The power-domains until now were only concentrating on the idle state management of the device and this needs to change in order to reuse the infrastructure of power domains for active state management.
This patch introduces a new optional property for the consumers of the power-domains: domain-performance-state.
If the consumers don't need the capability of switching to different domain performance states at runtime, then they can simply define their required domain performance state in their node directly. Otherwise the consumers can define their requirements with help of other infrastructure, for example the OPP table.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
Documentation/devicetree/bindings/power/power_domain.txt | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/Documentation/devicetree/bindings/power/power_domain.txt b/Documentation/devicetree/bindings/power/power_domain.txt index e1650364b296..db42eacf8b5c 100644 --- a/Documentation/devicetree/bindings/power/power_domain.txt +++ b/Documentation/devicetree/bindings/power/power_domain.txt @@ -106,6 +106,12 @@ domain provided by the 'parent' power controller.
- power-domains : A phandle and PM domain specifier as defined by bindings of the power controller specified by phandle.
+Optional properties: +- domain-performance-state: A positive integer value representing the minimum
- performance level (of the parent domain) required by the consumer for its
- working. The integer value '1' represents the lowest performance level and the
- highest value represents the highest performance level.
How does one come up with the range of values?
Why would we need a range here? The value here represents the minimum 'state' and the assumption is that everything above that level would be fine. So the range is automatically: domain-performance-state -> MAX.
It seems like you are just making up numbers. Couldn't the domain performance level be an OPP in the sense that it is a collection of clock frequencies and voltage settings?
The clock is going to be handled by the device itself (at least for the case we have today) and the performance-state lies with the power-domain which is configured separately. If the performance level includes both clk and voltage, then why would we need to show the clock rates in the DT ? Wouldn't a performance level be enough in such cases?
I think the question is: what does the performance-level of a domain actually mean? Or, what are the units?
Depending on the SoC, there's probably a few things this could mean. It might mean is that an underlying bus/interconnect can be configured to guarantee a specific bandwidth or throughput. That in turn might mean that that bus/interconnect might have to be set at a specific frequency/voltage.
In your case, IIUC, you're just passing some magic value to some firmware running on a micro-controller, but under the hood that uC is probably configuring a frequency/voltage someplace.
In the case described by Viresh, it's only about setting the voltage of a power domain that is shared between different devices. these devices wants to run at different frequency (set by the devices) but we have to select a Volateg value that will match with the constraint of all devices (in this case the highest voltage)
So, if we're going to have a generic DT binding for this, it needs to be something that's useful on platforms that are not using magic numbers managed by a uC as well.
Kevin
Thanks for explaining on my behalf Vincent :)
On 22-11-16, 19:34, Vincent Guittot wrote:
On 22 November 2016 at 19:12, Kevin Hilman khilman@baylibre.com wrote:
I think the question is: what does the performance-level of a domain actually mean? Or, what are the units?
There is no unit. If we have units like Hz and volts etc, then we can actually use the existing clk/regulator frameworks straight away.
The whole problem here is that the regulator (and maybe the clock on a different platform) for a power domain are hidden from the kernel and handled by a black box (An M3 core in my case). All we can ask is for a performance state, a simple positive integer value.
Depending on the SoC, there's probably a few things this could mean. It might mean is that an underlying bus/interconnect can be configured to guarantee a specific bandwidth or throughput.
We are talking in terms of power domains here and so if the bus/interconnect has a power domain for itself, then yes we can very much have that situation. But if the kernel have the capability of configuring clk and voltages directly, then we don't need this new infrastructure at all.
That in turn might mean that that bus/interconnect might have to be set at a specific frequency/voltage.
In your case, IIUC, you're just passing some magic value to some firmware running on a micro-controller, but under the hood that uC is probably configuring a frequency/voltage someplace.
In the case described by Viresh, it's only about setting the voltage of a power domain that is shared between different devices. these devices wants to run at different frequency (set by the devices) but we have to select a Volateg value that will match with the constraint of all devices (in this case the highest voltage)
That's right.
So, if we're going to have a generic DT binding for this, it needs to be something that's useful on platforms that are not using magic numbers managed by a uC as well.
What suggestions do you have for this and I am not sure what all cases we want to solve by this ?
Vincent Guittot vincent.guittot@linaro.org writes:
On 22 November 2016 at 19:12, Kevin Hilman khilman@baylibre.com wrote:
Viresh Kumar viresh.kumar@linaro.org writes:
On 21-11-16, 09:07, Rob Herring wrote:
On Fri, Nov 18, 2016 at 02:53:12PM +0530, Viresh Kumar wrote:
Some platforms have the capability to configure the performance state of their Power Domains. The performance levels are represented by positive integer values, a lower value represents lower performance state.
The power-domains until now were only concentrating on the idle state management of the device and this needs to change in order to reuse the infrastructure of power domains for active state management.
This patch introduces a new optional property for the consumers of the power-domains: domain-performance-state.
If the consumers don't need the capability of switching to different domain performance states at runtime, then they can simply define their required domain performance state in their node directly. Otherwise the consumers can define their requirements with help of other infrastructure, for example the OPP table.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
Documentation/devicetree/bindings/power/power_domain.txt | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/Documentation/devicetree/bindings/power/power_domain.txt b/Documentation/devicetree/bindings/power/power_domain.txt index e1650364b296..db42eacf8b5c 100644 --- a/Documentation/devicetree/bindings/power/power_domain.txt +++ b/Documentation/devicetree/bindings/power/power_domain.txt @@ -106,6 +106,12 @@ domain provided by the 'parent' power controller.
- power-domains : A phandle and PM domain specifier as defined by bindings of the power controller specified by phandle.
+Optional properties: +- domain-performance-state: A positive integer value representing the minimum
- performance level (of the parent domain) required by the consumer for its
- working. The integer value '1' represents the lowest performance level and the
- highest value represents the highest performance level.
How does one come up with the range of values?
Why would we need a range here? The value here represents the minimum 'state' and the assumption is that everything above that level would be fine. So the range is automatically: domain-performance-state -> MAX.
It seems like you are just making up numbers. Couldn't the domain performance level be an OPP in the sense that it is a collection of clock frequencies and voltage settings?
The clock is going to be handled by the device itself (at least for the case we have today) and the performance-state lies with the power-domain which is configured separately. If the performance level includes both clk and voltage, then why would we need to show the clock rates in the DT ? Wouldn't a performance level be enough in such cases?
I think the question is: what does the performance-level of a domain actually mean? Or, what are the units?
Depending on the SoC, there's probably a few things this could mean. It might mean is that an underlying bus/interconnect can be configured to guarantee a specific bandwidth or throughput. That in turn might mean that that bus/interconnect might have to be set at a specific frequency/voltage.
In your case, IIUC, you're just passing some magic value to some firmware running on a micro-controller, but under the hood that uC is probably configuring a frequency/voltage someplace.
In the case described by Viresh, it's only about setting the voltage of a power domain that is shared between different devices. these devices wants to run at different frequency (set by the devices) but we have to select a Volateg value that will match with the constraint of all devices (in this case the highest voltage)
Then, at least for this use case, we're talking about voltage, not some unspecified units.
But that makes me wonder, this performance state sounds like something that is changing dynamically at runtime, so why do you want to describe this statically in DT?
This sounds to me like the job of the genpd. When any device in the domain does its pm_runtime_get(), the domain could check the device frequency and see if it needs to change the domain voltage in order for that device to operate at that frequency. When the device goes away (using pm_runtime_put()) the domain can check again if it could lower the voltage and still meet the requirements of the remaining devices.
Kevin
On 23 November 2016 at 16:51, Kevin Hilman khilman@baylibre.com wrote:
Vincent Guittot vincent.guittot@linaro.org writes:
On 22 November 2016 at 19:12, Kevin Hilman khilman@baylibre.com wrote:
Viresh Kumar viresh.kumar@linaro.org writes:
On 21-11-16, 09:07, Rob Herring wrote:
On Fri, Nov 18, 2016 at 02:53:12PM +0530, Viresh Kumar wrote:
Some platforms have the capability to configure the performance state of their Power Domains. The performance levels are represented by positive integer values, a lower value represents lower performance state.
The power-domains until now were only concentrating on the idle state management of the device and this needs to change in order to reuse the infrastructure of power domains for active state management.
This patch introduces a new optional property for the consumers of the power-domains: domain-performance-state.
If the consumers don't need the capability of switching to different domain performance states at runtime, then they can simply define their required domain performance state in their node directly. Otherwise the consumers can define their requirements with help of other infrastructure, for example the OPP table.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
Documentation/devicetree/bindings/power/power_domain.txt | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/Documentation/devicetree/bindings/power/power_domain.txt b/Documentation/devicetree/bindings/power/power_domain.txt index e1650364b296..db42eacf8b5c 100644 --- a/Documentation/devicetree/bindings/power/power_domain.txt +++ b/Documentation/devicetree/bindings/power/power_domain.txt @@ -106,6 +106,12 @@ domain provided by the 'parent' power controller.
- power-domains : A phandle and PM domain specifier as defined by bindings of the power controller specified by phandle.
+Optional properties: +- domain-performance-state: A positive integer value representing the minimum
- performance level (of the parent domain) required by the consumer for its
- working. The integer value '1' represents the lowest performance level and the
- highest value represents the highest performance level.
How does one come up with the range of values?
Why would we need a range here? The value here represents the minimum 'state' and the assumption is that everything above that level would be fine. So the range is automatically: domain-performance-state -> MAX.
It seems like you are just making up numbers. Couldn't the domain performance level be an OPP in the sense that it is a collection of clock frequencies and voltage settings?
The clock is going to be handled by the device itself (at least for the case we have today) and the performance-state lies with the power-domain which is configured separately. If the performance level includes both clk and voltage, then why would we need to show the clock rates in the DT ? Wouldn't a performance level be enough in such cases?
I think the question is: what does the performance-level of a domain actually mean? Or, what are the units?
Depending on the SoC, there's probably a few things this could mean. It might mean is that an underlying bus/interconnect can be configured to guarantee a specific bandwidth or throughput. That in turn might mean that that bus/interconnect might have to be set at a specific frequency/voltage.
In your case, IIUC, you're just passing some magic value to some firmware running on a micro-controller, but under the hood that uC is probably configuring a frequency/voltage someplace.
In the case described by Viresh, it's only about setting the voltage of a power domain that is shared between different devices. these devices wants to run at different frequency (set by the devices) but we have to select a Volateg value that will match with the constraint of all devices (in this case the highest voltage)
Then, at least for this use case, we're talking about voltage, not some unspecified units.
But that makes me wonder, this performance state sounds like something that is changing dynamically at runtime, so why do you want to describe this statically in DT?
This sounds to me like the job of the genpd. When any device in the domain does its pm_runtime_get(), the domain could check the device frequency and see if it needs to change the domain voltage in order for that device to operate at that frequency. When the device goes away (using pm_runtime_put()) the domain can check again if it could lower the voltage and still meet the requirements of the remaining devices.
That's only part of the job. The device can change its frequency and as a result ask for a new voltage index while it is already running
Vincent
Kevin
Vincent Guittot vincent.guittot@linaro.org writes:
On 23 November 2016 at 16:51, Kevin Hilman khilman@baylibre.com wrote:
Vincent Guittot vincent.guittot@linaro.org writes:
On 22 November 2016 at 19:12, Kevin Hilman khilman@baylibre.com wrote:
Viresh Kumar viresh.kumar@linaro.org writes:
On 21-11-16, 09:07, Rob Herring wrote:
On Fri, Nov 18, 2016 at 02:53:12PM +0530, Viresh Kumar wrote: > Some platforms have the capability to configure the performance state of > their Power Domains. The performance levels are represented by positive > integer values, a lower value represents lower performance state. > > The power-domains until now were only concentrating on the idle state > management of the device and this needs to change in order to reuse the > infrastructure of power domains for active state management. > > This patch introduces a new optional property for the consumers of the > power-domains: domain-performance-state. > > If the consumers don't need the capability of switching to different > domain performance states at runtime, then they can simply define their > required domain performance state in their node directly. Otherwise the > consumers can define their requirements with help of other > infrastructure, for example the OPP table. > > Signed-off-by: Viresh Kumar viresh.kumar@linaro.org > --- > Documentation/devicetree/bindings/power/power_domain.txt | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/Documentation/devicetree/bindings/power/power_domain.txt b/Documentation/devicetree/bindings/power/power_domain.txt > index e1650364b296..db42eacf8b5c 100644 > --- a/Documentation/devicetree/bindings/power/power_domain.txt > +++ b/Documentation/devicetree/bindings/power/power_domain.txt > @@ -106,6 +106,12 @@ domain provided by the 'parent' power controller. > - power-domains : A phandle and PM domain specifier as defined by bindings of > the power controller specified by phandle. > > +Optional properties: > +- domain-performance-state: A positive integer value representing the minimum > + performance level (of the parent domain) required by the consumer for its > + working. The integer value '1' represents the lowest performance level and the > + highest value represents the highest performance level.
How does one come up with the range of values?
Why would we need a range here? The value here represents the minimum 'state' and the assumption is that everything above that level would be fine. So the range is automatically: domain-performance-state -> MAX.
It seems like you are just making up numbers. Couldn't the domain performance level be an OPP in the sense that it is a collection of clock frequencies and voltage settings?
The clock is going to be handled by the device itself (at least for the case we have today) and the performance-state lies with the power-domain which is configured separately. If the performance level includes both clk and voltage, then why would we need to show the clock rates in the DT ? Wouldn't a performance level be enough in such cases?
I think the question is: what does the performance-level of a domain actually mean? Or, what are the units?
Depending on the SoC, there's probably a few things this could mean. It might mean is that an underlying bus/interconnect can be configured to guarantee a specific bandwidth or throughput. That in turn might mean that that bus/interconnect might have to be set at a specific frequency/voltage.
In your case, IIUC, you're just passing some magic value to some firmware running on a micro-controller, but under the hood that uC is probably configuring a frequency/voltage someplace.
In the case described by Viresh, it's only about setting the voltage of a power domain that is shared between different devices. these devices wants to run at different frequency (set by the devices) but we have to select a Volateg value that will match with the constraint of all devices (in this case the highest voltage)
Then, at least for this use case, we're talking about voltage, not some unspecified units.
But that makes me wonder, this performance state sounds like something that is changing dynamically at runtime, so why do you want to describe this statically in DT?
This sounds to me like the job of the genpd. When any device in the domain does its pm_runtime_get(), the domain could check the device frequency and see if it needs to change the domain voltage in order for that device to operate at that frequency. When the device goes away (using pm_runtime_put()) the domain can check again if it could lower the voltage and still meet the requirements of the remaining devices.
That's only part of the job. The device can change its frequency and as a result ask for a new voltage index while it is already running
That's fine. Use clock notifiers, or better use QoS (with notifiers) so that the genpd knows when any of those change.
Kevin
On 11/23, Kevin Hilman wrote:
Vincent Guittot vincent.guittot@linaro.org writes:
On 23 November 2016 at 16:51, Kevin Hilman khilman@baylibre.com wrote:
Vincent Guittot vincent.guittot@linaro.org writes:
On 22 November 2016 at 19:12, Kevin Hilman khilman@baylibre.com wrote:
Viresh Kumar viresh.kumar@linaro.org writes:
On 21-11-16, 09:07, Rob Herring wrote: > On Fri, Nov 18, 2016 at 02:53:12PM +0530, Viresh Kumar wrote: > > Some platforms have the capability to configure the performance state of > > their Power Domains. The performance levels are represented by positive > > integer values, a lower value represents lower performance state. > > > > The power-domains until now were only concentrating on the idle state > > management of the device and this needs to change in order to reuse the > > infrastructure of power domains for active state management. > > > > This patch introduces a new optional property for the consumers of the > > power-domains: domain-performance-state. > > > > If the consumers don't need the capability of switching to different > > domain performance states at runtime, then they can simply define their > > required domain performance state in their node directly. Otherwise the > > consumers can define their requirements with help of other > > infrastructure, for example the OPP table. > > > > Signed-off-by: Viresh Kumar viresh.kumar@linaro.org > > --- > > Documentation/devicetree/bindings/power/power_domain.txt | 6 ++++++ > > 1 file changed, 6 insertions(+) > > > > diff --git a/Documentation/devicetree/bindings/power/power_domain.txt b/Documentation/devicetree/bindings/power/power_domain.txt > > index e1650364b296..db42eacf8b5c 100644 > > --- a/Documentation/devicetree/bindings/power/power_domain.txt > > +++ b/Documentation/devicetree/bindings/power/power_domain.txt > > @@ -106,6 +106,12 @@ domain provided by the 'parent' power controller. > > - power-domains : A phandle and PM domain specifier as defined by bindings of > > the power controller specified by phandle. > > > > +Optional properties: > > +- domain-performance-state: A positive integer value representing the minimum > > + performance level (of the parent domain) required by the consumer for its > > + working. The integer value '1' represents the lowest performance level and the > > + highest value represents the highest performance level. > > How does one come up with the range of values?
Why would we need a range here? The value here represents the minimum 'state' and the assumption is that everything above that level would be fine. So the range is automatically: domain-performance-state -> MAX.
> It seems like you are > just making up numbers. Couldn't the domain performance level be an OPP > in the sense that it is a collection of clock frequencies and voltage > settings?
The clock is going to be handled by the device itself (at least for the case we have today) and the performance-state lies with the power-domain which is configured separately. If the performance level includes both clk and voltage, then why would we need to show the clock rates in the DT ? Wouldn't a performance level be enough in such cases?
I think the question is: what does the performance-level of a domain actually mean? Or, what are the units?
Depending on the SoC, there's probably a few things this could mean. It might mean is that an underlying bus/interconnect can be configured to guarantee a specific bandwidth or throughput. That in turn might mean that that bus/interconnect might have to be set at a specific frequency/voltage.
In your case, IIUC, you're just passing some magic value to some firmware running on a micro-controller, but under the hood that uC is probably configuring a frequency/voltage someplace.
In the case described by Viresh, it's only about setting the voltage of a power domain that is shared between different devices. these devices wants to run at different frequency (set by the devices) but we have to select a Volateg value that will match with the constraint of all devices (in this case the highest voltage)
Then, at least for this use case, we're talking about voltage, not some unspecified units.
In some cases we actually know the voltage of the domain and would want to put some voltage mapping in DT. For example, level 1 is voltage 2V and level 2 is voltage 2.5V. In other cases we don't know the voltage, all we know is the voltage "corner" which is a number from 0 to N that is translated into a voltage by the firmware but is otherwise unknown what that is outside of the firmware. In this case we've lost the units, but otherwise we're still interested in requesting some 'level' that the domain be operating in.
But that makes me wonder, this performance state sounds like something that is changing dynamically at runtime, so why do you want to describe this statically in DT?
This sounds to me like the job of the genpd. When any device in the domain does its pm_runtime_get(), the domain could check the device frequency and see if it needs to change the domain voltage in order for that device to operate at that frequency.
How do we check the device frequency? Does the domain need to know about the clocks for all devices that are in the domain and what clocks in there are contributing to the voltage requirement?
In out of tree solutions we've 'bucketized' the requirements of the devices into an array sized to the number of levels of the voltage domain. When a device requires a new level, we increment the new level and decrement the old level and then look for the largest non-zero index in the array. This is the inverse design of iterating over all devices in the domain to see what frequency they're running at to determine the voltage requirement. I guess using PM QoS would be similar here to do the aggregation and then tell the domain to go to that level.
When the device goes away (using pm_runtime_put()) the domain can check again if it could lower the voltage and still meet the requirements of the remaining devices.
That's only part of the job. The device can change its frequency and as a result ask for a new voltage index while it is already running
That's fine. Use clock notifiers, or better use QoS (with notifiers) so that the genpd knows when any of those change.
From my perspective clock notifiers are going to be ugly. At the
point we notify that a rate has changed we're deep in the clk framework holding the prepare mutex and we're calling it from an SRCU callback. If those callbacks need to turn on an i2c clk to communicate with some PMIC to change voltages we're in a world of pain due to our locking scheme. Maybe that's solvable with a different clk locking scheme though so I may be overly concerned here and everything will work out. Also, we don't have any notification that a clock is turned on or off right now, which sounds like we're going to assume is the case when a device gets pm_runtime_put().
On 23-11-16, 18:03, Stephen Boyd wrote:
On 11/23, Kevin Hilman wrote:
Vincent Guittot vincent.guittot@linaro.org writes:
On 23 November 2016 at 16:51, Kevin Hilman khilman@baylibre.com wrote:
Then, at least for this use case, we're talking about voltage, not some unspecified units.
In some cases we actually know the voltage of the domain and would want to put some voltage mapping in DT. For example, level 1 is voltage 2V and level 2 is voltage 2.5V.
But even in these cases we wouldn't be using the voltage values within the kernel as we will be giving only a performance state to the M3 core, right?
In other cases we don't know the voltage, all we know is the voltage "corner" which is a number from 0 to N that is translated into a voltage by the firmware but is otherwise unknown what that is outside of the firmware. In this case we've lost the units, but otherwise we're still interested in requesting some 'level' that the domain be operating in.
But that makes me wonder, this performance state sounds like something that is changing dynamically at runtime, so why do you want to describe this statically in DT?
Each frequency a device can operate in has the requirement of minimum performance state of the domain and so we need these values in the DT.
This sounds to me like the job of the genpd. When any device in the domain does its pm_runtime_get(), the domain could check the device frequency and see if it needs to change the domain voltage in order for that device to operate at that frequency.
Also note that the performance index may be required to be changed before updating the frequency in case we are increasing the frequency which needs a higher performance index to be set.
How do we check the device frequency? Does the domain need to know about the clocks for all devices that are in the domain and what clocks in there are contributing to the voltage requirement?
In out of tree solutions we've 'bucketized' the requirements of the devices into an array sized to the number of levels of the voltage domain. When a device requires a new level, we increment the new level and decrement the old level and then look for the largest non-zero index in the array.
For such a design we need to know the index-size in advance and I am not sure if we should get anything like that from the DT.
This is the inverse design of iterating over all devices in the domain to see what frequency they're running at to determine the voltage requirement. I guess using PM QoS would be similar here to do the aggregation and then tell the domain to go to that level.
When the device goes away (using pm_runtime_put()) the domain can check again if it could lower the voltage and still meet the requirements of the remaining devices.
This will be done nevertheless.
That's only part of the job. The device can change its frequency and as a result ask for a new voltage index while it is already running
That's fine. Use clock notifiers, or better use QoS (with notifiers) so that the genpd knows when any of those change.
Yes genpd will be handling it all but it will surely need to know the performance index for each individual clock rate we support.
The way I have written the code for now is this with another QOS request type DEV_PM_QOS_PERFORMANCE:
+static int _generic_set_opp_pd(...) +{ + ...
+ /* Scaling up? Scale voltage before frequency */ + if (freq > old_freq) + dev_pm_qos_update_request(req, perf); + + clk_set_rate(...); + + if (freq < old_freq) + dev_pm_qos_update_request(req, perf); + + return 0; +}
And genpd is registering its notifier for DEV_PM_QOS_PERFORMANCE request type where it accumulates requests from all the devices and selects the highest one.
On 11/23/2016 08:40 PM, Viresh Kumar wrote:
On 23-11-16, 18:03, Stephen Boyd wrote:
On 11/23, Kevin Hilman wrote:
Vincent Guittot vincent.guittot@linaro.org writes:
On 23 November 2016 at 16:51, Kevin Hilman khilman@baylibre.com wrote:
Then, at least for this use case, we're talking about voltage, not some unspecified units.
In some cases we actually know the voltage of the domain and would want to put some voltage mapping in DT. For example, level 1 is voltage 2V and level 2 is voltage 2.5V.
But even in these cases we wouldn't be using the voltage values within the kernel as we will be giving only a performance state to the M3 core, right?
Nope. In these cases we need to set a certain voltage and we do that by requesting it via the M3 core.
On 28-11-16, 10:27, Stephen Boyd wrote:
On 11/23/2016 08:40 PM, Viresh Kumar wrote:
But even in these cases we wouldn't be using the voltage values within the kernel as we will be giving only a performance state to the M3 core, right?
Nope. In these cases we need to set a certain voltage and we do that by requesting it via the M3 core.
Don't we need something like this then ?
parent: power-controller@12340000 { compatible = "foo,power-controller"; reg = <0x12340000 0x1000>; #power-domain-cells = <0>; domain-performance-states = <&perf_state0>; };
perf_state0: performance_states { pstate1: pstate@1 { index = <1>; /* Optional */ microvolt = <970000 975000 985000>; }; pstate2: pstate@2 { index = <2>; /* Optional */ microvolt = <970000 975000 985000>; }; pstate3: pstate@3 { index = <3>; /* Optional */ microvolt = <970000 975000 985000>; }; }
cpus { cpu@0 { ... power-domain = <&parent>; operating-points-v2 = <&cpu0_opp_table>; }; };
cpu0_opp_table: opp_table0 { compatible = "operating-points-v2"; opp-shared;
opp@1000000000 { opp-hz = /bits/ 64 <1000000000>; domain-performance-state = <&pstate1>; }; opp@1100000000 { opp-hz = /bits/ 64 <1100000000>; domain-performance-state = <&pstate2>; }; opp@1200000000 { opp-hz = /bits/ 64 <1200000000>; domain-performance-state = <&pstate3>; }; };
On 11/29, Viresh Kumar wrote:
On 28-11-16, 10:27, Stephen Boyd wrote:
On 11/23/2016 08:40 PM, Viresh Kumar wrote:
But even in these cases we wouldn't be using the voltage values within the kernel as we will be giving only a performance state to the M3 core, right?
Nope. In these cases we need to set a certain voltage and we do that by requesting it via the M3 core.
Don't we need something like this then ?
Perhaps. One question is if we consider a shared regulator as a regulator in the kernel, or if we want to hide the regulator behind some other API that aggregates the users of the voltage. I don't see how to draw the line clearly between a regulator and a power domain that modifies a regulator underneath. It seems like everything that's using a regulator on the SoC could be using a power domain instead and then we could be aggregating the voltage requirements outside of the regulator APIs.
The only other way I can think of doing it is by having the voltages in the OPP tables for each device. That gets sort of messy though because all the devices calling regulator_set_voltage() have to set the min voltage to be their required voltage and the max to be the global max voltage on the system. Otherwise a higher voltage may not be used while it may be required. Of course, we could encode that as the last value in the triplet and everything works.
parent: power-controller@12340000 { compatible = "foo,power-controller"; reg = <0x12340000 0x1000>; #power-domain-cells = <0>; domain-performance-states = <&perf_state0>; };
perf_state0: performance_states { pstate1: pstate@1 { index = <1>; /* Optional */ microvolt = <970000 975000 985000>; }; pstate2: pstate@2 { index = <2>; /* Optional */ microvolt = <970000 975000 985000>; }; pstate3: pstate@3 { index = <3>; /* Optional */ microvolt = <970000 975000 985000>; }; }
cpus { cpu@0 { ... power-domain = <&parent>; operating-points-v2 = <&cpu0_opp_table>; }; };
cpu0_opp_table: opp_table0 { compatible = "operating-points-v2"; opp-shared;
opp@1000000000 { opp-hz = /bits/ 64 <1000000000>; domain-performance-state = <&pstate1>;
What do we do if the device is part of multiple domains? For example it may be part of two power domains for different pieces of the silicon within one device, and we may want to independently control those domains depending on the clock frequency.
}; opp@1100000000 { opp-hz = /bits/ 64 <1100000000>; domain-performance-state = <&pstate2>; }; opp@1200000000 { opp-hz = /bits/ 64 <1200000000>; domain-performance-state = <&pstate3>; };
On 29-11-16, 17:08, Stephen Boyd wrote:
Perhaps. One question is if we consider a shared regulator as a regulator in the kernel, or if we want to hide the regulator behind some other API that aggregates the users of the voltage. I don't see how to draw the line clearly between a regulator and a
Neither do I :)
power domain that modifies a regulator underneath. It seems like everything that's using a regulator on the SoC could be using a power domain instead and then we could be aggregating the voltage requirements outside of the regulator APIs.
If I am not wrong the regulator API chooses an intersection instead of the max requested voltage and so it may not fit very well in the use-case we are trying to solve.
The only other way I can think of doing it is by having the voltages in the OPP tables for each device. That gets sort of messy though because all the devices calling regulator_set_voltage() have to set the min voltage to be their required voltage and the max to be the global max voltage on the system. Otherwise a higher voltage may not be used while it may be required. Of course, we could encode that as the last value in the triplet and everything works.
Where will we get the performance levels in this case ?
What do we do if the device is part of multiple domains? For example it may be part of two power domains for different pieces of the silicon within one device, and we may want to independently control those domains depending on the clock frequency.
I thought the best way to handle would be to add a virtual domain for such a device. That domain shall be responsible for changing its parent based on the performance level selected, and at that point only the recalculations for both the parents should happen to select the new best performance level.
@Rob / Kevin: Do you have any inputs on the things we are discussing here? I want to involve you guys as early as possible, or we will come back to this again :(
Some platforms have the capability to configure the performance state of their Power Domains. The performance levels are represented by positive integer values, a lower value represents lower performance state.
If the consumers don't need the capability of switching to different domain performance states at runtime, then they can simply define their required domain performance state in their nodes directly.
But if the device needs the capability of switching to different domain performance states, as they may need to support different clock rates, then the per OPP node can be used to contain that information.
This patch introduces the domain-performance-state (already defined by Power Domain bindings) to the per OPP node. It can contain a single positive integer value. An example is also provided.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- Documentation/devicetree/bindings/opp/opp.txt | 57 +++++++++++++++++++++++++++ 1 file changed, 57 insertions(+)
diff --git a/Documentation/devicetree/bindings/opp/opp.txt b/Documentation/devicetree/bindings/opp/opp.txt index ee91cbdd95ee..9fb7804f784d 100644 --- a/Documentation/devicetree/bindings/opp/opp.txt +++ b/Documentation/devicetree/bindings/opp/opp.txt @@ -145,6 +145,14 @@ properties.
- status: Marks the node enabled/disabled.
+- domain-performance-state: A positive integer value representing the minimum + performance level (of the parent domain) required by the consumer for the + working of respective OPP. The integer value '1' represents the lowest + performance level and the highest value represents the highest performance + level. The consumer device node (which contains phandle to the OPP table in + its "operating-points-v2" property) should have its "power-domains" property + set as well. + Example 1: Single cluster Dual-core ARM cortex A9, switch DVFS states together.
/ { @@ -517,3 +525,52 @@ Example 5: opp-supported-hw }; }; }; + +Example 7: domain-Performance-state: +(example: For 1GHz require domain state 1 and for 1.1 & 1.2 GHz require state 2) + +/ { + cpus { + #address-cells = <1>; + #size-cells = <0>; + + cpu@0 { + compatible = "arm,cortex-a9"; + reg = <0>; + next-level-cache = <&L2>; + clocks = <&clk_controller 0>; + clock-names = "cpu"; + cpu-supply = <&cpu_supply0>; + operating-points-v2 = <&cpu0_opp_table>; + power-domains = <&foo_pd>; + }; + }; + + cpu0_opp_table: opp_table0 { + compatible = "operating-points-v2"; + opp-shared; + + opp@1000000000 { + opp-hz = /bits/ 64 <1000000000>; + opp-microvolt = <970000 975000 985000>; + opp-microamp = <70000>; + clock-latency-ns = <300000>; + opp-suspend; + domain-performance-state = <1>; + }; + opp@1100000000 { + opp-hz = /bits/ 64 <1100000000>; + opp-microvolt = <980000 1000000 1010000>; + opp-microamp = <80000>; + clock-latency-ns = <310000>; + domain-performance-state = <2>; + }; + opp@1200000000 { + opp-hz = /bits/ 64 <1200000000>; + opp-microvolt = <1025000>; + clock-latency-ns = <290000>; + turbo-mode; + domain-performance-state = <2>; + }; + }; +};
linaro-kernel@lists.linaro.org