Hi,
Here is the 7th version of the series and it looks very different from whatever is sent until now. Almost a rewrite.
Here is a brief summary of the problem I am trying to solve.
Some platforms have the capability to configure the performance state of their power domains. The process of configuring the performance state is pretty much platform dependent and we may need to work with a wide range of configurables. For some platforms, like Qcom, it can be a positive integer value alone, while in other cases it can be voltage levels, etc.
The power-domain framework until now was only designed for the idle state management of the device and this needs to change in order to reuse the power-domain framework for active state management of the devices.
Solution:
Kevin suggested in V6 that we should wait for a while before introducing any new binding (power-domain-opp) to the OPP table for this stuff and translate the device requirements into a performance index from within the power domain driver as it already knows which devices it supports.
If we do that, then there is also no need to represent the performance states of the power domains in the DT. Keep that as well in the driver for now.
This simplified things a lot and we can make things work with just two patches. The first one updates the genpd framework to supply new APIs and the second patch uses them from the OPP core. Its quite straight forward and simple.
The ideal way is still to get the relation between device and domain states via the DT instead of platform code, but that can be done incrementally later once we have some users for it. It would be much simpler to get these two patches merged. The code never got any real reviews in the last 6 versions as we were stuck with bindings :)
This is tested currently by hacking the kernel a bit with virtual power-domains for the dual A15 exynos platform.
Driver code: ------------
Here is some sample power-domain driver code that I have. It only supports a single device for now, CPU.
int pd_get_performance(struct device *dev, unsigned long rate) { struct device *cpu_dev = get_cpu_device(0);
if (cpu_dev != dev) return -EINVAL;
if (rate <= 500000000) return 1; else if (rate <= 700000000) return 2; else if (rate <= 900000000) return 3; else if (rate <= 1200000000) return 4; else if (rate <= 1600000000) return 5; else return 6; }
static int pd_set_performance(struct generic_pm_domain *domain, unsigned int state) { /* Set performance of the domain in platform dependent way */
return 0; }
static const struct of_device_id pm_domain_of_match[] __initconst = { { .compatible = "foo,genpd", }, { }, };
static int __init genpd_test_init(void) { struct device *dev = get_cpu_device(0); struct device_node *np; const struct of_device_id *match; int n; int ret;
for_each_matching_node_and_match(np, pm_domain_of_match, &match) { pd.name = kstrdup_const(strrchr(np->full_name, '/') + 1, GFP_KERNEL); if (!pd.name) { of_node_put(np); return -ENOMEM; }
pd.get_performance_state = pd_get_performance; pd.set_performance_state = pd_set_performance;
pm_genpd_init(&pd, NULL, false); of_genpd_add_provider_simple(np, &pd); }
ret = dev_pm_domain_attach(dev, false);
return ret; }
Pushed here as well:
https://git.linaro.org/people/viresh.kumar/linux.git/log/?h=opp/genpd-perfor...
Rebased on: pm/linux-next + some OPP cleanups (https://marc.info/?l=linux-kernel&m=149499607030364&w=2)
V6->V7: - Almost a rewrite, only two patches against 9 in earlier version. - No bindings updated now and domain's performance state aren't passed via DT for now (until we know how users are going to use it). - We also skipped the QoS framework completely and new APIs are provided directly by genpd.
V5->V6: - Use freq/voltage in OPP table as it is for power domain and don't create "domain-performance-level" property - Create new "power-domain-opp" property for the devices. - Take care of domain providers that provide multiple domains and extend "operating-points-v2" property to contain a list of phandles - Update code according to those bindings.
V4->V5: - Only 3 patches were resent and 2 of them are Acked from Ulf.
V3->V4: - Use OPP table for genpd devices as well. - Add struct device to genpd, in order to reuse OPP infrastructure. - Based over: https://marc.info/?l=linux-kernel&m=148972988002317&w=2 - Fixed examples in DT document to have voltage in target,min,max order.
V2->V3: - Based over latest pm/linux-next - Bindings and code are merged together - Lots of updates in bindings - the performance-states node is present within the power-domain now, instead of its phandle. - performance-level property is replaced by "reg". - domain-performance-state property of the consumers contain an integer value now instead of phandle. - Lots of updates to the code as well - Patch "PM / QOS: Add default case to the switch" is merged with other patches and the code is changed a bit as well. - Don't pass 'type' to dev_pm_qos_add_notifier(), rather handle all notifiers with a single list. A new patch is added for that. - The OPP framework patch can be applied now and has proper SoB from me. - Dropped "PM / domain: Save/restore performance state at runtime suspend/resume". - Drop all WARN(). - Tested-by Rajendra nayak.
V1->V2: - Based over latest pm/linux-next - It is mostly a resend of what is sent earlier as this series hasn't got any reviews so far and Rafael suggested that its better I resend it. - Only the 4/6 patch got an update, which was shared earlier as reply to V1 as well. It has got several fixes for taking care of power domain hierarchy, etc.
-- viresh
Viresh Kumar (2): PM / Domains: Add support to select performance-state of domains PM / OPP: Support updating performance state of device's power domains
drivers/base/power/domain.c | 172 ++++++++++++++++++++++++++++++++++++++++++ drivers/base/power/opp/core.c | 48 +++++++++++- drivers/base/power/opp/opp.h | 2 + include/linux/pm_domain.h | 19 +++++ 4 files changed, 240 insertions(+), 1 deletion(-)
Some platforms have the capability to configure the performance state of their Power Domains. The performance levels are identified by positive integer values, a lower value represents lower performance state.
This patch adds a new genpd API: pm_genpd_update_performance_state(). The caller passes the affected device and the frequency representing its next DVFS state.
The power domains get two new callbacks:
- get_performance_state(): This is called by the genpd core to retrieve the performance state (integer value) corresponding to a target frequency for the device. This state is used by the genpd core to find the highest requested state by all the devices powered by a domain.
- set_performance_state(): The highest state retrieved from above interface is then passed to this callback to finally program the performance state of the power domain.
The power domains can avoid supplying these callbacks, if they don't support setting performance-states.
A power domain may have only get_performance_state() callback, if it doesn't have the capability of changing the performance state itself but someone in its parent hierarchy has.
A power domain may have only set_performance_state(), it doesn't have any direct devices below it and only subdomains. And so the get_performance_state() will never get called.
The more common case would be to have both the callbacks set.
Another API, pm_genpd_has_performance_state(), is also added to let other parts of the kernel check if the power domain of a device supports performance-states or not.
Note that the same performance level as returned by the parent domain of a device is used for every other domain in parent hierarchy.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/base/power/domain.c | 172 ++++++++++++++++++++++++++++++++++++++++++++ include/linux/pm_domain.h | 19 +++++ 2 files changed, 191 insertions(+)
diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c index 71c95ad808d5..56b666eb1a71 100644 --- a/drivers/base/power/domain.c +++ b/drivers/base/power/domain.c @@ -466,6 +466,166 @@ static int genpd_dev_pm_qos_notifier(struct notifier_block *nb, return NOTIFY_DONE; }
+/* + * Returns true if anyone in genpd's parent hierarchy has + * set_performance_state() set. + */ +static bool genpd_has_set_performance_state(struct generic_pm_domain *genpd) +{ + struct gpd_link *link; + + if (genpd->set_performance_state) + return true; + + list_for_each_entry(link, &genpd->slave_links, slave_node) { + if (genpd_has_set_performance_state(link->master)) + return true; + } + + return false; +} + +/** + * pm_genpd_has_performance_state - Checks if power domain does performance + * state management. + * + * @dev: Device whose power domain is getting inquired. + */ +bool pm_genpd_has_performance_state(struct device *dev) +{ + struct generic_pm_domain *genpd = dev_to_genpd(dev); + + /* The parent domain must have set get_performance_state() */ + if (!IS_ERR(genpd) && genpd->get_performance_state) + return true; + + return false; +} +EXPORT_SYMBOL_GPL(pm_genpd_has_performance_state); + +static int genpd_update_performance_state(struct generic_pm_domain *genpd, + int depth) +{ + struct generic_pm_domain_data *pd_data; + struct generic_pm_domain *subdomain; + struct pm_domain_data *pdd; + unsigned int state = 0, prev; + struct gpd_link *link; + int ret; + + /* Traverse all devices within the domain */ + list_for_each_entry(pdd, &genpd->dev_list, list_node) { + pd_data = to_gpd_data(pdd); + + if (pd_data->performance_state > state) + state = pd_data->performance_state; + } + + /* Traverse all subdomains within the domain */ + list_for_each_entry(link, &genpd->master_links, master_node) { + subdomain = link->slave; + + if (subdomain->performance_state > state) + state = subdomain->performance_state; + } + + if (genpd->performance_state == state) + return 0; + + /* + * Not all domains support updating performance state. Move on to their + * parent domains in that case. + */ + if (genpd->set_performance_state) { + ret = genpd->set_performance_state(genpd, state); + if (!ret) + genpd->performance_state = state; + + return ret; + } + + prev = genpd->performance_state; + genpd->performance_state = state; + + list_for_each_entry(link, &genpd->slave_links, slave_node) { + struct generic_pm_domain *master = link->master; + + genpd_lock_nested(master, depth + 1); + ret = genpd_update_performance_state(master, depth + 1); + genpd_unlock(master); + + if (!ret) + continue; + + /* + * Preserve earlier state for all the power domains which have + * been updated. + */ + genpd->performance_state = prev; + + list_for_each_entry(link, &genpd->slave_links, slave_node) { + struct generic_pm_domain *nmaster = link->master; + + if (nmaster == master) + break; + + genpd_lock_nested(nmaster, depth + 1); + genpd_update_performance_state(master, depth + 1); + genpd_unlock(master); + } + + return ret; + } + + return 0; +} + +/** + * pm_genpd_update_performance_state - Update performance state of parent power + * domain for the target frequency for the device. + * + * @dev: Device for which the performance-state needs to be adjusted. + * @rate: Device's next frequency. + */ +int pm_genpd_update_performance_state(struct device *dev, unsigned long rate) +{ + struct generic_pm_domain *genpd = ERR_PTR(-ENODATA); + struct generic_pm_domain_data *gpd_data; + int ret, prev; + + spin_lock_irq(&dev->power.lock); + if (dev->power.subsys_data && dev->power.subsys_data->domain_data) { + gpd_data = to_gpd_data(dev->power.subsys_data->domain_data); + genpd = dev_to_genpd(dev); + } + spin_unlock_irq(&dev->power.lock); + + if (IS_ERR(genpd)) + return -ENODEV; + + if (!genpd->get_performance_state) { + dev_err(dev, "Performance states aren't supported by power domain\n"); + return -EINVAL; + } + + genpd_lock(genpd); + ret = genpd->get_performance_state(dev, rate); + if (ret < 0) + goto unlock; + + prev = gpd_data->performance_state; + gpd_data->performance_state = ret; + ret = genpd_update_performance_state(genpd, 0); + if (ret) + gpd_data->performance_state = prev; + +unlock: + genpd_unlock(genpd); + + return ret; +} +EXPORT_SYMBOL_GPL(pm_genpd_update_performance_state); + /** * genpd_power_off_work_fn - Power off PM domain whose subdomain count is 0. * @work: Work structure used for scheduling the execution of this function. @@ -1156,6 +1316,7 @@ static struct generic_pm_domain_data *genpd_alloc_dev_data(struct device *dev, gpd_data->td.constraint_changed = true; gpd_data->td.effective_constraint_ns = -1; gpd_data->nb.notifier_call = genpd_dev_pm_qos_notifier; + gpd_data->performance_state = 0;
spin_lock_irq(&dev->power.lock);
@@ -1502,6 +1663,17 @@ int pm_genpd_init(struct generic_pm_domain *genpd, genpd->dev_ops.start = pm_clk_resume; }
+ /* + * If this genpd supports getting performance state, then someone in its + * hierarchy must support setting it too. + */ + if (genpd->get_performance_state && + !genpd_has_set_performance_state(genpd)) { + pr_err("%s: genpd doesn't support updating performance state\n", + genpd->name); + return -ENODEV; + } + /* Always-on domains must be powered on at initialization. */ if (genpd_is_always_on(genpd) && !genpd_status_on(genpd)) return -EINVAL; diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h index b7803a251044..74502664ea33 100644 --- a/include/linux/pm_domain.h +++ b/include/linux/pm_domain.h @@ -63,8 +63,12 @@ struct generic_pm_domain { unsigned int device_count; /* Number of devices */ unsigned int suspended_count; /* System suspend device counter */ unsigned int prepared_count; /* Suspend counter of prepared devices */ + unsigned int performance_state; /* Max requested performance state */ int (*power_off)(struct generic_pm_domain *domain); int (*power_on)(struct generic_pm_domain *domain); + int (*get_performance_state)(struct device *dev, unsigned long rate); + int (*set_performance_state)(struct generic_pm_domain *domain, + unsigned int state); struct gpd_dev_ops dev_ops; s64 max_off_time_ns; /* Maximum allowed "suspended" time. */ bool max_off_time_changed; @@ -118,6 +122,7 @@ struct generic_pm_domain_data { struct pm_domain_data base; struct gpd_timing_data td; struct notifier_block nb; + unsigned int performance_state; void *data; };
@@ -148,6 +153,9 @@ extern int pm_genpd_remove(struct generic_pm_domain *genpd);
extern struct dev_power_governor simple_qos_governor; extern struct dev_power_governor pm_domain_always_on_gov; +extern bool pm_genpd_has_performance_state(struct device *dev); +extern int pm_genpd_update_performance_state(struct device *dev, + unsigned long rate); #else
static inline struct generic_pm_domain_data *dev_gpd_data(struct device *dev) @@ -185,6 +193,17 @@ static inline int pm_genpd_remove(struct generic_pm_domain *genpd) return -ENOTSUPP; }
+static inline bool pm_genpd_has_performance_state(struct device *dev) +{ + return false; +} + +static inline int pm_genpd_update_performance_state(struct device *dev, + unsigned long rate) +{ + return -ENOTSUPP; +} + #define simple_qos_governor (*(struct dev_power_governor *)(NULL)) #define pm_domain_always_on_gov (*(struct dev_power_governor *)(NULL)) #endif
On 17 May 2017 at 09:02, Viresh Kumar viresh.kumar@linaro.org wrote:
Some platforms have the capability to configure the performance state of their Power Domains. The performance levels are identified by positive integer values, a lower value represents lower performance state.
This patch adds a new genpd API: pm_genpd_update_performance_state(). The caller passes the affected device and the frequency representing its next DVFS state.
The power domains get two new callbacks:
get_performance_state(): This is called by the genpd core to retrieve the performance state (integer value) corresponding to a target frequency for the device. This state is used by the genpd core to find the highest requested state by all the devices powered by a domain.
set_performance_state(): The highest state retrieved from above interface is then passed to this callback to finally program the performance state of the power domain.
The power domains can avoid supplying these callbacks, if they don't support setting performance-states.
A power domain may have only get_performance_state() callback, if it doesn't have the capability of changing the performance state itself but someone in its parent hierarchy has.
A power domain may have only set_performance_state(), it doesn't have any direct devices below it and only subdomains. And so the get_performance_state() will never get called.
The more common case would be to have both the callbacks set.
Another API, pm_genpd_has_performance_state(), is also added to let other parts of the kernel check if the power domain of a device supports performance-states or not.
Note that the same performance level as returned by the parent domain of a device is used for every other domain in parent hierarchy.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
Apologize for the delay! I was kind of waiting for Kevin to jump in as he has already been in the loop.
Anyway, let me give you my thoughts about his. Starting with some comment, then an overall thought about this.
drivers/base/power/domain.c | 172 ++++++++++++++++++++++++++++++++++++++++++++ include/linux/pm_domain.h | 19 +++++ 2 files changed, 191 insertions(+)
diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c index 71c95ad808d5..56b666eb1a71 100644 --- a/drivers/base/power/domain.c +++ b/drivers/base/power/domain.c @@ -466,6 +466,166 @@ static int genpd_dev_pm_qos_notifier(struct notifier_block *nb, return NOTIFY_DONE; }
+/*
- Returns true if anyone in genpd's parent hierarchy has
- set_performance_state() set.
- */
+static bool genpd_has_set_performance_state(struct generic_pm_domain *genpd) +{
struct gpd_link *link;
if (genpd->set_performance_state)
return true;
list_for_each_entry(link, &genpd->slave_links, slave_node) {
if (genpd_has_set_performance_state(link->master))
return true;
}
return false;
+}
+/**
- pm_genpd_has_performance_state - Checks if power domain does performance
- state management.
- @dev: Device whose power domain is getting inquired.
- */
+bool pm_genpd_has_performance_state(struct device *dev) +{
struct generic_pm_domain *genpd = dev_to_genpd(dev);
/* The parent domain must have set get_performance_state() */
if (!IS_ERR(genpd) && genpd->get_performance_state)
return true;
return false;
+} +EXPORT_SYMBOL_GPL(pm_genpd_has_performance_state);
There is no protection from locking point view and there is no validation the pm_domain pointer being a valid genpd.
Please study the locks that is used in genpd more carefully and try to figure out what is needed here. I know it's a bit tricky, but the above just isn't sufficient.
+static int genpd_update_performance_state(struct generic_pm_domain *genpd,
int depth)
+{
struct generic_pm_domain_data *pd_data;
struct generic_pm_domain *subdomain;
struct pm_domain_data *pdd;
unsigned int state = 0, prev;
struct gpd_link *link;
int ret;
/* Traverse all devices within the domain */
list_for_each_entry(pdd, &genpd->dev_list, list_node) {
pd_data = to_gpd_data(pdd);
if (pd_data->performance_state > state)
state = pd_data->performance_state;
}
/* Traverse all subdomains within the domain */
list_for_each_entry(link, &genpd->master_links, master_node) {
subdomain = link->slave;
if (subdomain->performance_state > state)
state = subdomain->performance_state;
You need to take the genpd lock of the subdomain before assigning this, right?
Is there a way to do this the opposite direction? It's important that we walk upwards in the topology, as we need to preserve the order for how genpd locks are taken.
The way we do it is always taking the lock of the subdomain first, then its master, then the master's master and so own. You can have look at for example genpd_power_on().
}
if (genpd->performance_state == state)
return 0;
/*
* Not all domains support updating performance state. Move on to their
* parent domains in that case.
*/
if (genpd->set_performance_state) {
ret = genpd->set_performance_state(genpd, state);
if (!ret)
genpd->performance_state = state;
return ret;
}
prev = genpd->performance_state;
genpd->performance_state = state;
list_for_each_entry(link, &genpd->slave_links, slave_node) {
struct generic_pm_domain *master = link->master;
genpd_lock_nested(master, depth + 1);
ret = genpd_update_performance_state(master, depth + 1);
genpd_unlock(master);
if (!ret)
continue;
/*
* Preserve earlier state for all the power domains which have
* been updated.
*/
genpd->performance_state = prev;
list_for_each_entry(link, &genpd->slave_links, slave_node) {
struct generic_pm_domain *nmaster = link->master;
if (nmaster == master)
break;
genpd_lock_nested(nmaster, depth + 1);
genpd_update_performance_state(master, depth + 1);
genpd_unlock(master);
}
I think some comment on how to walk the genpd topology to fetch the current selected state would be nice. I am not sure I get it by just looking at the code.
First you are walking master links then slave links. Then you start over again with the slave links in the error path.
Normally in the error path we use list_for_each_entry_continue_reverse() to restore data, so this looks a bit weird to me. Are you sure the error path is correct?
return ret;
}
return 0;
+}
+/**
- pm_genpd_update_performance_state - Update performance state of parent power
- domain for the target frequency for the device.
- @dev: Device for which the performance-state needs to be adjusted.
- @rate: Device's next frequency.
- */
+int pm_genpd_update_performance_state(struct device *dev, unsigned long rate) +{
struct generic_pm_domain *genpd = ERR_PTR(-ENODATA);
struct generic_pm_domain_data *gpd_data;
int ret, prev;
spin_lock_irq(&dev->power.lock);
if (dev->power.subsys_data && dev->power.subsys_data->domain_data) {
gpd_data = to_gpd_data(dev->power.subsys_data->domain_data);
genpd = dev_to_genpd(dev);
}
spin_unlock_irq(&dev->power.lock);
So even if you fetch the genpd pointer here, you don't protect the device from detached from its genpd after this point (thus freeing the domain data).
To move this issue even further, there is no guarantee that the genpd don't get removed after this point, but you are still operating on the pointer...
if (IS_ERR(genpd))
return -ENODEV;
if (!genpd->get_performance_state) {
dev_err(dev, "Performance states aren't supported by power domain\n");
return -EINVAL;
}
genpd_lock(genpd);
ret = genpd->get_performance_state(dev, rate);
if (ret < 0)
goto unlock;
prev = gpd_data->performance_state;
gpd_data->performance_state = ret;
ret = genpd_update_performance_state(genpd, 0);
if (ret)
gpd_data->performance_state = prev;
+unlock:
genpd_unlock(genpd);
return ret;
+} +EXPORT_SYMBOL_GPL(pm_genpd_update_performance_state);
/**
- genpd_power_off_work_fn - Power off PM domain whose subdomain count is 0.
- @work: Work structure used for scheduling the execution of this function.
@@ -1156,6 +1316,7 @@ static struct generic_pm_domain_data *genpd_alloc_dev_data(struct device *dev, gpd_data->td.constraint_changed = true; gpd_data->td.effective_constraint_ns = -1; gpd_data->nb.notifier_call = genpd_dev_pm_qos_notifier;
gpd_data->performance_state = 0; spin_lock_irq(&dev->power.lock);
@@ -1502,6 +1663,17 @@ int pm_genpd_init(struct generic_pm_domain *genpd, genpd->dev_ops.start = pm_clk_resume; }
/*
* If this genpd supports getting performance state, then someone in its
* hierarchy must support setting it too.
*/
if (genpd->get_performance_state &&
!genpd_has_set_performance_state(genpd)) {
pr_err("%s: genpd doesn't support updating performance state\n",
genpd->name);
return -ENODEV;
}
/* Always-on domains must be powered on at initialization. */ if (genpd_is_always_on(genpd) && !genpd_status_on(genpd)) return -EINVAL;
diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h index b7803a251044..74502664ea33 100644 --- a/include/linux/pm_domain.h +++ b/include/linux/pm_domain.h @@ -63,8 +63,12 @@ struct generic_pm_domain { unsigned int device_count; /* Number of devices */ unsigned int suspended_count; /* System suspend device counter */ unsigned int prepared_count; /* Suspend counter of prepared devices */
unsigned int performance_state; /* Max requested performance state */ int (*power_off)(struct generic_pm_domain *domain); int (*power_on)(struct generic_pm_domain *domain);
int (*get_performance_state)(struct device *dev, unsigned long rate);
int (*set_performance_state)(struct generic_pm_domain *domain,
unsigned int state); struct gpd_dev_ops dev_ops; s64 max_off_time_ns; /* Maximum allowed "suspended" time. */ bool max_off_time_changed;
@@ -118,6 +122,7 @@ struct generic_pm_domain_data { struct pm_domain_data base; struct gpd_timing_data td; struct notifier_block nb;
unsigned int performance_state; void *data;
};
@@ -148,6 +153,9 @@ extern int pm_genpd_remove(struct generic_pm_domain *genpd);
extern struct dev_power_governor simple_qos_governor; extern struct dev_power_governor pm_domain_always_on_gov; +extern bool pm_genpd_has_performance_state(struct device *dev); +extern int pm_genpd_update_performance_state(struct device *dev,
unsigned long rate);
#else
static inline struct generic_pm_domain_data *dev_gpd_data(struct device *dev) @@ -185,6 +193,17 @@ static inline int pm_genpd_remove(struct generic_pm_domain *genpd) return -ENOTSUPP; }
+static inline bool pm_genpd_has_performance_state(struct device *dev) +{
return false;
+}
+static inline int pm_genpd_update_performance_state(struct device *dev,
unsigned long rate)
+{
return -ENOTSUPP;
+}
#define simple_qos_governor (*(struct dev_power_governor *)(NULL)) #define pm_domain_always_on_gov (*(struct dev_power_governor *)(NULL))
#endif
2.13.0.303.g4ebf3021692d
As you probably figured out from my above comments, the solution here isn't really working.
Adding these new APIs, more or less requires us to manage reference counting on the genpd for the device. I haven't thought about that in great detail, just that we of course need to be careful if we want to introduce something like that, to make sure all aspect of locking becomes correct.
Moreover adding reference counting touches the idea of adding APIs to genpd, to allow users/drivers to control their genpd explicitly. This is required to support > 1 pm_domain per device. I assume you have been following that discussion for genpd on linux-pm as well. My point is that, we should consider that while inventing *any* new APIs.
Kind regards Uffe
On 07-06-17, 14:26, Ulf Hansson wrote:
On 17 May 2017 at 09:02, Viresh Kumar viresh.kumar@linaro.org wrote:
Apologize for the delay! I was kind of waiting for Kevin to jump in as he has already been in the loop.
Thanks for getting these reviewed eventually :)
+bool pm_genpd_has_performance_state(struct device *dev) +{
struct generic_pm_domain *genpd = dev_to_genpd(dev);
/* The parent domain must have set get_performance_state() */
if (!IS_ERR(genpd) && genpd->get_performance_state)
return true;
return false;
+} +EXPORT_SYMBOL_GPL(pm_genpd_has_performance_state);
There is no protection from locking point view
I hope you are talking about not taking genpd_lock(genpd) here ? If yes, can you please explain what kind of race will that protect here with an example ?
As I really fail to see how can this be racy at all as we are just checking if a callback is set or not.
and there is no validation the pm_domain pointer being a valid genpd.
Isn't dev_to_genpd() already doing that? And the check !IS_ERR(genpd) is all we need. Sorry if I missed something really stupid.
Please study the locks that is used in genpd more carefully and try to figure out what is needed here. I know it's a bit tricky, but the above just isn't sufficient.
Sure.
+static int genpd_update_performance_state(struct generic_pm_domain *genpd,
int depth)
+{
struct generic_pm_domain_data *pd_data;
struct generic_pm_domain *subdomain;
struct pm_domain_data *pdd;
unsigned int state = 0, prev;
struct gpd_link *link;
int ret;
/* Traverse all devices within the domain */
list_for_each_entry(pdd, &genpd->dev_list, list_node) {
pd_data = to_gpd_data(pdd);
if (pd_data->performance_state > state)
state = pd_data->performance_state;
}
/* Traverse all subdomains within the domain */
list_for_each_entry(link, &genpd->master_links, master_node) {
subdomain = link->slave;
if (subdomain->performance_state > state)
state = subdomain->performance_state;
You need to take the genpd lock of the subdomain before assigning this, right?
Maybe not, but sure there is one problem you highlighted very well. subdomain->performance_state might have been updated from the same function (from a different thread) and we may end up using that incorrectly. Thanks for noticing the stupid mistake.
Is there a way to do this the opposite direction? It's important that we walk upwards in the topology, as we need to preserve the order for how genpd locks are taken.
At least I couldn't get around a solution which would be able to do this. All these requests generate from a device and its domain may have subdomains and devices. Though, we can add a constraint (see below) which can get us around this for now at least.
The way we do it is always taking the lock of the subdomain first, then its master, then the master's master and so own. You can have look at for example genpd_power_on().
Right. Locking was surely messy here.
}
if (genpd->performance_state == state)
return 0;
/*
* Not all domains support updating performance state. Move on to their
* parent domains in that case.
*/
if (genpd->set_performance_state) {
ret = genpd->set_performance_state(genpd, state);
if (!ret)
genpd->performance_state = state;
return ret;
}
prev = genpd->performance_state;
genpd->performance_state = state;
list_for_each_entry(link, &genpd->slave_links, slave_node) {
struct generic_pm_domain *master = link->master;
genpd_lock_nested(master, depth + 1);
ret = genpd_update_performance_state(master, depth + 1);
genpd_unlock(master);
if (!ret)
continue;
/*
* Preserve earlier state for all the power domains which have
* been updated.
*/
genpd->performance_state = prev;
list_for_each_entry(link, &genpd->slave_links, slave_node) {
struct generic_pm_domain *nmaster = link->master;
if (nmaster == master)
break;
genpd_lock_nested(nmaster, depth + 1);
genpd_update_performance_state(master, depth + 1);
genpd_unlock(master);
}
I think some comment on how to walk the genpd topology to fetch the current selected state would be nice. I am not sure I get it by just looking at the code.
First you are walking master links then slave links. Then you start over again with the slave links in the error path.
Sure, a comment might be helpful.
+int pm_genpd_update_performance_state(struct device *dev, unsigned long rate) +{
struct generic_pm_domain *genpd = ERR_PTR(-ENODATA);
struct generic_pm_domain_data *gpd_data;
int ret, prev;
spin_lock_irq(&dev->power.lock);
if (dev->power.subsys_data && dev->power.subsys_data->domain_data) {
gpd_data = to_gpd_data(dev->power.subsys_data->domain_data);
genpd = dev_to_genpd(dev);
}
spin_unlock_irq(&dev->power.lock);
So even if you fetch the genpd pointer here, you don't protect the device from detached from its genpd after this point (thus freeing the domain data).
To move this issue even further, there is no guarantee that the genpd don't get removed after this point, but you are still operating on the pointer...
That's another blunder. I agree. Thanks.
As you probably figured out from my above comments, the solution here isn't really working.
Adding these new APIs, more or less requires us to manage reference counting on the genpd for the device.
Hmm, I am not able to understand why would we need that with this series :(
I haven't thought about that in great detail, just that we of course need to be careful if we want to introduce something like that, to make sure all aspect of locking becomes correct.
Moreover adding reference counting touches the idea of adding APIs to genpd, to allow users/drivers to control their genpd explicitly. This is required to support > 1 pm_domain per device. I assume you have been following that discussion for genpd on linux-pm as well. My point is that, we should consider that while inventing *any* new APIs.
Not very deeply but yeah I have seen that thread.
So, I couldn't find way to get the locking thing fixed to avoid deadlocks but we can live with a constraint (if you guys think it is fine) to have this solved.
Constraint: Update performance state API wouldn't support power domains that don't provide a set_performance_state() callback and have multiple parent power domains.
Why: In order to guarantee that we read and update the performance_state of subdomains and devices properly, we need to do that under the parent domain's lock. And if we have multiple parent power domains, then we would be required to get them locked at once before updating subdomain's state and that would be a nightmare.
And here is the new diff based on above (only compile tested):
--- drivers/base/power/domain.c | 177 ++++++++++++++++++++++++++++++++++++++++++++ include/linux/pm_domain.h | 19 +++++ 2 files changed, 196 insertions(+)
diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c index 71c95ad808d5..cf090adc7967 100644 --- a/drivers/base/power/domain.c +++ b/drivers/base/power/domain.c @@ -466,6 +466,171 @@ static int genpd_dev_pm_qos_notifier(struct notifier_block *nb, return NOTIFY_DONE; }
+/* + * Returns true if anyone in genpd's parent hierarchy has + * set_performance_state() set. + */ +static bool genpd_has_set_performance_state(struct generic_pm_domain *genpd) +{ + struct gpd_link *link; + + if (genpd->set_performance_state) + return true; + + list_for_each_entry(link, &genpd->slave_links, slave_node) { + if (genpd_has_set_performance_state(link->master)) + return true; + } + + return false; +} + +/** + * pm_genpd_has_performance_state - Checks if power domain does performance + * state management. + * + * @dev: Device whose power domain is getting inquired. + */ +bool pm_genpd_has_performance_state(struct device *dev) +{ + struct generic_pm_domain *genpd = dev_to_genpd(dev); + + /* The parent domain must have set get_performance_state() */ + if (!IS_ERR(genpd) && genpd->get_performance_state) + return true; + + return false; +} +EXPORT_SYMBOL_GPL(pm_genpd_has_performance_state); + +/* + * Re-evaluate performance state of a power domain. Finds the highest requested + * performance state by the devices and subdomains within the power domain and + * then tries to change its performance state. If the power domain doesn't have + * a set_performance_state() callback, then we move the request to its parent + * power domain (with constraints explained below). + */ +static int genpd_update_performance_state(struct generic_pm_domain *genpd, + int depth) +{ + struct generic_pm_domain_data *pd_data; + struct generic_pm_domain *master; + struct generic_pm_domain *subdomain; + struct pm_domain_data *pdd; + unsigned int state = 0, prev; + struct gpd_link *link; + int ret; + + /* Traverse all devices within the domain */ + list_for_each_entry(pdd, &genpd->dev_list, list_node) { + pd_data = to_gpd_data(pdd); + + if (pd_data->performance_state > state) + state = pd_data->performance_state; + } + + /* Traverse all subdomains within the domain */ + list_for_each_entry(link, &genpd->master_links, master_node) { + subdomain = link->slave; + + if (subdomain->performance_state > state) + state = subdomain->performance_state; + } + + if (genpd->performance_state == state) + return 0; + + /* + * Not all domains support updating performance state. Move on to their + * parent domains in that case. + */ + if (genpd->set_performance_state) { + ret = genpd->set_performance_state(genpd, state); + if (!ret) + genpd->performance_state = state; + + return ret; + } + + /* + * Only support subdomains with single parent power domain to avoid + * complicated locking problems. We only take locks in the upwards + * direction (i.e. subdomain first, followed by parent domain). We also + * need to guarantee that the performance_state field of devices and + * subdomains of a domain is not updated (or read) without the domain's + * lock and that wouldn't work with multiple parents of a subdomain. + */ + if (!list_is_singular(&genpd->slave_links)) { + pr_err("%s: Domain has multiple parents, can't update performance state\n", + genpd->name); + return -EINVAL; + } + + link = list_first_entry(&genpd->slave_links, struct gpd_link, slave_node); + master = link->master; + + genpd_lock_nested(master, depth + 1); + prev = genpd->performance_state; + genpd->performance_state = state; + + ret = genpd_update_performance_state(master, depth + 1); + if (ret) + genpd->performance_state = prev; + genpd_unlock(master); + + return ret; +} + +/** + * pm_genpd_update_performance_state - Update performance state of parent power + * domain for the target frequency for the device. + * + * @dev: Device for which the performance-state needs to be adjusted. + * @rate: Device's next frequency. + */ +int pm_genpd_update_performance_state(struct device *dev, unsigned long rate) +{ + struct generic_pm_domain *genpd = ERR_PTR(-ENODATA); + struct generic_pm_domain_data *gpd_data; + int ret, prev; + + spin_lock_irq(&dev->power.lock); + if (dev->power.subsys_data && dev->power.subsys_data->domain_data) { + gpd_data = to_gpd_data(dev->power.subsys_data->domain_data); + genpd = dev_to_genpd(dev); + } + + if (IS_ERR(genpd)) { + ret = -ENODEV; + goto unlock; + } + + if (!genpd->get_performance_state) { + dev_err(dev, "Performance states aren't supported by power domain\n"); + ret = -EINVAL; + goto unlock; + } + + genpd_lock(genpd); + ret = genpd->get_performance_state(dev, rate); + if (ret < 0) + goto unlock_genpd; + + prev = gpd_data->performance_state; + gpd_data->performance_state = ret; + ret = genpd_update_performance_state(genpd, 0); + if (ret) + gpd_data->performance_state = prev; + +unlock_genpd: + genpd_unlock(genpd); +unlock: + spin_unlock_irq(&dev->power.lock); + + return ret; +} +EXPORT_SYMBOL_GPL(pm_genpd_update_performance_state); + /** * genpd_power_off_work_fn - Power off PM domain whose subdomain count is 0. * @work: Work structure used for scheduling the execution of this function. @@ -1156,6 +1321,7 @@ static struct generic_pm_domain_data *genpd_alloc_dev_data(struct device *dev, gpd_data->td.constraint_changed = true; gpd_data->td.effective_constraint_ns = -1; gpd_data->nb.notifier_call = genpd_dev_pm_qos_notifier; + gpd_data->performance_state = 0;
spin_lock_irq(&dev->power.lock);
@@ -1502,6 +1668,17 @@ int pm_genpd_init(struct generic_pm_domain *genpd, genpd->dev_ops.start = pm_clk_resume; }
+ /* + * If this genpd supports getting performance state, then someone in its + * hierarchy must support setting it too. + */ + if (genpd->get_performance_state && + !genpd_has_set_performance_state(genpd)) { + pr_err("%s: genpd doesn't support updating performance state\n", + genpd->name); + return -ENODEV; + } + /* Always-on domains must be powered on at initialization. */ if (genpd_is_always_on(genpd) && !genpd_status_on(genpd)) return -EINVAL; diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h index b7803a251044..74502664ea33 100644 --- a/include/linux/pm_domain.h +++ b/include/linux/pm_domain.h @@ -63,8 +63,12 @@ struct generic_pm_domain { unsigned int device_count; /* Number of devices */ unsigned int suspended_count; /* System suspend device counter */ unsigned int prepared_count; /* Suspend counter of prepared devices */ + unsigned int performance_state; /* Max requested performance state */ int (*power_off)(struct generic_pm_domain *domain); int (*power_on)(struct generic_pm_domain *domain); + int (*get_performance_state)(struct device *dev, unsigned long rate); + int (*set_performance_state)(struct generic_pm_domain *domain, + unsigned int state); struct gpd_dev_ops dev_ops; s64 max_off_time_ns; /* Maximum allowed "suspended" time. */ bool max_off_time_changed; @@ -118,6 +122,7 @@ struct generic_pm_domain_data { struct pm_domain_data base; struct gpd_timing_data td; struct notifier_block nb; + unsigned int performance_state; void *data; };
@@ -148,6 +153,9 @@ extern int pm_genpd_remove(struct generic_pm_domain *genpd);
extern struct dev_power_governor simple_qos_governor; extern struct dev_power_governor pm_domain_always_on_gov; +extern bool pm_genpd_has_performance_state(struct device *dev); +extern int pm_genpd_update_performance_state(struct device *dev, + unsigned long rate); #else
static inline struct generic_pm_domain_data *dev_gpd_data(struct device *dev) @@ -185,6 +193,17 @@ static inline int pm_genpd_remove(struct generic_pm_domain *genpd) return -ENOTSUPP; }
+static inline bool pm_genpd_has_performance_state(struct device *dev) +{ + return false; +} + +static inline int pm_genpd_update_performance_state(struct device *dev, + unsigned long rate) +{ + return -ENOTSUPP; +} + #define simple_qos_governor (*(struct dev_power_governor *)(NULL)) #define pm_domain_always_on_gov (*(struct dev_power_governor *)(NULL)) #endif
[...]
As you probably figured out from my above comments, the solution here isn't really working.
Adding these new APIs, more or less requires us to manage reference counting on the genpd for the device.
Hmm, I am not able to understand why would we need that with this series :(
I haven't thought about that in great detail, just that we of course need to be careful if we want to introduce something like that, to make sure all aspect of locking becomes correct.
Moreover adding reference counting touches the idea of adding APIs to genpd, to allow users/drivers to control their genpd explicitly. This is required to support > 1 pm_domain per device. I assume you have been following that discussion for genpd on linux-pm as well. My point is that, we should consider that while inventing *any* new APIs.
Not very deeply but yeah I have seen that thread.
So, I couldn't find way to get the locking thing fixed to avoid deadlocks but we can live with a constraint (if you guys think it is fine) to have this solved.
Constraint: Update performance state API wouldn't support power domains that don't provide a set_performance_state() callback and have multiple parent power domains.
No thanks.
If we are going to do this, let's do it properly. To me some minor constraints would be okay, but this is just too much.
Why: In order to guarantee that we read and update the performance_state of subdomains and devices properly, we need to do that under the parent domain's lock. And if we have multiple parent power domains, then we would be required to get them locked at once before updating subdomain's state and that would be a nightmare.
It's not a nightmare, just a tricky thing to solve. :-)
[...]
Kind regards Uffe
On 08-06-17, 09:48, Ulf Hansson wrote:
It's not a nightmare, just a tricky thing to solve. :-)
I may have just solved it actually :)
So the real locking problem was the case where a subdomain have multiple parent domains and how do we access its performance_state field from all the paths that originate from the parent's update and from the subdomains own path. And a single performance_state field isn't going to help in that as multiple locks are involved here. I have added another per parent-domain field and that will help get the locking quite simple. Here is the new patch (compile tested):
--- drivers/base/power/domain.c | 193 ++++++++++++++++++++++++++++++++++++++++++++ include/linux/pm_domain.h | 22 +++++ 2 files changed, 215 insertions(+)
diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c index 71c95ad808d5..40815974287f 100644 --- a/drivers/base/power/domain.c +++ b/drivers/base/power/domain.c @@ -466,6 +466,187 @@ static int genpd_dev_pm_qos_notifier(struct notifier_block *nb, return NOTIFY_DONE; }
+/* + * Returns true if anyone in genpd's parent hierarchy has + * set_performance_state() set. + */ +static bool genpd_has_set_performance_state(struct generic_pm_domain *genpd) +{ + struct gpd_link *link; + + if (genpd->set_performance_state) + return true; + + list_for_each_entry(link, &genpd->slave_links, slave_node) { + if (genpd_has_set_performance_state(link->master)) + return true; + } + + return false; +} + +/** + * pm_genpd_has_performance_state - Checks if power domain does performance + * state management. + * + * @dev: Device whose power domain is getting inquired. + */ +bool pm_genpd_has_performance_state(struct device *dev) +{ + struct generic_pm_domain *genpd = dev_to_genpd(dev); + + /* The parent domain must have set get_performance_state() */ + if (!IS_ERR(genpd) && genpd->get_performance_state) + return true; + + return false; +} +EXPORT_SYMBOL_GPL(pm_genpd_has_performance_state); + +/* + * Re-evaluate performance state of a power domain. Finds the highest requested + * performance state by the devices and subdomains within the power domain and + * then tries to change its performance state. If the power domain doesn't have + * a set_performance_state() callback, then we move the request to its parent + * power domain. + * + * Locking: Access (or update) to device's "pd_data->performance_state" field + * happens only with parent domain's lock held. Subdomains have their + * "genpd->performance_state" protected with their own lock (and they are the + * only user of this field) and their per-parent-domain + * "link->performance_state" field is protected with individual parent power + * domain's lock and is only accessed/updated with that lock held. + */ +static int genpd_update_performance_state(struct generic_pm_domain *genpd, + int depth) +{ + struct generic_pm_domain_data *pd_data; + struct generic_pm_domain *master; + struct pm_domain_data *pdd; + unsigned int state = 0, prev; + struct gpd_link *link; + int ret; + + /* Traverse all devices within the domain */ + list_for_each_entry(pdd, &genpd->dev_list, list_node) { + pd_data = to_gpd_data(pdd); + + if (pd_data->performance_state > state) + state = pd_data->performance_state; + } + + /* Traverse all subdomains within the domain */ + list_for_each_entry(link, &genpd->master_links, master_node) { + if (link->performance_state > state) + state = link->performance_state; + } + + if (genpd->performance_state == state) + return 0; + + /* + * Not all domains support updating performance state. Move on to their + * parent domains in that case. + */ + if (genpd->set_performance_state) { + ret = genpd->set_performance_state(genpd, state); + if (!ret) + genpd->performance_state = state; + + return ret; + } + + prev = genpd->performance_state; + + list_for_each_entry(link, &genpd->slave_links, slave_node) { + master = link->master; + + genpd_lock_nested(master, depth + 1); + + link->performance_state = state; + ret = genpd_update_performance_state(master, depth + 1); + if (ret) + link->performance_state = prev; + + genpd_unlock(master); + + if (ret) + goto err; + } + + /* + * All the parent domains are updated now, lets get genpd + * performance_state in sync with those. + */ + genpd->performance_state = state; + return 0; + +err: + list_for_each_entry_continue_reverse(link, &genpd->slave_links, + slave_node) { + master = link->master; + + genpd_lock_nested(master, depth + 1); + link->performance_state = prev; + if (genpd_update_performance_state(master, depth + 1)) + pr_err("%s: Failed to roll back to %d performance state\n", + genpd->name, prev); + genpd_unlock(master); + } + + return ret; +} + +/** + * pm_genpd_update_performance_state - Update performance state of parent power + * domain for the target frequency for the device. + * + * @dev: Device for which the performance-state needs to be adjusted. + * @rate: Device's next frequency. + */ +int pm_genpd_update_performance_state(struct device *dev, unsigned long rate) +{ + struct generic_pm_domain *genpd = ERR_PTR(-ENODATA); + struct generic_pm_domain_data *gpd_data; + int ret, prev; + + spin_lock_irq(&dev->power.lock); + if (dev->power.subsys_data && dev->power.subsys_data->domain_data) { + gpd_data = to_gpd_data(dev->power.subsys_data->domain_data); + genpd = dev_to_genpd(dev); + } + + if (IS_ERR(genpd)) { + ret = -ENODEV; + goto unlock; + } + + if (!genpd->get_performance_state) { + dev_err(dev, "Performance states aren't supported by power domain\n"); + ret = -EINVAL; + goto unlock; + } + + genpd_lock(genpd); + ret = genpd->get_performance_state(dev, rate); + if (ret < 0) + goto unlock_genpd; + + prev = gpd_data->performance_state; + gpd_data->performance_state = ret; + ret = genpd_update_performance_state(genpd, 0); + if (ret) + gpd_data->performance_state = prev; + +unlock_genpd: + genpd_unlock(genpd); +unlock: + spin_unlock_irq(&dev->power.lock); + + return ret; +} +EXPORT_SYMBOL_GPL(pm_genpd_update_performance_state); + /** * genpd_power_off_work_fn - Power off PM domain whose subdomain count is 0. * @work: Work structure used for scheduling the execution of this function. @@ -1156,6 +1337,7 @@ static struct generic_pm_domain_data *genpd_alloc_dev_data(struct device *dev, gpd_data->td.constraint_changed = true; gpd_data->td.effective_constraint_ns = -1; gpd_data->nb.notifier_call = genpd_dev_pm_qos_notifier; + gpd_data->performance_state = 0;
spin_lock_irq(&dev->power.lock);
@@ -1502,6 +1684,17 @@ int pm_genpd_init(struct generic_pm_domain *genpd, genpd->dev_ops.start = pm_clk_resume; }
+ /* + * If this genpd supports getting performance state, then someone in its + * hierarchy must support setting it too. + */ + if (genpd->get_performance_state && + !genpd_has_set_performance_state(genpd)) { + pr_err("%s: genpd doesn't support updating performance state\n", + genpd->name); + return -ENODEV; + } + /* Always-on domains must be powered on at initialization. */ if (genpd_is_always_on(genpd) && !genpd_status_on(genpd)) return -EINVAL; diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h index b7803a251044..bf90177208a2 100644 --- a/include/linux/pm_domain.h +++ b/include/linux/pm_domain.h @@ -63,8 +63,12 @@ struct generic_pm_domain { unsigned int device_count; /* Number of devices */ unsigned int suspended_count; /* System suspend device counter */ unsigned int prepared_count; /* Suspend counter of prepared devices */ + unsigned int performance_state; /* Max requested performance state */ int (*power_off)(struct generic_pm_domain *domain); int (*power_on)(struct generic_pm_domain *domain); + int (*get_performance_state)(struct device *dev, unsigned long rate); + int (*set_performance_state)(struct generic_pm_domain *domain, + unsigned int state); struct gpd_dev_ops dev_ops; s64 max_off_time_ns; /* Maximum allowed "suspended" time. */ bool max_off_time_changed; @@ -99,6 +103,9 @@ struct gpd_link { struct list_head master_node; struct generic_pm_domain *slave; struct list_head slave_node; + + /* Sub-domain's per-parent domain performance state */ + unsigned int performance_state; };
struct gpd_timing_data { @@ -118,6 +125,7 @@ struct generic_pm_domain_data { struct pm_domain_data base; struct gpd_timing_data td; struct notifier_block nb; + unsigned int performance_state; void *data; };
@@ -148,6 +156,9 @@ extern int pm_genpd_remove(struct generic_pm_domain *genpd);
extern struct dev_power_governor simple_qos_governor; extern struct dev_power_governor pm_domain_always_on_gov; +extern bool pm_genpd_has_performance_state(struct device *dev); +extern int pm_genpd_update_performance_state(struct device *dev, + unsigned long rate); #else
static inline struct generic_pm_domain_data *dev_gpd_data(struct device *dev) @@ -185,6 +196,17 @@ static inline int pm_genpd_remove(struct generic_pm_domain *genpd) return -ENOTSUPP; }
+static inline bool pm_genpd_has_performance_state(struct device *dev) +{ + return false; +} + +static inline int pm_genpd_update_performance_state(struct device *dev, + unsigned long rate) +{ + return -ENOTSUPP; +} + #define simple_qos_governor (*(struct dev_power_governor *)(NULL)) #define pm_domain_always_on_gov (*(struct dev_power_governor *)(NULL)) #endif
On 8 June 2017 at 11:42, Viresh Kumar viresh.kumar@linaro.org wrote:
On 08-06-17, 09:48, Ulf Hansson wrote:
It's not a nightmare, just a tricky thing to solve. :-)
I may have just solved it actually :)
That was quick. :-)
So the real locking problem was the case where a subdomain have multiple parent domains and how do we access its performance_state field from all the paths that originate from the parent's update and from the subdomains own path. And a single performance_state field isn't going to help in that as multiple locks are involved here. I have added another per parent-domain field and that will help get the locking quite simple. Here is the new patch (compile tested):
Obviously you didn't think about this long enough. Please spare me from having to review something of this complexity just being compile tested. I don't have unlimited bandwidth. :-)
I recommend at least some functional tests run together with lockdep tests.
drivers/base/power/domain.c | 193 ++++++++++++++++++++++++++++++++++++++++++++ include/linux/pm_domain.h | 22 +++++ 2 files changed, 215 insertions(+)
diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c index 71c95ad808d5..40815974287f 100644 --- a/drivers/base/power/domain.c +++ b/drivers/base/power/domain.c @@ -466,6 +466,187 @@ static int genpd_dev_pm_qos_notifier(struct notifier_block *nb, return NOTIFY_DONE; }
+/*
- Returns true if anyone in genpd's parent hierarchy has
- set_performance_state() set.
- */
+static bool genpd_has_set_performance_state(struct generic_pm_domain *genpd) +{
struct gpd_link *link;
if (genpd->set_performance_state)
return true;
list_for_each_entry(link, &genpd->slave_links, slave_node) {
if (genpd_has_set_performance_state(link->master))
return true;
}
return false;
+}
+/**
- pm_genpd_has_performance_state - Checks if power domain does performance
- state management.
- @dev: Device whose power domain is getting inquired.
- */
+bool pm_genpd_has_performance_state(struct device *dev) +{
struct generic_pm_domain *genpd = dev_to_genpd(dev);
/* The parent domain must have set get_performance_state() */
if (!IS_ERR(genpd) && genpd->get_performance_state)
return true;
return false;
+} +EXPORT_SYMBOL_GPL(pm_genpd_has_performance_state);
I think you didn't read my earlier comments well enough.
Who is going to call this? What prevents the genpd object from being removed in the middle of when you are operating on the genpd pointer? How can you even be sure the pm_domain pointer is a valid genpd object?
[...]
Kind regards Uffe
On 09-06-17, 13:00, Ulf Hansson wrote:
On 8 June 2017 at 11:42, Viresh Kumar viresh.kumar@linaro.org wrote:
On 08-06-17, 09:48, Ulf Hansson wrote:
It's not a nightmare, just a tricky thing to solve. :-)
I may have just solved it actually :)
That was quick. :-)
:)
So the real locking problem was the case where a subdomain have multiple parent domains and how do we access its performance_state field from all the paths that originate from the parent's update and from the subdomains own path. And a single performance_state field isn't going to help in that as multiple locks are involved here. I have added another per parent-domain field and that will help get the locking quite simple. Here is the new patch (compile tested):
Obviously you didn't think about this long enough.
That's a fair point of view of yours, but something similar to what I have done in this patch was running around in my head for a couple of days and so I went implementing it really quickly.
Please spare me from having to review something of this complexity just being compile tested. I don't have unlimited bandwidth. :-)
I agree, my aim wasn't to waste your time. I just wanted to get some sort of Ack on the idea of re-using the per-parent link structure for storing performance state and so went ahead just with compile tests.
I recommend at least some functional tests run together with lockdep tests.
I have done that now (with lockdep) for multiple cases:
- Device with a parent genpd which supports setting performance state. - Device with parent genpd which doesn't allow setting performance state, but the parent domains of the genpd allow that. This focusses on the complex part of this patch where the update is propagated to parents of a sub-domain.
And they are all working fine now.
+/**
- pm_genpd_has_performance_state - Checks if power domain does performance
- state management.
- @dev: Device whose power domain is getting inquired.
- */
+bool pm_genpd_has_performance_state(struct device *dev) +{
struct generic_pm_domain *genpd = dev_to_genpd(dev);
/* The parent domain must have set get_performance_state() */
if (!IS_ERR(genpd) && genpd->get_performance_state)
return true;
return false;
+} +EXPORT_SYMBOL_GPL(pm_genpd_has_performance_state);
I think you didn't read my earlier comments well enough.
Actually I did and asked you a couple of questions in my previous email, but as you haven't replied to them I thought you are fine with my arguments.
Anyway, we had some chat regarding this yesterday on IRC and I have understood what the problems can be here and so I have used your suggestions here now.
Who is going to call this?
Some user driver. For devices that support frequency scaling, the OPP core (if used by device) will call it.
For cases where there is no frequency scaling, the driver may call it directly.
What prevents the genpd object from being removed in the middle of when you are operating on the genpd pointer?
So there are three cases we have and they kind of follow the sort of assumptions already made in genpd.
1). Devices which have drivers for them, like MMC, etc. In such cases the genpd is attached to the device from the bus layer (platform/amba/etc) and so until the time driver is around, we are guaranteed to have the genpd structure. And because the driver creates/removes OPP tables, we will be fine here.
2). Devices for which we don't have drivers (at least for ARM platforms), like CPU. Here the platform specific power domain driver would be attaching the genpd with the devices instead of bus cores and so the domain driver has to guarantee that the genpd doesn't go away until the time some user driver (like cpufreq-dt.c) is using the device. This can be controlled by creating the cpufreq virtual device (which we create from cpufreq-dt-platdev.c today) from the domain driver itself.
So we perhaps don't need any refcount stuff for now at lest.
How can you even be sure the pm_domain pointer is a valid genpd object?
I understood this comment yesterday only and now I have used genpd_lookup_dev() instead of dev_to_genpd() here. The new patch with some minor updates from the last version is pasted below.
I have also asked Rajendra Nayak to provide some sample code for actual platform along with platform specific power domain driver. He should be providing that to me sometime this week. I may post the entire series again then along with his code.
The gendpd framework now provides an API to request device's power domain to update its performance state based on a particular target frequency for the device.
Use that interface from the OPP core for devices whose power domains support performance states.
Note that the current implementation is restricted to the case where the device doesn't have separate regulators for itself. We shouldn't over engineer the code before we have real use case for them. We can always come back and add more code to support such cases.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org --- drivers/base/power/opp/core.c | 48 ++++++++++++++++++++++++++++++++++++++++++- drivers/base/power/opp/opp.h | 2 ++ 2 files changed, 49 insertions(+), 1 deletion(-)
diff --git a/drivers/base/power/opp/core.c b/drivers/base/power/opp/core.c index 898f19ea0f60..3c1036f638f6 100644 --- a/drivers/base/power/opp/core.c +++ b/drivers/base/power/opp/core.c @@ -19,6 +19,7 @@ #include <linux/slab.h> #include <linux/device.h> #include <linux/export.h> +#include <linux/pm_domain.h> #include <linux/regulator/consumer.h>
#include "opp.h" @@ -535,6 +536,42 @@ _generic_set_opp_clk_only(struct device *dev, struct clk *clk, return ret; }
+static inline int +_generic_set_opp_domain(struct device *dev, struct clk *clk, + unsigned long old_freq, unsigned long freq) +{ + int ret; + + /* Scaling up? Scale domain performance state before frequency */ + if (freq > old_freq) { + ret = pm_genpd_update_performance_state(dev, freq); + if (ret) + return ret; + } + + ret = _generic_set_opp_clk_only(dev, clk, old_freq, freq); + if (ret) + goto restore_domain_state; + + /* Scaling down? Scale domain performance state after frequency */ + if (freq < old_freq) { + ret = pm_genpd_update_performance_state(dev, freq); + if (ret) + goto restore_freq; + } + + return 0; + +restore_freq: + if (_generic_set_opp_clk_only(dev, clk, freq, old_freq)) + dev_err(dev, "%s: failed to restore old-freq (%lu Hz)\n", + __func__, old_freq); +restore_domain_state: + pm_genpd_update_performance_state(dev, old_freq); + + return ret; +} + static int _generic_set_opp_regulator(struct opp_table *opp_table, struct device *dev, unsigned long old_freq, @@ -653,7 +690,14 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
/* Only frequency scaling */ if (!opp_table->regulators) { - ret = _generic_set_opp_clk_only(dev, clk, old_freq, freq); + /* + * We don't support devices with both regulator and + * domain performance-state for now. + */ + if (opp_table->genpd_performance_state) + ret = _generic_set_opp_domain(dev, clk, old_freq, freq); + else + ret = _generic_set_opp_clk_only(dev, clk, old_freq, freq); } else if (!opp_table->set_opp) { ret = _generic_set_opp_regulator(opp_table, dev, old_freq, freq, IS_ERR(old_opp) ? NULL : old_opp->supplies, @@ -755,6 +799,8 @@ static struct opp_table *_allocate_opp_table(struct device *dev) ret); }
+ opp_table->genpd_performance_state = pm_genpd_has_performance_state(dev); + BLOCKING_INIT_NOTIFIER_HEAD(&opp_table->head); INIT_LIST_HEAD(&opp_table->opp_list); mutex_init(&opp_table->lock); diff --git a/drivers/base/power/opp/opp.h b/drivers/base/power/opp/opp.h index 166eef990599..1efa253e1934 100644 --- a/drivers/base/power/opp/opp.h +++ b/drivers/base/power/opp/opp.h @@ -135,6 +135,7 @@ enum opp_table_access { * @clk: Device's clock handle * @regulators: Supply regulators * @regulator_count: Number of power supply regulators + * @genpd_performance_state: Device's power domain support performance state. * @set_opp: Platform specific set_opp callback * @set_opp_data: Data to be passed to set_opp callback * @dentry: debugfs dentry pointer of the real device directory (not links). @@ -170,6 +171,7 @@ struct opp_table { struct clk *clk; struct regulator **regulators; unsigned int regulator_count; + bool genpd_performance_state;
int (*set_opp)(struct dev_pm_set_opp_data *data); struct dev_pm_set_opp_data *set_opp_data;
linaro-kernel@lists.linaro.org