In _set_opp() we are normally bailing out when trying to set an OPP that is the current one. This make perfect sense, but becomes a problem when _set_required_opps() calls it recursively.
More precisely, when a required OPP is being shared by multiple PM domains, we end up skipping to request the corresponding performance-state for all of the PM domains, but the first one. Let's fix the problem, by calling _set_opp_level() from _set_required_opps() instead.
Fixes: e37440e7e2c2 ("OPP: Call dev_pm_opp_set_opp() for required OPPs") Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson ulf.hansson@linaro.org --- drivers/opp/core.c | 47 +++++++++++++++++++++++----------------------- 1 file changed, 24 insertions(+), 23 deletions(-)
diff --git a/drivers/opp/core.c b/drivers/opp/core.c index cb4611fe1b5b..45eca65f27f9 100644 --- a/drivers/opp/core.c +++ b/drivers/opp/core.c @@ -1061,6 +1061,28 @@ static int _set_opp_bw(const struct opp_table *opp_table, return 0; }
+static int _set_opp_level(struct device *dev, struct opp_table *opp_table, + struct dev_pm_opp *opp) +{ + unsigned int level = 0; + int ret = 0; + + if (opp) { + if (opp->level == OPP_LEVEL_UNSET) + return 0; + + level = opp->level; + } + + /* Request a new performance state through the device's PM domain. */ + ret = dev_pm_domain_set_performance_state(dev, level); + if (ret) + dev_err(dev, "Failed to set performance state %u (%d)\n", level, + ret); + + return ret; +} + /* This is only called for PM domain for now */ static int _set_required_opps(struct device *dev, struct opp_table *opp_table, struct dev_pm_opp *opp, bool up) @@ -1091,7 +1113,8 @@ static int _set_required_opps(struct device *dev, struct opp_table *opp_table, if (devs[index]) { required_opp = opp ? opp->required_opps[index] : NULL;
- ret = dev_pm_opp_set_opp(devs[index], required_opp); + ret = _set_opp_level(devs[index], opp_table, + required_opp); if (ret) return ret; } @@ -1102,28 +1125,6 @@ static int _set_required_opps(struct device *dev, struct opp_table *opp_table, return 0; }
-static int _set_opp_level(struct device *dev, struct opp_table *opp_table, - struct dev_pm_opp *opp) -{ - unsigned int level = 0; - int ret = 0; - - if (opp) { - if (opp->level == OPP_LEVEL_UNSET) - return 0; - - level = opp->level; - } - - /* Request a new performance state through the device's PM domain. */ - ret = dev_pm_domain_set_performance_state(dev, level); - if (ret) - dev_err(dev, "Failed to set performance state %u (%d)\n", level, - ret); - - return ret; -} - static void _find_current_opp(struct device *dev, struct opp_table *opp_table) { struct dev_pm_opp *opp = ERR_PTR(-ENODEV);
On 18-06-24, 17:50, Ulf Hansson wrote:
In _set_opp() we are normally bailing out when trying to set an OPP that is the current one. This make perfect sense, but becomes a problem when _set_required_opps() calls it recursively.
More precisely, when a required OPP is being shared by multiple PM domains, we end up skipping to request the corresponding performance-state for all of the PM domains, but the first one. Let's fix the problem, by calling _set_opp_level() from _set_required_opps() instead.
Fixes: e37440e7e2c2 ("OPP: Call dev_pm_opp_set_opp() for required OPPs") Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson ulf.hansson@linaro.org
drivers/opp/core.c | 47 +++++++++++++++++++++++----------------------- 1 file changed, 24 insertions(+), 23 deletions(-)
/* This is only called for PM domain for now */ static int _set_required_opps(struct device *dev, struct opp_table *opp_table, struct dev_pm_opp *opp, bool up) @@ -1091,7 +1113,8 @@ static int _set_required_opps(struct device *dev, struct opp_table *opp_table, if (devs[index]) { required_opp = opp ? opp->required_opps[index] : NULL;
ret = dev_pm_opp_set_opp(devs[index], required_opp);
ret = _set_opp_level(devs[index], opp_table,
required_opp);
No, we won't be doing this I guess. Its like going back instead of moving forward :)
The required OPPs is not just a performance domain thing, but specially with devs[] here, it can be used to set OPP for any device type and so dev_pm_opp_set_opp() is the right call here.
Coming back to the problem you are pointing to, I am not very clear of the whole picture here. Can you please help me get some details on that ?
From what I understand, you have a device which has multiple power domains. Now all these power domains share the same OPP table in the device tree (i.e. to avoid duplication of tables only). Is that right ?
Even if in DT we have the same OPP table for all the domains, the OPP core will have separate OPP tables structures (as the domains aren't connected). And these OPP tables will have their own `current_opp` fields and so we shouldn't really bail out earlier.
Maybe there is a bug somewhere that is causing it. Maybe I can look at the DT to find the issue ? (Hint: The OPP table shouldn't have the `shared` flag set).
Maybe I completely misunderstood the whole thing :)
On Tue, 25 Jun 2024 at 12:54, Viresh Kumar viresh.kumar@linaro.org wrote:
On 18-06-24, 17:50, Ulf Hansson wrote:
In _set_opp() we are normally bailing out when trying to set an OPP that is the current one. This make perfect sense, but becomes a problem when _set_required_opps() calls it recursively.
More precisely, when a required OPP is being shared by multiple PM domains, we end up skipping to request the corresponding performance-state for all of the PM domains, but the first one. Let's fix the problem, by calling _set_opp_level() from _set_required_opps() instead.
Fixes: e37440e7e2c2 ("OPP: Call dev_pm_opp_set_opp() for required OPPs") Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson ulf.hansson@linaro.org
drivers/opp/core.c | 47 +++++++++++++++++++++++----------------------- 1 file changed, 24 insertions(+), 23 deletions(-)
/* This is only called for PM domain for now */ static int _set_required_opps(struct device *dev, struct opp_table *opp_table, struct dev_pm_opp *opp, bool up) @@ -1091,7 +1113,8 @@ static int _set_required_opps(struct device *dev, struct opp_table *opp_table, if (devs[index]) { required_opp = opp ? opp->required_opps[index] : NULL;
ret = dev_pm_opp_set_opp(devs[index], required_opp);
ret = _set_opp_level(devs[index], opp_table,
required_opp);
No, we won't be doing this I guess. Its like going back instead of moving forward :)
The required OPPs is not just a performance domain thing, but specially with devs[] here, it can be used to set OPP for any device type and so dev_pm_opp_set_opp() is the right call here.
Coming back to the problem you are pointing to, I am not very clear of the whole picture here. Can you please help me get some details on that ?
I get your point, but I am not sure I agree with it.
For the required-opps, the only existing use case is power/perf domains with performance-states, so why make the code more complicated than it needs to be?
From what I understand, you have a device which has multiple power domains. Now all these power domains share the same OPP table in the device tree (i.e. to avoid duplication of tables only). Is that right ?
No, that's not correct. Let me try to elaborate on my setup, which is very similar to a use case on a Tegra platform.
...
pd_perf0: pd-perf0 { #power-domain-cells = <0>; operating-points-v2 = <&opp_table_pd_perf0>; };
//Note: no opp-table pd_power4: pd-power4 { #power-domain-cells = <0>; power-domains = <&pd_perf0>; };
//Note: no opp-table pd_power5: pd-power5 { #power-domain-cells = <0>; power-domains = <&pd_perf0>; };
//Note: The opp_table_pm_test10 are having required-opps pointing to pd_perf0's opp-table. pm_test10 { ... power-domains = <&pd_power4>, <&pd_power5>; power-domain-names = "perf4", "perf5"; operating-points-v2 = <&opp_table_pm_test10>; };
...
Even if in DT we have the same OPP table for all the domains, the OPP core will have separate OPP tables structures (as the domains aren't connected). And these OPP tables will have their own `current_opp` fields and so we shouldn't really bail out earlier.
In the use case above, we end up never voting on pd_power5.
Maybe there is a bug somewhere that is causing it. Maybe I can look at the DT to find the issue ? (Hint: The OPP table shouldn't have the `shared` flag set).
Maybe I completely misunderstood the whole thing :)
The DT parsing of the required-opps is already complicated and there seems to be endless new corner-cases showing up. Maybe we can fix this too, but perhaps we should simply take a step back and go for simplifications instead?
Kind regards Uffe
On 29-06-24, 11:09, Ulf Hansson wrote:
I get your point, but I am not sure I agree with it.
For the required-opps, the only existing use case is power/perf domains with performance-states, so why make the code more complicated than it needs to be?
That is a fair argument generally, i.e. keep things as simple as we can, but this is a bit different. We are talking about setting the (required) OPP for a device (parent genpd) here and it should follow the full path.
Even in case of genpds we may want to configure more properties and not just vote, like bandwidth, regulator, clk, etc. And so I would really like to set the OPP in a standard way, no matter what.
No, that's not correct. Let me try to elaborate on my setup, which is very similar to a use case on a Tegra platform.
Thanks, I wasn't thinking about this setup earlier.
pd_perf0: pd-perf0 { #power-domain-cells = <0>; operating-points-v2 = <&opp_table_pd_perf0>; };
//Note: no opp-table pd_power4: pd-power4 { #power-domain-cells = <0>; power-domains = <&pd_perf0>; };
//Note: no opp-table pd_power5: pd-power5 { #power-domain-cells = <0>; power-domains = <&pd_perf0>; };
//Note: The opp_table_pm_test10 are having required-opps pointing to pd_perf0's opp-table. pm_test10 { ... power-domains = <&pd_power4>, <&pd_power5>; power-domain-names = "perf4", "perf5"; operating-points-v2 = <&opp_table_pm_test10>; };
In the use case above, we end up never voting on pd_power5.
The DT parsing of the required-opps is already complicated and there seems to be endless new corner-cases showing up. Maybe we can fix this too, but perhaps we should simply take a step back and go for simplifications instead?
I truly believe that keeping a standard way of updating OPPs is the right way to go and that will only prevent complicated corner cases coming later on.
What about this patch instead ?
diff --git a/drivers/opp/core.c b/drivers/opp/core.c index 5f4598246a87..2086292f8355 100644 --- a/drivers/opp/core.c +++ b/drivers/opp/core.c @@ -1091,7 +1091,8 @@ static int _set_required_opps(struct device *dev, struct opp_table *opp_table, if (devs[index]) { required_opp = opp ? opp->required_opps[index] : NULL;
- ret = dev_pm_opp_set_opp(devs[index], required_opp); + /* Set required OPPs forcefully */ + ret = dev_pm_opp_set_opp_forced(devs[index], required_opp, true); if (ret) return ret; } @@ -1365,17 +1366,8 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq) } EXPORT_SYMBOL_GPL(dev_pm_opp_set_rate);
-/** - * dev_pm_opp_set_opp() - Configure device for OPP - * @dev: device for which we do this operation - * @opp: OPP to set to - * - * This configures the device based on the properties of the OPP passed to this - * routine. - * - * Return: 0 on success, a negative error number otherwise. - */ -int dev_pm_opp_set_opp(struct device *dev, struct dev_pm_opp *opp) +static int dev_pm_opp_set_opp_forced(struct device *dev, struct dev_pm_opp *opp, + bool forced) { struct opp_table *opp_table; int ret; @@ -1386,11 +1378,25 @@ int dev_pm_opp_set_opp(struct device *dev, struct dev_pm_opp *opp) return PTR_ERR(opp_table); }
- ret = _set_opp(dev, opp_table, opp, NULL, false); + ret = _set_opp(dev, opp_table, opp, NULL, forced); dev_pm_opp_put_opp_table(opp_table);
return ret; } +/** + * dev_pm_opp_set_opp() - Configure device for OPP + * @dev: device for which we do this operation + * @opp: OPP to set to + * + * This configures the device based on the properties of the OPP passed to this + * routine. + * + * Return: 0 on success, a negative error number otherwise. + */ +int dev_pm_opp_set_opp(struct device *dev, struct dev_pm_opp *opp) +{ + return dev_pm_opp_set_opp_forced(dev, opp, false); +} EXPORT_SYMBOL_GPL(dev_pm_opp_set_opp);
/* OPP-dev Helpers */
On 01-07-24, 17:17, Viresh Kumar wrote:
What about this patch instead ?
diff --git a/drivers/opp/core.c b/drivers/opp/core.c index 5f4598246a87..2086292f8355 100644 --- a/drivers/opp/core.c +++ b/drivers/opp/core.c @@ -1091,7 +1091,8 @@ static int _set_required_opps(struct device *dev, struct opp_table *opp_table, if (devs[index]) { required_opp = opp ? opp->required_opps[index] : NULL;
ret = dev_pm_opp_set_opp(devs[index], required_opp);
/* Set required OPPs forcefully */
ret = dev_pm_opp_set_opp_forced(devs[index], required_opp, true);
Maybe better to do just this instead:
diff --git a/drivers/opp/core.c b/drivers/opp/core.c index 5f4598246a87..9484acbeaa66 100644 --- a/drivers/opp/core.c +++ b/drivers/opp/core.c @@ -1386,7 +1386,12 @@ int dev_pm_opp_set_opp(struct device *dev, struct dev_pm_opp *opp) return PTR_ERR(opp_table); }
- ret = _set_opp(dev, opp_table, opp, NULL, false); + /* + * For a genpd's OPP table, we always want to set the OPP (and + * performance level) and let the genpd core take care of aggregating + * the votes. Set `forced` to true for a genpd here. + */ + ret = _set_opp(dev, opp_table, opp, NULL, opp_table->is_genpd); dev_pm_opp_put_opp_table(opp_table);
return ret;
On Tue, 2 Jul 2024 at 07:15, Viresh Kumar viresh.kumar@linaro.org wrote:
On 01-07-24, 17:17, Viresh Kumar wrote:
What about this patch instead ?
diff --git a/drivers/opp/core.c b/drivers/opp/core.c index 5f4598246a87..2086292f8355 100644 --- a/drivers/opp/core.c +++ b/drivers/opp/core.c @@ -1091,7 +1091,8 @@ static int _set_required_opps(struct device *dev, struct opp_table *opp_table, if (devs[index]) { required_opp = opp ? opp->required_opps[index] : NULL;
ret = dev_pm_opp_set_opp(devs[index], required_opp);
/* Set required OPPs forcefully */
ret = dev_pm_opp_set_opp_forced(devs[index], required_opp, true);
Maybe better to do just this instead:
diff --git a/drivers/opp/core.c b/drivers/opp/core.c index 5f4598246a87..9484acbeaa66 100644 --- a/drivers/opp/core.c +++ b/drivers/opp/core.c @@ -1386,7 +1386,12 @@ int dev_pm_opp_set_opp(struct device *dev, struct dev_pm_opp *opp) return PTR_ERR(opp_table); }
ret = _set_opp(dev, opp_table, opp, NULL, false);
/*
* For a genpd's OPP table, we always want to set the OPP (and
* performance level) and let the genpd core take care of aggregating
* the votes. Set `forced` to true for a genpd here.
*/
ret = _set_opp(dev, opp_table, opp, NULL, opp_table->is_genpd); dev_pm_opp_put_opp_table(opp_table);
I think this should work, but in this case we seem to need a similar thing for dev_pm_opp_set_rate().
Another option is to let _set_opp() check "opp_table->is_genpd" and enforce the opp to be set in that case. Whatever you prefer, I can re-spin the patch.
Kind regards Uffe
On 10-07-24, 15:51, Ulf Hansson wrote:
I think this should work, but in this case we seem to need a similar thing for dev_pm_opp_set_rate().
We don't go to that path for genpd's I recall. Do we ? For genpd's, since there is no freq, we always call _set_opp().
On Thu, 11 Jul 2024 at 05:13, Viresh Kumar viresh.kumar@linaro.org wrote:
On 10-07-24, 15:51, Ulf Hansson wrote:
I think this should work, but in this case we seem to need a similar thing for dev_pm_opp_set_rate().
We don't go to that path for genpd's I recall. Do we ? For genpd's, since there is no freq, we always call _set_opp().
You are right! Although, maybe it's still preferred to do it in _set_opp() as it looks like the code would be more consistent? No?
Kind regards Uffe
On Thu, 11 Jul 2024 at 12:31, Ulf Hansson ulf.hansson@linaro.org wrote:
On Thu, 11 Jul 2024 at 05:13, Viresh Kumar viresh.kumar@linaro.org wrote:
On 10-07-24, 15:51, Ulf Hansson wrote:
I think this should work, but in this case we seem to need a similar thing for dev_pm_opp_set_rate().
We don't go to that path for genpd's I recall. Do we ? For genpd's, since there is no freq, we always call _set_opp().
You are right! Although, maybe it's still preferred to do it in _set_opp() as it looks like the code would be more consistent? No?
No matter how we do this, we end up enforcing OPPs for genpds.
It means that we may be requesting the same performance-level that we have already requested for the device. Of course genpd manages this, but it just seems a bit in-efficient to mee. Or maybe this isn't a big deal as consumer drivers should end up doing this anyway?
Kind regards Uffe
On 11-07-24, 13:05, Ulf Hansson wrote:
On Thu, 11 Jul 2024 at 12:31, Ulf Hansson ulf.hansson@linaro.org wrote:
On Thu, 11 Jul 2024 at 05:13, Viresh Kumar viresh.kumar@linaro.org wrote:
On 10-07-24, 15:51, Ulf Hansson wrote:
I think this should work, but in this case we seem to need a similar thing for dev_pm_opp_set_rate().
We don't go to that path for genpd's I recall. Do we ? For genpd's, since there is no freq, we always call _set_opp().
You are right! Although, maybe it's still preferred to do it in _set_opp() as it looks like the code would be more consistent? No?
Since the function already accepted a flag, it was very easier to just reuse it without.
No matter how we do this, we end up enforcing OPPs for genpds.
It means that we may be requesting the same performance-level that we have already requested for the device. Of course genpd manages this, but it just seems a bit in-efficient to mee. Or maybe this isn't a big deal as consumer drivers should end up doing this anyway?
Normally I won't expect a consumer driver to do this check and so was the opp core handling that. But for genpd's we need to make this inefficient to not miss a vote.
The problem is at another level though. Normally for any other device, like CPU, there is one vote for the entire range of devices supported by the OPP table. For example all CPUs of a cluster will share an OPP table (and they do dvfs together), and you call set_opp() for any of the CPU, we will go and change the OPP. There is no per-device vote.
This whole design is broken in case of genpd, since you are expecting a separate vote per device. Ideally, each device should have had its own copy of the OPP table, but it is messy in case of genpd and to make it all work nicely, we may have to choose this inefficient way of doing it :(
On Thu, 11 Jul 2024 at 15:16, Viresh Kumar viresh.kumar@linaro.org wrote:
On 11-07-24, 13:05, Ulf Hansson wrote:
On Thu, 11 Jul 2024 at 12:31, Ulf Hansson ulf.hansson@linaro.org wrote:
On Thu, 11 Jul 2024 at 05:13, Viresh Kumar viresh.kumar@linaro.org wrote:
On 10-07-24, 15:51, Ulf Hansson wrote:
I think this should work, but in this case we seem to need a similar thing for dev_pm_opp_set_rate().
We don't go to that path for genpd's I recall. Do we ? For genpd's, since there is no freq, we always call _set_opp().
You are right! Although, maybe it's still preferred to do it in _set_opp() as it looks like the code would be more consistent? No?
Since the function already accepted a flag, it was very easier to just reuse it without.
No matter how we do this, we end up enforcing OPPs for genpds.
It means that we may be requesting the same performance-level that we have already requested for the device. Of course genpd manages this, but it just seems a bit in-efficient to mee. Or maybe this isn't a big deal as consumer drivers should end up doing this anyway?
Normally I won't expect a consumer driver to do this check and so was the opp core handling that. But for genpd's we need to make this inefficient to not miss a vote.
The problem is at another level though. Normally for any other device, like CPU, there is one vote for the entire range of devices supported by the OPP table. For example all CPUs of a cluster will share an OPP table (and they do dvfs together), and you call set_opp() for any of the CPU, we will go and change the OPP. There is no per-device vote.
This whole design is broken in case of genpd, since you are expecting a separate vote per device. Ideally, each device should have had its own copy of the OPP table, but it is messy in case of genpd and to make it all work nicely, we may have to choose this inefficient way of doing it :(
Right, I get your point.
Although, it seems to me that just limiting required-opps to performance-levels, could avoid us from having to enforce the OPPs for genpd. In other words, doing something along the lines of $subject patch should work fine.
In fact, it looks to me that the required-opps handling for the *single* PM domain case, is already limited to solely performance-levels (opp-level), as there are no required_devs being assigned for it. Or did I get that wrong?
Kind regards Uffe
On 11-07-24, 17:25, Ulf Hansson wrote:
Right, I get your point.
Although, it seems to me that just limiting required-opps to performance-levels, could avoid us from having to enforce the OPPs for genpd. In other words, doing something along the lines of $subject patch should work fine.
I really don't want to design the code that way. Required OPPs don't have anything to do with a genpd. Genpd is just one of the possible use cases and I would like the code to reflect it, even if we don't have any other users for this kind of stuff for now, but we surely can. Just that those problems are solved differently for now. For example, cache DVFS along with CPUs, etc.
And as I said earlier, it is entirely possible that the genpd OPP table wants to configure few more things apart from just level, and hence a full fledged set-opp is a better design choice.
In fact, it looks to me that the required-opps handling for the *single* PM domain case, is already limited to solely performance-levels (opp-level), as there are no required_devs being assigned for it. Or did I get that wrong?
That's why the API for setting required-opps was introduced, to make it a central point of entry for all use cases where we want to attach a device.
On Thu, 18 Jul 2024 at 05:06, Viresh Kumar viresh.kumar@linaro.org wrote:
On 11-07-24, 17:25, Ulf Hansson wrote:
Right, I get your point.
Although, it seems to me that just limiting required-opps to performance-levels, could avoid us from having to enforce the OPPs for genpd. In other words, doing something along the lines of $subject patch should work fine.
I really don't want to design the code that way. Required OPPs don't have anything to do with a genpd. Genpd is just one of the possible use cases and I would like the code to reflect it, even if we don't have any other users for this kind of stuff for now, but we surely can. Just that those problems are solved differently for now. For example, cache DVFS along with CPUs, etc.
And as I said earlier, it is entirely possible that the genpd OPP table wants to configure few more things apart from just level, and hence a full fledged set-opp is a better design choice.
I understand your point, but we don't need to call dev_pm_opp_set_opp() from _set_required_opps() to accomplish this. In fact, I have realized that calling dev_pm_opp_set_opp() from there doesn't work the way we expected.
More precisely, at each recursive call to dev_pm_opp_set_opp() we are changing the OPP for a genpd's OPP table for a device that has been attached to it. The problem with this, is that we may have several devices being attached to the same genpd, thus sharing the same OPP-table that is being used for their required OPPs. So, we may have several active requests for different OPPs for a genpd's OPP table simultaneously. It seems wrong from the OPP library point of view. To me, it's would be better to leave any kind of aggregation to be managed by genpd itself.
For the reason explained above, it still looks correct to call _set_opp_level() from _set_required_opps(), as $subject patch proposes (but clarifications of why is certainly needed in the commit message). Moreover, when/if we see a need for additonal resourses but the level, to be managed through a genpd's OPP table, we can extend _set_required_opps() to cover that too.
In fact, it looks to me that the required-opps handling for the *single* PM domain case, is already limited to solely performance-levels (opp-level), as there are no required_devs being assigned for it. Or did I get that wrong?
That's why the API for setting required-opps was introduced, to make it a central point of entry for all use cases where we want to attach a device.
The API as such isn't the problem, but rather that we are recursivly calling dev_pm_opp_set_opp() for the required-devs.
In the single PM domain case, this would simply not work, as there is not a separate virtual device we can assign to the required-dev to.
Kind regards Uffe
On 18-07-24, 12:38, Ulf Hansson wrote:
I understand your point, but we don't need to call dev_pm_opp_set_opp() from _set_required_opps() to accomplish this.
I _strongly_ feel that the OPP core should be doing what other frameworks, like clk, regulator, genpd, are doing in this case. Call recursively.
In fact, I have realized that calling dev_pm_opp_set_opp() from there doesn't work the way we expected.
More precisely, at each recursive call to dev_pm_opp_set_opp() we are changing the OPP for a genpd's OPP table for a device that has been attached to it. The problem with this, is that we may have several devices being attached to the same genpd, thus sharing the same OPP-table that is being used for their required OPPs. So, we may have several active requests for different OPPs for a genpd's OPP table simultaneously. It seems wrong from the OPP library point of view. To me, it's would be better to leave any kind of aggregation to be managed by genpd itself.
Right. I see this problem too and that's why I said earlier that OPP core was designed for a different use case and genpd doesn't fit perfectly. Though I don't see how several calls the dev_pm_opp_set_opp() simultaneously is a problem. This can happen without recursive calling too, where simultaneous calls for genpds occur.
I think the main problem here, on how genpd doesn't fit with OPP core, is that the OPP core is trying to do some sort of aggregation generally at its level, like avoiding a change of OPP if the OPP is same. I think the right way to fix this is by not doing any aggregation at OPP core level and genpd handle it all. Which you are also aligned with I guess. This would also mean that OPP core shouldn't try configuring, clk, regulator, bandwidth, etc for a genpd. The Genpd core should handle that, else we may end up incorrectly configuring things.
I guess this is what you were trying to say as well, when you were trying to replace the recursive call with set-level only.
I think, we don't need that change but rather avoid all these extra settings from dev_pm_opp_set_opp() itself.
Also consider that genpd configuration doesn't only happen with recursive call, but can happen with a call to dev_pm_opp_set_opp() directly too for the genpd.
The API as such isn't the problem, but rather that we are recursivly calling dev_pm_opp_set_opp() for the required-devs.
I think that design is rather correct, just like other frameworks. Just that we need to do only set-level for genpds and nothing else. That will have exactly the same behavior that you want.
In the single PM domain case, this would simply not work, as there is not a separate virtual device we can assign to the required-dev to.
We can assign the real device in that case, why is that a problem ?
On Thu, 25 Jul 2024 at 08:02, Viresh Kumar viresh.kumar@linaro.org wrote:
On 18-07-24, 12:38, Ulf Hansson wrote:
I understand your point, but we don't need to call dev_pm_opp_set_opp() from _set_required_opps() to accomplish this.
I _strongly_ feel that the OPP core should be doing what other frameworks, like clk, regulator, genpd, are doing in this case. Call recursively.
In fact, I have realized that calling dev_pm_opp_set_opp() from there doesn't work the way we expected.
More precisely, at each recursive call to dev_pm_opp_set_opp() we are changing the OPP for a genpd's OPP table for a device that has been attached to it. The problem with this, is that we may have several devices being attached to the same genpd, thus sharing the same OPP-table that is being used for their required OPPs. So, we may have several active requests for different OPPs for a genpd's OPP table simultaneously. It seems wrong from the OPP library point of view. To me, it's would be better to leave any kind of aggregation to be managed by genpd itself.
Right. I see this problem too and that's why I said earlier that OPP core was designed for a different use case and genpd doesn't fit perfectly. Though I don't see how several calls the dev_pm_opp_set_opp() simultaneously is a problem. This can happen without recursive calling too, where simultaneous calls for genpds occur.
Right.
The main issue in regards to the above, is that we may end up trying to vote for different devices, which votes correspond to the same OPP/OPP-table. The one that comes first will request the OPP, the other ones will be ignored as the OPP core thinks there is no reason to already set the current OPP.
I think the main problem here, on how genpd doesn't fit with OPP core, is that the OPP core is trying to do some sort of aggregation generally at its level, like avoiding a change of OPP if the OPP is same. I think the right way to fix this is by not doing any aggregation at OPP core level and genpd handle it all. Which you are also aligned with I guess. This would also mean that OPP core shouldn't try configuring, clk, regulator, bandwidth, etc for a genpd. The Genpd core should handle that, else we may end up incorrectly configuring things.
I guess this is what you were trying to say as well, when you were trying to replace the recursive call with set-level only.
Right, I think we are in agreement. Aggregation of the *performance-state* (opp-level) needs to be managed by genpd, solely.
I think, we don't need that change but rather avoid all these extra settings from dev_pm_opp_set_opp() itself.
Also consider that genpd configuration doesn't only happen with recursive call, but can happen with a call to dev_pm_opp_set_opp() directly too for the genpd.
Right.
The API as such isn't the problem, but rather that we are recursivly calling dev_pm_opp_set_opp() for the required-devs.
I think that design is rather correct, just like other frameworks. Just that we need to do only set-level for genpds and nothing else. That will have exactly the same behavior that you want.
I don't quite understand what you are proposing. Do you want to add a separate path for opp-levels?
The problem with that would be that platforms (Tegra at least) are already using a combination of opp-level and clocks.
In the single PM domain case, this would simply not work, as there is not a separate virtual device we can assign to the required-dev to.
We can assign the real device in that case, why is that a problem ?
To be able to call dev_pm_opp_set_opp() on the required-dev (which would be the real device in this case), we need to add it to genpd's OPP table by calling _add_opp_dev() on it. See _opp_attach_genpd().
The problem with this, is that the real device already has its own OPP table (with the required-OPPs pointing to genpd's OPP table), which means that we would end up adding the device to two different OPP tables.
Kind regards Uffe
On 25-07-24, 11:21, Ulf Hansson wrote:
Right.
The main issue in regards to the above, is that we may end up trying to vote for different devices, which votes correspond to the same OPP/OPP-table. The one that comes first will request the OPP, the other ones will be ignored as the OPP core thinks there is no reason to already set the current OPP.
Right, but that won't happen with the diff I shared earlier where we set "forced" to true. Isn't it ?
I think that design is rather correct, just like other frameworks. Just that we need to do only set-level for genpds and nothing else. That will have exactly the same behavior that you want.
I don't quite understand what you are proposing. Do you want to add a separate path for opp-levels?
Not separate paths, but ignore clk/regulator changes if the table belongs to a genpd.
The problem with that would be that platforms (Tegra at least) are already using a combination of opp-level and clocks.
If they are using both for a genpd's OPP table (and changes are made for both opp-level and clock by the OPP core), then it should already be wrong, isn't it? Two simultaneous calls to dev_pm_opp_set_opp() would set the level correctly (as aggregation happens in the genpd core), but clock setting would always reflect the second caller. This should be fixed too, isn't it ?
To be able to call dev_pm_opp_set_opp() on the required-dev (which would be the real device in this case), we need to add it to genpd's OPP table by calling _add_opp_dev() on it. See _opp_attach_genpd().
The problem with this, is that the real device already has its own OPP table (with the required-OPPs pointing to genpd's OPP table), which means that we would end up adding the device to two different OPP tables.
I was terrified for a minute after reading this and the current code, as I also thought there is an issue there. But I was confident that we used to take care of this case separately earlier. A short dive into git logs got me to this:
commit 6d366d0e5446 ("OPP: Use _set_opp_level() for single genpd case")
This should be working just fine I guess.
On Thu, 25 Jul 2024 at 13:25, Viresh Kumar viresh.kumar@linaro.org wrote:
On 25-07-24, 11:21, Ulf Hansson wrote:
Right.
The main issue in regards to the above, is that we may end up trying to vote for different devices, which votes correspond to the same OPP/OPP-table. The one that comes first will request the OPP, the other ones will be ignored as the OPP core thinks there is no reason to already set the current OPP.
Right, but that won't happen with the diff I shared earlier where we set "forced" to true. Isn't it ?
Correct.
I think that design is rather correct, just like other frameworks. Just that we need to do only set-level for genpds and nothing else. That will have exactly the same behavior that you want.
I don't quite understand what you are proposing. Do you want to add a separate path for opp-levels?
Not separate paths, but ignore clk/regulator changes if the table belongs to a genpd.
The problem with that would be that platforms (Tegra at least) are already using a combination of opp-level and clocks.
If they are using both for a genpd's OPP table (and changes are made for both opp-level and clock by the OPP core), then it should already be wrong, isn't it?
They are changing the clock through the device's OPP table and the level (performance-state) via genpd's table (using required OPPs). This works fine as of today.
Two simultaneous calls to dev_pm_opp_set_opp() would set the level correctly (as aggregation happens in the genpd core), but clock setting would always reflect the second caller. This should be fixed too, isn't it ?
As I said before, I don't see a need for this. The recursive call to dev_pm_opp_set_opp() is today superfluous.
To be able to call dev_pm_opp_set_opp() on the required-dev (which would be the real device in this case), we need to add it to genpd's OPP table by calling _add_opp_dev() on it. See _opp_attach_genpd().
The problem with this, is that the real device already has its own OPP table (with the required-OPPs pointing to genpd's OPP table), which means that we would end up adding the device to two different OPP tables.
I was terrified for a minute after reading this and the current code, as I also thought there is an issue there. But I was confident that we used to take care of this case separately earlier. A short dive into git logs got me to this:
commit 6d366d0e5446 ("OPP: Use _set_opp_level() for single genpd case")
This should be working just fine I guess.
It's working today for *opp-level* only, because of the commit above. That's correct.
My point is that calling dev_pm_opp_set_opp() recursively from _set_required_opps() doesn't make sense for the single PM domain case, as we can't assign a required-dev for it. This leads to an inconsistent behaviour when managing the required-OPPs.
To make the behavior consistent (and to fix the bug), I still think it would be better to do something along what $subject patch proposes.
Kind regards Uffe
On 28-07-24, 22:05, Ulf Hansson wrote:
I think that design is rather correct, just like other frameworks. Just that we need to do only set-level for genpds and nothing else. That will have exactly the same behavior that you want.
I don't quite understand what you are proposing. Do you want to add a separate path for opp-levels?
Not separate paths, but ignore clk/regulator changes if the table belongs to a genpd.
The problem with that would be that platforms (Tegra at least) are already using a combination of opp-level and clocks.
If they are using both for a genpd's OPP table (and changes are made for both opp-level and clock by the OPP core), then it should already be wrong, isn't it?
They are changing the clock through the device's OPP table and the level (performance-state) via genpd's table (using required OPPs). This works fine as of today.
There is a problem here I guess then. Lets say there are two devices A and B, that depend on a genpd.
A requests required OPP 5 (level 5, clk 1.4 GHz), followed by B requests required OPP 3 (level 3, clk 1 GHz).
After this level will be configured to 5 and clk to 1 GHz I think.
It's working today for *opp-level* only, because of the commit above. That's correct.
Good.
My point is that calling dev_pm_opp_set_opp() recursively from _set_required_opps() doesn't make sense for the single PM domain case, as we can't assign a required-dev for it. This leads to an inconsistent behaviour when managing the required-OPPs.
We won't be calling that because of the above patch. In case of a single dev, the required device isn't set and so we will never end up calling dev_pm_opp_set_opp() for a single genpd case.
On Mon, 29 Jul 2024 at 08:05, Viresh Kumar viresh.kumar@linaro.org wrote:
On 28-07-24, 22:05, Ulf Hansson wrote:
I think that design is rather correct, just like other frameworks. Just that we need to do only set-level for genpds and nothing else. That will have exactly the same behavior that you want.
I don't quite understand what you are proposing. Do you want to add a separate path for opp-levels?
Not separate paths, but ignore clk/regulator changes if the table belongs to a genpd.
The problem with that would be that platforms (Tegra at least) are already using a combination of opp-level and clocks.
If they are using both for a genpd's OPP table (and changes are made for both opp-level and clock by the OPP core), then it should already be wrong, isn't it?
They are changing the clock through the device's OPP table and the level (performance-state) via genpd's table (using required OPPs). This works fine as of today.
There is a problem here I guess then. Lets say there are two devices A and B, that depend on a genpd.
A requests required OPP 5 (level 5, clk 1.4 GHz), followed by B requests required OPP 3 (level 3, clk 1 GHz).
After this level will be configured to 5 and clk to 1 GHz I think.
The level would be 5, as the aggregated votes in genpd would be correct in this case.
In regards to the clocks, I assume this is handled correctly too, as the clocks are per device clocks that don't belong to the genpd.
It's working today for *opp-level* only, because of the commit above. That's correct.
Good.
My point is that calling dev_pm_opp_set_opp() recursively from _set_required_opps() doesn't make sense for the single PM domain case, as we can't assign a required-dev for it. This leads to an inconsistent behaviour when managing the required-OPPs.
We won't be calling that because of the above patch. In case of a single dev, the required device isn't set and so we will never end up calling dev_pm_opp_set_opp() for a single genpd case.
That's right, but why do we want to call dev_pm_opp_set_opp() for the multiple PM domain case then? It makes the behaviour inconsistent.
Kind regards Uffe
On 29-07-24, 22:30, Ulf Hansson wrote:
In regards to the clocks, I assume this is handled correctly too, as the clocks are per device clocks that don't belong to the genpd.
It would be if the clk node is present in the device's node. I was talking about a clock node in the genpd's table earlier. If that is ever the case, we will end up programming the wrong clk here.
That's right, but why do we want to call dev_pm_opp_set_opp() for the multiple PM domain case then? It makes the behaviour inconsistent.
To have a common path for all required OPP device types, irrespective of the fact that the required OPP device is a genpd or not. And we are talking about a required OPP of a separate device here, it must be set via this call only, technically speaking.
Genpd makes it a little complex, and I agree to that. But I strongly feel this code needs to be generic and not genpd specific. The OPP core should have as less genpd specific code as possible. It must handle all device types with a single code path.
[...]
That's right, but why do we want to call dev_pm_opp_set_opp() for the multiple PM domain case then? It makes the behaviour inconsistent.
To have a common path for all required OPP device types, irrespective of the fact that the required OPP device is a genpd or not. And we are talking about a required OPP of a separate device here, it must be set via this call only, technically speaking.
Genpd makes it a little complex, and I agree to that. But I strongly feel this code needs to be generic and not genpd specific. The OPP core should have as less genpd specific code as possible. It must handle all device types with a single code path.
I agree that we really should avoid genpd specific code and that's exactly what I am working towards too.
However, calling dev_pm_opp_set_opp() from _set_required_opps() looks to me like it has the exact opposite effect: *) To solve the bug according to the change you proposed, means more genpd hacks. **) To make the code for the required OPPs consistent between the single/multiple PM domain case, we need additional genpd hacks. ***) We can't remove some of the existing genpd hacks [1], as those would then still be needed.
Finally, while I understand that you prefer a single code path, we can still keep _set_required_opps() common and generic. Today, it's used only for performance-states of PM domains (the involved code isn't even genpd specific as it calls dev_pm_domain_set_performance_state()). If tomorrow we see a need to extend it to additional resources, it's easy also without calling dev_pm_opp_set_opp() from it.
Kind regards Uffe
[1] https://lore.kernel.org/all/20240718234319.356451-7-ulf.hansson@linaro.org/
linux-stable-mirror@lists.linaro.org