On 23 August 2012 09:57, MyungJoo Ham myungjoo.ham@samsung.com wrote:
On 20 August 2012 12:50, 함명주 myungjoo.ham@samsung.com wrote:
Prepare devfreq core framework to support devices which can idle. When device idleness is detected perhaps through runtime-pm, need some mechanism to suspend devfreq load monitoring and resume when device is back online. Present code continues monitoring unless device is removed from devfreq core.
This patch introduces following updates,
- move device load monitoring logic to ondemand governor as it is specific to ondemand.
We have this ondemand governor in the mainline devfreq. However, we have (not upstreamed yet) governors with specific purpose (for GPU or for a specific chip) with load monitoring logic. Although we don't have them upstreamed yet, why don't you keep the monitoring logic sharable by other governors. (From today, I'm not objecting to have individual workqueue, but I still don't want to let each active governor reimplement the same things.)
If more governors are planned based on load monitoring, possibly ondemand is not good enough even after adding additional configurable parameters. Could you please describe one such case which demand new governor?
Please find my next comments.
GPU devfreq driver requires behaviors similar to conservative though not exactly same.
In our internal development branch, it increases frequencies differently according to the current usage (# GPU cores used and % each core activated). If it's almost 100% and the queue is filling up, it goes to max. If it's (for example) 98%, it goes to the next faster step. Besides, GPU developers want to differ the poling intervals according to the current frequency.
Another case is with bus + memory-interface case. We are going to use a custom governor for them as each bus and memory-interface has its own DVFS capability while we want them to coordinate. The two drivers may be hooked by the custom governor. We don't have a specific detail with this as it is developed by another division. However, what I know is that they are going to use their own custon governor with ondemand-like behavior in principle.
Ok. Thanks.
Because devfreq is meant to be used by various types of devices, we cannot assume that the default governor is "mostly" fine. Device driver writers can embed governors in their own devfreq driver code ("custom governor").
- devfreq core interacts with governors via events to perform specific actions. These events include start/stop devfreq, and frequency limit changes outside devfreq. This sets ground for adding suspend/resume events.
event_handler with START/STOP/UPDATE seem fine.
However, init() and exit() should be different from START/STOP. We do not need to do init and exit every time when we do runtime suspend/resume.
I agree and its already taken care in this patch.
DEVFREQ_GOV_START : Event for initializing and starting of devfreq governor for a device. Happens only when device is added using devfreq_add_device().
DEVFREQ_GOV_STOP: Event for stopping and uninitialization of devfreq governor for a device. Happens only when device is removed using devfreq_remove_device().
Events introduced in next patch (2/3) - DEVFREQ_GOV_SUSPEND: Event for suspending devfreq of a device. I.e suspending of load monitoring of device in case of ondemand governor. Happens only when devfreq_suspend_device() is called.
DEVFREQ_GOV_RESUME: Event for resuming devfreq of a device. I.e continue load monitoring of already suspended device in case of ondemand governor. Happens only when devfreq_resume_device() is called.
Ah.. suspend/resume events are seperated. Please never mind this.
- use per device work instead of global work to monitor device load. This enables suspend/resume of device devfreq and reduces monitoring code complexity.
It's fine to have a delayed work struct per device.
However, please try to keep as many things/features in devfreq.c as possible; i.e., reduce features and code size of governors. Because, we will be supporting various types of devices with devfreq, there will be various governors and I don't want them to have shared things reimplemented. Dealing with the delayed work at devfreq.c and let governors choose to use it (at its struct) or not should be fine.
Delayed work is of only relevance to ondemand governor - justifies why struct delayed work moved from devfreq.c to governor_simpleondemand.c. When delayed work is optional for governors to use, then let governors handle them if needed instead of devfreq core.
With this change, we will not be forcing next upcoming devfreq governors to use delayed work for load monitoring(if required) but instead flexibility to have their own mechanisms. Its up to the governors to decide how it wants to do dvfs of a device. But imperative to respond the events.
As mentioned above, please do not assume that the default governors (drivers/devfreq/governor_*.c) will be only governors available and ondemand will be the only "active" governor for the future.
I think it'd be better for the driver writes to reuse code than to copy and paste the code.
Some governors may use some other polling/sampling mechanisms; however, still, providing delayed work at the core does not mean that governors must use the default delayed work struct. They may simply not use it as userspace, performance, and powersave do.
If code re-use is only the concern, instead of core handling delayed work which is optional for governors, how about providing load monitoring helper functions to governors? something like,
drivers/devfreq/governor.h
int devfreq_monitor_start(struct devfreq *devfreq); int devfreq_monitor_stop(struct devfreq *devfreq); int devfreq_monitor_suspend(struct devfreq *devfreq); int devfreq_monitor_resume(struct devfreq *devfreq);
Yes, putting a delayed-work struct in struct devfreq and let devfreq.c handle that delayed-work is what I think appropriate, too.
Anyway, what would be the difference between start vs resume and between stop vs suspend for the per-devfreq-struct delayed-work at the side of devfreq.c? Devfreq.c will simply start/stop the delayed-work with the values in devfreq, won't it? Or would there be any common tasks for suspend and resume that are not going to be required for start/stop? Assuming that start/stop will also be used by userspace input of "polling = some number / 0", it appears that there would be no difference with suspend/resume.
These delayed work handling functions will be implemented by devfreq.c and are called from governors in response to events.
With this, governors can re-use the delayed work based load monitoring mechanism and also have flexibility to implement their own monitoring mechanism based on their objectives.
In this patchset, the size of ondemand governor has been enlarged too much for that purpose.
I said this because I do not want governors to act as if they are core and I do not want to make writing governors too difficult for device driver writers. I want them to focus on the "policy" only, not the mechanism.
- Force devfreq users to set min/max supported frequencies in device profile to help governors to predict target frequecy with in limits.
Is this really necessary?
Yes. Mainly for two reasons,
- Devfreq core provides sysfs interface for usespace to set min/max
operating frequency for a device. These values must be within device supported min/max frequency limits.
No. they do not need to be within device supported limits.
When I say "CPU should be running at least 50MHz" when the CPU supports 1000MHz ~ 3000MHz, there is no problem. Running at 1000MHz does not contradict with the condition.
When I say "CPU should be running at least 4000MHz" for the same CPU, It is fine to set the CPU at max available.
- Governors should know device supported min/max opps and use
them where ever needed. Never assume UINT_MAX as max frequency.
The core and governors do not need to know such information. They only need to provide "recommended" frequency. The corresponding driver is going to interpret it properly. Whenever we can, it's better to remove device specific information from governors and cores as long as it does not significantly increase the complexity of drivers.
Sorry, but I disagree on this point. Devfreq core should not consider user input for granted and suggest next recommended target freq which is beyond device supported frequency and let driver to interpret it. I don't think passing device supported min/max from driver would increase the complexity of driver. Rather it would help in recommending freq with in limits.
This information also helps user space to know the device operating limits.
With the optional frequency statistics support proposed by Jonghwa Lee, devfreq.c is going to be accepting the list of operable frequencies, where the max/min values are trivially extracted. With this optional values, we can create sysfs entries to show max/min possible or operable frequencies. (they are difference from the min_freq and max_freq, which are user's limits)
I still don't understand why enforcing devfreq device drivers to declare min/max frequencies to devfreq core. Because the devfreq device drivers already know the hardware limits (otherwise, we are not able to enforce them to do so), it is not beneficial for the device driver to let devfreq core provide such limits to the device drivers (they already know).
Yes, passing values filtered by min/max values reported by device drivers from core does not increase the complexity at device drivers. However, it adds codes at _every_ governor to filter the values, which is going to be filtered at each device driver anyway. For devices supporting 100, 200, 400 MHz, governors will pass 150, 151, 201, 203, and such even if we filter min/max.
I just don't see a benefic to enforce device drivers to declare min/max to the core and enforce governors to recommend between the restricted values, which are going to be regulated at device drivers anyway. (and very trivial to be regulated by drivers)
If this is meant to show the min/max values to the users, making it optional should be enough.
Besides, allowing users to enter any arbitrary large values (e.g., 9999999999) to make it run at maximum frequency and enter any small values (e.g., 1) to make it run at minimum frequency makes things easier. I don't see why we should get them errors for such inputs.
Cheers! MyungJoo
UINT_MAX means "I recommend to run as fast as possible." Then, the driver only needs to provide the fastest frequency. The core and governor does not need to say the specific frequency.
Cheers! MyungJoo
The devfreq apis are not modified and are kept intact.
The ABIs are not.
You can no longer do "# echo 0 > ABI_path" in order to deactivate.
Only relevant for ondemand governor. Load monitoring can be deactivated when polling_ms is set to zero. Next version of patch will fix this. Thanks.
Cheers! MyungJoo
Signed-off-by: Rajagopal Venkat rajagopal.venkat@linaro.org
ps. please make the patch a bit more readable. (please don't shuffle the location of pre-existed functions)
-- Regards, Rajagopal
--
MyungJoo Ham (함명주), PHD
System S/W Lab, S/W Platform Team, Software Center Samsung Electronics Cell: +82-10-6714-2858
-- Regards, Rajagopal
On 30 August 2012 13:42, MyungJoo Ham myungjoo.ham@samsung.com wrote:
On 23 August 2012 09:57, MyungJoo Ham myungjoo.ham@samsung.com wrote:
On 20 August 2012 12:50, 함명주 myungjoo.ham@samsung.com wrote:
Prepare devfreq core framework to support devices which can idle. When device idleness is detected perhaps through runtime-pm, need some mechanism to suspend devfreq load monitoring and resume when device is back online. Present code continues monitoring unless device is removed from devfreq core.
This patch introduces following updates,
- move device load monitoring logic to ondemand governor as it is specific to ondemand.
We have this ondemand governor in the mainline devfreq. However, we have (not upstreamed yet) governors with specific purpose (for GPU or for a specific chip) with load monitoring logic. Although we don't have them upstreamed yet, why don't you keep the monitoring logic sharable by other governors. (From today, I'm not objecting to have individual workqueue, but I still don't want to let each active governor reimplement the same things.)
If more governors are planned based on load monitoring, possibly ondemand is not good enough even after adding additional configurable parameters. Could you please describe one such case which demand new governor?
Please find my next comments.
GPU devfreq driver requires behaviors similar to conservative though not exactly same.
In our internal development branch, it increases frequencies differently according to the current usage (# GPU cores used and % each core activated). If it's almost 100% and the queue is filling up, it goes to max. If it's (for example) 98%, it goes to the next faster step. Besides, GPU developers want to differ the poling intervals according to the current frequency.
Another case is with bus + memory-interface case. We are going to use a custom governor for them as each bus and memory-interface has its own DVFS capability while we want them to coordinate. The two drivers may be hooked by the custom governor. We don't have a specific detail with this as it is developed by another division. However, what I know is that they are going to use their own custon governor with ondemand-like behavior in principle.
Ok. Thanks.
Because devfreq is meant to be used by various types of devices, we cannot assume that the default governor is "mostly" fine. Device driver writers can embed governors in their own devfreq driver code ("custom governor").
- devfreq core interacts with governors via events to perform specific actions. These events include start/stop devfreq, and frequency limit changes outside devfreq. This sets ground for adding suspend/resume events.
event_handler with START/STOP/UPDATE seem fine.
However, init() and exit() should be different from START/STOP. We do not need to do init and exit every time when we do runtime suspend/resume.
I agree and its already taken care in this patch.
DEVFREQ_GOV_START : Event for initializing and starting of devfreq governor for a device. Happens only when device is added using devfreq_add_device().
DEVFREQ_GOV_STOP: Event for stopping and uninitialization of devfreq governor for a device. Happens only when device is removed using devfreq_remove_device().
Events introduced in next patch (2/3) - DEVFREQ_GOV_SUSPEND: Event for suspending devfreq of a device. I.e suspending of load monitoring of device in case of ondemand governor. Happens only when devfreq_suspend_device() is called.
DEVFREQ_GOV_RESUME: Event for resuming devfreq of a device. I.e continue load monitoring of already suspended device in case of ondemand governor. Happens only when devfreq_resume_device() is called.
Ah.. suspend/resume events are seperated. Please never mind this.
- use per device work instead of global work to monitor device load. This enables suspend/resume of device devfreq and reduces monitoring code complexity.
It's fine to have a delayed work struct per device.
However, please try to keep as many things/features in devfreq.c as possible; i.e., reduce features and code size of governors. Because, we will be supporting various types of devices with devfreq, there will be various governors and I don't want them to have shared things reimplemented. Dealing with the delayed work at devfreq.c and let governors choose to use it (at its struct) or not should be fine.
Delayed work is of only relevance to ondemand governor - justifies why struct delayed work moved from devfreq.c to governor_simpleondemand.c. When delayed work is optional for governors to use, then let governors handle them if needed instead of devfreq core.
With this change, we will not be forcing next upcoming devfreq governors to use delayed work for load monitoring(if required) but instead flexibility to have their own mechanisms. Its up to the governors to decide how it wants to do dvfs of a device. But imperative to respond the events.
As mentioned above, please do not assume that the default governors (drivers/devfreq/governor_*.c) will be only governors available and ondemand will be the only "active" governor for the future.
I think it'd be better for the driver writes to reuse code than to copy and paste the code.
Some governors may use some other polling/sampling mechanisms; however, still, providing delayed work at the core does not mean that governors must use the default delayed work struct. They may simply not use it as userspace, performance, and powersave do.
If code re-use is only the concern, instead of core handling delayed work which is optional for governors, how about providing load monitoring helper functions to governors? something like,
drivers/devfreq/governor.h
int devfreq_monitor_start(struct devfreq *devfreq); int devfreq_monitor_stop(struct devfreq *devfreq); int devfreq_monitor_suspend(struct devfreq *devfreq); int devfreq_monitor_resume(struct devfreq *devfreq);
Yes, putting a delayed-work struct in struct devfreq and let devfreq.c handle that delayed-work is what I think appropriate, too.
Ok. Next patch will include this.
Anyway, what would be the difference between start vs resume and between stop vs suspend for the per-devfreq-struct delayed-work at the side of devfreq.c? Devfreq.c will simply start/stop the delayed-work with the values in devfreq, won't it? Or would there be any common tasks for suspend and resume that are not going to be required for start/stop? Assuming that start/stop will also be used by userspace input of "polling = some number / 0", it appears that there would be no difference with suspend/resume.
There is difference between start vs resume and stop vs suspend. start will do the delayed work initialization and schedules the work. But, resume will only schedules the work. Likewise, stop will only cancels the work. But suspend will set the state(so that work is not re-scheduled) along with canceling the scheduled work.
These delayed work handling functions will be implemented by devfreq.c and are called from governors in response to events.
With this, governors can re-use the delayed work based load monitoring mechanism and also have flexibility to implement their own monitoring mechanism based on their objectives.
In this patchset, the size of ondemand governor has been enlarged too much for that purpose.
I said this because I do not want governors to act as if they are core and I do not want to make writing governors too difficult for device driver writers. I want them to focus on the "policy" only, not the mechanism.
- Force devfreq users to set min/max supported frequencies in device profile to help governors to predict target frequecy with in limits.
Is this really necessary?
Yes. Mainly for two reasons,
- Devfreq core provides sysfs interface for usespace to set min/max
operating frequency for a device. These values must be within device supported min/max frequency limits.
No. they do not need to be within device supported limits.
When I say "CPU should be running at least 50MHz" when the CPU supports 1000MHz ~ 3000MHz, there is no problem. Running at 1000MHz does not contradict with the condition.
When I say "CPU should be running at least 4000MHz" for the same CPU, It is fine to set the CPU at max available.
- Governors should know device supported min/max opps and use
them where ever needed. Never assume UINT_MAX as max frequency.
The core and governors do not need to know such information. They only need to provide "recommended" frequency. The corresponding driver is going to interpret it properly. Whenever we can, it's better to remove device specific information from governors and cores as long as it does not significantly increase the complexity of drivers.
Sorry, but I disagree on this point. Devfreq core should not consider user input for granted and suggest next recommended target freq which is beyond device supported frequency and let driver to interpret it. I don't think passing device supported min/max from driver would increase the complexity of driver. Rather it would help in recommending freq with in limits.
This information also helps user space to know the device operating limits.
With the optional frequency statistics support proposed by Jonghwa Lee, devfreq.c is going to be accepting the list of operable frequencies, where the max/min values are trivially extracted. With this optional values, we can create sysfs entries to show max/min possible or operable frequencies. (they are difference from the min_freq and max_freq, which are user's limits)
Yes agree.
I still don't understand why enforcing devfreq device drivers to declare min/max frequencies to devfreq core. Because the devfreq device drivers already know the hardware limits (otherwise, we are not able to enforce them to do so), it is not beneficial for the device driver to let devfreq core provide such limits to the device drivers (they already know).
Yes, passing values filtered by min/max values reported by device drivers from core does not increase the complexity at device drivers. However, it adds codes at _every_ governor to filter the values, which is going to be filtered at each device driver anyway. For devices supporting 100, 200, 400 MHz, governors will pass 150, 151, 201, 203, and such even if we filter min/max.
I just don't see a benefic to enforce device drivers to declare min/max to the core and enforce governors to recommend between the restricted values, which are going to be regulated at device drivers anyway. (and very trivial to be regulated by drivers)
If this is meant to show the min/max values to the users, making it optional should be enough.
Besides, allowing users to enter any arbitrary large values (e.g., 9999999999) to make it run at maximum frequency and enter any small values (e.g., 1) to make it run at minimum frequency makes things easier. I don't see why we should get them errors for such inputs.
Cheers! MyungJoo
UINT_MAX means "I recommend to run as fast as possible." Then, the driver only needs to provide the fastest frequency. The core and governor does not need to say the specific frequency.
Cheers! MyungJoo
The devfreq apis are not modified and are kept intact.
The ABIs are not.
You can no longer do "# echo 0 > ABI_path" in order to deactivate.
Only relevant for ondemand governor. Load monitoring can be deactivated when polling_ms is set to zero. Next version of patch will fix this. Thanks.
Cheers! MyungJoo
Signed-off-by: Rajagopal Venkat rajagopal.venkat@linaro.org
ps. please make the patch a bit more readable. (please don't shuffle the location of pre-existed functions)
-- Regards, Rajagopal
--
MyungJoo Ham (함명주), PHD
System S/W Lab, S/W Platform Team, Software Center Samsung Electronics Cell: +82-10-6714-2858
-- Regards, Rajagopal