[RFC PATCH v4 0/4] Consolidate cpuidle timekeeping and irq enabling

List overview All Threads
Download

newer

older

[ACTIVITY] Graphics weekly status...

describing who can see restricted...

Robert Lee

1 Feb 2012 1 Feb '12

3 a.m.

This patch series moves the timekeeping and irq enabling from the platform code to the core cpuidle driver. Also, the platform irq disabling was removed as it appears that all calls into cpuidle_call_idle will have already called local_irq_disable().

To save reviewers time, only a few platforms which required the most changes are included in this version. If these changes are approved, the next version will include the remaining platform code which should require minimal changes.

For those who have followed the previous patch versions, as you know, the previous version of this patch series added some helper functionality which used a wrapper function to remove the time keeping and irq enabling/disabling from the platform code. There was also initialization helper functionality which has now been removed from this version. If the basic implementation in this version is approved, then a separate patch submission effort can be made to focus on consolidation of initialziation functionality.

Based on 3.3-rc1

v3 submission can be found here: http://www.spinics.net/lists/arm-kernel/msg156751.html Changes since v3: * Removed drivers/cpuidle/common.c ** Removed the initialization helper functions ** Removed the wrapper used to consolidate time keeping and irq enable/disable * Add time keeping and local_irq_disable handling in cpuidle_call_idle(). * Made necessary modifications to a few platforms that required the most changes ** Note on omap3: changed structure of omap3_idle_drvdata and added per_next_state and per_saved_state vars to accomodate new framework.

v2 submission can be found here: http://comments.gmane.org/gmane.linux.ports.arm.kernel/144199

Changes since v2: * Made various code organization and style changes as suggested in v1 review. * Removed at91 use of common code. A separate effort is underway to clean at91 code and the author has offered to convert to common interface as part of those changes (if this common interface is accepted in time). * Made platform cpuidle_driver objects __initdata and dynamically added one persistent instance of this object in common code. * Removed imx5 pm usage of gpc_dvfs clock as it is no longer needed after being enabled during clock initialization. * Re-organized patches.

v1 submission can be found here: http://comments.gmane.org/gmane.linux.ports.arm.kernel/142791

Changes since v1: * Common interface moved to drivers/cpuidle and made non arch-specific. * Made various fixes and suggested additions to the common cpuidle code from v1 review. * Added callback for filling in driver_data field as needed. * Modified the various platforms with these changes.

Robert Lee (4): cpuidle: Add time keeping and irq enabling ARM: omap: Remove cpuidle timekeeping and irq enable/disable acpi: Remove cpuidle timekeeping and irq enable/disable x86: Remove cpuidle timekeeping and irq enable/disable

arch/arm/mach-omap2/cpuidle34xx.c | 96 ++++++++---------- drivers/acpi/processor_idle.c | 203 ++++++++++++++++++++++--------------- drivers/cpuidle/cpuidle.c | 75 +++++++++++--- drivers/idle/intel_idle.c | 110 ++++++++++++++------ include/linux/cpuidle.h | 26 +++-- 5 files changed, 317 insertions(+), 193 deletions(-)

Show replies by date

Robert Lee

1 Feb 1 Feb

3 a.m.

New subject: [RFC PATCH v4 1/4] cpuidle: Add time keeping and irq enabling

Make necessary changes to add implement time keepign and irq enabling in the core cpuidle code. This will allow the remove of these functionalities from the platform cpuidle implementations.

Signed-off-by: Robert Lee rob.lee@linaro.org --- drivers/cpuidle/cpuidle.c | 75 +++++++++++++++++++++++++++++++++++--------- include/linux/cpuidle.h | 26 ++++++++++----- 2 files changed, 76 insertions(+), 25 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c index 59f4261..8ea0fc3 100644 --- a/drivers/cpuidle/cpuidle.c +++ b/drivers/cpuidle/cpuidle.c @@ -57,14 +57,18 @@ static int __cpuidle_register_device(struct cpuidle_device *dev); * cpuidle_idle_call - the main idle loop * * NOTE: no locks or semaphores should be used here + * NOTE: Should only be called from a local irq disabled context * return non-zero on failure + * */ int cpuidle_idle_call(void) { struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices); struct cpuidle_driver *drv = cpuidle_get_driver(); struct cpuidle_state *target_state; - int next_state, entered_state; + int idx, ret = 0; + ktime_t t1, t2; + s64 diff;

if (off) return -ENODEV; @@ -86,37 +90,76 @@ int cpuidle_idle_call(void) #endif

/* ask the governor for the next state */ - next_state = cpuidle_curr_governor->select(drv, dev); + idx = cpuidle_curr_governor->select(drv, dev); + + target_state = &drv->states[idx]; + + /* + * Check with the device to see if it can enter this state or if another + * state should be used. + */ + if (target_state->pre_enter) { + idx = target_state-> + pre_enter(dev, drv, idx); + } + + if (idx < 0) { + local_irq_enable(); + return idx; + } + if (need_resched()) { local_irq_enable(); - return 0; + return -EBUSY; }

- target_state = &drv->states[next_state]; + target_state = &drv->states[idx];

- trace_power_start(POWER_CSTATE, next_state, dev->cpu); - trace_cpu_idle(next_state, dev->cpu); + if ((target_state->flags & CPUIDLE_FLAG_TIME_VALID)) + t1 = ktime_get();

- entered_state = target_state->enter(dev, drv, next_state); + trace_power_start(POWER_CSTATE, idx, dev->cpu); + trace_cpu_idle(idx, dev->cpu); + + idx = target_state->enter(dev, drv, idx);

trace_power_end(dev->cpu); trace_cpu_idle(PWR_EVENT_EXIT, dev->cpu);

- if (entered_state >= 0) { - /* Update cpuidle counters */ - /* This can be moved to within driver enter routine - * but that results in multiple copies of same code. - */ - dev->states_usage[entered_state].time += + if (idx < 0) { + local_irq_enable(); + return idx; + } + + if (likely(target_state->flags & drv->states[idx].flags & + CPUIDLE_FLAG_TIME_VALID)) + t2 = ktime_get(); + + local_irq_enable(); + + if (target_state->post_enter) + target_state->post_enter(dev, drv, idx); + + if (likely(target_state->flags & drv->states[idx].flags & + CPUIDLE_FLAG_TIME_VALID)) { + + diff = ktime_to_us(ktime_sub(t2, t1)); + if (diff > INT_MAX) + diff = INT_MAX; + + dev->last_residency = (int) diff; + + dev->states_usage[idx].time += (unsigned long long)dev->last_residency; - dev->states_usage[entered_state].usage++; }

+ dev->states_usage[idx].usage++; + /* give the governor an opportunity to reflect on the outcome */ if (cpuidle_curr_governor->reflect) - cpuidle_curr_governor->reflect(dev, entered_state); + cpuidle_curr_governor->reflect(dev, idx);

- return 0; + return ret; }

/** diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h index 712abcc..8154f60 100644 --- a/include/linux/cpuidle.h +++ b/include/linux/cpuidle.h @@ -38,17 +38,25 @@ struct cpuidle_state_usage { };

struct cpuidle_state { - char name[CPUIDLE_NAME_LEN]; - char desc[CPUIDLE_DESC_LEN]; + char name[CPUIDLE_NAME_LEN]; + char desc[CPUIDLE_DESC_LEN]; + + unsigned int flags; + unsigned int exit_latency; /* in US */ + unsigned int power_usage; /* in mW */ + unsigned int target_residency; /* in US */ + + int (*pre_enter) (struct cpuidle_device *dev, + struct cpuidle_driver *drv, + int index);

- unsigned int flags; - unsigned int exit_latency; /* in US */ - unsigned int power_usage; /* in mW */ - unsigned int target_residency; /* in US */ + int (*enter) (struct cpuidle_device *dev, + struct cpuidle_driver *drv, + int index);

- int (*enter) (struct cpuidle_device *dev, - struct cpuidle_driver *drv, - int index); + int (*post_enter) (struct cpuidle_device *dev, + struct cpuidle_driver *drv, + int index); };

/* Idle State Flags */

-- 1.7.1

Colin Cross

4 Feb 4 Feb

7:02 p.m.

New subject: [RFC PATCH v4 1/4] cpuidle: Add time keeping and irq enabling

On Tue, Jan 31, 2012 at 7:00 PM, Robert Lee rob.lee@linaro.org wrote:

...

Make necessary changes to add implement time keepign and irq enabling

keeping

...

in the core cpuidle code. This will allow the remove of these functionalities from the platform cpuidle implementations.

Signed-off-by: Robert Lee rob.lee@linaro.org

drivers/cpuidle/cpuidle.c | 75 +++++++++++++++++++++++++++++++++++--------- include/linux/cpuidle.h | 26 ++++++++++----- 2 files changed, 76 insertions(+), 25 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c index 59f4261..8ea0fc3 100644 --- a/drivers/cpuidle/cpuidle.c +++ b/drivers/cpuidle/cpuidle.c @@ -57,14 +57,18 @@ static int __cpuidle_register_device(struct cpuidle_device *dev); * cpuidle_idle_call - the main idle loop * * NOTE: no locks or semaphores should be used here

NOTE: Should only be called from a local irq disabled context

* return non-zero on failure

*/ int cpuidle_idle_call(void) { struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices); struct cpuidle_driver *drv = cpuidle_get_driver(); struct cpuidle_state *target_state;

int next_state, entered_state;

int idx, ret = 0;

ktime_t t1, t2;

s64 diff;

if (off) return -ENODEV; @@ -86,37 +90,76 @@ int cpuidle_idle_call(void) #endif

/* ask the governor for the next state */

next_state = cpuidle_curr_governor->select(drv, dev);

idx = cpuidle_curr_governor->select(drv, dev);

target_state = &drv->states[idx];

/*

* Check with the device to see if it can enter this state or if another

* state should be used.

*/

if (target_state->pre_enter) {

idx = target_state->

pre_enter(dev, drv, idx);

Unnecessary line wrap and braces.

What's the point of the pre_enter call? This seems very similar to the prepare call that was removed in 3.2. Drivers can already demote the target state in their enter call. The only thing you do between pre_enter and enter is trace and account for the time. Is there some long call you don't want included in the idle time? Some documentation would help, and you need to very clearly define the semantics of when post_enter gets called when pre_enter or enter return errors.

Turquette, Mike

10:06 p.m.

New subject: [RFC PATCH v4 1/4] cpuidle: Add time keeping and irq enabling

On Sat, Feb 4, 2012 at 11:02 AM, Colin Cross ccross@google.com wrote:

...

On Tue, Jan 31, 2012 at 7:00 PM, Robert Lee rob.lee@linaro.org wrote:

...
Make necessary changes to add implement time keepign and irq enabling

keeping

...
in the core cpuidle code. This will allow the remove of these functionalities from the platform cpuidle implementations.

Signed-off-by: Robert Lee rob.lee@linaro.org

drivers/cpuidle/cpuidle.c | 75 +++++++++++++++++++++++++++++++++++--------- include/linux/cpuidle.h | 26 ++++++++++----- 2 files changed, 76 insertions(+), 25 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c index 59f4261..8ea0fc3 100644 --- a/drivers/cpuidle/cpuidle.c +++ b/drivers/cpuidle/cpuidle.c @@ -57,14 +57,18 @@ static int __cpuidle_register_device(struct cpuidle_device *dev); * cpuidle_idle_call - the main idle loop * * NOTE: no locks or semaphores should be used here

NOTE: Should only be called from a local irq disabled context

* return non-zero on failure

*/ int cpuidle_idle_call(void) { struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices); struct cpuidle_driver *drv = cpuidle_get_driver(); struct cpuidle_state *target_state;

int next_state, entered_state;

int idx, ret = 0;

ktime_t t1, t2;

s64 diff;

if (off) return -ENODEV; @@ -86,37 +90,76 @@ int cpuidle_idle_call(void) #endif

/* ask the governor for the next state */

next_state = cpuidle_curr_governor->select(drv, dev);

idx = cpuidle_curr_governor->select(drv, dev);

target_state = &drv->states[idx];

/*

* Check with the device to see if it can enter this state or if another

* state should be used.

*/

if (target_state->pre_enter) {

idx = target_state->

pre_enter(dev, drv, idx);

Unnecessary line wrap and braces.

What's the point of the pre_enter call? This seems very similar to the prepare call that was removed in 3.2. Drivers can already demote

Hi Colin,

I asked Rob to re-introduce the .prepare callback (not .pre_enter).

The short version of why I requested this is so that I can experiment with modifying wake-up latency and theshold values dynamically based on PM QoS constraints. For an OMAP-specific example, I'd like to see our C-states no longer model both MPU and CORE power domains, and instead only model the MPU. Then when entering idle the PM QoS constraints against the CORE power domain's wake-up latency can be considered in the .prepare callback which will affect the C-state parameters as well as program the CORE low power target state.

.pre_enter isn't really right for the above case so I'm happy for it to be dropped, but I'll probably re-introduce something like .prepare in the future...

Regards, Mike

...

the target state in their enter call. The only thing you do between pre_enter and enter is trace and account for the time. Is there some long call you don't want included in the idle time? Some documentation would help, and you need to very clearly define the semantics of when post_enter gets called when pre_enter or enter return errors.

linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

Colin Cross

5 Feb 5 Feb

1:36 a.m.

New subject: [RFC PATCH v4 1/4] cpuidle: Add time keeping and irq enabling

On Sat, Feb 4, 2012 at 2:06 PM, Turquette, Mike mturquette@ti.com wrote:

...

On Sat, Feb 4, 2012 at 11:02 AM, Colin Cross ccross@google.com wrote:

...
What's the point of the pre_enter call? This seems very similar to the prepare call that was removed in 3.2. Drivers can already demote

Hi Colin,

I asked Rob to re-introduce the .prepare callback (not .pre_enter).

The short version of why I requested this is so that I can experiment with modifying wake-up latency and theshold values dynamically based on PM QoS constraints. For an OMAP-specific example, I'd like to see our C-states no longer model both MPU and CORE power domains, and instead only model the MPU. Then when entering idle the PM QoS constraints against the CORE power domain's wake-up latency can be considered in the .prepare callback which will affect the C-state parameters as well as program the CORE low power target state.

prepare makes sense to adjust latencies, as long as it is not used for state demotion as well.

...

.pre_enter isn't really right for the above case so I'm happy for it to be dropped, but I'll probably re-introduce something like .prepare in the future...

Regards, Mike

...
the target state in their enter call. The only thing you do between pre_enter and enter is trace and account for the time. Is there some long call you don't want included in the idle time? Some documentation would help, and you need to very clearly define the semantics of when post_enter gets called when pre_enter or enter return errors.

linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

Turquette, Mike

1:48 a.m.

New subject: [RFC PATCH v4 1/4] cpuidle: Add time keeping and irq enabling

On Sat, Feb 4, 2012 at 5:36 PM, Colin Cross ccross@google.com wrote:

...

On Sat, Feb 4, 2012 at 2:06 PM, Turquette, Mike mturquette@ti.com wrote:

...
On Sat, Feb 4, 2012 at 11:02 AM, Colin Cross ccross@google.com wrote:

...
What's the point of the pre_enter call? This seems very similar to the prepare call that was removed in 3.2. Drivers can already demote

Hi Colin,

I asked Rob to re-introduce the .prepare callback (not .pre_enter).

The short version of why I requested this is so that I can experiment with modifying wake-up latency and theshold values dynamically based on PM QoS constraints. For an OMAP-specific example, I'd like to see our C-states no longer model both MPU and CORE power domains, and instead only model the MPU. Then when entering idle the PM QoS constraints against the CORE power domain's wake-up latency can be considered in the .prepare callback which will affect the C-state parameters as well as program the CORE low power target state.

prepare makes sense to adjust latencies, as long as it is not used for state demotion as well.

Latency adjustment is the plan. State demotion is not.

Rob,

If you cook up another version of this series feel free to drop .pre_enter. It would be nice if you re-introduced .prepare as per our original discussion but if that's a roadblock then you can forget that point too and I'll take a crack at it later.

Regards, Mike

...

...
.pre_enter isn't really right for the above case so I'm happy for it to be dropped, but I'll probably re-introduce something like .prepare in the future...

Regards, Mike

...
the target state in their enter call. The only thing you do between pre_enter and enter is trace and account for the time. Is there some long call you don't want included in the idle time? Some documentation would help, and you need to very clearly define the semantics of when post_enter gets called when pre_enter or enter return errors.

linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

Rob Lee

6 Feb 6 Feb

4:38 p.m.

New subject: [RFC PATCH v4 1/4] cpuidle: Add time keeping and irq enabling

On Sat, Feb 4, 2012 at 4:06 PM, Turquette, Mike mturquette@ti.com wrote:

...

On Sat, Feb 4, 2012 at 11:02 AM, Colin Cross ccross@google.com wrote:

...
On Tue, Jan 31, 2012 at 7:00 PM, Robert Lee rob.lee@linaro.org wrote:

...
Make necessary changes to add implement time keepign and irq enabling

keeping

...
in the core cpuidle code. This will allow the remove of these functionalities from the platform cpuidle implementations.

Signed-off-by: Robert Lee rob.lee@linaro.org

drivers/cpuidle/cpuidle.c | 75 +++++++++++++++++++++++++++++++++++--------- include/linux/cpuidle.h | 26 ++++++++++----- 2 files changed, 76 insertions(+), 25 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c index 59f4261..8ea0fc3 100644 --- a/drivers/cpuidle/cpuidle.c +++ b/drivers/cpuidle/cpuidle.c @@ -57,14 +57,18 @@ static int __cpuidle_register_device(struct cpuidle_device *dev); * cpuidle_idle_call - the main idle loop * * NOTE: no locks or semaphores should be used here

NOTE: Should only be called from a local irq disabled context

* return non-zero on failure

*/ int cpuidle_idle_call(void) { struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices); struct cpuidle_driver *drv = cpuidle_get_driver(); struct cpuidle_state *target_state;

int next_state, entered_state;

int idx, ret = 0;

ktime_t t1, t2;

s64 diff;

if (off) return -ENODEV; @@ -86,37 +90,76 @@ int cpuidle_idle_call(void) #endif

/* ask the governor for the next state */

next_state = cpuidle_curr_governor->select(drv, dev);

idx = cpuidle_curr_governor->select(drv, dev);

target_state = &drv->states[idx];

/*

* Check with the device to see if it can enter this state or if another

* state should be used.

*/

if (target_state->pre_enter) {

idx = target_state->

pre_enter(dev, drv, idx);

Unnecessary line wrap and braces.

What's the point of the pre_enter call? This seems very similar to the prepare call that was removed in 3.2. Drivers can already demote

Hi Colin,

I asked Rob to re-introduce the .prepare callback (not .pre_enter).

The short version of why I requested this is so that I can experiment with modifying wake-up latency and theshold values dynamically based on PM QoS constraints. For an OMAP-specific example, I'd like to see our C-states no longer model both MPU and CORE power domains, and instead only model the MPU. Then when entering idle the PM QoS constraints against the CORE power domain's wake-up latency can be considered in the .prepare callback which will affect the C-state parameters as well as program the CORE low power target state.

.pre_enter isn't really right for the above case so I'm happy for it to be dropped, but I'll probably re-introduce something like .prepare in the future...

Hey Mike, yes, this pre_enter has it's own purpose that appears separate from the reasoning for the prepare function you mention here. Perhaps it would be best to re-add the prepare function and make use of it in the OMAP code all in one patch series so that it doesn't get removed again due to apparent lack of use in the upstream kernel. I'm sure we'll discuss it further this week in person.

...

Regards, Mike

...
the target state in their enter call. The only thing you do between pre_enter and enter is trace and account for the time. Is there some long call you don't want included in the idle time? Some documentation would help, and you need to very clearly define the semantics of when post_enter gets called when pre_enter or enter return errors.

linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

-- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html

Rob Lee

5:14 p.m.

New subject: [RFC PATCH v4 1/4] cpuidle: Add time keeping and irq enabling

Hello Colin, thanks for the review.

On Sat, Feb 4, 2012 at 1:02 PM, Colin Cross ccross@google.com wrote:

...

On Tue, Jan 31, 2012 at 7:00 PM, Robert Lee rob.lee@linaro.org wrote:

...
Make necessary changes to add implement time keepign and irq enabling

keeping

...
in the core cpuidle code. This will allow the remove of these functionalities from the platform cpuidle implementations.

Signed-off-by: Robert Lee rob.lee@linaro.org

drivers/cpuidle/cpuidle.c | 75 +++++++++++++++++++++++++++++++++++--------- include/linux/cpuidle.h | 26 ++++++++++----- 2 files changed, 76 insertions(+), 25 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c index 59f4261..8ea0fc3 100644 --- a/drivers/cpuidle/cpuidle.c +++ b/drivers/cpuidle/cpuidle.c @@ -57,14 +57,18 @@ static int __cpuidle_register_device(struct cpuidle_device *dev); * cpuidle_idle_call - the main idle loop * * NOTE: no locks or semaphores should be used here

NOTE: Should only be called from a local irq disabled context

* return non-zero on failure

*/ int cpuidle_idle_call(void) { struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices); struct cpuidle_driver *drv = cpuidle_get_driver(); struct cpuidle_state *target_state;

int next_state, entered_state;

int idx, ret = 0;

ktime_t t1, t2;

s64 diff;

if (off) return -ENODEV; @@ -86,37 +90,76 @@ int cpuidle_idle_call(void) #endif

/* ask the governor for the next state */

next_state = cpuidle_curr_governor->select(drv, dev);

idx = cpuidle_curr_governor->select(drv, dev);

target_state = &drv->states[idx];

/*

* Check with the device to see if it can enter this state or if another

* state should be used.

*/

if (target_state->pre_enter) {

idx = target_state->

pre_enter(dev, drv, idx);

Unnecessary line wrap and braces.

Thanks.

...

What's the point of the pre_enter call? This seems very similar to the prepare call that was removed in 3.2. Drivers can already demote the target state in their enter call. The only thing you do between pre_enter and enter is trace and account for the time. Is there some long call you don't want included in the idle time? Some documentation would help, and you need to very clearly define the semantics of when post_enter gets called when pre_enter or enter return errors.

pre_enter's purpose is to perform any necessary platform technology that can be performed before entering that actual idle or some restricted idle context where only platform specific code can run. If I try to consolidate the timekeeping in the core cpuidle driver without pre_enter, then my idle time will incorrectly account for this platform preparation time. Perhaps this small idle time accounting error isn't of concern to anyone though.

The previous version of this patch submission used a different approach by adding a wrapper that could be used by fairly simple cpuidle implementations. In the v3 submission of this patch series, Daniel asked about the possibility of just performing this consolidation in the cpuidle_idle_call function itself. Instead of debating the pros and cons of it, I thought it might be better to send this version of the patch and discuss it further.

Another approach I thought about may be to add flags that indicate whether or not the platform or the core driver should perform the time keeping. Or, go back to the wrapper function approach. Please feel free to give your opinion on the preferred approach.

Agree on the documentation. I was trying to just get this patch series out there for discussion before spending more time on it in case it is not the desired approach to timekeeping and irq disabling consolidation.

Robert Lee

1 Feb 1 Feb

3 a.m.

New subject: [RFC PATCH v4 2/4] ARM: omap: Remove cpuidle timekeeping and irq enable/disable

Now that the core cpuidle driver keeps time and handles irq enabling, remove this functionality. Also, remove irq disabling as all paths to cpuidle_idle_call already call local_irq_disable. Also, restructure idle functions as needed by the cpuidle core driver changes.

Signed-off-by: Robert Lee rob.lee@linaro.org --- arch/arm/mach-omap2/cpuidle34xx.c | 96 ++++++++++++++++-------------------- 1 files changed, 43 insertions(+), 53 deletions(-)

diff --git a/arch/arm/mach-omap2/cpuidle34xx.c b/arch/arm/mach-omap2/cpuidle34xx.c index 464cffd..9ecded5 100644 --- a/arch/arm/mach-omap2/cpuidle34xx.c +++ b/arch/arm/mach-omap2/cpuidle34xx.c @@ -69,7 +69,14 @@ struct omap3_idle_statedata { u32 core_state; u8 valid; }; -struct omap3_idle_statedata omap3_idle_data[OMAP3_NUM_STATES]; + +struct omap3_idle_drvdata { + struct omap3_idle_statedata state_data[OMAP3_NUM_STATES]; + u32 per_saved_state; + u32 per_next_state; +}; + +static struct omap3_idle_drvdata omap3_idle_data;

struct powerdomain *mpu_pd, *core_pd, *per_pd, *cam_pd;

@@ -100,23 +107,20 @@ static int omap3_enter_idle(struct cpuidle_device *dev, struct cpuidle_driver *drv, int index) { - struct omap3_idle_statedata *cx = - cpuidle_get_statedata(&dev->states_usage[index]); - struct timespec ts_preidle, ts_postidle, ts_idle; - u32 mpu_state = cx->mpu_state, core_state = cx->core_state; - int idle_time; + struct omap3_idle_drvdata *dd = dev->states_usage[index].driver_data

- /* Used to keep track of the total time in idle */ - getnstimeofday(&ts_preidle); + u32 mpu_state = dd->state_data[index].mpu_state; + u32 core_state = dd->state_data[index].core_state;

- local_irq_disable(); local_fiq_disable();

pwrdm_set_next_pwrst(mpu_pd, mpu_state); pwrdm_set_next_pwrst(core_pd, core_state);

- if (omap_irq_pending() || need_resched()) - goto return_sleep_time; + if (omap_irq_pending() || need_resched()) { + index = -EBUSY; + goto leave; + }

/* Deny idle for C1 */ if (index == 0) { @@ -147,18 +151,12 @@ static int omap3_enter_idle(struct cpuidle_device *dev, pwrdm_for_each_clkdm(core_pd, _cpuidle_allow_idle); }

-return_sleep_time: - getnstimeofday(&ts_postidle); - ts_idle = timespec_sub(ts_postidle, ts_preidle); - - local_irq_enable(); +leave: local_fiq_enable();

- idle_time = ts_idle.tv_nsec / NSEC_PER_USEC + ts_idle.tv_sec * \ - USEC_PER_SEC; - - /* Update cpuidle counters */ - dev->last_residency = idle_time; + /* Restore original PER state if it was modified */ + if (dd->per_next_state != dd->per_saved_state) + pwrdm_set_next_pwrst(per_pd, dd->per_saved_state);

return index; } @@ -180,9 +178,10 @@ static int next_valid_state(struct cpuidle_device *dev, struct cpuidle_driver *drv, int index) { - struct cpuidle_state_usage *curr_usage = &dev->states_usage[index]; struct cpuidle_state *curr = &drv->states[index]; - struct omap3_idle_statedata *cx = cpuidle_get_statedata(curr_usage); + struct omap3_idle_drvdata *dd = dev->states_usage[index].driver_data; + struct omap3_idle_statedata *cx = &dd->state_data[index]; + u32 mpu_deepest_state = PWRDM_POWER_RET; u32 core_deepest_state = PWRDM_POWER_RET; int next_index = -1; @@ -223,7 +222,8 @@ static int next_valid_state(struct cpuidle_device *dev, */ idx--; for (; idx >= 0; idx--) { - cx = cpuidle_get_statedata(&dev->states_usage[idx]); + dd = dev->states_usage[idx].driver_data; + cx = &dd->state_data[idx]; if ((cx->valid) && (cx->mpu_state >= mpu_deepest_state) && (cx->core_state >= core_deepest_state)) { @@ -255,8 +255,9 @@ static int omap3_enter_idle_bm(struct cpuidle_device *dev, int index) { int new_state_idx; - u32 core_next_state, per_next_state = 0, per_saved_state = 0, cam_state; - struct omap3_idle_statedata *cx; + u32 core_next_state, cam_state; + struct omap3_idle_drvdata *dd = dev->states_usage[index].driver_data; + struct omap3_idle_statedata *cx = &dd->state_data[index]; int ret;

/* @@ -264,10 +265,8 @@ static int omap3_enter_idle_bm(struct cpuidle_device *dev, * CAM does not have wakeup capability in OMAP3. */ cam_state = pwrdm_read_pwrst(cam_pd); - if (cam_state == PWRDM_POWER_ON) { - new_state_idx = drv->safe_state_index; - goto select_state; - } + if (cam_state == PWRDM_POWER_ON) + return drv->safe_state_index;

/* * FIXME: we currently manage device-specific idle states @@ -281,27 +280,20 @@ static int omap3_enter_idle_bm(struct cpuidle_device *dev, * Prevent PER off if CORE is not in retention or off as this * would disable PER wakeups completely. */ - cx = cpuidle_get_statedata(&dev->states_usage[index]); core_next_state = cx->core_state; - per_next_state = per_saved_state = pwrdm_read_next_pwrst(per_pd); - if ((per_next_state == PWRDM_POWER_OFF) && - (core_next_state > PWRDM_POWER_RET)) - per_next_state = PWRDM_POWER_RET; - - /* Are we changing PER target state? */ - if (per_next_state != per_saved_state) - pwrdm_set_next_pwrst(per_pd, per_next_state);

- new_state_idx = next_valid_state(dev, drv, index); + dd->per_next_state = dd->per_saved_state = + pwrdm_read_next_pwrst(per_pd);

-select_state: - ret = omap3_enter_idle(dev, drv, new_state_idx); + if ((dd->per_next_state == PWRDM_POWER_OFF) && + (core_next_state > PWRDM_POWER_RET)) + dd->per_next_state = PWRDM_POWER_RET;

- /* Restore original PER state if it was modified */ - if (per_next_state != per_saved_state) - pwrdm_set_next_pwrst(per_pd, per_saved_state); + /* Are we changing PER target state? */ + if (dd->per_next_state != dd->per_saved_state) + pwrdm_set_next_pwrst(per_pd, dd->per_next_state);

- return ret; + return next_valid_state(dev, drv, index); }

DEFINE_PER_CPU(struct cpuidle_device, omap3_idle_dev); @@ -337,7 +329,8 @@ static inline void _fill_cstate(struct cpuidle_driver *drv, state->exit_latency = cpuidle_params_table[idx].exit_latency; state->target_residency = cpuidle_params_table[idx].target_residency; state->flags = CPUIDLE_FLAG_TIME_VALID; - state->enter = omap3_enter_idle_bm; + state->pre_enter = omap3_enter_idle_bm; + state->enter = omap3_enter_idle; sprintf(state->name, "C%d", idx + 1); strncpy(state->desc, descr, CPUIDLE_DESC_LEN);

@@ -348,13 +341,10 @@ static inline struct omap3_idle_statedata *_fill_cstate_usage( struct cpuidle_device *dev, int idx) { - struct omap3_idle_statedata *cx = &omap3_idle_data[idx]; - struct cpuidle_state_usage *state_usage = &dev->states_usage[idx]; - - cx->valid = cpuidle_params_table[idx].valid; - cpuidle_set_statedata(state_usage, cx); + omap3_idle_data.state_data[idx].valid = cpuidle_params_table[idx].valid; + cpuidle_set_statedata(&dev->states_usage[idx], &omap3_idle_data);

- return cx; + return &omap3_idle_data.state_data[idx]; }

/**

-- 1.7.1

Jean Pihet

2 Feb 2 Feb

4:21 p.m.

New subject: [RFC PATCH v4 2/4] ARM: omap: Remove cpuidle timekeeping and irq enable/disable

Rob,

On Wed, Feb 1, 2012 at 4:00 AM, Robert Lee rob.lee@linaro.org wrote:

...

Now that the core cpuidle driver keeps time and handles irq enabling, remove this functionality. Also, remove irq disabling as all paths to cpuidle_idle_call already call local_irq_disable. Also, restructure idle functions as needed by the cpuidle core driver changes.

Signed-off-by: Robert Lee rob.lee@linaro.org

arch/arm/mach-omap2/cpuidle34xx.c | 96 ++++++++++++++++-------------------- 1 files changed, 43 insertions(+), 53 deletions(-)

diff --git a/arch/arm/mach-omap2/cpuidle34xx.c b/arch/arm/mach-omap2/cpuidle34xx.c index 464cffd..9ecded5 100644 --- a/arch/arm/mach-omap2/cpuidle34xx.c +++ b/arch/arm/mach-omap2/cpuidle34xx.c

...

@@ -100,23 +107,20 @@ static int omap3_enter_idle(struct cpuidle_device *dev, struct cpuidle_driver *drv, int index) {

struct omap3_idle_statedata *cx =

cpuidle_get_statedata(&dev->states_usage[index]);

struct timespec ts_preidle, ts_postidle, ts_idle;

u32 mpu_state = cx->mpu_state, core_state = cx->core_state;

int idle_time;

struct omap3_idle_drvdata *dd = dev->states_usage[index].driver_data

A build error is triggered by the missing ";".

...

@@ -147,18 +151,12 @@ static int omap3_enter_idle(struct cpuidle_device *dev, pwrdm_for_each_clkdm(core_pd, _cpuidle_allow_idle); }

-return_sleep_time:

getnstimeofday(&ts_postidle);

ts_idle = timespec_sub(ts_postidle, ts_preidle);

local_irq_enable();

+leave: local_fiq_enable();

idle_time = ts_idle.tv_nsec / NSEC_PER_USEC + ts_idle.tv_sec * \

USEC_PER_SEC;

/* Update cpuidle counters */

dev->last_residency = idle_time;

/* Restore original PER state if it was modified */

if (dd->per_next_state != dd->per_saved_state)

pwrdm_set_next_pwrst(per_pd, dd->per_saved_state);

This code is not necessarily balanced with the PER state change in omap3_enter_idle_bm (cf. "/* Are we changing PER target state? */" here below), since in the core cpuidle code there is no guarantee that the calls to pre_enter and enter callbacks are balanced. In general I fear that splitting the code in two functions introduces a risk of programming non-coherent settings in the PRCM.

...

@@ -255,8 +255,9 @@ static int omap3_enter_idle_bm(struct cpuidle_device *dev, int index) { int new_state_idx;

u32 core_next_state, per_next_state = 0, per_saved_state = 0, cam_state;

struct omap3_idle_statedata *cx;

u32 core_next_state, cam_state;

struct omap3_idle_drvdata *dd = dev->states_usage[index].driver_data;

struct omap3_idle_statedata *cx = &dd->state_data[index];

int ret;

The build throws a few warnings about unused variables:

arch/arm/mach-omap2/cpuidle34xx.c: In function 'omap3_enter_idle_bm': arch/arm/mach-omap2/cpuidle34xx.c:261: warning: unused variable 'ret' arch/arm/mach-omap2/cpuidle34xx.c:257: warning: unused variable 'new_state_idx'

...

DEFINE_PER_CPU(struct cpuidle_device, omap3_idle_dev); @@ -337,7 +329,8 @@ static inline void _fill_cstate(struct cpuidle_driver *drv, state->exit_latency = cpuidle_params_table[idx].exit_latency; state->target_residency = cpuidle_params_table[idx].target_residency; state->flags = CPUIDLE_FLAG_TIME_VALID;

state->enter = omap3_enter_idle_bm;

state->pre_enter = omap3_enter_idle_bm;

state->enter = omap3_enter_idle;

sprintf(state->name, "C%d", idx + 1); strncpy(state->desc, descr, CPUIDLE_DESC_LEN);

...

Also the line at 373 is not needed anymore in omap3_idle_init, since the enter callback is filled in in the _fill_cstate function: /* C1 . MPU WFI + Core active */ _fill_cstate(drv, 0, "MPU ON + CORE ON"); (&drv->states[0])->enter = omap3_enter_idle; <== not needed anymore drv->safe_state_index = 0;

More testing on OMAP3 is needed. Let me come back with the results asap.

Regards, Jean

Rob Lee

5:35 p.m.

New subject: [RFC PATCH v4 2/4] ARM: omap: Remove cpuidle timekeeping and irq enable/disable

On Thu, Feb 2, 2012 at 10:21 AM, Jean Pihet jean.pihet@newoldbits.com wrote:

...

Rob,

On Wed, Feb 1, 2012 at 4:00 AM, Robert Lee rob.lee@linaro.org wrote:

...
Now that the core cpuidle driver keeps time and handles irq enabling, remove this functionality. Also, remove irq disabling as all paths to cpuidle_idle_call already call local_irq_disable. Also, restructure idle functions as needed by the cpuidle core driver changes.

Signed-off-by: Robert Lee rob.lee@linaro.org

arch/arm/mach-omap2/cpuidle34xx.c | 96 ++++++++++++++++-------------------- 1 files changed, 43 insertions(+), 53 deletions(-)

diff --git a/arch/arm/mach-omap2/cpuidle34xx.c b/arch/arm/mach-omap2/cpuidle34xx.c index 464cffd..9ecded5 100644 --- a/arch/arm/mach-omap2/cpuidle34xx.c +++ b/arch/arm/mach-omap2/cpuidle34xx.c

...

...
@@ -100,23 +107,20 @@ static int omap3_enter_idle(struct cpuidle_device *dev, struct cpuidle_driver *drv, int index) {

struct omap3_idle_statedata *cx =

cpuidle_get_statedata(&dev->states_usage[index]);

struct timespec ts_preidle, ts_postidle, ts_idle;

u32 mpu_state = cx->mpu_state, core_state = cx->core_state;

int idle_time;

struct omap3_idle_drvdata *dd = dev->states_usage[index].driver_data

A build error is triggered by the missing ";".

Argh, last minute change and I didn't build afterward.

...

...

...
@@ -147,18 +151,12 @@ static int omap3_enter_idle(struct cpuidle_device *dev, pwrdm_for_each_clkdm(core_pd, _cpuidle_allow_idle); }

-return_sleep_time:

getnstimeofday(&ts_postidle);

ts_idle = timespec_sub(ts_postidle, ts_preidle);

local_irq_enable();

+leave: local_fiq_enable();

idle_time = ts_idle.tv_nsec / NSEC_PER_USEC + ts_idle.tv_sec * \

USEC_PER_SEC;

/* Update cpuidle counters */

dev->last_residency = idle_time;

/* Restore original PER state if it was modified */

if (dd->per_next_state != dd->per_saved_state)

pwrdm_set_next_pwrst(per_pd, dd->per_saved_state);

This code is not necessarily balanced with the PER state change in omap3_enter_idle_bm (cf. "/* Are we changing PER target state? */" here below), since in the core cpuidle code there is no guarantee that the calls to pre_enter and enter callbacks are balanced. In general I fear that splitting the code in two functions introduces a risk of programming non-coherent settings in the PRCM.

Agree, in general it does introduce that new risk. For the platform code changes, I tried to keep the code paths the same as before. I see now where I created a problem though:

... if (target_state->pre_enter) { idx = target_state-> pre_enter(dev, drv, idx); }

if (idx < 0) { local_irq_enable(); return idx; }

if (need_resched()) { local_irq_enable(); return -EBUSY; }

...

The only way the core cpuidle pre_enter can get called without enter without enter is if the omap3 next_valid_state() returned a negative value. If this ever happened in the existing omap3 code, it would cause it to break anyway. But, this particular need_resched() call could cause an exit that results in difference behavior than before. One solution to that is just to move the need_resched check before the pre_enter call.

...

...

...
@@ -255,8 +255,9 @@ static int omap3_enter_idle_bm(struct cpuidle_device *dev, int index) { int new_state_idx;

u32 core_next_state, per_next_state = 0, per_saved_state = 0, cam_state;

struct omap3_idle_statedata *cx;

u32 core_next_state, cam_state;

struct omap3_idle_drvdata *dd = dev->states_usage[index].driver_data;

struct omap3_idle_statedata *cx = &dd->state_data[index];

int ret;

The build throws a few warnings about unused variables:

arch/arm/mach-omap2/cpuidle34xx.c: In function 'omap3_enter_idle_bm': arch/arm/mach-omap2/cpuidle34xx.c:261: warning: unused variable 'ret' arch/arm/mach-omap2/cpuidle34xx.c:257: warning: unused variable 'new_state_idx'

...

Got it, not sure why I missed this.

...

...
DEFINE_PER_CPU(struct cpuidle_device, omap3_idle_dev); @@ -337,7 +329,8 @@ static inline void _fill_cstate(struct cpuidle_driver *drv, state->exit_latency = cpuidle_params_table[idx].exit_latency; state->target_residency = cpuidle_params_table[idx].target_residency; state->flags = CPUIDLE_FLAG_TIME_VALID;

state->enter = omap3_enter_idle_bm;

state->pre_enter = omap3_enter_idle_bm;

state->enter = omap3_enter_idle;

sprintf(state->name, "C%d", idx + 1); strncpy(state->desc, descr, CPUIDLE_DESC_LEN);

...

Also the line at 373 is not needed anymore in omap3_idle_init, since the enter callback is filled in in the _fill_cstate function: /* C1 . MPU WFI + Core active */ _fill_cstate(drv, 0, "MPU ON + CORE ON"); (&drv->states[0])->enter = omap3_enter_idle; <== not needed anymore drv->safe_state_index = 0;

More testing on OMAP3 is needed. Let me come back with the results asap.

Understand and thanks for reviewing.

...

Regards, Jean

Robert Lee

1 Feb 1 Feb

3 a.m.

New subject: [RFC PATCH v4 3/4] acpi: Remove cpuidle timekeeping and irq enable/disable

Signed-off-by: Robert Lee rob.lee@linaro.org --- drivers/acpi/processor_idle.c | 203 ++++++++++++++++++++++++----------------- 1 files changed, 119 insertions(+), 84 deletions(-)

diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c index 0e8e2de..7182b7e 100644 --- a/drivers/acpi/processor_idle.c +++ b/drivers/acpi/processor_idle.c @@ -139,7 +139,6 @@ static void acpi_safe_halt(void) smp_mb(); if (!need_resched()) { safe_halt(); - local_irq_disable(); } current_thread_info()->status |= TS_POLLING; } @@ -730,70 +729,74 @@ static inline void acpi_idle_do_entry(struct acpi_processor_cx *cx) }

/** - * acpi_idle_enter_c1 - enters an ACPI C1 state-type + * acpi_idle_pre_enter_c1 - prepares an ACPI C1 state-type * @dev: the target CPU * @drv: cpuidle driver containing cpuidle state info * @index: index of target state - * - * This is equivalent to the HALT instruction. */ -static int acpi_idle_enter_c1(struct cpuidle_device *dev, +static int acpi_idle_pre_enter_c1(struct cpuidle_device *dev, struct cpuidle_driver *drv, int index) { - ktime_t kt1, kt2; - s64 idle_time; - struct acpi_processor *pr; - struct cpuidle_state_usage *state_usage = &dev->states_usage[index]; - struct acpi_processor_cx *cx = cpuidle_get_statedata(state_usage); - - pr = __this_cpu_read(processors); - dev->last_residency = 0; + struct acpi_processor_cx *cx = dev->states_usage[index].driver_data; + struct acpi_processor *pr = __this_cpu_read(processors);

if (unlikely(!pr)) return -EINVAL;

- local_irq_disable(); - lapic_timer_state_broadcast(pr, cx, 1); - kt1 = ktime_get_real(); + + return index; +} + +/** + * acpi_idle_enter_c1 - enters an ACPI C1 state-type + * @dev: the target CPU + * @drv: cpuidle driver containing cpuidle state info + * @index: index of target state + * + * This is equivalent to the HALT instruction. + */ +static int acpi_idle_enter_c1(struct cpuidle_device *dev, + struct cpuidle_driver *drv, int index) +{ + struct acpi_processor_cx *cx = dev->states_usage[index].driver_data; + acpi_idle_do_entry(cx); - kt2 = ktime_get_real(); - idle_time = ktime_to_us(ktime_sub(kt2, kt1));

- /* Update device last_residency*/ - dev->last_residency = (int)idle_time; + return index; +} +/** + * acpi_idle_enter_c1 - post ACPI C1 state-type cleanup + * @dev: the target CPU + * @drv: cpuidle driver containing cpuidle state info + * @index: index of target state + */ +static int acpi_idle_post_enter_c1(struct cpuidle_device *dev, + struct cpuidle_driver *drv, int index) +{ + struct acpi_processor_cx *cx = dev->states_usage[index].driver_data; + struct acpi_processor *pr = __this_cpu_read(processors);

- local_irq_enable(); - cx->usage++; lapic_timer_state_broadcast(pr, cx, 0);

- return index; + return 0; }

/** - * acpi_idle_enter_simple - enters an ACPI state without BM handling + * acpi_idle_preenter_simple - prepare an ACPI state without BM handling * @dev: the target CPU * @drv: cpuidle driver with cpuidle state information * @index: the index of suggested state */ -static int acpi_idle_enter_simple(struct cpuidle_device *dev, +static int acpi_idle_pre_enter_simple(struct cpuidle_device *dev, struct cpuidle_driver *drv, int index) { - struct acpi_processor *pr; - struct cpuidle_state_usage *state_usage = &dev->states_usage[index]; - struct acpi_processor_cx *cx = cpuidle_get_statedata(state_usage); - ktime_t kt1, kt2; - s64 idle_time_ns; - s64 idle_time; - - pr = __this_cpu_read(processors); - dev->last_residency = 0; + struct acpi_processor_cx *cx = dev->states_usage[index].driver_data; + struct acpi_processor *pr = __this_cpu_read(processors);

if (unlikely(!pr)) return -EINVAL;

- local_irq_disable(); - if (cx->entry_method != ACPI_CSTATE_FFH) { current_thread_info()->status &= ~TS_POLLING; /* @@ -804,7 +807,6 @@ static int acpi_idle_enter_simple(struct cpuidle_device *dev,

if (unlikely(need_resched())) { current_thread_info()->status |= TS_POLLING; - local_irq_enable(); return -EINVAL; } } @@ -818,56 +820,65 @@ static int acpi_idle_enter_simple(struct cpuidle_device *dev, if (cx->type == ACPI_STATE_C3) ACPI_FLUSH_CPU_CACHE();

- kt1 = ktime_get_real(); /* Tell the scheduler that we are going deep-idle: */ sched_clock_idle_sleep_event(); + + return index; +} + +/** + * acpi_idle_enter_simple - enters an ACPI state without BM handling + * @dev: the target CPU + * @drv: cpuidle driver with cpuidle state information + * @index: the index of suggested state + */ +static int acpi_idle_enter_simple(struct cpuidle_device *dev, + struct cpuidle_driver *drv, int index) +{ + struct acpi_processor_cx *cx = dev->states_usage[index].driver_data; + acpi_idle_do_entry(cx); - kt2 = ktime_get_real(); - idle_time_ns = ktime_to_ns(ktime_sub(kt2, kt1)); - idle_time = idle_time_ns; - do_div(idle_time, NSEC_PER_USEC);

- /* Update device last_residency*/ - dev->last_residency = (int)idle_time; + sched_clock_idle_wakeup_event(0);

- /* Tell the scheduler how much we idled: */ - sched_clock_idle_wakeup_event(idle_time_ns); + return index; +} + + +/** + * acpi_idle_post_enter_simple - ACPI state without BM handling cleanup + * @dev: the target CPU + * @drv: cpuidle driver with cpuidle state information + * @index: the index of suggested state + */ +static int acpi_idle_post_enter_simple(struct cpuidle_device *dev, + struct cpuidle_driver *drv, int index) +{ + struct acpi_processor_cx *cx = dev->states_usage[index].driver_data; + struct acpi_processor *pr = __this_cpu_read(processors);

- local_irq_enable(); if (cx->entry_method != ACPI_CSTATE_FFH) current_thread_info()->status |= TS_POLLING;

- cx->usage++; - lapic_timer_state_broadcast(pr, cx, 0); - cx->time += idle_time; - return index; + + return 0; }

static int c3_cpu_count; static DEFINE_RAW_SPINLOCK(c3_lock);

/** - * acpi_idle_enter_bm - enters C3 with proper BM handling + * acpi_idle_pre_enter_bm - runs checks and prepares for C3 * @dev: the target CPU * @drv: cpuidle driver containing state data * @index: the index of suggested state - * - * If BM is detected, the deepest non-C3 idle state is entered instead. */ -static int acpi_idle_enter_bm(struct cpuidle_device *dev, +static int acpi_idle_pre_enter_bm(struct cpuidle_device *dev, struct cpuidle_driver *drv, int index) { - struct acpi_processor *pr; - struct cpuidle_state_usage *state_usage = &dev->states_usage[index]; - struct acpi_processor_cx *cx = cpuidle_get_statedata(state_usage); - ktime_t kt1, kt2; - s64 idle_time_ns; - s64 idle_time; - - - pr = __this_cpu_read(processors); - dev->last_residency = 0; + struct acpi_processor_cx *cx = dev->states_usage[index].driver_data; + struct acpi_processor *pr = __this_cpu_read(processors);

if (unlikely(!pr)) return -EINVAL; @@ -877,15 +888,11 @@ static int acpi_idle_enter_bm(struct cpuidle_device *dev, return drv->states[drv->safe_state_index].enter(dev, drv, drv->safe_state_index); } else { - local_irq_disable(); acpi_safe_halt(); - local_irq_enable(); return -EINVAL; } }

- local_irq_disable(); - if (cx->entry_method != ACPI_CSTATE_FFH) { current_thread_info()->status &= ~TS_POLLING; /* @@ -896,7 +903,6 @@ static int acpi_idle_enter_bm(struct cpuidle_device *dev,

if (unlikely(need_resched())) { current_thread_info()->status |= TS_POLLING; - local_irq_enable(); return -EINVAL; } } @@ -911,7 +917,22 @@ static int acpi_idle_enter_bm(struct cpuidle_device *dev, */ lapic_timer_state_broadcast(pr, cx, 1);

- kt1 = ktime_get_real(); + return index; +} + +/** + * acpi_idle_enter_bm - enters C3 with proper BM handling + * @dev: the target CPU + * @drv: cpuidle driver containing state data + * @index: the index of suggested state + * + * If BM is detected, the deepest non-C3 idle state is entered instead. + */ +static int acpi_idle_enter_bm(struct cpuidle_device *dev, + struct cpuidle_driver *drv, int index) +{ + struct acpi_processor_cx *cx = dev->states_usage[index].driver_data; + struct acpi_processor *pr = __this_cpu_read(processors); /* * disable bus master * bm_check implies we need ARB_DIS @@ -942,26 +963,30 @@ static int acpi_idle_enter_bm(struct cpuidle_device *dev, c3_cpu_count--; raw_spin_unlock(&c3_lock); } - kt2 = ktime_get_real(); - idle_time_ns = ktime_to_ns(ktime_sub(kt2, kt1)); - idle_time = idle_time_ns; - do_div(idle_time, NSEC_PER_USEC);

- /* Update device last_residency*/ - dev->last_residency = (int)idle_time; + sched_clock_idle_wakeup_event(0);

- /* Tell the scheduler how much we idled: */ - sched_clock_idle_wakeup_event(idle_time_ns); + return index; +} + +/** + * acpi_idle_post_enter_bm - cleanup after exiting C3 + * @dev: the target CPU + * @drv: cpuidle driver containing state data + * @index: the index of suggested state + */ +static int acpi_idle_post_enter_bm(struct cpuidle_device *dev, + struct cpuidle_driver *drv, int index) +{ + struct acpi_processor_cx *cx = dev->states_usage[index].driver_data; + struct acpi_processor *pr = __this_cpu_read(processors);

- local_irq_enable(); if (cx->entry_method != ACPI_CSTATE_FFH) current_thread_info()->status |= TS_POLLING;

- cx->usage++; - lapic_timer_state_broadcast(pr, cx, 0); - cx->time += idle_time; - return index; + + return 0; }

struct cpuidle_driver acpi_idle_driver = { @@ -1076,21 +1101,31 @@ static int acpi_processor_setup_cpuidle_states(struct acpi_processor *pr) if (cx->entry_method == ACPI_CSTATE_FFH) state->flags |= CPUIDLE_FLAG_TIME_VALID;

+ state->pre_enter = acpi_idle_pre_enter_c1; state->enter = acpi_idle_enter_c1; + state->enter = acpi_idle_post_enter_c1; drv->safe_state_index = count; break;

case ACPI_STATE_C2: state->flags |= CPUIDLE_FLAG_TIME_VALID; + state->pre_enter = acpi_idle_pre_enter_simple; state->enter = acpi_idle_enter_simple; + state->post_enter = acpi_idle_post_enter_simple; drv->safe_state_index = count; break;

case ACPI_STATE_C3: state->flags |= CPUIDLE_FLAG_TIME_VALID; + state->pre_enter = pr->flags.bm_check ? + acpi_idle_pre_enter_bm : + acpi_idle_pre_enter_simple; state->enter = pr->flags.bm_check ? acpi_idle_enter_bm : acpi_idle_enter_simple; + state->post_enter = pr->flags.bm_check ? + acpi_idle_post_enter_bm : + acpi_idle_post_enter_simple; break; }

-- 1.7.1

Robert Lee

3 a.m.

New subject: [RFC PATCH v4 4/4] x86: Remove cpuidle timekeeping and irq enable/disable

Signed-off-by: Robert Lee rob.lee@linaro.org --- drivers/idle/intel_idle.c | 110 ++++++++++++++++++++++++++++++++------------- 1 files changed, 79 insertions(+), 31 deletions(-)

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c index 20bce51..482deb6 100644 --- a/drivers/idle/intel_idle.c +++ b/drivers/idle/intel_idle.c @@ -82,8 +82,14 @@ static unsigned int mwait_substates; static unsigned int lapic_timer_reliable_states = (1 << 1); /* Default to only C1 */

static struct cpuidle_device __percpu *intel_idle_cpuidle_devices; + +static int intel_pre_idle(struct cpuidle_device *dev, + struct cpuidle_driver *drv, int index); static int intel_idle(struct cpuidle_device *dev, struct cpuidle_driver *drv, int index); +static int intel_post_idle(struct cpuidle_device *dev, + struct cpuidle_driver *drv, int index); +

static struct cpuidle_state *cpuidle_state_table;

@@ -114,21 +120,27 @@ static struct cpuidle_state nehalem_cstates[MWAIT_MAX_NUM_CSTATES] = { .flags = CPUIDLE_FLAG_TIME_VALID, .exit_latency = 3, .target_residency = 6, - .enter = &intel_idle }, + .pre_enter = &intel_pre_idle, + .enter = &intel_idle, + .post_enter = &intel_post_idle }, { /* MWAIT C2 */ .name = "C3-NHM", .desc = "MWAIT 0x10", .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, .exit_latency = 20, .target_residency = 80, - .enter = &intel_idle }, + .pre_enter = &intel_pre_idle, + .enter = &intel_idle, + .post_enter = &intel_post_idle }, { /* MWAIT C3 */ .name = "C6-NHM", .desc = "MWAIT 0x20", .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, .exit_latency = 200, .target_residency = 800, - .enter = &intel_idle }, + .pre_enter = &intel_pre_idle, + .enter = &intel_idle, + .post_enter = &intel_post_idle }, };

static struct cpuidle_state snb_cstates[MWAIT_MAX_NUM_CSTATES] = { @@ -139,28 +151,36 @@ static struct cpuidle_state snb_cstates[MWAIT_MAX_NUM_CSTATES] = { .flags = CPUIDLE_FLAG_TIME_VALID, .exit_latency = 1, .target_residency = 1, - .enter = &intel_idle }, + .pre_enter = &intel_pre_idle, + .enter = &intel_idle, + .post_enter = &intel_post_idle }, { /* MWAIT C2 */ .name = "C3-SNB", .desc = "MWAIT 0x10", .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, .exit_latency = 80, .target_residency = 211, - .enter = &intel_idle }, + .pre_enter = &intel_pre_idle, + .enter = &intel_idle, + .post_enter = &intel_post_idle }, { /* MWAIT C3 */ .name = "C6-SNB", .desc = "MWAIT 0x20", .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, .exit_latency = 104, .target_residency = 345, - .enter = &intel_idle }, + .pre_enter = &intel_pre_idle, + .enter = &intel_idle, + .post_enter = &intel_post_idle }, { /* MWAIT C4 */ .name = "C7-SNB", .desc = "MWAIT 0x30", .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, .exit_latency = 109, .target_residency = 345, - .enter = &intel_idle }, + .pre_enter = &intel_pre_idle, + .enter = &intel_idle, + .post_enter = &intel_post_idle }, };

static struct cpuidle_state atom_cstates[MWAIT_MAX_NUM_CSTATES] = { @@ -171,14 +191,18 @@ static struct cpuidle_state atom_cstates[MWAIT_MAX_NUM_CSTATES] = { .flags = CPUIDLE_FLAG_TIME_VALID, .exit_latency = 1, .target_residency = 4, - .enter = &intel_idle }, + .pre_enter = &intel_pre_idle, + .enter = &intel_idle, + .post_enter = &intel_post_idle }, { /* MWAIT C2 */ .name = "C2-ATM", .desc = "MWAIT 0x10", .flags = CPUIDLE_FLAG_TIME_VALID, .exit_latency = 20, .target_residency = 80, - .enter = &intel_idle }, + .pre_enter = &intel_pre_idle, + .enter = &intel_idle, + .post_enter = &intel_post_idle }, { /* MWAIT C3 */ }, { /* MWAIT C4 */ .name = "C4-ATM", @@ -186,7 +210,9 @@ static struct cpuidle_state atom_cstates[MWAIT_MAX_NUM_CSTATES] = { .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, .exit_latency = 100, .target_residency = 400, - .enter = &intel_idle }, + .pre_enter = &intel_pre_idle, + .enter = &intel_idle, + .post_enter = &intel_post_idle }, { /* MWAIT C5 */ }, { /* MWAIT C6 */ .name = "C6-ATM", @@ -194,7 +220,9 @@ static struct cpuidle_state atom_cstates[MWAIT_MAX_NUM_CSTATES] = { .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, .exit_latency = 140, .target_residency = 560, - .enter = &intel_idle }, + .pre_enter = &intel_pre_idle, + .enter = &intel_idle, + .post_enter = &intel_post_idle }, };

static long get_driver_data(int cstate) @@ -227,24 +255,20 @@ static long get_driver_data(int cstate) }

/** - * intel_idle + * intel_pre_idle * @dev: cpuidle_device * @drv: cpuidle driver * @index: index of cpuidle state * * Must be called under local_irq_disable(). */ -static int intel_idle(struct cpuidle_device *dev, +static int intel_pre_idle(struct cpuidle_device *dev, struct cpuidle_driver *drv, int index) { - unsigned long ecx = 1; /* break on interrupt flag */ struct cpuidle_state *state = &drv->states[index]; - struct cpuidle_state_usage *state_usage = &dev->states_usage[index]; - unsigned long eax = (unsigned long)cpuidle_get_statedata(state_usage); + const unsigned long eax = + (unsigned long)dev->states_usage[index].driver_data; unsigned int cstate; - ktime_t kt_before, kt_after; - s64 usec_delta; - int cpu = smp_processor_id();

cstate = (((eax) >> MWAIT_SUBSTATE_SIZE) & MWAIT_CSTATE_MASK) + 1;

@@ -253,12 +277,27 @@ static int intel_idle(struct cpuidle_device *dev, * for flushing the user TLB's associated with the active mm. */ if (state->flags & CPUIDLE_FLAG_TLB_FLUSHED) - leave_mm(cpu); + leave_mm(dev->cpu);

if (!(lapic_timer_reliable_states & (1 << (cstate)))) - clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu); + clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &dev->cpu);

- kt_before = ktime_get_real(); + return index; +} + +/** + * intel_idle + * @dev: cpuidle_device + * @drv: cpuidle driver + * @index: index of cpuidle state + * + * Must be called under local_irq_disable(). + */ +static int intel_idle(struct cpuidle_device *dev, + struct cpuidle_driver *drv, int index) +{ + const unsigned long eax = + (unsigned long)dev->states_usage[index].driver_data;

stop_critical_timings(); if (!need_resched()) { @@ -266,21 +305,30 @@ static int intel_idle(struct cpuidle_device *dev, __monitor((void *)&current_thread_info()->flags, 0, 0); smp_mb(); if (!need_resched()) - __mwait(eax, ecx); + __mwait(eax, 1); } - start_critical_timings();

- kt_after = ktime_get_real(); - usec_delta = ktime_to_us(ktime_sub(kt_after, kt_before)); + return index; +}

- local_irq_enable(); +/** + * intel_post_idle + * @dev: cpuidle_device + * @drv: cpuidle driver + * @index: index of cpuidle state + */ +static int intel_post_idle(struct cpuidle_device *dev, + struct cpuidle_driver *drv, int index) +{ + const unsigned long eax = + (unsigned long)dev->states_usage[index].driver_data; + unsigned int cstate;

- if (!(lapic_timer_reliable_states & (1 << (cstate)))) - clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu); + cstate = (((eax) >> MWAIT_SUBSTATE_SIZE) & MWAIT_CSTATE_MASK) + 1;

- /* Update cpuidle counters */ - dev->last_residency = (int)usec_delta; + if (!(lapic_timer_reliable_states & (1 << (cstate)))) + clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &dev->cpu);

return index; }

-- 1.7.1

Rob Lee

10 Feb 10 Feb

7:32 p.m.

New subject: [RFC PATCH v4 0/4] Consolidate cpuidle timekeeping and irq enabling

Maintainers for drivers/cpuidle, do you have any comments/opinions about this patch?

Intel cpuidle and acpi cpuidle maintainers, do you have any comments/opinions about this patch and the changes to your code?

Any other review and comments welcome.

Summary of positive and negatives as I understand them so far:

version 1, 2, and 3 (Original "wrapper" method of consolidating timekeeping and interrupt enabling) + opportunistically provides consolidation for simple platform cpuidle implementations without disturbing the more complex implementations. By simple, I mean those at can be wrapped in the time keeping calls and interrupt enabling calls without significantly affecting idle time keeping accuracy or interrupt latency - Does not provide consolidation for the more complex platform cpuidle implementations - Adds an additional interface, perhaps unnecessarily if this consolidation could be done in cpuidle

version 4 (modifications to drivers/cpuidle/cpuidle.c cpuidle_idle_call) + Adds consolidation work to cpuidle_idle_call which allows all platform timekeeping / interrupt handling to be consolidated. - Requires splitting up of more complex platform cpuidle implementations, adding further complexity and risk of breaking something. ? Allows both pre_enter or enter to change the idle state. Is there an objection to this? ? Splitting up the enter functions can require additional function calls. Is there any concern that this is significant additional overhead?

Thanks, Rob

On Tue, Jan 31, 2012 at 7:00 PM, Robert Lee rob.lee@linaro.org wrote:

...

This patch series moves the timekeeping and irq enabling from the platform code to the core cpuidle driver. Also, the platform irq disabling was removed as it appears that all calls into cpuidle_call_idle will have already called local_irq_disable().

To save reviewers time, only a few platforms which required the most changes are included in this version. If these changes are approved, the next version will include the remaining platform code which should require minimal changes.

For those who have followed the previous patch versions, as you know, the previous version of this patch series added some helper functionality which used a wrapper function to remove the time keeping and irq enabling/disabling from the platform code. There was also initialization helper functionality which has now been removed from this version. If the basic implementation in this version is approved, then a separate patch submission effort can be made to focus on consolidation of initialziation functionality.

Based on 3.3-rc1

v3 submission can be found here: http://www.spinics.net/lists/arm-kernel/msg156751.html Changes since v3:

Removed drivers/cpuidle/common.c

** Removed the initialization helper functions ** Removed the wrapper used to consolidate time keeping and irq enable/disable

Add time keeping and local_irq_disable handling in cpuidle_call_idle().

Made necessary modifications to a few platforms that required the most changes

** Note on omap3: changed structure of omap3_idle_drvdata and added per_next_state and per_saved_state vars to accomodate new framework.

v2 submission can be found here: http://comments.gmane.org/gmane.linux.ports.arm.kernel/144199

Changes since v2:

Made various code organization and style changes as suggested in v1 review.

Removed at91 use of common code. A separate effort is underway to clean

at91 code and the author has offered to convert to common interface as part of those changes (if this common interface is accepted in time).

Made platform cpuidle_driver objects __initdata and dynamically added one

persistent instance of this object in common code.

Removed imx5 pm usage of gpc_dvfs clock as it is no longer needed after

being enabled during clock initialization.

Re-organized patches.

v1 submission can be found here: http://comments.gmane.org/gmane.linux.ports.arm.kernel/142791

Changes since v1:

Common interface moved to drivers/cpuidle and made non arch-specific.

Made various fixes and suggested additions to the common cpuidle

code from v1 review.

Added callback for filling in driver_data field as needed.

Modified the various platforms with these changes.

Robert Lee (4): cpuidle: Add time keeping and irq enabling ARM: omap: Remove cpuidle timekeeping and irq enable/disable acpi: Remove cpuidle timekeeping and irq enable/disable x86: Remove cpuidle timekeeping and irq enable/disable

arch/arm/mach-omap2/cpuidle34xx.c | 96 ++++++++---------- drivers/acpi/processor_idle.c | 203 ++++++++++++++++++++++--------------- drivers/cpuidle/cpuidle.c | 75 +++++++++++--- drivers/idle/intel_idle.c | 110 ++++++++++++++------ include/linux/cpuidle.h | 26 +++-- 5 files changed, 317 insertions(+), 193 deletions(-)

Amit Kucheria

8:06 p.m.

New subject: [RFC PATCH v4 0/4] Consolidate cpuidle timekeeping and irq enabling

On Fri, Feb 10, 2012 at 11:32 AM, Rob Lee rob.lee@linaro.org wrote:

...

Maintainers for drivers/cpuidle, do you have any comments/opinions about this patch?

Venki has changed employers (probably needs a patch to MAINTAINERS?). Cc'ing his new email address.

...

Intel cpuidle and acpi cpuidle maintainers, do you have any comments/opinions about this patch and the changes to your code?

Any other review and comments welcome.

Summary of positive and negatives as I understand them so far:

version 1, 2, and 3 (Original "wrapper" method of consolidating timekeeping and interrupt enabling)

opportunistically provides consolidation for simple platform cpuidle

implementations without disturbing the more complex implementations. By simple, I mean those at can be wrapped in the time keeping calls and interrupt enabling calls without significantly affecting idle time keeping accuracy or interrupt latency

Does not provide consolidation for the more complex platform cpuidle

implementations

Adds an additional interface, perhaps unnecessarily if this

consolidation could be done in cpuidle

version 4 (modifications to drivers/cpuidle/cpuidle.c cpuidle_idle_call)

Adds consolidation work to cpuidle_idle_call which allows all

platform timekeeping / interrupt handling to be consolidated.

Requires splitting up of more complex platform cpuidle

implementations, adding further complexity and risk of breaking something. ? Allows both pre_enter or enter to change the idle state. Is there an objection to this? ? Splitting up the enter functions can require additional function calls. Is there any concern that this is significant additional overhead?

Thanks, Rob

On Tue, Jan 31, 2012 at 7:00 PM, Robert Lee rob.lee@linaro.org wrote:

...
This patch series moves the timekeeping and irq enabling from the platform code to the core cpuidle driver. Also, the platform irq disabling was removed as it appears that all calls into cpuidle_call_idle will have already called local_irq_disable().

To save reviewers time, only a few platforms which required the most changes are included in this version. If these changes are approved, the next version will include the remaining platform code which should require minimal changes.

For those who have followed the previous patch versions, as you know, the previous version of this patch series added some helper functionality which used a wrapper function to remove the time keeping and irq enabling/disabling from the platform code. There was also initialization helper functionality which has now been removed from this version. If the basic implementation in this version is approved, then a separate patch submission effort can be made to focus on consolidation of initialziation functionality.

Based on 3.3-rc1

v3 submission can be found here: http://www.spinics.net/lists/arm-kernel/msg156751.html Changes since v3:

Removed drivers/cpuidle/common.c

** Removed the initialization helper functions ** Removed the wrapper used to consolidate time keeping and irq enable/disable

Add time keeping and local_irq_disable handling in cpuidle_call_idle().

Made necessary modifications to a few platforms that required the most changes

** Note on omap3: changed structure of omap3_idle_drvdata and added per_next_state and per_saved_state vars to accomodate new framework.

v2 submission can be found here: http://comments.gmane.org/gmane.linux.ports.arm.kernel/144199

Changes since v2:

Made various code organization and style changes as suggested in v1 review.

Removed at91 use of common code. A separate effort is underway to clean

at91 code and the author has offered to convert to common interface as part of those changes (if this common interface is accepted in time).

Made platform cpuidle_driver objects __initdata and dynamically added one

persistent instance of this object in common code.

Removed imx5 pm usage of gpc_dvfs clock as it is no longer needed after

being enabled during clock initialization.

Re-organized patches.

v1 submission can be found here: http://comments.gmane.org/gmane.linux.ports.arm.kernel/142791

Changes since v1:

Common interface moved to drivers/cpuidle and made non arch-specific.

Made various fixes and suggested additions to the common cpuidle

code from v1 review.

Added callback for filling in driver_data field as needed.

Modified the various platforms with these changes.

Robert Lee (4): cpuidle: Add time keeping and irq enabling ARM: omap: Remove cpuidle timekeeping and irq enable/disable acpi: Remove cpuidle timekeeping and irq enable/disable x86: Remove cpuidle timekeeping and irq enable/disable

arch/arm/mach-omap2/cpuidle34xx.c | 96 ++++++++---------- drivers/acpi/processor_idle.c | 203 ++++++++++++++++++++++--------------- drivers/cpuidle/cpuidle.c | 75 +++++++++++--- drivers/idle/intel_idle.c | 110 ++++++++++++++------ include/linux/cpuidle.h | 26 +++-- 5 files changed, 317 insertions(+), 193 deletions(-)

Colin Cross

22 Feb 22 Feb

8:52 p.m.

New subject: [RFC PATCH v4 0/4] Consolidate cpuidle timekeeping and irq enabling

On Fri, Feb 10, 2012 at 11:32 AM, Rob Lee rob.lee@linaro.org wrote:

...

Maintainers for drivers/cpuidle, do you have any comments/opinions about this patch?

Intel cpuidle and acpi cpuidle maintainers, do you have any comments/opinions about this patch and the changes to your code?

Any other review and comments welcome.

Summary of positive and negatives as I understand them so far:

version 1, 2, and 3 (Original "wrapper" method of consolidating timekeeping and interrupt enabling)

opportunistically provides consolidation for simple platform cpuidle

implementations without disturbing the more complex implementations. By simple, I mean those at can be wrapped in the time keeping calls and interrupt enabling calls without significantly affecting idle time keeping accuracy or interrupt latency

Does not provide consolidation for the more complex platform cpuidle

implementations

Adds an additional interface, perhaps unnecessarily if this

consolidation could be done in cpuidle

version 4 (modifications to drivers/cpuidle/cpuidle.c cpuidle_idle_call)

Adds consolidation work to cpuidle_idle_call which allows all

platform timekeeping / interrupt handling to be consolidated.

I think the question of what the timekeeping means needs to be considered. If the timekeeping is supposed to be a very accurate measurement of the time spent in the low power idle state, only the cpuidle driver can guarantee that - there may be significant time spent in the hardware transition or the very low level power code that cannot be split into pre_enter, but should not be counted in the timekeeping. Or there may be a long boot time out of the low power state that should not be counted. If it is just a rough estimate of how often the cpu is getting to idle, there is no need to split out the pre_enter time - just measure the time around the entire driver enter call. Either way, pre_enter doesn't seem useful.

...

Requires splitting up of more complex platform cpuidle

implementations, adding further complexity and risk of breaking something. ? Allows both pre_enter or enter to change the idle state. Is there an objection to this?

pre_enter (if it is kept) would probably have to support state demotion, because its actions may depend on the final state. For coupled SMP cpuidle, enter also has to support state demotion, because the final state will depend on actions of the other cpu after idle has been entered.

...

? Splitting up the enter functions can require additional function calls. Is there any concern that this is significant additional overhead?

I don't think so, especially if you support NULL pre_enter and post_enter functions to allow drivers that care to skip them.

Rob Lee

23 Feb 23 Feb

6:39 a.m.

New subject: [RFC PATCH v4 0/4] Consolidate cpuidle timekeeping and irq enabling

On Wed, Feb 22, 2012 at 2:52 PM, Colin Cross ccross@google.com wrote:

...

On Fri, Feb 10, 2012 at 11:32 AM, Rob Lee rob.lee@linaro.org wrote:

...
Maintainers for drivers/cpuidle, do you have any comments/opinions about this patch?

Intel cpuidle and acpi cpuidle maintainers, do you have any comments/opinions about this patch and the changes to your code?

Any other review and comments welcome.

Summary of positive and negatives as I understand them so far:

version 1, 2, and 3 (Original "wrapper" method of consolidating timekeeping and interrupt enabling)

opportunistically provides consolidation for simple platform cpuidle

implementations without disturbing the more complex implementations. By simple, I mean those at can be wrapped in the time keeping calls and interrupt enabling calls without significantly affecting idle time keeping accuracy or interrupt latency

Does not provide consolidation for the more complex platform cpuidle

implementations

Adds an additional interface, perhaps unnecessarily if this

consolidation could be done in cpuidle

version 4 (modifications to drivers/cpuidle/cpuidle.c cpuidle_idle_call)

Adds consolidation work to cpuidle_idle_call which allows all

platform timekeeping / interrupt handling to be consolidated.

I think the question of what the timekeeping means needs to be considered. If the timekeeping is supposed to be a very accurate measurement of the time spent in the low power idle state, only the cpuidle driver can guarantee that - there may be significant time spent in the hardware transition or the very low level power code that cannot be split into pre_enter, but should not be counted in the timekeeping. Or there may be a long boot time out of the low power state that should not be counted. If it is just a rough estimate of how often the cpu is getting to idle, there is no need to split out the pre_enter time - just measure the time around the entire driver enter call. Either way, pre_enter doesn't seem useful.

...

Requires splitting up of more complex platform cpuidle

implementations, adding further complexity and risk of breaking something. ? Allows both pre_enter or enter to change the idle state. Is there an objection to this?

pre_enter (if it is kept) would probably have to support state demotion, because its actions may depend on the final state. For coupled SMP cpuidle, enter also has to support state demotion, because the final state will depend on actions of the other cpu after idle has been entered.

...
? Splitting up the enter functions can require additional function calls. Is there any concern that this is significant additional overhead?

I don't think so, especially if you support NULL pre_enter and post_enter functions to allow drivers that care to skip them.

Colin, thanks for your comments. For now, my plan is to release a v5 buy the end of this week that is a continuation of v3 (using an optional wrapper for consolidation of time keeping and irq enabling). The slightly non-positive feedback for the OMAP3 changes and the lack of feedback from the intel and acpi cpuidle maintainers have swayed me toward this direction. I can continue working towards a solution in cpuidle_idle_call after this patch series if folks voice a preference for doing so.

5059

days inactive

5081

days old

linaro-dev@lists.linaro.org

16 comments

participants

tags (0)

participants (6)

Amit Kucheria
Colin Cross
Jean Pihet
Rob Lee
Robert Lee
Turquette, Mike