In order to save power, it would be useful to schedule work onto non-idle CPUs instead of waking up an idle one.
To achieve this, we need the scheduler to guide kernel frameworks (such as timers and workqueues) towards the most preferred CPU to be used for such work.
This patchset is about implementing this concept.
The first patch adds the sched_select_cpu() routine, which returns the preferred non-idle CPU. It takes the maximum sched-domain level up to which a CPU may be chosen, and accepts the following options: SD_SIBLING, SD_MC, SD_BOOK, SD_CPU or SD_NUMA.
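For illustration, a minimal usage sketch of the new routine (my_work is a hypothetical work item; only sched_select_cpu() and the SD_* levels come from this series, and whether a caller would pair it with queue_work_on() like this is an assumption):

	/* Prefer a non-idle CPU, searching no further than the MC (package)
	 * level; the current CPU is returned if everything up to that level
	 * is idle.
	 */
	int cpu = sched_select_cpu(SD_MC, 0);

	queue_work_on(cpu, system_wq, &my_work);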
The second and third patches adapt the workqueue framework to this change.
Earlier discussion of this concept took place at the last LPC: http://summit.linuxplumbersconf.org/lpc-2012/meeting/90/lpc2012-sched-timer-...
Figures:
--------

Test case 1:
------------
- Performed on TC2 with ubuntu-devel
- Boot TC2 and run:
  $ trace-cmd record -e workqueue_execute_start
This will trace only the points where the work actually runs.
Do this for 150 seconds.
Results:
--------
Domain 0: CPU 0-1
Domain 1: CPU 2-4

Base kernel, without my modifications:
--------------------------------------

CPU	No. of works run by CPU
---	-----------------------
CPU0:	7
CPU1:	445
CPU2:	444
CPU3:	315
CPU4:	226

With my modifications:
----------------------

CPU	No. of works run by CPU
---	-----------------------
CPU0:	31
CPU2:	797
CPU3:	274
CPU4:	86
Test case 2:
------------
I have created a small module which does the following:
- Creates one work for each CPU (using queue_work_on(), so it must be scheduled on that CPU).
- Each of the above works then queues "n" works for each CPU with queue_work(). These works are tracked within the module and the results are printed at the end.
A rough sketch of the module is included below.
This gave similar results, with n ranging from 10 to 1000.
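For reference, a rough, hedged sketch of what such a test module might look like (the real module is linked later in this thread; the module name, counters and the value of "n" here are purely illustrative):

#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/cpumask.h>
#include <linux/threads.h>
#include <linux/atomic.h>
#include <linux/smp.h>

#define N_WORKS	100				/* the "n" works queued per CPU */

static atomic_t runs[NR_CPUS];			/* how many works ran on each CPU */
static struct work_struct seed[NR_CPUS];	/* one pinned seed work per CPU */
static struct work_struct works[NR_CPUS][N_WORKS];

static void count_fn(struct work_struct *w)
{
	/* count the work on whichever CPU actually ran it */
	atomic_inc(&runs[raw_smp_processor_id()]);
}

static void seed_fn(struct work_struct *w)
{
	int cpu = raw_smp_processor_id();	/* pinned via queue_work_on() */
	int i;

	/* queue "n" plain works; where they run is up to the workqueue code */
	for (i = 0; i < N_WORKS; i++)
		queue_work(system_wq, &works[cpu][i]);
}

static int __init wqtest_init(void)
{
	int cpu, i;

	for_each_online_cpu(cpu) {
		for (i = 0; i < N_WORKS; i++)
			INIT_WORK(&works[cpu][i], count_fn);
		INIT_WORK(&seed[cpu], seed_fn);
		queue_work_on(cpu, system_wq, &seed[cpu]);
	}
	return 0;
}

static void __exit wqtest_exit(void)
{
	int cpu;

	flush_workqueue(system_wq);
	for_each_online_cpu(cpu)
		pr_info("CPU%d: %d works run\n", cpu, atomic_read(&runs[cpu]));
}

module_init(wqtest_init);
module_exit(wqtest_exit);
MODULE_LICENSE("GPL");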
Viresh Kumar (3):
  sched: Create sched_select_cpu() to give preferred CPU for power saving
  workqueue: create __flush_delayed_work to avoid duplicating code
  workqueue: Schedule work on non-idle cpu instead of current one
 arch/arm/Kconfig      | 11 +++++++
 include/linux/sched.h | 11 +++++++
 kernel/sched/core.c   | 88 +++++++++++++++++++++++++++++++++++++++------------
 kernel/workqueue.c    | 36 ++++++++++++++-------
 4 files changed, 115 insertions(+), 31 deletions(-)
In order to save power, it would be useful to schedule work onto non-idle CPUs instead of waking up an idle one.
To achieve this, we need the scheduler to guide kernel frameworks (such as timers and workqueues) towards the most preferred CPU to be used for such work.
This routine returns the preferred non-idle CPU. It takes the maximum sched-domain level up to which a CPU may be chosen, and accepts the following options: SD_SIBLING, SD_MC, SD_BOOK, SD_CPU or SD_NUMA.
If the user passed SD_MC, a CPU may be returned from the SD_SIBLING or SD_MC level. If the level requested by the user is not available in the current kernel configuration, the current CPU is returned.
If the user has passed the NUMA level, we may need to go through the NUMA levels too, and the second parameter to this routine comes into play. Its minimum value is zero, in which case only one NUMA level is traversed. To go through all NUMA levels, pass -1 here.
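As a worked example of the level expansion used by the routine (the SD_* values and sd_present_levels are the ones added by this patch; the local variable names are only illustrative):

	int requested = SD_BOOK;			/* 0x04 */
	int expanded  = requested | (requested - 1);	/* 0x07 == SD_SIBLING | SD_MC | SD_BOOK */
	int target    = expanded & sd_present_levels;	/* drop levels not built into this kernel */

If target ends up zero, no configured level lies at or below the requested one and the current CPU is returned directly.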
This patch reuses code from the get_nohz_timer_target() routine, which had a similar implementation; get_nohz_timer_target() is also modified to use sched_select_cpu() now.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 include/linux/sched.h | 11 +++++++
 kernel/sched/core.c   | 88 +++++++++++++++++++++++++++++++++++++++------------
 2 files changed, 79 insertions(+), 20 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 0059212..4b660ee 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -281,6 +281,10 @@ static inline void select_nohz_load_balancer(int stop_tick) { }
 static inline void set_cpu_sd_state_idle(void) { }
 #endif

+#ifdef CONFIG_SMP
+extern int sched_select_cpu(int sd_max_level, u32 numa_level);
+#endif
+
 /*
  * Only dump TASK_* tasks. (0 for all tasks)
  */
@@ -868,6 +872,13 @@ enum cpu_idle_type {
 #define SD_PREFER_SIBLING	0x1000	/* Prefer to place tasks in a sibling domain */
 #define SD_OVERLAP		0x2000	/* sched_domains of this level overlap */
+/* sched-domain levels */
+#define SD_SIBLING	0x01	/* Only for CONFIG_SCHED_SMT */
+#define SD_MC		0x02	/* Only for CONFIG_SCHED_MC */
+#define SD_BOOK		0x04	/* Only for CONFIG_SCHED_BOOK */
+#define SD_CPU		0x08	/* Always enabled */
+#define SD_NUMA		0x10	/* Only for CONFIG_NUMA */
+
 extern int __weak arch_sd_sibiling_asym_packing(void);
 struct sched_group_power {

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index de97083..a14014c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -551,22 +551,7 @@ void resched_cpu(int cpu)
  */
 int get_nohz_timer_target(void)
 {
-	int cpu = smp_processor_id();
-	int i;
-	struct sched_domain *sd;
-
-	rcu_read_lock();
-	for_each_domain(cpu, sd) {
-		for_each_cpu(i, sched_domain_span(sd)) {
-			if (!idle_cpu(i)) {
-				cpu = i;
-				goto unlock;
-			}
-		}
-	}
-unlock:
-	rcu_read_unlock();
-	return cpu;
+	return sched_select_cpu(SD_NUMA, -1);
 }

 /*
  * When add_timer_on() enqueues a timer into the timer wheel of an
@@ -639,6 +624,66 @@ void sched_avg_update(struct rq *rq)
 	}
 }
+/* Mask of all the SD levels present in current configuration */
+static int sd_present_levels;
+
+/*
+ * This routine returns the preferred cpu which is non-idle. It accepts max
+ * level of sched domain, upto which we can choose a CPU from. It can accept
+ * following options: SD_SIBLING, SD_MC, SD_BOOK, SD_CPU or SD_NUMA.
+ *
+ * If user passed SD_MC, then we can return a CPU from SD_SIBLING or SD_MC.
+ * If the level requested by user is not available for the current kernel
+ * configuration, then current CPU will be returned.
+ *
+ * If user has passed NUMA level, then we may need to go through numa_levels
+ * too. Second parameter to this routine will now come into play. Its minimum
+ * value is zero, in which case there is only one NUMA level to go through. If
+ * you want to go through all NUMA levels, pass -1 here. This should cover all
+ * NUMA levels.
+ */
+int sched_select_cpu(int sd_max_level, u32 numa_level)
+{
+	struct sched_domain *sd;
+	int cpu = smp_processor_id();
+	int i, sd_target_levels;
+
+	sd_target_levels = (sd_max_level | (sd_max_level - 1))
+				& sd_present_levels;
+
+	/* return current cpu if no sd_present_levels <= sd_max_level */
+	if (!sd_target_levels)
+		return cpu;
+
+	rcu_read_lock();
+	for_each_domain(cpu, sd) {
+		for_each_cpu(i, sched_domain_span(sd)) {
+			if (!idle_cpu(i)) {
+				cpu = i;
+				goto unlock;
+			}
+		}
+
+		/* Do we need to go through NUMA levels now */
+		if (sd_target_levels == SD_NUMA) {
+			/* Go through NUMA levels until numa_level is zero */
+			if (numa_level--)
+				continue;
+		}
+
+		/*
+		 * clear first bit set in sd_target_levels, and return if no
+		 * more sd levels must be checked
+		 */
+		sd_target_levels &= sd_target_levels - 1;
+		if (!sd_target_levels)
+			goto unlock;
+	}
+unlock:
+	rcu_read_unlock();
+	return cpu;
+}
+
 #else	/* !CONFIG_SMP */
 void resched_task(struct task_struct *p)
 {
@@ -6188,6 +6233,7 @@ typedef const struct cpumask *(*sched_domain_mask_f)(int cpu);
 struct sched_domain_topology_level {
	sched_domain_init_f init;
	sched_domain_mask_f mask;
+	int		    level_mask;
	int		    flags;
	int		    numa_level;
	struct sd_data      data;
@@ -6434,6 +6480,7 @@ sd_init_##type(struct sched_domain_topology_level *tl, int cpu) \
	*sd = SD_##type##_INIT;					\
	SD_INIT_NAME(sd, type);					\
	sd->private = &tl->data;				\
+	sd_present_levels |= tl->level_mask;			\
	return sd;						\
 }
@@ -6547,15 +6594,15 @@ static const struct cpumask *cpu_smt_mask(int cpu)
  */
 static struct sched_domain_topology_level default_topology[] = {
 #ifdef CONFIG_SCHED_SMT
-	{ sd_init_SIBLING, cpu_smt_mask, },
+	{ sd_init_SIBLING, cpu_smt_mask, SD_SIBLING, },
 #endif
 #ifdef CONFIG_SCHED_MC
-	{ sd_init_MC, cpu_coregroup_mask, },
+	{ sd_init_MC, cpu_coregroup_mask, SD_MC, },
 #endif
 #ifdef CONFIG_SCHED_BOOK
-	{ sd_init_BOOK, cpu_book_mask, },
+	{ sd_init_BOOK, cpu_book_mask, SD_BOOK, },
 #endif
-	{ sd_init_CPU, cpu_cpu_mask, },
+	{ sd_init_CPU, cpu_cpu_mask, SD_CPU, },
	{ NULL, },
 };
@@ -6778,6 +6825,7 @@ static void sched_init_numa(void)
		};
	}
+	sd_present_levels |= SD_NUMA;
	sched_domain_topology = tl;
 }
 #else
On Tue, 2012-09-25 at 16:06 +0530, Viresh Kumar wrote:
+/* sched-domain levels */
+#define SD_SIBLING	0x01	/* Only for CONFIG_SCHED_SMT */
+#define SD_MC		0x02	/* Only for CONFIG_SCHED_MC */
+#define SD_BOOK		0x04	/* Only for CONFIG_SCHED_BOOK */
+#define SD_CPU		0x08	/* Always enabled */
+#define SD_NUMA		0x10	/* Only for CONFIG_NUMA */
Urgh, no, not more of that nonsense.. I want to get rid of that hardcoded stuff, not add more of it.
flush_delayed_work() and flush_delayed_work_sync() shared a major portion of their code. This patch introduces a new routine, __flush_delayed_work(), which contains the common part, to avoid code duplication.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 kernel/workqueue.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 692d976..692a55b 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2820,6 +2820,13 @@ bool cancel_work_sync(struct work_struct *work)
 }
 EXPORT_SYMBOL_GPL(cancel_work_sync);
+static inline void __flush_delayed_work(struct delayed_work *dwork)
+{
+	if (del_timer_sync(&dwork->timer))
+		__queue_work(raw_smp_processor_id(),
+			     get_work_cwq(&dwork->work)->wq, &dwork->work);
+}
+
 /**
  * flush_delayed_work - wait for a dwork to finish executing the last queueing
  * @dwork: the delayed work to flush
@@ -2834,9 +2841,7 @@ EXPORT_SYMBOL_GPL(cancel_work_sync);
  */
 bool flush_delayed_work(struct delayed_work *dwork)
 {
-	if (del_timer_sync(&dwork->timer))
-		__queue_work(raw_smp_processor_id(),
-			     get_work_cwq(&dwork->work)->wq, &dwork->work);
+	__flush_delayed_work(dwork);
	return flush_work(&dwork->work);
 }
 EXPORT_SYMBOL(flush_delayed_work);
@@ -2855,9 +2860,7 @@ EXPORT_SYMBOL(flush_delayed_work);
  */
 bool flush_delayed_work_sync(struct delayed_work *dwork)
 {
-	if (del_timer_sync(&dwork->timer))
-		__queue_work(raw_smp_processor_id(),
-			     get_work_cwq(&dwork->work)->wq, &dwork->work);
+	__flush_delayed_work(dwork);
	return flush_work_sync(&dwork->work);
 }
 EXPORT_SYMBOL(flush_delayed_work_sync);
On Tue, Sep 25, 2012 at 04:06:07PM +0530, Viresh Kumar wrote:
flush_delayed_work() and flush_delayed_work_sync() shared a major portion of their code. This patch introduces a new routine, __flush_delayed_work(), which contains the common part, to avoid code duplication.
This part has seen a lot of update in pending wq/for-3.7 branch. Please rebase on top of that.
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-3.7
Thanks.
On 25 September 2012 23:17, Tejun Heo <tj@kernel.org> wrote:
On Tue, Sep 25, 2012 at 04:06:07PM +0530, Viresh Kumar wrote:
flush_delayed_work() and flush_delayed_work_sync() shared a major portion of their code. This patch introduces a new routine, __flush_delayed_work(), which contains the common part, to avoid code duplication.
This part has seen a lot of update in pending wq/for-3.7 branch. Please rebase on top of that.
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-3.7
So this patch is not required anymore, as these changes are already merged :)
Workqueues queue work on the current CPU if the caller hasn't passed a preferred CPU. This may wake up an idle CPU, which is not actually required.
Such work can be processed by any CPU, so a non-idle CPU should be selected here. This patch adds support in the workqueue framework to get the preferred CPU from the scheduler, instead of using the current CPU.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 arch/arm/Kconfig   | 11 +++++++++++
 kernel/workqueue.c | 25 ++++++++++++++++++-------
 2 files changed, 29 insertions(+), 7 deletions(-)
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 5944511..da17bd0 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1594,6 +1594,17 @@ config HMP_SLOW_CPU_MASK
	  Specify the cpuids of the slow CPUs in the system as a list string,
	  e.g. cpuid 0+1 should be specified as 0-1.
+config MIGRATE_WQ
+	bool "(EXPERIMENTAL) Migrate Workqueues to non-idle cpu"
+	depends on SMP && EXPERIMENTAL
+	help
+	  Workqueues queues work on current cpu, if the caller haven't passed a
+	  preferred cpu. This may wake up an idle CPU, which is actually not
+	  required. This work can be processed by any CPU and so we must select
+	  a non-idle CPU here. This patch adds in support in workqueue
+	  framework to get preferred CPU details from the scheduler, instead of
+	  using current CPU.
+
 config HAVE_ARM_SCU
	bool
	help
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 692a55b..fd8df4a 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -456,6 +456,16 @@ static inline void debug_work_activate(struct work_struct *work) { }
 static inline void debug_work_deactivate(struct work_struct *work) { }
 #endif
+/* This enables migration of a work to a non-IDLE cpu instead of current cpu */
+#ifdef CONFIG_MIGRATE_WQ
+static int wq_select_cpu(void)
+{
+	return sched_select_cpu(SD_NUMA, -1);
+}
+#else
+#define wq_select_cpu()	smp_processor_id()
+#endif
+
 /* Serializes the accesses to the list of workqueues. */
 static DEFINE_SPINLOCK(workqueue_lock);
 static LIST_HEAD(workqueues);
@@ -995,7 +1005,7 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
	struct global_cwq *last_gcwq;
	if (unlikely(cpu == WORK_CPU_UNBOUND))
-		cpu = raw_smp_processor_id();
+		cpu = wq_select_cpu();
		/*
		 * It's multi cpu. If @wq is non-reentrant and @work
@@ -1066,8 +1076,9 @@ int queue_work(struct workqueue_struct *wq, struct work_struct *work)
 {
	int ret;
-	ret = queue_work_on(get_cpu(), wq, work);
-	put_cpu();
+	preempt_disable();
+	ret = queue_work_on(wq_select_cpu(), wq, work);
+	preempt_enable();
	return ret;
 }
@@ -1102,7 +1113,7 @@ static void delayed_work_timer_fn(unsigned long __data)
	struct delayed_work *dwork = (struct delayed_work *)__data;
	struct cpu_workqueue_struct *cwq = get_work_cwq(&dwork->work);
-	__queue_work(smp_processor_id(), cwq->wq, &dwork->work);
+	__queue_work(wq_select_cpu(), cwq->wq, &dwork->work);
 }
 /**
@@ -1158,7 +1169,7 @@ int queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
			if (gcwq && gcwq->cpu != WORK_CPU_UNBOUND)
				lcpu = gcwq->cpu;
			else
-				lcpu = raw_smp_processor_id();
+				lcpu = wq_select_cpu();
		} else
			lcpu = WORK_CPU_UNBOUND;
@@ -2823,8 +2834,8 @@ EXPORT_SYMBOL_GPL(cancel_work_sync);
 static inline void __flush_delayed_work(struct delayed_work *dwork)
 {
	if (del_timer_sync(&dwork->timer))
-		__queue_work(raw_smp_processor_id(),
-			     get_work_cwq(&dwork->work)->wq, &dwork->work);
+		__queue_work(wq_select_cpu(), get_work_cwq(&dwork->work)->wq,
+			     &dwork->work);
 }
/**
On Tue, 2012-09-25 at 16:06 +0530, Viresh Kumar wrote:
@@ -1066,8 +1076,9 @@ int queue_work(struct workqueue_struct *wq, struct work_struct *work)
 {
	int ret;

-	ret = queue_work_on(get_cpu(), wq, work);
-	put_cpu();
+	preempt_disable();
+	ret = queue_work_on(wq_select_cpu(), wq, work);
+	preempt_enable();

	return ret;
 }
Right, so the problem I see here is that wq_select_cpu() is horridly expensive..
@@ -1102,7 +1113,7 @@ static void delayed_work_timer_fn(unsigned long __data)
	struct delayed_work *dwork = (struct delayed_work *)__data;
	struct cpu_workqueue_struct *cwq = get_work_cwq(&dwork->work);

-	__queue_work(smp_processor_id(), cwq->wq, &dwork->work);
+	__queue_work(wq_select_cpu(), cwq->wq, &dwork->work);
 }
Shouldn't timer migration have sorted this one?
On 25 September 2012 16:52, Peter Zijlstra <peterz@infradead.org> wrote:
On Tue, 2012-09-25 at 16:06 +0530, Viresh Kumar wrote:
@@ -1066,8 +1076,9 @@ int queue_work(struct workqueue_struct *wq, struct work_struct *work)
 {
	int ret;

-	ret = queue_work_on(get_cpu(), wq, work);
-	put_cpu();
+	preempt_disable();
+	ret = queue_work_on(wq_select_cpu(), wq, work);
+	preempt_enable();

	return ret;
 }
Right, so the problem I see here is that wq_select_cpu() is horridly expensive..
But this is the initial idea we had during LPC. Can you suggest any improvements here?
@@ -1102,7 +1113,7 @@ static void delayed_work_timer_fn(unsigned long __data)
	struct delayed_work *dwork = (struct delayed_work *)__data;
	struct cpu_workqueue_struct *cwq = get_work_cwq(&dwork->work);

-	__queue_work(smp_processor_id(), cwq->wq, &dwork->work);
+	__queue_work(wq_select_cpu(), cwq->wq, &dwork->work);
 }
Shouldn't timer migration have sorted this one?
Maybe yes. Will investigate more on it.
Thanks for your early feedback.
-- viresh
On 25 September 2012 13:30, Viresh Kumar <viresh.kumar@linaro.org> wrote:
On 25 September 2012 16:52, Peter Zijlstra <peterz@infradead.org> wrote:
On Tue, 2012-09-25 at 16:06 +0530, Viresh Kumar wrote:
@@ -1066,8 +1076,9 @@ int queue_work(struct workqueue_struct *wq, struct work_struct *work)
 {
	int ret;

-	ret = queue_work_on(get_cpu(), wq, work);
-	put_cpu();
+	preempt_disable();
+	ret = queue_work_on(wq_select_cpu(), wq, work);
+	preempt_enable();

	return ret;
 }
Right, so the problem I see here is that wq_select_cpu() is horridly expensive..
But this is the initial idea we had during LPC. Can you suggest any improvements here?
The main outcome of the LPC was that we should be able to select another CPU than the local one. Using the same policy as the timers is a first step towards consolidating the interface; a next step would be to update the policy of the function.
Vincent
@@ -1102,7 +1113,7 @@ static void delayed_work_timer_fn(unsigned long __data)
	struct delayed_work *dwork = (struct delayed_work *)__data;
	struct cpu_workqueue_struct *cwq = get_work_cwq(&dwork->work);

-	__queue_work(smp_processor_id(), cwq->wq, &dwork->work);
+	__queue_work(wq_select_cpu(), cwq->wq, &dwork->work);
 }
Shouldn't timer migration have sorted this one?
Maybe yes. Will investigate more on it.
Thanks for your early feedback.
-- viresh
On Tue, 2012-09-25 at 17:00 +0530, Viresh Kumar wrote:
But this is the initial idea we had during LPC.
Yeah.. that's true.
Can you suggest any improvements here?
We could uhm... /me tries thinking ... reuse some of the NOHZ magic? Would that be sufficient, not waking a NOHZ cpu, or do you really want not waking any idle cpu?
On Tue, 2012-09-25 at 13:40 +0200, Peter Zijlstra wrote:
On Tue, 2012-09-25 at 17:00 +0530, Viresh Kumar wrote:
But this is the initial idea we had during LPC.
Yeah.. that's true.
Can you suggest any improvements here?
We could uhm... /me tries thinking ... reuse some of the NOHZ magic? Would that be sufficient, not waking a NOHZ cpu, or do you really want not waking any idle cpu?
Depending on the trade-off we could have the NOHZ stuff track a non-NOHZ-idle cpu and avoid having to compute one every time we need it.
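A minimal sketch of that idea, with invented names and no claim about where the hooks would actually live: the NOHZ idle entry/exit path maintains one known busy CPU, and queue_work() simply reads it instead of walking the sched domains. A stale answer only costs one unnecessary wakeup, which is no worse than today:

static atomic_t cached_busy_cpu = ATOMIC_INIT(0);

/* hypothetical hook, called from the NOHZ idle entry/exit path */
static void nohz_note_cpu_state(int cpu, bool entering_idle)
{
	if (!entering_idle)
		atomic_set(&cached_busy_cpu, cpu);
}

/* cheap replacement for wq_select_cpu(): no domain walk, no RCU */
static int cached_nonidle_cpu(void)
{
	int cpu = atomic_read(&cached_busy_cpu);

	return cpu_online(cpu) ? cpu : raw_smp_processor_id();
}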
Hello,
On Tue, Sep 25, 2012 at 04:06:08PM +0530, Viresh Kumar wrote:
+config MIGRATE_WQ
+	bool "(EXPERIMENTAL) Migrate Workqueues to non-idle cpu"
+	depends on SMP && EXPERIMENTAL
+	help
+	  Workqueues queues work on current cpu, if the caller haven't passed a
+	  preferred cpu. This may wake up an idle CPU, which is actually not
+	  required. This work can be processed by any CPU and so we must select
+	  a non-idle CPU here. This patch adds in support in workqueue
+	  framework to get preferred CPU details from the scheduler, instead of
+	  using current CPU.
I don't think it's a good idea to make behavior like this a config option. The behavior difference is subtle and may induce incorrect behavior.
+/* This enables migration of a work to a non-IDLE cpu instead of current cpu */
+#ifdef CONFIG_MIGRATE_WQ
+static int wq_select_cpu(void)
+{
+	return sched_select_cpu(SD_NUMA, -1);
+}
+#else
+#define wq_select_cpu()	smp_processor_id()
+#endif
+
 /* Serializes the accesses to the list of workqueues. */
 static DEFINE_SPINLOCK(workqueue_lock);
 static LIST_HEAD(workqueues);
@@ -995,7 +1005,7 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
	struct global_cwq *last_gcwq;

	if (unlikely(cpu == WORK_CPU_UNBOUND))
-		cpu = raw_smp_processor_id();
+		cpu = wq_select_cpu();

		/*
		 * It's multi cpu. If @wq is non-reentrant and @work
@@ -1066,8 +1076,9 @@ int queue_work(struct workqueue_struct *wq, struct work_struct *work)
 {
	int ret;

-	ret = queue_work_on(get_cpu(), wq, work);
-	put_cpu();
+	preempt_disable();
+	ret = queue_work_on(wq_select_cpu(), wq, work);
+	preempt_enable();
First of all, I'm not entirely sure this is safe. queue_work() used to *guarantee* that the work item would execute on the local CPU. I don't think there are many which depend on that but I'd be surprised if this doesn't lead to some subtle problems somewhere. It might not be realistic to audit all users and we might have to just let it happen and watch for the fallouts. Dunno, still wanna see some level of auditing.
Also, I'm wondering why this is necessary at all for workqueues. For schedule/queue_work(), you pretty much know the current cpu is not idle. For delayed workqueue, sure but for immediate scheduling, why?
Thanks.
On 25 September 2012 23:26, Tejun Heo <tj@kernel.org> wrote:
On Tue, Sep 25, 2012 at 04:06:08PM +0530, Viresh Kumar wrote:
+config MIGRATE_WQ
+	bool "(EXPERIMENTAL) Migrate Workqueues to non-idle cpu"
+	depends on SMP && EXPERIMENTAL
+	help
+	  Workqueues queues work on current cpu, if the caller haven't passed a
+	  preferred cpu. This may wake up an idle CPU, which is actually not
+	  required. This work can be processed by any CPU and so we must select
+	  a non-idle CPU here. This patch adds in support in workqueue
+	  framework to get preferred CPU details from the scheduler, instead of
+	  using current CPU.
I don't think it's a good idea to make behavior like this a config option. The behavior difference is subtle and may induce incorrect behavior.
Ok. Will remove it.
@@ -1066,8 +1076,9 @@ int queue_work(struct workqueue_struct *wq, struct work_struct *work)
 {
	int ret;

-	ret = queue_work_on(get_cpu(), wq, work);
-	put_cpu();
+	preempt_disable();
+	ret = queue_work_on(wq_select_cpu(), wq, work);
+	preempt_enable();
First of all, I'm not entirely sure this is safe. queue_work() used to *guarantee* that the work item would execute on the local CPU. I don't think there are many which depend on that but I'd be surprised if this doesn't lead to some subtle problems somewhere. It might not be realistic to audit all users and we might have to just let it happen and watch for the fallouts. Dunno, still wanna see some level of auditing.
Ok.
Also, I'm wondering why this is necessary at all for workqueues. For schedule/queue_work(), you pretty much know the current cpu is not idle. For delayed workqueue, sure but for immediate scheduling, why?
This was done for the below scenario:
- A CPU has programmed a timer and is now idle.
- The CPU enters the timer interrupt handler and queues a work.
As the CPU is currently idle, we should queue this work to some other CPU.
I know this patch migrated works in all cases. I will fix it in V2 by queuing work on another CPU only for this case.
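Something along these lines, as a sketch of the V2 intent (the idle_cpu() guard is the new, illustrative part; wq_select_cpu() and sched_select_cpu() are the routines from this series):

static int wq_select_cpu(void)
{
	int cpu = raw_smp_processor_id();

	/* local CPU is busy anyway (normal queue_work() path): keep it */
	if (!idle_cpu(cpu))
		return cpu;

	/* queueing from an idle CPU, e.g. a timer interrupt: look elsewhere */
	return sched_select_cpu(SD_NUMA, -1);
}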
-- viresh
On 25 September 2012 16:06, Viresh Kumar <viresh.kumar@linaro.org> wrote:
Test case 2:
I have created a small module, which does following:
- Create one work for each CPU (using queue_work_on(), so must schedule on that cpu)
- Above work, will queue "n" works for each cpu with queue_work(). These works are tracked within the module and results are printed at the end.
This gave similar results, with n ranging from 10 to 1000.
http://git.linaro.org/gitweb?p=people/vireshk/module.git%3Ba=summary
The source of this module can be found in the above repo.
-- viresh