This patchset was called: "Create sched_select_cpu() and use it for workqueues" for the first three versions.
Earlier discussions over v3, v2 and v1 can be found here:
https://lkml.org/lkml/2013/3/18/364
http://lists.linaro.org/pipermail/linaro-dev/2012-November/014344.html
http://www.mail-archive.com/linaro-dev@lists.linaro.org/msg13342.html
For power saving it is better to schedule work on cpus that aren't idle, as bringing a cpu/cluster out of an idle state can be very costly (both performance and power wise). Earlier we tried to use the timer infrastructure to make this decision, but we later found that the scheduler gives even better results, so we should use the scheduler to choose the cpu for scheduling work.
In the workqueue subsystem, workqueues with the WQ_UNBOUND flag are the ones which use the scheduler to select the target cpu.
Here we are migrating a few users of workqueues to WQ_UNBOUND. These drivers were found to be very active on an idle or lightly busy system, and using WQ_UNBOUND for them gave impressive results.
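To make the difference concrete, here is a toy userspace model of the two placement policies. It is only an illustration and not kernel code: pick_cpu_bound(), pick_cpu_unbound() and the idle[] array are hypothetical names, and the real scheduler's placement decision is considerably more involved than "first non-idle cpu".

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy userspace model, NOT kernel code. All names are hypothetical.
 * idle[i] is true when cpu i is currently idle.
 */

/* Bound (per-cpu) workqueue: work runs on the cpu that queued it. */
int pick_cpu_bound(int queuing_cpu)
{
	return queuing_cpu;
}

/*
 * Unbound workqueue: the worker is free to run on any cpu, so a
 * power-aware placement can prefer a cpu that is already awake.
 * Falls back to the queuing cpu when every cpu is idle.
 */
int pick_cpu_unbound(const bool *idle, size_t ncpus, int queuing_cpu)
{
	for (size_t i = 0; i < ncpus; i++)
		if (!idle[i])
			return (int)i;
	return queuing_cpu;
}
```

With cpus 0, 1, 3 and 4 idle and cpu 2 busy, work queued from cpu 0 stays on cpu 0 in the bound case (waking it up), while the unbound case lands on cpu 2, which is already running.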
Setup:
-----
- ARM Vexpress TC2 - big.LITTLE CPU
- Core 0-1: A15, 2-4: A7
- rootfs: linaro-ubuntu-devel
This patchset has been tested on a big.LITTLE (heterogeneous) system but is useful for homogeneous systems as well. During these tests audio was played in the background using aplay.
Results:
-------

                Cluster A15 Energy      Cluster A7 Energy       Total
                ------------------      -----------------       -----

Without this patchset (Energy in Joules):
-----------------------------------------

                0.151162                2.183545                2.334707
                0.223730                2.687067                2.910797
                0.289687                2.732702                3.022389
                0.454198                2.745908                3.200106
                0.495552                2.746465                3.242017

Average:        0.322866                2.619137                2.942003

With this patchset (Energy in Joules):
--------------------------------------

                0.226421                2.283658                2.510079
                0.151361                2.236656                2.388017
                0.197726                2.249849                2.447575
                0.221915                2.229446                2.451361
                0.347098                2.257707                2.604805

Average:        0.2289042               2.2514632               2.4803674
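As a quick cross-check of the averages, the per-run totals can be averaged and compared as in the sketch below. The helper names mean_energy() and saving_percent() are made up for this illustration; only the sample values come from the measurements above. From those totals the averages work out to about 2.942 J without and 2.480 J with the patchset, i.e. a saving of roughly 15.7% in total energy.

```c
#include <stddef.h>

/* Total energy per run (Joules), copied from the measurements above. */
static const double total_without[] = {
	2.334707, 2.910797, 3.022389, 3.200106, 3.242017
};
static const double total_with[] = {
	2.510079, 2.388017, 2.447575, 2.451361, 2.604805
};

/* Arithmetic mean of n energy samples; returns 0 for an empty set. */
double mean_energy(const double *samples, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += samples[i];
	return n ? sum / (double)n : 0.0;
}

/* Relative saving of "after" compared to "before", in percent. */
double saving_percent(double before, double after)
{
	return before != 0.0 ? (before - after) / before * 100.0 : 0.0;
}
```

Calling saving_percent(mean_energy(total_without, 5), mean_energy(total_with, 5)) gives the overall figure; the same helpers can be applied to the per-cluster columns.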
The above tests were repeated multiple times, and events were tracked using trace-cmd and analysed using kernelshark. It was easily noticeable that idle time for many cpus increased considerably, which eventually saved some power.
PS: All the earlier Acks we got for drivers have been dropped here, as the patches have been updated significantly.
V3->V4:
-------
- Dropped changes to kernel/sched directory and hence
  sched_select_non_idle_cpu().
- Dropped queue_work_on_any_cpu()
- Created system_freezable_unbound_wq
- Changed all patches accordingly.
V2->V3:
-------
- Dropped changes into core queue_work() API, rather create *_on_any_cpu()
  APIs
- Dropped running timers migration patch as that was broken
- Migrated few users of workqueues to use *_on_any_cpu() APIs.
Viresh Kumar (4):
  workqueue: Add system wide system_freezable_unbound_wq
  PHYLIB: queue work on unbound wq
  block: queue work on unbound wq
  fbcon: queue work on unbound wq

 block/blk-core.c              |  3 ++-
 block/blk-ioc.c               |  2 +-
 block/genhd.c                 | 10 ++++++----
 drivers/net/phy/phy.c         |  9 +++++----
 drivers/video/console/fbcon.c |  2 +-
 include/linux/workqueue.h     |  4 ++++
 kernel/workqueue.c            |  7 ++++++-
 7 files changed, 25 insertions(+), 12 deletions(-)
This patch adds system wide system_freezable_unbound_wq which will be used by code that currently uses system_freezable_wq and can be moved to unbound workqueues.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 include/linux/workqueue.h | 4 ++++
 kernel/workqueue.c        | 7 ++++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 835d12b..ab7597b 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -325,11 +325,15 @@ enum {
  *
  * system_freezable_wq is equivalent to system_wq except that it's
  * freezable.
+ *
+ * system_freezable_unbound_wq is equivalent to system_unbound_wq except that
+ * it's freezable.
  */
 extern struct workqueue_struct *system_wq;
 extern struct workqueue_struct *system_long_wq;
 extern struct workqueue_struct *system_unbound_wq;
 extern struct workqueue_struct *system_freezable_wq;
+extern struct workqueue_struct *system_freezable_unbound_wq;
 
 static inline struct workqueue_struct * __deprecated __system_nrt_wq(void)
 {
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index df2f6c4..771a5cc 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -281,6 +281,8 @@ struct workqueue_struct *system_unbound_wq __read_mostly;
 EXPORT_SYMBOL_GPL(system_unbound_wq);
 struct workqueue_struct *system_freezable_wq __read_mostly;
 EXPORT_SYMBOL_GPL(system_freezable_wq);
+struct workqueue_struct *system_freezable_unbound_wq __read_mostly;
+EXPORT_SYMBOL_GPL(system_freezable_unbound_wq);
 
 static int worker_thread(void *__worker);
 static void copy_workqueue_attrs(struct workqueue_attrs *to,
@@ -4467,8 +4469,11 @@ static int __init init_workqueues(void)
 					    WQ_UNBOUND_MAX_ACTIVE);
 	system_freezable_wq = alloc_workqueue("events_freezable",
 					      WQ_FREEZABLE, 0);
+	system_freezable_unbound_wq = alloc_workqueue("events_unbound_freezable",
+				WQ_FREEZABLE | WQ_UNBOUND, 0);
 	BUG_ON(!system_wq || !system_highpri_wq || !system_long_wq ||
-	       !system_unbound_wq || !system_freezable_wq);
+	       !system_unbound_wq || !system_freezable_wq ||
+	       !system_freezable_unbound_wq);
 	return 0;
 }
 early_initcall(init_workqueues);
Phylib uses workqueues for multiple purposes. There is no real dependency of scheduling these on the cpu which scheduled them.
On an idle system, it is observed that an idle cpu wakes up many times just to service this work. It would be better if we can schedule it on a cpu which the scheduler believes to be the most appropriate one.
This patch replaces system_wq with system_unbound_wq for PHYLIB.
Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/net/phy/phy.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index c14f147..b2fe180 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -439,7 +439,7 @@ void phy_start_machine(struct phy_device *phydev,
 {
 	phydev->adjust_state = handler;
 
-	schedule_delayed_work(&phydev->state_queue, HZ);
+	queue_delayed_work(system_unbound_wq, &phydev->state_queue, HZ);
 }
 
 /**
@@ -500,7 +500,7 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat)
 	disable_irq_nosync(irq);
 	atomic_inc(&phydev->irq_disable);
 
-	schedule_work(&phydev->phy_queue);
+	queue_work(system_unbound_wq, &phydev->phy_queue);
 
 	return IRQ_HANDLED;
 }
@@ -655,7 +655,7 @@ static void phy_change(struct work_struct *work)
 
 	/* reschedule state queue work to run as soon as possible */
 	cancel_delayed_work_sync(&phydev->state_queue);
-	schedule_delayed_work(&phydev->state_queue, 0);
+	queue_delayed_work(system_unbound_wq, &phydev->state_queue, 0);
 
 	return;
 
@@ -918,7 +918,8 @@ void phy_state_machine(struct work_struct *work)
 	if (err < 0)
 		phy_error(phydev);
 
-	schedule_delayed_work(&phydev->state_queue, PHY_STATE_TIME * HZ);
+	queue_delayed_work(system_unbound_wq, &phydev->state_queue,
+			PHY_STATE_TIME * HZ);
 }
 
 static inline void mmd_phy_indirect(struct mii_bus *bus, int prtad, int devad,
Block layer uses workqueues for multiple purposes. There is no real dependency of scheduling these on the cpu which scheduled them.
On an idle system, it is observed that an idle cpu wakes up many times just to service this work. It would be better if we can schedule it on a cpu which the scheduler believes to be the most appropriate one.
This patch replaces normal workqueues with UNBOUND versions.
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 block/blk-core.c |  3 ++-
 block/blk-ioc.c  |  2 +-
 block/genhd.c    | 10 ++++++----
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 492242f..91cd486 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -3186,7 +3186,8 @@ int __init blk_dev_init(void)
 
 	/* used for unplugging and affects IO latency/throughput - HIGHPRI */
 	kblockd_workqueue = alloc_workqueue("kblockd",
-					    WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);
+					    WQ_MEM_RECLAIM | WQ_HIGHPRI |
+					    WQ_UNBOUND, 0);
 	if (!kblockd_workqueue)
 		panic("Failed to create kblockd\n");
 
diff --git a/block/blk-ioc.c b/block/blk-ioc.c
index 9c4bb82..5dd576d 100644
--- a/block/blk-ioc.c
+++ b/block/blk-ioc.c
@@ -144,7 +144,7 @@ void put_io_context(struct io_context *ioc)
 	if (atomic_long_dec_and_test(&ioc->refcount)) {
 		spin_lock_irqsave(&ioc->lock, flags);
 		if (!hlist_empty(&ioc->icq_list))
-			schedule_work(&ioc->release_work);
+			queue_work(system_unbound_wq, &ioc->release_work);
 		else
 			free_ioc = true;
 		spin_unlock_irqrestore(&ioc->lock, flags);
diff --git a/block/genhd.c b/block/genhd.c
index a1ed52a..0f4470a 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1488,9 +1488,10 @@ static void __disk_unblock_events(struct gendisk *disk, bool check_now)
 	intv = disk_events_poll_jiffies(disk);
 	set_timer_slack(&ev->dwork.timer, intv / 4);
 	if (check_now)
-		queue_delayed_work(system_freezable_wq, &ev->dwork, 0);
+		queue_delayed_work(system_freezable_unbound_wq, &ev->dwork, 0);
 	else if (intv)
-		queue_delayed_work(system_freezable_wq, &ev->dwork, intv);
+		queue_delayed_work(system_freezable_unbound_wq, &ev->dwork,
+				intv);
 out_unlock:
 	spin_unlock_irqrestore(&ev->lock, flags);
 }
@@ -1533,7 +1534,7 @@ void disk_flush_events(struct gendisk *disk, unsigned int mask)
 	spin_lock_irq(&ev->lock);
 	ev->clearing |= mask;
 	if (!ev->block)
-		mod_delayed_work(system_freezable_wq, &ev->dwork, 0);
+		mod_delayed_work(system_freezable_unbound_wq, &ev->dwork, 0);
 	spin_unlock_irq(&ev->lock);
 }
 
@@ -1626,7 +1627,8 @@ static void disk_check_events(struct disk_events *ev,
 
 	intv = disk_events_poll_jiffies(disk);
 	if (!ev->block && intv)
-		queue_delayed_work(system_freezable_wq, &ev->dwork, intv);
+		queue_delayed_work(system_freezable_unbound_wq, &ev->dwork,
+				intv);
 
 	spin_unlock_irq(&ev->lock);
 
fbcon uses workqueues and it has no real dependency of scheduling these on the cpu which scheduled them.
On an idle system, it is observed that an idle cpu wakes up many times just to service this work. It would be better if we can schedule it on a cpu which the scheduler believes to be the most appropriate one.
This patch replaces system_wq with system_unbound_wq.
Cc: Dave Airlie <airlied@redhat.com>
Cc: linux-fbdev@vger.kernel.org
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/video/console/fbcon.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/video/console/fbcon.c b/drivers/video/console/fbcon.c
index 3cd6759..6c1f7c3 100644
--- a/drivers/video/console/fbcon.c
+++ b/drivers/video/console/fbcon.c
@@ -404,7 +404,7 @@ static void cursor_timer_handler(unsigned long dev_addr)
 	struct fb_info *info = (struct fb_info *) dev_addr;
 	struct fbcon_ops *ops = info->fbcon_par;
 
-	schedule_work(&info->queue);
+	queue_work(system_unbound_wq, &info->queue);
 	mod_timer(&ops->cursor_timer, jiffies + HZ/5);
 }
 
On 31 March 2013 19:57, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> This patchset was called: "Create sched_select_cpu() and use it for workqueues"
> for the first three versions.
Messed up with cc list.. Will resend it.