linaro-kernel April 2013

linaro-kernel@lists.linaro.org

91 participants
94 discussions

[ACTIVITY] (Linus Walleij) 2013-03-22 - 2013-04-12

by Linus Walleij

== Linus Walleij linusw ==

=== Highlights ===

* Completed the multiplatform support for ux500 and the result has been pulled into the MFD and ARM SoC trees. There is some immediate fallout from this that needs to be fixed, and other trouble in the ux500 world that also needs to be fixed up.
* Collected pinctrl fixes and new patches. Sent a pull request for fixes to Torvalds and he pulled it in.
* Collected GPIO fixes and new patches. Sent two pull requests for fixes to Torvalds and he pulled them. This took some time due to a huge pile of cleanup patches.
* Sent pull requests to ARM SoC for a few ux500 things; I am probably still missing some topics.
* Reviewed and merged a few of Fabio's backports to the internal ST-Ericsson tree.
* Figured out how to do PCI device tree properly and implemented it for the Integrator/AP. Patches are pending review but in nice shape. Had to hunt down a specific nasty problem with PCI hosts being rootless (no parent device) on ARM; this will not work going forward, so I proposed a patch and iterated.
* I have a pretty big device tree patch bundle for the U300 building up, but want to have it in a more complete state before I post. The plan for U300 is: enable all for device tree, delete board files, multiplatform, in that order.

=== Plans ===

* Get better at sending these reports every week.
* A short paternity leave 6/5->9/5 in May.
* Find all regressions for ux500 lurking in the linux-next tree.
* Convert Nomadik pinctrl driver to register GPIO ranges from the gpiochip side.
* Test the PL08x patches on the Ericsson Research PB11MPCore and submit platform data for using pl08x DMA on that platform.
* Get hands dirty with regmap.

=== Issues ===

* A bit overloaded, especially hard to keep track of all the ux500 stuff in my head. Could use another co-maintainer maybe.
* Things have been hectic internally at ST-Ericsson, diverting me from Linaro work.
* I am spending roughly 30-60 mins every day on internal review work on internal baseline and mainline patches-to-be.

Thanks,
Linus Walleij

12 years, 3 months

[PATCH Resend v5] sched: fix init NOHZ_IDLE flag

by Vincent Guittot

On my SMP platform, which is made of 5 cores in 2 clusters, the nr_busy_cpus field of the sched_group_power struct is not zero when the platform is fully idle.

The root cause is: during the boot sequence, some CPUs reach the idle loop and set their NOHZ_IDLE flag while waiting for other CPUs to boot. But the nr_busy_cpus field is initialized later with the assumption that all CPUs are in the busy state, whereas some CPUs have already set their NOHZ_IDLE flag.

More generally, the NOHZ_IDLE flag must be initialized when new sched_domains are created in order to ensure that NOHZ_IDLE and nr_busy_cpus are aligned.

This condition can be ensured by adding a synchronize_rcu between the destruction of old sched_domains and the creation of new ones, so the NOHZ_IDLE flag will not be updated with the old sched_domain once it has been initialized. But this solution introduces an additional latency in the rebuild sequence that is called during cpu hotplug.

As suggested by Frederic Weisbecker, another solution is to have the same RCU lifecycle for both NOHZ_IDLE and the sched_domain struct. I have introduced a new sched_domain_rq struct that is the entry point for both sched_domains and objects that must follow the same lifecycle, like the NOHZ_IDLE flags. They will share the same RCU lifecycle and will always be synchronized.

The synchronization is done at the cost of:
- an additional indirection for accessing the first sched_domain level
- an additional indirection and a rcu_dereference before accessing the NOHZ_IDLE flag.

Change since v4:
- link both sched_domain and NOHZ_IDLE flag in one RCU object so their states are always synchronized.
Change since V3:
- NOHZ flag is not cleared if a NULL domain is attached to the CPU
- Remove patch 2/2 which becomes useless with latest modifications

Change since V2:
- change the initialization to idle state instead of busy state so a CPU that enters idle during the build of the sched_domain will not corrupt the initialization state

Change since V1:
- remove the patch for SCHED softirq on an idle core use case as it was a side effect of the other use cases.

Signed-off-by: Vincent Guittot <vincent.guittot(a)linaro.org>
---
 include/linux/sched.h |    6 +++
 kernel/sched/core.c   |  105 ++++++++++++++++++++++++++++++++++++++++++++-----
 kernel/sched/fair.c   |   35 +++++++++++------
 kernel/sched/sched.h  |   24 +++++++++--
 4 files changed, 145 insertions(+), 25 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index d35d2b6..2a52188 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -959,6 +959,12 @@ struct sched_domain {
 	unsigned long span[0];
 };
 
+struct sched_domain_rq {
+	struct sched_domain *sd;
+	unsigned long flags;
+	struct rcu_head rcu;	/* used during destruction */
+};
+
 static inline struct cpumask *sched_domain_span(struct sched_domain *sd)
 {
 	return to_cpumask(sd->span);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7f12624..69e2313 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5602,6 +5602,15 @@ static void destroy_sched_domains(struct sched_domain *sd, int cpu)
 		destroy_sched_domain(sd, cpu);
 }
 
+static void destroy_sched_domain_rq(struct sched_domain_rq *sd_rq, int cpu)
+{
+	if (!sd_rq)
+		return;
+
+	destroy_sched_domains(sd_rq->sd, cpu);
+	kfree_rcu(sd_rq, rcu);
+}
+
 /*
  * Keep a special pointer to the highest sched_domain that has
  * SD_SHARE_PKG_RESOURCE set (Last Level Cache Domain) for this
@@ -5632,10 +5641,23 @@ static void update_top_cache_domain(int cpu)
  * hold the hotplug lock.
  */
 static void
-cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
+cpu_attach_domain(struct sched_domain_rq *sd_rq, struct root_domain *rd,
+				int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
-	struct sched_domain *tmp;
+	struct sched_domain_rq *tmp_rq;
+	struct sched_domain *tmp, *sd = NULL;
+
+	/*
+	 * If we don't have any sched_domain and associated object, we can
+	 * directly jump to the attach sequence otherwise we try to degenerate
+	 * the sched_domain
+	 */
+	if (!sd_rq)
+		goto attach;
+
+	/* Get a pointer to the 1st sched_domain */
+	sd = sd_rq->sd;
 
 	/* Remove the sched domains which do not contribute to scheduling. */
 	for (tmp = sd; tmp; ) {
@@ -5658,14 +5680,17 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
 		destroy_sched_domain(tmp, cpu);
 		if (sd)
 			sd->child = NULL;
+		/* update sched_domain_rq */
+		sd_rq->sd = sd;
 	}
 
+attach:
 	sched_domain_debug(sd, cpu);
 
 	rq_attach_root(rq, rd);
-	tmp = rq->sd;
-	rcu_assign_pointer(rq->sd, sd);
-	destroy_sched_domains(tmp, cpu);
+	tmp_rq = rq->sd_rq;
+	rcu_assign_pointer(rq->sd_rq, sd_rq);
+	destroy_sched_domain_rq(tmp_rq, cpu);
 
 	update_top_cache_domain(cpu);
 }
@@ -5695,12 +5720,14 @@ struct sd_data {
 };
 
 struct s_data {
+	struct sched_domain_rq ** __percpu sd_rq;
 	struct sched_domain ** __percpu sd;
 	struct root_domain	*rd;
 };
 
 enum s_alloc {
 	sa_rootdomain,
+	sa_sd_rq,
 	sa_sd,
 	sa_sd_storage,
 	sa_none,
@@ -5935,7 +5962,7 @@ static void init_sched_groups_power(int cpu, struct sched_domain *sd)
 		return;
 
 	update_group_power(sd, cpu);
-	atomic_set(&sg->sgp->nr_busy_cpus, sg->group_weight);
+	atomic_set(&sg->sgp->nr_busy_cpus, 0);
 }
 
 int __weak arch_sd_sibling_asym_packing(void)
@@ -6011,6 +6038,8 @@ static void set_domain_attribute(struct sched_domain *sd,
 
 static void __sdt_free(const struct cpumask *cpu_map);
 static int __sdt_alloc(const struct cpumask *cpu_map);
+static void __sdrq_free(const struct cpumask *cpu_map, struct s_data *d);
+static int __sdrq_alloc(const struct cpumask *cpu_map, struct s_data *d);
 
 static void __free_domain_allocs(struct s_data *d, enum s_alloc what,
 				 const struct cpumask *cpu_map)
@@ -6019,6 +6048,9 @@ static void __free_domain_allocs(struct s_data *d, enum s_alloc what,
 	case sa_rootdomain:
 		if (!atomic_read(&d->rd->refcount))
 			free_rootdomain(&d->rd->rcu); /* fall through */
+	case sa_sd_rq:
+		__sdrq_free(cpu_map, d); /* fall through */
+		free_percpu(d->sd_rq); /* fall through */
 	case sa_sd:
 		free_percpu(d->sd); /* fall through */
 	case sa_sd_storage:
@@ -6038,9 +6070,14 @@ static enum s_alloc __visit_domain_allocation_hell(struct s_data *d,
 	d->sd = alloc_percpu(struct sched_domain *);
 	if (!d->sd)
 		return sa_sd_storage;
+	d->sd_rq = alloc_percpu(struct sched_domain_rq *);
+	if (!d->sd_rq)
+		return sa_sd;
+	if (__sdrq_alloc(cpu_map, d))
+		return sa_sd_rq;
 	d->rd = alloc_rootdomain();
 	if (!d->rd)
-		return sa_sd;
+		return sa_sd_rq;
 	return sa_rootdomain;
 }
 
@@ -6466,6 +6503,46 @@ static void __sdt_free(const struct cpumask *cpu_map)
 	}
 }
 
+static int __sdrq_alloc(const struct cpumask *cpu_map, struct s_data *d)
+{
+	int j;
+
+	for_each_cpu(j, cpu_map) {
+		struct sched_domain_rq *sd_rq;
+
+		sd_rq = kzalloc_node(sizeof(struct sched_domain_rq),
+				GFP_KERNEL, cpu_to_node(j));
+		if (!sd_rq)
+			return -ENOMEM;
+
+		*per_cpu_ptr(d->sd_rq, j) = sd_rq;
+	}
+
+	return 0;
+}
+
+static void __sdrq_free(const struct cpumask *cpu_map, struct s_data *d)
+{
+	int j;
+
+	for_each_cpu(j, cpu_map)
+		if (*per_cpu_ptr(d->sd_rq, j))
+			kfree(*per_cpu_ptr(d->sd_rq, j));
+}
+
+static void build_sched_domain_rq(struct s_data *d, int cpu)
+{
+	struct sched_domain_rq *sd_rq;
+	struct sched_domain *sd;
+
+	/* Attach sched_domain to sched_domain_rq */
+	sd = *per_cpu_ptr(d->sd, cpu);
+	sd_rq = *per_cpu_ptr(d->sd_rq, cpu);
+	sd_rq->sd = sd;
+	/* Init flags */
+	set_bit(NOHZ_IDLE, sched_rq_flags(sd_rq));
+}
+
 struct sched_domain *build_sched_domain(struct sched_domain_topology_level *tl,
 		struct s_data *d, const struct cpumask *cpu_map,
 		struct sched_domain_attr *attr, struct sched_domain *child,
@@ -6495,6 +6572,7 @@ static int build_sched_domains(const struct cpumask *cpu_map,
 			       struct sched_domain_attr *attr)
 {
 	enum s_alloc alloc_state = sa_none;
+	struct sched_domain_rq *sd_rq;
 	struct sched_domain *sd;
 	struct s_data d;
 	int i, ret = -ENOMEM;
@@ -6547,11 +6625,18 @@ static int build_sched_domains(const struct cpumask *cpu_map,
 		}
 	}
 
+	/* Init objects that must follow the sched_domain lifecycle */
+	for_each_cpu(i, cpu_map) {
+		build_sched_domain_rq(&d, i);
+	}
+
 	/* Attach the domains */
 	rcu_read_lock();
 	for_each_cpu(i, cpu_map) {
-		sd = *per_cpu_ptr(d.sd, i);
-		cpu_attach_domain(sd, d.rd, i);
+		sd_rq = *per_cpu_ptr(d.sd_rq, i);
+		cpu_attach_domain(sd_rq, d.rd, i);
+		/* claim allocation of sched_domain_rq object */
+		*per_cpu_ptr(d.sd_rq, i) = NULL;
 	}
 	rcu_read_unlock();
 
@@ -6982,7 +7067,7 @@ void __init sched_init(void)
 		rq->last_load_update_tick = jiffies;
 
 #ifdef CONFIG_SMP
-		rq->sd = NULL;
+		rq->sd_rq = NULL;
 		rq->rd = NULL;
 		rq->cpu_power = SCHED_POWER_SCALE;
 		rq->post_schedule = 0;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7a33e59..1c7447e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5392,31 +5392,39 @@ static inline void nohz_balance_exit_idle(int cpu)
 
 static inline void set_cpu_sd_state_busy(void)
 {
+	struct sched_domain_rq *sd_rq;
 	struct sched_domain *sd;
 	int cpu = smp_processor_id();
 
-	if (!test_bit(NOHZ_IDLE, nohz_flags(cpu)))
-		return;
-	clear_bit(NOHZ_IDLE, nohz_flags(cpu));
-
 	rcu_read_lock();
-	for_each_domain(cpu, sd)
+	sd_rq = get_sched_domain_rq(cpu);
+
+	if (!sd_rq || !test_bit(NOHZ_IDLE, sched_rq_flags(sd_rq)))
+		goto unlock;
+	clear_bit(NOHZ_IDLE, sched_rq_flags(sd_rq));
+
+	for_each_domain_from_rq(sd_rq, sd)
 		atomic_inc(&sd->groups->sgp->nr_busy_cpus);
+unlock:
 	rcu_read_unlock();
 }
 
 void set_cpu_sd_state_idle(void)
 {
+	struct sched_domain_rq *sd_rq;
 	struct sched_domain *sd;
 	int cpu = smp_processor_id();
 
-	if (test_bit(NOHZ_IDLE, nohz_flags(cpu)))
-		return;
-	set_bit(NOHZ_IDLE, nohz_flags(cpu));
-
 	rcu_read_lock();
-	for_each_domain(cpu, sd)
+	sd_rq = get_sched_domain_rq(cpu);
+
+	if (!sd_rq || test_bit(NOHZ_IDLE, sched_rq_flags(sd_rq)))
+		goto unlock;
+	set_bit(NOHZ_IDLE, sched_rq_flags(sd_rq));
+
+	for_each_domain_from_rq(sd_rq, sd)
 		atomic_dec(&sd->groups->sgp->nr_busy_cpus);
+unlock:
 	rcu_read_unlock();
 }
 
@@ -5673,7 +5681,12 @@ static void run_rebalance_domains(struct softirq_action *h)
 
 static inline int on_null_domain(int cpu)
 {
-	return !rcu_dereference_sched(cpu_rq(cpu)->sd);
+	struct sched_domain_rq *sd_rq =
+		rcu_dereference_sched(cpu_rq(cpu)->sd_rq);
+	struct sched_domain *sd = NULL;
+	if (sd_rq)
+		sd = sd_rq->sd;
+	return !sd;
 }
 
 /*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index cc03cfd..f589306 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -417,7 +417,7 @@ struct rq {
 
 #ifdef CONFIG_SMP
 	struct root_domain *rd;
-	struct sched_domain *sd;
+	struct sched_domain_rq *sd_rq;
 
 	unsigned long cpu_power;
 
@@ -505,21 +505,37 @@ DECLARE_PER_CPU(struct rq, runqueues);
 
 #ifdef CONFIG_SMP
 
-#define rcu_dereference_check_sched_domain(p) \
+#define rcu_dereference_check_sched_domain_rq(p) \
 	rcu_dereference_check((p), \
 			      lockdep_is_held(&sched_domains_mutex))
 
+#define get_sched_domain_rq(cpu) \
+	rcu_dereference_check_sched_domain_rq(cpu_rq(cpu)->sd_rq)
+
+#define rcu_dereference_check_sched_domain(cpu) ({ \
+	struct sched_domain_rq *__sd_rq = get_sched_domain_rq(cpu); \
+	struct sched_domain *__sd = NULL; \
+	if (__sd_rq) \
+		__sd = __sd_rq->sd; \
+	__sd; \
+})
+
+#define sched_rq_flags(sd_rq) (&sd_rq->flags)
+
 /*
- * The domain tree (rq->sd) is protected by RCU's quiescent state transition.
+ * The domain tree (rq->sd_rq) is protected by RCU's quiescent state transition.
  * See detach_destroy_domains: synchronize_sched for details.
  *
  * The domain tree of any CPU may only be accessed from within
  * preempt-disabled sections.
  */
 #define for_each_domain(cpu, __sd) \
-	for (__sd = rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd); \
+	for (__sd = rcu_dereference_check_sched_domain(cpu); \
 			__sd; __sd = __sd->parent)
 
+#define for_each_domain_from_rq(sd_rq, __sd) \
+	for (__sd = sd_rq->sd; __sd; __sd = __sd->parent)
+
 #define for_each_lower_domain(sd) for (; sd; sd = sd->child)
 
 /**
-- 
1.7.9.5
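The idea of tying the idle flag to the domain hierarchy can be illustrated outside the kernel. What follows is only a user-space sketch under stated assumptions: every `model_*` name is hypothetical, a C11 atomic pointer swap stands in for rcu_assign_pointer(), and a plain free() by the caller stands in for kfree_rcu(). It shows that once the flag lives in the same object as the hierarchy entry point, a rebuild replaces both at once, so a stale flag can never be applied to fresh counters.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; none of these are kernel APIs. */
struct model_domain {
	struct model_domain *parent;
	int nr_busy_cpus;
};

/* One object carries both the hierarchy entry point and the idle flag. */
struct model_domain_rq {
	struct model_domain *sd;
	bool nohz_idle;
};

struct model_rq {
	_Atomic(struct model_domain_rq *) sd_rq;
};

void model_rq_init(struct model_rq *rq)
{
	atomic_init(&rq->sd_rq, NULL);
}

/* A fresh bundle starts idle, matching busy counters initialized to 0. */
struct model_domain_rq *model_domain_rq_new(struct model_domain *sd)
{
	struct model_domain_rq *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	d->sd = sd;
	d->nohz_idle = true;
	return d;
}

/*
 * Publish a new bundle with a single pointer swap and return the old one.
 * In this sketch the caller frees the old bundle directly; the patch
 * defers that with kfree_rcu() so concurrent readers stay safe.
 */
struct model_domain_rq *model_attach(struct model_rq *rq,
				     struct model_domain_rq *new_rq)
{
	return atomic_exchange(&rq->sd_rq, new_rq);
}

void model_set_busy(struct model_rq *rq)
{
	struct model_domain_rq *d = atomic_load(&rq->sd_rq);
	struct model_domain *sd;

	if (!d || !d->nohz_idle)
		return;
	d->nohz_idle = false;
	for (sd = d->sd; sd; sd = sd->parent)
		sd->nr_busy_cpus++;
}

void model_set_idle(struct model_rq *rq)
{
	struct model_domain_rq *d = atomic_load(&rq->sd_rq);
	struct model_domain *sd;

	if (!d || d->nohz_idle)
		return;
	d->nohz_idle = true;
	for (sd = d->sd; sd; sd = sd->parent) {
		assert(sd->nr_busy_cpus > 0);
		sd->nr_busy_cpus--;
	}
}
```

Calling model_set_idle() right after model_attach() with a new bundle is a no-op, because the new bundle already records the idle state; this mirrors the patch initializing nr_busy_cpus to 0 and setting NOHZ_IDLE in build_sched_domain_rq().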

12 years, 3 months

[V2 patch 00/19] cpuidle: code consolidation

by Daniel Lezcano

This patch series provides some code consolidation across the different cpuidle drivers. It contains two parts: the first one is the removal of the time keeping flag and the second one is a common initialization routine.

All the drivers use the en_core_tk_irqen flag, which means it is not necessary to make the time computation optional. We can remove this flag and assume the cpuidle framework always manages this operation.

The cpuidle code initialization is duplicated across the different drivers in the same manner. The repeating pattern is:

SMP:

	cpuidle_register_driver(drv);
	for_each_possible_cpu(cpu) {
		dev = per_cpu(cpuidle_device, cpu);
		cpuidle_register_device(dev);
	}

UP:

	cpuidle_register_driver(drv);
	cpuidle_register_device(dev);

As on a UP machine the macro 'for_each_cpu' is a one-iteration loop, using the initialization loop from SMP on UP works.

The patchset does some cleanup for different drivers in order to make the init code the same. Then it introduces a generic function:

	cpuidle_register(struct cpuidle_driver *drv, struct cpumask *cpumask)

The cpumask is for the coupled idle states.

The drivers are then modified to take into account this new function and to remove the duplicated code.

The benefit is observable in the diffstat: a net 322 lines of code removed (175 insertions, 497 deletions).
Tested-on: u8500
Tested-on: at91
Tested-on: intel i5
Tested-on: OMAP4

Compiled with and without CPU_IDLE for:
u8500, at91, davinci, exynos, imx5, imx6, kirkwood, multi_v7 (for calxeda), omap2plus, s3c64, tegra1, tegra2, tegra3

Daniel Lezcano (19):
  ARM: shmobile: cpuidle: remove shmobile_enter_wfi function
  ARM: OMAP3: remove cpuidle_wrap_enter
  cpuidle: remove en_core_tk_irqen flag
  ARM: ux500: cpuidle: replace for_each_online_cpu by for_each_possible_cpu
  ARM: imx: cpuidle: create separate drivers for imx5/imx6
  cpuidle: make a single register function for all
  ARM: ux500: cpuidle: use init/exit common routine
  ARM: at91: cpuidle: use init/exit common routine
  ARM: OMAP3: cpuidle: use init/exit common routine
  ARM: s3c64xx: cpuidle: use init/exit common routine
  ARM: tegra: cpuidle: use init/exit common routine
  ARM: shmobile: cpuidle: use init/exit common routine
  ARM: OMAP4: cpuidle: use init/exit common routine
  ARM: tegra: cpuidle: use init/exit common routine for tegra2
  ARM: tegra: cpuidle: use init/exit common routine for tegra3
  ARM: calxeda: cpuidle: use init/exit common routine
  ARM: kirkwood: cpuidle: use init/exit common routine
  ARM: davinci: cpuidle: use init/exit common routine
  ARM: imx: cpuidle: use init/exit common routine

 Documentation/cpuidle/driver.txt                |    6 +
 arch/arm/mach-at91/cpuidle.c                    |   18 +--
 arch/arm/mach-davinci/cpuidle.c                 |   21 +---
 arch/arm/mach-exynos/cpuidle.c                  |    1 -
 arch/arm/mach-imx/Makefile                      |    1 +
 arch/arm/mach-imx/cpuidle-imx5.c                |   40 +++++++
 arch/arm/mach-imx/cpuidle-imx6q.c               |    3 +-
 arch/arm/mach-imx/cpuidle.c                     |   80 -------------
 arch/arm/mach-imx/cpuidle.h                     |   10 +-
 arch/arm/mach-imx/pm-imx5.c                     |   30 +----
 arch/arm/mach-omap2/cpuidle34xx.c               |   49 ++------
 arch/arm/mach-omap2/cpuidle44xx.c               |   23 +---
 arch/arm/mach-s3c64xx/cpuidle.c                 |   15 +--
 arch/arm/mach-shmobile/cpuidle.c                |   11 +-
 arch/arm/mach-shmobile/include/mach/common.h    |    3 -
 arch/arm/mach-shmobile/pm-sh7372.c              |    2 -
 arch/arm/mach-tegra/cpuidle-tegra114.c          |   27 +----
 arch/arm/mach-tegra/cpuidle-tegra20.c           |   31 +----
 arch/arm/mach-tegra/cpuidle-tegra30.c           |   28 +----
 arch/arm/mach-ux500/cpuidle.c                   |   33 +-----
 arch/powerpc/platforms/pseries/processor_idle.c |    1 -
 arch/sh/kernel/cpu/shmobile/cpuidle.c           |    1 -
 arch/x86/kernel/apm_32.c                        |    1 -
 drivers/acpi/processor_idle.c                   |    1 -
 drivers/cpuidle/cpuidle-calxeda.c               |   53 +--------
 drivers/cpuidle/cpuidle-kirkwood.c              |   18 +--
 drivers/cpuidle/cpuidle.c                       |  144 ++++++++++++++---------
 drivers/idle/intel_idle.c                       |    1 -
 include/linux/cpuidle.h                         |   20 ++--
 29 files changed, 175 insertions(+), 497 deletions(-)
 create mode 100644 arch/arm/mach-imx/cpuidle-imx5.c
 delete mode 100644 arch/arm/mach-imx/cpuidle.c

-- 
1.7.9.5
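The consolidation described in the cover letter can be sketched in user space. This is a hypothetical model, not the cpuidle API: all `model_*` names are invented for illustration, and a plain CPU count replaces the cpumask argument of the real cpuidle_register(). It shows one helper covering both the SMP and the UP pattern, and rolling back on failure so that drivers no longer carry their own copy of the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4

/* Hypothetical stand-ins for a cpuidle driver and per-CPU device. */
struct model_driver {
	const char *name;
	bool registered;
};

struct model_device {
	int cpu;
	bool registered;
};

struct model_device model_devices[MODEL_NR_CPUS];

int model_register_driver(struct model_driver *drv)
{
	if (drv->registered)
		return -1;
	drv->registered = true;
	return 0;
}

void model_unregister_driver(struct model_driver *drv)
{
	drv->registered = false;
}

int model_register_device(struct model_device *dev)
{
	if (dev->registered)
		return -1;
	dev->registered = true;
	return 0;
}

void model_unregister_device(struct model_device *dev)
{
	dev->registered = false;
}

/*
 * Single entry point: register the driver, then one device per CPU.
 * With nr_cpus == 1 the loop runs once, which is the UP case.
 */
int model_register(struct model_driver *drv, int nr_cpus)
{
	int cpu, ret;

	assert(nr_cpus >= 1 && nr_cpus <= MODEL_NR_CPUS);

	ret = model_register_driver(drv);
	if (ret)
		return ret;

	for (cpu = 0; cpu < nr_cpus; cpu++) {
		model_devices[cpu].cpu = cpu;
		ret = model_register_device(&model_devices[cpu]);
		if (ret)
			goto undo;
	}
	return 0;

undo:
	/* Unregister only the devices this call registered. */
	while (--cpu >= 0)
		model_unregister_device(&model_devices[cpu]);
	model_unregister_driver(drv);
	return ret;
}

void model_unregister(struct model_driver *drv, int nr_cpus)
{
	int cpu;

	for (cpu = 0; cpu < nr_cpus; cpu++)
		model_unregister_device(&model_devices[cpu]);
	model_unregister_driver(drv);
}
```

A driver's init path then shrinks to a single call such as model_register(&drv, MODEL_NR_CPUS), which is the kind of reduction the diffstat reflects.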

12 years, 3 months

[PATCH] cpufreq: Call __cpufreq_governor() with correct policy->cpus mask

by Viresh Kumar

__cpufreq_governor() must be called with the correct policy->cpus mask. In __cpufreq_remove_dev() we first clear the CPU from policy->cpus with cpumask_clear_cpu() and then call __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT).

If the governor does some per-cpu work in its EXIT callback, this can lead to unpredictable behavior. The generic governors in drivers/cpufreq/ don't do any per-cpu work in the EXIT callback, so we don't face any issues currently. But it's better to keep the code clean, so we don't face any issues in future.

Now, we call cpumask_clear_cpu() only when multiple cpus are managed by the policy.

Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
---
 drivers/cpufreq/cpufreq.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index fd97a62..3564947 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1105,7 +1105,9 @@ static int __cpufreq_remove_dev(struct device *dev, struct subsys_interface *sif
 
 	WARN_ON(lock_policy_rwsem_write(cpu));
 	cpus = cpumask_weight(data->cpus);
-	cpumask_clear_cpu(cpu, data->cpus);
+
+	if (cpus > 1)
+		cpumask_clear_cpu(cpu, data->cpus);
 	unlock_policy_rwsem_write(cpu);
 
 	if (cpu != data->cpu) {
-- 
1.7.12.rc2.18.g61b472e
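The effect of the change can be shown with a small user-space model. This is a sketch under assumptions, not the cpufreq code: the `model_*` names are hypothetical, a 32-bit word stands in for struct cpumask, and zeroing the mask after the exit callback is an assumption of the model (the kernel tears the policy down instead). The point is that the exit callback still sees the last CPU in the mask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpumask: one bit per CPU. */
typedef uint32_t model_mask_t;

struct model_policy {
	model_mask_t cpus;     /* CPUs managed by this policy */
	model_mask_t exit_saw; /* mask observed by the exit callback */
	int exit_called;
};

unsigned int model_mask_weight(model_mask_t m)
{
	unsigned int n = 0;

	while (m) {
		m &= m - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Models a governor EXIT callback that inspects the policy mask. */
void model_governor_exit(struct model_policy *p)
{
	p->exit_saw = p->cpus;
	p->exit_called = 1;
}

void model_remove_cpu(struct model_policy *p, unsigned int cpu)
{
	unsigned int cpus = model_mask_weight(p->cpus);

	assert(cpu < 32 && (p->cpus & (UINT32_C(1) << cpu)));

	if (cpus > 1) {
		/* Other CPUs still use the policy: just drop this one. */
		p->cpus &= ~(UINT32_C(1) << cpu);
		return;
	}

	/* Last CPU: leave the mask intact until the governor has exited. */
	model_governor_exit(p);
	p->cpus = 0; /* assumption of this model only */
}
```

Had the bit been cleared unconditionally, model_governor_exit() would have observed an empty mask for the last CPU, which is the situation the patch avoids.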

12 years, 3 months

Re: [PATCH 9/9] powerpc: cpufreq: move cpufreq driver to drivers/cpufreq

by Viresh Kumar

On 3 April 2013 16:00, Benjamin Herrenschmidt <benh(a)kernel.crashing.org> wrote:
> On Wed, 2013-04-03 at 15:00 +0530, Viresh Kumar wrote:
>> On 31 March 2013 09:33, Viresh Kumar <viresh.kumar(a)linaro.org> wrote:
>> > Benjamin/Paul/Olof,
>> >
>> > Any comments on this?
>>
>> Ping!!
>
> I'm on vacation until end of April. No objection to the patch but
> somebody needs to test it.

Hi,

Can somebody else from the powerpc world give it a try?

OR

@Rafael: Can we get this pushed into linux-next as is, so that people are forced to test it? In case there are any complaints, I will fix them or you can revert it.

12 years, 3 months

[PATCH v6] sched: fix wrong rq's runnable_avg update with rt tasks

by Vincent Guittot

The current update of the rq's load can be erroneous when RT tasks are involved.

The update of the load of a rq that becomes idle is done only if avg_idle is not less than sysctl_sched_migration_cost. If RT tasks and short idle durations alternate, the runnable_avg will not be updated correctly and the time will be accounted as idle time when a CFS task wakes up.

A new idle_enter function is called when the next task is the idle function, so the elapsed time will be accounted as run time in the load of the rq, whatever the average idle time is. The function update_rq_runnable_avg is removed from idle_balance.

When a RT task is scheduled on an idle CPU, the update of the rq's load is not done when the rq exits the idle state because CFS's functions are not called. Then idle_balance, which is called just before entering the idle function, updates the rq's load and makes the assumption that the elapsed time since the last update was only running time. As a consequence, the rq's load of a CPU that only runs a periodic RT task is close to LOAD_AVG_MAX, whatever the running duration of the RT task is.

A new idle_exit function is called when the prev task is the idle function, so the elapsed time will be accounted as idle time in the rq's load.

Changes since V5:
- Rename idle_enter/exit function to idle_enter/exit_fair

Changes since V4:
- Rebase on v3.9-rc6 instead of Steven Rostedt's patches
- Create the post_schedule_idle function that was previously created by Steven's patches

Changes since V3:
- Remove dependency with CONFIG_FAIR_GROUP_SCHED
- Add a new idle_enter function and create a post_schedule callback for idle class
- Remove the update_runnable_avg from idle_balance

Changes since V2:
- remove useless definition for UP platform
- rebased on top of Steven Rostedt's patches: https://lkml.org/lkml/2013/2/12/558

Changes since V1:
- move code out of schedule function and create a pre_schedule callback for idle class instead.
Signed-off-by: Vincent Guittot <vincent.guittot(a)linaro.org>
---
 kernel/sched/fair.c      |   23 +++++++++++++++++++++--
 kernel/sched/idle_task.c |   16 ++++++++++++++++
 kernel/sched/sched.h     |   12 ++++++++++++
 3 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7a33e59..1de3df0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1562,6 +1562,27 @@ static inline void dequeue_entity_load_avg(struct cfs_rq *cfs_rq,
 		se->avg.decay_count = atomic64_read(&cfs_rq->decay_counter);
 	} /* migrations, e.g. sleep=0 leave decay_count == 0 */
 }
+
+/*
+ * Update the rq's load with the elapsed running time before entering
+ * idle. if the last scheduled task is not a CFS task, idle_enter will
+ * be the only way to update the runnable statistic.
+ */
+void idle_enter_fair(struct rq *this_rq)
+{
+	update_rq_runnable_avg(this_rq, 1);
+}
+
+/*
+ * Update the rq's load with the elapsed idle time before a task is
+ * scheduled. if the newly scheduled task is not a CFS task, idle_exit will
+ * be the only way to update the runnable statistic.
+ */
+void idle_exit_fair(struct rq *this_rq)
+{
+	update_rq_runnable_avg(this_rq, 0);
+}
+
 #else
 static inline void update_entity_load_avg(struct sched_entity *se,
 					  int update_cfs_rq) {}
@@ -5219,8 +5240,6 @@ void idle_balance(int this_cpu, struct rq *this_rq)
 	if (this_rq->avg_idle < sysctl_sched_migration_cost)
 		return;
 
-	update_rq_runnable_avg(this_rq, 1);
-
 	/*
 	 * Drop the rq->lock, but keep IRQ/preempt disabled.
 	 */
diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
index b6baf37..b8ce773 100644
--- a/kernel/sched/idle_task.c
+++ b/kernel/sched/idle_task.c
@@ -13,6 +13,16 @@ select_task_rq_idle(struct task_struct *p, int sd_flag, int flags)
 {
 	return task_cpu(p); /* IDLE tasks as never migrated */
 }
+
+static void pre_schedule_idle(struct rq *rq, struct task_struct *prev)
+{
+	idle_exit_fair(rq);
+}
+
+static void post_schedule_idle(struct rq *rq)
+{
+	idle_enter_fair(rq);
+}
 #endif /* CONFIG_SMP */
 /*
  * Idle tasks are unconditionally rescheduled:
@@ -25,6 +35,10 @@ static void check_preempt_curr_idle(struct rq *rq, struct task_struct *p, int fl
 static struct task_struct *pick_next_task_idle(struct rq *rq)
 {
 	schedstat_inc(rq, sched_goidle);
+#ifdef CONFIG_SMP
+	/* Trigger the post schedule to do an idle_enter for CFS */
+	rq->post_schedule = 1;
+#endif
 	return rq->idle;
 }
 
@@ -86,6 +100,8 @@ const struct sched_class idle_sched_class = {
 
 #ifdef CONFIG_SMP
 	.select_task_rq		= select_task_rq_idle,
+	.pre_schedule		= pre_schedule_idle,
+	.post_schedule		= post_schedule_idle,
 #endif
 
 	.set_curr_task          = set_curr_task_idle,
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index cc03cfd..8f1d80e 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -880,6 +880,18 @@ extern const struct sched_class idle_sched_class;
 extern void trigger_load_balance(struct rq *rq, int cpu);
 extern void idle_balance(int this_cpu, struct rq *this_rq);
 
+/*
+ * Only depends on SMP, FAIR_GROUP_SCHED may be removed when runnable_avg
+ * becomes useful in lb
+ */
+#if defined(CONFIG_FAIR_GROUP_SCHED)
+extern void idle_enter_fair(struct rq *this_rq);
+extern void idle_exit_fair(struct rq *this_rq);
+#else
+static inline void idle_enter_fair(struct rq *this_rq) {}
+static inline void idle_exit_fair(struct rq *this_rq) {}
+#endif
+
 #else	/* CONFIG_SMP */
 
 static inline void idle_balance(int cpu, struct rq *rq)
-- 
1.7.9.5
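The accounting problem and its fix can be illustrated with a deliberately simplified user-space model. This is a sketch under assumptions: the `model_*` names are hypothetical, timestamps are passed in explicitly, and plain sums replace the geometrically decayed averages the kernel actually maintains. Each update charges the time elapsed since the previous update as either running or idle, which is why an update is needed on both edges of an idle period.

```c
#include <assert.h>

/* Hypothetical, undecayed model of a runqueue's runnable statistics. */
struct model_rq_avg {
	unsigned long last_update;  /* time of the previous update */
	unsigned long runnable_sum; /* time accounted as running */
	unsigned long period_sum;   /* total time accounted */
};

/* Charge the time since the last update as running or as idle. */
void model_update(struct model_rq_avg *rq, unsigned long now, int runnable)
{
	unsigned long delta;

	assert(now >= rq->last_update);
	delta = now - rq->last_update;
	if (runnable)
		rq->runnable_sum += delta;
	rq->period_sum += delta;
	rq->last_update = now;
}

/* About to go idle: the elapsed time was running time. */
void model_idle_enter(struct model_rq_avg *rq, unsigned long now)
{
	model_update(rq, now, 1);
}

/* Leaving idle: the elapsed time was idle time. */
void model_idle_exit(struct model_rq_avg *rq, unsigned long now)
{
	model_update(rq, now, 0);
}
```

For a periodic task that runs 1 tick out of every 10, calling both hooks yields a runnable share of 10%. Calling only model_idle_enter(), as happened for RT tasks before the patch, charges the whole period as running time, which corresponds to the load sitting close to LOAD_AVG_MAX.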

12 years, 3 months

[PATCH v4] sched: fix wrong rq's runnable_avg update with rt tasks

by Vincent Guittot

The current update of the rq's load can be erroneous when RT tasks are involved.

The update of the load of a rq that becomes idle is done only if avg_idle is not less than sysctl_sched_migration_cost. If RT tasks and short idle durations alternate, the runnable_avg will not be updated correctly and the time will be accounted as idle time when a CFS task wakes up.

A new idle_enter function is called when the next task is the idle function, so the elapsed time will be accounted as run time in the load of the rq, whatever the average idle time is. The function update_rq_runnable_avg is removed from idle_balance.

When a RT task is scheduled on an idle CPU, the update of the rq's load is not done when the rq exits the idle state because CFS's functions are not called. Then idle_balance, which is called just before entering the idle function, updates the rq's load and makes the assumption that the elapsed time since the last update was only running time. As a consequence, the rq's load of a CPU that only runs a periodic RT task is close to LOAD_AVG_MAX, whatever the running duration of the RT task is.

A new idle_exit function is called when the prev task is the idle function, so the elapsed time will be accounted as idle time in the rq's load.

Changes since V3:
- Remove dependency with CONFIG_FAIR_GROUP_SCHED
- Add a new idle_enter function and create a post_schedule callback for idle class
- Remove the update_runnable_avg from idle_balance

Changes since V2:
- remove useless definition for UP platform
- rebased on top of Steven Rostedt's patches: https://lkml.org/lkml/2013/2/12/558

Changes since V1:
- move code out of schedule function and create a pre_schedule callback for idle class instead.
Signed-off-by: Vincent Guittot <vincent.guittot(a)linaro.org>
---
 kernel/sched/fair.c      |   23 +++++++++++++++++++++--
 kernel/sched/idle_task.c |   10 ++++++++++
 kernel/sched/sched.h     |   12 ++++++++++++
 3 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0fcdbff..1851ca8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1562,6 +1562,27 @@ static inline void dequeue_entity_load_avg(struct cfs_rq *cfs_rq,
 		se->avg.decay_count = atomic64_read(&cfs_rq->decay_counter);
 	} /* migrations, e.g. sleep=0 leave decay_count == 0 */
 }
+
+/*
+ * Update the rq's load with the elapsed running time before entering
+ * idle. if the last scheduled task is not a CFS task, idle_enter will
+ * be the only way to update the runnable statistic.
+ */
+void idle_enter(struct rq *this_rq)
+{
+	update_rq_runnable_avg(this_rq, 1);
+}
+
+/*
+ * Update the rq's load with the elapsed idle time before a task is
+ * scheduled. if the newly scheduled task is not a CFS task, idle_exit will
+ * be the only way to update the runnable statistic.
+ */
+void idle_exit(struct rq *this_rq)
+{
+	update_rq_runnable_avg(this_rq, 0);
+}
+
 #else
 static inline void update_entity_load_avg(struct sched_entity *se,
 					  int update_cfs_rq) {}
@@ -5219,8 +5240,6 @@ void idle_balance(int this_cpu, struct rq *this_rq)
 	if (this_rq->avg_idle < sysctl_sched_migration_cost)
 		return;
 
-	update_rq_runnable_avg(this_rq, 1);
-
 	/*
 	 * Drop the rq->lock, but keep preempt disabled.
 	 */
diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
index 66b5220..0775261 100644
--- a/kernel/sched/idle_task.c
+++ b/kernel/sched/idle_task.c
@@ -14,8 +14,17 @@ select_task_rq_idle(struct task_struct *p, int sd_flag, int flags)
 	return task_cpu(p); /* IDLE tasks as never migrated */
 }
 
+static void pre_schedule_idle(struct rq *rq, struct task_struct *prev)
+{
+	/* Update rq's load with elapsed idle time */
+	idle_exit(rq);
+}
+
 static void post_schedule_idle(struct rq *rq)
 {
+	/* Update rq's load with elapsed running time */
+	idle_enter(rq);
+
 	idle_balance(smp_processor_id(), rq);
 }
 #endif /* CONFIG_SMP */
@@ -95,6 +104,7 @@ const struct sched_class idle_sched_class = {
 
 #ifdef CONFIG_SMP
 	.select_task_rq		= select_task_rq_idle,
+	.pre_schedule		= pre_schedule_idle,
 	.post_schedule		= post_schedule_idle,
 #endif
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index fc88644..ff4b029 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -878,6 +878,18 @@ extern const struct sched_class idle_sched_class;
 extern void trigger_load_balance(struct rq *rq, int cpu);
 extern void idle_balance(int this_cpu, struct rq *this_rq);
 
+/*
+ * Only depends on SMP, FAIR_GROUP_SCHED may be removed when runnable_avg
+ * becomes useful in lb
+ */
+#if defined(CONFIG_FAIR_GROUP_SCHED)
+extern void idle_enter(struct rq *this_rq);
+extern void idle_exit(struct rq *this_rq);
+#else
+static inline void idle_enter(struct rq *this_rq) {}
+static inline void idle_exit(struct rq *this_rq) {}
+#endif
+
 #else	/* CONFIG_SMP */
 
 static inline void idle_balance(int cpu, struct rq *rq)
-- 
1.7.9.5

12 years, 3 months

[PATCH v5] sched: fix wrong rq's runnable_avg update with rt tasks

by Vincent Guittot

The current update of the rq's load can be erroneous when RT tasks are involved.

The update of the load of a rq that becomes idle is done only if avg_idle is not less than sysctl_sched_migration_cost. If RT tasks and short idle durations alternate, the runnable_avg will not be updated correctly and the time will be accounted as idle time when a CFS task wakes up.

A new idle_enter function is called when the next task is the idle function, so the elapsed time will be accounted as run time in the load of the rq, whatever the average idle time is. The function update_rq_runnable_avg is removed from idle_balance.

When a RT task is scheduled on an idle CPU, the update of the rq's load is not done when the rq exits the idle state because CFS's functions are not called. Then idle_balance, which is called just before entering the idle function, updates the rq's load and makes the assumption that the elapsed time since the last update was only running time. As a consequence, the rq's load of a CPU that only runs a periodic RT task is close to LOAD_AVG_MAX, whatever the running duration of the RT task is.

A new idle_exit function is called when the prev task is the idle function, so the elapsed time will be accounted as idle time in the rq's load.

Changes since V4:
- Rebase on v3.9-rc6 instead of Steven Rostedt's patches
- Create the post_schedule_idle function that was previously created by Steven's patches

Changes since V3:
- Remove dependency with CONFIG_FAIR_GROUP_SCHED
- Add a new idle_enter function and create a post_schedule callback for idle class
- Remove the update_runnable_avg from idle_balance

Changes since V2:
- remove useless definition for UP platform
- rebased on top of Steven Rostedt's patches: https://lkml.org/lkml/2013/2/12/558

Changes since V1:
- move code out of schedule function and create a pre_schedule callback for idle class instead.
Signed-off-by: Vincent Guittot <vincent.guittot(a)linaro.org>
---
 kernel/sched/fair.c      |   23 +++++++++++++++++++++--
 kernel/sched/idle_task.c |   16 ++++++++++++++++
 kernel/sched/sched.h     |   12 ++++++++++++
 3 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7a33e59..653edd8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1562,6 +1562,27 @@ static inline void dequeue_entity_load_avg(struct cfs_rq *cfs_rq,
 		se->avg.decay_count = atomic64_read(&cfs_rq->decay_counter);
 	} /* migrations, e.g. sleep=0 leave decay_count == 0 */
 }
+
+/*
+ * Update the rq's load with the elapsed running time before entering
+ * idle. if the last scheduled task is not a CFS task, idle_enter will
+ * be the only way to update the runnable statistic.
+ */
+void idle_enter(struct rq *this_rq)
+{
+	update_rq_runnable_avg(this_rq, 1);
+}
+
+/*
+ * Update the rq's load with the elapsed idle time before a task is
+ * scheduled. if the newly scheduled task is not a CFS task, idle_exit will
+ * be the only way to update the runnable statistic.
+ */
+void idle_exit(struct rq *this_rq)
+{
+	update_rq_runnable_avg(this_rq, 0);
+}
+
 #else
 static inline void update_entity_load_avg(struct sched_entity *se,
 					  int update_cfs_rq) {}
@@ -5219,8 +5240,6 @@ void idle_balance(int this_cpu, struct rq *this_rq)
 	if (this_rq->avg_idle < sysctl_sched_migration_cost)
 		return;
 
-	update_rq_runnable_avg(this_rq, 1);
-
 	/*
 	 * Drop the rq->lock, but keep IRQ/preempt disabled.
 	 */
diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
index b6baf37..cef61fa 100644
--- a/kernel/sched/idle_task.c
+++ b/kernel/sched/idle_task.c
@@ -13,6 +13,16 @@ select_task_rq_idle(struct task_struct *p, int sd_flag, int flags)
 {
 	return task_cpu(p); /* IDLE tasks as never migrated */
 }
+
+static void pre_schedule_idle(struct rq *rq, struct task_struct *prev)
+{
+	idle_exit(rq);
+}
+
+static void post_schedule_idle(struct rq *rq)
+{
+	idle_enter(rq);
+}
 #endif /* CONFIG_SMP */
 /*
  * Idle tasks are unconditionally rescheduled:
@@ -25,6 +35,10 @@ static void check_preempt_curr_idle(struct rq *rq, struct task_struct *p, int fl
 static struct task_struct *pick_next_task_idle(struct rq *rq)
 {
 	schedstat_inc(rq, sched_goidle);
+#ifdef CONFIG_SMP
+	/* Trigger the post schedule to do an idle_enter for CFS */
+	rq->post_schedule = 1;
+#endif
 	return rq->idle;
 }
 
@@ -86,6 +100,8 @@ const struct sched_class idle_sched_class = {
 
 #ifdef CONFIG_SMP
 	.select_task_rq = select_task_rq_idle,
+	.pre_schedule = pre_schedule_idle,
+	.post_schedule = post_schedule_idle,
 #endif
 
 	.set_curr_task = set_curr_task_idle,
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index cc03cfd..2b826f2 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -880,6 +880,18 @@ extern const struct sched_class idle_sched_class;
 extern void trigger_load_balance(struct rq *rq, int cpu);
 extern void idle_balance(int this_cpu, struct rq *this_rq);
 
+/*
+ * Only depends on SMP, FAIR_GROUP_SCHED may be removed when runnable_avg
+ * becomes useful in lb
+ */
+#if defined(CONFIG_FAIR_GROUP_SCHED)
+extern void idle_enter(struct rq *this_rq);
+extern void idle_exit(struct rq *this_rq);
+#else
+static inline void idle_enter(struct rq *this_rq) {}
+static inline void idle_exit(struct rq *this_rq) {}
+#endif
+
 #else /* CONFIG_SMP */
 
 static inline void idle_balance(int cpu, struct rq *rq)
-- 
1.7.9.5
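
The accounting problem described in the changelog above can be illustrated with a toy model in user-space C. This is only a sketch, not kernel code: toy_avg, toy_update and toy_simulate are hypothetical names, time is counted in whole milliseconds, the decay factor is picked so that a contribution roughly halves after 32 ms (approximating the kernel's y^32 ~= 0.5), and the pre-patch update before entering idle is assumed to run every time (the avg_idle check is ignored).

```c
/* Toy stand-in for the rq's runnable average; not the kernel's structure. */
struct toy_avg {
	double runnable_sum;		/* decayed time accounted as running */
	double period_sum;		/* decayed total elapsed time */
	unsigned long last_update_ms;	/* time of the previous update */
};

/* Per-millisecond decay factor, chosen so that TOY_DECAY^32 is about 0.5. */
static const double TOY_DECAY = 0.978572;

/*
 * Account all the time elapsed since the last update either as running
 * (runnable = 1) or as idle (runnable = 0), one millisecond at a time.
 */
static void toy_update(struct toy_avg *avg, unsigned long now_ms, int runnable)
{
	unsigned long ms;

	for (ms = avg->last_update_ms; ms < now_ms; ms++) {
		avg->runnable_sum = avg->runnable_sum * TOY_DECAY +
				    (runnable ? 1.0 : 0.0);
		avg->period_sum = avg->period_sum * TOY_DECAY + 1.0;
	}
	avg->last_update_ms = now_ms;
}

/*
 * Simulate a periodic RT task that runs run_ms out of every period_ms on an
 * otherwise idle CPU, and return the resulting runnable ratio in [0, 1].
 *
 * with_idle_exit = 0: only the update before entering idle is done, so the
 *                     whole elapsed time is assumed to be running time.
 * with_idle_exit = 1: an extra update at idle exit accounts the idle time.
 */
static double toy_simulate(unsigned long run_ms, unsigned long period_ms,
			   int cycles, int with_idle_exit)
{
	struct toy_avg avg = { 0.0, 0.0, 0 };
	unsigned long now = 0;
	int i;

	for (i = 0; i < cycles; i++) {
		/* The CPU was idle until the RT task wakes up. */
		now += period_ms - run_ms;
		if (with_idle_exit)
			toy_update(&avg, now, 0);	/* like idle_exit */

		/* The RT task runs, then the CPU goes back to idle. */
		now += run_ms;
		toy_update(&avg, now, 1);		/* like idle_enter */
	}

	return avg.period_sum > 0.0 ? avg.runnable_sum / avg.period_sum : 0.0;
}
```

Under these assumptions, toy_simulate(1, 10, 100, 0) stays at 1.0 because every elapsed millisecond is counted as running, which mirrors the "close to LOAD_AVG_MAX" symptom. With the extra update at idle exit, toy_simulate(1, 10, 100, 1) settles at roughly 0.1, in line with the task's 10% duty cycle.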

12 years, 3 months

bl_image symbol in big.LITTLE switcher code

by Prashant B

Hi,

I was going through the b.L switcher code and found a call to
enter_nonsecure_world() with the parameter "bl_image". Obviously it must be
the address of the function that initializes the switcher functionality, but
I couldn't find any other reference to this symbol in the switcher code.
Can somebody please explain this?

Thanks.
-Prashant

12 years, 3 months

[ACTIVITY] 2013-03-30 - 2013-04-05

by David Long

=== David Long ===

=== Highlights ===
* Responded to QA requests for input on testing requirements for uprobes and kprobes.
* Spent some time coming up to speed on systemtap.
* Still working on a clean way to disentangle uprobe and kprobe code without unnecessary duplication.

=== Plans ===
* Restructure code
* Start building systemtap

=== Issues ===
* Apparently we have a complaint from a TSC member that kprobes does not work, yet v3.8 passes the kernel-built-in tests and, when exercised manually, kprobes seems to work. We need more specific information about the problems seen.

=== Travel/Time Off ===

-dl

12 years, 3 months
