Hello,
I'm pleased to announce that we have pushed a very early version of some of the key features we intend to release as EAS 1.2 this year to Google's msm repository (https://android.googlesource.com/kernel/msm.git/), as the branch android-msm-marlin-3.18-nougat-mr1-eas-experimental.
EAS 1.2 is intended to be the next iteration of EAS for AOSP. It includes improvements to the wake-up path to better support big.LITTLE, trials of other upstream scheduler enhancements such as schedutil, and some important load/utilisation tracking enhancements to PELT.
Although EAS 1.2 will primarily focus on a 4.4-based kernel, we are making this experimental branch available on the 3.18-based Pixel kernel (marlin_defconfig) so that we have a readily available real platform with an optimised userspace to experiment on.
There are some differences in the scheduler task wake-up path between this release and the one shipping in the Pixel kernel, which should be taken into account when using this kernel.
The most visible change in the wake-up path is the removal of the is_big_little sysctl. Wake-up now uses a single cpu selection algorithm (the one previously used for !is_big_little), modified to remove the assumption that the highest-capacity cpus have the highest logical cpu numbers. For tasks belonging to a schedtune group with some boost applied, the max-capacity cpus are now selected independently of the cpu numbering. This changes the order in which cpus are iterated when looking for a place to run these tasks from [3,2],[1,0] to [2,3],[0,1], which has an impact on runtime configuration. Leaving that configuration unchanged is likely to have only a small impact on lightly loaded systems, where there will usually be two idle high-capacity cpus, but the cpuset configuration should nevertheless be matched to the new selection order to restore the assumptions made when tuning.
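As a rough illustration of the ordering change, the C sketch below models a 4-cpu, two-cluster system shaped like Pixel. It is not the kernel code: first_max_cap_cpu(), boosted_order() and legacy_order() are hypothetical helpers, the per-cpu cluster ids are an assumed input, and the capacity values are illustrative only. It picks the lowest-numbered cpu with the maximum original capacity without relying on cpu numbering (in the spirit of the "Add first cpu w/ max/min orig capacity to root domain" patch listed below), then visits clusters starting with the one holding that cpu, ascending within each cluster.

```c
#include <assert.h>

/* Return the lowest-numbered cpu with the largest original capacity,
 * without assuming that the biggest cpus have the highest ids. */
static int first_max_cap_cpu(const unsigned long cap[], int nr_cpus)
{
	int best = 0;

	assert(nr_cpus > 0);
	for (int cpu = 1; cpu < nr_cpus; cpu++)
		if (cap[cpu] > cap[best]) /* strict '>' keeps the lowest id on ties */
			best = cpu;
	return best;
}

/* Old behaviour for boosted tasks (hypothetical model): scan from the
 * highest cpu id downwards, which only reaches the big cpus first if
 * they happen to have the highest ids. */
static void legacy_order(int nr_cpus, int order[])
{
	for (int i = 0; i < nr_cpus; i++)
		order[i] = nr_cpus - 1 - i;
}

/* New order (hypothetical model): visit clusters circularly, starting with
 * the cluster that holds start_cpu; within a cluster, visit cpus in
 * ascending id order. cluster[cpu] is an assumed per-cpu cluster id in
 * the range 0..nr_clusters-1. */
static void boosted_order(const int cluster[], int nr_cpus, int nr_clusters,
			  int start_cpu, int order[])
{
	int n = 0;

	for (int k = 0; k < nr_clusters; k++) {
		int c = (cluster[start_cpu] + k) % nr_clusters;

		for (int cpu = 0; cpu < nr_cpus; cpu++)
			if (cluster[cpu] == c)
				order[n++] = cpu;
	}
	assert(n == nr_cpus); /* every cpu visited exactly once */
}
```

With illustrative capacities {512, 512, 1024, 1024} and clusters {0, 0, 1, 1}, first_max_cap_cpu() returns 2 and boosted_order() yields 2,3,0,1, while legacy_order() yields 3,2,1,0. If the big cpus were numbered 0-1 instead, first_max_cap_cpu() would return 0, which is the point of dropping the numbering assumption.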
On Pixel, cpusets are arranged so that one of the highest-capacity cpus is available only to tasks belonging to the ‘top-app’ cpuset. Combined with the cpu iteration order used for schedtune-boosted tasks, this is intended to make it more likely that these tasks find an idle cpu to wake on. Because the iteration order has changed, the cpu reserved for top-app should now be the lowest-numbered high-capacity cpu (cpu 2 on Pixel). If this is not changed, the impact is likely to be small for most light use cases. This is configured in the init scripts.
The usual group setup for Pixel is in init.sailfish.rc; the part that configures the cpusets for the tuning groups normally reads:
on property:sys.boot_completed=1
    write /proc/sys/kernel/sched_boost 0
    # update cpusets now that boot is complete and we want better load balancing
    write /dev/cpuset/top-app/cpus 0-3
    write /dev/cpuset/foreground/boost/cpus 0-2
    write /dev/cpuset/foreground/cpus 0-2
    write /dev/cpuset/background/cpus 0
    write /dev/cpuset/system-background/cpus 0-2
Since we want cpu 2 to be the one available only to tasks in the top-app group, we exclude cpu 2 from the other groups:
on property:sys.boot_completed=1
    write /proc/sys/kernel/sched_boost 0
    # update cpusets now that boot is complete and we want better load balancing
    write /dev/cpuset/top-app/cpus 0-3
    write /dev/cpuset/foreground/boost/cpus 0-1,3
    write /dev/cpuset/foreground/cpus 0-1,3
    write /dev/cpuset/background/cpus 0
    write /dev/cpuset/system-background/cpus 0-1,3
We normally do this at run time in a root shell rather than modifying the init scripts.
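To check which cpu ends up reserved for top-app under each configuration, the C sketch below parses the cpu lists written to the cpusets above and computes the cpus that only top-app may use. It is a hypothetical illustration, not an Android or kernel API: parse_cpulist(), top_app_exclusive() and the cpus_t type are invented here, and it assumes fewer than 32 cpus so a plain bitmask suffices (the kernel itself uses struct cpumask).

```c
#include <assert.h>
#include <stdlib.h>

/* Bit n set means cpu n is allowed. Assumes fewer than 32 cpus. */
typedef unsigned int cpus_t;

/* Parse a cpuset cpu list such as "0-1,3" into a bitmask.
 * Returns 0 for malformed input. */
static cpus_t parse_cpulist(const char *s)
{
	cpus_t mask = 0;

	assert(s != NULL);
	while (*s != '\0') {
		char *end;
		long lo = strtol(s, &end, 10), hi;

		if (end == s)
			return 0;            /* no number where one was expected */
		hi = lo;
		s = end;
		if (*s == '-') {             /* range such as "0-2" */
			hi = strtol(s + 1, &end, 10);
			if (end == s + 1)
				return 0;
			s = end;
		}
		if (lo < 0 || hi < lo || hi >= 32)
			return 0;
		for (long cpu = lo; cpu <= hi; cpu++)
			mask |= 1u << cpu;
		if (*s == ',') {
			if (*++s == '\0')
				return 0;    /* trailing comma */
		} else if (*s != '\0') {
			return 0;            /* unexpected character */
		}
	}
	return mask;
}

/* Cpus that only top-app may run on: in top-app but in no other group. */
static cpus_t top_app_exclusive(cpus_t top_app, const cpus_t others[],
				int nr_others)
{
	cpus_t shared = 0;

	for (int i = 0; i < nr_others; i++)
		shared |= others[i];
	return top_app & ~shared;
}
```

Feeding in foreground/boost, foreground, background and system-background as the other groups, the original settings leave cpu 3 exclusive to top-app, which the new boosted iteration order visits second; the modified settings leave cpu 2 exclusive, the first cpu visited.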
The schedutil governor is present but is not selected as the default cpufreq governor. Note that the up and down frequency-selection throttling has a slightly different meaning for the 'sched' governor (sched-dvfs) and for 'schedutil'. The 'sched' governor measures time since the last *frequency change*, whereas the 'schedutil' governor measures time since the last *utilisation request*. This means that, when comparing schedutil with sched-dvfs, we need to use shorter throttle periods for schedutil to avoid staying at maximum frequency for long periods in UI-driven workloads.
We have been experimenting with up_rate_limit_usec set to 500 and down_rate_limit_usec set to 2000 or 5000 (both in microseconds), which appears to give results comparable with those of the 'sched' governor.
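The effect of the two throttling definitions can be modelled with a small simulation. This is a hypothetical C model of the behaviour described above, not the sched-dvfs or schedutil source: struct throttle, throttle_request() and first_drop_us() are invented for illustration, the request timestamps are made up, and the 5000 us window used for the sched-dvfs-style case is arbitrary rather than the Pixel tuning. The frequency starts at maximum and every later request asks for a lower one; the model reports when the drop is first allowed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical up-throttle window (us), matching the value quoted above;
 * it only matters for upward requests, which the trace below lacks. */
#define UP_RATE_LIMIT_US 500ULL

/* Hypothetical throttle state; not the kernel's governor structures. */
struct throttle {
	unsigned long long ref_us; /* time the throttle window is measured from */
	unsigned int freq;         /* currently selected frequency (arbitrary units) */
	bool reset_on_request;     /* true: window restarts on every request
	                              (schedutil, as described above);
	                              false: only on a frequency change (sched-dvfs) */
};

/* Handle one utilisation-driven request for frequency 'want' at now_us. */
static void throttle_request(struct throttle *t, unsigned long long now_us,
			     unsigned int want, unsigned long long down_us)
{
	assert(now_us >= t->ref_us); /* requests arrive in time order */

	unsigned long long delta = now_us - t->ref_us;
	bool blocked = (want > t->freq && delta < UP_RATE_LIMIT_US) ||
		       (want < t->freq && delta < down_us);

	if (!blocked && want != t->freq) {
		t->freq = want;
		t->ref_us = now_us;  /* both schemes restart the window on a change */
	} else if (t->reset_on_request) {
		t->ref_us = now_us;  /* schedutil-style: any request restarts it */
	}
}

/* Start at maximum frequency at time 0, replay requests for a lower
 * frequency at the given times, and return the time of the first drop,
 * or -1 if the frequency never drops. */
static long long first_drop_us(bool reset_on_request, unsigned long long down_us,
			       const unsigned long long times[], int n)
{
	const unsigned int max_freq = 100, low_freq = 30; /* arbitrary units */
	struct throttle t = { 0, max_freq, reset_on_request };

	for (int i = 0; i < n; i++) {
		throttle_request(&t, times[i], low_freq, down_us);
		if (t.freq != max_freq)
			return (long long)times[i];
	}
	return -1;
}
```

For the hypothetical trace {1000, 2000, 3000, 6000, 7000, 13000} us, a 5000 us window measured from the last frequency change allows the drop at 6000 us. The same window restarted on every request delays it to 13000 us, because no gap between requests reaches 5000 us until then, while shortening that window to 2000 us brings the drop back to 6000 us.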
The branch is based upon the mr1 kernel release, and contains the patches shown at the end of this mail.
They cover six main areas of functionality:
* ec114ba...d2238c2 and 8646350...35ea67a: patches to reduce the delta between the msm kernel and the common kernel
* b055eba...d2e2970: introduce a backport of the upstream schedutil governor (it is not the default governor in marlin_defconfig)
* 7f7e79e...14531d4e: bring the energy-aware scheduling calculations into line with our mainline-focused implementation, and backport capacity-based scheduling to 3.18
* b75b728...407d2a7: integrate the current EAS 1.1 wake-up path with the mainline-focused wake-up path, and introduce a common algorithm implementing the alternate CPU search for schedtune-boosted tasks
* f966249...1ad6d08: backport some important upstream CFS fixes to 3.18. These fix some critical group accounting issues which reduced the suitability of PELT utilisation signals for Android
* 6ae4707: allow EAS to continue calculating energy on systems which end up with a single CPU in a sched domain
Best regards,
Chris
Amit Pundir (3):
      sched/walt: use do_div instead of division operator
      ANDROID: sched/walt: fix build failure if FAIR_GROUP_SCHED=n
      Revert "cgroup: Fix issues in allow_attach callback"

Brendan Jackman (2):
      DEBUG: sched/fair: Fix missing sched_load_avg_cpu events
      DEBUG: sched/fair: Fix sched_load_avg_cpu events for task_groups

Chris Redpath (17):
      Revert "WIP: UTIL_EST: use estimated utilization on load balancing paths"
      Revert "WIP: UTIL_EST: use estimated utilization on energy aware wakeup path"
      Revert "WIP: UTIL_EST: sched/fair: use estimated utilization to drive CPUFreq"
      Revert "WIP: UTIL_EST: switch to usage of tasks's estimated utilization"
      sched: revert UTIL_EST usage from commit 6bf72ca7f1
      Revert "WIP: UTIL_EST: sched/{core,fair}: add support to use estimated utilization"
      Revert "WIP: UTIL_EST: sched/fair: add support for estimated utilization"
      sched/fair: missing parts of 'optimize idle cpu selection for boosted tasks'
      sched/fair: Fix uninitialised variable in idle_balance
      Revert: UTIL_EST code from 'fix set_cfs_cpu_capacity when WALT is in use"
      Unify whitespace layout with android-3.18
      schedtune: Guarding against compile errors
      sched/walt: Drop arch-specific timer access
      Revert "DEBUG: UTIL_EST: sched: update tracepoint to report estimated CPU utilzation"
      sched: This kernel expects sched_cfs_boost to be signed
      schedutil: Fix linkage of schedutil and walt
      config: Update marlin_defconfig to include schedutil governor

Dietmar Eggemann (20):
      Revert "WIP: sched: Consider spare cpu capacity at task wake-up"
      Partial Revert: "WIP: sched: Add cpu capacity awareness to wakeup balancing"
      Experimental! arm64: Set SD_SHARE_CAP_STATES sched_domain flag on DIE level
      Experimental!: sched/fair: Do not force want_affine eq. true if EAS is enabled
      Experimental!: sched/fair: Decommission energy_aware_wake_cpu()
      Fixup!: sched/fair.c: Set SchedTune specific struct energy_env.task
      Experimental!: EAS: sched/fair: Re-integrate 'honor sync wakeups' into wakeup path
      Experimental!: sched/fair: Code !is_big_little path into select_energy_cpu_brute()
      Experimental!: sched: Remove sysctl_sched_is_big_little
      sched/core: Remove remnants of commit fd5c98da1a42
      Experimental!: sched/core: Add first cpu w/ max/min orig capacity to root domain
      Experimental!: sched/fair: Change cpu iteration order in find_best_target()
      sched/fair: Simplify backup_capacity handling in find_best_target()
      Fixup!: sched/fair: Simplify target_util handling in find_best_target()
      Fixup!: sched/fair: Simplify idle_idx handling in find_best_target()
      Fixup!: sched/fair: Refactor min_util, new_util in find_best_target()
      Fixup!: sched/fair: Simplify idle_idx handling in select_idle_sibling()
      Fixup!: Return first idle cpu for prefer_idle task immediately
      Fixup!: sched/fair: No need to 'and' current cpu w/ online mask in wakeup
      sched: EAS & 'single cpu per cluster'/cpu hotplug interoperability

Dmitry Shmidt (1):
      sched: Fix sysctl_sched_cfs_boost type to be int

Juri Lelli (3):
      sched/cpufreq: make schedutil use WALT signal
      trace/sched: add rq utilization signal for WALT
      sched/walt: kill {min,max}_capacity

Ke Wang (1):
      sched: tune: Fix lacking spinlock initialization

Morten Rasmussen (15):
      sched/core: Fix power to capacity renaming in comment
      sched/fair: Make the use of prev_cpu consistent in the wakeup path
      sched/fair: Optimize find_idlest_cpu() when there is no choice
      sched/core: Remove unnecessary NULL-pointer check
      sched/core: Introduce SD_ASYM_CPUCAPACITY sched_domain topology flag
      sched/core: Pass child domain into sd_init()
      sched/core: Enable SD_BALANCE_WAKE for asymmetric capacity systems
      sched/fair: Let asymmetric CPU configurations balance at wake-up
      sched/fair: Compute task/cpu utilization at wake-up correctly
      sched/fair: Consider spare capacity in find_idlest_group()
      sched/fair: Add per-CPU min capacity to sched_group_capacity
      sched/fair: Avoid pulling tasks from non-overloaded higher capacity groups
      sched/fair: Fix incorrect comment for capacity_margin
      Experimental!: sched/fair: Add energy_diff dead-zone margin
      Experimental!: sched/fair: Energy-aware wake-up task placement

Patrick Bellasi (3):
      FIXUP: sched/tune: update accouting before CPU capacity
      FIX: sched/tune: move schedtune_nornalize_energy into fair.c
      sched/tune: backport 'fix accounting for runnable tasks'

Peter Zijlstra (Intel) (3):
      sched/fair: Apply more PELT fixes
      sched/fair: Improve PELT stuff some more
      sched/fair: Fix effective_load() to consistently use smoothed load

Petr Mladek (1):
      kthread: allow to cancel kthread work

Srinath Sridharan (1):
      eas/sched/fair: Fixing comments in find_best_target.

Steve Muckle (5):
      sched/cpufreq: fix tunables for schedfreq governor
      sched: backport cpufreq hooks from 4.9-rc4
      sched: backport schedutil governor from 4.9-rc4
      sched: cpufreq: use rt_avg as estimate of required RT CPU capacity
      cpufreq: schedutil: add up/down frequency transition rate limits

Vincent Guittot (6):
      sched: factorize attach entity
      sched: factorize PELT update
      sched: fix hierarchical order in rq->leaf_cfs_rq_list
      sched: propagate load during synchronous attach/detach
      sched: propagate asynchrous detach
      sched: Multiple upstream load tracking changes

Viresh Kumar (1):
      cpufreq: schedutil: move slow path from workqueue to SCHED_FIFO task

Yuyang Du (1):
      sched/fair: Initiate a new task's util avg to a bounded value

kbuild test robot (2):
      ANDROID: sched/tune: __pcpu_scope_cpu_boost_groups can be static
      ANDROID: sched/tune: schedtune_allow_attach() can be static
 arch/arm64/configs/marlin_defconfig         |    2 +-
 arch/arm64/kernel/topology.c                |    7 +-
 drivers/cpufreq/Kconfig                     |   27 +
 drivers/cpufreq/Makefile                    |    2 +-
 drivers/cpufreq/cpufreq.c                   |   32 +
 drivers/cpufreq/cpufreq_governor_attr_set.c |   84 ++
 include/linux/cgroup.h                      |    2 +-
 include/linux/cpufreq.h                     |   49 ++
 include/linux/kthread.h                     |    4 +
 include/linux/sched.h                       |   20 +-
 include/linux/sched/sysctl.h                |    7 +-
 include/trace/events/sched.h                |   22 +-
 init/Kconfig                                |    1 +
 kernel/kthread.c                            |   96 +-
 kernel/sched/Makefile                       |    2 +
 kernel/sched/core.c                         |   84 +-
 kernel/sched/cpufreq.c                      |   63 ++
 kernel/sched/cpufreq_sched.c                |  220 ++---
 kernel/sched/cpufreq_schedutil.c            |  762 ++++++++++++++++
 kernel/sched/deadline.c                     |    3 +
 kernel/sched/debug.c                        |    4 -
 kernel/sched/fair.c                         | 1254 ++++++++++++++++++---------
 kernel/sched/features.h                     |    5 -
 kernel/sched/rt.c                           |    3 +
 kernel/sched/sched.h                        |   84 +-
 kernel/sched/tune.c                         |    5 +-
 kernel/sched/tune.h                         |    3 +
 kernel/sched/walt.c                         |   52 +-
 kernel/sysctl.c                             |    7 -
 29 files changed, 2261 insertions(+), 645 deletions(-)
 create mode 100644 drivers/cpufreq/cpufreq_governor_attr_set.c
 create mode 100644 kernel/sched/cpufreq.c
 create mode 100644 kernel/sched/cpufreq_schedutil.c