Good day,
First, I apologize for the information overload below. I just wanted to make sure to provided as much detail as possible.
I do kernel development as a hobby for XDA and other communities. For a given device that we support, we tend to follow the same sched changes as Google and will pull in changes from upstream. One of the devices we dev for is the OnePlus6 and 6T. It uses the same SD845 SOC as the P3. Long story short, I started to notice an issue that we did not have on our Oneplus5 Device (SD835 same as the Pixel 2). When boosting top-app/schedtune.boost to 50 for example, it would appear to boost both small and big cores. With a boost of 50, this means both clusters would be running at 1766MHz and above. But this behavior was inconsistent. I could reboot, and sometimes only the big cluster would appear to be affected by the stune boost but it would seem that after hours/days of up time, the issue would occur again and both clusters would be affected.
So I started kernel tracing trying to narrow down a culprit. One of the things I noticed between a trace that had the bug captured and a trace where the bug isn't occurring, is the boost margin assigned each cpu. I could tell this bug would occur when the boosted task could easily fit on a big core due to low utilization and margin assigned to them. Also, there was not any other noticeable reason I could discern. Below is an example of what I am referring to. Also note that I had a friend with a Pixel 3 confirm it had the problem as well before writing here and the logs/info below are taken from a P3.
Kernel version:
Linux version 4.9.124-gdff097c-ab5226088 (android-build@abfarm934) (Android clang version 5.0.1 (https://us3-mirror-android.googlesource.com/toolchain/clang 00e4a5a67eb7d626653c23780ff02367ead74955) (https://us3-mirror-android.googlesource.com/toolchain/llvm ef376ecb7d9c1460216126d102bb32fc5f73800d) (based on LLVM 5.0.1svn)) #0 SMP PREEMPT Fri Jan 11 21:28:03 UTC 2019
Extracted from PQ2A.190305.002 factory images
Events used: sched_boost_cpu sched:sched_find_best_target sched:sched_boost_task sched:sched_load_balance sched:sched_migrate_task sched:sched_switch sched:sched_wakeup_new sched:sched_waking sched:sched_task_util sched:sched_tune_boostgroup_update sched:sched_tune_tasks_update
Bug (no interaction): <idle>-0 [000] d.H9 12042.557751: sched_boost_cpu: cpu=0 util=22 margin=100 <idle>-0 [000] d.H9 12042.557756: sched_boost_cpu: cpu=1 util=26 margin=0 <idle>-0 [000] d.H9 12042.557759: sched_boost_cpu: cpu=2 util=0 margin=0 <idle>-0 [000] d.H9 12042.557761: sched_boost_cpu: cpu=3 util=4 margin=0 <idle>-0 [000] d.H9 12042.557764: sched_boost_cpu: cpu=4 util=0 margin=0 <idle>-0 [000] d.H9 12042.557767: sched_boost_cpu: cpu=5 util=0 margin=0 <idle>-0 [000] d.H9 12042.557770: sched_boost_cpu: cpu=6 util=0 margin=102 <idle>-0 [000] d.H9 12042.557772: sched_boost_cpu: cpu=7 util=0 margin=102
Bug during interactions: <...>-640 [000] d.H8 12044.737723: sched_boost_cpu: cpu=0 util=33 margin=495 <...>-640 [000] d.H8 12044.737725: sched_boost_cpu: cpu=1 util=22 margin=0 <...>-640 [000] d.H8 12044.737727: sched_boost_cpu: cpu=2 util=47 margin=0 <...>-640 [000] d.H8 12044.737728: sched_boost_cpu: cpu=3 util=12 margin=0 <...>-640 [000] d.H8 12044.737730: sched_boost_cpu: cpu=4 util=119 margin=0 <...>-640 [000] d.H8 12044.737732: sched_boost_cpu: cpu=5 util=83 margin=0 <...>-640 [000] d.H8 12044.737736: sched_boost_cpu: cpu=6 util=0 margin=512 <...>-640 [000] d.H8 12044.737738: sched_boost_cpu: cpu=7 util=2 margin=511
NO Bug during interactions: ndroid.settings-10354 [004] d.hb 117.480930: sched_boost_cpu: cpu=0 util=50 margin=0 ndroid.settings-10354 [004] d.hb 117.480932: sched_boost_cpu: cpu=1 util=184 margin=0 ndroid.settings-10354 [004] d.hb 117.480933: sched_boost_cpu: cpu=2 util=66 margin=0 ndroid.settings-10354 [004] d.hb 117.480934: sched_boost_cpu: cpu=3 util=46 margin=0 ndroid.settings-10354 [004] d.hb 117.480935: sched_boost_cpu: cpu=4 util=119 margin=452 ndroid.settings-10354 [004] d.hb 117.480936: sched_boost_cpu: cpu=5 util=2 margin=0 ndroid.settings-10354 [004] d.hb 117.480936: sched_boost_cpu: cpu=6 util=112 margin=0 ndroid.settings-10354 [004] d.hb 117.480937: sched_boost_cpu: cpu=7 util=7 margin=0
When you take a closer look at what is running on the small cores, I started to notice something else. With the event:
sched_tune_tasks_update
I noticed a pattern with tasks that had "boost=0 max_boost=50". When the bug is occurring, you would see small cores running tasks with "boost=0 max_boost=50" and when no bug, only big cores would have tasks with "boost=0 max_boost=50". I have not been able to track down why sometimes these tasks that received a boost as some point are running on small cores. I also cannot discern why this issue has been inconsistent. I wanted to give a heads up and would appreciate any assistance or information anyone here might have with regards to a possible solution. Thanks for the great work! I look forward to hearing back from you guys.
I have uploaded the a number of traces captured to my G Drive here:
https://drive.google.com/open?id=11YTlGezAMllmlVr9CtMNCz56CTfrJGBd
If there is a better method, please let me know.
Regards, Zachariah Kennedy
Hi Zachariah,
On 3/22/19 7:01 PM, Zachariah Kennedy wrote:
Good day,
First, I apologize for the information overload below. I just wanted to make sure to provided as much detail as possible.
I do kernel development as a hobby for XDA and other communities. For a given device that we support, we tend to follow the same sched changes as Google and will pull in changes from upstream. One of the devices we dev for is the OnePlus6 and 6T. It uses the same SD845 SOC as the P3. Long story short, I started to notice an issue that we did not have on our Oneplus5 Device (SD835 same as the Pixel 2). When boosting top-app/schedtune.boost to 50 for example, it would appear to boost both small and big cores. With a boost of 50, this means both clusters would be running at 1766MHz and above. But this behavior was inconsistent. I could reboot, and sometimes only the big cluster would appear to be affected by the stune boost but it would seem that after hours/days of up time, the issue would occur again and both clusters would be affected.
There is this additional boost feature of "boost policy" in the P3 device kernel (I'm on android-9.0.0_r0.66, commit dff097cd6c51 "Merge branch 'android-msm-bluecross-4.9-pi-qpr1' into android-msm-bluecross-4.9-pi-qpr2"). You won't find this in the v4.9 Android Common Kernel.
kernel/sched/sched.h:
1111 enum sched_boost_policy { 1112 SCHED_BOOST_NONE, 1113 SCHED_BOOST_ON_BIG, 1114 SCHED_BOOST_ON_ALL, 1115 };
sched_boost_policy() is also used in fair.c. I wonder if this has something to do with what you see.
I assume you haven't changed the kernel build nor the scheduler related runtime configuration:
Jut to make sure, can you share:
# zcat /proc/config.gz | grep "SCHED|CGROUP"
# for f in /proc/sys/kernel/sched_[^d]*; do echo -e "$f \c"; cat $f; done
In case you want to put extra trace_printks into the kernel stack and you use trace-cmd, you might want to revert commit f9b19a92ea7d "tracing: do not leak kernel addresses".
Best Regards,
-- Dietmar
Hi,
On Friday 22 Mar 2019 at 13:01:13 (-0500), Zachariah Kennedy wrote:
Good day,
First, I apologize for the information overload below. I just wanted to make sure to provided as much detail as possible.
I do kernel development as a hobby for XDA and other communities. For a given device that we support, we tend to follow the same sched changes as Google and will pull in changes from upstream. One of the devices we dev for is the OnePlus6 and 6T. It uses the same SD845 SOC as the P3. Long story short, I started to notice an issue that we did not have on our Oneplus5 Device (SD835 same as the Pixel 2). When boosting top-app/schedtune.boost to 50 for example, it would appear to boost both small and big cores. With a boost of 50, this means both clusters would be running at 1766MHz and above. But this behavior was inconsistent. I could reboot, and sometimes only the big cluster would appear to be affected by the stune boost but it would seem that after hours/days of up time, the issue would occur again and both clusters would be affected.
So I started kernel tracing trying to narrow down a culprit. One of the things I noticed between a trace that had the bug captured and a trace where the bug isn't occurring, is the boost margin assigned each cpu. I could tell this bug would occur when the boosted task could easily fit on a big core due to low utilization and margin assigned to them. Also, there was not any other noticeable reason I could discern. Below is an example of what I am referring to. Also note that I had a friend with a Pixel 3 confirm it had the problem as well before writing here and the logs/info below are taken from a P3.
Kernel version:
Linux version 4.9.124-gdff097c-ab5226088 (android-build@abfarm934) (Android clang version 5.0.1 (https://us3-mirror-android.googlesource.com/toolchain/clang 00e4a5a67eb7d626653c23780ff02367ead74955) (https://us3-mirror-android.googlesource.com/toolchain/llvm ef376ecb7d9c1460216126d102bb32fc5f73800d) (based on LLVM 5.0.1svn)) #0 SMP PREEMPT Fri Jan 11 21:28:03 UTC 2019
Extracted from PQ2A.190305.002 factory images
Events used: sched_boost_cpu sched:sched_find_best_target sched:sched_boost_task sched:sched_load_balance sched:sched_migrate_task sched:sched_switch sched:sched_wakeup_new sched:sched_waking sched:sched_task_util sched:sched_tune_boostgroup_update sched:sched_tune_tasks_update
Bug (no interaction): <idle>-0 [000] d.H9 12042.557751: sched_boost_cpu: cpu=0 util=22 margin=100 <idle>-0 [000] d.H9 12042.557756: sched_boost_cpu: cpu=1 util=26 margin=0 <idle>-0 [000] d.H9 12042.557759: sched_boost_cpu: cpu=2 util=0 margin=0 <idle>-0 [000] d.H9 12042.557761: sched_boost_cpu: cpu=3 util=4 margin=0 <idle>-0 [000] d.H9 12042.557764: sched_boost_cpu: cpu=4 util=0 margin=0 <idle>-0 [000] d.H9 12042.557767: sched_boost_cpu: cpu=5 util=0 margin=0 <idle>-0 [000] d.H9 12042.557770: sched_boost_cpu: cpu=6 util=0 margin=102 <idle>-0 [000] d.H9 12042.557772: sched_boost_cpu: cpu=7 util=0 margin=102
Bug during interactions: <...>-640 [000] d.H8 12044.737723: sched_boost_cpu: cpu=0 util=33 margin=495 <...>-640 [000] d.H8 12044.737725: sched_boost_cpu: cpu=1 util=22 margin=0 <...>-640 [000] d.H8 12044.737727: sched_boost_cpu: cpu=2 util=47 margin=0 <...>-640 [000] d.H8 12044.737728: sched_boost_cpu: cpu=3 util=12 margin=0 <...>-640 [000] d.H8 12044.737730: sched_boost_cpu: cpu=4 util=119 margin=0 <...>-640 [000] d.H8 12044.737732: sched_boost_cpu: cpu=5 util=83 margin=0 <...>-640 [000] d.H8 12044.737736: sched_boost_cpu: cpu=6 util=0 margin=512 <...>-640 [000] d.H8 12044.737738: sched_boost_cpu: cpu=7 util=2 margin=511
NO Bug during interactions: ndroid.settings-10354 [004] d.hb 117.480930: sched_boost_cpu: cpu=0 util=50 margin=0 ndroid.settings-10354 [004] d.hb 117.480932: sched_boost_cpu: cpu=1 util=184 margin=0 ndroid.settings-10354 [004] d.hb 117.480933: sched_boost_cpu: cpu=2 util=66 margin=0 ndroid.settings-10354 [004] d.hb 117.480934: sched_boost_cpu: cpu=3 util=46 margin=0 ndroid.settings-10354 [004] d.hb 117.480935: sched_boost_cpu: cpu=4 util=119 margin=452 ndroid.settings-10354 [004] d.hb 117.480936: sched_boost_cpu: cpu=5 util=2 margin=0 ndroid.settings-10354 [004] d.hb 117.480936: sched_boost_cpu: cpu=6 util=112 margin=0 ndroid.settings-10354 [004] d.hb 117.480937: sched_boost_cpu: cpu=7 util=7 margin=0
When you take a closer look at what is running on the small cores, I started to notice something else. With the event:
sched_tune_tasks_update
I noticed a pattern with tasks that had "boost=0 max_boost=50". When the bug is occurring, you would see small cores running tasks with "boost=0 max_boost=50" and when no bug, only big cores would have tasks with "boost=0 max_boost=50". I have not been able to track down why sometimes these tasks that received a boost as some point are running on small cores. I also cannot discern why this issue has been inconsistent. I wanted to give a heads up and would appreciate any assistance or information anyone here might have with regards to a possible solution. Thanks for the great work! I look forward to hearing back from you guys.
I have uploaded the a number of traces captured to my G Drive here:
https://drive.google.com/open?id=11YTlGezAMllmlVr9CtMNCz56CTfrJGBd
If there is a better method, please let me know.
Thanks for reporting all this :-)
The schedtune boost level is per cgroup, which means all tasks in the (say) top-app group get the same boost. In case of interactions (a user touching the screen), userspace raises the boost level temporarily to 50, and that applies to _all_ top-app tasks. Most of these tasks typically run on big cores, but this is not guaranteed. Top-app tasks can end up on little cores if there is no idle big CPU when they wake up for example -- we really want to start them ASAP after wake up, even if that means using a little CPU. This scenario is supposed to be infrequent, but certainly not impossible in practice.
So, if you have top-app tasks running on both clusters at any given moment, the boost level is applied to both clusters. It's not really a 'bug' I would say as this is sort of expected. I guess one thing you could try is check in which scenario do you get top-app tasks on the little side. Are there big CPUs idle when the task wakes-up ? If yes, then maybe there is indeed something that needs fixing ... Otherwise, this is basically a design choice :-)
I hope that makes sense.
Thanks, Quentin
First, thanks for getting back to me. I appreciate that you guys take the time to help the community like this.
My fiend who owns a Pixel 3 is gathering the info you requested and will reply back. In the meantime, I wanted to point out some findings. This is a snippet from "stune_trace-20190321_192157.log" that I previously provided that captured the problem I am seeing. See below:
Binder:1188_1D-8087 [007] d..2 12046.707148: sched_waking: comm=remote-ui pid=3751 prio=120 target_cpu=003 Binder:1188_1D-8087 [007] d..2 12046.707153: sched_boost_task: comm=remote-ui pid=3751 util=115 margin=454 Binder:1188_1D-8087 [007] d..2 12046.707154: sched_find_best_target: pid=3751 comm=remote-ui prefer_idle=1 start_cpu=4 best_idle=-1 best_active=-1 target=4 Binder:1188_1D-8087 [007] d..2 12046.707155: sched_task_util: pid=3751 comm=remote-ui util=115 prev_cpu=3 next_cpu=4 backup_cpu=-1 target_cpu=4 need_idle=0 fastpath=0 placement_boost=1 rtg_cpu=-1 latency=2344 Binder:1188_1D-8087 [007] d..2 12046.707155: sched_migrate_task: comm=remote-ui pid=3751 prio=120 load=115 orig_cpu=3 dest_cpu=4 Binder:1188_1D-8087 [007] d..4 12046.707164: sched_tune_tasks_update: pid=3751 comm=remote-ui cpu=4 tasks=2 idx=3 boost=50 max_boost=50 Binder:1188_1D-8087 [007] d.h9 12046.707177: sched_boost_cpu: cpu=0 util=26 margin=499 Binder:1188_1D-8087 [007] d.h9 12046.707178: sched_boost_cpu: cpu=1 util=102 margin=0 Binder:1188_1D-8087 [007] d.h9 12046.707179: sched_boost_cpu: cpu=2 util=71 margin=0 Binder:1188_1D-8087 [007] d.h9 12046.707179: sched_boost_cpu: cpu=3 util=32 margin=0 Binder:1188_1D-8087 [007] d.h9 12046.707180: sched_boost_cpu: cpu=4 util=255 margin=384 Binder:1188_1D-8087 [007] d.h9 12046.707180: sched_boost_cpu: cpu=5 util=122 margin=451 Binder:1188_1D-8087 [007] d.h9 12046.707181: sched_boost_cpu: cpu=6 util=158 margin=433 Binder:1188_1D-8087 [007] d.h9 12046.707181: sched_boost_cpu: cpu=7 util=235 margin=394 Binder:1188_1D-8087 [007] d..3 12046.707193: sched_tune_tasks_update: pid=8087 comm=Binder:1188_1D cpu=7 tasks=1 idx=1 boost=0 max_boost=50 Binder:1188_1D-8087 [007] d..2 12046.707195: sched_switch: prev_comm=Binder:1188_1D prev_pid=8087 prev_prio=120 prev_state=S ==> next_comm=RenderThread next_pid=2319 next_prio=120 <...>-2319 [007] d..3 12046.707208: sched_tune_tasks_update: pid=2319 comm=RenderThread cpu=7 tasks=0 idx=1 boost=0 max_boost=50 <...>-2319 [007] d..1 12046.707211: sched_load_balance: cpu=7 state=newly_idle balance=1 group=0x0 busy_nr=2 imbalance=0 flags=0x1 ld_moved=0 bal_int=16 <...>-2319 [007] d..1 12046.707213: sched_load_balance: cpu=7 state=newly_idle balance=1 group=0x0 busy_nr=0 imbalance=0 flags=0x0 ld_moved=0 bal_int=16 <...>-2319 [007] d..2 12046.707215: sched_switch: prev_comm=RenderThread prev_pid=2319 prev_prio=120 prev_state=S ==> next_comm=swapper/7 next_pid=0 next_prio=120 ndroid.systemui-2083 [006] d..4 12046.707224: sched_waking: comm=Binder:1188_1D pid=8087 prio=120 target_cpu=007 ndroid.systemui-2083 [006] d..4 12046.707227: sched_migrate_task: comm=Binder:1188_1D pid=8087 prio=120 load=15 orig_cpu=7 dest_cpu=6 ndroid.systemui-2083 [006] d..5 12046.707232: sched_boost_cpu: cpu=6 util=158 margin=433 ndroid.systemui-2083 [006] d..6 12046.707233: sched_tune_tasks_update: pid=8087 comm=Binder:1188_1D cpu=6 tasks=2 idx=1 boost=0 max_boost=50 ndroid.systemui-2083 [006] d..3 12046.707236: sched_tune_tasks_update: pid=2083 comm=ndroid.systemui cpu=6 tasks=1 idx=1 boost=0 max_boost=50 ndroid.systemui-2083 [006] d..2 12046.707238: sched_switch: prev_comm=ndroid.systemui prev_pid=2083 prev_prio=120 prev_state=S ==> next_comm=Binder:1188_1D next_pid=8087 next_prio=120 Binder:1188_1D-8087 [006] d..2 12046.707270: sched_waking: comm=ndroid.systemui pid=2083 prio=120 target_cpu=006 Binder:1188_1D-8087 [006] d..4 12046.707274: sched_tune_tasks_update: pid=2083 comm=ndroid.systemui cpu=6 tasks=2 idx=1 boost=0 max_boost=50 Binder:1188_1D-8087 [006] d..3 12046.707281: sched_tune_tasks_update: pid=8087 comm=Binder:1188_1D cpu=6 tasks=1 idx=1 boost=0 max_boost=50 Binder:1188_1D-8087 [006] d..2 12046.707283: sched_switch: prev_comm=Binder:1188_1D prev_pid=8087 prev_prio=120 prev_state=S ==> next_comm=ndroid.systemui next_pid=2083 next_prio=120 ndroid.systemui-2083 [006] d..4 12046.707299: sched_waking: comm=Binder:1188_1D pid=8087 prio=120 target_cpu=006 ndroid.systemui-2083 [006] d..6 12046.707302: sched_tune_tasks_update: pid=8087 comm=Binder:1188_1D cpu=6 tasks=2 idx=1 boost=0 max_boost=50 ndroid.systemui-2083 [006] d..3 12046.707304: sched_tune_tasks_update: pid=2083 comm=ndroid.systemui cpu=6 tasks=1 idx=1 boost=0 max_boost=50 ndroid.systemui-2083 [006] d..2 12046.707306: sched_switch: prev_comm=ndroid.systemui prev_pid=2083 prev_prio=120 prev_state=S ==> next_comm=Binder:1188_1D next_pid=8087 next_prio=120 Binder:1188_1D-8087 [006] d..2 12046.707326: sched_waking: comm=ndroid.systemui pid=2083 prio=120 target_cpu=006 Binder:1188_1D-8087 [006] d..4 12046.707330: sched_tune_tasks_update: pid=2083 comm=ndroid.systemui cpu=6 tasks=2 idx=1 boost=0 max_boost=50 Binder:1188_1D-8087 [006] d..3 12046.707336: sched_tune_tasks_update: pid=8087 comm=Binder:1188_1D cpu=6 tasks=1 idx=1 boost=0 max_boost=50 Binder:1188_1D-8087 [006] d..2 12046.707339: sched_switch: prev_comm=Binder:1188_1D prev_pid=8087 prev_prio=120 prev_state=S ==> next_comm=ndroid.systemui next_pid=2083 next_prio=120 surfaceflinger-606 [003] d..1 12046.707383: sched_waking: comm=sfEventThread pid=632 prio=97 target_cpu=002 android.anim-1355 [005] d..4 12046.707401: sched_waking: comm=Binder:606_5 pid=2524 prio=110 target_cpu=007 ndroid.systemui-2083 [006] d..4 12046.707401: sched_waking: comm=Binder:3120_4 pid=3197 prio=120 target_cpu=004 android.anim-1355 [005] d..4 12046.707404: sched_migrate_task: comm=Binder:606_5 pid=2524 prio=110 load=31 orig_cpu=7 dest_cpu=5 <idle>-0 [002] d..2 12046.707405: sched_switch: prev_comm=swapper/2 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=sfEventThread next_pid=632 next_prio=97 ndroid.systemui-2083 [006] d..4 12046.707406: sched_boost_task: comm=Binder:3120_4 pid=3197 util=24 margin=500 ndroid.systemui-2083 [006] d..4 12046.707407: sched_find_best_target: pid=3197 comm=Binder:3120_4 prefer_idle=1 start_cpu=4 best_idle=-1 best_active=-1 target=7 ndroid.systemui-2083 [006] d..4 12046.707407: sched_task_util: pid=3197 comm=Binder:3120_4 util=24 prev_cpu=4 next_cpu=7 backup_cpu=-1 target_cpu=7 need_idle=0 fastpath=0 placement_boost=1 rtg_cpu=-1 latency=2187 ndroid.systemui-2083 [006] d..4 12046.707408: sched_migrate_task: comm=Binder:3120_4 pid=3197 prio=120 load=24 orig_cpu=4 dest_cpu=7 android.anim-1355 [005] d..5 12046.707409: sched_boost_cpu: cpu=5 util=122 margin=451 android.anim-1355 [005] d..5 12046.707410: sched_boost_cpu: cpu=5 util=122 margin=451 android.anim-1355 [005] d..6 12046.707411: sched_tune_tasks_update: pid=2524 comm=Binder:606_5 cpu=5 tasks=2 idx=1 boost=0 max_boost=50 <...>-632 [002] d..2 12046.707412: sched_boost_cpu: cpu=2 util=71 margin=0 ndroid.systemui-2083 [006] d..6 12046.707413: sched_tune_tasks_update: pid=3197 comm=Binder:3120_4 cpu=7 tasks=2 idx=3 boost=50 max_boost=50 android.anim-1355 [005] d..3 12046.707414: sched_tune_tasks_update: pid=1355 comm=android.anim cpu=5 tasks=0 idx=3 boost=50 max_boost=50 android.anim-1355 [005] d..2 12046.707416: sched_switch: prev_comm=android.anim prev_pid=1355 prev_prio=110 prev_state=S ==> next_comm=Binder:606_5 next_pid=2524 next_prio=110 <...>-632 [002] d..1 12046.707418: sched_load_balance: cpu=2 state=newly_idle balance=1 group=0x0 busy_nr=0 imbalance=0 flags=0x0 ld_moved=0 bal_int=8 <idle>-0 [007] d..2 12046.707423: sched_switch: prev_comm=swapper/7 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=Binder:3120_4 next_pid=3197 next_prio=120 <...>-632 [002] d..3 12046.707426: sched_tune_tasks_update: pid=3751 comm=remote-ui cpu=4 tasks=1 idx=3 boost=50 max_boost=50 <...>-632 [002] d..3 12046.707428: sched_migrate_task: comm=remote-ui pid=3751 prio=120 load=115 orig_cpu=4 dest_cpu=2 <...>-632 [002] d..2 12046.707437: sched_boost_cpu: cpu=2 util=175 margin=0 <...>-632 [002] d..3 12046.707439: sched_tune_tasks_update: pid=3751 comm=remote-ui cpu=2 tasks=1 idx=3 boost=50 max_boost=0 <...>-632 [002] d..1 12046.707440: sched_load_balance: cpu=2 state=newly_idle balance=1 group=0xf0 busy_nr=1 imbalance=-31 flags=0x0 ld_moved=1 bal_int=8 <...>-632 [002] d..2 12046.707441: sched_boost_cpu: cpu=2 util=175 margin=424 <...>-632 [002] d..2 12046.707446: sched_switch: prev_comm=sfEventThread prev_pid=632 prev_prio=97 prev_state=S ==> next_comm=remote-ui next_pid=3751 next_prio=120 ndroid.systemui-2083 [006] d..3 12046.707450: sched_waking: comm=InputDispatcher pid=1373 prio=111 target_cpu=007 ndroid.systemui-2083 [006] d..3 12046.707453: sched_boost_task: comm=InputDispatcher pid=1373 util=29 margin=0 ndroid.systemui-2083 [006] d..3 12046.707453: sched_find_best_target: pid=1373 comm=InputDispatcher prefer_idle=1 start_cpu=0 best_idle=-1 best_active=-1 target=7 ndroid.systemui-2083 [006] d..3 12046.707454: sched_task_util: pid=1373 comm=InputDispatcher util=29 prev_cpu=7 next_cpu=7 backup_cpu=-1 target_cpu=7 need_idle=0 fastpath=0 placement_boost=1 rtg_cpu=-1 latency=1615 <...>-3751 [002] d.ha 12046.707467: sched_boost_cpu: cpu=0 util=26 margin=499 <...>-3751 [002] d.ha 12046.707468: sched_boost_cpu: cpu=1 util=102 margin=0 <...>-3751 [002] d.ha 12046.707470: sched_boost_cpu: cpu=2 util=175 margin=424 <...>-3751 [002] d.ha 12046.707471: sched_boost_cpu: cpu=3 util=32 margin=0 <...>-3751 [002] d.ha 12046.707472: sched_boost_cpu: cpu=4 util=152 margin=436 <...>-3751 [002] d.ha 12046.707474: sched_boost_cpu: cpu=5 util=122 margin=0 <...>-3751 [002] d.ha 12046.707475: sched_boost_cpu: cpu=6 util=158 margin=433 <...>-3751 [002] d.ha 12046.707476: sched_boost_cpu: cpu=7 util=235 margin=394
I see the above issue over and over when the problem is occurring. Even when bigs are hardly being utilized. I say that it is a possible bug because for me (OnePlus 6 w/ p3 commits) it will either happen or it won't. For example, I can boot up and during interactions, only bigs are boosted like expected. This will go on for a length of time, maybe a few hours or a couple days. And visa versa, when the issue occurs, there is no middle ground. When this issue occurs, during interactions, both clusters run with a margin 500ish due to a boosted TA running on a small core. This problem will continue to persist till we reboot. The issue doesn't always occur at boot either.
When I looked closer at exactly what boosted TA task is running on a small core, I noticed a pattern. I can see that the TA task is being migrated as soon as it's scheduled. FBT() will pick the right target_cpu (big core) but it will then immediately be migrated to a small core. Every single time, regardless of how loaded bigs are. Remember, when this issue doesn't occur, only bigs will remain boosted until overutilized. This is what I would expect. From what I can tell, it looks like the small core is load balancing when newly_idle. I am trying to gather more data. Unfortunately, I am not very knowledgeable about the ins and outs of load balancing but there is most definitely something going on. For example, in the above example, I dont see why "comm=remote-ui pid=3751" would be migrated to CPU2. It is more utilized than CPU4, and remote-ui is stune boosted.
Once again, thanks for your help. There are a number of members from our community that track your work and this mailing list. Keep up the good work!
Regards, Zachariah
I would also like to point out that I have had users from a number of different SD845 devices expressing the same symptom that this "bug" can cause. I thought it might be the fact that we are tying into the stock CAF perf/hal. But I was also getting reports from Pixel 3 users as well. The easy way to tell when this problem is occurring is when during interactions, the min frequency for each cluster is 1766MHz+. You can stop touching the screen and things would idle like normal, but then interact again and the problem would continue to occur. Like I said earlier, I am trying to gather more pertinent data from other P3 users to help backup this claim. Preliminary digging into Pixel 3 traces I provided in my first email, look to show the issue I am describing.
Regards, Zachariah Kennedy
On Tue, Apr 2, 2019 at 1:51 PM Zachariah Kennedy zkennedy87@gmail.com wrote:
First, thanks for getting back to me. I appreciate that you guys take the time to help the community like this.
My fiend who owns a Pixel 3 is gathering the info you requested and will reply back. In the meantime, I wanted to point out some findings. This is a snippet from "stune_trace-20190321_192157.log" that I previously provided that captured the problem I am seeing. See below:
Binder:1188_1D-8087 [007] d..2 12046.707148: sched_waking: comm=remote-ui pid=3751 prio=120 target_cpu=003 Binder:1188_1D-8087 [007] d..2 12046.707153: sched_boost_task: comm=remote-ui pid=3751 util=115 margin=454 Binder:1188_1D-8087 [007] d..2 12046.707154: sched_find_best_target: pid=3751 comm=remote-ui prefer_idle=1 start_cpu=4 best_idle=-1 best_active=-1 target=4 Binder:1188_1D-8087 [007] d..2 12046.707155: sched_task_util: pid=3751 comm=remote-ui util=115 prev_cpu=3 next_cpu=4 backup_cpu=-1 target_cpu=4 need_idle=0 fastpath=0 placement_boost=1 rtg_cpu=-1 latency=2344 Binder:1188_1D-8087 [007] d..2 12046.707155: sched_migrate_task: comm=remote-ui pid=3751 prio=120 load=115 orig_cpu=3 dest_cpu=4 Binder:1188_1D-8087 [007] d..4 12046.707164: sched_tune_tasks_update: pid=3751 comm=remote-ui cpu=4 tasks=2 idx=3 boost=50 max_boost=50 Binder:1188_1D-8087 [007] d.h9 12046.707177: sched_boost_cpu: cpu=0 util=26 margin=499 Binder:1188_1D-8087 [007] d.h9 12046.707178: sched_boost_cpu: cpu=1 util=102 margin=0 Binder:1188_1D-8087 [007] d.h9 12046.707179: sched_boost_cpu: cpu=2 util=71 margin=0 Binder:1188_1D-8087 [007] d.h9 12046.707179: sched_boost_cpu: cpu=3 util=32 margin=0 Binder:1188_1D-8087 [007] d.h9 12046.707180: sched_boost_cpu: cpu=4 util=255 margin=384 Binder:1188_1D-8087 [007] d.h9 12046.707180: sched_boost_cpu: cpu=5 util=122 margin=451 Binder:1188_1D-8087 [007] d.h9 12046.707181: sched_boost_cpu: cpu=6 util=158 margin=433 Binder:1188_1D-8087 [007] d.h9 12046.707181: sched_boost_cpu: cpu=7 util=235 margin=394 Binder:1188_1D-8087 [007] d..3 12046.707193: sched_tune_tasks_update: pid=8087 comm=Binder:1188_1D cpu=7 tasks=1 idx=1 boost=0 max_boost=50 Binder:1188_1D-8087 [007] d..2 12046.707195: sched_switch: prev_comm=Binder:1188_1D prev_pid=8087 prev_prio=120 prev_state=S ==> next_comm=RenderThread next_pid=2319 next_prio=120 <...>-2319 [007] d..3 12046.707208: sched_tune_tasks_update: pid=2319 comm=RenderThread cpu=7 tasks=0 idx=1 boost=0 max_boost=50 <...>-2319 [007] d..1 12046.707211: sched_load_balance: cpu=7 state=newly_idle balance=1 group=0x0 busy_nr=2 imbalance=0 flags=0x1 ld_moved=0 bal_int=16 <...>-2319 [007] d..1 12046.707213: sched_load_balance: cpu=7 state=newly_idle balance=1 group=0x0 busy_nr=0 imbalance=0 flags=0x0 ld_moved=0 bal_int=16 <...>-2319 [007] d..2 12046.707215: sched_switch: prev_comm=RenderThread prev_pid=2319 prev_prio=120 prev_state=S ==> next_comm=swapper/7 next_pid=0 next_prio=120 ndroid.systemui-2083 [006] d..4 12046.707224: sched_waking: comm=Binder:1188_1D pid=8087 prio=120 target_cpu=007 ndroid.systemui-2083 [006] d..4 12046.707227: sched_migrate_task: comm=Binder:1188_1D pid=8087 prio=120 load=15 orig_cpu=7 dest_cpu=6 ndroid.systemui-2083 [006] d..5 12046.707232: sched_boost_cpu: cpu=6 util=158 margin=433 ndroid.systemui-2083 [006] d..6 12046.707233: sched_tune_tasks_update: pid=8087 comm=Binder:1188_1D cpu=6 tasks=2 idx=1 boost=0 max_boost=50 ndroid.systemui-2083 [006] d..3 12046.707236: sched_tune_tasks_update: pid=2083 comm=ndroid.systemui cpu=6 tasks=1 idx=1 boost=0 max_boost=50 ndroid.systemui-2083 [006] d..2 12046.707238: sched_switch: prev_comm=ndroid.systemui prev_pid=2083 prev_prio=120 prev_state=S ==> next_comm=Binder:1188_1D next_pid=8087 next_prio=120 Binder:1188_1D-8087 [006] d..2 12046.707270: sched_waking: comm=ndroid.systemui pid=2083 prio=120 target_cpu=006 Binder:1188_1D-8087 [006] d..4 12046.707274: sched_tune_tasks_update: pid=2083 comm=ndroid.systemui cpu=6 tasks=2 idx=1 boost=0 max_boost=50 Binder:1188_1D-8087 [006] d..3 12046.707281: sched_tune_tasks_update: pid=8087 comm=Binder:1188_1D cpu=6 tasks=1 idx=1 boost=0 max_boost=50 Binder:1188_1D-8087 [006] d..2 12046.707283: sched_switch: prev_comm=Binder:1188_1D prev_pid=8087 prev_prio=120 prev_state=S ==> next_comm=ndroid.systemui next_pid=2083 next_prio=120 ndroid.systemui-2083 [006] d..4 12046.707299: sched_waking: comm=Binder:1188_1D pid=8087 prio=120 target_cpu=006 ndroid.systemui-2083 [006] d..6 12046.707302: sched_tune_tasks_update: pid=8087 comm=Binder:1188_1D cpu=6 tasks=2 idx=1 boost=0 max_boost=50 ndroid.systemui-2083 [006] d..3 12046.707304: sched_tune_tasks_update: pid=2083 comm=ndroid.systemui cpu=6 tasks=1 idx=1 boost=0 max_boost=50 ndroid.systemui-2083 [006] d..2 12046.707306: sched_switch: prev_comm=ndroid.systemui prev_pid=2083 prev_prio=120 prev_state=S ==> next_comm=Binder:1188_1D next_pid=8087 next_prio=120 Binder:1188_1D-8087 [006] d..2 12046.707326: sched_waking: comm=ndroid.systemui pid=2083 prio=120 target_cpu=006 Binder:1188_1D-8087 [006] d..4 12046.707330: sched_tune_tasks_update: pid=2083 comm=ndroid.systemui cpu=6 tasks=2 idx=1 boost=0 max_boost=50 Binder:1188_1D-8087 [006] d..3 12046.707336: sched_tune_tasks_update: pid=8087 comm=Binder:1188_1D cpu=6 tasks=1 idx=1 boost=0 max_boost=50 Binder:1188_1D-8087 [006] d..2 12046.707339: sched_switch: prev_comm=Binder:1188_1D prev_pid=8087 prev_prio=120 prev_state=S ==> next_comm=ndroid.systemui next_pid=2083 next_prio=120 surfaceflinger-606 [003] d..1 12046.707383: sched_waking: comm=sfEventThread pid=632 prio=97 target_cpu=002 android.anim-1355 [005] d..4 12046.707401: sched_waking: comm=Binder:606_5 pid=2524 prio=110 target_cpu=007 ndroid.systemui-2083 [006] d..4 12046.707401: sched_waking: comm=Binder:3120_4 pid=3197 prio=120 target_cpu=004 android.anim-1355 [005] d..4 12046.707404: sched_migrate_task: comm=Binder:606_5 pid=2524 prio=110 load=31 orig_cpu=7 dest_cpu=5 <idle>-0 [002] d..2 12046.707405: sched_switch: prev_comm=swapper/2 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=sfEventThread next_pid=632 next_prio=97 ndroid.systemui-2083 [006] d..4 12046.707406: sched_boost_task: comm=Binder:3120_4 pid=3197 util=24 margin=500 ndroid.systemui-2083 [006] d..4 12046.707407: sched_find_best_target: pid=3197 comm=Binder:3120_4 prefer_idle=1 start_cpu=4 best_idle=-1 best_active=-1 target=7 ndroid.systemui-2083 [006] d..4 12046.707407: sched_task_util: pid=3197 comm=Binder:3120_4 util=24 prev_cpu=4 next_cpu=7 backup_cpu=-1 target_cpu=7 need_idle=0 fastpath=0 placement_boost=1 rtg_cpu=-1 latency=2187 ndroid.systemui-2083 [006] d..4 12046.707408: sched_migrate_task: comm=Binder:3120_4 pid=3197 prio=120 load=24 orig_cpu=4 dest_cpu=7 android.anim-1355 [005] d..5 12046.707409: sched_boost_cpu: cpu=5 util=122 margin=451 android.anim-1355 [005] d..5 12046.707410: sched_boost_cpu: cpu=5 util=122 margin=451 android.anim-1355 [005] d..6 12046.707411: sched_tune_tasks_update: pid=2524 comm=Binder:606_5 cpu=5 tasks=2 idx=1 boost=0 max_boost=50 <...>-632 [002] d..2 12046.707412: sched_boost_cpu: cpu=2 util=71 margin=0 ndroid.systemui-2083 [006] d..6 12046.707413: sched_tune_tasks_update: pid=3197 comm=Binder:3120_4 cpu=7 tasks=2 idx=3 boost=50 max_boost=50 android.anim-1355 [005] d..3 12046.707414: sched_tune_tasks_update: pid=1355 comm=android.anim cpu=5 tasks=0 idx=3 boost=50 max_boost=50 android.anim-1355 [005] d..2 12046.707416: sched_switch: prev_comm=android.anim prev_pid=1355 prev_prio=110 prev_state=S ==> next_comm=Binder:606_5 next_pid=2524 next_prio=110 <...>-632 [002] d..1 12046.707418: sched_load_balance: cpu=2 state=newly_idle balance=1 group=0x0 busy_nr=0 imbalance=0 flags=0x0 ld_moved=0 bal_int=8 <idle>-0 [007] d..2 12046.707423: sched_switch: prev_comm=swapper/7 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=Binder:3120_4 next_pid=3197 next_prio=120 <...>-632 [002] d..3 12046.707426: sched_tune_tasks_update: pid=3751 comm=remote-ui cpu=4 tasks=1 idx=3 boost=50 max_boost=50 <...>-632 [002] d..3 12046.707428: sched_migrate_task: comm=remote-ui pid=3751 prio=120 load=115 orig_cpu=4 dest_cpu=2 <...>-632 [002] d..2 12046.707437: sched_boost_cpu: cpu=2 util=175 margin=0 <...>-632 [002] d..3 12046.707439: sched_tune_tasks_update: pid=3751 comm=remote-ui cpu=2 tasks=1 idx=3 boost=50 max_boost=0 <...>-632 [002] d..1 12046.707440: sched_load_balance: cpu=2 state=newly_idle balance=1 group=0xf0 busy_nr=1 imbalance=-31 flags=0x0 ld_moved=1 bal_int=8 <...>-632 [002] d..2 12046.707441: sched_boost_cpu: cpu=2 util=175 margin=424 <...>-632 [002] d..2 12046.707446: sched_switch: prev_comm=sfEventThread prev_pid=632 prev_prio=97 prev_state=S ==> next_comm=remote-ui next_pid=3751 next_prio=120 ndroid.systemui-2083 [006] d..3 12046.707450: sched_waking: comm=InputDispatcher pid=1373 prio=111 target_cpu=007 ndroid.systemui-2083 [006] d..3 12046.707453: sched_boost_task: comm=InputDispatcher pid=1373 util=29 margin=0 ndroid.systemui-2083 [006] d..3 12046.707453: sched_find_best_target: pid=1373 comm=InputDispatcher prefer_idle=1 start_cpu=0 best_idle=-1 best_active=-1 target=7 ndroid.systemui-2083 [006] d..3 12046.707454: sched_task_util: pid=1373 comm=InputDispatcher util=29 prev_cpu=7 next_cpu=7 backup_cpu=-1 target_cpu=7 need_idle=0 fastpath=0 placement_boost=1 rtg_cpu=-1 latency=1615 <...>-3751 [002] d.ha 12046.707467: sched_boost_cpu: cpu=0 util=26 margin=499 <...>-3751 [002] d.ha 12046.707468: sched_boost_cpu: cpu=1 util=102 margin=0 <...>-3751 [002] d.ha 12046.707470: sched_boost_cpu: cpu=2 util=175 margin=424 <...>-3751 [002] d.ha 12046.707471: sched_boost_cpu: cpu=3 util=32 margin=0 <...>-3751 [002] d.ha 12046.707472: sched_boost_cpu: cpu=4 util=152 margin=436 <...>-3751 [002] d.ha 12046.707474: sched_boost_cpu: cpu=5 util=122 margin=0 <...>-3751 [002] d.ha 12046.707475: sched_boost_cpu: cpu=6 util=158 margin=433 <...>-3751 [002] d.ha 12046.707476: sched_boost_cpu: cpu=7 util=235 margin=394
I see the above issue over and over when the problem is occurring. Even when bigs are hardly being utilized. I say that it is a possible bug because for me (OnePlus 6 w/ p3 commits) it will either happen or it won't. For example, I can boot up and during interactions, only bigs are boosted like expected. This will go on for a length of time, maybe a few hours or a couple days. And visa versa, when the issue occurs, there is no middle ground. When this issue occurs, during interactions, both clusters run with a margin 500ish due to a boosted TA running on a small core. This problem will continue to persist till we reboot. The issue doesn't always occur at boot either.
When I looked closer at exactly what boosted TA task is running on a small core, I noticed a pattern. I can see that the TA task is being migrated as soon as it's scheduled. FBT() will pick the right target_cpu (big core) but it will then immediately be migrated to a small core. Every single time, regardless of how loaded bigs are. Remember, when this issue doesn't occur, only bigs will remain boosted until overutilized. This is what I would expect. From what I can tell, it looks like the small core is load balancing when newly_idle. I am trying to gather more data. Unfortunately, I am not very knowledgeable about the ins and outs of load balancing but there is most definitely something going on. For example, in the above example, I dont see why "comm=remote-ui pid=3751" would be migrated to CPU2. It is more utilized than CPU4, and remote-ui is stune boosted.
Once again, thanks for your help. There are a number of members from our community that track your work and this mailing list. Keep up the good work!
Regards, Zachariah