Hello,
kernel test robot noticed an 11.5% improvement of lmbench3.PIPE.latency.us on:
commit: db86f55bf81a3a297be05ee8775ae9a8c6e3a599 ("cpuidle: governors: menu: Select polling state in some more cases")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
testcase: lmbench3
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory
parameters:

	test_memory_size: 50%
	nr_threads: 20%
	mode: development
	test: PIPE
	cpufreq_governor: performance
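For context on the commit under test: the menu governor picks an idle state for each idle period based on a predicted sleep length, and the patch title says it now selects the polling state in more of those cases. The standalone C sketch below only illustrates that style of decision; the state table, the thresholds, and the short-sleep fallback rule are invented for the example and are not the kernel's actual logic.

/*
 * Illustrative sketch only -- NOT the actual kernel patch.  It models the
 * general shape of a menu-style cpuidle decision: pick the deepest state
 * whose target residency fits the predicted idle duration, and fall back
 * to the polling state (index 0) when the predicted idle period is so
 * short that a real C-state's exit latency would dominate.
 */
#include <stdio.h>

struct idle_state {
	const char *name;
	unsigned int exit_latency_us;
	unsigned int target_residency_us;
	int polling;			/* 1 for the busy-poll "state" */
};

/* Made-up state table, loosely shaped like an x86 server's. */
static const struct idle_state states[] = {
	{ "POLL", 0,   0,   1 },
	{ "C1",   2,   2,   0 },
	{ "C1E",  4,   10,  0 },
	{ "C6",   170, 600, 0 },
};

/* Hypothetical selector: prefer polling when predicted idle is very short. */
static int select_state(unsigned int predicted_idle_us,
			unsigned int latency_limit_us)
{
	int best = 0;

	for (int i = 1; i < (int)(sizeof(states) / sizeof(states[0])); i++) {
		if (states[i].target_residency_us > predicted_idle_us)
			break;
		if (states[i].exit_latency_us > latency_limit_us)
			break;
		best = i;
	}

	/*
	 * If even the shallowest real C-state does not pay for itself,
	 * stay in the polling state instead of entering C1.
	 */
	if (best == 1 && states[1].exit_latency_us * 2 > predicted_idle_us)
		best = 0;

	return best;
}

int main(void)
{
	unsigned int samples[] = { 1, 3, 8, 50, 1000 };

	for (unsigned int i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		printf("predicted %4u us -> %s\n", samples[i],
		       states[select_state(samples[i], 100)].name);
	return 0;
}

With these made-up numbers the 1 us and 3 us predictions land on POLL instead of C1, which is the kind of shift the latency figures in this report respond to.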
In addition to the lmbench3 result above, the commit also has a significant impact on the following test:
+------------------+---------------------------------------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_thread_ops 13.4% improvement                              |
| test machine     | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory |
| test parameters  | cpufreq_governor=performance                                                                |
|                  | test=context_switch1                                                                        |
+------------------+---------------------------------------------------------------------------------------------+
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at: https://download.01.org/0day-ci/archive/20251107/202511071439.d081322d-lkp@i...
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_threads/rootfs/tbox_group/test/test_memory_size/testcase:
  gcc-14/performance/x86_64-rhel-9.4/development/20%/debian-12-x86_64-20240206.cgz/lkp-spr-2sp4/PIPE/50%/lmbench3
commit:
  v6.18-rc3
  db86f55bf8 ("cpuidle: governors: menu: Select polling state in some more cases")
       v6.18-rc3            db86f55bf81a3a297be05ee8775
----------------  ---------------------------
       %stddev      %change        %stddev
           \            |              \
 2.984e+08 ±  3%     +13.0%  3.373e+08 ±  2%  cpuidle..usage
   3870548 ±  3%      +8.9%    4215418 ±  3%  vmstat.system.cs
      5.11           -11.5%       4.52        lmbench3.PIPE.latency.us
 2.949e+08 ±  3%     +13.0%  3.334e+08 ±  2%  lmbench3.time.voluntary_context_switches
   1474808 ±  2%     +13.7%    1676175 ±  2%  sched_debug.cpu.nr_switches.avg
    908098 ±  4%     +11.5%    1012241 ±  5%  sched_debug.cpu.nr_switches.stddev
     35.76 ±  6%     +70.7%      61.02 ± 36%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
    438.60 ±  6%     -42.6%     251.80 ± 28%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
     35.75 ±  6%     +70.7%      61.01 ± 36%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
 6.834e+09 ±  2%      +8.8%  7.438e+09 ±  2%  perf-stat.i.branch-instructions
   4003246 ±  3%      +9.0%    4365130 ±  4%  perf-stat.i.context-switches
     14211 ±  7%     +30.7%      18567 ±  6%  perf-stat.i.cycles-between-cache-misses
 3.305e+10            +7.8%  3.563e+10 ±  2%  perf-stat.i.instructions
     17.81 ±  3%      +9.1%      19.42 ±  4%  perf-stat.i.metric.K/sec
 6.738e+09 ±  2%      +8.8%  7.328e+09 ±  2%  perf-stat.ps.branch-instructions
   3917194 ±  3%      +9.0%    4267905 ±  4%  perf-stat.ps.context-switches
 3.257e+10            +7.8%   3.51e+10 ±  2%  perf-stat.ps.instructions
 4.907e+12 ±  5%     +11.7%  5.481e+12 ±  3%  perf-stat.total.instructions
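For reference, lmbench3's PIPE test bounces a small token between two processes over pipes, so every iteration is dominated by wakeup and context-switch cost, which is why lmbench3.PIPE.latency.us tracks the voluntary_context_switches line so closely. A minimal standalone sketch of that style of measurement (this is not lmbench's code; ROUNDS and the single-byte token are arbitrary choices for the example):

/*
 * Minimal sketch of a pipe ping-pong latency measurement, in the spirit of
 * what lmbench3's PIPE test exercises (this is NOT lmbench code).  Two
 * processes bounce a single byte back and forth through a pair of pipes;
 * each round trip forces a pair of sleep/wakeup transitions, which is why
 * the result is so sensitive to cpuidle and wakeup behaviour.
 */
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define ROUNDS 100000

int main(void)
{
	int p2c[2], c2p[2];		/* parent->child and child->parent pipes */
	char tok = 'x';

	if (pipe(p2c) || pipe(c2p)) {
		perror("pipe");
		return 1;
	}

	if (fork() == 0) {		/* child: echo the token back until EOF */
		while (read(p2c[0], &tok, 1) == 1)
			write(c2p[1], &tok, 1);
		_exit(0);
	}

	struct timespec t0, t1;
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int i = 0; i < ROUNDS; i++) {
		write(p2c[1], &tok, 1);
		read(c2p[0], &tok, 1);
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	double us = (t1.tv_sec - t0.tv_sec) * 1e6 +
		    (t1.tv_nsec - t0.tv_nsec) / 1e3;
	printf("pipe round-trip latency: %.2f us\n", us / ROUNDS);

	close(p2c[1]);			/* EOF lets the child's read() return 0 */
	return 0;
}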
***************************************************************************************************
lkp-spr-2sp4: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase:
  gcc-14/performance/x86_64-rhel-9.4/debian-13-x86_64-20250902.cgz/lkp-spr-2sp4/context_switch1/will-it-scale
commit:
  v6.18-rc3
  db86f55bf8 ("cpuidle: governors: menu: Select polling state in some more cases")
       v6.18-rc3            db86f55bf81a3a297be05ee8775
----------------  ---------------------------
       %stddev      %change        %stddev
           \            |              \
   8360618 ± 14%     +37.6%    11502592 ±  9%  meminfo.DirectMap2M
      0.52 ±  4%      +0.2         0.68 ±  2%  mpstat.cpu.all.irq%
      0.07 ±  2%      +0.0         0.08 ±  2%  mpstat.cpu.all.soft%
  24410019           +11.1%     27123411        sched_debug.cpu.nr_switches.avg
  56464105 ±  3%     +17.9%     66574674 ±  5%  sched_debug.cpu.nr_switches.max
    473411            +2.6%       485915        proc-vmstat.nr_active_anon
   1205948            +1.0%      1218407        proc-vmstat.nr_file_pages
    292446            +4.3%       304940        proc-vmstat.nr_shmem
    473411            +2.6%       485915        proc-vmstat.nr_zone_active_anon
      4.03 ±  3%      -2.5         1.49 ± 21%  turbostat.C1%
      4.83 ±  9%      -1.0         3.86 ± 13%  turbostat.C1E%
 1.087e+08 ±  3%     +59.1%    1.729e+08 ±  3%  turbostat.IRQ
      0.03 ± 13%      +2.0         2.03        turbostat.POLL%
 4.745e+10            +8.5%    5.147e+10        perf-stat.i.branch-instructions
      0.94            -0.0         0.90 ±  3%  perf-stat.i.branch-miss-rate%
 2.567e+08            +9.1%      2.8e+08        perf-stat.i.branch-misses
  48551481            +8.1%     52493759        perf-stat.i.context-switches
      2.76            -4.8%         2.63 ±  2%  perf-stat.i.cpi
    593.92            +5.6%       627.34        perf-stat.i.cpu-migrations
 2.367e+11            +8.2%    2.561e+11        perf-stat.i.instructions
      0.62            +9.8%         0.68        perf-stat.i.ipc
    216.75            +8.1%       234.36        perf-stat.i.metric.K/sec
      1.85            -7.0%         1.73        perf-stat.overall.cpi
      0.54            +7.5%         0.58        perf-stat.overall.ipc
    204618            +2.0%       208750        perf-stat.overall.path-length
 4.674e+10            +8.4%    5.066e+10        perf-stat.ps.branch-instructions
 2.532e+08            +9.0%     2.76e+08        perf-stat.ps.branch-misses
  47800355            +8.1%     51652366        perf-stat.ps.context-switches
    591.27            +5.5%       623.50        perf-stat.ps.cpu-migrations
 2.332e+11            +8.1%    2.521e+11        perf-stat.ps.instructions
 7.417e+13            +8.1%    8.019e+13        perf-stat.total.instructions
    534510 ±  9%     +24.8%       666973 ±  8%  will-it-scale.1.linear
    471854 ±  2%     +26.9%       598959 ± 13%  will-it-scale.1.processes
    534510 ±  9%     +24.8%       666973 ±  8%  will-it-scale.1.threads
  59865194 ±  9%     +24.8%     74700994 ±  8%  will-it-scale.112.linear
  52446572 ±  3%     +17.2%     61467049        will-it-scale.112.processes
     52.12 ±  2%     -11.2%        46.28        will-it-scale.112.processes_idle
  89797792 ±  9%     +24.8%    1.121e+08 ±  8%  will-it-scale.168.linear
  90159107            +5.3%     94951409        will-it-scale.168.processes
     23.53            -8.4%        21.56        will-it-scale.168.processes_idle
 1.197e+08 ±  9%     +24.8%    1.494e+08 ±  8%  will-it-scale.224.linear
  29932597 ±  9%     +24.8%     37350497 ±  8%  will-it-scale.56.linear
  24228097 ±  2%     +22.6%     29712218 ±  2%  will-it-scale.56.processes
     80.80            -3.6%        77.89        will-it-scale.56.processes_idle
    495994           +13.5%       562996 ±  2%  will-it-scale.per_process_ops
    211008 ±  5%     +13.4%       239359 ±  4%  will-it-scale.per_thread_ops
      5748            +1.7%         5848        will-it-scale.time.percent_of_cpu_this_job_got
     16823            +1.5%        17082        will-it-scale.time.system_time
      1410            +4.2%         1469        will-it-scale.time.user_time
 3.625e+08            +6.0%    3.842e+08        will-it-scale.workload
      6.75 ±  9%      -3.4         3.33 ±  4%  perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
      8.01 ±  8%      -1.7         6.26 ±  6%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
      7.89 ±  8%      -1.7         6.14 ±  6%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
     10.30 ±  6%      -1.7         8.60 ±  4%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
      4.36 ±  2%      -0.4         3.99 ±  5%  perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      3.59 ±  2%      -0.4         3.23 ±  5%  perf-profile.calltrace.cycles-pp.anon_pipe_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      3.91 ±  2%      -0.4         3.55 ±  5%  perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.75 ±  2%      -0.4         2.39 ±  6%  perf-profile.calltrace.cycles-pp.__wake_up_sync_key.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
      2.50 ±  2%      -0.4         2.14 ±  6%  perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_sync_key.anon_pipe_write.vfs_write
      2.44 ±  2%      -0.4         2.09 ±  6%  perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_sync_key.anon_pipe_write
      2.56 ±  2%      -0.4         2.21 ±  6%  perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_sync_key.anon_pipe_write.vfs_write.ksys_write
      1.73 ±  2%      -0.3         1.39 ±  8%  perf-profile.calltrace.cycles-pp.ttwu_queue_wakelist.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_sync_key
      1.48 ±  2%      -0.3         1.14 ± 10%  perf-profile.calltrace.cycles-pp.__smp_call_single_queue.ttwu_queue_wakelist.try_to_wake_up.autoremove_wake_function.__wake_up_common
      1.35 ±  2%      -0.3         1.02 ± 11%  perf-profile.calltrace.cycles-pp.call_function_single_prep_ipi.__smp_call_single_queue.ttwu_queue_wakelist.try_to_wake_up.autoremove_wake_function
     61.64            +1.2        62.81        perf-profile.calltrace.cycles-pp.__schedule.schedule.anon_pipe_read.vfs_read.ksys_read
     62.24            +1.2        63.45        perf-profile.calltrace.cycles-pp.schedule.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
      0.00            +1.7         1.72 ± 23%  perf-profile.calltrace.cycles-pp.poll_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
      6.82 ±  9%      -3.5         3.36 ±  4%  perf-profile.children.cycles-pp.intel_idle
      8.10 ±  8%      -1.8         6.34 ±  6%  perf-profile.children.cycles-pp.cpuidle_enter
      8.04 ±  8%      -1.8         6.28 ±  6%  perf-profile.children.cycles-pp.cpuidle_enter_state
     10.45 ±  6%      -1.7         8.72 ±  4%  perf-profile.children.cycles-pp.cpuidle_idle_call
      4.41 ±  2%      -0.4         4.04 ±  5%  perf-profile.children.cycles-pp.ksys_write
      2.76 ±  2%      -0.4         2.40 ±  5%  perf-profile.children.cycles-pp.__wake_up_sync_key
      3.63 ±  2%      -0.4         3.27 ±  5%  perf-profile.children.cycles-pp.anon_pipe_write
      2.51 ±  2%      -0.4         2.15 ±  6%  perf-profile.children.cycles-pp.autoremove_wake_function
      2.47 ±  2%      -0.4         2.11 ±  6%  perf-profile.children.cycles-pp.try_to_wake_up
      3.94 ±  2%      -0.4         3.58 ±  5%  perf-profile.children.cycles-pp.vfs_write
      2.57 ±  2%      -0.4         2.22 ±  6%  perf-profile.children.cycles-pp.__wake_up_common
      1.74 ±  2%      -0.3         1.40 ±  8%  perf-profile.children.cycles-pp.ttwu_queue_wakelist
      1.49 ±  2%      -0.3         1.15 ± 10%  perf-profile.children.cycles-pp.__smp_call_single_queue
      1.36 ±  3%      -0.3         1.03 ± 11%  perf-profile.children.cycles-pp.call_function_single_prep_ipi
      0.22 ±  2%      +0.0         0.25 ±  8%  perf-profile.children.cycles-pp.switch_mm_irqs_off
      0.23 ±  4%      +0.0         0.28 ±  3%  perf-profile.children.cycles-pp.local_clock_noinstr
     62.26            +1.2        63.47        perf-profile.children.cycles-pp.schedule
     66.06            +1.4        67.43        perf-profile.children.cycles-pp.__schedule
      0.00            +1.8         1.75 ± 23%  perf-profile.children.cycles-pp.poll_idle
      6.82 ±  9%      -3.5         3.36 ±  4%  perf-profile.self.cycles-pp.intel_idle
      1.35 ±  3%      -0.3         1.02 ± 11%  perf-profile.self.cycles-pp.call_function_single_prep_ipi
      0.26 ±  4%      -0.1         0.16 ±  4%  perf-profile.self.cycles-pp.flush_smp_call_function_queue
      0.24 ±  3%      -0.0         0.20 ±  3%  perf-profile.self.cycles-pp.set_next_entity
      0.05            +0.0         0.06        perf-profile.self.cycles-pp.local_clock_noinstr
      0.00            +1.7         1.66 ± 23%  perf-profile.self.cycles-pp.poll_idle
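The profile shift is consistent with the change: cycles move out of intel_idle (hardware C-states) into poll_idle, matching the turbostat.POLL% increase (0.03% -> 2.03%) and the drop in C1%/C1E% above. As a rough illustration of what a polling idle state does, here is a simplified userspace sketch; it is not the kernel's poll_idle() implementation, and the flag name and time limit are invented for the example.

/*
 * Simplified userspace sketch of a polling idle "state" (NOT the kernel's
 * poll_idle()): spin on a "work pending" flag with a relaxation hint, and
 * give up after a time limit so the CPU can still fall back to a real
 * sleep state if nothing arrives.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

static atomic_bool need_resched_flag;	/* would be set by a waker */

static inline void cpu_relax_hint(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_ia32_pause();		/* PAUSE: be polite to the sibling */
#endif
}

static long long now_ns(void)
{
	struct timespec ts;
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Returns true if woken by pending work, false if the poll limit expired. */
static bool poll_idle_sketch(long long limit_ns)
{
	long long start = now_ns();

	while (!atomic_load_explicit(&need_resched_flag, memory_order_acquire)) {
		cpu_relax_hint();
		if (now_ns() - start > limit_ns)
			return false;	/* fall back to a deeper C-state */
	}
	return true;			/* woken without an interrupt/IPI */
}

int main(void)
{
	atomic_store(&need_resched_flag, true);		/* pretend work arrived */
	return poll_idle_sketch(10 * 1000 * 1000) ? 0 : 1;	/* 10 ms limit */
}

Spinning like this burns cycles while idle, but a woken task can start running without paying a C-state exit latency, which is where the latency and context-switch-rate gains in these results come from.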
Disclaimer: Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance.