Hello,
kernel test robot noticed an 11.5% improvement of lmbench3.PIPE.latency.us on:
commit: db86f55bf81a3a297be05ee8775ae9a8c6e3a599 ("cpuidle: governors: menu: Select polling state in some more cases")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
testcase: lmbench3
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory
parameters:

	test_memory_size: 50%
	nr_threads: 20%
	mode: development
	test: PIPE
	cpufreq_governor: performance
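For readers unfamiliar with the workload: lmbench3's PIPE test reports the round-trip latency of a small token bounced between two processes over a pair of pipes, so the result is dominated by scheduler wakeups and cpuidle entry/exit on the reader side. A minimal sketch of that ping-pong pattern (illustrative only; the iteration count and timing code below are ours, not lmbench's):

/*
 * Minimal pipe ping-pong sketch (illustrative, not lmbench itself):
 * two processes bounce a one-byte token across a pair of pipes, so
 * each iteration forces two voluntary context switches -- the path
 * whose latency the PIPE test reports.
 */
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/wait.h>

#define ITERATIONS 100000

int main(void)
{
	int p1[2], p2[2];	/* parent->child and child->parent pipes */
	char token = 'x';

	if (pipe(p1) || pipe(p2)) {
		perror("pipe");
		return 1;
	}

	if (fork() == 0) {	/* child: echo the token back until EOF */
		while (read(p1[0], &token, 1) == 1)
			if (write(p2[1], &token, 1) != 1)
				break;
		_exit(0);
	}

	struct timespec t0, t1;
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int i = 0; i < ITERATIONS; i++) {
		if (write(p1[1], &token, 1) != 1 ||
		    read(p2[0], &token, 1) != 1) {
			perror("ping-pong");
			return 1;
		}
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	double us = (t1.tv_sec - t0.tv_sec) * 1e6 +
		    (t1.tv_nsec - t0.tv_nsec) / 1e3;
	printf("round-trip latency: %.2f us\n", us / ITERATIONS);

	close(p1[1]);		/* let the child see EOF and exit */
	wait(NULL);
	return 0;
}

Each iteration forces two voluntary context switches, which is why lmbench3.time.voluntary_context_switches moves together with the latency change in the table below.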
In addition to that, the commit also has a significant impact on the following tests:
+------------------+---------------------------------------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_thread_ops 13.4% improvement                              |
| test machine     | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory |
| test parameters  | cpufreq_governor=performance                                                                |
|                  | test=context_switch1                                                                        |
+------------------+---------------------------------------------------------------------------------------------+
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at: https://download.01.org/0day-ci/archive/20251107/202511071439.d081322d-lkp@i...
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_threads/rootfs/tbox_group/test/test_memory_size/testcase:
  gcc-14/performance/x86_64-rhel-9.4/development/20%/debian-12-x86_64-20240206.cgz/lkp-spr-2sp4/PIPE/50%/lmbench3
commit:
  v6.18-rc3
  db86f55bf8 ("cpuidle: governors: menu: Select polling state in some more cases")
       v6.18-rc3  db86f55bf81a3a297be05ee8775
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
 2.984e+08 ±  3%     +13.0%  3.373e+08 ±  2%  cpuidle..usage
   3870548 ±  3%      +8.9%    4215418 ±  3%  vmstat.system.cs
      5.11           -11.5%       4.52        lmbench3.PIPE.latency.us
 2.949e+08 ±  3%     +13.0%  3.334e+08 ±  2%  lmbench3.time.voluntary_context_switches
   1474808 ±  2%     +13.7%    1676175 ±  2%  sched_debug.cpu.nr_switches.avg
    908098 ±  4%     +11.5%    1012241 ±  5%  sched_debug.cpu.nr_switches.stddev
     35.76 ±  6%     +70.7%      61.02 ± 36%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
    438.60 ±  6%     -42.6%     251.80 ± 28%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
     35.75 ±  6%     +70.7%      61.01 ± 36%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
 6.834e+09 ±  2%      +8.8%  7.438e+09 ±  2%  perf-stat.i.branch-instructions
   4003246 ±  3%      +9.0%    4365130 ±  4%  perf-stat.i.context-switches
     14211 ±  7%     +30.7%      18567 ±  6%  perf-stat.i.cycles-between-cache-misses
 3.305e+10            +7.8%  3.563e+10 ±  2%  perf-stat.i.instructions
     17.81 ±  3%      +9.1%      19.42 ±  4%  perf-stat.i.metric.K/sec
 6.738e+09 ±  2%      +8.8%  7.328e+09 ±  2%  perf-stat.ps.branch-instructions
   3917194 ±  3%      +9.0%    4267905 ±  4%  perf-stat.ps.context-switches
 3.257e+10            +7.8%   3.51e+10 ±  2%  perf-stat.ps.instructions
 4.907e+12 ±  5%     +11.7%  5.481e+12 ±  3%  perf-stat.total.instructions
***************************************************************************************************
lkp-spr-2sp4: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase:
  gcc-14/performance/x86_64-rhel-9.4/debian-13-x86_64-20250902.cgz/lkp-spr-2sp4/context_switch1/will-it-scale
commit:
  v6.18-rc3
  db86f55bf8 ("cpuidle: governors: menu: Select polling state in some more cases")
       v6.18-rc3  db86f55bf81a3a297be05ee8775
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
   8360618 ± 14%     +37.6%   11502592 ±  9%  meminfo.DirectMap2M
      0.52 ±  4%      +0.2        0.68 ±  2%  mpstat.cpu.all.irq%
      0.07 ±  2%      +0.0        0.08 ±  2%  mpstat.cpu.all.soft%
  24410019           +11.1%   27123411        sched_debug.cpu.nr_switches.avg
  56464105 ±  3%     +17.9%   66574674 ±  5%  sched_debug.cpu.nr_switches.max
    473411            +2.6%     485915        proc-vmstat.nr_active_anon
   1205948            +1.0%    1218407        proc-vmstat.nr_file_pages
    292446            +4.3%     304940        proc-vmstat.nr_shmem
    473411            +2.6%     485915        proc-vmstat.nr_zone_active_anon
      4.03 ±  3%      -2.5        1.49 ± 21%  turbostat.C1%
      4.83 ±  9%      -1.0        3.86 ± 13%  turbostat.C1E%
 1.087e+08 ±  3%     +59.1%  1.729e+08 ±  3%  turbostat.IRQ
      0.03 ± 13%      +2.0        2.03        turbostat.POLL%
 4.745e+10            +8.5%  5.147e+10        perf-stat.i.branch-instructions
      0.94            -0.0        0.90 ±  3%  perf-stat.i.branch-miss-rate%
 2.567e+08            +9.1%    2.8e+08        perf-stat.i.branch-misses
  48551481            +8.1%   52493759        perf-stat.i.context-switches
      2.76            -4.8%       2.63 ±  2%  perf-stat.i.cpi
    593.92            +5.6%     627.34        perf-stat.i.cpu-migrations
 2.367e+11            +8.2%  2.561e+11        perf-stat.i.instructions
      0.62            +9.8%       0.68        perf-stat.i.ipc
    216.75            +8.1%     234.36        perf-stat.i.metric.K/sec
      1.85            -7.0%       1.73        perf-stat.overall.cpi
      0.54            +7.5%       0.58        perf-stat.overall.ipc
    204618            +2.0%     208750        perf-stat.overall.path-length
 4.674e+10            +8.4%  5.066e+10        perf-stat.ps.branch-instructions
 2.532e+08            +9.0%   2.76e+08        perf-stat.ps.branch-misses
  47800355            +8.1%   51652366        perf-stat.ps.context-switches
    591.27            +5.5%     623.50        perf-stat.ps.cpu-migrations
 2.332e+11            +8.1%  2.521e+11        perf-stat.ps.instructions
 7.417e+13            +8.1%  8.019e+13        perf-stat.total.instructions
    534510 ±  9%     +24.8%     666973 ±  8%  will-it-scale.1.linear
    471854 ±  2%     +26.9%     598959 ± 13%  will-it-scale.1.processes
    534510 ±  9%     +24.8%     666973 ±  8%  will-it-scale.1.threads
  59865194 ±  9%     +24.8%   74700994 ±  8%  will-it-scale.112.linear
  52446572 ±  3%     +17.2%   61467049        will-it-scale.112.processes
     52.12 ±  2%     -11.2%      46.28        will-it-scale.112.processes_idle
  89797792 ±  9%     +24.8%  1.121e+08 ±  8%  will-it-scale.168.linear
  90159107            +5.3%   94951409        will-it-scale.168.processes
     23.53            -8.4%      21.56        will-it-scale.168.processes_idle
 1.197e+08 ±  9%     +24.8%  1.494e+08 ±  8%  will-it-scale.224.linear
  29932597 ±  9%     +24.8%   37350497 ±  8%  will-it-scale.56.linear
  24228097 ±  2%     +22.6%   29712218 ±  2%  will-it-scale.56.processes
     80.80            -3.6%      77.89        will-it-scale.56.processes_idle
    495994           +13.5%     562996 ±  2%  will-it-scale.per_process_ops
    211008 ±  5%     +13.4%     239359 ±  4%  will-it-scale.per_thread_ops
      5748            +1.7%       5848        will-it-scale.time.percent_of_cpu_this_job_got
     16823            +1.5%      17082        will-it-scale.time.system_time
      1410            +4.2%       1469        will-it-scale.time.user_time
 3.625e+08            +6.0%  3.842e+08        will-it-scale.workload
      6.75 ±  9%      -3.4        3.33 ±  4%  perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
      8.01 ±  8%      -1.7        6.26 ±  6%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
      7.89 ±  8%      -1.7        6.14 ±  6%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
     10.30 ±  6%      -1.7        8.60 ±  4%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
      4.36 ±  2%      -0.4        3.99 ±  5%  perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      3.59 ±  2%      -0.4        3.23 ±  5%  perf-profile.calltrace.cycles-pp.anon_pipe_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      3.91 ±  2%      -0.4        3.55 ±  5%  perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.75 ±  2%      -0.4        2.39 ±  6%  perf-profile.calltrace.cycles-pp.__wake_up_sync_key.anon_pipe_write.vfs_write.ksys_write.do_syscall_64
      2.50 ±  2%      -0.4        2.14 ±  6%  perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_sync_key.anon_pipe_write.vfs_write
      2.44 ±  2%      -0.4        2.09 ±  6%  perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_sync_key.anon_pipe_write
      2.56 ±  2%      -0.4        2.21 ±  6%  perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_sync_key.anon_pipe_write.vfs_write.ksys_write
      1.73 ±  2%      -0.3        1.39 ±  8%  perf-profile.calltrace.cycles-pp.ttwu_queue_wakelist.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_sync_key
      1.48 ±  2%      -0.3        1.14 ± 10%  perf-profile.calltrace.cycles-pp.__smp_call_single_queue.ttwu_queue_wakelist.try_to_wake_up.autoremove_wake_function.__wake_up_common
      1.35 ±  2%      -0.3        1.02 ± 11%  perf-profile.calltrace.cycles-pp.call_function_single_prep_ipi.__smp_call_single_queue.ttwu_queue_wakelist.try_to_wake_up.autoremove_wake_function
     61.64            +1.2       62.81        perf-profile.calltrace.cycles-pp.__schedule.schedule.anon_pipe_read.vfs_read.ksys_read
     62.24            +1.2       63.45        perf-profile.calltrace.cycles-pp.schedule.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
      0.00            +1.7        1.72 ± 23%  perf-profile.calltrace.cycles-pp.poll_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
      6.82 ±  9%      -3.5        3.36 ±  4%  perf-profile.children.cycles-pp.intel_idle
      8.10 ±  8%      -1.8        6.34 ±  6%  perf-profile.children.cycles-pp.cpuidle_enter
      8.04 ±  8%      -1.8        6.28 ±  6%  perf-profile.children.cycles-pp.cpuidle_enter_state
     10.45 ±  6%      -1.7        8.72 ±  4%  perf-profile.children.cycles-pp.cpuidle_idle_call
      4.41 ±  2%      -0.4        4.04 ±  5%  perf-profile.children.cycles-pp.ksys_write
      2.76 ±  2%      -0.4        2.40 ±  5%  perf-profile.children.cycles-pp.__wake_up_sync_key
      3.63 ±  2%      -0.4        3.27 ±  5%  perf-profile.children.cycles-pp.anon_pipe_write
      2.51 ±  2%      -0.4        2.15 ±  6%  perf-profile.children.cycles-pp.autoremove_wake_function
      2.47 ±  2%      -0.4        2.11 ±  6%  perf-profile.children.cycles-pp.try_to_wake_up
      3.94 ±  2%      -0.4        3.58 ±  5%  perf-profile.children.cycles-pp.vfs_write
      2.57 ±  2%      -0.4        2.22 ±  6%  perf-profile.children.cycles-pp.__wake_up_common
      1.74 ±  2%      -0.3        1.40 ±  8%  perf-profile.children.cycles-pp.ttwu_queue_wakelist
      1.49 ±  2%      -0.3        1.15 ± 10%  perf-profile.children.cycles-pp.__smp_call_single_queue
      1.36 ±  3%      -0.3        1.03 ± 11%  perf-profile.children.cycles-pp.call_function_single_prep_ipi
      0.22 ±  2%      +0.0        0.25 ±  8%  perf-profile.children.cycles-pp.switch_mm_irqs_off
      0.23 ±  4%      +0.0        0.28 ±  3%  perf-profile.children.cycles-pp.local_clock_noinstr
     62.26            +1.2       63.47        perf-profile.children.cycles-pp.schedule
     66.06            +1.4       67.43        perf-profile.children.cycles-pp.__schedule
      0.00            +1.8        1.75 ± 23%  perf-profile.children.cycles-pp.poll_idle
      6.82 ±  9%      -3.5        3.36 ±  4%  perf-profile.self.cycles-pp.intel_idle
      1.35 ±  3%      -0.3        1.02 ± 11%  perf-profile.self.cycles-pp.call_function_single_prep_ipi
      0.26 ±  4%      -0.1        0.16 ±  4%  perf-profile.self.cycles-pp.flush_smp_call_function_queue
      0.24 ±  3%      -0.0        0.20 ±  3%  perf-profile.self.cycles-pp.set_next_entity
      0.05            +0.0        0.06        perf-profile.self.cycles-pp.local_clock_noinstr
      0.00            +1.7        1.66 ± 23%  perf-profile.self.cycles-pp.poll_idle
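The profile shift above (intel_idle self cycles dropping from ~6.8% to ~3.4% while poll_idle appears and turbostat.POLL% rises from 0.03% to about 2%) is consistent with the commit subject: for very short expected idle periods the menu governor now selects the polling state more often instead of entering a hardware C-state, which keeps C-state exit latency off the pipe wakeup path. A simplified, hypothetical sketch of that kind of selection logic (not the actual kernel menu_select() code and not the patch under test; state names and thresholds below are illustrative only):

/*
 * Hypothetical sketch of a "prefer polling for very short idles" decision,
 * loosely modeled on cpuidle governor behavior.  NOT the kernel code.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct idle_state {
	const char *name;
	uint64_t exit_latency_ns;	/* cost of waking from this state */
	uint64_t target_residency_ns;	/* minimum stay for it to pay off */
	bool polling;			/* spins instead of halting the CPU */
};

/* Pick the deepest state whose target residency fits the predicted idle
 * time and whose exit latency satisfies the latency requirement; if no
 * real C-state fits, stay in the polling state and skip the entry/exit
 * cost entirely. */
static int select_state(const struct idle_state *states, int nr,
			uint64_t predicted_idle_ns, uint64_t latency_req_ns)
{
	int best = 0;	/* state 0 is assumed to be the polling state */

	for (int i = 1; i < nr; i++) {
		if (states[i].target_residency_ns > predicted_idle_ns)
			break;	/* too deep: would not amortize entry cost */
		if (states[i].exit_latency_ns > latency_req_ns)
			break;	/* violates the latency requirement */
		best = i;
	}
	return best;
}

int main(void)
{
	/* Illustrative state table; the numbers are made up. */
	const struct idle_state table[] = {
		{ "POLL", 0,      0,      true  },
		{ "C1",   2000,   2000,   false },
		{ "C1E",  10000,  20000,  false },
		{ "C6",   200000, 600000, false },
	};
	/* A 1.5us predicted idle is shorter than C1's target residency,
	 * so the polling state wins. */
	int idx = select_state(table, 4, 1500, UINT64_MAX);
	printf("selected: %s\n", table[idx].name);
	return 0;
}

With a predicted idle time below the shallowest C-state's target residency, the sketch falls back to the polling state, which matches the extra poll_idle samples and the reduced C1/C1E residency reported above.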
Disclaimer: Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance.