Hello,
After upgrading from 5.7.11 to 5.12.10 (I didn't have access to this machine for about 10 months), I've noticed that any time I use all my cores, for example to compile a project, system performance degrades significantly and applications start to stutter. Compiles are also about 3-4x slower on kernels with the regression than on kernels without it. After debugging this for the past 24 hours or so, I've narrowed it down to a change between 5.7.19 and 5.8.1. Sadly, bisecting does not help, because every 5.8 RC kernel I try hangs before init without any apparent errors on the screen (and I don't have a serial cable to capture the kernel output). All the information I have, along with my system information, is listed below.
Reproduction steps (dunno if this helps, but):
1. Boot a kernel with the regression
2. Do something that uses all cores, like compiling the Linux kernel
3. Observe long compile times and stuttering applications (which doesn't happen even on full load with a working kernel)
Regression between kernel versions:
5.7 - working
5.7.11 - working
5.7.19 - working
5.8.1 - broken
5.8.18 - broken
5.12.10 - broken
8ecfa36cd4db3275bf3b6c6f32c7e3c6bb537de2 (master on 2021-06-13) - broken
System information: Gentoo Linux 17.1 amd64
CPU info:
processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
stepping : 3
microcode : 0x27
cpu MHz : 2042.008
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 3
cpu cores : 4
apicid : 7
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts md_clear flush_l1d
vmx flags : vnmi preemption_timer invvpid ept_x_only ept_ad ept_1gb flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest ple shadow_vmcs
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit srbds
bogomips : 7199.96
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
RAM: 16GiB
GPU: VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Redwood XT [Radeon HD 5670/5690/5730]
cat /proc/sys/kernel/tainted: 0
Thanks in advance.
Best Regards, Adam Edge
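The "3-4x slower" figure in the report above is easier to compare across kernels when each build is timed the same way. The following is only a minimal sketch, not something used in this thread: the script name `buildtime.py` and the helper `time_command` are hypothetical, and the command to time (for example a kernel build) is whatever you pass on the command line.

```python
import platform
import subprocess
import sys
import time


def time_command(argv, repeats=1):
    """Run argv `repeats` times and return the wall-clock seconds of each run."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command fails, so a
        # broken build is never mistaken for a fast one.
        subprocess.run(
            argv,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Hypothetical usage: python3 buildtime.py make -j8
    if len(sys.argv) < 2:
        sys.exit("usage: buildtime.py COMMAND [ARGS...]")
    for seconds in time_command(sys.argv[1:]):
        # platform.release() reports the running kernel release on Linux.
        print(f"{platform.release()}: {seconds:.1f} s")
```

Running the same command from a clean tree under a working and a broken kernel gives two numbers that can be posted alongside the version list.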
On Mon, Jun 14, 2021 at 07:21:55PM +0000, Adam Edge wrote:
> Hello,
> After upgrading from 5.7.11 to 5.12.10 (I didn't have access to this machine for about 10 months), I've noticed that any time I use all my cores, for example to compile a project, system performance degrades significantly and applications start to stutter. Compiles are also about 3-4x slower on kernels with the regression than on kernels without it. After debugging this for the past 24 hours or so, I've narrowed it down to a change between 5.7.19 and 5.8.1. Sadly, bisecting does not help, because every 5.8 RC kernel I try hangs before init without any apparent errors on the screen (and I don't have a serial cable to capture the kernel output). All the information I have, along with my system information, is listed below.
> Reproduction steps (dunno if this helps, but):
> 1. Boot a kernel with the regression
> 2. Do something that uses all cores, like compiling the Linux kernel
> 3. Observe long compile times and stuttering applications (which doesn't happen even on full load with a working kernel)
> Regression between kernel versions:
> 5.7 - working
> 5.7.11 - working
> 5.7.19 - working
> 5.8.1 - broken
> 5.8.18 - broken
> 5.12.10 - broken
> 8ecfa36cd4db3275bf3b6c6f32c7e3c6bb537de2 (master on 2021-06-13) - broken
Can you use 'git bisect' to track down the commit that caused the problem?
thanks,
greg k-h
On Tuesday, June 15th, 2021 at 6:01 AM, Greg KH <gregkh@linuxfoundation.org> wrote:
> Can you use 'git bisect' to track down the commit that caused the problem?
As I have mentioned in my previous email, anytime I'm within the v5.8-rc* range of commits, running the kernel fails to get past a certain point at boot. init doesn't get executed, but SysRq keys work and I can reboot from there. Has this happened before? If there is an alternate method of getting the kernel debug logs for this (that doesn't involve a serial connection, as I don't have the equipment for that), I'm happy to get them for you.
Best Regards, Adam Edge
On Tue, Jun 15, 2021 at 06:13:46AM +0000, Adam Edge wrote:
> On Tuesday, June 15th, 2021 at 6:01 AM, Greg KH <gregkh@linuxfoundation.org> wrote:
> > Can you use 'git bisect' to track down the commit that caused the problem?
> As I have mentioned in my previous email, anytime I'm within the v5.8-rc* range of commits, running the kernel fails to get past a certain point at boot. init doesn't get executed, but SysRq keys work and I can reboot from there. Has this happened before? If there is an alternate method of getting the kernel debug logs for this (that doesn't involve a serial connection, as I don't have the equipment for that), I'm happy to get them for you.
That sounds like the offending commit is in that range, which is good, you have narrowed it down.
Just use 'git bisect' to track down the place where this fails at, that's all we want. No need to get a debug log yet, "failing to boot" is a good sign something is going wrong :)
thanks,
greg k-h
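The suggestion above amounts to a binary search over the commit range in which a kernel that fails to boot is marked bad, just like one that boots but is slow under load. The sketch below only models that search. It does not drive git, the revision labels `c0` to `c9` and the helpers `first_bad` and `classify` are hypothetical, and it assumes that once a revision is bad every later one is bad too.

```python
def first_bad(revisions, is_bad):
    """Return the first revision for which is_bad() is true.

    Assumes revisions are ordered oldest to newest, revisions[0] is known
    good, revisions[-1] is known bad, and badness never reverts to good.
    """
    lo, hi = 0, len(revisions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid  # the first bad revision is at mid or earlier
        else:
            lo = mid  # the first bad revision is after mid
    return revisions[hi]


def classify(boots, slow_under_load):
    """Mark a revision bad if it fails to boot or is slow under load."""
    return (not boots) or slow_under_load


if __name__ == "__main__":
    # Hypothetical history: c0 is known good, c9 is known bad, and the
    # behaviour changes at c6 (a made-up position, for illustration only).
    revisions = [f"c{i}" for i in range(10)]
    print(first_bad(revisions, lambda rev: int(rev[1:]) >= 6))
```

With `git bisect`, each probe of `is_bad` corresponds to building and booting the checked-out revision and then running `git bisect good` or `git bisect bad`. Because boot failures count as bad here, the search lands on the earliest failing commit, which could be the boot hang rather than the slowdown if the two turn out to have separate causes.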
On Tuesday, June 15th, 2021 at 6:17 AM, Greg KH <gregkh@linuxfoundation.org> wrote:
> That sounds like the offending commit is in that range, which is good, you have narrowed it down. Just use 'git bisect' to track down the place where this fails at, that's all we want. No need to get a debug log yet, "failing to boot" is a good sign something is going wrong :)
Thank you. I will have to get back to this at the weekend, then. I will inform you (and the list! :^) of the bisect results as soon as possible.
Best Regards, Adam Edge
linux-stable-mirror@lists.linaro.org