Re: [REGRESSION] 6.7.1: md: raid5 hang and unresponsive system; successfully bisected

25 Jan 2024

Thanks for the information!
On Tue, Jan 23, 2024 at 3:58 PM Dan Moulding dan@danm.net wrote:
...
...
It appears the md thread has hit an infinite loop, so I would like to
know what it is doing. We can probably get that information with the
perf tool, something like:
    perf record -a
    perf report
Here you go!
# Total Lost Samples: 0
#
# Samples: 78K of event 'cycles'
# Event count (approx.): 83127675745
#
# Overhead  Command          Shared Object      Symbol
# ........  ...............  .................  ............................
#
    49.31%  md0_raid5        [kernel.kallsyms]  [k] handle_stripe
    18.63%  md0_raid5        [kernel.kallsyms]  [k] ops_run_io
     6.07%  md0_raid5        [kernel.kallsyms]  [k] handle_active_stripes.isra.0
     5.50%  md0_raid5        [kernel.kallsyms]  [k] do_release_stripe
     3.09%  md0_raid5        [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
     2.48%  md0_raid5        [kernel.kallsyms]  [k] r5l_write_stripe
     1.89%  md0_raid5        [kernel.kallsyms]  [k] md_wakeup_thread
     1.45%  ksmd             [kernel.kallsyms]  [k] ksm_scan_thread
     1.37%  md0_raid5        [kernel.kallsyms]  [k] stripe_is_lowprio
     0.87%  ksmd             [kernel.kallsyms]  [k] memcmp
     0.68%  ksmd             [kernel.kallsyms]  [k] xxh64
     0.56%  md0_raid5        [kernel.kallsyms]  [k] __wake_up_common
     0.52%  md0_raid5        [kernel.kallsyms]  [k] __wake_up
     0.46%  ksmd             [kernel.kallsyms]  [k] mtree_load
     0.44%  ksmd             [kernel.kallsyms]  [k] try_grab_page
     0.40%  ksmd             [kernel.kallsyms]  [k] follow_p4d_mask.constprop.0
     0.39%  md0_raid5        [kernel.kallsyms]  [k] r5l_log_disk_error
     0.37%  md0_raid5        [kernel.kallsyms]  [k] _raw_spin_lock_irq
     0.33%  md0_raid5        [kernel.kallsyms]  [k] release_stripe_list
     0.31%  md0_raid5        [kernel.kallsyms]  [k] release_inactive_stripe_list
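As a rough cross-check of the profile above, the listed rows can be summed per command to see how much of the sampled time falls in the md thread. This is only a sketch: the rows are copied from the report above, and the helper name overhead_by_command is hypothetical, not part of perf or the kernel.

```python
from collections import defaultdict

# Rows copied from the perf report above: (overhead %, command, symbol).
ROWS = [
    (49.31, "md0_raid5", "handle_stripe"),
    (18.63, "md0_raid5", "ops_run_io"),
    (6.07, "md0_raid5", "handle_active_stripes.isra.0"),
    (5.50, "md0_raid5", "do_release_stripe"),
    (3.09, "md0_raid5", "_raw_spin_lock_irqsave"),
    (2.48, "md0_raid5", "r5l_write_stripe"),
    (1.89, "md0_raid5", "md_wakeup_thread"),
    (1.45, "ksmd", "ksm_scan_thread"),
    (1.37, "md0_raid5", "stripe_is_lowprio"),
    (0.87, "ksmd", "memcmp"),
    (0.68, "ksmd", "xxh64"),
    (0.56, "md0_raid5", "__wake_up_common"),
    (0.52, "md0_raid5", "__wake_up"),
    (0.46, "ksmd", "mtree_load"),
    (0.44, "ksmd", "try_grab_page"),
    (0.40, "ksmd", "follow_p4d_mask.constprop.0"),
    (0.39, "md0_raid5", "r5l_log_disk_error"),
    (0.37, "md0_raid5", "_raw_spin_lock_irq"),
    (0.33, "md0_raid5", "release_stripe_list"),
    (0.31, "md0_raid5", "release_inactive_stripe_list"),
]


def overhead_by_command(rows):
    """Sum the overhead percentage of the listed rows for each command."""
    totals = defaultdict(float)
    for overhead, command, _symbol in rows:
        totals[command] += overhead
    return dict(totals)


totals = overhead_by_command(ROWS)
# Print commands from the largest share to the smallest.
for command, total in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{command}: {total:.2f}%")
```

Over the rows listed, md0_raid5 accounts for about 90.8% of the samples and ksmd for about 4.3%, with handle_stripe alone close to half.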
It appears the thread is indeed doing something. I haven't had any luck
reproducing this on my hosts. Could you please check whether the following
change fixes the issue (without reverting 0de40f76d567)? I will keep trying
to reproduce the issue on my side.
Junxiao,
Please also help look into this.
Thanks,
Song
