On Mon, Aug 22, 2022 at 3:59 PM Song Liu <song@kernel.org> wrote:
On Mon, Aug 22, 2022 at 3:44 PM Thomas Deutschmann <whissi@whissi.de> wrote:
On 2022-08-22 23:52, Song Liu wrote:
Hmm.. I still cannot repro the hang in my test. I have:
[root@eth50-1 ~]# mount | grep mnt
/dev/md0 on /root/mnt type ext4 (rw,relatime,stripe=384)
[root@eth50-1 ~]# lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sr0      11:0    1 1024M  0 rom
vda     253:0    0   32G  0 disk
├─vda1  253:1    0    2G  0 part  /boot
└─vda2  253:2    0   30G  0 part  /
nvme0n1 259:0    0    4G  0 disk
└─md0     9:0    0   12G  0 raid5 /root/mnt
nvme2n1 259:1    0    4G  0 disk
└─md0     9:0    0   12G  0 raid5 /root/mnt
nvme3n1 259:2    0    4G  0 disk
└─md0     9:0    0   12G  0 raid5 /root/mnt
nvme1n1 259:3    0    4G  0 disk
└─md0     9:0    0   12G  0 raid5 /root/mnt
[root@eth50-1 ~]# history
  381  fio iou/repro.fio
  382  fsfreeze --freeze /root/mnt
  383  fsfreeze --unfreeze /root/mnt
  384  fio iou/repro.fio
  385  fsfreeze --freeze /root/mnt
  386  fsfreeze --unfreeze /root/mnt
       ^^^^^^^^^^^^^^ all works fine.
Did I miss something?
No :(
I am currently not testing against the mdraid but this shouldn't matter.
However, it looks like you don't test on bare metal, do you?
I tried to test on VMware Workstation 16 myself but VMware's nvme implementation is currently broken (https://github.com/vmware/open-vm-tools/issues/579).
I am testing with QEMU emulator version 6.2.0. I can also test with bare metal.
OK, now I got a repro with bare metal: nvme+xfs.
This is a 5.19-based kernel; the stack is:
[ 867.091579] INFO: task fsfreeze:49972 blocked for more than 122 seconds.
[ 867.104969] Tainted: G S 5.19.0-0_fbk0_rc1_gc225658be66e #1
[ 867.119750] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 867.135381] task:fsfreeze state:D stack: 0 pid:49972 ppid: 22571 flags:0x00004000
[ 867.135388] Call Trace:
[ 867.135390] <TASK>
[ 867.135394] __schedule+0x3d7/0x700
[ 867.135404] schedule+0x39/0x90
[ 867.135409] percpu_down_write+0x234/0x270
[ 867.135414] freeze_super+0x8a/0x160
[ 867.135422] do_vfs_ioctl+0x8b5/0x920
[ 867.135430] __x64_sys_ioctl+0x52/0xb0
[ 867.135435] do_syscall_64+0x3d/0x90
[ 867.135441] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 867.135447] RIP: 0033:0x7f034f23fcdb
[ 867.135453] RSP: 002b:00007ffe2bdfebf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 867.135457] RAX: ffffffffffffffda RBX: 0000000000000066 RCX: 00007f034f23fcdb
[ 867.135460] RDX: 0000000000000000 RSI: 00000000c0045877 RDI: 0000000000000003
[ 867.135463] RBP: 0000000000000003 R08: 0000000000000001 R09: 0000000000000000
[ 867.135466] R10: 0000000000001000 R11: 0000000000000246 R12: 00007ffe2bdff334
[ 867.135469] R13: 00005650ff68dc40 R14: ffffffff00000000 R15: 00005650ff68c0f5
[ 867.135474] </TASK>
I am not very familiar with this code, so I will need more time to look into it.
Thomas, have you tried to bisect with the fio repro?
Thanks,
Song