On Tue, Jul 11, 2023 at 07:53:54AM +1000, Chris Dunlop wrote:
Hi,
This box is newly booted into linux v6.1.35 (2 days ago), it was previously running v5.15.118 without any problems (other than that fixed by "5e672cd69f0a xfs: non-blocking inodegc pushes", the reason for the upgrade).
I have rm operations on two files that have been stuck for in excess of 22 hours and 18 hours respectively:
$ ps -opid,lstart,state,wchan=WCHAN-xxxxxxxxxxxxxxx,cmd -C rm PID STARTED S WCHAN-xxxxxxxxxxxxxxx CMD 2379355 Mon Jul 10 09:07:57 2023 D vfs_unlink /bin/rm -rf /aaa/5539_tmp 2392421 Mon Jul 10 09:18:27 2023 D down_write_nested /bin/rm -rf /aaa/5539_tmp 2485728 Mon Jul 10 09:28:57 2023 D down_write_nested /bin/rm -rf /aaa/5539_tmp 2488254 Mon Jul 10 09:39:27 2023 D down_write_nested /bin/rm -rf /aaa/5539_tmp 2491180 Mon Jul 10 09:49:58 2023 D down_write_nested /bin/rm -rf /aaa/5539_tmp 3014914 Mon Jul 10 13:00:33 2023 D vfs_unlink /bin/rm -rf /bbb/5541_tmp 3095893 Mon Jul 10 13:11:03 2023 D down_write_nested /bin/rm -rf /bbb/5541_tmp 3098809 Mon Jul 10 13:21:35 2023 D down_write_nested /bin/rm -rf /bbb/5541_tmp 3101387 Mon Jul 10 13:32:06 2023 D down_write_nested /bin/rm -rf /bbb/5541_tmp 3195017 Mon Jul 10 13:42:37 2023 D down_write_nested /bin/rm -rf /bbb/5541_tmp
The "rm"s are run from a process that's obviously tried a few times to get rid of these files.
There's nothing extraordinary about the files in terms of size:
$ ls -ltrn --full-time /aaa/5539_tmp /bbb/5541_tmp -rw-rw-rw- 1 1482 1482 7870643 2023-07-10 06:07:58.684036505 +1000 /aaa/5539_tmp -rw-rw-rw- 1 1482 1482 701240 2023-07-10 10:00:34.181064549 +1000 /bbb/5541_tmp
As hinted by the WCHAN in the ps output above, each "primary" rm (i.e. the first one run on each file) stack trace looks like:
[<0>] vfs_unlink+0x48/0x270 [<0>] do_unlinkat+0x1f5/0x290 [<0>] __x64_sys_unlinkat+0x3b/0x60 [<0>] do_syscall_64+0x34/0x80 [<0>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
And each "secondary" rm (i.e. the subsequent ones on each file) stack trace looks like:
== blog-230710-xfs-rm-stuckd [<0>] down_write_nested+0xdc/0x100 [<0>] do_unlinkat+0x10d/0x290 [<0>] __x64_sys_unlinkat+0x3b/0x60 [<0>] do_syscall_64+0x34/0x80 [<0>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
Multiple kernel strack traces don't show vfs_unlink or anything related that I can see, or anything else consistent or otherwise interesting. Most cores are idle.
Each of /aaa and /bbb are separate XFS filesystems:
$ xfs_info /aaa meta-data=/dev/mapper/aaa isize=512 agcount=2, agsize=268434432 blks = sectsz=4096 attr=2, projid32bit=1 = crc=1 finobt=1, sparse=1, rmapbt=1 = reflink=1 bigtime=1 inobtcount=1 data = bsize=4096 blocks=536868864, imaxpct=5 = sunit=256 swidth=256 blks naming =version 2 bsize=4096 ascii-ci=0, ftype=1 log =internal log bsize=4096 blocks=262143, version=2 = sectsz=4096 sunit=1 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0
$ xfs_info /bbb meta-data=/dev/mapper/bbb isize=512 agcount=8, agsize=268434432 blks = sectsz=4096 attr=2, projid32bit=1 = crc=1 finobt=1, sparse=1, rmapbt=1 = reflink=1 bigtime=1 inobtcount=1 data = bsize=4096 blocks=1879047168, imaxpct=5 = sunit=256 swidth=256 blks naming =version 2 bsize=4096 ascii-ci=0, ftype=1 log =internal log bsize=4096 blocks=521728, version=2 = sectsz=4096 sunit=1 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0
There's plenty of free space at the fs level:
$ df -h /aaa /bbb Filesystem Size Used Avail Use% Mounted on /dev/mapper/aaa 2.0T 551G 1.5T 27% /aaa /dev/mapper/bbb 7.0T 3.6T 3.5T 52% /bbb
The fses are on sparse ceph/rbd volumes, the underlying storage tells me they're 50-60% utilised:
aaa: provisioned="2048G" used="1015.9G" bbb: provisioned="7168G" used="4925.3G"
Where to from here?
I'm guessing only a reboot is going to unstick this. Anything I should be looking at before reverting to v5.15.118?
...subsequent to starting writing all this down I have another two sets of rms stuck, again on unremarkable files, and on two more separate filesystems.
...oh. And an 'ls' on those files is hanging. The reboot has become more urgent.
Smells like regression resurfaced, right? I mean, does 5e672cd69f0a53 not completely fix your reported blocking regression earlier?
I'm kinda confused...
On Tue, Jul 11, 2023 at 07:53:35AM +0700, Bagas Sanjaya wrote:
On Tue, Jul 11, 2023 at 07:53:54AM +1000, Chris Dunlop wrote:
Hi,
This box is newly booted into linux v6.1.35 (2 days ago), it was previously running v5.15.118 without any problems (other than that fixed by "5e672cd69f0a xfs: non-blocking inodegc pushes", the reason for the upgrade).
I have rm operations on two files that have been stuck for in excess of 22 hours and 18 hours respectively:
Smells like regression resurfaced, right? I mean, does 5e672cd69f0a53 not completely fix your reported blocking regression earlier?
I'm kinda confused...
It looks like a completely different problem. I was wanting 5e672cd69f0a53 as a resolution to this:
https://lore.kernel.org/all/20220510215411.GT1098723@dread.disaster.area/T/
In that case there were kernel stacks etc. showing inodegc stuff, and the problem eventually resolved itself once the massive amount of work had been processed (5e672cd69f0a53 puts that work into the background).
In this case it looks like it's just a pure hang or deadlock - there's apparently nothing happening.
Cheers,
Chris
On Tue, Jul 11, 2023 at 07:53:35AM +0700, Bagas Sanjaya wrote:
On Tue, Jul 11, 2023 at 07:53:54AM +1000, Chris Dunlop wrote:
Hi,
This box is newly booted into linux v6.1.35 (2 days ago), it was previously running v5.15.118 without any problems (other than that fixed by "5e672cd69f0a xfs: non-blocking inodegc pushes", the reason for the upgrade).
I have rm operations on two files that have been stuck for in excess of 22 hours and 18 hours respectively:
$ ps -opid,lstart,state,wchan=WCHAN-xxxxxxxxxxxxxxx,cmd -C rm PID STARTED S WCHAN-xxxxxxxxxxxxxxx CMD 2379355 Mon Jul 10 09:07:57 2023 D vfs_unlink /bin/rm -rf /aaa/5539_tmp 2392421 Mon Jul 10 09:18:27 2023 D down_write_nested /bin/rm -rf /aaa/5539_tmp 2485728 Mon Jul 10 09:28:57 2023 D down_write_nested /bin/rm -rf /aaa/5539_tmp 2488254 Mon Jul 10 09:39:27 2023 D down_write_nested /bin/rm -rf /aaa/5539_tmp 2491180 Mon Jul 10 09:49:58 2023 D down_write_nested /bin/rm -rf /aaa/5539_tmp 3014914 Mon Jul 10 13:00:33 2023 D vfs_unlink /bin/rm -rf /bbb/5541_tmp 3095893 Mon Jul 10 13:11:03 2023 D down_write_nested /bin/rm -rf /bbb/5541_tmp 3098809 Mon Jul 10 13:21:35 2023 D down_write_nested /bin/rm -rf /bbb/5541_tmp 3101387 Mon Jul 10 13:32:06 2023 D down_write_nested /bin/rm -rf /bbb/5541_tmp 3195017 Mon Jul 10 13:42:37 2023 D down_write_nested /bin/rm -rf /bbb/5541_tmp
The "rm"s are run from a process that's obviously tried a few times to get rid of these files.
There's nothing extraordinary about the files in terms of size:
$ ls -ltrn --full-time /aaa/5539_tmp /bbb/5541_tmp -rw-rw-rw- 1 1482 1482 7870643 2023-07-10 06:07:58.684036505 +1000 /aaa/5539_tmp -rw-rw-rw- 1 1482 1482 701240 2023-07-10 10:00:34.181064549 +1000 /bbb/5541_tmp
As hinted by the WCHAN in the ps output above, each "primary" rm (i.e. the first one run on each file) stack trace looks like:
[<0>] vfs_unlink+0x48/0x270 [<0>] do_unlinkat+0x1f5/0x290 [<0>] __x64_sys_unlinkat+0x3b/0x60 [<0>] do_syscall_64+0x34/0x80 [<0>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
This looks to be stuck on the target inode lock (i.e. the locks for the inodes at /aaa/5539_tmp and /bbb/5541_tmp).
What's holding these inode locks? This hasn't even got to XFS yet here, so there's something else going on in the background. Attached the full output of 'echo w > /proc/sysrq-trigger' and 'echo t > /proc/sysrq-trigger', please?
And each "secondary" rm (i.e. the subsequent ones on each file) stack trace looks like:
== blog-230710-xfs-rm-stuckd [<0>] down_write_nested+0xdc/0x100 [<0>] do_unlinkat+0x10d/0x290 [<0>] __x64_sys_unlinkat+0x3b/0x60 [<0>] do_syscall_64+0x34/0x80 [<0>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
These are likely all stuck on the parent directory inode lock (i.e. /aaa and /bbb).
Where to from here?
I'm guessing only a reboot is going to unstick this. Anything I should be looking at before reverting to v5.15.118?
...subsequent to starting writing all this down I have another two sets of rms stuck, again on unremarkable files, and on two more separate filesystems.
What's an "unremarkable file" look like? Is is a reflink copy of something else, a hard link, a small/large regular data file or something else?
...oh. And an 'ls' on those files is hanging. The reboot has become more urgent.
Yup, that's most likely getting stuck on the directory locks that the unlinks are holding....
-Dave.
linux-stable-mirror@lists.linaro.org