Re: tmpfs hang after v6.12-rc6

15 Nov 2024

      On Sat, Nov 16, 2024 at 01:33:19AM +0900, Jeongjun Park wrote:
...
2024년 11월 16일 (토) 오전 1:25, Chuck Lever chuck.lever@oracle.com님이 작성:
...
On Fri, Nov 15, 2024 at 11:04:56AM -0500, Chuck Lever wrote:
...
I've found that NFS access to an exported tmpfs file system hangs
indefinitely when the client first performs a GETATTR. The hanging
nfsd thread is waiting for the inode lock in shmem_getattr():
task:nfsd            state:D stack:0     pid:1775  tgid:1775  ppid:2      flags:0x00004000
Call Trace:
 <TASK>
 __schedule+0x770/0x7b0
 schedule+0x33/0x50
 schedule_preempt_disabled+0x19/0x30
 rwsem_down_read_slowpath+0x206/0x230
 down_read+0x3f/0x60
 shmem_getattr+0x84/0xf0
 vfs_getattr_nosec+0x9e/0xc0
 vfs_getattr+0x49/0x50
 fh_getattr+0x43/0x50 [nfsd]
 fh_fill_pre_attrs+0x4e/0xd0 [nfsd]
 nfsd4_open+0x51f/0x910 [nfsd]
 nfsd4_proc_compound+0x492/0x5d0 [nfsd]
 nfsd_dispatch+0x117/0x1f0 [nfsd]
 svc_process_common+0x3b2/0x5e0 [sunrpc]
 ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
 svc_process+0xcf/0x130 [sunrpc]
 svc_recv+0x64e/0x750 [sunrpc]
 ? __wake_up_bit+0x4b/0x60
 ? __pfx_nfsd+0x10/0x10 [nfsd]
 nfsd+0xc6/0xf0 [nfsd]
 kthread+0xed/0x100
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x2e/0x50
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1a/0x30
 </TASK>
I bisected the problem to:
d949d1d14fa281ace388b1de978e8f2cd52875cf is the first bad commit
commit d949d1d14fa281ace388b1de978e8f2cd52875cf
Author:     Jeongjun Park aha310510@gmail.com
AuthorDate: Mon Sep 9 21:35:58 2024 +0900
Commit:     Andrew Morton akpm@linux-foundation.org
CommitDate: Mon Oct 28 21:40:39 2024 -0700
mm: shmem: fix data-race in shmem_getattr()

...
Link: https://lkml.kernel.org/r/20240909123558.70229-1-aha310510@gmail.com
Fixes: 44a30220bc0a ("shmem: recalculate file inode when fstat")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Reported-by: syzbot <syzkaller@googlegroup.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

which first appeared in v6.12-rc6, and adds the line that is waiting
on the inode lock when my NFS server hangs.
I haven't yet found the process that is holding the inode lock for
this inode.
It is likely that the caller (nfsd4_open()-> fh_fill_pre_attrs()) is
already holding the inode semaphore in this case.
Thanks for letting me know!
It seems that the previous patch I wrote was wrong in how to prevent data-race.
It seems that the problem occurs in nfsd because nfsd4_create_file() already
holds the inode_lock.
After further analysis, I found that this data-race mainly occurs when
vfs_statx_path does not acquire the inode_lock, and in other filesystems,
it is confirmed that inode_lock is acquired in many cases, so I will send a
new patch that fixes this problem right away.
Thanks for your quick response!
My brief sample of file system ->getattr methods shows that these
functions do not grab the inode semaphore at all when calling
generic_fillattr(). Likely they expect the method's caller to take
it.
I strongly prefer to see this commit reverted for v6.12-rc first,
and then the new fix should be merged via a normal merge window to
permit a lengthy period of testing.
...
...
...
Because this commit addresses only a KCSAN splat that has been
present since v4.3, and does not address a reported behavioral
issue, I respectfully request that this commit be reverted
immediately so that it does not appear in v6.12 final.
Troubleshooting and testing should continue until a fix to the KCSAN
issue can be found that does not deadlock NFS exports of tmpfs.
--
Chuck Lever
--
Chuck Lever
-- 
Chuck Lever

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: tmpfs hang after v6.12-rc6