The precise explanation is in the commit message.
I'm asking that you please apply and test these patches on v6.11 and v6.1, at the very least, before requesting that Greg apply these patches to the LTS kernels. Greg needs very clear instructions about how he should proceed, as well as some evidence that we are not asking him to apply patches that will break the target kernels.
It's very reasonable to have more tests. I 100% agree.
Per your request, I performed the testing under kernel 6.1 and 6.11. We've already tested under kernel 6.6 using our test farm.
Conclusion: it works as expected - the issue reproduces without the fix, it's no longer reproducible with the fix applied.
To Greg: please cherry-pick the commits 1, 2, 3, and 4 (see below for what these commits are) and apply to kernel 6.1 all the way up to 6.11 if that's appropriate.
Here is how it's done:
- cherry-pick nfsd commit(s) - build and install kernel - run my reproducer - check nfsd_file_allocations and nfsd_file_releases, see if it reproduces or not
Commits to cherry-pick:
1. nfsd: add list_head nf_gc to struct nfsd_file https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
2. nfsd: fix refcount leak when file is unhashed after being found https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
3. nfsd: remove unneeded EEXIST error check in nfsd_do_file_acquire https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
4. nfsd: count nfsd_file allocations https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
Note: commit #4 is very useful for detecting if a leak happens or not, otherwise it would be very time-consuming to find out. It's not needed but better to have.
Kernel 6.1.110 + commit 4 1st run: nfsd_file_allocations: 811 nfsd_file_releases: 791 leaks? Yes 2nd run: nfsd_file_allocations: 554 nfsd_file_releases: 548 leaks? Yes
Kernel 6.1.110 + commits 1,2,3,4 1st run: nfsd_file_allocations: 816 nfsd_file_releases: 816 leaks? No 2nd run: nfsd_file_allocations: 9755 nfsd_file_releases: 9755 leaks? No
Kernel 6.11.0 + commit 4 1st run: nfsd_file_allocations: 3662 nfsd_file_releases: 3659 leaks? Yes 2nd run: nfsd_file_allocations: 2177 nfsd_file_releases: 2175 leaks? Yes
Kernel 6.11.0 + commits 1,2,3,4 1st run: nfsd_file_allocations: 2136 nfsd_file_releases: 2136 leaks? No 2nd run: nfsd_file_allocations: 9094 nfsd_file_releases: 9094 leaks? No
On Fri, Oct 4, 2024 at 2:09 PM Chuck Lever III chuck.lever@oracle.com wrote:
On Oct 4, 2024, at 1:59 PM, Youzhong Yang youzhong@gmail.com wrote:
The explanation of how it can happen is in the commit message. Using list_head 'nf_lru' for two purposes (the LRU list and dispose list) is problematic.
"is problematic" is not an adequate or precise explanation of how the code is failing. But as I said, I'm not objecting, just noting that we don't understand why this change addresses the problem.
In other words, we have test experience that shows the patch is safe to apply, but no deep explanation of why it is effective.
I also mentioned my reproducer in one of the e-mail threads, here it is if it still matters:
I'm asking that you please apply and test these patches on v6.11 and v6.1, at the very least, before requesting that Greg apply these patches to the LTS kernels. Greg needs very clear instructions about how he should proceed, as well as some evidence that we are not asking him to apply patches that will break the target kernels.
And, please confirm that 4/4 is needed. I can't see any obvious reason why it is necessary to prevent a leak.
-- Chuck Lever