The explanation of how it can happen is in the commit message. Using the list_head 'nf_lru' for two purposes (the LRU list and the dispose list) is problematic. I also mentioned my reproducer in one of the e-mail threads; here it is if it still matters:
https://github.com/youzhongyang/nfsd-file-leaks
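To illustrate the general hazard of one list_head serving two lists, here is a minimal userspace sketch. It is not the kernel code and does not reproduce the actual race (that is described in the commit message). The list helpers are simplified stand-ins for the kernel's <linux/list.h>, and struct file_entry is a hypothetical stand-in for struct nfsd_file that keeps only the two nodes. The assumption being shown: if a second user links the same node onto a dispose list while it is still on the LRU, the LRU's pointers no longer agree; with a separate nf_gc node, both lists stay intact.

```c
#include <assert.h>

/* Userspace stand-in for the kernel's circular doubly linked list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Link n at the tail of head. Does NOT unlink n from any other list first. */
static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* A node is consistent when both of its neighbours point back at it. */
static int node_ok(const struct list_head *n)
{
	return n->next->prev == n && n->prev->next == n;
}

/* Hypothetical stand-in for struct nfsd_file; only the two nodes matter. */
struct file_entry {
	struct list_head nf_lru;	/* membership in the LRU list */
	struct list_head nf_gc;		/* membership in a dispose list */
};

/* One node used for both lists: returns 1 if the LRU is still intact. */
static int shared_node_demo(void)
{
	struct list_head lru, dispose;
	struct file_entry a, x;

	list_init(&lru);
	list_init(&dispose);
	list_add_tail(&a.nf_lru, &lru);
	list_add_tail(&x.nf_lru, &lru);

	/* A second user queues x for disposal through the same nf_lru node. */
	list_add_tail(&x.nf_lru, &dispose);

	return node_ok(&lru) && node_ok(&a.nf_lru);
}

/* Separate nodes per list: returns 1 if both lists are still intact. */
static int separate_nodes_demo(void)
{
	struct list_head lru, dispose;
	struct file_entry a, x;

	list_init(&lru);
	list_init(&dispose);
	list_add_tail(&a.nf_lru, &lru);
	list_add_tail(&x.nf_lru, &lru);

	/* Disposal uses its own node, so the LRU linkage is untouched. */
	list_add_tail(&x.nf_gc, &dispose);

	return node_ok(&lru) && node_ok(&a.nf_lru) &&
	       node_ok(&x.nf_lru) && node_ok(&dispose);
}
```

Calling shared_node_demo() reports a broken LRU, while separate_nodes_demo() reports both lists consistent. The sketch only checks neighbour pointers rather than walking the list, because a walk over the corrupted LRU would never get back to its head.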
Thank you.
On Fri, Oct 4, 2024 at 1:23 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
On Oct 4, 2024, at 10:35 AM, Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
On Fri, Oct 04, 2024 at 04:26:39PM +0200, Greg Kroah-Hartman wrote:
On Fri, Oct 04, 2024 at 10:17:34AM -0400, Youzhong Yang wrote:
Here is 1/4 in the context of Chuck's e-mail reply:
nfsd: add list_head nf_gc to struct nfsd_file
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
Sorry, again, I don't know what to do here :(
When we tested 1/4 on upstream, it was neither sufficient nor necessary to address the leak, IIRC. And I don't recall ever seeing a clear explanation about why that change is necessary. That's why we consider it a defensive change, not a bug fix.
But it shouldn't be harmful to backport it to LTS kernels. I don't object to a backport. To me, though, it seems to be lacking a complete rationale.
Ok, in digging through the thread, do you feel this one should also be backported to the 6.11.y tree?
It's not clear that it is needed in v6.11 without testing. Neither Jeff nor I have a reproducer for that leak, though.
4/4 seems like an ABI change, and again, testing is needed to see whether its backport is truly needed. So far we know only that when all 4 are backported, the leak goes away. That is not proof that 4/4 by itself is required.
If so, how far back?
LTS kernels all the way back to v5.10.y are likely to have this leak, since they have all the NFSD filecache backports already. 5.4.y is generally too old to be reparable.
I would prefer more testing of this backport on the stable kernels, but I understand if that isn't practical.
-- Chuck Lever