On May 20, 2022, at 7:43 PM, Chuck Lever III <chuck.lever@oracle.com> wrote:
On May 20, 2022, at 6:24 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
On Fri, 2022-05-20 at 21:52 +0000, Chuck Lever III wrote:
On May 20, 2022, at 12:40 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
On Fri, 2022-05-20 at 15:36 +0000, Chuck Lever III wrote:
On May 11, 2022, at 10:36 AM, Chuck Lever III <chuck.lever@oracle.com> wrote:
> On May 11, 2022, at 10:23 AM, Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Wed, May 11, 2022 at 02:16:19PM +0000, Chuck Lever III wrote:
>>
>>> On May 11, 2022, at 8:38 AM, Greg KH <gregkh@linuxfoundation.org> wrote:
>>>
>>> On Wed, May 11, 2022 at 12:03:13PM +0200, Wolfgang Walter wrote:
>>>> Hi,
>>>>
>>>> starting with 5.4.188 we see a massive performance regression on our
>>>> nfs-server. It is basically serving requests very slowly, with CPU
>>>> utilization of 100% (with 5.4.187 and earlier it is 10%), so that it
>>>> is unusable as a fileserver.
>>>>
>>>> The culprit is one (or both) of these commits:
>>>>
>>>> c32f1041382a88b17da5736886da4a492353a1bb "nfsd: cleanup nfsd_file_lru_dispose()"
>>>> 628adfa21815f74c04724abc85847f24b5dd1645 "nfsd: Containerise filecache laundrette"
>>>>
>>>> (upstream 36ebbdb96b694dd9c6b25ad98f2bbd263d022b63 and
>>>> 9542e6a643fc69d528dfb3303f145719c61d3050)
>>>>
>>>> If I revert them in v5.4.192, the kernel works as before and
>>>> performance is ok again.
>>>>
>>>> I did not try to revert them one by one, as any disruption of our
>>>> nfs-server is a severe problem for us, and I'm not sure whether they
>>>> are related.
>>>>
>>>> 5.10 and 5.15 both always performed very badly on our nfs-server in a
>>>> similar way, so we were stuck with 5.4.
>>>>
>>>> I now think this is because of 36ebbdb96b694dd9c6b25ad98f2bbd263d022b63
>>>> and/or 9542e6a643fc69d528dfb3303f145719c61d3050, though I haven't
>>>> tried to revert them in 5.15 yet.
>>>
>>> Odds are 5.18-rc6 is also a problem?
>>
>> We believe that
>>
>> 6b8a94332ee4 ("nfsd: Fix a write performance regression")
>>
>> addresses the performance regression. It was merged into 5.18-rc.
>
> And into 5.17.4 if someone wants to try that release.
I don't have a lot of time to backport this one myself, so I welcome anyone who wants to apply that commit to their favorite LTS kernel and test it for us.
>>> If so, I'll just wait for the fix to get into Linus's tree as this does
>>> not seem to be a stable-tree-only issue.
>>
>> Unfortunately I've received a recent report that the fix introduces
>> a "sleep while spinlock is held" for NFSv4.0 in rare cases.
>
> Ick, not good, any potential fixes for that?
Not yet. I was at LSF last week, so I've just started digging into this one. I've confirmed that the report is a real bug, but we still don't know how hard it is to hit with real workloads.
We believe the following, which should be part of the first NFSD pull request for 5.19, will properly address the splat.
https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git/commit/?h=for-...
Uh... What happens if you have 2 simultaneous calls to nfsd4_release_lockowner() for the same file? i.e. 2 separate processes owned by the same user, both locking the same file.
Can't that cause the 'putlist' to get corrupted when both callers add the same nf->nf_putfile to two separate lists?
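To spell out what I mean by corruption: an embedded list_head can only live on one list at a time, so a second list_add() of the same node overwrites its links without removing it from the first list, leaving that list pointing into a node whose links now belong to the other one. A minimal userspace sketch (my own toy reimplementation of the list primitives, not the kernel's <linux/list.h>):

#include <stdio.h>

struct list_head { struct list_head *prev, *next; };

static void list_init(struct list_head *h) { h->prev = h->next = h; }

static void list_add(struct list_head *node, struct list_head *head)
{
	node->next = head->next;
	node->prev = head;
	head->next->prev = node;
	head->next = node;
}

int main(void)
{
	struct list_head putlist_a, putlist_b, nf_putfile;

	list_init(&putlist_a);
	list_init(&putlist_b);

	list_add(&nf_putfile, &putlist_a);	/* caller 1 queues the nfsd_file */
	list_add(&nf_putfile, &putlist_b);	/* caller 2 re-queues the same node */

	/* putlist_a is now corrupt: it still points at the node, but the
	 * node's links belong to putlist_b. */
	printf("putlist_a.next == &nf_putfile: %d\n", putlist_a.next == &nf_putfile);
	printf("nf_putfile.prev == &putlist_a: %d\n", nf_putfile.prev == &putlist_a);
	return 0;
}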
IIUC, cl_lock serializes the two RELEASE_LOCKOWNER calls.
The first call finds the lockowner in cl_ownerstr_hashtbl and unhashes it before releasing cl_lock.
Then the second cannot find that lockowner, thus it can't requeue it for bulk_put.
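In other words, here is a compile-and-run userspace sketch of the ordering I have in mind, with a pthread mutex standing in for cl_lock and a bool standing in for the cl_ownerstr_hashtbl lookup (not the actual nfs4state.c code):

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t cl_lock = PTHREAD_MUTEX_INITIALIZER;
static bool lockowner_hashed = true;	/* stand-in for the hash table entry */

/* Both RELEASE_LOCKOWNER calls run this; only one can win the race. */
static void release_lockowner(const char *who)
{
	pthread_mutex_lock(&cl_lock);
	if (!lockowner_hashed) {
		/* Lookup fails: the other caller already unhashed it. */
		pthread_mutex_unlock(&cl_lock);
		printf("%s: lockowner not found, nothing to queue\n", who);
		return;
	}
	lockowner_hashed = false;	/* unhash before dropping cl_lock */
	pthread_mutex_unlock(&cl_lock);
	printf("%s: unhashed lockowner, queues its files for bulk put\n", who);
}

static void *thr(void *arg)
{
	release_lockowner(arg);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, thr, "caller 1");
	pthread_create(&t2, NULL, thr, "caller 2");
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return 0;
}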
Am I missing something?
In the example I quoted, there are 2 separate processes running on the client. Those processes could share the same open owner + open stateid, and hence the same struct nfs4_file, since that depends only on the process credentials matching. However they will not normally share a lock owner, since POSIX does not expect different processes to share locks.
IOW: The point is that one can relatively easily create 2 different lock owners with different lock stateids that share the same underlying struct nfs4_file.
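Schematically, using hypothetical types just to pin down the sharing (the real definitions live in fs/nfsd/state.h and use different names):

/* Hypothetical types; the real server uses struct nfs4_file,
 * struct nfs4_lockowner, and lock stateids with other field names. */
struct file_obj { int id; };		/* one per inode: the shared nfs4_file */

struct lock_state {
	struct file_obj *file;		/* shared between both owners... */
	const char *lockowner;		/* ...but the owner is per-process */
};

/* Two processes, two lockowners, two lock stateids, ONE file: */
static struct file_obj f = { .id = 1 };
static struct lock_state proc_a = { .file = &f, .lockowner = "owner A" };
static struct lock_state proc_b = { .file = &f, .lockowner = "owner B" };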
Is there a similar exposure if two different clients are locking the same file? If so, then we can't use a per-nfs4_client semaphore to serialize access to the nf_putfile field.
I had a thought about an alternate approach.
Create a second nfsd_file_put() API that is not allowed to sleep. Let's call it "nfsd_file_put_async()". Teach check_for_locks() to use that instead of nfsd_file_put().
Here's where I'm a little fuzzy: nfsd_file_put_async() could do something like:
void nfsd_file_put_async(struct nfsd_file *nf)
{
	/* Drop our reference; if it was the last one, close out the
	 * cached files for this inode without sleeping here. */
	if (refcount_dec_and_test(&nf->nf_ref))
		nfsd_file_close_inode(nf->nf_inode);
}
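If I'm reading the filecache right, nfsd_file_close_inode() only unhashes the inode's cached files and defers the actual close work to the laundrette, so it should be safe to call in atomic context. That's the assumption the sketch above leans on, and it's the part I'd want a second pair of eyes on.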
-- Chuck Lever