This is the start of the stable review cycle for the 5.10.220 release. There are 770 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 20 Jun 2024 12:32:00 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.220-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 5.10.220-rc1
Trond Myklebust trond.myklebust@hammerspace.com nfsd: Fix a regression in nfsd_setattr()
NeilBrown neilb@suse.de nfsd: don't call locks_release_private() twice concurrently
NeilBrown neilb@suse.de nfsd: don't take fi_lock in nfsd_break_deleg_cb()
NeilBrown neilb@suse.de nfsd: fix RELEASE_LOCKOWNER
Jeff Layton jlayton@kernel.org nfsd: drop the nfsd_put helper
NeilBrown neilb@suse.de nfsd: call nfsd_last_thread() before final nfsd_put()
NeilBrown neilb@suse.de NFSD: fix possible oops when nfsd/pool_stats is closed.
Chuck Lever chuck.lever@oracle.com Documentation: Add missing documentation for EXPORT_OP flags
NeilBrown neilb@suse.de nfsd: separate nfsd_last_thread() from nfsd_put()
NeilBrown neilb@suse.de nfsd: Simplify code around svc_exit_thread() call in nfsd()
Chuck Lever chuck.lever@oracle.com nfsd: don't allow nfsd threads to be signalled.
Tavian Barnes tavianator@tavianator.com nfsd: Fix creation time serialization order
Chuck Lever chuck.lever@oracle.com NFSD: Add an nfsd4_encode_nfstime4() helper
NeilBrown neilb@suse.de lockd: drop inappropriate svc_get() from locked_get()
Dan Carpenter dan.carpenter@linaro.org nfsd: fix double fget() bug in __write_ports_addfd()
Jeff Layton jlayton@kernel.org nfsd: make a copy of struct iattr before calling notify_change
Dai Ngo dai.ngo@oracle.com NFSD: Fix problem of COMMIT and NFS4ERR_DELAY in infinite loop
Jeff Layton jlayton@kernel.org nfsd: simplify the delayed disposal list code
Chuck Lever chuck.lever@oracle.com NFSD: Convert filecache to rhltable
Jeff Layton jlayton@kernel.org nfsd: allow reaping files still under writeback
Jeff Layton jlayton@kernel.org nfsd: update comment over __nfsd_file_cache_purge
Jeff Layton jlayton@kernel.org nfsd: don't take/put an extra reference when putting a file
Jeff Layton jlayton@kernel.org nfsd: add some comments to nfsd_file_do_acquire
Jeff Layton jlayton@kernel.org nfsd: don't kill nfsd_files because of lease break error
Jeff Layton jlayton@kernel.org nfsd: simplify test_bit return in NFSD_FILE_KEY_FULL comparator
Jeff Layton jlayton@kernel.org nfsd: NFSD_FILE_KEY_INODE only needs to find GC'ed entries
Jeff Layton jlayton@kernel.org nfsd: don't open-code clear_and_wake_up_bit
Jeff Layton jlayton@kernel.org nfsd: call op_release, even when op_func returns an error
Chuck Lever chuck.lever@oracle.com NFSD: Avoid calling OPDESC() with ops->opnum == OP_ILLEGAL
Jeff Layton jlayton@kernel.org nfsd: don't replace page in rq_pages if it's a continuation of last page
Jeff Layton jlayton@kernel.org lockd: set file_lock start and end when decoding nlm4 testargs
Chuck Lever chuck.lever@oracle.com NFSD: Protect against filesystem freezing
Chuck Lever chuck.lever@oracle.com NFSD: copy the whole verifier in nfsd_copy_write_verifier
Jeff Layton jlayton@kernel.org nfsd: don't fsync nfsd_files on last close
Jeff Layton jlayton@kernel.org nfsd: fix courtesy client with deny mode handling in nfs4_upgrade_open
Dai Ngo dai.ngo@oracle.com NFSD: fix problems with cleanup on errors in nfsd4_copy
Jeff Layton jlayton@kernel.org nfsd: don't hand out delegation on setuid files being opened for write
Dai Ngo dai.ngo@oracle.com NFSD: fix leaked reference count of nfsd4_ssc_umount_item
Jeff Layton jlayton@kernel.org nfsd: clean up potential nfsd_file refcount leaks in COPY codepath
Jeff Layton jlayton@kernel.org nfsd: allow nfsd_file_get to sanely handle a NULL pointer
Dai Ngo dai.ngo@oracle.com NFSD: enhance inter-server copy cleanup
Jeff Layton jlayton@kernel.org nfsd: don't destroy global nfs4_file table in per-net shutdown
Jeff Layton jlayton@kernel.org nfsd: don't free files unconditionally in __nfsd_file_cache_purge
Dai Ngo dai.ngo@oracle.com NFSD: replace delayed_work with work_struct for nfsd_client_shrinker
Dai Ngo dai.ngo@oracle.com NFSD: register/unregister of nfsd-client shrinker at nfsd startup/shutdown time
Xingyuan Mo hdthky0@gmail.com NFSD: fix use-after-free in nfsd4_ssc_setup_dul()
Chuck Lever chuck.lever@oracle.com NFSD: Use set_bit(RQ_DROPME)
Chuck Lever chuck.lever@oracle.com Revert "SUNRPC: Use RMW bitops in single-threaded hot paths"
Jeff Layton jlayton@kernel.org nfsd: fix handling of cached open files in nfsd4_open codepath
Jeff Layton jlayton@kernel.org nfsd: rework refcounting in filecache
Kees Cook keescook@chromium.org NFSD: Avoid clashing function prototypes
Chuck Lever chuck.lever@oracle.com NFSD: Use only RQ_DROPME to signal the need to drop a reply
Dai Ngo dai.ngo@oracle.com NFSD: add delegation reaper to react to low memory condition
Dai Ngo dai.ngo@oracle.com NFSD: add support for sending CB_RECALL_ANY
Dai Ngo dai.ngo@oracle.com NFSD: refactoring courtesy_client_reaper to a generic low memory shrinker
Brian Foster bfoster@redhat.com NFSD: pass range end to vfs_fsync_range() instead of count
Jeff Layton jlayton@kernel.org lockd: fix file selection in nlmsvc_cancel_blocked
Jeff Layton jlayton@kernel.org lockd: ensure we use the correct file descriptor when unlocking
Jeff Layton jlayton@kernel.org lockd: set missing fl_flags field when retrieving args
Xiu Jianfeng xiujianfeng@huawei.com NFSD: Use struct_size() helper in alloc_session()
Jeff Layton jlayton@kernel.org nfsd: return error if nfs4_setacl fails
Trond Myklebust trond.myklebust@hammerspace.com lockd: set other missing fields when unlocking files
Chuck Lever chuck.lever@oracle.com NFSD: Add an nfsd_file_fsync tracepoint
Jeff Layton jlayton@kernel.org nfsd: fix up the filecache laundrette scheduling
Jeff Layton jlayton@kernel.org nfsd: reorganize filecache.c
Jeff Layton jlayton@kernel.org nfsd: remove the pages_flushed statistic from filecache
Chuck Lever chuck.lever@oracle.com NFSD: Fix licensing header in filecache.c
Chuck Lever chuck.lever@oracle.com NFSD: Use rhashtable for managing nfs4_file objects
Chuck Lever chuck.lever@oracle.com NFSD: Refactor find_file()
Chuck Lever chuck.lever@oracle.com NFSD: Clean up find_or_add_file()
Chuck Lever chuck.lever@oracle.com NFSD: Add a nfsd4_file_hash_remove() helper
Chuck Lever chuck.lever@oracle.com NFSD: Clean up nfsd4_init_file()
Chuck Lever chuck.lever@oracle.com NFSD: Update file_hashtbl() helpers
Chuck Lever chuck.lever@oracle.com NFSD: Use const pointers as parameters to fh_ helpers
Chuck Lever chuck.lever@oracle.com NFSD: Trace delegation revocations
Chuck Lever chuck.lever@oracle.com NFSD: Trace stateids returned via DELEGRETURN
Chuck Lever chuck.lever@oracle.com NFSD: Clean up nfs4_preprocess_stateid_op() call sites
Chuck Lever chuck.lever@oracle.com NFSD: Flesh out a documenting comment for filecache.c
Chuck Lever chuck.lever@oracle.com NFSD: Add an NFSD_FILE_GC flag to enable nfsd_file garbage collection
Chuck Lever chuck.lever@oracle.com NFSD: Revert "NFSD: NFSv4 CLOSE should release an nfsd_file immediately"
Chuck Lever chuck.lever@oracle.com NFSD: Pass the target nfsd_file to nfsd_commit()
David Disseldorp ddiss@suse.de exportfs: use pr_debug for unreachable debug statements
Jeff Layton jlayton@kernel.org nfsd: allow disabling NFSv2 at compile time
Jeff Layton jlayton@kernel.org nfsd: move nfserrno() to vfs.c
Jeff Layton jlayton@kernel.org nfsd: ignore requests to disable unsupported versions
Chuck Lever chuck.lever@oracle.com NFSD: Finish converting the NFSv3 GETACL result encoder
Chuck Lever chuck.lever@oracle.com NFSD: Finish converting the NFSv2 GETACL result encoder
Colin Ian King colin.i.king@gmail.com NFSD: Remove redundant assignment to variable host_err
Anna Schumaker Anna.Schumaker@Netapp.com NFSD: Simplify READ_PLUS
Jeff Layton jlayton@kernel.org nfsd: use locks_inode_context helper
Jeff Layton jlayton@kernel.org lockd: use locks_inode_context helper
Jeff Layton jlayton@kernel.org filelock: add a new locks_inode_context accessor function
Chuck Lever chuck.lever@oracle.com NFSD: Fix reads with a non-zero offset that don't end on a page boundary
Jeff Layton jlayton@kernel.org nfsd: put the export reference in nfsd4_verify_deleg_dentry
Jeff Layton jlayton@kernel.org nfsd: fix use-after-free in nfsd_file_do_acquire tracepoint
Jeff Layton jlayton@kernel.org nfsd: fix net-namespace logic in __nfsd_file_cache_purge
Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp NFSD: unregister shrinker when nfsd_init_net() fails
Jeff Layton jlayton@kernel.org nfsd: rework hashtable handling in nfsd_do_file_acquire
Jeff Layton jlayton@kernel.org nfsd: fix nfsd_file_unhash_and_dispose
Gaosheng Cui cuigaosheng1@huawei.com fanotify: Remove obsoleted fanotify_event_has_path()
Gaosheng Cui cuigaosheng1@huawei.com fsnotify: remove unused declaration
Al Viro viro@zeniv.linux.org.uk fs/notify: constify path
Jeff Layton jlayton@kernel.org nfsd: extra checks when freeing delegation stateids
Jeff Layton jlayton@kernel.org nfsd: make nfsd4_run_cb a bool return function
Jeff Layton jlayton@kernel.org nfsd: fix comments about spinlock handling with delegations
Jeff Layton jlayton@kernel.org nfsd: only fill out return pointer on success in nfsd4_lookup_stateid
Chuck Lever chuck.lever@oracle.com NFSD: Cap rsize_bop result based on send buffer size
Chuck Lever chuck.lever@oracle.com NFSD: Rename the fields in copy_stateid_t
ChenXiaoSong chenxiaosong2@huawei.com nfsd: use DEFINE_SHOW_ATTRIBUTE to define nfsd_file_cache_stats_fops
ChenXiaoSong chenxiaosong2@huawei.com nfsd: use DEFINE_SHOW_ATTRIBUTE to define nfsd_reply_cache_stats_fops
ChenXiaoSong chenxiaosong2@huawei.com nfsd: use DEFINE_SHOW_ATTRIBUTE to define client_info_fops
ChenXiaoSong chenxiaosong2@huawei.com nfsd: use DEFINE_SHOW_ATTRIBUTE to define export_features_fops and supported_enctypes_fops
ChenXiaoSong chenxiaosong2@huawei.com nfsd: use DEFINE_PROC_SHOW_ATTRIBUTE to define nfsd_proc_ops
Chuck Lever chuck.lever@oracle.com NFSD: Pack struct nfsd4_compoundres
Chuck Lever chuck.lever@oracle.com NFSD: Remove unused nfsd4_compoundargs::cachetype field
Chuck Lever chuck.lever@oracle.com NFSD: Remove "inline" directives on op_rsize_bop helpers
Chuck Lever chuck.lever@oracle.com NFSD: Clean up nfs4svc_encode_compoundres()
Chuck Lever chuck.lever@oracle.com NFSD: Clean up WRITE arg decoders
Chuck Lever chuck.lever@oracle.com NFSD: Use xdr_inline_decode() to decode NFSv3 symlinks
Chuck Lever chuck.lever@oracle.com NFSD: Refactor common code out of dirlist helpers
Chuck Lever chuck.lever@oracle.com NFSD: Reduce amount of struct nfsd4_compoundargs that needs clearing
Chuck Lever chuck.lever@oracle.com SUNRPC: Parametrize how much of argsize should be zeroed
Dai Ngo dai.ngo@oracle.com NFSD: add shrinker to reap courtesy clients on low memory condition
Dai Ngo dai.ngo@oracle.com NFSD: keep track of the number of courtesy clients in the system
Chuck Lever chuck.lever@oracle.com NFSD: Make nfsd4_remove() wait before returning NFS4ERR_DELAY
Chuck Lever chuck.lever@oracle.com NFSD: Make nfsd4_rename() wait before returning NFS4ERR_DELAY
Chuck Lever chuck.lever@oracle.com NFSD: Make nfsd4_setattr() wait before returning NFS4ERR_DELAY
Chuck Lever chuck.lever@oracle.com NFSD: Refactor nfsd_setattr()
Chuck Lever chuck.lever@oracle.com NFSD: Add a mechanism to wait for a DELEGRETURN
Chuck Lever chuck.lever@oracle.com NFSD: Add tracepoints to report NFSv4 callback completions
Gaosheng Cui cuigaosheng1@huawei.com nfsd: remove nfsd4_prepare_cb_recall() declaration
Jeff Layton jlayton@kernel.org nfsd: clean up mounted_on_fileid handling
Chuck Lever chuck.lever@oracle.com NFSD: Fix handling of oversized NFSv4 COMPOUND requests
NeilBrown neilb@suse.de NFSD: drop fname and flen args from nfsd_create_locked()
Chuck Lever chuck.lever@oracle.com NFSD: Protect against send buffer overflow in NFSv3 READ
Chuck Lever chuck.lever@oracle.com NFSD: Protect against send buffer overflow in NFSv2 READ
Chuck Lever chuck.lever@oracle.com NFSD: Protect against send buffer overflow in NFSv3 READDIR
Chuck Lever chuck.lever@oracle.com NFSD: Protect against send buffer overflow in NFSv2 READDIR
Chuck Lever chuck.lever@oracle.com NFSD: Increase NFSD_MAX_OPS_PER_COMPOUND
Christophe JAILLET christophe.jaillet@wanadoo.fr nfsd: Propagate some error code returned by memdup_user()
Christophe JAILLET christophe.jaillet@wanadoo.fr nfsd: Avoid some useless tests
Jinpeng Cui cui.jinpeng2@zte.com.cn NFSD: remove redundant variable status
Olga Kornievskaia kolga@netapp.com NFSD enforce filehandle check for source file in COPY
Wolfram Sang wsa+renesas@sang-engineering.com lockd: move from strlcpy with unused retval to strscpy
Wolfram Sang wsa+renesas@sang-engineering.com NFSD: move from strlcpy with unused retval to strscpy
Al Viro viro@zeniv.linux.org.uk nfsd_splice_actor(): handle compound pages
NeilBrown neilb@suse.de NFSD: fix regression with setting ACLs.
Jeff Layton jlayton@kernel.org lockd: detect and reject lock arguments that overflow
NeilBrown neilb@suse.de NFSD: discard fh_locked flag and fh_lock/fh_unlock
NeilBrown neilb@suse.de NFSD: use (un)lock_inode instead of fh_(un)lock for file operations
NeilBrown neilb@suse.de NFSD: use explicit lock/unlock for directory ops
NeilBrown neilb@suse.de NFSD: reduce locking in nfsd_lookup()
NeilBrown neilb@suse.de NFSD: only call fh_unlock() once in nfsd_link()
NeilBrown neilb@suse.de NFSD: always drop directory lock in nfsd_unlink()
NeilBrown neilb@suse.de NFSD: change nfsd_create()/nfsd_symlink() to unlock directory before returning.
NeilBrown neilb@suse.de NFSD: add posix ACLs to struct nfsd_attrs
NeilBrown neilb@suse.de NFSD: add security label to struct nfsd_attrs
NeilBrown neilb@suse.de NFSD: set attributes when creating symlinks
NeilBrown neilb@suse.de NFSD: introduce struct nfsd_attrs
Jeff Layton jlayton@kernel.org NFSD: verify the opened dentry after setting a delegation
Jeff Layton jlayton@kernel.org NFSD: drop fh argument from alloc_init_deleg
Chuck Lever chuck.lever@oracle.com NFSD: Move copy offload callback arguments into a separate structure
Chuck Lever chuck.lever@oracle.com NFSD: Add nfsd4_send_cb_offload()
Chuck Lever chuck.lever@oracle.com NFSD: Remove kmalloc from nfsd4_do_async_copy()
Chuck Lever chuck.lever@oracle.com NFSD: Refactor nfsd4_do_copy()
Chuck Lever chuck.lever@oracle.com NFSD: Refactor nfsd4_cleanup_inter_ssc() (2/2)
Chuck Lever chuck.lever@oracle.com NFSD: Refactor nfsd4_cleanup_inter_ssc() (1/2)
Chuck Lever chuck.lever@oracle.com NFSD: Replace boolean fields in struct nfsd4_copy
Chuck Lever chuck.lever@oracle.com NFSD: Make nfs4_put_copy() static
Chuck Lever chuck.lever@oracle.com NFSD: Reorder the fields in struct nfsd4_op
Chuck Lever chuck.lever@oracle.com NFSD: Shrink size of struct nfsd4_copy
Chuck Lever chuck.lever@oracle.com NFSD: Shrink size of struct nfsd4_copy_notify
Chuck Lever chuck.lever@oracle.com NFSD: nfserrno(-ENOMEM) is nfserr_jukebox
Chuck Lever chuck.lever@oracle.com NFSD: Fix strncpy() fortify warning
Chuck Lever chuck.lever@oracle.com NFSD: Clean up nfsd4_encode_readlink()
Chuck Lever chuck.lever@oracle.com NFSD: Use xdr_pad_size()
Chuck Lever chuck.lever@oracle.com NFSD: Simplify starting_len
Chuck Lever chuck.lever@oracle.com NFSD: Optimize nfsd4_encode_readv()
Chuck Lever chuck.lever@oracle.com NFSD: Add an nfsd4_read::rd_eof field
Chuck Lever chuck.lever@oracle.com NFSD: Clean up SPLICE_OK in nfsd4_encode_read()
Chuck Lever chuck.lever@oracle.com NFSD: Optimize nfsd4_encode_fattr()
Chuck Lever chuck.lever@oracle.com NFSD: Optimize nfsd4_encode_operation()
Jeff Layton jlayton@kernel.org nfsd: silence extraneous printk on nfsd.ko insertion
Dai Ngo dai.ngo@oracle.com NFSD: limit the number of v4 clients to 1024 per 1GB of system memory
Dai Ngo dai.ngo@oracle.com NFSD: keep track of the number of v4 clients in the system
Dai Ngo dai.ngo@oracle.com NFSD: refactoring v4 specific code to a helper in nfs4state.c
Chuck Lever chuck.lever@oracle.com NFSD: Ensure nf_inode is never dereferenced
Chuck Lever chuck.lever@oracle.com NFSD: NFSv4 CLOSE should release an nfsd_file immediately
Chuck Lever chuck.lever@oracle.com NFSD: Move nfsd_file_trace_alloc() tracepoint
Chuck Lever chuck.lever@oracle.com NFSD: Separate tracepoints for acquire and create
Chuck Lever chuck.lever@oracle.com NFSD: Clean up unused code after rhashtable conversion
Chuck Lever chuck.lever@oracle.com NFSD: Convert the filecache to use rhashtable
Chuck Lever chuck.lever@oracle.com NFSD: Set up an rhashtable for the filecache
Chuck Lever chuck.lever@oracle.com NFSD: Replace the "init once" mechanism
Chuck Lever chuck.lever@oracle.com NFSD: Remove nfsd_file::nf_hashval
Chuck Lever chuck.lever@oracle.com NFSD: nfsd_file_hash_remove can compute hashval
Chuck Lever chuck.lever@oracle.com NFSD: Refactor __nfsd_file_close_inode()
Chuck Lever chuck.lever@oracle.com NFSD: nfsd_file_unhash can compute hashval from nf->nf_inode
Chuck Lever chuck.lever@oracle.com NFSD: Remove lockdep assertion from unhash_and_release_locked()
Chuck Lever chuck.lever@oracle.com NFSD: No longer record nf_hashval in the trace log
Chuck Lever chuck.lever@oracle.com NFSD: Never call nfsd_file_gc() in foreground paths
Chuck Lever chuck.lever@oracle.com NFSD: Fix the filecache LRU shrinker
Chuck Lever chuck.lever@oracle.com NFSD: Leave open files out of the filecache LRU
Chuck Lever chuck.lever@oracle.com NFSD: Trace filecache LRU activity
Chuck Lever chuck.lever@oracle.com NFSD: WARN when freeing an item still linked via nf_lru
Chuck Lever chuck.lever@oracle.com NFSD: Hook up the filecache stat file
Chuck Lever chuck.lever@oracle.com NFSD: Zero counters when the filecache is re-initialized
Chuck Lever chuck.lever@oracle.com NFSD: Record number of flush calls
Chuck Lever chuck.lever@oracle.com NFSD: Report the number of items evicted by the LRU walk
Chuck Lever chuck.lever@oracle.com NFSD: Refactor nfsd_file_lru_scan()
Chuck Lever chuck.lever@oracle.com NFSD: Refactor nfsd_file_gc()
Chuck Lever chuck.lever@oracle.com NFSD: Add nfsd_file_lru_dispose_list() helper
Chuck Lever chuck.lever@oracle.com NFSD: Report average age of filecache items
Chuck Lever chuck.lever@oracle.com NFSD: Report count of freed filecache items
Chuck Lever chuck.lever@oracle.com NFSD: Report count of calls to nfsd_file_acquire()
Chuck Lever chuck.lever@oracle.com NFSD: Report filecache LRU size
Chuck Lever chuck.lever@oracle.com NFSD: Demote a WARN to a pr_warn()
Colin Ian King colin.i.king@gmail.com nfsd: remove redundant assignment to variable len
Zhang Jiaming jiaming@nfschina.com NFSD: Fix space and spelling mistake
Benjamin Coddington bcodding@redhat.com NLM: Defend against file_lock changes after vfs_test_lock()
Chuck Lever chuck.lever@oracle.com SUNRPC: Fix xdr_encode_bool()
Jeff Layton jlayton@kernel.org nfsd: eliminate the NFSD_FILE_BREAK_* flags
Xin Gao gaoxin@cdjrlc.com fsnotify: Fix comment typo
Amir Goldstein amir73il@gmail.com fanotify: introduce FAN_MARK_IGNORE
Amir Goldstein amir73il@gmail.com fanotify: cleanups for fanotify_mark() input validations
Amir Goldstein amir73il@gmail.com fanotify: prepare for setting event flags in ignore mask
Oliver Ford ojford@gmail.com fs: inotify: Fix typo in inotify comment
Jeff Layton jlayton@kernel.org lockd: fix nlm_close_files
Jeff Layton jlayton@kernel.org lockd: set fl_owner when unlocking files
Chuck Lever chuck.lever@oracle.com NFSD: Decode NFSv4 birth time attribute
NeilBrown neilb@suse.de NFS: restore module put when manager exits.
Amir Goldstein amir73il@gmail.com fanotify: refine the validation checks on non-dir inode mask
Chuck Lever chuck.lever@oracle.com SUNRPC: Optimize xdr_reserve_space()
Chuck Lever chuck.lever@oracle.com NFSD: Fix potential use-after-free in nfsd_file_put()
Chuck Lever chuck.lever@oracle.com NFSD: nfsd_file_put() can sleep
Chuck Lever chuck.lever@oracle.com NFSD: Add documenting comment for nfsd4_release_lockowner()
Chuck Lever chuck.lever@oracle.com NFSD: Modernize nfsd4_release_lockowner()
Julian Schroeder jumaco@amazon.com nfsd: destroy percpu stats counters after reply cache shutdown
Zhang Xiaoxu zhangxiaoxu5@huawei.com nfsd: Fix null-ptr-deref in nfsd_fill_super()
Zhang Xiaoxu zhangxiaoxu5@huawei.com nfsd: Unregister the cld notifier when laundry_wq create failed
Chuck Lever chuck.lever@oracle.com SUNRPC: Use RMW bitops in single-threaded hot paths
Chuck Lever chuck.lever@oracle.com NFSD: Clean up the show_nf_flags() macro
Chuck Lever chuck.lever@oracle.com NFSD: Trace filecache opens
Chuck Lever chuck.lever@oracle.com NFSD: Move documenting comment for nfsd4_process_open2()
Chuck Lever chuck.lever@oracle.com NFSD: Fix whitespace
Chuck Lever chuck.lever@oracle.com NFSD: Remove dprintk call sites from tail of nfsd4_open()
Chuck Lever chuck.lever@oracle.com NFSD: Instantiate a struct file when creating a regular NFSv4 file
Chuck Lever chuck.lever@oracle.com NFSD: Clean up nfsd_open_verified()
Chuck Lever chuck.lever@oracle.com NFSD: Remove do_nfsd_create()
Chuck Lever chuck.lever@oracle.com NFSD: Refactor NFSv4 OPEN(CREATE)
Chuck Lever chuck.lever@oracle.com NFSD: Refactor NFSv3 CREATE
Chuck Lever chuck.lever@oracle.com NFSD: Refactor nfsd_create_setattr()
Chuck Lever chuck.lever@oracle.com NFSD: Avoid calling fh_drop_write() twice in do_nfsd_create()
Chuck Lever chuck.lever@oracle.com NFSD: Clean up nfsd3_proc_create()
Dai Ngo dai.ngo@oracle.com NFSD: Show state of courtesy client in client info
Dai Ngo dai.ngo@oracle.com NFSD: add support for lock conflict to courteous server
Dai Ngo dai.ngo@oracle.com fs/lock: add 2 callbacks to lock_manager_operations to resolve conflict
Dai Ngo dai.ngo@oracle.com fs/lock: add helper locks_owner_has_blockers to check for blockers
Dai Ngo dai.ngo@oracle.com NFSD: move create/destroy of laundry_wq to init_nfsd and exit_nfsd
Dai Ngo dai.ngo@oracle.com NFSD: add support for share reservation conflict to courteous server
Dai Ngo dai.ngo@oracle.com NFSD: add courteous server support for thread with only delegation
Chuck Lever chuck.lever@oracle.com NFSD: Clean up nfsd_splice_actor()
Vasily Averin vvs@openvz.org fanotify: fix incorrect fmode_t casts
Amir Goldstein amir73il@gmail.com fsnotify: consistent behavior for parent not watching children
Amir Goldstein amir73il@gmail.com fsnotify: introduce mark type iterator
Amir Goldstein amir73il@gmail.com fanotify: enable "evictable" inode marks
Amir Goldstein amir73il@gmail.com fanotify: use fsnotify group lock helpers
Amir Goldstein amir73il@gmail.com fanotify: implement "evictable" inode marks
Amir Goldstein amir73il@gmail.com fanotify: factor out helper fanotify_mark_update_flags()
Amir Goldstein amir73il@gmail.com fanotify: create helper fanotify_mark_user_flags()
Amir Goldstein amir73il@gmail.com fsnotify: allow adding an inode mark without pinning inode
Amir Goldstein amir73il@gmail.com dnotify: use fsnotify group lock helpers
Amir Goldstein amir73il@gmail.com nfsd: use fsnotify group lock helpers
Amir Goldstein amir73il@gmail.com inotify: use fsnotify group lock helpers
Amir Goldstein amir73il@gmail.com fsnotify: create helpers for group mark_mutex lock
Amir Goldstein amir73il@gmail.com fsnotify: make allow_dups a property of the group
Amir Goldstein amir73il@gmail.com fsnotify: pass flags argument to fsnotify_alloc_group()
Amir Goldstein amir73il@gmail.com inotify: move control flags from mask to mark flags
Dai Ngo dai.ngo@oracle.com fs/lock: documentation cleanup. Replace inode->i_lock with flc_lock.
Amir Goldstein amir73il@gmail.com fanotify: do not allow setting dirent events in mask of non-dir
Trond Myklebust trond.myklebust@hammerspace.com nfsd: Clean up nfsd_file_put()
Trond Myklebust trond.myklebust@hammerspace.com nfsd: Fix a write performance regression
Haowen Bai baihaowen@meizu.com SUNRPC: Return true/false (not 1/0) from bool functions
Bang Li libang.linuxer@gmail.com fsnotify: remove redundant parameter judgment
Amir Goldstein amir73il@gmail.com fsnotify: optimize FS_MODIFY events with no ignored masks
Amir Goldstein amir73il@gmail.com fsnotify: fix merge with parent's ignored mask
Jakob Koschel jakobkoschel@gmail.com nfsd: fix using the correct variable for sizeof()
Chuck Lever chuck.lever@oracle.com NFSD: Clean up _lm_ operation names
Chuck Lever chuck.lever@oracle.com NFSD: Remove CONFIG_NFSD_V3
Chuck Lever chuck.lever@oracle.com NFSD: Move svc_serv_ops::svo_function into struct svc_serv
Chuck Lever chuck.lever@oracle.com NFSD: Remove svc_serv_ops::svo_module
Chuck Lever chuck.lever@oracle.com SUNRPC: Remove svc_shutdown_net()
Chuck Lever chuck.lever@oracle.com SUNRPC: Rename svc_close_xprt()
Chuck Lever chuck.lever@oracle.com SUNRPC: Rename svc_create_xprt()
Chuck Lever chuck.lever@oracle.com SUNRPC: Remove svo_shutdown method
Chuck Lever chuck.lever@oracle.com SUNRPC: Merge svc_do_enqueue_xprt() into svc_enqueue_xprt()
Chuck Lever chuck.lever@oracle.com SUNRPC: Remove the .svo_enqueue_xprt method
Chuck Lever chuck.lever@oracle.com NFSD: Streamline the rare "found" case
Chuck Lever chuck.lever@oracle.com NFSD: Skip extra computation for RC_NOCACHE case
Chuck Lever chuck.lever@oracle.com NFSD: De-duplicate hash bucket indexing
Ondrej Valousek ondrej.valousek.xm@renesas.com nfsd: Add support for the birth time attribute
Chuck Lever chuck.lever@oracle.com NFSD: Deprecate NFS_OFFSET_MAX
Chuck Lever chuck.lever@oracle.com NFSD: COMMIT operations must not return NFS?ERR_INVAL
Chuck Lever chuck.lever@oracle.com NFSD: Fix NFSv3 SETATTR/CREATE's handling of large file sizes
Chuck Lever chuck.lever@oracle.com NFSD: Fix ia_size underflow
Chuck Lever chuck.lever@oracle.com NFSD: Fix the behavior of READ near OFFSET_MAX
J. Bruce Fields bfields@redhat.com lockd: fix failure to cleanup client locks
J. Bruce Fields bfields@redhat.com lockd: fix server crash on reboot of client holding lock
Yang Li yang.lee@linux.alibaba.com fanotify: remove variable set but not used
J. Bruce Fields bfields@redhat.com nfsd: fix crash on COPY_NOTIFY with special stateid
Chuck Lever chuck.lever@oracle.com NFSD: Move fill_pre_wcc() and fill_post_wcc()
Chuck Lever chuck.lever@oracle.com Revert "nfsd: skip some unnecessary stats in the v4 case"
Chuck Lever chuck.lever@oracle.com NFSD: Trace boot verifier resets
Chuck Lever chuck.lever@oracle.com NFSD: Rename boot verifier functions
Chuck Lever chuck.lever@oracle.com NFSD: Clean up the nfsd_net::nfssvc_boot field
Chuck Lever chuck.lever@oracle.com NFSD: Write verifier might go backwards
Trond Myklebust trond.myklebust@hammerspace.com nfsd: Add a tracepoint for errors in nfsd4_clone_file_range()
Chuck Lever chuck.lever@oracle.com NFSD: De-duplicate net_generic(nf->nf_net, nfsd_net_id)
Chuck Lever chuck.lever@oracle.com NFSD: De-duplicate net_generic(SVC_NET(rqstp), nfsd_net_id)
Chuck Lever chuck.lever@oracle.com NFSD: Clean up nfsd_vfs_write()
Jeff Layton jeff.layton@primarydata.com nfsd: Retry once in nfsd_open on an -EOPENSTALE return
Jeff Layton jeff.layton@primarydata.com nfsd: Add errno mapping for EREMOTEIO
Peng Tao tao.peng@primarydata.com nfsd: map EBADF
Chuck Lever chuck.lever@oracle.com NFSD: Fix zero-length NFSv3 WRITEs
Vasily Averin vvs@virtuozzo.com nfsd4: add refcount for nfsd4_blocked_lock
J. Bruce Fields bfields@redhat.com nfs: block notification on fs with its own ->lock
Chuck Lever chuck.lever@oracle.com NFSD: De-duplicate nfsd4_decode_bitmap4()
J. Bruce Fields bfields@redhat.com nfsd: improve stateid access bitmask documentation
Chuck Lever chuck.lever@oracle.com NFSD: Combine XDR error tracepoints
NeilBrown neilb@suse.de NFSD: simplify per-net file cache management
Jiapeng Chong jiapeng.chong@linux.alibaba.com NFSD: Fix inconsistent indenting
Chuck Lever chuck.lever@oracle.com NFSD: Remove be32_to_cpu() from DRC hash function
NeilBrown neilb@suse.de NFS: switch the callback service back to non-pooled.
NeilBrown neilb@suse.de lockd: use svc_set_num_threads() for thread start and stop
NeilBrown neilb@suse.de SUNRPC: always treat sv_nrpools==1 as "not pooled"
NeilBrown neilb@suse.de SUNRPC: move the pool_map definitions (back) into svc.c
NeilBrown neilb@suse.de lockd: rename lockd_create_svc() to lockd_get()
NeilBrown neilb@suse.de lockd: introduce lockd_put()
NeilBrown neilb@suse.de lockd: move svc_exit_thread() into the thread
NeilBrown neilb@suse.de lockd: move lockd_start_svc() call into lockd_create_svc()
NeilBrown neilb@suse.de lockd: simplify management of network status notifiers
NeilBrown neilb@suse.de lockd: introduce nlmsvc_serv
NeilBrown neilb@suse.de NFSD: simplify locking for network notifier.
NeilBrown neilb@suse.de SUNRPC: discard svo_setup and rename svc_set_num_threads_sync()
NeilBrown neilb@suse.de NFSD: Make it possible to use svc_set_num_threads_sync
NeilBrown neilb@suse.de NFSD: narrow nfsd_mutex protection in nfsd thread
NeilBrown neilb@suse.de SUNRPC: use sv_lock to protect updates to sv_nrthreads.
NeilBrown neilb@suse.de nfsd: make nfsd_stats.th_cnt atomic_t
NeilBrown neilb@suse.de SUNRPC: stop using ->sv_nrthreads as a refcount
NeilBrown neilb@suse.de SUNRPC/NFSD: clean up get/put functions.
NeilBrown neilb@suse.de SUNRPC: change svc_get() to return the svc.
NeilBrown neilb@suse.de NFSD: handle errors better in write_ports_addfd()
Chuck Lever chuck.lever@oracle.com NFSD: Fix sparse warning
Eric W. Biederman ebiederm@xmission.com exit: Rename module_put_and_exit to module_put_and_kthread_exit
Eric W. Biederman ebiederm@xmission.com exit: Implement kthread_exit
Amir Goldstein amir73il@gmail.com fanotify: wire up FAN_RENAME event
Amir Goldstein amir73il@gmail.com fanotify: report old and/or new parent+name in FAN_RENAME event
Amir Goldstein amir73il@gmail.com fanotify: record either old name new name or both for FAN_RENAME
Amir Goldstein amir73il@gmail.com fanotify: record old and new parent and name in FAN_RENAME event
Amir Goldstein amir73il@gmail.com fanotify: support secondary dir fh and name in fanotify_info
Amir Goldstein amir73il@gmail.com fanotify: use helpers to parcel fanotify_info buffer
Amir Goldstein amir73il@gmail.com fanotify: use macros to get the offset to fanotify_info buffer
Amir Goldstein amir73il@gmail.com fsnotify: generate FS_RENAME event with rich information
Amir Goldstein amir73il@gmail.com fanotify: introduce group flag FAN_REPORT_TARGET_FID
Amir Goldstein amir73il@gmail.com fsnotify: separate mark iterator type from object type enum
Amir Goldstein amir73il@gmail.com fsnotify: clarify object type argument
Chuck Lever chuck.lever@oracle.com NFSD: Fix READDIR buffer overflow
Chuck Lever chuck.lever@oracle.com NFSD: Fix exposure in nfsd4_decode_bitmap()
J. Bruce Fields bfields@redhat.com nfsd4: remove obselete comment
Changcheng Deng deng.changcheng@zte.com.cn NFSD:fix boolreturn.cocci warning
J. Bruce Fields bfields@redhat.com nfsd: update create verifier comment
Chuck Lever chuck.lever@oracle.com SUNRPC: Change return value type of .pc_encode
Chuck Lever chuck.lever@oracle.com SUNRPC: Replace the "__be32 *p" parameter to .pc_encode
Chuck Lever chuck.lever@oracle.com NFSD: Save location of NFSv4 COMPOUND status
Chuck Lever chuck.lever@oracle.com SUNRPC: Change return value type of .pc_decode
Chuck Lever chuck.lever@oracle.com SUNRPC: Replace the "__be32 *p" parameter to .pc_decode
Chuck Lever chuck.lever@oracle.com NFSD: Have legacy NFSD WRITE decoders use xdr_stream_subsegment()
Colin Ian King colin.king@canonical.com NFSD: Initialize pointer ni with NULL and not plain integer 0
NeilBrown neilb@suse.de NFSD: simplify struct nfsfh
NeilBrown neilb@suse.de NFSD: drop support for ancient filehandles
NeilBrown neilb@suse.de NFSD: move filehandle format declarations out of "uapi".
Chuck Lever chuck.lever@oracle.com NFSD: Optimize DRC bucket pruning
Chuck Lever chuck.lever@oracle.com SUNRPC: Trace calls to .rpc_call_done
Gabriel Krisman Bertazi krisman@collabora.com fanotify: Allow users to request FAN_FS_ERROR events
Gabriel Krisman Bertazi krisman@collabora.com fanotify: Emit generic error info for error event
Gabriel Krisman Bertazi krisman@collabora.com fanotify: Report fid info for file related file system errors
Gabriel Krisman Bertazi krisman@collabora.com fanotify: WARN_ON against too large file handles
Gabriel Krisman Bertazi krisman@collabora.com fanotify: Add helpers to decide whether to report FID/DFID
Gabriel Krisman Bertazi krisman@collabora.com fanotify: Wrap object_fh inline space in a creator macro
Gabriel Krisman Bertazi krisman@collabora.com fanotify: Support merging of error events
Gabriel Krisman Bertazi krisman@collabora.com fanotify: Support enqueueing of error events
Gabriel Krisman Bertazi krisman@collabora.com fanotify: Pre-allocate pool of error events
Gabriel Krisman Bertazi krisman@collabora.com fanotify: Reserve UAPI bits for FAN_FS_ERROR
Gabriel Krisman Bertazi krisman@collabora.com fsnotify: Support FS_ERROR event type
Gabriel Krisman Bertazi krisman@collabora.com fanotify: Require fid_mode for any non-fd event
Gabriel Krisman Bertazi krisman@collabora.com fanotify: Encode empty file handle when no inode is provided
Gabriel Krisman Bertazi krisman@collabora.com fanotify: Allow file handle encoding for unhashed events
Gabriel Krisman Bertazi krisman@collabora.com fanotify: Support null inode event in fanotify_dfid_inode
Gabriel Krisman Bertazi krisman@collabora.com fsnotify: Pass group argument to free_event
Gabriel Krisman Bertazi krisman@collabora.com fsnotify: Protect fsnotify_handle_inode_event from no-inode events
Gabriel Krisman Bertazi krisman@collabora.com fsnotify: Retrieve super block from the data field
Gabriel Krisman Bertazi krisman@collabora.com fsnotify: Add wrapper around fsnotify_add_event
Gabriel Krisman Bertazi krisman@collabora.com fsnotify: Add helper to detect overflow_event
Gabriel Krisman Bertazi krisman@collabora.com inotify: Don't force FS_IN_IGNORED
Gabriel Krisman Bertazi krisman@collabora.com fanotify: Split fsid check from other fid mode checks
Gabriel Krisman Bertazi krisman@collabora.com fanotify: Fold event size calculation to its own function
Gabriel Krisman Bertazi krisman@collabora.com fsnotify: Don't insert unmergeable events in hashtable
Amir Goldstein amir73il@gmail.com fsnotify: clarify contract for create event hooks
Amir Goldstein amir73il@gmail.com fsnotify: pass dentry instead of inode data
Amir Goldstein amir73il@gmail.com fsnotify: pass data_type to fsnotify_name()
Trond Myklebust trond.myklebust@hammerspace.com nfsd: Fix a warning for nfsd_file_close_inode
Chuck Lever chuck.lever@oracle.com NLM: Fix svcxdr_encode_owner()
Amir Goldstein amir73il@gmail.com fsnotify: fix sb_connectors leak
Chuck Lever chuck.lever@oracle.com NFS: Remove unused callback void decoder
Chuck Lever chuck.lever@oracle.com NFS: Add a private local dispatcher for NFSv4 callback operations
Chuck Lever chuck.lever@oracle.com SUNRPC: Eliminate the RQ_AUTHERR flag
Chuck Lever chuck.lever@oracle.com SUNRPC: Set rq_auth_stat in the pg_authenticate() callout
Chuck Lever chuck.lever@oracle.com SUNRPC: Add svc_rqst::rq_auth_stat
J. Bruce Fields bfields@redhat.com nfs: don't allow reexport reclaims
J. Bruce Fields bfields@redhat.com lockd: don't attempt blocking locks on nfs reexports
J. Bruce Fields bfields@redhat.com nfs: don't atempt blocking locks on nfs reexports
J. Bruce Fields bfields@redhat.com Keep read and write fds with each nlm_file
J. Bruce Fields bfields@redhat.com lockd: update nlm_lookup_file reexport comment
J. Bruce Fields bfields@redhat.com nlm: minor refactoring
J. Bruce Fields bfields@redhat.com nlm: minor nlm_lookup_file argument change
Jia He hejianet@gmail.com lockd: change the proc_handler for nsm_use_hostnames
Jia He hejianet@gmail.com sysctl: introduce new proc handler proc_dobool
NeilBrown neilb@suse.de NFSD: remove vanity comments
Chuck Lever chuck.lever@oracle.com NFSD: Batch release pages during splice read
Chuck Lever chuck.lever@oracle.com SUNRPC: Add svc_rqst_replace_page() API
Chuck Lever chuck.lever@oracle.com NFSD: Clean up splice actor
Amir Goldstein amir73il@gmail.com fsnotify: optimize the case of no marks of any type
Amir Goldstein amir73il@gmail.com fsnotify: count all objects with attached connectors
Amir Goldstein amir73il@gmail.com fsnotify: count s_fsnotify_inode_refs for attached connectors
Amir Goldstein amir73il@gmail.com fsnotify: replace igrab() with ihold() on attach connector
Matthew Bobrowski repnop@google.com fanotify: add pidfd support to the fanotify API
Matthew Bobrowski repnop@google.com fanotify: introduce a generic info record copying helper
Matthew Bobrowski repnop@google.com fanotify: minor cosmetic adjustments to fid labels
Matthew Bobrowski repnop@google.com kernel/pid.c: implement additional checks upon pidfd_create() parameters
Matthew Bobrowski repnop@google.com kernel/pid.c: remove static qualifier from pidfd_create()
J. Bruce Fields bfields@redhat.com nfsd: fix NULL dereference in nfs3svc_encode_getaclres
Chuck Lever chuck.lever@oracle.com NFSD: Prevent a possible oops in the nfs_dirent() tracepoint
Colin Ian King colin.king@canonical.com nfsd: remove redundant assignment to pointer 'this'
Chuck Lever chuck.lever@oracle.com lockd: Update the NLMv4 SHARE results encoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com lockd: Update the NLMv4 nlm_res results encoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com lockd: Update the NLMv4 TEST results encoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com lockd: Update the NLMv4 void results encoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com lockd: Update the NLMv4 FREE_ALL arguments decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com lockd: Update the NLMv4 SHARE arguments decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com lockd: Update the NLMv4 SM_NOTIFY arguments decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com lockd: Update the NLMv4 nlm_res arguments decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com lockd: Update the NLMv4 UNLOCK arguments decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com lockd: Update the NLMv4 CANCEL arguments decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com lockd: Update the NLMv4 LOCK arguments decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com lockd: Update the NLMv4 TEST arguments decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com lockd: Update the NLMv4 void arguments decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com lockd: Update the NLMv1 SHARE results encoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com lockd: Update the NLMv1 nlm_res results encoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com lockd: Update the NLMv1 TEST results encoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com lockd: Update the NLMv1 void results encoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com lockd: Update the NLMv1 FREE_ALL arguments decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com lockd: Update the NLMv1 SHARE arguments decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com lockd: Update the NLMv1 SM_NOTIFY arguments decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com lockd: Update the NLMv1 nlm_res arguments decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com lockd: Update the NLMv1 UNLOCK arguments decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com lockd: Update the NLMv1 CANCEL arguments decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com lockd: Update the NLMv1 LOCK arguments decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com lockd: Update the NLMv1 TEST arguments decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com lockd: Update the NLMv1 void argument decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com lockd: Common NLM XDR helpers
Chuck Lever chuck.lever@oracle.com lockd: Create a simplified .vs_dispatch method for NLM requests
Chuck Lever chuck.lever@oracle.com lockd: Remove stale comments
J. Bruce Fields bfields@redhat.com nfsd: rpc_peeraddr2str needs rcu lock
Wei Yongjun weiyongjun1@huawei.com NFSD: Fix error return code in nfsd4_interssc_connect()
Dai Ngo dai.ngo@oracle.com nfsd: fix kernel test robot warning in SSC code
Dave Wysochanski dwysocha@redhat.com nfsd4: Expose the callback address and state of each NFS4 client
J. Bruce Fields bfields@redhat.com nfsd: move fsnotify on client creation outside spinlock
Dai Ngo dai.ngo@oracle.com NFSD: delay unmount source's export after inter-server copy completed.
Olga Kornievskaia kolga@netapp.com NFSD add vfs_fsync after async copy is done
J. Bruce Fields bfields@redhat.com nfsd: move some commit_metadata()s outside the inode lock
Yu Hsiang Huang nickhuang@synology.com nfsd: Prevent truncation of an unlinked inode from blocking access to its directory
Chuck Lever chuck.lever@oracle.com NFSD: Update nfsd_cb_args tracepoint
Chuck Lever chuck.lever@oracle.com NFSD: Remove the nfsd_cb_work and nfsd_cb_done tracepoints
Chuck Lever chuck.lever@oracle.com NFSD: Add an nfsd_cb_probe tracepoint
Chuck Lever chuck.lever@oracle.com NFSD: Replace the nfsd_deleg_break tracepoint
Chuck Lever chuck.lever@oracle.com NFSD: Add an nfsd_cb_offload tracepoint
Chuck Lever chuck.lever@oracle.com NFSD: Add an nfsd_cb_lm_notify tracepoint
Chuck Lever chuck.lever@oracle.com NFSD: Enhance the nfsd_cb_setup tracepoint
Chuck Lever chuck.lever@oracle.com NFSD: Adjust cb_shutdown tracepoint
Chuck Lever chuck.lever@oracle.com NFSD: Add cb_lost tracepoint
Chuck Lever chuck.lever@oracle.com NFSD: Drop TRACE_DEFINE_ENUM for NFSD4_CB_<state> macros
Chuck Lever chuck.lever@oracle.com NFSD: Capture every CB state transition
Chuck Lever chuck.lever@oracle.com NFSD: Constify @fh argument of knfsd_fh_hash()
Chuck Lever chuck.lever@oracle.com NFSD: Add tracepoints for EXCHANGEID edge cases
Chuck Lever chuck.lever@oracle.com NFSD: Add tracepoints for SETCLIENTID edge cases
Chuck Lever chuck.lever@oracle.com NFSD: Add a couple more nfsd_clid_expired call sites
Chuck Lever chuck.lever@oracle.com NFSD: Add nfsd_clid_destroyed tracepoint
Chuck Lever chuck.lever@oracle.com NFSD: Add nfsd_clid_reclaim_complete tracepoint
Chuck Lever chuck.lever@oracle.com NFSD: Add nfsd_clid_confirmed tracepoint
Chuck Lever chuck.lever@oracle.com NFSD: Remove trace_nfsd_clid_inuse_err
Chuck Lever chuck.lever@oracle.com NFSD: Add nfsd_clid_verf_mismatch tracepoint
Chuck Lever chuck.lever@oracle.com NFSD: Add nfsd_clid_cred_mismatch tracepoint
Chuck Lever chuck.lever@oracle.com NFSD: Add an RPC authflavor tracepoint display helper
Amir Goldstein amir73il@gmail.com fanotify: fix permission model of unprivileged group
Trond Myklebust trond.myklebust@hammerspace.com NFS: fix nfs_fetch_iversion()
Dai Ngo dai.ngo@oracle.com NFSv4.2: Remove ifdef CONFIG_NFSD from NFSv4.2 client SSC code.
Gustavo A. R. Silva gustavoars@kernel.org nfsd: Fix fall-through warnings for Clang
J. Bruce Fields bfields@redhat.com nfsd: grant read delegations to clients holding writes
J. Bruce Fields bfields@redhat.com nfsd: reshuffle some code
J. Bruce Fields bfields@redhat.com nfsd: track filehandle aliasing in nfs4_files
J. Bruce Fields bfields@redhat.com nfsd: hash nfs4_files by inode number
Vasily Averin vvs@virtuozzo.com nfsd: removed unused argument in nfsd_startup_generic()
Jiapeng Chong jiapeng.chong@linux.alibaba.com nfsd: remove unused function
Christian Brauner christian.brauner@ubuntu.com fanotify_user: use upper_32_bits() to verify mask
Amir Goldstein amir73il@gmail.com fanotify: support limited functionality for unprivileged users
Amir Goldstein amir73il@gmail.com fanotify: configurable limits via sysfs
Amir Goldstein amir73il@gmail.com fanotify: limit number of event merge attempts
Amir Goldstein amir73il@gmail.com fsnotify: use hash table for faster events merge
Amir Goldstein amir73il@gmail.com fanotify: mix event info and pid into merge key hash
Amir Goldstein amir73il@gmail.com fanotify: reduce event objectid to 29-bit hash
Chuck Lever chuck.lever@oracle.com Revert "fanotify: limit number of event merge attempts"
Amir Goldstein amir73il@gmail.com fsnotify: allow fsnotify_{peek,remove}_first_event with empty queue
Guobin Huang huangguobin4@huawei.com NFSD: Use DEFINE_SPINLOCK() for spinlock
Gustavo A. R. Silva gustavoars@kernel.org UAPI: nfsfh.h: Replace one-element array with flexible-array member
Chuck Lever chuck.lever@oracle.com SUNRPC: Export svc_xprt_received()
NeilBrown neilb@suse.de nfsd: report client confirmation status in "info" file
J. Bruce Fields bfields@redhat.com nfsd: don't ignore high bits of copy count
J. Bruce Fields bfields@redhat.com nfsd: COPY with length 0 should copy to end of file
Ricardo Ribalda ribalda@chromium.org nfsd: Fix typo "accesible"
Paul Menzel pmenzel@molgen.mpg.de nfsd: Log client tracking type log message as info instead of warning
J. Bruce Fields bfields@redhat.com nfsd: helper for laundromat expiry calculations
Chuck Lever chuck.lever@oracle.com NFSD: Clean up NFSDDBG_FACILITY macro
Chuck Lever chuck.lever@oracle.com NFSD: Add a tracepoint to record directory entry encoding
Chuck Lever chuck.lever@oracle.com NFSD: Clean up after updating NFSv3 ACL encoders
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv3 SETACL result encoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv3 GETACL result encoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Clean up after updating NFSv2 ACL encoders
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv2 ACL ACCESS result encoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv2 ACL GETATTR result encoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv2 SETACL result encoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv2 GETACL result encoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Add an xdr_stream-based encoder for NFSv2/3 ACLs
Chuck Lever chuck.lever@oracle.com NFSD: Remove unused NFSv2 directory entry encoders
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv2 READDIR entry encoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv2 READDIR result encoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Count bytes instead of pages in the NFSv2 READDIR encoder
Chuck Lever chuck.lever@oracle.com NFSD: Add a helper that encodes NFSv3 directory offset cookies, again
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv2 STATFS result encoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv2 READ result encoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv2 READLINK result encoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv2 diropres encoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv2 attrstat encoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv2 stat encoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Reduce svc_rqst::rq_pages churn during READDIR operations
Chuck Lever chuck.lever@oracle.com NFSD: Remove unused NFSv3 directory entry encoders
Chuck Lever chuck.lever@oracle.com NFSD: Update NFSv3 READDIR entry encoders to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv3 READDIR3res encoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Count bytes instead of pages in the NFSv3 READDIR encoder
Chuck Lever chuck.lever@oracle.com NFSD: Add a helper that encodes NFSv3 directory offset cookies
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv3 COMMIT3res encoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv3 PATHCONF3res encoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv3 FSINFO3res encoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv3 FSSTAT3res encoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv3 LINK3res encoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv3 RENAMEv3res encoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv3 CREATE family of encoders to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv3 WRITE3res encoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv3 READ3res encode to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv3 READLINK3res encoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv3 wccstat result encoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv3 LOOKUP3res encoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv3 ACCESS3res encoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the GETATTR3res encoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Extract the svcxdr_init_encode() helper
Christian Brauner christian.brauner@ubuntu.com namei: introduce struct renamedata
Christian Brauner christian.brauner@ubuntu.com fs: add file and path permissions helpers
Christoph Hellwig hch@lst.de kallsyms: only build {,module_}kallsyms_on_each_symbol when required
Christoph Hellwig hch@lst.de kallsyms: refactor {,module_}kallsyms_on_each_symbol
Christoph Hellwig hch@lst.de module: use RCU to synchronize find_module
Christoph Hellwig hch@lst.de module: unexport find_module and module_mutex
Shakeel Butt shakeelb@google.com inotify, memcg: account inotify instances to kmemcg
J. Bruce Fields bfields@redhat.com nfsd: skip some unnecessary stats in the v4 case
J. Bruce Fields bfields@redhat.com nfs: use change attribute for NFS re-exports
Dai Ngo dai.ngo@oracle.com NFSv4_2: SSC helper should use its own config.
J. Bruce Fields bfields@redhat.com nfsd: cstate->session->se_client -> cstate->clp
J. Bruce Fields bfields@redhat.com nfsd: simplify nfsd4_check_open_reclaim
J. Bruce Fields bfields@redhat.com nfsd: remove unused set_client argument
J. Bruce Fields bfields@redhat.com nfsd: find_cpntf_state cleanup
J. Bruce Fields bfields@redhat.com nfsd: refactor set_client
J. Bruce Fields bfields@redhat.com nfsd: rename lookup_clientid->set_client
J. Bruce Fields bfields@redhat.com nfsd: simplify nfsd_renew
J. Bruce Fields bfields@redhat.com nfsd: simplify process_lock
J. Bruce Fields bfields@redhat.com nfsd4: simplify process_lookup1
Amir Goldstein amir73il@gmail.com nfsd: report per-export stats
Amir Goldstein amir73il@gmail.com nfsd: protect concurrent access to nfsd stats counters
Amir Goldstein amir73il@gmail.com nfsd: remove unused stats counters
Chuck Lever chuck.lever@oracle.com NFSD: Clean up after updating NFSv3 ACL decoders
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv2 SETACL argument decoder to use struct xdr_stream, again
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv3 GETACL argument decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Clean up after updating NFSv2 ACL decoders
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv2 ACL ACCESS argument decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv2 ACL GETATTR argument decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv2 SETACL argument decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Add an xdr_stream-based decoder for NFSv2/3 ACLs
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv2 GETACL argument decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Remove argument length checking in nfsd_dispatch()
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv2 SYMLINK argument decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv2 CREATE argument decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv2 SETATTR argument decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv2 LINK argument decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv2 RENAME argument decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update NFSv2 diropargs decoding to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv2 READDIR argument decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Add helper to set up the pages where the dirlist is encoded, again
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv2 READLINK argument decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv2 WRITE argument decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv2 READ argument decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv2 GETATTR argument decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the MKNOD3args decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the SYMLINK3args decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the MKDIR3args decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the CREATE3args decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the SETATTR3args decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the LINK3args decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the RENAME3args decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update the NFSv3 DIROPargs decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update COMMIT3arg decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update READDIR3args decoders to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Add helper to set up the pages where the dirlist is encoded
Chuck Lever chuck.lever@oracle.com NFSD: Fix returned READDIR offset cookie
Chuck Lever chuck.lever@oracle.com NFSD: Update READLINK3arg decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update WRITE3arg decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update READ3arg decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update ACCESS3arg decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com NFSD: Update GETATTR3args decoder to use struct xdr_stream
Chuck Lever chuck.lever@oracle.com SUNRPC: Move definition of XDR_UNIT
Chuck Lever chuck.lever@oracle.com SUNRPC: Display RPC procedure names instead of proc numbers
Chuck Lever chuck.lever@oracle.com SUNRPC: Make trace_svc_process() display the RPC procedure symbolically
Chuck Lever chuck.lever@oracle.com NFSD: Restore NFSv4 decoding's SAVEMEM functionality
Chuck Lever chuck.lever@oracle.com NFSD: Fix sparse warning in nfssvc.c
Zheng Yongjun zhengyongjun3@huawei.com fs/lockd: convert comma to semicolon
Waiman Long longman@redhat.com inotify: Increase default inotify.max_user_watches limit to 1048576
Eric W. Biederman ebiederm@xmission.com file: Replace ksys_close with close_fd
Eric W. Biederman ebiederm@xmission.com file: Rename __close_fd to close_fd and remove the files parameter
Eric W. Biederman ebiederm@xmission.com file: Merge __alloc_fd into alloc_fd
Eric W. Biederman ebiederm@xmission.com file: In f_dupfd read RLIMIT_NOFILE once.
Eric W. Biederman ebiederm@xmission.com file: Merge __fd_install into fd_install
Eric W. Biederman ebiederm@xmission.com proc/fd: In fdinfo seq_show don't use get_files_struct
Eric W. Biederman ebiederm@xmission.com proc/fd: In proc_readfd_common use task_lookup_next_fd_rcu
Eric W. Biederman ebiederm@xmission.com file: Implement task_lookup_next_fd_rcu
Eric W. Biederman ebiederm@xmission.com kcmp: In get_file_raw_ptr use task_lookup_fd_rcu
Eric W. Biederman ebiederm@xmission.com proc/fd: In tid_fd_mode use task_lookup_fd_rcu
Eric W. Biederman ebiederm@xmission.com file: Implement task_lookup_fd_rcu
Eric W. Biederman ebiederm@xmission.com file: Rename fcheck lookup_fd_rcu
Eric W. Biederman ebiederm@xmission.com file: Replace fcheck_files with files_lookup_fd_rcu
Eric W. Biederman ebiederm@xmission.com file: Factor files_lookup_fd_locked out of fcheck_files
Eric W. Biederman ebiederm@xmission.com file: Rename __fcheck_files to files_lookup_fd_raw
Chuck Lever chuck.lever@oracle.com Revert "fget: clarify and improve __fget_files() implementation"
Eric W. Biederman ebiederm@xmission.com proc/fd: In proc_fd_link use fget_task
Eric W. Biederman ebiederm@xmission.com bpf: In bpf_task_fd_query use fget_task
Eric W. Biederman ebiederm@xmission.com kcmp: In kcmp_epoll_target use fget_task
Eric W. Biederman ebiederm@xmission.com exec: Remove reset_files_struct
Eric W. Biederman ebiederm@xmission.com exec: Simplify unshare_files
Eric W. Biederman ebiederm@xmission.com exec: Move unshare_files to fix posix file locking during exec
Eric W. Biederman ebiederm@xmission.com exec: Don't open code get_close_on_exec
Trond Myklebust trond.myklebust@hammerspace.com nfsd: Record NFSv4 pre/post-op attributes as non-atomic
Trond Myklebust trond.myklebust@hammerspace.com nfsd: Set PF_LOCAL_THROTTLE on local filesystems only
Trond Myklebust trond.myklebust@hammerspace.com nfsd: Fix up nfsd to ensure that timeout errors don't result in ESTALE
Trond Myklebust trond.myklebust@hammerspace.com exportfs: Add a function to return the raw output from fh_to_dentry()
Jeff Layton jeff.layton@primarydata.com nfsd: close cached files prior to a REMOVE or RENAME that would replace target
Jeff Layton jeff.layton@primarydata.com nfsd: allow filesystems to opt out of subtree checking
Jeff Layton jeff.layton@primarydata.com nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations
J. Bruce Fields bfields@redhat.com Revert "nfsd4: support change_attr_type attribute"
J. Bruce Fields bfields@redhat.com nfsd4: don't query change attribute in v2/v3 case
J. Bruce Fields bfields@redhat.com nfsd: minor nfsd4_change_attribute cleanup
J. Bruce Fields bfields@redhat.com nfsd: simplify nfsd4_change_info
J. Bruce Fields bfields@redhat.com nfsd: only call inode_query_iversion in the I_VERSION case
Chuck Lever chuck.lever@oracle.com NFSD: Remove macros that are no longer used
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_compound()
Chuck Lever chuck.lever@oracle.com NFSD: Make nfsd4_ops::opnum a u32
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_listxattrs()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_setxattr()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_xattr_name()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_clone()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_seek()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_offload_status()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_copy_notify()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_copy()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_nl4_server()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_fallocate()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_reclaim_complete()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_destroy_clientid()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_test_stateid()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_sequence()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_secinfo_no_name()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_layoutreturn()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_layoutget()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_layoutcommit()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_getdeviceinfo()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_free_stateid()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_destroy_session()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_create_session()
Chuck Lever chuck.lever@oracle.com NFSD: Add a helper to decode channel_attrs4
Chuck Lever chuck.lever@oracle.com NFSD: Add a helper to decode nfs_impl_id4
Chuck Lever chuck.lever@oracle.com NFSD: Add a helper to decode state_protect4_a
Chuck Lever chuck.lever@oracle.com NFSD: Add a separate decoder for ssv_sp_parms
Chuck Lever chuck.lever@oracle.com NFSD: Add a separate decoder to handle state_protect_ops
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_bind_conn_to_session()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_backchannel_ctl()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_cb_sec()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_release_lockowner()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_write()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_verify()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_setclientid_confirm()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_setclientid()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_setattr()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_secinfo()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_renew()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_rename()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_remove()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_readdir()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_read()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_putfh()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_open_downgrade()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_open_confirm()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_open()
Chuck Lever chuck.lever@oracle.com NFSD: Add helper to decode OPEN's open_claim4 argument
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_share_deny()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_share_access()
Chuck Lever chuck.lever@oracle.com NFSD: Add helper to decode OPEN's openflag4 argument
Chuck Lever chuck.lever@oracle.com NFSD: Add helper to decode OPEN's createhow4 argument
Chuck Lever chuck.lever@oracle.com NFSD: Add helper to decode NFSv4 verifiers
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_lookup()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_locku()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_lockt()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_lock()
Chuck Lever chuck.lever@oracle.com NFSD: Add helper for decoding locker4
Chuck Lever chuck.lever@oracle.com NFSD: Add helpers to decode a clientid4 and an NFSv4 state owner
Chuck Lever chuck.lever@oracle.com NFSD: Relocate nfsd4_decode_opaque()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_link()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_getattr()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_delegreturn()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_create()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_fattr()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros that decode the fattr4 umask attribute
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros that decode the fattr4 security label attribute
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros that decode the fattr4 time_set attributes
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros that decode the fattr4 owner_group attribute
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros that decode the fattr4 owner attribute
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros that decode the fattr4 mode attribute
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros that decode the fattr4 acl attribute
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros that decode the fattr4 size attribute
Chuck Lever chuck.lever@oracle.com NFSD: Change the way the expected length of a fattr4 is checked
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_commit()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_close()
Chuck Lever chuck.lever@oracle.com NFSD: Replace READ* macros in nfsd4_decode_access()
Chuck Lever chuck.lever@oracle.com NFSD: Replace the internals of the READ_BUF() macro
Chuck Lever chuck.lever@oracle.com NFSD: Add tracepoints in nfsd4_decode/encode_compound()
Chuck Lever chuck.lever@oracle.com NFSD: Add tracepoints in nfsd_dispatch()
Chuck Lever chuck.lever@oracle.com NFSD: Add common helpers to decode void args and encode void results
Chuck Lever chuck.lever@oracle.com SUNRPC: Prepare for xdr_stream-style decoding on the server-side
Chuck Lever chuck.lever@oracle.com SUNRPC: Add xdr_set_scratch_page() and xdr_reset_scratch_buffer()
Huang Guobin huangguobin4@huawei.com nfsd: Fix error return code in nfsd_file_cache_init()
Chuck Lever chuck.lever@oracle.com NFSD: Add SPDX header for fs/nfsd/trace.c
Chuck Lever chuck.lever@oracle.com NFSD: Remove extra "0x" in tracepoint format specifier
Chuck Lever chuck.lever@oracle.com NFSD: Clean up the show_nf_may macro
Alex Shi alex.shi@linux.alibaba.com nfsd/nfs3: remove unused macro nfsd3_fhandleres
Tom Rix trix@redhat.com NFSD: A semicolon is not needed after a switch statement.
Chuck Lever chuck.lever@oracle.com NFSD: Invoke svc_encode_result_payload() in "read" NFSD encoders
Chuck Lever chuck.lever@oracle.com SUNRPC: Rename svc_encode_read_payload()
-------------
Diffstat:
Documentation/filesystems/files.rst | 8 +- Documentation/filesystems/locking.rst | 10 +- Documentation/filesystems/nfs/exporting.rst | 78 + Makefile | 4 +- arch/powerpc/platforms/cell/spufs/coredump.c | 2 +- crypto/algboss.c | 4 +- fs/Kconfig | 6 +- fs/autofs/dev-ioctl.c | 5 +- fs/cachefiles/namei.c | 9 +- fs/cifs/connect.c | 2 +- fs/coredump.c | 5 +- fs/ecryptfs/inode.c | 10 +- fs/exec.c | 29 +- fs/exportfs/expfs.c | 40 +- fs/file.c | 177 +- fs/init.c | 6 +- fs/lockd/clnt4xdr.c | 9 +- fs/lockd/clntproc.c | 3 - fs/lockd/host.c | 4 +- fs/lockd/svc.c | 262 +- fs/lockd/svc4proc.c | 70 +- fs/lockd/svclock.c | 67 +- fs/lockd/svcproc.c | 62 +- fs/lockd/svcsubs.c | 123 +- fs/lockd/svcxdr.h | 142 + fs/lockd/xdr.c | 448 +-- fs/lockd/xdr4.c | 472 ++-- fs/locks.c | 102 +- fs/namei.c | 21 +- fs/nfs/blocklayout/blocklayout.c | 2 +- fs/nfs/blocklayout/dev.c | 2 +- fs/nfs/callback.c | 111 +- fs/nfs/callback_xdr.c | 33 +- fs/nfs/dir.c | 2 +- fs/nfs/export.c | 17 + fs/nfs/file.c | 3 + fs/nfs/filelayout/filelayout.c | 4 +- fs/nfs/filelayout/filelayoutdev.c | 2 +- fs/nfs/flexfilelayout/flexfilelayout.c | 4 +- fs/nfs/flexfilelayout/flexfilelayoutdev.c | 2 +- fs/nfs/nfs42xdr.c | 2 +- fs/nfs/nfs4state.c | 2 +- fs/nfs/nfs4xdr.c | 6 +- fs/nfs/pagelist.c | 3 - fs/nfs/super.c | 8 + fs/nfs/write.c | 3 - fs/nfs_common/Makefile | 2 +- fs/nfs_common/nfs_ssc.c | 2 - fs/nfs_common/nfsacl.c | 123 + fs/nfsd/Kconfig | 36 +- fs/nfsd/Makefile | 8 +- fs/nfsd/acl.h | 6 +- fs/nfsd/blocklayout.c | 1 + fs/nfsd/blocklayoutxdr.c | 1 + fs/nfsd/cache.h | 2 +- fs/nfsd/export.c | 74 +- fs/nfsd/export.h | 16 +- fs/nfsd/filecache.c | 1229 +++++---- fs/nfsd/filecache.h | 23 +- fs/nfsd/flexfilelayout.c | 3 +- fs/nfsd/lockd.c | 10 +- fs/nfsd/netns.h | 63 +- fs/nfsd/nfs2acl.c | 214 +- fs/nfsd/nfs3acl.c | 140 +- fs/nfsd/nfs3proc.c | 396 ++- fs/nfsd/nfs3xdr.c | 1763 ++++++------ fs/nfsd/nfs4acl.c | 45 +- fs/nfsd/nfs4callback.c | 168 +- fs/nfsd/nfs4idmap.c | 9 +- fs/nfsd/nfs4layouts.c | 4 +- fs/nfsd/nfs4proc.c | 1111 +++++--- fs/nfsd/nfs4recover.c | 20 +- fs/nfsd/nfs4state.c | 1725 ++++++++---- fs/nfsd/nfs4xdr.c | 3763 ++++++++++++++------------ fs/nfsd/nfscache.c | 115 +- fs/nfsd/nfsctl.c | 169 +- fs/nfsd/nfsd.h | 50 +- fs/nfsd/nfsfh.c | 291 +- fs/nfsd/nfsfh.h | 179 +- fs/nfsd/nfsproc.c | 262 +- fs/nfsd/nfssvc.c | 356 ++- fs/nfsd/nfsxdr.c | 834 +++--- fs/nfsd/state.h | 69 +- fs/nfsd/stats.c | 126 +- fs/nfsd/stats.h | 96 +- fs/nfsd/trace.c | 1 + fs/nfsd/trace.h | 894 +++++- fs/nfsd/vfs.c | 931 +++---- fs/nfsd/vfs.h | 60 +- fs/nfsd/xdr.h | 68 +- fs/nfsd/xdr3.h | 116 +- fs/nfsd/xdr4.h | 127 +- fs/nfsd/xdr4cb.h | 6 + fs/notify/dnotify/dnotify.c | 17 +- fs/notify/fanotify/fanotify.c | 487 +++- fs/notify/fanotify/fanotify.h | 252 +- fs/notify/fanotify/fanotify_user.c | 882 ++++-- fs/notify/fdinfo.c | 19 +- fs/notify/fsnotify.c | 183 +- fs/notify/fsnotify.h | 19 +- fs/notify/group.c | 38 +- fs/notify/inotify/inotify.h | 11 +- fs/notify/inotify/inotify_fsnotify.c | 12 +- fs/notify/inotify/inotify_user.c | 87 +- fs/notify/mark.c | 172 +- fs/notify/notification.c | 78 +- fs/open.c | 49 +- fs/overlayfs/overlayfs.h | 9 +- fs/proc/fd.c | 48 +- fs/udf/file.c | 2 +- fs/verity/enable.c | 2 +- include/linux/dnotify.h | 2 +- include/linux/errno.h | 1 + include/linux/exportfs.h | 15 + include/linux/fanotify.h | 74 +- include/linux/fdtable.h | 37 +- include/linux/fs.h | 54 +- include/linux/fsnotify.h | 77 +- include/linux/fsnotify_backend.h | 372 ++- include/linux/iversion.h | 13 + include/linux/kallsyms.h | 17 +- include/linux/kthread.h | 1 + include/linux/lockd/bind.h | 3 +- include/linux/lockd/lockd.h | 17 +- include/linux/lockd/xdr.h | 35 +- include/linux/lockd/xdr4.h | 33 +- include/linux/module.h | 24 +- include/linux/nfs.h | 8 - include/linux/nfs4.h | 21 +- include/linux/nfs_ssc.h | 14 + include/linux/nfsacl.h | 6 + include/linux/pid.h | 1 + include/linux/sched/user.h | 3 - include/linux/sunrpc/msg_prot.h | 3 - include/linux/sunrpc/svc.h | 151 +- include/linux/sunrpc/svc_rdma.h | 4 +- include/linux/sunrpc/svc_xprt.h | 16 +- include/linux/sunrpc/svcauth.h | 4 +- include/linux/sunrpc/svcsock.h | 7 +- include/linux/sunrpc/xdr.h | 153 +- include/linux/syscalls.h | 12 - include/linux/sysctl.h | 2 + include/linux/user_namespace.h | 4 + include/trace/events/sunrpc.h | 26 +- include/uapi/linux/fanotify.h | 42 + include/uapi/linux/nfs3.h | 6 + include/uapi/linux/nfsd/nfsfh.h | 105 - kernel/audit_fsnotify.c | 8 +- kernel/audit_tree.c | 2 +- kernel/audit_watch.c | 5 +- kernel/bpf/inode.c | 2 +- kernel/bpf/syscall.c | 20 +- kernel/bpf/task_iter.c | 2 +- kernel/fork.c | 12 +- kernel/kallsyms.c | 8 +- kernel/kcmp.c | 29 +- kernel/kthread.c | 23 +- kernel/livepatch/core.c | 7 +- kernel/module.c | 26 +- kernel/pid.c | 15 +- kernel/sys.c | 2 +- kernel/sysctl.c | 54 +- kernel/trace/trace_kprobe.c | 4 +- kernel/ucount.c | 4 + mm/madvise.c | 2 +- mm/memcontrol.c | 2 +- mm/mincore.c | 2 +- net/bluetooth/bnep/core.c | 2 +- net/bluetooth/cmtp/core.c | 2 +- net/bluetooth/hidp/core.c | 2 +- net/sunrpc/auth_gss/gss_rpc_xdr.c | 2 +- net/sunrpc/auth_gss/svcauth_gss.c | 47 +- net/sunrpc/sched.c | 1 + net/sunrpc/svc.c | 314 ++- net/sunrpc/svc_xprt.c | 104 +- net/sunrpc/svcauth.c | 8 +- net/sunrpc/svcauth_unix.c | 18 +- net/sunrpc/svcsock.c | 32 +- net/sunrpc/xdr.c | 112 +- net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 2 +- net/sunrpc/xprtrdma/svc_rdma_sendto.c | 32 +- net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 +- net/unix/af_unix.c | 2 +- tools/objtool/check.c | 3 +- 184 files changed, 13912 insertions(+), 8825 deletions(-)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 03493bca084fdca48abc59b00e06ce733aa9eb7d ]
Clean up: "result payload" is a less confusing name for these payloads. "READ payload" reflects only the NFS usage.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 2 +- include/linux/sunrpc/svc.h | 6 +++--- include/linux/sunrpc/svc_rdma.h | 4 ++-- include/linux/sunrpc/svc_xprt.h | 4 ++-- net/sunrpc/svc.c | 11 ++++++----- net/sunrpc/svcsock.c | 8 ++++---- net/sunrpc/xprtrdma/svc_rdma_sendto.c | 8 ++++---- net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 +- 8 files changed, 23 insertions(+), 22 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index dbfa24cf33906..9971d3c295731 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -3843,7 +3843,7 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp, read->rd_length = maxcount; if (nfserr) return nfserr; - if (svc_encode_read_payload(resp->rqstp, starting_len + 8, maxcount)) + if (svc_encode_result_payload(resp->rqstp, starting_len + 8, maxcount)) return nfserr_io; xdr_truncate_encode(xdr, starting_len + 8 + xdr_align_size(maxcount));
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 386628b36bc75..c220b734fa69e 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -519,9 +519,9 @@ void svc_wake_up(struct svc_serv *); void svc_reserve(struct svc_rqst *rqstp, int space); struct svc_pool * svc_pool_for_cpu(struct svc_serv *serv, int cpu); char * svc_print_addr(struct svc_rqst *, char *, size_t); -int svc_encode_read_payload(struct svc_rqst *rqstp, - unsigned int offset, - unsigned int length); +int svc_encode_result_payload(struct svc_rqst *rqstp, + unsigned int offset, + unsigned int length); unsigned int svc_fill_write_vector(struct svc_rqst *rqstp, struct page **pages, struct kvec *first, size_t total); diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h index 9dc3a3b88391b..2b870a3f391b1 100644 --- a/include/linux/sunrpc/svc_rdma.h +++ b/include/linux/sunrpc/svc_rdma.h @@ -207,8 +207,8 @@ extern void svc_rdma_send_error_msg(struct svcxprt_rdma *rdma, struct svc_rdma_recv_ctxt *rctxt, int status); extern int svc_rdma_sendto(struct svc_rqst *); -extern int svc_rdma_read_payload(struct svc_rqst *rqstp, unsigned int offset, - unsigned int length); +extern int svc_rdma_result_payload(struct svc_rqst *rqstp, unsigned int offset, + unsigned int length);
/* svc_rdma_transport.c */ extern struct svc_xprt_class svc_rdma_class; diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h index aca35ab5cff24..92455e0d52445 100644 --- a/include/linux/sunrpc/svc_xprt.h +++ b/include/linux/sunrpc/svc_xprt.h @@ -21,8 +21,8 @@ struct svc_xprt_ops { int (*xpo_has_wspace)(struct svc_xprt *); int (*xpo_recvfrom)(struct svc_rqst *); int (*xpo_sendto)(struct svc_rqst *); - int (*xpo_read_payload)(struct svc_rqst *, unsigned int, - unsigned int); + int (*xpo_result_payload)(struct svc_rqst *, unsigned int, + unsigned int); void (*xpo_release_rqst)(struct svc_rqst *); void (*xpo_detach)(struct svc_xprt *); void (*xpo_free)(struct svc_xprt *); diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index cfe8b911ca013..e4e4e203ecdad 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -1626,7 +1626,7 @@ u32 svc_max_payload(const struct svc_rqst *rqstp) EXPORT_SYMBOL_GPL(svc_max_payload);
/** - * svc_encode_read_payload - mark a range of bytes as a READ payload + * svc_encode_result_payload - mark a range of bytes as a result payload * @rqstp: svc_rqst to operate on * @offset: payload's byte offset in rqstp->rq_res * @length: size of payload, in bytes @@ -1634,12 +1634,13 @@ EXPORT_SYMBOL_GPL(svc_max_payload); * Returns zero on success, or a negative errno if a permanent * error occurred. */ -int svc_encode_read_payload(struct svc_rqst *rqstp, unsigned int offset, - unsigned int length) +int svc_encode_result_payload(struct svc_rqst *rqstp, unsigned int offset, + unsigned int length) { - return rqstp->rq_xprt->xpt_ops->xpo_read_payload(rqstp, offset, length); + return rqstp->rq_xprt->xpt_ops->xpo_result_payload(rqstp, offset, + length); } -EXPORT_SYMBOL_GPL(svc_encode_read_payload); +EXPORT_SYMBOL_GPL(svc_encode_result_payload);
/** * svc_fill_write_vector - Construct data argument for VFS write call diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c index 3d5ee042c5015..90f6231d6ed67 100644 --- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -181,8 +181,8 @@ static void svc_set_cmsg_data(struct svc_rqst *rqstp, struct cmsghdr *cmh) } }
-static int svc_sock_read_payload(struct svc_rqst *rqstp, unsigned int offset, - unsigned int length) +static int svc_sock_result_payload(struct svc_rqst *rqstp, unsigned int offset, + unsigned int length) { return 0; } @@ -635,7 +635,7 @@ static const struct svc_xprt_ops svc_udp_ops = { .xpo_create = svc_udp_create, .xpo_recvfrom = svc_udp_recvfrom, .xpo_sendto = svc_udp_sendto, - .xpo_read_payload = svc_sock_read_payload, + .xpo_result_payload = svc_sock_result_payload, .xpo_release_rqst = svc_udp_release_rqst, .xpo_detach = svc_sock_detach, .xpo_free = svc_sock_free, @@ -1209,7 +1209,7 @@ static const struct svc_xprt_ops svc_tcp_ops = { .xpo_create = svc_tcp_create, .xpo_recvfrom = svc_tcp_recvfrom, .xpo_sendto = svc_tcp_sendto, - .xpo_read_payload = svc_sock_read_payload, + .xpo_result_payload = svc_sock_result_payload, .xpo_release_rqst = svc_tcp_release_rqst, .xpo_detach = svc_tcp_sock_detach, .xpo_free = svc_sock_free, diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c index c3d588b149aaa..c8411b4f3492a 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c +++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c @@ -979,19 +979,19 @@ int svc_rdma_sendto(struct svc_rqst *rqstp) }
/** - * svc_rdma_read_payload - special processing for a READ payload + * svc_rdma_result_payload - special processing for a result payload * @rqstp: svc_rqst to operate on * @offset: payload's byte offset in @xdr * @length: size of payload, in bytes * * Returns zero on success. * - * For the moment, just record the xdr_buf location of the READ + * For the moment, just record the xdr_buf location of the result * payload. svc_rdma_sendto will use that location later when * we actually send the payload. */ -int svc_rdma_read_payload(struct svc_rqst *rqstp, unsigned int offset, - unsigned int length) +int svc_rdma_result_payload(struct svc_rqst *rqstp, unsigned int offset, + unsigned int length) { struct svc_rdma_recv_ctxt *rctxt = rqstp->rq_xprt_ctxt;
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c index 5f7e3d12523fe..c895f80df659c 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c @@ -80,7 +80,7 @@ static const struct svc_xprt_ops svc_rdma_ops = { .xpo_create = svc_rdma_create, .xpo_recvfrom = svc_rdma_recvfrom, .xpo_sendto = svc_rdma_sendto, - .xpo_read_payload = svc_rdma_read_payload, + .xpo_result_payload = svc_rdma_result_payload, .xpo_release_rqst = svc_rdma_release_rqst, .xpo_detach = svc_rdma_detach, .xpo_free = svc_rdma_free,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 76e5492b161f555c0fb69cad9eb39a7d8467f5fe ]
Have the NFSD encoders annotate the boundaries of every direct-data-placement eligible result data payload. Then change svcrdma to use that annotation instead of the xdr->page_len when handling Write chunks.
For NFSv4 on RDMA, that enables the ability to recognize multiple result payloads per compound. This is a pre-requisite for supporting multiple Write chunks per RPC transaction.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3xdr.c | 7 +++++ fs/nfsd/nfs4xdr.c | 41 +++++++++++++++++++-------- fs/nfsd/nfsxdr.c | 6 ++++ net/sunrpc/xprtrdma/svc_rdma_sendto.c | 24 +++++----------- 4 files changed, 49 insertions(+), 29 deletions(-)
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 716566da400e1..27b24823f7c42 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -707,6 +707,7 @@ int nfs3svc_encode_readlinkres(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_readlinkres *resp = rqstp->rq_resp; + struct kvec *head = rqstp->rq_res.head;
*p++ = resp->status; p = encode_post_op_attr(rqstp, p, &resp->fh); @@ -720,6 +721,8 @@ nfs3svc_encode_readlinkres(struct svc_rqst *rqstp, __be32 *p) *p = 0; rqstp->rq_res.tail[0].iov_len = 4 - (resp->len&3); } + if (svc_encode_result_payload(rqstp, head->iov_len, resp->len)) + return 0; return 1; } else return xdr_ressize_check(rqstp, p); @@ -730,6 +733,7 @@ int nfs3svc_encode_readres(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_readres *resp = rqstp->rq_resp; + struct kvec *head = rqstp->rq_res.head;
*p++ = resp->status; p = encode_post_op_attr(rqstp, p, &resp->fh); @@ -746,6 +750,9 @@ nfs3svc_encode_readres(struct svc_rqst *rqstp, __be32 *p) *p = 0; rqstp->rq_res.tail[0].iov_len = 4 - (resp->count & 3); } + if (svc_encode_result_payload(rqstp, head->iov_len, + resp->count)) + return 0; return 1; } else return xdr_ressize_check(rqstp, p); diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 9971d3c295731..4b3344296ed0e 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -3770,8 +3770,8 @@ static __be32 nfsd4_encode_splice_read( { struct xdr_stream *xdr = &resp->xdr; struct xdr_buf *buf = xdr->buf; + int status, space_left; u32 eof; - int space_left; __be32 nfserr; __be32 *p = xdr->p - 2;
@@ -3782,14 +3782,13 @@ static __be32 nfsd4_encode_splice_read( nfserr = nfsd_splice_read(read->rd_rqstp, read->rd_fhp, file, read->rd_offset, &maxcount, &eof); read->rd_length = maxcount; - if (nfserr) { - /* - * nfsd_splice_actor may have already messed with the - * page length; reset it so as not to confuse - * xdr_truncate_encode: - */ - buf->page_len = 0; - return nfserr; + if (nfserr) + goto out_err; + status = svc_encode_result_payload(read->rd_rqstp, + buf->head[0].iov_len, maxcount); + if (status) { + nfserr = nfserrno(status); + goto out_err; }
*(p++) = htonl(eof); @@ -3820,6 +3819,15 @@ static __be32 nfsd4_encode_splice_read( xdr->end = (__be32 *)((void *)xdr->end + space_left);
return 0; + +out_err: + /* + * nfsd_splice_actor may have already messed with the + * page length; reset it so as not to confuse + * xdr_truncate_encode in our caller. + */ + buf->page_len = 0; + return nfserr; }
static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp, @@ -3911,6 +3919,7 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd int zero = 0; struct xdr_stream *xdr = &resp->xdr; int length_offset = xdr->buf->len; + int status; __be32 *p;
p = xdr_reserve_space(xdr, 4); @@ -3931,9 +3940,13 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd (char *)p, &maxcount); if (nfserr == nfserr_isdir) nfserr = nfserr_inval; - if (nfserr) { - xdr_truncate_encode(xdr, length_offset); - return nfserr; + if (nfserr) + goto out_err; + status = svc_encode_result_payload(readlink->rl_rqstp, length_offset, + maxcount); + if (status) { + nfserr = nfserrno(status); + goto out_err; }
wire_count = htonl(maxcount); @@ -3943,6 +3956,10 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd write_bytes_to_xdr_buf(xdr->buf, length_offset + 4 + maxcount, &zero, 4 - (maxcount&3)); return 0; + +out_err: + xdr_truncate_encode(xdr, length_offset); + return nfserr; }
static __be32 diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 8a288c8fcd57c..9e00a902113e3 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -469,6 +469,7 @@ int nfssvc_encode_readlinkres(struct svc_rqst *rqstp, __be32 *p) { struct nfsd_readlinkres *resp = rqstp->rq_resp; + struct kvec *head = rqstp->rq_res.head;
*p++ = resp->status; if (resp->status != nfs_ok) @@ -483,6 +484,8 @@ nfssvc_encode_readlinkres(struct svc_rqst *rqstp, __be32 *p) *p = 0; rqstp->rq_res.tail[0].iov_len = 4 - (resp->len&3); } + if (svc_encode_result_payload(rqstp, head->iov_len, resp->len)) + return 0; return 1; }
@@ -490,6 +493,7 @@ int nfssvc_encode_readres(struct svc_rqst *rqstp, __be32 *p) { struct nfsd_readres *resp = rqstp->rq_resp; + struct kvec *head = rqstp->rq_res.head;
*p++ = resp->status; if (resp->status != nfs_ok) @@ -507,6 +511,8 @@ nfssvc_encode_readres(struct svc_rqst *rqstp, __be32 *p) *p = 0; rqstp->rq_res.tail[0].iov_len = 4 - (resp->count&3); } + if (svc_encode_result_payload(rqstp, head->iov_len, resp->count)) + return 0; return 1; }
diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c index c8411b4f3492a..d6436c13d5c47 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c +++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c @@ -448,7 +448,6 @@ static ssize_t svc_rdma_encode_write_chunk(__be32 *src, * svc_rdma_encode_write_list - Encode RPC Reply's Write chunk list * @rctxt: Reply context with information about the RPC Call * @sctxt: Send context for the RPC Reply - * @length: size in bytes of the payload in the first Write chunk * * The client provides a Write chunk list in the Call message. Fill * in the segments in the first Write chunk in the Reply's transport @@ -465,12 +464,12 @@ static ssize_t svc_rdma_encode_write_chunk(__be32 *src, */ static ssize_t svc_rdma_encode_write_list(const struct svc_rdma_recv_ctxt *rctxt, - struct svc_rdma_send_ctxt *sctxt, - unsigned int length) + struct svc_rdma_send_ctxt *sctxt) { ssize_t len, ret;
- ret = svc_rdma_encode_write_chunk(rctxt->rc_write_list, sctxt, length); + ret = svc_rdma_encode_write_chunk(rctxt->rc_write_list, sctxt, + rctxt->rc_read_payload_length); if (ret < 0) return ret; len = ret; @@ -923,21 +922,12 @@ int svc_rdma_sendto(struct svc_rqst *rqstp) goto err0; if (wr_lst) { /* XXX: Presume the client sent only one Write chunk */ - unsigned long offset; - unsigned int length; - - if (rctxt->rc_read_payload_length) { - offset = rctxt->rc_read_payload_offset; - length = rctxt->rc_read_payload_length; - } else { - offset = xdr->head[0].iov_len; - length = xdr->page_len; - } - ret = svc_rdma_send_write_chunk(rdma, wr_lst, xdr, offset, - length); + ret = svc_rdma_send_write_chunk(rdma, wr_lst, xdr, + rctxt->rc_read_payload_offset, + rctxt->rc_read_payload_length); if (ret < 0) goto err2; - if (svc_rdma_encode_write_list(rctxt, sctxt, length) < 0) + if (svc_rdma_encode_write_list(rctxt, sctxt) < 0) goto err0; } else { if (xdr_stream_encode_item_absent(&sctxt->sc_stream) < 0)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tom Rix trix@redhat.com
[ Upstream commit 25fef48bdbe7cac5ba5577eab6a750e1caea43bc ]
Signed-off-by: Tom Rix trix@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 4b3344296ed0e..fd9107332a20f 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2558,7 +2558,7 @@ static u32 nfs4_file_type(umode_t mode) case S_IFREG: return NF4REG; case S_IFSOCK: return NF4SOCK; default: return NF4BAD; - }; + } }
static inline __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Shi alex.shi@linux.alibaba.com
[ Upstream commit 71fd721839a74d945c242299f6be29a246fc2131 ]
The macro is unused, remove it to tame gcc warning: fs/nfsd/nfs3proc.c:702:0: warning: macro "nfsd3_fhandleres" is not used [-Wunused-macros]
Signed-off-by: Alex Shi alex.shi@linux.alibaba.com Cc: "J. Bruce Fields" bfields@fieldses.org Cc: Chuck Lever chuck.lever@oracle.com Cc: linux-nfs@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3proc.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 981a4e4c9a3cf..eba84417406ca 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -694,7 +694,6 @@ nfsd3_proc_commit(struct svc_rqst *rqstp) #define nfsd3_mkdirargs nfsd3_createargs #define nfsd3_readdirplusargs nfsd3_readdirargs #define nfsd3_fhandleargs nfsd_fhandle -#define nfsd3_fhandleres nfsd3_attrstat #define nfsd3_attrstatres nfsd3_attrstat #define nfsd3_wccstatres nfsd3_attrstat #define nfsd3_createres nfsd3_diropres
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit b76278ae68848cea13b325d247aa5cf31c87edac ]
Display all currently possible NFSD_MAY permission flags.
Move and rename show_nf_may with a more generic name because the NFSD_MAY permission flags are used in other places besides the file cache.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/trace.h | 40 ++++++++++++++++++++++++++-------------- 1 file changed, 26 insertions(+), 14 deletions(-)
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index a952f4a9b2a68..7bb1c398daa51 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -12,6 +12,22 @@ #include "export.h" #include "nfsfh.h"
+#define show_nfsd_may_flags(x) \ + __print_flags(x, "|", \ + { NFSD_MAY_EXEC, "EXEC" }, \ + { NFSD_MAY_WRITE, "WRITE" }, \ + { NFSD_MAY_READ, "READ" }, \ + { NFSD_MAY_SATTR, "SATTR" }, \ + { NFSD_MAY_TRUNC, "TRUNC" }, \ + { NFSD_MAY_LOCK, "LOCK" }, \ + { NFSD_MAY_OWNER_OVERRIDE, "OWNER_OVERRIDE" }, \ + { NFSD_MAY_LOCAL_ACCESS, "LOCAL_ACCESS" }, \ + { NFSD_MAY_BYPASS_GSS_ON_ROOT, "BYPASS_GSS_ON_ROOT" }, \ + { NFSD_MAY_NOT_BREAK_LEASE, "NOT_BREAK_LEASE" }, \ + { NFSD_MAY_BYPASS_GSS, "BYPASS_GSS" }, \ + { NFSD_MAY_READ_IF_EXEC, "READ_IF_EXEC" }, \ + { NFSD_MAY_64BIT_COOKIE, "64BIT_COOKIE" }) + TRACE_EVENT(nfsd_compound, TP_PROTO(const struct svc_rqst *rqst, u32 args_opcnt), @@ -392,6 +408,9 @@ TRACE_EVENT(nfsd_clid_inuse_err, __entry->cl_boot, __entry->cl_id) )
+/* + * from fs/nfsd/filecache.h + */ TRACE_DEFINE_ENUM(NFSD_FILE_HASHED); TRACE_DEFINE_ENUM(NFSD_FILE_PENDING); TRACE_DEFINE_ENUM(NFSD_FILE_BREAK_READ); @@ -406,13 +425,6 @@ TRACE_DEFINE_ENUM(NFSD_FILE_REFERENCED); { 1 << NFSD_FILE_BREAK_WRITE, "BREAK_WRITE" }, \ { 1 << NFSD_FILE_REFERENCED, "REFERENCED"})
-/* FIXME: This should probably be fleshed out in the future. */ -#define show_nf_may(val) \ - __print_flags(val, "|", \ - { NFSD_MAY_READ, "READ" }, \ - { NFSD_MAY_WRITE, "WRITE" }, \ - { NFSD_MAY_NOT_BREAK_LEASE, "NOT_BREAK_LEASE" }) - DECLARE_EVENT_CLASS(nfsd_file_class, TP_PROTO(struct nfsd_file *nf), TP_ARGS(nf), @@ -437,7 +449,7 @@ DECLARE_EVENT_CLASS(nfsd_file_class, __entry->nf_inode, __entry->nf_ref, show_nf_flags(__entry->nf_flags), - show_nf_may(__entry->nf_may), + show_nfsd_may_flags(__entry->nf_may), __entry->nf_file) )
@@ -463,10 +475,10 @@ TRACE_EVENT(nfsd_file_acquire, __field(u32, xid) __field(unsigned int, hash) __field(void *, inode) - __field(unsigned int, may_flags) + __field(unsigned long, may_flags) __field(int, nf_ref) __field(unsigned long, nf_flags) - __field(unsigned char, nf_may) + __field(unsigned long, nf_may) __field(struct file *, nf_file) __field(u32, status) ), @@ -485,10 +497,10 @@ TRACE_EVENT(nfsd_file_acquire,
TP_printk("xid=0x%x hash=0x%x inode=0x%p may_flags=%s ref=%d nf_flags=%s nf_may=%s nf_file=0x%p status=%u", __entry->xid, __entry->hash, __entry->inode, - show_nf_may(__entry->may_flags), __entry->nf_ref, - show_nf_flags(__entry->nf_flags), - show_nf_may(__entry->nf_may), __entry->nf_file, - __entry->status) + show_nfsd_may_flags(__entry->may_flags), + __entry->nf_ref, show_nf_flags(__entry->nf_flags), + show_nfsd_may_flags(__entry->nf_may), + __entry->nf_file, __entry->status) );
DECLARE_EVENT_CLASS(nfsd_file_search_class,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 3a90e1dff16afdae6e1c918bfaff24f4d0f84869 ]
Clean up: %p adds its own 0x already.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/trace.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 7bb1c398daa51..9239d97b682c7 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -444,7 +444,7 @@ DECLARE_EVENT_CLASS(nfsd_file_class, __entry->nf_may = nf->nf_may; __entry->nf_file = nf->nf_file; ), - TP_printk("hash=0x%x inode=0x%p ref=%d flags=%s may=%s file=%p", + TP_printk("hash=0x%x inode=%p ref=%d flags=%s may=%s file=%p", __entry->nf_hashval, __entry->nf_inode, __entry->nf_ref, @@ -495,7 +495,7 @@ TRACE_EVENT(nfsd_file_acquire, __entry->status = be32_to_cpu(status); ),
- TP_printk("xid=0x%x hash=0x%x inode=0x%p may_flags=%s ref=%d nf_flags=%s nf_may=%s nf_file=0x%p status=%u", + TP_printk("xid=0x%x hash=0x%x inode=%p may_flags=%s ref=%d nf_flags=%s nf_may=%s nf_file=%p status=%u", __entry->xid, __entry->hash, __entry->inode, show_nfsd_may_flags(__entry->may_flags), __entry->nf_ref, show_nf_flags(__entry->nf_flags), @@ -516,7 +516,7 @@ DECLARE_EVENT_CLASS(nfsd_file_search_class, __entry->hash = hash; __entry->found = found; ), - TP_printk("hash=0x%x inode=0x%p found=%d", __entry->hash, + TP_printk("hash=0x%x inode=%p found=%d", __entry->hash, __entry->inode, __entry->found) );
@@ -544,7 +544,7 @@ TRACE_EVENT(nfsd_file_fsnotify_handle_event, __entry->mode = inode->i_mode; __entry->mask = mask; ), - TP_printk("inode=0x%p nlink=%u mode=0%ho mask=0x%x", __entry->inode, + TP_printk("inode=%p nlink=%u mode=0%ho mask=0x%x", __entry->inode, __entry->nlink, __entry->mode, __entry->mask) );
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit f45a444cfe582b85af937a30d35d68d9a84399dd ]
Clean up.
The file was contributed in 2014 by Christoph Hellwig in commit 31ef83dc0538 ("nfsd: add trace events").
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/trace.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/nfsd/trace.c b/fs/nfsd/trace.c index 90967466a1e56..f008b95ceec2e 100644 --- a/fs/nfsd/trace.c +++ b/fs/nfsd/trace.c @@ -1,3 +1,4 @@ +// SPDX-License-Identifier: GPL-2.0
#define CREATE_TRACE_POINTS #include "trace.h"
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Huang Guobin huangguobin4@huawei.com
[ Upstream commit 231307df246eb29f30092836524ebb1fcb8f5b25 ]
Fix to return PTR_ERR() error code from the error handling case instead of 0 in function nfsd_file_cache_init(), as done elsewhere in this function.
Fixes: 65294c1f2c5e7("nfsd: add a new struct file caching facility to nfsd") Signed-off-by: Huang Guobin huangguobin4@huawei.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index e30e1ddc1aceb..d0748ff04b92f 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -684,6 +684,7 @@ nfsd_file_cache_init(void) if (IS_ERR(nfsd_file_fsnotify_group)) { pr_err("nfsd: unable to create fsnotify group: %ld\n", PTR_ERR(nfsd_file_fsnotify_group)); + ret = PTR_ERR(nfsd_file_fsnotify_group); nfsd_file_fsnotify_group = NULL; goto out_notifier; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 0ae4c3e8a64ace1b8d7de033b0751afe43024416 ]
Clean up: De-duplicate some frequently-used code.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/blocklayout/blocklayout.c | 2 +- fs/nfs/blocklayout/dev.c | 2 +- fs/nfs/dir.c | 2 +- fs/nfs/filelayout/filelayout.c | 2 +- fs/nfs/filelayout/filelayoutdev.c | 2 +- fs/nfs/flexfilelayout/flexfilelayout.c | 2 +- fs/nfs/flexfilelayout/flexfilelayoutdev.c | 2 +- fs/nfs/nfs42xdr.c | 2 +- fs/nfs/nfs4xdr.c | 6 ++-- fs/nfsd/nfs4proc.c | 2 +- include/linux/sunrpc/xdr.h | 44 ++++++++++++++++++++++- net/sunrpc/auth_gss/gss_rpc_xdr.c | 2 +- net/sunrpc/xdr.c | 28 +++------------ 13 files changed, 59 insertions(+), 39 deletions(-)
diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c index 73000aa2d220b..a9e563145e0c2 100644 --- a/fs/nfs/blocklayout/blocklayout.c +++ b/fs/nfs/blocklayout/blocklayout.c @@ -699,7 +699,7 @@ bl_alloc_lseg(struct pnfs_layout_hdr *lo, struct nfs4_layoutget_res *lgr,
xdr_init_decode_pages(&xdr, &buf, lgr->layoutp->pages, lgr->layoutp->len); - xdr_set_scratch_buffer(&xdr, page_address(scratch), PAGE_SIZE); + xdr_set_scratch_page(&xdr, scratch);
status = -EIO; p = xdr_inline_decode(&xdr, 4); diff --git a/fs/nfs/blocklayout/dev.c b/fs/nfs/blocklayout/dev.c index 6e3a14fdff9c8..16412d6636e86 100644 --- a/fs/nfs/blocklayout/dev.c +++ b/fs/nfs/blocklayout/dev.c @@ -510,7 +510,7 @@ bl_alloc_deviceid_node(struct nfs_server *server, struct pnfs_device *pdev, goto out;
xdr_init_decode_pages(&xdr, &buf, pdev->pages, pdev->pglen); - xdr_set_scratch_buffer(&xdr, page_address(scratch), PAGE_SIZE); + xdr_set_scratch_page(&xdr, scratch);
p = xdr_inline_decode(&xdr, sizeof(__be32)); if (!p) diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c index 9f88ca7b20015..935029632d5f6 100644 --- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -576,7 +576,7 @@ int nfs_readdir_page_filler(nfs_readdir_descriptor_t *desc, struct nfs_entry *en goto out_nopages;
xdr_init_decode_pages(&stream, &buf, xdr_pages, buflen); - xdr_set_scratch_buffer(&stream, page_address(scratch), PAGE_SIZE); + xdr_set_scratch_page(&stream, scratch);
do { if (entry->label) diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c index deecfb50dd7e3..45eec08ec904f 100644 --- a/fs/nfs/filelayout/filelayout.c +++ b/fs/nfs/filelayout/filelayout.c @@ -666,7 +666,7 @@ filelayout_decode_layout(struct pnfs_layout_hdr *flo, return -ENOMEM;
xdr_init_decode_pages(&stream, &buf, lgr->layoutp->pages, lgr->layoutp->len); - xdr_set_scratch_buffer(&stream, page_address(scratch), PAGE_SIZE); + xdr_set_scratch_page(&stream, scratch);
/* 20 = ufl_util (4), first_stripe_index (4), pattern_offset (8), * num_fh (4) */ diff --git a/fs/nfs/filelayout/filelayoutdev.c b/fs/nfs/filelayout/filelayoutdev.c index d913e818858f3..86c3f7e69ec42 100644 --- a/fs/nfs/filelayout/filelayoutdev.c +++ b/fs/nfs/filelayout/filelayoutdev.c @@ -82,7 +82,7 @@ nfs4_fl_alloc_deviceid_node(struct nfs_server *server, struct pnfs_device *pdev, goto out_err;
xdr_init_decode_pages(&stream, &buf, pdev->pages, pdev->pglen); - xdr_set_scratch_buffer(&stream, page_address(scratch), PAGE_SIZE); + xdr_set_scratch_page(&stream, scratch);
/* Get the stripe count (number of stripe index) */ p = xdr_inline_decode(&stream, 4); diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c index e4f2820ba5a59..f2ae271fe7ec7 100644 --- a/fs/nfs/flexfilelayout/flexfilelayout.c +++ b/fs/nfs/flexfilelayout/flexfilelayout.c @@ -378,7 +378,7 @@ ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh,
xdr_init_decode_pages(&stream, &buf, lgr->layoutp->pages, lgr->layoutp->len); - xdr_set_scratch_buffer(&stream, page_address(scratch), PAGE_SIZE); + xdr_set_scratch_page(&stream, scratch);
/* stripe unit and mirror_array_cnt */ rc = -EIO; diff --git a/fs/nfs/flexfilelayout/flexfilelayoutdev.c b/fs/nfs/flexfilelayout/flexfilelayoutdev.c index 1f12297109b41..bfa7202ca7be1 100644 --- a/fs/nfs/flexfilelayout/flexfilelayoutdev.c +++ b/fs/nfs/flexfilelayout/flexfilelayoutdev.c @@ -69,7 +69,7 @@ nfs4_ff_alloc_deviceid_node(struct nfs_server *server, struct pnfs_device *pdev, INIT_LIST_HEAD(&dsaddrs);
xdr_init_decode_pages(&stream, &buf, pdev->pages, pdev->pglen); - xdr_set_scratch_buffer(&stream, page_address(scratch), PAGE_SIZE); + xdr_set_scratch_page(&stream, scratch);
/* multipath count */ p = xdr_inline_decode(&stream, 4); diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c index f2248d9d4db51..df5bee2f505c4 100644 --- a/fs/nfs/nfs42xdr.c +++ b/fs/nfs/nfs42xdr.c @@ -1536,7 +1536,7 @@ static int nfs4_xdr_dec_listxattrs(struct rpc_rqst *rqstp, struct compound_hdr hdr; int status;
- xdr_set_scratch_buffer(xdr, page_address(res->scratch), PAGE_SIZE); + xdr_set_scratch_page(xdr, res->scratch);
status = decode_compound_hdr(xdr, &hdr); if (status) diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c index f1e599553f2be..4e5c6cb770ad5 100644 --- a/fs/nfs/nfs4xdr.c +++ b/fs/nfs/nfs4xdr.c @@ -6404,10 +6404,8 @@ nfs4_xdr_dec_getacl(struct rpc_rqst *rqstp, struct xdr_stream *xdr, struct compound_hdr hdr; int status;
- if (res->acl_scratch != NULL) { - void *p = page_address(res->acl_scratch); - xdr_set_scratch_buffer(xdr, p, PAGE_SIZE); - } + if (res->acl_scratch != NULL) + xdr_set_scratch_page(xdr, res->acl_scratch); status = decode_compound_hdr(xdr, &hdr); if (status) goto out; diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index e84996c3867c7..c01b6f7462fcb 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -2266,7 +2266,7 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp, xdr->end = head->iov_base + PAGE_SIZE - rqstp->rq_auth_slack; /* Tail and page_len should be zero at this point: */ buf->len = buf->head[0].iov_len; - xdr->scratch.iov_len = 0; + xdr_reset_scratch_buffer(xdr); xdr->page_ptr = buf->pages - 1; buf->buflen = PAGE_SIZE * (1 + rqstp->rq_page_end - buf->pages) - rqstp->rq_auth_slack; diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h index 6d9d1520612b8..0c8cab6210b3b 100644 --- a/include/linux/sunrpc/xdr.h +++ b/include/linux/sunrpc/xdr.h @@ -246,7 +246,6 @@ extern void xdr_init_decode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p, struct rpc_rqst *rqst); extern void xdr_init_decode_pages(struct xdr_stream *xdr, struct xdr_buf *buf, struct page **pages, unsigned int len); -extern void xdr_set_scratch_buffer(struct xdr_stream *xdr, void *buf, size_t buflen); extern __be32 *xdr_inline_decode(struct xdr_stream *xdr, size_t nbytes); extern unsigned int xdr_read_pages(struct xdr_stream *xdr, unsigned int len); extern void xdr_enter_page(struct xdr_stream *xdr, unsigned int len); @@ -254,6 +253,49 @@ extern int xdr_process_buf(struct xdr_buf *buf, unsigned int offset, unsigned in extern uint64_t xdr_align_data(struct xdr_stream *, uint64_t, uint32_t); extern uint64_t xdr_expand_hole(struct xdr_stream *, uint64_t, uint64_t);
+/** + * xdr_set_scratch_buffer - Attach a scratch buffer for decoding data. + * @xdr: pointer to xdr_stream struct + * @buf: pointer to an empty buffer + * @buflen: size of 'buf' + * + * The scratch buffer is used when decoding from an array of pages. + * If an xdr_inline_decode() call spans across page boundaries, then + * we copy the data into the scratch buffer in order to allow linear + * access. + */ +static inline void +xdr_set_scratch_buffer(struct xdr_stream *xdr, void *buf, size_t buflen) +{ + xdr->scratch.iov_base = buf; + xdr->scratch.iov_len = buflen; +} + +/** + * xdr_set_scratch_page - Attach a scratch buffer for decoding data + * @xdr: pointer to xdr_stream struct + * @page: an anonymous page + * + * See xdr_set_scratch_buffer(). + */ +static inline void +xdr_set_scratch_page(struct xdr_stream *xdr, struct page *page) +{ + xdr_set_scratch_buffer(xdr, page_address(page), PAGE_SIZE); +} + +/** + * xdr_reset_scratch_buffer - Clear scratch buffer information + * @xdr: pointer to xdr_stream struct + * + * See xdr_set_scratch_buffer(). + */ +static inline void +xdr_reset_scratch_buffer(struct xdr_stream *xdr) +{ + xdr_set_scratch_buffer(xdr, NULL, 0); +} + /** * xdr_stream_remaining - Return the number of bytes remaining in the stream * @xdr: pointer to struct xdr_stream diff --git a/net/sunrpc/auth_gss/gss_rpc_xdr.c b/net/sunrpc/auth_gss/gss_rpc_xdr.c index e265b8d38aa14..a857fc99431ce 100644 --- a/net/sunrpc/auth_gss/gss_rpc_xdr.c +++ b/net/sunrpc/auth_gss/gss_rpc_xdr.c @@ -800,7 +800,7 @@ int gssx_dec_accept_sec_context(struct rpc_rqst *rqstp, scratch = alloc_page(GFP_KERNEL); if (!scratch) return -ENOMEM; - xdr_set_scratch_buffer(xdr, page_address(scratch), PAGE_SIZE); + xdr_set_scratch_page(xdr, scratch);
/* res->status */ err = gssx_dec_status(xdr, &res->status); diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c index d84bb5037bb5b..02adc5c7f034d 100644 --- a/net/sunrpc/xdr.c +++ b/net/sunrpc/xdr.c @@ -669,7 +669,7 @@ void xdr_init_encode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p, struct kvec *iov = buf->head; int scratch_len = buf->buflen - buf->page_len - buf->tail[0].iov_len;
- xdr_set_scratch_buffer(xdr, NULL, 0); + xdr_reset_scratch_buffer(xdr); BUG_ON(scratch_len < 0); xdr->buf = buf; xdr->iov = iov; @@ -713,7 +713,7 @@ inline void xdr_commit_encode(struct xdr_stream *xdr) page = page_address(*xdr->page_ptr); memcpy(xdr->scratch.iov_base, page, shift); memmove(page, page + shift, (void *)xdr->p - page); - xdr->scratch.iov_len = 0; + xdr_reset_scratch_buffer(xdr); } EXPORT_SYMBOL_GPL(xdr_commit_encode);
@@ -743,8 +743,7 @@ static __be32 *xdr_get_next_encode_buffer(struct xdr_stream *xdr, * the "scratch" iov to track any temporarily unused fragment of * space at the end of the previous buffer: */ - xdr->scratch.iov_base = xdr->p; - xdr->scratch.iov_len = frag1bytes; + xdr_set_scratch_buffer(xdr, xdr->p, frag1bytes); p = page_address(*xdr->page_ptr); /* * Note this is where the next encode will start after we've @@ -1056,8 +1055,7 @@ void xdr_init_decode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p, struct rpc_rqst *rqst) { xdr->buf = buf; - xdr->scratch.iov_base = NULL; - xdr->scratch.iov_len = 0; + xdr_reset_scratch_buffer(xdr); xdr->nwords = XDR_QUADLEN(buf->len); if (buf->head[0].iov_len != 0) xdr_set_iov(xdr, buf->head, buf->len); @@ -1105,24 +1103,6 @@ static __be32 * __xdr_inline_decode(struct xdr_stream *xdr, size_t nbytes) return p; }
-/** - * xdr_set_scratch_buffer - Attach a scratch buffer for decoding data. - * @xdr: pointer to xdr_stream struct - * @buf: pointer to an empty buffer - * @buflen: size of 'buf' - * - * The scratch buffer is used when decoding from an array of pages. - * If an xdr_inline_decode() call spans across page boundaries, then - * we copy the data into the scratch buffer in order to allow linear - * access. - */ -void xdr_set_scratch_buffer(struct xdr_stream *xdr, void *buf, size_t buflen) -{ - xdr->scratch.iov_base = buf; - xdr->scratch.iov_len = buflen; -} -EXPORT_SYMBOL_GPL(xdr_set_scratch_buffer); - static __be32 *xdr_copy_to_scratch(struct xdr_stream *xdr, size_t nbytes) { __be32 *p;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 5191955d6fc65e6d4efe8f4f10a6028298f57281 ]
A "permanent" struct xdr_stream is allocated in struct svc_rqst so that it is usable by all server-side decoders. A per-rqst scratch buffer is also allocated to handle decoding XDR data items that cross page boundaries.
To demonstrate how it will be used, add the first call site for the new svcxdr_init_decode() API.
As an additional part of the overall conversion, add symbolic constants for successful and failed XDR operations. Returning "0" is overloaded. Sometimes it means something failed, but sometimes it means success. To make it more clear when XDR decoding functions succeed or fail, introduce symbolic constants.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfssvc.c | 2 ++ include/linux/sunrpc/svc.h | 16 ++++++++++++++++ net/sunrpc/svc.c | 5 +++++ 3 files changed, 23 insertions(+)
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 2e61a565cdbd8..311b287c5024f 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -1052,6 +1052,8 @@ int nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp) * (necessary in the NFSv4.0 compound case) */ rqstp->rq_cachetype = proc->pc_cachetype; + + svcxdr_init_decode(rqstp); if (!proc->pc_decode(rqstp, argv->iov_base)) goto out_decode_err;
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index c220b734fa69e..34c2a69820e93 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -247,6 +247,8 @@ struct svc_rqst {
size_t rq_xprt_hlen; /* xprt header len */ struct xdr_buf rq_arg; + struct xdr_stream rq_arg_stream; + struct page *rq_scratch_page; struct xdr_buf rq_res; struct page *rq_pages[RPCSVC_MAXPAGES + 1]; struct page * *rq_respages; /* points into rq_pages */ @@ -557,4 +559,18 @@ static inline void svc_reserve_auth(struct svc_rqst *rqstp, int space) svc_reserve(rqstp, space + rqstp->rq_auth_slack); }
+/** + * svcxdr_init_decode - Prepare an xdr_stream for svc Call decoding + * @rqstp: controlling server RPC transaction context + * + */ +static inline void svcxdr_init_decode(struct svc_rqst *rqstp) +{ + struct xdr_stream *xdr = &rqstp->rq_arg_stream; + struct kvec *argv = rqstp->rq_arg.head; + + xdr_init_decode(xdr, &rqstp->rq_arg, argv->iov_base, NULL); + xdr_set_scratch_page(xdr, rqstp->rq_scratch_page); +} + #endif /* SUNRPC_SVC_H */ diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index e4e4e203ecdad..73e02ba4ed53a 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -614,6 +614,10 @@ svc_rqst_alloc(struct svc_serv *serv, struct svc_pool *pool, int node) rqstp->rq_server = serv; rqstp->rq_pool = pool;
+ rqstp->rq_scratch_page = alloc_pages_node(node, GFP_KERNEL, 0); + if (!rqstp->rq_scratch_page) + goto out_enomem; + rqstp->rq_argp = kmalloc_node(serv->sv_xdrsize, GFP_KERNEL, node); if (!rqstp->rq_argp) goto out_enomem; @@ -846,6 +850,7 @@ void svc_rqst_free(struct svc_rqst *rqstp) { svc_release_buffer(rqstp); + put_page(rqstp->rq_scratch_page); kfree(rqstp->rq_resp); kfree(rqstp->rq_argp); kfree(rqstp->rq_auth_data);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 788f7183fba86b46074c16e7d57ea09302badff4 ]
Start off the conversion to xdr_stream by de-duplicating the functions that decode void arguments and encode void results.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs2acl.c | 21 ++++----------------- fs/nfsd/nfs3acl.c | 8 ++++---- fs/nfsd/nfs3proc.c | 10 ++++------ fs/nfsd/nfs3xdr.c | 11 ----------- fs/nfsd/nfs4proc.c | 11 ++++------- fs/nfsd/nfs4xdr.c | 12 ------------ fs/nfsd/nfsd.h | 8 ++++++++ fs/nfsd/nfsproc.c | 25 ++++++++++++------------- fs/nfsd/nfssvc.c | 28 ++++++++++++++++++++++++++++ fs/nfsd/nfsxdr.c | 10 ---------- fs/nfsd/xdr.h | 2 -- fs/nfsd/xdr3.h | 2 -- fs/nfsd/xdr4.h | 2 -- 13 files changed, 64 insertions(+), 86 deletions(-)
diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index 6a900f770dd23..b0f66604532a5 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -185,10 +185,6 @@ static __be32 nfsacld_proc_access(struct svc_rqst *rqstp) /* * XDR decode functions */ -static int nfsaclsvc_decode_voidarg(struct svc_rqst *rqstp, __be32 *p) -{ - return 1; -}
static int nfsaclsvc_decode_getaclargs(struct svc_rqst *rqstp, __be32 *p) { @@ -255,15 +251,6 @@ static int nfsaclsvc_decode_accessargs(struct svc_rqst *rqstp, __be32 *p) * XDR encode functions */
-/* - * There must be an encoding function for void results so svc_process - * will work properly. - */ -static int nfsaclsvc_encode_voidres(struct svc_rqst *rqstp, __be32 *p) -{ - return xdr_ressize_check(rqstp, p); -} - /* GETACL */ static int nfsaclsvc_encode_getaclres(struct svc_rqst *rqstp, __be32 *p) { @@ -378,10 +365,10 @@ struct nfsd3_voidargs { int dummy; }; static const struct svc_procedure nfsd_acl_procedures2[5] = { [ACLPROC2_NULL] = { .pc_func = nfsacld_proc_null, - .pc_decode = nfsaclsvc_decode_voidarg, - .pc_encode = nfsaclsvc_encode_voidres, - .pc_argsize = sizeof(struct nfsd3_voidargs), - .pc_ressize = sizeof(struct nfsd3_voidargs), + .pc_decode = nfssvc_decode_voidarg, + .pc_encode = nfssvc_encode_voidres, + .pc_argsize = sizeof(struct nfsd_voidargs), + .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST, }, diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c index 34a394e50e1d1..7c30876a31a1b 100644 --- a/fs/nfsd/nfs3acl.c +++ b/fs/nfsd/nfs3acl.c @@ -245,10 +245,10 @@ struct nfsd3_voidargs { int dummy; }; static const struct svc_procedure nfsd_acl_procedures3[3] = { [ACLPROC3_NULL] = { .pc_func = nfsd3_proc_null, - .pc_decode = nfs3svc_decode_voidarg, - .pc_encode = nfs3svc_encode_voidres, - .pc_argsize = sizeof(struct nfsd3_voidargs), - .pc_ressize = sizeof(struct nfsd3_voidargs), + .pc_decode = nfssvc_decode_voidarg, + .pc_encode = nfssvc_encode_voidres, + .pc_argsize = sizeof(struct nfsd_voidargs), + .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST, }, diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index eba84417406ca..3257233d1a655 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -697,8 +697,6 @@ nfsd3_proc_commit(struct svc_rqst *rqstp) #define nfsd3_attrstatres nfsd3_attrstat #define nfsd3_wccstatres nfsd3_attrstat #define nfsd3_createres nfsd3_diropres -#define nfsd3_voidres nfsd3_voidargs -struct nfsd3_voidargs { int dummy; };
#define ST 1 /* status*/ #define FH 17 /* filehandle with length */ @@ -709,10 +707,10 @@ struct nfsd3_voidargs { int dummy; }; static const struct svc_procedure nfsd_procedures3[22] = { [NFS3PROC_NULL] = { .pc_func = nfsd3_proc_null, - .pc_decode = nfs3svc_decode_voidarg, - .pc_encode = nfs3svc_encode_voidres, - .pc_argsize = sizeof(struct nfsd3_voidargs), - .pc_ressize = sizeof(struct nfsd3_voidres), + .pc_decode = nfssvc_decode_voidarg, + .pc_encode = nfssvc_encode_voidres, + .pc_argsize = sizeof(struct nfsd_voidargs), + .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST, }, diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 27b24823f7c42..0d75d201db1b3 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -304,11 +304,6 @@ void fill_post_wcc(struct svc_fh *fhp) /* * XDR decode functions */ -int -nfs3svc_decode_voidarg(struct svc_rqst *rqstp, __be32 *p) -{ - return 1; -}
int nfs3svc_decode_fhandle(struct svc_rqst *rqstp, __be32 *p) @@ -642,12 +637,6 @@ nfs3svc_decode_commitargs(struct svc_rqst *rqstp, __be32 *p) * XDR encode functions */
-int -nfs3svc_encode_voidres(struct svc_rqst *rqstp, __be32 *p) -{ - return xdr_ressize_check(rqstp, p); -} - /* GETATTR */ int nfs3svc_encode_attrstat(struct svc_rqst *rqstp, __be32 *p) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index c01b6f7462fcb..55a46e7c152b9 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -3285,16 +3285,13 @@ static const char *nfsd4_op_name(unsigned opnum) return "unknown_operation"; }
-#define nfsd4_voidres nfsd4_voidargs -struct nfsd4_voidargs { int dummy; }; - static const struct svc_procedure nfsd_procedures4[2] = { [NFSPROC4_NULL] = { .pc_func = nfsd4_proc_null, - .pc_decode = nfs4svc_decode_voidarg, - .pc_encode = nfs4svc_encode_voidres, - .pc_argsize = sizeof(struct nfsd4_voidargs), - .pc_ressize = sizeof(struct nfsd4_voidres), + .pc_decode = nfssvc_decode_voidarg, + .pc_encode = nfssvc_encode_voidres, + .pc_argsize = sizeof(struct nfsd_voidargs), + .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = 1, }, diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index fd9107332a20f..c176d0090f7c8 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -5288,12 +5288,6 @@ nfsd4_encode_replay(struct xdr_stream *xdr, struct nfsd4_op *op) p = xdr_encode_opaque_fixed(p, rp->rp_buf, rp->rp_buflen); }
-int -nfs4svc_encode_voidres(struct svc_rqst *rqstp, __be32 *p) -{ - return xdr_ressize_check(rqstp, p); -} - void nfsd4_release_compoundargs(struct svc_rqst *rqstp) { struct nfsd4_compoundargs *args = rqstp->rq_argp; @@ -5311,12 +5305,6 @@ void nfsd4_release_compoundargs(struct svc_rqst *rqstp) } }
-int -nfs4svc_decode_voidarg(struct svc_rqst *rqstp, __be32 *p) -{ - return 1; -} - int nfs4svc_decode_compoundargs(struct svc_rqst *rqstp, __be32 *p) { diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 4362d295ed340..3eaa81e001f9c 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -73,6 +73,14 @@ extern unsigned long nfsd_drc_mem_used;
extern const struct seq_operations nfs_exports_op;
+/* + * Common void argument and result helpers + */ +struct nfsd_voidargs { }; +struct nfsd_voidres { }; +int nfssvc_decode_voidarg(struct svc_rqst *rqstp, __be32 *p); +int nfssvc_encode_voidres(struct svc_rqst *rqstp, __be32 *p); + /* * Function prototypes. */ diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index bbd01e8397f6e..dbd8d36046539 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -609,7 +609,6 @@ nfsd_proc_statfs(struct svc_rqst *rqstp) * NFSv2 Server procedures. * Only the results of non-idempotent operations are cached. */ -struct nfsd_void { int dummy; };
#define ST 1 /* status */ #define FH 8 /* filehandle */ @@ -618,10 +617,10 @@ struct nfsd_void { int dummy; }; static const struct svc_procedure nfsd_procedures2[18] = { [NFSPROC_NULL] = { .pc_func = nfsd_proc_null, - .pc_decode = nfssvc_decode_void, - .pc_encode = nfssvc_encode_void, - .pc_argsize = sizeof(struct nfsd_void), - .pc_ressize = sizeof(struct nfsd_void), + .pc_decode = nfssvc_decode_voidarg, + .pc_encode = nfssvc_encode_voidres, + .pc_argsize = sizeof(struct nfsd_voidargs), + .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = 0, }, @@ -647,10 +646,10 @@ static const struct svc_procedure nfsd_procedures2[18] = { }, [NFSPROC_ROOT] = { .pc_func = nfsd_proc_root, - .pc_decode = nfssvc_decode_void, - .pc_encode = nfssvc_encode_void, - .pc_argsize = sizeof(struct nfsd_void), - .pc_ressize = sizeof(struct nfsd_void), + .pc_decode = nfssvc_decode_voidarg, + .pc_encode = nfssvc_encode_voidres, + .pc_argsize = sizeof(struct nfsd_voidargs), + .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = 0, }, @@ -685,10 +684,10 @@ static const struct svc_procedure nfsd_procedures2[18] = { }, [NFSPROC_WRITECACHE] = { .pc_func = nfsd_proc_writecache, - .pc_decode = nfssvc_decode_void, - .pc_encode = nfssvc_encode_void, - .pc_argsize = sizeof(struct nfsd_void), - .pc_ressize = sizeof(struct nfsd_void), + .pc_decode = nfssvc_decode_voidarg, + .pc_encode = nfssvc_encode_voidres, + .pc_argsize = sizeof(struct nfsd_voidargs), + .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = 0, }, diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 311b287c5024f..c81d49b07971c 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -1107,6 +1107,34 @@ int nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp) return 1; }
+/** + * nfssvc_decode_voidarg - Decode void arguments + * @rqstp: Server RPC transaction context + * @p: buffer containing arguments to decode + * + * Return values: + * %0: Arguments were not valid + * %1: Decoding was successful + */ +int nfssvc_decode_voidarg(struct svc_rqst *rqstp, __be32 *p) +{ + return 1; +} + +/** + * nfssvc_encode_voidres - Encode void results + * @rqstp: Server RPC transaction context + * @p: buffer in which to encode results + * + * Return values: + * %0: Local error while encoding + * %1: Encoding was successful + */ +int nfssvc_encode_voidres(struct svc_rqst *rqstp, __be32 *p) +{ + return xdr_ressize_check(rqstp, p); +} + int nfsd_pool_stats_open(struct inode *inode, struct file *file) { int ret; diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 9e00a902113e3..7aa6e8aca2c1a 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -192,11 +192,6 @@ __be32 *nfs2svc_encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *f /* * XDR decode functions */ -int -nfssvc_decode_void(struct svc_rqst *rqstp, __be32 *p) -{ - return xdr_argsize_check(rqstp, p); -}
int nfssvc_decode_fhandle(struct svc_rqst *rqstp, __be32 *p) @@ -423,11 +418,6 @@ nfssvc_decode_readdirargs(struct svc_rqst *rqstp, __be32 *p) /* * XDR encode functions */ -int -nfssvc_encode_void(struct svc_rqst *rqstp, __be32 *p) -{ - return xdr_ressize_check(rqstp, p); -}
int nfssvc_encode_stat(struct svc_rqst *rqstp, __be32 *p) diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index b8cc6a4b2e0ec..edd87688ff863 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -144,7 +144,6 @@ union nfsd_xdrstore { #define NFS2_SVC_XDRSIZE sizeof(union nfsd_xdrstore)
-int nfssvc_decode_void(struct svc_rqst *, __be32 *); int nfssvc_decode_fhandle(struct svc_rqst *, __be32 *); int nfssvc_decode_sattrargs(struct svc_rqst *, __be32 *); int nfssvc_decode_diropargs(struct svc_rqst *, __be32 *); @@ -156,7 +155,6 @@ int nfssvc_decode_readlinkargs(struct svc_rqst *, __be32 *); int nfssvc_decode_linkargs(struct svc_rqst *, __be32 *); int nfssvc_decode_symlinkargs(struct svc_rqst *, __be32 *); int nfssvc_decode_readdirargs(struct svc_rqst *, __be32 *); -int nfssvc_encode_void(struct svc_rqst *, __be32 *); int nfssvc_encode_stat(struct svc_rqst *, __be32 *); int nfssvc_encode_attrstat(struct svc_rqst *, __be32 *); int nfssvc_encode_diropres(struct svc_rqst *, __be32 *); diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index ae6fa6c9cb467..456fcd7a10383 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -273,7 +273,6 @@ union nfsd3_xdrstore {
#define NFS3_SVC_XDRSIZE sizeof(union nfsd3_xdrstore)
-int nfs3svc_decode_voidarg(struct svc_rqst *, __be32 *); int nfs3svc_decode_fhandle(struct svc_rqst *, __be32 *); int nfs3svc_decode_sattrargs(struct svc_rqst *, __be32 *); int nfs3svc_decode_diropargs(struct svc_rqst *, __be32 *); @@ -290,7 +289,6 @@ int nfs3svc_decode_symlinkargs(struct svc_rqst *, __be32 *); int nfs3svc_decode_readdirargs(struct svc_rqst *, __be32 *); int nfs3svc_decode_readdirplusargs(struct svc_rqst *, __be32 *); int nfs3svc_decode_commitargs(struct svc_rqst *, __be32 *); -int nfs3svc_encode_voidres(struct svc_rqst *, __be32 *); int nfs3svc_encode_attrstat(struct svc_rqst *, __be32 *); int nfs3svc_encode_wccstat(struct svc_rqst *, __be32 *); int nfs3svc_encode_diropres(struct svc_rqst *, __be32 *); diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 679d40af1bbb1..37f89ad5e9923 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -781,8 +781,6 @@ set_change_info(struct nfsd4_change_info *cinfo, struct svc_fh *fhp)
bool nfsd4_mach_creds_match(struct nfs4_client *cl, struct svc_rqst *rqstp); -int nfs4svc_decode_voidarg(struct svc_rqst *, __be32 *); -int nfs4svc_encode_voidres(struct svc_rqst *, __be32 *); int nfs4svc_decode_compoundargs(struct svc_rqst *, __be32 *); int nfs4svc_encode_compoundres(struct svc_rqst *, __be32 *); __be32 nfsd4_check_resp_size(struct nfsd4_compoundres *, u32);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 0dfdad1c1d1b77b9b085f4da390464dd0ac5647a ]
For troubleshooting purposes, record GARBAGE_ARGS and CANT_ENCODE failures.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfssvc.c | 17 ++++---------- fs/nfsd/trace.h | 60 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 65 insertions(+), 12 deletions(-)
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index c81d49b07971c..3fb9607d67a37 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -29,6 +29,8 @@ #include "netns.h" #include "filecache.h"
+#include "trace.h" + #define NFSDDBG_FACILITY NFSDDBG_SVC
bool inter_copy_offload_enable; @@ -1041,11 +1043,8 @@ int nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp) struct kvec *resv = &rqstp->rq_res.head[0]; __be32 *p;
- dprintk("nfsd_dispatch: vers %d proc %d\n", - rqstp->rq_vers, rqstp->rq_proc); - if (nfs_request_too_big(rqstp, proc)) - goto out_too_large; + goto out_decode_err;
/* * Give the xdr decoder a chance to change this if it wants @@ -1084,24 +1083,18 @@ int nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp) out_cached_reply: return 1;
-out_too_large: - dprintk("nfsd: NFSv%d argument too large\n", rqstp->rq_vers); - *statp = rpc_garbage_args; - return 1; - out_decode_err: - dprintk("nfsd: failed to decode arguments!\n"); + trace_nfsd_garbage_args_err(rqstp); *statp = rpc_garbage_args; return 1;
out_update_drop: - dprintk("nfsd: Dropping request; may be revisited later\n"); nfsd_cache_update(rqstp, RC_NOCACHE, NULL); out_dropit: return 0;
out_encode_err: - dprintk("nfsd: failed to encode result!\n"); + trace_nfsd_cant_encode_err(rqstp); nfsd_cache_update(rqstp, RC_NOCACHE, NULL); *statp = rpc_system_err; return 1; diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 9239d97b682c7..dbbda3e09eedd 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -12,6 +12,66 @@ #include "export.h" #include "nfsfh.h"
+#define NFSD_TRACE_PROC_ARG_FIELDS \ + __field(unsigned int, netns_ino) \ + __field(u32, xid) \ + __array(unsigned char, server, sizeof(struct sockaddr_in6)) \ + __array(unsigned char, client, sizeof(struct sockaddr_in6)) + +#define NFSD_TRACE_PROC_ARG_ASSIGNMENTS \ + do { \ + __entry->netns_ino = SVC_NET(rqstp)->ns.inum; \ + __entry->xid = be32_to_cpu(rqstp->rq_xid); \ + memcpy(__entry->server, &rqstp->rq_xprt->xpt_local, \ + rqstp->rq_xprt->xpt_locallen); \ + memcpy(__entry->client, &rqstp->rq_xprt->xpt_remote, \ + rqstp->rq_xprt->xpt_remotelen); \ + } while (0); + +TRACE_EVENT(nfsd_garbage_args_err, + TP_PROTO( + const struct svc_rqst *rqstp + ), + TP_ARGS(rqstp), + TP_STRUCT__entry( + NFSD_TRACE_PROC_ARG_FIELDS + + __field(u32, vers) + __field(u32, proc) + ), + TP_fast_assign( + NFSD_TRACE_PROC_ARG_ASSIGNMENTS + + __entry->vers = rqstp->rq_vers; + __entry->proc = rqstp->rq_proc; + ), + TP_printk("xid=0x%08x vers=%u proc=%u", + __entry->xid, __entry->vers, __entry->proc + ) +); + +TRACE_EVENT(nfsd_cant_encode_err, + TP_PROTO( + const struct svc_rqst *rqstp + ), + TP_ARGS(rqstp), + TP_STRUCT__entry( + NFSD_TRACE_PROC_ARG_FIELDS + + __field(u32, vers) + __field(u32, proc) + ), + TP_fast_assign( + NFSD_TRACE_PROC_ARG_ASSIGNMENTS + + __entry->vers = rqstp->rq_vers; + __entry->proc = rqstp->rq_proc; + ), + TP_printk("xid=0x%08x vers=%u proc=%u", + __entry->xid, __entry->vers, __entry->proc + ) +); + #define show_nfsd_may_flags(x) \ __print_flags(x, "|", \ { NFSD_MAY_EXEC, "EXEC" }, \
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 08281341be8ebc97ee47999812bcf411942baa1e ]
For troubleshooting purposes, record failures to decode NFSv4 operation arguments and encode operation results.
trace_nfsd_compound_decode_err() replaces the dprintk() call sites that are embedded in READ_* macros that are about to be removed.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 13 +++++++-- fs/nfsd/trace.h | 68 +++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 79 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index c176d0090f7c8..101ada3636b78 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -54,6 +54,8 @@ #include "pnfs.h" #include "filecache.h"
+#include "trace.h" + #ifdef CONFIG_NFSD_V4_SECURITY_LABEL #include <linux/security.h> #endif @@ -2248,9 +2250,14 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp) READ_BUF(4); op->opnum = be32_to_cpup(p++);
- if (nfsd4_opnum_in_range(argp, op)) + if (nfsd4_opnum_in_range(argp, op)) { op->status = nfsd4_dec_ops[op->opnum](argp, &op->u); - else { + if (op->status != nfs_ok) + trace_nfsd_compound_decode_err(argp->rqstp, + argp->opcnt, i, + op->opnum, + op->status); + } else { op->opnum = OP_ILLEGAL; op->status = nfserr_op_illegal; } @@ -5220,6 +5227,8 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op) !nfsd4_enc_ops[op->opnum]); encoder = nfsd4_enc_ops[op->opnum]; op->status = encoder(resp, op->status, &op->u); + if (op->status) + trace_nfsd_compound_encode_err(rqstp, op->opnum, op->status); if (opdesc && opdesc->op_release) opdesc->op_release(&op->u); xdr_commit_encode(xdr); diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index dbbda3e09eedd..2bc2a888f7fa8 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -28,6 +28,24 @@ rqstp->rq_xprt->xpt_remotelen); \ } while (0);
+#define NFSD_TRACE_PROC_RES_FIELDS \ + __field(unsigned int, netns_ino) \ + __field(u32, xid) \ + __field(unsigned long, status) \ + __array(unsigned char, server, sizeof(struct sockaddr_in6)) \ + __array(unsigned char, client, sizeof(struct sockaddr_in6)) + +#define NFSD_TRACE_PROC_RES_ASSIGNMENTS(error) \ + do { \ + __entry->netns_ino = SVC_NET(rqstp)->ns.inum; \ + __entry->xid = be32_to_cpu(rqstp->rq_xid); \ + __entry->status = be32_to_cpu(error); \ + memcpy(__entry->server, &rqstp->rq_xprt->xpt_local, \ + rqstp->rq_xprt->xpt_locallen); \ + memcpy(__entry->client, &rqstp->rq_xprt->xpt_remote, \ + rqstp->rq_xprt->xpt_remotelen); \ + } while (0); + TRACE_EVENT(nfsd_garbage_args_err, TP_PROTO( const struct svc_rqst *rqstp @@ -127,6 +145,56 @@ TRACE_EVENT(nfsd_compound_status, __get_str(name), __entry->status) )
+TRACE_EVENT(nfsd_compound_decode_err, + TP_PROTO( + const struct svc_rqst *rqstp, + u32 args_opcnt, + u32 resp_opcnt, + u32 opnum, + __be32 status + ), + TP_ARGS(rqstp, args_opcnt, resp_opcnt, opnum, status), + TP_STRUCT__entry( + NFSD_TRACE_PROC_RES_FIELDS + + __field(u32, args_opcnt) + __field(u32, resp_opcnt) + __field(u32, opnum) + ), + TP_fast_assign( + NFSD_TRACE_PROC_RES_ASSIGNMENTS(status) + + __entry->args_opcnt = args_opcnt; + __entry->resp_opcnt = resp_opcnt; + __entry->opnum = opnum; + ), + TP_printk("op=%u/%u opnum=%u status=%lu", + __entry->resp_opcnt, __entry->args_opcnt, + __entry->opnum, __entry->status) +); + +TRACE_EVENT(nfsd_compound_encode_err, + TP_PROTO( + const struct svc_rqst *rqstp, + u32 opnum, + __be32 status + ), + TP_ARGS(rqstp, opnum, status), + TP_STRUCT__entry( + NFSD_TRACE_PROC_RES_FIELDS + + __field(u32, opnum) + ), + TP_fast_assign( + NFSD_TRACE_PROC_RES_ASSIGNMENTS(status) + + __entry->opnum = opnum; + ), + TP_printk("opnum=%u status=%lu", + __entry->opnum, __entry->status) +); + + DECLARE_EVENT_CLASS(nfsd_fh_err_class, TP_PROTO(struct svc_rqst *rqstp, struct svc_fh *fhp,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit c1346a1216ab5cb04a265380ac9035d91b16b6d5 ]
Convert the READ_BUF macro in nfs4xdr.c from open code to instead use the new xdr_stream-style decoders already in use by the encode side (and by the in-kernel NFS client implementation). Once this conversion is done, each individual NFSv4 argument decoder can be independently cleaned up to replace these macros with C code.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 4 +- fs/nfsd/nfs4xdr.c | 181 ++++++------------------------------- fs/nfsd/xdr4.h | 10 +- include/linux/sunrpc/xdr.h | 2 + net/sunrpc/xdr.c | 45 +++++++++ 5 files changed, 77 insertions(+), 165 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 55a46e7c152b9..95545a61bfc77 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1024,8 +1024,8 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
write->wr_how_written = write->wr_stable_how;
- nvecs = svc_fill_write_vector(rqstp, write->wr_pagelist, - &write->wr_head, write->wr_buflen); + nvecs = svc_fill_write_vector(rqstp, write->wr_payload.pages, + write->wr_payload.head, write->wr_buflen); WARN_ON_ONCE(nvecs > ARRAY_SIZE(rqstp->rq_vec));
status = nfsd_vfs_write(rqstp, &cstate->current_fh, nf, diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 101ada3636b78..8c694844f0efb 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -131,90 +131,13 @@ xdr_error: \ memcpy((x), p, nbytes); \ p += XDR_QUADLEN(nbytes); \ } while (0) - -/* READ_BUF, read_buf(): nbytes must be <= PAGE_SIZE */ -#define READ_BUF(nbytes) do { \ - if (nbytes <= (u32)((char *)argp->end - (char *)argp->p)) { \ - p = argp->p; \ - argp->p += XDR_QUADLEN(nbytes); \ - } else if (!(p = read_buf(argp, nbytes))) { \ - dprintk("NFSD: xdr error (%s:%d)\n", \ - __FILE__, __LINE__); \ - goto xdr_error; \ - } \ -} while (0) - -static void next_decode_page(struct nfsd4_compoundargs *argp) -{ - argp->p = page_address(argp->pagelist[0]); - argp->pagelist++; - if (argp->pagelen < PAGE_SIZE) { - argp->end = argp->p + XDR_QUADLEN(argp->pagelen); - argp->pagelen = 0; - } else { - argp->end = argp->p + (PAGE_SIZE>>2); - argp->pagelen -= PAGE_SIZE; - } -} - -static __be32 *read_buf(struct nfsd4_compoundargs *argp, u32 nbytes) -{ - /* We want more bytes than seem to be available. - * Maybe we need a new page, maybe we have just run out - */ - unsigned int avail = (char *)argp->end - (char *)argp->p; - __be32 *p; - - if (argp->pagelen == 0) { - struct kvec *vec = &argp->rqstp->rq_arg.tail[0]; - - if (!argp->tail) { - argp->tail = true; - avail = vec->iov_len; - argp->p = vec->iov_base; - argp->end = vec->iov_base + avail; - } - - if (avail < nbytes) - return NULL; - - p = argp->p; - argp->p += XDR_QUADLEN(nbytes); - return p; - } - - if (avail + argp->pagelen < nbytes) - return NULL; - if (avail + PAGE_SIZE < nbytes) /* need more than a page !! */ - return NULL; - /* ok, we can do it with the current plus the next page */ - if (nbytes <= sizeof(argp->tmp)) - p = argp->tmp; - else { - kfree(argp->tmpp); - p = argp->tmpp = kmalloc(nbytes, GFP_KERNEL); - if (!p) - return NULL; - - } - /* - * The following memcpy is safe because read_buf is always - * called with nbytes > avail, and the two cases above both - * guarantee p points to at least nbytes bytes. - */ - memcpy(p, argp->p, avail); - next_decode_page(argp); - memcpy(((char*)p)+avail, argp->p, (nbytes - avail)); - argp->p += XDR_QUADLEN(nbytes - avail); - return p; -} - -static unsigned int compoundargs_bytes_left(struct nfsd4_compoundargs *argp) -{ - unsigned int this = (char *)argp->end - (char *)argp->p; - - return this + argp->pagelen; -} +#define READ_BUF(nbytes) \ + do { \ + p = xdr_inline_decode(argp->xdr,\ + nbytes); \ + if (!p) \ + goto xdr_error; \ + } while (0)
static int zero_clientid(clientid_t *clid) { @@ -261,44 +184,6 @@ svcxdr_dupstr(struct nfsd4_compoundargs *argp, void *buf, u32 len) return p; }
-static __be32 -svcxdr_construct_vector(struct nfsd4_compoundargs *argp, struct kvec *head, - struct page ***pagelist, u32 buflen) -{ - int avail; - int len; - int pages; - - /* Sorry .. no magic macros for this.. * - * READ_BUF(write->wr_buflen); - * SAVEMEM(write->wr_buf, write->wr_buflen); - */ - avail = (char *)argp->end - (char *)argp->p; - if (avail + argp->pagelen < buflen) { - dprintk("NFSD: xdr error (%s:%d)\n", - __FILE__, __LINE__); - return nfserr_bad_xdr; - } - head->iov_base = argp->p; - head->iov_len = avail; - *pagelist = argp->pagelist; - - len = XDR_QUADLEN(buflen) << 2; - if (len >= avail) { - len -= avail; - - pages = len >> PAGE_SHIFT; - argp->pagelist += pages; - argp->pagelen -= pages * PAGE_SIZE; - len -= pages * PAGE_SIZE; - - next_decode_page(argp); - } - argp->p += XDR_QUADLEN(len); - - return 0; -} - /** * savemem - duplicate a chunk of memory for later processing * @argp: NFSv4 compound argument structure to be freed with @@ -398,7 +283,7 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, READ_BUF(4); len += 4; nace = be32_to_cpup(p++);
- if (nace > compoundargs_bytes_left(argp)/20) + if (nace > xdr_stream_remaining(argp->xdr) / sizeof(struct nfs4_ace)) /* * Even with 4-byte names there wouldn't be * space for that many aces; something fishy is @@ -929,7 +814,7 @@ static __be32 nfsd4_decode_share_deny(struct nfsd4_compoundargs *argp, u32 *x)
static __be32 nfsd4_decode_opaque(struct nfsd4_compoundargs *argp, struct xdr_netobj *o) { - __be32 *p; + DECODE_HEAD;
READ_BUF(4); o->len = be32_to_cpup(p++); @@ -939,9 +824,8 @@ static __be32 nfsd4_decode_opaque(struct nfsd4_compoundargs *argp, struct xdr_ne
READ_BUF(o->len); SAVEMEM(o->data, o->len); - return nfs_ok; -xdr_error: - return nfserr_bad_xdr; + + DECODE_TAIL; }
static __be32 @@ -1319,10 +1203,8 @@ nfsd4_decode_write(struct nfsd4_compoundargs *argp, struct nfsd4_write *write) goto xdr_error; write->wr_buflen = be32_to_cpup(p++);
- status = svcxdr_construct_vector(argp, &write->wr_head, - &write->wr_pagelist, write->wr_buflen); - if (status) - return status; + if (!xdr_stream_subsegment(argp->xdr, &write->wr_payload, write->wr_buflen)) + goto xdr_error;
DECODE_TAIL; } @@ -1891,13 +1773,14 @@ nfsd4_decode_seek(struct nfsd4_compoundargs *argp, struct nfsd4_seek *seek) */
/* - * Decode data into buffer. Uses head and pages constructed by - * svcxdr_construct_vector. + * Decode data into buffer. */ static __be32 -nfsd4_vbuf_from_vector(struct nfsd4_compoundargs *argp, struct kvec *head, - struct page **pages, char **bufp, u32 buflen) +nfsd4_vbuf_from_vector(struct nfsd4_compoundargs *argp, struct xdr_buf *xdr, + char **bufp, u32 buflen) { + struct page **pages = xdr->pages; + struct kvec *head = xdr->head; char *tmp, *dp; u32 len;
@@ -2012,8 +1895,6 @@ nfsd4_decode_setxattr(struct nfsd4_compoundargs *argp, { DECODE_HEAD; u32 flags, maxcount, size; - struct kvec head; - struct page **pagelist;
READ_BUF(4); flags = be32_to_cpup(p++); @@ -2036,12 +1917,12 @@ nfsd4_decode_setxattr(struct nfsd4_compoundargs *argp,
setxattr->setxa_len = size; if (size > 0) { - status = svcxdr_construct_vector(argp, &head, &pagelist, size); - if (status) - return status; + struct xdr_buf payload;
- status = nfsd4_vbuf_from_vector(argp, &head, pagelist, - &setxattr->setxa_buf, size); + if (!xdr_stream_subsegment(argp->xdr, &payload, size)) + goto xdr_error; + status = nfsd4_vbuf_from_vector(argp, &payload, + &setxattr->setxa_buf, size); }
DECODE_TAIL; @@ -5305,8 +5186,6 @@ void nfsd4_release_compoundargs(struct svc_rqst *rqstp) kfree(args->ops); args->ops = args->iops; } - kfree(args->tmpp); - args->tmpp = NULL; while (args->to_free) { struct svcxdr_tmpbuf *tb = args->to_free; args->to_free = tb->next; @@ -5319,19 +5198,11 @@ nfs4svc_decode_compoundargs(struct svc_rqst *rqstp, __be32 *p) { struct nfsd4_compoundargs *args = rqstp->rq_argp;
- if (rqstp->rq_arg.head[0].iov_len % 4) { - /* client is nuts */ - dprintk("%s: compound not properly padded! (peeraddr=%pISc xid=0x%x)", - __func__, svc_addr(rqstp), be32_to_cpu(rqstp->rq_xid)); - return 0; - } - args->p = p; - args->end = rqstp->rq_arg.head[0].iov_base + rqstp->rq_arg.head[0].iov_len; - args->pagelist = rqstp->rq_arg.pages; - args->pagelen = rqstp->rq_arg.page_len; - args->tail = false; + /* svcxdr_tmp_alloc */ args->tmpp = NULL; args->to_free = NULL; + + args->xdr = &rqstp->rq_arg_stream; args->ops = args->iops; args->rqstp = rqstp;
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 37f89ad5e9923..0eb13bd603ea6 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -419,8 +419,7 @@ struct nfsd4_write { u64 wr_offset; /* request */ u32 wr_stable_how; /* request */ u32 wr_buflen; /* request */ - struct kvec wr_head; - struct page ** wr_pagelist; /* request */ + struct xdr_buf wr_payload; /* request */
u32 wr_bytes_written; /* response */ u32 wr_how_written; /* response */ @@ -696,15 +695,10 @@ struct svcxdr_tmpbuf {
struct nfsd4_compoundargs { /* scratch variables for XDR decode */ - __be32 * p; - __be32 * end; - struct page ** pagelist; - int pagelen; - bool tail; __be32 tmp[8]; __be32 * tmpp; + struct xdr_stream *xdr; struct svcxdr_tmpbuf *to_free; - struct svc_rqst *rqstp;
u32 taglen; diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h index 0c8cab6210b3b..c03f7bf585c96 100644 --- a/include/linux/sunrpc/xdr.h +++ b/include/linux/sunrpc/xdr.h @@ -252,6 +252,8 @@ extern void xdr_enter_page(struct xdr_stream *xdr, unsigned int len); extern int xdr_process_buf(struct xdr_buf *buf, unsigned int offset, unsigned int len, int (*actor)(struct scatterlist *, void *), void *data); extern uint64_t xdr_align_data(struct xdr_stream *, uint64_t, uint32_t); extern uint64_t xdr_expand_hole(struct xdr_stream *, uint64_t, uint64_t); +extern bool xdr_stream_subsegment(struct xdr_stream *xdr, struct xdr_buf *subbuf, + unsigned int len);
/** * xdr_set_scratch_buffer - Attach a scratch buffer for decoding data. diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c index 02adc5c7f034d..722586696fade 100644 --- a/net/sunrpc/xdr.c +++ b/net/sunrpc/xdr.c @@ -1412,6 +1412,51 @@ xdr_buf_subsegment(struct xdr_buf *buf, struct xdr_buf *subbuf, } EXPORT_SYMBOL_GPL(xdr_buf_subsegment);
+/** + * xdr_stream_subsegment - set @subbuf to a portion of @xdr + * @xdr: an xdr_stream set up for decoding + * @subbuf: the result buffer + * @nbytes: length of @xdr to extract, in bytes + * + * Sets up @subbuf to represent a portion of @xdr. The portion + * starts at the current offset in @xdr, and extends for a length + * of @nbytes. If this is successful, @xdr is advanced to the next + * position following that portion. + * + * Return values: + * %true: @subbuf has been initialized, and @xdr has been advanced. + * %false: a bounds error has occurred + */ +bool xdr_stream_subsegment(struct xdr_stream *xdr, struct xdr_buf *subbuf, + unsigned int nbytes) +{ + unsigned int remaining, offset, len; + + if (xdr_buf_subsegment(xdr->buf, subbuf, xdr_stream_pos(xdr), nbytes)) + return false; + + if (subbuf->head[0].iov_len) + if (!__xdr_inline_decode(xdr, subbuf->head[0].iov_len)) + return false; + + remaining = subbuf->page_len; + offset = subbuf->page_base; + while (remaining) { + len = min_t(unsigned int, remaining, PAGE_SIZE) - offset; + + if (xdr->p == xdr->end && !xdr_set_next_buffer(xdr)) + return false; + if (!__xdr_inline_decode(xdr, len)) + return false; + + remaining -= len; + offset = 0; + } + + return true; +} +EXPORT_SYMBOL_GPL(xdr_stream_subsegment); + /** * xdr_buf_trim - lop at most "len" bytes off the end of "buf" * @buf: buf to be trimmed
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit d169a6a9e5fd7f9e4b74e5e5d2e5a4fd0f84ef05 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 8c694844f0efb..f3d54af9de92a 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -439,17 +439,6 @@ nfsd4_decode_stateid(struct nfsd4_compoundargs *argp, stateid_t *sid) DECODE_TAIL; }
-static __be32 -nfsd4_decode_access(struct nfsd4_compoundargs *argp, struct nfsd4_access *access) -{ - DECODE_HEAD; - - READ_BUF(4); - access->ac_req_access = be32_to_cpup(p++); - - DECODE_TAIL; -} - static __be32 nfsd4_decode_cb_sec(struct nfsd4_compoundargs *argp, struct nfsd4_cb_sec *cbs) { DECODE_HEAD; @@ -531,6 +520,19 @@ static __be32 nfsd4_decode_cb_sec(struct nfsd4_compoundargs *argp, struct nfsd4_ DECODE_TAIL; }
+/* + * NFSv4 operation argument decoders + */ + +static __be32 +nfsd4_decode_access(struct nfsd4_compoundargs *argp, + struct nfsd4_access *access) +{ + if (xdr_stream_decode_u32(argp->xdr, &access->ac_req_access) < 0) + return nfserr_bad_xdr; + return nfs_ok; +} + static __be32 nfsd4_decode_backchannel_ctl(struct nfsd4_compoundargs *argp, struct nfsd4_backchannel_ctl *bc) { DECODE_HEAD;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit d3d2f38154571e70d5806b5c5264bf61c101ea15 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 23 ++++++++++++++++------- 1 file changed, 16 insertions(+), 7 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index f3d54af9de92a..ca02478534931 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -439,6 +439,19 @@ nfsd4_decode_stateid(struct nfsd4_compoundargs *argp, stateid_t *sid) DECODE_TAIL; }
+static __be32 +nfsd4_decode_stateid4(struct nfsd4_compoundargs *argp, stateid_t *sid) +{ + __be32 *p; + + p = xdr_inline_decode(argp->xdr, NFS4_STATEID_SIZE); + if (!p) + return nfserr_bad_xdr; + sid->si_generation = be32_to_cpup(p++); + memcpy(&sid->si_opaque, p, sizeof(sid->si_opaque)); + return nfs_ok; +} + static __be32 nfsd4_decode_cb_sec(struct nfsd4_compoundargs *argp, struct nfsd4_cb_sec *cbs) { DECODE_HEAD; @@ -559,13 +572,9 @@ static __be32 nfsd4_decode_bind_conn_to_session(struct nfsd4_compoundargs *argp, static __be32 nfsd4_decode_close(struct nfsd4_compoundargs *argp, struct nfsd4_close *close) { - DECODE_HEAD; - - READ_BUF(4); - close->cl_seqid = be32_to_cpup(p++); - return nfsd4_decode_stateid(argp, &close->cl_stateid); - - DECODE_TAIL; + if (xdr_stream_decode_u32(argp->xdr, &close->cl_seqid) < 0) + return nfserr_bad_xdr; + return nfsd4_decode_stateid4(argp, &close->cl_stateid); }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit cbd9abb3706e96563b36af67595707a7054ab693 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 12 +++++------- include/linux/sunrpc/xdr.h | 21 +++++++++++++++++++++ 2 files changed, 26 insertions(+), 7 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index ca02478534931..8251b905d5479 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -581,13 +581,11 @@ nfsd4_decode_close(struct nfsd4_compoundargs *argp, struct nfsd4_close *close) static __be32 nfsd4_decode_commit(struct nfsd4_compoundargs *argp, struct nfsd4_commit *commit) { - DECODE_HEAD; - - READ_BUF(12); - p = xdr_decode_hyper(p, &commit->co_offset); - commit->co_count = be32_to_cpup(p++); - - DECODE_TAIL; + if (xdr_stream_decode_u64(argp->xdr, &commit->co_offset) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &commit->co_count) < 0) + return nfserr_bad_xdr; + return nfs_ok; }
static __be32 diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h index c03f7bf585c96..6b17575437474 100644 --- a/include/linux/sunrpc/xdr.h +++ b/include/linux/sunrpc/xdr.h @@ -569,6 +569,27 @@ xdr_stream_decode_u32(struct xdr_stream *xdr, __u32 *ptr) return 0; }
+/** + * xdr_stream_decode_u64 - Decode a 64-bit integer + * @xdr: pointer to xdr_stream + * @ptr: location to store 64-bit integer + * + * Return values: + * %0 on success + * %-EBADMSG on XDR buffer overflow + */ +static inline ssize_t +xdr_stream_decode_u64(struct xdr_stream *xdr, __u64 *ptr) +{ + const size_t count = sizeof(*ptr); + __be32 *p = xdr_inline_decode(xdr, count); + + if (unlikely(!p)) + return -EBADMSG; + xdr_decode_hyper(p, ptr); + return 0; +} + /** * xdr_stream_decode_opaque_fixed - Decode fixed length opaque xdr data * @xdr: pointer to xdr_stream
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 081d53fe0b43c47c36d1832b759bf14edde9cdbb ]
Because the fattr4 is now managed in an xdr_stream, all that is needed is to store the initial position of the stream before decoding the attribute list. Then the actual length of the list is computed using the final stream position, after decoding is complete.
No behavior change is expected.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 34 +++++++++++----------------------- 1 file changed, 11 insertions(+), 23 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 8251b905d5479..de5ac334cb8ab 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -250,7 +250,8 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, struct iattr *iattr, struct nfs4_acl **acl, struct xdr_netobj *label, int *umask) { - int expected_len, len = 0; + unsigned int starting_pos; + u32 attrlist4_count; u32 dummy32; char *buf;
@@ -267,12 +268,12 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, return nfserr_attrnotsupp; }
- READ_BUF(4); - expected_len = be32_to_cpup(p++); + if (xdr_stream_decode_u32(argp->xdr, &attrlist4_count) < 0) + return nfserr_bad_xdr; + starting_pos = xdr_stream_pos(argp->xdr);
if (bmval[0] & FATTR4_WORD0_SIZE) { READ_BUF(8); - len += 8; p = xdr_decode_hyper(p, &iattr->ia_size); iattr->ia_valid |= ATTR_SIZE; } @@ -280,7 +281,7 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, u32 nace; struct nfs4_ace *ace;
- READ_BUF(4); len += 4; + READ_BUF(4); nace = be32_to_cpup(p++);
if (nace > xdr_stream_remaining(argp->xdr) / sizeof(struct nfs4_ace)) @@ -297,13 +298,12 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval,
(*acl)->naces = nace; for (ace = (*acl)->aces; ace < (*acl)->aces + nace; ace++) { - READ_BUF(16); len += 16; + READ_BUF(16); ace->type = be32_to_cpup(p++); ace->flag = be32_to_cpup(p++); ace->access_mask = be32_to_cpup(p++); dummy32 = be32_to_cpup(p++); READ_BUF(dummy32); - len += XDR_QUADLEN(dummy32) << 2; READMEM(buf, dummy32); ace->whotype = nfs4_acl_get_whotype(buf, dummy32); status = nfs_ok; @@ -322,17 +322,14 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, *acl = NULL; if (bmval[1] & FATTR4_WORD1_MODE) { READ_BUF(4); - len += 4; iattr->ia_mode = be32_to_cpup(p++); iattr->ia_mode &= (S_IFMT | S_IALLUGO); iattr->ia_valid |= ATTR_MODE; } if (bmval[1] & FATTR4_WORD1_OWNER) { READ_BUF(4); - len += 4; dummy32 = be32_to_cpup(p++); READ_BUF(dummy32); - len += (XDR_QUADLEN(dummy32) << 2); READMEM(buf, dummy32); if ((status = nfsd_map_name_to_uid(argp->rqstp, buf, dummy32, &iattr->ia_uid))) return status; @@ -340,10 +337,8 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, } if (bmval[1] & FATTR4_WORD1_OWNER_GROUP) { READ_BUF(4); - len += 4; dummy32 = be32_to_cpup(p++); READ_BUF(dummy32); - len += (XDR_QUADLEN(dummy32) << 2); READMEM(buf, dummy32); if ((status = nfsd_map_name_to_gid(argp->rqstp, buf, dummy32, &iattr->ia_gid))) return status; @@ -351,11 +346,9 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, } if (bmval[1] & FATTR4_WORD1_TIME_ACCESS_SET) { READ_BUF(4); - len += 4; dummy32 = be32_to_cpup(p++); switch (dummy32) { case NFS4_SET_TO_CLIENT_TIME: - len += 12; status = nfsd4_decode_time(argp, &iattr->ia_atime); if (status) return status; @@ -370,11 +363,9 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, } if (bmval[1] & FATTR4_WORD1_TIME_MODIFY_SET) { READ_BUF(4); - len += 4; dummy32 = be32_to_cpup(p++); switch (dummy32) { case NFS4_SET_TO_CLIENT_TIME: - len += 12; status = nfsd4_decode_time(argp, &iattr->ia_mtime); if (status) return status; @@ -392,18 +383,14 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, if (IS_ENABLED(CONFIG_NFSD_V4_SECURITY_LABEL) && bmval[2] & FATTR4_WORD2_SECURITY_LABEL) { READ_BUF(4); - len += 4; dummy32 = be32_to_cpup(p++); /* lfs: we don't use it */ READ_BUF(4); - len += 4; dummy32 = be32_to_cpup(p++); /* pi: we don't use it either */ READ_BUF(4); - len += 4; dummy32 = be32_to_cpup(p++); READ_BUF(dummy32); if (dummy32 > NFS4_MAXLABELLEN) return nfserr_badlabel; - len += (XDR_QUADLEN(dummy32) << 2); READMEM(buf, dummy32); label->len = dummy32; label->data = svcxdr_dupstr(argp, buf, dummy32); @@ -414,15 +401,16 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, if (!umask) goto xdr_error; READ_BUF(8); - len += 8; dummy32 = be32_to_cpup(p++); iattr->ia_mode = dummy32 & (S_IFMT | S_IALLUGO); dummy32 = be32_to_cpup(p++); *umask = dummy32 & S_IRWXUGO; iattr->ia_valid |= ATTR_MODE; } - if (len != expected_len) - goto xdr_error; + + /* request sanity: did attrlist4 contain the expected number of words? */ + if (attrlist4_count != xdr_stream_pos(argp->xdr) - starting_pos) + return nfserr_bad_xdr;
DECODE_TAIL; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 2ac1b9b2afbbacf597dbec722b23b6be62e4e41e ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index de5ac334cb8ab..5ec0c2dac3348 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -273,8 +273,11 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, starting_pos = xdr_stream_pos(argp->xdr);
if (bmval[0] & FATTR4_WORD0_SIZE) { - READ_BUF(8); - p = xdr_decode_hyper(p, &iattr->ia_size); + u64 size; + + if (xdr_stream_decode_u64(argp->xdr, &size) < 0) + return nfserr_bad_xdr; + iattr->ia_size = size; iattr->ia_valid |= ATTR_SIZE; } if (bmval[0] & FATTR4_WORD0_ACL) {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit c941a96823cf52e742606b486b81ab346bf111c9 ]
Refactor for clarity and to move infrequently-used code out of line.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 107 +++++++++++++++++++++++++++++----------------- 1 file changed, 67 insertions(+), 40 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 5ec0c2dac3348..0fe57ca0f31ac 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -245,6 +245,70 @@ nfsd4_decode_bitmap(struct nfsd4_compoundargs *argp, u32 *bmval) DECODE_TAIL; }
+static __be32 +nfsd4_decode_nfsace4(struct nfsd4_compoundargs *argp, struct nfs4_ace *ace) +{ + __be32 *p, status; + u32 length; + + if (xdr_stream_decode_u32(argp->xdr, &ace->type) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &ace->flag) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &ace->access_mask) < 0) + return nfserr_bad_xdr; + + if (xdr_stream_decode_u32(argp->xdr, &length) < 0) + return nfserr_bad_xdr; + p = xdr_inline_decode(argp->xdr, length); + if (!p) + return nfserr_bad_xdr; + ace->whotype = nfs4_acl_get_whotype((char *)p, length); + if (ace->whotype != NFS4_ACL_WHO_NAMED) + status = nfs_ok; + else if (ace->flag & NFS4_ACE_IDENTIFIER_GROUP) + status = nfsd_map_name_to_gid(argp->rqstp, + (char *)p, length, &ace->who_gid); + else + status = nfsd_map_name_to_uid(argp->rqstp, + (char *)p, length, &ace->who_uid); + + return status; +} + +/* A counted array of nfsace4's */ +static noinline __be32 +nfsd4_decode_acl(struct nfsd4_compoundargs *argp, struct nfs4_acl **acl) +{ + struct nfs4_ace *ace; + __be32 status; + u32 count; + + if (xdr_stream_decode_u32(argp->xdr, &count) < 0) + return nfserr_bad_xdr; + + if (count > xdr_stream_remaining(argp->xdr) / 20) + /* + * Even with 4-byte names there wouldn't be + * space for that many aces; something fishy is + * going on: + */ + return nfserr_fbig; + + *acl = svcxdr_tmpalloc(argp, nfs4_acl_bytes(count)); + if (*acl == NULL) + return nfserr_jukebox; + + (*acl)->naces = count; + for (ace = (*acl)->aces; ace < (*acl)->aces + count; ace++) { + status = nfsd4_decode_nfsace4(argp, ace); + if (status) + return status; + } + + return nfs_ok; +} + static __be32 nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, struct iattr *iattr, struct nfs4_acl **acl, @@ -281,46 +345,9 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, iattr->ia_valid |= ATTR_SIZE; } if (bmval[0] & FATTR4_WORD0_ACL) { - u32 nace; - struct nfs4_ace *ace; - - READ_BUF(4); - nace = be32_to_cpup(p++); - - if (nace > xdr_stream_remaining(argp->xdr) / sizeof(struct nfs4_ace)) - /* - * Even with 4-byte names there wouldn't be - * space for that many aces; something fishy is - * going on: - */ - return nfserr_fbig; - - *acl = svcxdr_tmpalloc(argp, nfs4_acl_bytes(nace)); - if (*acl == NULL) - return nfserr_jukebox; - - (*acl)->naces = nace; - for (ace = (*acl)->aces; ace < (*acl)->aces + nace; ace++) { - READ_BUF(16); - ace->type = be32_to_cpup(p++); - ace->flag = be32_to_cpup(p++); - ace->access_mask = be32_to_cpup(p++); - dummy32 = be32_to_cpup(p++); - READ_BUF(dummy32); - READMEM(buf, dummy32); - ace->whotype = nfs4_acl_get_whotype(buf, dummy32); - status = nfs_ok; - if (ace->whotype != NFS4_ACL_WHO_NAMED) - ; - else if (ace->flag & NFS4_ACE_IDENTIFIER_GROUP) - status = nfsd_map_name_to_gid(argp->rqstp, - buf, dummy32, &ace->who_gid); - else - status = nfsd_map_name_to_uid(argp->rqstp, - buf, dummy32, &ace->who_uid); - if (status) - return status; - } + status = nfsd4_decode_acl(argp, acl); + if (status) + return status; } else *acl = NULL; if (bmval[1] & FATTR4_WORD1_MODE) {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 1c8f0ad7dd35fd12307904036c7c839f77b6e3f9 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 0fe57ca0f31ac..9dc73ab95eac9 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -351,8 +351,11 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, } else *acl = NULL; if (bmval[1] & FATTR4_WORD1_MODE) { - READ_BUF(4); - iattr->ia_mode = be32_to_cpup(p++); + u32 mode; + + if (xdr_stream_decode_u32(argp->xdr, &mode) < 0) + return nfserr_bad_xdr; + iattr->ia_mode = mode; iattr->ia_mode &= (S_IFMT | S_IALLUGO); iattr->ia_valid |= ATTR_MODE; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 9853a5ac9be381917e9be0b4133cd4ac5a7ad875 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 9dc73ab95eac9..7dc6b79e51fd0 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -360,11 +360,16 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, iattr->ia_valid |= ATTR_MODE; } if (bmval[1] & FATTR4_WORD1_OWNER) { - READ_BUF(4); - dummy32 = be32_to_cpup(p++); - READ_BUF(dummy32); - READMEM(buf, dummy32); - if ((status = nfsd_map_name_to_uid(argp->rqstp, buf, dummy32, &iattr->ia_uid))) + u32 length; + + if (xdr_stream_decode_u32(argp->xdr, &length) < 0) + return nfserr_bad_xdr; + p = xdr_inline_decode(argp->xdr, length); + if (!p) + return nfserr_bad_xdr; + status = nfsd_map_name_to_uid(argp->rqstp, (char *)p, length, + &iattr->ia_uid); + if (status) return status; iattr->ia_valid |= ATTR_UID; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 393c31dd27f83adb06b07a1b5f0a5b8966a0f01e ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 7dc6b79e51fd0..979f1d384cd0f 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -374,11 +374,16 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, iattr->ia_valid |= ATTR_UID; } if (bmval[1] & FATTR4_WORD1_OWNER_GROUP) { - READ_BUF(4); - dummy32 = be32_to_cpup(p++); - READ_BUF(dummy32); - READMEM(buf, dummy32); - if ((status = nfsd_map_name_to_gid(argp->rqstp, buf, dummy32, &iattr->ia_gid))) + u32 length; + + if (xdr_stream_decode_u32(argp->xdr, &length) < 0) + return nfserr_bad_xdr; + p = xdr_inline_decode(argp->xdr, length); + if (!p) + return nfserr_bad_xdr; + status = nfsd_map_name_to_gid(argp->rqstp, (char *)p, length, + &iattr->ia_gid); + if (status) return status; iattr->ia_valid |= ATTR_GID; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 1c3eff7ea4a98c642134ee493001ae13b79ff38c ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 39 +++++++++++++++++++++++++++++---------- 1 file changed, 29 insertions(+), 10 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 979f1d384cd0f..09975786e71b2 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -219,6 +219,21 @@ nfsd4_decode_time(struct nfsd4_compoundargs *argp, struct timespec64 *tv) DECODE_TAIL; }
+static __be32 +nfsd4_decode_nfstime4(struct nfsd4_compoundargs *argp, struct timespec64 *tv) +{ + __be32 *p; + + p = xdr_inline_decode(argp->xdr, XDR_UNIT * 3); + if (!p) + return nfserr_bad_xdr; + p = xdr_decode_hyper(p, &tv->tv_sec); + tv->tv_nsec = be32_to_cpup(p++); + if (tv->tv_nsec >= (u32)1000000000) + return nfserr_inval; + return nfs_ok; +} + static __be32 nfsd4_decode_bitmap(struct nfsd4_compoundargs *argp, u32 *bmval) { @@ -388,11 +403,13 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, iattr->ia_valid |= ATTR_GID; } if (bmval[1] & FATTR4_WORD1_TIME_ACCESS_SET) { - READ_BUF(4); - dummy32 = be32_to_cpup(p++); - switch (dummy32) { + u32 set_it; + + if (xdr_stream_decode_u32(argp->xdr, &set_it) < 0) + return nfserr_bad_xdr; + switch (set_it) { case NFS4_SET_TO_CLIENT_TIME: - status = nfsd4_decode_time(argp, &iattr->ia_atime); + status = nfsd4_decode_nfstime4(argp, &iattr->ia_atime); if (status) return status; iattr->ia_valid |= (ATTR_ATIME | ATTR_ATIME_SET); @@ -401,15 +418,17 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, iattr->ia_valid |= ATTR_ATIME; break; default: - goto xdr_error; + return nfserr_bad_xdr; } } if (bmval[1] & FATTR4_WORD1_TIME_MODIFY_SET) { - READ_BUF(4); - dummy32 = be32_to_cpup(p++); - switch (dummy32) { + u32 set_it; + + if (xdr_stream_decode_u32(argp->xdr, &set_it) < 0) + return nfserr_bad_xdr; + switch (set_it) { case NFS4_SET_TO_CLIENT_TIME: - status = nfsd4_decode_time(argp, &iattr->ia_mtime); + status = nfsd4_decode_nfstime4(argp, &iattr->ia_mtime); if (status) return status; iattr->ia_valid |= (ATTR_MTIME | ATTR_MTIME_SET); @@ -418,7 +437,7 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, iattr->ia_valid |= ATTR_MTIME; break; default: - goto xdr_error; + return nfserr_bad_xdr; } }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit dabe91828f92cd493e9e75efbc10f9878d2a73fe ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 46 ++++++++++++++++++++++++++++++---------------- 1 file changed, 30 insertions(+), 16 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 09975786e71b2..453a902c7490a 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -324,6 +324,33 @@ nfsd4_decode_acl(struct nfsd4_compoundargs *argp, struct nfs4_acl **acl) return nfs_ok; }
+static noinline __be32 +nfsd4_decode_security_label(struct nfsd4_compoundargs *argp, + struct xdr_netobj *label) +{ + u32 lfs, pi, length; + __be32 *p; + + if (xdr_stream_decode_u32(argp->xdr, &lfs) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &pi) < 0) + return nfserr_bad_xdr; + + if (xdr_stream_decode_u32(argp->xdr, &length) < 0) + return nfserr_bad_xdr; + if (length > NFS4_MAXLABELLEN) + return nfserr_badlabel; + p = xdr_inline_decode(argp->xdr, length); + if (!p) + return nfserr_bad_xdr; + label->len = length; + label->data = svcxdr_dupstr(argp, p, length); + if (!label->data) + return nfserr_jukebox; + + return nfs_ok; +} + static __be32 nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, struct iattr *iattr, struct nfs4_acl **acl, @@ -332,7 +359,6 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, unsigned int starting_pos; u32 attrlist4_count; u32 dummy32; - char *buf;
DECODE_HEAD; iattr->ia_valid = 0; @@ -440,24 +466,12 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, return nfserr_bad_xdr; } } - label->len = 0; if (IS_ENABLED(CONFIG_NFSD_V4_SECURITY_LABEL) && bmval[2] & FATTR4_WORD2_SECURITY_LABEL) { - READ_BUF(4); - dummy32 = be32_to_cpup(p++); /* lfs: we don't use it */ - READ_BUF(4); - dummy32 = be32_to_cpup(p++); /* pi: we don't use it either */ - READ_BUF(4); - dummy32 = be32_to_cpup(p++); - READ_BUF(dummy32); - if (dummy32 > NFS4_MAXLABELLEN) - return nfserr_badlabel; - READMEM(buf, dummy32); - label->len = dummy32; - label->data = svcxdr_dupstr(argp, buf, dummy32); - if (!label->data) - return nfserr_jukebox; + status = nfsd4_decode_security_label(argp, label); + if (status) + return status; } if (bmval[2] & FATTR4_WORD2_MODE_UMASK) { if (!umask)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 66f0476c704c86d44aa9da19d4753df66f2dbc96 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 453a902c7490a..2d97bbff13b68 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -358,7 +358,6 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, { unsigned int starting_pos; u32 attrlist4_count; - u32 dummy32;
DECODE_HEAD; iattr->ia_valid = 0; @@ -474,13 +473,16 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, return status; } if (bmval[2] & FATTR4_WORD2_MODE_UMASK) { + u32 mode, mask; + if (!umask) - goto xdr_error; - READ_BUF(8); - dummy32 = be32_to_cpup(p++); - iattr->ia_mode = dummy32 & (S_IFMT | S_IALLUGO); - dummy32 = be32_to_cpup(p++); - *umask = dummy32 & S_IRWXUGO; + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &mode) < 0) + return nfserr_bad_xdr; + iattr->ia_mode = mode & (S_IFMT | S_IALLUGO); + if (xdr_stream_decode_u32(argp->xdr, &mask) < 0) + return nfserr_bad_xdr; + *umask = mask & S_IRWXUGO; iattr->ia_valid |= ATTR_MODE; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit d1c263a031e876ac3ca5223c728e4d98ed50b3c0 ]
Let's be more careful to avoid overrunning the memory that backs the bitmap array. This requires updating the synopsis of nfsd4_decode_fattr().
Bruce points out that a server needs to be careful to return nfs_ok when a client presents bitmap bits the server doesn't support. This includes bits in bitmap words the server might not yet support.
The current READ* based implementation is good about that, but that requirement hasn't been documented.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 82 ++++++++++++++++++++++++++++++++++++----------- 1 file changed, 64 insertions(+), 18 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 2d97bbff13b68..c916e5d9d3074 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -260,6 +260,46 @@ nfsd4_decode_bitmap(struct nfsd4_compoundargs *argp, u32 *bmval) DECODE_TAIL; }
+/** + * nfsd4_decode_bitmap4 - Decode an NFSv4 bitmap4 + * @argp: NFSv4 compound argument structure + * @bmval: pointer to an array of u32's to decode into + * @bmlen: size of the @bmval array + * + * The server needs to return nfs_ok rather than nfserr_bad_xdr when + * encountering bitmaps containing bits it does not recognize. This + * includes bits in bitmap words past WORDn, where WORDn is the last + * bitmap WORD the implementation currently supports. Thus we are + * careful here to simply ignore bits in bitmap words that this + * implementation has yet to support explicitly. + * + * Return values: + * %nfs_ok: @bmval populated successfully + * %nfserr_bad_xdr: the encoded bitmap was invalid + */ +static __be32 +nfsd4_decode_bitmap4(struct nfsd4_compoundargs *argp, u32 *bmval, u32 bmlen) +{ + u32 i, count; + __be32 *p; + + if (xdr_stream_decode_u32(argp->xdr, &count) < 0) + return nfserr_bad_xdr; + /* request sanity */ + if (count > 1000) + return nfserr_bad_xdr; + p = xdr_inline_decode(argp->xdr, count << 2); + if (!p) + return nfserr_bad_xdr; + i = 0; + while (i < count) + bmval[i++] = be32_to_cpup(p++); + while (i < bmlen) + bmval[i++] = 0; + + return nfs_ok; +} + static __be32 nfsd4_decode_nfsace4(struct nfsd4_compoundargs *argp, struct nfs4_ace *ace) { @@ -352,17 +392,18 @@ nfsd4_decode_security_label(struct nfsd4_compoundargs *argp, }
static __be32 -nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, - struct iattr *iattr, struct nfs4_acl **acl, - struct xdr_netobj *label, int *umask) +nfsd4_decode_fattr4(struct nfsd4_compoundargs *argp, u32 *bmval, u32 bmlen, + struct iattr *iattr, struct nfs4_acl **acl, + struct xdr_netobj *label, int *umask) { unsigned int starting_pos; u32 attrlist4_count; + __be32 *p, status;
- DECODE_HEAD; iattr->ia_valid = 0; - if ((status = nfsd4_decode_bitmap(argp, bmval))) - return status; + status = nfsd4_decode_bitmap4(argp, bmval, bmlen); + if (status) + return nfserr_bad_xdr;
if (bmval[0] & ~NFSD_WRITEABLE_ATTRS_WORD0 || bmval[1] & ~NFSD_WRITEABLE_ATTRS_WORD1 @@ -490,7 +531,7 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, if (attrlist4_count != xdr_stream_pos(argp->xdr) - starting_pos) return nfserr_bad_xdr;
- DECODE_TAIL; + return nfs_ok; }
static __be32 @@ -690,9 +731,10 @@ nfsd4_decode_create(struct nfsd4_compoundargs *argp, struct nfsd4_create *create if ((status = check_filename(create->cr_name, create->cr_namelen))) return status;
- status = nfsd4_decode_fattr(argp, create->cr_bmval, &create->cr_iattr, - &create->cr_acl, &create->cr_label, - &create->cr_umask); + status = nfsd4_decode_fattr4(argp, create->cr_bmval, + ARRAY_SIZE(create->cr_bmval), + &create->cr_iattr, &create->cr_acl, + &create->cr_label, &create->cr_umask); if (status) goto out;
@@ -941,9 +983,10 @@ nfsd4_decode_open(struct nfsd4_compoundargs *argp, struct nfsd4_open *open) switch (open->op_createmode) { case NFS4_CREATE_UNCHECKED: case NFS4_CREATE_GUARDED: - status = nfsd4_decode_fattr(argp, open->op_bmval, - &open->op_iattr, &open->op_acl, &open->op_label, - &open->op_umask); + status = nfsd4_decode_fattr4(argp, open->op_bmval, + ARRAY_SIZE(open->op_bmval), + &open->op_iattr, &open->op_acl, + &open->op_label, &open->op_umask); if (status) goto out; break; @@ -956,9 +999,10 @@ nfsd4_decode_open(struct nfsd4_compoundargs *argp, struct nfsd4_open *open) goto xdr_error; READ_BUF(NFS4_VERIFIER_SIZE); COPYMEM(open->op_verf.data, NFS4_VERIFIER_SIZE); - status = nfsd4_decode_fattr(argp, open->op_bmval, - &open->op_iattr, &open->op_acl, &open->op_label, - &open->op_umask); + status = nfsd4_decode_fattr4(argp, open->op_bmval, + ARRAY_SIZE(open->op_bmval), + &open->op_iattr, &open->op_acl, + &open->op_label, &open->op_umask); if (status) goto out; break; @@ -1194,8 +1238,10 @@ nfsd4_decode_setattr(struct nfsd4_compoundargs *argp, struct nfsd4_setattr *seta status = nfsd4_decode_stateid(argp, &setattr->sa_stateid); if (status) return status; - return nfsd4_decode_fattr(argp, setattr->sa_bmval, &setattr->sa_iattr, - &setattr->sa_acl, &setattr->sa_label, NULL); + return nfsd4_decode_fattr4(argp, setattr->sa_bmval, + ARRAY_SIZE(setattr->sa_bmval), + &setattr->sa_iattr, &setattr->sa_acl, + &setattr->sa_label, NULL); }
static __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 000dfa18b3df9c62df5f768f9187cf1a94ded71d ]
A dedicated decoder for component4 is introduced here, which will be used by other operation decoders in subsequent patches.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 58 ++++++++++++++++++++++++++++++++--------------- 1 file changed, 40 insertions(+), 18 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index c916e5d9d3074..8f296f5568b11 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -92,6 +92,8 @@ check_filename(char *str, int len)
if (len == 0) return nfserr_inval; + if (len > NFS4_MAXNAMLEN) + return nfserr_nametoolong; if (isdotent(str, len)) return nfserr_badname; for (i = 0; i < len; i++) @@ -205,6 +207,27 @@ static char *savemem(struct nfsd4_compoundargs *argp, __be32 *p, int nbytes) return ret; }
+static __be32 +nfsd4_decode_component4(struct nfsd4_compoundargs *argp, char **namp, u32 *lenp) +{ + __be32 *p, status; + + if (xdr_stream_decode_u32(argp->xdr, lenp) < 0) + return nfserr_bad_xdr; + p = xdr_inline_decode(argp->xdr, *lenp); + if (!p) + return nfserr_bad_xdr; + status = check_filename((char *)p, *lenp); + if (status) + return status; + *namp = svcxdr_tmpalloc(argp, *lenp); + if (!*namp) + return nfserr_jukebox; + memcpy(*namp, p, *lenp); + + return nfs_ok; +} + static __be32 nfsd4_decode_time(struct nfsd4_compoundargs *argp, struct timespec64 *tv) { @@ -698,24 +721,27 @@ nfsd4_decode_commit(struct nfsd4_compoundargs *argp, struct nfsd4_commit *commit static __be32 nfsd4_decode_create(struct nfsd4_compoundargs *argp, struct nfsd4_create *create) { - DECODE_HEAD; + __be32 *p, status;
- READ_BUF(4); - create->cr_type = be32_to_cpup(p++); + if (xdr_stream_decode_u32(argp->xdr, &create->cr_type) < 0) + return nfserr_bad_xdr; switch (create->cr_type) { case NF4LNK: - READ_BUF(4); - create->cr_datalen = be32_to_cpup(p++); - READ_BUF(create->cr_datalen); + if (xdr_stream_decode_u32(argp->xdr, &create->cr_datalen) < 0) + return nfserr_bad_xdr; + p = xdr_inline_decode(argp->xdr, create->cr_datalen); + if (!p) + return nfserr_bad_xdr; create->cr_data = svcxdr_dupstr(argp, p, create->cr_datalen); if (!create->cr_data) return nfserr_jukebox; break; case NF4BLK: case NF4CHR: - READ_BUF(8); - create->cr_specdata1 = be32_to_cpup(p++); - create->cr_specdata2 = be32_to_cpup(p++); + if (xdr_stream_decode_u32(argp->xdr, &create->cr_specdata1) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &create->cr_specdata2) < 0) + return nfserr_bad_xdr; break; case NF4SOCK: case NF4FIFO: @@ -723,22 +749,18 @@ nfsd4_decode_create(struct nfsd4_compoundargs *argp, struct nfsd4_create *create default: break; } - - READ_BUF(4); - create->cr_namelen = be32_to_cpup(p++); - READ_BUF(create->cr_namelen); - SAVEMEM(create->cr_name, create->cr_namelen); - if ((status = check_filename(create->cr_name, create->cr_namelen))) + status = nfsd4_decode_component4(argp, &create->cr_name, + &create->cr_namelen); + if (status) return status; - status = nfsd4_decode_fattr4(argp, create->cr_bmval, ARRAY_SIZE(create->cr_bmval), &create->cr_iattr, &create->cr_acl, &create->cr_label, &create->cr_umask); if (status) - goto out; + return status;
- DECODE_TAIL; + return nfs_ok; }
static inline __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 95e6482cedfc0785b85db49b72a05323bbf41750 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 8f296f5568b11..234d500961230 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -766,7 +766,7 @@ nfsd4_decode_create(struct nfsd4_compoundargs *argp, struct nfsd4_create *create static inline __be32 nfsd4_decode_delegreturn(struct nfsd4_compoundargs *argp, struct nfsd4_delegreturn *dr) { - return nfsd4_decode_stateid(argp, &dr->dr_stateid); + return nfsd4_decode_stateid4(argp, &dr->dr_stateid); }
static inline __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit f759eff260f1f0b0f56531517762f27ee3233506 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 234d500961230..70ce48340c1b7 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -772,7 +772,8 @@ nfsd4_decode_delegreturn(struct nfsd4_compoundargs *argp, struct nfsd4_delegretu static inline __be32 nfsd4_decode_getattr(struct nfsd4_compoundargs *argp, struct nfsd4_getattr *getattr) { - return nfsd4_decode_bitmap(argp, getattr->ga_bmval); + return nfsd4_decode_bitmap4(argp, getattr->ga_bmval, + ARRAY_SIZE(getattr->ga_bmval)); }
static __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 5c505d128691c70991b766dd6a3faf49fa59ecfb ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 11 +---------- 1 file changed, 1 insertion(+), 10 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 70ce48340c1b7..4596b8cef222c 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -779,16 +779,7 @@ nfsd4_decode_getattr(struct nfsd4_compoundargs *argp, struct nfsd4_getattr *geta static __be32 nfsd4_decode_link(struct nfsd4_compoundargs *argp, struct nfsd4_link *link) { - DECODE_HEAD; - - READ_BUF(4); - link->li_namelen = be32_to_cpup(p++); - READ_BUF(link->li_namelen); - SAVEMEM(link->li_name, link->li_namelen); - if ((status = check_filename(link->li_name, link->li_namelen))) - return status; - - DECODE_TAIL; + return nfsd4_decode_component4(argp, &link->li_name, &link->li_namelen); }
static __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 5dcbfabb676b2b6d97767209cf707eb463ca232a ]
Enable nfsd4_decode_opaque() to be used in more decoders, and replace the READ* macros in nfsd4_decode_opaque().
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 43 +++++++++++++++++++++++++++---------------- 1 file changed, 27 insertions(+), 16 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 4596b8cef222c..b3459059cec1b 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -207,6 +207,33 @@ static char *savemem(struct nfsd4_compoundargs *argp, __be32 *p, int nbytes) return ret; }
+ +/* + * NFSv4 basic data type decoders + */ + +static __be32 +nfsd4_decode_opaque(struct nfsd4_compoundargs *argp, struct xdr_netobj *o) +{ + __be32 *p; + u32 len; + + if (xdr_stream_decode_u32(argp->xdr, &len) < 0) + return nfserr_bad_xdr; + if (len == 0 || len > NFS4_OPAQUE_LIMIT) + return nfserr_bad_xdr; + p = xdr_inline_decode(argp->xdr, len); + if (!p) + return nfserr_bad_xdr; + o->data = svcxdr_tmpalloc(argp, len); + if (!o->data) + return nfserr_jukebox; + o->len = len; + memcpy(o->data, p, len); + + return nfs_ok; +} + static __be32 nfsd4_decode_component4(struct nfsd4_compoundargs *argp, char **namp, u32 *lenp) { @@ -943,22 +970,6 @@ static __be32 nfsd4_decode_share_deny(struct nfsd4_compoundargs *argp, u32 *x) return nfserr_bad_xdr; }
-static __be32 nfsd4_decode_opaque(struct nfsd4_compoundargs *argp, struct xdr_netobj *o) -{ - DECODE_HEAD; - - READ_BUF(4); - o->len = be32_to_cpup(p++); - - if (o->len == 0 || o->len > NFS4_OPAQUE_LIMIT) - return nfserr_bad_xdr; - - READ_BUF(o->len); - SAVEMEM(o->data, o->len); - - DECODE_TAIL; -} - static __be32 nfsd4_decode_open(struct nfsd4_compoundargs *argp, struct nfsd4_open *open) {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 144e82694092ff80b5e64749d6822cd8947587f2 ]
These helpers will also be used to simplify decoders in subsequent patches.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 34 +++++++++++++++++++++++++++++----- 1 file changed, 29 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index b3459059cec1b..63140cd4c50e4 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -609,6 +609,30 @@ nfsd4_decode_stateid4(struct nfsd4_compoundargs *argp, stateid_t *sid) return nfs_ok; }
+static __be32 +nfsd4_decode_clientid4(struct nfsd4_compoundargs *argp, clientid_t *clientid) +{ + __be32 *p; + + p = xdr_inline_decode(argp->xdr, sizeof(__be64)); + if (!p) + return nfserr_bad_xdr; + memcpy(clientid, p, sizeof(*clientid)); + return nfs_ok; +} + +static __be32 +nfsd4_decode_state_owner4(struct nfsd4_compoundargs *argp, + clientid_t *clientid, struct xdr_netobj *owner) +{ + __be32 status; + + status = nfsd4_decode_clientid4(argp, clientid); + if (status) + return status; + return nfsd4_decode_opaque(argp, owner); +} + static __be32 nfsd4_decode_cb_sec(struct nfsd4_compoundargs *argp, struct nfsd4_cb_sec *cbs) { DECODE_HEAD; @@ -832,12 +856,12 @@ nfsd4_decode_lock(struct nfsd4_compoundargs *argp, struct nfsd4_lock *lock) status = nfsd4_decode_stateid(argp, &lock->lk_new_open_stateid); if (status) return status; - READ_BUF(8 + sizeof(clientid_t)); + READ_BUF(4); lock->lk_new_lock_seqid = be32_to_cpup(p++); - COPYMEM(&lock->lk_new_clientid, sizeof(clientid_t)); - lock->lk_new_owner.len = be32_to_cpup(p++); - READ_BUF(lock->lk_new_owner.len); - READMEM(lock->lk_new_owner.data, lock->lk_new_owner.len); + status = nfsd4_decode_state_owner4(argp, &lock->lk_new_clientid, + &lock->lk_new_owner); + if (status) + return status; } else { status = nfsd4_decode_stateid(argp, &lock->lk_old_lock_stateid); if (status)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 8918cc0d2b72db9997390626010b182c4500d749 ]
Refactor for clarity.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 64 +++++++++++++++++++++++++------------- include/linux/sunrpc/xdr.h | 21 +++++++++++++ 2 files changed, 64 insertions(+), 21 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 63140cd4c50e4..15ed5249e2c74 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -833,6 +833,48 @@ nfsd4_decode_link(struct nfsd4_compoundargs *argp, struct nfsd4_link *link) return nfsd4_decode_component4(argp, &link->li_name, &link->li_namelen); }
+static __be32 +nfsd4_decode_open_to_lock_owner4(struct nfsd4_compoundargs *argp, + struct nfsd4_lock *lock) +{ + __be32 status; + + if (xdr_stream_decode_u32(argp->xdr, &lock->lk_new_open_seqid) < 0) + return nfserr_bad_xdr; + status = nfsd4_decode_stateid4(argp, &lock->lk_new_open_stateid); + if (status) + return status; + if (xdr_stream_decode_u32(argp->xdr, &lock->lk_new_lock_seqid) < 0) + return nfserr_bad_xdr; + return nfsd4_decode_state_owner4(argp, &lock->lk_new_clientid, + &lock->lk_new_owner); +} + +static __be32 +nfsd4_decode_exist_lock_owner4(struct nfsd4_compoundargs *argp, + struct nfsd4_lock *lock) +{ + __be32 status; + + status = nfsd4_decode_stateid4(argp, &lock->lk_old_lock_stateid); + if (status) + return status; + if (xdr_stream_decode_u32(argp->xdr, &lock->lk_old_lock_seqid) < 0) + return nfserr_bad_xdr; + + return nfs_ok; +} + +static __be32 +nfsd4_decode_locker4(struct nfsd4_compoundargs *argp, struct nfsd4_lock *lock) +{ + if (xdr_stream_decode_bool(argp->xdr, &lock->lk_is_new) < 0) + return nfserr_bad_xdr; + if (lock->lk_is_new) + return nfsd4_decode_open_to_lock_owner4(argp, lock); + return nfsd4_decode_exist_lock_owner4(argp, lock); +} + static __be32 nfsd4_decode_lock(struct nfsd4_compoundargs *argp, struct nfsd4_lock *lock) { @@ -848,27 +890,7 @@ nfsd4_decode_lock(struct nfsd4_compoundargs *argp, struct nfsd4_lock *lock) lock->lk_reclaim = be32_to_cpup(p++); p = xdr_decode_hyper(p, &lock->lk_offset); p = xdr_decode_hyper(p, &lock->lk_length); - lock->lk_is_new = be32_to_cpup(p++); - - if (lock->lk_is_new) { - READ_BUF(4); - lock->lk_new_open_seqid = be32_to_cpup(p++); - status = nfsd4_decode_stateid(argp, &lock->lk_new_open_stateid); - if (status) - return status; - READ_BUF(4); - lock->lk_new_lock_seqid = be32_to_cpup(p++); - status = nfsd4_decode_state_owner4(argp, &lock->lk_new_clientid, - &lock->lk_new_owner); - if (status) - return status; - } else { - status = nfsd4_decode_stateid(argp, &lock->lk_old_lock_stateid); - if (status) - return status; - READ_BUF(4); - lock->lk_old_lock_seqid = be32_to_cpup(p++); - } + status = nfsd4_decode_locker4(argp, lock);
DECODE_TAIL; } diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h index 6b17575437474..f6569b620beab 100644 --- a/include/linux/sunrpc/xdr.h +++ b/include/linux/sunrpc/xdr.h @@ -548,6 +548,27 @@ static inline bool xdr_item_is_present(const __be32 *p) return *p != xdr_zero; }
+/** + * xdr_stream_decode_bool - Decode a boolean + * @xdr: pointer to xdr_stream + * @ptr: pointer to a u32 in which to store the result + * + * Return values: + * %0 on success + * %-EBADMSG on XDR buffer overflow + */ +static inline ssize_t +xdr_stream_decode_bool(struct xdr_stream *xdr, __u32 *ptr) +{ + const size_t count = sizeof(*ptr); + __be32 *p = xdr_inline_decode(xdr, count); + + if (unlikely(!p)) + return -EBADMSG; + *ptr = (*p != xdr_zero); + return 0; +} + /** * xdr_stream_decode_u32 - Decode a 32-bit integer * @xdr: pointer to xdr_stream
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 7c59deed5cd2e1cfc6cbecf06f4584ac53755f53 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 24 ++++++++++-------------- 1 file changed, 10 insertions(+), 14 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 15ed5249e2c74..b50d2987bb6e8 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -878,21 +878,17 @@ nfsd4_decode_locker4(struct nfsd4_compoundargs *argp, struct nfsd4_lock *lock) static __be32 nfsd4_decode_lock(struct nfsd4_compoundargs *argp, struct nfsd4_lock *lock) { - DECODE_HEAD; - - /* - * type, reclaim(boolean), offset, length, new_lock_owner(boolean) - */ - READ_BUF(28); - lock->lk_type = be32_to_cpup(p++); + if (xdr_stream_decode_u32(argp->xdr, &lock->lk_type) < 0) + return nfserr_bad_xdr; if ((lock->lk_type < NFS4_READ_LT) || (lock->lk_type > NFS4_WRITEW_LT)) - goto xdr_error; - lock->lk_reclaim = be32_to_cpup(p++); - p = xdr_decode_hyper(p, &lock->lk_offset); - p = xdr_decode_hyper(p, &lock->lk_length); - status = nfsd4_decode_locker4(argp, lock); - - DECODE_TAIL; + return nfserr_bad_xdr; + if (xdr_stream_decode_bool(argp->xdr, &lock->lk_reclaim) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u64(argp->xdr, &lock->lk_offset) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u64(argp->xdr, &lock->lk_length) < 0) + return nfserr_bad_xdr; + return nfsd4_decode_locker4(argp, lock); }
static __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 0a146f04aa0fa7a57aaed3913d1c2732b3853f31 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 24 ++++++++++-------------- 1 file changed, 10 insertions(+), 14 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index b50d2987bb6e8..4f2680650a567 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -894,20 +894,16 @@ nfsd4_decode_lock(struct nfsd4_compoundargs *argp, struct nfsd4_lock *lock) static __be32 nfsd4_decode_lockt(struct nfsd4_compoundargs *argp, struct nfsd4_lockt *lockt) { - DECODE_HEAD; - - READ_BUF(32); - lockt->lt_type = be32_to_cpup(p++); - if((lockt->lt_type < NFS4_READ_LT) || (lockt->lt_type > NFS4_WRITEW_LT)) - goto xdr_error; - p = xdr_decode_hyper(p, &lockt->lt_offset); - p = xdr_decode_hyper(p, &lockt->lt_length); - COPYMEM(&lockt->lt_clientid, 8); - lockt->lt_owner.len = be32_to_cpup(p++); - READ_BUF(lockt->lt_owner.len); - READMEM(lockt->lt_owner.data, lockt->lt_owner.len); - - DECODE_TAIL; + if (xdr_stream_decode_u32(argp->xdr, &lockt->lt_type) < 0) + return nfserr_bad_xdr; + if ((lockt->lt_type < NFS4_READ_LT) || (lockt->lt_type > NFS4_WRITEW_LT)) + return nfserr_bad_xdr; + if (xdr_stream_decode_u64(argp->xdr, &lockt->lt_offset) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u64(argp->xdr, &lockt->lt_length) < 0) + return nfserr_bad_xdr; + return nfsd4_decode_state_owner4(argp, &lockt->lt_clientid, + &lockt->lt_owner); }
static __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit ca9cf9fc27f8f722e9eb2763173ba01f6ac3dad1 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 4f2680650a567..3c20a1f8eaa91 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -909,21 +909,23 @@ nfsd4_decode_lockt(struct nfsd4_compoundargs *argp, struct nfsd4_lockt *lockt) static __be32 nfsd4_decode_locku(struct nfsd4_compoundargs *argp, struct nfsd4_locku *locku) { - DECODE_HEAD; + __be32 status;
- READ_BUF(8); - locku->lu_type = be32_to_cpup(p++); + if (xdr_stream_decode_u32(argp->xdr, &locku->lu_type) < 0) + return nfserr_bad_xdr; if ((locku->lu_type < NFS4_READ_LT) || (locku->lu_type > NFS4_WRITEW_LT)) - goto xdr_error; - locku->lu_seqid = be32_to_cpup(p++); - status = nfsd4_decode_stateid(argp, &locku->lu_stateid); + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &locku->lu_seqid) < 0) + return nfserr_bad_xdr; + status = nfsd4_decode_stateid4(argp, &locku->lu_stateid); if (status) return status; - READ_BUF(16); - p = xdr_decode_hyper(p, &locku->lu_offset); - p = xdr_decode_hyper(p, &locku->lu_length); + if (xdr_stream_decode_u64(argp->xdr, &locku->lu_offset) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u64(argp->xdr, &locku->lu_length) < 0) + return nfserr_bad_xdr;
- DECODE_TAIL; + return nfs_ok; }
static __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 3d5877e8e03f60d7cc804d7b230ff9c00c9c07bd ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 11 +---------- 1 file changed, 1 insertion(+), 10 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 3c20a1f8eaa91..431ab9d604be7 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -931,16 +931,7 @@ nfsd4_decode_locku(struct nfsd4_compoundargs *argp, struct nfsd4_locku *locku) static __be32 nfsd4_decode_lookup(struct nfsd4_compoundargs *argp, struct nfsd4_lookup *lookup) { - DECODE_HEAD; - - READ_BUF(4); - lookup->lo_len = be32_to_cpup(p++); - READ_BUF(lookup->lo_len); - SAVEMEM(lookup->lo_name, lookup->lo_len); - if ((status = check_filename(lookup->lo_name, lookup->lo_len))) - return status; - - DECODE_TAIL; + return nfsd4_decode_component4(argp, &lookup->lo_name, &lookup->lo_len); }
static __be32 nfsd4_decode_share_access(struct nfsd4_compoundargs *argp, u32 *share_access, u32 *deleg_want, u32 *deleg_when)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 796dd1c6b680959ac968b52aa507911b288b1749 ]
This helper will be used to simplify decoders in subsequent patches.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 22 ++++++++++++++++++---- 1 file changed, 18 insertions(+), 4 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 431ab9d604be7..1a2dc52c4340b 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -284,6 +284,18 @@ nfsd4_decode_nfstime4(struct nfsd4_compoundargs *argp, struct timespec64 *tv) return nfs_ok; }
+static __be32 +nfsd4_decode_verifier4(struct nfsd4_compoundargs *argp, nfs4_verifier *verf) +{ + __be32 *p; + + p = xdr_inline_decode(argp->xdr, NFS4_VERIFIER_SIZE); + if (!p) + return nfserr_bad_xdr; + memcpy(verf->data, p, sizeof(verf->data)); + return nfs_ok; +} + static __be32 nfsd4_decode_bitmap(struct nfsd4_compoundargs *argp, u32 *bmval) { @@ -1047,14 +1059,16 @@ nfsd4_decode_open(struct nfsd4_compoundargs *argp, struct nfsd4_open *open) goto out; break; case NFS4_CREATE_EXCLUSIVE: - READ_BUF(NFS4_VERIFIER_SIZE); - COPYMEM(open->op_verf.data, NFS4_VERIFIER_SIZE); + status = nfsd4_decode_verifier4(argp, &open->op_verf); + if (status) + return status; break; case NFS4_CREATE_EXCLUSIVE4_1: if (argp->minorversion < 1) goto xdr_error; - READ_BUF(NFS4_VERIFIER_SIZE); - COPYMEM(open->op_verf.data, NFS4_VERIFIER_SIZE); + status = nfsd4_decode_verifier4(argp, &open->op_verf); + if (status) + return status; status = nfsd4_decode_fattr4(argp, open->op_bmval, ARRAY_SIZE(open->op_bmval), &open->op_iattr, &open->op_acl,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit bf33bab3c4182cdd795983f14de5606e82fab377 ]
Refactor for clarity.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 78 +++++++++++++++++++++++++++-------------------- 1 file changed, 45 insertions(+), 33 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 1a2dc52c4340b..62096b2a57b35 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -946,6 +946,48 @@ nfsd4_decode_lookup(struct nfsd4_compoundargs *argp, struct nfsd4_lookup *lookup return nfsd4_decode_component4(argp, &lookup->lo_name, &lookup->lo_len); }
+static __be32 +nfsd4_decode_createhow4(struct nfsd4_compoundargs *argp, struct nfsd4_open *open) +{ + __be32 status; + + if (xdr_stream_decode_u32(argp->xdr, &open->op_createmode) < 0) + return nfserr_bad_xdr; + switch (open->op_createmode) { + case NFS4_CREATE_UNCHECKED: + case NFS4_CREATE_GUARDED: + status = nfsd4_decode_fattr4(argp, open->op_bmval, + ARRAY_SIZE(open->op_bmval), + &open->op_iattr, &open->op_acl, + &open->op_label, &open->op_umask); + if (status) + return status; + break; + case NFS4_CREATE_EXCLUSIVE: + status = nfsd4_decode_verifier4(argp, &open->op_verf); + if (status) + return status; + break; + case NFS4_CREATE_EXCLUSIVE4_1: + if (argp->minorversion < 1) + return nfserr_bad_xdr; + status = nfsd4_decode_verifier4(argp, &open->op_verf); + if (status) + return status; + status = nfsd4_decode_fattr4(argp, open->op_bmval, + ARRAY_SIZE(open->op_bmval), + &open->op_iattr, &open->op_acl, + &open->op_label, &open->op_umask); + if (status) + return status; + break; + default: + return nfserr_bad_xdr; + } + + return nfs_ok; +} + static __be32 nfsd4_decode_share_access(struct nfsd4_compoundargs *argp, u32 *share_access, u32 *deleg_want, u32 *deleg_when) { __be32 *p; @@ -1046,39 +1088,9 @@ nfsd4_decode_open(struct nfsd4_compoundargs *argp, struct nfsd4_open *open) case NFS4_OPEN_NOCREATE: break; case NFS4_OPEN_CREATE: - READ_BUF(4); - open->op_createmode = be32_to_cpup(p++); - switch (open->op_createmode) { - case NFS4_CREATE_UNCHECKED: - case NFS4_CREATE_GUARDED: - status = nfsd4_decode_fattr4(argp, open->op_bmval, - ARRAY_SIZE(open->op_bmval), - &open->op_iattr, &open->op_acl, - &open->op_label, &open->op_umask); - if (status) - goto out; - break; - case NFS4_CREATE_EXCLUSIVE: - status = nfsd4_decode_verifier4(argp, &open->op_verf); - if (status) - return status; - break; - case NFS4_CREATE_EXCLUSIVE4_1: - if (argp->minorversion < 1) - goto xdr_error; - status = nfsd4_decode_verifier4(argp, &open->op_verf); - if (status) - return status; - status = nfsd4_decode_fattr4(argp, open->op_bmval, - ARRAY_SIZE(open->op_bmval), - &open->op_iattr, &open->op_acl, - &open->op_label, &open->op_umask); - if (status) - goto out; - break; - default: - goto xdr_error; - } + status = nfsd4_decode_createhow4(argp, open); + if (status) + return status; break; default: goto xdr_error;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit e6ec04b27bfb4869c0e35fbcf24333d379f101d5 ]
Refactor for clarity.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 38 +++++++++++++++++++++++++------------- 1 file changed, 25 insertions(+), 13 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 62096b2a57b35..76715d1935ade 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -988,6 +988,28 @@ nfsd4_decode_createhow4(struct nfsd4_compoundargs *argp, struct nfsd4_open *open return nfs_ok; }
+static __be32 +nfsd4_decode_openflag4(struct nfsd4_compoundargs *argp, struct nfsd4_open *open) +{ + __be32 status; + + if (xdr_stream_decode_u32(argp->xdr, &open->op_create) < 0) + return nfserr_bad_xdr; + switch (open->op_create) { + case NFS4_OPEN_NOCREATE: + break; + case NFS4_OPEN_CREATE: + status = nfsd4_decode_createhow4(argp, open); + if (status) + return status; + break; + default: + return nfserr_bad_xdr; + } + + return nfs_ok; +} + static __be32 nfsd4_decode_share_access(struct nfsd4_compoundargs *argp, u32 *share_access, u32 *deleg_want, u32 *deleg_when) { __be32 *p; @@ -1082,19 +1104,9 @@ nfsd4_decode_open(struct nfsd4_compoundargs *argp, struct nfsd4_open *open) status = nfsd4_decode_opaque(argp, &open->op_owner); if (status) goto xdr_error; - READ_BUF(4); - open->op_create = be32_to_cpup(p++); - switch (open->op_create) { - case NFS4_OPEN_NOCREATE: - break; - case NFS4_OPEN_CREATE: - status = nfsd4_decode_createhow4(argp, open); - if (status) - return status; - break; - default: - goto xdr_error; - } + status = nfsd4_decode_openflag4(argp, open); + if (status) + return status;
/* open_claim */ READ_BUF(4);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 9aa62f5199749b274454b6d7d914c9b2a5e77031 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 76715d1935ade..a43b39940ab25 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1012,11 +1012,10 @@ nfsd4_decode_openflag4(struct nfsd4_compoundargs *argp, struct nfsd4_open *open)
static __be32 nfsd4_decode_share_access(struct nfsd4_compoundargs *argp, u32 *share_access, u32 *deleg_want, u32 *deleg_when) { - __be32 *p; u32 w;
- READ_BUF(4); - w = be32_to_cpup(p++); + if (xdr_stream_decode_u32(argp->xdr, &w) < 0) + return nfserr_bad_xdr; *share_access = w & NFS4_SHARE_ACCESS_MASK; *deleg_want = w & NFS4_SHARE_WANT_MASK; if (deleg_when) @@ -1059,7 +1058,6 @@ static __be32 nfsd4_decode_share_access(struct nfsd4_compoundargs *argp, u32 *sh NFS4_SHARE_PUSH_DELEG_WHEN_UNCONTENDED): return nfs_ok; } -xdr_error: return nfserr_bad_xdr; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit b07bebd9eb9842e2d0dea87efeb92884556e55b0 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index a43b39940ab25..a9257ec9d151d 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1063,16 +1063,13 @@ static __be32 nfsd4_decode_share_access(struct nfsd4_compoundargs *argp, u32 *sh
static __be32 nfsd4_decode_share_deny(struct nfsd4_compoundargs *argp, u32 *x) { - __be32 *p; - - READ_BUF(4); - *x = be32_to_cpup(p++); - /* Note: unlinke access bits, deny bits may be zero. */ + if (xdr_stream_decode_u32(argp->xdr, x) < 0) + return nfserr_bad_xdr; + /* Note: unlike access bits, deny bits may be zero. */ if (*x & ~NFS4_SHARE_DENY_BOTH) return nfserr_bad_xdr; + return nfs_ok; -xdr_error: - return nfserr_bad_xdr; }
static __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 1708e50b0145f393acbec9e319bdf0e33f765d25 ]
Refactor for clarity.
Note that op_fname is the only instance of an NFSv4 filename stored in a struct xdr_netobj. Convert it to a u32/char * pair so that the new nfsd4_decode_filename() helper can be used.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 8 ++-- fs/nfsd/nfs4xdr.c | 95 ++++++++++++++++++++++++---------------------- fs/nfsd/xdr4.h | 3 +- 3 files changed, 56 insertions(+), 50 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 95545a61bfc77..a038d1e182ff3 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -257,8 +257,8 @@ do_open_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, stru * in NFSv4 as in v3 except EXCLUSIVE4_1. */ current->fs->umask = open->op_umask; - status = do_nfsd_create(rqstp, current_fh, open->op_fname.data, - open->op_fname.len, &open->op_iattr, + status = do_nfsd_create(rqstp, current_fh, open->op_fname, + open->op_fnamelen, &open->op_iattr, *resfh, open->op_createmode, (u32 *)open->op_verf.data, &open->op_truncate, &open->op_created); @@ -283,7 +283,7 @@ do_open_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, stru * a chance to an acquire a delegation if appropriate. */ status = nfsd_lookup(rqstp, current_fh, - open->op_fname.data, open->op_fname.len, *resfh); + open->op_fname, open->op_fnamelen, *resfh); if (status) goto out; status = nfsd_check_obj_isreg(*resfh); @@ -360,7 +360,7 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, bool reclaim = false;
dprintk("NFSD: nfsd4_open filename %.*s op_openowner %p\n", - (int)open->op_fname.len, open->op_fname.data, + (int)open->op_fnamelen, open->op_fname, open->op_openowner);
/* This check required by spec. */ diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index a9257ec9d151d..3e0fca521c39b 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1072,6 +1072,55 @@ static __be32 nfsd4_decode_share_deny(struct nfsd4_compoundargs *argp, u32 *x) return nfs_ok; }
+static __be32 +nfsd4_decode_open_claim4(struct nfsd4_compoundargs *argp, + struct nfsd4_open *open) +{ + __be32 status; + + if (xdr_stream_decode_u32(argp->xdr, &open->op_claim_type) < 0) + return nfserr_bad_xdr; + switch (open->op_claim_type) { + case NFS4_OPEN_CLAIM_NULL: + case NFS4_OPEN_CLAIM_DELEGATE_PREV: + status = nfsd4_decode_component4(argp, &open->op_fname, + &open->op_fnamelen); + if (status) + return status; + break; + case NFS4_OPEN_CLAIM_PREVIOUS: + if (xdr_stream_decode_u32(argp->xdr, &open->op_delegate_type) < 0) + return nfserr_bad_xdr; + break; + case NFS4_OPEN_CLAIM_DELEGATE_CUR: + status = nfsd4_decode_stateid4(argp, &open->op_delegate_stateid); + if (status) + return status; + status = nfsd4_decode_component4(argp, &open->op_fname, + &open->op_fnamelen); + if (status) + return status; + break; + case NFS4_OPEN_CLAIM_FH: + case NFS4_OPEN_CLAIM_DELEG_PREV_FH: + if (argp->minorversion < 1) + return nfserr_bad_xdr; + /* void */ + break; + case NFS4_OPEN_CLAIM_DELEG_CUR_FH: + if (argp->minorversion < 1) + return nfserr_bad_xdr; + status = nfsd4_decode_stateid4(argp, &open->op_delegate_stateid); + if (status) + return status; + break; + default: + return nfserr_bad_xdr; + } + + return nfs_ok; +} + static __be32 nfsd4_decode_open(struct nfsd4_compoundargs *argp, struct nfsd4_open *open) { @@ -1102,51 +1151,7 @@ nfsd4_decode_open(struct nfsd4_compoundargs *argp, struct nfsd4_open *open) status = nfsd4_decode_openflag4(argp, open); if (status) return status; - - /* open_claim */ - READ_BUF(4); - open->op_claim_type = be32_to_cpup(p++); - switch (open->op_claim_type) { - case NFS4_OPEN_CLAIM_NULL: - case NFS4_OPEN_CLAIM_DELEGATE_PREV: - READ_BUF(4); - open->op_fname.len = be32_to_cpup(p++); - READ_BUF(open->op_fname.len); - SAVEMEM(open->op_fname.data, open->op_fname.len); - if ((status = check_filename(open->op_fname.data, open->op_fname.len))) - return status; - break; - case NFS4_OPEN_CLAIM_PREVIOUS: - READ_BUF(4); - open->op_delegate_type = be32_to_cpup(p++); - break; - case NFS4_OPEN_CLAIM_DELEGATE_CUR: - status = nfsd4_decode_stateid(argp, &open->op_delegate_stateid); - if (status) - return status; - READ_BUF(4); - open->op_fname.len = be32_to_cpup(p++); - READ_BUF(open->op_fname.len); - SAVEMEM(open->op_fname.data, open->op_fname.len); - if ((status = check_filename(open->op_fname.data, open->op_fname.len))) - return status; - break; - case NFS4_OPEN_CLAIM_FH: - case NFS4_OPEN_CLAIM_DELEG_PREV_FH: - if (argp->minorversion < 1) - goto xdr_error; - /* void */ - break; - case NFS4_OPEN_CLAIM_DELEG_CUR_FH: - if (argp->minorversion < 1) - goto xdr_error; - status = nfsd4_decode_stateid(argp, &open->op_delegate_stateid); - if (status) - return status; - break; - default: - goto xdr_error; - } + status = nfsd4_decode_open_claim4(argp, open);
DECODE_TAIL; } diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 0eb13bd603ea6..6245004a9993b 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -252,7 +252,8 @@ struct nfsd4_listxattrs {
struct nfsd4_open { u32 op_claim_type; /* request */ - struct xdr_netobj op_fname; /* request - everything but CLAIM_PREV */ + u32 op_fnamelen; + char * op_fname; /* request - everything but CLAIM_PREV */ u32 op_delegate_type; /* request - CLAIM_PREV only */ stateid_t op_delegate_stateid; /* request - response */ u32 op_why_no_deleg; /* response - DELEG_NONE_EXT only */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 61e5e0b3ec713d1365008c8af3fe5fdd262e2a60 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 24 ++++++++++-------------- 1 file changed, 10 insertions(+), 14 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 3e0fca521c39b..2de54e84a3ab0 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1124,7 +1124,7 @@ nfsd4_decode_open_claim4(struct nfsd4_compoundargs *argp, static __be32 nfsd4_decode_open(struct nfsd4_compoundargs *argp, struct nfsd4_open *open) { - DECODE_HEAD; + __be32 status; u32 dummy;
memset(open->op_bmval, 0, sizeof(open->op_bmval)); @@ -1132,28 +1132,24 @@ nfsd4_decode_open(struct nfsd4_compoundargs *argp, struct nfsd4_open *open) open->op_openowner = NULL;
open->op_xdr_error = 0; - /* seqid, share_access, share_deny, clientid, ownerlen */ - READ_BUF(4); - open->op_seqid = be32_to_cpup(p++); - /* decode, yet ignore deleg_when until supported */ + if (xdr_stream_decode_u32(argp->xdr, &open->op_seqid) < 0) + return nfserr_bad_xdr; + /* deleg_want is ignored */ status = nfsd4_decode_share_access(argp, &open->op_share_access, &open->op_deleg_want, &dummy); if (status) - goto xdr_error; + return status; status = nfsd4_decode_share_deny(argp, &open->op_share_deny); if (status) - goto xdr_error; - READ_BUF(sizeof(clientid_t)); - COPYMEM(&open->op_clientid, sizeof(clientid_t)); - status = nfsd4_decode_opaque(argp, &open->op_owner); + return status; + status = nfsd4_decode_state_owner4(argp, &open->op_clientid, + &open->op_owner); if (status) - goto xdr_error; + return status; status = nfsd4_decode_openflag4(argp, open); if (status) return status; - status = nfsd4_decode_open_claim4(argp, open); - - DECODE_TAIL; + return nfsd4_decode_open_claim4(argp, open); }
static __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 06bee693a1f1cb774b91000f05a6e183c257d8e9 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 2de54e84a3ab0..7b6fb11cdc809 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1155,18 +1155,18 @@ nfsd4_decode_open(struct nfsd4_compoundargs *argp, struct nfsd4_open *open) static __be32 nfsd4_decode_open_confirm(struct nfsd4_compoundargs *argp, struct nfsd4_open_confirm *open_conf) { - DECODE_HEAD; + __be32 status;
if (argp->minorversion >= 1) return nfserr_notsupp;
- status = nfsd4_decode_stateid(argp, &open_conf->oc_req_stateid); + status = nfsd4_decode_stateid4(argp, &open_conf->oc_req_stateid); if (status) return status; - READ_BUF(4); - open_conf->oc_seqid = be32_to_cpup(p++); + if (xdr_stream_decode_u32(argp->xdr, &open_conf->oc_seqid) < 0) + return nfserr_bad_xdr;
- DECODE_TAIL; + return nfs_ok; }
static __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit dca71651f097ea608945d7a66bf62761a630de9a ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 16 +++++++--------- 1 file changed, 7 insertions(+), 9 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 7b6fb11cdc809..95c755473899f 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1172,21 +1172,19 @@ nfsd4_decode_open_confirm(struct nfsd4_compoundargs *argp, struct nfsd4_open_con static __be32 nfsd4_decode_open_downgrade(struct nfsd4_compoundargs *argp, struct nfsd4_open_downgrade *open_down) { - DECODE_HEAD; - - status = nfsd4_decode_stateid(argp, &open_down->od_stateid); + __be32 status; + + status = nfsd4_decode_stateid4(argp, &open_down->od_stateid); if (status) return status; - READ_BUF(4); - open_down->od_seqid = be32_to_cpup(p++); + if (xdr_stream_decode_u32(argp->xdr, &open_down->od_seqid) < 0) + return nfserr_bad_xdr; + /* deleg_want is ignored */ status = nfsd4_decode_share_access(argp, &open_down->od_share_access, &open_down->od_deleg_want, NULL); if (status) return status; - status = nfsd4_decode_share_deny(argp, &open_down->od_share_deny); - if (status) - return status; - DECODE_TAIL; + return nfsd4_decode_share_deny(argp, &open_down->od_share_deny); }
static __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit a73bed98413b1d9eb4466f776a56d2fde8b3b2c9 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 19 ++++++++++++------- 1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 95c755473899f..149948393ccb1 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1190,16 +1190,21 @@ nfsd4_decode_open_downgrade(struct nfsd4_compoundargs *argp, struct nfsd4_open_d static __be32 nfsd4_decode_putfh(struct nfsd4_compoundargs *argp, struct nfsd4_putfh *putfh) { - DECODE_HEAD; + __be32 *p;
- READ_BUF(4); - putfh->pf_fhlen = be32_to_cpup(p++); + if (xdr_stream_decode_u32(argp->xdr, &putfh->pf_fhlen) < 0) + return nfserr_bad_xdr; if (putfh->pf_fhlen > NFS4_FHSIZE) - goto xdr_error; - READ_BUF(putfh->pf_fhlen); - SAVEMEM(putfh->pf_fhval, putfh->pf_fhlen); + return nfserr_bad_xdr; + p = xdr_inline_decode(argp->xdr, putfh->pf_fhlen); + if (!p) + return nfserr_bad_xdr; + putfh->pf_fhval = svcxdr_tmpalloc(argp, putfh->pf_fhlen); + if (!putfh->pf_fhval) + return nfserr_jukebox; + memcpy(putfh->pf_fhval, p, putfh->pf_fhlen);
- DECODE_TAIL; + return nfs_ok; }
static __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 3909c3bc604688503e31ddceb429dc156c4720c1 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 149948393ccb1..c9652040d748b 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1218,16 +1218,17 @@ nfsd4_decode_putpubfh(struct nfsd4_compoundargs *argp, void *p) static __be32 nfsd4_decode_read(struct nfsd4_compoundargs *argp, struct nfsd4_read *read) { - DECODE_HEAD; + __be32 status;
- status = nfsd4_decode_stateid(argp, &read->rd_stateid); + status = nfsd4_decode_stateid4(argp, &read->rd_stateid); if (status) return status; - READ_BUF(12); - p = xdr_decode_hyper(p, &read->rd_offset); - read->rd_length = be32_to_cpup(p++); + if (xdr_stream_decode_u64(argp->xdr, &read->rd_offset) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &read->rd_length) < 0) + return nfserr_bad_xdr;
- DECODE_TAIL; + return nfs_ok; }
static __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 0dfaf2a371436860ace6af889e6cd8410ee63164 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 23 ++++++++++++++--------- 1 file changed, 14 insertions(+), 9 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index c9652040d748b..6036f8d595efa 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1234,17 +1234,22 @@ nfsd4_decode_read(struct nfsd4_compoundargs *argp, struct nfsd4_read *read) static __be32 nfsd4_decode_readdir(struct nfsd4_compoundargs *argp, struct nfsd4_readdir *readdir) { - DECODE_HEAD; + __be32 status;
- READ_BUF(24); - p = xdr_decode_hyper(p, &readdir->rd_cookie); - COPYMEM(readdir->rd_verf.data, sizeof(readdir->rd_verf.data)); - readdir->rd_dircount = be32_to_cpup(p++); - readdir->rd_maxcount = be32_to_cpup(p++); - if ((status = nfsd4_decode_bitmap(argp, readdir->rd_bmval))) - goto out; + if (xdr_stream_decode_u64(argp->xdr, &readdir->rd_cookie) < 0) + return nfserr_bad_xdr; + status = nfsd4_decode_verifier4(argp, &readdir->rd_verf); + if (status) + return status; + if (xdr_stream_decode_u32(argp->xdr, &readdir->rd_dircount) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &readdir->rd_maxcount) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_uint32_array(argp->xdr, readdir->rd_bmval, + ARRAY_SIZE(readdir->rd_bmval)) < 0) + return nfserr_bad_xdr;
- DECODE_TAIL; + return nfs_ok; }
static __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit b7f5fbf219aecda98e32de305551e445f9438899 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 11 +---------- 1 file changed, 1 insertion(+), 10 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 6036f8d595efa..d4e1e3138739c 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1255,16 +1255,7 @@ nfsd4_decode_readdir(struct nfsd4_compoundargs *argp, struct nfsd4_readdir *read static __be32 nfsd4_decode_remove(struct nfsd4_compoundargs *argp, struct nfsd4_remove *remove) { - DECODE_HEAD; - - READ_BUF(4); - remove->rm_namelen = be32_to_cpup(p++); - READ_BUF(remove->rm_namelen); - SAVEMEM(remove->rm_name, remove->rm_namelen); - if ((status = check_filename(remove->rm_name, remove->rm_namelen))) - return status; - - DECODE_TAIL; + return nfsd4_decode_component4(argp, &remove->rm_name, &remove->rm_namelen); }
static __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit ba881a0a5342b3aaf83958901ebe3fe752eaab46 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 18 ++++-------------- 1 file changed, 4 insertions(+), 14 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index d4e1e3138739c..adf4a6fb94d4c 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1261,22 +1261,12 @@ nfsd4_decode_remove(struct nfsd4_compoundargs *argp, struct nfsd4_remove *remove static __be32 nfsd4_decode_rename(struct nfsd4_compoundargs *argp, struct nfsd4_rename *rename) { - DECODE_HEAD; + __be32 status;
- READ_BUF(4); - rename->rn_snamelen = be32_to_cpup(p++); - READ_BUF(rename->rn_snamelen); - SAVEMEM(rename->rn_sname, rename->rn_snamelen); - READ_BUF(4); - rename->rn_tnamelen = be32_to_cpup(p++); - READ_BUF(rename->rn_tnamelen); - SAVEMEM(rename->rn_tname, rename->rn_tnamelen); - if ((status = check_filename(rename->rn_sname, rename->rn_snamelen))) - return status; - if ((status = check_filename(rename->rn_tname, rename->rn_tnamelen))) + status = nfsd4_decode_component4(argp, &rename->rn_sname, &rename->rn_snamelen); + if (status) return status; - - DECODE_TAIL; + return nfsd4_decode_component4(argp, &rename->rn_tname, &rename->rn_tnamelen); }
static __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit d12f90458dc8c11734ba44ec88f109bf8de86ff0 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 10 +--------- 1 file changed, 1 insertion(+), 9 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index adf4a6fb94d4c..51b59b369d726 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1272,15 +1272,7 @@ nfsd4_decode_rename(struct nfsd4_compoundargs *argp, struct nfsd4_rename *rename static __be32 nfsd4_decode_renew(struct nfsd4_compoundargs *argp, clientid_t *clientid) { - DECODE_HEAD; - - if (argp->minorversion >= 1) - return nfserr_notsupp; - - READ_BUF(sizeof(clientid_t)); - COPYMEM(clientid, sizeof(clientid_t)); - - DECODE_TAIL; + return nfsd4_decode_clientid4(argp, clientid); }
static __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit d0abdae5191a916d767164f6fc6c0e2e814a20a7 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 11 +---------- 1 file changed, 1 insertion(+), 10 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 51b59b369d726..42d69c0207ce8 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1279,16 +1279,7 @@ static __be32 nfsd4_decode_secinfo(struct nfsd4_compoundargs *argp, struct nfsd4_secinfo *secinfo) { - DECODE_HEAD; - - READ_BUF(4); - secinfo->si_namelen = be32_to_cpup(p++); - READ_BUF(secinfo->si_namelen); - SAVEMEM(secinfo->si_name, secinfo->si_namelen); - status = check_filename(secinfo->si_name, secinfo->si_namelen); - if (status) - return status; - DECODE_TAIL; + return nfsd4_decode_component4(argp, &secinfo->si_name, &secinfo->si_namelen); }
static __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 44592fe9479d8d4b88594365ab825f7b07afdf7c ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 42d69c0207ce8..cda56ca9ca3fc 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1298,7 +1298,7 @@ nfsd4_decode_setattr(struct nfsd4_compoundargs *argp, struct nfsd4_setattr *seta { __be32 status;
- status = nfsd4_decode_stateid(argp, &setattr->sa_stateid); + status = nfsd4_decode_stateid4(argp, &setattr->sa_stateid); if (status) return status; return nfsd4_decode_fattr4(argp, setattr->sa_bmval,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 92fa6c08c251d52d0d7b46066ecf87b96a0c4b8f ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 47 +++++++++++++++++++++++++++++++---------------- 1 file changed, 31 insertions(+), 16 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index cda56ca9ca3fc..0af51cc1adba3 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1310,31 +1310,46 @@ nfsd4_decode_setattr(struct nfsd4_compoundargs *argp, struct nfsd4_setattr *seta static __be32 nfsd4_decode_setclientid(struct nfsd4_compoundargs *argp, struct nfsd4_setclientid *setclientid) { - DECODE_HEAD; + __be32 *p, status;
if (argp->minorversion >= 1) return nfserr_notsupp;
- READ_BUF(NFS4_VERIFIER_SIZE); - COPYMEM(setclientid->se_verf.data, NFS4_VERIFIER_SIZE); - + status = nfsd4_decode_verifier4(argp, &setclientid->se_verf); + if (status) + return status; status = nfsd4_decode_opaque(argp, &setclientid->se_name); if (status) + return status; + if (xdr_stream_decode_u32(argp->xdr, &setclientid->se_callback_prog) < 0) return nfserr_bad_xdr; - READ_BUF(8); - setclientid->se_callback_prog = be32_to_cpup(p++); - setclientid->se_callback_netid_len = be32_to_cpup(p++); - READ_BUF(setclientid->se_callback_netid_len); - SAVEMEM(setclientid->se_callback_netid_val, setclientid->se_callback_netid_len); - READ_BUF(4); - setclientid->se_callback_addr_len = be32_to_cpup(p++); + if (xdr_stream_decode_u32(argp->xdr, &setclientid->se_callback_netid_len) < 0) + return nfserr_bad_xdr; + p = xdr_inline_decode(argp->xdr, setclientid->se_callback_netid_len); + if (!p) + return nfserr_bad_xdr; + setclientid->se_callback_netid_val = svcxdr_tmpalloc(argp, + setclientid->se_callback_netid_len); + if (!setclientid->se_callback_netid_val) + return nfserr_jukebox; + memcpy(setclientid->se_callback_netid_val, p, + setclientid->se_callback_netid_len);
- READ_BUF(setclientid->se_callback_addr_len); - SAVEMEM(setclientid->se_callback_addr_val, setclientid->se_callback_addr_len); - READ_BUF(4); - setclientid->se_callback_ident = be32_to_cpup(p++); + if (xdr_stream_decode_u32(argp->xdr, &setclientid->se_callback_addr_len) < 0) + return nfserr_bad_xdr; + p = xdr_inline_decode(argp->xdr, setclientid->se_callback_addr_len); + if (!p) + return nfserr_bad_xdr; + setclientid->se_callback_addr_val = svcxdr_tmpalloc(argp, + setclientid->se_callback_addr_len); + if (!setclientid->se_callback_addr_val) + return nfserr_jukebox; + memcpy(setclientid->se_callback_addr_val, p, + setclientid->se_callback_addr_len); + if (xdr_stream_decode_u32(argp->xdr, &setclientid->se_callback_ident) < 0) + return nfserr_bad_xdr;
- DECODE_TAIL; + return nfs_ok; }
static __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit d1ca55149d67e5896f89a30053f5d83c002ac10e ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 0af51cc1adba3..057cc1579f9b8 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1355,16 +1355,15 @@ nfsd4_decode_setclientid(struct nfsd4_compoundargs *argp, struct nfsd4_setclient static __be32 nfsd4_decode_setclientid_confirm(struct nfsd4_compoundargs *argp, struct nfsd4_setclientid_confirm *scd_c) { - DECODE_HEAD; + __be32 status;
if (argp->minorversion >= 1) return nfserr_notsupp;
- READ_BUF(8 + NFS4_VERIFIER_SIZE); - COPYMEM(&scd_c->sc_clientid, 8); - COPYMEM(&scd_c->sc_confirm, NFS4_VERIFIER_SIZE); - - DECODE_TAIL; + status = nfsd4_decode_clientid4(argp, &scd_c->sc_clientid); + if (status) + return status; + return nfsd4_decode_verifier4(argp, &scd_c->sc_confirm); }
/* Also used for NVERIFY */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 67cd453eeda86be90f83a0f4798f33832cf2d98c ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 057cc1579f9b8..231a2628e3e6f 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1370,20 +1370,27 @@ nfsd4_decode_setclientid_confirm(struct nfsd4_compoundargs *argp, struct nfsd4_s static __be32 nfsd4_decode_verify(struct nfsd4_compoundargs *argp, struct nfsd4_verify *verify) { - DECODE_HEAD; + __be32 *p, status;
- if ((status = nfsd4_decode_bitmap(argp, verify->ve_bmval))) - goto out; + status = nfsd4_decode_bitmap4(argp, verify->ve_bmval, + ARRAY_SIZE(verify->ve_bmval)); + if (status) + return status;
/* For convenience's sake, we compare raw xdr'd attributes in * nfsd4_proc_verify */
- READ_BUF(4); - verify->ve_attrlen = be32_to_cpup(p++); - READ_BUF(verify->ve_attrlen); - SAVEMEM(verify->ve_attrval, verify->ve_attrlen); + if (xdr_stream_decode_u32(argp->xdr, &verify->ve_attrlen) < 0) + return nfserr_bad_xdr; + p = xdr_inline_decode(argp->xdr, verify->ve_attrlen); + if (!p) + return nfserr_bad_xdr; + verify->ve_attrval = svcxdr_tmpalloc(argp, verify->ve_attrlen); + if (!verify->ve_attrval) + return nfserr_jukebox; + memcpy(verify->ve_attrval, p, verify->ve_attrlen);
- DECODE_TAIL; + return nfs_ok; }
static __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 244e2befcba80f42c65293b6c56282bb78f9f417 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 231a2628e3e6f..26744b7f0e35c 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1396,22 +1396,23 @@ nfsd4_decode_verify(struct nfsd4_compoundargs *argp, struct nfsd4_verify *verify static __be32 nfsd4_decode_write(struct nfsd4_compoundargs *argp, struct nfsd4_write *write) { - DECODE_HEAD; + __be32 status;
- status = nfsd4_decode_stateid(argp, &write->wr_stateid); + status = nfsd4_decode_stateid4(argp, &write->wr_stateid); if (status) return status; - READ_BUF(16); - p = xdr_decode_hyper(p, &write->wr_offset); - write->wr_stable_how = be32_to_cpup(p++); + if (xdr_stream_decode_u64(argp->xdr, &write->wr_offset) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &write->wr_stable_how) < 0) + return nfserr_bad_xdr; if (write->wr_stable_how > NFS_FILE_SYNC) - goto xdr_error; - write->wr_buflen = be32_to_cpup(p++); - + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &write->wr_buflen) < 0) + return nfserr_bad_xdr; if (!xdr_stream_subsegment(argp->xdr, &write->wr_payload, write->wr_buflen)) - goto xdr_error; + return nfserr_bad_xdr;
- DECODE_TAIL; + return nfs_ok; }
static __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit a4a80c15ca4dd998ab5cbe87bd856c626a318a80 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 26744b7f0e35c..cc406b7a530b6 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1418,20 +1418,20 @@ nfsd4_decode_write(struct nfsd4_compoundargs *argp, struct nfsd4_write *write) static __be32 nfsd4_decode_release_lockowner(struct nfsd4_compoundargs *argp, struct nfsd4_release_lockowner *rlockowner) { - DECODE_HEAD; + __be32 status;
if (argp->minorversion >= 1) return nfserr_notsupp;
- READ_BUF(12); - COPYMEM(&rlockowner->rl_clientid, sizeof(clientid_t)); - rlockowner->rl_owner.len = be32_to_cpup(p++); - READ_BUF(rlockowner->rl_owner.len); - READMEM(rlockowner->rl_owner.data, rlockowner->rl_owner.len); + status = nfsd4_decode_state_owner4(argp, &rlockowner->rl_clientid, + &rlockowner->rl_owner); + if (status) + return status;
if (argp->minorversion && !zero_clientid(&rlockowner->rl_clientid)) return nfserr_inval; - DECODE_TAIL; + + return nfs_ok; }
static __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 1a99440807bfc66597aaa2e0f0213c319b023e34 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 165 ++++++++++++++++++++++++++++++---------------- 1 file changed, 107 insertions(+), 58 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index cc406b7a530b6..6f3c86bee6211 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -212,6 +212,25 @@ static char *savemem(struct nfsd4_compoundargs *argp, __be32 *p, int nbytes) * NFSv4 basic data type decoders */
+/* + * This helper handles variable-length opaques which belong to protocol + * elements that this implementation does not support. + */ +static __be32 +nfsd4_decode_ignored_string(struct nfsd4_compoundargs *argp, u32 maxlen) +{ + u32 len; + + if (xdr_stream_decode_u32(argp->xdr, &len) < 0) + return nfserr_bad_xdr; + if (maxlen && len > maxlen) + return nfserr_bad_xdr; + if (!xdr_inline_decode(argp->xdr, len)) + return nfserr_bad_xdr; + + return nfs_ok; +} + static __be32 nfsd4_decode_opaque(struct nfsd4_compoundargs *argp, struct xdr_netobj *o) { @@ -645,87 +664,117 @@ nfsd4_decode_state_owner4(struct nfsd4_compoundargs *argp, return nfsd4_decode_opaque(argp, owner); }
-static __be32 nfsd4_decode_cb_sec(struct nfsd4_compoundargs *argp, struct nfsd4_cb_sec *cbs) +/* Defined in Appendix A of RFC 5531 */ +static __be32 +nfsd4_decode_authsys_parms(struct nfsd4_compoundargs *argp, + struct nfsd4_cb_sec *cbs) { - DECODE_HEAD; - struct user_namespace *userns = nfsd_user_namespace(argp->rqstp); - u32 dummy, uid, gid; - char *machine_name; - int i; - int nr_secflavs; + u32 stamp, gidcount, uid, gid; + __be32 *p, status; + + if (xdr_stream_decode_u32(argp->xdr, &stamp) < 0) + return nfserr_bad_xdr; + /* machine name */ + status = nfsd4_decode_ignored_string(argp, 255); + if (status) + return status; + if (xdr_stream_decode_u32(argp->xdr, &uid) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &gid) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &gidcount) < 0) + return nfserr_bad_xdr; + if (gidcount > 16) + return nfserr_bad_xdr; + p = xdr_inline_decode(argp->xdr, gidcount << 2); + if (!p) + return nfserr_bad_xdr; + if (cbs->flavor == (u32)(-1)) { + struct user_namespace *userns = nfsd_user_namespace(argp->rqstp); + + kuid_t kuid = make_kuid(userns, uid); + kgid_t kgid = make_kgid(userns, gid); + if (uid_valid(kuid) && gid_valid(kgid)) { + cbs->uid = kuid; + cbs->gid = kgid; + cbs->flavor = RPC_AUTH_UNIX; + } else { + dprintk("RPC_AUTH_UNIX with invalid uid or gid, ignoring!\n"); + } + } + + return nfs_ok; +} + +static __be32 +nfsd4_decode_gss_cb_handles4(struct nfsd4_compoundargs *argp, + struct nfsd4_cb_sec *cbs) +{ + __be32 status; + u32 service; + + dprintk("RPC_AUTH_GSS callback secflavor not supported!\n"); + + if (xdr_stream_decode_u32(argp->xdr, &service) < 0) + return nfserr_bad_xdr; + if (service < RPC_GSS_SVC_NONE || service > RPC_GSS_SVC_PRIVACY) + return nfserr_bad_xdr; + /* gcbp_handle_from_server */ + status = nfsd4_decode_ignored_string(argp, 0); + if (status) + return status; + /* gcbp_handle_from_client */ + status = nfsd4_decode_ignored_string(argp, 0); + if (status) + return status; + + return nfs_ok; +} + +/* a counted array of callback_sec_parms4 items */ +static __be32 +nfsd4_decode_cb_sec(struct nfsd4_compoundargs *argp, struct nfsd4_cb_sec *cbs) +{ + u32 i, secflavor, nr_secflavs; + __be32 status;
/* callback_sec_params4 */ - READ_BUF(4); - nr_secflavs = be32_to_cpup(p++); + if (xdr_stream_decode_u32(argp->xdr, &nr_secflavs) < 0) + return nfserr_bad_xdr; if (nr_secflavs) cbs->flavor = (u32)(-1); else /* Is this legal? Be generous, take it to mean AUTH_NONE: */ cbs->flavor = 0; + for (i = 0; i < nr_secflavs; ++i) { - READ_BUF(4); - dummy = be32_to_cpup(p++); - switch (dummy) { + if (xdr_stream_decode_u32(argp->xdr, &secflavor) < 0) + return nfserr_bad_xdr; + switch (secflavor) { case RPC_AUTH_NULL: - /* Nothing to read */ + /* void */ if (cbs->flavor == (u32)(-1)) cbs->flavor = RPC_AUTH_NULL; break; case RPC_AUTH_UNIX: - READ_BUF(8); - /* stamp */ - dummy = be32_to_cpup(p++); - - /* machine name */ - dummy = be32_to_cpup(p++); - READ_BUF(dummy); - SAVEMEM(machine_name, dummy); - - /* uid, gid */ - READ_BUF(8); - uid = be32_to_cpup(p++); - gid = be32_to_cpup(p++); - - /* more gids */ - READ_BUF(4); - dummy = be32_to_cpup(p++); - READ_BUF(dummy * 4); - if (cbs->flavor == (u32)(-1)) { - kuid_t kuid = make_kuid(userns, uid); - kgid_t kgid = make_kgid(userns, gid); - if (uid_valid(kuid) && gid_valid(kgid)) { - cbs->uid = kuid; - cbs->gid = kgid; - cbs->flavor = RPC_AUTH_UNIX; - } else { - dprintk("RPC_AUTH_UNIX with invalid" - "uid or gid ignoring!\n"); - } - } + status = nfsd4_decode_authsys_parms(argp, cbs); + if (status) + return status; break; case RPC_AUTH_GSS: - dprintk("RPC_AUTH_GSS callback secflavor " - "not supported!\n"); - READ_BUF(8); - /* gcbp_service */ - dummy = be32_to_cpup(p++); - /* gcbp_handle_from_server */ - dummy = be32_to_cpup(p++); - READ_BUF(dummy); - p += XDR_QUADLEN(dummy); - /* gcbp_handle_from_client */ - READ_BUF(4); - dummy = be32_to_cpup(p++); - READ_BUF(dummy); + status = nfsd4_decode_gss_cb_handles4(argp, cbs); + if (status) + return status; break; default: - dprintk("Illegal callback secflavor\n"); return nfserr_inval; } } - DECODE_TAIL; + + return nfs_ok; }
+ /* * NFSv4 operation argument decoders */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 0f81d96098f8eb707afe2f8d5c3fe0f9316ef5ce ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 18 +++++++----------- 1 file changed, 7 insertions(+), 11 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 6f3c86bee6211..efd1504cd02b6 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -788,17 +788,6 @@ nfsd4_decode_access(struct nfsd4_compoundargs *argp, return nfs_ok; }
-static __be32 nfsd4_decode_backchannel_ctl(struct nfsd4_compoundargs *argp, struct nfsd4_backchannel_ctl *bc) -{ - DECODE_HEAD; - - READ_BUF(4); - bc->bc_cb_program = be32_to_cpup(p++); - nfsd4_decode_cb_sec(argp, &bc->bc_cb_sec); - - DECODE_TAIL; -} - static __be32 nfsd4_decode_bind_conn_to_session(struct nfsd4_compoundargs *argp, struct nfsd4_bind_conn_to_session *bcts) { DECODE_HEAD; @@ -1483,6 +1472,13 @@ nfsd4_decode_release_lockowner(struct nfsd4_compoundargs *argp, struct nfsd4_rel return nfs_ok; }
+static __be32 nfsd4_decode_backchannel_ctl(struct nfsd4_compoundargs *argp, struct nfsd4_backchannel_ctl *bc) +{ + if (xdr_stream_decode_u32(argp->xdr, &bc->bc_cb_program) < 0) + return nfserr_bad_xdr; + return nfsd4_decode_cb_sec(argp, &bc->bc_cb_sec); +} + static __be32 nfsd4_decode_exchange_id(struct nfsd4_compoundargs *argp, struct nfsd4_exchange_id *exid)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 571e0451c4de0a545960ffaea16d969931afc563 ]
A dedicated sessionid4 decoder is introduced that will be used by other operation decoders in subsequent patches.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 41 +++++++++++++++++++++++++++++------------ 1 file changed, 29 insertions(+), 12 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index efd1504cd02b6..5dad32ab02ec4 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -664,6 +664,19 @@ nfsd4_decode_state_owner4(struct nfsd4_compoundargs *argp, return nfsd4_decode_opaque(argp, owner); }
+static __be32 +nfsd4_decode_sessionid4(struct nfsd4_compoundargs *argp, + struct nfs4_sessionid *sessionid) +{ + __be32 *p; + + p = xdr_inline_decode(argp->xdr, NFS4_MAX_SESSIONID_LEN); + if (!p) + return nfserr_bad_xdr; + memcpy(sessionid->data, p, sizeof(sessionid->data)); + return nfs_ok; +} + /* Defined in Appendix A of RFC 5531 */ static __be32 nfsd4_decode_authsys_parms(struct nfsd4_compoundargs *argp, @@ -788,18 +801,6 @@ nfsd4_decode_access(struct nfsd4_compoundargs *argp, return nfs_ok; }
-static __be32 nfsd4_decode_bind_conn_to_session(struct nfsd4_compoundargs *argp, struct nfsd4_bind_conn_to_session *bcts) -{ - DECODE_HEAD; - - READ_BUF(NFS4_MAX_SESSIONID_LEN + 8); - COPYMEM(bcts->sessionid.data, NFS4_MAX_SESSIONID_LEN); - bcts->dir = be32_to_cpup(p++); - /* XXX: skipping ctsa_use_conn_in_rdma_mode. Perhaps Tom Tucker - * could help us figure out we should be using it. */ - DECODE_TAIL; -} - static __be32 nfsd4_decode_close(struct nfsd4_compoundargs *argp, struct nfsd4_close *close) { @@ -1479,6 +1480,22 @@ static __be32 nfsd4_decode_backchannel_ctl(struct nfsd4_compoundargs *argp, stru return nfsd4_decode_cb_sec(argp, &bc->bc_cb_sec); }
+static __be32 nfsd4_decode_bind_conn_to_session(struct nfsd4_compoundargs *argp, struct nfsd4_bind_conn_to_session *bcts) +{ + u32 use_conn_in_rdma_mode; + __be32 status; + + status = nfsd4_decode_sessionid4(argp, &bcts->sessionid); + if (status) + return status; + if (xdr_stream_decode_u32(argp->xdr, &bcts->dir) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &use_conn_in_rdma_mode) < 0) + return nfserr_bad_xdr; + + return nfs_ok; +} + static __be32 nfsd4_decode_exchange_id(struct nfsd4_compoundargs *argp, struct nfsd4_exchange_id *exid)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 2548aa784d760567c2a77cbd8b7c55b211167c37 ]
Refactor for clarity and de-duplication of code.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 66 +++++++++++++++++------------------------------ 1 file changed, 23 insertions(+), 43 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 5dad32ab02ec4..15535b14328e4 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -315,32 +315,6 @@ nfsd4_decode_verifier4(struct nfsd4_compoundargs *argp, nfs4_verifier *verf) return nfs_ok; }
-static __be32 -nfsd4_decode_bitmap(struct nfsd4_compoundargs *argp, u32 *bmval) -{ - u32 bmlen; - DECODE_HEAD; - - bmval[0] = 0; - bmval[1] = 0; - bmval[2] = 0; - - READ_BUF(4); - bmlen = be32_to_cpup(p++); - if (bmlen > 1000) - goto xdr_error; - - READ_BUF(bmlen << 2); - if (bmlen > 0) - bmval[0] = be32_to_cpup(p++); - if (bmlen > 1) - bmval[1] = be32_to_cpup(p++); - if (bmlen > 2) - bmval[2] = be32_to_cpup(p++); - - DECODE_TAIL; -} - /** * nfsd4_decode_bitmap4 - Decode an NFSv4 bitmap4 * @argp: NFSv4 compound argument structure @@ -1496,6 +1470,24 @@ static __be32 nfsd4_decode_bind_conn_to_session(struct nfsd4_compoundargs *argp, return nfs_ok; }
+static __be32 +nfsd4_decode_state_protect_ops(struct nfsd4_compoundargs *argp, + struct nfsd4_exchange_id *exid) +{ + __be32 status; + + status = nfsd4_decode_bitmap4(argp, exid->spo_must_enforce, + ARRAY_SIZE(exid->spo_must_enforce)); + if (status) + return nfserr_bad_xdr; + status = nfsd4_decode_bitmap4(argp, exid->spo_must_allow, + ARRAY_SIZE(exid->spo_must_allow)); + if (status) + return nfserr_bad_xdr; + + return nfs_ok; +} + static __be32 nfsd4_decode_exchange_id(struct nfsd4_compoundargs *argp, struct nfsd4_exchange_id *exid) @@ -1520,27 +1512,15 @@ nfsd4_decode_exchange_id(struct nfsd4_compoundargs *argp, case SP4_NONE: break; case SP4_MACH_CRED: - /* spo_must_enforce */ - status = nfsd4_decode_bitmap(argp, - exid->spo_must_enforce); - if (status) - goto out; - /* spo_must_allow */ - status = nfsd4_decode_bitmap(argp, exid->spo_must_allow); + status = nfsd4_decode_state_protect_ops(argp, exid); if (status) - goto out; + return status; break; case SP4_SSV: /* ssp_ops */ - READ_BUF(4); - dummy = be32_to_cpup(p++); - READ_BUF(dummy * 4); - p += dummy; - - READ_BUF(4); - dummy = be32_to_cpup(p++); - READ_BUF(dummy * 4); - p += dummy; + status = nfsd4_decode_state_protect_ops(argp, exid); + if (status) + return status;
/* ssp_hash_algs<> */ READ_BUF(4);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 547bfeb4cd8d491aabbd656d5a6f410cb4249b4e ]
Refactor for clarity.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 70 +++++++++++++++++++++++++++++------------------ 1 file changed, 44 insertions(+), 26 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 15535b14328e4..8c5701367e4af 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1488,12 +1488,54 @@ nfsd4_decode_state_protect_ops(struct nfsd4_compoundargs *argp, return nfs_ok; }
+/* + * This implementation currently does not support SP4_SSV. + * This decoder simply skips over these arguments. + */ +static noinline __be32 +nfsd4_decode_ssv_sp_parms(struct nfsd4_compoundargs *argp, + struct nfsd4_exchange_id *exid) +{ + u32 count, window, num_gss_handles; + __be32 status; + + /* ssp_ops */ + status = nfsd4_decode_state_protect_ops(argp, exid); + if (status) + return status; + + /* ssp_hash_algs<> */ + if (xdr_stream_decode_u32(argp->xdr, &count) < 0) + return nfserr_bad_xdr; + while (count--) { + status = nfsd4_decode_ignored_string(argp, 0); + if (status) + return status; + } + + /* ssp_encr_algs<> */ + if (xdr_stream_decode_u32(argp->xdr, &count) < 0) + return nfserr_bad_xdr; + while (count--) { + status = nfsd4_decode_ignored_string(argp, 0); + if (status) + return status; + } + + if (xdr_stream_decode_u32(argp->xdr, &window) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &num_gss_handles) < 0) + return nfserr_bad_xdr; + + return nfs_ok; +} + static __be32 nfsd4_decode_exchange_id(struct nfsd4_compoundargs *argp, struct nfsd4_exchange_id *exid) { - int dummy, tmp; DECODE_HEAD; + int dummy;
READ_BUF(NFS4_VERIFIER_SIZE); COPYMEM(exid->verifier.data, NFS4_VERIFIER_SIZE); @@ -1517,33 +1559,9 @@ nfsd4_decode_exchange_id(struct nfsd4_compoundargs *argp, return status; break; case SP4_SSV: - /* ssp_ops */ - status = nfsd4_decode_state_protect_ops(argp, exid); + status = nfsd4_decode_ssv_sp_parms(argp, exid); if (status) return status; - - /* ssp_hash_algs<> */ - READ_BUF(4); - tmp = be32_to_cpup(p++); - while (tmp--) { - READ_BUF(4); - dummy = be32_to_cpup(p++); - READ_BUF(dummy); - p += XDR_QUADLEN(dummy); - } - - /* ssp_encr_algs<> */ - READ_BUF(4); - tmp = be32_to_cpup(p++); - while (tmp--) { - READ_BUF(4); - dummy = be32_to_cpup(p++); - READ_BUF(dummy); - p += XDR_QUADLEN(dummy); - } - - /* ignore ssp_window and ssp_num_gss_handles: */ - READ_BUF(8); break; default: goto xdr_error;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 523ec6ed6fb80fd1537d748a06bffd060a8b3235 ]
Refactor for clarity.
Also, remove a stale comment. Commit ed94164398c9 ("nfsd: implement machine credential support for some operations") added support for SP4_MACH_CRED, so state_protect_a is no longer completely ignored.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 2 +- fs/nfsd/nfs4xdr.c | 44 +++++++++++++++++++++++++++----------------- fs/nfsd/xdr4.h | 2 +- 3 files changed, 29 insertions(+), 19 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index d402ca0b535f0..e7ec7593eaaa3 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -3089,7 +3089,7 @@ nfsd4_exchange_id(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
rpc_ntop(sa, addr_str, sizeof(addr_str)); dprintk("%s rqstp=%p exid=%p clname.len=%u clname.data=%p " - "ip_addr=%s flags %x, spa_how %d\n", + "ip_addr=%s flags %x, spa_how %u\n", __func__, rqstp, exid, exid->clname.len, exid->clname.data, addr_str, exid->flags, exid->spa_how);
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 8c5701367e4af..6a4ab81e01ffc 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1531,25 +1531,13 @@ nfsd4_decode_ssv_sp_parms(struct nfsd4_compoundargs *argp, }
static __be32 -nfsd4_decode_exchange_id(struct nfsd4_compoundargs *argp, - struct nfsd4_exchange_id *exid) +nfsd4_decode_state_protect4_a(struct nfsd4_compoundargs *argp, + struct nfsd4_exchange_id *exid) { - DECODE_HEAD; - int dummy; - - READ_BUF(NFS4_VERIFIER_SIZE); - COPYMEM(exid->verifier.data, NFS4_VERIFIER_SIZE); + __be32 status;
- status = nfsd4_decode_opaque(argp, &exid->clname); - if (status) + if (xdr_stream_decode_u32(argp->xdr, &exid->spa_how) < 0) return nfserr_bad_xdr; - - READ_BUF(4); - exid->flags = be32_to_cpup(p++); - - /* Ignore state_protect4_a */ - READ_BUF(4); - exid->spa_how = be32_to_cpup(p++); switch (exid->spa_how) { case SP4_NONE: break; @@ -1564,9 +1552,31 @@ nfsd4_decode_exchange_id(struct nfsd4_compoundargs *argp, return status; break; default: - goto xdr_error; + return nfserr_bad_xdr; }
+ return nfs_ok; +} + +static __be32 +nfsd4_decode_exchange_id(struct nfsd4_compoundargs *argp, + struct nfsd4_exchange_id *exid) +{ + DECODE_HEAD; + int dummy; + + status = nfsd4_decode_verifier4(argp, &exid->verifier); + if (status) + return status; + status = nfsd4_decode_opaque(argp, &exid->clname); + if (status) + return status; + if (xdr_stream_decode_u32(argp->xdr, &exid->flags) < 0) + return nfserr_bad_xdr; + status = nfsd4_decode_state_protect4_a(argp, exid); + if (status) + return status; + READ_BUF(4); /* nfs_impl_id4 array length */ dummy = be32_to_cpup(p++);
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 6245004a9993b..232529bc1b798 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -433,7 +433,7 @@ struct nfsd4_exchange_id { u32 flags; clientid_t clientid; u32 seqid; - int spa_how; + u32 spa_how; u32 spo_must_enforce[3]; u32 spo_must_allow[3]; struct xdr_netobj nii_domain;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 10ff84228197f47401833495ba19a50131323b4a ]
Refactor for clarity.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 63 ++++++++++++++++++++++++++++------------------- 1 file changed, 38 insertions(+), 25 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 6a4ab81e01ffc..e06e657c3d91c 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1558,12 +1558,47 @@ nfsd4_decode_state_protect4_a(struct nfsd4_compoundargs *argp, return nfs_ok; }
+static __be32 +nfsd4_decode_nfs_impl_id4(struct nfsd4_compoundargs *argp, + struct nfsd4_exchange_id *exid) +{ + __be32 status; + u32 count; + + if (xdr_stream_decode_u32(argp->xdr, &count) < 0) + return nfserr_bad_xdr; + switch (count) { + case 0: + break; + case 1: + /* Note that RFC 8881 places no length limit on + * nii_domain, but this implementation permits no + * more than NFS4_OPAQUE_LIMIT bytes */ + status = nfsd4_decode_opaque(argp, &exid->nii_domain); + if (status) + return status; + /* Note that RFC 8881 places no length limit on + * nii_name, but this implementation permits no + * more than NFS4_OPAQUE_LIMIT bytes */ + status = nfsd4_decode_opaque(argp, &exid->nii_name); + if (status) + return status; + status = nfsd4_decode_nfstime4(argp, &exid->nii_time); + if (status) + return status; + break; + default: + return nfserr_bad_xdr; + } + + return nfs_ok; +} + static __be32 nfsd4_decode_exchange_id(struct nfsd4_compoundargs *argp, struct nfsd4_exchange_id *exid) { - DECODE_HEAD; - int dummy; + __be32 status;
status = nfsd4_decode_verifier4(argp, &exid->verifier); if (status) @@ -1576,29 +1611,7 @@ nfsd4_decode_exchange_id(struct nfsd4_compoundargs *argp, status = nfsd4_decode_state_protect4_a(argp, exid); if (status) return status; - - READ_BUF(4); /* nfs_impl_id4 array length */ - dummy = be32_to_cpup(p++); - - if (dummy > 1) - goto xdr_error; - - if (dummy == 1) { - status = nfsd4_decode_opaque(argp, &exid->nii_domain); - if (status) - goto xdr_error; - - /* nii_name */ - status = nfsd4_decode_opaque(argp, &exid->nii_name); - if (status) - goto xdr_error; - - /* nii_date */ - status = nfsd4_decode_time(argp, &exid->nii_time); - if (status) - goto xdr_error; - } - DECODE_TAIL; + return nfsd4_decode_nfs_impl_id4(argp, exid); }
static __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 3a3f1fbacb0960b628e5a9f07c78287312f7a99d ]
De-duplicate some code.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 71 +++++++++++++++++++++++++---------------------- 1 file changed, 38 insertions(+), 33 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index e06e657c3d91c..716a16961ff48 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1614,6 +1614,38 @@ nfsd4_decode_exchange_id(struct nfsd4_compoundargs *argp, return nfsd4_decode_nfs_impl_id4(argp, exid); }
+static __be32 +nfsd4_decode_channel_attrs4(struct nfsd4_compoundargs *argp, + struct nfsd4_channel_attrs *ca) +{ + __be32 *p; + + p = xdr_inline_decode(argp->xdr, XDR_UNIT * 7); + if (!p) + return nfserr_bad_xdr; + + /* headerpadsz is ignored */ + p++; + ca->maxreq_sz = be32_to_cpup(p++); + ca->maxresp_sz = be32_to_cpup(p++); + ca->maxresp_cached = be32_to_cpup(p++); + ca->maxops = be32_to_cpup(p++); + ca->maxreqs = be32_to_cpup(p++); + ca->nr_rdma_attrs = be32_to_cpup(p); + switch (ca->nr_rdma_attrs) { + case 0: + break; + case 1: + if (xdr_stream_decode_u32(argp->xdr, &ca->rdma_attrs) < 0) + return nfserr_bad_xdr; + break; + default: + return nfserr_bad_xdr; + } + + return nfs_ok; +} + static __be32 nfsd4_decode_create_session(struct nfsd4_compoundargs *argp, struct nfsd4_create_session *sess) @@ -1625,39 +1657,12 @@ nfsd4_decode_create_session(struct nfsd4_compoundargs *argp, sess->seqid = be32_to_cpup(p++); sess->flags = be32_to_cpup(p++);
- /* Fore channel attrs */ - READ_BUF(28); - p++; /* headerpadsz is always 0 */ - sess->fore_channel.maxreq_sz = be32_to_cpup(p++); - sess->fore_channel.maxresp_sz = be32_to_cpup(p++); - sess->fore_channel.maxresp_cached = be32_to_cpup(p++); - sess->fore_channel.maxops = be32_to_cpup(p++); - sess->fore_channel.maxreqs = be32_to_cpup(p++); - sess->fore_channel.nr_rdma_attrs = be32_to_cpup(p++); - if (sess->fore_channel.nr_rdma_attrs == 1) { - READ_BUF(4); - sess->fore_channel.rdma_attrs = be32_to_cpup(p++); - } else if (sess->fore_channel.nr_rdma_attrs > 1) { - dprintk("Too many fore channel attr bitmaps!\n"); - goto xdr_error; - } - - /* Back channel attrs */ - READ_BUF(28); - p++; /* headerpadsz is always 0 */ - sess->back_channel.maxreq_sz = be32_to_cpup(p++); - sess->back_channel.maxresp_sz = be32_to_cpup(p++); - sess->back_channel.maxresp_cached = be32_to_cpup(p++); - sess->back_channel.maxops = be32_to_cpup(p++); - sess->back_channel.maxreqs = be32_to_cpup(p++); - sess->back_channel.nr_rdma_attrs = be32_to_cpup(p++); - if (sess->back_channel.nr_rdma_attrs == 1) { - READ_BUF(4); - sess->back_channel.rdma_attrs = be32_to_cpup(p++); - } else if (sess->back_channel.nr_rdma_attrs > 1) { - dprintk("Too many back channel attr bitmaps!\n"); - goto xdr_error; - } + status = nfsd4_decode_channel_attrs4(argp, &sess->fore_channel); + if (status) + return status; + status = nfsd4_decode_channel_attrs4(argp, &sess->back_channel); + if (status) + return status;
READ_BUF(4); sess->callback_prog = be32_to_cpup(p++);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 81243e3fe37ed547fc4ed8aab1cec2865540bb18 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 24 ++++++++++++++---------- 1 file changed, 14 insertions(+), 10 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 716a16961ff48..3e2e0de00c3a7 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1650,24 +1650,28 @@ static __be32 nfsd4_decode_create_session(struct nfsd4_compoundargs *argp, struct nfsd4_create_session *sess) { - DECODE_HEAD; - - READ_BUF(16); - COPYMEM(&sess->clientid, 8); - sess->seqid = be32_to_cpup(p++); - sess->flags = be32_to_cpup(p++); + __be32 status;
+ status = nfsd4_decode_clientid4(argp, &sess->clientid); + if (status) + return status; + if (xdr_stream_decode_u32(argp->xdr, &sess->seqid) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &sess->flags) < 0) + return nfserr_bad_xdr; status = nfsd4_decode_channel_attrs4(argp, &sess->fore_channel); if (status) return status; status = nfsd4_decode_channel_attrs4(argp, &sess->back_channel); + if (status) + return status; + if (xdr_stream_decode_u32(argp->xdr, &sess->callback_prog) < 0) + return nfserr_bad_xdr; + status = nfsd4_decode_cb_sec(argp, &sess->cb_sec); if (status) return status;
- READ_BUF(4); - sess->callback_prog = be32_to_cpup(p++); - nfsd4_decode_cb_sec(argp, &sess->cb_sec); - DECODE_TAIL; + return nfs_ok; }
static __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 94e254af1f873b4b551db4c4549294f2c4d385ef ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 3e2e0de00c3a7..7a0730688b2f0 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1678,11 +1678,7 @@ static __be32 nfsd4_decode_destroy_session(struct nfsd4_compoundargs *argp, struct nfsd4_destroy_session *destroy_session) { - DECODE_HEAD; - READ_BUF(NFS4_MAX_SESSIONID_LEN); - COPYMEM(destroy_session->sessionid.data, NFS4_MAX_SESSIONID_LEN); - - DECODE_TAIL; + return nfsd4_decode_sessionid4(argp, &destroy_session->sessionid); }
static __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit aec387d5909304810d899f7d90ae57df33f3a75c ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 7a0730688b2f0..92988926d9540 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1685,13 +1685,7 @@ static __be32 nfsd4_decode_free_stateid(struct nfsd4_compoundargs *argp, struct nfsd4_free_stateid *free_stateid) { - DECODE_HEAD; - - READ_BUF(sizeof(stateid_t)); - free_stateid->fr_stateid.si_generation = be32_to_cpup(p++); - COPYMEM(&free_stateid->fr_stateid.si_opaque, sizeof(stateid_opaque_t)); - - DECODE_TAIL; + return nfsd4_decode_stateid4(argp, &free_stateid->fr_stateid); }
static __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 044959715f370b24870c95df3940add8710c5a29 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 50 +++++++++++++++++++++++++++-------------------- 1 file changed, 29 insertions(+), 21 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 92988926d9540..11e32c244b23c 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -638,6 +638,21 @@ nfsd4_decode_state_owner4(struct nfsd4_compoundargs *argp, return nfsd4_decode_opaque(argp, owner); }
+#ifdef CONFIG_NFSD_PNFS +static __be32 +nfsd4_decode_deviceid4(struct nfsd4_compoundargs *argp, + struct nfsd4_deviceid *devid) +{ + __be32 *p; + + p = xdr_inline_decode(argp->xdr, NFS4_DEVICEID4_SIZE); + if (!p) + return nfserr_bad_xdr; + memcpy(devid, p, sizeof(*devid)); + return nfs_ok; +} +#endif /* CONFIG_NFSD_PNFS */ + static __be32 nfsd4_decode_sessionid4(struct nfsd4_compoundargs *argp, struct nfs4_sessionid *sessionid) @@ -1765,27 +1780,20 @@ static __be32 nfsd4_decode_getdeviceinfo(struct nfsd4_compoundargs *argp, struct nfsd4_getdeviceinfo *gdev) { - DECODE_HEAD; - u32 num, i; - - READ_BUF(sizeof(struct nfsd4_deviceid) + 3 * 4); - COPYMEM(&gdev->gd_devid, sizeof(struct nfsd4_deviceid)); - gdev->gd_layout_type = be32_to_cpup(p++); - gdev->gd_maxcount = be32_to_cpup(p++); - num = be32_to_cpup(p++); - if (num) { - if (num > 1000) - goto xdr_error; - READ_BUF(4 * num); - gdev->gd_notify_types = be32_to_cpup(p++); - for (i = 1; i < num; i++) { - if (be32_to_cpup(p++)) { - status = nfserr_inval; - goto out; - } - } - } - DECODE_TAIL; + __be32 status; + + status = nfsd4_decode_deviceid4(argp, &gdev->gd_devid); + if (status) + return status; + if (xdr_stream_decode_u32(argp->xdr, &gdev->gd_layout_type) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &gdev->gd_maxcount) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_uint32_array(argp->xdr, + &gdev->gd_notify_types, 1) < 0) + return nfserr_bad_xdr; + + return nfs_ok; }
static __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 5185980d8a23001c2317c290129ab7ab20067e20 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 120 ++++++++++++++++++++++------------------------ 1 file changed, 58 insertions(+), 62 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 11e32c244b23c..9cd7270e67adc 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -274,20 +274,6 @@ nfsd4_decode_component4(struct nfsd4_compoundargs *argp, char **namp, u32 *lenp) return nfs_ok; }
-static __be32 -nfsd4_decode_time(struct nfsd4_compoundargs *argp, struct timespec64 *tv) -{ - DECODE_HEAD; - - READ_BUF(12); - p = xdr_decode_hyper(p, &tv->tv_sec); - tv->tv_nsec = be32_to_cpup(p++); - if (tv->tv_nsec >= (u32)1000000000) - return nfserr_inval; - - DECODE_TAIL; -} - static __be32 nfsd4_decode_nfstime4(struct nfsd4_compoundargs *argp, struct timespec64 *tv) { @@ -651,6 +637,29 @@ nfsd4_decode_deviceid4(struct nfsd4_compoundargs *argp, memcpy(devid, p, sizeof(*devid)); return nfs_ok; } + +static __be32 +nfsd4_decode_layoutupdate4(struct nfsd4_compoundargs *argp, + struct nfsd4_layoutcommit *lcp) +{ + if (xdr_stream_decode_u32(argp->xdr, &lcp->lc_layout_type) < 0) + return nfserr_bad_xdr; + if (lcp->lc_layout_type < LAYOUT_NFSV4_1_FILES) + return nfserr_bad_xdr; + if (lcp->lc_layout_type >= LAYOUT_TYPE_MAX) + return nfserr_bad_xdr; + + if (xdr_stream_decode_u32(argp->xdr, &lcp->lc_up_len) < 0) + return nfserr_bad_xdr; + if (lcp->lc_up_len > 0) { + lcp->lc_up_layout = xdr_inline_decode(argp->xdr, lcp->lc_up_len); + if (!lcp->lc_up_layout) + return nfserr_bad_xdr; + } + + return nfs_ok; +} + #endif /* CONFIG_NFSD_PNFS */
static __be32 @@ -1796,6 +1805,41 @@ nfsd4_decode_getdeviceinfo(struct nfsd4_compoundargs *argp, return nfs_ok; }
+static __be32 +nfsd4_decode_layoutcommit(struct nfsd4_compoundargs *argp, + struct nfsd4_layoutcommit *lcp) +{ + __be32 *p, status; + + if (xdr_stream_decode_u64(argp->xdr, &lcp->lc_seg.offset) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u64(argp->xdr, &lcp->lc_seg.length) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_bool(argp->xdr, &lcp->lc_reclaim) < 0) + return nfserr_bad_xdr; + status = nfsd4_decode_stateid4(argp, &lcp->lc_sid); + if (status) + return status; + if (xdr_stream_decode_u32(argp->xdr, &lcp->lc_newoffset) < 0) + return nfserr_bad_xdr; + if (lcp->lc_newoffset) { + if (xdr_stream_decode_u64(argp->xdr, &lcp->lc_last_wr) < 0) + return nfserr_bad_xdr; + } else + lcp->lc_last_wr = 0; + p = xdr_inline_decode(argp->xdr, XDR_UNIT); + if (!p) + return nfserr_bad_xdr; + if (xdr_item_is_present(p)) { + status = nfsd4_decode_nfstime4(argp, &lcp->lc_mtime); + if (status) + return status; + } else { + lcp->lc_mtime.tv_nsec = UTIME_NOW; + } + return nfsd4_decode_layoutupdate4(argp, lcp); +} + static __be32 nfsd4_decode_layoutget(struct nfsd4_compoundargs *argp, struct nfsd4_layoutget *lgp) @@ -1820,54 +1864,6 @@ nfsd4_decode_layoutget(struct nfsd4_compoundargs *argp, DECODE_TAIL; }
-static __be32 -nfsd4_decode_layoutcommit(struct nfsd4_compoundargs *argp, - struct nfsd4_layoutcommit *lcp) -{ - DECODE_HEAD; - u32 timechange; - - READ_BUF(20); - p = xdr_decode_hyper(p, &lcp->lc_seg.offset); - p = xdr_decode_hyper(p, &lcp->lc_seg.length); - lcp->lc_reclaim = be32_to_cpup(p++); - - status = nfsd4_decode_stateid(argp, &lcp->lc_sid); - if (status) - return status; - - READ_BUF(4); - lcp->lc_newoffset = be32_to_cpup(p++); - if (lcp->lc_newoffset) { - READ_BUF(8); - p = xdr_decode_hyper(p, &lcp->lc_last_wr); - } else - lcp->lc_last_wr = 0; - READ_BUF(4); - timechange = be32_to_cpup(p++); - if (timechange) { - status = nfsd4_decode_time(argp, &lcp->lc_mtime); - if (status) - return status; - } else { - lcp->lc_mtime.tv_nsec = UTIME_NOW; - } - READ_BUF(8); - lcp->lc_layout_type = be32_to_cpup(p++); - - /* - * Save the layout update in XDR format and let the layout driver deal - * with it later. - */ - lcp->lc_up_len = be32_to_cpup(p++); - if (lcp->lc_up_len > 0) { - READ_BUF(lcp->lc_up_len); - READMEM(lcp->lc_up_layout, lcp->lc_up_len); - } - - DECODE_TAIL; -} - static __be32 nfsd4_decode_layoutreturn(struct nfsd4_compoundargs *argp, struct nfsd4_layoutreturn *lrp)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit c8e88e3aa73889421461f878cd569ef84f231ceb ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 31 +++++++++++++++++-------------- 1 file changed, 17 insertions(+), 14 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 9cd7270e67adc..837d2d8fb3ff8 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1844,24 +1844,27 @@ static __be32 nfsd4_decode_layoutget(struct nfsd4_compoundargs *argp, struct nfsd4_layoutget *lgp) { - DECODE_HEAD; - - READ_BUF(36); - lgp->lg_signal = be32_to_cpup(p++); - lgp->lg_layout_type = be32_to_cpup(p++); - lgp->lg_seg.iomode = be32_to_cpup(p++); - p = xdr_decode_hyper(p, &lgp->lg_seg.offset); - p = xdr_decode_hyper(p, &lgp->lg_seg.length); - p = xdr_decode_hyper(p, &lgp->lg_minlength); + __be32 status;
- status = nfsd4_decode_stateid(argp, &lgp->lg_sid); + if (xdr_stream_decode_u32(argp->xdr, &lgp->lg_signal) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &lgp->lg_layout_type) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &lgp->lg_seg.iomode) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u64(argp->xdr, &lgp->lg_seg.offset) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u64(argp->xdr, &lgp->lg_seg.length) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u64(argp->xdr, &lgp->lg_minlength) < 0) + return nfserr_bad_xdr; + status = nfsd4_decode_stateid4(argp, &lgp->lg_sid); if (status) return status; + if (xdr_stream_decode_u32(argp->xdr, &lgp->lg_maxcount) < 0) + return nfserr_bad_xdr;
- READ_BUF(4); - lgp->lg_maxcount = be32_to_cpup(p++); - - DECODE_TAIL; + return nfs_ok; }
static __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 645fcad371420913c30e9aca80fc0a38f3acf432 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 72 +++++++++++++++++++++++++++++------------------ 1 file changed, 44 insertions(+), 28 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 837d2d8fb3ff8..ae70070a58213 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -660,6 +660,43 @@ nfsd4_decode_layoutupdate4(struct nfsd4_compoundargs *argp, return nfs_ok; }
+static __be32 +nfsd4_decode_layoutreturn4(struct nfsd4_compoundargs *argp, + struct nfsd4_layoutreturn *lrp) +{ + __be32 status; + + if (xdr_stream_decode_u32(argp->xdr, &lrp->lr_return_type) < 0) + return nfserr_bad_xdr; + switch (lrp->lr_return_type) { + case RETURN_FILE: + if (xdr_stream_decode_u64(argp->xdr, &lrp->lr_seg.offset) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u64(argp->xdr, &lrp->lr_seg.length) < 0) + return nfserr_bad_xdr; + status = nfsd4_decode_stateid4(argp, &lrp->lr_sid); + if (status) + return status; + if (xdr_stream_decode_u32(argp->xdr, &lrp->lrf_body_len) < 0) + return nfserr_bad_xdr; + if (lrp->lrf_body_len > 0) { + lrp->lrf_body = xdr_inline_decode(argp->xdr, lrp->lrf_body_len); + if (!lrp->lrf_body) + return nfserr_bad_xdr; + } + break; + case RETURN_FSID: + case RETURN_ALL: + lrp->lr_seg.offset = 0; + lrp->lr_seg.length = NFS4_MAX_UINT64; + break; + default: + return nfserr_bad_xdr; + } + + return nfs_ok; +} + #endif /* CONFIG_NFSD_PNFS */
static __be32 @@ -1871,34 +1908,13 @@ static __be32 nfsd4_decode_layoutreturn(struct nfsd4_compoundargs *argp, struct nfsd4_layoutreturn *lrp) { - DECODE_HEAD; - - READ_BUF(16); - lrp->lr_reclaim = be32_to_cpup(p++); - lrp->lr_layout_type = be32_to_cpup(p++); - lrp->lr_seg.iomode = be32_to_cpup(p++); - lrp->lr_return_type = be32_to_cpup(p++); - if (lrp->lr_return_type == RETURN_FILE) { - READ_BUF(16); - p = xdr_decode_hyper(p, &lrp->lr_seg.offset); - p = xdr_decode_hyper(p, &lrp->lr_seg.length); - - status = nfsd4_decode_stateid(argp, &lrp->lr_sid); - if (status) - return status; - - READ_BUF(4); - lrp->lrf_body_len = be32_to_cpup(p++); - if (lrp->lrf_body_len > 0) { - READ_BUF(lrp->lrf_body_len); - READMEM(lrp->lrf_body, lrp->lrf_body_len); - } - } else { - lrp->lr_seg.offset = 0; - lrp->lr_seg.length = NFS4_MAX_UINT64; - } - - DECODE_TAIL; + if (xdr_stream_decode_bool(argp->xdr, &lrp->lr_reclaim) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &lrp->lr_layout_type) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &lrp->lr_seg.iomode) < 0) + return nfserr_bad_xdr; + return nfsd4_decode_layoutreturn4(argp, lrp); } #endif /* CONFIG_NFSD_PNFS */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 53d70873e37c09a582167ed73d1858e3a2af0157 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 19 ++++++++----------- 1 file changed, 8 insertions(+), 11 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index ae70070a58213..0561b43855839 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1356,17 +1356,6 @@ nfsd4_decode_secinfo(struct nfsd4_compoundargs *argp, return nfsd4_decode_component4(argp, &secinfo->si_name, &secinfo->si_namelen); }
-static __be32 -nfsd4_decode_secinfo_no_name(struct nfsd4_compoundargs *argp, - struct nfsd4_secinfo_no_name *sin) -{ - DECODE_HEAD; - - READ_BUF(4); - sin->sin_style = be32_to_cpup(p++); - DECODE_TAIL; -} - static __be32 nfsd4_decode_setattr(struct nfsd4_compoundargs *argp, struct nfsd4_setattr *setattr) { @@ -1918,6 +1907,14 @@ nfsd4_decode_layoutreturn(struct nfsd4_compoundargs *argp, } #endif /* CONFIG_NFSD_PNFS */
+static __be32 nfsd4_decode_secinfo_no_name(struct nfsd4_compoundargs *argp, + struct nfsd4_secinfo_no_name *sin) +{ + if (xdr_stream_decode_u32(argp->xdr, &sin->sin_style) < 0) + return nfserr_bad_xdr; + return nfs_ok; +} + static __be32 nfsd4_decode_fallocate(struct nfsd4_compoundargs *argp, struct nfsd4_fallocate *fallocate)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit cf907b11326d9360877d6c6ea8f75e1b29f39f2f ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 36 ++++++++++++++++++++---------------- 1 file changed, 20 insertions(+), 16 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 0561b43855839..3fe0d0228c4ac 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1738,22 +1738,6 @@ nfsd4_decode_free_stateid(struct nfsd4_compoundargs *argp, return nfsd4_decode_stateid4(argp, &free_stateid->fr_stateid); }
-static __be32 -nfsd4_decode_sequence(struct nfsd4_compoundargs *argp, - struct nfsd4_sequence *seq) -{ - DECODE_HEAD; - - READ_BUF(NFS4_MAX_SESSIONID_LEN + 16); - COPYMEM(seq->sessionid.data, NFS4_MAX_SESSIONID_LEN); - seq->seqid = be32_to_cpup(p++); - seq->slotid = be32_to_cpup(p++); - seq->maxslots = be32_to_cpup(p++); - seq->cachethis = be32_to_cpup(p++); - - DECODE_TAIL; -} - static __be32 nfsd4_decode_test_stateid(struct nfsd4_compoundargs *argp, struct nfsd4_test_stateid *test_stateid) { @@ -1915,6 +1899,26 @@ static __be32 nfsd4_decode_secinfo_no_name(struct nfsd4_compoundargs *argp, return nfs_ok; }
+static __be32 +nfsd4_decode_sequence(struct nfsd4_compoundargs *argp, + struct nfsd4_sequence *seq) +{ + __be32 *p, status; + + status = nfsd4_decode_sessionid4(argp, &seq->sessionid); + if (status) + return status; + p = xdr_inline_decode(argp->xdr, XDR_UNIT * 4); + if (!p) + return nfserr_bad_xdr; + seq->seqid = be32_to_cpup(p++); + seq->slotid = be32_to_cpup(p++); + seq->maxslots = be32_to_cpup(p++); + seq->cachethis = be32_to_cpup(p); + + return nfs_ok; +} + static __be32 nfsd4_decode_fallocate(struct nfsd4_compoundargs *argp, struct nfsd4_fallocate *fallocate)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit b7a0c8f6e741bf9dee0d24e69d3ce51fa4ccce78 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 61 +++++++++++++++++++---------------------------- 1 file changed, 25 insertions(+), 36 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 3fe0d0228c4ac..9642de1550431 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1738,42 +1738,6 @@ nfsd4_decode_free_stateid(struct nfsd4_compoundargs *argp, return nfsd4_decode_stateid4(argp, &free_stateid->fr_stateid); }
-static __be32 -nfsd4_decode_test_stateid(struct nfsd4_compoundargs *argp, struct nfsd4_test_stateid *test_stateid) -{ - int i; - __be32 *p, status; - struct nfsd4_test_stateid_id *stateid; - - READ_BUF(4); - test_stateid->ts_num_ids = ntohl(*p++); - - INIT_LIST_HEAD(&test_stateid->ts_stateid_list); - - for (i = 0; i < test_stateid->ts_num_ids; i++) { - stateid = svcxdr_tmpalloc(argp, sizeof(*stateid)); - if (!stateid) { - status = nfserrno(-ENOMEM); - goto out; - } - - INIT_LIST_HEAD(&stateid->ts_id_list); - list_add_tail(&stateid->ts_id_list, &test_stateid->ts_stateid_list); - - status = nfsd4_decode_stateid(argp, &stateid->ts_id_stateid); - if (status) - goto out; - } - - status = 0; -out: - return status; -xdr_error: - dprintk("NFSD: xdr error (%s:%d)\n", __FILE__, __LINE__); - status = nfserr_bad_xdr; - goto out; -} - static __be32 nfsd4_decode_destroy_clientid(struct nfsd4_compoundargs *argp, struct nfsd4_destroy_clientid *dc) { DECODE_HEAD; @@ -1919,6 +1883,31 @@ nfsd4_decode_sequence(struct nfsd4_compoundargs *argp, return nfs_ok; }
+static __be32 +nfsd4_decode_test_stateid(struct nfsd4_compoundargs *argp, struct nfsd4_test_stateid *test_stateid) +{ + struct nfsd4_test_stateid_id *stateid; + __be32 status; + u32 i; + + if (xdr_stream_decode_u32(argp->xdr, &test_stateid->ts_num_ids) < 0) + return nfserr_bad_xdr; + + INIT_LIST_HEAD(&test_stateid->ts_stateid_list); + for (i = 0; i < test_stateid->ts_num_ids; i++) { + stateid = svcxdr_tmpalloc(argp, sizeof(*stateid)); + if (!stateid) + return nfserrno(-ENOMEM); /* XXX: not jukebox? */ + INIT_LIST_HEAD(&stateid->ts_id_list); + list_add_tail(&stateid->ts_id_list, &test_stateid->ts_stateid_list); + status = nfsd4_decode_stateid4(argp, &stateid->ts_id_stateid); + if (status) + return status; + } + + return nfs_ok; +} + static __be32 nfsd4_decode_fallocate(struct nfsd4_compoundargs *argp, struct nfsd4_fallocate *fallocate)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit c95f2ec3490586cbb33badc8f4c82d6aa4955078 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 16 ++++++---------- 1 file changed, 6 insertions(+), 10 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 9642de1550431..d0f0b7cd4e74e 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1738,16 +1738,6 @@ nfsd4_decode_free_stateid(struct nfsd4_compoundargs *argp, return nfsd4_decode_stateid4(argp, &free_stateid->fr_stateid); }
-static __be32 nfsd4_decode_destroy_clientid(struct nfsd4_compoundargs *argp, struct nfsd4_destroy_clientid *dc) -{ - DECODE_HEAD; - - READ_BUF(8); - COPYMEM(&dc->clientid, 8); - - DECODE_TAIL; -} - static __be32 nfsd4_decode_reclaim_complete(struct nfsd4_compoundargs *argp, struct nfsd4_reclaim_complete *rc) { DECODE_HEAD; @@ -1908,6 +1898,12 @@ nfsd4_decode_test_stateid(struct nfsd4_compoundargs *argp, struct nfsd4_test_sta return nfs_ok; }
+static __be32 nfsd4_decode_destroy_clientid(struct nfsd4_compoundargs *argp, + struct nfsd4_destroy_clientid *dc) +{ + return nfsd4_decode_clientid4(argp, &dc->clientid); +} + static __be32 nfsd4_decode_fallocate(struct nfsd4_compoundargs *argp, struct nfsd4_fallocate *fallocate)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 0d6467844d437e07db1e76d96176b1a55401018c ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 18 ++++++++---------- 1 file changed, 8 insertions(+), 10 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index d0f0b7cd4e74e..2c684f7e74650 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1738,16 +1738,6 @@ nfsd4_decode_free_stateid(struct nfsd4_compoundargs *argp, return nfsd4_decode_stateid4(argp, &free_stateid->fr_stateid); }
-static __be32 nfsd4_decode_reclaim_complete(struct nfsd4_compoundargs *argp, struct nfsd4_reclaim_complete *rc) -{ - DECODE_HEAD; - - READ_BUF(4); - rc->rca_one_fs = be32_to_cpup(p++); - - DECODE_TAIL; -} - #ifdef CONFIG_NFSD_PNFS static __be32 nfsd4_decode_getdeviceinfo(struct nfsd4_compoundargs *argp, @@ -1904,6 +1894,14 @@ static __be32 nfsd4_decode_destroy_clientid(struct nfsd4_compoundargs *argp, return nfsd4_decode_clientid4(argp, &dc->clientid); }
+static __be32 nfsd4_decode_reclaim_complete(struct nfsd4_compoundargs *argp, + struct nfsd4_reclaim_complete *rc) +{ + if (xdr_stream_decode_bool(argp->xdr, &rc->rca_one_fs) < 0) + return nfserr_bad_xdr; + return nfs_ok; +} + static __be32 nfsd4_decode_fallocate(struct nfsd4_compoundargs *argp, struct nfsd4_fallocate *fallocate)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 6aef27aaeae7611f98af08205acc79f5a8f3aa59 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 2c684f7e74650..c8506bb6d8725 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1906,17 +1906,17 @@ static __be32 nfsd4_decode_fallocate(struct nfsd4_compoundargs *argp, struct nfsd4_fallocate *fallocate) { - DECODE_HEAD; + __be32 status;
- status = nfsd4_decode_stateid(argp, &fallocate->falloc_stateid); + status = nfsd4_decode_stateid4(argp, &fallocate->falloc_stateid); if (status) return status; + if (xdr_stream_decode_u64(argp->xdr, &fallocate->falloc_offset) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u64(argp->xdr, &fallocate->falloc_length) < 0) + return nfserr_bad_xdr;
- READ_BUF(16); - p = xdr_decode_hyper(p, &fallocate->falloc_offset); - xdr_decode_hyper(p, &fallocate->falloc_length); - - DECODE_TAIL; + return nfs_ok; }
static __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit f49e4b4d58cc835d8bd0cc9663f7b9c5497e0e7e ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 34 ++++++++++++++++++++-------------- 1 file changed, 20 insertions(+), 14 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index c8506bb6d8725..05aa36f92a929 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1941,36 +1941,42 @@ nfsd4_decode_clone(struct nfsd4_compoundargs *argp, struct nfsd4_clone *clone) static __be32 nfsd4_decode_nl4_server(struct nfsd4_compoundargs *argp, struct nl4_server *ns) { - DECODE_HEAD; struct nfs42_netaddr *naddr; + __be32 *p;
- READ_BUF(4); - ns->nl4_type = be32_to_cpup(p++); + if (xdr_stream_decode_u32(argp->xdr, &ns->nl4_type) < 0) + return nfserr_bad_xdr;
/* currently support for 1 inter-server source server */ switch (ns->nl4_type) { case NL4_NETADDR: naddr = &ns->u.nl4_addr;
- READ_BUF(4); - naddr->netid_len = be32_to_cpup(p++); + if (xdr_stream_decode_u32(argp->xdr, &naddr->netid_len) < 0) + return nfserr_bad_xdr; if (naddr->netid_len > RPCBIND_MAXNETIDLEN) - goto xdr_error; + return nfserr_bad_xdr;
- READ_BUF(naddr->netid_len + 4); /* 4 for uaddr len */ - COPYMEM(naddr->netid, naddr->netid_len); + p = xdr_inline_decode(argp->xdr, naddr->netid_len); + if (!p) + return nfserr_bad_xdr; + memcpy(naddr->netid, p, naddr->netid_len);
- naddr->addr_len = be32_to_cpup(p++); + if (xdr_stream_decode_u32(argp->xdr, &naddr->addr_len) < 0) + return nfserr_bad_xdr; if (naddr->addr_len > RPCBIND_MAXUADDRLEN) - goto xdr_error; + return nfserr_bad_xdr;
- READ_BUF(naddr->addr_len); - COPYMEM(naddr->addr, naddr->addr_len); + p = xdr_inline_decode(argp->xdr, naddr->addr_len); + if (!p) + return nfserr_bad_xdr; + memcpy(naddr->addr, p, naddr->addr_len); break; default: - goto xdr_error; + return nfserr_bad_xdr; } - DECODE_TAIL; + + return nfs_ok; }
static __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit e8febea7190bcbd1e608093acb67f2a5009556aa ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 41 ++++++++++++++++++++++------------------- fs/nfsd/xdr4.h | 2 +- 2 files changed, 23 insertions(+), 20 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 05aa36f92a929..2529368cbbc0b 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1982,40 +1982,44 @@ static __be32 nfsd4_decode_nl4_server(struct nfsd4_compoundargs *argp, static __be32 nfsd4_decode_copy(struct nfsd4_compoundargs *argp, struct nfsd4_copy *copy) { - DECODE_HEAD; struct nl4_server *ns_dummy; - int i, count; + u32 consecutive, i, count; + __be32 status;
- status = nfsd4_decode_stateid(argp, ©->cp_src_stateid); + status = nfsd4_decode_stateid4(argp, ©->cp_src_stateid); if (status) return status; - status = nfsd4_decode_stateid(argp, ©->cp_dst_stateid); + status = nfsd4_decode_stateid4(argp, ©->cp_dst_stateid); if (status) return status; + if (xdr_stream_decode_u64(argp->xdr, ©->cp_src_pos) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u64(argp->xdr, ©->cp_dst_pos) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u64(argp->xdr, ©->cp_count) < 0) + return nfserr_bad_xdr; + /* ca_consecutive: we always do consecutive copies */ + if (xdr_stream_decode_u32(argp->xdr, &consecutive) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, ©->cp_synchronous) < 0) + return nfserr_bad_xdr;
- READ_BUF(8 + 8 + 8 + 4 + 4 + 4); - p = xdr_decode_hyper(p, ©->cp_src_pos); - p = xdr_decode_hyper(p, ©->cp_dst_pos); - p = xdr_decode_hyper(p, ©->cp_count); - p++; /* ca_consecutive: we always do consecutive copies */ - copy->cp_synchronous = be32_to_cpup(p++); - - count = be32_to_cpup(p++); - + if (xdr_stream_decode_u32(argp->xdr, &count) < 0) + return nfserr_bad_xdr; copy->cp_intra = false; if (count == 0) { /* intra-server copy */ copy->cp_intra = true; - goto intra; + return nfs_ok; }
- /* decode all the supplied server addresses but use first */ + /* decode all the supplied server addresses but use only the first */ status = nfsd4_decode_nl4_server(argp, ©->cp_src); if (status) return status;
ns_dummy = kmalloc(sizeof(struct nl4_server), GFP_KERNEL); if (ns_dummy == NULL) - return nfserrno(-ENOMEM); + return nfserrno(-ENOMEM); /* XXX: jukebox? */ for (i = 0; i < count - 1; i++) { status = nfsd4_decode_nl4_server(argp, ns_dummy); if (status) { @@ -2024,9 +2028,8 @@ nfsd4_decode_copy(struct nfsd4_compoundargs *argp, struct nfsd4_copy *copy) } } kfree(ns_dummy); -intra:
- DECODE_TAIL; + return nfs_ok; }
static __be32 @@ -4792,7 +4795,7 @@ nfsd4_encode_copy(struct nfsd4_compoundres *resp, __be32 nfserr, __be32 *p;
nfserr = nfsd42_encode_write_res(resp, ©->cp_res, - copy->cp_synchronous); + !!copy->cp_synchronous); if (nfserr) return nfserr;
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 232529bc1b798..facc5762bf831 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -554,7 +554,7 @@ struct nfsd4_copy { bool cp_intra;
/* both */ - bool cp_synchronous; + u32 cp_synchronous;
/* response */ struct nfsd42_write_res cp_res;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit f9a953fb369bbd2135ccead3393ec1ef66544471 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 2529368cbbc0b..09aea361c1755 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2032,25 +2032,25 @@ nfsd4_decode_copy(struct nfsd4_compoundargs *argp, struct nfsd4_copy *copy) return nfs_ok; }
-static __be32 -nfsd4_decode_offload_status(struct nfsd4_compoundargs *argp, - struct nfsd4_offload_status *os) -{ - return nfsd4_decode_stateid(argp, &os->stateid); -} - static __be32 nfsd4_decode_copy_notify(struct nfsd4_compoundargs *argp, struct nfsd4_copy_notify *cn) { __be32 status;
- status = nfsd4_decode_stateid(argp, &cn->cpn_src_stateid); + status = nfsd4_decode_stateid4(argp, &cn->cpn_src_stateid); if (status) return status; return nfsd4_decode_nl4_server(argp, &cn->cpn_dst); }
+static __be32 +nfsd4_decode_offload_status(struct nfsd4_compoundargs *argp, + struct nfsd4_offload_status *os) +{ + return nfsd4_decode_stateid(argp, &os->stateid); +} + static __be32 nfsd4_decode_seek(struct nfsd4_compoundargs *argp, struct nfsd4_seek *seek) {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 2846bb0525a73e00b3566fda535ea6a5879e2971 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 09aea361c1755..101beb315d963 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2048,7 +2048,7 @@ static __be32 nfsd4_decode_offload_status(struct nfsd4_compoundargs *argp, struct nfsd4_offload_status *os) { - return nfsd4_decode_stateid(argp, &os->stateid); + return nfsd4_decode_stateid4(argp, &os->stateid); }
static __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 9d32b412fe0a6186cc57789d218e8f8299454ae2 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 101beb315d963..2e2fae7e38db2 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2054,17 +2054,17 @@ nfsd4_decode_offload_status(struct nfsd4_compoundargs *argp, static __be32 nfsd4_decode_seek(struct nfsd4_compoundargs *argp, struct nfsd4_seek *seek) { - DECODE_HEAD; + __be32 status;
- status = nfsd4_decode_stateid(argp, &seek->seek_stateid); + status = nfsd4_decode_stateid4(argp, &seek->seek_stateid); if (status) return status; + if (xdr_stream_decode_u64(argp->xdr, &seek->seek_offset) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &seek->seek_whence) < 0) + return nfserr_bad_xdr;
- READ_BUF(8 + 4); - p = xdr_decode_hyper(p, &seek->seek_offset); - seek->seek_whence = be32_to_cpup(p); - - DECODE_TAIL; + return nfs_ok; }
/*
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 3dfd0b0e15671e2b4047ccb9222432f0b2d930be ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 52 +++++++++++++++++++---------------------------- 1 file changed, 21 insertions(+), 31 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 2e2fae7e38db2..bf2a2ef6a8b97 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -575,18 +575,6 @@ nfsd4_decode_fattr4(struct nfsd4_compoundargs *argp, u32 *bmval, u32 bmlen, return nfs_ok; }
-static __be32 -nfsd4_decode_stateid(struct nfsd4_compoundargs *argp, stateid_t *sid) -{ - DECODE_HEAD; - - READ_BUF(sizeof(stateid_t)); - sid->si_generation = be32_to_cpup(p++); - COPYMEM(&sid->si_opaque, sizeof(stateid_opaque_t)); - - DECODE_TAIL; -} - static __be32 nfsd4_decode_stateid4(struct nfsd4_compoundargs *argp, stateid_t *sid) { @@ -1919,25 +1907,6 @@ nfsd4_decode_fallocate(struct nfsd4_compoundargs *argp, return nfs_ok; }
-static __be32 -nfsd4_decode_clone(struct nfsd4_compoundargs *argp, struct nfsd4_clone *clone) -{ - DECODE_HEAD; - - status = nfsd4_decode_stateid(argp, &clone->cl_src_stateid); - if (status) - return status; - status = nfsd4_decode_stateid(argp, &clone->cl_dst_stateid); - if (status) - return status; - - READ_BUF(8 + 8 + 8); - p = xdr_decode_hyper(p, &clone->cl_src_pos); - p = xdr_decode_hyper(p, &clone->cl_dst_pos); - p = xdr_decode_hyper(p, &clone->cl_count); - DECODE_TAIL; -} - static __be32 nfsd4_decode_nl4_server(struct nfsd4_compoundargs *argp, struct nl4_server *ns) { @@ -2067,6 +2036,27 @@ nfsd4_decode_seek(struct nfsd4_compoundargs *argp, struct nfsd4_seek *seek) return nfs_ok; }
+static __be32 +nfsd4_decode_clone(struct nfsd4_compoundargs *argp, struct nfsd4_clone *clone) +{ + __be32 status; + + status = nfsd4_decode_stateid4(argp, &clone->cl_src_stateid); + if (status) + return status; + status = nfsd4_decode_stateid4(argp, &clone->cl_dst_stateid); + if (status) + return status; + if (xdr_stream_decode_u64(argp->xdr, &clone->cl_src_pos) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u64(argp->xdr, &clone->cl_dst_pos) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u64(argp->xdr, &clone->cl_count) < 0) + return nfserr_bad_xdr; + + return nfs_ok; +} + /* * XDR data that is more than PAGE_SIZE in size is normally part of a * read or write. However, the size of extended attributes is limited
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 830c71502ae0ae1677ac6c08ffbcf85a6e7b2937 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 21 +++++++++------------ 1 file changed, 9 insertions(+), 12 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index bf2a2ef6a8b97..1fcb668e4110d 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2117,25 +2117,22 @@ nfsd4_vbuf_from_vector(struct nfsd4_compoundargs *argp, struct xdr_buf *xdr, static __be32 nfsd4_decode_xattr_name(struct nfsd4_compoundargs *argp, char **namep) { - DECODE_HEAD; char *name, *sp, *dp; u32 namelen, cnt; + __be32 *p;
- READ_BUF(4); - namelen = be32_to_cpup(p++); - + if (xdr_stream_decode_u32(argp->xdr, &namelen) < 0) + return nfserr_bad_xdr; if (namelen > (XATTR_NAME_MAX - XATTR_USER_PREFIX_LEN)) return nfserr_nametoolong; - if (namelen == 0) - goto xdr_error; - - READ_BUF(namelen); - + return nfserr_bad_xdr; + p = xdr_inline_decode(argp->xdr, namelen); + if (!p) + return nfserr_bad_xdr; name = svcxdr_tmpalloc(argp, namelen + XATTR_USER_PREFIX_LEN + 1); if (!name) return nfserr_jukebox; - memcpy(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN);
/* @@ -2148,14 +2145,14 @@ nfsd4_decode_xattr_name(struct nfsd4_compoundargs *argp, char **namep)
while (cnt-- > 0) { if (*sp == '\0') - goto xdr_error; + return nfserr_bad_xdr; *dp++ = *sp++; } *dp = '\0';
*namep = name;
- DECODE_TAIL; + return nfs_ok; }
/*
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 403366a7e8e2930002157525cd44add7fa01bca9 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 1fcb668e4110d..38610764d7161 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2184,11 +2184,11 @@ static __be32 nfsd4_decode_setxattr(struct nfsd4_compoundargs *argp, struct nfsd4_setxattr *setxattr) { - DECODE_HEAD; u32 flags, maxcount, size; + __be32 status;
- READ_BUF(4); - flags = be32_to_cpup(p++); + if (xdr_stream_decode_u32(argp->xdr, &flags) < 0) + return nfserr_bad_xdr;
if (flags > SETXATTR4_REPLACE) return nfserr_inval; @@ -2201,8 +2201,8 @@ nfsd4_decode_setxattr(struct nfsd4_compoundargs *argp, maxcount = svc_max_payload(argp->rqstp); maxcount = min_t(u32, XATTR_SIZE_MAX, maxcount);
- READ_BUF(4); - size = be32_to_cpup(p++); + if (xdr_stream_decode_u32(argp->xdr, &size) < 0) + return nfserr_bad_xdr; if (size > maxcount) return nfserr_xattr2big;
@@ -2211,12 +2211,12 @@ nfsd4_decode_setxattr(struct nfsd4_compoundargs *argp, struct xdr_buf payload;
if (!xdr_stream_subsegment(argp->xdr, &payload, size)) - goto xdr_error; + return nfserr_bad_xdr; status = nfsd4_vbuf_from_vector(argp, &payload, &setxattr->setxa_buf, size); }
- DECODE_TAIL; + return nfs_ok; }
static __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 2212036cadf4da3c4b0e4bd2a9a8c3d78617ab4f ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 38610764d7161..bf8eacab64952 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2223,11 +2223,10 @@ static __be32 nfsd4_decode_listxattrs(struct nfsd4_compoundargs *argp, struct nfsd4_listxattrs *listxattrs) { - DECODE_HEAD; u32 maxcount;
- READ_BUF(12); - p = xdr_decode_hyper(p, &listxattrs->lsxa_cookie); + if (xdr_stream_decode_u64(argp->xdr, &listxattrs->lsxa_cookie) < 0) + return nfserr_bad_xdr;
/* * If the cookie is too large to have even one user.x attribute @@ -2237,7 +2236,8 @@ nfsd4_decode_listxattrs(struct nfsd4_compoundargs *argp, (XATTR_LIST_MAX / (XATTR_USER_PREFIX_LEN + 2))) return nfserr_badcookie;
- maxcount = be32_to_cpup(p++); + if (xdr_stream_decode_u32(argp->xdr, &maxcount) < 0) + return nfserr_bad_xdr; if (maxcount < 8) /* Always need at least 2 words (length and one character) */ return nfserr_inval; @@ -2245,7 +2245,7 @@ nfsd4_decode_listxattrs(struct nfsd4_compoundargs *argp, maxcount = min(maxcount, svc_max_payload(argp->rqstp)); listxattrs->lsxa_maxcount = maxcount;
- DECODE_TAIL; + return nfs_ok; }
static __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 3a237b4af5b7b0e77588e120554077cab3341943 ]
Avoid passing a "pointer to int" argument to xdr_stream_decode_u32.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 2 +- fs/nfsd/nfs4xdr.c | 7 +++---- fs/nfsd/xdr4.h | 2 +- 3 files changed, 5 insertions(+), 6 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index a038d1e182ff3..6b06f0ad05615 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -3272,7 +3272,7 @@ int nfsd4_max_reply(struct svc_rqst *rqstp, struct nfsd4_op *op) void warn_on_nonidempotent_op(struct nfsd4_op *op) { if (OPDESC(op)->op_flags & OP_MODIFIES_SOMETHING) { - pr_err("unable to encode reply to nonidempotent op %d (%s)\n", + pr_err("unable to encode reply to nonidempotent op %u (%s)\n", op->opnum, nfsd4_op_name(op->opnum)); WARN_ON_ONCE(1); } diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index bf8eacab64952..085191b4b3aa5 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2419,9 +2419,8 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp) op = &argp->ops[i]; op->replay = NULL;
- READ_BUF(4); - op->opnum = be32_to_cpup(p++); - + if (xdr_stream_decode_u32(argp->xdr, &op->opnum) < 0) + return nfserr_bad_xdr; if (nfsd4_opnum_in_range(argp, op)) { op->status = nfsd4_dec_ops[op->opnum](argp, &op->u); if (op->status != nfs_ok) @@ -5395,7 +5394,7 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op) if (op->status && opdesc && !(opdesc->op_flags & OP_NONTRIVIAL_ERROR_ENCODE)) goto status; - BUG_ON(op->opnum < 0 || op->opnum >= ARRAY_SIZE(nfsd4_enc_ops) || + BUG_ON(op->opnum >= ARRAY_SIZE(nfsd4_enc_ops) || !nfsd4_enc_ops[op->opnum]); encoder = nfsd4_enc_ops[op->opnum]; op->status = encoder(resp, op->status, &op->u); diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index facc5762bf831..2c31f3a7d7c74 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -615,7 +615,7 @@ struct nfsd4_copy_notify { };
struct nfsd4_op { - int opnum; + u32 opnum; const struct nfsd4_operation * opdesc; __be32 status; union nfsd4_op_u {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit d9b74bdac6f24afc3101b6a5b6f59842610c9c94 ]
And clean-up: Now that we have removed the DECODE_TAIL macro from nfsd4_decode_compound(), we observe that there's no benefit for nfsd4_decode_compound() to return nfs_ok or nfserr_bad_xdr only to have its sole caller convert those values to one or zero, respectively. Have nfsd4_decode_compound() return 1/0 instead.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 69 ++++++++++++++++++++--------------------------- 1 file changed, 29 insertions(+), 40 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 085191b4b3aa5..30604a3e70c0f 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -186,28 +186,6 @@ svcxdr_dupstr(struct nfsd4_compoundargs *argp, void *buf, u32 len) return p; }
-/** - * savemem - duplicate a chunk of memory for later processing - * @argp: NFSv4 compound argument structure to be freed with - * @p: pointer to be duplicated - * @nbytes: length to be duplicated - * - * Returns a pointer to a copy of @nbytes bytes of memory at @p - * that are preserved until processing of the NFSv4 compound - * operation described by @argp finishes. - */ -static char *savemem(struct nfsd4_compoundargs *argp, __be32 *p, int nbytes) -{ - void *ret; - - ret = svcxdr_tmpalloc(argp, nbytes); - if (!ret) - return NULL; - memcpy(ret, p, nbytes); - return ret; -} - - /* * NFSv4 basic data type decoders */ @@ -2372,43 +2350,54 @@ nfsd4_opnum_in_range(struct nfsd4_compoundargs *argp, struct nfsd4_op *op) return true; }
-static __be32 +static int nfsd4_decode_compound(struct nfsd4_compoundargs *argp) { - DECODE_HEAD; struct nfsd4_op *op; bool cachethis = false; int auth_slack= argp->rqstp->rq_auth_slack; int max_reply = auth_slack + 8; /* opcnt, status */ int readcount = 0; int readbytes = 0; + __be32 *p; int i;
- READ_BUF(4); - argp->taglen = be32_to_cpup(p++); - READ_BUF(argp->taglen); - SAVEMEM(argp->tag, argp->taglen); - READ_BUF(8); - argp->minorversion = be32_to_cpup(p++); - argp->opcnt = be32_to_cpup(p++); - max_reply += 4 + (XDR_QUADLEN(argp->taglen) << 2); - - if (argp->taglen > NFSD4_MAX_TAGLEN) - goto xdr_error; + if (xdr_stream_decode_u32(argp->xdr, &argp->taglen) < 0) + return 0; + max_reply += XDR_UNIT; + argp->tag = NULL; + if (unlikely(argp->taglen)) { + if (argp->taglen > NFSD4_MAX_TAGLEN) + return 0; + p = xdr_inline_decode(argp->xdr, argp->taglen); + if (!p) + return 0; + argp->tag = svcxdr_tmpalloc(argp, argp->taglen); + if (!argp->tag) + return 0; + memcpy(argp->tag, p, argp->taglen); + max_reply += xdr_align_size(argp->taglen); + } + + if (xdr_stream_decode_u32(argp->xdr, &argp->minorversion) < 0) + return 0; + if (xdr_stream_decode_u32(argp->xdr, &argp->opcnt) < 0) + return 0; + /* * NFS4ERR_RESOURCE is a more helpful error than GARBAGE_ARGS * here, so we return success at the xdr level so that * nfsd4_proc can handle this is an NFS-level error. */ if (argp->opcnt > NFSD_MAX_OPS_PER_COMPOUND) - return 0; + return 1;
if (argp->opcnt > ARRAY_SIZE(argp->iops)) { argp->ops = kzalloc(argp->opcnt * sizeof(*argp->ops), GFP_KERNEL); if (!argp->ops) { argp->ops = argp->iops; dprintk("nfsd: couldn't allocate room for COMPOUND\n"); - goto xdr_error; + return 0; } }
@@ -2420,7 +2409,7 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp) op->replay = NULL;
if (xdr_stream_decode_u32(argp->xdr, &op->opnum) < 0) - return nfserr_bad_xdr; + return 0; if (nfsd4_opnum_in_range(argp, op)) { op->status = nfsd4_dec_ops[op->opnum](argp, &op->u); if (op->status != nfs_ok) @@ -2467,7 +2456,7 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp) if (readcount > 1 || max_reply > PAGE_SIZE - auth_slack) clear_bit(RQ_SPLICE_OK, &argp->rqstp->rq_flags);
- DECODE_TAIL; + return 1; }
static __be32 *encode_change(__be32 *p, struct kstat *stat, struct inode *inode, @@ -5496,7 +5485,7 @@ nfs4svc_decode_compoundargs(struct svc_rqst *rqstp, __be32 *p) args->ops = args->iops; args->rqstp = rqstp;
- return !nfsd4_decode_compound(args); + return nfsd4_decode_compound(args); }
int
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 5cfc822f3e77b0477e6602d399116130317f537a ]
Now that all the NFSv4 decoder functions have been converted to make direct calls to the xdr helpers, remove the unused C macros.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 40 ---------------------------------------- fs/nfsd/xdr4.h | 9 --------- 2 files changed, 49 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 30604a3e70c0f..315be1c1ab85c 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -102,45 +102,6 @@ check_filename(char *str, int len) return 0; }
-#define DECODE_HEAD \ - __be32 *p; \ - __be32 status -#define DECODE_TAIL \ - status = 0; \ -out: \ - return status; \ -xdr_error: \ - dprintk("NFSD: xdr error (%s:%d)\n", \ - __FILE__, __LINE__); \ - status = nfserr_bad_xdr; \ - goto out - -#define READMEM(x,nbytes) do { \ - x = (char *)p; \ - p += XDR_QUADLEN(nbytes); \ -} while (0) -#define SAVEMEM(x,nbytes) do { \ - if (!(x = (p==argp->tmp || p == argp->tmpp) ? \ - savemem(argp, p, nbytes) : \ - (char *)p)) { \ - dprintk("NFSD: xdr error (%s:%d)\n", \ - __FILE__, __LINE__); \ - goto xdr_error; \ - } \ - p += XDR_QUADLEN(nbytes); \ -} while (0) -#define COPYMEM(x,nbytes) do { \ - memcpy((x), p, nbytes); \ - p += XDR_QUADLEN(nbytes); \ -} while (0) -#define READ_BUF(nbytes) \ - do { \ - p = xdr_inline_decode(argp->xdr,\ - nbytes); \ - if (!p) \ - goto xdr_error; \ - } while (0) - static int zero_clientid(clientid_t *clid) { return (clid->cl_boot == 0) && (clid->cl_id == 0); @@ -5478,7 +5439,6 @@ nfs4svc_decode_compoundargs(struct svc_rqst *rqstp, __be32 *p) struct nfsd4_compoundargs *args = rqstp->rq_argp;
/* svcxdr_tmp_alloc */ - args->tmpp = NULL; args->to_free = NULL;
args->xdr = &rqstp->rq_arg_stream; diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 2c31f3a7d7c74..e12fbe382e3f3 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -386,13 +386,6 @@ struct nfsd4_setclientid_confirm { nfs4_verifier sc_confirm; };
-struct nfsd4_saved_compoundargs { - __be32 *p; - __be32 *end; - int pagelen; - struct page **pagelist; -}; - struct nfsd4_test_stateid_id { __be32 ts_id_status; stateid_t ts_id_stateid; @@ -696,8 +689,6 @@ struct svcxdr_tmpbuf {
struct nfsd4_compoundargs { /* scratch variables for XDR decode */ - __be32 tmp[8]; - __be32 * tmpp; struct xdr_stream *xdr; struct svcxdr_tmpbuf *to_free; struct svc_rqst *rqstp;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit 70b87f77294d16d3e567056ba4c9ee2b091a5b50 ]
inode_query_iversion() can modify i_version. Depending on the exported filesystem, that may not be safe. For example, if you're re-exporting NFS, NFS stores the server's change attribute in i_version and does not expect it to be modified locally. This has been observed causing unnecessary cache invalidations.
The way a filesystem indicates that it's OK to call inode_query_iverson() is by setting SB_I_VERSION.
So, move the I_VERSION check out of encode_change(), where it's used only in GETATTR responses, to nfsd4_change_attribute(), which is also called for pre- and post- operation attributes.
(Note we could also pull the NFSEXP_V4ROOT case into nfsd4_change_attribute() as well. That would actually be a no-op, since pre/post attrs are only used for metadata-modifying operations, and V4ROOT exports are read-only. But we might make the change in the future just for simplicity.)
Reported-by: Daire Byrne daire@dneg.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3xdr.c | 5 ++--- fs/nfsd/nfs4xdr.c | 6 +----- fs/nfsd/nfsfh.h | 14 ++++++++++---- 3 files changed, 13 insertions(+), 12 deletions(-)
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 0d75d201db1b3..5956b0317c55e 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -291,14 +291,13 @@ void fill_post_wcc(struct svc_fh *fhp) printk("nfsd: inode locked twice during operation.\n");
err = fh_getattr(fhp, &fhp->fh_post_attr); - fhp->fh_post_change = nfsd4_change_attribute(&fhp->fh_post_attr, - d_inode(fhp->fh_dentry)); if (err) { fhp->fh_post_saved = false; - /* Grab the ctime anyway - set_change_info might use it */ fhp->fh_post_attr.ctime = d_inode(fhp->fh_dentry)->i_ctime; } else fhp->fh_post_saved = true; + fhp->fh_post_change = nfsd4_change_attribute(&fhp->fh_post_attr, + d_inode(fhp->fh_dentry)); }
/* diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 315be1c1ab85c..bdcfb5f7021da 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2426,12 +2426,8 @@ static __be32 *encode_change(__be32 *p, struct kstat *stat, struct inode *inode, if (exp->ex_flags & NFSEXP_V4ROOT) { *p++ = cpu_to_be32(convert_to_wallclock(exp->cd->flush_time)); *p++ = 0; - } else if (IS_I_VERSION(inode)) { + } else p = xdr_encode_hyper(p, nfsd4_change_attribute(stat, inode)); - } else { - *p++ = cpu_to_be32(stat->ctime.tv_sec); - *p++ = cpu_to_be32(stat->ctime.tv_nsec); - } return p; }
diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h index 56cfbc3615618..39d764b129fa3 100644 --- a/fs/nfsd/nfsfh.h +++ b/fs/nfsd/nfsfh.h @@ -261,10 +261,16 @@ static inline u64 nfsd4_change_attribute(struct kstat *stat, { u64 chattr;
- chattr = stat->ctime.tv_sec; - chattr <<= 30; - chattr += stat->ctime.tv_nsec; - chattr += inode_query_iversion(inode); + if (IS_I_VERSION(inode)) { + chattr = stat->ctime.tv_sec; + chattr <<= 30; + chattr += stat->ctime.tv_nsec; + chattr += inode_query_iversion(inode); + } else { + chattr = stat->ctime.tv_sec; + chattr <<= 32; + chattr += stat->ctime.tv_nsec; + } return chattr; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit b2140338d8dca827ad9e83f3e026e9d51748b265 ]
It doesn't make sense to carry all these extra fields around. Just make everything into change attribute from the start.
This is just cleanup, there should be no change in behavior.
Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 11 ++--------- fs/nfsd/xdr4.h | 11 ----------- 2 files changed, 2 insertions(+), 20 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index bdcfb5f7021da..4df6c75d0eb7f 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2459,15 +2459,8 @@ static __be32 *encode_time_delta(__be32 *p, struct inode *inode) static __be32 *encode_cinfo(__be32 *p, struct nfsd4_change_info *c) { *p++ = cpu_to_be32(c->atomic); - if (c->change_supported) { - p = xdr_encode_hyper(p, c->before_change); - p = xdr_encode_hyper(p, c->after_change); - } else { - *p++ = cpu_to_be32(c->before_ctime_sec); - *p++ = cpu_to_be32(c->before_ctime_nsec); - *p++ = cpu_to_be32(c->after_ctime_sec); - *p++ = cpu_to_be32(c->after_ctime_nsec); - } + p = xdr_encode_hyper(p, c->before_change); + p = xdr_encode_hyper(p, c->after_change); return p; }
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index e12fbe382e3f3..b4556e86e97c3 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -76,12 +76,7 @@ static inline bool nfsd4_has_session(struct nfsd4_compound_state *cs)
struct nfsd4_change_info { u32 atomic; - bool change_supported; - u32 before_ctime_sec; - u32 before_ctime_nsec; u64 before_change; - u32 after_ctime_sec; - u32 after_ctime_nsec; u64 after_change; };
@@ -754,15 +749,9 @@ set_change_info(struct nfsd4_change_info *cinfo, struct svc_fh *fhp) { BUG_ON(!fhp->fh_pre_saved); cinfo->atomic = (u32)fhp->fh_post_saved; - cinfo->change_supported = IS_I_VERSION(d_inode(fhp->fh_dentry));
cinfo->before_change = fhp->fh_pre_change; cinfo->after_change = fhp->fh_post_change; - cinfo->before_ctime_sec = fhp->fh_pre_ctime.tv_sec; - cinfo->before_ctime_nsec = fhp->fh_pre_ctime.tv_nsec; - cinfo->after_ctime_sec = fhp->fh_post_attr.ctime.tv_sec; - cinfo->after_ctime_nsec = fhp->fh_post_attr.ctime.tv_nsec; - }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit 4b03d99794eeed27650597a886247c6427ce1055 ]
Minor cleanup, no change in behavior.
Also pull out a common helper that'll be useful elsewhere.
Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsfh.h | 13 +++++-------- include/linux/iversion.h | 13 +++++++++++++ 2 files changed, 18 insertions(+), 8 deletions(-)
diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h index 39d764b129fa3..45bd776290d52 100644 --- a/fs/nfsd/nfsfh.h +++ b/fs/nfsd/nfsfh.h @@ -259,19 +259,16 @@ fh_clear_wcc(struct svc_fh *fhp) static inline u64 nfsd4_change_attribute(struct kstat *stat, struct inode *inode) { - u64 chattr; - if (IS_I_VERSION(inode)) { + u64 chattr; + chattr = stat->ctime.tv_sec; chattr <<= 30; chattr += stat->ctime.tv_nsec; chattr += inode_query_iversion(inode); - } else { - chattr = stat->ctime.tv_sec; - chattr <<= 32; - chattr += stat->ctime.tv_nsec; - } - return chattr; + return chattr; + } else + return time_to_chattr(&stat->ctime); }
extern void fill_pre_wcc(struct svc_fh *fhp); diff --git a/include/linux/iversion.h b/include/linux/iversion.h index 2917ef990d435..3bfebde5a1a6d 100644 --- a/include/linux/iversion.h +++ b/include/linux/iversion.h @@ -328,6 +328,19 @@ inode_query_iversion(struct inode *inode) return cur >> I_VERSION_QUERIED_SHIFT; }
+/* + * For filesystems without any sort of change attribute, the best we can + * do is fake one up from the ctime: + */ +static inline u64 time_to_chattr(struct timespec64 *t) +{ + u64 chattr = t->tv_sec; + + chattr <<= 32; + chattr += t->tv_nsec; + return chattr; +} + /** * inode_eq_iversion_raw - check whether the raw i_version counter has changed * @inode: inode to check
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit 942b20dc245590327ee0187c15c78174cd96dd52 ]
inode_query_iversion() has side effects, and there's no point calling it when we're not even going to use it.
We check whether we're currently processing a v4 request by checking fh_maxsize, which is arguably a little hacky; we could add a flag to svc_fh instead.
Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3xdr.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 5956b0317c55e..7d44e10a5f5dd 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -259,11 +259,11 @@ void fill_pre_wcc(struct svc_fh *fhp) { struct inode *inode; struct kstat stat; + bool v4 = (fhp->fh_maxsize == NFS4_FHSIZE); __be32 err;
if (fhp->fh_pre_saved) return; - inode = d_inode(fhp->fh_dentry); err = fh_getattr(fhp, &stat); if (err) { @@ -272,11 +272,12 @@ void fill_pre_wcc(struct svc_fh *fhp) stat.ctime = inode->i_ctime; stat.size = inode->i_size; } + if (v4) + fhp->fh_pre_change = nfsd4_change_attribute(&stat, inode);
fhp->fh_pre_mtime = stat.mtime; fhp->fh_pre_ctime = stat.ctime; fhp->fh_pre_size = stat.size; - fhp->fh_pre_change = nfsd4_change_attribute(&stat, inode); fhp->fh_pre_saved = true; }
@@ -285,6 +286,8 @@ void fill_pre_wcc(struct svc_fh *fhp) */ void fill_post_wcc(struct svc_fh *fhp) { + bool v4 = (fhp->fh_maxsize == NFS4_FHSIZE); + struct inode *inode = d_inode(fhp->fh_dentry); __be32 err;
if (fhp->fh_post_saved) @@ -293,11 +296,12 @@ void fill_post_wcc(struct svc_fh *fhp) err = fh_getattr(fhp, &fhp->fh_post_attr); if (err) { fhp->fh_post_saved = false; - fhp->fh_post_attr.ctime = d_inode(fhp->fh_dentry)->i_ctime; + fhp->fh_post_attr.ctime = inode->i_ctime; } else fhp->fh_post_saved = true; - fhp->fh_post_change = nfsd4_change_attribute(&fhp->fh_post_attr, - d_inode(fhp->fh_dentry)); + if (v4) + fhp->fh_post_change = + nfsd4_change_attribute(&fhp->fh_post_attr, inode); }
/*
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
This reverts commit a85857633b04d57f4524cca0a2bfaf87b2543f9f.
We're still factoring ctime into our change attribute even in the IS_I_VERSION case. If someone sets the system time backwards, a client could see the change attribute go backwards. Maybe we can just say "well, don't do that", but there's some question whether that's good enough, or whether we need a better guarantee.
Also, the client still isn't actually using the attribute.
While we're still figuring this out, let's just stop returning this attribute.
Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 10 ---------- fs/nfsd/nfsd.h | 1 - include/linux/nfs4.h | 8 -------- 3 files changed, 19 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 4df6c75d0eb7f..bb037e8eb8304 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -3311,16 +3311,6 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, goto out; }
- if (bmval2 & FATTR4_WORD2_CHANGE_ATTR_TYPE) { - p = xdr_reserve_space(xdr, 4); - if (!p) - goto out_resource; - if (IS_I_VERSION(d_inode(dentry))) - *p++ = cpu_to_be32(NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR); - else - *p++ = cpu_to_be32(NFS4_CHANGE_TYPE_IS_TIME_METADATA); - } - #ifdef CONFIG_NFSD_V4_SECURITY_LABEL if (bmval2 & FATTR4_WORD2_SECURITY_LABEL) { status = nfsd4_encode_security_label(xdr, rqstp, context, diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 3eaa81e001f9c..2326428e2c5bf 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -394,7 +394,6 @@ void nfsd_lockd_shutdown(void);
#define NFSD4_2_SUPPORTED_ATTRS_WORD2 \ (NFSD4_1_SUPPORTED_ATTRS_WORD2 | \ - FATTR4_WORD2_CHANGE_ATTR_TYPE | \ FATTR4_WORD2_MODE_UMASK | \ NFSD4_2_SECURITY_ATTRS | \ FATTR4_WORD2_XATTR_SUPPORT) diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h index 9dc7eeac924f0..5b4c67c91f56a 100644 --- a/include/linux/nfs4.h +++ b/include/linux/nfs4.h @@ -385,13 +385,6 @@ enum lock_type4 { NFS4_WRITEW_LT = 4 };
-enum change_attr_type4 { - NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR = 0, - NFS4_CHANGE_TYPE_IS_VERSION_COUNTER = 1, - NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS = 2, - NFS4_CHANGE_TYPE_IS_TIME_METADATA = 3, - NFS4_CHANGE_TYPE_IS_UNDEFINED = 4 -};
/* Mandatory Attributes */ #define FATTR4_WORD0_SUPPORTED_ATTRS (1UL << 0) @@ -459,7 +452,6 @@ enum change_attr_type4 { #define FATTR4_WORD2_LAYOUT_BLKSIZE (1UL << 1) #define FATTR4_WORD2_MDSTHRESHOLD (1UL << 4) #define FATTR4_WORD2_CLONE_BLKSIZE (1UL << 13) -#define FATTR4_WORD2_CHANGE_ATTR_TYPE (1UL << 15) #define FATTR4_WORD2_SECURITY_LABEL (1UL << 16) #define FATTR4_WORD2_MODE_UMASK (1UL << 17) #define FATTR4_WORD2_XATTR_SUPPORT (1UL << 18)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jeff.layton@primarydata.com
[ Upstream commit daab110e47f8d7aa6da66923e3ac1a8dbd2b2a72 ]
With NFSv3 nfsd will always attempt to send along WCC data to the client. This generally involves saving off the in-core inode information prior to doing the operation on the given filehandle, and then issuing a vfs_getattr to it after the op.
Some filesystems (particularly clustered or networked ones) have an expensive ->getattr inode operation. Atomicity is also often difficult or impossible to guarantee on such filesystems. For those, we're best off not trying to provide WCC information to the client at all, and to simply allow it to poll for that information as needed with a GETATTR RPC.
This patch adds a new flags field to struct export_operations, and defines a new EXPORT_OP_NOWCC flag that filesystems can use to indicate that nfsd should not attempt to provide WCC info in NFSv3 replies. It also adds a blurb about the new flags field and flag to the exporting documentation.
The server will also now skip collecting this information for NFSv2 as well, since that info is never used there anyway.
Note that this patch does not add this flag to any filesystem export_operations structures. This was originally developed to allow reexporting nfs via nfsd.
Other filesystems may want to consider enabling this flag too. It's hard to tell however which ones have export operations to enable export via knfsd and which ones mostly rely on them for open-by-filehandle support, so I'm leaving that up to the individual maintainers to decide. I am cc'ing the relevant lists for those filesystems that I think may want to consider adding this though.
Cc: HPDD-discuss@lists.01.org Cc: ceph-devel@vger.kernel.org Cc: cluster-devel@redhat.com Cc: fuse-devel@lists.sourceforge.net Cc: ocfs2-devel@oss.oracle.com Signed-off-by: Jeff Layton jeff.layton@primarydata.com Signed-off-by: Lance Shelton lance.shelton@hammerspace.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/filesystems/nfs/exporting.rst | 27 +++++++++++++++++++++ fs/nfs/export.c | 1 + fs/nfsd/nfs3xdr.c | 7 ++++-- fs/nfsd/nfsfh.c | 14 +++++++++++ fs/nfsd/nfsfh.h | 2 +- include/linux/exportfs.h | 2 ++ 6 files changed, 50 insertions(+), 3 deletions(-)
diff --git a/Documentation/filesystems/nfs/exporting.rst b/Documentation/filesystems/nfs/exporting.rst index 33d588a01ace1..cbe542ad52333 100644 --- a/Documentation/filesystems/nfs/exporting.rst +++ b/Documentation/filesystems/nfs/exporting.rst @@ -154,6 +154,11 @@ struct which has the following members: to find potential names, and matches inode numbers to find the correct match.
+ flags + Some filesystems may need to be handled differently than others. The + export_operations struct also includes a flags field that allows the + filesystem to communicate such information to nfsd. See the Export + Operations Flags section below for more explanation.
A filehandle fragment consists of an array of 1 or more 4byte words, together with a one byte "type". @@ -163,3 +168,25 @@ generated by encode_fh, in which case it will have been padded with nuls. Rather, the encode_fh routine should choose a "type" which indicates the decode_fh how much of the filehandle is valid, and how it should be interpreted. + +Export Operations Flags +----------------------- +In addition to the operation vector pointers, struct export_operations also +contains a "flags" field that allows the filesystem to communicate to nfsd +that it may want to do things differently when dealing with it. The +following flags are defined: + + EXPORT_OP_NOWCC - disable NFSv3 WCC attributes on this filesystem + RFC 1813 recommends that servers always send weak cache consistency + (WCC) data to the client after each operation. The server should + atomically collect attributes about the inode, do an operation on it, + and then collect the attributes afterward. This allows the client to + skip issuing GETATTRs in some situations but means that the server + is calling vfs_getattr for almost all RPCs. On some filesystems + (particularly those that are clustered or networked) this is expensive + and atomicity is difficult to guarantee. This flag indicates to nfsd + that it should skip providing WCC attributes to the client in NFSv3 + replies when doing operations on this filesystem. Consider enabling + this on filesystems that have an expensive ->getattr inode operation, + or when atomicity between pre and post operation attribute collection + is impossible to guarantee. diff --git a/fs/nfs/export.c b/fs/nfs/export.c index 3430d6891e89f..8f4c528865c57 100644 --- a/fs/nfs/export.c +++ b/fs/nfs/export.c @@ -171,4 +171,5 @@ const struct export_operations nfs_export_ops = { .encode_fh = nfs_encode_fh, .fh_to_dentry = nfs_fh_to_dentry, .get_parent = nfs_get_parent, + .flags = EXPORT_OP_NOWCC, }; diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 7d44e10a5f5dd..34b880211e5ea 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -206,7 +206,7 @@ static __be32 * encode_post_op_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp) { struct dentry *dentry = fhp->fh_dentry; - if (dentry && d_really_is_positive(dentry)) { + if (!fhp->fh_no_wcc && dentry && d_really_is_positive(dentry)) { __be32 err; struct kstat stat;
@@ -262,7 +262,7 @@ void fill_pre_wcc(struct svc_fh *fhp) bool v4 = (fhp->fh_maxsize == NFS4_FHSIZE); __be32 err;
- if (fhp->fh_pre_saved) + if (fhp->fh_no_wcc || fhp->fh_pre_saved) return; inode = d_inode(fhp->fh_dentry); err = fh_getattr(fhp, &stat); @@ -290,6 +290,9 @@ void fill_post_wcc(struct svc_fh *fhp) struct inode *inode = d_inode(fhp->fh_dentry); __be32 err;
+ if (fhp->fh_no_wcc) + return; + if (fhp->fh_post_saved) printk("nfsd: inode locked twice during operation.\n");
diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c index c81dbbad87920..9c29a523f4848 100644 --- a/fs/nfsd/nfsfh.c +++ b/fs/nfsd/nfsfh.c @@ -291,6 +291,16 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
fhp->fh_dentry = dentry; fhp->fh_export = exp; + + switch (rqstp->rq_vers) { + case 3: + if (dentry->d_sb->s_export_op->flags & EXPORT_OP_NOWCC) + fhp->fh_no_wcc = true; + break; + case 2: + fhp->fh_no_wcc = true; + } + return 0; out: exp_put(exp); @@ -559,6 +569,9 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry, */ set_version_and_fsid_type(fhp, exp, ref_fh);
+ /* If we have a ref_fh, then copy the fh_no_wcc setting from it. */ + fhp->fh_no_wcc = ref_fh ? ref_fh->fh_no_wcc : false; + if (ref_fh == fhp) fh_put(ref_fh);
@@ -662,6 +675,7 @@ fh_put(struct svc_fh *fhp) exp_put(exp); fhp->fh_export = NULL; } + fhp->fh_no_wcc = false; return; }
diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h index 45bd776290d52..347d10aa62655 100644 --- a/fs/nfsd/nfsfh.h +++ b/fs/nfsd/nfsfh.h @@ -35,6 +35,7 @@ typedef struct svc_fh {
bool fh_locked; /* inode locked by us */ bool fh_want_write; /* remount protection taken */ + bool fh_no_wcc; /* no wcc data needed */ int fh_flags; /* FH flags */ #ifdef CONFIG_NFSD_V3 bool fh_post_saved; /* post-op attrs saved */ @@ -54,7 +55,6 @@ typedef struct svc_fh { struct kstat fh_post_attr; /* full attrs after operation */ u64 fh_post_change; /* nfsv4 change; see above */ #endif /* CONFIG_NFSD_V3 */ - } svc_fh; #define NFSD4_FH_FOREIGN (1<<0) #define SET_FH_FLAG(c, f) ((c)->fh_flags |= (f)) diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h index 3ceb72b67a7aa..e7de0103a32e8 100644 --- a/include/linux/exportfs.h +++ b/include/linux/exportfs.h @@ -213,6 +213,8 @@ struct export_operations { bool write, u32 *device_generation); int (*commit_blocks)(struct inode *inode, struct iomap *iomaps, int nr_iomaps, struct iattr *iattr); +#define EXPORT_OP_NOWCC (0x1) /* Don't collect wcc data for NFSv3 replies */ + unsigned long flags; };
extern int exportfs_encode_inode_fh(struct inode *inode, struct fid *fid,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jeff.layton@primarydata.com
[ Upstream commit ba5e8187c55555519ae0b63c0fb681391bc42af9 ]
When we start allowing NFS to be reexported, then we have some problems when it comes to subtree checking. In principle, we could allow it, but it would mean encoding parent info in the filehandles and there may not be enough space for that in a NFSv3 filehandle.
To enforce this at export upcall time, we add a new export_ops flag that declares the filesystem ineligible for subtree checking.
Signed-off-by: Jeff Layton jeff.layton@primarydata.com Signed-off-by: Lance Shelton lance.shelton@hammerspace.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/filesystems/nfs/exporting.rst | 12 ++++++++++++ fs/nfs/export.c | 2 +- fs/nfsd/export.c | 6 ++++++ include/linux/exportfs.h | 1 + 4 files changed, 20 insertions(+), 1 deletion(-)
diff --git a/Documentation/filesystems/nfs/exporting.rst b/Documentation/filesystems/nfs/exporting.rst index cbe542ad52333..960be64446cb9 100644 --- a/Documentation/filesystems/nfs/exporting.rst +++ b/Documentation/filesystems/nfs/exporting.rst @@ -190,3 +190,15 @@ following flags are defined: this on filesystems that have an expensive ->getattr inode operation, or when atomicity between pre and post operation attribute collection is impossible to guarantee. + + EXPORT_OP_NOSUBTREECHK - disallow subtree checking on this fs + Many NFS operations deal with filehandles, which the server must then + vet to ensure that they live inside of an exported tree. When the + export consists of an entire filesystem, this is trivial. nfsd can just + ensure that the filehandle live on the filesystem. When only part of a + filesystem is exported however, then nfsd must walk the ancestors of the + inode to ensure that it's within an exported subtree. This is an + expensive operation and not all filesystems can support it properly. + This flag exempts the filesystem from subtree checking and causes + exportfs to get back an error if it tries to enable subtree checking + on it. diff --git a/fs/nfs/export.c b/fs/nfs/export.c index 8f4c528865c57..b9ba306bf9120 100644 --- a/fs/nfs/export.c +++ b/fs/nfs/export.c @@ -171,5 +171,5 @@ const struct export_operations nfs_export_ops = { .encode_fh = nfs_encode_fh, .fh_to_dentry = nfs_fh_to_dentry, .get_parent = nfs_get_parent, - .flags = EXPORT_OP_NOWCC, + .flags = EXPORT_OP_NOWCC|EXPORT_OP_NOSUBTREECHK, }; diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c index 21e404e7cb68c..81e7bb12aca69 100644 --- a/fs/nfsd/export.c +++ b/fs/nfsd/export.c @@ -408,6 +408,12 @@ static int check_export(struct inode *inode, int *flags, unsigned char *uuid) return -EINVAL; }
+ if (inode->i_sb->s_export_op->flags & EXPORT_OP_NOSUBTREECHK && + !(*flags & NFSEXP_NOSUBTREECHECK)) { + dprintk("%s: %s does not support subtree checking!\n", + __func__, inode->i_sb->s_type->name); + return -EINVAL; + } return 0;
} diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h index e7de0103a32e8..2fcbab0f6b612 100644 --- a/include/linux/exportfs.h +++ b/include/linux/exportfs.h @@ -214,6 +214,7 @@ struct export_operations { int (*commit_blocks)(struct inode *inode, struct iomap *iomaps, int nr_iomaps, struct iattr *iattr); #define EXPORT_OP_NOWCC (0x1) /* Don't collect wcc data for NFSv3 replies */ +#define EXPORT_OP_NOSUBTREECHK (0x2) /* Subtree checking is not supported! */ unsigned long flags; };
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jeff.layton@primarydata.com
[ Upstream commit 7f84b488f9add1d5cca3e6197c95914c7bd3c1cf ]
It's not uncommon for some workloads to do a bunch of I/O to a file and delete it just afterward. If knfsd has a cached open file however, then the file may still be open when the dentry is unlinked. If the underlying filesystem is nfs, then that could trigger it to do a sillyrename.
On a REMOVE or RENAME scan the nfsd_file cache for open files that correspond to the inode, and proactively unhash and put their references. This should prevent any delete-on-last-close activity from occurring, solely due to knfsd's open file cache.
This must be done synchronously though so we use the variants that call flush_delayed_fput. There are deadlock possibilities if you call flush_delayed_fput while holding locks, however. In the case of nfsd_rename, we don't even do the lookups of the dentries to be renamed until we've locked for rename.
Once we've figured out what the target dentry is for a rename, check to see whether there are cached open files associated with it. If there are, then unwind all of the locking, close them all, and then reattempt the rename.
None of this is really necessary for "typical" filesystems though. It's mostly of use for NFS, so declare a new export op flag and use that to determine whether to close the files beforehand.
Signed-off-by: Jeff Layton jeff.layton@primarydata.com Signed-off-by: Lance Shelton lance.shelton@hammerspace.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com [ cel: adjusted to apply to 5.10.y ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/filesystems/nfs/exporting.rst | 13 +++++++++++++ fs/nfs/export.c | 2 +- fs/nfsd/vfs.c | 16 +++++++++------- include/linux/exportfs.h | 5 +++-- 4 files changed, 26 insertions(+), 10 deletions(-)
diff --git a/Documentation/filesystems/nfs/exporting.rst b/Documentation/filesystems/nfs/exporting.rst index 960be64446cb9..0e98edd353b5f 100644 --- a/Documentation/filesystems/nfs/exporting.rst +++ b/Documentation/filesystems/nfs/exporting.rst @@ -202,3 +202,16 @@ following flags are defined: This flag exempts the filesystem from subtree checking and causes exportfs to get back an error if it tries to enable subtree checking on it. + + EXPORT_OP_CLOSE_BEFORE_UNLINK - always close cached files before unlinking + On some exportable filesystems (such as NFS) unlinking a file that + is still open can cause a fair bit of extra work. For instance, + the NFS client will do a "sillyrename" to ensure that the file + sticks around while it's still open. When reexporting, that open + file is held by nfsd so we usually end up doing a sillyrename, and + then immediately deleting the sillyrenamed file just afterward when + the link count actually goes to zero. Sometimes this delete can race + with other operations (for instance an rmdir of the parent directory). + This flag causes nfsd to close any open files for this inode _before_ + calling into the vfs to do an unlink or a rename that would replace + an existing file. diff --git a/fs/nfs/export.c b/fs/nfs/export.c index b9ba306bf9120..5428713af5fee 100644 --- a/fs/nfs/export.c +++ b/fs/nfs/export.c @@ -171,5 +171,5 @@ const struct export_operations nfs_export_ops = { .encode_fh = nfs_encode_fh, .fh_to_dentry = nfs_fh_to_dentry, .get_parent = nfs_get_parent, - .flags = EXPORT_OP_NOWCC|EXPORT_OP_NOSUBTREECHK, + .flags = EXPORT_OP_NOWCC|EXPORT_OP_NOSUBTREECHK|EXPORT_OP_CLOSE_BEFORE_UNLINK, }; diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 31edb883afd0d..fb4e6c57ce0bb 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1739,7 +1739,7 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen, struct inode *fdir, *tdir; __be32 err; int host_err; - bool has_cached = false; + bool close_cached = false;
err = fh_verify(rqstp, ffhp, S_IFDIR, NFSD_MAY_REMOVE); if (err) @@ -1798,8 +1798,9 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen, if (ndentry == trap) goto out_dput_new;
- if (nfsd_has_cached_files(ndentry)) { - has_cached = true; + if ((ndentry->d_sb->s_export_op->flags & EXPORT_OP_CLOSE_BEFORE_UNLINK) && + nfsd_has_cached_files(ndentry)) { + close_cached = true; goto out_dput_old; } else { host_err = vfs_rename(fdir, odentry, tdir, ndentry, NULL, 0); @@ -1820,7 +1821,7 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen, * as that would do the wrong thing if the two directories * were the same, so again we do it by hand. */ - if (!has_cached) { + if (!close_cached) { fill_post_wcc(ffhp); fill_post_wcc(tfhp); } @@ -1834,8 +1835,8 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen, * shouldn't be done with locks held however, so we delay it until this * point and then reattempt the whole shebang. */ - if (has_cached) { - has_cached = false; + if (close_cached) { + close_cached = false; nfsd_close_cached_files(ndentry); dput(ndentry); goto retry; @@ -1887,7 +1888,8 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type, type = d_inode(rdentry)->i_mode & S_IFMT;
if (type != S_IFDIR) { - nfsd_close_cached_files(rdentry); + if (rdentry->d_sb->s_export_op->flags & EXPORT_OP_CLOSE_BEFORE_UNLINK) + nfsd_close_cached_files(rdentry); host_err = vfs_unlink(dirp, rdentry, NULL); } else { host_err = vfs_rmdir(dirp, rdentry); diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h index 2fcbab0f6b612..d829403ffd3bb 100644 --- a/include/linux/exportfs.h +++ b/include/linux/exportfs.h @@ -213,8 +213,9 @@ struct export_operations { bool write, u32 *device_generation); int (*commit_blocks)(struct inode *inode, struct iomap *iomaps, int nr_iomaps, struct iattr *iattr); -#define EXPORT_OP_NOWCC (0x1) /* Don't collect wcc data for NFSv3 replies */ -#define EXPORT_OP_NOSUBTREECHK (0x2) /* Subtree checking is not supported! */ +#define EXPORT_OP_NOWCC (0x1) /* don't collect v3 wcc data */ +#define EXPORT_OP_NOSUBTREECHK (0x2) /* no subtree checking */ +#define EXPORT_OP_CLOSE_BEFORE_UNLINK (0x4) /* close files before unlink */ unsigned long flags; };
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit d045465fc6cbfa4acfb5a7d817a7c1a57a078109 ]
In order to allow nfsd to accept return values that are not acceptable to overlayfs and others, add a new function.
Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/exportfs/expfs.c | 32 ++++++++++++++++++++++++-------- include/linux/exportfs.h | 5 +++++ 2 files changed, 29 insertions(+), 8 deletions(-)
diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c index 2dd55b172d57f..0106eba46d5af 100644 --- a/fs/exportfs/expfs.c +++ b/fs/exportfs/expfs.c @@ -417,9 +417,11 @@ int exportfs_encode_fh(struct dentry *dentry, struct fid *fid, int *max_len, } EXPORT_SYMBOL_GPL(exportfs_encode_fh);
-struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid, - int fh_len, int fileid_type, - int (*acceptable)(void *, struct dentry *), void *context) +struct dentry * +exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len, + int fileid_type, + int (*acceptable)(void *, struct dentry *), + void *context) { const struct export_operations *nop = mnt->mnt_sb->s_export_op; struct dentry *result, *alias; @@ -432,10 +434,8 @@ struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid, if (!nop || !nop->fh_to_dentry) return ERR_PTR(-ESTALE); result = nop->fh_to_dentry(mnt->mnt_sb, fid, fh_len, fileid_type); - if (PTR_ERR(result) == -ENOMEM) - return ERR_CAST(result); if (IS_ERR_OR_NULL(result)) - return ERR_PTR(-ESTALE); + return result;
/* * If no acceptance criteria was specified by caller, a disconnected @@ -561,10 +561,26 @@ struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid,
err_result: dput(result); - if (err != -ENOMEM) - err = -ESTALE; return ERR_PTR(err); } +EXPORT_SYMBOL_GPL(exportfs_decode_fh_raw); + +struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid, + int fh_len, int fileid_type, + int (*acceptable)(void *, struct dentry *), + void *context) +{ + struct dentry *ret; + + ret = exportfs_decode_fh_raw(mnt, fid, fh_len, fileid_type, + acceptable, context); + if (IS_ERR_OR_NULL(ret)) { + if (ret == ERR_PTR(-ENOMEM)) + return ret; + return ERR_PTR(-ESTALE); + } + return ret; +} EXPORT_SYMBOL_GPL(exportfs_decode_fh);
MODULE_LICENSE("GPL"); diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h index d829403ffd3bb..846df3c96730f 100644 --- a/include/linux/exportfs.h +++ b/include/linux/exportfs.h @@ -223,6 +223,11 @@ extern int exportfs_encode_inode_fh(struct inode *inode, struct fid *fid, int *max_len, struct inode *parent); extern int exportfs_encode_fh(struct dentry *dentry, struct fid *fid, int *max_len, int connectable); +extern struct dentry *exportfs_decode_fh_raw(struct vfsmount *mnt, + struct fid *fid, int fh_len, + int fileid_type, + int (*acceptable)(void *, struct dentry *), + void *context); extern struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid, int fh_len, int fileid_type, int (*acceptable)(void *, struct dentry *), void *context);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit 2e19d10c1438241de32467637a2a411971547991 ]
If the underlying filesystem times out, then we want knfsd to return NFSERR_JUKEBOX/DELAY rather than NFSERR_STALE.
Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsfh.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c index 9c29a523f4848..e80a7525561d0 100644 --- a/fs/nfsd/nfsfh.c +++ b/fs/nfsd/nfsfh.c @@ -268,12 +268,20 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp) if (fileid_type == FILEID_ROOT) dentry = dget(exp->ex_path.dentry); else { - dentry = exportfs_decode_fh(exp->ex_path.mnt, fid, - data_left, fileid_type, - nfsd_acceptable, exp); - if (IS_ERR_OR_NULL(dentry)) + dentry = exportfs_decode_fh_raw(exp->ex_path.mnt, fid, + data_left, fileid_type, + nfsd_acceptable, exp); + if (IS_ERR_OR_NULL(dentry)) { trace_nfsd_set_fh_dentry_badhandle(rqstp, fhp, dentry ? PTR_ERR(dentry) : -ESTALE); + switch (PTR_ERR(dentry)) { + case -ENOMEM: + case -ETIMEDOUT: + break; + default: + dentry = ERR_PTR(-ESTALE); + } + } } if (dentry == NULL) goto out;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit 01cbf3853959feec40ec9b9a399e12a021cd4d81 ]
Don't set PF_LOCAL_THROTTLE on remote filesystems like NFS, since they aren't expected to ever be subject to double buffering.
Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/export.c | 3 ++- fs/nfsd/vfs.c | 13 +++++++++++-- include/linux/exportfs.h | 1 + 3 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/fs/nfs/export.c b/fs/nfs/export.c index 5428713af5fee..48b879cfe6e3b 100644 --- a/fs/nfs/export.c +++ b/fs/nfs/export.c @@ -171,5 +171,6 @@ const struct export_operations nfs_export_ops = { .encode_fh = nfs_encode_fh, .fh_to_dentry = nfs_fh_to_dentry, .get_parent = nfs_get_parent, - .flags = EXPORT_OP_NOWCC|EXPORT_OP_NOSUBTREECHK|EXPORT_OP_CLOSE_BEFORE_UNLINK, + .flags = EXPORT_OP_NOWCC|EXPORT_OP_NOSUBTREECHK| + EXPORT_OP_CLOSE_BEFORE_UNLINK|EXPORT_OP_REMOTE_FS, }; diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index fb4e6c57ce0bb..a515cbd0a7d8f 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -986,6 +986,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf, __be32 *verf) { struct file *file = nf->nf_file; + struct super_block *sb = file_inode(file)->i_sb; struct svc_export *exp; struct iov_iter iter; errseq_t since; @@ -993,12 +994,18 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf, int host_err; int use_wgather; loff_t pos = offset; + unsigned long exp_op_flags = 0; unsigned int pflags = current->flags; rwf_t flags = 0; + bool restore_flags = false;
trace_nfsd_write_opened(rqstp, fhp, offset, *cnt);
- if (test_bit(RQ_LOCAL, &rqstp->rq_flags)) + if (sb->s_export_op) + exp_op_flags = sb->s_export_op->flags; + + if (test_bit(RQ_LOCAL, &rqstp->rq_flags) && + !(exp_op_flags & EXPORT_OP_REMOTE_FS)) { /* * We want throttling in balance_dirty_pages() * and shrink_inactive_list() to only consider @@ -1007,6 +1014,8 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf, * the client's dirty pages or its congested queue. */ current->flags |= PF_LOCAL_THROTTLE; + restore_flags = true; + }
exp = fhp->fh_export; use_wgather = (rqstp->rq_vers == 2) && EX_WGATHER(exp); @@ -1062,7 +1071,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf, trace_nfsd_write_err(rqstp, fhp, offset, host_err); nfserr = nfserrno(host_err); } - if (test_bit(RQ_LOCAL, &rqstp->rq_flags)) + if (restore_flags) current_restore_flags(pflags, PF_LOCAL_THROTTLE); return nfserr; } diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h index 846df3c96730f..d93e8a6737bb0 100644 --- a/include/linux/exportfs.h +++ b/include/linux/exportfs.h @@ -216,6 +216,7 @@ struct export_operations { #define EXPORT_OP_NOWCC (0x1) /* don't collect v3 wcc data */ #define EXPORT_OP_NOSUBTREECHK (0x2) /* no subtree checking */ #define EXPORT_OP_CLOSE_BEFORE_UNLINK (0x4) /* close files before unlink */ +#define EXPORT_OP_REMOTE_FS (0x8) /* Filesystem is remote */ unsigned long flags; };
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit 716a8bc7f706eeef80ab42c99d9f210eda845c81 ]
For the case of NFSv4, specify to the client that the pre/post-op attributes were not recorded atomically with the main operation.
Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/export.c | 3 ++- fs/nfsd/nfsfh.c | 4 ++++ fs/nfsd/nfsfh.h | 5 +++++ fs/nfsd/xdr4.h | 2 +- include/linux/exportfs.h | 3 +++ 5 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/fs/nfs/export.c b/fs/nfs/export.c index 48b879cfe6e3b..7412bb164fa77 100644 --- a/fs/nfs/export.c +++ b/fs/nfs/export.c @@ -172,5 +172,6 @@ const struct export_operations nfs_export_ops = { .fh_to_dentry = nfs_fh_to_dentry, .get_parent = nfs_get_parent, .flags = EXPORT_OP_NOWCC|EXPORT_OP_NOSUBTREECHK| - EXPORT_OP_CLOSE_BEFORE_UNLINK|EXPORT_OP_REMOTE_FS, + EXPORT_OP_CLOSE_BEFORE_UNLINK|EXPORT_OP_REMOTE_FS| + EXPORT_OP_NOATOMIC_ATTR, }; diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c index e80a7525561d0..66f2ef67792a7 100644 --- a/fs/nfsd/nfsfh.c +++ b/fs/nfsd/nfsfh.c @@ -301,6 +301,10 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp) fhp->fh_export = exp;
switch (rqstp->rq_vers) { + case 4: + if (dentry->d_sb->s_export_op->flags & EXPORT_OP_NOATOMIC_ATTR) + fhp->fh_no_atomic_attr = true; + break; case 3: if (dentry->d_sb->s_export_op->flags & EXPORT_OP_NOWCC) fhp->fh_no_wcc = true; diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h index 347d10aa62655..cb20c2cd34695 100644 --- a/fs/nfsd/nfsfh.h +++ b/fs/nfsd/nfsfh.h @@ -36,6 +36,11 @@ typedef struct svc_fh { bool fh_locked; /* inode locked by us */ bool fh_want_write; /* remount protection taken */ bool fh_no_wcc; /* no wcc data needed */ + bool fh_no_atomic_attr; + /* + * wcc data is not atomic with + * operation + */ int fh_flags; /* FH flags */ #ifdef CONFIG_NFSD_V3 bool fh_post_saved; /* post-op attrs saved */ diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index b4556e86e97c3..a60ff5ce1a375 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -748,7 +748,7 @@ static inline void set_change_info(struct nfsd4_change_info *cinfo, struct svc_fh *fhp) { BUG_ON(!fhp->fh_pre_saved); - cinfo->atomic = (u32)fhp->fh_post_saved; + cinfo->atomic = (u32)(fhp->fh_post_saved && !fhp->fh_no_atomic_attr);
cinfo->before_change = fhp->fh_pre_change; cinfo->after_change = fhp->fh_post_change; diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h index d93e8a6737bb0..9f4d4bcbf251d 100644 --- a/include/linux/exportfs.h +++ b/include/linux/exportfs.h @@ -217,6 +217,9 @@ struct export_operations { #define EXPORT_OP_NOSUBTREECHK (0x2) /* no subtree checking */ #define EXPORT_OP_CLOSE_BEFORE_UNLINK (0x4) /* close files before unlink */ #define EXPORT_OP_REMOTE_FS (0x8) /* Filesystem is remote */ +#define EXPORT_OP_NOATOMIC_ATTR (0x10) /* Filesystem cannot supply + atomic attribute updates + */ unsigned long flags; };
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric W. Biederman ebiederm@xmission.com
[ Upstream commit 878f12dbb8f514799d126544d59be4d2675caac3 ]
Al Viro pointed out that using the phrase "close_on_exec(fd, rcu_dereference_raw(current->files->fdt))" instead of wrapping it in rcu_read_lock(), rcu_read_unlock() is a very questionable optimization[1].
Once wrapped with rcu_read_lock()/rcu_read_unlock() that phrase becomes equivalent the helper function get_close_on_exec so simplify the code and make it more robust by simply using get_close_on_exec.
[1] https://lkml.kernel.org/r/20201207222214.GA4115853@ZenIV.linux.org.uk Suggested-by: Al Viro viro@ftp.linux.org.uk Link: https://lkml.kernel.org/r/87k0tqr6zi.fsf_-_@x220.int.ebiederm.org Signed-off-by: Eric W. Biederman ebiederm@xmission.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/exec.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/exec.c b/fs/exec.c index ebe9011955b9b..fb8813cc532d0 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1821,8 +1821,7 @@ static int bprm_execve(struct linux_binprm *bprm, * inaccessible after exec. Relies on having exclusive access to * current->files (due to unshare_files above). */ - if (bprm->fdpath && - close_on_exec(fd, rcu_dereference_raw(current->files->fdt))) + if (bprm->fdpath && get_close_on_exec(fd)) bprm->interp_flags |= BINPRM_FLAGS_PATH_INACCESSIBLE;
/* Set the unchanging part of bprm->cred */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric W. Biederman ebiederm@xmission.com
[ Upstream commit b6043501289ebf169ae19b810a882d517377302f ]
Many moons ago the binfmts were doing some very questionable things with file descriptors and an unsharing of the file descriptor table was added to make things better[1][2]. The helper steal_lockss was added to avoid breaking the userspace programs[3][4][6].
Unfortunately it turned out that steal_locks did not work for network file systems[5], so it was removed to see if anyone would complain[7][8]. It was thought at the time that NPTL would not be affected as the unshare_files happened after the other threads were killed[8]. Unfortunately because there was an unshare_files in binfmt_elf.c before the threads were killed this analysis was incorrect.
This unshare_files in binfmt_elf.c resulted in the unshares_files happening whenever threads were present. Which led to unshare_files being moved to the start of do_execve[9].
Later the problems were rediscovered and the suggested approach was to readd steal_locks under a different name[10]. I happened to be reviewing patches and I noticed that this approach was a step backwards[11].
I proposed simply moving unshare_files[12] and it was pointed out that moving unshare_files without auditing the code was also unsafe[13].
There were then several attempts to solve this[14][15][16] and I even posted this set of changes[17]. Unfortunately because auditing all of execve is time consuming this change did not make it in at the time.
Well now that I am cleaning up exec I have made the time to read through all of the binfmts and the only playing with file descriptors is either the security modules closing them in security_bprm_committing_creds or is in the generic code in fs/exec.c. None of it happens before begin_new_exec is called.
So move unshare_files into begin_new_exec, after the point of no return. If memory is very very very low and the application calling exec is sharing file descriptor tables between processes we might fail past the point of no return. Which is unfortunate but no different than any of the other places where we allocate memory after the point of no return.
This movement allows another process that shares the file table, or another thread of the same process and that closes files or changes their close on exec behavior and races with execve to cause some unexpected things to happen. There is only one time of check to time of use race and it is just there so that execve fails instead of an interpreter failing when it tries to open the file it is supposed to be interpreting. Failing later if userspace is being silly is not a problem.
With this change it the following discription from the removal of steal_locks[8] finally becomes true.
Apps using NPTL are not affected, since all other threads are killed before execve.
Apps using LinuxThreads are only affected if they
- have multiple threads during exec (LinuxThreads doesn't kill other threads, the app may do it with pthread_kill_other_threads_np()) - rely on POSIX locks being inherited across exec
Both conditions are documented, but not their interaction.
Apps using clone() natively are affected if they
- use clone(CLONE_FILES) - rely on POSIX locks being inherited across exec
I have investigated some paths to make it possible to solve this without moving unshare_files but they all look more complicated[18].
Reported-by: Daniel P. Berrangé berrange@redhat.com Reported-by: Jeff Layton jlayton@redhat.com History-tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git [1] 02cda956de0b ("[PATCH] unshare_files" [2] 04e9bcb4d106 ("[PATCH] use new unshare_files helper") [3] 088f5d7244de ("[PATCH] add steal_locks helper") [4] 02c541ec8ffa ("[PATCH] use new steal_locks helper") [5] https://lkml.kernel.org/r/E1FLIlF-0007zR-00@dorka.pomaz.szeredi.hu [6] https://lkml.kernel.org/r/0060321191605.GB15997@sorel.sous-sol.org [7] https://lkml.kernel.org/r/E1FLwjC-0000kJ-00@dorka.pomaz.szeredi.hu [8] c89681ed7d0e ("[PATCH] remove steal_locks()") [9] fd8328be874f ("[PATCH] sanitize handling of shared descriptor tables in failing execve()") [10] https://lkml.kernel.org/r/20180317142520.30520-1-jlayton@kernel.org [11] https://lkml.kernel.org/r/87r2nwqk73.fsf@xmission.com [12] https://lkml.kernel.org/r/87bmfgvg8w.fsf@xmission.com [13] https://lkml.kernel.org/r/20180322111424.GE30522@ZenIV.linux.org.uk [14] https://lkml.kernel.org/r/20180827174722.3723-1-jlayton@kernel.org [15] https://lkml.kernel.org/r/20180830172423.21964-1-jlayton@kernel.org [16] https://lkml.kernel.org/r/20180914105310.6454-1-jlayton@kernel.org [17] https://lkml.kernel.org/r/87a7ohs5ow.fsf@xmission.com [18] https://lkml.kernel.org/r/87pn8c1uj6.fsf_-_@x220.int.ebiederm.org Acked-by: Christian Brauner christian.brauner@ubuntu.com v1: https://lkml.kernel.org/r/20200817220425.9389-1-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20201120231441.29911-1-ebiederm@xmission.com Signed-off-by: Eric W. Biederman ebiederm@xmission.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/exec.c | 29 +++++++++++++++-------------- 1 file changed, 15 insertions(+), 14 deletions(-)
diff --git a/fs/exec.c b/fs/exec.c index fb8813cc532d0..42952cf90f4af 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1245,6 +1245,7 @@ void __set_task_comm(struct task_struct *tsk, const char *buf, bool exec) int begin_new_exec(struct linux_binprm * bprm) { struct task_struct *me = current; + struct files_struct *displaced; int retval;
/* Once we are committed compute the creds */ @@ -1264,6 +1265,13 @@ int begin_new_exec(struct linux_binprm * bprm) if (retval) goto out;
+ /* Ensure the files table is not shared. */ + retval = unshare_files(&displaced); + if (retval) + goto out; + if (displaced) + put_files_struct(displaced); + /* * Must be called _before_ exec_mmap() as bprm->mm is * not visibile until then. This also enables the update @@ -1789,7 +1797,6 @@ static int bprm_execve(struct linux_binprm *bprm, int fd, struct filename *filename, int flags) { struct file *file; - struct files_struct *displaced; int retval;
/* @@ -1797,13 +1804,9 @@ static int bprm_execve(struct linux_binprm *bprm, */ io_uring_task_cancel();
- retval = unshare_files(&displaced); - if (retval) - return retval; - retval = prepare_bprm_creds(bprm); if (retval) - goto out_files; + return retval;
check_unsafe_exec(bprm); current->in_execve = 1; @@ -1818,8 +1821,12 @@ static int bprm_execve(struct linux_binprm *bprm, bprm->file = file; /* * Record that a name derived from an O_CLOEXEC fd will be - * inaccessible after exec. Relies on having exclusive access to - * current->files (due to unshare_files above). + * inaccessible after exec. This allows the code in exec to + * choose to fail when the executable is not mmaped into the + * interpreter and an open file descriptor is not passed to + * the interpreter. This makes for a better user experience + * than having the interpreter start and then immediately fail + * when it finds the executable is inaccessible. */ if (bprm->fdpath && get_close_on_exec(fd)) bprm->interp_flags |= BINPRM_FLAGS_PATH_INACCESSIBLE; @@ -1839,8 +1846,6 @@ static int bprm_execve(struct linux_binprm *bprm, rseq_execve(current); acct_update_integrals(current); task_numa_free(current, false); - if (displaced) - put_files_struct(displaced); return retval;
out: @@ -1857,10 +1862,6 @@ static int bprm_execve(struct linux_binprm *bprm, current->fs->in_exec = 0; current->in_execve = 0;
-out_files: - if (displaced) - reset_files_struct(displaced); - return retval; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric W. Biederman ebiederm@xmission.com
[ Upstream commit 1f702603e7125a390b5cdf5ce00539781cfcc86a ]
Now that exec no longer needs to return the unshared files to their previous value there is no reason to return displaced.
Instead when unshare_fd creates a copy of the file table, call put_files_struct before returning from unshare_files.
Acked-by: Christian Brauner christian.brauner@ubuntu.com v1: https://lkml.kernel.org/r/20200817220425.9389-2-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20201120231441.29911-2-ebiederm@xmission.com Signed-off-by: Eric W. Biederman ebiederm@xmission.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/coredump.c | 5 +---- fs/exec.c | 5 +---- include/linux/fdtable.h | 2 +- kernel/fork.c | 12 ++++++------ 4 files changed, 9 insertions(+), 15 deletions(-)
diff --git a/fs/coredump.c b/fs/coredump.c index 9d91e831ed0b2..7b085975ea163 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -590,7 +590,6 @@ void do_coredump(const kernel_siginfo_t *siginfo) int ispipe; size_t *argv = NULL; int argc = 0; - struct files_struct *displaced; /* require nonrelative corefile path and be extra careful */ bool need_suid_safe = false; bool core_dumped = false; @@ -797,11 +796,9 @@ void do_coredump(const kernel_siginfo_t *siginfo) }
/* get us an unshared descriptor table; almost always a no-op */ - retval = unshare_files(&displaced); + retval = unshare_files(); if (retval) goto close_fail; - if (displaced) - put_files_struct(displaced); if (!dump_interrupted()) { /* * umh disabled with CONFIG_STATIC_USERMODEHELPER_PATH="" would diff --git a/fs/exec.c b/fs/exec.c index 42952cf90f4af..d5c8f085235bc 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1245,7 +1245,6 @@ void __set_task_comm(struct task_struct *tsk, const char *buf, bool exec) int begin_new_exec(struct linux_binprm * bprm) { struct task_struct *me = current; - struct files_struct *displaced; int retval;
/* Once we are committed compute the creds */ @@ -1266,11 +1265,9 @@ int begin_new_exec(struct linux_binprm * bprm) goto out;
/* Ensure the files table is not shared. */ - retval = unshare_files(&displaced); + retval = unshare_files(); if (retval) goto out; - if (displaced) - put_files_struct(displaced);
/* * Must be called _before_ exec_mmap() as bprm->mm is diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h index f1a99d3e55707..b32ab2163dc2d 100644 --- a/include/linux/fdtable.h +++ b/include/linux/fdtable.h @@ -109,7 +109,7 @@ struct task_struct; struct files_struct *get_files_struct(struct task_struct *); void put_files_struct(struct files_struct *fs); void reset_files_struct(struct files_struct *); -int unshare_files(struct files_struct **); +int unshare_files(void); struct files_struct *dup_fd(struct files_struct *, unsigned, int *) __latent_entropy; void do_close_on_exec(struct files_struct *); int iterate_fd(struct files_struct *, unsigned, diff --git a/kernel/fork.c b/kernel/fork.c index 633b0af1d1a73..8b8a5a172b158 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -3077,21 +3077,21 @@ SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags) * the exec layer of the kernel. */
-int unshare_files(struct files_struct **displaced) +int unshare_files(void) { struct task_struct *task = current; - struct files_struct *copy = NULL; + struct files_struct *old, *copy = NULL; int error;
error = unshare_fd(CLONE_FILES, NR_OPEN_MAX, ©); - if (error || !copy) { - *displaced = NULL; + if (error || !copy) return error; - } - *displaced = task->files; + + old = task->files; task_lock(task); task->files = copy; task_unlock(task); + put_files_struct(old); return 0; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric W. Biederman ebiederm@xmission.com
[ Upstream commit 950db38ff2c01b7aabbd7ab4a50b7992750fa63d ]
Now that exec no longer needs to restore the previous value of current->files on error there are no more callers of reset_files_struct so remove it.
Acked-by: Christian Brauner christian.brauner@ubuntu.com v1: https://lkml.kernel.org/r/20200817220425.9389-3-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20201120231441.29911-3-ebiederm@xmission.com Signed-off-by: Eric W. Biederman ebiederm@xmission.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/file.c | 12 ------------ include/linux/fdtable.h | 1 - 2 files changed, 13 deletions(-)
diff --git a/fs/file.c b/fs/file.c index d6bc73960e4ac..5065252bb474e 100644 --- a/fs/file.c +++ b/fs/file.c @@ -466,18 +466,6 @@ void put_files_struct(struct files_struct *files) } }
-void reset_files_struct(struct files_struct *files) -{ - struct task_struct *tsk = current; - struct files_struct *old; - - old = tsk->files; - task_lock(tsk); - tsk->files = files; - task_unlock(tsk); - put_files_struct(old); -} - void exit_files(struct task_struct *tsk) { struct files_struct * files = tsk->files; diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h index b32ab2163dc2d..c0ca6fb3f0f95 100644 --- a/include/linux/fdtable.h +++ b/include/linux/fdtable.h @@ -108,7 +108,6 @@ struct task_struct;
struct files_struct *get_files_struct(struct task_struct *); void put_files_struct(struct files_struct *fs); -void reset_files_struct(struct files_struct *); int unshare_files(void); struct files_struct *dup_fd(struct files_struct *, unsigned, int *) __latent_entropy; void do_close_on_exec(struct files_struct *);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric W. Biederman ebiederm@xmission.com
[ Upstream commit f43c283a89a7dc531a47d4b1e001503cf3dc3234 ]
Use the helper fget_task and simplify the code.
As well as simplifying the code this removes one unnecessary increment of struct files_struct. This unnecessary increment of files_struct.count can result in exec unnecessarily unsharing files_struct and breaking posix locks, and it can result in fget_light having to fallback to fget reducing performance.
Suggested-by: Oleg Nesterov oleg@redhat.com Reviewed-by: Cyrill Gorcunov gorcunov@gmail.com v1: https://lkml.kernel.org/r/20200817220425.9389-4-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20201120231441.29911-4-ebiederm@xmission.com Signed-off-by: Eric W. Biederman ebiederm@xmission.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/kcmp.c | 20 ++++---------------- 1 file changed, 4 insertions(+), 16 deletions(-)
diff --git a/kernel/kcmp.c b/kernel/kcmp.c index c0d2ad9b4705d..bd6f9edf98fd3 100644 --- a/kernel/kcmp.c +++ b/kernel/kcmp.c @@ -107,7 +107,6 @@ static int kcmp_epoll_target(struct task_struct *task1, { struct file *filp, *filp_epoll, *filp_tgt; struct kcmp_epoll_slot slot; - struct files_struct *files;
if (copy_from_user(&slot, uslot, sizeof(slot))) return -EFAULT; @@ -116,23 +115,12 @@ static int kcmp_epoll_target(struct task_struct *task1, if (!filp) return -EBADF;
- files = get_files_struct(task2); - if (!files) + filp_epoll = fget_task(task2, slot.efd); + if (!filp_epoll) return -EBADF;
- spin_lock(&files->file_lock); - filp_epoll = fcheck_files(files, slot.efd); - if (filp_epoll) - get_file(filp_epoll); - else - filp_tgt = ERR_PTR(-EBADF); - spin_unlock(&files->file_lock); - put_files_struct(files); - - if (filp_epoll) { - filp_tgt = get_epoll_tfile_raw_ptr(filp_epoll, slot.tfd, slot.toff); - fput(filp_epoll); - } + filp_tgt = get_epoll_tfile_raw_ptr(filp_epoll, slot.tfd, slot.toff); + fput(filp_epoll);
if (IS_ERR(filp_tgt)) return PTR_ERR(filp_tgt);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric W. Biederman ebiederm@xmission.com
[ Upstream commit b48845af0152d790a54b8ab78cc2b7c07485fc98 ]
Use the helper fget_task to simplify bpf_task_fd_query.
As well as simplifying the code this removes one unnecessary increment of struct files_struct. This unnecessary increment of files_struct.count can result in exec unnecessarily unsharing files_struct and breaking posix locks, and it can result in fget_light having to fallback to fget reducing performance.
This simplification comes from the observation that none of the callers of get_files_struct actually need to call get_files_struct that was made when discussing[1] exec and posix file locks.
[1] https://lkml.kernel.org/r/20180915160423.GA31461@redhat.com Suggested-by: Oleg Nesterov oleg@redhat.com v1: https://lkml.kernel.org/r/20200817220425.9389-5-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20201120231441.29911-5-ebiederm@xmission.com Signed-off-by: Eric W. Biederman ebiederm@xmission.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/bpf/syscall.c | 20 +++----------------- 1 file changed, 3 insertions(+), 17 deletions(-)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index e1bee8cd34044..fbe7f8e2b022c 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -3929,7 +3929,6 @@ static int bpf_task_fd_query(const union bpf_attr *attr, pid_t pid = attr->task_fd_query.pid; u32 fd = attr->task_fd_query.fd; const struct perf_event *event; - struct files_struct *files; struct task_struct *task; struct file *file; int err; @@ -3949,23 +3948,11 @@ static int bpf_task_fd_query(const union bpf_attr *attr, if (!task) return -ENOENT;
- files = get_files_struct(task); - put_task_struct(task); - if (!files) - return -ENOENT; - err = 0; - spin_lock(&files->file_lock); - file = fcheck_files(files, fd); + file = fget_task(task, fd); + put_task_struct(task); if (!file) - err = -EBADF; - else - get_file(file); - spin_unlock(&files->file_lock); - put_files_struct(files); - - if (err) - goto out; + return -EBADF;
if (file->f_op == &bpf_link_fops) { struct bpf_link *link = file->private_data; @@ -4005,7 +3992,6 @@ static int bpf_task_fd_query(const union bpf_attr *attr, err = -ENOTSUPP; put_file: fput(file); -out: return err; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric W. Biederman ebiederm@xmission.com
[ Upstream commit 439be32656035d3239fd56f9b83353ec06cb3b45 ]
When discussing[1] exec and posix file locks it was realized that none of the callers of get_files_struct fundamentally needed to call get_files_struct, and that by switching them to helper functions instead it will both simplify their code and remove unnecessary increments of files_struct.count. Those unnecessary increments can result in exec unnecessarily unsharing files_struct which breaking posix locks, and it can result in fget_light having to fallback to fget reducing system performance.
Simplifying proc_fd_link is a little bit tricky. It is necessary to know that there is a reference to fd_f ile while path_get is running. This reference can either be guaranteed to exist either by locking the fdtable as the code currently does or by taking a reference on the file in question.
Use fget_task to remove the need for get_files_struct and to take a reference to file in question.
[1] https://lkml.kernel.org/r/20180915160423.GA31461@redhat.com Suggested-by: Oleg Nesterov oleg@redhat.com v1: https://lkml.kernel.org/r/20200817220425.9389-8-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20201120231441.29911-6-ebiederm@xmission.com Signed-off-by: Eric W. Biederman ebiederm@xmission.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/proc/fd.c | 13 +++---------- 1 file changed, 3 insertions(+), 10 deletions(-)
diff --git a/fs/proc/fd.c b/fs/proc/fd.c index 81882a13212d3..d58960f6ee524 100644 --- a/fs/proc/fd.c +++ b/fs/proc/fd.c @@ -146,29 +146,22 @@ static const struct dentry_operations tid_fd_dentry_operations = {
static int proc_fd_link(struct dentry *dentry, struct path *path) { - struct files_struct *files = NULL; struct task_struct *task; int ret = -ENOENT;
task = get_proc_task(d_inode(dentry)); if (task) { - files = get_files_struct(task); - put_task_struct(task); - } - - if (files) { unsigned int fd = proc_fd(d_inode(dentry)); struct file *fd_file;
- spin_lock(&files->file_lock); - fd_file = fcheck_files(files, fd); + fd_file = fget_task(task, fd); if (fd_file) { *path = fd_file->f_path; path_get(&fd_file->f_path); ret = 0; + fput(fd_file); } - spin_unlock(&files->file_lock); - put_files_struct(files); + put_task_struct(task); }
return ret;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
Temporarily revert commit 0849f83e4782 ("fget: clarify and improve __fget_files() implementation") to enable subsequent upstream commits to apply and build cleanly.
Stable-dep-of: bebf684bf330 ("file: Rename __fcheck_files to files_lookup_fd_raw") Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/file.c | 72 +++++++++++++------------------------------------------ 1 file changed, 16 insertions(+), 56 deletions(-)
diff --git a/fs/file.c b/fs/file.c index 5065252bb474e..fea693acc065e 100644 --- a/fs/file.c +++ b/fs/file.c @@ -849,68 +849,28 @@ void do_close_on_exec(struct files_struct *files) spin_unlock(&files->file_lock); }
-static inline struct file *__fget_files_rcu(struct files_struct *files, - unsigned int fd, fmode_t mask, unsigned int refs) -{ - for (;;) { - struct file *file; - struct fdtable *fdt = rcu_dereference_raw(files->fdt); - struct file __rcu **fdentry; - - if (unlikely(fd >= fdt->max_fds)) - return NULL; - - fdentry = fdt->fd + array_index_nospec(fd, fdt->max_fds); - file = rcu_dereference_raw(*fdentry); - if (unlikely(!file)) - return NULL; - - if (unlikely(file->f_mode & mask)) - return NULL; - - /* - * Ok, we have a file pointer. However, because we do - * this all locklessly under RCU, we may be racing with - * that file being closed. - * - * Such a race can take two forms: - * - * (a) the file ref already went down to zero, - * and get_file_rcu_many() fails. Just try - * again: - */ - if (unlikely(!get_file_rcu_many(file, refs))) - continue; - - /* - * (b) the file table entry has changed under us. - * Note that we don't need to re-check the 'fdt->fd' - * pointer having changed, because it always goes - * hand-in-hand with 'fdt'. - * - * If so, we need to put our refs and try again. - */ - if (unlikely(rcu_dereference_raw(files->fdt) != fdt) || - unlikely(rcu_dereference_raw(*fdentry) != file)) { - fput_many(file, refs); - continue; - } - - /* - * Ok, we have a ref to the file, and checked that it - * still exists. - */ - return file; - } -} - static struct file *__fget_files(struct files_struct *files, unsigned int fd, fmode_t mask, unsigned int refs) { struct file *file;
rcu_read_lock(); - file = __fget_files_rcu(files, fd, mask, refs); +loop: + file = fcheck_files(files, fd); + if (file) { + /* File object ref couldn't be taken. + * dup2() atomicity guarantee is the reason + * we loop to catch the new file (or NULL pointer) + */ + if (file->f_mode & mask) + file = NULL; + else if (!get_file_rcu_many(file, refs)) + goto loop; + else if (__fcheck_files(files, fd) != file) { + fput_many(file, refs); + goto loop; + } + } rcu_read_unlock();
return file;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric W. Biederman ebiederm@xmission.com
[ Upstream commit bebf684bf330915e6c96313ad7db89a5480fc9c2 ]
The function fcheck despite it's comment is poorly named as it has no callers that only check it's return value. All of fcheck's callers use the returned file descriptor. The same is true for fcheck_files and __fcheck_files.
A new less confusing name is needed. In addition the names of these functions are confusing as they do not report the kind of locks that are needed to be held when these functions are called making error prone to use them.
To remedy this I am making the base functio name lookup_fd and will and prefixes and sufficies to indicate the rest of the context.
Name the function (previously called __fcheck_files) that proceeds from a struct files_struct, looks up the struct file of a file descriptor, and requires it's callers to verify all of the appropriate locks are held files_lookup_fd_raw.
The need for better names became apparent in the last round of discussion of this set of changes[1].
[1] https://lkml.kernel.org/r/CAHk-=wj8BQbgJFLa+J0e=iT-1qpmCRTbPAJ8gd6MJQ=kbRPqy... Link: https://lkml.kernel.org/r/20201120231441.29911-7-ebiederm@xmission.com Signed-off-by: Eric W. Biederman ebiederm@xmission.com [ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/file.c | 4 ++-- include/linux/fdtable.h | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/file.c b/fs/file.c index fea693acc065e..eb1e2b7220ac6 100644 --- a/fs/file.c +++ b/fs/file.c @@ -866,7 +866,7 @@ static struct file *__fget_files(struct files_struct *files, unsigned int fd, file = NULL; else if (!get_file_rcu_many(file, refs)) goto loop; - else if (__fcheck_files(files, fd) != file) { + else if (files_lookup_fd_raw(files, fd) != file) { fput_many(file, refs); goto loop; } @@ -933,7 +933,7 @@ static unsigned long __fget_light(unsigned int fd, fmode_t mask) struct file *file;
if (atomic_read(&files->count) == 1) { - file = __fcheck_files(files, fd); + file = files_lookup_fd_raw(files, fd); if (!file || unlikely(file->f_mode & mask)) return 0; return (unsigned long)file; diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h index c0ca6fb3f0f95..10e75b4c30a43 100644 --- a/include/linux/fdtable.h +++ b/include/linux/fdtable.h @@ -80,7 +80,7 @@ struct dentry; /* * The caller must ensure that fd table isn't shared or hold rcu or file lock */ -static inline struct file *__fcheck_files(struct files_struct *files, unsigned int fd) +static inline struct file *files_lookup_fd_raw(struct files_struct *files, unsigned int fd) { struct fdtable *fdt = rcu_dereference_raw(files->fdt);
@@ -96,7 +96,7 @@ static inline struct file *fcheck_files(struct files_struct *files, unsigned int RCU_LOCKDEP_WARN(!rcu_read_lock_held() && !lockdep_is_held(&files->file_lock), "suspicious rcu_dereference_check() usage"); - return __fcheck_files(files, fd); + return files_lookup_fd_raw(files, fd); }
/*
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric W. Biederman ebiederm@xmission.com
[ Upstream commit 120ce2b0cd52abe73e8b16c23461eb14df5a87d8 ]
To make it easy to tell where files->file_lock protection is being used when looking up a file create files_lookup_fd_locked. Only allow this function to be called with the file_lock held.
Update the callers of fcheck and fcheck_files that are called with the files->file_lock held to call files_lookup_fd_locked instead.
Hopefully this makes it easier to quickly understand what is going on.
The need for better names became apparent in the last round of discussion of this set of changes[1].
[1] https://lkml.kernel.org/r/CAHk-=wj8BQbgJFLa+J0e=iT-1qpmCRTbPAJ8gd6MJQ=kbRPqy... Link: https://lkml.kernel.org/r/20201120231441.29911-8-ebiederm@xmission.com Signed-off-by: Eric W. Biederman ebiederm@xmission.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/file.c | 2 +- fs/locks.c | 14 ++++++++------ fs/proc/fd.c | 2 +- include/linux/fdtable.h | 7 +++++++ 4 files changed, 17 insertions(+), 8 deletions(-)
diff --git a/fs/file.c b/fs/file.c index eb1e2b7220ac6..6a6b03ce4ad69 100644 --- a/fs/file.c +++ b/fs/file.c @@ -1158,7 +1158,7 @@ static int ksys_dup3(unsigned int oldfd, unsigned int newfd, int flags)
spin_lock(&files->file_lock); err = expand_files(files, newfd); - file = fcheck(oldfd); + file = files_lookup_fd_locked(files, oldfd); if (unlikely(!file)) goto Ebadf; if (unlikely(err < 0)) { diff --git a/fs/locks.c b/fs/locks.c index cbb5701ce9f37..873f97504bddf 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -2536,14 +2536,15 @@ int fcntl_setlk(unsigned int fd, struct file *filp, unsigned int cmd, */ if (!error && file_lock->fl_type != F_UNLCK && !(file_lock->fl_flags & FL_OFDLCK)) { + struct files_struct *files = current->files; /* * We need that spin_lock here - it prevents reordering between * update of i_flctx->flc_posix and check for it done in * close(). rcu_read_lock() wouldn't do. */ - spin_lock(¤t->files->file_lock); - f = fcheck(fd); - spin_unlock(¤t->files->file_lock); + spin_lock(&files->file_lock); + f = files_lookup_fd_locked(files, fd); + spin_unlock(&files->file_lock); if (f != filp) { file_lock->fl_type = F_UNLCK; error = do_lock_file_wait(filp, cmd, file_lock); @@ -2667,14 +2668,15 @@ int fcntl_setlk64(unsigned int fd, struct file *filp, unsigned int cmd, */ if (!error && file_lock->fl_type != F_UNLCK && !(file_lock->fl_flags & FL_OFDLCK)) { + struct files_struct *files = current->files; /* * We need that spin_lock here - it prevents reordering between * update of i_flctx->flc_posix and check for it done in * close(). rcu_read_lock() wouldn't do. */ - spin_lock(¤t->files->file_lock); - f = fcheck(fd); - spin_unlock(¤t->files->file_lock); + spin_lock(&files->file_lock); + f = files_lookup_fd_locked(files, fd); + spin_unlock(&files->file_lock); if (f != filp) { file_lock->fl_type = F_UNLCK; error = do_lock_file_wait(filp, cmd, file_lock); diff --git a/fs/proc/fd.c b/fs/proc/fd.c index d58960f6ee524..2cca9bca3b3a3 100644 --- a/fs/proc/fd.c +++ b/fs/proc/fd.c @@ -35,7 +35,7 @@ static int seq_show(struct seq_file *m, void *v) unsigned int fd = proc_fd(m->private);
spin_lock(&files->file_lock); - file = fcheck_files(files, fd); + file = files_lookup_fd_locked(files, fd); if (file) { struct fdtable *fdt = files_fdtable(files);
diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h index 10e75b4c30a43..87be704268d26 100644 --- a/include/linux/fdtable.h +++ b/include/linux/fdtable.h @@ -91,6 +91,13 @@ static inline struct file *files_lookup_fd_raw(struct files_struct *files, unsig return NULL; }
+static inline struct file *files_lookup_fd_locked(struct files_struct *files, unsigned int fd) +{ + RCU_LOCKDEP_WARN(!lockdep_is_held(&files->file_lock), + "suspicious rcu_dereference_check() usage"); + return files_lookup_fd_raw(files, fd); +} + static inline struct file *fcheck_files(struct files_struct *files, unsigned int fd) { RCU_LOCKDEP_WARN(!rcu_read_lock_held() &&
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric W. Biederman ebiederm@xmission.com
[ Upstream commit f36c2943274199cb8aef32ac96531ffb7c4b43d0 ]
This change renames fcheck_files to files_lookup_fd_rcu. All of the remaining callers take the rcu_read_lock before calling this function so the _rcu suffix is appropriate. This change also tightens up the debug check to verify that all callers hold the rcu_read_lock.
All callers that used to call files_check with the files->file_lock held have now been changed to call files_lookup_fd_locked.
This change of name has helped remind me of which locks and which guarantees are in place helping me to catch bugs later in the patchset.
The need for better names became apparent in the last round of discussion of this set of changes[1].
[1] https://lkml.kernel.org/r/CAHk-=wj8BQbgJFLa+J0e=iT-1qpmCRTbPAJ8gd6MJQ=kbRPqy... Link: https://lkml.kernel.org/r/20201120231441.29911-9-ebiederm@xmission.com Signed-off-by: Eric W. Biederman ebiederm@xmission.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/filesystems/files.rst | 6 +++--- fs/file.c | 4 ++-- fs/proc/fd.c | 4 ++-- include/linux/fdtable.h | 7 +++---- kernel/bpf/task_iter.c | 2 +- kernel/kcmp.c | 2 +- 6 files changed, 12 insertions(+), 13 deletions(-)
diff --git a/Documentation/filesystems/files.rst b/Documentation/filesystems/files.rst index cbf8e57376bf6..ea75acdb632c0 100644 --- a/Documentation/filesystems/files.rst +++ b/Documentation/filesystems/files.rst @@ -62,7 +62,7 @@ the fdtable structure - be held.
4. To look up the file structure given an fd, a reader - must use either fcheck() or fcheck_files() APIs. These + must use either fcheck() or files_lookup_fd_rcu() APIs. These take care of barrier requirements due to lock-free lookup.
An example:: @@ -84,7 +84,7 @@ the fdtable structure - on ->f_count::
rcu_read_lock(); - file = fcheck_files(files, fd); + file = files_lookup_fd_rcu(files, fd); if (file) { if (atomic_long_inc_not_zero(&file->f_count)) *fput_needed = 1; @@ -104,7 +104,7 @@ the fdtable structure - lock-free, they must be installed using rcu_assign_pointer() API. If they are looked up lock-free, rcu_dereference() must be used. However it is advisable to use files_fdtable() - and fcheck()/fcheck_files() which take care of these issues. + and fcheck()/files_lookup_fd_rcu() which take care of these issues.
7. While updating, the fdtable pointer must be looked up while holding files->file_lock. If ->file_lock is dropped, then diff --git a/fs/file.c b/fs/file.c index 6a6b03ce4ad69..6149f75a18a66 100644 --- a/fs/file.c +++ b/fs/file.c @@ -856,7 +856,7 @@ static struct file *__fget_files(struct files_struct *files, unsigned int fd,
rcu_read_lock(); loop: - file = fcheck_files(files, fd); + file = files_lookup_fd_rcu(files, fd); if (file) { /* File object ref couldn't be taken. * dup2() atomicity guarantee is the reason @@ -1187,7 +1187,7 @@ SYSCALL_DEFINE2(dup2, unsigned int, oldfd, unsigned int, newfd) int retval = oldfd;
rcu_read_lock(); - if (!fcheck_files(files, oldfd)) + if (!files_lookup_fd_rcu(files, oldfd)) retval = -EBADF; rcu_read_unlock(); return retval; diff --git a/fs/proc/fd.c b/fs/proc/fd.c index 2cca9bca3b3a3..3dec44d7c5c5c 100644 --- a/fs/proc/fd.c +++ b/fs/proc/fd.c @@ -90,7 +90,7 @@ static bool tid_fd_mode(struct task_struct *task, unsigned fd, fmode_t *mode) return false;
rcu_read_lock(); - file = fcheck_files(files, fd); + file = files_lookup_fd_rcu(files, fd); if (file) *mode = file->f_mode; rcu_read_unlock(); @@ -243,7 +243,7 @@ static int proc_readfd_common(struct file *file, struct dir_context *ctx, char name[10 + 1]; unsigned int len;
- f = fcheck_files(files, fd); + f = files_lookup_fd_rcu(files, fd); if (!f) continue; data.mode = f->f_mode; diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h index 87be704268d26..a45fa2ef723f5 100644 --- a/include/linux/fdtable.h +++ b/include/linux/fdtable.h @@ -98,10 +98,9 @@ static inline struct file *files_lookup_fd_locked(struct files_struct *files, un return files_lookup_fd_raw(files, fd); }
-static inline struct file *fcheck_files(struct files_struct *files, unsigned int fd) +static inline struct file *files_lookup_fd_rcu(struct files_struct *files, unsigned int fd) { - RCU_LOCKDEP_WARN(!rcu_read_lock_held() && - !lockdep_is_held(&files->file_lock), + RCU_LOCKDEP_WARN(!rcu_read_lock_held(), "suspicious rcu_dereference_check() usage"); return files_lookup_fd_raw(files, fd); } @@ -109,7 +108,7 @@ static inline struct file *fcheck_files(struct files_struct *files, unsigned int /* * Check whether the specified fd has an open file. */ -#define fcheck(fd) fcheck_files(current->files, fd) +#define fcheck(fd) files_lookup_fd_rcu(current->files, fd)
struct task_struct;
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c index f3d3a562a802a..762b4d7c37795 100644 --- a/kernel/bpf/task_iter.c +++ b/kernel/bpf/task_iter.c @@ -185,7 +185,7 @@ task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info) for (; curr_fd < max_fds; curr_fd++) { struct file *f;
- f = fcheck_files(curr_files, curr_fd); + f = files_lookup_fd_rcu(curr_files, curr_fd); if (!f) continue; if (!get_file_rcu(f)) diff --git a/kernel/kcmp.c b/kernel/kcmp.c index bd6f9edf98fd3..5b2435e030472 100644 --- a/kernel/kcmp.c +++ b/kernel/kcmp.c @@ -67,7 +67,7 @@ get_file_raw_ptr(struct task_struct *task, unsigned int idx) rcu_read_lock();
if (task->files) - file = fcheck_files(task->files, idx); + file = files_lookup_fd_rcu(task->files, idx);
rcu_read_unlock(); task_unlock(task);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric W. Biederman ebiederm@xmission.com
[ Upstream commit 460b4f812a9d473d4b39d87d37844f9fc30a9eb3 ]
Also remove the confusing comment about checking if a fd exists. I could not find one instance in the entire kernel that still matches the description or the reason for the name fcheck.
The need for better names became apparent in the last round of discussion of this set of changes[1].
[1] https://lkml.kernel.org/r/CAHk-=wj8BQbgJFLa+J0e=iT-1qpmCRTbPAJ8gd6MJQ=kbRPqy... Link: https://lkml.kernel.org/r/20201120231441.29911-10-ebiederm@xmission.com Signed-off-by: Eric W. Biederman ebiederm@xmission.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/filesystems/files.rst | 6 +++--- arch/powerpc/platforms/cell/spufs/coredump.c | 2 +- fs/notify/dnotify/dnotify.c | 2 +- include/linux/fdtable.h | 8 ++++---- 4 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/Documentation/filesystems/files.rst b/Documentation/filesystems/files.rst index ea75acdb632c0..bcf84459917f5 100644 --- a/Documentation/filesystems/files.rst +++ b/Documentation/filesystems/files.rst @@ -62,7 +62,7 @@ the fdtable structure - be held.
4. To look up the file structure given an fd, a reader - must use either fcheck() or files_lookup_fd_rcu() APIs. These + must use either lookup_fd_rcu() or files_lookup_fd_rcu() APIs. These take care of barrier requirements due to lock-free lookup.
An example:: @@ -70,7 +70,7 @@ the fdtable structure - struct file *file;
rcu_read_lock(); - file = fcheck(fd); + file = lookup_fd_rcu(fd); if (file) { ... } @@ -104,7 +104,7 @@ the fdtable structure - lock-free, they must be installed using rcu_assign_pointer() API. If they are looked up lock-free, rcu_dereference() must be used. However it is advisable to use files_fdtable() - and fcheck()/files_lookup_fd_rcu() which take care of these issues. + and lookup_fd_rcu()/files_lookup_fd_rcu() which take care of these issues.
7. While updating, the fdtable pointer must be looked up while holding files->file_lock. If ->file_lock is dropped, then diff --git a/arch/powerpc/platforms/cell/spufs/coredump.c b/arch/powerpc/platforms/cell/spufs/coredump.c index 026c181a98c5d..60b5583e9eafc 100644 --- a/arch/powerpc/platforms/cell/spufs/coredump.c +++ b/arch/powerpc/platforms/cell/spufs/coredump.c @@ -74,7 +74,7 @@ static struct spu_context *coredump_next_context(int *fd) *fd = n - 1;
rcu_read_lock(); - file = fcheck(*fd); + file = lookup_fd_rcu(*fd); ctx = SPUFS_I(file_inode(file))->i_ctx; get_spu_context(ctx); rcu_read_unlock(); diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c index e45ca6ecba959..e85e13c50d6d4 100644 --- a/fs/notify/dnotify/dnotify.c +++ b/fs/notify/dnotify/dnotify.c @@ -327,7 +327,7 @@ int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg) }
rcu_read_lock(); - f = fcheck(fd); + f = lookup_fd_rcu(fd); rcu_read_unlock();
/* if (f != filp) means that we lost a race and another task/thread diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h index a45fa2ef723f5..695306cc5337a 100644 --- a/include/linux/fdtable.h +++ b/include/linux/fdtable.h @@ -105,10 +105,10 @@ static inline struct file *files_lookup_fd_rcu(struct files_struct *files, unsig return files_lookup_fd_raw(files, fd); }
-/* - * Check whether the specified fd has an open file. - */ -#define fcheck(fd) files_lookup_fd_rcu(current->files, fd) +static inline struct file *lookup_fd_rcu(unsigned int fd) +{ + return files_lookup_fd_rcu(current->files, fd); +}
struct task_struct;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric W. Biederman ebiederm@xmission.com
[ Upstream commit 3a879fb38082125cc0d8aa89b70c7f3a7cdf584b ]
As a companion to lookup_fd_rcu implement task_lookup_fd_rcu for querying an arbitrary process about a specific file.
Acked-by: Christian Brauner christian.brauner@ubuntu.com v1: https://lkml.kernel.org/r/20200818103713.aw46m7vprsy4vlve@wittgenstein Link: https://lkml.kernel.org/r/20201120231441.29911-11-ebiederm@xmission.com Signed-off-by: Eric W. Biederman ebiederm@xmission.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/file.c | 15 +++++++++++++++ include/linux/fdtable.h | 2 ++ 2 files changed, 17 insertions(+)
diff --git a/fs/file.c b/fs/file.c index 6149f75a18a66..60a3ccba728cd 100644 --- a/fs/file.c +++ b/fs/file.c @@ -911,6 +911,21 @@ struct file *fget_task(struct task_struct *task, unsigned int fd) return file; }
+struct file *task_lookup_fd_rcu(struct task_struct *task, unsigned int fd) +{ + /* Must be called with rcu_read_lock held */ + struct files_struct *files; + struct file *file = NULL; + + task_lock(task); + files = task->files; + if (files) + file = files_lookup_fd_rcu(files, fd); + task_unlock(task); + + return file; +} + /* * Lightweight file lookup - no refcnt increment if fd table isn't shared. * diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h index 695306cc5337a..a88f68f740677 100644 --- a/include/linux/fdtable.h +++ b/include/linux/fdtable.h @@ -110,6 +110,8 @@ static inline struct file *lookup_fd_rcu(unsigned int fd) return files_lookup_fd_rcu(current->files, fd); }
+struct file *task_lookup_fd_rcu(struct task_struct *task, unsigned int fd); + struct task_struct;
struct files_struct *get_files_struct(struct task_struct *);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric W. Biederman ebiederm@xmission.com
[ Upstream commit 64eb661fda0269276b4c46965832938e3f268268 ]
When discussing[1] exec and posix file locks it was realized that none of the callers of get_files_struct fundamentally needed to call get_files_struct, and that by switching them to helper functions instead it will both simplify their code and remove unnecessary increments of files_struct.count. Those unnecessary increments can result in exec unnecessarily unsharing files_struct which breaking posix locks, and it can result in fget_light having to fallback to fget reducing system performance.
Instead of manually coding finding the files struct for a task and then calling files_lookup_fd_rcu, use the helper task_lookup_fd_rcu that combines those to steps. Making the code simpler and removing the need to get a reference on a files_struct.
[1] https://lkml.kernel.org/r/20180915160423.GA31461@redhat.com Suggested-by: Oleg Nesterov oleg@redhat.com Acked-by: Christian Brauner christian.brauner@ubuntu.com v1: https://lkml.kernel.org/r/20200817220425.9389-7-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20201120231441.29911-12-ebiederm@xmission.com Signed-off-by: Eric W. Biederman ebiederm@xmission.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/proc/fd.c | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/fs/proc/fd.c b/fs/proc/fd.c index 3dec44d7c5c5c..c1a984f3c4df7 100644 --- a/fs/proc/fd.c +++ b/fs/proc/fd.c @@ -83,18 +83,13 @@ static const struct file_operations proc_fdinfo_file_operations = {
static bool tid_fd_mode(struct task_struct *task, unsigned fd, fmode_t *mode) { - struct files_struct *files = get_files_struct(task); struct file *file;
- if (!files) - return false; - rcu_read_lock(); - file = files_lookup_fd_rcu(files, fd); + file = task_lookup_fd_rcu(task, fd); if (file) *mode = file->f_mode; rcu_read_unlock(); - put_files_struct(files); return !!file; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric W. Biederman ebiederm@xmission.com
[ Upstream commit ed77e80e14a3cd55c73848b9e8043020e717ce12 ]
Modify get_file_raw_ptr to use task_lookup_fd_rcu. The helper task_lookup_fd_rcu does the work of taking the task lock and verifying that task->files != NULL and then calls files_lookup_fd_rcu. So let use the helper to make a simpler implementation of get_file_raw_ptr.
Acked-by: Cyrill Gorcunov gorcunov@gmail.com Link: https://lkml.kernel.org/r/20201120231441.29911-13-ebiederm@xmission.com Signed-off-by: Eric W. Biederman ebiederm@xmission.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/kcmp.c | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-)
diff --git a/kernel/kcmp.c b/kernel/kcmp.c index 5b2435e030472..5353edfad8e11 100644 --- a/kernel/kcmp.c +++ b/kernel/kcmp.c @@ -61,16 +61,11 @@ static int kcmp_ptr(void *v1, void *v2, enum kcmp_type type) static struct file * get_file_raw_ptr(struct task_struct *task, unsigned int idx) { - struct file *file = NULL; + struct file *file;
- task_lock(task); rcu_read_lock(); - - if (task->files) - file = files_lookup_fd_rcu(task->files, idx); - + file = task_lookup_fd_rcu(task, idx); rcu_read_unlock(); - task_unlock(task);
return file; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric W. Biederman ebiederm@xmission.com
[ Upstream commit e9a53aeb5e0a838f10fcea74235664e7ad5e6e1a ]
As a companion to fget_task and task_lookup_fd_rcu implement task_lookup_next_fd_rcu that will return the struct file for the first file descriptor number that is equal or greater than the fd argument value, or NULL if there is no such struct file.
This allows file descriptors of foreign processes to be iterated through safely, without needed to increment the count on files_struct.
Some concern[1] has been expressed that this function takes the task_lock for each iteration and thus for each file descriptor. This place where this function will be called in a commonly used code path is for listing /proc/<pid>/fd. I did some small benchmarks and did not see any measurable performance differences. For ordinary users ls is likely to stat each of the directory entries and tid_fd_mode called from tid_fd_revalidae has always taken the task lock for each file descriptor. So this does not look like it will be a big change in practice.
At some point is will probably be worth changing put_files_struct to free files_struct after an rcu grace period so that task_lock won't be needed at all.
[1] https://lkml.kernel.org/r/20200817220425.9389-10-ebiederm@xmission.com v1: https://lkml.kernel.org/r/20200817220425.9389-9-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20201120231441.29911-14-ebiederm@xmission.com Signed-off-by: Eric W. Biederman ebiederm@xmission.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/file.c | 21 +++++++++++++++++++++ include/linux/fdtable.h | 1 + 2 files changed, 22 insertions(+)
diff --git a/fs/file.c b/fs/file.c index 60a3ccba728cd..9fa49e6298fba 100644 --- a/fs/file.c +++ b/fs/file.c @@ -926,6 +926,27 @@ struct file *task_lookup_fd_rcu(struct task_struct *task, unsigned int fd) return file; }
+struct file *task_lookup_next_fd_rcu(struct task_struct *task, unsigned int *ret_fd) +{ + /* Must be called with rcu_read_lock held */ + struct files_struct *files; + unsigned int fd = *ret_fd; + struct file *file = NULL; + + task_lock(task); + files = task->files; + if (files) { + for (; fd < files_fdtable(files)->max_fds; fd++) { + file = files_lookup_fd_rcu(files, fd); + if (file) + break; + } + } + task_unlock(task); + *ret_fd = fd; + return file; +} + /* * Lightweight file lookup - no refcnt increment if fd table isn't shared. * diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h index a88f68f740677..b0c6a959c6a00 100644 --- a/include/linux/fdtable.h +++ b/include/linux/fdtable.h @@ -111,6 +111,7 @@ static inline struct file *lookup_fd_rcu(unsigned int fd) }
struct file *task_lookup_fd_rcu(struct task_struct *task, unsigned int fd); +struct file *task_lookup_next_fd_rcu(struct task_struct *task, unsigned int *fd);
struct task_struct;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric W. Biederman ebiederm@xmission.com
[ Upstream commit 5b17b61870e2f4b0a4fdc5c6039fbdb4ffb796df ]
When discussing[1] exec and posix file locks it was realized that none of the callers of get_files_struct fundamentally needed to call get_files_struct, and that by switching them to helper functions instead it will both simplify their code and remove unnecessary increments of files_struct.count. Those unnecessary increments can result in exec unnecessarily unsharing files_struct which breaking posix locks, and it can result in fget_light having to fallback to fget reducing system performance.
Using task_lookup_next_fd_rcu simplifies proc_readfd_common, by moving the checking for the maximum file descritor into the generic code, and by remvoing the need for capturing and releasing a reference on files_struct.
As task_lookup_fd_rcu may update the fd ctx->pos has been changed to be the fd +2 after task_lookup_fd_rcu returns.
[1] https://lkml.kernel.org/r/20180915160423.GA31461@redhat.com Suggested-by: Oleg Nesterov oleg@redhat.com Tested-by: Andy Lavr andy.lavr@gmail.com v1: https://lkml.kernel.org/r/20200817220425.9389-10-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20201120231441.29911-15-ebiederm@xmission.com Signed-off-by: Eric W. Biederman ebiederm@xmission.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/proc/fd.c | 17 +++++------------ 1 file changed, 5 insertions(+), 12 deletions(-)
diff --git a/fs/proc/fd.c b/fs/proc/fd.c index c1a984f3c4df7..72c1525b4b3eb 100644 --- a/fs/proc/fd.c +++ b/fs/proc/fd.c @@ -217,7 +217,6 @@ static int proc_readfd_common(struct file *file, struct dir_context *ctx, instantiate_t instantiate) { struct task_struct *p = get_proc_task(file_inode(file)); - struct files_struct *files; unsigned int fd;
if (!p) @@ -225,22 +224,18 @@ static int proc_readfd_common(struct file *file, struct dir_context *ctx,
if (!dir_emit_dots(file, ctx)) goto out; - files = get_files_struct(p); - if (!files) - goto out;
rcu_read_lock(); - for (fd = ctx->pos - 2; - fd < files_fdtable(files)->max_fds; - fd++, ctx->pos++) { + for (fd = ctx->pos - 2;; fd++) { struct file *f; struct fd_data data; char name[10 + 1]; unsigned int len;
- f = files_lookup_fd_rcu(files, fd); + f = task_lookup_next_fd_rcu(p, &fd); + ctx->pos = fd + 2LL; if (!f) - continue; + break; data.mode = f->f_mode; rcu_read_unlock(); data.fd = fd; @@ -249,13 +244,11 @@ static int proc_readfd_common(struct file *file, struct dir_context *ctx, if (!proc_fill_cache(file, ctx, name, len, instantiate, p, &data)) - goto out_fd_loop; + goto out; cond_resched(); rcu_read_lock(); } rcu_read_unlock(); -out_fd_loop: - put_files_struct(files); out: put_task_struct(p); return 0;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric W. Biederman ebiederm@xmission.com
[ Upstream commit 775e0656b27210ae668e33af00bece858f44576f ]
When discussing[1] exec and posix file locks it was realized that none of the callers of get_files_struct fundamentally needed to call get_files_struct, and that by switching them to helper functions instead it will both simplify their code and remove unnecessary increments of files_struct.count. Those unnecessary increments can result in exec unnecessarily unsharing files_struct which breaking posix locks, and it can result in fget_light having to fallback to fget reducing system performance.
Instead hold task_lock for the duration that task->files needs to be stable in seq_show. The task_lock was already taken in get_files_struct, and so skipping get_files_struct performs less work overall, and avoids the problems with the files_struct reference count.
[1] https://lkml.kernel.org/r/20180915160423.GA31461@redhat.com Suggested-by: Oleg Nesterov oleg@redhat.com Acked-by: Christian Brauner christian.brauner@ubuntu.com v1: https://lkml.kernel.org/r/20200817220425.9389-12-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20201120231441.29911-17-ebiederm@xmission.com Signed-off-by: Eric W. Biederman ebiederm@xmission.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/proc/fd.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/fs/proc/fd.c b/fs/proc/fd.c index 72c1525b4b3eb..cb51763ed554b 100644 --- a/fs/proc/fd.c +++ b/fs/proc/fd.c @@ -28,9 +28,8 @@ static int seq_show(struct seq_file *m, void *v) if (!task) return -ENOENT;
- files = get_files_struct(task); - put_task_struct(task); - + task_lock(task); + files = task->files; if (files) { unsigned int fd = proc_fd(m->private);
@@ -47,8 +46,9 @@ static int seq_show(struct seq_file *m, void *v) ret = 0; } spin_unlock(&files->file_lock); - put_files_struct(files); } + task_unlock(task); + put_task_struct(task);
if (ret) return ret; @@ -57,6 +57,7 @@ static int seq_show(struct seq_file *m, void *v) (long long)file->f_pos, f_flags, real_mount(file->f_path.mnt)->mnt_id);
+ /* show_fd_locks() never deferences files so a stale value is safe */ show_fd_locks(m, file, files); if (seq_has_overflowed(m)) goto out;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric W. Biederman ebiederm@xmission.com
[ Upstream commit d74ba04d919ebe30bf47406819c18c6b50003d92 ]
The function __fd_install was added to support binder[1]. With binder fixed[2] there are no more users.
As fd_install just calls __fd_install with "files=current->files", merge them together by transforming the files parameter into a local variable initialized to current->files.
[1] f869e8a7f753 ("expose a low-level variant of fd_install() for binder") [2] 44d8047f1d87 ("binder: use standard functions to allocate fds") Acked-by: Christian Brauner christian.brauner@ubuntu.com v1: https://lkml.kernel.org/r/20200817220425.9389-14-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20201120231441.29911-18-ebiederm@xmission.com Signed-off-by: Eric W. Biederman ebiederm@xmission.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/file.c | 25 ++++++------------------- include/linux/fdtable.h | 2 -- 2 files changed, 6 insertions(+), 21 deletions(-)
diff --git a/fs/file.c b/fs/file.c index 9fa49e6298fba..a80deabe7f7dc 100644 --- a/fs/file.c +++ b/fs/file.c @@ -175,7 +175,7 @@ static int expand_fdtable(struct files_struct *files, unsigned int nr) spin_unlock(&files->file_lock); new_fdt = alloc_fdtable(nr);
- /* make sure all __fd_install() have seen resize_in_progress + /* make sure all fd_install() have seen resize_in_progress * or have finished their rcu_read_lock_sched() section. */ if (atomic_read(&files->count) > 1) @@ -198,7 +198,7 @@ static int expand_fdtable(struct files_struct *files, unsigned int nr) rcu_assign_pointer(files->fdt, new_fdt); if (cur_fdt != &files->fdtab) call_rcu(&cur_fdt->rcu, free_fdtable_rcu); - /* coupled with smp_rmb() in __fd_install() */ + /* coupled with smp_rmb() in fd_install() */ smp_wmb(); return 1; } @@ -613,17 +613,13 @@ EXPORT_SYMBOL(put_unused_fd); * It should never happen - if we allow dup2() do it, _really_ bad things * will follow. * - * NOTE: __fd_install() variant is really, really low-level; don't - * use it unless you are forced to by truly lousy API shoved down - * your throat. 'files' *MUST* be either current->files or obtained - * by get_files_struct(current) done by whoever had given it to you, - * or really bad things will happen. Normally you want to use - * fd_install() instead. + * This consumes the "file" refcount, so callers should treat it + * as if they had called fput(file). */
-void __fd_install(struct files_struct *files, unsigned int fd, - struct file *file) +void fd_install(unsigned int fd, struct file *file) { + struct files_struct *files = current->files; struct fdtable *fdt;
rcu_read_lock_sched(); @@ -645,15 +641,6 @@ void __fd_install(struct files_struct *files, unsigned int fd, rcu_read_unlock_sched(); }
-/* - * This consumes the "file" refcount, so callers should treat it - * as if they had called fput(file). - */ -void fd_install(unsigned int fd, struct file *file) -{ - __fd_install(current->files, fd, file); -} - EXPORT_SYMBOL(fd_install);
static struct file *pick_file(struct files_struct *files, unsigned fd) diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h index b0c6a959c6a00..6e8743a4c9d31 100644 --- a/include/linux/fdtable.h +++ b/include/linux/fdtable.h @@ -126,8 +126,6 @@ int iterate_fd(struct files_struct *, unsigned,
extern int __alloc_fd(struct files_struct *files, unsigned start, unsigned end, unsigned flags); -extern void __fd_install(struct files_struct *files, - unsigned int fd, struct file *file); extern int __close_fd(struct files_struct *files, unsigned int fd); extern int __close_range(unsigned int fd, unsigned int max_fd, unsigned int flags);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric W. Biederman ebiederm@xmission.com
Simplify the code, and remove the chance of races by reading RLIMIT_NOFILE only once in f_dupfd.
Pass the read value of RLIMIT_NOFILE into alloc_fd which is the other location the rlimit was read in f_dupfd. As f_dupfd is the only caller of alloc_fd this changing alloc_fd is trivially safe.
Further this causes alloc_fd to take all of the same arguments as __alloc_fd except for the files_struct argument.
Acked-by: Christian Brauner christian.brauner@ubuntu.com v1: https://lkml.kernel.org/r/20200817220425.9389-15-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20201120231441.29911-19-ebiederm@xmission.com Signed-off-by: Eric W. Biederman ebiederm@xmission.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/file.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/fs/file.c b/fs/file.c index a80deabe7f7dc..9e2b171b92520 100644 --- a/fs/file.c +++ b/fs/file.c @@ -567,9 +567,9 @@ int __alloc_fd(struct files_struct *files, return error; }
-static int alloc_fd(unsigned start, unsigned flags) +static int alloc_fd(unsigned start, unsigned end, unsigned flags) { - return __alloc_fd(current->files, start, rlimit(RLIMIT_NOFILE), flags); + return __alloc_fd(current->files, start, end, flags); }
int __get_unused_fd_flags(unsigned flags, unsigned long nofile) @@ -1235,10 +1235,11 @@ SYSCALL_DEFINE1(dup, unsigned int, fildes)
int f_dupfd(unsigned int from, struct file *file, unsigned flags) { + unsigned long nofile = rlimit(RLIMIT_NOFILE); int err; - if (from >= rlimit(RLIMIT_NOFILE)) + if (from >= nofile) return -EINVAL; - err = alloc_fd(from, flags); + err = alloc_fd(from, nofile, flags); if (err >= 0) { get_file(file); fd_install(err, file);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric W. Biederman ebiederm@xmission.com
[ Upstream commit aa384d10f3d06d4b85597ff5df41551262220e16 ]
The function __alloc_fd was added to support binder[1]. With binder fixed[2] there are no more users.
As alloc_fd just calls __alloc_fd with "files=current->files", merge them together by transforming the files parameter into a local variable initialized to current->files.
[1] dcfadfa4ec5a ("new helper: __alloc_fd()") [2] 44d8047f1d87 ("binder: use standard functions to allocate fds") Acked-by: Christian Brauner christian.brauner@ubuntu.com v1: https://lkml.kernel.org/r/20200817220425.9389-16-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20201120231441.29911-20-ebiederm@xmission.com Signed-off-by: Eric W. Biederman ebiederm@xmission.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/file.c | 11 +++-------- include/linux/fdtable.h | 2 -- 2 files changed, 3 insertions(+), 10 deletions(-)
diff --git a/fs/file.c b/fs/file.c index 9e2b171b92520..48d0306e42ccc 100644 --- a/fs/file.c +++ b/fs/file.c @@ -509,9 +509,9 @@ static unsigned int find_next_fd(struct fdtable *fdt, unsigned int start) /* * allocate a file descriptor, mark it busy. */ -int __alloc_fd(struct files_struct *files, - unsigned start, unsigned end, unsigned flags) +static int alloc_fd(unsigned start, unsigned end, unsigned flags) { + struct files_struct *files = current->files; unsigned int fd; int error; struct fdtable *fdt; @@ -567,14 +567,9 @@ int __alloc_fd(struct files_struct *files, return error; }
-static int alloc_fd(unsigned start, unsigned end, unsigned flags) -{ - return __alloc_fd(current->files, start, end, flags); -} - int __get_unused_fd_flags(unsigned flags, unsigned long nofile) { - return __alloc_fd(current->files, 0, nofile, flags); + return alloc_fd(0, nofile, flags); }
int get_unused_fd_flags(unsigned flags) diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h index 6e8743a4c9d31..d26b884fcc5cc 100644 --- a/include/linux/fdtable.h +++ b/include/linux/fdtable.h @@ -124,8 +124,6 @@ int iterate_fd(struct files_struct *, unsigned, int (*)(const void *, struct file *, unsigned), const void *);
-extern int __alloc_fd(struct files_struct *files, - unsigned start, unsigned end, unsigned flags); extern int __close_fd(struct files_struct *files, unsigned int fd); extern int __close_range(unsigned int fd, unsigned int max_fd, unsigned int flags);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric W. Biederman ebiederm@xmission.com
[ Upstream commit 8760c909f54a82aaa6e76da19afe798a0c77c3c3 ]
The function __close_fd was added to support binder[1]. Now that binder has been fixed to no longer need __close_fd[2] all calls to __close_fd pass current->files.
Therefore transform the files parameter into a local variable initialized to current->files, and rename __close_fd to close_fd to reflect this change, and keep it in sync with the similar changes to __alloc_fd, and __fd_install.
This removes the need for callers to care about the extra care that needs to be take if anything except current->files is passed, by limiting the callers to only operation on current->files.
[1] 483ce1d4b8c3 ("take descriptor-related part of close() to file.c") [2] 44d8047f1d87 ("binder: use standard functions to allocate fds") Acked-by: Christian Brauner christian.brauner@ubuntu.com v1: https://lkml.kernel.org/r/20200817220425.9389-17-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20201120231441.29911-21-ebiederm@xmission.com Signed-off-by: Eric W. Biederman ebiederm@xmission.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/file.c | 10 ++++------ fs/open.c | 2 +- include/linux/fdtable.h | 3 +-- include/linux/syscalls.h | 6 +++--- 4 files changed, 9 insertions(+), 12 deletions(-)
diff --git a/fs/file.c b/fs/file.c index 48d0306e42ccc..fdb84a64724b7 100644 --- a/fs/file.c +++ b/fs/file.c @@ -659,11 +659,9 @@ static struct file *pick_file(struct files_struct *files, unsigned fd) return file; }
-/* - * The same warnings as for __alloc_fd()/__fd_install() apply here... - */ -int __close_fd(struct files_struct *files, unsigned fd) +int close_fd(unsigned fd) { + struct files_struct *files = current->files; struct file *file;
file = pick_file(files, fd); @@ -672,7 +670,7 @@ int __close_fd(struct files_struct *files, unsigned fd)
return filp_close(file, files); } -EXPORT_SYMBOL(__close_fd); /* for ksys_close() */ +EXPORT_SYMBOL(close_fd); /* for ksys_close() */
/** * __close_range() - Close all file descriptors in a given range. @@ -1087,7 +1085,7 @@ int replace_fd(unsigned fd, struct file *file, unsigned flags) struct files_struct *files = current->files;
if (!file) - return __close_fd(files, fd); + return close_fd(fd);
if (fd >= rlimit(RLIMIT_NOFILE)) return -EBADF; diff --git a/fs/open.c b/fs/open.c index 83f62cf1432c8..48933cbb75391 100644 --- a/fs/open.c +++ b/fs/open.c @@ -1310,7 +1310,7 @@ EXPORT_SYMBOL(filp_close); */ SYSCALL_DEFINE1(close, unsigned int, fd) { - int retval = __close_fd(current->files, fd); + int retval = close_fd(fd);
/* can't restart close syscall because file table entry was cleared */ if (unlikely(retval == -ERESTARTSYS || diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h index d26b884fcc5cc..4ed3589f9294e 100644 --- a/include/linux/fdtable.h +++ b/include/linux/fdtable.h @@ -124,8 +124,7 @@ int iterate_fd(struct files_struct *, unsigned, int (*)(const void *, struct file *, unsigned), const void *);
-extern int __close_fd(struct files_struct *files, - unsigned int fd); +extern int close_fd(unsigned int fd); extern int __close_range(unsigned int fd, unsigned int max_fd, unsigned int flags); extern int close_fd_get_file(unsigned int fd, struct file **res); extern int unshare_fd(unsigned long unshare_flags, unsigned int max_fds, diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 17a24e1180dad..0bc3dd86f9e50 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -1320,16 +1320,16 @@ static inline long ksys_ftruncate(unsigned int fd, loff_t length) return do_sys_ftruncate(fd, length, 1); }
-extern int __close_fd(struct files_struct *files, unsigned int fd); +extern int close_fd(unsigned int fd);
/* * In contrast to sys_close(), this stub does not check whether the syscall * should or should not be restarted, but returns the raw error codes from - * __close_fd(). + * close_fd(). */ static inline int ksys_close(unsigned int fd) { - return __close_fd(current->files, fd); + return close_fd(fd); }
extern long do_sys_truncate(const char __user *pathname, loff_t length);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric W. Biederman ebiederm@xmission.com
[ Upstream commit 1572bfdf21d4d50e51941498ffe0b56c2289f783 ]
Now that ksys_close is exactly identical to close_fd replace the one caller of ksys_close with close_fd.
[1] https://lkml.kernel.org/r/20200818112020.GA17080@infradead.org Suggested-by: Christoph Hellwig hch@infradead.org Link: https://lkml.kernel.org/r/20201120231441.29911-22-ebiederm@xmission.com Signed-off-by: Eric W. Biederman ebiederm@xmission.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/autofs/dev-ioctl.c | 5 +++-- include/linux/syscalls.h | 12 ------------ 2 files changed, 3 insertions(+), 14 deletions(-)
diff --git a/fs/autofs/dev-ioctl.c b/fs/autofs/dev-ioctl.c index 322b7dfb4ea01..5bf781ea6d676 100644 --- a/fs/autofs/dev-ioctl.c +++ b/fs/autofs/dev-ioctl.c @@ -4,9 +4,10 @@ * Copyright 2008 Ian Kent raven@themaw.net */
+#include <linux/module.h> #include <linux/miscdevice.h> #include <linux/compat.h> -#include <linux/syscalls.h> +#include <linux/fdtable.h> #include <linux/magic.h> #include <linux/nospec.h>
@@ -289,7 +290,7 @@ static int autofs_dev_ioctl_closemount(struct file *fp, struct autofs_sb_info *sbi, struct autofs_dev_ioctl *param) { - return ksys_close(param->ioctlfd); + return close_fd(param->ioctlfd); }
/* diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 0bc3dd86f9e50..1ea422b1a9f1c 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -1320,18 +1320,6 @@ static inline long ksys_ftruncate(unsigned int fd, loff_t length) return do_sys_ftruncate(fd, length, 1); }
-extern int close_fd(unsigned int fd); - -/* - * In contrast to sys_close(), this stub does not check whether the syscall - * should or should not be restarted, but returns the raw error codes from - * close_fd(). - */ -static inline int ksys_close(unsigned int fd) -{ - return close_fd(fd); -} - extern long do_sys_truncate(const char __user *pathname, loff_t length);
static inline long ksys_truncate(const char __user *pathname, loff_t length)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Waiman Long longman@redhat.com
[ Upstream commit 92890123749bafc317bbfacbe0a62ce08d78efb7 ]
The default value of inotify.max_user_watches sysctl parameter was set to 8192 since the introduction of the inotify feature in 2005 by commit 0eeca28300df ("[PATCH] inotify"). Today this value is just too small for many modern usage. As a result, users have to explicitly set it to a larger value to make it work.
After some searching around the web, these are the inotify.max_user_watches values used by some projects: - vscode: 524288 - dropbox support: 100000 - users on stackexchange: 12228 - lsyncd user: 2000000 - code42 support: 1048576 - monodevelop: 16384 - tectonic: 524288 - openshift origin: 65536
Each watch point adds an inotify_inode_mark structure to an inode to be watched. It also pins the watched inode.
Modeled after the epoll.max_user_watches behavior to adjust the default value according to the amount of addressable memory available, make inotify.max_user_watches behave in a similar way to make it use no more than 1% of addressable memory within the range [8192, 1048576].
We estimate the amount of memory used by inotify mark to size of inotify_inode_mark plus two times the size of struct inode (we double the inode size to cover the additional filesystem private inode part). That means that a 64-bit system with 128GB or more memory will likely have the maximum value of 1048576 for inotify.max_user_watches. This default should be big enough for most use cases.
Link: https://lore.kernel.org/r/20201109035931.4740-1-longman@redhat.com Reviewed-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Waiman Long longman@redhat.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/inotify/inotify_user.c | 23 ++++++++++++++++++++++- 1 file changed, 22 insertions(+), 1 deletion(-)
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c index 32b6b97021bef..b564a32403aa5 100644 --- a/fs/notify/inotify/inotify_user.c +++ b/fs/notify/inotify/inotify_user.c @@ -37,6 +37,15 @@
#include <asm/ioctls.h>
+/* + * An inotify watch requires allocating an inotify_inode_mark structure as + * well as pinning the watched inode. Doubling the size of a VFS inode + * should be more than enough to cover the additional filesystem inode + * size increase. + */ +#define INOTIFY_WATCH_COST (sizeof(struct inotify_inode_mark) + \ + 2 * sizeof(struct inode)) + /* configurable via /proc/sys/fs/inotify/ */ static int inotify_max_queued_events __read_mostly;
@@ -797,6 +806,18 @@ SYSCALL_DEFINE2(inotify_rm_watch, int, fd, __s32, wd) */ static int __init inotify_user_setup(void) { + unsigned long watches_max; + struct sysinfo si; + + si_meminfo(&si); + /* + * Allow up to 1% of addressable memory to be allocated for inotify + * watches (per user) limited to the range [8192, 1048576]. + */ + watches_max = (((si.totalram - si.totalhigh) / 100) << PAGE_SHIFT) / + INOTIFY_WATCH_COST; + watches_max = clamp(watches_max, 8192UL, 1048576UL); + BUILD_BUG_ON(IN_ACCESS != FS_ACCESS); BUILD_BUG_ON(IN_MODIFY != FS_MODIFY); BUILD_BUG_ON(IN_ATTRIB != FS_ATTRIB); @@ -823,7 +844,7 @@ static int __init inotify_user_setup(void)
inotify_max_queued_events = 16384; init_user_ns.ucount_max[UCOUNT_INOTIFY_INSTANCES] = 128; - init_user_ns.ucount_max[UCOUNT_INOTIFY_WATCHES] = 8192; + init_user_ns.ucount_max[UCOUNT_INOTIFY_WATCHES] = watches_max;
return 0; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zheng Yongjun zhengyongjun3@huawei.com
[ Upstream commit 3316fb80a0b4c1fef03a3eb1a7f0651e2133c429 ]
Replace a comma between expression statements by a semicolon.
Signed-off-by: Zheng Yongjun zhengyongjun3@huawei.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/host.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/lockd/host.c b/fs/lockd/host.c index 771c289f6df7f..f802223e71abe 100644 --- a/fs/lockd/host.c +++ b/fs/lockd/host.c @@ -163,7 +163,7 @@ static struct nlm_host *nlm_alloc_host(struct nlm_lookup_host_info *ni, host->h_nsmhandle = nsm; host->h_addrbuf = nsm->sm_addrbuf; host->net = ni->net; - host->h_cred = get_cred(ni->cred), + host->h_cred = get_cred(ni->cred); strlcpy(host->nodename, utsname()->nodename, sizeof(host->nodename));
out:
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit d6c9e4368cc6a61bf25c9c72437ced509c854563 ]
fs/nfsd/nfssvc.c:36:6: warning: symbol 'inter_copy_offload_enable' was not declared. Should it be static?
The parameter was added by commit ce0887ac96d3 ("NFSD add nfs4 inter ssc to nfsd4_copy"). Relocate it into the source file that uses it, and make it static. This approach is similar to the nfs4_disable_idmapping, cltrack_prog, and cltrack_legacy_disable module parameters.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 5 +++++ fs/nfsd/nfssvc.c | 6 ------ fs/nfsd/xdr4.h | 1 - 3 files changed, 5 insertions(+), 7 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 6b06f0ad05615..1ef98398362a5 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -50,6 +50,11 @@ #include "pnfs.h" #include "trace.h"
+static bool inter_copy_offload_enable; +module_param(inter_copy_offload_enable, bool, 0644); +MODULE_PARM_DESC(inter_copy_offload_enable, + "Enable inter server to server copy offload. Default: false"); + #ifdef CONFIG_NFSD_V4_SECURITY_LABEL #include <linux/security.h>
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 3fb9607d67a37..423410cc02145 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -33,12 +33,6 @@
#define NFSDDBG_FACILITY NFSDDBG_SVC
-bool inter_copy_offload_enable; -EXPORT_SYMBOL_GPL(inter_copy_offload_enable); -module_param(inter_copy_offload_enable, bool, 0644); -MODULE_PARM_DESC(inter_copy_offload_enable, - "Enable inter server to server copy offload. Default: false"); - extern struct svc_program nfsd_program; static int nfsd(void *vrqstp); #if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index a60ff5ce1a375..c300885ae75dd 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -568,7 +568,6 @@ struct nfsd4_copy { struct nfs_fh c_fh; nfs4_stateid stateid; }; -extern bool inter_copy_offload_enable;
struct nfsd4_seek { /* request */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 7b723008f9c95624c848fad661c01b06e47b20da ]
While converting the NFSv4 decoder to use xdr_stream-based XDR processing, I removed the old SAVEMEM() macro. This macro wrapped a bit of logic that avoided a memory allocation by recognizing when the decoded item resides in a linear section of the Receive buffer. In that case, it returned a pointer into that buffer instead of allocating a bounce buffer.
The bounce buffer is necessary only when xdr_inline_decode() has placed the decoded item in the xdr_stream's scratch buffer, which disappears the next time xdr_inline_decode() is called with that xdr_stream. That happens only if the data item crosses a page boundary in the receive buffer, an exceedingly rare occurrence.
Allocating a bounce buffer every time results in a minor performance regression that was introduced by the recent NFSv4 decoder overhaul. Let's restore the previous behavior. On average, it saves about 1.5 kmalloc() calls per COMPOUND.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 42 ++++++++++++++++++++++++++---------------- 1 file changed, 26 insertions(+), 16 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index bb037e8eb8304..8d5fdae568aeb 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -147,6 +147,25 @@ svcxdr_dupstr(struct nfsd4_compoundargs *argp, void *buf, u32 len) return p; }
+static void * +svcxdr_savemem(struct nfsd4_compoundargs *argp, __be32 *p, u32 len) +{ + __be32 *tmp; + + /* + * The location of the decoded data item is stable, + * so @p is OK to use. This is the common case. + */ + if (p != argp->xdr->scratch.iov_base) + return p; + + tmp = svcxdr_tmpalloc(argp, len); + if (!tmp) + return NULL; + memcpy(tmp, p, len); + return tmp; +} + /* * NFSv4 basic data type decoders */ @@ -183,11 +202,10 @@ nfsd4_decode_opaque(struct nfsd4_compoundargs *argp, struct xdr_netobj *o) p = xdr_inline_decode(argp->xdr, len); if (!p) return nfserr_bad_xdr; - o->data = svcxdr_tmpalloc(argp, len); + o->data = svcxdr_savemem(argp, p, len); if (!o->data) return nfserr_jukebox; o->len = len; - memcpy(o->data, p, len);
return nfs_ok; } @@ -205,10 +223,9 @@ nfsd4_decode_component4(struct nfsd4_compoundargs *argp, char **namp, u32 *lenp) status = check_filename((char *)p, *lenp); if (status) return status; - *namp = svcxdr_tmpalloc(argp, *lenp); + *namp = svcxdr_savemem(argp, p, *lenp); if (!*namp) return nfserr_jukebox; - memcpy(*namp, p, *lenp);
return nfs_ok; } @@ -1200,10 +1217,9 @@ nfsd4_decode_putfh(struct nfsd4_compoundargs *argp, struct nfsd4_putfh *putfh) p = xdr_inline_decode(argp->xdr, putfh->pf_fhlen); if (!p) return nfserr_bad_xdr; - putfh->pf_fhval = svcxdr_tmpalloc(argp, putfh->pf_fhlen); + putfh->pf_fhval = svcxdr_savemem(argp, p, putfh->pf_fhlen); if (!putfh->pf_fhval) return nfserr_jukebox; - memcpy(putfh->pf_fhval, p, putfh->pf_fhlen);
return nfs_ok; } @@ -1318,24 +1334,20 @@ nfsd4_decode_setclientid(struct nfsd4_compoundargs *argp, struct nfsd4_setclient p = xdr_inline_decode(argp->xdr, setclientid->se_callback_netid_len); if (!p) return nfserr_bad_xdr; - setclientid->se_callback_netid_val = svcxdr_tmpalloc(argp, + setclientid->se_callback_netid_val = svcxdr_savemem(argp, p, setclientid->se_callback_netid_len); if (!setclientid->se_callback_netid_val) return nfserr_jukebox; - memcpy(setclientid->se_callback_netid_val, p, - setclientid->se_callback_netid_len);
if (xdr_stream_decode_u32(argp->xdr, &setclientid->se_callback_addr_len) < 0) return nfserr_bad_xdr; p = xdr_inline_decode(argp->xdr, setclientid->se_callback_addr_len); if (!p) return nfserr_bad_xdr; - setclientid->se_callback_addr_val = svcxdr_tmpalloc(argp, + setclientid->se_callback_addr_val = svcxdr_savemem(argp, p, setclientid->se_callback_addr_len); if (!setclientid->se_callback_addr_val) return nfserr_jukebox; - memcpy(setclientid->se_callback_addr_val, p, - setclientid->se_callback_addr_len); if (xdr_stream_decode_u32(argp->xdr, &setclientid->se_callback_ident) < 0) return nfserr_bad_xdr;
@@ -1375,10 +1387,9 @@ nfsd4_decode_verify(struct nfsd4_compoundargs *argp, struct nfsd4_verify *verify p = xdr_inline_decode(argp->xdr, verify->ve_attrlen); if (!p) return nfserr_bad_xdr; - verify->ve_attrval = svcxdr_tmpalloc(argp, verify->ve_attrlen); + verify->ve_attrval = svcxdr_savemem(argp, p, verify->ve_attrlen); if (!verify->ve_attrval) return nfserr_jukebox; - memcpy(verify->ve_attrval, p, verify->ve_attrlen);
return nfs_ok; } @@ -2333,10 +2344,9 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp) p = xdr_inline_decode(argp->xdr, argp->taglen); if (!p) return 0; - argp->tag = svcxdr_tmpalloc(argp, argp->taglen); + argp->tag = svcxdr_savemem(argp, p, argp->taglen); if (!argp->tag) return 0; - memcpy(argp->tag, p, argp->taglen); max_reply += xdr_align_size(argp->taglen); }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 2289e87b5951f97783f07fc895e6c5e804b53668 ]
The next few patches will employ these strings to help make server- side trace logs more human-readable. A similar technique is already in use in kernel RPC client code.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svc4proc.c | 24 ++++++++++++++++++++++++ fs/lockd/svcproc.c | 24 ++++++++++++++++++++++++ fs/nfs/callback_xdr.c | 2 ++ fs/nfsd/nfs2acl.c | 5 +++++ fs/nfsd/nfs3acl.c | 3 +++ fs/nfsd/nfs3proc.c | 22 ++++++++++++++++++++++ fs/nfsd/nfs4proc.c | 2 ++ fs/nfsd/nfsproc.c | 18 ++++++++++++++++++ include/linux/sunrpc/svc.h | 1 + 9 files changed, 101 insertions(+)
diff --git a/fs/lockd/svc4proc.c b/fs/lockd/svc4proc.c index fa41dda399259..4c10fb5138f10 100644 --- a/fs/lockd/svc4proc.c +++ b/fs/lockd/svc4proc.c @@ -512,6 +512,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "NULL", }, [NLMPROC_TEST] = { .pc_func = nlm4svc_proc_test, @@ -520,6 +521,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St+2+No+Rg, + .pc_name = "TEST", }, [NLMPROC_LOCK] = { .pc_func = nlm4svc_proc_lock, @@ -528,6 +530,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, + .pc_name = "LOCK", }, [NLMPROC_CANCEL] = { .pc_func = nlm4svc_proc_cancel, @@ -536,6 +539,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, + .pc_name = "CANCEL", }, [NLMPROC_UNLOCK] = { .pc_func = nlm4svc_proc_unlock, @@ -544,6 +548,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, + .pc_name = "UNLOCK", }, [NLMPROC_GRANTED] = { .pc_func = nlm4svc_proc_granted, @@ -552,6 +557,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, + .pc_name = "GRANTED", }, [NLMPROC_TEST_MSG] = { .pc_func = nlm4svc_proc_test_msg, @@ -560,6 +566,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "TEST_MSG", }, [NLMPROC_LOCK_MSG] = { .pc_func = nlm4svc_proc_lock_msg, @@ -568,6 +575,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "LOCK_MSG", }, [NLMPROC_CANCEL_MSG] = { .pc_func = nlm4svc_proc_cancel_msg, @@ -576,6 +584,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "CANCEL_MSG", }, [NLMPROC_UNLOCK_MSG] = { .pc_func = nlm4svc_proc_unlock_msg, @@ -584,6 +593,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "UNLOCK_MSG", }, [NLMPROC_GRANTED_MSG] = { .pc_func = nlm4svc_proc_granted_msg, @@ -592,6 +602,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "GRANTED_MSG", }, [NLMPROC_TEST_RES] = { .pc_func = nlm4svc_proc_null, @@ -600,6 +611,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "TEST_RES", }, [NLMPROC_LOCK_RES] = { .pc_func = nlm4svc_proc_null, @@ -608,6 +620,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "LOCK_RES", }, [NLMPROC_CANCEL_RES] = { .pc_func = nlm4svc_proc_null, @@ -616,6 +629,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "CANCEL_RES", }, [NLMPROC_UNLOCK_RES] = { .pc_func = nlm4svc_proc_null, @@ -624,6 +638,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "UNLOCK_RES", }, [NLMPROC_GRANTED_RES] = { .pc_func = nlm4svc_proc_granted_res, @@ -632,6 +647,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "GRANTED_RES", }, [NLMPROC_NSM_NOTIFY] = { .pc_func = nlm4svc_proc_sm_notify, @@ -640,6 +656,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_reboot), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "SM_NOTIFY", }, [17] = { .pc_func = nlm4svc_proc_unused, @@ -648,6 +665,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = 0, + .pc_name = "UNUSED", }, [18] = { .pc_func = nlm4svc_proc_unused, @@ -656,6 +674,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = 0, + .pc_name = "UNUSED", }, [19] = { .pc_func = nlm4svc_proc_unused, @@ -664,6 +683,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = 0, + .pc_name = "UNUSED", }, [NLMPROC_SHARE] = { .pc_func = nlm4svc_proc_share, @@ -672,6 +692,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St+1, + .pc_name = "SHARE", }, [NLMPROC_UNSHARE] = { .pc_func = nlm4svc_proc_unshare, @@ -680,6 +701,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St+1, + .pc_name = "UNSHARE", }, [NLMPROC_NM_LOCK] = { .pc_func = nlm4svc_proc_nm_lock, @@ -688,6 +710,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, + .pc_name = "NM_LOCK", }, [NLMPROC_FREE_ALL] = { .pc_func = nlm4svc_proc_free_all, @@ -696,5 +719,6 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "FREE_ALL", }, }; diff --git a/fs/lockd/svcproc.c b/fs/lockd/svcproc.c index 50855f2c1f4b8..4ae4b63b53925 100644 --- a/fs/lockd/svcproc.c +++ b/fs/lockd/svcproc.c @@ -554,6 +554,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "NULL", }, [NLMPROC_TEST] = { .pc_func = nlmsvc_proc_test, @@ -562,6 +563,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St+2+No+Rg, + .pc_name = "TEST", }, [NLMPROC_LOCK] = { .pc_func = nlmsvc_proc_lock, @@ -570,6 +572,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, + .pc_name = "LOCK", }, [NLMPROC_CANCEL] = { .pc_func = nlmsvc_proc_cancel, @@ -578,6 +581,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, + .pc_name = "CANCEL", }, [NLMPROC_UNLOCK] = { .pc_func = nlmsvc_proc_unlock, @@ -586,6 +590,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, + .pc_name = "UNLOCK", }, [NLMPROC_GRANTED] = { .pc_func = nlmsvc_proc_granted, @@ -594,6 +599,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, + .pc_name = "GRANTED", }, [NLMPROC_TEST_MSG] = { .pc_func = nlmsvc_proc_test_msg, @@ -602,6 +608,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "TEST_MSG", }, [NLMPROC_LOCK_MSG] = { .pc_func = nlmsvc_proc_lock_msg, @@ -610,6 +617,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "LOCK_MSG", }, [NLMPROC_CANCEL_MSG] = { .pc_func = nlmsvc_proc_cancel_msg, @@ -618,6 +626,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "CANCEL_MSG", }, [NLMPROC_UNLOCK_MSG] = { .pc_func = nlmsvc_proc_unlock_msg, @@ -626,6 +635,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "UNLOCK_MSG", }, [NLMPROC_GRANTED_MSG] = { .pc_func = nlmsvc_proc_granted_msg, @@ -634,6 +644,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "GRANTED_MSG", }, [NLMPROC_TEST_RES] = { .pc_func = nlmsvc_proc_null, @@ -642,6 +653,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "TEST_RES", }, [NLMPROC_LOCK_RES] = { .pc_func = nlmsvc_proc_null, @@ -650,6 +662,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "LOCK_RES", }, [NLMPROC_CANCEL_RES] = { .pc_func = nlmsvc_proc_null, @@ -658,6 +671,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "CANCEL_RES", }, [NLMPROC_UNLOCK_RES] = { .pc_func = nlmsvc_proc_null, @@ -666,6 +680,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "UNLOCK_RES", }, [NLMPROC_GRANTED_RES] = { .pc_func = nlmsvc_proc_granted_res, @@ -674,6 +689,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "GRANTED_RES", }, [NLMPROC_NSM_NOTIFY] = { .pc_func = nlmsvc_proc_sm_notify, @@ -682,6 +698,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_reboot), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "SM_NOTIFY", }, [17] = { .pc_func = nlmsvc_proc_unused, @@ -690,6 +707,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "UNUSED", }, [18] = { .pc_func = nlmsvc_proc_unused, @@ -698,6 +716,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "UNUSED", }, [19] = { .pc_func = nlmsvc_proc_unused, @@ -706,6 +725,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "UNUSED", }, [NLMPROC_SHARE] = { .pc_func = nlmsvc_proc_share, @@ -714,6 +734,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St+1, + .pc_name = "SHARE", }, [NLMPROC_UNSHARE] = { .pc_func = nlmsvc_proc_unshare, @@ -722,6 +743,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St+1, + .pc_name = "UNSHARE", }, [NLMPROC_NM_LOCK] = { .pc_func = nlmsvc_proc_nm_lock, @@ -730,6 +752,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, + .pc_name = "NM_LOCK", }, [NLMPROC_FREE_ALL] = { .pc_func = nlmsvc_proc_free_all, @@ -738,5 +761,6 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = 0, + .pc_name = "FREE_ALL", }, }; diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c index ca8a4aa351dc9..f9dfd4e712a30 100644 --- a/fs/nfs/callback_xdr.c +++ b/fs/nfs/callback_xdr.c @@ -1056,6 +1056,7 @@ static const struct svc_procedure nfs4_callback_procedures1[] = { .pc_decode = nfs4_decode_void, .pc_encode = nfs4_encode_void, .pc_xdrressize = 1, + .pc_name = "NULL", }, [CB_COMPOUND] = { .pc_func = nfs4_callback_compound, @@ -1063,6 +1064,7 @@ static const struct svc_procedure nfs4_callback_procedures1[] = { .pc_argsize = 256, .pc_ressize = 256, .pc_xdrressize = NFS4_CALLBACK_BUFSIZE, + .pc_name = "COMPOUND", } };
diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index b0f66604532a5..899762da23c92 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -371,6 +371,7 @@ static const struct svc_procedure nfsd_acl_procedures2[5] = { .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST, + .pc_name = "NULL", }, [ACLPROC2_GETACL] = { .pc_func = nfsacld_proc_getacl, @@ -381,6 +382,7 @@ static const struct svc_procedure nfsd_acl_procedures2[5] = { .pc_ressize = sizeof(struct nfsd3_getaclres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+1+2*(1+ACL), + .pc_name = "GETACL", }, [ACLPROC2_SETACL] = { .pc_func = nfsacld_proc_setacl, @@ -391,6 +393,7 @@ static const struct svc_procedure nfsd_acl_procedures2[5] = { .pc_ressize = sizeof(struct nfsd_attrstat), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+AT, + .pc_name = "SETACL", }, [ACLPROC2_GETATTR] = { .pc_func = nfsacld_proc_getattr, @@ -401,6 +404,7 @@ static const struct svc_procedure nfsd_acl_procedures2[5] = { .pc_ressize = sizeof(struct nfsd_attrstat), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+AT, + .pc_name = "GETATTR", }, [ACLPROC2_ACCESS] = { .pc_func = nfsacld_proc_access, @@ -411,6 +415,7 @@ static const struct svc_procedure nfsd_acl_procedures2[5] = { .pc_ressize = sizeof(struct nfsd3_accessres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+AT+1, + .pc_name = "SETATTR", }, };
diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c index 7c30876a31a1b..9e1a92fb97712 100644 --- a/fs/nfsd/nfs3acl.c +++ b/fs/nfsd/nfs3acl.c @@ -251,6 +251,7 @@ static const struct svc_procedure nfsd_acl_procedures3[3] = { .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST, + .pc_name = "NULL", }, [ACLPROC3_GETACL] = { .pc_func = nfsd3_proc_getacl, @@ -261,6 +262,7 @@ static const struct svc_procedure nfsd_acl_procedures3[3] = { .pc_ressize = sizeof(struct nfsd3_getaclres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+1+2*(1+ACL), + .pc_name = "GETACL", }, [ACLPROC3_SETACL] = { .pc_func = nfsd3_proc_setacl, @@ -271,6 +273,7 @@ static const struct svc_procedure nfsd_acl_procedures3[3] = { .pc_ressize = sizeof(struct nfsd3_attrstat), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+pAT, + .pc_name = "SETACL", }, };
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 3257233d1a655..0f79e007c620f 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -713,6 +713,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST, + .pc_name = "NULL", }, [NFS3PROC_GETATTR] = { .pc_func = nfsd3_proc_getattr, @@ -723,6 +724,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd3_attrstatres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+AT, + .pc_name = "GETATTR", }, [NFS3PROC_SETATTR] = { .pc_func = nfsd3_proc_setattr, @@ -733,6 +735,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd3_wccstatres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+WC, + .pc_name = "SETATTR", }, [NFS3PROC_LOOKUP] = { .pc_func = nfsd3_proc_lookup, @@ -743,6 +746,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd3_diropres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+FH+pAT+pAT, + .pc_name = "LOOKUP", }, [NFS3PROC_ACCESS] = { .pc_func = nfsd3_proc_access, @@ -753,6 +757,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd3_accessres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+pAT+1, + .pc_name = "ACCESS", }, [NFS3PROC_READLINK] = { .pc_func = nfsd3_proc_readlink, @@ -763,6 +768,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd3_readlinkres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+pAT+1+NFS3_MAXPATHLEN/4, + .pc_name = "READLINK", }, [NFS3PROC_READ] = { .pc_func = nfsd3_proc_read, @@ -773,6 +779,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd3_readres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+pAT+4+NFSSVC_MAXBLKSIZE/4, + .pc_name = "READ", }, [NFS3PROC_WRITE] = { .pc_func = nfsd3_proc_write, @@ -783,6 +790,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd3_writeres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+WC+4, + .pc_name = "WRITE", }, [NFS3PROC_CREATE] = { .pc_func = nfsd3_proc_create, @@ -793,6 +801,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd3_createres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+(1+FH+pAT)+WC, + .pc_name = "CREATE", }, [NFS3PROC_MKDIR] = { .pc_func = nfsd3_proc_mkdir, @@ -803,6 +812,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd3_createres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+(1+FH+pAT)+WC, + .pc_name = "MKDIR", }, [NFS3PROC_SYMLINK] = { .pc_func = nfsd3_proc_symlink, @@ -813,6 +823,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd3_createres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+(1+FH+pAT)+WC, + .pc_name = "SYMLINK", }, [NFS3PROC_MKNOD] = { .pc_func = nfsd3_proc_mknod, @@ -823,6 +834,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd3_createres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+(1+FH+pAT)+WC, + .pc_name = "MKNOD", }, [NFS3PROC_REMOVE] = { .pc_func = nfsd3_proc_remove, @@ -833,6 +845,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd3_wccstatres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+WC, + .pc_name = "REMOVE", }, [NFS3PROC_RMDIR] = { .pc_func = nfsd3_proc_rmdir, @@ -843,6 +856,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd3_wccstatres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+WC, + .pc_name = "RMDIR", }, [NFS3PROC_RENAME] = { .pc_func = nfsd3_proc_rename, @@ -853,6 +867,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd3_renameres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+WC+WC, + .pc_name = "RENAME", }, [NFS3PROC_LINK] = { .pc_func = nfsd3_proc_link, @@ -863,6 +878,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd3_linkres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+pAT+WC, + .pc_name = "LINK", }, [NFS3PROC_READDIR] = { .pc_func = nfsd3_proc_readdir, @@ -872,6 +888,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_argsize = sizeof(struct nfsd3_readdirargs), .pc_ressize = sizeof(struct nfsd3_readdirres), .pc_cachetype = RC_NOCACHE, + .pc_name = "READDIR", }, [NFS3PROC_READDIRPLUS] = { .pc_func = nfsd3_proc_readdirplus, @@ -881,6 +898,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_argsize = sizeof(struct nfsd3_readdirplusargs), .pc_ressize = sizeof(struct nfsd3_readdirres), .pc_cachetype = RC_NOCACHE, + .pc_name = "READDIRPLUS", }, [NFS3PROC_FSSTAT] = { .pc_func = nfsd3_proc_fsstat, @@ -890,6 +908,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd3_fsstatres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+pAT+2*6+1, + .pc_name = "FSSTAT", }, [NFS3PROC_FSINFO] = { .pc_func = nfsd3_proc_fsinfo, @@ -899,6 +918,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd3_fsinfores), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+pAT+12, + .pc_name = "FSINFO", }, [NFS3PROC_PATHCONF] = { .pc_func = nfsd3_proc_pathconf, @@ -908,6 +928,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd3_pathconfres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+pAT+6, + .pc_name = "PATHCONF", }, [NFS3PROC_COMMIT] = { .pc_func = nfsd3_proc_commit, @@ -918,6 +939,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd3_commitres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+WC+2, + .pc_name = "COMMIT", }, };
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 1ef98398362a5..a5e1f5c1a4d64 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -3299,6 +3299,7 @@ static const struct svc_procedure nfsd_procedures4[2] = { .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = 1, + .pc_name = "NULL", }, [NFSPROC4_COMPOUND] = { .pc_func = nfsd4_proc_compound, @@ -3309,6 +3310,7 @@ static const struct svc_procedure nfsd_procedures4[2] = { .pc_release = nfsd4_release_compoundargs, .pc_cachetype = RC_NOCACHE, .pc_xdrressize = NFSD_BUFSIZE/4, + .pc_name = "COMPOUND", }, };
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index dbd8d36046539..f22f70f63b53e 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -623,6 +623,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = 0, + .pc_name = "NULL", }, [NFSPROC_GETATTR] = { .pc_func = nfsd_proc_getattr, @@ -633,6 +634,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_ressize = sizeof(struct nfsd_attrstat), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+AT, + .pc_name = "GETATTR", }, [NFSPROC_SETATTR] = { .pc_func = nfsd_proc_setattr, @@ -643,6 +645,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_ressize = sizeof(struct nfsd_attrstat), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+AT, + .pc_name = "SETATTR", }, [NFSPROC_ROOT] = { .pc_func = nfsd_proc_root, @@ -652,6 +655,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = 0, + .pc_name = "ROOT", }, [NFSPROC_LOOKUP] = { .pc_func = nfsd_proc_lookup, @@ -662,6 +666,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_ressize = sizeof(struct nfsd_diropres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+FH+AT, + .pc_name = "LOOKUP", }, [NFSPROC_READLINK] = { .pc_func = nfsd_proc_readlink, @@ -671,6 +676,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_ressize = sizeof(struct nfsd_readlinkres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+1+NFS_MAXPATHLEN/4, + .pc_name = "READLINK", }, [NFSPROC_READ] = { .pc_func = nfsd_proc_read, @@ -681,6 +687,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_ressize = sizeof(struct nfsd_readres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+AT+1+NFSSVC_MAXBLKSIZE_V2/4, + .pc_name = "READ", }, [NFSPROC_WRITECACHE] = { .pc_func = nfsd_proc_writecache, @@ -690,6 +697,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = 0, + .pc_name = "WRITECACHE", }, [NFSPROC_WRITE] = { .pc_func = nfsd_proc_write, @@ -700,6 +708,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_ressize = sizeof(struct nfsd_attrstat), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+AT, + .pc_name = "WRITE", }, [NFSPROC_CREATE] = { .pc_func = nfsd_proc_create, @@ -710,6 +719,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_ressize = sizeof(struct nfsd_diropres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+FH+AT, + .pc_name = "CREATE", }, [NFSPROC_REMOVE] = { .pc_func = nfsd_proc_remove, @@ -719,6 +729,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_ressize = sizeof(struct nfsd_stat), .pc_cachetype = RC_REPLSTAT, .pc_xdrressize = ST, + .pc_name = "REMOVE", }, [NFSPROC_RENAME] = { .pc_func = nfsd_proc_rename, @@ -728,6 +739,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_ressize = sizeof(struct nfsd_stat), .pc_cachetype = RC_REPLSTAT, .pc_xdrressize = ST, + .pc_name = "RENAME", }, [NFSPROC_LINK] = { .pc_func = nfsd_proc_link, @@ -737,6 +749,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_ressize = sizeof(struct nfsd_stat), .pc_cachetype = RC_REPLSTAT, .pc_xdrressize = ST, + .pc_name = "LINK", }, [NFSPROC_SYMLINK] = { .pc_func = nfsd_proc_symlink, @@ -746,6 +759,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_ressize = sizeof(struct nfsd_stat), .pc_cachetype = RC_REPLSTAT, .pc_xdrressize = ST, + .pc_name = "SYMLINK", }, [NFSPROC_MKDIR] = { .pc_func = nfsd_proc_mkdir, @@ -756,6 +770,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_ressize = sizeof(struct nfsd_diropres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+FH+AT, + .pc_name = "MKDIR", }, [NFSPROC_RMDIR] = { .pc_func = nfsd_proc_rmdir, @@ -765,6 +780,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_ressize = sizeof(struct nfsd_stat), .pc_cachetype = RC_REPLSTAT, .pc_xdrressize = ST, + .pc_name = "RMDIR", }, [NFSPROC_READDIR] = { .pc_func = nfsd_proc_readdir, @@ -773,6 +789,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_argsize = sizeof(struct nfsd_readdirargs), .pc_ressize = sizeof(struct nfsd_readdirres), .pc_cachetype = RC_NOCACHE, + .pc_name = "READDIR", }, [NFSPROC_STATFS] = { .pc_func = nfsd_proc_statfs, @@ -782,6 +799,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_ressize = sizeof(struct nfsd_statfsres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+5, + .pc_name = "STATFS", }, };
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 34c2a69820e93..31ee3b6047c30 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -463,6 +463,7 @@ struct svc_procedure { unsigned int pc_ressize; /* result struct size */ unsigned int pc_cachetype; /* cache info (NFS) */ unsigned int pc_xdrressize; /* maximum size of XDR reply */ + const char * pc_name; /* for display */ };
/*
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 89ff87494c6e4b32ea7960d0c644efdbb2fe6ef5 ]
Make the sunrpc trace subsystem trace events easier to use.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/trace/events/sunrpc.h | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h index 8220369ee6105..200978b94a0b9 100644 --- a/include/trace/events/sunrpc.h +++ b/include/trace/events/sunrpc.h @@ -1578,6 +1578,7 @@ TRACE_EVENT(svc_process, __field(u32, vers) __field(u32, proc) __string(service, name) + __string(procedure, rqst->rq_procinfo->pc_name) __string(addr, rqst->rq_xprt ? rqst->rq_xprt->xpt_remotebuf : "(null)") ), @@ -1587,13 +1588,16 @@ TRACE_EVENT(svc_process, __entry->vers = rqst->rq_vers; __entry->proc = rqst->rq_proc; __assign_str(service, name); + __assign_str(procedure, rqst->rq_procinfo->pc_name); __assign_str(addr, rqst->rq_xprt ? rqst->rq_xprt->xpt_remotebuf : "(null)"); ),
- TP_printk("addr=%s xid=0x%08x service=%s vers=%u proc=%u", + TP_printk("addr=%s xid=0x%08x service=%s vers=%u proc=%s", __get_str(addr), __entry->xid, - __get_str(service), __entry->vers, __entry->proc) + __get_str(service), __entry->vers, + __get_str(procedure) + ) );
DECLARE_EVENT_CLASS(svc_rqst_event, @@ -1849,6 +1853,7 @@ TRACE_EVENT(svc_stats_latency, TP_STRUCT__entry( __field(u32, xid) __field(unsigned long, execute) + __string(procedure, rqst->rq_procinfo->pc_name) __string(addr, rqst->rq_xprt->xpt_remotebuf) ),
@@ -1856,11 +1861,13 @@ TRACE_EVENT(svc_stats_latency, __entry->xid = be32_to_cpu(rqst->rq_xid); __entry->execute = ktime_to_us(ktime_sub(ktime_get(), rqst->rq_stime)); + __assign_str(procedure, rqst->rq_procinfo->pc_name); __assign_str(addr, rqst->rq_xprt->xpt_remotebuf); ),
- TP_printk("addr=%s xid=0x%08x execute-us=%lu", - __get_str(addr), __entry->xid, __entry->execute) + TP_printk("addr=%s xid=0x%08x proc=%s execute-us=%lu", + __get_str(addr), __entry->xid, __get_str(procedure), + __entry->execute) );
DECLARE_EVENT_CLASS(svc_deferred_event,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 81d217474326b25d7f14274b02fe3da1e85ad934 ]
Clean up: The unit of XDR alignment is defined by RFC 4506, not as part of the RPC message header. Thus it belongs in include/linux/sunrpc/xdr.h.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/sunrpc/msg_prot.h | 3 --- include/linux/sunrpc/xdr.h | 13 ++++++++++--- 2 files changed, 10 insertions(+), 6 deletions(-)
diff --git a/include/linux/sunrpc/msg_prot.h b/include/linux/sunrpc/msg_prot.h index 43f854487539b..938c2bf29db88 100644 --- a/include/linux/sunrpc/msg_prot.h +++ b/include/linux/sunrpc/msg_prot.h @@ -10,9 +10,6 @@
#define RPC_VERSION 2
-/* size of an XDR encoding unit in bytes, i.e. 32bit */ -#define XDR_UNIT (4) - /* spec defines authentication flavor as an unsigned 32 bit integer */ typedef u32 rpc_authflavor_t;
diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h index f6569b620beab..eba6204330b3c 100644 --- a/include/linux/sunrpc/xdr.h +++ b/include/linux/sunrpc/xdr.h @@ -19,6 +19,13 @@ struct bio_vec; struct rpc_rqst;
+/* + * Size of an XDR encoding unit in bytes, i.e. 32 bits, + * as defined in Section 3 of RFC 4506. All encoded + * XDR data items are aligned on a boundary of 32 bits. + */ +#define XDR_UNIT sizeof(__be32) + /* * Buffer adjustment */ @@ -329,7 +336,7 @@ ssize_t xdr_stream_decode_string_dup(struct xdr_stream *xdr, char **str, static inline size_t xdr_align_size(size_t n) { - const size_t mask = sizeof(__u32) - 1; + const size_t mask = XDR_UNIT - 1;
return (n + mask) & ~mask; } @@ -359,7 +366,7 @@ static inline size_t xdr_pad_size(size_t n) */ static inline ssize_t xdr_stream_encode_item_present(struct xdr_stream *xdr) { - const size_t len = sizeof(__be32); + const size_t len = XDR_UNIT; __be32 *p = xdr_reserve_space(xdr, len);
if (unlikely(!p)) @@ -378,7 +385,7 @@ static inline ssize_t xdr_stream_encode_item_present(struct xdr_stream *xdr) */ static inline int xdr_stream_encode_item_absent(struct xdr_stream *xdr) { - const size_t len = sizeof(__be32); + const size_t len = XDR_UNIT; __be32 *p = xdr_reserve_space(xdr, len);
if (unlikely(!p))
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 9575363a9e4c8d7e2f9ba5e79884d623fff0be6f ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3proc.c | 3 +-- fs/nfsd/nfs3xdr.c | 31 +++++++++++++++++++++++++------ fs/nfsd/xdr3.h | 2 +- 3 files changed, 27 insertions(+), 9 deletions(-)
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 0f79e007c620f..a1d743bbb837d 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -688,7 +688,6 @@ nfsd3_proc_commit(struct svc_rqst *rqstp) * NFSv3 Server procedures. * Only the results of non-idempotent operations are cached. */ -#define nfs3svc_decode_fhandleargs nfs3svc_decode_fhandle #define nfs3svc_encode_attrstatres nfs3svc_encode_attrstat #define nfs3svc_encode_wccstatres nfs3svc_encode_wccstat #define nfsd3_mkdirargs nfsd3_createargs @@ -720,7 +719,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_decode = nfs3svc_decode_fhandleargs, .pc_encode = nfs3svc_encode_attrstatres, .pc_release = nfs3svc_release_fhandle, - .pc_argsize = sizeof(struct nfsd3_fhandleargs), + .pc_argsize = sizeof(struct nfsd_fhandle), .pc_ressize = sizeof(struct nfsd3_attrstatres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+AT, diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 34b880211e5ea..3a2b4abea1a42 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -29,8 +29,9 @@ static u32 nfs3_ftypes[] = {
/* - * XDR functions for basic NFS types + * Basic NFSv3 data types (RFC 1813 Sections 2.5 and 2.6) */ + static __be32 * encode_time3(__be32 *p, struct timespec64 *time) { @@ -46,6 +47,26 @@ decode_time3(__be32 *p, struct timespec64 *time) return p; }
+static bool +svcxdr_decode_nfs_fh3(struct xdr_stream *xdr, struct svc_fh *fhp) +{ + __be32 *p; + u32 size; + + if (xdr_stream_decode_u32(xdr, &size) < 0) + return false; + if (size == 0 || size > NFS3_FHSIZE) + return false; + p = xdr_inline_decode(xdr, size); + if (!p) + return false; + fh_init(fhp, NFS3_FHSIZE); + fhp->fh_handle.fh_size = size; + memcpy(&fhp->fh_handle.fh_base, p, size); + + return true; +} + static __be32 * decode_fh(__be32 *p, struct svc_fh *fhp) { @@ -312,14 +333,12 @@ void fill_post_wcc(struct svc_fh *fhp) */
int -nfs3svc_decode_fhandle(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_decode_fhandleargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_fhandle *args = rqstp->rq_argp;
- p = decode_fh(p, &args->fh); - if (!p) - return 0; - return xdr_argsize_check(rqstp, p); + return svcxdr_decode_nfs_fh3(xdr, &args->fh); }
int diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 456fcd7a10383..62ea669768cf3 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -273,7 +273,7 @@ union nfsd3_xdrstore {
#define NFS3_SVC_XDRSIZE sizeof(union nfsd3_xdrstore)
-int nfs3svc_decode_fhandle(struct svc_rqst *, __be32 *); +int nfs3svc_decode_fhandleargs(struct svc_rqst *, __be32 *); int nfs3svc_decode_sattrargs(struct svc_rqst *, __be32 *); int nfs3svc_decode_diropargs(struct svc_rqst *, __be32 *); int nfs3svc_decode_accessargs(struct svc_rqst *, __be32 *);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 3b921a2b14251e9e203f1e8af76e8ade79f50e50 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3xdr.c | 9 +++++---- fs/nfsd/xdr3.h | 2 +- 2 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 3a2b4abea1a42..e07cebd80ef7f 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -375,14 +375,15 @@ nfs3svc_decode_diropargs(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_decode_accessargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_accessargs *args = rqstp->rq_argp;
- p = decode_fh(p, &args->fh); - if (!p) + if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) + return 0; + if (xdr_stream_decode_u32(xdr, &args->access) < 0) return 0; - args->access = ntohl(*p++);
- return xdr_argsize_check(rqstp, p); + return 1; }
int diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 62ea669768cf3..a4dce4baec7c3 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -25,7 +25,7 @@ struct nfsd3_diropargs {
struct nfsd3_accessargs { struct svc_fh fh; - unsigned int access; + __u32 access; };
struct nfsd3_readargs {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit be63bd2ac6bbf8c065a0ef6dfbea76934326c352 ]
The code that sets up rq_vec is refactored so that it is now adjacent to the nfsd_read() call site where it is used.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3proc.c | 23 ++++++++++++++++++----- fs/nfsd/nfs3xdr.c | 28 +++++++--------------------- fs/nfsd/xdr3.h | 1 - 3 files changed, 25 insertions(+), 27 deletions(-)
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index a1d743bbb837d..2e477cd870913 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -144,25 +144,38 @@ nfsd3_proc_read(struct svc_rqst *rqstp) { struct nfsd3_readargs *argp = rqstp->rq_argp; struct nfsd3_readres *resp = rqstp->rq_resp; - u32 max_blocksize = svc_max_payload(rqstp); - unsigned long cnt = min(argp->count, max_blocksize); + u32 max_blocksize = svc_max_payload(rqstp); + unsigned int len; + int v; + + argp->count = min_t(u32, argp->count, max_blocksize);
dprintk("nfsd: READ(3) %s %lu bytes at %Lu\n", SVCFH_fmt(&argp->fh), (unsigned long) argp->count, (unsigned long long) argp->offset);
+ v = 0; + len = argp->count; + while (len > 0) { + struct page *page = *(rqstp->rq_next_page++); + + rqstp->rq_vec[v].iov_base = page_address(page); + rqstp->rq_vec[v].iov_len = min_t(unsigned int, len, PAGE_SIZE); + len -= rqstp->rq_vec[v].iov_len; + v++; + } + /* Obtain buffer pointer for payload. * 1 (status) + 22 (post_op_attr) + 1 (count) + 1 (eof) * + 1 (xdr opaque byte count) = 26 */ - resp->count = cnt; + resp->count = argp->count; svc_reserve_auth(rqstp, ((1 + NFS3_POST_OP_ATTR_WORDS + 3)<<2) + resp->count +4);
fh_copy(&resp->fh, &argp->fh); resp->status = nfsd_read(rqstp, &resp->fh, argp->offset, - rqstp->rq_vec, argp->vlen, &resp->count, - &resp->eof); + rqstp->rq_vec, v, &resp->count, &resp->eof); return rpc_success; }
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index e07cebd80ef7f..2f32df15a7e87 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -389,31 +389,17 @@ nfs3svc_decode_accessargs(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_decode_readargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_readargs *args = rqstp->rq_argp; - unsigned int len; - int v; - u32 max_blocksize = svc_max_payload(rqstp);
- p = decode_fh(p, &args->fh); - if (!p) + if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) + return 0; + if (xdr_stream_decode_u64(xdr, &args->offset) < 0) + return 0; + if (xdr_stream_decode_u32(xdr, &args->count) < 0) return 0; - p = xdr_decode_hyper(p, &args->offset); - - args->count = ntohl(*p++); - len = min(args->count, max_blocksize); - - /* set up the kvec */ - v=0; - while (len > 0) { - struct page *p = *(rqstp->rq_next_page++);
- rqstp->rq_vec[v].iov_base = page_address(p); - rqstp->rq_vec[v].iov_len = min_t(unsigned int, len, PAGE_SIZE); - len -= rqstp->rq_vec[v].iov_len; - v++; - } - args->vlen = v; - return xdr_argsize_check(rqstp, p); + return 1; }
int diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index a4dce4baec7c3..7dfeeaa4e1dfc 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -32,7 +32,6 @@ struct nfsd3_readargs { struct svc_fh fh; __u64 offset; __u32 count; - int vlen; };
struct nfsd3_writeargs {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit c43b2f229a01969a7ccf94b033c5085e0ec2040c ]
As part of the update, open code that sanity-checks the size of the data payload against the length of the RPC Call message has to be re-implemented to use xdr_stream infrastructure.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3xdr.c | 51 +++++++++++++++++++---------------------------- 1 file changed, 20 insertions(+), 31 deletions(-)
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 2f32df15a7e87..c06467e8ac829 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -405,52 +405,41 @@ nfs3svc_decode_readargs(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_writeargs *args = rqstp->rq_argp; - unsigned int len, hdr, dlen; u32 max_blocksize = svc_max_payload(rqstp); struct kvec *head = rqstp->rq_arg.head; struct kvec *tail = rqstp->rq_arg.tail; + size_t remaining;
- p = decode_fh(p, &args->fh); - if (!p) + if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) return 0; - p = xdr_decode_hyper(p, &args->offset); - - args->count = ntohl(*p++); - args->stable = ntohl(*p++); - len = args->len = ntohl(*p++); - if ((void *)p > head->iov_base + head->iov_len) + if (xdr_stream_decode_u64(xdr, &args->offset) < 0) return 0; - /* - * The count must equal the amount of data passed. - */ - if (args->count != args->len) + if (xdr_stream_decode_u32(xdr, &args->count) < 0) + return 0; + if (xdr_stream_decode_u32(xdr, &args->stable) < 0) return 0;
- /* - * Check to make sure that we got the right number of - * bytes. - */ - hdr = (void*)p - head->iov_base; - dlen = head->iov_len + rqstp->rq_arg.page_len + tail->iov_len - hdr; - /* - * Round the length of the data which was specified up to - * the next multiple of XDR units and then compare that - * against the length which was actually received. - * Note that when RPCSEC/GSS (for example) is used, the - * data buffer can be padded so dlen might be larger - * than required. It must never be smaller. - */ - if (dlen < XDR_QUADLEN(len)*4) + /* opaque data */ + if (xdr_stream_decode_u32(xdr, &args->len) < 0) return 0;
+ /* request sanity */ + if (args->count != args->len) + return 0; + remaining = head->iov_len + rqstp->rq_arg.page_len + tail->iov_len; + remaining -= xdr_stream_pos(xdr); + if (remaining < xdr_align_size(args->len)) + return 0; if (args->count > max_blocksize) { args->count = max_blocksize; - len = args->len = max_blocksize; + args->len = max_blocksize; }
- args->first.iov_base = (void *)p; - args->first.iov_len = head->iov_len - hdr; + args->first.iov_base = xdr->p; + args->first.iov_len = head->iov_len - xdr_stream_pos(xdr); + return 1; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 224c1c894e48cd72e4dd9fb6311be80cbe1369b0 ]
The NFSv3 READLINK request takes a single filehandle, so it can re-use GETATTR's decoder.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3proc.c | 9 +++++---- fs/nfsd/nfs3xdr.c | 13 ------------- fs/nfsd/xdr3.h | 6 ------ 3 files changed, 5 insertions(+), 23 deletions(-)
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 2e477cd870913..71db0ed3c49ed 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -124,15 +124,16 @@ nfsd3_proc_access(struct svc_rqst *rqstp) static __be32 nfsd3_proc_readlink(struct svc_rqst *rqstp) { - struct nfsd3_readlinkargs *argp = rqstp->rq_argp; + struct nfsd_fhandle *argp = rqstp->rq_argp; struct nfsd3_readlinkres *resp = rqstp->rq_resp; + char *buffer = page_address(*(rqstp->rq_next_page++));
dprintk("nfsd: READLINK(3) %s\n", SVCFH_fmt(&argp->fh));
/* Read the symlink. */ fh_copy(&resp->fh, &argp->fh); resp->len = NFS3_MAXPATHLEN; - resp->status = nfsd_readlink(rqstp, &resp->fh, argp->buffer, &resp->len); + resp->status = nfsd_readlink(rqstp, &resp->fh, buffer, &resp->len); return rpc_success; }
@@ -773,10 +774,10 @@ static const struct svc_procedure nfsd_procedures3[22] = { }, [NFS3PROC_READLINK] = { .pc_func = nfsd3_proc_readlink, - .pc_decode = nfs3svc_decode_readlinkargs, + .pc_decode = nfs3svc_decode_fhandleargs, .pc_encode = nfs3svc_encode_readlinkres, .pc_release = nfs3svc_release_fhandle, - .pc_argsize = sizeof(struct nfsd3_readlinkargs), + .pc_argsize = sizeof(struct nfsd_fhandle), .pc_ressize = sizeof(struct nfsd3_readlinkres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+pAT+1+NFS3_MAXPATHLEN/4, diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index c06467e8ac829..6b6a839c1fc8c 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -543,19 +543,6 @@ nfs3svc_decode_renameargs(struct svc_rqst *rqstp, __be32 *p) return xdr_argsize_check(rqstp, p); }
-int -nfs3svc_decode_readlinkargs(struct svc_rqst *rqstp, __be32 *p) -{ - struct nfsd3_readlinkargs *args = rqstp->rq_argp; - - p = decode_fh(p, &args->fh); - if (!p) - return 0; - args->buffer = page_address(*(rqstp->rq_next_page++)); - - return xdr_argsize_check(rqstp, p); -} - int nfs3svc_decode_linkargs(struct svc_rqst *rqstp, __be32 *p) { diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 7dfeeaa4e1dfc..08f909142ddf7 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -70,11 +70,6 @@ struct nfsd3_renameargs { unsigned int tlen; };
-struct nfsd3_readlinkargs { - struct svc_fh fh; - char * buffer; -}; - struct nfsd3_linkargs { struct svc_fh ffh; struct svc_fh tfh; @@ -282,7 +277,6 @@ int nfs3svc_decode_createargs(struct svc_rqst *, __be32 *); int nfs3svc_decode_mkdirargs(struct svc_rqst *, __be32 *); int nfs3svc_decode_mknodargs(struct svc_rqst *, __be32 *); int nfs3svc_decode_renameargs(struct svc_rqst *, __be32 *); -int nfs3svc_decode_readlinkargs(struct svc_rqst *, __be32 *); int nfs3svc_decode_linkargs(struct svc_rqst *, __be32 *); int nfs3svc_decode_symlinkargs(struct svc_rqst *, __be32 *); int nfs3svc_decode_readdirargs(struct svc_rqst *, __be32 *);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 0a8f37fb34a96267c656f7254e69bb9a2fc89fe4 ]
Code inspection shows that the server's NFSv3 READDIR implementation handles offset cookies slightly differently than the NFSv2 READDIR, NFSv3 READDIRPLUS, and NFSv4 READDIR implementations, and there doesn't seem to be any need for this difference.
As a clean up, I copied the logic from nfsd3_proc_readdirplus().
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3proc.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 71db0ed3c49ed..8cffd9852ef04 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -449,6 +449,7 @@ nfsd3_proc_readdir(struct svc_rqst *rqstp) struct nfsd3_readdirargs *argp = rqstp->rq_argp; struct nfsd3_readdirres *resp = rqstp->rq_resp; int count = 0; + loff_t offset; struct page **p; caddr_t page_addr = NULL;
@@ -467,7 +468,9 @@ nfsd3_proc_readdir(struct svc_rqst *rqstp) resp->common.err = nfs_ok; resp->buffer = argp->buffer; resp->rqstp = rqstp; - resp->status = nfsd_readdir(rqstp, &resp->fh, (loff_t *)&argp->cookie, + offset = argp->cookie; + + resp->status = nfsd_readdir(rqstp, &resp->fh, &offset, &resp->common, nfs3svc_encode_entry); memcpy(resp->verf, argp->verf, 8); count = 0; @@ -483,8 +486,6 @@ nfsd3_proc_readdir(struct svc_rqst *rqstp) } resp->count = count >> 2; if (resp->offset) { - loff_t offset = argp->cookie; - if (unlikely(resp->offset1)) { /* we ended up with offset on a page boundary */ *resp->offset = htonl(offset >> 32);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 40116ebd0934cca7e46423bdb3397d3d27eb9fb9 ]
De-duplicate some code that is used by both READDIR and READDIRPLUS to build the dirlist in the Reply. Because this code is not related to decoding READ arguments, it is moved to a more appropriate spot.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3proc.c | 29 +++++++++++++++++++---------- fs/nfsd/nfs3xdr.c | 20 -------------------- fs/nfsd/xdr3.h | 1 - 3 files changed, 19 insertions(+), 31 deletions(-)
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 8cffd9852ef04..25f31a03c4f1b 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -440,6 +440,23 @@ nfsd3_proc_link(struct svc_rqst *rqstp) return rpc_success; }
+static void nfsd3_init_dirlist_pages(struct svc_rqst *rqstp, + struct nfsd3_readdirres *resp, + int count) +{ + count = min_t(u32, count, svc_max_payload(rqstp)); + + /* Convert byte count to number of words (i.e. >> 2), + * and reserve room for the NULL ptr & eof flag (-2 words) */ + resp->buflen = (count >> 2) - 2; + + resp->buffer = page_address(*rqstp->rq_next_page); + while (count > 0) { + rqstp->rq_next_page++; + count -= PAGE_SIZE; + } +} + /* * Read a portion of a directory. */ @@ -457,16 +474,12 @@ nfsd3_proc_readdir(struct svc_rqst *rqstp) SVCFH_fmt(&argp->fh), argp->count, (u32) argp->cookie);
- /* Make sure we've room for the NULL ptr & eof flag, and shrink to - * client read size */ - count = (argp->count >> 2) - 2; + nfsd3_init_dirlist_pages(rqstp, resp, argp->count);
/* Read directory and encode entries on the fly */ fh_copy(&resp->fh, &argp->fh);
- resp->buflen = count; resp->common.err = nfs_ok; - resp->buffer = argp->buffer; resp->rqstp = rqstp; offset = argp->cookie;
@@ -518,16 +531,12 @@ nfsd3_proc_readdirplus(struct svc_rqst *rqstp) SVCFH_fmt(&argp->fh), argp->count, (u32) argp->cookie);
- /* Convert byte count to number of words (i.e. >> 2), - * and reserve room for the NULL ptr & eof flag (-2 words) */ - resp->count = (argp->count >> 2) - 2; + nfsd3_init_dirlist_pages(rqstp, resp, argp->count);
/* Read directory and encode entries on the fly */ fh_copy(&resp->fh, &argp->fh);
resp->common.err = nfs_ok; - resp->buffer = argp->buffer; - resp->buflen = resp->count; resp->rqstp = rqstp; offset = argp->cookie;
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 6b6a839c1fc8c..8394aeb8381e6 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -560,8 +560,6 @@ int nfs3svc_decode_readdirargs(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_readdirargs *args = rqstp->rq_argp; - int len; - u32 max_blocksize = svc_max_payload(rqstp);
p = decode_fh(p, &args->fh); if (!p) @@ -570,14 +568,6 @@ nfs3svc_decode_readdirargs(struct svc_rqst *rqstp, __be32 *p) args->verf = p; p += 2; args->dircount = ~0; args->count = ntohl(*p++); - len = args->count = min_t(u32, args->count, max_blocksize); - - while (len > 0) { - struct page *p = *(rqstp->rq_next_page++); - if (!args->buffer) - args->buffer = page_address(p); - len -= PAGE_SIZE; - }
return xdr_argsize_check(rqstp, p); } @@ -586,8 +576,6 @@ int nfs3svc_decode_readdirplusargs(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_readdirargs *args = rqstp->rq_argp; - int len; - u32 max_blocksize = svc_max_payload(rqstp);
p = decode_fh(p, &args->fh); if (!p) @@ -597,14 +585,6 @@ nfs3svc_decode_readdirplusargs(struct svc_rqst *rqstp, __be32 *p) args->dircount = ntohl(*p++); args->count = ntohl(*p++);
- len = args->count = min(args->count, max_blocksize); - while (len > 0) { - struct page *p = *(rqstp->rq_next_page++); - if (!args->buffer) - args->buffer = page_address(p); - len -= PAGE_SIZE; - } - return xdr_argsize_check(rqstp, p); }
diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 08f909142ddf7..789a364d5e69d 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -93,7 +93,6 @@ struct nfsd3_readdirargs { __u32 dircount; __u32 count; __be32 * verf; - __be32 * buffer; };
struct nfsd3_commitargs {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 9cedc2e64c296efb3bebe93a0ceeb5e71e8d722d ]
As an additional clean up, neither nfsd3_proc_readdir() nor nfsd3_proc_readdirplus() make use of the dircount argument, so remove it from struct nfsd3_readdirargs.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3xdr.c | 38 ++++++++++++++++++++++++-------------- fs/nfsd/xdr3.h | 1 - 2 files changed, 24 insertions(+), 15 deletions(-)
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 8394aeb8381e6..eb55be106a04e 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -559,33 +559,43 @@ nfs3svc_decode_linkargs(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_decode_readdirargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_readdirargs *args = rqstp->rq_argp;
- p = decode_fh(p, &args->fh); - if (!p) + if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) + return 0; + if (xdr_stream_decode_u64(xdr, &args->cookie) < 0) + return 0; + args->verf = xdr_inline_decode(xdr, NFS3_COOKIEVERFSIZE); + if (!args->verf) + return 0; + if (xdr_stream_decode_u32(xdr, &args->count) < 0) return 0; - p = xdr_decode_hyper(p, &args->cookie); - args->verf = p; p += 2; - args->dircount = ~0; - args->count = ntohl(*p++);
- return xdr_argsize_check(rqstp, p); + return 1; }
int nfs3svc_decode_readdirplusargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_readdirargs *args = rqstp->rq_argp; + u32 dircount;
- p = decode_fh(p, &args->fh); - if (!p) + if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) + return 0; + if (xdr_stream_decode_u64(xdr, &args->cookie) < 0) + return 0; + args->verf = xdr_inline_decode(xdr, NFS3_COOKIEVERFSIZE); + if (!args->verf) + return 0; + /* dircount is ignored */ + if (xdr_stream_decode_u32(xdr, &dircount) < 0) + return 0; + if (xdr_stream_decode_u32(xdr, &args->count) < 0) return 0; - p = xdr_decode_hyper(p, &args->cookie); - args->verf = p; p += 2; - args->dircount = ntohl(*p++); - args->count = ntohl(*p++);
- return xdr_argsize_check(rqstp, p); + return 1; }
int diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 789a364d5e69d..64af5b01c5d7b 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -90,7 +90,6 @@ struct nfsd3_symlinkargs { struct nfsd3_readdirargs { struct svc_fh fh; __u64 cookie; - __u32 dircount; __u32 count; __be32 * verf; };
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit c8d26a0acfe77f0880e0acfe77e4209cf8f3a38b ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3xdr.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index eb55be106a04e..bafb84c978616 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -601,14 +601,17 @@ nfs3svc_decode_readdirplusargs(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_decode_commitargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_commitargs *args = rqstp->rq_argp; - p = decode_fh(p, &args->fh); - if (!p) + + if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) + return 0; + if (xdr_stream_decode_u64(xdr, &args->offset) < 0) + return 0; + if (xdr_stream_decode_u32(xdr, &args->count) < 0) return 0; - p = xdr_decode_hyper(p, &args->offset); - args->count = ntohl(*p++);
- return xdr_argsize_check(rqstp, p); + return 1; }
/*
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 54d1d43dc709f58be38d278bfc38e9bfb38d35fc ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3xdr.c | 40 +++++++++++++++++++++++++++++++++++----- 1 file changed, 35 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index bafb84c978616..299ea8bbd685f 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -117,6 +117,39 @@ decode_filename(__be32 *p, char **namp, unsigned int *lenp) return p; }
+static bool +svcxdr_decode_filename3(struct xdr_stream *xdr, char **name, unsigned int *len) +{ + u32 size, i; + __be32 *p; + char *c; + + if (xdr_stream_decode_u32(xdr, &size) < 0) + return false; + if (size == 0 || size > NFS3_MAXNAMLEN) + return false; + p = xdr_inline_decode(xdr, size); + if (!p) + return false; + + *len = size; + *name = (char *)p; + for (i = 0, c = *name; i < size; i++, c++) { + if (*c == '\0' || *c == '/') + return false; + } + + return true; +} + +static bool +svcxdr_decode_diropargs3(struct xdr_stream *xdr, struct svc_fh *fhp, + char **name, unsigned int *len) +{ + return svcxdr_decode_nfs_fh3(xdr, fhp) && + svcxdr_decode_filename3(xdr, name, len); +} + static __be32 * decode_sattr3(__be32 *p, struct iattr *iap, struct user_namespace *userns) { @@ -363,13 +396,10 @@ nfs3svc_decode_sattrargs(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_decode_diropargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_diropargs *args = rqstp->rq_argp;
- if (!(p = decode_fh(p, &args->fh)) - || !(p = decode_filename(p, &args->name, &args->len))) - return 0; - - return xdr_argsize_check(rqstp, p); + return svcxdr_decode_diropargs3(xdr, &args->fh, &args->name, &args->len); }
int
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit d181e0a4bef36ee74d1338e5b5c2561d7463a5d0 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3xdr.c | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 299ea8bbd685f..f870a068aad85 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -562,15 +562,13 @@ nfs3svc_decode_mknodargs(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_decode_renameargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_renameargs *args = rqstp->rq_argp;
- if (!(p = decode_fh(p, &args->ffh)) - || !(p = decode_filename(p, &args->fname, &args->flen)) - || !(p = decode_fh(p, &args->tfh)) - || !(p = decode_filename(p, &args->tname, &args->tlen))) - return 0; - - return xdr_argsize_check(rqstp, p); + return svcxdr_decode_diropargs3(xdr, &args->ffh, + &args->fname, &args->flen) && + svcxdr_decode_diropargs3(xdr, &args->tfh, + &args->tname, &args->tlen); }
int
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit efaa1e7c2c7475f0a9bbeb904d9aba09b73dd52a ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3xdr.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index f870a068aad85..9437dda2646f2 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -574,14 +574,12 @@ nfs3svc_decode_renameargs(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_decode_linkargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_linkargs *args = rqstp->rq_argp;
- if (!(p = decode_fh(p, &args->ffh)) - || !(p = decode_fh(p, &args->tfh)) - || !(p = decode_filename(p, &args->tname, &args->tlen))) - return 0; - - return xdr_argsize_check(rqstp, p); + return svcxdr_decode_nfs_fh3(xdr, &args->ffh) && + svcxdr_decode_diropargs3(xdr, &args->tfh, + &args->tname, &args->tlen); }
int
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 9cde9360d18d8b352b737d10f90f2aecccf93dbe ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3xdr.c | 138 +++++++++++++++++++++++++++++++++----- include/uapi/linux/nfs3.h | 6 ++ 2 files changed, 127 insertions(+), 17 deletions(-)
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 9437dda2646f2..6a6bf8e34d82b 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -39,12 +39,18 @@ encode_time3(__be32 *p, struct timespec64 *time) return p; }
-static __be32 * -decode_time3(__be32 *p, struct timespec64 *time) +static bool +svcxdr_decode_nfstime3(struct xdr_stream *xdr, struct timespec64 *timep) { - time->tv_sec = ntohl(*p++); - time->tv_nsec = ntohl(*p++); - return p; + __be32 *p; + + p = xdr_inline_decode(xdr, XDR_UNIT * 2); + if (!p) + return false; + timep->tv_sec = be32_to_cpup(p++); + timep->tv_nsec = be32_to_cpup(p); + + return true; }
static bool @@ -150,6 +156,112 @@ svcxdr_decode_diropargs3(struct xdr_stream *xdr, struct svc_fh *fhp, svcxdr_decode_filename3(xdr, name, len); }
+static bool +svcxdr_decode_sattr3(struct svc_rqst *rqstp, struct xdr_stream *xdr, + struct iattr *iap) +{ + u32 set_it; + + iap->ia_valid = 0; + + if (xdr_stream_decode_bool(xdr, &set_it) < 0) + return false; + if (set_it) { + u32 mode; + + if (xdr_stream_decode_u32(xdr, &mode) < 0) + return false; + iap->ia_valid |= ATTR_MODE; + iap->ia_mode = mode; + } + if (xdr_stream_decode_bool(xdr, &set_it) < 0) + return false; + if (set_it) { + u32 uid; + + if (xdr_stream_decode_u32(xdr, &uid) < 0) + return false; + iap->ia_uid = make_kuid(nfsd_user_namespace(rqstp), uid); + if (uid_valid(iap->ia_uid)) + iap->ia_valid |= ATTR_UID; + } + if (xdr_stream_decode_bool(xdr, &set_it) < 0) + return false; + if (set_it) { + u32 gid; + + if (xdr_stream_decode_u32(xdr, &gid) < 0) + return false; + iap->ia_gid = make_kgid(nfsd_user_namespace(rqstp), gid); + if (gid_valid(iap->ia_gid)) + iap->ia_valid |= ATTR_GID; + } + if (xdr_stream_decode_bool(xdr, &set_it) < 0) + return false; + if (set_it) { + u64 newsize; + + if (xdr_stream_decode_u64(xdr, &newsize) < 0) + return false; + iap->ia_valid |= ATTR_SIZE; + iap->ia_size = min_t(u64, newsize, NFS_OFFSET_MAX); + } + if (xdr_stream_decode_u32(xdr, &set_it) < 0) + return false; + switch (set_it) { + case DONT_CHANGE: + break; + case SET_TO_SERVER_TIME: + iap->ia_valid |= ATTR_ATIME; + break; + case SET_TO_CLIENT_TIME: + if (!svcxdr_decode_nfstime3(xdr, &iap->ia_atime)) + return false; + iap->ia_valid |= ATTR_ATIME | ATTR_ATIME_SET; + break; + default: + return false; + } + if (xdr_stream_decode_u32(xdr, &set_it) < 0) + return false; + switch (set_it) { + case DONT_CHANGE: + break; + case SET_TO_SERVER_TIME: + iap->ia_valid |= ATTR_MTIME; + break; + case SET_TO_CLIENT_TIME: + if (!svcxdr_decode_nfstime3(xdr, &iap->ia_mtime)) + return false; + iap->ia_valid |= ATTR_MTIME | ATTR_MTIME_SET; + break; + default: + return false; + } + + return true; +} + +static bool +svcxdr_decode_sattrguard3(struct xdr_stream *xdr, struct nfsd3_sattrargs *args) +{ + __be32 *p; + u32 check; + + if (xdr_stream_decode_bool(xdr, &check) < 0) + return false; + if (check) { + p = xdr_inline_decode(xdr, XDR_UNIT * 2); + if (!p) + return false; + args->check_guard = 1; + args->guardtime = be32_to_cpup(p); + } else + args->check_guard = 0; + + return true; +} + static __be32 * decode_sattr3(__be32 *p, struct iattr *iap, struct user_namespace *userns) { @@ -377,20 +489,12 @@ nfs3svc_decode_fhandleargs(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_decode_sattrargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_sattrargs *args = rqstp->rq_argp;
- p = decode_fh(p, &args->fh); - if (!p) - return 0; - p = decode_sattr3(p, &args->attrs, nfsd_user_namespace(rqstp)); - - if ((args->check_guard = ntohl(*p++)) != 0) { - struct timespec64 time; - p = decode_time3(p, &time); - args->guardtime = time.tv_sec; - } - - return xdr_argsize_check(rqstp, p); + return svcxdr_decode_nfs_fh3(xdr, &args->fh) && + svcxdr_decode_sattr3(rqstp, xdr, &args->attrs) && + svcxdr_decode_sattrguard3(xdr, args); }
int diff --git a/include/uapi/linux/nfs3.h b/include/uapi/linux/nfs3.h index 37e4b34e6b435..c22ab77713bd0 100644 --- a/include/uapi/linux/nfs3.h +++ b/include/uapi/linux/nfs3.h @@ -63,6 +63,12 @@ enum nfs3_ftype { NF3BAD = 8 };
+enum nfs3_time_how { + DONT_CHANGE = 0, + SET_TO_SERVER_TIME = 1, + SET_TO_CLIENT_TIME = 2, +}; + struct nfs3_fh { unsigned short size; unsigned char data[NFS3_FHSIZE];
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 6b3a11960d898b25a30103cc6a2ff0b24b90a83b ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3xdr.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 6a6bf8e34d82b..24db3725a070b 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -580,26 +580,26 @@ nfs3svc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_decode_createargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_createargs *args = rqstp->rq_argp;
- if (!(p = decode_fh(p, &args->fh)) - || !(p = decode_filename(p, &args->name, &args->len))) + if (!svcxdr_decode_diropargs3(xdr, &args->fh, &args->name, &args->len)) return 0; - - switch (args->createmode = ntohl(*p++)) { + if (xdr_stream_decode_u32(xdr, &args->createmode) < 0) + return 0; + switch (args->createmode) { case NFS3_CREATE_UNCHECKED: case NFS3_CREATE_GUARDED: - p = decode_sattr3(p, &args->attrs, nfsd_user_namespace(rqstp)); - break; + return svcxdr_decode_sattr3(rqstp, xdr, &args->attrs); case NFS3_CREATE_EXCLUSIVE: - args->verf = p; - p += 2; + args->verf = xdr_inline_decode(xdr, NFS3_CREATEVERFSIZE); + if (!args->verf) + return 0; break; default: return 0; } - - return xdr_argsize_check(rqstp, p); + return 1; }
int
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 83374c278db193f3e8b2608b45da1132b867a760 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3xdr.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 24db3725a070b..b4071cda1d652 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -605,14 +605,12 @@ nfs3svc_decode_createargs(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_decode_mkdirargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_createargs *args = rqstp->rq_argp;
- if (!(p = decode_fh(p, &args->fh)) || - !(p = decode_filename(p, &args->name, &args->len))) - return 0; - p = decode_sattr3(p, &args->attrs, nfsd_user_namespace(rqstp)); - - return xdr_argsize_check(rqstp, p); + return svcxdr_decode_diropargs3(xdr, &args->fh, + &args->name, &args->len) && + svcxdr_decode_sattr3(rqstp, xdr, &args->attrs); }
int
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit da39201637297460c13134c29286a00f3a1c92fe ]
Similar to the WRITE decoder, code that checks the sanity of the payload size is re-wired to work with xdr_stream infrastructure.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3xdr.c | 29 ++++++++++++++++------------- 1 file changed, 16 insertions(+), 13 deletions(-)
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index b4071cda1d652..eb17231ab1661 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -616,25 +616,28 @@ nfs3svc_decode_mkdirargs(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_decode_symlinkargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_symlinkargs *args = rqstp->rq_argp; - char *base = (char *)p; - size_t dlen; + struct kvec *head = rqstp->rq_arg.head; + struct kvec *tail = rqstp->rq_arg.tail; + size_t remaining;
- if (!(p = decode_fh(p, &args->ffh)) || - !(p = decode_filename(p, &args->fname, &args->flen))) + if (!svcxdr_decode_diropargs3(xdr, &args->ffh, &args->fname, &args->flen)) + return 0; + if (!svcxdr_decode_sattr3(rqstp, xdr, &args->attrs)) + return 0; + if (xdr_stream_decode_u32(xdr, &args->tlen) < 0) return 0; - p = decode_sattr3(p, &args->attrs, nfsd_user_namespace(rqstp));
- args->tlen = ntohl(*p++); + /* request sanity */ + remaining = head->iov_len + rqstp->rq_arg.page_len + tail->iov_len; + remaining -= xdr_stream_pos(xdr); + if (remaining < xdr_align_size(args->tlen)) + return 0;
- args->first.iov_base = p; - args->first.iov_len = rqstp->rq_arg.head[0].iov_len; - args->first.iov_len -= (char *)p - base; + args->first.iov_base = xdr->p; + args->first.iov_len = head->iov_len - xdr_stream_pos(xdr);
- dlen = args->first.iov_len + rqstp->rq_arg.page_len + - rqstp->rq_arg.tail[0].iov_len; - if (dlen < XDR_QUADLEN(args->tlen) << 2) - return 0; return 1; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit f8a38e2d6c885f9d7cd03febc515d36293de4a5b ]
This commit removes the last usage of the original decode_sattr3(), so it is removed as a clean-up.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3xdr.c | 107 +++++++++++++++------------------------------- 1 file changed, 35 insertions(+), 72 deletions(-)
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index eb17231ab1661..a30b418a51160 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -103,26 +103,6 @@ encode_fh(__be32 *p, struct svc_fh *fhp) return p + XDR_QUADLEN(size); }
-/* - * Decode a file name and make sure that the path contains - * no slashes or null bytes. - */ -static __be32 * -decode_filename(__be32 *p, char **namp, unsigned int *lenp) -{ - char *name; - unsigned int i; - - if ((p = xdr_decode_string_inplace(p, namp, lenp, NFS3_MAXNAMLEN)) != NULL) { - for (i = 0, name = *namp; i < *lenp; i++, name++) { - if (*name == '\0' || *name == '/') - return NULL; - } - } - - return p; -} - static bool svcxdr_decode_filename3(struct xdr_stream *xdr, char **name, unsigned int *len) { @@ -262,49 +242,26 @@ svcxdr_decode_sattrguard3(struct xdr_stream *xdr, struct nfsd3_sattrargs *args) return true; }
-static __be32 * -decode_sattr3(__be32 *p, struct iattr *iap, struct user_namespace *userns) +static bool +svcxdr_decode_specdata3(struct xdr_stream *xdr, struct nfsd3_mknodargs *args) { - u32 tmp; + __be32 *p;
- iap->ia_valid = 0; + p = xdr_inline_decode(xdr, XDR_UNIT * 2); + if (!p) + return false; + args->major = be32_to_cpup(p++); + args->minor = be32_to_cpup(p);
- if (*p++) { - iap->ia_valid |= ATTR_MODE; - iap->ia_mode = ntohl(*p++); - } - if (*p++) { - iap->ia_uid = make_kuid(userns, ntohl(*p++)); - if (uid_valid(iap->ia_uid)) - iap->ia_valid |= ATTR_UID; - } - if (*p++) { - iap->ia_gid = make_kgid(userns, ntohl(*p++)); - if (gid_valid(iap->ia_gid)) - iap->ia_valid |= ATTR_GID; - } - if (*p++) { - u64 newsize; + return true; +}
- iap->ia_valid |= ATTR_SIZE; - p = xdr_decode_hyper(p, &newsize); - iap->ia_size = min_t(u64, newsize, NFS_OFFSET_MAX); - } - if ((tmp = ntohl(*p++)) == 1) { /* set to server time */ - iap->ia_valid |= ATTR_ATIME; - } else if (tmp == 2) { /* set to client time */ - iap->ia_valid |= ATTR_ATIME | ATTR_ATIME_SET; - iap->ia_atime.tv_sec = ntohl(*p++); - iap->ia_atime.tv_nsec = ntohl(*p++); - } - if ((tmp = ntohl(*p++)) == 1) { /* set to server time */ - iap->ia_valid |= ATTR_MTIME; - } else if (tmp == 2) { /* set to client time */ - iap->ia_valid |= ATTR_MTIME | ATTR_MTIME_SET; - iap->ia_mtime.tv_sec = ntohl(*p++); - iap->ia_mtime.tv_nsec = ntohl(*p++); - } - return p; +static bool +svcxdr_decode_devicedata3(struct svc_rqst *rqstp, struct xdr_stream *xdr, + struct nfsd3_mknodargs *args) +{ + return svcxdr_decode_sattr3(rqstp, xdr, &args->attrs) && + svcxdr_decode_specdata3(xdr, args); }
static __be32 *encode_fsid(__be32 *p, struct svc_fh *fhp) @@ -644,24 +601,30 @@ nfs3svc_decode_symlinkargs(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_decode_mknodargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_mknodargs *args = rqstp->rq_argp;
- if (!(p = decode_fh(p, &args->fh)) - || !(p = decode_filename(p, &args->name, &args->len))) + if (!svcxdr_decode_diropargs3(xdr, &args->fh, &args->name, &args->len)) + return 0; + if (xdr_stream_decode_u32(xdr, &args->ftype) < 0) + return 0; + switch (args->ftype) { + case NF3CHR: + case NF3BLK: + return svcxdr_decode_devicedata3(rqstp, xdr, args); + case NF3SOCK: + case NF3FIFO: + return svcxdr_decode_sattr3(rqstp, xdr, &args->attrs); + case NF3REG: + case NF3DIR: + case NF3LNK: + /* Valid XDR but illegal file types */ + break; + default: return 0; - - args->ftype = ntohl(*p++); - - if (args->ftype == NF3BLK || args->ftype == NF3CHR - || args->ftype == NF3SOCK || args->ftype == NF3FIFO) - p = decode_sattr3(p, &args->attrs, nfsd_user_namespace(rqstp)); - - if (args->ftype == NF3BLK || args->ftype == NF3CHR) { - args->major = ntohl(*p++); - args->minor = ntohl(*p++); }
- return xdr_argsize_check(rqstp, p); + return 1; }
int
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit ebcd8e8b28535b643a4c06685bd363b3b73a96af ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsproc.c | 4 ++-- fs/nfsd/nfsxdr.c | 26 ++++++++++++++++++++------ fs/nfsd/xdr.h | 2 +- 3 files changed, 23 insertions(+), 9 deletions(-)
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index f22f70f63b53e..3cac9972aa83f 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -627,7 +627,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { }, [NFSPROC_GETATTR] = { .pc_func = nfsd_proc_getattr, - .pc_decode = nfssvc_decode_fhandle, + .pc_decode = nfssvc_decode_fhandleargs, .pc_encode = nfssvc_encode_attrstat, .pc_release = nfssvc_release_attrstat, .pc_argsize = sizeof(struct nfsd_fhandle), @@ -793,7 +793,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { }, [NFSPROC_STATFS] = { .pc_func = nfsd_proc_statfs, - .pc_decode = nfssvc_decode_fhandle, + .pc_decode = nfssvc_decode_fhandleargs, .pc_encode = nfssvc_encode_statfsres, .pc_argsize = sizeof(struct nfsd_fhandle), .pc_ressize = sizeof(struct nfsd_statfsres), diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 7aa6e8aca2c1a..f3189e1be20fa 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -23,8 +23,9 @@ static u32 nfs_ftypes[] = {
/* - * XDR functions for basic NFS types + * Basic NFSv2 data types (RFC 1094 Section 2.3) */ + static __be32 * decode_fh(__be32 *p, struct svc_fh *fhp) { @@ -37,6 +38,21 @@ decode_fh(__be32 *p, struct svc_fh *fhp) return p + (NFS_FHSIZE >> 2); }
+static bool +svcxdr_decode_fhandle(struct xdr_stream *xdr, struct svc_fh *fhp) +{ + __be32 *p; + + p = xdr_inline_decode(xdr, NFS_FHSIZE); + if (!p) + return false; + fh_init(fhp, NFS_FHSIZE); + memcpy(&fhp->fh_handle.fh_base, p, NFS_FHSIZE); + fhp->fh_handle.fh_size = NFS_FHSIZE; + + return true; +} + /* Helper function for NFSv2 ACL code */ __be32 *nfs2svc_decode_fh(__be32 *p, struct svc_fh *fhp) { @@ -194,14 +210,12 @@ __be32 *nfs2svc_encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *f */
int -nfssvc_decode_fhandle(struct svc_rqst *rqstp, __be32 *p) +nfssvc_decode_fhandleargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_fhandle *args = rqstp->rq_argp;
- p = decode_fh(p, &args->fh); - if (!p) - return 0; - return xdr_argsize_check(rqstp, p); + return svcxdr_decode_fhandle(xdr, &args->fh); }
int diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index edd87688ff863..50466ac6200cc 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -144,7 +144,7 @@ union nfsd_xdrstore { #define NFS2_SVC_XDRSIZE sizeof(union nfsd_xdrstore)
-int nfssvc_decode_fhandle(struct svc_rqst *, __be32 *); +int nfssvc_decode_fhandleargs(struct svc_rqst *, __be32 *); int nfssvc_decode_sattrargs(struct svc_rqst *, __be32 *); int nfssvc_decode_diropargs(struct svc_rqst *, __be32 *); int nfssvc_decode_readargs(struct svc_rqst *, __be32 *);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 8c293ef993c8df0b1bea9ecb0de6eb96dec3ac9d ]
The code that sets up rq_vec is refactored so that it is now adjacent to the nfsd_read() call site where it is used.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsproc.c | 32 ++++++++++++++++++-------------- fs/nfsd/nfsxdr.c | 36 ++++++++++++------------------------ fs/nfsd/xdr.h | 1 - 3 files changed, 30 insertions(+), 39 deletions(-)
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 3cac9972aa83f..c70ae20e54c49 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -171,32 +171,36 @@ nfsd_proc_read(struct svc_rqst *rqstp) { struct nfsd_readargs *argp = rqstp->rq_argp; struct nfsd_readres *resp = rqstp->rq_resp; + unsigned int len; u32 eof; + int v;
dprintk("nfsd: READ %s %d bytes at %d\n", SVCFH_fmt(&argp->fh), argp->count, argp->offset);
+ argp->count = min_t(u32, argp->count, NFSSVC_MAXBLKSIZE_V2); + + v = 0; + len = argp->count; + while (len > 0) { + struct page *page = *(rqstp->rq_next_page++); + + rqstp->rq_vec[v].iov_base = page_address(page); + rqstp->rq_vec[v].iov_len = min_t(unsigned int, len, PAGE_SIZE); + len -= rqstp->rq_vec[v].iov_len; + v++; + } + /* Obtain buffer pointer for payload. 19 is 1 word for * status, 17 words for fattr, and 1 word for the byte count. */ - - if (NFSSVC_MAXBLKSIZE_V2 < argp->count) { - char buf[RPC_MAX_ADDRBUFLEN]; - printk(KERN_NOTICE - "oversized read request from %s (%d bytes)\n", - svc_print_addr(rqstp, buf, sizeof(buf)), - argp->count); - argp->count = NFSSVC_MAXBLKSIZE_V2; - } svc_reserve_auth(rqstp, (19<<2) + argp->count + 4);
resp->count = argp->count; - resp->status = nfsd_read(rqstp, fh_copy(&resp->fh, &argp->fh), - argp->offset, - rqstp->rq_vec, argp->vlen, - &resp->count, - &eof); + fh_copy(&resp->fh, &argp->fh); + resp->status = nfsd_read(rqstp, &resp->fh, argp->offset, + rqstp->rq_vec, v, &resp->count, &eof); if (resp->status == nfs_ok) resp->status = fh_getattr(&resp->fh, &resp->stat); else if (resp->status == nfserr_jukebox) diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index f3189e1be20fa..1eacaa2c13a95 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -246,33 +246,21 @@ nfssvc_decode_diropargs(struct svc_rqst *rqstp, __be32 *p) int nfssvc_decode_readargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_readargs *args = rqstp->rq_argp; - unsigned int len; - int v; - p = decode_fh(p, &args->fh); - if (!p) - return 0; + u32 totalcount;
- args->offset = ntohl(*p++); - len = args->count = ntohl(*p++); - p++; /* totalcount - unused */ - - len = min_t(unsigned int, len, NFSSVC_MAXBLKSIZE_V2); + if (!svcxdr_decode_fhandle(xdr, &args->fh)) + return 0; + if (xdr_stream_decode_u32(xdr, &args->offset) < 0) + return 0; + if (xdr_stream_decode_u32(xdr, &args->count) < 0) + return 0; + /* totalcount is ignored */ + if (xdr_stream_decode_u32(xdr, &totalcount) < 0) + return 0;
- /* set up somewhere to store response. - * We take pages, put them on reslist and include in iovec - */ - v=0; - while (len > 0) { - struct page *p = *(rqstp->rq_next_page++); - - rqstp->rq_vec[v].iov_base = page_address(p); - rqstp->rq_vec[v].iov_len = min_t(unsigned int, len, PAGE_SIZE); - len -= rqstp->rq_vec[v].iov_len; - v++; - } - args->vlen = v; - return xdr_argsize_check(rqstp, p); + return 1; }
int diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index 50466ac6200cc..7c704fa3215eb 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -27,7 +27,6 @@ struct nfsd_readargs { struct svc_fh fh; __u32 offset; __u32 count; - int vlen; };
struct nfsd_writeargs {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit a51b5b737a0be93fae6ea2a18df03ab2359a3f4b ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsxdr.c | 52 +++++++++++++++++++----------------------------- 1 file changed, 21 insertions(+), 31 deletions(-)
diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 1eacaa2c13a95..11d27b219cff2 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -266,46 +266,36 @@ nfssvc_decode_readargs(struct svc_rqst *rqstp, __be32 *p) int nfssvc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_writeargs *args = rqstp->rq_argp; - unsigned int len, hdr, dlen; struct kvec *head = rqstp->rq_arg.head; + struct kvec *tail = rqstp->rq_arg.tail; + u32 beginoffset, totalcount; + size_t remaining;
- p = decode_fh(p, &args->fh); - if (!p) + if (!svcxdr_decode_fhandle(xdr, &args->fh)) return 0; - - p++; /* beginoffset */ - args->offset = ntohl(*p++); /* offset */ - p++; /* totalcount */ - len = args->len = ntohl(*p++); - /* - * The protocol specifies a maximum of 8192 bytes. - */ - if (len > NFSSVC_MAXBLKSIZE_V2) + /* beginoffset is ignored */ + if (xdr_stream_decode_u32(xdr, &beginoffset) < 0) return 0; - - /* - * Check to make sure that we got the right number of - * bytes. - */ - hdr = (void*)p - head->iov_base; - if (hdr > head->iov_len) + if (xdr_stream_decode_u32(xdr, &args->offset) < 0) + return 0; + /* totalcount is ignored */ + if (xdr_stream_decode_u32(xdr, &totalcount) < 0) return 0; - dlen = head->iov_len + rqstp->rq_arg.page_len - hdr;
- /* - * Round the length of the data which was specified up to - * the next multiple of XDR units and then compare that - * against the length which was actually received. - * Note that when RPCSEC/GSS (for example) is used, the - * data buffer can be padded so dlen might be larger - * than required. It must never be smaller. - */ - if (dlen < XDR_QUADLEN(len)*4) + /* opaque data */ + if (xdr_stream_decode_u32(xdr, &args->len) < 0) + return 0; + if (args->len > NFSSVC_MAXBLKSIZE_V2) + return 0; + remaining = head->iov_len + rqstp->rq_arg.page_len + tail->iov_len; + remaining -= xdr_stream_pos(xdr); + if (remaining < xdr_align_size(args->len)) return 0; + args->first.iov_base = xdr->p; + args->first.iov_len = head->iov_len - xdr_stream_pos(xdr);
- args->first.iov_base = (void *)p; - args->first.iov_len = head->iov_len - hdr; return 1; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 1fcbd1c9456ba129d38420e345e91c4b6363db47 ]
If the code that sets up the sink buffer for nfsd_readlink() is moved adjacent to the nfsd_readlink() call site that uses it, then the only argument is a file handle, and the fhandle decoder can be used instead.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsproc.c | 9 +++++---- fs/nfsd/nfsxdr.c | 13 ------------- fs/nfsd/xdr.h | 6 ------ 3 files changed, 5 insertions(+), 23 deletions(-)
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index c70ae20e54c49..6352da0168e04 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -149,14 +149,15 @@ nfsd_proc_lookup(struct svc_rqst *rqstp) static __be32 nfsd_proc_readlink(struct svc_rqst *rqstp) { - struct nfsd_readlinkargs *argp = rqstp->rq_argp; + struct nfsd_fhandle *argp = rqstp->rq_argp; struct nfsd_readlinkres *resp = rqstp->rq_resp; + char *buffer = page_address(*(rqstp->rq_next_page++));
dprintk("nfsd: READLINK %s\n", SVCFH_fmt(&argp->fh));
/* Read the symlink. */ resp->len = NFS_MAXPATHLEN; - resp->status = nfsd_readlink(rqstp, &argp->fh, argp->buffer, &resp->len); + resp->status = nfsd_readlink(rqstp, &argp->fh, buffer, &resp->len);
fh_put(&argp->fh); return rpc_success; @@ -674,9 +675,9 @@ static const struct svc_procedure nfsd_procedures2[18] = { }, [NFSPROC_READLINK] = { .pc_func = nfsd_proc_readlink, - .pc_decode = nfssvc_decode_readlinkargs, + .pc_decode = nfssvc_decode_fhandleargs, .pc_encode = nfssvc_encode_readlinkres, - .pc_argsize = sizeof(struct nfsd_readlinkargs), + .pc_argsize = sizeof(struct nfsd_fhandle), .pc_ressize = sizeof(struct nfsd_readlinkres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+1+NFS_MAXPATHLEN/4, diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 11d27b219cff2..02dd9888d93b2 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -326,19 +326,6 @@ nfssvc_decode_renameargs(struct svc_rqst *rqstp, __be32 *p) return xdr_argsize_check(rqstp, p); }
-int -nfssvc_decode_readlinkargs(struct svc_rqst *rqstp, __be32 *p) -{ - struct nfsd_readlinkargs *args = rqstp->rq_argp; - - p = decode_fh(p, &args->fh); - if (!p) - return 0; - args->buffer = page_address(*(rqstp->rq_next_page++)); - - return xdr_argsize_check(rqstp, p); -} - int nfssvc_decode_linkargs(struct svc_rqst *rqstp, __be32 *p) { diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index 7c704fa3215eb..1338551de828e 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -52,11 +52,6 @@ struct nfsd_renameargs { unsigned int tlen; };
-struct nfsd_readlinkargs { - struct svc_fh fh; - char * buffer; -}; - struct nfsd_linkargs { struct svc_fh ffh; struct svc_fh tfh; @@ -150,7 +145,6 @@ int nfssvc_decode_readargs(struct svc_rqst *, __be32 *); int nfssvc_decode_writeargs(struct svc_rqst *, __be32 *); int nfssvc_decode_createargs(struct svc_rqst *, __be32 *); int nfssvc_decode_renameargs(struct svc_rqst *, __be32 *); -int nfssvc_decode_readlinkargs(struct svc_rqst *, __be32 *); int nfssvc_decode_linkargs(struct svc_rqst *, __be32 *); int nfssvc_decode_symlinkargs(struct svc_rqst *, __be32 *); int nfssvc_decode_readdirargs(struct svc_rqst *, __be32 *);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 788cd46ecf83ee2d561cb4e754e276dc8089b787 ]
Add a helper similar to nfsd3_init_dirlist_pages().
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsproc.c | 29 ++++++++++++++++++----------- fs/nfsd/nfsxdr.c | 2 -- fs/nfsd/xdr.h | 1 - 3 files changed, 18 insertions(+), 14 deletions(-)
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 6352da0168e04..a628ea4d66ead 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -553,6 +553,20 @@ nfsd_proc_rmdir(struct svc_rqst *rqstp) return rpc_success; }
+static void nfsd_init_dirlist_pages(struct svc_rqst *rqstp, + struct nfsd_readdirres *resp, + int count) +{ + count = min_t(u32, count, PAGE_SIZE); + + /* Convert byte count to number of words (i.e. >> 2), + * and reserve room for the NULL ptr & eof flag (-2 words) */ + resp->buflen = (count >> 2) - 2; + + resp->buffer = page_address(*rqstp->rq_next_page); + rqstp->rq_next_page++; +} + /* * Read a portion of a directory. */ @@ -561,31 +575,24 @@ nfsd_proc_readdir(struct svc_rqst *rqstp) { struct nfsd_readdirargs *argp = rqstp->rq_argp; struct nfsd_readdirres *resp = rqstp->rq_resp; - int count; loff_t offset; + __be32 *buffer;
dprintk("nfsd: READDIR %s %d bytes at %d\n", SVCFH_fmt(&argp->fh), argp->count, argp->cookie);
- /* Shrink to the client read size */ - count = (argp->count >> 2) - 2; - - /* Make sure we've room for the NULL ptr & eof flag */ - count -= 2; - if (count < 0) - count = 0; + nfsd_init_dirlist_pages(rqstp, resp, argp->count); + buffer = resp->buffer;
- resp->buffer = argp->buffer; resp->offset = NULL; - resp->buflen = count; resp->common.err = nfs_ok; /* Read directory and encode entries on the fly */ offset = argp->cookie; resp->status = nfsd_readdir(rqstp, &argp->fh, &offset, &resp->common, nfssvc_encode_entry);
- resp->count = resp->buffer - argp->buffer; + resp->count = resp->buffer - buffer; if (resp->offset) *resp->offset = htonl(offset);
diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 02dd9888d93b2..3d72334e16733 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -388,8 +388,6 @@ nfssvc_decode_readdirargs(struct svc_rqst *rqstp, __be32 *p) return 0; args->cookie = ntohl(*p++); args->count = ntohl(*p++); - args->count = min_t(u32, args->count, PAGE_SIZE); - args->buffer = page_address(*(rqstp->rq_next_page++));
return xdr_argsize_check(rqstp, p); } diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index 1338551de828e..ff68643504c3c 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -73,7 +73,6 @@ struct nfsd_readdirargs { struct svc_fh fh; __u32 cookie; __u32 count; - __be32 * buffer; };
struct nfsd_stat {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 8688361ae2edb8f7e61d926dc5000c9a44f29370 ]
As an additional clean up, move code not related to XDR decoding into readdir's .pc_func call out.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsxdr.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 3d72334e16733..7b33093f8d8b4 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -381,15 +381,17 @@ nfssvc_decode_symlinkargs(struct svc_rqst *rqstp, __be32 *p) int nfssvc_decode_readdirargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_readdirargs *args = rqstp->rq_argp;
- p = decode_fh(p, &args->fh); - if (!p) + if (!svcxdr_decode_fhandle(xdr, &args->fh)) + return 0; + if (xdr_stream_decode_u32(xdr, &args->cookie) < 0) + return 0; + if (xdr_stream_decode_u32(xdr, &args->count) < 0) return 0; - args->cookie = ntohl(*p++); - args->count = ntohl(*p++);
- return xdr_argsize_check(rqstp, p); + return 1; }
/*
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 6d742c1864c18f143ea2031f1ed66bcd8f4812de ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsxdr.c | 39 ++++++++++++++++++++++++++++++++++----- 1 file changed, 34 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 7b33093f8d8b4..00a7db8548ebf 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -86,6 +86,38 @@ decode_filename(__be32 *p, char **namp, unsigned int *lenp) return p; }
+static bool +svcxdr_decode_filename(struct xdr_stream *xdr, char **name, unsigned int *len) +{ + u32 size, i; + __be32 *p; + char *c; + + if (xdr_stream_decode_u32(xdr, &size) < 0) + return false; + if (size == 0 || size > NFS_MAXNAMLEN) + return false; + p = xdr_inline_decode(xdr, size); + if (!p) + return false; + + *len = size; + *name = (char *)p; + for (i = 0, c = *name; i < size; i++, c++) + if (*c == '\0' || *c == '/') + return false; + + return true; +} + +static bool +svcxdr_decode_diropargs(struct xdr_stream *xdr, struct svc_fh *fhp, + char **name, unsigned int *len) +{ + return svcxdr_decode_fhandle(xdr, fhp) && + svcxdr_decode_filename(xdr, name, len); +} + static __be32 * decode_sattr(__be32 *p, struct iattr *iap, struct user_namespace *userns) { @@ -234,13 +266,10 @@ nfssvc_decode_sattrargs(struct svc_rqst *rqstp, __be32 *p) int nfssvc_decode_diropargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_diropargs *args = rqstp->rq_argp;
- if (!(p = decode_fh(p, &args->fh)) - || !(p = decode_filename(p, &args->name, &args->len))) - return 0; - - return xdr_argsize_check(rqstp, p); + return svcxdr_decode_diropargs(xdr, &args->fh, &args->name, &args->len); }
int
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 62aa557efb81ea3339fabe7f5b1a343e742bbbdf ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsxdr.c | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 00a7db8548ebf..d4f4729c7b1c0 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -344,15 +344,13 @@ nfssvc_decode_createargs(struct svc_rqst *rqstp, __be32 *p) int nfssvc_decode_renameargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_renameargs *args = rqstp->rq_argp;
- if (!(p = decode_fh(p, &args->ffh)) - || !(p = decode_filename(p, &args->fname, &args->flen)) - || !(p = decode_fh(p, &args->tfh)) - || !(p = decode_filename(p, &args->tname, &args->tlen))) - return 0; - - return xdr_argsize_check(rqstp, p); + return svcxdr_decode_diropargs(xdr, &args->ffh, + &args->fname, &args->flen) && + svcxdr_decode_diropargs(xdr, &args->tfh, + &args->tname, &args->tlen); }
int
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 77edcdf91f6245a9881b84e4e101738148bd039a ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsxdr.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index d4f4729c7b1c0..3d0fe03a3fb94 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -356,14 +356,12 @@ nfssvc_decode_renameargs(struct svc_rqst *rqstp, __be32 *p) int nfssvc_decode_linkargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_linkargs *args = rqstp->rq_argp;
- if (!(p = decode_fh(p, &args->ffh)) - || !(p = decode_fh(p, &args->tfh)) - || !(p = decode_filename(p, &args->tname, &args->tlen))) - return 0; - - return xdr_argsize_check(rqstp, p); + return svcxdr_decode_fhandle(xdr, &args->ffh) && + svcxdr_decode_diropargs(xdr, &args->tfh, + &args->tname, &args->tlen); }
int
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 2fdd6bd293b9e7dda61220538b2759fbf06f5af0 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsxdr.c | 82 ++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 76 insertions(+), 6 deletions(-)
diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 3d0fe03a3fb94..6c87ea8f38769 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -173,6 +173,79 @@ decode_sattr(__be32 *p, struct iattr *iap, struct user_namespace *userns) return p; }
+static bool +svcxdr_decode_sattr(struct svc_rqst *rqstp, struct xdr_stream *xdr, + struct iattr *iap) +{ + u32 tmp1, tmp2; + __be32 *p; + + p = xdr_inline_decode(xdr, XDR_UNIT * 8); + if (!p) + return false; + + iap->ia_valid = 0; + + /* + * Some Sun clients put 0xffff in the mode field when they + * mean 0xffffffff. + */ + tmp1 = be32_to_cpup(p++); + if (tmp1 != (u32)-1 && tmp1 != 0xffff) { + iap->ia_valid |= ATTR_MODE; + iap->ia_mode = tmp1; + } + + tmp1 = be32_to_cpup(p++); + if (tmp1 != (u32)-1) { + iap->ia_uid = make_kuid(nfsd_user_namespace(rqstp), tmp1); + if (uid_valid(iap->ia_uid)) + iap->ia_valid |= ATTR_UID; + } + + tmp1 = be32_to_cpup(p++); + if (tmp1 != (u32)-1) { + iap->ia_gid = make_kgid(nfsd_user_namespace(rqstp), tmp1); + if (gid_valid(iap->ia_gid)) + iap->ia_valid |= ATTR_GID; + } + + tmp1 = be32_to_cpup(p++); + if (tmp1 != (u32)-1) { + iap->ia_valid |= ATTR_SIZE; + iap->ia_size = tmp1; + } + + tmp1 = be32_to_cpup(p++); + tmp2 = be32_to_cpup(p++); + if (tmp1 != (u32)-1 && tmp2 != (u32)-1) { + iap->ia_valid |= ATTR_ATIME | ATTR_ATIME_SET; + iap->ia_atime.tv_sec = tmp1; + iap->ia_atime.tv_nsec = tmp2 * NSEC_PER_USEC; + } + + tmp1 = be32_to_cpup(p++); + tmp2 = be32_to_cpup(p++); + if (tmp1 != (u32)-1 && tmp2 != (u32)-1) { + iap->ia_valid |= ATTR_MTIME | ATTR_MTIME_SET; + iap->ia_mtime.tv_sec = tmp1; + iap->ia_mtime.tv_nsec = tmp2 * NSEC_PER_USEC; + /* + * Passing the invalid value useconds=1000000 for mtime + * is a Sun convention for "set both mtime and atime to + * current server time". It's needed to make permissions + * checks for the "touch" program across v2 mounts to + * Solaris and Irix boxes work correctly. See description of + * sattr in section 6.1 of "NFS Illustrated" by + * Brent Callaghan, Addison-Wesley, ISBN 0-201-32750-5 + */ + if (tmp2 == 1000000) + iap->ia_valid &= ~(ATTR_ATIME_SET|ATTR_MTIME_SET); + } + + return true; +} + static __be32 * encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp, struct kstat *stat) @@ -253,14 +326,11 @@ nfssvc_decode_fhandleargs(struct svc_rqst *rqstp, __be32 *p) int nfssvc_decode_sattrargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_sattrargs *args = rqstp->rq_argp;
- p = decode_fh(p, &args->fh); - if (!p) - return 0; - p = decode_sattr(p, &args->attrs, nfsd_user_namespace(rqstp)); - - return xdr_argsize_check(rqstp, p); + return svcxdr_decode_fhandle(xdr, &args->fh) && + svcxdr_decode_sattr(rqstp, xdr, &args->attrs); }
int
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 7dcf65b91ecaf60ce593e7859ae2b29b7c46ccbd ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsxdr.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 6c87ea8f38769..2e2806cbe7b88 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -401,14 +401,12 @@ nfssvc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p) int nfssvc_decode_createargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_createargs *args = rqstp->rq_argp;
- if ( !(p = decode_fh(p, &args->fh)) - || !(p = decode_filename(p, &args->name, &args->len))) - return 0; - p = decode_sattr(p, &args->attrs, nfsd_user_namespace(rqstp)); - - return xdr_argsize_check(rqstp, p); + return svcxdr_decode_diropargs(xdr, &args->fh, + &args->name, &args->len) && + svcxdr_decode_sattr(rqstp, xdr, &args->attrs); }
int
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 09f75a5375ac61f4adb94da0accc1cfc60eb4f2b ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsxdr.c | 113 +++++------------------------------------------ 1 file changed, 10 insertions(+), 103 deletions(-)
diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 2e2806cbe7b88..f2cb4794aeaf6 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -66,26 +66,6 @@ encode_fh(__be32 *p, struct svc_fh *fhp) return p + (NFS_FHSIZE>> 2); }
-/* - * Decode a file name and make sure that the path contains - * no slashes or null bytes. - */ -static __be32 * -decode_filename(__be32 *p, char **namp, unsigned int *lenp) -{ - char *name; - unsigned int i; - - if ((p = xdr_decode_string_inplace(p, namp, lenp, NFS_MAXNAMLEN)) != NULL) { - for (i = 0, name = *namp; i < *lenp; i++, name++) { - if (*name == '\0' || *name == '/') - return NULL; - } - } - - return p; -} - static bool svcxdr_decode_filename(struct xdr_stream *xdr, char **name, unsigned int *len) { @@ -118,61 +98,6 @@ svcxdr_decode_diropargs(struct xdr_stream *xdr, struct svc_fh *fhp, svcxdr_decode_filename(xdr, name, len); }
-static __be32 * -decode_sattr(__be32 *p, struct iattr *iap, struct user_namespace *userns) -{ - u32 tmp, tmp1; - - iap->ia_valid = 0; - - /* Sun client bug compatibility check: some sun clients seem to - * put 0xffff in the mode field when they mean 0xffffffff. - * Quoting the 4.4BSD nfs server code: Nah nah nah nah na nah. - */ - if ((tmp = ntohl(*p++)) != (u32)-1 && tmp != 0xffff) { - iap->ia_valid |= ATTR_MODE; - iap->ia_mode = tmp; - } - if ((tmp = ntohl(*p++)) != (u32)-1) { - iap->ia_uid = make_kuid(userns, tmp); - if (uid_valid(iap->ia_uid)) - iap->ia_valid |= ATTR_UID; - } - if ((tmp = ntohl(*p++)) != (u32)-1) { - iap->ia_gid = make_kgid(userns, tmp); - if (gid_valid(iap->ia_gid)) - iap->ia_valid |= ATTR_GID; - } - if ((tmp = ntohl(*p++)) != (u32)-1) { - iap->ia_valid |= ATTR_SIZE; - iap->ia_size = tmp; - } - tmp = ntohl(*p++); tmp1 = ntohl(*p++); - if (tmp != (u32)-1 && tmp1 != (u32)-1) { - iap->ia_valid |= ATTR_ATIME | ATTR_ATIME_SET; - iap->ia_atime.tv_sec = tmp; - iap->ia_atime.tv_nsec = tmp1 * 1000; - } - tmp = ntohl(*p++); tmp1 = ntohl(*p++); - if (tmp != (u32)-1 && tmp1 != (u32)-1) { - iap->ia_valid |= ATTR_MTIME | ATTR_MTIME_SET; - iap->ia_mtime.tv_sec = tmp; - iap->ia_mtime.tv_nsec = tmp1 * 1000; - /* - * Passing the invalid value useconds=1000000 for mtime - * is a Sun convention for "set both mtime and atime to - * current server time". It's needed to make permissions - * checks for the "touch" program across v2 mounts to - * Solaris and Irix boxes work correctly. See description of - * sattr in section 6.1 of "NFS Illustrated" by - * Brent Callaghan, Addison-Wesley, ISBN 0-201-32750-5 - */ - if (tmp1 == 1000000) - iap->ia_valid &= ~(ATTR_ATIME_SET|ATTR_MTIME_SET); - } - return p; -} - static bool svcxdr_decode_sattr(struct svc_rqst *rqstp, struct xdr_stream *xdr, struct iattr *iap) @@ -435,40 +360,22 @@ nfssvc_decode_linkargs(struct svc_rqst *rqstp, __be32 *p) int nfssvc_decode_symlinkargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_symlinkargs *args = rqstp->rq_argp; - char *base = (char *)p; - size_t xdrlen; + struct kvec *head = rqstp->rq_arg.head;
- if ( !(p = decode_fh(p, &args->ffh)) - || !(p = decode_filename(p, &args->fname, &args->flen))) + if (!svcxdr_decode_diropargs(xdr, &args->ffh, &args->fname, &args->flen)) + return 0; + if (xdr_stream_decode_u32(xdr, &args->tlen) < 0) return 0; - - args->tlen = ntohl(*p++); if (args->tlen == 0) return 0;
- args->first.iov_base = p; - args->first.iov_len = rqstp->rq_arg.head[0].iov_len; - args->first.iov_len -= (char *)p - base; - - /* This request is never larger than a page. Therefore, - * transport will deliver either: - * 1. pathname in the pagelist -> sattr is in the tail. - * 2. everything in the head buffer -> sattr is in the head. - */ - if (rqstp->rq_arg.page_len) { - if (args->tlen != rqstp->rq_arg.page_len) - return 0; - p = rqstp->rq_arg.tail[0].iov_base; - } else { - xdrlen = XDR_QUADLEN(args->tlen); - if (xdrlen > args->first.iov_len - (8 * sizeof(__be32))) - return 0; - p += xdrlen; - } - decode_sattr(p, &args->attrs, nfsd_user_namespace(rqstp)); - - return 1; + args->first.iov_len = head->iov_len - xdr_stream_pos(xdr); + args->first.iov_base = xdr_inline_decode(xdr, args->tlen); + if (!args->first.iov_base) + return 0; + return svcxdr_decode_sattr(rqstp, xdr, &args->attrs); }
int
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 5650682e16f41722f735b7beeb2dbc3411dfbeb6 ]
Now that the argument decoders for NFSv2 and NFSv3 use the xdr_stream mechanism, the version-specific length checking logic in nfsd_dispatch() is no longer necessary.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfssvc.c | 34 ---------------------------------- 1 file changed, 34 deletions(-)
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 423410cc02145..6c1d70935ea81 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -988,37 +988,6 @@ nfsd(void *vrqstp) return 0; }
-/* - * A write procedure can have a large argument, and a read procedure can - * have a large reply, but no NFSv2 or NFSv3 procedure has argument and - * reply that can both be larger than a page. The xdr code has taken - * advantage of this assumption to be a sloppy about bounds checking in - * some cases. Pending a rewrite of the NFSv2/v3 xdr code to fix that - * problem, we enforce these assumptions here: - */ -static bool nfs_request_too_big(struct svc_rqst *rqstp, - const struct svc_procedure *proc) -{ - /* - * The ACL code has more careful bounds-checking and is not - * susceptible to this problem: - */ - if (rqstp->rq_prog != NFS_PROGRAM) - return false; - /* - * Ditto NFSv4 (which can in theory have argument and reply both - * more than a page): - */ - if (rqstp->rq_vers >= 4) - return false; - /* The reply will be small, we're OK: */ - if (proc->pc_xdrressize > 0 && - proc->pc_xdrressize < XDR_QUADLEN(PAGE_SIZE)) - return false; - - return rqstp->rq_arg.len > PAGE_SIZE; -} - /** * nfsd_dispatch - Process an NFS or NFSACL Request * @rqstp: incoming request @@ -1037,9 +1006,6 @@ int nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp) struct kvec *resv = &rqstp->rq_res.head[0]; __be32 *p;
- if (nfs_request_too_big(rqstp, proc)) - goto out_decode_err; - /* * Give the xdr decoder a chance to change this if it wants * (necessary in the NFSv4.0 compound case)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 635a45d34706400c59c3b18ca9fccba195147bda ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs2acl.c | 10 +++++----- fs/nfsd/nfsxdr.c | 11 ++++++++++- fs/nfsd/xdr.h | 1 + fs/nfsd/xdr3.h | 2 +- 4 files changed, 17 insertions(+), 7 deletions(-)
diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index 899762da23c92..df2e145cfab0d 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -188,17 +188,17 @@ static __be32 nfsacld_proc_access(struct svc_rqst *rqstp)
static int nfsaclsvc_decode_getaclargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_getaclargs *argp = rqstp->rq_argp;
- p = nfs2svc_decode_fh(p, &argp->fh); - if (!p) + if (!svcxdr_decode_fhandle(xdr, &argp->fh)) + return 0; + if (xdr_stream_decode_u32(xdr, &argp->mask) < 0) return 0; - argp->mask = ntohl(*p); p++;
- return xdr_argsize_check(rqstp, p); + return 1; }
- static int nfsaclsvc_decode_setaclargs(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_setaclargs *argp = rqstp->rq_argp; diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index f2cb4794aeaf6..5ab9fc14816c2 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -38,7 +38,16 @@ decode_fh(__be32 *p, struct svc_fh *fhp) return p + (NFS_FHSIZE >> 2); }
-static bool +/** + * svcxdr_decode_fhandle - Decode an NFSv2 file handle + * @xdr: XDR stream positioned at an encoded NFSv2 FH + * @fhp: OUT: filled-in server file handle + * + * Return values: + * %false: The encoded file handle was not valid + * %true: @fhp has been initialized + */ +bool svcxdr_decode_fhandle(struct xdr_stream *xdr, struct svc_fh *fhp) { __be32 *p; diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index ff68643504c3c..77afad72c2aa1 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -165,5 +165,6 @@ void nfssvc_release_readres(struct svc_rqst *rqstp); /* Helper functions for NFSv2 ACL code */ __be32 *nfs2svc_encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp, struct kstat *stat); __be32 *nfs2svc_decode_fh(__be32 *p, struct svc_fh *fhp); +bool svcxdr_decode_fhandle(struct xdr_stream *xdr, struct svc_fh *fhp);
#endif /* LINUX_NFSD_H */ diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 64af5b01c5d7b..43db4206cd254 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -102,7 +102,7 @@ struct nfsd3_commitargs {
struct nfsd3_getaclargs { struct svc_fh fh; - int mask; + __u32 mask; };
struct posix_acl;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 6bb844b4eb6e3b109a2fdaffb60e6da722dc4356 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs_common/nfsacl.c | 52 ++++++++++++++++++++++++++++++++++++++++++ include/linux/nfsacl.h | 3 +++ 2 files changed, 55 insertions(+)
diff --git a/fs/nfs_common/nfsacl.c b/fs/nfs_common/nfsacl.c index d056ad2fdefd6..79c563c1a5e84 100644 --- a/fs/nfs_common/nfsacl.c +++ b/fs/nfs_common/nfsacl.c @@ -295,3 +295,55 @@ int nfsacl_decode(struct xdr_buf *buf, unsigned int base, unsigned int *aclcnt, nfsacl_desc.desc.array_len; } EXPORT_SYMBOL_GPL(nfsacl_decode); + +/** + * nfs_stream_decode_acl - Decode an NFSv3 ACL + * + * @xdr: an xdr_stream positioned at an encoded ACL + * @aclcnt: OUT: count of ACEs in decoded posix_acl + * @pacl: OUT: a dynamically-allocated buffer containing the decoded posix_acl + * + * Return values: + * %false: The encoded ACL is not valid + * %true: @pacl contains a decoded ACL, and @xdr is advanced + * + * On a successful return, caller must release *pacl using posix_acl_release(). + */ +bool nfs_stream_decode_acl(struct xdr_stream *xdr, unsigned int *aclcnt, + struct posix_acl **pacl) +{ + const size_t elem_size = XDR_UNIT * 3; + struct nfsacl_decode_desc nfsacl_desc = { + .desc = { + .elem_size = elem_size, + .xcode = pacl ? xdr_nfsace_decode : NULL, + }, + }; + unsigned int base; + u32 entries; + + if (xdr_stream_decode_u32(xdr, &entries) < 0) + return false; + if (entries > NFS_ACL_MAX_ENTRIES) + return false; + + base = xdr_stream_pos(xdr); + if (!xdr_inline_decode(xdr, XDR_UNIT + elem_size * entries)) + return false; + nfsacl_desc.desc.array_maxlen = entries; + if (xdr_decode_array2(xdr->buf, base, &nfsacl_desc.desc)) + return false; + + if (pacl) { + if (entries != nfsacl_desc.desc.array_len || + posix_acl_from_nfsacl(nfsacl_desc.acl) != 0) { + posix_acl_release(nfsacl_desc.acl); + return false; + } + *pacl = nfsacl_desc.acl; + } + if (aclcnt) + *aclcnt = entries; + return true; +} +EXPORT_SYMBOL_GPL(nfs_stream_decode_acl); diff --git a/include/linux/nfsacl.h b/include/linux/nfsacl.h index 103d446953234..0ba99c5136491 100644 --- a/include/linux/nfsacl.h +++ b/include/linux/nfsacl.h @@ -38,5 +38,8 @@ nfsacl_encode(struct xdr_buf *buf, unsigned int base, struct inode *inode, extern int nfsacl_decode(struct xdr_buf *buf, unsigned int base, unsigned int *aclcnt, struct posix_acl **pacl); +extern bool +nfs_stream_decode_acl(struct xdr_stream *xdr, unsigned int *aclcnt, + struct posix_acl **pacl);
#endif /* __LINUX_NFSACL_H */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 427eab3ba22891845265f9a3846de6ac152ec836 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs2acl.c | 29 ++++++++++++----------------- fs/nfsd/xdr3.h | 2 +- 2 files changed, 13 insertions(+), 18 deletions(-)
diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index df2e145cfab0d..123820ec79d37 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -201,28 +201,23 @@ static int nfsaclsvc_decode_getaclargs(struct svc_rqst *rqstp, __be32 *p)
static int nfsaclsvc_decode_setaclargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_setaclargs *argp = rqstp->rq_argp; - struct kvec *head = rqstp->rq_arg.head; - unsigned int base; - int n;
- p = nfs2svc_decode_fh(p, &argp->fh); - if (!p) + if (!svcxdr_decode_fhandle(xdr, &argp->fh)) + return 0; + if (xdr_stream_decode_u32(xdr, &argp->mask) < 0) + return 0; + if (argp->mask & ~NFS_ACL_MASK) return 0; - argp->mask = ntohl(*p++); - if (argp->mask & ~NFS_ACL_MASK || - !xdr_argsize_check(rqstp, p)) + if (!nfs_stream_decode_acl(xdr, NULL, (argp->mask & NFS_ACL) ? + &argp->acl_access : NULL)) + return 0; + if (!nfs_stream_decode_acl(xdr, NULL, (argp->mask & NFS_DFACL) ? + &argp->acl_default : NULL)) return 0;
- base = (char *)p - (char *)head->iov_base; - n = nfsacl_decode(&rqstp->rq_arg, base, NULL, - (argp->mask & NFS_ACL) ? - &argp->acl_access : NULL); - if (n > 0) - n = nfsacl_decode(&rqstp->rq_arg, base + n, NULL, - (argp->mask & NFS_DFACL) ? - &argp->acl_default : NULL); - return (n > 0); + return 1; }
static int nfsaclsvc_decode_fhandleargs(struct svc_rqst *rqstp, __be32 *p) diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 43db4206cd254..5afb3ce4f0622 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -108,7 +108,7 @@ struct nfsd3_getaclargs { struct posix_acl; struct nfsd3_setaclargs { struct svc_fh fh; - int mask; + __u32 mask; struct posix_acl *acl_access; struct posix_acl *acl_default; };
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 571d31f37a57729c9d3463b5a692a84e619b408a ]
Since the ACL GETATTR procedure is the same as the normal GETATTR procedure, simply re-use nfssvc_decode_fhandleargs.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs2acl.c | 12 +----------- 1 file changed, 1 insertion(+), 11 deletions(-)
diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index 123820ec79d37..0274348f6679e 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -220,16 +220,6 @@ static int nfsaclsvc_decode_setaclargs(struct svc_rqst *rqstp, __be32 *p) return 1; }
-static int nfsaclsvc_decode_fhandleargs(struct svc_rqst *rqstp, __be32 *p) -{ - struct nfsd_fhandle *argp = rqstp->rq_argp; - - p = nfs2svc_decode_fh(p, &argp->fh); - if (!p) - return 0; - return xdr_argsize_check(rqstp, p); -} - static int nfsaclsvc_decode_accessargs(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_accessargs *argp = rqstp->rq_argp; @@ -392,7 +382,7 @@ static const struct svc_procedure nfsd_acl_procedures2[5] = { }, [ACLPROC2_GETATTR] = { .pc_func = nfsacld_proc_getattr, - .pc_decode = nfsaclsvc_decode_fhandleargs, + .pc_decode = nfssvc_decode_fhandleargs, .pc_encode = nfsaclsvc_encode_attrstatres, .pc_release = nfsaclsvc_release_attrstat, .pc_argsize = sizeof(struct nfsd_fhandle),
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 64063892efc1daa3a48882673811ff327ba75ed5 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs2acl.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index 0274348f6679e..7eeac5b81c200 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -222,14 +222,15 @@ static int nfsaclsvc_decode_setaclargs(struct svc_rqst *rqstp, __be32 *p)
static int nfsaclsvc_decode_accessargs(struct svc_rqst *rqstp, __be32 *p) { - struct nfsd3_accessargs *argp = rqstp->rq_argp; + struct xdr_stream *xdr = &rqstp->rq_arg_stream; + struct nfsd3_accessargs *args = rqstp->rq_argp;
- p = nfs2svc_decode_fh(p, &argp->fh); - if (!p) + if (!svcxdr_decode_fhandle(xdr, &args->fh)) + return 0; + if (xdr_stream_decode_u32(xdr, &args->access) < 0) return 0; - argp->access = ntohl(*p++);
- return xdr_argsize_check(rqstp, p); + return 1; }
/*
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit baadce65d6ee3032b921d9c043ba808bc69d6b13 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsxdr.c | 18 ------------------ fs/nfsd/xdr.h | 1 - 2 files changed, 19 deletions(-)
diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 5ab9fc14816c2..5d79ef6a0c7fc 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -26,18 +26,6 @@ static u32 nfs_ftypes[] = { * Basic NFSv2 data types (RFC 1094 Section 2.3) */
-static __be32 * -decode_fh(__be32 *p, struct svc_fh *fhp) -{ - fh_init(fhp, NFS_FHSIZE); - memcpy(&fhp->fh_handle.fh_base, p, NFS_FHSIZE); - fhp->fh_handle.fh_size = NFS_FHSIZE; - - /* FIXME: Look up export pointer here and verify - * Sun Secure RPC if requested */ - return p + (NFS_FHSIZE >> 2); -} - /** * svcxdr_decode_fhandle - Decode an NFSv2 file handle * @xdr: XDR stream positioned at an encoded NFSv2 FH @@ -62,12 +50,6 @@ svcxdr_decode_fhandle(struct xdr_stream *xdr, struct svc_fh *fhp) return true; }
-/* Helper function for NFSv2 ACL code */ -__be32 *nfs2svc_decode_fh(__be32 *p, struct svc_fh *fhp) -{ - return decode_fh(p, fhp); -} - static __be32 * encode_fh(__be32 *p, struct svc_fh *fhp) { diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index 77afad72c2aa1..b92f1acec9e77 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -164,7 +164,6 @@ void nfssvc_release_readres(struct svc_rqst *rqstp);
/* Helper functions for NFSv2 ACL code */ __be32 *nfs2svc_encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp, struct kstat *stat); -__be32 *nfs2svc_decode_fh(__be32 *p, struct svc_fh *fhp); bool svcxdr_decode_fhandle(struct xdr_stream *xdr, struct svc_fh *fhp);
#endif /* LINUX_NFSD_H */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 05027eafc266487c6e056d10ab352861df95b5d4 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3acl.c | 11 ++++++----- fs/nfsd/nfs3xdr.c | 11 ++++++++++- fs/nfsd/xdr3.h | 1 + 3 files changed, 17 insertions(+), 6 deletions(-)
diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c index 9e1a92fb97712..addb0d7d5500f 100644 --- a/fs/nfsd/nfs3acl.c +++ b/fs/nfsd/nfs3acl.c @@ -124,19 +124,20 @@ static __be32 nfsd3_proc_setacl(struct svc_rqst *rqstp) /* * XDR decode functions */ + static int nfs3svc_decode_getaclargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_getaclargs *args = rqstp->rq_argp;
- p = nfs3svc_decode_fh(p, &args->fh); - if (!p) + if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) + return 0; + if (xdr_stream_decode_u32(xdr, &args->mask) < 0) return 0; - args->mask = ntohl(*p); p++;
- return xdr_argsize_check(rqstp, p); + return 1; }
- static int nfs3svc_decode_setaclargs(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_setaclargs *args = rqstp->rq_argp; diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index a30b418a51160..aa55d0ba2a548 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -53,7 +53,16 @@ svcxdr_decode_nfstime3(struct xdr_stream *xdr, struct timespec64 *timep) return true; }
-static bool +/** + * svcxdr_decode_nfs_fh3 - Decode an NFSv3 file handle + * @xdr: XDR stream positioned at an undecoded NFSv3 FH + * @fhp: OUT: filled-in server file handle + * + * Return values: + * %false: The encoded file handle was not valid + * %true: @fhp has been initialized + */ +bool svcxdr_decode_nfs_fh3(struct xdr_stream *xdr, struct svc_fh *fhp) { __be32 *p; diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 5afb3ce4f0622..7456aee74f3df 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -308,6 +308,7 @@ int nfs3svc_encode_entry_plus(void *, const char *name, __be32 *nfs3svc_encode_post_op_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp); __be32 *nfs3svc_decode_fh(__be32 *p, struct svc_fh *fhp); +bool svcxdr_decode_nfs_fh3(struct xdr_stream *xdr, struct svc_fh *fhp);
#endif /* _LINUX_NFSD_XDR3_H */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 68519ff2a1c72c67fcdc4b81671acda59f420af9 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3acl.c | 31 +++++++++++++------------------ 1 file changed, 13 insertions(+), 18 deletions(-)
diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c index addb0d7d5500f..a568b842e9ebe 100644 --- a/fs/nfsd/nfs3acl.c +++ b/fs/nfsd/nfs3acl.c @@ -140,28 +140,23 @@ static int nfs3svc_decode_getaclargs(struct svc_rqst *rqstp, __be32 *p)
static int nfs3svc_decode_setaclargs(struct svc_rqst *rqstp, __be32 *p) { - struct nfsd3_setaclargs *args = rqstp->rq_argp; - struct kvec *head = rqstp->rq_arg.head; - unsigned int base; - int n; + struct xdr_stream *xdr = &rqstp->rq_arg_stream; + struct nfsd3_setaclargs *argp = rqstp->rq_argp;
- p = nfs3svc_decode_fh(p, &args->fh); - if (!p) + if (!svcxdr_decode_nfs_fh3(xdr, &argp->fh)) + return 0; + if (xdr_stream_decode_u32(xdr, &argp->mask) < 0) return 0; - args->mask = ntohl(*p++); - if (args->mask & ~NFS_ACL_MASK || - !xdr_argsize_check(rqstp, p)) + if (argp->mask & ~NFS_ACL_MASK) + return 0; + if (!nfs_stream_decode_acl(xdr, NULL, (argp->mask & NFS_ACL) ? + &argp->acl_access : NULL)) + return 0; + if (!nfs_stream_decode_acl(xdr, NULL, (argp->mask & NFS_DFACL) ? + &argp->acl_default : NULL)) return 0;
- base = (char *)p - (char *)head->iov_base; - n = nfsacl_decode(&rqstp->rq_arg, base, NULL, - (args->mask & NFS_ACL) ? - &args->acl_access : NULL); - if (n > 0) - n = nfsacl_decode(&rqstp->rq_arg, base + n, NULL, - (args->mask & NFS_DFACL) ? - &args->acl_default : NULL); - return (n > 0); + return 1; }
/*
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 9cee763ee654ce8622d673b8e32687d738e24ace ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3xdr.c | 20 -------------------- fs/nfsd/xdr3.h | 2 -- 2 files changed, 22 deletions(-)
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index aa55d0ba2a548..00a96054280a6 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -82,26 +82,6 @@ svcxdr_decode_nfs_fh3(struct xdr_stream *xdr, struct svc_fh *fhp) return true; }
-static __be32 * -decode_fh(__be32 *p, struct svc_fh *fhp) -{ - unsigned int size; - fh_init(fhp, NFS3_FHSIZE); - size = ntohl(*p++); - if (size > NFS3_FHSIZE) - return NULL; - - memcpy(&fhp->fh_handle.fh_base, p, size); - fhp->fh_handle.fh_size = size; - return p + XDR_QUADLEN(size); -} - -/* Helper function for NFSv3 ACL code */ -__be32 *nfs3svc_decode_fh(__be32 *p, struct svc_fh *fhp) -{ - return decode_fh(p, fhp); -} - static __be32 * encode_fh(__be32 *p, struct svc_fh *fhp) { diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 7456aee74f3df..3e1578953f544 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -307,8 +307,6 @@ int nfs3svc_encode_entry_plus(void *, const char *name, /* Helper functions for NFSv3 ACL code */ __be32 *nfs3svc_encode_post_op_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp); -__be32 *nfs3svc_decode_fh(__be32 *p, struct svc_fh *fhp); bool svcxdr_decode_nfs_fh3(struct xdr_stream *xdr, struct svc_fh *fhp);
- #endif /* _LINUX_NFSD_XDR3_H */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit 1b76d1df1a3683b6b23cd1c813d13c5e6a9d35e5 ]
Commit 501cb1849f86 ("nfsd: rip out the raparms cache") removed the code that updates read-ahead cache stats counters, commit 8bbfa9f3889b ("knfsd: remove the nfsd thread busy histogram") removed code that updates the thread busy stats counters back in 2009 and code that updated filehandle cache stats was removed back in 2002.
Remove the unused stats counters from nfsd_stats struct and print hardcoded zeros in /proc/net/rpc/nfsd.
Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/stats.c | 41 ++++++++++++++++------------------------- fs/nfsd/stats.h | 10 ---------- 2 files changed, 16 insertions(+), 35 deletions(-)
diff --git a/fs/nfsd/stats.c b/fs/nfsd/stats.c index b1bc582b0493e..e928e224205ac 100644 --- a/fs/nfsd/stats.c +++ b/fs/nfsd/stats.c @@ -7,16 +7,14 @@ * Format: * rc <hits> <misses> <nocache> * Statistsics for the reply cache - * fh <stale> <total-lookups> <anonlookups> <dir-not-in-dcache> <nondir-not-in-dcache> + * fh <stale> <deprecated filehandle cache stats> * statistics for filehandle lookup * io <bytes-read> <bytes-written> * statistics for IO throughput - * th <threads> <fullcnt> <10%-20%> <20%-30%> ... <90%-100%> <100%> - * time (seconds) when nfsd thread usage above thresholds - * and number of times that all threads were in use - * ra cache-size <10% <20% <30% ... <100% not-found - * number of times that read-ahead entry was found that deep in - * the cache. + * th <threads> <deprecated thread usage histogram stats> + * number of threads + * ra <deprecated ra-cache stats> + * * plus generic RPC stats (see net/sunrpc/stats.c) * * Copyright (C) 1995, 1996, 1997 Olaf Kirch okir@monad.swb.de @@ -38,31 +36,24 @@ static int nfsd_proc_show(struct seq_file *seq, void *v) { int i;
- seq_printf(seq, "rc %u %u %u\nfh %u %u %u %u %u\nio %u %u\n", + seq_printf(seq, "rc %u %u %u\nfh %u 0 0 0 0\nio %u %u\n", nfsdstats.rchits, nfsdstats.rcmisses, nfsdstats.rcnocache, nfsdstats.fh_stale, - nfsdstats.fh_lookup, - nfsdstats.fh_anon, - nfsdstats.fh_nocache_dir, - nfsdstats.fh_nocache_nondir, nfsdstats.io_read, nfsdstats.io_write); + /* thread usage: */ - seq_printf(seq, "th %u %u", nfsdstats.th_cnt, nfsdstats.th_fullcnt); - for (i=0; i<10; i++) { - unsigned int jifs = nfsdstats.th_usage[i]; - unsigned int sec = jifs / HZ, msec = (jifs % HZ)*1000/HZ; - seq_printf(seq, " %u.%03u", sec, msec); - } - - /* newline and ra-cache */ - seq_printf(seq, "\nra %u", nfsdstats.ra_size); - for (i=0; i<11; i++) - seq_printf(seq, " %u", nfsdstats.ra_depth[i]); - seq_putc(seq, '\n'); - + seq_printf(seq, "th %u 0", nfsdstats.th_cnt); + + /* deprecated thread usage histogram stats */ + for (i = 0; i < 10; i++) + seq_puts(seq, " 0.000"); + + /* deprecated ra-cache stats */ + seq_puts(seq, "\nra 0 0 0 0 0 0 0 0 0 0 0 0\n"); + /* show my rpc info */ svc_seq_show(seq, &nfsd_svcstats);
diff --git a/fs/nfsd/stats.h b/fs/nfsd/stats.h index b23fdac698201..5e3cdf21556a1 100644 --- a/fs/nfsd/stats.h +++ b/fs/nfsd/stats.h @@ -15,19 +15,9 @@ struct nfsd_stats { unsigned int rcmisses; /* repcache hits */ unsigned int rcnocache; /* uncached reqs */ unsigned int fh_stale; /* FH stale error */ - unsigned int fh_lookup; /* dentry cached */ - unsigned int fh_anon; /* anon file dentry returned */ - unsigned int fh_nocache_dir; /* filehandle not found in dcache */ - unsigned int fh_nocache_nondir; /* filehandle not found in dcache */ unsigned int io_read; /* bytes returned to read requests */ unsigned int io_write; /* bytes passed in write requests */ unsigned int th_cnt; /* number of available threads */ - unsigned int th_usage[10]; /* number of ticks during which n perdeciles - * of available threads were in use */ - unsigned int th_fullcnt; /* number of times last free thread was used */ - unsigned int ra_size; /* size of ra cache */ - unsigned int ra_depth[11]; /* number of times ra entry was found that deep - * in the cache (10percentiles). [10] = not found */ #ifdef CONFIG_NFSD_V4 unsigned int nfs4_opcount[LAST_NFS4_OP + 1]; /* count of individual nfsv4 operations */ #endif
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit e567b98ce9a4b35b63c364d24828a9e5cd7a8179 ]
nfsd stats counters can be updated by concurrent nfsd threads without any protection.
Convert some nfsd_stats and nfsd_net struct members to use percpu counters.
The longest_chain* members of struct nfsd_net remain unprotected.
Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/netns.h | 23 +++++++------ fs/nfsd/nfs4proc.c | 2 +- fs/nfsd/nfscache.c | 52 +++++++++++++++++++++--------- fs/nfsd/nfsctl.c | 5 ++- fs/nfsd/nfsfh.c | 2 +- fs/nfsd/stats.c | 77 ++++++++++++++++++++++++++++++++++++-------- fs/nfsd/stats.h | 80 +++++++++++++++++++++++++++++++++++++++------- fs/nfsd/vfs.c | 4 +-- 8 files changed, 192 insertions(+), 53 deletions(-)
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h index 02d3d2f0e6168..a75abeb1e6988 100644 --- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h @@ -10,6 +10,7 @@
#include <net/net_namespace.h> #include <net/netns/generic.h> +#include <linux/percpu_counter.h>
/* Hash tables for nfs4_clientid state */ #define CLIENT_HASH_BITS 4 @@ -21,6 +22,14 @@ struct cld_net; struct nfsd4_client_tracking_ops;
+enum { + /* cache misses due only to checksum comparison failures */ + NFSD_NET_PAYLOAD_MISSES, + /* amount of memory (in bytes) currently consumed by the DRC */ + NFSD_NET_DRC_MEM_USAGE, + NFSD_NET_COUNTERS_NUM +}; + /* * Represents a nfsd "container". With respect to nfsv4 state tracking, the * fields of interest are the *_id_hashtbls and the *_name_tree. These track @@ -149,20 +158,16 @@ struct nfsd_net {
/* * Stats and other tracking of on the duplicate reply cache. - * These fields and the "rc" fields in nfsdstats are modified - * with only the per-bucket cache lock, which isn't really safe - * and should be fixed if we want the statistics to be - * completely accurate. + * The longest_chain* fields are modified with only the per-bucket + * cache lock, which isn't really safe and should be fixed if we want + * these statistics to be completely accurate. */
/* total number of entries */ atomic_t num_drc_entries;
- /* cache misses due only to checksum comparison failures */ - unsigned int payload_misses; - - /* amount of memory (in bytes) currently consumed by the DRC */ - unsigned int drc_mem_usage; + /* Per-netns stats counters */ + struct percpu_counter counter[NFSD_NET_COUNTERS_NUM];
/* longest hash chain seen */ unsigned int longest_chain; diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index a5e1f5c1a4d64..4f64d94909ec1 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -2168,7 +2168,7 @@ nfsd4_proc_null(struct svc_rqst *rqstp) static inline void nfsd4_increment_op_stats(u32 opnum) { if (opnum >= FIRST_NFS4_OP && opnum <= LAST_NFS4_OP) - nfsdstats.nfs4_opcount[opnum]++; + percpu_counter_inc(&nfsdstats.counter[NFSD_STATS_NFS4_OP(opnum)]); }
static const struct nfsd4_operation nfsd4_ops[]; diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c index 80c90fc231a53..96cdf77925f33 100644 --- a/fs/nfsd/nfscache.c +++ b/fs/nfsd/nfscache.c @@ -121,14 +121,14 @@ nfsd_reply_cache_free_locked(struct nfsd_drc_bucket *b, struct svc_cacherep *rp, struct nfsd_net *nn) { if (rp->c_type == RC_REPLBUFF && rp->c_replvec.iov_base) { - nn->drc_mem_usage -= rp->c_replvec.iov_len; + nfsd_stats_drc_mem_usage_sub(nn, rp->c_replvec.iov_len); kfree(rp->c_replvec.iov_base); } if (rp->c_state != RC_UNUSED) { rb_erase(&rp->c_node, &b->rb_head); list_del(&rp->c_lru); atomic_dec(&nn->num_drc_entries); - nn->drc_mem_usage -= sizeof(*rp); + nfsd_stats_drc_mem_usage_sub(nn, sizeof(*rp)); } kmem_cache_free(drc_slab, rp); } @@ -154,6 +154,16 @@ void nfsd_drc_slab_free(void) kmem_cache_destroy(drc_slab); }
+static int nfsd_reply_cache_stats_init(struct nfsd_net *nn) +{ + return nfsd_percpu_counters_init(nn->counter, NFSD_NET_COUNTERS_NUM); +} + +static void nfsd_reply_cache_stats_destroy(struct nfsd_net *nn) +{ + nfsd_percpu_counters_destroy(nn->counter, NFSD_NET_COUNTERS_NUM); +} + int nfsd_reply_cache_init(struct nfsd_net *nn) { unsigned int hashsize; @@ -165,12 +175,16 @@ int nfsd_reply_cache_init(struct nfsd_net *nn) hashsize = nfsd_hashsize(nn->max_drc_entries); nn->maskbits = ilog2(hashsize);
+ status = nfsd_reply_cache_stats_init(nn); + if (status) + goto out_nomem; + nn->nfsd_reply_cache_shrinker.scan_objects = nfsd_reply_cache_scan; nn->nfsd_reply_cache_shrinker.count_objects = nfsd_reply_cache_count; nn->nfsd_reply_cache_shrinker.seeks = 1; status = register_shrinker(&nn->nfsd_reply_cache_shrinker); if (status) - goto out_nomem; + goto out_stats_destroy;
nn->drc_hashtbl = kvzalloc(array_size(hashsize, sizeof(*nn->drc_hashtbl)), GFP_KERNEL); @@ -186,6 +200,8 @@ int nfsd_reply_cache_init(struct nfsd_net *nn) return 0; out_shrinker: unregister_shrinker(&nn->nfsd_reply_cache_shrinker); +out_stats_destroy: + nfsd_reply_cache_stats_destroy(nn); out_nomem: printk(KERN_ERR "nfsd: failed to allocate reply cache\n"); return -ENOMEM; @@ -196,6 +212,7 @@ void nfsd_reply_cache_shutdown(struct nfsd_net *nn) struct svc_cacherep *rp; unsigned int i;
+ nfsd_reply_cache_stats_destroy(nn); unregister_shrinker(&nn->nfsd_reply_cache_shrinker);
for (i = 0; i < nn->drc_hashsize; i++) { @@ -324,7 +341,7 @@ nfsd_cache_key_cmp(const struct svc_cacherep *key, { if (key->c_key.k_xid == rp->c_key.k_xid && key->c_key.k_csum != rp->c_key.k_csum) { - ++nn->payload_misses; + nfsd_stats_payload_misses_inc(nn); trace_nfsd_drc_mismatch(nn, key, rp); }
@@ -407,7 +424,7 @@ int nfsd_cache_lookup(struct svc_rqst *rqstp)
rqstp->rq_cacherep = NULL; if (type == RC_NOCACHE) { - nfsdstats.rcnocache++; + nfsd_stats_rc_nocache_inc(); goto out; }
@@ -429,12 +446,12 @@ int nfsd_cache_lookup(struct svc_rqst *rqstp) goto found_entry; }
- nfsdstats.rcmisses++; + nfsd_stats_rc_misses_inc(); rqstp->rq_cacherep = rp; rp->c_state = RC_INPROG;
atomic_inc(&nn->num_drc_entries); - nn->drc_mem_usage += sizeof(*rp); + nfsd_stats_drc_mem_usage_add(nn, sizeof(*rp));
/* go ahead and prune the cache */ prune_bucket(b, nn); @@ -446,7 +463,7 @@ int nfsd_cache_lookup(struct svc_rqst *rqstp)
found_entry: /* We found a matching entry which is either in progress or done. */ - nfsdstats.rchits++; + nfsd_stats_rc_hits_inc(); rtn = RC_DROPIT;
/* Request being processed */ @@ -548,7 +565,7 @@ void nfsd_cache_update(struct svc_rqst *rqstp, int cachetype, __be32 *statp) return; } spin_lock(&b->cache_lock); - nn->drc_mem_usage += bufsize; + nfsd_stats_drc_mem_usage_add(nn, bufsize); lru_put_end(b, rp); rp->c_secure = test_bit(RQ_SECURE, &rqstp->rq_flags); rp->c_type = cachetype; @@ -588,13 +605,18 @@ static int nfsd_reply_cache_stats_show(struct seq_file *m, void *v)
seq_printf(m, "max entries: %u\n", nn->max_drc_entries); seq_printf(m, "num entries: %u\n", - atomic_read(&nn->num_drc_entries)); + atomic_read(&nn->num_drc_entries)); seq_printf(m, "hash buckets: %u\n", 1 << nn->maskbits); - seq_printf(m, "mem usage: %u\n", nn->drc_mem_usage); - seq_printf(m, "cache hits: %u\n", nfsdstats.rchits); - seq_printf(m, "cache misses: %u\n", nfsdstats.rcmisses); - seq_printf(m, "not cached: %u\n", nfsdstats.rcnocache); - seq_printf(m, "payload misses: %u\n", nn->payload_misses); + seq_printf(m, "mem usage: %lld\n", + percpu_counter_sum_positive(&nn->counter[NFSD_NET_DRC_MEM_USAGE])); + seq_printf(m, "cache hits: %lld\n", + percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_RC_HITS])); + seq_printf(m, "cache misses: %lld\n", + percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_RC_MISSES])); + seq_printf(m, "not cached: %lld\n", + percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_RC_NOCACHE])); + seq_printf(m, "payload misses: %lld\n", + percpu_counter_sum_positive(&nn->counter[NFSD_NET_PAYLOAD_MISSES])); seq_printf(m, "longest chain len: %u\n", nn->longest_chain); seq_printf(m, "cachesize at longest: %u\n", nn->longest_chain_cachesize); return 0; diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index c4b11560ac1b6..7f85c171f83aa 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -1522,7 +1522,9 @@ static int __init init_nfsd(void) retval = nfsd4_init_pnfs(); if (retval) goto out_free_slabs; - nfsd_stat_init(); /* Statistics */ + retval = nfsd_stat_init(); /* Statistics */ + if (retval) + goto out_free_pnfs; retval = nfsd_drc_slab_create(); if (retval) goto out_free_stat; @@ -1552,6 +1554,7 @@ static int __init init_nfsd(void) nfsd_drc_slab_free(); out_free_stat: nfsd_stat_shutdown(); +out_free_pnfs: nfsd4_exit_pnfs(); out_free_slabs: nfsd4_free_slabs(); diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c index 66f2ef67792a7..9e31b2b5c6d26 100644 --- a/fs/nfsd/nfsfh.c +++ b/fs/nfsd/nfsfh.c @@ -422,7 +422,7 @@ fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access) } out: if (error == nfserr_stale) - nfsdstats.fh_stale++; + nfsd_stats_fh_stale_inc(); return error; }
diff --git a/fs/nfsd/stats.c b/fs/nfsd/stats.c index e928e224205ac..1d3b881e73821 100644 --- a/fs/nfsd/stats.c +++ b/fs/nfsd/stats.c @@ -36,13 +36,13 @@ static int nfsd_proc_show(struct seq_file *seq, void *v) { int i;
- seq_printf(seq, "rc %u %u %u\nfh %u 0 0 0 0\nio %u %u\n", - nfsdstats.rchits, - nfsdstats.rcmisses, - nfsdstats.rcnocache, - nfsdstats.fh_stale, - nfsdstats.io_read, - nfsdstats.io_write); + seq_printf(seq, "rc %lld %lld %lld\nfh %lld 0 0 0 0\nio %lld %lld\n", + percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_RC_HITS]), + percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_RC_MISSES]), + percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_RC_NOCACHE]), + percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_FH_STALE]), + percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_IO_READ]), + percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_IO_WRITE]));
/* thread usage: */ seq_printf(seq, "th %u 0", nfsdstats.th_cnt); @@ -61,8 +61,10 @@ static int nfsd_proc_show(struct seq_file *seq, void *v) /* Show count for individual nfsv4 operations */ /* Writing operation numbers 0 1 2 also for maintaining uniformity */ seq_printf(seq,"proc4ops %u", LAST_NFS4_OP + 1); - for (i = 0; i <= LAST_NFS4_OP; i++) - seq_printf(seq, " %u", nfsdstats.nfs4_opcount[i]); + for (i = 0; i <= LAST_NFS4_OP; i++) { + seq_printf(seq, " %lld", + percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_NFS4_OP(i)])); + }
seq_putc(seq, '\n'); #endif @@ -82,14 +84,63 @@ static const struct proc_ops nfsd_proc_ops = { .proc_release = single_release, };
-void -nfsd_stat_init(void) +int nfsd_percpu_counters_init(struct percpu_counter counters[], int num) { + int i, err = 0; + + for (i = 0; !err && i < num; i++) + err = percpu_counter_init(&counters[i], 0, GFP_KERNEL); + + if (!err) + return 0; + + for (; i > 0; i--) + percpu_counter_destroy(&counters[i-1]); + + return err; +} + +void nfsd_percpu_counters_reset(struct percpu_counter counters[], int num) +{ + int i; + + for (i = 0; i < num; i++) + percpu_counter_set(&counters[i], 0); +} + +void nfsd_percpu_counters_destroy(struct percpu_counter counters[], int num) +{ + int i; + + for (i = 0; i < num; i++) + percpu_counter_destroy(&counters[i]); +} + +static int nfsd_stat_counters_init(void) +{ + return nfsd_percpu_counters_init(nfsdstats.counter, NFSD_STATS_COUNTERS_NUM); +} + +static void nfsd_stat_counters_destroy(void) +{ + nfsd_percpu_counters_destroy(nfsdstats.counter, NFSD_STATS_COUNTERS_NUM); +} + +int nfsd_stat_init(void) +{ + int err; + + err = nfsd_stat_counters_init(); + if (err) + return err; + svc_proc_register(&init_net, &nfsd_svcstats, &nfsd_proc_ops); + + return 0; }
-void -nfsd_stat_shutdown(void) +void nfsd_stat_shutdown(void) { + nfsd_stat_counters_destroy(); svc_proc_unregister(&init_net, "nfsd"); } diff --git a/fs/nfsd/stats.h b/fs/nfsd/stats.h index 5e3cdf21556a1..87c3150c200f0 100644 --- a/fs/nfsd/stats.h +++ b/fs/nfsd/stats.h @@ -8,27 +8,85 @@ #define _NFSD_STATS_H
#include <uapi/linux/nfsd/stats.h> +#include <linux/percpu_counter.h>
-struct nfsd_stats { - unsigned int rchits; /* repcache hits */ - unsigned int rcmisses; /* repcache hits */ - unsigned int rcnocache; /* uncached reqs */ - unsigned int fh_stale; /* FH stale error */ - unsigned int io_read; /* bytes returned to read requests */ - unsigned int io_write; /* bytes passed in write requests */ - unsigned int th_cnt; /* number of available threads */ +enum { + NFSD_STATS_RC_HITS, /* repcache hits */ + NFSD_STATS_RC_MISSES, /* repcache misses */ + NFSD_STATS_RC_NOCACHE, /* uncached reqs */ + NFSD_STATS_FH_STALE, /* FH stale error */ + NFSD_STATS_IO_READ, /* bytes returned to read requests */ + NFSD_STATS_IO_WRITE, /* bytes passed in write requests */ #ifdef CONFIG_NFSD_V4 - unsigned int nfs4_opcount[LAST_NFS4_OP + 1]; /* count of individual nfsv4 operations */ + NFSD_STATS_FIRST_NFS4_OP, /* count of individual nfsv4 operations */ + NFSD_STATS_LAST_NFS4_OP = NFSD_STATS_FIRST_NFS4_OP + LAST_NFS4_OP, +#define NFSD_STATS_NFS4_OP(op) (NFSD_STATS_FIRST_NFS4_OP + (op)) #endif + NFSD_STATS_COUNTERS_NUM +}; + +struct nfsd_stats { + struct percpu_counter counter[NFSD_STATS_COUNTERS_NUM];
+ /* Protected by nfsd_mutex */ + unsigned int th_cnt; /* number of available threads */ };
extern struct nfsd_stats nfsdstats; + extern struct svc_stat nfsd_svcstats;
-void nfsd_stat_init(void); -void nfsd_stat_shutdown(void); +int nfsd_percpu_counters_init(struct percpu_counter counters[], int num); +void nfsd_percpu_counters_reset(struct percpu_counter counters[], int num); +void nfsd_percpu_counters_destroy(struct percpu_counter counters[], int num); +int nfsd_stat_init(void); +void nfsd_stat_shutdown(void); + +static inline void nfsd_stats_rc_hits_inc(void) +{ + percpu_counter_inc(&nfsdstats.counter[NFSD_STATS_RC_HITS]); +} + +static inline void nfsd_stats_rc_misses_inc(void) +{ + percpu_counter_inc(&nfsdstats.counter[NFSD_STATS_RC_MISSES]); +} + +static inline void nfsd_stats_rc_nocache_inc(void) +{ + percpu_counter_inc(&nfsdstats.counter[NFSD_STATS_RC_NOCACHE]); +} + +static inline void nfsd_stats_fh_stale_inc(void) +{ + percpu_counter_inc(&nfsdstats.counter[NFSD_STATS_FH_STALE]); +} + +static inline void nfsd_stats_io_read_add(s64 amount) +{ + percpu_counter_add(&nfsdstats.counter[NFSD_STATS_IO_READ], amount); +} + +static inline void nfsd_stats_io_write_add(s64 amount) +{ + percpu_counter_add(&nfsdstats.counter[NFSD_STATS_IO_WRITE], amount); +} + +static inline void nfsd_stats_payload_misses_inc(struct nfsd_net *nn) +{ + percpu_counter_inc(&nn->counter[NFSD_NET_PAYLOAD_MISSES]); +} + +static inline void nfsd_stats_drc_mem_usage_add(struct nfsd_net *nn, s64 amount) +{ + percpu_counter_add(&nn->counter[NFSD_NET_DRC_MEM_USAGE], amount); +} + +static inline void nfsd_stats_drc_mem_usage_sub(struct nfsd_net *nn, s64 amount) +{ + percpu_counter_sub(&nn->counter[NFSD_NET_DRC_MEM_USAGE], amount); +}
#endif /* _NFSD_STATS_H */ diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index a515cbd0a7d8f..1b44d8f985be9 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -897,7 +897,7 @@ static __be32 nfsd_finish_read(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned long *count, u32 *eof, ssize_t host_err) { if (host_err >= 0) { - nfsdstats.io_read += host_err; + nfsd_stats_io_read_add(host_err); *eof = nfsd_eof_on_read(file, offset, host_err, *count); *count = host_err; fsnotify_access(file); @@ -1050,7 +1050,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf, goto out_nfserr; } *cnt = host_err; - nfsdstats.io_write += *cnt; + nfsd_stats_io_write_add(*cnt); fsnotify_modify(file); host_err = filemap_check_wb_err(file->f_mapping, since); if (host_err < 0)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit 20ad856e47323e208ae8d6a9ecfe5bf0be6f505e ]
Collect some nfsd stats per export in addition to the global stats.
A new nfsdfs export_stats file is created. It uses the same ops as the exports file to iterate the export entries and we use the file's name to determine the reported info per export. For example:
$ cat /proc/fs/nfsd/export_stats # Version 1.1 # Path Client Start-time # Stats /test localhost 92 fh_stale: 0 io_read: 9 io_write: 1
Every export entry reports the start time when stats collection started, so stats collecting scripts can know if stats where reset between samples.
Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/export.c | 68 ++++++++++++++++++++++++++++++++++++++++++------ fs/nfsd/export.h | 15 +++++++++++ fs/nfsd/nfsctl.c | 3 +++ fs/nfsd/nfsd.h | 2 +- fs/nfsd/nfsfh.c | 4 +-- fs/nfsd/stats.h | 12 ++++++--- fs/nfsd/vfs.c | 4 +-- 7 files changed, 92 insertions(+), 16 deletions(-)
diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c index 81e7bb12aca69..7c863f2c21e0c 100644 --- a/fs/nfsd/export.c +++ b/fs/nfsd/export.c @@ -331,12 +331,29 @@ static void nfsd4_fslocs_free(struct nfsd4_fs_locations *fsloc) fsloc->locations = NULL; }
+static int export_stats_init(struct export_stats *stats) +{ + stats->start_time = ktime_get_seconds(); + return nfsd_percpu_counters_init(stats->counter, EXP_STATS_COUNTERS_NUM); +} + +static void export_stats_reset(struct export_stats *stats) +{ + nfsd_percpu_counters_reset(stats->counter, EXP_STATS_COUNTERS_NUM); +} + +static void export_stats_destroy(struct export_stats *stats) +{ + nfsd_percpu_counters_destroy(stats->counter, EXP_STATS_COUNTERS_NUM); +} + static void svc_export_put(struct kref *ref) { struct svc_export *exp = container_of(ref, struct svc_export, h.ref); path_put(&exp->ex_path); auth_domain_put(exp->ex_client); nfsd4_fslocs_free(&exp->ex_fslocs); + export_stats_destroy(&exp->ex_stats); kfree(exp->ex_uuid); kfree_rcu(exp, ex_rcu); } @@ -692,22 +709,47 @@ static void exp_flags(struct seq_file *m, int flag, int fsid, kuid_t anonu, kgid_t anong, struct nfsd4_fs_locations *fslocs); static void show_secinfo(struct seq_file *m, struct svc_export *exp);
+static int is_export_stats_file(struct seq_file *m) +{ + /* + * The export_stats file uses the same ops as the exports file. + * We use the file's name to determine the reported info per export. + * There is no rename in nsfdfs, so d_name.name is stable. + */ + return !strcmp(m->file->f_path.dentry->d_name.name, "export_stats"); +} + static int svc_export_show(struct seq_file *m, struct cache_detail *cd, struct cache_head *h) { - struct svc_export *exp ; + struct svc_export *exp; + bool export_stats = is_export_stats_file(m);
- if (h ==NULL) { - seq_puts(m, "#path domain(flags)\n"); + if (h == NULL) { + if (export_stats) + seq_puts(m, "#path domain start-time\n#\tstats\n"); + else + seq_puts(m, "#path domain(flags)\n"); return 0; } exp = container_of(h, struct svc_export, h); seq_path(m, &exp->ex_path, " \t\n\"); seq_putc(m, '\t'); seq_escape(m, exp->ex_client->name, " \t\n\"); + if (export_stats) { + seq_printf(m, "\t%lld\n", exp->ex_stats.start_time); + seq_printf(m, "\tfh_stale: %lld\n", + percpu_counter_sum_positive(&exp->ex_stats.counter[EXP_STATS_FH_STALE])); + seq_printf(m, "\tio_read: %lld\n", + percpu_counter_sum_positive(&exp->ex_stats.counter[EXP_STATS_IO_READ])); + seq_printf(m, "\tio_write: %lld\n", + percpu_counter_sum_positive(&exp->ex_stats.counter[EXP_STATS_IO_WRITE])); + seq_putc(m, '\n'); + return 0; + } seq_putc(m, '('); - if (test_bit(CACHE_VALID, &h->flags) && + if (test_bit(CACHE_VALID, &h->flags) && !test_bit(CACHE_NEGATIVE, &h->flags)) { exp_flags(m, exp->ex_flags, exp->ex_fsid, exp->ex_anon_uid, exp->ex_anon_gid, &exp->ex_fslocs); @@ -748,6 +790,7 @@ static void svc_export_init(struct cache_head *cnew, struct cache_head *citem) new->ex_layout_types = 0; new->ex_uuid = NULL; new->cd = item->cd; + export_stats_reset(&new->ex_stats); }
static void export_update(struct cache_head *cnew, struct cache_head *citem) @@ -780,10 +823,15 @@ static void export_update(struct cache_head *cnew, struct cache_head *citem) static struct cache_head *svc_export_alloc(void) { struct svc_export *i = kmalloc(sizeof(*i), GFP_KERNEL); - if (i) - return &i->h; - else + if (!i) + return NULL; + + if (export_stats_init(&i->ex_stats)) { + kfree(i); return NULL; + } + + return &i->h; }
static const struct cache_detail svc_export_cache_template = { @@ -1245,10 +1293,14 @@ static int e_show(struct seq_file *m, void *p) struct cache_head *cp = p; struct svc_export *exp = container_of(cp, struct svc_export, h); struct cache_detail *cd = m->private; + bool export_stats = is_export_stats_file(m);
if (p == SEQ_START_TOKEN) { seq_puts(m, "# Version 1.1\n"); - seq_puts(m, "# Path Client(Flags) # IPs\n"); + if (export_stats) + seq_puts(m, "# Path Client Start-time\n#\tStats\n"); + else + seq_puts(m, "# Path Client(Flags) # IPs\n"); return 0; }
diff --git a/fs/nfsd/export.h b/fs/nfsd/export.h index e7daa1f246f08..ee0e3aba4a6e5 100644 --- a/fs/nfsd/export.h +++ b/fs/nfsd/export.h @@ -6,6 +6,7 @@ #define NFSD_EXPORT_H
#include <linux/sunrpc/cache.h> +#include <linux/percpu_counter.h> #include <uapi/linux/nfsd/export.h> #include <linux/nfs4.h>
@@ -46,6 +47,19 @@ struct exp_flavor_info { u32 flags; };
+/* Per-export stats */ +enum { + EXP_STATS_FH_STALE, + EXP_STATS_IO_READ, + EXP_STATS_IO_WRITE, + EXP_STATS_COUNTERS_NUM +}; + +struct export_stats { + time64_t start_time; + struct percpu_counter counter[EXP_STATS_COUNTERS_NUM]; +}; + struct svc_export { struct cache_head h; struct auth_domain * ex_client; @@ -62,6 +76,7 @@ struct svc_export { struct nfsd4_deviceid_map *ex_devid_map; struct cache_detail *cd; struct rcu_head ex_rcu; + struct export_stats ex_stats; };
/* an "export key" (expkey) maps a filehandlefragement to an diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 7f85c171f83aa..fa060f91df1b8 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -32,6 +32,7 @@ enum { NFSD_Root = 1, NFSD_List, + NFSD_Export_Stats, NFSD_Export_features, NFSD_Fh, NFSD_FO_UnlockIP, @@ -1352,6 +1353,8 @@ static int nfsd_fill_super(struct super_block *sb, struct fs_context *fc)
static const struct tree_descr nfsd_files[] = { [NFSD_List] = {"exports", &exports_nfsd_operations, S_IRUGO}, + /* Per-export io stats use same ops as exports file */ + [NFSD_Export_Stats] = {"export_stats", &exports_nfsd_operations, S_IRUGO}, [NFSD_Export_features] = {"export_features", &export_features_operations, S_IRUGO}, [NFSD_FO_UnlockIP] = {"unlock_ip", diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 2326428e2c5bf..27c1308ffc2ba 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -24,8 +24,8 @@ #include <uapi/linux/nfsd/debug.h>
#include "netns.h" -#include "stats.h" #include "export.h" +#include "stats.h"
#undef ifdebug #ifdef CONFIG_SUNRPC_DEBUG diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c index 9e31b2b5c6d26..4744a276058d4 100644 --- a/fs/nfsd/nfsfh.c +++ b/fs/nfsd/nfsfh.c @@ -349,7 +349,7 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp) __be32 fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access) { - struct svc_export *exp; + struct svc_export *exp = NULL; struct dentry *dentry; __be32 error;
@@ -422,7 +422,7 @@ fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access) } out: if (error == nfserr_stale) - nfsd_stats_fh_stale_inc(); + nfsd_stats_fh_stale_inc(exp); return error; }
diff --git a/fs/nfsd/stats.h b/fs/nfsd/stats.h index 87c3150c200f0..51ecda852e23b 100644 --- a/fs/nfsd/stats.h +++ b/fs/nfsd/stats.h @@ -59,19 +59,25 @@ static inline void nfsd_stats_rc_nocache_inc(void) percpu_counter_inc(&nfsdstats.counter[NFSD_STATS_RC_NOCACHE]); }
-static inline void nfsd_stats_fh_stale_inc(void) +static inline void nfsd_stats_fh_stale_inc(struct svc_export *exp) { percpu_counter_inc(&nfsdstats.counter[NFSD_STATS_FH_STALE]); + if (exp) + percpu_counter_inc(&exp->ex_stats.counter[EXP_STATS_FH_STALE]); }
-static inline void nfsd_stats_io_read_add(s64 amount) +static inline void nfsd_stats_io_read_add(struct svc_export *exp, s64 amount) { percpu_counter_add(&nfsdstats.counter[NFSD_STATS_IO_READ], amount); + if (exp) + percpu_counter_add(&exp->ex_stats.counter[EXP_STATS_IO_READ], amount); }
-static inline void nfsd_stats_io_write_add(s64 amount) +static inline void nfsd_stats_io_write_add(struct svc_export *exp, s64 amount) { percpu_counter_add(&nfsdstats.counter[NFSD_STATS_IO_WRITE], amount); + if (exp) + percpu_counter_add(&exp->ex_stats.counter[EXP_STATS_IO_WRITE], amount); }
static inline void nfsd_stats_payload_misses_inc(struct nfsd_net *nn) diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 1b44d8f985be9..3e30788e0046b 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -897,7 +897,7 @@ static __be32 nfsd_finish_read(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned long *count, u32 *eof, ssize_t host_err) { if (host_err >= 0) { - nfsd_stats_io_read_add(host_err); + nfsd_stats_io_read_add(fhp->fh_export, host_err); *eof = nfsd_eof_on_read(file, offset, host_err, *count); *count = host_err; fsnotify_access(file); @@ -1050,7 +1050,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf, goto out_nfserr; } *cnt = host_err; - nfsd_stats_io_write_add(*cnt); + nfsd_stats_io_write_add(exp, *cnt); fsnotify_modify(file); host_err = filemap_check_wb_err(file->f_mapping, since); if (host_err < 0)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit 33311873adb0d55c287b164117b5b4bb7b1bdc40 ]
This STALE_CLIENTID check is redundant with the one in lookup_clientid().
There's a difference in behavior is in case of memory allocation failure, which I think isn't a big deal.
Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 2 -- 1 file changed, 2 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index e7ec7593eaaa3..3f26047376368 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4722,8 +4722,6 @@ nfsd4_process_open1(struct nfsd4_compound_state *cstate, struct nfs4_openowner *oo = NULL; __be32 status;
- if (STALE_CLIENTID(&open->op_clientid, nn)) - return nfserr_stale_clientid; /* * In case we need it later, after we've already created the * file and don't want to risk a further failure:
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit a9d53a75cf574d6aa41f3cb4968fffe4f64e0fad ]
Similarly, this STALE_CLIENTID check is already handled by:
nfs4_preprocess_confirmed_seqid_op()-> nfs4_preprocess_seqid_op()-> nfsd4_lookup_stateid()-> set_client()-> STALE_CLIENTID()
(This may cause it to return a different error in some cases where there are multiple things wrong; pynfs test SEQ10 regressed on this commit because of that, but I think that's the test's fault, and I've fixed it separately.)
Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 4 ---- 1 file changed, 4 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 3f26047376368..15ed72b0ef55b 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -6720,10 +6720,6 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, &cstate->session->se_client->cl_clientid, sizeof(clientid_t));
- status = nfserr_stale_clientid; - if (STALE_CLIENTID(&lock->lk_new_clientid, nn)) - goto out; - /* validate and update open stateid and open seqid */ status = nfs4_preprocess_confirmed_seqid_op(cstate, lock->lk_new_open_seqid,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit b4587eb2cf4b6271f67fb93b75f7de2a2026e853 ]
You can take the single-exit thing too far, I think.
Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 15ed72b0ef55b..574c88a9da268 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -5322,15 +5322,12 @@ nfsd4_renew(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, trace_nfsd_clid_renew(clid); status = lookup_clientid(clid, cstate, nn, false); if (status) - goto out; + return status; clp = cstate->clp; - status = nfserr_cb_path_down; if (!list_empty(&clp->cl_delegations) && clp->cl_cb_state != NFSD4_CB_UP) - goto out; - status = nfs_ok; -out: - return status; + return nfserr_cb_path_down; + return nfs_ok; }
void
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit 460d27091ae2c23e7ac959a61cd481c58832db58 ]
I think this is a better name, and I'm going to reuse elsewhere the code that does the lookup itself.
Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 18 ++++++++---------- 1 file changed, 8 insertions(+), 10 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 574c88a9da268..60c66becbc876 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4675,7 +4675,7 @@ static __be32 nfsd4_check_seqid(struct nfsd4_compound_state *cstate, struct nfs4 return nfserr_bad_seqid; }
-static __be32 lookup_clientid(clientid_t *clid, +static __be32 set_client(clientid_t *clid, struct nfsd4_compound_state *cstate, struct nfsd_net *nn, bool sessions) @@ -4730,7 +4730,7 @@ nfsd4_process_open1(struct nfsd4_compound_state *cstate, if (open->op_file == NULL) return nfserr_jukebox;
- status = lookup_clientid(clientid, cstate, nn, false); + status = set_client(clientid, cstate, nn, false); if (status) return status; clp = cstate->clp; @@ -5320,7 +5320,7 @@ nfsd4_renew(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
trace_nfsd_clid_renew(clid); - status = lookup_clientid(clid, cstate, nn, false); + status = set_client(clid, cstate, nn, false); if (status) return status; clp = cstate->clp; @@ -5701,8 +5701,7 @@ nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate, if (ZERO_STATEID(stateid) || ONE_STATEID(stateid) || CLOSE_STATEID(stateid)) return nfserr_bad_stateid; - status = lookup_clientid(&stateid->si_opaque.so_clid, cstate, nn, - false); + status = set_client(&stateid->si_opaque.so_clid, cstate, nn, false); if (status == nfserr_stale_clientid) { if (cstate->session) return nfserr_bad_stateid; @@ -5841,7 +5840,7 @@ static __be32 find_cpntf_state(struct nfsd_net *nn, stateid_t *st,
cps->cpntf_time = ktime_get_boottime_seconds(); memset(&cstate, 0, sizeof(cstate)); - status = lookup_clientid(&cps->cp_p_clid, &cstate, nn, true); + status = set_client(&cps->cp_p_clid, &cstate, nn, true); if (status) goto out; status = nfsd4_lookup_stateid(&cstate, &cps->cp_p_stateid, @@ -6933,8 +6932,7 @@ nfsd4_lockt(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, return nfserr_inval;
if (!nfsd4_has_session(cstate)) { - status = lookup_clientid(&lockt->lt_clientid, cstate, nn, - false); + status = set_client(&lockt->lt_clientid, cstate, nn, false); if (status) goto out; } @@ -7118,7 +7116,7 @@ nfsd4_release_lockowner(struct svc_rqst *rqstp, dprintk("nfsd4_release_lockowner clientid: (%08x/%08x):\n", clid->cl_boot, clid->cl_id);
- status = lookup_clientid(clid, cstate, nn, false); + status = set_client(clid, cstate, nn, false); if (status) return status;
@@ -7258,7 +7256,7 @@ nfs4_check_open_reclaim(clientid_t *clid, __be32 status;
/* find clientid in conf_id_hashtbl */ - status = lookup_clientid(clid, cstate, nn, false); + status = set_client(clid, cstate, nn, false); if (status) return nfserr_reclaim_bad;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit 7950b5316e40d99dcb85ab81a2d1dbb913d7c1c8 ]
This'll be useful elsewhere.
Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 32 ++++++++++++++++---------------- 1 file changed, 16 insertions(+), 16 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 60c66becbc876..c4e9f807b4a1b 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4675,40 +4675,40 @@ static __be32 nfsd4_check_seqid(struct nfsd4_compound_state *cstate, struct nfs4 return nfserr_bad_seqid; }
+static struct nfs4_client *lookup_clientid(clientid_t *clid, bool sessions, + struct nfsd_net *nn) +{ + struct nfs4_client *found; + + spin_lock(&nn->client_lock); + found = find_confirmed_client(clid, sessions, nn); + if (found) + atomic_inc(&found->cl_rpc_users); + spin_unlock(&nn->client_lock); + return found; +} + static __be32 set_client(clientid_t *clid, struct nfsd4_compound_state *cstate, struct nfsd_net *nn, bool sessions) { - struct nfs4_client *found; - if (cstate->clp) { - found = cstate->clp; - if (!same_clid(&found->cl_clientid, clid)) + if (!same_clid(&cstate->clp->cl_clientid, clid)) return nfserr_stale_clientid; return nfs_ok; } - if (STALE_CLIENTID(clid, nn)) return nfserr_stale_clientid; - /* * For v4.1+ we get the client in the SEQUENCE op. If we don't have one * cached already then we know this is for is for v4.0 and "sessions" * will be false. */ WARN_ON_ONCE(cstate->session); - spin_lock(&nn->client_lock); - found = find_confirmed_client(clid, sessions, nn); - if (!found) { - spin_unlock(&nn->client_lock); + cstate->clp = lookup_clientid(clid, sessions, nn); + if (!cstate->clp) return nfserr_expired; - } - atomic_inc(&found->cl_rpc_users); - spin_unlock(&nn->client_lock); - - /* Cache the nfs4_client in cstate! */ - cstate->clp = found; return nfs_ok; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit 47fdb22dacae78f37701d82a94c16a014186d34e ]
I think this unusual use of struct compound_state could cause confusion.
It's not that much more complicated just to open-code this stateid lookup.
The only change in behavior should be a different error return in the case the copy is using a source stateid that is a revoked delegation, but I doubt that matters.
Signed-off-by: J. Bruce Fields bfields@redhat.com [ cel: squashed in fix reported by Coverity ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 22 ++++++++++++++-------- 1 file changed, 14 insertions(+), 8 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index c4e9f807b4a1b..b05598e5bc168 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -5832,21 +5832,27 @@ static __be32 find_cpntf_state(struct nfsd_net *nn, stateid_t *st, { __be32 status; struct nfs4_cpntf_state *cps = NULL; - struct nfsd4_compound_state cstate; + struct nfs4_client *found;
status = manage_cpntf_state(nn, st, NULL, &cps); if (status) return status;
cps->cpntf_time = ktime_get_boottime_seconds(); - memset(&cstate, 0, sizeof(cstate)); - status = set_client(&cps->cp_p_clid, &cstate, nn, true); - if (status) + + status = nfserr_expired; + found = lookup_clientid(&cps->cp_p_clid, true, nn); + if (!found) goto out; - status = nfsd4_lookup_stateid(&cstate, &cps->cp_p_stateid, - NFS4_DELEG_STID|NFS4_OPEN_STID|NFS4_LOCK_STID, - stid, nn); - put_client_renew(cstate.clp); + + *stid = find_stateid_by_type(found, &cps->cp_p_stateid, + NFS4_DELEG_STID|NFS4_OPEN_STID|NFS4_LOCK_STID); + if (*stid) + status = nfs_ok; + else + status = nfserr_bad_stateid; + + put_client_renew(found); out: nfs4_put_cpntf_state(nn, cps); return status;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit f71475ba8c2a77fff8051903cf4b7d826c3d1693 ]
Every caller is setting this argument to false, so we don't need it.
Also cut this comment a bit and remove an unnecessary warning.
Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 23 ++++++++++------------- 1 file changed, 10 insertions(+), 13 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index b05598e5bc168..cbec87ee6bc0e 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4690,8 +4690,7 @@ static struct nfs4_client *lookup_clientid(clientid_t *clid, bool sessions,
static __be32 set_client(clientid_t *clid, struct nfsd4_compound_state *cstate, - struct nfsd_net *nn, - bool sessions) + struct nfsd_net *nn) { if (cstate->clp) { if (!same_clid(&cstate->clp->cl_clientid, clid)) @@ -4701,12 +4700,10 @@ static __be32 set_client(clientid_t *clid, if (STALE_CLIENTID(clid, nn)) return nfserr_stale_clientid; /* - * For v4.1+ we get the client in the SEQUENCE op. If we don't have one - * cached already then we know this is for is for v4.0 and "sessions" - * will be false. + * We're in the 4.0 case (otherwise the SEQUENCE op would have + * set cstate->clp), so session = false: */ - WARN_ON_ONCE(cstate->session); - cstate->clp = lookup_clientid(clid, sessions, nn); + cstate->clp = lookup_clientid(clid, false, nn); if (!cstate->clp) return nfserr_expired; return nfs_ok; @@ -4730,7 +4727,7 @@ nfsd4_process_open1(struct nfsd4_compound_state *cstate, if (open->op_file == NULL) return nfserr_jukebox;
- status = set_client(clientid, cstate, nn, false); + status = set_client(clientid, cstate, nn); if (status) return status; clp = cstate->clp; @@ -5320,7 +5317,7 @@ nfsd4_renew(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
trace_nfsd_clid_renew(clid); - status = set_client(clid, cstate, nn, false); + status = set_client(clid, cstate, nn); if (status) return status; clp = cstate->clp; @@ -5701,7 +5698,7 @@ nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate, if (ZERO_STATEID(stateid) || ONE_STATEID(stateid) || CLOSE_STATEID(stateid)) return nfserr_bad_stateid; - status = set_client(&stateid->si_opaque.so_clid, cstate, nn, false); + status = set_client(&stateid->si_opaque.so_clid, cstate, nn); if (status == nfserr_stale_clientid) { if (cstate->session) return nfserr_bad_stateid; @@ -6938,7 +6935,7 @@ nfsd4_lockt(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, return nfserr_inval;
if (!nfsd4_has_session(cstate)) { - status = set_client(&lockt->lt_clientid, cstate, nn, false); + status = set_client(&lockt->lt_clientid, cstate, nn); if (status) goto out; } @@ -7122,7 +7119,7 @@ nfsd4_release_lockowner(struct svc_rqst *rqstp, dprintk("nfsd4_release_lockowner clientid: (%08x/%08x):\n", clid->cl_boot, clid->cl_id);
- status = set_client(clid, cstate, nn, false); + status = set_client(clid, cstate, nn); if (status) return status;
@@ -7262,7 +7259,7 @@ nfs4_check_open_reclaim(clientid_t *clid, __be32 status;
/* find clientid in conf_id_hashtbl */ - status = set_client(clid, cstate, nn, false); + status = set_client(clid, cstate, nn); if (status) return nfserr_reclaim_bad;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit 1722b04624806ced51693f546edb83e8b2297a77 ]
The set_client() was already taken care of by process_open1().
The comments here are mostly redundant with the code.
Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 3 +-- fs/nfsd/nfs4state.c | 18 +++--------------- fs/nfsd/state.h | 3 +-- 3 files changed, 5 insertions(+), 19 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 4f64d94909ec1..5d304b51914a2 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -428,8 +428,7 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, goto out; break; case NFS4_OPEN_CLAIM_PREVIOUS: - status = nfs4_check_open_reclaim(&open->op_clientid, - cstate, nn); + status = nfs4_check_open_reclaim(cstate->clp); if (status) goto out; open->op_openowner->oo_flags |= NFS4_OO_CONFIRMED; diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index cbec87ee6bc0e..4da8467a3570d 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -7248,25 +7248,13 @@ nfsd4_find_reclaim_client(struct xdr_netobj name, struct nfsd_net *nn) return NULL; }
-/* -* Called from OPEN. Look for clientid in reclaim list. -*/ __be32 -nfs4_check_open_reclaim(clientid_t *clid, - struct nfsd4_compound_state *cstate, - struct nfsd_net *nn) +nfs4_check_open_reclaim(struct nfs4_client *clp) { - __be32 status; - - /* find clientid in conf_id_hashtbl */ - status = set_client(clid, cstate, nn); - if (status) - return nfserr_reclaim_bad; - - if (test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE, &cstate->clp->cl_flags)) + if (test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE, &clp->cl_flags)) return nfserr_no_grace;
- if (nfsd4_client_record_check(cstate->clp)) + if (nfsd4_client_record_check(clp)) return nfserr_reclaim_bad;
return nfs_ok; diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index 9eae11a9d21ca..73deea3531699 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -649,8 +649,7 @@ void nfs4_remove_reclaim_record(struct nfs4_client_reclaim *, struct nfsd_net *) extern void nfs4_release_reclaim(struct nfsd_net *); extern struct nfs4_client_reclaim *nfsd4_find_reclaim_client(struct xdr_netobj name, struct nfsd_net *nn); -extern __be32 nfs4_check_open_reclaim(clientid_t *clid, - struct nfsd4_compound_state *cstate, struct nfsd_net *nn); +extern __be32 nfs4_check_open_reclaim(struct nfs4_client *); extern void nfsd4_probe_callback(struct nfs4_client *clp); extern void nfsd4_probe_callback_sync(struct nfs4_client *clp); extern void nfsd4_change_callback(struct nfs4_client *clp, struct nfs4_cb_conn *);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit ec59659b4972ec25851aa03b4b5baba6764a62e4 ]
I'm not sure why we're writing this out the hard way in so many places.
Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 5 ++--- fs/nfsd/nfs4state.c | 16 ++++++++-------- 2 files changed, 10 insertions(+), 11 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 5d304b51914a2..0acf7af9aab8f 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -378,8 +378,7 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, * Before RECLAIM_COMPLETE done, server should deny new lock */ if (nfsd4_has_session(cstate) && - !test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE, - &cstate->session->se_client->cl_flags) && + !test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE, &cstate->clp->cl_flags) && open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS) return nfserr_grace;
@@ -1881,7 +1880,7 @@ nfsd4_getdeviceinfo(struct svc_rqst *rqstp, nfserr = nfs_ok; if (gdp->gd_maxcount != 0) { nfserr = ops->proc_getdeviceinfo(exp->ex_path.mnt->mnt_sb, - rqstp, cstate->session->se_client, gdp); + rqstp, cstate->clp, gdp); }
gdp->gd_notify_types &= ops->notify_types; diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 4da8467a3570d..c2eba93ab5138 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -3923,6 +3923,7 @@ nfsd4_reclaim_complete(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, union nfsd4_op_u *u) { struct nfsd4_reclaim_complete *rc = &u->reclaim_complete; + struct nfs4_client *clp = cstate->clp; __be32 status = 0;
if (rc->rca_one_fs) { @@ -3936,12 +3937,11 @@ nfsd4_reclaim_complete(struct svc_rqst *rqstp, }
status = nfserr_complete_already; - if (test_and_set_bit(NFSD4_CLIENT_RECLAIM_COMPLETE, - &cstate->session->se_client->cl_flags)) + if (test_and_set_bit(NFSD4_CLIENT_RECLAIM_COMPLETE, &clp->cl_flags)) goto out;
status = nfserr_stale_clientid; - if (is_client_expired(cstate->session->se_client)) + if (is_client_expired(clp)) /* * The following error isn't really legal. * But we only get here if the client just explicitly @@ -3952,8 +3952,8 @@ nfsd4_reclaim_complete(struct svc_rqst *rqstp, goto out;
status = nfs_ok; - nfsd4_client_record_create(cstate->session->se_client); - inc_reclaim_complete(cstate->session->se_client); + nfsd4_client_record_create(clp); + inc_reclaim_complete(clp); out: return status; } @@ -5938,7 +5938,7 @@ nfsd4_test_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, { struct nfsd4_test_stateid *test_stateid = &u->test_stateid; struct nfsd4_test_stateid_id *stateid; - struct nfs4_client *cl = cstate->session->se_client; + struct nfs4_client *cl = cstate->clp;
list_for_each_entry(stateid, &test_stateid->ts_stateid_list, ts_id_list) stateid->ts_id_status = @@ -5984,7 +5984,7 @@ nfsd4_free_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, stateid_t *stateid = &free_stateid->fr_stateid; struct nfs4_stid *s; struct nfs4_delegation *dp; - struct nfs4_client *cl = cstate->session->se_client; + struct nfs4_client *cl = cstate->clp; __be32 ret = nfserr_bad_stateid;
spin_lock(&cl->cl_lock); @@ -6716,7 +6716,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (nfsd4_has_session(cstate)) /* See rfc 5661 18.10.3: given clientid is ignored: */ memcpy(&lock->lk_new_clientid, - &cstate->session->se_client->cl_clientid, + &cstate->clp->cl_clientid, sizeof(clientid_t));
/* validate and update open stateid and open seqid */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dai Ngo dai.ngo@oracle.com
[ Upstream commit 02591f9febd5f69bb4c266a4abf899c4cf21964f ]
Currently NFSv4_2 SSC helper, nfs_ssc, incorrectly uses GRACE_PERIOD as its config. Fix by adding new config NFS_V4_2_SSC_HELPER which depends on NFS_V4_2 and is automatically selected when NFSD_V4 is enabled. Also removed the file name from a comment in nfs_ssc.c.
Signed-off-by: Dai Ngo dai.ngo@oracle.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/Kconfig | 4 ++++ fs/nfs/nfs4file.c | 4 ++++ fs/nfs/super.c | 12 ++++++++++++ fs/nfs_common/Makefile | 2 +- fs/nfs_common/nfs_ssc.c | 2 -- fs/nfsd/Kconfig | 1 + 6 files changed, 22 insertions(+), 3 deletions(-)
diff --git a/fs/Kconfig b/fs/Kconfig index da524c4d7b7e0..462253ae483a3 100644 --- a/fs/Kconfig +++ b/fs/Kconfig @@ -333,6 +333,10 @@ config NFS_COMMON depends on NFSD || NFS_FS || LOCKD default y
+config NFS_V4_2_SSC_HELPER + tristate + default y if NFS_V4=y || NFS_FS=y + source "net/sunrpc/Kconfig" source "fs/ceph/Kconfig" source "fs/cifs/Kconfig" diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c index 70cd0d764c447..5ad57ad89fb1e 100644 --- a/fs/nfs/nfs4file.c +++ b/fs/nfs/nfs4file.c @@ -430,7 +430,9 @@ static const struct nfs4_ssc_client_ops nfs4_ssc_clnt_ops_tbl = { */ void nfs42_ssc_register_ops(void) { +#ifdef CONFIG_NFSD_V4 nfs42_ssc_register(&nfs4_ssc_clnt_ops_tbl); +#endif }
/** @@ -441,7 +443,9 @@ void nfs42_ssc_register_ops(void) */ void nfs42_ssc_unregister_ops(void) { +#ifdef CONFIG_NFSD_V4 nfs42_ssc_unregister(&nfs4_ssc_clnt_ops_tbl); +#endif } #endif /* CONFIG_NFS_V4_2 */
diff --git a/fs/nfs/super.c b/fs/nfs/super.c index b3fcc27b95648..7179d59d73ca4 100644 --- a/fs/nfs/super.c +++ b/fs/nfs/super.c @@ -86,9 +86,11 @@ const struct super_operations nfs_sops = { }; EXPORT_SYMBOL_GPL(nfs_sops);
+#ifdef CONFIG_NFS_V4_2 static const struct nfs_ssc_client_ops nfs_ssc_clnt_ops_tbl = { .sco_sb_deactive = nfs_sb_deactive, }; +#endif
#if IS_ENABLED(CONFIG_NFS_V4) static int __init register_nfs4_fs(void) @@ -111,15 +113,21 @@ static void unregister_nfs4_fs(void) } #endif
+#ifdef CONFIG_NFS_V4_2 static void nfs_ssc_register_ops(void) { +#ifdef CONFIG_NFSD_V4 nfs_ssc_register(&nfs_ssc_clnt_ops_tbl); +#endif }
static void nfs_ssc_unregister_ops(void) { +#ifdef CONFIG_NFSD_V4 nfs_ssc_unregister(&nfs_ssc_clnt_ops_tbl); +#endif } +#endif /* CONFIG_NFS_V4_2 */
static struct shrinker acl_shrinker = { .count_objects = nfs_access_cache_count, @@ -148,7 +156,9 @@ int __init register_nfs_fs(void) ret = register_shrinker(&acl_shrinker); if (ret < 0) goto error_3; +#ifdef CONFIG_NFS_V4_2 nfs_ssc_register_ops(); +#endif return 0; error_3: nfs_unregister_sysctl(); @@ -168,7 +178,9 @@ void __exit unregister_nfs_fs(void) unregister_shrinker(&acl_shrinker); nfs_unregister_sysctl(); unregister_nfs4_fs(); +#ifdef CONFIG_NFS_V4_2 nfs_ssc_unregister_ops(); +#endif unregister_filesystem(&nfs_fs_type); }
diff --git a/fs/nfs_common/Makefile b/fs/nfs_common/Makefile index fa82f5aaa6d95..119c75ab9fd08 100644 --- a/fs/nfs_common/Makefile +++ b/fs/nfs_common/Makefile @@ -7,4 +7,4 @@ obj-$(CONFIG_NFS_ACL_SUPPORT) += nfs_acl.o nfs_acl-objs := nfsacl.o
obj-$(CONFIG_GRACE_PERIOD) += grace.o -obj-$(CONFIG_GRACE_PERIOD) += nfs_ssc.o +obj-$(CONFIG_NFS_V4_2_SSC_HELPER) += nfs_ssc.o diff --git a/fs/nfs_common/nfs_ssc.c b/fs/nfs_common/nfs_ssc.c index f43bbb3739134..7c1509e968c81 100644 --- a/fs/nfs_common/nfs_ssc.c +++ b/fs/nfs_common/nfs_ssc.c @@ -1,7 +1,5 @@ // SPDX-License-Identifier: GPL-2.0-only /* - * fs/nfs_common/nfs_ssc_comm.c - * * Helper for knfsd's SSC to access ops in NFS client modules * * Author: Dai Ngo dai.ngo@oracle.com diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig index 248f1459c0399..d6cff5fbe705b 100644 --- a/fs/nfsd/Kconfig +++ b/fs/nfsd/Kconfig @@ -77,6 +77,7 @@ config NFSD_V4 select CRYPTO_MD5 select CRYPTO_SHA256 select GRACE_PERIOD + select NFS_V4_2_SSC_HELPER if NFS_V4_2 help This option enables support in your system's NFS server for version 4 of the NFS protocol (RFC 3530).
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit 3cc55f4434b421d37300aa9a167ace7d60b45ccf ]
When exporting NFS, we may as well use the real change attribute returned by the original server instead of faking up a change attribute from the ctime.
Note we can't do that by setting I_VERSION--that would also turn on the logic in iversion.h which treats the lower bit specially, and that doesn't make sense for NFS.
So instead we define a new export operation for filesystems like NFS that want to manage the change attribute themselves.
Signed-off-by: J. Bruce Fields bfields@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/export.c | 18 ++++++++++++++++++ fs/nfsd/nfsfh.h | 5 ++++- include/linux/exportfs.h | 1 + 3 files changed, 23 insertions(+), 1 deletion(-)
diff --git a/fs/nfs/export.c b/fs/nfs/export.c index 7412bb164fa77..f2b34cfe286c2 100644 --- a/fs/nfs/export.c +++ b/fs/nfs/export.c @@ -167,10 +167,28 @@ nfs_get_parent(struct dentry *dentry) return parent; }
+static u64 nfs_fetch_iversion(struct inode *inode) +{ + struct nfs_server *server = NFS_SERVER(inode); + + /* Is this the right call?: */ + nfs_revalidate_inode(server, inode); + /* + * Also, note we're ignoring any returned error. That seems to be + * the practice for cache consistency information elsewhere in + * the server, but I'm not sure why. + */ + if (server->nfs_client->rpc_ops->version >= 4) + return inode_peek_iversion_raw(inode); + else + return time_to_chattr(&inode->i_ctime); +} + const struct export_operations nfs_export_ops = { .encode_fh = nfs_encode_fh, .fh_to_dentry = nfs_fh_to_dentry, .get_parent = nfs_get_parent, + .fetch_iversion = nfs_fetch_iversion, .flags = EXPORT_OP_NOWCC|EXPORT_OP_NOSUBTREECHK| EXPORT_OP_CLOSE_BEFORE_UNLINK|EXPORT_OP_REMOTE_FS| EXPORT_OP_NOATOMIC_ATTR, diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h index cb20c2cd34695..f58933519f380 100644 --- a/fs/nfsd/nfsfh.h +++ b/fs/nfsd/nfsfh.h @@ -12,6 +12,7 @@ #include <linux/sunrpc/svc.h> #include <uapi/linux/nfsd/nfsfh.h> #include <linux/iversion.h> +#include <linux/exportfs.h>
static inline __u32 ino_t_to_u32(ino_t ino) { @@ -264,7 +265,9 @@ fh_clear_wcc(struct svc_fh *fhp) static inline u64 nfsd4_change_attribute(struct kstat *stat, struct inode *inode) { - if (IS_I_VERSION(inode)) { + if (inode->i_sb->s_export_op->fetch_iversion) + return inode->i_sb->s_export_op->fetch_iversion(inode); + else if (IS_I_VERSION(inode)) { u64 chattr;
chattr = stat->ctime.tv_sec; diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h index 9f4d4bcbf251d..fe848901fcc3a 100644 --- a/include/linux/exportfs.h +++ b/include/linux/exportfs.h @@ -213,6 +213,7 @@ struct export_operations { bool write, u32 *device_generation); int (*commit_blocks)(struct inode *inode, struct iomap *iomaps, int nr_iomaps, struct iattr *iattr); + u64 (*fetch_iversion)(struct inode *); #define EXPORT_OP_NOWCC (0x1) /* don't collect v3 wcc data */ #define EXPORT_OP_NOSUBTREECHK (0x2) /* no subtree checking */ #define EXPORT_OP_CLOSE_BEFORE_UNLINK (0x4) /* close files before unlink */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit 428a23d2bf0ca8fd4d364a464c3e468f0e81671e ]
In the typical case of v4 and an i_version-supporting filesystem, we can skip a stat which is only required to fake up a change attribute from ctime.
Signed-off-by: J. Bruce Fields bfields@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3xdr.c | 44 +++++++++++++++++++++++++++----------------- 1 file changed, 27 insertions(+), 17 deletions(-)
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 00a96054280a6..9d9a01ce0b270 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -364,6 +364,11 @@ encode_wcc_data(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp) return encode_post_op_attr(rqstp, p, fhp); }
+static bool fs_supports_change_attribute(struct super_block *sb) +{ + return sb->s_flags & SB_I_VERSION || sb->s_export_op->fetch_iversion; +} + /* * Fill in the pre_op attr for the wcc data */ @@ -372,24 +377,26 @@ void fill_pre_wcc(struct svc_fh *fhp) struct inode *inode; struct kstat stat; bool v4 = (fhp->fh_maxsize == NFS4_FHSIZE); - __be32 err;
if (fhp->fh_no_wcc || fhp->fh_pre_saved) return; inode = d_inode(fhp->fh_dentry); - err = fh_getattr(fhp, &stat); - if (err) { - /* Grab the times from inode anyway */ - stat.mtime = inode->i_mtime; - stat.ctime = inode->i_ctime; - stat.size = inode->i_size; + if (fs_supports_change_attribute(inode->i_sb) || !v4) { + __be32 err = fh_getattr(fhp, &stat); + + if (err) { + /* Grab the times from inode anyway */ + stat.mtime = inode->i_mtime; + stat.ctime = inode->i_ctime; + stat.size = inode->i_size; + } + fhp->fh_pre_mtime = stat.mtime; + fhp->fh_pre_ctime = stat.ctime; + fhp->fh_pre_size = stat.size; } if (v4) fhp->fh_pre_change = nfsd4_change_attribute(&stat, inode);
- fhp->fh_pre_mtime = stat.mtime; - fhp->fh_pre_ctime = stat.ctime; - fhp->fh_pre_size = stat.size; fhp->fh_pre_saved = true; }
@@ -400,7 +407,6 @@ void fill_post_wcc(struct svc_fh *fhp) { bool v4 = (fhp->fh_maxsize == NFS4_FHSIZE); struct inode *inode = d_inode(fhp->fh_dentry); - __be32 err;
if (fhp->fh_no_wcc) return; @@ -408,12 +414,16 @@ void fill_post_wcc(struct svc_fh *fhp) if (fhp->fh_post_saved) printk("nfsd: inode locked twice during operation.\n");
- err = fh_getattr(fhp, &fhp->fh_post_attr); - if (err) { - fhp->fh_post_saved = false; - fhp->fh_post_attr.ctime = inode->i_ctime; - } else - fhp->fh_post_saved = true; + fhp->fh_post_saved = true; + + if (fs_supports_change_attribute(inode->i_sb) || !v4) { + __be32 err = fh_getattr(fhp, &fhp->fh_post_attr); + + if (err) { + fhp->fh_post_saved = false; + fhp->fh_post_attr.ctime = inode->i_ctime; + } + } if (v4) fhp->fh_post_change = nfsd4_change_attribute(&fhp->fh_post_attr, inode);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shakeel Butt shakeelb@google.com
[ Upstream commit ac7b79fd190b02e7151bc7d2b9da692f537657f3 ]
Currently the fs sysctl inotify/max_user_instances is used to limit the number of inotify instances on the system. For systems running multiple workloads, the per-user namespace sysctl max_inotify_instances can be used to further partition inotify instances. However there is no easy way to set a sensible system level max limit on inotify instances and further partition it between the workloads. It is much easier to charge the underlying resource (i.e. memory) behind the inotify instances to the memcg of the workload and let their memory limits limit the number of inotify instances they can create.
With inotify instances charged to memcg, the admin can simply set max_user_instances to INT_MAX and let the memcg limits of the jobs limit their inotify instances.
Link: https://lore.kernel.org/r/20201220044608.1258123-1-shakeelb@google.com Reviewed-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Shakeel Butt shakeelb@google.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify_user.c | 2 +- fs/notify/group.c | 25 ++++++++++++++++++++----- fs/notify/inotify/inotify_user.c | 4 ++-- include/linux/fsnotify_backend.h | 1 + 4 files changed, 24 insertions(+), 8 deletions(-)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 84de9f97bbc09..3e905b2e1b9c3 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -976,7 +976,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) f_flags |= O_NONBLOCK;
/* fsnotify_alloc_group takes a ref. Dropped in fanotify_release */ - group = fsnotify_alloc_group(&fanotify_fsnotify_ops); + group = fsnotify_alloc_user_group(&fanotify_fsnotify_ops); if (IS_ERR(group)) { free_uid(user); return PTR_ERR(group); diff --git a/fs/notify/group.c b/fs/notify/group.c index a4a4b1c64d32a..ffd723ffe46de 100644 --- a/fs/notify/group.c +++ b/fs/notify/group.c @@ -111,14 +111,12 @@ void fsnotify_put_group(struct fsnotify_group *group) } EXPORT_SYMBOL_GPL(fsnotify_put_group);
-/* - * Create a new fsnotify_group and hold a reference for the group returned. - */ -struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops) +static struct fsnotify_group *__fsnotify_alloc_group( + const struct fsnotify_ops *ops, gfp_t gfp) { struct fsnotify_group *group;
- group = kzalloc(sizeof(struct fsnotify_group), GFP_KERNEL); + group = kzalloc(sizeof(struct fsnotify_group), gfp); if (!group) return ERR_PTR(-ENOMEM);
@@ -139,8 +137,25 @@ struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops)
return group; } + +/* + * Create a new fsnotify_group and hold a reference for the group returned. + */ +struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops) +{ + return __fsnotify_alloc_group(ops, GFP_KERNEL); +} EXPORT_SYMBOL_GPL(fsnotify_alloc_group);
+/* + * Create a new fsnotify_group and hold a reference for the group returned. + */ +struct fsnotify_group *fsnotify_alloc_user_group(const struct fsnotify_ops *ops) +{ + return __fsnotify_alloc_group(ops, GFP_KERNEL_ACCOUNT); +} +EXPORT_SYMBOL_GPL(fsnotify_alloc_user_group); + int fsnotify_fasync(int fd, struct file *file, int on) { struct fsnotify_group *group = file->private_data; diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c index b564a32403aa5..ad8fb4bca6dc1 100644 --- a/fs/notify/inotify/inotify_user.c +++ b/fs/notify/inotify/inotify_user.c @@ -632,11 +632,11 @@ static struct fsnotify_group *inotify_new_group(unsigned int max_events) struct fsnotify_group *group; struct inotify_event_info *oevent;
- group = fsnotify_alloc_group(&inotify_fsnotify_ops); + group = fsnotify_alloc_user_group(&inotify_fsnotify_ops); if (IS_ERR(group)) return group;
- oevent = kmalloc(sizeof(struct inotify_event_info), GFP_KERNEL); + oevent = kmalloc(sizeof(struct inotify_event_info), GFP_KERNEL_ACCOUNT); if (unlikely(!oevent)) { fsnotify_destroy_group(group); return ERR_PTR(-ENOMEM); diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index a2e42d3cd87cf..e5409b83e7313 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -470,6 +470,7 @@ static inline void fsnotify_update_flags(struct dentry *dentry)
/* create a new group */ extern struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops); +extern struct fsnotify_group *fsnotify_alloc_user_group(const struct fsnotify_ops *ops); /* get reference to a group */ extern void fsnotify_get_group(struct fsnotify_group *group); /* drop reference on a group from fsnotify_alloc_group */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoph Hellwig hch@lst.de
[ Upstream commit 089049f6c9956c5cf1fc89fe10229c76e99f4bef ]
find_module is not used by modular code any more, and random driver code has no business calling it to start with.
Reviewed-by: Miroslav Benes mbenes@suse.cz Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Jessica Yu jeyu@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/module.c | 2 -- 1 file changed, 2 deletions(-)
diff --git a/kernel/module.c b/kernel/module.c index 72a5dcdccf7b1..c0e51ffe26f0a 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -88,7 +88,6 @@ * 3) module_addr_min/module_addr_max. * (delete and add uses RCU list operations). */ DEFINE_MUTEX(module_mutex); -EXPORT_SYMBOL_GPL(module_mutex); static LIST_HEAD(modules);
/* Work queue for freeing init sections in success case */ @@ -645,7 +644,6 @@ struct module *find_module(const char *name) module_assert_mutex(); return find_module_all(name, strlen(name), false); } -EXPORT_SYMBOL_GPL(find_module);
#ifdef CONFIG_SMP
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoph Hellwig hch@lst.de
[ Upstream commit a006050575745ca2be25118b90f1c37f454ac542 ]
Allow for a RCU-sched critical section around find_module, following the lower level find_module_all helper, and switch the two callers outside of module.c to use such a RCU-sched critical section instead of module_mutex.
Reviewed-by: Petr Mladek pmladek@suse.com Acked-by: Miroslav Benes mbenes@suse.cz Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Jessica Yu jeyu@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/module.h | 2 +- kernel/livepatch/core.c | 5 +++-- kernel/module.c | 1 - kernel/trace/trace_kprobe.c | 4 ++-- 4 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/include/linux/module.h b/include/linux/module.h index 6264617bab4d4..86fae5d1c0e39 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -582,7 +582,7 @@ static inline bool within_module(unsigned long addr, const struct module *mod) return within_module_init(addr, mod) || within_module_core(addr, mod); }
-/* Search for module by name: must hold module_mutex. */ +/* Search for module by name: must be in a RCU-sched critical section. */ struct module *find_module(const char *name);
struct symsearch { diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c index f5faf935c2d8f..e660ea4f90a28 100644 --- a/kernel/livepatch/core.c +++ b/kernel/livepatch/core.c @@ -19,6 +19,7 @@ #include <linux/moduleloader.h> #include <linux/completion.h> #include <linux/memory.h> +#include <linux/rcupdate.h> #include <asm/cacheflush.h> #include "core.h" #include "patch.h" @@ -57,7 +58,7 @@ static void klp_find_object_module(struct klp_object *obj) if (!klp_is_module(obj)) return;
- mutex_lock(&module_mutex); + rcu_read_lock_sched(); /* * We do not want to block removal of patched modules and therefore * we do not take a reference here. The patches are removed by @@ -74,7 +75,7 @@ static void klp_find_object_module(struct klp_object *obj) if (mod && mod->klp_alive) obj->mod = mod;
- mutex_unlock(&module_mutex); + rcu_read_unlock_sched(); }
static bool klp_initialized(void) diff --git a/kernel/module.c b/kernel/module.c index c0e51ffe26f0a..1f9f6133c30ef 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -641,7 +641,6 @@ static struct module *find_module_all(const char *name, size_t len,
struct module *find_module(const char *name) { - module_assert_mutex(); return find_module_all(name, strlen(name), false); }
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c index 7183572898998..5453af26ff764 100644 --- a/kernel/trace/trace_kprobe.c +++ b/kernel/trace/trace_kprobe.c @@ -124,9 +124,9 @@ static nokprobe_inline bool trace_kprobe_module_exist(struct trace_kprobe *tk) if (!p) return true; *p = '\0'; - mutex_lock(&module_mutex); + rcu_read_lock_sched(); ret = !!find_module(tk->symbol); - mutex_unlock(&module_mutex); + rcu_read_unlock_sched(); *p = ':';
return ret;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoph Hellwig hch@lst.de
[ Upstream commit 013c1667cf78c1d847152f7116436d82dcab3db4 ]
Require an explicit call to module_kallsyms_on_each_symbol to look for symbols in modules instead of the call from kallsyms_on_each_symbol, and acquire module_mutex inside of module_kallsyms_on_each_symbol instead of leaving that up to the caller. Note that this slightly changes the behavior for the livepatch code in that the symbols from vmlinux are not iterated anymore if objname is set, but that actually is the desired behavior in this case.
Reviewed-by: Petr Mladek pmladek@suse.com Acked-by: Miroslav Benes mbenes@suse.cz Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Jessica Yu jeyu@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/kallsyms.c | 6 +++++- kernel/livepatch/core.c | 2 -- kernel/module.c | 13 ++++--------- 3 files changed, 9 insertions(+), 12 deletions(-)
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c index fe9de067771c3..a0d3f0865916f 100644 --- a/kernel/kallsyms.c +++ b/kernel/kallsyms.c @@ -177,6 +177,10 @@ unsigned long kallsyms_lookup_name(const char *name) return module_kallsyms_lookup_name(name); }
+/* + * Iterate over all symbols in vmlinux. For symbols from modules use + * module_kallsyms_on_each_symbol instead. + */ int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *, unsigned long), void *data) @@ -192,7 +196,7 @@ int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *, if (ret != 0) return ret; } - return module_kallsyms_on_each_symbol(fn, data); + return 0; }
static unsigned long get_symbol_pos(unsigned long addr, diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c index e660ea4f90a28..147ed154ebc77 100644 --- a/kernel/livepatch/core.c +++ b/kernel/livepatch/core.c @@ -164,12 +164,10 @@ static int klp_find_object_symbol(const char *objname, const char *name, .pos = sympos, };
- mutex_lock(&module_mutex); if (objname) module_kallsyms_on_each_symbol(klp_find_callback, &args); else kallsyms_on_each_symbol(klp_find_callback, &args); - mutex_unlock(&module_mutex);
/* * Ensure an address was found. If sympos is 0, ensure symbol is unique; diff --git a/kernel/module.c b/kernel/module.c index 1f9f6133c30ef..330387d63c633 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -255,11 +255,6 @@ static void mod_update_bounds(struct module *mod) struct list_head *kdb_modules = &modules; /* kdb needs the list of modules */ #endif /* CONFIG_KGDB_KDB */
-static void module_assert_mutex(void) -{ - lockdep_assert_held(&module_mutex); -} - static void module_assert_mutex_or_preempt(void) { #ifdef CONFIG_LOCKDEP @@ -4457,8 +4452,7 @@ int module_kallsyms_on_each_symbol(int (*fn)(void *, const char *, unsigned int i; int ret;
- module_assert_mutex(); - + mutex_lock(&module_mutex); list_for_each_entry(mod, &modules, list) { /* We hold module_mutex: no need for rcu_dereference_sched */ struct mod_kallsyms *kallsyms = mod->kallsyms; @@ -4474,10 +4468,11 @@ int module_kallsyms_on_each_symbol(int (*fn)(void *, const char *, ret = fn(data, kallsyms_symbol_name(kallsyms, i), mod, kallsyms_symbol_value(sym)); if (ret != 0) - return ret; + break; } } - return 0; + mutex_unlock(&module_mutex); + return ret; } #endif /* CONFIG_KALLSYMS */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoph Hellwig hch@lst.de
[ Upstream commit 3e3552056ab42f883d7723eeb42fed712b66bacf ]
kallsyms_on_each_symbol and module_kallsyms_on_each_symbol are only used by the livepatching code, so don't build them if livepatching is not enabled.
Reviewed-by: Miroslav Benes mbenes@suse.cz Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Jessica Yu jeyu@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/kallsyms.h | 17 ++++------------- include/linux/module.h | 16 ++++------------ kernel/kallsyms.c | 2 ++ kernel/module.c | 2 ++ 4 files changed, 12 insertions(+), 25 deletions(-)
diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h index 481273f0c72d4..465060acc9816 100644 --- a/include/linux/kallsyms.h +++ b/include/linux/kallsyms.h @@ -71,15 +71,14 @@ static inline void *dereference_symbol_descriptor(void *ptr) return ptr; }
-#ifdef CONFIG_KALLSYMS -/* Lookup the address for a symbol. Returns 0 if not found. */ -unsigned long kallsyms_lookup_name(const char *name); - -/* Call a function on each kallsyms symbol in the core kernel */ int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *, unsigned long), void *data);
+#ifdef CONFIG_KALLSYMS +/* Lookup the address for a symbol. Returns 0 if not found. */ +unsigned long kallsyms_lookup_name(const char *name); + extern int kallsyms_lookup_size_offset(unsigned long addr, unsigned long *symbolsize, unsigned long *offset); @@ -108,14 +107,6 @@ static inline unsigned long kallsyms_lookup_name(const char *name) return 0; }
-static inline int kallsyms_on_each_symbol(int (*fn)(void *, const char *, - struct module *, - unsigned long), - void *data) -{ - return 0; -} - static inline int kallsyms_lookup_size_offset(unsigned long addr, unsigned long *symbolsize, unsigned long *offset) diff --git a/include/linux/module.h b/include/linux/module.h index 86fae5d1c0e39..59cbd8e1be2d6 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -604,10 +604,6 @@ int module_get_kallsym(unsigned int symnum, unsigned long *value, char *type, /* Look for this name: can be of form module:name. */ unsigned long module_kallsyms_lookup_name(const char *name);
-int module_kallsyms_on_each_symbol(int (*fn)(void *, const char *, - struct module *, unsigned long), - void *data); - extern void __noreturn __module_put_and_exit(struct module *mod, long code); #define module_put_and_exit(code) __module_put_and_exit(THIS_MODULE, code) @@ -791,14 +787,6 @@ static inline unsigned long module_kallsyms_lookup_name(const char *name) return 0; }
-static inline int module_kallsyms_on_each_symbol(int (*fn)(void *, const char *, - struct module *, - unsigned long), - void *data) -{ - return 0; -} - static inline int register_module_notifier(struct notifier_block *nb) { /* no events will happen anyway, so this can always succeed */ @@ -887,4 +875,8 @@ static inline bool module_sig_ok(struct module *module) } #endif /* CONFIG_MODULE_SIG */
+int module_kallsyms_on_each_symbol(int (*fn)(void *, const char *, + struct module *, unsigned long), + void *data); + #endif /* _LINUX_MODULE_H */ diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c index a0d3f0865916f..8043a90aa50ed 100644 --- a/kernel/kallsyms.c +++ b/kernel/kallsyms.c @@ -177,6 +177,7 @@ unsigned long kallsyms_lookup_name(const char *name) return module_kallsyms_lookup_name(name); }
+#ifdef CONFIG_LIVEPATCH /* * Iterate over all symbols in vmlinux. For symbols from modules use * module_kallsyms_on_each_symbol instead. @@ -198,6 +199,7 @@ int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *, } return 0; } +#endif /* CONFIG_LIVEPATCH */
static unsigned long get_symbol_pos(unsigned long addr, unsigned long *symbolsize, diff --git a/kernel/module.c b/kernel/module.c index 330387d63c633..949d09d2d8297 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -4444,6 +4444,7 @@ unsigned long module_kallsyms_lookup_name(const char *name) return ret; }
+#ifdef CONFIG_LIVEPATCH int module_kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *, unsigned long), void *data) @@ -4474,6 +4475,7 @@ int module_kallsyms_on_each_symbol(int (*fn)(void *, const char *, mutex_unlock(&module_mutex); return ret; } +#endif /* CONFIG_LIVEPATCH */ #endif /* CONFIG_KALLSYMS */
/* Maximum number of characters written by module_flags() */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian Brauner christian.brauner@ubuntu.com
[ Upstream commit 02f92b3868a1b34ab98464e76b0e4e060474ba10 ]
Add two simple helpers to check permissions on a file and path respectively and convert over some callers. It simplifies quite a few codepaths and also reduces the churn in later patches quite a bit. Christoph also correctly points out that this makes codepaths (e.g. ioctls) way easier to follow that would otherwise have to do more complex argument passing than necessary.
Link: https://lore.kernel.org/r/20210121131959.646623-4-christian.brauner@ubuntu.c... Cc: David Howells dhowells@redhat.com Cc: Al Viro viro@zeniv.linux.org.uk Cc: linux-fsdevel@vger.kernel.org Suggested-by: Christoph Hellwig hch@lst.de Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: James Morris jamorris@linux.microsoft.com Signed-off-by: Christian Brauner christian.brauner@ubuntu.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/init.c | 6 +++--- fs/notify/fanotify/fanotify_user.c | 2 +- fs/notify/inotify/inotify_user.c | 2 +- fs/open.c | 6 +++--- fs/udf/file.c | 2 +- fs/verity/enable.c | 2 +- include/linux/fs.h | 8 ++++++++ kernel/bpf/inode.c | 2 +- kernel/sys.c | 2 +- mm/madvise.c | 2 +- mm/memcontrol.c | 2 +- mm/mincore.c | 2 +- net/unix/af_unix.c | 2 +- 13 files changed, 24 insertions(+), 16 deletions(-)
diff --git a/fs/init.c b/fs/init.c index e9c320a48cf15..02723bea84990 100644 --- a/fs/init.c +++ b/fs/init.c @@ -49,7 +49,7 @@ int __init init_chdir(const char *filename) error = kern_path(filename, LOOKUP_FOLLOW | LOOKUP_DIRECTORY, &path); if (error) return error; - error = inode_permission(path.dentry->d_inode, MAY_EXEC | MAY_CHDIR); + error = path_permission(&path, MAY_EXEC | MAY_CHDIR); if (!error) set_fs_pwd(current->fs, &path); path_put(&path); @@ -64,7 +64,7 @@ int __init init_chroot(const char *filename) error = kern_path(filename, LOOKUP_FOLLOW | LOOKUP_DIRECTORY, &path); if (error) return error; - error = inode_permission(path.dentry->d_inode, MAY_EXEC | MAY_CHDIR); + error = path_permission(&path, MAY_EXEC | MAY_CHDIR); if (error) goto dput_and_out; error = -EPERM; @@ -118,7 +118,7 @@ int __init init_eaccess(const char *filename) error = kern_path(filename, LOOKUP_FOLLOW, &path); if (error) return error; - error = inode_permission(d_inode(path.dentry), MAY_ACCESS); + error = path_permission(&path, MAY_ACCESS); path_put(&path); return error; } diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 3e905b2e1b9c3..829ead2792dfb 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -702,7 +702,7 @@ static int fanotify_find_path(int dfd, const char __user *filename, }
/* you can only watch an inode if you have read permissions on it */ - ret = inode_permission(path->dentry->d_inode, MAY_READ); + ret = path_permission(path, MAY_READ); if (ret) { path_put(path); goto out; diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c index ad8fb4bca6dc1..82fc0cf86a7c3 100644 --- a/fs/notify/inotify/inotify_user.c +++ b/fs/notify/inotify/inotify_user.c @@ -352,7 +352,7 @@ static int inotify_find_inode(const char __user *dirname, struct path *path, if (error) return error; /* you can only watch an inode if you have read permissions on it */ - error = inode_permission(path->dentry->d_inode, MAY_READ); + error = path_permission(path, MAY_READ); if (error) { path_put(path); return error; diff --git a/fs/open.c b/fs/open.c index 48933cbb75391..9f56ebacfbefe 100644 --- a/fs/open.c +++ b/fs/open.c @@ -492,7 +492,7 @@ SYSCALL_DEFINE1(chdir, const char __user *, filename) if (error) goto out;
- error = inode_permission(path.dentry->d_inode, MAY_EXEC | MAY_CHDIR); + error = path_permission(&path, MAY_EXEC | MAY_CHDIR); if (error) goto dput_and_out;
@@ -521,7 +521,7 @@ SYSCALL_DEFINE1(fchdir, unsigned int, fd) if (!d_can_lookup(f.file->f_path.dentry)) goto out_putf;
- error = inode_permission(file_inode(f.file), MAY_EXEC | MAY_CHDIR); + error = file_permission(f.file, MAY_EXEC | MAY_CHDIR); if (!error) set_fs_pwd(current->fs, &f.file->f_path); out_putf: @@ -540,7 +540,7 @@ SYSCALL_DEFINE1(chroot, const char __user *, filename) if (error) goto out;
- error = inode_permission(path.dentry->d_inode, MAY_EXEC | MAY_CHDIR); + error = path_permission(&path, MAY_EXEC | MAY_CHDIR); if (error) goto dput_and_out;
diff --git a/fs/udf/file.c b/fs/udf/file.c index e283a62701b83..25f7c915f22b7 100644 --- a/fs/udf/file.c +++ b/fs/udf/file.c @@ -181,7 +181,7 @@ long udf_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) long old_block, new_block; int result;
- if (inode_permission(inode, MAY_READ) != 0) { + if (file_permission(filp, MAY_READ) != 0) { udf_debug("no permission to access inode %lu\n", inode->i_ino); return -EPERM; } diff --git a/fs/verity/enable.c b/fs/verity/enable.c index 5ceae66e1ae02..29becb66d0d88 100644 --- a/fs/verity/enable.c +++ b/fs/verity/enable.c @@ -369,7 +369,7 @@ int fsverity_ioctl_enable(struct file *filp, const void __user *uarg) * has verity enabled, and to stabilize the data being hashed. */
- err = inode_permission(inode, MAY_WRITE); + err = file_permission(filp, MAY_WRITE); if (err) return err;
diff --git a/include/linux/fs.h b/include/linux/fs.h index 6de70634e5471..0974e8160f50c 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2824,6 +2824,14 @@ static inline int bmap(struct inode *inode, sector_t *block) extern int notify_change(struct dentry *, struct iattr *, struct inode **); extern int inode_permission(struct inode *, int); extern int generic_permission(struct inode *, int); +static inline int file_permission(struct file *file, int mask) +{ + return inode_permission(file_inode(file), mask); +} +static inline int path_permission(const struct path *path, int mask) +{ + return inode_permission(d_inode(path->dentry), mask); +} extern int __check_sticky(struct inode *dir, struct inode *inode);
static inline bool execute_ok(struct inode *inode) diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c index 6b14b4c4068cc..5966013bc788b 100644 --- a/kernel/bpf/inode.c +++ b/kernel/bpf/inode.c @@ -507,7 +507,7 @@ static void *bpf_obj_do_get(const char __user *pathname, return ERR_PTR(ret);
inode = d_backing_inode(path.dentry); - ret = inode_permission(inode, ACC_MODE(flags)); + ret = path_permission(&path, ACC_MODE(flags)); if (ret) goto out;
diff --git a/kernel/sys.c b/kernel/sys.c index efc213ae4c5ad..7a2cfb57fa9e7 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -1873,7 +1873,7 @@ static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd) if (!S_ISREG(inode->i_mode) || path_noexec(&exe.file->f_path)) goto exit;
- err = inode_permission(inode, MAY_EXEC); + err = file_permission(exe.file, MAY_EXEC); if (err) goto exit;
diff --git a/mm/madvise.c b/mm/madvise.c index f71fc88f0b331..a63aa04ec7fa3 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -543,7 +543,7 @@ static inline bool can_do_pageout(struct vm_area_struct *vma) * opens a side channel. */ return inode_owner_or_capable(file_inode(vma->vm_file)) || - inode_permission(file_inode(vma->vm_file), MAY_WRITE) == 0; + file_permission(vma->vm_file, MAY_WRITE) == 0; }
static long madvise_pageout(struct vm_area_struct *vma, diff --git a/mm/memcontrol.c b/mm/memcontrol.c index ddc8ed096deca..186ae9dba0fd5 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -4918,7 +4918,7 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
/* the process need read permission on control file */ /* AV: shouldn't we check that it's been opened for read instead? */ - ret = inode_permission(file_inode(cfile.file), MAY_READ); + ret = file_permission(cfile.file, MAY_READ); if (ret < 0) goto out_put_cfile;
diff --git a/mm/mincore.c b/mm/mincore.c index 02db1a834021b..7bdb4673f776a 100644 --- a/mm/mincore.c +++ b/mm/mincore.c @@ -167,7 +167,7 @@ static inline bool can_do_mincore(struct vm_area_struct *vma) * mappings, which opens a side channel. */ return inode_owner_or_capable(file_inode(vma->vm_file)) || - inode_permission(file_inode(vma->vm_file), MAY_WRITE) == 0; + file_permission(vma->vm_file, MAY_WRITE) == 0; }
static const struct mm_walk_ops mincore_walk_ops = { diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 3ab726a668e8a..405bf3e6eb796 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -959,7 +959,7 @@ static struct sock *unix_find_other(struct net *net, if (err) goto fail; inode = d_backing_inode(path.dentry); - err = inode_permission(inode, MAY_WRITE); + err = path_permission(&path, MAY_WRITE); if (err) goto put_fail;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian Brauner christian.brauner@ubuntu.com
[ Upstream commit 9fe61450972d3900bffb1dc26a17ebb9cdd92db2 ]
In order to handle idmapped mounts we will extend the vfs rename helper to take two new arguments in follow up patches. Since this operations already takes a bunch of arguments add a simple struct renamedata and make the current helper use it before we extend it.
Link: https://lore.kernel.org/r/20210121131959.646623-14-christian.brauner@ubuntu.... Cc: Christoph Hellwig hch@lst.de Cc: David Howells dhowells@redhat.com Cc: Al Viro viro@zeniv.linux.org.uk Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig hch@lst.de [ cel: backported to 5.10.y, prior to idmapped mounts ] Signed-off-by: Christian Brauner christian.brauner@ubuntu.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/cachefiles/namei.c | 9 +++++++-- fs/ecryptfs/inode.c | 10 +++++++--- fs/namei.c | 21 +++++++++++++++------ fs/nfsd/vfs.c | 8 +++++++- fs/overlayfs/overlayfs.h | 9 ++++++++- include/linux/fs.h | 12 +++++++++++- 6 files changed, 55 insertions(+), 14 deletions(-)
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c index ecc8ecbbfa5ac..7b987de0babe8 100644 --- a/fs/cachefiles/namei.c +++ b/fs/cachefiles/namei.c @@ -412,9 +412,14 @@ static int cachefiles_bury_object(struct cachefiles_cache *cache, if (ret < 0) { cachefiles_io_error(cache, "Rename security error %d", ret); } else { + struct renamedata rd = { + .old_dir = d_inode(dir), + .old_dentry = rep, + .new_dir = d_inode(cache->graveyard), + .new_dentry = grave, + }; trace_cachefiles_rename(object, rep, grave, why); - ret = vfs_rename(d_inode(dir), rep, - d_inode(cache->graveyard), grave, NULL, 0); + ret = vfs_rename(&rd); if (ret != 0 && ret != -ENOMEM) cachefiles_io_error(cache, "Rename failed with error %d", ret); diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c index c867a0d62f360..1dbe0c3ff38ea 100644 --- a/fs/ecryptfs/inode.c +++ b/fs/ecryptfs/inode.c @@ -598,6 +598,7 @@ ecryptfs_rename(struct inode *old_dir, struct dentry *old_dentry, struct dentry *lower_new_dir_dentry; struct dentry *trap; struct inode *target_inode; + struct renamedata rd = {};
if (flags) return -EINVAL; @@ -627,9 +628,12 @@ ecryptfs_rename(struct inode *old_dir, struct dentry *old_dentry, rc = -ENOTEMPTY; goto out_lock; } - rc = vfs_rename(d_inode(lower_old_dir_dentry), lower_old_dentry, - d_inode(lower_new_dir_dentry), lower_new_dentry, - NULL, 0); + + rd.old_dir = d_inode(lower_old_dir_dentry); + rd.old_dentry = lower_old_dentry; + rd.new_dir = d_inode(lower_new_dir_dentry); + rd.new_dentry = lower_new_dentry; + rc = vfs_rename(&rd); if (rc) goto out_lock; if (target_inode) diff --git a/fs/namei.c b/fs/namei.c index cb37d7c477e0b..72521a614514b 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -4277,11 +4277,14 @@ SYSCALL_DEFINE2(link, const char __user *, oldname, const char __user *, newname * ->i_mutex on parents, which works but leads to some truly excessive * locking]. */ -int vfs_rename(struct inode *old_dir, struct dentry *old_dentry, - struct inode *new_dir, struct dentry *new_dentry, - struct inode **delegated_inode, unsigned int flags) +int vfs_rename(struct renamedata *rd) { int error; + struct inode *old_dir = rd->old_dir, *new_dir = rd->new_dir; + struct dentry *old_dentry = rd->old_dentry; + struct dentry *new_dentry = rd->new_dentry; + struct inode **delegated_inode = rd->delegated_inode; + unsigned int flags = rd->flags; bool is_dir = d_is_dir(old_dentry); struct inode *source = old_dentry->d_inode; struct inode *target = new_dentry->d_inode; @@ -4429,6 +4432,7 @@ EXPORT_SYMBOL(vfs_rename); int do_renameat2(int olddfd, struct filename *from, int newdfd, struct filename *to, unsigned int flags) { + struct renamedata rd; struct dentry *old_dentry, *new_dentry; struct dentry *trap; struct path old_path, new_path; @@ -4532,9 +4536,14 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd, &new_path, new_dentry, flags); if (error) goto exit5; - error = vfs_rename(old_path.dentry->d_inode, old_dentry, - new_path.dentry->d_inode, new_dentry, - &delegated_inode, flags); + + rd.old_dir = old_path.dentry->d_inode; + rd.old_dentry = old_dentry; + rd.new_dir = new_path.dentry->d_inode; + rd.new_dentry = new_dentry; + rd.delegated_inode = &delegated_inode; + rd.flags = flags; + error = vfs_rename(&rd); exit5: dput(new_dentry); exit4: diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 3e30788e0046b..d12c3e71ca10e 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1812,7 +1812,13 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen, close_cached = true; goto out_dput_old; } else { - host_err = vfs_rename(fdir, odentry, tdir, ndentry, NULL, 0); + struct renamedata rd = { + .old_dir = fdir, + .old_dentry = odentry, + .new_dir = tdir, + .new_dentry = ndentry, + }; + host_err = vfs_rename(&rd); if (!host_err) { host_err = commit_metadata(tfhp); if (!host_err) diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h index 26f91868fbdaf..87b7a4a74f4ed 100644 --- a/fs/overlayfs/overlayfs.h +++ b/fs/overlayfs/overlayfs.h @@ -212,9 +212,16 @@ static inline int ovl_do_rename(struct inode *olddir, struct dentry *olddentry, unsigned int flags) { int err; + struct renamedata rd = { + .old_dir = olddir, + .old_dentry = olddentry, + .new_dir = newdir, + .new_dentry = newdentry, + .flags = flags, + };
pr_debug("rename(%pd2, %pd2, 0x%x)\n", olddentry, newdentry, flags); - err = vfs_rename(olddir, olddentry, newdir, newdentry, NULL, flags); + err = vfs_rename(&rd); if (err) { pr_debug("...rename(%pd2, %pd2, ...) = %i\n", olddentry, newdentry, err); diff --git a/include/linux/fs.h b/include/linux/fs.h index 0974e8160f50c..cc3b6ddf58223 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1780,7 +1780,17 @@ extern int vfs_symlink(struct inode *, struct dentry *, const char *); extern int vfs_link(struct dentry *, struct inode *, struct dentry *, struct inode **); extern int vfs_rmdir(struct inode *, struct dentry *); extern int vfs_unlink(struct inode *, struct dentry *, struct inode **); -extern int vfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *, struct inode **, unsigned int); + +struct renamedata { + struct inode *old_dir; + struct dentry *old_dentry; + struct inode *new_dir; + struct dentry *new_dentry; + struct inode **delegated_inode; + unsigned int flags; +} __randomize_layout; + +int vfs_rename(struct renamedata *);
static inline int vfs_whiteout(struct inode *dir, struct dentry *dentry) {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit bddfdbcddbe267519cd36aeb115fdf8620980111 ]
NFSD initializes an encode xdr_stream only after the RPC layer has already inserted the RPC Reply header. Thus it behaves differently than xdr_init_encode does, which assumes the passed-in xdr_buf is entirely devoid of content.
nfs4proc.c has this server-side stream initialization helper, but it is visible only to the NFSv4 code. Move this helper to a place that can be accessed by NFSv2 and NFSv3 server XDR functions.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 31 +++-------- fs/nfsd/nfs4state.c | 6 +- fs/nfsd/nfs4xdr.c | 110 ++++++++++++++++++------------------- fs/nfsd/nfssvc.c | 4 +- fs/nfsd/xdr4.h | 2 +- include/linux/sunrpc/svc.h | 25 +++++++++ 6 files changed, 94 insertions(+), 84 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 0acf7af9aab8f..2fba0808d975c 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -2256,25 +2256,6 @@ static bool need_wrongsec_check(struct svc_rqst *rqstp) return !(nextd->op_flags & OP_HANDLES_WRONGSEC); }
-static void svcxdr_init_encode(struct svc_rqst *rqstp, - struct nfsd4_compoundres *resp) -{ - struct xdr_stream *xdr = &resp->xdr; - struct xdr_buf *buf = &rqstp->rq_res; - struct kvec *head = buf->head; - - xdr->buf = buf; - xdr->iov = head; - xdr->p = head->iov_base + head->iov_len; - xdr->end = head->iov_base + PAGE_SIZE - rqstp->rq_auth_slack; - /* Tail and page_len should be zero at this point: */ - buf->len = buf->head[0].iov_len; - xdr_reset_scratch_buffer(xdr); - xdr->page_ptr = buf->pages - 1; - buf->buflen = PAGE_SIZE * (1 + rqstp->rq_page_end - buf->pages) - - rqstp->rq_auth_slack; -} - #ifdef CONFIG_NFSD_V4_2_INTER_SSC static void check_if_stalefh_allowed(struct nfsd4_compoundargs *args) @@ -2329,10 +2310,14 @@ nfsd4_proc_compound(struct svc_rqst *rqstp) struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id); __be32 status;
- svcxdr_init_encode(rqstp, resp); - resp->tagp = resp->xdr.p; + resp->xdr = &rqstp->rq_res_stream; + + /* reserve space for: NFS status code */ + xdr_reserve_space(resp->xdr, XDR_UNIT); + + resp->tagp = resp->xdr->p; /* reserve space for: taglen, tag, and opcnt */ - xdr_reserve_space(&resp->xdr, 8 + args->taglen); + xdr_reserve_space(resp->xdr, XDR_UNIT * 2 + args->taglen); resp->taglen = args->taglen; resp->tag = args->tag; resp->rqstp = rqstp; @@ -2438,7 +2423,7 @@ nfsd4_proc_compound(struct svc_rqst *rqstp) encode_op: if (op->status == nfserr_replay_me) { op->replay = &cstate->replay_owner->so_replay; - nfsd4_encode_replay(&resp->xdr, op); + nfsd4_encode_replay(resp->xdr, op); status = op->status = op->replay->rp_status; } else { nfsd4_encode_operation(resp, op); diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index c2eba93ab5138..914f60cee3226 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -2925,7 +2925,7 @@ gen_callback(struct nfs4_client *clp, struct nfsd4_setclientid *se, struct svc_r static void nfsd4_store_cache_entry(struct nfsd4_compoundres *resp) { - struct xdr_buf *buf = resp->xdr.buf; + struct xdr_buf *buf = resp->xdr->buf; struct nfsd4_slot *slot = resp->cstate.slot; unsigned int base;
@@ -2995,7 +2995,7 @@ nfsd4_replay_cache_entry(struct nfsd4_compoundres *resp, struct nfsd4_sequence *seq) { struct nfsd4_slot *slot = resp->cstate.slot; - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p; __be32 status;
@@ -3740,7 +3740,7 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, { struct nfsd4_sequence *seq = &u->sequence; struct nfsd4_compoundres *resp = rqstp->rq_resp; - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; struct nfsd4_session *session; struct nfs4_client *clp; struct nfsd4_slot *slot; diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 8d5fdae568aeb..262c6fec56aa6 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -3595,7 +3595,7 @@ nfsd4_encode_stateid(struct xdr_stream *xdr, stateid_t *sid) static __be32 nfsd4_encode_access(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_access *access) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p;
p = xdr_reserve_space(xdr, 8); @@ -3608,7 +3608,7 @@ nfsd4_encode_access(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_
static __be32 nfsd4_encode_bind_conn_to_session(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_bind_conn_to_session *bcts) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p;
p = xdr_reserve_space(xdr, NFS4_MAX_SESSIONID_LEN + 8); @@ -3625,7 +3625,7 @@ static __be32 nfsd4_encode_bind_conn_to_session(struct nfsd4_compoundres *resp, static __be32 nfsd4_encode_close(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_close *close) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr;
return nfsd4_encode_stateid(xdr, &close->cl_stateid); } @@ -3634,7 +3634,7 @@ nfsd4_encode_close(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_c static __be32 nfsd4_encode_commit(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_commit *commit) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p;
p = xdr_reserve_space(xdr, NFS4_VERIFIER_SIZE); @@ -3648,7 +3648,7 @@ nfsd4_encode_commit(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_ static __be32 nfsd4_encode_create(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_create *create) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p;
p = xdr_reserve_space(xdr, 20); @@ -3663,7 +3663,7 @@ static __be32 nfsd4_encode_getattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_getattr *getattr) { struct svc_fh *fhp = getattr->ga_fhp; - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr;
return nfsd4_encode_fattr(xdr, fhp, fhp->fh_export, fhp->fh_dentry, getattr->ga_bmval, resp->rqstp, 0); @@ -3672,7 +3672,7 @@ nfsd4_encode_getattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4 static __be32 nfsd4_encode_getfh(struct nfsd4_compoundres *resp, __be32 nfserr, struct svc_fh **fhpp) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; struct svc_fh *fhp = *fhpp; unsigned int len; __be32 *p; @@ -3727,7 +3727,7 @@ nfsd4_encode_lock_denied(struct xdr_stream *xdr, struct nfsd4_lock_denied *ld) static __be32 nfsd4_encode_lock(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_lock *lock) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr;
if (!nfserr) nfserr = nfsd4_encode_stateid(xdr, &lock->lk_resp_stateid); @@ -3740,7 +3740,7 @@ nfsd4_encode_lock(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_lo static __be32 nfsd4_encode_lockt(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_lockt *lockt) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr;
if (nfserr == nfserr_denied) nfsd4_encode_lock_denied(xdr, &lockt->lt_denied); @@ -3750,7 +3750,7 @@ nfsd4_encode_lockt(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_l static __be32 nfsd4_encode_locku(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_locku *locku) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr;
return nfsd4_encode_stateid(xdr, &locku->lu_stateid); } @@ -3759,7 +3759,7 @@ nfsd4_encode_locku(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_l static __be32 nfsd4_encode_link(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_link *link) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p;
p = xdr_reserve_space(xdr, 20); @@ -3773,7 +3773,7 @@ nfsd4_encode_link(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_li static __be32 nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_open *open) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p;
nfserr = nfsd4_encode_stateid(xdr, &open->op_stateid); @@ -3867,7 +3867,7 @@ nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_op static __be32 nfsd4_encode_open_confirm(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_open_confirm *oc) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr;
return nfsd4_encode_stateid(xdr, &oc->oc_resp_stateid); } @@ -3875,7 +3875,7 @@ nfsd4_encode_open_confirm(struct nfsd4_compoundres *resp, __be32 nfserr, struct static __be32 nfsd4_encode_open_downgrade(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_open_downgrade *od) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr;
return nfsd4_encode_stateid(xdr, &od->od_stateid); } @@ -3885,7 +3885,7 @@ static __be32 nfsd4_encode_splice_read( struct nfsd4_read *read, struct file *file, unsigned long maxcount) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; struct xdr_buf *buf = xdr->buf; int status, space_left; u32 eof; @@ -3951,7 +3951,7 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp, struct nfsd4_read *read, struct file *file, unsigned long maxcount) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; u32 eof; int starting_len = xdr->buf->len - 8; __be32 nfserr; @@ -3990,7 +3990,7 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_read *read) { unsigned long maxcount; - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; struct file *file; int starting_len = xdr->buf->len; __be32 *p; @@ -4004,7 +4004,7 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr, WARN_ON_ONCE(test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags)); return nfserr_resource; } - if (resp->xdr.buf->page_len && + if (resp->xdr->buf->page_len && test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags)) { WARN_ON_ONCE(1); return nfserr_serverfault; @@ -4034,7 +4034,7 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd int maxcount; __be32 wire_count; int zero = 0; - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; int length_offset = xdr->buf->len; int status; __be32 *p; @@ -4086,7 +4086,7 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4 int bytes_left; loff_t offset; __be64 wire_offset; - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; int starting_len = xdr->buf->len; __be32 *p;
@@ -4097,8 +4097,8 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4 /* XXX: Following NFSv3, we ignore the READDIR verifier for now. */ *p++ = cpu_to_be32(0); *p++ = cpu_to_be32(0); - resp->xdr.buf->head[0].iov_len = ((char *)resp->xdr.p) - - (char *)resp->xdr.buf->head[0].iov_base; + xdr->buf->head[0].iov_len = (char *)xdr->p - + (char *)xdr->buf->head[0].iov_base;
/* * Number of bytes left for directory entries allowing for the @@ -4173,7 +4173,7 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4 static __be32 nfsd4_encode_remove(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_remove *remove) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p;
p = xdr_reserve_space(xdr, 20); @@ -4186,7 +4186,7 @@ nfsd4_encode_remove(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_ static __be32 nfsd4_encode_rename(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_rename *rename) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p;
p = xdr_reserve_space(xdr, 40); @@ -4269,7 +4269,7 @@ static __be32 nfsd4_encode_secinfo(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_secinfo *secinfo) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr;
return nfsd4_do_encode_secinfo(xdr, secinfo->si_exp); } @@ -4278,7 +4278,7 @@ static __be32 nfsd4_encode_secinfo_no_name(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_secinfo_no_name *secinfo) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr;
return nfsd4_do_encode_secinfo(xdr, secinfo->sin_exp); } @@ -4290,7 +4290,7 @@ nfsd4_encode_secinfo_no_name(struct nfsd4_compoundres *resp, __be32 nfserr, static __be32 nfsd4_encode_setattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_setattr *setattr) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p;
p = xdr_reserve_space(xdr, 16); @@ -4314,7 +4314,7 @@ nfsd4_encode_setattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4 static __be32 nfsd4_encode_setclientid(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_setclientid *scd) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p;
if (!nfserr) { @@ -4338,7 +4338,7 @@ nfsd4_encode_setclientid(struct nfsd4_compoundres *resp, __be32 nfserr, struct n static __be32 nfsd4_encode_write(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_write *write) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p;
p = xdr_reserve_space(xdr, 16); @@ -4355,7 +4355,7 @@ static __be32 nfsd4_encode_exchange_id(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_exchange_id *exid) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p; char *major_id; char *server_scope; @@ -4433,7 +4433,7 @@ static __be32 nfsd4_encode_create_session(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_create_session *sess) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p;
p = xdr_reserve_space(xdr, 24); @@ -4486,7 +4486,7 @@ static __be32 nfsd4_encode_sequence(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_sequence *seq) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p;
p = xdr_reserve_space(xdr, NFS4_MAX_SESSIONID_LEN + 20); @@ -4509,7 +4509,7 @@ static __be32 nfsd4_encode_test_stateid(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_test_stateid *test_stateid) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; struct nfsd4_test_stateid_id *stateid, *next; __be32 *p;
@@ -4530,7 +4530,7 @@ static __be32 nfsd4_encode_getdeviceinfo(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_getdeviceinfo *gdev) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; const struct nfsd4_layout_ops *ops; u32 starting_len = xdr->buf->len, needed_len; __be32 *p; @@ -4583,7 +4583,7 @@ static __be32 nfsd4_encode_layoutget(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_layoutget *lgp) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; const struct nfsd4_layout_ops *ops; __be32 *p;
@@ -4610,7 +4610,7 @@ static __be32 nfsd4_encode_layoutcommit(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_layoutcommit *lcp) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p;
p = xdr_reserve_space(xdr, 4); @@ -4631,7 +4631,7 @@ static __be32 nfsd4_encode_layoutreturn(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_layoutreturn *lrp) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p;
p = xdr_reserve_space(xdr, 4); @@ -4649,7 +4649,7 @@ nfsd42_encode_write_res(struct nfsd4_compoundres *resp, struct nfsd42_write_res *write, bool sync) { __be32 *p; - p = xdr_reserve_space(&resp->xdr, 4); + p = xdr_reserve_space(resp->xdr, 4); if (!p) return nfserr_resource;
@@ -4658,11 +4658,11 @@ nfsd42_encode_write_res(struct nfsd4_compoundres *resp, else { __be32 nfserr; *p++ = cpu_to_be32(1); - nfserr = nfsd4_encode_stateid(&resp->xdr, &write->cb_stateid); + nfserr = nfsd4_encode_stateid(resp->xdr, &write->cb_stateid); if (nfserr) return nfserr; } - p = xdr_reserve_space(&resp->xdr, 8 + 4 + NFS4_VERIFIER_SIZE); + p = xdr_reserve_space(resp->xdr, 8 + 4 + NFS4_VERIFIER_SIZE); if (!p) return nfserr_resource;
@@ -4676,7 +4676,7 @@ nfsd42_encode_write_res(struct nfsd4_compoundres *resp, static __be32 nfsd42_encode_nl4_server(struct nfsd4_compoundres *resp, struct nl4_server *ns) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; struct nfs42_netaddr *addr; __be32 *p;
@@ -4724,7 +4724,7 @@ nfsd4_encode_copy(struct nfsd4_compoundres *resp, __be32 nfserr, if (nfserr) return nfserr;
- p = xdr_reserve_space(&resp->xdr, 4 + 4); + p = xdr_reserve_space(resp->xdr, 4 + 4); *p++ = xdr_one; /* cr_consecutive */ *p++ = cpu_to_be32(copy->cp_synchronous); return 0; @@ -4734,7 +4734,7 @@ static __be32 nfsd4_encode_offload_status(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_offload_status *os) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p;
p = xdr_reserve_space(xdr, 8 + 4); @@ -4751,7 +4751,7 @@ nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp, unsigned long *maxcount, u32 *eof, loff_t *pos) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; struct file *file = read->rd_nf->nf_file; int starting_len = xdr->buf->len; loff_t hole_pos; @@ -4810,7 +4810,7 @@ nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp, count = data_pos - read->rd_offset;
/* Content type, offset, byte count */ - p = xdr_reserve_space(&resp->xdr, 4 + 8 + 8); + p = xdr_reserve_space(resp->xdr, 4 + 8 + 8); if (!p) return nfserr_resource;
@@ -4828,7 +4828,7 @@ nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_read *read) { unsigned long maxcount, count; - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; struct file *file; int starting_len = xdr->buf->len; int last_segment = xdr->buf->len; @@ -4899,7 +4899,7 @@ static __be32 nfsd4_encode_copy_notify(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_copy_notify *cn) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p;
if (nfserr) @@ -4935,7 +4935,7 @@ nfsd4_encode_seek(struct nfsd4_compoundres *resp, __be32 nfserr, { __be32 *p;
- p = xdr_reserve_space(&resp->xdr, 4 + 8); + p = xdr_reserve_space(resp->xdr, 4 + 8); *p++ = cpu_to_be32(seek->seek_eof); p = xdr_encode_hyper(p, seek->seek_pos);
@@ -4996,7 +4996,7 @@ static __be32 nfsd4_encode_getxattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_getxattr *getxattr) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p, err;
p = xdr_reserve_space(xdr, 4); @@ -5020,7 +5020,7 @@ static __be32 nfsd4_encode_setxattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_setxattr *setxattr) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p;
p = xdr_reserve_space(xdr, 20); @@ -5061,7 +5061,7 @@ static __be32 nfsd4_encode_listxattrs(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_listxattrs *listxattrs) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; u32 cookie_offset, count_offset, eof; u32 left, xdrleft, slen, count; u32 xdrlen, offset; @@ -5172,7 +5172,7 @@ static __be32 nfsd4_encode_removexattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_removexattr *removexattr) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p;
p = xdr_reserve_space(xdr, 20); @@ -5312,7 +5312,7 @@ __be32 nfsd4_check_resp_size(struct nfsd4_compoundres *resp, u32 respsize) void nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; struct nfs4_stateowner *so = resp->cstate.replay_owner; struct svc_rqst *rqstp = resp->rqstp; const struct nfsd4_operation *opdesc = op->opdesc; @@ -5441,14 +5441,14 @@ int nfs4svc_encode_compoundres(struct svc_rqst *rqstp, __be32 *p) { struct nfsd4_compoundres *resp = rqstp->rq_resp; - struct xdr_buf *buf = resp->xdr.buf; + struct xdr_buf *buf = resp->xdr->buf;
WARN_ON_ONCE(buf->len != buf->head[0].iov_len + buf->page_len + buf->tail[0].iov_len);
*p = resp->cstate.status;
- rqstp->rq_next_page = resp->xdr.page_ptr + 1; + rqstp->rq_next_page = resp->xdr->page_ptr + 1;
p = resp->tagp; *p++ = htonl(resp->taglen); diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 6c1d70935ea81..0666ef4b87b7a 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -1030,7 +1030,7 @@ int nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp) * NFSv4 does some encoding while processing */ p = resv->iov_base + resv->iov_len; - resv->iov_len += sizeof(__be32); + svcxdr_init_encode(rqstp);
*statp = proc->pc_func(rqstp); if (*statp == rpc_drop_reply || test_bit(RQ_DROPME, &rqstp->rq_flags)) @@ -1085,7 +1085,7 @@ int nfssvc_decode_voidarg(struct svc_rqst *rqstp, __be32 *p) */ int nfssvc_encode_voidres(struct svc_rqst *rqstp, __be32 *p) { - return xdr_ressize_check(rqstp, p); + return 1; }
int nfsd_pool_stats_open(struct inode *inode, struct file *file) diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index c300885ae75dd..fe540a3415c6a 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -698,7 +698,7 @@ struct nfsd4_compoundargs {
struct nfsd4_compoundres { /* scratch variables for XDR encode */ - struct xdr_stream xdr; + struct xdr_stream *xdr; struct svc_rqst * rqstp;
u32 taglen; diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 31ee3b6047c30..e91d51ea028bb 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -248,6 +248,7 @@ struct svc_rqst { size_t rq_xprt_hlen; /* xprt header len */ struct xdr_buf rq_arg; struct xdr_stream rq_arg_stream; + struct xdr_stream rq_res_stream; struct page *rq_scratch_page; struct xdr_buf rq_res; struct page *rq_pages[RPCSVC_MAXPAGES + 1]; @@ -574,4 +575,28 @@ static inline void svcxdr_init_decode(struct svc_rqst *rqstp) xdr_set_scratch_page(xdr, rqstp->rq_scratch_page); }
+/** + * svcxdr_init_encode - Prepare an xdr_stream for svc Reply encoding + * @rqstp: controlling server RPC transaction context + * + */ +static inline void svcxdr_init_encode(struct svc_rqst *rqstp) +{ + struct xdr_stream *xdr = &rqstp->rq_res_stream; + struct xdr_buf *buf = &rqstp->rq_res; + struct kvec *resv = buf->head; + + xdr_reset_scratch_buffer(xdr); + + xdr->buf = buf; + xdr->iov = resv; + xdr->p = resv->iov_base + resv->iov_len; + xdr->end = resv->iov_base + PAGE_SIZE - rqstp->rq_auth_slack; + buf->len = resv->iov_len; + xdr->page_ptr = buf->pages - 1; + buf->buflen = PAGE_SIZE * (1 + rqstp->rq_page_end - buf->pages); + buf->buflen -= rqstp->rq_auth_slack; + xdr->rqst = NULL; +} + #endif /* SUNRPC_SVC_H */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 2c42f804d30f6a8d86665eca84071b316821ea08 ]
As an additional clean up, some renaming is done to more closely reflect the data type and variable names used in the NFSv3 XDR definition provided in RFC 1813. "attrstat" is an NFSv2 thingie.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3proc.c | 2 +- fs/nfsd/nfs3xdr.c | 95 ++++++++++++++++++++++++++++++++++++++++++---- fs/nfsd/nfsfh.c | 2 +- fs/nfsd/nfsfh.h | 2 +- fs/nfsd/xdr3.h | 2 +- 5 files changed, 91 insertions(+), 12 deletions(-)
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 25f31a03c4f1b..1c3cf97ed95d2 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -741,7 +741,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { [NFS3PROC_GETATTR] = { .pc_func = nfsd3_proc_getattr, .pc_decode = nfs3svc_decode_fhandleargs, - .pc_encode = nfs3svc_encode_attrstatres, + .pc_encode = nfs3svc_encode_getattrres, .pc_release = nfs3svc_release_fhandle, .pc_argsize = sizeof(struct nfsd_fhandle), .pc_ressize = sizeof(struct nfsd3_attrstatres), diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 9d9a01ce0b270..75739861d235e 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -20,7 +20,7 @@ /* * Mapping of S_IF* types to NFS file types */ -static u32 nfs3_ftypes[] = { +static const u32 nfs3_ftypes[] = { NF3NON, NF3FIFO, NF3CHR, NF3BAD, NF3DIR, NF3BAD, NF3BLK, NF3BAD, NF3REG, NF3BAD, NF3LNK, NF3BAD, @@ -39,6 +39,15 @@ encode_time3(__be32 *p, struct timespec64 *time) return p; }
+static __be32 * +encode_nfstime3(__be32 *p, const struct timespec64 *time) +{ + *p++ = cpu_to_be32((u32)time->tv_sec); + *p++ = cpu_to_be32(time->tv_nsec); + + return p; +} + static bool svcxdr_decode_nfstime3(struct xdr_stream *xdr, struct timespec64 *timep) { @@ -82,6 +91,19 @@ svcxdr_decode_nfs_fh3(struct xdr_stream *xdr, struct svc_fh *fhp) return true; }
+static bool +svcxdr_encode_nfsstat3(struct xdr_stream *xdr, __be32 status) +{ + __be32 *p; + + p = xdr_reserve_space(xdr, sizeof(status)); + if (!p) + return false; + *p = status; + + return true; +} + static __be32 * encode_fh(__be32 *p, struct svc_fh *fhp) { @@ -253,6 +275,58 @@ svcxdr_decode_devicedata3(struct svc_rqst *rqstp, struct xdr_stream *xdr, svcxdr_decode_specdata3(xdr, args); }
+static bool +svcxdr_encode_fattr3(struct svc_rqst *rqstp, struct xdr_stream *xdr, + const struct svc_fh *fhp, const struct kstat *stat) +{ + struct user_namespace *userns = nfsd_user_namespace(rqstp); + __be32 *p; + u64 fsid; + + p = xdr_reserve_space(xdr, XDR_UNIT * 21); + if (!p) + return false; + + *p++ = cpu_to_be32(nfs3_ftypes[(stat->mode & S_IFMT) >> 12]); + *p++ = cpu_to_be32((u32)(stat->mode & S_IALLUGO)); + *p++ = cpu_to_be32((u32)stat->nlink); + *p++ = cpu_to_be32((u32)from_kuid_munged(userns, stat->uid)); + *p++ = cpu_to_be32((u32)from_kgid_munged(userns, stat->gid)); + if (S_ISLNK(stat->mode) && stat->size > NFS3_MAXPATHLEN) + p = xdr_encode_hyper(p, (u64)NFS3_MAXPATHLEN); + else + p = xdr_encode_hyper(p, (u64)stat->size); + + /* used */ + p = xdr_encode_hyper(p, ((u64)stat->blocks) << 9); + + /* rdev */ + *p++ = cpu_to_be32((u32)MAJOR(stat->rdev)); + *p++ = cpu_to_be32((u32)MINOR(stat->rdev)); + + switch(fsid_source(fhp)) { + case FSIDSOURCE_FSID: + fsid = (u64)fhp->fh_export->ex_fsid; + break; + case FSIDSOURCE_UUID: + fsid = ((u64 *)fhp->fh_export->ex_uuid)[0]; + fsid ^= ((u64 *)fhp->fh_export->ex_uuid)[1]; + break; + default: + fsid = (u64)huge_encode_dev(fhp->fh_dentry->d_sb->s_dev); + } + p = xdr_encode_hyper(p, fsid); + + /* fileid */ + p = xdr_encode_hyper(p, stat->ino); + + p = encode_nfstime3(p, &stat->atime); + p = encode_nfstime3(p, &stat->mtime); + encode_nfstime3(p, &stat->ctime); + + return true; +} + static __be32 *encode_fsid(__be32 *p, struct svc_fh *fhp) { u64 f; @@ -713,17 +787,22 @@ nfs3svc_decode_commitargs(struct svc_rqst *rqstp, __be32 *p)
/* GETATTR */ int -nfs3svc_encode_attrstat(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_encode_getattrres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_attrstat *resp = rqstp->rq_resp;
- *p++ = resp->status; - if (resp->status == 0) { - lease_get_mtime(d_inode(resp->fh.fh_dentry), - &resp->stat.mtime); - p = encode_fattr3(rqstp, p, &resp->fh, &resp->stat); + if (!svcxdr_encode_nfsstat3(xdr, resp->status)) + return 0; + switch (resp->status) { + case nfs_ok: + lease_get_mtime(d_inode(resp->fh.fh_dentry), &resp->stat.mtime); + if (!svcxdr_encode_fattr3(rqstp, xdr, &resp->fh, &resp->stat)) + return 0; + break; } - return xdr_ressize_check(rqstp, p); + + return 1; }
/* SETATTR, REMOVE, RMDIR */ diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c index 4744a276058d4..04930056222b7 100644 --- a/fs/nfsd/nfsfh.c +++ b/fs/nfsd/nfsfh.c @@ -710,7 +710,7 @@ char * SVCFH_fmt(struct svc_fh *fhp) return buf; }
-enum fsid_source fsid_source(struct svc_fh *fhp) +enum fsid_source fsid_source(const struct svc_fh *fhp) { if (fhp->fh_handle.fh_version != 1) return FSIDSOURCE_DEV; diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h index f58933519f380..aff2cda5c6c33 100644 --- a/fs/nfsd/nfsfh.h +++ b/fs/nfsd/nfsfh.h @@ -82,7 +82,7 @@ enum fsid_source { FSIDSOURCE_FSID, FSIDSOURCE_UUID, }; -extern enum fsid_source fsid_source(struct svc_fh *fhp); +extern enum fsid_source fsid_source(const struct svc_fh *fhp);
/* diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 3e1578953f544..0822981c61b93 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -280,7 +280,7 @@ int nfs3svc_decode_symlinkargs(struct svc_rqst *, __be32 *); int nfs3svc_decode_readdirargs(struct svc_rqst *, __be32 *); int nfs3svc_decode_readdirplusargs(struct svc_rqst *, __be32 *); int nfs3svc_decode_commitargs(struct svc_rqst *, __be32 *); -int nfs3svc_encode_attrstat(struct svc_rqst *, __be32 *); +int nfs3svc_encode_getattrres(struct svc_rqst *, __be32 *); int nfs3svc_encode_wccstat(struct svc_rqst *, __be32 *); int nfs3svc_encode_diropres(struct svc_rqst *, __be32 *); int nfs3svc_encode_accessres(struct svc_rqst *, __be32 *);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 907c38227fb57f5c537491ca76dd0b9636029393 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3xdr.c | 50 ++++++++++++++++++++++++++++++++++++++++++----- fs/nfsd/vfs.h | 2 +- 2 files changed, 46 insertions(+), 6 deletions(-)
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 75739861d235e..9d6c989df6d8d 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -383,6 +383,35 @@ encode_saved_post_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp) return encode_fattr3(rqstp, p, fhp, &fhp->fh_post_attr); }
+static bool +svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct xdr_stream *xdr, + const struct svc_fh *fhp) +{ + struct dentry *dentry = fhp->fh_dentry; + struct kstat stat; + + /* + * The inode may be NULL if the call failed because of a + * stale file handle. In this case, no attributes are + * returned. + */ + if (fhp->fh_no_wcc || !dentry || !d_really_is_positive(dentry)) + goto no_post_op_attrs; + if (fh_getattr(fhp, &stat) != nfs_ok) + goto no_post_op_attrs; + + if (xdr_stream_encode_item_present(xdr) < 0) + return false; + lease_get_mtime(d_inode(dentry), &stat.mtime); + if (!svcxdr_encode_fattr3(rqstp, xdr, fhp, &stat)) + return false; + + return true; + +no_post_op_attrs: + return xdr_stream_encode_item_absent(xdr) > 0; +} + /* * Encode post-operation attributes. * The inode may be NULL if the call failed because of a stale file @@ -835,13 +864,24 @@ nfs3svc_encode_diropres(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_encode_accessres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_accessres *resp = rqstp->rq_resp;
- *p++ = resp->status; - p = encode_post_op_attr(rqstp, p, &resp->fh); - if (resp->status == 0) - *p++ = htonl(resp->access); - return xdr_ressize_check(rqstp, p); + if (!svcxdr_encode_nfsstat3(xdr, resp->status)) + return 0; + switch (resp->status) { + case nfs_ok: + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) + return 0; + if (xdr_stream_encode_u32(xdr, resp->access) < 0) + return 0; + break; + default: + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) + return 0; + } + + return 1; }
/* READLINK */ diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h index a2442ebe5acf6..b21b76e6b9a87 100644 --- a/fs/nfsd/vfs.h +++ b/fs/nfsd/vfs.h @@ -152,7 +152,7 @@ static inline void fh_drop_write(struct svc_fh *fh) } }
-static inline __be32 fh_getattr(struct svc_fh *fh, struct kstat *stat) +static inline __be32 fh_getattr(const struct svc_fh *fh, struct kstat *stat) { struct path p = {.mnt = fh->fh_export->ex_path.mnt, .dentry = fh->fh_dentry};
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 5cf353354af1a385f29dec4609a1532d32c83a25 ]
Also, clean up: Rename the encoder function to match the name of the result structure in RFC 1813, consistent with other encoder function names in nfs3xdr.c. "diropres" is an NFSv2 thingie.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3proc.c | 2 +- fs/nfsd/nfs3xdr.c | 43 +++++++++++++++++++++++++++++++++++-------- fs/nfsd/xdr3.h | 2 +- 3 files changed, 37 insertions(+), 10 deletions(-)
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 1c3cf97ed95d2..60e8c25be7571 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -763,7 +763,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { [NFS3PROC_LOOKUP] = { .pc_func = nfsd3_proc_lookup, .pc_decode = nfs3svc_decode_diropargs, - .pc_encode = nfs3svc_encode_diropres, + .pc_encode = nfs3svc_encode_lookupres, .pc_release = nfs3svc_release_fhandle2, .pc_argsize = sizeof(struct nfsd3_diropargs), .pc_ressize = sizeof(struct nfsd3_diropres), diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 9d6c989df6d8d..2bb998b3834bf 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -104,6 +104,23 @@ svcxdr_encode_nfsstat3(struct xdr_stream *xdr, __be32 status) return true; }
+static bool +svcxdr_encode_nfs_fh3(struct xdr_stream *xdr, const struct svc_fh *fhp) +{ + u32 size = fhp->fh_handle.fh_size; + __be32 *p; + + p = xdr_reserve_space(xdr, XDR_UNIT + size); + if (!p) + return false; + *p++ = cpu_to_be32(size); + if (size) + p[XDR_QUADLEN(size) - 1] = 0; + memcpy(p, &fhp->fh_handle.fh_base, size); + + return true; +} + static __be32 * encode_fh(__be32 *p, struct svc_fh *fhp) { @@ -846,18 +863,28 @@ nfs3svc_encode_wccstat(struct svc_rqst *rqstp, __be32 *p) }
/* LOOKUP */ -int -nfs3svc_encode_diropres(struct svc_rqst *rqstp, __be32 *p) +int nfs3svc_encode_lookupres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_diropres *resp = rqstp->rq_resp;
- *p++ = resp->status; - if (resp->status == 0) { - p = encode_fh(p, &resp->fh); - p = encode_post_op_attr(rqstp, p, &resp->fh); + if (!svcxdr_encode_nfsstat3(xdr, resp->status)) + return 0; + switch (resp->status) { + case nfs_ok: + if (!svcxdr_encode_nfs_fh3(xdr, &resp->fh)) + return 0; + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) + return 0; + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->dirfh)) + return 0; + break; + default: + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->dirfh)) + return 0; } - p = encode_post_op_attr(rqstp, p, &resp->dirfh); - return xdr_ressize_check(rqstp, p); + + return 1; }
/* ACCESS */ diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 0822981c61b93..7db4ee17aa209 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -282,7 +282,7 @@ int nfs3svc_decode_readdirplusargs(struct svc_rqst *, __be32 *); int nfs3svc_decode_commitargs(struct svc_rqst *, __be32 *); int nfs3svc_encode_getattrres(struct svc_rqst *, __be32 *); int nfs3svc_encode_wccstat(struct svc_rqst *, __be32 *); -int nfs3svc_encode_diropres(struct svc_rqst *, __be32 *); +int nfs3svc_encode_lookupres(struct svc_rqst *, __be32 *); int nfs3svc_encode_accessres(struct svc_rqst *, __be32 *); int nfs3svc_encode_readlinkres(struct svc_rqst *, __be32 *); int nfs3svc_encode_readres(struct svc_rqst *, __be32 *);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 70f8e839859a994e324e1d18889f8319bbd5bff9 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3xdr.c | 68 ++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 65 insertions(+), 3 deletions(-)
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 2bb998b3834bf..29b923a497f48 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -400,6 +400,35 @@ encode_saved_post_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp) return encode_fattr3(rqstp, p, fhp, &fhp->fh_post_attr); }
+static bool +svcxdr_encode_wcc_attr(struct xdr_stream *xdr, const struct svc_fh *fhp) +{ + __be32 *p; + + p = xdr_reserve_space(xdr, XDR_UNIT * 6); + if (!p) + return false; + p = xdr_encode_hyper(p, (u64)fhp->fh_pre_size); + p = encode_nfstime3(p, &fhp->fh_pre_mtime); + encode_nfstime3(p, &fhp->fh_pre_ctime); + + return true; +} + +static bool +svcxdr_encode_pre_op_attr(struct xdr_stream *xdr, const struct svc_fh *fhp) +{ + if (!fhp->fh_pre_saved) { + if (xdr_stream_encode_item_absent(xdr) < 0) + return false; + return true; + } + + if (xdr_stream_encode_item_present(xdr) < 0) + return false; + return svcxdr_encode_wcc_attr(xdr, fhp); +} + static bool svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct xdr_stream *xdr, const struct svc_fh *fhp) @@ -460,6 +489,39 @@ nfs3svc_encode_post_op_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fh return encode_post_op_attr(rqstp, p, fhp); }
+/* + * Encode weak cache consistency data + */ +static bool +svcxdr_encode_wcc_data(struct svc_rqst *rqstp, struct xdr_stream *xdr, + const struct svc_fh *fhp) +{ + struct dentry *dentry = fhp->fh_dentry; + + if (!dentry || !d_really_is_positive(dentry) || !fhp->fh_post_saved) + goto neither; + + /* before */ + if (!svcxdr_encode_pre_op_attr(xdr, fhp)) + return false; + + /* after */ + if (xdr_stream_encode_item_present(xdr) < 0) + return false; + if (!svcxdr_encode_fattr3(rqstp, xdr, fhp, &fhp->fh_post_attr)) + return false; + + return true; + +neither: + if (xdr_stream_encode_item_absent(xdr) < 0) + return false; + if (!svcxdr_encode_post_op_attr(rqstp, xdr, fhp)) + return false; + + return true; +} + /* * Enocde weak cache consistency data */ @@ -855,11 +917,11 @@ nfs3svc_encode_getattrres(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_encode_wccstat(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_attrstat *resp = rqstp->rq_resp;
- *p++ = resp->status; - p = encode_wcc_data(rqstp, p, &resp->fh); - return xdr_ressize_check(rqstp, p); + return svcxdr_encode_nfsstat3(xdr, resp->status) && + svcxdr_encode_wcc_data(rqstp, xdr, &resp->fh); }
/* LOOKUP */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 9a9c8923b3efd593d0e6a405efef9d58c6e6804b ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3proc.c | 5 +++-- fs/nfsd/nfs3xdr.c | 34 ++++++++++++++++++---------------- fs/nfsd/xdr3.h | 1 + 3 files changed, 22 insertions(+), 18 deletions(-)
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 60e8c25be7571..e8d772f2c7769 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -126,14 +126,15 @@ nfsd3_proc_readlink(struct svc_rqst *rqstp) { struct nfsd_fhandle *argp = rqstp->rq_argp; struct nfsd3_readlinkres *resp = rqstp->rq_resp; - char *buffer = page_address(*(rqstp->rq_next_page++));
dprintk("nfsd: READLINK(3) %s\n", SVCFH_fmt(&argp->fh));
/* Read the symlink. */ fh_copy(&resp->fh, &argp->fh); resp->len = NFS3_MAXPATHLEN; - resp->status = nfsd_readlink(rqstp, &resp->fh, buffer, &resp->len); + resp->pages = rqstp->rq_next_page++; + resp->status = nfsd_readlink(rqstp, &resp->fh, + page_address(*resp->pages), &resp->len); return rpc_success; }
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 29b923a497f48..352691c3e246a 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -977,26 +977,28 @@ nfs3svc_encode_accessres(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_encode_readlinkres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_readlinkres *resp = rqstp->rq_resp; struct kvec *head = rqstp->rq_res.head;
- *p++ = resp->status; - p = encode_post_op_attr(rqstp, p, &resp->fh); - if (resp->status == 0) { - *p++ = htonl(resp->len); - xdr_ressize_check(rqstp, p); - rqstp->rq_res.page_len = resp->len; - if (resp->len & 3) { - /* need to pad the tail */ - rqstp->rq_res.tail[0].iov_base = p; - *p = 0; - rqstp->rq_res.tail[0].iov_len = 4 - (resp->len&3); - } - if (svc_encode_result_payload(rqstp, head->iov_len, resp->len)) + if (!svcxdr_encode_nfsstat3(xdr, resp->status)) + return 0; + switch (resp->status) { + case nfs_ok: + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) return 0; - return 1; - } else - return xdr_ressize_check(rqstp, p); + if (xdr_stream_encode_u32(xdr, resp->len) < 0) + return 0; + xdr_write_pages(xdr, resp->pages, 0, resp->len); + if (svc_encode_result_payload(rqstp, head->iov_len, resp->len) < 0) + return 0; + break; + default: + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) + return 0; + } + + return 1; }
/* READ */ diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 7db4ee17aa209..1d633c5d5fa28 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -137,6 +137,7 @@ struct nfsd3_readlinkres { __be32 status; struct svc_fh fh; __u32 len; + struct page **pages; };
struct nfsd3_readres {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit cc9bcdad7773c295375e66c892c7ac00524706f2 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3proc.c | 1 + fs/nfsd/nfs3xdr.c | 43 ++++++++++++++++++++------------------ fs/nfsd/xdr3.h | 1 + include/linux/sunrpc/xdr.h | 20 ++++++++++++++++++ 4 files changed, 45 insertions(+), 20 deletions(-)
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index e8d772f2c7769..201f2009b540b 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -159,6 +159,7 @@ nfsd3_proc_read(struct svc_rqst *rqstp)
v = 0; len = argp->count; + resp->pages = rqstp->rq_next_page; while (len > 0) { struct page *page = *(rqstp->rq_next_page++);
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 352691c3e246a..859cc6c51c1a5 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -1005,30 +1005,33 @@ nfs3svc_encode_readlinkres(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_encode_readres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_readres *resp = rqstp->rq_resp; struct kvec *head = rqstp->rq_res.head;
- *p++ = resp->status; - p = encode_post_op_attr(rqstp, p, &resp->fh); - if (resp->status == 0) { - *p++ = htonl(resp->count); - *p++ = htonl(resp->eof); - *p++ = htonl(resp->count); /* xdr opaque count */ - xdr_ressize_check(rqstp, p); - /* now update rqstp->rq_res to reflect data as well */ - rqstp->rq_res.page_len = resp->count; - if (resp->count & 3) { - /* need to pad the tail */ - rqstp->rq_res.tail[0].iov_base = p; - *p = 0; - rqstp->rq_res.tail[0].iov_len = 4 - (resp->count & 3); - } - if (svc_encode_result_payload(rqstp, head->iov_len, - resp->count)) + if (!svcxdr_encode_nfsstat3(xdr, resp->status)) + return 0; + switch (resp->status) { + case nfs_ok: + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) return 0; - return 1; - } else - return xdr_ressize_check(rqstp, p); + if (xdr_stream_encode_u32(xdr, resp->count) < 0) + return 0; + if (xdr_stream_encode_bool(xdr, resp->eof) < 0) + return 0; + if (xdr_stream_encode_u32(xdr, resp->count) < 0) + return 0; + xdr_write_pages(xdr, resp->pages, rqstp->rq_res.page_base, + resp->count); + if (svc_encode_result_payload(rqstp, head->iov_len, resp->count) < 0) + return 0; + break; + default: + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) + return 0; + } + + return 1; }
/* WRITE */ diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 1d633c5d5fa28..8073350418ae0 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -145,6 +145,7 @@ struct nfsd3_readres { struct svc_fh fh; unsigned long count; __u32 eof; + struct page **pages; };
struct nfsd3_writeres { diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h index eba6204330b3c..237b78146c7d6 100644 --- a/include/linux/sunrpc/xdr.h +++ b/include/linux/sunrpc/xdr.h @@ -394,6 +394,26 @@ static inline int xdr_stream_encode_item_absent(struct xdr_stream *xdr) return len; }
+/** + * xdr_stream_encode_bool - Encode a "not present" list item + * @xdr: pointer to xdr_stream + * @n: boolean value to encode + * + * Return values: + * On success, returns length in bytes of XDR buffer consumed + * %-EMSGSIZE on XDR buffer overflow + */ +static inline int xdr_stream_encode_bool(struct xdr_stream *xdr, __u32 n) +{ + const size_t len = XDR_UNIT; + __be32 *p = xdr_reserve_space(xdr, len); + + if (unlikely(!p)) + return -EMSGSIZE; + *p = n ? xdr_one : xdr_zero; + return len; +} + /** * xdr_stream_encode_u32 - Encode a 32-bit integer * @xdr: pointer to xdr_stream
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit ecb7a085ac15a8844ebf12fca6ae51ce71ac9b3b ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3xdr.c | 40 ++++++++++++++++++++++++++++++++-------- 1 file changed, 32 insertions(+), 8 deletions(-)
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 859cc6c51c1a5..0191cbfc486b8 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -131,6 +131,19 @@ encode_fh(__be32 *p, struct svc_fh *fhp) return p + XDR_QUADLEN(size); }
+static bool +svcxdr_encode_writeverf3(struct xdr_stream *xdr, const __be32 *verf) +{ + __be32 *p; + + p = xdr_reserve_space(xdr, NFS3_WRITEVERFSIZE); + if (!p) + return false; + memcpy(p, verf, NFS3_WRITEVERFSIZE); + + return true; +} + static bool svcxdr_decode_filename3(struct xdr_stream *xdr, char **name, unsigned int *len) { @@ -1038,17 +1051,28 @@ nfs3svc_encode_readres(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_encode_writeres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_writeres *resp = rqstp->rq_resp;
- *p++ = resp->status; - p = encode_wcc_data(rqstp, p, &resp->fh); - if (resp->status == 0) { - *p++ = htonl(resp->count); - *p++ = htonl(resp->committed); - *p++ = resp->verf[0]; - *p++ = resp->verf[1]; + if (!svcxdr_encode_nfsstat3(xdr, resp->status)) + return 0; + switch (resp->status) { + case nfs_ok: + if (!svcxdr_encode_wcc_data(rqstp, xdr, &resp->fh)) + return 0; + if (xdr_stream_encode_u32(xdr, resp->count) < 0) + return 0; + if (xdr_stream_encode_u32(xdr, resp->committed) < 0) + return 0; + if (!svcxdr_encode_writeverf3(xdr, resp->verf)) + return 0; + break; + default: + if (!svcxdr_encode_wcc_data(rqstp, xdr, &resp->fh)) + return 0; } - return xdr_ressize_check(rqstp, p); + + return 1; }
/* CREATE, MKDIR, SYMLINK, MKNOD */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 78315b36781d259dcbdc102ff22c3f2f25712223 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3xdr.c | 35 ++++++++++++++++++++++++++++------- 1 file changed, 28 insertions(+), 7 deletions(-)
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 0191cbfc486b8..052376a65f723 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -121,6 +121,17 @@ svcxdr_encode_nfs_fh3(struct xdr_stream *xdr, const struct svc_fh *fhp) return true; }
+static bool +svcxdr_encode_post_op_fh3(struct xdr_stream *xdr, const struct svc_fh *fhp) +{ + if (xdr_stream_encode_item_present(xdr) < 0) + return false; + if (!svcxdr_encode_nfs_fh3(xdr, fhp)) + return false; + + return true; +} + static __be32 * encode_fh(__be32 *p, struct svc_fh *fhp) { @@ -1079,16 +1090,26 @@ nfs3svc_encode_writeres(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_encode_createres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_diropres *resp = rqstp->rq_resp;
- *p++ = resp->status; - if (resp->status == 0) { - *p++ = xdr_one; - p = encode_fh(p, &resp->fh); - p = encode_post_op_attr(rqstp, p, &resp->fh); + if (!svcxdr_encode_nfsstat3(xdr, resp->status)) + return 0; + switch (resp->status) { + case nfs_ok: + if (!svcxdr_encode_post_op_fh3(xdr, &resp->fh)) + return 0; + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) + return 0; + if (!svcxdr_encode_wcc_data(rqstp, xdr, &resp->dirfh)) + return 0; + break; + default: + if (!svcxdr_encode_wcc_data(rqstp, xdr, &resp->dirfh)) + return 0; } - p = encode_wcc_data(rqstp, p, &resp->dirfh); - return xdr_ressize_check(rqstp, p); + + return 1; }
/* RENAME */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 89d79e9672dfa6d0cc416699c16f2d312da58ff2 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3xdr.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 052376a65f723..1d52a69562b82 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -1116,12 +1116,12 @@ nfs3svc_encode_createres(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_encode_renameres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_renameres *resp = rqstp->rq_resp;
- *p++ = resp->status; - p = encode_wcc_data(rqstp, p, &resp->ffh); - p = encode_wcc_data(rqstp, p, &resp->tfh); - return xdr_ressize_check(rqstp, p); + return svcxdr_encode_nfsstat3(xdr, resp->status) && + svcxdr_encode_wcc_data(rqstp, xdr, &resp->ffh) && + svcxdr_encode_wcc_data(rqstp, xdr, &resp->tfh); }
/* LINK */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 4d74380a446f75eebb2171687d9b8baf0025bdf1 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3xdr.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 1d52a69562b82..e159e45574288 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -1128,12 +1128,12 @@ nfs3svc_encode_renameres(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_encode_linkres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_linkres *resp = rqstp->rq_resp;
- *p++ = resp->status; - p = encode_post_op_attr(rqstp, p, &resp->fh); - p = encode_wcc_data(rqstp, p, &resp->tfh); - return xdr_ressize_check(rqstp, p); + return svcxdr_encode_nfsstat3(xdr, resp->status) && + svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh) && + svcxdr_encode_wcc_data(rqstp, xdr, &resp->tfh); }
/* READDIR */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 8b7044984fd6eeadf72285e3617116bd15e9e676 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3xdr.c | 58 +++++++++++++++++++++++++++++++++++------------ 1 file changed, 44 insertions(+), 14 deletions(-)
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index e159e45574288..e4a569e7216d5 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -17,6 +17,13 @@ #define NFSDDBG_FACILITY NFSDDBG_XDR
+/* + * Force construction of an empty post-op attr + */ +static const struct svc_fh nfs3svc_null_fh = { + .fh_no_wcc = true, +}; + /* * Mapping of S_IF* types to NFS file types */ @@ -1392,27 +1399,50 @@ nfs3svc_encode_entry_plus(void *cd, const char *name, return encode_entry(cd, name, namlen, offset, ino, d_type, 1); }
+static bool +svcxdr_encode_fsstat3resok(struct xdr_stream *xdr, + const struct nfsd3_fsstatres *resp) +{ + const struct kstatfs *s = &resp->stats; + u64 bs = s->f_bsize; + __be32 *p; + + p = xdr_reserve_space(xdr, XDR_UNIT * 13); + if (!p) + return false; + p = xdr_encode_hyper(p, bs * s->f_blocks); /* total bytes */ + p = xdr_encode_hyper(p, bs * s->f_bfree); /* free bytes */ + p = xdr_encode_hyper(p, bs * s->f_bavail); /* user available bytes */ + p = xdr_encode_hyper(p, s->f_files); /* total inodes */ + p = xdr_encode_hyper(p, s->f_ffree); /* free inodes */ + p = xdr_encode_hyper(p, s->f_ffree); /* user available inodes */ + *p = cpu_to_be32(resp->invarsec); /* mean unchanged time */ + + return true; +} + /* FSSTAT */ int nfs3svc_encode_fsstatres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_fsstatres *resp = rqstp->rq_resp; - struct kstatfs *s = &resp->stats; - u64 bs = s->f_bsize;
- *p++ = resp->status; - *p++ = xdr_zero; /* no post_op_attr */ - - if (resp->status == 0) { - p = xdr_encode_hyper(p, bs * s->f_blocks); /* total bytes */ - p = xdr_encode_hyper(p, bs * s->f_bfree); /* free bytes */ - p = xdr_encode_hyper(p, bs * s->f_bavail); /* user available bytes */ - p = xdr_encode_hyper(p, s->f_files); /* total inodes */ - p = xdr_encode_hyper(p, s->f_ffree); /* free inodes */ - p = xdr_encode_hyper(p, s->f_ffree); /* user available inodes */ - *p++ = htonl(resp->invarsec); /* mean unchanged time */ + if (!svcxdr_encode_nfsstat3(xdr, resp->status)) + return 0; + switch (resp->status) { + case nfs_ok: + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &nfs3svc_null_fh)) + return 0; + if (!svcxdr_encode_fsstat3resok(xdr, resp)) + return 0; + break; + default: + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &nfs3svc_null_fh)) + return 0; } - return xdr_ressize_check(rqstp, p); + + return 1; }
/* FSINFO */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 0a139d1b7f327010acc36e8162936d3108c7addb ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3xdr.c | 62 +++++++++++++++++++++++++++++++++++------------ 1 file changed, 46 insertions(+), 16 deletions(-)
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index e4a569e7216d5..514f53ad73020 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -24,6 +24,15 @@ static const struct svc_fh nfs3svc_null_fh = { .fh_no_wcc = true, };
+/* + * time_delta. {1, 0} means the server is accurate only + * to the nearest second. + */ +static const struct timespec64 nfs3svc_time_delta = { + .tv_sec = 1, + .tv_nsec = 0, +}; + /* * Mapping of S_IF* types to NFS file types */ @@ -1445,30 +1454,51 @@ nfs3svc_encode_fsstatres(struct svc_rqst *rqstp, __be32 *p) return 1; }
+static bool +svcxdr_encode_fsinfo3resok(struct xdr_stream *xdr, + const struct nfsd3_fsinfores *resp) +{ + __be32 *p; + + p = xdr_reserve_space(xdr, XDR_UNIT * 12); + if (!p) + return false; + *p++ = cpu_to_be32(resp->f_rtmax); + *p++ = cpu_to_be32(resp->f_rtpref); + *p++ = cpu_to_be32(resp->f_rtmult); + *p++ = cpu_to_be32(resp->f_wtmax); + *p++ = cpu_to_be32(resp->f_wtpref); + *p++ = cpu_to_be32(resp->f_wtmult); + *p++ = cpu_to_be32(resp->f_dtpref); + p = xdr_encode_hyper(p, resp->f_maxfilesize); + p = encode_nfstime3(p, &nfs3svc_time_delta); + *p = cpu_to_be32(resp->f_properties); + + return true; +} + /* FSINFO */ int nfs3svc_encode_fsinfores(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_fsinfores *resp = rqstp->rq_resp;
- *p++ = resp->status; - *p++ = xdr_zero; /* no post_op_attr */ - - if (resp->status == 0) { - *p++ = htonl(resp->f_rtmax); - *p++ = htonl(resp->f_rtpref); - *p++ = htonl(resp->f_rtmult); - *p++ = htonl(resp->f_wtmax); - *p++ = htonl(resp->f_wtpref); - *p++ = htonl(resp->f_wtmult); - *p++ = htonl(resp->f_dtpref); - p = xdr_encode_hyper(p, resp->f_maxfilesize); - *p++ = xdr_one; - *p++ = xdr_zero; - *p++ = htonl(resp->f_properties); + if (!svcxdr_encode_nfsstat3(xdr, resp->status)) + return 0; + switch (resp->status) { + case nfs_ok: + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &nfs3svc_null_fh)) + return 0; + if (!svcxdr_encode_fsinfo3resok(xdr, resp)) + return 0; + break; + default: + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &nfs3svc_null_fh)) + return 0; }
- return xdr_ressize_check(rqstp, p); + return 1; }
/* PATHCONF */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit ded04a587f6ceaaba3caefad4021f2212b46c9ff ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3xdr.c | 44 ++++++++++++++++++++++++++++---------- include/linux/sunrpc/xdr.h | 18 ++++++++++++++-- 2 files changed, 49 insertions(+), 13 deletions(-)
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 514f53ad73020..1467bba02e180 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -1501,25 +1501,47 @@ nfs3svc_encode_fsinfores(struct svc_rqst *rqstp, __be32 *p) return 1; }
+static bool +svcxdr_encode_pathconf3resok(struct xdr_stream *xdr, + const struct nfsd3_pathconfres *resp) +{ + __be32 *p; + + p = xdr_reserve_space(xdr, XDR_UNIT * 6); + if (!p) + return false; + *p++ = cpu_to_be32(resp->p_link_max); + *p++ = cpu_to_be32(resp->p_name_max); + p = xdr_encode_bool(p, resp->p_no_trunc); + p = xdr_encode_bool(p, resp->p_chown_restricted); + p = xdr_encode_bool(p, resp->p_case_insensitive); + xdr_encode_bool(p, resp->p_case_preserving); + + return true; +} + /* PATHCONF */ int nfs3svc_encode_pathconfres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_pathconfres *resp = rqstp->rq_resp;
- *p++ = resp->status; - *p++ = xdr_zero; /* no post_op_attr */ - - if (resp->status == 0) { - *p++ = htonl(resp->p_link_max); - *p++ = htonl(resp->p_name_max); - *p++ = htonl(resp->p_no_trunc); - *p++ = htonl(resp->p_chown_restricted); - *p++ = htonl(resp->p_case_insensitive); - *p++ = htonl(resp->p_case_preserving); + if (!svcxdr_encode_nfsstat3(xdr, resp->status)) + return 0; + switch (resp->status) { + case nfs_ok: + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &nfs3svc_null_fh)) + return 0; + if (!svcxdr_encode_pathconf3resok(xdr, resp)) + return 0; + break; + default: + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &nfs3svc_null_fh)) + return 0; }
- return xdr_ressize_check(rqstp, p); + return 1; }
/* COMMIT */ diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h index 237b78146c7d6..927f1458bcab9 100644 --- a/include/linux/sunrpc/xdr.h +++ b/include/linux/sunrpc/xdr.h @@ -395,7 +395,21 @@ static inline int xdr_stream_encode_item_absent(struct xdr_stream *xdr) }
/** - * xdr_stream_encode_bool - Encode a "not present" list item + * xdr_encode_bool - Encode a boolean item + * @p: address in a buffer into which to encode + * @n: boolean value to encode + * + * Return value: + * Address of item following the encoded boolean + */ +static inline __be32 *xdr_encode_bool(__be32 *p, u32 n) +{ + *p = n ? xdr_one : xdr_zero; + return p++; +} + +/** + * xdr_stream_encode_bool - Encode a boolean item * @xdr: pointer to xdr_stream * @n: boolean value to encode * @@ -410,7 +424,7 @@ static inline int xdr_stream_encode_bool(struct xdr_stream *xdr, __u32 n)
if (unlikely(!p)) return -EMSGSIZE; - *p = n ? xdr_one : xdr_zero; + xdr_encode_bool(p, n); return len; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 5ef2826c761079e27904c85034df34e601b82d94 ]
As an additional clean up, encode_wcc_data() is removed because it is now no longer used.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3xdr.c | 54 +++++++++++++---------------------------------- 1 file changed, 15 insertions(+), 39 deletions(-)
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 1467bba02e180..eab14b52db202 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -432,14 +432,6 @@ encode_fattr3(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp, return p; }
-static __be32 * -encode_saved_post_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp) -{ - /* Attributes to follow */ - *p++ = xdr_one; - return encode_fattr3(rqstp, p, fhp, &fhp->fh_post_attr); -} - static bool svcxdr_encode_wcc_attr(struct xdr_stream *xdr, const struct svc_fh *fhp) { @@ -562,30 +554,6 @@ svcxdr_encode_wcc_data(struct svc_rqst *rqstp, struct xdr_stream *xdr, return true; }
-/* - * Enocde weak cache consistency data - */ -static __be32 * -encode_wcc_data(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp) -{ - struct dentry *dentry = fhp->fh_dentry; - - if (dentry && d_really_is_positive(dentry) && fhp->fh_post_saved) { - if (fhp->fh_pre_saved) { - *p++ = xdr_one; - p = xdr_encode_hyper(p, (u64) fhp->fh_pre_size); - p = encode_time3(p, &fhp->fh_pre_mtime); - p = encode_time3(p, &fhp->fh_pre_ctime); - } else { - *p++ = xdr_zero; - } - return encode_saved_post_attr(rqstp, p, fhp); - } - /* no pre- or post-attrs */ - *p++ = xdr_zero; - return encode_post_op_attr(rqstp, p, fhp); -} - static bool fs_supports_change_attribute(struct super_block *sb) { return sb->s_flags & SB_I_VERSION || sb->s_export_op->fetch_iversion; @@ -1548,16 +1516,24 @@ nfs3svc_encode_pathconfres(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_encode_commitres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_commitres *resp = rqstp->rq_resp;
- *p++ = resp->status; - p = encode_wcc_data(rqstp, p, &resp->fh); - /* Write verifier */ - if (resp->status == 0) { - *p++ = resp->verf[0]; - *p++ = resp->verf[1]; + if (!svcxdr_encode_nfsstat3(xdr, resp->status)) + return 0; + switch (resp->status) { + case nfs_ok: + if (!svcxdr_encode_wcc_data(rqstp, xdr, &resp->fh)) + return 0; + if (!svcxdr_encode_writeverf3(xdr, resp->verf)) + return 0; + break; + default: + if (!svcxdr_encode_wcc_data(rqstp, xdr, &resp->fh)) + return 0; } - return xdr_ressize_check(rqstp, p); + + return 1; }
/*
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit a161e6c76aeba835e475a2f27dbbe5c37e565e94 ]
Refactor: De-duplicate identical code that handles encoding of directory offset cookies across page boundaries.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3proc.c | 24 ++---------------------- fs/nfsd/nfs3xdr.c | 36 +++++++++++++++++++++++------------- fs/nfsd/xdr3.h | 2 ++ 3 files changed, 27 insertions(+), 35 deletions(-)
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 201f2009b540b..acb0a2d37dcbb 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -500,17 +500,7 @@ nfsd3_proc_readdir(struct svc_rqst *rqstp) count += PAGE_SIZE; } resp->count = count >> 2; - if (resp->offset) { - if (unlikely(resp->offset1)) { - /* we ended up with offset on a page boundary */ - *resp->offset = htonl(offset >> 32); - *resp->offset1 = htonl(offset & 0xffffffff); - resp->offset1 = NULL; - } else { - xdr_encode_hyper(resp->offset, offset); - } - resp->offset = NULL; - } + nfs3svc_encode_cookie3(resp, offset);
return rpc_success; } @@ -565,17 +555,7 @@ nfsd3_proc_readdirplus(struct svc_rqst *rqstp) count += PAGE_SIZE; } resp->count = count >> 2; - if (resp->offset) { - if (unlikely(resp->offset1)) { - /* we ended up with offset on a page boundary */ - *resp->offset = htonl(offset >> 32); - *resp->offset1 = htonl(offset & 0xffffffff); - resp->offset1 = NULL; - } else { - xdr_encode_hyper(resp->offset, offset); - } - resp->offset = NULL; - } + nfs3svc_encode_cookie3(resp, offset);
out: return rpc_success; diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index eab14b52db202..e334a1454edbb 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -1219,6 +1219,28 @@ static __be32 *encode_entryplus_baggage(struct nfsd3_readdirres *cd, __be32 *p, return p; }
+/** + * nfs3svc_encode_cookie3 - Encode a directory offset cookie + * @resp: readdir result context + * @offset: offset cookie to encode + * + */ +void nfs3svc_encode_cookie3(struct nfsd3_readdirres *resp, u64 offset) +{ + if (!resp->offset) + return; + + if (resp->offset1) { + /* we ended up with offset on a page boundary */ + *resp->offset = cpu_to_be32(offset >> 32); + *resp->offset1 = cpu_to_be32(offset & 0xffffffff); + resp->offset1 = NULL; + } else { + xdr_encode_hyper(resp->offset, offset); + } + resp->offset = NULL; +} + /* * Encode a directory entry. This one works for both normal readdir * and readdirplus. @@ -1244,19 +1266,7 @@ encode_entry(struct readdir_cd *ccd, const char *name, int namlen, int elen; /* estimated entry length in words */ int num_entry_words = 0; /* actual number of words */
- if (cd->offset) { - u64 offset64 = offset; - - if (unlikely(cd->offset1)) { - /* we ended up with offset on a page boundary */ - *cd->offset = htonl(offset64 >> 32); - *cd->offset1 = htonl(offset64 & 0xffffffff); - cd->offset1 = NULL; - } else { - xdr_encode_hyper(cd->offset, offset64); - } - cd->offset = NULL; - } + nfs3svc_encode_cookie3(cd, offset);
/* dprintk("encode_entry(%.*s @%ld%s)\n", diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 8073350418ae0..e76e9230827e4 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -300,6 +300,8 @@ int nfs3svc_encode_commitres(struct svc_rqst *, __be32 *);
void nfs3svc_release_fhandle(struct svc_rqst *); void nfs3svc_release_fhandle2(struct svc_rqst *); + +void nfs3svc_encode_cookie3(struct nfsd3_readdirres *resp, u64 offset); int nfs3svc_encode_entry(void *, const char *name, int namlen, loff_t offset, u64 ino, unsigned int);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit a1409e2de4f11034c8eb30775cc3e37039a4ef13 ]
Clean up: Counting the bytes used by each returned directory entry seems less brittle to me than trying to measure consumed pages after the fact.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3proc.c | 31 ++----------------------------- fs/nfsd/nfs3xdr.c | 1 + 2 files changed, 3 insertions(+), 29 deletions(-)
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index acb0a2d37dcbb..7dcc7abb1f346 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -467,10 +467,7 @@ nfsd3_proc_readdir(struct svc_rqst *rqstp) { struct nfsd3_readdirargs *argp = rqstp->rq_argp; struct nfsd3_readdirres *resp = rqstp->rq_resp; - int count = 0; loff_t offset; - struct page **p; - caddr_t page_addr = NULL;
dprintk("nfsd: READDIR(3) %s %d bytes at %d\n", SVCFH_fmt(&argp->fh), @@ -481,6 +478,7 @@ nfsd3_proc_readdir(struct svc_rqst *rqstp) /* Read directory and encode entries on the fly */ fh_copy(&resp->fh, &argp->fh);
+ resp->count = 0; resp->common.err = nfs_ok; resp->rqstp = rqstp; offset = argp->cookie; @@ -488,18 +486,6 @@ nfsd3_proc_readdir(struct svc_rqst *rqstp) resp->status = nfsd_readdir(rqstp, &resp->fh, &offset, &resp->common, nfs3svc_encode_entry); memcpy(resp->verf, argp->verf, 8); - count = 0; - for (p = rqstp->rq_respages + 1; p < rqstp->rq_next_page; p++) { - page_addr = page_address(*p); - - if (((caddr_t)resp->buffer >= page_addr) && - ((caddr_t)resp->buffer < page_addr + PAGE_SIZE)) { - count += (caddr_t)resp->buffer - page_addr; - break; - } - count += PAGE_SIZE; - } - resp->count = count >> 2; nfs3svc_encode_cookie3(resp, offset);
return rpc_success; @@ -514,10 +500,7 @@ nfsd3_proc_readdirplus(struct svc_rqst *rqstp) { struct nfsd3_readdirargs *argp = rqstp->rq_argp; struct nfsd3_readdirres *resp = rqstp->rq_resp; - int count = 0; loff_t offset; - struct page **p; - caddr_t page_addr = NULL;
dprintk("nfsd: READDIR+(3) %s %d bytes at %d\n", SVCFH_fmt(&argp->fh), @@ -528,6 +511,7 @@ nfsd3_proc_readdirplus(struct svc_rqst *rqstp) /* Read directory and encode entries on the fly */ fh_copy(&resp->fh, &argp->fh);
+ resp->count = 0; resp->common.err = nfs_ok; resp->rqstp = rqstp; offset = argp->cookie; @@ -544,17 +528,6 @@ nfsd3_proc_readdirplus(struct svc_rqst *rqstp) resp->status = nfsd_readdir(rqstp, &resp->fh, &offset, &resp->common, nfs3svc_encode_entry_plus); memcpy(resp->verf, argp->verf, 8); - for (p = rqstp->rq_respages + 1; p < rqstp->rq_next_page; p++) { - page_addr = page_address(*p); - - if (((caddr_t)resp->buffer >= page_addr) && - ((caddr_t)resp->buffer < page_addr + PAGE_SIZE)) { - count += (caddr_t)resp->buffer - page_addr; - break; - } - count += PAGE_SIZE; - } - resp->count = count >> 2; nfs3svc_encode_cookie3(resp, offset);
out: diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index e334a1454edbb..523b2dca04944 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -1364,6 +1364,7 @@ encode_entry(struct readdir_cd *ccd, const char *name, int namlen, return -EINVAL; }
+ cd->count += num_entry_words; cd->buflen -= num_entry_words; cd->buffer = p; cd->common.err = nfs_ok;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit e4ccfe3014de435984939a3d84b7f241d3b57b0d ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3proc.c | 3 ++- fs/nfsd/nfs3xdr.c | 54 ++++++++++++++++++++++++++++++---------------- fs/nfsd/xdr3.h | 1 + 3 files changed, 38 insertions(+), 20 deletions(-)
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 7dcc7abb1f346..9e8481242dea8 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -452,7 +452,8 @@ static void nfsd3_init_dirlist_pages(struct svc_rqst *rqstp, * and reserve room for the NULL ptr & eof flag (-2 words) */ resp->buflen = (count >> 2) - 2;
- resp->buffer = page_address(*rqstp->rq_next_page); + resp->pages = rqstp->rq_next_page; + resp->buffer = page_address(*resp->pages); while (count > 0) { rqstp->rq_next_page++; count -= PAGE_SIZE; diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 523b2dca04944..3d076d3c5c7b8 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -158,6 +158,19 @@ encode_fh(__be32 *p, struct svc_fh *fhp) return p + XDR_QUADLEN(size); }
+static bool +svcxdr_encode_cookieverf3(struct xdr_stream *xdr, const __be32 *verf) +{ + __be32 *p; + + p = xdr_reserve_space(xdr, NFS3_COOKIEVERFSIZE); + if (!p) + return false; + memcpy(p, verf, NFS3_COOKIEVERFSIZE); + + return true; +} + static bool svcxdr_encode_writeverf3(struct xdr_stream *xdr, const __be32 *verf) { @@ -1124,27 +1137,30 @@ nfs3svc_encode_linkres(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_encode_readdirres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_readdirres *resp = rqstp->rq_resp;
- *p++ = resp->status; - p = encode_post_op_attr(rqstp, p, &resp->fh); - - if (resp->status == 0) { - /* stupid readdir cookie */ - memcpy(p, resp->verf, 8); p += 2; - xdr_ressize_check(rqstp, p); - if (rqstp->rq_res.head[0].iov_len + (2<<2) > PAGE_SIZE) - return 1; /*No room for trailer */ - rqstp->rq_res.page_len = (resp->count) << 2; - - /* add the 'tail' to the end of the 'head' page - page 0. */ - rqstp->rq_res.tail[0].iov_base = p; - *p++ = 0; /* no more entries */ - *p++ = htonl(resp->common.err == nfserr_eof); - rqstp->rq_res.tail[0].iov_len = 2<<2; - return 1; - } else - return xdr_ressize_check(rqstp, p); + if (!svcxdr_encode_nfsstat3(xdr, resp->status)) + return 0; + switch (resp->status) { + case nfs_ok: + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) + return 0; + if (!svcxdr_encode_cookieverf3(xdr, resp->verf)) + return 0; + xdr_write_pages(xdr, resp->pages, 0, resp->count << 2); + /* no more entries */ + if (xdr_stream_encode_item_absent(xdr) < 0) + return 0; + if (xdr_stream_encode_bool(xdr, resp->common.err == nfserr_eof) < 0) + return 0; + break; + default: + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) + return 0; + } + + return 1; }
static __be32 * diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index e76e9230827e4..a4cdd8ccb175a 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -176,6 +176,7 @@ struct nfsd3_readdirres { struct svc_fh scratch; int count; __be32 verf[2]; + struct page **pages;
struct readdir_cd common; __be32 * buffer;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 7f87fc2d34d475225e78b7f5c4eabb121f4282b2 ]
The benefit of the xdr_stream helpers is that they transparently handle encoding an XDR data item that crosses page boundaries. Most of the open-coded logic to do that here can be eliminated.
A sub-buffer and sub-stream are set up as a sink buffer for the directory entry encoder. As an entry is encoded, it is added to the end of the content in this buffer/stream. The total length of the directory list is tracked in the buffer's @len field.
When it comes time to encode the Reply, the sub-buffer is merged into rq_res's page array at the correct place using xdr_write_pages().
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3proc.c | 35 ++++++---- fs/nfsd/nfs3xdr.c | 166 +++++++++++++++++++++++++++++++++++++++++---- fs/nfsd/xdr3.h | 14 ++-- 3 files changed, 185 insertions(+), 30 deletions(-)
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 9e8481242dea8..781bb2b115e74 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -446,18 +446,30 @@ static void nfsd3_init_dirlist_pages(struct svc_rqst *rqstp, struct nfsd3_readdirres *resp, int count) { + struct xdr_buf *buf = &resp->dirlist; + struct xdr_stream *xdr = &resp->xdr; + count = min_t(u32, count, svc_max_payload(rqstp));
- /* Convert byte count to number of words (i.e. >> 2), - * and reserve room for the NULL ptr & eof flag (-2 words) */ - resp->buflen = (count >> 2) - 2; + memset(buf, 0, sizeof(*buf));
- resp->pages = rqstp->rq_next_page; - resp->buffer = page_address(*resp->pages); + /* Reserve room for the NULL ptr & eof flag (-2 words) */ + buf->buflen = count - XDR_UNIT * 2; + buf->pages = rqstp->rq_next_page; while (count > 0) { rqstp->rq_next_page++; count -= PAGE_SIZE; } + + /* This is xdr_init_encode(), but it assumes that + * the head kvec has already been consumed. */ + xdr_set_scratch_buffer(xdr, NULL, 0); + xdr->buf = buf; + xdr->page_ptr = buf->pages; + xdr->iov = NULL; + xdr->p = page_address(*buf->pages); + xdr->end = xdr->p + (PAGE_SIZE >> 2); + xdr->rqst = NULL; }
/* @@ -476,16 +488,13 @@ nfsd3_proc_readdir(struct svc_rqst *rqstp)
nfsd3_init_dirlist_pages(rqstp, resp, argp->count);
- /* Read directory and encode entries on the fly */ fh_copy(&resp->fh, &argp->fh); - - resp->count = 0; resp->common.err = nfs_ok; + resp->cookie_offset = 0; resp->rqstp = rqstp; offset = argp->cookie; - resp->status = nfsd_readdir(rqstp, &resp->fh, &offset, - &resp->common, nfs3svc_encode_entry); + &resp->common, nfs3svc_encode_entry3); memcpy(resp->verf, argp->verf, 8); nfs3svc_encode_cookie3(resp, offset);
@@ -509,11 +518,9 @@ nfsd3_proc_readdirplus(struct svc_rqst *rqstp)
nfsd3_init_dirlist_pages(rqstp, resp, argp->count);
- /* Read directory and encode entries on the fly */ fh_copy(&resp->fh, &argp->fh); - - resp->count = 0; resp->common.err = nfs_ok; + resp->cookie_offset = 0; resp->rqstp = rqstp; offset = argp->cookie;
@@ -527,7 +534,7 @@ nfsd3_proc_readdirplus(struct svc_rqst *rqstp) }
resp->status = nfsd_readdir(rqstp, &resp->fh, &offset, - &resp->common, nfs3svc_encode_entry_plus); + &resp->common, nfs3svc_encode_entryplus3); memcpy(resp->verf, argp->verf, 8); nfs3svc_encode_cookie3(resp, offset);
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 3d076d3c5c7b8..f38d845ac8a0f 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -1139,6 +1139,7 @@ nfs3svc_encode_readdirres(struct svc_rqst *rqstp, __be32 *p) { struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_readdirres *resp = rqstp->rq_resp; + struct xdr_buf *dirlist = &resp->dirlist;
if (!svcxdr_encode_nfsstat3(xdr, resp->status)) return 0; @@ -1148,7 +1149,7 @@ nfs3svc_encode_readdirres(struct svc_rqst *rqstp, __be32 *p) return 0; if (!svcxdr_encode_cookieverf3(xdr, resp->verf)) return 0; - xdr_write_pages(xdr, resp->pages, 0, resp->count << 2); + xdr_write_pages(xdr, dirlist->pages, 0, dirlist->len); /* no more entries */ if (xdr_stream_encode_item_absent(xdr) < 0) return 0; @@ -1240,21 +1241,18 @@ static __be32 *encode_entryplus_baggage(struct nfsd3_readdirres *cd, __be32 *p, * @resp: readdir result context * @offset: offset cookie to encode * + * The buffer space for the offset cookie has already been reserved + * by svcxdr_encode_entry3_common(). */ void nfs3svc_encode_cookie3(struct nfsd3_readdirres *resp, u64 offset) { - if (!resp->offset) - return; + __be64 cookie = cpu_to_be64(offset);
- if (resp->offset1) { - /* we ended up with offset on a page boundary */ - *resp->offset = cpu_to_be32(offset >> 32); - *resp->offset1 = cpu_to_be32(offset & 0xffffffff); - resp->offset1 = NULL; - } else { - xdr_encode_hyper(resp->offset, offset); - } - resp->offset = NULL; + if (!resp->cookie_offset) + return; + write_bytes_to_xdr_buf(&resp->dirlist, resp->cookie_offset, &cookie, + sizeof(cookie)); + resp->cookie_offset = 0; }
/* @@ -1403,6 +1401,150 @@ nfs3svc_encode_entry_plus(void *cd, const char *name, return encode_entry(cd, name, namlen, offset, ino, d_type, 1); }
+static bool +svcxdr_encode_entry3_common(struct nfsd3_readdirres *resp, const char *name, + int namlen, loff_t offset, u64 ino) +{ + struct xdr_buf *dirlist = &resp->dirlist; + struct xdr_stream *xdr = &resp->xdr; + + if (xdr_stream_encode_item_present(xdr) < 0) + return false; + /* fileid */ + if (xdr_stream_encode_u64(xdr, ino) < 0) + return false; + /* name */ + if (xdr_stream_encode_opaque(xdr, name, min(namlen, NFS3_MAXNAMLEN)) < 0) + return false; + /* cookie */ + resp->cookie_offset = dirlist->len; + if (xdr_stream_encode_u64(xdr, NFS_OFFSET_MAX) < 0) + return false; + + return true; +} + +/** + * nfs3svc_encode_entry3 - encode one NFSv3 READDIR entry + * @data: directory context + * @name: name of the object to be encoded + * @namlen: length of that name, in bytes + * @offset: the offset of the previous entry + * @ino: the fileid of this entry + * @d_type: unused + * + * Return values: + * %0: Entry was successfully encoded. + * %-EINVAL: An encoding problem occured, secondary status code in resp->common.err + * + * On exit, the following fields are updated: + * - resp->xdr + * - resp->common.err + * - resp->cookie_offset + */ +int nfs3svc_encode_entry3(void *data, const char *name, int namlen, + loff_t offset, u64 ino, unsigned int d_type) +{ + struct readdir_cd *ccd = data; + struct nfsd3_readdirres *resp = container_of(ccd, + struct nfsd3_readdirres, + common); + unsigned int starting_length = resp->dirlist.len; + + /* The offset cookie for the previous entry */ + nfs3svc_encode_cookie3(resp, offset); + + if (!svcxdr_encode_entry3_common(resp, name, namlen, offset, ino)) + goto out_toosmall; + + xdr_commit_encode(&resp->xdr); + resp->common.err = nfs_ok; + return 0; + +out_toosmall: + resp->cookie_offset = 0; + resp->common.err = nfserr_toosmall; + resp->dirlist.len = starting_length; + return -EINVAL; +} + +static bool +svcxdr_encode_entry3_plus(struct nfsd3_readdirres *resp, const char *name, + int namlen, u64 ino) +{ + struct xdr_stream *xdr = &resp->xdr; + struct svc_fh *fhp = &resp->scratch; + bool result; + + result = false; + fh_init(fhp, NFS3_FHSIZE); + if (compose_entry_fh(resp, fhp, name, namlen, ino) != nfs_ok) + goto out_noattrs; + + if (!svcxdr_encode_post_op_attr(resp->rqstp, xdr, fhp)) + goto out; + if (!svcxdr_encode_post_op_fh3(xdr, fhp)) + goto out; + result = true; + +out: + fh_put(fhp); + return result; + +out_noattrs: + if (xdr_stream_encode_item_absent(xdr) < 0) + return false; + if (xdr_stream_encode_item_absent(xdr) < 0) + return false; + return true; +} + +/** + * nfs3svc_encode_entryplus3 - encode one NFSv3 READDIRPLUS entry + * @data: directory context + * @name: name of the object to be encoded + * @namlen: length of that name, in bytes + * @offset: the offset of the previous entry + * @ino: the fileid of this entry + * @d_type: unused + * + * Return values: + * %0: Entry was successfully encoded. + * %-EINVAL: An encoding problem occured, secondary status code in resp->common.err + * + * On exit, the following fields are updated: + * - resp->xdr + * - resp->common.err + * - resp->cookie_offset + */ +int nfs3svc_encode_entryplus3(void *data, const char *name, int namlen, + loff_t offset, u64 ino, unsigned int d_type) +{ + struct readdir_cd *ccd = data; + struct nfsd3_readdirres *resp = container_of(ccd, + struct nfsd3_readdirres, + common); + unsigned int starting_length = resp->dirlist.len; + + /* The offset cookie for the previous entry */ + nfs3svc_encode_cookie3(resp, offset); + + if (!svcxdr_encode_entry3_common(resp, name, namlen, offset, ino)) + goto out_toosmall; + if (!svcxdr_encode_entry3_plus(resp, name, namlen, ino)) + goto out_toosmall; + + xdr_commit_encode(&resp->xdr); + resp->common.err = nfs_ok; + return 0; + +out_toosmall: + resp->cookie_offset = 0; + resp->common.err = nfserr_toosmall; + resp->dirlist.len = starting_length; + return -EINVAL; +} + static bool svcxdr_encode_fsstat3resok(struct xdr_stream *xdr, const struct nfsd3_fsstatres *resp) diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index a4cdd8ccb175a..81dea78b0f17e 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -169,20 +169,22 @@ struct nfsd3_linkres { };
struct nfsd3_readdirres { + /* Components of the reply */ __be32 status; struct svc_fh fh; - /* Just to save kmalloc on every readdirplus entry (svc_fh is a - * little large for the stack): */ - struct svc_fh scratch; int count; __be32 verf[2]; - struct page **pages;
+ /* Used to encode the reply's entry list */ + struct xdr_stream xdr; + struct xdr_buf dirlist; + struct svc_fh scratch; struct readdir_cd common; __be32 * buffer; int buflen; __be32 * offset; __be32 * offset1; + unsigned int cookie_offset; struct svc_rqst * rqstp;
}; @@ -309,6 +311,10 @@ int nfs3svc_encode_entry(void *, const char *name, int nfs3svc_encode_entry_plus(void *, const char *name, int namlen, loff_t offset, u64 ino, unsigned int); +int nfs3svc_encode_entry3(void *data, const char *name, int namlen, + loff_t offset, u64 ino, unsigned int d_type); +int nfs3svc_encode_entryplus3(void *data, const char *name, int namlen, + loff_t offset, u64 ino, unsigned int d_type); /* Helper functions for NFSv3 ACL code */ __be32 *nfs3svc_encode_post_op_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 1411934627f9fe31a36ac8c43179ce9b63edce5c ]
Clean up.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3xdr.c | 190 ---------------------------------------------- fs/nfsd/xdr3.h | 11 --- 2 files changed, 201 deletions(-)
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index f38d845ac8a0f..646bbfc5b7794 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -148,16 +148,6 @@ svcxdr_encode_post_op_fh3(struct xdr_stream *xdr, const struct svc_fh *fhp) return true; }
-static __be32 * -encode_fh(__be32 *p, struct svc_fh *fhp) -{ - unsigned int size = fhp->fh_handle.fh_size; - *p++ = htonl(size); - if (size) p[XDR_QUADLEN(size)-1]=0; - memcpy(p, &fhp->fh_handle.fh_base, size); - return p + XDR_QUADLEN(size); -} - static bool svcxdr_encode_cookieverf3(struct xdr_stream *xdr, const __be32 *verf) { @@ -1164,20 +1154,6 @@ nfs3svc_encode_readdirres(struct svc_rqst *rqstp, __be32 *p) return 1; }
-static __be32 * -encode_entry_baggage(struct nfsd3_readdirres *cd, __be32 *p, const char *name, - int namlen, u64 ino) -{ - *p++ = xdr_one; /* mark entry present */ - p = xdr_encode_hyper(p, ino); /* file id */ - p = xdr_encode_array(p, name, namlen);/* name length & name */ - - cd->offset = p; /* remember pointer */ - p = xdr_encode_hyper(p, NFS_OFFSET_MAX);/* offset of next entry */ - - return p; -} - static __be32 compose_entry_fh(struct nfsd3_readdirres *cd, struct svc_fh *fhp, const char *name, int namlen, u64 ino) @@ -1216,26 +1192,6 @@ compose_entry_fh(struct nfsd3_readdirres *cd, struct svc_fh *fhp, return rv; }
-static __be32 *encode_entryplus_baggage(struct nfsd3_readdirres *cd, __be32 *p, const char *name, int namlen, u64 ino) -{ - struct svc_fh *fh = &cd->scratch; - __be32 err; - - fh_init(fh, NFS3_FHSIZE); - err = compose_entry_fh(cd, fh, name, namlen, ino); - if (err) { - *p++ = 0; - *p++ = 0; - goto out; - } - p = encode_post_op_attr(cd->rqstp, p, fh); - *p++ = xdr_one; /* yes, a file handle follows */ - p = encode_fh(p, fh); -out: - fh_put(fh); - return p; -} - /** * nfs3svc_encode_cookie3 - Encode a directory offset cookie * @resp: readdir result context @@ -1255,152 +1211,6 @@ void nfs3svc_encode_cookie3(struct nfsd3_readdirres *resp, u64 offset) resp->cookie_offset = 0; }
-/* - * Encode a directory entry. This one works for both normal readdir - * and readdirplus. - * The normal readdir reply requires 2 (fileid) + 1 (stringlen) - * + string + 2 (cookie) + 1 (next) words, i.e. 6 + strlen. - * - * The readdirplus baggage is 1+21 words for post_op_attr, plus the - * file handle. - */ - -#define NFS3_ENTRY_BAGGAGE (2 + 1 + 2 + 1) -#define NFS3_ENTRYPLUS_BAGGAGE (1 + 21 + 1 + (NFS3_FHSIZE >> 2)) -static int -encode_entry(struct readdir_cd *ccd, const char *name, int namlen, - loff_t offset, u64 ino, unsigned int d_type, int plus) -{ - struct nfsd3_readdirres *cd = container_of(ccd, struct nfsd3_readdirres, - common); - __be32 *p = cd->buffer; - caddr_t curr_page_addr = NULL; - struct page ** page; - int slen; /* string (name) length */ - int elen; /* estimated entry length in words */ - int num_entry_words = 0; /* actual number of words */ - - nfs3svc_encode_cookie3(cd, offset); - - /* - dprintk("encode_entry(%.*s @%ld%s)\n", - namlen, name, (long) offset, plus? " plus" : ""); - */ - - /* truncate filename if too long */ - namlen = min(namlen, NFS3_MAXNAMLEN); - - slen = XDR_QUADLEN(namlen); - elen = slen + NFS3_ENTRY_BAGGAGE - + (plus? NFS3_ENTRYPLUS_BAGGAGE : 0); - - if (cd->buflen < elen) { - cd->common.err = nfserr_toosmall; - return -EINVAL; - } - - /* determine which page in rq_respages[] we are currently filling */ - for (page = cd->rqstp->rq_respages + 1; - page < cd->rqstp->rq_next_page; page++) { - curr_page_addr = page_address(*page); - - if (((caddr_t)cd->buffer >= curr_page_addr) && - ((caddr_t)cd->buffer < curr_page_addr + PAGE_SIZE)) - break; - } - - if ((caddr_t)(cd->buffer + elen) < (curr_page_addr + PAGE_SIZE)) { - /* encode entry in current page */ - - p = encode_entry_baggage(cd, p, name, namlen, ino); - - if (plus) - p = encode_entryplus_baggage(cd, p, name, namlen, ino); - num_entry_words = p - cd->buffer; - } else if (*(page+1) != NULL) { - /* temporarily encode entry into next page, then move back to - * current and next page in rq_respages[] */ - __be32 *p1, *tmp; - int len1, len2; - - /* grab next page for temporary storage of entry */ - p1 = tmp = page_address(*(page+1)); - - p1 = encode_entry_baggage(cd, p1, name, namlen, ino); - - if (plus) - p1 = encode_entryplus_baggage(cd, p1, name, namlen, ino); - - /* determine entry word length and lengths to go in pages */ - num_entry_words = p1 - tmp; - len1 = curr_page_addr + PAGE_SIZE - (caddr_t)cd->buffer; - if ((num_entry_words << 2) < len1) { - /* the actual number of words in the entry is less - * than elen and can still fit in the current page - */ - memmove(p, tmp, num_entry_words << 2); - p += num_entry_words; - - /* update offset */ - cd->offset = cd->buffer + (cd->offset - tmp); - } else { - unsigned int offset_r = (cd->offset - tmp) << 2; - - /* update pointer to offset location. - * This is a 64bit quantity, so we need to - * deal with 3 cases: - * - entirely in first page - * - entirely in second page - * - 4 bytes in each page - */ - if (offset_r + 8 <= len1) { - cd->offset = p + (cd->offset - tmp); - } else if (offset_r >= len1) { - cd->offset -= len1 >> 2; - } else { - /* sitting on the fence */ - BUG_ON(offset_r != len1 - 4); - cd->offset = p + (cd->offset - tmp); - cd->offset1 = tmp; - } - - len2 = (num_entry_words << 2) - len1; - - /* move from temp page to current and next pages */ - memmove(p, tmp, len1); - memmove(tmp, (caddr_t)tmp+len1, len2); - - p = tmp + (len2 >> 2); - } - } - else { - cd->common.err = nfserr_toosmall; - return -EINVAL; - } - - cd->count += num_entry_words; - cd->buflen -= num_entry_words; - cd->buffer = p; - cd->common.err = nfs_ok; - return 0; - -} - -int -nfs3svc_encode_entry(void *cd, const char *name, - int namlen, loff_t offset, u64 ino, unsigned int d_type) -{ - return encode_entry(cd, name, namlen, offset, ino, d_type, 0); -} - -int -nfs3svc_encode_entry_plus(void *cd, const char *name, - int namlen, loff_t offset, u64 ino, - unsigned int d_type) -{ - return encode_entry(cd, name, namlen, offset, ino, d_type, 1); -} - static bool svcxdr_encode_entry3_common(struct nfsd3_readdirres *resp, const char *name, int namlen, loff_t offset, u64 ino) diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 81dea78b0f17e..b851458373db6 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -172,7 +172,6 @@ struct nfsd3_readdirres { /* Components of the reply */ __be32 status; struct svc_fh fh; - int count; __be32 verf[2];
/* Used to encode the reply's entry list */ @@ -180,10 +179,6 @@ struct nfsd3_readdirres { struct xdr_buf dirlist; struct svc_fh scratch; struct readdir_cd common; - __be32 * buffer; - int buflen; - __be32 * offset; - __be32 * offset1; unsigned int cookie_offset; struct svc_rqst * rqstp;
@@ -305,12 +300,6 @@ void nfs3svc_release_fhandle(struct svc_rqst *); void nfs3svc_release_fhandle2(struct svc_rqst *);
void nfs3svc_encode_cookie3(struct nfsd3_readdirres *resp, u64 offset); -int nfs3svc_encode_entry(void *, const char *name, - int namlen, loff_t offset, u64 ino, - unsigned int); -int nfs3svc_encode_entry_plus(void *, const char *name, - int namlen, loff_t offset, u64 ino, - unsigned int); int nfs3svc_encode_entry3(void *data, const char *name, int namlen, loff_t offset, u64 ino, unsigned int d_type); int nfs3svc_encode_entryplus3(void *data, const char *name, int namlen,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 76ed0dd96eeb2771b21bf5dcbd88326ef89ee0ed ]
During NFSv2 and NFSv3 READDIR/PLUS operations, NFSD advances rq_next_page to the full size of the client-requested buffer, then releases all those pages at the end of the request. The next request to use that nfsd thread has to refill the pages.
NFSD does this even when the dirlist in the reply is small. With NFSv3 clients that send READDIR operations with large buffer sizes, that can be 256 put_page/alloc_page pairs per READDIR request, even though those pages often remain unused.
We can save some work by not releasing dirlist buffer pages that were not used to form the READDIR Reply. I've left the NFSv2 code alone since there are never more than three pages involved in an NFSv2 READDIR Reply.
Eventually we should nail down why these pages need to be released at all in order to avoid allocating and releasing pages unnecessarily.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3proc.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 781bb2b115e74..be1ed33e424e0 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -498,6 +498,9 @@ nfsd3_proc_readdir(struct svc_rqst *rqstp) memcpy(resp->verf, argp->verf, 8); nfs3svc_encode_cookie3(resp, offset);
+ /* Recycle only pages that were part of the reply */ + rqstp->rq_next_page = resp->xdr.page_ptr + 1; + return rpc_success; }
@@ -538,6 +541,9 @@ nfsd3_proc_readdirplus(struct svc_rqst *rqstp) memcpy(resp->verf, argp->verf, 8); nfs3svc_encode_cookie3(resp, offset);
+ /* Recycle only pages that were part of the reply */ + rqstp->rq_next_page = resp->xdr.page_ptr + 1; + out: return rpc_success; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit a887eaed2a964754334cd3f8c5fe87e413e68fef ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsproc.c | 10 +++++----- fs/nfsd/nfsxdr.c | 19 ++++++++++++++++--- fs/nfsd/xdr.h | 2 +- 3 files changed, 22 insertions(+), 9 deletions(-)
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index a628ea4d66ead..00eb722129ab3 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -736,7 +736,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { [NFSPROC_REMOVE] = { .pc_func = nfsd_proc_remove, .pc_decode = nfssvc_decode_diropargs, - .pc_encode = nfssvc_encode_stat, + .pc_encode = nfssvc_encode_statres, .pc_argsize = sizeof(struct nfsd_diropargs), .pc_ressize = sizeof(struct nfsd_stat), .pc_cachetype = RC_REPLSTAT, @@ -746,7 +746,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { [NFSPROC_RENAME] = { .pc_func = nfsd_proc_rename, .pc_decode = nfssvc_decode_renameargs, - .pc_encode = nfssvc_encode_stat, + .pc_encode = nfssvc_encode_statres, .pc_argsize = sizeof(struct nfsd_renameargs), .pc_ressize = sizeof(struct nfsd_stat), .pc_cachetype = RC_REPLSTAT, @@ -756,7 +756,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { [NFSPROC_LINK] = { .pc_func = nfsd_proc_link, .pc_decode = nfssvc_decode_linkargs, - .pc_encode = nfssvc_encode_stat, + .pc_encode = nfssvc_encode_statres, .pc_argsize = sizeof(struct nfsd_linkargs), .pc_ressize = sizeof(struct nfsd_stat), .pc_cachetype = RC_REPLSTAT, @@ -766,7 +766,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { [NFSPROC_SYMLINK] = { .pc_func = nfsd_proc_symlink, .pc_decode = nfssvc_decode_symlinkargs, - .pc_encode = nfssvc_encode_stat, + .pc_encode = nfssvc_encode_statres, .pc_argsize = sizeof(struct nfsd_symlinkargs), .pc_ressize = sizeof(struct nfsd_stat), .pc_cachetype = RC_REPLSTAT, @@ -787,7 +787,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { [NFSPROC_RMDIR] = { .pc_func = nfsd_proc_rmdir, .pc_decode = nfssvc_decode_diropargs, - .pc_encode = nfssvc_encode_stat, + .pc_encode = nfssvc_encode_statres, .pc_argsize = sizeof(struct nfsd_diropargs), .pc_ressize = sizeof(struct nfsd_stat), .pc_cachetype = RC_REPLSTAT, diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 5d79ef6a0c7fc..10cd120044b30 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -26,6 +26,19 @@ static u32 nfs_ftypes[] = { * Basic NFSv2 data types (RFC 1094 Section 2.3) */
+static bool +svcxdr_encode_stat(struct xdr_stream *xdr, __be32 status) +{ + __be32 *p; + + p = xdr_reserve_space(xdr, sizeof(status)); + if (!p) + return false; + *p = status; + + return true; +} + /** * svcxdr_decode_fhandle - Decode an NFSv2 file handle * @xdr: XDR stream positioned at an encoded NFSv2 FH @@ -390,12 +403,12 @@ nfssvc_decode_readdirargs(struct svc_rqst *rqstp, __be32 *p) */
int -nfssvc_encode_stat(struct svc_rqst *rqstp, __be32 *p) +nfssvc_encode_statres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd_stat *resp = rqstp->rq_resp;
- *p++ = resp->status; - return xdr_ressize_check(rqstp, p); + return svcxdr_encode_stat(xdr, resp->status); }
int diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index b92f1acec9e77..f040123373bf5 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -147,7 +147,7 @@ int nfssvc_decode_renameargs(struct svc_rqst *, __be32 *); int nfssvc_decode_linkargs(struct svc_rqst *, __be32 *); int nfssvc_decode_symlinkargs(struct svc_rqst *, __be32 *); int nfssvc_decode_readdirargs(struct svc_rqst *, __be32 *); -int nfssvc_encode_stat(struct svc_rqst *, __be32 *); +int nfssvc_encode_statres(struct svc_rqst *, __be32 *); int nfssvc_encode_attrstat(struct svc_rqst *, __be32 *); int nfssvc_encode_diropres(struct svc_rqst *, __be32 *); int nfssvc_encode_readlinkres(struct svc_rqst *, __be32 *);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 92b54a4fa4224e6116eb0d87a39dd05af23fcdfa ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsproc.c | 6 ++-- fs/nfsd/nfsxdr.c | 90 ++++++++++++++++++++++++++++++++++++++++++----- fs/nfsd/xdr.h | 2 +- 3 files changed, 86 insertions(+), 12 deletions(-)
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 00eb722129ab3..2a30b27c9d9be 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -640,7 +640,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { [NFSPROC_GETATTR] = { .pc_func = nfsd_proc_getattr, .pc_decode = nfssvc_decode_fhandleargs, - .pc_encode = nfssvc_encode_attrstat, + .pc_encode = nfssvc_encode_attrstatres, .pc_release = nfssvc_release_attrstat, .pc_argsize = sizeof(struct nfsd_fhandle), .pc_ressize = sizeof(struct nfsd_attrstat), @@ -651,7 +651,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { [NFSPROC_SETATTR] = { .pc_func = nfsd_proc_setattr, .pc_decode = nfssvc_decode_sattrargs, - .pc_encode = nfssvc_encode_attrstat, + .pc_encode = nfssvc_encode_attrstatres, .pc_release = nfssvc_release_attrstat, .pc_argsize = sizeof(struct nfsd_sattrargs), .pc_ressize = sizeof(struct nfsd_attrstat), @@ -714,7 +714,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { [NFSPROC_WRITE] = { .pc_func = nfsd_proc_write, .pc_decode = nfssvc_decode_writeargs, - .pc_encode = nfssvc_encode_attrstat, + .pc_encode = nfssvc_encode_attrstatres, .pc_release = nfssvc_release_attrstat, .pc_argsize = sizeof(struct nfsd_writeargs), .pc_ressize = sizeof(struct nfsd_attrstat), diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 10cd120044b30..65c8f8f314443 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -14,7 +14,7 @@ /* * Mapping of S_IF* types to NFS file types */ -static u32 nfs_ftypes[] = { +static const u32 nfs_ftypes[] = { NFNON, NFCHR, NFCHR, NFBAD, NFDIR, NFBAD, NFBLK, NFBAD, NFREG, NFBAD, NFLNK, NFBAD, @@ -70,6 +70,17 @@ encode_fh(__be32 *p, struct svc_fh *fhp) return p + (NFS_FHSIZE>> 2); }
+static __be32 * +encode_timeval(__be32 *p, const struct timespec64 *time) +{ + *p++ = cpu_to_be32((u32)time->tv_sec); + if (time->tv_nsec) + *p++ = cpu_to_be32(time->tv_nsec / NSEC_PER_USEC); + else + *p++ = xdr_zero; + return p; +} + static bool svcxdr_decode_filename(struct xdr_stream *xdr, char **name, unsigned int *len) { @@ -233,6 +244,64 @@ encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp, return p; }
+static int +svcxdr_encode_fattr(struct svc_rqst *rqstp, struct xdr_stream *xdr, + const struct svc_fh *fhp, const struct kstat *stat) +{ + struct user_namespace *userns = nfsd_user_namespace(rqstp); + struct dentry *dentry = fhp->fh_dentry; + int type = stat->mode & S_IFMT; + struct timespec64 time; + __be32 *p; + u32 fsid; + + p = xdr_reserve_space(xdr, XDR_UNIT * 17); + if (!p) + return 0; + + *p++ = cpu_to_be32(nfs_ftypes[type >> 12]); + *p++ = cpu_to_be32((u32)stat->mode); + *p++ = cpu_to_be32((u32)stat->nlink); + *p++ = cpu_to_be32((u32)from_kuid_munged(userns, stat->uid)); + *p++ = cpu_to_be32((u32)from_kgid_munged(userns, stat->gid)); + + if (S_ISLNK(type) && stat->size > NFS_MAXPATHLEN) + *p++ = cpu_to_be32(NFS_MAXPATHLEN); + else + *p++ = cpu_to_be32((u32) stat->size); + *p++ = cpu_to_be32((u32) stat->blksize); + if (S_ISCHR(type) || S_ISBLK(type)) + *p++ = cpu_to_be32(new_encode_dev(stat->rdev)); + else + *p++ = cpu_to_be32(0xffffffff); + *p++ = cpu_to_be32((u32)stat->blocks); + + switch (fsid_source(fhp)) { + case FSIDSOURCE_FSID: + fsid = (u32)fhp->fh_export->ex_fsid; + break; + case FSIDSOURCE_UUID: + fsid = ((u32 *)fhp->fh_export->ex_uuid)[0]; + fsid ^= ((u32 *)fhp->fh_export->ex_uuid)[1]; + fsid ^= ((u32 *)fhp->fh_export->ex_uuid)[2]; + fsid ^= ((u32 *)fhp->fh_export->ex_uuid)[3]; + break; + default: + fsid = new_encode_dev(stat->dev); + break; + } + *p++ = cpu_to_be32(fsid); + + *p++ = cpu_to_be32((u32)stat->ino); + p = encode_timeval(p, &stat->atime); + time = stat->mtime; + lease_get_mtime(d_inode(dentry), &time); + p = encode_timeval(p, &time); + encode_timeval(p, &stat->ctime); + + return 1; +} + /* Helper function for NFSv2 ACL code */ __be32 *nfs2svc_encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp, struct kstat *stat) { @@ -412,16 +481,21 @@ nfssvc_encode_statres(struct svc_rqst *rqstp, __be32 *p) }
int -nfssvc_encode_attrstat(struct svc_rqst *rqstp, __be32 *p) +nfssvc_encode_attrstatres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd_attrstat *resp = rqstp->rq_resp;
- *p++ = resp->status; - if (resp->status != nfs_ok) - goto out; - p = encode_fattr(rqstp, p, &resp->fh, &resp->stat); -out: - return xdr_ressize_check(rqstp, p); + if (!svcxdr_encode_stat(xdr, resp->status)) + return 0; + switch (resp->status) { + case nfs_ok: + if (!svcxdr_encode_fattr(rqstp, xdr, &resp->fh, &resp->stat)) + return 0; + break; + } + + return 1; }
int diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index f040123373bf5..45aa6b75a5f87 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -148,7 +148,7 @@ int nfssvc_decode_linkargs(struct svc_rqst *, __be32 *); int nfssvc_decode_symlinkargs(struct svc_rqst *, __be32 *); int nfssvc_decode_readdirargs(struct svc_rqst *, __be32 *); int nfssvc_encode_statres(struct svc_rqst *, __be32 *); -int nfssvc_encode_attrstat(struct svc_rqst *, __be32 *); +int nfssvc_encode_attrstatres(struct svc_rqst *, __be32 *); int nfssvc_encode_diropres(struct svc_rqst *, __be32 *); int nfssvc_encode_readlinkres(struct svc_rqst *, __be32 *); int nfssvc_encode_readres(struct svc_rqst *, __be32 *);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit e3b4ef221ac57c08341c97a10c8a81c041f76716 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsxdr.c | 38 +++++++++++++++++++++++++------------- 1 file changed, 25 insertions(+), 13 deletions(-)
diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 65c8f8f314443..989144b0d5be2 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -63,11 +63,17 @@ svcxdr_decode_fhandle(struct xdr_stream *xdr, struct svc_fh *fhp) return true; }
-static __be32 * -encode_fh(__be32 *p, struct svc_fh *fhp) +static bool +svcxdr_encode_fhandle(struct xdr_stream *xdr, const struct svc_fh *fhp) { + __be32 *p; + + p = xdr_reserve_space(xdr, NFS_FHSIZE); + if (!p) + return false; memcpy(p, &fhp->fh_handle.fh_base, NFS_FHSIZE); - return p + (NFS_FHSIZE>> 2); + + return true; }
static __be32 * @@ -244,7 +250,7 @@ encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp, return p; }
-static int +static bool svcxdr_encode_fattr(struct svc_rqst *rqstp, struct xdr_stream *xdr, const struct svc_fh *fhp, const struct kstat *stat) { @@ -257,7 +263,7 @@ svcxdr_encode_fattr(struct svc_rqst *rqstp, struct xdr_stream *xdr,
p = xdr_reserve_space(xdr, XDR_UNIT * 17); if (!p) - return 0; + return false;
*p++ = cpu_to_be32(nfs_ftypes[type >> 12]); *p++ = cpu_to_be32((u32)stat->mode); @@ -299,7 +305,7 @@ svcxdr_encode_fattr(struct svc_rqst *rqstp, struct xdr_stream *xdr, p = encode_timeval(p, &time); encode_timeval(p, &stat->ctime);
- return 1; + return true; }
/* Helper function for NFSv2 ACL code */ @@ -501,15 +507,21 @@ nfssvc_encode_attrstatres(struct svc_rqst *rqstp, __be32 *p) int nfssvc_encode_diropres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd_diropres *resp = rqstp->rq_resp;
- *p++ = resp->status; - if (resp->status != nfs_ok) - goto out; - p = encode_fh(p, &resp->fh); - p = encode_fattr(rqstp, p, &resp->fh, &resp->stat); -out: - return xdr_ressize_check(rqstp, p); + if (!svcxdr_encode_stat(xdr, resp->status)) + return 0; + switch (resp->status) { + case nfs_ok: + if (!svcxdr_encode_fhandle(xdr, &resp->fh)) + return 0; + if (!svcxdr_encode_fattr(rqstp, xdr, &resp->fh, &resp->stat)) + return 0; + break; + } + + return 1; }
int
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit d9014b0f8fae11f22a3d356553844e06ddcdce4a ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsproc.c | 5 +++-- fs/nfsd/nfsxdr.c | 26 ++++++++++++-------------- fs/nfsd/xdr.h | 1 + 3 files changed, 16 insertions(+), 16 deletions(-)
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 2a30b27c9d9be..fa9a18897bc29 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -151,13 +151,14 @@ nfsd_proc_readlink(struct svc_rqst *rqstp) { struct nfsd_fhandle *argp = rqstp->rq_argp; struct nfsd_readlinkres *resp = rqstp->rq_resp; - char *buffer = page_address(*(rqstp->rq_next_page++));
dprintk("nfsd: READLINK %s\n", SVCFH_fmt(&argp->fh));
/* Read the symlink. */ resp->len = NFS_MAXPATHLEN; - resp->status = nfsd_readlink(rqstp, &argp->fh, buffer, &resp->len); + resp->page = *(rqstp->rq_next_page++); + resp->status = nfsd_readlink(rqstp, &argp->fh, + page_address(resp->page), &resp->len);
fh_put(&argp->fh); return rpc_success; diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 989144b0d5be2..74d9d11949c6c 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -527,24 +527,22 @@ nfssvc_encode_diropres(struct svc_rqst *rqstp, __be32 *p) int nfssvc_encode_readlinkres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd_readlinkres *resp = rqstp->rq_resp; struct kvec *head = rqstp->rq_res.head;
- *p++ = resp->status; - if (resp->status != nfs_ok) - return xdr_ressize_check(rqstp, p); - - *p++ = htonl(resp->len); - xdr_ressize_check(rqstp, p); - rqstp->rq_res.page_len = resp->len; - if (resp->len & 3) { - /* need to pad the tail */ - rqstp->rq_res.tail[0].iov_base = p; - *p = 0; - rqstp->rq_res.tail[0].iov_len = 4 - (resp->len&3); - } - if (svc_encode_result_payload(rqstp, head->iov_len, resp->len)) + if (!svcxdr_encode_stat(xdr, resp->status)) return 0; + switch (resp->status) { + case nfs_ok: + if (xdr_stream_encode_u32(xdr, resp->len) < 0) + return 0; + xdr_write_pages(xdr, &resp->page, 0, resp->len); + if (svc_encode_result_payload(rqstp, head->iov_len, resp->len) < 0) + return 0; + break; + } + return 1; }
diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index 45aa6b75a5f87..b17fd72c69a6a 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -94,6 +94,7 @@ struct nfsd_diropres { struct nfsd_readlinkres { __be32 status; int len; + struct page *page; };
struct nfsd_readres {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit a6f8d9dc9e44b51303d9abde4643460137d19b28 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsproc.c | 1 + fs/nfsd/nfsxdr.c | 32 +++++++++++++++----------------- fs/nfsd/xdr.h | 1 + 3 files changed, 17 insertions(+), 17 deletions(-)
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index fa9a18897bc29..1fd91e00b97ba 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -185,6 +185,7 @@ nfsd_proc_read(struct svc_rqst *rqstp)
v = 0; len = argp->count; + resp->pages = rqstp->rq_next_page; while (len > 0) { struct page *page = *(rqstp->rq_next_page++);
diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 74d9d11949c6c..d6d7d07dbb1b2 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -549,27 +549,25 @@ nfssvc_encode_readlinkres(struct svc_rqst *rqstp, __be32 *p) int nfssvc_encode_readres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd_readres *resp = rqstp->rq_resp; struct kvec *head = rqstp->rq_res.head;
- *p++ = resp->status; - if (resp->status != nfs_ok) - return xdr_ressize_check(rqstp, p); - - p = encode_fattr(rqstp, p, &resp->fh, &resp->stat); - *p++ = htonl(resp->count); - xdr_ressize_check(rqstp, p); - - /* now update rqstp->rq_res to reflect data as well */ - rqstp->rq_res.page_len = resp->count; - if (resp->count & 3) { - /* need to pad the tail */ - rqstp->rq_res.tail[0].iov_base = p; - *p = 0; - rqstp->rq_res.tail[0].iov_len = 4 - (resp->count&3); - } - if (svc_encode_result_payload(rqstp, head->iov_len, resp->count)) + if (!svcxdr_encode_stat(xdr, resp->status)) return 0; + switch (resp->status) { + case nfs_ok: + if (!svcxdr_encode_fattr(rqstp, xdr, &resp->fh, &resp->stat)) + return 0; + if (xdr_stream_encode_u32(xdr, resp->count) < 0) + return 0; + xdr_write_pages(xdr, resp->pages, rqstp->rq_res.page_base, + resp->count); + if (svc_encode_result_payload(rqstp, head->iov_len, resp->count) < 0) + return 0; + break; + } + return 1; }
diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index b17fd72c69a6a..337c581e15b4c 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -102,6 +102,7 @@ struct nfsd_readres { struct svc_fh fh; unsigned long count; struct kstat stat; + struct page **pages; };
struct nfsd_readdirres {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit bf15229f2ced4f14946eef958336f764e30f8efb ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsxdr.c | 25 ++++++++++++++++--------- 1 file changed, 16 insertions(+), 9 deletions(-)
diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index d6d7d07dbb1b2..39d296aecd3e7 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -592,19 +592,26 @@ nfssvc_encode_readdirres(struct svc_rqst *rqstp, __be32 *p) int nfssvc_encode_statfsres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd_statfsres *resp = rqstp->rq_resp; struct kstatfs *stat = &resp->stats;
- *p++ = resp->status; - if (resp->status != nfs_ok) - return xdr_ressize_check(rqstp, p); + if (!svcxdr_encode_stat(xdr, resp->status)) + return 0; + switch (resp->status) { + case nfs_ok: + p = xdr_reserve_space(xdr, XDR_UNIT * 5); + if (!p) + return 0; + *p++ = cpu_to_be32(NFSSVC_MAXBLKSIZE_V2); + *p++ = cpu_to_be32(stat->f_bsize); + *p++ = cpu_to_be32(stat->f_blocks); + *p++ = cpu_to_be32(stat->f_bfree); + *p = cpu_to_be32(stat->f_bavail); + break; + }
- *p++ = htonl(NFSSVC_MAXBLKSIZE_V2); /* max transfer size */ - *p++ = htonl(stat->f_bsize); - *p++ = htonl(stat->f_blocks); - *p++ = htonl(stat->f_bfree); - *p++ = htonl(stat->f_bavail); - return xdr_ressize_check(rqstp, p); + return 1; }
int
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit d52532002ffa217ad3fa4c3ba86c95203d21dd21 ]
Refactor: Add helper function similar to nfs3svc_encode_cookie3().
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsproc.c | 3 +-- fs/nfsd/nfsxdr.c | 18 ++++++++++++++++-- fs/nfsd/xdr.h | 1 + 3 files changed, 18 insertions(+), 4 deletions(-)
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 1fd91e00b97ba..2d3d7cdffd52f 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -595,8 +595,7 @@ nfsd_proc_readdir(struct svc_rqst *rqstp) &resp->common, nfssvc_encode_entry);
resp->count = resp->buffer - buffer; - if (resp->offset) - *resp->offset = htonl(offset); + nfssvc_encode_nfscookie(resp, offset);
fh_put(&argp->fh); return rpc_success; diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 39d296aecd3e7..a87b21cfe0d03 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -614,6 +614,21 @@ nfssvc_encode_statfsres(struct svc_rqst *rqstp, __be32 *p) return 1; }
+/** + * nfssvc_encode_nfscookie - Encode a directory offset cookie + * @resp: readdir result context + * @offset: offset cookie to encode + * + */ +void nfssvc_encode_nfscookie(struct nfsd_readdirres *resp, u32 offset) +{ + if (!resp->offset) + return; + + *resp->offset = cpu_to_be32(offset); + resp->offset = NULL; +} + int nfssvc_encode_entry(void *ccdv, const char *name, int namlen, loff_t offset, u64 ino, unsigned int d_type) @@ -632,8 +647,7 @@ nfssvc_encode_entry(void *ccdv, const char *name, cd->common.err = nfserr_fbig; return -EINVAL; } - if (cd->offset) - *cd->offset = htonl(offset); + nfssvc_encode_nfscookie(cd, offset);
/* truncate filename */ namlen = min(namlen, NFS2_MAXNAMLEN); diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index 337c581e15b4c..75b3b31445340 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -157,6 +157,7 @@ int nfssvc_encode_readres(struct svc_rqst *, __be32 *); int nfssvc_encode_statfsres(struct svc_rqst *, __be32 *); int nfssvc_encode_readdirres(struct svc_rqst *, __be32 *);
+void nfssvc_encode_nfscookie(struct nfsd_readdirres *resp, u32 offset); int nfssvc_encode_entry(void *, const char *name, int namlen, loff_t offset, u64 ino, unsigned int);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 8141d6a2bb6c655ff0c0b81ced80d9025f03e926 ]
Clean up: Counting the bytes used by each returned directory entry seems less brittle to me than trying to measure consumed pages after the fact.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsproc.c | 4 ---- fs/nfsd/nfsxdr.c | 3 ++- 2 files changed, 2 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 2d3d7cdffd52f..23b2a900cb79d 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -578,14 +578,12 @@ nfsd_proc_readdir(struct svc_rqst *rqstp) struct nfsd_readdirargs *argp = rqstp->rq_argp; struct nfsd_readdirres *resp = rqstp->rq_resp; loff_t offset; - __be32 *buffer;
dprintk("nfsd: READDIR %s %d bytes at %d\n", SVCFH_fmt(&argp->fh), argp->count, argp->cookie);
nfsd_init_dirlist_pages(rqstp, resp, argp->count); - buffer = resp->buffer;
resp->offset = NULL; resp->common.err = nfs_ok; @@ -593,8 +591,6 @@ nfsd_proc_readdir(struct svc_rqst *rqstp) offset = argp->cookie; resp->status = nfsd_readdir(rqstp, &argp->fh, &offset, &resp->common, nfssvc_encode_entry); - - resp->count = resp->buffer - buffer; nfssvc_encode_nfscookie(resp, offset);
fh_put(&argp->fh); diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index a87b21cfe0d03..8ae23ed6dc5db 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -584,7 +584,7 @@ nfssvc_encode_readdirres(struct svc_rqst *rqstp, __be32 *p) p = resp->buffer; *p++ = 0; /* no more entries */ *p++ = htonl((resp->common.err == nfserr_eof)); - rqstp->rq_res.page_len = (((unsigned long)p-1) & ~PAGE_MASK)+1; + rqstp->rq_res.page_len = resp->count << 2;
return 1; } @@ -667,6 +667,7 @@ nfssvc_encode_entry(void *ccdv, const char *name, cd->offset = p; /* remember pointer */ *p++ = htonl(~0U); /* offset of next entry */
+ cd->count += p - cd->buffer; cd->buflen = buflen; cd->buffer = p; cd->common.err = nfs_ok;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 94c8f8c682a6497af7ea71351b18f637c6337d42 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsxdr.c | 22 +++++++++++++--------- fs/nfsd/xdr.h | 1 + 2 files changed, 14 insertions(+), 9 deletions(-)
diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 8ae23ed6dc5db..9522e5c5f49db 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -574,17 +574,21 @@ nfssvc_encode_readres(struct svc_rqst *rqstp, __be32 *p) int nfssvc_encode_readdirres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd_readdirres *resp = rqstp->rq_resp;
- *p++ = resp->status; - if (resp->status != nfs_ok) - return xdr_ressize_check(rqstp, p); - - xdr_ressize_check(rqstp, p); - p = resp->buffer; - *p++ = 0; /* no more entries */ - *p++ = htonl((resp->common.err == nfserr_eof)); - rqstp->rq_res.page_len = resp->count << 2; + if (!svcxdr_encode_stat(xdr, resp->status)) + return 0; + switch (resp->status) { + case nfs_ok: + xdr_write_pages(xdr, &resp->page, 0, resp->count << 2); + /* no more entries */ + if (xdr_stream_encode_item_absent(xdr) < 0) + return 0; + if (xdr_stream_encode_bool(xdr, resp->common.err == nfserr_eof) < 0) + return 0; + break; + }
return 1; } diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index 75b3b31445340..d7beef2c2ed5b 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -114,6 +114,7 @@ struct nfsd_readdirres { __be32 * buffer; int buflen; __be32 * offset; + struct page *page; };
struct nfsd_statfsres {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit f5dcccd647da513a89f3b6ca392b0c1eb050b9fc ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsproc.c | 26 +++++++++++---- fs/nfsd/nfsxdr.c | 81 ++++++++++++++++++++++++++++++++++++++++++++--- fs/nfsd/xdr.h | 7 ++++ 3 files changed, 103 insertions(+), 11 deletions(-)
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 23b2a900cb79d..135c0bc468bce 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -559,14 +559,27 @@ static void nfsd_init_dirlist_pages(struct svc_rqst *rqstp, struct nfsd_readdirres *resp, int count) { + struct xdr_buf *buf = &resp->dirlist; + struct xdr_stream *xdr = &resp->xdr; + count = min_t(u32, count, PAGE_SIZE);
- /* Convert byte count to number of words (i.e. >> 2), - * and reserve room for the NULL ptr & eof flag (-2 words) */ - resp->buflen = (count >> 2) - 2; + memset(buf, 0, sizeof(*buf));
- resp->buffer = page_address(*rqstp->rq_next_page); + /* Reserve room for the NULL ptr & eof flag (-2 words) */ + buf->buflen = count - sizeof(__be32) * 2; + buf->pages = rqstp->rq_next_page; rqstp->rq_next_page++; + + /* This is xdr_init_encode(), but it assumes that + * the head kvec has already been consumed. */ + xdr_set_scratch_buffer(xdr, NULL, 0); + xdr->buf = buf; + xdr->page_ptr = buf->pages; + xdr->iov = NULL; + xdr->p = page_address(*buf->pages); + xdr->end = xdr->p + (PAGE_SIZE >> 2); + xdr->rqst = NULL; }
/* @@ -585,12 +598,11 @@ nfsd_proc_readdir(struct svc_rqst *rqstp)
nfsd_init_dirlist_pages(rqstp, resp, argp->count);
- resp->offset = NULL; resp->common.err = nfs_ok; - /* Read directory and encode entries on the fly */ + resp->cookie_offset = 0; offset = argp->cookie; resp->status = nfsd_readdir(rqstp, &argp->fh, &offset, - &resp->common, nfssvc_encode_entry); + &resp->common, nfs2svc_encode_entry); nfssvc_encode_nfscookie(resp, offset);
fh_put(&argp->fh); diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 9522e5c5f49db..1102d40ded03f 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -576,12 +576,13 @@ nfssvc_encode_readdirres(struct svc_rqst *rqstp, __be32 *p) { struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd_readdirres *resp = rqstp->rq_resp; + struct xdr_buf *dirlist = &resp->dirlist;
if (!svcxdr_encode_stat(xdr, resp->status)) return 0; switch (resp->status) { case nfs_ok: - xdr_write_pages(xdr, &resp->page, 0, resp->count << 2); + xdr_write_pages(xdr, dirlist->pages, 0, dirlist->len); /* no more entries */ if (xdr_stream_encode_item_absent(xdr) < 0) return 0; @@ -623,14 +624,86 @@ nfssvc_encode_statfsres(struct svc_rqst *rqstp, __be32 *p) * @resp: readdir result context * @offset: offset cookie to encode * + * The buffer space for the offset cookie has already been reserved + * by svcxdr_encode_entry_common(). */ void nfssvc_encode_nfscookie(struct nfsd_readdirres *resp, u32 offset) { - if (!resp->offset) + __be32 cookie = cpu_to_be32(offset); + + if (!resp->cookie_offset) return;
- *resp->offset = cpu_to_be32(offset); - resp->offset = NULL; + write_bytes_to_xdr_buf(&resp->dirlist, resp->cookie_offset, &cookie, + sizeof(cookie)); + resp->cookie_offset = 0; +} + +static bool +svcxdr_encode_entry_common(struct nfsd_readdirres *resp, const char *name, + int namlen, loff_t offset, u64 ino) +{ + struct xdr_buf *dirlist = &resp->dirlist; + struct xdr_stream *xdr = &resp->xdr; + + if (xdr_stream_encode_item_present(xdr) < 0) + return false; + /* fileid */ + if (xdr_stream_encode_u32(xdr, (u32)ino) < 0) + return false; + /* name */ + if (xdr_stream_encode_opaque(xdr, name, min(namlen, NFS2_MAXNAMLEN)) < 0) + return false; + /* cookie */ + resp->cookie_offset = dirlist->len; + if (xdr_stream_encode_u32(xdr, ~0U) < 0) + return false; + + return true; +} + +/** + * nfs2svc_encode_entry - encode one NFSv2 READDIR entry + * @data: directory context + * @name: name of the object to be encoded + * @namlen: length of that name, in bytes + * @offset: the offset of the previous entry + * @ino: the fileid of this entry + * @d_type: unused + * + * Return values: + * %0: Entry was successfully encoded. + * %-EINVAL: An encoding problem occured, secondary status code in resp->common.err + * + * On exit, the following fields are updated: + * - resp->xdr + * - resp->common.err + * - resp->cookie_offset + */ +int nfs2svc_encode_entry(void *data, const char *name, int namlen, + loff_t offset, u64 ino, unsigned int d_type) +{ + struct readdir_cd *ccd = data; + struct nfsd_readdirres *resp = container_of(ccd, + struct nfsd_readdirres, + common); + unsigned int starting_length = resp->dirlist.len; + + /* The offset cookie for the previous entry */ + nfssvc_encode_nfscookie(resp, offset); + + if (!svcxdr_encode_entry_common(resp, name, namlen, offset, ino)) + goto out_toosmall; + + xdr_commit_encode(&resp->xdr); + resp->common.err = nfs_ok; + return 0; + +out_toosmall: + resp->cookie_offset = 0; + resp->common.err = nfserr_toosmall; + resp->dirlist.len = starting_length; + return -EINVAL; }
int diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index d7beef2c2ed5b..a065852c9ea86 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -106,15 +106,20 @@ struct nfsd_readres { };
struct nfsd_readdirres { + /* Components of the reply */ __be32 status;
int count;
+ /* Used to encode the reply's entry list */ + struct xdr_stream xdr; + struct xdr_buf dirlist; struct readdir_cd common; __be32 * buffer; int buflen; __be32 * offset; struct page *page; + unsigned int cookie_offset; };
struct nfsd_statfsres { @@ -159,6 +164,8 @@ int nfssvc_encode_statfsres(struct svc_rqst *, __be32 *); int nfssvc_encode_readdirres(struct svc_rqst *, __be32 *);
void nfssvc_encode_nfscookie(struct nfsd_readdirres *resp, u32 offset); +int nfs2svc_encode_entry(void *data, const char *name, int namlen, + loff_t offset, u64 ino, unsigned int d_type); int nfssvc_encode_entry(void *, const char *name, int namlen, loff_t offset, u64 ino, unsigned int);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 8a2cf9f5709cc20a1114a7d22655928314fc86f8 ]
Clean up.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsproc.c | 2 +- fs/nfsd/nfsxdr.c | 51 +++-------------------------------------------- fs/nfsd/xdr.h | 10 ++-------- 3 files changed, 6 insertions(+), 57 deletions(-)
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 135c0bc468bce..72f8bc4a7ea48 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -602,7 +602,7 @@ nfsd_proc_readdir(struct svc_rqst *rqstp) resp->cookie_offset = 0; offset = argp->cookie; resp->status = nfsd_readdir(rqstp, &argp->fh, &offset, - &resp->common, nfs2svc_encode_entry); + &resp->common, nfssvc_encode_entry); nfssvc_encode_nfscookie(resp, offset);
fh_put(&argp->fh); diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 1102d40ded03f..5df6f00d76fd5 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -663,7 +663,7 @@ svcxdr_encode_entry_common(struct nfsd_readdirres *resp, const char *name, }
/** - * nfs2svc_encode_entry - encode one NFSv2 READDIR entry + * nfssvc_encode_entry - encode one NFSv2 READDIR entry * @data: directory context * @name: name of the object to be encoded * @namlen: length of that name, in bytes @@ -680,8 +680,8 @@ svcxdr_encode_entry_common(struct nfsd_readdirres *resp, const char *name, * - resp->common.err * - resp->cookie_offset */ -int nfs2svc_encode_entry(void *data, const char *name, int namlen, - loff_t offset, u64 ino, unsigned int d_type) +int nfssvc_encode_entry(void *data, const char *name, int namlen, + loff_t offset, u64 ino, unsigned int d_type) { struct readdir_cd *ccd = data; struct nfsd_readdirres *resp = container_of(ccd, @@ -706,51 +706,6 @@ int nfs2svc_encode_entry(void *data, const char *name, int namlen, return -EINVAL; }
-int -nfssvc_encode_entry(void *ccdv, const char *name, - int namlen, loff_t offset, u64 ino, unsigned int d_type) -{ - struct readdir_cd *ccd = ccdv; - struct nfsd_readdirres *cd = container_of(ccd, struct nfsd_readdirres, common); - __be32 *p = cd->buffer; - int buflen, slen; - - /* - dprintk("nfsd: entry(%.*s off %ld ino %ld)\n", - namlen, name, offset, ino); - */ - - if (offset > ~((u32) 0)) { - cd->common.err = nfserr_fbig; - return -EINVAL; - } - nfssvc_encode_nfscookie(cd, offset); - - /* truncate filename */ - namlen = min(namlen, NFS2_MAXNAMLEN); - slen = XDR_QUADLEN(namlen); - - if ((buflen = cd->buflen - slen - 4) < 0) { - cd->common.err = nfserr_toosmall; - return -EINVAL; - } - if (ino > ~((u32) 0)) { - cd->common.err = nfserr_fbig; - return -EINVAL; - } - *p++ = xdr_one; /* mark entry present */ - *p++ = htonl((u32) ino); /* file id */ - p = xdr_encode_array(p, name, namlen);/* name length & name */ - cd->offset = p; /* remember pointer */ - *p++ = htonl(~0U); /* offset of next entry */ - - cd->count += p - cd->buffer; - cd->buflen = buflen; - cd->buffer = p; - cd->common.err = nfs_ok; - return 0; -} - /* * XDR release functions */ diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index a065852c9ea86..10f3bd25e8ccc 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -115,10 +115,6 @@ struct nfsd_readdirres { struct xdr_stream xdr; struct xdr_buf dirlist; struct readdir_cd common; - __be32 * buffer; - int buflen; - __be32 * offset; - struct page *page; unsigned int cookie_offset; };
@@ -164,10 +160,8 @@ int nfssvc_encode_statfsres(struct svc_rqst *, __be32 *); int nfssvc_encode_readdirres(struct svc_rqst *, __be32 *);
void nfssvc_encode_nfscookie(struct nfsd_readdirres *resp, u32 offset); -int nfs2svc_encode_entry(void *data, const char *name, int namlen, - loff_t offset, u64 ino, unsigned int d_type); -int nfssvc_encode_entry(void *, const char *name, - int namlen, loff_t offset, u64 ino, unsigned int); +int nfssvc_encode_entry(void *data, const char *name, int namlen, + loff_t offset, u64 ino, unsigned int d_type);
void nfssvc_release_attrstat(struct svc_rqst *rqstp); void nfssvc_release_diropres(struct svc_rqst *rqstp);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 8edc0648880a151026fe625fa1b76772b5766f68 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs_common/nfsacl.c | 71 ++++++++++++++++++++++++++++++++++++++++++ include/linux/nfsacl.h | 3 ++ 2 files changed, 74 insertions(+)
diff --git a/fs/nfs_common/nfsacl.c b/fs/nfs_common/nfsacl.c index 79c563c1a5e84..5a5bd85d08f8c 100644 --- a/fs/nfs_common/nfsacl.c +++ b/fs/nfs_common/nfsacl.c @@ -136,6 +136,77 @@ int nfsacl_encode(struct xdr_buf *buf, unsigned int base, struct inode *inode, } EXPORT_SYMBOL_GPL(nfsacl_encode);
+/** + * nfs_stream_encode_acl - Encode an NFSv3 ACL + * + * @xdr: an xdr_stream positioned to receive an encoded ACL + * @inode: inode of file whose ACL this is + * @acl: posix_acl to encode + * @encode_entries: whether to encode ACEs as well + * @typeflag: ACL type: NFS_ACL_DEFAULT or zero + * + * Return values: + * %false: The ACL could not be encoded + * %true: @xdr is advanced to the next available position + */ +bool nfs_stream_encode_acl(struct xdr_stream *xdr, struct inode *inode, + struct posix_acl *acl, int encode_entries, + int typeflag) +{ + const size_t elem_size = XDR_UNIT * 3; + u32 entries = (acl && acl->a_count) ? max_t(int, acl->a_count, 4) : 0; + struct nfsacl_encode_desc nfsacl_desc = { + .desc = { + .elem_size = elem_size, + .array_len = encode_entries ? entries : 0, + .xcode = xdr_nfsace_encode, + }, + .acl = acl, + .typeflag = typeflag, + .uid = inode->i_uid, + .gid = inode->i_gid, + }; + struct nfsacl_simple_acl aclbuf; + unsigned int base; + int err; + + if (entries > NFS_ACL_MAX_ENTRIES) + return false; + if (xdr_stream_encode_u32(xdr, entries) < 0) + return false; + + if (encode_entries && acl && acl->a_count == 3) { + struct posix_acl *acl2 = &aclbuf.acl; + + /* Avoid the use of posix_acl_alloc(). nfsacl_encode() is + * invoked in contexts where a memory allocation failure is + * fatal. Fortunately this fake ACL is small enough to + * construct on the stack. */ + posix_acl_init(acl2, 4); + + /* Insert entries in canonical order: other orders seem + to confuse Solaris VxFS. */ + acl2->a_entries[0] = acl->a_entries[0]; /* ACL_USER_OBJ */ + acl2->a_entries[1] = acl->a_entries[1]; /* ACL_GROUP_OBJ */ + acl2->a_entries[2] = acl->a_entries[1]; /* ACL_MASK */ + acl2->a_entries[2].e_tag = ACL_MASK; + acl2->a_entries[3] = acl->a_entries[2]; /* ACL_OTHER */ + nfsacl_desc.acl = acl2; + } + + base = xdr_stream_pos(xdr); + if (!xdr_reserve_space(xdr, XDR_UNIT + + elem_size * nfsacl_desc.desc.array_len)) + return false; + err = xdr_encode_array2(xdr->buf, base, &nfsacl_desc.desc); + if (err) + return false; + + return true; +} +EXPORT_SYMBOL_GPL(nfs_stream_encode_acl); + + struct nfsacl_decode_desc { struct xdr_array2_desc desc; unsigned int count; diff --git a/include/linux/nfsacl.h b/include/linux/nfsacl.h index 0ba99c5136491..8e76a79cdc6ae 100644 --- a/include/linux/nfsacl.h +++ b/include/linux/nfsacl.h @@ -41,5 +41,8 @@ nfsacl_decode(struct xdr_buf *buf, unsigned int base, unsigned int *aclcnt, extern bool nfs_stream_decode_acl(struct xdr_stream *xdr, unsigned int *aclcnt, struct posix_acl **pacl); +extern bool +nfs_stream_encode_acl(struct xdr_stream *xdr, struct inode *inode, + struct posix_acl *acl, int encode_entries, int typeflag);
#endif /* __LINUX_NFSACL_H */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit f8cba47344f794b54373189bec23195b51020faf ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs2acl.c | 42 ++++++++++++++++-------------------------- fs/nfsd/nfsxdr.c | 24 ++++++++++++++++++++++-- fs/nfsd/xdr.h | 3 +++ 3 files changed, 41 insertions(+), 28 deletions(-)
diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index 7eeac5b81c200..102f8a9a235cb 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -240,51 +240,41 @@ static int nfsaclsvc_decode_accessargs(struct svc_rqst *rqstp, __be32 *p) /* GETACL */ static int nfsaclsvc_encode_getaclres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_getaclres *resp = rqstp->rq_resp; struct dentry *dentry = resp->fh.fh_dentry; struct inode *inode; - struct kvec *head = rqstp->rq_res.head; - unsigned int base; - int n; int w;
- *p++ = resp->status; - if (resp->status != nfs_ok) - return xdr_ressize_check(rqstp, p); + if (!svcxdr_encode_stat(xdr, resp->status)) + return 0;
- /* - * Since this is version 2, the check for nfserr in - * nfsd_dispatch actually ensures the following cannot happen. - * However, it seems fragile to depend on that. - */ if (dentry == NULL || d_really_is_negative(dentry)) - return 0; + return 1; inode = d_inode(dentry);
- p = nfs2svc_encode_fattr(rqstp, p, &resp->fh, &resp->stat); - *p++ = htonl(resp->mask); - if (!xdr_ressize_check(rqstp, p)) + if (!svcxdr_encode_fattr(rqstp, xdr, &resp->fh, &resp->stat)) + return 0; + if (xdr_stream_encode_u32(xdr, resp->mask) < 0) return 0; - base = (char *)p - (char *)head->iov_base;
rqstp->rq_res.page_len = w = nfsacl_size( (resp->mask & NFS_ACL) ? resp->acl_access : NULL, (resp->mask & NFS_DFACL) ? resp->acl_default : NULL); while (w > 0) { if (!*(rqstp->rq_next_page++)) - return 0; + return 1; w -= PAGE_SIZE; }
- n = nfsacl_encode(&rqstp->rq_res, base, inode, - resp->acl_access, - resp->mask & NFS_ACL, 0); - if (n > 0) - n = nfsacl_encode(&rqstp->rq_res, base + n, inode, - resp->acl_default, - resp->mask & NFS_DFACL, - NFS_ACL_DEFAULT); - return (n > 0); + if (!nfs_stream_encode_acl(xdr, inode, resp->acl_access, + resp->mask & NFS_ACL, 0)) + return 0; + if (!nfs_stream_encode_acl(xdr, inode, resp->acl_default, + resp->mask & NFS_DFACL, NFS_ACL_DEFAULT)) + return 0; + + return 1; }
static int nfsaclsvc_encode_attrstatres(struct svc_rqst *rqstp, __be32 *p) diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 5df6f00d76fd5..1fed3a8deb183 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -26,7 +26,16 @@ static const u32 nfs_ftypes[] = { * Basic NFSv2 data types (RFC 1094 Section 2.3) */
-static bool +/** + * svcxdr_encode_stat - Encode an NFSv2 status code + * @xdr: XDR stream + * @status: status value to encode + * + * Return values: + * %false: Send buffer space was exhausted + * %true: Success + */ +bool svcxdr_encode_stat(struct xdr_stream *xdr, __be32 status) { __be32 *p; @@ -250,7 +259,18 @@ encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp, return p; }
-static bool +/** + * svcxdr_encode_fattr - Encode NFSv2 file attributes + * @rqstp: Context of a completed RPC transaction + * @xdr: XDR stream + * @fhp: File handle to encode + * @stat: Attributes to encode + * + * Return values: + * %false: Send buffer space was exhausted + * %true: Success + */ +bool svcxdr_encode_fattr(struct svc_rqst *rqstp, struct xdr_stream *xdr, const struct svc_fh *fhp, const struct kstat *stat) { diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index 10f3bd25e8ccc..8bcdc37398ab5 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -170,5 +170,8 @@ void nfssvc_release_readres(struct svc_rqst *rqstp); /* Helper functions for NFSv2 ACL code */ __be32 *nfs2svc_encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp, struct kstat *stat); bool svcxdr_decode_fhandle(struct xdr_stream *xdr, struct svc_fh *fhp); +bool svcxdr_encode_stat(struct xdr_stream *xdr, __be32 status); +bool svcxdr_encode_fattr(struct svc_rqst *rqstp, struct xdr_stream *xdr, + const struct svc_fh *fhp, const struct kstat *stat);
#endif /* LINUX_NFSD_H */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 778f068fa0c0846b650ebdb8795fd51b5badc332 ]
The SETACL result encoder is exactly the same as the NFSv2 attrstatres decoder.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs2acl.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index 102f8a9a235cb..ef06a2a384bea 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -363,8 +363,8 @@ static const struct svc_procedure nfsd_acl_procedures2[5] = { [ACLPROC2_SETACL] = { .pc_func = nfsacld_proc_setacl, .pc_decode = nfsaclsvc_decode_setaclargs, - .pc_encode = nfsaclsvc_encode_attrstatres, - .pc_release = nfsaclsvc_release_attrstat, + .pc_encode = nfssvc_encode_attrstatres, + .pc_release = nfssvc_release_attrstat, .pc_argsize = sizeof(struct nfsd3_setaclargs), .pc_ressize = sizeof(struct nfsd_attrstat), .pc_cachetype = RC_NOCACHE,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 8d2009a10b3abaa12a39deb4876b215714993fe8 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs2acl.c | 24 ++---------------------- 1 file changed, 2 insertions(+), 22 deletions(-)
diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index ef06a2a384bea..c805ac8dd7e77 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -277,19 +277,6 @@ static int nfsaclsvc_encode_getaclres(struct svc_rqst *rqstp, __be32 *p) return 1; }
-static int nfsaclsvc_encode_attrstatres(struct svc_rqst *rqstp, __be32 *p) -{ - struct nfsd_attrstat *resp = rqstp->rq_resp; - - *p++ = resp->status; - if (resp->status != nfs_ok) - goto out; - - p = nfs2svc_encode_fattr(rqstp, p, &resp->fh, &resp->stat); -out: - return xdr_ressize_check(rqstp, p); -} - /* ACCESS */ static int nfsaclsvc_encode_accessres(struct svc_rqst *rqstp, __be32 *p) { @@ -317,13 +304,6 @@ static void nfsaclsvc_release_getacl(struct svc_rqst *rqstp) posix_acl_release(resp->acl_default); }
-static void nfsaclsvc_release_attrstat(struct svc_rqst *rqstp) -{ - struct nfsd_attrstat *resp = rqstp->rq_resp; - - fh_put(&resp->fh); -} - static void nfsaclsvc_release_access(struct svc_rqst *rqstp) { struct nfsd3_accessres *resp = rqstp->rq_resp; @@ -374,8 +354,8 @@ static const struct svc_procedure nfsd_acl_procedures2[5] = { [ACLPROC2_GETATTR] = { .pc_func = nfsacld_proc_getattr, .pc_decode = nfssvc_decode_fhandleargs, - .pc_encode = nfsaclsvc_encode_attrstatres, - .pc_release = nfsaclsvc_release_attrstat, + .pc_encode = nfssvc_encode_attrstatres, + .pc_release = nfssvc_release_attrstat, .pc_argsize = sizeof(struct nfsd_fhandle), .pc_ressize = sizeof(struct nfsd_attrstat), .pc_cachetype = RC_NOCACHE,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 07f5c2963c04b11603e9667f89bb430c132e9cc1 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs2acl.c | 19 ++++++++++++------- 1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index c805ac8dd7e77..8703326fc1654 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -280,16 +280,21 @@ static int nfsaclsvc_encode_getaclres(struct svc_rqst *rqstp, __be32 *p) /* ACCESS */ static int nfsaclsvc_encode_accessres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_accessres *resp = rqstp->rq_resp;
- *p++ = resp->status; - if (resp->status != nfs_ok) - goto out; + if (!svcxdr_encode_stat(xdr, resp->status)) + return 0; + switch (resp->status) { + case nfs_ok: + if (!svcxdr_encode_fattr(rqstp, xdr, &resp->fh, &resp->stat)) + return 0; + if (xdr_stream_encode_u32(xdr, resp->access) < 0) + return 0; + break; + }
- p = nfs2svc_encode_fattr(rqstp, p, &resp->fh, &resp->stat); - *p++ = htonl(resp->access); -out: - return xdr_ressize_check(rqstp, p); + return 1; }
/*
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 83d0b84572775a29f800de67a1b9b642a5376bc3 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsxdr.c | 64 ------------------------------------------------ fs/nfsd/xdr.h | 1 - 2 files changed, 65 deletions(-)
diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 1fed3a8deb183..b800cfefcab7a 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -201,64 +201,6 @@ svcxdr_decode_sattr(struct svc_rqst *rqstp, struct xdr_stream *xdr, return true; }
-static __be32 * -encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp, - struct kstat *stat) -{ - struct user_namespace *userns = nfsd_user_namespace(rqstp); - struct dentry *dentry = fhp->fh_dentry; - int type; - struct timespec64 time; - u32 f; - - type = (stat->mode & S_IFMT); - - *p++ = htonl(nfs_ftypes[type >> 12]); - *p++ = htonl((u32) stat->mode); - *p++ = htonl((u32) stat->nlink); - *p++ = htonl((u32) from_kuid_munged(userns, stat->uid)); - *p++ = htonl((u32) from_kgid_munged(userns, stat->gid)); - - if (S_ISLNK(type) && stat->size > NFS_MAXPATHLEN) { - *p++ = htonl(NFS_MAXPATHLEN); - } else { - *p++ = htonl((u32) stat->size); - } - *p++ = htonl((u32) stat->blksize); - if (S_ISCHR(type) || S_ISBLK(type)) - *p++ = htonl(new_encode_dev(stat->rdev)); - else - *p++ = htonl(0xffffffff); - *p++ = htonl((u32) stat->blocks); - switch (fsid_source(fhp)) { - default: - case FSIDSOURCE_DEV: - *p++ = htonl(new_encode_dev(stat->dev)); - break; - case FSIDSOURCE_FSID: - *p++ = htonl((u32) fhp->fh_export->ex_fsid); - break; - case FSIDSOURCE_UUID: - f = ((u32*)fhp->fh_export->ex_uuid)[0]; - f ^= ((u32*)fhp->fh_export->ex_uuid)[1]; - f ^= ((u32*)fhp->fh_export->ex_uuid)[2]; - f ^= ((u32*)fhp->fh_export->ex_uuid)[3]; - *p++ = htonl(f); - break; - } - *p++ = htonl((u32) stat->ino); - *p++ = htonl((u32) stat->atime.tv_sec); - *p++ = htonl(stat->atime.tv_nsec ? stat->atime.tv_nsec / 1000 : 0); - time = stat->mtime; - lease_get_mtime(d_inode(dentry), &time); - *p++ = htonl((u32) time.tv_sec); - *p++ = htonl(time.tv_nsec ? time.tv_nsec / 1000 : 0); - *p++ = htonl((u32) stat->ctime.tv_sec); - *p++ = htonl(stat->ctime.tv_nsec ? stat->ctime.tv_nsec / 1000 : 0); - - return p; -} - /** * svcxdr_encode_fattr - Encode NFSv2 file attributes * @rqstp: Context of a completed RPC transaction @@ -328,12 +270,6 @@ svcxdr_encode_fattr(struct svc_rqst *rqstp, struct xdr_stream *xdr, return true; }
-/* Helper function for NFSv2 ACL code */ -__be32 *nfs2svc_encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp, struct kstat *stat) -{ - return encode_fattr(rqstp, p, fhp, stat); -} - /* * XDR decode functions */ diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index 8bcdc37398ab5..c67ad02b9a028 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -168,7 +168,6 @@ void nfssvc_release_diropres(struct svc_rqst *rqstp); void nfssvc_release_readres(struct svc_rqst *rqstp);
/* Helper functions for NFSv2 ACL code */ -__be32 *nfs2svc_encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp, struct kstat *stat); bool svcxdr_decode_fhandle(struct xdr_stream *xdr, struct svc_fh *fhp); bool svcxdr_encode_stat(struct xdr_stream *xdr, __be32 status); bool svcxdr_encode_fattr(struct svc_rqst *rqstp, struct xdr_stream *xdr,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 20798dfe249a01ad1b12eec7dbc572db5003244a ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3acl.c | 33 +++++++++++++++++++-------------- fs/nfsd/nfs3xdr.c | 23 +++++++++++++++++++++-- fs/nfsd/xdr3.h | 3 +++ 3 files changed, 43 insertions(+), 16 deletions(-)
diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c index a568b842e9ebe..04e157b0b201a 100644 --- a/fs/nfsd/nfs3acl.c +++ b/fs/nfsd/nfs3acl.c @@ -166,22 +166,25 @@ static int nfs3svc_decode_setaclargs(struct svc_rqst *rqstp, __be32 *p) /* GETACL */ static int nfs3svc_encode_getaclres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_getaclres *resp = rqstp->rq_resp; struct dentry *dentry = resp->fh.fh_dentry; + struct kvec *head = rqstp->rq_res.head; + struct inode *inode = d_inode(dentry); + unsigned int base; + int n; + int w;
- *p++ = resp->status; - p = nfs3svc_encode_post_op_attr(rqstp, p, &resp->fh); - if (resp->status == 0 && dentry && d_really_is_positive(dentry)) { - struct inode *inode = d_inode(dentry); - struct kvec *head = rqstp->rq_res.head; - unsigned int base; - int n; - int w; - - *p++ = htonl(resp->mask); - if (!xdr_ressize_check(rqstp, p)) + if (!svcxdr_encode_nfsstat3(xdr, resp->status)) + return 0; + switch (resp->status) { + case nfs_ok: + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) return 0; - base = (char *)p - (char *)head->iov_base; + if (xdr_stream_encode_u32(xdr, resp->mask) < 0) + return 0; + + base = (char *)xdr->p - (char *)head->iov_base;
rqstp->rq_res.page_len = w = nfsacl_size( (resp->mask & NFS_ACL) ? resp->acl_access : NULL, @@ -202,9 +205,11 @@ static int nfs3svc_encode_getaclres(struct svc_rqst *rqstp, __be32 *p) NFS_ACL_DEFAULT); if (n <= 0) return 0; - } else - if (!xdr_ressize_check(rqstp, p)) + break; + default: + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) return 0; + }
return 1; } diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 646bbfc5b7794..941740a97f8f5 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -107,7 +107,16 @@ svcxdr_decode_nfs_fh3(struct xdr_stream *xdr, struct svc_fh *fhp) return true; }
-static bool +/** + * svcxdr_encode_nfsstat3 - Encode an NFSv3 status code + * @xdr: XDR stream + * @status: status value to encode + * + * Return values: + * %false: Send buffer space was exhausted + * %true: Success + */ +bool svcxdr_encode_nfsstat3(struct xdr_stream *xdr, __be32 status) { __be32 *p; @@ -464,7 +473,17 @@ svcxdr_encode_pre_op_attr(struct xdr_stream *xdr, const struct svc_fh *fhp) return svcxdr_encode_wcc_attr(xdr, fhp); }
-static bool +/** + * svcxdr_encode_post_op_attr - Encode NFSv3 post-op attributes + * @rqstp: Context of a completed RPC transaction + * @xdr: XDR stream + * @fhp: File handle to encode + * + * Return values: + * %false: Send buffer space was exhausted + * %true: Success + */ +bool svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct xdr_stream *xdr, const struct svc_fh *fhp) { diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index b851458373db6..746c5f79964f1 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -308,5 +308,8 @@ int nfs3svc_encode_entryplus3(void *data, const char *name, int namlen, __be32 *nfs3svc_encode_post_op_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp); bool svcxdr_decode_nfs_fh3(struct xdr_stream *xdr, struct svc_fh *fhp); +bool svcxdr_encode_nfsstat3(struct xdr_stream *xdr, __be32 status); +bool svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct xdr_stream *xdr, + const struct svc_fh *fhp);
#endif /* _LINUX_NFSD_XDR3_H */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 15e432bf0cfd1e6aebfa9ffd4e0cc2ff4f3ae2db ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3acl.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c index 04e157b0b201a..cfb686f23e571 100644 --- a/fs/nfsd/nfs3acl.c +++ b/fs/nfsd/nfs3acl.c @@ -217,11 +217,11 @@ static int nfs3svc_encode_getaclres(struct svc_rqst *rqstp, __be32 *p) /* SETACL */ static int nfs3svc_encode_setaclres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_attrstat *resp = rqstp->rq_resp;
- *p++ = resp->status; - p = nfs3svc_encode_post_op_attr(rqstp, p, &resp->fh); - return xdr_ressize_check(rqstp, p); + return svcxdr_encode_nfsstat3(xdr, resp->status) && + svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh); }
/*
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 1416f435303d81070c6bcf5a4a9b4ed0f7a9f013 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3xdr.c | 86 ----------------------------------------------- fs/nfsd/xdr3.h | 2 -- 2 files changed, 88 deletions(-)
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 941740a97f8f5..fcfa0d611b931 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -48,13 +48,6 @@ static const u32 nfs3_ftypes[] = { * Basic NFSv3 data types (RFC 1813 Sections 2.5 and 2.6) */
-static __be32 * -encode_time3(__be32 *p, struct timespec64 *time) -{ - *p++ = htonl((u32) time->tv_sec); *p++ = htonl(time->tv_nsec); - return p; -} - static __be32 * encode_nfstime3(__be32 *p, const struct timespec64 *time) { @@ -396,54 +389,6 @@ svcxdr_encode_fattr3(struct svc_rqst *rqstp, struct xdr_stream *xdr, return true; }
-static __be32 *encode_fsid(__be32 *p, struct svc_fh *fhp) -{ - u64 f; - switch(fsid_source(fhp)) { - default: - case FSIDSOURCE_DEV: - p = xdr_encode_hyper(p, (u64)huge_encode_dev - (fhp->fh_dentry->d_sb->s_dev)); - break; - case FSIDSOURCE_FSID: - p = xdr_encode_hyper(p, (u64) fhp->fh_export->ex_fsid); - break; - case FSIDSOURCE_UUID: - f = ((u64*)fhp->fh_export->ex_uuid)[0]; - f ^= ((u64*)fhp->fh_export->ex_uuid)[1]; - p = xdr_encode_hyper(p, f); - break; - } - return p; -} - -static __be32 * -encode_fattr3(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp, - struct kstat *stat) -{ - struct user_namespace *userns = nfsd_user_namespace(rqstp); - *p++ = htonl(nfs3_ftypes[(stat->mode & S_IFMT) >> 12]); - *p++ = htonl((u32) (stat->mode & S_IALLUGO)); - *p++ = htonl((u32) stat->nlink); - *p++ = htonl((u32) from_kuid_munged(userns, stat->uid)); - *p++ = htonl((u32) from_kgid_munged(userns, stat->gid)); - if (S_ISLNK(stat->mode) && stat->size > NFS3_MAXPATHLEN) { - p = xdr_encode_hyper(p, (u64) NFS3_MAXPATHLEN); - } else { - p = xdr_encode_hyper(p, (u64) stat->size); - } - p = xdr_encode_hyper(p, ((u64)stat->blocks) << 9); - *p++ = htonl((u32) MAJOR(stat->rdev)); - *p++ = htonl((u32) MINOR(stat->rdev)); - p = encode_fsid(p, fhp); - p = xdr_encode_hyper(p, stat->ino); - p = encode_time3(p, &stat->atime); - p = encode_time3(p, &stat->mtime); - p = encode_time3(p, &stat->ctime); - - return p; -} - static bool svcxdr_encode_wcc_attr(struct xdr_stream *xdr, const struct svc_fh *fhp) { @@ -512,37 +457,6 @@ svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct xdr_stream *xdr, return xdr_stream_encode_item_absent(xdr) > 0; }
-/* - * Encode post-operation attributes. - * The inode may be NULL if the call failed because of a stale file - * handle. In this case, no attributes are returned. - */ -static __be32 * -encode_post_op_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp) -{ - struct dentry *dentry = fhp->fh_dentry; - if (!fhp->fh_no_wcc && dentry && d_really_is_positive(dentry)) { - __be32 err; - struct kstat stat; - - err = fh_getattr(fhp, &stat); - if (!err) { - *p++ = xdr_one; /* attributes follow */ - lease_get_mtime(d_inode(dentry), &stat.mtime); - return encode_fattr3(rqstp, p, fhp, &stat); - } - } - *p++ = xdr_zero; - return p; -} - -/* Helper for NFSv3 ACLs */ -__be32 * -nfs3svc_encode_post_op_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp) -{ - return encode_post_op_attr(rqstp, p, fhp); -} - /* * Encode weak cache consistency data */ diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 746c5f79964f1..933008382bbeb 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -305,8 +305,6 @@ int nfs3svc_encode_entry3(void *data, const char *name, int namlen, int nfs3svc_encode_entryplus3(void *data, const char *name, int namlen, loff_t offset, u64 ino, unsigned int d_type); /* Helper functions for NFSv3 ACL code */ -__be32 *nfs3svc_encode_post_op_attr(struct svc_rqst *rqstp, __be32 *p, - struct svc_fh *fhp); bool svcxdr_decode_nfs_fh3(struct xdr_stream *xdr, struct svc_fh *fhp); bool svcxdr_encode_nfsstat3(struct xdr_stream *xdr, __be32 status); bool svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct xdr_stream *xdr,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 6019ce0742ca55d3e45279a19b07d1542747a098 ]
Enable watching the progress of directory encoding to capture the timing of any issues with reading or encoding a directory. The new tracepoint captures dirent encoding for all NFS versions.
For example, here's what a few NFSv4 directory entries might look like:
nfsd-989 [002] 468.596265: nfsd_dirent: fh_hash=0x5d162594 ino=2 name=. nfsd-989 [002] 468.596267: nfsd_dirent: fh_hash=0x5d162594 ino=1 name=.. nfsd-989 [002] 468.596299: nfsd_dirent: fh_hash=0x5d162594 ino=3827 name=zlib.c nfsd-989 [002] 468.596325: nfsd_dirent: fh_hash=0x5d162594 ino=3811 name=xdiff nfsd-989 [002] 468.596351: nfsd_dirent: fh_hash=0x5d162594 ino=3810 name=xdiff-interface.h nfsd-989 [002] 468.596377: nfsd_dirent: fh_hash=0x5d162594 ino=3809 name=xdiff-interface.c
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/trace.h | 24 ++++++++++++++++++++++++ fs/nfsd/vfs.c | 9 ++++++--- 2 files changed, 30 insertions(+), 3 deletions(-)
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 2bc2a888f7fa8..d20ab767ba26a 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -391,6 +391,30 @@ DEFINE_EVENT(nfsd_err_class, nfsd_##name, \ DEFINE_NFSD_ERR_EVENT(read_err); DEFINE_NFSD_ERR_EVENT(write_err);
+TRACE_EVENT(nfsd_dirent, + TP_PROTO(struct svc_fh *fhp, + u64 ino, + const char *name, + int namlen), + TP_ARGS(fhp, ino, name, namlen), + TP_STRUCT__entry( + __field(u32, fh_hash) + __field(u64, ino) + __field(int, len) + __dynamic_array(unsigned char, name, namlen) + ), + TP_fast_assign( + __entry->fh_hash = fhp ? knfsd_fh_hash(&fhp->fh_handle) : 0; + __entry->ino = ino; + __entry->len = namlen; + memcpy(__get_str(name), name, namlen); + __assign_str(name, name); + ), + TP_printk("fh_hash=0x%08x ino=%llu name=%.*s", + __entry->fh_hash, __entry->ino, + __entry->len, __get_str(name)) +) + #include "state.h" #include "filecache.h" #include "vfs.h" diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index d12c3e71ca10e..520e55c35e742 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1979,8 +1979,9 @@ static int nfsd_buffered_filldir(struct dir_context *ctx, const char *name, return 0; }
-static __be32 nfsd_buffered_readdir(struct file *file, nfsd_filldir_t func, - struct readdir_cd *cdp, loff_t *offsetp) +static __be32 nfsd_buffered_readdir(struct file *file, struct svc_fh *fhp, + nfsd_filldir_t func, struct readdir_cd *cdp, + loff_t *offsetp) { struct buffered_dirent *de; int host_err; @@ -2026,6 +2027,8 @@ static __be32 nfsd_buffered_readdir(struct file *file, nfsd_filldir_t func, if (cdp->err != nfs_ok) break;
+ trace_nfsd_dirent(fhp, de->ino, de->name, de->namlen); + reclen = ALIGN(sizeof(*de) + de->namlen, sizeof(u64)); size -= reclen; @@ -2073,7 +2076,7 @@ nfsd_readdir(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t *offsetp, goto out_close; }
- err = nfsd_buffered_readdir(file, func, cdp, offsetp); + err = nfsd_buffered_readdir(file, fhp, func, cdp, offsetp);
if (err == nfserr_eof || err == nfserr_toosmall) err = nfs_ok; /* can still be found in ->err */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 219a170502b3d597c52eeec088aee8fbf7b90da5 ]
These are no longer needed because there are no dprintk() call sites in these files.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3xdr.c | 3 --- fs/nfsd/nfsxdr.c | 2 -- 2 files changed, 5 deletions(-)
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index fcfa0d611b931..0a5ebc52e6a9c 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -14,9 +14,6 @@ #include "netns.h" #include "vfs.h"
-#define NFSDDBG_FACILITY NFSDDBG_XDR - - /* * Force construction of an empty post-op attr */ diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index b800cfefcab7a..a06c05fe3b421 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -9,8 +9,6 @@ #include "xdr.h" #include "auth.h"
-#define NFSDDBG_FACILITY NFSDDBG_XDR - /* * Mapping of S_IF* types to NFS file types */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit 7f7e7a4006f74b031718055a0751c70c2e3d5e7e ]
We do this same logic repeatedly, and it's easy to get the sense of the comparison wrong.
Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 49 +++++++++++++++++++++++++-------------------- 1 file changed, 27 insertions(+), 22 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 914f60cee3226..0afc14b3e4593 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -5385,6 +5385,22 @@ static bool clients_still_reclaiming(struct nfsd_net *nn) return true; }
+struct laundry_time { + time64_t cutoff; + time64_t new_timeo; +}; + +static bool state_expired(struct laundry_time *lt, time64_t last_refresh) +{ + time64_t time_remaining; + + if (last_refresh < lt->cutoff) + return true; + time_remaining = last_refresh - lt->cutoff; + lt->new_timeo = min(lt->new_timeo, time_remaining); + return false; +} + static time64_t nfs4_laundromat(struct nfsd_net *nn) { @@ -5394,14 +5410,16 @@ nfs4_laundromat(struct nfsd_net *nn) struct nfs4_ol_stateid *stp; struct nfsd4_blocked_lock *nbl; struct list_head *pos, *next, reaplist; - time64_t cutoff = ktime_get_boottime_seconds() - nn->nfsd4_lease; - time64_t t, new_timeo = nn->nfsd4_lease; + struct laundry_time lt = { + .cutoff = ktime_get_boottime_seconds() - nn->nfsd4_lease, + .new_timeo = nn->nfsd4_lease + }; struct nfs4_cpntf_state *cps; copy_stateid_t *cps_t; int i;
if (clients_still_reclaiming(nn)) { - new_timeo = 0; + lt.new_timeo = 0; goto out; } nfsd4_end_grace(nn); @@ -5411,7 +5429,7 @@ nfs4_laundromat(struct nfsd_net *nn) idr_for_each_entry(&nn->s2s_cp_stateids, cps_t, i) { cps = container_of(cps_t, struct nfs4_cpntf_state, cp_stateid); if (cps->cp_stateid.sc_type == NFS4_COPYNOTIFY_STID && - cps->cpntf_time < cutoff) + state_expired(<, cps->cpntf_time)) _free_cpntf_state_locked(nn, cps); } spin_unlock(&nn->s2s_cp_lock); @@ -5419,11 +5437,8 @@ nfs4_laundromat(struct nfsd_net *nn) spin_lock(&nn->client_lock); list_for_each_safe(pos, next, &nn->client_lru) { clp = list_entry(pos, struct nfs4_client, cl_lru); - if (clp->cl_time > cutoff) { - t = clp->cl_time - cutoff; - new_timeo = min(new_timeo, t); + if (!state_expired(<, clp->cl_time)) break; - } if (mark_client_expired_locked(clp)) { trace_nfsd_clid_expired(&clp->cl_clientid); continue; @@ -5440,11 +5455,8 @@ nfs4_laundromat(struct nfsd_net *nn) spin_lock(&state_lock); list_for_each_safe(pos, next, &nn->del_recall_lru) { dp = list_entry (pos, struct nfs4_delegation, dl_recall_lru); - if (dp->dl_time > cutoff) { - t = dp->dl_time - cutoff; - new_timeo = min(new_timeo, t); + if (!state_expired(<, dp->dl_time)) break; - } WARN_ON(!unhash_delegation_locked(dp)); list_add(&dp->dl_recall_lru, &reaplist); } @@ -5460,11 +5472,8 @@ nfs4_laundromat(struct nfsd_net *nn) while (!list_empty(&nn->close_lru)) { oo = list_first_entry(&nn->close_lru, struct nfs4_openowner, oo_close_lru); - if (oo->oo_time > cutoff) { - t = oo->oo_time - cutoff; - new_timeo = min(new_timeo, t); + if (!state_expired(<, oo->oo_time)) break; - } list_del_init(&oo->oo_close_lru); stp = oo->oo_last_closed_stid; oo->oo_last_closed_stid = NULL; @@ -5490,11 +5499,8 @@ nfs4_laundromat(struct nfsd_net *nn) while (!list_empty(&nn->blocked_locks_lru)) { nbl = list_first_entry(&nn->blocked_locks_lru, struct nfsd4_blocked_lock, nbl_lru); - if (nbl->nbl_time > cutoff) { - t = nbl->nbl_time - cutoff; - new_timeo = min(new_timeo, t); + if (!state_expired(<, nbl->nbl_time)) break; - } list_move(&nbl->nbl_lru, &reaplist); list_del_init(&nbl->nbl_list); } @@ -5507,8 +5513,7 @@ nfs4_laundromat(struct nfsd_net *nn) free_blocked_lock(nbl); } out: - new_timeo = max_t(time64_t, new_timeo, NFSD_LAUNDROMAT_MINTIMEOUT); - return new_timeo; + return max_t(time64_t, lt.new_timeo, NFSD_LAUNDROMAT_MINTIMEOUT); }
static struct workqueue_struct *laundry_wq;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paul Menzel pmenzel@molgen.mpg.de
[ Upstream commit f988a7b71d1e66e63f79cd59c763875347943a7a ]
`printk()`, by default, uses the log level warning, which leaves the user reading
NFSD: Using UMH upcall client tracking operations.
wondering what to do about it (`dmesg --level=warn`).
Several client tracking methods are tried, and expected to fail. That’s why a message is printed only on success. It might be interesting for users to know the chosen method, so use info-level instead of debug-level.
Cc: linux-nfs@vger.kernel.org Signed-off-by: Paul Menzel pmenzel@molgen.mpg.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4recover.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c index 83c4e68839537..d08c1a8c9254b 100644 --- a/fs/nfsd/nfs4recover.c +++ b/fs/nfsd/nfs4recover.c @@ -626,7 +626,7 @@ nfsd4_legacy_tracking_init(struct net *net) status = nfsd4_load_reboot_recovery_data(net); if (status) goto err; - printk("NFSD: Using legacy client tracking operations.\n"); + pr_info("NFSD: Using legacy client tracking operations.\n"); return 0;
err: @@ -1030,7 +1030,7 @@ nfsd4_init_cld_pipe(struct net *net)
status = __nfsd4_init_cld_pipe(net); if (!status) - printk("NFSD: Using old nfsdcld client tracking operations.\n"); + pr_info("NFSD: Using old nfsdcld client tracking operations.\n"); return status; }
@@ -1607,7 +1607,7 @@ nfsd4_cld_tracking_init(struct net *net) nfs4_release_reclaim(nn); goto err_remove; } else - printk("NFSD: Using nfsdcld client tracking operations.\n"); + pr_info("NFSD: Using nfsdcld client tracking operations.\n"); return 0;
err_remove: @@ -1866,7 +1866,7 @@ nfsd4_umh_cltrack_init(struct net *net) ret = nfsd4_umh_cltrack_upcall("init", NULL, grace_start, NULL); kfree(grace_start); if (!ret) - printk("NFSD: Using UMH upcall client tracking operations.\n"); + pr_info("NFSD: Using UMH upcall client tracking operations.\n"); return ret; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ricardo Ribalda ribalda@chromium.org
[ Upstream commit 34a624931b8c12b435b5009edc5897e4630107bc ]
Trivial fix.
Cc: linux-nfs@vger.kernel.org Signed-off-by: Ricardo Ribalda ribalda@chromium.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/Kconfig | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig index d6cff5fbe705b..5fa38ad9e7e3f 100644 --- a/fs/nfsd/Kconfig +++ b/fs/nfsd/Kconfig @@ -99,7 +99,7 @@ config NFSD_BLOCKLAYOUT help This option enables support for the exporting pNFS block layouts in the kernel's NFS server. The pNFS block layout enables NFS - clients to directly perform I/O to block devices accesible to both + clients to directly perform I/O to block devices accessible to both the server and the clients. See RFC 5663 for more details.
If unsure, say N. @@ -113,7 +113,7 @@ config NFSD_SCSILAYOUT help This option enables support for the exporting pNFS SCSI layouts in the kernel's NFS server. The pNFS SCSI layout enables NFS - clients to directly perform I/O to SCSI devices accesible to both + clients to directly perform I/O to SCSI devices accessible to both the server and the clients. See draft-ietf-nfsv4-scsi-layout for more details.
@@ -127,7 +127,7 @@ config NFSD_FLEXFILELAYOUT This option enables support for the exporting pNFS Flex File layouts in the kernel's NFS server. The pNFS Flex File layout enables NFS clients to directly perform I/O to NFSv3 devices - accesible to both the server and the clients. See + accessible to both the server and the clients. See draft-ietf-nfsv4-flex-files for more details.
Warning, this server implements the bare minimum functionality
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit 792a5112aa90e59c048b601c6382fe3498d75db7 ]
A count of 0 (zero) requests that all bytes from ca_src_offset through EOF be copied to the destination.
Reported-by: radchenkoy@gmail.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 2fba0808d975c..949d9cedef5d1 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1380,6 +1380,9 @@ static ssize_t _nfsd_copy_file_range(struct nfsd4_copy *copy) u64 src_pos = copy->cp_src_pos; u64 dst_pos = copy->cp_dst_pos;
+ /* See RFC 7862 p.67: */ + if (bytes_total == 0) + bytes_total = ULLONG_MAX; do { if (kthread_should_stop()) break;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit e7a833e9cc6c3b58fe94f049d2b40943cba07086 ]
Note size_t is 32-bit on a 32-bit architecture, but cp_count is defined by the protocol to be 64 bit, so we could be turning a large copy into a 0-length copy here.
Reported-by: radchenkoy@gmail.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 949d9cedef5d1..f85958f81a266 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1376,7 +1376,7 @@ static ssize_t _nfsd_copy_file_range(struct nfsd4_copy *copy) struct file *dst = copy->nf_dst->nf_file; struct file *src = copy->nf_src->nf_file; ssize_t bytes_copied = 0; - size_t bytes_total = copy->cp_count; + u64 bytes_total = copy->cp_count; u64 src_pos = copy->cp_src_pos; u64 dst_pos = copy->cp_dst_pos;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit 472d155a0631bd1a09b5c0c275a254e65605d683 ]
mountd can now monitor clients appearing and disappearing in /proc/fs/nfsd/clients, and will log these events, in liu of the logging of mount/unmount events for NFSv3.
Currently it cannot distinguish between unconfirmed clients (which might be transient and totally uninteresting) and confirmed clients.
So add a "status: " line which reports either "confirmed" or "unconfirmed", and use fsnotify to report that the info file has been modified.
This requires a bit of infrastructure to keep the dentry for the "info" file. There is no need to take a counted reference as the dentry must remain around until the client is removed.
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 19 +++++++++++++++---- fs/nfsd/nfsctl.c | 14 ++++++++------ fs/nfsd/nfsd.h | 4 +++- fs/nfsd/state.h | 4 ++++ 4 files changed, 30 insertions(+), 11 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 0afc14b3e4593..104d563636540 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -43,6 +43,7 @@ #include <linux/sunrpc/addr.h> #include <linux/jhash.h> #include <linux/string_helpers.h> +#include <linux/fsnotify.h> #include "xdr4.h" #include "xdr4cb.h" #include "vfs.h" @@ -2370,6 +2371,10 @@ static int client_info_show(struct seq_file *m, void *v) memcpy(&clid, &clp->cl_clientid, sizeof(clid)); seq_printf(m, "clientid: 0x%llx\n", clid); seq_printf(m, "address: "%pISpc"\n", (struct sockaddr *)&clp->cl_addr); + if (test_bit(NFSD4_CLIENT_CONFIRMED, &clp->cl_flags)) + seq_puts(m, "status: confirmed\n"); + else + seq_puts(m, "status: unconfirmed\n"); seq_printf(m, "name: "); seq_quote_mem(m, clp->cl_name.data, clp->cl_name.len); seq_printf(m, "\nminor version: %d\n", clp->cl_minorversion); @@ -2724,6 +2729,7 @@ static struct nfs4_client *create_client(struct xdr_netobj name, int ret; struct net *net = SVC_NET(rqstp); struct nfsd_net *nn = net_generic(net, nfsd_net_id); + struct dentry *dentries[ARRAY_SIZE(client_files)];
clp = alloc_client(name); if (clp == NULL) @@ -2743,9 +2749,11 @@ static struct nfs4_client *create_client(struct xdr_netobj name, memcpy(&clp->cl_addr, sa, sizeof(struct sockaddr_storage)); clp->cl_cb_session = NULL; clp->net = net; - clp->cl_nfsd_dentry = nfsd_client_mkdir(nn, &clp->cl_nfsdfs, - clp->cl_clientid.cl_id - nn->clientid_base, - client_files); + clp->cl_nfsd_dentry = nfsd_client_mkdir( + nn, &clp->cl_nfsdfs, + clp->cl_clientid.cl_id - nn->clientid_base, + client_files, dentries); + clp->cl_nfsd_info_dentry = dentries[0]; if (!clp->cl_nfsd_dentry) { free_client(clp); return NULL; @@ -2820,7 +2828,10 @@ move_to_confirmed(struct nfs4_client *clp) list_move(&clp->cl_idhash, &nn->conf_id_hashtbl[idhashval]); rb_erase(&clp->cl_namenode, &nn->unconf_name_tree); add_clp_to_name_tree(clp, &nn->conf_name_tree); - set_bit(NFSD4_CLIENT_CONFIRMED, &clp->cl_flags); + if (!test_and_set_bit(NFSD4_CLIENT_CONFIRMED, &clp->cl_flags) && + clp->cl_nfsd_dentry && + clp->cl_nfsd_info_dentry) + fsnotify_dentry(clp->cl_nfsd_info_dentry, FS_MODIFY); renew_client_locked(clp); }
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index fa060f91df1b8..147ed8995a79f 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -1270,7 +1270,8 @@ static void nfsdfs_remove_files(struct dentry *root) /* XXX: cut'n'paste from simple_fill_super; figure out if we could share * code instead. */ static int nfsdfs_create_files(struct dentry *root, - const struct tree_descr *files) + const struct tree_descr *files, + struct dentry **fdentries) { struct inode *dir = d_inode(root); struct inode *inode; @@ -1279,8 +1280,6 @@ static int nfsdfs_create_files(struct dentry *root,
inode_lock(dir); for (i = 0; files->name && files->name[0]; i++, files++) { - if (!files->name) - continue; dentry = d_alloc_name(root, files->name); if (!dentry) goto out; @@ -1294,6 +1293,8 @@ static int nfsdfs_create_files(struct dentry *root, inode->i_private = __get_nfsdfs_client(dir); d_add(dentry, inode); fsnotify_create(dir, dentry); + if (fdentries) + fdentries[i] = dentry; } inode_unlock(dir); return 0; @@ -1305,8 +1306,9 @@ static int nfsdfs_create_files(struct dentry *root,
/* on success, returns positive number unique to that client. */ struct dentry *nfsd_client_mkdir(struct nfsd_net *nn, - struct nfsdfs_client *ncl, u32 id, - const struct tree_descr *files) + struct nfsdfs_client *ncl, u32 id, + const struct tree_descr *files, + struct dentry **fdentries) { struct dentry *dentry; char name[11]; @@ -1317,7 +1319,7 @@ struct dentry *nfsd_client_mkdir(struct nfsd_net *nn, dentry = nfsd_mkdir(nn->nfsd_client_dir, ncl, name); if (IS_ERR(dentry)) /* XXX: tossing errors? */ return NULL; - ret = nfsdfs_create_files(dentry, files); + ret = nfsdfs_create_files(dentry, files, fdentries); if (ret) { nfsd_client_rmdir(dentry); return NULL; diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 27c1308ffc2ba..14dbfa75059d5 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -106,7 +106,9 @@ struct nfsdfs_client {
struct nfsdfs_client *get_nfsdfs_client(struct inode *); struct dentry *nfsd_client_mkdir(struct nfsd_net *nn, - struct nfsdfs_client *ncl, u32 id, const struct tree_descr *); + struct nfsdfs_client *ncl, u32 id, + const struct tree_descr *, + struct dentry **fdentries); void nfsd_client_rmdir(struct dentry *dentry);
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index 73deea3531699..54cab651ac1d0 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -371,6 +371,10 @@ struct nfs4_client {
/* debugging info directory under nfsd/clients/ : */ struct dentry *cl_nfsd_dentry; + /* 'info' file within that directory. Ref is not counted, + * but will remain valid iff cl_nfsd_dentry != NULL + */ + struct dentry *cl_nfsd_info_dentry;
/* for nfs41 callbacks */ /* We currently support a single back channel with a single slot */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 7dcfbd86adc45f6d6b37278efd22530cf80ab474 ]
Prepare svc_xprt_received() to be called from transport code instead of from generic RPC server code.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/sunrpc/svc_xprt.h | 1 + include/trace/events/sunrpc.h | 1 + net/sunrpc/svc_xprt.c | 13 +++++++++---- 3 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h index 92455e0d52445..c5278871f9e40 100644 --- a/include/linux/sunrpc/svc_xprt.h +++ b/include/linux/sunrpc/svc_xprt.h @@ -130,6 +130,7 @@ void svc_xprt_init(struct net *, struct svc_xprt_class *, struct svc_xprt *, int svc_create_xprt(struct svc_serv *, const char *, struct net *, const int, const unsigned short, int, const struct cred *); +void svc_xprt_received(struct svc_xprt *xprt); void svc_xprt_do_enqueue(struct svc_xprt *xprt); void svc_xprt_enqueue(struct svc_xprt *xprt); void svc_xprt_put(struct svc_xprt *xprt); diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h index 200978b94a0b9..657138ab1f91f 100644 --- a/include/trace/events/sunrpc.h +++ b/include/trace/events/sunrpc.h @@ -1756,6 +1756,7 @@ DECLARE_EVENT_CLASS(svc_xprt_event, ), \ TP_ARGS(xprt))
+DEFINE_SVC_XPRT_EVENT(received); DEFINE_SVC_XPRT_EVENT(no_write_space); DEFINE_SVC_XPRT_EVENT(close); DEFINE_SVC_XPRT_EVENT(detach); diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c index 06e503466c32c..d150e25b4d45e 100644 --- a/net/sunrpc/svc_xprt.c +++ b/net/sunrpc/svc_xprt.c @@ -233,21 +233,25 @@ static struct svc_xprt *__svc_xpo_create(struct svc_xprt_class *xcl, return xprt; }
-/* - * svc_xprt_received conditionally queues the transport for processing - * by another thread. The caller must hold the XPT_BUSY bit and must +/** + * svc_xprt_received - start next receiver thread + * @xprt: controlling transport + * + * The caller must hold the XPT_BUSY bit and must * not thereafter touch transport data. * * Note: XPT_DATA only gets cleared when a read-attempt finds no (or * insufficient) data. */ -static void svc_xprt_received(struct svc_xprt *xprt) +void svc_xprt_received(struct svc_xprt *xprt) { if (!test_bit(XPT_BUSY, &xprt->xpt_flags)) { WARN_ONCE(1, "xprt=0x%p already busy!", xprt); return; }
+ trace_svc_xprt_received(xprt); + /* As soon as we clear busy, the xprt could be closed and * 'put', so we need a reference to call svc_enqueue_xprt with: */ @@ -257,6 +261,7 @@ static void svc_xprt_received(struct svc_xprt *xprt) xprt->xpt_server->sv_ops->svo_enqueue_xprt(xprt); svc_xprt_put(xprt); } +EXPORT_SYMBOL_GPL(svc_xprt_received);
void svc_add_new_perm_xprt(struct svc_serv *serv, struct svc_xprt *new) {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gustavo A. R. Silva gustavoars@kernel.org
[ Upstream commit c0a744dcaa29e9537e8607ae9c965ad936124a4d ]
There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2].
Use an anonymous union with a couple of anonymous structs in order to keep userspace unchanged:
$ pahole -C nfs_fhbase_new fs/nfsd/nfsfh.o struct nfs_fhbase_new { union { struct { __u8 fb_version_aux; /* 0 1 */ __u8 fb_auth_type_aux; /* 1 1 */ __u8 fb_fsid_type_aux; /* 2 1 */ __u8 fb_fileid_type_aux; /* 3 1 */ __u32 fb_auth[1]; /* 4 4 */ }; /* 0 8 */ struct { __u8 fb_version; /* 0 1 */ __u8 fb_auth_type; /* 1 1 */ __u8 fb_fsid_type; /* 2 1 */ __u8 fb_fileid_type; /* 3 1 */ __u32 fb_auth_flex[0]; /* 4 0 */ }; /* 0 4 */ }; /* 0 8 */
/* size: 8, cachelines: 1, members: 1 */ /* last cacheline: 8 bytes */ };
Also, this helps with the ongoing efforts to enable -Warray-bounds by fixing the following warnings:
fs/nfsd/nfsfh.c: In function ‘nfsd_set_fh_dentry’: fs/nfsd/nfsfh.c:191:41: warning: array subscript 1 is above array bounds of ‘__u32[1]’ {aka ‘unsigned int[1]’} [-Warray-bounds] 191 | ntohl((__force __be32)fh->fh_fsid[1]))); | ~~~~~~~~~~~^~~ ./include/linux/kdev_t.h:12:46: note: in definition of macro ‘MKDEV’ 12 | #define MKDEV(ma,mi) (((ma) << MINORBITS) | (mi)) | ^~ ./include/uapi/linux/byteorder/little_endian.h:40:26: note: in expansion of macro ‘__swab32’ 40 | #define __be32_to_cpu(x) __swab32((__force __u32)(__be32)(x)) | ^~~~~~~~ ./include/linux/byteorder/generic.h:136:21: note: in expansion of macro ‘__be32_to_cpu’ 136 | #define ___ntohl(x) __be32_to_cpu(x) | ^~~~~~~~~~~~~ ./include/linux/byteorder/generic.h:140:18: note: in expansion of macro ‘___ntohl’ 140 | #define ntohl(x) ___ntohl(x) | ^~~~~~~~ fs/nfsd/nfsfh.c:191:8: note: in expansion of macro ‘ntohl’ 191 | ntohl((__force __be32)fh->fh_fsid[1]))); | ^~~~~ fs/nfsd/nfsfh.c:192:32: warning: array subscript 2 is above array bounds of ‘__u32[1]’ {aka ‘unsigned int[1]’} [-Warray-bounds] 192 | fh->fh_fsid[1] = fh->fh_fsid[2]; | ~~~~~~~~~~~^~~ fs/nfsd/nfsfh.c:192:15: warning: array subscript 1 is above array bounds of ‘__u32[1]’ {aka ‘unsigned int[1]’} [-Warray-bounds] 192 | fh->fh_fsid[1] = fh->fh_fsid[2]; | ~~~~~~~~~~~^~~
[1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.10/process/deprecated.html#zero-length-an...
Link: https://github.com/KSPP/linux/issues/79 Link: https://github.com/KSPP/linux/issues/109 Signed-off-by: Gustavo A. R. Silva gustavoars@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/uapi/linux/nfsd/nfsfh.h | 27 +++++++++++++++++++-------- 1 file changed, 19 insertions(+), 8 deletions(-)
diff --git a/include/uapi/linux/nfsd/nfsfh.h b/include/uapi/linux/nfsd/nfsfh.h index ff0ca88b1c8f6..427294dd56a1b 100644 --- a/include/uapi/linux/nfsd/nfsfh.h +++ b/include/uapi/linux/nfsd/nfsfh.h @@ -64,13 +64,24 @@ struct nfs_fhbase_old { * in include/linux/exportfs.h for currently registered values. */ struct nfs_fhbase_new { - __u8 fb_version; /* == 1, even => nfs_fhbase_old */ - __u8 fb_auth_type; - __u8 fb_fsid_type; - __u8 fb_fileid_type; - __u32 fb_auth[1]; -/* __u32 fb_fsid[0]; floating */ -/* __u32 fb_fileid[0]; floating */ + union { + struct { + __u8 fb_version_aux; /* == 1, even => nfs_fhbase_old */ + __u8 fb_auth_type_aux; + __u8 fb_fsid_type_aux; + __u8 fb_fileid_type_aux; + __u32 fb_auth[1]; + /* __u32 fb_fsid[0]; floating */ + /* __u32 fb_fileid[0]; floating */ + }; + struct { + __u8 fb_version; /* == 1, even => nfs_fhbase_old */ + __u8 fb_auth_type; + __u8 fb_fsid_type; + __u8 fb_fileid_type; + __u32 fb_auth_flex[]; /* flexible-array member */ + }; + }; };
struct knfsd_fh { @@ -97,7 +108,7 @@ struct knfsd_fh { #define fh_fsid_type fh_base.fh_new.fb_fsid_type #define fh_auth_type fh_base.fh_new.fb_auth_type #define fh_fileid_type fh_base.fh_new.fb_fileid_type -#define fh_fsid fh_base.fh_new.fb_auth +#define fh_fsid fh_base.fh_new.fb_auth_flex
/* Do not use, provided for userspace compatiblity. */ #define fh_auth fh_base.fh_new.fb_auth
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Guobin Huang huangguobin4@huawei.com
[ Upstream commit b73ac6808b0f7994a05ebc38571e2e9eaf98a0f4 ]
spinlock can be initialized automatically with DEFINE_SPINLOCK() rather than explicitly calling spin_lock_init().
Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: Guobin Huang huangguobin4@huawei.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfssvc.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 0666ef4b87b7a..79bc75d415226 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -84,7 +84,7 @@ DEFINE_MUTEX(nfsd_mutex); * version 4.1 DRC caches. * nfsd_drc_pages_used tracks the current version 4.1 DRC memory usage. */ -spinlock_t nfsd_drc_lock; +DEFINE_SPINLOCK(nfsd_drc_lock); unsigned long nfsd_drc_max_mem; unsigned long nfsd_drc_mem_used;
@@ -563,7 +563,6 @@ static void set_max_drc(void) nfsd_drc_max_mem = (nr_free_buffer_pages() >> NFSD_DRC_SIZE_SHIFT) * PAGE_SIZE; nfsd_drc_mem_used = 0; - spin_lock_init(&nfsd_drc_lock); dprintk("%s nfsd_drc_max_mem %lu \n", __func__, nfsd_drc_max_mem); }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit 6f73171e192366ff7c98af9fb50615ef9615f8a7 ]
Current code has an assumtion that fsnotify_notify_queue_is_empty() is called to verify that queue is not empty before trying to peek or remove an event from queue.
Remove this assumption by moving the fsnotify_notify_queue_is_empty() into the functions, allow them to return NULL value and check return value by all callers.
This is a prep patch for multi event queues.
Link: https://lore.kernel.org/r/20210304104826.3993892-2-amir73il@gmail.com Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify_user.c | 26 +++++++++++------- fs/notify/inotify/inotify_user.c | 5 ++-- fs/notify/notification.c | 42 ++++++++++++++---------------- include/linux/fsnotify_backend.h | 8 +++++- 4 files changed, 44 insertions(+), 37 deletions(-)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 829ead2792dfb..30651e8d1a6d5 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -100,24 +100,30 @@ static struct fanotify_event *get_one_event(struct fsnotify_group *group, { size_t event_size = FAN_EVENT_METADATA_LEN; struct fanotify_event *event = NULL; + struct fsnotify_event *fsn_event; unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS);
pr_debug("%s: group=%p count=%zd\n", __func__, group, count);
spin_lock(&group->notification_lock); - if (fsnotify_notify_queue_is_empty(group)) + fsn_event = fsnotify_peek_first_event(group); + if (!fsn_event) goto out;
- if (fid_mode) { - event_size += fanotify_event_info_len(fid_mode, - FANOTIFY_E(fsnotify_peek_first_event(group))); - } + event = FANOTIFY_E(fsn_event); + if (fid_mode) + event_size += fanotify_event_info_len(fid_mode, event);
if (event_size > count) { event = ERR_PTR(-EINVAL); goto out; } - event = FANOTIFY_E(fsnotify_remove_first_event(group)); + + /* + * Held the notification_lock the whole time, so this is the + * same event we peeked above. + */ + fsnotify_remove_first_event(group); if (fanotify_is_perm_event(event->mask)) FANOTIFY_PERM(event)->state = FAN_EVENT_REPORTED; out: @@ -573,6 +579,7 @@ static ssize_t fanotify_write(struct file *file, const char __user *buf, size_t static int fanotify_release(struct inode *ignored, struct file *file) { struct fsnotify_group *group = file->private_data; + struct fsnotify_event *fsn_event;
/* * Stop new events from arriving in the notification queue. since @@ -601,13 +608,12 @@ static int fanotify_release(struct inode *ignored, struct file *file) * dequeue them and set the response. They will be freed once the * response is consumed and fanotify_get_response() returns. */ - while (!fsnotify_notify_queue_is_empty(group)) { - struct fanotify_event *event; + while ((fsn_event = fsnotify_remove_first_event(group))) { + struct fanotify_event *event = FANOTIFY_E(fsn_event);
- event = FANOTIFY_E(fsnotify_remove_first_event(group)); if (!(event->mask & FANOTIFY_PERM_EVENTS)) { spin_unlock(&group->notification_lock); - fsnotify_destroy_event(group, &event->fse); + fsnotify_destroy_event(group, fsn_event); } else { finish_permission_event(group, FANOTIFY_PERM(event), FAN_ALLOW); diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c index 82fc0cf86a7c3..c2018983832e5 100644 --- a/fs/notify/inotify/inotify_user.c +++ b/fs/notify/inotify/inotify_user.c @@ -146,10 +146,9 @@ static struct fsnotify_event *get_one_event(struct fsnotify_group *group, size_t event_size = sizeof(struct inotify_event); struct fsnotify_event *event;
- if (fsnotify_notify_queue_is_empty(group)) - return NULL; - event = fsnotify_peek_first_event(group); + if (!event) + return NULL;
pr_debug("%s: group=%p event=%p\n", __func__, group, event);
diff --git a/fs/notify/notification.c b/fs/notify/notification.c index 75d79d6d3ef09..001cfe7d2e4e7 100644 --- a/fs/notify/notification.c +++ b/fs/notify/notification.c @@ -47,13 +47,6 @@ u32 fsnotify_get_cookie(void) } EXPORT_SYMBOL_GPL(fsnotify_get_cookie);
-/* return true if the notify queue is empty, false otherwise */ -bool fsnotify_notify_queue_is_empty(struct fsnotify_group *group) -{ - assert_spin_locked(&group->notification_lock); - return list_empty(&group->notification_list) ? true : false; -} - void fsnotify_destroy_event(struct fsnotify_group *group, struct fsnotify_event *event) { @@ -141,33 +134,36 @@ void fsnotify_remove_queued_event(struct fsnotify_group *group, }
/* - * Remove and return the first event from the notification list. It is the - * responsibility of the caller to destroy the obtained event + * Return the first event on the notification list without removing it. + * Returns NULL if the list is empty. */ -struct fsnotify_event *fsnotify_remove_first_event(struct fsnotify_group *group) +struct fsnotify_event *fsnotify_peek_first_event(struct fsnotify_group *group) { - struct fsnotify_event *event; - assert_spin_locked(&group->notification_lock);
- pr_debug("%s: group=%p\n", __func__, group); + if (fsnotify_notify_queue_is_empty(group)) + return NULL;
- event = list_first_entry(&group->notification_list, - struct fsnotify_event, list); - fsnotify_remove_queued_event(group, event); - return event; + return list_first_entry(&group->notification_list, + struct fsnotify_event, list); }
/* - * This will not remove the event, that must be done with - * fsnotify_remove_first_event() + * Remove and return the first event from the notification list. It is the + * responsibility of the caller to destroy the obtained event */ -struct fsnotify_event *fsnotify_peek_first_event(struct fsnotify_group *group) +struct fsnotify_event *fsnotify_remove_first_event(struct fsnotify_group *group) { - assert_spin_locked(&group->notification_lock); + struct fsnotify_event *event = fsnotify_peek_first_event(group);
- return list_first_entry(&group->notification_list, - struct fsnotify_event, list); + if (!event) + return NULL; + + pr_debug("%s: group=%p event=%p\n", __func__, group, event); + + fsnotify_remove_queued_event(group, event); + + return event; }
/* diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index e5409b83e7313..7eb979bfc1413 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -495,7 +495,13 @@ static inline void fsnotify_queue_overflow(struct fsnotify_group *group) fsnotify_add_event(group, group->overflow_event, NULL); }
-/* true if the group notification queue is empty */ +static inline bool fsnotify_notify_queue_is_empty(struct fsnotify_group *group) +{ + assert_spin_locked(&group->notification_lock); + + return list_empty(&group->notification_list); +} + extern bool fsnotify_notify_queue_is_empty(struct fsnotify_group *group); /* return, but do not dequeue the first event on the notification queue */ extern struct fsnotify_event *fsnotify_peek_first_event(struct fsnotify_group *group);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
Temporarily revert commit ad3ea16746cc ("fanotify: limit number of event merge attempts") to enable subsequent upstream commits to apply and build cleanly.
Stable-dep-of: 8988f11abb82 ("fanotify: reduce event objectid to 29-bit hash") Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.c | 6 ------ 1 file changed, 6 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index c3af99e94f1d1..1192c99536200 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -129,15 +129,11 @@ static bool fanotify_should_merge(struct fsnotify_event *old_fsn, return false; }
-/* Limit event merges to limit CPU overhead per event */ -#define FANOTIFY_MAX_MERGE_EVENTS 128 - /* and the list better be locked by something too! */ static int fanotify_merge(struct list_head *list, struct fsnotify_event *event) { struct fsnotify_event *test_event; struct fanotify_event *new; - int i = 0;
pr_debug("%s: list=%p event=%p\n", __func__, list, event); new = FANOTIFY_E(event); @@ -151,8 +147,6 @@ static int fanotify_merge(struct list_head *list, struct fsnotify_event *event) return 0;
list_for_each_entry_reverse(test_event, list, list) { - if (++i > FANOTIFY_MAX_MERGE_EVENTS) - break; if (fanotify_should_merge(test_event, event)) { FANOTIFY_E(test_event)->mask |= new->mask; return 1;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit 8988f11abb820bacfcc53d498370bfb30f792ec4 ]
objectid is only used by fanotify backend and it is just an optimization for event merge before comparing all fields in event.
Move the objectid member from common struct fsnotify_event into struct fanotify_event and reduce it to 29-bit hash to cram it together with the 3-bit event type.
Events of different types are never merged, so the combination of event type and hash form a 32-bit key for fast compare of events.
This reduces the size of events by one pointer and paves the way for adding hashed queue support for fanotify.
Link: https://lore.kernel.org/r/20210304104826.3993892-3-amir73il@gmail.com Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.c | 25 ++++++++++++------------- fs/notify/fanotify/fanotify.h | 16 +++++++++++++--- fs/notify/inotify/inotify_fsnotify.c | 2 +- fs/notify/inotify/inotify_user.c | 2 +- include/linux/fsnotify_backend.h | 5 +---- 5 files changed, 28 insertions(+), 22 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 1192c99536200..8a2bb6954e02c 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -88,16 +88,12 @@ static bool fanotify_name_event_equal(struct fanotify_name_event *fne1, return fanotify_info_equal(info1, info2); }
-static bool fanotify_should_merge(struct fsnotify_event *old_fsn, - struct fsnotify_event *new_fsn) +static bool fanotify_should_merge(struct fanotify_event *old, + struct fanotify_event *new) { - struct fanotify_event *old, *new; + pr_debug("%s: old=%p new=%p\n", __func__, old, new);
- pr_debug("%s: old=%p new=%p\n", __func__, old_fsn, new_fsn); - old = FANOTIFY_E(old_fsn); - new = FANOTIFY_E(new_fsn); - - if (old_fsn->objectid != new_fsn->objectid || + if (old->hash != new->hash || old->type != new->type || old->pid != new->pid) return false;
@@ -133,10 +129,9 @@ static bool fanotify_should_merge(struct fsnotify_event *old_fsn, static int fanotify_merge(struct list_head *list, struct fsnotify_event *event) { struct fsnotify_event *test_event; - struct fanotify_event *new; + struct fanotify_event *old, *new = FANOTIFY_E(event);
pr_debug("%s: list=%p event=%p\n", __func__, list, event); - new = FANOTIFY_E(event);
/* * Don't merge a permission event with any other event so that we know @@ -147,8 +142,9 @@ static int fanotify_merge(struct list_head *list, struct fsnotify_event *event) return 0;
list_for_each_entry_reverse(test_event, list, list) { - if (fanotify_should_merge(test_event, event)) { - FANOTIFY_E(test_event)->mask |= new->mask; + old = FANOTIFY_E(test_event); + if (fanotify_should_merge(old, new)) { + old->mask |= new->mask; return 1; } } @@ -533,6 +529,7 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, struct mem_cgroup *old_memcg; struct inode *child = NULL; bool name_event = false; + unsigned int hash = 0;
if ((fid_mode & FAN_REPORT_DIR_FID) && dirid) { /* @@ -600,8 +597,10 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, * Use the victim inode instead of the watching inode as the id for * event queue, so event reported on parent is merged with event * reported on child when both directory and child watches exist. + * Hash object id for queue merge. */ - fanotify_init_event(event, (unsigned long)id, mask); + hash = hash_ptr(id, FANOTIFY_EVENT_HASH_BITS); + fanotify_init_event(event, hash, mask); if (FAN_GROUP_FLAG(group, FAN_REPORT_TID)) event->pid = get_pid(task_pid(current)); else diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index 896c819a17863..d531f0cfa46f2 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -135,19 +135,29 @@ enum fanotify_event_type { FANOTIFY_EVENT_TYPE_PATH, FANOTIFY_EVENT_TYPE_PATH_PERM, FANOTIFY_EVENT_TYPE_OVERFLOW, /* struct fanotify_event */ + __FANOTIFY_EVENT_TYPE_NUM };
+#define FANOTIFY_EVENT_TYPE_BITS \ + (ilog2(__FANOTIFY_EVENT_TYPE_NUM - 1) + 1) +#define FANOTIFY_EVENT_HASH_BITS \ + (32 - FANOTIFY_EVENT_TYPE_BITS) + struct fanotify_event { struct fsnotify_event fse; u32 mask; - enum fanotify_event_type type; + struct { + unsigned int type : FANOTIFY_EVENT_TYPE_BITS; + unsigned int hash : FANOTIFY_EVENT_HASH_BITS; + }; struct pid *pid; };
static inline void fanotify_init_event(struct fanotify_event *event, - unsigned long id, u32 mask) + unsigned int hash, u32 mask) { - fsnotify_init_event(&event->fse, id); + fsnotify_init_event(&event->fse); + event->hash = hash; event->mask = mask; event->pid = NULL; } diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c index 66991c7fef9e2..e2b124c0081dc 100644 --- a/fs/notify/inotify/inotify_fsnotify.c +++ b/fs/notify/inotify/inotify_fsnotify.c @@ -114,7 +114,7 @@ int inotify_handle_inode_event(struct fsnotify_mark *inode_mark, u32 mask, mask &= ~IN_ISDIR;
fsn_event = &event->fse; - fsnotify_init_event(fsn_event, 0); + fsnotify_init_event(fsn_event); event->mask = mask; event->wd = wd; event->sync_cookie = cookie; diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c index c2018983832e5..62cd91bc00b83 100644 --- a/fs/notify/inotify/inotify_user.c +++ b/fs/notify/inotify/inotify_user.c @@ -641,7 +641,7 @@ static struct fsnotify_group *inotify_new_group(unsigned int max_events) return ERR_PTR(-ENOMEM); } group->overflow_event = &oevent->fse; - fsnotify_init_event(group->overflow_event, 0); + fsnotify_init_event(group->overflow_event); oevent->mask = FS_Q_OVERFLOW; oevent->wd = -1; oevent->sync_cookie = 0; diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 7eb979bfc1413..fc98f9f88d126 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -167,7 +167,6 @@ struct fsnotify_ops { */ struct fsnotify_event { struct list_head list; - unsigned long objectid; /* identifier for queue merges */ };
/* @@ -582,11 +581,9 @@ extern void fsnotify_put_mark(struct fsnotify_mark *mark); extern void fsnotify_finish_user_wait(struct fsnotify_iter_info *iter_info); extern bool fsnotify_prepare_user_wait(struct fsnotify_iter_info *iter_info);
-static inline void fsnotify_init_event(struct fsnotify_event *event, - unsigned long objectid) +static inline void fsnotify_init_event(struct fsnotify_event *event) { INIT_LIST_HEAD(&event->list); - event->objectid = objectid; }
#else
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit 7e3e5c6943994943eb76cab2d3a1806bc10b9045 ]
Improve the merge key hash by mixing more values relevant for merge.
For example, all FAN_CREATE name events in the same dir used to have the same merge key based on the dir inode. With this change the created file name is mixed into the merge key.
The object id that was used as merge key is redundant to the event info so it is no longer mixed into the hash.
Permission events are not hashed, so no need to hash their info.
Link: https://lore.kernel.org/r/20210304104826.3993892-4-amir73il@gmail.com Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.c | 87 ++++++++++++++++++++++++----------- fs/notify/fanotify/fanotify.h | 5 ++ 2 files changed, 66 insertions(+), 26 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 8a2bb6954e02c..43a606f153702 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -14,6 +14,7 @@ #include <linux/audit.h> #include <linux/sched/mm.h> #include <linux/statfs.h> +#include <linux/stringhash.h>
#include "fanotify.h"
@@ -22,12 +23,24 @@ static bool fanotify_path_equal(struct path *p1, struct path *p2) return p1->mnt == p2->mnt && p1->dentry == p2->dentry; }
+static unsigned int fanotify_hash_path(const struct path *path) +{ + return hash_ptr(path->dentry, FANOTIFY_EVENT_HASH_BITS) ^ + hash_ptr(path->mnt, FANOTIFY_EVENT_HASH_BITS); +} + static inline bool fanotify_fsid_equal(__kernel_fsid_t *fsid1, __kernel_fsid_t *fsid2) { return fsid1->val[0] == fsid2->val[0] && fsid1->val[1] == fsid2->val[1]; }
+static unsigned int fanotify_hash_fsid(__kernel_fsid_t *fsid) +{ + return hash_32(fsid->val[0], FANOTIFY_EVENT_HASH_BITS) ^ + hash_32(fsid->val[1], FANOTIFY_EVENT_HASH_BITS); +} + static bool fanotify_fh_equal(struct fanotify_fh *fh1, struct fanotify_fh *fh2) { @@ -38,6 +51,16 @@ static bool fanotify_fh_equal(struct fanotify_fh *fh1, !memcmp(fanotify_fh_buf(fh1), fanotify_fh_buf(fh2), fh1->len); }
+static unsigned int fanotify_hash_fh(struct fanotify_fh *fh) +{ + long salt = (long)fh->type | (long)fh->len << 8; + + /* + * full_name_hash() works long by long, so it handles fh buf optimally. + */ + return full_name_hash((void *)salt, fanotify_fh_buf(fh), fh->len); +} + static bool fanotify_fid_event_equal(struct fanotify_fid_event *ffe1, struct fanotify_fid_event *ffe2) { @@ -325,7 +348,8 @@ static int fanotify_encode_fh_len(struct inode *inode) * Return 0 on failure to encode. */ static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode, - unsigned int fh_len, gfp_t gfp) + unsigned int fh_len, unsigned int *hash, + gfp_t gfp) { int dwords, type = 0; char *ext_buf = NULL; @@ -368,6 +392,9 @@ static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode, fh->type = type; fh->len = fh_len;
+ /* Mix fh into event merge key */ + *hash ^= fanotify_hash_fh(fh); + return FANOTIFY_FH_HDR_LEN + fh_len;
out_err: @@ -421,6 +448,7 @@ static struct inode *fanotify_dfid_inode(u32 event_mask, const void *data, }
static struct fanotify_event *fanotify_alloc_path_event(const struct path *path, + unsigned int *hash, gfp_t gfp) { struct fanotify_path_event *pevent; @@ -431,6 +459,7 @@ static struct fanotify_event *fanotify_alloc_path_event(const struct path *path,
pevent->fae.type = FANOTIFY_EVENT_TYPE_PATH; pevent->path = *path; + *hash ^= fanotify_hash_path(path); path_get(path);
return &pevent->fae; @@ -456,6 +485,7 @@ static struct fanotify_event *fanotify_alloc_perm_event(const struct path *path,
static struct fanotify_event *fanotify_alloc_fid_event(struct inode *id, __kernel_fsid_t *fsid, + unsigned int *hash, gfp_t gfp) { struct fanotify_fid_event *ffe; @@ -466,16 +496,18 @@ static struct fanotify_event *fanotify_alloc_fid_event(struct inode *id,
ffe->fae.type = FANOTIFY_EVENT_TYPE_FID; ffe->fsid = *fsid; + *hash ^= fanotify_hash_fsid(fsid); fanotify_encode_fh(&ffe->object_fh, id, fanotify_encode_fh_len(id), - gfp); + hash, gfp);
return &ffe->fae; }
static struct fanotify_event *fanotify_alloc_name_event(struct inode *id, __kernel_fsid_t *fsid, - const struct qstr *file_name, + const struct qstr *name, struct inode *child, + unsigned int *hash, gfp_t gfp) { struct fanotify_name_event *fne; @@ -488,24 +520,30 @@ static struct fanotify_event *fanotify_alloc_name_event(struct inode *id, size = sizeof(*fne) + FANOTIFY_FH_HDR_LEN + dir_fh_len; if (child_fh_len) size += FANOTIFY_FH_HDR_LEN + child_fh_len; - if (file_name) - size += file_name->len + 1; + if (name) + size += name->len + 1; fne = kmalloc(size, gfp); if (!fne) return NULL;
fne->fae.type = FANOTIFY_EVENT_TYPE_FID_NAME; fne->fsid = *fsid; + *hash ^= fanotify_hash_fsid(fsid); info = &fne->info; fanotify_info_init(info); dfh = fanotify_info_dir_fh(info); - info->dir_fh_totlen = fanotify_encode_fh(dfh, id, dir_fh_len, 0); + info->dir_fh_totlen = fanotify_encode_fh(dfh, id, dir_fh_len, hash, 0); if (child_fh_len) { ffh = fanotify_info_file_fh(info); - info->file_fh_totlen = fanotify_encode_fh(ffh, child, child_fh_len, 0); + info->file_fh_totlen = fanotify_encode_fh(ffh, child, + child_fh_len, hash, 0); + } + if (name) { + long salt = name->len; + + fanotify_info_copy_name(info, name); + *hash ^= full_name_hash((void *)salt, name->name, name->len); } - if (file_name) - fanotify_info_copy_name(info, file_name);
pr_debug("%s: ino=%lu size=%u dir_fh_len=%u child_fh_len=%u name_len=%u name='%.*s'\n", __func__, id->i_ino, size, dir_fh_len, child_fh_len, @@ -530,6 +568,8 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, struct inode *child = NULL; bool name_event = false; unsigned int hash = 0; + bool ondir = mask & FAN_ONDIR; + struct pid *pid;
if ((fid_mode & FAN_REPORT_DIR_FID) && dirid) { /* @@ -537,8 +577,7 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, * report the child fid for events reported on a non-dir child * in addition to reporting the parent fid and maybe child name. */ - if ((fid_mode & FAN_REPORT_FID) && - id != dirid && !(mask & FAN_ONDIR)) + if ((fid_mode & FAN_REPORT_FID) && id != dirid && !ondir) child = id;
id = dirid; @@ -559,8 +598,7 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, if (!(fid_mode & FAN_REPORT_NAME)) { name_event = !!child; file_name = NULL; - } else if ((mask & ALL_FSNOTIFY_DIRENT_EVENTS) || - !(mask & FAN_ONDIR)) { + } else if ((mask & ALL_FSNOTIFY_DIRENT_EVENTS) || !ondir) { name_event = true; } } @@ -583,28 +621,25 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, event = fanotify_alloc_perm_event(path, gfp); } else if (name_event && (file_name || child)) { event = fanotify_alloc_name_event(id, fsid, file_name, child, - gfp); + &hash, gfp); } else if (fid_mode) { - event = fanotify_alloc_fid_event(id, fsid, gfp); + event = fanotify_alloc_fid_event(id, fsid, &hash, gfp); } else { - event = fanotify_alloc_path_event(path, gfp); + event = fanotify_alloc_path_event(path, &hash, gfp); }
if (!event) goto out;
- /* - * Use the victim inode instead of the watching inode as the id for - * event queue, so event reported on parent is merged with event - * reported on child when both directory and child watches exist. - * Hash object id for queue merge. - */ - hash = hash_ptr(id, FANOTIFY_EVENT_HASH_BITS); - fanotify_init_event(event, hash, mask); if (FAN_GROUP_FLAG(group, FAN_REPORT_TID)) - event->pid = get_pid(task_pid(current)); + pid = get_pid(task_pid(current)); else - event->pid = get_pid(task_tgid(current)); + pid = get_pid(task_tgid(current)); + + /* Mix event info, FAN_ONDIR flag and pid into event merge key */ + hash ^= hash_long((unsigned long)pid | ondir, FANOTIFY_EVENT_HASH_BITS); + fanotify_init_event(event, hash, mask); + event->pid = pid;
out: set_active_memcg(old_memcg); diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index d531f0cfa46f2..9871f76cd9c2c 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -115,6 +115,11 @@ static inline void fanotify_info_init(struct fanotify_info *info) info->name_len = 0; }
+static inline unsigned int fanotify_info_len(struct fanotify_info *info) +{ + return info->dir_fh_totlen + info->file_fh_totlen + info->name_len; +} + static inline void fanotify_info_copy_name(struct fanotify_info *info, const struct qstr *name) {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit 94e00d28a680dff18805ca472b191364347d2234 ]
In order to improve event merge performance, hash events in a 128 size hash table by the event merge key.
The fanotify_event size grows by two pointers, but we just reduced its size by removing the objectid member, so overall its size is increased by one pointer.
Permission events and overflow event are not merged so they are also not hashed.
Link: https://lore.kernel.org/r/20210304104826.3993892-5-amir73il@gmail.com Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.c | 40 +++++++++++++++++++++++----- fs/notify/fanotify/fanotify.h | 25 +++++++++++++++++ fs/notify/fanotify/fanotify_user.c | 39 +++++++++++++++++++++++++++ fs/notify/inotify/inotify_fsnotify.c | 7 ++--- fs/notify/notification.c | 22 ++++++++++----- include/linux/fsnotify_backend.h | 10 ++++--- 6 files changed, 123 insertions(+), 20 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 43a606f153702..50b3abc062156 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -149,12 +149,15 @@ static bool fanotify_should_merge(struct fanotify_event *old, }
/* and the list better be locked by something too! */ -static int fanotify_merge(struct list_head *list, struct fsnotify_event *event) +static int fanotify_merge(struct fsnotify_group *group, + struct fsnotify_event *event) { - struct fsnotify_event *test_event; struct fanotify_event *old, *new = FANOTIFY_E(event); + unsigned int bucket = fanotify_event_hash_bucket(group, new); + struct hlist_head *hlist = &group->fanotify_data.merge_hash[bucket];
- pr_debug("%s: list=%p event=%p\n", __func__, list, event); + pr_debug("%s: group=%p event=%p bucket=%u\n", __func__, + group, event, bucket);
/* * Don't merge a permission event with any other event so that we know @@ -164,8 +167,7 @@ static int fanotify_merge(struct list_head *list, struct fsnotify_event *event) if (fanotify_is_perm_event(new->mask)) return 0;
- list_for_each_entry_reverse(test_event, list, list) { - old = FANOTIFY_E(test_event); + hlist_for_each_entry(old, hlist, merge_list) { if (fanotify_should_merge(old, new)) { old->mask |= new->mask; return 1; @@ -203,8 +205,11 @@ static int fanotify_get_response(struct fsnotify_group *group, return ret; } /* Event not yet reported? Just remove it. */ - if (event->state == FAN_EVENT_INIT) + if (event->state == FAN_EVENT_INIT) { fsnotify_remove_queued_event(group, &event->fae.fse); + /* Permission events are not supposed to be hashed */ + WARN_ON_ONCE(!hlist_unhashed(&event->fae.merge_list)); + } /* * Event may be also answered in case signal delivery raced * with wakeup. In that case we have nothing to do besides @@ -679,6 +684,24 @@ static __kernel_fsid_t fanotify_get_fsid(struct fsnotify_iter_info *iter_info) return fsid; }
+/* + * Add an event to hash table for faster merge. + */ +static void fanotify_insert_event(struct fsnotify_group *group, + struct fsnotify_event *fsn_event) +{ + struct fanotify_event *event = FANOTIFY_E(fsn_event); + unsigned int bucket = fanotify_event_hash_bucket(group, event); + struct hlist_head *hlist = &group->fanotify_data.merge_hash[bucket]; + + assert_spin_locked(&group->notification_lock); + + pr_debug("%s: group=%p event=%p bucket=%u\n", __func__, + group, event, bucket); + + hlist_add_head(&event->merge_list, hlist); +} + static int fanotify_handle_event(struct fsnotify_group *group, u32 mask, const void *data, int data_type, struct inode *dir, @@ -749,7 +772,9 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask, }
fsn_event = &event->fse; - ret = fsnotify_add_event(group, fsn_event, fanotify_merge); + ret = fsnotify_add_event(group, fsn_event, fanotify_merge, + fanotify_is_hashed_event(mask) ? + fanotify_insert_event : NULL); if (ret) { /* Permission events shouldn't be merged */ BUG_ON(ret == 1 && mask & FANOTIFY_PERM_EVENTS); @@ -772,6 +797,7 @@ static void fanotify_free_group_priv(struct fsnotify_group *group) { struct user_struct *user;
+ kfree(group->fanotify_data.merge_hash); user = group->fanotify_data.user; atomic_dec(&user->fanotify_listeners); free_uid(user); diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index 9871f76cd9c2c..4a5e555dc3d25 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -3,6 +3,7 @@ #include <linux/path.h> #include <linux/slab.h> #include <linux/exportfs.h> +#include <linux/hashtable.h>
extern struct kmem_cache *fanotify_mark_cache; extern struct kmem_cache *fanotify_fid_event_cachep; @@ -150,6 +151,7 @@ enum fanotify_event_type {
struct fanotify_event { struct fsnotify_event fse; + struct hlist_node merge_list; /* List for hashed merge */ u32 mask; struct { unsigned int type : FANOTIFY_EVENT_TYPE_BITS; @@ -162,6 +164,7 @@ static inline void fanotify_init_event(struct fanotify_event *event, unsigned int hash, u32 mask) { fsnotify_init_event(&event->fse); + INIT_HLIST_NODE(&event->merge_list); event->hash = hash; event->mask = mask; event->pid = NULL; @@ -299,3 +302,25 @@ static inline struct path *fanotify_event_path(struct fanotify_event *event) else return NULL; } + +/* + * Use 128 size hash table to speed up events merge. + */ +#define FANOTIFY_HTABLE_BITS (7) +#define FANOTIFY_HTABLE_SIZE (1 << FANOTIFY_HTABLE_BITS) +#define FANOTIFY_HTABLE_MASK (FANOTIFY_HTABLE_SIZE - 1) + +/* + * Permission events and overflow event do not get merged - don't hash them. + */ +static inline bool fanotify_is_hashed_event(u32 mask) +{ + return !fanotify_is_perm_event(mask) && !(mask & FS_Q_OVERFLOW); +} + +static inline unsigned int fanotify_event_hash_bucket( + struct fsnotify_group *group, + struct fanotify_event *event) +{ + return event->hash & FANOTIFY_HTABLE_MASK; +} diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 30651e8d1a6d5..1456470d0c4c6 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -89,6 +89,23 @@ static int fanotify_event_info_len(unsigned int fid_mode, return info_len; }
+/* + * Remove an hashed event from merge hash table. + */ +static void fanotify_unhash_event(struct fsnotify_group *group, + struct fanotify_event *event) +{ + assert_spin_locked(&group->notification_lock); + + pr_debug("%s: group=%p event=%p bucket=%u\n", __func__, + group, event, fanotify_event_hash_bucket(group, event)); + + if (WARN_ON_ONCE(hlist_unhashed(&event->merge_list))) + return; + + hlist_del_init(&event->merge_list); +} + /* * Get an fanotify notification event if one exists and is small * enough to fit in "count". Return an error pointer if the count @@ -126,6 +143,8 @@ static struct fanotify_event *get_one_event(struct fsnotify_group *group, fsnotify_remove_first_event(group); if (fanotify_is_perm_event(event->mask)) FANOTIFY_PERM(event)->state = FAN_EVENT_REPORTED; + if (fanotify_is_hashed_event(event->mask)) + fanotify_unhash_event(group, event); out: spin_unlock(&group->notification_lock); return event; @@ -925,6 +944,20 @@ static struct fsnotify_event *fanotify_alloc_overflow_event(void) return &oevent->fse; }
+static struct hlist_head *fanotify_alloc_merge_hash(void) +{ + struct hlist_head *hash; + + hash = kmalloc(sizeof(struct hlist_head) << FANOTIFY_HTABLE_BITS, + GFP_KERNEL_ACCOUNT); + if (!hash) + return NULL; + + __hash_init(hash, FANOTIFY_HTABLE_SIZE); + + return hash; +} + /* fanotify syscalls */ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) { @@ -993,6 +1026,12 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) atomic_inc(&user->fanotify_listeners); group->memcg = get_mem_cgroup_from_mm(current->mm);
+ group->fanotify_data.merge_hash = fanotify_alloc_merge_hash(); + if (!group->fanotify_data.merge_hash) { + fd = -ENOMEM; + goto out_destroy_group; + } + group->overflow_event = fanotify_alloc_overflow_event(); if (unlikely(!group->overflow_event)) { fd = -ENOMEM; diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c index e2b124c0081dc..b0530f75b274a 100644 --- a/fs/notify/inotify/inotify_fsnotify.c +++ b/fs/notify/inotify/inotify_fsnotify.c @@ -46,9 +46,10 @@ static bool event_compare(struct fsnotify_event *old_fsn, return false; }
-static int inotify_merge(struct list_head *list, - struct fsnotify_event *event) +static int inotify_merge(struct fsnotify_group *group, + struct fsnotify_event *event) { + struct list_head *list = &group->notification_list; struct fsnotify_event *last_event;
last_event = list_entry(list->prev, struct fsnotify_event, list); @@ -122,7 +123,7 @@ int inotify_handle_inode_event(struct fsnotify_mark *inode_mark, u32 mask, if (len) strcpy(event->name, name->name);
- ret = fsnotify_add_event(group, fsn_event, inotify_merge); + ret = fsnotify_add_event(group, fsn_event, inotify_merge, NULL); if (ret) { /* Our event wasn't used in the end. Free it. */ fsnotify_destroy_event(group, fsn_event); diff --git a/fs/notify/notification.c b/fs/notify/notification.c index 001cfe7d2e4e7..32f45543b9c64 100644 --- a/fs/notify/notification.c +++ b/fs/notify/notification.c @@ -68,16 +68,22 @@ void fsnotify_destroy_event(struct fsnotify_group *group, }
/* - * Add an event to the group notification queue. The group can later pull this - * event off the queue to deal with. The function returns 0 if the event was - * added to the queue, 1 if the event was merged with some other queued event, + * Try to add an event to the notification queue. + * The group can later pull this event off the queue to deal with. + * The group can use the @merge hook to merge the event with a queued event. + * The group can use the @insert hook to insert the event into hash table. + * The function returns: + * 0 if the event was added to a queue + * 1 if the event was merged with some other queued event * 2 if the event was not queued - either the queue of events has overflown - * or the group is shutting down. + * or the group is shutting down. */ int fsnotify_add_event(struct fsnotify_group *group, struct fsnotify_event *event, - int (*merge)(struct list_head *, - struct fsnotify_event *)) + int (*merge)(struct fsnotify_group *, + struct fsnotify_event *), + void (*insert)(struct fsnotify_group *, + struct fsnotify_event *)) { int ret = 0; struct list_head *list = &group->notification_list; @@ -104,7 +110,7 @@ int fsnotify_add_event(struct fsnotify_group *group, }
if (!list_empty(list) && merge) { - ret = merge(list, event); + ret = merge(group, event); if (ret) { spin_unlock(&group->notification_lock); return ret; @@ -114,6 +120,8 @@ int fsnotify_add_event(struct fsnotify_group *group, queue: group->q_len++; list_add_tail(&event->list, list); + if (insert) + insert(group, event); spin_unlock(&group->notification_lock);
wake_up(&group->notification_waitq); diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index fc98f9f88d126..63fb766f0f3ec 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -233,6 +233,8 @@ struct fsnotify_group { #endif #ifdef CONFIG_FANOTIFY struct fanotify_group_private_data { + /* Hash table of events for merge */ + struct hlist_head *merge_hash; /* allows a group to block waiting for a userspace response */ struct list_head access_list; wait_queue_head_t access_waitq; @@ -486,12 +488,14 @@ extern void fsnotify_destroy_event(struct fsnotify_group *group, /* attach the event to the group notification queue */ extern int fsnotify_add_event(struct fsnotify_group *group, struct fsnotify_event *event, - int (*merge)(struct list_head *, - struct fsnotify_event *)); + int (*merge)(struct fsnotify_group *, + struct fsnotify_event *), + void (*insert)(struct fsnotify_group *, + struct fsnotify_event *)); /* Queue overflow event to a notification group */ static inline void fsnotify_queue_overflow(struct fsnotify_group *group) { - fsnotify_add_event(group, group->overflow_event, NULL); + fsnotify_add_event(group, group->overflow_event, NULL, NULL); }
static inline bool fsnotify_notify_queue_is_empty(struct fsnotify_group *group)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit b8cd0ee8cda68a888a317991c1e918a8cba1a568 ]
Event merges are expensive when event queue size is large, so limit the linear search to 128 merge tests.
In combination with 128 size hash table, there is a potential to merge with up to 16K events in the hashed queue.
Link: https://lore.kernel.org/r/20210304104826.3993892-6-amir73il@gmail.com Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 50b3abc062156..754e27ead8742 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -148,6 +148,9 @@ static bool fanotify_should_merge(struct fanotify_event *old, return false; }
+/* Limit event merges to limit CPU overhead per event */ +#define FANOTIFY_MAX_MERGE_EVENTS 128 + /* and the list better be locked by something too! */ static int fanotify_merge(struct fsnotify_group *group, struct fsnotify_event *event) @@ -155,6 +158,7 @@ static int fanotify_merge(struct fsnotify_group *group, struct fanotify_event *old, *new = FANOTIFY_E(event); unsigned int bucket = fanotify_event_hash_bucket(group, new); struct hlist_head *hlist = &group->fanotify_data.merge_hash[bucket]; + int i = 0;
pr_debug("%s: group=%p event=%p bucket=%u\n", __func__, group, event, bucket); @@ -168,6 +172,8 @@ static int fanotify_merge(struct fsnotify_group *group, return 0;
hlist_for_each_entry(old, hlist, merge_list) { + if (++i > FANOTIFY_MAX_MERGE_EVENTS) + break; if (fanotify_should_merge(old, new)) { old->mask |= new->mask; return 1;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit 5b8fea65d197f408bb00b251c70d842826d6b70b ]
fanotify has some hardcoded limits. The only APIs to escape those limits are FAN_UNLIMITED_QUEUE and FAN_UNLIMITED_MARKS.
Allow finer grained tuning of the system limits via sysfs tunables under /proc/sys/fs/fanotify, similar to tunables under /proc/sys/fs/inotify, with some minor differences.
- max_queued_events - global system tunable for group queue size limit. Like the inotify tunable with the same name, it defaults to 16384 and applies on initialization of a new group.
- max_user_marks - user ns tunable for marks limit per user. Like the inotify tunable named max_user_watches, on a machine with sufficient RAM and it defaults to 1048576 in init userns and can be further limited per containing user ns.
- max_user_groups - user ns tunable for number of groups per user. Like the inotify tunable named max_user_instances, it defaults to 128 in init userns and can be further limited per containing user ns.
The slightly different tunable names used for fanotify are derived from the "group" and "mark" terminology used in the fanotify man pages and throughout the code.
Considering the fact that the default value for max_user_instances was increased in kernel v5.10 from 8192 to 1048576, leaving the legacy fanotify limit of 8192 marks per group in addition to the max_user_marks limit makes little sense, so the per group marks limit has been removed.
Note that when a group is initialized with FAN_UNLIMITED_MARKS, its own marks are not accounted in the per user marks account, so in effect the limit of max_user_marks is only for the collection of groups that are not initialized with FAN_UNLIMITED_MARKS.
Link: https://lore.kernel.org/r/20210304112921.3996419-2-amir73il@gmail.com Suggested-by: Jan Kara jack@suse.cz Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.c | 16 ++-- fs/notify/fanotify/fanotify_user.c | 123 ++++++++++++++++++++++++----- fs/notify/group.c | 1 - fs/notify/mark.c | 4 - include/linux/fanotify.h | 3 + include/linux/fsnotify_backend.h | 6 +- include/linux/sched/user.h | 3 - include/linux/user_namespace.h | 4 + kernel/sysctl.c | 12 ++- kernel/ucount.c | 4 + 10 files changed, 137 insertions(+), 39 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 754e27ead8742..057abd2cf8875 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -801,12 +801,10 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
static void fanotify_free_group_priv(struct fsnotify_group *group) { - struct user_struct *user; - kfree(group->fanotify_data.merge_hash); - user = group->fanotify_data.user; - atomic_dec(&user->fanotify_listeners); - free_uid(user); + if (group->fanotify_data.ucounts) + dec_ucount(group->fanotify_data.ucounts, + UCOUNT_FANOTIFY_GROUPS); }
static void fanotify_free_path_event(struct fanotify_event *event) @@ -862,6 +860,13 @@ static void fanotify_free_event(struct fsnotify_event *fsn_event) } }
+static void fanotify_freeing_mark(struct fsnotify_mark *mark, + struct fsnotify_group *group) +{ + if (!FAN_GROUP_FLAG(group, FAN_UNLIMITED_MARKS)) + dec_ucount(group->fanotify_data.ucounts, UCOUNT_FANOTIFY_MARKS); +} + static void fanotify_free_mark(struct fsnotify_mark *fsn_mark) { kmem_cache_free(fanotify_mark_cache, fsn_mark); @@ -871,5 +876,6 @@ const struct fsnotify_ops fanotify_fsnotify_ops = { .handle_event = fanotify_handle_event, .free_group_priv = fanotify_free_group_priv, .free_event = fanotify_free_event, + .freeing_mark = fanotify_freeing_mark, .free_mark = fanotify_free_mark, }; diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 1456470d0c4c6..74b4da6354e1c 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -27,8 +27,61 @@ #include "fanotify.h"
#define FANOTIFY_DEFAULT_MAX_EVENTS 16384 -#define FANOTIFY_DEFAULT_MAX_MARKS 8192 -#define FANOTIFY_DEFAULT_MAX_LISTENERS 128 +#define FANOTIFY_OLD_DEFAULT_MAX_MARKS 8192 +#define FANOTIFY_DEFAULT_MAX_GROUPS 128 + +/* + * Legacy fanotify marks limits (8192) is per group and we introduced a tunable + * limit of marks per user, similar to inotify. Effectively, the legacy limit + * of fanotify marks per user is <max marks per group> * <max groups per user>. + * This default limit (1M) also happens to match the increased limit of inotify + * max_user_watches since v5.10. + */ +#define FANOTIFY_DEFAULT_MAX_USER_MARKS \ + (FANOTIFY_OLD_DEFAULT_MAX_MARKS * FANOTIFY_DEFAULT_MAX_GROUPS) + +/* + * Most of the memory cost of adding an inode mark is pinning the marked inode. + * The size of the filesystem inode struct is not uniform across filesystems, + * so double the size of a VFS inode is used as a conservative approximation. + */ +#define INODE_MARK_COST (2 * sizeof(struct inode)) + +/* configurable via /proc/sys/fs/fanotify/ */ +static int fanotify_max_queued_events __read_mostly; + +#ifdef CONFIG_SYSCTL + +#include <linux/sysctl.h> + +struct ctl_table fanotify_table[] = { + { + .procname = "max_user_groups", + .data = &init_user_ns.ucount_max[UCOUNT_FANOTIFY_GROUPS], + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = SYSCTL_ZERO, + }, + { + .procname = "max_user_marks", + .data = &init_user_ns.ucount_max[UCOUNT_FANOTIFY_MARKS], + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = SYSCTL_ZERO, + }, + { + .procname = "max_queued_events", + .data = &fanotify_max_queued_events, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = SYSCTL_ZERO + }, + { } +}; +#endif /* CONFIG_SYSCTL */
/* * All flags that may be specified in parameter event_f_flags of fanotify_init. @@ -847,24 +900,38 @@ static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group, unsigned int type, __kernel_fsid_t *fsid) { + struct ucounts *ucounts = group->fanotify_data.ucounts; struct fsnotify_mark *mark; int ret;
- if (atomic_read(&group->num_marks) > group->fanotify_data.max_marks) + /* + * Enforce per user marks limits per user in all containing user ns. + * A group with FAN_UNLIMITED_MARKS does not contribute to mark count + * in the limited groups account. + */ + if (!FAN_GROUP_FLAG(group, FAN_UNLIMITED_MARKS) && + !inc_ucount(ucounts->ns, ucounts->uid, UCOUNT_FANOTIFY_MARKS)) return ERR_PTR(-ENOSPC);
mark = kmem_cache_alloc(fanotify_mark_cache, GFP_KERNEL); - if (!mark) - return ERR_PTR(-ENOMEM); + if (!mark) { + ret = -ENOMEM; + goto out_dec_ucounts; + }
fsnotify_init_mark(mark, group); ret = fsnotify_add_mark_locked(mark, connp, type, 0, fsid); if (ret) { fsnotify_put_mark(mark); - return ERR_PTR(ret); + goto out_dec_ucounts; }
return mark; + +out_dec_ucounts: + if (!FAN_GROUP_FLAG(group, FAN_UNLIMITED_MARKS)) + dec_ucount(ucounts, UCOUNT_FANOTIFY_MARKS); + return ERR_PTR(ret); }
@@ -963,7 +1030,6 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) { struct fsnotify_group *group; int f_flags, fd; - struct user_struct *user; unsigned int fid_mode = flags & FANOTIFY_FID_BITS; unsigned int class = flags & FANOTIFY_CLASS_BITS;
@@ -1002,12 +1068,6 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) if ((fid_mode & FAN_REPORT_NAME) && !(fid_mode & FAN_REPORT_DIR_FID)) return -EINVAL;
- user = get_current_user(); - if (atomic_read(&user->fanotify_listeners) > FANOTIFY_DEFAULT_MAX_LISTENERS) { - free_uid(user); - return -EMFILE; - } - f_flags = O_RDWR | FMODE_NONOTIFY; if (flags & FAN_CLOEXEC) f_flags |= O_CLOEXEC; @@ -1017,13 +1077,19 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) /* fsnotify_alloc_group takes a ref. Dropped in fanotify_release */ group = fsnotify_alloc_user_group(&fanotify_fsnotify_ops); if (IS_ERR(group)) { - free_uid(user); return PTR_ERR(group); }
- group->fanotify_data.user = user; + /* Enforce groups limits per user in all containing user ns */ + group->fanotify_data.ucounts = inc_ucount(current_user_ns(), + current_euid(), + UCOUNT_FANOTIFY_GROUPS); + if (!group->fanotify_data.ucounts) { + fd = -EMFILE; + goto out_destroy_group; + } + group->fanotify_data.flags = flags; - atomic_inc(&user->fanotify_listeners); group->memcg = get_mem_cgroup_from_mm(current->mm);
group->fanotify_data.merge_hash = fanotify_alloc_merge_hash(); @@ -1064,16 +1130,13 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) goto out_destroy_group; group->max_events = UINT_MAX; } else { - group->max_events = FANOTIFY_DEFAULT_MAX_EVENTS; + group->max_events = fanotify_max_queued_events; }
if (flags & FAN_UNLIMITED_MARKS) { fd = -EPERM; if (!capable(CAP_SYS_ADMIN)) goto out_destroy_group; - group->fanotify_data.max_marks = UINT_MAX; - } else { - group->fanotify_data.max_marks = FANOTIFY_DEFAULT_MAX_MARKS; }
if (flags & FAN_ENABLE_AUDIT) { @@ -1375,6 +1438,21 @@ SYSCALL32_DEFINE6(fanotify_mark, */ static int __init fanotify_user_setup(void) { + struct sysinfo si; + int max_marks; + + si_meminfo(&si); + /* + * Allow up to 1% of addressable memory to be accounted for per user + * marks limited to the range [8192, 1048576]. mount and sb marks are + * a lot cheaper than inode marks, but there is no reason for a user + * to have many of those, so calculate by the cost of inode marks. + */ + max_marks = (((si.totalram - si.totalhigh) / 100) << PAGE_SHIFT) / + INODE_MARK_COST; + max_marks = clamp(max_marks, FANOTIFY_OLD_DEFAULT_MAX_MARKS, + FANOTIFY_DEFAULT_MAX_USER_MARKS); + BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 10); BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 9);
@@ -1389,6 +1467,11 @@ static int __init fanotify_user_setup(void) KMEM_CACHE(fanotify_perm_event, SLAB_PANIC); }
+ fanotify_max_queued_events = FANOTIFY_DEFAULT_MAX_EVENTS; + init_user_ns.ucount_max[UCOUNT_FANOTIFY_GROUPS] = + FANOTIFY_DEFAULT_MAX_GROUPS; + init_user_ns.ucount_max[UCOUNT_FANOTIFY_MARKS] = max_marks; + return 0; } device_initcall(fanotify_user_setup); diff --git a/fs/notify/group.c b/fs/notify/group.c index ffd723ffe46de..fb89c351295d6 100644 --- a/fs/notify/group.c +++ b/fs/notify/group.c @@ -122,7 +122,6 @@ static struct fsnotify_group *__fsnotify_alloc_group(
/* set to 0 when there a no external references to this group */ refcount_set(&group->refcnt, 1); - atomic_set(&group->num_marks, 0); atomic_set(&group->user_waits, 0);
spin_lock_init(&group->notification_lock); diff --git a/fs/notify/mark.c b/fs/notify/mark.c index 5b44be5f93dd8..7af98a7c33c27 100644 --- a/fs/notify/mark.c +++ b/fs/notify/mark.c @@ -391,8 +391,6 @@ void fsnotify_detach_mark(struct fsnotify_mark *mark) list_del_init(&mark->g_list); spin_unlock(&mark->lock);
- atomic_dec(&group->num_marks); - /* Drop mark reference acquired in fsnotify_add_mark_locked() */ fsnotify_put_mark(mark); } @@ -656,7 +654,6 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark, mark->flags |= FSNOTIFY_MARK_FLAG_ALIVE | FSNOTIFY_MARK_FLAG_ATTACHED;
list_add(&mark->g_list, &group->marks_list); - atomic_inc(&group->num_marks); fsnotify_get_mark(mark); /* for g_list */ spin_unlock(&mark->lock);
@@ -674,7 +671,6 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark, FSNOTIFY_MARK_FLAG_ATTACHED); list_del_init(&mark->g_list); spin_unlock(&mark->lock); - atomic_dec(&group->num_marks);
fsnotify_put_mark(mark); return ret; diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h index 3e9c56ee651f7..031a97d8369ae 100644 --- a/include/linux/fanotify.h +++ b/include/linux/fanotify.h @@ -2,8 +2,11 @@ #ifndef _LINUX_FANOTIFY_H #define _LINUX_FANOTIFY_H
+#include <linux/sysctl.h> #include <uapi/linux/fanotify.h>
+extern struct ctl_table fanotify_table[]; /* for sysctl */ + #define FAN_GROUP_FLAG(group, flag) \ ((group)->fanotify_data.flags & (flag))
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 63fb766f0f3ec..1ce66748a2d29 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -206,9 +206,6 @@ struct fsnotify_group {
/* stores all fastpath marks assoc with this group so they can be cleaned on unregister */ struct mutex mark_mutex; /* protect marks_list */ - atomic_t num_marks; /* 1 for each mark and 1 for not being - * past the point of no return when freeing - * a group */ atomic_t user_waits; /* Number of tasks waiting for user * response */ struct list_head marks_list; /* all inode marks for this group */ @@ -240,8 +237,7 @@ struct fsnotify_group { wait_queue_head_t access_waitq; int flags; /* flags from fanotify_init() */ int f_flags; /* event_f_flags from fanotify_init() */ - unsigned int max_marks; - struct user_struct *user; + struct ucounts *ucounts; } fanotify_data; #endif /* CONFIG_FANOTIFY */ }; diff --git a/include/linux/sched/user.h b/include/linux/sched/user.h index a8ec3b6093fcb..3632c5d6ec559 100644 --- a/include/linux/sched/user.h +++ b/include/linux/sched/user.h @@ -14,9 +14,6 @@ struct user_struct { refcount_t __count; /* reference count */ atomic_t processes; /* How many processes does this user have? */ atomic_t sigpending; /* How many pending signals does this user have? */ -#ifdef CONFIG_FANOTIFY - atomic_t fanotify_listeners; -#endif #ifdef CONFIG_EPOLL atomic_long_t epoll_watches; /* The number of file descriptors currently watched */ #endif diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h index 7616c7bf4b241..0793951867aef 100644 --- a/include/linux/user_namespace.h +++ b/include/linux/user_namespace.h @@ -49,6 +49,10 @@ enum ucount_type { #ifdef CONFIG_INOTIFY_USER UCOUNT_INOTIFY_INSTANCES, UCOUNT_INOTIFY_WATCHES, +#endif +#ifdef CONFIG_FANOTIFY + UCOUNT_FANOTIFY_GROUPS, + UCOUNT_FANOTIFY_MARKS, #endif UCOUNT_COUNTS, }; diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 99a19190196e0..b29a9568ebbe4 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -142,6 +142,9 @@ static unsigned long hung_task_timeout_max = (LONG_MAX/HZ); #ifdef CONFIG_INOTIFY_USER #include <linux/inotify.h> #endif +#ifdef CONFIG_FANOTIFY +#include <linux/fanotify.h> +#endif
#ifdef CONFIG_PROC_SYSCTL
@@ -3330,7 +3333,14 @@ static struct ctl_table fs_table[] = { .mode = 0555, .child = inotify_table, }, -#endif +#endif +#ifdef CONFIG_FANOTIFY + { + .procname = "fanotify", + .mode = 0555, + .child = fanotify_table, + }, +#endif #ifdef CONFIG_EPOLL { .procname = "epoll", diff --git a/kernel/ucount.c b/kernel/ucount.c index 11b1596e2542a..8d8874f1c35e2 100644 --- a/kernel/ucount.c +++ b/kernel/ucount.c @@ -73,6 +73,10 @@ static struct ctl_table user_table[] = { #ifdef CONFIG_INOTIFY_USER UCOUNT_ENTRY("max_inotify_instances"), UCOUNT_ENTRY("max_inotify_watches"), +#endif +#ifdef CONFIG_FANOTIFY + UCOUNT_ENTRY("max_fanotify_groups"), + UCOUNT_ENTRY("max_fanotify_marks"), #endif { } };
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit 7cea2a3c505e87a9d6afc78be4a7f7be636a73a7 ]
Add limited support for unprivileged fanotify groups. An unprivileged users is not allowed to get an open file descriptor in the event nor the process pid of another process. An unprivileged user cannot request permission events, cannot set mount/filesystem marks and cannot request unlimited queue/marks.
This enables the limited functionality similar to inotify when watching a set of files and directories for OPEN/ACCESS/MODIFY/CLOSE events, without requiring SYS_CAP_ADMIN privileges.
The FAN_REPORT_DFID_NAME init flag, provide a method for an unprivileged listener watching a set of directories (with FAN_EVENT_ON_CHILD) to monitor all changes inside those directories.
This typically requires that the listener keeps a map of watched directory fid to dirfd (O_PATH), where fid is obtained with name_to_handle_at() before starting to watch for changes.
When getting an event, the reported fid of the parent should be resolved to dirfd and fstatsat(2) with dirfd and name should be used to query the state of the filesystem entry.
Link: https://lore.kernel.org/r/20210304112921.3996419-3-amir73il@gmail.com Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify_user.c | 29 ++++++++++++++++++++++++-- fs/notify/fdinfo.c | 3 ++- include/linux/fanotify.h | 33 +++++++++++++++++++++++++----- 3 files changed, 57 insertions(+), 8 deletions(-)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 74b4da6354e1c..842cccb4f7499 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -419,6 +419,14 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group, metadata.reserved = 0; metadata.mask = event->mask & FANOTIFY_OUTGOING_EVENTS; metadata.pid = pid_vnr(event->pid); + /* + * For an unprivileged listener, event->pid can be used to identify the + * events generated by the listener process itself, without disclosing + * the pids of other processes. + */ + if (!capable(CAP_SYS_ADMIN) && + task_tgid(current) != event->pid) + metadata.pid = 0;
if (path && path->mnt && path->dentry) { fd = create_fd(group, path, &f); @@ -1036,8 +1044,16 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) pr_debug("%s: flags=%x event_f_flags=%x\n", __func__, flags, event_f_flags);
- if (!capable(CAP_SYS_ADMIN)) - return -EPERM; + if (!capable(CAP_SYS_ADMIN)) { + /* + * An unprivileged user can setup an fanotify group with + * limited functionality - an unprivileged group is limited to + * notification events with file handles and it cannot use + * unlimited queue/marks. + */ + if ((flags & FANOTIFY_ADMIN_INIT_FLAGS) || !fid_mode) + return -EPERM; + }
#ifdef CONFIG_AUDITSYSCALL if (flags & ~(FANOTIFY_INIT_FLAGS | FAN_ENABLE_AUDIT)) @@ -1306,6 +1322,15 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, goto fput_and_out; group = f.file->private_data;
+ /* + * An unprivileged user is not allowed to watch a mount point nor + * a filesystem. + */ + ret = -EPERM; + if (!capable(CAP_SYS_ADMIN) && + mark_type != FAN_MARK_INODE) + goto fput_and_out; + /* * group->priority == FS_PRIO_0 == FAN_CLASS_NOTIF. These are not * allowed to set permissions events. diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c index 765b50aeadd28..85b112bd88511 100644 --- a/fs/notify/fdinfo.c +++ b/fs/notify/fdinfo.c @@ -137,7 +137,8 @@ void fanotify_show_fdinfo(struct seq_file *m, struct file *f) struct fsnotify_group *group = f->private_data;
seq_printf(m, "fanotify flags:%x event-flags:%x\n", - group->fanotify_data.flags, group->fanotify_data.f_flags); + group->fanotify_data.flags, + group->fanotify_data.f_flags);
show_fdinfo(m, f, fanotify_fdinfo); } diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h index 031a97d8369ae..bad41bcb25dfb 100644 --- a/include/linux/fanotify.h +++ b/include/linux/fanotify.h @@ -18,15 +18,38 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */ * these constant, the programs may break if re-compiled with new uapi headers * and then run on an old kernel. */ -#define FANOTIFY_CLASS_BITS (FAN_CLASS_NOTIF | FAN_CLASS_CONTENT | \ + +/* Group classes where permission events are allowed */ +#define FANOTIFY_PERM_CLASSES (FAN_CLASS_CONTENT | \ FAN_CLASS_PRE_CONTENT)
+#define FANOTIFY_CLASS_BITS (FAN_CLASS_NOTIF | FANOTIFY_PERM_CLASSES) + #define FANOTIFY_FID_BITS (FAN_REPORT_FID | FAN_REPORT_DFID_NAME)
-#define FANOTIFY_INIT_FLAGS (FANOTIFY_CLASS_BITS | FANOTIFY_FID_BITS | \ - FAN_REPORT_TID | \ - FAN_CLOEXEC | FAN_NONBLOCK | \ - FAN_UNLIMITED_QUEUE | FAN_UNLIMITED_MARKS) +/* + * fanotify_init() flags that require CAP_SYS_ADMIN. + * We do not allow unprivileged groups to request permission events. + * We do not allow unprivileged groups to get other process pid in events. + * We do not allow unprivileged groups to use unlimited resources. + */ +#define FANOTIFY_ADMIN_INIT_FLAGS (FANOTIFY_PERM_CLASSES | \ + FAN_REPORT_TID | \ + FAN_UNLIMITED_QUEUE | \ + FAN_UNLIMITED_MARKS) + +/* + * fanotify_init() flags that are allowed for user without CAP_SYS_ADMIN. + * FAN_CLASS_NOTIF is the only class we allow for unprivileged group. + * We do not allow unprivileged groups to get file descriptors in events, + * so one of the flags for reporting file handles is required. + */ +#define FANOTIFY_USER_INIT_FLAGS (FAN_CLASS_NOTIF | \ + FANOTIFY_FID_BITS | \ + FAN_CLOEXEC | FAN_NONBLOCK) + +#define FANOTIFY_INIT_FLAGS (FANOTIFY_ADMIN_INIT_FLAGS | \ + FANOTIFY_USER_INIT_FLAGS)
#define FANOTIFY_MARK_TYPE_BITS (FAN_MARK_INODE | FAN_MARK_MOUNT | \ FAN_MARK_FILESYSTEM)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian Brauner christian.brauner@ubuntu.com
[ Upstream commit 22d483b99863202e3631ff66fa0f3c2302c0f96f ]
I don't see an obvious reason why the upper 32 bit check needs to be open-coded this way. Switch to upper_32_bits() which is more idiomatic and should conceptually be the same check.
Cc: Amir Goldstein amir73il@gmail.com Cc: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20210325083742.2334933-1-brauner@kernel.org Signed-off-by: Christian Brauner christian.brauner@ubuntu.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify_user.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 842cccb4f7499..98289ace66fac 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1268,7 +1268,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, __func__, fanotify_fd, flags, dfd, pathname, mask);
/* we only use the lower 32 bits as of right now. */ - if (mask & ((__u64)0xffffffff << 32)) + if (upper_32_bits(mask)) return -EINVAL;
if (flags & ~FANOTIFY_MARK_FLAGS)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiapeng Chong jiapeng.chong@linux.alibaba.com
[ Upstream commit 363f8dd5eecd6c67fe9840ef6065440f0ee7df3a ]
Fix the following clang warning:
fs/nfsd/nfs4state.c:6276:1: warning: unused function 'end_offset' [-Wunused-function].
Reported-by: Abaci Robot abaci@linux.alibaba.com Signed-off-by: Jiapeng Chong jiapeng.chong@linux.alibaba.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 9 --------- 1 file changed, 9 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 104d563636540..a42a505b3e417 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -6336,15 +6336,6 @@ nfsd4_delegreturn(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, return status; }
-static inline u64 -end_offset(u64 start, u64 len) -{ - u64 end; - - end = start + len; - return end >= start ? end: NFS4_MAX_UINT64; -} - /* last octet in a range */ static inline u64 last_byte_offset(u64 start, u64 len)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vasily Averin vvs@virtuozzo.com
[ Upstream commit 70c5307564035c160078401f541c397d77b95415 ]
Since commit 501cb1849f86 ("nfsd: rip out the raparms cache") nrservs is not used in nfsd_startup_generic()
Signed-off-by: Vasily Averin vvs@virtuozzo.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfssvc.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 79bc75d415226..731d89898903a 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -308,7 +308,7 @@ static int nfsd_init_socks(struct net *net, const struct cred *cred)
static int nfsd_users = 0;
-static int nfsd_startup_generic(int nrservs) +static int nfsd_startup_generic(void) { int ret;
@@ -374,7 +374,7 @@ void nfsd_reset_boot_verifier(struct nfsd_net *nn) write_sequnlock(&nn->boot_lock); }
-static int nfsd_startup_net(int nrservs, struct net *net, const struct cred *cred) +static int nfsd_startup_net(struct net *net, const struct cred *cred) { struct nfsd_net *nn = net_generic(net, nfsd_net_id); int ret; @@ -382,7 +382,7 @@ static int nfsd_startup_net(int nrservs, struct net *net, const struct cred *cre if (nn->nfsd_net_up) return 0;
- ret = nfsd_startup_generic(nrservs); + ret = nfsd_startup_generic(); if (ret) return ret; ret = nfsd_init_socks(net, cred); @@ -790,7 +790,7 @@ nfsd_svc(int nrservs, struct net *net, const struct cred *cred)
nfsd_up_before = nn->nfsd_net_up;
- error = nfsd_startup_net(nrservs, net, cred); + error = nfsd_startup_net(net, cred); if (error) goto out_destroy; error = nn->nfsd_serv->sv_ops->svo_setup(nn->nfsd_serv,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit f9b60e2209213fdfcc504ba25a404977c5d08b77 ]
The nfs4_file structure is per-filehandle, not per-inode, because the spec requires open and other state to be per filehandle.
But it will turn out to be convenient for nfs4_files associated with the same inode to be hashed to the same bucket, so let's hash on the inode instead of the filehandle.
Filehandle aliasing is rare, so that shouldn't have much performance impact.
(If you have a ton of exported filesystems, though, and all of them have a root with inode number 2, could that get you an overlong hash chain? Perhaps this (and the v4 open file cache) should be hashed on the inode pointer instead.)
Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 27 ++++++++++++--------------- fs/nfsd/state.h | 1 - 2 files changed, 12 insertions(+), 16 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index a42a505b3e417..8d2d6e90bfc5e 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -554,14 +554,12 @@ static unsigned int ownerstr_hashval(struct xdr_netobj *ownername) #define FILE_HASH_BITS 8 #define FILE_HASH_SIZE (1 << FILE_HASH_BITS)
-static unsigned int nfsd_fh_hashval(struct knfsd_fh *fh) +static unsigned int file_hashval(struct svc_fh *fh) { - return jhash2(fh->fh_base.fh_pad, XDR_QUADLEN(fh->fh_size), 0); -} + struct inode *inode = d_inode(fh->fh_dentry);
-static unsigned int file_hashval(struct knfsd_fh *fh) -{ - return nfsd_fh_hashval(fh) & (FILE_HASH_SIZE - 1); + /* XXX: why not (here & in file cache) use inode? */ + return (unsigned int)hash_long(inode->i_ino, FILE_HASH_BITS); }
static struct hlist_head file_hashtbl[FILE_HASH_SIZE]; @@ -4106,7 +4104,7 @@ static struct nfs4_file *nfsd4_alloc_file(void) }
/* OPEN Share state helper functions */ -static void nfsd4_init_file(struct knfsd_fh *fh, unsigned int hashval, +static void nfsd4_init_file(struct svc_fh *fh, unsigned int hashval, struct nfs4_file *fp) { lockdep_assert_held(&state_lock); @@ -4116,7 +4114,7 @@ static void nfsd4_init_file(struct knfsd_fh *fh, unsigned int hashval, INIT_LIST_HEAD(&fp->fi_stateids); INIT_LIST_HEAD(&fp->fi_delegations); INIT_LIST_HEAD(&fp->fi_clnt_odstate); - fh_copy_shallow(&fp->fi_fhandle, fh); + fh_copy_shallow(&fp->fi_fhandle, &fh->fh_handle); fp->fi_deleg_file = NULL; fp->fi_had_conflict = false; fp->fi_share_deny = 0; @@ -4460,13 +4458,13 @@ move_to_close_lru(struct nfs4_ol_stateid *s, struct net *net)
/* search file_hashtbl[] for file */ static struct nfs4_file * -find_file_locked(struct knfsd_fh *fh, unsigned int hashval) +find_file_locked(struct svc_fh *fh, unsigned int hashval) { struct nfs4_file *fp;
hlist_for_each_entry_rcu(fp, &file_hashtbl[hashval], fi_hash, lockdep_is_held(&state_lock)) { - if (fh_match(&fp->fi_fhandle, fh)) { + if (fh_match(&fp->fi_fhandle, &fh->fh_handle)) { if (refcount_inc_not_zero(&fp->fi_ref)) return fp; } @@ -4474,8 +4472,7 @@ find_file_locked(struct knfsd_fh *fh, unsigned int hashval) return NULL; }
-struct nfs4_file * -find_file(struct knfsd_fh *fh) +static struct nfs4_file * find_file(struct svc_fh *fh) { struct nfs4_file *fp; unsigned int hashval = file_hashval(fh); @@ -4487,7 +4484,7 @@ find_file(struct knfsd_fh *fh) }
static struct nfs4_file * -find_or_add_file(struct nfs4_file *new, struct knfsd_fh *fh) +find_or_add_file(struct nfs4_file *new, struct svc_fh *fh) { struct nfs4_file *fp; unsigned int hashval = file_hashval(fh); @@ -4519,7 +4516,7 @@ nfs4_share_conflict(struct svc_fh *current_fh, unsigned int deny_type) struct nfs4_file *fp; __be32 ret = nfs_ok;
- fp = find_file(¤t_fh->fh_handle); + fp = find_file(current_fh); if (!fp) return ret; /* Check for conflicting share reservations */ @@ -5208,7 +5205,7 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf * and check for delegations in the process of being recalled. * If not found, create the nfs4_file struct */ - fp = find_or_add_file(open->op_file, ¤t_fh->fh_handle); + fp = find_or_add_file(open->op_file, current_fh); if (fp != open->op_file) { status = nfs4_check_deleg(cl, open, &dp); if (status) diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index 54cab651ac1d0..61a2d95d79233 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -669,7 +669,6 @@ extern struct nfs4_client_reclaim *nfs4_client_to_reclaim(struct xdr_netobj name struct xdr_netobj princhash, struct nfsd_net *nn); extern bool nfs4_has_reclaimed_state(struct xdr_netobj name, struct nfsd_net *nn);
-struct nfs4_file *find_file(struct knfsd_fh *fh); void put_nfs4_file(struct nfs4_file *fi); extern void nfs4_put_copy(struct nfsd4_copy *copy); extern struct nfsd4_copy *
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit a0ce48375a367222989c2618fe68bf34db8c7bb7 ]
It's unusual but possible for multiple filehandles to point to the same file. In that case, we may end up with multiple nfs4_files referencing the same inode.
For delegation purposes it will turn out to be useful to flag those cases.
Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 37 ++++++++++++++++++++++++++++--------- fs/nfsd/state.h | 2 ++ 2 files changed, 30 insertions(+), 9 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 8d2d6e90bfc5e..89d5669ce1463 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4120,6 +4120,8 @@ static void nfsd4_init_file(struct svc_fh *fh, unsigned int hashval, fp->fi_share_deny = 0; memset(fp->fi_fds, 0, sizeof(fp->fi_fds)); memset(fp->fi_access, 0, sizeof(fp->fi_access)); + fp->fi_aliased = false; + fp->fi_inode = d_inode(fh->fh_dentry); #ifdef CONFIG_NFSD_PNFS INIT_LIST_HEAD(&fp->fi_lo_states); atomic_set(&fp->fi_lo_recalls, 0); @@ -4472,6 +4474,31 @@ find_file_locked(struct svc_fh *fh, unsigned int hashval) return NULL; }
+static struct nfs4_file *insert_file(struct nfs4_file *new, struct svc_fh *fh, + unsigned int hashval) +{ + struct nfs4_file *fp; + struct nfs4_file *ret = NULL; + bool alias_found = false; + + spin_lock(&state_lock); + hlist_for_each_entry_rcu(fp, &file_hashtbl[hashval], fi_hash, + lockdep_is_held(&state_lock)) { + if (fh_match(&fp->fi_fhandle, &fh->fh_handle)) { + if (refcount_inc_not_zero(&fp->fi_ref)) + ret = fp; + } else if (d_inode(fh->fh_dentry) == fp->fi_inode) + fp->fi_aliased = alias_found = true; + } + if (likely(ret == NULL)) { + nfsd4_init_file(fh, hashval, new); + new->fi_aliased = alias_found; + ret = new; + } + spin_unlock(&state_lock); + return ret; +} + static struct nfs4_file * find_file(struct svc_fh *fh) { struct nfs4_file *fp; @@ -4495,15 +4522,7 @@ find_or_add_file(struct nfs4_file *new, struct svc_fh *fh) if (fp) return fp;
- spin_lock(&state_lock); - fp = find_file_locked(fh, hashval); - if (likely(fp == NULL)) { - nfsd4_init_file(fh, hashval, new); - fp = new; - } - spin_unlock(&state_lock); - - return fp; + return insert_file(new, fh, hashval); }
/* diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index 61a2d95d79233..e73bdbb1634ab 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -516,6 +516,8 @@ struct nfs4_clnt_odstate { */ struct nfs4_file { refcount_t fi_ref; + struct inode * fi_inode; + bool fi_aliased; spinlock_t fi_lock; struct hlist_node fi_hash; /* hash on fi_fhandle */ struct list_head fi_stateids;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit ebd9d2c2f5a7ebaaed2d7bb4dee148755f46033d ]
No change in behavior, I'm just moving some code around to avoid forward references in a following patch.
(To do someday: figure out how to split up nfs4state.c. It's big and disorganized.)
Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 235 ++++++++++++++++++++++---------------------- 1 file changed, 118 insertions(+), 117 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 89d5669ce1463..9c8dacf0f2b86 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -354,6 +354,124 @@ static const struct nfsd4_callback_ops nfsd4_cb_notify_lock_ops = { .release = nfsd4_cb_notify_lock_release, };
+/* + * We store the NONE, READ, WRITE, and BOTH bits separately in the + * st_{access,deny}_bmap field of the stateid, in order to track not + * only what share bits are currently in force, but also what + * combinations of share bits previous opens have used. This allows us + * to enforce the recommendation of rfc 3530 14.2.19 that the server + * return an error if the client attempt to downgrade to a combination + * of share bits not explicable by closing some of its previous opens. + * + * XXX: This enforcement is actually incomplete, since we don't keep + * track of access/deny bit combinations; so, e.g., we allow: + * + * OPEN allow read, deny write + * OPEN allow both, deny none + * DOWNGRADE allow read, deny none + * + * which we should reject. + */ +static unsigned int +bmap_to_share_mode(unsigned long bmap) +{ + int i; + unsigned int access = 0; + + for (i = 1; i < 4; i++) { + if (test_bit(i, &bmap)) + access |= i; + } + return access; +} + +/* set share access for a given stateid */ +static inline void +set_access(u32 access, struct nfs4_ol_stateid *stp) +{ + unsigned char mask = 1 << access; + + WARN_ON_ONCE(access > NFS4_SHARE_ACCESS_BOTH); + stp->st_access_bmap |= mask; +} + +/* clear share access for a given stateid */ +static inline void +clear_access(u32 access, struct nfs4_ol_stateid *stp) +{ + unsigned char mask = 1 << access; + + WARN_ON_ONCE(access > NFS4_SHARE_ACCESS_BOTH); + stp->st_access_bmap &= ~mask; +} + +/* test whether a given stateid has access */ +static inline bool +test_access(u32 access, struct nfs4_ol_stateid *stp) +{ + unsigned char mask = 1 << access; + + return (bool)(stp->st_access_bmap & mask); +} + +/* set share deny for a given stateid */ +static inline void +set_deny(u32 deny, struct nfs4_ol_stateid *stp) +{ + unsigned char mask = 1 << deny; + + WARN_ON_ONCE(deny > NFS4_SHARE_DENY_BOTH); + stp->st_deny_bmap |= mask; +} + +/* clear share deny for a given stateid */ +static inline void +clear_deny(u32 deny, struct nfs4_ol_stateid *stp) +{ + unsigned char mask = 1 << deny; + + WARN_ON_ONCE(deny > NFS4_SHARE_DENY_BOTH); + stp->st_deny_bmap &= ~mask; +} + +/* test whether a given stateid is denying specific access */ +static inline bool +test_deny(u32 deny, struct nfs4_ol_stateid *stp) +{ + unsigned char mask = 1 << deny; + + return (bool)(stp->st_deny_bmap & mask); +} + +static int nfs4_access_to_omode(u32 access) +{ + switch (access & NFS4_SHARE_ACCESS_BOTH) { + case NFS4_SHARE_ACCESS_READ: + return O_RDONLY; + case NFS4_SHARE_ACCESS_WRITE: + return O_WRONLY; + case NFS4_SHARE_ACCESS_BOTH: + return O_RDWR; + } + WARN_ON_ONCE(1); + return O_RDONLY; +} + +static inline int +access_permit_read(struct nfs4_ol_stateid *stp) +{ + return test_access(NFS4_SHARE_ACCESS_READ, stp) || + test_access(NFS4_SHARE_ACCESS_BOTH, stp) || + test_access(NFS4_SHARE_ACCESS_WRITE, stp); +} + +static inline int +access_permit_write(struct nfs4_ol_stateid *stp) +{ + return test_access(NFS4_SHARE_ACCESS_WRITE, stp) || + test_access(NFS4_SHARE_ACCESS_BOTH, stp); +} + static inline struct nfs4_stateowner * nfs4_get_stateowner(struct nfs4_stateowner *sop) { @@ -1167,108 +1285,6 @@ static unsigned int clientstr_hashval(struct xdr_netobj name) return opaque_hashval(name.data, 8) & CLIENT_HASH_MASK; }
-/* - * We store the NONE, READ, WRITE, and BOTH bits separately in the - * st_{access,deny}_bmap field of the stateid, in order to track not - * only what share bits are currently in force, but also what - * combinations of share bits previous opens have used. This allows us - * to enforce the recommendation of rfc 3530 14.2.19 that the server - * return an error if the client attempt to downgrade to a combination - * of share bits not explicable by closing some of its previous opens. - * - * XXX: This enforcement is actually incomplete, since we don't keep - * track of access/deny bit combinations; so, e.g., we allow: - * - * OPEN allow read, deny write - * OPEN allow both, deny none - * DOWNGRADE allow read, deny none - * - * which we should reject. - */ -static unsigned int -bmap_to_share_mode(unsigned long bmap) { - int i; - unsigned int access = 0; - - for (i = 1; i < 4; i++) { - if (test_bit(i, &bmap)) - access |= i; - } - return access; -} - -/* set share access for a given stateid */ -static inline void -set_access(u32 access, struct nfs4_ol_stateid *stp) -{ - unsigned char mask = 1 << access; - - WARN_ON_ONCE(access > NFS4_SHARE_ACCESS_BOTH); - stp->st_access_bmap |= mask; -} - -/* clear share access for a given stateid */ -static inline void -clear_access(u32 access, struct nfs4_ol_stateid *stp) -{ - unsigned char mask = 1 << access; - - WARN_ON_ONCE(access > NFS4_SHARE_ACCESS_BOTH); - stp->st_access_bmap &= ~mask; -} - -/* test whether a given stateid has access */ -static inline bool -test_access(u32 access, struct nfs4_ol_stateid *stp) -{ - unsigned char mask = 1 << access; - - return (bool)(stp->st_access_bmap & mask); -} - -/* set share deny for a given stateid */ -static inline void -set_deny(u32 deny, struct nfs4_ol_stateid *stp) -{ - unsigned char mask = 1 << deny; - - WARN_ON_ONCE(deny > NFS4_SHARE_DENY_BOTH); - stp->st_deny_bmap |= mask; -} - -/* clear share deny for a given stateid */ -static inline void -clear_deny(u32 deny, struct nfs4_ol_stateid *stp) -{ - unsigned char mask = 1 << deny; - - WARN_ON_ONCE(deny > NFS4_SHARE_DENY_BOTH); - stp->st_deny_bmap &= ~mask; -} - -/* test whether a given stateid is denying specific access */ -static inline bool -test_deny(u32 deny, struct nfs4_ol_stateid *stp) -{ - unsigned char mask = 1 << deny; - - return (bool)(stp->st_deny_bmap & mask); -} - -static int nfs4_access_to_omode(u32 access) -{ - switch (access & NFS4_SHARE_ACCESS_BOTH) { - case NFS4_SHARE_ACCESS_READ: - return O_RDONLY; - case NFS4_SHARE_ACCESS_WRITE: - return O_WRONLY; - case NFS4_SHARE_ACCESS_BOTH: - return O_RDWR; - } - WARN_ON_ONCE(1); - return O_RDONLY; -} - /* * A stateid that had a deny mode associated with it is being released * or downgraded. Recalculate the deny mode on the file. @@ -5565,21 +5581,6 @@ static inline __be32 nfs4_check_fh(struct svc_fh *fhp, struct nfs4_stid *stp) return nfs_ok; }
-static inline int -access_permit_read(struct nfs4_ol_stateid *stp) -{ - return test_access(NFS4_SHARE_ACCESS_READ, stp) || - test_access(NFS4_SHARE_ACCESS_BOTH, stp) || - test_access(NFS4_SHARE_ACCESS_WRITE, stp); -} - -static inline int -access_permit_write(struct nfs4_ol_stateid *stp) -{ - return test_access(NFS4_SHARE_ACCESS_WRITE, stp) || - test_access(NFS4_SHARE_ACCESS_BOTH, stp); -} - static __be32 nfs4_check_openmode(struct nfs4_ol_stateid *stp, int flags) {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit aba2072f452346d56a462718bcde93d697383148 ]
It's OK to grant a read delegation to a client that holds a write, as long as it's the only client holding the write.
We originally tried to do this in commit 94415b06eb8a ("nfsd4: a client's own opens needn't prevent delegations"), which had to be reverted in commit 6ee65a773096 ("Revert "nfsd4: a client's own opens needn't prevent delegations"").
Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/locks.c | 3 ++ fs/nfsd/nfs4state.c | 82 +++++++++++++++++++++++++++++++++++++-------- 2 files changed, 71 insertions(+), 14 deletions(-)
diff --git a/fs/locks.c b/fs/locks.c index 873f97504bddf..101867933e4d3 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -1808,6 +1808,9 @@ check_conflicting_open(struct file *filp, const long arg, int flags)
if (flags & FL_LAYOUT) return 0; + if (flags & FL_DELEG) + /* We leave these checks to the caller */ + return 0;
if (arg == F_RDLCK) return inode_is_open_for_write(inode) ? -EAGAIN : 0; diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 9c8dacf0f2b86..fd0d6e81a2272 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -5030,6 +5030,65 @@ static struct file_lock *nfs4_alloc_init_lease(struct nfs4_delegation *dp, return fl; }
+static int nfsd4_check_conflicting_opens(struct nfs4_client *clp, + struct nfs4_file *fp) +{ + struct nfs4_ol_stateid *st; + struct file *f = fp->fi_deleg_file->nf_file; + struct inode *ino = locks_inode(f); + int writes; + + writes = atomic_read(&ino->i_writecount); + if (!writes) + return 0; + /* + * There could be multiple filehandles (hence multiple + * nfs4_files) referencing this file, but that's not too + * common; let's just give up in that case rather than + * trying to go look up all the clients using that other + * nfs4_file as well: + */ + if (fp->fi_aliased) + return -EAGAIN; + /* + * If there's a close in progress, make sure that we see it + * clear any fi_fds[] entries before we see it decrement + * i_writecount: + */ + smp_mb__after_atomic(); + + if (fp->fi_fds[O_WRONLY]) + writes--; + if (fp->fi_fds[O_RDWR]) + writes--; + if (writes > 0) + return -EAGAIN; /* There may be non-NFSv4 writers */ + /* + * It's possible there are non-NFSv4 write opens in progress, + * but if they haven't incremented i_writecount yet then they + * also haven't called break lease yet; so, they'll break this + * lease soon enough. So, all that's left to check for is NFSv4 + * opens: + */ + spin_lock(&fp->fi_lock); + list_for_each_entry(st, &fp->fi_stateids, st_perfile) { + if (st->st_openstp == NULL /* it's an open */ && + access_permit_write(st) && + st->st_stid.sc_client != clp) { + spin_unlock(&fp->fi_lock); + return -EAGAIN; + } + } + spin_unlock(&fp->fi_lock); + /* + * There's a small chance that we could be racing with another + * NFSv4 open. However, any open that hasn't added itself to + * the fi_stateids list also hasn't called break_lease yet; so, + * they'll break this lease soon enough. + */ + return 0; +} + static struct nfs4_delegation * nfs4_set_delegation(struct nfs4_client *clp, struct svc_fh *fh, struct nfs4_file *fp, struct nfs4_clnt_odstate *odstate) @@ -5049,9 +5108,12 @@ nfs4_set_delegation(struct nfs4_client *clp, struct svc_fh *fh,
nf = find_readable_file(fp); if (!nf) { - /* We should always have a readable file here */ - WARN_ON_ONCE(1); - return ERR_PTR(-EBADF); + /* + * We probably could attempt another open and get a read + * delegation, but for now, don't bother until the + * client actually sends us one. + */ + return ERR_PTR(-EAGAIN); } spin_lock(&state_lock); spin_lock(&fp->fi_lock); @@ -5086,6 +5148,9 @@ nfs4_set_delegation(struct nfs4_client *clp, struct svc_fh *fh, locks_free_lock(fl); if (status) goto out_clnt_odstate; + status = nfsd4_check_conflicting_opens(clp, fp); + if (status) + goto out_unlock;
spin_lock(&state_lock); spin_lock(&fp->fi_lock); @@ -5167,17 +5232,6 @@ nfs4_open_delegation(struct svc_fh *fh, struct nfsd4_open *open, goto out_no_deleg; if (!cb_up || !(oo->oo_flags & NFS4_OO_CONFIRMED)) goto out_no_deleg; - /* - * Also, if the file was opened for write or - * create, there's a good chance the client's - * about to write to it, resulting in an - * immediate recall (since we don't support - * write delegations): - */ - if (open->op_share_access & NFS4_SHARE_ACCESS_WRITE) - goto out_no_deleg; - if (open->op_create == NFS4_OPEN_CREATE) - goto out_no_deleg; break; default: goto out_no_deleg;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gustavo A. R. Silva gustavoars@kernel.org
[ Upstream commit 76c50eb70d8e1133eaada0013845619c36345fbc ]
In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple warnings by explicitly adding a couple of break statements instead of just letting the code fall through to the next case.
Link: https://github.com/KSPP/linux/issues/115 Signed-off-by: Gustavo A. R. Silva gustavoars@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 1 + fs/nfsd/nfsctl.c | 1 + 2 files changed, 2 insertions(+)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index fd0d6e81a2272..ea68dc157ada1 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -3161,6 +3161,7 @@ nfsd4_exchange_id(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, goto out_nolock; } new->cl_mach_cred = true; + break; case SP4_NONE: break; default: /* checked by xdr code */ diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 147ed8995a79f..cb73c12925629 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -1169,6 +1169,7 @@ static struct inode *nfsd_get_inode(struct super_block *sb, umode_t mode) inode->i_fop = &simple_dir_operations; inode->i_op = &simple_dir_inode_operations; inc_nlink(inode); + break; default: break; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dai Ngo dai.ngo@oracle.com
[ Upstream commit d9092b4bb2109502eb8972021a3f74febc931a63 ]
The client SSC code should not depend on any of the CONFIG_NFSD config. This patch removes all CONFIG_NFSD from NFSv4.2 client SSC code and simplifies the config of CONFIG_NFS_V4_2_SSC_HELPER, NFSD_V4_2_INTER_SSC.
Signed-off-by: Dai Ngo dai.ngo@oracle.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/Kconfig | 4 ++-- fs/nfs/nfs4file.c | 4 ---- fs/nfs/super.c | 4 ---- fs/nfsd/Kconfig | 2 +- 4 files changed, 3 insertions(+), 11 deletions(-)
diff --git a/fs/Kconfig b/fs/Kconfig index 462253ae483a3..eaff422877c39 100644 --- a/fs/Kconfig +++ b/fs/Kconfig @@ -334,8 +334,8 @@ config NFS_COMMON default y
config NFS_V4_2_SSC_HELPER - tristate - default y if NFS_V4=y || NFS_FS=y + bool + default y if NFS_V4_2
source "net/sunrpc/Kconfig" source "fs/ceph/Kconfig" diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c index 5ad57ad89fb1e..70cd0d764c447 100644 --- a/fs/nfs/nfs4file.c +++ b/fs/nfs/nfs4file.c @@ -430,9 +430,7 @@ static const struct nfs4_ssc_client_ops nfs4_ssc_clnt_ops_tbl = { */ void nfs42_ssc_register_ops(void) { -#ifdef CONFIG_NFSD_V4 nfs42_ssc_register(&nfs4_ssc_clnt_ops_tbl); -#endif }
/** @@ -443,9 +441,7 @@ void nfs42_ssc_register_ops(void) */ void nfs42_ssc_unregister_ops(void) { -#ifdef CONFIG_NFSD_V4 nfs42_ssc_unregister(&nfs4_ssc_clnt_ops_tbl); -#endif } #endif /* CONFIG_NFS_V4_2 */
diff --git a/fs/nfs/super.c b/fs/nfs/super.c index 7179d59d73ca4..1ffce90760606 100644 --- a/fs/nfs/super.c +++ b/fs/nfs/super.c @@ -116,16 +116,12 @@ static void unregister_nfs4_fs(void) #ifdef CONFIG_NFS_V4_2 static void nfs_ssc_register_ops(void) { -#ifdef CONFIG_NFSD_V4 nfs_ssc_register(&nfs_ssc_clnt_ops_tbl); -#endif }
static void nfs_ssc_unregister_ops(void) { -#ifdef CONFIG_NFSD_V4 nfs_ssc_unregister(&nfs_ssc_clnt_ops_tbl); -#endif } #endif /* CONFIG_NFS_V4_2 */
diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig index 5fa38ad9e7e3f..f229172652be0 100644 --- a/fs/nfsd/Kconfig +++ b/fs/nfsd/Kconfig @@ -138,7 +138,7 @@ config NFSD_FLEXFILELAYOUT
config NFSD_V4_2_INTER_SSC bool "NFSv4.2 inter server to server COPY" - depends on NFSD_V4 && NFS_V4_1 && NFS_V4_2 + depends on NFSD_V4 && NFS_V4_2 help This option enables support for NFSv4.2 inter server to server copy where the destination server calls the NFSv4.2
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit b876d708316bf9b6b9678eb2beb289b93cfe6369 ]
The change attribute is always set by all NFS client versions so get rid of the open-coded version.
Fixes: 3cc55f4434b4 ("nfs: use change attribute for NFS re-exports") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/export.c | 15 ++++----------- 1 file changed, 4 insertions(+), 11 deletions(-)
diff --git a/fs/nfs/export.c b/fs/nfs/export.c index f2b34cfe286c2..b347e3ce0cc8e 100644 --- a/fs/nfs/export.c +++ b/fs/nfs/export.c @@ -171,17 +171,10 @@ static u64 nfs_fetch_iversion(struct inode *inode) { struct nfs_server *server = NFS_SERVER(inode);
- /* Is this the right call?: */ - nfs_revalidate_inode(server, inode); - /* - * Also, note we're ignoring any returned error. That seems to be - * the practice for cache consistency information elsewhere in - * the server, but I'm not sure why. - */ - if (server->nfs_client->rpc_ops->version >= 4) - return inode_peek_iversion_raw(inode); - else - return time_to_chattr(&inode->i_ctime); + if (nfs_check_cache_invalid(inode, NFS_INO_INVALID_CHANGE | + NFS_INO_REVAL_PAGECACHE)) + __nfs_revalidate_inode(server, inode); + return inode_peek_iversion_raw(inode); }
const struct export_operations nfs_export_ops = {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit a8b98c808eab3ec8f1b5a64be967b0f4af4cae43 ]
Reporting event->pid should depend on the privileges of the user that initialized the group, not the privileges of the user reading the events.
Use an internal group flag FANOTIFY_UNPRIV to record the fact that the group was initialized by an unprivileged user.
To be on the safe side, the premissions to setup filesystem and mount marks now require that both the user that initialized the group and the user setting up the mark have CAP_SYS_ADMIN.
Link: https://lore.kernel.org/linux-fsdevel/CAOQ4uxiA77_P5vtv7e83g0+9d7B5W9ZTE4GfQ... Fixes: 7cea2a3c505e ("fanotify: support limited functionality for unprivileged users") Cc: Stable@vger.kernel.org # v5.12+ Link: https://lore.kernel.org/r/20210524135321.2190062-1-amir73il@gmail.com Reviewed-by: Matthew Bobrowski repnop@google.com Acked-by: Christian Brauner christian.brauner@ubuntu.com Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify_user.c | 30 ++++++++++++++++++++++++------ fs/notify/fdinfo.c | 2 +- include/linux/fanotify.h | 4 ++++ 3 files changed, 29 insertions(+), 7 deletions(-)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 98289ace66fac..1c439e2fdcd80 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -424,11 +424,18 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group, * events generated by the listener process itself, without disclosing * the pids of other processes. */ - if (!capable(CAP_SYS_ADMIN) && + if (FAN_GROUP_FLAG(group, FANOTIFY_UNPRIV) && task_tgid(current) != event->pid) metadata.pid = 0;
- if (path && path->mnt && path->dentry) { + /* + * For now, fid mode is required for an unprivileged listener and + * fid mode does not report fd in events. Keep this check anyway + * for safety in case fid mode requirement is relaxed in the future + * to allow unprivileged listener to get events with no fd and no fid. + */ + if (!FAN_GROUP_FLAG(group, FANOTIFY_UNPRIV) && + path && path->mnt && path->dentry) { fd = create_fd(group, path, &f); if (fd < 0) return fd; @@ -1040,6 +1047,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) int f_flags, fd; unsigned int fid_mode = flags & FANOTIFY_FID_BITS; unsigned int class = flags & FANOTIFY_CLASS_BITS; + unsigned int internal_flags = 0;
pr_debug("%s: flags=%x event_f_flags=%x\n", __func__, flags, event_f_flags); @@ -1053,6 +1061,13 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) */ if ((flags & FANOTIFY_ADMIN_INIT_FLAGS) || !fid_mode) return -EPERM; + + /* + * Setting the internal flag FANOTIFY_UNPRIV on the group + * prevents setting mount/filesystem marks on this group and + * prevents reporting pid and open fd in events. + */ + internal_flags |= FANOTIFY_UNPRIV; }
#ifdef CONFIG_AUDITSYSCALL @@ -1105,7 +1120,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) goto out_destroy_group; }
- group->fanotify_data.flags = flags; + group->fanotify_data.flags = flags | internal_flags; group->memcg = get_mem_cgroup_from_mm(current->mm);
group->fanotify_data.merge_hash = fanotify_alloc_merge_hash(); @@ -1323,11 +1338,13 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, group = f.file->private_data;
/* - * An unprivileged user is not allowed to watch a mount point nor - * a filesystem. + * An unprivileged user is not allowed to setup mount nor filesystem + * marks. This also includes setting up such marks by a group that + * was initialized by an unprivileged user. */ ret = -EPERM; - if (!capable(CAP_SYS_ADMIN) && + if ((!capable(CAP_SYS_ADMIN) || + FAN_GROUP_FLAG(group, FANOTIFY_UNPRIV)) && mark_type != FAN_MARK_INODE) goto fput_and_out;
@@ -1478,6 +1495,7 @@ static int __init fanotify_user_setup(void) max_marks = clamp(max_marks, FANOTIFY_OLD_DEFAULT_MAX_MARKS, FANOTIFY_DEFAULT_MAX_USER_MARKS);
+ BUILD_BUG_ON(FANOTIFY_INIT_FLAGS & FANOTIFY_INTERNAL_GROUP_FLAGS); BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 10); BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 9);
diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c index 85b112bd88511..3451708fd035c 100644 --- a/fs/notify/fdinfo.c +++ b/fs/notify/fdinfo.c @@ -137,7 +137,7 @@ void fanotify_show_fdinfo(struct seq_file *m, struct file *f) struct fsnotify_group *group = f->private_data;
seq_printf(m, "fanotify flags:%x event-flags:%x\n", - group->fanotify_data.flags, + group->fanotify_data.flags & FANOTIFY_INIT_FLAGS, group->fanotify_data.f_flags);
show_fdinfo(m, f, fanotify_fdinfo); diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h index bad41bcb25dfb..a16dbeced1528 100644 --- a/include/linux/fanotify.h +++ b/include/linux/fanotify.h @@ -51,6 +51,10 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */ #define FANOTIFY_INIT_FLAGS (FANOTIFY_ADMIN_INIT_FLAGS | \ FANOTIFY_USER_INIT_FLAGS)
+/* Internal group flags */ +#define FANOTIFY_UNPRIV 0x80000000 +#define FANOTIFY_INTERNAL_GROUP_FLAGS (FANOTIFY_UNPRIV) + #define FANOTIFY_MARK_TYPE_BITS (FAN_MARK_INODE | FAN_MARK_MOUNT | \ FAN_MARK_FILESYSTEM)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 87b2394d60c32c158ebb96ace4abee883baf1239 ]
To be used in subsequent patches.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/trace.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index d20ab767ba26a..3ec6d38fa5318 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -841,6 +841,22 @@ DEFINE_NFSD_CB_EVENT(setup); DEFINE_NFSD_CB_EVENT(state); DEFINE_NFSD_CB_EVENT(shutdown);
+TRACE_DEFINE_ENUM(RPC_AUTH_NULL); +TRACE_DEFINE_ENUM(RPC_AUTH_UNIX); +TRACE_DEFINE_ENUM(RPC_AUTH_GSS); +TRACE_DEFINE_ENUM(RPC_AUTH_GSS_KRB5); +TRACE_DEFINE_ENUM(RPC_AUTH_GSS_KRB5I); +TRACE_DEFINE_ENUM(RPC_AUTH_GSS_KRB5P); + +#define show_nfsd_authflavor(val) \ + __print_symbolic(val, \ + { RPC_AUTH_NULL, "none" }, \ + { RPC_AUTH_UNIX, "sys" }, \ + { RPC_AUTH_GSS, "gss" }, \ + { RPC_AUTH_GSS_KRB5, "krb5" }, \ + { RPC_AUTH_GSS_KRB5I, "krb5i" }, \ + { RPC_AUTH_GSS_KRB5P, "krb5p" }) + TRACE_EVENT(nfsd_cb_setup_err, TP_PROTO( const struct nfs4_client *clp,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 27787733ef44332fce749aa853f2749d141982b0 ]
Record when a client tries to establish a lease record but uses an unexpected credential. This is often a sign of a configuration problem.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 14 ++++++++++---- fs/nfsd/trace.h | 28 ++++++++++++++++++++++++++++ 2 files changed, 38 insertions(+), 4 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index ea68dc157ada1..2e18b1ad889d7 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -3203,6 +3203,7 @@ nfsd4_exchange_id(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (!creds_match) { /* case 3 */ if (client_has_state(conf)) { status = nfserr_clid_inuse; + trace_nfsd_clid_cred_mismatch(conf, rqstp); goto out; } goto out_new; @@ -3447,9 +3448,10 @@ nfsd4_create_session(struct svc_rqst *rqstp, goto out_free_conn; } } else if (unconf) { + status = nfserr_clid_inuse; if (!same_creds(&unconf->cl_cred, &rqstp->rq_cred) || !rpc_cmp_addr(sa, (struct sockaddr *) &unconf->cl_addr)) { - status = nfserr_clid_inuse; + trace_nfsd_clid_cred_mismatch(unconf, rqstp); goto out_free_conn; } status = nfserr_wrong_cred; @@ -4008,7 +4010,7 @@ nfsd4_setclientid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (clp_used_exchangeid(conf)) goto out; if (!same_creds(&conf->cl_cred, &rqstp->rq_cred)) { - trace_nfsd_clid_inuse_err(conf); + trace_nfsd_clid_cred_mismatch(conf, rqstp); goto out; } } @@ -4066,10 +4068,14 @@ nfsd4_setclientid_confirm(struct svc_rqst *rqstp, * Nevertheless, RFC 7530 recommends INUSE for this case: */ status = nfserr_clid_inuse; - if (unconf && !same_creds(&unconf->cl_cred, &rqstp->rq_cred)) + if (unconf && !same_creds(&unconf->cl_cred, &rqstp->rq_cred)) { + trace_nfsd_clid_cred_mismatch(unconf, rqstp); goto out; - if (conf && !same_creds(&conf->cl_cred, &rqstp->rq_cred)) + } + if (conf && !same_creds(&conf->cl_cred, &rqstp->rq_cred)) { + trace_nfsd_clid_cred_mismatch(conf, rqstp); goto out; + } /* cases below refer to rfc 3530 section 14.2.34: */ if (!unconf || !same_verf(&confirm, &unconf->cl_confirm)) { if (conf && same_verf(&confirm, &conf->cl_confirm)) { diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 3ec6d38fa5318..bec85fc8be01a 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -536,6 +536,34 @@ DEFINE_EVENT(nfsd_net_class, nfsd_##name, \ DEFINE_NET_EVENT(grace_start); DEFINE_NET_EVENT(grace_complete);
+TRACE_EVENT(nfsd_clid_cred_mismatch, + TP_PROTO( + const struct nfs4_client *clp, + const struct svc_rqst *rqstp + ), + TP_ARGS(clp, rqstp), + TP_STRUCT__entry( + __field(u32, cl_boot) + __field(u32, cl_id) + __field(unsigned long, cl_flavor) + __field(unsigned long, new_flavor) + __array(unsigned char, addr, sizeof(struct sockaddr_in6)) + ), + TP_fast_assign( + __entry->cl_boot = clp->cl_clientid.cl_boot; + __entry->cl_id = clp->cl_clientid.cl_id; + __entry->cl_flavor = clp->cl_cred.cr_flavor; + __entry->new_flavor = rqstp->rq_cred.cr_flavor; + memcpy(__entry->addr, &rqstp->rq_xprt->xpt_remote, + sizeof(struct sockaddr_in6)); + ), + TP_printk("client %08x:%08x flavor=%s, conflict=%s from addr=%pISpc", + __entry->cl_boot, __entry->cl_id, + show_nfsd_authflavor(__entry->cl_flavor), + show_nfsd_authflavor(__entry->new_flavor), __entry->addr + ) +) + TRACE_EVENT(nfsd_clid_inuse_err, TP_PROTO(const struct nfs4_client *clp), TP_ARGS(clp),
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 744ea54c869cebe41fbad5f53f8a8ca5d93a5c97 ]
Record when a client presents a different boot verifier than the one we know about. Typically this is a sign the client has rebooted, but sometimes it signals a conflicting client ID, which the client's administrator will need to address.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 11 ++++++++--- fs/nfsd/trace.h | 32 ++++++++++++++++++++++++++++++++ 2 files changed, 40 insertions(+), 3 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 2e18b1ad889d7..8169625fdd233 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -3213,6 +3213,7 @@ nfsd4_exchange_id(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, goto out_copy; } /* case 5, client reboot */ + trace_nfsd_clid_verf_mismatch(conf, rqstp, &verf); conf = NULL; goto out_new; } @@ -4018,9 +4019,13 @@ nfsd4_setclientid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (unconf) unhash_client_locked(unconf); /* We need to handle only case 1: probable callback update */ - if (conf && same_verf(&conf->cl_verifier, &clverifier)) { - copy_clid(new, conf); - gen_confirm(new, nn); + if (conf) { + if (same_verf(&conf->cl_verifier, &clverifier)) { + copy_clid(new, conf); + gen_confirm(new, nn); + } else + trace_nfsd_clid_verf_mismatch(conf, rqstp, + &clverifier); } new->cl_minorversion = 0; gen_callback(new, setclid, rqstp); diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index bec85fc8be01a..0ab46a0c911d2 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -564,6 +564,38 @@ TRACE_EVENT(nfsd_clid_cred_mismatch, ) )
+TRACE_EVENT(nfsd_clid_verf_mismatch, + TP_PROTO( + const struct nfs4_client *clp, + const struct svc_rqst *rqstp, + const nfs4_verifier *verf + ), + TP_ARGS(clp, rqstp, verf), + TP_STRUCT__entry( + __field(u32, cl_boot) + __field(u32, cl_id) + __array(unsigned char, cl_verifier, NFS4_VERIFIER_SIZE) + __array(unsigned char, new_verifier, NFS4_VERIFIER_SIZE) + __array(unsigned char, addr, sizeof(struct sockaddr_in6)) + ), + TP_fast_assign( + __entry->cl_boot = clp->cl_clientid.cl_boot; + __entry->cl_id = clp->cl_clientid.cl_id; + memcpy(__entry->cl_verifier, (void *)&clp->cl_verifier, + NFS4_VERIFIER_SIZE); + memcpy(__entry->new_verifier, (void *)verf, + NFS4_VERIFIER_SIZE); + memcpy(__entry->addr, &rqstp->rq_xprt->xpt_remote, + sizeof(struct sockaddr_in6)); + ), + TP_printk("client %08x:%08x verf=0x%s, updated=0x%s from addr=%pISpc", + __entry->cl_boot, __entry->cl_id, + __print_hex_str(__entry->cl_verifier, NFS4_VERIFIER_SIZE), + __print_hex_str(__entry->new_verifier, NFS4_VERIFIER_SIZE), + __entry->addr + ) +); + TRACE_EVENT(nfsd_clid_inuse_err, TP_PROTO(const struct nfs4_client *clp), TP_ARGS(clp),
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 0bfaacac57e64aa342f865b8ddcab06ca59a6f83 ]
This tracepoint has been replaced by nfsd_clid_cred_mismatch and nfsd_clid_verf_mismatch, and can simply be removed.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/trace.h | 24 ------------------------ 1 file changed, 24 deletions(-)
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 0ab46a0c911d2..2fac89e29f515 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -596,30 +596,6 @@ TRACE_EVENT(nfsd_clid_verf_mismatch, ) );
-TRACE_EVENT(nfsd_clid_inuse_err, - TP_PROTO(const struct nfs4_client *clp), - TP_ARGS(clp), - TP_STRUCT__entry( - __field(u32, cl_boot) - __field(u32, cl_id) - __array(unsigned char, addr, sizeof(struct sockaddr_in6)) - __field(unsigned int, namelen) - __dynamic_array(unsigned char, name, clp->cl_name.len) - ), - TP_fast_assign( - __entry->cl_boot = clp->cl_clientid.cl_boot; - __entry->cl_id = clp->cl_clientid.cl_id; - memcpy(__entry->addr, &clp->cl_addr, - sizeof(struct sockaddr_in6)); - __entry->namelen = clp->cl_name.len; - memcpy(__get_dynamic_array(name), clp->cl_name.data, - clp->cl_name.len); - ), - TP_printk("nfs4_clientid %.*s already in use by %pISpc, client %08x:%08x", - __entry->namelen, __get_str(name), __entry->addr, - __entry->cl_boot, __entry->cl_id) -) - /* * from fs/nfsd/filecache.h */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 7e3b32ace6094aadfa2e1e54ca4c6bbfd07646af ]
This replaces a dprintk call site in order to get greater visibility on when client IDs are confirmed or re-used. Simple example:
nfsd-995 [000] 126.622975: nfsd_compound: xid=0x3a34e2b1 opcnt=1 nfsd-995 [000] 126.623005: nfsd_cb_args: addr=192.168.2.51:45901 client 60958e3b:9213ef0e prog=1073741824 ident=1 nfsd-995 [000] 126.623007: nfsd_compound_status: op=1/1 OP_SETCLIENTID status=0 nfsd-996 [001] 126.623142: nfsd_compound: xid=0x3b34e2b1 opcnt=1
nfsd-996 [001] 126.623146: nfsd_clid_confirmed: client 60958e3b:9213ef0e
nfsd-996 [001] 126.623148: nfsd_cb_probe: addr=192.168.2.51:45901 client 60958e3b:9213ef0e state=UNKNOWN nfsd-996 [001] 126.623154: nfsd_compound_status: op=1/1 OP_SETCLIENTID_CONFIRM status=0
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 10 +++++----- fs/nfsd/trace.h | 1 + 2 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 8169625fdd233..b10593079e380 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -2838,14 +2838,14 @@ move_to_confirmed(struct nfs4_client *clp)
lockdep_assert_held(&nn->client_lock);
- dprintk("NFSD: move_to_confirm nfs4_client %p\n", clp); list_move(&clp->cl_idhash, &nn->conf_id_hashtbl[idhashval]); rb_erase(&clp->cl_namenode, &nn->unconf_name_tree); add_clp_to_name_tree(clp, &nn->conf_name_tree); - if (!test_and_set_bit(NFSD4_CLIENT_CONFIRMED, &clp->cl_flags) && - clp->cl_nfsd_dentry && - clp->cl_nfsd_info_dentry) - fsnotify_dentry(clp->cl_nfsd_info_dentry, FS_MODIFY); + if (!test_and_set_bit(NFSD4_CLIENT_CONFIRMED, &clp->cl_flags)) { + trace_nfsd_clid_confirmed(&clp->cl_clientid); + if (clp->cl_nfsd_dentry && clp->cl_nfsd_info_dentry) + fsnotify_dentry(clp->cl_nfsd_info_dentry, FS_MODIFY); + } renew_client_locked(clp); }
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 2fac89e29f515..2c0f0057f60c9 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -511,6 +511,7 @@ DEFINE_EVENT(nfsd_clientid_class, nfsd_clid_##name, \ TP_PROTO(const clientid_t *clid), \ TP_ARGS(clid))
+DEFINE_CLIENTID_EVENT(confirmed); DEFINE_CLIENTID_EVENT(expired); DEFINE_CLIENTID_EVENT(purged); DEFINE_CLIENTID_EVENT(renew);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit cee8aa074281e5269d8404be2b6388bb29ea8efc ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 1 + fs/nfsd/trace.h | 1 + 2 files changed, 2 insertions(+)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index b10593079e380..da5b9b88b0cd4 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -3981,6 +3981,7 @@ nfsd4_reclaim_complete(struct svc_rqst *rqstp, goto out;
status = nfs_ok; + trace_nfsd_clid_reclaim_complete(&clp->cl_clientid); nfsd4_client_record_create(clp); inc_reclaim_complete(clp); out: diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 2c0f0057f60c9..6c787f4ef5633 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -511,6 +511,7 @@ DEFINE_EVENT(nfsd_clientid_class, nfsd_clid_##name, \ TP_PROTO(const clientid_t *clid), \ TP_ARGS(clid))
+DEFINE_CLIENTID_EVENT(reclaim_complete); DEFINE_CLIENTID_EVENT(confirmed); DEFINE_CLIENTID_EVENT(expired); DEFINE_CLIENTID_EVENT(purged);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit c41a9b7a906fb872f8b2b1a34d2a1d5ef7f94adb ]
Record client-requested termination of client IDs.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 1 + fs/nfsd/trace.h | 1 + 2 files changed, 2 insertions(+)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index da5b9b88b0cd4..6f04a84f76c0e 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -3939,6 +3939,7 @@ nfsd4_destroy_clientid(struct svc_rqst *rqstp, status = nfserr_wrong_cred; goto out; } + trace_nfsd_clid_destroyed(&clp->cl_clientid); unhash_client_locked(clp); out: spin_unlock(&nn->client_lock); diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 6c787f4ef5633..3aca6dcba90a5 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -513,6 +513,7 @@ DEFINE_EVENT(nfsd_clientid_class, nfsd_clid_##name, \
DEFINE_CLIENTID_EVENT(reclaim_complete); DEFINE_CLIENTID_EVENT(confirmed); +DEFINE_CLIENTID_EVENT(destroyed); DEFINE_CLIENTID_EVENT(expired); DEFINE_CLIENTID_EVENT(purged); DEFINE_CLIENTID_EVENT(renew);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 2958d2ee71021b6c44212ec6c2a39cc71d9cd4a9 ]
Improve observation of NFSv4 lease expiry.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com [ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 9 ++++++--- fs/nfsd/trace.h | 3 ++- 2 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 6f04a84f76c0e..7e8752f4affda 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -2687,6 +2687,8 @@ static void force_expire_client(struct nfs4_client *clp) struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id); bool already_expired;
+ trace_nfsd_clid_admin_expired(&clp->cl_clientid); + spin_lock(&nn->client_lock); clp->cl_time = 0; spin_unlock(&nn->client_lock); @@ -3233,6 +3235,7 @@ nfsd4_exchange_id(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, status = mark_client_expired_locked(conf); if (status) goto out; + trace_nfsd_clid_replaced(&conf->cl_clientid); } new->cl_minorversion = cstate->minorversion; new->cl_spo_must_allow.u.words[0] = exid->spo_must_allow[0]; @@ -3472,6 +3475,7 @@ nfsd4_create_session(struct svc_rqst *rqstp, old = NULL; goto out_free_conn; } + trace_nfsd_clid_replaced(&old->cl_clientid); } move_to_confirmed(unconf); conf = unconf; @@ -4112,6 +4116,7 @@ nfsd4_setclientid_confirm(struct svc_rqst *rqstp, old = NULL; goto out; } + trace_nfsd_clid_replaced(&old->cl_clientid); } move_to_confirmed(unconf); conf = unconf; @@ -5550,10 +5555,8 @@ nfs4_laundromat(struct nfsd_net *nn) clp = list_entry(pos, struct nfs4_client, cl_lru); if (!state_expired(<, clp->cl_time)) break; - if (mark_client_expired_locked(clp)) { - trace_nfsd_clid_expired(&clp->cl_clientid); + if (mark_client_expired_locked(clp)) continue; - } list_add(&clp->cl_lru, &reaplist); } spin_unlock(&nn->client_lock); diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 3aca6dcba90a5..3271d925abf2e 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -514,7 +514,8 @@ DEFINE_EVENT(nfsd_clientid_class, nfsd_clid_##name, \ DEFINE_CLIENTID_EVENT(reclaim_complete); DEFINE_CLIENTID_EVENT(confirmed); DEFINE_CLIENTID_EVENT(destroyed); -DEFINE_CLIENTID_EVENT(expired); +DEFINE_CLIENTID_EVENT(admin_expired); +DEFINE_CLIENTID_EVENT(replaced); DEFINE_CLIENTID_EVENT(purged); DEFINE_CLIENTID_EVENT(renew); DEFINE_CLIENTID_EVENT(stale);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 237f91c85acef206a33bc02f3c4e856128fd7994 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 19 ++++++++----------- fs/nfsd/trace.h | 37 +++++++++++++++++++++++++++++++++++++ 2 files changed, 45 insertions(+), 11 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 7e8752f4affda..31310dbe9c1e2 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4008,11 +4008,9 @@ nfsd4_setclientid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, new = create_client(clname, rqstp, &clverifier); if (new == NULL) return nfserr_jukebox; - /* Cases below refer to rfc 3530 section 14.2.33: */ spin_lock(&nn->client_lock); conf = find_confirmed_client_by_name(&clname, nn); if (conf && client_has_state(conf)) { - /* case 0: */ status = nfserr_clid_inuse; if (clp_used_exchangeid(conf)) goto out; @@ -4024,7 +4022,6 @@ nfsd4_setclientid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, unconf = find_unconfirmed_client_by_name(&clname, nn); if (unconf) unhash_client_locked(unconf); - /* We need to handle only case 1: probable callback update */ if (conf) { if (same_verf(&conf->cl_verifier, &clverifier)) { copy_clid(new, conf); @@ -4032,7 +4029,8 @@ nfsd4_setclientid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, } else trace_nfsd_clid_verf_mismatch(conf, rqstp, &clverifier); - } + } else + trace_nfsd_clid_fresh(new); new->cl_minorversion = 0; gen_callback(new, setclid, rqstp); add_to_unconfirmed(new); @@ -4045,12 +4043,13 @@ nfsd4_setclientid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, spin_unlock(&nn->client_lock); if (new) free_client(new); - if (unconf) + if (unconf) { + trace_nfsd_clid_expire_unconf(&unconf->cl_clientid); expire_client(unconf); + } return status; }
- __be32 nfsd4_setclientid_confirm(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, @@ -4087,21 +4086,19 @@ nfsd4_setclientid_confirm(struct svc_rqst *rqstp, trace_nfsd_clid_cred_mismatch(conf, rqstp); goto out; } - /* cases below refer to rfc 3530 section 14.2.34: */ if (!unconf || !same_verf(&confirm, &unconf->cl_confirm)) { if (conf && same_verf(&confirm, &conf->cl_confirm)) { - /* case 2: probable retransmit */ status = nfs_ok; - } else /* case 4: client hasn't noticed we rebooted yet? */ + } else status = nfserr_stale_clientid; goto out; } status = nfs_ok; - if (conf) { /* case 1: callback update */ + if (conf) { old = unconf; unhash_client_locked(old); nfsd4_change_callback(conf, &unconf->cl_cb_conn); - } else { /* case 3: normal case; new or rebooted client */ + } else { old = find_confirmed_client_by_name(&unconf->cl_name, nn); if (old) { status = nfserr_clid_inuse; diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 3271d925abf2e..aa42d31cdfac1 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -511,6 +511,7 @@ DEFINE_EVENT(nfsd_clientid_class, nfsd_clid_##name, \ TP_PROTO(const clientid_t *clid), \ TP_ARGS(clid))
+DEFINE_CLIENTID_EVENT(expire_unconf); DEFINE_CLIENTID_EVENT(reclaim_complete); DEFINE_CLIENTID_EVENT(confirmed); DEFINE_CLIENTID_EVENT(destroyed); @@ -600,6 +601,42 @@ TRACE_EVENT(nfsd_clid_verf_mismatch, ) );
+DECLARE_EVENT_CLASS(nfsd_clid_class, + TP_PROTO(const struct nfs4_client *clp), + TP_ARGS(clp), + TP_STRUCT__entry( + __field(u32, cl_boot) + __field(u32, cl_id) + __array(unsigned char, addr, sizeof(struct sockaddr_in6)) + __field(unsigned long, flavor) + __array(unsigned char, verifier, NFS4_VERIFIER_SIZE) + __dynamic_array(char, name, clp->cl_name.len + 1) + ), + TP_fast_assign( + __entry->cl_boot = clp->cl_clientid.cl_boot; + __entry->cl_id = clp->cl_clientid.cl_id; + memcpy(__entry->addr, &clp->cl_addr, + sizeof(struct sockaddr_in6)); + __entry->flavor = clp->cl_cred.cr_flavor; + memcpy(__entry->verifier, (void *)&clp->cl_verifier, + NFS4_VERIFIER_SIZE); + memcpy(__get_str(name), clp->cl_name.data, clp->cl_name.len); + __get_str(name)[clp->cl_name.len] = '\0'; + ), + TP_printk("addr=%pISpc name='%s' verifier=0x%s flavor=%s client=%08x:%08x", + __entry->addr, __get_str(name), + __print_hex_str(__entry->verifier, NFS4_VERIFIER_SIZE), + show_nfsd_authflavor(__entry->flavor), + __entry->cl_boot, __entry->cl_id) +); + +#define DEFINE_CLID_EVENT(name) \ +DEFINE_EVENT(nfsd_clid_class, nfsd_clid_##name, \ + TP_PROTO(const struct nfs4_client *clp), \ + TP_ARGS(clp)) + +DEFINE_CLID_EVENT(fresh); + /* * from fs/nfsd/filecache.h */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit e8f80c5545ec5794644b48537449e48b009d608d ]
Some of the most common cases are traced. Enough infrastructure is now in place that more can be added later, as needed.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 12 +++++++++--- fs/nfsd/trace.h | 1 + 2 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 31310dbe9c1e2..adf476cbf36c3 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -3200,6 +3200,7 @@ nfsd4_exchange_id(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, } /* case 6 */ exid->flags |= EXCHGID4_FLAG_CONFIRMED_R; + trace_nfsd_clid_confirmed_r(conf); goto out_copy; } if (!creds_match) { /* case 3 */ @@ -3212,6 +3213,7 @@ nfsd4_exchange_id(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, } if (verfs_match) { /* case 2 */ conf->cl_exchange_flags |= EXCHGID4_FLAG_CONFIRMED_R; + trace_nfsd_clid_confirmed_r(conf); goto out_copy; } /* case 5, client reboot */ @@ -3225,11 +3227,13 @@ nfsd4_exchange_id(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, goto out; }
- unconf = find_unconfirmed_client_by_name(&exid->clname, nn); + unconf = find_unconfirmed_client_by_name(&exid->clname, nn); if (unconf) /* case 4, possible retry or client restart */ unhash_client_locked(unconf);
- /* case 1 (normal case) */ + /* case 1, new owner ID */ + trace_nfsd_clid_fresh(new); + out_new: if (conf) { status = mark_client_expired_locked(conf); @@ -3259,8 +3263,10 @@ nfsd4_exchange_id(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, out_nolock: if (new) expire_client(new); - if (unconf) + if (unconf) { + trace_nfsd_clid_expire_unconf(&unconf->cl_clientid); expire_client(unconf); + } return status; }
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index aa42d31cdfac1..de461c82dbf40 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -636,6 +636,7 @@ DEFINE_EVENT(nfsd_clid_class, nfsd_clid_##name, \ TP_ARGS(clp))
DEFINE_CLID_EVENT(fresh); +DEFINE_CLID_EVENT(confirmed_r);
/* * from fs/nfsd/filecache.h
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 1736aec82a15cb5d4b3bbe0b2fbae0ede66b1a1a ]
Enable knfsd_fh_hash() to be invoked in functions where the filehandle pointer is a const.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsfh.h | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h index aff2cda5c6c33..6106697adc04b 100644 --- a/fs/nfsd/nfsfh.h +++ b/fs/nfsd/nfsfh.h @@ -225,15 +225,12 @@ static inline bool fh_fsid_match(struct knfsd_fh *fh1, struct knfsd_fh *fh2) * returns a crc32 hash for the filehandle that is compatible with * the one displayed by "wireshark". */ - -static inline u32 -knfsd_fh_hash(struct knfsd_fh *fh) +static inline u32 knfsd_fh_hash(const struct knfsd_fh *fh) { return ~crc32_le(0xFFFFFFFF, (unsigned char *)&fh->fh_base, fh->fh_size); } #else -static inline u32 -knfsd_fh_hash(struct knfsd_fh *fh) +static inline u32 knfsd_fh_hash(const struct knfsd_fh *fh) { return 0; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 8476c69a7fa0f1f9705ec0caa4e97c08b5045779 ]
We were missing one.
As a clean-up, add a helper that sets the new CB state and fires a tracepoint. The tracepoint fires only when the state changes, to help reduce trace log noise.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4callback.c | 28 +++++++++++++++------------- 1 file changed, 15 insertions(+), 13 deletions(-)
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c index f5b7ad0847f20..2e6a4a9e59ca1 100644 --- a/fs/nfsd/nfs4callback.c +++ b/fs/nfsd/nfs4callback.c @@ -945,20 +945,26 @@ static int setup_callback_client(struct nfs4_client *clp, struct nfs4_cb_conn *c return 0; }
+static void nfsd4_mark_cb_state(struct nfs4_client *clp, int newstate) +{ + if (clp->cl_cb_state != newstate) { + clp->cl_cb_state = newstate; + trace_nfsd_cb_state(clp); + } +} + static void nfsd4_mark_cb_down(struct nfs4_client *clp, int reason) { if (test_bit(NFSD4_CLIENT_CB_UPDATE, &clp->cl_flags)) return; - clp->cl_cb_state = NFSD4_CB_DOWN; - trace_nfsd_cb_state(clp); + nfsd4_mark_cb_state(clp, NFSD4_CB_DOWN); }
static void nfsd4_mark_cb_fault(struct nfs4_client *clp, int reason) { if (test_bit(NFSD4_CLIENT_CB_UPDATE, &clp->cl_flags)) return; - clp->cl_cb_state = NFSD4_CB_FAULT; - trace_nfsd_cb_state(clp); + nfsd4_mark_cb_state(clp, NFSD4_CB_FAULT); }
static void nfsd4_cb_probe_done(struct rpc_task *task, void *calldata) @@ -968,10 +974,8 @@ static void nfsd4_cb_probe_done(struct rpc_task *task, void *calldata) trace_nfsd_cb_done(clp, task->tk_status); if (task->tk_status) nfsd4_mark_cb_down(clp, task->tk_status); - else { - clp->cl_cb_state = NFSD4_CB_UP; - trace_nfsd_cb_state(clp); - } + else + nfsd4_mark_cb_state(clp, NFSD4_CB_UP); }
static void nfsd4_cb_probe_release(void *calldata) @@ -995,8 +999,7 @@ static const struct rpc_call_ops nfsd4_cb_probe_ops = { */ void nfsd4_probe_callback(struct nfs4_client *clp) { - clp->cl_cb_state = NFSD4_CB_UNKNOWN; - trace_nfsd_cb_state(clp); + nfsd4_mark_cb_state(clp, NFSD4_CB_UNKNOWN); set_bit(NFSD4_CLIENT_CB_UPDATE, &clp->cl_flags); nfsd4_run_cb(&clp->cl_cb_null); } @@ -1009,11 +1012,10 @@ void nfsd4_probe_callback_sync(struct nfs4_client *clp)
void nfsd4_change_callback(struct nfs4_client *clp, struct nfs4_cb_conn *conn) { - clp->cl_cb_state = NFSD4_CB_UNKNOWN; + nfsd4_mark_cb_state(clp, NFSD4_CB_UNKNOWN); spin_lock(&clp->cl_lock); memcpy(&clp->cl_cb_conn, conn, sizeof(struct nfs4_cb_conn)); spin_unlock(&clp->cl_lock); - trace_nfsd_cb_state(clp); }
/* @@ -1345,7 +1347,7 @@ nfsd4_run_cb_work(struct work_struct *work) * Don't send probe messages for 4.1 or later. */ if (!cb->cb_ops && clp->cl_minorversion) { - clp->cl_cb_state = NFSD4_CB_UP; + nfsd4_mark_cb_state(clp, NFSD4_CB_UP); nfsd41_destroy_cb(cb); return; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 167145cc64ce4b4b177e636829909a6b14004f9e ]
TRACE_DEFINE_ENUM() is necessary for enum {} but not for C macros.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/trace.h | 5 ----- 1 file changed, 5 deletions(-)
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index de461c82dbf40..3683076e0fcd3 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -877,11 +877,6 @@ TRACE_EVENT(nfsd_cb_nodelegs, TP_printk("client %08x:%08x", __entry->cl_boot, __entry->cl_id) )
-TRACE_DEFINE_ENUM(NFSD4_CB_UP); -TRACE_DEFINE_ENUM(NFSD4_CB_UNKNOWN); -TRACE_DEFINE_ENUM(NFSD4_CB_DOWN); -TRACE_DEFINE_ENUM(NFSD4_CB_FAULT); - #define show_cb_state(val) \ __print_symbolic(val, \ { NFSD4_CB_UP, "UP" }, \
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 806d65b617d89be887fe68bfa051f78143669cd7 ]
Provide more clarity about when the callback channel is in trouble.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 2 ++ fs/nfsd/trace.h | 1 + 2 files changed, 3 insertions(+)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index adf476cbf36c3..a8aa3680605bb 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -1763,6 +1763,8 @@ static void nfsd4_conn_lost(struct svc_xpt_user *u) struct nfsd4_conn *c = container_of(u, struct nfsd4_conn, cn_xpt_user); struct nfs4_client *clp = c->cn_session->se_client;
+ trace_nfsd_cb_lost(clp); + spin_lock(&clp->cl_lock); if (!list_empty(&c->cn_persession)) { list_del(&c->cn_persession); diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 3683076e0fcd3..afffb4912acbc 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -912,6 +912,7 @@ DEFINE_EVENT(nfsd_cb_class, nfsd_cb_##name, \
DEFINE_NFSD_CB_EVENT(setup); DEFINE_NFSD_CB_EVENT(state); +DEFINE_NFSD_CB_EVENT(lost); DEFINE_NFSD_CB_EVENT(shutdown);
TRACE_DEFINE_ENUM(RPC_AUTH_NULL);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit b200f0e35338b052976b6c5759e4f77a3013e6f6 ]
Show when the upper layer requested a shutdown. RPC tracepoints can already show when rpc_shutdown_client() is called.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4callback.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c index 2e6a4a9e59ca1..2a2eb6184bdae 100644 --- a/fs/nfsd/nfs4callback.c +++ b/fs/nfsd/nfs4callback.c @@ -1233,6 +1233,9 @@ void nfsd4_destroy_callback_queue(void) /* must be called under the state lock */ void nfsd4_shutdown_callback(struct nfs4_client *clp) { + if (clp->cl_cb_state != NFSD4_CB_UNKNOWN) + trace_nfsd_cb_shutdown(clp); + set_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags); /* * Note this won't actually result in a null callback; @@ -1278,7 +1281,6 @@ static void nfsd4_process_cb_update(struct nfsd4_callback *cb) * kill the old client: */ if (clp->cl_cb_client) { - trace_nfsd_cb_shutdown(clp); rpc_shutdown_client(clp->cl_cb_client); clp->cl_cb_client = NULL; put_cred(clp->cl_cb_cred);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 9f57c6062bf3ce2c6ab9ba60040b34e8134ef259 ]
Display the transport protocol and authentication flavor so admins can see what they might be getting wrong.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4callback.c | 3 ++- fs/nfsd/trace.h | 27 ++++++++++++++++++++++++++- 2 files changed, 28 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c index 2a2eb6184bdae..fe1f36b70fa03 100644 --- a/fs/nfsd/nfs4callback.c +++ b/fs/nfsd/nfs4callback.c @@ -941,7 +941,8 @@ static int setup_callback_client(struct nfs4_client *clp, struct nfs4_cb_conn *c clp->cl_cb_conn.cb_xprt = conn->cb_xprt; clp->cl_cb_client = client; clp->cl_cb_cred = cred; - trace_nfsd_cb_setup(clp); + trace_nfsd_cb_setup(clp, rpc_peeraddr2str(client, RPC_DISPLAY_NETID), + args.authflavor); return 0; }
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index afffb4912acbc..86e0656bdb779 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -910,7 +910,6 @@ DEFINE_EVENT(nfsd_cb_class, nfsd_cb_##name, \ TP_PROTO(const struct nfs4_client *clp), \ TP_ARGS(clp))
-DEFINE_NFSD_CB_EVENT(setup); DEFINE_NFSD_CB_EVENT(state); DEFINE_NFSD_CB_EVENT(lost); DEFINE_NFSD_CB_EVENT(shutdown); @@ -931,6 +930,32 @@ TRACE_DEFINE_ENUM(RPC_AUTH_GSS_KRB5P); { RPC_AUTH_GSS_KRB5I, "krb5i" }, \ { RPC_AUTH_GSS_KRB5P, "krb5p" })
+TRACE_EVENT(nfsd_cb_setup, + TP_PROTO(const struct nfs4_client *clp, + const char *netid, + rpc_authflavor_t authflavor + ), + TP_ARGS(clp, netid, authflavor), + TP_STRUCT__entry( + __field(u32, cl_boot) + __field(u32, cl_id) + __field(unsigned long, authflavor) + __array(unsigned char, addr, sizeof(struct sockaddr_in6)) + __array(unsigned char, netid, 8) + ), + TP_fast_assign( + __entry->cl_boot = clp->cl_clientid.cl_boot; + __entry->cl_id = clp->cl_clientid.cl_id; + strlcpy(__entry->netid, netid, sizeof(__entry->netid)); + __entry->authflavor = authflavor; + memcpy(__entry->addr, &clp->cl_cb_conn.cb_addr, + sizeof(struct sockaddr_in6)); + ), + TP_printk("addr=%pISpc client %08x:%08x proto=%s flavor=%s", + __entry->addr, __entry->cl_boot, __entry->cl_id, + __entry->netid, show_nfsd_authflavor(__entry->authflavor)) +); + TRACE_EVENT(nfsd_cb_setup_err, TP_PROTO( const struct nfs4_client *clp,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 2cde7f8118f0fea29ad73ddcf28817f95adeffd5 ]
When the server kicks off a CB_LM_NOTIFY callback, record its arguments so we can better observe asynchronous locking behavior. For example:
nfsd-998 [002] 1471.705873: nfsd_cb_notify_lock: addr=192.168.2.51:0 client 6092a47c:35a43fc1 fh_hash=0x8950b23a
Signed-off-by: Chuck Lever chuck.lever@oracle.com Cc: Jeff Layton jlayton@redhat.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 4 +++- fs/nfsd/trace.h | 26 ++++++++++++++++++++++++++ 2 files changed, 29 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index a8aa3680605bb..89054fe68aca6 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -6494,8 +6494,10 @@ nfsd4_lm_notify(struct file_lock *fl) } spin_unlock(&nn->blocked_locks_lock);
- if (queue) + if (queue) { + trace_nfsd_cb_notify_lock(lo, nbl); nfsd4_run_cb(&nbl->nbl_cb); + } }
static const struct lock_manager_operations nfsd_posix_mng_ops = { diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 86e0656bdb779..bed7d5d49fee4 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -1027,6 +1027,32 @@ TRACE_EVENT(nfsd_cb_done, __entry->status) );
+TRACE_EVENT(nfsd_cb_notify_lock, + TP_PROTO( + const struct nfs4_lockowner *lo, + const struct nfsd4_blocked_lock *nbl + ), + TP_ARGS(lo, nbl), + TP_STRUCT__entry( + __field(u32, cl_boot) + __field(u32, cl_id) + __field(u32, fh_hash) + __array(unsigned char, addr, sizeof(struct sockaddr_in6)) + ), + TP_fast_assign( + const struct nfs4_client *clp = lo->lo_owner.so_client; + + __entry->cl_boot = clp->cl_clientid.cl_boot; + __entry->cl_id = clp->cl_clientid.cl_id; + __entry->fh_hash = knfsd_fh_hash(&nbl->nbl_fh); + memcpy(__entry->addr, &clp->cl_cb_conn.cb_addr, + sizeof(struct sockaddr_in6)); + ), + TP_printk("addr=%pISpc client %08x:%08x fh_hash=0x%08x", + __entry->addr, __entry->cl_boot, __entry->cl_id, + __entry->fh_hash) +); + #endif /* _NFSD_TRACE_H */
#undef TRACE_INCLUDE_PATH
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 87512386e951ee28ba2e7ef32b843ac97621d371 ]
Record the arguments of CB_OFFLOAD callbacks so we can better observe asynchronous copy-offload behavior. For example:
nfsd-995 [008] 7721.934222: nfsd_cb_offload: addr=192.168.2.51:0 client 6092a47c:35a43fc1 fh_hash=0x8739113a count=116528 status=0
Signed-off-by: Chuck Lever chuck.lever@oracle.com Cc: Olga Kornievskaia kolga@netapp.com Cc: Dai Ngo Dai.Ngo@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 2 ++ fs/nfsd/trace.h | 36 ++++++++++++++++++++++++++++++++++++ 2 files changed, 38 insertions(+)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index f85958f81a266..dfce9c432a5ee 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1489,6 +1489,8 @@ static int nfsd4_do_async_copy(void *data) memcpy(&cb_copy->fh, ©->fh, sizeof(copy->fh)); nfsd4_init_cb(&cb_copy->cp_cb, cb_copy->cp_clp, &nfsd4_cb_offload_ops, NFSPROC4_CLNT_CB_OFFLOAD); + trace_nfsd_cb_offload(copy->cp_clp, ©->cp_res.cb_stateid, + ©->fh, copy->cp_count, copy->nfserr); nfsd4_run_cb(&cb_copy->cp_cb); out: if (!copy->cp_intra) diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index bed7d5d49fee4..fe32dfe1e55af 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -1053,6 +1053,42 @@ TRACE_EVENT(nfsd_cb_notify_lock, __entry->fh_hash) );
+TRACE_EVENT(nfsd_cb_offload, + TP_PROTO( + const struct nfs4_client *clp, + const stateid_t *stp, + const struct knfsd_fh *fh, + u64 count, + __be32 status + ), + TP_ARGS(clp, stp, fh, count, status), + TP_STRUCT__entry( + __field(u32, cl_boot) + __field(u32, cl_id) + __field(u32, si_id) + __field(u32, si_generation) + __field(u32, fh_hash) + __field(int, status) + __field(u64, count) + __array(unsigned char, addr, sizeof(struct sockaddr_in6)) + ), + TP_fast_assign( + __entry->cl_boot = stp->si_opaque.so_clid.cl_boot; + __entry->cl_id = stp->si_opaque.so_clid.cl_id; + __entry->si_id = stp->si_opaque.so_id; + __entry->si_generation = stp->si_generation; + __entry->fh_hash = knfsd_fh_hash(fh); + __entry->status = be32_to_cpu(status); + __entry->count = count; + memcpy(__entry->addr, &clp->cl_cb_conn.cb_addr, + sizeof(struct sockaddr_in6)); + ), + TP_printk("addr=%pISpc client %08x:%08x stateid %08x:%08x fh_hash=0x%08x count=%llu status=%d", + __entry->addr, __entry->cl_boot, __entry->cl_id, + __entry->si_id, __entry->si_generation, + __entry->fh_hash, __entry->count, __entry->status) +); + #endif /* _NFSD_TRACE_H */
#undef TRACE_INCLUDE_PATH
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 17d76ddf76e4972411402743eea7243d9a46f4f9 ]
Renamed so it can be enabled as a set with the other nfsd_cb_ tracepoints. And, consistent with those tracepoints, report the address of the client, the client ID the server has given it, and the state ID being recalled.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 2 +- fs/nfsd/trace.h | 32 +++++++++++++++++++++++++++++++- 2 files changed, 32 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 89054fe68aca6..09967037eb1a3 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4675,7 +4675,7 @@ nfsd_break_deleg_cb(struct file_lock *fl) struct nfs4_delegation *dp = (struct nfs4_delegation *)fl->fl_owner; struct nfs4_file *fp = dp->dl_stid.sc_file;
- trace_nfsd_deleg_break(&dp->dl_stid.sc_stateid); + trace_nfsd_cb_recall(&dp->dl_stid);
/* * We don't want the locks code to timeout the lease for us; diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index fe32dfe1e55af..9061fd28c996d 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -459,7 +459,6 @@ DEFINE_STATEID_EVENT(layout_recall_release);
DEFINE_STATEID_EVENT(open); DEFINE_STATEID_EVENT(deleg_read); -DEFINE_STATEID_EVENT(deleg_break); DEFINE_STATEID_EVENT(deleg_recall);
DECLARE_EVENT_CLASS(nfsd_stateseqid_class, @@ -1027,6 +1026,37 @@ TRACE_EVENT(nfsd_cb_done, __entry->status) );
+TRACE_EVENT(nfsd_cb_recall, + TP_PROTO( + const struct nfs4_stid *stid + ), + TP_ARGS(stid), + TP_STRUCT__entry( + __field(u32, cl_boot) + __field(u32, cl_id) + __field(u32, si_id) + __field(u32, si_generation) + __array(unsigned char, addr, sizeof(struct sockaddr_in6)) + ), + TP_fast_assign( + const stateid_t *stp = &stid->sc_stateid; + const struct nfs4_client *clp = stid->sc_client; + + __entry->cl_boot = stp->si_opaque.so_clid.cl_boot; + __entry->cl_id = stp->si_opaque.so_clid.cl_id; + __entry->si_id = stp->si_opaque.so_id; + __entry->si_generation = stp->si_generation; + if (clp) + memcpy(__entry->addr, &clp->cl_cb_conn.cb_addr, + sizeof(struct sockaddr_in6)); + else + memset(__entry->addr, 0, sizeof(struct sockaddr_in6)); + ), + TP_printk("addr=%pISpc client %08x:%08x stateid %08x:%08x", + __entry->addr, __entry->cl_boot, __entry->cl_id, + __entry->si_id, __entry->si_generation) +); + TRACE_EVENT(nfsd_cb_notify_lock, TP_PROTO( const struct nfs4_lockowner *lo,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 4ade892ae1c35527584decb7fa026553d53cd03f ]
Record a tracepoint event when the server performs a callback probe. This event can be enabled as a group with other nfsd_cb tracepoints.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4callback.c | 1 + fs/nfsd/trace.h | 1 + 2 files changed, 2 insertions(+)
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c index fe1f36b70fa03..453f60b127ebb 100644 --- a/fs/nfsd/nfs4callback.c +++ b/fs/nfsd/nfs4callback.c @@ -1000,6 +1000,7 @@ static const struct rpc_call_ops nfsd4_cb_probe_ops = { */ void nfsd4_probe_callback(struct nfs4_client *clp) { + trace_nfsd_cb_probe(clp); nfsd4_mark_cb_state(clp, NFSD4_CB_UNKNOWN); set_bit(NFSD4_CLIENT_CB_UPDATE, &clp->cl_flags); nfsd4_run_cb(&clp->cl_cb_null); diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 9061fd28c996d..4361a0807f070 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -910,6 +910,7 @@ DEFINE_EVENT(nfsd_cb_class, nfsd_cb_##name, \ TP_ARGS(clp))
DEFINE_NFSD_CB_EVENT(state); +DEFINE_NFSD_CB_EVENT(probe); DEFINE_NFSD_CB_EVENT(lost); DEFINE_NFSD_CB_EVENT(shutdown);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 1d2bf65983a137121c165a7e69b2885572954915 ]
Clean up: These are noise in properly working systems. If you really need to observe the operation of the callback mechanism, use the sunrpc:rpc* tracepoints along with the workqueue tracepoints.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4callback.c | 5 ----- fs/nfsd/trace.h | 48 ------------------------------------------ 2 files changed, 53 deletions(-)
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c index 453f60b127ebb..59dc80ecd3764 100644 --- a/fs/nfsd/nfs4callback.c +++ b/fs/nfsd/nfs4callback.c @@ -972,7 +972,6 @@ static void nfsd4_cb_probe_done(struct rpc_task *task, void *calldata) { struct nfs4_client *clp = container_of(calldata, struct nfs4_client, cl_cb_null);
- trace_nfsd_cb_done(clp, task->tk_status); if (task->tk_status) nfsd4_mark_cb_down(clp, task->tk_status); else @@ -1174,8 +1173,6 @@ static void nfsd4_cb_done(struct rpc_task *task, void *calldata) struct nfsd4_callback *cb = calldata; struct nfs4_client *clp = cb->cb_clp;
- trace_nfsd_cb_done(clp, task->tk_status); - if (!nfsd4_cb_sequence_done(task, cb)) return;
@@ -1328,8 +1325,6 @@ nfsd4_run_cb_work(struct work_struct *work) struct rpc_clnt *clnt; int flags;
- trace_nfsd_cb_work(clp, cb->cb_msg.rpc_proc->p_name); - if (cb->cb_need_restart) { cb->cb_need_restart = false; } else { diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 4361a0807f070..87ac1f19bfd0b 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -979,54 +979,6 @@ TRACE_EVENT(nfsd_cb_setup_err, __entry->addr, __entry->cl_boot, __entry->cl_id, __entry->error) );
-TRACE_EVENT(nfsd_cb_work, - TP_PROTO( - const struct nfs4_client *clp, - const char *procedure - ), - TP_ARGS(clp, procedure), - TP_STRUCT__entry( - __field(u32, cl_boot) - __field(u32, cl_id) - __string(procedure, procedure) - __array(unsigned char, addr, sizeof(struct sockaddr_in6)) - ), - TP_fast_assign( - __entry->cl_boot = clp->cl_clientid.cl_boot; - __entry->cl_id = clp->cl_clientid.cl_id; - __assign_str(procedure, procedure) - memcpy(__entry->addr, &clp->cl_cb_conn.cb_addr, - sizeof(struct sockaddr_in6)); - ), - TP_printk("addr=%pISpc client %08x:%08x procedure=%s", - __entry->addr, __entry->cl_boot, __entry->cl_id, - __get_str(procedure)) -); - -TRACE_EVENT(nfsd_cb_done, - TP_PROTO( - const struct nfs4_client *clp, - int status - ), - TP_ARGS(clp, status), - TP_STRUCT__entry( - __field(u32, cl_boot) - __field(u32, cl_id) - __field(int, status) - __array(unsigned char, addr, sizeof(struct sockaddr_in6)) - ), - TP_fast_assign( - __entry->cl_boot = clp->cl_clientid.cl_boot; - __entry->cl_id = clp->cl_clientid.cl_id; - __entry->status = status; - memcpy(__entry->addr, &clp->cl_cb_conn.cb_addr, - sizeof(struct sockaddr_in6)); - ), - TP_printk("addr=%pISpc client %08x:%08x status=%d", - __entry->addr, __entry->cl_boot, __entry->cl_id, - __entry->status) -); - TRACE_EVENT(nfsd_cb_recall, TP_PROTO( const struct nfs4_stid *stid
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit d6cbe98ff32aef795462a309ef048cfb89d1a11d ]
Clean-up: Re-order the display of IP address and client ID to be consistent with other _cb_ tracepoints.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/trace.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 87ac1f19bfd0b..68a0fecdd5f46 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -857,9 +857,9 @@ TRACE_EVENT(nfsd_cb_args, memcpy(__entry->addr, &conn->cb_addr, sizeof(struct sockaddr_in6)); ), - TP_printk("client %08x:%08x callback addr=%pISpc prog=%u ident=%u", - __entry->cl_boot, __entry->cl_id, - __entry->addr, __entry->prog, __entry->ident) + TP_printk("addr=%pISpc client %08x:%08x prog=%u ident=%u", + __entry->addr, __entry->cl_boot, __entry->cl_id, + __entry->prog, __entry->ident) );
TRACE_EVENT(nfsd_cb_nodelegs,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yu Hsiang Huang nickhuang@synology.com
[ Upstream commit e5d74a2d0ee67ae00edad43c3d7811016e4d2e21 ]
Truncation of an unlinked inode may take a long time for I/O waiting, and it doesn't have to prevent access to the directory. Thus, let truncation occur outside the directory's mutex, just like do_unlinkat() does.
Signed-off-by: Yu Hsiang Huang nickhuang@synology.com Signed-off-by: Bing Jing Chang bingjingc@synology.com Signed-off-by: Robbie Ko robbieko@synology.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/vfs.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 520e55c35e742..2eb3bfbc8a35f 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1870,6 +1870,7 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type, { struct dentry *dentry, *rdentry; struct inode *dirp; + struct inode *rinode; __be32 err; int host_err;
@@ -1898,6 +1899,8 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type, host_err = -ENOENT; goto out_drop_write; } + rinode = d_inode(rdentry); + ihold(rinode);
if (!type) type = d_inode(rdentry)->i_mode & S_IFMT; @@ -1913,6 +1916,8 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type, if (!host_err) host_err = commit_metadata(fhp); dput(rdentry); + fh_unlock(fhp); + iput(rinode); /* truncate the inode here */
out_drop_write: fh_drop_write(fhp);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit eeeadbb9bd5652c47bb9b31aa9ad8b4f1b4aa8b3 ]
The commit may be time-consuming and there's no need to hold the lock for it.
More of these are possible, these were just some easy ones.
Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/vfs.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 2eb3bfbc8a35f..74b2c6c5ad0b9 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1626,9 +1626,9 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
host_err = vfs_symlink(d_inode(dentry), dnew, path); err = nfserrno(host_err); + fh_unlock(fhp); if (!err) err = nfserrno(commit_metadata(fhp)); - fh_unlock(fhp);
fh_drop_write(fhp);
@@ -1693,6 +1693,7 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp, if (d_really_is_negative(dold)) goto out_dput; host_err = vfs_link(dold, dirp, dnew, NULL); + fh_unlock(ffhp); if (!host_err) { err = nfserrno(commit_metadata(ffhp)); if (!err) @@ -1913,10 +1914,10 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type, host_err = vfs_rmdir(dirp, rdentry); }
+ fh_unlock(fhp); if (!host_err) host_err = commit_metadata(fhp); dput(rdentry); - fh_unlock(fhp); iput(rinode); /* truncate the inode here */
out_drop_write:
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Olga Kornievskaia kolga@netapp.com
[ Upstream commit eac0b17a77fbd763d305a5eaa4fd1119e5a0fe0d ]
Currently, the server does all copies as NFS_UNSTABLE. For synchronous copies linux client will append a COMMIT to the COPY compound but for async copies it does not (because COMMIT needs to be done after all bytes are copied and not as a reply to the COPY operation).
However, in order to save the client doing a COMMIT as a separate rpc, the server can reply back with NFS_FILE_SYNC copy. This patch proposed to add vfs_fsync() call at the end of the async copy.
Signed-off-by: Olga Kornievskaia kolga@netapp.com Signed-off-by: J. Bruce Fields bfields@redhat.com [ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 15 ++++++++++++++- fs/nfsd/xdr4.h | 1 + 2 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index dfce9c432a5ee..aa0da0737a3ff 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1366,7 +1366,8 @@ static const struct nfsd4_callback_ops nfsd4_cb_offload_ops = {
static void nfsd4_init_copy_res(struct nfsd4_copy *copy, bool sync) { - copy->cp_res.wr_stable_how = NFS_UNSTABLE; + copy->cp_res.wr_stable_how = + copy->committed ? NFS_FILE_SYNC : NFS_UNSTABLE; copy->cp_synchronous = sync; gen_boot_verifier(©->cp_res.wr_verifier, copy->cp_clp->net); } @@ -1375,10 +1376,12 @@ static ssize_t _nfsd_copy_file_range(struct nfsd4_copy *copy) { struct file *dst = copy->nf_dst->nf_file; struct file *src = copy->nf_src->nf_file; + errseq_t since; ssize_t bytes_copied = 0; u64 bytes_total = copy->cp_count; u64 src_pos = copy->cp_src_pos; u64 dst_pos = copy->cp_dst_pos; + __be32 status;
/* See RFC 7862 p.67: */ if (bytes_total == 0) @@ -1395,6 +1398,16 @@ static ssize_t _nfsd_copy_file_range(struct nfsd4_copy *copy) src_pos += bytes_copied; dst_pos += bytes_copied; } while (bytes_total > 0 && !copy->cp_synchronous); + /* for a non-zero asynchronous copy do a commit of data */ + if (!copy->cp_synchronous && copy->cp_res.wr_bytes_written > 0) { + since = READ_ONCE(dst->f_wb_err); + status = vfs_fsync_range(dst, copy->cp_dst_pos, + copy->cp_res.wr_bytes_written, 0); + if (!status) + status = filemap_check_wb_err(dst->f_mapping, since); + if (!status) + copy->committed = true; + } return bytes_copied; }
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index fe540a3415c6a..37af86e370cb3 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -567,6 +567,7 @@ struct nfsd4_copy { struct vfsmount *ss_mnt; struct nfs_fh c_fh; nfs4_stateid stateid; + bool committed; };
struct nfsd4_seek {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dai Ngo dai.ngo@oracle.com
[ Upstream commit f4e44b393389c77958f7c58bf4415032b4cda15b ]
Currently the source's export is mounted and unmounted on every inter-server copy operation. This patch is an enhancement to delay the unmount of the source export for a certain period of time to eliminate the mount and unmount overhead on subsequent copy operations.
After a copy operation completes, a work entry is added to the delayed unmount list with an expiration time. This list is serviced by the laundromat thread to unmount the export of the expired entries. Each time the export is being used again, its expiration time is extended and the entry is re-inserted to the tail of the list.
The unmount task and the mount operation of the copy request are synced to make sure the export is not unmounted while it's being used.
Signed-off-by: Dai Ngo dai.ngo@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/netns.h | 6 ++ fs/nfsd/nfs4proc.c | 135 ++++++++++++++++++++++++++++++++++++++-- fs/nfsd/nfs4state.c | 71 +++++++++++++++++++++ fs/nfsd/nfsd.h | 4 ++ fs/nfsd/nfssvc.c | 3 + include/linux/nfs_ssc.h | 14 +++++ 6 files changed, 229 insertions(+), 4 deletions(-)
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h index a75abeb1e6988..935c1028c2175 100644 --- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h @@ -176,6 +176,12 @@ struct nfsd_net { unsigned int longest_chain_cachesize;
struct shrinker nfsd_reply_cache_shrinker; + + /* tracking server-to-server copy mounts */ + spinlock_t nfsd_ssc_lock; + struct list_head nfsd_ssc_mount_list; + wait_queue_head_t nfsd_ssc_waitq; + /* utsname taken from the process that starts the server */ char nfsd_name[UNX_MAXNODENAME+1]; }; diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index aa0da0737a3ff..573c550e7aceb 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -55,6 +55,13 @@ module_param(inter_copy_offload_enable, bool, 0644); MODULE_PARM_DESC(inter_copy_offload_enable, "Enable inter server to server copy offload. Default: false");
+#ifdef CONFIG_NFSD_V4_2_INTER_SSC +static int nfsd4_ssc_umount_timeout = 900000; /* default to 15 mins */ +module_param(nfsd4_ssc_umount_timeout, int, 0644); +MODULE_PARM_DESC(nfsd4_ssc_umount_timeout, + "idle msecs before unmount export from source server"); +#endif + #ifdef CONFIG_NFSD_V4_SECURITY_LABEL #include <linux/security.h>
@@ -1168,6 +1175,81 @@ extern void nfs_sb_deactive(struct super_block *sb);
#define NFSD42_INTERSSC_MOUNTOPS "vers=4.2,addr=%s,sec=sys"
+/* + * setup a work entry in the ssc delayed unmount list. + */ +static int nfsd4_ssc_setup_dul(struct nfsd_net *nn, char *ipaddr, + struct nfsd4_ssc_umount_item **retwork, struct vfsmount **ss_mnt) +{ + struct nfsd4_ssc_umount_item *ni = 0; + struct nfsd4_ssc_umount_item *work = NULL; + struct nfsd4_ssc_umount_item *tmp; + DEFINE_WAIT(wait); + + *ss_mnt = NULL; + *retwork = NULL; + work = kzalloc(sizeof(*work), GFP_KERNEL); +try_again: + spin_lock(&nn->nfsd_ssc_lock); + list_for_each_entry_safe(ni, tmp, &nn->nfsd_ssc_mount_list, nsui_list) { + if (strncmp(ni->nsui_ipaddr, ipaddr, sizeof(ni->nsui_ipaddr))) + continue; + /* found a match */ + if (ni->nsui_busy) { + /* wait - and try again */ + prepare_to_wait(&nn->nfsd_ssc_waitq, &wait, + TASK_INTERRUPTIBLE); + spin_unlock(&nn->nfsd_ssc_lock); + + /* allow 20secs for mount/unmount for now - revisit */ + if (signal_pending(current) || + (schedule_timeout(20*HZ) == 0)) { + kfree(work); + return nfserr_eagain; + } + finish_wait(&nn->nfsd_ssc_waitq, &wait); + goto try_again; + } + *ss_mnt = ni->nsui_vfsmount; + refcount_inc(&ni->nsui_refcnt); + spin_unlock(&nn->nfsd_ssc_lock); + kfree(work); + + /* return vfsmount in ss_mnt */ + return 0; + } + if (work) { + strncpy(work->nsui_ipaddr, ipaddr, sizeof(work->nsui_ipaddr)); + refcount_set(&work->nsui_refcnt, 2); + work->nsui_busy = true; + list_add_tail(&work->nsui_list, &nn->nfsd_ssc_mount_list); + *retwork = work; + } + spin_unlock(&nn->nfsd_ssc_lock); + return 0; +} + +static void nfsd4_ssc_update_dul_work(struct nfsd_net *nn, + struct nfsd4_ssc_umount_item *work, struct vfsmount *ss_mnt) +{ + /* set nsui_vfsmount, clear busy flag and wakeup waiters */ + spin_lock(&nn->nfsd_ssc_lock); + work->nsui_vfsmount = ss_mnt; + work->nsui_busy = false; + wake_up_all(&nn->nfsd_ssc_waitq); + spin_unlock(&nn->nfsd_ssc_lock); +} + +static void nfsd4_ssc_cancel_dul_work(struct nfsd_net *nn, + struct nfsd4_ssc_umount_item *work) +{ + spin_lock(&nn->nfsd_ssc_lock); + list_del(&work->nsui_list); + wake_up_all(&nn->nfsd_ssc_waitq); + spin_unlock(&nn->nfsd_ssc_lock); + kfree(work); +} + /* * Support one copy source server for now. */ @@ -1184,6 +1266,8 @@ nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp, char *ipaddr, *dev_name, *raw_data; int len, raw_len; __be32 status = nfserr_inval; + struct nfsd4_ssc_umount_item *work = NULL; + struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
naddr = &nss->u.nl4_addr; tmp_addrlen = rpc_uaddr2sockaddr(SVC_NET(rqstp), naddr->addr, @@ -1232,12 +1316,23 @@ nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp, goto out_free_rawdata; snprintf(dev_name, len + 5, "%s%s%s:/", startsep, ipaddr, endsep);
+ status = nfsd4_ssc_setup_dul(nn, ipaddr, &work, &ss_mnt); + if (status) + goto out_free_devname; + if (ss_mnt) + goto out_done; + /* Use an 'internal' mount: SB_KERNMOUNT -> MNT_INTERNAL */ ss_mnt = vfs_kern_mount(type, SB_KERNMOUNT, dev_name, raw_data); module_put(type->owner); - if (IS_ERR(ss_mnt)) + if (IS_ERR(ss_mnt)) { + if (work) + nfsd4_ssc_cancel_dul_work(nn, work); goto out_free_devname; - + } + if (work) + nfsd4_ssc_update_dul_work(nn, work, ss_mnt); +out_done: status = 0; *mount = ss_mnt;
@@ -1297,10 +1392,42 @@ static void nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct nfsd_file *src, struct nfsd_file *dst) { + bool found = false; + long timeout; + struct nfsd4_ssc_umount_item *tmp; + struct nfsd4_ssc_umount_item *ni = 0; + struct nfsd_net *nn = net_generic(dst->nf_net, nfsd_net_id); + nfs42_ssc_close(src->nf_file); - fput(src->nf_file); nfsd_file_put(dst); - mntput(ss_mnt); + fput(src->nf_file); + + if (!nn) { + mntput(ss_mnt); + return; + } + spin_lock(&nn->nfsd_ssc_lock); + timeout = msecs_to_jiffies(nfsd4_ssc_umount_timeout); + list_for_each_entry_safe(ni, tmp, &nn->nfsd_ssc_mount_list, nsui_list) { + if (ni->nsui_vfsmount->mnt_sb == ss_mnt->mnt_sb) { + list_del(&ni->nsui_list); + /* + * vfsmount can be shared by multiple exports, + * decrement refcnt. If the count drops to 1 it + * will be unmounted when nsui_expire expires. + */ + refcount_dec(&ni->nsui_refcnt); + ni->nsui_expire = jiffies + timeout; + list_add_tail(&ni->nsui_list, &nn->nfsd_ssc_mount_list); + found = true; + break; + } + } + spin_unlock(&nn->nfsd_ssc_lock); + if (!found) { + mntput(ss_mnt); + return; + } }
#else /* CONFIG_NFSD_V4_2_INTER_SSC */ diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 09967037eb1a3..3dd6e25d5d90f 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -44,6 +44,7 @@ #include <linux/jhash.h> #include <linux/string_helpers.h> #include <linux/fsnotify.h> +#include <linux/nfs_ssc.h> #include "xdr4.h" #include "xdr4cb.h" #include "vfs.h" @@ -5522,6 +5523,69 @@ static bool state_expired(struct laundry_time *lt, time64_t last_refresh) return false; }
+#ifdef CONFIG_NFSD_V4_2_INTER_SSC +void nfsd4_ssc_init_umount_work(struct nfsd_net *nn) +{ + spin_lock_init(&nn->nfsd_ssc_lock); + INIT_LIST_HEAD(&nn->nfsd_ssc_mount_list); + init_waitqueue_head(&nn->nfsd_ssc_waitq); +} +EXPORT_SYMBOL_GPL(nfsd4_ssc_init_umount_work); + +/* + * This is called when nfsd is being shutdown, after all inter_ssc + * cleanup were done, to destroy the ssc delayed unmount list. + */ +static void nfsd4_ssc_shutdown_umount(struct nfsd_net *nn) +{ + struct nfsd4_ssc_umount_item *ni = 0; + struct nfsd4_ssc_umount_item *tmp; + + spin_lock(&nn->nfsd_ssc_lock); + list_for_each_entry_safe(ni, tmp, &nn->nfsd_ssc_mount_list, nsui_list) { + list_del(&ni->nsui_list); + spin_unlock(&nn->nfsd_ssc_lock); + mntput(ni->nsui_vfsmount); + kfree(ni); + spin_lock(&nn->nfsd_ssc_lock); + } + spin_unlock(&nn->nfsd_ssc_lock); +} + +static void nfsd4_ssc_expire_umount(struct nfsd_net *nn) +{ + bool do_wakeup = false; + struct nfsd4_ssc_umount_item *ni = 0; + struct nfsd4_ssc_umount_item *tmp; + + spin_lock(&nn->nfsd_ssc_lock); + list_for_each_entry_safe(ni, tmp, &nn->nfsd_ssc_mount_list, nsui_list) { + if (time_after(jiffies, ni->nsui_expire)) { + if (refcount_read(&ni->nsui_refcnt) > 1) + continue; + + /* mark being unmount */ + ni->nsui_busy = true; + spin_unlock(&nn->nfsd_ssc_lock); + mntput(ni->nsui_vfsmount); + spin_lock(&nn->nfsd_ssc_lock); + + /* waiters need to start from begin of list */ + list_del(&ni->nsui_list); + kfree(ni); + + /* wakeup ssc_connect waiters */ + do_wakeup = true; + continue; + } + break; + } + if (do_wakeup) + wake_up_all(&nn->nfsd_ssc_waitq); + spin_unlock(&nn->nfsd_ssc_lock); +} +#endif + static time64_t nfs4_laundromat(struct nfsd_net *nn) { @@ -5631,6 +5695,10 @@ nfs4_laundromat(struct nfsd_net *nn) list_del_init(&nbl->nbl_lru); free_blocked_lock(nbl); } +#ifdef CONFIG_NFSD_V4_2_INTER_SSC + /* service the server-to-server copy delayed unmount list */ + nfsd4_ssc_expire_umount(nn); +#endif out: return max_t(time64_t, lt.new_timeo, NFSD_LAUNDROMAT_MINTIMEOUT); } @@ -7546,6 +7614,9 @@ nfs4_state_shutdown_net(struct net *net)
nfsd4_client_tracking_exit(net); nfs4_state_destroy_net(net); +#ifdef CONFIG_NFSD_V4_2_INTER_SSC + nfsd4_ssc_shutdown_umount(nn); +#endif }
void diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 14dbfa75059d5..9664303afdaf3 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -484,6 +484,10 @@ static inline bool nfsd_attrs_supported(u32 minorversion, const u32 *bmval) extern int nfsd4_is_junction(struct dentry *dentry); extern int register_cld_notifier(void); extern void unregister_cld_notifier(void); +#ifdef CONFIG_NFSD_V4_2_INTER_SSC +extern void nfsd4_ssc_init_umount_work(struct nfsd_net *nn); +#endif + #else /* CONFIG_NFSD_V4 */ static inline int nfsd4_is_junction(struct dentry *dentry) { diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 731d89898903a..373695cc62a7a 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -403,6 +403,9 @@ static int nfsd_startup_net(struct net *net, const struct cred *cred) if (ret) goto out_filecache;
+#ifdef CONFIG_NFSD_V4_2_INTER_SSC + nfsd4_ssc_init_umount_work(nn); +#endif nn->nfsd_net_up = true; return 0;
diff --git a/include/linux/nfs_ssc.h b/include/linux/nfs_ssc.h index f5ba0fbff72fe..222ae8883e854 100644 --- a/include/linux/nfs_ssc.h +++ b/include/linux/nfs_ssc.h @@ -8,6 +8,7 @@ */
#include <linux/nfs_fs.h> +#include <linux/sunrpc/svc.h>
extern struct nfs_ssc_client_ops_tbl nfs_ssc_client_tbl;
@@ -52,6 +53,19 @@ static inline void nfs42_ssc_close(struct file *filep) if (nfs_ssc_client_tbl.ssc_nfs4_ops) (*nfs_ssc_client_tbl.ssc_nfs4_ops->sco_close)(filep); } + +struct nfsd4_ssc_umount_item { + struct list_head nsui_list; + bool nsui_busy; + /* + * nsui_refcnt inited to 2, 1 on list and 1 for consumer. Entry + * is removed when refcnt drops to 1 and nsui_expire expires. + */ + refcount_t nsui_refcnt; + unsigned long nsui_expire; + struct vfsmount *nsui_vfsmount; + char nsui_ipaddr[RPC_MAX_ADDRBUFLEN]; +}; #endif
/*
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit 934bd07fae7e55232845f909f78873ab8678ca74 ]
This was causing a "sleeping function called from invalid context" warning.
I don't think we need the set_and_test_bit() here; clients move from unconfirmed to confirmed only once, under the client_lock.
The (conf == unconf) is a way to check whether we're in that confirming case, hopefully that's not too obscure.
Fixes: 472d155a0631 "nfsd: report client confirmation status in "info" file" Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 3dd6e25d5d90f..4e14a9f6dfd39 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -2846,11 +2846,8 @@ move_to_confirmed(struct nfs4_client *clp) list_move(&clp->cl_idhash, &nn->conf_id_hashtbl[idhashval]); rb_erase(&clp->cl_namenode, &nn->unconf_name_tree); add_clp_to_name_tree(clp, &nn->conf_name_tree); - if (!test_and_set_bit(NFSD4_CLIENT_CONFIRMED, &clp->cl_flags)) { - trace_nfsd_clid_confirmed(&clp->cl_clientid); - if (clp->cl_nfsd_dentry && clp->cl_nfsd_info_dentry) - fsnotify_dentry(clp->cl_nfsd_info_dentry, FS_MODIFY); - } + set_bit(NFSD4_CLIENT_CONFIRMED, &clp->cl_flags); + trace_nfsd_clid_confirmed(&clp->cl_clientid); renew_client_locked(clp); }
@@ -3509,6 +3506,8 @@ nfsd4_create_session(struct svc_rqst *rqstp, /* cache solo and embedded create sessions under the client_lock */ nfsd4_cache_create_session(cr_ses, cs_slot, status); spin_unlock(&nn->client_lock); + if (conf == unconf) + fsnotify_dentry(conf->cl_nfsd_info_dentry, FS_MODIFY); /* init connection and backchannel */ nfsd4_init_conn(rqstp, conn, new); nfsd4_put_session(new); @@ -4129,6 +4128,8 @@ nfsd4_setclientid_confirm(struct svc_rqst *rqstp, } get_client_locked(conf); spin_unlock(&nn->client_lock); + if (conf == unconf) + fsnotify_dentry(conf->cl_nfsd_info_dentry, FS_MODIFY); nfsd4_probe_callback(conf); spin_lock(&nn->client_lock); put_client_renew_locked(conf);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Wysochanski dwysocha@redhat.com
[ Upstream commit 3518c8666f15cdd5d38878005dab1d589add1c19 ]
In addition to the client's address, display the callback channel state and address in the 'info' file.
Signed-off-by: Dave Wysochanski dwysocha@redhat.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 4e14a9f6dfd39..a20cdb1910048 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -2376,6 +2376,21 @@ static void seq_quote_mem(struct seq_file *m, char *data, int len) seq_printf(m, """); }
+static const char *cb_state2str(int state) +{ + switch (state) { + case NFSD4_CB_UP: + return "UP"; + case NFSD4_CB_UNKNOWN: + return "UNKNOWN"; + case NFSD4_CB_DOWN: + return "DOWN"; + case NFSD4_CB_FAULT: + return "FAULT"; + } + return "UNDEFINED"; +} + static int client_info_show(struct seq_file *m, void *v) { struct inode *inode = m->private; @@ -2404,6 +2419,8 @@ static int client_info_show(struct seq_file *m, void *v) seq_printf(m, "\nImplementation time: [%lld, %ld]\n", clp->cl_nii_time.tv_sec, clp->cl_nii_time.tv_nsec); } + seq_printf(m, "callback state: %s\n", cb_state2str(clp->cl_cb_state)); + seq_printf(m, "callback address: %pISpc\n", &clp->cl_cb_conn.cb_addr); drop_client(clp);
return 0;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dai Ngo dai.ngo@oracle.com
[ Upstream commit f47dc2d3013c65631bf8903becc7d88dc9d9966e ]
Fix by initializing pointer nfsd4_ssc_umount_item with NULL instead of 0. Replace return value of nfsd4_ssc_setup_dul with __be32 instead of int.
Reported-by: kernel test robot lkp@intel.com Signed-off-by: Dai Ngo dai.ngo@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 4 ++-- fs/nfsd/nfs4state.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 573c550e7aceb..598b54893f837 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1178,7 +1178,7 @@ extern void nfs_sb_deactive(struct super_block *sb); /* * setup a work entry in the ssc delayed unmount list. */ -static int nfsd4_ssc_setup_dul(struct nfsd_net *nn, char *ipaddr, +static __be32 nfsd4_ssc_setup_dul(struct nfsd_net *nn, char *ipaddr, struct nfsd4_ssc_umount_item **retwork, struct vfsmount **ss_mnt) { struct nfsd4_ssc_umount_item *ni = 0; @@ -1395,7 +1395,7 @@ nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct nfsd_file *src, bool found = false; long timeout; struct nfsd4_ssc_umount_item *tmp; - struct nfsd4_ssc_umount_item *ni = 0; + struct nfsd4_ssc_umount_item *ni = NULL; struct nfsd_net *nn = net_generic(dst->nf_net, nfsd_net_id);
nfs42_ssc_close(src->nf_file); diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index a20cdb1910048..401f0f2743717 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -5556,7 +5556,7 @@ EXPORT_SYMBOL_GPL(nfsd4_ssc_init_umount_work); */ static void nfsd4_ssc_shutdown_umount(struct nfsd_net *nn) { - struct nfsd4_ssc_umount_item *ni = 0; + struct nfsd4_ssc_umount_item *ni = NULL; struct nfsd4_ssc_umount_item *tmp;
spin_lock(&nn->nfsd_ssc_lock);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wei Yongjun weiyongjun1@huawei.com
[ Upstream commit 54185267e1fe476875e649bb18e1c4254c123305 ]
'status' has been overwritten to 0 after nfsd4_ssc_setup_dul(), this cause 0 will be return in vfs_kern_mount() error case. Fix to return nfserr_nodev in this error.
Fixes: f4e44b393389 ("NFSD: delay unmount source's export after inter-server copy completed.") Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: Wei Yongjun weiyongjun1@huawei.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 598b54893f837..f7ddfa204abc4 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1326,6 +1326,7 @@ nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp, ss_mnt = vfs_kern_mount(type, SB_KERNMOUNT, dev_name, raw_data); module_put(type->owner); if (IS_ERR(ss_mnt)) { + status = nfserr_nodev; if (work) nfsd4_ssc_cancel_dul_work(nn, work); goto out_free_devname;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit 05570a2b01117209b500e1989ce8f1b0524c489f ]
I'm not even sure cl_xprt can change here, but we're getting "suspicious RCU usage" warnings, and other rpc_peeraddr2str callers are taking the rcu lock.
Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4callback.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c index 59dc80ecd3764..97f517e9b4189 100644 --- a/fs/nfsd/nfs4callback.c +++ b/fs/nfsd/nfs4callback.c @@ -941,8 +941,10 @@ static int setup_callback_client(struct nfs4_client *clp, struct nfs4_cb_conn *c clp->cl_cb_conn.cb_xprt = conn->cb_xprt; clp->cl_cb_client = client; clp->cl_cb_cred = cred; + rcu_read_lock(); trace_nfsd_cb_setup(clp, rpc_peeraddr2str(client, RPC_DISPLAY_NETID), args.authflavor); + rcu_read_unlock(); return 0; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 99cdf57b33e68df7afc876739c93a11f0b1ba807 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/lockd/xdr.h | 6 ------ include/linux/lockd/xdr4.h | 7 +------ 2 files changed, 1 insertion(+), 12 deletions(-)
diff --git a/include/linux/lockd/xdr.h b/include/linux/lockd/xdr.h index 7ab9f264313f0..a98309c0121cb 100644 --- a/include/linux/lockd/xdr.h +++ b/include/linux/lockd/xdr.h @@ -109,11 +109,5 @@ int nlmsvc_decode_shareargs(struct svc_rqst *, __be32 *); int nlmsvc_encode_shareres(struct svc_rqst *, __be32 *); int nlmsvc_decode_notify(struct svc_rqst *, __be32 *); int nlmsvc_decode_reboot(struct svc_rqst *, __be32 *); -/* -int nlmclt_encode_testargs(struct rpc_rqst *, u32 *, struct nlm_args *); -int nlmclt_encode_lockargs(struct rpc_rqst *, u32 *, struct nlm_args *); -int nlmclt_encode_cancargs(struct rpc_rqst *, u32 *, struct nlm_args *); -int nlmclt_encode_unlockargs(struct rpc_rqst *, u32 *, struct nlm_args *); - */
#endif /* LOCKD_XDR_H */ diff --git a/include/linux/lockd/xdr4.h b/include/linux/lockd/xdr4.h index e709fe5924f2b..5ae766f26e04f 100644 --- a/include/linux/lockd/xdr4.h +++ b/include/linux/lockd/xdr4.h @@ -37,12 +37,7 @@ int nlm4svc_decode_shareargs(struct svc_rqst *, __be32 *); int nlm4svc_encode_shareres(struct svc_rqst *, __be32 *); int nlm4svc_decode_notify(struct svc_rqst *, __be32 *); int nlm4svc_decode_reboot(struct svc_rqst *, __be32 *); -/* -int nlmclt_encode_testargs(struct rpc_rqst *, u32 *, struct nlm_args *); -int nlmclt_encode_lockargs(struct rpc_rqst *, u32 *, struct nlm_args *); -int nlmclt_encode_cancargs(struct rpc_rqst *, u32 *, struct nlm_args *); -int nlmclt_encode_unlockargs(struct rpc_rqst *, u32 *, struct nlm_args *); - */ + extern const struct rpc_version nlm_version4;
#endif /* LOCKD_XDR4_H */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit a9ad1a8090f58b2ed1774dd0f4c7cdb8210a3793 ]
To enable xdr_stream-based encoding and decoding, create a bespoke RPC dispatch function for the lockd service.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svc.c | 43 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 43 insertions(+)
diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index 1a639e34847dd..2de048f80eb8c 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -766,6 +766,46 @@ static void __exit exit_nlm(void) module_init(init_nlm); module_exit(exit_nlm);
+/** + * nlmsvc_dispatch - Process an NLM Request + * @rqstp: incoming request + * @statp: pointer to location of accept_stat field in RPC Reply buffer + * + * Return values: + * %0: Processing complete; do not send a Reply + * %1: Processing complete; send Reply in rqstp->rq_res + */ +static int nlmsvc_dispatch(struct svc_rqst *rqstp, __be32 *statp) +{ + const struct svc_procedure *procp = rqstp->rq_procinfo; + struct kvec *argv = rqstp->rq_arg.head; + struct kvec *resv = rqstp->rq_res.head; + + svcxdr_init_decode(rqstp); + if (!procp->pc_decode(rqstp, argv->iov_base)) + goto out_decode_err; + + *statp = procp->pc_func(rqstp); + if (*statp == rpc_drop_reply) + return 0; + if (*statp != rpc_success) + return 1; + + svcxdr_init_encode(rqstp); + if (!procp->pc_encode(rqstp, resv->iov_base + resv->iov_len)) + goto out_encode_err; + + return 1; + +out_decode_err: + *statp = rpc_garbage_args; + return 1; + +out_encode_err: + *statp = rpc_system_err; + return 1; +} + /* * Define NLM program and procedures */ @@ -775,6 +815,7 @@ static const struct svc_version nlmsvc_version1 = { .vs_nproc = 17, .vs_proc = nlmsvc_procedures, .vs_count = nlmsvc_version1_count, + .vs_dispatch = nlmsvc_dispatch, .vs_xdrsize = NLMSVC_XDRSIZE, }; static unsigned int nlmsvc_version3_count[24]; @@ -783,6 +824,7 @@ static const struct svc_version nlmsvc_version3 = { .vs_nproc = 24, .vs_proc = nlmsvc_procedures, .vs_count = nlmsvc_version3_count, + .vs_dispatch = nlmsvc_dispatch, .vs_xdrsize = NLMSVC_XDRSIZE, }; #ifdef CONFIG_LOCKD_V4 @@ -792,6 +834,7 @@ static const struct svc_version nlmsvc_version4 = { .vs_nproc = 24, .vs_proc = nlmsvc_procedures4, .vs_count = nlmsvc_version4_count, + .vs_dispatch = nlmsvc_dispatch, .vs_xdrsize = NLMSVC_XDRSIZE, }; #endif
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit a6a63ca5652ea05637ecfe349f9e895031529556 ]
Add a .h file containing xdr_stream-based XDR helpers common to both NLMv3 and NLMv4.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svcxdr.h | 151 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 151 insertions(+) create mode 100644 fs/lockd/svcxdr.h
diff --git a/fs/lockd/svcxdr.h b/fs/lockd/svcxdr.h new file mode 100644 index 0000000000000..c69a0bb76c940 --- /dev/null +++ b/fs/lockd/svcxdr.h @@ -0,0 +1,151 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Encode/decode NLM basic data types + * + * Basic NLMv3 XDR data types are not defined in an IETF standards + * document. X/Open has a description of these data types that + * is useful. See Chapter 10 of "Protocols for Interworking: + * XNFS, Version 3W". + * + * Basic NLMv4 XDR data types are defined in Appendix II.1.4 of + * RFC 1813: "NFS Version 3 Protocol Specification". + * + * Author: Chuck Lever chuck.lever@oracle.com + * + * Copyright (c) 2020, Oracle and/or its affiliates. + */ + +#ifndef _LOCKD_SVCXDR_H_ +#define _LOCKD_SVCXDR_H_ + +static inline bool +svcxdr_decode_stats(struct xdr_stream *xdr, __be32 *status) +{ + __be32 *p; + + p = xdr_inline_decode(xdr, XDR_UNIT); + if (!p) + return false; + *status = *p; + + return true; +} + +static inline bool +svcxdr_encode_stats(struct xdr_stream *xdr, __be32 status) +{ + __be32 *p; + + p = xdr_reserve_space(xdr, XDR_UNIT); + if (!p) + return false; + *p = status; + + return true; +} + +static inline bool +svcxdr_decode_string(struct xdr_stream *xdr, char **data, unsigned int *data_len) +{ + __be32 *p; + u32 len; + + if (xdr_stream_decode_u32(xdr, &len) < 0) + return false; + if (len > NLM_MAXSTRLEN) + return false; + p = xdr_inline_decode(xdr, len); + if (!p) + return false; + *data_len = len; + *data = (char *)p; + + return true; +} + +/* + * NLM cookies are defined by specification to be a variable-length + * XDR opaque no longer than 1024 bytes. However, this implementation + * limits their length to 32 bytes, and treats zero-length cookies + * specially. + */ +static inline bool +svcxdr_decode_cookie(struct xdr_stream *xdr, struct nlm_cookie *cookie) +{ + __be32 *p; + u32 len; + + if (xdr_stream_decode_u32(xdr, &len) < 0) + return false; + if (len > NLM_MAXCOOKIELEN) + return false; + if (!len) + goto out_hpux; + + p = xdr_inline_decode(xdr, len); + if (!p) + return false; + cookie->len = len; + memcpy(cookie->data, p, len); + + return true; + + /* apparently HPUX can return empty cookies */ +out_hpux: + cookie->len = 4; + memset(cookie->data, 0, 4); + return true; +} + +static inline bool +svcxdr_encode_cookie(struct xdr_stream *xdr, const struct nlm_cookie *cookie) +{ + __be32 *p; + + if (xdr_stream_encode_u32(xdr, cookie->len) < 0) + return false; + p = xdr_reserve_space(xdr, cookie->len); + if (!p) + return false; + memcpy(p, cookie->data, cookie->len); + + return true; +} + +static inline bool +svcxdr_decode_owner(struct xdr_stream *xdr, struct xdr_netobj *obj) +{ + __be32 *p; + u32 len; + + if (xdr_stream_decode_u32(xdr, &len) < 0) + return false; + if (len > XDR_MAX_NETOBJ) + return false; + p = xdr_inline_decode(xdr, len); + if (!p) + return false; + obj->len = len; + obj->data = (u8 *)p; + + return true; +} + +static inline bool +svcxdr_encode_owner(struct xdr_stream *xdr, const struct xdr_netobj *obj) +{ + unsigned int quadlen = XDR_QUADLEN(obj->len); + __be32 *p; + + if (xdr_stream_encode_u32(xdr, obj->len) < 0) + return false; + p = xdr_reserve_space(xdr, obj->len); + if (!p) + return false; + p[quadlen - 1] = 0; /* XDR pad */ + memcpy(p, obj->data, obj->len); + + return true; +} + +#endif /* _LOCKD_SVCXDR_H_ */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit cc1029b51273da5b342683e9ae14ab4eeaa15997 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/xdr.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c index 982629f7b120a..8be42a23679e9 100644 --- a/fs/lockd/xdr.c +++ b/fs/lockd/xdr.c @@ -19,6 +19,8 @@
#include <uapi/linux/nfs2.h>
+#include "svcxdr.h" + #define NLMDBG_FACILITY NLMDBG_XDR
@@ -178,8 +180,15 @@ nlm_encode_testres(__be32 *p, struct nlm_res *resp)
/* - * First, the server side XDR functions + * Decode Call arguments */ + +int +nlmsvc_decode_void(struct svc_rqst *rqstp, __be32 *p) +{ + return 1; +} + int nlmsvc_decode_testargs(struct svc_rqst *rqstp, __be32 *p) { @@ -339,12 +348,6 @@ nlmsvc_decode_res(struct svc_rqst *rqstp, __be32 *p) return xdr_argsize_check(rqstp, p); }
-int -nlmsvc_decode_void(struct svc_rqst *rqstp, __be32 *p) -{ - return xdr_argsize_check(rqstp, p); -} - int nlmsvc_encode_void(struct svc_rqst *rqstp, __be32 *p) {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 2fd0c67aabcf0f8821450b00ee511faa0b7761bf ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/xdr.c | 72 +++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 66 insertions(+), 6 deletions(-)
diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c index 8be42a23679e9..56982edd47667 100644 --- a/fs/lockd/xdr.c +++ b/fs/lockd/xdr.c @@ -98,6 +98,33 @@ nlm_decode_fh(__be32 *p, struct nfs_fh *f) return p + XDR_QUADLEN(NFS2_FHSIZE); }
+/* + * NLM file handles are defined by specification to be a variable-length + * XDR opaque no longer than 1024 bytes. However, this implementation + * constrains their length to exactly the length of an NFSv2 file + * handle. + */ +static bool +svcxdr_decode_fhandle(struct xdr_stream *xdr, struct nfs_fh *fh) +{ + __be32 *p; + u32 len; + + if (xdr_stream_decode_u32(xdr, &len) < 0) + return false; + if (len != NFS2_FHSIZE) + return false; + + p = xdr_inline_decode(xdr, len); + if (!p) + return false; + fh->size = NFS2_FHSIZE; + memcpy(fh->data, p, len); + memset(fh->data + NFS2_FHSIZE, 0, sizeof(fh->data) - NFS2_FHSIZE); + + return true; +} + /* * Encode and decode owner handle */ @@ -143,6 +170,38 @@ nlm_decode_lock(__be32 *p, struct nlm_lock *lock) return p; }
+static bool +svcxdr_decode_lock(struct xdr_stream *xdr, struct nlm_lock *lock) +{ + struct file_lock *fl = &lock->fl; + s32 start, len, end; + + if (!svcxdr_decode_string(xdr, &lock->caller, &lock->len)) + return false; + if (!svcxdr_decode_fhandle(xdr, &lock->fh)) + return false; + if (!svcxdr_decode_owner(xdr, &lock->oh)) + return false; + if (xdr_stream_decode_u32(xdr, &lock->svid) < 0) + return false; + if (xdr_stream_decode_u32(xdr, &start) < 0) + return false; + if (xdr_stream_decode_u32(xdr, &len) < 0) + return false; + + locks_init_lock(fl); + fl->fl_flags = FL_POSIX; + fl->fl_type = F_RDLCK; + end = start + len - 1; + fl->fl_start = s32_to_loff_t(start); + if (len == 0 || end < 0) + fl->fl_end = OFFSET_MAX; + else + fl->fl_end = s32_to_loff_t(end); + + return true; +} + /* * Encode result of a TEST/TEST_MSG call */ @@ -192,19 +251,20 @@ nlmsvc_decode_void(struct svc_rqst *rqstp, __be32 *p) int nlmsvc_decode_testargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_args *argp = rqstp->rq_argp; - u32 exclusive; + u32 exclusive;
- if (!(p = nlm_decode_cookie(p, &argp->cookie))) + if (!svcxdr_decode_cookie(xdr, &argp->cookie)) return 0; - - exclusive = ntohl(*p++); - if (!(p = nlm_decode_lock(p, &argp->lock))) + if (xdr_stream_decode_bool(xdr, &exclusive) < 0) + return 0; + if (!svcxdr_decode_lock(xdr, &argp->lock)) return 0; if (exclusive) argp->lock.fl.fl_type = F_WRLCK;
- return xdr_argsize_check(rqstp, p); + return 1; }
int
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit c1adb8c672ca2b085c400695ef064547d77eda29 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/xdr.c | 41 +++++++++++++++++++++++------------------ 1 file changed, 23 insertions(+), 18 deletions(-)
diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c index 56982edd47667..8a9f02e45df2d 100644 --- a/fs/lockd/xdr.c +++ b/fs/lockd/xdr.c @@ -267,35 +267,40 @@ nlmsvc_decode_testargs(struct svc_rqst *rqstp, __be32 *p) return 1; }
-int -nlmsvc_encode_testres(struct svc_rqst *rqstp, __be32 *p) -{ - struct nlm_res *resp = rqstp->rq_resp; - - if (!(p = nlm_encode_testres(p, resp))) - return 0; - return xdr_ressize_check(rqstp, p); -} - int nlmsvc_decode_lockargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_args *argp = rqstp->rq_argp; - u32 exclusive; + u32 exclusive;
- if (!(p = nlm_decode_cookie(p, &argp->cookie))) + if (!svcxdr_decode_cookie(xdr, &argp->cookie)) return 0; - argp->block = ntohl(*p++); - exclusive = ntohl(*p++); - if (!(p = nlm_decode_lock(p, &argp->lock))) + if (xdr_stream_decode_bool(xdr, &argp->block) < 0) + return 0; + if (xdr_stream_decode_bool(xdr, &exclusive) < 0) + return 0; + if (!svcxdr_decode_lock(xdr, &argp->lock)) return 0; if (exclusive) argp->lock.fl.fl_type = F_WRLCK; - argp->reclaim = ntohl(*p++); - argp->state = ntohl(*p++); + if (xdr_stream_decode_bool(xdr, &argp->reclaim) < 0) + return 0; + if (xdr_stream_decode_u32(xdr, &argp->state) < 0) + return 0; argp->monitor = 1; /* monitor client by default */
- return xdr_argsize_check(rqstp, p); + return 1; +} + +int +nlmsvc_encode_testres(struct svc_rqst *rqstp, __be32 *p) +{ + struct nlm_res *resp = rqstp->rq_resp; + + if (!(p = nlm_encode_testres(p, resp))) + return 0; + return xdr_ressize_check(rqstp, p); }
int
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit f4e08f3ac8c4945ea54a740e3afcf44b34e7cf44 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/xdr.c | 34 +++++++++++++++++++--------------- 1 file changed, 19 insertions(+), 15 deletions(-)
diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c index 8a9f02e45df2d..ef38f07d1224c 100644 --- a/fs/lockd/xdr.c +++ b/fs/lockd/xdr.c @@ -294,30 +294,34 @@ nlmsvc_decode_lockargs(struct svc_rqst *rqstp, __be32 *p) }
int -nlmsvc_encode_testres(struct svc_rqst *rqstp, __be32 *p) +nlmsvc_decode_cancargs(struct svc_rqst *rqstp, __be32 *p) { - struct nlm_res *resp = rqstp->rq_resp; + struct xdr_stream *xdr = &rqstp->rq_arg_stream; + struct nlm_args *argp = rqstp->rq_argp; + u32 exclusive;
- if (!(p = nlm_encode_testres(p, resp))) + if (!svcxdr_decode_cookie(xdr, &argp->cookie)) return 0; - return xdr_ressize_check(rqstp, p); + if (xdr_stream_decode_bool(xdr, &argp->block) < 0) + return 0; + if (xdr_stream_decode_bool(xdr, &exclusive) < 0) + return 0; + if (!svcxdr_decode_lock(xdr, &argp->lock)) + return 0; + if (exclusive) + argp->lock.fl.fl_type = F_WRLCK; + + return 1; }
int -nlmsvc_decode_cancargs(struct svc_rqst *rqstp, __be32 *p) +nlmsvc_encode_testres(struct svc_rqst *rqstp, __be32 *p) { - struct nlm_args *argp = rqstp->rq_argp; - u32 exclusive; + struct nlm_res *resp = rqstp->rq_resp;
- if (!(p = nlm_decode_cookie(p, &argp->cookie))) - return 0; - argp->block = ntohl(*p++); - exclusive = ntohl(*p++); - if (!(p = nlm_decode_lock(p, &argp->lock))) + if (!(p = nlm_encode_testres(p, resp))) return 0; - if (exclusive) - argp->lock.fl.fl_type = F_WRLCK; - return xdr_argsize_check(rqstp, p); + return xdr_ressize_check(rqstp, p); }
int
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit c27045d302b022ed11d24a2653bceb6af56c6327 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/xdr.c | 53 +++++++++++++------------------------------------- 1 file changed, 13 insertions(+), 40 deletions(-)
diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c index ef38f07d1224c..b59e02b4417c8 100644 --- a/fs/lockd/xdr.c +++ b/fs/lockd/xdr.c @@ -140,36 +140,6 @@ nlm_encode_oh(__be32 *p, struct xdr_netobj *oh) return xdr_encode_netobj(p, oh); }
-static __be32 * -nlm_decode_lock(__be32 *p, struct nlm_lock *lock) -{ - struct file_lock *fl = &lock->fl; - s32 start, len, end; - - if (!(p = xdr_decode_string_inplace(p, &lock->caller, - &lock->len, - NLM_MAXSTRLEN)) - || !(p = nlm_decode_fh(p, &lock->fh)) - || !(p = nlm_decode_oh(p, &lock->oh))) - return NULL; - lock->svid = ntohl(*p++); - - locks_init_lock(fl); - fl->fl_flags = FL_POSIX; - fl->fl_type = F_RDLCK; /* as good as anything else */ - start = ntohl(*p++); - len = ntohl(*p++); - end = start + len - 1; - - fl->fl_start = s32_to_loff_t(start); - - if (len == 0 || end < 0) - fl->fl_end = OFFSET_MAX; - else - fl->fl_end = s32_to_loff_t(end); - return p; -} - static bool svcxdr_decode_lock(struct xdr_stream *xdr, struct nlm_lock *lock) { @@ -315,25 +285,28 @@ nlmsvc_decode_cancargs(struct svc_rqst *rqstp, __be32 *p) }
int -nlmsvc_encode_testres(struct svc_rqst *rqstp, __be32 *p) +nlmsvc_decode_unlockargs(struct svc_rqst *rqstp, __be32 *p) { - struct nlm_res *resp = rqstp->rq_resp; + struct xdr_stream *xdr = &rqstp->rq_arg_stream; + struct nlm_args *argp = rqstp->rq_argp;
- if (!(p = nlm_encode_testres(p, resp))) + if (!svcxdr_decode_cookie(xdr, &argp->cookie)) return 0; - return xdr_ressize_check(rqstp, p); + if (!svcxdr_decode_lock(xdr, &argp->lock)) + return 0; + argp->lock.fl.fl_type = F_UNLCK; + + return 1; }
int -nlmsvc_decode_unlockargs(struct svc_rqst *rqstp, __be32 *p) +nlmsvc_encode_testres(struct svc_rqst *rqstp, __be32 *p) { - struct nlm_args *argp = rqstp->rq_argp; + struct nlm_res *resp = rqstp->rq_resp;
- if (!(p = nlm_decode_cookie(p, &argp->cookie)) - || !(p = nlm_decode_lock(p, &argp->lock))) + if (!(p = nlm_encode_testres(p, resp))) return 0; - argp->lock.fl.fl_type = F_UNLCK; - return xdr_argsize_check(rqstp, p); + return xdr_ressize_check(rqstp, p); }
int
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 16ddcabe6240c4fb01c97f6fce6c35ddf8626ad5 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/xdr.c | 25 ++++++++++++++----------- 1 file changed, 14 insertions(+), 11 deletions(-)
diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c index b59e02b4417c8..911b6377a6da4 100644 --- a/fs/lockd/xdr.c +++ b/fs/lockd/xdr.c @@ -299,6 +299,20 @@ nlmsvc_decode_unlockargs(struct svc_rqst *rqstp, __be32 *p) return 1; }
+int +nlmsvc_decode_res(struct svc_rqst *rqstp, __be32 *p) +{ + struct xdr_stream *xdr = &rqstp->rq_arg_stream; + struct nlm_res *resp = rqstp->rq_argp; + + if (!svcxdr_decode_cookie(xdr, &resp->cookie)) + return 0; + if (!svcxdr_decode_stats(xdr, &resp->status)) + return 0; + + return 1; +} + int nlmsvc_encode_testres(struct svc_rqst *rqstp, __be32 *p) { @@ -379,17 +393,6 @@ nlmsvc_decode_reboot(struct svc_rqst *rqstp, __be32 *p) return xdr_argsize_check(rqstp, p); }
-int -nlmsvc_decode_res(struct svc_rqst *rqstp, __be32 *p) -{ - struct nlm_res *resp = rqstp->rq_argp; - - if (!(p = nlm_decode_cookie(p, &resp->cookie))) - return 0; - resp->status = *p++; - return xdr_argsize_check(rqstp, p); -} - int nlmsvc_encode_void(struct svc_rqst *rqstp, __be32 *p) {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 137e05e2f735f696e117553f7fa5ef8fb09953e1 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/xdr.c | 39 ++++++++++++++++++++++++++------------- 1 file changed, 26 insertions(+), 13 deletions(-)
diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c index 911b6377a6da4..421613170e5f9 100644 --- a/fs/lockd/xdr.c +++ b/fs/lockd/xdr.c @@ -313,6 +313,32 @@ nlmsvc_decode_res(struct svc_rqst *rqstp, __be32 *p) return 1; }
+int +nlmsvc_decode_reboot(struct svc_rqst *rqstp, __be32 *p) +{ + struct xdr_stream *xdr = &rqstp->rq_arg_stream; + struct nlm_reboot *argp = rqstp->rq_argp; + u32 len; + + if (xdr_stream_decode_u32(xdr, &len) < 0) + return 0; + if (len > SM_MAXSTRLEN) + return 0; + p = xdr_inline_decode(xdr, len); + if (!p) + return 0; + argp->len = len; + argp->mon = (char *)p; + if (xdr_stream_decode_u32(xdr, &argp->state) < 0) + return 0; + p = xdr_inline_decode(xdr, SM_PRIV_SIZE); + if (!p) + return 0; + memcpy(&argp->priv.data, p, sizeof(argp->priv.data)); + + return 1; +} + int nlmsvc_encode_testres(struct svc_rqst *rqstp, __be32 *p) { @@ -380,19 +406,6 @@ nlmsvc_decode_notify(struct svc_rqst *rqstp, __be32 *p) return xdr_argsize_check(rqstp, p); }
-int -nlmsvc_decode_reboot(struct svc_rqst *rqstp, __be32 *p) -{ - struct nlm_reboot *argp = rqstp->rq_argp; - - if (!(p = xdr_decode_string_inplace(p, &argp->mon, &argp->len, SM_MAXSTRLEN))) - return 0; - argp->state = ntohl(*p++); - memcpy(&argp->priv.data, p, sizeof(argp->priv.data)); - p += XDR_QUADLEN(SM_PRIV_SIZE); - return xdr_argsize_check(rqstp, p); -} - int nlmsvc_encode_void(struct svc_rqst *rqstp, __be32 *p) {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 890939e1266b9adf3b0acd5e0385b39813cb8f11 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/xdr.c | 96 ++++++++++++++------------------------------------ 1 file changed, 26 insertions(+), 70 deletions(-)
diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c index 421613170e5f9..c496f18eff06e 100644 --- a/fs/lockd/xdr.c +++ b/fs/lockd/xdr.c @@ -21,8 +21,6 @@
#include "svcxdr.h"
-#define NLMDBG_FACILITY NLMDBG_XDR -
static inline loff_t s32_to_loff_t(__s32 offset) @@ -46,33 +44,6 @@ loff_t_to_s32(loff_t offset) /* * XDR functions for basic NLM types */ -static __be32 *nlm_decode_cookie(__be32 *p, struct nlm_cookie *c) -{ - unsigned int len; - - len = ntohl(*p++); - - if(len==0) - { - c->len=4; - memset(c->data, 0, 4); /* hockeypux brain damage */ - } - else if(len<=NLM_MAXCOOKIELEN) - { - c->len=len; - memcpy(c->data, p, len); - p+=XDR_QUADLEN(len); - } - else - { - dprintk("lockd: bad cookie size %d (only cookies under " - "%d bytes are supported.)\n", - len, NLM_MAXCOOKIELEN); - return NULL; - } - return p; -} - static inline __be32 * nlm_encode_cookie(__be32 *p, struct nlm_cookie *c) { @@ -82,22 +53,6 @@ nlm_encode_cookie(__be32 *p, struct nlm_cookie *c) return p; }
-static __be32 * -nlm_decode_fh(__be32 *p, struct nfs_fh *f) -{ - unsigned int len; - - if ((len = ntohl(*p++)) != NFS2_FHSIZE) { - dprintk("lockd: bad fhandle size %d (should be %d)\n", - len, NFS2_FHSIZE); - return NULL; - } - f->size = NFS2_FHSIZE; - memset(f->data, 0, sizeof(f->data)); - memcpy(f->data, p, NFS2_FHSIZE); - return p + XDR_QUADLEN(NFS2_FHSIZE); -} - /* * NLM file handles are defined by specification to be a variable-length * XDR opaque no longer than 1024 bytes. However, this implementation @@ -128,12 +83,6 @@ svcxdr_decode_fhandle(struct xdr_stream *xdr, struct nfs_fh *fh) /* * Encode and decode owner handle */ -static inline __be32 * -nlm_decode_oh(__be32 *p, struct xdr_netobj *oh) -{ - return xdr_decode_netobj(p, oh); -} - static inline __be32 * nlm_encode_oh(__be32 *p, struct xdr_netobj *oh) { @@ -339,35 +288,42 @@ nlmsvc_decode_reboot(struct svc_rqst *rqstp, __be32 *p) return 1; }
-int -nlmsvc_encode_testres(struct svc_rqst *rqstp, __be32 *p) -{ - struct nlm_res *resp = rqstp->rq_resp; - - if (!(p = nlm_encode_testres(p, resp))) - return 0; - return xdr_ressize_check(rqstp, p); -} - int nlmsvc_decode_shareargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_args *argp = rqstp->rq_argp; struct nlm_lock *lock = &argp->lock;
memset(lock, 0, sizeof(*lock)); locks_init_lock(&lock->fl); - lock->svid = ~(u32) 0; + lock->svid = ~(u32)0;
- if (!(p = nlm_decode_cookie(p, &argp->cookie)) - || !(p = xdr_decode_string_inplace(p, &lock->caller, - &lock->len, NLM_MAXSTRLEN)) - || !(p = nlm_decode_fh(p, &lock->fh)) - || !(p = nlm_decode_oh(p, &lock->oh))) + if (!svcxdr_decode_cookie(xdr, &argp->cookie)) return 0; - argp->fsm_mode = ntohl(*p++); - argp->fsm_access = ntohl(*p++); - return xdr_argsize_check(rqstp, p); + if (!svcxdr_decode_string(xdr, &lock->caller, &lock->len)) + return 0; + if (!svcxdr_decode_fhandle(xdr, &lock->fh)) + return 0; + if (!svcxdr_decode_owner(xdr, &lock->oh)) + return 0; + /* XXX: Range checks are missing in the original code */ + if (xdr_stream_decode_u32(xdr, &argp->fsm_mode) < 0) + return 0; + if (xdr_stream_decode_u32(xdr, &argp->fsm_access) < 0) + return 0; + + return 1; +} + +int +nlmsvc_encode_testres(struct svc_rqst *rqstp, __be32 *p) +{ + struct nlm_res *resp = rqstp->rq_resp; + + if (!(p = nlm_encode_testres(p, resp))) + return 0; + return xdr_ressize_check(rqstp, p); }
int
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 14e105256b9dcdf50a003e2e9a0da77e06770a4b ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/xdr.c | 28 +++++++++++++++------------- 1 file changed, 15 insertions(+), 13 deletions(-)
diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c index c496f18eff06e..091c8c463ab40 100644 --- a/fs/lockd/xdr.c +++ b/fs/lockd/xdr.c @@ -316,6 +316,21 @@ nlmsvc_decode_shareargs(struct svc_rqst *rqstp, __be32 *p) return 1; }
+int +nlmsvc_decode_notify(struct svc_rqst *rqstp, __be32 *p) +{ + struct xdr_stream *xdr = &rqstp->rq_arg_stream; + struct nlm_args *argp = rqstp->rq_argp; + struct nlm_lock *lock = &argp->lock; + + if (!svcxdr_decode_string(xdr, &lock->caller, &lock->len)) + return 0; + if (xdr_stream_decode_u32(xdr, &argp->state) < 0) + return 0; + + return 1; +} + int nlmsvc_encode_testres(struct svc_rqst *rqstp, __be32 *p) { @@ -349,19 +364,6 @@ nlmsvc_encode_res(struct svc_rqst *rqstp, __be32 *p) return xdr_ressize_check(rqstp, p); }
-int -nlmsvc_decode_notify(struct svc_rqst *rqstp, __be32 *p) -{ - struct nlm_args *argp = rqstp->rq_argp; - struct nlm_lock *lock = &argp->lock; - - if (!(p = xdr_decode_string_inplace(p, &lock->caller, - &lock->len, NLM_MAXSTRLEN))) - return 0; - argp->state = ntohl(*p++); - return xdr_argsize_check(rqstp, p); -} - int nlmsvc_encode_void(struct svc_rqst *rqstp, __be32 *p) {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit e26ec898b68b2ab64f379ba0fc0a615b2ad41f40 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/xdr.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c index 091c8c463ab40..840fa8ff84269 100644 --- a/fs/lockd/xdr.c +++ b/fs/lockd/xdr.c @@ -331,6 +331,17 @@ nlmsvc_decode_notify(struct svc_rqst *rqstp, __be32 *p) return 1; }
+ +/* + * Encode Reply results + */ + +int +nlmsvc_encode_void(struct svc_rqst *rqstp, __be32 *p) +{ + return 1; +} + int nlmsvc_encode_testres(struct svc_rqst *rqstp, __be32 *p) { @@ -363,9 +374,3 @@ nlmsvc_encode_res(struct svc_rqst *rqstp, __be32 *p) *p++ = resp->status; return xdr_ressize_check(rqstp, p); } - -int -nlmsvc_encode_void(struct svc_rqst *rqstp, __be32 *p) -{ - return xdr_ressize_check(rqstp, p); -}
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit adf98a4850b9ede9fc174c78a885845fb08499a5 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/xdr.c | 74 ++++++++++++++++++++++++-------------------------- 1 file changed, 35 insertions(+), 39 deletions(-)
diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c index 840fa8ff84269..daf3524040d66 100644 --- a/fs/lockd/xdr.c +++ b/fs/lockd/xdr.c @@ -80,15 +80,6 @@ svcxdr_decode_fhandle(struct xdr_stream *xdr, struct nfs_fh *fh) return true; }
-/* - * Encode and decode owner handle - */ -static inline __be32 * -nlm_encode_oh(__be32 *p, struct xdr_netobj *oh) -{ - return xdr_encode_netobj(p, oh); -} - static bool svcxdr_decode_lock(struct xdr_stream *xdr, struct nlm_lock *lock) { @@ -121,39 +112,44 @@ svcxdr_decode_lock(struct xdr_stream *xdr, struct nlm_lock *lock) return true; }
-/* - * Encode result of a TEST/TEST_MSG call - */ -static __be32 * -nlm_encode_testres(__be32 *p, struct nlm_res *resp) +static bool +svcxdr_encode_holder(struct xdr_stream *xdr, const struct nlm_lock *lock) { - s32 start, len; - - if (!(p = nlm_encode_cookie(p, &resp->cookie))) - return NULL; - *p++ = resp->status; + const struct file_lock *fl = &lock->fl; + s32 start, len;
- if (resp->status == nlm_lck_denied) { - struct file_lock *fl = &resp->lock.fl; - - *p++ = (fl->fl_type == F_RDLCK)? xdr_zero : xdr_one; - *p++ = htonl(resp->lock.svid); - - /* Encode owner handle. */ - if (!(p = xdr_encode_netobj(p, &resp->lock.oh))) - return NULL; + /* exclusive */ + if (xdr_stream_encode_bool(xdr, fl->fl_type != F_RDLCK) < 0) + return false; + if (xdr_stream_encode_u32(xdr, lock->svid) < 0) + return false; + if (!svcxdr_encode_owner(xdr, &lock->oh)) + return false; + start = loff_t_to_s32(fl->fl_start); + if (fl->fl_end == OFFSET_MAX) + len = 0; + else + len = loff_t_to_s32(fl->fl_end - fl->fl_start + 1); + if (xdr_stream_encode_u32(xdr, start) < 0) + return false; + if (xdr_stream_encode_u32(xdr, len) < 0) + return false;
- start = loff_t_to_s32(fl->fl_start); - if (fl->fl_end == OFFSET_MAX) - len = 0; - else - len = loff_t_to_s32(fl->fl_end - fl->fl_start + 1); + return true; +}
- *p++ = htonl(start); - *p++ = htonl(len); +static bool +svcxdr_encode_testrply(struct xdr_stream *xdr, const struct nlm_res *resp) +{ + if (!svcxdr_encode_stats(xdr, resp->status)) + return false; + switch (resp->status) { + case nlm_lck_denied: + if (!svcxdr_encode_holder(xdr, &resp->lock)) + return false; }
- return p; + return true; }
@@ -345,11 +341,11 @@ nlmsvc_encode_void(struct svc_rqst *rqstp, __be32 *p) int nlmsvc_encode_testres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nlm_res *resp = rqstp->rq_resp;
- if (!(p = nlm_encode_testres(p, resp))) - return 0; - return xdr_ressize_check(rqstp, p); + return svcxdr_encode_cookie(xdr, &resp->cookie) && + svcxdr_encode_testrply(xdr, resp); }
int
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit e96735a6980574ecbdb24c760b8d294095e47074 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/xdr.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c index daf3524040d66..4fb6090bc9158 100644 --- a/fs/lockd/xdr.c +++ b/fs/lockd/xdr.c @@ -349,24 +349,23 @@ nlmsvc_encode_testres(struct svc_rqst *rqstp, __be32 *p) }
int -nlmsvc_encode_shareres(struct svc_rqst *rqstp, __be32 *p) +nlmsvc_encode_res(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nlm_res *resp = rqstp->rq_resp;
- if (!(p = nlm_encode_cookie(p, &resp->cookie))) - return 0; - *p++ = resp->status; - *p++ = xdr_zero; /* sequence argument */ - return xdr_ressize_check(rqstp, p); + return svcxdr_encode_cookie(xdr, &resp->cookie) && + svcxdr_encode_stats(xdr, resp->status); }
int -nlmsvc_encode_res(struct svc_rqst *rqstp, __be32 *p) +nlmsvc_encode_shareres(struct svc_rqst *rqstp, __be32 *p) { struct nlm_res *resp = rqstp->rq_resp;
if (!(p = nlm_encode_cookie(p, &resp->cookie))) return 0; *p++ = resp->status; + *p++ = xdr_zero; /* sequence argument */ return xdr_ressize_check(rqstp, p); }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 529ca3a116e8978575fec061a71fa6865a344891 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/xdr.c | 25 +++++++++---------------- 1 file changed, 9 insertions(+), 16 deletions(-)
diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c index 4fb6090bc9158..9235e60b17694 100644 --- a/fs/lockd/xdr.c +++ b/fs/lockd/xdr.c @@ -41,18 +41,6 @@ loff_t_to_s32(loff_t offset) return res; }
-/* - * XDR functions for basic NLM types - */ -static inline __be32 * -nlm_encode_cookie(__be32 *p, struct nlm_cookie *c) -{ - *p++ = htonl(c->len); - memcpy(p, c->data, c->len); - p+=XDR_QUADLEN(c->len); - return p; -} - /* * NLM file handles are defined by specification to be a variable-length * XDR opaque no longer than 1024 bytes. However, this implementation @@ -361,11 +349,16 @@ nlmsvc_encode_res(struct svc_rqst *rqstp, __be32 *p) int nlmsvc_encode_shareres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nlm_res *resp = rqstp->rq_resp;
- if (!(p = nlm_encode_cookie(p, &resp->cookie))) + if (!svcxdr_encode_cookie(xdr, &resp->cookie)) return 0; - *p++ = resp->status; - *p++ = xdr_zero; /* sequence argument */ - return xdr_ressize_check(rqstp, p); + if (!svcxdr_encode_stats(xdr, resp->status)) + return 0; + /* sequence */ + if (xdr_stream_encode_u32(xdr, 0) < 0) + return 0; + + return 1; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 7956521aac58e434a05cf3c68c1b66c1312e5649 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/xdr4.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index 5fa9f48a9dba7..d0960a8551f8b 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -18,6 +18,8 @@ #include <linux/sunrpc/stats.h> #include <linux/lockd/lockd.h>
+#include "svcxdr.h" + #define NLMDBG_FACILITY NLMDBG_XDR
static inline loff_t @@ -175,8 +177,15 @@ nlm4_encode_testres(__be32 *p, struct nlm_res *resp)
/* - * First, the server side XDR functions + * Decode Call arguments */ + +int +nlm4svc_decode_void(struct svc_rqst *rqstp, __be32 *p) +{ + return 1; +} + int nlm4svc_decode_testargs(struct svc_rqst *rqstp, __be32 *p) { @@ -336,12 +345,6 @@ nlm4svc_decode_res(struct svc_rqst *rqstp, __be32 *p) return xdr_argsize_check(rqstp, p); }
-int -nlm4svc_decode_void(struct svc_rqst *rqstp, __be32 *p) -{ - return xdr_argsize_check(rqstp, p); -} - int nlm4svc_encode_void(struct svc_rqst *rqstp, __be32 *p) {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 345b4159a075b15dc4ae70f1db90fa8abf85d2e7 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/xdr4.c | 72 ++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 66 insertions(+), 6 deletions(-)
diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index d0960a8551f8b..cf64794fdc1fa 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -96,6 +96,32 @@ nlm4_decode_fh(__be32 *p, struct nfs_fh *f) return p + XDR_QUADLEN(f->size); }
+/* + * NLM file handles are defined by specification to be a variable-length + * XDR opaque no longer than 1024 bytes. However, this implementation + * limits their length to the size of an NFSv3 file handle. + */ +static bool +svcxdr_decode_fhandle(struct xdr_stream *xdr, struct nfs_fh *fh) +{ + __be32 *p; + u32 len; + + if (xdr_stream_decode_u32(xdr, &len) < 0) + return false; + if (len > NFS_MAXFHSIZE) + return false; + + p = xdr_inline_decode(xdr, len); + if (!p) + return false; + fh->size = len; + memcpy(fh->data, p, len); + memset(fh->data + len, 0, sizeof(fh->data) - len); + + return true; +} + /* * Encode and decode owner handle */ @@ -135,6 +161,39 @@ nlm4_decode_lock(__be32 *p, struct nlm_lock *lock) return p; }
+static bool +svcxdr_decode_lock(struct xdr_stream *xdr, struct nlm_lock *lock) +{ + struct file_lock *fl = &lock->fl; + u64 len, start; + s64 end; + + if (!svcxdr_decode_string(xdr, &lock->caller, &lock->len)) + return false; + if (!svcxdr_decode_fhandle(xdr, &lock->fh)) + return false; + if (!svcxdr_decode_owner(xdr, &lock->oh)) + return false; + if (xdr_stream_decode_u32(xdr, &lock->svid) < 0) + return false; + if (xdr_stream_decode_u64(xdr, &start) < 0) + return false; + if (xdr_stream_decode_u64(xdr, &len) < 0) + return false; + + locks_init_lock(fl); + fl->fl_flags = FL_POSIX; + fl->fl_type = F_RDLCK; + end = start + len - 1; + fl->fl_start = s64_to_loff_t(start); + if (len == 0 || end < 0) + fl->fl_end = OFFSET_MAX; + else + fl->fl_end = s64_to_loff_t(end); + + return true; +} + /* * Encode result of a TEST/TEST_MSG call */ @@ -189,19 +248,20 @@ nlm4svc_decode_void(struct svc_rqst *rqstp, __be32 *p) int nlm4svc_decode_testargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_args *argp = rqstp->rq_argp; - u32 exclusive; + u32 exclusive;
- if (!(p = nlm4_decode_cookie(p, &argp->cookie))) + if (!svcxdr_decode_cookie(xdr, &argp->cookie)) return 0; - - exclusive = ntohl(*p++); - if (!(p = nlm4_decode_lock(p, &argp->lock))) + if (xdr_stream_decode_bool(xdr, &exclusive) < 0) + return 0; + if (!svcxdr_decode_lock(xdr, &argp->lock)) return 0; if (exclusive) argp->lock.fl.fl_type = F_WRLCK;
- return xdr_argsize_check(rqstp, p); + return 1; }
int
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 0e5977af4fdc277984fca7d8c2e0c880935775a0 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/xdr4.c | 41 +++++++++++++++++++++++------------------ 1 file changed, 23 insertions(+), 18 deletions(-)
diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index cf64794fdc1fa..1d3e780c25fd5 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -264,35 +264,40 @@ nlm4svc_decode_testargs(struct svc_rqst *rqstp, __be32 *p) return 1; }
-int -nlm4svc_encode_testres(struct svc_rqst *rqstp, __be32 *p) -{ - struct nlm_res *resp = rqstp->rq_resp; - - if (!(p = nlm4_encode_testres(p, resp))) - return 0; - return xdr_ressize_check(rqstp, p); -} - int nlm4svc_decode_lockargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_args *argp = rqstp->rq_argp; - u32 exclusive; + u32 exclusive;
- if (!(p = nlm4_decode_cookie(p, &argp->cookie))) + if (!svcxdr_decode_cookie(xdr, &argp->cookie)) return 0; - argp->block = ntohl(*p++); - exclusive = ntohl(*p++); - if (!(p = nlm4_decode_lock(p, &argp->lock))) + if (xdr_stream_decode_bool(xdr, &argp->block) < 0) + return 0; + if (xdr_stream_decode_bool(xdr, &exclusive) < 0) + return 0; + if (!svcxdr_decode_lock(xdr, &argp->lock)) return 0; if (exclusive) argp->lock.fl.fl_type = F_WRLCK; - argp->reclaim = ntohl(*p++); - argp->state = ntohl(*p++); + if (xdr_stream_decode_bool(xdr, &argp->reclaim) < 0) + return 0; + if (xdr_stream_decode_u32(xdr, &argp->state) < 0) + return 0; argp->monitor = 1; /* monitor client by default */
- return xdr_argsize_check(rqstp, p); + return 1; +} + +int +nlm4svc_encode_testres(struct svc_rqst *rqstp, __be32 *p) +{ + struct nlm_res *resp = rqstp->rq_resp; + + if (!(p = nlm4_encode_testres(p, resp))) + return 0; + return xdr_ressize_check(rqstp, p); }
int
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 1e1f38dcf3c031715191e1fd26f70a0affca4dbd ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/xdr4.c | 33 ++++++++++++++++++--------------- 1 file changed, 18 insertions(+), 15 deletions(-)
diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index 1d3e780c25fd5..37d45f1d71999 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -291,30 +291,33 @@ nlm4svc_decode_lockargs(struct svc_rqst *rqstp, __be32 *p) }
int -nlm4svc_encode_testres(struct svc_rqst *rqstp, __be32 *p) +nlm4svc_decode_cancargs(struct svc_rqst *rqstp, __be32 *p) { - struct nlm_res *resp = rqstp->rq_resp; + struct xdr_stream *xdr = &rqstp->rq_arg_stream; + struct nlm_args *argp = rqstp->rq_argp; + u32 exclusive;
- if (!(p = nlm4_encode_testres(p, resp))) + if (!svcxdr_decode_cookie(xdr, &argp->cookie)) return 0; - return xdr_ressize_check(rqstp, p); + if (xdr_stream_decode_bool(xdr, &argp->block) < 0) + return 0; + if (xdr_stream_decode_bool(xdr, &exclusive) < 0) + return 0; + if (!svcxdr_decode_lock(xdr, &argp->lock)) + return 0; + if (exclusive) + argp->lock.fl.fl_type = F_WRLCK; + return 1; }
int -nlm4svc_decode_cancargs(struct svc_rqst *rqstp, __be32 *p) +nlm4svc_encode_testres(struct svc_rqst *rqstp, __be32 *p) { - struct nlm_args *argp = rqstp->rq_argp; - u32 exclusive; + struct nlm_res *resp = rqstp->rq_resp;
- if (!(p = nlm4_decode_cookie(p, &argp->cookie))) - return 0; - argp->block = ntohl(*p++); - exclusive = ntohl(*p++); - if (!(p = nlm4_decode_lock(p, &argp->lock))) + if (!(p = nlm4_encode_testres(p, resp))) return 0; - if (exclusive) - argp->lock.fl.fl_type = F_WRLCK; - return xdr_argsize_check(rqstp, p); + return xdr_ressize_check(rqstp, p); }
int
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit d76d8c25cea794f65615f3a2324052afa4b5f900 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/xdr4.c | 53 ++++++++++++------------------------------------- 1 file changed, 13 insertions(+), 40 deletions(-)
diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index 37d45f1d71999..47a87ea4a99b1 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -131,36 +131,6 @@ nlm4_decode_oh(__be32 *p, struct xdr_netobj *oh) return xdr_decode_netobj(p, oh); }
-static __be32 * -nlm4_decode_lock(__be32 *p, struct nlm_lock *lock) -{ - struct file_lock *fl = &lock->fl; - __u64 len, start; - __s64 end; - - if (!(p = xdr_decode_string_inplace(p, &lock->caller, - &lock->len, NLM_MAXSTRLEN)) - || !(p = nlm4_decode_fh(p, &lock->fh)) - || !(p = nlm4_decode_oh(p, &lock->oh))) - return NULL; - lock->svid = ntohl(*p++); - - locks_init_lock(fl); - fl->fl_flags = FL_POSIX; - fl->fl_type = F_RDLCK; /* as good as anything else */ - p = xdr_decode_hyper(p, &start); - p = xdr_decode_hyper(p, &len); - end = start + len - 1; - - fl->fl_start = s64_to_loff_t(start); - - if (len == 0 || end < 0) - fl->fl_end = OFFSET_MAX; - else - fl->fl_end = s64_to_loff_t(end); - return p; -} - static bool svcxdr_decode_lock(struct xdr_stream *xdr, struct nlm_lock *lock) { @@ -311,25 +281,28 @@ nlm4svc_decode_cancargs(struct svc_rqst *rqstp, __be32 *p) }
int -nlm4svc_encode_testres(struct svc_rqst *rqstp, __be32 *p) +nlm4svc_decode_unlockargs(struct svc_rqst *rqstp, __be32 *p) { - struct nlm_res *resp = rqstp->rq_resp; + struct xdr_stream *xdr = &rqstp->rq_arg_stream; + struct nlm_args *argp = rqstp->rq_argp;
- if (!(p = nlm4_encode_testres(p, resp))) + if (!svcxdr_decode_cookie(xdr, &argp->cookie)) return 0; - return xdr_ressize_check(rqstp, p); + if (!svcxdr_decode_lock(xdr, &argp->lock)) + return 0; + argp->lock.fl.fl_type = F_UNLCK; + + return 1; }
int -nlm4svc_decode_unlockargs(struct svc_rqst *rqstp, __be32 *p) +nlm4svc_encode_testres(struct svc_rqst *rqstp, __be32 *p) { - struct nlm_args *argp = rqstp->rq_argp; + struct nlm_res *resp = rqstp->rq_resp;
- if (!(p = nlm4_decode_cookie(p, &argp->cookie)) - || !(p = nlm4_decode_lock(p, &argp->lock))) + if (!(p = nlm4_encode_testres(p, resp))) return 0; - argp->lock.fl.fl_type = F_UNLCK; - return xdr_argsize_check(rqstp, p); + return xdr_ressize_check(rqstp, p); }
int
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit b4c24b5a41da63e5f3a9b6ea56cbe2a1efe49579 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/xdr4.c | 25 ++++++++++++++----------- 1 file changed, 14 insertions(+), 11 deletions(-)
diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index 47a87ea4a99b1..6bd3bfb69ed7f 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -295,6 +295,20 @@ nlm4svc_decode_unlockargs(struct svc_rqst *rqstp, __be32 *p) return 1; }
+int +nlm4svc_decode_res(struct svc_rqst *rqstp, __be32 *p) +{ + struct xdr_stream *xdr = &rqstp->rq_arg_stream; + struct nlm_res *resp = rqstp->rq_argp; + + if (!svcxdr_decode_cookie(xdr, &resp->cookie)) + return 0; + if (!svcxdr_decode_stats(xdr, &resp->status)) + return 0; + + return 1; +} + int nlm4svc_encode_testres(struct svc_rqst *rqstp, __be32 *p) { @@ -375,17 +389,6 @@ nlm4svc_decode_reboot(struct svc_rqst *rqstp, __be32 *p) return xdr_argsize_check(rqstp, p); }
-int -nlm4svc_decode_res(struct svc_rqst *rqstp, __be32 *p) -{ - struct nlm_res *resp = rqstp->rq_argp; - - if (!(p = nlm4_decode_cookie(p, &resp->cookie))) - return 0; - resp->status = *p++; - return xdr_argsize_check(rqstp, p); -} - int nlm4svc_encode_void(struct svc_rqst *rqstp, __be32 *p) {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit bc3665fd718b325cfff3abd383b00d1a87e028dc ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/xdr4.c | 39 ++++++++++++++++++++++++++------------- 1 file changed, 26 insertions(+), 13 deletions(-)
diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index 6bd3bfb69ed7f..2dbf82c2726be 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -309,6 +309,32 @@ nlm4svc_decode_res(struct svc_rqst *rqstp, __be32 *p) return 1; }
+int +nlm4svc_decode_reboot(struct svc_rqst *rqstp, __be32 *p) +{ + struct xdr_stream *xdr = &rqstp->rq_arg_stream; + struct nlm_reboot *argp = rqstp->rq_argp; + u32 len; + + if (xdr_stream_decode_u32(xdr, &len) < 0) + return 0; + if (len > SM_MAXSTRLEN) + return 0; + p = xdr_inline_decode(xdr, len); + if (!p) + return 0; + argp->len = len; + argp->mon = (char *)p; + if (xdr_stream_decode_u32(xdr, &argp->state) < 0) + return 0; + p = xdr_inline_decode(xdr, SM_PRIV_SIZE); + if (!p) + return 0; + memcpy(&argp->priv.data, p, sizeof(argp->priv.data)); + + return 1; +} + int nlm4svc_encode_testres(struct svc_rqst *rqstp, __be32 *p) { @@ -376,19 +402,6 @@ nlm4svc_decode_notify(struct svc_rqst *rqstp, __be32 *p) return xdr_argsize_check(rqstp, p); }
-int -nlm4svc_decode_reboot(struct svc_rqst *rqstp, __be32 *p) -{ - struct nlm_reboot *argp = rqstp->rq_argp; - - if (!(p = xdr_decode_string_inplace(p, &argp->mon, &argp->len, SM_MAXSTRLEN))) - return 0; - argp->state = ntohl(*p++); - memcpy(&argp->priv.data, p, sizeof(argp->priv.data)); - p += XDR_QUADLEN(SM_PRIV_SIZE); - return xdr_argsize_check(rqstp, p); -} - int nlm4svc_encode_void(struct svc_rqst *rqstp, __be32 *p) {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 7cf96b6d0104b12aa30961901879e428884b1695 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/xdr4.c | 99 +++++++++++++------------------------------------ 1 file changed, 26 insertions(+), 73 deletions(-)
diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index 2dbf82c2726be..e6bab1d1e41fb 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -42,37 +42,6 @@ loff_t_to_s64(loff_t offset) return res; }
-/* - * XDR functions for basic NLM types - */ -static __be32 * -nlm4_decode_cookie(__be32 *p, struct nlm_cookie *c) -{ - unsigned int len; - - len = ntohl(*p++); - - if(len==0) - { - c->len=4; - memset(c->data, 0, 4); /* hockeypux brain damage */ - } - else if(len<=NLM_MAXCOOKIELEN) - { - c->len=len; - memcpy(c->data, p, len); - p+=XDR_QUADLEN(len); - } - else - { - dprintk("lockd: bad cookie size %d (only cookies under " - "%d bytes are supported.)\n", - len, NLM_MAXCOOKIELEN); - return NULL; - } - return p; -} - static __be32 * nlm4_encode_cookie(__be32 *p, struct nlm_cookie *c) { @@ -82,20 +51,6 @@ nlm4_encode_cookie(__be32 *p, struct nlm_cookie *c) return p; }
-static __be32 * -nlm4_decode_fh(__be32 *p, struct nfs_fh *f) -{ - memset(f->data, 0, sizeof(f->data)); - f->size = ntohl(*p++); - if (f->size > NFS_MAXFHSIZE) { - dprintk("lockd: bad fhandle size %d (should be <=%d)\n", - f->size, NFS_MAXFHSIZE); - return NULL; - } - memcpy(f->data, p, f->size); - return p + XDR_QUADLEN(f->size); -} - /* * NLM file handles are defined by specification to be a variable-length * XDR opaque no longer than 1024 bytes. However, this implementation @@ -122,15 +77,6 @@ svcxdr_decode_fhandle(struct xdr_stream *xdr, struct nfs_fh *fh) return true; }
-/* - * Encode and decode owner handle - */ -static __be32 * -nlm4_decode_oh(__be32 *p, struct xdr_netobj *oh) -{ - return xdr_decode_netobj(p, oh); -} - static bool svcxdr_decode_lock(struct xdr_stream *xdr, struct nlm_lock *lock) { @@ -335,35 +281,42 @@ nlm4svc_decode_reboot(struct svc_rqst *rqstp, __be32 *p) return 1; }
-int -nlm4svc_encode_testres(struct svc_rqst *rqstp, __be32 *p) -{ - struct nlm_res *resp = rqstp->rq_resp; - - if (!(p = nlm4_encode_testres(p, resp))) - return 0; - return xdr_ressize_check(rqstp, p); -} - int nlm4svc_decode_shareargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_args *argp = rqstp->rq_argp; struct nlm_lock *lock = &argp->lock;
memset(lock, 0, sizeof(*lock)); locks_init_lock(&lock->fl); - lock->svid = ~(u32) 0; + lock->svid = ~(u32)0;
- if (!(p = nlm4_decode_cookie(p, &argp->cookie)) - || !(p = xdr_decode_string_inplace(p, &lock->caller, - &lock->len, NLM_MAXSTRLEN)) - || !(p = nlm4_decode_fh(p, &lock->fh)) - || !(p = nlm4_decode_oh(p, &lock->oh))) + if (!svcxdr_decode_cookie(xdr, &argp->cookie)) return 0; - argp->fsm_mode = ntohl(*p++); - argp->fsm_access = ntohl(*p++); - return xdr_argsize_check(rqstp, p); + if (!svcxdr_decode_string(xdr, &lock->caller, &lock->len)) + return 0; + if (!svcxdr_decode_fhandle(xdr, &lock->fh)) + return 0; + if (!svcxdr_decode_owner(xdr, &lock->oh)) + return 0; + /* XXX: Range checks are missing in the original code */ + if (xdr_stream_decode_u32(xdr, &argp->fsm_mode) < 0) + return 0; + if (xdr_stream_decode_u32(xdr, &argp->fsm_access) < 0) + return 0; + + return 1; +} + +int +nlm4svc_encode_testres(struct svc_rqst *rqstp, __be32 *p) +{ + struct nlm_res *resp = rqstp->rq_resp; + + if (!(p = nlm4_encode_testres(p, resp))) + return 0; + return xdr_ressize_check(rqstp, p); }
int
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 3049e974a7c7cfa0c15fb807f4a3e75b2ab8517a ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/xdr4.c | 28 +++++++++++++++------------- 1 file changed, 15 insertions(+), 13 deletions(-)
diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index e6bab1d1e41fb..6c5383bef2bf7 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -309,6 +309,21 @@ nlm4svc_decode_shareargs(struct svc_rqst *rqstp, __be32 *p) return 1; }
+int +nlm4svc_decode_notify(struct svc_rqst *rqstp, __be32 *p) +{ + struct xdr_stream *xdr = &rqstp->rq_arg_stream; + struct nlm_args *argp = rqstp->rq_argp; + struct nlm_lock *lock = &argp->lock; + + if (!svcxdr_decode_string(xdr, &lock->caller, &lock->len)) + return 0; + if (xdr_stream_decode_u32(xdr, &argp->state) < 0) + return 0; + + return 1; +} + int nlm4svc_encode_testres(struct svc_rqst *rqstp, __be32 *p) { @@ -342,19 +357,6 @@ nlm4svc_encode_res(struct svc_rqst *rqstp, __be32 *p) return xdr_ressize_check(rqstp, p); }
-int -nlm4svc_decode_notify(struct svc_rqst *rqstp, __be32 *p) -{ - struct nlm_args *argp = rqstp->rq_argp; - struct nlm_lock *lock = &argp->lock; - - if (!(p = xdr_decode_string_inplace(p, &lock->caller, - &lock->len, NLM_MAXSTRLEN))) - return 0; - argp->state = ntohl(*p++); - return xdr_argsize_check(rqstp, p); -} - int nlm4svc_encode_void(struct svc_rqst *rqstp, __be32 *p) {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit ec757e423b4fcd6e5ea4405d1e8243c040458d78 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/xdr4.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index 6c5383bef2bf7..0db142e203d2b 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -324,6 +324,17 @@ nlm4svc_decode_notify(struct svc_rqst *rqstp, __be32 *p) return 1; }
+ +/* + * Encode Reply results + */ + +int +nlm4svc_encode_void(struct svc_rqst *rqstp, __be32 *p) +{ + return 1; +} + int nlm4svc_encode_testres(struct svc_rqst *rqstp, __be32 *p) { @@ -356,9 +367,3 @@ nlm4svc_encode_res(struct svc_rqst *rqstp, __be32 *p) *p++ = resp->status; return xdr_ressize_check(rqstp, p); } - -int -nlm4svc_encode_void(struct svc_rqst *rqstp, __be32 *p) -{ - return xdr_ressize_check(rqstp, p); -}
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 1beef1473ccaa70a2d54f9e76fba5f534931ea23 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/xdr4.c | 74 ++++++++++++++++++++++++------------------------- 1 file changed, 36 insertions(+), 38 deletions(-)
diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index 0db142e203d2b..9b8a7afb935ca 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -20,8 +20,6 @@
#include "svcxdr.h"
-#define NLMDBG_FACILITY NLMDBG_XDR - static inline loff_t s64_to_loff_t(__s64 offset) { @@ -110,44 +108,44 @@ svcxdr_decode_lock(struct xdr_stream *xdr, struct nlm_lock *lock) return true; }
-/* - * Encode result of a TEST/TEST_MSG call - */ -static __be32 * -nlm4_encode_testres(__be32 *p, struct nlm_res *resp) +static bool +svcxdr_encode_holder(struct xdr_stream *xdr, const struct nlm_lock *lock) { - s64 start, len; + const struct file_lock *fl = &lock->fl; + s64 start, len;
- dprintk("xdr: before encode_testres (p %p resp %p)\n", p, resp); - if (!(p = nlm4_encode_cookie(p, &resp->cookie))) - return NULL; - *p++ = resp->status; + /* exclusive */ + if (xdr_stream_encode_bool(xdr, fl->fl_type != F_RDLCK) < 0) + return false; + if (xdr_stream_encode_u32(xdr, lock->svid) < 0) + return false; + if (!svcxdr_encode_owner(xdr, &lock->oh)) + return false; + start = loff_t_to_s64(fl->fl_start); + if (fl->fl_end == OFFSET_MAX) + len = 0; + else + len = loff_t_to_s64(fl->fl_end - fl->fl_start + 1); + if (xdr_stream_encode_u64(xdr, start) < 0) + return false; + if (xdr_stream_encode_u64(xdr, len) < 0) + return false; + + return true; +}
- if (resp->status == nlm_lck_denied) { - struct file_lock *fl = &resp->lock.fl; - - *p++ = (fl->fl_type == F_RDLCK)? xdr_zero : xdr_one; - *p++ = htonl(resp->lock.svid); - - /* Encode owner handle. */ - if (!(p = xdr_encode_netobj(p, &resp->lock.oh))) - return NULL; - - start = loff_t_to_s64(fl->fl_start); - if (fl->fl_end == OFFSET_MAX) - len = 0; - else - len = loff_t_to_s64(fl->fl_end - fl->fl_start + 1); - - p = xdr_encode_hyper(p, start); - p = xdr_encode_hyper(p, len); - dprintk("xdr: encode_testres (status %u pid %d type %d start %Ld end %Ld)\n", - resp->status, (int)resp->lock.svid, fl->fl_type, - (long long)fl->fl_start, (long long)fl->fl_end); +static bool +svcxdr_encode_testrply(struct xdr_stream *xdr, const struct nlm_res *resp) +{ + if (!svcxdr_encode_stats(xdr, resp->status)) + return false; + switch (resp->status) { + case nlm_lck_denied: + if (!svcxdr_encode_holder(xdr, &resp->lock)) + return false; }
- dprintk("xdr: after encode_testres (p %p resp %p)\n", p, resp); - return p; + return true; }
@@ -338,11 +336,11 @@ nlm4svc_encode_void(struct svc_rqst *rqstp, __be32 *p) int nlm4svc_encode_testres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nlm_res *resp = rqstp->rq_resp;
- if (!(p = nlm4_encode_testres(p, resp))) - return 0; - return xdr_ressize_check(rqstp, p); + return svcxdr_encode_cookie(xdr, &resp->cookie) && + svcxdr_encode_testrply(xdr, resp); }
int
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 447c14d48968d0d4c2733c3f8052cb63aa1deb38 ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/xdr4.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index 9b8a7afb935ca..efdede71b9511 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -344,24 +344,23 @@ nlm4svc_encode_testres(struct svc_rqst *rqstp, __be32 *p) }
int -nlm4svc_encode_shareres(struct svc_rqst *rqstp, __be32 *p) +nlm4svc_encode_res(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nlm_res *resp = rqstp->rq_resp;
- if (!(p = nlm4_encode_cookie(p, &resp->cookie))) - return 0; - *p++ = resp->status; - *p++ = xdr_zero; /* sequence argument */ - return xdr_ressize_check(rqstp, p); + return svcxdr_encode_cookie(xdr, &resp->cookie) && + svcxdr_encode_stats(xdr, resp->status); }
int -nlm4svc_encode_res(struct svc_rqst *rqstp, __be32 *p) +nlm4svc_encode_shareres(struct svc_rqst *rqstp, __be32 *p) { struct nlm_res *resp = rqstp->rq_resp;
if (!(p = nlm4_encode_cookie(p, &resp->cookie))) return 0; *p++ = resp->status; + *p++ = xdr_zero; /* sequence argument */ return xdr_ressize_check(rqstp, p); }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 0ff5b50ab1f7f39862d0cdf6803978d31b27f25e ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/xdr4.c | 22 +++++++++------------- 1 file changed, 9 insertions(+), 13 deletions(-)
diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index efdede71b9511..98e957e4566c2 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -40,15 +40,6 @@ loff_t_to_s64(loff_t offset) return res; }
-static __be32 * -nlm4_encode_cookie(__be32 *p, struct nlm_cookie *c) -{ - *p++ = htonl(c->len); - memcpy(p, c->data, c->len); - p+=XDR_QUADLEN(c->len); - return p; -} - /* * NLM file handles are defined by specification to be a variable-length * XDR opaque no longer than 1024 bytes. However, this implementation @@ -356,11 +347,16 @@ nlm4svc_encode_res(struct svc_rqst *rqstp, __be32 *p) int nlm4svc_encode_shareres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nlm_res *resp = rqstp->rq_resp;
- if (!(p = nlm4_encode_cookie(p, &resp->cookie))) + if (!svcxdr_encode_cookie(xdr, &resp->cookie)) return 0; - *p++ = resp->status; - *p++ = xdr_zero; /* sequence argument */ - return xdr_ressize_check(rqstp, p); + if (!svcxdr_encode_stats(xdr, resp->status)) + return 0; + /* sequence */ + if (xdr_stream_encode_u32(xdr, 0) < 0) + return 0; + + return 1; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Colin Ian King colin.king@canonical.com
[ Upstream commit e34c0ce9136a0fe96f0f547898d14c44f3c9f147 ]
The pointer 'this' is being initialized with a value that is never read and it is being updated later with a new value. The initialization is redundant and can be removed.
Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King colin.king@canonical.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index f7ddfa204abc4..1f840c72e9780 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -3369,7 +3369,7 @@ bool nfsd4_spo_must_allow(struct svc_rqst *rqstp) { struct nfsd4_compoundres *resp = rqstp->rq_resp; struct nfsd4_compoundargs *argp = rqstp->rq_argp; - struct nfsd4_op *this = &argp->ops[resp->opcnt - 1]; + struct nfsd4_op *this; struct nfsd4_compound_state *cstate = &resp->cstate; struct nfs4_op_map *allow = &cstate->clp->cl_spo_must_allow; u32 opiter;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 7b08cf62b1239a4322427d677ea9363f0ab677c6 ]
The double copy of the string is a mistake, plus __assign_str() uses strlen(), which is wrong to do on a string that isn't guaranteed to be NUL-terminated.
Fixes: 6019ce0742ca ("NFSD: Add a tracepoint to record directory entry encoding") Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/trace.h | 1 - 1 file changed, 1 deletion(-)
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 68a0fecdd5f46..de245f433392d 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -408,7 +408,6 @@ TRACE_EVENT(nfsd_dirent, __entry->ino = ino; __entry->len = namlen; memcpy(__get_str(name), name, namlen); - __assign_str(name, name); ), TP_printk("fh_hash=0x%08x ino=%llu name=%.*s", __entry->fh_hash, __entry->ino,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit ab1016d39cc052064e32f25ad18ef8767a0ee3b8 ]
In error cases the dentry may be NULL.
Before 20798dfe249a, the encoder also checked dentry and d_really_is_positive(dentry), but that looks like overkill to me--zero status should be enough to guarantee a positive dentry.
This isn't the first time we've seen an error-case NULL dereference hidden in the initialization of a local variable in an xdr encoder. But I went back through the other recent rewrites and didn't spot any similar bugs.
Reported-by: JianHong Yin jiyin@redhat.com Reviewed-by: Chuck Lever III chuck.lever@oracle.com Fixes: 20798dfe249a ("NFSD: Update the NFSv3 GETACL result encoder...") Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3acl.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c index cfb686f23e571..5e13e5f7f92b8 100644 --- a/fs/nfsd/nfs3acl.c +++ b/fs/nfsd/nfs3acl.c @@ -170,7 +170,7 @@ static int nfs3svc_encode_getaclres(struct svc_rqst *rqstp, __be32 *p) struct nfsd3_getaclres *resp = rqstp->rq_resp; struct dentry *dentry = resp->fh.fh_dentry; struct kvec *head = rqstp->rq_res.head; - struct inode *inode = d_inode(dentry); + struct inode *inode; unsigned int base; int n; int w; @@ -179,6 +179,7 @@ static int nfs3svc_encode_getaclres(struct svc_rqst *rqstp, __be32 *p) return 0; switch (resp->status) { case nfs_ok: + inode = d_inode(dentry); if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) return 0; if (xdr_stream_encode_u32(xdr, resp->mask) < 0)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthew Bobrowski repnop@google.com
[ Upstream commit c576e0fcd6188d0edb50b0fb83f853433ef4819b ]
With the idea of returning pidfds from the fanotify API, we need to expose a mechanism for creating pidfds. We drop the static qualifier from pidfd_create() and add its declaration to linux/pid.h so that the pidfd_create() helper can be called from other kernel subsystems i.e. fanotify.
Link: https://lore.kernel.org/r/0c68653ec32f1b7143301f0231f7ed14062fd82b.162839804... Signed-off-by: Matthew Bobrowski repnop@google.com Acked-by: Christian Brauner christian.brauner@ubuntu.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/pid.h | 1 + kernel/pid.c | 4 +++- 2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/include/linux/pid.h b/include/linux/pid.h index fa10acb8d6a42..af308e15f174c 100644 --- a/include/linux/pid.h +++ b/include/linux/pid.h @@ -78,6 +78,7 @@ struct file;
extern struct pid *pidfd_pid(const struct file *file); struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags); +int pidfd_create(struct pid *pid, unsigned int flags);
static inline struct pid *get_pid(struct pid *pid) { diff --git a/kernel/pid.c b/kernel/pid.c index 4856818c9de1a..74f0466757cbf 100644 --- a/kernel/pid.c +++ b/kernel/pid.c @@ -550,10 +550,12 @@ struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags) * Note, that this function can only be called after the fd table has * been unshared to avoid leaking the pidfd to the new process. * + * This symbol should not be explicitly exported to loadable modules. + * * Return: On success, a cloexec pidfd is returned. * On error, a negative errno number will be returned. */ -static int pidfd_create(struct pid *pid, unsigned int flags) +int pidfd_create(struct pid *pid, unsigned int flags) { int fd;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthew Bobrowski repnop@google.com
[ Upstream commit 490b9ba881e2c6337bb09b68010803ae98e59f4a ]
By adding the pidfd_create() declaration to linux/pid.h, we effectively expose this function to the rest of the kernel. In order to avoid any unintended behavior, or set false expectations upon this function, ensure that constraints are forced upon each of the passed parameters. This includes the checking of whether the passed struct pid is a thread-group leader as pidfd creation is currently limited to such pid types.
Link: https://lore.kernel.org/r/2e9b91c2d529d52a003b8b86c45f866153be9eb5.162839804... Signed-off-by: Matthew Bobrowski repnop@google.com Acked-by: Christian Brauner christian.brauner@ubuntu.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/pid.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/kernel/pid.c b/kernel/pid.c index 74f0466757cbf..0820f2c50bb0c 100644 --- a/kernel/pid.c +++ b/kernel/pid.c @@ -559,6 +559,12 @@ int pidfd_create(struct pid *pid, unsigned int flags) { int fd;
+ if (!pid || !pid_has_task(pid, PIDTYPE_TGID)) + return -EINVAL; + + if (flags & ~(O_NONBLOCK | O_RDWR | O_CLOEXEC)) + return -EINVAL; + fd = anon_inode_getfd("[pidfd]", &pidfd_fops, get_pid(pid), flags | O_RDWR | O_CLOEXEC); if (fd < 0) @@ -598,10 +604,7 @@ SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags) if (!p) return -ESRCH;
- if (pid_has_task(p, PIDTYPE_TGID)) - fd = pidfd_create(p, flags); - else - fd = -EINVAL; + fd = pidfd_create(p, flags);
put_pid(p); return fd;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthew Bobrowski repnop@google.com
[ Upstream commit d3424c9bac893bd06f38a20474cd622881d384ca ]
With the idea to support additional info record types in the future i.e. fanotify_event_info_pidfd, it's a good idea to rename some of the labels assigned to some of the existing fid related functions, parameters, etc which more accurately represent the intent behind their usage.
For example, copy_info_to_user() was defined with a generic function label, which arguably reads as being supportive of different info record types, however the parameter list for this function is explicitly tailored towards the creation and copying of the fanotify_event_info_fid records. This same point applies to the macro defined as FANOTIFY_INFO_HDR_LEN.
With fanotify_event_info_len(), we change the parameter label so that the function implies that it can be extended to calculate the length for additional info record types.
Link: https://lore.kernel.org/r/7c3ec33f3c718dac40764305d4d494d858f59c51.162839804... Signed-off-by: Matthew Bobrowski repnop@google.com Reviewed-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify_user.c | 33 +++++++++++++++++------------- 1 file changed, 19 insertions(+), 14 deletions(-)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 1c439e2fdcd80..74b51dab2cdce 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -104,7 +104,7 @@ struct kmem_cache *fanotify_path_event_cachep __read_mostly; struct kmem_cache *fanotify_perm_event_cachep __read_mostly;
#define FANOTIFY_EVENT_ALIGN 4 -#define FANOTIFY_INFO_HDR_LEN \ +#define FANOTIFY_FID_INFO_HDR_LEN \ (sizeof(struct fanotify_event_info_fid) + sizeof(struct file_handle))
static int fanotify_fid_info_len(int fh_len, int name_len) @@ -114,10 +114,11 @@ static int fanotify_fid_info_len(int fh_len, int name_len) if (name_len) info_len += name_len + 1;
- return roundup(FANOTIFY_INFO_HDR_LEN + info_len, FANOTIFY_EVENT_ALIGN); + return roundup(FANOTIFY_FID_INFO_HDR_LEN + info_len, + FANOTIFY_EVENT_ALIGN); }
-static int fanotify_event_info_len(unsigned int fid_mode, +static int fanotify_event_info_len(unsigned int info_mode, struct fanotify_event *event) { struct fanotify_info *info = fanotify_event_info(event); @@ -128,7 +129,8 @@ static int fanotify_event_info_len(unsigned int fid_mode,
if (dir_fh_len) { info_len += fanotify_fid_info_len(dir_fh_len, info->name_len); - } else if ((fid_mode & FAN_REPORT_NAME) && (event->mask & FAN_ONDIR)) { + } else if ((info_mode & FAN_REPORT_NAME) && + (event->mask & FAN_ONDIR)) { /* * With group flag FAN_REPORT_NAME, if name was not recorded in * event on a directory, we will report the name ".". @@ -303,9 +305,10 @@ static int process_access_response(struct fsnotify_group *group, return -ENOENT; }
-static int copy_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh, - int info_type, const char *name, size_t name_len, - char __user *buf, size_t count) +static int copy_fid_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh, + int info_type, const char *name, + size_t name_len, + char __user *buf, size_t count) { struct fanotify_event_info_fid info = { }; struct file_handle handle = { }; @@ -463,10 +466,11 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group, if (fanotify_event_dir_fh_len(event)) { info_type = info->name_len ? FAN_EVENT_INFO_TYPE_DFID_NAME : FAN_EVENT_INFO_TYPE_DFID; - ret = copy_info_to_user(fanotify_event_fsid(event), - fanotify_info_dir_fh(info), - info_type, fanotify_info_name(info), - info->name_len, buf, count); + ret = copy_fid_info_to_user(fanotify_event_fsid(event), + fanotify_info_dir_fh(info), + info_type, + fanotify_info_name(info), + info->name_len, buf, count); if (ret < 0) goto out_close_fd;
@@ -512,9 +516,10 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group, info_type = FAN_EVENT_INFO_TYPE_FID; }
- ret = copy_info_to_user(fanotify_event_fsid(event), - fanotify_event_object_fh(event), - info_type, dot, dot_len, buf, count); + ret = copy_fid_info_to_user(fanotify_event_fsid(event), + fanotify_event_object_fh(event), + info_type, dot, dot_len, + buf, count); if (ret < 0) goto out_close_fd;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthew Bobrowski repnop@google.com
[ Upstream commit 0aca67bb7f0d8c997dfef8ff0bfeb0afb361f0e6 ]
The copy_info_records_to_user() helper allows for the separation of info record copying routines/conditionals from copy_event_to_user(), which reduces the overall clutter within this function. This becomes especially true as we start introducing additional info records in the future i.e. struct fanotify_event_info_pidfd. On success, this helper returns the total amount of bytes that have been copied into the user supplied buffer and on error, a negative value is returned to the caller.
The newly defined macro FANOTIFY_INFO_MODES can be used to obtain info record types that have been enabled for a specific notification group. This macro becomes useful in the subsequent patch when the FAN_REPORT_PIDFD initialization flag is introduced.
Link: https://lore.kernel.org/r/8872947dfe12ce8ae6e9a7f2d49ea29bc8006af0.162839804... Signed-off-by: Matthew Bobrowski repnop@google.com Reviewed-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz [ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify_user.c | 151 ++++++++++++++++------------- include/linux/fanotify.h | 2 + 2 files changed, 88 insertions(+), 65 deletions(-)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 74b51dab2cdce..83405949a71b2 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -173,7 +173,7 @@ static struct fanotify_event *get_one_event(struct fsnotify_group *group, size_t event_size = FAN_EVENT_METADATA_LEN; struct fanotify_event *event = NULL; struct fsnotify_event *fsn_event; - unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS); + unsigned int info_mode = FAN_GROUP_FLAG(group, FANOTIFY_INFO_MODES);
pr_debug("%s: group=%p count=%zd\n", __func__, group, count);
@@ -183,8 +183,8 @@ static struct fanotify_event *get_one_event(struct fsnotify_group *group, goto out;
event = FANOTIFY_E(fsn_event); - if (fid_mode) - event_size += fanotify_event_info_len(fid_mode, event); + if (info_mode) + event_size += fanotify_event_info_len(info_mode, event);
if (event_size > count) { event = ERR_PTR(-EINVAL); @@ -401,68 +401,17 @@ static int copy_fid_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh, return info_len; }
-static ssize_t copy_event_to_user(struct fsnotify_group *group, - struct fanotify_event *event, - char __user *buf, size_t count) +static int copy_info_records_to_user(struct fanotify_event *event, + struct fanotify_info *info, + unsigned int info_mode, + char __user *buf, size_t count) { - struct fanotify_event_metadata metadata; - struct path *path = fanotify_event_path(event); - struct fanotify_info *info = fanotify_event_info(event); - unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS); - struct file *f = NULL; - int ret, fd = FAN_NOFD; - int info_type = 0; - - pr_debug("%s: group=%p event=%p\n", __func__, group, event); - - metadata.event_len = FAN_EVENT_METADATA_LEN + - fanotify_event_info_len(fid_mode, event); - metadata.metadata_len = FAN_EVENT_METADATA_LEN; - metadata.vers = FANOTIFY_METADATA_VERSION; - metadata.reserved = 0; - metadata.mask = event->mask & FANOTIFY_OUTGOING_EVENTS; - metadata.pid = pid_vnr(event->pid); - /* - * For an unprivileged listener, event->pid can be used to identify the - * events generated by the listener process itself, without disclosing - * the pids of other processes. - */ - if (FAN_GROUP_FLAG(group, FANOTIFY_UNPRIV) && - task_tgid(current) != event->pid) - metadata.pid = 0; - - /* - * For now, fid mode is required for an unprivileged listener and - * fid mode does not report fd in events. Keep this check anyway - * for safety in case fid mode requirement is relaxed in the future - * to allow unprivileged listener to get events with no fd and no fid. - */ - if (!FAN_GROUP_FLAG(group, FANOTIFY_UNPRIV) && - path && path->mnt && path->dentry) { - fd = create_fd(group, path, &f); - if (fd < 0) - return fd; - } - metadata.fd = fd; + int ret, total_bytes = 0, info_type = 0; + unsigned int fid_mode = info_mode & FANOTIFY_FID_BITS;
- ret = -EFAULT; /* - * Sanity check copy size in case get_one_event() and - * event_len sizes ever get out of sync. + * Event info records order is as follows: dir fid + name, child fid. */ - if (WARN_ON_ONCE(metadata.event_len > count)) - goto out_close_fd; - - if (copy_to_user(buf, &metadata, FAN_EVENT_METADATA_LEN)) - goto out_close_fd; - - buf += FAN_EVENT_METADATA_LEN; - count -= FAN_EVENT_METADATA_LEN; - - if (fanotify_is_perm_event(event->mask)) - FANOTIFY_PERM(event)->fd = fd; - - /* Event info records order is: dir fid + name, child fid */ if (fanotify_event_dir_fh_len(event)) { info_type = info->name_len ? FAN_EVENT_INFO_TYPE_DFID_NAME : FAN_EVENT_INFO_TYPE_DFID; @@ -472,10 +421,11 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group, fanotify_info_name(info), info->name_len, buf, count); if (ret < 0) - goto out_close_fd; + return ret;
buf += ret; count -= ret; + total_bytes += ret; }
if (fanotify_event_object_fh_len(event)) { @@ -492,8 +442,8 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group, (event->mask & FAN_ONDIR)) { /* * With group flag FAN_REPORT_NAME, if name was not - * recorded in an event on a directory, report the - * name "." with info type DFID_NAME. + * recorded in an event on a directory, report the name + * "." with info type DFID_NAME. */ info_type = FAN_EVENT_INFO_TYPE_DFID_NAME; dot = "."; @@ -521,10 +471,81 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group, info_type, dot, dot_len, buf, count); if (ret < 0) - goto out_close_fd; + return ret;
buf += ret; count -= ret; + total_bytes += ret; + } + + return total_bytes; +} + +static ssize_t copy_event_to_user(struct fsnotify_group *group, + struct fanotify_event *event, + char __user *buf, size_t count) +{ + struct fanotify_event_metadata metadata; + struct path *path = fanotify_event_path(event); + struct fanotify_info *info = fanotify_event_info(event); + unsigned int info_mode = FAN_GROUP_FLAG(group, FANOTIFY_INFO_MODES); + struct file *f = NULL; + int ret, fd = FAN_NOFD; + + pr_debug("%s: group=%p event=%p\n", __func__, group, event); + + metadata.event_len = FAN_EVENT_METADATA_LEN + + fanotify_event_info_len(info_mode, event); + metadata.metadata_len = FAN_EVENT_METADATA_LEN; + metadata.vers = FANOTIFY_METADATA_VERSION; + metadata.reserved = 0; + metadata.mask = event->mask & FANOTIFY_OUTGOING_EVENTS; + metadata.pid = pid_vnr(event->pid); + /* + * For an unprivileged listener, event->pid can be used to identify the + * events generated by the listener process itself, without disclosing + * the pids of other processes. + */ + if (FAN_GROUP_FLAG(group, FANOTIFY_UNPRIV) && + task_tgid(current) != event->pid) + metadata.pid = 0; + + /* + * For now, fid mode is required for an unprivileged listener and + * fid mode does not report fd in events. Keep this check anyway + * for safety in case fid mode requirement is relaxed in the future + * to allow unprivileged listener to get events with no fd and no fid. + */ + if (!FAN_GROUP_FLAG(group, FANOTIFY_UNPRIV) && + path && path->mnt && path->dentry) { + fd = create_fd(group, path, &f); + if (fd < 0) + return fd; + } + metadata.fd = fd; + + ret = -EFAULT; + /* + * Sanity check copy size in case get_one_event() and + * event_len sizes ever get out of sync. + */ + if (WARN_ON_ONCE(metadata.event_len > count)) + goto out_close_fd; + + if (copy_to_user(buf, &metadata, FAN_EVENT_METADATA_LEN)) + goto out_close_fd; + + buf += FAN_EVENT_METADATA_LEN; + count -= FAN_EVENT_METADATA_LEN; + + if (fanotify_is_perm_event(event->mask)) + FANOTIFY_PERM(event)->fd = fd; + + if (info_mode) { + ret = copy_info_records_to_user(event, info, info_mode, + buf, count); + if (ret < 0) + goto out_close_fd; }
if (f) diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h index a16dbeced1528..10a7e26ddba6c 100644 --- a/include/linux/fanotify.h +++ b/include/linux/fanotify.h @@ -27,6 +27,8 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */
#define FANOTIFY_FID_BITS (FAN_REPORT_FID | FAN_REPORT_DFID_NAME)
+#define FANOTIFY_INFO_MODES (FANOTIFY_FID_BITS) + /* * fanotify_init() flags that require CAP_SYS_ADMIN. * We do not allow unprivileged groups to request permission events.
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthew Bobrowski repnop@google.com
[ Upstream commit af579beb666aefb17e9a335c12c788c92932baf1 ]
Introduce a new flag FAN_REPORT_PIDFD for fanotify_init(2) which allows userspace applications to control whether a pidfd information record containing a pidfd is to be returned alongside the generic event metadata for each event.
If FAN_REPORT_PIDFD is enabled for a notification group, an additional struct fanotify_event_info_pidfd object type will be supplied alongside the generic struct fanotify_event_metadata for a single event. This functionality is analogous to that of FAN_REPORT_FID in terms of how the event structure is supplied to a userspace application. Usage of FAN_REPORT_PIDFD with FAN_REPORT_FID/FAN_REPORT_DFID_NAME is permitted, and in this case a struct fanotify_event_info_pidfd object will likely follow any struct fanotify_event_info_fid object.
Currently, the usage of the FAN_REPORT_TID flag is not permitted along with FAN_REPORT_PIDFD as the pidfd API currently only supports the creation of pidfds for thread-group leaders. Additionally, usage of the FAN_REPORT_PIDFD flag is limited to privileged processes only i.e. event listeners that are running with the CAP_SYS_ADMIN capability. Attempting to supply the FAN_REPORT_TID initialization flags with FAN_REPORT_PIDFD or creating a notification group without CAP_SYS_ADMIN will result with -EINVAL being returned to the caller.
In the event of a pidfd creation error, there are two types of error values that can be reported back to the listener. There is FAN_NOPIDFD, which will be reported in cases where the process responsible for generating the event has terminated prior to the event listener being able to read the event. Then there is FAN_EPIDFD, which will be reported when a more generic pidfd creation error has occurred when fanotify calls pidfd_create().
Link: https://lore.kernel.org/r/5f9e09cff7ed62bfaa51c1369e0f7ea5f16a91aa.162839804... Signed-off-by: Matthew Bobrowski repnop@google.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify_user.c | 85 ++++++++++++++++++++++++++++-- include/linux/fanotify.h | 3 +- include/uapi/linux/fanotify.h | 13 +++++ 3 files changed, 96 insertions(+), 5 deletions(-)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 83405949a71b2..10fb062065b69 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1,6 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 #include <linux/fanotify.h> #include <linux/fcntl.h> +#include <linux/fdtable.h> #include <linux/file.h> #include <linux/fs.h> #include <linux/anon_inodes.h> @@ -106,6 +107,8 @@ struct kmem_cache *fanotify_perm_event_cachep __read_mostly; #define FANOTIFY_EVENT_ALIGN 4 #define FANOTIFY_FID_INFO_HDR_LEN \ (sizeof(struct fanotify_event_info_fid) + sizeof(struct file_handle)) +#define FANOTIFY_PIDFD_INFO_HDR_LEN \ + sizeof(struct fanotify_event_info_pidfd)
static int fanotify_fid_info_len(int fh_len, int name_len) { @@ -138,6 +141,9 @@ static int fanotify_event_info_len(unsigned int info_mode, dot_len = 1; }
+ if (info_mode & FAN_REPORT_PIDFD) + info_len += FANOTIFY_PIDFD_INFO_HDR_LEN; + if (fh_len) info_len += fanotify_fid_info_len(fh_len, dot_len);
@@ -401,13 +407,34 @@ static int copy_fid_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh, return info_len; }
+static int copy_pidfd_info_to_user(int pidfd, + char __user *buf, + size_t count) +{ + struct fanotify_event_info_pidfd info = { }; + size_t info_len = FANOTIFY_PIDFD_INFO_HDR_LEN; + + if (WARN_ON_ONCE(info_len > count)) + return -EFAULT; + + info.hdr.info_type = FAN_EVENT_INFO_TYPE_PIDFD; + info.hdr.len = info_len; + info.pidfd = pidfd; + + if (copy_to_user(buf, &info, info_len)) + return -EFAULT; + + return info_len; +} + static int copy_info_records_to_user(struct fanotify_event *event, struct fanotify_info *info, - unsigned int info_mode, + unsigned int info_mode, int pidfd, char __user *buf, size_t count) { int ret, total_bytes = 0, info_type = 0; unsigned int fid_mode = info_mode & FANOTIFY_FID_BITS; + unsigned int pidfd_mode = info_mode & FAN_REPORT_PIDFD;
/* * Event info records order is as follows: dir fid + name, child fid. @@ -478,6 +505,16 @@ static int copy_info_records_to_user(struct fanotify_event *event, total_bytes += ret; }
+ if (pidfd_mode) { + ret = copy_pidfd_info_to_user(pidfd, buf, count); + if (ret < 0) + return ret; + + buf += ret; + count -= ret; + total_bytes += ret; + } + return total_bytes; }
@@ -489,8 +526,9 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group, struct path *path = fanotify_event_path(event); struct fanotify_info *info = fanotify_event_info(event); unsigned int info_mode = FAN_GROUP_FLAG(group, FANOTIFY_INFO_MODES); + unsigned int pidfd_mode = info_mode & FAN_REPORT_PIDFD; struct file *f = NULL; - int ret, fd = FAN_NOFD; + int ret, pidfd = FAN_NOPIDFD, fd = FAN_NOFD;
pr_debug("%s: group=%p event=%p\n", __func__, group, event);
@@ -524,6 +562,33 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group, } metadata.fd = fd;
+ if (pidfd_mode) { + /* + * Complain if the FAN_REPORT_PIDFD and FAN_REPORT_TID mutual + * exclusion is ever lifted. At the time of incoporating pidfd + * support within fanotify, the pidfd API only supported the + * creation of pidfds for thread-group leaders. + */ + WARN_ON_ONCE(FAN_GROUP_FLAG(group, FAN_REPORT_TID)); + + /* + * The PIDTYPE_TGID check for an event->pid is performed + * preemptively in an attempt to catch out cases where the event + * listener reads events after the event generating process has + * already terminated. Report FAN_NOPIDFD to the event listener + * in those cases, with all other pidfd creation errors being + * reported as FAN_EPIDFD. + */ + if (metadata.pid == 0 || + !pid_has_task(event->pid, PIDTYPE_TGID)) { + pidfd = FAN_NOPIDFD; + } else { + pidfd = pidfd_create(event->pid, 0); + if (pidfd < 0) + pidfd = FAN_EPIDFD; + } + } + ret = -EFAULT; /* * Sanity check copy size in case get_one_event() and @@ -542,7 +607,7 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group, FANOTIFY_PERM(event)->fd = fd;
if (info_mode) { - ret = copy_info_records_to_user(event, info, info_mode, + ret = copy_info_records_to_user(event, info, info_mode, pidfd, buf, count); if (ret < 0) goto out_close_fd; @@ -558,6 +623,10 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group, put_unused_fd(fd); fput(f); } + + if (pidfd >= 0) + close_fd(pidfd); + return ret; }
@@ -1103,6 +1172,14 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) #endif return -EINVAL;
+ /* + * A pidfd can only be returned for a thread-group leader; thus + * FAN_REPORT_PIDFD and FAN_REPORT_TID need to remain mutually + * exclusive. + */ + if ((flags & FAN_REPORT_PIDFD) && (flags & FAN_REPORT_TID)) + return -EINVAL; + if (event_f_flags & ~FANOTIFY_INIT_ALL_EVENT_F_BITS) return -EINVAL;
@@ -1522,7 +1599,7 @@ static int __init fanotify_user_setup(void) FANOTIFY_DEFAULT_MAX_USER_MARKS);
BUILD_BUG_ON(FANOTIFY_INIT_FLAGS & FANOTIFY_INTERNAL_GROUP_FLAGS); - BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 10); + BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 11); BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 9);
fanotify_mark_cache = KMEM_CACHE(fsnotify_mark, diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h index 10a7e26ddba6c..eec3b7c408115 100644 --- a/include/linux/fanotify.h +++ b/include/linux/fanotify.h @@ -27,7 +27,7 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */
#define FANOTIFY_FID_BITS (FAN_REPORT_FID | FAN_REPORT_DFID_NAME)
-#define FANOTIFY_INFO_MODES (FANOTIFY_FID_BITS) +#define FANOTIFY_INFO_MODES (FANOTIFY_FID_BITS | FAN_REPORT_PIDFD)
/* * fanotify_init() flags that require CAP_SYS_ADMIN. @@ -37,6 +37,7 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */ */ #define FANOTIFY_ADMIN_INIT_FLAGS (FANOTIFY_PERM_CLASSES | \ FAN_REPORT_TID | \ + FAN_REPORT_PIDFD | \ FAN_UNLIMITED_QUEUE | \ FAN_UNLIMITED_MARKS)
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h index fbf9c5c7dd59a..64553df9d7350 100644 --- a/include/uapi/linux/fanotify.h +++ b/include/uapi/linux/fanotify.h @@ -51,6 +51,7 @@ #define FAN_ENABLE_AUDIT 0x00000040
/* Flags to determine fanotify event format */ +#define FAN_REPORT_PIDFD 0x00000080 /* Report pidfd for event->pid */ #define FAN_REPORT_TID 0x00000100 /* event->pid is thread id */ #define FAN_REPORT_FID 0x00000200 /* Report unique file id */ #define FAN_REPORT_DIR_FID 0x00000400 /* Report unique directory id */ @@ -123,6 +124,7 @@ struct fanotify_event_metadata { #define FAN_EVENT_INFO_TYPE_FID 1 #define FAN_EVENT_INFO_TYPE_DFID_NAME 2 #define FAN_EVENT_INFO_TYPE_DFID 3 +#define FAN_EVENT_INFO_TYPE_PIDFD 4
/* Variable length info record following event metadata */ struct fanotify_event_info_header { @@ -148,6 +150,15 @@ struct fanotify_event_info_fid { unsigned char handle[0]; };
+/* + * This structure is used for info records of type FAN_EVENT_INFO_TYPE_PIDFD. + * It holds a pidfd for the pid that was responsible for generating an event. + */ +struct fanotify_event_info_pidfd { + struct fanotify_event_info_header hdr; + __s32 pidfd; +}; + struct fanotify_response { __s32 fd; __u32 response; @@ -160,6 +171,8 @@ struct fanotify_response {
/* No fd set in event */ #define FAN_NOFD -1 +#define FAN_NOPIDFD FAN_NOFD +#define FAN_EPIDFD -2
/* Helper functions to deal with fanotify_event_metadata buffers */ #define FAN_EVENT_METADATA_LEN (sizeof(struct fanotify_event_metadata))
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit 09ddbe69c9925b42cb9529f60678c25b241d8b18 ]
We must have a reference on inode, so ihold is cheaper.
Link: https://lore.kernel.org/r/20210810151220.285179-2-amir73il@gmail.com Reviewed-by: Matthew Bobrowski repnop@google.com Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/mark.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/fs/notify/mark.c b/fs/notify/mark.c index 7af98a7c33c27..3c8fc77d3f072 100644 --- a/fs/notify/mark.c +++ b/fs/notify/mark.c @@ -493,8 +493,11 @@ static int fsnotify_attach_connector_to_object(fsnotify_connp_t *connp, conn->fsid.val[0] = conn->fsid.val[1] = 0; conn->flags = 0; } - if (conn->type == FSNOTIFY_OBJ_TYPE_INODE) - inode = igrab(fsnotify_conn_inode(conn)); + if (conn->type == FSNOTIFY_OBJ_TYPE_INODE) { + inode = fsnotify_conn_inode(conn); + ihold(inode); + } + /* * cmpxchg() provides the barrier so that readers of *connp can see * only initialized structure
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit 11fa333b58ba1518e7c69fafb6513a0117f8fe33 ]
Instead of incrementing s_fsnotify_inode_refs when detaching connector from inode, increment it earlier when attaching connector to inode. Next patch is going to use s_fsnotify_inode_refs to count all objects with attached connectors.
Link: https://lore.kernel.org/r/20210810151220.285179-3-amir73il@gmail.com Reviewed-by: Matthew Bobrowski repnop@google.com Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/mark.c | 29 ++++++++++++++++++----------- 1 file changed, 18 insertions(+), 11 deletions(-)
diff --git a/fs/notify/mark.c b/fs/notify/mark.c index 3c8fc77d3f072..f6d1ad3ecca2a 100644 --- a/fs/notify/mark.c +++ b/fs/notify/mark.c @@ -169,6 +169,21 @@ static void fsnotify_connector_destroy_workfn(struct work_struct *work) } }
+static void fsnotify_get_inode_ref(struct inode *inode) +{ + ihold(inode); + atomic_long_inc(&inode->i_sb->s_fsnotify_inode_refs); +} + +static void fsnotify_put_inode_ref(struct inode *inode) +{ + struct super_block *sb = inode->i_sb; + + iput(inode); + if (atomic_long_dec_and_test(&sb->s_fsnotify_inode_refs)) + wake_up_var(&sb->s_fsnotify_inode_refs); +} + static void *fsnotify_detach_connector_from_object( struct fsnotify_mark_connector *conn, unsigned int *type) @@ -182,7 +197,6 @@ static void *fsnotify_detach_connector_from_object( if (conn->type == FSNOTIFY_OBJ_TYPE_INODE) { inode = fsnotify_conn_inode(conn); inode->i_fsnotify_mask = 0; - atomic_long_inc(&inode->i_sb->s_fsnotify_inode_refs); } else if (conn->type == FSNOTIFY_OBJ_TYPE_VFSMOUNT) { fsnotify_conn_mount(conn)->mnt_fsnotify_mask = 0; } else if (conn->type == FSNOTIFY_OBJ_TYPE_SB) { @@ -209,19 +223,12 @@ static void fsnotify_final_mark_destroy(struct fsnotify_mark *mark) /* Drop object reference originally held by a connector */ static void fsnotify_drop_object(unsigned int type, void *objp) { - struct inode *inode; - struct super_block *sb; - if (!objp) return; /* Currently only inode references are passed to be dropped */ if (WARN_ON_ONCE(type != FSNOTIFY_OBJ_TYPE_INODE)) return; - inode = objp; - sb = inode->i_sb; - iput(inode); - if (atomic_long_dec_and_test(&sb->s_fsnotify_inode_refs)) - wake_up_var(&sb->s_fsnotify_inode_refs); + fsnotify_put_inode_ref(objp); }
void fsnotify_put_mark(struct fsnotify_mark *mark) @@ -495,7 +502,7 @@ static int fsnotify_attach_connector_to_object(fsnotify_connp_t *connp, } if (conn->type == FSNOTIFY_OBJ_TYPE_INODE) { inode = fsnotify_conn_inode(conn); - ihold(inode); + fsnotify_get_inode_ref(inode); }
/* @@ -505,7 +512,7 @@ static int fsnotify_attach_connector_to_object(fsnotify_connp_t *connp, if (cmpxchg(connp, NULL, conn)) { /* Someone else created list structure for us */ if (inode) - iput(inode); + fsnotify_put_inode_ref(inode); kmem_cache_free(fsnotify_mark_connector_cachep, conn); }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit ec44610fe2b86daef70f3f53f47d2a2542d7094f ]
Rename s_fsnotify_inode_refs to s_fsnotify_connectors and count all objects with attached connectors, not only inodes with attached connectors.
This will be used to optimize fsnotify() calls on sb without any type of marks.
Link: https://lore.kernel.org/r/20210810151220.285179-4-amir73il@gmail.com Signed-off-by: Amir Goldstein amir73il@gmail.com Reviewed-by: Matthew Bobrowski repnop@google.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fsnotify.c | 6 +++--- fs/notify/fsnotify.h | 15 +++++++++++++++ fs/notify/mark.c | 24 +++++++++++++++++++++--- include/linux/fs.h | 7 +++++-- 4 files changed, 44 insertions(+), 8 deletions(-)
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index 30d422b8c0fc7..963e6ce75b961 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -87,15 +87,15 @@ static void fsnotify_unmount_inodes(struct super_block *sb)
if (iput_inode) iput(iput_inode); - /* Wait for outstanding inode references from connectors */ - wait_var_event(&sb->s_fsnotify_inode_refs, - !atomic_long_read(&sb->s_fsnotify_inode_refs)); }
void fsnotify_sb_delete(struct super_block *sb) { fsnotify_unmount_inodes(sb); fsnotify_clear_marks_by_sb(sb); + /* Wait for outstanding object references from connectors */ + wait_var_event(&sb->s_fsnotify_connectors, + !atomic_long_read(&sb->s_fsnotify_connectors)); }
/* diff --git a/fs/notify/fsnotify.h b/fs/notify/fsnotify.h index ff2063ec6b0f3..87d8a50ee8038 100644 --- a/fs/notify/fsnotify.h +++ b/fs/notify/fsnotify.h @@ -27,6 +27,21 @@ static inline struct super_block *fsnotify_conn_sb( return container_of(conn->obj, struct super_block, s_fsnotify_marks); }
+static inline struct super_block *fsnotify_connector_sb( + struct fsnotify_mark_connector *conn) +{ + switch (conn->type) { + case FSNOTIFY_OBJ_TYPE_INODE: + return fsnotify_conn_inode(conn)->i_sb; + case FSNOTIFY_OBJ_TYPE_VFSMOUNT: + return fsnotify_conn_mount(conn)->mnt.mnt_sb; + case FSNOTIFY_OBJ_TYPE_SB: + return fsnotify_conn_sb(conn); + default: + return NULL; + } +} + /* destroy all events sitting in this groups notification queue */ extern void fsnotify_flush_notify(struct fsnotify_group *group);
diff --git a/fs/notify/mark.c b/fs/notify/mark.c index f6d1ad3ecca2a..796946eb0c2e2 100644 --- a/fs/notify/mark.c +++ b/fs/notify/mark.c @@ -172,7 +172,7 @@ static void fsnotify_connector_destroy_workfn(struct work_struct *work) static void fsnotify_get_inode_ref(struct inode *inode) { ihold(inode); - atomic_long_inc(&inode->i_sb->s_fsnotify_inode_refs); + atomic_long_inc(&inode->i_sb->s_fsnotify_connectors); }
static void fsnotify_put_inode_ref(struct inode *inode) @@ -180,8 +180,24 @@ static void fsnotify_put_inode_ref(struct inode *inode) struct super_block *sb = inode->i_sb;
iput(inode); - if (atomic_long_dec_and_test(&sb->s_fsnotify_inode_refs)) - wake_up_var(&sb->s_fsnotify_inode_refs); + if (atomic_long_dec_and_test(&sb->s_fsnotify_connectors)) + wake_up_var(&sb->s_fsnotify_connectors); +} + +static void fsnotify_get_sb_connectors(struct fsnotify_mark_connector *conn) +{ + struct super_block *sb = fsnotify_connector_sb(conn); + + if (sb) + atomic_long_inc(&sb->s_fsnotify_connectors); +} + +static void fsnotify_put_sb_connectors(struct fsnotify_mark_connector *conn) +{ + struct super_block *sb = fsnotify_connector_sb(conn); + + if (sb && atomic_long_dec_and_test(&sb->s_fsnotify_connectors)) + wake_up_var(&sb->s_fsnotify_connectors); }
static void *fsnotify_detach_connector_from_object( @@ -203,6 +219,7 @@ static void *fsnotify_detach_connector_from_object( fsnotify_conn_sb(conn)->s_fsnotify_mask = 0; }
+ fsnotify_put_sb_connectors(conn); rcu_assign_pointer(*(conn->obj), NULL); conn->obj = NULL; conn->type = FSNOTIFY_OBJ_TYPE_DETACHED; @@ -504,6 +521,7 @@ static int fsnotify_attach_connector_to_object(fsnotify_connp_t *connp, inode = fsnotify_conn_inode(conn); fsnotify_get_inode_ref(inode); } + fsnotify_get_sb_connectors(conn);
/* * cmpxchg() provides the barrier so that readers of *connp can see diff --git a/include/linux/fs.h b/include/linux/fs.h index cc3b6ddf58223..a9ac60d3be1d6 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1512,8 +1512,11 @@ struct super_block { /* Number of inodes with nlink == 0 but still referenced */ atomic_long_t s_remove_count;
- /* Pending fsnotify inode refs */ - atomic_long_t s_fsnotify_inode_refs; + /* + * Number of inode/mount/sb objects that are being watched, note that + * inodes objects are currently double-accounted. + */ + atomic_long_t s_fsnotify_connectors;
/* Being remounted read-only */ int s_readonly_remount;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit e43de7f0862b8598cd1ef440e3b4701cd107ea40 ]
Add a simple check in the inline helpers to avoid calling fsnotify() and __fsnotify_parent() in case there are no marks of any type (inode/sb/mount) for an inode's sb, so there can be no objects of any type interested in the event.
Link: https://lore.kernel.org/r/20210810151220.285179-5-amir73il@gmail.com Reviewed-by: Matthew Bobrowski repnop@google.com Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/fsnotify.h | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h index 79add91eaa04e..a9477c14fad5c 100644 --- a/include/linux/fsnotify.h +++ b/include/linux/fsnotify.h @@ -30,6 +30,9 @@ static inline void fsnotify_name(struct inode *dir, __u32 mask, struct inode *child, const struct qstr *name, u32 cookie) { + if (atomic_long_read(&dir->i_sb->s_fsnotify_connectors) == 0) + return; + fsnotify(mask, child, FSNOTIFY_EVENT_INODE, dir, name, NULL, cookie); }
@@ -41,6 +44,9 @@ static inline void fsnotify_dirent(struct inode *dir, struct dentry *dentry,
static inline void fsnotify_inode(struct inode *inode, __u32 mask) { + if (atomic_long_read(&inode->i_sb->s_fsnotify_connectors) == 0) + return; + if (S_ISDIR(inode->i_mode)) mask |= FS_ISDIR;
@@ -53,6 +59,9 @@ static inline int fsnotify_parent(struct dentry *dentry, __u32 mask, { struct inode *inode = d_inode(dentry);
+ if (atomic_long_read(&inode->i_sb->s_fsnotify_connectors) == 0) + return 0; + if (S_ISDIR(inode->i_mode)) { mask |= FS_ISDIR;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit c7e0b781b73c2e26e442ed71397cc2bc5945a732 ]
A few useful observations:
- The value in @size is never modified.
- splice_desc.len is an unsigned int, and so is xdr_buf.page_len. An implicit cast to size_t is unnecessary.
- The computation of .page_len is the same in all three arms of the "if" statement, so hoist it out to make it clear that the operation is an unconditional invariant.
The resulting function is 18 bytes shorter on my system (-Os).
Signed-off-by: Chuck Lever chuck.lever@oracle.com Reviewed-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/vfs.c | 11 +++-------- 1 file changed, 3 insertions(+), 8 deletions(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 74b2c6c5ad0b9..8520a2fc92dee 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -854,26 +854,21 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf, struct svc_rqst *rqstp = sd->u.data; struct page **pp = rqstp->rq_next_page; struct page *page = buf->page; - size_t size; - - size = sd->len;
if (rqstp->rq_res.page_len == 0) { get_page(page); put_page(*rqstp->rq_next_page); *(rqstp->rq_next_page++) = page; rqstp->rq_res.page_base = buf->offset; - rqstp->rq_res.page_len = size; } else if (page != pp[-1]) { get_page(page); if (*rqstp->rq_next_page) put_page(*rqstp->rq_next_page); *(rqstp->rq_next_page++) = page; - rqstp->rq_res.page_len += size; - } else - rqstp->rq_res.page_len += size; + } + rqstp->rq_res.page_len += sd->len;
- return size; + return sd->len; }
static int nfsd_direct_splice_actor(struct pipe_inode_info *pipe,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 2f0f88f42f2eab0421ed37d7494de9124fdf0d34 ]
Replacing a page in rq_pages[] requires a get_page(), which is a bus-locked operation, and a put_page(), which can be even more costly.
To reduce the cost of replacing a page in rq_pages[], batch the put_page() operations by collecting "freed" pages in a pagevec, and then release those pages when the pagevec is full. This pagevec is also emptied when each RPC completes.
[ cel: adjusted to apply without f6e70aab9dfe ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/sunrpc/svc.h | 4 ++++ net/sunrpc/svc.c | 21 +++++++++++++++++++++ net/sunrpc/svc_xprt.c | 3 +++ 3 files changed, 28 insertions(+)
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index e91d51ea028bb..ab9afbf0a0d8b 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -19,6 +19,7 @@ #include <linux/sunrpc/svcauth.h> #include <linux/wait.h> #include <linux/mm.h> +#include <linux/pagevec.h>
/* statistics for svc_pool structures */ struct svc_pool_stats { @@ -256,6 +257,7 @@ struct svc_rqst { struct page * *rq_next_page; /* next reply page to use */ struct page * *rq_page_end; /* one past the last page */
+ struct pagevec rq_pvec; struct kvec rq_vec[RPCSVC_MAXPAGES]; /* generally useful.. */ struct bio_vec rq_bvec[RPCSVC_MAXPAGES];
@@ -502,6 +504,8 @@ struct svc_rqst *svc_rqst_alloc(struct svc_serv *serv, struct svc_pool *pool, int node); struct svc_rqst *svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node); +void svc_rqst_replace_page(struct svc_rqst *rqstp, + struct page *page); void svc_rqst_free(struct svc_rqst *); void svc_exit_thread(struct svc_rqst *); unsigned int svc_pool_map_get(void); diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 73e02ba4ed53a..27f98756d1751 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -842,6 +842,27 @@ svc_set_num_threads_sync(struct svc_serv *serv, struct svc_pool *pool, int nrser } EXPORT_SYMBOL_GPL(svc_set_num_threads_sync);
+/** + * svc_rqst_replace_page - Replace one page in rq_pages[] + * @rqstp: svc_rqst with pages to replace + * @page: replacement page + * + * When replacing a page in rq_pages, batch the release of the + * replaced pages to avoid hammering the page allocator. + */ +void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page) +{ + if (*rqstp->rq_next_page) { + if (!pagevec_space(&rqstp->rq_pvec)) + __pagevec_release(&rqstp->rq_pvec); + pagevec_add(&rqstp->rq_pvec, *rqstp->rq_next_page); + } + + get_page(page); + *(rqstp->rq_next_page++) = page; +} +EXPORT_SYMBOL_GPL(svc_rqst_replace_page); + /* * Called from a server thread as it's exiting. Caller must hold the "service * mutex" for the service. diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c index d150e25b4d45e..570b092165d73 100644 --- a/net/sunrpc/svc_xprt.c +++ b/net/sunrpc/svc_xprt.c @@ -525,6 +525,7 @@ static void svc_xprt_release(struct svc_rqst *rqstp) kfree(rqstp->rq_deferred); rqstp->rq_deferred = NULL;
+ pagevec_release(&rqstp->rq_pvec); svc_free_res_pages(rqstp); rqstp->rq_res.page_len = 0; rqstp->rq_res.page_base = 0; @@ -651,6 +652,8 @@ static int svc_alloc_arg(struct svc_rqst *rqstp) int pages; int i;
+ pagevec_init(&rqstp->rq_pvec); + /* now allocate needed pages. If we get a failure, sleep briefly */ pages = (serv->sv_max_mesg + 2 * PAGE_SIZE) >> PAGE_SHIFT; if (pages > RPCSVC_MAXPAGES) {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 496d83cf0f2fa70cfe256c2499e2d3523d3868f3 ]
Large splice reads call put_page() repeatedly. put_page() is relatively expensive to call, so replace it with the new svc_rqst_replace_page() helper to help amortize that cost.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Reviewed-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/vfs.c | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 8520a2fc92dee..05b5f7e241e70 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -856,15 +856,10 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf, struct page *page = buf->page;
if (rqstp->rq_res.page_len == 0) { - get_page(page); - put_page(*rqstp->rq_next_page); - *(rqstp->rq_next_page++) = page; + svc_rqst_replace_page(rqstp, page); rqstp->rq_res.page_base = buf->offset; } else if (page != pp[-1]) { - get_page(page); - if (*rqstp->rq_next_page) - put_page(*rqstp->rq_next_page); - *(rqstp->rq_next_page++) = page; + svc_rqst_replace_page(rqstp, page); } rqstp->rq_res.page_len += sd->len;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit ea49dc79002c416a9003f3204bc14f846a0dbcae ]
Including one's name in copyright claims is appropriate. Including it in random comments is just vanity. After 2 decades, it is time for these to be gone.
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/vfs.c | 1 - include/uapi/linux/nfsd/nfsfh.h | 1 - 2 files changed, 2 deletions(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 05b5f7e241e70..2c493937dd5ec 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -244,7 +244,6 @@ nfsd_lookup_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp, * returned. Otherwise the covered directory is returned. * NOTE: this mountpoint crossing is not supported properly by all * clients and is explicitly disallowed for NFSv3 - * NeilBrown neilb@cse.unsw.edu.au */ __be32 nfsd_lookup(struct svc_rqst *rqstp, struct svc_fh *fhp, const char *name, diff --git a/include/uapi/linux/nfsd/nfsfh.h b/include/uapi/linux/nfsd/nfsfh.h index 427294dd56a1b..e29e8accc4f4d 100644 --- a/include/uapi/linux/nfsd/nfsfh.h +++ b/include/uapi/linux/nfsd/nfsfh.h @@ -33,7 +33,6 @@ struct nfs_fhbase_old {
/* * This is the new flexible, extensible style NFSv2/v3/v4 file handle. - * by Neil Brown neilb@cse.unsw.edu.au - March 2000 * * The file handle starts with a sequence of four-byte words. * The first word contains a version number (1) and three descriptor bytes
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jia He hejianet@gmail.com
[ Upstream commit a2071573d6346819cc4e5787b4206f2184985160 ]
This is to let bool variable could be correctly displayed in big/little endian sysctl procfs. sizeof(bool) is arch dependent, proc_dobool should work in all arches.
Suggested-by: Pan Xinhui xinhui@linux.vnet.ibm.com Signed-off-by: Jia He hejianet@gmail.com [thuth: rebased the patch to the current kernel version] Signed-off-by: Thomas Huth thuth@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/sysctl.h | 2 ++ kernel/sysctl.c | 42 ++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 44 insertions(+)
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index c202a72e16906..47cf70c8eb93c 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -55,6 +55,8 @@ typedef int proc_handler(struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *ppos);
int proc_dostring(struct ctl_table *, int, void *, size_t *, loff_t *); +int proc_dobool(struct ctl_table *table, int write, void *buffer, + size_t *lenp, loff_t *ppos); int proc_dointvec(struct ctl_table *, int, void *, size_t *, loff_t *); int proc_douintvec(struct ctl_table *, int, void *, size_t *, loff_t *); int proc_dointvec_minmax(struct ctl_table *, int, void *, size_t *, loff_t *); diff --git a/kernel/sysctl.c b/kernel/sysctl.c index b29a9568ebbe4..abe0f16d53641 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -546,6 +546,21 @@ static void proc_put_char(void **buf, size_t *size, char c) } }
+static int do_proc_dobool_conv(bool *negp, unsigned long *lvalp, + int *valp, + int write, void *data) +{ + if (write) { + *(bool *)valp = *lvalp; + } else { + int val = *(bool *)valp; + + *lvalp = (unsigned long)val; + *negp = false; + } + return 0; +} + static int do_proc_dointvec_conv(bool *negp, unsigned long *lvalp, int *valp, int write, void *data) @@ -808,6 +823,26 @@ static int do_proc_douintvec(struct ctl_table *table, int write, buffer, lenp, ppos, conv, data); }
+/** + * proc_dobool - read/write a bool + * @table: the sysctl table + * @write: %TRUE if this is a write to the sysctl file + * @buffer: the user buffer + * @lenp: the size of the user buffer + * @ppos: file position + * + * Reads/writes up to table->maxlen/sizeof(unsigned int) integer + * values from/to the user buffer, treated as an ASCII string. + * + * Returns 0 on success. + */ +int proc_dobool(struct ctl_table *table, int write, void *buffer, + size_t *lenp, loff_t *ppos) +{ + return do_proc_dointvec(table, write, buffer, lenp, ppos, + do_proc_dobool_conv, NULL); +} + /** * proc_dointvec - read a vector of integers * @table: the sysctl table @@ -1644,6 +1679,12 @@ int proc_dostring(struct ctl_table *table, int write, return -ENOSYS; }
+int proc_dobool(struct ctl_table *table, int write, + void *buffer, size_t *lenp, loff_t *ppos) +{ + return -ENOSYS; +} + int proc_dointvec(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { @@ -3503,6 +3544,7 @@ int __init sysctl_init(void) * No sense putting this after each symbol definition, twice, * exception granted :-) */ +EXPORT_SYMBOL(proc_dobool); EXPORT_SYMBOL(proc_dointvec); EXPORT_SYMBOL(proc_douintvec); EXPORT_SYMBOL(proc_dointvec_jiffies);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jia He hejianet@gmail.com
[ Upstream commit d02a3a2cb25d384005a6e3446a445013342024b7 ]
nsm_use_hostnames is a module parameter and it will be exported to sysctl procfs. This is to let user sometimes change it from userspace. But the minimal unit for sysctl procfs read/write it sizeof(int). In big endian system, the converting from/to bool to/from int will cause error for proc items.
This patch use a new proc_handler proc_dobool to fix it.
Signed-off-by: Jia He hejianet@gmail.com Reviewed-by: Pan Xinhui xinhui.pan@linux.vnet.ibm.com [thuth: Fix typo in commit message] Signed-off-by: Thomas Huth thuth@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index 2de048f80eb8c..0ab9756ed2359 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -584,7 +584,7 @@ static struct ctl_table nlm_sysctls[] = { .data = &nsm_use_hostnames, .maxlen = sizeof(int), .mode = 0644, - .proc_handler = proc_dointvec, + .proc_handler = proc_dobool, }, { .procname = "nsm_local_state",
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit 2dc6f19e4f438d4c14987cb17aee38aaf7304e7f ]
It'll come in handy to get the whole nlm_lock.
Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svc4proc.c | 3 ++- fs/lockd/svcproc.c | 2 +- fs/lockd/svcsubs.c | 15 ++++++++------- include/linux/lockd/lockd.h | 2 +- 4 files changed, 12 insertions(+), 10 deletions(-)
diff --git a/fs/lockd/svc4proc.c b/fs/lockd/svc4proc.c index 4c10fb5138f10..bc496bbd696b8 100644 --- a/fs/lockd/svc4proc.c +++ b/fs/lockd/svc4proc.c @@ -40,7 +40,8 @@ nlm4svc_retrieve_args(struct svc_rqst *rqstp, struct nlm_args *argp,
/* Obtain file pointer. Not used by FREE_ALL call. */ if (filp != NULL) { - if ((error = nlm_lookup_file(rqstp, &file, &lock->fh)) != 0) + error = nlm_lookup_file(rqstp, &file, lock); + if (error) goto no_locks; *filp = file;
diff --git a/fs/lockd/svcproc.c b/fs/lockd/svcproc.c index 4ae4b63b53925..f4e5e0eb30fd1 100644 --- a/fs/lockd/svcproc.c +++ b/fs/lockd/svcproc.c @@ -69,7 +69,7 @@ nlmsvc_retrieve_args(struct svc_rqst *rqstp, struct nlm_args *argp,
/* Obtain file pointer. Not used by FREE_ALL call. */ if (filp != NULL) { - error = cast_status(nlm_lookup_file(rqstp, &file, &lock->fh)); + error = cast_status(nlm_lookup_file(rqstp, &file, lock)); if (error != 0) goto no_locks; *filp = file; diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c index 028fc152da22f..2d62633b39e51 100644 --- a/fs/lockd/svcsubs.c +++ b/fs/lockd/svcsubs.c @@ -82,31 +82,31 @@ static inline unsigned int file_hash(struct nfs_fh *f) */ __be32 nlm_lookup_file(struct svc_rqst *rqstp, struct nlm_file **result, - struct nfs_fh *f) + struct nlm_lock *lock) { struct nlm_file *file; unsigned int hash; __be32 nfserr;
- nlm_debug_print_fh("nlm_lookup_file", f); + nlm_debug_print_fh("nlm_lookup_file", &lock->fh);
- hash = file_hash(f); + hash = file_hash(&lock->fh);
/* Lock file table */ mutex_lock(&nlm_file_mutex);
hlist_for_each_entry(file, &nlm_files[hash], f_list) - if (!nfs_compare_fh(&file->f_handle, f)) + if (!nfs_compare_fh(&file->f_handle, &lock->fh)) goto found;
- nlm_debug_print_fh("creating file for", f); + nlm_debug_print_fh("creating file for", &lock->fh);
nfserr = nlm_lck_denied_nolocks; file = kzalloc(sizeof(*file), GFP_KERNEL); if (!file) goto out_unlock;
- memcpy(&file->f_handle, f, sizeof(struct nfs_fh)); + memcpy(&file->f_handle, &lock->fh, sizeof(struct nfs_fh)); mutex_init(&file->f_mutex); INIT_HLIST_NODE(&file->f_list); INIT_LIST_HEAD(&file->f_blocks); @@ -117,7 +117,8 @@ nlm_lookup_file(struct svc_rqst *rqstp, struct nlm_file **result, * We have to make sure we have the right credential to open * the file. */ - if ((nfserr = nlmsvc_ops->fopen(rqstp, f, &file->f_file)) != 0) { + nfserr = nlmsvc_ops->fopen(rqstp, &lock->fh, &file->f_file); + if (nfserr) { dprintk("lockd: open failed (error %d)\n", nfserr); goto out_free; } diff --git a/include/linux/lockd/lockd.h b/include/linux/lockd/lockd.h index 666f5f310a041..81b71ad2040ac 100644 --- a/include/linux/lockd/lockd.h +++ b/include/linux/lockd/lockd.h @@ -286,7 +286,7 @@ void nlmsvc_locks_init_private(struct file_lock *, struct nlm_host *, pid_t); * File handling for the server personality */ __be32 nlm_lookup_file(struct svc_rqst *, struct nlm_file **, - struct nfs_fh *); + struct nlm_lock *); void nlm_release_file(struct nlm_file *); void nlmsvc_release_lockowner(struct nlm_lock *); void nlmsvc_mark_resources(struct net *);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit a81041b7d8f08c4e1014173c5483a0f18724a576 ]
Make this lookup slightly more concise, and prepare for changing how we look this up in a following patch.
Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svclock.c | 16 ++++++++-------- fs/lockd/svcsubs.c | 4 ++-- 2 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c index 273a81971ed57..bcd180ba99576 100644 --- a/fs/lockd/svclock.c +++ b/fs/lockd/svclock.c @@ -474,8 +474,8 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file, __be32 ret;
dprintk("lockd: nlmsvc_lock(%s/%ld, ty=%d, pi=%d, %Ld-%Ld, bl=%d)\n", - locks_inode(file->f_file)->i_sb->s_id, - locks_inode(file->f_file)->i_ino, + nlmsvc_file_inode(file)->i_sb->s_id, + nlmsvc_file_inode(file)->i_ino, lock->fl.fl_type, lock->fl.fl_pid, (long long)lock->fl.fl_start, (long long)lock->fl.fl_end, @@ -581,8 +581,8 @@ nlmsvc_testlock(struct svc_rqst *rqstp, struct nlm_file *file, struct nlm_lockowner *test_owner;
dprintk("lockd: nlmsvc_testlock(%s/%ld, ty=%d, %Ld-%Ld)\n", - locks_inode(file->f_file)->i_sb->s_id, - locks_inode(file->f_file)->i_ino, + nlmsvc_file_inode(file)->i_sb->s_id, + nlmsvc_file_inode(file)->i_ino, lock->fl.fl_type, (long long)lock->fl.fl_start, (long long)lock->fl.fl_end); @@ -644,8 +644,8 @@ nlmsvc_unlock(struct net *net, struct nlm_file *file, struct nlm_lock *lock) int error;
dprintk("lockd: nlmsvc_unlock(%s/%ld, pi=%d, %Ld-%Ld)\n", - locks_inode(file->f_file)->i_sb->s_id, - locks_inode(file->f_file)->i_ino, + nlmsvc_file_inode(file)->i_sb->s_id, + nlmsvc_file_inode(file)->i_ino, lock->fl.fl_pid, (long long)lock->fl.fl_start, (long long)lock->fl.fl_end); @@ -673,8 +673,8 @@ nlmsvc_cancel_blocked(struct net *net, struct nlm_file *file, struct nlm_lock *l int status = 0;
dprintk("lockd: nlmsvc_cancel(%s/%ld, pi=%d, %Ld-%Ld)\n", - locks_inode(file->f_file)->i_sb->s_id, - locks_inode(file->f_file)->i_ino, + nlmsvc_file_inode(file)->i_sb->s_id, + nlmsvc_file_inode(file)->i_ino, lock->fl.fl_pid, (long long)lock->fl.fl_start, (long long)lock->fl.fl_end); diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c index 2d62633b39e51..e02951f8a28f5 100644 --- a/fs/lockd/svcsubs.c +++ b/fs/lockd/svcsubs.c @@ -45,7 +45,7 @@ static inline void nlm_debug_print_fh(char *msg, struct nfs_fh *f)
static inline void nlm_debug_print_file(char *msg, struct nlm_file *file) { - struct inode *inode = locks_inode(file->f_file); + struct inode *inode = nlmsvc_file_inode(file);
dprintk("lockd: %s %s/%ld\n", msg, inode->i_sb->s_id, inode->i_ino); @@ -416,7 +416,7 @@ nlmsvc_match_sb(void *datap, struct nlm_file *file) { struct super_block *sb = datap;
- return sb == locks_inode(file->f_file)->i_sb; + return sb == nlmsvc_file_inode(file)->i_sb; }
/**
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit b661601a9fdf1af8516e1100de8bba84bd41cca4 ]
Update comment to reflect that we *do* allow reexport, whether it's a good idea or not....
Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svcsubs.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c index e02951f8a28f5..13e6ffc219ec9 100644 --- a/fs/lockd/svcsubs.c +++ b/fs/lockd/svcsubs.c @@ -111,8 +111,9 @@ nlm_lookup_file(struct svc_rqst *rqstp, struct nlm_file **result, INIT_HLIST_NODE(&file->f_list); INIT_LIST_HEAD(&file->f_blocks);
- /* Open the file. Note that this must not sleep for too long, else - * we would lock up lockd:-) So no NFS re-exports, folks. + /* + * Open the file. Note that if we're reexporting, for example, + * this could block the lockd thread for a while. * * We have to make sure we have the right credential to open * the file.
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit 7f024fcd5c97dc70bb9121c80407cf3cf9be7159 ]
We shouldn't really be using a read-only file descriptor to take a write lock.
Most filesystems will put up with it. But NFS, for example, won't.
Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svc4proc.c | 4 +- fs/lockd/svclock.c | 25 ++++++--- fs/lockd/svcproc.c | 4 +- fs/lockd/svcsubs.c | 102 +++++++++++++++++++++++++----------- fs/nfsd/lockd.c | 8 ++- include/linux/lockd/bind.h | 3 +- include/linux/lockd/lockd.h | 9 +++- 7 files changed, 111 insertions(+), 44 deletions(-)
diff --git a/fs/lockd/svc4proc.c b/fs/lockd/svc4proc.c index bc496bbd696b8..e10ae2c41279e 100644 --- a/fs/lockd/svc4proc.c +++ b/fs/lockd/svc4proc.c @@ -40,13 +40,15 @@ nlm4svc_retrieve_args(struct svc_rqst *rqstp, struct nlm_args *argp,
/* Obtain file pointer. Not used by FREE_ALL call. */ if (filp != NULL) { + int mode = lock_to_openmode(&lock->fl); + error = nlm_lookup_file(rqstp, &file, lock); if (error) goto no_locks; *filp = file;
/* Set up the missing parts of the file_lock structure */ - lock->fl.fl_file = file->f_file; + lock->fl.fl_file = file->f_file[mode]; lock->fl.fl_pid = current->tgid; lock->fl.fl_lmops = &nlmsvc_lock_operations; nlmsvc_locks_init_private(&lock->fl, host, (pid_t)lock->svid); diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c index bcd180ba99576..a7b4c51667ada 100644 --- a/fs/lockd/svclock.c +++ b/fs/lockd/svclock.c @@ -471,6 +471,7 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file, { struct nlm_block *block = NULL; int error; + int mode; __be32 ret;
dprintk("lockd: nlmsvc_lock(%s/%ld, ty=%d, pi=%d, %Ld-%Ld, bl=%d)\n", @@ -524,7 +525,8 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file,
if (!wait) lock->fl.fl_flags &= ~FL_SLEEP; - error = vfs_lock_file(file->f_file, F_SETLK, &lock->fl, NULL); + mode = lock_to_openmode(&lock->fl); + error = vfs_lock_file(file->f_file[mode], F_SETLK, &lock->fl, NULL); lock->fl.fl_flags &= ~FL_SLEEP;
dprintk("lockd: vfs_lock_file returned %d\n", error); @@ -577,6 +579,7 @@ nlmsvc_testlock(struct svc_rqst *rqstp, struct nlm_file *file, struct nlm_lock *conflock, struct nlm_cookie *cookie) { int error; + int mode; __be32 ret; struct nlm_lockowner *test_owner;
@@ -595,7 +598,8 @@ nlmsvc_testlock(struct svc_rqst *rqstp, struct nlm_file *file, /* If there's a conflicting lock, remember to clean up the test lock */ test_owner = (struct nlm_lockowner *)lock->fl.fl_owner;
- error = vfs_test_lock(file->f_file, &lock->fl); + mode = lock_to_openmode(&lock->fl); + error = vfs_test_lock(file->f_file[mode], &lock->fl); if (error) { /* We can't currently deal with deferred test requests */ if (error == FILE_LOCK_DEFERRED) @@ -641,7 +645,7 @@ nlmsvc_testlock(struct svc_rqst *rqstp, struct nlm_file *file, __be32 nlmsvc_unlock(struct net *net, struct nlm_file *file, struct nlm_lock *lock) { - int error; + int error = 0;
dprintk("lockd: nlmsvc_unlock(%s/%ld, pi=%d, %Ld-%Ld)\n", nlmsvc_file_inode(file)->i_sb->s_id, @@ -654,7 +658,12 @@ nlmsvc_unlock(struct net *net, struct nlm_file *file, struct nlm_lock *lock) nlmsvc_cancel_blocked(net, file, lock);
lock->fl.fl_type = F_UNLCK; - error = vfs_lock_file(file->f_file, F_SETLK, &lock->fl, NULL); + if (file->f_file[O_RDONLY]) + error = vfs_lock_file(file->f_file[O_RDONLY], F_SETLK, + &lock->fl, NULL); + if (file->f_file[O_WRONLY]) + error = vfs_lock_file(file->f_file[O_WRONLY], F_SETLK, + &lock->fl, NULL);
return (error < 0)? nlm_lck_denied_nolocks : nlm_granted; } @@ -671,6 +680,7 @@ nlmsvc_cancel_blocked(struct net *net, struct nlm_file *file, struct nlm_lock *l { struct nlm_block *block; int status = 0; + int mode;
dprintk("lockd: nlmsvc_cancel(%s/%ld, pi=%d, %Ld-%Ld)\n", nlmsvc_file_inode(file)->i_sb->s_id, @@ -686,7 +696,8 @@ nlmsvc_cancel_blocked(struct net *net, struct nlm_file *file, struct nlm_lock *l block = nlmsvc_lookup_block(file, lock); mutex_unlock(&file->f_mutex); if (block != NULL) { - vfs_cancel_lock(block->b_file->f_file, + mode = lock_to_openmode(&lock->fl); + vfs_cancel_lock(block->b_file->f_file[mode], &block->b_call->a_args.lock.fl); status = nlmsvc_unlink_block(block); nlmsvc_release_block(block); @@ -803,6 +814,7 @@ nlmsvc_grant_blocked(struct nlm_block *block) { struct nlm_file *file = block->b_file; struct nlm_lock *lock = &block->b_call->a_args.lock; + int mode; int error; loff_t fl_start, fl_end;
@@ -828,7 +840,8 @@ nlmsvc_grant_blocked(struct nlm_block *block) lock->fl.fl_flags |= FL_SLEEP; fl_start = lock->fl.fl_start; fl_end = lock->fl.fl_end; - error = vfs_lock_file(file->f_file, F_SETLK, &lock->fl, NULL); + mode = lock_to_openmode(&lock->fl); + error = vfs_lock_file(file->f_file[mode], F_SETLK, &lock->fl, NULL); lock->fl.fl_flags &= ~FL_SLEEP; lock->fl.fl_start = fl_start; lock->fl.fl_end = fl_end; diff --git a/fs/lockd/svcproc.c b/fs/lockd/svcproc.c index f4e5e0eb30fd1..99696d3f6dd66 100644 --- a/fs/lockd/svcproc.c +++ b/fs/lockd/svcproc.c @@ -55,6 +55,7 @@ nlmsvc_retrieve_args(struct svc_rqst *rqstp, struct nlm_args *argp, struct nlm_host *host = NULL; struct nlm_file *file = NULL; struct nlm_lock *lock = &argp->lock; + int mode; __be32 error = 0;
/* nfsd callbacks must have been installed for this procedure */ @@ -75,7 +76,8 @@ nlmsvc_retrieve_args(struct svc_rqst *rqstp, struct nlm_args *argp, *filp = file;
/* Set up the missing parts of the file_lock structure */ - lock->fl.fl_file = file->f_file; + mode = lock_to_openmode(&lock->fl); + lock->fl.fl_file = file->f_file[mode]; lock->fl.fl_pid = current->tgid; lock->fl.fl_lmops = &nlmsvc_lock_operations; nlmsvc_locks_init_private(&lock->fl, host, (pid_t)lock->svid); diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c index 13e6ffc219ec9..cb3a7512c33ec 100644 --- a/fs/lockd/svcsubs.c +++ b/fs/lockd/svcsubs.c @@ -71,14 +71,35 @@ static inline unsigned int file_hash(struct nfs_fh *f) return tmp & (FILE_NRHASH - 1); }
+int lock_to_openmode(struct file_lock *lock) +{ + return (lock->fl_type == F_WRLCK) ? O_WRONLY : O_RDONLY; +} + +/* + * Open the file. Note that if we're reexporting, for example, + * this could block the lockd thread for a while. + * + * We have to make sure we have the right credential to open + * the file. + */ +static __be32 nlm_do_fopen(struct svc_rqst *rqstp, + struct nlm_file *file, int mode) +{ + struct file **fp = &file->f_file[mode]; + __be32 nfserr; + + if (*fp) + return 0; + nfserr = nlmsvc_ops->fopen(rqstp, &file->f_handle, fp, mode); + if (nfserr) + dprintk("lockd: open failed (error %d)\n", nfserr); + return nfserr; +} + /* * Lookup file info. If it doesn't exist, create a file info struct * and open a (VFS) file for the given inode. - * - * FIXME: - * Note that we open the file O_RDONLY even when creating write locks. - * This is not quite right, but for now, we assume the client performs - * the proper R/W checking. */ __be32 nlm_lookup_file(struct svc_rqst *rqstp, struct nlm_file **result, @@ -87,42 +108,38 @@ nlm_lookup_file(struct svc_rqst *rqstp, struct nlm_file **result, struct nlm_file *file; unsigned int hash; __be32 nfserr; + int mode;
nlm_debug_print_fh("nlm_lookup_file", &lock->fh);
hash = file_hash(&lock->fh); + mode = lock_to_openmode(&lock->fl);
/* Lock file table */ mutex_lock(&nlm_file_mutex);
hlist_for_each_entry(file, &nlm_files[hash], f_list) - if (!nfs_compare_fh(&file->f_handle, &lock->fh)) + if (!nfs_compare_fh(&file->f_handle, &lock->fh)) { + mutex_lock(&file->f_mutex); + nfserr = nlm_do_fopen(rqstp, file, mode); + mutex_unlock(&file->f_mutex); goto found; - + } nlm_debug_print_fh("creating file for", &lock->fh);
nfserr = nlm_lck_denied_nolocks; file = kzalloc(sizeof(*file), GFP_KERNEL); if (!file) - goto out_unlock; + goto out_free;
memcpy(&file->f_handle, &lock->fh, sizeof(struct nfs_fh)); mutex_init(&file->f_mutex); INIT_HLIST_NODE(&file->f_list); INIT_LIST_HEAD(&file->f_blocks);
- /* - * Open the file. Note that if we're reexporting, for example, - * this could block the lockd thread for a while. - * - * We have to make sure we have the right credential to open - * the file. - */ - nfserr = nlmsvc_ops->fopen(rqstp, &lock->fh, &file->f_file); - if (nfserr) { - dprintk("lockd: open failed (error %d)\n", nfserr); - goto out_free; - } + nfserr = nlm_do_fopen(rqstp, file, mode); + if (nfserr) + goto out_unlock;
hlist_add_head(&file->f_list, &nlm_files[hash]);
@@ -130,7 +147,6 @@ nlm_lookup_file(struct svc_rqst *rqstp, struct nlm_file **result, dprintk("lockd: found file %p (count %d)\n", file, file->f_count); *result = file; file->f_count++; - nfserr = 0;
out_unlock: mutex_unlock(&nlm_file_mutex); @@ -150,13 +166,34 @@ nlm_delete_file(struct nlm_file *file) nlm_debug_print_file("closing file", file); if (!hlist_unhashed(&file->f_list)) { hlist_del(&file->f_list); - nlmsvc_ops->fclose(file->f_file); + if (file->f_file[O_RDONLY]) + nlmsvc_ops->fclose(file->f_file[O_RDONLY]); + if (file->f_file[O_WRONLY]) + nlmsvc_ops->fclose(file->f_file[O_WRONLY]); kfree(file); } else { printk(KERN_WARNING "lockd: attempt to release unknown file!\n"); } }
+static int nlm_unlock_files(struct nlm_file *file) +{ + struct file_lock lock; + struct file *f; + + lock.fl_type = F_UNLCK; + lock.fl_start = 0; + lock.fl_end = OFFSET_MAX; + for (f = file->f_file[0]; f <= file->f_file[1]; f++) { + if (f && vfs_lock_file(f, F_SETLK, &lock, NULL) < 0) { + pr_warn("lockd: unlock failure in %s:%d\n", + __FILE__, __LINE__); + return 1; + } + } + return 0; +} + /* * Loop over all locks on the given file and perform the specified * action. @@ -184,17 +221,10 @@ nlm_traverse_locks(struct nlm_host *host, struct nlm_file *file,
lockhost = ((struct nlm_lockowner *)fl->fl_owner)->host; if (match(lockhost, host)) { - struct file_lock lock = *fl;
spin_unlock(&flctx->flc_lock); - lock.fl_type = F_UNLCK; - lock.fl_start = 0; - lock.fl_end = OFFSET_MAX; - if (vfs_lock_file(file->f_file, F_SETLK, &lock, NULL) < 0) { - printk("lockd: unlock failure in %s:%d\n", - __FILE__, __LINE__); + if (nlm_unlock_files(file)) return 1; - } goto again; } } @@ -248,6 +278,15 @@ nlm_file_inuse(struct nlm_file *file) return 0; }
+static void nlm_close_files(struct nlm_file *file) +{ + struct file *f; + + for (f = file->f_file[0]; f <= file->f_file[1]; f++) + if (f) + nlmsvc_ops->fclose(f); +} + /* * Loop over all files in the file table. */ @@ -278,7 +317,7 @@ nlm_traverse_files(void *data, nlm_host_match_fn_t match, if (list_empty(&file->f_blocks) && !file->f_locks && !file->f_shares && !file->f_count) { hlist_del(&file->f_list); - nlmsvc_ops->fclose(file->f_file); + nlm_close_files(file); kfree(file); } } @@ -412,6 +451,7 @@ nlmsvc_invalidate_all(void) nlm_traverse_files(NULL, nlmsvc_is_client, NULL); }
+ static int nlmsvc_match_sb(void *datap, struct nlm_file *file) { diff --git a/fs/nfsd/lockd.c b/fs/nfsd/lockd.c index 3f5b3d7b62b71..606fa155c28ad 100644 --- a/fs/nfsd/lockd.c +++ b/fs/nfsd/lockd.c @@ -25,9 +25,11 @@ * Note: we hold the dentry use count while the file is open. */ static __be32 -nlm_fopen(struct svc_rqst *rqstp, struct nfs_fh *f, struct file **filp) +nlm_fopen(struct svc_rqst *rqstp, struct nfs_fh *f, struct file **filp, + int mode) { __be32 nfserr; + int access; struct svc_fh fh;
/* must initialize before using! but maxsize doesn't matter */ @@ -36,7 +38,9 @@ nlm_fopen(struct svc_rqst *rqstp, struct nfs_fh *f, struct file **filp) memcpy((char*)&fh.fh_handle.fh_base, f->data, f->size); fh.fh_export = NULL;
- nfserr = nfsd_open(rqstp, &fh, S_IFREG, NFSD_MAY_LOCK, filp); + access = (mode == O_WRONLY) ? NFSD_MAY_WRITE : NFSD_MAY_READ; + access |= NFSD_MAY_LOCK; + nfserr = nfsd_open(rqstp, &fh, S_IFREG, access, filp); fh_put(&fh); /* We return nlm error codes as nlm doesn't know * about nfsd, but nfsd does know about nlm.. diff --git a/include/linux/lockd/bind.h b/include/linux/lockd/bind.h index 0520c0cd73f42..3bc9f7410e213 100644 --- a/include/linux/lockd/bind.h +++ b/include/linux/lockd/bind.h @@ -27,7 +27,8 @@ struct rpc_task; struct nlmsvc_binding { __be32 (*fopen)(struct svc_rqst *, struct nfs_fh *, - struct file **); + struct file **, + int mode); void (*fclose)(struct file *); };
diff --git a/include/linux/lockd/lockd.h b/include/linux/lockd/lockd.h index 81b71ad2040ac..c4ae6506b8b36 100644 --- a/include/linux/lockd/lockd.h +++ b/include/linux/lockd/lockd.h @@ -10,6 +10,8 @@ #ifndef LINUX_LOCKD_LOCKD_H #define LINUX_LOCKD_LOCKD_H
+/* XXX: a lot of this should really be under fs/lockd. */ + #include <linux/in.h> #include <linux/in6.h> #include <net/ipv6.h> @@ -154,7 +156,8 @@ struct nlm_rqst { struct nlm_file { struct hlist_node f_list; /* linked list */ struct nfs_fh f_handle; /* NFS file handle */ - struct file * f_file; /* VFS file pointer */ + struct file * f_file[2]; /* VFS file pointers, + indexed by O_ flags */ struct nlm_share * f_shares; /* DOS shares */ struct list_head f_blocks; /* blocked locks */ unsigned int f_locks; /* guesstimate # of locks */ @@ -267,6 +270,7 @@ typedef int (*nlm_host_match_fn_t)(void *cur, struct nlm_host *ref); /* * Server-side lock handling */ +int lock_to_openmode(struct file_lock *); __be32 nlmsvc_lock(struct svc_rqst *, struct nlm_file *, struct nlm_host *, struct nlm_lock *, int, struct nlm_cookie *, int); @@ -301,7 +305,8 @@ int nlmsvc_unlock_all_by_ip(struct sockaddr *server_addr);
static inline struct inode *nlmsvc_file_inode(struct nlm_file *file) { - return locks_inode(file->f_file); + return locks_inode(file->f_file[O_RDONLY] ? + file->f_file[O_RDONLY] : file->f_file[O_WRONLY]); }
static inline int __nlm_privileged_request4(const struct sockaddr *sap)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit f657f8eef3ff870552c9fd2839e0061046f44618 ]
NFS implements blocking locks by blocking inside its lock method. In the reexport case, this blocks the nfs server thread, which could lead to deadlocks since an nfs server thread might be required to unlock the conflicting lock. It also causes a crash, since the nfs server thread assumes it can free the lock when its lm_notify lock callback is called.
Ideal would be to make the nfs lock method return without blocking in this case, but for now it works just not to attempt blocking locks. The difference is just that the original client will have to poll (as it does in the v4.0 case) instead of getting a callback when the lock's available.
Signed-off-by: J. Bruce Fields bfields@redhat.com Acked-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/export.c | 2 +- fs/nfsd/nfs4state.c | 8 ++++++-- include/linux/exportfs.h | 2 ++ 3 files changed, 9 insertions(+), 3 deletions(-)
diff --git a/fs/nfs/export.c b/fs/nfs/export.c index b347e3ce0cc8e..40beac65d1355 100644 --- a/fs/nfs/export.c +++ b/fs/nfs/export.c @@ -184,5 +184,5 @@ const struct export_operations nfs_export_ops = { .fetch_iversion = nfs_fetch_iversion, .flags = EXPORT_OP_NOWCC|EXPORT_OP_NOSUBTREECHK| EXPORT_OP_CLOSE_BEFORE_UNLINK|EXPORT_OP_REMOTE_FS| - EXPORT_OP_NOATOMIC_ATTR, + EXPORT_OP_NOATOMIC_ATTR|EXPORT_OP_SYNC_LOCKS, }; diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 401f0f2743717..fd3bdf0bf0052 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -6878,6 +6878,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, struct nfsd4_blocked_lock *nbl = NULL; struct file_lock *file_lock = NULL; struct file_lock *conflock = NULL; + struct super_block *sb; __be32 status = 0; int lkflg; int err; @@ -6899,6 +6900,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, dprintk("NFSD: nfsd4_lock: permission denied!\n"); return status; } + sb = cstate->current_fh.fh_dentry->d_sb;
if (lock->lk_is_new) { if (nfsd4_has_session(cstate)) @@ -6947,7 +6949,8 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, fp = lock_stp->st_stid.sc_file; switch (lock->lk_type) { case NFS4_READW_LT: - if (nfsd4_has_session(cstate)) + if (nfsd4_has_session(cstate) && + !(sb->s_export_op->flags & EXPORT_OP_SYNC_LOCKS)) fl_flags |= FL_SLEEP; fallthrough; case NFS4_READ_LT: @@ -6959,7 +6962,8 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, fl_type = F_RDLCK; break; case NFS4_WRITEW_LT: - if (nfsd4_has_session(cstate)) + if (nfsd4_has_session(cstate) && + !(sb->s_export_op->flags & EXPORT_OP_SYNC_LOCKS)) fl_flags |= FL_SLEEP; fallthrough; case NFS4_WRITE_LT: diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h index fe848901fcc3a..3260fe7148462 100644 --- a/include/linux/exportfs.h +++ b/include/linux/exportfs.h @@ -221,6 +221,8 @@ struct export_operations { #define EXPORT_OP_NOATOMIC_ATTR (0x10) /* Filesystem cannot supply atomic attribute updates */ +#define EXPORT_OP_SYNC_LOCKS (0x20) /* Filesystem can't do + asychronous blocking locks */ unsigned long flags; };
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit b840be2f00c0bc00d993f8f76e251052b83e4382 ]
As in the v4 case, it doesn't work well to block waiting for a lock on an nfs filesystem.
As in the v4 case, that means we're depending on the client to poll. It's probably incorrect to depend on that, but I *think* clients do poll in practice. In any case, it's an improvement over hanging the lockd thread indefinitely as we currently are.
Signed-off-by: J. Bruce Fields bfields@redhat.com Acked-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svclock.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c index a7b4c51667ada..e9b85d8fd5fe7 100644 --- a/fs/lockd/svclock.c +++ b/fs/lockd/svclock.c @@ -31,6 +31,7 @@ #include <linux/lockd/nlm.h> #include <linux/lockd/lockd.h> #include <linux/kthread.h> +#include <linux/exportfs.h>
#define NLMDBG_FACILITY NLMDBG_SVCLOCK
@@ -470,18 +471,24 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file, struct nlm_cookie *cookie, int reclaim) { struct nlm_block *block = NULL; + struct inode *inode = nlmsvc_file_inode(file); int error; int mode; + int async_block = 0; __be32 ret;
dprintk("lockd: nlmsvc_lock(%s/%ld, ty=%d, pi=%d, %Ld-%Ld, bl=%d)\n", - nlmsvc_file_inode(file)->i_sb->s_id, - nlmsvc_file_inode(file)->i_ino, + inode->i_sb->s_id, inode->i_ino, lock->fl.fl_type, lock->fl.fl_pid, (long long)lock->fl.fl_start, (long long)lock->fl.fl_end, wait);
+ if (inode->i_sb->s_export_op->flags & EXPORT_OP_SYNC_LOCKS) { + async_block = wait; + wait = 0; + } + /* Lock file against concurrent access */ mutex_lock(&file->f_mutex); /* Get existing block (in case client is busy-waiting) @@ -542,7 +549,7 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file, */ if (wait) break; - ret = nlm_lck_denied; + ret = async_block ? nlm_lck_blocked : nlm_lck_denied; goto out; case FILE_LOCK_DEFERRED: if (wait)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit bb0a55bb7148a49e549ee992200860e7a040d3a5 ]
In the reexport case, nfsd is currently passing along locks with the reclaim bit set. The client sends a new lock request, which is granted if there's currently no conflict--even if it's possible a conflicting lock could have been briefly held in the interim.
We don't currently have any way to safely grant reclaim, so for now let's just deny them all.
I'm doing this by passing the reclaim bit to nfs and letting it fail the call, with the idea that eventually the client might be able to do something more forgiving here.
Signed-off-by: J. Bruce Fields bfields@redhat.com Acked-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/file.c | 3 +++ fs/nfsd/nfs4state.c | 3 +++ fs/nfsd/nfsproc.c | 1 + include/linux/errno.h | 1 + include/linux/fs.h | 1 + 5 files changed, 9 insertions(+)
diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 7be1a7f7fcb2a..d35aae47b062b 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -798,6 +798,9 @@ int nfs_lock(struct file *filp, int cmd, struct file_lock *fl)
nfs_inc_stats(inode, NFSIOS_VFSLOCK);
+ if (fl->fl_flags & FL_RECLAIM) + return -ENOGRACE; + /* No mandatory locks over NFS */ if (__mandatory_lock(inode) && fl->fl_type != F_UNLCK) goto out_err; diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index fd3bdf0bf0052..e9ac77c28741e 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -6946,6 +6946,9 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (!locks_in_grace(net) && lock->lk_reclaim) goto out;
+ if (lock->lk_reclaim) + fl_flags |= FL_RECLAIM; + fp = lock_stp->st_stid.sc_file; switch (lock->lk_type) { case NFS4_READW_LT: diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 72f8bc4a7ea48..78bdfdc253fd3 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -881,6 +881,7 @@ nfserrno (int errno) { nfserr_serverfault, -ENFILE }, { nfserr_io, -EUCLEAN }, { nfserr_perm, -ENOKEY }, + { nfserr_no_grace, -ENOGRACE}, }; int i;
diff --git a/include/linux/errno.h b/include/linux/errno.h index d73f597a24849..8b0c754bab025 100644 --- a/include/linux/errno.h +++ b/include/linux/errno.h @@ -31,5 +31,6 @@ #define EJUKEBOX 528 /* Request initiated, but will not complete before timeout */ #define EIOCBQUEUED 529 /* iocb queued, will get completion event */ #define ERECALLCONFLICT 530 /* conflict with recalled state */ +#define ENOGRACE 531 /* NFS file lock reclaim refused */
#endif diff --git a/include/linux/fs.h b/include/linux/fs.h index a9ac60d3be1d6..c0459446e1440 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -996,6 +996,7 @@ static inline struct file *get_file(struct file *f) #define FL_UNLOCK_PENDING 512 /* Lease is being broken */ #define FL_OFDLCK 1024 /* lock is "owned" by struct file */ #define FL_LAYOUT 2048 /* outstanding pNFS layout */ +#define FL_RECLAIM 4096 /* reclaiming from a reboot server */
#define FL_CLOSE_POSIX (FL_POSIX | FL_CLOSE)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 438623a06bacd69c40c4af633bb09a3bbb9dfc78 ]
I'd like to take commit 4532608d71c8 ("SUNRPC: Clean up generic dispatcher code") even further by using only private local SVC dispatchers for all kernel RPC services. This change would enable the removal of the logic that switches between svc_generic_dispatch() and a service's private dispatcher, and simplify the invocation of the service's pc_release method so that humans can visually verify that it is always invoked properly.
All that will come later.
First, let's provide a better way to return authentication errors from SVC dispatcher functions. Instead of overloading the dispatch method's *statp argument, add a field to struct svc_rqst that can hold an error value.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/sunrpc/svc.h | 1 + include/linux/sunrpc/svcauth.h | 4 +-- include/trace/events/sunrpc.h | 6 ++--- net/sunrpc/auth_gss/svcauth_gss.c | 43 +++++++++++++++---------------- net/sunrpc/svc.c | 17 ++++++------ net/sunrpc/svcauth.c | 8 +++--- net/sunrpc/svcauth_unix.c | 12 ++++----- 7 files changed, 46 insertions(+), 45 deletions(-)
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index ab9afbf0a0d8b..7b7bd8a0a6fbd 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -284,6 +284,7 @@ struct svc_rqst { void * rq_argp; /* decoded arguments */ void * rq_resp; /* xdr'd results */ void * rq_auth_data; /* flavor-specific data */ + __be32 rq_auth_stat; /* authentication status */ int rq_auth_slack; /* extra space xdr code * should leave in head * for krb5i, krb5p. diff --git a/include/linux/sunrpc/svcauth.h b/include/linux/sunrpc/svcauth.h index b0003866a2497..6d9cc9080aca7 100644 --- a/include/linux/sunrpc/svcauth.h +++ b/include/linux/sunrpc/svcauth.h @@ -127,7 +127,7 @@ struct auth_ops { char * name; struct module *owner; int flavour; - int (*accept)(struct svc_rqst *rq, __be32 *authp); + int (*accept)(struct svc_rqst *rq); int (*release)(struct svc_rqst *rq); void (*domain_release)(struct auth_domain *); int (*set_client)(struct svc_rqst *rq); @@ -149,7 +149,7 @@ struct auth_ops {
struct svc_xprt;
-extern int svc_authenticate(struct svc_rqst *rqstp, __be32 *authp); +extern int svc_authenticate(struct svc_rqst *rqstp); extern int svc_authorise(struct svc_rqst *rqstp); extern int svc_set_client(struct svc_rqst *rqstp); extern int svc_auth_register(rpc_authflavor_t flavor, struct auth_ops *aops); diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h index 657138ab1f91f..0cb9e182a1b1e 100644 --- a/include/trace/events/sunrpc.h +++ b/include/trace/events/sunrpc.h @@ -1547,9 +1547,9 @@ TRACE_DEFINE_ENUM(SVC_COMPLETE); { SVC_COMPLETE, "SVC_COMPLETE" })
TRACE_EVENT(svc_authenticate, - TP_PROTO(const struct svc_rqst *rqst, int auth_res, __be32 auth_stat), + TP_PROTO(const struct svc_rqst *rqst, int auth_res),
- TP_ARGS(rqst, auth_res, auth_stat), + TP_ARGS(rqst, auth_res),
TP_STRUCT__entry( __field(u32, xid) @@ -1560,7 +1560,7 @@ TRACE_EVENT(svc_authenticate, TP_fast_assign( __entry->xid = be32_to_cpu(rqst->rq_xid); __entry->svc_status = auth_res; - __entry->auth_stat = be32_to_cpu(auth_stat); + __entry->auth_stat = be32_to_cpu(rqst->rq_auth_stat); ),
TP_printk("xid=0x%08x auth_res=%s auth_stat=%s", diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c index 784c8b24f1640..54303b7efde76 100644 --- a/net/sunrpc/auth_gss/svcauth_gss.c +++ b/net/sunrpc/auth_gss/svcauth_gss.c @@ -707,11 +707,11 @@ svc_safe_putnetobj(struct kvec *resv, struct xdr_netobj *o) /* * Verify the checksum on the header and return SVC_OK on success. * Otherwise, return SVC_DROP (in the case of a bad sequence number) - * or return SVC_DENIED and indicate error in authp. + * or return SVC_DENIED and indicate error in rqstp->rq_auth_stat. */ static int gss_verify_header(struct svc_rqst *rqstp, struct rsc *rsci, - __be32 *rpcstart, struct rpc_gss_wire_cred *gc, __be32 *authp) + __be32 *rpcstart, struct rpc_gss_wire_cred *gc) { struct gss_ctx *ctx_id = rsci->mechctx; struct xdr_buf rpchdr; @@ -725,7 +725,7 @@ gss_verify_header(struct svc_rqst *rqstp, struct rsc *rsci, iov.iov_len = (u8 *)argv->iov_base - (u8 *)rpcstart; xdr_buf_from_iov(&iov, &rpchdr);
- *authp = rpc_autherr_badverf; + rqstp->rq_auth_stat = rpc_autherr_badverf; if (argv->iov_len < 4) return SVC_DENIED; flavor = svc_getnl(argv); @@ -737,13 +737,13 @@ gss_verify_header(struct svc_rqst *rqstp, struct rsc *rsci, if (rqstp->rq_deferred) /* skip verification of revisited request */ return SVC_OK; if (gss_verify_mic(ctx_id, &rpchdr, &checksum) != GSS_S_COMPLETE) { - *authp = rpcsec_gsserr_credproblem; + rqstp->rq_auth_stat = rpcsec_gsserr_credproblem; return SVC_DENIED; }
if (gc->gc_seq > MAXSEQ) { trace_rpcgss_svc_seqno_large(rqstp, gc->gc_seq); - *authp = rpcsec_gsserr_ctxproblem; + rqstp->rq_auth_stat = rpcsec_gsserr_ctxproblem; return SVC_DENIED; } if (!gss_check_seq_num(rqstp, rsci, gc->gc_seq)) @@ -1136,7 +1136,7 @@ static void gss_free_in_token_pages(struct gssp_in_token *in_token) }
static int gss_read_proxy_verf(struct svc_rqst *rqstp, - struct rpc_gss_wire_cred *gc, __be32 *authp, + struct rpc_gss_wire_cred *gc, struct xdr_netobj *in_handle, struct gssp_in_token *in_token) { @@ -1145,7 +1145,7 @@ static int gss_read_proxy_verf(struct svc_rqst *rqstp, int pages, i, res, pgto, pgfrom; size_t inlen, to_offs, from_offs;
- res = gss_read_common_verf(gc, argv, authp, in_handle); + res = gss_read_common_verf(gc, argv, &rqstp->rq_auth_stat, in_handle); if (res) return res;
@@ -1226,7 +1226,7 @@ gss_write_resv(struct kvec *resv, size_t size_limit, * Otherwise, drop the request pending an answer to the upcall. */ static int svcauth_gss_legacy_init(struct svc_rqst *rqstp, - struct rpc_gss_wire_cred *gc, __be32 *authp) + struct rpc_gss_wire_cred *gc) { struct kvec *argv = &rqstp->rq_arg.head[0]; struct kvec *resv = &rqstp->rq_res.head[0]; @@ -1235,7 +1235,7 @@ static int svcauth_gss_legacy_init(struct svc_rqst *rqstp, struct sunrpc_net *sn = net_generic(SVC_NET(rqstp), sunrpc_net_id);
memset(&rsikey, 0, sizeof(rsikey)); - ret = gss_read_verf(gc, argv, authp, + ret = gss_read_verf(gc, argv, &rqstp->rq_auth_stat, &rsikey.in_handle, &rsikey.in_token); if (ret) return ret; @@ -1338,7 +1338,7 @@ static int gss_proxy_save_rsc(struct cache_detail *cd, }
static int svcauth_gss_proxy_init(struct svc_rqst *rqstp, - struct rpc_gss_wire_cred *gc, __be32 *authp) + struct rpc_gss_wire_cred *gc) { struct kvec *resv = &rqstp->rq_res.head[0]; struct xdr_netobj cli_handle; @@ -1350,8 +1350,7 @@ static int svcauth_gss_proxy_init(struct svc_rqst *rqstp, struct sunrpc_net *sn = net_generic(net, sunrpc_net_id);
memset(&ud, 0, sizeof(ud)); - ret = gss_read_proxy_verf(rqstp, gc, authp, - &ud.in_handle, &ud.in_token); + ret = gss_read_proxy_verf(rqstp, gc, &ud.in_handle, &ud.in_token); if (ret) return ret;
@@ -1524,7 +1523,7 @@ static void destroy_use_gss_proxy_proc_entry(struct net *net) {} * response here and return SVC_COMPLETE. */ static int -svcauth_gss_accept(struct svc_rqst *rqstp, __be32 *authp) +svcauth_gss_accept(struct svc_rqst *rqstp) { struct kvec *argv = &rqstp->rq_arg.head[0]; struct kvec *resv = &rqstp->rq_res.head[0]; @@ -1537,7 +1536,7 @@ svcauth_gss_accept(struct svc_rqst *rqstp, __be32 *authp) int ret; struct sunrpc_net *sn = net_generic(SVC_NET(rqstp), sunrpc_net_id);
- *authp = rpc_autherr_badcred; + rqstp->rq_auth_stat = rpc_autherr_badcred; if (!svcdata) svcdata = kmalloc(sizeof(*svcdata), GFP_KERNEL); if (!svcdata) @@ -1574,22 +1573,22 @@ svcauth_gss_accept(struct svc_rqst *rqstp, __be32 *authp) if ((gc->gc_proc != RPC_GSS_PROC_DATA) && (rqstp->rq_proc != 0)) goto auth_err;
- *authp = rpc_autherr_badverf; + rqstp->rq_auth_stat = rpc_autherr_badverf; switch (gc->gc_proc) { case RPC_GSS_PROC_INIT: case RPC_GSS_PROC_CONTINUE_INIT: if (use_gss_proxy(SVC_NET(rqstp))) - return svcauth_gss_proxy_init(rqstp, gc, authp); + return svcauth_gss_proxy_init(rqstp, gc); else - return svcauth_gss_legacy_init(rqstp, gc, authp); + return svcauth_gss_legacy_init(rqstp, gc); case RPC_GSS_PROC_DATA: case RPC_GSS_PROC_DESTROY: /* Look up the context, and check the verifier: */ - *authp = rpcsec_gsserr_credproblem; + rqstp->rq_auth_stat = rpcsec_gsserr_credproblem; rsci = gss_svc_searchbyctx(sn->rsc_cache, &gc->gc_ctx); if (!rsci) goto auth_err; - switch (gss_verify_header(rqstp, rsci, rpcstart, gc, authp)) { + switch (gss_verify_header(rqstp, rsci, rpcstart, gc)) { case SVC_OK: break; case SVC_DENIED: @@ -1599,7 +1598,7 @@ svcauth_gss_accept(struct svc_rqst *rqstp, __be32 *authp) } break; default: - *authp = rpc_autherr_rejectedcred; + rqstp->rq_auth_stat = rpc_autherr_rejectedcred; goto auth_err; }
@@ -1615,13 +1614,13 @@ svcauth_gss_accept(struct svc_rqst *rqstp, __be32 *authp) svc_putnl(resv, RPC_SUCCESS); goto complete; case RPC_GSS_PROC_DATA: - *authp = rpcsec_gsserr_ctxproblem; + rqstp->rq_auth_stat = rpcsec_gsserr_ctxproblem; svcdata->verf_start = resv->iov_base + resv->iov_len; if (gss_write_verf(rqstp, rsci->mechctx, gc->gc_seq)) goto auth_err; rqstp->rq_cred = rsci->cred; get_group_info(rsci->cred.cr_group_info); - *authp = rpc_autherr_badcred; + rqstp->rq_auth_stat = rpc_autherr_badcred; switch (gc->gc_svc) { case RPC_GSS_SVC_NONE: break; diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 27f98756d1751..cbcc951639ad5 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -1305,7 +1305,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) struct svc_process_info process; __be32 *statp; u32 prog, vers; - __be32 auth_stat, rpc_stat; + __be32 rpc_stat; int auth_res; __be32 *reply_statp;
@@ -1348,14 +1348,14 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) * We do this before anything else in order to get a decent * auth verifier. */ - auth_res = svc_authenticate(rqstp, &auth_stat); + auth_res = svc_authenticate(rqstp); /* Also give the program a chance to reject this call: */ if (auth_res == SVC_OK && progp) { - auth_stat = rpc_autherr_badcred; + rqstp->rq_auth_stat = rpc_autherr_badcred; auth_res = progp->pg_authenticate(rqstp); } if (auth_res != SVC_OK) - trace_svc_authenticate(rqstp, auth_res, auth_stat); + trace_svc_authenticate(rqstp, auth_res); switch (auth_res) { case SVC_OK: break; @@ -1414,8 +1414,8 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) goto release_dropit; if (*statp == rpc_garbage_args) goto err_garbage; - auth_stat = svc_get_autherr(rqstp, statp); - if (auth_stat != rpc_auth_ok) + rqstp->rq_auth_stat = svc_get_autherr(rqstp, statp); + if (rqstp->rq_auth_stat != rpc_auth_ok) goto err_release_bad_auth; } else { dprintk("svc: calling dispatcher\n"); @@ -1472,13 +1472,14 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) if (procp->pc_release) procp->pc_release(rqstp); err_bad_auth: - dprintk("svc: authentication failed (%d)\n", ntohl(auth_stat)); + dprintk("svc: authentication failed (%d)\n", + be32_to_cpu(rqstp->rq_auth_stat)); serv->sv_stats->rpcbadauth++; /* Restore write pointer to location of accept status: */ xdr_ressize_check(rqstp, reply_statp); svc_putnl(resv, 1); /* REJECT */ svc_putnl(resv, 1); /* AUTH_ERROR */ - svc_putnl(resv, ntohl(auth_stat)); /* status */ + svc_putu32(resv, rqstp->rq_auth_stat); /* status */ goto sendit;
err_bad_prog: diff --git a/net/sunrpc/svcauth.c b/net/sunrpc/svcauth.c index 998b196b61767..5a8b8e03fdd42 100644 --- a/net/sunrpc/svcauth.c +++ b/net/sunrpc/svcauth.c @@ -59,12 +59,12 @@ svc_put_auth_ops(struct auth_ops *aops) }
int -svc_authenticate(struct svc_rqst *rqstp, __be32 *authp) +svc_authenticate(struct svc_rqst *rqstp) { rpc_authflavor_t flavor; struct auth_ops *aops;
- *authp = rpc_auth_ok; + rqstp->rq_auth_stat = rpc_auth_ok;
flavor = svc_getnl(&rqstp->rq_arg.head[0]);
@@ -72,7 +72,7 @@ svc_authenticate(struct svc_rqst *rqstp, __be32 *authp)
aops = svc_get_auth_ops(flavor); if (aops == NULL) { - *authp = rpc_autherr_badcred; + rqstp->rq_auth_stat = rpc_autherr_badcred; return SVC_DENIED; }
@@ -80,7 +80,7 @@ svc_authenticate(struct svc_rqst *rqstp, __be32 *authp) init_svc_cred(&rqstp->rq_cred);
rqstp->rq_authop = aops; - return aops->accept(rqstp, authp); + return aops->accept(rqstp); } EXPORT_SYMBOL_GPL(svc_authenticate);
diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c index 60754a292589b..c20c63d651a9c 100644 --- a/net/sunrpc/svcauth_unix.c +++ b/net/sunrpc/svcauth_unix.c @@ -743,7 +743,7 @@ svcauth_unix_set_client(struct svc_rqst *rqstp) EXPORT_SYMBOL_GPL(svcauth_unix_set_client);
static int -svcauth_null_accept(struct svc_rqst *rqstp, __be32 *authp) +svcauth_null_accept(struct svc_rqst *rqstp) { struct kvec *argv = &rqstp->rq_arg.head[0]; struct kvec *resv = &rqstp->rq_res.head[0]; @@ -754,12 +754,12 @@ svcauth_null_accept(struct svc_rqst *rqstp, __be32 *authp)
if (svc_getu32(argv) != 0) { dprintk("svc: bad null cred\n"); - *authp = rpc_autherr_badcred; + rqstp->rq_auth_stat = rpc_autherr_badcred; return SVC_DENIED; } if (svc_getu32(argv) != htonl(RPC_AUTH_NULL) || svc_getu32(argv) != 0) { dprintk("svc: bad null verf\n"); - *authp = rpc_autherr_badverf; + rqstp->rq_auth_stat = rpc_autherr_badverf; return SVC_DENIED; }
@@ -803,7 +803,7 @@ struct auth_ops svcauth_null = {
static int -svcauth_unix_accept(struct svc_rqst *rqstp, __be32 *authp) +svcauth_unix_accept(struct svc_rqst *rqstp) { struct kvec *argv = &rqstp->rq_arg.head[0]; struct kvec *resv = &rqstp->rq_res.head[0]; @@ -845,7 +845,7 @@ svcauth_unix_accept(struct svc_rqst *rqstp, __be32 *authp) } groups_sort(cred->cr_group_info); if (svc_getu32(argv) != htonl(RPC_AUTH_NULL) || svc_getu32(argv) != 0) { - *authp = rpc_autherr_badverf; + rqstp->rq_auth_stat = rpc_autherr_badverf; return SVC_DENIED; }
@@ -857,7 +857,7 @@ svcauth_unix_accept(struct svc_rqst *rqstp, __be32 *authp) return SVC_OK;
badcred: - *authp = rpc_autherr_badcred; + rqstp->rq_auth_stat = rpc_autherr_badcred; return SVC_DENIED; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 5c2465dfd457f3015eebcc3ace50570e1d896aeb ]
In a few moments, rq_auth_stat will need to be explicitly set to rpc_auth_ok before execution gets to the dispatcher.
svc_authenticate() already sets it, but it often gets reset to rpc_autherr_badcred right after that call, even when authentication is successful. Let's ensure that the pg_authenticate callout and svc_set_client() set it properly in every case.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svc.c | 2 ++ fs/nfs/callback.c | 4 ++++ net/sunrpc/auth_gss/svcauth_gss.c | 4 ++++ net/sunrpc/svc.c | 4 +--- net/sunrpc/svcauth_unix.c | 6 +++++- 5 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index 0ab9756ed2359..b632be3ad57b2 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -649,6 +649,7 @@ static int lockd_authenticate(struct svc_rqst *rqstp) switch (rqstp->rq_authop->flavour) { case RPC_AUTH_NULL: case RPC_AUTH_UNIX: + rqstp->rq_auth_stat = rpc_auth_ok; if (rqstp->rq_proc == 0) return SVC_OK; if (is_callback(rqstp->rq_proc)) { @@ -659,6 +660,7 @@ static int lockd_authenticate(struct svc_rqst *rqstp) } return svc_set_client(rqstp); } + rqstp->rq_auth_stat = rpc_autherr_badcred; return SVC_DENIED; }
diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c index 7817ad94a6bae..86d856de1389b 100644 --- a/fs/nfs/callback.c +++ b/fs/nfs/callback.c @@ -429,6 +429,8 @@ check_gss_callback_principal(struct nfs_client *clp, struct svc_rqst *rqstp) */ static int nfs_callback_authenticate(struct svc_rqst *rqstp) { + rqstp->rq_auth_stat = rpc_autherr_badcred; + switch (rqstp->rq_authop->flavour) { case RPC_AUTH_NULL: if (rqstp->rq_proc != CB_NULL) @@ -439,6 +441,8 @@ static int nfs_callback_authenticate(struct svc_rqst *rqstp) if (svc_is_backchannel(rqstp)) return SVC_DENIED; } + + rqstp->rq_auth_stat = rpc_auth_ok; return SVC_OK; }
diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c index 54303b7efde76..329eac782cc5e 100644 --- a/net/sunrpc/auth_gss/svcauth_gss.c +++ b/net/sunrpc/auth_gss/svcauth_gss.c @@ -1038,6 +1038,8 @@ svcauth_gss_set_client(struct svc_rqst *rqstp) struct rpc_gss_wire_cred *gc = &svcdata->clcred; int stat;
+ rqstp->rq_auth_stat = rpc_autherr_badcred; + /* * A gss export can be specified either by: * export *(sec=krb5,rw) @@ -1053,6 +1055,8 @@ svcauth_gss_set_client(struct svc_rqst *rqstp) stat = svcauth_unix_set_client(rqstp); if (stat == SVC_DROP || stat == SVC_CLOSE) return stat; + + rqstp->rq_auth_stat = rpc_auth_ok; return SVC_OK; }
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index cbcc951639ad5..f036507275338 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -1350,10 +1350,8 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) */ auth_res = svc_authenticate(rqstp); /* Also give the program a chance to reject this call: */ - if (auth_res == SVC_OK && progp) { - rqstp->rq_auth_stat = rpc_autherr_badcred; + if (auth_res == SVC_OK && progp) auth_res = progp->pg_authenticate(rqstp); - } if (auth_res != SVC_OK) trace_svc_authenticate(rqstp, auth_res); switch (auth_res) { diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c index c20c63d651a9c..1868596259af5 100644 --- a/net/sunrpc/svcauth_unix.c +++ b/net/sunrpc/svcauth_unix.c @@ -699,8 +699,9 @@ svcauth_unix_set_client(struct svc_rqst *rqstp)
rqstp->rq_client = NULL; if (rqstp->rq_proc == 0) - return SVC_OK; + goto out;
+ rqstp->rq_auth_stat = rpc_autherr_badcred; ipm = ip_map_cached_get(xprt); if (ipm == NULL) ipm = __ip_map_lookup(sn->ip_map_cache, rqstp->rq_server->sv_program->pg_class, @@ -737,6 +738,9 @@ svcauth_unix_set_client(struct svc_rqst *rqstp) put_group_info(cred->cr_group_info); cred->cr_group_info = gi; } + +out: + rqstp->rq_auth_stat = rpc_auth_ok; return SVC_OK; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 9082e1d914f8b27114352b1940bbcc7522f682e7 ]
Now that there is an alternate method for returning an auth_stat value, replace the RQ_AUTHERR flag with use of that new method.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/callback_xdr.c | 3 ++- include/linux/sunrpc/svc.h | 2 -- include/trace/events/sunrpc.h | 3 +-- net/sunrpc/svc.c | 24 ++++-------------------- 4 files changed, 7 insertions(+), 25 deletions(-)
diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c index f9dfd4e712a30..0559e8b6a8ec4 100644 --- a/fs/nfs/callback_xdr.c +++ b/fs/nfs/callback_xdr.c @@ -984,7 +984,8 @@ static __be32 nfs4_callback_compound(struct svc_rqst *rqstp)
out_invalidcred: pr_warn_ratelimited("NFS: NFSv4 callback contains invalid cred\n"); - return svc_return_autherr(rqstp, rpc_autherr_badcred); + rqstp->rq_auth_stat = rpc_autherr_badcred; + return rpc_success; }
/* diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 7b7bd8a0a6fbd..dd3daadbc0e5c 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -277,7 +277,6 @@ struct svc_rqst { #define RQ_VICTIM (5) /* about to be shut down */ #define RQ_BUSY (6) /* request is busy */ #define RQ_DATA (7) /* request has data */ -#define RQ_AUTHERR (8) /* Request status is auth error */ unsigned long rq_flags; /* flags field */ ktime_t rq_qtime; /* enqueue time */
@@ -537,7 +536,6 @@ unsigned int svc_fill_write_vector(struct svc_rqst *rqstp, char *svc_fill_symlink_pathname(struct svc_rqst *rqstp, struct kvec *first, void *p, size_t total); -__be32 svc_return_autherr(struct svc_rqst *rqstp, __be32 auth_err); __be32 svc_generic_init_request(struct svc_rqst *rqstp, const struct svc_program *progp, struct svc_process_info *procinfo); diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h index 0cb9e182a1b1e..fce071f39f51f 100644 --- a/include/trace/events/sunrpc.h +++ b/include/trace/events/sunrpc.h @@ -1480,8 +1480,7 @@ DEFINE_SVCXDRBUF_EVENT(sendto); svc_rqst_flag(SPLICE_OK) \ svc_rqst_flag(VICTIM) \ svc_rqst_flag(BUSY) \ - svc_rqst_flag(DATA) \ - svc_rqst_flag_end(AUTHERR) + svc_rqst_flag_end(DATA)
#undef svc_rqst_flag #undef svc_rqst_flag_end diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index f036507275338..0d3c3ca2830a8 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -1187,22 +1187,6 @@ void svc_printk(struct svc_rqst *rqstp, const char *fmt, ...) static __printf(2,3) void svc_printk(struct svc_rqst *rqstp, const char *fmt, ...) {} #endif
-__be32 -svc_return_autherr(struct svc_rqst *rqstp, __be32 auth_err) -{ - set_bit(RQ_AUTHERR, &rqstp->rq_flags); - return auth_err; -} -EXPORT_SYMBOL_GPL(svc_return_autherr); - -static __be32 -svc_get_autherr(struct svc_rqst *rqstp, __be32 *statp) -{ - if (test_and_clear_bit(RQ_AUTHERR, &rqstp->rq_flags)) - return *statp; - return rpc_auth_ok; -} - static int svc_generic_dispatch(struct svc_rqst *rqstp, __be32 *statp) { @@ -1226,7 +1210,7 @@ svc_generic_dispatch(struct svc_rqst *rqstp, __be32 *statp) test_bit(RQ_DROPME, &rqstp->rq_flags)) return 0;
- if (test_bit(RQ_AUTHERR, &rqstp->rq_flags)) + if (rqstp->rq_auth_stat != rpc_auth_ok) return 1;
if (*statp != rpc_success) @@ -1412,15 +1396,15 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) goto release_dropit; if (*statp == rpc_garbage_args) goto err_garbage; - rqstp->rq_auth_stat = svc_get_autherr(rqstp, statp); - if (rqstp->rq_auth_stat != rpc_auth_ok) - goto err_release_bad_auth; } else { dprintk("svc: calling dispatcher\n"); if (!process.dispatch(rqstp, statp)) goto release_dropit; /* Release reply info */ }
+ if (rqstp->rq_auth_stat != rpc_auth_ok) + goto err_release_bad_auth; + /* Check RPC status result */ if (*statp != rpc_success) resv->iov_len = ((void*)statp) - resv->iov_base + 4;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 7d34c96217cf3c2d37ca0a56ca0bc3c3bef1e189 ]
The client's NFSv4 callback service is the only remaining user of svc_generic_dispatch().
Note that the NFSv4 callback service doesn't use the .pc_encode and .pc_decode callouts in any substantial way, so they are removed.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/callback_xdr.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c index 0559e8b6a8ec4..e7d1efd45fa46 100644 --- a/fs/nfs/callback_xdr.c +++ b/fs/nfs/callback_xdr.c @@ -988,6 +988,15 @@ static __be32 nfs4_callback_compound(struct svc_rqst *rqstp) return rpc_success; }
+static int +nfs_callback_dispatch(struct svc_rqst *rqstp, __be32 *statp) +{ + const struct svc_procedure *procp = rqstp->rq_procinfo; + + *statp = procp->pc_func(rqstp); + return 1; +} + /* * Define NFS4 callback COMPOUND ops. */ @@ -1076,7 +1085,7 @@ const struct svc_version nfs4_callback_version1 = { .vs_proc = nfs4_callback_procedures1, .vs_count = nfs4_callback_count1, .vs_xdrsize = NFS4_CALLBACK_XDRSIZE, - .vs_dispatch = NULL, + .vs_dispatch = nfs_callback_dispatch, .vs_hidden = true, .vs_need_cong_ctrl = true, }; @@ -1088,7 +1097,7 @@ const struct svc_version nfs4_callback_version4 = { .vs_proc = nfs4_callback_procedures1, .vs_count = nfs4_callback_count4, .vs_xdrsize = NFS4_CALLBACK_XDRSIZE, - .vs_dispatch = NULL, + .vs_dispatch = nfs_callback_dispatch, .vs_hidden = true, .vs_need_cong_ctrl = true, };
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit c35a810ce59524971c4a3b45faed4d0121e5a305 ]
Clean up: The callback RPC dispatcher no longer invokes these call outs, although svc_process_common() relies on seeing a .pc_encode function.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/callback_xdr.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c index e7d1efd45fa46..600e640682401 100644 --- a/fs/nfs/callback_xdr.c +++ b/fs/nfs/callback_xdr.c @@ -63,11 +63,10 @@ static __be32 nfs4_callback_null(struct svc_rqst *rqstp) return htonl(NFS4_OK); }
-static int nfs4_decode_void(struct svc_rqst *rqstp, __be32 *p) -{ - return xdr_argsize_check(rqstp, p); -} - +/* + * svc_process_common() looks for an XDR encoder to know when + * not to drop a Reply. + */ static int nfs4_encode_void(struct svc_rqst *rqstp, __be32 *p) { return xdr_ressize_check(rqstp, p); @@ -1063,7 +1062,6 @@ static struct callback_op callback_ops[] = { static const struct svc_procedure nfs4_callback_procedures1[] = { [CB_NULL] = { .pc_func = nfs4_callback_null, - .pc_decode = nfs4_decode_void, .pc_encode = nfs4_encode_void, .pc_xdrressize = 1, .pc_name = "NULL",
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit 4396a73115fc8739083536162e2228c0c0c3ed1a ]
Fix a leak in s_fsnotify_connectors counter in case of a race between concurrent add of new fsnotify mark to an object.
The task that lost the race fails to drop the counter before freeing the unused connector.
Following umount() hangs in fsnotify_sb_delete()/wait_var_event(), because s_fsnotify_connectors never drops to zero.
Fixes: ec44610fe2b8 ("fsnotify: count all objects with attached connectors") Reported-by: Murphy Zhou jencce.kernel@gmail.com Link: https://lore.kernel.org/linux-fsdevel/20210907063338.ycaw6wvhzrfsfdlp@xzhoux... Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/mark.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/notify/mark.c b/fs/notify/mark.c index 796946eb0c2e2..bea106fac0901 100644 --- a/fs/notify/mark.c +++ b/fs/notify/mark.c @@ -531,6 +531,7 @@ static int fsnotify_attach_connector_to_object(fsnotify_connp_t *connp, /* Someone else created list structure for us */ if (inode) fsnotify_put_inode_ref(inode); + fsnotify_put_sb_connectors(conn); kmem_cache_free(fsnotify_mark_connector_cachep, conn); }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 89c485c7a3ecbc2ebd568f9c9c2edf3a8cf7485b ]
Dai Ngo reports that, since the XDR overhaul, the NLM server crashes when the TEST procedure wants to return NLM_DENIED. There is a bug in svcxdr_encode_owner() that none of our standard test cases found.
Replace the open-coded function with a call to an appropriate pre-fabricated XDR helper.
Reported-by: Dai Ngo Dai.Ngo@oracle.com Fixes: a6a63ca5652e ("lockd: Common NLM XDR helpers") Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svcxdr.h | 13 ++----------- 1 file changed, 2 insertions(+), 11 deletions(-)
diff --git a/fs/lockd/svcxdr.h b/fs/lockd/svcxdr.h index c69a0bb76c940..4f1a451da5ba2 100644 --- a/fs/lockd/svcxdr.h +++ b/fs/lockd/svcxdr.h @@ -134,18 +134,9 @@ svcxdr_decode_owner(struct xdr_stream *xdr, struct xdr_netobj *obj) static inline bool svcxdr_encode_owner(struct xdr_stream *xdr, const struct xdr_netobj *obj) { - unsigned int quadlen = XDR_QUADLEN(obj->len); - __be32 *p; - - if (xdr_stream_encode_u32(xdr, obj->len) < 0) - return false; - p = xdr_reserve_space(xdr, obj->len); - if (!p) + if (obj->len > XDR_MAX_NETOBJ) return false; - p[quadlen - 1] = 0; /* XDR pad */ - memcpy(p, obj->data, obj->len); - - return true; + return xdr_stream_encode_opaque(xdr, obj->data, obj->len) > 0; }
#endif /* _LOCKD_SVCXDR_H_ */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit 19598141f40dff728dd50799e510805261f48850 ]
Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index d0748ff04b92f..87d984e0cdc0c 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -541,7 +541,7 @@ nfsd_file_close_inode_sync(struct inode *inode) }
/** - * nfsd_file_close_inode_sync - attempt to forcibly close a nfsd_file + * nfsd_file_close_inode - attempt a delayed close of a nfsd_file * @inode: inode of the file to attempt to remove * * Walk the whole hash bucket, looking for any files that correspond to "inode".
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit 9baf93d68bcc3d0a6042283b82603c076e25e4f5 ]
Align the arguments of fsnotify_name() to those of fsnotify().
Link: https://lore.kernel.org/r/20211025192746.66445-2-krisman@collabora.com Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Gabriel Krisman Bertazi krisman@collabora.com Signed-off-by: Jan Kara jack@suse.cz [ cel: adjust fsnotify_delete as well, a37d9a17f099 is already applied ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/fsnotify.h | 25 +++++++++++++++---------- 1 file changed, 15 insertions(+), 10 deletions(-)
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h index a9477c14fad5c..211463715e29d 100644 --- a/include/linux/fsnotify.h +++ b/include/linux/fsnotify.h @@ -26,20 +26,21 @@ * FS_EVENT_ON_CHILD mask on the parent inode and will not be reported if only * the child is interested and not the parent. */ -static inline void fsnotify_name(struct inode *dir, __u32 mask, - struct inode *child, - const struct qstr *name, u32 cookie) +static inline int fsnotify_name(__u32 mask, const void *data, int data_type, + struct inode *dir, const struct qstr *name, + u32 cookie) { if (atomic_long_read(&dir->i_sb->s_fsnotify_connectors) == 0) - return; + return 0;
- fsnotify(mask, child, FSNOTIFY_EVENT_INODE, dir, name, NULL, cookie); + return fsnotify(mask, data, data_type, dir, name, NULL, cookie); }
static inline void fsnotify_dirent(struct inode *dir, struct dentry *dentry, __u32 mask) { - fsnotify_name(dir, mask, d_inode(dentry), &dentry->d_name, 0); + fsnotify_name(mask, d_inode(dentry), FSNOTIFY_EVENT_INODE, + dir, &dentry->d_name, 0); }
static inline void fsnotify_inode(struct inode *inode, __u32 mask) @@ -154,8 +155,10 @@ static inline void fsnotify_move(struct inode *old_dir, struct inode *new_dir, new_dir_mask |= FS_ISDIR; }
- fsnotify_name(old_dir, old_dir_mask, source, old_name, fs_cookie); - fsnotify_name(new_dir, new_dir_mask, source, new_name, fs_cookie); + fsnotify_name(old_dir_mask, source, FSNOTIFY_EVENT_INODE, + old_dir, old_name, fs_cookie); + fsnotify_name(new_dir_mask, source, FSNOTIFY_EVENT_INODE, + new_dir, new_name, fs_cookie);
if (target) fsnotify_link_count(target); @@ -209,7 +212,8 @@ static inline void fsnotify_link(struct inode *dir, struct inode *inode, fsnotify_link_count(inode); audit_inode_child(dir, new_dentry, AUDIT_TYPE_CHILD_CREATE);
- fsnotify_name(dir, FS_CREATE, inode, &new_dentry->d_name, 0); + fsnotify_name(FS_CREATE, inode, FSNOTIFY_EVENT_INODE, + dir, &new_dentry->d_name, 0); }
/* @@ -228,7 +232,8 @@ static inline void fsnotify_delete(struct inode *dir, struct inode *inode, if (S_ISDIR(inode->i_mode)) mask |= FS_ISDIR;
- fsnotify_name(dir, mask, inode, &dentry->d_name, 0); + fsnotify_name(mask, inode, FSNOTIFY_EVENT_INODE, dir, &dentry->d_name, + 0); }
/**
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit fd5a3ff49a19aa69e2bc1e26e98037c2d778e61a ]
Define a new data type to pass for event - FSNOTIFY_EVENT_DENTRY. Use it to pass the dentry instead of it's ->d_inode where available.
This is needed in preparation to the refactor to retrieve the super block from the data field. In some cases (i.e. mkdir in kernfs), the data inode comes from a negative dentry, such that no super block information would be available. By receiving the dentry itself, instead of the inode, fsnotify can derive the super block even on these cases.
Link: https://lore.kernel.org/r/20211025192746.66445-3-krisman@collabora.com Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Amir Goldstein amir73il@gmail.com [Expand explanation in commit message] Signed-off-by: Gabriel Krisman Bertazi krisman@collabora.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/fsnotify.h | 5 ++--- include/linux/fsnotify_backend.h | 16 ++++++++++++++++ 2 files changed, 18 insertions(+), 3 deletions(-)
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h index 211463715e29d..e969a23f70631 100644 --- a/include/linux/fsnotify.h +++ b/include/linux/fsnotify.h @@ -39,8 +39,7 @@ static inline int fsnotify_name(__u32 mask, const void *data, int data_type, static inline void fsnotify_dirent(struct inode *dir, struct dentry *dentry, __u32 mask) { - fsnotify_name(mask, d_inode(dentry), FSNOTIFY_EVENT_INODE, - dir, &dentry->d_name, 0); + fsnotify_name(mask, dentry, FSNOTIFY_EVENT_DENTRY, dir, &dentry->d_name, 0); }
static inline void fsnotify_inode(struct inode *inode, __u32 mask) @@ -87,7 +86,7 @@ static inline int fsnotify_parent(struct dentry *dentry, __u32 mask, */ static inline void fsnotify_dentry(struct dentry *dentry, __u32 mask) { - fsnotify_parent(dentry, mask, d_inode(dentry), FSNOTIFY_EVENT_INODE); + fsnotify_parent(dentry, mask, dentry, FSNOTIFY_EVENT_DENTRY); }
static inline int fsnotify_file(struct file *file, __u32 mask) diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 1ce66748a2d29..a2db821e8a8f2 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -248,6 +248,7 @@ enum fsnotify_data_type { FSNOTIFY_EVENT_NONE, FSNOTIFY_EVENT_PATH, FSNOTIFY_EVENT_INODE, + FSNOTIFY_EVENT_DENTRY, };
static inline struct inode *fsnotify_data_inode(const void *data, int data_type) @@ -255,6 +256,8 @@ static inline struct inode *fsnotify_data_inode(const void *data, int data_type) switch (data_type) { case FSNOTIFY_EVENT_INODE: return (struct inode *)data; + case FSNOTIFY_EVENT_DENTRY: + return d_inode(data); case FSNOTIFY_EVENT_PATH: return d_inode(((const struct path *)data)->dentry); default: @@ -262,6 +265,19 @@ static inline struct inode *fsnotify_data_inode(const void *data, int data_type) } }
+static inline struct dentry *fsnotify_data_dentry(const void *data, int data_type) +{ + switch (data_type) { + case FSNOTIFY_EVENT_DENTRY: + /* Non const is needed for dget() */ + return (struct dentry *)data; + case FSNOTIFY_EVENT_PATH: + return ((const struct path *)data)->dentry; + default: + return NULL; + } +} + static inline const struct path *fsnotify_data_path(const void *data, int data_type) {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit dabe729dddca550446e9cc118c96d1f91703345b ]
Clarify argument names and contract for fsnotify_create() and fsnotify_mkdir() to reflect the anomaly of kernfs, which leaves dentries negavite after mkdir/create.
Remove the WARN_ON(!inode) in audit code that were added by the Fixes commit under the wrong assumption that dentries cannot be negative after mkdir/create.
Fixes: aa93bdc5500c ("fsnotify: use helpers to access data by data_type") Link: https://lore.kernel.org/linux-fsdevel/87mtp5yz0q.fsf@collabora.com/ Link: https://lore.kernel.org/r/20211025192746.66445-4-krisman@collabora.com Reviewed-by: Jan Kara jack@suse.cz Reported-by: Gabriel Krisman Bertazi krisman@collabora.com Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Gabriel Krisman Bertazi krisman@collabora.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/fsnotify.h | 22 ++++++++++++++++------ kernel/audit_fsnotify.c | 3 +-- kernel/audit_watch.c | 3 +-- 3 files changed, 18 insertions(+), 10 deletions(-)
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h index e969a23f70631..addca4ea56ad9 100644 --- a/include/linux/fsnotify.h +++ b/include/linux/fsnotify.h @@ -192,16 +192,22 @@ static inline void fsnotify_inoderemove(struct inode *inode)
/* * fsnotify_create - 'name' was linked in + * + * Caller must make sure that dentry->d_name is stable. + * Note: some filesystems (e.g. kernfs) leave @dentry negative and instantiate + * ->d_inode later */ -static inline void fsnotify_create(struct inode *inode, struct dentry *dentry) +static inline void fsnotify_create(struct inode *dir, struct dentry *dentry) { - audit_inode_child(inode, dentry, AUDIT_TYPE_CHILD_CREATE); + audit_inode_child(dir, dentry, AUDIT_TYPE_CHILD_CREATE);
- fsnotify_dirent(inode, dentry, FS_CREATE); + fsnotify_dirent(dir, dentry, FS_CREATE); }
/* * fsnotify_link - new hardlink in 'inode' directory + * + * Caller must make sure that new_dentry->d_name is stable. * Note: We have to pass also the linked inode ptr as some filesystems leave * new_dentry->d_inode NULL and instantiate inode pointer later */ @@ -267,12 +273,16 @@ static inline void fsnotify_unlink(struct inode *dir, struct dentry *dentry)
/* * fsnotify_mkdir - directory 'name' was created + * + * Caller must make sure that dentry->d_name is stable. + * Note: some filesystems (e.g. kernfs) leave @dentry negative and instantiate + * ->d_inode later */ -static inline void fsnotify_mkdir(struct inode *inode, struct dentry *dentry) +static inline void fsnotify_mkdir(struct inode *dir, struct dentry *dentry) { - audit_inode_child(inode, dentry, AUDIT_TYPE_CHILD_CREATE); + audit_inode_child(dir, dentry, AUDIT_TYPE_CHILD_CREATE);
- fsnotify_dirent(inode, dentry, FS_CREATE | FS_ISDIR); + fsnotify_dirent(dir, dentry, FS_CREATE | FS_ISDIR); }
/* diff --git a/kernel/audit_fsnotify.c b/kernel/audit_fsnotify.c index b2ebacd2f3097..76a5925b4e18d 100644 --- a/kernel/audit_fsnotify.c +++ b/kernel/audit_fsnotify.c @@ -161,8 +161,7 @@ static int audit_mark_handle_event(struct fsnotify_mark *inode_mark, u32 mask,
audit_mark = container_of(inode_mark, struct audit_fsnotify_mark, mark);
- if (WARN_ON_ONCE(inode_mark->group != audit_fsnotify_group) || - WARN_ON_ONCE(!inode)) + if (WARN_ON_ONCE(inode_mark->group != audit_fsnotify_group)) return 0;
if (mask & (FS_CREATE|FS_MOVED_TO|FS_DELETE|FS_MOVED_FROM)) { diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c index edbeffee64b8e..fd7b30a2d9a4b 100644 --- a/kernel/audit_watch.c +++ b/kernel/audit_watch.c @@ -472,8 +472,7 @@ static int audit_watch_handle_event(struct fsnotify_mark *inode_mark, u32 mask,
parent = container_of(inode_mark, struct audit_parent, mark);
- if (WARN_ON_ONCE(inode_mark->group != audit_watch_group) || - WARN_ON_ONCE(!inode)) + if (WARN_ON_ONCE(inode_mark->group != audit_watch_group)) return 0;
if (mask & (FS_CREATE|FS_MOVED_TO) && inode)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gabriel Krisman Bertazi krisman@collabora.com
[ Upstream commit cc53b55f697fe5aa98bdbfdfe67c6401da242155 ]
Some events, like the overflow event, are not mergeable, so they are not hashed. But, when failing inside fsnotify_add_event for lack of space, fsnotify_add_event() still calls the insert hook, which adds the overflow event to the merge list. Add a check to prevent any kind of unmergeable event to be inserted in the hashtable.
Fixes: 94e00d28a680 ("fsnotify: use hash table for faster events merge") Link: https://lore.kernel.org/r/20211025192746.66445-5-krisman@collabora.com Reviewed-by: Amir Goldstein amir73il@gmail.com Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Gabriel Krisman Bertazi krisman@collabora.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 057abd2cf8875..310246f8d3f19 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -702,6 +702,9 @@ static void fanotify_insert_event(struct fsnotify_group *group,
assert_spin_locked(&group->notification_lock);
+ if (!fanotify_is_hashed_event(event->mask)) + return; + pr_debug("%s: group=%p event=%p bucket=%u\n", __func__, group, event, bucket);
@@ -779,8 +782,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
fsn_event = &event->fse; ret = fsnotify_add_event(group, fsn_event, fanotify_merge, - fanotify_is_hashed_event(mask) ? - fanotify_insert_event : NULL); + fanotify_insert_event); if (ret) { /* Permission events shouldn't be merged */ BUG_ON(ret == 1 && mask & FANOTIFY_PERM_EVENTS);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gabriel Krisman Bertazi krisman@collabora.com
[ Upstream commit b9928e80dda84b349ba8de01780b9bef2fc36ffa ]
Every time this function is invoked, it is immediately added to FAN_EVENT_METADATA_LEN, since there is no need to just calculate the length of info records. This minor clean up folds the rest of the calculation into the function, which now operates in terms of events, returning the size of the entire event, including metadata.
Link: https://lore.kernel.org/r/20211025192746.66445-6-krisman@collabora.com Reviewed-by: Amir Goldstein amir73il@gmail.com Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Gabriel Krisman Bertazi krisman@collabora.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify_user.c | 35 +++++++++++++++++------------- 1 file changed, 20 insertions(+), 15 deletions(-)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 10fb062065b69..846e2a661526c 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -121,17 +121,24 @@ static int fanotify_fid_info_len(int fh_len, int name_len) FANOTIFY_EVENT_ALIGN); }
-static int fanotify_event_info_len(unsigned int info_mode, - struct fanotify_event *event) +static size_t fanotify_event_len(unsigned int info_mode, + struct fanotify_event *event) { - struct fanotify_info *info = fanotify_event_info(event); - int dir_fh_len = fanotify_event_dir_fh_len(event); - int fh_len = fanotify_event_object_fh_len(event); - int info_len = 0; + size_t event_len = FAN_EVENT_METADATA_LEN; + struct fanotify_info *info; + int dir_fh_len; + int fh_len; int dot_len = 0;
+ if (!info_mode) + return event_len; + + info = fanotify_event_info(event); + dir_fh_len = fanotify_event_dir_fh_len(event); + fh_len = fanotify_event_object_fh_len(event); + if (dir_fh_len) { - info_len += fanotify_fid_info_len(dir_fh_len, info->name_len); + event_len += fanotify_fid_info_len(dir_fh_len, info->name_len); } else if ((info_mode & FAN_REPORT_NAME) && (event->mask & FAN_ONDIR)) { /* @@ -142,12 +149,12 @@ static int fanotify_event_info_len(unsigned int info_mode, }
if (info_mode & FAN_REPORT_PIDFD) - info_len += FANOTIFY_PIDFD_INFO_HDR_LEN; + event_len += FANOTIFY_PIDFD_INFO_HDR_LEN;
if (fh_len) - info_len += fanotify_fid_info_len(fh_len, dot_len); + event_len += fanotify_fid_info_len(fh_len, dot_len);
- return info_len; + return event_len; }
/* @@ -176,7 +183,7 @@ static void fanotify_unhash_event(struct fsnotify_group *group, static struct fanotify_event *get_one_event(struct fsnotify_group *group, size_t count) { - size_t event_size = FAN_EVENT_METADATA_LEN; + size_t event_size; struct fanotify_event *event = NULL; struct fsnotify_event *fsn_event; unsigned int info_mode = FAN_GROUP_FLAG(group, FANOTIFY_INFO_MODES); @@ -189,8 +196,7 @@ static struct fanotify_event *get_one_event(struct fsnotify_group *group, goto out;
event = FANOTIFY_E(fsn_event); - if (info_mode) - event_size += fanotify_event_info_len(info_mode, event); + event_size = fanotify_event_len(info_mode, event);
if (event_size > count) { event = ERR_PTR(-EINVAL); @@ -532,8 +538,7 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
pr_debug("%s: group=%p event=%p\n", __func__, group, event);
- metadata.event_len = FAN_EVENT_METADATA_LEN + - fanotify_event_info_len(info_mode, event); + metadata.event_len = fanotify_event_len(info_mode, event); metadata.metadata_len = FAN_EVENT_METADATA_LEN; metadata.vers = FANOTIFY_METADATA_VERSION; metadata.reserved = 0;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gabriel Krisman Bertazi krisman@collabora.com
[ Upstream commit 8299212cbdb01a5867e230e961f82e5c02a6de34 ]
FAN_FS_ERROR will require fsid, but not necessarily require the filesystem to expose a file handle. Split those checks into different functions, so they can be used separately when setting up an event.
While there, update a comment about tmpfs having 0 fsid, which is no longer true.
Link: https://lore.kernel.org/r/20211025192746.66445-7-krisman@collabora.com Reviewed-by: Amir Goldstein amir73il@gmail.com Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Gabriel Krisman Bertazi krisman@collabora.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify_user.c | 27 ++++++++++++++++++--------- 1 file changed, 18 insertions(+), 9 deletions(-)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 846e2a661526c..6dd6a2e05f55d 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1295,16 +1295,15 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) return fd; }
-/* Check if filesystem can encode a unique fid */ -static int fanotify_test_fid(struct path *path, __kernel_fsid_t *fsid) +static int fanotify_test_fsid(struct dentry *dentry, __kernel_fsid_t *fsid) { __kernel_fsid_t root_fsid; int err;
/* - * Make sure path is not in filesystem with zero fsid (e.g. tmpfs). + * Make sure dentry is not of a filesystem with zero fsid (e.g. fuse). */ - err = vfs_get_fsid(path->dentry, fsid); + err = vfs_get_fsid(dentry, fsid); if (err) return err;
@@ -1312,10 +1311,10 @@ static int fanotify_test_fid(struct path *path, __kernel_fsid_t *fsid) return -ENODEV;
/* - * Make sure path is not inside a filesystem subvolume (e.g. btrfs) + * Make sure dentry is not of a filesystem subvolume (e.g. btrfs) * which uses a different fsid than sb root. */ - err = vfs_get_fsid(path->dentry->d_sb->s_root, &root_fsid); + err = vfs_get_fsid(dentry->d_sb->s_root, &root_fsid); if (err) return err;
@@ -1323,6 +1322,12 @@ static int fanotify_test_fid(struct path *path, __kernel_fsid_t *fsid) root_fsid.val[1] != fsid->val[1]) return -EXDEV;
+ return 0; +} + +/* Check if filesystem can encode a unique fid */ +static int fanotify_test_fid(struct dentry *dentry) +{ /* * We need to make sure that the file system supports at least * encoding a file handle so user can use name_to_handle_at() to @@ -1330,8 +1335,8 @@ static int fanotify_test_fid(struct path *path, __kernel_fsid_t *fsid) * objects. However, name_to_handle_at() requires that the * filesystem also supports decoding file handles. */ - if (!path->dentry->d_sb->s_export_op || - !path->dentry->d_sb->s_export_op->fh_to_dentry) + if (!dentry->d_sb->s_export_op || + !dentry->d_sb->s_export_op->fh_to_dentry) return -EOPNOTSUPP;
return 0; @@ -1500,7 +1505,11 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, }
if (fid_mode) { - ret = fanotify_test_fid(&path, &__fsid); + ret = fanotify_test_fsid(path.dentry, &__fsid); + if (ret) + goto path_put_and_out; + + ret = fanotify_test_fid(path.dentry); if (ret) goto path_put_and_out;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gabriel Krisman Bertazi krisman@collabora.com
[ Upstream commit e0462f91d24756916fded4313d508e0fc52f39c9 ]
According to Amir:
"FS_IN_IGNORED is completely internal to inotify and there is no need to set it in i_fsnotify_mask at all, so if we remove the bit from the output of inotify_arg_to_mask() no functionality will change and we will be able to overload the event bit for FS_ERROR."
This is done in preparation to overload FS_ERROR with the notification mechanism in fanotify.
Link: https://lore.kernel.org/r/20211025192746.66445-8-krisman@collabora.com Suggested-by: Amir Goldstein amir73il@gmail.com Reviewed-by: Amir Goldstein amir73il@gmail.com Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Gabriel Krisman Bertazi krisman@collabora.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/inotify/inotify_user.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c index 62cd91bc00b83..9676c3f2df3d7 100644 --- a/fs/notify/inotify/inotify_user.c +++ b/fs/notify/inotify/inotify_user.c @@ -89,10 +89,10 @@ static inline __u32 inotify_arg_to_mask(struct inode *inode, u32 arg) __u32 mask;
/* - * Everything should accept their own ignored and should receive events - * when the inode is unmounted. All directories care about children. + * Everything should receive events when the inode is unmounted. + * All directories care about children. */ - mask = (FS_IN_IGNORED | FS_UNMOUNT); + mask = (FS_UNMOUNT); if (S_ISDIR(inode->i_mode)) mask |= FS_EVENT_ON_CHILD;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gabriel Krisman Bertazi krisman@collabora.com
[ Upstream commit 808967a0a4d2f4ce6a2005c5692fffbecaf018c1 ]
Similarly to fanotify_is_perm_event and friends, provide a helper predicate to say whether a mask is of an overflow event.
Link: https://lore.kernel.org/r/20211025192746.66445-9-krisman@collabora.com Suggested-by: Amir Goldstein amir73il@gmail.com Reviewed-by: Amir Goldstein amir73il@gmail.com Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Gabriel Krisman Bertazi krisman@collabora.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.h | 3 ++- include/linux/fsnotify_backend.h | 5 +++++ 2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index 4a5e555dc3d25..c42cf8fd7d798 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -315,7 +315,8 @@ static inline struct path *fanotify_event_path(struct fanotify_event *event) */ static inline bool fanotify_is_hashed_event(u32 mask) { - return !fanotify_is_perm_event(mask) && !(mask & FS_Q_OVERFLOW); + return !(fanotify_is_perm_event(mask) || + fsnotify_is_overflow_event(mask)); }
static inline unsigned int fanotify_event_hash_bucket( diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index a2db821e8a8f2..749bc85e1d1c4 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -510,6 +510,11 @@ static inline void fsnotify_queue_overflow(struct fsnotify_group *group) fsnotify_add_event(group, group->overflow_event, NULL, NULL); }
+static inline bool fsnotify_is_overflow_event(u32 mask) +{ + return mask & FS_Q_OVERFLOW; +} + static inline bool fsnotify_notify_queue_is_empty(struct fsnotify_group *group) { assert_spin_locked(&group->notification_lock);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gabriel Krisman Bertazi krisman@collabora.com
[ Upstream commit 1ad03c3a326a86e259389592117252c851873395 ]
fsnotify_add_event is growing in number of parameters, which in most case are just passed a NULL pointer. So, split out a new fsnotify_insert_event function to clean things up for users who don't need an insert hook.
Link: https://lore.kernel.org/r/20211025192746.66445-10-krisman@collabora.com Suggested-by: Amir Goldstein amir73il@gmail.com Reviewed-by: Amir Goldstein amir73il@gmail.com Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Gabriel Krisman Bertazi krisman@collabora.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.c | 4 ++-- fs/notify/inotify/inotify_fsnotify.c | 2 +- fs/notify/notification.c | 12 ++++++------ include/linux/fsnotify_backend.h | 23 ++++++++++++++++------- 4 files changed, 25 insertions(+), 16 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 310246f8d3f19..f82e20228999c 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -781,8 +781,8 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask, }
fsn_event = &event->fse; - ret = fsnotify_add_event(group, fsn_event, fanotify_merge, - fanotify_insert_event); + ret = fsnotify_insert_event(group, fsn_event, fanotify_merge, + fanotify_insert_event); if (ret) { /* Permission events shouldn't be merged */ BUG_ON(ret == 1 && mask & FANOTIFY_PERM_EVENTS); diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c index b0530f75b274a..be3eb1cebdcce 100644 --- a/fs/notify/inotify/inotify_fsnotify.c +++ b/fs/notify/inotify/inotify_fsnotify.c @@ -123,7 +123,7 @@ int inotify_handle_inode_event(struct fsnotify_mark *inode_mark, u32 mask, if (len) strcpy(event->name, name->name);
- ret = fsnotify_add_event(group, fsn_event, inotify_merge, NULL); + ret = fsnotify_add_event(group, fsn_event, inotify_merge); if (ret) { /* Our event wasn't used in the end. Free it. */ fsnotify_destroy_event(group, fsn_event); diff --git a/fs/notify/notification.c b/fs/notify/notification.c index 32f45543b9c64..44bb10f507153 100644 --- a/fs/notify/notification.c +++ b/fs/notify/notification.c @@ -78,12 +78,12 @@ void fsnotify_destroy_event(struct fsnotify_group *group, * 2 if the event was not queued - either the queue of events has overflown * or the group is shutting down. */ -int fsnotify_add_event(struct fsnotify_group *group, - struct fsnotify_event *event, - int (*merge)(struct fsnotify_group *, - struct fsnotify_event *), - void (*insert)(struct fsnotify_group *, - struct fsnotify_event *)) +int fsnotify_insert_event(struct fsnotify_group *group, + struct fsnotify_event *event, + int (*merge)(struct fsnotify_group *, + struct fsnotify_event *), + void (*insert)(struct fsnotify_group *, + struct fsnotify_event *)) { int ret = 0; struct list_head *list = &group->notification_list; diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 749bc85e1d1c4..b323d0c4b9671 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -498,16 +498,25 @@ extern int fsnotify_fasync(int fd, struct file *file, int on); extern void fsnotify_destroy_event(struct fsnotify_group *group, struct fsnotify_event *event); /* attach the event to the group notification queue */ -extern int fsnotify_add_event(struct fsnotify_group *group, - struct fsnotify_event *event, - int (*merge)(struct fsnotify_group *, - struct fsnotify_event *), - void (*insert)(struct fsnotify_group *, - struct fsnotify_event *)); +extern int fsnotify_insert_event(struct fsnotify_group *group, + struct fsnotify_event *event, + int (*merge)(struct fsnotify_group *, + struct fsnotify_event *), + void (*insert)(struct fsnotify_group *, + struct fsnotify_event *)); + +static inline int fsnotify_add_event(struct fsnotify_group *group, + struct fsnotify_event *event, + int (*merge)(struct fsnotify_group *, + struct fsnotify_event *)) +{ + return fsnotify_insert_event(group, event, merge, NULL); +} + /* Queue overflow event to a notification group */ static inline void fsnotify_queue_overflow(struct fsnotify_group *group) { - fsnotify_add_event(group, group->overflow_event, NULL, NULL); + fsnotify_add_event(group, group->overflow_event, NULL); }
static inline bool fsnotify_is_overflow_event(u32 mask)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gabriel Krisman Bertazi krisman@collabora.com
[ Upstream commit 29335033c574a15334015d8c4e36862cff3d3384 ]
Some file system events (i.e. FS_ERROR) might not be associated with an inode or directory. For these, we can retrieve the super block from the data field. But, since the super_block is available in the data field on every event type, simplify the code to always retrieve it from there, through a new helper.
Link: https://lore.kernel.org/r/20211025192746.66445-11-krisman@collabora.com Suggested-by: Jan Kara jack@suse.cz Reviewed-by: Amir Goldstein amir73il@gmail.com Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Gabriel Krisman Bertazi krisman@collabora.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fsnotify.c | 7 +++---- include/linux/fsnotify_backend.h | 15 +++++++++++++++ 2 files changed, 18 insertions(+), 4 deletions(-)
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index 963e6ce75b961..fde3a1115a170 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -455,16 +455,16 @@ static void fsnotify_iter_next(struct fsnotify_iter_info *iter_info) * @file_name is relative to * @file_name: optional file name associated with event * @inode: optional inode associated with event - - * either @dir or @inode must be non-NULL. - * if both are non-NULL event may be reported to both. + * If @dir and @inode are both non-NULL, event may be + * reported to both. * @cookie: inotify rename cookie */ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir, const struct qstr *file_name, struct inode *inode, u32 cookie) { const struct path *path = fsnotify_data_path(data, data_type); + struct super_block *sb = fsnotify_data_sb(data, data_type); struct fsnotify_iter_info iter_info = {}; - struct super_block *sb; struct mount *mnt = NULL; struct inode *parent = NULL; int ret = 0; @@ -483,7 +483,6 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir, */ parent = dir; } - sb = inode->i_sb;
/* * Optimization: srcu_read_lock() has a memory barrier which can diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index b323d0c4b9671..035438fe4a435 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -289,6 +289,21 @@ static inline const struct path *fsnotify_data_path(const void *data, } }
+static inline struct super_block *fsnotify_data_sb(const void *data, + int data_type) +{ + switch (data_type) { + case FSNOTIFY_EVENT_INODE: + return ((struct inode *)data)->i_sb; + case FSNOTIFY_EVENT_DENTRY: + return ((struct dentry *)data)->d_sb; + case FSNOTIFY_EVENT_PATH: + return ((const struct path *)data)->dentry->d_sb; + default: + return NULL; + } +} + enum fsnotify_obj_type { FSNOTIFY_OBJ_TYPE_INODE, FSNOTIFY_OBJ_TYPE_PARENT,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gabriel Krisman Bertazi krisman@collabora.com
[ Upstream commit 24dca90590509a7a6cbe0650100c90c5b8a3468a ]
FAN_FS_ERROR allows events without inodes - i.e. for file system-wide errors. Even though fsnotify_handle_inode_event is not currently used by fanotify, this patch protects other backends from cases where neither inode or dir are provided. Also document the constraints of the interface (inode and dir cannot be both NULL).
Link: https://lore.kernel.org/r/20211025192746.66445-12-krisman@collabora.com Suggested-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Gabriel Krisman Bertazi krisman@collabora.com Reviewed-by: Amir Goldstein amir73il@gmail.com Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 3 +++ fs/notify/fsnotify.c | 3 +++ include/linux/fsnotify_backend.h | 1 + 3 files changed, 7 insertions(+)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 87d984e0cdc0c..8cd7d5d6955a0 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -601,6 +601,9 @@ nfsd_file_fsnotify_handle_event(struct fsnotify_mark *mark, u32 mask, struct inode *inode, struct inode *dir, const struct qstr *name, u32 cookie) { + if (WARN_ON_ONCE(!inode)) + return 0; + trace_nfsd_file_fsnotify_handle_event(inode, mask);
/* Should be no marks on non-regular files */ diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index fde3a1115a170..4034ca566f95c 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -252,6 +252,9 @@ static int fsnotify_handle_inode_event(struct fsnotify_group *group, if (WARN_ON_ONCE(!ops->handle_inode_event)) return 0;
+ if (WARN_ON_ONCE(!inode && !dir)) + return 0; + if ((inode_mark->mask & FS_EXCL_UNLINK) && path && d_unlinked(path->dentry)) return 0; diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 035438fe4a435..b71dc788018e4 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -136,6 +136,7 @@ struct mem_cgroup; * @dir: optional directory associated with event - * if @file_name is not NULL, this is the directory that * @file_name is relative to. + * Either @inode or @dir must be non-NULL. * @file_name: optional file name associated with event * @cookie: inotify rename cookie *
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gabriel Krisman Bertazi krisman@collabora.com
[ Upstream commit 330ae77d2a5b0af32c0f29e139bf28ec8591de59 ]
For group-wide mempool backed events, like FS_ERROR, the free_event callback will need to reference the group's mempool to free the memory. Wire that argument into the current callers.
Link: https://lore.kernel.org/r/20211025192746.66445-13-krisman@collabora.com Reviewed-by: Jan Kara jack@suse.cz Reviewed-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Gabriel Krisman Bertazi krisman@collabora.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.c | 3 ++- fs/notify/group.c | 2 +- fs/notify/inotify/inotify_fsnotify.c | 3 ++- fs/notify/notification.c | 2 +- include/linux/fsnotify_backend.h | 2 +- 5 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index f82e20228999c..c620b4f6fe123 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -835,7 +835,8 @@ static void fanotify_free_name_event(struct fanotify_event *event) kfree(FANOTIFY_NE(event)); }
-static void fanotify_free_event(struct fsnotify_event *fsn_event) +static void fanotify_free_event(struct fsnotify_group *group, + struct fsnotify_event *fsn_event) { struct fanotify_event *event;
diff --git a/fs/notify/group.c b/fs/notify/group.c index fb89c351295d6..6a297efc47887 100644 --- a/fs/notify/group.c +++ b/fs/notify/group.c @@ -88,7 +88,7 @@ void fsnotify_destroy_group(struct fsnotify_group *group) * that deliberately ignores overflow events. */ if (group->overflow_event) - group->ops->free_event(group->overflow_event); + group->ops->free_event(group, group->overflow_event);
fsnotify_put_group(group); } diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c index be3eb1cebdcce..8279827836399 100644 --- a/fs/notify/inotify/inotify_fsnotify.c +++ b/fs/notify/inotify/inotify_fsnotify.c @@ -184,7 +184,8 @@ static void inotify_free_group_priv(struct fsnotify_group *group) dec_inotify_instances(group->inotify_data.ucounts); }
-static void inotify_free_event(struct fsnotify_event *fsn_event) +static void inotify_free_event(struct fsnotify_group *group, + struct fsnotify_event *fsn_event) { kfree(INOTIFY_E(fsn_event)); } diff --git a/fs/notify/notification.c b/fs/notify/notification.c index 44bb10f507153..9022ae650cf86 100644 --- a/fs/notify/notification.c +++ b/fs/notify/notification.c @@ -64,7 +64,7 @@ void fsnotify_destroy_event(struct fsnotify_group *group, WARN_ON(!list_empty(&event->list)); spin_unlock(&group->notification_lock); } - group->ops->free_event(event); + group->ops->free_event(group, event); }
/* diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index b71dc788018e4..3a7c314361824 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -156,7 +156,7 @@ struct fsnotify_ops { const struct qstr *file_name, u32 cookie); void (*free_group_priv)(struct fsnotify_group *group); void (*freeing_mark)(struct fsnotify_mark *mark, struct fsnotify_group *group); - void (*free_event)(struct fsnotify_event *event); + void (*free_event)(struct fsnotify_group *group, struct fsnotify_event *event); /* called on final put+free to free memory */ void (*free_mark)(struct fsnotify_mark *mark); };
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gabriel Krisman Bertazi krisman@collabora.com
[ Upstream commit 12f47bf0f0990933d95d021d13d31bda010648fd ]
FAN_FS_ERROR doesn't support DFID, but this function is still called for every event. The problem is that it is not capable of handling null inodes, which now can happen in case of superblock error events. For this case, just returning dir will be enough.
Link: https://lore.kernel.org/r/20211025192746.66445-14-krisman@collabora.com Reviewed-by: Amir Goldstein amir73il@gmail.com Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Gabriel Krisman Bertazi krisman@collabora.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index c620b4f6fe123..397ee623ff1e8 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -452,7 +452,7 @@ static struct inode *fanotify_dfid_inode(u32 event_mask, const void *data, if (event_mask & ALL_FSNOTIFY_DIRENT_EVENTS) return dir;
- if (S_ISDIR(inode->i_mode)) + if (inode && S_ISDIR(inode->i_mode)) return inode;
return dir;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gabriel Krisman Bertazi krisman@collabora.com
[ Upstream commit 74fe4734897a2da2ae2a665a5e622cd490d36eaf ]
Allow passing a NULL hash to fanotify_encode_fh and avoid calculating the hash if not needed.
Link: https://lore.kernel.org/r/20211025192746.66445-15-krisman@collabora.com Reviewed-by: Jan Kara jack@suse.cz Reviewed-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Gabriel Krisman Bertazi krisman@collabora.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 397ee623ff1e8..ec84fee7ad01c 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -403,8 +403,12 @@ static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode, fh->type = type; fh->len = fh_len;
- /* Mix fh into event merge key */ - *hash ^= fanotify_hash_fh(fh); + /* + * Mix fh into event merge key. Hash might be NULL in case of + * unhashed FID events (i.e. FAN_FS_ERROR). + */ + if (hash) + *hash ^= fanotify_hash_fh(fh);
return FANOTIFY_FH_HDR_LEN + fh_len;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gabriel Krisman Bertazi krisman@collabora.com
[ Upstream commit 272531ac619b374ab474e989eb387162fded553f ]
Instead of failing, encode an invalid file handle in fanotify_encode_fh if no inode is provided. This bogus file handle will be reported by FAN_FS_ERROR for non-inode errors.
Link: https://lore.kernel.org/r/20211025192746.66445-16-krisman@collabora.com Reviewed-by: Amir Goldstein amir73il@gmail.com Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Gabriel Krisman Bertazi krisman@collabora.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index ec84fee7ad01c..c64d61b673caf 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -370,8 +370,14 @@ static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode, fh->type = FILEID_ROOT; fh->len = 0; fh->flags = 0; + + /* + * Invalid FHs are used by FAN_FS_ERROR for errors not + * linked to any inode. The f_handle won't be reported + * back to userspace. + */ if (!inode) - return 0; + goto out;
/* * !gpf means preallocated variable size fh, but fh_len could @@ -403,6 +409,7 @@ static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode, fh->type = type; fh->len = fh_len;
+out: /* * Mix fh into event merge key. Hash might be NULL in case of * unhashed FID events (i.e. FAN_FS_ERROR).
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gabriel Krisman Bertazi krisman@collabora.com
[ Upstream commit 4fe595cf1c80e7a5af4d00c4da29def64aff57a2 ]
Like inode events, FAN_FS_ERROR will require fid mode. Therefore, convert the verification during fanotify_mark(2) to require fid for any non-fd event. This means fid_mode will not only be required for inode events, but for any event that doesn't provide a descriptor.
Link: https://lore.kernel.org/r/20211025192746.66445-17-krisman@collabora.com Suggested-by: Amir Goldstein amir73il@gmail.com Reviewed-by: Jan Kara jack@suse.cz Reviewed-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Gabriel Krisman Bertazi krisman@collabora.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify_user.c | 12 ++++++------ include/linux/fanotify.h | 3 +++ 2 files changed, 9 insertions(+), 6 deletions(-)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 6dd6a2e05f55d..34bf71108f7a3 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1471,14 +1471,14 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, goto fput_and_out;
/* - * Events with data type inode do not carry enough information to report - * event->fd, so we do not allow setting a mask for inode events unless - * group supports reporting fid. - * inode events are not supported on a mount mark, because they do not - * carry enough information (i.e. path) to be filtered by mount point. + * Events that do not carry enough information to report + * event->fd require a group that supports reporting fid. Those + * events are not supported on a mount mark, because they do not + * carry enough information (i.e. path) to be filtered by mount + * point. */ fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS); - if (mask & FANOTIFY_INODE_EVENTS && + if (mask & ~(FANOTIFY_FD_EVENTS|FANOTIFY_EVENT_FLAGS) && (!fid_mode || mark_type == FAN_MARK_MOUNT)) goto fput_and_out;
diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h index eec3b7c408115..52d464802d99f 100644 --- a/include/linux/fanotify.h +++ b/include/linux/fanotify.h @@ -84,6 +84,9 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */ */ #define FANOTIFY_DIRENT_EVENTS (FAN_MOVE | FAN_CREATE | FAN_DELETE)
+/* Events that can be reported with event->fd */ +#define FANOTIFY_FD_EVENTS (FANOTIFY_PATH_EVENTS | FANOTIFY_PERM_EVENTS) + /* Events that can only be reported with data type FSNOTIFY_EVENT_INODE */ #define FANOTIFY_INODE_EVENTS (FANOTIFY_DIRENT_EVENTS | \ FAN_ATTRIB | FAN_MOVE_SELF | FAN_DELETE_SELF)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gabriel Krisman Bertazi krisman@collabora.com
[ Upstream commit 9daa811073fa19c08e8aad3b90f9235fed161acf ]
Expose a new type of fsnotify event for filesystems to report errors for userspace monitoring tools. fanotify will send this type of notification for FAN_FS_ERROR events. This also introduce a helper for generating the new event.
Link: https://lore.kernel.org/r/20211025192746.66445-18-krisman@collabora.com Reviewed-by: Amir Goldstein amir73il@gmail.com Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Gabriel Krisman Bertazi krisman@collabora.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/fsnotify.h | 13 +++++++++++++ include/linux/fsnotify_backend.h | 32 +++++++++++++++++++++++++++++++- 2 files changed, 44 insertions(+), 1 deletion(-)
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h index addca4ea56ad9..bec1e23ecf787 100644 --- a/include/linux/fsnotify.h +++ b/include/linux/fsnotify.h @@ -376,4 +376,17 @@ static inline void fsnotify_change(struct dentry *dentry, unsigned int ia_valid) fsnotify_dentry(dentry, mask); }
+static inline int fsnotify_sb_error(struct super_block *sb, struct inode *inode, + int error) +{ + struct fs_error_report report = { + .error = error, + .inode = inode, + .sb = sb, + }; + + return fsnotify(FS_ERROR, &report, FSNOTIFY_EVENT_ERROR, + NULL, NULL, NULL, 0); +} + #endif /* _LINUX_FS_NOTIFY_H */ diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 3a7c314361824..00dbaafbcf953 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -42,6 +42,12 @@
#define FS_UNMOUNT 0x00002000 /* inode on umount fs */ #define FS_Q_OVERFLOW 0x00004000 /* Event queued overflowed */ +#define FS_ERROR 0x00008000 /* Filesystem Error (fanotify) */ + +/* + * FS_IN_IGNORED overloads FS_ERROR. It is only used internally by inotify + * which does not support FS_ERROR. + */ #define FS_IN_IGNORED 0x00008000 /* last inotify event here */
#define FS_OPEN_PERM 0x00010000 /* open event in an permission hook */ @@ -95,7 +101,8 @@ #define ALL_FSNOTIFY_EVENTS (ALL_FSNOTIFY_DIRENT_EVENTS | \ FS_EVENTS_POSS_ON_CHILD | \ FS_DELETE_SELF | FS_MOVE_SELF | FS_DN_RENAME | \ - FS_UNMOUNT | FS_Q_OVERFLOW | FS_IN_IGNORED) + FS_UNMOUNT | FS_Q_OVERFLOW | FS_IN_IGNORED | \ + FS_ERROR)
/* Extra flags that may be reported with event or control handling of events */ #define ALL_FSNOTIFY_FLAGS (FS_EXCL_UNLINK | FS_ISDIR | FS_IN_ONESHOT | \ @@ -250,6 +257,13 @@ enum fsnotify_data_type { FSNOTIFY_EVENT_PATH, FSNOTIFY_EVENT_INODE, FSNOTIFY_EVENT_DENTRY, + FSNOTIFY_EVENT_ERROR, +}; + +struct fs_error_report { + int error; + struct inode *inode; + struct super_block *sb; };
static inline struct inode *fsnotify_data_inode(const void *data, int data_type) @@ -261,6 +275,8 @@ static inline struct inode *fsnotify_data_inode(const void *data, int data_type) return d_inode(data); case FSNOTIFY_EVENT_PATH: return d_inode(((const struct path *)data)->dentry); + case FSNOTIFY_EVENT_ERROR: + return ((struct fs_error_report *)data)->inode; default: return NULL; } @@ -300,6 +316,20 @@ static inline struct super_block *fsnotify_data_sb(const void *data, return ((struct dentry *)data)->d_sb; case FSNOTIFY_EVENT_PATH: return ((const struct path *)data)->dentry->d_sb; + case FSNOTIFY_EVENT_ERROR: + return ((struct fs_error_report *) data)->sb; + default: + return NULL; + } +} + +static inline struct fs_error_report *fsnotify_data_error_report( + const void *data, + int data_type) +{ + switch (data_type) { + case FSNOTIFY_EVENT_ERROR: + return (struct fs_error_report *) data; default: return NULL; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gabriel Krisman Bertazi krisman@collabora.com
[ Upstream commit 8d11a4f43ef4679be0908026907a7613b33d7127 ]
FAN_FS_ERROR allows reporting of event type FS_ERROR to userspace, which is a mechanism to report file system wide problems via fanotify. This commit preallocate userspace visible bits to match the FS_ERROR event.
Link: https://lore.kernel.org/r/20211025192746.66445-19-krisman@collabora.com Reviewed-by: Jan Kara jack@suse.cz Reviewed-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Gabriel Krisman Bertazi krisman@collabora.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.c | 1 + include/uapi/linux/fanotify.h | 1 + 2 files changed, 2 insertions(+)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index c64d61b673caf..8f152445d75c4 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -752,6 +752,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask, BUILD_BUG_ON(FAN_ONDIR != FS_ISDIR); BUILD_BUG_ON(FAN_OPEN_EXEC != FS_OPEN_EXEC); BUILD_BUG_ON(FAN_OPEN_EXEC_PERM != FS_OPEN_EXEC_PERM); + BUILD_BUG_ON(FAN_FS_ERROR != FS_ERROR);
BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 19);
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h index 64553df9d7350..2990731ddc8bc 100644 --- a/include/uapi/linux/fanotify.h +++ b/include/uapi/linux/fanotify.h @@ -20,6 +20,7 @@ #define FAN_OPEN_EXEC 0x00001000 /* File was opened for exec */
#define FAN_Q_OVERFLOW 0x00004000 /* Event queued overflowed */ +#define FAN_FS_ERROR 0x00008000 /* Filesystem error */
#define FAN_OPEN_PERM 0x00010000 /* File open in perm check */ #define FAN_ACCESS_PERM 0x00020000 /* File accessed in perm check */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gabriel Krisman Bertazi krisman@collabora.com
[ Upstream commit 734a1a5eccc5f7473002b0669f788e135f1f64aa ]
Pre-allocate slots for file system errors to have greater chances of succeeding, since error events can happen in GFP_NOFS context. This patch introduces a group-wide mempool of error events, shared by all FAN_FS_ERROR marks in this group.
Link: https://lore.kernel.org/r/20211025192746.66445-20-krisman@collabora.com Reviewed-by: Amir Goldstein amir73il@gmail.com Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Gabriel Krisman Bertazi krisman@collabora.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.c | 3 +++ fs/notify/fanotify/fanotify.h | 11 +++++++++++ fs/notify/fanotify/fanotify_user.c | 26 +++++++++++++++++++++++++- include/linux/fsnotify_backend.h | 2 ++ 4 files changed, 41 insertions(+), 1 deletion(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 8f152445d75c4..01d68dfc74aa2 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -819,6 +819,9 @@ static void fanotify_free_group_priv(struct fsnotify_group *group) if (group->fanotify_data.ucounts) dec_ucount(group->fanotify_data.ucounts, UCOUNT_FANOTIFY_GROUPS); + + if (mempool_initialized(&group->fanotify_data.error_events_pool)) + mempool_exit(&group->fanotify_data.error_events_pool); }
static void fanotify_free_path_event(struct fanotify_event *event) diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index c42cf8fd7d798..a577e87fac2b4 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -141,6 +141,7 @@ enum fanotify_event_type { FANOTIFY_EVENT_TYPE_PATH, FANOTIFY_EVENT_TYPE_PATH_PERM, FANOTIFY_EVENT_TYPE_OVERFLOW, /* struct fanotify_event */ + FANOTIFY_EVENT_TYPE_FS_ERROR, /* struct fanotify_error_event */ __FANOTIFY_EVENT_TYPE_NUM };
@@ -196,6 +197,16 @@ FANOTIFY_NE(struct fanotify_event *event) return container_of(event, struct fanotify_name_event, fae); }
+struct fanotify_error_event { + struct fanotify_event fae; +}; + +static inline struct fanotify_error_event * +FANOTIFY_EE(struct fanotify_event *event) +{ + return container_of(event, struct fanotify_error_event, fae); +} + static inline __kernel_fsid_t *fanotify_event_fsid(struct fanotify_event *event) { if (event->type == FANOTIFY_EVENT_TYPE_FID) diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 34bf71108f7a3..8a2b7941fc986 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -30,6 +30,7 @@ #define FANOTIFY_DEFAULT_MAX_EVENTS 16384 #define FANOTIFY_OLD_DEFAULT_MAX_MARKS 8192 #define FANOTIFY_DEFAULT_MAX_GROUPS 128 +#define FANOTIFY_DEFAULT_FEE_POOL_SIZE 32
/* * Legacy fanotify marks limits (8192) is per group and we introduced a tunable @@ -1049,6 +1050,15 @@ static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group, return ERR_PTR(ret); }
+static int fanotify_group_init_error_pool(struct fsnotify_group *group) +{ + if (mempool_initialized(&group->fanotify_data.error_events_pool)) + return 0; + + return mempool_init_kmalloc_pool(&group->fanotify_data.error_events_pool, + FANOTIFY_DEFAULT_FEE_POOL_SIZE, + sizeof(struct fanotify_error_event)); +}
static int fanotify_add_mark(struct fsnotify_group *group, fsnotify_connp_t *connp, unsigned int type, @@ -1057,6 +1067,7 @@ static int fanotify_add_mark(struct fsnotify_group *group, { struct fsnotify_mark *fsn_mark; __u32 added; + int ret = 0;
mutex_lock(&group->mark_mutex); fsn_mark = fsnotify_find_mark(connp, group); @@ -1067,13 +1078,26 @@ static int fanotify_add_mark(struct fsnotify_group *group, return PTR_ERR(fsn_mark); } } + + /* + * Error events are pre-allocated per group, only if strictly + * needed (i.e. FAN_FS_ERROR was requested). + */ + if (!(flags & FAN_MARK_IGNORED_MASK) && (mask & FAN_FS_ERROR)) { + ret = fanotify_group_init_error_pool(group); + if (ret) + goto out; + } + added = fanotify_mark_add_to_mask(fsn_mark, mask, flags); if (added & ~fsnotify_conn_mask(fsn_mark->connector)) fsnotify_recalc_mask(fsn_mark->connector); + +out: mutex_unlock(&group->mark_mutex);
fsnotify_put_mark(fsn_mark); - return 0; + return ret; }
static int fanotify_add_vfsmount_mark(struct fsnotify_group *group, diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 00dbaafbcf953..51ef2b079bfa0 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -19,6 +19,7 @@ #include <linux/atomic.h> #include <linux/user_namespace.h> #include <linux/refcount.h> +#include <linux/mempool.h>
/* * IN_* from inotfy.h lines up EXACTLY with FS_*, this is so we can easily @@ -246,6 +247,7 @@ struct fsnotify_group { int flags; /* flags from fanotify_init() */ int f_flags; /* event_f_flags from fanotify_init() */ struct ucounts *ucounts; + mempool_t error_events_pool; } fanotify_data; #endif /* CONFIG_FANOTIFY */ };
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gabriel Krisman Bertazi krisman@collabora.com
[ Upstream commit 83e9acbe13dc1b767f91b5c1350f7a65689b26f6 ]
Once an error event is triggered, enqueue it in the notification group, similarly to what is done for other events. FAN_FS_ERROR is not handled specially, since the memory is now handled by a preallocated mempool.
For now, make the event unhashed. A future patch implements merging of this kind of event.
Link: https://lore.kernel.org/r/20211025192746.66445-21-krisman@collabora.com Reviewed-by: Jan Kara jack@suse.cz Reviewed-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Gabriel Krisman Bertazi krisman@collabora.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.c | 35 +++++++++++++++++++++++++++++++++++ fs/notify/fanotify/fanotify.h | 6 ++++++ 2 files changed, 41 insertions(+)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 01d68dfc74aa2..1f195c95dfcd0 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -574,6 +574,27 @@ static struct fanotify_event *fanotify_alloc_name_event(struct inode *id, return &fne->fae; }
+static struct fanotify_event *fanotify_alloc_error_event( + struct fsnotify_group *group, + __kernel_fsid_t *fsid, + const void *data, int data_type) +{ + struct fs_error_report *report = + fsnotify_data_error_report(data, data_type); + struct fanotify_error_event *fee; + + if (WARN_ON_ONCE(!report)) + return NULL; + + fee = mempool_alloc(&group->fanotify_data.error_events_pool, GFP_NOFS); + if (!fee) + return NULL; + + fee->fae.type = FANOTIFY_EVENT_TYPE_FS_ERROR; + + return &fee->fae; +} + static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, u32 mask, const void *data, int data_type, struct inode *dir, @@ -641,6 +662,9 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
if (fanotify_is_perm_event(mask)) { event = fanotify_alloc_perm_event(path, gfp); + } else if (fanotify_is_error_event(mask)) { + event = fanotify_alloc_error_event(group, fsid, data, + data_type); } else if (name_event && (file_name || child)) { event = fanotify_alloc_name_event(id, fsid, file_name, child, &hash, gfp); @@ -850,6 +874,14 @@ static void fanotify_free_name_event(struct fanotify_event *event) kfree(FANOTIFY_NE(event)); }
+static void fanotify_free_error_event(struct fsnotify_group *group, + struct fanotify_event *event) +{ + struct fanotify_error_event *fee = FANOTIFY_EE(event); + + mempool_free(fee, &group->fanotify_data.error_events_pool); +} + static void fanotify_free_event(struct fsnotify_group *group, struct fsnotify_event *fsn_event) { @@ -873,6 +905,9 @@ static void fanotify_free_event(struct fsnotify_group *group, case FANOTIFY_EVENT_TYPE_OVERFLOW: kfree(event); break; + case FANOTIFY_EVENT_TYPE_FS_ERROR: + fanotify_free_error_event(group, event); + break; default: WARN_ON_ONCE(1); } diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index a577e87fac2b4..ebef952481fa0 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -298,6 +298,11 @@ static inline struct fanotify_event *FANOTIFY_E(struct fsnotify_event *fse) return container_of(fse, struct fanotify_event, fse); }
+static inline bool fanotify_is_error_event(u32 mask) +{ + return mask & FAN_FS_ERROR; +} + static inline bool fanotify_event_has_path(struct fanotify_event *event) { return event->type == FANOTIFY_EVENT_TYPE_PATH || @@ -327,6 +332,7 @@ static inline struct path *fanotify_event_path(struct fanotify_event *event) static inline bool fanotify_is_hashed_event(u32 mask) { return !(fanotify_is_perm_event(mask) || + fanotify_is_error_event(mask) || fsnotify_is_overflow_event(mask)); }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gabriel Krisman Bertazi krisman@collabora.com
[ Upstream commit 8a6ae64132fd27a944faed7bc38484827609eb76 ]
Error events (FAN_FS_ERROR) against the same file system can be merged by simply iterating the error count. The hash is taken from the fsid, without considering the FH. This means that only the first error object is reported.
Link: https://lore.kernel.org/r/20211025192746.66445-22-krisman@collabora.com Reviewed-by: Amir Goldstein amir73il@gmail.com Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Gabriel Krisman Bertazi krisman@collabora.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.c | 26 ++++++++++++++++++++++++-- fs/notify/fanotify/fanotify.h | 4 +++- 2 files changed, 27 insertions(+), 3 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 1f195c95dfcd0..cedcb15468043 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -111,6 +111,16 @@ static bool fanotify_name_event_equal(struct fanotify_name_event *fne1, return fanotify_info_equal(info1, info2); }
+static bool fanotify_error_event_equal(struct fanotify_error_event *fee1, + struct fanotify_error_event *fee2) +{ + /* Error events against the same file system are always merged. */ + if (!fanotify_fsid_equal(&fee1->fsid, &fee2->fsid)) + return false; + + return true; +} + static bool fanotify_should_merge(struct fanotify_event *old, struct fanotify_event *new) { @@ -141,6 +151,9 @@ static bool fanotify_should_merge(struct fanotify_event *old, case FANOTIFY_EVENT_TYPE_FID_NAME: return fanotify_name_event_equal(FANOTIFY_NE(old), FANOTIFY_NE(new)); + case FANOTIFY_EVENT_TYPE_FS_ERROR: + return fanotify_error_event_equal(FANOTIFY_EE(old), + FANOTIFY_EE(new)); default: WARN_ON_ONCE(1); } @@ -176,6 +189,10 @@ static int fanotify_merge(struct fsnotify_group *group, break; if (fanotify_should_merge(old, new)) { old->mask |= new->mask; + + if (fanotify_is_error_event(old->mask)) + FANOTIFY_EE(old)->err_count++; + return 1; } } @@ -577,7 +594,8 @@ static struct fanotify_event *fanotify_alloc_name_event(struct inode *id, static struct fanotify_event *fanotify_alloc_error_event( struct fsnotify_group *group, __kernel_fsid_t *fsid, - const void *data, int data_type) + const void *data, int data_type, + unsigned int *hash) { struct fs_error_report *report = fsnotify_data_error_report(data, data_type); @@ -591,6 +609,10 @@ static struct fanotify_event *fanotify_alloc_error_event( return NULL;
fee->fae.type = FANOTIFY_EVENT_TYPE_FS_ERROR; + fee->err_count = 1; + fee->fsid = *fsid; + + *hash ^= fanotify_hash_fsid(fsid);
return &fee->fae; } @@ -664,7 +686,7 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, event = fanotify_alloc_perm_event(path, gfp); } else if (fanotify_is_error_event(mask)) { event = fanotify_alloc_error_event(group, fsid, data, - data_type); + data_type, &hash); } else if (name_event && (file_name || child)) { event = fanotify_alloc_name_event(id, fsid, file_name, child, &hash, gfp); diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index ebef952481fa0..2b032b79d5b06 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -199,6 +199,9 @@ FANOTIFY_NE(struct fanotify_event *event)
struct fanotify_error_event { struct fanotify_event fae; + u32 err_count; /* Suppressed errors count */ + + __kernel_fsid_t fsid; /* FSID this error refers to. */ };
static inline struct fanotify_error_event * @@ -332,7 +335,6 @@ static inline struct path *fanotify_event_path(struct fanotify_event *event) static inline bool fanotify_is_hashed_event(u32 mask) { return !(fanotify_is_perm_event(mask) || - fanotify_is_error_event(mask) || fsnotify_is_overflow_event(mask)); }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gabriel Krisman Bertazi krisman@collabora.com
[ Upstream commit 2c5069433a3adc01ff9c5673567961bb7f138074 ]
fanotify_error_event would duplicate this sequence of declarations that already exist elsewhere with a slight different size. Create a helper macro to avoid code duplication.
Link: https://lore.kernel.org/r/20211025192746.66445-23-krisman@collabora.com Suggested-by: Jan Kara jack@suse.cz Reviewed-by: Amir Goldstein amir73il@gmail.com Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Gabriel Krisman Bertazi krisman@collabora.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.h | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index 2b032b79d5b06..3510d06654ed0 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -171,12 +171,18 @@ static inline void fanotify_init_event(struct fanotify_event *event, event->pid = NULL; }
+#define FANOTIFY_INLINE_FH(name, size) \ +struct { \ + struct fanotify_fh (name); \ + /* Space for object_fh.buf[] - access with fanotify_fh_buf() */ \ + unsigned char _inline_fh_buf[(size)]; \ +} + struct fanotify_fid_event { struct fanotify_event fae; __kernel_fsid_t fsid; - struct fanotify_fh object_fh; - /* Reserve space in object_fh.buf[] - access with fanotify_fh_buf() */ - unsigned char _inline_fh_buf[FANOTIFY_INLINE_FH_LEN]; + + FANOTIFY_INLINE_FH(object_fh, FANOTIFY_INLINE_FH_LEN); };
static inline struct fanotify_fid_event *
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gabriel Krisman Bertazi krisman@collabora.com
[ Upstream commit 4bd5a5c8e6e5cd964e9738e6ef87f6c2cb453edf ]
Now that there is an event that reports FID records even for a zeroed file handle, wrap the logic that deides whether to issue the records into helper functions. This shouldn't have any impact on the code, but simplifies further patches.
Link: https://lore.kernel.org/r/20211025192746.66445-24-krisman@collabora.com Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Gabriel Krisman Bertazi krisman@collabora.com Reviewed-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.h | 10 ++++++++++ fs/notify/fanotify/fanotify_user.c | 13 +++++++------ 2 files changed, 17 insertions(+), 6 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index 3510d06654ed0..80af269eebb89 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -264,6 +264,16 @@ static inline int fanotify_event_dir_fh_len(struct fanotify_event *event) return info ? fanotify_info_dir_fh_len(info) : 0; }
+static inline bool fanotify_event_has_object_fh(struct fanotify_event *event) +{ + return fanotify_event_object_fh_len(event) > 0; +} + +static inline bool fanotify_event_has_dir_fh(struct fanotify_event *event) +{ + return fanotify_event_dir_fh_len(event) > 0; +} + struct fanotify_path_event { struct fanotify_event fae; struct path path; diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 8a2b7941fc986..34ed30be0e4d4 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -135,10 +135,9 @@ static size_t fanotify_event_len(unsigned int info_mode, return event_len;
info = fanotify_event_info(event); - dir_fh_len = fanotify_event_dir_fh_len(event); - fh_len = fanotify_event_object_fh_len(event);
- if (dir_fh_len) { + if (fanotify_event_has_dir_fh(event)) { + dir_fh_len = fanotify_event_dir_fh_len(event); event_len += fanotify_fid_info_len(dir_fh_len, info->name_len); } else if ((info_mode & FAN_REPORT_NAME) && (event->mask & FAN_ONDIR)) { @@ -152,8 +151,10 @@ static size_t fanotify_event_len(unsigned int info_mode, if (info_mode & FAN_REPORT_PIDFD) event_len += FANOTIFY_PIDFD_INFO_HDR_LEN;
- if (fh_len) + if (fanotify_event_has_object_fh(event)) { + fh_len = fanotify_event_object_fh_len(event); event_len += fanotify_fid_info_len(fh_len, dot_len); + }
return event_len; } @@ -446,7 +447,7 @@ static int copy_info_records_to_user(struct fanotify_event *event, /* * Event info records order is as follows: dir fid + name, child fid. */ - if (fanotify_event_dir_fh_len(event)) { + if (fanotify_event_has_dir_fh(event)) { info_type = info->name_len ? FAN_EVENT_INFO_TYPE_DFID_NAME : FAN_EVENT_INFO_TYPE_DFID; ret = copy_fid_info_to_user(fanotify_event_fsid(event), @@ -462,7 +463,7 @@ static int copy_info_records_to_user(struct fanotify_event *event, total_bytes += ret; }
- if (fanotify_event_object_fh_len(event)) { + if (fanotify_event_has_object_fh(event)) { const char *dot = NULL; int dot_len = 0;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gabriel Krisman Bertazi krisman@collabora.com
[ Upstream commit 572c28f27a269f88e2d8d7b6b1507f114d637337 ]
struct fanotify_error_event, at least, is preallocated and isn't able to to handle arbitrarily large file handles. Future-proof the code by complaining loudly if a handle larger than MAX_HANDLE_SZ is ever found.
Link: https://lore.kernel.org/r/20211025192746.66445-26-krisman@collabora.com Reviewed-by: Amir Goldstein amir73il@gmail.com Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Gabriel Krisman Bertazi krisman@collabora.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index cedcb15468043..45df610debbe4 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -360,13 +360,23 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group, static int fanotify_encode_fh_len(struct inode *inode) { int dwords = 0; + int fh_len;
if (!inode) return 0;
exportfs_encode_inode_fh(inode, NULL, &dwords, NULL); + fh_len = dwords << 2;
- return dwords << 2; + /* + * struct fanotify_error_event might be preallocated and is + * limited to MAX_HANDLE_SZ. This should never happen, but + * safeguard by forcing an invalid file handle. + */ + if (WARN_ON_ONCE(fh_len > MAX_HANDLE_SZ)) + return 0; + + return fh_len; }
/*
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gabriel Krisman Bertazi krisman@collabora.com
[ Upstream commit 936d6a38be39177495af38497bf8da1c6128fa1b ]
Plumb the pieces to add a FID report to error records. Since all error event memory must be pre-allocated, we pre-allocate the maximum file handle size possible, such that it should always fit.
For errors that don't expose a file handle, report it with an invalid FID. Internally we use zero-length FILEID_ROOT file handle for passing the information (which we report as zero-length FILEID_INVALID file handle to userspace) so we update the handle reporting code to deal with this case correctly.
Link: https://lore.kernel.org/r/20211025192746.66445-27-krisman@collabora.com Link: https://lore.kernel.org/r/20211025192746.66445-25-krisman@collabora.com Signed-off-by: Gabriel Krisman Bertazi krisman@collabora.com Reviewed-by: Amir Goldstein amir73il@gmail.com Reviewed-by: Jan Kara jack@suse.cz [Folded two patches into 2 to make series bisectable] Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.c | 11 +++++++++++ fs/notify/fanotify/fanotify.h | 9 +++++++++ fs/notify/fanotify/fanotify_user.c | 8 +++++--- 3 files changed, 25 insertions(+), 3 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 45df610debbe4..465f07e70e6dc 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -609,7 +609,9 @@ static struct fanotify_event *fanotify_alloc_error_event( { struct fs_error_report *report = fsnotify_data_error_report(data, data_type); + struct inode *inode; struct fanotify_error_event *fee; + int fh_len;
if (WARN_ON_ONCE(!report)) return NULL; @@ -622,6 +624,15 @@ static struct fanotify_event *fanotify_alloc_error_event( fee->err_count = 1; fee->fsid = *fsid;
+ inode = report->inode; + fh_len = fanotify_encode_fh_len(inode); + + /* Bad fh_len. Fallback to using an invalid fh. Should never happen. */ + if (!fh_len && inode) + inode = NULL; + + fanotify_encode_fh(&fee->object_fh, inode, fh_len, NULL, 0); + *hash ^= fanotify_hash_fsid(fsid);
return &fee->fae; diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index 80af269eebb89..edd7587adcc59 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -208,6 +208,8 @@ struct fanotify_error_event { u32 err_count; /* Suppressed errors count */
__kernel_fsid_t fsid; /* FSID this error refers to. */ + + FANOTIFY_INLINE_FH(object_fh, MAX_HANDLE_SZ); };
static inline struct fanotify_error_event * @@ -222,6 +224,8 @@ static inline __kernel_fsid_t *fanotify_event_fsid(struct fanotify_event *event) return &FANOTIFY_FE(event)->fsid; else if (event->type == FANOTIFY_EVENT_TYPE_FID_NAME) return &FANOTIFY_NE(event)->fsid; + else if (event->type == FANOTIFY_EVENT_TYPE_FS_ERROR) + return &FANOTIFY_EE(event)->fsid; else return NULL; } @@ -233,6 +237,8 @@ static inline struct fanotify_fh *fanotify_event_object_fh( return &FANOTIFY_FE(event)->object_fh; else if (event->type == FANOTIFY_EVENT_TYPE_FID_NAME) return fanotify_info_file_fh(&FANOTIFY_NE(event)->info); + else if (event->type == FANOTIFY_EVENT_TYPE_FS_ERROR) + return &FANOTIFY_EE(event)->object_fh; else return NULL; } @@ -266,6 +272,9 @@ static inline int fanotify_event_dir_fh_len(struct fanotify_event *event)
static inline bool fanotify_event_has_object_fh(struct fanotify_event *event) { + /* For error events, even zeroed fh are reported. */ + if (event->type == FANOTIFY_EVENT_TYPE_FS_ERROR) + return true; return fanotify_event_object_fh_len(event) > 0; }
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 34ed30be0e4d4..0d36ac3ed7e99 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -334,9 +334,6 @@ static int copy_fid_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh, pr_debug("%s: fh_len=%zu name_len=%zu, info_len=%zu, count=%zu\n", __func__, fh_len, name_len, info_len, count);
- if (!fh_len) - return 0; - if (WARN_ON_ONCE(len < sizeof(info) || len > count)) return -EFAULT;
@@ -371,6 +368,11 @@ static int copy_fid_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh,
handle.handle_type = fh->type; handle.handle_bytes = fh_len; + + /* Mangle handle_type for bad file_handle */ + if (!fh_len) + handle.handle_type = FILEID_INVALID; + if (copy_to_user(buf, &handle, sizeof(handle))) return -EFAULT;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gabriel Krisman Bertazi krisman@collabora.com
[ Upstream commit 130a3c742107acff985541c28360c8b40203559c ]
The error info is a record sent to users on FAN_FS_ERROR events documenting the type of error. It also carries an error count, documenting how many errors were observed since the last reporting.
Link: https://lore.kernel.org/r/20211025192746.66445-28-krisman@collabora.com Reviewed-by: Amir Goldstein amir73il@gmail.com Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Gabriel Krisman Bertazi krisman@collabora.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.c | 1 + fs/notify/fanotify/fanotify.h | 1 + fs/notify/fanotify/fanotify_user.c | 36 ++++++++++++++++++++++++++++++ include/uapi/linux/fanotify.h | 7 ++++++ 4 files changed, 45 insertions(+)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 465f07e70e6dc..af61425e6e3bf 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -621,6 +621,7 @@ static struct fanotify_event *fanotify_alloc_error_event( return NULL;
fee->fae.type = FANOTIFY_EVENT_TYPE_FS_ERROR; + fee->error = report->error; fee->err_count = 1; fee->fsid = *fsid;
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index edd7587adcc59..d25f500bf7e79 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -205,6 +205,7 @@ FANOTIFY_NE(struct fanotify_event *event)
struct fanotify_error_event { struct fanotify_event fae; + s32 error; /* Error reported by the Filesystem. */ u32 err_count; /* Suppressed errors count */
__kernel_fsid_t fsid; /* FSID this error refers to. */ diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 0d36ac3ed7e99..3040c8669dfd1 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -110,6 +110,8 @@ struct kmem_cache *fanotify_perm_event_cachep __read_mostly; (sizeof(struct fanotify_event_info_fid) + sizeof(struct file_handle)) #define FANOTIFY_PIDFD_INFO_HDR_LEN \ sizeof(struct fanotify_event_info_pidfd) +#define FANOTIFY_ERROR_INFO_LEN \ + (sizeof(struct fanotify_event_info_error))
static int fanotify_fid_info_len(int fh_len, int name_len) { @@ -134,6 +136,9 @@ static size_t fanotify_event_len(unsigned int info_mode, if (!info_mode) return event_len;
+ if (fanotify_is_error_event(event->mask)) + event_len += FANOTIFY_ERROR_INFO_LEN; + info = fanotify_event_info(event);
if (fanotify_event_has_dir_fh(event)) { @@ -319,6 +324,28 @@ static int process_access_response(struct fsnotify_group *group, return -ENOENT; }
+static size_t copy_error_info_to_user(struct fanotify_event *event, + char __user *buf, int count) +{ + struct fanotify_event_info_error info; + struct fanotify_error_event *fee = FANOTIFY_EE(event); + + info.hdr.info_type = FAN_EVENT_INFO_TYPE_ERROR; + info.hdr.pad = 0; + info.hdr.len = FANOTIFY_ERROR_INFO_LEN; + + if (WARN_ON(count < info.hdr.len)) + return -EFAULT; + + info.error = fee->error; + info.error_count = fee->err_count; + + if (copy_to_user(buf, &info, sizeof(info))) + return -EFAULT; + + return info.hdr.len; +} + static int copy_fid_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh, int info_type, const char *name, size_t name_len, @@ -525,6 +552,15 @@ static int copy_info_records_to_user(struct fanotify_event *event, total_bytes += ret; }
+ if (fanotify_is_error_event(event->mask)) { + ret = copy_error_info_to_user(event, buf, count); + if (ret < 0) + return ret; + buf += ret; + count -= ret; + total_bytes += ret; + } + return total_bytes; }
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h index 2990731ddc8bc..bd1932c2074d5 100644 --- a/include/uapi/linux/fanotify.h +++ b/include/uapi/linux/fanotify.h @@ -126,6 +126,7 @@ struct fanotify_event_metadata { #define FAN_EVENT_INFO_TYPE_DFID_NAME 2 #define FAN_EVENT_INFO_TYPE_DFID 3 #define FAN_EVENT_INFO_TYPE_PIDFD 4 +#define FAN_EVENT_INFO_TYPE_ERROR 5
/* Variable length info record following event metadata */ struct fanotify_event_info_header { @@ -160,6 +161,12 @@ struct fanotify_event_info_pidfd { __s32 pidfd; };
+struct fanotify_event_info_error { + struct fanotify_event_info_header hdr; + __s32 error; + __u32 error_count; +}; + struct fanotify_response { __s32 fd; __u32 response;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gabriel Krisman Bertazi krisman@collabora.com
[ Upstream commit 9709bd548f11a092d124698118013f66e1740f9b ]
Wire up the FAN_FS_ERROR event in the fanotify_mark syscall, allowing user space to request the monitoring of FAN_FS_ERROR events.
These events are limited to filesystem marks, so check it is the case in the syscall handler.
Link: https://lore.kernel.org/r/20211025192746.66445-29-krisman@collabora.com Reviewed-by: Amir Goldstein amir73il@gmail.com Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Gabriel Krisman Bertazi krisman@collabora.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.c | 2 +- fs/notify/fanotify/fanotify_user.c | 4 ++++ include/linux/fanotify.h | 6 +++++- 3 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index af61425e6e3bf..b6091775aa6ef 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -822,7 +822,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask, BUILD_BUG_ON(FAN_OPEN_EXEC_PERM != FS_OPEN_EXEC_PERM); BUILD_BUG_ON(FAN_FS_ERROR != FS_ERROR);
- BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 19); + BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 20);
mask = fanotify_group_event_mask(group, iter_info, mask, data, data_type, dir); diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 3040c8669dfd1..e6860c55b3dcb 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1533,6 +1533,10 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, group->priority == FS_PRIO_0) goto fput_and_out;
+ if (mask & FAN_FS_ERROR && + mark_type != FAN_MARK_FILESYSTEM) + goto fput_and_out; + /* * Events that do not carry enough information to report * event->fd require a group that supports reporting fid. Those diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h index 52d464802d99f..616af2ea20f30 100644 --- a/include/linux/fanotify.h +++ b/include/linux/fanotify.h @@ -91,9 +91,13 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */ #define FANOTIFY_INODE_EVENTS (FANOTIFY_DIRENT_EVENTS | \ FAN_ATTRIB | FAN_MOVE_SELF | FAN_DELETE_SELF)
+/* Events that can only be reported with data type FSNOTIFY_EVENT_ERROR */ +#define FANOTIFY_ERROR_EVENTS (FAN_FS_ERROR) + /* Events that user can request to be notified on */ #define FANOTIFY_EVENTS (FANOTIFY_PATH_EVENTS | \ - FANOTIFY_INODE_EVENTS) + FANOTIFY_INODE_EVENTS | \ + FANOTIFY_ERROR_EVENTS)
/* Events that require a permission response from user */ #define FANOTIFY_PERM_EVENTS (FAN_OPEN_PERM | FAN_ACCESS_PERM | \
[ Upstream commit 9709bd548f11a092d124698118013f66e1740f9b ]
Wire up the FAN_FS_ERROR event in the fanotify_mark syscall, allowing user space to request the monitoring of FAN_FS_ERROR events.
These events are limited to filesystem marks, so check it is the case in the syscall handler.
Greg,
Without 9709bd548f11 in v5.10.y skips LTP fanotify22 test case, as: fanotify22.c:312: TCONF: FAN_FS_ERROR not supported in kernel
With 9709bd548f11 in v5.10.220, LTP fanotify22 is failing because of timeout as no notification. To fix need to merge following two upstream commit to v5.10:
124e7c61deb27d758df5ec0521c36cf08d417f7a: 0001-ext4_fix_error_code_saved_on_super_block_during_file_system.patch https://lore.kernel.org/stable/1721717240-8786-1-git-send-email-ajay.kaher@b...
9a089b21f79b47eed240d4da7ea0d049de7c9b4d: 0001-ext4_Send_notifications_on_error.patch Link: https://lore.kernel.org/stable/1721717240-8786-1-git-send-email-ajay.kaher@b...
-Ajay
On Tue, Jul 23, 2024 at 10:06 AM Ajay Kaher ajay.kaher@broadcom.com wrote:
[ Upstream commit 9709bd548f11a092d124698118013f66e1740f9b ]
Wire up the FAN_FS_ERROR event in the fanotify_mark syscall, allowing user space to request the monitoring of FAN_FS_ERROR events.
These events are limited to filesystem marks, so check it is the case in the syscall handler.
Greg,
Without 9709bd548f11 in v5.10.y skips LTP fanotify22 test case, as: fanotify22.c:312: TCONF: FAN_FS_ERROR not supported in kernel
With 9709bd548f11 in v5.10.220, LTP fanotify22 is failing because of timeout as no notification. To fix need to merge following two upstream commit to v5.10:
124e7c61deb27d758df5ec0521c36cf08d417f7a: 0001-ext4_fix_error_code_saved_on_super_block_during_file_system.patch https://lore.kernel.org/stable/1721717240-8786-1-git-send-email-ajay.kaher@b...
9a089b21f79b47eed240d4da7ea0d049de7c9b4d: 0001-ext4_Send_notifications_on_error.patch Link: https://lore.kernel.org/stable/1721717240-8786-1-git-send-email-ajay.kaher@b...
-Ajay
I agree that this is the best approach, because the test has no other way to test if ext4 specifically supports FAN_FS_ERROR.
Chuck,
I wonder how those patches end up in 5.15.y but not in 5.10.y?
Also, since you backported *most* of this series: https://lore.kernel.org/all/20211025192746.66445-1-krisman@collabora.com/
I think it would be wise to also backport the sample code and documentation patches to 5.15.y and 5.10.y.
9abeae5d4458 docs: Fix formatting of literal sections in fanotify docs 8fc70b3a142f samples: Make fs-monitor depend on libc and headers c0baf9ac0b05 docs: Document the FAN_FS_ERROR event 5451093081db samples: Add fs error monitoring example.
Gabriel, if 9abeae5d4458 has a Fixes: tag it may have been auto seleced for 5.15.y after c0baf9ac0b05 was picked up...
Thanks, Amir.
On Jul 23, 2024, at 5:20 AM, Amir Goldstein amir73il@gmail.com wrote:
On Tue, Jul 23, 2024 at 10:06 AM Ajay Kaher ajay.kaher@broadcom.com wrote:
[ Upstream commit 9709bd548f11a092d124698118013f66e1740f9b ]
Wire up the FAN_FS_ERROR event in the fanotify_mark syscall, allowing user space to request the monitoring of FAN_FS_ERROR events.
These events are limited to filesystem marks, so check it is the case in the syscall handler.
Greg,
Without 9709bd548f11 in v5.10.y skips LTP fanotify22 test case, as: fanotify22.c:312: TCONF: FAN_FS_ERROR not supported in kernel
With 9709bd548f11 in v5.10.220, LTP fanotify22 is failing because of timeout as no notification. To fix need to merge following two upstream commit to v5.10:
124e7c61deb27d758df5ec0521c36cf08d417f7a: 0001-ext4_fix_error_code_saved_on_super_block_during_file_system.patch https://lore.kernel.org/stable/1721717240-8786-1-git-send-email-ajay.kaher@b...
9a089b21f79b47eed240d4da7ea0d049de7c9b4d: 0001-ext4_Send_notifications_on_error.patch Link: https://lore.kernel.org/stable/1721717240-8786-1-git-send-email-ajay.kaher@b...
-Ajay
I agree that this is the best approach, because the test has no other way to test if ext4 specifically supports FAN_FS_ERROR.
Chuck,
I wonder how those patches end up in 5.15.y but not in 5.10.y?
The process was by hand. I tried to copy the same commit ID set from 5.15 to 5.10, but some patches did not apply to 5.10, or may have been omitted unintentionally.
Also, since you backported *most* of this series: https://lore.kernel.org/all/20211025192746.66445-1-krisman@collabora.com/
I think it would be wise to also backport the sample code and documentation patches to 5.15.y and 5.10.y.
9abeae5d4458 docs: Fix formatting of literal sections in fanotify docs 8fc70b3a142f samples: Make fs-monitor depend on libc and headers c0baf9ac0b05 docs: Document the FAN_FS_ERROR event 5451093081db samples: Add fs error monitoring example.
Gabriel, if 9abeae5d4458 has a Fixes: tag it may have been auto seleced for 5.15.y after c0baf9ac0b05 was picked up...
Or this... I might not have backported those at all to 5.15, and they got pulled in by another process.
In any event, do you want me to backport these to 5.10, or would you like to do it yourself?
-- Chuck Lever
On Tue, Jul 23, 2024 at 4:48 PM Chuck Lever III chuck.lever@oracle.com wrote:
On Jul 23, 2024, at 5:20 AM, Amir Goldstein amir73il@gmail.com wrote:
On Tue, Jul 23, 2024 at 10:06 AM Ajay Kaher ajay.kaher@broadcom.com wrote:
[ Upstream commit 9709bd548f11a092d124698118013f66e1740f9b ]
Wire up the FAN_FS_ERROR event in the fanotify_mark syscall, allowing user space to request the monitoring of FAN_FS_ERROR events.
These events are limited to filesystem marks, so check it is the case in the syscall handler.
Greg,
Without 9709bd548f11 in v5.10.y skips LTP fanotify22 test case, as: fanotify22.c:312: TCONF: FAN_FS_ERROR not supported in kernel
With 9709bd548f11 in v5.10.220, LTP fanotify22 is failing because of timeout as no notification. To fix need to merge following two upstream commit to v5.10:
124e7c61deb27d758df5ec0521c36cf08d417f7a: 0001-ext4_fix_error_code_saved_on_super_block_during_file_system.patch https://lore.kernel.org/stable/1721717240-8786-1-git-send-email-ajay.kaher@b...
9a089b21f79b47eed240d4da7ea0d049de7c9b4d: 0001-ext4_Send_notifications_on_error.patch Link: https://lore.kernel.org/stable/1721717240-8786-1-git-send-email-ajay.kaher@b...
-Ajay
I agree that this is the best approach, because the test has no other way to test if ext4 specifically supports FAN_FS_ERROR.
Chuck,
I wonder how those patches end up in 5.15.y but not in 5.10.y?
The process was by hand. I tried to copy the same commit ID set from 5.15 to 5.10, but some patches did not apply to 5.10, or may have been omitted unintentionally.
Also, since you backported *most* of this series: https://lore.kernel.org/all/20211025192746.66445-1-krisman@collabora.com/
I think it would be wise to also backport the sample code and documentation patches to 5.15.y and 5.10.y.
9abeae5d4458 docs: Fix formatting of literal sections in fanotify docs 8fc70b3a142f samples: Make fs-monitor depend on libc and headers c0baf9ac0b05 docs: Document the FAN_FS_ERROR event 5451093081db samples: Add fs error monitoring example.
Gabriel, if 9abeae5d4458 has a Fixes: tag it may have been auto seleced for 5.15.y after c0baf9ac0b05 was picked up...
Or this... I might not have backported those at all to 5.15, and they got pulled in by another process.
In any event, do you want me to backport these to 5.10, or would you like to do it yourself?
Please backport the docs and sample patches above to 5.15 and 5.10 for completion of the backport.
I do not expect you should have any major conflicts with these standalone document and sample code.
Thanks, Amir.
Amir Goldstein amir73il@gmail.com writes:
On Tue, Jul 23, 2024 at 10:06 AM Ajay Kaher ajay.kaher@broadcom.com wrote:
Without 9709bd548f11 in v5.10.y skips LTP fanotify22 test case, as: fanotify22.c:312: TCONF: FAN_FS_ERROR not supported in kernel
With 9709bd548f11 in v5.10.220, LTP fanotify22 is failing because of timeout as no notification. To fix need to merge following two upstream commit to v5.10:
124e7c61deb27d758df5ec0521c36cf08d417f7a: 0001-ext4_fix_error_code_saved_on_super_block_during_file_system.patch https://lore.kernel.org/stable/1721717240-8786-1-git-send-email-ajay.kaher@b...
9a089b21f79b47eed240d4da7ea0d049de7c9b4d: 0001-ext4_Send_notifications_on_error.patch Link: https://lore.kernel.org/stable/1721717240-8786-1-git-send-email-ajay.kaher@b...
-Ajay
I agree that this is the best approach, because the test has no other way to test if ext4 specifically supports FAN_FS_ERROR.
Chuck,
I wonder how those patches end up in 5.15.y but not in 5.10.y?
I wonder why this was backported to stable in the first place. I get there is a lot of refactoring in this series, which might be useful when backporting further fixes. but 9709bd548f11 just enabled a new feature - which seems against stable rules. Considering that "anything is a CVE", we really need to be cautious about this kind of stuff in stable kernels.
Is it possible to drop 9709bd548f11 from stable instead?
Gabriel, if 9abeae5d4458 has a Fixes: tag it may have been auto seleced for 5.15.y after c0baf9ac0b05 was picked up...
right. It would be really cool if we had a way to append this information after the fact. How would people feel about using git-notes in the kernel tree to support that?
On Jul 23, 2024, at 10:34 AM, Gabriel Krisman Bertazi gabriel@krisman.be wrote:
Amir Goldstein amir73il@gmail.com writes:
On Tue, Jul 23, 2024 at 10:06 AM Ajay Kaher ajay.kaher@broadcom.com wrote:
Without 9709bd548f11 in v5.10.y skips LTP fanotify22 test case, as: fanotify22.c:312: TCONF: FAN_FS_ERROR not supported in kernel
With 9709bd548f11 in v5.10.220, LTP fanotify22 is failing because of timeout as no notification. To fix need to merge following two upstream commit to v5.10:
124e7c61deb27d758df5ec0521c36cf08d417f7a: 0001-ext4_fix_error_code_saved_on_super_block_during_file_system.patch https://lore.kernel.org/stable/1721717240-8786-1-git-send-email-ajay.kaher@b...
9a089b21f79b47eed240d4da7ea0d049de7c9b4d: 0001-ext4_Send_notifications_on_error.patch Link: https://lore.kernel.org/stable/1721717240-8786-1-git-send-email-ajay.kaher@b...
-Ajay
I agree that this is the best approach, because the test has no other way to test if ext4 specifically supports FAN_FS_ERROR.
Chuck,
I wonder how those patches end up in 5.15.y but not in 5.10.y?
I wonder why this was backported to stable in the first place.
I wanted to fix the myriad problems with the NFSD file cache in v5.10.y and v5.15.y. These problems were addressed by a long list of changes from v5.19 to v6.3. I was told that it was preferred that I do a full backport of all NFSD commits from v6.3 to the older kernels.
The fanotify patches were dependencies.
I get there is a lot of refactoring in this series, which might be useful when backporting further fixes. but 9709bd548f11 just enabled a new feature - which seems against stable rules. Considering that "anything is a CVE", we really need to be cautious about this kind of stuff in stable kernels.
These patches were posted for review before they were included. Amir noted there were some changes to fanotify that would require man page updates and some modifications to ltp, but there were no other comments.
Is it possible to drop 9709bd548f11 from stable instead?
I've attempted a revert. It's not clean, but seems to be fixable by hand.
I can run some tests to see if that breaks NFSD.
Gabriel, if 9abeae5d4458 has a Fixes: tag it may have been auto seleced for 5.15.y after c0baf9ac0b05 was picked up...
right. It would be really cool if we had a way to append this information after the fact. How would people feel about using git-notes in the kernel tree to support that?
-- Gabriel Krisman Bertazi
-- Chuck Lever
From: Chuck Lever chuck.lever@oracle.com
Gabriel says:
9709bd548f11 just enabled a new feature - which seems against stable rules. Considering that "anything is a CVE", we really need to be cautious about this kind of stuff in stable kernels.
Is it possible to drop 9709bd548f11 from stable instead?
The revert wasn't clean, but adjusting it to fit was straightforward. This passes NFSD CI, and adds no new failures to the fanotify ltp tests.
Reported-by: Gabriel Krisman Bertazi gabriel@krisman.be Signed-off-by: Chuck Lever chuck.lever@oracle.com --- fs/notify/fanotify/fanotify_user.c | 4 ---- include/linux/fanotify.h | 6 +----- 2 files changed, 1 insertion(+), 9 deletions(-)
Gabriel, is this what you were thinking?
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index d93418f21386..0d91db1c7249 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1701,10 +1701,6 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, group->priority == FS_PRIO_0) goto fput_and_out;
- if (mask & FAN_FS_ERROR && - mark_type != FAN_MARK_FILESYSTEM) - goto fput_and_out; - /* * Evictable is only relevant for inode marks, because only inode object * can be evicted on memory pressure. diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h index 558844c8d259..df60b46971c9 100644 --- a/include/linux/fanotify.h +++ b/include/linux/fanotify.h @@ -97,13 +97,9 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */ #define FANOTIFY_INODE_EVENTS (FANOTIFY_DIRENT_EVENTS | \ FAN_ATTRIB | FAN_MOVE_SELF | FAN_DELETE_SELF)
-/* Events that can only be reported with data type FSNOTIFY_EVENT_ERROR */ -#define FANOTIFY_ERROR_EVENTS (FAN_FS_ERROR) - /* Events that user can request to be notified on */ #define FANOTIFY_EVENTS (FANOTIFY_PATH_EVENTS | \ - FANOTIFY_INODE_EVENTS | \ - FANOTIFY_ERROR_EVENTS) + FANOTIFY_INODE_EVENTS)
/* Events that require a permission response from user */ #define FANOTIFY_PERM_EVENTS (FAN_OPEN_PERM | FAN_ACCESS_PERM | \
cel@kernel.org writes:
From: Chuck Lever chuck.lever@oracle.com
Gabriel says:
9709bd548f11 just enabled a new feature - which seems against stable rules. Considering that "anything is a CVE", we really need to be cautious about this kind of stuff in stable kernels.
Is it possible to drop 9709bd548f11 from stable instead?
The revert wasn't clean, but adjusting it to fit was straightforward. This passes NFSD CI, and adds no new failures to the fanotify ltp tests.
Reported-by: Gabriel Krisman Bertazi gabriel@krisman.be Signed-off-by: Chuck Lever chuck.lever@oracle.com
fs/notify/fanotify/fanotify_user.c | 4 ---- include/linux/fanotify.h | 6 +----- 2 files changed, 1 insertion(+), 9 deletions(-)
Gabriel, is this what you were thinking?
Thanks Chuck.
This looks good to me as a way to disable it in stable and prevent userspace from trying to use it. Up to fanotify maintainers to decide whether to brign the rest of the series or merge this, but either way:
Reviewed-by: Gabriel Krisman Bertazi gabriel@krisman.be
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index d93418f21386..0d91db1c7249 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1701,10 +1701,6 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, group->priority == FS_PRIO_0) goto fput_and_out;
- if (mask & FAN_FS_ERROR &&
mark_type != FAN_MARK_FILESYSTEM)
goto fput_and_out;
- /*
- Evictable is only relevant for inode marks, because only inode object
- can be evicted on memory pressure.
diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h index 558844c8d259..df60b46971c9 100644 --- a/include/linux/fanotify.h +++ b/include/linux/fanotify.h @@ -97,13 +97,9 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */ #define FANOTIFY_INODE_EVENTS (FANOTIFY_DIRENT_EVENTS | \ FAN_ATTRIB | FAN_MOVE_SELF | FAN_DELETE_SELF) -/* Events that can only be reported with data type FSNOTIFY_EVENT_ERROR */ -#define FANOTIFY_ERROR_EVENTS (FAN_FS_ERROR)
/* Events that user can request to be notified on */ #define FANOTIFY_EVENTS (FANOTIFY_PATH_EVENTS | \
FANOTIFY_INODE_EVENTS | \
FANOTIFY_ERROR_EVENTS)
FANOTIFY_INODE_EVENTS)
/* Events that require a permission response from user */ #define FANOTIFY_PERM_EVENTS (FAN_OPEN_PERM | FAN_ACCESS_PERM | \
On Wed, Jul 24, 2024 at 2:24 AM Gabriel Krisman Bertazi gabriel@krisman.be wrote:
cel@kernel.org writes:
From: Chuck Lever chuck.lever@oracle.com
Gabriel says:
9709bd548f11 just enabled a new feature - which seems against stable rules. Considering that "anything is a CVE", we really need to be cautious about this kind of stuff in stable kernels.
Is it possible to drop 9709bd548f11 from stable instead?
The revert wasn't clean, but adjusting it to fit was straightforward. This passes NFSD CI, and adds no new failures to the fanotify ltp tests.
Reported-by: Gabriel Krisman Bertazi gabriel@krisman.be Signed-off-by: Chuck Lever chuck.lever@oracle.com
fs/notify/fanotify/fanotify_user.c | 4 ---- include/linux/fanotify.h | 6 +----- 2 files changed, 1 insertion(+), 9 deletions(-)
Gabriel, is this what you were thinking?
Thanks Chuck.
This looks good to me as a way to disable it in stable and prevent userspace from trying to use it. Up to fanotify maintainers to decide whether to bring the rest of the series or merge this,
First of all, the "rest of the series" is already queued for 5.10.y.
I too was a bit surprised from willingness of stable tree maintainers to allow backports of entire features along with Chuck's nfsd backports. I understand the logic, I was just not aware when this shift in stable ideology had happened.
But having backported the entire fanotify 6.1 code as is to 5.15 and 5.10 I see no reason to make an exception for FAN_FS_ERROR. Certainly not because two patches were left out of 5.10.y (and are now queued).
I think that the benefits of fanotify code parity across 6.1 5.15 5.10 outweigh the risk of regressions introduced by this specific feature.
So please do not revert.
Thanks, Amir.
On Tue 23-07-24 12:36:27, Ajay Kaher wrote:
[ Upstream commit 9709bd548f11a092d124698118013f66e1740f9b ]
Wire up the FAN_FS_ERROR event in the fanotify_mark syscall, allowing user space to request the monitoring of FAN_FS_ERROR events.
These events are limited to filesystem marks, so check it is the case in the syscall handler.
Greg,
Without 9709bd548f11 in v5.10.y skips LTP fanotify22 test case, as: fanotify22.c:312: TCONF: FAN_FS_ERROR not supported in kernel
With 9709bd548f11 in v5.10.220, LTP fanotify22 is failing because of timeout as no notification. To fix need to merge following two upstream commit to v5.10:
124e7c61deb27d758df5ec0521c36cf08d417f7a: 0001-ext4_fix_error_code_saved_on_super_block_during_file_system.patch https://lore.kernel.org/stable/1721717240-8786-1-git-send-email-ajay.kaher@b...
9a089b21f79b47eed240d4da7ea0d049de7c9b4d: 0001-ext4_Send_notifications_on_error.patch Link: https://lore.kernel.org/stable/1721717240-8786-1-git-send-email-ajay.kaher@b...
I know Chuck has been backporting the huge pile of fsnotify changes for stable and he was running LTP so I'm a bit curious if he saw the fanotify22 failure as well. The reason for the test failure seems to be that the combination of features now present in stable has never been upstream which confuses the test. As such I'm not sure if backporting more features to stable is warranted just to fix a broken LTP test... But given the huge pile Chuck has backported already I'm not strongly opposed to backporting a few more, there's just a question where does this stop :)
Honza
On Tue, Jul 23, 2024 at 12:29 PM Jan Kara jack@suse.cz wrote:
On Tue 23-07-24 12:36:27, Ajay Kaher wrote:
[ Upstream commit 9709bd548f11a092d124698118013f66e1740f9b ]
Wire up the FAN_FS_ERROR event in the fanotify_mark syscall, allowing user space to request the monitoring of FAN_FS_ERROR events.
These events are limited to filesystem marks, so check it is the case in the syscall handler.
Greg,
Without 9709bd548f11 in v5.10.y skips LTP fanotify22 test case, as: fanotify22.c:312: TCONF: FAN_FS_ERROR not supported in kernel
With 9709bd548f11 in v5.10.220, LTP fanotify22 is failing because of timeout as no notification. To fix need to merge following two upstream commit to v5.10:
124e7c61deb27d758df5ec0521c36cf08d417f7a: 0001-ext4_fix_error_code_saved_on_super_block_during_file_system.patch https://lore.kernel.org/stable/1721717240-8786-1-git-send-email-ajay.kaher@b...
9a089b21f79b47eed240d4da7ea0d049de7c9b4d: 0001-ext4_Send_notifications_on_error.patch Link: https://lore.kernel.org/stable/1721717240-8786-1-git-send-email-ajay.kaher@b...
I know Chuck has been backporting the huge pile of fsnotify changes for stable and he was running LTP so I'm a bit curious if he saw the fanotify22 failure as well. The reason for the test failure seems to be that the combination of features now present in stable has never been upstream which confuses the test. As such I'm not sure if backporting more features to stable is warranted just to fix a broken LTP test... But given the huge pile Chuck has backported already I'm not strongly opposed to backporting a few more, there's just a question where does this stop :)
I'm not sure is it exactly "more features". The fanotify patches and ext4 patches that use them where merges as a feature together.
IOW, FAN_FS_ERROR was merged with support for a single fs (ext4) it would be weird to backport the feature with support for zero fs. Also, 5.15.y already has the ext4 patches - not sure why 5.10.y didn't get them.
Thanks, Amir.
On Tue 23-07-24 13:13:41, Amir Goldstein wrote:
On Tue, Jul 23, 2024 at 12:29 PM Jan Kara jack@suse.cz wrote:
On Tue 23-07-24 12:36:27, Ajay Kaher wrote:
[ Upstream commit 9709bd548f11a092d124698118013f66e1740f9b ]
Wire up the FAN_FS_ERROR event in the fanotify_mark syscall, allowing user space to request the monitoring of FAN_FS_ERROR events.
These events are limited to filesystem marks, so check it is the case in the syscall handler.
Greg,
Without 9709bd548f11 in v5.10.y skips LTP fanotify22 test case, as: fanotify22.c:312: TCONF: FAN_FS_ERROR not supported in kernel
With 9709bd548f11 in v5.10.220, LTP fanotify22 is failing because of timeout as no notification. To fix need to merge following two upstream commit to v5.10:
124e7c61deb27d758df5ec0521c36cf08d417f7a: 0001-ext4_fix_error_code_saved_on_super_block_during_file_system.patch https://lore.kernel.org/stable/1721717240-8786-1-git-send-email-ajay.kaher@b...
9a089b21f79b47eed240d4da7ea0d049de7c9b4d: 0001-ext4_Send_notifications_on_error.patch Link: https://lore.kernel.org/stable/1721717240-8786-1-git-send-email-ajay.kaher@b...
I know Chuck has been backporting the huge pile of fsnotify changes for stable and he was running LTP so I'm a bit curious if he saw the fanotify22 failure as well. The reason for the test failure seems to be that the combination of features now present in stable has never been upstream which confuses the test. As such I'm not sure if backporting more features to stable is warranted just to fix a broken LTP test... But given the huge pile Chuck has backported already I'm not strongly opposed to backporting a few more, there's just a question where does this stop :)
I'm not sure is it exactly "more features". The fanotify patches and ext4 patches that use them where merges as a feature together.
IOW, FAN_FS_ERROR was merged with support for a single fs (ext4) it would be weird to backport the feature with support for zero fs. Also, 5.15.y already has the ext4 patches - not sure why 5.10.y didn't get them.
I forgot 5.15.y got all the patches. Ok, then pushing them to 5.10.y makes sense as well.
Honza
On Jul 23, 2024, at 5:29 AM, Jan Kara jack@suse.cz wrote:
On Tue 23-07-24 12:36:27, Ajay Kaher wrote:
[ Upstream commit 9709bd548f11a092d124698118013f66e1740f9b ]
Wire up the FAN_FS_ERROR event in the fanotify_mark syscall, allowing user space to request the monitoring of FAN_FS_ERROR events.
These events are limited to filesystem marks, so check it is the case in the syscall handler.
Greg,
Without 9709bd548f11 in v5.10.y skips LTP fanotify22 test case, as: fanotify22.c:312: TCONF: FAN_FS_ERROR not supported in kernel
With 9709bd548f11 in v5.10.220, LTP fanotify22 is failing because of timeout as no notification. To fix need to merge following two upstream commit to v5.10:
124e7c61deb27d758df5ec0521c36cf08d417f7a: 0001-ext4_fix_error_code_saved_on_super_block_during_file_system.patch https://lore.kernel.org/stable/1721717240-8786-1-git-send-email-ajay.kaher@b...
9a089b21f79b47eed240d4da7ea0d049de7c9b4d: 0001-ext4_Send_notifications_on_error.patch Link: https://lore.kernel.org/stable/1721717240-8786-1-git-send-email-ajay.kaher@b...
I know Chuck has been backporting the huge pile of fsnotify changes for stable and he was running LTP so I'm a bit curious if he saw the fanotify22 failure as well.
Fwiw, I didn't see any new failures for either 5.10.y or 5.15.y.
The reason for the test failure seems to be that the combination of features now present in stable has never been upstream which confuses the test. As such I'm not sure if backporting more features to stable is warranted just to fix a broken LTP test... But given the huge pile Chuck has backported already I'm not strongly opposed to backporting a few more, there's just a question where does this stop :)
Honza
-- Jan Kara jack@suse.com SUSE Labs, CR
-- Chuck Lever
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit b40887e10dcacc5e8ae3c1a99dcba20877c4831b ]
Introduce a single tracepoint that can replace simple dprintk call sites in upper layer "rpc_call_done" callbacks. Example:
kworker/u24:2-1254 [001] 771.026677: rpc_stats_latency: task:00000001@00000002 xid=0x16a6f3c0 rpcbindv2 GETPORT backlog=446 rtt=101 execute=555 kworker/u24:2-1254 [001] 771.026677: rpc_task_call_done: task:00000001@00000002 flags=ASYNC|DYNAMIC|SOFT|SOFTCONN|SENT runstate=RUNNING|ACTIVE status=0 action=rpcb_getport_done kworker/u24:2-1254 [001] 771.026678: rpcb_setport: task:00000001@00000002 status=0 port=20048
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/clntproc.c | 3 --- fs/lockd/svc4proc.c | 2 -- fs/lockd/svcproc.c | 2 -- fs/nfs/filelayout/filelayout.c | 2 -- fs/nfs/flexfilelayout/flexfilelayout.c | 2 -- fs/nfs/pagelist.c | 3 --- fs/nfs/write.c | 3 --- include/trace/events/sunrpc.h | 1 + net/sunrpc/sched.c | 1 + 9 files changed, 2 insertions(+), 17 deletions(-)
diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c index b11f2afa84f1f..99fffc9cb9585 100644 --- a/fs/lockd/clntproc.c +++ b/fs/lockd/clntproc.c @@ -794,9 +794,6 @@ static void nlmclnt_cancel_callback(struct rpc_task *task, void *data) goto retry_cancel; }
- dprintk("lockd: cancel status %u (task %u)\n", - status, task->tk_pid); - switch (status) { case NLM_LCK_GRANTED: case NLM_LCK_DENIED_GRACE_PERIOD: diff --git a/fs/lockd/svc4proc.c b/fs/lockd/svc4proc.c index e10ae2c41279e..176b468a61c75 100644 --- a/fs/lockd/svc4proc.c +++ b/fs/lockd/svc4proc.c @@ -269,8 +269,6 @@ nlm4svc_proc_granted(struct svc_rqst *rqstp) */ static void nlm4svc_callback_exit(struct rpc_task *task, void *data) { - dprintk("lockd: %5u callback returned %d\n", task->tk_pid, - -task->tk_status); }
static void nlm4svc_callback_release(void *data) diff --git a/fs/lockd/svcproc.c b/fs/lockd/svcproc.c index 99696d3f6dd66..4dc1b40a489a2 100644 --- a/fs/lockd/svcproc.c +++ b/fs/lockd/svcproc.c @@ -301,8 +301,6 @@ nlmsvc_proc_granted(struct svc_rqst *rqstp) */ static void nlmsvc_callback_exit(struct rpc_task *task, void *data) { - dprintk("lockd: %5u callback returned %d\n", task->tk_pid, - -task->tk_status); }
void nlmsvc_release_call(struct nlm_rqst *call) diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c index 45eec08ec904f..2ed8b6885b091 100644 --- a/fs/nfs/filelayout/filelayout.c +++ b/fs/nfs/filelayout/filelayout.c @@ -293,8 +293,6 @@ static void filelayout_read_call_done(struct rpc_task *task, void *data) { struct nfs_pgio_header *hdr = data;
- dprintk("--> %s task->tk_status %d\n", __func__, task->tk_status); - if (test_bit(NFS_IOHDR_REDO, &hdr->flags) && task->tk_status == 0) { nfs41_sequence_done(task, &hdr->res.seq_res); diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c index f2ae271fe7ec7..a263bfec4244d 100644 --- a/fs/nfs/flexfilelayout/flexfilelayout.c +++ b/fs/nfs/flexfilelayout/flexfilelayout.c @@ -1419,8 +1419,6 @@ static void ff_layout_read_call_done(struct rpc_task *task, void *data) { struct nfs_pgio_header *hdr = data;
- dprintk("--> %s task->tk_status %d\n", __func__, task->tk_status); - if (test_bit(NFS_IOHDR_REDO, &hdr->flags) && task->tk_status == 0) { nfs4_sequence_done(task, &hdr->res.seq_res); diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c index 17fef6eb490c5..d79a3b6cb0701 100644 --- a/fs/nfs/pagelist.c +++ b/fs/nfs/pagelist.c @@ -870,9 +870,6 @@ static void nfs_pgio_result(struct rpc_task *task, void *calldata) struct nfs_pgio_header *hdr = calldata; struct inode *inode = hdr->inode;
- dprintk("NFS: %s: %5u, (status %d)\n", __func__, - task->tk_pid, task->tk_status); - if (hdr->rw_ops->rw_done(task, hdr, inode) != 0) return; if (task->tk_status < 0) diff --git a/fs/nfs/write.c b/fs/nfs/write.c index 4cf0606919794..2bde35921f2b2 100644 --- a/fs/nfs/write.c +++ b/fs/nfs/write.c @@ -1809,9 +1809,6 @@ static void nfs_commit_done(struct rpc_task *task, void *calldata) { struct nfs_commit_data *data = calldata;
- dprintk("NFS: %5u nfs_commit_done (status %d)\n", - task->tk_pid, task->tk_status); - /* Call the NFS version-specific code */ NFS_PROTO(data->inode)->commit_done(task, data); trace_nfs_commit_done(task, data); diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h index fce071f39f51f..56e4a57d25382 100644 --- a/include/trace/events/sunrpc.h +++ b/include/trace/events/sunrpc.h @@ -394,6 +394,7 @@ DEFINE_RPC_RUNNING_EVENT(complete); DEFINE_RPC_RUNNING_EVENT(timeout); DEFINE_RPC_RUNNING_EVENT(signalled); DEFINE_RPC_RUNNING_EVENT(end); +DEFINE_RPC_RUNNING_EVENT(call_done);
DECLARE_EVENT_CLASS(rpc_task_queued,
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c index a00890962e115..a4c9d410eb8d5 100644 --- a/net/sunrpc/sched.c +++ b/net/sunrpc/sched.c @@ -821,6 +821,7 @@ void rpc_exit_task(struct rpc_task *task) else if (task->tk_client) rpc_count_iostats(task, task->tk_client->cl_metrics); if (task->tk_ops->rpc_call_done != NULL) { + trace_rpc_task_call_done(task, task->tk_ops->rpc_call_done); task->tk_ops->rpc_call_done(task, task->tk_calldata); if (task->tk_action != NULL) { /* Always release the RPC slot and buffer memory */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 8847ecc9274a14114385d1cb4030326baa0766eb ]
DRC bucket pruning is done by nfsd_cache_lookup(), which is part of every NFSv2 and NFSv3 dispatch (ie, it's done while the client is waiting).
I added a trace_printk() in prune_bucket() to see just how long it takes to prune. Here are two ends of the spectrum:
prune_bucket: Scanned 1 and freed 0 in 90 ns, 62 entries remaining prune_bucket: Scanned 2 and freed 1 in 716 ns, 63 entries remaining ... prune_bucket: Scanned 75 and freed 74 in 34149 ns, 1 entries remaining
Pruning latency is noticeable on fast transports with fast storage. By noticeable, I mean that the latency measured here in the worst case is the same order of magnitude as the round trip time for cached server operations.
We could do something like moving expired entries to an expired list and then free them later instead of freeing them right in prune_bucket(). But simply limiting the number of entries that can be pruned by a lookup is simple and retains more entries in the cache, making the DRC somewhat more effective.
Comparison with a 70/30 fio 8KB 12 thread direct I/O test:
Before:
write: IOPS=61.6k, BW=481MiB/s (505MB/s)(14.1GiB/30001msec); 0 zone resets
WRITE: 1848726 ops (30%) avg bytes sent per op: 8340 avg bytes received per op: 136 backlog wait: 0.635158 RTT: 0.128525 total execute time: 0.827242 (milliseconds)
After:
write: IOPS=63.0k, BW=492MiB/s (516MB/s)(14.4GiB/30001msec); 0 zone resets
WRITE: 1891144 ops (30%) avg bytes sent per op: 8340 avg bytes received per op: 136 backlog wait: 0.616114 RTT: 0.126842 total execute time: 0.805348 (milliseconds)
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfscache.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c index 96cdf77925f33..6e0b6f3148dca 100644 --- a/fs/nfsd/nfscache.c +++ b/fs/nfsd/nfscache.c @@ -241,8 +241,8 @@ lru_put_end(struct nfsd_drc_bucket *b, struct svc_cacherep *rp) list_move_tail(&rp->c_lru, &b->lru_head); }
-static long -prune_bucket(struct nfsd_drc_bucket *b, struct nfsd_net *nn) +static long prune_bucket(struct nfsd_drc_bucket *b, struct nfsd_net *nn, + unsigned int max) { struct svc_cacherep *rp, *tmp; long freed = 0; @@ -258,11 +258,17 @@ prune_bucket(struct nfsd_drc_bucket *b, struct nfsd_net *nn) time_before(jiffies, rp->c_timestamp + RC_EXPIRE)) break; nfsd_reply_cache_free_locked(b, rp, nn); - freed++; + if (max && freed++ > max) + break; } return freed; }
+static long nfsd_prune_bucket(struct nfsd_drc_bucket *b, struct nfsd_net *nn) +{ + return prune_bucket(b, nn, 3); +} + /* * Walk the LRU list and prune off entries that are older than RC_EXPIRE. * Also prune the oldest ones when the total exceeds the max number of entries. @@ -279,7 +285,7 @@ prune_cache_entries(struct nfsd_net *nn) if (list_empty(&b->lru_head)) continue; spin_lock(&b->cache_lock); - freed += prune_bucket(b, nn); + freed += prune_bucket(b, nn, 0); spin_unlock(&b->cache_lock); } return freed; @@ -453,8 +459,7 @@ int nfsd_cache_lookup(struct svc_rqst *rqstp) atomic_inc(&nn->num_drc_entries); nfsd_stats_drc_mem_usage_add(nn, sizeof(*rp));
- /* go ahead and prune the cache */ - prune_bucket(b, nn); + nfsd_prune_bucket(b, nn);
out_unlock: spin_unlock(&b->cache_lock);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit ef5825e3cf0d0af657f5fb4dd86d750ed42fee0a ]
A small part of the declaration concerning filehandle format are currently in the "uapi" include directory: include/uapi/linux/nfsd/nfsfh.h
There is a lot more to the filehandle format, including "enum fid_type" and "enum nfsd_fsid" which are not exported via "uapi".
This small part of the filehandle definition is of minimal use outside of the kernel, and I can find no evidence that an other code is using it. Certainly nfs-utils and wireshark (The most likely candidates) do not use these declarations.
So move it out of "uapi" by copying the content from include/uapi/linux/nfsd/nfsfh.h into fs/nfsd/nfsfh.h
A few unnecessary "#include" directives are not copied, and neither is the #define of fh_auth, which is annotated as being for userspace only.
The copyright claims in the uapi file are identical to those in the nfsd file, so there is no need to copy those.
The "__u32" style integer types are only needed in "uapi". In kernel-only code we can use the more familiar "u32" style.
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsfh.h | 97 ++++++++++++++++++++++++++- fs/nfsd/vfs.c | 1 + include/uapi/linux/nfsd/nfsfh.h | 115 -------------------------------- 3 files changed, 97 insertions(+), 116 deletions(-) delete mode 100644 include/uapi/linux/nfsd/nfsfh.h
diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h index 6106697adc04b..ad47f16676a8c 100644 --- a/fs/nfsd/nfsfh.h +++ b/fs/nfsd/nfsfh.h @@ -10,9 +10,104 @@
#include <linux/crc32.h> #include <linux/sunrpc/svc.h> -#include <uapi/linux/nfsd/nfsfh.h> #include <linux/iversion.h> #include <linux/exportfs.h> +#include <linux/nfs4.h> + + +/* + * This is the old "dentry style" Linux NFSv2 file handle. + * + * The xino and xdev fields are currently used to transport the + * ino/dev of the exported inode. + */ +struct nfs_fhbase_old { + u32 fb_dcookie; /* dentry cookie - always 0xfeebbaca */ + u32 fb_ino; /* our inode number */ + u32 fb_dirino; /* dir inode number, 0 for directories */ + u32 fb_dev; /* our device */ + u32 fb_xdev; + u32 fb_xino; + u32 fb_generation; +}; + +/* + * This is the new flexible, extensible style NFSv2/v3/v4 file handle. + * + * The file handle starts with a sequence of four-byte words. + * The first word contains a version number (1) and three descriptor bytes + * that tell how the remaining 3 variable length fields should be handled. + * These three bytes are auth_type, fsid_type and fileid_type. + * + * All four-byte values are in host-byte-order. + * + * The auth_type field is deprecated and must be set to 0. + * + * The fsid_type identifies how the filesystem (or export point) is + * encoded. + * Current values: + * 0 - 4 byte device id (ms-2-bytes major, ls-2-bytes minor), 4byte inode number + * NOTE: we cannot use the kdev_t device id value, because kdev_t.h + * says we mustn't. We must break it up and reassemble. + * 1 - 4 byte user specified identifier + * 2 - 4 byte major, 4 byte minor, 4 byte inode number - DEPRECATED + * 3 - 4 byte device id, encoded for user-space, 4 byte inode number + * 4 - 4 byte inode number and 4 byte uuid + * 5 - 8 byte uuid + * 6 - 16 byte uuid + * 7 - 8 byte inode number and 16 byte uuid + * + * The fileid_type identified how the file within the filesystem is encoded. + * The values for this field are filesystem specific, exccept that + * filesystems must not use the values '0' or '0xff'. 'See enum fid_type' + * in include/linux/exportfs.h for currently registered values. + */ +struct nfs_fhbase_new { + union { + struct { + u8 fb_version_aux; /* == 1, even => nfs_fhbase_old */ + u8 fb_auth_type_aux; + u8 fb_fsid_type_aux; + u8 fb_fileid_type_aux; + u32 fb_auth[1]; + /* u32 fb_fsid[0]; floating */ + /* u32 fb_fileid[0]; floating */ + }; + struct { + u8 fb_version; /* == 1, even => nfs_fhbase_old */ + u8 fb_auth_type; + u8 fb_fsid_type; + u8 fb_fileid_type; + u32 fb_auth_flex[]; /* flexible-array member */ + }; + }; +}; + +struct knfsd_fh { + unsigned int fh_size; /* significant for NFSv3. + * Points to the current size while building + * a new file handle + */ + union { + struct nfs_fhbase_old fh_old; + u32 fh_pad[NFS4_FHSIZE/4]; + struct nfs_fhbase_new fh_new; + } fh_base; +}; + +#define ofh_dcookie fh_base.fh_old.fb_dcookie +#define ofh_ino fh_base.fh_old.fb_ino +#define ofh_dirino fh_base.fh_old.fb_dirino +#define ofh_dev fh_base.fh_old.fb_dev +#define ofh_xdev fh_base.fh_old.fb_xdev +#define ofh_xino fh_base.fh_old.fb_xino +#define ofh_generation fh_base.fh_old.fb_generation + +#define fh_version fh_base.fh_new.fb_version +#define fh_fsid_type fh_base.fh_new.fb_fsid_type +#define fh_auth_type fh_base.fh_new.fb_auth_type +#define fh_fileid_type fh_base.fh_new.fb_fileid_type +#define fh_fsid fh_base.fh_new.fb_auth_flex
static inline __u32 ino_t_to_u32(ino_t ino) { diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 2c493937dd5ec..05b5f7e241e70 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -244,6 +244,7 @@ nfsd_lookup_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp, * returned. Otherwise the covered directory is returned. * NOTE: this mountpoint crossing is not supported properly by all * clients and is explicitly disallowed for NFSv3 + * NeilBrown neilb@cse.unsw.edu.au */ __be32 nfsd_lookup(struct svc_rqst *rqstp, struct svc_fh *fhp, const char *name, diff --git a/include/uapi/linux/nfsd/nfsfh.h b/include/uapi/linux/nfsd/nfsfh.h deleted file mode 100644 index e29e8accc4f4d..0000000000000 --- a/include/uapi/linux/nfsd/nfsfh.h +++ /dev/null @@ -1,115 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ -/* - * This file describes the layout of the file handles as passed - * over the wire. - * - * Copyright (C) 1995, 1996, 1997 Olaf Kirch okir@monad.swb.de - */ - -#ifndef _UAPI_LINUX_NFSD_FH_H -#define _UAPI_LINUX_NFSD_FH_H - -#include <linux/types.h> -#include <linux/nfs.h> -#include <linux/nfs2.h> -#include <linux/nfs3.h> -#include <linux/nfs4.h> - -/* - * This is the old "dentry style" Linux NFSv2 file handle. - * - * The xino and xdev fields are currently used to transport the - * ino/dev of the exported inode. - */ -struct nfs_fhbase_old { - __u32 fb_dcookie; /* dentry cookie - always 0xfeebbaca */ - __u32 fb_ino; /* our inode number */ - __u32 fb_dirino; /* dir inode number, 0 for directories */ - __u32 fb_dev; /* our device */ - __u32 fb_xdev; - __u32 fb_xino; - __u32 fb_generation; -}; - -/* - * This is the new flexible, extensible style NFSv2/v3/v4 file handle. - * - * The file handle starts with a sequence of four-byte words. - * The first word contains a version number (1) and three descriptor bytes - * that tell how the remaining 3 variable length fields should be handled. - * These three bytes are auth_type, fsid_type and fileid_type. - * - * All four-byte values are in host-byte-order. - * - * The auth_type field is deprecated and must be set to 0. - * - * The fsid_type identifies how the filesystem (or export point) is - * encoded. - * Current values: - * 0 - 4 byte device id (ms-2-bytes major, ls-2-bytes minor), 4byte inode number - * NOTE: we cannot use the kdev_t device id value, because kdev_t.h - * says we mustn't. We must break it up and reassemble. - * 1 - 4 byte user specified identifier - * 2 - 4 byte major, 4 byte minor, 4 byte inode number - DEPRECATED - * 3 - 4 byte device id, encoded for user-space, 4 byte inode number - * 4 - 4 byte inode number and 4 byte uuid - * 5 - 8 byte uuid - * 6 - 16 byte uuid - * 7 - 8 byte inode number and 16 byte uuid - * - * The fileid_type identified how the file within the filesystem is encoded. - * The values for this field are filesystem specific, exccept that - * filesystems must not use the values '0' or '0xff'. 'See enum fid_type' - * in include/linux/exportfs.h for currently registered values. - */ -struct nfs_fhbase_new { - union { - struct { - __u8 fb_version_aux; /* == 1, even => nfs_fhbase_old */ - __u8 fb_auth_type_aux; - __u8 fb_fsid_type_aux; - __u8 fb_fileid_type_aux; - __u32 fb_auth[1]; - /* __u32 fb_fsid[0]; floating */ - /* __u32 fb_fileid[0]; floating */ - }; - struct { - __u8 fb_version; /* == 1, even => nfs_fhbase_old */ - __u8 fb_auth_type; - __u8 fb_fsid_type; - __u8 fb_fileid_type; - __u32 fb_auth_flex[]; /* flexible-array member */ - }; - }; -}; - -struct knfsd_fh { - unsigned int fh_size; /* significant for NFSv3. - * Points to the current size while building - * a new file handle - */ - union { - struct nfs_fhbase_old fh_old; - __u32 fh_pad[NFS4_FHSIZE/4]; - struct nfs_fhbase_new fh_new; - } fh_base; -}; - -#define ofh_dcookie fh_base.fh_old.fb_dcookie -#define ofh_ino fh_base.fh_old.fb_ino -#define ofh_dirino fh_base.fh_old.fb_dirino -#define ofh_dev fh_base.fh_old.fb_dev -#define ofh_xdev fh_base.fh_old.fb_xdev -#define ofh_xino fh_base.fh_old.fb_xino -#define ofh_generation fh_base.fh_old.fb_generation - -#define fh_version fh_base.fh_new.fb_version -#define fh_fsid_type fh_base.fh_new.fb_fsid_type -#define fh_auth_type fh_base.fh_new.fb_auth_type -#define fh_fileid_type fh_base.fh_new.fb_fileid_type -#define fh_fsid fh_base.fh_new.fb_auth_flex - -/* Do not use, provided for userspace compatiblity. */ -#define fh_auth fh_base.fh_new.fb_auth - -#endif /* _UAPI_LINUX_NFSD_FH_H */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit c645a883df34ee10b884ec921e850def54b7f461 ]
Filehandles not in the "new" or "version 1" format have not been handed out for new mounts since Linux 2.4 which was released 20 years ago. I think it is safe to say that no such file handles are still in use, and that we can drop support for them.
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsfh.c | 160 +++++++++++++++--------------------------------- fs/nfsd/nfsfh.h | 34 +--------- 2 files changed, 54 insertions(+), 140 deletions(-)
diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c index 04930056222b7..7e5a508173a04 100644 --- a/fs/nfsd/nfsfh.c +++ b/fs/nfsd/nfsfh.c @@ -153,11 +153,12 @@ static inline __be32 check_pseudo_root(struct svc_rqst *rqstp, static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp) { struct knfsd_fh *fh = &fhp->fh_handle; - struct fid *fid = NULL, sfid; + struct fid *fid = NULL; struct svc_export *exp; struct dentry *dentry; int fileid_type; int data_left = fh->fh_size/4; + int len; __be32 error;
error = nfserr_stale; @@ -166,48 +167,35 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp) if (rqstp->rq_vers == 4 && fh->fh_size == 0) return nfserr_nofilehandle;
- if (fh->fh_version == 1) { - int len; - - if (--data_left < 0) - return error; - if (fh->fh_auth_type != 0) - return error; - len = key_len(fh->fh_fsid_type) / 4; - if (len == 0) - return error; - if (fh->fh_fsid_type == FSID_MAJOR_MINOR) { - /* deprecated, convert to type 3 */ - len = key_len(FSID_ENCODE_DEV)/4; - fh->fh_fsid_type = FSID_ENCODE_DEV; - /* - * struct knfsd_fh uses host-endian fields, which are - * sometimes used to hold net-endian values. This - * confuses sparse, so we must use __force here to - * keep it from complaining. - */ - fh->fh_fsid[0] = new_encode_dev(MKDEV(ntohl((__force __be32)fh->fh_fsid[0]), - ntohl((__force __be32)fh->fh_fsid[1]))); - fh->fh_fsid[1] = fh->fh_fsid[2]; - } - data_left -= len; - if (data_left < 0) - return error; - exp = rqst_exp_find(rqstp, fh->fh_fsid_type, fh->fh_fsid); - fid = (struct fid *)(fh->fh_fsid + len); - } else { - __u32 tfh[2]; - dev_t xdev; - ino_t xino; - - if (fh->fh_size != NFS_FHSIZE) - return error; - /* assume old filehandle format */ - xdev = old_decode_dev(fh->ofh_xdev); - xino = u32_to_ino_t(fh->ofh_xino); - mk_fsid(FSID_DEV, tfh, xdev, xino, 0, NULL); - exp = rqst_exp_find(rqstp, FSID_DEV, tfh); + if (fh->fh_version != 1) + return error; + + if (--data_left < 0) + return error; + if (fh->fh_auth_type != 0) + return error; + len = key_len(fh->fh_fsid_type) / 4; + if (len == 0) + return error; + if (fh->fh_fsid_type == FSID_MAJOR_MINOR) { + /* deprecated, convert to type 3 */ + len = key_len(FSID_ENCODE_DEV)/4; + fh->fh_fsid_type = FSID_ENCODE_DEV; + /* + * struct knfsd_fh uses host-endian fields, which are + * sometimes used to hold net-endian values. This + * confuses sparse, so we must use __force here to + * keep it from complaining. + */ + fh->fh_fsid[0] = new_encode_dev(MKDEV(ntohl((__force __be32)fh->fh_fsid[0]), + ntohl((__force __be32)fh->fh_fsid[1]))); + fh->fh_fsid[1] = fh->fh_fsid[2]; } + data_left -= len; + if (data_left < 0) + return error; + exp = rqst_exp_find(rqstp, fh->fh_fsid_type, fh->fh_fsid); + fid = (struct fid *)(fh->fh_fsid + len);
error = nfserr_stale; if (IS_ERR(exp)) { @@ -252,18 +240,7 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp) if (rqstp->rq_vers > 2) error = nfserr_badhandle;
- if (fh->fh_version != 1) { - sfid.i32.ino = fh->ofh_ino; - sfid.i32.gen = fh->ofh_generation; - sfid.i32.parent_ino = fh->ofh_dirino; - fid = &sfid; - data_left = 3; - if (fh->ofh_dirino == 0) - fileid_type = FILEID_INO32_GEN; - else - fileid_type = FILEID_INO32_GEN_PARENT; - } else - fileid_type = fh->fh_fileid_type; + fileid_type = fh->fh_fileid_type;
if (fileid_type == FILEID_ROOT) dentry = dget(exp->ex_path.dentry); @@ -451,20 +428,6 @@ static void _fh_update(struct svc_fh *fhp, struct svc_export *exp, } }
-/* - * for composing old style file handles - */ -static inline void _fh_update_old(struct dentry *dentry, - struct svc_export *exp, - struct knfsd_fh *fh) -{ - fh->ofh_ino = ino_t_to_u32(d_inode(dentry)->i_ino); - fh->ofh_generation = d_inode(dentry)->i_generation; - if (d_is_dir(dentry) || - (exp->ex_flags & NFSEXP_NOSUBTREECHECK)) - fh->ofh_dirino = 0; -} - static bool is_root_export(struct svc_export *exp) { return exp->ex_path.dentry == exp->ex_path.dentry->d_sb->s_root; @@ -561,9 +524,6 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry, /* ref_fh is a reference file handle. * if it is non-null and for the same filesystem, then we should compose * a filehandle which is of the same version, where possible. - * Currently, that means that if ref_fh->fh_handle.fh_version == 0xca - * Then create a 32byte filehandle using nfs_fhbase_old - * */
struct inode * inode = d_inode(dentry); @@ -599,35 +559,21 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry, fhp->fh_dentry = dget(dentry); /* our internal copy */ fhp->fh_export = exp_get(exp);
- if (fhp->fh_handle.fh_version == 0xca) { - /* old style filehandle please */ - memset(&fhp->fh_handle.fh_base, 0, NFS_FHSIZE); - fhp->fh_handle.fh_size = NFS_FHSIZE; - fhp->fh_handle.ofh_dcookie = 0xfeebbaca; - fhp->fh_handle.ofh_dev = old_encode_dev(ex_dev); - fhp->fh_handle.ofh_xdev = fhp->fh_handle.ofh_dev; - fhp->fh_handle.ofh_xino = - ino_t_to_u32(d_inode(exp->ex_path.dentry)->i_ino); - fhp->fh_handle.ofh_dirino = ino_t_to_u32(parent_ino(dentry)); - if (inode) - _fh_update_old(dentry, exp, &fhp->fh_handle); - } else { - fhp->fh_handle.fh_size = - key_len(fhp->fh_handle.fh_fsid_type) + 4; - fhp->fh_handle.fh_auth_type = 0; - - mk_fsid(fhp->fh_handle.fh_fsid_type, - fhp->fh_handle.fh_fsid, - ex_dev, - d_inode(exp->ex_path.dentry)->i_ino, - exp->ex_fsid, exp->ex_uuid); - - if (inode) - _fh_update(fhp, exp, dentry); - if (fhp->fh_handle.fh_fileid_type == FILEID_INVALID) { - fh_put(fhp); - return nfserr_opnotsupp; - } + fhp->fh_handle.fh_size = + key_len(fhp->fh_handle.fh_fsid_type) + 4; + fhp->fh_handle.fh_auth_type = 0; + + mk_fsid(fhp->fh_handle.fh_fsid_type, + fhp->fh_handle.fh_fsid, + ex_dev, + d_inode(exp->ex_path.dentry)->i_ino, + exp->ex_fsid, exp->ex_uuid); + + if (inode) + _fh_update(fhp, exp, dentry); + if (fhp->fh_handle.fh_fileid_type == FILEID_INVALID) { + fh_put(fhp); + return nfserr_opnotsupp; }
return 0; @@ -648,16 +594,12 @@ fh_update(struct svc_fh *fhp) dentry = fhp->fh_dentry; if (d_really_is_negative(dentry)) goto out_negative; - if (fhp->fh_handle.fh_version != 1) { - _fh_update_old(dentry, fhp->fh_export, &fhp->fh_handle); - } else { - if (fhp->fh_handle.fh_fileid_type != FILEID_ROOT) - return 0; + if (fhp->fh_handle.fh_fileid_type != FILEID_ROOT) + return 0;
- _fh_update(fhp, fhp->fh_export, dentry); - if (fhp->fh_handle.fh_fileid_type == FILEID_INVALID) - return nfserr_opnotsupp; - } + _fh_update(fhp, fhp->fh_export, dentry); + if (fhp->fh_handle.fh_fileid_type == FILEID_INVALID) + return nfserr_opnotsupp; return 0; out_bad: printk(KERN_ERR "fh_update: fh not verified!\n"); diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h index ad47f16676a8c..8b5587f274a7d 100644 --- a/fs/nfsd/nfsfh.h +++ b/fs/nfsd/nfsfh.h @@ -14,26 +14,7 @@ #include <linux/exportfs.h> #include <linux/nfs4.h>
- -/* - * This is the old "dentry style" Linux NFSv2 file handle. - * - * The xino and xdev fields are currently used to transport the - * ino/dev of the exported inode. - */ -struct nfs_fhbase_old { - u32 fb_dcookie; /* dentry cookie - always 0xfeebbaca */ - u32 fb_ino; /* our inode number */ - u32 fb_dirino; /* dir inode number, 0 for directories */ - u32 fb_dev; /* our device */ - u32 fb_xdev; - u32 fb_xino; - u32 fb_generation; -}; - /* - * This is the new flexible, extensible style NFSv2/v3/v4 file handle. - * * The file handle starts with a sequence of four-byte words. * The first word contains a version number (1) and three descriptor bytes * that tell how the remaining 3 variable length fields should be handled. @@ -57,7 +38,7 @@ struct nfs_fhbase_old { * 6 - 16 byte uuid * 7 - 8 byte inode number and 16 byte uuid * - * The fileid_type identified how the file within the filesystem is encoded. + * The fileid_type identifies how the file within the filesystem is encoded. * The values for this field are filesystem specific, exccept that * filesystems must not use the values '0' or '0xff'. 'See enum fid_type' * in include/linux/exportfs.h for currently registered values. @@ -65,7 +46,7 @@ struct nfs_fhbase_old { struct nfs_fhbase_new { union { struct { - u8 fb_version_aux; /* == 1, even => nfs_fhbase_old */ + u8 fb_version_aux; /* == 1 */ u8 fb_auth_type_aux; u8 fb_fsid_type_aux; u8 fb_fileid_type_aux; @@ -74,7 +55,7 @@ struct nfs_fhbase_new { /* u32 fb_fileid[0]; floating */ }; struct { - u8 fb_version; /* == 1, even => nfs_fhbase_old */ + u8 fb_version; /* == 1 */ u8 fb_auth_type; u8 fb_fsid_type; u8 fb_fileid_type; @@ -89,20 +70,11 @@ struct knfsd_fh { * a new file handle */ union { - struct nfs_fhbase_old fh_old; u32 fh_pad[NFS4_FHSIZE/4]; struct nfs_fhbase_new fh_new; } fh_base; };
-#define ofh_dcookie fh_base.fh_old.fb_dcookie -#define ofh_ino fh_base.fh_old.fb_ino -#define ofh_dirino fh_base.fh_old.fb_dirino -#define ofh_dev fh_base.fh_old.fb_dev -#define ofh_xdev fh_base.fh_old.fb_xdev -#define ofh_xino fh_base.fh_old.fb_xino -#define ofh_generation fh_base.fh_old.fb_generation - #define fh_version fh_base.fh_new.fb_version #define fh_fsid_type fh_base.fh_new.fb_fsid_type #define fh_auth_type fh_base.fh_new.fb_auth_type
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit d8b26071e65e80a348602b939e333242f989221b ]
Most of the fields in 'struct knfsd_fh' are 2 levels deep (a union and a struct) and are accessed using macros like:
#define fh_FOO fh_base.fh_new.fb_FOO
This patch makes the union and struct anonymous, so that "fh_FOO" can be a name directly within 'struct knfsd_fh' and the #defines aren't needed.
The file handle as a whole is sometimes accessed as "fh_base" or "fh_base.fh_pad", neither of which are particularly helpful names. As the struct holding the filehandle is now anonymous, we cannot use the name of that, so we union it with 'fh_raw' and use that where the raw filehandle is needed. fh_raw also ensure the structure is large enough for the largest possible filehandle.
fh_raw is a 'char' array, removing any need to cast it for memcpy etc.
SVCFH_fmt() is simplified using the "%ph" printk format. This changes the appearance of filehandles in dprintk() debugging, making them a little more precise.
Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/flexfilelayout.c | 2 +- fs/nfsd/lockd.c | 2 +- fs/nfsd/nfs3xdr.c | 4 ++-- fs/nfsd/nfs4callback.c | 2 +- fs/nfsd/nfs4proc.c | 4 ++-- fs/nfsd/nfs4state.c | 4 ++-- fs/nfsd/nfs4xdr.c | 4 ++-- fs/nfsd/nfsctl.c | 6 ++--- fs/nfsd/nfsfh.c | 13 ++++------- fs/nfsd/nfsfh.h | 50 ++++++++++++---------------------------- fs/nfsd/nfsxdr.c | 4 ++-- 11 files changed, 35 insertions(+), 60 deletions(-)
diff --git a/fs/nfsd/flexfilelayout.c b/fs/nfsd/flexfilelayout.c index db7ef07ae50c9..2e2f1d5e9f623 100644 --- a/fs/nfsd/flexfilelayout.c +++ b/fs/nfsd/flexfilelayout.c @@ -61,7 +61,7 @@ nfsd4_ff_proc_layoutget(struct inode *inode, const struct svc_fh *fhp, goto out_error;
fl->fh.size = fhp->fh_handle.fh_size; - memcpy(fl->fh.data, &fhp->fh_handle.fh_base, fl->fh.size); + memcpy(fl->fh.data, &fhp->fh_handle.fh_raw, fl->fh.size);
/* Give whole file layout segments */ seg->offset = 0; diff --git a/fs/nfsd/lockd.c b/fs/nfsd/lockd.c index 606fa155c28ad..46a7f9b813e52 100644 --- a/fs/nfsd/lockd.c +++ b/fs/nfsd/lockd.c @@ -35,7 +35,7 @@ nlm_fopen(struct svc_rqst *rqstp, struct nfs_fh *f, struct file **filp, /* must initialize before using! but maxsize doesn't matter */ fh_init(&fh,0); fh.fh_handle.fh_size = f->size; - memcpy((char*)&fh.fh_handle.fh_base, f->data, f->size); + memcpy(&fh.fh_handle.fh_raw, f->data, f->size); fh.fh_export = NULL;
access = (mode == O_WRONLY) ? NFSD_MAY_WRITE : NFSD_MAY_READ; diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 0a5ebc52e6a9c..3d37923afb06c 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -92,7 +92,7 @@ svcxdr_decode_nfs_fh3(struct xdr_stream *xdr, struct svc_fh *fhp) return false; fh_init(fhp, NFS3_FHSIZE); fhp->fh_handle.fh_size = size; - memcpy(&fhp->fh_handle.fh_base, p, size); + memcpy(&fhp->fh_handle.fh_raw, p, size);
return true; } @@ -131,7 +131,7 @@ svcxdr_encode_nfs_fh3(struct xdr_stream *xdr, const struct svc_fh *fhp) *p++ = cpu_to_be32(size); if (size) p[XDR_QUADLEN(size) - 1] = 0; - memcpy(p, &fhp->fh_handle.fh_base, size); + memcpy(p, &fhp->fh_handle.fh_raw, size);
return true; } diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c index 97f517e9b4189..e1272a7f45220 100644 --- a/fs/nfsd/nfs4callback.c +++ b/fs/nfsd/nfs4callback.c @@ -121,7 +121,7 @@ static void encode_nfs_fh4(struct xdr_stream *xdr, const struct knfsd_fh *fh)
BUG_ON(length > NFS4_FHSIZE); p = xdr_reserve_space(xdr, 4 + length); - xdr_encode_opaque(p, &fh->fh_base, length); + xdr_encode_opaque(p, &fh->fh_raw, length); }
/* diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 1f840c72e9780..d55d9b9dbafb1 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -519,7 +519,7 @@ nfsd4_putfh(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
fh_put(&cstate->current_fh); cstate->current_fh.fh_handle.fh_size = putfh->pf_fhlen; - memcpy(&cstate->current_fh.fh_handle.fh_base, putfh->pf_fhval, + memcpy(&cstate->current_fh.fh_handle.fh_raw, putfh->pf_fhval, putfh->pf_fhlen); ret = fh_verify(rqstp, &cstate->current_fh, 0, NFSD_MAY_BYPASS_GSS); #ifdef CONFIG_NFSD_V4_2_INTER_SSC @@ -1379,7 +1379,7 @@ nfsd4_setup_inter_ssc(struct svc_rqst *rqstp, s_fh = &cstate->save_fh;
copy->c_fh.size = s_fh->fh_handle.fh_size; - memcpy(copy->c_fh.data, &s_fh->fh_handle.fh_base, copy->c_fh.size); + memcpy(copy->c_fh.data, &s_fh->fh_handle.fh_raw, copy->c_fh.size); copy->stateid.seqid = cpu_to_be32(s_stid->si_generation); memcpy(copy->stateid.other, (void *)&s_stid->si_opaque, sizeof(stateid_opaque_t)); diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index e9ac77c28741e..68562564be6b2 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -1022,7 +1022,7 @@ static int delegation_blocked(struct knfsd_fh *fh) } spin_unlock(&blocked_delegations_lock); } - hash = jhash(&fh->fh_base, fh->fh_size, 0); + hash = jhash(&fh->fh_raw, fh->fh_size, 0); if (test_bit(hash&255, bd->set[0]) && test_bit((hash>>8)&255, bd->set[0]) && test_bit((hash>>16)&255, bd->set[0])) @@ -1041,7 +1041,7 @@ static void block_delegations(struct knfsd_fh *fh) u32 hash; struct bloom_pair *bd = &blocked_delegations;
- hash = jhash(&fh->fh_base, fh->fh_size, 0); + hash = jhash(&fh->fh_raw, fh->fh_size, 0);
spin_lock(&blocked_delegations_lock); __set_bit(hash&255, bd->set[bd->new]); diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 262c6fec56aa6..899d961372c06 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -3110,7 +3110,7 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, p = xdr_reserve_space(xdr, fhp->fh_handle.fh_size + 4); if (!p) goto out_resource; - p = xdr_encode_opaque(p, &fhp->fh_handle.fh_base, + p = xdr_encode_opaque(p, &fhp->fh_handle.fh_raw, fhp->fh_handle.fh_size); } if (bmval0 & FATTR4_WORD0_FILEID) { @@ -3681,7 +3681,7 @@ nfsd4_encode_getfh(struct nfsd4_compoundres *resp, __be32 nfserr, struct svc_fh p = xdr_reserve_space(xdr, len + 4); if (!p) return nfserr_resource; - p = xdr_encode_opaque(p, &fhp->fh_handle.fh_base, len); + p = xdr_encode_opaque(p, &fhp->fh_handle.fh_raw, len); return 0; }
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index cb73c12925629..d0761ca8cb542 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -395,12 +395,12 @@ static ssize_t write_filehandle(struct file *file, char *buf, size_t size) auth_domain_put(dom); if (len) return len; - + mesg = buf; len = SIMPLE_TRANSACTION_LIMIT; - qword_addhex(&mesg, &len, (char*)&fh.fh_base, fh.fh_size); + qword_addhex(&mesg, &len, fh.fh_raw, fh.fh_size); mesg[-1] = '\n'; - return mesg - buf; + return mesg - buf; }
/* diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c index 7e5a508173a04..34e201b6eb623 100644 --- a/fs/nfsd/nfsfh.c +++ b/fs/nfsd/nfsfh.c @@ -639,16 +639,11 @@ fh_put(struct svc_fh *fhp) char * SVCFH_fmt(struct svc_fh *fhp) { struct knfsd_fh *fh = &fhp->fh_handle; + static char buf[2+1+1+64*3+1];
- static char buf[80]; - sprintf(buf, "%d: %08x %08x %08x %08x %08x %08x", - fh->fh_size, - fh->fh_base.fh_pad[0], - fh->fh_base.fh_pad[1], - fh->fh_base.fh_pad[2], - fh->fh_base.fh_pad[3], - fh->fh_base.fh_pad[4], - fh->fh_base.fh_pad[5]); + if (fh->fh_size < 0 || fh->fh_size> 64) + return "bad-fh"; + sprintf(buf, "%d: %*ph", fh->fh_size, fh->fh_size, fh->fh_raw); return buf; }
diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h index 8b5587f274a7d..d11e4b6870d68 100644 --- a/fs/nfsd/nfsfh.h +++ b/fs/nfsd/nfsfh.h @@ -43,44 +43,24 @@ * filesystems must not use the values '0' or '0xff'. 'See enum fid_type' * in include/linux/exportfs.h for currently registered values. */ -struct nfs_fhbase_new { - union { - struct { - u8 fb_version_aux; /* == 1 */ - u8 fb_auth_type_aux; - u8 fb_fsid_type_aux; - u8 fb_fileid_type_aux; - u32 fb_auth[1]; - /* u32 fb_fsid[0]; floating */ - /* u32 fb_fileid[0]; floating */ - }; - struct { - u8 fb_version; /* == 1 */ - u8 fb_auth_type; - u8 fb_fsid_type; - u8 fb_fileid_type; - u32 fb_auth_flex[]; /* flexible-array member */ - }; - }; -};
struct knfsd_fh { - unsigned int fh_size; /* significant for NFSv3. - * Points to the current size while building - * a new file handle + unsigned int fh_size; /* + * Points to the current size while + * building a new file handle. */ union { - u32 fh_pad[NFS4_FHSIZE/4]; - struct nfs_fhbase_new fh_new; - } fh_base; + char fh_raw[NFS4_FHSIZE]; + struct { + u8 fh_version; /* == 1 */ + u8 fh_auth_type; /* deprecated */ + u8 fh_fsid_type; + u8 fh_fileid_type; + u32 fh_fsid[]; /* flexible-array member */ + }; + }; };
-#define fh_version fh_base.fh_new.fb_version -#define fh_fsid_type fh_base.fh_new.fb_fsid_type -#define fh_auth_type fh_base.fh_new.fb_auth_type -#define fh_fileid_type fh_base.fh_new.fb_fileid_type -#define fh_fsid fh_base.fh_new.fb_auth_flex - static inline __u32 ino_t_to_u32(ino_t ino) { return (__u32) ino; @@ -255,7 +235,7 @@ static inline void fh_copy_shallow(struct knfsd_fh *dst, struct knfsd_fh *src) { dst->fh_size = src->fh_size; - memcpy(&dst->fh_base, &src->fh_base, src->fh_size); + memcpy(&dst->fh_raw, &src->fh_raw, src->fh_size); }
static __inline__ struct svc_fh * @@ -270,7 +250,7 @@ static inline bool fh_match(struct knfsd_fh *fh1, struct knfsd_fh *fh2) { if (fh1->fh_size != fh2->fh_size) return false; - if (memcmp(fh1->fh_base.fh_pad, fh2->fh_base.fh_pad, fh1->fh_size) != 0) + if (memcmp(fh1->fh_raw, fh2->fh_raw, fh1->fh_size) != 0) return false; return true; } @@ -294,7 +274,7 @@ static inline bool fh_fsid_match(struct knfsd_fh *fh1, struct knfsd_fh *fh2) */ static inline u32 knfsd_fh_hash(const struct knfsd_fh *fh) { - return ~crc32_le(0xFFFFFFFF, (unsigned char *)&fh->fh_base, fh->fh_size); + return ~crc32_le(0xFFFFFFFF, fh->fh_raw, fh->fh_size); } #else static inline u32 knfsd_fh_hash(const struct knfsd_fh *fh) diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index a06c05fe3b421..082449c7d0dbf 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -64,7 +64,7 @@ svcxdr_decode_fhandle(struct xdr_stream *xdr, struct svc_fh *fhp) if (!p) return false; fh_init(fhp, NFS_FHSIZE); - memcpy(&fhp->fh_handle.fh_base, p, NFS_FHSIZE); + memcpy(&fhp->fh_handle.fh_raw, p, NFS_FHSIZE); fhp->fh_handle.fh_size = NFS_FHSIZE;
return true; @@ -78,7 +78,7 @@ svcxdr_encode_fhandle(struct xdr_stream *xdr, const struct svc_fh *fhp) p = xdr_reserve_space(xdr, NFS_FHSIZE); if (!p) return false; - memcpy(p, &fhp->fh_handle.fh_base, NFS_FHSIZE); + memcpy(p, &fhp->fh_handle.fh_raw, NFS_FHSIZE);
return true; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Colin Ian King colin.king@canonical.com
[ Upstream commit 8e70bf27fd20cc17e87150327a640e546bfbee64 ]
Pointer ni is being initialized with plain integer zero. Fix this by initializing with NULL.
Signed-off-by: Colin Ian King colin.king@canonical.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 2 +- fs/nfsd/nfs4state.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index d55d9b9dbafb1..ebb6d8471e8d7 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1181,7 +1181,7 @@ extern void nfs_sb_deactive(struct super_block *sb); static __be32 nfsd4_ssc_setup_dul(struct nfsd_net *nn, char *ipaddr, struct nfsd4_ssc_umount_item **retwork, struct vfsmount **ss_mnt) { - struct nfsd4_ssc_umount_item *ni = 0; + struct nfsd4_ssc_umount_item *ni = NULL; struct nfsd4_ssc_umount_item *work = NULL; struct nfsd4_ssc_umount_item *tmp; DEFINE_WAIT(wait); diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 68562564be6b2..0b39ed1568d40 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -5573,7 +5573,7 @@ static void nfsd4_ssc_shutdown_umount(struct nfsd_net *nn) static void nfsd4_ssc_expire_umount(struct nfsd_net *nn) { bool do_wakeup = false; - struct nfsd4_ssc_umount_item *ni = 0; + struct nfsd4_ssc_umount_item *ni = NULL; struct nfsd4_ssc_umount_item *tmp;
spin_lock(&nn->nfsd_ssc_lock);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit dae9a6cab8009e526570e7477ce858dcdfeb256e ]
Refactor.
Now that the NFSv2 and NFSv3 XDR decoders have been converted to use xdr_streams, the WRITE decoder functions can use xdr_stream_subsegment() to extract the WRITE payload into its own xdr_buf, just as the NFSv4 WRITE XDR decoder currently does.
That makes it possible to pass the first kvec, pages array + length, page_base, and total payload length via a single function parameter.
The payload's page_base is not yet assigned or used, but will be in subsequent patches.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com [ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3proc.c | 3 +-- fs/nfsd/nfs3xdr.c | 12 ++---------- fs/nfsd/nfs4proc.c | 3 +-- fs/nfsd/nfsproc.c | 3 +-- fs/nfsd/nfsxdr.c | 9 +-------- fs/nfsd/xdr.h | 2 +- fs/nfsd/xdr3.h | 2 +- include/linux/sunrpc/svc.h | 3 +-- net/sunrpc/svc.c | 11 ++++++----- 9 files changed, 15 insertions(+), 33 deletions(-)
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index be1ed33e424e0..5abb5c8e2cd21 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -206,8 +206,7 @@ nfsd3_proc_write(struct svc_rqst *rqstp)
fh_copy(&resp->fh, &argp->fh); resp->committed = argp->stable; - nvecs = svc_fill_write_vector(rqstp, rqstp->rq_arg.pages, - &argp->first, cnt); + nvecs = svc_fill_write_vector(rqstp, &argp->payload); if (!nvecs) { resp->status = nfserr_io; goto out; diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 3d37923afb06c..267e56f218af7 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -621,9 +621,6 @@ nfs3svc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p) struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_writeargs *args = rqstp->rq_argp; u32 max_blocksize = svc_max_payload(rqstp); - struct kvec *head = rqstp->rq_arg.head; - struct kvec *tail = rqstp->rq_arg.tail; - size_t remaining;
if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) return 0; @@ -641,17 +638,12 @@ nfs3svc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p) /* request sanity */ if (args->count != args->len) return 0; - remaining = head->iov_len + rqstp->rq_arg.page_len + tail->iov_len; - remaining -= xdr_stream_pos(xdr); - if (remaining < xdr_align_size(args->len)) - return 0; if (args->count > max_blocksize) { args->count = max_blocksize; args->len = max_blocksize; } - - args->first.iov_base = xdr->p; - args->first.iov_len = head->iov_len - xdr_stream_pos(xdr); + if (!xdr_stream_subsegment(xdr, &args->payload, args->count)) + return 0;
return 1; } diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index ebb6d8471e8d7..d2ee1ba7ddc65 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1034,8 +1034,7 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
write->wr_how_written = write->wr_stable_how;
- nvecs = svc_fill_write_vector(rqstp, write->wr_payload.pages, - write->wr_payload.head, write->wr_buflen); + nvecs = svc_fill_write_vector(rqstp, &write->wr_payload); WARN_ON_ONCE(nvecs > ARRAY_SIZE(rqstp->rq_vec));
status = nfsd_vfs_write(rqstp, &cstate->current_fh, nf, diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 78bdfdc253fd3..5a84aed17e705 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -234,8 +234,7 @@ nfsd_proc_write(struct svc_rqst *rqstp) SVCFH_fmt(&argp->fh), argp->len, argp->offset);
- nvecs = svc_fill_write_vector(rqstp, rqstp->rq_arg.pages, - &argp->first, cnt); + nvecs = svc_fill_write_vector(rqstp, &argp->payload); if (!nvecs) { resp->status = nfserr_io; goto out; diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 082449c7d0dbf..ddcc18adfeb1a 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -325,10 +325,7 @@ nfssvc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p) { struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_writeargs *args = rqstp->rq_argp; - struct kvec *head = rqstp->rq_arg.head; - struct kvec *tail = rqstp->rq_arg.tail; u32 beginoffset, totalcount; - size_t remaining;
if (!svcxdr_decode_fhandle(xdr, &args->fh)) return 0; @@ -346,12 +343,8 @@ nfssvc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p) return 0; if (args->len > NFSSVC_MAXBLKSIZE_V2) return 0; - remaining = head->iov_len + rqstp->rq_arg.page_len + tail->iov_len; - remaining -= xdr_stream_pos(xdr); - if (remaining < xdr_align_size(args->len)) + if (!xdr_stream_subsegment(xdr, &args->payload, args->len)) return 0; - args->first.iov_base = xdr->p; - args->first.iov_len = head->iov_len - xdr_stream_pos(xdr);
return 1; } diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index c67ad02b9a028..863a35f24910a 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -33,7 +33,7 @@ struct nfsd_writeargs { svc_fh fh; __u32 offset; __u32 len; - struct kvec first; + struct xdr_buf payload; };
struct nfsd_createargs { diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 933008382bbeb..712c117300cb7 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -40,7 +40,7 @@ struct nfsd3_writeargs { __u32 count; int stable; __u32 len; - struct kvec first; + struct xdr_buf payload; };
struct nfsd3_createargs { diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index dd3daadbc0e5c..b0986e969c2f4 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -531,8 +531,7 @@ int svc_encode_result_payload(struct svc_rqst *rqstp, unsigned int offset, unsigned int length); unsigned int svc_fill_write_vector(struct svc_rqst *rqstp, - struct page **pages, - struct kvec *first, size_t total); + struct xdr_buf *payload); char *svc_fill_symlink_pathname(struct svc_rqst *rqstp, struct kvec *first, void *p, size_t total); diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 0d3c3ca2830a8..54f66f66beb59 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -1654,16 +1654,17 @@ EXPORT_SYMBOL_GPL(svc_encode_result_payload); /** * svc_fill_write_vector - Construct data argument for VFS write call * @rqstp: svc_rqst to operate on - * @pages: list of pages containing data payload - * @first: buffer containing first section of write payload - * @total: total number of bytes of write payload + * @payload: xdr_buf containing only the write data payload * * Fills in rqstp::rq_vec, and returns the number of elements. */ -unsigned int svc_fill_write_vector(struct svc_rqst *rqstp, struct page **pages, - struct kvec *first, size_t total) +unsigned int svc_fill_write_vector(struct svc_rqst *rqstp, + struct xdr_buf *payload) { + struct page **pages = payload->pages; + struct kvec *first = payload->head; struct kvec *vec = rqstp->rq_vec; + size_t total = payload->len; unsigned int i;
/* Some types of transport can present the write payload
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 16c663642c7ec03cd4cee5fec520bb69e97babe4 ]
The passed-in value of the "__be32 *p" parameter is now unused in every server-side XDR decoder, and can be removed.
Note also that there is a line in each decoder that sets up a local pointer to a struct xdr_stream. Passing that pointer from the dispatcher instead saves one line per decoder function.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svc.c | 3 +-- fs/lockd/xdr.c | 27 +++++++++-------------- fs/lockd/xdr4.c | 27 +++++++++-------------- fs/nfsd/nfs2acl.c | 12 +++++----- fs/nfsd/nfs3acl.c | 8 +++---- fs/nfsd/nfs3xdr.c | 45 +++++++++++++------------------------- fs/nfsd/nfs4xdr.c | 4 ++-- fs/nfsd/nfsd.h | 3 ++- fs/nfsd/nfssvc.c | 7 +++--- fs/nfsd/nfsxdr.c | 30 +++++++++---------------- fs/nfsd/xdr.h | 21 +++++++++--------- fs/nfsd/xdr3.h | 31 +++++++++++++------------- fs/nfsd/xdr4.h | 2 +- include/linux/lockd/xdr.h | 19 ++++++++-------- include/linux/lockd/xdr4.h | 19 ++++++++-------- include/linux/sunrpc/svc.h | 3 ++- 16 files changed, 112 insertions(+), 149 deletions(-)
diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index b632be3ad57b2..9a82471bda071 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -780,11 +780,10 @@ module_exit(exit_nlm); static int nlmsvc_dispatch(struct svc_rqst *rqstp, __be32 *statp) { const struct svc_procedure *procp = rqstp->rq_procinfo; - struct kvec *argv = rqstp->rq_arg.head; struct kvec *resv = rqstp->rq_res.head;
svcxdr_init_decode(rqstp); - if (!procp->pc_decode(rqstp, argv->iov_base)) + if (!procp->pc_decode(rqstp, &rqstp->rq_arg_stream)) goto out_decode_err;
*statp = procp->pc_func(rqstp); diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c index 9235e60b17694..895f152221048 100644 --- a/fs/lockd/xdr.c +++ b/fs/lockd/xdr.c @@ -146,15 +146,14 @@ svcxdr_encode_testrply(struct xdr_stream *xdr, const struct nlm_res *resp) */
int -nlmsvc_decode_void(struct svc_rqst *rqstp, __be32 *p) +nlmsvc_decode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr) { return 1; }
int -nlmsvc_decode_testargs(struct svc_rqst *rqstp, __be32 *p) +nlmsvc_decode_testargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_args *argp = rqstp->rq_argp; u32 exclusive;
@@ -171,9 +170,8 @@ nlmsvc_decode_testargs(struct svc_rqst *rqstp, __be32 *p) }
int -nlmsvc_decode_lockargs(struct svc_rqst *rqstp, __be32 *p) +nlmsvc_decode_lockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_args *argp = rqstp->rq_argp; u32 exclusive;
@@ -197,9 +195,8 @@ nlmsvc_decode_lockargs(struct svc_rqst *rqstp, __be32 *p) }
int -nlmsvc_decode_cancargs(struct svc_rqst *rqstp, __be32 *p) +nlmsvc_decode_cancargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_args *argp = rqstp->rq_argp; u32 exclusive;
@@ -218,9 +215,8 @@ nlmsvc_decode_cancargs(struct svc_rqst *rqstp, __be32 *p) }
int -nlmsvc_decode_unlockargs(struct svc_rqst *rqstp, __be32 *p) +nlmsvc_decode_unlockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_args *argp = rqstp->rq_argp;
if (!svcxdr_decode_cookie(xdr, &argp->cookie)) @@ -233,9 +229,8 @@ nlmsvc_decode_unlockargs(struct svc_rqst *rqstp, __be32 *p) }
int -nlmsvc_decode_res(struct svc_rqst *rqstp, __be32 *p) +nlmsvc_decode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_res *resp = rqstp->rq_argp;
if (!svcxdr_decode_cookie(xdr, &resp->cookie)) @@ -247,10 +242,10 @@ nlmsvc_decode_res(struct svc_rqst *rqstp, __be32 *p) }
int -nlmsvc_decode_reboot(struct svc_rqst *rqstp, __be32 *p) +nlmsvc_decode_reboot(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_reboot *argp = rqstp->rq_argp; + __be32 *p; u32 len;
if (xdr_stream_decode_u32(xdr, &len) < 0) @@ -273,9 +268,8 @@ nlmsvc_decode_reboot(struct svc_rqst *rqstp, __be32 *p) }
int -nlmsvc_decode_shareargs(struct svc_rqst *rqstp, __be32 *p) +nlmsvc_decode_shareargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_args *argp = rqstp->rq_argp; struct nlm_lock *lock = &argp->lock;
@@ -301,9 +295,8 @@ nlmsvc_decode_shareargs(struct svc_rqst *rqstp, __be32 *p) }
int -nlmsvc_decode_notify(struct svc_rqst *rqstp, __be32 *p) +nlmsvc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_args *argp = rqstp->rq_argp; struct nlm_lock *lock = &argp->lock;
diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index 98e957e4566c2..573c7d580a5e6 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -145,15 +145,14 @@ svcxdr_encode_testrply(struct xdr_stream *xdr, const struct nlm_res *resp) */
int -nlm4svc_decode_void(struct svc_rqst *rqstp, __be32 *p) +nlm4svc_decode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr) { return 1; }
int -nlm4svc_decode_testargs(struct svc_rqst *rqstp, __be32 *p) +nlm4svc_decode_testargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_args *argp = rqstp->rq_argp; u32 exclusive;
@@ -170,9 +169,8 @@ nlm4svc_decode_testargs(struct svc_rqst *rqstp, __be32 *p) }
int -nlm4svc_decode_lockargs(struct svc_rqst *rqstp, __be32 *p) +nlm4svc_decode_lockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_args *argp = rqstp->rq_argp; u32 exclusive;
@@ -196,9 +194,8 @@ nlm4svc_decode_lockargs(struct svc_rqst *rqstp, __be32 *p) }
int -nlm4svc_decode_cancargs(struct svc_rqst *rqstp, __be32 *p) +nlm4svc_decode_cancargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_args *argp = rqstp->rq_argp; u32 exclusive;
@@ -216,9 +213,8 @@ nlm4svc_decode_cancargs(struct svc_rqst *rqstp, __be32 *p) }
int -nlm4svc_decode_unlockargs(struct svc_rqst *rqstp, __be32 *p) +nlm4svc_decode_unlockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_args *argp = rqstp->rq_argp;
if (!svcxdr_decode_cookie(xdr, &argp->cookie)) @@ -231,9 +227,8 @@ nlm4svc_decode_unlockargs(struct svc_rqst *rqstp, __be32 *p) }
int -nlm4svc_decode_res(struct svc_rqst *rqstp, __be32 *p) +nlm4svc_decode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_res *resp = rqstp->rq_argp;
if (!svcxdr_decode_cookie(xdr, &resp->cookie)) @@ -245,10 +240,10 @@ nlm4svc_decode_res(struct svc_rqst *rqstp, __be32 *p) }
int -nlm4svc_decode_reboot(struct svc_rqst *rqstp, __be32 *p) +nlm4svc_decode_reboot(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_reboot *argp = rqstp->rq_argp; + __be32 *p; u32 len;
if (xdr_stream_decode_u32(xdr, &len) < 0) @@ -271,9 +266,8 @@ nlm4svc_decode_reboot(struct svc_rqst *rqstp, __be32 *p) }
int -nlm4svc_decode_shareargs(struct svc_rqst *rqstp, __be32 *p) +nlm4svc_decode_shareargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_args *argp = rqstp->rq_argp; struct nlm_lock *lock = &argp->lock;
@@ -299,9 +293,8 @@ nlm4svc_decode_shareargs(struct svc_rqst *rqstp, __be32 *p) }
int -nlm4svc_decode_notify(struct svc_rqst *rqstp, __be32 *p) +nlm4svc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_args *argp = rqstp->rq_argp; struct nlm_lock *lock = &argp->lock;
diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index 8703326fc1654..53f793c3606d6 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -186,9 +186,9 @@ static __be32 nfsacld_proc_access(struct svc_rqst *rqstp) * XDR decode functions */
-static int nfsaclsvc_decode_getaclargs(struct svc_rqst *rqstp, __be32 *p) +static int +nfsaclsvc_decode_getaclargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_getaclargs *argp = rqstp->rq_argp;
if (!svcxdr_decode_fhandle(xdr, &argp->fh)) @@ -199,9 +199,9 @@ static int nfsaclsvc_decode_getaclargs(struct svc_rqst *rqstp, __be32 *p) return 1; }
-static int nfsaclsvc_decode_setaclargs(struct svc_rqst *rqstp, __be32 *p) +static int +nfsaclsvc_decode_setaclargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_setaclargs *argp = rqstp->rq_argp;
if (!svcxdr_decode_fhandle(xdr, &argp->fh)) @@ -220,9 +220,9 @@ static int nfsaclsvc_decode_setaclargs(struct svc_rqst *rqstp, __be32 *p) return 1; }
-static int nfsaclsvc_decode_accessargs(struct svc_rqst *rqstp, __be32 *p) +static int +nfsaclsvc_decode_accessargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_accessargs *args = rqstp->rq_argp;
if (!svcxdr_decode_fhandle(xdr, &args->fh)) diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c index 5e13e5f7f92b8..37c8fb184ca4d 100644 --- a/fs/nfsd/nfs3acl.c +++ b/fs/nfsd/nfs3acl.c @@ -125,9 +125,9 @@ static __be32 nfsd3_proc_setacl(struct svc_rqst *rqstp) * XDR decode functions */
-static int nfs3svc_decode_getaclargs(struct svc_rqst *rqstp, __be32 *p) +static int +nfs3svc_decode_getaclargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_getaclargs *args = rqstp->rq_argp;
if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) @@ -138,9 +138,9 @@ static int nfs3svc_decode_getaclargs(struct svc_rqst *rqstp, __be32 *p) return 1; }
-static int nfs3svc_decode_setaclargs(struct svc_rqst *rqstp, __be32 *p) +static int +nfs3svc_decode_setaclargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_setaclargs *argp = rqstp->rq_argp;
if (!svcxdr_decode_nfs_fh3(xdr, &argp->fh)) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 267e56f218af7..5f744f03cda7c 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -557,18 +557,16 @@ void fill_post_wcc(struct svc_fh *fhp) */
int -nfs3svc_decode_fhandleargs(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_decode_fhandleargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_fhandle *args = rqstp->rq_argp;
return svcxdr_decode_nfs_fh3(xdr, &args->fh); }
int -nfs3svc_decode_sattrargs(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_decode_sattrargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_sattrargs *args = rqstp->rq_argp;
return svcxdr_decode_nfs_fh3(xdr, &args->fh) && @@ -577,18 +575,16 @@ nfs3svc_decode_sattrargs(struct svc_rqst *rqstp, __be32 *p) }
int -nfs3svc_decode_diropargs(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_decode_diropargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_diropargs *args = rqstp->rq_argp;
return svcxdr_decode_diropargs3(xdr, &args->fh, &args->name, &args->len); }
int -nfs3svc_decode_accessargs(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_decode_accessargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_accessargs *args = rqstp->rq_argp;
if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) @@ -600,9 +596,8 @@ nfs3svc_decode_accessargs(struct svc_rqst *rqstp, __be32 *p) }
int -nfs3svc_decode_readargs(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_decode_readargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_readargs *args = rqstp->rq_argp;
if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) @@ -616,9 +611,8 @@ nfs3svc_decode_readargs(struct svc_rqst *rqstp, __be32 *p) }
int -nfs3svc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_decode_writeargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_writeargs *args = rqstp->rq_argp; u32 max_blocksize = svc_max_payload(rqstp);
@@ -649,9 +643,8 @@ nfs3svc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p) }
int -nfs3svc_decode_createargs(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_decode_createargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_createargs *args = rqstp->rq_argp;
if (!svcxdr_decode_diropargs3(xdr, &args->fh, &args->name, &args->len)) @@ -674,9 +667,8 @@ nfs3svc_decode_createargs(struct svc_rqst *rqstp, __be32 *p) }
int -nfs3svc_decode_mkdirargs(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_decode_mkdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_createargs *args = rqstp->rq_argp;
return svcxdr_decode_diropargs3(xdr, &args->fh, @@ -685,9 +677,8 @@ nfs3svc_decode_mkdirargs(struct svc_rqst *rqstp, __be32 *p) }
int -nfs3svc_decode_symlinkargs(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_decode_symlinkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_symlinkargs *args = rqstp->rq_argp; struct kvec *head = rqstp->rq_arg.head; struct kvec *tail = rqstp->rq_arg.tail; @@ -713,9 +704,8 @@ nfs3svc_decode_symlinkargs(struct svc_rqst *rqstp, __be32 *p) }
int -nfs3svc_decode_mknodargs(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_decode_mknodargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_mknodargs *args = rqstp->rq_argp;
if (!svcxdr_decode_diropargs3(xdr, &args->fh, &args->name, &args->len)) @@ -742,9 +732,8 @@ nfs3svc_decode_mknodargs(struct svc_rqst *rqstp, __be32 *p) }
int -nfs3svc_decode_renameargs(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_decode_renameargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_renameargs *args = rqstp->rq_argp;
return svcxdr_decode_diropargs3(xdr, &args->ffh, @@ -754,9 +743,8 @@ nfs3svc_decode_renameargs(struct svc_rqst *rqstp, __be32 *p) }
int -nfs3svc_decode_linkargs(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_decode_linkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_linkargs *args = rqstp->rq_argp;
return svcxdr_decode_nfs_fh3(xdr, &args->ffh) && @@ -765,9 +753,8 @@ nfs3svc_decode_linkargs(struct svc_rqst *rqstp, __be32 *p) }
int -nfs3svc_decode_readdirargs(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_decode_readdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_readdirargs *args = rqstp->rq_argp;
if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) @@ -784,9 +771,8 @@ nfs3svc_decode_readdirargs(struct svc_rqst *rqstp, __be32 *p) }
int -nfs3svc_decode_readdirplusargs(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_decode_readdirplusargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_readdirargs *args = rqstp->rq_argp; u32 dircount;
@@ -807,9 +793,8 @@ nfs3svc_decode_readdirplusargs(struct svc_rqst *rqstp, __be32 *p) }
int -nfs3svc_decode_commitargs(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_decode_commitargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_commitargs *args = rqstp->rq_argp;
if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 899d961372c06..5ee5081c56637 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -5423,14 +5423,14 @@ void nfsd4_release_compoundargs(struct svc_rqst *rqstp) }
int -nfs4svc_decode_compoundargs(struct svc_rqst *rqstp, __be32 *p) +nfs4svc_decode_compoundargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd4_compoundargs *args = rqstp->rq_argp;
/* svcxdr_tmp_alloc */ args->to_free = NULL;
- args->xdr = &rqstp->rq_arg_stream; + args->xdr = xdr; args->ops = args->iops; args->rqstp = rqstp;
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 9664303afdaf3..6e8ad5f9757c8 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -78,7 +78,8 @@ extern const struct seq_operations nfs_exports_op; */ struct nfsd_voidargs { }; struct nfsd_voidres { }; -int nfssvc_decode_voidarg(struct svc_rqst *rqstp, __be32 *p); +int nfssvc_decode_voidarg(struct svc_rqst *rqstp, + struct xdr_stream *xdr); int nfssvc_encode_voidres(struct svc_rqst *rqstp, __be32 *p);
/* diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 373695cc62a7a..be1d656548cfe 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -1004,7 +1004,6 @@ nfsd(void *vrqstp) int nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp) { const struct svc_procedure *proc = rqstp->rq_procinfo; - struct kvec *argv = &rqstp->rq_arg.head[0]; struct kvec *resv = &rqstp->rq_res.head[0]; __be32 *p;
@@ -1015,7 +1014,7 @@ int nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp) rqstp->rq_cachetype = proc->pc_cachetype;
svcxdr_init_decode(rqstp); - if (!proc->pc_decode(rqstp, argv->iov_base)) + if (!proc->pc_decode(rqstp, &rqstp->rq_arg_stream)) goto out_decode_err;
switch (nfsd_cache_lookup(rqstp)) { @@ -1065,13 +1064,13 @@ int nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp) /** * nfssvc_decode_voidarg - Decode void arguments * @rqstp: Server RPC transaction context - * @p: buffer containing arguments to decode + * @xdr: XDR stream positioned at arguments to decode * * Return values: * %0: Arguments were not valid * %1: Decoding was successful */ -int nfssvc_decode_voidarg(struct svc_rqst *rqstp, __be32 *p) +int nfssvc_decode_voidarg(struct svc_rqst *rqstp, struct xdr_stream *xdr) { return 1; } diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index ddcc18adfeb1a..08e899180ee43 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -273,18 +273,16 @@ svcxdr_encode_fattr(struct svc_rqst *rqstp, struct xdr_stream *xdr, */
int -nfssvc_decode_fhandleargs(struct svc_rqst *rqstp, __be32 *p) +nfssvc_decode_fhandleargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_fhandle *args = rqstp->rq_argp;
return svcxdr_decode_fhandle(xdr, &args->fh); }
int -nfssvc_decode_sattrargs(struct svc_rqst *rqstp, __be32 *p) +nfssvc_decode_sattrargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_sattrargs *args = rqstp->rq_argp;
return svcxdr_decode_fhandle(xdr, &args->fh) && @@ -292,18 +290,16 @@ nfssvc_decode_sattrargs(struct svc_rqst *rqstp, __be32 *p) }
int -nfssvc_decode_diropargs(struct svc_rqst *rqstp, __be32 *p) +nfssvc_decode_diropargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_diropargs *args = rqstp->rq_argp;
return svcxdr_decode_diropargs(xdr, &args->fh, &args->name, &args->len); }
int -nfssvc_decode_readargs(struct svc_rqst *rqstp, __be32 *p) +nfssvc_decode_readargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_readargs *args = rqstp->rq_argp; u32 totalcount;
@@ -321,9 +317,8 @@ nfssvc_decode_readargs(struct svc_rqst *rqstp, __be32 *p) }
int -nfssvc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p) +nfssvc_decode_writeargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_writeargs *args = rqstp->rq_argp; u32 beginoffset, totalcount;
@@ -350,9 +345,8 @@ nfssvc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p) }
int -nfssvc_decode_createargs(struct svc_rqst *rqstp, __be32 *p) +nfssvc_decode_createargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_createargs *args = rqstp->rq_argp;
return svcxdr_decode_diropargs(xdr, &args->fh, @@ -361,9 +355,8 @@ nfssvc_decode_createargs(struct svc_rqst *rqstp, __be32 *p) }
int -nfssvc_decode_renameargs(struct svc_rqst *rqstp, __be32 *p) +nfssvc_decode_renameargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_renameargs *args = rqstp->rq_argp;
return svcxdr_decode_diropargs(xdr, &args->ffh, @@ -373,9 +366,8 @@ nfssvc_decode_renameargs(struct svc_rqst *rqstp, __be32 *p) }
int -nfssvc_decode_linkargs(struct svc_rqst *rqstp, __be32 *p) +nfssvc_decode_linkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_linkargs *args = rqstp->rq_argp;
return svcxdr_decode_fhandle(xdr, &args->ffh) && @@ -384,9 +376,8 @@ nfssvc_decode_linkargs(struct svc_rqst *rqstp, __be32 *p) }
int -nfssvc_decode_symlinkargs(struct svc_rqst *rqstp, __be32 *p) +nfssvc_decode_symlinkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_symlinkargs *args = rqstp->rq_argp; struct kvec *head = rqstp->rq_arg.head;
@@ -405,9 +396,8 @@ nfssvc_decode_symlinkargs(struct svc_rqst *rqstp, __be32 *p) }
int -nfssvc_decode_readdirargs(struct svc_rqst *rqstp, __be32 *p) +nfssvc_decode_readdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_readdirargs *args = rqstp->rq_argp;
if (!svcxdr_decode_fhandle(xdr, &args->fh)) diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index 863a35f24910a..19e281382bb98 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -141,16 +141,17 @@ union nfsd_xdrstore { #define NFS2_SVC_XDRSIZE sizeof(union nfsd_xdrstore)
-int nfssvc_decode_fhandleargs(struct svc_rqst *, __be32 *); -int nfssvc_decode_sattrargs(struct svc_rqst *, __be32 *); -int nfssvc_decode_diropargs(struct svc_rqst *, __be32 *); -int nfssvc_decode_readargs(struct svc_rqst *, __be32 *); -int nfssvc_decode_writeargs(struct svc_rqst *, __be32 *); -int nfssvc_decode_createargs(struct svc_rqst *, __be32 *); -int nfssvc_decode_renameargs(struct svc_rqst *, __be32 *); -int nfssvc_decode_linkargs(struct svc_rqst *, __be32 *); -int nfssvc_decode_symlinkargs(struct svc_rqst *, __be32 *); -int nfssvc_decode_readdirargs(struct svc_rqst *, __be32 *); +int nfssvc_decode_fhandleargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfssvc_decode_sattrargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfssvc_decode_diropargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfssvc_decode_readargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfssvc_decode_writeargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfssvc_decode_createargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfssvc_decode_renameargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfssvc_decode_linkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfssvc_decode_symlinkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfssvc_decode_readdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); + int nfssvc_encode_statres(struct svc_rqst *, __be32 *); int nfssvc_encode_attrstatres(struct svc_rqst *, __be32 *); int nfssvc_encode_diropres(struct svc_rqst *, __be32 *); diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 712c117300cb7..60a8909205e5a 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -265,21 +265,22 @@ union nfsd3_xdrstore {
#define NFS3_SVC_XDRSIZE sizeof(union nfsd3_xdrstore)
-int nfs3svc_decode_fhandleargs(struct svc_rqst *, __be32 *); -int nfs3svc_decode_sattrargs(struct svc_rqst *, __be32 *); -int nfs3svc_decode_diropargs(struct svc_rqst *, __be32 *); -int nfs3svc_decode_accessargs(struct svc_rqst *, __be32 *); -int nfs3svc_decode_readargs(struct svc_rqst *, __be32 *); -int nfs3svc_decode_writeargs(struct svc_rqst *, __be32 *); -int nfs3svc_decode_createargs(struct svc_rqst *, __be32 *); -int nfs3svc_decode_mkdirargs(struct svc_rqst *, __be32 *); -int nfs3svc_decode_mknodargs(struct svc_rqst *, __be32 *); -int nfs3svc_decode_renameargs(struct svc_rqst *, __be32 *); -int nfs3svc_decode_linkargs(struct svc_rqst *, __be32 *); -int nfs3svc_decode_symlinkargs(struct svc_rqst *, __be32 *); -int nfs3svc_decode_readdirargs(struct svc_rqst *, __be32 *); -int nfs3svc_decode_readdirplusargs(struct svc_rqst *, __be32 *); -int nfs3svc_decode_commitargs(struct svc_rqst *, __be32 *); +int nfs3svc_decode_fhandleargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_decode_sattrargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_decode_diropargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_decode_accessargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_decode_readargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_decode_writeargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_decode_createargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_decode_mkdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_decode_mknodargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_decode_renameargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_decode_linkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_decode_symlinkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_decode_readdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_decode_readdirplusargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_decode_commitargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); + int nfs3svc_encode_getattrres(struct svc_rqst *, __be32 *); int nfs3svc_encode_wccstat(struct svc_rqst *, __be32 *); int nfs3svc_encode_lookupres(struct svc_rqst *, __be32 *); diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 37af86e370cb3..c3e18efcd23b6 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -756,7 +756,7 @@ set_change_info(struct nfsd4_change_info *cinfo, struct svc_fh *fhp)
bool nfsd4_mach_creds_match(struct nfs4_client *cl, struct svc_rqst *rqstp); -int nfs4svc_decode_compoundargs(struct svc_rqst *, __be32 *); +int nfs4svc_decode_compoundargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); int nfs4svc_encode_compoundres(struct svc_rqst *, __be32 *); __be32 nfsd4_check_resp_size(struct nfsd4_compoundres *, u32); void nfsd4_encode_operation(struct nfsd4_compoundres *, struct nfsd4_op *); diff --git a/include/linux/lockd/xdr.h b/include/linux/lockd/xdr.h index a98309c0121cb..170ad6f5596a0 100644 --- a/include/linux/lockd/xdr.h +++ b/include/linux/lockd/xdr.h @@ -96,18 +96,19 @@ struct nlm_reboot { */ #define NLMSVC_XDRSIZE sizeof(struct nlm_args)
-int nlmsvc_decode_testargs(struct svc_rqst *, __be32 *); +int nlmsvc_decode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlmsvc_decode_testargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlmsvc_decode_lockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlmsvc_decode_cancargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlmsvc_decode_unlockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlmsvc_decode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlmsvc_decode_reboot(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlmsvc_decode_shareargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlmsvc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr); + int nlmsvc_encode_testres(struct svc_rqst *, __be32 *); -int nlmsvc_decode_lockargs(struct svc_rqst *, __be32 *); -int nlmsvc_decode_cancargs(struct svc_rqst *, __be32 *); -int nlmsvc_decode_unlockargs(struct svc_rqst *, __be32 *); int nlmsvc_encode_res(struct svc_rqst *, __be32 *); -int nlmsvc_decode_res(struct svc_rqst *, __be32 *); int nlmsvc_encode_void(struct svc_rqst *, __be32 *); -int nlmsvc_decode_void(struct svc_rqst *, __be32 *); -int nlmsvc_decode_shareargs(struct svc_rqst *, __be32 *); int nlmsvc_encode_shareres(struct svc_rqst *, __be32 *); -int nlmsvc_decode_notify(struct svc_rqst *, __be32 *); -int nlmsvc_decode_reboot(struct svc_rqst *, __be32 *);
#endif /* LOCKD_XDR_H */ diff --git a/include/linux/lockd/xdr4.h b/include/linux/lockd/xdr4.h index 5ae766f26e04f..68e14e0f2b1fb 100644 --- a/include/linux/lockd/xdr4.h +++ b/include/linux/lockd/xdr4.h @@ -22,21 +22,20 @@ #define nlm4_fbig cpu_to_be32(NLM_FBIG) #define nlm4_failed cpu_to_be32(NLM_FAILED)
+int nlm4svc_decode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlm4svc_decode_testargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlm4svc_decode_lockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlm4svc_decode_cancargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlm4svc_decode_unlockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlm4svc_decode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlm4svc_decode_reboot(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlm4svc_decode_shareargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlm4svc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr);
- -int nlm4svc_decode_testargs(struct svc_rqst *, __be32 *); int nlm4svc_encode_testres(struct svc_rqst *, __be32 *); -int nlm4svc_decode_lockargs(struct svc_rqst *, __be32 *); -int nlm4svc_decode_cancargs(struct svc_rqst *, __be32 *); -int nlm4svc_decode_unlockargs(struct svc_rqst *, __be32 *); int nlm4svc_encode_res(struct svc_rqst *, __be32 *); -int nlm4svc_decode_res(struct svc_rqst *, __be32 *); int nlm4svc_encode_void(struct svc_rqst *, __be32 *); -int nlm4svc_decode_void(struct svc_rqst *, __be32 *); -int nlm4svc_decode_shareargs(struct svc_rqst *, __be32 *); int nlm4svc_encode_shareres(struct svc_rqst *, __be32 *); -int nlm4svc_decode_notify(struct svc_rqst *, __be32 *); -int nlm4svc_decode_reboot(struct svc_rqst *, __be32 *);
extern const struct rpc_version nlm_version4;
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index b0986e969c2f4..77e3a9b398275 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -457,7 +457,8 @@ struct svc_procedure { /* process the request: */ __be32 (*pc_func)(struct svc_rqst *); /* XDR decode args: */ - int (*pc_decode)(struct svc_rqst *, __be32 *data); + int (*pc_decode)(struct svc_rqst *rqstp, + struct xdr_stream *xdr); /* XDR encode result: */ int (*pc_encode)(struct svc_rqst *, __be32 *data); /* XDR free result: */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit c44b31c263798ec34614dd394c31ef1a2e7e716e ]
Returning an undecorated integer is an age-old trope, but it's not clear (even to previous experts in this code) that the only valid return values are 1 and 0. These functions do not return a negative errno, rpc_stat value, or a positive length.
Document there are only two valid return values by having .pc_decode return only true or false.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/xdr.c | 96 +++++++++++++++--------------- fs/lockd/xdr4.c | 97 +++++++++++++++--------------- fs/nfsd/nfs2acl.c | 30 +++++----- fs/nfsd/nfs3acl.c | 22 +++---- fs/nfsd/nfs3xdr.c | 118 ++++++++++++++++++------------------- fs/nfsd/nfs4xdr.c | 24 ++++---- fs/nfsd/nfsd.h | 2 +- fs/nfsd/nfssvc.c | 6 +- fs/nfsd/nfsxdr.c | 62 +++++++++---------- fs/nfsd/xdr.h | 20 +++---- fs/nfsd/xdr3.h | 30 +++++----- fs/nfsd/xdr4.h | 2 +- include/linux/lockd/xdr.h | 18 +++--- include/linux/lockd/xdr4.h | 18 +++--- include/linux/sunrpc/svc.h | 2 +- 15 files changed, 274 insertions(+), 273 deletions(-)
diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c index 895f152221048..622c2ca37dbfd 100644 --- a/fs/lockd/xdr.c +++ b/fs/lockd/xdr.c @@ -145,103 +145,103 @@ svcxdr_encode_testrply(struct xdr_stream *xdr, const struct nlm_res *resp) * Decode Call arguments */
-int +bool nlmsvc_decode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - return 1; + return true; }
-int +bool nlmsvc_decode_testargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_args *argp = rqstp->rq_argp; u32 exclusive;
if (!svcxdr_decode_cookie(xdr, &argp->cookie)) - return 0; + return false; if (xdr_stream_decode_bool(xdr, &exclusive) < 0) - return 0; + return false; if (!svcxdr_decode_lock(xdr, &argp->lock)) - return 0; + return false; if (exclusive) argp->lock.fl.fl_type = F_WRLCK;
- return 1; + return true; }
-int +bool nlmsvc_decode_lockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_args *argp = rqstp->rq_argp; u32 exclusive;
if (!svcxdr_decode_cookie(xdr, &argp->cookie)) - return 0; + return false; if (xdr_stream_decode_bool(xdr, &argp->block) < 0) - return 0; + return false; if (xdr_stream_decode_bool(xdr, &exclusive) < 0) - return 0; + return false; if (!svcxdr_decode_lock(xdr, &argp->lock)) - return 0; + return false; if (exclusive) argp->lock.fl.fl_type = F_WRLCK; if (xdr_stream_decode_bool(xdr, &argp->reclaim) < 0) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &argp->state) < 0) - return 0; + return false; argp->monitor = 1; /* monitor client by default */
- return 1; + return true; }
-int +bool nlmsvc_decode_cancargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_args *argp = rqstp->rq_argp; u32 exclusive;
if (!svcxdr_decode_cookie(xdr, &argp->cookie)) - return 0; + return false; if (xdr_stream_decode_bool(xdr, &argp->block) < 0) - return 0; + return false; if (xdr_stream_decode_bool(xdr, &exclusive) < 0) - return 0; + return false; if (!svcxdr_decode_lock(xdr, &argp->lock)) - return 0; + return false; if (exclusive) argp->lock.fl.fl_type = F_WRLCK;
- return 1; + return true; }
-int +bool nlmsvc_decode_unlockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_args *argp = rqstp->rq_argp;
if (!svcxdr_decode_cookie(xdr, &argp->cookie)) - return 0; + return false; if (!svcxdr_decode_lock(xdr, &argp->lock)) - return 0; + return false; argp->lock.fl.fl_type = F_UNLCK;
- return 1; + return true; }
-int +bool nlmsvc_decode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_res *resp = rqstp->rq_argp;
if (!svcxdr_decode_cookie(xdr, &resp->cookie)) - return 0; + return false; if (!svcxdr_decode_stats(xdr, &resp->status)) - return 0; + return false;
- return 1; + return true; }
-int +bool nlmsvc_decode_reboot(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_reboot *argp = rqstp->rq_argp; @@ -249,25 +249,25 @@ nlmsvc_decode_reboot(struct svc_rqst *rqstp, struct xdr_stream *xdr) u32 len;
if (xdr_stream_decode_u32(xdr, &len) < 0) - return 0; + return false; if (len > SM_MAXSTRLEN) - return 0; + return false; p = xdr_inline_decode(xdr, len); if (!p) - return 0; + return false; argp->len = len; argp->mon = (char *)p; if (xdr_stream_decode_u32(xdr, &argp->state) < 0) - return 0; + return false; p = xdr_inline_decode(xdr, SM_PRIV_SIZE); if (!p) - return 0; + return false; memcpy(&argp->priv.data, p, sizeof(argp->priv.data));
- return 1; + return true; }
-int +bool nlmsvc_decode_shareargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_args *argp = rqstp->rq_argp; @@ -278,34 +278,34 @@ nlmsvc_decode_shareargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) lock->svid = ~(u32)0;
if (!svcxdr_decode_cookie(xdr, &argp->cookie)) - return 0; + return false; if (!svcxdr_decode_string(xdr, &lock->caller, &lock->len)) - return 0; + return false; if (!svcxdr_decode_fhandle(xdr, &lock->fh)) - return 0; + return false; if (!svcxdr_decode_owner(xdr, &lock->oh)) - return 0; + return false; /* XXX: Range checks are missing in the original code */ if (xdr_stream_decode_u32(xdr, &argp->fsm_mode) < 0) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &argp->fsm_access) < 0) - return 0; + return false;
- return 1; + return true; }
-int +bool nlmsvc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_args *argp = rqstp->rq_argp; struct nlm_lock *lock = &argp->lock;
if (!svcxdr_decode_string(xdr, &lock->caller, &lock->len)) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &argp->state) < 0) - return 0; + return false;
- return 1; + return true; }
diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index 573c7d580a5e6..45551dee26b41 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -144,102 +144,103 @@ svcxdr_encode_testrply(struct xdr_stream *xdr, const struct nlm_res *resp) * Decode Call arguments */
-int +bool nlm4svc_decode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - return 1; + return true; }
-int +bool nlm4svc_decode_testargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_args *argp = rqstp->rq_argp; u32 exclusive;
if (!svcxdr_decode_cookie(xdr, &argp->cookie)) - return 0; + return false; if (xdr_stream_decode_bool(xdr, &exclusive) < 0) - return 0; + return false; if (!svcxdr_decode_lock(xdr, &argp->lock)) - return 0; + return false; if (exclusive) argp->lock.fl.fl_type = F_WRLCK;
- return 1; + return true; }
-int +bool nlm4svc_decode_lockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_args *argp = rqstp->rq_argp; u32 exclusive;
if (!svcxdr_decode_cookie(xdr, &argp->cookie)) - return 0; + return false; if (xdr_stream_decode_bool(xdr, &argp->block) < 0) - return 0; + return false; if (xdr_stream_decode_bool(xdr, &exclusive) < 0) - return 0; + return false; if (!svcxdr_decode_lock(xdr, &argp->lock)) - return 0; + return false; if (exclusive) argp->lock.fl.fl_type = F_WRLCK; if (xdr_stream_decode_bool(xdr, &argp->reclaim) < 0) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &argp->state) < 0) - return 0; + return false; argp->monitor = 1; /* monitor client by default */
- return 1; + return true; }
-int +bool nlm4svc_decode_cancargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_args *argp = rqstp->rq_argp; u32 exclusive;
if (!svcxdr_decode_cookie(xdr, &argp->cookie)) - return 0; + return false; if (xdr_stream_decode_bool(xdr, &argp->block) < 0) - return 0; + return false; if (xdr_stream_decode_bool(xdr, &exclusive) < 0) - return 0; + return false; if (!svcxdr_decode_lock(xdr, &argp->lock)) - return 0; + return false; if (exclusive) argp->lock.fl.fl_type = F_WRLCK; - return 1; + + return true; }
-int +bool nlm4svc_decode_unlockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_args *argp = rqstp->rq_argp;
if (!svcxdr_decode_cookie(xdr, &argp->cookie)) - return 0; + return false; if (!svcxdr_decode_lock(xdr, &argp->lock)) - return 0; + return false; argp->lock.fl.fl_type = F_UNLCK;
- return 1; + return true; }
-int +bool nlm4svc_decode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_res *resp = rqstp->rq_argp;
if (!svcxdr_decode_cookie(xdr, &resp->cookie)) - return 0; + return false; if (!svcxdr_decode_stats(xdr, &resp->status)) - return 0; + return false;
- return 1; + return true; }
-int +bool nlm4svc_decode_reboot(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_reboot *argp = rqstp->rq_argp; @@ -247,25 +248,25 @@ nlm4svc_decode_reboot(struct svc_rqst *rqstp, struct xdr_stream *xdr) u32 len;
if (xdr_stream_decode_u32(xdr, &len) < 0) - return 0; + return false; if (len > SM_MAXSTRLEN) - return 0; + return false; p = xdr_inline_decode(xdr, len); if (!p) - return 0; + return false; argp->len = len; argp->mon = (char *)p; if (xdr_stream_decode_u32(xdr, &argp->state) < 0) - return 0; + return false; p = xdr_inline_decode(xdr, SM_PRIV_SIZE); if (!p) - return 0; + return false; memcpy(&argp->priv.data, p, sizeof(argp->priv.data));
- return 1; + return true; }
-int +bool nlm4svc_decode_shareargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_args *argp = rqstp->rq_argp; @@ -276,34 +277,34 @@ nlm4svc_decode_shareargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) lock->svid = ~(u32)0;
if (!svcxdr_decode_cookie(xdr, &argp->cookie)) - return 0; + return false; if (!svcxdr_decode_string(xdr, &lock->caller, &lock->len)) - return 0; + return false; if (!svcxdr_decode_fhandle(xdr, &lock->fh)) - return 0; + return false; if (!svcxdr_decode_owner(xdr, &lock->oh)) - return 0; + return false; /* XXX: Range checks are missing in the original code */ if (xdr_stream_decode_u32(xdr, &argp->fsm_mode) < 0) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &argp->fsm_access) < 0) - return 0; + return false;
- return 1; + return true; }
-int +bool nlm4svc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_args *argp = rqstp->rq_argp; struct nlm_lock *lock = &argp->lock;
if (!svcxdr_decode_string(xdr, &lock->caller, &lock->len)) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &argp->state) < 0) - return 0; + return false;
- return 1; + return true; }
diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index 53f793c3606d6..eb6a89baf675f 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -186,51 +186,51 @@ static __be32 nfsacld_proc_access(struct svc_rqst *rqstp) * XDR decode functions */
-static int +static bool nfsaclsvc_decode_getaclargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_getaclargs *argp = rqstp->rq_argp;
if (!svcxdr_decode_fhandle(xdr, &argp->fh)) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &argp->mask) < 0) - return 0; + return false;
- return 1; + return true; }
-static int +static bool nfsaclsvc_decode_setaclargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_setaclargs *argp = rqstp->rq_argp;
if (!svcxdr_decode_fhandle(xdr, &argp->fh)) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &argp->mask) < 0) - return 0; + return false; if (argp->mask & ~NFS_ACL_MASK) - return 0; + return false; if (!nfs_stream_decode_acl(xdr, NULL, (argp->mask & NFS_ACL) ? &argp->acl_access : NULL)) - return 0; + return false; if (!nfs_stream_decode_acl(xdr, NULL, (argp->mask & NFS_DFACL) ? &argp->acl_default : NULL)) - return 0; + return false;
- return 1; + return true; }
-static int +static bool nfsaclsvc_decode_accessargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_accessargs *args = rqstp->rq_argp;
if (!svcxdr_decode_fhandle(xdr, &args->fh)) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &args->access) < 0) - return 0; + return false;
- return 1; + return true; }
/* diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c index 37c8fb184ca4d..f60b072ce9ecc 100644 --- a/fs/nfsd/nfs3acl.c +++ b/fs/nfsd/nfs3acl.c @@ -125,38 +125,38 @@ static __be32 nfsd3_proc_setacl(struct svc_rqst *rqstp) * XDR decode functions */
-static int +static bool nfs3svc_decode_getaclargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_getaclargs *args = rqstp->rq_argp;
if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &args->mask) < 0) - return 0; + return false;
- return 1; + return true; }
-static int +static bool nfs3svc_decode_setaclargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_setaclargs *argp = rqstp->rq_argp;
if (!svcxdr_decode_nfs_fh3(xdr, &argp->fh)) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &argp->mask) < 0) - return 0; + return false; if (argp->mask & ~NFS_ACL_MASK) - return 0; + return false; if (!nfs_stream_decode_acl(xdr, NULL, (argp->mask & NFS_ACL) ? &argp->acl_access : NULL)) - return 0; + return false; if (!nfs_stream_decode_acl(xdr, NULL, (argp->mask & NFS_DFACL) ? &argp->acl_default : NULL)) - return 0; + return false;
- return 1; + return true; }
/* diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 5f744f03cda7c..1f3de46d24d46 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -556,7 +556,7 @@ void fill_post_wcc(struct svc_fh *fhp) * XDR decode functions */
-int +bool nfs3svc_decode_fhandleargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd_fhandle *args = rqstp->rq_argp; @@ -564,7 +564,7 @@ nfs3svc_decode_fhandleargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) return svcxdr_decode_nfs_fh3(xdr, &args->fh); }
-int +bool nfs3svc_decode_sattrargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_sattrargs *args = rqstp->rq_argp; @@ -574,7 +574,7 @@ nfs3svc_decode_sattrargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) svcxdr_decode_sattrguard3(xdr, args); }
-int +bool nfs3svc_decode_diropargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_diropargs *args = rqstp->rq_argp; @@ -582,75 +582,75 @@ nfs3svc_decode_diropargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) return svcxdr_decode_diropargs3(xdr, &args->fh, &args->name, &args->len); }
-int +bool nfs3svc_decode_accessargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_accessargs *args = rqstp->rq_argp;
if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &args->access) < 0) - return 0; + return false;
- return 1; + return true; }
-int +bool nfs3svc_decode_readargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_readargs *args = rqstp->rq_argp;
if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) - return 0; + return false; if (xdr_stream_decode_u64(xdr, &args->offset) < 0) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &args->count) < 0) - return 0; + return false;
- return 1; + return true; }
-int +bool nfs3svc_decode_writeargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_writeargs *args = rqstp->rq_argp; u32 max_blocksize = svc_max_payload(rqstp);
if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) - return 0; + return false; if (xdr_stream_decode_u64(xdr, &args->offset) < 0) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &args->count) < 0) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &args->stable) < 0) - return 0; + return false;
/* opaque data */ if (xdr_stream_decode_u32(xdr, &args->len) < 0) - return 0; + return false;
/* request sanity */ if (args->count != args->len) - return 0; + return false; if (args->count > max_blocksize) { args->count = max_blocksize; args->len = max_blocksize; } if (!xdr_stream_subsegment(xdr, &args->payload, args->count)) - return 0; + return false;
- return 1; + return true; }
-int +bool nfs3svc_decode_createargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_createargs *args = rqstp->rq_argp;
if (!svcxdr_decode_diropargs3(xdr, &args->fh, &args->name, &args->len)) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &args->createmode) < 0) - return 0; + return false; switch (args->createmode) { case NFS3_CREATE_UNCHECKED: case NFS3_CREATE_GUARDED: @@ -658,15 +658,15 @@ nfs3svc_decode_createargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) case NFS3_CREATE_EXCLUSIVE: args->verf = xdr_inline_decode(xdr, NFS3_CREATEVERFSIZE); if (!args->verf) - return 0; + return false; break; default: - return 0; + return false; } - return 1; + return true; }
-int +bool nfs3svc_decode_mkdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_createargs *args = rqstp->rq_argp; @@ -676,7 +676,7 @@ nfs3svc_decode_mkdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) svcxdr_decode_sattr3(rqstp, xdr, &args->attrs); }
-int +bool nfs3svc_decode_symlinkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_symlinkargs *args = rqstp->rq_argp; @@ -685,33 +685,33 @@ nfs3svc_decode_symlinkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) size_t remaining;
if (!svcxdr_decode_diropargs3(xdr, &args->ffh, &args->fname, &args->flen)) - return 0; + return false; if (!svcxdr_decode_sattr3(rqstp, xdr, &args->attrs)) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &args->tlen) < 0) - return 0; + return false;
/* request sanity */ remaining = head->iov_len + rqstp->rq_arg.page_len + tail->iov_len; remaining -= xdr_stream_pos(xdr); if (remaining < xdr_align_size(args->tlen)) - return 0; + return false;
args->first.iov_base = xdr->p; args->first.iov_len = head->iov_len - xdr_stream_pos(xdr);
- return 1; + return true; }
-int +bool nfs3svc_decode_mknodargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_mknodargs *args = rqstp->rq_argp;
if (!svcxdr_decode_diropargs3(xdr, &args->fh, &args->name, &args->len)) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &args->ftype) < 0) - return 0; + return false; switch (args->ftype) { case NF3CHR: case NF3BLK: @@ -725,13 +725,13 @@ nfs3svc_decode_mknodargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) /* Valid XDR but illegal file types */ break; default: - return 0; + return false; }
- return 1; + return true; }
-int +bool nfs3svc_decode_renameargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_renameargs *args = rqstp->rq_argp; @@ -742,7 +742,7 @@ nfs3svc_decode_renameargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) &args->tname, &args->tlen); }
-int +bool nfs3svc_decode_linkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_linkargs *args = rqstp->rq_argp; @@ -752,59 +752,59 @@ nfs3svc_decode_linkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) &args->tname, &args->tlen); }
-int +bool nfs3svc_decode_readdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_readdirargs *args = rqstp->rq_argp;
if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) - return 0; + return false; if (xdr_stream_decode_u64(xdr, &args->cookie) < 0) - return 0; + return false; args->verf = xdr_inline_decode(xdr, NFS3_COOKIEVERFSIZE); if (!args->verf) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &args->count) < 0) - return 0; + return false;
- return 1; + return true; }
-int +bool nfs3svc_decode_readdirplusargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_readdirargs *args = rqstp->rq_argp; u32 dircount;
if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) - return 0; + return false; if (xdr_stream_decode_u64(xdr, &args->cookie) < 0) - return 0; + return false; args->verf = xdr_inline_decode(xdr, NFS3_COOKIEVERFSIZE); if (!args->verf) - return 0; + return false; /* dircount is ignored */ if (xdr_stream_decode_u32(xdr, &dircount) < 0) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &args->count) < 0) - return 0; + return false;
- return 1; + return true; }
-int +bool nfs3svc_decode_commitargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_commitargs *args = rqstp->rq_argp;
if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) - return 0; + return false; if (xdr_stream_decode_u64(xdr, &args->offset) < 0) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &args->count) < 0) - return 0; + return false;
- return 1; + return true; }
/* diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 5ee5081c56637..1b33c1c93e883 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2322,7 +2322,7 @@ nfsd4_opnum_in_range(struct nfsd4_compoundargs *argp, struct nfsd4_op *op) return true; }
-static int +static bool nfsd4_decode_compound(struct nfsd4_compoundargs *argp) { struct nfsd4_op *op; @@ -2335,25 +2335,25 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp) int i;
if (xdr_stream_decode_u32(argp->xdr, &argp->taglen) < 0) - return 0; + return false; max_reply += XDR_UNIT; argp->tag = NULL; if (unlikely(argp->taglen)) { if (argp->taglen > NFSD4_MAX_TAGLEN) - return 0; + return false; p = xdr_inline_decode(argp->xdr, argp->taglen); if (!p) - return 0; + return false; argp->tag = svcxdr_savemem(argp, p, argp->taglen); if (!argp->tag) - return 0; + return false; max_reply += xdr_align_size(argp->taglen); }
if (xdr_stream_decode_u32(argp->xdr, &argp->minorversion) < 0) - return 0; + return false; if (xdr_stream_decode_u32(argp->xdr, &argp->opcnt) < 0) - return 0; + return false;
/* * NFS4ERR_RESOURCE is a more helpful error than GARBAGE_ARGS @@ -2361,14 +2361,14 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp) * nfsd4_proc can handle this is an NFS-level error. */ if (argp->opcnt > NFSD_MAX_OPS_PER_COMPOUND) - return 1; + return true;
if (argp->opcnt > ARRAY_SIZE(argp->iops)) { argp->ops = kzalloc(argp->opcnt * sizeof(*argp->ops), GFP_KERNEL); if (!argp->ops) { argp->ops = argp->iops; dprintk("nfsd: couldn't allocate room for COMPOUND\n"); - return 0; + return false; } }
@@ -2380,7 +2380,7 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp) op->replay = NULL;
if (xdr_stream_decode_u32(argp->xdr, &op->opnum) < 0) - return 0; + return false; if (nfsd4_opnum_in_range(argp, op)) { op->status = nfsd4_dec_ops[op->opnum](argp, &op->u); if (op->status != nfs_ok) @@ -2427,7 +2427,7 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp) if (readcount > 1 || max_reply > PAGE_SIZE - auth_slack) clear_bit(RQ_SPLICE_OK, &argp->rqstp->rq_flags);
- return 1; + return true; }
static __be32 *encode_change(__be32 *p, struct kstat *stat, struct inode *inode, @@ -5422,7 +5422,7 @@ void nfsd4_release_compoundargs(struct svc_rqst *rqstp) } }
-int +bool nfs4svc_decode_compoundargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd4_compoundargs *args = rqstp->rq_argp; diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 6e8ad5f9757c8..bfcddd4c75345 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -78,7 +78,7 @@ extern const struct seq_operations nfs_exports_op; */ struct nfsd_voidargs { }; struct nfsd_voidres { }; -int nfssvc_decode_voidarg(struct svc_rqst *rqstp, +bool nfssvc_decode_voidarg(struct svc_rqst *rqstp, struct xdr_stream *xdr); int nfssvc_encode_voidres(struct svc_rqst *rqstp, __be32 *p);
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index be1d656548cfe..00aadc2635032 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -1067,10 +1067,10 @@ int nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp) * @xdr: XDR stream positioned at arguments to decode * * Return values: - * %0: Arguments were not valid - * %1: Decoding was successful + * %false: Arguments were not valid + * %true: Decoding was successful */ -int nfssvc_decode_voidarg(struct svc_rqst *rqstp, struct xdr_stream *xdr) +bool nfssvc_decode_voidarg(struct svc_rqst *rqstp, struct xdr_stream *xdr) { return 1; } diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 08e899180ee43..b5817a41b3de6 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -272,7 +272,7 @@ svcxdr_encode_fattr(struct svc_rqst *rqstp, struct xdr_stream *xdr, * XDR decode functions */
-int +bool nfssvc_decode_fhandleargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd_fhandle *args = rqstp->rq_argp; @@ -280,7 +280,7 @@ nfssvc_decode_fhandleargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) return svcxdr_decode_fhandle(xdr, &args->fh); }
-int +bool nfssvc_decode_sattrargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd_sattrargs *args = rqstp->rq_argp; @@ -289,7 +289,7 @@ nfssvc_decode_sattrargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) svcxdr_decode_sattr(rqstp, xdr, &args->attrs); }
-int +bool nfssvc_decode_diropargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd_diropargs *args = rqstp->rq_argp; @@ -297,54 +297,54 @@ nfssvc_decode_diropargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) return svcxdr_decode_diropargs(xdr, &args->fh, &args->name, &args->len); }
-int +bool nfssvc_decode_readargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd_readargs *args = rqstp->rq_argp; u32 totalcount;
if (!svcxdr_decode_fhandle(xdr, &args->fh)) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &args->offset) < 0) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &args->count) < 0) - return 0; + return false; /* totalcount is ignored */ if (xdr_stream_decode_u32(xdr, &totalcount) < 0) - return 0; + return false;
- return 1; + return true; }
-int +bool nfssvc_decode_writeargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd_writeargs *args = rqstp->rq_argp; u32 beginoffset, totalcount;
if (!svcxdr_decode_fhandle(xdr, &args->fh)) - return 0; + return false; /* beginoffset is ignored */ if (xdr_stream_decode_u32(xdr, &beginoffset) < 0) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &args->offset) < 0) - return 0; + return false; /* totalcount is ignored */ if (xdr_stream_decode_u32(xdr, &totalcount) < 0) - return 0; + return false;
/* opaque data */ if (xdr_stream_decode_u32(xdr, &args->len) < 0) - return 0; + return false; if (args->len > NFSSVC_MAXBLKSIZE_V2) - return 0; + return false; if (!xdr_stream_subsegment(xdr, &args->payload, args->len)) - return 0; + return false;
- return 1; + return true; }
-int +bool nfssvc_decode_createargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd_createargs *args = rqstp->rq_argp; @@ -354,7 +354,7 @@ nfssvc_decode_createargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) svcxdr_decode_sattr(rqstp, xdr, &args->attrs); }
-int +bool nfssvc_decode_renameargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd_renameargs *args = rqstp->rq_argp; @@ -365,7 +365,7 @@ nfssvc_decode_renameargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) &args->tname, &args->tlen); }
-int +bool nfssvc_decode_linkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd_linkargs *args = rqstp->rq_argp; @@ -375,39 +375,39 @@ nfssvc_decode_linkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) &args->tname, &args->tlen); }
-int +bool nfssvc_decode_symlinkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd_symlinkargs *args = rqstp->rq_argp; struct kvec *head = rqstp->rq_arg.head;
if (!svcxdr_decode_diropargs(xdr, &args->ffh, &args->fname, &args->flen)) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &args->tlen) < 0) - return 0; + return false; if (args->tlen == 0) - return 0; + return false;
args->first.iov_len = head->iov_len - xdr_stream_pos(xdr); args->first.iov_base = xdr_inline_decode(xdr, args->tlen); if (!args->first.iov_base) - return 0; + return false; return svcxdr_decode_sattr(rqstp, xdr, &args->attrs); }
-int +bool nfssvc_decode_readdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd_readdirargs *args = rqstp->rq_argp;
if (!svcxdr_decode_fhandle(xdr, &args->fh)) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &args->cookie) < 0) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &args->count) < 0) - return 0; + return false;
- return 1; + return true; }
/* diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index 19e281382bb98..d897c198c9126 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -141,16 +141,16 @@ union nfsd_xdrstore { #define NFS2_SVC_XDRSIZE sizeof(union nfsd_xdrstore)
-int nfssvc_decode_fhandleargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfssvc_decode_sattrargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfssvc_decode_diropargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfssvc_decode_readargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfssvc_decode_writeargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfssvc_decode_createargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfssvc_decode_renameargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfssvc_decode_linkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfssvc_decode_symlinkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfssvc_decode_readdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfssvc_decode_fhandleargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfssvc_decode_sattrargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfssvc_decode_diropargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfssvc_decode_readargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfssvc_decode_writeargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfssvc_decode_createargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfssvc_decode_renameargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfssvc_decode_linkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfssvc_decode_symlinkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfssvc_decode_readdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr);
int nfssvc_encode_statres(struct svc_rqst *, __be32 *); int nfssvc_encode_attrstatres(struct svc_rqst *, __be32 *); diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 60a8909205e5a..ef72bc4868da6 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -265,21 +265,21 @@ union nfsd3_xdrstore {
#define NFS3_SVC_XDRSIZE sizeof(union nfsd3_xdrstore)
-int nfs3svc_decode_fhandleargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_decode_sattrargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_decode_diropargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_decode_accessargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_decode_readargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_decode_writeargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_decode_createargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_decode_mkdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_decode_mknodargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_decode_renameargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_decode_linkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_decode_symlinkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_decode_readdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_decode_readdirplusargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_decode_commitargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_decode_fhandleargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_decode_sattrargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_decode_diropargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_decode_accessargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_decode_readargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_decode_writeargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_decode_createargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_decode_mkdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_decode_mknodargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_decode_renameargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_decode_linkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_decode_symlinkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_decode_readdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_decode_readdirplusargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_decode_commitargs(struct svc_rqst *rqstp, struct xdr_stream *xdr);
int nfs3svc_encode_getattrres(struct svc_rqst *, __be32 *); int nfs3svc_encode_wccstat(struct svc_rqst *, __be32 *); diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index c3e18efcd23b6..8f349640d2e97 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -756,7 +756,7 @@ set_change_info(struct nfsd4_change_info *cinfo, struct svc_fh *fhp)
bool nfsd4_mach_creds_match(struct nfs4_client *cl, struct svc_rqst *rqstp); -int nfs4svc_decode_compoundargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs4svc_decode_compoundargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); int nfs4svc_encode_compoundres(struct svc_rqst *, __be32 *); __be32 nfsd4_check_resp_size(struct nfsd4_compoundres *, u32); void nfsd4_encode_operation(struct nfsd4_compoundres *, struct nfsd4_op *); diff --git a/include/linux/lockd/xdr.h b/include/linux/lockd/xdr.h index 170ad6f5596a0..e1362244f909b 100644 --- a/include/linux/lockd/xdr.h +++ b/include/linux/lockd/xdr.h @@ -96,15 +96,15 @@ struct nlm_reboot { */ #define NLMSVC_XDRSIZE sizeof(struct nlm_args)
-int nlmsvc_decode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlmsvc_decode_testargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlmsvc_decode_lockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlmsvc_decode_cancargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlmsvc_decode_unlockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlmsvc_decode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlmsvc_decode_reboot(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlmsvc_decode_shareargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlmsvc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlmsvc_decode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlmsvc_decode_testargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlmsvc_decode_lockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlmsvc_decode_cancargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlmsvc_decode_unlockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlmsvc_decode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlmsvc_decode_reboot(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlmsvc_decode_shareargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlmsvc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr);
int nlmsvc_encode_testres(struct svc_rqst *, __be32 *); int nlmsvc_encode_res(struct svc_rqst *, __be32 *); diff --git a/include/linux/lockd/xdr4.h b/include/linux/lockd/xdr4.h index 68e14e0f2b1fb..376b8f6a3763a 100644 --- a/include/linux/lockd/xdr4.h +++ b/include/linux/lockd/xdr4.h @@ -22,15 +22,15 @@ #define nlm4_fbig cpu_to_be32(NLM_FBIG) #define nlm4_failed cpu_to_be32(NLM_FAILED)
-int nlm4svc_decode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlm4svc_decode_testargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlm4svc_decode_lockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlm4svc_decode_cancargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlm4svc_decode_unlockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlm4svc_decode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlm4svc_decode_reboot(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlm4svc_decode_shareargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlm4svc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlm4svc_decode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlm4svc_decode_testargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlm4svc_decode_lockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlm4svc_decode_cancargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlm4svc_decode_unlockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlm4svc_decode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlm4svc_decode_reboot(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlm4svc_decode_shareargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlm4svc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr);
int nlm4svc_encode_testres(struct svc_rqst *, __be32 *); int nlm4svc_encode_res(struct svc_rqst *, __be32 *); diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 77e3a9b398275..85a9884b10743 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -457,7 +457,7 @@ struct svc_procedure { /* process the request: */ __be32 (*pc_func)(struct svc_rqst *); /* XDR decode args: */ - int (*pc_decode)(struct svc_rqst *rqstp, + bool (*pc_decode)(struct svc_rqst *rqstp, struct xdr_stream *xdr); /* XDR encode result: */ int (*pc_encode)(struct svc_rqst *, __be32 *data);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 3b0ebb255fdc49a3d340846deebf045ef58ec744 ]
Refactor: Currently nfs4svc_encode_compoundres() relies on the NFS dispatcher to pass in the buffer location of the COMPOUND status. Instead, save that buffer location in struct nfsd4_compoundres.
The compound tag follows immediately after.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 2 +- fs/nfsd/nfs4xdr.c | 9 +++++++-- fs/nfsd/xdr4.h | 3 ++- 3 files changed, 10 insertions(+), 4 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index d2ee1ba7ddc65..52f3f35533791 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -2456,11 +2456,11 @@ nfsd4_proc_compound(struct svc_rqst *rqstp) __be32 status;
resp->xdr = &rqstp->rq_res_stream; + resp->statusp = resp->xdr->p;
/* reserve space for: NFS status code */ xdr_reserve_space(resp->xdr, XDR_UNIT);
- resp->tagp = resp->xdr->p; /* reserve space for: taglen, tag, and opcnt */ xdr_reserve_space(resp->xdr, XDR_UNIT * 2 + args->taglen); resp->taglen = args->taglen; diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 1b33c1c93e883..3f8d23586ea79 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -5446,11 +5446,16 @@ nfs4svc_encode_compoundres(struct svc_rqst *rqstp, __be32 *p) WARN_ON_ONCE(buf->len != buf->head[0].iov_len + buf->page_len + buf->tail[0].iov_len);
- *p = resp->cstate.status; + /* + * Send buffer space for the following items is reserved + * at the top of nfsd4_proc_compound(). + */ + p = resp->statusp; + + *p++ = resp->cstate.status;
rqstp->rq_next_page = resp->xdr->page_ptr + 1;
- p = resp->tagp; *p++ = htonl(resp->taglen); memcpy(p, resp->tag, resp->taglen); p += XDR_QUADLEN(resp->taglen); diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 8f349640d2e97..8e11dfdc2563a 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -702,10 +702,11 @@ struct nfsd4_compoundres { struct xdr_stream *xdr; struct svc_rqst * rqstp;
+ __be32 *statusp; u32 taglen; char * tag; u32 opcnt; - __be32 * tagp; /* tag, opcount encode location */ + struct nfsd4_compound_state cstate; };
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit fda494411485aff91768842c532f90fb8eb54943 ]
The passed-in value of the "__be32 *p" parameter is now unused in every server-side XDR encoder, and can be removed.
Note also that there is a line in each encoder that sets up a local pointer to a struct xdr_stream. Passing that pointer from the dispatcher instead saves one line per encoder function.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svc.c | 3 +-- fs/lockd/xdr.c | 11 ++++----- fs/lockd/xdr4.c | 11 ++++----- fs/nfs/callback_xdr.c | 4 ++-- fs/nfsd/nfs2acl.c | 8 +++---- fs/nfsd/nfs3acl.c | 8 +++---- fs/nfsd/nfs3xdr.c | 46 +++++++++++++------------------------- fs/nfsd/nfs4xdr.c | 7 +++--- fs/nfsd/nfsd.h | 3 ++- fs/nfsd/nfssvc.c | 9 +++----- fs/nfsd/nfsxdr.c | 22 +++++++----------- fs/nfsd/xdr.h | 14 ++++++------ fs/nfsd/xdr3.h | 30 ++++++++++++------------- fs/nfsd/xdr4.h | 2 +- include/linux/lockd/xdr.h | 8 +++---- include/linux/lockd/xdr4.h | 8 +++---- include/linux/sunrpc/svc.h | 3 ++- 17 files changed, 85 insertions(+), 112 deletions(-)
diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index 9a82471bda071..b220e1b917268 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -780,7 +780,6 @@ module_exit(exit_nlm); static int nlmsvc_dispatch(struct svc_rqst *rqstp, __be32 *statp) { const struct svc_procedure *procp = rqstp->rq_procinfo; - struct kvec *resv = rqstp->rq_res.head;
svcxdr_init_decode(rqstp); if (!procp->pc_decode(rqstp, &rqstp->rq_arg_stream)) @@ -793,7 +792,7 @@ static int nlmsvc_dispatch(struct svc_rqst *rqstp, __be32 *statp) return 1;
svcxdr_init_encode(rqstp); - if (!procp->pc_encode(rqstp, resv->iov_base + resv->iov_len)) + if (!procp->pc_encode(rqstp, &rqstp->rq_res_stream)) goto out_encode_err;
return 1; diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c index 622c2ca37dbfd..2595b4d14cd44 100644 --- a/fs/lockd/xdr.c +++ b/fs/lockd/xdr.c @@ -314,15 +314,14 @@ nlmsvc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr) */
int -nlmsvc_encode_void(struct svc_rqst *rqstp, __be32 *p) +nlmsvc_encode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr) { return 1; }
int -nlmsvc_encode_testres(struct svc_rqst *rqstp, __be32 *p) +nlmsvc_encode_testres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nlm_res *resp = rqstp->rq_resp;
return svcxdr_encode_cookie(xdr, &resp->cookie) && @@ -330,9 +329,8 @@ nlmsvc_encode_testres(struct svc_rqst *rqstp, __be32 *p) }
int -nlmsvc_encode_res(struct svc_rqst *rqstp, __be32 *p) +nlmsvc_encode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nlm_res *resp = rqstp->rq_resp;
return svcxdr_encode_cookie(xdr, &resp->cookie) && @@ -340,9 +338,8 @@ nlmsvc_encode_res(struct svc_rqst *rqstp, __be32 *p) }
int -nlmsvc_encode_shareres(struct svc_rqst *rqstp, __be32 *p) +nlmsvc_encode_shareres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nlm_res *resp = rqstp->rq_resp;
if (!svcxdr_encode_cookie(xdr, &resp->cookie)) diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index 45551dee26b41..32231c21c22dd 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -313,15 +313,14 @@ nlm4svc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr) */
int -nlm4svc_encode_void(struct svc_rqst *rqstp, __be32 *p) +nlm4svc_encode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr) { return 1; }
int -nlm4svc_encode_testres(struct svc_rqst *rqstp, __be32 *p) +nlm4svc_encode_testres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nlm_res *resp = rqstp->rq_resp;
return svcxdr_encode_cookie(xdr, &resp->cookie) && @@ -329,9 +328,8 @@ nlm4svc_encode_testres(struct svc_rqst *rqstp, __be32 *p) }
int -nlm4svc_encode_res(struct svc_rqst *rqstp, __be32 *p) +nlm4svc_encode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nlm_res *resp = rqstp->rq_resp;
return svcxdr_encode_cookie(xdr, &resp->cookie) && @@ -339,9 +337,8 @@ nlm4svc_encode_res(struct svc_rqst *rqstp, __be32 *p) }
int -nlm4svc_encode_shareres(struct svc_rqst *rqstp, __be32 *p) +nlm4svc_encode_shareres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nlm_res *resp = rqstp->rq_resp;
if (!svcxdr_encode_cookie(xdr, &resp->cookie)) diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c index 600e640682401..c58b3b60bc2c0 100644 --- a/fs/nfs/callback_xdr.c +++ b/fs/nfs/callback_xdr.c @@ -67,9 +67,9 @@ static __be32 nfs4_callback_null(struct svc_rqst *rqstp) * svc_process_common() looks for an XDR encoder to know when * not to drop a Reply. */ -static int nfs4_encode_void(struct svc_rqst *rqstp, __be32 *p) +static int nfs4_encode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - return xdr_ressize_check(rqstp, p); + return 1; }
static __be32 decode_string(struct xdr_stream *xdr, unsigned int *len, diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index eb6a89baf675f..23e4c3eb2c381 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -238,9 +238,9 @@ nfsaclsvc_decode_accessargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) */
/* GETACL */ -static int nfsaclsvc_encode_getaclres(struct svc_rqst *rqstp, __be32 *p) +static int +nfsaclsvc_encode_getaclres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_getaclres *resp = rqstp->rq_resp; struct dentry *dentry = resp->fh.fh_dentry; struct inode *inode; @@ -278,9 +278,9 @@ static int nfsaclsvc_encode_getaclres(struct svc_rqst *rqstp, __be32 *p) }
/* ACCESS */ -static int nfsaclsvc_encode_accessres(struct svc_rqst *rqstp, __be32 *p) +static int +nfsaclsvc_encode_accessres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_accessres *resp = rqstp->rq_resp;
if (!svcxdr_encode_stat(xdr, resp->status)) diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c index f60b072ce9ecc..21aacba88ea3e 100644 --- a/fs/nfsd/nfs3acl.c +++ b/fs/nfsd/nfs3acl.c @@ -164,9 +164,9 @@ nfs3svc_decode_setaclargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) */
/* GETACL */ -static int nfs3svc_encode_getaclres(struct svc_rqst *rqstp, __be32 *p) +static int +nfs3svc_encode_getaclres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_getaclres *resp = rqstp->rq_resp; struct dentry *dentry = resp->fh.fh_dentry; struct kvec *head = rqstp->rq_res.head; @@ -216,9 +216,9 @@ static int nfs3svc_encode_getaclres(struct svc_rqst *rqstp, __be32 *p) }
/* SETACL */ -static int nfs3svc_encode_setaclres(struct svc_rqst *rqstp, __be32 *p) +static int +nfs3svc_encode_setaclres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_attrstat *resp = rqstp->rq_resp;
return svcxdr_encode_nfsstat3(xdr, resp->status) && diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 1f3de46d24d46..63f0be4e44f70 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -813,9 +813,8 @@ nfs3svc_decode_commitargs(struct svc_rqst *rqstp, struct xdr_stream *xdr)
/* GETATTR */ int -nfs3svc_encode_getattrres(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_encode_getattrres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_attrstat *resp = rqstp->rq_resp;
if (!svcxdr_encode_nfsstat3(xdr, resp->status)) @@ -833,9 +832,8 @@ nfs3svc_encode_getattrres(struct svc_rqst *rqstp, __be32 *p)
/* SETATTR, REMOVE, RMDIR */ int -nfs3svc_encode_wccstat(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_encode_wccstat(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_attrstat *resp = rqstp->rq_resp;
return svcxdr_encode_nfsstat3(xdr, resp->status) && @@ -843,9 +841,9 @@ nfs3svc_encode_wccstat(struct svc_rqst *rqstp, __be32 *p) }
/* LOOKUP */ -int nfs3svc_encode_lookupres(struct svc_rqst *rqstp, __be32 *p) +int +nfs3svc_encode_lookupres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_diropres *resp = rqstp->rq_resp;
if (!svcxdr_encode_nfsstat3(xdr, resp->status)) @@ -869,9 +867,8 @@ int nfs3svc_encode_lookupres(struct svc_rqst *rqstp, __be32 *p)
/* ACCESS */ int -nfs3svc_encode_accessres(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_encode_accessres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_accessres *resp = rqstp->rq_resp;
if (!svcxdr_encode_nfsstat3(xdr, resp->status)) @@ -893,9 +890,8 @@ nfs3svc_encode_accessres(struct svc_rqst *rqstp, __be32 *p)
/* READLINK */ int -nfs3svc_encode_readlinkres(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_encode_readlinkres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_readlinkres *resp = rqstp->rq_resp; struct kvec *head = rqstp->rq_res.head;
@@ -921,9 +917,8 @@ nfs3svc_encode_readlinkres(struct svc_rqst *rqstp, __be32 *p)
/* READ */ int -nfs3svc_encode_readres(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_encode_readres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_readres *resp = rqstp->rq_resp; struct kvec *head = rqstp->rq_res.head;
@@ -954,9 +949,8 @@ nfs3svc_encode_readres(struct svc_rqst *rqstp, __be32 *p)
/* WRITE */ int -nfs3svc_encode_writeres(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_encode_writeres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_writeres *resp = rqstp->rq_resp;
if (!svcxdr_encode_nfsstat3(xdr, resp->status)) @@ -982,9 +976,8 @@ nfs3svc_encode_writeres(struct svc_rqst *rqstp, __be32 *p)
/* CREATE, MKDIR, SYMLINK, MKNOD */ int -nfs3svc_encode_createres(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_encode_createres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_diropres *resp = rqstp->rq_resp;
if (!svcxdr_encode_nfsstat3(xdr, resp->status)) @@ -1008,9 +1001,8 @@ nfs3svc_encode_createres(struct svc_rqst *rqstp, __be32 *p)
/* RENAME */ int -nfs3svc_encode_renameres(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_encode_renameres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_renameres *resp = rqstp->rq_resp;
return svcxdr_encode_nfsstat3(xdr, resp->status) && @@ -1020,9 +1012,8 @@ nfs3svc_encode_renameres(struct svc_rqst *rqstp, __be32 *p)
/* LINK */ int -nfs3svc_encode_linkres(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_encode_linkres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_linkres *resp = rqstp->rq_resp;
return svcxdr_encode_nfsstat3(xdr, resp->status) && @@ -1032,9 +1023,8 @@ nfs3svc_encode_linkres(struct svc_rqst *rqstp, __be32 *p)
/* READDIR */ int -nfs3svc_encode_readdirres(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_encode_readdirres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_readdirres *resp = rqstp->rq_resp; struct xdr_buf *dirlist = &resp->dirlist;
@@ -1286,9 +1276,8 @@ svcxdr_encode_fsstat3resok(struct xdr_stream *xdr,
/* FSSTAT */ int -nfs3svc_encode_fsstatres(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_encode_fsstatres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_fsstatres *resp = rqstp->rq_resp;
if (!svcxdr_encode_nfsstat3(xdr, resp->status)) @@ -1333,9 +1322,8 @@ svcxdr_encode_fsinfo3resok(struct xdr_stream *xdr,
/* FSINFO */ int -nfs3svc_encode_fsinfores(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_encode_fsinfores(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_fsinfores *resp = rqstp->rq_resp;
if (!svcxdr_encode_nfsstat3(xdr, resp->status)) @@ -1376,9 +1364,8 @@ svcxdr_encode_pathconf3resok(struct xdr_stream *xdr,
/* PATHCONF */ int -nfs3svc_encode_pathconfres(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_encode_pathconfres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_pathconfres *resp = rqstp->rq_resp;
if (!svcxdr_encode_nfsstat3(xdr, resp->status)) @@ -1400,9 +1387,8 @@ nfs3svc_encode_pathconfres(struct svc_rqst *rqstp, __be32 *p)
/* COMMIT */ int -nfs3svc_encode_commitres(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_encode_commitres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_commitres *resp = rqstp->rq_resp;
if (!svcxdr_encode_nfsstat3(xdr, resp->status)) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 3f8d23586ea79..697d61819a59d 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -5438,10 +5438,11 @@ nfs4svc_decode_compoundargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) }
int -nfs4svc_encode_compoundres(struct svc_rqst *rqstp, __be32 *p) +nfs4svc_encode_compoundres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd4_compoundres *resp = rqstp->rq_resp; - struct xdr_buf *buf = resp->xdr->buf; + struct xdr_buf *buf = xdr->buf; + __be32 *p;
WARN_ON_ONCE(buf->len != buf->head[0].iov_len + buf->page_len + buf->tail[0].iov_len); @@ -5454,7 +5455,7 @@ nfs4svc_encode_compoundres(struct svc_rqst *rqstp, __be32 *p)
*p++ = resp->cstate.status;
- rqstp->rq_next_page = resp->xdr->page_ptr + 1; + rqstp->rq_next_page = xdr->page_ptr + 1;
*p++ = htonl(resp->taglen); memcpy(p, resp->tag, resp->taglen); diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index bfcddd4c75345..345f8247d5da9 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -80,7 +80,8 @@ struct nfsd_voidargs { }; struct nfsd_voidres { }; bool nfssvc_decode_voidarg(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfssvc_encode_voidres(struct svc_rqst *rqstp, __be32 *p); +int nfssvc_encode_voidres(struct svc_rqst *rqstp, + struct xdr_stream *xdr);
/* * Function prototypes. diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 00aadc2635032..195f2bcc65384 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -1004,8 +1004,6 @@ nfsd(void *vrqstp) int nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp) { const struct svc_procedure *proc = rqstp->rq_procinfo; - struct kvec *resv = &rqstp->rq_res.head[0]; - __be32 *p;
/* * Give the xdr decoder a chance to change this if it wants @@ -1030,14 +1028,13 @@ int nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp) * Need to grab the location to store the status, as * NFSv4 does some encoding while processing */ - p = resv->iov_base + resv->iov_len; svcxdr_init_encode(rqstp);
*statp = proc->pc_func(rqstp); if (*statp == rpc_drop_reply || test_bit(RQ_DROPME, &rqstp->rq_flags)) goto out_update_drop;
- if (!proc->pc_encode(rqstp, p)) + if (!proc->pc_encode(rqstp, &rqstp->rq_res_stream)) goto out_encode_err;
nfsd_cache_update(rqstp, rqstp->rq_cachetype, statp + 1); @@ -1078,13 +1075,13 @@ bool nfssvc_decode_voidarg(struct svc_rqst *rqstp, struct xdr_stream *xdr) /** * nfssvc_encode_voidres - Encode void results * @rqstp: Server RPC transaction context - * @p: buffer in which to encode results + * @xdr: XDR stream into which to encode results * * Return values: * %0: Local error while encoding * %1: Encoding was successful */ -int nfssvc_encode_voidres(struct svc_rqst *rqstp, __be32 *p) +int nfssvc_encode_voidres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { return 1; } diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index b5817a41b3de6..6aa8138ae2f7d 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -415,18 +415,16 @@ nfssvc_decode_readdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) */
int -nfssvc_encode_statres(struct svc_rqst *rqstp, __be32 *p) +nfssvc_encode_statres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd_stat *resp = rqstp->rq_resp;
return svcxdr_encode_stat(xdr, resp->status); }
int -nfssvc_encode_attrstatres(struct svc_rqst *rqstp, __be32 *p) +nfssvc_encode_attrstatres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd_attrstat *resp = rqstp->rq_resp;
if (!svcxdr_encode_stat(xdr, resp->status)) @@ -442,9 +440,8 @@ nfssvc_encode_attrstatres(struct svc_rqst *rqstp, __be32 *p) }
int -nfssvc_encode_diropres(struct svc_rqst *rqstp, __be32 *p) +nfssvc_encode_diropres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd_diropres *resp = rqstp->rq_resp;
if (!svcxdr_encode_stat(xdr, resp->status)) @@ -462,9 +459,8 @@ nfssvc_encode_diropres(struct svc_rqst *rqstp, __be32 *p) }
int -nfssvc_encode_readlinkres(struct svc_rqst *rqstp, __be32 *p) +nfssvc_encode_readlinkres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd_readlinkres *resp = rqstp->rq_resp; struct kvec *head = rqstp->rq_res.head;
@@ -484,9 +480,8 @@ nfssvc_encode_readlinkres(struct svc_rqst *rqstp, __be32 *p) }
int -nfssvc_encode_readres(struct svc_rqst *rqstp, __be32 *p) +nfssvc_encode_readres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd_readres *resp = rqstp->rq_resp; struct kvec *head = rqstp->rq_res.head;
@@ -509,9 +504,8 @@ nfssvc_encode_readres(struct svc_rqst *rqstp, __be32 *p) }
int -nfssvc_encode_readdirres(struct svc_rqst *rqstp, __be32 *p) +nfssvc_encode_readdirres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd_readdirres *resp = rqstp->rq_resp; struct xdr_buf *dirlist = &resp->dirlist;
@@ -532,11 +526,11 @@ nfssvc_encode_readdirres(struct svc_rqst *rqstp, __be32 *p) }
int -nfssvc_encode_statfsres(struct svc_rqst *rqstp, __be32 *p) +nfssvc_encode_statfsres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd_statfsres *resp = rqstp->rq_resp; struct kstatfs *stat = &resp->stats; + __be32 *p;
if (!svcxdr_encode_stat(xdr, resp->status)) return 0; diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index d897c198c9126..bff7258041fc4 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -152,13 +152,13 @@ bool nfssvc_decode_linkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); bool nfssvc_decode_symlinkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); bool nfssvc_decode_readdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr);
-int nfssvc_encode_statres(struct svc_rqst *, __be32 *); -int nfssvc_encode_attrstatres(struct svc_rqst *, __be32 *); -int nfssvc_encode_diropres(struct svc_rqst *, __be32 *); -int nfssvc_encode_readlinkres(struct svc_rqst *, __be32 *); -int nfssvc_encode_readres(struct svc_rqst *, __be32 *); -int nfssvc_encode_statfsres(struct svc_rqst *, __be32 *); -int nfssvc_encode_readdirres(struct svc_rqst *, __be32 *); +int nfssvc_encode_statres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfssvc_encode_attrstatres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfssvc_encode_diropres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfssvc_encode_readlinkres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfssvc_encode_readres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfssvc_encode_statfsres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfssvc_encode_readdirres(struct svc_rqst *rqstp, struct xdr_stream *xdr);
void nfssvc_encode_nfscookie(struct nfsd_readdirres *resp, u32 offset); int nfssvc_encode_entry(void *data, const char *name, int namlen, diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index ef72bc4868da6..bb017fc7cba19 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -281,21 +281,21 @@ bool nfs3svc_decode_readdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); bool nfs3svc_decode_readdirplusargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); bool nfs3svc_decode_commitargs(struct svc_rqst *rqstp, struct xdr_stream *xdr);
-int nfs3svc_encode_getattrres(struct svc_rqst *, __be32 *); -int nfs3svc_encode_wccstat(struct svc_rqst *, __be32 *); -int nfs3svc_encode_lookupres(struct svc_rqst *, __be32 *); -int nfs3svc_encode_accessres(struct svc_rqst *, __be32 *); -int nfs3svc_encode_readlinkres(struct svc_rqst *, __be32 *); -int nfs3svc_encode_readres(struct svc_rqst *, __be32 *); -int nfs3svc_encode_writeres(struct svc_rqst *, __be32 *); -int nfs3svc_encode_createres(struct svc_rqst *, __be32 *); -int nfs3svc_encode_renameres(struct svc_rqst *, __be32 *); -int nfs3svc_encode_linkres(struct svc_rqst *, __be32 *); -int nfs3svc_encode_readdirres(struct svc_rqst *, __be32 *); -int nfs3svc_encode_fsstatres(struct svc_rqst *, __be32 *); -int nfs3svc_encode_fsinfores(struct svc_rqst *, __be32 *); -int nfs3svc_encode_pathconfres(struct svc_rqst *, __be32 *); -int nfs3svc_encode_commitres(struct svc_rqst *, __be32 *); +int nfs3svc_encode_getattrres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_encode_wccstat(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_encode_lookupres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_encode_accessres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_encode_readlinkres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_encode_readres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_encode_writeres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_encode_createres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_encode_renameres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_encode_linkres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_encode_readdirres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_encode_fsstatres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_encode_fsinfores(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_encode_pathconfres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_encode_commitres(struct svc_rqst *rqstp, struct xdr_stream *xdr);
void nfs3svc_release_fhandle(struct svc_rqst *); void nfs3svc_release_fhandle2(struct svc_rqst *); diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 8e11dfdc2563a..5b343d0c6963a 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -758,7 +758,7 @@ set_change_info(struct nfsd4_change_info *cinfo, struct svc_fh *fhp)
bool nfsd4_mach_creds_match(struct nfs4_client *cl, struct svc_rqst *rqstp); bool nfs4svc_decode_compoundargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs4svc_encode_compoundres(struct svc_rqst *, __be32 *); +int nfs4svc_encode_compoundres(struct svc_rqst *rqstp, struct xdr_stream *xdr); __be32 nfsd4_check_resp_size(struct nfsd4_compoundres *, u32); void nfsd4_encode_operation(struct nfsd4_compoundres *, struct nfsd4_op *); void nfsd4_encode_replay(struct xdr_stream *xdr, struct nfsd4_op *op); diff --git a/include/linux/lockd/xdr.h b/include/linux/lockd/xdr.h index e1362244f909b..d8bd26a5525ef 100644 --- a/include/linux/lockd/xdr.h +++ b/include/linux/lockd/xdr.h @@ -106,9 +106,9 @@ bool nlmsvc_decode_reboot(struct svc_rqst *rqstp, struct xdr_stream *xdr); bool nlmsvc_decode_shareargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); bool nlmsvc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr);
-int nlmsvc_encode_testres(struct svc_rqst *, __be32 *); -int nlmsvc_encode_res(struct svc_rqst *, __be32 *); -int nlmsvc_encode_void(struct svc_rqst *, __be32 *); -int nlmsvc_encode_shareres(struct svc_rqst *, __be32 *); +int nlmsvc_encode_testres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlmsvc_encode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlmsvc_encode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlmsvc_encode_shareres(struct svc_rqst *rqstp, struct xdr_stream *xdr);
#endif /* LOCKD_XDR_H */ diff --git a/include/linux/lockd/xdr4.h b/include/linux/lockd/xdr4.h index 376b8f6a3763a..50677be3557dc 100644 --- a/include/linux/lockd/xdr4.h +++ b/include/linux/lockd/xdr4.h @@ -32,10 +32,10 @@ bool nlm4svc_decode_reboot(struct svc_rqst *rqstp, struct xdr_stream *xdr); bool nlm4svc_decode_shareargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); bool nlm4svc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr);
-int nlm4svc_encode_testres(struct svc_rqst *, __be32 *); -int nlm4svc_encode_res(struct svc_rqst *, __be32 *); -int nlm4svc_encode_void(struct svc_rqst *, __be32 *); -int nlm4svc_encode_shareres(struct svc_rqst *, __be32 *); +int nlm4svc_encode_testres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlm4svc_encode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlm4svc_encode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlm4svc_encode_shareres(struct svc_rqst *rqstp, struct xdr_stream *xdr);
extern const struct rpc_version nlm_version4;
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 85a9884b10743..c6a0b4364f4a2 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -460,7 +460,8 @@ struct svc_procedure { bool (*pc_decode)(struct svc_rqst *rqstp, struct xdr_stream *xdr); /* XDR encode result: */ - int (*pc_encode)(struct svc_rqst *, __be32 *data); + int (*pc_encode)(struct svc_rqst *rqstp, + struct xdr_stream *xdr); /* XDR free result: */ void (*pc_release)(struct svc_rqst *); unsigned int pc_argsize; /* argument struct size */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 130e2054d4a652a2bd79fb1557ddcd19c053cb37 ]
Returning an undecorated integer is an age-old trope, but it's not clear (even to previous experts in this code) that the only valid return values are 1 and 0. These functions do not return a negative errno, rpc_stat value, or a positive length.
Document there are only two valid return values by having .pc_encode return only true or false.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/xdr.c | 18 ++-- fs/lockd/xdr4.c | 18 ++-- fs/nfs/callback_xdr.c | 4 +- fs/nfsd/nfs2acl.c | 4 +- fs/nfsd/nfs3acl.c | 18 ++-- fs/nfsd/nfs3xdr.c | 166 ++++++++++++++++++------------------- fs/nfsd/nfs4xdr.c | 4 +- fs/nfsd/nfsd.h | 2 +- fs/nfsd/nfssvc.c | 8 +- fs/nfsd/nfsxdr.c | 60 +++++++------- fs/nfsd/xdr.h | 14 ++-- fs/nfsd/xdr3.h | 30 +++---- fs/nfsd/xdr4.h | 2 +- include/linux/lockd/xdr.h | 8 +- include/linux/lockd/xdr4.h | 8 +- include/linux/sunrpc/svc.h | 2 +- 16 files changed, 183 insertions(+), 183 deletions(-)
diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c index 2595b4d14cd44..2fb5748dae0c8 100644 --- a/fs/lockd/xdr.c +++ b/fs/lockd/xdr.c @@ -313,13 +313,13 @@ nlmsvc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr) * Encode Reply results */
-int +bool nlmsvc_encode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - return 1; + return true; }
-int +bool nlmsvc_encode_testres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_res *resp = rqstp->rq_resp; @@ -328,7 +328,7 @@ nlmsvc_encode_testres(struct svc_rqst *rqstp, struct xdr_stream *xdr) svcxdr_encode_testrply(xdr, resp); }
-int +bool nlmsvc_encode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_res *resp = rqstp->rq_resp; @@ -337,18 +337,18 @@ nlmsvc_encode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr) svcxdr_encode_stats(xdr, resp->status); }
-int +bool nlmsvc_encode_shareres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_res *resp = rqstp->rq_resp;
if (!svcxdr_encode_cookie(xdr, &resp->cookie)) - return 0; + return false; if (!svcxdr_encode_stats(xdr, resp->status)) - return 0; + return false; /* sequence */ if (xdr_stream_encode_u32(xdr, 0) < 0) - return 0; + return false;
- return 1; + return true; } diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index 32231c21c22dd..856267c0864bd 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -312,13 +312,13 @@ nlm4svc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr) * Encode Reply results */
-int +bool nlm4svc_encode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - return 1; + return true; }
-int +bool nlm4svc_encode_testres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_res *resp = rqstp->rq_resp; @@ -327,7 +327,7 @@ nlm4svc_encode_testres(struct svc_rqst *rqstp, struct xdr_stream *xdr) svcxdr_encode_testrply(xdr, resp); }
-int +bool nlm4svc_encode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_res *resp = rqstp->rq_resp; @@ -336,18 +336,18 @@ nlm4svc_encode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr) svcxdr_encode_stats(xdr, resp->status); }
-int +bool nlm4svc_encode_shareres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_res *resp = rqstp->rq_resp;
if (!svcxdr_encode_cookie(xdr, &resp->cookie)) - return 0; + return false; if (!svcxdr_encode_stats(xdr, resp->status)) - return 0; + return false; /* sequence */ if (xdr_stream_encode_u32(xdr, 0) < 0) - return 0; + return false;
- return 1; + return true; } diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c index c58b3b60bc2c0..1b21f88a9808c 100644 --- a/fs/nfs/callback_xdr.c +++ b/fs/nfs/callback_xdr.c @@ -67,9 +67,9 @@ static __be32 nfs4_callback_null(struct svc_rqst *rqstp) * svc_process_common() looks for an XDR encoder to know when * not to drop a Reply. */ -static int nfs4_encode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr) +static bool nfs4_encode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - return 1; + return true; }
static __be32 decode_string(struct xdr_stream *xdr, unsigned int *len, diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index 23e4c3eb2c381..96733dff354d3 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -238,7 +238,7 @@ nfsaclsvc_decode_accessargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) */
/* GETACL */ -static int +static bool nfsaclsvc_encode_getaclres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_getaclres *resp = rqstp->rq_resp; @@ -278,7 +278,7 @@ nfsaclsvc_encode_getaclres(struct svc_rqst *rqstp, struct xdr_stream *xdr) }
/* ACCESS */ -static int +static bool nfsaclsvc_encode_accessres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_accessres *resp = rqstp->rq_resp; diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c index 21aacba88ea3e..350fae92ae045 100644 --- a/fs/nfsd/nfs3acl.c +++ b/fs/nfsd/nfs3acl.c @@ -164,7 +164,7 @@ nfs3svc_decode_setaclargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) */
/* GETACL */ -static int +static bool nfs3svc_encode_getaclres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_getaclres *resp = rqstp->rq_resp; @@ -176,14 +176,14 @@ nfs3svc_encode_getaclres(struct svc_rqst *rqstp, struct xdr_stream *xdr) int w;
if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: inode = d_inode(dentry); if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) - return 0; + return false; if (xdr_stream_encode_u32(xdr, resp->mask) < 0) - return 0; + return false;
base = (char *)xdr->p - (char *)head->iov_base;
@@ -192,7 +192,7 @@ nfs3svc_encode_getaclres(struct svc_rqst *rqstp, struct xdr_stream *xdr) (resp->mask & NFS_DFACL) ? resp->acl_default : NULL); while (w > 0) { if (!*(rqstp->rq_next_page++)) - return 0; + return false; w -= PAGE_SIZE; }
@@ -205,18 +205,18 @@ nfs3svc_encode_getaclres(struct svc_rqst *rqstp, struct xdr_stream *xdr) resp->mask & NFS_DFACL, NFS_ACL_DEFAULT); if (n <= 0) - return 0; + return false; break; default: if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) - return 0; + return false; }
- return 1; + return true; }
/* SETACL */ -static int +static bool nfs3svc_encode_setaclres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_attrstat *resp = rqstp->rq_resp; diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 63f0be4e44f70..c3ac1b6aa3aaa 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -812,26 +812,26 @@ nfs3svc_decode_commitargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) */
/* GETATTR */ -int +bool nfs3svc_encode_getattrres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_attrstat *resp = rqstp->rq_resp;
if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: lease_get_mtime(d_inode(resp->fh.fh_dentry), &resp->stat.mtime); if (!svcxdr_encode_fattr3(rqstp, xdr, &resp->fh, &resp->stat)) - return 0; + return false; break; }
- return 1; + return true; }
/* SETATTR, REMOVE, RMDIR */ -int +bool nfs3svc_encode_wccstat(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_attrstat *resp = rqstp->rq_resp; @@ -841,166 +841,166 @@ nfs3svc_encode_wccstat(struct svc_rqst *rqstp, struct xdr_stream *xdr) }
/* LOOKUP */ -int +bool nfs3svc_encode_lookupres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_diropres *resp = rqstp->rq_resp;
if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: if (!svcxdr_encode_nfs_fh3(xdr, &resp->fh)) - return 0; + return false; if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) - return 0; + return false; if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->dirfh)) - return 0; + return false; break; default: if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->dirfh)) - return 0; + return false; }
- return 1; + return true; }
/* ACCESS */ -int +bool nfs3svc_encode_accessres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_accessres *resp = rqstp->rq_resp;
if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) - return 0; + return false; if (xdr_stream_encode_u32(xdr, resp->access) < 0) - return 0; + return false; break; default: if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) - return 0; + return false; }
- return 1; + return true; }
/* READLINK */ -int +bool nfs3svc_encode_readlinkres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_readlinkres *resp = rqstp->rq_resp; struct kvec *head = rqstp->rq_res.head;
if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) - return 0; + return false; if (xdr_stream_encode_u32(xdr, resp->len) < 0) - return 0; + return false; xdr_write_pages(xdr, resp->pages, 0, resp->len); if (svc_encode_result_payload(rqstp, head->iov_len, resp->len) < 0) - return 0; + return false; break; default: if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) - return 0; + return false; }
- return 1; + return true; }
/* READ */ -int +bool nfs3svc_encode_readres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_readres *resp = rqstp->rq_resp; struct kvec *head = rqstp->rq_res.head;
if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) - return 0; + return false; if (xdr_stream_encode_u32(xdr, resp->count) < 0) - return 0; + return false; if (xdr_stream_encode_bool(xdr, resp->eof) < 0) - return 0; + return false; if (xdr_stream_encode_u32(xdr, resp->count) < 0) - return 0; + return false; xdr_write_pages(xdr, resp->pages, rqstp->rq_res.page_base, resp->count); if (svc_encode_result_payload(rqstp, head->iov_len, resp->count) < 0) - return 0; + return false; break; default: if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) - return 0; + return false; }
- return 1; + return true; }
/* WRITE */ -int +bool nfs3svc_encode_writeres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_writeres *resp = rqstp->rq_resp;
if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: if (!svcxdr_encode_wcc_data(rqstp, xdr, &resp->fh)) - return 0; + return false; if (xdr_stream_encode_u32(xdr, resp->count) < 0) - return 0; + return false; if (xdr_stream_encode_u32(xdr, resp->committed) < 0) - return 0; + return false; if (!svcxdr_encode_writeverf3(xdr, resp->verf)) - return 0; + return false; break; default: if (!svcxdr_encode_wcc_data(rqstp, xdr, &resp->fh)) - return 0; + return false; }
- return 1; + return true; }
/* CREATE, MKDIR, SYMLINK, MKNOD */ -int +bool nfs3svc_encode_createres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_diropres *resp = rqstp->rq_resp;
if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: if (!svcxdr_encode_post_op_fh3(xdr, &resp->fh)) - return 0; + return false; if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) - return 0; + return false; if (!svcxdr_encode_wcc_data(rqstp, xdr, &resp->dirfh)) - return 0; + return false; break; default: if (!svcxdr_encode_wcc_data(rqstp, xdr, &resp->dirfh)) - return 0; + return false; }
- return 1; + return true; }
/* RENAME */ -int +bool nfs3svc_encode_renameres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_renameres *resp = rqstp->rq_resp; @@ -1011,7 +1011,7 @@ nfs3svc_encode_renameres(struct svc_rqst *rqstp, struct xdr_stream *xdr) }
/* LINK */ -int +bool nfs3svc_encode_linkres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_linkres *resp = rqstp->rq_resp; @@ -1022,33 +1022,33 @@ nfs3svc_encode_linkres(struct svc_rqst *rqstp, struct xdr_stream *xdr) }
/* READDIR */ -int +bool nfs3svc_encode_readdirres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_readdirres *resp = rqstp->rq_resp; struct xdr_buf *dirlist = &resp->dirlist;
if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) - return 0; + return false; if (!svcxdr_encode_cookieverf3(xdr, resp->verf)) - return 0; + return false; xdr_write_pages(xdr, dirlist->pages, 0, dirlist->len); /* no more entries */ if (xdr_stream_encode_item_absent(xdr) < 0) - return 0; + return false; if (xdr_stream_encode_bool(xdr, resp->common.err == nfserr_eof) < 0) - return 0; + return false; break; default: if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) - return 0; + return false; }
- return 1; + return true; }
static __be32 @@ -1275,26 +1275,26 @@ svcxdr_encode_fsstat3resok(struct xdr_stream *xdr, }
/* FSSTAT */ -int +bool nfs3svc_encode_fsstatres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_fsstatres *resp = rqstp->rq_resp;
if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: if (!svcxdr_encode_post_op_attr(rqstp, xdr, &nfs3svc_null_fh)) - return 0; + return false; if (!svcxdr_encode_fsstat3resok(xdr, resp)) - return 0; + return false; break; default: if (!svcxdr_encode_post_op_attr(rqstp, xdr, &nfs3svc_null_fh)) - return 0; + return false; }
- return 1; + return true; }
static bool @@ -1321,26 +1321,26 @@ svcxdr_encode_fsinfo3resok(struct xdr_stream *xdr, }
/* FSINFO */ -int +bool nfs3svc_encode_fsinfores(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_fsinfores *resp = rqstp->rq_resp;
if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: if (!svcxdr_encode_post_op_attr(rqstp, xdr, &nfs3svc_null_fh)) - return 0; + return false; if (!svcxdr_encode_fsinfo3resok(xdr, resp)) - return 0; + return false; break; default: if (!svcxdr_encode_post_op_attr(rqstp, xdr, &nfs3svc_null_fh)) - return 0; + return false; }
- return 1; + return true; }
static bool @@ -1363,49 +1363,49 @@ svcxdr_encode_pathconf3resok(struct xdr_stream *xdr, }
/* PATHCONF */ -int +bool nfs3svc_encode_pathconfres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_pathconfres *resp = rqstp->rq_resp;
if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: if (!svcxdr_encode_post_op_attr(rqstp, xdr, &nfs3svc_null_fh)) - return 0; + return false; if (!svcxdr_encode_pathconf3resok(xdr, resp)) - return 0; + return false; break; default: if (!svcxdr_encode_post_op_attr(rqstp, xdr, &nfs3svc_null_fh)) - return 0; + return false; }
- return 1; + return true; }
/* COMMIT */ -int +bool nfs3svc_encode_commitres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_commitres *resp = rqstp->rq_resp;
if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: if (!svcxdr_encode_wcc_data(rqstp, xdr, &resp->fh)) - return 0; + return false; if (!svcxdr_encode_writeverf3(xdr, resp->verf)) - return 0; + return false; break; default: if (!svcxdr_encode_wcc_data(rqstp, xdr, &resp->fh)) - return 0; + return false; }
- return 1; + return true; }
/* diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 697d61819a59d..ba2ed12df2060 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -5437,7 +5437,7 @@ nfs4svc_decode_compoundargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) return nfsd4_decode_compound(args); }
-int +bool nfs4svc_encode_compoundres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd4_compoundres *resp = rqstp->rq_resp; @@ -5463,7 +5463,7 @@ nfs4svc_encode_compoundres(struct svc_rqst *rqstp, struct xdr_stream *xdr) *p++ = htonl(resp->opcnt);
nfsd4_sequence_done(resp); - return 1; + return true; }
/* diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 345f8247d5da9..498e5a4898260 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -80,7 +80,7 @@ struct nfsd_voidargs { }; struct nfsd_voidres { }; bool nfssvc_decode_voidarg(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfssvc_encode_voidres(struct svc_rqst *rqstp, +bool nfssvc_encode_voidres(struct svc_rqst *rqstp, struct xdr_stream *xdr);
/* diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 195f2bcc65384..7df1505425edc 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -1078,12 +1078,12 @@ bool nfssvc_decode_voidarg(struct svc_rqst *rqstp, struct xdr_stream *xdr) * @xdr: XDR stream into which to encode results * * Return values: - * %0: Local error while encoding - * %1: Encoding was successful + * %false: Local error while encoding + * %true: Encoding was successful */ -int nfssvc_encode_voidres(struct svc_rqst *rqstp, struct xdr_stream *xdr) +bool nfssvc_encode_voidres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - return 1; + return true; }
int nfsd_pool_stats_open(struct inode *inode, struct file *file) diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 6aa8138ae2f7d..aba8520b4b8b6 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -414,7 +414,7 @@ nfssvc_decode_readdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) * XDR encode functions */
-int +bool nfssvc_encode_statres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd_stat *resp = rqstp->rq_resp; @@ -422,110 +422,110 @@ nfssvc_encode_statres(struct svc_rqst *rqstp, struct xdr_stream *xdr) return svcxdr_encode_stat(xdr, resp->status); }
-int +bool nfssvc_encode_attrstatres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd_attrstat *resp = rqstp->rq_resp;
if (!svcxdr_encode_stat(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: if (!svcxdr_encode_fattr(rqstp, xdr, &resp->fh, &resp->stat)) - return 0; + return false; break; }
- return 1; + return true; }
-int +bool nfssvc_encode_diropres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd_diropres *resp = rqstp->rq_resp;
if (!svcxdr_encode_stat(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: if (!svcxdr_encode_fhandle(xdr, &resp->fh)) - return 0; + return false; if (!svcxdr_encode_fattr(rqstp, xdr, &resp->fh, &resp->stat)) - return 0; + return false; break; }
- return 1; + return true; }
-int +bool nfssvc_encode_readlinkres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd_readlinkres *resp = rqstp->rq_resp; struct kvec *head = rqstp->rq_res.head;
if (!svcxdr_encode_stat(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: if (xdr_stream_encode_u32(xdr, resp->len) < 0) - return 0; + return false; xdr_write_pages(xdr, &resp->page, 0, resp->len); if (svc_encode_result_payload(rqstp, head->iov_len, resp->len) < 0) - return 0; + return false; break; }
- return 1; + return true; }
-int +bool nfssvc_encode_readres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd_readres *resp = rqstp->rq_resp; struct kvec *head = rqstp->rq_res.head;
if (!svcxdr_encode_stat(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: if (!svcxdr_encode_fattr(rqstp, xdr, &resp->fh, &resp->stat)) - return 0; + return false; if (xdr_stream_encode_u32(xdr, resp->count) < 0) - return 0; + return false; xdr_write_pages(xdr, resp->pages, rqstp->rq_res.page_base, resp->count); if (svc_encode_result_payload(rqstp, head->iov_len, resp->count) < 0) - return 0; + return false; break; }
- return 1; + return true; }
-int +bool nfssvc_encode_readdirres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd_readdirres *resp = rqstp->rq_resp; struct xdr_buf *dirlist = &resp->dirlist;
if (!svcxdr_encode_stat(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: xdr_write_pages(xdr, dirlist->pages, 0, dirlist->len); /* no more entries */ if (xdr_stream_encode_item_absent(xdr) < 0) - return 0; + return false; if (xdr_stream_encode_bool(xdr, resp->common.err == nfserr_eof) < 0) - return 0; + return false; break; }
- return 1; + return true; }
-int +bool nfssvc_encode_statfsres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd_statfsres *resp = rqstp->rq_resp; @@ -533,12 +533,12 @@ nfssvc_encode_statfsres(struct svc_rqst *rqstp, struct xdr_stream *xdr) __be32 *p;
if (!svcxdr_encode_stat(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: p = xdr_reserve_space(xdr, XDR_UNIT * 5); if (!p) - return 0; + return false; *p++ = cpu_to_be32(NFSSVC_MAXBLKSIZE_V2); *p++ = cpu_to_be32(stat->f_bsize); *p++ = cpu_to_be32(stat->f_blocks); @@ -547,7 +547,7 @@ nfssvc_encode_statfsres(struct svc_rqst *rqstp, struct xdr_stream *xdr) break; }
- return 1; + return true; }
/** diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index bff7258041fc4..852f71580bd06 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -152,13 +152,13 @@ bool nfssvc_decode_linkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); bool nfssvc_decode_symlinkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); bool nfssvc_decode_readdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr);
-int nfssvc_encode_statres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfssvc_encode_attrstatres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfssvc_encode_diropres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfssvc_encode_readlinkres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfssvc_encode_readres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfssvc_encode_statfsres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfssvc_encode_readdirres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfssvc_encode_statres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfssvc_encode_attrstatres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfssvc_encode_diropres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfssvc_encode_readlinkres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfssvc_encode_readres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfssvc_encode_statfsres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfssvc_encode_readdirres(struct svc_rqst *rqstp, struct xdr_stream *xdr);
void nfssvc_encode_nfscookie(struct nfsd_readdirres *resp, u32 offset); int nfssvc_encode_entry(void *data, const char *name, int namlen, diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index bb017fc7cba19..03fe4e21306cb 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -281,21 +281,21 @@ bool nfs3svc_decode_readdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); bool nfs3svc_decode_readdirplusargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); bool nfs3svc_decode_commitargs(struct svc_rqst *rqstp, struct xdr_stream *xdr);
-int nfs3svc_encode_getattrres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_encode_wccstat(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_encode_lookupres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_encode_accessres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_encode_readlinkres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_encode_readres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_encode_writeres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_encode_createres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_encode_renameres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_encode_linkres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_encode_readdirres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_encode_fsstatres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_encode_fsinfores(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_encode_pathconfres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_encode_commitres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_encode_getattrres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_encode_wccstat(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_encode_lookupres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_encode_accessres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_encode_readlinkres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_encode_readres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_encode_writeres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_encode_createres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_encode_renameres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_encode_linkres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_encode_readdirres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_encode_fsstatres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_encode_fsinfores(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_encode_pathconfres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_encode_commitres(struct svc_rqst *rqstp, struct xdr_stream *xdr);
void nfs3svc_release_fhandle(struct svc_rqst *); void nfs3svc_release_fhandle2(struct svc_rqst *); diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 5b343d0c6963a..4a298ac5515df 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -758,7 +758,7 @@ set_change_info(struct nfsd4_change_info *cinfo, struct svc_fh *fhp)
bool nfsd4_mach_creds_match(struct nfs4_client *cl, struct svc_rqst *rqstp); bool nfs4svc_decode_compoundargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs4svc_encode_compoundres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs4svc_encode_compoundres(struct svc_rqst *rqstp, struct xdr_stream *xdr); __be32 nfsd4_check_resp_size(struct nfsd4_compoundres *, u32); void nfsd4_encode_operation(struct nfsd4_compoundres *, struct nfsd4_op *); void nfsd4_encode_replay(struct xdr_stream *xdr, struct nfsd4_op *op); diff --git a/include/linux/lockd/xdr.h b/include/linux/lockd/xdr.h index d8bd26a5525ef..398f70093cd35 100644 --- a/include/linux/lockd/xdr.h +++ b/include/linux/lockd/xdr.h @@ -106,9 +106,9 @@ bool nlmsvc_decode_reboot(struct svc_rqst *rqstp, struct xdr_stream *xdr); bool nlmsvc_decode_shareargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); bool nlmsvc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr);
-int nlmsvc_encode_testres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlmsvc_encode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlmsvc_encode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlmsvc_encode_shareres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlmsvc_encode_testres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlmsvc_encode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlmsvc_encode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlmsvc_encode_shareres(struct svc_rqst *rqstp, struct xdr_stream *xdr);
#endif /* LOCKD_XDR_H */ diff --git a/include/linux/lockd/xdr4.h b/include/linux/lockd/xdr4.h index 50677be3557dc..9a6b55da8fd64 100644 --- a/include/linux/lockd/xdr4.h +++ b/include/linux/lockd/xdr4.h @@ -32,10 +32,10 @@ bool nlm4svc_decode_reboot(struct svc_rqst *rqstp, struct xdr_stream *xdr); bool nlm4svc_decode_shareargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); bool nlm4svc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr);
-int nlm4svc_encode_testres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlm4svc_encode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlm4svc_encode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlm4svc_encode_shareres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlm4svc_encode_testres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlm4svc_encode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlm4svc_encode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlm4svc_encode_shareres(struct svc_rqst *rqstp, struct xdr_stream *xdr);
extern const struct rpc_version nlm_version4;
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index c6a0b4364f4a2..15ecd5b1abacf 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -460,7 +460,7 @@ struct svc_procedure { bool (*pc_decode)(struct svc_rqst *rqstp, struct xdr_stream *xdr); /* XDR encode result: */ - int (*pc_encode)(struct svc_rqst *rqstp, + bool (*pc_encode)(struct svc_rqst *rqstp, struct xdr_stream *xdr); /* XDR free result: */ void (*pc_release)(struct svc_rqst *);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit 2336d696862186fd4a6ddd1ea0cb243b3e32847c ]
I don't know if that Solaris behavior matters any more or if it's still possible to look up that bug ID any more. The XFS behavior's definitely still relevant, though; any but the most recent XFS filesystems will lose the top bits.
Reported-by: Frank S. Filz ffilzlnx@mindspring.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/vfs.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 05b5f7e241e70..5b0abdf8de27e 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1431,7 +1431,8 @@ do_nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
if (nfsd_create_is_exclusive(createmode)) { /* solaris7 gets confused (bugid 4218508) if these have - * the high bit set, so just clear the high bits. If this is + * the high bit set, as do xfs filesystems without the + * "bigtime" feature. So just clear the high bits. If this is * ever changed to use different attrs for storing the * verifier, then do_open_lookup() will also need to be fixed * accordingly.
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Changcheng Deng deng.changcheng@zte.com.cn
[ Upstream commit 291cd656da04163f4bba67953c1f2f823e0d1231 ]
./fs/nfsd/nfssvc.c: 1072: 8-9: :WARNING return of 0/1 in function 'nfssvc_decode_voidarg' with return type bool
Return statements in functions returning bool should use true/false instead of 1/0.
Reported-by: Zeal Robot zealci@zte.com.cn Signed-off-by: Changcheng Deng deng.changcheng@zte.com.cn Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfssvc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 7df1505425edc..408cff8fe32d3 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -1069,7 +1069,7 @@ int nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp) */ bool nfssvc_decode_voidarg(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - return 1; + return true; }
/**
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit 80479eb862102f9513e93fcf726c78cc0be2e3b2 ]
Mandatory locking has been removed. And the rest of this comment is redundant with the code.
Reported-by: Jeff layton jlayton@kernel.org Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/vfs.c | 3 --- 1 file changed, 3 deletions(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 5b0abdf8de27e..deaf4a50550d5 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -751,9 +751,6 @@ __nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, path.dentry = fhp->fh_dentry; inode = d_inode(path.dentry);
- /* Disallow write access to files with the append-only bit set - * or any access when mandatory locking enabled - */ err = nfserr_perm; if (IS_APPEND(inode) && (may_flags & NFSD_MAY_WRITE)) goto out;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit c0019b7db1d7ac62c711cda6b357a659d46428fe ]
rtm@csail.mit.edu reports:
nfsd4_decode_bitmap4() will write beyond bmval[bmlen-1] if the RPC directs it to do so. This can cause nfsd4_decode_state_protect4_a() to write client-supplied data beyond the end of nfsd4_exchange_id.spo_must_allow[] when called by nfsd4_decode_exchange_id().
Rewrite the loops so nfsd4_decode_bitmap() cannot iterate beyond @bmlen.
Reported by: rtm@csail.mit.edu Fixes: d1c263a031e8 ("NFSD: Replace READ* macros in nfsd4_decode_fattr()") Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index ba2ed12df2060..d9758232a398c 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -288,11 +288,8 @@ nfsd4_decode_bitmap4(struct nfsd4_compoundargs *argp, u32 *bmval, u32 bmlen) p = xdr_inline_decode(argp->xdr, count << 2); if (!p) return nfserr_bad_xdr; - i = 0; - while (i < count) - bmval[i++] = be32_to_cpup(p++); - while (i < bmlen) - bmval[i++] = 0; + for (i = 0; i < bmlen; i++) + bmval[i] = (i < count) ? be32_to_cpup(p++) : 0;
return nfs_ok; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 53b1119a6e5028b125f431a0116ba73510d82a72 ]
If a client sends a READDIR count argument that is too small (say, zero), then the buffer size calculation in the new init_dirlist helper functions results in an underflow, allowing the XDR stream functions to write beyond the actual buffer.
This calculation has always been suspect. NFSD has never sanity- checked the READDIR count argument, but the old entry encoders managed the problem correctly.
With the commits below, entry encoding changed, exposing the underflow to the pointer arithmetic in xdr_reserve_space().
Modern NFS clients attempt to retrieve as much data as possible for each READDIR request. Also, we have no unit tests that exercise the behavior of READDIR at the lower bound of @count values. Thus this case was missed during testing.
Reported-by: Anatoly Trosinenko anatoly.trosinenko@gmail.com Fixes: f5dcccd647da ("NFSD: Update the NFSv2 READDIR entry encoder to use struct xdr_stream") Fixes: 7f87fc2d34d4 ("NFSD: Update NFSv3 READDIR entry encoders to use struct xdr_stream") Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3proc.c | 11 ++++------- fs/nfsd/nfsproc.c | 8 ++++---- 2 files changed, 8 insertions(+), 11 deletions(-)
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 5abb5c8e2cd21..d26271c60ab44 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -443,22 +443,19 @@ nfsd3_proc_link(struct svc_rqst *rqstp)
static void nfsd3_init_dirlist_pages(struct svc_rqst *rqstp, struct nfsd3_readdirres *resp, - int count) + u32 count) { struct xdr_buf *buf = &resp->dirlist; struct xdr_stream *xdr = &resp->xdr;
- count = min_t(u32, count, svc_max_payload(rqstp)); + count = clamp(count, (u32)(XDR_UNIT * 2), svc_max_payload(rqstp));
memset(buf, 0, sizeof(*buf));
/* Reserve room for the NULL ptr & eof flag (-2 words) */ buf->buflen = count - XDR_UNIT * 2; buf->pages = rqstp->rq_next_page; - while (count > 0) { - rqstp->rq_next_page++; - count -= PAGE_SIZE; - } + rqstp->rq_next_page += (buf->buflen + PAGE_SIZE - 1) >> PAGE_SHIFT;
/* This is xdr_init_encode(), but it assumes that * the head kvec has already been consumed. */ @@ -467,7 +464,7 @@ static void nfsd3_init_dirlist_pages(struct svc_rqst *rqstp, xdr->page_ptr = buf->pages; xdr->iov = NULL; xdr->p = page_address(*buf->pages); - xdr->end = xdr->p + (PAGE_SIZE >> 2); + xdr->end = (void *)xdr->p + min_t(u32, buf->buflen, PAGE_SIZE); xdr->rqst = NULL; }
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 5a84aed17e705..8ee329bc3815d 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -556,17 +556,17 @@ nfsd_proc_rmdir(struct svc_rqst *rqstp)
static void nfsd_init_dirlist_pages(struct svc_rqst *rqstp, struct nfsd_readdirres *resp, - int count) + u32 count) { struct xdr_buf *buf = &resp->dirlist; struct xdr_stream *xdr = &resp->xdr;
- count = min_t(u32, count, PAGE_SIZE); + count = clamp(count, (u32)(XDR_UNIT * 2), svc_max_payload(rqstp));
memset(buf, 0, sizeof(*buf));
/* Reserve room for the NULL ptr & eof flag (-2 words) */ - buf->buflen = count - sizeof(__be32) * 2; + buf->buflen = count - XDR_UNIT * 2; buf->pages = rqstp->rq_next_page; rqstp->rq_next_page++;
@@ -577,7 +577,7 @@ static void nfsd_init_dirlist_pages(struct svc_rqst *rqstp, xdr->page_ptr = buf->pages; xdr->iov = NULL; xdr->p = page_address(*buf->pages); - xdr->end = xdr->p + (PAGE_SIZE >> 2); + xdr->end = (void *)xdr->p + min_t(u32, buf->buflen, PAGE_SIZE); xdr->rqst = NULL; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit ad69cd9972e79aba103ba5365de0acd35770c265 ]
In preparation for separating object type from iterator type, rename some 'type' arguments in functions to 'obj_type' and remove the unused interface to clear marks by object type mask.
Link: https://lore.kernel.org/r/20211129201537.1932819-2-amir73il@gmail.com Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify_user.c | 8 ++++---- fs/notify/group.c | 2 +- fs/notify/mark.c | 27 +++++++++++++++------------ include/linux/fsnotify_backend.h | 28 ++++++++++++---------------- 4 files changed, 32 insertions(+), 33 deletions(-)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index e6860c55b3dcb..2f78999a7aa3d 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1052,7 +1052,7 @@ static __u32 fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark,
static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group, fsnotify_connp_t *connp, - unsigned int type, + unsigned int obj_type, __kernel_fsid_t *fsid) { struct ucounts *ucounts = group->fanotify_data.ucounts; @@ -1075,7 +1075,7 @@ static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group, }
fsnotify_init_mark(mark, group); - ret = fsnotify_add_mark_locked(mark, connp, type, 0, fsid); + ret = fsnotify_add_mark_locked(mark, connp, obj_type, 0, fsid); if (ret) { fsnotify_put_mark(mark); goto out_dec_ucounts; @@ -1100,7 +1100,7 @@ static int fanotify_group_init_error_pool(struct fsnotify_group *group) }
static int fanotify_add_mark(struct fsnotify_group *group, - fsnotify_connp_t *connp, unsigned int type, + fsnotify_connp_t *connp, unsigned int obj_type, __u32 mask, unsigned int flags, __kernel_fsid_t *fsid) { @@ -1111,7 +1111,7 @@ static int fanotify_add_mark(struct fsnotify_group *group, mutex_lock(&group->mark_mutex); fsn_mark = fsnotify_find_mark(connp, group); if (!fsn_mark) { - fsn_mark = fanotify_add_new_mark(group, connp, type, fsid); + fsn_mark = fanotify_add_new_mark(group, connp, obj_type, fsid); if (IS_ERR(fsn_mark)) { mutex_unlock(&group->mark_mutex); return PTR_ERR(fsn_mark); diff --git a/fs/notify/group.c b/fs/notify/group.c index 6a297efc47887..b7d4d64f87c29 100644 --- a/fs/notify/group.c +++ b/fs/notify/group.c @@ -58,7 +58,7 @@ void fsnotify_destroy_group(struct fsnotify_group *group) fsnotify_group_stop_queueing(group);
/* Clear all marks for this group and queue them for destruction */ - fsnotify_clear_marks_by_group(group, FSNOTIFY_OBJ_ALL_TYPES_MASK); + fsnotify_clear_marks_by_group(group, FSNOTIFY_OBJ_TYPE_ANY);
/* * Some marks can still be pinned when waiting for response from diff --git a/fs/notify/mark.c b/fs/notify/mark.c index bea106fac0901..7c0946e16918a 100644 --- a/fs/notify/mark.c +++ b/fs/notify/mark.c @@ -496,7 +496,7 @@ int fsnotify_compare_groups(struct fsnotify_group *a, struct fsnotify_group *b) }
static int fsnotify_attach_connector_to_object(fsnotify_connp_t *connp, - unsigned int type, + unsigned int obj_type, __kernel_fsid_t *fsid) { struct inode *inode = NULL; @@ -507,7 +507,7 @@ static int fsnotify_attach_connector_to_object(fsnotify_connp_t *connp, return -ENOMEM; spin_lock_init(&conn->lock); INIT_HLIST_HEAD(&conn->list); - conn->type = type; + conn->type = obj_type; conn->obj = connp; /* Cache fsid of filesystem containing the object */ if (fsid) { @@ -572,7 +572,8 @@ static struct fsnotify_mark_connector *fsnotify_grab_connector( * priority, highest number first, and then by the group's location in memory. */ static int fsnotify_add_mark_list(struct fsnotify_mark *mark, - fsnotify_connp_t *connp, unsigned int type, + fsnotify_connp_t *connp, + unsigned int obj_type, int allow_dups, __kernel_fsid_t *fsid) { struct fsnotify_mark *lmark, *last = NULL; @@ -580,7 +581,7 @@ static int fsnotify_add_mark_list(struct fsnotify_mark *mark, int cmp; int err = 0;
- if (WARN_ON(!fsnotify_valid_obj_type(type))) + if (WARN_ON(!fsnotify_valid_obj_type(obj_type))) return -EINVAL;
/* Backend is expected to check for zero fsid (e.g. tmpfs) */ @@ -592,7 +593,8 @@ static int fsnotify_add_mark_list(struct fsnotify_mark *mark, conn = fsnotify_grab_connector(connp); if (!conn) { spin_unlock(&mark->lock); - err = fsnotify_attach_connector_to_object(connp, type, fsid); + err = fsnotify_attach_connector_to_object(connp, obj_type, + fsid); if (err) return err; goto restart; @@ -665,7 +667,7 @@ static int fsnotify_add_mark_list(struct fsnotify_mark *mark, * event types should be delivered to which group. */ int fsnotify_add_mark_locked(struct fsnotify_mark *mark, - fsnotify_connp_t *connp, unsigned int type, + fsnotify_connp_t *connp, unsigned int obj_type, int allow_dups, __kernel_fsid_t *fsid) { struct fsnotify_group *group = mark->group; @@ -686,7 +688,7 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark, fsnotify_get_mark(mark); /* for g_list */ spin_unlock(&mark->lock);
- ret = fsnotify_add_mark_list(mark, connp, type, allow_dups, fsid); + ret = fsnotify_add_mark_list(mark, connp, obj_type, allow_dups, fsid); if (ret) goto err;
@@ -706,13 +708,14 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark, }
int fsnotify_add_mark(struct fsnotify_mark *mark, fsnotify_connp_t *connp, - unsigned int type, int allow_dups, __kernel_fsid_t *fsid) + unsigned int obj_type, int allow_dups, + __kernel_fsid_t *fsid) { int ret; struct fsnotify_group *group = mark->group;
mutex_lock(&group->mark_mutex); - ret = fsnotify_add_mark_locked(mark, connp, type, allow_dups, fsid); + ret = fsnotify_add_mark_locked(mark, connp, obj_type, allow_dups, fsid); mutex_unlock(&group->mark_mutex); return ret; } @@ -747,14 +750,14 @@ EXPORT_SYMBOL_GPL(fsnotify_find_mark);
/* Clear any marks in a group with given type mask */ void fsnotify_clear_marks_by_group(struct fsnotify_group *group, - unsigned int type_mask) + unsigned int obj_type) { struct fsnotify_mark *lmark, *mark; LIST_HEAD(to_free); struct list_head *head = &to_free;
/* Skip selection step if we want to clear all marks. */ - if (type_mask == FSNOTIFY_OBJ_ALL_TYPES_MASK) { + if (obj_type == FSNOTIFY_OBJ_TYPE_ANY) { head = &group->marks_list; goto clear; } @@ -769,7 +772,7 @@ void fsnotify_clear_marks_by_group(struct fsnotify_group *group, */ mutex_lock(&group->mark_mutex); list_for_each_entry_safe(mark, lmark, &group->marks_list, g_list) { - if ((1U << mark->connector->type) & type_mask) + if (mark->connector->type == obj_type) list_move(&mark->g_list, &to_free); } mutex_unlock(&group->mark_mutex); diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 51ef2b079bfa0..b9c84b1dbcc8f 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -338,6 +338,7 @@ static inline struct fs_error_report *fsnotify_data_error_report( }
enum fsnotify_obj_type { + FSNOTIFY_OBJ_TYPE_ANY = -1, FSNOTIFY_OBJ_TYPE_INODE, FSNOTIFY_OBJ_TYPE_PARENT, FSNOTIFY_OBJ_TYPE_VFSMOUNT, @@ -346,15 +347,9 @@ enum fsnotify_obj_type { FSNOTIFY_OBJ_TYPE_DETACHED = FSNOTIFY_OBJ_TYPE_COUNT };
-#define FSNOTIFY_OBJ_TYPE_INODE_FL (1U << FSNOTIFY_OBJ_TYPE_INODE) -#define FSNOTIFY_OBJ_TYPE_PARENT_FL (1U << FSNOTIFY_OBJ_TYPE_PARENT) -#define FSNOTIFY_OBJ_TYPE_VFSMOUNT_FL (1U << FSNOTIFY_OBJ_TYPE_VFSMOUNT) -#define FSNOTIFY_OBJ_TYPE_SB_FL (1U << FSNOTIFY_OBJ_TYPE_SB) -#define FSNOTIFY_OBJ_ALL_TYPES_MASK ((1U << FSNOTIFY_OBJ_TYPE_COUNT) - 1) - -static inline bool fsnotify_valid_obj_type(unsigned int type) +static inline bool fsnotify_valid_obj_type(unsigned int obj_type) { - return (type < FSNOTIFY_OBJ_TYPE_COUNT); + return (obj_type < FSNOTIFY_OBJ_TYPE_COUNT); }
struct fsnotify_iter_info { @@ -387,7 +382,7 @@ static inline void fsnotify_iter_set_report_type_mark( static inline struct fsnotify_mark *fsnotify_iter_##name##_mark( \ struct fsnotify_iter_info *iter_info) \ { \ - return (iter_info->report_mask & FSNOTIFY_OBJ_TYPE_##NAME##_FL) ? \ + return (iter_info->report_mask & (1U << FSNOTIFY_OBJ_TYPE_##NAME)) ? \ iter_info->marks[FSNOTIFY_OBJ_TYPE_##NAME] : NULL; \ }
@@ -604,11 +599,11 @@ extern int fsnotify_get_conn_fsid(const struct fsnotify_mark_connector *conn, __kernel_fsid_t *fsid); /* attach the mark to the object */ extern int fsnotify_add_mark(struct fsnotify_mark *mark, - fsnotify_connp_t *connp, unsigned int type, + fsnotify_connp_t *connp, unsigned int obj_type, int allow_dups, __kernel_fsid_t *fsid); extern int fsnotify_add_mark_locked(struct fsnotify_mark *mark, fsnotify_connp_t *connp, - unsigned int type, int allow_dups, + unsigned int obj_type, int allow_dups, __kernel_fsid_t *fsid);
/* attach the mark to the inode */ @@ -637,22 +632,23 @@ extern void fsnotify_detach_mark(struct fsnotify_mark *mark); extern void fsnotify_free_mark(struct fsnotify_mark *mark); /* Wait until all marks queued for destruction are destroyed */ extern void fsnotify_wait_marks_destroyed(void); -/* run all the marks in a group, and clear all of the marks attached to given object type */ -extern void fsnotify_clear_marks_by_group(struct fsnotify_group *group, unsigned int type); +/* Clear all of the marks of a group attached to a given object type */ +extern void fsnotify_clear_marks_by_group(struct fsnotify_group *group, + unsigned int obj_type); /* run all the marks in a group, and clear all of the vfsmount marks */ static inline void fsnotify_clear_vfsmount_marks_by_group(struct fsnotify_group *group) { - fsnotify_clear_marks_by_group(group, FSNOTIFY_OBJ_TYPE_VFSMOUNT_FL); + fsnotify_clear_marks_by_group(group, FSNOTIFY_OBJ_TYPE_VFSMOUNT); } /* run all the marks in a group, and clear all of the inode marks */ static inline void fsnotify_clear_inode_marks_by_group(struct fsnotify_group *group) { - fsnotify_clear_marks_by_group(group, FSNOTIFY_OBJ_TYPE_INODE_FL); + fsnotify_clear_marks_by_group(group, FSNOTIFY_OBJ_TYPE_INODE); } /* run all the marks in a group, and clear all of the sn marks */ static inline void fsnotify_clear_sb_marks_by_group(struct fsnotify_group *group) { - fsnotify_clear_marks_by_group(group, FSNOTIFY_OBJ_TYPE_SB_FL); + fsnotify_clear_marks_by_group(group, FSNOTIFY_OBJ_TYPE_SB); } extern void fsnotify_get_mark(struct fsnotify_mark *mark); extern void fsnotify_put_mark(struct fsnotify_mark *mark);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit 1c9007d62bea6fd164285314f7553f73e5308863 ]
They are two different types that use the same enum, so this confusing.
Use the object type to indicate the type of object mark is attached to and the iter type to indicate the type of watch.
A group can have two different watches of the same object type (parent and child watches) that match the same event.
Link: https://lore.kernel.org/r/20211129201537.1932819-3-amir73il@gmail.com Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.c | 6 ++--- fs/notify/fsnotify.c | 18 +++++++------- fs/notify/mark.c | 4 ++-- include/linux/fsnotify_backend.h | 41 ++++++++++++++++++++++---------- 4 files changed, 42 insertions(+), 27 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index b6091775aa6ef..652fe84cb8acd 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -299,7 +299,7 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group, return 0; }
- fsnotify_foreach_obj_type(type) { + fsnotify_foreach_iter_type(type) { if (!fsnotify_iter_should_report_type(iter_info, type)) continue; mark = iter_info->marks[type]; @@ -318,7 +318,7 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group, * If the event is on a child and this mark is on a parent not * watching children, don't send it! */ - if (type == FSNOTIFY_OBJ_TYPE_PARENT && + if (type == FSNOTIFY_ITER_TYPE_PARENT && !(mark->mask & FS_EVENT_ON_CHILD)) continue;
@@ -746,7 +746,7 @@ static __kernel_fsid_t fanotify_get_fsid(struct fsnotify_iter_info *iter_info) int type; __kernel_fsid_t fsid = {};
- fsnotify_foreach_obj_type(type) { + fsnotify_foreach_iter_type(type) { struct fsnotify_mark_connector *conn;
if (!fsnotify_iter_should_report_type(iter_info, type)) diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index 4034ca566f95c..0c94457c625e2 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -330,7 +330,7 @@ static int send_to_group(__u32 mask, const void *data, int data_type,
/* clear ignored on inode modification */ if (mask & FS_MODIFY) { - fsnotify_foreach_obj_type(type) { + fsnotify_foreach_iter_type(type) { if (!fsnotify_iter_should_report_type(iter_info, type)) continue; mark = iter_info->marks[type]; @@ -340,7 +340,7 @@ static int send_to_group(__u32 mask, const void *data, int data_type, } }
- fsnotify_foreach_obj_type(type) { + fsnotify_foreach_iter_type(type) { if (!fsnotify_iter_should_report_type(iter_info, type)) continue; mark = iter_info->marks[type]; @@ -405,7 +405,7 @@ static unsigned int fsnotify_iter_select_report_types( int type;
/* Choose max prio group among groups of all queue heads */ - fsnotify_foreach_obj_type(type) { + fsnotify_foreach_iter_type(type) { mark = iter_info->marks[type]; if (mark && fsnotify_compare_groups(max_prio_group, mark->group) > 0) @@ -417,7 +417,7 @@ static unsigned int fsnotify_iter_select_report_types(
/* Set the report mask for marks from same group as max prio group */ iter_info->report_mask = 0; - fsnotify_foreach_obj_type(type) { + fsnotify_foreach_iter_type(type) { mark = iter_info->marks[type]; if (mark && fsnotify_compare_groups(max_prio_group, mark->group) == 0) @@ -435,7 +435,7 @@ static void fsnotify_iter_next(struct fsnotify_iter_info *iter_info) { int type;
- fsnotify_foreach_obj_type(type) { + fsnotify_foreach_iter_type(type) { if (fsnotify_iter_should_report_type(iter_info, type)) iter_info->marks[type] = fsnotify_next_mark(iter_info->marks[type]); @@ -519,18 +519,18 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir,
iter_info.srcu_idx = srcu_read_lock(&fsnotify_mark_srcu);
- iter_info.marks[FSNOTIFY_OBJ_TYPE_SB] = + iter_info.marks[FSNOTIFY_ITER_TYPE_SB] = fsnotify_first_mark(&sb->s_fsnotify_marks); if (mnt) { - iter_info.marks[FSNOTIFY_OBJ_TYPE_VFSMOUNT] = + iter_info.marks[FSNOTIFY_ITER_TYPE_VFSMOUNT] = fsnotify_first_mark(&mnt->mnt_fsnotify_marks); } if (inode) { - iter_info.marks[FSNOTIFY_OBJ_TYPE_INODE] = + iter_info.marks[FSNOTIFY_ITER_TYPE_INODE] = fsnotify_first_mark(&inode->i_fsnotify_marks); } if (parent) { - iter_info.marks[FSNOTIFY_OBJ_TYPE_PARENT] = + iter_info.marks[FSNOTIFY_ITER_TYPE_PARENT] = fsnotify_first_mark(&parent->i_fsnotify_marks); }
diff --git a/fs/notify/mark.c b/fs/notify/mark.c index 7c0946e16918a..b42629d2fc1c6 100644 --- a/fs/notify/mark.c +++ b/fs/notify/mark.c @@ -353,7 +353,7 @@ bool fsnotify_prepare_user_wait(struct fsnotify_iter_info *iter_info) { int type;
- fsnotify_foreach_obj_type(type) { + fsnotify_foreach_iter_type(type) { /* This can fail if mark is being removed */ if (!fsnotify_get_mark_safe(iter_info->marks[type])) { __release(&fsnotify_mark_srcu); @@ -382,7 +382,7 @@ void fsnotify_finish_user_wait(struct fsnotify_iter_info *iter_info) int type;
iter_info->srcu_idx = srcu_read_lock(&fsnotify_mark_srcu); - fsnotify_foreach_obj_type(type) + fsnotify_foreach_iter_type(type) fsnotify_put_mark_wake(iter_info->marks[type]); }
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index b9c84b1dbcc8f..73739fee1710f 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -337,10 +337,25 @@ static inline struct fs_error_report *fsnotify_data_error_report( } }
+/* + * Index to merged marks iterator array that correlates to a type of watch. + * The type of watched object can be deduced from the iterator type, but not + * the other way around, because an event can match different watched objects + * of the same object type. + * For example, both parent and child are watching an object of type inode. + */ +enum fsnotify_iter_type { + FSNOTIFY_ITER_TYPE_INODE, + FSNOTIFY_ITER_TYPE_VFSMOUNT, + FSNOTIFY_ITER_TYPE_SB, + FSNOTIFY_ITER_TYPE_PARENT, + FSNOTIFY_ITER_TYPE_COUNT +}; + +/* The type of object that a mark is attached to */ enum fsnotify_obj_type { FSNOTIFY_OBJ_TYPE_ANY = -1, FSNOTIFY_OBJ_TYPE_INODE, - FSNOTIFY_OBJ_TYPE_PARENT, FSNOTIFY_OBJ_TYPE_VFSMOUNT, FSNOTIFY_OBJ_TYPE_SB, FSNOTIFY_OBJ_TYPE_COUNT, @@ -353,37 +368,37 @@ static inline bool fsnotify_valid_obj_type(unsigned int obj_type) }
struct fsnotify_iter_info { - struct fsnotify_mark *marks[FSNOTIFY_OBJ_TYPE_COUNT]; + struct fsnotify_mark *marks[FSNOTIFY_ITER_TYPE_COUNT]; unsigned int report_mask; int srcu_idx; };
static inline bool fsnotify_iter_should_report_type( - struct fsnotify_iter_info *iter_info, int type) + struct fsnotify_iter_info *iter_info, int iter_type) { - return (iter_info->report_mask & (1U << type)); + return (iter_info->report_mask & (1U << iter_type)); }
static inline void fsnotify_iter_set_report_type( - struct fsnotify_iter_info *iter_info, int type) + struct fsnotify_iter_info *iter_info, int iter_type) { - iter_info->report_mask |= (1U << type); + iter_info->report_mask |= (1U << iter_type); }
static inline void fsnotify_iter_set_report_type_mark( - struct fsnotify_iter_info *iter_info, int type, + struct fsnotify_iter_info *iter_info, int iter_type, struct fsnotify_mark *mark) { - iter_info->marks[type] = mark; - iter_info->report_mask |= (1U << type); + iter_info->marks[iter_type] = mark; + iter_info->report_mask |= (1U << iter_type); }
#define FSNOTIFY_ITER_FUNCS(name, NAME) \ static inline struct fsnotify_mark *fsnotify_iter_##name##_mark( \ struct fsnotify_iter_info *iter_info) \ { \ - return (iter_info->report_mask & (1U << FSNOTIFY_OBJ_TYPE_##NAME)) ? \ - iter_info->marks[FSNOTIFY_OBJ_TYPE_##NAME] : NULL; \ + return (iter_info->report_mask & (1U << FSNOTIFY_ITER_TYPE_##NAME)) ? \ + iter_info->marks[FSNOTIFY_ITER_TYPE_##NAME] : NULL; \ }
FSNOTIFY_ITER_FUNCS(inode, INODE) @@ -391,8 +406,8 @@ FSNOTIFY_ITER_FUNCS(parent, PARENT) FSNOTIFY_ITER_FUNCS(vfsmount, VFSMOUNT) FSNOTIFY_ITER_FUNCS(sb, SB)
-#define fsnotify_foreach_obj_type(type) \ - for (type = 0; type < FSNOTIFY_OBJ_TYPE_COUNT; type++) +#define fsnotify_foreach_iter_type(type) \ + for (type = 0; type < FSNOTIFY_ITER_TYPE_COUNT; type++)
/* * fsnotify_connp_t is what we embed in objects which connector can be attached
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit d61fd650e9d206a71fda789f02a1ced4b19944c4 ]
FAN_REPORT_FID is ambiguous in that it reports the fid of the child for some events and the fid of the parent for create/delete/move events.
The new FAN_REPORT_TARGET_FID flag is an implicit request to report the fid of the target object of the operation (a.k.a the child inode) also in create/delete/move events in addition to the fid of the parent and the name of the child.
To reduce the test matrix for uninteresting use cases, the new FAN_REPORT_TARGET_FID flag requires both FAN_REPORT_NAME and FAN_REPORT_FID. The convenience macro FAN_REPORT_DFID_NAME_TARGET combines FAN_REPORT_TARGET_FID with all the required flags.
Link: https://lore.kernel.org/r/20211129201537.1932819-4-amir73il@gmail.com Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.c | 48 ++++++++++++++++++++++-------- fs/notify/fanotify/fanotify_user.c | 11 ++++++- include/linux/fanotify.h | 2 +- include/uapi/linux/fanotify.h | 4 +++ 4 files changed, 51 insertions(+), 14 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 652fe84cb8acd..85e542b164c8c 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -458,17 +458,41 @@ static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode, }
/* - * The inode to use as identifier when reporting fid depends on the event. - * Report the modified directory inode on dirent modification events. - * Report the "victim" inode otherwise. + * FAN_REPORT_FID is ambiguous in that it reports the fid of the child for + * some events and the fid of the parent for create/delete/move events. + * + * With the FAN_REPORT_TARGET_FID flag, the fid of the child is reported + * also in create/delete/move events in addition to the fid of the parent + * and the name of the child. + */ +static inline bool fanotify_report_child_fid(unsigned int fid_mode, u32 mask) +{ + if (mask & ALL_FSNOTIFY_DIRENT_EVENTS) + return (fid_mode & FAN_REPORT_TARGET_FID); + + return (fid_mode & FAN_REPORT_FID) && !(mask & FAN_ONDIR); +} + +/* + * The inode to use as identifier when reporting fid depends on the event + * and the group flags. + * + * With the group flag FAN_REPORT_TARGET_FID, always report the child fid. + * + * Without the group flag FAN_REPORT_TARGET_FID, report the modified directory + * fid on dirent events and the child fid otherwise. + * * For example: - * FS_ATTRIB reports the child inode even if reported on a watched parent. - * FS_CREATE reports the modified dir inode and not the created inode. + * FS_ATTRIB reports the child fid even if reported on a watched parent. + * FS_CREATE reports the modified dir fid without FAN_REPORT_TARGET_FID. + * and reports the created child fid with FAN_REPORT_TARGET_FID. */ static struct inode *fanotify_fid_inode(u32 event_mask, const void *data, - int data_type, struct inode *dir) + int data_type, struct inode *dir, + unsigned int fid_mode) { - if (event_mask & ALL_FSNOTIFY_DIRENT_EVENTS) + if ((event_mask & ALL_FSNOTIFY_DIRENT_EVENTS) && + !(fid_mode & FAN_REPORT_TARGET_FID)) return dir;
return fsnotify_data_inode(data, data_type); @@ -647,10 +671,11 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, { struct fanotify_event *event = NULL; gfp_t gfp = GFP_KERNEL_ACCOUNT; - struct inode *id = fanotify_fid_inode(mask, data, data_type, dir); + unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS); + struct inode *id = fanotify_fid_inode(mask, data, data_type, dir, + fid_mode); struct inode *dirid = fanotify_dfid_inode(mask, data, data_type, dir); const struct path *path = fsnotify_data_path(data, data_type); - unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS); struct mem_cgroup *old_memcg; struct inode *child = NULL; bool name_event = false; @@ -660,11 +685,10 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
if ((fid_mode & FAN_REPORT_DIR_FID) && dirid) { /* - * With both flags FAN_REPORT_DIR_FID and FAN_REPORT_FID, we - * report the child fid for events reported on a non-dir child + * For certain events and group flags, report the child fid * in addition to reporting the parent fid and maybe child name. */ - if ((fid_mode & FAN_REPORT_FID) && id != dirid && !ondir) + if (fanotify_report_child_fid(fid_mode, mask) && id != dirid) child = id;
id = dirid; diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 2f78999a7aa3d..6b058d652f47b 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1270,6 +1270,15 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) if ((fid_mode & FAN_REPORT_NAME) && !(fid_mode & FAN_REPORT_DIR_FID)) return -EINVAL;
+ /* + * FAN_REPORT_TARGET_FID requires FAN_REPORT_NAME and FAN_REPORT_FID + * and is used as an indication to report both dir and child fid on all + * dirent events. + */ + if ((fid_mode & FAN_REPORT_TARGET_FID) && + (!(fid_mode & FAN_REPORT_NAME) || !(fid_mode & FAN_REPORT_FID))) + return -EINVAL; + f_flags = O_RDWR | FMODE_NONOTIFY; if (flags & FAN_CLOEXEC) f_flags |= O_CLOEXEC; @@ -1680,7 +1689,7 @@ static int __init fanotify_user_setup(void) FANOTIFY_DEFAULT_MAX_USER_MARKS);
BUILD_BUG_ON(FANOTIFY_INIT_FLAGS & FANOTIFY_INTERNAL_GROUP_FLAGS); - BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 11); + BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 12); BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 9);
fanotify_mark_cache = KMEM_CACHE(fsnotify_mark, diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h index 616af2ea20f30..376e050e6f384 100644 --- a/include/linux/fanotify.h +++ b/include/linux/fanotify.h @@ -25,7 +25,7 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */
#define FANOTIFY_CLASS_BITS (FAN_CLASS_NOTIF | FANOTIFY_PERM_CLASSES)
-#define FANOTIFY_FID_BITS (FAN_REPORT_FID | FAN_REPORT_DFID_NAME) +#define FANOTIFY_FID_BITS (FAN_REPORT_DFID_NAME_TARGET)
#define FANOTIFY_INFO_MODES (FANOTIFY_FID_BITS | FAN_REPORT_PIDFD)
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h index bd1932c2074d5..60f73639a896a 100644 --- a/include/uapi/linux/fanotify.h +++ b/include/uapi/linux/fanotify.h @@ -57,9 +57,13 @@ #define FAN_REPORT_FID 0x00000200 /* Report unique file id */ #define FAN_REPORT_DIR_FID 0x00000400 /* Report unique directory id */ #define FAN_REPORT_NAME 0x00000800 /* Report events with name */ +#define FAN_REPORT_TARGET_FID 0x00001000 /* Report dirent target id */
/* Convenience macro - FAN_REPORT_NAME requires FAN_REPORT_DIR_FID */ #define FAN_REPORT_DFID_NAME (FAN_REPORT_DIR_FID | FAN_REPORT_NAME) +/* Convenience macro - FAN_REPORT_TARGET_FID requires all other FID flags */ +#define FAN_REPORT_DFID_NAME_TARGET (FAN_REPORT_DFID_NAME | \ + FAN_REPORT_FID | FAN_REPORT_TARGET_FID)
/* Deprecated - do not use this in programs and do not add new flags here! */ #define FAN_ALL_INIT_FLAGS (FAN_CLOEXEC | FAN_NONBLOCK | \
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit e54183fa7047c15819bc155f4c58501d9a9a3489 ]
The dnotify FS_DN_RENAME event is used to request notification about a move within the same parent directory and was always coupled with the FS_MOVED_FROM event.
Rename the FS_DN_RENAME event flag to FS_RENAME, decouple it from FS_MOVED_FROM and report it with the moved dentry instead of the moved inode, so it has the information about both old and new parent and name.
Generate the FS_RENAME event regardless of same parent dir and apply the "same parent" rule in the generic fsnotify_handle_event() helper that is used to call backends with ->handle_inode_event() method (i.e. dnotify). The ->handle_inode_event() method is not rich enough to report both old and new parent and name anyway.
The enriched event is reported to fanotify over the ->handle_event() method with the old and new dir inode marks in marks array slots for ITER_TYPE_INODE and a new iter type slot ITER_TYPE_INODE2.
The enriched event will be used for reporting old and new parent+name to fanotify groups with FAN_RENAME events.
Link: https://lore.kernel.org/r/20211129201537.1932819-5-amir73il@gmail.com Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/dnotify/dnotify.c | 2 +- fs/notify/fsnotify.c | 37 +++++++++++++++++++++++++------- include/linux/dnotify.h | 2 +- include/linux/fsnotify.h | 9 +++++--- include/linux/fsnotify_backend.h | 7 +++--- 5 files changed, 41 insertions(+), 16 deletions(-)
diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c index e85e13c50d6d4..d5ebebb034ffe 100644 --- a/fs/notify/dnotify/dnotify.c +++ b/fs/notify/dnotify/dnotify.c @@ -196,7 +196,7 @@ static __u32 convert_arg(unsigned long arg) if (arg & DN_ATTRIB) new_mask |= FS_ATTRIB; if (arg & DN_RENAME) - new_mask |= FS_DN_RENAME; + new_mask |= FS_RENAME; if (arg & DN_CREATE) new_mask |= (FS_CREATE | FS_MOVED_TO);
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index 0c94457c625e2..ab81a0776ece5 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -279,6 +279,18 @@ static int fsnotify_handle_event(struct fsnotify_group *group, __u32 mask, WARN_ON_ONCE(fsnotify_iter_vfsmount_mark(iter_info))) return 0;
+ /* + * For FS_RENAME, 'dir' is old dir and 'data' is new dentry. + * The only ->handle_inode_event() backend that supports FS_RENAME is + * dnotify, where it means file was renamed within same parent. + */ + if (mask & FS_RENAME) { + struct dentry *moved = fsnotify_data_dentry(data, data_type); + + if (dir != moved->d_parent->d_inode) + return 0; + } + if (parent_mark) { /* * parent_mark indicates that the parent inode is watching @@ -469,7 +481,9 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir, struct super_block *sb = fsnotify_data_sb(data, data_type); struct fsnotify_iter_info iter_info = {}; struct mount *mnt = NULL; - struct inode *parent = NULL; + struct inode *inode2 = NULL; + struct dentry *moved; + int inode2_type; int ret = 0; __u32 test_mask, marks_mask;
@@ -479,12 +493,19 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir, if (!inode) { /* Dirent event - report on TYPE_INODE to dir */ inode = dir; + /* For FS_RENAME, inode is old_dir and inode2 is new_dir */ + if (mask & FS_RENAME) { + moved = fsnotify_data_dentry(data, data_type); + inode2 = moved->d_parent->d_inode; + inode2_type = FSNOTIFY_ITER_TYPE_INODE2; + } } else if (mask & FS_EVENT_ON_CHILD) { /* * Event on child - report on TYPE_PARENT to dir if it is * watching children and on TYPE_INODE to child. */ - parent = dir; + inode2 = dir; + inode2_type = FSNOTIFY_ITER_TYPE_PARENT; }
/* @@ -497,7 +518,7 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir, if (!sb->s_fsnotify_marks && (!mnt || !mnt->mnt_fsnotify_marks) && (!inode || !inode->i_fsnotify_marks) && - (!parent || !parent->i_fsnotify_marks)) + (!inode2 || !inode2->i_fsnotify_marks)) return 0;
marks_mask = sb->s_fsnotify_mask; @@ -505,8 +526,8 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir, marks_mask |= mnt->mnt_fsnotify_mask; if (inode) marks_mask |= inode->i_fsnotify_mask; - if (parent) - marks_mask |= parent->i_fsnotify_mask; + if (inode2) + marks_mask |= inode2->i_fsnotify_mask;
/* @@ -529,9 +550,9 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir, iter_info.marks[FSNOTIFY_ITER_TYPE_INODE] = fsnotify_first_mark(&inode->i_fsnotify_marks); } - if (parent) { - iter_info.marks[FSNOTIFY_ITER_TYPE_PARENT] = - fsnotify_first_mark(&parent->i_fsnotify_marks); + if (inode2) { + iter_info.marks[inode2_type] = + fsnotify_first_mark(&inode2->i_fsnotify_marks); }
/* diff --git a/include/linux/dnotify.h b/include/linux/dnotify.h index 0aad774beaec4..b87c3b85a166c 100644 --- a/include/linux/dnotify.h +++ b/include/linux/dnotify.h @@ -26,7 +26,7 @@ struct dnotify_struct { FS_MODIFY | FS_MODIFY_CHILD |\ FS_ACCESS | FS_ACCESS_CHILD |\ FS_ATTRIB | FS_ATTRIB_CHILD |\ - FS_CREATE | FS_DN_RENAME |\ + FS_CREATE | FS_RENAME |\ FS_MOVED_FROM | FS_MOVED_TO)
extern int dir_notify_enable; diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h index bec1e23ecf787..bb8467cd11ae2 100644 --- a/include/linux/fsnotify.h +++ b/include/linux/fsnotify.h @@ -144,16 +144,19 @@ static inline void fsnotify_move(struct inode *old_dir, struct inode *new_dir, u32 fs_cookie = fsnotify_get_cookie(); __u32 old_dir_mask = FS_MOVED_FROM; __u32 new_dir_mask = FS_MOVED_TO; + __u32 rename_mask = FS_RENAME; const struct qstr *new_name = &moved->d_name;
- if (old_dir == new_dir) - old_dir_mask |= FS_DN_RENAME; - if (isdir) { old_dir_mask |= FS_ISDIR; new_dir_mask |= FS_ISDIR; + rename_mask |= FS_ISDIR; }
+ /* Event with information about both old and new parent+name */ + fsnotify_name(rename_mask, moved, FSNOTIFY_EVENT_DENTRY, + old_dir, old_name, 0); + fsnotify_name(old_dir_mask, source, FSNOTIFY_EVENT_INODE, old_dir, old_name, fs_cookie); fsnotify_name(new_dir_mask, source, FSNOTIFY_EVENT_INODE, diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 73739fee1710f..790c31844db5d 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -63,7 +63,7 @@ */ #define FS_EVENT_ON_CHILD 0x08000000
-#define FS_DN_RENAME 0x10000000 /* file renamed */ +#define FS_RENAME 0x10000000 /* File was renamed */ #define FS_DN_MULTISHOT 0x20000000 /* dnotify multishot */ #define FS_ISDIR 0x40000000 /* event occurred against dir */ #define FS_IN_ONESHOT 0x80000000 /* only send event once */ @@ -76,7 +76,7 @@ * The watching parent may get an FS_ATTRIB|FS_EVENT_ON_CHILD event * when a directory entry inside a child subdir changes. */ -#define ALL_FSNOTIFY_DIRENT_EVENTS (FS_CREATE | FS_DELETE | FS_MOVE) +#define ALL_FSNOTIFY_DIRENT_EVENTS (FS_CREATE | FS_DELETE | FS_MOVE | FS_RENAME)
#define ALL_FSNOTIFY_PERM_EVENTS (FS_OPEN_PERM | FS_ACCESS_PERM | \ FS_OPEN_EXEC_PERM) @@ -101,7 +101,7 @@ /* Events that can be reported to backends */ #define ALL_FSNOTIFY_EVENTS (ALL_FSNOTIFY_DIRENT_EVENTS | \ FS_EVENTS_POSS_ON_CHILD | \ - FS_DELETE_SELF | FS_MOVE_SELF | FS_DN_RENAME | \ + FS_DELETE_SELF | FS_MOVE_SELF | \ FS_UNMOUNT | FS_Q_OVERFLOW | FS_IN_IGNORED | \ FS_ERROR)
@@ -349,6 +349,7 @@ enum fsnotify_iter_type { FSNOTIFY_ITER_TYPE_VFSMOUNT, FSNOTIFY_ITER_TYPE_SB, FSNOTIFY_ITER_TYPE_PARENT, + FSNOTIFY_ITER_TYPE_INODE2, FSNOTIFY_ITER_TYPE_COUNT };
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit 2d9374f095136206a02eb0b6cd9ef94632c1e9f7 ]
The fanotify_info buffer contains up to two file handles and a name. Use macros to simplify the code that access the different items within the buffer.
Add assertions to verify that stored fh len and name len do not overflow the u8 stored value in fanotify_info header.
Remove the unused fanotify_info_len() helper.
Link: https://lore.kernel.org/r/20211129201537.1932819-6-amir73il@gmail.com Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.c | 2 +- fs/notify/fanotify/fanotify.h | 41 +++++++++++++++++++++++++---------- 2 files changed, 31 insertions(+), 12 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 85e542b164c8c..ffad224be0149 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -411,7 +411,7 @@ static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode, * be zero in that case if encoding fh len failed. */ err = -ENOENT; - if (fh_len < 4 || WARN_ON_ONCE(fh_len % 4)) + if (fh_len < 4 || WARN_ON_ONCE(fh_len % 4) || fh_len > MAX_HANDLE_SZ) goto out_err;
/* No external buffer in a variable size allocated fh */ diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index d25f500bf7e79..dd23ba659e76b 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -49,6 +49,22 @@ struct fanotify_info { * (optional) file_fh starts at buf[dir_fh_totlen] * name starts at buf[dir_fh_totlen + file_fh_totlen] */ +#define FANOTIFY_DIR_FH_SIZE(info) ((info)->dir_fh_totlen) +#define FANOTIFY_FILE_FH_SIZE(info) ((info)->file_fh_totlen) +#define FANOTIFY_NAME_SIZE(info) ((info)->name_len + 1) + +#define FANOTIFY_DIR_FH_OFFSET(info) 0 +#define FANOTIFY_FILE_FH_OFFSET(info) \ + (FANOTIFY_DIR_FH_OFFSET(info) + FANOTIFY_DIR_FH_SIZE(info)) +#define FANOTIFY_NAME_OFFSET(info) \ + (FANOTIFY_FILE_FH_OFFSET(info) + FANOTIFY_FILE_FH_SIZE(info)) + +#define FANOTIFY_DIR_FH_BUF(info) \ + ((info)->buf + FANOTIFY_DIR_FH_OFFSET(info)) +#define FANOTIFY_FILE_FH_BUF(info) \ + ((info)->buf + FANOTIFY_FILE_FH_OFFSET(info)) +#define FANOTIFY_NAME_BUF(info) \ + ((info)->buf + FANOTIFY_NAME_OFFSET(info)) } __aligned(4);
static inline bool fanotify_fh_has_ext_buf(struct fanotify_fh *fh) @@ -87,7 +103,7 @@ static inline struct fanotify_fh *fanotify_info_dir_fh(struct fanotify_info *inf { BUILD_BUG_ON(offsetof(struct fanotify_info, buf) % 4);
- return (struct fanotify_fh *)info->buf; + return (struct fanotify_fh *)FANOTIFY_DIR_FH_BUF(info); }
static inline int fanotify_info_file_fh_len(struct fanotify_info *info) @@ -101,32 +117,35 @@ static inline int fanotify_info_file_fh_len(struct fanotify_info *info)
static inline struct fanotify_fh *fanotify_info_file_fh(struct fanotify_info *info) { - return (struct fanotify_fh *)(info->buf + info->dir_fh_totlen); + return (struct fanotify_fh *)FANOTIFY_FILE_FH_BUF(info); }
-static inline const char *fanotify_info_name(struct fanotify_info *info) +static inline char *fanotify_info_name(struct fanotify_info *info) { - return info->buf + info->dir_fh_totlen + info->file_fh_totlen; + if (!info->name_len) + return NULL; + + return FANOTIFY_NAME_BUF(info); }
static inline void fanotify_info_init(struct fanotify_info *info) { + BUILD_BUG_ON(FANOTIFY_FH_HDR_LEN + MAX_HANDLE_SZ > U8_MAX); + BUILD_BUG_ON(NAME_MAX > U8_MAX); + info->dir_fh_totlen = 0; info->file_fh_totlen = 0; info->name_len = 0; }
-static inline unsigned int fanotify_info_len(struct fanotify_info *info) -{ - return info->dir_fh_totlen + info->file_fh_totlen + info->name_len; -} - static inline void fanotify_info_copy_name(struct fanotify_info *info, const struct qstr *name) { + if (WARN_ON_ONCE(name->len > NAME_MAX)) + return; + info->name_len = name->len; - strcpy(info->buf + info->dir_fh_totlen + info->file_fh_totlen, - name->name); + strcpy(fanotify_info_name(info), name->name); }
/*
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit 1a9515ac9e55e68d733bab81bd408463ab1e25b1 ]
fanotify_info buffer is parceled into variable sized records, so the records must be written in order: dir_fh, file_fh, name.
Use helpers to assert that order and make fanotify_alloc_name_event() a bit more generic to allow empty dir_fh record and to allow expanding to more records (i.e. name2) soon.
Link: https://lore.kernel.org/r/20211129201537.1932819-7-amir73il@gmail.com Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.c | 35 +++++++++++++++++++---------------- fs/notify/fanotify/fanotify.h | 20 ++++++++++++++++++++ 2 files changed, 39 insertions(+), 16 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index ffad224be0149..2b13c79cebc62 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -576,7 +576,7 @@ static struct fanotify_event *fanotify_alloc_fid_event(struct inode *id, return &ffe->fae; }
-static struct fanotify_event *fanotify_alloc_name_event(struct inode *id, +static struct fanotify_event *fanotify_alloc_name_event(struct inode *dir, __kernel_fsid_t *fsid, const struct qstr *name, struct inode *child, @@ -586,15 +586,17 @@ static struct fanotify_event *fanotify_alloc_name_event(struct inode *id, struct fanotify_name_event *fne; struct fanotify_info *info; struct fanotify_fh *dfh, *ffh; - unsigned int dir_fh_len = fanotify_encode_fh_len(id); + unsigned int dir_fh_len = fanotify_encode_fh_len(dir); unsigned int child_fh_len = fanotify_encode_fh_len(child); - unsigned int size; + unsigned long name_len = name ? name->len : 0; + unsigned int len, size;
- size = sizeof(*fne) + FANOTIFY_FH_HDR_LEN + dir_fh_len; + /* Reserve terminating null byte even for empty name */ + size = sizeof(*fne) + name_len + 1; + if (dir_fh_len) + size += FANOTIFY_FH_HDR_LEN + dir_fh_len; if (child_fh_len) size += FANOTIFY_FH_HDR_LEN + child_fh_len; - if (name) - size += name->len + 1; fne = kmalloc(size, gfp); if (!fne) return NULL; @@ -604,22 +606,23 @@ static struct fanotify_event *fanotify_alloc_name_event(struct inode *id, *hash ^= fanotify_hash_fsid(fsid); info = &fne->info; fanotify_info_init(info); - dfh = fanotify_info_dir_fh(info); - info->dir_fh_totlen = fanotify_encode_fh(dfh, id, dir_fh_len, hash, 0); + if (dir_fh_len) { + dfh = fanotify_info_dir_fh(info); + len = fanotify_encode_fh(dfh, dir, dir_fh_len, hash, 0); + fanotify_info_set_dir_fh(info, len); + } if (child_fh_len) { ffh = fanotify_info_file_fh(info); - info->file_fh_totlen = fanotify_encode_fh(ffh, child, - child_fh_len, hash, 0); + len = fanotify_encode_fh(ffh, child, child_fh_len, hash, 0); + fanotify_info_set_file_fh(info, len); } - if (name) { - long salt = name->len; - + if (name_len) { fanotify_info_copy_name(info, name); - *hash ^= full_name_hash((void *)salt, name->name, name->len); + *hash ^= full_name_hash((void *)name_len, name->name, name_len); }
- pr_debug("%s: ino=%lu size=%u dir_fh_len=%u child_fh_len=%u name_len=%u name='%.*s'\n", - __func__, id->i_ino, size, dir_fh_len, child_fh_len, + pr_debug("%s: size=%u dir_fh_len=%u child_fh_len=%u name_len=%u name='%.*s'\n", + __func__, size, dir_fh_len, child_fh_len, info->name_len, info->name_len, fanotify_info_name(info));
return &fne->fae; diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index dd23ba659e76b..7ac6f9f1e4148 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -138,6 +138,26 @@ static inline void fanotify_info_init(struct fanotify_info *info) info->name_len = 0; }
+/* These set/copy helpers MUST be called by order */ +static inline void fanotify_info_set_dir_fh(struct fanotify_info *info, + unsigned int totlen) +{ + if (WARN_ON_ONCE(info->file_fh_totlen > 0) || + WARN_ON_ONCE(info->name_len > 0)) + return; + + info->dir_fh_totlen = totlen; +} + +static inline void fanotify_info_set_file_fh(struct fanotify_info *info, + unsigned int totlen) +{ + if (WARN_ON_ONCE(info->name_len > 0)) + return; + + info->file_fh_totlen = totlen; +} + static inline void fanotify_info_copy_name(struct fanotify_info *info, const struct qstr *name) {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit 3cf984e950c1c3f41d407ed31db33beb996be132 ]
Allow storing a secondary dir fh and name tupple in fanotify_info. This will be used to store the new parent and name information in FAN_RENAME event.
Link: https://lore.kernel.org/r/20211129201537.1932819-8-amir73il@gmail.com Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.c | 20 ++++++-- fs/notify/fanotify/fanotify.h | 79 +++++++++++++++++++++++++++--- fs/notify/fanotify/fanotify_user.c | 3 +- 3 files changed, 88 insertions(+), 14 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 2b13c79cebc62..5f184b2d6ea7c 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -76,8 +76,10 @@ static bool fanotify_info_equal(struct fanotify_info *info1, struct fanotify_info *info2) { if (info1->dir_fh_totlen != info2->dir_fh_totlen || + info1->dir2_fh_totlen != info2->dir2_fh_totlen || info1->file_fh_totlen != info2->file_fh_totlen || - info1->name_len != info2->name_len) + info1->name_len != info2->name_len || + info1->name2_len != info2->name2_len) return false;
if (info1->dir_fh_totlen && @@ -85,14 +87,24 @@ static bool fanotify_info_equal(struct fanotify_info *info1, fanotify_info_dir_fh(info2))) return false;
+ if (info1->dir2_fh_totlen && + !fanotify_fh_equal(fanotify_info_dir2_fh(info1), + fanotify_info_dir2_fh(info2))) + return false; + if (info1->file_fh_totlen && !fanotify_fh_equal(fanotify_info_file_fh(info1), fanotify_info_file_fh(info2))) return false;
- return !info1->name_len || - !memcmp(fanotify_info_name(info1), fanotify_info_name(info2), - info1->name_len); + if (info1->name_len && + memcmp(fanotify_info_name(info1), fanotify_info_name(info2), + info1->name_len)) + return false; + + return !info1->name2_len || + !memcmp(fanotify_info_name2(info1), fanotify_info_name2(info2), + info1->name2_len); }
static bool fanotify_name_event_equal(struct fanotify_name_event *fne1, diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index 7ac6f9f1e4148..8fa3bc0effd45 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -40,31 +40,45 @@ struct fanotify_fh { struct fanotify_info { /* size of dir_fh/file_fh including fanotify_fh hdr size */ u8 dir_fh_totlen; + u8 dir2_fh_totlen; u8 file_fh_totlen; u8 name_len; - u8 pad; + u8 name2_len; + u8 pad[3]; unsigned char buf[]; /* * (struct fanotify_fh) dir_fh starts at buf[0] - * (optional) file_fh starts at buf[dir_fh_totlen] - * name starts at buf[dir_fh_totlen + file_fh_totlen] + * (optional) dir2_fh starts at buf[dir_fh_totlen] + * (optional) file_fh starts at buf[dir_fh_totlen + dir2_fh_totlen] + * name starts at buf[dir_fh_totlen + dir2_fh_totlen + file_fh_totlen] + * ... */ #define FANOTIFY_DIR_FH_SIZE(info) ((info)->dir_fh_totlen) +#define FANOTIFY_DIR2_FH_SIZE(info) ((info)->dir2_fh_totlen) #define FANOTIFY_FILE_FH_SIZE(info) ((info)->file_fh_totlen) #define FANOTIFY_NAME_SIZE(info) ((info)->name_len + 1) +#define FANOTIFY_NAME2_SIZE(info) ((info)->name2_len + 1)
#define FANOTIFY_DIR_FH_OFFSET(info) 0 -#define FANOTIFY_FILE_FH_OFFSET(info) \ +#define FANOTIFY_DIR2_FH_OFFSET(info) \ (FANOTIFY_DIR_FH_OFFSET(info) + FANOTIFY_DIR_FH_SIZE(info)) +#define FANOTIFY_FILE_FH_OFFSET(info) \ + (FANOTIFY_DIR2_FH_OFFSET(info) + FANOTIFY_DIR2_FH_SIZE(info)) #define FANOTIFY_NAME_OFFSET(info) \ (FANOTIFY_FILE_FH_OFFSET(info) + FANOTIFY_FILE_FH_SIZE(info)) +#define FANOTIFY_NAME2_OFFSET(info) \ + (FANOTIFY_NAME_OFFSET(info) + FANOTIFY_NAME_SIZE(info))
#define FANOTIFY_DIR_FH_BUF(info) \ ((info)->buf + FANOTIFY_DIR_FH_OFFSET(info)) +#define FANOTIFY_DIR2_FH_BUF(info) \ + ((info)->buf + FANOTIFY_DIR2_FH_OFFSET(info)) #define FANOTIFY_FILE_FH_BUF(info) \ ((info)->buf + FANOTIFY_FILE_FH_OFFSET(info)) #define FANOTIFY_NAME_BUF(info) \ ((info)->buf + FANOTIFY_NAME_OFFSET(info)) +#define FANOTIFY_NAME2_BUF(info) \ + ((info)->buf + FANOTIFY_NAME2_OFFSET(info)) } __aligned(4);
static inline bool fanotify_fh_has_ext_buf(struct fanotify_fh *fh) @@ -106,6 +120,20 @@ static inline struct fanotify_fh *fanotify_info_dir_fh(struct fanotify_info *inf return (struct fanotify_fh *)FANOTIFY_DIR_FH_BUF(info); }
+static inline int fanotify_info_dir2_fh_len(struct fanotify_info *info) +{ + if (!info->dir2_fh_totlen || + WARN_ON_ONCE(info->dir2_fh_totlen < FANOTIFY_FH_HDR_LEN)) + return 0; + + return info->dir2_fh_totlen - FANOTIFY_FH_HDR_LEN; +} + +static inline struct fanotify_fh *fanotify_info_dir2_fh(struct fanotify_info *info) +{ + return (struct fanotify_fh *)FANOTIFY_DIR2_FH_BUF(info); +} + static inline int fanotify_info_file_fh_len(struct fanotify_info *info) { if (!info->file_fh_totlen || @@ -128,31 +156,55 @@ static inline char *fanotify_info_name(struct fanotify_info *info) return FANOTIFY_NAME_BUF(info); }
+static inline char *fanotify_info_name2(struct fanotify_info *info) +{ + if (!info->name2_len) + return NULL; + + return FANOTIFY_NAME2_BUF(info); +} + static inline void fanotify_info_init(struct fanotify_info *info) { BUILD_BUG_ON(FANOTIFY_FH_HDR_LEN + MAX_HANDLE_SZ > U8_MAX); BUILD_BUG_ON(NAME_MAX > U8_MAX);
info->dir_fh_totlen = 0; + info->dir2_fh_totlen = 0; info->file_fh_totlen = 0; info->name_len = 0; + info->name2_len = 0; }
/* These set/copy helpers MUST be called by order */ static inline void fanotify_info_set_dir_fh(struct fanotify_info *info, unsigned int totlen) { - if (WARN_ON_ONCE(info->file_fh_totlen > 0) || - WARN_ON_ONCE(info->name_len > 0)) + if (WARN_ON_ONCE(info->dir2_fh_totlen > 0) || + WARN_ON_ONCE(info->file_fh_totlen > 0) || + WARN_ON_ONCE(info->name_len > 0) || + WARN_ON_ONCE(info->name2_len > 0)) return;
info->dir_fh_totlen = totlen; }
+static inline void fanotify_info_set_dir2_fh(struct fanotify_info *info, + unsigned int totlen) +{ + if (WARN_ON_ONCE(info->file_fh_totlen > 0) || + WARN_ON_ONCE(info->name_len > 0) || + WARN_ON_ONCE(info->name2_len > 0)) + return; + + info->dir2_fh_totlen = totlen; +} + static inline void fanotify_info_set_file_fh(struct fanotify_info *info, unsigned int totlen) { - if (WARN_ON_ONCE(info->name_len > 0)) + if (WARN_ON_ONCE(info->name_len > 0) || + WARN_ON_ONCE(info->name2_len > 0)) return;
info->file_fh_totlen = totlen; @@ -161,13 +213,24 @@ static inline void fanotify_info_set_file_fh(struct fanotify_info *info, static inline void fanotify_info_copy_name(struct fanotify_info *info, const struct qstr *name) { - if (WARN_ON_ONCE(name->len > NAME_MAX)) + if (WARN_ON_ONCE(name->len > NAME_MAX) || + WARN_ON_ONCE(info->name2_len > 0)) return;
info->name_len = name->len; strcpy(fanotify_info_name(info), name->name); }
+static inline void fanotify_info_copy_name2(struct fanotify_info *info, + const struct qstr *name) +{ + if (WARN_ON_ONCE(name->len > NAME_MAX)) + return; + + info->name2_len = name->len; + strcpy(fanotify_info_name2(info), name->name); +} + /* * Common structure for fanotify events. Concrete structs are allocated in * fanotify_handle_event() and freed when the information is retrieved by diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 6b058d652f47b..d69570db5efd2 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -327,11 +327,10 @@ static int process_access_response(struct fsnotify_group *group, static size_t copy_error_info_to_user(struct fanotify_event *event, char __user *buf, int count) { - struct fanotify_event_info_error info; + struct fanotify_event_info_error info = { }; struct fanotify_error_event *fee = FANOTIFY_EE(event);
info.hdr.info_type = FAN_EVENT_INFO_TYPE_ERROR; - info.hdr.pad = 0; info.hdr.len = FANOTIFY_ERROR_INFO_LEN;
if (WARN_ON(count < info.hdr.len))
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit 3982534ba5ce45e890b2f5ef5e7372c1accd14c7 ]
In the special case of FAN_RENAME event, we record both the old and new parent and name.
Link: https://lore.kernel.org/r/20211129201537.1932819-9-amir73il@gmail.com Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.c | 42 +++++++++++++++++++++++++++++++---- include/uapi/linux/fanotify.h | 2 ++ 2 files changed, 40 insertions(+), 4 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 5f184b2d6ea7c..db81eab905442 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -592,21 +592,28 @@ static struct fanotify_event *fanotify_alloc_name_event(struct inode *dir, __kernel_fsid_t *fsid, const struct qstr *name, struct inode *child, + struct dentry *moved, unsigned int *hash, gfp_t gfp) { struct fanotify_name_event *fne; struct fanotify_info *info; struct fanotify_fh *dfh, *ffh; + struct inode *dir2 = moved ? d_inode(moved->d_parent) : NULL; + const struct qstr *name2 = moved ? &moved->d_name : NULL; unsigned int dir_fh_len = fanotify_encode_fh_len(dir); + unsigned int dir2_fh_len = fanotify_encode_fh_len(dir2); unsigned int child_fh_len = fanotify_encode_fh_len(child); unsigned long name_len = name ? name->len : 0; + unsigned long name2_len = name2 ? name2->len : 0; unsigned int len, size;
/* Reserve terminating null byte even for empty name */ - size = sizeof(*fne) + name_len + 1; + size = sizeof(*fne) + name_len + name2_len + 2; if (dir_fh_len) size += FANOTIFY_FH_HDR_LEN + dir_fh_len; + if (dir2_fh_len) + size += FANOTIFY_FH_HDR_LEN + dir2_fh_len; if (child_fh_len) size += FANOTIFY_FH_HDR_LEN + child_fh_len; fne = kmalloc(size, gfp); @@ -623,6 +630,11 @@ static struct fanotify_event *fanotify_alloc_name_event(struct inode *dir, len = fanotify_encode_fh(dfh, dir, dir_fh_len, hash, 0); fanotify_info_set_dir_fh(info, len); } + if (dir2_fh_len) { + dfh = fanotify_info_dir2_fh(info); + len = fanotify_encode_fh(dfh, dir2, dir2_fh_len, hash, 0); + fanotify_info_set_dir2_fh(info, len); + } if (child_fh_len) { ffh = fanotify_info_file_fh(info); len = fanotify_encode_fh(ffh, child, child_fh_len, hash, 0); @@ -632,11 +644,22 @@ static struct fanotify_event *fanotify_alloc_name_event(struct inode *dir, fanotify_info_copy_name(info, name); *hash ^= full_name_hash((void *)name_len, name->name, name_len); } + if (name2_len) { + fanotify_info_copy_name2(info, name2); + *hash ^= full_name_hash((void *)name2_len, name2->name, + name2_len); + }
pr_debug("%s: size=%u dir_fh_len=%u child_fh_len=%u name_len=%u name='%.*s'\n", __func__, size, dir_fh_len, child_fh_len, info->name_len, info->name_len, fanotify_info_name(info));
+ if (dir2_fh_len) { + pr_debug("%s: dir2_fh_len=%u name2_len=%u name2='%.*s'\n", + __func__, dir2_fh_len, info->name2_len, + info->name2_len, fanotify_info_name2(info)); + } + return &fne->fae; }
@@ -692,6 +715,7 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, struct inode *dirid = fanotify_dfid_inode(mask, data, data_type, dir); const struct path *path = fsnotify_data_path(data, data_type); struct mem_cgroup *old_memcg; + struct dentry *moved = NULL; struct inode *child = NULL; bool name_event = false; unsigned int hash = 0; @@ -727,6 +751,15 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, } else if ((mask & ALL_FSNOTIFY_DIRENT_EVENTS) || !ondir) { name_event = true; } + + /* + * In the special case of FAN_RENAME event, we record both + * old and new parent+name. + * 'dirid' and 'file_name' are the old parent+name and + * 'moved' has the new parent+name. + */ + if (mask & FAN_RENAME) + moved = fsnotify_data_dentry(data, data_type); }
/* @@ -748,9 +781,9 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, } else if (fanotify_is_error_event(mask)) { event = fanotify_alloc_error_event(group, fsid, data, data_type, &hash); - } else if (name_event && (file_name || child)) { - event = fanotify_alloc_name_event(id, fsid, file_name, child, - &hash, gfp); + } else if (name_event && (file_name || moved || child)) { + event = fanotify_alloc_name_event(dirid, fsid, file_name, child, + moved, &hash, gfp); } else if (fid_mode) { event = fanotify_alloc_fid_event(id, fsid, &hash, gfp); } else { @@ -860,6 +893,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask, BUILD_BUG_ON(FAN_OPEN_EXEC != FS_OPEN_EXEC); BUILD_BUG_ON(FAN_OPEN_EXEC_PERM != FS_OPEN_EXEC_PERM); BUILD_BUG_ON(FAN_FS_ERROR != FS_ERROR); + BUILD_BUG_ON(FAN_RENAME != FS_RENAME);
BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 20);
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h index 60f73639a896a..9d0e2dc5767b5 100644 --- a/include/uapi/linux/fanotify.h +++ b/include/uapi/linux/fanotify.h @@ -28,6 +28,8 @@
#define FAN_EVENT_ON_CHILD 0x08000000 /* Interested in child events */
+#define FAN_RENAME 0x10000000 /* File was renamed */ + #define FAN_ONDIR 0x40000000 /* Event occurred against dir */
/* helper events */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit 2bfbcccde6e7a787feabad4645f628f963fe0663 ]
We do not want to report the dirfid+name of a directory whose inode/sb are not watched, because watcher may not have permissions to see the directory content.
Use an internal iter_info to indicate to fanotify_alloc_event() which marks of this group are watching FAN_RENAME, so it can decide if we need to record only the old parent+name, new parent+name or both.
Link: https://lore.kernel.org/r/20211129201537.1932819-10-amir73il@gmail.com Signed-off-by: Amir Goldstein amir73il@gmail.com [JK: Modified code to pass around only mask of mark types matching generated event] Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.c | 59 ++++++++++++++++++++++++++--------- 1 file changed, 44 insertions(+), 15 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index db81eab905442..14bc0f12cc9f3 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -284,8 +284,9 @@ static int fanotify_get_response(struct fsnotify_group *group, */ static u32 fanotify_group_event_mask(struct fsnotify_group *group, struct fsnotify_iter_info *iter_info, - u32 event_mask, const void *data, - int data_type, struct inode *dir) + u32 *match_mask, u32 event_mask, + const void *data, int data_type, + struct inode *dir) { __u32 marks_mask = 0, marks_ignored_mask = 0; __u32 test_mask, user_mask = FANOTIFY_OUTGOING_EVENTS | @@ -335,6 +336,9 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group, continue;
marks_mask |= mark->mask; + + /* Record the mark types of this group that matched the event */ + *match_mask |= 1U << type; }
test_mask = event_mask & marks_mask & ~marks_ignored_mask; @@ -701,11 +705,11 @@ static struct fanotify_event *fanotify_alloc_error_event( return &fee->fae; }
-static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, - u32 mask, const void *data, - int data_type, struct inode *dir, - const struct qstr *file_name, - __kernel_fsid_t *fsid) +static struct fanotify_event *fanotify_alloc_event( + struct fsnotify_group *group, + u32 mask, const void *data, int data_type, + struct inode *dir, const struct qstr *file_name, + __kernel_fsid_t *fsid, u32 match_mask) { struct fanotify_event *event = NULL; gfp_t gfp = GFP_KERNEL_ACCOUNT; @@ -753,13 +757,36 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, }
/* - * In the special case of FAN_RENAME event, we record both - * old and new parent+name. + * In the special case of FAN_RENAME event, use the match_mask + * to determine if we need to report only the old parent+name, + * only the new parent+name or both. * 'dirid' and 'file_name' are the old parent+name and * 'moved' has the new parent+name. */ - if (mask & FAN_RENAME) - moved = fsnotify_data_dentry(data, data_type); + if (mask & FAN_RENAME) { + bool report_old, report_new; + + if (WARN_ON_ONCE(!match_mask)) + return NULL; + + /* Report both old and new parent+name if sb watching */ + report_old = report_new = + match_mask & (1U << FSNOTIFY_ITER_TYPE_SB); + report_old |= + match_mask & (1U << FSNOTIFY_ITER_TYPE_INODE); + report_new |= + match_mask & (1U << FSNOTIFY_ITER_TYPE_INODE2); + + if (!report_old) { + /* Do not report old parent+name */ + dirid = NULL; + file_name = NULL; + } + if (report_new) { + /* Report new parent+name */ + moved = fsnotify_data_dentry(data, data_type); + } + } }
/* @@ -872,6 +899,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask, struct fanotify_event *event; struct fsnotify_event *fsn_event; __kernel_fsid_t fsid = {}; + u32 match_mask = 0;
BUILD_BUG_ON(FAN_ACCESS != FS_ACCESS); BUILD_BUG_ON(FAN_MODIFY != FS_MODIFY); @@ -897,12 +925,13 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 20);
- mask = fanotify_group_event_mask(group, iter_info, mask, data, - data_type, dir); + mask = fanotify_group_event_mask(group, iter_info, &match_mask, + mask, data, data_type, dir); if (!mask) return 0;
- pr_debug("%s: group=%p mask=%x\n", __func__, group, mask); + pr_debug("%s: group=%p mask=%x report_mask=%x\n", __func__, + group, mask, match_mask);
if (fanotify_is_perm_event(mask)) { /* @@ -921,7 +950,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask, }
event = fanotify_alloc_event(group, mask, data, data_type, dir, - file_name, &fsid); + file_name, &fsid, match_mask); ret = -ENOMEM; if (unlikely(!event)) { /*
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit 7326e382c21e9c23c89c88369afdc90b82a14da8 ]
In the special case of FAN_RENAME event, we report old or new or both old and new parent+name.
A single info record will be reported if either the old or new dir is watched and two records will be reported if both old and new dir (or their filesystem) are watched.
The old and new parent+name are reported using new info record types FAN_EVENT_INFO_TYPE_{OLD,NEW}_DFID_NAME, so if a single info record is reported, it is clear to the application, to which dir entry the fid+name info is referring to.
Link: https://lore.kernel.org/r/20211129201537.1932819-11-amir73il@gmail.com Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.c | 7 ++++ fs/notify/fanotify/fanotify.h | 18 +++++++++++ fs/notify/fanotify/fanotify_user.c | 52 +++++++++++++++++++++++++++--- include/uapi/linux/fanotify.h | 6 ++++ 4 files changed, 78 insertions(+), 5 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 14bc0f12cc9f3..0da305b6f3e2f 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -153,6 +153,13 @@ static bool fanotify_should_merge(struct fanotify_event *old, if ((old->mask & FS_ISDIR) != (new->mask & FS_ISDIR)) return false;
+ /* + * FAN_RENAME event is reported with special info record types, + * so we cannot merge it with other events. + */ + if ((old->mask & FAN_RENAME) != (new->mask & FAN_RENAME)) + return false; + switch (old->type) { case FANOTIFY_EVENT_TYPE_PATH: return fanotify_path_equal(fanotify_event_path(old), diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index 8fa3bc0effd45..a3d5b751cac5b 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -373,6 +373,13 @@ static inline int fanotify_event_dir_fh_len(struct fanotify_event *event) return info ? fanotify_info_dir_fh_len(info) : 0; }
+static inline int fanotify_event_dir2_fh_len(struct fanotify_event *event) +{ + struct fanotify_info *info = fanotify_event_info(event); + + return info ? fanotify_info_dir2_fh_len(info) : 0; +} + static inline bool fanotify_event_has_object_fh(struct fanotify_event *event) { /* For error events, even zeroed fh are reported. */ @@ -386,6 +393,17 @@ static inline bool fanotify_event_has_dir_fh(struct fanotify_event *event) return fanotify_event_dir_fh_len(event) > 0; }
+static inline bool fanotify_event_has_dir2_fh(struct fanotify_event *event) +{ + return fanotify_event_dir2_fh_len(event) > 0; +} + +static inline bool fanotify_event_has_any_dir_fh(struct fanotify_event *event) +{ + return fanotify_event_has_dir_fh(event) || + fanotify_event_has_dir2_fh(event); +} + struct fanotify_path_event { struct fanotify_event fae; struct path path; diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index d69570db5efd2..e16b18fdf1a65 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -124,12 +124,29 @@ static int fanotify_fid_info_len(int fh_len, int name_len) FANOTIFY_EVENT_ALIGN); }
+/* FAN_RENAME may have one or two dir+name info records */ +static int fanotify_dir_name_info_len(struct fanotify_event *event) +{ + struct fanotify_info *info = fanotify_event_info(event); + int dir_fh_len = fanotify_event_dir_fh_len(event); + int dir2_fh_len = fanotify_event_dir2_fh_len(event); + int info_len = 0; + + if (dir_fh_len) + info_len += fanotify_fid_info_len(dir_fh_len, + info->name_len); + if (dir2_fh_len) + info_len += fanotify_fid_info_len(dir2_fh_len, + info->name2_len); + + return info_len; +} + static size_t fanotify_event_len(unsigned int info_mode, struct fanotify_event *event) { size_t event_len = FAN_EVENT_METADATA_LEN; struct fanotify_info *info; - int dir_fh_len; int fh_len; int dot_len = 0;
@@ -141,9 +158,8 @@ static size_t fanotify_event_len(unsigned int info_mode,
info = fanotify_event_info(event);
- if (fanotify_event_has_dir_fh(event)) { - dir_fh_len = fanotify_event_dir_fh_len(event); - event_len += fanotify_fid_info_len(dir_fh_len, info->name_len); + if (fanotify_event_has_any_dir_fh(event)) { + event_len += fanotify_dir_name_info_len(event); } else if ((info_mode & FAN_REPORT_NAME) && (event->mask & FAN_ONDIR)) { /* @@ -374,6 +390,8 @@ static int copy_fid_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh, return -EFAULT; break; case FAN_EVENT_INFO_TYPE_DFID_NAME: + case FAN_EVENT_INFO_TYPE_OLD_DFID_NAME: + case FAN_EVENT_INFO_TYPE_NEW_DFID_NAME: if (WARN_ON_ONCE(!name || !name_len)) return -EFAULT; break; @@ -473,11 +491,19 @@ static int copy_info_records_to_user(struct fanotify_event *event, unsigned int pidfd_mode = info_mode & FAN_REPORT_PIDFD;
/* - * Event info records order is as follows: dir fid + name, child fid. + * Event info records order is as follows: + * 1. dir fid + name + * 2. (optional) new dir fid + new name + * 3. (optional) child fid */ if (fanotify_event_has_dir_fh(event)) { info_type = info->name_len ? FAN_EVENT_INFO_TYPE_DFID_NAME : FAN_EVENT_INFO_TYPE_DFID; + + /* FAN_RENAME uses special info types */ + if (event->mask & FAN_RENAME) + info_type = FAN_EVENT_INFO_TYPE_OLD_DFID_NAME; + ret = copy_fid_info_to_user(fanotify_event_fsid(event), fanotify_info_dir_fh(info), info_type, @@ -491,6 +517,22 @@ static int copy_info_records_to_user(struct fanotify_event *event, total_bytes += ret; }
+ /* New dir fid+name may be reported in addition to old dir fid+name */ + if (fanotify_event_has_dir2_fh(event)) { + info_type = FAN_EVENT_INFO_TYPE_NEW_DFID_NAME; + ret = copy_fid_info_to_user(fanotify_event_fsid(event), + fanotify_info_dir2_fh(info), + info_type, + fanotify_info_name2(info), + info->name2_len, buf, count); + if (ret < 0) + return ret; + + buf += ret; + count -= ret; + total_bytes += ret; + } + if (fanotify_event_has_object_fh(event)) { const char *dot = NULL; int dot_len = 0; diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h index 9d0e2dc5767b5..e8ac38cc2fd6d 100644 --- a/include/uapi/linux/fanotify.h +++ b/include/uapi/linux/fanotify.h @@ -134,6 +134,12 @@ struct fanotify_event_metadata { #define FAN_EVENT_INFO_TYPE_PIDFD 4 #define FAN_EVENT_INFO_TYPE_ERROR 5
+/* Special info types for FAN_RENAME */ +#define FAN_EVENT_INFO_TYPE_OLD_DFID_NAME 10 +/* Reserved for FAN_EVENT_INFO_TYPE_OLD_DFID 11 */ +#define FAN_EVENT_INFO_TYPE_NEW_DFID_NAME 12 +/* Reserved for FAN_EVENT_INFO_TYPE_NEW_DFID 13 */ + /* Variable length info record following event metadata */ struct fanotify_event_info_header { __u8 info_type;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit 8cc3b1ccd930fe6971e1527f0c4f1bdc8cb56026 ]
FAN_RENAME is the successor of FAN_MOVED_FROM and FAN_MOVED_TO and can be used to get the old and new parent+name information in a single event.
FAN_MOVED_FROM and FAN_MOVED_TO are still supported for backward compatibility, but it makes little sense to use them together with FAN_RENAME in the same group.
FAN_RENAME uses special info type records to report the old and new parent+name, so reporting only old and new parent id is less useful and was not implemented. Therefore, FAN_REANAME requires a group with flag FAN_REPORT_NAME.
Link: https://lore.kernel.org/r/20211129201537.1932819-12-amir73il@gmail.com Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.c | 2 +- fs/notify/fanotify/fanotify_user.c | 8 ++++++++ include/linux/fanotify.h | 3 ++- 3 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 0da305b6f3e2f..985e995d2a398 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -930,7 +930,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask, BUILD_BUG_ON(FAN_FS_ERROR != FS_ERROR); BUILD_BUG_ON(FAN_RENAME != FS_RENAME);
- BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 20); + BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 21);
mask = fanotify_group_event_mask(group, iter_info, &match_mask, mask, data, data_type, dir); diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index e16b18fdf1a65..3bac2329dc35f 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1599,6 +1599,14 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, (!fid_mode || mark_type == FAN_MARK_MOUNT)) goto fput_and_out;
+ /* + * FAN_RENAME uses special info type records to report the old and + * new parent+name. Reporting only old and new parent id is less + * useful and was not implemented. + */ + if (mask & FAN_RENAME && !(fid_mode & FAN_REPORT_NAME)) + goto fput_and_out; + if (flags & FAN_MARK_FLUSH) { ret = 0; if (mark_type == FAN_MARK_MOUNT) diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h index 376e050e6f384..3afdf339d53c9 100644 --- a/include/linux/fanotify.h +++ b/include/linux/fanotify.h @@ -82,7 +82,8 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */ * Directory entry modification events - reported only to directory * where entry is modified and not to a watching parent. */ -#define FANOTIFY_DIRENT_EVENTS (FAN_MOVE | FAN_CREATE | FAN_DELETE) +#define FANOTIFY_DIRENT_EVENTS (FAN_MOVE | FAN_CREATE | FAN_DELETE | \ + FAN_RENAME)
/* Events that can be reported with event->fd */ #define FANOTIFY_FD_EVENTS (FANOTIFY_PATH_EVENTS | FANOTIFY_PERM_EVENTS)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric W. Biederman ebiederm@xmission.com
[ Upstream commit bbda86e988d4c124e4cfa816291cbd583ae8bfb1 ]
The way the per task_struct exit_code is used by kernel threads is not quite compatible how it is used by userspace applications. The low byte of the userspace exit_code value encodes the exit signal. While kthreads just use the value as an int holding ordinary kernel function exit status like -EPERM.
Add kthread_exit to clearly separate the two kinds of uses.
Signed-off-by: "Eric W. Biederman" ebiederm@xmission.com Stable-dep-of: ca3574bd653a ("exit: Rename module_put_and_exit to module_put_and_kthread_exit") Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/kthread.h | 1 + kernel/kthread.c | 23 +++++++++++++++++++---- tools/objtool/check.c | 1 + 3 files changed, 21 insertions(+), 4 deletions(-)
diff --git a/include/linux/kthread.h b/include/linux/kthread.h index 2484ed97e72f5..9dae77a97a033 100644 --- a/include/linux/kthread.h +++ b/include/linux/kthread.h @@ -68,6 +68,7 @@ void *kthread_probe_data(struct task_struct *k); int kthread_park(struct task_struct *k); void kthread_unpark(struct task_struct *k); void kthread_parkme(void); +void kthread_exit(long result) __noreturn;
int kthreadd(void *unused); extern struct task_struct *kthreadd_task; diff --git a/kernel/kthread.c b/kernel/kthread.c index 508fe52782857..9d6cc9c15a55e 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -262,6 +262,21 @@ void kthread_parkme(void) } EXPORT_SYMBOL_GPL(kthread_parkme);
+/** + * kthread_exit - Cause the current kthread return @result to kthread_stop(). + * @result: The integer value to return to kthread_stop(). + * + * While kthread_exit can be called directly, it exists so that + * functions which do some additional work in non-modular code such as + * module_put_and_kthread_exit can be implemented. + * + * Does not return. + */ +void __noreturn kthread_exit(long result) +{ + do_exit(result); +} + static int kthread(void *_create) { /* Copy data: it's on kthread's stack */ @@ -279,13 +294,13 @@ static int kthread(void *_create) done = xchg(&create->done, NULL); if (!done) { kfree(create); - do_exit(-EINTR); + kthread_exit(-EINTR); }
if (!self) { create->result = ERR_PTR(-ENOMEM); complete(done); - do_exit(-ENOMEM); + kthread_exit(-ENOMEM); }
self->threadfn = threadfn; @@ -312,7 +327,7 @@ static int kthread(void *_create) __kthread_parkme(self); ret = threadfn(data); } - do_exit(ret); + kthread_exit(ret); }
/* called from do_fork() to get node information for about to be created task */ @@ -621,7 +636,7 @@ EXPORT_SYMBOL_GPL(kthread_park); * instead of calling wake_up_process(): the thread will exit without * calling threadfn(). * - * If threadfn() may call do_exit() itself, the caller must ensure + * If threadfn() may call kthread_exit() itself, the caller must ensure * task_struct can't go away. * * Returns the result of threadfn(), or %-EINTR if wake_up_process() diff --git a/tools/objtool/check.c b/tools/objtool/check.c index 059b78d08f7af..6afa1f8ca1614 100644 --- a/tools/objtool/check.c +++ b/tools/objtool/check.c @@ -168,6 +168,7 @@ static bool __dead_end_function(struct objtool_file *file, struct symbol *func, "panic", "do_exit", "do_task_dead", + "kthread_exit", "make_task_dead", "__module_put_and_exit", "complete_and_exit",
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric W. Biederman ebiederm@xmission.com
[ Upstream commit ca3574bd653aba234a4b31955f2778947403be16 ]
Update module_put_and_exit to call kthread_exit instead of do_exit.
Change the name to reflect this change in functionality. All of the users of module_put_and_exit are causing the current kthread to exit so this change makes it clear what is happening. There is no functional change.
Signed-off-by: "Eric W. Biederman" ebiederm@xmission.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- crypto/algboss.c | 4 ++-- fs/cifs/connect.c | 2 +- fs/nfs/callback.c | 4 ++-- fs/nfs/nfs4state.c | 2 +- fs/nfsd/nfssvc.c | 2 +- include/linux/module.h | 6 +++--- kernel/module.c | 6 +++--- net/bluetooth/bnep/core.c | 2 +- net/bluetooth/cmtp/core.c | 2 +- net/bluetooth/hidp/core.c | 2 +- tools/objtool/check.c | 2 +- 11 files changed, 17 insertions(+), 17 deletions(-)
diff --git a/crypto/algboss.c b/crypto/algboss.c index 5ebccbd6b74ed..b87f907bb1428 100644 --- a/crypto/algboss.c +++ b/crypto/algboss.c @@ -74,7 +74,7 @@ static int cryptomgr_probe(void *data) complete_all(¶m->larval->completion); crypto_alg_put(¶m->larval->alg); kfree(param); - module_put_and_exit(0); + module_put_and_kthread_exit(0); }
static int cryptomgr_schedule_probe(struct crypto_larval *larval) @@ -209,7 +209,7 @@ static int cryptomgr_test(void *data) crypto_alg_tested(param->driver, err);
kfree(param); - module_put_and_exit(0); + module_put_and_kthread_exit(0); }
static int cryptomgr_schedule_test(struct crypto_alg *alg) diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c index 164b985407160..a3c0e6a4e4847 100644 --- a/fs/cifs/connect.c +++ b/fs/cifs/connect.c @@ -1242,7 +1242,7 @@ cifs_demultiplex_thread(void *p) }
memalloc_noreclaim_restore(noreclaim_flag); - module_put_and_exit(0); + module_put_and_kthread_exit(0); }
/* extract the host portion of the UNC string */ diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c index 86d856de1389b..3c86a559a321a 100644 --- a/fs/nfs/callback.c +++ b/fs/nfs/callback.c @@ -93,7 +93,7 @@ nfs4_callback_svc(void *vrqstp) svc_process(rqstp); } svc_exit_thread(rqstp); - module_put_and_exit(0); + module_put_and_kthread_exit(0); return 0; }
@@ -137,7 +137,7 @@ nfs41_callback_svc(void *vrqstp) } } svc_exit_thread(rqstp); - module_put_and_exit(0); + module_put_and_kthread_exit(0); return 0; }
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c index afb617a4a7e42..d8fc5d72a161c 100644 --- a/fs/nfs/nfs4state.c +++ b/fs/nfs/nfs4state.c @@ -2757,7 +2757,7 @@ static int nfs4_run_state_manager(void *ptr) goto again;
nfs_put_client(clp); - module_put_and_exit(0); + module_put_and_kthread_exit(0); return 0; }
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 408cff8fe32d3..0f84151011088 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -986,7 +986,7 @@ nfsd(void *vrqstp)
/* Release module */ mutex_unlock(&nfsd_mutex); - module_put_and_exit(0); + module_put_and_kthread_exit(0); return 0; }
diff --git a/include/linux/module.h b/include/linux/module.h index 59cbd8e1be2d6..a55a40c28568e 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -604,9 +604,9 @@ int module_get_kallsym(unsigned int symnum, unsigned long *value, char *type, /* Look for this name: can be of form module:name. */ unsigned long module_kallsyms_lookup_name(const char *name);
-extern void __noreturn __module_put_and_exit(struct module *mod, +extern void __noreturn __module_put_and_kthread_exit(struct module *mod, long code); -#define module_put_and_exit(code) __module_put_and_exit(THIS_MODULE, code) +#define module_put_and_kthread_exit(code) __module_put_and_kthread_exit(THIS_MODULE, code)
#ifdef CONFIG_MODULE_UNLOAD int module_refcount(struct module *mod); @@ -798,7 +798,7 @@ static inline int unregister_module_notifier(struct notifier_block *nb) return 0; }
-#define module_put_and_exit(code) do_exit(code) +#define module_put_and_kthread_exit(code) kthread_exit(code)
static inline void print_modules(void) { diff --git a/kernel/module.c b/kernel/module.c index 949d09d2d8297..9030ff8c08555 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -336,12 +336,12 @@ static inline void add_taint_module(struct module *mod, unsigned flag, * A thread that wants to hold a reference to a module only while it * is running can call this to safely exit. nfsd and lockd use this. */ -void __noreturn __module_put_and_exit(struct module *mod, long code) +void __noreturn __module_put_and_kthread_exit(struct module *mod, long code) { module_put(mod); - do_exit(code); + kthread_exit(code); } -EXPORT_SYMBOL(__module_put_and_exit); +EXPORT_SYMBOL(__module_put_and_kthread_exit);
/* Find a module section: 0 means not found. */ static unsigned int find_sec(const struct load_info *info, const char *name) diff --git a/net/bluetooth/bnep/core.c b/net/bluetooth/bnep/core.c index 43c284158f63e..09b6d825124ee 100644 --- a/net/bluetooth/bnep/core.c +++ b/net/bluetooth/bnep/core.c @@ -535,7 +535,7 @@ static int bnep_session(void *arg)
up_write(&bnep_session_sem); free_netdev(dev); - module_put_and_exit(0); + module_put_and_kthread_exit(0); return 0; }
diff --git a/net/bluetooth/cmtp/core.c b/net/bluetooth/cmtp/core.c index 83eb84e8e688f..90d130588a3e5 100644 --- a/net/bluetooth/cmtp/core.c +++ b/net/bluetooth/cmtp/core.c @@ -323,7 +323,7 @@ static int cmtp_session(void *arg) up_write(&cmtp_session_sem);
kfree(session); - module_put_and_exit(0); + module_put_and_kthread_exit(0); return 0; }
diff --git a/net/bluetooth/hidp/core.c b/net/bluetooth/hidp/core.c index b946a6379433a..3ff870599eb77 100644 --- a/net/bluetooth/hidp/core.c +++ b/net/bluetooth/hidp/core.c @@ -1305,7 +1305,7 @@ static int hidp_session_thread(void *arg) l2cap_unregister_user(session->conn, &session->user); hidp_session_put(session);
- module_put_and_exit(0); + module_put_and_kthread_exit(0); return 0; }
diff --git a/tools/objtool/check.c b/tools/objtool/check.c index 6afa1f8ca1614..0506a48f124c2 100644 --- a/tools/objtool/check.c +++ b/tools/objtool/check.c @@ -170,7 +170,7 @@ static bool __dead_end_function(struct objtool_file *file, struct symbol *func, "do_task_dead", "kthread_exit", "make_task_dead", - "__module_put_and_exit", + "__module_put_and_kthread_exit", "complete_and_exit", "__reiserfs_panic", "lbug_with_loc",
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit c2f1c4bd20621175c581f298b4943df0cffbd841 ]
/home/cel/src/linux/linux/fs/nfsd/nfs4proc.c:1539:24: warning: incorrect type in assignment (different base types) /home/cel/src/linux/linux/fs/nfsd/nfs4proc.c:1539:24: expected restricted __be32 [usertype] status /home/cel/src/linux/linux/fs/nfsd/nfs4proc.c:1539:24: got int
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 52f3f35533791..eaf2f95d059ca 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1508,7 +1508,7 @@ static ssize_t _nfsd_copy_file_range(struct nfsd4_copy *copy) u64 bytes_total = copy->cp_count; u64 src_pos = copy->cp_src_pos; u64 dst_pos = copy->cp_dst_pos; - __be32 status; + int status;
/* See RFC 7862 p.67: */ if (bytes_total == 0)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit 89b24336f03a8ba560e96b0c47a8434a7fa48e3c ]
If write_ports_add() fails, we shouldn't destroy the serv, unless we had only just created it. So if there are any permanent sockets already attached, leave the serv in place.
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsctl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index d0761ca8cb542..162866cfe83a2 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -742,7 +742,7 @@ static ssize_t __write_ports_addfd(char *buf, struct net *net, const struct cred return err;
err = svc_addsock(nn->nfsd_serv, fd, buf, SIMPLE_TRANSACTION_LIMIT, cred); - if (err < 0) { + if (err < 0 && list_empty(&nn->nfsd_serv->sv_permsocks)) { nfsd_destroy(net); return err; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit df5e49c880ea0776806b8a9f8ab95e035272cf6f ]
It is common for 'get' functions to return the object that was 'got', and there are a couple of places where users of svc_get() would be a little simpler if svc_get() did that.
Make it so.
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svc.c | 6 ++---- fs/nfs/callback.c | 6 ++---- include/linux/sunrpc/svc.h | 3 ++- 3 files changed, 6 insertions(+), 9 deletions(-)
diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index b220e1b917268..2f50d5b2a8a42 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -430,14 +430,12 @@ static struct svc_serv *lockd_create_svc(void) /* * Check whether we're already up and running. */ - if (nlmsvc_rqst) { + if (nlmsvc_rqst) /* * Note: increase service usage, because later in case of error * svc_destroy() will be called. */ - svc_get(nlmsvc_rqst->rq_server); - return nlmsvc_rqst->rq_server; - } + return svc_get(nlmsvc_rqst->rq_server);
/* * Sanity check: if there's no pid, diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c index 3c86a559a321a..674198e0eb5e1 100644 --- a/fs/nfs/callback.c +++ b/fs/nfs/callback.c @@ -266,14 +266,12 @@ static struct svc_serv *nfs_callback_create_svc(int minorversion) /* * Check whether we're already up and running. */ - if (cb_info->serv) { + if (cb_info->serv) /* * Note: increase service usage, because later in case of error * svc_destroy() will be called. */ - svc_get(cb_info->serv); - return cb_info->serv; - } + return svc_get(cb_info->serv);
switch (minorversion) { case 0: diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 15ecd5b1abacf..e2580f6a05c21 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -120,9 +120,10 @@ struct svc_serv { * change the number of threads. Horrible, but there it is. * Should be called with the "service mutex" held. */ -static inline void svc_get(struct svc_serv *serv) +static inline struct svc_serv *svc_get(struct svc_serv *serv) { serv->sv_nrthreads++; + return serv; }
/*
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit 8c62d12740a1450d2e8456d5747f440e10db281a ]
svc_destroy() is poorly named - it doesn't necessarily destroy the svc, it might just reduce the ref count. nfsd_destroy() is poorly named for the same reason.
This patch: - removes the refcount functionality from svc_destroy(), moving it to a new svc_put(). Almost all previous callers of svc_destroy() now call svc_put(). - renames nfsd_destroy() to nfsd_put() and improves the code, using the new svc_destroy() rather than svc_put() - removes a few comments that explain the important for balanced get/put calls. This should be obvious.
The only non-trivial part of this is that svc_destroy() would call svc_sock_update() on a non-final decrement. It can no longer do that, and svc_put() isn't really a good place of it. This call is now made from svc_exit_thread() which seems like a good place. This makes the call *before* sv_nrthreads is decremented rather than after. This is not particularly important as the call just sets a flag which causes sv_nrthreads set be checked later. A subsequent patch will improve the ordering.
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svc.c | 6 +----- fs/nfs/callback.c | 14 ++------------ fs/nfsd/nfsctl.c | 4 ++-- fs/nfsd/nfsd.h | 2 +- fs/nfsd/nfssvc.c | 30 ++++++++++++++++-------------- include/linux/sunrpc/svc.h | 26 +++++++++++++++++++++++--- net/sunrpc/svc.c | 19 +++++-------------- 7 files changed, 50 insertions(+), 51 deletions(-)
diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index 2f50d5b2a8a42..135bd86ed3adb 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -431,10 +431,6 @@ static struct svc_serv *lockd_create_svc(void) * Check whether we're already up and running. */ if (nlmsvc_rqst) - /* - * Note: increase service usage, because later in case of error - * svc_destroy() will be called. - */ return svc_get(nlmsvc_rqst->rq_server);
/* @@ -495,7 +491,7 @@ int lockd_up(struct net *net, const struct cred *cred) * so we exit through here on both success and failure. */ err_put: - svc_destroy(serv); + svc_put(serv); err_create: mutex_unlock(&nlmsvc_mutex); return error; diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c index 674198e0eb5e1..dddd66749a881 100644 --- a/fs/nfs/callback.c +++ b/fs/nfs/callback.c @@ -267,10 +267,6 @@ static struct svc_serv *nfs_callback_create_svc(int minorversion) * Check whether we're already up and running. */ if (cb_info->serv) - /* - * Note: increase service usage, because later in case of error - * svc_destroy() will be called. - */ return svc_get(cb_info->serv);
switch (minorversion) { @@ -333,16 +329,10 @@ int nfs_callback_up(u32 minorversion, struct rpc_xprt *xprt) goto err_start;
cb_info->users++; - /* - * svc_create creates the svc_serv with sv_nrthreads == 1, and then - * svc_prepare_thread increments that. So we need to call svc_destroy - * on both success and failure so that the refcount is 1 when the - * thread exits. - */ err_net: if (!cb_info->users) cb_info->serv = NULL; - svc_destroy(serv); + svc_put(serv); err_create: mutex_unlock(&nfs_callback_mutex); return ret; @@ -368,7 +358,7 @@ void nfs_callback_down(int minorversion, struct net *net) if (cb_info->users == 0) { svc_get(serv); serv->sv_ops->svo_setup(serv, NULL, 0); - svc_destroy(serv); + svc_put(serv); dprintk("nfs_callback_down: service destroyed\n"); cb_info->serv = NULL; } diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 162866cfe83a2..5c8d985acf5fb 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -743,7 +743,7 @@ static ssize_t __write_ports_addfd(char *buf, struct net *net, const struct cred
err = svc_addsock(nn->nfsd_serv, fd, buf, SIMPLE_TRANSACTION_LIMIT, cred); if (err < 0 && list_empty(&nn->nfsd_serv->sv_permsocks)) { - nfsd_destroy(net); + nfsd_put(net); return err; }
@@ -796,7 +796,7 @@ static ssize_t __write_ports_addxprt(char *buf, struct net *net, const struct cr if (!list_empty(&nn->nfsd_serv->sv_permsocks)) nn->nfsd_serv->sv_nrthreads--; else - nfsd_destroy(net); + nfsd_put(net); return err; }
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 498e5a4898260..3e5008b475ff0 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -97,7 +97,7 @@ int nfsd_pool_stats_open(struct inode *, struct file *); int nfsd_pool_stats_release(struct inode *, struct file *); void nfsd_shutdown_threads(struct net *net);
-void nfsd_destroy(struct net *net); +void nfsd_put(struct net *net);
bool i_am_nfsd(void);
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 0f84151011088..4aee1cfe0d1bb 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -623,7 +623,7 @@ void nfsd_shutdown_threads(struct net *net) svc_get(serv); /* Kill outstanding nfsd threads */ serv->sv_ops->svo_setup(serv, NULL, 0); - nfsd_destroy(net); + nfsd_put(net); mutex_unlock(&nfsd_mutex); /* Wait for shutdown of nfsd_serv to complete */ wait_for_completion(&nn->nfsd_shutdown_complete); @@ -656,7 +656,10 @@ int nfsd_create_serv(struct net *net) nn->nfsd_serv->sv_maxconn = nn->max_connections; error = svc_bind(nn->nfsd_serv, net); if (error < 0) { - svc_destroy(nn->nfsd_serv); + /* NOT nfsd_put() as notifiers (see below) haven't + * been set up yet. + */ + svc_put(nn->nfsd_serv); nfsd_complete_shutdown(net); return error; } @@ -697,16 +700,16 @@ int nfsd_get_nrthreads(int n, int *nthreads, struct net *net) return 0; }
-void nfsd_destroy(struct net *net) +void nfsd_put(struct net *net) { struct nfsd_net *nn = net_generic(net, nfsd_net_id); - int destroy = (nn->nfsd_serv->sv_nrthreads == 1);
- if (destroy) + nn->nfsd_serv->sv_nrthreads -= 1; + if (nn->nfsd_serv->sv_nrthreads == 0) { svc_shutdown_net(nn->nfsd_serv, net); - svc_destroy(nn->nfsd_serv); - if (destroy) + svc_destroy(nn->nfsd_serv); nfsd_complete_shutdown(net); + } }
int nfsd_set_nrthreads(int n, int *nthreads, struct net *net) @@ -758,7 +761,7 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct net *net) if (err) break; } - nfsd_destroy(net); + nfsd_put(net); return err; }
@@ -795,7 +798,7 @@ nfsd_svc(int nrservs, struct net *net, const struct cred *cred)
error = nfsd_startup_net(net, cred); if (error) - goto out_destroy; + goto out_put; error = nn->nfsd_serv->sv_ops->svo_setup(nn->nfsd_serv, NULL, nrservs); if (error) @@ -808,8 +811,8 @@ nfsd_svc(int nrservs, struct net *net, const struct cred *cred) out_shutdown: if (error < 0 && !nfsd_up_before) nfsd_shutdown_net(net); -out_destroy: - nfsd_destroy(net); /* Release server */ +out_put: + nfsd_put(net); out: mutex_unlock(&nfsd_mutex); return error; @@ -982,7 +985,7 @@ nfsd(void *vrqstp) /* Release the thread */ svc_exit_thread(rqstp);
- nfsd_destroy(net); + nfsd_put(net);
/* Release module */ mutex_unlock(&nfsd_mutex); @@ -1109,8 +1112,7 @@ int nfsd_pool_stats_release(struct inode *inode, struct file *file) struct net *net = inode->i_sb->s_fs_info;
mutex_lock(&nfsd_mutex); - /* this function really, really should have been called svc_put() */ - nfsd_destroy(net); + nfsd_put(net); mutex_unlock(&nfsd_mutex); return ret; } diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index e2580f6a05c21..f7b582d3a65ac 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -114,8 +114,13 @@ struct svc_serv { #endif /* CONFIG_SUNRPC_BACKCHANNEL */ };
-/* - * We use sv_nrthreads as a reference count. svc_destroy() drops +/** + * svc_get() - increment reference count on a SUNRPC serv + * @serv: the svc_serv to have count incremented + * + * Returns: the svc_serv that was passed in. + * + * We use sv_nrthreads as a reference count. svc_put() drops * this refcount, so we need to bump it up around operations that * change the number of threads. Horrible, but there it is. * Should be called with the "service mutex" held. @@ -126,6 +131,22 @@ static inline struct svc_serv *svc_get(struct svc_serv *serv) return serv; }
+void svc_destroy(struct svc_serv *serv); + +/** + * svc_put - decrement reference count on a SUNRPC serv + * @serv: the svc_serv to have count decremented + * + * When the reference count reaches zero, svc_destroy() + * is called to clean up and free the serv. + */ +static inline void svc_put(struct svc_serv *serv) +{ + serv->sv_nrthreads -= 1; + if (serv->sv_nrthreads == 0) + svc_destroy(serv); +} + /* * Maximum payload size supported by a kernel RPC server. * This is use to determine the max number of pages nfsd is @@ -518,7 +539,6 @@ struct svc_serv * svc_create_pooled(struct svc_program *, unsigned int, int svc_set_num_threads(struct svc_serv *, struct svc_pool *, int); int svc_set_num_threads_sync(struct svc_serv *, struct svc_pool *, int); int svc_pool_stats_open(struct svc_serv *serv, struct file *file); -void svc_destroy(struct svc_serv *); void svc_shutdown_net(struct svc_serv *, struct net *); int svc_process(struct svc_rqst *); int bc_svc_process(struct svc_serv *, struct rpc_rqst *, diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 54f66f66beb59..00d464f6dbe60 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -526,17 +526,7 @@ EXPORT_SYMBOL_GPL(svc_shutdown_net); void svc_destroy(struct svc_serv *serv) { - dprintk("svc: svc_destroy(%s, %d)\n", - serv->sv_program->pg_name, - serv->sv_nrthreads); - - if (serv->sv_nrthreads) { - if (--(serv->sv_nrthreads) != 0) { - svc_sock_update_bufs(serv); - return; - } - } else - printk("svc_destroy: no threads for serv=%p!\n", serv); + dprintk("svc: svc_destroy(%s)\n", serv->sv_program->pg_name);
del_timer_sync(&serv->sv_temptimer);
@@ -893,9 +883,10 @@ svc_exit_thread(struct svc_rqst *rqstp)
svc_rqst_free(rqstp);
- /* Release the server */ - if (serv) - svc_destroy(serv); + if (!serv) + return; + svc_sock_update_bufs(serv); + svc_destroy(serv); } EXPORT_SYMBOL_GPL(svc_exit_thread);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit ec52361df99b490f6af412b046df9799b92c1050 ]
The use of sv_nrthreads as a general refcount results in clumsy code, as is seen by various comments needed to explain the situation.
This patch introduces a 'struct kref' and uses that for reference counting, leaving sv_nrthreads to be a pure count of threads. The kref is managed particularly in svc_get() and svc_put(), and also nfsd_put();
svc_destroy() now takes a pointer to the embedded kref, rather than to the serv.
nfsd allows the svc_serv to exist with ->sv_nrhtreads being zero. This happens when a transport is created before the first thread is started. To support this, a 'keep_active' flag is introduced which holds a ref on the svc_serv. This is set when any listening socket is successfully added (unless there are running threads), and cleared when the number of threads is set. So when the last thread exits, the nfs_serv will be destroyed. The use of 'keep_active' replaces previous code which checked if there were any permanent sockets.
We no longer clear ->rq_server when nfsd() exits. This was done to prevent svc_exit_thread() from calling svc_destroy(). Instead we take an extra reference to the svc_serv to prevent svc_destroy() from being called.
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svc.c | 4 ---- fs/nfs/callback.c | 2 +- fs/nfsd/netns.h | 7 +++++++ fs/nfsd/nfsctl.c | 22 +++++++++----------- fs/nfsd/nfssvc.c | 42 +++++++++++++++++++++++--------------- include/linux/sunrpc/svc.h | 14 ++++--------- net/sunrpc/svc.c | 22 ++++++++++---------- 7 files changed, 59 insertions(+), 54 deletions(-)
diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index 135bd86ed3adb..a9669b106dbde 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -486,10 +486,6 @@ int lockd_up(struct net *net, const struct cred *cred) goto err_put; } nlmsvc_users++; - /* - * Note: svc_serv structures have an initial use count of 1, - * so we exit through here on both success and failure. - */ err_put: svc_put(serv); err_create: diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c index dddd66749a881..09ec60b99f65e 100644 --- a/fs/nfs/callback.c +++ b/fs/nfs/callback.c @@ -169,7 +169,7 @@ static int nfs_callback_start_svc(int minorversion, struct rpc_xprt *xprt, if (nrservs < NFS4_MIN_NR_CALLBACK_THREADS) nrservs = NFS4_MIN_NR_CALLBACK_THREADS;
- if (serv->sv_nrthreads-1 == nrservs) + if (serv->sv_nrthreads == nrservs) return 0;
ret = serv->sv_ops->svo_setup(serv, NULL, nrservs); diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h index 935c1028c2175..08bcd8f23b013 100644 --- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h @@ -123,6 +123,13 @@ struct nfsd_net { u32 clverifier_counter;
struct svc_serv *nfsd_serv; + /* When a listening socket is added to nfsd, keep_active is set + * and this justifies a reference on nfsd_serv. This stops + * nfsd_serv from being freed. When the number of threads is + * set, keep_active is cleared and the reference is dropped. So + * when the last thread exits, the service will be destroyed. + */ + int keep_active;
wait_queue_head_t ntf_wq; atomic_t ntf_refcnt; diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 5c8d985acf5fb..53076c5afe62c 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -742,13 +742,12 @@ static ssize_t __write_ports_addfd(char *buf, struct net *net, const struct cred return err;
err = svc_addsock(nn->nfsd_serv, fd, buf, SIMPLE_TRANSACTION_LIMIT, cred); - if (err < 0 && list_empty(&nn->nfsd_serv->sv_permsocks)) { - nfsd_put(net); - return err; - }
- /* Decrease the count, but don't shut down the service */ - nn->nfsd_serv->sv_nrthreads--; + if (err >= 0 && + !nn->nfsd_serv->sv_nrthreads && !xchg(&nn->keep_active, 1)) + svc_get(nn->nfsd_serv); + + nfsd_put(net); return err; }
@@ -783,8 +782,10 @@ static ssize_t __write_ports_addxprt(char *buf, struct net *net, const struct cr if (err < 0 && err != -EAFNOSUPPORT) goto out_close;
- /* Decrease the count, but don't shut down the service */ - nn->nfsd_serv->sv_nrthreads--; + if (!nn->nfsd_serv->sv_nrthreads && !xchg(&nn->keep_active, 1)) + svc_get(nn->nfsd_serv); + + nfsd_put(net); return 0; out_close: xprt = svc_find_xprt(nn->nfsd_serv, transport, net, PF_INET, port); @@ -793,10 +794,7 @@ static ssize_t __write_ports_addxprt(char *buf, struct net *net, const struct cr svc_xprt_put(xprt); } out_err: - if (!list_empty(&nn->nfsd_serv->sv_permsocks)) - nn->nfsd_serv->sv_nrthreads--; - else - nfsd_put(net); + nfsd_put(net); return err; }
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 4aee1cfe0d1bb..141d884fee4f4 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -60,13 +60,13 @@ static __be32 nfsd_init_request(struct svc_rqst *, * extent ->sv_temp_socks and ->sv_permsocks. It also protects nfsdstats.th_cnt * * If (out side the lock) nn->nfsd_serv is non-NULL, then it must point to a - * properly initialised 'struct svc_serv' with ->sv_nrthreads > 0. That number - * of nfsd threads must exist and each must listed in ->sp_all_threads in each - * entry of ->sv_pools[]. + * properly initialised 'struct svc_serv' with ->sv_nrthreads > 0 (unless + * nn->keep_active is set). That number of nfsd threads must + * exist and each must be listed in ->sp_all_threads in some entry of + * ->sv_pools[]. * - * Transitions of the thread count between zero and non-zero are of particular - * interest since the svc_serv needs to be created and initialized at that - * point, or freed. + * Each active thread holds a counted reference on nn->nfsd_serv, as does + * the nn->keep_active flag and various transient calls to svc_get(). * * Finally, the nfsd_mutex also protects some of the global variables that are * accessed when nfsd starts and that are settable via the write_* routines in @@ -700,14 +700,22 @@ int nfsd_get_nrthreads(int n, int *nthreads, struct net *net) return 0; }
+/* This is the callback for kref_put() below. + * There is no code here as the first thing to be done is + * call svc_shutdown_net(), but we cannot get the 'net' from + * the kref. So do all the work when kref_put returns true. + */ +static void nfsd_noop(struct kref *ref) +{ +} + void nfsd_put(struct net *net) { struct nfsd_net *nn = net_generic(net, nfsd_net_id);
- nn->nfsd_serv->sv_nrthreads -= 1; - if (nn->nfsd_serv->sv_nrthreads == 0) { + if (kref_put(&nn->nfsd_serv->sv_refcnt, nfsd_noop)) { svc_shutdown_net(nn->nfsd_serv, net); - svc_destroy(nn->nfsd_serv); + svc_destroy(&nn->nfsd_serv->sv_refcnt); nfsd_complete_shutdown(net); } } @@ -803,15 +811,14 @@ nfsd_svc(int nrservs, struct net *net, const struct cred *cred) NULL, nrservs); if (error) goto out_shutdown; - /* We are holding a reference to nn->nfsd_serv which - * we don't want to count in the return value, - * so subtract 1 - */ - error = nn->nfsd_serv->sv_nrthreads - 1; + error = nn->nfsd_serv->sv_nrthreads; out_shutdown: if (error < 0 && !nfsd_up_before) nfsd_shutdown_net(net); out_put: + /* Threads now hold service active */ + if (xchg(&nn->keep_active, 0)) + nfsd_put(net); nfsd_put(net); out: mutex_unlock(&nfsd_mutex); @@ -980,11 +987,15 @@ nfsd(void *vrqstp) nfsdstats.th_cnt --;
out: - rqstp->rq_server = NULL; + /* Take an extra ref so that the svc_put in svc_exit_thread() + * doesn't call svc_destroy() + */ + svc_get(nn->nfsd_serv);
/* Release the thread */ svc_exit_thread(rqstp);
+ /* Now if needed we call svc_destroy in appropriate context */ nfsd_put(net);
/* Release module */ @@ -1099,7 +1110,6 @@ int nfsd_pool_stats_open(struct inode *inode, struct file *file) mutex_unlock(&nfsd_mutex); return -ENODEV; } - /* bump up the psudo refcount while traversing */ svc_get(nn->nfsd_serv); ret = svc_pool_stats_open(nn->nfsd_serv, file); mutex_unlock(&nfsd_mutex); diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index f7b582d3a65ac..6829c4ee36544 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -85,6 +85,7 @@ struct svc_serv { struct svc_program * sv_program; /* RPC program */ struct svc_stat * sv_stats; /* RPC statistics */ spinlock_t sv_lock; + struct kref sv_refcnt; unsigned int sv_nrthreads; /* # of server threads */ unsigned int sv_maxconn; /* max connections allowed or * '0' causing max to be based @@ -119,19 +120,14 @@ struct svc_serv { * @serv: the svc_serv to have count incremented * * Returns: the svc_serv that was passed in. - * - * We use sv_nrthreads as a reference count. svc_put() drops - * this refcount, so we need to bump it up around operations that - * change the number of threads. Horrible, but there it is. - * Should be called with the "service mutex" held. */ static inline struct svc_serv *svc_get(struct svc_serv *serv) { - serv->sv_nrthreads++; + kref_get(&serv->sv_refcnt); return serv; }
-void svc_destroy(struct svc_serv *serv); +void svc_destroy(struct kref *);
/** * svc_put - decrement reference count on a SUNRPC serv @@ -142,9 +138,7 @@ void svc_destroy(struct svc_serv *serv); */ static inline void svc_put(struct svc_serv *serv) { - serv->sv_nrthreads -= 1; - if (serv->sv_nrthreads == 0) - svc_destroy(serv); + kref_put(&serv->sv_refcnt, svc_destroy); }
/* diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 00d464f6dbe60..19c9d8bf77beb 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -433,7 +433,7 @@ __svc_create(struct svc_program *prog, unsigned int bufsize, int npools, return NULL; serv->sv_name = prog->pg_name; serv->sv_program = prog; - serv->sv_nrthreads = 1; + kref_init(&serv->sv_refcnt); serv->sv_stats = prog->pg_stats; if (bufsize > RPCSVC_MAXPAYLOAD) bufsize = RPCSVC_MAXPAYLOAD; @@ -524,10 +524,11 @@ EXPORT_SYMBOL_GPL(svc_shutdown_net); * protect the sv_nrthreads, sv_permsocks and sv_tempsocks. */ void -svc_destroy(struct svc_serv *serv) +svc_destroy(struct kref *ref) { - dprintk("svc: svc_destroy(%s)\n", serv->sv_program->pg_name); + struct svc_serv *serv = container_of(ref, struct svc_serv, sv_refcnt);
+ dprintk("svc: svc_destroy(%s)\n", serv->sv_program->pg_name); del_timer_sync(&serv->sv_temptimer);
/* @@ -635,6 +636,7 @@ svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node) if (!rqstp) return ERR_PTR(-ENOMEM);
+ svc_get(serv); serv->sv_nrthreads++; spin_lock_bh(&pool->sp_lock); pool->sp_nrthreads++; @@ -774,8 +776,7 @@ int svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs) { if (pool == NULL) { - /* The -1 assumes caller has done a svc_get() */ - nrservs -= (serv->sv_nrthreads-1); + nrservs -= serv->sv_nrthreads; } else { spin_lock_bh(&pool->sp_lock); nrservs -= pool->sp_nrthreads; @@ -816,8 +817,7 @@ int svc_set_num_threads_sync(struct svc_serv *serv, struct svc_pool *pool, int nrservs) { if (pool == NULL) { - /* The -1 assumes caller has done a svc_get() */ - nrservs -= (serv->sv_nrthreads-1); + nrservs -= serv->sv_nrthreads; } else { spin_lock_bh(&pool->sp_lock); nrservs -= pool->sp_nrthreads; @@ -881,12 +881,12 @@ svc_exit_thread(struct svc_rqst *rqstp) list_del_rcu(&rqstp->rq_all); spin_unlock_bh(&pool->sp_lock);
+ serv->sv_nrthreads -= 1; + svc_sock_update_bufs(serv); + svc_rqst_free(rqstp);
- if (!serv) - return; - svc_sock_update_bufs(serv); - svc_destroy(serv); + svc_put(serv); } EXPORT_SYMBOL_GPL(svc_exit_thread);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit 9b6c8c9bebccd5fb785c306b948c08874a88874d ]
This allows us to move the updates for th_cnt out of the mutex. This is a step towards reducing mutex coverage in nfsd().
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfssvc.c | 6 +++--- fs/nfsd/stats.c | 2 +- fs/nfsd/stats.h | 4 +--- 3 files changed, 5 insertions(+), 7 deletions(-)
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 141d884fee4f4..32f2c46a38323 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -57,7 +57,7 @@ static __be32 nfsd_init_request(struct svc_rqst *, /* * nfsd_mutex protects nn->nfsd_serv -- both the pointer itself and the members * of the svc_serv struct. In particular, ->sv_nrthreads but also to some - * extent ->sv_temp_socks and ->sv_permsocks. It also protects nfsdstats.th_cnt + * extent ->sv_temp_socks and ->sv_permsocks. * * If (out side the lock) nn->nfsd_serv is non-NULL, then it must point to a * properly initialised 'struct svc_serv' with ->sv_nrthreads > 0 (unless @@ -955,8 +955,8 @@ nfsd(void *vrqstp) allow_signal(SIGINT); allow_signal(SIGQUIT);
- nfsdstats.th_cnt++; mutex_unlock(&nfsd_mutex); + atomic_inc(&nfsdstats.th_cnt);
set_freezable();
@@ -983,8 +983,8 @@ nfsd(void *vrqstp) /* Clear signals before calling svc_exit_thread() */ flush_signals(current);
+ atomic_dec(&nfsdstats.th_cnt); mutex_lock(&nfsd_mutex); - nfsdstats.th_cnt --;
out: /* Take an extra ref so that the svc_put in svc_exit_thread() diff --git a/fs/nfsd/stats.c b/fs/nfsd/stats.c index 1d3b881e73821..a8c5a02a84f04 100644 --- a/fs/nfsd/stats.c +++ b/fs/nfsd/stats.c @@ -45,7 +45,7 @@ static int nfsd_proc_show(struct seq_file *seq, void *v) percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_IO_WRITE]));
/* thread usage: */ - seq_printf(seq, "th %u 0", nfsdstats.th_cnt); + seq_printf(seq, "th %u 0", atomic_read(&nfsdstats.th_cnt));
/* deprecated thread usage histogram stats */ for (i = 0; i < 10; i++) diff --git a/fs/nfsd/stats.h b/fs/nfsd/stats.h index 51ecda852e23b..9b43dc3d99913 100644 --- a/fs/nfsd/stats.h +++ b/fs/nfsd/stats.h @@ -29,11 +29,9 @@ enum { struct nfsd_stats { struct percpu_counter counter[NFSD_STATS_COUNTERS_NUM];
- /* Protected by nfsd_mutex */ - unsigned int th_cnt; /* number of available threads */ + atomic_t th_cnt; /* number of available threads */ };
- extern struct nfsd_stats nfsdstats;
extern struct svc_stat nfsd_svcstats;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit 2a36395fac3b72771f87c3ee4387e3a96d85a7cc ]
Using sv_lock means we don't need to hold the service mutex over these updates.
In particular, svc_exit_thread() no longer requires synchronisation, so threads can exit asynchronously.
Note that we could use an atomic_t, but as there are many more read sites than writes, that would add unnecessary noise to the code. Some reads are already racy, and there is no need for them to not be.
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfssvc.c | 5 ++--- net/sunrpc/svc.c | 9 +++++++-- 2 files changed, 9 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 32f2c46a38323..16884a90e1ab0 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -55,9 +55,8 @@ static __be32 nfsd_init_request(struct svc_rqst *, struct svc_process_info *);
/* - * nfsd_mutex protects nn->nfsd_serv -- both the pointer itself and the members - * of the svc_serv struct. In particular, ->sv_nrthreads but also to some - * extent ->sv_temp_socks and ->sv_permsocks. + * nfsd_mutex protects nn->nfsd_serv -- both the pointer itself and some members + * of the svc_serv struct such as ->sv_temp_socks and ->sv_permsocks. * * If (out side the lock) nn->nfsd_serv is non-NULL, then it must point to a * properly initialised 'struct svc_serv' with ->sv_nrthreads > 0 (unless diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 19c9d8bf77beb..283088f3215e6 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -521,7 +521,7 @@ EXPORT_SYMBOL_GPL(svc_shutdown_net);
/* * Destroy an RPC service. Should be called with appropriate locking to - * protect the sv_nrthreads, sv_permsocks and sv_tempsocks. + * protect sv_permsocks and sv_tempsocks. */ void svc_destroy(struct kref *ref) @@ -637,7 +637,10 @@ svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node) return ERR_PTR(-ENOMEM);
svc_get(serv); - serv->sv_nrthreads++; + spin_lock_bh(&serv->sv_lock); + serv->sv_nrthreads += 1; + spin_unlock_bh(&serv->sv_lock); + spin_lock_bh(&pool->sp_lock); pool->sp_nrthreads++; list_add_rcu(&rqstp->rq_all, &pool->sp_all_threads); @@ -881,7 +884,9 @@ svc_exit_thread(struct svc_rqst *rqstp) list_del_rcu(&rqstp->rq_all); spin_unlock_bh(&pool->sp_lock);
+ spin_lock_bh(&serv->sv_lock); serv->sv_nrthreads -= 1; + spin_unlock_bh(&serv->sv_lock); svc_sock_update_bufs(serv);
svc_rqst_free(rqstp);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit 9d3792aefdcda71d20c2b1ecc589c17ae71eb523 ]
There is nothing happening in the start of nfsd() that requires protection by the mutex, so don't take it until shutting down the thread - which does still require protection - but only for nfsd_put().
Signed-off-by: NeilBrown neilb@suse.de [ cel: address merge conflict with fd2468fa1301 ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfssvc.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 16884a90e1ab0..eb8cc4d914fee 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -932,9 +932,6 @@ nfsd(void *vrqstp) struct nfsd_net *nn = net_generic(net, nfsd_net_id); int err;
- /* Lock module and set up kernel thread */ - mutex_lock(&nfsd_mutex); - /* At this point, the thread shares current->fs * with the init process. We need to create files with the * umask as defined by the client instead of init's umask. */ @@ -954,7 +951,6 @@ nfsd(void *vrqstp) allow_signal(SIGINT); allow_signal(SIGQUIT);
- mutex_unlock(&nfsd_mutex); atomic_inc(&nfsdstats.th_cnt);
set_freezable(); @@ -983,7 +979,6 @@ nfsd(void *vrqstp) flush_signals(current);
atomic_dec(&nfsdstats.th_cnt); - mutex_lock(&nfsd_mutex);
out: /* Take an extra ref so that the svc_put in svc_exit_thread() @@ -995,10 +990,11 @@ nfsd(void *vrqstp) svc_exit_thread(rqstp);
/* Now if needed we call svc_destroy in appropriate context */ + mutex_lock(&nfsd_mutex); nfsd_put(net); + mutex_unlock(&nfsd_mutex);
/* Release module */ - mutex_unlock(&nfsd_mutex); module_put_and_kthread_exit(0); return 0; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit 3409e4f1e8f239f0ed81be0b068ecf4e73e2e826 ]
nfsd cannot currently use svc_set_num_threads_sync. It instead uses svc_set_num_threads which does *not* wait for threads to all exit, and has a separate mechanism (nfsd_shutdown_complete) to wait for completion.
The reason that nfsd is unlike other services is that nfsd threads can exit separately from svc_set_num_threads being called - they die on receipt of SIGKILL. Also, when the last thread exits, the service must be shut down (sockets closed).
For this, the nfsd_mutex needs to be taken, and as that mutex needs to be held while svc_set_num_threads is called, the one cannot wait for the other.
This patch changes the nfsd thread so that it can drop the ref on the service without blocking on nfsd_mutex, so that svc_set_num_threads_sync can be used: - if it can drop a non-last reference, it does that. This does not trigger shutdown and does not require a mutex. This will likely happen for all but the last thread signalled, and for all threads being shut down by nfsd_shutdown_threads() - if it can get the mutex without blocking (trylock), it does that and then drops the reference. This will likely happen for the last thread killed by SIGKILL - Otherwise there might be an unrelated task holding the mutex, possibly in another network namespace, or nfsd_shutdown_threads() might be just about to get a reference on the service, after which we can drop ours safely. We cannot conveniently get wakeup notifications on these events, and we are unlikely to need to, so we sleep briefly and check again.
With this we can discard nfsd_shutdown_complete and nfsd_complete_shutdown(), and switch to svc_set_num_threads_sync.
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/netns.h | 3 --- fs/nfsd/nfssvc.c | 41 +++++++++++++++++++------------------- include/linux/sunrpc/svc.h | 13 ++++++++++++ 3 files changed, 33 insertions(+), 24 deletions(-)
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h index 08bcd8f23b013..1fd59eb0730bb 100644 --- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h @@ -134,9 +134,6 @@ struct nfsd_net { wait_queue_head_t ntf_wq; atomic_t ntf_refcnt;
- /* Allow umount to wait for nfsd state cleanup */ - struct completion nfsd_shutdown_complete; - /* * clientid and stateid data for construction of net unique COPY * stateids. diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index eb8cc4d914fee..6b10415e4006b 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -593,20 +593,10 @@ static const struct svc_serv_ops nfsd_thread_sv_ops = { .svo_shutdown = nfsd_last_thread, .svo_function = nfsd, .svo_enqueue_xprt = svc_xprt_do_enqueue, - .svo_setup = svc_set_num_threads, + .svo_setup = svc_set_num_threads_sync, .svo_module = THIS_MODULE, };
-static void nfsd_complete_shutdown(struct net *net) -{ - struct nfsd_net *nn = net_generic(net, nfsd_net_id); - - WARN_ON(!mutex_is_locked(&nfsd_mutex)); - - nn->nfsd_serv = NULL; - complete(&nn->nfsd_shutdown_complete); -} - void nfsd_shutdown_threads(struct net *net) { struct nfsd_net *nn = net_generic(net, nfsd_net_id); @@ -624,8 +614,6 @@ void nfsd_shutdown_threads(struct net *net) serv->sv_ops->svo_setup(serv, NULL, 0); nfsd_put(net); mutex_unlock(&nfsd_mutex); - /* Wait for shutdown of nfsd_serv to complete */ - wait_for_completion(&nn->nfsd_shutdown_complete); }
bool i_am_nfsd(void) @@ -650,7 +638,6 @@ int nfsd_create_serv(struct net *net) &nfsd_thread_sv_ops); if (nn->nfsd_serv == NULL) return -ENOMEM; - init_completion(&nn->nfsd_shutdown_complete);
nn->nfsd_serv->sv_maxconn = nn->max_connections; error = svc_bind(nn->nfsd_serv, net); @@ -659,7 +646,7 @@ int nfsd_create_serv(struct net *net) * been set up yet. */ svc_put(nn->nfsd_serv); - nfsd_complete_shutdown(net); + nn->nfsd_serv = NULL; return error; }
@@ -715,7 +702,7 @@ void nfsd_put(struct net *net) if (kref_put(&nn->nfsd_serv->sv_refcnt, nfsd_noop)) { svc_shutdown_net(nn->nfsd_serv, net); svc_destroy(&nn->nfsd_serv->sv_refcnt); - nfsd_complete_shutdown(net); + nn->nfsd_serv = NULL; } }
@@ -743,7 +730,7 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct net *net) if (tot > NFSD_MAXSERVS) { /* total too large: scale down requested numbers */ for (i = 0; i < n && tot > 0; i++) { - int new = nthreads[i] * NFSD_MAXSERVS / tot; + int new = nthreads[i] * NFSD_MAXSERVS / tot; tot -= (nthreads[i] - new); nthreads[i] = new; } @@ -989,10 +976,22 @@ nfsd(void *vrqstp) /* Release the thread */ svc_exit_thread(rqstp);
- /* Now if needed we call svc_destroy in appropriate context */ - mutex_lock(&nfsd_mutex); - nfsd_put(net); - mutex_unlock(&nfsd_mutex); + /* We need to drop a ref, but may not drop the last reference + * without holding nfsd_mutex, and we cannot wait for nfsd_mutex as that + * could deadlock with nfsd_shutdown_threads() waiting for us. + * So three options are: + * - drop a non-final reference, + * - get the mutex without waiting + * - sleep briefly andd try the above again + */ + while (!svc_put_not_last(nn->nfsd_serv)) { + if (mutex_trylock(&nfsd_mutex)) { + nfsd_put(net); + mutex_unlock(&nfsd_mutex); + break; + } + msleep(20); + }
/* Release module */ module_put_and_kthread_exit(0); diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 6829c4ee36544..2768d61b8aed5 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -141,6 +141,19 @@ static inline void svc_put(struct svc_serv *serv) kref_put(&serv->sv_refcnt, svc_destroy); }
+/** + * svc_put_not_last - decrement non-final reference count on SUNRPC serv + * @serv: the svc_serv to have count decremented + * + * Returns: %true is refcount was decremented. + * + * If the refcount is 1, it is not decremented and instead failure is reported. + */ +static inline bool svc_put_not_last(struct svc_serv *serv) +{ + return refcount_dec_not_one(&serv->sv_refcnt.refcount); +} + /* * Maximum payload size supported by a kernel RPC server. * This is use to determine the max number of pages nfsd is
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit 3ebdbe5203a874614819700d3f470724cb803709 ]
The ->svo_setup callback serves no purpose. It is always called from within the same module that chooses which callback is needed. So discard it and call the relevant function directly.
Now that svc_set_num_threads() is no longer used remove it and rename svc_set_num_threads_sync() to remove the "_sync" suffix.
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/callback.c | 8 +++---- fs/nfsd/nfssvc.c | 11 ++++----- include/linux/sunrpc/svc.h | 4 ---- net/sunrpc/svc.c | 49 ++------------------------------------ 4 files changed, 10 insertions(+), 62 deletions(-)
diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c index 09ec60b99f65e..422055a1092f0 100644 --- a/fs/nfs/callback.c +++ b/fs/nfs/callback.c @@ -172,9 +172,9 @@ static int nfs_callback_start_svc(int minorversion, struct rpc_xprt *xprt, if (serv->sv_nrthreads == nrservs) return 0;
- ret = serv->sv_ops->svo_setup(serv, NULL, nrservs); + ret = svc_set_num_threads(serv, NULL, nrservs); if (ret) { - serv->sv_ops->svo_setup(serv, NULL, 0); + svc_set_num_threads(serv, NULL, 0); return ret; } dprintk("nfs_callback_up: service started\n"); @@ -235,14 +235,12 @@ static int nfs_callback_up_net(int minorversion, struct svc_serv *serv, static const struct svc_serv_ops nfs40_cb_sv_ops = { .svo_function = nfs4_callback_svc, .svo_enqueue_xprt = svc_xprt_do_enqueue, - .svo_setup = svc_set_num_threads_sync, .svo_module = THIS_MODULE, }; #if defined(CONFIG_NFS_V4_1) static const struct svc_serv_ops nfs41_cb_sv_ops = { .svo_function = nfs41_callback_svc, .svo_enqueue_xprt = svc_xprt_do_enqueue, - .svo_setup = svc_set_num_threads_sync, .svo_module = THIS_MODULE, };
@@ -357,7 +355,7 @@ void nfs_callback_down(int minorversion, struct net *net) cb_info->users--; if (cb_info->users == 0) { svc_get(serv); - serv->sv_ops->svo_setup(serv, NULL, 0); + svc_set_num_threads(serv, NULL, 0); svc_put(serv); dprintk("nfs_callback_down: service destroyed\n"); cb_info->serv = NULL; diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 6b10415e4006b..8d49dfbe03f85 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -593,7 +593,6 @@ static const struct svc_serv_ops nfsd_thread_sv_ops = { .svo_shutdown = nfsd_last_thread, .svo_function = nfsd, .svo_enqueue_xprt = svc_xprt_do_enqueue, - .svo_setup = svc_set_num_threads_sync, .svo_module = THIS_MODULE, };
@@ -611,7 +610,7 @@ void nfsd_shutdown_threads(struct net *net)
svc_get(serv); /* Kill outstanding nfsd threads */ - serv->sv_ops->svo_setup(serv, NULL, 0); + svc_set_num_threads(serv, NULL, 0); nfsd_put(net); mutex_unlock(&nfsd_mutex); } @@ -750,8 +749,9 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct net *net) /* apply the new numbers */ svc_get(nn->nfsd_serv); for (i = 0; i < n; i++) { - err = nn->nfsd_serv->sv_ops->svo_setup(nn->nfsd_serv, - &nn->nfsd_serv->sv_pools[i], nthreads[i]); + err = svc_set_num_threads(nn->nfsd_serv, + &nn->nfsd_serv->sv_pools[i], + nthreads[i]); if (err) break; } @@ -793,8 +793,7 @@ nfsd_svc(int nrservs, struct net *net, const struct cred *cred) error = nfsd_startup_net(net, cred); if (error) goto out_put; - error = nn->nfsd_serv->sv_ops->svo_setup(nn->nfsd_serv, - NULL, nrservs); + error = svc_set_num_threads(nn->nfsd_serv, NULL, nrservs); if (error) goto out_shutdown; error = nn->nfsd_serv->sv_nrthreads; diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 2768d61b8aed5..165719a6229ab 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -64,9 +64,6 @@ struct svc_serv_ops { /* queue up a transport for servicing */ void (*svo_enqueue_xprt)(struct svc_xprt *);
- /* set up thread (or whatever) execution context */ - int (*svo_setup)(struct svc_serv *, struct svc_pool *, int); - /* optional module to count when adding threads (pooled svcs only) */ struct module *svo_module; }; @@ -544,7 +541,6 @@ void svc_pool_map_put(void); struct svc_serv * svc_create_pooled(struct svc_program *, unsigned int, const struct svc_serv_ops *); int svc_set_num_threads(struct svc_serv *, struct svc_pool *, int); -int svc_set_num_threads_sync(struct svc_serv *, struct svc_pool *, int); int svc_pool_stats_open(struct svc_serv *serv, struct file *file); void svc_shutdown_net(struct svc_serv *, struct net *); int svc_process(struct svc_rqst *); diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 283088f3215e6..da5f008b8d27c 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -741,58 +741,13 @@ svc_start_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs) return 0; }
- -/* destroy old threads */ -static int -svc_signal_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs) -{ - struct task_struct *task; - unsigned int state = serv->sv_nrthreads-1; - - /* destroy old threads */ - do { - task = choose_victim(serv, pool, &state); - if (task == NULL) - break; - send_sig(SIGINT, task, 1); - nrservs++; - } while (nrservs < 0); - - return 0; -} - /* * Create or destroy enough new threads to make the number * of threads the given number. If `pool' is non-NULL, applies * only to threads in that pool, otherwise round-robins between * all pools. Caller must ensure that mutual exclusion between this and * server startup or shutdown. - * - * Destroying threads relies on the service threads filling in - * rqstp->rq_task, which only the nfs ones do. Assumes the serv - * has been created using svc_create_pooled(). - * - * Based on code that used to be in nfsd_svc() but tweaked - * to be pool-aware. */ -int -svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs) -{ - if (pool == NULL) { - nrservs -= serv->sv_nrthreads; - } else { - spin_lock_bh(&pool->sp_lock); - nrservs -= pool->sp_nrthreads; - spin_unlock_bh(&pool->sp_lock); - } - - if (nrservs > 0) - return svc_start_kthreads(serv, pool, nrservs); - if (nrservs < 0) - return svc_signal_kthreads(serv, pool, nrservs); - return 0; -} -EXPORT_SYMBOL_GPL(svc_set_num_threads);
/* destroy old threads */ static int @@ -817,7 +772,7 @@ svc_stop_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs) }
int -svc_set_num_threads_sync(struct svc_serv *serv, struct svc_pool *pool, int nrservs) +svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs) { if (pool == NULL) { nrservs -= serv->sv_nrthreads; @@ -833,7 +788,7 @@ svc_set_num_threads_sync(struct svc_serv *serv, struct svc_pool *pool, int nrser return svc_stop_kthreads(serv, pool, nrservs); return 0; } -EXPORT_SYMBOL_GPL(svc_set_num_threads_sync); +EXPORT_SYMBOL_GPL(svc_set_num_threads);
/** * svc_rqst_replace_page - Replace one page in rq_pages[]
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit d057cfec4940ce6eeffa22b4a71dec203b06cd55 ]
nfsd currently maintains an open-coded read/write semaphore (refcount and wait queue) for each network namespace to ensure the nfs service isn't shut down while the notifier is running.
This is excessive. As there is unlikely to be contention between notifiers and they run without sleeping, a single spinlock is sufficient to avoid problems.
Signed-off-by: NeilBrown neilb@suse.de [ cel: ensure nfsd_notifier_lock is static ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/netns.h | 3 --- fs/nfsd/nfsctl.c | 2 -- fs/nfsd/nfssvc.c | 38 ++++++++++++++++++++------------------ 3 files changed, 20 insertions(+), 23 deletions(-)
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h index 1fd59eb0730bb..021acdc0d03bb 100644 --- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h @@ -131,9 +131,6 @@ struct nfsd_net { */ int keep_active;
- wait_queue_head_t ntf_wq; - atomic_t ntf_refcnt; - /* * clientid and stateid data for construction of net unique COPY * stateids. diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 53076c5afe62c..504b169d27881 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -1484,8 +1484,6 @@ static __net_init int nfsd_init_net(struct net *net) nn->clientid_counter = nn->clientid_base + 1; nn->s2s_cp_cl_id = nn->clientid_counter++;
- atomic_set(&nn->ntf_refcnt, 0); - init_waitqueue_head(&nn->ntf_wq); seqlock_init(&nn->boot_lock);
return 0; diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 8d49dfbe03f85..8554bc7ff4322 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -434,6 +434,7 @@ static void nfsd_shutdown_net(struct net *net) nfsd_shutdown_generic(); }
+static DEFINE_SPINLOCK(nfsd_notifier_lock); static int nfsd_inetaddr_event(struct notifier_block *this, unsigned long event, void *ptr) { @@ -443,18 +444,17 @@ static int nfsd_inetaddr_event(struct notifier_block *this, unsigned long event, struct nfsd_net *nn = net_generic(net, nfsd_net_id); struct sockaddr_in sin;
- if ((event != NETDEV_DOWN) || - !atomic_inc_not_zero(&nn->ntf_refcnt)) + if (event != NETDEV_DOWN || !nn->nfsd_serv) goto out;
+ spin_lock(&nfsd_notifier_lock); if (nn->nfsd_serv) { dprintk("nfsd_inetaddr_event: removed %pI4\n", &ifa->ifa_local); sin.sin_family = AF_INET; sin.sin_addr.s_addr = ifa->ifa_local; svc_age_temp_xprts_now(nn->nfsd_serv, (struct sockaddr *)&sin); } - atomic_dec(&nn->ntf_refcnt); - wake_up(&nn->ntf_wq); + spin_unlock(&nfsd_notifier_lock);
out: return NOTIFY_DONE; @@ -474,10 +474,10 @@ static int nfsd_inet6addr_event(struct notifier_block *this, struct nfsd_net *nn = net_generic(net, nfsd_net_id); struct sockaddr_in6 sin6;
- if ((event != NETDEV_DOWN) || - !atomic_inc_not_zero(&nn->ntf_refcnt)) + if (event != NETDEV_DOWN || !nn->nfsd_serv) goto out;
+ spin_lock(&nfsd_notifier_lock); if (nn->nfsd_serv) { dprintk("nfsd_inet6addr_event: removed %pI6\n", &ifa->addr); sin6.sin6_family = AF_INET6; @@ -486,8 +486,8 @@ static int nfsd_inet6addr_event(struct notifier_block *this, sin6.sin6_scope_id = ifa->idev->dev->ifindex; svc_age_temp_xprts_now(nn->nfsd_serv, (struct sockaddr *)&sin6); } - atomic_dec(&nn->ntf_refcnt); - wake_up(&nn->ntf_wq); + spin_unlock(&nfsd_notifier_lock); + out: return NOTIFY_DONE; } @@ -504,7 +504,6 @@ static void nfsd_last_thread(struct svc_serv *serv, struct net *net) { struct nfsd_net *nn = net_generic(net, nfsd_net_id);
- atomic_dec(&nn->ntf_refcnt); /* check if the notifier still has clients */ if (atomic_dec_return(&nfsd_notifier_refcount) == 0) { unregister_inetaddr_notifier(&nfsd_inetaddr_notifier); @@ -512,7 +511,6 @@ static void nfsd_last_thread(struct svc_serv *serv, struct net *net) unregister_inet6addr_notifier(&nfsd_inet6addr_notifier); #endif } - wait_event(nn->ntf_wq, atomic_read(&nn->ntf_refcnt) == 0);
/* * write_ports can create the server without actually starting @@ -624,6 +622,7 @@ int nfsd_create_serv(struct net *net) { int error; struct nfsd_net *nn = net_generic(net, nfsd_net_id); + struct svc_serv *serv;
WARN_ON(!mutex_is_locked(&nfsd_mutex)); if (nn->nfsd_serv) { @@ -633,21 +632,23 @@ int nfsd_create_serv(struct net *net) if (nfsd_max_blksize == 0) nfsd_max_blksize = nfsd_get_default_max_blksize(); nfsd_reset_versions(nn); - nn->nfsd_serv = svc_create_pooled(&nfsd_program, nfsd_max_blksize, - &nfsd_thread_sv_ops); - if (nn->nfsd_serv == NULL) + serv = svc_create_pooled(&nfsd_program, nfsd_max_blksize, + &nfsd_thread_sv_ops); + if (serv == NULL) return -ENOMEM;
- nn->nfsd_serv->sv_maxconn = nn->max_connections; - error = svc_bind(nn->nfsd_serv, net); + serv->sv_maxconn = nn->max_connections; + error = svc_bind(serv, net); if (error < 0) { /* NOT nfsd_put() as notifiers (see below) haven't * been set up yet. */ - svc_put(nn->nfsd_serv); - nn->nfsd_serv = NULL; + svc_put(serv); return error; } + spin_lock(&nfsd_notifier_lock); + nn->nfsd_serv = serv; + spin_unlock(&nfsd_notifier_lock);
set_max_drc(); /* check if the notifier is already set */ @@ -657,7 +658,6 @@ int nfsd_create_serv(struct net *net) register_inet6addr_notifier(&nfsd_inet6addr_notifier); #endif } - atomic_inc(&nn->ntf_refcnt); nfsd_reset_boot_verifier(nn); return 0; } @@ -701,7 +701,9 @@ void nfsd_put(struct net *net) if (kref_put(&nn->nfsd_serv->sv_refcnt, nfsd_noop)) { svc_shutdown_net(nn->nfsd_serv, net); svc_destroy(&nn->nfsd_serv->sv_refcnt); + spin_lock(&nfsd_notifier_lock); nn->nfsd_serv = NULL; + spin_unlock(&nfsd_notifier_lock); } }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit 2840fe864c91a0fe822169b1fbfddbcac9aeac43 ]
lockd has two globals - nlmsvc_task and nlmsvc_rqst - but mostly it wants the 'struct svc_serv', and when it doesn't want it exactly it can get to what it wants from the serv.
This patch is a first step to removing nlmsvc_task and nlmsvc_rqst. It introduces nlmsvc_serv to store the 'struct svc_serv*'. This is set as soon as the serv is created, and cleared only when it is destroyed.
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svc.c | 36 ++++++++++++++++++++---------------- 1 file changed, 20 insertions(+), 16 deletions(-)
diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index a9669b106dbde..83874878f41d8 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -54,6 +54,7 @@ EXPORT_SYMBOL_GPL(nlmsvc_ops);
static DEFINE_MUTEX(nlmsvc_mutex); static unsigned int nlmsvc_users; +static struct svc_serv *nlmsvc_serv; static struct task_struct *nlmsvc_task; static struct svc_rqst *nlmsvc_rqst; unsigned long nlmsvc_timeout; @@ -306,13 +307,12 @@ static int lockd_inetaddr_event(struct notifier_block *this, !atomic_inc_not_zero(&nlm_ntf_refcnt)) goto out;
- if (nlmsvc_rqst) { + if (nlmsvc_serv) { dprintk("lockd_inetaddr_event: removed %pI4\n", &ifa->ifa_local); sin.sin_family = AF_INET; sin.sin_addr.s_addr = ifa->ifa_local; - svc_age_temp_xprts_now(nlmsvc_rqst->rq_server, - (struct sockaddr *)&sin); + svc_age_temp_xprts_now(nlmsvc_serv, (struct sockaddr *)&sin); } atomic_dec(&nlm_ntf_refcnt); wake_up(&nlm_ntf_wq); @@ -336,14 +336,13 @@ static int lockd_inet6addr_event(struct notifier_block *this, !atomic_inc_not_zero(&nlm_ntf_refcnt)) goto out;
- if (nlmsvc_rqst) { + if (nlmsvc_serv) { dprintk("lockd_inet6addr_event: removed %pI6\n", &ifa->addr); sin6.sin6_family = AF_INET6; sin6.sin6_addr = ifa->addr; if (ipv6_addr_type(&sin6.sin6_addr) & IPV6_ADDR_LINKLOCAL) sin6.sin6_scope_id = ifa->idev->dev->ifindex; - svc_age_temp_xprts_now(nlmsvc_rqst->rq_server, - (struct sockaddr *)&sin6); + svc_age_temp_xprts_now(nlmsvc_serv, (struct sockaddr *)&sin6); } atomic_dec(&nlm_ntf_refcnt); wake_up(&nlm_ntf_wq); @@ -423,15 +422,17 @@ static const struct svc_serv_ops lockd_sv_ops = { .svo_enqueue_xprt = svc_xprt_do_enqueue, };
-static struct svc_serv *lockd_create_svc(void) +static int lockd_create_svc(void) { struct svc_serv *serv;
/* * Check whether we're already up and running. */ - if (nlmsvc_rqst) - return svc_get(nlmsvc_rqst->rq_server); + if (nlmsvc_serv) { + svc_get(nlmsvc_serv); + return 0; + }
/* * Sanity check: if there's no pid, @@ -448,14 +449,15 @@ static struct svc_serv *lockd_create_svc(void) serv = svc_create(&nlmsvc_program, LOCKD_BUFSIZE, &lockd_sv_ops); if (!serv) { printk(KERN_WARNING "lockd_up: create service failed\n"); - return ERR_PTR(-ENOMEM); + return -ENOMEM; } + nlmsvc_serv = serv; register_inetaddr_notifier(&lockd_inetaddr_notifier); #if IS_ENABLED(CONFIG_IPV6) register_inet6addr_notifier(&lockd_inet6addr_notifier); #endif dprintk("lockd_up: service created\n"); - return serv; + return 0; }
/* @@ -468,11 +470,10 @@ int lockd_up(struct net *net, const struct cred *cred)
mutex_lock(&nlmsvc_mutex);
- serv = lockd_create_svc(); - if (IS_ERR(serv)) { - error = PTR_ERR(serv); + error = lockd_create_svc(); + if (error) goto err_create; - } + serv = nlmsvc_serv;
error = lockd_up_net(serv, net, cred); if (error < 0) { @@ -487,6 +488,8 @@ int lockd_up(struct net *net, const struct cred *cred) } nlmsvc_users++; err_put: + if (nlmsvc_users == 0) + nlmsvc_serv = NULL; svc_put(serv); err_create: mutex_unlock(&nlmsvc_mutex); @@ -501,7 +504,7 @@ void lockd_down(struct net *net) { mutex_lock(&nlmsvc_mutex); - lockd_down_net(nlmsvc_rqst->rq_server, net); + lockd_down_net(nlmsvc_serv, net); if (nlmsvc_users) { if (--nlmsvc_users) goto out; @@ -519,6 +522,7 @@ lockd_down(struct net *net) dprintk("lockd_down: service stopped\n"); lockd_svc_exit_thread(); dprintk("lockd_down: service destroyed\n"); + nlmsvc_serv = NULL; nlmsvc_task = NULL; nlmsvc_rqst = NULL; out:
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit 5a8a7ff57421b7de3ae72019938ffb5daaee36e7 ]
Now that the network status notifiers use nlmsvc_serv rather then nlmsvc_rqst the management can be simplified.
Notifier unregistration synchronises with any pending notifications so providing we unregister before nlm_serv is freed no further interlock is required.
So we move the unregister call to just before the thread is killed (which destroys the service) and just before the service is destroyed in the failure-path of lockd_up().
Then nlm_ntf_refcnt and nlm_ntf_wq can be removed.
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svc.c | 35 +++++++++-------------------------- 1 file changed, 9 insertions(+), 26 deletions(-)
diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index 83874878f41d8..20cebb191350f 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -59,9 +59,6 @@ static struct task_struct *nlmsvc_task; static struct svc_rqst *nlmsvc_rqst; unsigned long nlmsvc_timeout;
-static atomic_t nlm_ntf_refcnt = ATOMIC_INIT(0); -static DECLARE_WAIT_QUEUE_HEAD(nlm_ntf_wq); - unsigned int lockd_net_id;
/* @@ -303,8 +300,7 @@ static int lockd_inetaddr_event(struct notifier_block *this, struct in_ifaddr *ifa = (struct in_ifaddr *)ptr; struct sockaddr_in sin;
- if ((event != NETDEV_DOWN) || - !atomic_inc_not_zero(&nlm_ntf_refcnt)) + if (event != NETDEV_DOWN) goto out;
if (nlmsvc_serv) { @@ -314,8 +310,6 @@ static int lockd_inetaddr_event(struct notifier_block *this, sin.sin_addr.s_addr = ifa->ifa_local; svc_age_temp_xprts_now(nlmsvc_serv, (struct sockaddr *)&sin); } - atomic_dec(&nlm_ntf_refcnt); - wake_up(&nlm_ntf_wq);
out: return NOTIFY_DONE; @@ -332,8 +326,7 @@ static int lockd_inet6addr_event(struct notifier_block *this, struct inet6_ifaddr *ifa = (struct inet6_ifaddr *)ptr; struct sockaddr_in6 sin6;
- if ((event != NETDEV_DOWN) || - !atomic_inc_not_zero(&nlm_ntf_refcnt)) + if (event != NETDEV_DOWN) goto out;
if (nlmsvc_serv) { @@ -344,8 +337,6 @@ static int lockd_inet6addr_event(struct notifier_block *this, sin6.sin6_scope_id = ifa->idev->dev->ifindex; svc_age_temp_xprts_now(nlmsvc_serv, (struct sockaddr *)&sin6); } - atomic_dec(&nlm_ntf_refcnt); - wake_up(&nlm_ntf_wq);
out: return NOTIFY_DONE; @@ -362,14 +353,6 @@ static void lockd_unregister_notifiers(void) #if IS_ENABLED(CONFIG_IPV6) unregister_inet6addr_notifier(&lockd_inet6addr_notifier); #endif - wait_event(nlm_ntf_wq, atomic_read(&nlm_ntf_refcnt) == 0); -} - -static void lockd_svc_exit_thread(void) -{ - atomic_dec(&nlm_ntf_refcnt); - lockd_unregister_notifiers(); - svc_exit_thread(nlmsvc_rqst); }
static int lockd_start_svc(struct svc_serv *serv) @@ -388,11 +371,9 @@ static int lockd_start_svc(struct svc_serv *serv) printk(KERN_WARNING "lockd_up: svc_rqst allocation failed, error=%d\n", error); - lockd_unregister_notifiers(); goto out_rqst; }
- atomic_inc(&nlm_ntf_refcnt); svc_sock_update_bufs(serv); serv->sv_maxconn = nlm_max_connections;
@@ -410,7 +391,7 @@ static int lockd_start_svc(struct svc_serv *serv) return 0;
out_task: - lockd_svc_exit_thread(); + svc_exit_thread(nlmsvc_rqst); nlmsvc_task = NULL; out_rqst: nlmsvc_rqst = NULL; @@ -477,7 +458,6 @@ int lockd_up(struct net *net, const struct cred *cred)
error = lockd_up_net(serv, net, cred); if (error < 0) { - lockd_unregister_notifiers(); goto err_put; }
@@ -488,8 +468,10 @@ int lockd_up(struct net *net, const struct cred *cred) } nlmsvc_users++; err_put: - if (nlmsvc_users == 0) + if (nlmsvc_users == 0) { + lockd_unregister_notifiers(); nlmsvc_serv = NULL; + } svc_put(serv); err_create: mutex_unlock(&nlmsvc_mutex); @@ -518,13 +500,14 @@ lockd_down(struct net *net) printk(KERN_ERR "lockd_down: no lockd running.\n"); BUG(); } + lockd_unregister_notifiers(); kthread_stop(nlmsvc_task); dprintk("lockd_down: service stopped\n"); - lockd_svc_exit_thread(); + svc_exit_thread(nlmsvc_rqst); + nlmsvc_rqst = NULL; dprintk("lockd_down: service destroyed\n"); nlmsvc_serv = NULL; nlmsvc_task = NULL; - nlmsvc_rqst = NULL; out: mutex_unlock(&nlmsvc_mutex); }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit b73a2972041bee70eb0cbbb25fa77828c63c916b ]
lockd_start_svc() only needs to be called once, just after the svc is created. If the start fails, the svc is discarded too.
It thus makes sense to call lockd_start_svc() from lockd_create_svc(). This allows us to remove the test against nlmsvc_rqst at the start of lockd_start_svc() - it must always be NULL.
lockd_up() only held an extra reference on the svc until a thread was created - then it dropped it. The thread - and thus the extra reference - will remain until kthread_stop() is called. Now that the thread is created in lockd_create_svc(), the extra reference can be dropped there. So the 'serv' variable is no longer needed in lockd_up().
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svc.c | 22 ++++++++++------------ 1 file changed, 10 insertions(+), 12 deletions(-)
diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index 20cebb191350f..91e7c839841ec 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -359,9 +359,6 @@ static int lockd_start_svc(struct svc_serv *serv) { int error;
- if (nlmsvc_rqst) - return 0; - /* * Create the kernel thread and wait for it to start. */ @@ -406,6 +403,7 @@ static const struct svc_serv_ops lockd_sv_ops = { static int lockd_create_svc(void) { struct svc_serv *serv; + int error;
/* * Check whether we're already up and running. @@ -432,6 +430,13 @@ static int lockd_create_svc(void) printk(KERN_WARNING "lockd_up: create service failed\n"); return -ENOMEM; } + + error = lockd_start_svc(serv); + /* The thread now holds the only reference */ + svc_put(serv); + if (error < 0) + return error; + nlmsvc_serv = serv; register_inetaddr_notifier(&lockd_inetaddr_notifier); #if IS_ENABLED(CONFIG_IPV6) @@ -446,7 +451,6 @@ static int lockd_create_svc(void) */ int lockd_up(struct net *net, const struct cred *cred) { - struct svc_serv *serv; int error;
mutex_lock(&nlmsvc_mutex); @@ -454,25 +458,19 @@ int lockd_up(struct net *net, const struct cred *cred) error = lockd_create_svc(); if (error) goto err_create; - serv = nlmsvc_serv;
- error = lockd_up_net(serv, net, cred); + error = lockd_up_net(nlmsvc_serv, net, cred); if (error < 0) { goto err_put; }
- error = lockd_start_svc(serv); - if (error < 0) { - lockd_down_net(serv, net); - goto err_put; - } nlmsvc_users++; err_put: if (nlmsvc_users == 0) { lockd_unregister_notifiers(); + kthread_stop(nlmsvc_task); nlmsvc_serv = NULL; } - svc_put(serv); err_create: mutex_unlock(&nlmsvc_mutex); return error;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit 6a4e2527a63620a820c4ebf3596b57176da26fb3 ]
The normal place to call svc_exit_thread() is from the thread itself just before it exists. Do this for lockd.
This means that nlmsvc_rqst is not used out side of lockd_start_svc(), so it can be made local to that function, and renamed to 'rqst'.
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svc.c | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-)
diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index 91e7c839841ec..9aa499a761591 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -56,7 +56,6 @@ static DEFINE_MUTEX(nlmsvc_mutex); static unsigned int nlmsvc_users; static struct svc_serv *nlmsvc_serv; static struct task_struct *nlmsvc_task; -static struct svc_rqst *nlmsvc_rqst; unsigned long nlmsvc_timeout;
unsigned int lockd_net_id; @@ -182,6 +181,11 @@ lockd(void *vrqstp) nlm_shutdown_hosts(); cancel_delayed_work_sync(&ln->grace_period_end); locks_end_grace(&ln->lockd_manager); + + dprintk("lockd_down: service stopped\n"); + + svc_exit_thread(rqstp); + return 0; }
@@ -358,13 +362,14 @@ static void lockd_unregister_notifiers(void) static int lockd_start_svc(struct svc_serv *serv) { int error; + struct svc_rqst *rqst;
/* * Create the kernel thread and wait for it to start. */ - nlmsvc_rqst = svc_prepare_thread(serv, &serv->sv_pools[0], NUMA_NO_NODE); - if (IS_ERR(nlmsvc_rqst)) { - error = PTR_ERR(nlmsvc_rqst); + rqst = svc_prepare_thread(serv, &serv->sv_pools[0], NUMA_NO_NODE); + if (IS_ERR(rqst)) { + error = PTR_ERR(rqst); printk(KERN_WARNING "lockd_up: svc_rqst allocation failed, error=%d\n", error); @@ -374,24 +379,23 @@ static int lockd_start_svc(struct svc_serv *serv) svc_sock_update_bufs(serv); serv->sv_maxconn = nlm_max_connections;
- nlmsvc_task = kthread_create(lockd, nlmsvc_rqst, "%s", serv->sv_name); + nlmsvc_task = kthread_create(lockd, rqst, "%s", serv->sv_name); if (IS_ERR(nlmsvc_task)) { error = PTR_ERR(nlmsvc_task); printk(KERN_WARNING "lockd_up: kthread_run failed, error=%d\n", error); goto out_task; } - nlmsvc_rqst->rq_task = nlmsvc_task; + rqst->rq_task = nlmsvc_task; wake_up_process(nlmsvc_task);
dprintk("lockd_up: service started\n"); return 0;
out_task: - svc_exit_thread(nlmsvc_rqst); + svc_exit_thread(rqst); nlmsvc_task = NULL; out_rqst: - nlmsvc_rqst = NULL; return error; }
@@ -500,9 +504,6 @@ lockd_down(struct net *net) } lockd_unregister_notifiers(); kthread_stop(nlmsvc_task); - dprintk("lockd_down: service stopped\n"); - svc_exit_thread(nlmsvc_rqst); - nlmsvc_rqst = NULL; dprintk("lockd_down: service destroyed\n"); nlmsvc_serv = NULL; nlmsvc_task = NULL;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit 865b674069e05e5779fcf8cf7a166d2acb7e930b ]
There is some cleanup that is duplicated in lockd_down() and the failure path of lockd_up(). Factor these out into a new lockd_put() and call it from both places.
lockd_put() does *not* take the mutex - that must be held by the caller. It decrements nlmsvc_users and if that reaches zero, it cleans up.
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svc.c | 64 +++++++++++++++++++++----------------------------- 1 file changed, 27 insertions(+), 37 deletions(-)
diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index 9aa499a761591..7f12c280fd30d 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -351,14 +351,6 @@ static struct notifier_block lockd_inet6addr_notifier = { }; #endif
-static void lockd_unregister_notifiers(void) -{ - unregister_inetaddr_notifier(&lockd_inetaddr_notifier); -#if IS_ENABLED(CONFIG_IPV6) - unregister_inet6addr_notifier(&lockd_inet6addr_notifier); -#endif -} - static int lockd_start_svc(struct svc_serv *serv) { int error; @@ -450,6 +442,27 @@ static int lockd_create_svc(void) return 0; }
+static void lockd_put(void) +{ + if (WARN(nlmsvc_users <= 0, "lockd_down: no users!\n")) + return; + if (--nlmsvc_users) + return; + + unregister_inetaddr_notifier(&lockd_inetaddr_notifier); +#if IS_ENABLED(CONFIG_IPV6) + unregister_inet6addr_notifier(&lockd_inet6addr_notifier); +#endif + + if (nlmsvc_task) { + kthread_stop(nlmsvc_task); + dprintk("lockd_down: service stopped\n"); + nlmsvc_task = NULL; + } + nlmsvc_serv = NULL; + dprintk("lockd_down: service destroyed\n"); +} + /* * Bring up the lockd process if it's not already up. */ @@ -461,21 +474,16 @@ int lockd_up(struct net *net, const struct cred *cred)
error = lockd_create_svc(); if (error) - goto err_create; + goto err; + nlmsvc_users++;
error = lockd_up_net(nlmsvc_serv, net, cred); if (error < 0) { - goto err_put; + lockd_put(); + goto err; }
- nlmsvc_users++; -err_put: - if (nlmsvc_users == 0) { - lockd_unregister_notifiers(); - kthread_stop(nlmsvc_task); - nlmsvc_serv = NULL; - } -err_create: +err: mutex_unlock(&nlmsvc_mutex); return error; } @@ -489,25 +497,7 @@ lockd_down(struct net *net) { mutex_lock(&nlmsvc_mutex); lockd_down_net(nlmsvc_serv, net); - if (nlmsvc_users) { - if (--nlmsvc_users) - goto out; - } else { - printk(KERN_ERR "lockd_down: no users! task=%p\n", - nlmsvc_task); - BUG(); - } - - if (!nlmsvc_task) { - printk(KERN_ERR "lockd_down: no lockd running.\n"); - BUG(); - } - lockd_unregister_notifiers(); - kthread_stop(nlmsvc_task); - dprintk("lockd_down: service destroyed\n"); - nlmsvc_serv = NULL; - nlmsvc_task = NULL; -out: + lockd_put(); mutex_unlock(&nlmsvc_mutex); } EXPORT_SYMBOL_GPL(lockd_down);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit ecd3ad68d2c6d3ae178a63a2d9a02c392904fd36 ]
lockd_create_svc() already does an svc_get() if the service already exists, so it is more like a "get" than a "create".
So: - Move the increment of nlmsvc_users into the function as well - rename to lockd_get().
It is now the inverse of lockd_put().
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svc.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index 7f12c280fd30d..1a7c11118b320 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -396,16 +396,14 @@ static const struct svc_serv_ops lockd_sv_ops = { .svo_enqueue_xprt = svc_xprt_do_enqueue, };
-static int lockd_create_svc(void) +static int lockd_get(void) { struct svc_serv *serv; int error;
- /* - * Check whether we're already up and running. - */ if (nlmsvc_serv) { svc_get(nlmsvc_serv); + nlmsvc_users++; return 0; }
@@ -439,6 +437,7 @@ static int lockd_create_svc(void) register_inet6addr_notifier(&lockd_inet6addr_notifier); #endif dprintk("lockd_up: service created\n"); + nlmsvc_users++; return 0; }
@@ -472,10 +471,9 @@ int lockd_up(struct net *net, const struct cred *cred)
mutex_lock(&nlmsvc_mutex);
- error = lockd_create_svc(); + error = lockd_get(); if (error) goto err; - nlmsvc_users++;
error = lockd_up_net(nlmsvc_serv, net, cred); if (error < 0) {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit cf0e124e0a489944d08fcc3c694d2b234d2cc658 ]
These definitions are not used outside of svc.c, and there is no evidence that they ever have been. So move them into svc.c and make the declarations 'static'.
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/sunrpc/svc.h | 25 ------------------------- net/sunrpc/svc.c | 31 +++++++++++++++++++++++++------ 2 files changed, 25 insertions(+), 31 deletions(-)
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 165719a6229ab..89e9d00af601b 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -497,29 +497,6 @@ struct svc_procedure { const char * pc_name; /* for display */ };
-/* - * Mode for mapping cpus to pools. - */ -enum { - SVC_POOL_AUTO = -1, /* choose one of the others */ - SVC_POOL_GLOBAL, /* no mapping, just a single global pool - * (legacy & UP mode) */ - SVC_POOL_PERCPU, /* one pool per cpu */ - SVC_POOL_PERNODE /* one pool per numa node */ -}; - -struct svc_pool_map { - int count; /* How many svc_servs use us */ - int mode; /* Note: int not enum to avoid - * warnings about "enumeration value - * not handled in switch" */ - unsigned int npools; - unsigned int *pool_to; /* maps pool id to cpu or node */ - unsigned int *to_pool; /* maps cpu or node to pool id */ -}; - -extern struct svc_pool_map svc_pool_map; - /* * Function prototypes. */ @@ -536,8 +513,6 @@ void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page); void svc_rqst_free(struct svc_rqst *); void svc_exit_thread(struct svc_rqst *); -unsigned int svc_pool_map_get(void); -void svc_pool_map_put(void); struct svc_serv * svc_create_pooled(struct svc_program *, unsigned int, const struct svc_serv_ops *); int svc_set_num_threads(struct svc_serv *, struct svc_pool *, int); diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index da5f008b8d27c..c681ac1c9d569 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -39,14 +39,35 @@ static void svc_unregister(const struct svc_serv *serv, struct net *net);
#define SVC_POOL_DEFAULT SVC_POOL_GLOBAL
+/* + * Mode for mapping cpus to pools. + */ +enum { + SVC_POOL_AUTO = -1, /* choose one of the others */ + SVC_POOL_GLOBAL, /* no mapping, just a single global pool + * (legacy & UP mode) */ + SVC_POOL_PERCPU, /* one pool per cpu */ + SVC_POOL_PERNODE /* one pool per numa node */ +}; + /* * Structure for mapping cpus to pools and vice versa. * Setup once during sunrpc initialisation. */ -struct svc_pool_map svc_pool_map = { + +struct svc_pool_map { + int count; /* How many svc_servs use us */ + int mode; /* Note: int not enum to avoid + * warnings about "enumeration value + * not handled in switch" */ + unsigned int npools; + unsigned int *pool_to; /* maps pool id to cpu or node */ + unsigned int *to_pool; /* maps cpu or node to pool id */ +}; + +static struct svc_pool_map svc_pool_map = { .mode = SVC_POOL_DEFAULT }; -EXPORT_SYMBOL_GPL(svc_pool_map);
static DEFINE_MUTEX(svc_pool_map_mutex);/* protects svc_pool_map.count only */
@@ -220,7 +241,7 @@ svc_pool_map_init_pernode(struct svc_pool_map *m) * vice versa). Initialise the map if we're the first user. * Returns the number of pools. */ -unsigned int +static unsigned int svc_pool_map_get(void) { struct svc_pool_map *m = &svc_pool_map; @@ -255,7 +276,6 @@ svc_pool_map_get(void) mutex_unlock(&svc_pool_map_mutex); return m->npools; } -EXPORT_SYMBOL_GPL(svc_pool_map_get);
/* * Drop a reference to the global map of cpus to pools. @@ -264,7 +284,7 @@ EXPORT_SYMBOL_GPL(svc_pool_map_get); * mode using the pool_mode module option without * rebooting or re-loading sunrpc.ko. */ -void +static void svc_pool_map_put(void) { struct svc_pool_map *m = &svc_pool_map; @@ -281,7 +301,6 @@ svc_pool_map_put(void)
mutex_unlock(&svc_pool_map_mutex); } -EXPORT_SYMBOL_GPL(svc_pool_map_put);
static int svc_pool_map_get_node(unsigned int pidx) {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit 93aa619eb0b42eec2f3a9b4d9db41f5095390aec ]
Currently 'pooled' services hold a reference on the pool_map, and 'unpooled' services do not. svc_destroy() uses the presence of ->svo_function (via svc_serv_is_pooled()) to determine if the reference should be dropped. There is no direct correlation between being pooled and the use of svo_function, though in practice, lockd is the only non-pooled service, and the only one not to use svo_function.
This is untidy and would cause problems if we changed lockd to use svc_set_num_threads(), which requires the use of ->svo_function.
So change the test for "is the service pooled" to "is sv_nrpools > 1".
This means that when svc_pool_map_get() returns 1, it must NOT take a reference to the pool.
We discard svc_serv_is_pooled(), and test sv_nrpools directly.
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/sunrpc/svc.c | 54 ++++++++++++++++++++++++++---------------------- 1 file changed, 29 insertions(+), 25 deletions(-)
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index c681ac1c9d569..ceccd4ae5f797 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -35,8 +35,6 @@
static void svc_unregister(const struct svc_serv *serv, struct net *net);
-#define svc_serv_is_pooled(serv) ((serv)->sv_ops->svo_function) - #define SVC_POOL_DEFAULT SVC_POOL_GLOBAL
/* @@ -238,8 +236,10 @@ svc_pool_map_init_pernode(struct svc_pool_map *m)
/* * Add a reference to the global map of cpus to pools (and - * vice versa). Initialise the map if we're the first user. - * Returns the number of pools. + * vice versa) if pools are in use. + * Initialise the map if we're the first user. + * Returns the number of pools. If this is '1', no reference + * was taken. */ static unsigned int svc_pool_map_get(void) @@ -251,6 +251,7 @@ svc_pool_map_get(void)
if (m->count++) { mutex_unlock(&svc_pool_map_mutex); + WARN_ON_ONCE(m->npools <= 1); return m->npools; }
@@ -266,29 +267,36 @@ svc_pool_map_get(void) break; }
- if (npools < 0) { + if (npools <= 0) { /* default, or memory allocation failure */ npools = 1; m->mode = SVC_POOL_GLOBAL; } m->npools = npools;
+ if (npools == 1) + /* service is unpooled, so doesn't hold a reference */ + m->count--; + mutex_unlock(&svc_pool_map_mutex); - return m->npools; + return npools; }
/* - * Drop a reference to the global map of cpus to pools. + * Drop a reference to the global map of cpus to pools, if + * pools were in use, i.e. if npools > 1. * When the last reference is dropped, the map data is * freed; this allows the sysadmin to change the pool * mode using the pool_mode module option without * rebooting or re-loading sunrpc.ko. */ static void -svc_pool_map_put(void) +svc_pool_map_put(int npools) { struct svc_pool_map *m = &svc_pool_map;
+ if (npools <= 1) + return; mutex_lock(&svc_pool_map_mutex);
if (!--m->count) { @@ -357,21 +365,18 @@ svc_pool_for_cpu(struct svc_serv *serv, int cpu) struct svc_pool_map *m = &svc_pool_map; unsigned int pidx = 0;
- /* - * An uninitialised map happens in a pure client when - * lockd is brought up, so silently treat it the - * same as SVC_POOL_GLOBAL. - */ - if (svc_serv_is_pooled(serv)) { - switch (m->mode) { - case SVC_POOL_PERCPU: - pidx = m->to_pool[cpu]; - break; - case SVC_POOL_PERNODE: - pidx = m->to_pool[cpu_to_node(cpu)]; - break; - } + if (serv->sv_nrpools <= 1) + return serv->sv_pools; + + switch (m->mode) { + case SVC_POOL_PERCPU: + pidx = m->to_pool[cpu]; + break; + case SVC_POOL_PERNODE: + pidx = m->to_pool[cpu_to_node(cpu)]; + break; } + return &serv->sv_pools[pidx % serv->sv_nrpools]; }
@@ -524,7 +529,7 @@ svc_create_pooled(struct svc_program *prog, unsigned int bufsize, goto out_err; return serv; out_err: - svc_pool_map_put(); + svc_pool_map_put(npools); return NULL; } EXPORT_SYMBOL_GPL(svc_create_pooled); @@ -559,8 +564,7 @@ svc_destroy(struct kref *ref)
cache_clean_deferred(serv);
- if (svc_serv_is_pooled(serv)) - svc_pool_map_put(); + svc_pool_map_put(serv->sv_nrpools);
kfree(serv->sv_pools); kfree(serv);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit 6b044fbaab02292fedb17565dbb3f2528083b169 ]
svc_set_num_threads() does everything that lockd_start_svc() does, except set sv_maxconn. It also (when passed 0) finds the threads and stops them with kthread_stop().
So move the setting for sv_maxconn, and use svc_set_num_thread()
We now don't need nlmsvc_task.
Now that we use svc_set_num_threads() it makes sense to set svo_module. This request that the thread exists with module_put_and_exit(). Also fix the documentation for svo_module to make this explicit.
svc_prepare_thread is now only used where it is defined, so it can be made static.
Signed-off-by: NeilBrown neilb@suse.de [ cel: address merge conflict with fd2468fa1301 ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svc.c | 58 ++++++-------------------------------- include/linux/sunrpc/svc.h | 6 ++-- net/sunrpc/svc.c | 3 +- 3 files changed, 12 insertions(+), 55 deletions(-)
diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index 1a7c11118b320..0475c5a5d061e 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -55,7 +55,6 @@ EXPORT_SYMBOL_GPL(nlmsvc_ops); static DEFINE_MUTEX(nlmsvc_mutex); static unsigned int nlmsvc_users; static struct svc_serv *nlmsvc_serv; -static struct task_struct *nlmsvc_task; unsigned long nlmsvc_timeout;
unsigned int lockd_net_id; @@ -186,7 +185,7 @@ lockd(void *vrqstp)
svc_exit_thread(rqstp);
- return 0; + module_put_and_kthread_exit(0); }
static int create_lockd_listener(struct svc_serv *serv, const char *name, @@ -292,8 +291,8 @@ static void lockd_down_net(struct svc_serv *serv, struct net *net) __func__, net->ns.inum); } } else { - pr_err("%s: no users! task=%p, net=%x\n", - __func__, nlmsvc_task, net->ns.inum); + pr_err("%s: no users! net=%x\n", + __func__, net->ns.inum); BUG(); } } @@ -351,49 +350,11 @@ static struct notifier_block lockd_inet6addr_notifier = { }; #endif
-static int lockd_start_svc(struct svc_serv *serv) -{ - int error; - struct svc_rqst *rqst; - - /* - * Create the kernel thread and wait for it to start. - */ - rqst = svc_prepare_thread(serv, &serv->sv_pools[0], NUMA_NO_NODE); - if (IS_ERR(rqst)) { - error = PTR_ERR(rqst); - printk(KERN_WARNING - "lockd_up: svc_rqst allocation failed, error=%d\n", - error); - goto out_rqst; - } - - svc_sock_update_bufs(serv); - serv->sv_maxconn = nlm_max_connections; - - nlmsvc_task = kthread_create(lockd, rqst, "%s", serv->sv_name); - if (IS_ERR(nlmsvc_task)) { - error = PTR_ERR(nlmsvc_task); - printk(KERN_WARNING - "lockd_up: kthread_run failed, error=%d\n", error); - goto out_task; - } - rqst->rq_task = nlmsvc_task; - wake_up_process(nlmsvc_task); - - dprintk("lockd_up: service started\n"); - return 0; - -out_task: - svc_exit_thread(rqst); - nlmsvc_task = NULL; -out_rqst: - return error; -} - static const struct svc_serv_ops lockd_sv_ops = { .svo_shutdown = svc_rpcb_cleanup, + .svo_function = lockd, .svo_enqueue_xprt = svc_xprt_do_enqueue, + .svo_module = THIS_MODULE, };
static int lockd_get(void) @@ -425,7 +386,8 @@ static int lockd_get(void) return -ENOMEM; }
- error = lockd_start_svc(serv); + serv->sv_maxconn = nlm_max_connections; + error = svc_set_num_threads(serv, NULL, 1); /* The thread now holds the only reference */ svc_put(serv); if (error < 0) @@ -453,11 +415,7 @@ static void lockd_put(void) unregister_inet6addr_notifier(&lockd_inet6addr_notifier); #endif
- if (nlmsvc_task) { - kthread_stop(nlmsvc_task); - dprintk("lockd_down: service stopped\n"); - nlmsvc_task = NULL; - } + svc_set_num_threads(nlmsvc_serv, NULL, 0); nlmsvc_serv = NULL; dprintk("lockd_down: service destroyed\n"); } diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 89e9d00af601b..f116141ea64d0 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -64,7 +64,9 @@ struct svc_serv_ops { /* queue up a transport for servicing */ void (*svo_enqueue_xprt)(struct svc_xprt *);
- /* optional module to count when adding threads (pooled svcs only) */ + /* optional module to count when adding threads. + * Thread function must call module_put_and_kthread_exit() to exit. + */ struct module *svo_module; };
@@ -507,8 +509,6 @@ struct svc_serv *svc_create(struct svc_program *, unsigned int, const struct svc_serv_ops *); struct svc_rqst *svc_rqst_alloc(struct svc_serv *serv, struct svc_pool *pool, int node); -struct svc_rqst *svc_prepare_thread(struct svc_serv *serv, - struct svc_pool *pool, int node); void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page); void svc_rqst_free(struct svc_rqst *); diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index ceccd4ae5f797..d34d03b0bf76b 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -650,7 +650,7 @@ svc_rqst_alloc(struct svc_serv *serv, struct svc_pool *pool, int node) } EXPORT_SYMBOL_GPL(svc_rqst_alloc);
-struct svc_rqst * +static struct svc_rqst * svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node) { struct svc_rqst *rqstp; @@ -670,7 +670,6 @@ svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node) spin_unlock_bh(&pool->sp_lock); return rqstp; } -EXPORT_SYMBOL_GPL(svc_prepare_thread);
/* * Choose a pool in which to create a new thread, for svc_set_num_threads
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit 23a1a573c61ccb5e7829c1f5472d3e025293a031 ]
Now that thread management is consistent there is no need for nfs-callback to use svc_create_pooled() as introduced in Commit df807fffaabd ("NFSv4.x/callback: Create the callback service through svc_create_pooled"). So switch back to svc_create().
If service pools were configured, but the number of threads were left at '1', nfs callback may not work reliably when svc_create_pooled() is used.
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/callback.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c index 422055a1092f0..054cc1255fac6 100644 --- a/fs/nfs/callback.c +++ b/fs/nfs/callback.c @@ -286,7 +286,7 @@ static struct svc_serv *nfs_callback_create_svc(int minorversion) printk(KERN_WARNING "nfs_callback_create_svc: no kthread, %d users??\n", cb_info->users);
- serv = svc_create_pooled(&nfs4_callback_program, NFS4_CALLBACK_BUFSIZE, sv_ops); + serv = svc_create(&nfs4_callback_program, NFS4_CALLBACK_BUFSIZE, sv_ops); if (!serv) { printk(KERN_ERR "nfs_callback_create_svc: create service failed\n"); return ERR_PTR(-ENOMEM);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 7578b2f628db27281d3165af0aa862311883a858 ]
Commit 7142b98d9fd7 ("nfsd: Clean up drc cache in preparation for global spinlock elimination"), billed as a clean-up, added be32_to_cpu() to the DRC hash function without explanation. That commit removed two comments that state that byte-swapping in the hash function is unnecessary without explaining whether there was a need for that change.
On some Intel CPUs, the swab32 instruction is known to cause a CPU pipeline stall. be32_to_cpu() does not add extra randomness, since the hash multiplication is done /before/ shifting to the high-order bits of the result.
As a micro-optimization, remove the unnecessary transform from the DRC hash function.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfscache.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c index 6e0b6f3148dca..a4a69ab6ab280 100644 --- a/fs/nfsd/nfscache.c +++ b/fs/nfsd/nfscache.c @@ -87,7 +87,7 @@ nfsd_hashsize(unsigned int limit) static u32 nfsd_cache_hash(__be32 xid, struct nfsd_net *nn) { - return hash_32(be32_to_cpu(xid), nn->maskbits); + return hash_32((__force u32)xid, nn->maskbits); }
static struct svc_cacherep *
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiapeng Chong jiapeng.chong@linux.alibaba.com
[ Upstream commit 1e37d0e5bda45881eea1bec4b812def72c7d4aea ]
Eliminate the follow smatch warning:
fs/nfsd/nfs4xdr.c:4766 nfsd4_encode_read_plus_hole() warn: inconsistent indenting.
Reported-by: Abaci Robot abaci@linux.alibaba.com Signed-off-by: Jiapeng Chong jiapeng.chong@linux.alibaba.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index d9758232a398c..638a626af18dc 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -4812,8 +4812,8 @@ nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp, return nfserr_resource;
*p++ = htonl(NFS4_CONTENT_HOLE); - p = xdr_encode_hyper(p, read->rd_offset); - p = xdr_encode_hyper(p, count); + p = xdr_encode_hyper(p, read->rd_offset); + p = xdr_encode_hyper(p, count);
*eof = (read->rd_offset + count) >= f_size; *maxcount = min_t(unsigned long, count, *maxcount);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit 1463b38e7cf34d4cc60f41daff459ad807b2e408 ]
We currently have a 'laundrette' for closing cached files - a different work-item for each network-namespace.
These 'laundrettes' (aka struct nfsd_fcache_disposal) are currently on a list, and are freed using rcu.
The list is not necessary as we have a per-namespace structure (struct nfsd_net) which can hold a link to the nfsd_fcache_disposal. The use of kfree_rcu is also unnecessary as the cache is cleaned of all files associated with a given namespace, and no new files can be added, before the nfsd_fcache_disposal is freed.
So add a '->fcache_disposal' link to nfsd_net, and discard the list management and rcu usage.
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 76 +++++++++------------------------------------ fs/nfsd/netns.h | 2 ++ 2 files changed, 17 insertions(+), 61 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 8cd7d5d6955a0..b6ef8256c9c64 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -44,12 +44,9 @@ struct nfsd_fcache_bucket { static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits);
struct nfsd_fcache_disposal { - struct list_head list; struct work_struct work; - struct net *net; spinlock_t lock; struct list_head freeme; - struct rcu_head rcu; };
static struct workqueue_struct *nfsd_filecache_wq __read_mostly; @@ -62,8 +59,6 @@ static long nfsd_file_lru_flags; static struct fsnotify_group *nfsd_file_fsnotify_group; static atomic_long_t nfsd_filecache_count; static struct delayed_work nfsd_filecache_laundrette; -static DEFINE_SPINLOCK(laundrette_lock); -static LIST_HEAD(laundrettes);
static void nfsd_file_gc(void);
@@ -366,19 +361,13 @@ nfsd_file_list_remove_disposal(struct list_head *dst, static void nfsd_file_list_add_disposal(struct list_head *files, struct net *net) { - struct nfsd_fcache_disposal *l; + struct nfsd_net *nn = net_generic(net, nfsd_net_id); + struct nfsd_fcache_disposal *l = nn->fcache_disposal;
- rcu_read_lock(); - list_for_each_entry_rcu(l, &laundrettes, list) { - if (l->net == net) { - spin_lock(&l->lock); - list_splice_tail_init(files, &l->freeme); - spin_unlock(&l->lock); - queue_work(nfsd_filecache_wq, &l->work); - break; - } - } - rcu_read_unlock(); + spin_lock(&l->lock); + list_splice_tail_init(files, &l->freeme); + spin_unlock(&l->lock); + queue_work(nfsd_filecache_wq, &l->work); }
static void @@ -754,7 +743,7 @@ nfsd_file_cache_purge(struct net *net) }
static struct nfsd_fcache_disposal * -nfsd_alloc_fcache_disposal(struct net *net) +nfsd_alloc_fcache_disposal(void) { struct nfsd_fcache_disposal *l;
@@ -762,7 +751,6 @@ nfsd_alloc_fcache_disposal(struct net *net) if (!l) return NULL; INIT_WORK(&l->work, nfsd_file_delayed_close); - l->net = net; spin_lock_init(&l->lock); INIT_LIST_HEAD(&l->freeme); return l; @@ -771,61 +759,27 @@ nfsd_alloc_fcache_disposal(struct net *net) static void nfsd_free_fcache_disposal(struct nfsd_fcache_disposal *l) { - rcu_assign_pointer(l->net, NULL); cancel_work_sync(&l->work); nfsd_file_dispose_list(&l->freeme); - kfree_rcu(l, rcu); -} - -static void -nfsd_add_fcache_disposal(struct nfsd_fcache_disposal *l) -{ - spin_lock(&laundrette_lock); - list_add_tail_rcu(&l->list, &laundrettes); - spin_unlock(&laundrette_lock); -} - -static void -nfsd_del_fcache_disposal(struct nfsd_fcache_disposal *l) -{ - spin_lock(&laundrette_lock); - list_del_rcu(&l->list); - spin_unlock(&laundrette_lock); -} - -static int -nfsd_alloc_fcache_disposal_net(struct net *net) -{ - struct nfsd_fcache_disposal *l; - - l = nfsd_alloc_fcache_disposal(net); - if (!l) - return -ENOMEM; - nfsd_add_fcache_disposal(l); - return 0; + kfree(l); }
static void nfsd_free_fcache_disposal_net(struct net *net) { - struct nfsd_fcache_disposal *l; + struct nfsd_net *nn = net_generic(net, nfsd_net_id); + struct nfsd_fcache_disposal *l = nn->fcache_disposal;
- rcu_read_lock(); - list_for_each_entry_rcu(l, &laundrettes, list) { - if (l->net != net) - continue; - nfsd_del_fcache_disposal(l); - rcu_read_unlock(); - nfsd_free_fcache_disposal(l); - return; - } - rcu_read_unlock(); + nfsd_free_fcache_disposal(l); }
int nfsd_file_cache_start_net(struct net *net) { - return nfsd_alloc_fcache_disposal_net(net); + struct nfsd_net *nn = net_generic(net, nfsd_net_id); + + nn->fcache_disposal = nfsd_alloc_fcache_disposal(); + return nn->fcache_disposal ? 0 : -ENOMEM; }
void diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h index 021acdc0d03bb..9e8b77d2a3a47 100644 --- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h @@ -185,6 +185,8 @@ struct nfsd_net {
/* utsname taken from the process that starts the server */ char nfsd_name[UNX_MAXNODENAME+1]; + + struct nfsd_fcache_disposal *fcache_disposal; };
/* Simple check to find out if a given net was properly initialized */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 70e94d757b3e1f46486d573729d84c8955c81dce ]
Clean up: The garbage_args and cant_encode tracepoints report the same information as each other, so combine them into a single tracepoint class to reduce code duplication and slightly reduce the size of trace.o.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/trace.h | 28 +++++++--------------------- 1 file changed, 7 insertions(+), 21 deletions(-)
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index de245f433392d..cba38e0b204b9 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -46,7 +46,7 @@ rqstp->rq_xprt->xpt_remotelen); \ } while (0);
-TRACE_EVENT(nfsd_garbage_args_err, +DECLARE_EVENT_CLASS(nfsd_xdr_err_class, TP_PROTO( const struct svc_rqst *rqstp ), @@ -68,27 +68,13 @@ TRACE_EVENT(nfsd_garbage_args_err, ) );
-TRACE_EVENT(nfsd_cant_encode_err, - TP_PROTO( - const struct svc_rqst *rqstp - ), - TP_ARGS(rqstp), - TP_STRUCT__entry( - NFSD_TRACE_PROC_ARG_FIELDS +#define DEFINE_NFSD_XDR_ERR_EVENT(name) \ +DEFINE_EVENT(nfsd_xdr_err_class, nfsd_##name##_err, \ + TP_PROTO(const struct svc_rqst *rqstp), \ + TP_ARGS(rqstp))
- __field(u32, vers) - __field(u32, proc) - ), - TP_fast_assign( - NFSD_TRACE_PROC_ARG_ASSIGNMENTS - - __entry->vers = rqstp->rq_vers; - __entry->proc = rqstp->rq_proc; - ), - TP_printk("xid=0x%08x vers=%u proc=%u", - __entry->xid, __entry->vers, __entry->proc - ) -); +DEFINE_NFSD_XDR_ERR_EVENT(garbage_args); +DEFINE_NFSD_XDR_ERR_EVENT(cant_encode);
#define show_nfsd_may_flags(x) \ __print_flags(x, "|", \
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit 3dcd1d8aab00c5d3a0a3725253c86440b1a0f5a7 ]
The use of the bitmaps is confusing. Add a cross-reference to make it easier to find the existing comment. Add an updated reference with URL to make it quicker to look up. And a bit more editorializing about the value of this.
Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 14 ++++++++++---- fs/nfsd/state.h | 4 ++++ 2 files changed, 14 insertions(+), 4 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 0b39ed1568d40..3b8c5f2283975 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -360,11 +360,13 @@ static const struct nfsd4_callback_ops nfsd4_cb_notify_lock_ops = { * st_{access,deny}_bmap field of the stateid, in order to track not * only what share bits are currently in force, but also what * combinations of share bits previous opens have used. This allows us - * to enforce the recommendation of rfc 3530 14.2.19 that the server - * return an error if the client attempt to downgrade to a combination - * of share bits not explicable by closing some of its previous opens. + * to enforce the recommendation in + * https://datatracker.ietf.org/doc/html/rfc7530#section-16.19.4 that + * the server return an error if the client attempt to downgrade to a + * combination of share bits not explicable by closing some of its + * previous opens. * - * XXX: This enforcement is actually incomplete, since we don't keep + * This enforcement is arguably incomplete, since we don't keep * track of access/deny bit combinations; so, e.g., we allow: * * OPEN allow read, deny write @@ -372,6 +374,10 @@ static const struct nfsd4_callback_ops nfsd4_cb_notify_lock_ops = { * DOWNGRADE allow read, deny none * * which we should reject. + * + * But you could also argue that our current code is already overkill, + * since it only exists to return NFS4ERR_INVAL on incorrect client + * behavior. */ static unsigned int bmap_to_share_mode(unsigned long bmap) diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index e73bdbb1634ab..6eb3c7157214b 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -568,6 +568,10 @@ struct nfs4_ol_stateid { struct list_head st_locks; struct nfs4_stateowner *st_stateowner; struct nfs4_clnt_odstate *st_clnt_odstate; +/* + * These bitmasks use 3 separate bits for READ, ALLOW, and BOTH; see the + * comment above bmap_to_share_mode() for explanation: + */ unsigned char st_access_bmap; unsigned char st_deny_bmap; struct nfs4_ol_stateid *st_openstp;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit cd2e999c7c394ae916d8be741418b3c6c1dddea8 ]
Clean up. Trond points out that xdr_stream_decode_uint32_array() does the same thing as nfsd4_decode_bitmap4().
Suggested-by: Trond Myklebust trondmy@hammerspace.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 17 +++-------------- 1 file changed, 3 insertions(+), 14 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 638a626af18dc..adf97d72bda80 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -277,21 +277,10 @@ nfsd4_decode_verifier4(struct nfsd4_compoundargs *argp, nfs4_verifier *verf) static __be32 nfsd4_decode_bitmap4(struct nfsd4_compoundargs *argp, u32 *bmval, u32 bmlen) { - u32 i, count; - __be32 *p; - - if (xdr_stream_decode_u32(argp->xdr, &count) < 0) - return nfserr_bad_xdr; - /* request sanity */ - if (count > 1000) - return nfserr_bad_xdr; - p = xdr_inline_decode(argp->xdr, count << 2); - if (!p) - return nfserr_bad_xdr; - for (i = 0; i < bmlen; i++) - bmval[i] = (i < count) ? be32_to_cpup(p++) : 0; + ssize_t status;
- return nfs_ok; + status = xdr_stream_decode_uint32_array(argp->xdr, bmval, bmlen); + return status == -EBADMSG ? nfserr_bad_xdr : nfs_ok; }
static __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit 40595cdc93edf4110c0f0c0b06f8d82008f23929 ]
NFSv4.1 supports an optional lock notification feature which notifies the client when a lock comes available. (Normally NFSv4 clients just poll for locks if necessary.) To make that work, we need to request a blocking lock from the filesystem.
We turned that off for NFS in commit f657f8eef3ff ("nfs: don't atempt blocking locks on nfs reexports") [sic] because it actually blocks the nfsd thread while waiting for the lock.
Thanks to Vasily Averin for pointing out that NFS isn't the only filesystem with that problem.
Any filesystem that leaves ->lock NULL will use posix_lock_file(), which does the right thing. Simplest is just to assume that any filesystem that defines its own ->lock is not safe to request a blocking lock from.
So, this patch mostly reverts commit f657f8eef3ff ("nfs: don't atempt blocking locks on nfs reexports") [sic] and commit b840be2f00c0 ("lockd: don't attempt blocking locks on nfs reexports"), and instead uses a check of ->lock (Vasily's suggestion) to decide whether to support blocking lock notifications on a given filesystem. Also add a little documentation.
Perhaps someday we could add back an export flag later to allow filesystems with "good" ->lock methods to support blocking lock notifications.
Reported-by: Vasily Averin vvs@virtuozzo.com Signed-off-by: J. Bruce Fields bfields@redhat.com [ cel: Description rewritten to address checkpatch nits ] [ cel: Fixed warning when SUNRPC debugging is disabled ] [ cel: Fixed NULL check ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Reviewed-by: Vasily Averin vvs@virtuozzo.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svclock.c | 6 ++++-- fs/nfs/export.c | 2 +- fs/nfsd/nfs4state.c | 18 ++++++++++++------ include/linux/exportfs.h | 2 -- include/linux/lockd/lockd.h | 9 +++++++-- 5 files changed, 24 insertions(+), 13 deletions(-)
diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c index e9b85d8fd5fe7..cb3658ab9b7ae 100644 --- a/fs/lockd/svclock.c +++ b/fs/lockd/svclock.c @@ -470,8 +470,10 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file, struct nlm_host *host, struct nlm_lock *lock, int wait, struct nlm_cookie *cookie, int reclaim) { - struct nlm_block *block = NULL; +#if IS_ENABLED(CONFIG_SUNRPC_DEBUG) struct inode *inode = nlmsvc_file_inode(file); +#endif + struct nlm_block *block = NULL; int error; int mode; int async_block = 0; @@ -484,7 +486,7 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file, (long long)lock->fl.fl_end, wait);
- if (inode->i_sb->s_export_op->flags & EXPORT_OP_SYNC_LOCKS) { + if (nlmsvc_file_file(file)->f_op->lock) { async_block = wait; wait = 0; } diff --git a/fs/nfs/export.c b/fs/nfs/export.c index 40beac65d1355..b347e3ce0cc8e 100644 --- a/fs/nfs/export.c +++ b/fs/nfs/export.c @@ -184,5 +184,5 @@ const struct export_operations nfs_export_ops = { .fetch_iversion = nfs_fetch_iversion, .flags = EXPORT_OP_NOWCC|EXPORT_OP_NOSUBTREECHK| EXPORT_OP_CLOSE_BEFORE_UNLINK|EXPORT_OP_REMOTE_FS| - EXPORT_OP_NOATOMIC_ATTR|EXPORT_OP_SYNC_LOCKS, + EXPORT_OP_NOATOMIC_ATTR, }; diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 3b8c5f2283975..36ae55fbfbc67 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -6884,7 +6884,6 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, struct nfsd4_blocked_lock *nbl = NULL; struct file_lock *file_lock = NULL; struct file_lock *conflock = NULL; - struct super_block *sb; __be32 status = 0; int lkflg; int err; @@ -6906,7 +6905,6 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, dprintk("NFSD: nfsd4_lock: permission denied!\n"); return status; } - sb = cstate->current_fh.fh_dentry->d_sb;
if (lock->lk_is_new) { if (nfsd4_has_session(cstate)) @@ -6958,8 +6956,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, fp = lock_stp->st_stid.sc_file; switch (lock->lk_type) { case NFS4_READW_LT: - if (nfsd4_has_session(cstate) && - !(sb->s_export_op->flags & EXPORT_OP_SYNC_LOCKS)) + if (nfsd4_has_session(cstate)) fl_flags |= FL_SLEEP; fallthrough; case NFS4_READ_LT: @@ -6971,8 +6968,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, fl_type = F_RDLCK; break; case NFS4_WRITEW_LT: - if (nfsd4_has_session(cstate) && - !(sb->s_export_op->flags & EXPORT_OP_SYNC_LOCKS)) + if (nfsd4_has_session(cstate)) fl_flags |= FL_SLEEP; fallthrough; case NFS4_WRITE_LT: @@ -6993,6 +6989,16 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, goto out; }
+ /* + * Most filesystems with their own ->lock operations will block + * the nfsd thread waiting to acquire the lock. That leads to + * deadlocks (we don't want every nfsd thread tied up waiting + * for file locks), so don't attempt blocking lock notifications + * on those filesystems: + */ + if (nf->nf_file->f_op->lock) + fl_flags &= ~FL_SLEEP; + nbl = find_or_allocate_block(lock_sop, &fp->fi_fhandle, nn); if (!nbl) { dprintk("NFSD: %s: unable to allocate block!\n", __func__); diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h index 3260fe7148462..fe848901fcc3a 100644 --- a/include/linux/exportfs.h +++ b/include/linux/exportfs.h @@ -221,8 +221,6 @@ struct export_operations { #define EXPORT_OP_NOATOMIC_ATTR (0x10) /* Filesystem cannot supply atomic attribute updates */ -#define EXPORT_OP_SYNC_LOCKS (0x20) /* Filesystem can't do - asychronous blocking locks */ unsigned long flags; };
diff --git a/include/linux/lockd/lockd.h b/include/linux/lockd/lockd.h index c4ae6506b8b36..fcef192e5e45e 100644 --- a/include/linux/lockd/lockd.h +++ b/include/linux/lockd/lockd.h @@ -303,10 +303,15 @@ void nlmsvc_invalidate_all(void); int nlmsvc_unlock_all_by_sb(struct super_block *sb); int nlmsvc_unlock_all_by_ip(struct sockaddr *server_addr);
+static inline struct file *nlmsvc_file_file(struct nlm_file *file) +{ + return file->f_file[O_RDONLY] ? + file->f_file[O_RDONLY] : file->f_file[O_WRONLY]; +} + static inline struct inode *nlmsvc_file_inode(struct nlm_file *file) { - return locks_inode(file->f_file[O_RDONLY] ? - file->f_file[O_RDONLY] : file->f_file[O_WRONLY]); + return locks_inode(nlmsvc_file_file(file)); }
static inline int __nlm_privileged_request4(const struct sockaddr *sap)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vasily Averin vvs@virtuozzo.com
[ Upstream commit 47446d74f1707049067fee038507cdffda805631 ]
nbl allocated in nfsd4_lock can be released by a several ways: directly in nfsd4_lock(), via nfs4_laundromat(), via another nfs command RELEASE_LOCKOWNER or via nfsd4_callback. This structure should be refcounted to be used and released correctly in all these cases.
Refcount is initialized to 1 during allocation and is incremented when nbl is added into nbl_list/nbl_lru lists.
Usually nbl is linked into both lists together, so only one refcount is used for both lists.
However nfsd4_lock() should keep in mind that nbl can be present in one of lists only. This can happen if nbl was handled already by nfs4_laundromat/nfsd4_callback/etc.
Refcount is decremented if vfs_lock_file() returns FILE_LOCK_DEFERRED, because nbl can be handled already by nfs4_laundromat/nfsd4_callback/etc.
Refcount is not changed in find_blocked_lock() because of it reuses counter released after removing nbl from lists.
Signed-off-by: Vasily Averin vvs@virtuozzo.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 25 ++++++++++++++++++++++--- fs/nfsd/state.h | 1 + 2 files changed, 23 insertions(+), 3 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 36ae55fbfbc67..4161a4854c430 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -246,6 +246,7 @@ find_blocked_lock(struct nfs4_lockowner *lo, struct knfsd_fh *fh, list_for_each_entry(cur, &lo->lo_blocked, nbl_list) { if (fh_match(fh, &cur->nbl_fh)) { list_del_init(&cur->nbl_list); + WARN_ON(list_empty(&cur->nbl_lru)); list_del_init(&cur->nbl_lru); found = cur; break; @@ -271,6 +272,7 @@ find_or_allocate_block(struct nfs4_lockowner *lo, struct knfsd_fh *fh, INIT_LIST_HEAD(&nbl->nbl_lru); fh_copy_shallow(&nbl->nbl_fh, fh); locks_init_lock(&nbl->nbl_lock); + kref_init(&nbl->nbl_kref); nfsd4_init_cb(&nbl->nbl_cb, lo->lo_owner.so_client, &nfsd4_cb_notify_lock_ops, NFSPROC4_CLNT_CB_NOTIFY_LOCK); @@ -279,12 +281,21 @@ find_or_allocate_block(struct nfs4_lockowner *lo, struct knfsd_fh *fh, return nbl; }
+static void +free_nbl(struct kref *kref) +{ + struct nfsd4_blocked_lock *nbl; + + nbl = container_of(kref, struct nfsd4_blocked_lock, nbl_kref); + kfree(nbl); +} + static void free_blocked_lock(struct nfsd4_blocked_lock *nbl) { locks_delete_block(&nbl->nbl_lock); locks_release_private(&nbl->nbl_lock); - kfree(nbl); + kref_put(&nbl->nbl_kref, free_nbl); }
static void @@ -302,6 +313,7 @@ remove_blocked_locks(struct nfs4_lockowner *lo) struct nfsd4_blocked_lock, nbl_list); list_del_init(&nbl->nbl_list); + WARN_ON(list_empty(&nbl->nbl_lru)); list_move(&nbl->nbl_lru, &reaplist); } spin_unlock(&nn->blocked_locks_lock); @@ -7029,6 +7041,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, spin_lock(&nn->blocked_locks_lock); list_add_tail(&nbl->nbl_list, &lock_sop->lo_blocked); list_add_tail(&nbl->nbl_lru, &nn->blocked_locks_lru); + kref_get(&nbl->nbl_kref); spin_unlock(&nn->blocked_locks_lock); }
@@ -7041,6 +7054,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, nn->somebody_reclaimed = true; break; case FILE_LOCK_DEFERRED: + kref_put(&nbl->nbl_kref, free_nbl); nbl = NULL; fallthrough; case -EAGAIN: /* conflock holds conflicting lock */ @@ -7061,8 +7075,13 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, /* dequeue it if we queued it before */ if (fl_flags & FL_SLEEP) { spin_lock(&nn->blocked_locks_lock); - list_del_init(&nbl->nbl_list); - list_del_init(&nbl->nbl_lru); + if (!list_empty(&nbl->nbl_list) && + !list_empty(&nbl->nbl_lru)) { + list_del_init(&nbl->nbl_list); + list_del_init(&nbl->nbl_lru); + kref_put(&nbl->nbl_kref, free_nbl); + } + /* nbl can use one of lists to be linked to reaplist */ spin_unlock(&nn->blocked_locks_lock); } free_blocked_lock(nbl); diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index 6eb3c7157214b..95457cfd37fc0 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -633,6 +633,7 @@ struct nfsd4_blocked_lock { struct file_lock nbl_lock; struct knfsd_fh nbl_fh; struct nfsd4_callback nbl_cb; + struct kref nbl_kref; };
struct nfsd4_compound_state;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 6a2f774424bfdcc2df3e17de0cefe74a4269cad5 ]
The Linux NFS server currently responds to a zero-length NFSv3 WRITE request with NFS3ERR_IO. It responds to a zero-length NFSv4 WRITE with NFS4_OK and count of zero.
RFC 1813 says of the WRITE procedure's @count argument:
count The number of bytes of data to be written. If count is 0, the WRITE will succeed and return a count of 0, barring errors due to permissions checking.
RFC 8881 has similar language for NFSv4, though NFSv4 removed the explicit @count argument because that value is already contained in the opaque payload array.
The synthetic client pynfs's WRT4 and WRT15 tests do emit zero- length WRITEs to exercise this spec requirement. Commit fdec6114ee1f ("nfsd4: zero-length WRITE should succeed") addressed the same problem there with the same fix.
But interestingly the Linux NFS client does not appear to emit zero- length WRITEs, instead squelching them. I'm not aware of a test that can generate such WRITEs for NFSv3, so I wrote a naive C program to generate a zero-length WRITE and test this fix.
Fixes: 8154ef2776aa ("NFSD: Clean up legacy NFS WRITE argument XDR decoders") Reported-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3proc.c | 6 +----- fs/nfsd/nfsproc.c | 5 ----- 2 files changed, 1 insertion(+), 10 deletions(-)
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index d26271c60ab44..1515c32e08db2 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -207,15 +207,11 @@ nfsd3_proc_write(struct svc_rqst *rqstp) fh_copy(&resp->fh, &argp->fh); resp->committed = argp->stable; nvecs = svc_fill_write_vector(rqstp, &argp->payload); - if (!nvecs) { - resp->status = nfserr_io; - goto out; - } + resp->status = nfsd_write(rqstp, &resp->fh, argp->offset, rqstp->rq_vec, nvecs, &cnt, resp->committed, resp->verf); resp->count = cnt; -out: return rpc_success; }
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 8ee329bc3815d..0a2bab7ef33c9 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -235,10 +235,6 @@ nfsd_proc_write(struct svc_rqst *rqstp) argp->len, argp->offset);
nvecs = svc_fill_write_vector(rqstp, &argp->payload); - if (!nvecs) { - resp->status = nfserr_io; - goto out; - }
resp->status = nfsd_write(rqstp, fh_copy(&resp->fh, &argp->fh), argp->offset, rqstp->rq_vec, nvecs, @@ -247,7 +243,6 @@ nfsd_proc_write(struct svc_rqst *rqstp) resp->status = fh_getattr(&resp->fh, &resp->stat); else if (resp->status == nfserr_jukebox) return rpc_drop_reply; -out: return rpc_success; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peng Tao tao.peng@primarydata.com
[ Upstream commit b3d0db706c77d02055910fcfe2f6eb5155ff9d5e ]
Now that we have open file cache, it is possible that another client deletes the file and DP will not know about it. Then IO to MDS would fail with BADSTATEID and knfsd would start state recovery, which should fail as well and then nfs read/write will fail with EBADF. And it triggers a WARN() in nfserrno().
-----------[ cut here ]------------ WARNING: CPU: 0 PID: 13529 at fs/nfsd/nfsproc.c:758 nfserrno+0x58/0x70 [nfsd]() nfsd: non-standard errno: -9 modules linked in: nfsv3 nfs_layout_flexfiles rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_connt pata_acpi floppy CPU: 0 PID: 13529 Comm: nfsd Tainted: G W 4.1.5-00307-g6e6579b #7 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014 0000000000000000 00000000464e6c9c ffff88079085fba8 ffffffff81789936 0000000000000000 ffff88079085fc00 ffff88079085fbe8 ffffffff810a08ea ffff88079085fbe8 ffff88080f45c900 ffff88080f627d50 ffff880790c46a48 all Trace: [<ffffffff81789936>] dump_stack+0x45/0x57 [<ffffffff810a08ea>] warn_slowpath_common+0x8a/0xc0 [<ffffffff810a0975>] warn_slowpath_fmt+0x55/0x70 [<ffffffff81252908>] ? splice_direct_to_actor+0x148/0x230 [<ffffffffa02fb8c0>] ? fsid_source+0x60/0x60 [nfsd] [<ffffffffa02f9918>] nfserrno+0x58/0x70 [nfsd] [<ffffffffa02fba57>] nfsd_finish_read+0x97/0xb0 [nfsd] [<ffffffffa02fc7a6>] nfsd_splice_read+0x76/0xa0 [nfsd] [<ffffffffa02fcca1>] nfsd_read+0xc1/0xd0 [nfsd] [<ffffffffa0233af2>] ? svc_tcp_adjust_wspace+0x12/0x30 [sunrpc] [<ffffffffa03073da>] nfsd3_proc_read+0xba/0x150 [nfsd] [<ffffffffa02f7a03>] nfsd_dispatch+0xc3/0x210 [nfsd] [<ffffffffa0233af2>] ? svc_tcp_adjust_wspace+0x12/0x30 [sunrpc] [<ffffffffa0232913>] svc_process_common+0x453/0x6f0 [sunrpc] [<ffffffffa0232cc3>] svc_process+0x113/0x1b0 [sunrpc] [<ffffffffa02f740f>] nfsd+0xff/0x170 [nfsd] [<ffffffffa02f7310>] ? nfsd_destroy+0x80/0x80 [nfsd] [<ffffffff810bf3a8>] kthread+0xd8/0xf0 [<ffffffff810bf2d0>] ? kthread_create_on_node+0x1b0/0x1b0 [<ffffffff817912a2>] ret_from_fork+0x42/0x70 [<ffffffff810bf2d0>] ? kthread_create_on_node+0x1b0/0x1b0
Signed-off-by: Peng Tao tao.peng@primarydata.com Signed-off-by: Lance Shelton lance.shelton@hammerspace.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsproc.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 0a2bab7ef33c9..de4b97cdbd2bc 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -845,6 +845,7 @@ nfserrno (int errno) { nfserr_io, -EIO }, { nfserr_nxio, -ENXIO }, { nfserr_fbig, -E2BIG }, + { nfserr_stale, -EBADF }, { nfserr_acces, -EACCES }, { nfserr_exist, -EEXIST }, { nfserr_xdev, -EXDEV },
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jeff.layton@primarydata.com
[ Upstream commit a2694e51f60c5a18c7e43d1a9feaa46d7f153e65 ]
The NFS client can occasionally return EREMOTEIO when signalling issues with the server. ...map to NFSERR_IO.
Signed-off-by: Jeff Layton jeff.layton@primarydata.com Signed-off-by: Lance Shelton lance.shelton@hammerspace.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsproc.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index de4b97cdbd2bc..6ed13754290a2 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -874,6 +874,7 @@ nfserrno (int errno) { nfserr_toosmall, -ETOOSMALL }, { nfserr_serverfault, -ESERVERFAULT }, { nfserr_serverfault, -ENFILE }, + { nfserr_io, -EREMOTEIO }, { nfserr_io, -EUCLEAN }, { nfserr_perm, -ENOKEY }, { nfserr_no_grace, -ENOGRACE},
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jeff.layton@primarydata.com
[ Upstream commit 12bcbd40fd931472c7fc9cf3bfe66799ece93ed8 ]
If we get back -EOPENSTALE from an NFSv4 open, then we either got some unhandled error or the inode we got back was not the same as the one associated with the dentry.
We really have no recourse in that situation other than to retry the open, and if it fails to just return nfserr_stale back to the client.
Signed-off-by: Jeff Layton jeff.layton@primarydata.com Signed-off-by: Lance Shelton lance.shelton@hammerspace.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsproc.c | 1 + fs/nfsd/vfs.c | 10 +++++++++- 2 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 6ed13754290a2..b8ddf21e70ea0 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -875,6 +875,7 @@ nfserrno (int errno) { nfserr_serverfault, -ESERVERFAULT }, { nfserr_serverfault, -ENFILE }, { nfserr_io, -EREMOTEIO }, + { nfserr_stale, -EOPENSTALE }, { nfserr_io, -EUCLEAN }, { nfserr_perm, -ENOKEY }, { nfserr_no_grace, -ENOGRACE}, diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index deaf4a50550d5..b343677b01efa 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -805,6 +805,7 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int may_flags, struct file **filp) { __be32 err; + bool retried = false;
validate_process_creds(); /* @@ -820,9 +821,16 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, */ if (type == S_IFREG) may_flags |= NFSD_MAY_OWNER_OVERRIDE; +retry: err = fh_verify(rqstp, fhp, type, may_flags); - if (!err) + if (!err) { err = __nfsd_open(rqstp, fhp, type, may_flags, filp); + if (err == nfserr_stale && !retried) { + retried = true; + fh_put(fhp); + goto retry; + } + } validate_process_creds(); return err; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 33388b3aefefd4d83764dab8038cb54068161a44 ]
The RWF_SYNC and !RWF_SYNC arms are now exactly alike except that the RWF_SYNC arm resets the boot verifier twice in a row. Fix that redundancy and de-duplicate the code.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/vfs.c | 21 +++++---------------- 1 file changed, 5 insertions(+), 16 deletions(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index b343677b01efa..cef8435d76a69 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1023,22 +1023,11 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf,
iov_iter_kvec(&iter, WRITE, vec, vlen, *cnt); since = READ_ONCE(file->f_wb_err); - if (flags & RWF_SYNC) { - if (verf) - nfsd_copy_boot_verifier(verf, - net_generic(SVC_NET(rqstp), - nfsd_net_id)); - host_err = vfs_iter_write(file, &iter, &pos, flags); - if (host_err < 0) - nfsd_reset_boot_verifier(net_generic(SVC_NET(rqstp), - nfsd_net_id)); - } else { - if (verf) - nfsd_copy_boot_verifier(verf, - net_generic(SVC_NET(rqstp), - nfsd_net_id)); - host_err = vfs_iter_write(file, &iter, &pos, flags); - } + if (verf) + nfsd_copy_boot_verifier(verf, + net_generic(SVC_NET(rqstp), + nfsd_net_id)); + host_err = vfs_iter_write(file, &iter, &pos, flags); if (host_err < 0) { nfsd_reset_boot_verifier(net_generic(SVC_NET(rqstp), nfsd_net_id));
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit fb7622c2dbd1aa41133a8c73e1137b833c074519 ]
Since this pointer is used repeatedly, move it to a stack variable.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/vfs.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index cef8435d76a69..aba9d479d0840 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -980,6 +980,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf, unsigned long *cnt, int stable, __be32 *verf) { + struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id); struct file *file = nf->nf_file; struct super_block *sb = file_inode(file)->i_sb; struct svc_export *exp; @@ -1024,13 +1025,10 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf, iov_iter_kvec(&iter, WRITE, vec, vlen, *cnt); since = READ_ONCE(file->f_wb_err); if (verf) - nfsd_copy_boot_verifier(verf, - net_generic(SVC_NET(rqstp), - nfsd_net_id)); + nfsd_copy_boot_verifier(verf, nn); host_err = vfs_iter_write(file, &iter, &pos, flags); if (host_err < 0) { - nfsd_reset_boot_verifier(net_generic(SVC_NET(rqstp), - nfsd_net_id)); + nfsd_reset_boot_verifier(nn); goto out_nfserr; } *cnt = host_err; @@ -1043,8 +1041,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf, if (stable && use_wgather) { host_err = wait_for_concurrent_writes(file); if (host_err < 0) - nfsd_reset_boot_verifier(net_generic(SVC_NET(rqstp), - nfsd_net_id)); + nfsd_reset_boot_verifier(nn); }
out_nfserr:
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 2c445a0e72cb1fbfbdb7f9473c53556ee27c1d90 ]
Since this pointer is used repeatedly, move it to a stack variable.
[ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/vfs.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index aba9d479d0840..2e3b0bd560fcc 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1129,6 +1129,7 @@ __be32 nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t offset, unsigned long count, __be32 *verf) { + struct nfsd_net *nn; struct nfsd_file *nf; loff_t end = LLONG_MAX; __be32 err = nfserr_inval; @@ -1145,6 +1146,7 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, NFSD_MAY_WRITE|NFSD_MAY_NOT_BREAK_LEASE, &nf); if (err) goto out; + nn = net_generic(nf->nf_net, nfsd_net_id); if (EX_ISSYNC(fhp->fh_export)) { errseq_t since = READ_ONCE(nf->nf_file->f_wb_err); int err2; @@ -1152,8 +1154,7 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, err2 = vfs_fsync_range(nf->nf_file, offset, end, 0); switch (err2) { case 0: - nfsd_copy_boot_verifier(verf, net_generic(nf->nf_net, - nfsd_net_id)); + nfsd_copy_boot_verifier(verf, nn); err2 = filemap_check_wb_err(nf->nf_file->f_mapping, since); err = nfserrno(err2); @@ -1162,13 +1163,11 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, err = nfserr_notsupp; break; default: - nfsd_reset_boot_verifier(net_generic(nf->nf_net, - nfsd_net_id)); + nfsd_reset_boot_verifier(nn); err = nfserrno(err2); } } else - nfsd_copy_boot_verifier(verf, net_generic(nf->nf_net, - nfsd_net_id)); + nfsd_copy_boot_verifier(verf, nn);
nfsd_file_put(nf); out:
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit a2f4c3fa4db94ba44d32a72201927cfd132a8e82 ]
Since a clone error commit can cause the boot verifier to change, we should trace those errors.
Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Chuck Lever chuck.lever@oracle.com [ cel: Addressed a checkpatch.pl splat in fs/nfsd/vfs.h ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 2 +- fs/nfsd/trace.h | 50 ++++++++++++++++++++++++++++++++++++++++++++++ fs/nfsd/vfs.c | 18 +++++++++++++++-- fs/nfsd/vfs.h | 3 ++- 4 files changed, 69 insertions(+), 4 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index eaf2f95d059ca..0f2025b7a6415 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1104,7 +1104,7 @@ nfsd4_clone(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (status) goto out;
- status = nfsd4_clone_file_range(src, clone->cl_src_pos, + status = nfsd4_clone_file_range(rqstp, src, clone->cl_src_pos, dst, clone->cl_dst_pos, clone->cl_count, EX_ISSYNC(cstate->current_fh.fh_export));
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index cba38e0b204b9..199b485c77179 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -400,6 +400,56 @@ TRACE_EVENT(nfsd_dirent, __entry->len, __get_str(name)) )
+DECLARE_EVENT_CLASS(nfsd_copy_err_class, + TP_PROTO(struct svc_rqst *rqstp, + struct svc_fh *src_fhp, + loff_t src_offset, + struct svc_fh *dst_fhp, + loff_t dst_offset, + u64 count, + int status), + TP_ARGS(rqstp, src_fhp, src_offset, dst_fhp, dst_offset, count, status), + TP_STRUCT__entry( + __field(u32, xid) + __field(u32, src_fh_hash) + __field(loff_t, src_offset) + __field(u32, dst_fh_hash) + __field(loff_t, dst_offset) + __field(u64, count) + __field(int, status) + ), + TP_fast_assign( + __entry->xid = be32_to_cpu(rqstp->rq_xid); + __entry->src_fh_hash = knfsd_fh_hash(&src_fhp->fh_handle); + __entry->src_offset = src_offset; + __entry->dst_fh_hash = knfsd_fh_hash(&dst_fhp->fh_handle); + __entry->dst_offset = dst_offset; + __entry->count = count; + __entry->status = status; + ), + TP_printk("xid=0x%08x src_fh_hash=0x%08x src_offset=%lld " + "dst_fh_hash=0x%08x dst_offset=%lld " + "count=%llu status=%d", + __entry->xid, __entry->src_fh_hash, __entry->src_offset, + __entry->dst_fh_hash, __entry->dst_offset, + (unsigned long long)__entry->count, + __entry->status) +) + +#define DEFINE_NFSD_COPY_ERR_EVENT(name) \ +DEFINE_EVENT(nfsd_copy_err_class, nfsd_##name, \ + TP_PROTO(struct svc_rqst *rqstp, \ + struct svc_fh *src_fhp, \ + loff_t src_offset, \ + struct svc_fh *dst_fhp, \ + loff_t dst_offset, \ + u64 count, \ + int status), \ + TP_ARGS(rqstp, src_fhp, src_offset, dst_fhp, dst_offset, \ + count, status)) + +DEFINE_NFSD_COPY_ERR_EVENT(clone_file_range_err); + #include "state.h" #include "filecache.h" #include "vfs.h" diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 2e3b0bd560fcc..b1ce38c642cde 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -40,6 +40,7 @@ #include "../internal.h" #include "acl.h" #include "idmap.h" +#include "xdr4.h" #endif /* CONFIG_NFSD_V4 */
#include "nfsd.h" @@ -530,8 +531,15 @@ __be32 nfsd4_set_nfs4_label(struct svc_rqst *rqstp, struct svc_fh *fhp, } #endif
-__be32 nfsd4_clone_file_range(struct nfsd_file *nf_src, u64 src_pos, - struct nfsd_file *nf_dst, u64 dst_pos, u64 count, bool sync) +static struct nfsd4_compound_state *nfsd4_get_cstate(struct svc_rqst *rqstp) +{ + return &((struct nfsd4_compoundres *)rqstp->rq_resp)->cstate; +} + +__be32 nfsd4_clone_file_range(struct svc_rqst *rqstp, + struct nfsd_file *nf_src, u64 src_pos, + struct nfsd_file *nf_dst, u64 dst_pos, + u64 count, bool sync) { struct file *src = nf_src->nf_file; struct file *dst = nf_dst->nf_file; @@ -558,6 +566,12 @@ __be32 nfsd4_clone_file_range(struct nfsd_file *nf_src, u64 src_pos, if (!status) status = commit_inode_metadata(file_inode(src)); if (status < 0) { + trace_nfsd_clone_file_range_err(rqstp, + &nfsd4_get_cstate(rqstp)->save_fh, + src_pos, + &nfsd4_get_cstate(rqstp)->current_fh, + dst_pos, + count, status); nfsd_reset_boot_verifier(net_generic(nf_dst->nf_net, nfsd_net_id)); ret = nfserrno(status); diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h index b21b76e6b9a87..9f56dcb22ff72 100644 --- a/fs/nfsd/vfs.h +++ b/fs/nfsd/vfs.h @@ -57,7 +57,8 @@ __be32 nfsd4_set_nfs4_label(struct svc_rqst *, struct svc_fh *, struct xdr_netobj *); __be32 nfsd4_vfs_fallocate(struct svc_rqst *, struct svc_fh *, struct file *, loff_t, loff_t, int); -__be32 nfsd4_clone_file_range(struct nfsd_file *nf_src, u64 src_pos, +__be32 nfsd4_clone_file_range(struct svc_rqst *rqstp, + struct nfsd_file *nf_src, u64 src_pos, struct nfsd_file *nf_dst, u64 dst_pos, u64 count, bool sync); #endif /* CONFIG_NFSD_V4 */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit cdc556600c0133575487cc69fb3128440b3c3e92 ]
When vfs_iter_write() starts to fail because a file system is full, a bunch of writes can fail at once with ENOSPC. These writes repeatedly invoke nfsd_reset_boot_verifier() in quick succession.
Ensure that the time it grabs doesn't go backwards due to an ntp adjustment going on at the same time.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfssvc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 8554bc7ff4322..4d1d8aa6d7f9d 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -363,7 +363,7 @@ void nfsd_copy_boot_verifier(__be32 verf[2], struct nfsd_net *nn)
static void nfsd_reset_boot_verifier_locked(struct nfsd_net *nn) { - ktime_get_real_ts64(&nn->nfssvc_boot); + ktime_get_raw_ts64(&nn->nfssvc_boot); }
void nfsd_reset_boot_verifier(struct nfsd_net *nn)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 91d2e9b56cf5c80f9efc530d494968369a8a0e0d ]
There are two boot-time fields in struct nfsd_net: one called boot_time and one called nfssvc_boot. The latter is used only to form write verifiers, but its documenting comment declares:
/* Time of server startup */
Since commit 27c438f53e79 ("nfsd: Support the server resetting the boot verifier"), this field can be reset at any time; it's no longer tied to server restart. So that comment is stale.
Also, according to pahole, struct timespec64 is 16 bytes long on x86_64. The nfssvc_boot field is used only to form a write verifier, which is 8 bytes long.
Let's clarify this situation by manufacturing an 8-byte verifier in nfs_reset_boot_verifier() and storing only that in struct nfsd_net.
We're grabbing 128 bits of time, so compress all of those into a 64-bit verifier instead of throwing out the high-order bits. In the future, the siphash_key can be re-used for other hashed objects per-nfsd_net.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/netns.h | 8 +++++--- fs/nfsd/nfsctl.c | 3 ++- fs/nfsd/nfssvc.c | 51 ++++++++++++++++++++++++++++++++++++------------ 3 files changed, 45 insertions(+), 17 deletions(-)
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h index 9e8b77d2a3a47..a6ed300259849 100644 --- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h @@ -11,6 +11,7 @@ #include <net/net_namespace.h> #include <net/netns/generic.h> #include <linux/percpu_counter.h> +#include <linux/siphash.h>
/* Hash tables for nfs4_clientid state */ #define CLIENT_HASH_BITS 4 @@ -108,9 +109,8 @@ struct nfsd_net { bool nfsd_net_up; bool lockd_up;
- /* Time of server startup */ - struct timespec64 nfssvc_boot; - seqlock_t boot_lock; + seqlock_t writeverf_lock; + unsigned char writeverf[8];
/* * Max number of connections this nfsd container will allow. Defaults @@ -187,6 +187,8 @@ struct nfsd_net { char nfsd_name[UNX_MAXNODENAME+1];
struct nfsd_fcache_disposal *fcache_disposal; + + siphash_key_t siphash_key; };
/* Simple check to find out if a given net was properly initialized */ diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 504b169d27881..68b020f2002b7 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -1484,7 +1484,8 @@ static __net_init int nfsd_init_net(struct net *net) nn->clientid_counter = nn->clientid_base + 1; nn->s2s_cp_cl_id = nn->clientid_counter++;
- seqlock_init(&nn->boot_lock); + get_random_bytes(&nn->siphash_key, sizeof(nn->siphash_key)); + seqlock_init(&nn->writeverf_lock);
return 0;
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 4d1d8aa6d7f9d..5a60664695352 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -12,6 +12,7 @@ #include <linux/module.h> #include <linux/fs_struct.h> #include <linux/swap.h> +#include <linux/siphash.h>
#include <linux/sunrpc/stats.h> #include <linux/sunrpc/svcsock.h> @@ -344,33 +345,57 @@ static bool nfsd_needs_lockd(struct nfsd_net *nn) return nfsd_vers(nn, 2, NFSD_TEST) || nfsd_vers(nn, 3, NFSD_TEST); }
+/** + * nfsd_copy_boot_verifier - Atomically copy a write verifier + * @verf: buffer in which to receive the verifier cookie + * @nn: NFS net namespace + * + * This function provides a wait-free mechanism for copying the + * namespace's boot verifier without tearing it. + */ void nfsd_copy_boot_verifier(__be32 verf[2], struct nfsd_net *nn) { int seq = 0;
do { - read_seqbegin_or_lock(&nn->boot_lock, &seq); - /* - * This is opaque to client, so no need to byte-swap. Use - * __force to keep sparse happy. y2038 time_t overflow is - * irrelevant in this usage - */ - verf[0] = (__force __be32)nn->nfssvc_boot.tv_sec; - verf[1] = (__force __be32)nn->nfssvc_boot.tv_nsec; - } while (need_seqretry(&nn->boot_lock, seq)); - done_seqretry(&nn->boot_lock, seq); + read_seqbegin_or_lock(&nn->writeverf_lock, &seq); + memcpy(verf, nn->writeverf, sizeof(*verf)); + } while (need_seqretry(&nn->writeverf_lock, seq)); + done_seqretry(&nn->writeverf_lock, seq); }
static void nfsd_reset_boot_verifier_locked(struct nfsd_net *nn) { - ktime_get_raw_ts64(&nn->nfssvc_boot); + struct timespec64 now; + u64 verf; + + /* + * Because the time value is hashed, y2038 time_t overflow + * is irrelevant in this usage. + */ + ktime_get_raw_ts64(&now); + verf = siphash_2u64(now.tv_sec, now.tv_nsec, &nn->siphash_key); + memcpy(nn->writeverf, &verf, sizeof(nn->writeverf)); }
+/** + * nfsd_reset_boot_verifier - Generate a new boot verifier + * @nn: NFS net namespace + * + * This function updates the ->writeverf field of @nn. This field + * contains an opaque cookie that, according to Section 18.32.3 of + * RFC 8881, "the client can use to determine whether a server has + * changed instance state (e.g., server restart) between a call to + * WRITE and a subsequent call to either WRITE or COMMIT. This + * cookie MUST be unchanged during a single instance of the NFSv4.1 + * server and MUST be unique between instances of the NFSv4.1 + * server." + */ void nfsd_reset_boot_verifier(struct nfsd_net *nn) { - write_seqlock(&nn->boot_lock); + write_seqlock(&nn->writeverf_lock); nfsd_reset_boot_verifier_locked(nn); - write_sequnlock(&nn->boot_lock); + write_sequnlock(&nn->writeverf_lock); }
static int nfsd_startup_net(struct net *net, const struct cred *cred)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 3988a57885eeac05ef89f0ab4d7e47b52fbcf630 ]
Clean up: These functions handle what the specs call a write verifier, which in the Linux NFS server implementation is now divorced from the server's boot instance
[ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 2 +- fs/nfsd/netns.h | 4 ++-- fs/nfsd/nfs4proc.c | 2 +- fs/nfsd/nfssvc.c | 16 ++++++++-------- fs/nfsd/vfs.c | 16 ++++++++-------- 5 files changed, 20 insertions(+), 20 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index b6ef8256c9c64..cc2831cec6695 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -243,7 +243,7 @@ nfsd_file_do_unhash(struct nfsd_file *nf) trace_nfsd_file_unhash(nf);
if (nfsd_file_check_write_error(nf)) - nfsd_reset_boot_verifier(net_generic(nf->nf_net, nfsd_net_id)); + nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id)); --nfsd_file_hashtbl[nf->nf_hashval].nfb_count; hlist_del_rcu(&nf->nf_node); atomic_long_dec(&nfsd_filecache_count); diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h index a6ed300259849..1b1a962a18041 100644 --- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h @@ -198,6 +198,6 @@ extern void nfsd_netns_free_versions(struct nfsd_net *nn);
extern unsigned int nfsd_net_id;
-void nfsd_copy_boot_verifier(__be32 verf[2], struct nfsd_net *nn); -void nfsd_reset_boot_verifier(struct nfsd_net *nn); +void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn); +void nfsd_reset_write_verifier(struct nfsd_net *nn); #endif /* __NFSD_NETNS_H__ */ diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 0f2025b7a6415..e8ffaa7faced9 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -598,7 +598,7 @@ static void gen_boot_verifier(nfs4_verifier *verifier, struct net *net)
BUILD_BUG_ON(2*sizeof(*verf) != sizeof(verifier->data));
- nfsd_copy_boot_verifier(verf, net_generic(net, nfsd_net_id)); + nfsd_copy_write_verifier(verf, net_generic(net, nfsd_net_id)); }
static __be32 diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 5a60664695352..2efe9d33a2827 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -346,14 +346,14 @@ static bool nfsd_needs_lockd(struct nfsd_net *nn) }
/** - * nfsd_copy_boot_verifier - Atomically copy a write verifier + * nfsd_copy_write_verifier - Atomically copy a write verifier * @verf: buffer in which to receive the verifier cookie * @nn: NFS net namespace * * This function provides a wait-free mechanism for copying the - * namespace's boot verifier without tearing it. + * namespace's write verifier without tearing it. */ -void nfsd_copy_boot_verifier(__be32 verf[2], struct nfsd_net *nn) +void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn) { int seq = 0;
@@ -364,7 +364,7 @@ void nfsd_copy_boot_verifier(__be32 verf[2], struct nfsd_net *nn) done_seqretry(&nn->writeverf_lock, seq); }
-static void nfsd_reset_boot_verifier_locked(struct nfsd_net *nn) +static void nfsd_reset_write_verifier_locked(struct nfsd_net *nn) { struct timespec64 now; u64 verf; @@ -379,7 +379,7 @@ static void nfsd_reset_boot_verifier_locked(struct nfsd_net *nn) }
/** - * nfsd_reset_boot_verifier - Generate a new boot verifier + * nfsd_reset_write_verifier - Generate a new write verifier * @nn: NFS net namespace * * This function updates the ->writeverf field of @nn. This field @@ -391,10 +391,10 @@ static void nfsd_reset_boot_verifier_locked(struct nfsd_net *nn) * server and MUST be unique between instances of the NFSv4.1 * server." */ -void nfsd_reset_boot_verifier(struct nfsd_net *nn) +void nfsd_reset_write_verifier(struct nfsd_net *nn) { write_seqlock(&nn->writeverf_lock); - nfsd_reset_boot_verifier_locked(nn); + nfsd_reset_write_verifier_locked(nn); write_sequnlock(&nn->writeverf_lock); }
@@ -683,7 +683,7 @@ int nfsd_create_serv(struct net *net) register_inet6addr_notifier(&nfsd_inet6addr_notifier); #endif } - nfsd_reset_boot_verifier(nn); + nfsd_reset_write_verifier(nn); return 0; }
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index b1ce38c642cde..8cf053b698314 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -572,8 +572,8 @@ __be32 nfsd4_clone_file_range(struct svc_rqst *rqstp, &nfsd4_get_cstate(rqstp)->current_fh, dst_pos, count, status); - nfsd_reset_boot_verifier(net_generic(nf_dst->nf_net, - nfsd_net_id)); + nfsd_reset_write_verifier(net_generic(nf_dst->nf_net, + nfsd_net_id)); ret = nfserrno(status); } } @@ -1039,10 +1039,10 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf, iov_iter_kvec(&iter, WRITE, vec, vlen, *cnt); since = READ_ONCE(file->f_wb_err); if (verf) - nfsd_copy_boot_verifier(verf, nn); + nfsd_copy_write_verifier(verf, nn); host_err = vfs_iter_write(file, &iter, &pos, flags); if (host_err < 0) { - nfsd_reset_boot_verifier(nn); + nfsd_reset_write_verifier(nn); goto out_nfserr; } *cnt = host_err; @@ -1055,7 +1055,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf, if (stable && use_wgather) { host_err = wait_for_concurrent_writes(file); if (host_err < 0) - nfsd_reset_boot_verifier(nn); + nfsd_reset_write_verifier(nn); }
out_nfserr: @@ -1168,7 +1168,7 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, err2 = vfs_fsync_range(nf->nf_file, offset, end, 0); switch (err2) { case 0: - nfsd_copy_boot_verifier(verf, nn); + nfsd_copy_write_verifier(verf, nn); err2 = filemap_check_wb_err(nf->nf_file->f_mapping, since); err = nfserrno(err2); @@ -1177,11 +1177,11 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, err = nfserr_notsupp; break; default: - nfsd_reset_boot_verifier(nn); + nfsd_reset_write_verifier(nn); err = nfserrno(err2); } } else - nfsd_copy_boot_verifier(verf, nn); + nfsd_copy_write_verifier(verf, nn);
nfsd_file_put(nf); out:
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 75acacb6583df0b9328dc701d8eeea05af49b8b5 ]
According to commit bbf2f098838a ("nfsd: Reset the boot verifier on all write I/O errors"), the Linux NFS server forces all clients to resend pending unstable writes if any server-side write or commit operation encounters an error (say, ENOSPC). This is a rare and quite exceptional event that could require administrative recovery action, so it should be made trace-able. Example trace event:
nfsd-938 [002] 7174.945558: nfsd_writeverf_reset: boot_time= 61cc920d xid=0xdcd62036 error=-28 new verifier=0x08aecc6142515904
[ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/trace.h | 28 ++++++++++++++++++++++++++++ fs/nfsd/vfs.c | 13 ++++++++++--- 2 files changed, 38 insertions(+), 3 deletions(-)
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 199b485c77179..8327d1601d710 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -575,6 +575,34 @@ DEFINE_EVENT(nfsd_net_class, nfsd_##name, \ DEFINE_NET_EVENT(grace_start); DEFINE_NET_EVENT(grace_complete);
+TRACE_EVENT(nfsd_writeverf_reset, + TP_PROTO( + const struct nfsd_net *nn, + const struct svc_rqst *rqstp, + int error + ), + TP_ARGS(nn, rqstp, error), + TP_STRUCT__entry( + __field(unsigned long long, boot_time) + __field(u32, xid) + __field(int, error) + __array(unsigned char, verifier, NFS4_VERIFIER_SIZE) + ), + TP_fast_assign( + __entry->boot_time = nn->boot_time; + __entry->xid = be32_to_cpu(rqstp->rq_xid); + __entry->error = error; + + /* avoid seqlock inside TP_fast_assign */ + memcpy(__entry->verifier, nn->writeverf, + NFS4_VERIFIER_SIZE); + ), + TP_printk("boot_time=%16llx xid=0x%08x error=%d new verifier=0x%s", + __entry->boot_time, __entry->xid, __entry->error, + __print_hex_str(__entry->verifier, NFS4_VERIFIER_SIZE) + ) +); + TRACE_EVENT(nfsd_clid_cred_mismatch, TP_PROTO( const struct nfs4_client *clp, diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 8cf053b698314..77f48779210d0 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -566,14 +566,17 @@ __be32 nfsd4_clone_file_range(struct svc_rqst *rqstp, if (!status) status = commit_inode_metadata(file_inode(src)); if (status < 0) { + struct nfsd_net *nn = net_generic(nf_dst->nf_net, + nfsd_net_id); + trace_nfsd_clone_file_range_err(rqstp, &nfsd4_get_cstate(rqstp)->save_fh, src_pos, &nfsd4_get_cstate(rqstp)->current_fh, dst_pos, count, status); - nfsd_reset_write_verifier(net_generic(nf_dst->nf_net, - nfsd_net_id)); + nfsd_reset_write_verifier(nn); + trace_nfsd_writeverf_reset(nn, rqstp, status); ret = nfserrno(status); } } @@ -1043,6 +1046,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf, host_err = vfs_iter_write(file, &iter, &pos, flags); if (host_err < 0) { nfsd_reset_write_verifier(nn); + trace_nfsd_writeverf_reset(nn, rqstp, host_err); goto out_nfserr; } *cnt = host_err; @@ -1054,8 +1058,10 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf,
if (stable && use_wgather) { host_err = wait_for_concurrent_writes(file); - if (host_err < 0) + if (host_err < 0) { nfsd_reset_write_verifier(nn); + trace_nfsd_writeverf_reset(nn, rqstp, host_err); + } }
out_nfserr: @@ -1178,6 +1184,7 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, break; default: nfsd_reset_write_verifier(nn); + trace_nfsd_writeverf_reset(nn, rqstp, err2); err = nfserrno(err2); } } else
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 58f258f65267542959487dbe8b5641754411843d ]
On the wire, I observed NFSv4 OPEN(CREATE) operations sometimes returning a reasonable-looking value in the cinfo.before field and zero in the cinfo.after field.
RFC 8881 Section 10.8.1 says:
When a client is making changes to a given directory, it needs to determine whether there have been changes made to the directory by other clients. It does this by using the change attribute as reported before and after the directory operation in the associated change_info4 value returned for the operation.
and
... The post-operation change value needs to be saved as the basis for future change_info4 comparisons.
A good quality client implementation therefore saves the zero cinfo.after value. During a subsequent OPEN operation, it will receive a different non-zero value in the cinfo.before field for that directory, and it will incorrectly believe the directory has changed, triggering an undesirable directory cache invalidation.
There are filesystem types where fs_supports_change_attribute() returns false, tmpfs being one. On NFSv4 mounts, this means the fh_getattr() call site in fill_pre_wcc() and fill_post_wcc() is never invoked. Subsequently, nfsd4_change_attribute() is invoked with an uninitialized @stat argument.
In fill_pre_wcc(), @stat contains stale stack garbage, which is then placed on the wire. In fill_post_wcc(), ->fh_post_wc is all zeroes, so zero is placed on the wire. Both of these values are meaningless.
This fix can be applied immediately to stable kernels. Once there are more regression tests in this area, this optimization can be attempted again.
Fixes: 428a23d2bf0c ("nfsd: skip some unnecessary stats in the v4 case") Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3xdr.c | 44 +++++++++++++++++--------------------------- 1 file changed, 17 insertions(+), 27 deletions(-)
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index c3ac1b6aa3aaa..84088581bbe09 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -487,11 +487,6 @@ svcxdr_encode_wcc_data(struct svc_rqst *rqstp, struct xdr_stream *xdr, return true; }
-static bool fs_supports_change_attribute(struct super_block *sb) -{ - return sb->s_flags & SB_I_VERSION || sb->s_export_op->fetch_iversion; -} - /* * Fill in the pre_op attr for the wcc data */ @@ -500,26 +495,24 @@ void fill_pre_wcc(struct svc_fh *fhp) struct inode *inode; struct kstat stat; bool v4 = (fhp->fh_maxsize == NFS4_FHSIZE); + __be32 err;
if (fhp->fh_no_wcc || fhp->fh_pre_saved) return; inode = d_inode(fhp->fh_dentry); - if (fs_supports_change_attribute(inode->i_sb) || !v4) { - __be32 err = fh_getattr(fhp, &stat); - - if (err) { - /* Grab the times from inode anyway */ - stat.mtime = inode->i_mtime; - stat.ctime = inode->i_ctime; - stat.size = inode->i_size; - } - fhp->fh_pre_mtime = stat.mtime; - fhp->fh_pre_ctime = stat.ctime; - fhp->fh_pre_size = stat.size; + err = fh_getattr(fhp, &stat); + if (err) { + /* Grab the times from inode anyway */ + stat.mtime = inode->i_mtime; + stat.ctime = inode->i_ctime; + stat.size = inode->i_size; } if (v4) fhp->fh_pre_change = nfsd4_change_attribute(&stat, inode);
+ fhp->fh_pre_mtime = stat.mtime; + fhp->fh_pre_ctime = stat.ctime; + fhp->fh_pre_size = stat.size; fhp->fh_pre_saved = true; }
@@ -530,6 +523,7 @@ void fill_post_wcc(struct svc_fh *fhp) { bool v4 = (fhp->fh_maxsize == NFS4_FHSIZE); struct inode *inode = d_inode(fhp->fh_dentry); + __be32 err;
if (fhp->fh_no_wcc) return; @@ -537,16 +531,12 @@ void fill_post_wcc(struct svc_fh *fhp) if (fhp->fh_post_saved) printk("nfsd: inode locked twice during operation.\n");
- fhp->fh_post_saved = true; - - if (fs_supports_change_attribute(inode->i_sb) || !v4) { - __be32 err = fh_getattr(fhp, &fhp->fh_post_attr); - - if (err) { - fhp->fh_post_saved = false; - fhp->fh_post_attr.ctime = inode->i_ctime; - } - } + err = fh_getattr(fhp, &fhp->fh_post_attr); + if (err) { + fhp->fh_post_saved = false; + fhp->fh_post_attr.ctime = inode->i_ctime; + } else + fhp->fh_post_saved = true; if (v4) fhp->fh_post_change = nfsd4_change_attribute(&fhp->fh_post_attr, inode);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit fcb5e3fa012351f3b96024c07bc44834c2478213 ]
These functions are related to file handle processing and have nothing to do with XDR encoding or decoding. Also they are no longer NFSv3-specific. As a clean-up, move their definitions to a more appropriate location. WCC is also an NFSv3-specific term, so rename them as general-purpose helpers.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3xdr.c | 55 -------------------------------------- fs/nfsd/nfs4proc.c | 2 +- fs/nfsd/nfsfh.c | 66 +++++++++++++++++++++++++++++++++++++++++++++- fs/nfsd/nfsfh.h | 40 ++++++++++++++++++---------- fs/nfsd/vfs.c | 8 +++--- 5 files changed, 96 insertions(+), 75 deletions(-)
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 84088581bbe09..7c45ba4db61be 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -487,61 +487,6 @@ svcxdr_encode_wcc_data(struct svc_rqst *rqstp, struct xdr_stream *xdr, return true; }
-/* - * Fill in the pre_op attr for the wcc data - */ -void fill_pre_wcc(struct svc_fh *fhp) -{ - struct inode *inode; - struct kstat stat; - bool v4 = (fhp->fh_maxsize == NFS4_FHSIZE); - __be32 err; - - if (fhp->fh_no_wcc || fhp->fh_pre_saved) - return; - inode = d_inode(fhp->fh_dentry); - err = fh_getattr(fhp, &stat); - if (err) { - /* Grab the times from inode anyway */ - stat.mtime = inode->i_mtime; - stat.ctime = inode->i_ctime; - stat.size = inode->i_size; - } - if (v4) - fhp->fh_pre_change = nfsd4_change_attribute(&stat, inode); - - fhp->fh_pre_mtime = stat.mtime; - fhp->fh_pre_ctime = stat.ctime; - fhp->fh_pre_size = stat.size; - fhp->fh_pre_saved = true; -} - -/* - * Fill in the post_op attr for the wcc data - */ -void fill_post_wcc(struct svc_fh *fhp) -{ - bool v4 = (fhp->fh_maxsize == NFS4_FHSIZE); - struct inode *inode = d_inode(fhp->fh_dentry); - __be32 err; - - if (fhp->fh_no_wcc) - return; - - if (fhp->fh_post_saved) - printk("nfsd: inode locked twice during operation.\n"); - - err = fh_getattr(fhp, &fhp->fh_post_attr); - if (err) { - fhp->fh_post_saved = false; - fhp->fh_post_attr.ctime = inode->i_ctime; - } else - fhp->fh_post_saved = true; - if (v4) - fhp->fh_post_change = - nfsd4_change_attribute(&fhp->fh_post_attr, inode); -} - /* * XDR decode functions */ diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index e8ffaa7faced9..451190813302e 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -2523,7 +2523,7 @@ nfsd4_proc_compound(struct svc_rqst *rqstp) goto encode_op; }
- fh_clear_wcc(current_fh); + fh_clear_pre_post_attrs(current_fh);
/* If op is non-idempotent */ if (op->opdesc->op_flags & OP_MODIFIES_SOMETHING) { diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c index 34e201b6eb623..3b9751555f8f2 100644 --- a/fs/nfsd/nfsfh.c +++ b/fs/nfsd/nfsfh.c @@ -610,6 +610,70 @@ fh_update(struct svc_fh *fhp) return nfserr_serverfault; }
+#ifdef CONFIG_NFSD_V3 + +/** + * fh_fill_pre_attrs - Fill in pre-op attributes + * @fhp: file handle to be updated + * + */ +void fh_fill_pre_attrs(struct svc_fh *fhp) +{ + bool v4 = (fhp->fh_maxsize == NFS4_FHSIZE); + struct inode *inode; + struct kstat stat; + __be32 err; + + if (fhp->fh_no_wcc || fhp->fh_pre_saved) + return; + + inode = d_inode(fhp->fh_dentry); + err = fh_getattr(fhp, &stat); + if (err) { + /* Grab the times from inode anyway */ + stat.mtime = inode->i_mtime; + stat.ctime = inode->i_ctime; + stat.size = inode->i_size; + } + if (v4) + fhp->fh_pre_change = nfsd4_change_attribute(&stat, inode); + + fhp->fh_pre_mtime = stat.mtime; + fhp->fh_pre_ctime = stat.ctime; + fhp->fh_pre_size = stat.size; + fhp->fh_pre_saved = true; +} + +/** + * fh_fill_post_attrs - Fill in post-op attributes + * @fhp: file handle to be updated + * + */ +void fh_fill_post_attrs(struct svc_fh *fhp) +{ + bool v4 = (fhp->fh_maxsize == NFS4_FHSIZE); + struct inode *inode = d_inode(fhp->fh_dentry); + __be32 err; + + if (fhp->fh_no_wcc) + return; + + if (fhp->fh_post_saved) + printk("nfsd: inode locked twice during operation.\n"); + + err = fh_getattr(fhp, &fhp->fh_post_attr); + if (err) { + fhp->fh_post_saved = false; + fhp->fh_post_attr.ctime = inode->i_ctime; + } else + fhp->fh_post_saved = true; + if (v4) + fhp->fh_post_change = + nfsd4_change_attribute(&fhp->fh_post_attr, inode); +} + +#endif /* CONFIG_NFSD_V3 */ + /* * Release a file handle. */ @@ -622,7 +686,7 @@ fh_put(struct svc_fh *fhp) fh_unlock(fhp); fhp->fh_dentry = NULL; dput(dentry); - fh_clear_wcc(fhp); + fh_clear_pre_post_attrs(fhp); } fh_drop_write(fhp); if (exp) { diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h index d11e4b6870d68..434930d8a946e 100644 --- a/fs/nfsd/nfsfh.h +++ b/fs/nfsd/nfsfh.h @@ -284,12 +284,13 @@ static inline u32 knfsd_fh_hash(const struct knfsd_fh *fh) #endif
#ifdef CONFIG_NFSD_V3 -/* - * The wcc data stored in current_fh should be cleared - * between compound ops. + +/** + * fh_clear_pre_post_attrs - Reset pre/post attributes + * @fhp: file handle to be updated + * */ -static inline void -fh_clear_wcc(struct svc_fh *fhp) +static inline void fh_clear_pre_post_attrs(struct svc_fh *fhp) { fhp->fh_post_saved = false; fhp->fh_pre_saved = false; @@ -323,13 +324,24 @@ static inline u64 nfsd4_change_attribute(struct kstat *stat, return time_to_chattr(&stat->ctime); }
-extern void fill_pre_wcc(struct svc_fh *fhp); -extern void fill_post_wcc(struct svc_fh *fhp); -#else -#define fh_clear_wcc(ignored) -#define fill_pre_wcc(ignored) -#define fill_post_wcc(notused) -#endif /* CONFIG_NFSD_V3 */ +extern void fh_fill_pre_attrs(struct svc_fh *fhp); +extern void fh_fill_post_attrs(struct svc_fh *fhp); + +#else /* !CONFIG_NFSD_V3 */ + +static inline void fh_clear_pre_post_attrs(struct svc_fh *fhp) +{ +} + +static inline void fh_fill_pre_attrs(struct svc_fh *fhp) +{ +} + +static inline void fh_fill_post_attrs(struct svc_fh *fhp) +{ +} + +#endif /* !CONFIG_NFSD_V3 */
/* @@ -355,7 +367,7 @@ fh_lock_nested(struct svc_fh *fhp, unsigned int subclass)
inode = d_inode(dentry); inode_lock_nested(inode, subclass); - fill_pre_wcc(fhp); + fh_fill_pre_attrs(fhp); fhp->fh_locked = true; }
@@ -372,7 +384,7 @@ static inline void fh_unlock(struct svc_fh *fhp) { if (fhp->fh_locked) { - fill_post_wcc(fhp); + fh_fill_post_attrs(fhp); inode_unlock(d_inode(fhp->fh_dentry)); fhp->fh_locked = false; } diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 77f48779210d0..d4b6bc3b4d735 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1787,8 +1787,8 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen, * so do it by hand */ trap = lock_rename(tdentry, fdentry); ffhp->fh_locked = tfhp->fh_locked = true; - fill_pre_wcc(ffhp); - fill_pre_wcc(tfhp); + fh_fill_pre_attrs(ffhp); + fh_fill_pre_attrs(tfhp);
odentry = lookup_one_len(fname, fdentry, flen); host_err = PTR_ERR(odentry); @@ -1840,8 +1840,8 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen, * were the same, so again we do it by hand. */ if (!close_cached) { - fill_post_wcc(ffhp); - fill_post_wcc(tfhp); + fh_fill_post_attrs(ffhp); + fh_fill_post_attrs(tfhp); } unlock_rename(tdentry, fdentry); ffhp->fh_locked = tfhp->fh_locked = false;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit 074b07d94e0bb6ddce5690a9b7e2373088e8b33a ]
RTM says "If the special ONE stateid is passed to nfs4_preprocess_stateid_op(), it returns status=0 but does not set *cstid. nfsd4_copy_notify() depends on stid being set if status=0, and thus can crash if the client sends the right COPY_NOTIFY RPC."
RFC 7862 says "The cna_src_stateid MUST refer to either open or locking states provided earlier by the server. If it is invalid, then the operation MUST fail."
The RFC doesn't specify an error, and the choice doesn't matter much as this is clearly illegal client behavior, but bad_stateid seems reasonable.
Simplest is just to guarantee that nfs4_preprocess_stateid_op, called with non-NULL cstid, errors out if it can't return a stateid.
Reported-by: rtm@csail.mit.edu Fixes: 624322f1adc5 ("NFSD add COPY_NOTIFY operation") Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Reviewed-by: Olga Kornievskaia kolga@netapp.com Tested-by: Olga Kornievskaia kolga@netapp.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 4161a4854c430..60d5d1cb2cc65 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -6097,7 +6097,11 @@ nfs4_preprocess_stateid_op(struct svc_rqst *rqstp, return nfserr_grace;
if (ZERO_STATEID(stateid) || ONE_STATEID(stateid)) { - status = check_special_stateids(net, fhp, stateid, flags); + if (cstid) + status = nfserr_bad_stateid; + else + status = check_special_stateids(net, fhp, stateid, + flags); goto done; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yang Li yang.lee@linux.alibaba.com
[ Upstream commit 217663f101a56ef77f82273818253fff082bf503 ]
The code that uses the pointer info has been removed in 7326e382c21e ("fanotify: report old and/or new parent+name in FAN_RENAME event"). and fanotify_event_info() doesn't change 'event', so the declaration and assignment of info can be removed.
Eliminate the following clang warning: fs/notify/fanotify/fanotify_user.c:161:24: warning: variable ‘info’ set but not used
Reported-by: Abaci Robot abaci@linux.alibaba.com Signed-off-by: Yang Li yang.lee@linux.alibaba.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify_user.c | 3 --- 1 file changed, 3 deletions(-)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 3bac2329dc35f..6679700574113 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -146,7 +146,6 @@ static size_t fanotify_event_len(unsigned int info_mode, struct fanotify_event *event) { size_t event_len = FAN_EVENT_METADATA_LEN; - struct fanotify_info *info; int fh_len; int dot_len = 0;
@@ -156,8 +155,6 @@ static size_t fanotify_event_len(unsigned int info_mode, if (fanotify_is_error_event(event->mask)) event_len += FANOTIFY_ERROR_INFO_LEN;
- info = fanotify_event_info(event); - if (fanotify_event_has_any_dir_fh(event)) { event_len += fanotify_dir_name_info_len(event); } else if ((info_mode & FAN_REPORT_NAME) &&
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit 6e7f90d163afa8fc2efd6ae318e7c20156a5621f ]
I thought I was iterating over the array when actually the iteration is over the values contained in the array?
Ugh, keep it simple.
Symptoms were a null deference in vfs_lock_file() when an NFSv3 client that previously held a lock came back up and sent a notify.
Reported-by: Jonathan Woithe jwoithe@just42.net Fixes: 7f024fcd5c97 ("Keep read and write fds with each nlm_file") Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svcsubs.c | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c index cb3a7512c33ec..54c2e42130ca2 100644 --- a/fs/lockd/svcsubs.c +++ b/fs/lockd/svcsubs.c @@ -179,19 +179,20 @@ nlm_delete_file(struct nlm_file *file) static int nlm_unlock_files(struct nlm_file *file) { struct file_lock lock; - struct file *f;
lock.fl_type = F_UNLCK; lock.fl_start = 0; lock.fl_end = OFFSET_MAX; - for (f = file->f_file[0]; f <= file->f_file[1]; f++) { - if (f && vfs_lock_file(f, F_SETLK, &lock, NULL) < 0) { - pr_warn("lockd: unlock failure in %s:%d\n", - __FILE__, __LINE__); - return 1; - } - } + if (file->f_file[O_RDONLY] && + vfs_lock_file(file->f_file[O_RDONLY], F_SETLK, &lock, NULL)) + goto out_err; + if (file->f_file[O_WRONLY] && + vfs_lock_file(file->f_file[O_WRONLY], F_SETLK, &lock, NULL)) + goto out_err; return 0; +out_err: + pr_warn("lockd: unlock failure in %s:%d\n", __FILE__, __LINE__); + return 1; }
/*
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit d19a7af73b5ecaac8168712d18be72b9db166768 ]
In my testing, we're sometimes hitting the request->fl_flags & FL_EXISTS case in posix_lock_inode, presumably just by random luck since we're not actually initializing fl_flags here.
This probably didn't matter before commit 7f024fcd5c97 ("Keep read and write fds with each nlm_file") since we wouldn't previously unlock unless we knew there were locks.
But now it causes lockd to give up on removing more locks.
We could just initialize fl_flags, but really it seems dubious to be calling vfs_lock_file with random values in some of the fields.
Fixes: 7f024fcd5c97 ("Keep read and write fds with each nlm_file") Signed-off-by: J. Bruce Fields bfields@redhat.com [ cel: fixed checkpatch.pl nit ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svcsubs.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c index 54c2e42130ca2..0a22a2faf5522 100644 --- a/fs/lockd/svcsubs.c +++ b/fs/lockd/svcsubs.c @@ -180,6 +180,7 @@ static int nlm_unlock_files(struct nlm_file *file) { struct file_lock lock;
+ locks_init_lock(&lock); lock.fl_type = F_UNLCK; lock.fl_start = 0; lock.fl_end = OFFSET_MAX;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 0cb4d23ae08c48f6bf3c29a8e5c4a74b8388b960 ]
Dan Aloni reports:
Due to commit 8cfb9015280d ("NFS: Always provide aligned buffers to the RPC read layers") on the client, a read of 0xfff is aligned up to server rsize of 0x1000.
As a result, in a test where the server has a file of size 0x7fffffffffffffff, and the client tries to read from the offset 0x7ffffffffffff000, the read causes loff_t overflow in the server and it returns an NFS code of EINVAL to the client. The client as a result indefinitely retries the request.
The Linux NFS client does not handle NFS?ERR_INVAL, even though all NFS specifications permit servers to return that status code for a READ.
Instead of NFS?ERR_INVAL, have out-of-range READ requests succeed and return a short result. Set the EOF flag in the result to prevent the client from retrying the READ request. This behavior appears to be consistent with Solaris NFS servers.
Note that NFSv3 and NFSv4 use u64 offset values on the wire. These must be converted to loff_t internally before use -- an implicit type cast is not adequate for this purpose. Otherwise VFS checks against sb->s_maxbytes do not work properly.
Reported-by: Dan Aloni dan.aloni@vastdata.com Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3proc.c | 8 ++++++-- fs/nfsd/nfs4proc.c | 8 ++++++-- fs/nfsd/nfs4xdr.c | 8 ++------ 3 files changed, 14 insertions(+), 10 deletions(-)
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 1515c32e08db2..b540489ea240d 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -150,13 +150,17 @@ nfsd3_proc_read(struct svc_rqst *rqstp) unsigned int len; int v;
- argp->count = min_t(u32, argp->count, max_blocksize); - dprintk("nfsd: READ(3) %s %lu bytes at %Lu\n", SVCFH_fmt(&argp->fh), (unsigned long) argp->count, (unsigned long long) argp->offset);
+ argp->count = min_t(u32, argp->count, max_blocksize); + if (argp->offset > (u64)OFFSET_MAX) + argp->offset = (u64)OFFSET_MAX; + if (argp->offset + argp->count > (u64)OFFSET_MAX) + argp->count = (u64)OFFSET_MAX - argp->offset; + v = 0; len = argp->count; resp->pages = rqstp->rq_next_page; diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 451190813302e..f3d6bd2bfa4f7 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -782,12 +782,16 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, __be32 status;
read->rd_nf = NULL; - if (read->rd_offset >= OFFSET_MAX) - return nfserr_inval;
trace_nfsd_read_start(rqstp, &cstate->current_fh, read->rd_offset, read->rd_length);
+ read->rd_length = min_t(u32, read->rd_length, svc_max_payload(rqstp)); + if (read->rd_offset > (u64)OFFSET_MAX) + read->rd_offset = (u64)OFFSET_MAX; + if (read->rd_offset + read->rd_length > (u64)OFFSET_MAX) + read->rd_length = (u64)OFFSET_MAX - read->rd_offset; + /* * If we do a zero copy read, then a client will see read data * that reflects the state of the file *after* performing the diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index adf97d72bda80..be0995bb9459a 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -3997,10 +3997,8 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr, } xdr_commit_encode(xdr);
- maxcount = svc_max_payload(resp->rqstp); - maxcount = min_t(unsigned long, maxcount, + maxcount = min_t(unsigned long, read->rd_length, (xdr->buf->buflen - xdr->buf->len)); - maxcount = min_t(unsigned long, maxcount, read->rd_length);
if (file->f_op->splice_read && test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags)) @@ -4834,10 +4832,8 @@ nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr, return nfserr_resource; xdr_commit_encode(xdr);
- maxcount = svc_max_payload(resp->rqstp); - maxcount = min_t(unsigned long, maxcount, + maxcount = min_t(unsigned long, read->rd_length, (xdr->buf->buflen - xdr->buf->len)); - maxcount = min_t(unsigned long, maxcount, read->rd_length); count = maxcount;
eof = read->rd_offset >= i_size_read(file_inode(file));
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit e6faac3f58c7c4176b66f63def17a34232a17b0e ]
iattr::ia_size is a loff_t, which is a signed 64-bit type. NFSv3 and NFSv4 both define file size as an unsigned 64-bit type. Thus there is a range of valid file size values an NFS client can send that is already larger than Linux can handle.
Currently decode_fattr4() dumps a full u64 value into ia_size. If that value happens to be larger than S64_MAX, then ia_size underflows. I'm about to fix up the NFSv3 behavior as well, so let's catch the underflow in the common code path: nfsd_setattr().
Cc: stable@vger.kernel.org [ cel: context adjusted, 2f221d6f7b88 has not been applied ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/vfs.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index d4b6bc3b4d735..09e4a0af6fb43 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -449,6 +449,10 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, struct iattr *iap, .ia_size = iap->ia_size, };
+ host_err = -EFBIG; + if (iap->ia_size < 0) + goto out_unlock; + host_err = notify_change(dentry, &size_attr, NULL); if (host_err) goto out_unlock;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit a648fdeb7c0e17177a2280344d015dba3fbe3314 ]
iattr::ia_size is a loff_t, so these NFSv3 procedures must be careful to deal with incoming client size values that are larger than s64_max without corrupting the value.
Silently capping the value results in storing a different value than the client passed in which is unexpected behavior, so remove the min_t() check in decode_sattr3().
Note that RFC 1813 permits only the WRITE procedure to return NFS3ERR_FBIG. We believe that NFSv3 reference implementations also return NFS3ERR_FBIG when ia_size is too large.
Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3xdr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 7c45ba4db61be..2e47a07029f1d 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -254,7 +254,7 @@ svcxdr_decode_sattr3(struct svc_rqst *rqstp, struct xdr_stream *xdr, if (xdr_stream_decode_u64(xdr, &newsize) < 0) return false; iap->ia_valid |= ATTR_SIZE; - iap->ia_size = min_t(u64, newsize, NFS_OFFSET_MAX); + iap->ia_size = newsize; } if (xdr_stream_decode_u32(xdr, &set_it) < 0) return false;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 3f965021c8bc38965ecb1924f570c4842b33d408 ]
Since, well, forever, the Linux NFS server's nfsd_commit() function has returned nfserr_inval when the passed-in byte range arguments were non-sensical.
However, according to RFC 1813 section 3.3.21, NFSv3 COMMIT requests are permitted to return only the following non-zero status codes:
NFS3ERR_IO NFS3ERR_STALE NFS3ERR_BADHANDLE NFS3ERR_SERVERFAULT
NFS3ERR_INVAL is not included in that list. Likewise, NFS4ERR_INVAL is not listed in the COMMIT row of Table 6 in RFC 8881.
RFC 7530 does permit COMMIT to return NFS4ERR_INVAL, but does not specify when it can or should be used.
Instead of dropping or failing a COMMIT request in a byte range that is not supported, turn it into a valid request by treating one or both arguments as zero. Offset zero means start-of-file, count zero means until-end-of-file, so we only ever extend the commit range. NFS servers are always allowed to commit more and sooner than requested.
The range check is no longer bounded by NFS_OFFSET_MAX, but rather by the value that is returned in the maxfilesize field of the NFSv3 FSINFO procedure or the NFSv4 maxfilesize file attribute.
Note that this change results in a new pynfs failure:
CMT4 st_commit.testCommitOverflow : RUNNING CMT4 st_commit.testCommitOverflow : FAILURE COMMIT with offset + count overflow should return NFS4ERR_INVAL, instead got NFS4_OK
IMO the test is not correct as written: RFC 8881 does not allow the COMMIT operation to return NFS4ERR_INVAL.
Reported-by: Dan Aloni dan.aloni@vastdata.com Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Reviewed-by: Bruce Fields bfields@fieldses.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3proc.c | 6 ------ fs/nfsd/vfs.c | 53 +++++++++++++++++++++++++++++++--------------- fs/nfsd/vfs.h | 4 ++-- 3 files changed, 38 insertions(+), 25 deletions(-)
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index b540489ea240d..936eebd4c56dc 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -660,15 +660,9 @@ nfsd3_proc_commit(struct svc_rqst *rqstp) argp->count, (unsigned long long) argp->offset);
- if (argp->offset > NFS_OFFSET_MAX) { - resp->status = nfserr_inval; - goto out; - } - fh_copy(&resp->fh, &argp->fh); resp->status = nfsd_commit(rqstp, &resp->fh, argp->offset, argp->count, resp->verf); -out: return rpc_success; }
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 09e4a0af6fb43..89c50ccedf4d3 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1140,42 +1140,61 @@ nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t offset, }
#ifdef CONFIG_NFSD_V3 -/* - * Commit all pending writes to stable storage. +/** + * nfsd_commit - Commit pending writes to stable storage + * @rqstp: RPC request being processed + * @fhp: NFS filehandle + * @offset: raw offset from beginning of file + * @count: raw count of bytes to sync + * @verf: filled in with the server's current write verifier * - * Note: we only guarantee that data that lies within the range specified - * by the 'offset' and 'count' parameters will be synced. + * Note: we guarantee that data that lies within the range specified + * by the 'offset' and 'count' parameters will be synced. The server + * is permitted to sync data that lies outside this range at the + * same time. * * Unfortunately we cannot lock the file to make sure we return full WCC * data to the client, as locking happens lower down in the filesystem. + * + * Return values: + * An nfsstat value in network byte order. */ __be32 -nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, - loff_t offset, unsigned long count, __be32 *verf) +nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, u64 offset, + u32 count, __be32 *verf) { + u64 maxbytes; + loff_t start, end; struct nfsd_net *nn; struct nfsd_file *nf; - loff_t end = LLONG_MAX; - __be32 err = nfserr_inval; - - if (offset < 0) - goto out; - if (count != 0) { - end = offset + (loff_t)count - 1; - if (end < offset) - goto out; - } + __be32 err;
err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_WRITE|NFSD_MAY_NOT_BREAK_LEASE, &nf); if (err) goto out; + + /* + * Convert the client-provided (offset, count) range to a + * (start, end) range. If the client-provided range falls + * outside the maximum file size of the underlying FS, + * clamp the sync range appropriately. + */ + start = 0; + end = LLONG_MAX; + maxbytes = (u64)fhp->fh_dentry->d_sb->s_maxbytes; + if (offset < maxbytes) { + start = offset; + if (count && (offset + count - 1 < maxbytes)) + end = offset + count - 1; + } + nn = net_generic(nf->nf_net, nfsd_net_id); if (EX_ISSYNC(fhp->fh_export)) { errseq_t since = READ_ONCE(nf->nf_file->f_wb_err); int err2;
- err2 = vfs_fsync_range(nf->nf_file, offset, end, 0); + err2 = vfs_fsync_range(nf->nf_file, start, end, 0); switch (err2) { case 0: nfsd_copy_write_verifier(verf, nn); diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h index 9f56dcb22ff72..2c43d10e3cab4 100644 --- a/fs/nfsd/vfs.h +++ b/fs/nfsd/vfs.h @@ -74,8 +74,8 @@ __be32 do_nfsd_create(struct svc_rqst *, struct svc_fh *, char *name, int len, struct iattr *attrs, struct svc_fh *res, int createmode, u32 *verifier, bool *truncp, bool *created); -__be32 nfsd_commit(struct svc_rqst *, struct svc_fh *, - loff_t, unsigned long, __be32 *verf); +__be32 nfsd_commit(struct svc_rqst *rqst, struct svc_fh *fhp, + u64 offset, u32 count, __be32 *verf); #endif /* CONFIG_NFSD_V3 */ #ifdef CONFIG_NFSD_V4 __be32 nfsd_getxattr(struct svc_rqst *rqstp, struct svc_fh *fhp,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit c306d737691ef84305d4ed0d302c63db2932f0bb ]
NFS_OFFSET_MAX was introduced way back in Linux v2.3.y before there was a kernel-wide OFFSET_MAX value. As a clean up, replace the last few uses of it with its generic equivalent, and get rid of it.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3xdr.c | 2 +- fs/nfsd/nfs4xdr.c | 2 +- include/linux/nfs.h | 8 -------- 3 files changed, 2 insertions(+), 10 deletions(-)
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 2e47a07029f1d..0293b8d65f10f 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -1060,7 +1060,7 @@ svcxdr_encode_entry3_common(struct nfsd3_readdirres *resp, const char *name, return false; /* cookie */ resp->cookie_offset = dirlist->len; - if (xdr_stream_encode_u64(xdr, NFS_OFFSET_MAX) < 0) + if (xdr_stream_encode_u64(xdr, OFFSET_MAX) < 0) return false;
return true; diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index be0995bb9459a..a7b9c3faf7a79 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -3495,7 +3495,7 @@ nfsd4_encode_dirent(void *ccdv, const char *name, int namlen, p = xdr_reserve_space(xdr, 3*4 + namlen); if (!p) goto fail; - p = xdr_encode_hyper(p, NFS_OFFSET_MAX); /* offset of next entry */ + p = xdr_encode_hyper(p, OFFSET_MAX); /* offset of next entry */ p = xdr_encode_array(p, name, namlen); /* name length & name */
nfserr = nfsd4_encode_dirent_fattr(xdr, cd, name, namlen); diff --git a/include/linux/nfs.h b/include/linux/nfs.h index 0dc7ad38a0da4..b06375e88e589 100644 --- a/include/linux/nfs.h +++ b/include/linux/nfs.h @@ -36,14 +36,6 @@ static inline void nfs_copy_fh(struct nfs_fh *target, const struct nfs_fh *sourc memcpy(target->data, source->data, source->size); }
- -/* - * This is really a general kernel constant, but since nothing like - * this is defined in the kernel headers, I have to do it here. - */ -#define NFS_OFFSET_MAX ((__s64)((~(__u64)0) >> 1)) - - enum nfs3_stable_how { NFS_UNSTABLE = 0, NFS_DATA_SYNC = 1,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ondrej Valousek ondrej.valousek.xm@renesas.com
[ Upstream commit e377a3e698fb56cb63f6bddbebe7da76dc37e316 ]
For filesystems that supports "btime" timestamp (i.e. most modern filesystems do) we share it via kernel nfsd. Btime support for NFS client has already been added by Trond recently.
Suggested-by: Bruce Fields bfields@fieldses.org Signed-off-by: Ondrej Valousek ondrej.valousek.xm@renesas.com [ cel: addressed some whitespace/checkpatch nits ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 10 ++++++++++ fs/nfsd/nfsd.h | 2 +- 2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index a7b9c3faf7a79..a9f6c7eeb756e 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2854,6 +2854,9 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, err = vfs_getattr(&path, &stat, STATX_BASIC_STATS, AT_STATX_SYNC_AS_STAT); if (err) goto out_nfserr; + if (!(stat.result_mask & STATX_BTIME)) + /* underlying FS does not offer btime so we can't share it */ + bmval1 &= ~FATTR4_WORD1_TIME_CREATE; if ((bmval0 & (FATTR4_WORD0_FILES_AVAIL | FATTR4_WORD0_FILES_FREE | FATTR4_WORD0_FILES_TOTAL | FATTR4_WORD0_MAXNAME)) || (bmval1 & (FATTR4_WORD1_SPACE_AVAIL | FATTR4_WORD1_SPACE_FREE | @@ -3254,6 +3257,13 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, p = xdr_encode_hyper(p, (s64)stat.mtime.tv_sec); *p++ = cpu_to_be32(stat.mtime.tv_nsec); } + if (bmval1 & FATTR4_WORD1_TIME_CREATE) { + p = xdr_reserve_space(xdr, 12); + if (!p) + goto out_resource; + p = xdr_encode_hyper(p, (s64)stat.btime.tv_sec); + *p++ = cpu_to_be32(stat.btime.tv_nsec); + } if (bmval1 & FATTR4_WORD1_MOUNTED_ON_FILEID) { struct kstat parent_stat; u64 ino = stat.ino; diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 3e5008b475ff0..4fc1fd639527a 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -364,7 +364,7 @@ void nfsd_lockd_shutdown(void); | FATTR4_WORD1_OWNER | FATTR4_WORD1_OWNER_GROUP | FATTR4_WORD1_RAWDEV \ | FATTR4_WORD1_SPACE_AVAIL | FATTR4_WORD1_SPACE_FREE | FATTR4_WORD1_SPACE_TOTAL \ | FATTR4_WORD1_SPACE_USED | FATTR4_WORD1_TIME_ACCESS | FATTR4_WORD1_TIME_ACCESS_SET \ - | FATTR4_WORD1_TIME_DELTA | FATTR4_WORD1_TIME_METADATA \ + | FATTR4_WORD1_TIME_DELTA | FATTR4_WORD1_TIME_METADATA | FATTR4_WORD1_TIME_CREATE \ | FATTR4_WORD1_TIME_MODIFY | FATTR4_WORD1_TIME_MODIFY_SET | FATTR4_WORD1_MOUNTED_ON_FILEID)
#define NFSD4_SUPPORTED_ATTRS_WORD2 0
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 378a6109dd142a678f629b740f558365150f60f9 ]
Clean up: The details of finding the right hash bucket are exactly the same in both nfsd_cache_lookup() and nfsd_cache_update().
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfscache.c | 22 ++++++++++------------ 1 file changed, 10 insertions(+), 12 deletions(-)
diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c index a4a69ab6ab280..f79790d367288 100644 --- a/fs/nfsd/nfscache.c +++ b/fs/nfsd/nfscache.c @@ -84,12 +84,6 @@ nfsd_hashsize(unsigned int limit) return roundup_pow_of_two(limit / TARGET_BUCKET_SIZE); }
-static u32 -nfsd_cache_hash(__be32 xid, struct nfsd_net *nn) -{ - return hash_32((__force u32)xid, nn->maskbits); -} - static struct svc_cacherep * nfsd_reply_cache_alloc(struct svc_rqst *rqstp, __wsum csum, struct nfsd_net *nn) @@ -241,6 +235,14 @@ lru_put_end(struct nfsd_drc_bucket *b, struct svc_cacherep *rp) list_move_tail(&rp->c_lru, &b->lru_head); }
+static noinline struct nfsd_drc_bucket * +nfsd_cache_bucket_find(__be32 xid, struct nfsd_net *nn) +{ + unsigned int hash = hash_32((__force u32)xid, nn->maskbits); + + return &nn->drc_hashtbl[hash]; +} + static long prune_bucket(struct nfsd_drc_bucket *b, struct nfsd_net *nn, unsigned int max) { @@ -421,10 +423,8 @@ int nfsd_cache_lookup(struct svc_rqst *rqstp) { struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id); struct svc_cacherep *rp, *found; - __be32 xid = rqstp->rq_xid; __wsum csum; - u32 hash = nfsd_cache_hash(xid, nn); - struct nfsd_drc_bucket *b = &nn->drc_hashtbl[hash]; + struct nfsd_drc_bucket *b = nfsd_cache_bucket_find(rqstp->rq_xid, nn); int type = rqstp->rq_cachetype; int rtn = RC_DOIT;
@@ -528,7 +528,6 @@ void nfsd_cache_update(struct svc_rqst *rqstp, int cachetype, __be32 *statp) struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id); struct svc_cacherep *rp = rqstp->rq_cacherep; struct kvec *resv = &rqstp->rq_res.head[0], *cachv; - u32 hash; struct nfsd_drc_bucket *b; int len; size_t bufsize = 0; @@ -536,8 +535,7 @@ void nfsd_cache_update(struct svc_rqst *rqstp, int cachetype, __be32 *statp) if (!rp) return;
- hash = nfsd_cache_hash(rp->c_key.k_xid, nn); - b = &nn->drc_hashtbl[hash]; + b = nfsd_cache_bucket_find(rp->c_key.k_xid, nn);
len = resv->iov_len - ((char*)statp - (char*)resv->iov_base); len >>= 2;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 0f29ce32fbc56cfdb304eec8a4deb920ccfd89c3 ]
Force the compiler to skip unneeded initialization for cases that don't need those values. For example, NFSv4 COMPOUND operations are RC_NOCACHE.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfscache.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c index f79790d367288..34087a7e4f93c 100644 --- a/fs/nfsd/nfscache.c +++ b/fs/nfsd/nfscache.c @@ -421,10 +421,10 @@ nfsd_cache_insert(struct nfsd_drc_bucket *b, struct svc_cacherep *key, */ int nfsd_cache_lookup(struct svc_rqst *rqstp) { - struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id); + struct nfsd_net *nn; struct svc_cacherep *rp, *found; __wsum csum; - struct nfsd_drc_bucket *b = nfsd_cache_bucket_find(rqstp->rq_xid, nn); + struct nfsd_drc_bucket *b; int type = rqstp->rq_cachetype; int rtn = RC_DOIT;
@@ -440,10 +440,12 @@ int nfsd_cache_lookup(struct svc_rqst *rqstp) * Since the common case is a cache miss followed by an insert, * preallocate an entry. */ + nn = net_generic(SVC_NET(rqstp), nfsd_net_id); rp = nfsd_reply_cache_alloc(rqstp, csum, nn); if (!rp) goto out;
+ b = nfsd_cache_bucket_find(rqstp->rq_xid, nn); spin_lock(&b->cache_lock); found = nfsd_cache_insert(b, rp, nn); if (found != rp) {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit add1511c38166cf1036765f8c4aa939f0275a799 ]
Move a rarely called function call site out of the hot path.
This is an exceptionally small improvement because the compiler inlines most of the functions that nfsd_cache_lookup() calls.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfscache.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c index 34087a7e4f93c..0b3f12aa37ff5 100644 --- a/fs/nfsd/nfscache.c +++ b/fs/nfsd/nfscache.c @@ -448,11 +448,8 @@ int nfsd_cache_lookup(struct svc_rqst *rqstp) b = nfsd_cache_bucket_find(rqstp->rq_xid, nn); spin_lock(&b->cache_lock); found = nfsd_cache_insert(b, rp, nn); - if (found != rp) { - nfsd_reply_cache_free_locked(NULL, rp, nn); - rp = found; + if (found != rp) goto found_entry; - }
nfsd_stats_rc_misses_inc(); rqstp->rq_cacherep = rp; @@ -470,8 +467,10 @@ int nfsd_cache_lookup(struct svc_rqst *rqstp)
found_entry: /* We found a matching entry which is either in progress or done. */ + nfsd_reply_cache_free_locked(NULL, rp, nn); nfsd_stats_rc_hits_inc(); rtn = RC_DROPIT; + rp = found;
/* Request being processed */ if (rp->c_state == RC_INPROG)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit a9ff2e99e9fa501ec965da03c18a5422b37a2f44 ]
We have never been able to track down and address the underlying cause of the performance issues with workqueue-based service support. svo_enqueue_xprt is called multiple times per RPC, so it adds instruction path length, but always ends up at the same function: svc_xprt_do_enqueue(). We do not anticipate needing this flexibility for dynamic nfsd thread management support.
As a micro-optimization, remove .svo_enqueue_xprt because Spectre/Meltdown makes virtual function calls more costly.
This change essentially reverts commit b9e13cdfac70 ("nfsd/sunrpc: turn enqueueing a svc_xprt into a svc_serv operation").
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svc.c | 1 - fs/nfs/callback.c | 2 -- fs/nfsd/nfssvc.c | 1 - include/linux/sunrpc/svc.h | 3 --- include/linux/sunrpc/svc_xprt.h | 1 - net/sunrpc/svc_xprt.c | 10 +++++----- 6 files changed, 5 insertions(+), 13 deletions(-)
diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index 0475c5a5d061e..3a05af8736259 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -353,7 +353,6 @@ static struct notifier_block lockd_inet6addr_notifier = { static const struct svc_serv_ops lockd_sv_ops = { .svo_shutdown = svc_rpcb_cleanup, .svo_function = lockd, - .svo_enqueue_xprt = svc_xprt_do_enqueue, .svo_module = THIS_MODULE, };
diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c index 054cc1255fac6..7a810f8850632 100644 --- a/fs/nfs/callback.c +++ b/fs/nfs/callback.c @@ -234,13 +234,11 @@ static int nfs_callback_up_net(int minorversion, struct svc_serv *serv,
static const struct svc_serv_ops nfs40_cb_sv_ops = { .svo_function = nfs4_callback_svc, - .svo_enqueue_xprt = svc_xprt_do_enqueue, .svo_module = THIS_MODULE, }; #if defined(CONFIG_NFS_V4_1) static const struct svc_serv_ops nfs41_cb_sv_ops = { .svo_function = nfs41_callback_svc, - .svo_enqueue_xprt = svc_xprt_do_enqueue, .svo_module = THIS_MODULE, };
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 2efe9d33a2827..3b79b97f2715d 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -615,7 +615,6 @@ static int nfsd_get_default_max_blksize(void) static const struct svc_serv_ops nfsd_thread_sv_ops = { .svo_shutdown = nfsd_last_thread, .svo_function = nfsd, - .svo_enqueue_xprt = svc_xprt_do_enqueue, .svo_module = THIS_MODULE, };
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index f116141ea64d0..3c8ed018c6868 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -61,9 +61,6 @@ struct svc_serv_ops { /* function for service threads to run */ int (*svo_function)(void *);
- /* queue up a transport for servicing */ - void (*svo_enqueue_xprt)(struct svc_xprt *); - /* optional module to count when adding threads. * Thread function must call module_put_and_kthread_exit() to exit. */ diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h index c5278871f9e40..1311425ddab7f 100644 --- a/include/linux/sunrpc/svc_xprt.h +++ b/include/linux/sunrpc/svc_xprt.h @@ -131,7 +131,6 @@ int svc_create_xprt(struct svc_serv *, const char *, struct net *, const int, const unsigned short, int, const struct cred *); void svc_xprt_received(struct svc_xprt *xprt); -void svc_xprt_do_enqueue(struct svc_xprt *xprt); void svc_xprt_enqueue(struct svc_xprt *xprt); void svc_xprt_put(struct svc_xprt *xprt); void svc_xprt_copy_addrs(struct svc_rqst *rqstp, struct svc_xprt *xprt); diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c index 570b092165d73..833952db23192 100644 --- a/net/sunrpc/svc_xprt.c +++ b/net/sunrpc/svc_xprt.c @@ -31,6 +31,7 @@ static int svc_deferred_recv(struct svc_rqst *rqstp); static struct cache_deferred_req *svc_defer(struct cache_req *req); static void svc_age_temp_xprts(struct timer_list *t); static void svc_delete_xprt(struct svc_xprt *xprt); +static void svc_xprt_do_enqueue(struct svc_xprt *xprt);
/* apparently the "standard" is that clients close * idle connections after 5 minutes, servers after @@ -253,12 +254,12 @@ void svc_xprt_received(struct svc_xprt *xprt) trace_svc_xprt_received(xprt);
/* As soon as we clear busy, the xprt could be closed and - * 'put', so we need a reference to call svc_enqueue_xprt with: + * 'put', so we need a reference to call svc_xprt_do_enqueue with: */ svc_xprt_get(xprt); smp_mb__before_atomic(); clear_bit(XPT_BUSY, &xprt->xpt_flags); - xprt->xpt_server->sv_ops->svo_enqueue_xprt(xprt); + svc_xprt_do_enqueue(xprt); svc_xprt_put(xprt); } EXPORT_SYMBOL_GPL(svc_xprt_received); @@ -410,7 +411,7 @@ static bool svc_xprt_ready(struct svc_xprt *xprt) return false; }
-void svc_xprt_do_enqueue(struct svc_xprt *xprt) +static void svc_xprt_do_enqueue(struct svc_xprt *xprt) { struct svc_pool *pool; struct svc_rqst *rqstp = NULL; @@ -454,7 +455,6 @@ void svc_xprt_do_enqueue(struct svc_xprt *xprt) put_cpu(); trace_svc_xprt_do_enqueue(xprt, rqstp); } -EXPORT_SYMBOL_GPL(svc_xprt_do_enqueue);
/* * Queue up a transport with data pending. If there are idle nfsd @@ -465,7 +465,7 @@ void svc_xprt_enqueue(struct svc_xprt *xprt) { if (test_bit(XPT_BUSY, &xprt->xpt_flags)) return; - xprt->xpt_server->sv_ops->svo_enqueue_xprt(xprt); + svc_xprt_do_enqueue(xprt); } EXPORT_SYMBOL_GPL(svc_xprt_enqueue);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit c0219c499799c1e92bd570c15a47e6257a27bb15 ]
Neil says: "These functions were separated in commit 0971374e2818 ("SUNRPC: Reduce contention in svc_xprt_enqueue()") so that the XPT_BUSY check happened before taking any spinlocks.
We have since moved or removed the spinlocks so the extra test is fairly pointless."
I've made this a separate patch in case the XPT_BUSY change has unexpected consequences and needs to be reverted.
Suggested-by: Neil Brown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/sunrpc/svc_xprt.c | 26 ++++++++++---------------- 1 file changed, 10 insertions(+), 16 deletions(-)
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c index 833952db23192..b5e80817b02f5 100644 --- a/net/sunrpc/svc_xprt.c +++ b/net/sunrpc/svc_xprt.c @@ -31,7 +31,6 @@ static int svc_deferred_recv(struct svc_rqst *rqstp); static struct cache_deferred_req *svc_defer(struct cache_req *req); static void svc_age_temp_xprts(struct timer_list *t); static void svc_delete_xprt(struct svc_xprt *xprt); -static void svc_xprt_do_enqueue(struct svc_xprt *xprt);
/* apparently the "standard" is that clients close * idle connections after 5 minutes, servers after @@ -254,12 +253,12 @@ void svc_xprt_received(struct svc_xprt *xprt) trace_svc_xprt_received(xprt);
/* As soon as we clear busy, the xprt could be closed and - * 'put', so we need a reference to call svc_xprt_do_enqueue with: + * 'put', so we need a reference to call svc_xprt_enqueue with: */ svc_xprt_get(xprt); smp_mb__before_atomic(); clear_bit(XPT_BUSY, &xprt->xpt_flags); - svc_xprt_do_enqueue(xprt); + svc_xprt_enqueue(xprt); svc_xprt_put(xprt); } EXPORT_SYMBOL_GPL(svc_xprt_received); @@ -399,6 +398,8 @@ static bool svc_xprt_ready(struct svc_xprt *xprt) smp_rmb(); xpt_flags = READ_ONCE(xprt->xpt_flags);
+ if (xpt_flags & BIT(XPT_BUSY)) + return false; if (xpt_flags & (BIT(XPT_CONN) | BIT(XPT_CLOSE))) return true; if (xpt_flags & (BIT(XPT_DATA) | BIT(XPT_DEFERRED))) { @@ -411,7 +412,12 @@ static bool svc_xprt_ready(struct svc_xprt *xprt) return false; }
-static void svc_xprt_do_enqueue(struct svc_xprt *xprt) +/** + * svc_xprt_enqueue - Queue a transport on an idle nfsd thread + * @xprt: transport with data pending + * + */ +void svc_xprt_enqueue(struct svc_xprt *xprt) { struct svc_pool *pool; struct svc_rqst *rqstp = NULL; @@ -455,18 +461,6 @@ static void svc_xprt_do_enqueue(struct svc_xprt *xprt) put_cpu(); trace_svc_xprt_do_enqueue(xprt, rqstp); } - -/* - * Queue up a transport with data pending. If there are idle nfsd - * processes, wake 'em up. - * - */ -void svc_xprt_enqueue(struct svc_xprt *xprt) -{ - if (test_bit(XPT_BUSY, &xprt->xpt_flags)) - return; - svc_xprt_do_enqueue(xprt); -} EXPORT_SYMBOL_GPL(svc_xprt_enqueue);
/*
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 87cdd8641c8a1ec6afd2468265e20840a57fd888 ]
Clean up. Neil observed that "any code that calls svc_shutdown_net() knows what the shutdown function should be, and so can call it directly."
Signed-off-by: Chuck Lever chuck.lever@oracle.com Reviewed-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svc.c | 5 ++--- fs/nfsd/nfssvc.c | 2 +- include/linux/sunrpc/svc.h | 3 --- net/sunrpc/svc.c | 3 --- 4 files changed, 3 insertions(+), 10 deletions(-)
diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index 3a05af8736259..f5b688a844aa5 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -249,6 +249,7 @@ static int make_socks(struct svc_serv *serv, struct net *net, printk(KERN_WARNING "lockd_up: makesock failed, error=%d\n", err); svc_shutdown_net(serv, net); + svc_rpcb_cleanup(serv, net); return err; }
@@ -287,8 +288,7 @@ static void lockd_down_net(struct svc_serv *serv, struct net *net) cancel_delayed_work_sync(&ln->grace_period_end); locks_end_grace(&ln->lockd_manager); svc_shutdown_net(serv, net); - dprintk("%s: per-net data destroyed; net=%x\n", - __func__, net->ns.inum); + svc_rpcb_cleanup(serv, net); } } else { pr_err("%s: no users! net=%x\n", @@ -351,7 +351,6 @@ static struct notifier_block lockd_inet6addr_notifier = { #endif
static const struct svc_serv_ops lockd_sv_ops = { - .svo_shutdown = svc_rpcb_cleanup, .svo_function = lockd, .svo_module = THIS_MODULE, }; diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 3b79b97f2715d..a1765e751b739 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -613,7 +613,6 @@ static int nfsd_get_default_max_blksize(void) }
static const struct svc_serv_ops nfsd_thread_sv_ops = { - .svo_shutdown = nfsd_last_thread, .svo_function = nfsd, .svo_module = THIS_MODULE, }; @@ -724,6 +723,7 @@ void nfsd_put(struct net *net)
if (kref_put(&nn->nfsd_serv->sv_refcnt, nfsd_noop)) { svc_shutdown_net(nn->nfsd_serv, net); + nfsd_last_thread(nn->nfsd_serv, net); svc_destroy(&nn->nfsd_serv->sv_refcnt); spin_lock(&nfsd_notifier_lock); nn->nfsd_serv = NULL; diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 3c8ed018c6868..6f3ba5c514643 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -55,9 +55,6 @@ struct svc_pool { struct svc_serv;
struct svc_serv_ops { - /* Callback to use when last thread exits. */ - void (*svo_shutdown)(struct svc_serv *, struct net *); - /* function for service threads to run */ int (*svo_function)(void *);
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index d34d03b0bf76b..709118bac4c32 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -537,9 +537,6 @@ EXPORT_SYMBOL_GPL(svc_create_pooled); void svc_shutdown_net(struct svc_serv *serv, struct net *net) { svc_close_net(serv, net); - - if (serv->sv_ops->svo_shutdown) - serv->sv_ops->svo_shutdown(serv, net); } EXPORT_SYMBOL_GPL(svc_shutdown_net);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 352ad31448fecc78a2e9b78da64eea5d63b8d0ce ]
Clean up: Use the "svc_xprt_<task>" function naming convention as is used for other external APIs.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svc.c | 4 ++-- fs/nfs/callback.c | 12 ++++++------ fs/nfsd/nfsctl.c | 8 ++++---- fs/nfsd/nfssvc.c | 8 ++++---- include/linux/sunrpc/svc_xprt.h | 7 ++++--- net/sunrpc/svc_xprt.c | 24 +++++++++++++++++++----- 6 files changed, 39 insertions(+), 24 deletions(-)
diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index f5b688a844aa5..bba6f2b45b64a 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -197,8 +197,8 @@ static int create_lockd_listener(struct svc_serv *serv, const char *name,
xprt = svc_find_xprt(serv, name, net, family, 0); if (xprt == NULL) - return svc_create_xprt(serv, name, net, family, port, - SVC_SOCK_DEFAULTS, cred); + return svc_xprt_create(serv, name, net, family, port, + SVC_SOCK_DEFAULTS, cred); svc_xprt_put(xprt); return 0; } diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c index 7a810f8850632..c1a8767100ae9 100644 --- a/fs/nfs/callback.c +++ b/fs/nfs/callback.c @@ -45,18 +45,18 @@ static int nfs4_callback_up_net(struct svc_serv *serv, struct net *net) int ret; struct nfs_net *nn = net_generic(net, nfs_net_id);
- ret = svc_create_xprt(serv, "tcp", net, PF_INET, - nfs_callback_set_tcpport, SVC_SOCK_ANONYMOUS, - cred); + ret = svc_xprt_create(serv, "tcp", net, PF_INET, + nfs_callback_set_tcpport, SVC_SOCK_ANONYMOUS, + cred); if (ret <= 0) goto out_err; nn->nfs_callback_tcpport = ret; dprintk("NFS: Callback listener port = %u (af %u, net %x)\n", nn->nfs_callback_tcpport, PF_INET, net->ns.inum);
- ret = svc_create_xprt(serv, "tcp", net, PF_INET6, - nfs_callback_set_tcpport, SVC_SOCK_ANONYMOUS, - cred); + ret = svc_xprt_create(serv, "tcp", net, PF_INET6, + nfs_callback_set_tcpport, SVC_SOCK_ANONYMOUS, + cred); if (ret > 0) { nn->nfs_callback_tcpport6 = ret; dprintk("NFS: Callback listener port = %u (af %u, net %x)\n", diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 68b020f2002b7..8fec779994f7b 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -772,13 +772,13 @@ static ssize_t __write_ports_addxprt(char *buf, struct net *net, const struct cr if (err != 0) return err;
- err = svc_create_xprt(nn->nfsd_serv, transport, net, - PF_INET, port, SVC_SOCK_ANONYMOUS, cred); + err = svc_xprt_create(nn->nfsd_serv, transport, net, + PF_INET, port, SVC_SOCK_ANONYMOUS, cred); if (err < 0) goto out_err;
- err = svc_create_xprt(nn->nfsd_serv, transport, net, - PF_INET6, port, SVC_SOCK_ANONYMOUS, cred); + err = svc_xprt_create(nn->nfsd_serv, transport, net, + PF_INET6, port, SVC_SOCK_ANONYMOUS, cred); if (err < 0 && err != -EAFNOSUPPORT) goto out_close;
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index a1765e751b739..5790b1eaff72b 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -293,13 +293,13 @@ static int nfsd_init_socks(struct net *net, const struct cred *cred) if (!list_empty(&nn->nfsd_serv->sv_permsocks)) return 0;
- error = svc_create_xprt(nn->nfsd_serv, "udp", net, PF_INET, NFS_PORT, - SVC_SOCK_DEFAULTS, cred); + error = svc_xprt_create(nn->nfsd_serv, "udp", net, PF_INET, NFS_PORT, + SVC_SOCK_DEFAULTS, cred); if (error < 0) return error;
- error = svc_create_xprt(nn->nfsd_serv, "tcp", net, PF_INET, NFS_PORT, - SVC_SOCK_DEFAULTS, cred); + error = svc_xprt_create(nn->nfsd_serv, "tcp", net, PF_INET, NFS_PORT, + SVC_SOCK_DEFAULTS, cred); if (error < 0) return error;
diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h index 1311425ddab7f..cba9559bba6ff 100644 --- a/include/linux/sunrpc/svc_xprt.h +++ b/include/linux/sunrpc/svc_xprt.h @@ -127,9 +127,10 @@ int svc_reg_xprt_class(struct svc_xprt_class *); void svc_unreg_xprt_class(struct svc_xprt_class *); void svc_xprt_init(struct net *, struct svc_xprt_class *, struct svc_xprt *, struct svc_serv *); -int svc_create_xprt(struct svc_serv *, const char *, struct net *, - const int, const unsigned short, int, - const struct cred *); +int svc_xprt_create(struct svc_serv *serv, const char *xprt_name, + struct net *net, const int family, + const unsigned short port, int flags, + const struct cred *cred); void svc_xprt_received(struct svc_xprt *xprt); void svc_xprt_enqueue(struct svc_xprt *xprt); void svc_xprt_put(struct svc_xprt *xprt); diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c index b5e80817b02f5..20baa0de70174 100644 --- a/net/sunrpc/svc_xprt.c +++ b/net/sunrpc/svc_xprt.c @@ -272,7 +272,7 @@ void svc_add_new_perm_xprt(struct svc_serv *serv, struct svc_xprt *new) svc_xprt_received(new); }
-static int _svc_create_xprt(struct svc_serv *serv, const char *xprt_name, +static int _svc_xprt_create(struct svc_serv *serv, const char *xprt_name, struct net *net, const int family, const unsigned short port, int flags, const struct cred *cred) @@ -308,21 +308,35 @@ static int _svc_create_xprt(struct svc_serv *serv, const char *xprt_name, return -EPROTONOSUPPORT; }
-int svc_create_xprt(struct svc_serv *serv, const char *xprt_name, +/** + * svc_xprt_create - Add a new listener to @serv + * @serv: target RPC service + * @xprt_name: transport class name + * @net: network namespace + * @family: network address family + * @port: listener port + * @flags: SVC_SOCK flags + * @cred: credential to bind to this transport + * + * Return values: + * %0: New listener added successfully + * %-EPROTONOSUPPORT: Requested transport type not supported + */ +int svc_xprt_create(struct svc_serv *serv, const char *xprt_name, struct net *net, const int family, const unsigned short port, int flags, const struct cred *cred) { int err;
- err = _svc_create_xprt(serv, xprt_name, net, family, port, flags, cred); + err = _svc_xprt_create(serv, xprt_name, net, family, port, flags, cred); if (err == -EPROTONOSUPPORT) { request_module("svc%s", xprt_name); - err = _svc_create_xprt(serv, xprt_name, net, family, port, flags, cred); + err = _svc_xprt_create(serv, xprt_name, net, family, port, flags, cred); } return err; } -EXPORT_SYMBOL_GPL(svc_create_xprt); +EXPORT_SYMBOL_GPL(svc_xprt_create);
/* * Copy the local and remote xprt addresses to the rqstp structure
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 4355d767a21b9445958fc11bce9a9701f76529d3 ]
Clean up: Use the "svc_xprt_<task>" function naming convention as is used for other external APIs.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsctl.c | 2 +- include/linux/sunrpc/svc_xprt.h | 2 +- net/sunrpc/svc.c | 2 +- net/sunrpc/svc_xprt.c | 9 +++++++-- net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 2 +- 5 files changed, 11 insertions(+), 6 deletions(-)
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 8fec779994f7b..16920e4512bde 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -790,7 +790,7 @@ static ssize_t __write_ports_addxprt(char *buf, struct net *net, const struct cr out_close: xprt = svc_find_xprt(nn->nfsd_serv, transport, net, PF_INET, port); if (xprt != NULL) { - svc_close_xprt(xprt); + svc_xprt_close(xprt); svc_xprt_put(xprt); } out_err: diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h index cba9559bba6ff..f614f8e248e11 100644 --- a/include/linux/sunrpc/svc_xprt.h +++ b/include/linux/sunrpc/svc_xprt.h @@ -135,7 +135,7 @@ void svc_xprt_received(struct svc_xprt *xprt); void svc_xprt_enqueue(struct svc_xprt *xprt); void svc_xprt_put(struct svc_xprt *xprt); void svc_xprt_copy_addrs(struct svc_rqst *rqstp, struct svc_xprt *xprt); -void svc_close_xprt(struct svc_xprt *xprt); +void svc_xprt_close(struct svc_xprt *xprt); int svc_port_is_privileged(struct sockaddr *sin); int svc_print_xprts(char *buf, int maxlen); struct svc_xprt *svc_find_xprt(struct svc_serv *serv, const char *xcl_name, diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 709118bac4c32..9126e1b7d0769 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -1403,7 +1403,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) svc_authorise(rqstp); close_xprt: if (rqstp->rq_xprt && test_bit(XPT_TEMP, &rqstp->rq_xprt->xpt_flags)) - svc_close_xprt(rqstp->rq_xprt); + svc_xprt_close(rqstp->rq_xprt); dprintk("svc: svc_process close\n"); return 0;
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c index 20baa0de70174..ee4a6d57056f5 100644 --- a/net/sunrpc/svc_xprt.c +++ b/net/sunrpc/svc_xprt.c @@ -1056,7 +1056,12 @@ static void svc_delete_xprt(struct svc_xprt *xprt) svc_xprt_put(xprt); }
-void svc_close_xprt(struct svc_xprt *xprt) +/** + * svc_xprt_close - Close a client connection + * @xprt: transport to disconnect + * + */ +void svc_xprt_close(struct svc_xprt *xprt) { trace_svc_xprt_close(xprt); set_bit(XPT_CLOSE, &xprt->xpt_flags); @@ -1071,7 +1076,7 @@ void svc_close_xprt(struct svc_xprt *xprt) */ svc_delete_xprt(xprt); } -EXPORT_SYMBOL_GPL(svc_close_xprt); +EXPORT_SYMBOL_GPL(svc_xprt_close);
static int svc_close_list(struct svc_serv *serv, struct list_head *xprt_list, struct net *net) { diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c index c5154bc38e129..feac8c26fb87d 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c +++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c @@ -186,7 +186,7 @@ static int xprt_rdma_bc_send_request(struct rpc_rqst *rqst)
ret = rpcrdma_bc_send_request(rdma, rqst); if (ret == -ENOTCONN) - svc_close_xprt(sxprt); + svc_xprt_close(sxprt); return ret; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit c7d7ec8f043e53ad16e30f5ebb8b9df415ec0f2b ]
Clean up: svc_shutdown_net() now does nothing but call svc_close_net(). Replace all external call sites.
svc_close_net() is renamed to be the inverse of svc_xprt_create().
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svc.c | 4 ++-- fs/nfs/callback.c | 2 +- fs/nfsd/nfssvc.c | 2 +- include/linux/sunrpc/svc.h | 1 - include/linux/sunrpc/svc_xprt.h | 1 + net/sunrpc/svc.c | 6 ------ net/sunrpc/svc_xprt.c | 9 +++++++-- 7 files changed, 12 insertions(+), 13 deletions(-)
diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index bba6f2b45b64a..c83ec4a375bc1 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -248,7 +248,7 @@ static int make_socks(struct svc_serv *serv, struct net *net, if (warned++ == 0) printk(KERN_WARNING "lockd_up: makesock failed, error=%d\n", err); - svc_shutdown_net(serv, net); + svc_xprt_destroy_all(serv, net); svc_rpcb_cleanup(serv, net); return err; } @@ -287,7 +287,7 @@ static void lockd_down_net(struct svc_serv *serv, struct net *net) nlm_shutdown_hosts_net(net); cancel_delayed_work_sync(&ln->grace_period_end); locks_end_grace(&ln->lockd_manager); - svc_shutdown_net(serv, net); + svc_xprt_destroy_all(serv, net); svc_rpcb_cleanup(serv, net); } } else { diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c index c1a8767100ae9..c98c68513590f 100644 --- a/fs/nfs/callback.c +++ b/fs/nfs/callback.c @@ -189,7 +189,7 @@ static void nfs_callback_down_net(u32 minorversion, struct svc_serv *serv, struc return;
dprintk("NFS: destroy per-net callback data; net=%x\n", net->ns.inum); - svc_shutdown_net(serv, net); + svc_xprt_destroy_all(serv, net); }
static int nfs_callback_up_net(int minorversion, struct svc_serv *serv, diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 5790b1eaff72b..38895372ec393 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -722,7 +722,7 @@ void nfsd_put(struct net *net) struct nfsd_net *nn = net_generic(net, nfsd_net_id);
if (kref_put(&nn->nfsd_serv->sv_refcnt, nfsd_noop)) { - svc_shutdown_net(nn->nfsd_serv, net); + svc_xprt_destroy_all(nn->nfsd_serv, net); nfsd_last_thread(nn->nfsd_serv, net); svc_destroy(&nn->nfsd_serv->sv_refcnt); spin_lock(&nfsd_notifier_lock); diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 6f3ba5c514643..482a024156b11 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -511,7 +511,6 @@ struct svc_serv * svc_create_pooled(struct svc_program *, unsigned int, const struct svc_serv_ops *); int svc_set_num_threads(struct svc_serv *, struct svc_pool *, int); int svc_pool_stats_open(struct svc_serv *serv, struct file *file); -void svc_shutdown_net(struct svc_serv *, struct net *); int svc_process(struct svc_rqst *); int bc_svc_process(struct svc_serv *, struct rpc_rqst *, struct svc_rqst *); diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h index f614f8e248e11..dbffb92511ef2 100644 --- a/include/linux/sunrpc/svc_xprt.h +++ b/include/linux/sunrpc/svc_xprt.h @@ -131,6 +131,7 @@ int svc_xprt_create(struct svc_serv *serv, const char *xprt_name, struct net *net, const int family, const unsigned short port, int flags, const struct cred *cred); +void svc_xprt_destroy_all(struct svc_serv *serv, struct net *net); void svc_xprt_received(struct svc_xprt *xprt); void svc_xprt_enqueue(struct svc_xprt *xprt); void svc_xprt_put(struct svc_xprt *xprt); diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 9126e1b7d0769..c9195e40959c6 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -534,12 +534,6 @@ svc_create_pooled(struct svc_program *prog, unsigned int bufsize, } EXPORT_SYMBOL_GPL(svc_create_pooled);
-void svc_shutdown_net(struct svc_serv *serv, struct net *net) -{ - svc_close_net(serv, net); -} -EXPORT_SYMBOL_GPL(svc_shutdown_net); - /* * Destroy an RPC service. Should be called with appropriate locking to * protect sv_permsocks and sv_tempsocks. diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c index ee4a6d57056f5..000b737784bd9 100644 --- a/net/sunrpc/svc_xprt.c +++ b/net/sunrpc/svc_xprt.c @@ -1128,7 +1128,11 @@ static void svc_clean_up_xprts(struct svc_serv *serv, struct net *net) } }
-/* +/** + * svc_xprt_destroy_all - Destroy transports associated with @serv + * @serv: RPC service to be shut down + * @net: target network namespace + * * Server threads may still be running (especially in the case where the * service is still running in other network namespaces). * @@ -1140,7 +1144,7 @@ static void svc_clean_up_xprts(struct svc_serv *serv, struct net *net) * threads, we may need to wait a little while and then check again to * see if they're done. */ -void svc_close_net(struct svc_serv *serv, struct net *net) +void svc_xprt_destroy_all(struct svc_serv *serv, struct net *net) { int delay = 0;
@@ -1151,6 +1155,7 @@ void svc_close_net(struct svc_serv *serv, struct net *net) msleep(delay++); } } +EXPORT_SYMBOL_GPL(svc_xprt_destroy_all);
/* * Handle defer and revisit of requests
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit f49169c97fceb21ad6a0aaf671c50b0f520f15a5 ]
struct svc_serv_ops is about to be removed.
Neil Brown says:
I suspect svo_module can go as well - I don't think the thread is ever the thing that primarily keeps a module active.
A random sample of kthread_create() callers shows sunrpc is the only one that manages module reference count in this way.
Suggested-by: Neil Brown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svc.c | 4 +--- fs/nfs/callback.c | 7 ++----- fs/nfs/nfs4state.c | 1 - fs/nfsd/nfssvc.c | 3 --- include/linux/sunrpc/svc.h | 5 ----- kernel/module.c | 2 +- net/sunrpc/svc.c | 2 -- 7 files changed, 4 insertions(+), 20 deletions(-)
diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index c83ec4a375bc1..bfde31124f3af 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -184,8 +184,7 @@ lockd(void *vrqstp) dprintk("lockd_down: service stopped\n");
svc_exit_thread(rqstp); - - module_put_and_kthread_exit(0); + return 0; }
static int create_lockd_listener(struct svc_serv *serv, const char *name, @@ -352,7 +351,6 @@ static struct notifier_block lockd_inet6addr_notifier = {
static const struct svc_serv_ops lockd_sv_ops = { .svo_function = lockd, - .svo_module = THIS_MODULE, };
static int lockd_get(void) diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c index c98c68513590f..a494f9e7bd0a0 100644 --- a/fs/nfs/callback.c +++ b/fs/nfs/callback.c @@ -17,7 +17,6 @@ #include <linux/errno.h> #include <linux/mutex.h> #include <linux/freezer.h> -#include <linux/kthread.h> #include <linux/sunrpc/svcauth_gss.h> #include <linux/sunrpc/bc_xprt.h>
@@ -92,8 +91,8 @@ nfs4_callback_svc(void *vrqstp) continue; svc_process(rqstp); } + svc_exit_thread(rqstp); - module_put_and_kthread_exit(0); return 0; }
@@ -136,8 +135,8 @@ nfs41_callback_svc(void *vrqstp) finish_wait(&serv->sv_cb_waitq, &wq); } } + svc_exit_thread(rqstp); - module_put_and_kthread_exit(0); return 0; }
@@ -234,12 +233,10 @@ static int nfs_callback_up_net(int minorversion, struct svc_serv *serv,
static const struct svc_serv_ops nfs40_cb_sv_ops = { .svo_function = nfs4_callback_svc, - .svo_module = THIS_MODULE, }; #if defined(CONFIG_NFS_V4_1) static const struct svc_serv_ops nfs41_cb_sv_ops = { .svo_function = nfs41_callback_svc, - .svo_module = THIS_MODULE, };
static const struct svc_serv_ops *nfs4_cb_sv_ops[] = { diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c index d8fc5d72a161c..ae2da65ffbafb 100644 --- a/fs/nfs/nfs4state.c +++ b/fs/nfs/nfs4state.c @@ -2757,7 +2757,6 @@ static int nfs4_run_state_manager(void *ptr) goto again;
nfs_put_client(clp); - module_put_and_kthread_exit(0); return 0; }
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 38895372ec393..d25d4c12a499a 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -614,7 +614,6 @@ static int nfsd_get_default_max_blksize(void)
static const struct svc_serv_ops nfsd_thread_sv_ops = { .svo_function = nfsd, - .svo_module = THIS_MODULE, };
void nfsd_shutdown_threads(struct net *net) @@ -1018,8 +1017,6 @@ nfsd(void *vrqstp) msleep(20); }
- /* Release module */ - module_put_and_kthread_exit(0); return 0; }
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 482a024156b11..c64db9b14a643 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -57,11 +57,6 @@ struct svc_serv; struct svc_serv_ops { /* function for service threads to run */ int (*svo_function)(void *); - - /* optional module to count when adding threads. - * Thread function must call module_put_and_kthread_exit() to exit. - */ - struct module *svo_module; };
/* diff --git a/kernel/module.c b/kernel/module.c index 9030ff8c08555..edc7b99cb16fa 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -334,7 +334,7 @@ static inline void add_taint_module(struct module *mod, unsigned flag,
/* * A thread that wants to hold a reference to a module only while it - * is running can call this to safely exit. nfsd and lockd use this. + * is running can call this to safely exit. */ void __noreturn __module_put_and_kthread_exit(struct module *mod, long code) { diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index c9195e40959c6..bdecc902cf998 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -734,11 +734,9 @@ svc_start_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs) if (IS_ERR(rqstp)) return PTR_ERR(rqstp);
- __module_get(serv->sv_ops->svo_module); task = kthread_create_on_node(serv->sv_ops->svo_function, rqstp, node, "%s", serv->sv_name); if (IS_ERR(task)) { - module_put(serv->sv_ops->svo_module); svc_exit_thread(rqstp); return PTR_ERR(task); }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 37902c6313090235c847af89c5515591261ee338 ]
Hoist svo_function back into svc_serv and remove struct svc_serv_ops, since the struct is now devoid of fields.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svc.c | 6 +----- fs/nfs/callback.c | 43 ++++++++++---------------------------- fs/nfsd/nfssvc.c | 7 +------ include/linux/sunrpc/svc.h | 14 ++++--------- net/sunrpc/svc.c | 37 ++++++++++++++++++++++---------- 5 files changed, 43 insertions(+), 64 deletions(-)
diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index bfde31124f3af..59ef8a1f843f3 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -349,10 +349,6 @@ static struct notifier_block lockd_inet6addr_notifier = { }; #endif
-static const struct svc_serv_ops lockd_sv_ops = { - .svo_function = lockd, -}; - static int lockd_get(void) { struct svc_serv *serv; @@ -376,7 +372,7 @@ static int lockd_get(void) nlm_timeout = LOCKD_DFLT_TIMEO; nlmsvc_timeout = nlm_timeout * HZ;
- serv = svc_create(&nlmsvc_program, LOCKD_BUFSIZE, &lockd_sv_ops); + serv = svc_create(&nlmsvc_program, LOCKD_BUFSIZE, lockd); if (!serv) { printk(KERN_WARNING "lockd_up: create service failed\n"); return -ENOMEM; diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c index a494f9e7bd0a0..456af7d230cf1 100644 --- a/fs/nfs/callback.c +++ b/fs/nfs/callback.c @@ -231,29 +231,10 @@ static int nfs_callback_up_net(int minorversion, struct svc_serv *serv, return ret; }
-static const struct svc_serv_ops nfs40_cb_sv_ops = { - .svo_function = nfs4_callback_svc, -}; -#if defined(CONFIG_NFS_V4_1) -static const struct svc_serv_ops nfs41_cb_sv_ops = { - .svo_function = nfs41_callback_svc, -}; - -static const struct svc_serv_ops *nfs4_cb_sv_ops[] = { - [0] = &nfs40_cb_sv_ops, - [1] = &nfs41_cb_sv_ops, -}; -#else -static const struct svc_serv_ops *nfs4_cb_sv_ops[] = { - [0] = &nfs40_cb_sv_ops, - [1] = NULL, -}; -#endif - static struct svc_serv *nfs_callback_create_svc(int minorversion) { struct nfs_callback_data *cb_info = &nfs_callback_info[minorversion]; - const struct svc_serv_ops *sv_ops; + int (*threadfn)(void *data); struct svc_serv *serv;
/* @@ -262,17 +243,6 @@ static struct svc_serv *nfs_callback_create_svc(int minorversion) if (cb_info->serv) return svc_get(cb_info->serv);
- switch (minorversion) { - case 0: - sv_ops = nfs4_cb_sv_ops[0]; - break; - default: - sv_ops = nfs4_cb_sv_ops[1]; - } - - if (sv_ops == NULL) - return ERR_PTR(-ENOTSUPP); - /* * Sanity check: if there's no task, * we should be the first user ... @@ -281,7 +251,16 @@ static struct svc_serv *nfs_callback_create_svc(int minorversion) printk(KERN_WARNING "nfs_callback_create_svc: no kthread, %d users??\n", cb_info->users);
- serv = svc_create(&nfs4_callback_program, NFS4_CALLBACK_BUFSIZE, sv_ops); + threadfn = nfs4_callback_svc; +#if defined(CONFIG_NFS_V4_1) + if (minorversion) + threadfn = nfs41_callback_svc; +#else + if (minorversion) + return ERR_PTR(-ENOTSUPP); +#endif + serv = svc_create(&nfs4_callback_program, NFS4_CALLBACK_BUFSIZE, + threadfn); if (!serv) { printk(KERN_ERR "nfs_callback_create_svc: create service failed\n"); return ERR_PTR(-ENOMEM); diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index d25d4c12a499a..2f74be98ff2d9 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -612,10 +612,6 @@ static int nfsd_get_default_max_blksize(void) return ret; }
-static const struct svc_serv_ops nfsd_thread_sv_ops = { - .svo_function = nfsd, -}; - void nfsd_shutdown_threads(struct net *net) { struct nfsd_net *nn = net_generic(net, nfsd_net_id); @@ -654,8 +650,7 @@ int nfsd_create_serv(struct net *net) if (nfsd_max_blksize == 0) nfsd_max_blksize = nfsd_get_default_max_blksize(); nfsd_reset_versions(nn); - serv = svc_create_pooled(&nfsd_program, nfsd_max_blksize, - &nfsd_thread_sv_ops); + serv = svc_create_pooled(&nfsd_program, nfsd_max_blksize, nfsd); if (serv == NULL) return -ENOMEM;
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index c64db9b14a643..3c908ffbbf45f 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -52,13 +52,6 @@ struct svc_pool { unsigned long sp_flags; } ____cacheline_aligned_in_smp;
-struct svc_serv; - -struct svc_serv_ops { - /* function for service threads to run */ - int (*svo_function)(void *); -}; - /* * RPC service. * @@ -91,7 +84,8 @@ struct svc_serv {
unsigned int sv_nrpools; /* number of thread pools */ struct svc_pool * sv_pools; /* array of thread pools */ - const struct svc_serv_ops *sv_ops; /* server operations */ + int (*sv_threadfn)(void *data); + #if defined(CONFIG_SUNRPC_BACKCHANNEL) struct list_head sv_cb_list; /* queue for callback requests * that arrive over the same @@ -495,7 +489,7 @@ int svc_rpcb_setup(struct svc_serv *serv, struct net *net); void svc_rpcb_cleanup(struct svc_serv *serv, struct net *net); int svc_bind(struct svc_serv *serv, struct net *net); struct svc_serv *svc_create(struct svc_program *, unsigned int, - const struct svc_serv_ops *); + int (*threadfn)(void *data)); struct svc_rqst *svc_rqst_alloc(struct svc_serv *serv, struct svc_pool *pool, int node); void svc_rqst_replace_page(struct svc_rqst *rqstp, @@ -503,7 +497,7 @@ void svc_rqst_replace_page(struct svc_rqst *rqstp, void svc_rqst_free(struct svc_rqst *); void svc_exit_thread(struct svc_rqst *); struct svc_serv * svc_create_pooled(struct svc_program *, unsigned int, - const struct svc_serv_ops *); + int (*threadfn)(void *data)); int svc_set_num_threads(struct svc_serv *, struct svc_pool *, int); int svc_pool_stats_open(struct svc_serv *serv, struct file *file); int svc_process(struct svc_rqst *); diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index bdecc902cf998..7f231947347ea 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -446,7 +446,7 @@ __svc_init_bc(struct svc_serv *serv) */ static struct svc_serv * __svc_create(struct svc_program *prog, unsigned int bufsize, int npools, - const struct svc_serv_ops *ops) + int (*threadfn)(void *data)) { struct svc_serv *serv; unsigned int vers; @@ -463,7 +463,7 @@ __svc_create(struct svc_program *prog, unsigned int bufsize, int npools, bufsize = RPCSVC_MAXPAYLOAD; serv->sv_max_payload = bufsize? bufsize : 4096; serv->sv_max_mesg = roundup(serv->sv_max_payload + PAGE_SIZE, PAGE_SIZE); - serv->sv_ops = ops; + serv->sv_threadfn = threadfn; xdrsize = 0; while (prog) { prog->pg_lovers = prog->pg_nvers-1; @@ -509,22 +509,37 @@ __svc_create(struct svc_program *prog, unsigned int bufsize, int npools, return serv; }
-struct svc_serv * -svc_create(struct svc_program *prog, unsigned int bufsize, - const struct svc_serv_ops *ops) +/** + * svc_create - Create an RPC service + * @prog: the RPC program the new service will handle + * @bufsize: maximum message size for @prog + * @threadfn: a function to service RPC requests for @prog + * + * Returns an instantiated struct svc_serv object or NULL. + */ +struct svc_serv *svc_create(struct svc_program *prog, unsigned int bufsize, + int (*threadfn)(void *data)) { - return __svc_create(prog, bufsize, /*npools*/1, ops); + return __svc_create(prog, bufsize, 1, threadfn); } EXPORT_SYMBOL_GPL(svc_create);
-struct svc_serv * -svc_create_pooled(struct svc_program *prog, unsigned int bufsize, - const struct svc_serv_ops *ops) +/** + * svc_create_pooled - Create an RPC service with pooled threads + * @prog: the RPC program the new service will handle + * @bufsize: maximum message size for @prog + * @threadfn: a function to service RPC requests for @prog + * + * Returns an instantiated struct svc_serv object or NULL. + */ +struct svc_serv *svc_create_pooled(struct svc_program *prog, + unsigned int bufsize, + int (*threadfn)(void *data)) { struct svc_serv *serv; unsigned int npools = svc_pool_map_get();
- serv = __svc_create(prog, bufsize, npools, ops); + serv = __svc_create(prog, bufsize, npools, threadfn); if (!serv) goto out_err; return serv; @@ -734,7 +749,7 @@ svc_start_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs) if (IS_ERR(rqstp)) return PTR_ERR(rqstp);
- task = kthread_create_on_node(serv->sv_ops->svo_function, rqstp, + task = kthread_create_on_node(serv->sv_threadfn, rqstp, node, "%s", serv->sv_name); if (IS_ERR(task)) { svc_exit_thread(rqstp);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 5f9a62ff7d2808c7b56c0ec90f3b7eae5872afe6 ]
Eventually support for NFSv2 in the Linux NFS server is to be deprecated and then removed.
However, NFSv2 is the "always supported" version that is available as soon as CONFIG_NFSD is set. Before NFSv2 support can be removed, we need to choose a different "always supported" version.
This patch removes CONFIG_NFSD_V3 so that NFSv3 is always supported, as NFSv2 is today. When NFSv2 support is removed, NFSv3 will become the only "always supported" NFS version.
The defconfigs still need to be updated to remove CONFIG_NFSD_V3=y.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/Kconfig | 2 +- fs/nfsd/Kconfig | 12 +----------- fs/nfsd/Makefile | 3 +-- fs/nfsd/nfsfh.c | 4 ---- fs/nfsd/nfsfh.h | 20 -------------------- fs/nfsd/nfssvc.c | 2 -- fs/nfsd/vfs.c | 9 --------- fs/nfsd/vfs.h | 2 -- 8 files changed, 3 insertions(+), 51 deletions(-)
diff --git a/fs/Kconfig b/fs/Kconfig index eaff422877c39..11b60d160f88f 100644 --- a/fs/Kconfig +++ b/fs/Kconfig @@ -320,7 +320,7 @@ config LOCKD
config LOCKD_V4 bool - depends on NFSD_V3 || NFS_V3 + depends on NFSD || NFS_V3 depends on FILE_LOCKING default y
diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig index f229172652be0..887af7966b032 100644 --- a/fs/nfsd/Kconfig +++ b/fs/nfsd/Kconfig @@ -35,18 +35,9 @@ config NFSD_V2_ACL bool depends on NFSD
-config NFSD_V3 - bool "NFS server support for NFS version 3" - depends on NFSD - help - This option enables support in your system's NFS server for - version 3 of the NFS protocol (RFC 1813). - - If unsure, say Y. - config NFSD_V3_ACL bool "NFS server support for the NFSv3 ACL protocol extension" - depends on NFSD_V3 + depends on NFSD select NFSD_V2_ACL help Solaris NFS servers support an auxiliary NFSv3 ACL protocol that @@ -70,7 +61,6 @@ config NFSD_V3_ACL config NFSD_V4 bool "NFS server support for NFS version 4" depends on NFSD && PROC_FS - select NFSD_V3 select FS_POSIX_ACL select SUNRPC_GSS select CRYPTO diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile index 3f0983e93a998..805c06d5f1b4b 100644 --- a/fs/nfsd/Makefile +++ b/fs/nfsd/Makefile @@ -12,9 +12,8 @@ nfsd-y += trace.o
nfsd-y += nfssvc.o nfsctl.o nfsproc.o nfsfh.o vfs.o \ export.o auth.o lockd.o nfscache.o nfsxdr.o \ - stats.o filecache.o + stats.o filecache.o nfs3proc.o nfs3xdr.o nfsd-$(CONFIG_NFSD_V2_ACL) += nfs2acl.o -nfsd-$(CONFIG_NFSD_V3) += nfs3proc.o nfs3xdr.o nfsd-$(CONFIG_NFSD_V3_ACL) += nfs3acl.o nfsd-$(CONFIG_NFSD_V4) += nfs4proc.o nfs4xdr.o nfs4state.o nfs4idmap.o \ nfs4acl.o nfs4callback.o nfs4recover.o diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c index 3b9751555f8f2..d4ae838948ba5 100644 --- a/fs/nfsd/nfsfh.c +++ b/fs/nfsd/nfsfh.c @@ -610,8 +610,6 @@ fh_update(struct svc_fh *fhp) return nfserr_serverfault; }
-#ifdef CONFIG_NFSD_V3 - /** * fh_fill_pre_attrs - Fill in pre-op attributes * @fhp: file handle to be updated @@ -672,8 +670,6 @@ void fh_fill_post_attrs(struct svc_fh *fhp) nfsd4_change_attribute(&fhp->fh_post_attr, inode); }
-#endif /* CONFIG_NFSD_V3 */ - /* * Release a file handle. */ diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h index 434930d8a946e..fb9d358a267e5 100644 --- a/fs/nfsd/nfsfh.h +++ b/fs/nfsd/nfsfh.h @@ -90,7 +90,6 @@ typedef struct svc_fh { * operation */ int fh_flags; /* FH flags */ -#ifdef CONFIG_NFSD_V3 bool fh_post_saved; /* post-op attrs saved */ bool fh_pre_saved; /* pre-op attrs saved */
@@ -107,7 +106,6 @@ typedef struct svc_fh { /* Post-op attributes saved in fh_unlock */ struct kstat fh_post_attr; /* full attrs after operation */ u64 fh_post_change; /* nfsv4 change; see above */ -#endif /* CONFIG_NFSD_V3 */ } svc_fh; #define NFSD4_FH_FOREIGN (1<<0) #define SET_FH_FLAG(c, f) ((c)->fh_flags |= (f)) @@ -283,8 +281,6 @@ static inline u32 knfsd_fh_hash(const struct knfsd_fh *fh) } #endif
-#ifdef CONFIG_NFSD_V3 - /** * fh_clear_pre_post_attrs - Reset pre/post attributes * @fhp: file handle to be updated @@ -327,22 +323,6 @@ static inline u64 nfsd4_change_attribute(struct kstat *stat, extern void fh_fill_pre_attrs(struct svc_fh *fhp); extern void fh_fill_post_attrs(struct svc_fh *fhp);
-#else /* !CONFIG_NFSD_V3 */ - -static inline void fh_clear_pre_post_attrs(struct svc_fh *fhp) -{ -} - -static inline void fh_fill_pre_attrs(struct svc_fh *fhp) -{ -} - -static inline void fh_fill_post_attrs(struct svc_fh *fhp) -{ -} - -#endif /* !CONFIG_NFSD_V3 */ -
/* * Lock a file handle/inode diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 2f74be98ff2d9..011c556caa1e7 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -117,9 +117,7 @@ static struct svc_stat nfsd_acl_svcstats = {
static const struct svc_version *nfsd_version[] = { [2] = &nfsd_version2, -#if defined(CONFIG_NFSD_V3) [3] = &nfsd_version3, -#endif #if defined(CONFIG_NFSD_V4) [4] = &nfsd_version4, #endif diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 89c50ccedf4d3..86584e727ce09 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -32,9 +32,7 @@ #include <linux/writeback.h> #include <linux/security.h>
-#ifdef CONFIG_NFSD_V3 #include "xdr3.h" -#endif /* CONFIG_NFSD_V3 */
#ifdef CONFIG_NFSD_V4 #include "../internal.h" @@ -627,7 +625,6 @@ __be32 nfsd4_vfs_fallocate(struct svc_rqst *rqstp, struct svc_fh *fhp, } #endif /* defined(CONFIG_NFSD_V4) */
-#ifdef CONFIG_NFSD_V3 /* * Check server access rights to a file system object */ @@ -739,7 +736,6 @@ nfsd_access(struct svc_rqst *rqstp, struct svc_fh *fhp, u32 *access, u32 *suppor out: return error; } -#endif /* CONFIG_NFSD_V3 */
int nfsd_open_break_lease(struct inode *inode, int access) { @@ -1139,7 +1135,6 @@ nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t offset, return err; }
-#ifdef CONFIG_NFSD_V3 /** * nfsd_commit - Commit pending writes to stable storage * @rqstp: RPC request being processed @@ -1217,7 +1212,6 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, u64 offset, out: return err; } -#endif /* CONFIG_NFSD_V3 */
static __be32 nfsd_create_setattr(struct svc_rqst *rqstp, struct svc_fh *resfhp, @@ -1406,8 +1400,6 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp, rdev, resfhp); }
-#ifdef CONFIG_NFSD_V3 - /* * NFSv3 and NFSv4 version of nfsd_create */ @@ -1573,7 +1565,6 @@ do_nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp, err = nfserrno(host_err); goto out; } -#endif /* CONFIG_NFSD_V3 */
/* * Read a symlink. On entry, *lenp must contain the maximum path length that diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h index 2c43d10e3cab4..ccb87b2864f64 100644 --- a/fs/nfsd/vfs.h +++ b/fs/nfsd/vfs.h @@ -68,7 +68,6 @@ __be32 nfsd_create_locked(struct svc_rqst *, struct svc_fh *, __be32 nfsd_create(struct svc_rqst *, struct svc_fh *, char *name, int len, struct iattr *attrs, int type, dev_t rdev, struct svc_fh *res); -#ifdef CONFIG_NFSD_V3 __be32 nfsd_access(struct svc_rqst *, struct svc_fh *, u32 *, u32 *); __be32 do_nfsd_create(struct svc_rqst *, struct svc_fh *, char *name, int len, struct iattr *attrs, @@ -76,7 +75,6 @@ __be32 do_nfsd_create(struct svc_rqst *, struct svc_fh *, u32 *verifier, bool *truncp, bool *created); __be32 nfsd_commit(struct svc_rqst *rqst, struct svc_fh *fhp, u64 offset, u32 count, __be32 *verf); -#endif /* CONFIG_NFSD_V3 */ #ifdef CONFIG_NFSD_V4 __be32 nfsd_getxattr(struct svc_rqst *rqstp, struct svc_fh *fhp, char *name, void **bufp, int *lenp);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 35aff0678f99b0623bb72d50112de9e163a19559 ]
The common practice is to name function instances the same as the method names, but with a uniquifying prefix. Commit aef9583b234a ("NFSD: Get reference of lockowner when coping file_lock") missed this -- the new function names should both have been of the form "nfsd4_lm_*".
Before more lock manager operations are added in NFSD, rename these two functions for consistency.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 60d5d1cb2cc65..0170aaf318ea2 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -6566,7 +6566,7 @@ nfs4_transform_lock_offset(struct file_lock *lock) }
static fl_owner_t -nfsd4_fl_get_owner(fl_owner_t owner) +nfsd4_lm_get_owner(fl_owner_t owner) { struct nfs4_lockowner *lo = (struct nfs4_lockowner *)owner;
@@ -6575,7 +6575,7 @@ nfsd4_fl_get_owner(fl_owner_t owner) }
static void -nfsd4_fl_put_owner(fl_owner_t owner) +nfsd4_lm_put_owner(fl_owner_t owner) { struct nfs4_lockowner *lo = (struct nfs4_lockowner *)owner;
@@ -6610,8 +6610,8 @@ nfsd4_lm_notify(struct file_lock *fl)
static const struct lock_manager_operations nfsd_posix_mng_ops = { .lm_notify = nfsd4_lm_notify, - .lm_get_owner = nfsd4_fl_get_owner, - .lm_put_owner = nfsd4_fl_put_owner, + .lm_get_owner = nfsd4_lm_get_owner, + .lm_put_owner = nfsd4_lm_put_owner, };
static inline void
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakob Koschel jakobkoschel@gmail.com
[ Upstream commit 4fc5f5346592cdc91689455d83885b0af65d71b8 ]
While the original code is valid, it is not the obvious choice for the sizeof() call and in preparation to limit the scope of the list iterator variable the sizeof should be changed to the size of the destination.
Signed-off-by: Jakob Koschel jakobkoschel@gmail.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4layouts.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c index 2673019d30ecd..7018d209b784a 100644 --- a/fs/nfsd/nfs4layouts.c +++ b/fs/nfsd/nfs4layouts.c @@ -421,7 +421,7 @@ nfsd4_insert_layout(struct nfsd4_layoutget *lgp, struct nfs4_layout_stateid *ls) new = kmem_cache_alloc(nfs4_layout_cache, GFP_KERNEL); if (!new) return nfserr_jukebox; - memcpy(&new->lo_seg, seg, sizeof(lp->lo_seg)); + memcpy(&new->lo_seg, seg, sizeof(new->lo_seg)); new->lo_state = ls;
spin_lock(&fp->fi_lock);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit 4f0b903ded728c505850daf2914bfc08841f0ae6 ]
fsnotify_parent() does not consider the parent's mark at all unless the parent inode shows interest in events on children and in the specific event.
So unless parent added an event to both its mark mask and ignored mask, the event will not be ignored.
Fix this by declaring the interest of an object in an event when the event is in either a mark mask or ignored mask.
Link: https://lore.kernel.org/r/20220223151438.790268-2-amir73il@gmail.com Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify_user.c | 17 +++++++++-------- fs/notify/mark.c | 4 ++-- include/linux/fsnotify_backend.h | 15 +++++++++++++++ 3 files changed, 26 insertions(+), 10 deletions(-)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 6679700574113..12fb209e60419 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -991,17 +991,18 @@ static __u32 fanotify_mark_remove_from_mask(struct fsnotify_mark *fsn_mark, __u32 mask, unsigned int flags, __u32 umask, int *destroy) { - __u32 oldmask = 0; + __u32 oldmask, newmask;
/* umask bits cannot be removed by user */ mask &= ~umask; spin_lock(&fsn_mark->lock); + oldmask = fsnotify_calc_mask(fsn_mark); if (!(flags & FAN_MARK_IGNORED_MASK)) { - oldmask = fsn_mark->mask; fsn_mark->mask &= ~mask; } else { fsn_mark->ignored_mask &= ~mask; } + newmask = fsnotify_calc_mask(fsn_mark); /* * We need to keep the mark around even if remaining mask cannot * result in any events (e.g. mask == FAN_ONDIR) to support incremenal @@ -1011,7 +1012,7 @@ static __u32 fanotify_mark_remove_from_mask(struct fsnotify_mark *fsn_mark, *destroy = !((fsn_mark->mask | fsn_mark->ignored_mask) & ~umask); spin_unlock(&fsn_mark->lock);
- return mask & oldmask; + return oldmask & ~newmask; }
static int fanotify_remove_mark(struct fsnotify_group *group, @@ -1069,23 +1070,23 @@ static int fanotify_remove_inode_mark(struct fsnotify_group *group, }
static __u32 fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark, - __u32 mask, - unsigned int flags) + __u32 mask, unsigned int flags) { - __u32 oldmask = -1; + __u32 oldmask, newmask;
spin_lock(&fsn_mark->lock); + oldmask = fsnotify_calc_mask(fsn_mark); if (!(flags & FAN_MARK_IGNORED_MASK)) { - oldmask = fsn_mark->mask; fsn_mark->mask |= mask; } else { fsn_mark->ignored_mask |= mask; if (flags & FAN_MARK_IGNORED_SURV_MODIFY) fsn_mark->flags |= FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY; } + newmask = fsnotify_calc_mask(fsn_mark); spin_unlock(&fsn_mark->lock);
- return mask & ~oldmask; + return newmask & ~oldmask; }
static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group, diff --git a/fs/notify/mark.c b/fs/notify/mark.c index b42629d2fc1c6..c86982be2d505 100644 --- a/fs/notify/mark.c +++ b/fs/notify/mark.c @@ -127,7 +127,7 @@ static void __fsnotify_recalc_mask(struct fsnotify_mark_connector *conn) return; hlist_for_each_entry(mark, &conn->list, obj_list) { if (mark->flags & FSNOTIFY_MARK_FLAG_ATTACHED) - new_mask |= mark->mask; + new_mask |= fsnotify_calc_mask(mark); } *fsnotify_conn_mask_p(conn) = new_mask; } @@ -692,7 +692,7 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark, if (ret) goto err;
- if (mark->mask) + if (mark->mask || mark->ignored_mask) fsnotify_recalc_mask(mark->connector);
return ret; diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 790c31844db5d..5f9c960049b07 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -601,6 +601,21 @@ extern void fsnotify_remove_queued_event(struct fsnotify_group *group,
/* functions used to manipulate the marks attached to inodes */
+/* Get mask for calculating object interest taking ignored mask into account */ +static inline __u32 fsnotify_calc_mask(struct fsnotify_mark *mark) +{ + __u32 mask = mark->mask; + + if (!mark->ignored_mask) + return mask; + + /* + * If mark is interested in ignoring events on children, the object must + * show interest in those events for fsnotify_parent() to notice it. + */ + return mask | (mark->ignored_mask & ALL_FSNOTIFY_EVENTS); +} + /* Get mask of events for a list of marks */ extern __u32 fsnotify_conn_mask(struct fsnotify_mark_connector *conn); /* Calculate mask of events for a list of marks */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit 04e317ba72d07901b03399b3d1525e83424df5b3 ]
fsnotify() treats FS_MODIFY events specially - it does not skip them even if the FS_MODIFY event does not apear in the object's fsnotify mask. This is because send_to_group() checks if FS_MODIFY needs to clear ignored mask of marks.
The common case is that an object does not have any mark with ignored mask and in particular, that it does not have a mark with ignored mask and without the FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY flag.
Set FS_MODIFY in object's fsnotify mask during fsnotify_recalc_mask() if object has a mark with an ignored mask and without the FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY flag and remove the special treatment of FS_MODIFY in fsnotify(), so that FS_MODIFY events could be optimized in the common case.
Call fsnotify_recalc_mask() from fanotify after adding or removing an ignored mask from a mark without FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY or when adding the FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY flag to a mark with ignored mask (the flag cannot be removed by fanotify uapi).
Performance results for doing 10000000 write(2)s to tmpfs:
vanilla patched without notification mark 25.486+-1.054 24.965+-0.244 with notification mark 30.111+-0.139 26.891+-1.355
So we can see the overhead of notification subsystem has been drastically reduced.
Link: https://lore.kernel.org/r/20220223151438.790268-3-amir73il@gmail.com Suggested-by: Jan Kara jack@suse.cz Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify_user.c | 32 +++++++++++++++++++++++------- fs/notify/fsnotify.c | 8 +++++--- include/linux/fsnotify_backend.h | 4 ++++ 3 files changed, 34 insertions(+), 10 deletions(-)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 12fb209e60419..64abec874d8e3 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1069,8 +1069,28 @@ static int fanotify_remove_inode_mark(struct fsnotify_group *group, flags, umask); }
+static void fanotify_mark_add_ignored_mask(struct fsnotify_mark *fsn_mark, + __u32 mask, unsigned int flags, + __u32 *removed) +{ + fsn_mark->ignored_mask |= mask; + + /* + * Setting FAN_MARK_IGNORED_SURV_MODIFY for the first time may lead to + * the removal of the FS_MODIFY bit in calculated mask if it was set + * because of an ignored mask that is now going to survive FS_MODIFY. + */ + if ((flags & FAN_MARK_IGNORED_SURV_MODIFY) && + !(fsn_mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY)) { + fsn_mark->flags |= FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY; + if (!(fsn_mark->mask & FS_MODIFY)) + *removed = FS_MODIFY; + } +} + static __u32 fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark, - __u32 mask, unsigned int flags) + __u32 mask, unsigned int flags, + __u32 *removed) { __u32 oldmask, newmask;
@@ -1079,9 +1099,7 @@ static __u32 fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark, if (!(flags & FAN_MARK_IGNORED_MASK)) { fsn_mark->mask |= mask; } else { - fsn_mark->ignored_mask |= mask; - if (flags & FAN_MARK_IGNORED_SURV_MODIFY) - fsn_mark->flags |= FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY; + fanotify_mark_add_ignored_mask(fsn_mark, mask, flags, removed); } newmask = fsnotify_calc_mask(fsn_mark); spin_unlock(&fsn_mark->lock); @@ -1144,7 +1162,7 @@ static int fanotify_add_mark(struct fsnotify_group *group, __kernel_fsid_t *fsid) { struct fsnotify_mark *fsn_mark; - __u32 added; + __u32 added, removed = 0; int ret = 0;
mutex_lock(&group->mark_mutex); @@ -1167,8 +1185,8 @@ static int fanotify_add_mark(struct fsnotify_group *group, goto out; }
- added = fanotify_mark_add_to_mask(fsn_mark, mask, flags); - if (added & ~fsnotify_conn_mask(fsn_mark->connector)) + added = fanotify_mark_add_to_mask(fsn_mark, mask, flags, &removed); + if (removed || (added & ~fsnotify_conn_mask(fsn_mark->connector))) fsnotify_recalc_mask(fsn_mark->connector);
out: diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index ab81a0776ece5..494f653efbc6e 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -531,11 +531,13 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir,
/* - * if this is a modify event we may need to clear the ignored masks - * otherwise return if none of the marks care about this type of event. + * If this is a modify event we may need to clear some ignored masks. + * In that case, the object with ignored masks will have the FS_MODIFY + * event in its mask. + * Otherwise, return if none of the marks care about this type of event. */ test_mask = (mask & ALL_FSNOTIFY_EVENTS); - if (!(mask & FS_MODIFY) && !(test_mask & marks_mask)) + if (!(test_mask & marks_mask)) return 0;
iter_info.srcu_idx = srcu_read_lock(&fsnotify_mark_srcu); diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 5f9c960049b07..0805b74cae441 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -609,6 +609,10 @@ static inline __u32 fsnotify_calc_mask(struct fsnotify_mark *mark) if (!mark->ignored_mask) return mask;
+ /* Interest in FS_MODIFY may be needed for clearing ignored mask */ + if (!(mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY)) + mask |= FS_MODIFY; + /* * If mark is interested in ignoring events on children, the object must * show interest in those events for fsnotify_parent() to notice it.
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bang Li libang.linuxer@gmail.com
[ Upstream commit f92ca72b0263d601807bbd23ed25cbe6f4da89f4 ]
iput() has already judged the incoming parameter, so there is no need to repeat the judgment here.
Link: https://lore.kernel.org/r/20220311151240.62045-1-libang.linuxer@gmail.com Signed-off-by: Bang Li libang.linuxer@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fsnotify.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index 494f653efbc6e..70a8516b78bc5 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -70,8 +70,7 @@ static void fsnotify_unmount_inodes(struct super_block *sb) spin_unlock(&inode->i_lock); spin_unlock(&sb->s_inode_list_lock);
- if (iput_inode) - iput(iput_inode); + iput(iput_inode);
/* for each watch, send FS_UNMOUNT and then remove it */ fsnotify_inode(inode, FS_UNMOUNT); @@ -85,8 +84,7 @@ static void fsnotify_unmount_inodes(struct super_block *sb) } spin_unlock(&sb->s_inode_list_lock);
- if (iput_inode) - iput(iput_inode); + iput(iput_inode); }
void fsnotify_sb_delete(struct super_block *sb)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Haowen Bai baihaowen@meizu.com
[ Upstream commit 5f7b839d47dbc74cf4a07beeab5191f93678673e ]
Return boolean values ("true" or "false") instead of 1 or 0 from bool functions. This fixes the following warnings from coccicheck:
./fs/nfsd/nfs2acl.c:289:9-10: WARNING: return of 0/1 in function 'nfsaclsvc_encode_accessres' with return type bool ./fs/nfsd/nfs2acl.c:252:9-10: WARNING: return of 0/1 in function 'nfsaclsvc_encode_getaclres' with return type bool
Signed-off-by: Haowen Bai baihaowen@meizu.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs2acl.c | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index 96733dff354d3..03703b22c81ef 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -247,34 +247,34 @@ nfsaclsvc_encode_getaclres(struct svc_rqst *rqstp, struct xdr_stream *xdr) int w;
if (!svcxdr_encode_stat(xdr, resp->status)) - return 0; + return false;
if (dentry == NULL || d_really_is_negative(dentry)) - return 1; + return true; inode = d_inode(dentry);
if (!svcxdr_encode_fattr(rqstp, xdr, &resp->fh, &resp->stat)) - return 0; + return false; if (xdr_stream_encode_u32(xdr, resp->mask) < 0) - return 0; + return false;
rqstp->rq_res.page_len = w = nfsacl_size( (resp->mask & NFS_ACL) ? resp->acl_access : NULL, (resp->mask & NFS_DFACL) ? resp->acl_default : NULL); while (w > 0) { if (!*(rqstp->rq_next_page++)) - return 1; + return true; w -= PAGE_SIZE; }
if (!nfs_stream_encode_acl(xdr, inode, resp->acl_access, resp->mask & NFS_ACL, 0)) - return 0; + return false; if (!nfs_stream_encode_acl(xdr, inode, resp->acl_default, resp->mask & NFS_DFACL, NFS_ACL_DEFAULT)) - return 0; + return false;
- return 1; + return true; }
/* ACCESS */ @@ -284,17 +284,17 @@ nfsaclsvc_encode_accessres(struct svc_rqst *rqstp, struct xdr_stream *xdr) struct nfsd3_accessres *resp = rqstp->rq_resp;
if (!svcxdr_encode_stat(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: if (!svcxdr_encode_fattr(rqstp, xdr, &resp->fh, &resp->stat)) - return 0; + return false; if (xdr_stream_encode_u32(xdr, resp->access) < 0) - return 0; + return false; break; }
- return 1; + return true; }
/*
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit 6b8a94332ee4f7d9a8ae0cbac7609f79c212f06c ]
The call to filemap_flush() in nfsd_file_put() is there to ensure that we clear out any writes belonging to a NFSv3 client relatively quickly and avoid situations where the file can't be evicted by the garbage collector. It also ensures that we detect write errors quickly.
The problem is this causes a regression in performance for some workloads.
So try to improve matters by deferring writeback until we're ready to close the file, and need to detect errors so that we can force the client to resend.
Tested-by: Jan Kara jack@suse.cz Fixes: b6669305d35a ("nfsd: Reduce the number of calls to nfsd_file_gc()") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Link: https://lore.kernel.org/all/20220330103457.r4xrhy2d6nhtouzk@quack3.lan Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index cc2831cec6695..496f7b3f75237 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -235,6 +235,13 @@ nfsd_file_check_write_error(struct nfsd_file *nf) return filemap_check_wb_err(file->f_mapping, READ_ONCE(file->f_wb_err)); }
+static void +nfsd_file_flush(struct nfsd_file *nf) +{ + if (nf->nf_file && vfs_fsync(nf->nf_file, 1) != 0) + nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id)); +} + static void nfsd_file_do_unhash(struct nfsd_file *nf) { @@ -302,11 +309,14 @@ nfsd_file_put(struct nfsd_file *nf) return; }
- filemap_flush(nf->nf_file->f_mapping); is_hashed = test_bit(NFSD_FILE_HASHED, &nf->nf_flags) != 0; - nfsd_file_put_noref(nf); - if (is_hashed) + if (!is_hashed) { + nfsd_file_flush(nf); + nfsd_file_put_noref(nf); + } else { + nfsd_file_put_noref(nf); nfsd_file_schedule_laundrette(); + } if (atomic_long_read(&nfsd_filecache_count) >= NFSD_FILE_LRU_LIMIT) nfsd_file_gc(); } @@ -327,6 +337,7 @@ nfsd_file_dispose_list(struct list_head *dispose) while(!list_empty(dispose)) { nf = list_first_entry(dispose, struct nfsd_file, nf_lru); list_del(&nf->nf_lru); + nfsd_file_flush(nf); nfsd_file_put_noref(nf); } } @@ -340,6 +351,7 @@ nfsd_file_dispose_list_sync(struct list_head *dispose) while(!list_empty(dispose)) { nf = list_first_entry(dispose, struct nfsd_file, nf_lru); list_del(&nf->nf_lru); + nfsd_file_flush(nf); if (!refcount_dec_and_test(&nf->nf_ref)) continue; if (nfsd_file_free(nf))
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit 999397926ab3f78c7d1235cc4ca6e3c89d2769bf ]
Make it a little less racy, by removing the refcount_read() test. Then remove the redundant 'is_hashed' variable.
Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 13 +++---------- 1 file changed, 3 insertions(+), 10 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 496f7b3f75237..8f7ed5dbb0031 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -301,21 +301,14 @@ nfsd_file_put_noref(struct nfsd_file *nf) void nfsd_file_put(struct nfsd_file *nf) { - bool is_hashed; - set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags); - if (refcount_read(&nf->nf_ref) > 2 || !nf->nf_file) { - nfsd_file_put_noref(nf); - return; - } - - is_hashed = test_bit(NFSD_FILE_HASHED, &nf->nf_flags) != 0; - if (!is_hashed) { + if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) { nfsd_file_flush(nf); nfsd_file_put_noref(nf); } else { nfsd_file_put_noref(nf); - nfsd_file_schedule_laundrette(); + if (nf->nf_file) + nfsd_file_schedule_laundrette(); } if (atomic_long_read(&nfsd_filecache_count) >= NFSD_FILE_LRU_LIMIT) nfsd_file_gc();
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit ceaf69f8eadcafb323392be88e7a5248c415d423 ]
Dirent events (create/delete/move) are only reported on watched directory inodes, but in fanotify as well as in legacy inotify, it was always allowed to set them on non-dir inode, which does not result in any meaningful outcome.
Until kernel v5.17, dirent events in fanotify also differed from events "on child" (e.g. FAN_OPEN) in the information provided in the event. For example, FAN_OPEN could be set in the mask of a non-dir or the mask of its parent and event would report the fid of the child regardless of the marked object. By contrast, FAN_DELETE is not reported if the child is marked and the child fid was not reported in the events.
Since kernel v5.17, with fanotify group flag FAN_REPORT_TARGET_FID, the fid of the child is reported with dirent events, like events "on child", which may create confusion for users expecting the same behavior as events "on child" when setting events in the mask on a child.
The desired semantics of setting dirent events in the mask of a child are not clear, so for now, deny this action for a group initialized with flag FAN_REPORT_TARGET_FID and for the new event FAN_RENAME. We may relax this restriction in the future if we decide on the semantics and implement them.
Fixes: d61fd650e9d2 ("fanotify: introduce group flag FAN_REPORT_TARGET_FID") Fixes: 8cc3b1ccd930 ("fanotify: wire up FAN_RENAME event") Link: https://lore.kernel.org/linux-fsdevel/20220505133057.zm5t6vumc4xdcnsg@quack3... Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20220507080028.219826-1-amir73il@gmail.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify_user.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 64abec874d8e3..921ee7b08580d 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1663,6 +1663,19 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, else mnt = path.mnt;
+ /* + * FAN_RENAME is not allowed on non-dir (for now). + * We shouldn't have allowed setting any dirent events in mask of + * non-dir, but because we always allowed it, error only if group + * was initialized with the new flag FAN_REPORT_TARGET_FID. + */ + ret = -ENOTDIR; + if (inode && !S_ISDIR(inode->i_mode) && + ((mask & FAN_RENAME) || + ((mask & FANOTIFY_DIRENT_EVENTS) && + FAN_GROUP_FLAG(group, FAN_REPORT_TARGET_FID)))) + goto path_put_and_out; + /* Mask out FAN_EVENT_ON_CHILD flag for sb/mount/non-dir marks */ if (mnt || !S_ISDIR(inode->i_mode)) { mask &= ~FAN_EVENT_ON_CHILD;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dai Ngo dai.ngo@oracle.com
[ Upstream commit 9d6647762b9c6b555bc83d97d7c93be6057a990f ]
Update lock usage of lock_manager_operations' functions to reflect the changes in commit 6109c85037e5 ("locks: add a dedicated spinlock to protect i_flctx lists").
Signed-off-by: Dai Ngo dai.ngo@oracle.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/filesystems/locking.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst index fbd695d66905f..07e57f7629202 100644 --- a/Documentation/filesystems/locking.rst +++ b/Documentation/filesystems/locking.rst @@ -437,13 +437,13 @@ prototypes:: locking rules:
====================== ============= ================= ========= -ops inode->i_lock blocked_lock_lock may block +ops flc_lock blocked_lock_lock may block ====================== ============= ================= ========= -lm_notify: yes yes no +lm_notify: no yes no lm_grant: no no no lm_break: yes no no lm_change yes no no -lm_breaker_owns_lease: no no no +lm_breaker_owns_lease: yes no no ====================== ============= ================= =========
buffer_head
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit 38035c04f5865c4ef9597d6beed6a7178f90f64a ]
The inotify control flags in the mark mask (e.g. FS_IN_ONE_SHOT) are not relevant to object interest mask, so move them to the mark flags.
This frees up some bits in the object interest mask.
Link: https://lore.kernel.org/r/20220422120327.3459282-3-amir73il@gmail.com Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fsnotify.c | 4 +-- fs/notify/inotify/inotify.h | 11 ++++++-- fs/notify/inotify/inotify_fsnotify.c | 2 +- fs/notify/inotify/inotify_user.c | 38 ++++++++++++++++++---------- include/linux/fsnotify_backend.h | 16 +++++++----- 5 files changed, 45 insertions(+), 26 deletions(-)
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index 70a8516b78bc5..6eee19d15e8cd 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -253,7 +253,7 @@ static int fsnotify_handle_inode_event(struct fsnotify_group *group, if (WARN_ON_ONCE(!inode && !dir)) return 0;
- if ((inode_mark->mask & FS_EXCL_UNLINK) && + if ((inode_mark->flags & FSNOTIFY_MARK_FLAG_EXCL_UNLINK) && path && d_unlinked(path->dentry)) return 0;
@@ -581,7 +581,7 @@ static __init int fsnotify_init(void) { int ret;
- BUILD_BUG_ON(HWEIGHT32(ALL_FSNOTIFY_BITS) != 25); + BUILD_BUG_ON(HWEIGHT32(ALL_FSNOTIFY_BITS) != 23);
ret = init_srcu_struct(&fsnotify_mark_srcu); if (ret) diff --git a/fs/notify/inotify/inotify.h b/fs/notify/inotify/inotify.h index 8f00151eb731f..7d5df7a215397 100644 --- a/fs/notify/inotify/inotify.h +++ b/fs/notify/inotify/inotify.h @@ -27,11 +27,18 @@ static inline struct inotify_event_info *INOTIFY_E(struct fsnotify_event *fse) * userspace. There is at least one bit (FS_EVENT_ON_CHILD) which is * used only internally to the kernel. */ -#define INOTIFY_USER_MASK (IN_ALL_EVENTS | IN_ONESHOT | IN_EXCL_UNLINK) +#define INOTIFY_USER_MASK (IN_ALL_EVENTS)
static inline __u32 inotify_mark_user_mask(struct fsnotify_mark *fsn_mark) { - return fsn_mark->mask & INOTIFY_USER_MASK; + __u32 mask = fsn_mark->mask & INOTIFY_USER_MASK; + + if (fsn_mark->flags & FSNOTIFY_MARK_FLAG_EXCL_UNLINK) + mask |= IN_EXCL_UNLINK; + if (fsn_mark->flags & FSNOTIFY_MARK_FLAG_IN_ONESHOT) + mask |= IN_ONESHOT; + + return mask; }
extern void inotify_ignored_and_remove_idr(struct fsnotify_mark *fsn_mark, diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c index 8279827836399..993375f0db673 100644 --- a/fs/notify/inotify/inotify_fsnotify.c +++ b/fs/notify/inotify/inotify_fsnotify.c @@ -129,7 +129,7 @@ int inotify_handle_inode_event(struct fsnotify_mark *inode_mark, u32 mask, fsnotify_destroy_event(group, fsn_event); }
- if (inode_mark->mask & IN_ONESHOT) + if (inode_mark->flags & FSNOTIFY_MARK_FLAG_IN_ONESHOT) fsnotify_destroy_mark(inode_mark, group);
return 0; diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c index 9676c3f2df3d7..861147e574581 100644 --- a/fs/notify/inotify/inotify_user.c +++ b/fs/notify/inotify/inotify_user.c @@ -102,6 +102,21 @@ static inline __u32 inotify_arg_to_mask(struct inode *inode, u32 arg) return mask; }
+#define INOTIFY_MARK_FLAGS \ + (FSNOTIFY_MARK_FLAG_EXCL_UNLINK | FSNOTIFY_MARK_FLAG_IN_ONESHOT) + +static inline unsigned int inotify_arg_to_flags(u32 arg) +{ + unsigned int flags = 0; + + if (arg & IN_EXCL_UNLINK) + flags |= FSNOTIFY_MARK_FLAG_EXCL_UNLINK; + if (arg & IN_ONESHOT) + flags |= FSNOTIFY_MARK_FLAG_IN_ONESHOT; + + return flags; +} + static inline u32 inotify_mask_to_arg(__u32 mask) { return mask & (IN_ALL_EVENTS | IN_ISDIR | IN_UNMOUNT | IN_IGNORED | @@ -513,13 +528,10 @@ static int inotify_update_existing_watch(struct fsnotify_group *group, struct fsnotify_mark *fsn_mark; struct inotify_inode_mark *i_mark; __u32 old_mask, new_mask; - __u32 mask; - int add = (arg & IN_MASK_ADD); + int replace = !(arg & IN_MASK_ADD); int create = (arg & IN_MASK_CREATE); int ret;
- mask = inotify_arg_to_mask(inode, arg); - fsn_mark = fsnotify_find_mark(&inode->i_fsnotify_marks, group); if (!fsn_mark) return -ENOENT; @@ -532,10 +544,12 @@ static int inotify_update_existing_watch(struct fsnotify_group *group,
spin_lock(&fsn_mark->lock); old_mask = fsn_mark->mask; - if (add) - fsn_mark->mask |= mask; - else - fsn_mark->mask = mask; + if (replace) { + fsn_mark->mask = 0; + fsn_mark->flags &= ~INOTIFY_MARK_FLAGS; + } + fsn_mark->mask |= inotify_arg_to_mask(inode, arg); + fsn_mark->flags |= inotify_arg_to_flags(arg); new_mask = fsn_mark->mask; spin_unlock(&fsn_mark->lock);
@@ -566,19 +580,17 @@ static int inotify_new_watch(struct fsnotify_group *group, u32 arg) { struct inotify_inode_mark *tmp_i_mark; - __u32 mask; int ret; struct idr *idr = &group->inotify_data.idr; spinlock_t *idr_lock = &group->inotify_data.idr_lock;
- mask = inotify_arg_to_mask(inode, arg); - tmp_i_mark = kmem_cache_alloc(inotify_inode_mark_cachep, GFP_KERNEL); if (unlikely(!tmp_i_mark)) return -ENOMEM;
fsnotify_init_mark(&tmp_i_mark->fsn_mark, group); - tmp_i_mark->fsn_mark.mask = mask; + tmp_i_mark->fsn_mark.mask = inotify_arg_to_mask(inode, arg); + tmp_i_mark->fsn_mark.flags = inotify_arg_to_flags(arg); tmp_i_mark->wd = -1;
ret = inotify_add_to_idr(idr, idr_lock, tmp_i_mark); @@ -832,9 +844,7 @@ static int __init inotify_user_setup(void) BUILD_BUG_ON(IN_UNMOUNT != FS_UNMOUNT); BUILD_BUG_ON(IN_Q_OVERFLOW != FS_Q_OVERFLOW); BUILD_BUG_ON(IN_IGNORED != FS_IN_IGNORED); - BUILD_BUG_ON(IN_EXCL_UNLINK != FS_EXCL_UNLINK); BUILD_BUG_ON(IN_ISDIR != FS_ISDIR); - BUILD_BUG_ON(IN_ONESHOT != FS_IN_ONESHOT);
BUILD_BUG_ON(HWEIGHT32(ALL_INOTIFY_BITS) != 22);
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 0805b74cae441..b1c72edd97845 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -55,7 +55,6 @@ #define FS_ACCESS_PERM 0x00020000 /* access event in a permissions hook */ #define FS_OPEN_EXEC_PERM 0x00040000 /* open/exec event in a permission hook */
-#define FS_EXCL_UNLINK 0x04000000 /* do not send events if object is unlinked */ /* * Set on inode mark that cares about things that happen to its children. * Always set for dnotify and inotify. @@ -66,7 +65,6 @@ #define FS_RENAME 0x10000000 /* File was renamed */ #define FS_DN_MULTISHOT 0x20000000 /* dnotify multishot */ #define FS_ISDIR 0x40000000 /* event occurred against dir */ -#define FS_IN_ONESHOT 0x80000000 /* only send event once */
#define FS_MOVE (FS_MOVED_FROM | FS_MOVED_TO)
@@ -106,8 +104,7 @@ FS_ERROR)
/* Extra flags that may be reported with event or control handling of events */ -#define ALL_FSNOTIFY_FLAGS (FS_EXCL_UNLINK | FS_ISDIR | FS_IN_ONESHOT | \ - FS_DN_MULTISHOT | FS_EVENT_ON_CHILD) +#define ALL_FSNOTIFY_FLAGS (FS_ISDIR | FS_EVENT_ON_CHILD | FS_DN_MULTISHOT)
#define ALL_FSNOTIFY_BITS (ALL_FSNOTIFY_EVENTS | ALL_FSNOTIFY_FLAGS)
@@ -473,9 +470,14 @@ struct fsnotify_mark { struct fsnotify_mark_connector *connector; /* Events types to ignore [mark->lock, group->mark_mutex] */ __u32 ignored_mask; -#define FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY 0x01 -#define FSNOTIFY_MARK_FLAG_ALIVE 0x02 -#define FSNOTIFY_MARK_FLAG_ATTACHED 0x04 + /* General fsnotify mark flags */ +#define FSNOTIFY_MARK_FLAG_ALIVE 0x0001 +#define FSNOTIFY_MARK_FLAG_ATTACHED 0x0002 + /* inotify mark flags */ +#define FSNOTIFY_MARK_FLAG_EXCL_UNLINK 0x0010 +#define FSNOTIFY_MARK_FLAG_IN_ONESHOT 0x0020 + /* fanotify mark flags */ +#define FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY 0x0100 unsigned int flags; /* flags [mark->lock] */ };
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit 867a448d587e7fa845bceaf4ee1c632448f2a9fa ]
Add flags argument to fsnotify_alloc_group(), define and use the flag FSNOTIFY_GROUP_USER in inotify and fanotify instead of the helper fsnotify_alloc_user_group() to indicate user allocation.
Although the flag FSNOTIFY_GROUP_USER is currently not used after group allocation, we store the flags argument in the group struct for future use of other group flags.
Link: https://lore.kernel.org/r/20220422120327.3459282-5-amir73il@gmail.com Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 3 ++- fs/notify/dnotify/dnotify.c | 2 +- fs/notify/fanotify/fanotify_user.c | 3 ++- fs/notify/group.c | 21 +++++++++------------ fs/notify/inotify/inotify_user.c | 3 ++- include/linux/fsnotify_backend.h | 8 ++++++-- kernel/audit_fsnotify.c | 3 ++- kernel/audit_tree.c | 2 +- kernel/audit_watch.c | 2 +- 9 files changed, 26 insertions(+), 21 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 8f7ed5dbb0031..7ae2b6611fb29 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -677,7 +677,8 @@ nfsd_file_cache_init(void) goto out_shrinker; }
- nfsd_file_fsnotify_group = fsnotify_alloc_group(&nfsd_file_fsnotify_ops); + nfsd_file_fsnotify_group = fsnotify_alloc_group(&nfsd_file_fsnotify_ops, + 0); if (IS_ERR(nfsd_file_fsnotify_group)) { pr_err("nfsd: unable to create fsnotify group: %ld\n", PTR_ERR(nfsd_file_fsnotify_group)); diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c index d5ebebb034ffe..6c586802c50e6 100644 --- a/fs/notify/dnotify/dnotify.c +++ b/fs/notify/dnotify/dnotify.c @@ -383,7 +383,7 @@ static int __init dnotify_init(void) SLAB_PANIC|SLAB_ACCOUNT); dnotify_mark_cache = KMEM_CACHE(dnotify_mark, SLAB_PANIC|SLAB_ACCOUNT);
- dnotify_group = fsnotify_alloc_group(&dnotify_fsnotify_ops); + dnotify_group = fsnotify_alloc_group(&dnotify_fsnotify_ops, 0); if (IS_ERR(dnotify_group)) panic("unable to allocate fsnotify group for dnotify\n"); return 0; diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 921ee7b08580d..731bd7f64e018 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1343,7 +1343,8 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) f_flags |= O_NONBLOCK;
/* fsnotify_alloc_group takes a ref. Dropped in fanotify_release */ - group = fsnotify_alloc_user_group(&fanotify_fsnotify_ops); + group = fsnotify_alloc_group(&fanotify_fsnotify_ops, + FSNOTIFY_GROUP_USER); if (IS_ERR(group)) { return PTR_ERR(group); } diff --git a/fs/notify/group.c b/fs/notify/group.c index b7d4d64f87c29..18446b7b0d495 100644 --- a/fs/notify/group.c +++ b/fs/notify/group.c @@ -112,7 +112,8 @@ void fsnotify_put_group(struct fsnotify_group *group) EXPORT_SYMBOL_GPL(fsnotify_put_group);
static struct fsnotify_group *__fsnotify_alloc_group( - const struct fsnotify_ops *ops, gfp_t gfp) + const struct fsnotify_ops *ops, + int flags, gfp_t gfp) { struct fsnotify_group *group;
@@ -133,6 +134,7 @@ static struct fsnotify_group *__fsnotify_alloc_group( INIT_LIST_HEAD(&group->marks_list);
group->ops = ops; + group->flags = flags;
return group; } @@ -140,20 +142,15 @@ static struct fsnotify_group *__fsnotify_alloc_group( /* * Create a new fsnotify_group and hold a reference for the group returned. */ -struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops) +struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops, + int flags) { - return __fsnotify_alloc_group(ops, GFP_KERNEL); -} -EXPORT_SYMBOL_GPL(fsnotify_alloc_group); + gfp_t gfp = (flags & FSNOTIFY_GROUP_USER) ? GFP_KERNEL_ACCOUNT : + GFP_KERNEL;
-/* - * Create a new fsnotify_group and hold a reference for the group returned. - */ -struct fsnotify_group *fsnotify_alloc_user_group(const struct fsnotify_ops *ops) -{ - return __fsnotify_alloc_group(ops, GFP_KERNEL_ACCOUNT); + return __fsnotify_alloc_group(ops, flags, gfp); } -EXPORT_SYMBOL_GPL(fsnotify_alloc_user_group); +EXPORT_SYMBOL_GPL(fsnotify_alloc_group);
int fsnotify_fasync(int fd, struct file *file, int on) { diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c index 861147e574581..9e1cf8392385a 100644 --- a/fs/notify/inotify/inotify_user.c +++ b/fs/notify/inotify/inotify_user.c @@ -643,7 +643,8 @@ static struct fsnotify_group *inotify_new_group(unsigned int max_events) struct fsnotify_group *group; struct inotify_event_info *oevent;
- group = fsnotify_alloc_user_group(&inotify_fsnotify_ops); + group = fsnotify_alloc_group(&inotify_fsnotify_ops, + FSNOTIFY_GROUP_USER); if (IS_ERR(group)) return group;
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index b1c72edd97845..f0bf557af0091 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -210,6 +210,9 @@ struct fsnotify_group { unsigned int priority; bool shutdown; /* group is being shut down, don't queue more events */
+#define FSNOTIFY_GROUP_USER 0x01 /* user allocated group */ + int flags; + /* stores all fastpath marks assoc with this group so they can be cleaned on unregister */ struct mutex mark_mutex; /* protect marks_list */ atomic_t user_waits; /* Number of tasks waiting for user @@ -543,8 +546,9 @@ static inline void fsnotify_update_flags(struct dentry *dentry) /* called from fsnotify listeners, such as fanotify or dnotify */
/* create a new group */ -extern struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops); -extern struct fsnotify_group *fsnotify_alloc_user_group(const struct fsnotify_ops *ops); +extern struct fsnotify_group *fsnotify_alloc_group( + const struct fsnotify_ops *ops, + int flags); /* get reference to a group */ extern void fsnotify_get_group(struct fsnotify_group *group); /* drop reference on a group from fsnotify_alloc_group */ diff --git a/kernel/audit_fsnotify.c b/kernel/audit_fsnotify.c index 76a5925b4e18d..31f43b1475404 100644 --- a/kernel/audit_fsnotify.c +++ b/kernel/audit_fsnotify.c @@ -182,7 +182,8 @@ static const struct fsnotify_ops audit_mark_fsnotify_ops = {
static int __init audit_fsnotify_init(void) { - audit_fsnotify_group = fsnotify_alloc_group(&audit_mark_fsnotify_ops); + audit_fsnotify_group = fsnotify_alloc_group(&audit_mark_fsnotify_ops, + 0); if (IS_ERR(audit_fsnotify_group)) { audit_fsnotify_group = NULL; audit_panic("cannot create audit fsnotify group"); diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c index 39241207ec044..0c35879bbf7c3 100644 --- a/kernel/audit_tree.c +++ b/kernel/audit_tree.c @@ -1077,7 +1077,7 @@ static int __init audit_tree_init(void)
audit_tree_mark_cachep = KMEM_CACHE(audit_tree_mark, SLAB_PANIC);
- audit_tree_group = fsnotify_alloc_group(&audit_tree_ops); + audit_tree_group = fsnotify_alloc_group(&audit_tree_ops, 0); if (IS_ERR(audit_tree_group)) audit_panic("cannot initialize fsnotify group for rectree watches");
diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c index fd7b30a2d9a4b..5cf22fe301493 100644 --- a/kernel/audit_watch.c +++ b/kernel/audit_watch.c @@ -492,7 +492,7 @@ static const struct fsnotify_ops audit_watch_fsnotify_ops = {
static int __init audit_watch_init(void) { - audit_watch_group = fsnotify_alloc_group(&audit_watch_fsnotify_ops); + audit_watch_group = fsnotify_alloc_group(&audit_watch_fsnotify_ops, 0); if (IS_ERR(audit_watch_group)) { audit_watch_group = NULL; audit_panic("cannot create audit fsnotify group");
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit f3010343d9e119da35ee864b3a28993bb5c78ed7 ]
Instead of passing the allow_dups argument to fsnotify_add_mark() as an argument, define the group flag FSNOTIFY_GROUP_DUPS to express the allow_dups behavior and set this behavior at group creation time for all calls of fsnotify_add_mark().
Rename the allow_dups argument to generic add_flags argument for future use.
Link: https://lore.kernel.org/r/20220422120327.3459282-6-amir73il@gmail.com Suggested-by: Jan Kara jack@suse.cz Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/mark.c | 12 ++++++------ include/linux/fsnotify_backend.h | 13 +++++++------ kernel/audit_fsnotify.c | 4 ++-- 3 files changed, 15 insertions(+), 14 deletions(-)
diff --git a/fs/notify/mark.c b/fs/notify/mark.c index c86982be2d505..1fb246ea61752 100644 --- a/fs/notify/mark.c +++ b/fs/notify/mark.c @@ -574,7 +574,7 @@ static struct fsnotify_mark_connector *fsnotify_grab_connector( static int fsnotify_add_mark_list(struct fsnotify_mark *mark, fsnotify_connp_t *connp, unsigned int obj_type, - int allow_dups, __kernel_fsid_t *fsid) + int add_flags, __kernel_fsid_t *fsid) { struct fsnotify_mark *lmark, *last = NULL; struct fsnotify_mark_connector *conn; @@ -633,7 +633,7 @@ static int fsnotify_add_mark_list(struct fsnotify_mark *mark,
if ((lmark->group == mark->group) && (lmark->flags & FSNOTIFY_MARK_FLAG_ATTACHED) && - !allow_dups) { + !(mark->group->flags & FSNOTIFY_GROUP_DUPS)) { err = -EEXIST; goto out_err; } @@ -668,7 +668,7 @@ static int fsnotify_add_mark_list(struct fsnotify_mark *mark, */ int fsnotify_add_mark_locked(struct fsnotify_mark *mark, fsnotify_connp_t *connp, unsigned int obj_type, - int allow_dups, __kernel_fsid_t *fsid) + int add_flags, __kernel_fsid_t *fsid) { struct fsnotify_group *group = mark->group; int ret = 0; @@ -688,7 +688,7 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark, fsnotify_get_mark(mark); /* for g_list */ spin_unlock(&mark->lock);
- ret = fsnotify_add_mark_list(mark, connp, obj_type, allow_dups, fsid); + ret = fsnotify_add_mark_list(mark, connp, obj_type, add_flags, fsid); if (ret) goto err;
@@ -708,14 +708,14 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark, }
int fsnotify_add_mark(struct fsnotify_mark *mark, fsnotify_connp_t *connp, - unsigned int obj_type, int allow_dups, + unsigned int obj_type, int add_flags, __kernel_fsid_t *fsid) { int ret; struct fsnotify_group *group = mark->group;
mutex_lock(&group->mark_mutex); - ret = fsnotify_add_mark_locked(mark, connp, obj_type, allow_dups, fsid); + ret = fsnotify_add_mark_locked(mark, connp, obj_type, add_flags, fsid); mutex_unlock(&group->mark_mutex); return ret; } diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index f0bf557af0091..dd440e6ff5285 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -211,6 +211,7 @@ struct fsnotify_group { bool shutdown; /* group is being shut down, don't queue more events */
#define FSNOTIFY_GROUP_USER 0x01 /* user allocated group */ +#define FSNOTIFY_GROUP_DUPS 0x02 /* allow multiple marks per object */ int flags;
/* stores all fastpath marks assoc with this group so they can be cleaned on unregister */ @@ -641,26 +642,26 @@ extern int fsnotify_get_conn_fsid(const struct fsnotify_mark_connector *conn, /* attach the mark to the object */ extern int fsnotify_add_mark(struct fsnotify_mark *mark, fsnotify_connp_t *connp, unsigned int obj_type, - int allow_dups, __kernel_fsid_t *fsid); + int add_flags, __kernel_fsid_t *fsid); extern int fsnotify_add_mark_locked(struct fsnotify_mark *mark, fsnotify_connp_t *connp, - unsigned int obj_type, int allow_dups, + unsigned int obj_type, int add_flags, __kernel_fsid_t *fsid);
/* attach the mark to the inode */ static inline int fsnotify_add_inode_mark(struct fsnotify_mark *mark, struct inode *inode, - int allow_dups) + int add_flags) { return fsnotify_add_mark(mark, &inode->i_fsnotify_marks, - FSNOTIFY_OBJ_TYPE_INODE, allow_dups, NULL); + FSNOTIFY_OBJ_TYPE_INODE, add_flags, NULL); } static inline int fsnotify_add_inode_mark_locked(struct fsnotify_mark *mark, struct inode *inode, - int allow_dups) + int add_flags) { return fsnotify_add_mark_locked(mark, &inode->i_fsnotify_marks, - FSNOTIFY_OBJ_TYPE_INODE, allow_dups, + FSNOTIFY_OBJ_TYPE_INODE, add_flags, NULL); }
diff --git a/kernel/audit_fsnotify.c b/kernel/audit_fsnotify.c index 31f43b1475404..691f90dd09d25 100644 --- a/kernel/audit_fsnotify.c +++ b/kernel/audit_fsnotify.c @@ -100,7 +100,7 @@ struct audit_fsnotify_mark *audit_alloc_mark(struct audit_krule *krule, char *pa audit_update_mark(audit_mark, dentry->d_inode); audit_mark->rule = krule;
- ret = fsnotify_add_inode_mark(&audit_mark->mark, inode, true); + ret = fsnotify_add_inode_mark(&audit_mark->mark, inode, 0); if (ret < 0) { audit_mark->path = NULL; fsnotify_put_mark(&audit_mark->mark); @@ -183,7 +183,7 @@ static const struct fsnotify_ops audit_mark_fsnotify_ops = { static int __init audit_fsnotify_init(void) { audit_fsnotify_group = fsnotify_alloc_group(&audit_mark_fsnotify_ops, - 0); + FSNOTIFY_GROUP_DUPS); if (IS_ERR(audit_fsnotify_group)) { audit_fsnotify_group = NULL; audit_panic("cannot create audit fsnotify group");
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit 43b245a788e2d8f1bb742668a9bdace02fcb3e96 ]
Create helpers to take and release the group mark_mutex lock.
Define a flag FSNOTIFY_GROUP_NOFS in fsnotify_group that determines if the mark_mutex lock is fs reclaim safe or not. If not safe, the lock helpers take the lock and disable direct fs reclaim.
In that case we annotate the mutex with a different lockdep class to express to lockdep that an allocation of mark of an fs reclaim safe group may take the group lock of another "NOFS" group to evict inodes.
For now, converted only the callers in common code and no backend defines the NOFS flag. It is intended to be set by fanotify for evictable marks support.
Link: https://lore.kernel.org/r/20220422120327.3459282-7-amir73il@gmail.com Suggested-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20220321112310.vpr7oxro2xkz5llh@quack3.lan/ Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fdinfo.c | 4 ++-- fs/notify/group.c | 11 +++++++++++ fs/notify/mark.c | 24 +++++++++++------------- include/linux/fsnotify_backend.h | 28 ++++++++++++++++++++++++++++ 4 files changed, 52 insertions(+), 15 deletions(-)
diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c index 3451708fd035c..1f34c5c29fdbd 100644 --- a/fs/notify/fdinfo.c +++ b/fs/notify/fdinfo.c @@ -28,13 +28,13 @@ static void show_fdinfo(struct seq_file *m, struct file *f, struct fsnotify_group *group = f->private_data; struct fsnotify_mark *mark;
- mutex_lock(&group->mark_mutex); + fsnotify_group_lock(group); list_for_each_entry(mark, &group->marks_list, g_list) { show(m, mark); if (seq_has_overflowed(m)) break; } - mutex_unlock(&group->mark_mutex); + fsnotify_group_unlock(group); }
#if defined(CONFIG_EXPORTFS) diff --git a/fs/notify/group.c b/fs/notify/group.c index 18446b7b0d495..1de6631a3925e 100644 --- a/fs/notify/group.c +++ b/fs/notify/group.c @@ -115,6 +115,7 @@ static struct fsnotify_group *__fsnotify_alloc_group( const struct fsnotify_ops *ops, int flags, gfp_t gfp) { + static struct lock_class_key nofs_marks_lock; struct fsnotify_group *group;
group = kzalloc(sizeof(struct fsnotify_group), gfp); @@ -135,6 +136,16 @@ static struct fsnotify_group *__fsnotify_alloc_group(
group->ops = ops; group->flags = flags; + /* + * For most backends, eviction of inode with a mark is not expected, + * because marks hold a refcount on the inode against eviction. + * + * Use a different lockdep class for groups that support evictable + * inode marks, because with evictable marks, mark_mutex is NOT + * fs-reclaim safe - the mutex is taken when evicting inodes. + */ + if (flags & FSNOTIFY_GROUP_NOFS) + lockdep_set_class(&group->mark_mutex, &nofs_marks_lock);
return group; } diff --git a/fs/notify/mark.c b/fs/notify/mark.c index 1fb246ea61752..982ca2f20ff5d 100644 --- a/fs/notify/mark.c +++ b/fs/notify/mark.c @@ -398,9 +398,7 @@ void fsnotify_finish_user_wait(struct fsnotify_iter_info *iter_info) */ void fsnotify_detach_mark(struct fsnotify_mark *mark) { - struct fsnotify_group *group = mark->group; - - WARN_ON_ONCE(!mutex_is_locked(&group->mark_mutex)); + fsnotify_group_assert_locked(mark->group); WARN_ON_ONCE(!srcu_read_lock_held(&fsnotify_mark_srcu) && refcount_read(&mark->refcnt) < 1 + !!(mark->flags & FSNOTIFY_MARK_FLAG_ATTACHED)); @@ -452,9 +450,9 @@ void fsnotify_free_mark(struct fsnotify_mark *mark) void fsnotify_destroy_mark(struct fsnotify_mark *mark, struct fsnotify_group *group) { - mutex_lock(&group->mark_mutex); + fsnotify_group_lock(group); fsnotify_detach_mark(mark); - mutex_unlock(&group->mark_mutex); + fsnotify_group_unlock(group); fsnotify_free_mark(mark); } EXPORT_SYMBOL_GPL(fsnotify_destroy_mark); @@ -673,7 +671,7 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark, struct fsnotify_group *group = mark->group; int ret = 0;
- BUG_ON(!mutex_is_locked(&group->mark_mutex)); + fsnotify_group_assert_locked(group);
/* * LOCKING ORDER!!!! @@ -714,9 +712,9 @@ int fsnotify_add_mark(struct fsnotify_mark *mark, fsnotify_connp_t *connp, int ret; struct fsnotify_group *group = mark->group;
- mutex_lock(&group->mark_mutex); + fsnotify_group_lock(group); ret = fsnotify_add_mark_locked(mark, connp, obj_type, add_flags, fsid); - mutex_unlock(&group->mark_mutex); + fsnotify_group_unlock(group); return ret; } EXPORT_SYMBOL_GPL(fsnotify_add_mark); @@ -770,24 +768,24 @@ void fsnotify_clear_marks_by_group(struct fsnotify_group *group, * move marks to free to to_free list in one go and then free marks in * to_free list one by one. */ - mutex_lock(&group->mark_mutex); + fsnotify_group_lock(group); list_for_each_entry_safe(mark, lmark, &group->marks_list, g_list) { if (mark->connector->type == obj_type) list_move(&mark->g_list, &to_free); } - mutex_unlock(&group->mark_mutex); + fsnotify_group_unlock(group);
clear: while (1) { - mutex_lock(&group->mark_mutex); + fsnotify_group_lock(group); if (list_empty(head)) { - mutex_unlock(&group->mark_mutex); + fsnotify_group_unlock(group); break; } mark = list_first_entry(head, struct fsnotify_mark, g_list); fsnotify_get_mark(mark); fsnotify_detach_mark(mark); - mutex_unlock(&group->mark_mutex); + fsnotify_group_unlock(group); fsnotify_free_mark(mark); fsnotify_put_mark(mark); } diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index dd440e6ff5285..d62111e832440 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -20,6 +20,7 @@ #include <linux/user_namespace.h> #include <linux/refcount.h> #include <linux/mempool.h> +#include <linux/sched/mm.h>
/* * IN_* from inotfy.h lines up EXACTLY with FS_*, this is so we can easily @@ -212,7 +213,9 @@ struct fsnotify_group {
#define FSNOTIFY_GROUP_USER 0x01 /* user allocated group */ #define FSNOTIFY_GROUP_DUPS 0x02 /* allow multiple marks per object */ +#define FSNOTIFY_GROUP_NOFS 0x04 /* group lock is not direct reclaim safe */ int flags; + unsigned int owner_flags; /* stored flags of mark_mutex owner */
/* stores all fastpath marks assoc with this group so they can be cleaned on unregister */ struct mutex mark_mutex; /* protect marks_list */ @@ -254,6 +257,31 @@ struct fsnotify_group { }; };
+/* + * These helpers are used to prevent deadlock when reclaiming inodes with + * evictable marks of the same group that is allocating a new mark. + */ +static inline void fsnotify_group_lock(struct fsnotify_group *group) +{ + mutex_lock(&group->mark_mutex); + if (group->flags & FSNOTIFY_GROUP_NOFS) + group->owner_flags = memalloc_nofs_save(); +} + +static inline void fsnotify_group_unlock(struct fsnotify_group *group) +{ + if (group->flags & FSNOTIFY_GROUP_NOFS) + memalloc_nofs_restore(group->owner_flags); + mutex_unlock(&group->mark_mutex); +} + +static inline void fsnotify_group_assert_locked(struct fsnotify_group *group) +{ + WARN_ON_ONCE(!mutex_is_locked(&group->mark_mutex)); + if (group->flags & FSNOTIFY_GROUP_NOFS) + WARN_ON_ONCE(!(current->flags & PF_MEMALLOC_NOFS)); +} + /* When calling fsnotify tell it if the data is a path or inode */ enum fsnotify_data_type { FSNOTIFY_EVENT_NONE,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit 642054b87058019be36033f73c3e48ffff1915aa ]
inotify inode marks pin the inode so there is no need to set the FSNOTIFY_GROUP_NOFS flag.
Link: https://lore.kernel.org/r/20220422120327.3459282-8-amir73il@gmail.com Suggested-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20220321112310.vpr7oxro2xkz5llh@quack3.lan/ Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/inotify/inotify_user.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c index 9e1cf8392385a..3d5d536f8fd63 100644 --- a/fs/notify/inotify/inotify_user.c +++ b/fs/notify/inotify/inotify_user.c @@ -627,13 +627,13 @@ static int inotify_update_watch(struct fsnotify_group *group, struct inode *inod { int ret = 0;
- mutex_lock(&group->mark_mutex); + fsnotify_group_lock(group); /* try to update and existing watch with the new arg */ ret = inotify_update_existing_watch(group, inode, arg); /* no mark present, try to add a new one */ if (ret == -ENOENT) ret = inotify_new_watch(group, inode, arg); - mutex_unlock(&group->mark_mutex); + fsnotify_group_unlock(group);
return ret; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit b8962a9d8cc2d8c93362e2f684091c79f702f6f3 ]
Before commit 9542e6a643fc6 ("nfsd: Containerise filecache laundrette") nfsd would close open files in direct reclaim context and that could cause a deadlock when fsnotify mark allocation went into direct reclaim and nfsd shrinker tried to free existing fsnotify marks.
To avoid issues like this in future code, set the FSNOTIFY_GROUP_NOFS flag on nfsd fsnotify group to prevent going into direct reclaim from fsnotify_add_inode_mark().
Link: https://lore.kernel.org/r/20220422120327.3459282-10-amir73il@gmail.com Suggested-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20220321112310.vpr7oxro2xkz5llh@quack3.lan/ Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 7ae2b6611fb29..7e99e75b75d73 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -118,14 +118,14 @@ nfsd_file_mark_find_or_create(struct nfsd_file *nf) struct inode *inode = nf->nf_inode;
do { - mutex_lock(&nfsd_file_fsnotify_group->mark_mutex); + fsnotify_group_lock(nfsd_file_fsnotify_group); mark = fsnotify_find_mark(&inode->i_fsnotify_marks, - nfsd_file_fsnotify_group); + nfsd_file_fsnotify_group); if (mark) { nfm = nfsd_file_mark_get(container_of(mark, struct nfsd_file_mark, nfm_mark)); - mutex_unlock(&nfsd_file_fsnotify_group->mark_mutex); + fsnotify_group_unlock(nfsd_file_fsnotify_group); if (nfm) { fsnotify_put_mark(mark); break; @@ -133,8 +133,9 @@ nfsd_file_mark_find_or_create(struct nfsd_file *nf) /* Avoid soft lockup race with nfsd_file_mark_put() */ fsnotify_destroy_mark(mark, nfsd_file_fsnotify_group); fsnotify_put_mark(mark); - } else - mutex_unlock(&nfsd_file_fsnotify_group->mark_mutex); + } else { + fsnotify_group_unlock(nfsd_file_fsnotify_group); + }
/* allocate a new nfm */ new = kmem_cache_alloc(nfsd_file_mark_slab, GFP_KERNEL); @@ -678,7 +679,7 @@ nfsd_file_cache_init(void) }
nfsd_file_fsnotify_group = fsnotify_alloc_group(&nfsd_file_fsnotify_ops, - 0); + FSNOTIFY_GROUP_NOFS); if (IS_ERR(nfsd_file_fsnotify_group)) { pr_err("nfsd: unable to create fsnotify group: %ld\n", PTR_ERR(nfsd_file_fsnotify_group));
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit aabb45fdcb31f00f1e7cae2bce83e83474a87c03 ]
Before commit 9542e6a643fc6 ("nfsd: Containerise filecache laundrette") nfsd would close open files in direct reclaim context. There is no guarantee that others memory shrinkers don't do the same and no guarantee that future shrinkers won't do that.
For example, if overlayfs implements inode cache of fscache would keep open files to cached objects, inode shrinkers could end up closing open files to underlying fs.
Direct reclaim from dnotify mark allocation context may try to close open files that have dnotify marks of the same group and hit a deadlock on mark_mutex.
Set the FSNOTIFY_GROUP_NOFS flag to prevent going into direct reclaim from allocations under dnotify group lock and use the safe group lock helpers.
Link: https://lore.kernel.org/r/20220422120327.3459282-11-amir73il@gmail.com Suggested-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20220321112310.vpr7oxro2xkz5llh@quack3.lan/ Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/dnotify/dnotify.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c index 6c586802c50e6..fa81c59a2ad41 100644 --- a/fs/notify/dnotify/dnotify.c +++ b/fs/notify/dnotify/dnotify.c @@ -150,7 +150,7 @@ void dnotify_flush(struct file *filp, fl_owner_t id) return; dn_mark = container_of(fsn_mark, struct dnotify_mark, fsn_mark);
- mutex_lock(&dnotify_group->mark_mutex); + fsnotify_group_lock(dnotify_group);
spin_lock(&fsn_mark->lock); prev = &dn_mark->dn; @@ -173,7 +173,7 @@ void dnotify_flush(struct file *filp, fl_owner_t id) free = true; }
- mutex_unlock(&dnotify_group->mark_mutex); + fsnotify_group_unlock(dnotify_group);
if (free) fsnotify_free_mark(fsn_mark); @@ -306,7 +306,7 @@ int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg) new_dn_mark->dn = NULL;
/* this is needed to prevent the fcntl/close race described below */ - mutex_lock(&dnotify_group->mark_mutex); + fsnotify_group_lock(dnotify_group);
/* add the new_fsn_mark or find an old one. */ fsn_mark = fsnotify_find_mark(&inode->i_fsnotify_marks, dnotify_group); @@ -316,7 +316,7 @@ int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg) } else { error = fsnotify_add_inode_mark_locked(new_fsn_mark, inode, 0); if (error) { - mutex_unlock(&dnotify_group->mark_mutex); + fsnotify_group_unlock(dnotify_group); goto out_err; } spin_lock(&new_fsn_mark->lock); @@ -365,7 +365,7 @@ int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg)
if (destroy) fsnotify_detach_mark(fsn_mark); - mutex_unlock(&dnotify_group->mark_mutex); + fsnotify_group_unlock(dnotify_group); if (destroy) fsnotify_free_mark(fsn_mark); fsnotify_put_mark(fsn_mark); @@ -383,7 +383,8 @@ static int __init dnotify_init(void) SLAB_PANIC|SLAB_ACCOUNT); dnotify_mark_cache = KMEM_CACHE(dnotify_mark, SLAB_PANIC|SLAB_ACCOUNT);
- dnotify_group = fsnotify_alloc_group(&dnotify_fsnotify_ops, 0); + dnotify_group = fsnotify_alloc_group(&dnotify_fsnotify_ops, + FSNOTIFY_GROUP_NOFS); if (IS_ERR(dnotify_group)) panic("unable to allocate fsnotify group for dnotify\n"); return 0;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit c3638b5b13740fa31762d414bbce8b7a694e582a ]
fsnotify_add_mark() and variants implicitly take a reference on inode when attaching a mark to an inode.
Make that behavior opt-out with the mark flag FSNOTIFY_MARK_FLAG_NO_IREF.
Instead of taking the inode reference when attaching connector to inode and dropping the inode reference when detaching connector from inode, take the inode reference on attach of the first mark that wants to hold an inode reference and drop the inode reference on detach of the last mark that wants to hold an inode reference.
Backends can "upgrade" an existing mark to take an inode reference, but cannot "downgrade" a mark with inode reference to release the refernce.
This leaves the choice to the backend whether or not to pin the inode when adding an inode mark.
This is intended to be used when adding a mark with ignored mask that is used for optimization in cases where group can afford getting unneeded events and reinstate the mark with ignored mask when inode is accessed again after being evicted.
Link: https://lore.kernel.org/r/20220422120327.3459282-12-amir73il@gmail.com Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/mark.c | 76 +++++++++++++++++++++++--------- include/linux/fsnotify_backend.h | 2 + 2 files changed, 58 insertions(+), 20 deletions(-)
diff --git a/fs/notify/mark.c b/fs/notify/mark.c index 982ca2f20ff5d..c74ef947447d6 100644 --- a/fs/notify/mark.c +++ b/fs/notify/mark.c @@ -116,20 +116,64 @@ __u32 fsnotify_conn_mask(struct fsnotify_mark_connector *conn) return *fsnotify_conn_mask_p(conn); }
-static void __fsnotify_recalc_mask(struct fsnotify_mark_connector *conn) +static void fsnotify_get_inode_ref(struct inode *inode) +{ + ihold(inode); + atomic_long_inc(&inode->i_sb->s_fsnotify_connectors); +} + +/* + * Grab or drop inode reference for the connector if needed. + * + * When it's time to drop the reference, we only clear the HAS_IREF flag and + * return the inode object. fsnotify_drop_object() will be resonsible for doing + * iput() outside of spinlocks. This happens when last mark that wanted iref is + * detached. + */ +static struct inode *fsnotify_update_iref(struct fsnotify_mark_connector *conn, + bool want_iref) +{ + bool has_iref = conn->flags & FSNOTIFY_CONN_FLAG_HAS_IREF; + struct inode *inode = NULL; + + if (conn->type != FSNOTIFY_OBJ_TYPE_INODE || + want_iref == has_iref) + return NULL; + + if (want_iref) { + /* Pin inode if any mark wants inode refcount held */ + fsnotify_get_inode_ref(fsnotify_conn_inode(conn)); + conn->flags |= FSNOTIFY_CONN_FLAG_HAS_IREF; + } else { + /* Unpin inode after detach of last mark that wanted iref */ + inode = fsnotify_conn_inode(conn); + conn->flags &= ~FSNOTIFY_CONN_FLAG_HAS_IREF; + } + + return inode; +} + +static void *__fsnotify_recalc_mask(struct fsnotify_mark_connector *conn) { u32 new_mask = 0; + bool want_iref = false; struct fsnotify_mark *mark;
assert_spin_locked(&conn->lock); /* We can get detached connector here when inode is getting unlinked. */ if (!fsnotify_valid_obj_type(conn->type)) - return; + return NULL; hlist_for_each_entry(mark, &conn->list, obj_list) { - if (mark->flags & FSNOTIFY_MARK_FLAG_ATTACHED) - new_mask |= fsnotify_calc_mask(mark); + if (!(mark->flags & FSNOTIFY_MARK_FLAG_ATTACHED)) + continue; + new_mask |= fsnotify_calc_mask(mark); + if (conn->type == FSNOTIFY_OBJ_TYPE_INODE && + !(mark->flags & FSNOTIFY_MARK_FLAG_NO_IREF)) + want_iref = true; } *fsnotify_conn_mask_p(conn) = new_mask; + + return fsnotify_update_iref(conn, want_iref); }
/* @@ -169,12 +213,6 @@ static void fsnotify_connector_destroy_workfn(struct work_struct *work) } }
-static void fsnotify_get_inode_ref(struct inode *inode) -{ - ihold(inode); - atomic_long_inc(&inode->i_sb->s_fsnotify_connectors); -} - static void fsnotify_put_inode_ref(struct inode *inode) { struct super_block *sb = inode->i_sb; @@ -213,6 +251,10 @@ static void *fsnotify_detach_connector_from_object( if (conn->type == FSNOTIFY_OBJ_TYPE_INODE) { inode = fsnotify_conn_inode(conn); inode->i_fsnotify_mask = 0; + + /* Unpin inode when detaching from connector */ + if (!(conn->flags & FSNOTIFY_CONN_FLAG_HAS_IREF)) + inode = NULL; } else if (conn->type == FSNOTIFY_OBJ_TYPE_VFSMOUNT) { fsnotify_conn_mount(conn)->mnt_fsnotify_mask = 0; } else if (conn->type == FSNOTIFY_OBJ_TYPE_SB) { @@ -274,7 +316,8 @@ void fsnotify_put_mark(struct fsnotify_mark *mark) objp = fsnotify_detach_connector_from_object(conn, &type); free_conn = true; } else { - __fsnotify_recalc_mask(conn); + objp = __fsnotify_recalc_mask(conn); + type = conn->type; } WRITE_ONCE(mark->connector, NULL); spin_unlock(&conn->lock); @@ -497,7 +540,6 @@ static int fsnotify_attach_connector_to_object(fsnotify_connp_t *connp, unsigned int obj_type, __kernel_fsid_t *fsid) { - struct inode *inode = NULL; struct fsnotify_mark_connector *conn;
conn = kmem_cache_alloc(fsnotify_mark_connector_cachep, GFP_KERNEL); @@ -505,6 +547,7 @@ static int fsnotify_attach_connector_to_object(fsnotify_connp_t *connp, return -ENOMEM; spin_lock_init(&conn->lock); INIT_HLIST_HEAD(&conn->list); + conn->flags = 0; conn->type = obj_type; conn->obj = connp; /* Cache fsid of filesystem containing the object */ @@ -515,10 +558,6 @@ static int fsnotify_attach_connector_to_object(fsnotify_connp_t *connp, conn->fsid.val[0] = conn->fsid.val[1] = 0; conn->flags = 0; } - if (conn->type == FSNOTIFY_OBJ_TYPE_INODE) { - inode = fsnotify_conn_inode(conn); - fsnotify_get_inode_ref(inode); - } fsnotify_get_sb_connectors(conn);
/* @@ -527,8 +566,6 @@ static int fsnotify_attach_connector_to_object(fsnotify_connp_t *connp, */ if (cmpxchg(connp, NULL, conn)) { /* Someone else created list structure for us */ - if (inode) - fsnotify_put_inode_ref(inode); fsnotify_put_sb_connectors(conn); kmem_cache_free(fsnotify_mark_connector_cachep, conn); } @@ -690,8 +727,7 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark, if (ret) goto err;
- if (mark->mask || mark->ignored_mask) - fsnotify_recalc_mask(mark->connector); + fsnotify_recalc_mask(mark->connector);
return ret; err: diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index d62111e832440..9a1a9e78f69f5 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -456,6 +456,7 @@ struct fsnotify_mark_connector { spinlock_t lock; unsigned short type; /* Type of object [lock] */ #define FSNOTIFY_CONN_FLAG_HAS_FSID 0x01 +#define FSNOTIFY_CONN_FLAG_HAS_IREF 0x02 unsigned short flags; /* flags [lock] */ __kernel_fsid_t fsid; /* fsid of filesystem containing object */ union { @@ -510,6 +511,7 @@ struct fsnotify_mark { #define FSNOTIFY_MARK_FLAG_IN_ONESHOT 0x0020 /* fanotify mark flags */ #define FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY 0x0100 +#define FSNOTIFY_MARK_FLAG_NO_IREF 0x0200 unsigned int flags; /* flags [mark->lock] */ };
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit 4adce25ccfff215939ee465b8c0aa70526d5c352 ]
To translate from fsnotify mark flags to user visible flags.
Link: https://lore.kernel.org/r/20220422120327.3459282-13-amir73il@gmail.com Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.h | 10 ++++++++++ fs/notify/fdinfo.c | 6 ++---- 2 files changed, 12 insertions(+), 4 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index a3d5b751cac5b..87142bc0131a4 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -490,3 +490,13 @@ static inline unsigned int fanotify_event_hash_bucket( { return event->hash & FANOTIFY_HTABLE_MASK; } + +static inline unsigned int fanotify_mark_user_flags(struct fsnotify_mark *mark) +{ + unsigned int mflags = 0; + + if (mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY) + mflags |= FAN_MARK_IGNORED_SURV_MODIFY; + + return mflags; +} diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c index 1f34c5c29fdbd..59fb40abe33d3 100644 --- a/fs/notify/fdinfo.c +++ b/fs/notify/fdinfo.c @@ -14,6 +14,7 @@ #include <linux/exportfs.h>
#include "inotify/inotify.h" +#include "fanotify/fanotify.h" #include "fdinfo.h" #include "fsnotify.h"
@@ -103,12 +104,9 @@ void inotify_show_fdinfo(struct seq_file *m, struct file *f)
static void fanotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark) { - unsigned int mflags = 0; + unsigned int mflags = fanotify_mark_user_flags(mark); struct inode *inode;
- if (mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY) - mflags |= FAN_MARK_IGNORED_SURV_MODIFY; - if (mark->connector->type == FSNOTIFY_OBJ_TYPE_INODE) { inode = igrab(fsnotify_conn_inode(mark->connector)); if (!inode)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit 8998d110835e3781ccd3f1ae061a590b4aaba911 ]
Handle FAN_MARK_IGNORED_SURV_MODIFY flag change in a helper that is called after updating the mark mask.
Replace the added and removed return values and help variables with bool recalc return values and help variable, which makes the code a bit easier to follow.
Rename flags argument to fan_flags to emphasize the difference from mark->flags.
Link: https://lore.kernel.org/r/20220422120327.3459282-14-amir73il@gmail.com Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify_user.c | 47 ++++++++++++++++-------------- 1 file changed, 25 insertions(+), 22 deletions(-)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 731bd7f64e018..f0206e3d11a75 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1069,42 +1069,45 @@ static int fanotify_remove_inode_mark(struct fsnotify_group *group, flags, umask); }
-static void fanotify_mark_add_ignored_mask(struct fsnotify_mark *fsn_mark, - __u32 mask, unsigned int flags, - __u32 *removed) +static bool fanotify_mark_update_flags(struct fsnotify_mark *fsn_mark, + unsigned int fan_flags) { - fsn_mark->ignored_mask |= mask; + bool recalc = false;
/* * Setting FAN_MARK_IGNORED_SURV_MODIFY for the first time may lead to * the removal of the FS_MODIFY bit in calculated mask if it was set * because of an ignored mask that is now going to survive FS_MODIFY. */ - if ((flags & FAN_MARK_IGNORED_SURV_MODIFY) && + if ((fan_flags & FAN_MARK_IGNORED_MASK) && + (fan_flags & FAN_MARK_IGNORED_SURV_MODIFY) && !(fsn_mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY)) { fsn_mark->flags |= FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY; if (!(fsn_mark->mask & FS_MODIFY)) - *removed = FS_MODIFY; + recalc = true; } + + return recalc; }
-static __u32 fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark, - __u32 mask, unsigned int flags, - __u32 *removed) +static bool fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark, + __u32 mask, unsigned int fan_flags) { - __u32 oldmask, newmask; + bool recalc;
spin_lock(&fsn_mark->lock); - oldmask = fsnotify_calc_mask(fsn_mark); - if (!(flags & FAN_MARK_IGNORED_MASK)) { + if (!(fan_flags & FAN_MARK_IGNORED_MASK)) fsn_mark->mask |= mask; - } else { - fanotify_mark_add_ignored_mask(fsn_mark, mask, flags, removed); - } - newmask = fsnotify_calc_mask(fsn_mark); + else + fsn_mark->ignored_mask |= mask; + + recalc = fsnotify_calc_mask(fsn_mark) & + ~fsnotify_conn_mask(fsn_mark->connector); + + recalc |= fanotify_mark_update_flags(fsn_mark, fan_flags); spin_unlock(&fsn_mark->lock);
- return newmask & ~oldmask; + return recalc; }
static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group, @@ -1158,11 +1161,11 @@ static int fanotify_group_init_error_pool(struct fsnotify_group *group)
static int fanotify_add_mark(struct fsnotify_group *group, fsnotify_connp_t *connp, unsigned int obj_type, - __u32 mask, unsigned int flags, + __u32 mask, unsigned int fan_flags, __kernel_fsid_t *fsid) { struct fsnotify_mark *fsn_mark; - __u32 added, removed = 0; + bool recalc; int ret = 0;
mutex_lock(&group->mark_mutex); @@ -1179,14 +1182,14 @@ static int fanotify_add_mark(struct fsnotify_group *group, * Error events are pre-allocated per group, only if strictly * needed (i.e. FAN_FS_ERROR was requested). */ - if (!(flags & FAN_MARK_IGNORED_MASK) && (mask & FAN_FS_ERROR)) { + if (!(fan_flags & FAN_MARK_IGNORED_MASK) && (mask & FAN_FS_ERROR)) { ret = fanotify_group_init_error_pool(group); if (ret) goto out; }
- added = fanotify_mark_add_to_mask(fsn_mark, mask, flags, &removed); - if (removed || (added & ~fsnotify_conn_mask(fsn_mark->connector))) + recalc = fanotify_mark_add_to_mask(fsn_mark, mask, fan_flags); + if (recalc) fsnotify_recalc_mask(fsn_mark->connector);
out:
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit 7d5e005d982527e4029b0139823d179986e34cdc ]
When an inode mark is created with flag FAN_MARK_EVICTABLE, it will not pin the marked inode to inode cache, so when inode is evicted from cache due to memory pressure, the mark will be lost.
When an inode mark with flag FAN_MARK_EVICATBLE is updated without using this flag, the marked inode is pinned to inode cache.
When an inode mark is updated with flag FAN_MARK_EVICTABLE but an existing mark already has the inode pinned, the mark update fails with error EEXIST.
Evictable inode marks can be used to setup inode marks with ignored mask to suppress events from uninteresting files or directories in a lazy manner, upon receiving the first event, without having to iterate all the uninteresting files or directories before hand.
The evictbale inode mark feature allows performing this lazy marks setup without exhausting the system memory with pinned inodes.
This change does not enable the feature yet.
Link: https://lore.kernel.org/linux-fsdevel/CAOQ4uxiRDpuS=2uA6+ZUM7yG9vVU-u212tkun... Link: https://lore.kernel.org/r/20220422120327.3459282-15-amir73il@gmail.com Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.h | 2 ++ fs/notify/fanotify/fanotify_user.c | 38 ++++++++++++++++++++++++++++-- include/uapi/linux/fanotify.h | 1 + 3 files changed, 39 insertions(+), 2 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index 87142bc0131a4..80e0ec95b1131 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -497,6 +497,8 @@ static inline unsigned int fanotify_mark_user_flags(struct fsnotify_mark *mark)
if (mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY) mflags |= FAN_MARK_IGNORED_SURV_MODIFY; + if (mark->flags & FSNOTIFY_MARK_FLAG_NO_IREF) + mflags |= FAN_MARK_EVICTABLE;
return mflags; } diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index f0206e3d11a75..ab7a13686b49d 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1072,6 +1072,7 @@ static int fanotify_remove_inode_mark(struct fsnotify_group *group, static bool fanotify_mark_update_flags(struct fsnotify_mark *fsn_mark, unsigned int fan_flags) { + bool want_iref = !(fan_flags & FAN_MARK_EVICTABLE); bool recalc = false;
/* @@ -1087,7 +1088,18 @@ static bool fanotify_mark_update_flags(struct fsnotify_mark *fsn_mark, recalc = true; }
- return recalc; + if (fsn_mark->connector->type != FSNOTIFY_OBJ_TYPE_INODE || + want_iref == !(fsn_mark->flags & FSNOTIFY_MARK_FLAG_NO_IREF)) + return recalc; + + /* + * NO_IREF may be removed from a mark, but not added. + * When removed, fsnotify_recalc_mask() will take the inode ref. + */ + WARN_ON_ONCE(!want_iref); + fsn_mark->flags &= ~FSNOTIFY_MARK_FLAG_NO_IREF; + + return true; }
static bool fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark, @@ -1113,6 +1125,7 @@ static bool fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark, static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group, fsnotify_connp_t *connp, unsigned int obj_type, + unsigned int fan_flags, __kernel_fsid_t *fsid) { struct ucounts *ucounts = group->fanotify_data.ucounts; @@ -1135,6 +1148,9 @@ static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group, }
fsnotify_init_mark(mark, group); + if (fan_flags & FAN_MARK_EVICTABLE) + mark->flags |= FSNOTIFY_MARK_FLAG_NO_IREF; + ret = fsnotify_add_mark_locked(mark, connp, obj_type, 0, fsid); if (ret) { fsnotify_put_mark(mark); @@ -1171,13 +1187,23 @@ static int fanotify_add_mark(struct fsnotify_group *group, mutex_lock(&group->mark_mutex); fsn_mark = fsnotify_find_mark(connp, group); if (!fsn_mark) { - fsn_mark = fanotify_add_new_mark(group, connp, obj_type, fsid); + fsn_mark = fanotify_add_new_mark(group, connp, obj_type, + fan_flags, fsid); if (IS_ERR(fsn_mark)) { mutex_unlock(&group->mark_mutex); return PTR_ERR(fsn_mark); } }
+ /* + * Non evictable mark cannot be downgraded to evictable mark. + */ + if (fan_flags & FAN_MARK_EVICTABLE && + !(fsn_mark->flags & FSNOTIFY_MARK_FLAG_NO_IREF)) { + ret = -EEXIST; + goto out; + } + /* * Error events are pre-allocated per group, only if strictly * needed (i.e. FAN_FS_ERROR was requested). @@ -1607,6 +1633,14 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, mark_type != FAN_MARK_FILESYSTEM) goto fput_and_out;
+ /* + * Evictable is only relevant for inode marks, because only inode object + * can be evicted on memory pressure. + */ + if (flags & FAN_MARK_EVICTABLE && + mark_type != FAN_MARK_INODE) + goto fput_and_out; + /* * Events that do not carry enough information to report * event->fd require a group that supports reporting fid. Those diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h index e8ac38cc2fd6d..f1f89132d60e2 100644 --- a/include/uapi/linux/fanotify.h +++ b/include/uapi/linux/fanotify.h @@ -82,6 +82,7 @@ #define FAN_MARK_IGNORED_SURV_MODIFY 0x00000040 #define FAN_MARK_FLUSH 0x00000080 /* FAN_MARK_FILESYSTEM is 0x00000100 */ +#define FAN_MARK_EVICTABLE 0x00000200
/* These are NOT bitwise flags. Both bits can be used togther. */ #define FAN_MARK_INODE 0x00000000
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit e79719a2ca5c61912c0493bc1367db52759cf6fd ]
Direct reclaim from fanotify mark allocation context may try to evict inodes with evictable marks of the same group and hit this deadlock:
[<0>] fsnotify_destroy_mark+0x1f/0x3a [<0>] fsnotify_destroy_marks+0x71/0xd9 [<0>] __destroy_inode+0x24/0x7e [<0>] destroy_inode+0x2c/0x67 [<0>] dispose_list+0x49/0x68 [<0>] prune_icache_sb+0x5b/0x79 [<0>] super_cache_scan+0x11c/0x16f [<0>] shrink_slab.constprop.0+0x23e/0x40f [<0>] shrink_node+0x218/0x3e7 [<0>] do_try_to_free_pages+0x12a/0x2d2 [<0>] try_to_free_pages+0x166/0x242 [<0>] __alloc_pages_slowpath.constprop.0+0x30c/0x903 [<0>] __alloc_pages+0xeb/0x1c7 [<0>] cache_grow_begin+0x6f/0x31e [<0>] fallback_alloc+0xe0/0x12d [<0>] ____cache_alloc_node+0x15a/0x17e [<0>] kmem_cache_alloc_trace+0xa1/0x143 [<0>] fanotify_add_mark+0xd5/0x2b2 [<0>] do_fanotify_mark+0x566/0x5eb [<0>] __x64_sys_fanotify_mark+0x21/0x24 [<0>] do_syscall_64+0x6d/0x80 [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae
Set the FSNOTIFY_GROUP_NOFS flag to prevent going into direct reclaim from allocations under fanotify group lock and use the safe group lock helpers.
Link: https://lore.kernel.org/r/20220422120327.3459282-16-amir73il@gmail.com Suggested-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20220321112310.vpr7oxro2xkz5llh@quack3.lan/ Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify_user.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index ab7a13686b49d..ad520a2796181 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1023,10 +1023,10 @@ static int fanotify_remove_mark(struct fsnotify_group *group, __u32 removed; int destroy_mark;
- mutex_lock(&group->mark_mutex); + fsnotify_group_lock(group); fsn_mark = fsnotify_find_mark(connp, group); if (!fsn_mark) { - mutex_unlock(&group->mark_mutex); + fsnotify_group_unlock(group); return -ENOENT; }
@@ -1036,7 +1036,7 @@ static int fanotify_remove_mark(struct fsnotify_group *group, fsnotify_recalc_mask(fsn_mark->connector); if (destroy_mark) fsnotify_detach_mark(fsn_mark); - mutex_unlock(&group->mark_mutex); + fsnotify_group_unlock(group); if (destroy_mark) fsnotify_free_mark(fsn_mark);
@@ -1184,13 +1184,13 @@ static int fanotify_add_mark(struct fsnotify_group *group, bool recalc; int ret = 0;
- mutex_lock(&group->mark_mutex); + fsnotify_group_lock(group); fsn_mark = fsnotify_find_mark(connp, group); if (!fsn_mark) { fsn_mark = fanotify_add_new_mark(group, connp, obj_type, fan_flags, fsid); if (IS_ERR(fsn_mark)) { - mutex_unlock(&group->mark_mutex); + fsnotify_group_unlock(group); return PTR_ERR(fsn_mark); } } @@ -1219,7 +1219,7 @@ static int fanotify_add_mark(struct fsnotify_group *group, fsnotify_recalc_mask(fsn_mark->connector);
out: - mutex_unlock(&group->mark_mutex); + fsnotify_group_unlock(group);
fsnotify_put_mark(fsn_mark); return ret; @@ -1373,7 +1373,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
/* fsnotify_alloc_group takes a ref. Dropped in fanotify_release */ group = fsnotify_alloc_group(&fanotify_fsnotify_ops, - FSNOTIFY_GROUP_USER); + FSNOTIFY_GROUP_USER | FSNOTIFY_GROUP_NOFS); if (IS_ERR(group)) { return PTR_ERR(group); }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit 5f9d3bd520261fd7a850818c71809fd580e0f30c ]
Now that the direct reclaim path is handled we can enable evictable inode marks.
Link: https://lore.kernel.org/r/20220422120327.3459282-17-amir73il@gmail.com Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify_user.c | 2 +- include/linux/fanotify.h | 1 + 2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index ad520a2796181..3a1325c90ff86 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1806,7 +1806,7 @@ static int __init fanotify_user_setup(void)
BUILD_BUG_ON(FANOTIFY_INIT_FLAGS & FANOTIFY_INTERNAL_GROUP_FLAGS); BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 12); - BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 9); + BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 10);
fanotify_mark_cache = KMEM_CACHE(fsnotify_mark, SLAB_PANIC|SLAB_ACCOUNT); diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h index 3afdf339d53c9..81f45061c1b18 100644 --- a/include/linux/fanotify.h +++ b/include/linux/fanotify.h @@ -68,6 +68,7 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */ FAN_MARK_ONLYDIR | \ FAN_MARK_IGNORED_MASK | \ FAN_MARK_IGNORED_SURV_MODIFY | \ + FAN_MARK_EVICTABLE | \ FAN_MARK_FLUSH)
/*
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit 14362a2541797cf9df0e86fb12dcd7950baf566e ]
fsnotify_foreach_iter_mark_type() is used to reduce boilerplate code of iterating all marks of a specific group interested in an event by consulting the iterator report_mask.
Use an open coded version of that iterator in fsnotify_iter_next() that collects all marks of the current iteration group without consulting the iterator report_mask.
At the moment, the two iterator variants are the same, but this decoupling will allow us to exclude some of the group's marks from reporting the event, for example for event on child and inode marks on parent did not request to watch events on children.
Fixes: 2f02fd3fa13e ("fanotify: fix ignore mask logic for events on child and on dir") Reported-by: Jan Kara jack@suse.com Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20220511190213.831646-2-amir73il@gmail.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.c | 14 +++------ fs/notify/fsnotify.c | 53 ++++++++++++++++---------------- include/linux/fsnotify_backend.h | 31 ++++++++++++++----- 3 files changed, 54 insertions(+), 44 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 985e995d2a398..263d303d8f8f1 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -319,11 +319,7 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group, return 0; }
- fsnotify_foreach_iter_type(type) { - if (!fsnotify_iter_should_report_type(iter_info, type)) - continue; - mark = iter_info->marks[type]; - + fsnotify_foreach_iter_mark_type(iter_info, mark, type) { /* Apply ignore mask regardless of ISDIR and ON_CHILD flags */ marks_ignored_mask |= mark->ignored_mask;
@@ -849,16 +845,14 @@ static struct fanotify_event *fanotify_alloc_event( */ static __kernel_fsid_t fanotify_get_fsid(struct fsnotify_iter_info *iter_info) { + struct fsnotify_mark *mark; int type; __kernel_fsid_t fsid = {};
- fsnotify_foreach_iter_type(type) { + fsnotify_foreach_iter_mark_type(iter_info, mark, type) { struct fsnotify_mark_connector *conn;
- if (!fsnotify_iter_should_report_type(iter_info, type)) - continue; - - conn = READ_ONCE(iter_info->marks[type]->connector); + conn = READ_ONCE(mark->connector); /* Mark is just getting destroyed or created? */ if (!conn) continue; diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index 6eee19d15e8cd..35740a64ee453 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -335,31 +335,23 @@ static int send_to_group(__u32 mask, const void *data, int data_type, struct fsnotify_mark *mark; int type;
- if (WARN_ON(!iter_info->report_mask)) + if (!iter_info->report_mask) return 0;
/* clear ignored on inode modification */ if (mask & FS_MODIFY) { - fsnotify_foreach_iter_type(type) { - if (!fsnotify_iter_should_report_type(iter_info, type)) - continue; - mark = iter_info->marks[type]; - if (mark && - !(mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY)) + fsnotify_foreach_iter_mark_type(iter_info, mark, type) { + if (!(mark->flags & + FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY)) mark->ignored_mask = 0; } }
- fsnotify_foreach_iter_type(type) { - if (!fsnotify_iter_should_report_type(iter_info, type)) - continue; - mark = iter_info->marks[type]; - /* does the object mark tell us to do something? */ - if (mark) { - group = mark->group; - marks_mask |= mark->mask; - marks_ignored_mask |= mark->ignored_mask; - } + /* Are any of the group marks interested in this event? */ + fsnotify_foreach_iter_mark_type(iter_info, mark, type) { + group = mark->group; + marks_mask |= mark->mask; + marks_ignored_mask |= mark->ignored_mask; }
pr_debug("%s: group=%p mask=%x marks_mask=%x marks_ignored_mask=%x data=%p data_type=%d dir=%p cookie=%d\n", @@ -403,11 +395,11 @@ static struct fsnotify_mark *fsnotify_next_mark(struct fsnotify_mark *mark)
/* * iter_info is a multi head priority queue of marks. - * Pick a subset of marks from queue heads, all with the - * same group and set the report_mask for selected subset. - * Returns the report_mask of the selected subset. + * Pick a subset of marks from queue heads, all with the same group + * and set the report_mask to a subset of the selected marks. + * Returns false if there are no more groups to iterate. */ -static unsigned int fsnotify_iter_select_report_types( +static bool fsnotify_iter_select_report_types( struct fsnotify_iter_info *iter_info) { struct fsnotify_group *max_prio_group = NULL; @@ -423,30 +415,37 @@ static unsigned int fsnotify_iter_select_report_types( }
if (!max_prio_group) - return 0; + return false;
/* Set the report mask for marks from same group as max prio group */ + iter_info->current_group = max_prio_group; iter_info->report_mask = 0; fsnotify_foreach_iter_type(type) { mark = iter_info->marks[type]; - if (mark && - fsnotify_compare_groups(max_prio_group, mark->group) == 0) + if (mark && mark->group == iter_info->current_group) fsnotify_iter_set_report_type(iter_info, type); }
- return iter_info->report_mask; + return true; }
/* - * Pop from iter_info multi head queue, the marks that were iterated in the + * Pop from iter_info multi head queue, the marks that belong to the group of * current iteration step. */ static void fsnotify_iter_next(struct fsnotify_iter_info *iter_info) { + struct fsnotify_mark *mark; int type;
+ /* + * We cannot use fsnotify_foreach_iter_mark_type() here because we + * may need to advance a mark of type X that belongs to current_group + * but was not selected for reporting. + */ fsnotify_foreach_iter_type(type) { - if (fsnotify_iter_should_report_type(iter_info, type)) + mark = iter_info->marks[type]; + if (mark && mark->group == iter_info->current_group) iter_info->marks[type] = fsnotify_next_mark(iter_info->marks[type]); } diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 9a1a9e78f69f5..9560734759fa6 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -399,6 +399,7 @@ static inline bool fsnotify_valid_obj_type(unsigned int obj_type)
struct fsnotify_iter_info { struct fsnotify_mark *marks[FSNOTIFY_ITER_TYPE_COUNT]; + struct fsnotify_group *current_group; unsigned int report_mask; int srcu_idx; }; @@ -415,20 +416,31 @@ static inline void fsnotify_iter_set_report_type( iter_info->report_mask |= (1U << iter_type); }
-static inline void fsnotify_iter_set_report_type_mark( - struct fsnotify_iter_info *iter_info, int iter_type, - struct fsnotify_mark *mark) +static inline struct fsnotify_mark *fsnotify_iter_mark( + struct fsnotify_iter_info *iter_info, int iter_type) { - iter_info->marks[iter_type] = mark; - iter_info->report_mask |= (1U << iter_type); + if (fsnotify_iter_should_report_type(iter_info, iter_type)) + return iter_info->marks[iter_type]; + return NULL; +} + +static inline int fsnotify_iter_step(struct fsnotify_iter_info *iter, int type, + struct fsnotify_mark **markp) +{ + while (type < FSNOTIFY_ITER_TYPE_COUNT) { + *markp = fsnotify_iter_mark(iter, type); + if (*markp) + break; + type++; + } + return type; }
#define FSNOTIFY_ITER_FUNCS(name, NAME) \ static inline struct fsnotify_mark *fsnotify_iter_##name##_mark( \ struct fsnotify_iter_info *iter_info) \ { \ - return (iter_info->report_mask & (1U << FSNOTIFY_ITER_TYPE_##NAME)) ? \ - iter_info->marks[FSNOTIFY_ITER_TYPE_##NAME] : NULL; \ + return fsnotify_iter_mark(iter_info, FSNOTIFY_ITER_TYPE_##NAME); \ }
FSNOTIFY_ITER_FUNCS(inode, INODE) @@ -438,6 +450,11 @@ FSNOTIFY_ITER_FUNCS(sb, SB)
#define fsnotify_foreach_iter_type(type) \ for (type = 0; type < FSNOTIFY_ITER_TYPE_COUNT; type++) +#define fsnotify_foreach_iter_mark_type(iter, mark, type) \ + for (type = 0; \ + type = fsnotify_iter_step(iter, type, &mark), \ + type < FSNOTIFY_ITER_TYPE_COUNT; \ + type++)
/* * fsnotify_connp_t is what we embed in objects which connector can be attached
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit e730558adffb88a52e562db089e969ee9510184a ]
The logic for handling events on child in groups that have a mark on the parent inode, but without FS_EVENT_ON_CHILD flag in the mask is duplicated in several places and inconsistent.
Move the logic into the preparation of mark type iterator, so that the parent mark type will be excluded from all mark type iterations in that case.
This results in several subtle changes of behavior, hopefully all desired changes of behavior, for example:
- Group A has a mount mark with FS_MODIFY in mask - Group A has a mark with ignore mask that does not survive FS_MODIFY and does not watch children on directory D. - Group B has a mark with FS_MODIFY in mask that does watch children on directory D. - FS_MODIFY event on file D/foo should not clear the ignore mask of group A, but before this change it does
And if group A ignore mask was set to survive FS_MODIFY: - FS_MODIFY event on file D/foo should be reported to group A on account of the mount mark, but before this change it is wrongly ignored
Fixes: 2f02fd3fa13e ("fanotify: fix ignore mask logic for events on child and on dir") Reported-by: Jan Kara jack@suse.com Link: https://lore.kernel.org/linux-fsdevel/20220314113337.j7slrb5srxukztje@quack3... Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20220511190213.831646-3-amir73il@gmail.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.c | 10 +--------- fs/notify/fsnotify.c | 34 +++++++++++++++++++--------------- 2 files changed, 20 insertions(+), 24 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 263d303d8f8f1..4f897e1095470 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -320,7 +320,7 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group, }
fsnotify_foreach_iter_mark_type(iter_info, mark, type) { - /* Apply ignore mask regardless of ISDIR and ON_CHILD flags */ + /* Apply ignore mask regardless of mark's ISDIR flag */ marks_ignored_mask |= mark->ignored_mask;
/* @@ -330,14 +330,6 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group, if (event_mask & FS_ISDIR && !(mark->mask & FS_ISDIR)) continue;
- /* - * If the event is on a child and this mark is on a parent not - * watching children, don't send it! - */ - if (type == FSNOTIFY_ITER_TYPE_PARENT && - !(mark->mask & FS_EVENT_ON_CHILD)) - continue; - marks_mask |= mark->mask;
/* Record the mark types of this group that matched the event */ diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index 35740a64ee453..0b3e74935cb4f 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -290,22 +290,15 @@ static int fsnotify_handle_event(struct fsnotify_group *group, __u32 mask, }
if (parent_mark) { - /* - * parent_mark indicates that the parent inode is watching - * children and interested in this event, which is an event - * possible on child. But is *this mark* watching children and - * interested in this event? - */ - if (parent_mark->mask & FS_EVENT_ON_CHILD) { - ret = fsnotify_handle_inode_event(group, parent_mark, mask, - data, data_type, dir, name, 0); - if (ret) - return ret; - } - if (!inode_mark) - return 0; + ret = fsnotify_handle_inode_event(group, parent_mark, mask, + data, data_type, dir, name, 0); + if (ret) + return ret; }
+ if (!inode_mark) + return 0; + if (mask & FS_EVENT_ON_CHILD) { /* * Some events can be sent on both parent dir and child marks @@ -422,8 +415,19 @@ static bool fsnotify_iter_select_report_types( iter_info->report_mask = 0; fsnotify_foreach_iter_type(type) { mark = iter_info->marks[type]; - if (mark && mark->group == iter_info->current_group) + if (mark && mark->group == iter_info->current_group) { + /* + * FSNOTIFY_ITER_TYPE_PARENT indicates that this inode + * is watching children and interested in this event, + * which is an event possible on child. + * But is *this mark* watching children? + */ + if (type == FSNOTIFY_ITER_TYPE_PARENT && + !(mark->mask & FS_EVENT_ON_CHILD)) + continue; + fsnotify_iter_set_report_type(iter_info, type); + } }
return true;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vasily Averin vvs@openvz.org
[ Upstream commit dccd855771b37820b6d976a99729c88259549f85 ]
Fixes sparce warnings: fs/notify/fanotify/fanotify_user.c:267:63: sparse: warning: restricted fmode_t degrades to integer fs/notify/fanotify/fanotify_user.c:1351:28: sparse: warning: restricted fmode_t degrades to integer
FMODE_NONTIFY have bitwise fmode_t type and requires __force attribute for any casts.
Signed-off-by: Vasily Averin vvs@openvz.org Reviewed-by: Christian Brauner (Microsoft) brauner@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/9adfd6ac-1b89-791e-796b-49ada3293985@openvz.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify_user.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 3a1325c90ff86..f4a5e9074dd42 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -252,7 +252,7 @@ static int create_fd(struct fsnotify_group *group, struct path *path, * originally opened O_WRONLY. */ new_file = dentry_open(path, - group->fanotify_data.f_flags | FMODE_NONOTIFY, + group->fanotify_data.f_flags | __FMODE_NONOTIFY, current_cred()); if (IS_ERR(new_file)) { /* @@ -1365,7 +1365,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) (!(fid_mode & FAN_REPORT_NAME) || !(fid_mode & FAN_REPORT_FID))) return -EINVAL;
- f_flags = O_RDWR | FMODE_NONOTIFY; + f_flags = O_RDWR | __FMODE_NONOTIFY; if (flags & FAN_CLOEXEC) f_flags |= O_CLOEXEC; if (flags & FAN_NONBLOCK)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 91e23b1c39820bfed642119ff6b6ef9f43cf09ce ]
nfsd_splice_actor() checks that the page being spliced does not match the previous element in the svc_rqst::rq_pages array. We believe this is to prevent a double put_page() in cases where the READ payload is partially contained in the xdr_buf's head buffer.
However, the NFSD READ proc functions no longer place any part of the READ payload in the head buffer, in order to properly support NFS/RDMA READ with Write chunks. Therefore, simplify the logic in nfsd_splice_actor() to remove this unnecessary check.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/vfs.c | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 86584e727ce09..0968eaf735c85 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -874,17 +874,11 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf, struct splice_desc *sd) { struct svc_rqst *rqstp = sd->u.data; - struct page **pp = rqstp->rq_next_page; - struct page *page = buf->page;
- if (rqstp->rq_res.page_len == 0) { - svc_rqst_replace_page(rqstp, page); + svc_rqst_replace_page(rqstp, buf->page); + if (rqstp->rq_res.page_len == 0) rqstp->rq_res.page_base = buf->offset; - } else if (page != pp[-1]) { - svc_rqst_replace_page(rqstp, page); - } rqstp->rq_res.page_len += sd->len; - return sd->len; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dai Ngo dai.ngo@oracle.com
[ Upstream commit 66af25799940b26efd41ea6e648f75c41a48a2c2 ]
This patch provides courteous server support for delegation only. Only expired client with delegation but no conflict and no open or lock state is allowed to be in COURTESY state.
Delegation conflict with COURTESY/EXPIRABLE client is resolved by setting it to EXPIRABLE, queue work for the laundromat and return delay to the caller. Conflict is resolved when the laudromat runs and expires the EXIRABLE client while the NFS client retries the OPEN request. Local thread request that gets conflict is doing the retry in _break_lease.
Client in COURTESY or EXPIRABLE state is allowed to reconnect and continues to have access to its state. Access to the nfs4_client by the reconnecting thread and the laundromat is serialized via the client_lock.
Reviewed-by: J. Bruce Fields bfields@fieldses.org Signed-off-by: Dai Ngo dai.ngo@oracle.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 82 ++++++++++++++++++++++++++++++++++++--------- fs/nfsd/nfsd.h | 1 + fs/nfsd/state.h | 31 +++++++++++++++++ 3 files changed, 99 insertions(+), 15 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 0170aaf318ea2..74c0a88904047 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -125,6 +125,8 @@ static void free_session(struct nfsd4_session *); static const struct nfsd4_callback_ops nfsd4_cb_recall_ops; static const struct nfsd4_callback_ops nfsd4_cb_notify_lock_ops;
+static struct workqueue_struct *laundry_wq; + static bool is_session_dead(struct nfsd4_session *ses) { return ses->se_flags & NFS4_SESSION_DEAD; @@ -152,6 +154,7 @@ static __be32 get_client_locked(struct nfs4_client *clp) if (is_client_expired(clp)) return nfserr_expired; atomic_inc(&clp->cl_rpc_users); + clp->cl_state = NFSD4_ACTIVE; return nfs_ok; }
@@ -172,6 +175,7 @@ renew_client_locked(struct nfs4_client *clp)
list_move_tail(&clp->cl_lru, &nn->client_lru); clp->cl_time = ktime_get_boottime_seconds(); + clp->cl_state = NFSD4_ACTIVE; }
static void put_client_renew_locked(struct nfs4_client *clp) @@ -1102,6 +1106,7 @@ alloc_init_deleg(struct nfs4_client *clp, struct nfs4_file *fp, get_clnt_odstate(odstate); dp->dl_type = NFS4_OPEN_DELEGATE_READ; dp->dl_retries = 1; + dp->dl_recalled = false; nfsd4_init_cb(&dp->dl_recall, dp->dl_stid.sc_client, &nfsd4_cb_recall_ops, NFSPROC4_CLNT_CB_RECALL); get_nfs4_file(fp); @@ -2017,6 +2022,8 @@ static struct nfs4_client *alloc_client(struct xdr_netobj name) idr_init(&clp->cl_stateids); atomic_set(&clp->cl_rpc_users, 0); clp->cl_cb_state = NFSD4_CB_UNKNOWN; + clp->cl_state = NFSD4_ACTIVE; + atomic_set(&clp->cl_delegs_in_recall, 0); INIT_LIST_HEAD(&clp->cl_idhash); INIT_LIST_HEAD(&clp->cl_openowners); INIT_LIST_HEAD(&clp->cl_delegations); @@ -4711,9 +4718,18 @@ nfsd_break_deleg_cb(struct file_lock *fl) bool ret = false; struct nfs4_delegation *dp = (struct nfs4_delegation *)fl->fl_owner; struct nfs4_file *fp = dp->dl_stid.sc_file; + struct nfs4_client *clp = dp->dl_stid.sc_client; + struct nfsd_net *nn;
trace_nfsd_cb_recall(&dp->dl_stid);
+ dp->dl_recalled = true; + atomic_inc(&clp->cl_delegs_in_recall); + if (try_to_expire_client(clp)) { + nn = net_generic(clp->net, nfsd_net_id); + mod_delayed_work(laundry_wq, &nn->laundromat_work, 0); + } + /* * We don't want the locks code to timeout the lease for us; * we'll remove it ourself if a delegation isn't returned @@ -4756,9 +4772,14 @@ static int nfsd_change_deleg_cb(struct file_lock *onlist, int arg, struct list_head *dispose) { - if (arg & F_UNLCK) + struct nfs4_delegation *dp = (struct nfs4_delegation *)onlist->fl_owner; + struct nfs4_client *clp = dp->dl_stid.sc_client; + + if (arg & F_UNLCK) { + if (dp->dl_recalled) + atomic_dec(&clp->cl_delegs_in_recall); return lease_modify(onlist, arg, dispose); - else + } else return -EAGAIN; }
@@ -5622,6 +5643,49 @@ static void nfsd4_ssc_expire_umount(struct nfsd_net *nn) } #endif
+/* + * place holder for now, no check for lock blockers yet + */ +static bool +nfs4_anylock_blockers(struct nfs4_client *clp) +{ + if (atomic_read(&clp->cl_delegs_in_recall) || + client_has_openowners(clp) || + !list_empty(&clp->async_copies)) + return true; + return false; +} + +static void +nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, + struct laundry_time *lt) +{ + struct list_head *pos, *next; + struct nfs4_client *clp; + + INIT_LIST_HEAD(reaplist); + spin_lock(&nn->client_lock); + list_for_each_safe(pos, next, &nn->client_lru) { + clp = list_entry(pos, struct nfs4_client, cl_lru); + if (clp->cl_state == NFSD4_EXPIRABLE) + goto exp_client; + if (!state_expired(lt, clp->cl_time)) + break; + if (!atomic_read(&clp->cl_rpc_users)) + clp->cl_state = NFSD4_COURTESY; + if (!client_has_state(clp) || + ktime_get_boottime_seconds() >= + (clp->cl_time + NFSD_COURTESY_CLIENT_TIMEOUT)) + goto exp_client; + if (nfs4_anylock_blockers(clp)) { +exp_client: + if (!mark_client_expired_locked(clp)) + list_add(&clp->cl_lru, reaplist); + } + } + spin_unlock(&nn->client_lock); +} + static time64_t nfs4_laundromat(struct nfsd_net *nn) { @@ -5644,7 +5708,6 @@ nfs4_laundromat(struct nfsd_net *nn) goto out; } nfsd4_end_grace(nn); - INIT_LIST_HEAD(&reaplist);
spin_lock(&nn->s2s_cp_lock); idr_for_each_entry(&nn->s2s_cp_stateids, cps_t, i) { @@ -5654,17 +5717,7 @@ nfs4_laundromat(struct nfsd_net *nn) _free_cpntf_state_locked(nn, cps); } spin_unlock(&nn->s2s_cp_lock); - - spin_lock(&nn->client_lock); - list_for_each_safe(pos, next, &nn->client_lru) { - clp = list_entry(pos, struct nfs4_client, cl_lru); - if (!state_expired(<, clp->cl_time)) - break; - if (mark_client_expired_locked(clp)) - continue; - list_add(&clp->cl_lru, &reaplist); - } - spin_unlock(&nn->client_lock); + nfs4_get_client_reaplist(nn, &reaplist, <); list_for_each_safe(pos, next, &reaplist) { clp = list_entry(pos, struct nfs4_client, cl_lru); trace_nfsd_clid_purged(&clp->cl_clientid); @@ -5739,7 +5792,6 @@ nfs4_laundromat(struct nfsd_net *nn) return max_t(time64_t, lt.new_timeo, NFSD_LAUNDROMAT_MINTIMEOUT); }
-static struct workqueue_struct *laundry_wq; static void laundromat_main(struct work_struct *);
static void diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 4fc1fd639527a..23996c6ca75e3 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -336,6 +336,7 @@ void nfsd_lockd_shutdown(void); #define COMPOUND_ERR_SLACK_SPACE 16 /* OP_SETATTR */
#define NFSD_LAUNDROMAT_MINTIMEOUT 1 /* seconds */ +#define NFSD_COURTESY_CLIENT_TIMEOUT (24 * 60 * 60) /* seconds */
/* * The following attributes are currently not supported by the NFSv4 server: diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index 95457cfd37fc0..f3d6313914ed0 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -149,6 +149,7 @@ struct nfs4_delegation { /* For recall: */ int dl_retries; struct nfsd4_callback dl_recall; + bool dl_recalled; };
#define cb_to_delegation(cb) \ @@ -282,6 +283,28 @@ struct nfsd4_sessionid {
#define HEXDIR_LEN 33 /* hex version of 16 byte md5 of cl_name plus '\0' */
+/* + * State Meaning Where set + * -------------------------------------------------------------------------- + * | NFSD4_ACTIVE | Confirmed, active | Default | + * |------------------- ----------------------------------------------------| + * | NFSD4_COURTESY | Courtesy state. | nfs4_get_client_reaplist | + * | | Lease/lock/share | | + * | | reservation conflict | | + * | | can cause Courtesy | | + * | | client to be expired | | + * |------------------------------------------------------------------------| + * | NFSD4_EXPIRABLE | Courtesy client to be| nfs4_laundromat | + * | | expired by Laundromat| try_to_expire_client | + * | | due to conflict | | + * |------------------------------------------------------------------------| + */ +enum { + NFSD4_ACTIVE = 0, + NFSD4_COURTESY, + NFSD4_EXPIRABLE, +}; + /* * struct nfs4_client - one per client. Clientids live here. * @@ -385,6 +408,9 @@ struct nfs4_client { struct list_head async_copies; /* list of async copies */ spinlock_t async_lock; /* lock for async copies */ atomic_t cl_cb_inflight; /* Outstanding callbacks */ + + unsigned int cl_state; + atomic_t cl_delegs_in_recall; };
/* struct nfs4_client_reset @@ -702,4 +728,9 @@ extern void nfsd4_client_record_remove(struct nfs4_client *clp); extern int nfsd4_client_record_check(struct nfs4_client *clp); extern void nfsd4_record_grace_done(struct nfsd_net *nn);
+static inline bool try_to_expire_client(struct nfs4_client *clp) +{ + cmpxchg(&clp->cl_state, NFSD4_COURTESY, NFSD4_EXPIRABLE); + return clp->cl_state == NFSD4_EXPIRABLE; +} #endif /* NFSD4_STATE_H */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dai Ngo dai.ngo@oracle.com
[ Upstream commit 3d69427151806656abf129342028f3f4e5e1fee0 ]
This patch allows expired client with open state to be in COURTESY state. Share/access conflict with COURTESY client is resolved by setting COURTESY client to EXPIRABLE state, schedule laundromat to run and returning nfserr_jukebox to the request client.
Reviewed-by: J. Bruce Fields bfields@fieldses.org Signed-off-by: Dai Ngo dai.ngo@oracle.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 109 ++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 101 insertions(+), 8 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 74c0a88904047..4ba0a70d8990f 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -705,6 +705,57 @@ static unsigned int file_hashval(struct svc_fh *fh)
static struct hlist_head file_hashtbl[FILE_HASH_SIZE];
+/* + * Check if courtesy clients have conflicting access and resolve it if possible + * + * access: is op_share_access if share_access is true. + * Check if access mode, op_share_access, would conflict with + * the current deny mode of the file 'fp'. + * access: is op_share_deny if share_access is false. + * Check if the deny mode, op_share_deny, would conflict with + * current access of the file 'fp'. + * stp: skip checking this entry. + * new_stp: normal open, not open upgrade. + * + * Function returns: + * false - access/deny mode conflict with normal client. + * true - no conflict or conflict with courtesy client(s) is resolved. + */ +static bool +nfs4_resolve_deny_conflicts_locked(struct nfs4_file *fp, bool new_stp, + struct nfs4_ol_stateid *stp, u32 access, bool share_access) +{ + struct nfs4_ol_stateid *st; + bool resolvable = true; + unsigned char bmap; + struct nfsd_net *nn; + struct nfs4_client *clp; + + lockdep_assert_held(&fp->fi_lock); + list_for_each_entry(st, &fp->fi_stateids, st_perfile) { + /* ignore lock stateid */ + if (st->st_openstp) + continue; + if (st == stp && new_stp) + continue; + /* check file access against deny mode or vice versa */ + bmap = share_access ? st->st_deny_bmap : st->st_access_bmap; + if (!(access & bmap_to_share_mode(bmap))) + continue; + clp = st->st_stid.sc_client; + if (try_to_expire_client(clp)) + continue; + resolvable = false; + break; + } + if (resolvable) { + clp = stp->st_stid.sc_client; + nn = net_generic(clp->net, nfsd_net_id); + mod_delayed_work(laundry_wq, &nn->laundromat_work, 0); + } + return resolvable; +} + static void __nfs4_file_get_access(struct nfs4_file *fp, u32 access) { @@ -4985,7 +5036,7 @@ nfsd4_truncate(struct svc_rqst *rqstp, struct svc_fh *fh,
static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp, struct svc_fh *cur_fh, struct nfs4_ol_stateid *stp, - struct nfsd4_open *open) + struct nfsd4_open *open, bool new_stp) { struct nfsd_file *nf = NULL; __be32 status; @@ -5001,6 +5052,13 @@ static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp, */ status = nfs4_file_check_deny(fp, open->op_share_deny); if (status != nfs_ok) { + if (status != nfserr_share_denied) { + spin_unlock(&fp->fi_lock); + goto out; + } + if (nfs4_resolve_deny_conflicts_locked(fp, new_stp, + stp, open->op_share_deny, false)) + status = nfserr_jukebox; spin_unlock(&fp->fi_lock); goto out; } @@ -5008,6 +5066,13 @@ static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp, /* set access to the file */ status = nfs4_file_get_access(fp, open->op_share_access); if (status != nfs_ok) { + if (status != nfserr_share_denied) { + spin_unlock(&fp->fi_lock); + goto out; + } + if (nfs4_resolve_deny_conflicts_locked(fp, new_stp, + stp, open->op_share_access, true)) + status = nfserr_jukebox; spin_unlock(&fp->fi_lock); goto out; } @@ -5054,21 +5119,29 @@ static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp, }
static __be32 -nfs4_upgrade_open(struct svc_rqst *rqstp, struct nfs4_file *fp, struct svc_fh *cur_fh, struct nfs4_ol_stateid *stp, struct nfsd4_open *open) +nfs4_upgrade_open(struct svc_rqst *rqstp, struct nfs4_file *fp, + struct svc_fh *cur_fh, struct nfs4_ol_stateid *stp, + struct nfsd4_open *open) { __be32 status; unsigned char old_deny_bmap = stp->st_deny_bmap;
if (!test_access(open->op_share_access, stp)) - return nfs4_get_vfs_file(rqstp, fp, cur_fh, stp, open); + return nfs4_get_vfs_file(rqstp, fp, cur_fh, stp, open, false);
/* test and set deny mode */ spin_lock(&fp->fi_lock); status = nfs4_file_check_deny(fp, open->op_share_deny); if (status == nfs_ok) { - set_deny(open->op_share_deny, stp); - fp->fi_share_deny |= + if (status != nfserr_share_denied) { + set_deny(open->op_share_deny, stp); + fp->fi_share_deny |= (open->op_share_deny & NFS4_SHARE_DENY_BOTH); + } else { + if (nfs4_resolve_deny_conflicts_locked(fp, false, + stp, open->op_share_deny, false)) + status = nfserr_jukebox; + } } spin_unlock(&fp->fi_lock);
@@ -5409,7 +5482,7 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf goto out; } } else { - status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, open); + status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, open, true); if (status) { stp->st_stid.sc_type = NFS4_CLOSED_STID; release_open_stateid(stp); @@ -5643,6 +5716,26 @@ static void nfsd4_ssc_expire_umount(struct nfsd_net *nn) } #endif
+static bool +nfs4_has_any_locks(struct nfs4_client *clp) +{ + int i; + struct nfs4_stateowner *so; + + spin_lock(&clp->cl_lock); + for (i = 0; i < OWNER_HASH_SIZE; i++) { + list_for_each_entry(so, &clp->cl_ownerstr_hashtbl[i], + so_strhash) { + if (so->so_is_open_owner) + continue; + spin_unlock(&clp->cl_lock); + return true; + } + } + spin_unlock(&clp->cl_lock); + return false; +} + /* * place holder for now, no check for lock blockers yet */ @@ -5650,8 +5743,8 @@ static bool nfs4_anylock_blockers(struct nfs4_client *clp) { if (atomic_read(&clp->cl_delegs_in_recall) || - client_has_openowners(clp) || - !list_empty(&clp->async_copies)) + !list_empty(&clp->async_copies) || + nfs4_has_any_locks(clp)) return true; return false; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dai Ngo dai.ngo@oracle.com
[ Upstream commit d76cc46b37e123e8d245cc3490978dbda56f979d ]
This patch moves create/destroy of laundry_wq from nfs4_state_start and nfs4_state_shutdown_net to init_nfsd and exit_nfsd to prevent the laundromat from being freed while a thread is processing a conflicting lock.
Reviewed-by: J. Bruce Fields bfields@fieldses.org Signed-off-by: Dai Ngo dai.ngo@oracle.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 28 ++++++++++++++++------------ fs/nfsd/nfsctl.c | 4 ++++ fs/nfsd/nfsd.h | 4 ++++ 3 files changed, 24 insertions(+), 12 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 4ba0a70d8990f..30ea1c7b6b9fd 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -127,6 +127,21 @@ static const struct nfsd4_callback_ops nfsd4_cb_notify_lock_ops;
static struct workqueue_struct *laundry_wq;
+int nfsd4_create_laundry_wq(void) +{ + int rc = 0; + + laundry_wq = alloc_workqueue("%s", WQ_UNBOUND, 0, "nfsd4"); + if (laundry_wq == NULL) + rc = -ENOMEM; + return rc; +} + +void nfsd4_destroy_laundry_wq(void) +{ + destroy_workqueue(laundry_wq); +} + static bool is_session_dead(struct nfsd4_session *ses) { return ses->se_flags & NFS4_SESSION_DEAD; @@ -7775,22 +7790,12 @@ nfs4_state_start(void) { int ret;
- laundry_wq = alloc_workqueue("%s", WQ_UNBOUND, 0, "nfsd4"); - if (laundry_wq == NULL) { - ret = -ENOMEM; - goto out; - } ret = nfsd4_create_callback_queue(); if (ret) - goto out_free_laundry; + return ret;
set_max_delegations(); return 0; - -out_free_laundry: - destroy_workqueue(laundry_wq); -out: - return ret; }
void @@ -7827,7 +7832,6 @@ nfs4_state_shutdown_net(struct net *net) void nfs4_state_shutdown(void) { - destroy_workqueue(laundry_wq); nfsd4_destroy_callback_queue(); }
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 16920e4512bde..322a208878f2c 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -1542,6 +1542,9 @@ static int __init init_nfsd(void) if (retval < 0) goto out_free_filesystem; retval = register_cld_notifier(); + if (retval) + goto out_free_all; + retval = nfsd4_create_laundry_wq(); if (retval) goto out_free_all; return 0; @@ -1566,6 +1569,7 @@ static int __init init_nfsd(void)
static void __exit exit_nfsd(void) { + nfsd4_destroy_laundry_wq(); unregister_cld_notifier(); unregister_pernet_subsys(&nfsd_net_ops); nfsd_drc_slab_free(); diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 23996c6ca75e3..847b482155ae9 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -162,6 +162,8 @@ void nfs4_state_shutdown_net(struct net *net); int nfs4_reset_recoverydir(char *recdir); char * nfs4_recoverydir(void); bool nfsd4_spo_must_allow(struct svc_rqst *rqstp); +int nfsd4_create_laundry_wq(void); +void nfsd4_destroy_laundry_wq(void); #else static inline int nfsd4_init_slabs(void) { return 0; } static inline void nfsd4_free_slabs(void) { } @@ -175,6 +177,8 @@ static inline bool nfsd4_spo_must_allow(struct svc_rqst *rqstp) { return false; } +static inline int nfsd4_create_laundry_wq(void) { return 0; }; +static inline void nfsd4_destroy_laundry_wq(void) {}; #endif
/*
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dai Ngo dai.ngo@oracle.com
[ Upstream commit 591502c5cb325b1c6ec59ab161927d606b918aa0 ]
Add helper locks_owner_has_blockers to check if there is any blockers for a given lockowner.
Reviewed-by: J. Bruce Fields bfields@fieldses.org Signed-off-by: Dai Ngo dai.ngo@oracle.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/locks.c | 28 ++++++++++++++++++++++++++++ include/linux/fs.h | 7 +++++++ 2 files changed, 35 insertions(+)
diff --git a/fs/locks.c b/fs/locks.c index 101867933e4d3..118df2812f8aa 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -376,6 +376,34 @@ void locks_release_private(struct file_lock *fl) } EXPORT_SYMBOL_GPL(locks_release_private);
+/** + * locks_owner_has_blockers - Check for blocking lock requests + * @flctx: file lock context + * @owner: lock owner + * + * Return values: + * %true: @owner has at least one blocker + * %false: @owner has no blockers + */ +bool locks_owner_has_blockers(struct file_lock_context *flctx, + fl_owner_t owner) +{ + struct file_lock *fl; + + spin_lock(&flctx->flc_lock); + list_for_each_entry(fl, &flctx->flc_posix, fl_list) { + if (fl->fl_owner != owner) + continue; + if (!list_empty(&fl->fl_blocked_requests)) { + spin_unlock(&flctx->flc_lock); + return true; + } + } + spin_unlock(&flctx->flc_lock); + return false; +} +EXPORT_SYMBOL_GPL(locks_owner_has_blockers); + /* Free a lock which is not in use. */ void locks_free_lock(struct file_lock *fl) { diff --git a/include/linux/fs.h b/include/linux/fs.h index c0459446e1440..17dc1ee8c6cb2 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1163,6 +1163,8 @@ extern void lease_unregister_notifier(struct notifier_block *); struct files_struct; extern void show_fd_locks(struct seq_file *f, struct file *filp, struct files_struct *files); +extern bool locks_owner_has_blockers(struct file_lock_context *flctx, + fl_owner_t owner); #else /* !CONFIG_FILE_LOCKING */ static inline int fcntl_getlk(struct file *file, unsigned int cmd, struct flock __user *user) @@ -1303,6 +1305,11 @@ static inline int lease_modify(struct file_lock *fl, int arg, struct files_struct; static inline void show_fd_locks(struct seq_file *f, struct file *filp, struct files_struct *files) {} +static inline bool locks_owner_has_blockers(struct file_lock_context *flctx, + fl_owner_t owner) +{ + return false; +} #endif /* !CONFIG_FILE_LOCKING */
static inline struct inode *file_inode(const struct file *f)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dai Ngo dai.ngo@oracle.com
[ Upstream commit 2443da2259e97688f93d64d17ab69b15f466078a ]
Add 2 new callbacks, lm_lock_expirable and lm_expire_lock, to lock_manager_operations to allow the lock manager to take appropriate action to resolve the lock conflict if possible.
A new field, lm_mod_owner, is also added to lock_manager_operations. The lm_mod_owner is used by the fs/lock code to make sure the lock manager module such as nfsd, is not freed while lock conflict is being resolved.
lm_lock_expirable checks and returns true to indicate that the lock conflict can be resolved else return false. This callback must be called with the flc_lock held so it can not block.
lm_expire_lock is called to resolve the lock conflict if the returned value from lm_lock_expirable is true. This callback is called without the flc_lock held since it's allowed to block. Upon returning from this callback, the lock conflict should be resolved and the caller is expected to restart the conflict check from the beginnning of the list.
Lock manager, such as NFSv4 courteous server, uses this callback to resolve conflict by destroying lock owner, or the NFSv4 courtesy client (client that has expired but allowed to maintains its states) that owns the lock.
Reviewed-by: J. Bruce Fields bfields@fieldses.org Signed-off-by: Dai Ngo dai.ngo@oracle.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/filesystems/locking.rst | 4 ++++ fs/locks.c | 33 ++++++++++++++++++++++++--- include/linux/fs.h | 3 +++ 3 files changed, 37 insertions(+), 3 deletions(-)
diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst index 07e57f7629202..23a0d24168bc5 100644 --- a/Documentation/filesystems/locking.rst +++ b/Documentation/filesystems/locking.rst @@ -433,6 +433,8 @@ prototypes:: void (*lm_break)(struct file_lock *); /* break_lease callback */ int (*lm_change)(struct file_lock **, int); bool (*lm_breaker_owns_lease)(struct file_lock *); + bool (*lm_lock_expirable)(struct file_lock *); + void (*lm_expire_lock)(void);
locking rules:
@@ -444,6 +446,8 @@ lm_grant: no no no lm_break: yes no no lm_change yes no no lm_breaker_owns_lease: yes no no +lm_lock_expirable yes no no +lm_expire_lock no no yes ====================== ============= ================= =========
buffer_head diff --git a/fs/locks.c b/fs/locks.c index 118df2812f8aa..13a3ba97b73d1 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -982,6 +982,8 @@ posix_test_lock(struct file *filp, struct file_lock *fl) struct file_lock *cfl; struct file_lock_context *ctx; struct inode *inode = locks_inode(filp); + void *owner; + void (*func)(void);
ctx = smp_load_acquire(&inode->i_flctx); if (!ctx || list_empty_careful(&ctx->flc_posix)) { @@ -989,12 +991,23 @@ posix_test_lock(struct file *filp, struct file_lock *fl) return; }
+retry: spin_lock(&ctx->flc_lock); list_for_each_entry(cfl, &ctx->flc_posix, fl_list) { - if (posix_locks_conflict(fl, cfl)) { - locks_copy_conflock(fl, cfl); - goto out; + if (!posix_locks_conflict(fl, cfl)) + continue; + if (cfl->fl_lmops && cfl->fl_lmops->lm_lock_expirable + && (*cfl->fl_lmops->lm_lock_expirable)(cfl)) { + owner = cfl->fl_lmops->lm_mod_owner; + func = cfl->fl_lmops->lm_expire_lock; + __module_get(owner); + spin_unlock(&ctx->flc_lock); + (*func)(); + module_put(owner); + goto retry; } + locks_copy_conflock(fl, cfl); + goto out; } fl->fl_type = F_UNLCK; out: @@ -1168,6 +1181,8 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request, int error; bool added = false; LIST_HEAD(dispose); + void *owner; + void (*func)(void);
ctx = locks_get_lock_context(inode, request->fl_type); if (!ctx) @@ -1186,6 +1201,7 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request, new_fl2 = locks_alloc_lock(); }
+retry: percpu_down_read(&file_rwsem); spin_lock(&ctx->flc_lock); /* @@ -1197,6 +1213,17 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request, list_for_each_entry(fl, &ctx->flc_posix, fl_list) { if (!posix_locks_conflict(request, fl)) continue; + if (fl->fl_lmops && fl->fl_lmops->lm_lock_expirable + && (*fl->fl_lmops->lm_lock_expirable)(fl)) { + owner = fl->fl_lmops->lm_mod_owner; + func = fl->fl_lmops->lm_expire_lock; + __module_get(owner); + spin_unlock(&ctx->flc_lock); + percpu_up_read(&file_rwsem); + (*func)(); + module_put(owner); + goto retry; + } if (conflock) locks_copy_conflock(conflock, fl); error = -EAGAIN; diff --git a/include/linux/fs.h b/include/linux/fs.h index 17dc1ee8c6cb2..3e9105b3cc767 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1017,6 +1017,7 @@ struct file_lock_operations { };
struct lock_manager_operations { + void *lm_mod_owner; fl_owner_t (*lm_get_owner)(fl_owner_t); void (*lm_put_owner)(fl_owner_t); void (*lm_notify)(struct file_lock *); /* unblock callback */ @@ -1025,6 +1026,8 @@ struct lock_manager_operations { int (*lm_change)(struct file_lock *, int, struct list_head *); void (*lm_setup)(struct file_lock *, void **); bool (*lm_breaker_owns_lease)(struct file_lock *); + bool (*lm_lock_expirable)(struct file_lock *cfl); + void (*lm_expire_lock)(void); };
struct lock_manager {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dai Ngo dai.ngo@oracle.com
[ Upstream commit 27431affb0dbc259ac6ffe6071243a576c8f38f1 ]
This patch allows expired client with lock state to be in COURTESY state. Lock conflict with COURTESY client is resolved by the fs/lock code using the lm_lock_expirable and lm_expire_lock callback in the struct lock_manager_operations.
If conflict client is in COURTESY state, set it to EXPIRABLE and schedule the laundromat to run immediately to expire the client. The callback lm_expire_lock waits for the laundromat to flush its work queue before returning to caller.
Reviewed-by: J. Bruce Fields bfields@fieldses.org Signed-off-by: Dai Ngo dai.ngo@oracle.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 70 ++++++++++++++++++++++++++++++++++----------- 1 file changed, 54 insertions(+), 16 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 30ea1c7b6b9fd..6851fece3a760 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -5731,39 +5731,51 @@ static void nfsd4_ssc_expire_umount(struct nfsd_net *nn) } #endif
+/* Check if any lock belonging to this lockowner has any blockers */ static bool -nfs4_has_any_locks(struct nfs4_client *clp) +nfs4_lockowner_has_blockers(struct nfs4_lockowner *lo) +{ + struct file_lock_context *ctx; + struct nfs4_ol_stateid *stp; + struct nfs4_file *nf; + + list_for_each_entry(stp, &lo->lo_owner.so_stateids, st_perstateowner) { + nf = stp->st_stid.sc_file; + ctx = nf->fi_inode->i_flctx; + if (!ctx) + continue; + if (locks_owner_has_blockers(ctx, lo)) + return true; + } + return false; +} + +static bool +nfs4_anylock_blockers(struct nfs4_client *clp) { int i; struct nfs4_stateowner *so; + struct nfs4_lockowner *lo;
+ if (atomic_read(&clp->cl_delegs_in_recall)) + return true; spin_lock(&clp->cl_lock); for (i = 0; i < OWNER_HASH_SIZE; i++) { list_for_each_entry(so, &clp->cl_ownerstr_hashtbl[i], so_strhash) { if (so->so_is_open_owner) continue; - spin_unlock(&clp->cl_lock); - return true; + lo = lockowner(so); + if (nfs4_lockowner_has_blockers(lo)) { + spin_unlock(&clp->cl_lock); + return true; + } } } spin_unlock(&clp->cl_lock); return false; }
-/* - * place holder for now, no check for lock blockers yet - */ -static bool -nfs4_anylock_blockers(struct nfs4_client *clp) -{ - if (atomic_read(&clp->cl_delegs_in_recall) || - !list_empty(&clp->async_copies) || - nfs4_has_any_locks(clp)) - return true; - return false; -} - static void nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, struct laundry_time *lt) @@ -6743,6 +6755,29 @@ nfsd4_lm_put_owner(fl_owner_t owner) nfs4_put_stateowner(&lo->lo_owner); }
+/* return pointer to struct nfs4_client if client is expirable */ +static bool +nfsd4_lm_lock_expirable(struct file_lock *cfl) +{ + struct nfs4_lockowner *lo = (struct nfs4_lockowner *)cfl->fl_owner; + struct nfs4_client *clp = lo->lo_owner.so_client; + struct nfsd_net *nn; + + if (try_to_expire_client(clp)) { + nn = net_generic(clp->net, nfsd_net_id); + mod_delayed_work(laundry_wq, &nn->laundromat_work, 0); + return true; + } + return false; +} + +/* schedule laundromat to run immediately and wait for it to complete */ +static void +nfsd4_lm_expire_lock(void) +{ + flush_workqueue(laundry_wq); +} + static void nfsd4_lm_notify(struct file_lock *fl) { @@ -6769,9 +6804,12 @@ nfsd4_lm_notify(struct file_lock *fl) }
static const struct lock_manager_operations nfsd_posix_mng_ops = { + .lm_mod_owner = THIS_MODULE, .lm_notify = nfsd4_lm_notify, .lm_get_owner = nfsd4_lm_get_owner, .lm_put_owner = nfsd4_lm_put_owner, + .lm_lock_expirable = nfsd4_lm_lock_expirable, + .lm_expire_lock = nfsd4_lm_expire_lock, };
static inline void
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dai Ngo dai.ngo@oracle.com
[ Upstream commit e9488d5ae13c0a72223c507e2508dc2ac66cad4f ]
Update client_info_show to show state of courtesy client and seconds since last renew.
Reviewed-by: J. Bruce Fields bfields@fieldses.org Signed-off-by: Dai Ngo dai.ngo@oracle.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 6851fece3a760..9116496b476aa 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -2494,10 +2494,17 @@ static int client_info_show(struct seq_file *m, void *v) memcpy(&clid, &clp->cl_clientid, sizeof(clid)); seq_printf(m, "clientid: 0x%llx\n", clid); seq_printf(m, "address: "%pISpc"\n", (struct sockaddr *)&clp->cl_addr); - if (test_bit(NFSD4_CLIENT_CONFIRMED, &clp->cl_flags)) + + if (clp->cl_state == NFSD4_COURTESY) + seq_puts(m, "status: courtesy\n"); + else if (clp->cl_state == NFSD4_EXPIRABLE) + seq_puts(m, "status: expirable\n"); + else if (test_bit(NFSD4_CLIENT_CONFIRMED, &clp->cl_flags)) seq_puts(m, "status: confirmed\n"); else seq_puts(m, "status: unconfirmed\n"); + seq_printf(m, "seconds from last renew: %lld\n", + ktime_get_boottime_seconds() - clp->cl_time); seq_printf(m, "name: "); seq_quote_mem(m, clp->cl_name.data, clp->cl_name.len); seq_printf(m, "\nminor version: %d\n", clp->cl_minorversion);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit e61568599c9ad638fdaba150fee07d7065e31851 ]
As near as I can tell, mode bit masking and setting S_IFREG is already done by do_nfsd_create() and vfs_create(). The NFSv4 path (do_open_lookup), for example, does not bother with this special processing.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3proc.c | 16 ++-------------- 1 file changed, 2 insertions(+), 14 deletions(-)
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 936eebd4c56dc..981a2a71c5af7 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -229,8 +229,7 @@ nfsd3_proc_create(struct svc_rqst *rqstp) { struct nfsd3_createargs *argp = rqstp->rq_argp; struct nfsd3_diropres *resp = rqstp->rq_resp; - svc_fh *dirfhp, *newfhp = NULL; - struct iattr *attr; + svc_fh *dirfhp, *newfhp;
dprintk("nfsd: CREATE(3) %s %.*s\n", SVCFH_fmt(&argp->fh), @@ -239,20 +238,9 @@ nfsd3_proc_create(struct svc_rqst *rqstp)
dirfhp = fh_copy(&resp->dirfh, &argp->fh); newfhp = fh_init(&resp->fh, NFS3_FHSIZE); - attr = &argp->attrs; - - /* Unfudge the mode bits */ - attr->ia_mode &= ~S_IFMT; - if (!(attr->ia_valid & ATTR_MODE)) { - attr->ia_valid |= ATTR_MODE; - attr->ia_mode = S_IFREG; - } else { - attr->ia_mode = (attr->ia_mode & ~S_IFMT) | S_IFREG; - }
- /* Now create the file and set attributes */ resp->status = do_nfsd_create(rqstp, dirfhp, argp->name, argp->len, - attr, newfhp, argp->createmode, + &argp->attrs, newfhp, argp->createmode, (u32 *)argp->verf, NULL, NULL); return rpc_success; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 14ee45b70dd0d9ae76fb066cd8c0652d657353f6 ]
Clean up: The "out" label already invokes fh_drop_write().
Note that fh_drop_write() is already careful not to invoke mnt_drop_write() if either it has already been done or there is nothing to drop. Therefore no change in behavior is expected.
[ cel: backported to 5.10.y, prior to idmapped mounts ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/vfs.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 0968eaf735c85..cdeba19db16df 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1504,7 +1504,6 @@ do_nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp, case NFS3_CREATE_GUARDED: err = nfserr_exist; } - fh_drop_write(fhp); goto out; }
@@ -1512,10 +1511,8 @@ do_nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp, iap->ia_mode &= ~current_umask();
host_err = vfs_create(dirp, dchild, iap->ia_mode, true); - if (host_err < 0) { - fh_drop_write(fhp); + if (host_err < 0) goto out_nfserr; - } if (created) *created = true;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 5f46e950c395b9c14c282b53ba78c5fd46d6c256 ]
I'd like to move do_nfsd_create() out of vfs.c. Therefore nfsd_create_setattr() needs to be made publicly visible.
Note that both call sites in vfs.c commit both the new object and its parent directory, so just combine those common metadata commits into nfsd_create_setattr().
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/vfs.c | 79 +++++++++++++++++++++++++++------------------------ fs/nfsd/vfs.h | 2 ++ 2 files changed, 44 insertions(+), 37 deletions(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index cdeba19db16df..50451789a9444 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1207,14 +1207,26 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, u64 offset, return err; }
-static __be32 -nfsd_create_setattr(struct svc_rqst *rqstp, struct svc_fh *resfhp, - struct iattr *iap) +/** + * nfsd_create_setattr - Set a created file's attributes + * @rqstp: RPC transaction being executed + * @fhp: NFS filehandle of parent directory + * @resfhp: NFS filehandle of new object + * @iap: requested attributes of new object + * + * Returns nfs_ok on success, or an nfsstat in network byte order. + */ +__be32 +nfsd_create_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, + struct svc_fh *resfhp, struct iattr *iap) { + __be32 status; + /* - * Mode has already been set earlier in create: + * Mode has already been set by file creation. */ iap->ia_valid &= ~ATTR_MODE; + /* * Setting uid/gid works only for root. Irix appears to * send along the gid on create when it tries to implement @@ -1222,10 +1234,31 @@ nfsd_create_setattr(struct svc_rqst *rqstp, struct svc_fh *resfhp, */ if (!uid_eq(current_fsuid(), GLOBAL_ROOT_UID)) iap->ia_valid &= ~(ATTR_UID|ATTR_GID); + + /* + * Callers expect new file metadata to be committed even + * if the attributes have not changed. + */ if (iap->ia_valid) - return nfsd_setattr(rqstp, resfhp, iap, 0, (time64_t)0); - /* Callers expect file metadata to be committed here */ - return nfserrno(commit_metadata(resfhp)); + status = nfsd_setattr(rqstp, resfhp, iap, 0, (time64_t)0); + else + status = nfserrno(commit_metadata(resfhp)); + + /* + * Transactional filesystems had a chance to commit changes + * for both parent and child simultaneously making the + * following commit_metadata a noop in many cases. + */ + if (!status) + status = nfserrno(commit_metadata(fhp)); + + /* + * Update the new filehandle to pick up the new attributes. + */ + if (!status) + status = fh_update(resfhp); + + return status; }
/* HPUX client sometimes creates a file in mode 000, and sets size to 0. @@ -1252,7 +1285,6 @@ nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp, struct dentry *dentry, *dchild; struct inode *dirp; __be32 err; - __be32 err2; int host_err;
dentry = fhp->fh_dentry; @@ -1324,22 +1356,8 @@ nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp, if (host_err < 0) goto out_nfserr;
- err = nfsd_create_setattr(rqstp, resfhp, iap); + err = nfsd_create_setattr(rqstp, fhp, resfhp, iap);
- /* - * nfsd_create_setattr already committed the child. Transactional - * filesystems had a chance to commit changes for both parent and - * child simultaneously making the following commit_metadata a - * noop. - */ - err2 = nfserrno(commit_metadata(fhp)); - if (err2) - err = err2; - /* - * Update the file handle to get the new inode info. - */ - if (!err) - err = fh_update(resfhp); out: dput(dchild); return err; @@ -1530,20 +1548,7 @@ do_nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp, }
set_attr: - err = nfsd_create_setattr(rqstp, resfhp, iap); - - /* - * nfsd_create_setattr already committed the child - * (and possibly also the parent). - */ - if (!err) - err = nfserrno(commit_metadata(fhp)); - - /* - * Update the filehandle to get the new inode info. - */ - if (!err) - err = fh_update(resfhp); + err = nfsd_create_setattr(rqstp, fhp, resfhp, iap);
out: fh_unlock(fhp); diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h index ccb87b2864f64..1f32a83456b03 100644 --- a/fs/nfsd/vfs.h +++ b/fs/nfsd/vfs.h @@ -69,6 +69,8 @@ __be32 nfsd_create(struct svc_rqst *, struct svc_fh *, char *name, int len, struct iattr *attrs, int type, dev_t rdev, struct svc_fh *res); __be32 nfsd_access(struct svc_rqst *, struct svc_fh *, u32 *, u32 *); +__be32 nfsd_create_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, + struct svc_fh *resfhp, struct iattr *iap); __be32 do_nfsd_create(struct svc_rqst *, struct svc_fh *, char *name, int len, struct iattr *attrs, struct svc_fh *res, int createmode,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit df9606abddfb01090d5ece7dcc2441d848f690f0 ]
The NFSv3 CREATE and NFSv4 OPEN(CREATE) use cases are about to diverge such that it makes sense to split do_nfsd_create() into one version for NFSv3 and one for NFSv4.
As a first step, copy do_nfsd_create() to nfs3proc.c and remove NFSv4-specific logic.
One immediate legibility benefit is that the logic for handling NFSv3 createhow is now quite straightforward. NFSv4 createhow has some subtleties that IMO do not belong in generic code.
[ cel: backported to 5.10.y, prior to idmapped mounts ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3proc.c | 127 ++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 121 insertions(+), 6 deletions(-)
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 981a2a71c5af7..e314545bbdb2e 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -8,6 +8,7 @@ #include <linux/fs.h> #include <linux/ext2_fs.h> #include <linux/magic.h> +#include <linux/namei.h>
#include "cache.h" #include "xdr3.h" @@ -220,10 +221,126 @@ nfsd3_proc_write(struct svc_rqst *rqstp) }
/* - * With NFSv3, CREATE processing is a lot easier than with NFSv2. - * At least in theory; we'll see how it fares in practice when the - * first reports about SunOS compatibility problems start to pour in... + * Implement NFSv3's unchecked, guarded, and exclusive CREATE + * semantics for regular files. Except for the created file, + * this operation is stateless on the server. + * + * Upon return, caller must release @fhp and @resfhp. */ +static __be32 +nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp, + struct svc_fh *resfhp, struct nfsd3_createargs *argp) +{ + struct iattr *iap = &argp->attrs; + struct dentry *parent, *child; + __u32 v_mtime, v_atime; + struct inode *inode; + __be32 status; + int host_err; + + if (isdotent(argp->name, argp->len)) + return nfserr_exist; + if (!(iap->ia_valid & ATTR_MODE)) + iap->ia_mode = 0; + + status = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC); + if (status != nfs_ok) + return status; + + parent = fhp->fh_dentry; + inode = d_inode(parent); + + host_err = fh_want_write(fhp); + if (host_err) + return nfserrno(host_err); + + fh_lock_nested(fhp, I_MUTEX_PARENT); + + child = lookup_one_len(argp->name, parent, argp->len); + if (IS_ERR(child)) { + status = nfserrno(PTR_ERR(child)); + goto out; + } + + if (d_really_is_negative(child)) { + status = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE); + if (status != nfs_ok) + goto out; + } + + status = fh_compose(resfhp, fhp->fh_export, child, fhp); + if (status != nfs_ok) + goto out; + + v_mtime = 0; + v_atime = 0; + if (argp->createmode == NFS3_CREATE_EXCLUSIVE) { + u32 *verifier = (u32 *)argp->verf; + + /* + * Solaris 7 gets confused (bugid 4218508) if these have + * the high bit set, as do xfs filesystems without the + * "bigtime" feature. So just clear the high bits. + */ + v_mtime = verifier[0] & 0x7fffffff; + v_atime = verifier[1] & 0x7fffffff; + } + + if (d_really_is_positive(child)) { + status = nfs_ok; + + switch (argp->createmode) { + case NFS3_CREATE_UNCHECKED: + if (!d_is_reg(child)) + break; + iap->ia_valid &= ATTR_SIZE; + goto set_attr; + case NFS3_CREATE_GUARDED: + status = nfserr_exist; + break; + case NFS3_CREATE_EXCLUSIVE: + if (d_inode(child)->i_mtime.tv_sec == v_mtime && + d_inode(child)->i_atime.tv_sec == v_atime && + d_inode(child)->i_size == 0) { + break; + } + status = nfserr_exist; + } + goto out; + } + + if (!IS_POSIXACL(inode)) + iap->ia_mode &= ~current_umask(); + + host_err = vfs_create(inode, child, iap->ia_mode, true); + if (host_err < 0) { + status = nfserrno(host_err); + goto out; + } + + /* A newly created file already has a file size of zero. */ + if ((iap->ia_valid & ATTR_SIZE) && (iap->ia_size == 0)) + iap->ia_valid &= ~ATTR_SIZE; + if (argp->createmode == NFS3_CREATE_EXCLUSIVE) { + iap->ia_valid = ATTR_MTIME | ATTR_ATIME | + ATTR_MTIME_SET | ATTR_ATIME_SET; + iap->ia_mtime.tv_sec = v_mtime; + iap->ia_atime.tv_sec = v_atime; + iap->ia_mtime.tv_nsec = 0; + iap->ia_atime.tv_nsec = 0; + } + +set_attr: + status = nfsd_create_setattr(rqstp, fhp, resfhp, iap); + +out: + fh_unlock(fhp); + if (child && !IS_ERR(child)) + dput(child); + fh_drop_write(fhp); + return status; +} + static __be32 nfsd3_proc_create(struct svc_rqst *rqstp) { @@ -239,9 +356,7 @@ nfsd3_proc_create(struct svc_rqst *rqstp) dirfhp = fh_copy(&resp->dirfh, &argp->fh); newfhp = fh_init(&resp->fh, NFS3_FHSIZE);
- resp->status = do_nfsd_create(rqstp, dirfhp, argp->name, argp->len, - &argp->attrs, newfhp, argp->createmode, - (u32 *)argp->verf, NULL, NULL); + resp->status = nfsd3_create_file(rqstp, dirfhp, newfhp, argp); return rpc_success; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 254454a5aa4a9f696d6bae080c08d5863e650f49 ]
Copy do_nfsd_create() to nfs4proc.c and remove NFSv3-specific logic.
[ cel: backported to 5.10.y, prior to idmapped mounts ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 162 ++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 152 insertions(+), 10 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index f3d6bd2bfa4f7..adb64c9259bd8 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -37,6 +37,8 @@ #include <linux/falloc.h> #include <linux/slab.h> #include <linux/kthread.h> +#include <linux/namei.h> + #include <linux/sunrpc/addr.h> #include <linux/nfs_ssc.h>
@@ -235,6 +237,154 @@ static void nfsd4_set_open_owner_reply_cache(struct nfsd4_compound_state *cstate &resfh->fh_handle); }
+static inline bool nfsd4_create_is_exclusive(int createmode) +{ + return createmode == NFS4_CREATE_EXCLUSIVE || + createmode == NFS4_CREATE_EXCLUSIVE4_1; +} + +/* + * Implement NFSv4's unchecked, guarded, and exclusive create + * semantics for regular files. Open state for this new file is + * subsequently fabricated in nfsd4_process_open2(). + * + * Upon return, caller must release @fhp and @resfhp. + */ +static __be32 +nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp, + struct svc_fh *resfhp, struct nfsd4_open *open) +{ + struct iattr *iap = &open->op_iattr; + struct dentry *parent, *child; + __u32 v_mtime, v_atime; + struct inode *inode; + __be32 status; + int host_err; + + if (isdotent(open->op_fname, open->op_fnamelen)) + return nfserr_exist; + if (!(iap->ia_valid & ATTR_MODE)) + iap->ia_mode = 0; + + status = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC); + if (status != nfs_ok) + return status; + parent = fhp->fh_dentry; + inode = d_inode(parent); + + host_err = fh_want_write(fhp); + if (host_err) + return nfserrno(host_err); + + fh_lock_nested(fhp, I_MUTEX_PARENT); + + child = lookup_one_len(open->op_fname, parent, open->op_fnamelen); + if (IS_ERR(child)) { + status = nfserrno(PTR_ERR(child)); + goto out; + } + + if (d_really_is_negative(child)) { + status = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE); + if (status != nfs_ok) + goto out; + } + + status = fh_compose(resfhp, fhp->fh_export, child, fhp); + if (status != nfs_ok) + goto out; + + v_mtime = 0; + v_atime = 0; + if (nfsd4_create_is_exclusive(open->op_createmode)) { + u32 *verifier = (u32 *)open->op_verf.data; + + /* + * Solaris 7 gets confused (bugid 4218508) if these have + * the high bit set, as do xfs filesystems without the + * "bigtime" feature. So just clear the high bits. If this + * is ever changed to use different attrs for storing the + * verifier, then do_open_lookup() will also need to be + * fixed accordingly. + */ + v_mtime = verifier[0] & 0x7fffffff; + v_atime = verifier[1] & 0x7fffffff; + } + + if (d_really_is_positive(child)) { + status = nfs_ok; + + switch (open->op_createmode) { + case NFS4_CREATE_UNCHECKED: + if (!d_is_reg(child)) + break; + + /* + * In NFSv4, we don't want to truncate the file + * now. This would be wrong if the OPEN fails for + * some other reason. Furthermore, if the size is + * nonzero, we should ignore it according to spec! + */ + open->op_truncate = (iap->ia_valid & ATTR_SIZE) && + !iap->ia_size; + break; + case NFS4_CREATE_GUARDED: + status = nfserr_exist; + break; + case NFS4_CREATE_EXCLUSIVE: + if (d_inode(child)->i_mtime.tv_sec == v_mtime && + d_inode(child)->i_atime.tv_sec == v_atime && + d_inode(child)->i_size == 0) { + open->op_created = true; + break; /* subtle */ + } + status = nfserr_exist; + break; + case NFS4_CREATE_EXCLUSIVE4_1: + if (d_inode(child)->i_mtime.tv_sec == v_mtime && + d_inode(child)->i_atime.tv_sec == v_atime && + d_inode(child)->i_size == 0) { + open->op_created = true; + goto set_attr; /* subtle */ + } + status = nfserr_exist; + } + goto out; + } + + if (!IS_POSIXACL(inode)) + iap->ia_mode &= ~current_umask(); + + host_err = vfs_create(inode, child, iap->ia_mode, true); + if (host_err < 0) { + status = nfserrno(host_err); + goto out; + } + open->op_created = true; + + /* A newly created file already has a file size of zero. */ + if ((iap->ia_valid & ATTR_SIZE) && (iap->ia_size == 0)) + iap->ia_valid &= ~ATTR_SIZE; + if (nfsd4_create_is_exclusive(open->op_createmode)) { + iap->ia_valid = ATTR_MTIME | ATTR_ATIME | + ATTR_MTIME_SET|ATTR_ATIME_SET; + iap->ia_mtime.tv_sec = v_mtime; + iap->ia_atime.tv_sec = v_atime; + iap->ia_mtime.tv_nsec = 0; + iap->ia_atime.tv_nsec = 0; + } + +set_attr: + status = nfsd_create_setattr(rqstp, fhp, resfhp, iap); + +out: + fh_unlock(fhp); + if (child && !IS_ERR(child)) + dput(child); + fh_drop_write(fhp); + return status; +} + static __be32 do_open_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, struct nfsd4_open *open, struct svc_fh **resfh) { @@ -264,16 +414,8 @@ do_open_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, stru * yes | yes | GUARDED4 | GUARDED4 */
- /* - * Note: create modes (UNCHECKED,GUARDED...) are the same - * in NFSv4 as in v3 except EXCLUSIVE4_1. - */ current->fs->umask = open->op_umask; - status = do_nfsd_create(rqstp, current_fh, open->op_fname, - open->op_fnamelen, &open->op_iattr, - *resfh, open->op_createmode, - (u32 *)open->op_verf.data, - &open->op_truncate, &open->op_created); + status = nfsd4_create_file(rqstp, current_fh, *resfh, open); current->fs->umask = 0;
if (!status && open->op_label.len) @@ -284,7 +426,7 @@ do_open_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, stru * use the returned bitmask to indicate which attributes * we used to store the verifier: */ - if (nfsd_create_is_exclusive(open->op_createmode) && status == 0) + if (nfsd4_create_is_exclusive(open->op_createmode) && status == 0) open->op_bmval[1] |= (FATTR4_WORD1_TIME_ACCESS | FATTR4_WORD1_TIME_MODIFY); } else
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 1c388f27759c5d9271d4fca081f7ee138986eb7d ]
Now that its two callers have their own version-specific instance of this function, do_nfsd_create() is no longer used.
[ cel: backported to 5.10.y, prior to idmapped mounts ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/vfs.c | 150 -------------------------------------------------- fs/nfsd/vfs.h | 10 ---- 2 files changed, 160 deletions(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 50451789a9444..26ae28b6f9b02 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1412,156 +1412,6 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp, rdev, resfhp); }
-/* - * NFSv3 and NFSv4 version of nfsd_create - */ -__be32 -do_nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp, - char *fname, int flen, struct iattr *iap, - struct svc_fh *resfhp, int createmode, u32 *verifier, - bool *truncp, bool *created) -{ - struct dentry *dentry, *dchild = NULL; - struct inode *dirp; - __be32 err; - int host_err; - __u32 v_mtime=0, v_atime=0; - - err = nfserr_perm; - if (!flen) - goto out; - err = nfserr_exist; - if (isdotent(fname, flen)) - goto out; - if (!(iap->ia_valid & ATTR_MODE)) - iap->ia_mode = 0; - err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC); - if (err) - goto out; - - dentry = fhp->fh_dentry; - dirp = d_inode(dentry); - - host_err = fh_want_write(fhp); - if (host_err) - goto out_nfserr; - - fh_lock_nested(fhp, I_MUTEX_PARENT); - - /* - * Compose the response file handle. - */ - dchild = lookup_one_len(fname, dentry, flen); - host_err = PTR_ERR(dchild); - if (IS_ERR(dchild)) - goto out_nfserr; - - /* If file doesn't exist, check for permissions to create one */ - if (d_really_is_negative(dchild)) { - err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE); - if (err) - goto out; - } - - err = fh_compose(resfhp, fhp->fh_export, dchild, fhp); - if (err) - goto out; - - if (nfsd_create_is_exclusive(createmode)) { - /* solaris7 gets confused (bugid 4218508) if these have - * the high bit set, as do xfs filesystems without the - * "bigtime" feature. So just clear the high bits. If this is - * ever changed to use different attrs for storing the - * verifier, then do_open_lookup() will also need to be fixed - * accordingly. - */ - v_mtime = verifier[0]&0x7fffffff; - v_atime = verifier[1]&0x7fffffff; - } - - if (d_really_is_positive(dchild)) { - err = 0; - - switch (createmode) { - case NFS3_CREATE_UNCHECKED: - if (! d_is_reg(dchild)) - goto out; - else if (truncp) { - /* in nfsv4, we need to treat this case a little - * differently. we don't want to truncate the - * file now; this would be wrong if the OPEN - * fails for some other reason. furthermore, - * if the size is nonzero, we should ignore it - * according to spec! - */ - *truncp = (iap->ia_valid & ATTR_SIZE) && !iap->ia_size; - } - else { - iap->ia_valid &= ATTR_SIZE; - goto set_attr; - } - break; - case NFS3_CREATE_EXCLUSIVE: - if ( d_inode(dchild)->i_mtime.tv_sec == v_mtime - && d_inode(dchild)->i_atime.tv_sec == v_atime - && d_inode(dchild)->i_size == 0 ) { - if (created) - *created = true; - break; - } - fallthrough; - case NFS4_CREATE_EXCLUSIVE4_1: - if ( d_inode(dchild)->i_mtime.tv_sec == v_mtime - && d_inode(dchild)->i_atime.tv_sec == v_atime - && d_inode(dchild)->i_size == 0 ) { - if (created) - *created = true; - goto set_attr; - } - fallthrough; - case NFS3_CREATE_GUARDED: - err = nfserr_exist; - } - goto out; - } - - if (!IS_POSIXACL(dirp)) - iap->ia_mode &= ~current_umask(); - - host_err = vfs_create(dirp, dchild, iap->ia_mode, true); - if (host_err < 0) - goto out_nfserr; - if (created) - *created = true; - - nfsd_check_ignore_resizing(iap); - - if (nfsd_create_is_exclusive(createmode)) { - /* Cram the verifier into atime/mtime */ - iap->ia_valid = ATTR_MTIME|ATTR_ATIME - | ATTR_MTIME_SET|ATTR_ATIME_SET; - /* XXX someone who knows this better please fix it for nsec */ - iap->ia_mtime.tv_sec = v_mtime; - iap->ia_atime.tv_sec = v_atime; - iap->ia_mtime.tv_nsec = 0; - iap->ia_atime.tv_nsec = 0; - } - - set_attr: - err = nfsd_create_setattr(rqstp, fhp, resfhp, iap); - - out: - fh_unlock(fhp); - if (dchild && !IS_ERR(dchild)) - dput(dchild); - fh_drop_write(fhp); - return err; - - out_nfserr: - err = nfserrno(host_err); - goto out; -} - /* * Read a symlink. On entry, *lenp must contain the maximum path length that * fits into the buffer. On return, it contains the true length. diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h index 1f32a83456b03..f99794b033a55 100644 --- a/fs/nfsd/vfs.h +++ b/fs/nfsd/vfs.h @@ -71,10 +71,6 @@ __be32 nfsd_create(struct svc_rqst *, struct svc_fh *, __be32 nfsd_access(struct svc_rqst *, struct svc_fh *, u32 *, u32 *); __be32 nfsd_create_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, struct svc_fh *resfhp, struct iattr *iap); -__be32 do_nfsd_create(struct svc_rqst *, struct svc_fh *, - char *name, int len, struct iattr *attrs, - struct svc_fh *res, int createmode, - u32 *verifier, bool *truncp, bool *created); __be32 nfsd_commit(struct svc_rqst *rqst, struct svc_fh *fhp, u64 offset, u32 count, __be32 *verf); #ifdef CONFIG_NFSD_V4 @@ -161,10 +157,4 @@ static inline __be32 fh_getattr(const struct svc_fh *fh, struct kstat *stat) AT_STATX_SYNC_AS_STAT)); }
-static inline int nfsd_create_is_exclusive(int createmode) -{ - return createmode == NFS3_CREATE_EXCLUSIVE - || createmode == NFS4_CREATE_EXCLUSIVE4_1; -} - #endif /* LINUX_NFSD_VFS_H */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit f4d84c52643ae1d63a8e73e2585464470e7944d1 ]
Its only caller always passes S_IFREG as the @type parameter. As an additional clean-up, add a kerneldoc comment.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 4 ++-- fs/nfsd/vfs.c | 15 ++++++++++++--- fs/nfsd/vfs.h | 2 +- 3 files changed, 15 insertions(+), 6 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 7e99e75b75d73..3c297ccfcc59d 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -996,8 +996,8 @@ nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
nf->nf_mark = nfsd_file_mark_find_or_create(nf); if (nf->nf_mark) - status = nfsd_open_verified(rqstp, fhp, S_IFREG, - may_flags, &nf->nf_file); + status = nfsd_open_verified(rqstp, fhp, may_flags, + &nf->nf_file); else status = nfserr_jukebox; /* diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 26ae28b6f9b02..5c8dc1a05e57e 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -852,14 +852,23 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, return err; }
+/** + * nfsd_open_verified - Open a regular file for the filecache + * @rqstp: RPC request + * @fhp: NFS filehandle of the file to open + * @may_flags: internal permission flags + * @filp: OUT: open "struct file *" + * + * Returns an nfsstat value in network byte order. + */ __be32 -nfsd_open_verified(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, - int may_flags, struct file **filp) +nfsd_open_verified(struct svc_rqst *rqstp, struct svc_fh *fhp, int may_flags, + struct file **filp) { __be32 err;
validate_process_creds(); - err = __nfsd_open(rqstp, fhp, type, may_flags, filp); + err = __nfsd_open(rqstp, fhp, S_IFREG, may_flags, filp); validate_process_creds(); return err; } diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h index f99794b033a55..26347d76f44a0 100644 --- a/fs/nfsd/vfs.h +++ b/fs/nfsd/vfs.h @@ -86,7 +86,7 @@ __be32 nfsd_setxattr(struct svc_rqst *rqstp, struct svc_fh *fhp, int nfsd_open_break_lease(struct inode *, int); __be32 nfsd_open(struct svc_rqst *, struct svc_fh *, umode_t, int, struct file **); -__be32 nfsd_open_verified(struct svc_rqst *, struct svc_fh *, umode_t, +__be32 nfsd_open_verified(struct svc_rqst *, struct svc_fh *, int, struct file **); __be32 nfsd_splice_read(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file, loff_t offset,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit fb70bf124b051d4ded4ce57511dfec6d3ebf2b43 ]
There have been reports of races that cause NFSv4 OPEN(CREATE) to return an error even though the requested file was created. NFSv4 does not provide a status code for this case.
To mitigate some of these problems, reorganize the NFSv4 OPEN(CREATE) logic to allocate resources before the file is actually created, and open the new file while the parent directory is still locked.
Two new APIs are added:
+ Add an API that works like nfsd_file_acquire() but does not open the underlying file. The OPEN(CREATE) path can use this API when it already has an open file.
+ Add an API that is kin to dentry_open(). NFSD needs to create a file and grab an open "struct file *" atomically. The alloc_empty_file() has to be done before the inode create. If it fails (for example, because the NFS server has exceeded its max_files limit), we avoid creating the file and can still return an error to the NFS client.
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=382 Signed-off-by: Chuck Lever chuck.lever@oracle.com Tested-by: JianHong Yin jiyin@redhat.com [ cel: backported to 5.10.y, prior to idmapped mounts ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 51 ++++++++++++++++++++++++++++++++++++++------- fs/nfsd/filecache.h | 2 ++ fs/nfsd/nfs4proc.c | 43 ++++++++++++++++++++++++++++++++++---- fs/nfsd/nfs4state.c | 16 +++++++++++--- fs/nfsd/xdr4.h | 1 + fs/open.c | 41 ++++++++++++++++++++++++++++++++++++ include/linux/fs.h | 2 ++ 7 files changed, 142 insertions(+), 14 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 3c297ccfcc59d..db9c68a3c1f3b 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -898,9 +898,9 @@ nfsd_file_is_cached(struct inode *inode) return ret; }
-__be32 -nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, - unsigned int may_flags, struct nfsd_file **pnf) +static __be32 +nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, + unsigned int may_flags, struct nfsd_file **pnf, bool open) { __be32 status; struct net *net = SVC_NET(rqstp); @@ -995,10 +995,13 @@ nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, nfsd_file_gc();
nf->nf_mark = nfsd_file_mark_find_or_create(nf); - if (nf->nf_mark) - status = nfsd_open_verified(rqstp, fhp, may_flags, - &nf->nf_file); - else + if (nf->nf_mark) { + if (open) + status = nfsd_open_verified(rqstp, fhp, may_flags, + &nf->nf_file); + else + status = nfs_ok; + } else status = nfserr_jukebox; /* * If construction failed, or we raced with a call to unlink() @@ -1018,6 +1021,40 @@ nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, goto out; }
+/** + * nfsd_file_acquire - Get a struct nfsd_file with an open file + * @rqstp: the RPC transaction being executed + * @fhp: the NFS filehandle of the file to be opened + * @may_flags: NFSD_MAY_ settings for the file + * @pnf: OUT: new or found "struct nfsd_file" object + * + * Returns nfs_ok and sets @pnf on success; otherwise an nfsstat in + * network byte order is returned. + */ +__be32 +nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, + unsigned int may_flags, struct nfsd_file **pnf) +{ + return nfsd_do_file_acquire(rqstp, fhp, may_flags, pnf, true); +} + +/** + * nfsd_file_create - Get a struct nfsd_file, do not open + * @rqstp: the RPC transaction being executed + * @fhp: the NFS filehandle of the file just created + * @may_flags: NFSD_MAY_ settings for the file + * @pnf: OUT: new or found "struct nfsd_file" object + * + * Returns nfs_ok and sets @pnf on success; otherwise an nfsstat in + * network byte order is returned. + */ +__be32 +nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp, + unsigned int may_flags, struct nfsd_file **pnf) +{ + return nfsd_do_file_acquire(rqstp, fhp, may_flags, pnf, false); +} + /* * Note that fields may be added, removed or reordered in the future. Programs * scraping this file for info should test the labels to ensure they're diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h index 435ceab27897a..1da0c79a55804 100644 --- a/fs/nfsd/filecache.h +++ b/fs/nfsd/filecache.h @@ -59,5 +59,7 @@ void nfsd_file_close_inode_sync(struct inode *inode); bool nfsd_file_is_cached(struct inode *inode); __be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned int may_flags, struct nfsd_file **nfp); +__be32 nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp, + unsigned int may_flags, struct nfsd_file **nfp); int nfsd_file_cache_stats_open(struct inode *, struct file *); #endif /* _FS_NFSD_FILECACHE_H */ diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index adb64c9259bd8..166ebb126d37b 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -243,6 +243,37 @@ static inline bool nfsd4_create_is_exclusive(int createmode) createmode == NFS4_CREATE_EXCLUSIVE4_1; }
+static __be32 +nfsd4_vfs_create(struct svc_fh *fhp, struct dentry *child, + struct nfsd4_open *open) +{ + struct file *filp; + struct path path; + int oflags; + + oflags = O_CREAT | O_LARGEFILE; + switch (open->op_share_access & NFS4_SHARE_ACCESS_BOTH) { + case NFS4_SHARE_ACCESS_WRITE: + oflags |= O_WRONLY; + break; + case NFS4_SHARE_ACCESS_BOTH: + oflags |= O_RDWR; + break; + default: + oflags |= O_RDONLY; + } + + path.mnt = fhp->fh_export->ex_path.mnt; + path.dentry = child; + filp = dentry_create(&path, oflags, open->op_iattr.ia_mode, + current_cred()); + if (IS_ERR(filp)) + return nfserrno(PTR_ERR(filp)); + + open->op_filp = filp; + return nfs_ok; +} + /* * Implement NFSv4's unchecked, guarded, and exclusive create * semantics for regular files. Open state for this new file is @@ -355,11 +386,9 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp, if (!IS_POSIXACL(inode)) iap->ia_mode &= ~current_umask();
- host_err = vfs_create(inode, child, iap->ia_mode, true); - if (host_err < 0) { - status = nfserrno(host_err); + status = nfsd4_vfs_create(fhp, child, open); + if (status != nfs_ok) goto out; - } open->op_created = true;
/* A newly created file already has a file size of zero. */ @@ -517,6 +546,8 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, (int)open->op_fnamelen, open->op_fname, open->op_openowner);
+ open->op_filp = NULL; + /* This check required by spec. */ if (open->op_create && open->op_claim_type != NFS4_OPEN_CLAIM_NULL) return nfserr_inval; @@ -613,6 +644,10 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (reclaim && !status) nn->somebody_reclaimed = true; out: + if (open->op_filp) { + fput(open->op_filp); + open->op_filp = NULL; + } if (resfh && resfh != &cstate->current_fh) { fh_dup2(&cstate->current_fh, resfh); fh_put(resfh); diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 9116496b476aa..9f972ca09eec7 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -5110,9 +5110,19 @@ static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp,
if (!fp->fi_fds[oflag]) { spin_unlock(&fp->fi_lock); - status = nfsd_file_acquire(rqstp, cur_fh, access, &nf); - if (status) - goto out_put_access; + + if (!open->op_filp) { + status = nfsd_file_acquire(rqstp, cur_fh, access, &nf); + if (status != nfs_ok) + goto out_put_access; + } else { + status = nfsd_file_create(rqstp, cur_fh, access, &nf); + if (status != nfs_ok) + goto out_put_access; + nf->nf_file = open->op_filp; + open->op_filp = NULL; + } + spin_lock(&fp->fi_lock); if (!fp->fi_fds[oflag]) { fp->fi_fds[oflag] = nf; diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 4a298ac5515df..448b687943cd3 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -273,6 +273,7 @@ struct nfsd4_open { bool op_truncate; /* used during processing */ bool op_created; /* used during processing */ struct nfs4_openowner *op_openowner; /* used during processing */ + struct file *op_filp; /* used during processing */ struct nfs4_file *op_file; /* used during processing */ struct nfs4_ol_stateid *op_stp; /* used during processing */ struct nfs4_clnt_odstate *op_odstate; /* used during processing */ diff --git a/fs/open.c b/fs/open.c index 9f56ebacfbefe..d69312a2d434b 100644 --- a/fs/open.c +++ b/fs/open.c @@ -954,6 +954,47 @@ struct file *dentry_open(const struct path *path, int flags, } EXPORT_SYMBOL(dentry_open);
+/** + * dentry_create - Create and open a file + * @path: path to create + * @flags: O_ flags + * @mode: mode bits for new file + * @cred: credentials to use + * + * Caller must hold the parent directory's lock, and have prepared + * a negative dentry, placed in @path->dentry, for the new file. + * + * Caller sets @path->mnt to the vfsmount of the filesystem where + * the new file is to be created. The parent directory and the + * negative dentry must reside on the same filesystem instance. + * + * On success, returns a "struct file *". Otherwise a ERR_PTR + * is returned. + */ +struct file *dentry_create(const struct path *path, int flags, umode_t mode, + const struct cred *cred) +{ + struct file *f; + int error; + + validate_creds(cred); + f = alloc_empty_file(flags, cred); + if (IS_ERR(f)) + return f; + + error = vfs_create(d_inode(path->dentry->d_parent), + path->dentry, mode, true); + if (!error) + error = vfs_open(path, f); + + if (unlikely(error)) { + fput(f); + return ERR_PTR(error); + } + return f; +} +EXPORT_SYMBOL(dentry_create); + struct file *open_with_fake_path(const struct path *path, int flags, struct inode *inode, const struct cred *cred) { diff --git a/include/linux/fs.h b/include/linux/fs.h index 3e9105b3cc767..7e7098bf2c57e 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2618,6 +2618,8 @@ extern struct file *filp_open(const char *, int, umode_t); extern struct file *file_open_root(struct dentry *, struct vfsmount *, const char *, int, umode_t); extern struct file * dentry_open(const struct path *, int, const struct cred *); +extern struct file *dentry_create(const struct path *path, int flags, + umode_t mode, const struct cred *cred); extern struct file * open_with_fake_path(const struct path *, int, struct inode*, const struct cred *); static inline struct file *file_clone_open(struct file *file)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit f67a16b147045815b6aaafeef8663e5faeb6d569 ]
Clean up: These relics are not likely to benefit server administrators.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 4 ---- 1 file changed, 4 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 166ebb126d37b..c21cfaabf873a 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -622,13 +622,9 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, break; case NFS4_OPEN_CLAIM_DELEG_PREV_FH: case NFS4_OPEN_CLAIM_DELEGATE_PREV: - dprintk("NFSD: unsupported OPEN claim type %d\n", - open->op_claim_type); status = nfserr_notsupp; goto out; default: - dprintk("NFSD: Invalid OPEN claim type %d\n", - open->op_claim_type); status = nfserr_inval; goto out; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 26320d7e317c37404c811603d50d811132aef78c ]
Clean up: Pull case arms back one tab stop to conform every other switch statement in fs/nfsd/nfs4proc.c.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 50 +++++++++++++++++++++++----------------------- 1 file changed, 25 insertions(+), 25 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index c21cfaabf873a..90a12ccf96713 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -600,33 +600,33 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, goto out;
switch (open->op_claim_type) { - case NFS4_OPEN_CLAIM_DELEGATE_CUR: - case NFS4_OPEN_CLAIM_NULL: - status = do_open_lookup(rqstp, cstate, open, &resfh); - if (status) - goto out; - break; - case NFS4_OPEN_CLAIM_PREVIOUS: - status = nfs4_check_open_reclaim(cstate->clp); - if (status) - goto out; - open->op_openowner->oo_flags |= NFS4_OO_CONFIRMED; - reclaim = true; - fallthrough; - case NFS4_OPEN_CLAIM_FH: - case NFS4_OPEN_CLAIM_DELEG_CUR_FH: - status = do_open_fhandle(rqstp, cstate, open); - if (status) - goto out; - resfh = &cstate->current_fh; - break; - case NFS4_OPEN_CLAIM_DELEG_PREV_FH: - case NFS4_OPEN_CLAIM_DELEGATE_PREV: - status = nfserr_notsupp; + case NFS4_OPEN_CLAIM_DELEGATE_CUR: + case NFS4_OPEN_CLAIM_NULL: + status = do_open_lookup(rqstp, cstate, open, &resfh); + if (status) goto out; - default: - status = nfserr_inval; + break; + case NFS4_OPEN_CLAIM_PREVIOUS: + status = nfs4_check_open_reclaim(cstate->clp); + if (status) goto out; + open->op_openowner->oo_flags |= NFS4_OO_CONFIRMED; + reclaim = true; + fallthrough; + case NFS4_OPEN_CLAIM_FH: + case NFS4_OPEN_CLAIM_DELEG_CUR_FH: + status = do_open_fhandle(rqstp, cstate, open); + if (status) + goto out; + resfh = &cstate->current_fh; + break; + case NFS4_OPEN_CLAIM_DELEG_PREV_FH: + case NFS4_OPEN_CLAIM_DELEGATE_PREV: + status = nfserr_notsupp; + goto out; + default: + status = nfserr_inval; + goto out; } /* * nfsd4_process_open2() does the actual opening of the file. If
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 7e2ce0cc15a509b859199235a2bad9cece00f67a ]
Clean up nfsd4_open() by converting a large comment at the only call site for nfsd4_process_open2() to a kerneldoc comment in front of that function.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 6 +----- fs/nfsd/nfs4state.c | 12 ++++++++++++ 2 files changed, 13 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 90a12ccf96713..d62d962ed2f13 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -628,11 +628,7 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, status = nfserr_inval; goto out; } - /* - * nfsd4_process_open2() does the actual opening of the file. If - * successful, it (1) truncates the file if open->op_truncate was - * set, (2) sets open->op_stateid, (3) sets open->op_delegation. - */ + status = nfsd4_process_open2(rqstp, resfh, open); WARN(status && open->op_created, "nfsd4_process_open2 failed to open newly-created file! status=%u\n", diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 9f972ca09eec7..e0ce0412d7dea 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -5465,6 +5465,18 @@ static void nfsd4_deleg_xgrade_none_ext(struct nfsd4_open *open, */ }
+/** + * nfsd4_process_open2 - finish open processing + * @rqstp: the RPC transaction being executed + * @current_fh: NFSv4 COMPOUND's current filehandle + * @open: OPEN arguments + * + * If successful, (1) truncate the file if open->op_truncate was + * set, (2) set open->op_stateid, (3) set open->op_delegation. + * + * Returns %nfs_ok on success; otherwise an nfs4stat value in + * network byte order is returned. + */ __be32 nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nfsd4_open *open) {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 0122e882119ddbd9efa6edfeeac3f5c704a7aeea ]
Instrument calls to nfsd_open_verified() to get a sense of the filecache hit rate.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 5 +++-- fs/nfsd/trace.h | 28 ++++++++++++++++++++++++++++ 2 files changed, 31 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index db9c68a3c1f3b..bdb5de8c08036 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -996,10 +996,11 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
nf->nf_mark = nfsd_file_mark_find_or_create(nf); if (nf->nf_mark) { - if (open) + if (open) { status = nfsd_open_verified(rqstp, fhp, may_flags, &nf->nf_file); - else + trace_nfsd_file_open(nf, status); + } else status = nfs_ok; } else status = nfserr_jukebox; diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 8327d1601d710..c4c073e85fdd9 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -795,6 +795,34 @@ TRACE_EVENT(nfsd_file_acquire, __entry->nf_file, __entry->status) );
+TRACE_EVENT(nfsd_file_open, + TP_PROTO(struct nfsd_file *nf, __be32 status), + TP_ARGS(nf, status), + TP_STRUCT__entry( + __field(unsigned int, nf_hashval) + __field(void *, nf_inode) /* cannot be dereferenced */ + __field(int, nf_ref) + __field(unsigned long, nf_flags) + __field(unsigned long, nf_may) + __field(void *, nf_file) /* cannot be dereferenced */ + ), + TP_fast_assign( + __entry->nf_hashval = nf->nf_hashval; + __entry->nf_inode = nf->nf_inode; + __entry->nf_ref = refcount_read(&nf->nf_ref); + __entry->nf_flags = nf->nf_flags; + __entry->nf_may = nf->nf_may; + __entry->nf_file = nf->nf_file; + ), + TP_printk("hash=0x%x inode=%p ref=%d flags=%s may=%s file=%p", + __entry->nf_hashval, + __entry->nf_inode, + __entry->nf_ref, + show_nf_flags(__entry->nf_flags), + show_nfsd_may_flags(__entry->nf_may), + __entry->nf_file) +) + DECLARE_EVENT_CLASS(nfsd_file_search_class, TP_PROTO(struct inode *inode, unsigned int hash, int found), TP_ARGS(inode, hash, found),
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit bb283ca18d1e67c82d22a329c96c9d6036a74790 ]
The flags are defined using C macros, so TRACE_DEFINE_ENUM is unnecessary.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/trace.h | 6 ------ 1 file changed, 6 deletions(-)
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index c4c073e85fdd9..8ccce4ac66b4e 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -703,12 +703,6 @@ DEFINE_CLID_EVENT(confirmed_r); /* * from fs/nfsd/filecache.h */ -TRACE_DEFINE_ENUM(NFSD_FILE_HASHED); -TRACE_DEFINE_ENUM(NFSD_FILE_PENDING); -TRACE_DEFINE_ENUM(NFSD_FILE_BREAK_READ); -TRACE_DEFINE_ENUM(NFSD_FILE_BREAK_WRITE); -TRACE_DEFINE_ENUM(NFSD_FILE_REFERENCED); - #define show_nf_flags(val) \ __print_flags(val, "|", \ { 1 << NFSD_FILE_HASHED, "HASHED" }, \
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 28df0988815f63e2af5e6718193c9f68681ad7ff ]
I noticed CPU pipeline stalls while using perf.
Once an svc thread is scheduled and executing an RPC, no other processes will touch svc_rqst::rq_flags. Thus bus-locked atomics are not needed outside the svc thread scheduler.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 7 ++++--- fs/nfsd/nfs4xdr.c | 2 +- net/sunrpc/auth_gss/svcauth_gss.c | 4 ++-- net/sunrpc/svc.c | 6 +++--- net/sunrpc/svc_xprt.c | 2 +- net/sunrpc/svcsock.c | 8 ++++---- net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 +- 7 files changed, 16 insertions(+), 15 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index d62d962ed2f13..adbac1e77e9e2 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -970,7 +970,7 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, * the client wants us to do more in this compound: */ if (!nfsd4_last_compound_op(rqstp)) - clear_bit(RQ_SPLICE_OK, &rqstp->rq_flags); + __clear_bit(RQ_SPLICE_OK, &rqstp->rq_flags);
/* check stateid */ status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh, @@ -2642,11 +2642,12 @@ nfsd4_proc_compound(struct svc_rqst *rqstp) cstate->minorversion = args->minorversion; fh_init(current_fh, NFS4_FHSIZE); fh_init(save_fh, NFS4_FHSIZE); + /* * Don't use the deferral mechanism for NFSv4; compounds make it * too hard to avoid non-idempotency problems. */ - clear_bit(RQ_USEDEFERRAL, &rqstp->rq_flags); + __clear_bit(RQ_USEDEFERRAL, &rqstp->rq_flags);
/* * According to RFC3010, this takes precedence over all other errors. @@ -2761,7 +2762,7 @@ nfsd4_proc_compound(struct svc_rqst *rqstp) out: cstate->status = status; /* Reset deferral mechanism for RPC deferrals */ - set_bit(RQ_USEDEFERRAL, &rqstp->rq_flags); + __set_bit(RQ_USEDEFERRAL, &rqstp->rq_flags); return rpc_success; }
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index a9f6c7eeb756e..804c137fabec5 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2411,7 +2411,7 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp) argp->rqstp->rq_cachetype = cachethis ? RC_REPLBUFF : RC_NOCACHE;
if (readcount > 1 || max_reply > PAGE_SIZE - auth_slack) - clear_bit(RQ_SPLICE_OK, &argp->rqstp->rq_flags); + __clear_bit(RQ_SPLICE_OK, &argp->rqstp->rq_flags);
return true; } diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c index 329eac782cc5e..abaa952ae7518 100644 --- a/net/sunrpc/auth_gss/svcauth_gss.c +++ b/net/sunrpc/auth_gss/svcauth_gss.c @@ -898,7 +898,7 @@ unwrap_integ_data(struct svc_rqst *rqstp, struct xdr_buf *buf, u32 seq, struct g * rejecting the server-computed MIC in this somewhat rare case, * do not use splice with the GSS integrity service. */ - clear_bit(RQ_SPLICE_OK, &rqstp->rq_flags); + __clear_bit(RQ_SPLICE_OK, &rqstp->rq_flags);
/* Did we already verify the signature on the original pass through? */ if (rqstp->rq_deferred) @@ -970,7 +970,7 @@ unwrap_priv_data(struct svc_rqst *rqstp, struct xdr_buf *buf, u32 seq, struct gs int pad, remaining_len, offset; u32 rseqno;
- clear_bit(RQ_SPLICE_OK, &rqstp->rq_flags); + __clear_bit(RQ_SPLICE_OK, &rqstp->rq_flags);
priv_len = svc_getnl(&buf->head[0]); if (rqstp->rq_deferred) { diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 7f231947347ea..9caaed1c143e6 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -1276,10 +1276,10 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) goto err_short_len;
/* Will be turned off by GSS integrity and privacy services */ - set_bit(RQ_SPLICE_OK, &rqstp->rq_flags); + __set_bit(RQ_SPLICE_OK, &rqstp->rq_flags); /* Will be turned off only when NFSv4 Sessions are used */ - set_bit(RQ_USEDEFERRAL, &rqstp->rq_flags); - clear_bit(RQ_DROPME, &rqstp->rq_flags); + __set_bit(RQ_USEDEFERRAL, &rqstp->rq_flags); + __clear_bit(RQ_DROPME, &rqstp->rq_flags);
svc_putu32(resv, rqstp->rq_xid);
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c index 000b737784bd9..ea65036dc5068 100644 --- a/net/sunrpc/svc_xprt.c +++ b/net/sunrpc/svc_xprt.c @@ -1228,7 +1228,7 @@ static struct cache_deferred_req *svc_defer(struct cache_req *req) trace_svc_defer(rqstp); svc_xprt_get(rqstp->rq_xprt); dr->xprt = rqstp->rq_xprt; - set_bit(RQ_DROPME, &rqstp->rq_flags); + __set_bit(RQ_DROPME, &rqstp->rq_flags);
dr->handle.revisit = svc_revisit; return &dr->handle; diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c index 90f6231d6ed67..ff8dcebbdfc4e 100644 --- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -309,9 +309,9 @@ static void svc_sock_setbufsize(struct svc_sock *svsk, unsigned int nreqs) static void svc_sock_secure_port(struct svc_rqst *rqstp) { if (svc_port_is_privileged(svc_addr(rqstp))) - set_bit(RQ_SECURE, &rqstp->rq_flags); + __set_bit(RQ_SECURE, &rqstp->rq_flags); else - clear_bit(RQ_SECURE, &rqstp->rq_flags); + __clear_bit(RQ_SECURE, &rqstp->rq_flags); }
/* @@ -1014,9 +1014,9 @@ static int svc_tcp_recvfrom(struct svc_rqst *rqstp) rqstp->rq_xprt_ctxt = NULL; rqstp->rq_prot = IPPROTO_TCP; if (test_bit(XPT_LOCAL, &svsk->sk_xprt.xpt_flags)) - set_bit(RQ_LOCAL, &rqstp->rq_flags); + __set_bit(RQ_LOCAL, &rqstp->rq_flags); else - clear_bit(RQ_LOCAL, &rqstp->rq_flags); + __clear_bit(RQ_LOCAL, &rqstp->rq_flags);
p = (__be32 *)rqstp->rq_arg.head[0].iov_base; calldir = p[1]; diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c index c895f80df659c..e72e36989985f 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c @@ -606,7 +606,7 @@ static int svc_rdma_has_wspace(struct svc_xprt *xprt)
static void svc_rdma_secure_port(struct svc_rqst *rqstp) { - set_bit(RQ_SECURE, &rqstp->rq_flags); + __set_bit(RQ_SECURE, &rqstp->rq_flags); }
static void svc_rdma_kill_temp_xprt(struct svc_xprt *xprt)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhang Xiaoxu zhangxiaoxu5@huawei.com
[ Upstream commit 62fdb65edb6c43306c774939001f3a00974832aa ]
If laundry_wq create failed, the cld notifier should be unregistered.
Signed-off-by: Zhang Xiaoxu zhangxiaoxu5@huawei.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsctl.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 322a208878f2c..55949e60897d5 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -1543,12 +1543,14 @@ static int __init init_nfsd(void) goto out_free_filesystem; retval = register_cld_notifier(); if (retval) - goto out_free_all; + goto out_free_subsys; retval = nfsd4_create_laundry_wq(); if (retval) goto out_free_all; return 0; out_free_all: + unregister_cld_notifier(); +out_free_subsys: unregister_pernet_subsys(&nfsd_net_ops); out_free_filesystem: unregister_filesystem(&nfsd_fs_type);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhang Xiaoxu zhangxiaoxu5@huawei.com
[ Upstream commit 6f6f84aa215f7b6665ccbb937db50860f9ec2989 ]
KASAN report null-ptr-deref as follows:
BUG: KASAN: null-ptr-deref in nfsd_fill_super+0xc6/0xe0 [nfsd] Write of size 8 at addr 000000000000005d by task a.out/852
CPU: 7 PID: 852 Comm: a.out Not tainted 5.18.0-rc7-dirty #66 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 kasan_report+0xab/0x120 ? nfsd_mkdir+0x71/0x1c0 [nfsd] ? nfsd_fill_super+0xc6/0xe0 [nfsd] nfsd_fill_super+0xc6/0xe0 [nfsd] ? nfsd_mkdir+0x1c0/0x1c0 [nfsd] get_tree_keyed+0x8e/0x100 vfs_get_tree+0x41/0xf0 __do_sys_fsconfig+0x590/0x670 ? fscontext_read+0x180/0x180 ? anon_inode_getfd+0x4f/0x70 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae
This can be reproduce by concurrent operations: 1. fsopen(nfsd)/fsconfig 2. insmod/rmmod nfsd
Since the nfsd file system is registered before than nfsd_net allocated, the caller may get the file_system_type and use the nfsd_net before it allocated, then null-ptr-deref occurred.
So init_nfsd() should call register_filesystem() last.
Fixes: bd5ae9288d64 ("nfsd: register pernet ops last, unregister first") Signed-off-by: Zhang Xiaoxu zhangxiaoxu5@huawei.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsctl.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 55949e60897d5..0621c2faf2424 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -1535,25 +1535,25 @@ static int __init init_nfsd(void) retval = create_proc_exports_entry(); if (retval) goto out_free_lockd; - retval = register_filesystem(&nfsd_fs_type); - if (retval) - goto out_free_exports; retval = register_pernet_subsys(&nfsd_net_ops); if (retval < 0) - goto out_free_filesystem; + goto out_free_exports; retval = register_cld_notifier(); if (retval) goto out_free_subsys; retval = nfsd4_create_laundry_wq(); + if (retval) + goto out_free_cld; + retval = register_filesystem(&nfsd_fs_type); if (retval) goto out_free_all; return 0; out_free_all: + nfsd4_destroy_laundry_wq(); +out_free_cld: unregister_cld_notifier(); out_free_subsys: unregister_pernet_subsys(&nfsd_net_ops); -out_free_filesystem: - unregister_filesystem(&nfsd_fs_type); out_free_exports: remove_proc_entry("fs/nfs/exports", NULL); remove_proc_entry("fs/nfs", NULL); @@ -1571,6 +1571,7 @@ static int __init init_nfsd(void)
static void __exit exit_nfsd(void) { + unregister_filesystem(&nfsd_fs_type); nfsd4_destroy_laundry_wq(); unregister_cld_notifier(); unregister_pernet_subsys(&nfsd_net_ops); @@ -1581,7 +1582,6 @@ static void __exit exit_nfsd(void) nfsd_lockd_shutdown(); nfsd4_free_slabs(); nfsd4_exit_pnfs(); - unregister_filesystem(&nfsd_fs_type); }
MODULE_AUTHOR("Olaf Kirch okir@monad.swb.de");
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Julian Schroeder jumaco@amazon.com
[ Upstream commit fd5e363eac77ef81542db77ddad0559fa0f9204e ]
Upon nfsd shutdown any pending DRC cache is freed. DRC cache use is tracked via a percpu counter. In the current code the percpu counter is destroyed before. If any pending cache is still present, percpu_counter_add is called with a percpu counter==NULL. This causes a kernel crash. The solution is to destroy the percpu counter after the cache is freed.
Fixes: e567b98ce9a4b (“nfsd: protect concurrent access to nfsd stats counters”) Signed-off-by: Julian Schroeder jumaco@amazon.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfscache.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c index 0b3f12aa37ff5..7da88bdc0d6c3 100644 --- a/fs/nfsd/nfscache.c +++ b/fs/nfsd/nfscache.c @@ -206,7 +206,6 @@ void nfsd_reply_cache_shutdown(struct nfsd_net *nn) struct svc_cacherep *rp; unsigned int i;
- nfsd_reply_cache_stats_destroy(nn); unregister_shrinker(&nn->nfsd_reply_cache_shrinker);
for (i = 0; i < nn->drc_hashsize; i++) { @@ -217,6 +216,7 @@ void nfsd_reply_cache_shutdown(struct nfsd_net *nn) rp, nn); } } + nfsd_reply_cache_stats_destroy(nn);
kvfree(nn->drc_hashtbl); nn->drc_hashtbl = NULL;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit bd8fdb6e545f950f4654a9a10d7e819ad48146e5 ]
Refactor: Use existing helpers that other lock operations use. This change removes several automatic variables, so re-organize the variable declarations for readability.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 36 +++++++++++------------------------- 1 file changed, 11 insertions(+), 25 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index e0ce0412d7dea..1c32765e86b1f 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -7562,16 +7562,13 @@ nfsd4_release_lockowner(struct svc_rqst *rqstp, union nfsd4_op_u *u) { struct nfsd4_release_lockowner *rlockowner = &u->release_lockowner; + struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id); clientid_t *clid = &rlockowner->rl_clientid; - struct nfs4_stateowner *sop; - struct nfs4_lockowner *lo = NULL; struct nfs4_ol_stateid *stp; - struct xdr_netobj *owner = &rlockowner->rl_owner; - unsigned int hashval = ownerstr_hashval(owner); - __be32 status; - struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id); + struct nfs4_lockowner *lo; struct nfs4_client *clp; - LIST_HEAD (reaplist); + LIST_HEAD(reaplist); + __be32 status;
dprintk("nfsd4_release_lockowner clientid: (%08x/%08x):\n", clid->cl_boot, clid->cl_id); @@ -7579,30 +7576,19 @@ nfsd4_release_lockowner(struct svc_rqst *rqstp, status = set_client(clid, cstate, nn); if (status) return status; - clp = cstate->clp; - /* Find the matching lock stateowner */ - spin_lock(&clp->cl_lock); - list_for_each_entry(sop, &clp->cl_ownerstr_hashtbl[hashval], - so_strhash) {
- if (sop->so_is_open_owner || !same_owner_str(sop, owner)) - continue; - - if (atomic_read(&sop->so_count) != 1) { - spin_unlock(&clp->cl_lock); - return nfserr_locks_held; - } - - lo = lockowner(sop); - nfs4_get_stateowner(sop); - break; - } + spin_lock(&clp->cl_lock); + lo = find_lockowner_str_locked(clp, &rlockowner->rl_owner); if (!lo) { spin_unlock(&clp->cl_lock); return status; } - + if (atomic_read(&lo->lo_owner.so_count) != 2) { + spin_unlock(&clp->cl_lock); + nfs4_put_stateowner(&lo->lo_owner); + return nfserr_locks_held; + } unhash_lockowner_locked(lo); while (!list_empty(&lo->lo_owner.so_stateids)) { stp = list_first_entry(&lo->lo_owner.so_stateids,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 043862b09cc00273e35e6c3a6389957953a34207 ]
And return explicit nfserr values that match what is documented in the new comment / API contract.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 23 ++++++++++++++++++++--- 1 file changed, 20 insertions(+), 3 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 1c32765e86b1f..76ec207f5e44d 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -7556,6 +7556,23 @@ check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner) return status; }
+/** + * nfsd4_release_lockowner - process NFSv4.0 RELEASE_LOCKOWNER operations + * @rqstp: RPC transaction + * @cstate: NFSv4 COMPOUND state + * @u: RELEASE_LOCKOWNER arguments + * + * The lockowner's so_count is bumped when a lock record is added + * or when copying a conflicting lock. The latter case is brief, + * but can lead to fleeting false positives when looking for + * locks-in-use. + * + * Return values: + * %nfs_ok: lockowner released or not found + * %nfserr_locks_held: lockowner still in use + * %nfserr_stale_clientid: clientid no longer active + * %nfserr_expired: clientid not recognized + */ __be32 nfsd4_release_lockowner(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, @@ -7582,7 +7599,7 @@ nfsd4_release_lockowner(struct svc_rqst *rqstp, lo = find_lockowner_str_locked(clp, &rlockowner->rl_owner); if (!lo) { spin_unlock(&clp->cl_lock); - return status; + return nfs_ok; } if (atomic_read(&lo->lo_owner.so_count) != 2) { spin_unlock(&clp->cl_lock); @@ -7598,11 +7615,11 @@ nfsd4_release_lockowner(struct svc_rqst *rqstp, put_ol_stateid_locked(stp, &reaplist); } spin_unlock(&clp->cl_lock); + free_ol_stateid_reaplist(&reaplist); remove_blocked_locks(lo); nfs4_put_stateowner(&lo->lo_owner); - - return status; + return nfs_ok; }
static inline struct nfs4_client_reclaim *
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 08af54b3e5729bc1d56ad3190af811301bdc37a1 ]
Now that there are no more callers of nfsd_file_put() that might hold a spin lock, ensure the lockdep infrastructure can catch newly introduced calls to nfsd_file_put() made while a spinlock is held.
Link: https://lore.kernel.org/linux-nfs/ece7fd1d-5fb3-5155-54ba-347cfc19bd9a@oracl... Signed-off-by: Chuck Lever chuck.lever@oracle.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index bdb5de8c08036..11c096b447401 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -302,6 +302,8 @@ nfsd_file_put_noref(struct nfsd_file *nf) void nfsd_file_put(struct nfsd_file *nf) { + might_sleep(); + set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags); if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) { nfsd_file_flush(nf);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit b6c71c66b0ad8f2b59d9bc08c7a5079b110bec01 ]
nfsd_file_put_noref() can free @nf, so don't dereference @nf immediately upon return from nfsd_file_put_noref().
Suggested-by: Trond Myklebust trondmy@hammerspace.com Fixes: 999397926ab3 ("nfsd: Clean up nfsd_file_put()") Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 11c096b447401..fc0fcb3321537 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -308,11 +308,12 @@ nfsd_file_put(struct nfsd_file *nf) if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) { nfsd_file_flush(nf); nfsd_file_put_noref(nf); - } else { + } else if (nf->nf_file) { nfsd_file_put_noref(nf); - if (nf->nf_file) - nfsd_file_schedule_laundrette(); - } + nfsd_file_schedule_laundrette(); + } else + nfsd_file_put_noref(nf); + if (atomic_long_read(&nfsd_filecache_count) >= NFSD_FILE_LRU_LIMIT) nfsd_file_gc(); }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 62ed448cc53b654036f7d7f3c99f299d79ad14c3 ]
Transitioning between encode buffers is quite infrequent. It happens about 1 time in 400 calls to xdr_reserve_space(), measured on NFSD with a typical build/test workload.
Force the compiler to remove that code from xdr_reserve_space(), which is a hot path on both the server and the client. This change reduces the size of xdr_reserve_space() from 10 cache lines to 2 when compiled with -Os.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Reviewed-by: J. Bruce Fields bfields@fieldses.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/sunrpc/xdr.h | 16 +++++++++++++++- net/sunrpc/xdr.c | 17 ++++++++++------- 2 files changed, 25 insertions(+), 8 deletions(-)
diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h index 927f1458bcab9..b8fb866e1fd32 100644 --- a/include/linux/sunrpc/xdr.h +++ b/include/linux/sunrpc/xdr.h @@ -242,7 +242,7 @@ extern void xdr_init_encode(struct xdr_stream *xdr, struct xdr_buf *buf, extern __be32 *xdr_reserve_space(struct xdr_stream *xdr, size_t nbytes); extern int xdr_reserve_space_vec(struct xdr_stream *xdr, struct kvec *vec, size_t nbytes); -extern void xdr_commit_encode(struct xdr_stream *xdr); +extern void __xdr_commit_encode(struct xdr_stream *xdr); extern void xdr_truncate_encode(struct xdr_stream *xdr, size_t len); extern int xdr_restrict_buflen(struct xdr_stream *xdr, int newbuflen); extern void xdr_write_pages(struct xdr_stream *xdr, struct page **pages, @@ -305,6 +305,20 @@ xdr_reset_scratch_buffer(struct xdr_stream *xdr) xdr_set_scratch_buffer(xdr, NULL, 0); }
+/** + * xdr_commit_encode - Ensure all data is written to xdr->buf + * @xdr: pointer to xdr_stream + * + * Handle encoding across page boundaries by giving the caller a + * temporary location to write to, then later copying the data into + * place. __xdr_commit_encode() does that copying. + */ +static inline void xdr_commit_encode(struct xdr_stream *xdr) +{ + if (unlikely(xdr->scratch.iov_len)) + __xdr_commit_encode(xdr); +} + /** * xdr_stream_remaining - Return the number of bytes remaining in the stream * @xdr: pointer to struct xdr_stream diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c index 722586696fade..f66e2de7cd279 100644 --- a/net/sunrpc/xdr.c +++ b/net/sunrpc/xdr.c @@ -691,7 +691,7 @@ void xdr_init_encode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p, EXPORT_SYMBOL_GPL(xdr_init_encode);
/** - * xdr_commit_encode - Ensure all data is written to buffer + * __xdr_commit_encode - Ensure all data is written to buffer * @xdr: pointer to xdr_stream * * We handle encoding across page boundaries by giving the caller a @@ -703,22 +703,25 @@ EXPORT_SYMBOL_GPL(xdr_init_encode); * required at the end of encoding, or any other time when the xdr_buf * data might be read. */ -inline void xdr_commit_encode(struct xdr_stream *xdr) +void __xdr_commit_encode(struct xdr_stream *xdr) { int shift = xdr->scratch.iov_len; void *page;
- if (shift == 0) - return; page = page_address(*xdr->page_ptr); memcpy(xdr->scratch.iov_base, page, shift); memmove(page, page + shift, (void *)xdr->p - page); xdr_reset_scratch_buffer(xdr); } -EXPORT_SYMBOL_GPL(xdr_commit_encode); +EXPORT_SYMBOL_GPL(__xdr_commit_encode);
-static __be32 *xdr_get_next_encode_buffer(struct xdr_stream *xdr, - size_t nbytes) +/* + * The buffer space to be reserved crosses the boundary between + * xdr->buf->head and xdr->buf->pages, or between two pages + * in xdr->buf->pages. + */ +static noinline __be32 *xdr_get_next_encode_buffer(struct xdr_stream *xdr, + size_t nbytes) { __be32 *p; int space_left;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit 8698e3bab4dd7968666e84e111d0bfd17c040e77 ]
Commit ceaf69f8eadc ("fanotify: do not allow setting dirent events in mask of non-dir") added restrictions about setting dirent events in the mask of a non-dir inode mark, which does not make any sense.
For backward compatibility, these restictions were added only to new (v5.17+) APIs.
It also does not make any sense to set the flags FAN_EVENT_ON_CHILD or FAN_ONDIR in the mask of a non-dir inode. Add these flags to the dir-only restriction of the new APIs as well.
Move the check of the dir-only flags for new APIs into the helper fanotify_events_supported(), which is only called for FAN_MARK_ADD, because there is no need to error on an attempt to remove the dir-only flags from non-dir inode.
Fixes: ceaf69f8eadc ("fanotify: do not allow setting dirent events in mask of non-dir") Link: https://lore.kernel.org/linux-fsdevel/20220627113224.kr2725conevh53u4@quack3... Link: https://lore.kernel.org/r/20220627174719.2838175-1-amir73il@gmail.com Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz [ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify_user.c | 30 +++++++++++++++--------------- include/linux/fanotify.h | 4 ++++ 2 files changed, 19 insertions(+), 15 deletions(-)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index f4a5e9074dd42..990464e00aec7 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1501,10 +1501,14 @@ static int fanotify_test_fid(struct dentry *dentry) return 0; }
-static int fanotify_events_supported(struct path *path, __u64 mask, +static int fanotify_events_supported(struct fsnotify_group *group, + struct path *path, __u64 mask, unsigned int flags) { unsigned int mark_type = flags & FANOTIFY_MARK_TYPE_BITS; + /* Strict validation of events in non-dir inode mask with v5.17+ APIs */ + bool strict_dir_events = FAN_GROUP_FLAG(group, FAN_REPORT_TARGET_FID) || + (mask & FAN_RENAME);
/* * Some filesystems such as 'proc' acquire unusual locks when opening @@ -1532,6 +1536,15 @@ static int fanotify_events_supported(struct path *path, __u64 mask, path->mnt->mnt_sb->s_flags & SB_NOUSER) return -EINVAL;
+ /* + * We shouldn't have allowed setting dirent events and the directory + * flags FAN_ONDIR and FAN_EVENT_ON_CHILD in mask of non-dir inode, + * but because we always allowed it, error only when using new APIs. + */ + if (strict_dir_events && mark_type == FAN_MARK_INODE && + !d_is_dir(path->dentry) && (mask & FANOTIFY_DIRONLY_EVENT_BITS)) + return -ENOTDIR; + return 0; }
@@ -1678,7 +1691,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, goto fput_and_out;
if (flags & FAN_MARK_ADD) { - ret = fanotify_events_supported(&path, mask, flags); + ret = fanotify_events_supported(group, &path, mask, flags); if (ret) goto path_put_and_out; } @@ -1701,19 +1714,6 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, else mnt = path.mnt;
- /* - * FAN_RENAME is not allowed on non-dir (for now). - * We shouldn't have allowed setting any dirent events in mask of - * non-dir, but because we always allowed it, error only if group - * was initialized with the new flag FAN_REPORT_TARGET_FID. - */ - ret = -ENOTDIR; - if (inode && !S_ISDIR(inode->i_mode) && - ((mask & FAN_RENAME) || - ((mask & FANOTIFY_DIRENT_EVENTS) && - FAN_GROUP_FLAG(group, FAN_REPORT_TARGET_FID)))) - goto path_put_and_out; - /* Mask out FAN_EVENT_ON_CHILD flag for sb/mount/non-dir marks */ if (mnt || !S_ISDIR(inode->i_mode)) { mask &= ~FAN_EVENT_ON_CHILD; diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h index 81f45061c1b18..4f6cbe6c6e235 100644 --- a/include/linux/fanotify.h +++ b/include/linux/fanotify.h @@ -113,6 +113,10 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */ FANOTIFY_PERM_EVENTS | \ FAN_Q_OVERFLOW | FAN_ONDIR)
+/* Events and flags relevant only for directories */ +#define FANOTIFY_DIRONLY_EVENT_BITS (FANOTIFY_DIRENT_EVENTS | \ + FAN_EVENT_ON_CHILD | FAN_ONDIR) + #define ALL_FANOTIFY_EVENT_BITS (FANOTIFY_OUTGOING_EVENTS | \ FANOTIFY_EVENT_FLAGS)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit 080abad71e99d2becf38c978572982130b927a28 ]
Commit f49169c97fce ("NFSD: Remove svc_serv_ops::svo_module") removed calls to module_put_and_kthread_exit() from threads that acted as SUNRPC servers and had a related svc_serv_ops structure. This was correct.
It ALSO removed the module_put_and_kthread_exit() call from nfs4_run_state_manager() which is NOT a SUNRPC service.
Consequently every time the NFSv4 state manager runs the module count increments and won't be decremented. So the nfsv4 module cannot be unloaded.
So restore the module_put_and_kthread_exit() call.
Fixes: f49169c97fce ("NFSD: Remove svc_serv_ops::svo_module") Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/nfs4state.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c index ae2da65ffbafb..d8fc5d72a161c 100644 --- a/fs/nfs/nfs4state.c +++ b/fs/nfs/nfs4state.c @@ -2757,6 +2757,7 @@ static int nfs4_run_state_manager(void *ptr) goto again;
nfs_put_client(clp); + module_put_and_kthread_exit(0); return 0; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 5b2f3e0777da2a5dd62824bbe2fdab1d12caaf8f ]
NFSD has advertised support for the NFSv4 time_create attribute since commit e377a3e698fb ("nfsd: Add support for the birth time attribute").
Igor Mammedov reports that Mac OS clients attempt to set the NFSv4 birth time attribute via OPEN(CREATE) and SETATTR if the server indicates that it supports it, but since the above commit was merged, those attempts now fail.
Table 5 in RFC 8881 lists the time_create attribute as one that can be both set and retrieved, but the above commit did not add server support for clients to provide a time_create attribute. IMO that's a bug in our implementation of the NFSv4 protocol, which this commit addresses.
Whether NFSD silently ignores the new birth time or actually sets it is another matter. I haven't found another filesystem service in the Linux kernel that enables users or clients to modify a file's birth time attribute.
This commit reflects my (perhaps incorrect) understanding of whether Linux users can set a file's birth time. NFSD will now recognize a time_create attribute but it ignores its value. It clears the time_create bit in the returned attribute bitmask to indicate that the value was not used.
Reported-by: Igor Mammedov imammedo@redhat.com Fixes: e377a3e698fb ("nfsd: Add support for the birth time attribute") Tested-by: Igor Mammedov imammedo@redhat.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 9 +++++++++ fs/nfsd/nfsd.h | 3 ++- 2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 804c137fabec5..b98a24c2a753c 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -470,6 +470,15 @@ nfsd4_decode_fattr4(struct nfsd4_compoundargs *argp, u32 *bmval, u32 bmlen, return nfserr_bad_xdr; } } + if (bmval[1] & FATTR4_WORD1_TIME_CREATE) { + struct timespec64 ts; + + /* No Linux filesystem supports setting this attribute. */ + bmval[1] &= ~FATTR4_WORD1_TIME_CREATE; + status = nfsd4_decode_nfstime4(argp, &ts); + if (status) + return status; + } if (bmval[1] & FATTR4_WORD1_TIME_MODIFY_SET) { u32 set_it;
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 847b482155ae9..9a8b09afc1733 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -465,7 +465,8 @@ static inline bool nfsd_attrs_supported(u32 minorversion, const u32 *bmval) (FATTR4_WORD0_SIZE | FATTR4_WORD0_ACL) #define NFSD_WRITEABLE_ATTRS_WORD1 \ (FATTR4_WORD1_MODE | FATTR4_WORD1_OWNER | FATTR4_WORD1_OWNER_GROUP \ - | FATTR4_WORD1_TIME_ACCESS_SET | FATTR4_WORD1_TIME_MODIFY_SET) + | FATTR4_WORD1_TIME_ACCESS_SET | FATTR4_WORD1_TIME_CREATE \ + | FATTR4_WORD1_TIME_MODIFY_SET) #ifdef CONFIG_NFSD_V4_SECURITY_LABEL #define MAYBE_FATTR4_WORD2_SECURITY_LABEL \ FATTR4_WORD2_SECURITY_LABEL
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit aec158242b87a43d83322e99bc71ab4428e5ab79 ]
Unlocking a POSIX lock on an inode with vfs_lock_file only works if the owner matches. Ensure we set it in the request.
Cc: J. Bruce Fields bfields@fieldses.org Fixes: 7f024fcd5c97 ("Keep read and write fds with each nlm_file") Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svcsubs.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c index 0a22a2faf5522..b2f277727469c 100644 --- a/fs/lockd/svcsubs.c +++ b/fs/lockd/svcsubs.c @@ -176,7 +176,7 @@ nlm_delete_file(struct nlm_file *file) } }
-static int nlm_unlock_files(struct nlm_file *file) +static int nlm_unlock_files(struct nlm_file *file, fl_owner_t owner) { struct file_lock lock;
@@ -184,6 +184,7 @@ static int nlm_unlock_files(struct nlm_file *file) lock.fl_type = F_UNLCK; lock.fl_start = 0; lock.fl_end = OFFSET_MAX; + lock.fl_owner = owner; if (file->f_file[O_RDONLY] && vfs_lock_file(file->f_file[O_RDONLY], F_SETLK, &lock, NULL)) goto out_err; @@ -225,7 +226,7 @@ nlm_traverse_locks(struct nlm_host *host, struct nlm_file *file, if (match(lockhost, host)) {
spin_unlock(&flctx->flc_lock); - if (nlm_unlock_files(file)) + if (nlm_unlock_files(file, fl->fl_owner)) return 1; goto again; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 1197eb5906a5464dbaea24cac296dfc38499cc00 ]
This loop condition tries a bit too hard to be clever. Just test for the two indices we care about explicitly.
Cc: J. Bruce Fields bfields@fieldses.org Fixes: 7f024fcd5c97 ("Keep read and write fds with each nlm_file") Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svcsubs.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c index b2f277727469c..e1c4617de7714 100644 --- a/fs/lockd/svcsubs.c +++ b/fs/lockd/svcsubs.c @@ -283,11 +283,10 @@ nlm_file_inuse(struct nlm_file *file)
static void nlm_close_files(struct nlm_file *file) { - struct file *f; - - for (f = file->f_file[0]; f <= file->f_file[1]; f++) - if (f) - nlmsvc_ops->fclose(f); + if (file->f_file[O_RDONLY]) + nlmsvc_ops->fclose(file->f_file[O_RDONLY]); + if (file->f_file[O_WRONLY]) + nlmsvc_ops->fclose(file->f_file[O_WRONLY]); }
/*
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Oliver Ford ojford@gmail.com
[ Upstream commit c05787b4c2f80a3bebcb9cdbf255d4fa5c1e24e1 ]
Correct spelling in comment.
Signed-off-by: Oliver Ford ojford@gmail.com Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20220518145959.41-1-ojford@gmail.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/inotify/inotify_user.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c index 3d5d536f8fd63..7360d16ce46d7 100644 --- a/fs/notify/inotify/inotify_user.c +++ b/fs/notify/inotify/inotify_user.c @@ -123,7 +123,7 @@ static inline u32 inotify_mask_to_arg(__u32 mask) IN_Q_OVERFLOW); }
-/* intofiy userspace file descriptor functions */ +/* inotify userspace file descriptor functions */ static __poll_t inotify_poll(struct file *file, poll_table *wait) { struct fsnotify_group *group = file->private_data;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit 31a371e419c885e0f137ce70395356ba8639dc52 ]
Setting flags FAN_ONDIR FAN_EVENT_ON_CHILD in ignore mask has no effect. The FAN_EVENT_ON_CHILD flag in mask implicitly applies to ignore mask and ignore mask is always implicitly applied to events on directories.
Define a mark flag that replaces this legacy behavior with logic of applying the ignore mask according to event flags in ignore mask.
Implement the new logic to prepare for supporting an ignore mask that ignores events on children and ignore mask that does not ignore events on directories.
To emphasize the change in terminology, also rename ignored_mask mark member to ignore_mask and use accessors to get only the effective ignored events or the ignored events and flags.
This change in terminology finally aligns with the "ignore mask" language in man pages and in most of the comments.
Link: https://lore.kernel.org/r/20220629144210.2983229-2-amir73il@gmail.com Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.c | 19 ++++--- fs/notify/fanotify/fanotify_user.c | 21 ++++--- fs/notify/fdinfo.c | 6 +- fs/notify/fsnotify.c | 21 ++++--- include/linux/fsnotify_backend.h | 89 ++++++++++++++++++++++++++++-- 5 files changed, 121 insertions(+), 35 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 4f897e1095470..cd7d09a569fff 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -295,12 +295,13 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group, const void *data, int data_type, struct inode *dir) { - __u32 marks_mask = 0, marks_ignored_mask = 0; + __u32 marks_mask = 0, marks_ignore_mask = 0; __u32 test_mask, user_mask = FANOTIFY_OUTGOING_EVENTS | FANOTIFY_EVENT_FLAGS; const struct path *path = fsnotify_data_path(data, data_type); unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS); struct fsnotify_mark *mark; + bool ondir = event_mask & FAN_ONDIR; int type;
pr_debug("%s: report_mask=%x mask=%x data=%p data_type=%d\n", @@ -315,19 +316,21 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group, return 0; } else if (!(fid_mode & FAN_REPORT_FID)) { /* Do we have a directory inode to report? */ - if (!dir && !(event_mask & FS_ISDIR)) + if (!dir && !ondir) return 0; }
fsnotify_foreach_iter_mark_type(iter_info, mark, type) { - /* Apply ignore mask regardless of mark's ISDIR flag */ - marks_ignored_mask |= mark->ignored_mask; + /* + * Apply ignore mask depending on event flags in ignore mask. + */ + marks_ignore_mask |= + fsnotify_effective_ignore_mask(mark, ondir, type);
/* - * If the event is on dir and this mark doesn't care about - * events on dir, don't send it! + * Send the event depending on event flags in mark mask. */ - if (event_mask & FS_ISDIR && !(mark->mask & FS_ISDIR)) + if (!fsnotify_mask_applicable(mark->mask, ondir, type)) continue;
marks_mask |= mark->mask; @@ -336,7 +339,7 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group, *match_mask |= 1U << type; }
- test_mask = event_mask & marks_mask & ~marks_ignored_mask; + test_mask = event_mask & marks_mask & ~marks_ignore_mask;
/* * For dirent modification events (create/delete/move) that do not carry diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 990464e00aec7..0d61cb0e49075 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1000,7 +1000,7 @@ static __u32 fanotify_mark_remove_from_mask(struct fsnotify_mark *fsn_mark, if (!(flags & FAN_MARK_IGNORED_MASK)) { fsn_mark->mask &= ~mask; } else { - fsn_mark->ignored_mask &= ~mask; + fsn_mark->ignore_mask &= ~mask; } newmask = fsnotify_calc_mask(fsn_mark); /* @@ -1009,7 +1009,7 @@ static __u32 fanotify_mark_remove_from_mask(struct fsnotify_mark *fsn_mark, * changes to the mask. * Destroy mark when only umask bits remain. */ - *destroy = !((fsn_mark->mask | fsn_mark->ignored_mask) & ~umask); + *destroy = !((fsn_mark->mask | fsn_mark->ignore_mask) & ~umask); spin_unlock(&fsn_mark->lock);
return oldmask & ~newmask; @@ -1078,7 +1078,7 @@ static bool fanotify_mark_update_flags(struct fsnotify_mark *fsn_mark, /* * Setting FAN_MARK_IGNORED_SURV_MODIFY for the first time may lead to * the removal of the FS_MODIFY bit in calculated mask if it was set - * because of an ignored mask that is now going to survive FS_MODIFY. + * because of an ignore mask that is now going to survive FS_MODIFY. */ if ((fan_flags & FAN_MARK_IGNORED_MASK) && (fan_flags & FAN_MARK_IGNORED_SURV_MODIFY) && @@ -1111,7 +1111,7 @@ static bool fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark, if (!(fan_flags & FAN_MARK_IGNORED_MASK)) fsn_mark->mask |= mask; else - fsn_mark->ignored_mask |= mask; + fsn_mark->ignore_mask |= mask;
recalc = fsnotify_calc_mask(fsn_mark) & ~fsnotify_conn_mask(fsn_mark->connector); @@ -1249,7 +1249,7 @@ static int fanotify_add_inode_mark(struct fsnotify_group *group,
/* * If some other task has this inode open for write we should not add - * an ignored mark, unless that ignored mark is supposed to survive + * an ignore mask, unless that ignore mask is supposed to survive * modification changes anyway. */ if ((flags & FAN_MARK_IGNORED_MASK) && @@ -1559,7 +1559,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, __kernel_fsid_t __fsid, *fsid = NULL; u32 valid_mask = FANOTIFY_EVENTS | FANOTIFY_EVENT_FLAGS; unsigned int mark_type = flags & FANOTIFY_MARK_TYPE_BITS; - bool ignored = flags & FAN_MARK_IGNORED_MASK; + bool ignore = flags & FAN_MARK_IGNORED_MASK; unsigned int obj_type, fid_mode; u32 umask = 0; int ret; @@ -1608,8 +1608,11 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, if (mask & ~valid_mask) return -EINVAL;
- /* Event flags (ONDIR, ON_CHILD) are meaningless in ignored mask */ - if (ignored) + /* + * Event flags (FAN_ONDIR, FAN_EVENT_ON_CHILD) have no effect with + * FAN_MARK_IGNORED_MASK. + */ + if (ignore) mask &= ~FANOTIFY_EVENT_FLAGS;
f = fdget(fanotify_fd); @@ -1723,7 +1726,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, * events with parent/name info for non-directory. */ if ((fid_mode & FAN_REPORT_DIR_FID) && - (flags & FAN_MARK_ADD) && !ignored) + (flags & FAN_MARK_ADD) && !ignore) mask |= FAN_EVENT_ON_CHILD; }
diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c index 59fb40abe33d3..55081ae3a6ec0 100644 --- a/fs/notify/fdinfo.c +++ b/fs/notify/fdinfo.c @@ -113,7 +113,7 @@ static void fanotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark) return; seq_printf(m, "fanotify ino:%lx sdev:%x mflags:%x mask:%x ignored_mask:%x ", inode->i_ino, inode->i_sb->s_dev, - mflags, mark->mask, mark->ignored_mask); + mflags, mark->mask, mark->ignore_mask); show_mark_fhandle(m, inode); seq_putc(m, '\n'); iput(inode); @@ -121,12 +121,12 @@ static void fanotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark) struct mount *mnt = fsnotify_conn_mount(mark->connector);
seq_printf(m, "fanotify mnt_id:%x mflags:%x mask:%x ignored_mask:%x\n", - mnt->mnt_id, mflags, mark->mask, mark->ignored_mask); + mnt->mnt_id, mflags, mark->mask, mark->ignore_mask); } else if (mark->connector->type == FSNOTIFY_OBJ_TYPE_SB) { struct super_block *sb = fsnotify_conn_sb(mark->connector);
seq_printf(m, "fanotify sdev:%x mflags:%x mask:%x ignored_mask:%x\n", - sb->s_dev, mflags, mark->mask, mark->ignored_mask); + sb->s_dev, mflags, mark->mask, mark->ignore_mask); } }
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index 0b3e74935cb4f..8687562df2e37 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -324,7 +324,8 @@ static int send_to_group(__u32 mask, const void *data, int data_type, struct fsnotify_group *group = NULL; __u32 test_mask = (mask & ALL_FSNOTIFY_EVENTS); __u32 marks_mask = 0; - __u32 marks_ignored_mask = 0; + __u32 marks_ignore_mask = 0; + bool is_dir = mask & FS_ISDIR; struct fsnotify_mark *mark; int type;
@@ -336,7 +337,7 @@ static int send_to_group(__u32 mask, const void *data, int data_type, fsnotify_foreach_iter_mark_type(iter_info, mark, type) { if (!(mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY)) - mark->ignored_mask = 0; + mark->ignore_mask = 0; } }
@@ -344,14 +345,15 @@ static int send_to_group(__u32 mask, const void *data, int data_type, fsnotify_foreach_iter_mark_type(iter_info, mark, type) { group = mark->group; marks_mask |= mark->mask; - marks_ignored_mask |= mark->ignored_mask; + marks_ignore_mask |= + fsnotify_effective_ignore_mask(mark, is_dir, type); }
- pr_debug("%s: group=%p mask=%x marks_mask=%x marks_ignored_mask=%x data=%p data_type=%d dir=%p cookie=%d\n", - __func__, group, mask, marks_mask, marks_ignored_mask, + pr_debug("%s: group=%p mask=%x marks_mask=%x marks_ignore_mask=%x data=%p data_type=%d dir=%p cookie=%d\n", + __func__, group, mask, marks_mask, marks_ignore_mask, data, data_type, dir, cookie);
- if (!(test_mask & marks_mask & ~marks_ignored_mask)) + if (!(test_mask & marks_mask & ~marks_ignore_mask)) return 0;
if (group->ops->handle_event) { @@ -423,7 +425,8 @@ static bool fsnotify_iter_select_report_types( * But is *this mark* watching children? */ if (type == FSNOTIFY_ITER_TYPE_PARENT && - !(mark->mask & FS_EVENT_ON_CHILD)) + !(mark->mask & FS_EVENT_ON_CHILD) && + !(fsnotify_ignore_mask(mark) & FS_EVENT_ON_CHILD)) continue;
fsnotify_iter_set_report_type(iter_info, type); @@ -532,8 +535,8 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir,
/* - * If this is a modify event we may need to clear some ignored masks. - * In that case, the object with ignored masks will have the FS_MODIFY + * If this is a modify event we may need to clear some ignore masks. + * In that case, the object with ignore masks will have the FS_MODIFY * event in its mask. * Otherwise, return if none of the marks care about this type of event. */ diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 9560734759fa6..d7d96c806bff2 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -518,8 +518,8 @@ struct fsnotify_mark { struct hlist_node obj_list; /* Head of list of marks for an object [mark ref] */ struct fsnotify_mark_connector *connector; - /* Events types to ignore [mark->lock, group->mark_mutex] */ - __u32 ignored_mask; + /* Events types and flags to ignore [mark->lock, group->mark_mutex] */ + __u32 ignore_mask; /* General fsnotify mark flags */ #define FSNOTIFY_MARK_FLAG_ALIVE 0x0001 #define FSNOTIFY_MARK_FLAG_ATTACHED 0x0002 @@ -529,6 +529,7 @@ struct fsnotify_mark { /* fanotify mark flags */ #define FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY 0x0100 #define FSNOTIFY_MARK_FLAG_NO_IREF 0x0200 +#define FSNOTIFY_MARK_FLAG_HAS_IGNORE_FLAGS 0x0400 unsigned int flags; /* flags [mark->lock] */ };
@@ -655,15 +656,91 @@ extern void fsnotify_remove_queued_event(struct fsnotify_group *group,
/* functions used to manipulate the marks attached to inodes */
-/* Get mask for calculating object interest taking ignored mask into account */ +/* + * Canonical "ignore mask" including event flags. + * + * Note the subtle semantic difference from the legacy ->ignored_mask. + * ->ignored_mask traditionally only meant which events should be ignored, + * while ->ignore_mask also includes flags regarding the type of objects on + * which events should be ignored. + */ +static inline __u32 fsnotify_ignore_mask(struct fsnotify_mark *mark) +{ + __u32 ignore_mask = mark->ignore_mask; + + /* The event flags in ignore mask take effect */ + if (mark->flags & FSNOTIFY_MARK_FLAG_HAS_IGNORE_FLAGS) + return ignore_mask; + + /* + * Legacy behavior: + * - Always ignore events on dir + * - Ignore events on child if parent is watching children + */ + ignore_mask |= FS_ISDIR; + ignore_mask &= ~FS_EVENT_ON_CHILD; + ignore_mask |= mark->mask & FS_EVENT_ON_CHILD; + + return ignore_mask; +} + +/* Legacy ignored_mask - only event types to ignore */ +static inline __u32 fsnotify_ignored_events(struct fsnotify_mark *mark) +{ + return mark->ignore_mask & ALL_FSNOTIFY_EVENTS; +} + +/* + * Check if mask (or ignore mask) should be applied depending if victim is a + * directory and whether it is reported to a watching parent. + */ +static inline bool fsnotify_mask_applicable(__u32 mask, bool is_dir, + int iter_type) +{ + /* Should mask be applied to a directory? */ + if (is_dir && !(mask & FS_ISDIR)) + return false; + + /* Should mask be applied to a child? */ + if (iter_type == FSNOTIFY_ITER_TYPE_PARENT && + !(mask & FS_EVENT_ON_CHILD)) + return false; + + return true; +} + +/* + * Effective ignore mask taking into account if event victim is a + * directory and whether it is reported to a watching parent. + */ +static inline __u32 fsnotify_effective_ignore_mask(struct fsnotify_mark *mark, + bool is_dir, int iter_type) +{ + __u32 ignore_mask = fsnotify_ignored_events(mark); + + if (!ignore_mask) + return 0; + + /* For non-dir and non-child, no need to consult the event flags */ + if (!is_dir && iter_type != FSNOTIFY_ITER_TYPE_PARENT) + return ignore_mask; + + ignore_mask = fsnotify_ignore_mask(mark); + if (!fsnotify_mask_applicable(ignore_mask, is_dir, iter_type)) + return 0; + + return ignore_mask & ALL_FSNOTIFY_EVENTS; +} + +/* Get mask for calculating object interest taking ignore mask into account */ static inline __u32 fsnotify_calc_mask(struct fsnotify_mark *mark) { __u32 mask = mark->mask;
- if (!mark->ignored_mask) + if (!fsnotify_ignored_events(mark)) return mask;
- /* Interest in FS_MODIFY may be needed for clearing ignored mask */ + /* Interest in FS_MODIFY may be needed for clearing ignore mask */ if (!(mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY)) mask |= FS_MODIFY;
@@ -671,7 +748,7 @@ static inline __u32 fsnotify_calc_mask(struct fsnotify_mark *mark) * If mark is interested in ignoring events on children, the object must * show interest in those events for fsnotify_parent() to notice it. */ - return mask | (mark->ignored_mask & ALL_FSNOTIFY_EVENTS); + return mask | mark->ignore_mask; }
/* Get mask of events for a list of marks */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit 8afd7215aa97f8868d033f6e1d01a276ab2d29c0 ]
Create helper fanotify_may_update_existing_mark() for checking for conflicts between existing mark flags and fanotify_mark() flags.
Use variable mark_cmd to make the checks for mark command bits cleaner.
Link: https://lore.kernel.org/r/20220629144210.2983229-3-amir73il@gmail.com Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify_user.c | 30 +++++++++++++++++++++--------- include/linux/fanotify.h | 9 +++++---- 2 files changed, 26 insertions(+), 13 deletions(-)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 0d61cb0e49075..715e41b344129 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1175,6 +1175,19 @@ static int fanotify_group_init_error_pool(struct fsnotify_group *group) sizeof(struct fanotify_error_event)); }
+static int fanotify_may_update_existing_mark(struct fsnotify_mark *fsn_mark, + unsigned int fan_flags) +{ + /* + * Non evictable mark cannot be downgraded to evictable mark. + */ + if (fan_flags & FAN_MARK_EVICTABLE && + !(fsn_mark->flags & FSNOTIFY_MARK_FLAG_NO_IREF)) + return -EEXIST; + + return 0; +} + static int fanotify_add_mark(struct fsnotify_group *group, fsnotify_connp_t *connp, unsigned int obj_type, __u32 mask, unsigned int fan_flags, @@ -1196,13 +1209,11 @@ static int fanotify_add_mark(struct fsnotify_group *group, }
/* - * Non evictable mark cannot be downgraded to evictable mark. + * Check if requested mark flags conflict with an existing mark flags. */ - if (fan_flags & FAN_MARK_EVICTABLE && - !(fsn_mark->flags & FSNOTIFY_MARK_FLAG_NO_IREF)) { - ret = -EEXIST; + ret = fanotify_may_update_existing_mark(fsn_mark, fan_flags); + if (ret) goto out; - }
/* * Error events are pre-allocated per group, only if strictly @@ -1559,6 +1570,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, __kernel_fsid_t __fsid, *fsid = NULL; u32 valid_mask = FANOTIFY_EVENTS | FANOTIFY_EVENT_FLAGS; unsigned int mark_type = flags & FANOTIFY_MARK_TYPE_BITS; + unsigned int mark_cmd = flags & FANOTIFY_MARK_CMD_BITS; bool ignore = flags & FAN_MARK_IGNORED_MASK; unsigned int obj_type, fid_mode; u32 umask = 0; @@ -1588,7 +1600,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, return -EINVAL; }
- switch (flags & (FAN_MARK_ADD | FAN_MARK_REMOVE | FAN_MARK_FLUSH)) { + switch (mark_cmd) { case FAN_MARK_ADD: case FAN_MARK_REMOVE: if (!mask) @@ -1677,7 +1689,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, if (mask & FAN_RENAME && !(fid_mode & FAN_REPORT_NAME)) goto fput_and_out;
- if (flags & FAN_MARK_FLUSH) { + if (mark_cmd == FAN_MARK_FLUSH) { ret = 0; if (mark_type == FAN_MARK_MOUNT) fsnotify_clear_vfsmount_marks_by_group(group); @@ -1693,7 +1705,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, if (ret) goto fput_and_out;
- if (flags & FAN_MARK_ADD) { + if (mark_cmd == FAN_MARK_ADD) { ret = fanotify_events_supported(group, &path, mask, flags); if (ret) goto path_put_and_out; @@ -1731,7 +1743,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, }
/* create/update an inode mark */ - switch (flags & (FAN_MARK_ADD | FAN_MARK_REMOVE)) { + switch (mark_cmd) { case FAN_MARK_ADD: if (mark_type == FAN_MARK_MOUNT) ret = fanotify_add_vfsmount_mark(group, mnt, mask, diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h index 4f6cbe6c6e235..c9e185407ebcb 100644 --- a/include/linux/fanotify.h +++ b/include/linux/fanotify.h @@ -61,15 +61,16 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */ #define FANOTIFY_MARK_TYPE_BITS (FAN_MARK_INODE | FAN_MARK_MOUNT | \ FAN_MARK_FILESYSTEM)
+#define FANOTIFY_MARK_CMD_BITS (FAN_MARK_ADD | FAN_MARK_REMOVE | \ + FAN_MARK_FLUSH) + #define FANOTIFY_MARK_FLAGS (FANOTIFY_MARK_TYPE_BITS | \ - FAN_MARK_ADD | \ - FAN_MARK_REMOVE | \ + FANOTIFY_MARK_CMD_BITS | \ FAN_MARK_DONT_FOLLOW | \ FAN_MARK_ONLYDIR | \ FAN_MARK_IGNORED_MASK | \ FAN_MARK_IGNORED_SURV_MODIFY | \ - FAN_MARK_EVICTABLE | \ - FAN_MARK_FLUSH) + FAN_MARK_EVICTABLE)
/* * Events that can be reported with data type FSNOTIFY_EVENT_PATH.
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit e252f2ed1c8c6c3884ab5dd34e003ed21f1fe6e0 ]
This flag is a new way to configure ignore mask which allows adding and removing the event flags FAN_ONDIR and FAN_EVENT_ON_CHILD in ignore mask.
The legacy FAN_MARK_IGNORED_MASK flag would always ignore events on directories and would ignore events on children depending on whether the FAN_EVENT_ON_CHILD flag was set in the (non ignored) mask.
FAN_MARK_IGNORE can be used to ignore events on children without setting FAN_EVENT_ON_CHILD in the mark's mask and will not ignore events on directories unconditionally, only when FAN_ONDIR is set in ignore mask.
The new behavior is non-downgradable. After calling fanotify_mark() with FAN_MARK_IGNORE once, calling fanotify_mark() with FAN_MARK_IGNORED_MASK on the same object will return EEXIST error.
Setting the event flags with FAN_MARK_IGNORE on a non-dir inode mark has no meaning and will return ENOTDIR error.
The meaning of FAN_MARK_IGNORED_SURV_MODIFY is preserved with the new FAN_MARK_IGNORE flag, but with a few semantic differences:
1. FAN_MARK_IGNORED_SURV_MODIFY is required for filesystem and mount marks and on an inode mark on a directory. Omitting this flag will return EINVAL or EISDIR error.
2. An ignore mask on a non-directory inode that survives modify could never be downgraded to an ignore mask that does not survive modify. With new FAN_MARK_IGNORE semantics we make that rule explicit - trying to update a surviving ignore mask without the flag FAN_MARK_IGNORED_SURV_MODIFY will return EEXIST error.
The conveniene macro FAN_MARK_IGNORE_SURV is added for (FAN_MARK_IGNORE | FAN_MARK_IGNORED_SURV_MODIFY), because the common case should use short constant names.
Link: https://lore.kernel.org/r/20220629144210.2983229-4-amir73il@gmail.com Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.h | 2 + fs/notify/fanotify/fanotify_user.c | 63 +++++++++++++++++++++++++----- include/linux/fanotify.h | 5 ++- include/uapi/linux/fanotify.h | 8 ++++ 4 files changed, 67 insertions(+), 11 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index 80e0ec95b1131..1d9f11255c64f 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -499,6 +499,8 @@ static inline unsigned int fanotify_mark_user_flags(struct fsnotify_mark *mark) mflags |= FAN_MARK_IGNORED_SURV_MODIFY; if (mark->flags & FSNOTIFY_MARK_FLAG_NO_IREF) mflags |= FAN_MARK_EVICTABLE; + if (mark->flags & FSNOTIFY_MARK_FLAG_HAS_IGNORE_FLAGS) + mflags |= FAN_MARK_IGNORE;
return mflags; } diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 715e41b344129..72dd446606a78 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -997,7 +997,7 @@ static __u32 fanotify_mark_remove_from_mask(struct fsnotify_mark *fsn_mark, mask &= ~umask; spin_lock(&fsn_mark->lock); oldmask = fsnotify_calc_mask(fsn_mark); - if (!(flags & FAN_MARK_IGNORED_MASK)) { + if (!(flags & FANOTIFY_MARK_IGNORE_BITS)) { fsn_mark->mask &= ~mask; } else { fsn_mark->ignore_mask &= ~mask; @@ -1073,15 +1073,24 @@ static bool fanotify_mark_update_flags(struct fsnotify_mark *fsn_mark, unsigned int fan_flags) { bool want_iref = !(fan_flags & FAN_MARK_EVICTABLE); + unsigned int ignore = fan_flags & FANOTIFY_MARK_IGNORE_BITS; bool recalc = false;
+ /* + * When using FAN_MARK_IGNORE for the first time, mark starts using + * independent event flags in ignore mask. After that, trying to + * update the ignore mask with the old FAN_MARK_IGNORED_MASK API + * will result in EEXIST error. + */ + if (ignore == FAN_MARK_IGNORE) + fsn_mark->flags |= FSNOTIFY_MARK_FLAG_HAS_IGNORE_FLAGS; + /* * Setting FAN_MARK_IGNORED_SURV_MODIFY for the first time may lead to * the removal of the FS_MODIFY bit in calculated mask if it was set * because of an ignore mask that is now going to survive FS_MODIFY. */ - if ((fan_flags & FAN_MARK_IGNORED_MASK) && - (fan_flags & FAN_MARK_IGNORED_SURV_MODIFY) && + if (ignore && (fan_flags & FAN_MARK_IGNORED_SURV_MODIFY) && !(fsn_mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY)) { fsn_mark->flags |= FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY; if (!(fsn_mark->mask & FS_MODIFY)) @@ -1108,7 +1117,7 @@ static bool fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark, bool recalc;
spin_lock(&fsn_mark->lock); - if (!(fan_flags & FAN_MARK_IGNORED_MASK)) + if (!(fan_flags & FANOTIFY_MARK_IGNORE_BITS)) fsn_mark->mask |= mask; else fsn_mark->ignore_mask |= mask; @@ -1185,6 +1194,24 @@ static int fanotify_may_update_existing_mark(struct fsnotify_mark *fsn_mark, !(fsn_mark->flags & FSNOTIFY_MARK_FLAG_NO_IREF)) return -EEXIST;
+ /* + * New ignore mask semantics cannot be downgraded to old semantics. + */ + if (fan_flags & FAN_MARK_IGNORED_MASK && + fsn_mark->flags & FSNOTIFY_MARK_FLAG_HAS_IGNORE_FLAGS) + return -EEXIST; + + /* + * An ignore mask that survives modify could never be downgraded to not + * survive modify. With new FAN_MARK_IGNORE semantics we make that rule + * explicit and return an error when trying to update the ignore mask + * without the original FAN_MARK_IGNORED_SURV_MODIFY value. + */ + if (fan_flags & FAN_MARK_IGNORE && + !(fan_flags & FAN_MARK_IGNORED_SURV_MODIFY) && + fsn_mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY) + return -EEXIST; + return 0; }
@@ -1219,7 +1246,8 @@ static int fanotify_add_mark(struct fsnotify_group *group, * Error events are pre-allocated per group, only if strictly * needed (i.e. FAN_FS_ERROR was requested). */ - if (!(fan_flags & FAN_MARK_IGNORED_MASK) && (mask & FAN_FS_ERROR)) { + if (!(fan_flags & FANOTIFY_MARK_IGNORE_BITS) && + (mask & FAN_FS_ERROR)) { ret = fanotify_group_init_error_pool(group); if (ret) goto out; @@ -1263,7 +1291,7 @@ static int fanotify_add_inode_mark(struct fsnotify_group *group, * an ignore mask, unless that ignore mask is supposed to survive * modification changes anyway. */ - if ((flags & FAN_MARK_IGNORED_MASK) && + if ((flags & FANOTIFY_MARK_IGNORE_BITS) && !(flags & FAN_MARK_IGNORED_SURV_MODIFY) && inode_is_open_for_write(inode)) return 0; @@ -1519,7 +1547,8 @@ static int fanotify_events_supported(struct fsnotify_group *group, unsigned int mark_type = flags & FANOTIFY_MARK_TYPE_BITS; /* Strict validation of events in non-dir inode mask with v5.17+ APIs */ bool strict_dir_events = FAN_GROUP_FLAG(group, FAN_REPORT_TARGET_FID) || - (mask & FAN_RENAME); + (mask & FAN_RENAME) || + (flags & FAN_MARK_IGNORE);
/* * Some filesystems such as 'proc' acquire unusual locks when opening @@ -1571,7 +1600,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, u32 valid_mask = FANOTIFY_EVENTS | FANOTIFY_EVENT_FLAGS; unsigned int mark_type = flags & FANOTIFY_MARK_TYPE_BITS; unsigned int mark_cmd = flags & FANOTIFY_MARK_CMD_BITS; - bool ignore = flags & FAN_MARK_IGNORED_MASK; + unsigned int ignore = flags & FANOTIFY_MARK_IGNORE_BITS; unsigned int obj_type, fid_mode; u32 umask = 0; int ret; @@ -1620,12 +1649,19 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, if (mask & ~valid_mask) return -EINVAL;
+ + /* We don't allow FAN_MARK_IGNORE & FAN_MARK_IGNORED_MASK together */ + if (ignore == (FAN_MARK_IGNORE | FAN_MARK_IGNORED_MASK)) + return -EINVAL; + /* * Event flags (FAN_ONDIR, FAN_EVENT_ON_CHILD) have no effect with * FAN_MARK_IGNORED_MASK. */ - if (ignore) + if (ignore == FAN_MARK_IGNORED_MASK) { mask &= ~FANOTIFY_EVENT_FLAGS; + umask = FANOTIFY_EVENT_FLAGS; + }
f = fdget(fanotify_fd); if (unlikely(!f.file)) @@ -1729,6 +1765,13 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, else mnt = path.mnt;
+ ret = mnt ? -EINVAL : -EISDIR; + /* FAN_MARK_IGNORE requires SURV_MODIFY for sb/mount/dir marks */ + if (mark_cmd == FAN_MARK_ADD && ignore == FAN_MARK_IGNORE && + (mnt || S_ISDIR(inode->i_mode)) && + !(flags & FAN_MARK_IGNORED_SURV_MODIFY)) + goto path_put_and_out; + /* Mask out FAN_EVENT_ON_CHILD flag for sb/mount/non-dir marks */ if (mnt || !S_ISDIR(inode->i_mode)) { mask &= ~FAN_EVENT_ON_CHILD; @@ -1821,7 +1864,7 @@ static int __init fanotify_user_setup(void)
BUILD_BUG_ON(FANOTIFY_INIT_FLAGS & FANOTIFY_INTERNAL_GROUP_FLAGS); BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 12); - BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 10); + BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 11);
fanotify_mark_cache = KMEM_CACHE(fsnotify_mark, SLAB_PANIC|SLAB_ACCOUNT); diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h index c9e185407ebcb..558844c8d2598 100644 --- a/include/linux/fanotify.h +++ b/include/linux/fanotify.h @@ -64,11 +64,14 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */ #define FANOTIFY_MARK_CMD_BITS (FAN_MARK_ADD | FAN_MARK_REMOVE | \ FAN_MARK_FLUSH)
+#define FANOTIFY_MARK_IGNORE_BITS (FAN_MARK_IGNORED_MASK | \ + FAN_MARK_IGNORE) + #define FANOTIFY_MARK_FLAGS (FANOTIFY_MARK_TYPE_BITS | \ FANOTIFY_MARK_CMD_BITS | \ + FANOTIFY_MARK_IGNORE_BITS | \ FAN_MARK_DONT_FOLLOW | \ FAN_MARK_ONLYDIR | \ - FAN_MARK_IGNORED_MASK | \ FAN_MARK_IGNORED_SURV_MODIFY | \ FAN_MARK_EVICTABLE)
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h index f1f89132d60e2..d8536d77fea1c 100644 --- a/include/uapi/linux/fanotify.h +++ b/include/uapi/linux/fanotify.h @@ -83,12 +83,20 @@ #define FAN_MARK_FLUSH 0x00000080 /* FAN_MARK_FILESYSTEM is 0x00000100 */ #define FAN_MARK_EVICTABLE 0x00000200 +/* This bit is mutually exclusive with FAN_MARK_IGNORED_MASK bit */ +#define FAN_MARK_IGNORE 0x00000400
/* These are NOT bitwise flags. Both bits can be used togther. */ #define FAN_MARK_INODE 0x00000000 #define FAN_MARK_MOUNT 0x00000010 #define FAN_MARK_FILESYSTEM 0x00000100
+/* + * Convenience macro - FAN_MARK_IGNORE requires FAN_MARK_IGNORED_SURV_MODIFY + * for non-inode mark types. + */ +#define FAN_MARK_IGNORE_SURV (FAN_MARK_IGNORE | FAN_MARK_IGNORED_SURV_MODIFY) + /* Deprecated - do not use this in programs and do not add new flags here! */ #define FAN_ALL_MARK_FLAGS (FAN_MARK_ADD |\ FAN_MARK_REMOVE |\
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xin Gao gaoxin@cdjrlc.com
[ Upstream commit feee1ce45a5666bbdb08c5bb2f5f394047b1915b ]
The double `if' is duplicated in line 104, remove one.
Signed-off-by: Xin Gao gaoxin@cdjrlc.com Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20220722194639.18545-1-gaoxin@cdjrlc.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fsnotify.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index 8687562df2e37..7974e91ffe134 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -100,7 +100,7 @@ void fsnotify_sb_delete(struct super_block *sb) * Given an inode, first check if we care what happens to our children. Inotify * and dnotify both tell their parents about events. If we care about any event * on a child we run all of our children and set a dentry flag saying that the - * parent cares. Thus when an event happens on a child it can quickly tell if + * parent cares. Thus when an event happens on a child it can quickly tell * if there is a need to find a parent and send the event to the parent. */ void __fsnotify_update_child_dentry_flags(struct inode *inode)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 23ba98de6dcec665e15c0ca19244379bb0d30932 ]
We had a report from the spring Bake-a-thon of data corruption in some nfstest_interop tests. Looking at the traces showed the NFS server allowing a v3 WRITE to proceed while a read delegation was still outstanding.
Currently, we only set NFSD_FILE_BREAK_* flags if NFSD_MAY_NOT_BREAK_LEASE was set when we call nfsd_file_alloc. NFSD_MAY_NOT_BREAK_LEASE was intended to be set when finding files for COMMIT ops, where we need a writeable filehandle but don't need to break read leases.
It doesn't make any sense to consult that flag when allocating a file since the file may be used on subsequent calls where we do want to break the lease (and the usage of it here seems to be reverse from what it should be anyway).
Also, after calling nfsd_open_break_lease, we don't want to clear the BREAK_* bits. A lease could end up being set on it later (more than once) and we need to be able to break those leases as well.
This means that the NFSD_FILE_BREAK_* flags now just mirror NFSD_MAY_{READ,WRITE} flags, so there's no need for them at all. Just drop those flags and unconditionally call nfsd_open_break_lease every time.
Reported-by: Olga Kornieskaia kolga@netapp.com Link: https://bugzilla.redhat.com/show_bug.cgi?id=2107360 Fixes: 65294c1f2c5e (nfsd: add a new struct file caching facility to nfsd) Cc: stable@vger.kernel.org # 5.4.x : bb283ca18d1e NFSD: Clean up the show_nf_flags() macro Cc: stable@vger.kernel.org # 5.4.x Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 22 +--------------------- fs/nfsd/filecache.h | 4 +--- fs/nfsd/trace.h | 2 -- 3 files changed, 2 insertions(+), 26 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index fc0fcb3321537..1d3d13b78be0e 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -183,12 +183,6 @@ nfsd_file_alloc(struct inode *inode, unsigned int may, unsigned int hashval, nf->nf_hashval = hashval; refcount_set(&nf->nf_ref, 1); nf->nf_may = may & NFSD_FILE_MAY_MASK; - if (may & NFSD_MAY_NOT_BREAK_LEASE) { - if (may & NFSD_MAY_WRITE) - __set_bit(NFSD_FILE_BREAK_WRITE, &nf->nf_flags); - if (may & NFSD_MAY_READ) - __set_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags); - } nf->nf_mark = NULL; trace_nfsd_file_alloc(nf); } @@ -957,21 +951,7 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
this_cpu_inc(nfsd_file_cache_hits);
- if (!(may_flags & NFSD_MAY_NOT_BREAK_LEASE)) { - bool write = (may_flags & NFSD_MAY_WRITE); - - if (test_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags) || - (test_bit(NFSD_FILE_BREAK_WRITE, &nf->nf_flags) && write)) { - status = nfserrno(nfsd_open_break_lease( - file_inode(nf->nf_file), may_flags)); - if (status == nfs_ok) { - clear_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags); - if (write) - clear_bit(NFSD_FILE_BREAK_WRITE, - &nf->nf_flags); - } - } - } + status = nfserrno(nfsd_open_break_lease(file_inode(nf->nf_file), may_flags)); out: if (status == nfs_ok) { *pnf = nf; diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h index 1da0c79a55804..c9e3c6eb4776e 100644 --- a/fs/nfsd/filecache.h +++ b/fs/nfsd/filecache.h @@ -37,9 +37,7 @@ struct nfsd_file { struct net *nf_net; #define NFSD_FILE_HASHED (0) #define NFSD_FILE_PENDING (1) -#define NFSD_FILE_BREAK_READ (2) -#define NFSD_FILE_BREAK_WRITE (3) -#define NFSD_FILE_REFERENCED (4) +#define NFSD_FILE_REFERENCED (2) unsigned long nf_flags; struct inode *nf_inode; unsigned int nf_hashval; diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 8ccce4ac66b4e..5c2292a1892cc 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -707,8 +707,6 @@ DEFINE_CLID_EVENT(confirmed_r); __print_flags(val, "|", \ { 1 << NFSD_FILE_HASHED, "HASHED" }, \ { 1 << NFSD_FILE_PENDING, "PENDING" }, \ - { 1 << NFSD_FILE_BREAK_READ, "BREAK_READ" }, \ - { 1 << NFSD_FILE_BREAK_WRITE, "BREAK_WRITE" }, \ { 1 << NFSD_FILE_REFERENCED, "REFERENCED"})
DECLARE_EVENT_CLASS(nfsd_file_class,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit c770f31d8f580ed4b965c64f924ec1cc50e41734 ]
I discovered that xdr_encode_bool() was returning the same address that was passed in the @p parameter. The documenting comment states that the intent is to return the address of the next buffer location, just like the other "xdr_encode_*" helpers.
The result was the encoded results of NFSv3 PATHCONF operations were not formed correctly.
Fixes: ded04a587f6c ("NFSD: Update the NFSv3 PATHCONF3res encoder to use struct xdr_stream") Signed-off-by: Chuck Lever chuck.lever@oracle.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/sunrpc/xdr.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h index b8fb866e1fd32..59d99ff31c1a9 100644 --- a/include/linux/sunrpc/xdr.h +++ b/include/linux/sunrpc/xdr.h @@ -418,8 +418,8 @@ static inline int xdr_stream_encode_item_absent(struct xdr_stream *xdr) */ static inline __be32 *xdr_encode_bool(__be32 *p, u32 n) { - *p = n ? xdr_one : xdr_zero; - return p++; + *p++ = n ? xdr_one : xdr_zero; + return p; }
/**
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Benjamin Coddington bcodding@redhat.com
[ Upstream commit 184cefbe62627730c30282df12bcff9aae4816ea ]
Instead of trusting that struct file_lock returns completely unchanged after vfs_test_lock() when there's no conflicting lock, stash away our nlm_lockowner reference so we can properly release it for all cases.
This defends against another file_lock implementation overwriting fl_owner when the return type is F_UNLCK.
Reported-by: Roberto Bergantinos Corpas rbergant@redhat.com Tested-by: Roberto Bergantinos Corpas rbergant@redhat.com Signed-off-by: Benjamin Coddington bcodding@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svc4proc.c | 4 +++- fs/lockd/svclock.c | 10 +--------- fs/lockd/svcproc.c | 5 ++++- include/linux/lockd/lockd.h | 1 + 4 files changed, 9 insertions(+), 11 deletions(-)
diff --git a/fs/lockd/svc4proc.c b/fs/lockd/svc4proc.c index 176b468a61c75..4f247ab8be611 100644 --- a/fs/lockd/svc4proc.c +++ b/fs/lockd/svc4proc.c @@ -87,6 +87,7 @@ __nlm4svc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp) struct nlm_args *argp = rqstp->rq_argp; struct nlm_host *host; struct nlm_file *file; + struct nlm_lockowner *test_owner; __be32 rc = rpc_success;
dprintk("lockd: TEST4 called\n"); @@ -96,6 +97,7 @@ __nlm4svc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp) if ((resp->status = nlm4svc_retrieve_args(rqstp, argp, &host, &file))) return resp->status == nlm_drop_reply ? rpc_drop_reply :rpc_success;
+ test_owner = argp->lock.fl.fl_owner; /* Now check for conflicting locks */ resp->status = nlmsvc_testlock(rqstp, file, host, &argp->lock, &resp->lock, &resp->cookie); if (resp->status == nlm_drop_reply) @@ -103,7 +105,7 @@ __nlm4svc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp) else dprintk("lockd: TEST4 status %d\n", ntohl(resp->status));
- nlmsvc_release_lockowner(&argp->lock); + nlmsvc_put_lockowner(test_owner); nlmsvc_release_host(host); nlm_release_file(file); return rc; diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c index cb3658ab9b7ae..9c1aa75441e1c 100644 --- a/fs/lockd/svclock.c +++ b/fs/lockd/svclock.c @@ -340,7 +340,7 @@ nlmsvc_get_lockowner(struct nlm_lockowner *lockowner) return lockowner; }
-static void nlmsvc_put_lockowner(struct nlm_lockowner *lockowner) +void nlmsvc_put_lockowner(struct nlm_lockowner *lockowner) { if (!refcount_dec_and_lock(&lockowner->count, &lockowner->host->h_lock)) return; @@ -590,7 +590,6 @@ nlmsvc_testlock(struct svc_rqst *rqstp, struct nlm_file *file, int error; int mode; __be32 ret; - struct nlm_lockowner *test_owner;
dprintk("lockd: nlmsvc_testlock(%s/%ld, ty=%d, %Ld-%Ld)\n", nlmsvc_file_inode(file)->i_sb->s_id, @@ -604,9 +603,6 @@ nlmsvc_testlock(struct svc_rqst *rqstp, struct nlm_file *file, goto out; }
- /* If there's a conflicting lock, remember to clean up the test lock */ - test_owner = (struct nlm_lockowner *)lock->fl.fl_owner; - mode = lock_to_openmode(&lock->fl); error = vfs_test_lock(file->f_file[mode], &lock->fl); if (error) { @@ -635,10 +631,6 @@ nlmsvc_testlock(struct svc_rqst *rqstp, struct nlm_file *file, conflock->fl.fl_end = lock->fl.fl_end; locks_release_private(&lock->fl);
- /* Clean up the test lock */ - lock->fl.fl_owner = NULL; - nlmsvc_put_lockowner(test_owner); - ret = nlm_lck_denied; out: return ret; diff --git a/fs/lockd/svcproc.c b/fs/lockd/svcproc.c index 4dc1b40a489a2..b09ca35b527cc 100644 --- a/fs/lockd/svcproc.c +++ b/fs/lockd/svcproc.c @@ -116,6 +116,7 @@ __nlmsvc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp) struct nlm_args *argp = rqstp->rq_argp; struct nlm_host *host; struct nlm_file *file; + struct nlm_lockowner *test_owner; __be32 rc = rpc_success;
dprintk("lockd: TEST called\n"); @@ -125,6 +126,8 @@ __nlmsvc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp) if ((resp->status = nlmsvc_retrieve_args(rqstp, argp, &host, &file))) return resp->status == nlm_drop_reply ? rpc_drop_reply :rpc_success;
+ test_owner = argp->lock.fl.fl_owner; + /* Now check for conflicting locks */ resp->status = cast_status(nlmsvc_testlock(rqstp, file, host, &argp->lock, &resp->lock, &resp->cookie)); if (resp->status == nlm_drop_reply) @@ -133,7 +136,7 @@ __nlmsvc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp) dprintk("lockd: TEST status %d vers %d\n", ntohl(resp->status), rqstp->rq_vers);
- nlmsvc_release_lockowner(&argp->lock); + nlmsvc_put_lockowner(test_owner); nlmsvc_release_host(host); nlm_release_file(file); return rc; diff --git a/include/linux/lockd/lockd.h b/include/linux/lockd/lockd.h index fcef192e5e45e..70ce419e27093 100644 --- a/include/linux/lockd/lockd.h +++ b/include/linux/lockd/lockd.h @@ -292,6 +292,7 @@ void nlmsvc_locks_init_private(struct file_lock *, struct nlm_host *, pid_t); __be32 nlm_lookup_file(struct svc_rqst *, struct nlm_file **, struct nlm_lock *); void nlm_release_file(struct nlm_file *); +void nlmsvc_put_lockowner(struct nlm_lockowner *); void nlmsvc_release_lockowner(struct nlm_lock *); void nlmsvc_mark_resources(struct net *); void nlmsvc_free_host_resources(struct nlm_host *);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhang Jiaming jiaming@nfschina.com
[ Upstream commit f532c9ff103897be0e2a787c0876683c3dc39ed3 ]
Add a blank space after ','. Change 'succesful' to 'successful'.
Signed-off-by: Zhang Jiaming jiaming@nfschina.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index adbac1e77e9e2..cb4a037266709 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -828,7 +828,7 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, goto out_umask; status = nfsd_create(rqstp, &cstate->current_fh, create->cr_name, create->cr_namelen, - &create->cr_iattr,S_IFCHR, rdev, &resfh); + &create->cr_iattr, S_IFCHR, rdev, &resfh); break;
case NF4SOCK: @@ -2703,7 +2703,7 @@ nfsd4_proc_compound(struct svc_rqst *rqstp) if (op->opdesc->op_flags & OP_MODIFIES_SOMETHING) { /* * Don't execute this op if we couldn't encode a - * succesful reply: + * successful reply: */ u32 plen = op->opdesc->op_rsize_bop(rqstp, op); /*
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Colin Ian King colin.i.king@gmail.com
[ Upstream commit 842e00ac3aa3b4a4f7f750c8ab54f8578fc875d3 ]
Variable len is being assigned a value zero and this is never read, it is being re-assigned later. The assignment is redundant and can be removed.
Cleans up clang scan-build warning: fs/nfsd/nfsctl.c:636:2: warning: Value stored to 'len' is never read
Signed-off-by: Colin Ian King colin.i.king@gmail.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsctl.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 0621c2faf2424..66c352bf61b1d 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -633,7 +633,6 @@ static ssize_t __write_versions(struct file *file, char *buf, size_t size) }
/* Now write current state into reply buffer */ - len = 0; sep = ""; remaining = SIMPLE_TRANSACTION_LIMIT; for (num=2 ; num <= 4 ; num++) {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit ca3f9acb6d3faf78da2b63324f7c737dbddf7f69 ]
The call trace doesn't add much value, but it sure is noisy.
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index cb4a037266709..08c2eaca4f24e 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -630,9 +630,9 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, }
status = nfsd4_process_open2(rqstp, resfh, open); - WARN(status && open->op_created, - "nfsd4_process_open2 failed to open newly-created file! status=%u\n", - be32_to_cpu(status)); + if (status && open->op_created) + pr_warn("nfsd4_process_open2 failed to open newly-created file: status=%u\n", + be32_to_cpu(status)); if (reclaim && !status) nn->somebody_reclaimed = true; out:
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 0fd244c115f0321fc5e34ad2291f2a572508e3f7 ]
Surface the NFSD filecache's LRU list length to help field troubleshooters monitor filecache issues.
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 1d3d13b78be0e..377d8211200ff 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -1047,7 +1047,7 @@ nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp, static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) { unsigned int i, count = 0, longest = 0; - unsigned long hits = 0; + unsigned long lru = 0, hits = 0;
/* * No need for spinlocks here since we're not terribly interested in @@ -1060,6 +1060,7 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) count += nfsd_file_hashtbl[i].nfb_count; longest = max(longest, nfsd_file_hashtbl[i].nfb_count); } + lru = list_lru_count(&nfsd_file_lru); } mutex_unlock(&nfsd_mutex);
@@ -1068,6 +1069,7 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v)
seq_printf(m, "total entries: %u\n", count); seq_printf(m, "longest chain: %u\n", longest); + seq_printf(m, "lru entries: %lu\n", lru); seq_printf(m, "cache hits: %lu\n", hits); return 0; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 29d4bdbbb910f33d6058d2c51278f00f656df325 ]
Count the number of successful acquisitions that did not create a file (ie, acquisitions that do not result in a compulsory cache miss). This count can be compared directly with the reported hit count to compute a hit ratio.
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 377d8211200ff..5a09b76ae25a8 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -42,6 +42,7 @@ struct nfsd_fcache_bucket { };
static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits); +static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions);
struct nfsd_fcache_disposal { struct work_struct work; @@ -954,6 +955,8 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, status = nfserrno(nfsd_open_break_lease(file_inode(nf->nf_file), may_flags)); out: if (status == nfs_ok) { + if (open) + this_cpu_inc(nfsd_file_acquisitions); *pnf = nf; } else { nfsd_file_put(nf); @@ -1046,8 +1049,9 @@ nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp, */ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) { + unsigned long hits = 0, acquisitions = 0; unsigned int i, count = 0, longest = 0; - unsigned long lru = 0, hits = 0; + unsigned long lru = 0;
/* * No need for spinlocks here since we're not terribly interested in @@ -1064,13 +1068,16 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) } mutex_unlock(&nfsd_mutex);
- for_each_possible_cpu(i) + for_each_possible_cpu(i) { hits += per_cpu(nfsd_file_cache_hits, i); + acquisitions += per_cpu(nfsd_file_acquisitions, i); + }
seq_printf(m, "total entries: %u\n", count); seq_printf(m, "longest chain: %u\n", longest); seq_printf(m, "lru entries: %lu\n", lru); seq_printf(m, "cache hits: %lu\n", hits); + seq_printf(m, "acquisitions: %lu\n", acquisitions); return 0; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit d63293272abb51c02457f1017dfd61c3270d9ae3 ]
Surface the count of freed nfsd_file items.
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 5a09b76ae25a8..5d31622c23040 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -43,6 +43,7 @@ struct nfsd_fcache_bucket {
static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits); static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions); +static DEFINE_PER_CPU(unsigned long, nfsd_file_releases);
struct nfsd_fcache_disposal { struct work_struct work; @@ -195,6 +196,8 @@ nfsd_file_free(struct nfsd_file *nf) { bool flush = false;
+ this_cpu_inc(nfsd_file_releases); + trace_nfsd_file_put_final(nf); if (nf->nf_mark) nfsd_file_mark_put(nf->nf_mark); @@ -1049,7 +1052,7 @@ nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp, */ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) { - unsigned long hits = 0, acquisitions = 0; + unsigned long hits = 0, acquisitions = 0, releases = 0; unsigned int i, count = 0, longest = 0; unsigned long lru = 0;
@@ -1071,6 +1074,7 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) for_each_possible_cpu(i) { hits += per_cpu(nfsd_file_cache_hits, i); acquisitions += per_cpu(nfsd_file_acquisitions, i); + releases += per_cpu(nfsd_file_releases, i); }
seq_printf(m, "total entries: %u\n", count); @@ -1078,6 +1082,7 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) seq_printf(m, "lru entries: %lu\n", lru); seq_printf(m, "cache hits: %lu\n", hits); seq_printf(m, "acquisitions: %lu\n", acquisitions); + seq_printf(m, "releases: %lu\n", releases); return 0; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 904940e94a887701db24401e3ed6928a1d4e329f ]
This is a measure of how long items stay in the filecache, to help assess how efficient the cache is.
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 11 ++++++++++- fs/nfsd/filecache.h | 1 + 2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 5d31622c23040..0cd72c20fc12d 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -44,6 +44,7 @@ struct nfsd_fcache_bucket { static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits); static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions); static DEFINE_PER_CPU(unsigned long, nfsd_file_releases); +static DEFINE_PER_CPU(unsigned long, nfsd_file_total_age);
struct nfsd_fcache_disposal { struct work_struct work; @@ -177,6 +178,7 @@ nfsd_file_alloc(struct inode *inode, unsigned int may, unsigned int hashval, if (nf) { INIT_HLIST_NODE(&nf->nf_node); INIT_LIST_HEAD(&nf->nf_lru); + nf->nf_birthtime = ktime_get(); nf->nf_file = NULL; nf->nf_cred = get_current_cred(); nf->nf_net = net; @@ -194,9 +196,11 @@ nfsd_file_alloc(struct inode *inode, unsigned int may, unsigned int hashval, static bool nfsd_file_free(struct nfsd_file *nf) { + s64 age = ktime_to_ms(ktime_sub(ktime_get(), nf->nf_birthtime)); bool flush = false;
this_cpu_inc(nfsd_file_releases); + this_cpu_add(nfsd_file_total_age, age);
trace_nfsd_file_put_final(nf); if (nf->nf_mark) @@ -1054,7 +1058,7 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) { unsigned long hits = 0, acquisitions = 0, releases = 0; unsigned int i, count = 0, longest = 0; - unsigned long lru = 0; + unsigned long lru = 0, total_age = 0;
/* * No need for spinlocks here since we're not terribly interested in @@ -1075,6 +1079,7 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) hits += per_cpu(nfsd_file_cache_hits, i); acquisitions += per_cpu(nfsd_file_acquisitions, i); releases += per_cpu(nfsd_file_releases, i); + total_age += per_cpu(nfsd_file_total_age, i); }
seq_printf(m, "total entries: %u\n", count); @@ -1083,6 +1088,10 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) seq_printf(m, "cache hits: %lu\n", hits); seq_printf(m, "acquisitions: %lu\n", acquisitions); seq_printf(m, "releases: %lu\n", releases); + if (releases) + seq_printf(m, "mean age (ms): %ld\n", total_age / releases); + else + seq_printf(m, "mean age (ms): -\n"); return 0; }
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h index c9e3c6eb4776e..c6ad5fe47f12f 100644 --- a/fs/nfsd/filecache.h +++ b/fs/nfsd/filecache.h @@ -44,6 +44,7 @@ struct nfsd_file { refcount_t nf_ref; unsigned char nf_may; struct nfsd_file_mark *nf_mark; + ktime_t nf_birthtime; };
int nfsd_file_cache_init(void);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 0bac5a264d9a923f5b01f3521e1519a8d0358342 ]
Refactor the invariant part of nfsd_file_lru_walk_list() into a separate helper function.
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 29 ++++++++++++++++++++++------- 1 file changed, 22 insertions(+), 7 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 0cd72c20fc12d..ffe46f3f33495 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -450,11 +450,31 @@ nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru, return LRU_SKIP; }
+/* + * Unhash items on @dispose immediately, then queue them on the + * disposal workqueue to finish releasing them in the background. + * + * cel: Note that between the time list_lru_shrink_walk runs and + * now, these items are in the hash table but marked unhashed. + * Why release these outside of lru_cb ? There's no lock ordering + * problem since lru_cb currently takes no lock. + */ +static void nfsd_file_gc_dispose_list(struct list_head *dispose) +{ + struct nfsd_file *nf; + + list_for_each_entry(nf, dispose, nf_lru) { + spin_lock(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock); + nfsd_file_do_unhash(nf); + spin_unlock(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock); + } + nfsd_file_dispose_list_delayed(dispose); +} + static unsigned long nfsd_file_lru_walk_list(struct shrink_control *sc) { LIST_HEAD(head); - struct nfsd_file *nf; unsigned long ret;
if (sc) @@ -464,12 +484,7 @@ nfsd_file_lru_walk_list(struct shrink_control *sc) ret = list_lru_walk(&nfsd_file_lru, nfsd_file_lru_cb, &head, LONG_MAX); - list_for_each_entry(nf, &head, nf_lru) { - spin_lock(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock); - nfsd_file_do_unhash(nf); - spin_unlock(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock); - } - nfsd_file_dispose_list_delayed(&head); + nfsd_file_gc_dispose_list(&head); return ret; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 3bc6d3470fe412f818f9bff6b71d1be3a76af8f3 ]
Refactor nfsd_file_gc() to use the new list_lru helper.
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index ffe46f3f33495..656c94c779417 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -491,7 +491,11 @@ nfsd_file_lru_walk_list(struct shrink_control *sc) static void nfsd_file_gc(void) { - nfsd_file_lru_walk_list(NULL); + LIST_HEAD(dispose); + + list_lru_walk(&nfsd_file_lru, nfsd_file_lru_cb, + &dispose, LONG_MAX); + nfsd_file_gc_dispose_list(&dispose); }
static void
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 39f1d1ff8148902c5692ffb0e1c4479416ab44a7 ]
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 25 +++++++------------------ 1 file changed, 7 insertions(+), 18 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 656c94c779417..1d94491e5ddad 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -471,23 +471,6 @@ static void nfsd_file_gc_dispose_list(struct list_head *dispose) nfsd_file_dispose_list_delayed(dispose); }
-static unsigned long -nfsd_file_lru_walk_list(struct shrink_control *sc) -{ - LIST_HEAD(head); - unsigned long ret; - - if (sc) - ret = list_lru_shrink_walk(&nfsd_file_lru, sc, - nfsd_file_lru_cb, &head); - else - ret = list_lru_walk(&nfsd_file_lru, - nfsd_file_lru_cb, - &head, LONG_MAX); - nfsd_file_gc_dispose_list(&head); - return ret; -} - static void nfsd_file_gc(void) { @@ -514,7 +497,13 @@ nfsd_file_lru_count(struct shrinker *s, struct shrink_control *sc) static unsigned long nfsd_file_lru_scan(struct shrinker *s, struct shrink_control *sc) { - return nfsd_file_lru_walk_list(sc); + LIST_HEAD(dispose); + unsigned long ret; + + ret = list_lru_shrink_walk(&nfsd_file_lru, sc, + nfsd_file_lru_cb, &dispose); + nfsd_file_gc_dispose_list(&dispose); + return ret; }
static struct shrinker nfsd_file_shrinker = {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 94660cc19c75083af046b0f8362e3d3bc2eba21d ]
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 13 ++++++++++--- fs/nfsd/trace.h | 29 +++++++++++++++++++++++++++++ 2 files changed, 39 insertions(+), 3 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 1d94491e5ddad..e5bd9f06492c8 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -45,6 +45,7 @@ static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits); static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions); static DEFINE_PER_CPU(unsigned long, nfsd_file_releases); static DEFINE_PER_CPU(unsigned long, nfsd_file_total_age); +static DEFINE_PER_CPU(unsigned long, nfsd_file_evictions);
struct nfsd_fcache_disposal { struct work_struct work; @@ -445,6 +446,7 @@ nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru, goto out_skip;
list_lru_isolate_move(lru, &nf->nf_lru, head); + this_cpu_inc(nfsd_file_evictions); return LRU_REMOVED; out_skip: return LRU_SKIP; @@ -475,9 +477,11 @@ static void nfsd_file_gc(void) { LIST_HEAD(dispose); + unsigned long ret;
- list_lru_walk(&nfsd_file_lru, nfsd_file_lru_cb, - &dispose, LONG_MAX); + ret = list_lru_walk(&nfsd_file_lru, nfsd_file_lru_cb, + &dispose, LONG_MAX); + trace_nfsd_file_gc_removed(ret, list_lru_count(&nfsd_file_lru)); nfsd_file_gc_dispose_list(&dispose); }
@@ -502,6 +506,7 @@ nfsd_file_lru_scan(struct shrinker *s, struct shrink_control *sc)
ret = list_lru_shrink_walk(&nfsd_file_lru, sc, nfsd_file_lru_cb, &dispose); + trace_nfsd_file_shrinker_removed(ret, list_lru_count(&nfsd_file_lru)); nfsd_file_gc_dispose_list(&dispose); return ret; } @@ -1064,7 +1069,7 @@ nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp, */ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) { - unsigned long hits = 0, acquisitions = 0, releases = 0; + unsigned long hits = 0, acquisitions = 0, releases = 0, evictions = 0; unsigned int i, count = 0, longest = 0; unsigned long lru = 0, total_age = 0;
@@ -1088,6 +1093,7 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) acquisitions += per_cpu(nfsd_file_acquisitions, i); releases += per_cpu(nfsd_file_releases, i); total_age += per_cpu(nfsd_file_total_age, i); + evictions += per_cpu(nfsd_file_evictions, i); }
seq_printf(m, "total entries: %u\n", count); @@ -1096,6 +1102,7 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) seq_printf(m, "cache hits: %lu\n", hits); seq_printf(m, "acquisitions: %lu\n", acquisitions); seq_printf(m, "releases: %lu\n", releases); + seq_printf(m, "evictions: %lu\n", evictions); if (releases) seq_printf(m, "mean age (ms): %ld\n", total_age / releases); else diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 5c2292a1892cc..b373a161e862b 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -860,6 +860,35 @@ TRACE_EVENT(nfsd_file_fsnotify_handle_event, __entry->nlink, __entry->mode, __entry->mask) );
+DECLARE_EVENT_CLASS(nfsd_file_lruwalk_class, + TP_PROTO( + unsigned long removed, + unsigned long remaining + ), + TP_ARGS(removed, remaining), + TP_STRUCT__entry( + __field(unsigned long, removed) + __field(unsigned long, remaining) + ), + TP_fast_assign( + __entry->removed = removed; + __entry->remaining = remaining; + ), + TP_printk("%lu entries removed, %lu remaining", + __entry->removed, __entry->remaining) +); + +#define DEFINE_NFSD_FILE_LRUWALK_EVENT(name) \ +DEFINE_EVENT(nfsd_file_lruwalk_class, name, \ + TP_PROTO( \ + unsigned long removed, \ + unsigned long remaining \ + ), \ + TP_ARGS(removed, remaining)) + +DEFINE_NFSD_FILE_LRUWALK_EVENT(nfsd_file_gc_removed); +DEFINE_NFSD_FILE_LRUWALK_EVENT(nfsd_file_shrinker_removed); + #include "cache.h"
TRACE_DEFINE_ENUM(RC_DROPIT);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit df2aff524faceaf743b7c5ab0f4fb86cb511f782 ]
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index e5bd9f06492c8..b9941d4ef20d6 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -45,6 +45,7 @@ static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits); static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions); static DEFINE_PER_CPU(unsigned long, nfsd_file_releases); static DEFINE_PER_CPU(unsigned long, nfsd_file_total_age); +static DEFINE_PER_CPU(unsigned long, nfsd_file_pages_flushed); static DEFINE_PER_CPU(unsigned long, nfsd_file_evictions);
struct nfsd_fcache_disposal { @@ -242,7 +243,12 @@ nfsd_file_check_write_error(struct nfsd_file *nf) static void nfsd_file_flush(struct nfsd_file *nf) { - if (nf->nf_file && vfs_fsync(nf->nf_file, 1) != 0) + struct file *file = nf->nf_file; + + if (!file || !(file->f_mode & FMODE_WRITE)) + return; + this_cpu_add(nfsd_file_pages_flushed, file->f_mapping->nrpages); + if (vfs_fsync(file, 1) != 0) nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id)); }
@@ -1069,7 +1075,8 @@ nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp, */ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) { - unsigned long hits = 0, acquisitions = 0, releases = 0, evictions = 0; + unsigned long releases = 0, pages_flushed = 0, evictions = 0; + unsigned long hits = 0, acquisitions = 0; unsigned int i, count = 0, longest = 0; unsigned long lru = 0, total_age = 0;
@@ -1094,6 +1101,7 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) releases += per_cpu(nfsd_file_releases, i); total_age += per_cpu(nfsd_file_total_age, i); evictions += per_cpu(nfsd_file_evictions, i); + pages_flushed += per_cpu(nfsd_file_pages_flushed, i); }
seq_printf(m, "total entries: %u\n", count); @@ -1107,6 +1115,7 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) seq_printf(m, "mean age (ms): %ld\n", total_age / releases); else seq_printf(m, "mean age (ms): -\n"); + seq_printf(m, "pages flushed: %lu\n", pages_flushed); return 0; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 8b330f78040cbe16cf8029df70391b2a491f17e2 ]
If nfsd_file_cache_init() is called after a shutdown, be sure the stat counters are reset.
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index b9941d4ef20d6..60c51a4d8e0d7 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -823,6 +823,8 @@ nfsd_file_cache_shutdown_net(struct net *net) void nfsd_file_cache_shutdown(void) { + int i; + set_bit(NFSD_FILE_SHUTDOWN, &nfsd_file_lru_flags);
lease_unregister_notifier(&nfsd_file_lease_notifier); @@ -846,6 +848,15 @@ nfsd_file_cache_shutdown(void) nfsd_file_hashtbl = NULL; destroy_workqueue(nfsd_filecache_wq); nfsd_filecache_wq = NULL; + + for_each_possible_cpu(i) { + per_cpu(nfsd_file_cache_hits, i) = 0; + per_cpu(nfsd_file_acquisitions, i) = 0; + per_cpu(nfsd_file_releases, i) = 0; + per_cpu(nfsd_file_total_age, i) = 0; + per_cpu(nfsd_file_pages_flushed, i) = 0; + per_cpu(nfsd_file_evictions, i) = 0; + } }
static bool
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 2e6c6e4c4375bfd3defa5b1ff3604d9f33d1c936 ]
There has always been the capability of exporting filecache metrics via /proc, but it was never hooked up. Let's surface these metrics to enable better observability of the filecache.
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsctl.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 66c352bf61b1d..7002edbf26870 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -25,6 +25,7 @@ #include "state.h" #include "netns.h" #include "pnfs.h" +#include "filecache.h"
/* * We have a single directory with several nodes in it. @@ -45,6 +46,7 @@ enum { NFSD_Ports, NFSD_MaxBlkSize, NFSD_MaxConnections, + NFSD_Filecache, NFSD_SupportedEnctypes, /* * The below MUST come last. Otherwise we leave a hole in nfsd_files[] @@ -229,6 +231,13 @@ static const struct file_operations reply_cache_stats_operations = { .release = single_release, };
+static const struct file_operations filecache_ops = { + .open = nfsd_file_cache_stats_open, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; + /*----------------------------------------------------------------------------*/ /* * payload - write methods @@ -1370,6 +1379,7 @@ static int nfsd_fill_super(struct super_block *sb, struct fs_context *fc) [NFSD_Ports] = {"portlist", &transaction_ops, S_IWUSR|S_IRUGO}, [NFSD_MaxBlkSize] = {"max_block_size", &transaction_ops, S_IWUSR|S_IRUGO}, [NFSD_MaxConnections] = {"max_connections", &transaction_ops, S_IWUSR|S_IRUGO}, + [NFSD_Filecache] = {"filecache", &filecache_ops, S_IRUGO}, #if defined(CONFIG_SUNRPC_GSS) || defined(CONFIG_SUNRPC_GSS_MODULE) [NFSD_SupportedEnctypes] = {"supported_krb5_enctypes", &supported_enctypes_ops, S_IRUGO}, #endif /* CONFIG_SUNRPC_GSS or CONFIG_SUNRPC_GSS_MODULE */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 668ed92e651d3c25f9b6e8cb7ceca54d00daa96d ]
Add a guardrail to prevent freeing memory that is still on a list. This includes either a dispose list or the LRU list.
This is the sign of a bug, but this class of bugs can be detected so that they don't endanger system stability, especially while debugging.
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 60c51a4d8e0d7..d9b5f1e183976 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -213,6 +213,14 @@ nfsd_file_free(struct nfsd_file *nf) fput(nf->nf_file); flush = true; } + + /* + * If this item is still linked via nf_lru, that's a bug. + * WARN and leak it to preserve system stability. + */ + if (WARN_ON_ONCE(!list_empty(&nf->nf_lru))) + return flush; + call_rcu(&nf->nf_rcu, nfsd_file_slab_free); return flush; } @@ -342,7 +350,7 @@ nfsd_file_dispose_list(struct list_head *dispose)
while(!list_empty(dispose)) { nf = list_first_entry(dispose, struct nfsd_file, nf_lru); - list_del(&nf->nf_lru); + list_del_init(&nf->nf_lru); nfsd_file_flush(nf); nfsd_file_put_noref(nf); } @@ -356,7 +364,7 @@ nfsd_file_dispose_list_sync(struct list_head *dispose)
while(!list_empty(dispose)) { nf = list_first_entry(dispose, struct nfsd_file, nf_lru); - list_del(&nf->nf_lru); + list_del_init(&nf->nf_lru); nfsd_file_flush(nf); if (!refcount_dec_and_test(&nf->nf_ref)) continue;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit c46203acddd9b9200dbc53d0603c97355fd3a03b ]
Observe the operation of garbage collection and the lifetime of filecache items.
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 44 +++++++++++++++++++++++++++++++------------- fs/nfsd/trace.h | 39 +++++++++++++++++++++++++++++++++++++++ 2 files changed, 70 insertions(+), 13 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index d9b5f1e183976..a995a744a7481 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -260,6 +260,18 @@ nfsd_file_flush(struct nfsd_file *nf) nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id)); }
+static void nfsd_file_lru_add(struct nfsd_file *nf) +{ + if (list_lru_add(&nfsd_file_lru, &nf->nf_lru)) + trace_nfsd_file_lru_add(nf); +} + +static void nfsd_file_lru_remove(struct nfsd_file *nf) +{ + if (list_lru_del(&nfsd_file_lru, &nf->nf_lru)) + trace_nfsd_file_lru_del(nf); +} + static void nfsd_file_do_unhash(struct nfsd_file *nf) { @@ -279,8 +291,7 @@ nfsd_file_unhash(struct nfsd_file *nf) { if (test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags)) { nfsd_file_do_unhash(nf); - if (!list_empty(&nf->nf_lru)) - list_lru_del(&nfsd_file_lru, &nf->nf_lru); + nfsd_file_lru_remove(nf); return true; } return false; @@ -443,27 +454,34 @@ nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru, * counter. Here we check the counter and then test and clear the flag. * That order is deliberate to ensure that we can do this locklessly. */ - if (refcount_read(&nf->nf_ref) > 1) - goto out_skip; + if (refcount_read(&nf->nf_ref) > 1) { + trace_nfsd_file_gc_in_use(nf); + return LRU_SKIP; + }
/* * Don't throw out files that are still undergoing I/O or * that have uncleared errors pending. */ - if (nfsd_file_check_writeback(nf)) - goto out_skip; + if (nfsd_file_check_writeback(nf)) { + trace_nfsd_file_gc_writeback(nf); + return LRU_SKIP; + }
- if (test_and_clear_bit(NFSD_FILE_REFERENCED, &nf->nf_flags)) - goto out_skip; + if (test_and_clear_bit(NFSD_FILE_REFERENCED, &nf->nf_flags)) { + trace_nfsd_file_gc_referenced(nf); + return LRU_SKIP; + }
- if (!test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags)) - goto out_skip; + if (!test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags)) { + trace_nfsd_file_gc_hashed(nf); + return LRU_SKIP; + }
list_lru_isolate_move(lru, &nf->nf_lru, head); this_cpu_inc(nfsd_file_evictions); + trace_nfsd_file_gc_disposed(nf); return LRU_REMOVED; -out_skip: - return LRU_SKIP; }
/* @@ -1016,7 +1034,7 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, refcount_inc(&nf->nf_ref); __set_bit(NFSD_FILE_HASHED, &nf->nf_flags); __set_bit(NFSD_FILE_PENDING, &nf->nf_flags); - list_lru_add(&nfsd_file_lru, &nf->nf_lru); + nfsd_file_lru_add(nf); hlist_add_head_rcu(&nf->nf_node, &nfsd_file_hashtbl[hashval].nfb_head); ++nfsd_file_hashtbl[hashval].nfb_count; nfsd_file_hashtbl[hashval].nfb_maxcount = max(nfsd_file_hashtbl[hashval].nfb_maxcount, diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index b373a161e862b..b1aa28c062ac5 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -860,6 +860,45 @@ TRACE_EVENT(nfsd_file_fsnotify_handle_event, __entry->nlink, __entry->mode, __entry->mask) );
+DECLARE_EVENT_CLASS(nfsd_file_gc_class, + TP_PROTO( + const struct nfsd_file *nf + ), + TP_ARGS(nf), + TP_STRUCT__entry( + __field(void *, nf_inode) + __field(void *, nf_file) + __field(int, nf_ref) + __field(unsigned long, nf_flags) + ), + TP_fast_assign( + __entry->nf_inode = nf->nf_inode; + __entry->nf_file = nf->nf_file; + __entry->nf_ref = refcount_read(&nf->nf_ref); + __entry->nf_flags = nf->nf_flags; + ), + TP_printk("inode=%p ref=%d nf_flags=%s nf_file=%p", + __entry->nf_inode, __entry->nf_ref, + show_nf_flags(__entry->nf_flags), + __entry->nf_file + ) +); + +#define DEFINE_NFSD_FILE_GC_EVENT(name) \ +DEFINE_EVENT(nfsd_file_gc_class, name, \ + TP_PROTO( \ + const struct nfsd_file *nf \ + ), \ + TP_ARGS(nf)) + +DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_add); +DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_del); +DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_in_use); +DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_writeback); +DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_referenced); +DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_hashed); +DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_disposed); + DECLARE_EVENT_CLASS(nfsd_file_lruwalk_class, TP_PROTO( unsigned long removed,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 4a0e73e635e3f36b616ad5c943e3d23debe4632f ]
There have been reports of problems when running fstests generic/531 against Linux NFS servers with NFSv4. The NFS server that hosts the test's SCRATCH_DEV suffers from CPU soft lock-ups during the test. Analysis shows that:
fs/nfsd/filecache.c 482 ret = list_lru_walk(&nfsd_file_lru, 483 nfsd_file_lru_cb, 484 &head, LONG_MAX);
causes nfsd_file_gc() to walk the entire length of the filecache LRU list every time it is called (which is quite frequently). The walk holds a spinlock the entire time that prevents other nfsd threads from accessing the filecache.
What's more, for NFSv4 workloads, none of the items that are visited during this walk may be evicted, since they are all files that are held OPEN by NFS clients.
Address this by ensuring that open files are not kept on the LRU list.
Reported-by: Frank van der Linden fllinden@amazon.com Reported-by: Wang Yugui wangyugui@e16-tech.com Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=386 Suggested-by: Trond Myklebust trond.myklebust@hammerspace.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 24 +++++++++++++++++++----- fs/nfsd/trace.h | 2 ++ 2 files changed, 21 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index a995a744a7481..5c9e3ff6397b0 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -262,6 +262,7 @@ nfsd_file_flush(struct nfsd_file *nf)
static void nfsd_file_lru_add(struct nfsd_file *nf) { + set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags); if (list_lru_add(&nfsd_file_lru, &nf->nf_lru)) trace_nfsd_file_lru_add(nf); } @@ -291,7 +292,6 @@ nfsd_file_unhash(struct nfsd_file *nf) { if (test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags)) { nfsd_file_do_unhash(nf); - nfsd_file_lru_remove(nf); return true; } return false; @@ -312,6 +312,7 @@ nfsd_file_unhash_and_release_locked(struct nfsd_file *nf, struct list_head *disp if (refcount_dec_not_one(&nf->nf_ref)) return true;
+ nfsd_file_lru_remove(nf); list_add(&nf->nf_lru, dispose); return true; } @@ -323,6 +324,7 @@ nfsd_file_put_noref(struct nfsd_file *nf)
if (refcount_dec_and_test(&nf->nf_ref)) { WARN_ON(test_bit(NFSD_FILE_HASHED, &nf->nf_flags)); + nfsd_file_lru_remove(nf); nfsd_file_free(nf); } } @@ -332,7 +334,7 @@ nfsd_file_put(struct nfsd_file *nf) { might_sleep();
- set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags); + nfsd_file_lru_add(nf); if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) { nfsd_file_flush(nf); nfsd_file_put_noref(nf); @@ -432,8 +434,18 @@ nfsd_file_dispose_list_delayed(struct list_head *dispose) } }
-/* +/** + * nfsd_file_lru_cb - Examine an entry on the LRU list + * @item: LRU entry to examine + * @lru: controlling LRU + * @lock: LRU list lock (unused) + * @arg: dispose list + * * Note this can deadlock with nfsd_file_cache_purge. + * + * Return values: + * %LRU_REMOVED: @item was removed from the LRU + * %LRU_SKIP: @item cannot be evicted */ static enum lru_status nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru, @@ -455,8 +467,9 @@ nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru, * That order is deliberate to ensure that we can do this locklessly. */ if (refcount_read(&nf->nf_ref) > 1) { + list_lru_isolate(lru, &nf->nf_lru); trace_nfsd_file_gc_in_use(nf); - return LRU_SKIP; + return LRU_REMOVED; }
/* @@ -1013,6 +1026,7 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, goto retry; }
+ nfsd_file_lru_remove(nf); this_cpu_inc(nfsd_file_cache_hits);
status = nfserrno(nfsd_open_break_lease(file_inode(nf->nf_file), may_flags)); @@ -1034,7 +1048,6 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, refcount_inc(&nf->nf_ref); __set_bit(NFSD_FILE_HASHED, &nf->nf_flags); __set_bit(NFSD_FILE_PENDING, &nf->nf_flags); - nfsd_file_lru_add(nf); hlist_add_head_rcu(&nf->nf_node, &nfsd_file_hashtbl[hashval].nfb_head); ++nfsd_file_hashtbl[hashval].nfb_count; nfsd_file_hashtbl[hashval].nfb_maxcount = max(nfsd_file_hashtbl[hashval].nfb_maxcount, @@ -1059,6 +1072,7 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, */ if (status != nfs_ok || inode->i_nlink == 0) { bool do_free; + nfsd_file_lru_remove(nf); spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock); do_free = nfsd_file_unhash(nf); spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock); diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index b1aa28c062ac5..5919cdcf137b8 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -892,7 +892,9 @@ DEFINE_EVENT(nfsd_file_gc_class, name, \ TP_ARGS(nf))
DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_add); +DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_add_disposed); DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_del); +DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_del_disposed); DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_in_use); DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_writeback); DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_referenced);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit edead3a55804739b2e4af0f35e9c7326264e7b22 ]
Without LRU item rotation, the shrinker visits only a few items on the end of the LRU list, and those would always be long-term OPEN files for NFSv4 workloads. That makes the filecache shrinker completely ineffective.
Adopt the same strategy as the inode LRU by using LRU_ROTATE.
Suggested-by: Dave Chinner david@fromorbit.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 5c9e3ff6397b0..849c010c6ef61 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -445,6 +445,7 @@ nfsd_file_dispose_list_delayed(struct list_head *dispose) * * Return values: * %LRU_REMOVED: @item was removed from the LRU + * %LRU_ROTATE: @item is to be moved to the LRU tail * %LRU_SKIP: @item cannot be evicted */ static enum lru_status @@ -483,7 +484,7 @@ nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru,
if (test_and_clear_bit(NFSD_FILE_REFERENCED, &nf->nf_flags)) { trace_nfsd_file_gc_referenced(nf); - return LRU_SKIP; + return LRU_ROTATE; }
if (!test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags)) { @@ -525,7 +526,7 @@ nfsd_file_gc(void) unsigned long ret;
ret = list_lru_walk(&nfsd_file_lru, nfsd_file_lru_cb, - &dispose, LONG_MAX); + &dispose, list_lru_count(&nfsd_file_lru)); trace_nfsd_file_gc_removed(ret, list_lru_count(&nfsd_file_lru)); nfsd_file_gc_dispose_list(&dispose); }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 6df19411367a5fb4ef61854cbd1af269c077f917 ]
The checks in nfsd_file_acquire() and nfsd_file_put() that directly invoke filecache garbage collection are intended to keep cache occupancy between a low- and high-watermark. The reason to limit the capacity of the filecache is to keep filecache lookups reasonably fast.
However, invoking garbage collection at those points has some undesirable negative impacts. Files that are held open by NFSv4 clients often push the occupancy of the filecache over these watermarks. At that point:
- Every call to nfsd_file_acquire() and nfsd_file_put() results in an LRU walk. This has the same effect on lookup latency as long chains in the hash table. - Garbage collection will then run on every nfsd thread, causing a lot of unnecessary lock contention. - Limiting cache capacity pushes out files used only by NFSv3 clients, which are the type of files the filecache is supposed to help.
To address those negative impacts, remove the direct calls to the garbage collector. Subsequent patches will address maintaining lookup efficiency as cache capacity increases.
Suggested-by: Wang Yugui wangyugui@e16-tech.com Suggested-by: Dave Chinner david@fromorbit.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 10 +--------- 1 file changed, 1 insertion(+), 9 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 849c010c6ef61..7a02ff11b9ec1 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -28,8 +28,6 @@ #define NFSD_LAUNDRETTE_DELAY (2 * HZ)
#define NFSD_FILE_SHUTDOWN (1) -#define NFSD_FILE_LRU_THRESHOLD (4096UL) -#define NFSD_FILE_LRU_LIMIT (NFSD_FILE_LRU_THRESHOLD << 2)
/* We only care about NFSD_MAY_READ/WRITE for this cache */ #define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE) @@ -65,8 +63,6 @@ static struct fsnotify_group *nfsd_file_fsnotify_group; static atomic_long_t nfsd_filecache_count; static struct delayed_work nfsd_filecache_laundrette;
-static void nfsd_file_gc(void); - static void nfsd_file_schedule_laundrette(void) { @@ -343,9 +339,6 @@ nfsd_file_put(struct nfsd_file *nf) nfsd_file_schedule_laundrette(); } else nfsd_file_put_noref(nf); - - if (atomic_long_read(&nfsd_filecache_count) >= NFSD_FILE_LRU_LIMIT) - nfsd_file_gc(); }
struct nfsd_file * @@ -1054,8 +1047,7 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, nfsd_file_hashtbl[hashval].nfb_maxcount = max(nfsd_file_hashtbl[hashval].nfb_maxcount, nfsd_file_hashtbl[hashval].nfb_count); spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock); - if (atomic_long_inc_return(&nfsd_filecache_count) >= NFSD_FILE_LRU_THRESHOLD) - nfsd_file_gc(); + atomic_long_inc(&nfsd_filecache_count);
nf->nf_mark = nfsd_file_mark_find_or_create(nf); if (nf->nf_mark) {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 54f7df7094b329ca35d9f9808692bb16c48b13e9 ]
I'm about to replace nfsd_file_hashtbl with an rhashtable. The individual hash values will no longer be visible or relevant, so remove them from the tracepoints.
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 15 ++++++++------- fs/nfsd/trace.h | 45 +++++++++++++++++++++------------------------ 2 files changed, 29 insertions(+), 31 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 7a02ff11b9ec1..2d013a88e3565 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -588,7 +588,7 @@ nfsd_file_close_inode_sync(struct inode *inode) LIST_HEAD(dispose);
__nfsd_file_close_inode(inode, hashval, &dispose); - trace_nfsd_file_close_inode_sync(inode, hashval, !list_empty(&dispose)); + trace_nfsd_file_close_inode_sync(inode, !list_empty(&dispose)); nfsd_file_dispose_list_sync(&dispose); }
@@ -608,7 +608,7 @@ nfsd_file_close_inode(struct inode *inode) LIST_HEAD(dispose);
__nfsd_file_close_inode(inode, hashval, &dispose); - trace_nfsd_file_close_inode(inode, hashval, !list_empty(&dispose)); + trace_nfsd_file_close_inode(inode, !list_empty(&dispose)); nfsd_file_dispose_list_delayed(&dispose); }
@@ -962,7 +962,7 @@ nfsd_file_is_cached(struct inode *inode) } } rcu_read_unlock(); - trace_nfsd_file_is_cached(inode, hashval, (int)ret); + trace_nfsd_file_is_cached(inode, (int)ret); return ret; }
@@ -994,9 +994,8 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
new = nfsd_file_alloc(inode, may_flags, hashval, net); if (!new) { - trace_nfsd_file_acquire(rqstp, hashval, inode, may_flags, - NULL, nfserr_jukebox); - return nfserr_jukebox; + status = nfserr_jukebox; + goto out_status; }
spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock); @@ -1034,8 +1033,10 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, nf = NULL; }
- trace_nfsd_file_acquire(rqstp, hashval, inode, may_flags, nf, status); +out_status: + trace_nfsd_file_acquire(rqstp, inode, may_flags, nf, status); return status; + open_file: nf = new; /* Take reference for the hashtable */ diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 5919cdcf137b8..8b34f2a5ad296 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -713,7 +713,6 @@ DECLARE_EVENT_CLASS(nfsd_file_class, TP_PROTO(struct nfsd_file *nf), TP_ARGS(nf), TP_STRUCT__entry( - __field(unsigned int, nf_hashval) __field(void *, nf_inode) __field(int, nf_ref) __field(unsigned long, nf_flags) @@ -721,15 +720,13 @@ DECLARE_EVENT_CLASS(nfsd_file_class, __field(struct file *, nf_file) ), TP_fast_assign( - __entry->nf_hashval = nf->nf_hashval; __entry->nf_inode = nf->nf_inode; __entry->nf_ref = refcount_read(&nf->nf_ref); __entry->nf_flags = nf->nf_flags; __entry->nf_may = nf->nf_may; __entry->nf_file = nf->nf_file; ), - TP_printk("hash=0x%x inode=%p ref=%d flags=%s may=%s file=%p", - __entry->nf_hashval, + TP_printk("inode=%p ref=%d flags=%s may=%s nf_file=%p", __entry->nf_inode, __entry->nf_ref, show_nf_flags(__entry->nf_flags), @@ -749,15 +746,18 @@ DEFINE_NFSD_FILE_EVENT(nfsd_file_put); DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash_and_release_locked);
TRACE_EVENT(nfsd_file_acquire, - TP_PROTO(struct svc_rqst *rqstp, unsigned int hash, - struct inode *inode, unsigned int may_flags, - struct nfsd_file *nf, __be32 status), + TP_PROTO( + struct svc_rqst *rqstp, + struct inode *inode, + unsigned int may_flags, + struct nfsd_file *nf, + __be32 status + ),
- TP_ARGS(rqstp, hash, inode, may_flags, nf, status), + TP_ARGS(rqstp, inode, may_flags, nf, status),
TP_STRUCT__entry( __field(u32, xid) - __field(unsigned int, hash) __field(void *, inode) __field(unsigned long, may_flags) __field(int, nf_ref) @@ -769,7 +769,6 @@ TRACE_EVENT(nfsd_file_acquire,
TP_fast_assign( __entry->xid = be32_to_cpu(rqstp->rq_xid); - __entry->hash = hash; __entry->inode = inode; __entry->may_flags = may_flags; __entry->nf_ref = nf ? refcount_read(&nf->nf_ref) : 0; @@ -779,8 +778,8 @@ TRACE_EVENT(nfsd_file_acquire, __entry->status = be32_to_cpu(status); ),
- TP_printk("xid=0x%x hash=0x%x inode=%p may_flags=%s ref=%d nf_flags=%s nf_may=%s nf_file=%p status=%u", - __entry->xid, __entry->hash, __entry->inode, + TP_printk("xid=0x%x inode=%p may_flags=%s ref=%d nf_flags=%s nf_may=%s nf_file=%p status=%u", + __entry->xid, __entry->inode, show_nfsd_may_flags(__entry->may_flags), __entry->nf_ref, show_nf_flags(__entry->nf_flags), show_nfsd_may_flags(__entry->nf_may), @@ -791,7 +790,6 @@ TRACE_EVENT(nfsd_file_open, TP_PROTO(struct nfsd_file *nf, __be32 status), TP_ARGS(nf, status), TP_STRUCT__entry( - __field(unsigned int, nf_hashval) __field(void *, nf_inode) /* cannot be dereferenced */ __field(int, nf_ref) __field(unsigned long, nf_flags) @@ -799,15 +797,13 @@ TRACE_EVENT(nfsd_file_open, __field(void *, nf_file) /* cannot be dereferenced */ ), TP_fast_assign( - __entry->nf_hashval = nf->nf_hashval; __entry->nf_inode = nf->nf_inode; __entry->nf_ref = refcount_read(&nf->nf_ref); __entry->nf_flags = nf->nf_flags; __entry->nf_may = nf->nf_may; __entry->nf_file = nf->nf_file; ), - TP_printk("hash=0x%x inode=%p ref=%d flags=%s may=%s file=%p", - __entry->nf_hashval, + TP_printk("inode=%p ref=%d flags=%s may=%s file=%p", __entry->nf_inode, __entry->nf_ref, show_nf_flags(__entry->nf_flags), @@ -816,26 +812,27 @@ TRACE_EVENT(nfsd_file_open, )
DECLARE_EVENT_CLASS(nfsd_file_search_class, - TP_PROTO(struct inode *inode, unsigned int hash, int found), - TP_ARGS(inode, hash, found), + TP_PROTO( + struct inode *inode, + int found + ), + TP_ARGS(inode, found), TP_STRUCT__entry( __field(struct inode *, inode) - __field(unsigned int, hash) __field(int, found) ), TP_fast_assign( __entry->inode = inode; - __entry->hash = hash; __entry->found = found; ), - TP_printk("hash=0x%x inode=%p found=%d", __entry->hash, - __entry->inode, __entry->found) + TP_printk("inode=%p found=%d", + __entry->inode, __entry->found) );
#define DEFINE_NFSD_FILE_SEARCH_EVENT(name) \ DEFINE_EVENT(nfsd_file_search_class, name, \ - TP_PROTO(struct inode *inode, unsigned int hash, int found), \ - TP_ARGS(inode, hash, found)) + TP_PROTO(struct inode *inode, int found), \ + TP_ARGS(inode, found))
DEFINE_NFSD_FILE_SEARCH_EVENT(nfsd_file_close_inode_sync); DEFINE_NFSD_FILE_SEARCH_EVENT(nfsd_file_close_inode);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit f53cef15dddec7203df702cdc62e554190385450 ]
IIUC, holding the hash bucket lock is needed only in nfsd_file_unhash, and there is already a lockdep assertion there.
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 2 -- 1 file changed, 2 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 2d013a88e3565..6a01de8677959 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -299,8 +299,6 @@ nfsd_file_unhash(struct nfsd_file *nf) static bool nfsd_file_unhash_and_release_locked(struct nfsd_file *nf, struct list_head *dispose) { - lockdep_assert_held(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock); - trace_nfsd_file_unhash_and_release_locked(nf); if (!nfsd_file_unhash(nf)) return false;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 8755326399f471ec3b31e2ab8c5074c0d28a0fb5 ]
Remove an unnecessary usage of nf_hashval.
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 6a01de8677959..d7c74b51eabf3 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -272,13 +272,17 @@ static void nfsd_file_lru_remove(struct nfsd_file *nf) static void nfsd_file_do_unhash(struct nfsd_file *nf) { - lockdep_assert_held(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock); + struct inode *inode = nf->nf_inode; + unsigned int hashval = (unsigned int)hash_long(inode->i_ino, + NFSD_FILE_HASH_BITS); + + lockdep_assert_held(&nfsd_file_hashtbl[hashval].nfb_lock);
trace_nfsd_file_unhash(nf);
if (nfsd_file_check_write_error(nf)) nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id)); - --nfsd_file_hashtbl[nf->nf_hashval].nfb_count; + --nfsd_file_hashtbl[hashval].nfb_count; hlist_del_rcu(&nf->nf_node); atomic_long_dec(&nfsd_filecache_count); }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit a845511007a63467fee575353c706806c21218b1 ]
The code that computes the hashval is the same in both callers.
To prevent them from going stale, reframe the documenting comments to remove descriptions of the underlying hash table structure, which is about to be replaced.
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 40 +++++++++++++++++++++------------------- fs/nfsd/trace.h | 44 +++++++++++++++++++++++++++++++++----------- 2 files changed, 54 insertions(+), 30 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index d7c74b51eabf3..3925df9124c39 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -558,39 +558,44 @@ static struct shrinker nfsd_file_shrinker = { .seeks = 1, };
-static void -__nfsd_file_close_inode(struct inode *inode, unsigned int hashval, - struct list_head *dispose) +/* + * Find all cache items across all net namespaces that match @inode and + * move them to @dispose. The lookup is atomic wrt nfsd_file_acquire(). + */ +static unsigned int +__nfsd_file_close_inode(struct inode *inode, struct list_head *dispose) { + unsigned int hashval = (unsigned int)hash_long(inode->i_ino, + NFSD_FILE_HASH_BITS); + unsigned int count = 0; struct nfsd_file *nf; struct hlist_node *tmp;
spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock); hlist_for_each_entry_safe(nf, tmp, &nfsd_file_hashtbl[hashval].nfb_head, nf_node) { - if (inode == nf->nf_inode) + if (inode == nf->nf_inode) { nfsd_file_unhash_and_release_locked(nf, dispose); + count++; + } } spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock); + return count; }
/** * nfsd_file_close_inode_sync - attempt to forcibly close a nfsd_file * @inode: inode of the file to attempt to remove * - * Walk the whole hash bucket, looking for any files that correspond to "inode". - * If any do, then unhash them and put the hashtable reference to them and - * destroy any that had their last reference put. Also ensure that any of the - * fputs also have their final __fput done as well. + * Unhash and put, then flush and fput all cache items associated with @inode. */ void nfsd_file_close_inode_sync(struct inode *inode) { - unsigned int hashval = (unsigned int)hash_long(inode->i_ino, - NFSD_FILE_HASH_BITS); LIST_HEAD(dispose); + unsigned int count;
- __nfsd_file_close_inode(inode, hashval, &dispose); - trace_nfsd_file_close_inode_sync(inode, !list_empty(&dispose)); + count = __nfsd_file_close_inode(inode, &dispose); + trace_nfsd_file_close_inode_sync(inode, count); nfsd_file_dispose_list_sync(&dispose); }
@@ -598,19 +603,16 @@ nfsd_file_close_inode_sync(struct inode *inode) * nfsd_file_close_inode - attempt a delayed close of a nfsd_file * @inode: inode of the file to attempt to remove * - * Walk the whole hash bucket, looking for any files that correspond to "inode". - * If any do, then unhash them and put the hashtable reference to them and - * destroy any that had their last reference put. + * Unhash and put all cache item associated with @inode. */ static void nfsd_file_close_inode(struct inode *inode) { - unsigned int hashval = (unsigned int)hash_long(inode->i_ino, - NFSD_FILE_HASH_BITS); LIST_HEAD(dispose); + unsigned int count;
- __nfsd_file_close_inode(inode, hashval, &dispose); - trace_nfsd_file_close_inode(inode, !list_empty(&dispose)); + count = __nfsd_file_close_inode(inode, &dispose); + trace_nfsd_file_close_inode(inode, count); nfsd_file_dispose_list_delayed(&dispose); }
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 8b34f2a5ad296..f170f07ec0fd2 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -813,30 +813,52 @@ TRACE_EVENT(nfsd_file_open,
DECLARE_EVENT_CLASS(nfsd_file_search_class, TP_PROTO( - struct inode *inode, - int found + const struct inode *inode, + unsigned int count ), - TP_ARGS(inode, found), + TP_ARGS(inode, count), TP_STRUCT__entry( - __field(struct inode *, inode) - __field(int, found) + __field(const struct inode *, inode) + __field(unsigned int, count) ), TP_fast_assign( __entry->inode = inode; - __entry->found = found; + __entry->count = count; ), - TP_printk("inode=%p found=%d", - __entry->inode, __entry->found) + TP_printk("inode=%p count=%u", + __entry->inode, __entry->count) );
#define DEFINE_NFSD_FILE_SEARCH_EVENT(name) \ DEFINE_EVENT(nfsd_file_search_class, name, \ - TP_PROTO(struct inode *inode, int found), \ - TP_ARGS(inode, found)) + TP_PROTO( \ + const struct inode *inode, \ + unsigned int count \ + ), \ + TP_ARGS(inode, count))
DEFINE_NFSD_FILE_SEARCH_EVENT(nfsd_file_close_inode_sync); DEFINE_NFSD_FILE_SEARCH_EVENT(nfsd_file_close_inode); -DEFINE_NFSD_FILE_SEARCH_EVENT(nfsd_file_is_cached); + +TRACE_EVENT(nfsd_file_is_cached, + TP_PROTO( + const struct inode *inode, + int found + ), + TP_ARGS(inode, found), + TP_STRUCT__entry( + __field(const struct inode *, inode) + __field(int, found) + ), + TP_fast_assign( + __entry->inode = inode; + __entry->found = found; + ), + TP_printk("inode=%p is %scached", + __entry->inode, + __entry->found ? "" : "not " + ) +);
TRACE_EVENT(nfsd_file_fsnotify_handle_event, TP_PROTO(struct inode *inode, u32 mask),
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit cb7ec76e73ff6640241c8f1f2f35c81d4005a2d6 ]
Remove an unnecessary use of nf_hashval.
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 3925df9124c39..dd59deec8b011 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -287,6 +287,18 @@ nfsd_file_do_unhash(struct nfsd_file *nf) atomic_long_dec(&nfsd_filecache_count); }
+static void +nfsd_file_hash_remove(struct nfsd_file *nf) +{ + struct inode *inode = nf->nf_inode; + unsigned int hashval = (unsigned int)hash_long(inode->i_ino, + NFSD_FILE_HASH_BITS); + + spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock); + nfsd_file_do_unhash(nf); + spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock); +} + static bool nfsd_file_unhash(struct nfsd_file *nf) { @@ -506,11 +518,8 @@ static void nfsd_file_gc_dispose_list(struct list_head *dispose) { struct nfsd_file *nf;
- list_for_each_entry(nf, dispose, nf_lru) { - spin_lock(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock); - nfsd_file_do_unhash(nf); - spin_unlock(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock); - } + list_for_each_entry(nf, dispose, nf_lru) + nfsd_file_hash_remove(nf); nfsd_file_dispose_list_delayed(dispose); }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit f0743c2b25c65debd4f599a7c861428cd9de5906 ]
The value in this field can always be computed from nf_inode, thus it is no longer used.
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 6 ++---- fs/nfsd/filecache.h | 1 - 2 files changed, 2 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index dd59deec8b011..29b1f57692a60 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -167,8 +167,7 @@ nfsd_file_mark_find_or_create(struct nfsd_file *nf) }
static struct nfsd_file * -nfsd_file_alloc(struct inode *inode, unsigned int may, unsigned int hashval, - struct net *net) +nfsd_file_alloc(struct inode *inode, unsigned int may, struct net *net) { struct nfsd_file *nf;
@@ -182,7 +181,6 @@ nfsd_file_alloc(struct inode *inode, unsigned int may, unsigned int hashval, nf->nf_net = net; nf->nf_flags = 0; nf->nf_inode = inode; - nf->nf_hashval = hashval; refcount_set(&nf->nf_ref, 1); nf->nf_may = may & NFSD_FILE_MAY_MASK; nf->nf_mark = NULL; @@ -1005,7 +1003,7 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, if (nf) goto wait_for_construction;
- new = nfsd_file_alloc(inode, may_flags, hashval, net); + new = nfsd_file_alloc(inode, may_flags, net); if (!new) { status = nfserr_jukebox; goto out_status; diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h index c6ad5fe47f12f..82051e1b8420d 100644 --- a/fs/nfsd/filecache.h +++ b/fs/nfsd/filecache.h @@ -40,7 +40,6 @@ struct nfsd_file { #define NFSD_FILE_REFERENCED (2) unsigned long nf_flags; struct inode *nf_inode; - unsigned int nf_hashval; refcount_t nf_ref; unsigned char nf_may; struct nfsd_file_mark *nf_mark;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit c7b824c3d06c85e054caf86e227255112c5e3c38 ]
In a moment, the nfsd_file_hashtbl global will be replaced with an rhashtable. Replace the one or two spots that need to check if the hash table is available. We can easily reuse the SHUTDOWN flag for this purpose.
Document that this mechanism relies on callers to hold the nfsd_mutex to prevent init, shutdown, and purging to run concurrently.
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 42 ++++++++++++++++++++++++++---------------- 1 file changed, 26 insertions(+), 16 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 29b1f57692a60..33bb4d31b4972 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -27,7 +27,7 @@ #define NFSD_FILE_HASH_SIZE (1 << NFSD_FILE_HASH_BITS) #define NFSD_LAUNDRETTE_DELAY (2 * HZ)
-#define NFSD_FILE_SHUTDOWN (1) +#define NFSD_FILE_CACHE_UP (0)
/* We only care about NFSD_MAY_READ/WRITE for this cache */ #define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE) @@ -58,7 +58,7 @@ static struct kmem_cache *nfsd_file_slab; static struct kmem_cache *nfsd_file_mark_slab; static struct nfsd_fcache_bucket *nfsd_file_hashtbl; static struct list_lru nfsd_file_lru; -static long nfsd_file_lru_flags; +static unsigned long nfsd_file_flags; static struct fsnotify_group *nfsd_file_fsnotify_group; static atomic_long_t nfsd_filecache_count; static struct delayed_work nfsd_filecache_laundrette; @@ -66,9 +66,8 @@ static struct delayed_work nfsd_filecache_laundrette; static void nfsd_file_schedule_laundrette(void) { - long count = atomic_long_read(&nfsd_filecache_count); - - if (count == 0 || test_bit(NFSD_FILE_SHUTDOWN, &nfsd_file_lru_flags)) + if ((atomic_long_read(&nfsd_filecache_count) == 0) || + test_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 0) return;
queue_delayed_work(system_wq, &nfsd_filecache_laundrette, @@ -697,9 +696,8 @@ nfsd_file_cache_init(void) int ret = -ENOMEM; unsigned int i;
- clear_bit(NFSD_FILE_SHUTDOWN, &nfsd_file_lru_flags); - - if (nfsd_file_hashtbl) + lockdep_assert_held(&nfsd_mutex); + if (test_and_set_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 1) return 0;
nfsd_filecache_wq = alloc_workqueue("nfsd_filecache", 0, 0); @@ -785,8 +783,8 @@ nfsd_file_cache_init(void) /* * Note this can deadlock with nfsd_file_lru_cb. */ -void -nfsd_file_cache_purge(struct net *net) +static void +__nfsd_file_cache_purge(struct net *net) { unsigned int i; struct nfsd_file *nf; @@ -794,9 +792,6 @@ nfsd_file_cache_purge(struct net *net) LIST_HEAD(dispose); bool del;
- if (!nfsd_file_hashtbl) - return; - for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) { struct nfsd_fcache_bucket *nfb = &nfsd_file_hashtbl[i];
@@ -857,6 +852,19 @@ nfsd_file_cache_start_net(struct net *net) return nn->fcache_disposal ? 0 : -ENOMEM; }
+/** + * nfsd_file_cache_purge - Remove all cache items associated with @net + * @net: target net namespace + * + */ +void +nfsd_file_cache_purge(struct net *net) +{ + lockdep_assert_held(&nfsd_mutex); + if (test_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 1) + __nfsd_file_cache_purge(net); +} + void nfsd_file_cache_shutdown_net(struct net *net) { @@ -869,7 +877,9 @@ nfsd_file_cache_shutdown(void) { int i;
- set_bit(NFSD_FILE_SHUTDOWN, &nfsd_file_lru_flags); + lockdep_assert_held(&nfsd_mutex); + if (test_and_clear_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 0) + return;
lease_unregister_notifier(&nfsd_file_lease_notifier); unregister_shrinker(&nfsd_file_shrinker); @@ -878,7 +888,7 @@ nfsd_file_cache_shutdown(void) * calling nfsd_file_cache_purge */ cancel_delayed_work_sync(&nfsd_filecache_laundrette); - nfsd_file_cache_purge(NULL); + __nfsd_file_cache_purge(NULL); list_lru_destroy(&nfsd_file_lru); rcu_barrier(); fsnotify_put_group(nfsd_file_fsnotify_group); @@ -1142,7 +1152,7 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) * don't end up racing with server shutdown */ mutex_lock(&nfsd_mutex); - if (nfsd_file_hashtbl) { + if (test_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 1) { for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) { count += nfsd_file_hashtbl[i].nfb_count; longest = max(longest, nfsd_file_hashtbl[i].nfb_count);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit fc22945ecc2a0a028f3683115f98a922d506c284 ]
Add code to initialize and tear down an rhashtable. The rhashtable is not used yet.
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 160 ++++++++++++++++++++++++++++++++++++++------ fs/nfsd/filecache.h | 1 + 2 files changed, 140 insertions(+), 21 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 33bb4d31b4972..95e7e15b567e2 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -12,6 +12,7 @@ #include <linux/fsnotify_backend.h> #include <linux/fsnotify.h> #include <linux/seq_file.h> +#include <linux/rhashtable.h>
#include "vfs.h" #include "nfsd.h" @@ -62,6 +63,136 @@ static unsigned long nfsd_file_flags; static struct fsnotify_group *nfsd_file_fsnotify_group; static atomic_long_t nfsd_filecache_count; static struct delayed_work nfsd_filecache_laundrette; +static struct rhashtable nfsd_file_rhash_tbl + ____cacheline_aligned_in_smp; + +enum nfsd_file_lookup_type { + NFSD_FILE_KEY_INODE, + NFSD_FILE_KEY_FULL, +}; + +struct nfsd_file_lookup_key { + struct inode *inode; + struct net *net; + const struct cred *cred; + unsigned char need; + enum nfsd_file_lookup_type type; +}; + +/* + * The returned hash value is based solely on the address of an in-code + * inode, a pointer to a slab-allocated object. The entropy in such a + * pointer is concentrated in its middle bits. + */ +static u32 nfsd_file_inode_hash(const struct inode *inode, u32 seed) +{ + unsigned long ptr = (unsigned long)inode; + u32 k; + + k = ptr >> L1_CACHE_SHIFT; + k &= 0x00ffffff; + return jhash2(&k, 1, seed); +} + +/** + * nfsd_file_key_hashfn - Compute the hash value of a lookup key + * @data: key on which to compute the hash value + * @len: rhash table's key_len parameter (unused) + * @seed: rhash table's random seed of the day + * + * Return value: + * Computed 32-bit hash value + */ +static u32 nfsd_file_key_hashfn(const void *data, u32 len, u32 seed) +{ + const struct nfsd_file_lookup_key *key = data; + + return nfsd_file_inode_hash(key->inode, seed); +} + +/** + * nfsd_file_obj_hashfn - Compute the hash value of an nfsd_file + * @data: object on which to compute the hash value + * @len: rhash table's key_len parameter (unused) + * @seed: rhash table's random seed of the day + * + * Return value: + * Computed 32-bit hash value + */ +static u32 nfsd_file_obj_hashfn(const void *data, u32 len, u32 seed) +{ + const struct nfsd_file *nf = data; + + return nfsd_file_inode_hash(nf->nf_inode, seed); +} + +static bool +nfsd_match_cred(const struct cred *c1, const struct cred *c2) +{ + int i; + + if (!uid_eq(c1->fsuid, c2->fsuid)) + return false; + if (!gid_eq(c1->fsgid, c2->fsgid)) + return false; + if (c1->group_info == NULL || c2->group_info == NULL) + return c1->group_info == c2->group_info; + if (c1->group_info->ngroups != c2->group_info->ngroups) + return false; + for (i = 0; i < c1->group_info->ngroups; i++) { + if (!gid_eq(c1->group_info->gid[i], c2->group_info->gid[i])) + return false; + } + return true; +} + +/** + * nfsd_file_obj_cmpfn - Match a cache item against search criteria + * @arg: search criteria + * @ptr: cache item to check + * + * Return values: + * %0 - Item matches search criteria + * %1 - Item does not match search criteria + */ +static int nfsd_file_obj_cmpfn(struct rhashtable_compare_arg *arg, + const void *ptr) +{ + const struct nfsd_file_lookup_key *key = arg->key; + const struct nfsd_file *nf = ptr; + + switch (key->type) { + case NFSD_FILE_KEY_INODE: + if (nf->nf_inode != key->inode) + return 1; + break; + case NFSD_FILE_KEY_FULL: + if (nf->nf_inode != key->inode) + return 1; + if (nf->nf_may != key->need) + return 1; + if (nf->nf_net != key->net) + return 1; + if (!nfsd_match_cred(nf->nf_cred, key->cred)) + return 1; + if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) + return 1; + break; + } + return 0; +} + +static const struct rhashtable_params nfsd_file_rhash_params = { + .key_len = sizeof_field(struct nfsd_file, nf_inode), + .key_offset = offsetof(struct nfsd_file, nf_inode), + .head_offset = offsetof(struct nfsd_file, nf_rhash), + .hashfn = nfsd_file_key_hashfn, + .obj_hashfn = nfsd_file_obj_hashfn, + .obj_cmpfn = nfsd_file_obj_cmpfn, + /* Reduce resizing churn on light workloads */ + .min_size = 512, /* buckets */ + .automatic_shrinking = true, +};
static void nfsd_file_schedule_laundrette(void) @@ -693,13 +824,18 @@ static const struct fsnotify_ops nfsd_file_fsnotify_ops = { int nfsd_file_cache_init(void) { - int ret = -ENOMEM; + int ret; unsigned int i;
lockdep_assert_held(&nfsd_mutex); if (test_and_set_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 1) return 0;
+ ret = rhashtable_init(&nfsd_file_rhash_tbl, &nfsd_file_rhash_params); + if (ret) + return ret; + + ret = -ENOMEM; nfsd_filecache_wq = alloc_workqueue("nfsd_filecache", 0, 0); if (!nfsd_filecache_wq) goto out; @@ -777,6 +913,7 @@ nfsd_file_cache_init(void) nfsd_file_hashtbl = NULL; destroy_workqueue(nfsd_filecache_wq); nfsd_filecache_wq = NULL; + rhashtable_destroy(&nfsd_file_rhash_tbl); goto out; }
@@ -902,6 +1039,7 @@ nfsd_file_cache_shutdown(void) nfsd_file_hashtbl = NULL; destroy_workqueue(nfsd_filecache_wq); nfsd_filecache_wq = NULL; + rhashtable_destroy(&nfsd_file_rhash_tbl);
for_each_possible_cpu(i) { per_cpu(nfsd_file_cache_hits, i) = 0; @@ -913,26 +1051,6 @@ nfsd_file_cache_shutdown(void) } }
-static bool -nfsd_match_cred(const struct cred *c1, const struct cred *c2) -{ - int i; - - if (!uid_eq(c1->fsuid, c2->fsuid)) - return false; - if (!gid_eq(c1->fsgid, c2->fsgid)) - return false; - if (c1->group_info == NULL || c2->group_info == NULL) - return c1->group_info == c2->group_info; - if (c1->group_info->ngroups != c2->group_info->ngroups) - return false; - for (i = 0; i < c1->group_info->ngroups; i++) { - if (!gid_eq(c1->group_info->gid[i], c2->group_info->gid[i])) - return false; - } - return true; -} - static struct nfsd_file * nfsd_file_find_locked(struct inode *inode, unsigned int may_flags, unsigned int hashval, struct net *net) diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h index 82051e1b8420d..5cbfc61a7d7d9 100644 --- a/fs/nfsd/filecache.h +++ b/fs/nfsd/filecache.h @@ -29,6 +29,7 @@ struct nfsd_file_mark { * never be dereferenced, only used for comparison. */ struct nfsd_file { + struct rhash_head nf_rhash; struct hlist_node nf_node; struct list_head nf_lru; struct rcu_head nf_rcu;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit ce502f81ba884c1fe45dc0ebddbcaaa4ec0fc5fb ]
Enable the filecache hash table to start small, then grow with the workload. Smaller server deployments benefit because there should be lower memory utilization. Larger server deployments should see improved scaling with the number of open files.
Suggested-by: Jeff Layton jlayton@kernel.org Suggested-by: Dave Chinner david@fromorbit.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 265 +++++++++++++++++++------------------------- fs/nfsd/trace.h | 63 ++++++++++- 2 files changed, 179 insertions(+), 149 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 95e7e15b567e2..45dd4f3fa0905 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -61,7 +61,6 @@ static struct nfsd_fcache_bucket *nfsd_file_hashtbl; static struct list_lru nfsd_file_lru; static unsigned long nfsd_file_flags; static struct fsnotify_group *nfsd_file_fsnotify_group; -static atomic_long_t nfsd_filecache_count; static struct delayed_work nfsd_filecache_laundrette; static struct rhashtable nfsd_file_rhash_tbl ____cacheline_aligned_in_smp; @@ -197,7 +196,7 @@ static const struct rhashtable_params nfsd_file_rhash_params = { static void nfsd_file_schedule_laundrette(void) { - if ((atomic_long_read(&nfsd_filecache_count) == 0) || + if ((atomic_read(&nfsd_file_rhash_tbl.nelems) == 0) || test_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 0) return;
@@ -297,7 +296,7 @@ nfsd_file_mark_find_or_create(struct nfsd_file *nf) }
static struct nfsd_file * -nfsd_file_alloc(struct inode *inode, unsigned int may, struct net *net) +nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may) { struct nfsd_file *nf;
@@ -308,11 +307,14 @@ nfsd_file_alloc(struct inode *inode, unsigned int may, struct net *net) nf->nf_birthtime = ktime_get(); nf->nf_file = NULL; nf->nf_cred = get_current_cred(); - nf->nf_net = net; + nf->nf_net = key->net; nf->nf_flags = 0; - nf->nf_inode = inode; - refcount_set(&nf->nf_ref, 1); - nf->nf_may = may & NFSD_FILE_MAY_MASK; + __set_bit(NFSD_FILE_HASHED, &nf->nf_flags); + __set_bit(NFSD_FILE_PENDING, &nf->nf_flags); + nf->nf_inode = key->inode; + /* nf_ref is pre-incremented for hash table */ + refcount_set(&nf->nf_ref, 2); + nf->nf_may = key->need; nf->nf_mark = NULL; trace_nfsd_file_alloc(nf); } @@ -398,40 +400,21 @@ static void nfsd_file_lru_remove(struct nfsd_file *nf) }
static void -nfsd_file_do_unhash(struct nfsd_file *nf) +nfsd_file_hash_remove(struct nfsd_file *nf) { - struct inode *inode = nf->nf_inode; - unsigned int hashval = (unsigned int)hash_long(inode->i_ino, - NFSD_FILE_HASH_BITS); - - lockdep_assert_held(&nfsd_file_hashtbl[hashval].nfb_lock); - trace_nfsd_file_unhash(nf);
if (nfsd_file_check_write_error(nf)) nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id)); - --nfsd_file_hashtbl[hashval].nfb_count; - hlist_del_rcu(&nf->nf_node); - atomic_long_dec(&nfsd_filecache_count); -} - -static void -nfsd_file_hash_remove(struct nfsd_file *nf) -{ - struct inode *inode = nf->nf_inode; - unsigned int hashval = (unsigned int)hash_long(inode->i_ino, - NFSD_FILE_HASH_BITS); - - spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock); - nfsd_file_do_unhash(nf); - spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock); + rhashtable_remove_fast(&nfsd_file_rhash_tbl, &nf->nf_rhash, + nfsd_file_rhash_params); }
static bool nfsd_file_unhash(struct nfsd_file *nf) { if (test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags)) { - nfsd_file_do_unhash(nf); + nfsd_file_hash_remove(nf); return true; } return false; @@ -441,9 +424,9 @@ nfsd_file_unhash(struct nfsd_file *nf) * Return true if the file was unhashed. */ static bool -nfsd_file_unhash_and_release_locked(struct nfsd_file *nf, struct list_head *dispose) +nfsd_file_unhash_and_dispose(struct nfsd_file *nf, struct list_head *dispose) { - trace_nfsd_file_unhash_and_release_locked(nf); + trace_nfsd_file_unhash_and_dispose(nf); if (!nfsd_file_unhash(nf)) return false; /* keep final reference for nfsd_file_lru_dispose */ @@ -702,20 +685,23 @@ static struct shrinker nfsd_file_shrinker = { static unsigned int __nfsd_file_close_inode(struct inode *inode, struct list_head *dispose) { - unsigned int hashval = (unsigned int)hash_long(inode->i_ino, - NFSD_FILE_HASH_BITS); - unsigned int count = 0; - struct nfsd_file *nf; - struct hlist_node *tmp; - - spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock); - hlist_for_each_entry_safe(nf, tmp, &nfsd_file_hashtbl[hashval].nfb_head, nf_node) { - if (inode == nf->nf_inode) { - nfsd_file_unhash_and_release_locked(nf, dispose); - count++; - } - } - spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock); + struct nfsd_file_lookup_key key = { + .type = NFSD_FILE_KEY_INODE, + .inode = inode, + }; + unsigned int count = 0; + struct nfsd_file *nf; + + rcu_read_lock(); + do { + nf = rhashtable_lookup(&nfsd_file_rhash_tbl, &key, + nfsd_file_rhash_params); + if (!nf) + break; + nfsd_file_unhash_and_dispose(nf, dispose); + count++; + } while (1); + rcu_read_unlock(); return count; }
@@ -923,30 +909,35 @@ nfsd_file_cache_init(void) static void __nfsd_file_cache_purge(struct net *net) { - unsigned int i; - struct nfsd_file *nf; - struct hlist_node *next; + struct rhashtable_iter iter; + struct nfsd_file *nf; LIST_HEAD(dispose); bool del;
- for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) { - struct nfsd_fcache_bucket *nfb = &nfsd_file_hashtbl[i]; + rhashtable_walk_enter(&nfsd_file_rhash_tbl, &iter); + do { + rhashtable_walk_start(&iter);
- spin_lock(&nfb->nfb_lock); - hlist_for_each_entry_safe(nf, next, &nfb->nfb_head, nf_node) { + nf = rhashtable_walk_next(&iter); + while (!IS_ERR_OR_NULL(nf)) { if (net && nf->nf_net != net) continue; - del = nfsd_file_unhash_and_release_locked(nf, &dispose); + del = nfsd_file_unhash_and_dispose(nf, &dispose);
/* * Deadlock detected! Something marked this entry as * unhased, but hasn't removed it from the hash list. */ WARN_ON_ONCE(!del); + + nf = rhashtable_walk_next(&iter); } - spin_unlock(&nfb->nfb_lock); - nfsd_file_dispose_list(&dispose); - } + + rhashtable_walk_stop(&iter); + } while (nf == ERR_PTR(-EAGAIN)); + rhashtable_walk_exit(&iter); + + nfsd_file_dispose_list(&dispose); }
static struct nfsd_fcache_disposal * @@ -1051,56 +1042,29 @@ nfsd_file_cache_shutdown(void) } }
-static struct nfsd_file * -nfsd_file_find_locked(struct inode *inode, unsigned int may_flags, - unsigned int hashval, struct net *net) -{ - struct nfsd_file *nf; - unsigned char need = may_flags & NFSD_FILE_MAY_MASK; - - hlist_for_each_entry_rcu(nf, &nfsd_file_hashtbl[hashval].nfb_head, - nf_node, lockdep_is_held(&nfsd_file_hashtbl[hashval].nfb_lock)) { - if (nf->nf_may != need) - continue; - if (nf->nf_inode != inode) - continue; - if (nf->nf_net != net) - continue; - if (!nfsd_match_cred(nf->nf_cred, current_cred())) - continue; - if (!test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) - continue; - if (nfsd_file_get(nf) != NULL) - return nf; - } - return NULL; -} - /** - * nfsd_file_is_cached - are there any cached open files for this fh? - * @inode: inode of the file to check + * nfsd_file_is_cached - are there any cached open files for this inode? + * @inode: inode to check + * + * The lookup matches inodes in all net namespaces and is atomic wrt + * nfsd_file_acquire(). * - * Scan the hashtable for open files that match this fh. Returns true if there - * are any, and false if not. + * Return values: + * %true: filecache contains at least one file matching this inode + * %false: filecache contains no files matching this inode */ bool nfsd_file_is_cached(struct inode *inode) { - bool ret = false; - struct nfsd_file *nf; - unsigned int hashval; - - hashval = (unsigned int)hash_long(inode->i_ino, NFSD_FILE_HASH_BITS); - - rcu_read_lock(); - hlist_for_each_entry_rcu(nf, &nfsd_file_hashtbl[hashval].nfb_head, - nf_node) { - if (inode == nf->nf_inode) { - ret = true; - break; - } - } - rcu_read_unlock(); + struct nfsd_file_lookup_key key = { + .type = NFSD_FILE_KEY_INODE, + .inode = inode, + }; + bool ret = false; + + if (rhashtable_lookup_fast(&nfsd_file_rhash_tbl, &key, + nfsd_file_rhash_params) != NULL) + ret = true; trace_nfsd_file_is_cached(inode, (int)ret); return ret; } @@ -1109,39 +1073,51 @@ static __be32 nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned int may_flags, struct nfsd_file **pnf, bool open) { - __be32 status; - struct net *net = SVC_NET(rqstp); + struct nfsd_file_lookup_key key = { + .type = NFSD_FILE_KEY_FULL, + .need = may_flags & NFSD_FILE_MAY_MASK, + .net = SVC_NET(rqstp), + }; struct nfsd_file *nf, *new; - struct inode *inode; - unsigned int hashval; bool retry = true; + __be32 status;
- /* FIXME: skip this if fh_dentry is already set? */ status = fh_verify(rqstp, fhp, S_IFREG, may_flags|NFSD_MAY_OWNER_OVERRIDE); if (status != nfs_ok) return status; + key.inode = d_inode(fhp->fh_dentry); + key.cred = get_current_cred();
- inode = d_inode(fhp->fh_dentry); - hashval = (unsigned int)hash_long(inode->i_ino, NFSD_FILE_HASH_BITS); retry: - rcu_read_lock(); - nf = nfsd_file_find_locked(inode, may_flags, hashval, net); - rcu_read_unlock(); + /* Avoid allocation if the item is already in cache */ + nf = rhashtable_lookup_fast(&nfsd_file_rhash_tbl, &key, + nfsd_file_rhash_params); + if (nf) + nf = nfsd_file_get(nf); if (nf) goto wait_for_construction;
- new = nfsd_file_alloc(inode, may_flags, net); + new = nfsd_file_alloc(&key, may_flags); if (!new) { status = nfserr_jukebox; goto out_status; }
- spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock); - nf = nfsd_file_find_locked(inode, may_flags, hashval, net); - if (nf == NULL) + nf = rhashtable_lookup_get_insert_key(&nfsd_file_rhash_tbl, + &key, &new->nf_rhash, + nfsd_file_rhash_params); + if (!nf) { + nf = new; + goto open_file; + } + if (IS_ERR(nf)) + goto insert_err; + nf = nfsd_file_get(nf); + if (nf == NULL) { + nf = new; goto open_file; - spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock); + } nfsd_file_slab_free(&new->nf_rcu);
wait_for_construction: @@ -1149,6 +1125,7 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
/* Did construction of this file fail? */ if (!test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) { + trace_nfsd_file_cons_err(rqstp, key.inode, may_flags, nf); if (!retry) { status = nfserr_jukebox; goto out; @@ -1173,22 +1150,11 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, }
out_status: - trace_nfsd_file_acquire(rqstp, inode, may_flags, nf, status); + put_cred(key.cred); + trace_nfsd_file_acquire(rqstp, key.inode, may_flags, nf, status); return status;
open_file: - nf = new; - /* Take reference for the hashtable */ - refcount_inc(&nf->nf_ref); - __set_bit(NFSD_FILE_HASHED, &nf->nf_flags); - __set_bit(NFSD_FILE_PENDING, &nf->nf_flags); - hlist_add_head_rcu(&nf->nf_node, &nfsd_file_hashtbl[hashval].nfb_head); - ++nfsd_file_hashtbl[hashval].nfb_count; - nfsd_file_hashtbl[hashval].nfb_maxcount = max(nfsd_file_hashtbl[hashval].nfb_maxcount, - nfsd_file_hashtbl[hashval].nfb_count); - spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock); - atomic_long_inc(&nfsd_filecache_count); - nf->nf_mark = nfsd_file_mark_find_or_create(nf); if (nf->nf_mark) { if (open) { @@ -1203,19 +1169,20 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, * If construction failed, or we raced with a call to unlink() * then unhash. */ - if (status != nfs_ok || inode->i_nlink == 0) { - bool do_free; - nfsd_file_lru_remove(nf); - spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock); - do_free = nfsd_file_unhash(nf); - spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock); - if (do_free) + if (status != nfs_ok || key.inode->i_nlink == 0) + if (nfsd_file_unhash(nf)) nfsd_file_put_noref(nf); - } clear_bit_unlock(NFSD_FILE_PENDING, &nf->nf_flags); smp_mb__after_atomic(); wake_up_bit(&nf->nf_flags, NFSD_FILE_PENDING); goto out; + +insert_err: + nfsd_file_slab_free(&new->nf_rcu); + trace_nfsd_file_insert_err(rqstp, key.inode, may_flags, PTR_ERR(nf)); + nf = NULL; + status = nfserr_jukebox; + goto out_status; }
/** @@ -1261,21 +1228,23 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) { unsigned long releases = 0, pages_flushed = 0, evictions = 0; unsigned long hits = 0, acquisitions = 0; - unsigned int i, count = 0, longest = 0; + unsigned int i, count = 0, buckets = 0; unsigned long lru = 0, total_age = 0;
- /* - * No need for spinlocks here since we're not terribly interested in - * accuracy. We do take the nfsd_mutex simply to ensure that we - * don't end up racing with server shutdown - */ + /* Serialize with server shutdown */ mutex_lock(&nfsd_mutex); if (test_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 1) { - for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) { - count += nfsd_file_hashtbl[i].nfb_count; - longest = max(longest, nfsd_file_hashtbl[i].nfb_count); - } + struct bucket_table *tbl; + struct rhashtable *ht; + lru = list_lru_count(&nfsd_file_lru); + + rcu_read_lock(); + ht = &nfsd_file_rhash_tbl; + count = atomic_read(&ht->nelems); + tbl = rht_dereference_rcu(ht->tbl, ht); + buckets = tbl->size; + rcu_read_unlock(); } mutex_unlock(&nfsd_mutex);
@@ -1289,7 +1258,7 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) }
seq_printf(m, "total entries: %u\n", count); - seq_printf(m, "longest chain: %u\n", longest); + seq_printf(m, "hash buckets: %u\n", buckets); seq_printf(m, "lru entries: %lu\n", lru); seq_printf(m, "cache hits: %lu\n", hits); seq_printf(m, "acquisitions: %lu\n", acquisitions); diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index f170f07ec0fd2..33bd8618c20a6 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -743,7 +743,7 @@ DEFINE_NFSD_FILE_EVENT(nfsd_file_alloc); DEFINE_NFSD_FILE_EVENT(nfsd_file_put_final); DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash); DEFINE_NFSD_FILE_EVENT(nfsd_file_put); -DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash_and_release_locked); +DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash_and_dispose);
TRACE_EVENT(nfsd_file_acquire, TP_PROTO( @@ -786,6 +786,67 @@ TRACE_EVENT(nfsd_file_acquire, __entry->nf_file, __entry->status) );
+TRACE_EVENT(nfsd_file_insert_err, + TP_PROTO( + const struct svc_rqst *rqstp, + const struct inode *inode, + unsigned int may_flags, + long error + ), + TP_ARGS(rqstp, inode, may_flags, error), + TP_STRUCT__entry( + __field(u32, xid) + __field(const void *, inode) + __field(unsigned long, may_flags) + __field(long, error) + ), + TP_fast_assign( + __entry->xid = be32_to_cpu(rqstp->rq_xid); + __entry->inode = inode; + __entry->may_flags = may_flags; + __entry->error = error; + ), + TP_printk("xid=0x%x inode=%p may_flags=%s error=%ld", + __entry->xid, __entry->inode, + show_nfsd_may_flags(__entry->may_flags), + __entry->error + ) +); + +TRACE_EVENT(nfsd_file_cons_err, + TP_PROTO( + const struct svc_rqst *rqstp, + const struct inode *inode, + unsigned int may_flags, + const struct nfsd_file *nf + ), + TP_ARGS(rqstp, inode, may_flags, nf), + TP_STRUCT__entry( + __field(u32, xid) + __field(const void *, inode) + __field(unsigned long, may_flags) + __field(unsigned int, nf_ref) + __field(unsigned long, nf_flags) + __field(unsigned long, nf_may) + __field(const void *, nf_file) + ), + TP_fast_assign( + __entry->xid = be32_to_cpu(rqstp->rq_xid); + __entry->inode = inode; + __entry->may_flags = may_flags; + __entry->nf_ref = refcount_read(&nf->nf_ref); + __entry->nf_flags = nf->nf_flags; + __entry->nf_may = nf->nf_may; + __entry->nf_file = nf->nf_file; + ), + TP_printk("xid=0x%x inode=%p may_flags=%s ref=%u nf_flags=%s nf_may=%s nf_file=%p", + __entry->xid, __entry->inode, + show_nfsd_may_flags(__entry->may_flags), __entry->nf_ref, + show_nf_flags(__entry->nf_flags), + show_nfsd_may_flags(__entry->nf_may), __entry->nf_file + ) +); + TRACE_EVENT(nfsd_file_open, TP_PROTO(struct nfsd_file *nf, __be32 status), TP_ARGS(nf, status),
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 0ec8e9d1539a7b8109a554028bbce441052f847e ]
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 33 +-------------------------------- fs/nfsd/filecache.h | 1 - 2 files changed, 1 insertion(+), 33 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 45dd4f3fa0905..c6dc55c0f758b 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -21,11 +21,6 @@ #include "filecache.h" #include "trace.h"
-#define NFSDDBG_FACILITY NFSDDBG_FH - -/* FIXME: dynamically size this for the machine somehow? */ -#define NFSD_FILE_HASH_BITS 12 -#define NFSD_FILE_HASH_SIZE (1 << NFSD_FILE_HASH_BITS) #define NFSD_LAUNDRETTE_DELAY (2 * HZ)
#define NFSD_FILE_CACHE_UP (0) @@ -33,13 +28,6 @@ /* We only care about NFSD_MAY_READ/WRITE for this cache */ #define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE)
-struct nfsd_fcache_bucket { - struct hlist_head nfb_head; - spinlock_t nfb_lock; - unsigned int nfb_count; - unsigned int nfb_maxcount; -}; - static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits); static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions); static DEFINE_PER_CPU(unsigned long, nfsd_file_releases); @@ -57,7 +45,6 @@ static struct workqueue_struct *nfsd_filecache_wq __read_mostly;
static struct kmem_cache *nfsd_file_slab; static struct kmem_cache *nfsd_file_mark_slab; -static struct nfsd_fcache_bucket *nfsd_file_hashtbl; static struct list_lru nfsd_file_lru; static unsigned long nfsd_file_flags; static struct fsnotify_group *nfsd_file_fsnotify_group; @@ -302,7 +289,6 @@ nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may)
nf = kmem_cache_alloc(nfsd_file_slab, GFP_KERNEL); if (nf) { - INIT_HLIST_NODE(&nf->nf_node); INIT_LIST_HEAD(&nf->nf_lru); nf->nf_birthtime = ktime_get(); nf->nf_file = NULL; @@ -810,8 +796,7 @@ static const struct fsnotify_ops nfsd_file_fsnotify_ops = { int nfsd_file_cache_init(void) { - int ret; - unsigned int i; + int ret;
lockdep_assert_held(&nfsd_mutex); if (test_and_set_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 1) @@ -826,13 +811,6 @@ nfsd_file_cache_init(void) if (!nfsd_filecache_wq) goto out;
- nfsd_file_hashtbl = kvcalloc(NFSD_FILE_HASH_SIZE, - sizeof(*nfsd_file_hashtbl), GFP_KERNEL); - if (!nfsd_file_hashtbl) { - pr_err("nfsd: unable to allocate nfsd_file_hashtbl\n"); - goto out_err; - } - nfsd_file_slab = kmem_cache_create("nfsd_file", sizeof(struct nfsd_file), 0, 0, NULL); if (!nfsd_file_slab) { @@ -876,11 +854,6 @@ nfsd_file_cache_init(void) goto out_notifier; }
- for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) { - INIT_HLIST_HEAD(&nfsd_file_hashtbl[i].nfb_head); - spin_lock_init(&nfsd_file_hashtbl[i].nfb_lock); - } - INIT_DELAYED_WORK(&nfsd_filecache_laundrette, nfsd_file_gc_worker); out: return ret; @@ -895,8 +868,6 @@ nfsd_file_cache_init(void) nfsd_file_slab = NULL; kmem_cache_destroy(nfsd_file_mark_slab); nfsd_file_mark_slab = NULL; - kvfree(nfsd_file_hashtbl); - nfsd_file_hashtbl = NULL; destroy_workqueue(nfsd_filecache_wq); nfsd_filecache_wq = NULL; rhashtable_destroy(&nfsd_file_rhash_tbl); @@ -1026,8 +997,6 @@ nfsd_file_cache_shutdown(void) fsnotify_wait_marks_destroyed(); kmem_cache_destroy(nfsd_file_mark_slab); nfsd_file_mark_slab = NULL; - kvfree(nfsd_file_hashtbl); - nfsd_file_hashtbl = NULL; destroy_workqueue(nfsd_filecache_wq); nfsd_filecache_wq = NULL; rhashtable_destroy(&nfsd_file_rhash_tbl); diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h index 5cbfc61a7d7d9..ee9ed99d8b8fa 100644 --- a/fs/nfsd/filecache.h +++ b/fs/nfsd/filecache.h @@ -30,7 +30,6 @@ struct nfsd_file_mark { */ struct nfsd_file { struct rhash_head nf_rhash; - struct hlist_node nf_node; struct list_head nf_lru; struct rcu_head nf_rcu; struct file *nf_file;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit be0230069fcbf7d332d010b57c1d0cfd623a84d6 ]
These tracepoints collect different information: the create case does not open a file, so there's no nf_file available.
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 9 ++++---- fs/nfsd/nfs4state.c | 1 + fs/nfsd/trace.h | 54 ++++++++++++++++++++++++++++++++++++++------- 3 files changed, 52 insertions(+), 12 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index c6dc55c0f758b..85813affb8abf 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -1039,7 +1039,7 @@ nfsd_file_is_cached(struct inode *inode) }
static __be32 -nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, +nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned int may_flags, struct nfsd_file **pnf, bool open) { struct nfsd_file_lookup_key key = { @@ -1120,7 +1120,8 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
out_status: put_cred(key.cred); - trace_nfsd_file_acquire(rqstp, key.inode, may_flags, nf, status); + if (open) + trace_nfsd_file_acquire(rqstp, key.inode, may_flags, nf, status); return status;
open_file: @@ -1168,7 +1169,7 @@ __be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned int may_flags, struct nfsd_file **pnf) { - return nfsd_do_file_acquire(rqstp, fhp, may_flags, pnf, true); + return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, true); }
/** @@ -1185,7 +1186,7 @@ __be32 nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned int may_flags, struct nfsd_file **pnf) { - return nfsd_do_file_acquire(rqstp, fhp, may_flags, pnf, false); + return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, false); }
/* diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 76ec207f5e44d..587e792346579 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -5121,6 +5121,7 @@ static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp, goto out_put_access; nf->nf_file = open->op_filp; open->op_filp = NULL; + trace_nfsd_file_create(rqstp, access, nf); }
spin_lock(&fp->fi_lock); diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 33bd8618c20a6..ce391ba2f1ca5 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -747,10 +747,10 @@ DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash_and_dispose);
TRACE_EVENT(nfsd_file_acquire, TP_PROTO( - struct svc_rqst *rqstp, - struct inode *inode, + const struct svc_rqst *rqstp, + const struct inode *inode, unsigned int may_flags, - struct nfsd_file *nf, + const struct nfsd_file *nf, __be32 status ),
@@ -758,12 +758,12 @@ TRACE_EVENT(nfsd_file_acquire,
TP_STRUCT__entry( __field(u32, xid) - __field(void *, inode) + __field(const void *, inode) __field(unsigned long, may_flags) - __field(int, nf_ref) + __field(unsigned int, nf_ref) __field(unsigned long, nf_flags) __field(unsigned long, nf_may) - __field(struct file *, nf_file) + __field(const void *, nf_file) __field(u32, status) ),
@@ -778,12 +778,50 @@ TRACE_EVENT(nfsd_file_acquire, __entry->status = be32_to_cpu(status); ),
- TP_printk("xid=0x%x inode=%p may_flags=%s ref=%d nf_flags=%s nf_may=%s nf_file=%p status=%u", + TP_printk("xid=0x%x inode=%p may_flags=%s ref=%u nf_flags=%s nf_may=%s nf_file=%p status=%u", __entry->xid, __entry->inode, show_nfsd_may_flags(__entry->may_flags), __entry->nf_ref, show_nf_flags(__entry->nf_flags), show_nfsd_may_flags(__entry->nf_may), - __entry->nf_file, __entry->status) + __entry->nf_file, __entry->status + ) +); + +TRACE_EVENT(nfsd_file_create, + TP_PROTO( + const struct svc_rqst *rqstp, + unsigned int may_flags, + const struct nfsd_file *nf + ), + + TP_ARGS(rqstp, may_flags, nf), + + TP_STRUCT__entry( + __field(const void *, nf_inode) + __field(const void *, nf_file) + __field(unsigned long, may_flags) + __field(unsigned long, nf_flags) + __field(unsigned long, nf_may) + __field(unsigned int, nf_ref) + __field(u32, xid) + ), + + TP_fast_assign( + __entry->nf_inode = nf->nf_inode; + __entry->nf_file = nf->nf_file; + __entry->may_flags = may_flags; + __entry->nf_flags = nf->nf_flags; + __entry->nf_may = nf->nf_may; + __entry->nf_ref = refcount_read(&nf->nf_ref); + __entry->xid = be32_to_cpu(rqstp->rq_xid); + ), + + TP_printk("xid=0x%x inode=%p may_flags=%s ref=%u nf_flags=%s nf_may=%s nf_file=%p", + __entry->xid, __entry->nf_inode, + show_nfsd_may_flags(__entry->may_flags), + __entry->nf_ref, show_nf_flags(__entry->nf_flags), + show_nfsd_may_flags(__entry->nf_may), __entry->nf_file + ) );
TRACE_EVENT(nfsd_file_insert_err,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit b40a2839470cd62ed68c4a32d72a18ee8975b1ac ]
Avoid recording the allocation of an nfsd_file item that is immediately released because a matching item was already inserted in the hash.
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 2 +- fs/nfsd/trace.h | 25 ++++++++++++++++++++++++- 2 files changed, 25 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 85813affb8abf..26cfae138b906 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -302,7 +302,6 @@ nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may) refcount_set(&nf->nf_ref, 2); nf->nf_may = key->need; nf->nf_mark = NULL; - trace_nfsd_file_alloc(nf); } return nf; } @@ -1125,6 +1124,7 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, return status;
open_file: + trace_nfsd_file_alloc(nf); nf->nf_mark = nfsd_file_mark_find_or_create(nf); if (nf->nf_mark) { if (open) { diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index ce391ba2f1ca5..22c1fb735f1a7 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -739,12 +739,35 @@ DEFINE_EVENT(nfsd_file_class, name, \ TP_PROTO(struct nfsd_file *nf), \ TP_ARGS(nf))
-DEFINE_NFSD_FILE_EVENT(nfsd_file_alloc); DEFINE_NFSD_FILE_EVENT(nfsd_file_put_final); DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash); DEFINE_NFSD_FILE_EVENT(nfsd_file_put); DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash_and_dispose);
+TRACE_EVENT(nfsd_file_alloc, + TP_PROTO( + const struct nfsd_file *nf + ), + TP_ARGS(nf), + TP_STRUCT__entry( + __field(const void *, nf_inode) + __field(unsigned long, nf_flags) + __field(unsigned long, nf_may) + __field(unsigned int, nf_ref) + ), + TP_fast_assign( + __entry->nf_inode = nf->nf_inode; + __entry->nf_flags = nf->nf_flags; + __entry->nf_ref = refcount_read(&nf->nf_ref); + __entry->nf_may = nf->nf_may; + ), + TP_printk("inode=%p ref=%u flags=%s may=%s", + __entry->nf_inode, __entry->nf_ref, + show_nf_flags(__entry->nf_flags), + show_nfsd_may_flags(__entry->nf_may) + ) +); + TRACE_EVENT(nfsd_file_acquire, TP_PROTO( const struct svc_rqst *rqstp,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 5e138c4a750dc140d881dab4a8804b094bbc08d2 ]
The last close of a file should enable other accessors to open and use that file immediately. Leaving the file open in the filecache prevents other users from accessing that file until the filecache garbage-collects the file -- sometimes that takes several seconds.
Reported-by: Wang Yugui wangyugui@e16-tech.com Link: https://bugzilla.linux-nfs.org/show_bug.cgi?387 Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 18 ++++++++++++++++++ fs/nfsd/filecache.h | 1 + fs/nfsd/nfs4state.c | 4 ++-- 3 files changed, 21 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 26cfae138b906..7ad27655db699 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -451,6 +451,24 @@ nfsd_file_put(struct nfsd_file *nf) nfsd_file_put_noref(nf); }
+/** + * nfsd_file_close - Close an nfsd_file + * @nf: nfsd_file to close + * + * If this is the final reference for @nf, free it immediately. + * This reflects an on-the-wire CLOSE or DELEGRETURN into the + * VFS and exported filesystem. + */ +void nfsd_file_close(struct nfsd_file *nf) +{ + nfsd_file_put(nf); + if (refcount_dec_if_one(&nf->nf_ref)) { + nfsd_file_unhash(nf); + nfsd_file_lru_remove(nf); + nfsd_file_free(nf); + } +} + struct nfsd_file * nfsd_file_get(struct nfsd_file *nf) { diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h index ee9ed99d8b8fa..28145f1628923 100644 --- a/fs/nfsd/filecache.h +++ b/fs/nfsd/filecache.h @@ -52,6 +52,7 @@ void nfsd_file_cache_shutdown(void); int nfsd_file_cache_start_net(struct net *net); void nfsd_file_cache_shutdown_net(struct net *net); void nfsd_file_put(struct nfsd_file *nf); +void nfsd_file_close(struct nfsd_file *nf); struct nfsd_file *nfsd_file_get(struct nfsd_file *nf); void nfsd_file_close_inode_sync(struct inode *inode); bool nfsd_file_is_cached(struct inode *inode); diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 587e792346579..c4c1e35d3eb7a 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -831,9 +831,9 @@ static void __nfs4_file_put_access(struct nfs4_file *fp, int oflag) swap(f2, fp->fi_fds[O_RDWR]); spin_unlock(&fp->fi_lock); if (f1) - nfsd_file_put(f1); + nfsd_file_close(f1); if (f2) - nfsd_file_put(f2); + nfsd_file_close(f2); } }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 427f5f83a3191cbf024c5aea6e5b601cdf88d895 ]
The documenting comment for struct nf_file states:
/* * A representation of a file that has been opened by knfsd. These are hashed * in the hashtable by inode pointer value. Note that this object doesn't * hold a reference to the inode by itself, so the nf_inode pointer should * never be dereferenced, only used for comparison. */
Replace the two existing dereferences to make the comment always true.
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 5 ++--- fs/nfsd/filecache.h | 2 +- fs/nfsd/nfs4state.c | 2 +- 3 files changed, 4 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 7ad27655db699..55478d411e5a0 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -227,12 +227,11 @@ nfsd_file_mark_put(struct nfsd_file_mark *nfm) }
static struct nfsd_file_mark * -nfsd_file_mark_find_or_create(struct nfsd_file *nf) +nfsd_file_mark_find_or_create(struct nfsd_file *nf, struct inode *inode) { int err; struct fsnotify_mark *mark; struct nfsd_file_mark *nfm = NULL, *new; - struct inode *inode = nf->nf_inode;
do { fsnotify_group_lock(nfsd_file_fsnotify_group); @@ -1143,7 +1142,7 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
open_file: trace_nfsd_file_alloc(nf); - nf->nf_mark = nfsd_file_mark_find_or_create(nf); + nf->nf_mark = nfsd_file_mark_find_or_create(nf, key.inode); if (nf->nf_mark) { if (open) { status = nfsd_open_verified(rqstp, fhp, may_flags, diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h index 28145f1628923..8e8c0c47d67df 100644 --- a/fs/nfsd/filecache.h +++ b/fs/nfsd/filecache.h @@ -39,7 +39,7 @@ struct nfsd_file { #define NFSD_FILE_PENDING (1) #define NFSD_FILE_REFERENCED (2) unsigned long nf_flags; - struct inode *nf_inode; + struct inode *nf_inode; /* don't deref */ refcount_t nf_ref; unsigned char nf_may; struct nfsd_file_mark *nf_mark; diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index c4c1e35d3eb7a..16e5bd54d92c2 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -2577,7 +2577,7 @@ static void nfs4_show_fname(struct seq_file *s, struct nfsd_file *f)
static void nfs4_show_superblock(struct seq_file *s, struct nfsd_file *f) { - struct inode *inode = f->nf_inode; + struct inode *inode = file_inode(f->nf_file);
seq_printf(s, "superblock: "%02x:%02x:%ld"", MAJOR(inode->i_sb->s_dev),
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dai Ngo dai.ngo@oracle.com
[ Upstream commit 6867137ebcf4155fe25f2ecf7c29b9fb90a76d1d ]
This patch moves the v4 specific code from nfsd_init_net() to nfsd4_init_leases_net() helper in nfs4state.c
Signed-off-by: Dai Ngo dai.ngo@oracle.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 12 ++++++++++++ fs/nfsd/nfsctl.c | 9 +-------- fs/nfsd/nfsd.h | 4 ++++ 3 files changed, 17 insertions(+), 8 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 16e5bd54d92c2..76a77329cf368 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4347,6 +4347,18 @@ nfsd4_init_slabs(void) return -ENOMEM; }
+void nfsd4_init_leases_net(struct nfsd_net *nn) +{ + nn->nfsd4_lease = 90; /* default lease time */ + nn->nfsd4_grace = 90; + nn->somebody_reclaimed = false; + nn->track_reclaim_completes = false; + nn->clverifier_counter = prandom_u32(); + nn->clientid_base = prandom_u32(); + nn->clientid_counter = nn->clientid_base + 1; + nn->s2s_cp_cl_id = nn->clientid_counter++; +} + static void init_nfs4_replay(struct nfs4_replay *rp) { rp->rp_status = nfserr_serverfault; diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 7002edbf26870..164c822ae3ae9 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -1484,14 +1484,7 @@ static __net_init int nfsd_init_net(struct net *net) retval = nfsd_reply_cache_init(nn); if (retval) goto out_drc_error; - nn->nfsd4_lease = 90; /* default lease time */ - nn->nfsd4_grace = 90; - nn->somebody_reclaimed = false; - nn->track_reclaim_completes = false; - nn->clverifier_counter = prandom_u32(); - nn->clientid_base = prandom_u32(); - nn->clientid_counter = nn->clientid_base + 1; - nn->s2s_cp_cl_id = nn->clientid_counter++; + nfsd4_init_leases_net(nn);
get_random_bytes(&nn->siphash_key, sizeof(nn->siphash_key)); seqlock_init(&nn->writeverf_lock); diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 9a8b09afc1733..ef8087691138a 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -496,12 +496,16 @@ extern void unregister_cld_notifier(void); extern void nfsd4_ssc_init_umount_work(struct nfsd_net *nn); #endif
+extern void nfsd4_init_leases_net(struct nfsd_net *nn); + #else /* CONFIG_NFSD_V4 */ static inline int nfsd4_is_junction(struct dentry *dentry) { return 0; }
+static inline void nfsd4_init_leases_net(struct nfsd_net *nn) {}; + #define register_cld_notifier() 0 #define unregister_cld_notifier() do { } while(0)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dai Ngo dai.ngo@oracle.com
[ Upstream commit 0926c39515aa065a296e97dfc8790026f1e53f86 ]
Add counter nfs4_client_count to keep track of the total number of v4 clients, including courtesy clients, in the system.
Signed-off-by: Dai Ngo dai.ngo@oracle.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/netns.h | 2 ++ fs/nfsd/nfs4state.c | 10 ++++++++-- 2 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h index 1b1a962a18041..ce864f001a3ee 100644 --- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h @@ -189,6 +189,8 @@ struct nfsd_net { struct nfsd_fcache_disposal *fcache_disposal;
siphash_key_t siphash_key; + + atomic_t nfs4_client_count; };
/* Simple check to find out if a given net was properly initialized */ diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 76a77329cf368..5003d73fa9287 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -2066,7 +2066,8 @@ STALE_CLIENTID(clientid_t *clid, struct nfsd_net *nn) * This type of memory management is somewhat inefficient, but we use it * anyway since SETCLIENTID is not a common operation. */ -static struct nfs4_client *alloc_client(struct xdr_netobj name) +static struct nfs4_client *alloc_client(struct xdr_netobj name, + struct nfsd_net *nn) { struct nfs4_client *clp; int i; @@ -2089,6 +2090,7 @@ static struct nfs4_client *alloc_client(struct xdr_netobj name) atomic_set(&clp->cl_rpc_users, 0); clp->cl_cb_state = NFSD4_CB_UNKNOWN; clp->cl_state = NFSD4_ACTIVE; + atomic_inc(&nn->nfs4_client_count); atomic_set(&clp->cl_delegs_in_recall, 0); INIT_LIST_HEAD(&clp->cl_idhash); INIT_LIST_HEAD(&clp->cl_openowners); @@ -2196,6 +2198,7 @@ static __be32 mark_client_expired_locked(struct nfs4_client *clp) static void __destroy_client(struct nfs4_client *clp) { + struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id); int i; struct nfs4_openowner *oo; struct nfs4_delegation *dp; @@ -2239,6 +2242,7 @@ __destroy_client(struct nfs4_client *clp) nfsd4_shutdown_callback(clp); if (clp->cl_cb_conn.cb_xprt) svc_xprt_put(clp->cl_cb_conn.cb_xprt); + atomic_add_unless(&nn->nfs4_client_count, -1, 0); free_client(clp); wake_up_all(&expiry_wq); } @@ -2865,7 +2869,7 @@ static struct nfs4_client *create_client(struct xdr_netobj name, struct nfsd_net *nn = net_generic(net, nfsd_net_id); struct dentry *dentries[ARRAY_SIZE(client_files)];
- clp = alloc_client(name); + clp = alloc_client(name, nn); if (clp == NULL) return NULL;
@@ -4357,6 +4361,8 @@ void nfsd4_init_leases_net(struct nfsd_net *nn) nn->clientid_base = prandom_u32(); nn->clientid_counter = nn->clientid_base + 1; nn->s2s_cp_cl_id = nn->clientid_counter++; + + atomic_set(&nn->nfs4_client_count, 0); }
static void init_nfs4_replay(struct nfs4_replay *rp)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dai Ngo dai.ngo@oracle.com
[ Upstream commit 4271c2c0887562318a0afef97d32d8a71cbe0743 ]
Currently there is no limit on how many v4 clients are supported by the system. This can be a problem in systems with small memory configuration to function properly when a very large number of clients exist that creates memory shortage conditions.
This patch enforces a limit of 1024 NFSv4 clients, including courtesy clients, per 1GB of system memory. When the number of the clients reaches the limit, requests that create new clients are returned with NFS4ERR_DELAY and the laundromat is kicked start to trim old clients. Due to the overhead of the upcall to remove the client record, the maximun number of clients the laundromat removes on each run is limited to 128. This is done to ensure the laundromat can still process the other tasks in a timely manner.
Since there is now a limit of the number of clients, the 24-hr idle time limit of courtesy client is no longer needed and was removed.
Signed-off-by: Dai Ngo dai.ngo@oracle.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/netns.h | 1 + fs/nfsd/nfs4state.c | 27 +++++++++++++++++++++------ fs/nfsd/nfsd.h | 2 ++ 3 files changed, 24 insertions(+), 6 deletions(-)
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h index ce864f001a3ee..ffe17743cc74b 100644 --- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h @@ -191,6 +191,7 @@ struct nfsd_net { siphash_key_t siphash_key;
atomic_t nfs4_client_count; + int nfs4_max_clients; };
/* Simple check to find out if a given net was properly initialized */ diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 5003d73fa9287..1babc08fcb88f 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -2072,6 +2072,10 @@ static struct nfs4_client *alloc_client(struct xdr_netobj name, struct nfs4_client *clp; int i;
+ if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients) { + mod_delayed_work(laundry_wq, &nn->laundromat_work, 0); + return NULL; + } clp = kmem_cache_zalloc(client_slab, GFP_KERNEL); if (clp == NULL) return NULL; @@ -4353,6 +4357,9 @@ nfsd4_init_slabs(void)
void nfsd4_init_leases_net(struct nfsd_net *nn) { + struct sysinfo si; + u64 max_clients; + nn->nfsd4_lease = 90; /* default lease time */ nn->nfsd4_grace = 90; nn->somebody_reclaimed = false; @@ -4363,6 +4370,10 @@ void nfsd4_init_leases_net(struct nfsd_net *nn) nn->s2s_cp_cl_id = nn->clientid_counter++;
atomic_set(&nn->nfs4_client_count, 0); + si_meminfo(&si); + max_clients = (u64)si.totalram * si.mem_unit / (1024 * 1024 * 1024); + max_clients *= NFS4_CLIENTS_PER_GB; + nn->nfs4_max_clients = max_t(int, max_clients, NFS4_CLIENTS_PER_GB); }
static void init_nfs4_replay(struct nfs4_replay *rp) @@ -5828,9 +5839,12 @@ static void nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, struct laundry_time *lt) { + unsigned int maxreap, reapcnt = 0; struct list_head *pos, *next; struct nfs4_client *clp;
+ maxreap = (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients) ? + NFSD_CLIENT_MAX_TRIM_PER_RUN : 0; INIT_LIST_HEAD(reaplist); spin_lock(&nn->client_lock); list_for_each_safe(pos, next, &nn->client_lru) { @@ -5841,14 +5855,15 @@ nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, break; if (!atomic_read(&clp->cl_rpc_users)) clp->cl_state = NFSD4_COURTESY; - if (!client_has_state(clp) || - ktime_get_boottime_seconds() >= - (clp->cl_time + NFSD_COURTESY_CLIENT_TIMEOUT)) + if (!client_has_state(clp)) goto exp_client; - if (nfs4_anylock_blockers(clp)) { + if (!nfs4_anylock_blockers(clp)) + if (reapcnt >= maxreap) + continue; exp_client: - if (!mark_client_expired_locked(clp)) - list_add(&clp->cl_lru, reaplist); + if (!mark_client_expired_locked(clp)) { + list_add(&clp->cl_lru, reaplist); + reapcnt++; } } spin_unlock(&nn->client_lock); diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index ef8087691138a..57a468ed85c35 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -341,6 +341,8 @@ void nfsd_lockd_shutdown(void);
#define NFSD_LAUNDROMAT_MINTIMEOUT 1 /* seconds */ #define NFSD_COURTESY_CLIENT_TIMEOUT (24 * 60 * 60) /* seconds */ +#define NFSD_CLIENT_MAX_TRIM_PER_RUN 128 +#define NFS4_CLIENTS_PER_GB 1024
/* * The following attributes are currently not supported by the NFSv4 server:
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 3a5940bfa17fb9964bf9688b4356ca643a8f5e2d ]
This printk pops every time nfsd.ko gets plugged in. Most kmods don't do that and this one is not very informative. Olaf's email address seems to be defunct at this point anyway. Just drop it.
Cc: Olaf Kirch okir@suse.com Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsctl.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 164c822ae3ae9..917fa1892fd2d 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -1519,7 +1519,6 @@ static struct pernet_operations nfsd_net_ops = { static int __init init_nfsd(void) { int retval; - printk(KERN_INFO "Installing knfsd (copyright (C) 1996 okir@monad.swb.de).\n");
retval = nfsd4_init_slabs(); if (retval)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 095a764b7afb06c9499b798c04eaa3cbf70ebe2d ]
write_bytes_to_xdr_buf() is a generic way to place a variable-length data item in an already-reserved spot in the encoding buffer. However, it is costly, and here, it is unnecessary because the data item is fixed in size, the buffer destination address is always word-aligned, and the destination location is already in @p.
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index b98a24c2a753c..14e8e37550609 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -5381,8 +5381,7 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op) so->so_replay.rp_buf, len); } status: - /* Note that op->status is already in network byte order: */ - write_bytes_to_xdr_buf(xdr->buf, post_err_offset - 4, &op->status, 4); + *p = op->status; }
/*
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit ab04de60ae1cc64ae16b77feae795311b97720c7 ]
write_bytes_to_xdr_buf() is a generic way to place a variable-length data item in an already-reserved spot in the encoding buffer.
However, it is costly. In nfsd4_encode_fattr(), it is unnecessary because the data item is fixed in size and the buffer destination address is always word-aligned.
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 14e8e37550609..1e388cdf9b005 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2828,10 +2828,9 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, struct kstat stat; struct svc_fh *tempfh = NULL; struct kstatfs statfs; - __be32 *p; + __be32 *p, *attrlen_p; int starting_len = xdr->buf->len; int attrlen_offset; - __be32 attrlen; u32 dummy; u64 dummy64; u32 rdattr_err = 0; @@ -2919,10 +2918,9 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, goto out;
attrlen_offset = xdr->buf->len; - p = xdr_reserve_space(xdr, 4); - if (!p) + attrlen_p = xdr_reserve_space(xdr, XDR_UNIT); + if (!attrlen_p) goto out_resource; - p++; /* to be backfilled later */
if (bmval0 & FATTR4_WORD0_SUPPORTED_ATTRS) { u32 supp[3]; @@ -3344,8 +3342,7 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, *p++ = cpu_to_be32(err == 0); }
- attrlen = htonl(xdr->buf->len - attrlen_offset - 4); - write_bytes_to_xdr_buf(xdr->buf, attrlen_offset, &attrlen, 4); + *attrlen_p = cpu_to_be32(xdr->buf->len - attrlen_offset - XDR_UNIT); status = nfs_ok;
out:
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit c738b218a2e5a753a336b4b7fee6720b902c7ace ]
Do the test_bit() once -- this reduces the number of locked-bus operations and makes the function a little easier to read.
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 1e388cdf9b005..059e920c21919 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -3991,6 +3991,7 @@ static __be32 nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_read *read) { + bool splice_ok = test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags); unsigned long maxcount; struct xdr_stream *xdr = resp->xdr; struct file *file; @@ -4003,11 +4004,10 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
p = xdr_reserve_space(xdr, 8); /* eof flag and byte count */ if (!p) { - WARN_ON_ONCE(test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags)); + WARN_ON_ONCE(splice_ok); return nfserr_resource; } - if (resp->xdr->buf->page_len && - test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags)) { + if (resp->xdr->buf->page_len && splice_ok) { WARN_ON_ONCE(1); return nfserr_serverfault; } @@ -4016,8 +4016,7 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr, maxcount = min_t(unsigned long, read->rd_length, (xdr->buf->buflen - xdr->buf->len));
- if (file->f_op->splice_read && - test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags)) + if (file->f_op->splice_read && splice_ok) nfserr = nfsd4_encode_splice_read(resp, read, file, maxcount); else nfserr = nfsd4_encode_readv(resp, read, file, maxcount);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 24c7fb85498eda1d4c6b42cc4886328429814990 ]
Refactor: Make the EOF result available in the entire NFSv4 READ path.
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 11 +++++------ fs/nfsd/xdr4.h | 5 +++-- 2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 059e920c21919..8437a390480df 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -3890,7 +3890,6 @@ static __be32 nfsd4_encode_splice_read( struct xdr_stream *xdr = resp->xdr; struct xdr_buf *buf = xdr->buf; int status, space_left; - u32 eof; __be32 nfserr; __be32 *p = xdr->p - 2;
@@ -3899,7 +3898,8 @@ static __be32 nfsd4_encode_splice_read( return nfserr_resource;
nfserr = nfsd_splice_read(read->rd_rqstp, read->rd_fhp, - file, read->rd_offset, &maxcount, &eof); + file, read->rd_offset, &maxcount, + &read->rd_eof); read->rd_length = maxcount; if (nfserr) goto out_err; @@ -3910,7 +3910,7 @@ static __be32 nfsd4_encode_splice_read( goto out_err; }
- *(p++) = htonl(eof); + *(p++) = htonl(read->rd_eof); *(p++) = htonl(maxcount);
buf->page_len = maxcount; @@ -3954,7 +3954,6 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp, struct file *file, unsigned long maxcount) { struct xdr_stream *xdr = resp->xdr; - u32 eof; int starting_len = xdr->buf->len - 8; __be32 nfserr; __be32 tmp; @@ -3966,7 +3965,7 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
nfserr = nfsd_readv(resp->rqstp, read->rd_fhp, file, read->rd_offset, resp->rqstp->rq_vec, read->rd_vlen, &maxcount, - &eof); + &read->rd_eof); read->rd_length = maxcount; if (nfserr) return nfserr; @@ -3974,7 +3973,7 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp, return nfserr_io; xdr_truncate_encode(xdr, starting_len + 8 + xdr_align_size(maxcount));
- tmp = htonl(eof); + tmp = htonl(read->rd_eof); write_bytes_to_xdr_buf(xdr->buf, starting_len , &tmp, 4); tmp = htonl(maxcount); write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp, 4); diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 448b687943cd3..32617639a3ece 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -302,9 +302,10 @@ struct nfsd4_read { u32 rd_length; /* request */ int rd_vlen; struct nfsd_file *rd_nf; - + struct svc_rqst *rd_rqstp; /* response */ - struct svc_fh *rd_fhp; /* response */ + struct svc_fh *rd_fhp; /* response */ + u32 rd_eof; /* response */ };
struct nfsd4_readdir {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 28d5bc468efe74b790e052f758ce083a5015c665 ]
write_bytes_to_xdr_buf() is pretty expensive to use for inserting an XDR data item that is always 1 XDR_UNIT at an address that is always XDR word-aligned.
Since both the readv and splice read paths encode EOF and maxcount values, move both to a common code path.
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 18 ++++++------------ 1 file changed, 6 insertions(+), 12 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 8437a390480df..36f0f06714dec 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -3891,7 +3891,6 @@ static __be32 nfsd4_encode_splice_read( struct xdr_buf *buf = xdr->buf; int status, space_left; __be32 nfserr; - __be32 *p = xdr->p - 2;
/* Make sure there will be room for padding if needed */ if (xdr->end - xdr->p < 1) @@ -3910,9 +3909,6 @@ static __be32 nfsd4_encode_splice_read( goto out_err; }
- *(p++) = htonl(read->rd_eof); - *(p++) = htonl(maxcount); - buf->page_len = maxcount; buf->len += maxcount; xdr->page_ptr += (buf->page_base + maxcount + PAGE_SIZE - 1) @@ -3973,11 +3969,6 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp, return nfserr_io; xdr_truncate_encode(xdr, starting_len + 8 + xdr_align_size(maxcount));
- tmp = htonl(read->rd_eof); - write_bytes_to_xdr_buf(xdr->buf, starting_len , &tmp, 4); - tmp = htonl(maxcount); - write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp, 4); - tmp = xdr_zero; pad = (maxcount&3) ? 4 - (maxcount&3) : 0; write_bytes_to_xdr_buf(xdr->buf, starting_len + 8 + maxcount, @@ -4019,11 +4010,14 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr, nfserr = nfsd4_encode_splice_read(resp, read, file, maxcount); else nfserr = nfsd4_encode_readv(resp, read, file, maxcount); - - if (nfserr) + if (nfserr) { xdr_truncate_encode(xdr, starting_len); + return nfserr; + }
- return nfserr; + p = xdr_encode_bool(p, read->rd_eof); + *p = cpu_to_be32(read->rd_length); + return nfs_ok; }
static __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 071ae99feadfc55979f89287d6ad2c6a315cb46d ]
Clean-up: Now that nfsd4_encode_readv() does not have to encode the EOF or rd_length values, it no longer needs to subtract 8 from @starting_len.
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 36f0f06714dec..f67a54f7eb13e 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -3950,7 +3950,7 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp, struct file *file, unsigned long maxcount) { struct xdr_stream *xdr = resp->xdr; - int starting_len = xdr->buf->len - 8; + unsigned int starting_len = xdr->buf->len; __be32 nfserr; __be32 tmp; int pad; @@ -3965,14 +3965,13 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp, read->rd_length = maxcount; if (nfserr) return nfserr; - if (svc_encode_result_payload(resp->rqstp, starting_len + 8, maxcount)) + if (svc_encode_result_payload(resp->rqstp, starting_len, maxcount)) return nfserr_io; - xdr_truncate_encode(xdr, starting_len + 8 + xdr_align_size(maxcount)); + xdr_truncate_encode(xdr, starting_len + xdr_align_size(maxcount));
tmp = xdr_zero; pad = (maxcount&3) ? 4 - (maxcount&3) : 0; - write_bytes_to_xdr_buf(xdr->buf, starting_len + 8 + maxcount, - &tmp, pad); + write_bytes_to_xdr_buf(xdr->buf, starting_len + maxcount, &tmp, pad); return 0;
}
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 5e64d85c7d0c59cfcd61d899720b8ccfe895d743 ]
Clean up: Use a helper instead of open-coding the calculation of the XDR pad size.
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index f67a54f7eb13e..4d74eb1fee8f1 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -3951,9 +3951,8 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp, { struct xdr_stream *xdr = resp->xdr; unsigned int starting_len = xdr->buf->len; + __be32 zero = xdr_zero; __be32 nfserr; - __be32 tmp; - int pad;
read->rd_vlen = xdr_reserve_space_vec(xdr, resp->rqstp->rq_vec, maxcount); if (read->rd_vlen < 0) @@ -3969,11 +3968,9 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp, return nfserr_io; xdr_truncate_encode(xdr, starting_len + xdr_align_size(maxcount));
- tmp = xdr_zero; - pad = (maxcount&3) ? 4 - (maxcount&3) : 0; - write_bytes_to_xdr_buf(xdr->buf, starting_len + maxcount, &tmp, pad); - return 0; - + write_bytes_to_xdr_buf(xdr->buf, starting_len + maxcount, &zero, + xdr_pad_size(maxcount)); + return nfs_ok; }
static __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 99b002a1fa00d90e66357315757e7277447ce973 ]
Similar changes to nfsd4_encode_readv(), all bundled into a single patch.
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 24 +++++++++--------------- 1 file changed, 9 insertions(+), 15 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 4d74eb1fee8f1..a98513cb35b10 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -4019,16 +4019,13 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr, static __be32 nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_readlink *readlink) { - int maxcount; - __be32 wire_count; - int zero = 0; + __be32 *p, *maxcount_p, zero = xdr_zero; struct xdr_stream *xdr = resp->xdr; int length_offset = xdr->buf->len; - int status; - __be32 *p; + int maxcount, status;
- p = xdr_reserve_space(xdr, 4); - if (!p) + maxcount_p = xdr_reserve_space(xdr, XDR_UNIT); + if (!maxcount_p) return nfserr_resource; maxcount = PAGE_SIZE;
@@ -4053,14 +4050,11 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd nfserr = nfserrno(status); goto out_err; } - - wire_count = htonl(maxcount); - write_bytes_to_xdr_buf(xdr->buf, length_offset, &wire_count, 4); - xdr_truncate_encode(xdr, length_offset + 4 + ALIGN(maxcount, 4)); - if (maxcount & 3) - write_bytes_to_xdr_buf(xdr->buf, length_offset + 4 + maxcount, - &zero, 4 - (maxcount&3)); - return 0; + *maxcount_p = cpu_to_be32(maxcount); + xdr_truncate_encode(xdr, length_offset + 4 + xdr_align_size(maxcount)); + write_bytes_to_xdr_buf(xdr->buf, length_offset + 4 + maxcount, &zero, + xdr_pad_size(maxcount)); + return nfs_ok;
out_err: xdr_truncate_encode(xdr, length_offset);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 5304877936c0a67e1a01464d113bae4c81eacdb6 ]
In function ‘strncpy’, inlined from ‘nfsd4_ssc_setup_dul’ at /home/cel/src/linux/manet/fs/nfsd/nfs4proc.c:1392:3, inlined from ‘nfsd4_interssc_connect’ at /home/cel/src/linux/manet/fs/nfsd/nfs4proc.c:1489:11: /home/cel/src/linux/manet/include/linux/fortify-string.h:52:33: warning: ‘__builtin_strncpy’ specified bound 63 equals destination size [-Wstringop-truncation] 52 | #define __underlying_strncpy __builtin_strncpy | ^ /home/cel/src/linux/manet/include/linux/fortify-string.h:89:16: note: in expansion of macro ‘__underlying_strncpy’ 89 | return __underlying_strncpy(p, q, size); | ^~~~~~~~~~~~~~~~~~~~
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 2 +- include/linux/nfs_ssc.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 08c2eaca4f24e..1b49b4e2803c7 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1391,7 +1391,7 @@ static __be32 nfsd4_ssc_setup_dul(struct nfsd_net *nn, char *ipaddr, return 0; } if (work) { - strncpy(work->nsui_ipaddr, ipaddr, sizeof(work->nsui_ipaddr)); + strlcpy(work->nsui_ipaddr, ipaddr, sizeof(work->nsui_ipaddr) - 1); refcount_set(&work->nsui_refcnt, 2); work->nsui_busy = true; list_add_tail(&work->nsui_list, &nn->nfsd_ssc_mount_list); diff --git a/include/linux/nfs_ssc.h b/include/linux/nfs_ssc.h index 222ae8883e854..75843c00f326a 100644 --- a/include/linux/nfs_ssc.h +++ b/include/linux/nfs_ssc.h @@ -64,7 +64,7 @@ struct nfsd4_ssc_umount_item { refcount_t nsui_refcnt; unsigned long nsui_expire; struct vfsmount *nsui_vfsmount; - char nsui_ipaddr[RPC_MAX_ADDRBUFLEN]; + char nsui_ipaddr[RPC_MAX_ADDRBUFLEN + 1]; }; #endif
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit bb4d842722b84a2731257054b6405f2d866fc5f3 ]
Suggested-by: Dai Ngo dai.ngo@oracle.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index a98513cb35b10..fcf56e396e66b 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1810,7 +1810,7 @@ nfsd4_decode_test_stateid(struct nfsd4_compoundargs *argp, struct nfsd4_test_sta for (i = 0; i < test_stateid->ts_num_ids; i++) { stateid = svcxdr_tmpalloc(argp, sizeof(*stateid)); if (!stateid) - return nfserrno(-ENOMEM); /* XXX: not jukebox? */ + return nfserr_jukebox; INIT_LIST_HEAD(&stateid->ts_id_list); list_add_tail(&stateid->ts_id_list, &test_stateid->ts_stateid_list); status = nfsd4_decode_stateid4(argp, &stateid->ts_id_stateid); @@ -1933,7 +1933,7 @@ nfsd4_decode_copy(struct nfsd4_compoundargs *argp, struct nfsd4_copy *copy)
ns_dummy = kmalloc(sizeof(struct nl4_server), GFP_KERNEL); if (ns_dummy == NULL) - return nfserrno(-ENOMEM); /* XXX: jukebox? */ + return nfserr_jukebox; for (i = 0; i < count - 1; i++) { status = nfsd4_decode_nl4_server(argp, ns_dummy); if (status) {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 09426ef2a64ee189ca1e3298f1e874842dbf35ea ]
struct nfsd4_copy_notify is part of struct nfsd4_op, which resides in an 8-element array.
sizeof(struct nfsd4_op): Before: /* size: 2208, cachelines: 35, members: 5 */ After: /* size: 1696, cachelines: 27, members: 5 */
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 4 ++-- fs/nfsd/nfs4xdr.c | 12 ++++++++++-- fs/nfsd/xdr4.h | 4 ++-- 3 files changed, 14 insertions(+), 6 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 1b49b4e2803c7..3ed9c36bc4078 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1945,9 +1945,9 @@ nfsd4_copy_notify(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, /* For now, only return one server address in cpn_src, the * address used by the client to connect to this server. */ - cn->cpn_src.nl4_type = NL4_NETADDR; + cn->cpn_src->nl4_type = NL4_NETADDR; status = nfsd4_set_netaddr((struct sockaddr *)&rqstp->rq_daddr, - &cn->cpn_src.u.nl4_addr); + &cn->cpn_src->u.nl4_addr); WARN_ON_ONCE(status); if (status) { nfs4_put_cpntf_state(nn, cps); diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index fcf56e396e66b..537c358fec98c 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1952,10 +1952,17 @@ nfsd4_decode_copy_notify(struct nfsd4_compoundargs *argp, { __be32 status;
+ cn->cpn_src = svcxdr_tmpalloc(argp, sizeof(*cn->cpn_src)); + if (cn->cpn_src == NULL) + return nfserr_jukebox; + cn->cpn_dst = svcxdr_tmpalloc(argp, sizeof(*cn->cpn_dst)); + if (cn->cpn_dst == NULL) + return nfserr_jukebox; + status = nfsd4_decode_stateid4(argp, &cn->cpn_src_stateid); if (status) return status; - return nfsd4_decode_nl4_server(argp, &cn->cpn_dst); + return nfsd4_decode_nl4_server(argp, cn->cpn_dst); }
static __be32 @@ -4906,7 +4913,8 @@ nfsd4_encode_copy_notify(struct nfsd4_compoundres *resp, __be32 nfserr,
*p++ = cpu_to_be32(1);
- return nfsd42_encode_nl4_server(resp, &cn->cpn_src); + nfserr = nfsd42_encode_nl4_server(resp, cn->cpn_src); + return nfserr; }
static __be32 diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 32617639a3ece..27f9300fadbe0 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -595,13 +595,13 @@ struct nfsd4_offload_status { struct nfsd4_copy_notify { /* request */ stateid_t cpn_src_stateid; - struct nl4_server cpn_dst; + struct nl4_server *cpn_dst;
/* response */ stateid_t cpn_cnr_stateid; u64 cpn_sec; u32 cpn_nsec; - struct nl4_server cpn_src; + struct nl4_server *cpn_src; };
struct nfsd4_op {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 87689df694916c40e8e6c179ab1c8710f65cb6c6 ]
struct nfsd4_copy is part of struct nfsd4_op, which resides in an 8-element array.
sizeof(struct nfsd4_op): Before: /* size: 1696, cachelines: 27, members: 5 */ After: /* size: 672, cachelines: 11, members: 5 */
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 8 ++++++-- fs/nfsd/nfs4xdr.c | 5 ++++- fs/nfsd/xdr4.h | 2 +- 3 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 3ed9c36bc4078..7fb1ef7c4383e 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1291,6 +1291,7 @@ void nfs4_put_copy(struct nfsd4_copy *copy) { if (!refcount_dec_and_test(©->refcount)) return; + kfree(copy->cp_src); kfree(copy); }
@@ -1544,7 +1545,7 @@ nfsd4_setup_inter_ssc(struct svc_rqst *rqstp, if (status) goto out;
- status = nfsd4_interssc_connect(©->cp_src, rqstp, mount); + status = nfsd4_interssc_connect(copy->cp_src, rqstp, mount); if (status) goto out;
@@ -1751,7 +1752,7 @@ static void dup_copy_fields(struct nfsd4_copy *src, struct nfsd4_copy *dst) dst->nf_src = nfsd_file_get(src->nf_src);
memcpy(&dst->cp_stateid, &src->cp_stateid, sizeof(src->cp_stateid)); - memcpy(&dst->cp_src, &src->cp_src, sizeof(struct nl4_server)); + memcpy(dst->cp_src, src->cp_src, sizeof(struct nl4_server)); memcpy(&dst->stateid, &src->stateid, sizeof(src->stateid)); memcpy(&dst->c_fh, &src->c_fh, sizeof(src->c_fh)); dst->ss_mnt = src->ss_mnt; @@ -1845,6 +1846,9 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, async_copy = kzalloc(sizeof(struct nfsd4_copy), GFP_KERNEL); if (!async_copy) goto out_err; + async_copy->cp_src = kmalloc(sizeof(*async_copy->cp_src), GFP_KERNEL); + if (!async_copy->cp_src) + goto out_err; if (!nfs4_init_copy_state(nn, copy)) goto out_err; refcount_set(&async_copy->refcount, 1); diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 537c358fec98c..890f1009bd4ca 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1920,6 +1920,9 @@ nfsd4_decode_copy(struct nfsd4_compoundargs *argp, struct nfsd4_copy *copy)
if (xdr_stream_decode_u32(argp->xdr, &count) < 0) return nfserr_bad_xdr; + copy->cp_src = svcxdr_tmpalloc(argp, sizeof(*copy->cp_src)); + if (copy->cp_src == NULL) + return nfserr_jukebox; copy->cp_intra = false; if (count == 0) { /* intra-server copy */ copy->cp_intra = true; @@ -1927,7 +1930,7 @@ nfsd4_decode_copy(struct nfsd4_compoundargs *argp, struct nfsd4_copy *copy) }
/* decode all the supplied server addresses but use only the first */ - status = nfsd4_decode_nl4_server(argp, ©->cp_src); + status = nfsd4_decode_nl4_server(argp, copy->cp_src); if (status) return status;
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 27f9300fadbe0..c44c76cef40cd 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -540,7 +540,7 @@ struct nfsd4_copy { u64 cp_src_pos; u64 cp_dst_pos; u64 cp_count; - struct nl4_server cp_src; + struct nl4_server *cp_src; bool cp_intra;
/* both */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit d314309425ad5dc1b6facdb2d456580fb5fa5e3a ]
Pack the fields to reduce the size of struct nfsd4_op, which is used an array in struct nfsd4_compoundargs.
sizeof(struct nfsd4_op): Before: /* size: 672, cachelines: 11, members: 5 */ After: /* size: 640, cachelines: 10, members: 5 */
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/xdr4.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index c44c76cef40cd..621937fae9acb 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -606,8 +606,9 @@ struct nfsd4_copy_notify {
struct nfsd4_op { u32 opnum; - const struct nfsd4_operation * opdesc; __be32 status; + const struct nfsd4_operation *opdesc; + struct nfs4_replay *replay; union nfsd4_op_u { struct nfsd4_access access; struct nfsd4_close close; @@ -671,7 +672,6 @@ struct nfsd4_op { struct nfsd4_listxattrs listxattrs; struct nfsd4_removexattr removexattr; } u; - struct nfs4_replay * replay; };
bool nfsd4_cache_this_op(struct nfsd4_op *);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 8ea6e2c90bb0eb74a595a12e23a1dff9abbc760a ]
Clean up: All call sites are in fs/nfsd/nfs4proc.c.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 2 +- fs/nfsd/state.h | 1 - 2 files changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 7fb1ef7c4383e..64879350ccbda 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1287,7 +1287,7 @@ nfsd4_clone(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, return status; }
-void nfs4_put_copy(struct nfsd4_copy *copy) +static void nfs4_put_copy(struct nfsd4_copy *copy) { if (!refcount_dec_and_test(©->refcount)) return; diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index f3d6313914ed0..ae596dbf86675 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -703,7 +703,6 @@ extern struct nfs4_client_reclaim *nfs4_client_to_reclaim(struct xdr_netobj name extern bool nfs4_has_reclaimed_state(struct xdr_netobj name, struct nfsd_net *nn);
void put_nfs4_file(struct nfs4_file *fi); -extern void nfs4_put_copy(struct nfsd4_copy *copy); extern struct nfsd4_copy * find_async_copy(struct nfs4_client *clp, stateid_t *staetid); extern void nfs4_put_cpntf_state(struct nfsd_net *nn,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 1913cdf56cb5bfbc8170873728d13598cbecda23 ]
Clean up: saves 8 bytes, and we can replace check_and_set_stop_copy() with an atomic bitop.
[ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 49 +++++++++++++++++----------------------------- fs/nfsd/nfs4xdr.c | 12 ++++++------ fs/nfsd/xdr4.h | 33 ++++++++++++++++++++++++++----- 3 files changed, 52 insertions(+), 42 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 64879350ccbda..a4bc096e509d4 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1295,23 +1295,9 @@ static void nfs4_put_copy(struct nfsd4_copy *copy) kfree(copy); }
-static bool -check_and_set_stop_copy(struct nfsd4_copy *copy) -{ - bool value; - - spin_lock(©->cp_clp->async_lock); - value = copy->stopped; - if (!copy->stopped) - copy->stopped = true; - spin_unlock(©->cp_clp->async_lock); - return value; -} - static void nfsd4_stop_copy(struct nfsd4_copy *copy) { - /* only 1 thread should stop the copy */ - if (!check_and_set_stop_copy(copy)) + if (!test_and_set_bit(NFSD4_COPY_F_STOPPED, ©->cp_flags)) kthread_stop(copy->copy_task); nfs4_put_copy(copy); } @@ -1668,8 +1654,9 @@ static const struct nfsd4_callback_ops nfsd4_cb_offload_ops = { static void nfsd4_init_copy_res(struct nfsd4_copy *copy, bool sync) { copy->cp_res.wr_stable_how = - copy->committed ? NFS_FILE_SYNC : NFS_UNSTABLE; - copy->cp_synchronous = sync; + test_bit(NFSD4_COPY_F_COMMITTED, ©->cp_flags) ? + NFS_FILE_SYNC : NFS_UNSTABLE; + nfsd4_copy_set_sync(copy, sync); gen_boot_verifier(©->cp_res.wr_verifier, copy->cp_clp->net); }
@@ -1698,16 +1685,16 @@ static ssize_t _nfsd_copy_file_range(struct nfsd4_copy *copy) copy->cp_res.wr_bytes_written += bytes_copied; src_pos += bytes_copied; dst_pos += bytes_copied; - } while (bytes_total > 0 && !copy->cp_synchronous); + } while (bytes_total > 0 && nfsd4_copy_is_async(copy)); /* for a non-zero asynchronous copy do a commit of data */ - if (!copy->cp_synchronous && copy->cp_res.wr_bytes_written > 0) { + if (nfsd4_copy_is_async(copy) && copy->cp_res.wr_bytes_written > 0) { since = READ_ONCE(dst->f_wb_err); status = vfs_fsync_range(dst, copy->cp_dst_pos, copy->cp_res.wr_bytes_written, 0); if (!status) status = filemap_check_wb_err(dst->f_mapping, since); if (!status) - copy->committed = true; + set_bit(NFSD4_COPY_F_COMMITTED, ©->cp_flags); } return bytes_copied; } @@ -1728,7 +1715,7 @@ static __be32 nfsd4_do_copy(struct nfsd4_copy *copy, bool sync) status = nfs_ok; }
- if (!copy->cp_intra) /* Inter server SSC */ + if (nfsd4_ssc_is_inter(copy)) nfsd4_cleanup_inter_ssc(copy->ss_mnt, copy->nf_src, copy->nf_dst); else @@ -1742,13 +1729,13 @@ static void dup_copy_fields(struct nfsd4_copy *src, struct nfsd4_copy *dst) dst->cp_src_pos = src->cp_src_pos; dst->cp_dst_pos = src->cp_dst_pos; dst->cp_count = src->cp_count; - dst->cp_synchronous = src->cp_synchronous; + dst->cp_flags = src->cp_flags; memcpy(&dst->cp_res, &src->cp_res, sizeof(src->cp_res)); memcpy(&dst->fh, &src->fh, sizeof(src->fh)); dst->cp_clp = src->cp_clp; dst->nf_dst = nfsd_file_get(src->nf_dst); - dst->cp_intra = src->cp_intra; - if (src->cp_intra) /* for inter, file_src doesn't exist yet */ + /* for inter, nf_src doesn't exist yet */ + if (!nfsd4_ssc_is_inter(src)) dst->nf_src = nfsd_file_get(src->nf_src);
memcpy(&dst->cp_stateid, &src->cp_stateid, sizeof(src->cp_stateid)); @@ -1762,7 +1749,7 @@ static void cleanup_async_copy(struct nfsd4_copy *copy) { nfs4_free_copy_state(copy); nfsd_file_put(copy->nf_dst); - if (copy->cp_intra) + if (!nfsd4_ssc_is_inter(copy)) nfsd_file_put(copy->nf_src); spin_lock(©->cp_clp->async_lock); list_del(©->copies); @@ -1775,7 +1762,7 @@ static int nfsd4_do_async_copy(void *data) struct nfsd4_copy *copy = (struct nfsd4_copy *)data; struct nfsd4_copy *cb_copy;
- if (!copy->cp_intra) { /* Inter server SSC */ + if (nfsd4_ssc_is_inter(copy)) { copy->nf_src = kzalloc(sizeof(struct nfsd_file), GFP_KERNEL); if (!copy->nf_src) { copy->nfserr = nfserr_serverfault; @@ -1807,7 +1794,7 @@ static int nfsd4_do_async_copy(void *data) ©->fh, copy->cp_count, copy->nfserr); nfsd4_run_cb(&cb_copy->cp_cb); out: - if (!copy->cp_intra) + if (nfsd4_ssc_is_inter(copy)) kfree(copy->nf_src); cleanup_async_copy(copy); return 0; @@ -1821,8 +1808,8 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, __be32 status; struct nfsd4_copy *async_copy = NULL;
- if (!copy->cp_intra) { /* Inter server SSC */ - if (!inter_copy_offload_enable || copy->cp_synchronous) { + if (nfsd4_ssc_is_inter(copy)) { + if (!inter_copy_offload_enable || nfsd4_copy_is_sync(copy)) { status = nfserr_notsupp; goto out; } @@ -1839,7 +1826,7 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, copy->cp_clp = cstate->clp; memcpy(©->fh, &cstate->current_fh.fh_handle, sizeof(struct knfsd_fh)); - if (!copy->cp_synchronous) { + if (nfsd4_copy_is_async(copy)) { struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
status = nfserrno(-ENOMEM); @@ -2605,7 +2592,7 @@ check_if_stalefh_allowed(struct nfsd4_compoundargs *args) return; } putfh = (struct nfsd4_putfh *)&saved_op->u; - if (!copy->cp_intra) + if (nfsd4_ssc_is_inter(copy)) putfh->no_verify = true; } } diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 890f1009bd4ca..5476541530ead 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1896,8 +1896,8 @@ static __be32 nfsd4_decode_nl4_server(struct nfsd4_compoundargs *argp, static __be32 nfsd4_decode_copy(struct nfsd4_compoundargs *argp, struct nfsd4_copy *copy) { + u32 consecutive, i, count, sync; struct nl4_server *ns_dummy; - u32 consecutive, i, count; __be32 status;
status = nfsd4_decode_stateid4(argp, ©->cp_src_stateid); @@ -1915,17 +1915,17 @@ nfsd4_decode_copy(struct nfsd4_compoundargs *argp, struct nfsd4_copy *copy) /* ca_consecutive: we always do consecutive copies */ if (xdr_stream_decode_u32(argp->xdr, &consecutive) < 0) return nfserr_bad_xdr; - if (xdr_stream_decode_u32(argp->xdr, ©->cp_synchronous) < 0) + if (xdr_stream_decode_bool(argp->xdr, &sync) < 0) return nfserr_bad_xdr; + nfsd4_copy_set_sync(copy, sync);
if (xdr_stream_decode_u32(argp->xdr, &count) < 0) return nfserr_bad_xdr; copy->cp_src = svcxdr_tmpalloc(argp, sizeof(*copy->cp_src)); if (copy->cp_src == NULL) return nfserr_jukebox; - copy->cp_intra = false; if (count == 0) { /* intra-server copy */ - copy->cp_intra = true; + __set_bit(NFSD4_COPY_F_INTRA, ©->cp_flags); return nfs_ok; }
@@ -4712,13 +4712,13 @@ nfsd4_encode_copy(struct nfsd4_compoundres *resp, __be32 nfserr, __be32 *p;
nfserr = nfsd42_encode_write_res(resp, ©->cp_res, - !!copy->cp_synchronous); + nfsd4_copy_is_sync(copy)); if (nfserr) return nfserr;
p = xdr_reserve_space(resp->xdr, 4 + 4); *p++ = xdr_one; /* cr_consecutive */ - *p++ = cpu_to_be32(copy->cp_synchronous); + *p = nfsd4_copy_is_sync(copy) ? xdr_one : xdr_zero; return 0; }
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 621937fae9acb..1f6ac92bcf856 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -541,10 +541,12 @@ struct nfsd4_copy { u64 cp_dst_pos; u64 cp_count; struct nl4_server *cp_src; - bool cp_intra;
- /* both */ - u32 cp_synchronous; + unsigned long cp_flags; +#define NFSD4_COPY_F_STOPPED (0) +#define NFSD4_COPY_F_INTRA (1) +#define NFSD4_COPY_F_SYNCHRONOUS (2) +#define NFSD4_COPY_F_COMMITTED (3)
/* response */ struct nfsd42_write_res cp_res; @@ -564,14 +566,35 @@ struct nfsd4_copy { struct list_head copies; struct task_struct *copy_task; refcount_t refcount; - bool stopped;
struct vfsmount *ss_mnt; struct nfs_fh c_fh; nfs4_stateid stateid; - bool committed; };
+static inline void nfsd4_copy_set_sync(struct nfsd4_copy *copy, bool sync) +{ + if (sync) + set_bit(NFSD4_COPY_F_SYNCHRONOUS, ©->cp_flags); + else + clear_bit(NFSD4_COPY_F_SYNCHRONOUS, ©->cp_flags); +} + +static inline bool nfsd4_copy_is_sync(const struct nfsd4_copy *copy) +{ + return test_bit(NFSD4_COPY_F_SYNCHRONOUS, ©->cp_flags); +} + +static inline bool nfsd4_copy_is_async(const struct nfsd4_copy *copy) +{ + return !test_bit(NFSD4_COPY_F_SYNCHRONOUS, ©->cp_flags); +} + +static inline bool nfsd4_ssc_is_inter(const struct nfsd4_copy *copy) +{ + return !test_bit(NFSD4_COPY_F_INTRA, ©->cp_flags); +} + struct nfsd4_seek { /* request */ stateid_t seek_stateid;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 24d796ea383b8a4c8234e06d1b14bbcd371192ea ]
The @src parameter is sometimes a pointer to a struct nfsd_file and sometimes a pointer to struct file hiding in a phony struct nfsd_file. Refactor nfsd4_cleanup_inter_ssc() so the @src parameter is always an explicit struct file.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index a4bc096e509d4..d39150425da88 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1549,7 +1549,7 @@ nfsd4_setup_inter_ssc(struct svc_rqst *rqstp, }
static void -nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct nfsd_file *src, +nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *filp, struct nfsd_file *dst) { bool found = false; @@ -1558,9 +1558,9 @@ nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct nfsd_file *src, struct nfsd4_ssc_umount_item *ni = NULL; struct nfsd_net *nn = net_generic(dst->nf_net, nfsd_net_id);
- nfs42_ssc_close(src->nf_file); + nfs42_ssc_close(filp); nfsd_file_put(dst); - fput(src->nf_file); + fput(filp);
if (!nn) { mntput(ss_mnt); @@ -1603,7 +1603,7 @@ nfsd4_setup_inter_ssc(struct svc_rqst *rqstp, }
static void -nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct nfsd_file *src, +nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *filp, struct nfsd_file *dst) { } @@ -1716,7 +1716,7 @@ static __be32 nfsd4_do_copy(struct nfsd4_copy *copy, bool sync) }
if (nfsd4_ssc_is_inter(copy)) - nfsd4_cleanup_inter_ssc(copy->ss_mnt, copy->nf_src, + nfsd4_cleanup_inter_ssc(copy->ss_mnt, copy->nf_src->nf_file, copy->nf_dst); else nfsd4_cleanup_intra_ssc(copy->nf_src, copy->nf_dst);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 478ed7b10d875da2743d1a22822b9f8a82df8f12 ]
Move the nfsd4_cleanup_*() call sites out of nfsd4_do_copy(). A subsequent patch will modify one of the new call sites to avoid the need to manufacture the phony struct nfsd_file.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index d39150425da88..5d05bb7a0c0f6 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1714,13 +1714,6 @@ static __be32 nfsd4_do_copy(struct nfsd4_copy *copy, bool sync) nfsd4_init_copy_res(copy, sync); status = nfs_ok; } - - if (nfsd4_ssc_is_inter(copy)) - nfsd4_cleanup_inter_ssc(copy->ss_mnt, copy->nf_src->nf_file, - copy->nf_dst); - else - nfsd4_cleanup_intra_ssc(copy->nf_src, copy->nf_dst); - return status; }
@@ -1776,9 +1769,14 @@ static int nfsd4_do_async_copy(void *data) /* ss_mnt will be unmounted by the laundromat */ goto do_callback; } + copy->nfserr = nfsd4_do_copy(copy, 0); + nfsd4_cleanup_inter_ssc(copy->ss_mnt, copy->nf_src->nf_file, + copy->nf_dst); + } else { + copy->nfserr = nfsd4_do_copy(copy, 0); + nfsd4_cleanup_intra_ssc(copy->nf_src, copy->nf_dst); }
- copy->nfserr = nfsd4_do_copy(copy, 0); do_callback: cb_copy = kzalloc(sizeof(struct nfsd4_copy), GFP_KERNEL); if (!cb_copy) @@ -1854,6 +1852,7 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, status = nfs_ok; } else { status = nfsd4_do_copy(copy, 1); + nfsd4_cleanup_intra_ssc(copy->nf_src, copy->nf_dst); } out: return status;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 3b7bf5933cada732783554edf0dc61283551c6cf ]
Refactor: Now that nfsd4_do_copy() no longer calls the cleanup helpers, plumb the use of struct file pointers all the way down to _nfsd_copy_file_range().
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 22 ++++++++++++++-------- 1 file changed, 14 insertions(+), 8 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 5d05bb7a0c0f6..16f968c165c98 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1660,10 +1660,10 @@ static void nfsd4_init_copy_res(struct nfsd4_copy *copy, bool sync) gen_boot_verifier(©->cp_res.wr_verifier, copy->cp_clp->net); }
-static ssize_t _nfsd_copy_file_range(struct nfsd4_copy *copy) +static ssize_t _nfsd_copy_file_range(struct nfsd4_copy *copy, + struct file *dst, + struct file *src) { - struct file *dst = copy->nf_dst->nf_file; - struct file *src = copy->nf_src->nf_file; errseq_t since; ssize_t bytes_copied = 0; u64 bytes_total = copy->cp_count; @@ -1699,12 +1699,15 @@ static ssize_t _nfsd_copy_file_range(struct nfsd4_copy *copy) return bytes_copied; }
-static __be32 nfsd4_do_copy(struct nfsd4_copy *copy, bool sync) +static __be32 nfsd4_do_copy(struct nfsd4_copy *copy, + struct file *src, struct file *dst, + bool sync) { __be32 status; ssize_t bytes;
- bytes = _nfsd_copy_file_range(copy); + bytes = _nfsd_copy_file_range(copy, dst, src); + /* for async copy, we ignore the error, client can always retry * to get the error */ @@ -1769,11 +1772,13 @@ static int nfsd4_do_async_copy(void *data) /* ss_mnt will be unmounted by the laundromat */ goto do_callback; } - copy->nfserr = nfsd4_do_copy(copy, 0); + copy->nfserr = nfsd4_do_copy(copy, copy->nf_src->nf_file, + copy->nf_dst->nf_file, false); nfsd4_cleanup_inter_ssc(copy->ss_mnt, copy->nf_src->nf_file, copy->nf_dst); } else { - copy->nfserr = nfsd4_do_copy(copy, 0); + copy->nfserr = nfsd4_do_copy(copy, copy->nf_src->nf_file, + copy->nf_dst->nf_file, false); nfsd4_cleanup_intra_ssc(copy->nf_src, copy->nf_dst); }
@@ -1851,7 +1856,8 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, wake_up_process(async_copy->copy_task); status = nfs_ok; } else { - status = nfsd4_do_copy(copy, 1); + status = nfsd4_do_copy(copy, copy->nf_src->nf_file, + copy->nf_dst->nf_file, true); nfsd4_cleanup_intra_ssc(copy->nf_src, copy->nf_dst); } out:
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit ad1e46c9b07b13659635ee5405f83ad0df143116 ]
Instead of manufacturing a phony struct nfsd_file, pass the struct file returned by nfs42_ssc_open() directly to nfsd4_do_copy().
[ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 16f968c165c98..dbc507c9aa11b 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1753,29 +1753,31 @@ static void cleanup_async_copy(struct nfsd4_copy *copy) nfs4_put_copy(copy); }
+/** + * nfsd4_do_async_copy - kthread function for background server-side COPY + * @data: arguments for COPY operation + * + * Return values: + * %0: Copy operation is done. + */ static int nfsd4_do_async_copy(void *data) { struct nfsd4_copy *copy = (struct nfsd4_copy *)data; struct nfsd4_copy *cb_copy;
if (nfsd4_ssc_is_inter(copy)) { - copy->nf_src = kzalloc(sizeof(struct nfsd_file), GFP_KERNEL); - if (!copy->nf_src) { - copy->nfserr = nfserr_serverfault; - /* ss_mnt will be unmounted by the laundromat */ - goto do_callback; - } - copy->nf_src->nf_file = nfs42_ssc_open(copy->ss_mnt, ©->c_fh, - ©->stateid); - if (IS_ERR(copy->nf_src->nf_file)) { + struct file *filp; + + filp = nfs42_ssc_open(copy->ss_mnt, ©->c_fh, + ©->stateid); + if (IS_ERR(filp)) { copy->nfserr = nfserr_offload_denied; /* ss_mnt will be unmounted by the laundromat */ goto do_callback; } - copy->nfserr = nfsd4_do_copy(copy, copy->nf_src->nf_file, + copy->nfserr = nfsd4_do_copy(copy, filp, copy->nf_dst->nf_file, false); - nfsd4_cleanup_inter_ssc(copy->ss_mnt, copy->nf_src->nf_file, - copy->nf_dst); + nfsd4_cleanup_inter_ssc(copy->ss_mnt, filp, copy->nf_dst); } else { copy->nfserr = nfsd4_do_copy(copy, copy->nf_src->nf_file, copy->nf_dst->nf_file, false); @@ -1797,8 +1799,6 @@ static int nfsd4_do_async_copy(void *data) ©->fh, copy->cp_count, copy->nfserr); nfsd4_run_cb(&cb_copy->cp_cb); out: - if (nfsd4_ssc_is_inter(copy)) - kfree(copy->nf_src); cleanup_async_copy(copy); return 0; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit e72f9bc006c08841c46d27747a4debc747a8fe13 ]
Refactor for legibility.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 37 ++++++++++++++++++++++--------------- 1 file changed, 22 insertions(+), 15 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index dbc507c9aa11b..332fd1d0b188d 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1753,6 +1753,27 @@ static void cleanup_async_copy(struct nfsd4_copy *copy) nfs4_put_copy(copy); }
+static void nfsd4_send_cb_offload(struct nfsd4_copy *copy) +{ + struct nfsd4_copy *cb_copy; + + cb_copy = kzalloc(sizeof(struct nfsd4_copy), GFP_KERNEL); + if (!cb_copy) + return; + + refcount_set(&cb_copy->refcount, 1); + memcpy(&cb_copy->cp_res, ©->cp_res, sizeof(copy->cp_res)); + cb_copy->cp_clp = copy->cp_clp; + cb_copy->nfserr = copy->nfserr; + memcpy(&cb_copy->fh, ©->fh, sizeof(copy->fh)); + + nfsd4_init_cb(&cb_copy->cp_cb, cb_copy->cp_clp, + &nfsd4_cb_offload_ops, NFSPROC4_CLNT_CB_OFFLOAD); + trace_nfsd_cb_offload(copy->cp_clp, ©->cp_res.cb_stateid, + ©->fh, copy->cp_count, copy->nfserr); + nfsd4_run_cb(&cb_copy->cp_cb); +} + /** * nfsd4_do_async_copy - kthread function for background server-side COPY * @data: arguments for COPY operation @@ -1763,7 +1784,6 @@ static void cleanup_async_copy(struct nfsd4_copy *copy) static int nfsd4_do_async_copy(void *data) { struct nfsd4_copy *copy = (struct nfsd4_copy *)data; - struct nfsd4_copy *cb_copy;
if (nfsd4_ssc_is_inter(copy)) { struct file *filp; @@ -1785,20 +1805,7 @@ static int nfsd4_do_async_copy(void *data) }
do_callback: - cb_copy = kzalloc(sizeof(struct nfsd4_copy), GFP_KERNEL); - if (!cb_copy) - goto out; - refcount_set(&cb_copy->refcount, 1); - memcpy(&cb_copy->cp_res, ©->cp_res, sizeof(copy->cp_res)); - cb_copy->cp_clp = copy->cp_clp; - cb_copy->nfserr = copy->nfserr; - memcpy(&cb_copy->fh, ©->fh, sizeof(copy->fh)); - nfsd4_init_cb(&cb_copy->cp_cb, cb_copy->cp_clp, - &nfsd4_cb_offload_ops, NFSPROC4_CLNT_CB_OFFLOAD); - trace_nfsd_cb_offload(copy->cp_clp, ©->cp_res.cb_stateid, - ©->fh, copy->cp_count, copy->nfserr); - nfsd4_run_cb(&cb_copy->cp_cb); -out: + nfsd4_send_cb_offload(copy); cleanup_async_copy(copy); return 0; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit a11ada99ce93a79393dc6683d22f7915748c8f6b ]
Refactor so that CB_OFFLOAD arguments can be passed without allocating a whole struct nfsd4_copy object. On my system (x86_64) this removes another 96 bytes from struct nfsd4_copy.
[ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4callback.c | 37 +++++++++++++++++------------------ fs/nfsd/nfs4proc.c | 44 +++++++++++++++++++++--------------------- fs/nfsd/xdr4.h | 11 +++++++---- 3 files changed, 47 insertions(+), 45 deletions(-)
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c index e1272a7f45220..face8908a40b1 100644 --- a/fs/nfsd/nfs4callback.c +++ b/fs/nfsd/nfs4callback.c @@ -679,7 +679,7 @@ static int nfs4_xdr_dec_cb_notify_lock(struct rpc_rqst *rqstp, * case NFS4_OK: * write_response4 coa_resok4; * default: - * length4 coa_bytes_copied; + * length4 coa_bytes_copied; * }; * struct CB_OFFLOAD4args { * nfs_fh4 coa_fh; @@ -688,21 +688,22 @@ static int nfs4_xdr_dec_cb_notify_lock(struct rpc_rqst *rqstp, * }; */ static void encode_offload_info4(struct xdr_stream *xdr, - __be32 nfserr, - const struct nfsd4_copy *cp) + const struct nfsd4_cb_offload *cbo) { __be32 *p;
p = xdr_reserve_space(xdr, 4); - *p++ = nfserr; - if (!nfserr) { + *p = cbo->co_nfserr; + switch (cbo->co_nfserr) { + case nfs_ok: p = xdr_reserve_space(xdr, 4 + 8 + 4 + NFS4_VERIFIER_SIZE); p = xdr_encode_empty_array(p); - p = xdr_encode_hyper(p, cp->cp_res.wr_bytes_written); - *p++ = cpu_to_be32(cp->cp_res.wr_stable_how); - p = xdr_encode_opaque_fixed(p, cp->cp_res.wr_verifier.data, + p = xdr_encode_hyper(p, cbo->co_res.wr_bytes_written); + *p++ = cpu_to_be32(cbo->co_res.wr_stable_how); + p = xdr_encode_opaque_fixed(p, cbo->co_res.wr_verifier.data, NFS4_VERIFIER_SIZE); - } else { + break; + default: p = xdr_reserve_space(xdr, 8); /* We always return success if bytes were written */ p = xdr_encode_hyper(p, 0); @@ -710,18 +711,16 @@ static void encode_offload_info4(struct xdr_stream *xdr, }
static void encode_cb_offload4args(struct xdr_stream *xdr, - __be32 nfserr, - const struct knfsd_fh *fh, - const struct nfsd4_copy *cp, + const struct nfsd4_cb_offload *cbo, struct nfs4_cb_compound_hdr *hdr) { __be32 *p;
p = xdr_reserve_space(xdr, 4); - *p++ = cpu_to_be32(OP_CB_OFFLOAD); - encode_nfs_fh4(xdr, fh); - encode_stateid4(xdr, &cp->cp_res.cb_stateid); - encode_offload_info4(xdr, nfserr, cp); + *p = cpu_to_be32(OP_CB_OFFLOAD); + encode_nfs_fh4(xdr, &cbo->co_fh); + encode_stateid4(xdr, &cbo->co_res.cb_stateid); + encode_offload_info4(xdr, cbo);
hdr->nops++; } @@ -731,8 +730,8 @@ static void nfs4_xdr_enc_cb_offload(struct rpc_rqst *req, const void *data) { const struct nfsd4_callback *cb = data; - const struct nfsd4_copy *cp = - container_of(cb, struct nfsd4_copy, cp_cb); + const struct nfsd4_cb_offload *cbo = + container_of(cb, struct nfsd4_cb_offload, co_cb); struct nfs4_cb_compound_hdr hdr = { .ident = 0, .minorversion = cb->cb_clp->cl_minorversion, @@ -740,7 +739,7 @@ static void nfs4_xdr_enc_cb_offload(struct rpc_rqst *req,
encode_cb_compound4args(xdr, &hdr); encode_cb_sequence4args(xdr, cb, &hdr); - encode_cb_offload4args(xdr, cp->nfserr, &cp->fh, cp, &hdr); + encode_cb_offload4args(xdr, cbo, &hdr); encode_cb_nops(&hdr); }
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 332fd1d0b188d..fdde7eca8d438 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1635,9 +1635,10 @@ nfsd4_cleanup_intra_ssc(struct nfsd_file *src, struct nfsd_file *dst)
static void nfsd4_cb_offload_release(struct nfsd4_callback *cb) { - struct nfsd4_copy *copy = container_of(cb, struct nfsd4_copy, cp_cb); + struct nfsd4_cb_offload *cbo = + container_of(cb, struct nfsd4_cb_offload, co_cb);
- nfs4_put_copy(copy); + kfree(cbo); }
static int nfsd4_cb_offload_done(struct nfsd4_callback *cb, @@ -1753,25 +1754,23 @@ static void cleanup_async_copy(struct nfsd4_copy *copy) nfs4_put_copy(copy); }
-static void nfsd4_send_cb_offload(struct nfsd4_copy *copy) +static void nfsd4_send_cb_offload(struct nfsd4_copy *copy, __be32 nfserr) { - struct nfsd4_copy *cb_copy; + struct nfsd4_cb_offload *cbo;
- cb_copy = kzalloc(sizeof(struct nfsd4_copy), GFP_KERNEL); - if (!cb_copy) + cbo = kzalloc(sizeof(*cbo), GFP_KERNEL); + if (!cbo) return;
- refcount_set(&cb_copy->refcount, 1); - memcpy(&cb_copy->cp_res, ©->cp_res, sizeof(copy->cp_res)); - cb_copy->cp_clp = copy->cp_clp; - cb_copy->nfserr = copy->nfserr; - memcpy(&cb_copy->fh, ©->fh, sizeof(copy->fh)); + memcpy(&cbo->co_res, ©->cp_res, sizeof(copy->cp_res)); + memcpy(&cbo->co_fh, ©->fh, sizeof(copy->fh)); + cbo->co_nfserr = nfserr;
- nfsd4_init_cb(&cb_copy->cp_cb, cb_copy->cp_clp, - &nfsd4_cb_offload_ops, NFSPROC4_CLNT_CB_OFFLOAD); - trace_nfsd_cb_offload(copy->cp_clp, ©->cp_res.cb_stateid, - ©->fh, copy->cp_count, copy->nfserr); - nfsd4_run_cb(&cb_copy->cp_cb); + nfsd4_init_cb(&cbo->co_cb, copy->cp_clp, &nfsd4_cb_offload_ops, + NFSPROC4_CLNT_CB_OFFLOAD); + trace_nfsd_cb_offload(copy->cp_clp, &cbo->co_res.cb_stateid, + &cbo->co_fh, copy->cp_count, nfserr); + nfsd4_run_cb(&cbo->co_cb); }
/** @@ -1784,6 +1783,7 @@ static void nfsd4_send_cb_offload(struct nfsd4_copy *copy) static int nfsd4_do_async_copy(void *data) { struct nfsd4_copy *copy = (struct nfsd4_copy *)data; + __be32 nfserr;
if (nfsd4_ssc_is_inter(copy)) { struct file *filp; @@ -1791,21 +1791,21 @@ static int nfsd4_do_async_copy(void *data) filp = nfs42_ssc_open(copy->ss_mnt, ©->c_fh, ©->stateid); if (IS_ERR(filp)) { - copy->nfserr = nfserr_offload_denied; + nfserr = nfserr_offload_denied; /* ss_mnt will be unmounted by the laundromat */ goto do_callback; } - copy->nfserr = nfsd4_do_copy(copy, filp, - copy->nf_dst->nf_file, false); + nfserr = nfsd4_do_copy(copy, filp, copy->nf_dst->nf_file, + false); nfsd4_cleanup_inter_ssc(copy->ss_mnt, filp, copy->nf_dst); } else { - copy->nfserr = nfsd4_do_copy(copy, copy->nf_src->nf_file, - copy->nf_dst->nf_file, false); + nfserr = nfsd4_do_copy(copy, copy->nf_src->nf_file, + copy->nf_dst->nf_file, false); nfsd4_cleanup_intra_ssc(copy->nf_src, copy->nf_dst); }
do_callback: - nfsd4_send_cb_offload(copy); + nfsd4_send_cb_offload(copy, nfserr); cleanup_async_copy(copy); return 0; } diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 1f6ac92bcf856..3b9e60249aea9 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -533,6 +533,13 @@ struct nfsd42_write_res { stateid_t cb_stateid; };
+struct nfsd4_cb_offload { + struct nfsd4_callback co_cb; + struct nfsd42_write_res co_res; + __be32 co_nfserr; + struct knfsd_fh co_fh; +}; + struct nfsd4_copy { /* request */ stateid_t cp_src_stateid; @@ -550,10 +557,6 @@ struct nfsd4_copy {
/* response */ struct nfsd42_write_res cp_res; - - /* for cb_offload */ - struct nfsd4_callback cp_cb; - __be32 nfserr; struct knfsd_fh fh;
struct nfs4_client *cp_clp;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit bbf936edd543e7220f60f9cbd6933b916550396d ]
Currently, we pass the fh of the opened file down through several functions so that alloc_init_deleg can pass it to delegation_blocked. The filehandle of the open file is available in the nfs4_file however, so there's no need to pass it in a separate argument.
Drop the argument from alloc_init_deleg, nfs4_open_delegation and nfs4_set_delegation.
Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 1babc08fcb88f..194d8aeb1fd46 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -1143,7 +1143,6 @@ static void block_delegations(struct knfsd_fh *fh)
static struct nfs4_delegation * alloc_init_deleg(struct nfs4_client *clp, struct nfs4_file *fp, - struct svc_fh *current_fh, struct nfs4_clnt_odstate *odstate) { struct nfs4_delegation *dp; @@ -1153,7 +1152,7 @@ alloc_init_deleg(struct nfs4_client *clp, struct nfs4_file *fp, n = atomic_long_inc_return(&num_delegations); if (n < 0 || n > max_delegations) goto out_dec; - if (delegation_blocked(¤t_fh->fh_handle)) + if (delegation_blocked(&fp->fi_fhandle)) goto out_dec; dp = delegstateid(nfs4_alloc_stid(clp, deleg_slab, nfs4_free_deleg)); if (dp == NULL) @@ -5307,7 +5306,7 @@ static int nfsd4_check_conflicting_opens(struct nfs4_client *clp, }
static struct nfs4_delegation * -nfs4_set_delegation(struct nfs4_client *clp, struct svc_fh *fh, +nfs4_set_delegation(struct nfs4_client *clp, struct nfs4_file *fp, struct nfs4_clnt_odstate *odstate) { int status = 0; @@ -5352,7 +5351,7 @@ nfs4_set_delegation(struct nfs4_client *clp, struct svc_fh *fh, return ERR_PTR(status);
status = -ENOMEM; - dp = alloc_init_deleg(clp, fp, fh, odstate); + dp = alloc_init_deleg(clp, fp, odstate); if (!dp) goto out_delegees;
@@ -5420,8 +5419,7 @@ static void nfsd4_open_deleg_none_ext(struct nfsd4_open *open, int status) * proper support for them. */ static void -nfs4_open_delegation(struct svc_fh *fh, struct nfsd4_open *open, - struct nfs4_ol_stateid *stp) +nfs4_open_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp) { struct nfs4_delegation *dp; struct nfs4_openowner *oo = openowner(stp->st_stateowner); @@ -5453,7 +5451,7 @@ nfs4_open_delegation(struct svc_fh *fh, struct nfsd4_open *open, default: goto out_no_deleg; } - dp = nfs4_set_delegation(clp, fh, stp->st_stid.sc_file, stp->st_clnt_odstate); + dp = nfs4_set_delegation(clp, stp->st_stid.sc_file, stp->st_clnt_odstate); if (IS_ERR(dp)) goto out_no_deleg;
@@ -5585,7 +5583,7 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf * Attempt to hand out a delegation. No error return, because the * OPEN succeeds even if we fail. */ - nfs4_open_delegation(current_fh, open, stp); + nfs4_open_delegation(open, stp); nodeleg: status = nfs_ok; trace_nfsd_open(&stp->st_stid.sc_stateid);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 876c553cb41026cb6ad3cef970a35e5f69c42a25 ]
Between opening a file and setting a delegation on it, someone could rename or unlink the dentry. If this happens, we do not want to grant a delegation on the open.
On a CLAIM_NULL open, we're opening by filename, and we may (in the non-create case) or may not (in the create case) be holding i_rwsem when attempting to set a delegation. The latter case allows a race.
After getting a lease, redo the lookup of the file being opened and validate that the resulting dentry matches the one in the open file description.
To properly redo the lookup we need an rqst pointer to pass to nfsd_lookup_dentry(), so make sure that is available.
Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 1 + fs/nfsd/nfs4state.c | 54 ++++++++++++++++++++++++++++++++++++++++----- fs/nfsd/xdr4.h | 1 + 3 files changed, 51 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index fdde7eca8d438..7685bbe9d78dd 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -547,6 +547,7 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, open->op_openowner);
open->op_filp = NULL; + open->op_rqstp = rqstp;
/* This check required by spec. */ if (open->op_create && open->op_claim_type != NFS4_OPEN_CLAIM_NULL) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 194d8aeb1fd46..b3ab112695837 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -5305,11 +5305,44 @@ static int nfsd4_check_conflicting_opens(struct nfs4_client *clp, return 0; }
+/* + * It's possible that between opening the dentry and setting the delegation, + * that it has been renamed or unlinked. Redo the lookup to verify that this + * hasn't happened. + */ +static int +nfsd4_verify_deleg_dentry(struct nfsd4_open *open, struct nfs4_file *fp, + struct svc_fh *parent) +{ + struct svc_export *exp; + struct dentry *child; + __be32 err; + + /* parent may already be locked, and it may get unlocked by + * this call, but that is safe. + */ + err = nfsd_lookup_dentry(open->op_rqstp, parent, + open->op_fname, open->op_fnamelen, + &exp, &child); + + if (err) + return -EAGAIN; + + dput(child); + if (child != file_dentry(fp->fi_deleg_file->nf_file)) + return -EAGAIN; + + return 0; +} + static struct nfs4_delegation * -nfs4_set_delegation(struct nfs4_client *clp, - struct nfs4_file *fp, struct nfs4_clnt_odstate *odstate) +nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp, + struct svc_fh *parent) { int status = 0; + struct nfs4_client *clp = stp->st_stid.sc_client; + struct nfs4_file *fp = stp->st_stid.sc_file; + struct nfs4_clnt_odstate *odstate = stp->st_clnt_odstate; struct nfs4_delegation *dp; struct nfsd_file *nf; struct file_lock *fl; @@ -5364,6 +5397,13 @@ nfs4_set_delegation(struct nfs4_client *clp, locks_free_lock(fl); if (status) goto out_clnt_odstate; + + if (parent) { + status = nfsd4_verify_deleg_dentry(open, fp, parent); + if (status) + goto out_unlock; + } + status = nfsd4_check_conflicting_opens(clp, fp); if (status) goto out_unlock; @@ -5419,11 +5459,13 @@ static void nfsd4_open_deleg_none_ext(struct nfsd4_open *open, int status) * proper support for them. */ static void -nfs4_open_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp) +nfs4_open_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp, + struct svc_fh *currentfh) { struct nfs4_delegation *dp; struct nfs4_openowner *oo = openowner(stp->st_stateowner); struct nfs4_client *clp = stp->st_stid.sc_client; + struct svc_fh *parent = NULL; int cb_up; int status = 0;
@@ -5437,6 +5479,8 @@ nfs4_open_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp) goto out_no_deleg; break; case NFS4_OPEN_CLAIM_NULL: + parent = currentfh; + fallthrough; case NFS4_OPEN_CLAIM_FH: /* * Let's not give out any delegations till everyone's @@ -5451,7 +5495,7 @@ nfs4_open_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp) default: goto out_no_deleg; } - dp = nfs4_set_delegation(clp, stp->st_stid.sc_file, stp->st_clnt_odstate); + dp = nfs4_set_delegation(open, stp, parent); if (IS_ERR(dp)) goto out_no_deleg;
@@ -5583,7 +5627,7 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf * Attempt to hand out a delegation. No error return, because the * OPEN succeeds even if we fail. */ - nfs4_open_delegation(open, stp); + nfs4_open_delegation(open, stp, &resp->cstate.current_fh); nodeleg: status = nfs_ok; trace_nfsd_open(&stp->st_stid.sc_stateid); diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 3b9e60249aea9..14b87141de343 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -279,6 +279,7 @@ struct nfsd4_open { struct nfs4_clnt_odstate *op_odstate; /* used during processing */ struct nfs4_acl *op_acl; struct xdr_netobj op_label; + struct svc_rqst *op_rqstp; };
struct nfsd4_open_confirm {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit 7fe2a71dda349a1afa75781f0cc7975be9784d15 ]
The attributes that nfsd might want to set on a file include 'struct iattr' as well as an ACL and security label. The latter two are passed around quite separately from the first, in part because they are only needed for NFSv4. This leads to some clumsiness in the code, such as the attributes NOT being set in nfsd_create_setattr().
We need to keep the directory locked until all attributes are set to ensure the file is never visibile without all its attributes. This need combined with the inconsistent handling of attributes leads to more clumsiness.
As a first step towards tidying this up, introduce 'struct nfsd_attrs'. This is passed (by reference) to vfs.c functions that work with attributes, and is assembled by the various nfs*proc functions which call them. As yet only iattr is included, but future patches will expand this.
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3proc.c | 20 ++++++++++++++++---- fs/nfsd/nfs4proc.c | 23 ++++++++++++++++------- fs/nfsd/nfs4state.c | 5 ++++- fs/nfsd/nfsproc.c | 17 +++++++++++++---- fs/nfsd/vfs.c | 24 ++++++++++++++---------- fs/nfsd/vfs.h | 12 ++++++++---- 6 files changed, 71 insertions(+), 30 deletions(-)
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index e314545bbdb2e..7b81d871f0d3c 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -67,12 +67,15 @@ nfsd3_proc_setattr(struct svc_rqst *rqstp) { struct nfsd3_sattrargs *argp = rqstp->rq_argp; struct nfsd3_attrstat *resp = rqstp->rq_resp; + struct nfsd_attrs attrs = { + .na_iattr = &argp->attrs, + };
dprintk("nfsd: SETATTR(3) %s\n", SVCFH_fmt(&argp->fh));
fh_copy(&resp->fh, &argp->fh); - resp->status = nfsd_setattr(rqstp, &resp->fh, &argp->attrs, + resp->status = nfsd_setattr(rqstp, &resp->fh, &attrs, argp->check_guard, argp->guardtime); return rpc_success; } @@ -233,6 +236,9 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp, { struct iattr *iap = &argp->attrs; struct dentry *parent, *child; + struct nfsd_attrs attrs = { + .na_iattr = iap, + }; __u32 v_mtime, v_atime; struct inode *inode; __be32 status; @@ -331,7 +337,7 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp, }
set_attr: - status = nfsd_create_setattr(rqstp, fhp, resfhp, iap); + status = nfsd_create_setattr(rqstp, fhp, resfhp, &attrs);
out: fh_unlock(fhp); @@ -368,6 +374,9 @@ nfsd3_proc_mkdir(struct svc_rqst *rqstp) { struct nfsd3_createargs *argp = rqstp->rq_argp; struct nfsd3_diropres *resp = rqstp->rq_resp; + struct nfsd_attrs attrs = { + .na_iattr = &argp->attrs, + };
dprintk("nfsd: MKDIR(3) %s %.*s\n", SVCFH_fmt(&argp->fh), @@ -378,7 +387,7 @@ nfsd3_proc_mkdir(struct svc_rqst *rqstp) fh_copy(&resp->dirfh, &argp->fh); fh_init(&resp->fh, NFS3_FHSIZE); resp->status = nfsd_create(rqstp, &resp->dirfh, argp->name, argp->len, - &argp->attrs, S_IFDIR, 0, &resp->fh); + &attrs, S_IFDIR, 0, &resp->fh); fh_unlock(&resp->dirfh); return rpc_success; } @@ -428,6 +437,9 @@ nfsd3_proc_mknod(struct svc_rqst *rqstp) { struct nfsd3_mknodargs *argp = rqstp->rq_argp; struct nfsd3_diropres *resp = rqstp->rq_resp; + struct nfsd_attrs attrs = { + .na_iattr = &argp->attrs, + }; int type; dev_t rdev = 0;
@@ -453,7 +465,7 @@ nfsd3_proc_mknod(struct svc_rqst *rqstp)
type = nfs3_ftypes[argp->ftype]; resp->status = nfsd_create(rqstp, &resp->dirfh, argp->name, argp->len, - &argp->attrs, type, rdev, &resp->fh); + &attrs, type, rdev, &resp->fh); fh_unlock(&resp->dirfh); out: return rpc_success; diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 7685bbe9d78dd..f8a157e4bc708 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -286,6 +286,9 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp, struct svc_fh *resfhp, struct nfsd4_open *open) { struct iattr *iap = &open->op_iattr; + struct nfsd_attrs attrs = { + .na_iattr = iap, + }; struct dentry *parent, *child; __u32 v_mtime, v_atime; struct inode *inode; @@ -404,7 +407,7 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp, }
set_attr: - status = nfsd_create_setattr(rqstp, fhp, resfhp, iap); + status = nfsd_create_setattr(rqstp, fhp, resfhp, &attrs);
out: fh_unlock(fhp); @@ -787,6 +790,9 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, union nfsd4_op_u *u) { struct nfsd4_create *create = &u->create; + struct nfsd_attrs attrs = { + .na_iattr = &create->cr_iattr, + }; struct svc_fh resfh; __be32 status; dev_t rdev; @@ -818,7 +824,7 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, goto out_umask; status = nfsd_create(rqstp, &cstate->current_fh, create->cr_name, create->cr_namelen, - &create->cr_iattr, S_IFBLK, rdev, &resfh); + &attrs, S_IFBLK, rdev, &resfh); break;
case NF4CHR: @@ -829,26 +835,26 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, goto out_umask; status = nfsd_create(rqstp, &cstate->current_fh, create->cr_name, create->cr_namelen, - &create->cr_iattr, S_IFCHR, rdev, &resfh); + &attrs, S_IFCHR, rdev, &resfh); break;
case NF4SOCK: status = nfsd_create(rqstp, &cstate->current_fh, create->cr_name, create->cr_namelen, - &create->cr_iattr, S_IFSOCK, 0, &resfh); + &attrs, S_IFSOCK, 0, &resfh); break;
case NF4FIFO: status = nfsd_create(rqstp, &cstate->current_fh, create->cr_name, create->cr_namelen, - &create->cr_iattr, S_IFIFO, 0, &resfh); + &attrs, S_IFIFO, 0, &resfh); break;
case NF4DIR: create->cr_iattr.ia_valid &= ~ATTR_SIZE; status = nfsd_create(rqstp, &cstate->current_fh, create->cr_name, create->cr_namelen, - &create->cr_iattr, S_IFDIR, 0, &resfh); + &attrs, S_IFDIR, 0, &resfh); break;
default: @@ -1142,6 +1148,9 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, union nfsd4_op_u *u) { struct nfsd4_setattr *setattr = &u->setattr; + struct nfsd_attrs attrs = { + .na_iattr = &setattr->sa_iattr, + }; __be32 status = nfs_ok; int err;
@@ -1174,7 +1183,7 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, &setattr->sa_label); if (status) goto out; - status = nfsd_setattr(rqstp, &cstate->current_fh, &setattr->sa_iattr, + status = nfsd_setattr(rqstp, &cstate->current_fh, &attrs, 0, (time64_t)0); out: fh_drop_write(&cstate->current_fh); diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index b3ab112695837..2b6c9f1b9de88 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -5077,11 +5077,14 @@ nfsd4_truncate(struct svc_rqst *rqstp, struct svc_fh *fh, .ia_valid = ATTR_SIZE, .ia_size = 0, }; + struct nfsd_attrs attrs = { + .na_iattr = &iattr, + }; if (!open->op_truncate) return 0; if (!(open->op_share_access & NFS4_SHARE_ACCESS_WRITE)) return nfserr_inval; - return nfsd_setattr(rqstp, fh, &iattr, 0, (time64_t)0); + return nfsd_setattr(rqstp, fh, &attrs, 0, (time64_t)0); }
static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp, diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index b8ddf21e70ea0..f061f229d5ff0 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -51,6 +51,9 @@ nfsd_proc_setattr(struct svc_rqst *rqstp) struct nfsd_sattrargs *argp = rqstp->rq_argp; struct nfsd_attrstat *resp = rqstp->rq_resp; struct iattr *iap = &argp->attrs; + struct nfsd_attrs attrs = { + .na_iattr = iap, + }; struct svc_fh *fhp;
dprintk("nfsd: SETATTR %s, valid=%x, size=%ld\n", @@ -100,7 +103,7 @@ nfsd_proc_setattr(struct svc_rqst *rqstp) } }
- resp->status = nfsd_setattr(rqstp, fhp, iap, 0, (time64_t)0); + resp->status = nfsd_setattr(rqstp, fhp, &attrs, 0, (time64_t)0); if (resp->status != nfs_ok) goto out;
@@ -260,6 +263,9 @@ nfsd_proc_create(struct svc_rqst *rqstp) svc_fh *dirfhp = &argp->fh; svc_fh *newfhp = &resp->fh; struct iattr *attr = &argp->attrs; + struct nfsd_attrs attrs = { + .na_iattr = attr, + }; struct inode *inode; struct dentry *dchild; int type, mode; @@ -385,7 +391,7 @@ nfsd_proc_create(struct svc_rqst *rqstp) if (!inode) { /* File doesn't exist. Create it and set attrs */ resp->status = nfsd_create_locked(rqstp, dirfhp, argp->name, - argp->len, attr, type, rdev, + argp->len, &attrs, type, rdev, newfhp); } else if (type == S_IFREG) { dprintk("nfsd: existing %s, valid=%x, size=%ld\n", @@ -396,7 +402,7 @@ nfsd_proc_create(struct svc_rqst *rqstp) */ attr->ia_valid &= ATTR_SIZE; if (attr->ia_valid) - resp->status = nfsd_setattr(rqstp, newfhp, attr, 0, + resp->status = nfsd_setattr(rqstp, newfhp, &attrs, 0, (time64_t)0); }
@@ -511,6 +517,9 @@ nfsd_proc_mkdir(struct svc_rqst *rqstp) { struct nfsd_createargs *argp = rqstp->rq_argp; struct nfsd_diropres *resp = rqstp->rq_resp; + struct nfsd_attrs attrs = { + .na_iattr = &argp->attrs, + };
dprintk("nfsd: MKDIR %s %.*s\n", SVCFH_fmt(&argp->fh), argp->len, argp->name);
@@ -522,7 +531,7 @@ nfsd_proc_mkdir(struct svc_rqst *rqstp) argp->attrs.ia_valid &= ~ATTR_SIZE; fh_init(&resp->fh, NFS_FHSIZE); resp->status = nfsd_create(rqstp, &argp->fh, argp->name, argp->len, - &argp->attrs, S_IFDIR, 0, &resp->fh); + &attrs, S_IFDIR, 0, &resp->fh); fh_put(&argp->fh); if (resp->status != nfs_ok) goto out; diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 5c8dc1a05e57e..c86c3a8e42329 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -362,11 +362,13 @@ nfsd_get_write_access(struct svc_rqst *rqstp, struct svc_fh *fhp, * Set various file attributes. After this call fhp needs an fh_put. */ __be32 -nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, struct iattr *iap, +nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, + struct nfsd_attrs *attr, int check_guard, time64_t guardtime) { struct dentry *dentry; struct inode *inode; + struct iattr *iap = attr->na_iattr; int accmode = NFSD_MAY_SATTR; umode_t ftype = 0; __be32 err; @@ -1221,14 +1223,15 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, u64 offset, * @rqstp: RPC transaction being executed * @fhp: NFS filehandle of parent directory * @resfhp: NFS filehandle of new object - * @iap: requested attributes of new object + * @attrs: requested attributes of new object * * Returns nfs_ok on success, or an nfsstat in network byte order. */ __be32 nfsd_create_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, - struct svc_fh *resfhp, struct iattr *iap) + struct svc_fh *resfhp, struct nfsd_attrs *attrs) { + struct iattr *iap = attrs->na_iattr; __be32 status;
/* @@ -1249,7 +1252,7 @@ nfsd_create_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, * if the attributes have not changed. */ if (iap->ia_valid) - status = nfsd_setattr(rqstp, resfhp, iap, 0, (time64_t)0); + status = nfsd_setattr(rqstp, resfhp, attrs, 0, (time64_t)0); else status = nfserrno(commit_metadata(resfhp));
@@ -1288,11 +1291,12 @@ nfsd_check_ignore_resizing(struct iattr *iap) /* The parent directory should already be locked: */ __be32 nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp, - char *fname, int flen, struct iattr *iap, - int type, dev_t rdev, struct svc_fh *resfhp) + char *fname, int flen, struct nfsd_attrs *attrs, + int type, dev_t rdev, struct svc_fh *resfhp) { struct dentry *dentry, *dchild; struct inode *dirp; + struct iattr *iap = attrs->na_iattr; __be32 err; int host_err;
@@ -1365,7 +1369,7 @@ nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp, if (host_err < 0) goto out_nfserr;
- err = nfsd_create_setattr(rqstp, fhp, resfhp, iap); + err = nfsd_create_setattr(rqstp, fhp, resfhp, attrs);
out: dput(dchild); @@ -1384,8 +1388,8 @@ nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp, */ __be32 nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp, - char *fname, int flen, struct iattr *iap, - int type, dev_t rdev, struct svc_fh *resfhp) + char *fname, int flen, struct nfsd_attrs *attrs, + int type, dev_t rdev, struct svc_fh *resfhp) { struct dentry *dentry, *dchild = NULL; __be32 err; @@ -1417,7 +1421,7 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp, dput(dchild); if (err) return err; - return nfsd_create_locked(rqstp, fhp, fname, flen, iap, type, + return nfsd_create_locked(rqstp, fhp, fname, flen, attrs, type, rdev, resfhp); }
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h index 26347d76f44a0..d8b1a36fca956 100644 --- a/fs/nfsd/vfs.h +++ b/fs/nfsd/vfs.h @@ -42,6 +42,10 @@ struct nfsd_file; typedef int (*nfsd_filldir_t)(void *, const char *, int, loff_t, u64, unsigned);
/* nfsd/vfs.c */ +struct nfsd_attrs { + struct iattr *na_iattr; /* input */ +}; + int nfsd_cross_mnt(struct svc_rqst *rqstp, struct dentry **dpp, struct svc_export **expp); __be32 nfsd_lookup(struct svc_rqst *, struct svc_fh *, @@ -50,7 +54,7 @@ __be32 nfsd_lookup_dentry(struct svc_rqst *, struct svc_fh *, const char *, unsigned int, struct svc_export **, struct dentry **); __be32 nfsd_setattr(struct svc_rqst *, struct svc_fh *, - struct iattr *, int, time64_t); + struct nfsd_attrs *, int, time64_t); int nfsd_mountpoint(struct dentry *, struct svc_export *); #ifdef CONFIG_NFSD_V4 __be32 nfsd4_set_nfs4_label(struct svc_rqst *, struct svc_fh *, @@ -63,14 +67,14 @@ __be32 nfsd4_clone_file_range(struct svc_rqst *rqstp, u64 count, bool sync); #endif /* CONFIG_NFSD_V4 */ __be32 nfsd_create_locked(struct svc_rqst *, struct svc_fh *, - char *name, int len, struct iattr *attrs, + char *name, int len, struct nfsd_attrs *attrs, int type, dev_t rdev, struct svc_fh *res); __be32 nfsd_create(struct svc_rqst *, struct svc_fh *, - char *name, int len, struct iattr *attrs, + char *name, int len, struct nfsd_attrs *attrs, int type, dev_t rdev, struct svc_fh *res); __be32 nfsd_access(struct svc_rqst *, struct svc_fh *, u32 *, u32 *); __be32 nfsd_create_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, - struct svc_fh *resfhp, struct iattr *iap); + struct svc_fh *resfhp, struct nfsd_attrs *iap); __be32 nfsd_commit(struct svc_rqst *rqst, struct svc_fh *fhp, u64 offset, u32 count, __be32 *verf); #ifdef CONFIG_NFSD_V4
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit 93adc1e391a761441d783828b93979b38093d011 ]
The NFS protocol includes attributes when creating symlinks. Linux does store attributes for symlinks and allows them to be set, though they are not used for permission checking.
NFSD currently doesn't set standard (struct iattr) attributes when creating symlinks, but for NFSv4 it does set ACLs and security labels. This is inconsistent.
To improve consistency, pass the provided attributes into nfsd_symlink() and call nfsd_create_setattr() to set them.
NOTE: this results in a behaviour change for all NFS versions when the client sends non-default attributes with a SYMLINK request. With the Linux client, the only attributes are: attr.ia_mode = S_IFLNK | S_IRWXUGO; attr.ia_valid = ATTR_MODE; so the final outcome will be unchanged. Other clients might sent different attributes, and if they did they probably expect them to be honoured.
We ignore any error from nfsd_create_setattr(). It isn't really clear what should be done if a file is successfully created, but the attributes cannot be set. NFS doesn't allow partial success to be reported. Reporting failure is probably more misleading than reporting success, so the status is ignored.
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3proc.c | 5 ++++- fs/nfsd/nfs4proc.c | 2 +- fs/nfsd/nfsproc.c | 5 ++++- fs/nfsd/vfs.c | 25 ++++++++++++++++++------- fs/nfsd/vfs.h | 5 +++-- 5 files changed, 30 insertions(+), 12 deletions(-)
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 7b81d871f0d3c..394f6fb201974 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -397,6 +397,9 @@ nfsd3_proc_symlink(struct svc_rqst *rqstp) { struct nfsd3_symlinkargs *argp = rqstp->rq_argp; struct nfsd3_diropres *resp = rqstp->rq_resp; + struct nfsd_attrs attrs = { + .na_iattr = &argp->attrs, + };
if (argp->tlen == 0) { resp->status = nfserr_inval; @@ -423,7 +426,7 @@ nfsd3_proc_symlink(struct svc_rqst *rqstp) fh_copy(&resp->dirfh, &argp->ffh); fh_init(&resp->fh, NFS3_FHSIZE); resp->status = nfsd_symlink(rqstp, &resp->dirfh, argp->fname, - argp->flen, argp->tname, &resp->fh); + argp->flen, argp->tname, &attrs, &resp->fh); kfree(argp->tname); out: return rpc_success; diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index f8a157e4bc708..43387a8f10d06 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -813,7 +813,7 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, case NF4LNK: status = nfsd_symlink(rqstp, &cstate->current_fh, create->cr_name, create->cr_namelen, - create->cr_data, &resfh); + create->cr_data, &attrs, &resfh); break;
case NF4BLK: diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index f061f229d5ff0..180b84b6597b0 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -478,6 +478,9 @@ nfsd_proc_symlink(struct svc_rqst *rqstp) { struct nfsd_symlinkargs *argp = rqstp->rq_argp; struct nfsd_stat *resp = rqstp->rq_resp; + struct nfsd_attrs attrs = { + .na_iattr = &argp->attrs, + }; struct svc_fh newfh;
if (argp->tlen > NFS_MAXPATHLEN) { @@ -499,7 +502,7 @@ nfsd_proc_symlink(struct svc_rqst *rqstp)
fh_init(&newfh, NFS_FHSIZE); resp->status = nfsd_symlink(rqstp, &argp->ffh, argp->fname, argp->flen, - argp->tname, &newfh); + argp->tname, &attrs, &newfh);
kfree(argp->tname); fh_put(&argp->ffh); diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index c86c3a8e42329..f5a1f41cddfff 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1463,15 +1463,25 @@ nfsd_readlink(struct svc_rqst *rqstp, struct svc_fh *fhp, char *buf, int *lenp) return 0; }
-/* - * Create a symlink and look up its inode +/** + * nfsd_symlink - Create a symlink and look up its inode + * @rqstp: RPC transaction being executed + * @fhp: NFS filehandle of parent directory + * @fname: filename of the new symlink + * @flen: length of @fname + * @path: content of the new symlink (NUL-terminated) + * @attrs: requested attributes of new object + * @resfhp: NFS filehandle of new object + * * N.B. After this call _both_ fhp and resfhp need an fh_put + * + * Returns nfs_ok on success, or an nfsstat in network byte order. */ __be32 nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp, - char *fname, int flen, - char *path, - struct svc_fh *resfhp) + char *fname, int flen, + char *path, struct nfsd_attrs *attrs, + struct svc_fh *resfhp) { struct dentry *dentry, *dnew; __be32 err, cerr; @@ -1501,13 +1511,14 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
host_err = vfs_symlink(d_inode(dentry), dnew, path); err = nfserrno(host_err); + cerr = fh_compose(resfhp, fhp->fh_export, dnew, fhp); + if (!err) + nfsd_create_setattr(rqstp, fhp, resfhp, attrs); fh_unlock(fhp); if (!err) err = nfserrno(commit_metadata(fhp)); - fh_drop_write(fhp);
- cerr = fh_compose(resfhp, fhp->fh_export, dnew, fhp); dput(dnew); if (err==0) err = cerr; out: diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h index d8b1a36fca956..5047cec4c423c 100644 --- a/fs/nfsd/vfs.h +++ b/fs/nfsd/vfs.h @@ -114,8 +114,9 @@ __be32 nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, __be32 nfsd_readlink(struct svc_rqst *, struct svc_fh *, char *, int *); __be32 nfsd_symlink(struct svc_rqst *, struct svc_fh *, - char *name, int len, char *path, - struct svc_fh *res); + char *name, int len, char *path, + struct nfsd_attrs *attrs, + struct svc_fh *res); __be32 nfsd_link(struct svc_rqst *, struct svc_fh *, char *, int, struct svc_fh *); ssize_t nfsd_copy_file_range(struct file *, u64,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit d6a97d3f589a3a46a16183e03f3774daee251317 ]
nfsd_setattr() now sets a security label if provided, and nfsv4 provides it in the 'open' and 'create' paths and the 'setattr' path. If setting the label failed (including because the kernel doesn't support labels), an error field in 'struct nfsd_attrs' is set, and the caller can respond. The open/create callers clear FATTR4_WORD2_SECURITY_LABEL in the returned attr set in this case. The setattr caller returns the error.
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 49 +++++++++------------------------------------- fs/nfsd/vfs.c | 29 +++------------------------ fs/nfsd/vfs.h | 5 +++-- 3 files changed, 15 insertions(+), 68 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 43387a8f10d06..cfcc463968b70 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -64,36 +64,6 @@ MODULE_PARM_DESC(nfsd4_ssc_umount_timeout, "idle msecs before unmount export from source server"); #endif
-#ifdef CONFIG_NFSD_V4_SECURITY_LABEL -#include <linux/security.h> - -static inline void -nfsd4_security_inode_setsecctx(struct svc_fh *resfh, struct xdr_netobj *label, u32 *bmval) -{ - struct inode *inode = d_inode(resfh->fh_dentry); - int status; - - inode_lock(inode); - status = security_inode_setsecctx(resfh->fh_dentry, - label->data, label->len); - inode_unlock(inode); - - if (status) - /* - * XXX: We should really fail the whole open, but we may - * already have created a new file, so it may be too - * late. For now this seems the least of evils: - */ - bmval[2] &= ~FATTR4_WORD2_SECURITY_LABEL; - - return; -} -#else -static inline void -nfsd4_security_inode_setsecctx(struct svc_fh *resfh, struct xdr_netobj *label, u32 *bmval) -{ } -#endif - #define NFSDDBG_FACILITY NFSDDBG_PROC
static u32 nfsd_attrmask[] = { @@ -288,6 +258,7 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp, struct iattr *iap = &open->op_iattr; struct nfsd_attrs attrs = { .na_iattr = iap, + .na_seclabel = &open->op_label, }; struct dentry *parent, *child; __u32 v_mtime, v_atime; @@ -409,6 +380,8 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp, set_attr: status = nfsd_create_setattr(rqstp, fhp, resfhp, &attrs);
+ if (attrs.na_labelerr) + open->op_bmval[2] &= ~FATTR4_WORD2_SECURITY_LABEL; out: fh_unlock(fhp); if (child && !IS_ERR(child)) @@ -450,9 +423,6 @@ do_open_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, stru status = nfsd4_create_file(rqstp, current_fh, *resfh, open); current->fs->umask = 0;
- if (!status && open->op_label.len) - nfsd4_security_inode_setsecctx(*resfh, &open->op_label, open->op_bmval); - /* * Following rfc 3530 14.2.16, and rfc 5661 18.16.4 * use the returned bitmask to indicate which attributes @@ -792,6 +762,7 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, struct nfsd4_create *create = &u->create; struct nfsd_attrs attrs = { .na_iattr = &create->cr_iattr, + .na_seclabel = &create->cr_label, }; struct svc_fh resfh; __be32 status; @@ -864,8 +835,8 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (status) goto out;
- if (create->cr_label.len) - nfsd4_security_inode_setsecctx(&resfh, &create->cr_label, create->cr_bmval); + if (attrs.na_labelerr) + create->cr_bmval[2] &= ~FATTR4_WORD2_SECURITY_LABEL;
if (create->cr_acl != NULL) do_set_nfs4_acl(rqstp, &resfh, create->cr_acl, @@ -1150,6 +1121,7 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, struct nfsd4_setattr *setattr = &u->setattr; struct nfsd_attrs attrs = { .na_iattr = &setattr->sa_iattr, + .na_seclabel = &setattr->sa_label, }; __be32 status = nfs_ok; int err; @@ -1178,13 +1150,10 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, setattr->sa_acl); if (status) goto out; - if (setattr->sa_label.len) - status = nfsd4_set_nfs4_label(rqstp, &cstate->current_fh, - &setattr->sa_label); - if (status) - goto out; status = nfsd_setattr(rqstp, &cstate->current_fh, &attrs, 0, (time64_t)0); + if (!status) + status = nfserrno(attrs.na_labelerr); out: fh_drop_write(&cstate->current_fh); return status; diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index f5a1f41cddfff..01c431ed90ecf 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -471,6 +471,9 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, host_err = notify_change(dentry, iap, NULL);
out_unlock: + if (attr->na_seclabel && attr->na_seclabel->len) + attr->na_labelerr = security_inode_setsecctx(dentry, + attr->na_seclabel->data, attr->na_seclabel->len); fh_unlock(fhp); if (size_change) put_write_access(inode); @@ -508,32 +511,6 @@ int nfsd4_is_junction(struct dentry *dentry) return 0; return 1; } -#ifdef CONFIG_NFSD_V4_SECURITY_LABEL -__be32 nfsd4_set_nfs4_label(struct svc_rqst *rqstp, struct svc_fh *fhp, - struct xdr_netobj *label) -{ - __be32 error; - int host_error; - struct dentry *dentry; - - error = fh_verify(rqstp, fhp, 0 /* S_IFREG */, NFSD_MAY_SATTR); - if (error) - return error; - - dentry = fhp->fh_dentry; - - inode_lock(d_inode(dentry)); - host_error = security_inode_setsecctx(dentry, label->data, label->len); - inode_unlock(d_inode(dentry)); - return nfserrno(host_error); -} -#else -__be32 nfsd4_set_nfs4_label(struct svc_rqst *rqstp, struct svc_fh *fhp, - struct xdr_netobj *label) -{ - return nfserr_notsupp; -} -#endif
static struct nfsd4_compound_state *nfsd4_get_cstate(struct svc_rqst *rqstp) { diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h index 5047cec4c423c..d5d4cfe37c933 100644 --- a/fs/nfsd/vfs.h +++ b/fs/nfsd/vfs.h @@ -44,6 +44,9 @@ typedef int (*nfsd_filldir_t)(void *, const char *, int, loff_t, u64, unsigned); /* nfsd/vfs.c */ struct nfsd_attrs { struct iattr *na_iattr; /* input */ + struct xdr_netobj *na_seclabel; /* input */ + + int na_labelerr; /* output */ };
int nfsd_cross_mnt(struct svc_rqst *rqstp, struct dentry **dpp, @@ -57,8 +60,6 @@ __be32 nfsd_setattr(struct svc_rqst *, struct svc_fh *, struct nfsd_attrs *, int, time64_t); int nfsd_mountpoint(struct dentry *, struct svc_export *); #ifdef CONFIG_NFSD_V4 -__be32 nfsd4_set_nfs4_label(struct svc_rqst *, struct svc_fh *, - struct xdr_netobj *); __be32 nfsd4_vfs_fallocate(struct svc_rqst *, struct svc_fh *, struct file *, loff_t, loff_t, int); __be32 nfsd4_clone_file_range(struct svc_rqst *rqstp,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit c0cbe70742f4a70893cd6e5f6b10b6e89b6db95b ]
pacl and dpacl pointers are added to struct nfsd_attrs, which requires that we have an nfsd_attrs_free() function to free them. Those nfsv4 functions that can set ACLs now set up these pointers based on the passed in NFSv4 ACL.
nfsd_setattr() sets the acls as appropriate.
Errors are handled as with security labels.
Signed-off-by: NeilBrown neilb@suse.de [ cel: backported to 5.10.y, prior to idmapped mounts ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/acl.h | 6 ++++-- fs/nfsd/nfs4acl.c | 45 +++++++-------------------------------------- fs/nfsd/nfs4proc.c | 46 ++++++++++++++++------------------------------ fs/nfsd/vfs.c | 7 +++++++ fs/nfsd/vfs.h | 11 +++++++++++ 5 files changed, 45 insertions(+), 70 deletions(-)
diff --git a/fs/nfsd/acl.h b/fs/nfsd/acl.h index ba14d2f4b64f4..4b7324458a94e 100644 --- a/fs/nfsd/acl.h +++ b/fs/nfsd/acl.h @@ -38,6 +38,8 @@ struct nfs4_acl; struct svc_fh; struct svc_rqst; +struct nfsd_attrs; +enum nfs_ftype4;
int nfs4_acl_bytes(int entries); int nfs4_acl_get_whotype(char *, u32); @@ -45,7 +47,7 @@ __be32 nfs4_acl_write_who(struct xdr_stream *xdr, int who);
int nfsd4_get_nfs4_acl(struct svc_rqst *rqstp, struct dentry *dentry, struct nfs4_acl **acl); -__be32 nfsd4_set_nfs4_acl(struct svc_rqst *rqstp, struct svc_fh *fhp, - struct nfs4_acl *acl); +__be32 nfsd4_acl_to_attr(enum nfs_ftype4 type, struct nfs4_acl *acl, + struct nfsd_attrs *attr);
#endif /* LINUX_NFS4_ACL_H */ diff --git a/fs/nfsd/nfs4acl.c b/fs/nfsd/nfs4acl.c index 71292a0d6f092..bb8e2f6d7d03c 100644 --- a/fs/nfsd/nfs4acl.c +++ b/fs/nfsd/nfs4acl.c @@ -751,57 +751,26 @@ static int nfs4_acl_nfsv4_to_posix(struct nfs4_acl *acl, return ret; }
-__be32 -nfsd4_set_nfs4_acl(struct svc_rqst *rqstp, struct svc_fh *fhp, - struct nfs4_acl *acl) +__be32 nfsd4_acl_to_attr(enum nfs_ftype4 type, struct nfs4_acl *acl, + struct nfsd_attrs *attr) { - __be32 error; int host_error; - struct dentry *dentry; - struct inode *inode; - struct posix_acl *pacl = NULL, *dpacl = NULL; unsigned int flags = 0;
- /* Get inode */ - error = fh_verify(rqstp, fhp, 0, NFSD_MAY_SATTR); - if (error) - return error; - - dentry = fhp->fh_dentry; - inode = d_inode(dentry); + if (!acl) + return nfs_ok;
- if (S_ISDIR(inode->i_mode)) + if (type == NF4DIR) flags = NFS4_ACL_DIR;
- host_error = nfs4_acl_nfsv4_to_posix(acl, &pacl, &dpacl, flags); + host_error = nfs4_acl_nfsv4_to_posix(acl, &attr->na_pacl, + &attr->na_dpacl, flags); if (host_error == -EINVAL) return nfserr_attrnotsupp; - if (host_error < 0) - goto out_nfserr; - - fh_lock(fhp); - - host_error = set_posix_acl(inode, ACL_TYPE_ACCESS, pacl); - if (host_error < 0) - goto out_drop_lock; - - if (S_ISDIR(inode->i_mode)) { - host_error = set_posix_acl(inode, ACL_TYPE_DEFAULT, dpacl); - } - -out_drop_lock: - fh_unlock(fhp); - - posix_acl_release(pacl); - posix_acl_release(dpacl); -out_nfserr: - if (host_error == -EOPNOTSUPP) - return nfserr_attrnotsupp; else return nfserrno(host_error); }
- static short ace2type(struct nfs4_ace *ace) { diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index cfcc463968b70..507a2aa967133 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -128,26 +128,6 @@ is_create_with_attrs(struct nfsd4_open *open) || open->op_createmode == NFS4_CREATE_EXCLUSIVE4_1); }
-/* - * if error occurs when setting the acl, just clear the acl bit - * in the returned attr bitmap. - */ -static void -do_set_nfs4_acl(struct svc_rqst *rqstp, struct svc_fh *fhp, - struct nfs4_acl *acl, u32 *bmval) -{ - __be32 status; - - status = nfsd4_set_nfs4_acl(rqstp, fhp, acl); - if (status) - /* - * We should probably fail the whole open at this point, - * but we've already created the file, so it's too late; - * So this seems the least of evils: - */ - bmval[0] &= ~FATTR4_WORD0_ACL; -} - static inline void fh_dup2(struct svc_fh *dst, struct svc_fh *src) { @@ -281,6 +261,9 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp, if (host_err) return nfserrno(host_err);
+ if (is_create_with_attrs(open)) + nfsd4_acl_to_attr(NF4REG, open->op_acl, &attrs); + fh_lock_nested(fhp, I_MUTEX_PARENT);
child = lookup_one_len(open->op_fname, parent, open->op_fnamelen); @@ -382,8 +365,11 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
if (attrs.na_labelerr) open->op_bmval[2] &= ~FATTR4_WORD2_SECURITY_LABEL; + if (attrs.na_aclerr) + open->op_bmval[0] &= ~FATTR4_WORD0_ACL; out: fh_unlock(fhp); + nfsd_attrs_free(&attrs); if (child && !IS_ERR(child)) dput(child); fh_drop_write(fhp); @@ -446,9 +432,6 @@ do_open_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, stru if (status) goto out;
- if (is_create_with_attrs(open) && open->op_acl != NULL) - do_set_nfs4_acl(rqstp, *resfh, open->op_acl, open->op_bmval); - nfsd4_set_open_owner_reply_cache(cstate, open, *resfh); accmode = NFSD_MAY_NOP; if (open->op_created || @@ -779,6 +762,7 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (status) return status;
+ status = nfsd4_acl_to_attr(create->cr_type, create->cr_acl, &attrs); current->fs->umask = create->cr_umask; switch (create->cr_type) { case NF4LNK: @@ -837,10 +821,8 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
if (attrs.na_labelerr) create->cr_bmval[2] &= ~FATTR4_WORD2_SECURITY_LABEL; - - if (create->cr_acl != NULL) - do_set_nfs4_acl(rqstp, &resfh, create->cr_acl, - create->cr_bmval); + if (attrs.na_aclerr) + create->cr_bmval[0] &= ~FATTR4_WORD0_ACL;
fh_unlock(&cstate->current_fh); set_change_info(&create->cr_cinfo, &cstate->current_fh); @@ -849,6 +831,7 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, fh_put(&resfh); out_umask: current->fs->umask = 0; + nfsd_attrs_free(&attrs); return status; }
@@ -1123,6 +1106,7 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, .na_iattr = &setattr->sa_iattr, .na_seclabel = &setattr->sa_label, }; + struct inode *inode; __be32 status = nfs_ok; int err;
@@ -1145,9 +1129,10 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (status) goto out;
- if (setattr->sa_acl != NULL) - status = nfsd4_set_nfs4_acl(rqstp, &cstate->current_fh, - setattr->sa_acl); + inode = cstate->current_fh.fh_dentry->d_inode; + status = nfsd4_acl_to_attr(S_ISDIR(inode->i_mode) ? NF4DIR : NF4REG, + setattr->sa_acl, &attrs); + if (status) goto out; status = nfsd_setattr(rqstp, &cstate->current_fh, &attrs, @@ -1155,6 +1140,7 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (!status) status = nfserrno(attrs.na_labelerr); out: + nfsd_attrs_free(&attrs); fh_drop_write(&cstate->current_fh); return status; } diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 01c431ed90ecf..bed542ba8ad8e 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -474,6 +474,13 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, if (attr->na_seclabel && attr->na_seclabel->len) attr->na_labelerr = security_inode_setsecctx(dentry, attr->na_seclabel->data, attr->na_seclabel->len); + if (IS_ENABLED(CONFIG_FS_POSIX_ACL) && attr->na_pacl) + attr->na_aclerr = set_posix_acl(inode, ACL_TYPE_ACCESS, + attr->na_pacl); + if (IS_ENABLED(CONFIG_FS_POSIX_ACL) && + !attr->na_aclerr && attr->na_dpacl && S_ISDIR(inode->i_mode)) + attr->na_aclerr = set_posix_acl(inode, ACL_TYPE_DEFAULT, + attr->na_dpacl); fh_unlock(fhp); if (size_change) put_write_access(inode); diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h index d5d4cfe37c933..c95cd414b4bb0 100644 --- a/fs/nfsd/vfs.h +++ b/fs/nfsd/vfs.h @@ -6,6 +6,8 @@ #ifndef LINUX_NFSD_VFS_H #define LINUX_NFSD_VFS_H
+#include <linux/fs.h> +#include <linux/posix_acl.h> #include "nfsfh.h" #include "nfsd.h"
@@ -45,10 +47,19 @@ typedef int (*nfsd_filldir_t)(void *, const char *, int, loff_t, u64, unsigned); struct nfsd_attrs { struct iattr *na_iattr; /* input */ struct xdr_netobj *na_seclabel; /* input */ + struct posix_acl *na_pacl; /* input */ + struct posix_acl *na_dpacl; /* input */
int na_labelerr; /* output */ + int na_aclerr; /* output */ };
+static inline void nfsd_attrs_free(struct nfsd_attrs *attrs) +{ + posix_acl_release(attrs->na_pacl); + posix_acl_release(attrs->na_dpacl); +} + int nfsd_cross_mnt(struct svc_rqst *rqstp, struct dentry **dpp, struct svc_export **expp); __be32 nfsd_lookup(struct svc_rqst *, struct svc_fh *,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit 927bfc5600cd6333c9ef9f090f19e66b7d4c8ee1 ]
nfsd_create() usually returns with the directory still locked. nfsd_symlink() usually returns with it unlocked. This is clumsy.
Until recently nfsd_create() needed to keep the directory locked until ACLs and security label had been set. These are now set inside nfsd_create() (in nfsd_setattr()) so this need is gone.
So change nfsd_create() and nfsd_symlink() to always unlock, and remove any fh_unlock() calls that follow calls to these functions.
Signed-off-by: NeilBrown neilb@suse.de [ cel: backported to 5.10.y, prior to idmapped mounts ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3proc.c | 2 -- fs/nfsd/nfs4proc.c | 2 -- fs/nfsd/vfs.c | 38 +++++++++++++++++++++----------------- 3 files changed, 21 insertions(+), 21 deletions(-)
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 394f6fb201974..c71f0c359f7ae 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -388,7 +388,6 @@ nfsd3_proc_mkdir(struct svc_rqst *rqstp) fh_init(&resp->fh, NFS3_FHSIZE); resp->status = nfsd_create(rqstp, &resp->dirfh, argp->name, argp->len, &attrs, S_IFDIR, 0, &resp->fh); - fh_unlock(&resp->dirfh); return rpc_success; }
@@ -469,7 +468,6 @@ nfsd3_proc_mknod(struct svc_rqst *rqstp) type = nfs3_ftypes[argp->ftype]; resp->status = nfsd_create(rqstp, &resp->dirfh, argp->name, argp->len, &attrs, type, rdev, &resp->fh); - fh_unlock(&resp->dirfh); out: return rpc_success; } diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 507a2aa967133..0396fa2c1cfe7 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -823,8 +823,6 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, create->cr_bmval[2] &= ~FATTR4_WORD2_SECURITY_LABEL; if (attrs.na_aclerr) create->cr_bmval[0] &= ~FATTR4_WORD0_ACL; - - fh_unlock(&cstate->current_fh); set_change_info(&create->cr_cinfo, &cstate->current_fh); fh_dup2(&cstate->current_fh, &resfh); out: diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index bed542ba8ad8e..3d794533bbd4b 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1395,8 +1395,10 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp, fh_lock_nested(fhp, I_MUTEX_PARENT); dchild = lookup_one_len(fname, dentry, flen); host_err = PTR_ERR(dchild); - if (IS_ERR(dchild)) - return nfserrno(host_err); + if (IS_ERR(dchild)) { + err = nfserrno(host_err); + goto out_unlock; + } err = fh_compose(resfhp, fhp->fh_export, dchild, fhp); /* * We unconditionally drop our ref to dchild as fh_compose will have @@ -1404,9 +1406,12 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp, */ dput(dchild); if (err) - return err; - return nfsd_create_locked(rqstp, fhp, fname, flen, attrs, type, - rdev, resfhp); + goto out_unlock; + err = nfsd_create_locked(rqstp, fhp, fname, flen, attrs, type, + rdev, resfhp); +out_unlock: + fh_unlock(fhp); + return err; }
/* @@ -1483,16 +1488,19 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp, goto out;
host_err = fh_want_write(fhp); - if (host_err) - goto out_nfserr; + if (host_err) { + err = nfserrno(host_err); + goto out; + }
fh_lock(fhp); dentry = fhp->fh_dentry; dnew = lookup_one_len(fname, dentry, flen); - host_err = PTR_ERR(dnew); - if (IS_ERR(dnew)) - goto out_nfserr; - + if (IS_ERR(dnew)) { + err = nfserrno(PTR_ERR(dnew)); + fh_unlock(fhp); + goto out_drop_write; + } host_err = vfs_symlink(d_inode(dentry), dnew, path); err = nfserrno(host_err); cerr = fh_compose(resfhp, fhp->fh_export, dnew, fhp); @@ -1501,16 +1509,12 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp, fh_unlock(fhp); if (!err) err = nfserrno(commit_metadata(fhp)); - fh_drop_write(fhp); - dput(dnew); if (err==0) err = cerr; +out_drop_write: + fh_drop_write(fhp); out: return err; - -out_nfserr: - err = nfserrno(host_err); - goto out; }
/*
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit b677c0c63a135a916493c064906582e9f3ed4802 ]
Some error paths in nfsd_unlink() allow it to exit without unlocking the directory. This is not a problem in practice as the directory will be locked with an fh_put(), but it is untidy and potentially confusing.
This allows us to remove all the fh_unlock() calls that are immediately after nfsd_unlink() calls.
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3proc.c | 2 -- fs/nfsd/nfs4proc.c | 4 +--- fs/nfsd/vfs.c | 7 +++++-- 3 files changed, 6 insertions(+), 7 deletions(-)
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index c71f0c359f7ae..d78dbb214c5c3 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -490,7 +490,6 @@ nfsd3_proc_remove(struct svc_rqst *rqstp) fh_copy(&resp->fh, &argp->fh); resp->status = nfsd_unlink(rqstp, &resp->fh, -S_IFDIR, argp->name, argp->len); - fh_unlock(&resp->fh); return rpc_success; }
@@ -511,7 +510,6 @@ nfsd3_proc_rmdir(struct svc_rqst *rqstp) fh_copy(&resp->fh, &argp->fh); resp->status = nfsd_unlink(rqstp, &resp->fh, S_IFDIR, argp->name, argp->len); - fh_unlock(&resp->fh); return rpc_success; }
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 0396fa2c1cfe7..f588e592a0703 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1002,10 +1002,8 @@ nfsd4_remove(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, return nfserr_grace; status = nfsd_unlink(rqstp, &cstate->current_fh, 0, remove->rm_name, remove->rm_namelen); - if (!status) { - fh_unlock(&cstate->current_fh); + if (!status) set_change_info(&remove->rm_cinfo, &cstate->current_fh); - } return status; }
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 3d794533bbd4b..6b8e1826956bf 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1767,12 +1767,12 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type, rdentry = lookup_one_len(fname, dentry, flen); host_err = PTR_ERR(rdentry); if (IS_ERR(rdentry)) - goto out_drop_write; + goto out_unlock;
if (d_really_is_negative(rdentry)) { dput(rdentry); host_err = -ENOENT; - goto out_drop_write; + goto out_unlock; } rinode = d_inode(rdentry); ihold(rinode); @@ -1810,6 +1810,9 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type, } out: return err; +out_unlock: + fh_unlock(fhp); + goto out_drop_write; }
/*
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit e18bcb33bc5b69bccc2b532075aa00bb49cc01c5 ]
On non-error paths, nfsd_link() calls fh_unlock() twice. This is safe because fh_unlock() records that the unlock has been done and doesn't repeat it. However it makes the code a little confusing and interferes with changes that are planned for directory locking.
So rearrange the code to ensure fh_unlock() is called exactly once if fh_lock() was called.
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/vfs.c | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 6b8e1826956bf..dd945099ae6b5 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1557,9 +1557,10 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp, dirp = d_inode(ddir);
dnew = lookup_one_len(name, ddir, len); - host_err = PTR_ERR(dnew); - if (IS_ERR(dnew)) - goto out_nfserr; + if (IS_ERR(dnew)) { + err = nfserrno(PTR_ERR(dnew)); + goto out_unlock; + }
dold = tfhp->fh_dentry;
@@ -1578,17 +1579,17 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp, else err = nfserrno(host_err); } -out_dput: dput(dnew); -out_unlock: - fh_unlock(ffhp); +out_drop_write: fh_drop_write(tfhp); out: return err;
-out_nfserr: - err = nfserrno(host_err); - goto out_unlock; +out_dput: + dput(dnew); +out_unlock: + fh_unlock(ffhp); + goto out_drop_write; }
static void
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit 19d008b46941b8c668402170522e0f7a9258409c ]
nfsd_lookup() takes an exclusive lock on the parent inode, but no callers want the lock and it may not be needed at all if the result is in the dcache.
Change nfsd_lookup_dentry() to not take the lock, and call lookup_one_len_locked() which takes lock only if needed.
nfsd4_open() currently expects the lock to still be held, but that isn't necessary as nfsd_validate_delegated_dentry() provides required guarantees without the lock.
NOTE: NFSv4 requires directory changeinfo for OPEN even when a create wasn't requested and no change happened. Now that nfsd_lookup() doesn't use fh_lock(), we need to explicitly fill the attributes when no create happens. A new fh_fill_both_attrs() is provided for that task.
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 20 ++++++++++++-------- fs/nfsd/nfs4state.c | 3 --- fs/nfsd/nfsfh.c | 19 +++++++++++++++++++ fs/nfsd/nfsfh.h | 2 +- fs/nfsd/vfs.c | 34 ++++++++++++++-------------------- 5 files changed, 46 insertions(+), 32 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index f588e592a0703..9cf4298817c4b 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -302,6 +302,11 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp, if (d_really_is_positive(child)) { status = nfs_ok;
+ /* NFSv4 protocol requires change attributes even though + * no change happened. + */ + fh_fill_both_attrs(fhp); + switch (open->op_createmode) { case NFS4_CREATE_UNCHECKED: if (!d_is_reg(child)) @@ -417,15 +422,15 @@ do_open_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, stru if (nfsd4_create_is_exclusive(open->op_createmode) && status == 0) open->op_bmval[1] |= (FATTR4_WORD1_TIME_ACCESS | FATTR4_WORD1_TIME_MODIFY); - } else - /* - * Note this may exit with the parent still locked. - * We will hold the lock until nfsd4_open's final - * lookup, to prevent renames or unlinks until we've had - * a chance to an acquire a delegation if appropriate. - */ + } else { status = nfsd_lookup(rqstp, current_fh, open->op_fname, open->op_fnamelen, *resfh); + if (!status) + /* NFSv4 protocol requires change attributes even though + * no change happened. + */ + fh_fill_both_attrs(current_fh); + } if (status) goto out; status = nfsd_check_obj_isreg(*resfh); @@ -1043,7 +1048,6 @@ nfsd4_secinfo(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, &exp, &dentry); if (err) return err; - fh_unlock(&cstate->current_fh); if (d_really_is_negative(dentry)) { exp_put(exp); err = nfserr_noent; diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 2b6c9f1b9de88..e44d9c8d5065a 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -5321,9 +5321,6 @@ nfsd4_verify_deleg_dentry(struct nfsd4_open *open, struct nfs4_file *fp, struct dentry *child; __be32 err;
- /* parent may already be locked, and it may get unlocked by - * this call, but that is safe. - */ err = nfsd_lookup_dentry(open->op_rqstp, parent, open->op_fname, open->op_fnamelen, &exp, &child); diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c index d4ae838948ba5..cc680deecafa7 100644 --- a/fs/nfsd/nfsfh.c +++ b/fs/nfsd/nfsfh.c @@ -670,6 +670,25 @@ void fh_fill_post_attrs(struct svc_fh *fhp) nfsd4_change_attribute(&fhp->fh_post_attr, inode); }
+/** + * fh_fill_both_attrs - Fill pre-op and post-op attributes + * @fhp: file handle to be updated + * + * This is used when the directory wasn't changed, but wcc attributes + * are needed anyway. + */ +void fh_fill_both_attrs(struct svc_fh *fhp) +{ + fh_fill_post_attrs(fhp); + if (!fhp->fh_post_saved) + return; + fhp->fh_pre_change = fhp->fh_post_change; + fhp->fh_pre_mtime = fhp->fh_post_attr.mtime; + fhp->fh_pre_ctime = fhp->fh_post_attr.ctime; + fhp->fh_pre_size = fhp->fh_post_attr.size; + fhp->fh_pre_saved = true; +} + /* * Release a file handle. */ diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h index fb9d358a267e5..28a4f9a94e2c8 100644 --- a/fs/nfsd/nfsfh.h +++ b/fs/nfsd/nfsfh.h @@ -322,7 +322,7 @@ static inline u64 nfsd4_change_attribute(struct kstat *stat,
extern void fh_fill_pre_attrs(struct svc_fh *fhp); extern void fh_fill_post_attrs(struct svc_fh *fhp); - +extern void fh_fill_both_attrs(struct svc_fh *fhp);
/* * Lock a file handle/inode diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index dd945099ae6b5..03f6dd2ec653b 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -198,27 +198,13 @@ nfsd_lookup_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp, goto out_nfserr; } } else { - /* - * In the nfsd4_open() case, this may be held across - * subsequent open and delegation acquisition which may - * need to take the child's i_mutex: - */ - fh_lock_nested(fhp, I_MUTEX_PARENT); - dentry = lookup_one_len(name, dparent, len); + dentry = lookup_one_len_unlocked(name, dparent, len); host_err = PTR_ERR(dentry); if (IS_ERR(dentry)) goto out_nfserr; if (nfsd_mountpoint(dentry, exp)) { - /* - * We don't need the i_mutex after all. It's - * still possible we could open this (regular - * files can be mountpoints too), but the - * i_mutex is just there to prevent renames of - * something that we might be about to delegate, - * and a mountpoint won't be renamed: - */ - fh_unlock(fhp); - if ((host_err = nfsd_cross_mnt(rqstp, &dentry, &exp))) { + host_err = nfsd_cross_mnt(rqstp, &dentry, &exp); + if (host_err) { dput(dentry); goto out_nfserr; } @@ -233,7 +219,15 @@ nfsd_lookup_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp, return nfserrno(host_err); }
-/* +/** + * nfsd_lookup - look up a single path component for nfsd + * + * @rqstp: the request context + * @fhp: the file handle of the directory + * @name: the component name, or %NULL to look up parent + * @len: length of name to examine + * @resfh: pointer to pre-initialised filehandle to hold result. + * * Look up one component of a pathname. * N.B. After this call _both_ fhp and resfh need an fh_put * @@ -243,11 +237,11 @@ nfsd_lookup_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp, * returned. Otherwise the covered directory is returned. * NOTE: this mountpoint crossing is not supported properly by all * clients and is explicitly disallowed for NFSv3 - * NeilBrown neilb@cse.unsw.edu.au + * */ __be32 nfsd_lookup(struct svc_rqst *rqstp, struct svc_fh *fhp, const char *name, - unsigned int len, struct svc_fh *resfh) + unsigned int len, struct svc_fh *resfh) { struct svc_export *exp; struct dentry *dentry;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit debf16f0c671cb8db154a9ebcd6014cfff683b80 ]
When creating or unlinking a name in a directory use explicit inode_lock_nested() instead of fh_lock(), and explicit calls to fh_fill_pre_attrs() and fh_fill_post_attrs(). This is already done for renames, with lock_rename() as the explicit locking.
Also move the 'fill' calls closer to the operation that might change the attributes. This way they are avoided on some error paths.
For the v2-only code in nfsproc.c, the fill calls are not replaced as they aren't needed.
Making the locking explicit will simplify proposed future changes to locking for directories. It also makes it easily visible exactly where pre/post attributes are used - not all callers of fh_lock() actually need the pre/post attributes.
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: NeilBrown neilb@suse.de [ cel: backported to 5.10.y, prior to idmapped mounts ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3proc.c | 6 ++++-- fs/nfsd/nfs4proc.c | 6 ++++-- fs/nfsd/nfsproc.c | 5 ++--- fs/nfsd/vfs.c | 36 ++++++++++++++++++++++++++---------- 4 files changed, 36 insertions(+), 17 deletions(-)
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index d78dbb214c5c3..1a52f0b06ec32 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -260,7 +260,7 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp, if (host_err) return nfserrno(host_err);
- fh_lock_nested(fhp, I_MUTEX_PARENT); + inode_lock_nested(inode, I_MUTEX_PARENT);
child = lookup_one_len(argp->name, parent, argp->len); if (IS_ERR(child)) { @@ -318,11 +318,13 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp, if (!IS_POSIXACL(inode)) iap->ia_mode &= ~current_umask();
+ fh_fill_pre_attrs(fhp); host_err = vfs_create(inode, child, iap->ia_mode, true); if (host_err < 0) { status = nfserrno(host_err); goto out; } + fh_fill_post_attrs(fhp);
/* A newly created file already has a file size of zero. */ if ((iap->ia_valid & ATTR_SIZE) && (iap->ia_size == 0)) @@ -340,7 +342,7 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp, status = nfsd_create_setattr(rqstp, fhp, resfhp, &attrs);
out: - fh_unlock(fhp); + inode_unlock(inode); if (child && !IS_ERR(child)) dput(child); fh_drop_write(fhp); diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 9cf4298817c4b..193b84a0f3a59 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -264,7 +264,7 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp, if (is_create_with_attrs(open)) nfsd4_acl_to_attr(NF4REG, open->op_acl, &attrs);
- fh_lock_nested(fhp, I_MUTEX_PARENT); + inode_lock_nested(inode, I_MUTEX_PARENT);
child = lookup_one_len(open->op_fname, parent, open->op_fnamelen); if (IS_ERR(child)) { @@ -348,10 +348,12 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp, if (!IS_POSIXACL(inode)) iap->ia_mode &= ~current_umask();
+ fh_fill_pre_attrs(fhp); status = nfsd4_vfs_create(fhp, child, open); if (status != nfs_ok) goto out; open->op_created = true; + fh_fill_post_attrs(fhp);
/* A newly created file already has a file size of zero. */ if ((iap->ia_valid & ATTR_SIZE) && (iap->ia_size == 0)) @@ -373,7 +375,7 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp, if (attrs.na_aclerr) open->op_bmval[0] &= ~FATTR4_WORD0_ACL; out: - fh_unlock(fhp); + inode_unlock(inode); nfsd_attrs_free(&attrs); if (child && !IS_ERR(child)) dput(child); diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 180b84b6597b0..e533550a26db5 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -291,7 +291,7 @@ nfsd_proc_create(struct svc_rqst *rqstp) goto done; }
- fh_lock_nested(dirfhp, I_MUTEX_PARENT); + inode_lock_nested(dirfhp->fh_dentry->d_inode, I_MUTEX_PARENT); dchild = lookup_one_len(argp->name, dirfhp->fh_dentry, argp->len); if (IS_ERR(dchild)) { resp->status = nfserrno(PTR_ERR(dchild)); @@ -407,8 +407,7 @@ nfsd_proc_create(struct svc_rqst *rqstp) }
out_unlock: - /* We don't really need to unlock, as fh_put does it. */ - fh_unlock(dirfhp); + inode_unlock(dirfhp->fh_dentry->d_inode); fh_drop_write(dirfhp); done: fh_put(dirfhp); diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 03f6dd2ec653b..3364e562b00e5 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1386,7 +1386,7 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp, if (host_err) return nfserrno(host_err);
- fh_lock_nested(fhp, I_MUTEX_PARENT); + inode_lock_nested(dentry->d_inode, I_MUTEX_PARENT); dchild = lookup_one_len(fname, dentry, flen); host_err = PTR_ERR(dchild); if (IS_ERR(dchild)) { @@ -1401,10 +1401,12 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp, dput(dchild); if (err) goto out_unlock; + fh_fill_pre_attrs(fhp); err = nfsd_create_locked(rqstp, fhp, fname, flen, attrs, type, rdev, resfhp); + fh_fill_post_attrs(fhp); out_unlock: - fh_unlock(fhp); + inode_unlock(dentry->d_inode); return err; }
@@ -1487,20 +1489,22 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp, goto out; }
- fh_lock(fhp); dentry = fhp->fh_dentry; + inode_lock_nested(dentry->d_inode, I_MUTEX_PARENT); dnew = lookup_one_len(fname, dentry, flen); if (IS_ERR(dnew)) { err = nfserrno(PTR_ERR(dnew)); - fh_unlock(fhp); + inode_unlock(dentry->d_inode); goto out_drop_write; } + fh_fill_pre_attrs(fhp); host_err = vfs_symlink(d_inode(dentry), dnew, path); err = nfserrno(host_err); cerr = fh_compose(resfhp, fhp->fh_export, dnew, fhp); if (!err) nfsd_create_setattr(rqstp, fhp, resfhp, attrs); - fh_unlock(fhp); + fh_fill_post_attrs(fhp); + inode_unlock(dentry->d_inode); if (!err) err = nfserrno(commit_metadata(fhp)); dput(dnew); @@ -1546,9 +1550,9 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp, goto out; }
- fh_lock_nested(ffhp, I_MUTEX_PARENT); ddir = ffhp->fh_dentry; dirp = d_inode(ddir); + inode_lock_nested(dirp, I_MUTEX_PARENT);
dnew = lookup_one_len(name, ddir, len); if (IS_ERR(dnew)) { @@ -1561,8 +1565,18 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp, err = nfserr_noent; if (d_really_is_negative(dold)) goto out_dput; +<<<<<<< current host_err = vfs_link(dold, dirp, dnew, NULL); fh_unlock(ffhp); +||||||| constructed merge base + host_err = vfs_link(dold, &init_user_ns, dirp, dnew, NULL); + fh_unlock(ffhp); +======= + fh_fill_pre_attrs(ffhp); + host_err = vfs_link(dold, &init_user_ns, dirp, dnew, NULL); + fh_fill_post_attrs(ffhp); + inode_unlock(dirp); +>>>>>>> patched if (!host_err) { err = nfserrno(commit_metadata(ffhp)); if (!err) @@ -1582,7 +1596,7 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp, out_dput: dput(dnew); out_unlock: - fh_unlock(ffhp); + inode_unlock(dirp); goto out_drop_write; }
@@ -1755,9 +1769,9 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type, if (host_err) goto out_nfserr;
- fh_lock_nested(fhp, I_MUTEX_PARENT); dentry = fhp->fh_dentry; dirp = d_inode(dentry); + inode_lock_nested(dirp, I_MUTEX_PARENT);
rdentry = lookup_one_len(fname, dentry, flen); host_err = PTR_ERR(rdentry); @@ -1775,6 +1789,7 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type, if (!type) type = d_inode(rdentry)->i_mode & S_IFMT;
+ fh_fill_pre_attrs(fhp); if (type != S_IFDIR) { if (rdentry->d_sb->s_export_op->flags & EXPORT_OP_CLOSE_BEFORE_UNLINK) nfsd_close_cached_files(rdentry); @@ -1782,8 +1797,9 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type, } else { host_err = vfs_rmdir(dirp, rdentry); } + fh_fill_post_attrs(fhp);
- fh_unlock(fhp); + inode_unlock(dirp); if (!host_err) host_err = commit_metadata(fhp); dput(rdentry); @@ -1806,7 +1822,7 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type, out: return err; out_unlock: - fh_unlock(fhp); + inode_unlock(dirp); goto out_drop_write; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit bb4d53d66e4b8c8b8e5634802262e53851a2d2db ]
When locking a file to access ACLs and xattrs etc, use explicit locking with inode_lock() instead of fh_lock(). This means that the calls to fh_fill_pre/post_attr() are also explicit which improves readability and allows us to place them only where they are needed. Only the xattr calls need pre/post information.
When locking a file we don't need I_MUTEX_PARENT as the file is not a parent of anything, so we can use inode_lock() directly rather than the inode_lock_nested() call that fh_lock() uses.
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: NeilBrown neilb@suse.de [ cel: backported to 5.10.y, prior to idmapped mounts ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs2acl.c | 6 +++--- fs/nfsd/nfs3acl.c | 4 ++-- fs/nfsd/nfs4state.c | 9 +++++---- fs/nfsd/vfs.c | 44 +++++++++++++++++++++----------------------- 4 files changed, 31 insertions(+), 32 deletions(-)
diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index 03703b22c81ef..f7d166f056afa 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -111,7 +111,7 @@ static __be32 nfsacld_proc_setacl(struct svc_rqst *rqstp) if (error) goto out_errno;
- fh_lock(fh); + inode_lock(inode);
error = set_posix_acl(inode, ACL_TYPE_ACCESS, argp->acl_access); if (error) @@ -120,7 +120,7 @@ static __be32 nfsacld_proc_setacl(struct svc_rqst *rqstp) if (error) goto out_drop_lock;
- fh_unlock(fh); + inode_unlock(inode);
fh_drop_write(fh);
@@ -134,7 +134,7 @@ static __be32 nfsacld_proc_setacl(struct svc_rqst *rqstp) return rpc_success;
out_drop_lock: - fh_unlock(fh); + inode_unlock(inode); fh_drop_write(fh); out_errno: resp->status = nfserrno(error); diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c index 350fae92ae045..15bee0339c764 100644 --- a/fs/nfsd/nfs3acl.c +++ b/fs/nfsd/nfs3acl.c @@ -101,7 +101,7 @@ static __be32 nfsd3_proc_setacl(struct svc_rqst *rqstp) if (error) goto out_errno;
- fh_lock(fh); + inode_lock(inode);
error = set_posix_acl(inode, ACL_TYPE_ACCESS, argp->acl_access); if (error) @@ -109,7 +109,7 @@ static __be32 nfsd3_proc_setacl(struct svc_rqst *rqstp) error = set_posix_acl(inode, ACL_TYPE_DEFAULT, argp->acl_default);
out_drop_lock: - fh_unlock(fh); + inode_unlock(inode); fh_drop_write(fh); out_errno: resp->status = nfserrno(error); diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index e44d9c8d5065a..7a4baa41b5362 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -7429,21 +7429,22 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, static __be32 nfsd_test_lock(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file_lock *lock) { struct nfsd_file *nf; + struct inode *inode; __be32 err;
err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf); if (err) return err; - fh_lock(fhp); /* to block new leases till after test_lock: */ - err = nfserrno(nfsd_open_break_lease(fhp->fh_dentry->d_inode, - NFSD_MAY_READ)); + inode = fhp->fh_dentry->d_inode; + inode_lock(inode); /* to block new leases till after test_lock: */ + err = nfserrno(nfsd_open_break_lease(inode, NFSD_MAY_READ)); if (err) goto out; lock->fl_file = nf->nf_file; err = nfserrno(vfs_test_lock(nf->nf_file, lock)); lock->fl_file = NULL; out: - fh_unlock(fhp); + inode_unlock(inode); nfsd_file_put(nf); return err; } diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 3364e562b00e5..504a3ddfaf75b 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -429,7 +429,7 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, return err; }
- fh_lock(fhp); + inode_lock(inode); if (size_change) { /* * RFC5661, Section 18.30.4: @@ -475,7 +475,7 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, !attr->na_aclerr && attr->na_dpacl && S_ISDIR(inode->i_mode)) attr->na_aclerr = set_posix_acl(inode, ACL_TYPE_DEFAULT, attr->na_dpacl); - fh_unlock(fhp); + inode_unlock(inode); if (size_change) put_write_access(inode); out: @@ -1565,18 +1565,10 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp, err = nfserr_noent; if (d_really_is_negative(dold)) goto out_dput; -<<<<<<< current - host_err = vfs_link(dold, dirp, dnew, NULL); - fh_unlock(ffhp); -||||||| constructed merge base - host_err = vfs_link(dold, &init_user_ns, dirp, dnew, NULL); - fh_unlock(ffhp); -======= fh_fill_pre_attrs(ffhp); - host_err = vfs_link(dold, &init_user_ns, dirp, dnew, NULL); + host_err = vfs_link(dold, dirp, dnew, NULL); fh_fill_post_attrs(ffhp); inode_unlock(dirp); ->>>>>>> patched if (!host_err) { err = nfserrno(commit_metadata(ffhp)); if (!err) @@ -2177,13 +2169,16 @@ nfsd_listxattr(struct svc_rqst *rqstp, struct svc_fh *fhp, char **bufp, return err; }
-/* - * Removexattr and setxattr need to call fh_lock to both lock the inode - * and set the change attribute. Since the top-level vfs_removexattr - * and vfs_setxattr calls already do their own inode_lock calls, call - * the _locked variant. Pass in a NULL pointer for delegated_inode, - * and let the client deal with NFS4ERR_DELAY (same as with e.g. - * setattr and remove). +/** + * nfsd_removexattr - Remove an extended attribute + * @rqstp: RPC transaction being executed + * @fhp: NFS filehandle of object with xattr to remove + * @name: name of xattr to remove (NUL-terminate) + * + * Pass in a NULL pointer for delegated_inode, and let the client deal + * with NFS4ERR_DELAY (same as with e.g. setattr and remove). + * + * Returns nfs_ok on success, or an nfsstat in network byte order. */ __be32 nfsd_removexattr(struct svc_rqst *rqstp, struct svc_fh *fhp, char *name) @@ -2199,11 +2194,13 @@ nfsd_removexattr(struct svc_rqst *rqstp, struct svc_fh *fhp, char *name) if (ret) return nfserrno(ret);
- fh_lock(fhp); + inode_lock(fhp->fh_dentry->d_inode); + fh_fill_pre_attrs(fhp);
ret = __vfs_removexattr_locked(fhp->fh_dentry, name, NULL);
- fh_unlock(fhp); + fh_fill_post_attrs(fhp); + inode_unlock(fhp->fh_dentry->d_inode); fh_drop_write(fhp);
return nfsd_xattr_errno(ret); @@ -2223,12 +2220,13 @@ nfsd_setxattr(struct svc_rqst *rqstp, struct svc_fh *fhp, char *name, ret = fh_want_write(fhp); if (ret) return nfserrno(ret); - fh_lock(fhp); + inode_lock(fhp->fh_dentry->d_inode); + fh_fill_pre_attrs(fhp);
ret = __vfs_setxattr_locked(fhp->fh_dentry, name, buf, len, flags, NULL); - - fh_unlock(fhp); + fh_fill_post_attrs(fhp); + inode_unlock(fhp->fh_dentry->d_inode); fh_drop_write(fhp);
return nfsd_xattr_errno(ret);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit dd8dd403d7b223cc77ee89d8d09caf045e90e648 ]
As all inode locking is now fully balanced, fh_put() does not need to call fh_unlock(). fh_lock() and fh_unlock() are no longer used, so discard them. These are the only real users of ->fh_locked, so discard that too.
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsfh.c | 3 +-- fs/nfsd/nfsfh.h | 56 ++++--------------------------------------------- fs/nfsd/vfs.c | 17 +-------------- 3 files changed, 6 insertions(+), 70 deletions(-)
diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c index cc680deecafa7..db8d62632a5be 100644 --- a/fs/nfsd/nfsfh.c +++ b/fs/nfsd/nfsfh.c @@ -547,7 +547,7 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry, if (ref_fh == fhp) fh_put(ref_fh);
- if (fhp->fh_locked || fhp->fh_dentry) { + if (fhp->fh_dentry) { printk(KERN_ERR "fh_compose: fh %pd2 not initialized!\n", dentry); } @@ -698,7 +698,6 @@ fh_put(struct svc_fh *fhp) struct dentry * dentry = fhp->fh_dentry; struct svc_export * exp = fhp->fh_export; if (dentry) { - fh_unlock(fhp); fhp->fh_dentry = NULL; dput(dentry); fh_clear_pre_post_attrs(fhp); diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h index 28a4f9a94e2c8..c3ae6414fc5cf 100644 --- a/fs/nfsd/nfsfh.h +++ b/fs/nfsd/nfsfh.h @@ -81,7 +81,6 @@ typedef struct svc_fh { struct dentry * fh_dentry; /* validated dentry */ struct svc_export * fh_export; /* export pointer */
- bool fh_locked; /* inode locked by us */ bool fh_want_write; /* remount protection taken */ bool fh_no_wcc; /* no wcc data needed */ bool fh_no_atomic_attr; @@ -93,7 +92,7 @@ typedef struct svc_fh { bool fh_post_saved; /* post-op attrs saved */ bool fh_pre_saved; /* pre-op attrs saved */
- /* Pre-op attributes saved during fh_lock */ + /* Pre-op attributes saved when inode is locked */ __u64 fh_pre_size; /* size before operation */ struct timespec64 fh_pre_mtime; /* mtime before oper */ struct timespec64 fh_pre_ctime; /* ctime before oper */ @@ -103,7 +102,7 @@ typedef struct svc_fh { */ u64 fh_pre_change;
- /* Post-op attributes saved in fh_unlock */ + /* Post-op attributes saved in fh_fill_post_attrs() */ struct kstat fh_post_attr; /* full attrs after operation */ u64 fh_post_change; /* nfsv4 change; see above */ } svc_fh; @@ -223,8 +222,8 @@ void fh_put(struct svc_fh *); static __inline__ struct svc_fh * fh_copy(struct svc_fh *dst, struct svc_fh *src) { - WARN_ON(src->fh_dentry || src->fh_locked); - + WARN_ON(src->fh_dentry); + *dst = *src; return dst; } @@ -323,51 +322,4 @@ static inline u64 nfsd4_change_attribute(struct kstat *stat, extern void fh_fill_pre_attrs(struct svc_fh *fhp); extern void fh_fill_post_attrs(struct svc_fh *fhp); extern void fh_fill_both_attrs(struct svc_fh *fhp); - -/* - * Lock a file handle/inode - * NOTE: both fh_lock and fh_unlock are done "by hand" in - * vfs.c:nfsd_rename as it needs to grab 2 i_mutex's at once - * so, any changes here should be reflected there. - */ - -static inline void -fh_lock_nested(struct svc_fh *fhp, unsigned int subclass) -{ - struct dentry *dentry = fhp->fh_dentry; - struct inode *inode; - - BUG_ON(!dentry); - - if (fhp->fh_locked) { - printk(KERN_WARNING "fh_lock: %pd2 already locked!\n", - dentry); - return; - } - - inode = d_inode(dentry); - inode_lock_nested(inode, subclass); - fh_fill_pre_attrs(fhp); - fhp->fh_locked = true; -} - -static inline void -fh_lock(struct svc_fh *fhp) -{ - fh_lock_nested(fhp, I_MUTEX_NORMAL); -} - -/* - * Unlock a file handle/inode - */ -static inline void -fh_unlock(struct svc_fh *fhp) -{ - if (fhp->fh_locked) { - fh_fill_post_attrs(fhp); - inode_unlock(d_inode(fhp->fh_dentry)); - fhp->fh_locked = false; - } -} - #endif /* _LINUX_NFSD_NFSFH_H */ diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 504a3ddfaf75b..5a7fee4ee2079 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1282,13 +1282,6 @@ nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp, dirp = d_inode(dentry);
dchild = dget(resfhp->fh_dentry); - if (!fhp->fh_locked) { - WARN_ONCE(1, "nfsd_create: parent %pd2 not locked!\n", - dentry); - err = nfserr_io; - goto out; - } - err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE); if (err) goto out; @@ -1656,10 +1649,7 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen, goto out; }
- /* cannot use fh_lock as we need deadlock protective ordering - * so do it by hand */ trap = lock_rename(tdentry, fdentry); - ffhp->fh_locked = tfhp->fh_locked = true; fh_fill_pre_attrs(ffhp); fh_fill_pre_attrs(tfhp);
@@ -1707,17 +1697,12 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen, dput(odentry); out_nfserr: err = nfserrno(host_err); - /* - * We cannot rely on fh_unlock on the two filehandles, - * as that would do the wrong thing if the two directories - * were the same, so again we do it by hand. - */ + if (!close_cached) { fh_fill_post_attrs(ffhp); fh_fill_post_attrs(tfhp); } unlock_rename(tdentry, fdentry); - ffhp->fh_locked = tfhp->fh_locked = false; fh_drop_write(ffhp);
/*
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 6930bcbfb6ceda63e298c6af6d733ecdf6bd4cde ]
lockd doesn't currently vet the start and length in nlm4 requests like it should, and can end up generating lock requests with arguments that overflow when passed to the filesystem.
The NLM4 protocol uses unsigned 64-bit arguments for both start and length, whereas struct file_lock tracks the start and end as loff_t values. By the time we get around to calling nlm4svc_retrieve_args, we've lost the information that would allow us to determine if there was an overflow.
Start tracking the actual start and len for NLM4 requests in the nlm_lock. In nlm4svc_retrieve_args, vet these values to ensure they won't cause an overflow, and return NLM4_FBIG if they do.
Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=392 Reported-by: Jan Kasiak j.kasiak@gmail.com Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Cc: stable@vger.kernel.org # 5.14+ Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svc4proc.c | 8 ++++++++ fs/lockd/xdr4.c | 19 ++----------------- include/linux/lockd/xdr.h | 2 ++ 3 files changed, 12 insertions(+), 17 deletions(-)
diff --git a/fs/lockd/svc4proc.c b/fs/lockd/svc4proc.c index 4f247ab8be611..bf274f23969b3 100644 --- a/fs/lockd/svc4proc.c +++ b/fs/lockd/svc4proc.c @@ -32,6 +32,10 @@ nlm4svc_retrieve_args(struct svc_rqst *rqstp, struct nlm_args *argp, if (!nlmsvc_ops) return nlm_lck_denied_nolocks;
+ if (lock->lock_start > OFFSET_MAX || + (lock->lock_len && ((lock->lock_len - 1) > (OFFSET_MAX - lock->lock_start)))) + return nlm4_fbig; + /* Obtain host handle */ if (!(host = nlmsvc_lookup_host(rqstp, lock->caller, lock->len)) || (argp->monitor && nsm_monitor(host) < 0)) @@ -50,6 +54,10 @@ nlm4svc_retrieve_args(struct svc_rqst *rqstp, struct nlm_args *argp, /* Set up the missing parts of the file_lock structure */ lock->fl.fl_file = file->f_file[mode]; lock->fl.fl_pid = current->tgid; + lock->fl.fl_start = (loff_t)lock->lock_start; + lock->fl.fl_end = lock->lock_len ? + (loff_t)(lock->lock_start + lock->lock_len - 1) : + OFFSET_MAX; lock->fl.fl_lmops = &nlmsvc_lock_operations; nlmsvc_locks_init_private(&lock->fl, host, (pid_t)lock->svid); if (!lock->fl.fl_owner) { diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index 856267c0864bd..712fdfeb8ef06 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -20,13 +20,6 @@
#include "svcxdr.h"
-static inline loff_t -s64_to_loff_t(__s64 offset) -{ - return (loff_t)offset; -} - - static inline s64 loff_t_to_s64(loff_t offset) { @@ -70,8 +63,6 @@ static bool svcxdr_decode_lock(struct xdr_stream *xdr, struct nlm_lock *lock) { struct file_lock *fl = &lock->fl; - u64 len, start; - s64 end;
if (!svcxdr_decode_string(xdr, &lock->caller, &lock->len)) return false; @@ -81,20 +72,14 @@ svcxdr_decode_lock(struct xdr_stream *xdr, struct nlm_lock *lock) return false; if (xdr_stream_decode_u32(xdr, &lock->svid) < 0) return false; - if (xdr_stream_decode_u64(xdr, &start) < 0) + if (xdr_stream_decode_u64(xdr, &lock->lock_start) < 0) return false; - if (xdr_stream_decode_u64(xdr, &len) < 0) + if (xdr_stream_decode_u64(xdr, &lock->lock_len) < 0) return false;
locks_init_lock(fl); fl->fl_flags = FL_POSIX; fl->fl_type = F_RDLCK; - end = start + len - 1; - fl->fl_start = s64_to_loff_t(start); - if (len == 0 || end < 0) - fl->fl_end = OFFSET_MAX; - else - fl->fl_end = s64_to_loff_t(end);
return true; } diff --git a/include/linux/lockd/xdr.h b/include/linux/lockd/xdr.h index 398f70093cd35..67e4a2c5500bd 100644 --- a/include/linux/lockd/xdr.h +++ b/include/linux/lockd/xdr.h @@ -41,6 +41,8 @@ struct nlm_lock { struct nfs_fh fh; struct xdr_netobj oh; u32 svid; + u64 lock_start; + u64 lock_len; struct file_lock fl; };
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit 00801cd92d91e94aa04d687f9bb9a9104e7c3d46 ]
A recent patch moved ACL setting into nfsd_setattr(). Unfortunately it didn't work as nfsd_setattr() aborts early if iap->ia_valid is 0.
Remove this test, and instead avoid calling notify_change() when ia_valid is 0.
This means that nfsd_setattr() will now *always* lock the inode. Previously it didn't if only a ATTR_MODE change was requested on a symlink (see Commit 15b7a1b86d66 ("[PATCH] knfsd: fix setattr-on-symlink error return")). I don't think this change really matters.
Fixes: c0cbe70742f4 ("NFSD: add posix ACLs to struct nfsd_attrs") Signed-off-by: NeilBrown neilb@suse.de Reviewed-by: Jeff Layton jlayton@kernel.org [ cel: backported to 5.10.y, prior to idmapped mounts ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/vfs.c | 19 +++++++++---------- 1 file changed, 9 insertions(+), 10 deletions(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 5a7fee4ee2079..95f2e4549c034 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -299,6 +299,10 @@ commit_metadata(struct svc_fh *fhp) static void nfsd_sanitize_attrs(struct inode *inode, struct iattr *iap) { + /* Ignore mode updates on symlinks */ + if (S_ISLNK(inode->i_mode)) + iap->ia_valid &= ~ATTR_MODE; + /* sanitize the mode change */ if (iap->ia_valid & ATTR_MODE) { iap->ia_mode &= S_IALLUGO; @@ -366,7 +370,7 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, int accmode = NFSD_MAY_SATTR; umode_t ftype = 0; __be32 err; - int host_err; + int host_err = 0; bool get_write_count; bool size_change = (iap->ia_valid & ATTR_SIZE);
@@ -404,13 +408,6 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, dentry = fhp->fh_dentry; inode = d_inode(dentry);
- /* Ignore any mode updates on symlinks */ - if (S_ISLNK(inode->i_mode)) - iap->ia_valid &= ~ATTR_MODE; - - if (!iap->ia_valid) - return 0; - nfsd_sanitize_attrs(inode, iap);
if (check_guard && guardtime != inode->i_ctime.tv_sec) @@ -461,8 +458,10 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, goto out_unlock; }
- iap->ia_valid |= ATTR_CTIME; - host_err = notify_change(dentry, iap, NULL); + if (iap->ia_valid) { + iap->ia_valid |= ATTR_CTIME; + host_err = notify_change(dentry, iap, NULL); + }
out_unlock: if (attr->na_seclabel && attr->na_seclabel->len)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Al Viro viro@zeniv.linux.org.uk
[ Upstream commit bfbfb6182ad1d7d184b16f25165faad879147f79 ]
pipe_buffer might refer to a compound page (and contain more than a PAGE_SIZE worth of data). Theoretically it had been possible since way back, but nfsd_splice_actor() hadn't run into that until copy_page_to_iter() change. Fortunately, the only thing that changes for compound pages is that we need to stuff each relevant subpage in and convert the offset into offset in the first subpage.
Acked-by: Chuck Lever chuck.lever@oracle.com Tested-by: Benjamin Coddington bcodding@redhat.com Fixes: f0f6b614f83d "copy_page_to_iter(): don't split high-order page in case of ITER_PIPE" Signed-off-by: Al Viro viro@zeniv.linux.org.uk [ cel: "‘for’ loop initial declarations are only allowed in C99 or C11 mode" ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/vfs.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 95f2e4549c034..bc377ee177171 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -862,10 +862,15 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf, struct splice_desc *sd) { struct svc_rqst *rqstp = sd->u.data; - - svc_rqst_replace_page(rqstp, buf->page); - if (rqstp->rq_res.page_len == 0) - rqstp->rq_res.page_base = buf->offset; + struct page *page = buf->page; // may be a compound one + unsigned offset = buf->offset; + int i; + + page += offset / PAGE_SIZE; + for (i = sd->len; i > 0; i -= PAGE_SIZE) + svc_rqst_replace_page(rqstp, page++); + if (rqstp->rq_res.page_len == 0) // first call + rqstp->rq_res.page_base = offset % PAGE_SIZE; rqstp->rq_res.page_len += sd->len; return sd->len; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wolfram Sang wsa+renesas@sang-engineering.com
[ Upstream commit 72f78ae00a8e5d7abe13abac8305a300f6afd74b ]
Follow the advice of the below link and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script.
Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmkn... Signed-off-by: Wolfram Sang wsa+renesas@sang-engineering.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4idmap.c | 8 ++++---- fs/nfsd/nfs4proc.c | 2 +- fs/nfsd/nfssvc.c | 2 +- 3 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/fs/nfsd/nfs4idmap.c b/fs/nfsd/nfs4idmap.c index f92161ce1f97d..e70a1a2999b7b 100644 --- a/fs/nfsd/nfs4idmap.c +++ b/fs/nfsd/nfs4idmap.c @@ -82,8 +82,8 @@ ent_init(struct cache_head *cnew, struct cache_head *citm) new->id = itm->id; new->type = itm->type;
- strlcpy(new->name, itm->name, sizeof(new->name)); - strlcpy(new->authname, itm->authname, sizeof(new->authname)); + strscpy(new->name, itm->name, sizeof(new->name)); + strscpy(new->authname, itm->authname, sizeof(new->authname)); }
static void @@ -548,7 +548,7 @@ idmap_name_to_id(struct svc_rqst *rqstp, int type, const char *name, u32 namelen return nfserr_badowner; memcpy(key.name, name, namelen); key.name[namelen] = '\0'; - strlcpy(key.authname, rqst_authname(rqstp), sizeof(key.authname)); + strscpy(key.authname, rqst_authname(rqstp), sizeof(key.authname)); ret = idmap_lookup(rqstp, nametoid_lookup, &key, nn->nametoid_cache, &item); if (ret == -ENOENT) return nfserr_badowner; @@ -584,7 +584,7 @@ static __be32 idmap_id_to_name(struct xdr_stream *xdr, int ret; struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
- strlcpy(key.authname, rqst_authname(rqstp), sizeof(key.authname)); + strscpy(key.authname, rqst_authname(rqstp), sizeof(key.authname)); ret = idmap_lookup(rqstp, idtoname_lookup, &key, nn->idtoname_cache, &item); if (ret == -ENOENT) return encode_ascii_id(xdr, id); diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 193b84a0f3a59..0431b979748b8 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1345,7 +1345,7 @@ static __be32 nfsd4_ssc_setup_dul(struct nfsd_net *nn, char *ipaddr, return 0; } if (work) { - strlcpy(work->nsui_ipaddr, ipaddr, sizeof(work->nsui_ipaddr) - 1); + strscpy(work->nsui_ipaddr, ipaddr, sizeof(work->nsui_ipaddr) - 1); refcount_set(&work->nsui_refcnt, 2); work->nsui_busy = true; list_add_tail(&work->nsui_list, &nn->nfsd_ssc_mount_list); diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 011c556caa1e7..8b1afde192118 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -799,7 +799,7 @@ nfsd_svc(int nrservs, struct net *net, const struct cred *cred) if (nrservs == 0 && nn->nfsd_serv == NULL) goto out;
- strlcpy(nn->nfsd_name, utsname()->nodename, + strscpy(nn->nfsd_name, utsname()->nodename, sizeof(nn->nfsd_name));
error = nfsd_create_serv(net);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wolfram Sang wsa+renesas@sang-engineering.com
[ Upstream commit 97f8e62572555f8ad578d7b1739ba64d5d2cac0f ]
Follow the advice of the below link and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script.
Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmkn... Signed-off-by: Wolfram Sang wsa+renesas@sang-engineering.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/host.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/lockd/host.c b/fs/lockd/host.c index f802223e71abe..cdc8e12cdac44 100644 --- a/fs/lockd/host.c +++ b/fs/lockd/host.c @@ -164,7 +164,7 @@ static struct nlm_host *nlm_alloc_host(struct nlm_lookup_host_info *ni, host->h_addrbuf = nsm->sm_addrbuf; host->net = ni->net; host->h_cred = get_cred(ni->cred); - strlcpy(host->nodename, utsname()->nodename, sizeof(host->nodename)); + strscpy(host->nodename, utsname()->nodename, sizeof(host->nodename));
out: return host;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Olga Kornievskaia kolga@netapp.com
[ Upstream commit 754035ff79a14886e68c0c9f6fa80adb21f12b53 ]
If the passed in filehandle for the source file in the COPY operation is not a regular file, the server MUST return NFS4ERR_WRONG_TYPE.
Signed-off-by: Olga Kornievskaia kolga@netapp.com Reviewed-by: Jeff Layton jlayton@kernel.org [ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 0431b979748b8..ebfe39d313119 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1758,7 +1758,13 @@ static int nfsd4_do_async_copy(void *data) filp = nfs42_ssc_open(copy->ss_mnt, ©->c_fh, ©->stateid); if (IS_ERR(filp)) { - nfserr = nfserr_offload_denied; + switch (PTR_ERR(filp)) { + case -EBADF: + nfserr = nfserr_wrong_type; + break; + default: + nfserr = nfserr_offload_denied; + } /* ss_mnt will be unmounted by the laundromat */ goto do_callback; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jinpeng Cui cui.jinpeng2@zte.com.cn
[ Upstream commit 4ab3442ca384a02abf8b1f2b3449a6c547851873 ]
Return value directly from fh_verify() do_open_permission() exp_pseudoroot() instead of getting value from redundant variable status.
Reported-by: Zeal Robot zealci@zte.com.cn Signed-off-by: Jinpeng Cui cui.jinpeng2@zte.com.cn Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 16 ++++------------ 1 file changed, 4 insertions(+), 12 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index ebfe39d313119..62ffcecf78f7e 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -141,7 +141,6 @@ fh_dup2(struct svc_fh *dst, struct svc_fh *src) static __be32 do_open_permission(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nfsd4_open *open, int accmode) { - __be32 status;
if (open->op_truncate && !(open->op_share_access & NFS4_SHARE_ACCESS_WRITE)) @@ -156,9 +155,7 @@ do_open_permission(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nfs if (open->op_share_deny & NFS4_SHARE_DENY_READ) accmode |= NFSD_MAY_WRITE;
- status = fh_verify(rqstp, current_fh, S_IFREG, accmode); - - return status; + return fh_verify(rqstp, current_fh, S_IFREG, accmode); }
static __be32 nfsd_check_obj_isreg(struct svc_fh *fh) @@ -454,7 +451,6 @@ static __be32 do_open_fhandle(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, struct nfsd4_open *open) { struct svc_fh *current_fh = &cstate->current_fh; - __be32 status; int accmode = 0;
/* We don't know the target directory, and therefore can not @@ -479,9 +475,7 @@ do_open_fhandle(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, str if (open->op_claim_type == NFS4_OPEN_CLAIM_DELEG_CUR_FH) accmode = NFSD_MAY_OWNER_OVERRIDE;
- status = do_open_permission(rqstp, current_fh, open, accmode); - - return status; + return do_open_permission(rqstp, current_fh, open, accmode); }
static void @@ -668,11 +662,9 @@ static __be32 nfsd4_putrootfh(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, union nfsd4_op_u *u) { - __be32 status; - fh_put(&cstate->current_fh); - status = exp_pseudoroot(rqstp, &cstate->current_fh); - return status; + + return exp_pseudoroot(rqstp, &cstate->current_fh); }
static __be32
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
[ Upstream commit d44899b8bb0b919f923186c616a84f0e70e04772 ]
memdup_user() can't return NULL, so there is no point for checking for it.
Simplify some tests accordingly.
Suggested-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4recover.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c index d08c1a8c9254b..b9394a639a41a 100644 --- a/fs/nfsd/nfs4recover.c +++ b/fs/nfsd/nfs4recover.c @@ -807,7 +807,7 @@ __cld_pipe_inprogress_downcall(const struct cld_msg_v2 __user *cmsg, if (get_user(namelen, &ci->cc_name.cn_len)) return -EFAULT; name.data = memdup_user(&ci->cc_name.cn_id, namelen); - if (IS_ERR_OR_NULL(name.data)) + if (IS_ERR(name.data)) return -EFAULT; name.len = namelen; get_user(princhashlen, &ci->cc_princhash.cp_len); @@ -815,7 +815,7 @@ __cld_pipe_inprogress_downcall(const struct cld_msg_v2 __user *cmsg, princhash.data = memdup_user( &ci->cc_princhash.cp_data, princhashlen); - if (IS_ERR_OR_NULL(princhash.data)) { + if (IS_ERR(princhash.data)) { kfree(name.data); return -EFAULT; } @@ -829,7 +829,7 @@ __cld_pipe_inprogress_downcall(const struct cld_msg_v2 __user *cmsg, if (get_user(namelen, &cnm->cn_len)) return -EFAULT; name.data = memdup_user(&cnm->cn_id, namelen); - if (IS_ERR_OR_NULL(name.data)) + if (IS_ERR(name.data)) return -EFAULT; name.len = namelen; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
[ Upstream commit 30a30fcc3fc1ad4c5d017c9fcb75dc8f59e7bdad ]
Propagate the error code returned by memdup_user() instead of a hard coded -EFAULT.
Suggested-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4recover.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c index b9394a639a41a..189c622dde61c 100644 --- a/fs/nfsd/nfs4recover.c +++ b/fs/nfsd/nfs4recover.c @@ -808,7 +808,7 @@ __cld_pipe_inprogress_downcall(const struct cld_msg_v2 __user *cmsg, return -EFAULT; name.data = memdup_user(&ci->cc_name.cn_id, namelen); if (IS_ERR(name.data)) - return -EFAULT; + return PTR_ERR(name.data); name.len = namelen; get_user(princhashlen, &ci->cc_princhash.cp_len); if (princhashlen > 0) { @@ -817,7 +817,7 @@ __cld_pipe_inprogress_downcall(const struct cld_msg_v2 __user *cmsg, princhashlen); if (IS_ERR(princhash.data)) { kfree(name.data); - return -EFAULT; + return PTR_ERR(princhash.data); } princhash.len = princhashlen; } else @@ -830,7 +830,7 @@ __cld_pipe_inprogress_downcall(const struct cld_msg_v2 __user *cmsg, return -EFAULT; name.data = memdup_user(&cnm->cn_id, namelen); if (IS_ERR(name.data)) - return -EFAULT; + return PTR_ERR(name.data); name.len = namelen; } if (name.len > 5 && memcmp(name.data, "hash:", 5) == 0) {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 80e591ce636f3ae6855a0ca26963da1fdd6d4508 ]
When attempting an NFSv4 mount, a Solaris NFSv4 client builds a single large COMPOUND that chains a series of LOOKUPs to get to the pseudo filesystem root directory that is to be mounted. The Linux NFS server's current maximum of 16 operations per NFSv4 COMPOUND is not large enough to ensure that this works for paths that are more than a few components deep.
Since NFSD_MAX_OPS_PER_COMPOUND is mostly a sanity check, and most NFSv4 COMPOUNDS are between 3 and 6 operations (thus they do not trigger any re-allocation of the operation array on the server), increasing this maximum should result in little to no impact.
The ops array can get large now, so allocate it via vmalloc() to help ensure memory fragmentation won't cause an allocation failure.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216383 Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 7 ++++--- fs/nfsd/state.h | 2 +- 2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 5476541530ead..92e0535ddb922 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -42,6 +42,8 @@ #include <linux/sunrpc/svcauth_gss.h> #include <linux/sunrpc/addr.h> #include <linux/xattr.h> +#include <linux/vmalloc.h> + #include <uapi/linux/xattr.h>
#include "idmap.h" @@ -2369,10 +2371,9 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp) return true;
if (argp->opcnt > ARRAY_SIZE(argp->iops)) { - argp->ops = kzalloc(argp->opcnt * sizeof(*argp->ops), GFP_KERNEL); + argp->ops = vcalloc(argp->opcnt, sizeof(*argp->ops)); if (!argp->ops) { argp->ops = argp->iops; - dprintk("nfsd: couldn't allocate room for COMPOUND\n"); return false; } } @@ -5402,7 +5403,7 @@ void nfsd4_release_compoundargs(struct svc_rqst *rqstp) struct nfsd4_compoundargs *args = rqstp->rq_argp;
if (args->ops != args->iops) { - kfree(args->ops); + vfree(args->ops); args->ops = args->iops; } while (args->to_free) { diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index ae596dbf86675..5d28beb290fef 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -175,7 +175,7 @@ static inline struct nfs4_delegation *delegstateid(struct nfs4_stid *s) /* Maximum number of slots per session. 160 is useful for long haul TCP */ #define NFSD_MAX_SLOTS_PER_SESSION 160 /* Maximum number of operations per session compound */ -#define NFSD_MAX_OPS_PER_COMPOUND 16 +#define NFSD_MAX_OPS_PER_COMPOUND 50 /* Maximum session per slot cache size */ #define NFSD_SLOT_CACHE_SIZE 2048 /* Maximum number of NFSD_SLOT_CACHE_SIZE slots per session */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 00b4492686e0497fdb924a9d4c8f6f99377e176c ]
Restore the previous limit on the @count argument to prevent a buffer overflow attack.
Fixes: 53b1119a6e50 ("NFSD: Fix READDIR buffer overflow") Signed-off-by: Chuck Lever chuck.lever@oracle.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsproc.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index e533550a26db5..559603a0a5358 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -567,12 +567,11 @@ static void nfsd_init_dirlist_pages(struct svc_rqst *rqstp, struct xdr_buf *buf = &resp->dirlist; struct xdr_stream *xdr = &resp->xdr;
- count = clamp(count, (u32)(XDR_UNIT * 2), svc_max_payload(rqstp)); - memset(buf, 0, sizeof(*buf));
/* Reserve room for the NULL ptr & eof flag (-2 words) */ - buf->buflen = count - XDR_UNIT * 2; + buf->buflen = clamp(count, (u32)(XDR_UNIT * 2), (u32)PAGE_SIZE); + buf->buflen -= XDR_UNIT * 2; buf->pages = rqstp->rq_next_page; rqstp->rq_next_page++;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 640f87c190e0d1b2a0fcb2ecf6d2cd53b1c41991 ]
Since before the git era, NFSD has conserved the number of pages held by each nfsd thread by combining the RPC receive and send buffers into a single array of pages. This works because there are no cases where an operation needs a large RPC Call message and a large RPC Reply message at the same time.
Once an RPC Call has been received, svc_process() updates svc_rqst::rq_res to describe the part of rq_pages that can be used for constructing the Reply. This means that the send buffer (rq_res) shrinks when the received RPC record containing the RPC Call is large.
A client can force this shrinkage on TCP by sending a correctly- formed RPC Call header contained in an RPC record that is excessively large. The full maximum payload size cannot be constructed in that case.
Thanks to Aleksi Illikainen and Kari Hulkko for uncovering this issue.
Reported-by: Ben Ronallo Benjamin.Ronallo@synopsys.com Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3proc.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 1a52f0b06ec32..d808779ab4538 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -563,13 +563,14 @@ static void nfsd3_init_dirlist_pages(struct svc_rqst *rqstp, { struct xdr_buf *buf = &resp->dirlist; struct xdr_stream *xdr = &resp->xdr; - - count = clamp(count, (u32)(XDR_UNIT * 2), svc_max_payload(rqstp)); + unsigned int sendbuf = min_t(unsigned int, rqstp->rq_res.buflen, + svc_max_payload(rqstp));
memset(buf, 0, sizeof(*buf));
/* Reserve room for the NULL ptr & eof flag (-2 words) */ - buf->buflen = count - XDR_UNIT * 2; + buf->buflen = clamp(count, (u32)(XDR_UNIT * 2), sendbuf); + buf->buflen -= XDR_UNIT * 2; buf->pages = rqstp->rq_next_page; rqstp->rq_next_page += (buf->buflen + PAGE_SIZE - 1) >> PAGE_SHIFT;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 401bc1f90874280a80b93f23be33a0e7e2d1f912 ]
Since before the git era, NFSD has conserved the number of pages held by each nfsd thread by combining the RPC receive and send buffers into a single array of pages. This works because there are no cases where an operation needs a large RPC Call message and a large RPC Reply at the same time.
Once an RPC Call has been received, svc_process() updates svc_rqst::rq_res to describe the part of rq_pages that can be used for constructing the Reply. This means that the send buffer (rq_res) shrinks when the received RPC record containing the RPC Call is large.
A client can force this shrinkage on TCP by sending a correctly- formed RPC Call header contained in an RPC record that is excessively large. The full maximum payload size cannot be constructed in that case.
Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsproc.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 559603a0a5358..749c3354304c2 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -185,6 +185,7 @@ nfsd_proc_read(struct svc_rqst *rqstp) argp->count, argp->offset);
argp->count = min_t(u32, argp->count, NFSSVC_MAXBLKSIZE_V2); + argp->count = min_t(u32, argp->count, rqstp->rq_res.buflen);
v = 0; len = argp->count;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit fa6be9cc6e80ec79892ddf08a8c10cabab9baf38 ]
Since before the git era, NFSD has conserved the number of pages held by each nfsd thread by combining the RPC receive and send buffers into a single array of pages. This works because there are no cases where an operation needs a large RPC Call message and a large RPC Reply at the same time.
Once an RPC Call has been received, svc_process() updates svc_rqst::rq_res to describe the part of rq_pages that can be used for constructing the Reply. This means that the send buffer (rq_res) shrinks when the received RPC record containing the RPC Call is large.
A client can force this shrinkage on TCP by sending a correctly- formed RPC Call header contained in an RPC record that is excessively large. The full maximum payload size cannot be constructed in that case.
Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3proc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index d808779ab4538..8679c41746027 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -150,7 +150,6 @@ nfsd3_proc_read(struct svc_rqst *rqstp) { struct nfsd3_readargs *argp = rqstp->rq_argp; struct nfsd3_readres *resp = rqstp->rq_resp; - u32 max_blocksize = svc_max_payload(rqstp); unsigned int len; int v;
@@ -159,7 +158,8 @@ nfsd3_proc_read(struct svc_rqst *rqstp) (unsigned long) argp->count, (unsigned long long) argp->offset);
- argp->count = min_t(u32, argp->count, max_blocksize); + argp->count = min_t(u32, argp->count, svc_max_payload(rqstp)); + argp->count = min_t(u32, argp->count, rqstp->rq_res.buflen); if (argp->offset > (u64)OFFSET_MAX) argp->offset = (u64)OFFSET_MAX; if (argp->offset + argp->count > (u64)OFFSET_MAX)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit 9558f9304ca1903090fa5d995a3269a8e82804b4 ]
nfsd_create_locked() does not use the "fname" and "flen" arguments, so drop them from declaration and all callers.
Signed-off-by: NeilBrown neilb@suse.de Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsproc.c | 5 ++--- fs/nfsd/vfs.c | 5 ++--- fs/nfsd/vfs.h | 4 ++-- 3 files changed, 6 insertions(+), 8 deletions(-)
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 749c3354304c2..7ed03ac6bdab3 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -391,9 +391,8 @@ nfsd_proc_create(struct svc_rqst *rqstp) resp->status = nfs_ok; if (!inode) { /* File doesn't exist. Create it and set attrs */ - resp->status = nfsd_create_locked(rqstp, dirfhp, argp->name, - argp->len, &attrs, type, rdev, - newfhp); + resp->status = nfsd_create_locked(rqstp, dirfhp, &attrs, type, + rdev, newfhp); } else if (type == S_IFREG) { dprintk("nfsd: existing %s, valid=%x, size=%ld\n", argp->name, attr->ia_valid, (long) attr->ia_size); diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index bc377ee177171..5ec1119a87859 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1273,7 +1273,7 @@ nfsd_check_ignore_resizing(struct iattr *iap) /* The parent directory should already be locked: */ __be32 nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp, - char *fname, int flen, struct nfsd_attrs *attrs, + struct nfsd_attrs *attrs, int type, dev_t rdev, struct svc_fh *resfhp) { struct dentry *dentry, *dchild; @@ -1399,8 +1399,7 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp, if (err) goto out_unlock; fh_fill_pre_attrs(fhp); - err = nfsd_create_locked(rqstp, fhp, fname, flen, attrs, type, - rdev, resfhp); + err = nfsd_create_locked(rqstp, fhp, attrs, type, rdev, resfhp); fh_fill_post_attrs(fhp); out_unlock: inode_unlock(dentry->d_inode); diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h index c95cd414b4bb0..120521bc7b247 100644 --- a/fs/nfsd/vfs.h +++ b/fs/nfsd/vfs.h @@ -79,8 +79,8 @@ __be32 nfsd4_clone_file_range(struct svc_rqst *rqstp, u64 count, bool sync); #endif /* CONFIG_NFSD_V4 */ __be32 nfsd_create_locked(struct svc_rqst *, struct svc_fh *, - char *name, int len, struct nfsd_attrs *attrs, - int type, dev_t rdev, struct svc_fh *res); + struct nfsd_attrs *attrs, int type, dev_t rdev, + struct svc_fh *res); __be32 nfsd_create(struct svc_rqst *, struct svc_fh *, char *name, int len, struct nfsd_attrs *attrs, int type, dev_t rdev, struct svc_fh *res);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 7518a3dc5ea249d4112156ce71b8b184eb786151 ]
If an NFS server returns NFS4ERR_RESOURCE on the first operation in an NFSv4 COMPOUND, there's no way for a client to know where the problem is and then simplify the compound to make forward progress.
So instead, make NFSD process as many operations in an oversized COMPOUND as it can and then return NFS4ERR_RESOURCE on the first operation it did not process.
pynfs NFSv4.0 COMP6 exercises this case, but checks only for the COMPOUND status code, not whether the server has processed any of the operations.
pynfs NFSv4.1 SEQ6 and SEQ7 exercise the NFSv4.1 case, which detects too many operations per COMPOUND by checking against the limits negotiated when the session was created.
Suggested-by: Bruce Fields bfields@fieldses.org Fixes: 0078117c6d91 ("nfsd: return RESOURCE not GARBAGE_ARGS on too many ops") Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 19 +++++++++++++------ fs/nfsd/nfs4xdr.c | 12 +++--------- fs/nfsd/xdr4.h | 3 ++- 3 files changed, 18 insertions(+), 16 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 62ffcecf78f7e..c40795d1d98df 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -2623,9 +2623,6 @@ nfsd4_proc_compound(struct svc_rqst *rqstp) status = nfserr_minor_vers_mismatch; if (nfsd_minorversion(nn, args->minorversion, NFSD_TEST) <= 0) goto out; - status = nfserr_resource; - if (args->opcnt > NFSD_MAX_OPS_PER_COMPOUND) - goto out;
status = nfs41_check_op_ordering(args); if (status) { @@ -2638,10 +2635,20 @@ nfsd4_proc_compound(struct svc_rqst *rqstp)
rqstp->rq_lease_breaker = (void **)&cstate->clp;
- trace_nfsd_compound(rqstp, args->opcnt); + trace_nfsd_compound(rqstp, args->client_opcnt); while (!status && resp->opcnt < args->opcnt) { op = &args->ops[resp->opcnt++];
+ if (unlikely(resp->opcnt == NFSD_MAX_OPS_PER_COMPOUND)) { + /* If there are still more operations to process, + * stop here and report NFS4ERR_RESOURCE. */ + if (cstate->minorversion == 0 && + args->client_opcnt > resp->opcnt) { + op->status = nfserr_resource; + goto encode_op; + } + } + /* * The XDR decode routines may have pre-set op->status; * for example, if there is a miscellaneous XDR error @@ -2717,8 +2724,8 @@ nfsd4_proc_compound(struct svc_rqst *rqstp) status = op->status; }
- trace_nfsd_compound_status(args->opcnt, resp->opcnt, status, - nfsd4_op_name(op->opnum)); + trace_nfsd_compound_status(args->client_opcnt, resp->opcnt, + status, nfsd4_op_name(op->opnum));
nfsd4_cstate_clear_replay(cstate); nfsd4_increment_op_stats(op->opnum); diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 92e0535ddb922..b9398b7b3539a 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2359,16 +2359,10 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp)
if (xdr_stream_decode_u32(argp->xdr, &argp->minorversion) < 0) return false; - if (xdr_stream_decode_u32(argp->xdr, &argp->opcnt) < 0) + if (xdr_stream_decode_u32(argp->xdr, &argp->client_opcnt) < 0) return false; - - /* - * NFS4ERR_RESOURCE is a more helpful error than GARBAGE_ARGS - * here, so we return success at the xdr level so that - * nfsd4_proc can handle this is an NFS-level error. - */ - if (argp->opcnt > NFSD_MAX_OPS_PER_COMPOUND) - return true; + argp->opcnt = min_t(u32, argp->client_opcnt, + NFSD_MAX_OPS_PER_COMPOUND);
if (argp->opcnt > ARRAY_SIZE(argp->iops)) { argp->ops = vcalloc(argp->opcnt, sizeof(*argp->ops)); diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 14b87141de343..2a699e8ca1baf 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -717,9 +717,10 @@ struct nfsd4_compoundargs { struct svcxdr_tmpbuf *to_free; struct svc_rqst *rqstp;
- u32 taglen; char * tag; + u32 taglen; u32 minorversion; + u32 client_opcnt; u32 opcnt; struct nfsd4_op *ops; struct nfsd4_op iops[8];
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 6106d9119b6599fa23dc556b429d887b4c2d9f62 ]
We only need the inode number for this, not a full rack of attributes. Rename this function make it take a pointer to a u64 instead of struct kstat, and change it to just request STATX_INO.
Signed-off-by: Jeff Layton jlayton@kernel.org [ cel: renamed get_mounted_on_ino() ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index b9398b7b3539a..b5ca83045d6e9 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2769,9 +2769,10 @@ static __be32 fattr_handle_absent_fs(u32 *bmval0, u32 *bmval1, u32 *bmval2, u32 }
-static int get_parent_attributes(struct svc_export *exp, struct kstat *stat) +static int nfsd4_get_mounted_on_ino(struct svc_export *exp, u64 *pino) { struct path path = exp->ex_path; + struct kstat stat; int err;
path_get(&path); @@ -2779,8 +2780,10 @@ static int get_parent_attributes(struct svc_export *exp, struct kstat *stat) if (path.dentry != path.mnt->mnt_root) break; } - err = vfs_getattr(&path, stat, STATX_BASIC_STATS, AT_STATX_SYNC_AS_STAT); + err = vfs_getattr(&path, &stat, STATX_INO, AT_STATX_SYNC_AS_STAT); path_put(&path); + if (!err) + *pino = stat.ino; return err; }
@@ -3277,22 +3280,21 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, *p++ = cpu_to_be32(stat.btime.tv_nsec); } if (bmval1 & FATTR4_WORD1_MOUNTED_ON_FILEID) { - struct kstat parent_stat; u64 ino = stat.ino;
p = xdr_reserve_space(xdr, 8); if (!p) goto out_resource; /* - * Get parent's attributes if not ignoring crossmount - * and this is the root of a cross-mounted filesystem. + * Get ino of mountpoint in parent filesystem, if not ignoring + * crossmount and this is the root of a cross-mounted + * filesystem. */ if (ignore_crossmnt == 0 && dentry == exp->ex_path.mnt->mnt_root) { - err = get_parent_attributes(exp, &parent_stat); + err = nfsd4_get_mounted_on_ino(exp, &ino); if (err) goto out_nfserr; - ino = parent_stat.ino; } p = xdr_encode_hyper(p, ino); }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gaosheng Cui cuigaosheng1@huawei.com
[ Upstream commit 18224dc58d960c65446971930d0487fc72d00598 ]
nfsd4_prepare_cb_recall() has been removed since commit 0162ac2b978e ("nfsd: introduce nfsd4_callback_ops"), so remove it.
Signed-off-by: Gaosheng Cui cuigaosheng1@huawei.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/state.h | 1 - 1 file changed, 1 deletion(-)
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index 5d28beb290fef..4155be65d8069 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -697,7 +697,6 @@ extern int nfsd4_create_callback_queue(void); extern void nfsd4_destroy_callback_queue(void); extern void nfsd4_shutdown_callback(struct nfs4_client *); extern void nfsd4_shutdown_copy(struct nfs4_client *clp); -extern void nfsd4_prepare_cb_recall(struct nfs4_delegation *dp); extern struct nfs4_client_reclaim *nfs4_client_to_reclaim(struct xdr_netobj name, struct xdr_netobj princhash, struct nfsd_net *nn); extern bool nfs4_has_reclaimed_state(struct xdr_netobj name, struct nfsd_net *nn);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 1035d65446a018ca2dd179e29a2fcd6d29057781 ]
Wireshark has always been lousy about dissecting NFSv4 callbacks, especially NFSv4.0 backchannel requests. Add tracepoints so we can surgically capture these events in the trace log.
Tracepoints are time-stamped and ordered so that we can now observe the timing relationship between a CB_RECALL Reply and the client's DELEGRETURN Call. Example:
nfsd-1153 [002] 211.986391: nfsd_cb_recall: addr=192.168.1.67:45767 client 62ea82e4:fee7492a stateid 00000003:00000001
nfsd-1153 [002] 212.095634: nfsd_compound: xid=0x0000002c opcnt=2 nfsd-1153 [002] 212.095647: nfsd_compound_status: op=1/2 OP_PUTFH status=0 nfsd-1153 [002] 212.095658: nfsd_file_put: hash=0xf72 inode=0xffff9291148c7410 ref=3 flags=HASHED|REFERENCED may=READ file=0xffff929103b3ea00 nfsd-1153 [002] 212.095661: nfsd_compound_status: op=2/2 OP_DELEGRETURN status=0 kworker/u25:8-148 [002] 212.096713: nfsd_cb_recall_done: client 62ea82e4:fee7492a stateid 00000003:00000001 status=0
Signed-off-by: Chuck Lever chuck.lever@oracle.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4layouts.c | 2 +- fs/nfsd/nfs4proc.c | 4 ++++ fs/nfsd/nfs4state.c | 4 ++++ fs/nfsd/trace.h | 39 +++++++++++++++++++++++++++++++++++++++ 4 files changed, 48 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c index 7018d209b784a..e4e23b2a3e655 100644 --- a/fs/nfsd/nfs4layouts.c +++ b/fs/nfsd/nfs4layouts.c @@ -657,7 +657,7 @@ nfsd4_cb_layout_done(struct nfsd4_callback *cb, struct rpc_task *task) ktime_t now, cutoff; const struct nfsd4_layout_ops *ops;
- + trace_nfsd_cb_layout_done(&ls->ls_stid.sc_stateid, task); switch (task->tk_status) { case 0: case -NFS4ERR_DELAY: diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index c40795d1d98df..38d60964d8b2d 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1603,6 +1603,10 @@ static void nfsd4_cb_offload_release(struct nfsd4_callback *cb) static int nfsd4_cb_offload_done(struct nfsd4_callback *cb, struct rpc_task *task) { + struct nfsd4_cb_offload *cbo = + container_of(cb, struct nfsd4_cb_offload, co_cb); + + trace_nfsd_cb_offload_done(&cbo->co_res.cb_stateid, task); return 1; }
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 7a4baa41b5362..8bbff388e4f03 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -357,6 +357,8 @@ nfsd4_cb_notify_lock_prepare(struct nfsd4_callback *cb) static int nfsd4_cb_notify_lock_done(struct nfsd4_callback *cb, struct rpc_task *task) { + trace_nfsd_cb_notify_lock_done(&zero_stateid, task); + /* * Since this is just an optimization, we don't try very hard if it * turns out not to succeed. We'll requeue it on NFS4ERR_DELAY, and @@ -4760,6 +4762,8 @@ static int nfsd4_cb_recall_done(struct nfsd4_callback *cb, { struct nfs4_delegation *dp = cb_to_delegation(cb);
+ trace_nfsd_cb_recall_done(&dp->dl_stid.sc_stateid, task); + if (dp->dl_stid.sc_type == NFS4_CLOSED_DELEG_STID || dp->dl_stid.sc_type == NFS4_REVOKED_DELEG_STID) return 1; diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 22c1fb735f1a7..0ee4220a289a0 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -1366,6 +1366,45 @@ TRACE_EVENT(nfsd_cb_offload, __entry->fh_hash, __entry->count, __entry->status) );
+DECLARE_EVENT_CLASS(nfsd_cb_done_class, + TP_PROTO( + const stateid_t *stp, + const struct rpc_task *task + ), + TP_ARGS(stp, task), + TP_STRUCT__entry( + __field(u32, cl_boot) + __field(u32, cl_id) + __field(u32, si_id) + __field(u32, si_generation) + __field(int, status) + ), + TP_fast_assign( + __entry->cl_boot = stp->si_opaque.so_clid.cl_boot; + __entry->cl_id = stp->si_opaque.so_clid.cl_id; + __entry->si_id = stp->si_opaque.so_id; + __entry->si_generation = stp->si_generation; + __entry->status = task->tk_status; + ), + TP_printk("client %08x:%08x stateid %08x:%08x status=%d", + __entry->cl_boot, __entry->cl_id, __entry->si_id, + __entry->si_generation, __entry->status + ) +); + +#define DEFINE_NFSD_CB_DONE_EVENT(name) \ +DEFINE_EVENT(nfsd_cb_done_class, name, \ + TP_PROTO( \ + const stateid_t *stp, \ + const struct rpc_task *task \ + ), \ + TP_ARGS(stp, task)) + +DEFINE_NFSD_CB_DONE_EVENT(nfsd_cb_recall_done); +DEFINE_NFSD_CB_DONE_EVENT(nfsd_cb_notify_lock_done); +DEFINE_NFSD_CB_DONE_EVENT(nfsd_cb_layout_done); +DEFINE_NFSD_CB_DONE_EVENT(nfsd_cb_offload_done); + #endif /* _NFSD_TRACE_H */
#undef TRACE_INCLUDE_PATH
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit c035362eb935fe9381d9d1cc453bc2a37460e24c ]
Subsequent patches will use this mechanism to wake up an operation that is waiting for a client to return a delegation.
The new tracepoint records whether the wait timed out or was properly awoken by the expected DELEGRETURN:
nfsd-1155 [002] 83799.493199: nfsd_delegret_wakeup: xid=0x14b7d6ef fh_hash=0xf6826792 (timed out)
Suggested-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 30 ++++++++++++++++++++++++++++++ fs/nfsd/nfsd.h | 7 +++++++ fs/nfsd/trace.h | 23 +++++++++++++++++++++++ 3 files changed, 60 insertions(+)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 8bbff388e4f03..2bb78ab4f6c31 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4734,6 +4734,35 @@ nfs4_share_conflict(struct svc_fh *current_fh, unsigned int deny_type) return ret; }
+static bool nfsd4_deleg_present(const struct inode *inode) +{ + struct file_lock_context *ctx = smp_load_acquire(&inode->i_flctx); + + return ctx && !list_empty_careful(&ctx->flc_lease); +} + +/** + * nfsd_wait_for_delegreturn - wait for delegations to be returned + * @rqstp: the RPC transaction being executed + * @inode: in-core inode of the file being waited for + * + * The timeout prevents deadlock if all nfsd threads happen to be + * tied up waiting for returning delegations. + * + * Return values: + * %true: delegation was returned + * %false: timed out waiting for delegreturn + */ +bool nfsd_wait_for_delegreturn(struct svc_rqst *rqstp, struct inode *inode) +{ + long __maybe_unused timeo; + + timeo = wait_var_event_timeout(inode, !nfsd4_deleg_present(inode), + NFSD_DELEGRETURN_TIMEOUT); + trace_nfsd_delegret_wakeup(rqstp, inode, timeo); + return timeo > 0; +} + static void nfsd4_cb_recall_prepare(struct nfsd4_callback *cb) { struct nfs4_delegation *dp = cb_to_delegation(cb); @@ -6811,6 +6840,7 @@ nfsd4_delegreturn(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (status) goto put_stateid;
+ wake_up_var(d_inode(cstate->current_fh.fh_dentry)); destroy_delegation(dp); put_stateid: nfs4_put_stid(&dp->dl_stid); diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 57a468ed85c35..6ab4ad41ae84e 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -164,6 +164,7 @@ char * nfs4_recoverydir(void); bool nfsd4_spo_must_allow(struct svc_rqst *rqstp); int nfsd4_create_laundry_wq(void); void nfsd4_destroy_laundry_wq(void); +bool nfsd_wait_for_delegreturn(struct svc_rqst *rqstp, struct inode *inode); #else static inline int nfsd4_init_slabs(void) { return 0; } static inline void nfsd4_free_slabs(void) { } @@ -179,6 +180,11 @@ static inline bool nfsd4_spo_must_allow(struct svc_rqst *rqstp) } static inline int nfsd4_create_laundry_wq(void) { return 0; }; static inline void nfsd4_destroy_laundry_wq(void) {}; +static inline bool nfsd_wait_for_delegreturn(struct svc_rqst *rqstp, + struct inode *inode) +{ + return false; +} #endif
/* @@ -343,6 +349,7 @@ void nfsd_lockd_shutdown(void); #define NFSD_COURTESY_CLIENT_TIMEOUT (24 * 60 * 60) /* seconds */ #define NFSD_CLIENT_MAX_TRIM_PER_RUN 128 #define NFS4_CLIENTS_PER_GB 1024 +#define NFSD_DELEGRETURN_TIMEOUT (HZ / 34) /* 30ms */
/* * The following attributes are currently not supported by the NFSv4 server: diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 0ee4220a289a0..6803ac877ff70 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -454,6 +454,29 @@ DEFINE_NFSD_COPY_ERR_EVENT(clone_file_range_err); #include "filecache.h" #include "vfs.h"
+TRACE_EVENT(nfsd_delegret_wakeup, + TP_PROTO( + const struct svc_rqst *rqstp, + const struct inode *inode, + long timeo + ), + TP_ARGS(rqstp, inode, timeo), + TP_STRUCT__entry( + __field(u32, xid) + __field(const void *, inode) + __field(long, timeo) + ), + TP_fast_assign( + __entry->xid = be32_to_cpu(rqstp->rq_xid); + __entry->inode = inode; + __entry->timeo = timeo; + ), + TP_printk("xid=0x%08x inode=%p%s", + __entry->xid, __entry->inode, + __entry->timeo == 0 ? " (timed out)" : "" + ) +); + DECLARE_EVENT_CLASS(nfsd_stateid_class, TP_PROTO(stateid_t *stp), TP_ARGS(stp),
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit c0aa1913db57219e91a0a8832363cbafb3a9cf8f ]
Move code that will be retried (in a subsequent patch) into a helper function.
Reviewed-by: Jeff Layton jlayton@kernel.org [ cel: backported to 5.10.y, prior to idmapped mounts ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/vfs.c | 97 ++++++++++++++++++++++++++++++--------------------- 1 file changed, 57 insertions(+), 40 deletions(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 5ec1119a87859..e32b0c807ea9d 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -356,8 +356,61 @@ nfsd_get_write_access(struct svc_rqst *rqstp, struct svc_fh *fhp, return nfserrno(host_err); }
-/* - * Set various file attributes. After this call fhp needs an fh_put. +static int __nfsd_setattr(struct dentry *dentry, struct iattr *iap) +{ + int host_err; + + if (iap->ia_valid & ATTR_SIZE) { + /* + * RFC5661, Section 18.30.4: + * Changing the size of a file with SETATTR indirectly + * changes the time_modify and change attributes. + * + * (and similar for the older RFCs) + */ + struct iattr size_attr = { + .ia_valid = ATTR_SIZE | ATTR_CTIME | ATTR_MTIME, + .ia_size = iap->ia_size, + }; + + if (iap->ia_size < 0) + return -EFBIG; + + host_err = notify_change(dentry, &size_attr, NULL); + if (host_err) + return host_err; + iap->ia_valid &= ~ATTR_SIZE; + + /* + * Avoid the additional setattr call below if the only other + * attribute that the client sends is the mtime, as we update + * it as part of the size change above. + */ + if ((iap->ia_valid & ~ATTR_MTIME) == 0) + return 0; + } + + if (!iap->ia_valid) + return 0; + + iap->ia_valid |= ATTR_CTIME; + return notify_change(dentry, iap, NULL); +} + +/** + * nfsd_setattr - Set various file attributes. + * @rqstp: controlling RPC transaction + * @fhp: filehandle of target + * @attr: attributes to set + * @check_guard: set to 1 if guardtime is a valid timestamp + * @guardtime: do not act if ctime.tv_sec does not match this timestamp + * + * This call may adjust the contents of @attr (in particular, this + * call may change the bits in the na_iattr.ia_valid field). + * + * Returns nfs_ok on success, otherwise an NFS status code is + * returned. Caller must release @fhp by calling fh_put in either + * case. */ __be32 nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, @@ -370,7 +423,7 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, int accmode = NFSD_MAY_SATTR; umode_t ftype = 0; __be32 err; - int host_err = 0; + int host_err; bool get_write_count; bool size_change = (iap->ia_valid & ATTR_SIZE);
@@ -427,43 +480,7 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, }
inode_lock(inode); - if (size_change) { - /* - * RFC5661, Section 18.30.4: - * Changing the size of a file with SETATTR indirectly - * changes the time_modify and change attributes. - * - * (and similar for the older RFCs) - */ - struct iattr size_attr = { - .ia_valid = ATTR_SIZE | ATTR_CTIME | ATTR_MTIME, - .ia_size = iap->ia_size, - }; - - host_err = -EFBIG; - if (iap->ia_size < 0) - goto out_unlock; - - host_err = notify_change(dentry, &size_attr, NULL); - if (host_err) - goto out_unlock; - iap->ia_valid &= ~ATTR_SIZE; - - /* - * Avoid the additional setattr call below if the only other - * attribute that the client sends is the mtime, as we update - * it as part of the size change above. - */ - if ((iap->ia_valid & ~ATTR_MTIME) == 0) - goto out_unlock; - } - - if (iap->ia_valid) { - iap->ia_valid |= ATTR_CTIME; - host_err = notify_change(dentry, iap, NULL); - } - -out_unlock: + host_err = __nfsd_setattr(dentry, iap); if (attr->na_seclabel && attr->na_seclabel->len) attr->na_labelerr = security_inode_setsecctx(dentry, attr->na_seclabel->data, attr->na_seclabel->len);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 34b91dda7124fc3259e4b2ae53e0c933dedfec01 ]
nfsd_setattr() can kick off a CB_RECALL (via notify_change() -> break_lease()) if a delegation is present. Before returning NFS4ERR_DELAY, give the client holding that delegation a chance to return it and then retry the nfsd_setattr() again, once.
Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=354 Tested-by: Igor Mammedov imammedo@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/vfs.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index e32b0c807ea9d..dc79db261d6a2 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -426,6 +426,7 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, int host_err; bool get_write_count; bool size_change = (iap->ia_valid & ATTR_SIZE); + int retries;
if (iap->ia_valid & ATTR_SIZE) { accmode |= NFSD_MAY_WRITE|NFSD_MAY_OWNER_OVERRIDE; @@ -480,7 +481,13 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, }
inode_lock(inode); - host_err = __nfsd_setattr(dentry, iap); + for (retries = 1;;) { + host_err = __nfsd_setattr(dentry, iap); + if (host_err != -EAGAIN || !retries--) + break; + if (!nfsd_wait_for_delegreturn(rqstp, inode)) + break; + } if (attr->na_seclabel && attr->na_seclabel->len) attr->na_labelerr = security_inode_setsecctx(dentry, attr->na_seclabel->data, attr->na_seclabel->len);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 68c522afd0b1936b48a03a4c8b81261e7597c62d ]
nfsd_rename() can kick off a CB_RECALL (via vfs_rename() -> leases_conflict()) if a delegation is present. Before returning NFS4ERR_DELAY, give the client holding that delegation a chance to return it and then retry the nfsd_rename() again, once.
This version of the patch handles renaming an existing file, but does not deal with renaming onto an existing file. That case will still always trigger an NFS4ERR_DELAY.
Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=354 Tested-by: Igor Mammedov imammedo@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/vfs.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index dc79db261d6a2..5684845d95119 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1711,7 +1711,15 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen, .new_dir = tdir, .new_dentry = ndentry, }; - host_err = vfs_rename(&rd); + int retries; + + for (retries = 1;;) { + host_err = vfs_rename(&rd); + if (host_err != -EAGAIN || !retries--) + break; + if (!nfsd_wait_for_delegreturn(rqstp, d_inode(odentry))) + break; + } if (!host_err) { host_err = commit_metadata(tfhp); if (!host_err)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 5f5f8b6d655fd947e899b1771c2f7cb581a06764 ]
nfsd_unlink() can kick off a CB_RECALL (via vfs_unlink() -> leases_conflict()) if a delegation is present. Before returning NFS4ERR_DELAY, give the client holding that delegation a chance to return it and then retry the nfsd_unlink() again, once.
Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=354 Tested-by: Igor Mammedov imammedo@redhat.com Reviewed-by: Jeff Layton jlayton@kernel.org [ cel: backported to 5.10.y, prior to idmapped mounts ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/vfs.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 5684845d95119..e29034b1e6128 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1803,9 +1803,18 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
fh_fill_pre_attrs(fhp); if (type != S_IFDIR) { + int retries; + if (rdentry->d_sb->s_export_op->flags & EXPORT_OP_CLOSE_BEFORE_UNLINK) nfsd_close_cached_files(rdentry); - host_err = vfs_unlink(dirp, rdentry, NULL); + + for (retries = 1;;) { + host_err = vfs_unlink(dirp, rdentry, NULL); + if (host_err != -EAGAIN || !retries--) + break; + if (!nfsd_wait_for_delegreturn(rqstp, rinode)) + break; + } } else { host_err = vfs_rmdir(dirp, rdentry); }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dai Ngo dai.ngo@oracle.com
[ Upstream commit 3a4ea23d86a317c4b68b9a69d51f7e84e1e04357 ]
Add counter nfs4_courtesy_client_count to nfsd_net to keep track of the number of courtesy clients in the system.
Signed-off-by: Dai Ngo dai.ngo@oracle.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/netns.h | 2 ++ fs/nfsd/nfs4state.c | 17 ++++++++++++++++- 2 files changed, 18 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h index ffe17743cc74b..55c7006d6109a 100644 --- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h @@ -192,6 +192,8 @@ struct nfsd_net {
atomic_t nfs4_client_count; int nfs4_max_clients; + + atomic_t nfsd_courtesy_clients; };
/* Simple check to find out if a given net was properly initialized */ diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 2bb78ab4f6c31..9930c5f9440c7 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -160,6 +160,13 @@ static bool is_client_expired(struct nfs4_client *clp) return clp->cl_time == 0; }
+static void nfsd4_dec_courtesy_client_count(struct nfsd_net *nn, + struct nfs4_client *clp) +{ + if (clp->cl_state != NFSD4_ACTIVE) + atomic_add_unless(&nn->nfsd_courtesy_clients, -1, 0); +} + static __be32 get_client_locked(struct nfs4_client *clp) { struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id); @@ -169,6 +176,7 @@ static __be32 get_client_locked(struct nfs4_client *clp) if (is_client_expired(clp)) return nfserr_expired; atomic_inc(&clp->cl_rpc_users); + nfsd4_dec_courtesy_client_count(nn, clp); clp->cl_state = NFSD4_ACTIVE; return nfs_ok; } @@ -190,6 +198,7 @@ renew_client_locked(struct nfs4_client *clp)
list_move_tail(&clp->cl_lru, &nn->client_lru); clp->cl_time = ktime_get_boottime_seconds(); + nfsd4_dec_courtesy_client_count(nn, clp); clp->cl_state = NFSD4_ACTIVE; }
@@ -2248,6 +2257,7 @@ __destroy_client(struct nfs4_client *clp) if (clp->cl_cb_conn.cb_xprt) svc_xprt_put(clp->cl_cb_conn.cb_xprt); atomic_add_unless(&nn->nfs4_client_count, -1, 0); + nfsd4_dec_courtesy_client_count(nn, clp); free_client(clp); wake_up_all(&expiry_wq); } @@ -4375,6 +4385,8 @@ void nfsd4_init_leases_net(struct nfsd_net *nn) max_clients = (u64)si.totalram * si.mem_unit / (1024 * 1024 * 1024); max_clients *= NFS4_CLIENTS_PER_GB; nn->nfs4_max_clients = max_t(int, max_clients, NFS4_CLIENTS_PER_GB); + + atomic_set(&nn->nfsd_courtesy_clients, 0); }
static void init_nfs4_replay(struct nfs4_replay *rp) @@ -5928,8 +5940,11 @@ nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, goto exp_client; if (!state_expired(lt, clp->cl_time)) break; - if (!atomic_read(&clp->cl_rpc_users)) + if (!atomic_read(&clp->cl_rpc_users)) { + if (clp->cl_state == NFSD4_ACTIVE) + atomic_inc(&nn->nfsd_courtesy_clients); clp->cl_state = NFSD4_COURTESY; + } if (!client_has_state(clp)) goto exp_client; if (!nfs4_anylock_blockers(clp))
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dai Ngo dai.ngo@oracle.com
[ Upstream commit 7746b32f467b3813fb61faaab3258de35806a7ac ]
Add courtesy_client_reaper to react to low memory condition triggered by the system memory shrinker.
The delayed_work for the courtesy_client_reaper is scheduled on the shrinker's count callback using the laundry_wq.
The shrinker's scan callback is not used for expiring the courtesy clients due to potential deadlocks.
Signed-off-by: Dai Ngo dai.ngo@oracle.com [ cel: adjusted to apply without e33c267ab70d ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/netns.h | 2 + fs/nfsd/nfs4state.c | 94 +++++++++++++++++++++++++++++++++++++++++---- fs/nfsd/nfsctl.c | 6 ++- fs/nfsd/nfsd.h | 6 ++- 4 files changed, 96 insertions(+), 12 deletions(-)
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h index 55c7006d6109a..8c854ba3285bb 100644 --- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h @@ -194,6 +194,8 @@ struct nfsd_net { int nfs4_max_clients;
atomic_t nfsd_courtesy_clients; + struct shrinker nfsd_client_shrinker; + struct delayed_work nfsd_shrinker_work; };
/* Simple check to find out if a given net was properly initialized */ diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 9930c5f9440c7..d2468a408328d 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4366,7 +4366,27 @@ nfsd4_init_slabs(void) return -ENOMEM; }
-void nfsd4_init_leases_net(struct nfsd_net *nn) +static unsigned long +nfsd_courtesy_client_count(struct shrinker *shrink, struct shrink_control *sc) +{ + int cnt; + struct nfsd_net *nn = container_of(shrink, + struct nfsd_net, nfsd_client_shrinker); + + cnt = atomic_read(&nn->nfsd_courtesy_clients); + if (cnt > 0) + mod_delayed_work(laundry_wq, &nn->nfsd_shrinker_work, 0); + return (unsigned long)cnt; +} + +static unsigned long +nfsd_courtesy_client_scan(struct shrinker *shrink, struct shrink_control *sc) +{ + return SHRINK_STOP; +} + +int +nfsd4_init_leases_net(struct nfsd_net *nn) { struct sysinfo si; u64 max_clients; @@ -4387,6 +4407,16 @@ void nfsd4_init_leases_net(struct nfsd_net *nn) nn->nfs4_max_clients = max_t(int, max_clients, NFS4_CLIENTS_PER_GB);
atomic_set(&nn->nfsd_courtesy_clients, 0); + nn->nfsd_client_shrinker.scan_objects = nfsd_courtesy_client_scan; + nn->nfsd_client_shrinker.count_objects = nfsd_courtesy_client_count; + nn->nfsd_client_shrinker.seeks = DEFAULT_SEEKS; + return register_shrinker(&nn->nfsd_client_shrinker); +} + +void +nfsd4_leases_net_shutdown(struct nfsd_net *nn) +{ + unregister_shrinker(&nn->nfsd_client_shrinker); }
static void init_nfs4_replay(struct nfs4_replay *rp) @@ -5959,10 +5989,49 @@ nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, spin_unlock(&nn->client_lock); }
+static void +nfs4_get_courtesy_client_reaplist(struct nfsd_net *nn, + struct list_head *reaplist) +{ + unsigned int maxreap = 0, reapcnt = 0; + struct list_head *pos, *next; + struct nfs4_client *clp; + + maxreap = NFSD_CLIENT_MAX_TRIM_PER_RUN; + INIT_LIST_HEAD(reaplist); + + spin_lock(&nn->client_lock); + list_for_each_safe(pos, next, &nn->client_lru) { + clp = list_entry(pos, struct nfs4_client, cl_lru); + if (clp->cl_state == NFSD4_ACTIVE) + break; + if (reapcnt >= maxreap) + break; + if (!mark_client_expired_locked(clp)) { + list_add(&clp->cl_lru, reaplist); + reapcnt++; + } + } + spin_unlock(&nn->client_lock); +} + +static void +nfs4_process_client_reaplist(struct list_head *reaplist) +{ + struct list_head *pos, *next; + struct nfs4_client *clp; + + list_for_each_safe(pos, next, reaplist) { + clp = list_entry(pos, struct nfs4_client, cl_lru); + trace_nfsd_clid_purged(&clp->cl_clientid); + list_del_init(&clp->cl_lru); + expire_client(clp); + } +} + static time64_t nfs4_laundromat(struct nfsd_net *nn) { - struct nfs4_client *clp; struct nfs4_openowner *oo; struct nfs4_delegation *dp; struct nfs4_ol_stateid *stp; @@ -5991,12 +6060,8 @@ nfs4_laundromat(struct nfsd_net *nn) } spin_unlock(&nn->s2s_cp_lock); nfs4_get_client_reaplist(nn, &reaplist, <); - list_for_each_safe(pos, next, &reaplist) { - clp = list_entry(pos, struct nfs4_client, cl_lru); - trace_nfsd_clid_purged(&clp->cl_clientid); - list_del_init(&clp->cl_lru); - expire_client(clp); - } + nfs4_process_client_reaplist(&reaplist); + spin_lock(&state_lock); list_for_each_safe(pos, next, &nn->del_recall_lru) { dp = list_entry (pos, struct nfs4_delegation, dl_recall_lru); @@ -6079,6 +6144,18 @@ laundromat_main(struct work_struct *laundry) queue_delayed_work(laundry_wq, &nn->laundromat_work, t*HZ); }
+static void +courtesy_client_reaper(struct work_struct *reaper) +{ + struct list_head reaplist; + struct delayed_work *dwork = to_delayed_work(reaper); + struct nfsd_net *nn = container_of(dwork, struct nfsd_net, + nfsd_shrinker_work); + + nfs4_get_courtesy_client_reaplist(nn, &reaplist); + nfs4_process_client_reaplist(&reaplist); +} + static inline __be32 nfs4_check_fh(struct svc_fh *fhp, struct nfs4_stid *stp) { if (!fh_match(&fhp->fh_handle, &stp->sc_file->fi_fhandle)) @@ -7911,6 +7988,7 @@ static int nfs4_state_create_net(struct net *net) INIT_LIST_HEAD(&nn->blocked_locks_lru);
INIT_DELAYED_WORK(&nn->laundromat_work, laundromat_main); + INIT_DELAYED_WORK(&nn->nfsd_shrinker_work, courtesy_client_reaper); get_net(net);
return 0; diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 917fa1892fd2d..597a26ad4183f 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -1481,11 +1481,12 @@ static __net_init int nfsd_init_net(struct net *net) goto out_idmap_error; nn->nfsd_versions = NULL; nn->nfsd4_minorversions = NULL; + retval = nfsd4_init_leases_net(nn); + if (retval) + goto out_drc_error; retval = nfsd_reply_cache_init(nn); if (retval) goto out_drc_error; - nfsd4_init_leases_net(nn); - get_random_bytes(&nn->siphash_key, sizeof(nn->siphash_key)); seqlock_init(&nn->writeverf_lock);
@@ -1507,6 +1508,7 @@ static __net_exit void nfsd_exit_net(struct net *net) nfsd_idmap_shutdown(net); nfsd_export_shutdown(net); nfsd_netns_free_versions(net_generic(net, nfsd_net_id)); + nfsd4_leases_net_shutdown(nn); }
static struct pernet_operations nfsd_net_ops = { diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 6ab4ad41ae84e..09726c5b9a317 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -505,7 +505,8 @@ extern void unregister_cld_notifier(void); extern void nfsd4_ssc_init_umount_work(struct nfsd_net *nn); #endif
-extern void nfsd4_init_leases_net(struct nfsd_net *nn); +extern int nfsd4_init_leases_net(struct nfsd_net *nn); +extern void nfsd4_leases_net_shutdown(struct nfsd_net *nn);
#else /* CONFIG_NFSD_V4 */ static inline int nfsd4_is_junction(struct dentry *dentry) @@ -513,7 +514,8 @@ static inline int nfsd4_is_junction(struct dentry *dentry) return 0; }
-static inline void nfsd4_init_leases_net(struct nfsd_net *nn) {}; +static inline int nfsd4_init_leases_net(struct nfsd_net *nn) { return 0; }; +static inline void nfsd4_leases_net_shutdown(struct nfsd_net *nn) {};
#define register_cld_notifier() 0 #define unregister_cld_notifier() do { } while(0)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 103cc1fafee48adb91fca0e19deb869fd23e46ab ]
Currently, SUNRPC clears the whole of .pc_argsize before processing each incoming RPC transaction. Add an extra parameter to struct svc_procedure to enable upper layers to reduce the amount of each operation's argument structure that is zeroed by SUNRPC.
The size of struct nfsd4_compoundargs, in particular, is a lot to clear on each incoming RPC Call. A subsequent patch will cut this down to something closer to what NFSv2 and NFSv3 uses.
This patch should cause no behavior changes.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svc4proc.c | 24 ++++++++++++++++++++++++ fs/lockd/svcproc.c | 24 ++++++++++++++++++++++++ fs/nfs/callback_xdr.c | 1 + fs/nfsd/nfs2acl.c | 5 +++++ fs/nfsd/nfs3acl.c | 3 +++ fs/nfsd/nfs3proc.c | 22 ++++++++++++++++++++++ fs/nfsd/nfs4proc.c | 2 ++ fs/nfsd/nfsproc.c | 18 ++++++++++++++++++ include/linux/sunrpc/svc.h | 1 + net/sunrpc/svc.c | 2 +- 10 files changed, 101 insertions(+), 1 deletion(-)
diff --git a/fs/lockd/svc4proc.c b/fs/lockd/svc4proc.c index bf274f23969b3..284b019cb6529 100644 --- a/fs/lockd/svc4proc.c +++ b/fs/lockd/svc4proc.c @@ -521,6 +521,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_void, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_void), + .pc_argzero = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "NULL", @@ -530,6 +531,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_testargs, .pc_encode = nlm4svc_encode_testres, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St+2+No+Rg, .pc_name = "TEST", @@ -539,6 +541,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_lockargs, .pc_encode = nlm4svc_encode_res, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, .pc_name = "LOCK", @@ -548,6 +551,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_cancargs, .pc_encode = nlm4svc_encode_res, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, .pc_name = "CANCEL", @@ -557,6 +561,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_unlockargs, .pc_encode = nlm4svc_encode_res, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, .pc_name = "UNLOCK", @@ -566,6 +571,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_testargs, .pc_encode = nlm4svc_encode_res, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, .pc_name = "GRANTED", @@ -575,6 +581,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_testargs, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "TEST_MSG", @@ -584,6 +591,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_lockargs, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "LOCK_MSG", @@ -593,6 +601,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_cancargs, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "CANCEL_MSG", @@ -602,6 +611,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_unlockargs, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "UNLOCK_MSG", @@ -611,6 +621,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_testargs, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "GRANTED_MSG", @@ -620,6 +631,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_void, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_res), + .pc_argzero = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "TEST_RES", @@ -629,6 +641,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_void, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_res), + .pc_argzero = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "LOCK_RES", @@ -638,6 +651,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_void, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_res), + .pc_argzero = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "CANCEL_RES", @@ -647,6 +661,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_void, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_res), + .pc_argzero = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "UNLOCK_RES", @@ -656,6 +671,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_res, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_res), + .pc_argzero = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "GRANTED_RES", @@ -665,6 +681,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_reboot, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_reboot), + .pc_argzero = sizeof(struct nlm_reboot), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "SM_NOTIFY", @@ -674,6 +691,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_void, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_void), + .pc_argzero = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = 0, .pc_name = "UNUSED", @@ -683,6 +701,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_void, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_void), + .pc_argzero = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = 0, .pc_name = "UNUSED", @@ -692,6 +711,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_void, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_void), + .pc_argzero = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = 0, .pc_name = "UNUSED", @@ -701,6 +721,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_shareargs, .pc_encode = nlm4svc_encode_shareres, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St+1, .pc_name = "SHARE", @@ -710,6 +731,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_shareargs, .pc_encode = nlm4svc_encode_shareres, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St+1, .pc_name = "UNSHARE", @@ -719,6 +741,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_lockargs, .pc_encode = nlm4svc_encode_res, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, .pc_name = "NM_LOCK", @@ -728,6 +751,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_notify, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "FREE_ALL", diff --git a/fs/lockd/svcproc.c b/fs/lockd/svcproc.c index b09ca35b527cc..e35c05e278061 100644 --- a/fs/lockd/svcproc.c +++ b/fs/lockd/svcproc.c @@ -555,6 +555,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_void, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_void), + .pc_argzero = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "NULL", @@ -564,6 +565,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_testargs, .pc_encode = nlmsvc_encode_testres, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St+2+No+Rg, .pc_name = "TEST", @@ -573,6 +575,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_lockargs, .pc_encode = nlmsvc_encode_res, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, .pc_name = "LOCK", @@ -582,6 +585,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_cancargs, .pc_encode = nlmsvc_encode_res, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, .pc_name = "CANCEL", @@ -591,6 +595,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_unlockargs, .pc_encode = nlmsvc_encode_res, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, .pc_name = "UNLOCK", @@ -600,6 +605,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_testargs, .pc_encode = nlmsvc_encode_res, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, .pc_name = "GRANTED", @@ -609,6 +615,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_testargs, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "TEST_MSG", @@ -618,6 +625,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_lockargs, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "LOCK_MSG", @@ -627,6 +635,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_cancargs, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "CANCEL_MSG", @@ -636,6 +645,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_unlockargs, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "UNLOCK_MSG", @@ -645,6 +655,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_testargs, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "GRANTED_MSG", @@ -654,6 +665,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_void, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_res), + .pc_argzero = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "TEST_RES", @@ -663,6 +675,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_void, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_res), + .pc_argzero = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "LOCK_RES", @@ -672,6 +685,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_void, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_res), + .pc_argzero = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "CANCEL_RES", @@ -681,6 +695,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_void, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_res), + .pc_argzero = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "UNLOCK_RES", @@ -690,6 +705,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_res, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_res), + .pc_argzero = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "GRANTED_RES", @@ -699,6 +715,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_reboot, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_reboot), + .pc_argzero = sizeof(struct nlm_reboot), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "SM_NOTIFY", @@ -708,6 +725,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_void, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_void), + .pc_argzero = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "UNUSED", @@ -717,6 +735,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_void, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_void), + .pc_argzero = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "UNUSED", @@ -726,6 +745,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_void, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_void), + .pc_argzero = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "UNUSED", @@ -735,6 +755,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_shareargs, .pc_encode = nlmsvc_encode_shareres, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St+1, .pc_name = "SHARE", @@ -744,6 +765,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_shareargs, .pc_encode = nlmsvc_encode_shareres, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St+1, .pc_name = "UNSHARE", @@ -753,6 +775,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_lockargs, .pc_encode = nlmsvc_encode_res, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, .pc_name = "NM_LOCK", @@ -762,6 +785,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_notify, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = 0, .pc_name = "FREE_ALL", diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c index 1b21f88a9808c..db69fc267c9a0 100644 --- a/fs/nfs/callback_xdr.c +++ b/fs/nfs/callback_xdr.c @@ -1070,6 +1070,7 @@ static const struct svc_procedure nfs4_callback_procedures1[] = { .pc_func = nfs4_callback_compound, .pc_encode = nfs4_encode_void, .pc_argsize = 256, + .pc_argzero = 256, .pc_ressize = 256, .pc_xdrressize = NFS4_CALLBACK_BUFSIZE, .pc_name = "COMPOUND", diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index f7d166f056afa..f8179fc8c9bdf 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -329,6 +329,7 @@ static const struct svc_procedure nfsd_acl_procedures2[5] = { .pc_decode = nfssvc_decode_voidarg, .pc_encode = nfssvc_encode_voidres, .pc_argsize = sizeof(struct nfsd_voidargs), + .pc_argzero = sizeof(struct nfsd_voidargs), .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST, @@ -340,6 +341,7 @@ static const struct svc_procedure nfsd_acl_procedures2[5] = { .pc_encode = nfsaclsvc_encode_getaclres, .pc_release = nfsaclsvc_release_getacl, .pc_argsize = sizeof(struct nfsd3_getaclargs), + .pc_argzero = sizeof(struct nfsd3_getaclargs), .pc_ressize = sizeof(struct nfsd3_getaclres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+1+2*(1+ACL), @@ -351,6 +353,7 @@ static const struct svc_procedure nfsd_acl_procedures2[5] = { .pc_encode = nfssvc_encode_attrstatres, .pc_release = nfssvc_release_attrstat, .pc_argsize = sizeof(struct nfsd3_setaclargs), + .pc_argzero = sizeof(struct nfsd3_setaclargs), .pc_ressize = sizeof(struct nfsd_attrstat), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+AT, @@ -362,6 +365,7 @@ static const struct svc_procedure nfsd_acl_procedures2[5] = { .pc_encode = nfssvc_encode_attrstatres, .pc_release = nfssvc_release_attrstat, .pc_argsize = sizeof(struct nfsd_fhandle), + .pc_argzero = sizeof(struct nfsd_fhandle), .pc_ressize = sizeof(struct nfsd_attrstat), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+AT, @@ -373,6 +377,7 @@ static const struct svc_procedure nfsd_acl_procedures2[5] = { .pc_encode = nfsaclsvc_encode_accessres, .pc_release = nfsaclsvc_release_access, .pc_argsize = sizeof(struct nfsd3_accessargs), + .pc_argzero = sizeof(struct nfsd3_accessargs), .pc_ressize = sizeof(struct nfsd3_accessres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+AT+1, diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c index 15bee0339c764..2c451fcd5a3cc 100644 --- a/fs/nfsd/nfs3acl.c +++ b/fs/nfsd/nfs3acl.c @@ -250,6 +250,7 @@ static const struct svc_procedure nfsd_acl_procedures3[3] = { .pc_decode = nfssvc_decode_voidarg, .pc_encode = nfssvc_encode_voidres, .pc_argsize = sizeof(struct nfsd_voidargs), + .pc_argzero = sizeof(struct nfsd_voidargs), .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST, @@ -261,6 +262,7 @@ static const struct svc_procedure nfsd_acl_procedures3[3] = { .pc_encode = nfs3svc_encode_getaclres, .pc_release = nfs3svc_release_getacl, .pc_argsize = sizeof(struct nfsd3_getaclargs), + .pc_argzero = sizeof(struct nfsd3_getaclargs), .pc_ressize = sizeof(struct nfsd3_getaclres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+1+2*(1+ACL), @@ -272,6 +274,7 @@ static const struct svc_procedure nfsd_acl_procedures3[3] = { .pc_encode = nfs3svc_encode_setaclres, .pc_release = nfs3svc_release_fhandle, .pc_argsize = sizeof(struct nfsd3_setaclargs), + .pc_argzero = sizeof(struct nfsd3_setaclargs), .pc_ressize = sizeof(struct nfsd3_attrstat), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+pAT, diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 8679c41746027..5c2e2b5e5945f 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -809,6 +809,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_decode = nfssvc_decode_voidarg, .pc_encode = nfssvc_encode_voidres, .pc_argsize = sizeof(struct nfsd_voidargs), + .pc_argzero = sizeof(struct nfsd_voidargs), .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST, @@ -820,6 +821,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_getattrres, .pc_release = nfs3svc_release_fhandle, .pc_argsize = sizeof(struct nfsd_fhandle), + .pc_argzero = sizeof(struct nfsd_fhandle), .pc_ressize = sizeof(struct nfsd3_attrstatres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+AT, @@ -831,6 +833,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_wccstatres, .pc_release = nfs3svc_release_fhandle, .pc_argsize = sizeof(struct nfsd3_sattrargs), + .pc_argzero = sizeof(struct nfsd3_sattrargs), .pc_ressize = sizeof(struct nfsd3_wccstatres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+WC, @@ -842,6 +845,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_lookupres, .pc_release = nfs3svc_release_fhandle2, .pc_argsize = sizeof(struct nfsd3_diropargs), + .pc_argzero = sizeof(struct nfsd3_diropargs), .pc_ressize = sizeof(struct nfsd3_diropres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+FH+pAT+pAT, @@ -853,6 +857,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_accessres, .pc_release = nfs3svc_release_fhandle, .pc_argsize = sizeof(struct nfsd3_accessargs), + .pc_argzero = sizeof(struct nfsd3_accessargs), .pc_ressize = sizeof(struct nfsd3_accessres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+pAT+1, @@ -864,6 +869,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_readlinkres, .pc_release = nfs3svc_release_fhandle, .pc_argsize = sizeof(struct nfsd_fhandle), + .pc_argzero = sizeof(struct nfsd_fhandle), .pc_ressize = sizeof(struct nfsd3_readlinkres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+pAT+1+NFS3_MAXPATHLEN/4, @@ -875,6 +881,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_readres, .pc_release = nfs3svc_release_fhandle, .pc_argsize = sizeof(struct nfsd3_readargs), + .pc_argzero = sizeof(struct nfsd3_readargs), .pc_ressize = sizeof(struct nfsd3_readres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+pAT+4+NFSSVC_MAXBLKSIZE/4, @@ -886,6 +893,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_writeres, .pc_release = nfs3svc_release_fhandle, .pc_argsize = sizeof(struct nfsd3_writeargs), + .pc_argzero = sizeof(struct nfsd3_writeargs), .pc_ressize = sizeof(struct nfsd3_writeres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+WC+4, @@ -897,6 +905,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_createres, .pc_release = nfs3svc_release_fhandle2, .pc_argsize = sizeof(struct nfsd3_createargs), + .pc_argzero = sizeof(struct nfsd3_createargs), .pc_ressize = sizeof(struct nfsd3_createres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+(1+FH+pAT)+WC, @@ -908,6 +917,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_createres, .pc_release = nfs3svc_release_fhandle2, .pc_argsize = sizeof(struct nfsd3_mkdirargs), + .pc_argzero = sizeof(struct nfsd3_mkdirargs), .pc_ressize = sizeof(struct nfsd3_createres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+(1+FH+pAT)+WC, @@ -919,6 +929,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_createres, .pc_release = nfs3svc_release_fhandle2, .pc_argsize = sizeof(struct nfsd3_symlinkargs), + .pc_argzero = sizeof(struct nfsd3_symlinkargs), .pc_ressize = sizeof(struct nfsd3_createres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+(1+FH+pAT)+WC, @@ -930,6 +941,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_createres, .pc_release = nfs3svc_release_fhandle2, .pc_argsize = sizeof(struct nfsd3_mknodargs), + .pc_argzero = sizeof(struct nfsd3_mknodargs), .pc_ressize = sizeof(struct nfsd3_createres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+(1+FH+pAT)+WC, @@ -941,6 +953,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_wccstatres, .pc_release = nfs3svc_release_fhandle, .pc_argsize = sizeof(struct nfsd3_diropargs), + .pc_argzero = sizeof(struct nfsd3_diropargs), .pc_ressize = sizeof(struct nfsd3_wccstatres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+WC, @@ -952,6 +965,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_wccstatres, .pc_release = nfs3svc_release_fhandle, .pc_argsize = sizeof(struct nfsd3_diropargs), + .pc_argzero = sizeof(struct nfsd3_diropargs), .pc_ressize = sizeof(struct nfsd3_wccstatres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+WC, @@ -963,6 +977,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_renameres, .pc_release = nfs3svc_release_fhandle2, .pc_argsize = sizeof(struct nfsd3_renameargs), + .pc_argzero = sizeof(struct nfsd3_renameargs), .pc_ressize = sizeof(struct nfsd3_renameres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+WC+WC, @@ -974,6 +989,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_linkres, .pc_release = nfs3svc_release_fhandle2, .pc_argsize = sizeof(struct nfsd3_linkargs), + .pc_argzero = sizeof(struct nfsd3_linkargs), .pc_ressize = sizeof(struct nfsd3_linkres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+pAT+WC, @@ -985,6 +1001,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_readdirres, .pc_release = nfs3svc_release_fhandle, .pc_argsize = sizeof(struct nfsd3_readdirargs), + .pc_argzero = sizeof(struct nfsd3_readdirargs), .pc_ressize = sizeof(struct nfsd3_readdirres), .pc_cachetype = RC_NOCACHE, .pc_name = "READDIR", @@ -995,6 +1012,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_readdirres, .pc_release = nfs3svc_release_fhandle, .pc_argsize = sizeof(struct nfsd3_readdirplusargs), + .pc_argzero = sizeof(struct nfsd3_readdirplusargs), .pc_ressize = sizeof(struct nfsd3_readdirres), .pc_cachetype = RC_NOCACHE, .pc_name = "READDIRPLUS", @@ -1004,6 +1022,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_decode = nfs3svc_decode_fhandleargs, .pc_encode = nfs3svc_encode_fsstatres, .pc_argsize = sizeof(struct nfsd3_fhandleargs), + .pc_argzero = sizeof(struct nfsd3_fhandleargs), .pc_ressize = sizeof(struct nfsd3_fsstatres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+pAT+2*6+1, @@ -1014,6 +1033,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_decode = nfs3svc_decode_fhandleargs, .pc_encode = nfs3svc_encode_fsinfores, .pc_argsize = sizeof(struct nfsd3_fhandleargs), + .pc_argzero = sizeof(struct nfsd3_fhandleargs), .pc_ressize = sizeof(struct nfsd3_fsinfores), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+pAT+12, @@ -1024,6 +1044,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_decode = nfs3svc_decode_fhandleargs, .pc_encode = nfs3svc_encode_pathconfres, .pc_argsize = sizeof(struct nfsd3_fhandleargs), + .pc_argzero = sizeof(struct nfsd3_fhandleargs), .pc_ressize = sizeof(struct nfsd3_pathconfres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+pAT+6, @@ -1035,6 +1056,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_commitres, .pc_release = nfs3svc_release_fhandle, .pc_argsize = sizeof(struct nfsd3_commitargs), + .pc_argzero = sizeof(struct nfsd3_commitargs), .pc_ressize = sizeof(struct nfsd3_commitres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+WC+2, diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 38d60964d8b2d..f6ea7445073fe 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -3577,6 +3577,7 @@ static const struct svc_procedure nfsd_procedures4[2] = { .pc_decode = nfssvc_decode_voidarg, .pc_encode = nfssvc_encode_voidres, .pc_argsize = sizeof(struct nfsd_voidargs), + .pc_argzero = sizeof(struct nfsd_voidargs), .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = 1, @@ -3587,6 +3588,7 @@ static const struct svc_procedure nfsd_procedures4[2] = { .pc_decode = nfs4svc_decode_compoundargs, .pc_encode = nfs4svc_encode_compoundres, .pc_argsize = sizeof(struct nfsd4_compoundargs), + .pc_argzero = sizeof(struct nfsd4_compoundargs), .pc_ressize = sizeof(struct nfsd4_compoundres), .pc_release = nfsd4_release_compoundargs, .pc_cachetype = RC_NOCACHE, diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 7ed03ac6bdab3..3509331a66504 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -645,6 +645,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_decode = nfssvc_decode_voidarg, .pc_encode = nfssvc_encode_voidres, .pc_argsize = sizeof(struct nfsd_voidargs), + .pc_argzero = sizeof(struct nfsd_voidargs), .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = 0, @@ -656,6 +657,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_encode = nfssvc_encode_attrstatres, .pc_release = nfssvc_release_attrstat, .pc_argsize = sizeof(struct nfsd_fhandle), + .pc_argzero = sizeof(struct nfsd_fhandle), .pc_ressize = sizeof(struct nfsd_attrstat), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+AT, @@ -667,6 +669,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_encode = nfssvc_encode_attrstatres, .pc_release = nfssvc_release_attrstat, .pc_argsize = sizeof(struct nfsd_sattrargs), + .pc_argzero = sizeof(struct nfsd_sattrargs), .pc_ressize = sizeof(struct nfsd_attrstat), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+AT, @@ -677,6 +680,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_decode = nfssvc_decode_voidarg, .pc_encode = nfssvc_encode_voidres, .pc_argsize = sizeof(struct nfsd_voidargs), + .pc_argzero = sizeof(struct nfsd_voidargs), .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = 0, @@ -688,6 +692,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_encode = nfssvc_encode_diropres, .pc_release = nfssvc_release_diropres, .pc_argsize = sizeof(struct nfsd_diropargs), + .pc_argzero = sizeof(struct nfsd_diropargs), .pc_ressize = sizeof(struct nfsd_diropres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+FH+AT, @@ -698,6 +703,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_decode = nfssvc_decode_fhandleargs, .pc_encode = nfssvc_encode_readlinkres, .pc_argsize = sizeof(struct nfsd_fhandle), + .pc_argzero = sizeof(struct nfsd_fhandle), .pc_ressize = sizeof(struct nfsd_readlinkres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+1+NFS_MAXPATHLEN/4, @@ -709,6 +715,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_encode = nfssvc_encode_readres, .pc_release = nfssvc_release_readres, .pc_argsize = sizeof(struct nfsd_readargs), + .pc_argzero = sizeof(struct nfsd_readargs), .pc_ressize = sizeof(struct nfsd_readres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+AT+1+NFSSVC_MAXBLKSIZE_V2/4, @@ -719,6 +726,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_decode = nfssvc_decode_voidarg, .pc_encode = nfssvc_encode_voidres, .pc_argsize = sizeof(struct nfsd_voidargs), + .pc_argzero = sizeof(struct nfsd_voidargs), .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = 0, @@ -730,6 +738,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_encode = nfssvc_encode_attrstatres, .pc_release = nfssvc_release_attrstat, .pc_argsize = sizeof(struct nfsd_writeargs), + .pc_argzero = sizeof(struct nfsd_writeargs), .pc_ressize = sizeof(struct nfsd_attrstat), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+AT, @@ -741,6 +750,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_encode = nfssvc_encode_diropres, .pc_release = nfssvc_release_diropres, .pc_argsize = sizeof(struct nfsd_createargs), + .pc_argzero = sizeof(struct nfsd_createargs), .pc_ressize = sizeof(struct nfsd_diropres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+FH+AT, @@ -751,6 +761,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_decode = nfssvc_decode_diropargs, .pc_encode = nfssvc_encode_statres, .pc_argsize = sizeof(struct nfsd_diropargs), + .pc_argzero = sizeof(struct nfsd_diropargs), .pc_ressize = sizeof(struct nfsd_stat), .pc_cachetype = RC_REPLSTAT, .pc_xdrressize = ST, @@ -761,6 +772,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_decode = nfssvc_decode_renameargs, .pc_encode = nfssvc_encode_statres, .pc_argsize = sizeof(struct nfsd_renameargs), + .pc_argzero = sizeof(struct nfsd_renameargs), .pc_ressize = sizeof(struct nfsd_stat), .pc_cachetype = RC_REPLSTAT, .pc_xdrressize = ST, @@ -771,6 +783,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_decode = nfssvc_decode_linkargs, .pc_encode = nfssvc_encode_statres, .pc_argsize = sizeof(struct nfsd_linkargs), + .pc_argzero = sizeof(struct nfsd_linkargs), .pc_ressize = sizeof(struct nfsd_stat), .pc_cachetype = RC_REPLSTAT, .pc_xdrressize = ST, @@ -781,6 +794,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_decode = nfssvc_decode_symlinkargs, .pc_encode = nfssvc_encode_statres, .pc_argsize = sizeof(struct nfsd_symlinkargs), + .pc_argzero = sizeof(struct nfsd_symlinkargs), .pc_ressize = sizeof(struct nfsd_stat), .pc_cachetype = RC_REPLSTAT, .pc_xdrressize = ST, @@ -792,6 +806,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_encode = nfssvc_encode_diropres, .pc_release = nfssvc_release_diropres, .pc_argsize = sizeof(struct nfsd_createargs), + .pc_argzero = sizeof(struct nfsd_createargs), .pc_ressize = sizeof(struct nfsd_diropres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+FH+AT, @@ -802,6 +817,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_decode = nfssvc_decode_diropargs, .pc_encode = nfssvc_encode_statres, .pc_argsize = sizeof(struct nfsd_diropargs), + .pc_argzero = sizeof(struct nfsd_diropargs), .pc_ressize = sizeof(struct nfsd_stat), .pc_cachetype = RC_REPLSTAT, .pc_xdrressize = ST, @@ -812,6 +828,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_decode = nfssvc_decode_readdirargs, .pc_encode = nfssvc_encode_readdirres, .pc_argsize = sizeof(struct nfsd_readdirargs), + .pc_argzero = sizeof(struct nfsd_readdirargs), .pc_ressize = sizeof(struct nfsd_readdirres), .pc_cachetype = RC_NOCACHE, .pc_name = "READDIR", @@ -821,6 +838,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_decode = nfssvc_decode_fhandleargs, .pc_encode = nfssvc_encode_statfsres, .pc_argsize = sizeof(struct nfsd_fhandle), + .pc_argzero = sizeof(struct nfsd_fhandle), .pc_ressize = sizeof(struct nfsd_statfsres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+5, diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 3c908ffbbf45f..5cf6543d3c8e5 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -476,6 +476,7 @@ struct svc_procedure { /* XDR free result: */ void (*pc_release)(struct svc_rqst *); unsigned int pc_argsize; /* argument struct size */ + unsigned int pc_argzero; /* how much of argument to clear */ unsigned int pc_ressize; /* result struct size */ unsigned int pc_cachetype; /* cache info (NFS) */ unsigned int pc_xdrressize; /* maximum size of XDR reply */ diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 9caaed1c143e6..8c29181796492 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -1237,7 +1237,7 @@ svc_generic_init_request(struct svc_rqst *rqstp, rqstp->rq_procinfo = procp = &versp->vs_proc[rqstp->rq_proc];
/* Initialize storage for argp and resp */ - memset(rqstp->rq_argp, 0, procp->pc_argsize); + memset(rqstp->rq_argp, 0, procp->pc_argzero); memset(rqstp->rq_resp, 0, procp->pc_ressize);
/* Bump per-procedure stats counter */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 3fdc546462348b8a497c72bc894e0cde9f10fc40 ]
Have SunRPC clear everything except for the iops array. Then have each NFSv4 XDR decoder clear it's own argument before decoding.
Now individual operations may have a large argument struct while not penalizing the vast majority of operations with a small struct.
And, clearing the argument structure occurs as the argument fields are initialized, enabling the CPU to do write combining on that memory. In some cases, clearing is not even necessary because all of the fields in the argument structure are initialized by the decoder.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 2 +- fs/nfsd/nfs4xdr.c | 61 +++++++++++++++++++++++++++++++++++++--------- 2 files changed, 51 insertions(+), 12 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index f6ea7445073fe..79f1990d40c44 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -3588,7 +3588,7 @@ static const struct svc_procedure nfsd_procedures4[2] = { .pc_decode = nfs4svc_decode_compoundargs, .pc_encode = nfs4svc_encode_compoundres, .pc_argsize = sizeof(struct nfsd4_compoundargs), - .pc_argzero = sizeof(struct nfsd4_compoundargs), + .pc_argzero = offsetof(struct nfsd4_compoundargs, iops), .pc_ressize = sizeof(struct nfsd4_compoundres), .pc_release = nfsd4_release_compoundargs, .pc_cachetype = RC_NOCACHE, diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index b5ca83045d6e9..04699198eace7 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -793,6 +793,7 @@ nfsd4_decode_commit(struct nfsd4_compoundargs *argp, struct nfsd4_commit *commit return nfserr_bad_xdr; if (xdr_stream_decode_u32(argp->xdr, &commit->co_count) < 0) return nfserr_bad_xdr; + memset(&commit->co_verf, 0, sizeof(commit->co_verf)); return nfs_ok; }
@@ -801,6 +802,7 @@ nfsd4_decode_create(struct nfsd4_compoundargs *argp, struct nfsd4_create *create { __be32 *p, status;
+ memset(create, 0, sizeof(*create)); if (xdr_stream_decode_u32(argp->xdr, &create->cr_type) < 0) return nfserr_bad_xdr; switch (create->cr_type) { @@ -850,6 +852,7 @@ nfsd4_decode_delegreturn(struct nfsd4_compoundargs *argp, struct nfsd4_delegretu static inline __be32 nfsd4_decode_getattr(struct nfsd4_compoundargs *argp, struct nfsd4_getattr *getattr) { + memset(getattr, 0, sizeof(*getattr)); return nfsd4_decode_bitmap4(argp, getattr->ga_bmval, ARRAY_SIZE(getattr->ga_bmval)); } @@ -857,6 +860,7 @@ nfsd4_decode_getattr(struct nfsd4_compoundargs *argp, struct nfsd4_getattr *geta static __be32 nfsd4_decode_link(struct nfsd4_compoundargs *argp, struct nfsd4_link *link) { + memset(link, 0, sizeof(*link)); return nfsd4_decode_component4(argp, &link->li_name, &link->li_namelen); }
@@ -905,6 +909,7 @@ nfsd4_decode_locker4(struct nfsd4_compoundargs *argp, struct nfsd4_lock *lock) static __be32 nfsd4_decode_lock(struct nfsd4_compoundargs *argp, struct nfsd4_lock *lock) { + memset(lock, 0, sizeof(*lock)); if (xdr_stream_decode_u32(argp->xdr, &lock->lk_type) < 0) return nfserr_bad_xdr; if ((lock->lk_type < NFS4_READ_LT) || (lock->lk_type > NFS4_WRITEW_LT)) @@ -921,6 +926,7 @@ nfsd4_decode_lock(struct nfsd4_compoundargs *argp, struct nfsd4_lock *lock) static __be32 nfsd4_decode_lockt(struct nfsd4_compoundargs *argp, struct nfsd4_lockt *lockt) { + memset(lockt, 0, sizeof(*lockt)); if (xdr_stream_decode_u32(argp->xdr, &lockt->lt_type) < 0) return nfserr_bad_xdr; if ((lockt->lt_type < NFS4_READ_LT) || (lockt->lt_type > NFS4_WRITEW_LT)) @@ -1142,11 +1148,8 @@ nfsd4_decode_open(struct nfsd4_compoundargs *argp, struct nfsd4_open *open) __be32 status; u32 dummy;
- memset(open->op_bmval, 0, sizeof(open->op_bmval)); - open->op_iattr.ia_valid = 0; - open->op_openowner = NULL; + memset(open, 0, sizeof(*open));
- open->op_xdr_error = 0; if (xdr_stream_decode_u32(argp->xdr, &open->op_seqid) < 0) return nfserr_bad_xdr; /* deleg_want is ignored */ @@ -1181,6 +1184,8 @@ nfsd4_decode_open_confirm(struct nfsd4_compoundargs *argp, struct nfsd4_open_con if (xdr_stream_decode_u32(argp->xdr, &open_conf->oc_seqid) < 0) return nfserr_bad_xdr;
+ memset(&open_conf->oc_resp_stateid, 0, + sizeof(open_conf->oc_resp_stateid)); return nfs_ok; }
@@ -1189,6 +1194,7 @@ nfsd4_decode_open_downgrade(struct nfsd4_compoundargs *argp, struct nfsd4_open_d { __be32 status;
+ memset(open_down, 0, sizeof(*open_down)); status = nfsd4_decode_stateid4(argp, &open_down->od_stateid); if (status) return status; @@ -1218,6 +1224,7 @@ nfsd4_decode_putfh(struct nfsd4_compoundargs *argp, struct nfsd4_putfh *putfh) if (!putfh->pf_fhval) return nfserr_jukebox;
+ putfh->no_verify = false; return nfs_ok; }
@@ -1234,6 +1241,7 @@ nfsd4_decode_read(struct nfsd4_compoundargs *argp, struct nfsd4_read *read) { __be32 status;
+ memset(read, 0, sizeof(*read)); status = nfsd4_decode_stateid4(argp, &read->rd_stateid); if (status) return status; @@ -1250,6 +1258,7 @@ nfsd4_decode_readdir(struct nfsd4_compoundargs *argp, struct nfsd4_readdir *read { __be32 status;
+ memset(readdir, 0, sizeof(*readdir)); if (xdr_stream_decode_u64(argp->xdr, &readdir->rd_cookie) < 0) return nfserr_bad_xdr; status = nfsd4_decode_verifier4(argp, &readdir->rd_verf); @@ -1269,6 +1278,7 @@ nfsd4_decode_readdir(struct nfsd4_compoundargs *argp, struct nfsd4_readdir *read static __be32 nfsd4_decode_remove(struct nfsd4_compoundargs *argp, struct nfsd4_remove *remove) { + memset(&remove->rm_cinfo, 0, sizeof(remove->rm_cinfo)); return nfsd4_decode_component4(argp, &remove->rm_name, &remove->rm_namelen); }
@@ -1277,6 +1287,7 @@ nfsd4_decode_rename(struct nfsd4_compoundargs *argp, struct nfsd4_rename *rename { __be32 status;
+ memset(rename, 0, sizeof(*rename)); status = nfsd4_decode_component4(argp, &rename->rn_sname, &rename->rn_snamelen); if (status) return status; @@ -1293,6 +1304,7 @@ static __be32 nfsd4_decode_secinfo(struct nfsd4_compoundargs *argp, struct nfsd4_secinfo *secinfo) { + secinfo->si_exp = NULL; return nfsd4_decode_component4(argp, &secinfo->si_name, &secinfo->si_namelen); }
@@ -1301,6 +1313,7 @@ nfsd4_decode_setattr(struct nfsd4_compoundargs *argp, struct nfsd4_setattr *seta { __be32 status;
+ memset(setattr, 0, sizeof(*setattr)); status = nfsd4_decode_stateid4(argp, &setattr->sa_stateid); if (status) return status; @@ -1315,6 +1328,8 @@ nfsd4_decode_setclientid(struct nfsd4_compoundargs *argp, struct nfsd4_setclient { __be32 *p, status;
+ memset(setclientid, 0, sizeof(*setclientid)); + if (argp->minorversion >= 1) return nfserr_notsupp;
@@ -1371,6 +1386,8 @@ nfsd4_decode_verify(struct nfsd4_compoundargs *argp, struct nfsd4_verify *verify { __be32 *p, status;
+ memset(verify, 0, sizeof(*verify)); + status = nfsd4_decode_bitmap4(argp, verify->ve_bmval, ARRAY_SIZE(verify->ve_bmval)); if (status) @@ -1410,6 +1427,9 @@ nfsd4_decode_write(struct nfsd4_compoundargs *argp, struct nfsd4_write *write) if (!xdr_stream_subsegment(argp->xdr, &write->wr_payload, write->wr_buflen)) return nfserr_bad_xdr;
+ write->wr_bytes_written = 0; + write->wr_how_written = 0; + memset(&write->wr_verifier, 0, sizeof(write->wr_verifier)); return nfs_ok; }
@@ -1434,6 +1454,7 @@ nfsd4_decode_release_lockowner(struct nfsd4_compoundargs *argp, struct nfsd4_rel
static __be32 nfsd4_decode_backchannel_ctl(struct nfsd4_compoundargs *argp, struct nfsd4_backchannel_ctl *bc) { + memset(bc, 0, sizeof(*bc)); if (xdr_stream_decode_u32(argp->xdr, &bc->bc_cb_program) < 0) return nfserr_bad_xdr; return nfsd4_decode_cb_sec(argp, &bc->bc_cb_sec); @@ -1444,6 +1465,7 @@ static __be32 nfsd4_decode_bind_conn_to_session(struct nfsd4_compoundargs *argp, u32 use_conn_in_rdma_mode; __be32 status;
+ memset(bcts, 0, sizeof(*bcts)); status = nfsd4_decode_sessionid4(argp, &bcts->sessionid); if (status) return status; @@ -1585,6 +1607,7 @@ nfsd4_decode_exchange_id(struct nfsd4_compoundargs *argp, { __be32 status;
+ memset(exid, 0, sizeof(*exid)); status = nfsd4_decode_verifier4(argp, &exid->verifier); if (status) return status; @@ -1637,6 +1660,7 @@ nfsd4_decode_create_session(struct nfsd4_compoundargs *argp, { __be32 status;
+ memset(sess, 0, sizeof(*sess)); status = nfsd4_decode_clientid4(argp, &sess->clientid); if (status) return status; @@ -1652,11 +1676,7 @@ nfsd4_decode_create_session(struct nfsd4_compoundargs *argp, return status; if (xdr_stream_decode_u32(argp->xdr, &sess->callback_prog) < 0) return nfserr_bad_xdr; - status = nfsd4_decode_cb_sec(argp, &sess->cb_sec); - if (status) - return status; - - return nfs_ok; + return nfsd4_decode_cb_sec(argp, &sess->cb_sec); }
static __be32 @@ -1680,6 +1700,7 @@ nfsd4_decode_getdeviceinfo(struct nfsd4_compoundargs *argp, { __be32 status;
+ memset(gdev, 0, sizeof(*gdev)); status = nfsd4_decode_deviceid4(argp, &gdev->gd_devid); if (status) return status; @@ -1700,6 +1721,7 @@ nfsd4_decode_layoutcommit(struct nfsd4_compoundargs *argp, { __be32 *p, status;
+ memset(lcp, 0, sizeof(*lcp)); if (xdr_stream_decode_u64(argp->xdr, &lcp->lc_seg.offset) < 0) return nfserr_bad_xdr; if (xdr_stream_decode_u64(argp->xdr, &lcp->lc_seg.length) < 0) @@ -1735,6 +1757,7 @@ nfsd4_decode_layoutget(struct nfsd4_compoundargs *argp, { __be32 status;
+ memset(lgp, 0, sizeof(*lgp)); if (xdr_stream_decode_u32(argp->xdr, &lgp->lg_signal) < 0) return nfserr_bad_xdr; if (xdr_stream_decode_u32(argp->xdr, &lgp->lg_layout_type) < 0) @@ -1760,6 +1783,7 @@ static __be32 nfsd4_decode_layoutreturn(struct nfsd4_compoundargs *argp, struct nfsd4_layoutreturn *lrp) { + memset(lrp, 0, sizeof(*lrp)); if (xdr_stream_decode_bool(argp->xdr, &lrp->lr_reclaim) < 0) return nfserr_bad_xdr; if (xdr_stream_decode_u32(argp->xdr, &lrp->lr_layout_type) < 0) @@ -1775,6 +1799,8 @@ static __be32 nfsd4_decode_secinfo_no_name(struct nfsd4_compoundargs *argp, { if (xdr_stream_decode_u32(argp->xdr, &sin->sin_style) < 0) return nfserr_bad_xdr; + + sin->sin_exp = NULL; return nfs_ok; }
@@ -1795,6 +1821,7 @@ nfsd4_decode_sequence(struct nfsd4_compoundargs *argp, seq->maxslots = be32_to_cpup(p++); seq->cachethis = be32_to_cpup(p);
+ seq->status_flags = 0; return nfs_ok; }
@@ -1805,6 +1832,7 @@ nfsd4_decode_test_stateid(struct nfsd4_compoundargs *argp, struct nfsd4_test_sta __be32 status; u32 i;
+ memset(test_stateid, 0, sizeof(*test_stateid)); if (xdr_stream_decode_u32(argp->xdr, &test_stateid->ts_num_ids) < 0) return nfserr_bad_xdr;
@@ -1902,6 +1930,7 @@ nfsd4_decode_copy(struct nfsd4_compoundargs *argp, struct nfsd4_copy *copy) struct nl4_server *ns_dummy; __be32 status;
+ memset(copy, 0, sizeof(*copy)); status = nfsd4_decode_stateid4(argp, ©->cp_src_stateid); if (status) return status; @@ -1957,6 +1986,7 @@ nfsd4_decode_copy_notify(struct nfsd4_compoundargs *argp, { __be32 status;
+ memset(cn, 0, sizeof(*cn)); cn->cpn_src = svcxdr_tmpalloc(argp, sizeof(*cn->cpn_src)); if (cn->cpn_src == NULL) return nfserr_jukebox; @@ -1974,6 +2004,8 @@ static __be32 nfsd4_decode_offload_status(struct nfsd4_compoundargs *argp, struct nfsd4_offload_status *os) { + os->count = 0; + os->status = 0; return nfsd4_decode_stateid4(argp, &os->stateid); }
@@ -1990,6 +2022,8 @@ nfsd4_decode_seek(struct nfsd4_compoundargs *argp, struct nfsd4_seek *seek) if (xdr_stream_decode_u32(argp->xdr, &seek->seek_whence) < 0) return nfserr_bad_xdr;
+ seek->seek_eof = 0; + seek->seek_pos = 0; return nfs_ok; }
@@ -2125,6 +2159,7 @@ nfsd4_decode_getxattr(struct nfsd4_compoundargs *argp, __be32 status; u32 maxcount;
+ memset(getxattr, 0, sizeof(*getxattr)); status = nfsd4_decode_xattr_name(argp, &getxattr->getxa_name); if (status) return status; @@ -2133,8 +2168,7 @@ nfsd4_decode_getxattr(struct nfsd4_compoundargs *argp, maxcount = min_t(u32, XATTR_SIZE_MAX, maxcount);
getxattr->getxa_len = maxcount; - - return status; + return nfs_ok; }
static __be32 @@ -2144,6 +2178,8 @@ nfsd4_decode_setxattr(struct nfsd4_compoundargs *argp, u32 flags, maxcount, size; __be32 status;
+ memset(setxattr, 0, sizeof(*setxattr)); + if (xdr_stream_decode_u32(argp->xdr, &flags) < 0) return nfserr_bad_xdr;
@@ -2182,6 +2218,8 @@ nfsd4_decode_listxattrs(struct nfsd4_compoundargs *argp, { u32 maxcount;
+ memset(listxattrs, 0, sizeof(*listxattrs)); + if (xdr_stream_decode_u64(argp->xdr, &listxattrs->lsxa_cookie) < 0) return nfserr_bad_xdr;
@@ -2209,6 +2247,7 @@ static __be32 nfsd4_decode_removexattr(struct nfsd4_compoundargs *argp, struct nfsd4_removexattr *removexattr) { + memset(removexattr, 0, sizeof(*removexattr)); return nfsd4_decode_xattr_name(argp, &removexattr->rmxa_name); }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 98124f5bd6c76699d514fbe491dd95265369cc99 ]
The dust has settled a bit and it's become obvious what code is totally common between nfsd_init_dirlist_pages() and nfsd3_init_dirlist_pages(). Move that common code to SUNRPC.
The new helper brackets the existing xdr_init_decode_pages() API.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3proc.c | 10 +--------- fs/nfsd/nfsproc.c | 10 +--------- include/linux/sunrpc/xdr.h | 2 ++ net/sunrpc/xdr.c | 22 ++++++++++++++++++++++ 4 files changed, 26 insertions(+), 18 deletions(-)
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 5c2e2b5e5945f..a9bf455aee821 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -574,15 +574,7 @@ static void nfsd3_init_dirlist_pages(struct svc_rqst *rqstp, buf->pages = rqstp->rq_next_page; rqstp->rq_next_page += (buf->buflen + PAGE_SIZE - 1) >> PAGE_SHIFT;
- /* This is xdr_init_encode(), but it assumes that - * the head kvec has already been consumed. */ - xdr_set_scratch_buffer(xdr, NULL, 0); - xdr->buf = buf; - xdr->page_ptr = buf->pages; - xdr->iov = NULL; - xdr->p = page_address(*buf->pages); - xdr->end = (void *)xdr->p + min_t(u32, buf->buflen, PAGE_SIZE); - xdr->rqst = NULL; + xdr_init_encode_pages(xdr, buf, buf->pages, NULL); }
/* diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 3509331a66504..78fa5a9edf277 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -575,15 +575,7 @@ static void nfsd_init_dirlist_pages(struct svc_rqst *rqstp, buf->pages = rqstp->rq_next_page; rqstp->rq_next_page++;
- /* This is xdr_init_encode(), but it assumes that - * the head kvec has already been consumed. */ - xdr_set_scratch_buffer(xdr, NULL, 0); - xdr->buf = buf; - xdr->page_ptr = buf->pages; - xdr->iov = NULL; - xdr->p = page_address(*buf->pages); - xdr->end = (void *)xdr->p + min_t(u32, buf->buflen, PAGE_SIZE); - xdr->rqst = NULL; + xdr_init_encode_pages(xdr, buf, buf->pages, NULL); }
/* diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h index 59d99ff31c1a9..c1c50eaae4726 100644 --- a/include/linux/sunrpc/xdr.h +++ b/include/linux/sunrpc/xdr.h @@ -239,6 +239,8 @@ typedef int (*kxdrdproc_t)(struct rpc_rqst *rqstp, struct xdr_stream *xdr,
extern void xdr_init_encode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p, struct rpc_rqst *rqst); +extern void xdr_init_encode_pages(struct xdr_stream *xdr, struct xdr_buf *buf, + struct page **pages, struct rpc_rqst *rqst); extern __be32 *xdr_reserve_space(struct xdr_stream *xdr, size_t nbytes); extern int xdr_reserve_space_vec(struct xdr_stream *xdr, struct kvec *vec, size_t nbytes); diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c index f66e2de7cd279..e2bd0cd391142 100644 --- a/net/sunrpc/xdr.c +++ b/net/sunrpc/xdr.c @@ -690,6 +690,28 @@ void xdr_init_encode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p, } EXPORT_SYMBOL_GPL(xdr_init_encode);
+/** + * xdr_init_encode_pages - Initialize an xdr_stream for encoding into pages + * @xdr: pointer to xdr_stream struct + * @buf: pointer to XDR buffer into which to encode data + * @pages: list of pages to decode into + * @rqst: pointer to controlling rpc_rqst, for debugging + * + */ +void xdr_init_encode_pages(struct xdr_stream *xdr, struct xdr_buf *buf, + struct page **pages, struct rpc_rqst *rqst) +{ + xdr_reset_scratch_buffer(xdr); + + xdr->buf = buf; + xdr->page_ptr = pages; + xdr->iov = NULL; + xdr->p = page_address(*pages); + xdr->end = (void *)xdr->p + min_t(u32, buf->buflen, PAGE_SIZE); + xdr->rqst = rqst; +} +EXPORT_SYMBOL_GPL(xdr_init_encode_pages); + /** * __xdr_commit_encode - Ensure all data is written to buffer * @xdr: pointer to xdr_stream
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit c3d2a04f05c590303c125a176e6e43df4a436fdb ]
Replace the check for buffer over/underflow with a helper that is commonly used for this purpose. The helper also sets xdr->nwords correctly after successfully linearizing the symlink argument into the stream's scratch buffer.
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3xdr.c | 14 +++----------- 1 file changed, 3 insertions(+), 11 deletions(-)
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 0293b8d65f10f..71e32cf288854 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -616,8 +616,6 @@ nfs3svc_decode_symlinkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_symlinkargs *args = rqstp->rq_argp; struct kvec *head = rqstp->rq_arg.head; - struct kvec *tail = rqstp->rq_arg.tail; - size_t remaining;
if (!svcxdr_decode_diropargs3(xdr, &args->ffh, &args->fname, &args->flen)) return false; @@ -626,16 +624,10 @@ nfs3svc_decode_symlinkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) if (xdr_stream_decode_u32(xdr, &args->tlen) < 0) return false;
- /* request sanity */ - remaining = head->iov_len + rqstp->rq_arg.page_len + tail->iov_len; - remaining -= xdr_stream_pos(xdr); - if (remaining < xdr_align_size(args->tlen)) - return false; - - args->first.iov_base = xdr->p; + /* symlink_data */ args->first.iov_len = head->iov_len - xdr_stream_pos(xdr); - - return true; + args->first.iov_base = xdr_inline_decode(xdr, args->tlen); + return args->first.iov_base != NULL; }
bool
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit d4da5baa533215b14625458e645056baf646bb2e ]
xdr_stream_subsegment() already returns a boolean value.
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3xdr.c | 4 +--- fs/nfsd/nfsxdr.c | 4 +--- 2 files changed, 2 insertions(+), 6 deletions(-)
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 71e32cf288854..3308dd671ef0b 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -571,10 +571,8 @@ nfs3svc_decode_writeargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) args->count = max_blocksize; args->len = max_blocksize; } - if (!xdr_stream_subsegment(xdr, &args->payload, args->count)) - return false;
- return true; + return xdr_stream_subsegment(xdr, &args->payload, args->count); }
bool diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index aba8520b4b8b6..caf6355b18fa9 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -338,10 +338,8 @@ nfssvc_decode_writeargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) return false; if (args->len > NFSSVC_MAXBLKSIZE_V2) return false; - if (!xdr_stream_subsegment(xdr, &args->payload, args->len)) - return false;
- return true; + return xdr_stream_subsegment(xdr, &args->payload, args->len); }
bool
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 9993a66317fc9951322483a9edbfae95a640b210 ]
In today's Linux NFS server implementation, the NFS dispatcher initializes each XDR result stream, and the NFSv4 .pc_func and .pc_encode methods all use xdr_stream-based encoding. This keeps rq_res.len automatically updated. There is no longer a need for the WARN_ON_ONCE() check in nfs4svc_encode_compoundres().
Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 4 ---- 1 file changed, 4 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 04699198eace7..fc587381cd087 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -5467,12 +5467,8 @@ bool nfs4svc_encode_compoundres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd4_compoundres *resp = rqstp->rq_resp; - struct xdr_buf *buf = xdr->buf; __be32 *p;
- WARN_ON_ONCE(buf->len != buf->head[0].iov_len + buf->page_len + - buf->tail[0].iov_len); - /* * Send buffer space for the following items is reserved * at the top of nfsd4_proc_compound().
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 6604148cf961b57fc735e4204f8996536da9253c ]
These helpers are always invoked indirectly, so the compiler can't inline these anyway. While we're updating the synopses of these helpers, defensively convert their parameters to const pointers.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 121 ++++++++++++++++++++++++++++----------------- fs/nfsd/xdr4.h | 3 +- 2 files changed, 77 insertions(+), 47 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 79f1990d40c44..1bb0fb917cf0d 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -2763,28 +2763,33 @@ nfsd4_proc_compound(struct svc_rqst *rqstp)
#define op_encode_channel_attrs_maxsz (6 + 1 + 1)
-static inline u32 nfsd4_only_status_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_only_status_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size) * sizeof(__be32); }
-static inline u32 nfsd4_status_stateid_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_status_stateid_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + op_encode_stateid_maxsz)* sizeof(__be32); }
-static inline u32 nfsd4_access_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_access_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { /* ac_supported, ac_resp_access */ return (op_encode_hdr_size + 2)* sizeof(__be32); }
-static inline u32 nfsd4_commit_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_commit_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + op_encode_verifier_maxsz) * sizeof(__be32); }
-static inline u32 nfsd4_create_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_create_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + op_encode_change_info_maxsz + nfs4_fattr_bitmap_maxsz) * sizeof(__be32); @@ -2795,10 +2800,10 @@ static inline u32 nfsd4_create_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op * the op prematurely if the estimate is too large. We may turn off splice * reads unnecessarily. */ -static inline u32 nfsd4_getattr_rsize(struct svc_rqst *rqstp, - struct nfsd4_op *op) +static u32 nfsd4_getattr_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { - u32 *bmap = op->u.getattr.ga_bmval; + const u32 *bmap = op->u.getattr.ga_bmval; u32 bmap0 = bmap[0], bmap1 = bmap[1], bmap2 = bmap[2]; u32 ret = 0;
@@ -2833,24 +2838,28 @@ static inline u32 nfsd4_getattr_rsize(struct svc_rqst *rqstp, return ret; }
-static inline u32 nfsd4_getfh_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_getfh_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + 1) * sizeof(__be32) + NFS4_FHSIZE; }
-static inline u32 nfsd4_link_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_link_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + op_encode_change_info_maxsz) * sizeof(__be32); }
-static inline u32 nfsd4_lock_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_lock_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + op_encode_lock_denied_maxsz) * sizeof(__be32); }
-static inline u32 nfsd4_open_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_open_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + op_encode_stateid_maxsz + op_encode_change_info_maxsz + 1 @@ -2858,7 +2867,8 @@ static inline u32 nfsd4_open_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) + op_encode_delegation_maxsz) * sizeof(__be32); }
-static inline u32 nfsd4_read_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_read_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { u32 maxcount = 0, rlen = 0;
@@ -2868,7 +2878,8 @@ static inline u32 nfsd4_read_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) return (op_encode_hdr_size + 2 + XDR_QUADLEN(rlen)) * sizeof(__be32); }
-static inline u32 nfsd4_read_plus_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_read_plus_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { u32 maxcount = svc_max_payload(rqstp); u32 rlen = min(op->u.read.rd_length, maxcount); @@ -2882,7 +2893,8 @@ static inline u32 nfsd4_read_plus_rsize(struct svc_rqst *rqstp, struct nfsd4_op return (op_encode_hdr_size + 2 + seg_len + XDR_QUADLEN(rlen)) * sizeof(__be32); }
-static inline u32 nfsd4_readdir_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_readdir_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { u32 maxcount = 0, rlen = 0;
@@ -2893,59 +2905,68 @@ static inline u32 nfsd4_readdir_rsize(struct svc_rqst *rqstp, struct nfsd4_op *o XDR_QUADLEN(rlen)) * sizeof(__be32); }
-static inline u32 nfsd4_readlink_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_readlink_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + 1) * sizeof(__be32) + PAGE_SIZE; }
-static inline u32 nfsd4_remove_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_remove_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + op_encode_change_info_maxsz) * sizeof(__be32); }
-static inline u32 nfsd4_rename_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_rename_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + op_encode_change_info_maxsz + op_encode_change_info_maxsz) * sizeof(__be32); }
-static inline u32 nfsd4_sequence_rsize(struct svc_rqst *rqstp, - struct nfsd4_op *op) +static u32 nfsd4_sequence_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + XDR_QUADLEN(NFS4_MAX_SESSIONID_LEN) + 5) * sizeof(__be32); }
-static inline u32 nfsd4_test_stateid_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_test_stateid_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + 1 + op->u.test_stateid.ts_num_ids) * sizeof(__be32); }
-static inline u32 nfsd4_setattr_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_setattr_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + nfs4_fattr_bitmap_maxsz) * sizeof(__be32); }
-static inline u32 nfsd4_secinfo_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_secinfo_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + RPC_AUTH_MAXFLAVOR * (4 + XDR_QUADLEN(GSS_OID_MAX_LEN))) * sizeof(__be32); }
-static inline u32 nfsd4_setclientid_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_setclientid_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + 2 + XDR_QUADLEN(NFS4_VERIFIER_SIZE)) * sizeof(__be32); }
-static inline u32 nfsd4_write_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_write_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + 2 + op_encode_verifier_maxsz) * sizeof(__be32); }
-static inline u32 nfsd4_exchange_id_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_exchange_id_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + 2 + 1 + /* eir_clientid, eir_sequenceid */\ 1 + 1 + /* eir_flags, spr_how */\ @@ -2959,14 +2980,16 @@ static inline u32 nfsd4_exchange_id_rsize(struct svc_rqst *rqstp, struct nfsd4_o 0 /* ignored eir_server_impl_id contents */) * sizeof(__be32); }
-static inline u32 nfsd4_bind_conn_to_session_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_bind_conn_to_session_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + \ XDR_QUADLEN(NFS4_MAX_SESSIONID_LEN) + /* bctsr_sessid */\ 2 /* bctsr_dir, use_conn_in_rdma_mode */) * sizeof(__be32); }
-static inline u32 nfsd4_create_session_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_create_session_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + \ XDR_QUADLEN(NFS4_MAX_SESSIONID_LEN) + /* sessionid */\ @@ -2975,7 +2998,8 @@ static inline u32 nfsd4_create_session_rsize(struct svc_rqst *rqstp, struct nfsd op_encode_channel_attrs_maxsz) * sizeof(__be32); }
-static inline u32 nfsd4_copy_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_copy_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + 1 /* wr_callback */ + @@ -2987,16 +3011,16 @@ static inline u32 nfsd4_copy_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) 1 /* cr_synchronous */) * sizeof(__be32); }
-static inline u32 nfsd4_offload_status_rsize(struct svc_rqst *rqstp, - struct nfsd4_op *op) +static u32 nfsd4_offload_status_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + 2 /* osr_count */ + 1 /* osr_complete<1> optional 0 for now */) * sizeof(__be32); }
-static inline u32 nfsd4_copy_notify_rsize(struct svc_rqst *rqstp, - struct nfsd4_op *op) +static u32 nfsd4_copy_notify_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + 3 /* cnr_lease_time */ + @@ -3011,7 +3035,8 @@ static inline u32 nfsd4_copy_notify_rsize(struct svc_rqst *rqstp, }
#ifdef CONFIG_NFSD_PNFS -static inline u32 nfsd4_getdeviceinfo_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_getdeviceinfo_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { u32 maxcount = 0, rlen = 0;
@@ -3029,7 +3054,8 @@ static inline u32 nfsd4_getdeviceinfo_rsize(struct svc_rqst *rqstp, struct nfsd4 * so we need to define an arbitrary upper bound here. */ #define MAX_LAYOUT_SIZE 128 -static inline u32 nfsd4_layoutget_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_layoutget_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + 1 /* logr_return_on_close */ + @@ -3038,14 +3064,16 @@ static inline u32 nfsd4_layoutget_rsize(struct svc_rqst *rqstp, struct nfsd4_op MAX_LAYOUT_SIZE) * sizeof(__be32); }
-static inline u32 nfsd4_layoutcommit_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_layoutcommit_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + 1 /* locr_newsize */ + 2 /* ns_size */) * sizeof(__be32); }
-static inline u32 nfsd4_layoutreturn_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_layoutreturn_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + 1 /* lrs_stateid */ + @@ -3054,13 +3082,14 @@ static inline u32 nfsd4_layoutreturn_rsize(struct svc_rqst *rqstp, struct nfsd4_ #endif /* CONFIG_NFSD_PNFS */
-static inline u32 nfsd4_seek_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_seek_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + 3) * sizeof(__be32); }
-static inline u32 nfsd4_getxattr_rsize(struct svc_rqst *rqstp, - struct nfsd4_op *op) +static u32 nfsd4_getxattr_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { u32 maxcount, rlen;
@@ -3070,14 +3099,14 @@ static inline u32 nfsd4_getxattr_rsize(struct svc_rqst *rqstp, return (op_encode_hdr_size + 1 + XDR_QUADLEN(rlen)) * sizeof(__be32); }
-static inline u32 nfsd4_setxattr_rsize(struct svc_rqst *rqstp, - struct nfsd4_op *op) +static u32 nfsd4_setxattr_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + op_encode_change_info_maxsz) * sizeof(__be32); } -static inline u32 nfsd4_listxattrs_rsize(struct svc_rqst *rqstp, - struct nfsd4_op *op) +static u32 nfsd4_listxattrs_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { u32 maxcount, rlen;
@@ -3087,8 +3116,8 @@ static inline u32 nfsd4_listxattrs_rsize(struct svc_rqst *rqstp, return (op_encode_hdr_size + 4 + XDR_QUADLEN(rlen)) * sizeof(__be32); }
-static inline u32 nfsd4_removexattr_rsize(struct svc_rqst *rqstp, - struct nfsd4_op *op) +static u32 nfsd4_removexattr_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + op_encode_change_info_maxsz) * sizeof(__be32); diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 2a699e8ca1baf..1e83fda7601ab 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -889,7 +889,8 @@ struct nfsd4_operation { u32 op_flags; char *op_name; /* Try to get response size before operation */ - u32 (*op_rsize_bop)(struct svc_rqst *, struct nfsd4_op *); + u32 (*op_rsize_bop)(const struct svc_rqst *rqstp, + const struct nfsd4_op *op); void (*op_get_currentstateid)(struct nfsd4_compound_state *, union nfsd4_op_u *); void (*op_set_currentstateid)(struct nfsd4_compound_state *,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 77e378cf2a595d8e39cddf28a31efe6afd9394a0 ]
This field was added by commit 1091006c5eb1 ("nfsd: turn on reply cache for NFSv4") but was never put to use.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/xdr4.h | 1 - 1 file changed, 1 deletion(-)
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 1e83fda7601ab..624a19ec3ad11 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -724,7 +724,6 @@ struct nfsd4_compoundargs { u32 opcnt; struct nfsd4_op *ops; struct nfsd4_op iops[8]; - int cachetype; };
struct nfsd4_compoundres {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 9f553e61bd36c1048543ac2f6945103dd2f742be ]
Remove a couple of 4-byte holes on platforms with 64-bit pointers.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/xdr4.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 624a19ec3ad11..8f323d9071f06 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -732,8 +732,8 @@ struct nfsd4_compoundres { struct svc_rqst * rqstp;
__be32 *statusp; - u32 taglen; char * tag; + u32 taglen; u32 opcnt;
struct nfsd4_compound_state cstate;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: ChenXiaoSong chenxiaosong2@huawei.com
[ Upstream commit 0cfb0c4228a5c8e2ed2b58f8309b660b187cef02 ]
Use DEFINE_PROC_SHOW_ATTRIBUTE helper macro to simplify the code.
Signed-off-by: ChenXiaoSong chenxiaosong2@huawei.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/stats.c | 14 ++------------ 1 file changed, 2 insertions(+), 12 deletions(-)
diff --git a/fs/nfsd/stats.c b/fs/nfsd/stats.c index a8c5a02a84f04..777e24e5da33b 100644 --- a/fs/nfsd/stats.c +++ b/fs/nfsd/stats.c @@ -32,7 +32,7 @@ struct svc_stat nfsd_svcstats = { .program = &nfsd_program, };
-static int nfsd_proc_show(struct seq_file *seq, void *v) +static int nfsd_show(struct seq_file *seq, void *v) { int i;
@@ -72,17 +72,7 @@ static int nfsd_proc_show(struct seq_file *seq, void *v) return 0; }
-static int nfsd_proc_open(struct inode *inode, struct file *file) -{ - return single_open(file, nfsd_proc_show, NULL); -} - -static const struct proc_ops nfsd_proc_ops = { - .proc_open = nfsd_proc_open, - .proc_read = seq_read, - .proc_lseek = seq_lseek, - .proc_release = single_release, -}; +DEFINE_PROC_SHOW_ATTRIBUTE(nfsd);
int nfsd_percpu_counters_init(struct percpu_counter counters[], int num) {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: ChenXiaoSong chenxiaosong2@huawei.com
[ Upstream commit 9beeaab8e05d353d709103cafa1941714b4d5d94 ]
Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code.
Signed-off-by: ChenXiaoSong chenxiaosong2@huawei.com [ cel: reduce line length ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsctl.c | 29 +++++------------------------ 1 file changed, 5 insertions(+), 24 deletions(-)
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 597a26ad4183f..3ed0cfdb0c0b5 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -185,17 +185,7 @@ static int export_features_show(struct seq_file *m, void *v) return 0; }
-static int export_features_open(struct inode *inode, struct file *file) -{ - return single_open(file, export_features_show, NULL); -} - -static const struct file_operations export_features_operations = { - .open = export_features_open, - .read = seq_read, - .llseek = seq_lseek, - .release = single_release, -}; +DEFINE_SHOW_ATTRIBUTE(export_features);
#if defined(CONFIG_SUNRPC_GSS) || defined(CONFIG_SUNRPC_GSS_MODULE) static int supported_enctypes_show(struct seq_file *m, void *v) @@ -204,17 +194,7 @@ static int supported_enctypes_show(struct seq_file *m, void *v) return 0; }
-static int supported_enctypes_open(struct inode *inode, struct file *file) -{ - return single_open(file, supported_enctypes_show, NULL); -} - -static const struct file_operations supported_enctypes_ops = { - .open = supported_enctypes_open, - .read = seq_read, - .llseek = seq_lseek, - .release = single_release, -}; +DEFINE_SHOW_ATTRIBUTE(supported_enctypes); #endif /* CONFIG_SUNRPC_GSS or CONFIG_SUNRPC_GSS_MODULE */
static const struct file_operations pool_stats_operations = { @@ -1365,7 +1345,7 @@ static int nfsd_fill_super(struct super_block *sb, struct fs_context *fc) /* Per-export io stats use same ops as exports file */ [NFSD_Export_Stats] = {"export_stats", &exports_nfsd_operations, S_IRUGO}, [NFSD_Export_features] = {"export_features", - &export_features_operations, S_IRUGO}, + &export_features_fops, S_IRUGO}, [NFSD_FO_UnlockIP] = {"unlock_ip", &transaction_ops, S_IWUSR|S_IRUSR}, [NFSD_FO_UnlockFS] = {"unlock_filesystem", @@ -1381,7 +1361,8 @@ static int nfsd_fill_super(struct super_block *sb, struct fs_context *fc) [NFSD_MaxConnections] = {"max_connections", &transaction_ops, S_IWUSR|S_IRUGO}, [NFSD_Filecache] = {"filecache", &filecache_ops, S_IRUGO}, #if defined(CONFIG_SUNRPC_GSS) || defined(CONFIG_SUNRPC_GSS_MODULE) - [NFSD_SupportedEnctypes] = {"supported_krb5_enctypes", &supported_enctypes_ops, S_IRUGO}, + [NFSD_SupportedEnctypes] = {"supported_krb5_enctypes", + &supported_enctypes_fops, S_IRUGO}, #endif /* CONFIG_SUNRPC_GSS or CONFIG_SUNRPC_GSS_MODULE */ #ifdef CONFIG_NFSD_V4 [NFSD_Leasetime] = {"nfsv4leasetime", &transaction_ops, S_IWUSR|S_IRUSR},
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: ChenXiaoSong chenxiaosong2@huawei.com
[ Upstream commit 1d7f6b302b75ff7acb9eb3cab0c631b10cfa7542 ]
Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code.
inode is converted from seq_file->file instead of seq_file->private in client_info_show().
Signed-off-by: ChenXiaoSong chenxiaosong2@huawei.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 14 ++------------ 1 file changed, 2 insertions(+), 12 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index d2468a408328d..fce62a4388a26 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -2503,7 +2503,7 @@ static const char *cb_state2str(int state)
static int client_info_show(struct seq_file *m, void *v) { - struct inode *inode = m->private; + struct inode *inode = file_inode(m->file); struct nfs4_client *clp; u64 clid;
@@ -2543,17 +2543,7 @@ static int client_info_show(struct seq_file *m, void *v) return 0; }
-static int client_info_open(struct inode *inode, struct file *file) -{ - return single_open(file, client_info_show, inode); -} - -static const struct file_operations client_info_fops = { - .open = client_info_open, - .read = seq_read, - .llseek = seq_lseek, - .release = single_release, -}; +DEFINE_SHOW_ATTRIBUTE(client_info);
static void *states_start(struct seq_file *s, loff_t *pos) __acquires(&clp->cl_lock)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: ChenXiaoSong chenxiaosong2@huawei.com
[ Upstream commit 64776611a06322b99386f8dfe3b3ba1aa0347a38 ]
Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code.
nfsd_net is converted from seq_file->file instead of seq_file->private in nfsd_reply_cache_stats_show().
Signed-off-by: ChenXiaoSong chenxiaosong2@huawei.com [ cel: reduce line length ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/cache.h | 2 +- fs/nfsd/nfscache.c | 13 +++---------- fs/nfsd/nfsctl.c | 10 +++------- 3 files changed, 7 insertions(+), 18 deletions(-)
diff --git a/fs/nfsd/cache.h b/fs/nfsd/cache.h index 65c331f75e9c7..f21259ead64bb 100644 --- a/fs/nfsd/cache.h +++ b/fs/nfsd/cache.h @@ -84,6 +84,6 @@ int nfsd_reply_cache_init(struct nfsd_net *); void nfsd_reply_cache_shutdown(struct nfsd_net *); int nfsd_cache_lookup(struct svc_rqst *); void nfsd_cache_update(struct svc_rqst *, int, __be32 *); -int nfsd_reply_cache_stats_open(struct inode *, struct file *); +int nfsd_reply_cache_stats_show(struct seq_file *m, void *v);
#endif /* NFSCACHE_H */ diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c index 7da88bdc0d6c3..2b5417e06d80d 100644 --- a/fs/nfsd/nfscache.c +++ b/fs/nfsd/nfscache.c @@ -603,9 +603,10 @@ nfsd_cache_append(struct svc_rqst *rqstp, struct kvec *data) * scraping this file for info should test the labels to ensure they're * getting the correct field. */ -static int nfsd_reply_cache_stats_show(struct seq_file *m, void *v) +int nfsd_reply_cache_stats_show(struct seq_file *m, void *v) { - struct nfsd_net *nn = m->private; + struct nfsd_net *nn = net_generic(file_inode(m->file)->i_sb->s_fs_info, + nfsd_net_id);
seq_printf(m, "max entries: %u\n", nn->max_drc_entries); seq_printf(m, "num entries: %u\n", @@ -625,11 +626,3 @@ static int nfsd_reply_cache_stats_show(struct seq_file *m, void *v) seq_printf(m, "cachesize at longest: %u\n", nn->longest_chain_cachesize); return 0; } - -int nfsd_reply_cache_stats_open(struct inode *inode, struct file *file) -{ - struct nfsd_net *nn = net_generic(file_inode(file)->i_sb->s_fs_info, - nfsd_net_id); - - return single_open(file, nfsd_reply_cache_stats_show, nn); -} diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 3ed0cfdb0c0b5..1983f4f2908d9 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -204,12 +204,7 @@ static const struct file_operations pool_stats_operations = { .release = nfsd_pool_stats_release, };
-static const struct file_operations reply_cache_stats_operations = { - .open = nfsd_reply_cache_stats_open, - .read = seq_read, - .llseek = seq_lseek, - .release = single_release, -}; +DEFINE_SHOW_ATTRIBUTE(nfsd_reply_cache_stats);
static const struct file_operations filecache_ops = { .open = nfsd_file_cache_stats_open, @@ -1354,7 +1349,8 @@ static int nfsd_fill_super(struct super_block *sb, struct fs_context *fc) [NFSD_Threads] = {"threads", &transaction_ops, S_IWUSR|S_IRUSR}, [NFSD_Pool_Threads] = {"pool_threads", &transaction_ops, S_IWUSR|S_IRUSR}, [NFSD_Pool_Stats] = {"pool_stats", &pool_stats_operations, S_IRUGO}, - [NFSD_Reply_Cache_Stats] = {"reply_cache_stats", &reply_cache_stats_operations, S_IRUGO}, + [NFSD_Reply_Cache_Stats] = {"reply_cache_stats", + &nfsd_reply_cache_stats_fops, S_IRUGO}, [NFSD_Versions] = {"versions", &transaction_ops, S_IWUSR|S_IRUSR}, [NFSD_Ports] = {"portlist", &transaction_ops, S_IWUSR|S_IRUGO}, [NFSD_MaxBlkSize] = {"max_block_size", &transaction_ops, S_IWUSR|S_IRUGO},
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: ChenXiaoSong chenxiaosong2@huawei.com
[ Upstream commit 1342f9dd3fc219089deeb2620f6790f19b4129b1 ]
Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code.
Signed-off-by: ChenXiaoSong chenxiaosong2@huawei.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 7 +------ fs/nfsd/filecache.h | 2 +- fs/nfsd/nfsctl.c | 9 ++------- 3 files changed, 4 insertions(+), 14 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 55478d411e5a0..fa8e1546e0206 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -1211,7 +1211,7 @@ nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp, * scraping this file for info should test the labels to ensure they're * getting the correct field. */ -static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) +int nfsd_file_cache_stats_show(struct seq_file *m, void *v) { unsigned long releases = 0, pages_flushed = 0, evictions = 0; unsigned long hits = 0, acquisitions = 0; @@ -1258,8 +1258,3 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) seq_printf(m, "pages flushed: %lu\n", pages_flushed); return 0; } - -int nfsd_file_cache_stats_open(struct inode *inode, struct file *file) -{ - return single_open(file, nfsd_file_cache_stats_show, NULL); -} diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h index 8e8c0c47d67df..357832bac736b 100644 --- a/fs/nfsd/filecache.h +++ b/fs/nfsd/filecache.h @@ -60,5 +60,5 @@ __be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned int may_flags, struct nfsd_file **nfp); __be32 nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned int may_flags, struct nfsd_file **nfp); -int nfsd_file_cache_stats_open(struct inode *, struct file *); +int nfsd_file_cache_stats_show(struct seq_file *m, void *v); #endif /* _FS_NFSD_FILECACHE_H */ diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 1983f4f2908d9..6a29bcfc93909 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -206,12 +206,7 @@ static const struct file_operations pool_stats_operations = {
DEFINE_SHOW_ATTRIBUTE(nfsd_reply_cache_stats);
-static const struct file_operations filecache_ops = { - .open = nfsd_file_cache_stats_open, - .read = seq_read, - .llseek = seq_lseek, - .release = single_release, -}; +DEFINE_SHOW_ATTRIBUTE(nfsd_file_cache_stats);
/*----------------------------------------------------------------------------*/ /* @@ -1355,7 +1350,7 @@ static int nfsd_fill_super(struct super_block *sb, struct fs_context *fc) [NFSD_Ports] = {"portlist", &transaction_ops, S_IWUSR|S_IRUGO}, [NFSD_MaxBlkSize] = {"max_block_size", &transaction_ops, S_IWUSR|S_IRUGO}, [NFSD_MaxConnections] = {"max_connections", &transaction_ops, S_IWUSR|S_IRUGO}, - [NFSD_Filecache] = {"filecache", &filecache_ops, S_IRUGO}, + [NFSD_Filecache] = {"filecache", &nfsd_file_cache_stats_fops, S_IRUGO}, #if defined(CONFIG_SUNRPC_GSS) || defined(CONFIG_SUNRPC_GSS_MODULE) [NFSD_SupportedEnctypes] = {"supported_krb5_enctypes", &supported_enctypes_fops, S_IRUGO},
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 781fde1a2ba2391f31142f46f964cf1148ca1791 ]
Code maintenance: The name of the copy_stateid_t::sc_count field collides with the sc_count field in struct nfs4_stid, making the latter difficult to grep for when auditing stateid reference counting.
No behavior change expected.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 6 +++--- fs/nfsd/nfs4state.c | 30 +++++++++++++++--------------- fs/nfsd/state.h | 6 +++--- 3 files changed, 21 insertions(+), 21 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 1bb0fb917cf0d..e1aa48d496b98 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1818,7 +1818,7 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (!nfs4_init_copy_state(nn, copy)) goto out_err; refcount_set(&async_copy->refcount, 1); - memcpy(©->cp_res.cb_stateid, ©->cp_stateid.stid, + memcpy(©->cp_res.cb_stateid, ©->cp_stateid.cs_stid, sizeof(copy->cp_res.cb_stateid)); dup_copy_fields(copy, async_copy); async_copy->copy_task = kthread_create(nfsd4_do_async_copy, @@ -1856,7 +1856,7 @@ find_async_copy(struct nfs4_client *clp, stateid_t *stateid)
spin_lock(&clp->async_lock); list_for_each_entry(copy, &clp->async_copies, copies) { - if (memcmp(©->cp_stateid.stid, stateid, NFS4_STATEID_SIZE)) + if (memcmp(©->cp_stateid.cs_stid, stateid, NFS4_STATEID_SIZE)) continue; refcount_inc(©->refcount); spin_unlock(&clp->async_lock); @@ -1910,7 +1910,7 @@ nfsd4_copy_notify(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, cps = nfs4_alloc_init_cpntf_state(nn, stid); if (!cps) goto out; - memcpy(&cn->cpn_cnr_stateid, &cps->cp_stateid.stid, sizeof(stateid_t)); + memcpy(&cn->cpn_cnr_stateid, &cps->cp_stateid.cs_stid, sizeof(stateid_t)); memcpy(&cps->cp_p_stateid, &stid->sc_stateid, sizeof(stateid_t)); memcpy(&cps->cp_p_clid, &clp->cl_clientid, sizeof(clientid_t));
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index fce62a4388a26..f207c73ae1b58 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -985,19 +985,19 @@ struct nfs4_stid *nfs4_alloc_stid(struct nfs4_client *cl, struct kmem_cache *sla * Create a unique stateid_t to represent each COPY. */ static int nfs4_init_cp_state(struct nfsd_net *nn, copy_stateid_t *stid, - unsigned char sc_type) + unsigned char cs_type) { int new_id;
- stid->stid.si_opaque.so_clid.cl_boot = (u32)nn->boot_time; - stid->stid.si_opaque.so_clid.cl_id = nn->s2s_cp_cl_id; - stid->sc_type = sc_type; + stid->cs_stid.si_opaque.so_clid.cl_boot = (u32)nn->boot_time; + stid->cs_stid.si_opaque.so_clid.cl_id = nn->s2s_cp_cl_id; + stid->cs_type = cs_type;
idr_preload(GFP_KERNEL); spin_lock(&nn->s2s_cp_lock); new_id = idr_alloc_cyclic(&nn->s2s_cp_stateids, stid, 0, 0, GFP_NOWAIT); - stid->stid.si_opaque.so_id = new_id; - stid->stid.si_generation = 1; + stid->cs_stid.si_opaque.so_id = new_id; + stid->cs_stid.si_generation = 1; spin_unlock(&nn->s2s_cp_lock); idr_preload_end(); if (new_id < 0) @@ -1019,7 +1019,7 @@ struct nfs4_cpntf_state *nfs4_alloc_init_cpntf_state(struct nfsd_net *nn, if (!cps) return NULL; cps->cpntf_time = ktime_get_boottime_seconds(); - refcount_set(&cps->cp_stateid.sc_count, 1); + refcount_set(&cps->cp_stateid.cs_count, 1); if (!nfs4_init_cp_state(nn, &cps->cp_stateid, NFS4_COPYNOTIFY_STID)) goto out_free; spin_lock(&nn->s2s_cp_lock); @@ -1035,11 +1035,11 @@ void nfs4_free_copy_state(struct nfsd4_copy *copy) { struct nfsd_net *nn;
- WARN_ON_ONCE(copy->cp_stateid.sc_type != NFS4_COPY_STID); + WARN_ON_ONCE(copy->cp_stateid.cs_type != NFS4_COPY_STID); nn = net_generic(copy->cp_clp->net, nfsd_net_id); spin_lock(&nn->s2s_cp_lock); idr_remove(&nn->s2s_cp_stateids, - copy->cp_stateid.stid.si_opaque.so_id); + copy->cp_stateid.cs_stid.si_opaque.so_id); spin_unlock(&nn->s2s_cp_lock); }
@@ -6044,7 +6044,7 @@ nfs4_laundromat(struct nfsd_net *nn) spin_lock(&nn->s2s_cp_lock); idr_for_each_entry(&nn->s2s_cp_stateids, cps_t, i) { cps = container_of(cps_t, struct nfs4_cpntf_state, cp_stateid); - if (cps->cp_stateid.sc_type == NFS4_COPYNOTIFY_STID && + if (cps->cp_stateid.cs_type == NFS4_COPYNOTIFY_STID && state_expired(<, cps->cpntf_time)) _free_cpntf_state_locked(nn, cps); } @@ -6384,12 +6384,12 @@ nfs4_check_file(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfs4_stid *s, static void _free_cpntf_state_locked(struct nfsd_net *nn, struct nfs4_cpntf_state *cps) { - WARN_ON_ONCE(cps->cp_stateid.sc_type != NFS4_COPYNOTIFY_STID); - if (!refcount_dec_and_test(&cps->cp_stateid.sc_count)) + WARN_ON_ONCE(cps->cp_stateid.cs_type != NFS4_COPYNOTIFY_STID); + if (!refcount_dec_and_test(&cps->cp_stateid.cs_count)) return; list_del(&cps->cp_list); idr_remove(&nn->s2s_cp_stateids, - cps->cp_stateid.stid.si_opaque.so_id); + cps->cp_stateid.cs_stid.si_opaque.so_id); kfree(cps); } /* @@ -6411,12 +6411,12 @@ __be32 manage_cpntf_state(struct nfsd_net *nn, stateid_t *st, if (cps_t) { state = container_of(cps_t, struct nfs4_cpntf_state, cp_stateid); - if (state->cp_stateid.sc_type != NFS4_COPYNOTIFY_STID) { + if (state->cp_stateid.cs_type != NFS4_COPYNOTIFY_STID) { state = NULL; goto unlock; } if (!clp) - refcount_inc(&state->cp_stateid.sc_count); + refcount_inc(&state->cp_stateid.cs_count); else _free_cpntf_state_locked(nn, state); } diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index 4155be65d8069..b3477087a9fc3 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -57,11 +57,11 @@ typedef struct { } stateid_t;
typedef struct { - stateid_t stid; + stateid_t cs_stid; #define NFS4_COPY_STID 1 #define NFS4_COPYNOTIFY_STID 2 - unsigned char sc_type; - refcount_t sc_count; + unsigned char cs_type; + refcount_t cs_count; } copy_stateid_t;
struct nfsd4_callback {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 76ce4dcec0dc08a032db916841ddc4e3998be317 ]
Since before the git era, NFSD has conserved the number of pages held by each nfsd thread by combining the RPC receive and send buffers into a single array of pages. This works because there are no cases where an operation needs a large RPC Call message and a large RPC Reply at the same time.
Once an RPC Call has been received, svc_process() updates svc_rqst::rq_res to describe the part of rq_pages that can be used for constructing the Reply. This means that the send buffer (rq_res) shrinks when the received RPC record containing the RPC Call is large.
Add an NFSv4 helper that computes the size of the send buffer. It replaces svc_max_payload() in spots where svc_max_payload() returns a value that might be larger than the remaining send buffer space. Callers who need to know the transport's actual maximum payload size will continue to use svc_max_payload().
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 48 +++++++++++++++++++++++----------------------- 1 file changed, 24 insertions(+), 24 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index e1aa48d496b98..50fd4ba04a3e0 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -2763,6 +2763,22 @@ nfsd4_proc_compound(struct svc_rqst *rqstp)
#define op_encode_channel_attrs_maxsz (6 + 1 + 1)
+/* + * The _rsize() helpers are invoked by the NFSv4 COMPOUND decoder, which + * is called before sunrpc sets rq_res.buflen. Thus we have to compute + * the maximum payload size here, based on transport limits and the size + * of the remaining space in the rq_pages array. + */ +static u32 nfsd4_max_payload(const struct svc_rqst *rqstp) +{ + u32 buflen; + + buflen = (rqstp->rq_page_end - rqstp->rq_next_page) * PAGE_SIZE; + buflen -= rqstp->rq_auth_slack; + buflen -= rqstp->rq_res.head[0].iov_len; + return min_t(u32, buflen, svc_max_payload(rqstp)); +} + static u32 nfsd4_only_status_rsize(const struct svc_rqst *rqstp, const struct nfsd4_op *op) { @@ -2808,9 +2824,9 @@ static u32 nfsd4_getattr_rsize(const struct svc_rqst *rqstp, u32 ret = 0;
if (bmap0 & FATTR4_WORD0_ACL) - return svc_max_payload(rqstp); + return nfsd4_max_payload(rqstp); if (bmap0 & FATTR4_WORD0_FS_LOCATIONS) - return svc_max_payload(rqstp); + return nfsd4_max_payload(rqstp);
if (bmap1 & FATTR4_WORD1_OWNER) { ret += IDMAP_NAMESZ + 4; @@ -2870,10 +2886,7 @@ static u32 nfsd4_open_rsize(const struct svc_rqst *rqstp, static u32 nfsd4_read_rsize(const struct svc_rqst *rqstp, const struct nfsd4_op *op) { - u32 maxcount = 0, rlen = 0; - - maxcount = svc_max_payload(rqstp); - rlen = min(op->u.read.rd_length, maxcount); + u32 rlen = min(op->u.read.rd_length, nfsd4_max_payload(rqstp));
return (op_encode_hdr_size + 2 + XDR_QUADLEN(rlen)) * sizeof(__be32); } @@ -2881,8 +2894,7 @@ static u32 nfsd4_read_rsize(const struct svc_rqst *rqstp, static u32 nfsd4_read_plus_rsize(const struct svc_rqst *rqstp, const struct nfsd4_op *op) { - u32 maxcount = svc_max_payload(rqstp); - u32 rlen = min(op->u.read.rd_length, maxcount); + u32 rlen = min(op->u.read.rd_length, nfsd4_max_payload(rqstp)); /* * If we detect that the file changed during hole encoding, then we * recover by encoding the remaining reply as data. This means we need @@ -2896,10 +2908,7 @@ static u32 nfsd4_read_plus_rsize(const struct svc_rqst *rqstp, static u32 nfsd4_readdir_rsize(const struct svc_rqst *rqstp, const struct nfsd4_op *op) { - u32 maxcount = 0, rlen = 0; - - maxcount = svc_max_payload(rqstp); - rlen = min(op->u.readdir.rd_maxcount, maxcount); + u32 rlen = min(op->u.readdir.rd_maxcount, nfsd4_max_payload(rqstp));
return (op_encode_hdr_size + op_encode_verifier_maxsz + XDR_QUADLEN(rlen)) * sizeof(__be32); @@ -3038,10 +3047,7 @@ static u32 nfsd4_copy_notify_rsize(const struct svc_rqst *rqstp, static u32 nfsd4_getdeviceinfo_rsize(const struct svc_rqst *rqstp, const struct nfsd4_op *op) { - u32 maxcount = 0, rlen = 0; - - maxcount = svc_max_payload(rqstp); - rlen = min(op->u.getdeviceinfo.gd_maxcount, maxcount); + u32 rlen = min(op->u.getdeviceinfo.gd_maxcount, nfsd4_max_payload(rqstp));
return (op_encode_hdr_size + 1 /* gd_layout_type*/ + @@ -3091,10 +3097,7 @@ static u32 nfsd4_seek_rsize(const struct svc_rqst *rqstp, static u32 nfsd4_getxattr_rsize(const struct svc_rqst *rqstp, const struct nfsd4_op *op) { - u32 maxcount, rlen; - - maxcount = svc_max_payload(rqstp); - rlen = min_t(u32, XATTR_SIZE_MAX, maxcount); + u32 rlen = min_t(u32, XATTR_SIZE_MAX, nfsd4_max_payload(rqstp));
return (op_encode_hdr_size + 1 + XDR_QUADLEN(rlen)) * sizeof(__be32); } @@ -3108,10 +3111,7 @@ static u32 nfsd4_setxattr_rsize(const struct svc_rqst *rqstp, static u32 nfsd4_listxattrs_rsize(const struct svc_rqst *rqstp, const struct nfsd4_op *op) { - u32 maxcount, rlen; - - maxcount = svc_max_payload(rqstp); - rlen = min(op->u.listxattrs.lsxa_maxcount, maxcount); + u32 rlen = min(op->u.listxattrs.lsxa_maxcount, nfsd4_max_payload(rqstp));
return (op_encode_hdr_size + 4 + XDR_QUADLEN(rlen)) * sizeof(__be32); }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 4d01416ab41540bb13ec4a39ac4e6c4aa5934bc9 ]
In the case of a revoked delegation, we still fill out the pointer even when returning an error, which is bad form. Only overwrite the pointer on success.
Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index f207c73ae1b58..1dc3823f3d124 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -6289,6 +6289,7 @@ nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate, struct nfs4_stid **s, struct nfsd_net *nn) { __be32 status; + struct nfs4_stid *stid; bool return_revoked = false;
/* @@ -6311,15 +6312,16 @@ nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate, } if (status) return status; - *s = find_stateid_by_type(cstate->clp, stateid, typemask); - if (!*s) + stid = find_stateid_by_type(cstate->clp, stateid, typemask); + if (!stid) return nfserr_bad_stateid; - if (((*s)->sc_type == NFS4_REVOKED_DELEG_STID) && !return_revoked) { - nfs4_put_stid(*s); + if ((stid->sc_type == NFS4_REVOKED_DELEG_STID) && !return_revoked) { + nfs4_put_stid(stid); if (cstate->minorversion) return nfserr_deleg_revoked; return nfserr_bad_stateid; } + *s = stid; return nfs_ok; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 25fbe1fca14142beae6c882f7906510363d42bff ]
Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 1dc3823f3d124..e9fc5a357fc4d 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4870,14 +4870,14 @@ static void nfsd_break_one_deleg(struct nfs4_delegation *dp) * We're assuming the state code never drops its reference * without first removing the lease. Since we're in this lease * callback (and since the lease code is serialized by the - * i_lock) we know the server hasn't removed the lease yet, and + * flc_lock) we know the server hasn't removed the lease yet, and * we know it's safe to take a reference. */ refcount_inc(&dp->dl_stid.sc_count); nfsd4_run_cb(&dp->dl_recall); }
-/* Called from break_lease() with i_lock held. */ +/* Called from break_lease() with flc_lock held. */ static bool nfsd_break_deleg_cb(struct file_lock *fl) {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit b95239ca4954a0d48b19c09ce7e8f31b453b4216 ]
queue_work can return false and not queue anything, if the work is already queued. If that happens in the case of a CB_RECALL, we'll have taken an extra reference to the stid that will never be put. Ensure we throw a warning in that case.
Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4callback.c | 14 ++++++++++++-- fs/nfsd/nfs4state.c | 5 ++--- fs/nfsd/state.h | 2 +- 3 files changed, 15 insertions(+), 6 deletions(-)
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c index face8908a40b1..39989c14c8a1e 100644 --- a/fs/nfsd/nfs4callback.c +++ b/fs/nfsd/nfs4callback.c @@ -1373,11 +1373,21 @@ void nfsd4_init_cb(struct nfsd4_callback *cb, struct nfs4_client *clp, cb->cb_holds_slot = false; }
-void nfsd4_run_cb(struct nfsd4_callback *cb) +/** + * nfsd4_run_cb - queue up a callback job to run + * @cb: callback to queue + * + * Kick off a callback to do its thing. Returns false if it was already + * on a queue, true otherwise. + */ +bool nfsd4_run_cb(struct nfsd4_callback *cb) { struct nfs4_client *clp = cb->cb_clp; + bool queued;
nfsd41_cb_inflight_begin(clp); - if (!nfsd4_queue_cb(cb)) + queued = nfsd4_queue_cb(cb); + if (!queued) nfsd41_cb_inflight_end(clp); + return queued; } diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index e9fc5a357fc4d..fc6188d70796d 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4874,14 +4874,13 @@ static void nfsd_break_one_deleg(struct nfs4_delegation *dp) * we know it's safe to take a reference. */ refcount_inc(&dp->dl_stid.sc_count); - nfsd4_run_cb(&dp->dl_recall); + WARN_ON_ONCE(!nfsd4_run_cb(&dp->dl_recall)); }
/* Called from break_lease() with flc_lock held. */ static bool nfsd_break_deleg_cb(struct file_lock *fl) { - bool ret = false; struct nfs4_delegation *dp = (struct nfs4_delegation *)fl->fl_owner; struct nfs4_file *fp = dp->dl_stid.sc_file; struct nfs4_client *clp = dp->dl_stid.sc_client; @@ -4907,7 +4906,7 @@ nfsd_break_deleg_cb(struct file_lock *fl) fp->fi_had_conflict = true; nfsd_break_one_deleg(dp); spin_unlock(&fp->fi_lock); - return ret; + return false; }
/** diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index b3477087a9fc3..e2daef3cc0034 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -692,7 +692,7 @@ extern void nfsd4_probe_callback_sync(struct nfs4_client *clp); extern void nfsd4_change_callback(struct nfs4_client *clp, struct nfs4_cb_conn *); extern void nfsd4_init_cb(struct nfsd4_callback *cb, struct nfs4_client *clp, const struct nfsd4_callback_ops *ops, enum nfsd4_cb_op op); -extern void nfsd4_run_cb(struct nfsd4_callback *cb); +extern bool nfsd4_run_cb(struct nfsd4_callback *cb); extern int nfsd4_create_callback_queue(void); extern void nfsd4_destroy_callback_queue(void); extern void nfsd4_shutdown_callback(struct nfs4_client *);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 895ddf5ed4c54ea9e3533606d7a8b4e4f27f95ef ]
We've had some reports of problems in the refcounting for delegation stateids that we've yet to track down. Add some extra checks to ensure that we've removed the object from various lists before freeing it.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2127067 Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index fc6188d70796d..948ef17178158 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -1071,7 +1071,12 @@ static struct nfs4_ol_stateid * nfs4_alloc_open_stateid(struct nfs4_client *clp)
static void nfs4_free_deleg(struct nfs4_stid *stid) { - WARN_ON(!list_empty(&stid->sc_cp_list)); + struct nfs4_delegation *dp = delegstateid(stid); + + WARN_ON_ONCE(!list_empty(&stid->sc_cp_list)); + WARN_ON_ONCE(!list_empty(&dp->dl_perfile)); + WARN_ON_ONCE(!list_empty(&dp->dl_perclnt)); + WARN_ON_ONCE(!list_empty(&dp->dl_recall_lru)); kmem_cache_free(deleg_slab, stid); atomic_long_dec(&num_delegations); }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Al Viro viro@zeniv.linux.org.uk
[ Upstream commit d5bf88895f24686641c39420ee6df716dc1d95d8 ]
Reviewed-by: Matthew Bobrowski repnop@google.com Reviewed-by: Christian Brauner (Microsoft) brauner@kernel.org Signed-off-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.c | 2 +- fs/notify/fanotify/fanotify.h | 2 +- fs/notify/fanotify/fanotify_user.c | 6 +++--- 3 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index cd7d09a569fff..a2a15bc4df280 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -18,7 +18,7 @@
#include "fanotify.h"
-static bool fanotify_path_equal(struct path *p1, struct path *p2) +static bool fanotify_path_equal(const struct path *p1, const struct path *p2) { return p1->mnt == p2->mnt && p1->dentry == p2->dentry; } diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index 1d9f11255c64f..bf6d4d38afa04 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -458,7 +458,7 @@ static inline bool fanotify_event_has_path(struct fanotify_event *event) event->type == FANOTIFY_EVENT_TYPE_PATH_PERM; }
-static inline struct path *fanotify_event_path(struct fanotify_event *event) +static inline const struct path *fanotify_event_path(struct fanotify_event *event) { if (event->type == FANOTIFY_EVENT_TYPE_PATH) return &FANOTIFY_PE(event)->path; diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 72dd446606a78..5302313f28bed 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -237,7 +237,7 @@ static struct fanotify_event *get_one_event(struct fsnotify_group *group, return event; }
-static int create_fd(struct fsnotify_group *group, struct path *path, +static int create_fd(struct fsnotify_group *group, const struct path *path, struct file **file) { int client_fd; @@ -607,7 +607,7 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group, char __user *buf, size_t count) { struct fanotify_event_metadata metadata; - struct path *path = fanotify_event_path(event); + const struct path *path = fanotify_event_path(event); struct fanotify_info *info = fanotify_event_info(event); unsigned int info_mode = FAN_GROUP_FLAG(group, FANOTIFY_INFO_MODES); unsigned int pidfd_mode = info_mode & FAN_REPORT_PIDFD; @@ -1541,7 +1541,7 @@ static int fanotify_test_fid(struct dentry *dentry) }
static int fanotify_events_supported(struct fsnotify_group *group, - struct path *path, __u64 mask, + const struct path *path, __u64 mask, unsigned int flags) { unsigned int mark_type = flags & FANOTIFY_MARK_TYPE_BITS;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gaosheng Cui cuigaosheng1@huawei.com
[ Upstream commit f847c74d6e89f10926db58649a05b99237258691 ]
fsnotify_alloc_event_holder() and fsnotify_destroy_event_holder() has been removed since commit 7053aee26a35 ("fsnotify: do not share events between notification groups"), so remove it.
Reviewed-by: Ritesh Harjani (IBM) ritesh.list@gmail.com Signed-off-by: Gaosheng Cui cuigaosheng1@huawei.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fsnotify.h | 4 ---- 1 file changed, 4 deletions(-)
diff --git a/fs/notify/fsnotify.h b/fs/notify/fsnotify.h index 87d8a50ee8038..fde74eb333cc9 100644 --- a/fs/notify/fsnotify.h +++ b/fs/notify/fsnotify.h @@ -76,10 +76,6 @@ static inline void fsnotify_clear_marks_by_sb(struct super_block *sb) */ extern void __fsnotify_update_child_dentry_flags(struct inode *inode);
-/* allocate and destroy and event holder to attach events to notification/access queues */ -extern struct fsnotify_event_holder *fsnotify_alloc_event_holder(void); -extern void fsnotify_destroy_event_holder(struct fsnotify_event_holder *holder); - extern struct kmem_cache *fsnotify_mark_connector_cachep;
#endif /* __FS_NOTIFY_FSNOTIFY_H_ */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gaosheng Cui cuigaosheng1@huawei.com
[ Upstream commit 7a80bf902d2bc722b4477442ee772e8574603185 ]
All uses of fanotify_event_has_path() have been removed since commit 9c61f3b560f5 ("fanotify: break up fanotify_alloc_event()"), now it is useless, so remove it.
Link: https://lore.kernel.org/r/20220926023018.1505270-1-cuigaosheng1@huawei.com Signed-off-by: Gaosheng Cui cuigaosheng1@huawei.com Signed-off-by: Jan Kara jack@suse.cz [ cel: resolved merge conflict ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/notify/fanotify/fanotify.h | 6 ------ 1 file changed, 6 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index bf6d4d38afa04..57f51a9a3015d 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -452,12 +452,6 @@ static inline bool fanotify_is_error_event(u32 mask) return mask & FAN_FS_ERROR; }
-static inline bool fanotify_event_has_path(struct fanotify_event *event) -{ - return event->type == FANOTIFY_EVENT_TYPE_PATH || - event->type == FANOTIFY_EVENT_TYPE_PATH_PERM; -} - static inline const struct path *fanotify_event_path(struct fanotify_event *event) { if (event->type == FANOTIFY_EVENT_TYPE_PATH)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 8d0d254b15cc5b7d46d85fb7ab8ecede9575e672 ]
nfsd_file_unhash_and_dispose() is called for two reasons:
We're either shutting down and purging the filecache, or we've gotten a notification about a file delete, so we want to go ahead and unhash it so that it'll get cleaned up when we close.
We're either walking the hashtable or doing a lookup in it and we don't take a reference in either case. What we want to do in both cases is to try and unhash the object and put it on the dispose list if that was successful. If it's no longer hashed, then we don't want to touch it, with the assumption being that something else is already cleaning up the sentinel reference.
Instead of trying to selectively decrement the refcount in this function, just unhash it, and if that was successful, move it to the dispose list. Then, the disposal routine will just clean that up as usual.
Also, just make this a void function, drop the WARN_ON_ONCE, and the comments about deadlocking since the nature of the purported deadlock is no longer clear.
Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 36 +++++++----------------------------- 1 file changed, 7 insertions(+), 29 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index fa8e1546e0206..a0d93e797cdce 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -404,22 +404,15 @@ nfsd_file_unhash(struct nfsd_file *nf) return false; }
-/* - * Return true if the file was unhashed. - */ -static bool +static void nfsd_file_unhash_and_dispose(struct nfsd_file *nf, struct list_head *dispose) { trace_nfsd_file_unhash_and_dispose(nf); - if (!nfsd_file_unhash(nf)) - return false; - /* keep final reference for nfsd_file_lru_dispose */ - if (refcount_dec_not_one(&nf->nf_ref)) - return true; - - nfsd_file_lru_remove(nf); - list_add(&nf->nf_lru, dispose); - return true; + if (nfsd_file_unhash(nf)) { + /* caller must call nfsd_file_dispose_list() later */ + nfsd_file_lru_remove(nf); + list_add(&nf->nf_lru, dispose); + } }
static void @@ -561,8 +554,6 @@ nfsd_file_dispose_list_delayed(struct list_head *dispose) * @lock: LRU list lock (unused) * @arg: dispose list * - * Note this can deadlock with nfsd_file_cache_purge. - * * Return values: * %LRU_REMOVED: @item was removed from the LRU * %LRU_ROTATE: @item is to be moved to the LRU tail @@ -747,8 +738,6 @@ nfsd_file_close_inode(struct inode *inode) * * Walk the LRU list and close any entries that have not been used since * the last scan. - * - * Note this can deadlock with nfsd_file_cache_purge. */ static void nfsd_file_delayed_close(struct work_struct *work) @@ -890,16 +879,12 @@ nfsd_file_cache_init(void) goto out; }
-/* - * Note this can deadlock with nfsd_file_lru_cb. - */ static void __nfsd_file_cache_purge(struct net *net) { struct rhashtable_iter iter; struct nfsd_file *nf; LIST_HEAD(dispose); - bool del;
rhashtable_walk_enter(&nfsd_file_rhash_tbl, &iter); do { @@ -909,14 +894,7 @@ __nfsd_file_cache_purge(struct net *net) while (!IS_ERR_OR_NULL(nf)) { if (net && nf->nf_net != net) continue; - del = nfsd_file_unhash_and_dispose(nf, &dispose); - - /* - * Deadlock detected! Something marked this entry as - * unhased, but hasn't removed it from the hash list. - */ - WARN_ON_ONCE(!del); - + nfsd_file_unhash_and_dispose(nf, &dispose); nf = rhashtable_walk_next(&iter); }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 243a5263014a30436c93ed3f1f864c1da845455e ]
nfsd_file is RCU-freed, so we need to hold the rcu_read_lock long enough to get a reference after finding it in the hash. Take the rcu_read_lock() and call rhashtable_lookup directly.
Switch to using rhashtable_lookup_insert_key as well, and use the usual retry mechanism if we hit an -EEXIST. Rename the "retry" bool to open_retry, and eliminiate the insert_err goto target.
Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 52 +++++++++++++++++++-------------------------- 1 file changed, 22 insertions(+), 30 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index a0d93e797cdce..0b19eb015c6c8 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -1041,9 +1041,10 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, .need = may_flags & NFSD_FILE_MAY_MASK, .net = SVC_NET(rqstp), }; - struct nfsd_file *nf, *new; - bool retry = true; + bool open_retry = true; + struct nfsd_file *nf; __be32 status; + int ret;
status = fh_verify(rqstp, fhp, S_IFREG, may_flags|NFSD_MAY_OWNER_OVERRIDE); @@ -1053,35 +1054,33 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, key.cred = get_current_cred();
retry: - /* Avoid allocation if the item is already in cache */ - nf = rhashtable_lookup_fast(&nfsd_file_rhash_tbl, &key, - nfsd_file_rhash_params); + rcu_read_lock(); + nf = rhashtable_lookup(&nfsd_file_rhash_tbl, &key, + nfsd_file_rhash_params); if (nf) nf = nfsd_file_get(nf); + rcu_read_unlock(); if (nf) goto wait_for_construction;
- new = nfsd_file_alloc(&key, may_flags); - if (!new) { + nf = nfsd_file_alloc(&key, may_flags); + if (!nf) { status = nfserr_jukebox; goto out_status; }
- nf = rhashtable_lookup_get_insert_key(&nfsd_file_rhash_tbl, - &key, &new->nf_rhash, - nfsd_file_rhash_params); - if (!nf) { - nf = new; - goto open_file; - } - if (IS_ERR(nf)) - goto insert_err; - nf = nfsd_file_get(nf); - if (nf == NULL) { - nf = new; + ret = rhashtable_lookup_insert_key(&nfsd_file_rhash_tbl, + &key, &nf->nf_rhash, + nfsd_file_rhash_params); + if (likely(ret == 0)) goto open_file; - } - nfsd_file_slab_free(&new->nf_rcu); + + nfsd_file_slab_free(&nf->nf_rcu); + if (ret == -EEXIST) + goto retry; + trace_nfsd_file_insert_err(rqstp, key.inode, may_flags, ret); + status = nfserr_jukebox; + goto out_status;
wait_for_construction: wait_on_bit(&nf->nf_flags, NFSD_FILE_PENDING, TASK_UNINTERRUPTIBLE); @@ -1089,11 +1088,11 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, /* Did construction of this file fail? */ if (!test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) { trace_nfsd_file_cons_err(rqstp, key.inode, may_flags, nf); - if (!retry) { + if (!open_retry) { status = nfserr_jukebox; goto out; } - retry = false; + open_retry = false; nfsd_file_put_noref(nf); goto retry; } @@ -1141,13 +1140,6 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, smp_mb__after_atomic(); wake_up_bit(&nf->nf_flags, NFSD_FILE_PENDING); goto out; - -insert_err: - nfsd_file_slab_free(&new->nf_rcu); - trace_nfsd_file_insert_err(rqstp, key.inode, may_flags, PTR_ERR(nf)); - nf = NULL; - status = nfserr_jukebox; - goto out_status; }
/**
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
[ Upstream commit bd86c69dae65de30f6d47249418ba7889809e31a ]
syzbot is reporting UAF read at register_shrinker_prepared() [1], for commit 7746b32f467b3813 ("NFSD: add shrinker to reap courtesy clients on low memory condition") missed that nfsd4_leases_net_shutdown() from nfsd_exit_net() is called only when nfsd_init_net() succeeded. If nfsd_init_net() fails due to nfsd_reply_cache_init() failure, register_shrinker() from nfsd4_init_leases_net() has to be undone before nfsd_init_net() returns.
Link: https://syzkaller.appspot.com/bug?extid=ff796f04613b4c84ad89 [1] Reported-by: syzbot syzbot+ff796f04613b4c84ad89@syzkaller.appspotmail.com Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Fixes: 7746b32f467b3813 ("NFSD: add shrinker to reap courtesy clients on low memory condition") Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsctl.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 6a29bcfc93909..dc74a947a440c 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -1458,12 +1458,14 @@ static __net_init int nfsd_init_net(struct net *net) goto out_drc_error; retval = nfsd_reply_cache_init(nn); if (retval) - goto out_drc_error; + goto out_cache_error; get_random_bytes(&nn->siphash_key, sizeof(nn->siphash_key)); seqlock_init(&nn->writeverf_lock);
return 0;
+out_cache_error: + nfsd4_leases_net_shutdown(nn); out_drc_error: nfsd_idmap_shutdown(net); out_idmap_error:
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit d3aefd2b29ff5ffdeb5c06a7d3191a027a18cdb8 ]
If the namespace doesn't match the one in "net", then we'll continue, but that doesn't cause another rhashtable_walk_next call, so it will loop infinitely.
Fixes: ce502f81ba88 ("NFSD: Convert the filecache to use rhashtable") Reported-by: Petr Vorel pvorel@suse.cz Link: https://lore.kernel.org/ltp/Y1%2FP8gDAcWC%2F+VR3@pevik/ Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 0b19eb015c6c8..024adcbe67e95 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -892,9 +892,8 @@ __nfsd_file_cache_purge(struct net *net)
nf = rhashtable_walk_next(&iter); while (!IS_ERR_OR_NULL(nf)) { - if (net && nf->nf_net != net) - continue; - nfsd_file_unhash_and_dispose(nf, &dispose); + if (!net || nf->nf_net == net) + nfsd_file_unhash_and_dispose(nf, &dispose); nf = rhashtable_walk_next(&iter); }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit bdd6b5624c62d0acd350d07564f1c82fe649235f ]
When we fail to insert into the hashtable with a non-retryable error, we'll free the object and then goto out_status. If the tracepoint is enabled, it'll end up accessing the freed object when it tries to grab the fields out of it.
Set nf to NULL after freeing it to avoid the issue.
Fixes: 243a5263014a ("nfsd: rework hashtable handling in nfsd_do_file_acquire") Reported-by: kernel test robot lkp@intel.com Reported-by: Dan Carpenter error27@gmail.com Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 024adcbe67e95..dceb522f5cee9 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -1075,6 +1075,7 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, goto open_file;
nfsd_file_slab_free(&nf->nf_rcu); + nf = NULL; if (ret == -EEXIST) goto retry; trace_nfsd_file_insert_err(rqstp, key.inode, may_flags, ret);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 50256e4793a5e5ab77703c82a47344ad2e774a59 ]
nfsd_lookup_dentry returns an export reference in addition to the dentry ref. Ensure that we put it too.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2138866 Fixes: 876c553cb410 ("NFSD: verify the opened dentry after setting a delegation") Reported-by: Yongcheng Yang yoyang@redhat.com Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 948ef17178158..10915c72c7815 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -5397,6 +5397,7 @@ nfsd4_verify_deleg_dentry(struct nfsd4_open *open, struct nfs4_file *fp, if (err) return -EAGAIN;
+ exp_put(exp); dput(child); if (child != file_dentry(fp->fi_deleg_file->nf_file)) return -EAGAIN;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit ac8db824ead0de2e9111337c401409d010fba2f0 ]
This was found when virtual machines with nfs-mounted qcow2 disks failed to boot properly.
Reported-by: Anders Blomdell anders.blomdell@control.lth.se Suggested-by: Al Viro viro@zeniv.linux.org.uk Link: https://bugzilla.redhat.com/show_bug.cgi?id=2142132 Fixes: bfbfb6182ad1 ("nfsd_splice_actor(): handle compound pages") [ cel: "‘for’ loop initial declarations are only allowed in C99 or C11 mode" ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/vfs.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index e29034b1e6128..4ff626c912cc3 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -888,11 +888,11 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf, struct svc_rqst *rqstp = sd->u.data; struct page *page = buf->page; // may be a compound one unsigned offset = buf->offset; - int i; + struct page *last_page;
- page += offset / PAGE_SIZE; - for (i = sd->len; i > 0; i -= PAGE_SIZE) - svc_rqst_replace_page(rqstp, page++); + last_page = page + (offset + sd->len - 1) / PAGE_SIZE; + for (page += offset / PAGE_SIZE; page <= last_page; page++) + svc_rqst_replace_page(rqstp, page); if (rqstp->rq_res.page_len == 0) // first call rqstp->rq_res.page_base = offset % PAGE_SIZE; rqstp->rq_res.page_len += sd->len;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 401a8b8fd5acd51582b15238d72a8d0edd580e9f ]
There are a number of places in the kernel that are accessing the inode->i_flctx field without smp_load_acquire. This is required to ensure that the caller doesn't see a partially-initialized structure.
Add a new accessor function for it to make this clear and convert all of the relevant accesses in locks.c to use it. Also, convert locks_free_lock_context to use the helper as well instead of just doing a "bare" assignment.
Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/locks.c | 24 ++++++++++++------------ include/linux/fs.h | 14 ++++++++++++++ 2 files changed, 26 insertions(+), 12 deletions(-)
diff --git a/fs/locks.c b/fs/locks.c index 13a3ba97b73d1..b0753c8871fb2 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -251,7 +251,7 @@ locks_get_lock_context(struct inode *inode, int type) struct file_lock_context *ctx;
/* paired with cmpxchg() below */ - ctx = smp_load_acquire(&inode->i_flctx); + ctx = locks_inode_context(inode); if (likely(ctx) || type == F_UNLCK) goto out;
@@ -270,7 +270,7 @@ locks_get_lock_context(struct inode *inode, int type) */ if (cmpxchg(&inode->i_flctx, NULL, ctx)) { kmem_cache_free(flctx_cache, ctx); - ctx = smp_load_acquire(&inode->i_flctx); + ctx = locks_inode_context(inode); } out: trace_locks_get_lock_context(inode, type, ctx); @@ -323,7 +323,7 @@ locks_check_ctx_file_list(struct file *filp, struct list_head *list, void locks_free_lock_context(struct inode *inode) { - struct file_lock_context *ctx = inode->i_flctx; + struct file_lock_context *ctx = locks_inode_context(inode);
if (unlikely(ctx)) { locks_check_ctx_lists(inode); @@ -985,7 +985,7 @@ posix_test_lock(struct file *filp, struct file_lock *fl) void *owner; void (*func)(void);
- ctx = smp_load_acquire(&inode->i_flctx); + ctx = locks_inode_context(inode); if (!ctx || list_empty_careful(&ctx->flc_posix)) { fl->fl_type = F_UNLCK; return; @@ -1674,7 +1674,7 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type) new_fl->fl_flags = type;
/* typically we will check that ctx is non-NULL before calling */ - ctx = smp_load_acquire(&inode->i_flctx); + ctx = locks_inode_context(inode); if (!ctx) { WARN_ON_ONCE(1); goto free_lock; @@ -1779,7 +1779,7 @@ void lease_get_mtime(struct inode *inode, struct timespec64 *time) struct file_lock_context *ctx; struct file_lock *fl;
- ctx = smp_load_acquire(&inode->i_flctx); + ctx = locks_inode_context(inode); if (ctx && !list_empty_careful(&ctx->flc_lease)) { spin_lock(&ctx->flc_lock); fl = list_first_entry_or_null(&ctx->flc_lease, @@ -1825,7 +1825,7 @@ int fcntl_getlease(struct file *filp) int type = F_UNLCK; LIST_HEAD(dispose);
- ctx = smp_load_acquire(&inode->i_flctx); + ctx = locks_inode_context(inode); if (ctx && !list_empty_careful(&ctx->flc_lease)) { percpu_down_read(&file_rwsem); spin_lock(&ctx->flc_lock); @@ -2014,7 +2014,7 @@ static int generic_delete_lease(struct file *filp, void *owner) struct file_lock_context *ctx; LIST_HEAD(dispose);
- ctx = smp_load_acquire(&inode->i_flctx); + ctx = locks_inode_context(inode); if (!ctx) { trace_generic_delete_lease(inode, NULL); return error; @@ -2765,7 +2765,7 @@ void locks_remove_posix(struct file *filp, fl_owner_t owner) * posix_lock_file(). Another process could be setting a lock on this * file at the same time, but we wouldn't remove that lock anyway. */ - ctx = smp_load_acquire(&inode->i_flctx); + ctx = locks_inode_context(inode); if (!ctx || list_empty(&ctx->flc_posix)) return;
@@ -2838,7 +2838,7 @@ void locks_remove_file(struct file *filp) { struct file_lock_context *ctx;
- ctx = smp_load_acquire(&locks_inode(filp)->i_flctx); + ctx = locks_inode_context(locks_inode(filp)); if (!ctx) return;
@@ -2885,7 +2885,7 @@ bool vfs_inode_has_locks(struct inode *inode) struct file_lock_context *ctx; bool ret;
- ctx = smp_load_acquire(&inode->i_flctx); + ctx = locks_inode_context(inode); if (!ctx) return false;
@@ -3030,7 +3030,7 @@ void show_fd_locks(struct seq_file *f, struct file_lock_context *ctx; int id = 0;
- ctx = smp_load_acquire(&inode->i_flctx); + ctx = locks_inode_context(inode); if (!ctx) return;
diff --git a/include/linux/fs.h b/include/linux/fs.h index 7e7098bf2c57e..6a26ef54ac25d 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1168,6 +1168,13 @@ extern void show_fd_locks(struct seq_file *f, struct file *filp, struct files_struct *files); extern bool locks_owner_has_blockers(struct file_lock_context *flctx, fl_owner_t owner); + +static inline struct file_lock_context * +locks_inode_context(const struct inode *inode) +{ + return smp_load_acquire(&inode->i_flctx); +} + #else /* !CONFIG_FILE_LOCKING */ static inline int fcntl_getlk(struct file *file, unsigned int cmd, struct flock __user *user) @@ -1313,6 +1320,13 @@ static inline bool locks_owner_has_blockers(struct file_lock_context *flctx, { return false; } + +static inline struct file_lock_context * +locks_inode_context(const struct inode *inode) +{ + return NULL; +} + #endif /* !CONFIG_FILE_LOCKING */
static inline struct inode *file_inode(const struct file *f)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 98b41ffe0afdfeaa1439a5d6bd2db4a94277e31b ]
lockd currently doesn't access i_flctx safely. This requires a smp_load_acquire, as the pointer is set via cmpxchg (a release operation).
Cc: Trond Myklebust trond.myklebust@hammerspace.com Cc: Anna Schumaker anna@kernel.org Cc: Chuck Lever chuck.lever@oracle.com Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svcsubs.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c index e1c4617de7714..720684345817c 100644 --- a/fs/lockd/svcsubs.c +++ b/fs/lockd/svcsubs.c @@ -207,7 +207,7 @@ nlm_traverse_locks(struct nlm_host *host, struct nlm_file *file, { struct inode *inode = nlmsvc_file_inode(file); struct file_lock *fl; - struct file_lock_context *flctx = inode->i_flctx; + struct file_lock_context *flctx = locks_inode_context(inode); struct nlm_host *lockhost;
if (!flctx || list_empty_careful(&flctx->flc_posix)) @@ -262,7 +262,7 @@ nlm_file_inuse(struct nlm_file *file) { struct inode *inode = nlmsvc_file_inode(file); struct file_lock *fl; - struct file_lock_context *flctx = inode->i_flctx; + struct file_lock_context *flctx = locks_inode_context(inode);
if (file->f_count || !list_empty(&file->f_blocks) || file->f_shares) return 1;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 77c67530e1f95ac25c7075635f32f04367380894 ]
nfsd currently doesn't access i_flctx safely everywhere. This requires a smp_load_acquire, as the pointer is set via cmpxchg (a release operation).
Acked-by: Chuck Lever chuck.lever@oracle.com Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 10915c72c7815..2d09977fee83d 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4773,7 +4773,7 @@ nfs4_share_conflict(struct svc_fh *current_fh, unsigned int deny_type)
static bool nfsd4_deleg_present(const struct inode *inode) { - struct file_lock_context *ctx = smp_load_acquire(&inode->i_flctx); + struct file_lock_context *ctx = locks_inode_context(inode);
return ctx && !list_empty_careful(&ctx->flc_lease); } @@ -5912,7 +5912,7 @@ nfs4_lockowner_has_blockers(struct nfs4_lockowner *lo)
list_for_each_entry(stp, &lo->lo_owner.so_stateids, st_perstateowner) { nf = stp->st_stid.sc_file; - ctx = nf->fi_inode->i_flctx; + ctx = locks_inode_context(nf->fi_inode); if (!ctx) continue; if (locks_owner_has_blockers(ctx, lo)) @@ -7740,7 +7740,7 @@ check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner) }
inode = locks_inode(nf->nf_file); - flctx = inode->i_flctx; + flctx = locks_inode_context(inode);
if (flctx && !list_empty_careful(&flctx->flc_posix)) { spin_lock(&flctx->flc_lock);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Anna Schumaker Anna.Schumaker@Netapp.com
[ Upstream commit eeadcb75794516839078c28b3730132aeb700ce6 ]
Chuck had suggested reverting READ_PLUS so it returns a single DATA segment covering the requested read range. This prepares the server for a future "sparse read" function so support can easily be added without needing to rip out the old READ_PLUS code at the same time.
Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 139 +++++++++++----------------------------------- 1 file changed, 32 insertions(+), 107 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index fc587381cd087..27b38ce6f8c6e 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -4775,79 +4775,37 @@ nfsd4_encode_offload_status(struct nfsd4_compoundres *resp, __be32 nfserr,
static __be32 nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp, - struct nfsd4_read *read, - unsigned long *maxcount, u32 *eof, - loff_t *pos) + struct nfsd4_read *read) { - struct xdr_stream *xdr = resp->xdr; + bool splice_ok = test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags); struct file *file = read->rd_nf->nf_file; - int starting_len = xdr->buf->len; - loff_t hole_pos; - __be32 nfserr; - __be32 *p, tmp; - __be64 tmp64; - - hole_pos = pos ? *pos : vfs_llseek(file, read->rd_offset, SEEK_HOLE); - if (hole_pos > read->rd_offset) - *maxcount = min_t(unsigned long, *maxcount, hole_pos - read->rd_offset); - *maxcount = min_t(unsigned long, *maxcount, (xdr->buf->buflen - xdr->buf->len)); + struct xdr_stream *xdr = resp->xdr; + unsigned long maxcount; + __be32 nfserr, *p;
/* Content type, offset, byte count */ p = xdr_reserve_space(xdr, 4 + 8 + 4); if (!p) - return nfserr_resource; + return nfserr_io; + if (resp->xdr->buf->page_len && splice_ok) { + WARN_ON_ONCE(splice_ok); + return nfserr_serverfault; + }
- read->rd_vlen = xdr_reserve_space_vec(xdr, resp->rqstp->rq_vec, *maxcount); - if (read->rd_vlen < 0) - return nfserr_resource; + maxcount = min_t(unsigned long, read->rd_length, + (xdr->buf->buflen - xdr->buf->len));
- nfserr = nfsd_readv(resp->rqstp, read->rd_fhp, file, read->rd_offset, - resp->rqstp->rq_vec, read->rd_vlen, maxcount, eof); + if (file->f_op->splice_read && splice_ok) + nfserr = nfsd4_encode_splice_read(resp, read, file, maxcount); + else + nfserr = nfsd4_encode_readv(resp, read, file, maxcount); if (nfserr) return nfserr; - xdr_truncate_encode(xdr, starting_len + 16 + xdr_align_size(*maxcount)); - - tmp = htonl(NFS4_CONTENT_DATA); - write_bytes_to_xdr_buf(xdr->buf, starting_len, &tmp, 4); - tmp64 = cpu_to_be64(read->rd_offset); - write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp64, 8); - tmp = htonl(*maxcount); - write_bytes_to_xdr_buf(xdr->buf, starting_len + 12, &tmp, 4); - - tmp = xdr_zero; - write_bytes_to_xdr_buf(xdr->buf, starting_len + 16 + *maxcount, &tmp, - xdr_pad_size(*maxcount)); - return nfs_ok; -} - -static __be32 -nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp, - struct nfsd4_read *read, - unsigned long *maxcount, u32 *eof) -{ - struct file *file = read->rd_nf->nf_file; - loff_t data_pos = vfs_llseek(file, read->rd_offset, SEEK_DATA); - loff_t f_size = i_size_read(file_inode(file)); - unsigned long count; - __be32 *p; - - if (data_pos == -ENXIO) - data_pos = f_size; - else if (data_pos <= read->rd_offset || (data_pos < f_size && data_pos % PAGE_SIZE)) - return nfsd4_encode_read_plus_data(resp, read, maxcount, eof, &f_size); - count = data_pos - read->rd_offset;
- /* Content type, offset, byte count */ - p = xdr_reserve_space(resp->xdr, 4 + 8 + 8); - if (!p) - return nfserr_resource; - - *p++ = htonl(NFS4_CONTENT_HOLE); + *p++ = cpu_to_be32(NFS4_CONTENT_DATA); p = xdr_encode_hyper(p, read->rd_offset); - p = xdr_encode_hyper(p, count); + *p = cpu_to_be32(read->rd_length);
- *eof = (read->rd_offset + count) >= f_size; - *maxcount = min_t(unsigned long, count, *maxcount); return nfs_ok; }
@@ -4855,69 +4813,36 @@ static __be32 nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_read *read) { - unsigned long maxcount, count; + struct file *file = read->rd_nf->nf_file; struct xdr_stream *xdr = resp->xdr; - struct file *file; int starting_len = xdr->buf->len; - int last_segment = xdr->buf->len; - int segments = 0; - __be32 *p, tmp; - bool is_data; - loff_t pos; - u32 eof; + u32 segments = 0; + __be32 *p;
if (nfserr) return nfserr; - file = read->rd_nf->nf_file;
/* eof flag, segment count */ p = xdr_reserve_space(xdr, 4 + 4); if (!p) - return nfserr_resource; + return nfserr_io; xdr_commit_encode(xdr);
- maxcount = min_t(unsigned long, read->rd_length, - (xdr->buf->buflen - xdr->buf->len)); - count = maxcount; - - eof = read->rd_offset >= i_size_read(file_inode(file)); - if (eof) + read->rd_eof = read->rd_offset >= i_size_read(file_inode(file)); + if (read->rd_eof) goto out;
- pos = vfs_llseek(file, read->rd_offset, SEEK_HOLE); - is_data = pos > read->rd_offset; - - while (count > 0 && !eof) { - maxcount = count; - if (is_data) - nfserr = nfsd4_encode_read_plus_data(resp, read, &maxcount, &eof, - segments == 0 ? &pos : NULL); - else - nfserr = nfsd4_encode_read_plus_hole(resp, read, &maxcount, &eof); - if (nfserr) - goto out; - count -= maxcount; - read->rd_offset += maxcount; - is_data = !is_data; - last_segment = xdr->buf->len; - segments++; - } - -out: - if (nfserr && segments == 0) + nfserr = nfsd4_encode_read_plus_data(resp, read); + if (nfserr) { xdr_truncate_encode(xdr, starting_len); - else { - if (nfserr) { - xdr_truncate_encode(xdr, last_segment); - nfserr = nfs_ok; - eof = 0; - } - tmp = htonl(eof); - write_bytes_to_xdr_buf(xdr->buf, starting_len, &tmp, 4); - tmp = htonl(segments); - write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp, 4); + return nfserr; }
+ segments++; + +out: + p = xdr_encode_bool(p, read->rd_eof); + *p = cpu_to_be32(segments); return nfserr; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Colin Ian King colin.i.king@gmail.com
[ Upstream commit 69eed23baf877bbb1f14d7f4df54f89807c9ee2a ]
Variable host_err is assigned a value that is never read, it is being re-assigned a value in every different execution path in the following switch statement. The assignment is redundant and can be removed.
Cleans up clang-scan warning: warning: Value stored to 'host_err' is never read [deadcode.DeadStores]
Signed-off-by: Colin Ian King colin.i.king@gmail.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/vfs.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 4ff626c912cc3..aae81c5cecb94 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1322,7 +1322,6 @@ nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp, iap->ia_mode &= ~current_umask();
err = 0; - host_err = 0; switch (type) { case S_IFREG: host_err = vfs_create(dirp, dchild, iap->ia_mode, true);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit ea5021e911d3479346a75ac9b7d9dcd751b0fb99 ]
The xdr_stream conversion inadvertently left some code that set the page_len of the send buffer. The XDR stream encoders should handle this automatically now.
This oversight adds garbage past the end of the Reply message. Clients typically ignore the garbage, but NFSD does not need to send it, as it leaks stale memory contents onto the wire.
Fixes: f8cba47344f7 ("NFSD: Update the NFSv2 GETACL result encoder to use struct xdr_stream") Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs2acl.c | 10 ---------- 1 file changed, 10 deletions(-)
diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index f8179fc8c9bdf..9adf672dedbdd 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -244,7 +244,6 @@ nfsaclsvc_encode_getaclres(struct svc_rqst *rqstp, struct xdr_stream *xdr) struct nfsd3_getaclres *resp = rqstp->rq_resp; struct dentry *dentry = resp->fh.fh_dentry; struct inode *inode; - int w;
if (!svcxdr_encode_stat(xdr, resp->status)) return false; @@ -258,15 +257,6 @@ nfsaclsvc_encode_getaclres(struct svc_rqst *rqstp, struct xdr_stream *xdr) if (xdr_stream_encode_u32(xdr, resp->mask) < 0) return false;
- rqstp->rq_res.page_len = w = nfsacl_size( - (resp->mask & NFS_ACL) ? resp->acl_access : NULL, - (resp->mask & NFS_DFACL) ? resp->acl_default : NULL); - while (w > 0) { - if (!*(rqstp->rq_next_page++)) - return true; - w -= PAGE_SIZE; - } - if (!nfs_stream_encode_acl(xdr, inode, resp->acl_access, resp->mask & NFS_ACL, 0)) return false;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 841fd0a3cb490eae5dfd262eccb8c8b11d57f8b8 ]
For some reason, the NFSv2 GETACL result encoder was fully converted to use the new nfs_stream_encode_acl(), but the NFSv3 equivalent was not similarly converted.
Fixes: 20798dfe249a ("NFSD: Update the NFSv3 GETACL result encoder to use struct xdr_stream") Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3acl.c | 30 ++++++------------------------ 1 file changed, 6 insertions(+), 24 deletions(-)
diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c index 2c451fcd5a3cc..161f831b3a1b7 100644 --- a/fs/nfsd/nfs3acl.c +++ b/fs/nfsd/nfs3acl.c @@ -169,11 +169,7 @@ nfs3svc_encode_getaclres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_getaclres *resp = rqstp->rq_resp; struct dentry *dentry = resp->fh.fh_dentry; - struct kvec *head = rqstp->rq_res.head; struct inode *inode; - unsigned int base; - int n; - int w;
if (!svcxdr_encode_nfsstat3(xdr, resp->status)) return false; @@ -185,26 +181,12 @@ nfs3svc_encode_getaclres(struct svc_rqst *rqstp, struct xdr_stream *xdr) if (xdr_stream_encode_u32(xdr, resp->mask) < 0) return false;
- base = (char *)xdr->p - (char *)head->iov_base; - - rqstp->rq_res.page_len = w = nfsacl_size( - (resp->mask & NFS_ACL) ? resp->acl_access : NULL, - (resp->mask & NFS_DFACL) ? resp->acl_default : NULL); - while (w > 0) { - if (!*(rqstp->rq_next_page++)) - return false; - w -= PAGE_SIZE; - } - - n = nfsacl_encode(&rqstp->rq_res, base, inode, - resp->acl_access, - resp->mask & NFS_ACL, 0); - if (n > 0) - n = nfsacl_encode(&rqstp->rq_res, base + n, inode, - resp->acl_default, - resp->mask & NFS_DFACL, - NFS_ACL_DEFAULT); - if (n <= 0) + if (!nfs_stream_encode_acl(xdr, inode, resp->acl_access, + resp->mask & NFS_ACL, 0)) + return false; + if (!nfs_stream_encode_acl(xdr, inode, resp->acl_default, + resp->mask & NFS_DFACL, + NFS_ACL_DEFAULT)) return false; break; default:
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 8e823bafff2308753d430566256c83d8085952da ]
The kernel currently errors out if you attempt to enable or disable a version that it doesn't recognize. Change it to ignore attempts to disable an unrecognized version. If we don't support it, then there is no harm in doing so.
Signed-off-by: Jeff Layton jlayton@kernel.org Reviewed-by: Tom Talpey tom@talpey.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsctl.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index dc74a947a440c..68ed42fd29fc8 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -601,7 +601,9 @@ static ssize_t __write_versions(struct file *file, char *buf, size_t size) } break; default: - return -EINVAL; + /* Ignore requests to disable non-existent versions */ + if (cmd == NFSD_SET) + return -EINVAL; } vers += len + 1; } while ((len = qword_get(&mesg, vers, size)) > 0);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit cb12fae1c34b1fa7eaae92c5aadc72d86d7fae19 ]
nfserrno() is common to all nfs versions, but nfsproc.c is specifically for NFSv2. Move it to vfs.c, and the prototype to vfs.h.
While we're in here, remove the #ifdef EDQUOT check in this function. It's apparently a holdover from the initial merge of the nfsd code in 1997. No other place in the kernel checks that that symbol is defined before using it, so I think we can dispense with it here.
Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/blocklayout.c | 1 + fs/nfsd/blocklayoutxdr.c | 1 + fs/nfsd/export.h | 1 - fs/nfsd/flexfilelayout.c | 1 + fs/nfsd/nfs4idmap.c | 1 + fs/nfsd/nfsproc.c | 62 --------------------------------------- fs/nfsd/vfs.c | 63 ++++++++++++++++++++++++++++++++++++++++ fs/nfsd/vfs.h | 1 + 8 files changed, 68 insertions(+), 63 deletions(-)
diff --git a/fs/nfsd/blocklayout.c b/fs/nfsd/blocklayout.c index a07c39c94bbd0..d91a686d2f313 100644 --- a/fs/nfsd/blocklayout.c +++ b/fs/nfsd/blocklayout.c @@ -16,6 +16,7 @@ #include "blocklayoutxdr.h" #include "pnfs.h" #include "filecache.h" +#include "vfs.h"
#define NFSDDBG_FACILITY NFSDDBG_PNFS
diff --git a/fs/nfsd/blocklayoutxdr.c b/fs/nfsd/blocklayoutxdr.c index 2455dc8be18a8..1ed2f691ebb90 100644 --- a/fs/nfsd/blocklayoutxdr.c +++ b/fs/nfsd/blocklayoutxdr.c @@ -9,6 +9,7 @@
#include "nfsd.h" #include "blocklayoutxdr.h" +#include "vfs.h"
#define NFSDDBG_FACILITY NFSDDBG_PNFS
diff --git a/fs/nfsd/export.h b/fs/nfsd/export.h index ee0e3aba4a6e5..d03f7f6a8642d 100644 --- a/fs/nfsd/export.h +++ b/fs/nfsd/export.h @@ -115,7 +115,6 @@ struct svc_export * rqst_find_fsidzero_export(struct svc_rqst *); int exp_rootfh(struct net *, struct auth_domain *, char *path, struct knfsd_fh *, int maxsize); __be32 exp_pseudoroot(struct svc_rqst *, struct svc_fh *); -__be32 nfserrno(int errno);
static inline void exp_put(struct svc_export *exp) { diff --git a/fs/nfsd/flexfilelayout.c b/fs/nfsd/flexfilelayout.c index 2e2f1d5e9f623..fabc21ed68cea 100644 --- a/fs/nfsd/flexfilelayout.c +++ b/fs/nfsd/flexfilelayout.c @@ -15,6 +15,7 @@
#include "flexfilelayoutxdr.h" #include "pnfs.h" +#include "vfs.h"
#define NFSDDBG_FACILITY NFSDDBG_PNFS
diff --git a/fs/nfsd/nfs4idmap.c b/fs/nfsd/nfs4idmap.c index e70a1a2999b7b..5e9809aff37eb 100644 --- a/fs/nfsd/nfs4idmap.c +++ b/fs/nfsd/nfs4idmap.c @@ -41,6 +41,7 @@ #include "idmap.h" #include "nfsd.h" #include "netns.h" +#include "vfs.h"
/* * Turn off idmapping when using AUTH_SYS. diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 78fa5a9edf277..3777b5be4253a 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -848,65 +848,3 @@ const struct svc_version nfsd_version2 = { .vs_dispatch = nfsd_dispatch, .vs_xdrsize = NFS2_SVC_XDRSIZE, }; - -/* - * Map errnos to NFS errnos. - */ -__be32 -nfserrno (int errno) -{ - static struct { - __be32 nfserr; - int syserr; - } nfs_errtbl[] = { - { nfs_ok, 0 }, - { nfserr_perm, -EPERM }, - { nfserr_noent, -ENOENT }, - { nfserr_io, -EIO }, - { nfserr_nxio, -ENXIO }, - { nfserr_fbig, -E2BIG }, - { nfserr_stale, -EBADF }, - { nfserr_acces, -EACCES }, - { nfserr_exist, -EEXIST }, - { nfserr_xdev, -EXDEV }, - { nfserr_mlink, -EMLINK }, - { nfserr_nodev, -ENODEV }, - { nfserr_notdir, -ENOTDIR }, - { nfserr_isdir, -EISDIR }, - { nfserr_inval, -EINVAL }, - { nfserr_fbig, -EFBIG }, - { nfserr_nospc, -ENOSPC }, - { nfserr_rofs, -EROFS }, - { nfserr_mlink, -EMLINK }, - { nfserr_nametoolong, -ENAMETOOLONG }, - { nfserr_notempty, -ENOTEMPTY }, -#ifdef EDQUOT - { nfserr_dquot, -EDQUOT }, -#endif - { nfserr_stale, -ESTALE }, - { nfserr_jukebox, -ETIMEDOUT }, - { nfserr_jukebox, -ERESTARTSYS }, - { nfserr_jukebox, -EAGAIN }, - { nfserr_jukebox, -EWOULDBLOCK }, - { nfserr_jukebox, -ENOMEM }, - { nfserr_io, -ETXTBSY }, - { nfserr_notsupp, -EOPNOTSUPP }, - { nfserr_toosmall, -ETOOSMALL }, - { nfserr_serverfault, -ESERVERFAULT }, - { nfserr_serverfault, -ENFILE }, - { nfserr_io, -EREMOTEIO }, - { nfserr_stale, -EOPENSTALE }, - { nfserr_io, -EUCLEAN }, - { nfserr_perm, -ENOKEY }, - { nfserr_no_grace, -ENOGRACE}, - }; - int i; - - for (i = 0; i < ARRAY_SIZE(nfs_errtbl); i++) { - if (nfs_errtbl[i].syserr == errno) - return nfs_errtbl[i].nfserr; - } - WARN_ONCE(1, "nfsd: non-standard errno: %d\n", errno); - return nfserr_io; -} - diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index aae81c5cecb94..9bda56fe303cf 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -48,6 +48,69 @@
#define NFSDDBG_FACILITY NFSDDBG_FILEOP
+/** + * nfserrno - Map Linux errnos to NFS errnos + * @errno: POSIX(-ish) error code to be mapped + * + * Returns the appropriate (net-endian) nfserr_* (or nfs_ok if errno is 0). If + * it's an error we don't expect, log it once and return nfserr_io. + */ +__be32 +nfserrno (int errno) +{ + static struct { + __be32 nfserr; + int syserr; + } nfs_errtbl[] = { + { nfs_ok, 0 }, + { nfserr_perm, -EPERM }, + { nfserr_noent, -ENOENT }, + { nfserr_io, -EIO }, + { nfserr_nxio, -ENXIO }, + { nfserr_fbig, -E2BIG }, + { nfserr_stale, -EBADF }, + { nfserr_acces, -EACCES }, + { nfserr_exist, -EEXIST }, + { nfserr_xdev, -EXDEV }, + { nfserr_mlink, -EMLINK }, + { nfserr_nodev, -ENODEV }, + { nfserr_notdir, -ENOTDIR }, + { nfserr_isdir, -EISDIR }, + { nfserr_inval, -EINVAL }, + { nfserr_fbig, -EFBIG }, + { nfserr_nospc, -ENOSPC }, + { nfserr_rofs, -EROFS }, + { nfserr_mlink, -EMLINK }, + { nfserr_nametoolong, -ENAMETOOLONG }, + { nfserr_notempty, -ENOTEMPTY }, + { nfserr_dquot, -EDQUOT }, + { nfserr_stale, -ESTALE }, + { nfserr_jukebox, -ETIMEDOUT }, + { nfserr_jukebox, -ERESTARTSYS }, + { nfserr_jukebox, -EAGAIN }, + { nfserr_jukebox, -EWOULDBLOCK }, + { nfserr_jukebox, -ENOMEM }, + { nfserr_io, -ETXTBSY }, + { nfserr_notsupp, -EOPNOTSUPP }, + { nfserr_toosmall, -ETOOSMALL }, + { nfserr_serverfault, -ESERVERFAULT }, + { nfserr_serverfault, -ENFILE }, + { nfserr_io, -EREMOTEIO }, + { nfserr_stale, -EOPENSTALE }, + { nfserr_io, -EUCLEAN }, + { nfserr_perm, -ENOKEY }, + { nfserr_no_grace, -ENOGRACE}, + }; + int i; + + for (i = 0; i < ARRAY_SIZE(nfs_errtbl); i++) { + if (nfs_errtbl[i].syserr == errno) + return nfs_errtbl[i].nfserr; + } + WARN_ONCE(1, "nfsd: non-standard errno: %d\n", errno); + return nfserr_io; +} + /* * Called from nfsd_lookup and encode_dirent. Check if we have crossed * a mount point. diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h index 120521bc7b247..8ddd687f83599 100644 --- a/fs/nfsd/vfs.h +++ b/fs/nfsd/vfs.h @@ -60,6 +60,7 @@ static inline void nfsd_attrs_free(struct nfsd_attrs *attrs) posix_acl_release(attrs->na_dpacl); }
+__be32 nfserrno (int errno); int nfsd_cross_mnt(struct svc_rqst *rqstp, struct dentry **dpp, struct svc_export **expp); __be32 nfsd_lookup(struct svc_rqst *, struct svc_fh *,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 2f3a4b2ac2f28b9be78ad21f401f31e263845214 ]
rpc.nfsd stopped supporting NFSv2 a year ago. Take the next logical step toward deprecating it and allow NFSv2 support to be compiled out.
Add a new CONFIG_NFSD_V2 option that can be turned off and rework the CONFIG_NFSD_V?_ACL option dependencies. Add a description that discourages enabling it.
Also, change the description of CONFIG_NFSD to state that the always-on version is now 3 instead of 2.
Finally, add an #ifdef around "case 2:" in __write_versions. When NFSv2 is disabled at compile time, this should make the kernel ignore attempts to disable it at runtime, but still error out when trying to enable it.
Signed-off-by: Jeff Layton jlayton@kernel.org Reviewed-by: Tom Talpey tom@talpey.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/Kconfig | 19 +++++++++++++++---- fs/nfsd/Makefile | 5 +++-- fs/nfsd/nfsctl.c | 2 ++ fs/nfsd/nfsd.h | 3 +-- fs/nfsd/nfssvc.c | 6 ++++++ 5 files changed, 27 insertions(+), 8 deletions(-)
diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig index 887af7966b032..6d2d498a59573 100644 --- a/fs/nfsd/Kconfig +++ b/fs/nfsd/Kconfig @@ -8,6 +8,7 @@ config NFSD select SUNRPC select EXPORTFS select NFS_ACL_SUPPORT if NFSD_V2_ACL + select NFS_ACL_SUPPORT if NFSD_V3_ACL depends on MULTIUSER help Choose Y here if you want to allow other computers to access @@ -26,19 +27,29 @@ config NFSD
Below you can choose which versions of the NFS protocol are available to clients mounting the NFS server on this system. - Support for NFS version 2 (RFC 1094) is always available when + Support for NFS version 3 (RFC 1813) is always available when CONFIG_NFSD is selected.
If unsure, say N.
-config NFSD_V2_ACL - bool +config NFSD_V2 + bool "NFS server support for NFS version 2 (DEPRECATED)" depends on NFSD + default n + help + NFSv2 (RFC 1094) was the first publicly-released version of NFS. + Unless you are hosting ancient (1990's era) NFS clients, you don't + need this. + + If unsure, say N. + +config NFSD_V2_ACL + bool "NFS server support for the NFSv2 ACL protocol extension" + depends on NFSD_V2
config NFSD_V3_ACL bool "NFS server support for the NFSv3 ACL protocol extension" depends on NFSD - select NFSD_V2_ACL help Solaris NFS servers support an auxiliary NFSv3 ACL protocol that never became an official part of the NFS version 3 protocol. diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile index 805c06d5f1b4b..6fffc8f03f740 100644 --- a/fs/nfsd/Makefile +++ b/fs/nfsd/Makefile @@ -10,9 +10,10 @@ obj-$(CONFIG_NFSD) += nfsd.o # this one should be compiled first, as the tracing macros can easily blow up nfsd-y += trace.o
-nfsd-y += nfssvc.o nfsctl.o nfsproc.o nfsfh.o vfs.o \ - export.o auth.o lockd.o nfscache.o nfsxdr.o \ +nfsd-y += nfssvc.o nfsctl.o nfsfh.o vfs.o \ + export.o auth.o lockd.o nfscache.o \ stats.o filecache.o nfs3proc.o nfs3xdr.o +nfsd-$(CONFIG_NFSD_V2) += nfsproc.o nfsxdr.o nfsd-$(CONFIG_NFSD_V2_ACL) += nfs2acl.o nfsd-$(CONFIG_NFSD_V3_ACL) += nfs3acl.o nfsd-$(CONFIG_NFSD_V4) += nfs4proc.o nfs4xdr.o nfs4state.o nfs4idmap.o \ diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 68ed42fd29fc8..d1e581a60480c 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -581,7 +581,9 @@ static ssize_t __write_versions(struct file *file, char *buf, size_t size)
cmd = sign == '-' ? NFSD_CLEAR : NFSD_SET; switch(num) { +#ifdef CONFIG_NFSD_V2 case 2: +#endif case 3: nfsd_vers(nn, num, cmd); break; diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 09726c5b9a317..93b42ef9ed91b 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -64,8 +64,7 @@ struct readdir_cd {
extern struct svc_program nfsd_program; -extern const struct svc_version nfsd_version2, nfsd_version3, - nfsd_version4; +extern const struct svc_version nfsd_version2, nfsd_version3, nfsd_version4; extern struct mutex nfsd_mutex; extern spinlock_t nfsd_drc_lock; extern unsigned long nfsd_drc_max_mem; diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 8b1afde192118..429f38c986280 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -91,8 +91,12 @@ unsigned long nfsd_drc_mem_used; #if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) static struct svc_stat nfsd_acl_svcstats; static const struct svc_version *nfsd_acl_version[] = { +# if defined(CONFIG_NFSD_V2_ACL) [2] = &nfsd_acl_version2, +# endif +# if defined(CONFIG_NFSD_V3_ACL) [3] = &nfsd_acl_version3, +# endif };
#define NFSD_ACL_MINVERS 2 @@ -116,7 +120,9 @@ static struct svc_stat nfsd_acl_svcstats = { #endif /* defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) */
static const struct svc_version *nfsd_version[] = { +#if defined(CONFIG_NFSD_V2) [2] = &nfsd_version2, +#endif [3] = &nfsd_version3, #if defined(CONFIG_NFSD_V4) [4] = &nfsd_version4,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Disseldorp ddiss@suse.de
[ Upstream commit 427505ffeaa464f683faba945a88d3e3248f6979 ]
expfs.c has a bunch of dprintk statements which are unusable due to: #define dprintk(fmt, args...) do{}while(0) Use pr_debug so that they can be enabled dynamically. Also make some minor changes to the debug statements to fix some incorrect types, and remove __func__ which can be handled by dynamic debug separately.
Signed-off-by: David Disseldorp ddiss@suse.de Reviewed-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/exportfs/expfs.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c index 0106eba46d5af..8c28bd1c9ed94 100644 --- a/fs/exportfs/expfs.c +++ b/fs/exportfs/expfs.c @@ -18,7 +18,7 @@ #include <linux/sched.h> #include <linux/cred.h>
-#define dprintk(fmt, args...) do{}while(0) +#define dprintk(fmt, args...) pr_debug(fmt, ##args)
static int get_name(const struct path *path, char *name, struct dentry *child); @@ -132,8 +132,8 @@ static struct dentry *reconnect_one(struct vfsmount *mnt, inode_unlock(dentry->d_inode);
if (IS_ERR(parent)) { - dprintk("%s: get_parent of %ld failed, err %d\n", - __func__, dentry->d_inode->i_ino, PTR_ERR(parent)); + dprintk("get_parent of %lu failed, err %ld\n", + dentry->d_inode->i_ino, PTR_ERR(parent)); return parent; }
@@ -147,7 +147,7 @@ static struct dentry *reconnect_one(struct vfsmount *mnt, dprintk("%s: found name: %s\n", __func__, nbuf); tmp = lookup_one_len_unlocked(nbuf, parent, strlen(nbuf)); if (IS_ERR(tmp)) { - dprintk("%s: lookup failed: %d\n", __func__, PTR_ERR(tmp)); + dprintk("lookup failed: %ld\n", PTR_ERR(tmp)); err = PTR_ERR(tmp); goto out_err; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit c252849082ff525af18b4f253b3c9ece94e951ed ]
In a moment I'm going to introduce separate nfsd_file types, one of which is garbage-collected; the other, not. The garbage-collected variety is to be used by NFSv2 and v3, and the non-garbage-collected variety is to be used by NFSv4.
nfsd_commit() is invoked by both NFSv3 and NFSv4 consumers. We want nfsd_commit() to find and use the correct variety of cached nfsd_file object for the NFS version that is in use.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Tested-by: Jeff Layton jlayton@kernel.org Reviewed-by: Jeff Layton jlayton@kernel.org Reviewed-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs3proc.c | 10 +++++++++- fs/nfsd/nfs4proc.c | 11 ++++++++++- fs/nfsd/vfs.c | 15 ++++----------- fs/nfsd/vfs.h | 3 ++- 4 files changed, 25 insertions(+), 14 deletions(-)
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index a9bf455aee821..a3a55b2be4f67 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -13,6 +13,7 @@ #include "cache.h" #include "xdr3.h" #include "vfs.h" +#include "filecache.h"
#define NFSDDBG_FACILITY NFSDDBG_PROC
@@ -763,6 +764,7 @@ nfsd3_proc_commit(struct svc_rqst *rqstp) { struct nfsd3_commitargs *argp = rqstp->rq_argp; struct nfsd3_commitres *resp = rqstp->rq_resp; + struct nfsd_file *nf;
dprintk("nfsd: COMMIT(3) %s %u@%Lu\n", SVCFH_fmt(&argp->fh), @@ -770,8 +772,14 @@ nfsd3_proc_commit(struct svc_rqst *rqstp) (unsigned long long) argp->offset);
fh_copy(&resp->fh, &argp->fh); - resp->status = nfsd_commit(rqstp, &resp->fh, argp->offset, + resp->status = nfsd_file_acquire(rqstp, &resp->fh, NFSD_MAY_WRITE | + NFSD_MAY_NOT_BREAK_LEASE, &nf); + if (resp->status) + goto out; + resp->status = nfsd_commit(rqstp, &resp->fh, nf, argp->offset, argp->count, resp->verf); + nfsd_file_put(nf); +out: return rpc_success; }
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 50fd4ba04a3e0..3fe1966ed7358 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -731,10 +731,19 @@ nfsd4_commit(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, union nfsd4_op_u *u) { struct nfsd4_commit *commit = &u->commit; + struct nfsd_file *nf; + __be32 status;
- return nfsd_commit(rqstp, &cstate->current_fh, commit->co_offset, + status = nfsd_file_acquire(rqstp, &cstate->current_fh, NFSD_MAY_WRITE | + NFSD_MAY_NOT_BREAK_LEASE, &nf); + if (status != nfs_ok) + return status; + + status = nfsd_commit(rqstp, &cstate->current_fh, nf, commit->co_offset, commit->co_count, (__be32 *)commit->co_verf.data); + nfsd_file_put(nf); + return status; }
static __be32 diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 9bda56fe303cf..3278dddd11ba9 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1213,6 +1213,7 @@ nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t offset, * nfsd_commit - Commit pending writes to stable storage * @rqstp: RPC request being processed * @fhp: NFS filehandle + * @nf: target file * @offset: raw offset from beginning of file * @count: raw count of bytes to sync * @verf: filled in with the server's current write verifier @@ -1229,19 +1230,13 @@ nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t offset, * An nfsstat value in network byte order. */ __be32 -nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, u64 offset, - u32 count, __be32 *verf) +nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf, + u64 offset, u32 count, __be32 *verf) { + __be32 err = nfs_ok; u64 maxbytes; loff_t start, end; struct nfsd_net *nn; - struct nfsd_file *nf; - __be32 err; - - err = nfsd_file_acquire(rqstp, fhp, - NFSD_MAY_WRITE|NFSD_MAY_NOT_BREAK_LEASE, &nf); - if (err) - goto out;
/* * Convert the client-provided (offset, count) range to a @@ -1282,8 +1277,6 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, u64 offset, } else nfsd_copy_write_verifier(verf, nn);
- nfsd_file_put(nf); -out: return err; }
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h index 8ddd687f83599..dbdfef7ae85bb 100644 --- a/fs/nfsd/vfs.h +++ b/fs/nfsd/vfs.h @@ -89,7 +89,8 @@ __be32 nfsd_access(struct svc_rqst *, struct svc_fh *, u32 *, u32 *); __be32 nfsd_create_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, struct svc_fh *resfhp, struct nfsd_attrs *iap); __be32 nfsd_commit(struct svc_rqst *rqst, struct svc_fh *fhp, - u64 offset, u32 count, __be32 *verf); + struct nfsd_file *nf, u64 offset, u32 count, + __be32 *verf); #ifdef CONFIG_NFSD_V4 __be32 nfsd_getxattr(struct svc_rqst *rqstp, struct svc_fh *fhp, char *name, void **bufp, int *lenp);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit dcf3f80965ca787c70def402cdf1553c93c75529 ]
This reverts commit 5e138c4a750dc140d881dab4a8804b094bbc08d2.
That commit attempted to make files available to other users as soon as all NFSv4 clients were done with them, rather than waiting until the filecache LRU had garbage collected them.
It gets the reference counting wrong, for one thing.
But it also misses that DELEGRETURN should release a file in the same fashion. In fact, any nfsd_file_put() on an file held open by an NFSv4 client needs potentially to release the file immediately...
Clear the way for implementing that idea.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Reviewed-by: Jeff Layton jlayton@kernel.org Reviewed-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 18 ------------------ fs/nfsd/filecache.h | 1 - fs/nfsd/nfs4state.c | 4 ++-- 3 files changed, 2 insertions(+), 21 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index dceb522f5cee9..e429fce894316 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -443,24 +443,6 @@ nfsd_file_put(struct nfsd_file *nf) nfsd_file_put_noref(nf); }
-/** - * nfsd_file_close - Close an nfsd_file - * @nf: nfsd_file to close - * - * If this is the final reference for @nf, free it immediately. - * This reflects an on-the-wire CLOSE or DELEGRETURN into the - * VFS and exported filesystem. - */ -void nfsd_file_close(struct nfsd_file *nf) -{ - nfsd_file_put(nf); - if (refcount_dec_if_one(&nf->nf_ref)) { - nfsd_file_unhash(nf); - nfsd_file_lru_remove(nf); - nfsd_file_free(nf); - } -} - struct nfsd_file * nfsd_file_get(struct nfsd_file *nf) { diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h index 357832bac736b..6b012ea4bd9da 100644 --- a/fs/nfsd/filecache.h +++ b/fs/nfsd/filecache.h @@ -52,7 +52,6 @@ void nfsd_file_cache_shutdown(void); int nfsd_file_cache_start_net(struct net *net); void nfsd_file_cache_shutdown_net(struct net *net); void nfsd_file_put(struct nfsd_file *nf); -void nfsd_file_close(struct nfsd_file *nf); struct nfsd_file *nfsd_file_get(struct nfsd_file *nf); void nfsd_file_close_inode_sync(struct inode *inode); bool nfsd_file_is_cached(struct inode *inode); diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 2d09977fee83d..80d8f40d1f126 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -842,9 +842,9 @@ static void __nfs4_file_put_access(struct nfs4_file *fp, int oflag) swap(f2, fp->fi_fds[O_RDWR]); spin_unlock(&fp->fi_lock); if (f1) - nfsd_file_close(f1); + nfsd_file_put(f1); if (f2) - nfsd_file_close(f2); + nfsd_file_put(f2); } }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 4d1ea8455716ca070e3cd85767e6f6a562a58b1b ]
NFSv4 operations manage the lifetime of nfsd_file items they use by means of NFSv4 OPEN and CLOSE. Hence there's no need for them to be garbage collected.
Introduce a mechanism to enable garbage collection for nfsd_file items used only by NFSv2/3 callers.
Note that the change in nfsd_file_put() ensures that both CLOSE and DELEGRETURN will actually close out and free an nfsd_file on last reference of a non-garbage-collected file.
Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=394 Suggested-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Tested-by: Jeff Layton jlayton@kernel.org Reviewed-by: NeilBrown neilb@suse.de Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 63 +++++++++++++++++++++++++++++++++++++++------ fs/nfsd/filecache.h | 3 +++ fs/nfsd/nfs3proc.c | 4 +-- fs/nfsd/trace.h | 3 ++- fs/nfsd/vfs.c | 4 +-- 5 files changed, 64 insertions(+), 13 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index e429fce894316..13a25503b80e1 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -62,6 +62,7 @@ struct nfsd_file_lookup_key { struct net *net; const struct cred *cred; unsigned char need; + bool gc; enum nfsd_file_lookup_type type; };
@@ -161,6 +162,8 @@ static int nfsd_file_obj_cmpfn(struct rhashtable_compare_arg *arg, return 1; if (!nfsd_match_cred(nf->nf_cred, key->cred)) return 1; + if (!!test_bit(NFSD_FILE_GC, &nf->nf_flags) != key->gc) + return 1; if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) return 1; break; @@ -296,6 +299,8 @@ nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may) nf->nf_flags = 0; __set_bit(NFSD_FILE_HASHED, &nf->nf_flags); __set_bit(NFSD_FILE_PENDING, &nf->nf_flags); + if (key->gc) + __set_bit(NFSD_FILE_GC, &nf->nf_flags); nf->nf_inode = key->inode; /* nf_ref is pre-incremented for hash table */ refcount_set(&nf->nf_ref, 2); @@ -427,16 +432,27 @@ nfsd_file_put_noref(struct nfsd_file *nf) } }
+static void +nfsd_file_unhash_and_put(struct nfsd_file *nf) +{ + if (nfsd_file_unhash(nf)) + nfsd_file_put_noref(nf); +} + void nfsd_file_put(struct nfsd_file *nf) { might_sleep();
- nfsd_file_lru_add(nf); - if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) { + if (test_bit(NFSD_FILE_GC, &nf->nf_flags)) + nfsd_file_lru_add(nf); + else if (refcount_read(&nf->nf_ref) == 2) + nfsd_file_unhash_and_put(nf); + + if (!test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) { nfsd_file_flush(nf); nfsd_file_put_noref(nf); - } else if (nf->nf_file) { + } else if (nf->nf_file && test_bit(NFSD_FILE_GC, &nf->nf_flags)) { nfsd_file_put_noref(nf); nfsd_file_schedule_laundrette(); } else @@ -1015,12 +1031,14 @@ nfsd_file_is_cached(struct inode *inode)
static __be32 nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, - unsigned int may_flags, struct nfsd_file **pnf, bool open) + unsigned int may_flags, struct nfsd_file **pnf, + bool open, bool want_gc) { struct nfsd_file_lookup_key key = { .type = NFSD_FILE_KEY_FULL, .need = may_flags & NFSD_FILE_MAY_MASK, .net = SVC_NET(rqstp), + .gc = want_gc, }; bool open_retry = true; struct nfsd_file *nf; @@ -1116,14 +1134,35 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, * then unhash. */ if (status != nfs_ok || key.inode->i_nlink == 0) - if (nfsd_file_unhash(nf)) - nfsd_file_put_noref(nf); + nfsd_file_unhash_and_put(nf); clear_bit_unlock(NFSD_FILE_PENDING, &nf->nf_flags); smp_mb__after_atomic(); wake_up_bit(&nf->nf_flags, NFSD_FILE_PENDING); goto out; }
+/** + * nfsd_file_acquire_gc - Get a struct nfsd_file with an open file + * @rqstp: the RPC transaction being executed + * @fhp: the NFS filehandle of the file to be opened + * @may_flags: NFSD_MAY_ settings for the file + * @pnf: OUT: new or found "struct nfsd_file" object + * + * The nfsd_file object returned by this API is reference-counted + * and garbage-collected. The object is retained for a few + * seconds after the final nfsd_file_put() in case the caller + * wants to re-use it. + * + * Returns nfs_ok and sets @pnf on success; otherwise an nfsstat in + * network byte order is returned. + */ +__be32 +nfsd_file_acquire_gc(struct svc_rqst *rqstp, struct svc_fh *fhp, + unsigned int may_flags, struct nfsd_file **pnf) +{ + return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, true, true); +} + /** * nfsd_file_acquire - Get a struct nfsd_file with an open file * @rqstp: the RPC transaction being executed @@ -1131,6 +1170,10 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, * @may_flags: NFSD_MAY_ settings for the file * @pnf: OUT: new or found "struct nfsd_file" object * + * The nfsd_file_object returned by this API is reference-counted + * but not garbage-collected. The object is unhashed after the + * final nfsd_file_put(). + * * Returns nfs_ok and sets @pnf on success; otherwise an nfsstat in * network byte order is returned. */ @@ -1138,7 +1181,7 @@ __be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned int may_flags, struct nfsd_file **pnf) { - return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, true); + return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, true, false); }
/** @@ -1148,6 +1191,10 @@ nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, * @may_flags: NFSD_MAY_ settings for the file * @pnf: OUT: new or found "struct nfsd_file" object * + * The nfsd_file_object returned by this API is reference-counted + * but not garbage-collected. The object is released immediately + * one RCU grace period after the final nfsd_file_put(). + * * Returns nfs_ok and sets @pnf on success; otherwise an nfsstat in * network byte order is returned. */ @@ -1155,7 +1202,7 @@ __be32 nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned int may_flags, struct nfsd_file **pnf) { - return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, false); + return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, false, false); }
/* diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h index 6b012ea4bd9da..b7efb2c3ddb18 100644 --- a/fs/nfsd/filecache.h +++ b/fs/nfsd/filecache.h @@ -38,6 +38,7 @@ struct nfsd_file { #define NFSD_FILE_HASHED (0) #define NFSD_FILE_PENDING (1) #define NFSD_FILE_REFERENCED (2) +#define NFSD_FILE_GC (3) unsigned long nf_flags; struct inode *nf_inode; /* don't deref */ refcount_t nf_ref; @@ -55,6 +56,8 @@ void nfsd_file_put(struct nfsd_file *nf); struct nfsd_file *nfsd_file_get(struct nfsd_file *nf); void nfsd_file_close_inode_sync(struct inode *inode); bool nfsd_file_is_cached(struct inode *inode); +__be32 nfsd_file_acquire_gc(struct svc_rqst *rqstp, struct svc_fh *fhp, + unsigned int may_flags, struct nfsd_file **nfp); __be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned int may_flags, struct nfsd_file **nfp); __be32 nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp, diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index a3a55b2be4f67..19cf583096d9c 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -772,8 +772,8 @@ nfsd3_proc_commit(struct svc_rqst *rqstp) (unsigned long long) argp->offset);
fh_copy(&resp->fh, &argp->fh); - resp->status = nfsd_file_acquire(rqstp, &resp->fh, NFSD_MAY_WRITE | - NFSD_MAY_NOT_BREAK_LEASE, &nf); + resp->status = nfsd_file_acquire_gc(rqstp, &resp->fh, NFSD_MAY_WRITE | + NFSD_MAY_NOT_BREAK_LEASE, &nf); if (resp->status) goto out; resp->status = nfsd_commit(rqstp, &resp->fh, nf, argp->offset, diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 6803ac877ff70..03722414d6db8 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -730,7 +730,8 @@ DEFINE_CLID_EVENT(confirmed_r); __print_flags(val, "|", \ { 1 << NFSD_FILE_HASHED, "HASHED" }, \ { 1 << NFSD_FILE_PENDING, "PENDING" }, \ - { 1 << NFSD_FILE_REFERENCED, "REFERENCED"}) + { 1 << NFSD_FILE_REFERENCED, "REFERENCED"}, \ + { 1 << NFSD_FILE_GC, "GC"})
DECLARE_EVENT_CLASS(nfsd_file_class, TP_PROTO(struct nfsd_file *nf), diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 3278dddd11ba9..3d67dd7eab4b5 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1165,7 +1165,7 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp, __be32 err;
trace_nfsd_read_start(rqstp, fhp, offset, *count); - err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf); + err = nfsd_file_acquire_gc(rqstp, fhp, NFSD_MAY_READ, &nf); if (err) return err;
@@ -1197,7 +1197,7 @@ nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t offset,
trace_nfsd_write_start(rqstp, fhp, offset, *cnt);
- err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_WRITE, &nf); + err = nfsd_file_acquire_gc(rqstp, fhp, NFSD_MAY_WRITE, &nf); if (err) goto out;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit b3276c1f5b268ff56622e9e125b792b4c3dc03ac ]
Record what we've learned recently about the NFSD filecache in a documenting comment so our future selves don't forget what all this is for.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 13a25503b80e1..d681faf48cf85 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -2,6 +2,30 @@ * Open file cache. * * (c) 2015 - Jeff Layton jeff.layton@primarydata.com + * + * An nfsd_file object is a per-file collection of open state that binds + * together: + * - a struct file * + * - a user credential + * - a network namespace + * - a read-ahead context + * - monitoring for writeback errors + * + * nfsd_file objects are reference-counted. Consumers acquire a new + * object via the nfsd_file_acquire API. They manage their interest in + * the acquired object, and hence the object's reference count, via + * nfsd_file_get and nfsd_file_put. There are two varieties of nfsd_file + * object: + * + * * non-garbage-collected: When a consumer wants to precisely control + * the lifetime of a file's open state, it acquires a non-garbage- + * collected nfsd_file. The final nfsd_file_put releases the open + * state immediately. + * + * * garbage-collected: When a consumer does not control the lifetime + * of open state, it acquires a garbage-collected nfsd_file. The + * final nfsd_file_put allows the open state to linger for a period + * during which it may be re-used. */
#include <linux/hash.h>
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit eeff73f7c1c583f79a401284f46c619294859310 ]
Remove the lame-duck dprintk()s around nfs4_preprocess_stateid_op() call sites.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Tested-by: Jeff Layton jlayton@kernel.org Reviewed-by: Jeff Layton jlayton@kernel.org Reviewed-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 31 +++++++------------------------ 1 file changed, 7 insertions(+), 24 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 3fe1966ed7358..32fccca7de185 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -943,12 +943,7 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh, &read->rd_stateid, RD_STATE, &read->rd_nf, NULL); - if (status) { - dprintk("NFSD: nfsd4_read: couldn't process stateid!\n"); - goto out; - } - status = nfs_ok; -out: + read->rd_rqstp = rqstp; read->rd_fhp = &cstate->current_fh; return status; @@ -1117,10 +1112,8 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh, &setattr->sa_stateid, WR_STATE, NULL, NULL); - if (status) { - dprintk("NFSD: nfsd4_setattr: couldn't process stateid!\n"); + if (status) return status; - } } err = fh_want_write(&cstate->current_fh); if (err) @@ -1168,10 +1161,8 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, write->wr_offset, cnt); status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh, stateid, WR_STATE, &nf, NULL); - if (status) { - dprintk("NFSD: nfsd4_write: couldn't process stateid!\n"); + if (status) return status; - }
write->wr_how_written = write->wr_stable_how;
@@ -1202,17 +1193,13 @@ nfsd4_verify_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->save_fh, src_stateid, RD_STATE, src, NULL); - if (status) { - dprintk("NFSD: %s: couldn't process src stateid!\n", __func__); + if (status) goto out; - }
status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh, dst_stateid, WR_STATE, dst, NULL); - if (status) { - dprintk("NFSD: %s: couldn't process dst stateid!\n", __func__); + if (status) goto out_put_src; - }
/* fix up for NFS-specific error code */ if (!S_ISREG(file_inode((*src)->nf_file)->i_mode) || @@ -1949,10 +1936,8 @@ nfsd4_fallocate(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh, &fallocate->falloc_stateid, WR_STATE, &nf, NULL); - if (status != nfs_ok) { - dprintk("NFSD: nfsd4_fallocate: couldn't process stateid!\n"); + if (status != nfs_ok) return status; - }
status = nfsd4_vfs_fallocate(rqstp, &cstate->current_fh, nf->nf_file, fallocate->falloc_offset, @@ -2008,10 +1993,8 @@ nfsd4_seek(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh, &seek->seek_stateid, RD_STATE, &nf, NULL); - if (status) { - dprintk("NFSD: nfsd4_seek: couldn't process stateid!\n"); + if (status) return status; - }
switch (seek->seek_whence) { case NFS4_CONTENT_DATA:
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 20eee313ff4b8a7e71ae9560f5c4ba27cd763005 ]
Handing out a delegation stateid is recorded with the nfsd_deleg_read tracepoint, but there isn't a matching tracepoint for recording when the stateid is returned.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 1 + fs/nfsd/trace.h | 1 + 2 files changed, 2 insertions(+)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 80d8f40d1f126..a40a9a836fb1e 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -6929,6 +6929,7 @@ nfsd4_delegreturn(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (status) goto put_stateid;
+ trace_nfsd_deleg_return(stateid); wake_up_var(d_inode(cstate->current_fh.fh_dentry)); destroy_delegation(dp); put_stateid: diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 03722414d6db8..fe76d3b2c9286 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -517,6 +517,7 @@ DEFINE_STATEID_EVENT(layout_recall_release);
DEFINE_STATEID_EVENT(open); DEFINE_STATEID_EVENT(deleg_read); +DEFINE_STATEID_EVENT(deleg_return); DEFINE_STATEID_EVENT(deleg_recall);
DECLARE_EVENT_CLASS(nfsd_stateseqid_class,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit a1c74569bbde91299f24535abf711be5c84df9de ]
Delegation revocation is an exceptional event that is not otherwise visible externally (eg, no network traffic is emitted). Generate a trace record when it occurs so that revocation can be observed or other activity can be triggered. Example:
nfsd-1104 [005] 1912.002544: nfsd_stid_revoke: client 633c9343:4e82788d stateid 00000003:00000001 ref=2 type=DELEG
Trace infrastructure is provided for subsequent additional tracing related to nfs4_stid activity.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Tested-by: Jeff Layton jlayton@kernel.org Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 2 ++ fs/nfsd/trace.h | 55 +++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 57 insertions(+)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index a40a9a836fb1e..dcbf777dc58d3 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -1366,6 +1366,8 @@ static void revoke_delegation(struct nfs4_delegation *dp)
WARN_ON(!list_empty(&dp->dl_recall_lru));
+ trace_nfsd_stid_revoke(&dp->dl_stid); + if (clp->cl_minorversion) { spin_lock(&clp->cl_lock); dp->dl_stid.sc_type = NFS4_REVOKED_DELEG_STID; diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index fe76d3b2c9286..191b206379b76 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -550,6 +550,61 @@ DEFINE_EVENT(nfsd_stateseqid_class, nfsd_##name, \ DEFINE_STATESEQID_EVENT(preprocess); DEFINE_STATESEQID_EVENT(open_confirm);
+TRACE_DEFINE_ENUM(NFS4_OPEN_STID); +TRACE_DEFINE_ENUM(NFS4_LOCK_STID); +TRACE_DEFINE_ENUM(NFS4_DELEG_STID); +TRACE_DEFINE_ENUM(NFS4_CLOSED_STID); +TRACE_DEFINE_ENUM(NFS4_REVOKED_DELEG_STID); +TRACE_DEFINE_ENUM(NFS4_CLOSED_DELEG_STID); +TRACE_DEFINE_ENUM(NFS4_LAYOUT_STID); + +#define show_stid_type(x) \ + __print_flags(x, "|", \ + { NFS4_OPEN_STID, "OPEN" }, \ + { NFS4_LOCK_STID, "LOCK" }, \ + { NFS4_DELEG_STID, "DELEG" }, \ + { NFS4_CLOSED_STID, "CLOSED" }, \ + { NFS4_REVOKED_DELEG_STID, "REVOKED" }, \ + { NFS4_CLOSED_DELEG_STID, "CLOSED_DELEG" }, \ + { NFS4_LAYOUT_STID, "LAYOUT" }) + +DECLARE_EVENT_CLASS(nfsd_stid_class, + TP_PROTO( + const struct nfs4_stid *stid + ), + TP_ARGS(stid), + TP_STRUCT__entry( + __field(unsigned long, sc_type) + __field(int, sc_count) + __field(u32, cl_boot) + __field(u32, cl_id) + __field(u32, si_id) + __field(u32, si_generation) + ), + TP_fast_assign( + const stateid_t *stp = &stid->sc_stateid; + + __entry->sc_type = stid->sc_type; + __entry->sc_count = refcount_read(&stid->sc_count); + __entry->cl_boot = stp->si_opaque.so_clid.cl_boot; + __entry->cl_id = stp->si_opaque.so_clid.cl_id; + __entry->si_id = stp->si_opaque.so_id; + __entry->si_generation = stp->si_generation; + ), + TP_printk("client %08x:%08x stateid %08x:%08x ref=%d type=%s", + __entry->cl_boot, __entry->cl_id, + __entry->si_id, __entry->si_generation, + __entry->sc_count, show_stid_type(__entry->sc_type) + ) +); + +#define DEFINE_STID_EVENT(name) \ +DEFINE_EVENT(nfsd_stid_class, nfsd_stid_##name, \ + TP_PROTO(const struct nfs4_stid *stid), \ + TP_ARGS(stid)) + +DEFINE_STID_EVENT(revoke); + DECLARE_EVENT_CLASS(nfsd_clientid_class, TP_PROTO(const clientid_t *clid), TP_ARGS(clid),
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit b48f8056c034f28dd54668399f1d22be421b0bef ]
Enable callers to use const pointers where they are able to.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Tested-by: Jeff Layton jlayton@kernel.org Reviewed-by: Jeff Layton jlayton@kernel.org Reviewed-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsfh.h | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h index c3ae6414fc5cf..513e028b0bbee 100644 --- a/fs/nfsd/nfsfh.h +++ b/fs/nfsd/nfsfh.h @@ -220,7 +220,7 @@ __be32 fh_update(struct svc_fh *); void fh_put(struct svc_fh *);
static __inline__ struct svc_fh * -fh_copy(struct svc_fh *dst, struct svc_fh *src) +fh_copy(struct svc_fh *dst, const struct svc_fh *src) { WARN_ON(src->fh_dentry);
@@ -229,7 +229,7 @@ fh_copy(struct svc_fh *dst, struct svc_fh *src) }
static inline void -fh_copy_shallow(struct knfsd_fh *dst, struct knfsd_fh *src) +fh_copy_shallow(struct knfsd_fh *dst, const struct knfsd_fh *src) { dst->fh_size = src->fh_size; memcpy(&dst->fh_raw, &src->fh_raw, src->fh_size); @@ -243,7 +243,8 @@ fh_init(struct svc_fh *fhp, int maxsize) return fhp; }
-static inline bool fh_match(struct knfsd_fh *fh1, struct knfsd_fh *fh2) +static inline bool fh_match(const struct knfsd_fh *fh1, + const struct knfsd_fh *fh2) { if (fh1->fh_size != fh2->fh_size) return false; @@ -252,7 +253,8 @@ static inline bool fh_match(struct knfsd_fh *fh1, struct knfsd_fh *fh2) return true; }
-static inline bool fh_fsid_match(struct knfsd_fh *fh1, struct knfsd_fh *fh2) +static inline bool fh_fsid_match(const struct knfsd_fh *fh1, + const struct knfsd_fh *fh2) { if (fh1->fh_fsid_type != fh2->fh_fsid_type) return false;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 3fe828caddd81e68e9d29353c6e9285a658ca056 ]
Enable callers to use const pointers for type safety.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Reviewed-by: NeilBrown neilb@suse.de Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index dcbf777dc58d3..90505095d7e0c 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -721,7 +721,7 @@ static unsigned int ownerstr_hashval(struct xdr_netobj *ownername) #define FILE_HASH_BITS 8 #define FILE_HASH_SIZE (1 << FILE_HASH_BITS)
-static unsigned int file_hashval(struct svc_fh *fh) +static unsigned int file_hashval(const struct svc_fh *fh) { struct inode *inode = d_inode(fh->fh_dentry);
@@ -4686,7 +4686,7 @@ move_to_close_lru(struct nfs4_ol_stateid *s, struct net *net)
/* search file_hashtbl[] for file */ static struct nfs4_file * -find_file_locked(struct svc_fh *fh, unsigned int hashval) +find_file_locked(const struct svc_fh *fh, unsigned int hashval) { struct nfs4_file *fp;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 81a21fa3e7fdecb3c5b97014f0fc5a17d5806cae ]
Name this function more consistently. I'm going to use nfsd4_file_ and nfsd4_file_hash_ for these helpers.
Change the @fh parameter to be const pointer for better type safety.
Finally, move the hash insertion operation to the caller. This is typical for most other "init_object" type helpers, and it is where most of the other nfs4_file hash table operations are located.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Reviewed-by: NeilBrown neilb@suse.de Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 90505095d7e0c..d6e0052b3f682 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4277,11 +4277,9 @@ static struct nfs4_file *nfsd4_alloc_file(void) }
/* OPEN Share state helper functions */ -static void nfsd4_init_file(struct svc_fh *fh, unsigned int hashval, - struct nfs4_file *fp) -{ - lockdep_assert_held(&state_lock);
+static void nfsd4_file_init(const struct svc_fh *fh, struct nfs4_file *fp) +{ refcount_set(&fp->fi_ref, 1); spin_lock_init(&fp->fi_lock); INIT_LIST_HEAD(&fp->fi_stateids); @@ -4299,7 +4297,6 @@ static void nfsd4_init_file(struct svc_fh *fh, unsigned int hashval, INIT_LIST_HEAD(&fp->fi_lo_states); atomic_set(&fp->fi_lo_recalls, 0); #endif - hlist_add_head_rcu(&fp->fi_hash, &file_hashtbl[hashval]); }
void @@ -4717,7 +4714,8 @@ static struct nfs4_file *insert_file(struct nfs4_file *new, struct svc_fh *fh, fp->fi_aliased = alias_found = true; } if (likely(ret == NULL)) { - nfsd4_init_file(fh, hashval, new); + nfsd4_file_init(fh, new); + hlist_add_head_rcu(&new->fi_hash, &file_hashtbl[hashval]); new->fi_aliased = alias_found; ret = new; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 3341678f2fd6106055cead09e513fad6950a0d19 ]
Refactor to relocate hash deletion operation to a helper function that is close to most other nfs4_file data structure operations.
The "noinline" annotation will become useful in a moment when the hlist_del_rcu() is replaced with a more complex rhash remove operation. It also guarantees that hash remove operations can be traced with "-p function -l remove_nfs4_file_locked".
This also simplifies the organization of forward declarations: the to-be-added rhashtable and its param structure will be defined /after/ put_nfs4_file().
Signed-off-by: Chuck Lever chuck.lever@oracle.com Reviewed-by: NeilBrown neilb@suse.de Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index d6e0052b3f682..c464403c23a25 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -84,6 +84,7 @@ static bool check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner) static void nfs4_free_ol_stateid(struct nfs4_stid *stid); void nfsd4_end_grace(struct nfsd_net *nn); static void _free_cpntf_state_locked(struct nfsd_net *nn, struct nfs4_cpntf_state *cps); +static void nfsd4_file_hash_remove(struct nfs4_file *fi);
/* Locking: */
@@ -591,7 +592,7 @@ put_nfs4_file(struct nfs4_file *fi) might_lock(&state_lock);
if (refcount_dec_and_lock(&fi->fi_ref, &state_lock)) { - hlist_del_rcu(&fi->fi_hash); + nfsd4_file_hash_remove(fi); spin_unlock(&state_lock); WARN_ON_ONCE(!list_empty(&fi->fi_clnt_odstate)); WARN_ON_ONCE(!list_empty(&fi->fi_delegations)); @@ -4749,6 +4750,11 @@ find_or_add_file(struct nfs4_file *new, struct svc_fh *fh) return insert_file(new, fh, hashval); }
+static noinline_for_stack void nfsd4_file_hash_remove(struct nfs4_file *fi) +{ + hlist_del_rcu(&fi->fi_hash); +} + /* * Called to check deny when READ with all zero stateid or * WRITE with all zero or all one stateid
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 9270fc514ba7d415636b23bcb937573a1ce54f6a ]
Remove the call to find_file_locked() in insert_nfs4_file(). Tracing shows that over 99% of these calls return NULL. Thus it is not worth the expense of the extra bucket list traversal. insert_file() already deals correctly with the case where the item is already in the hash bucket.
Since nfsd4_file_hash_insert() is now just a wrapper around insert_file(), move the meat of insert_file() into nfsd4_file_hash_insert() and get rid of it.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Reviewed-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 64 ++++++++++++++++++++------------------------- 1 file changed, 28 insertions(+), 36 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index c464403c23a25..d2664aa4bde0d 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4698,24 +4698,42 @@ find_file_locked(const struct svc_fh *fh, unsigned int hashval) return NULL; }
-static struct nfs4_file *insert_file(struct nfs4_file *new, struct svc_fh *fh, - unsigned int hashval) +static struct nfs4_file * find_file(struct svc_fh *fh) { struct nfs4_file *fp; + unsigned int hashval = file_hashval(fh); + + rcu_read_lock(); + fp = find_file_locked(fh, hashval); + rcu_read_unlock(); + return fp; +} + +/* + * On hash insertion, identify entries with the same inode but + * distinct filehandles. They will all be in the same hash bucket + * because nfs4_file's are hashed by the address in the fi_inode + * field. + */ +static noinline_for_stack struct nfs4_file * +nfsd4_file_hash_insert(struct nfs4_file *new, const struct svc_fh *fhp) +{ + unsigned int hashval = file_hashval(fhp); struct nfs4_file *ret = NULL; bool alias_found = false; + struct nfs4_file *fi;
spin_lock(&state_lock); - hlist_for_each_entry_rcu(fp, &file_hashtbl[hashval], fi_hash, + hlist_for_each_entry_rcu(fi, &file_hashtbl[hashval], fi_hash, lockdep_is_held(&state_lock)) { - if (fh_match(&fp->fi_fhandle, &fh->fh_handle)) { - if (refcount_inc_not_zero(&fp->fi_ref)) - ret = fp; - } else if (d_inode(fh->fh_dentry) == fp->fi_inode) - fp->fi_aliased = alias_found = true; + if (fh_match(&fi->fi_fhandle, &fhp->fh_handle)) { + if (refcount_inc_not_zero(&fi->fi_ref)) + ret = fi; + } else if (d_inode(fhp->fh_dentry) == fi->fi_inode) + fi->fi_aliased = alias_found = true; } if (likely(ret == NULL)) { - nfsd4_file_init(fh, new); + nfsd4_file_init(fhp, new); hlist_add_head_rcu(&new->fi_hash, &file_hashtbl[hashval]); new->fi_aliased = alias_found; ret = new; @@ -4724,32 +4742,6 @@ static struct nfs4_file *insert_file(struct nfs4_file *new, struct svc_fh *fh, return ret; }
-static struct nfs4_file * find_file(struct svc_fh *fh) -{ - struct nfs4_file *fp; - unsigned int hashval = file_hashval(fh); - - rcu_read_lock(); - fp = find_file_locked(fh, hashval); - rcu_read_unlock(); - return fp; -} - -static struct nfs4_file * -find_or_add_file(struct nfs4_file *new, struct svc_fh *fh) -{ - struct nfs4_file *fp; - unsigned int hashval = file_hashval(fh); - - rcu_read_lock(); - fp = find_file_locked(fh, hashval); - rcu_read_unlock(); - if (fp) - return fp; - - return insert_file(new, fh, hashval); -} - static noinline_for_stack void nfsd4_file_hash_remove(struct nfs4_file *fi) { hlist_del_rcu(&fi->fi_hash); @@ -5641,7 +5633,7 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf * and check for delegations in the process of being recalled. * If not found, create the nfs4_file struct */ - fp = find_or_add_file(open->op_file, current_fh); + fp = nfsd4_file_hash_insert(open->op_file, current_fh); if (fp != open->op_file) { status = nfs4_check_deleg(cl, open, &dp); if (status)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 15424748001a9b5ea62b3e6ad45f0a8b27f01df9 ]
find_file() is now the only caller of find_file_locked(), so just fold these two together.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Reviewed-by: NeilBrown neilb@suse.de Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 36 +++++++++++++++--------------------- 1 file changed, 15 insertions(+), 21 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index d2664aa4bde0d..69329dc159f64 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4682,31 +4682,24 @@ move_to_close_lru(struct nfs4_ol_stateid *s, struct net *net) nfs4_put_stid(&last->st_stid); }
-/* search file_hashtbl[] for file */ -static struct nfs4_file * -find_file_locked(const struct svc_fh *fh, unsigned int hashval) +static noinline_for_stack struct nfs4_file * +nfsd4_file_hash_lookup(const struct svc_fh *fhp) { - struct nfs4_file *fp; + unsigned int hashval = file_hashval(fhp); + struct nfs4_file *fi;
- hlist_for_each_entry_rcu(fp, &file_hashtbl[hashval], fi_hash, - lockdep_is_held(&state_lock)) { - if (fh_match(&fp->fi_fhandle, &fh->fh_handle)) { - if (refcount_inc_not_zero(&fp->fi_ref)) - return fp; + rcu_read_lock(); + hlist_for_each_entry_rcu(fi, &file_hashtbl[hashval], fi_hash, + lockdep_is_held(&state_lock)) { + if (fh_match(&fi->fi_fhandle, &fhp->fh_handle)) { + if (refcount_inc_not_zero(&fi->fi_ref)) { + rcu_read_unlock(); + return fi; + } } } - return NULL; -} - -static struct nfs4_file * find_file(struct svc_fh *fh) -{ - struct nfs4_file *fp; - unsigned int hashval = file_hashval(fh); - - rcu_read_lock(); - fp = find_file_locked(fh, hashval); rcu_read_unlock(); - return fp; + return NULL; }
/* @@ -4757,9 +4750,10 @@ nfs4_share_conflict(struct svc_fh *current_fh, unsigned int deny_type) struct nfs4_file *fp; __be32 ret = nfs_ok;
- fp = find_file(current_fh); + fp = nfsd4_file_hash_lookup(current_fh); if (!fp) return ret; + /* Check for conflicting share reservations */ spin_lock(&fp->fi_lock); if (fp->fi_share_deny & deny_type)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit d47b295e8d76a4d69f0e2ea0cd8a79c9d3488280 ]
fh_match() is costly, especially when filehandles are large (as is the case for NFSv4). It needs to be used sparingly when searching data structures. Unfortunately, with common workloads, I see multiple thousands of objects stored in file_hashtbl[], which has just 256 buckets, making its bucket hash chains quite lengthy.
Walking long hash chains with the state_lock held blocks other activity that needs that lock. Sizable hash chains are a common occurrance once the server has handed out some delegations, for example -- IIUC, each delegated file is held open on the server by an nfs4_file object.
To help mitigate the cost of searching with fh_match(), replace the nfs4_file hash table with an rhashtable, which can dynamically resize its bucket array to minimize hash chain length.
The result of this modification is an improvement in the latency of NFSv4 operations, and the reduction of nfsd CPU utilization due to eliminating the cost of multiple calls to fh_match() and reducing the CPU cache misses incurred while walking long hash chains in the nfs4_file hash table.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Reviewed-by: NeilBrown neilb@suse.de Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 97 +++++++++++++++++++++++++++++---------------- fs/nfsd/state.h | 5 +-- 2 files changed, 63 insertions(+), 39 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 69329dc159f64..d490b19d95ddf 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -44,7 +44,9 @@ #include <linux/jhash.h> #include <linux/string_helpers.h> #include <linux/fsnotify.h> +#include <linux/rhashtable.h> #include <linux/nfs_ssc.h> + #include "xdr4.h" #include "xdr4cb.h" #include "vfs.h" @@ -589,11 +591,8 @@ static void nfsd4_free_file_rcu(struct rcu_head *rcu) void put_nfs4_file(struct nfs4_file *fi) { - might_lock(&state_lock); - - if (refcount_dec_and_lock(&fi->fi_ref, &state_lock)) { + if (refcount_dec_and_test(&fi->fi_ref)) { nfsd4_file_hash_remove(fi); - spin_unlock(&state_lock); WARN_ON_ONCE(!list_empty(&fi->fi_clnt_odstate)); WARN_ON_ONCE(!list_empty(&fi->fi_delegations)); call_rcu(&fi->fi_rcu, nfsd4_free_file_rcu); @@ -718,19 +717,20 @@ static unsigned int ownerstr_hashval(struct xdr_netobj *ownername) return ret & OWNER_HASH_MASK; }
-/* hash table for nfs4_file */ -#define FILE_HASH_BITS 8 -#define FILE_HASH_SIZE (1 << FILE_HASH_BITS) - -static unsigned int file_hashval(const struct svc_fh *fh) -{ - struct inode *inode = d_inode(fh->fh_dentry); +static struct rhltable nfs4_file_rhltable ____cacheline_aligned_in_smp;
- /* XXX: why not (here & in file cache) use inode? */ - return (unsigned int)hash_long(inode->i_ino, FILE_HASH_BITS); -} +static const struct rhashtable_params nfs4_file_rhash_params = { + .key_len = sizeof_field(struct nfs4_file, fi_inode), + .key_offset = offsetof(struct nfs4_file, fi_inode), + .head_offset = offsetof(struct nfs4_file, fi_rlist),
-static struct hlist_head file_hashtbl[FILE_HASH_SIZE]; + /* + * Start with a single page hash table to reduce resizing churn + * on light workloads. + */ + .min_size = 256, + .automatic_shrinking = true, +};
/* * Check if courtesy clients have conflicting access and resolve it if possible @@ -4685,12 +4685,14 @@ move_to_close_lru(struct nfs4_ol_stateid *s, struct net *net) static noinline_for_stack struct nfs4_file * nfsd4_file_hash_lookup(const struct svc_fh *fhp) { - unsigned int hashval = file_hashval(fhp); + struct inode *inode = d_inode(fhp->fh_dentry); + struct rhlist_head *tmp, *list; struct nfs4_file *fi;
rcu_read_lock(); - hlist_for_each_entry_rcu(fi, &file_hashtbl[hashval], fi_hash, - lockdep_is_held(&state_lock)) { + list = rhltable_lookup(&nfs4_file_rhltable, &inode, + nfs4_file_rhash_params); + rhl_for_each_entry_rcu(fi, tmp, list, fi_rlist) { if (fh_match(&fi->fi_fhandle, &fhp->fh_handle)) { if (refcount_inc_not_zero(&fi->fi_ref)) { rcu_read_unlock(); @@ -4704,40 +4706,56 @@ nfsd4_file_hash_lookup(const struct svc_fh *fhp)
/* * On hash insertion, identify entries with the same inode but - * distinct filehandles. They will all be in the same hash bucket - * because nfs4_file's are hashed by the address in the fi_inode - * field. + * distinct filehandles. They will all be on the list returned + * by rhltable_lookup(). + * + * inode->i_lock prevents racing insertions from adding an entry + * for the same inode/fhp pair twice. */ static noinline_for_stack struct nfs4_file * nfsd4_file_hash_insert(struct nfs4_file *new, const struct svc_fh *fhp) { - unsigned int hashval = file_hashval(fhp); + struct inode *inode = d_inode(fhp->fh_dentry); + struct rhlist_head *tmp, *list; struct nfs4_file *ret = NULL; bool alias_found = false; struct nfs4_file *fi; + int err;
- spin_lock(&state_lock); - hlist_for_each_entry_rcu(fi, &file_hashtbl[hashval], fi_hash, - lockdep_is_held(&state_lock)) { + rcu_read_lock(); + spin_lock(&inode->i_lock); + + list = rhltable_lookup(&nfs4_file_rhltable, &inode, + nfs4_file_rhash_params); + rhl_for_each_entry_rcu(fi, tmp, list, fi_rlist) { if (fh_match(&fi->fi_fhandle, &fhp->fh_handle)) { if (refcount_inc_not_zero(&fi->fi_ref)) ret = fi; - } else if (d_inode(fhp->fh_dentry) == fi->fi_inode) + } else fi->fi_aliased = alias_found = true; } - if (likely(ret == NULL)) { - nfsd4_file_init(fhp, new); - hlist_add_head_rcu(&new->fi_hash, &file_hashtbl[hashval]); - new->fi_aliased = alias_found; - ret = new; - } - spin_unlock(&state_lock); + if (ret) + goto out_unlock; + + nfsd4_file_init(fhp, new); + err = rhltable_insert(&nfs4_file_rhltable, &new->fi_rlist, + nfs4_file_rhash_params); + if (err) + goto out_unlock; + + new->fi_aliased = alias_found; + ret = new; + +out_unlock: + spin_unlock(&inode->i_lock); + rcu_read_unlock(); return ret; }
static noinline_for_stack void nfsd4_file_hash_remove(struct nfs4_file *fi) { - hlist_del_rcu(&fi->fi_hash); + rhltable_remove(&nfs4_file_rhltable, &fi->fi_rlist, + nfs4_file_rhash_params); }
/* @@ -5628,6 +5646,8 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf * If not found, create the nfs4_file struct */ fp = nfsd4_file_hash_insert(open->op_file, current_fh); + if (unlikely(!fp)) + return nfserr_jukebox; if (fp != open->op_file) { status = nfs4_check_deleg(cl, open, &dp); if (status) @@ -8054,10 +8074,16 @@ nfs4_state_start(void) { int ret;
- ret = nfsd4_create_callback_queue(); + ret = rhltable_init(&nfs4_file_rhltable, &nfs4_file_rhash_params); if (ret) return ret;
+ ret = nfsd4_create_callback_queue(); + if (ret) { + rhltable_destroy(&nfs4_file_rhltable); + return ret; + } + set_max_delegations(); return 0; } @@ -8088,6 +8114,7 @@ nfs4_state_shutdown_net(struct net *net)
nfsd4_client_tracking_exit(net); nfs4_state_destroy_net(net); + rhltable_destroy(&nfs4_file_rhltable); #ifdef CONFIG_NFSD_V4_2_INTER_SSC nfsd4_ssc_shutdown_umount(nn); #endif diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index e2daef3cc0034..eadd7f465bf52 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -536,16 +536,13 @@ struct nfs4_clnt_odstate { * inode can have multiple filehandles associated with it, so there is * (potentially) a many to one relationship between this struct and struct * inode. - * - * These are hashed by filehandle in the file_hashtbl, which is protected by - * the global state_lock spinlock. */ struct nfs4_file { refcount_t fi_ref; struct inode * fi_inode; bool fi_aliased; spinlock_t fi_lock; - struct hlist_node fi_hash; /* hash on fi_fhandle */ + struct rhlist_head fi_rlist; struct list_head fi_stateids; union { struct list_head fi_delegations;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 3f054211b29c0fa06dfdcab402c795fd7e906be1 ]
Add a missing SPDX header.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index d681faf48cf85..b43d2d7ac5957 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -1,5 +1,6 @@ +// SPDX-License-Identifier: GPL-2.0 /* - * Open file cache. + * The NFSD open file cache. * * (c) 2015 - Jeff Layton jeff.layton@primarydata.com *
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 1f696e230ea5198e393368b319eb55651828d687 ]
We're counting mapping->nrpages, but not all of those are necessarily dirty. We don't really have a simple way to count just the dirty pages, so just remove this stat since it's not accurate.
Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index b43d2d7ac5957..b95b1be5b2e43 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -57,7 +57,6 @@ static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits); static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions); static DEFINE_PER_CPU(unsigned long, nfsd_file_releases); static DEFINE_PER_CPU(unsigned long, nfsd_file_total_age); -static DEFINE_PER_CPU(unsigned long, nfsd_file_pages_flushed); static DEFINE_PER_CPU(unsigned long, nfsd_file_evictions);
struct nfsd_fcache_disposal { @@ -395,7 +394,6 @@ nfsd_file_flush(struct nfsd_file *nf)
if (!file || !(file->f_mode & FMODE_WRITE)) return; - this_cpu_add(nfsd_file_pages_flushed, file->f_mapping->nrpages); if (vfs_fsync(file, 1) != 0) nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id)); } @@ -1022,7 +1020,6 @@ nfsd_file_cache_shutdown(void) per_cpu(nfsd_file_acquisitions, i) = 0; per_cpu(nfsd_file_releases, i) = 0; per_cpu(nfsd_file_total_age, i) = 0; - per_cpu(nfsd_file_pages_flushed, i) = 0; per_cpu(nfsd_file_evictions, i) = 0; } } @@ -1237,7 +1234,7 @@ nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp, */ int nfsd_file_cache_stats_show(struct seq_file *m, void *v) { - unsigned long releases = 0, pages_flushed = 0, evictions = 0; + unsigned long releases = 0, evictions = 0; unsigned long hits = 0, acquisitions = 0; unsigned int i, count = 0, buckets = 0; unsigned long lru = 0, total_age = 0; @@ -1265,7 +1262,6 @@ int nfsd_file_cache_stats_show(struct seq_file *m, void *v) releases += per_cpu(nfsd_file_releases, i); total_age += per_cpu(nfsd_file_total_age, i); evictions += per_cpu(nfsd_file_evictions, i); - pages_flushed += per_cpu(nfsd_file_pages_flushed, i); }
seq_printf(m, "total entries: %u\n", count); @@ -1279,6 +1275,5 @@ int nfsd_file_cache_stats_show(struct seq_file *m, void *v) seq_printf(m, "mean age (ms): %ld\n", total_age / releases); else seq_printf(m, "mean age (ms): -\n"); - seq_printf(m, "pages flushed: %lu\n", pages_flushed); return 0; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 8214118589881b2d390284410c5ff275e7a5e03c ]
In a coming patch, we're going to rework how the filecache refcounting works. Move some code around in the function to reduce the churn in the later patches, and rename some of the functions with (hopefully) clearer names: nfsd_file_flush becomes nfsd_file_fsync, and nfsd_file_unhash_and_dispose is renamed to nfsd_file_unhash_and_queue.
Also, the nfsd_file_put_final tracepoint is renamed to nfsd_file_free, to better match the name of the function from which it's called.
Signed-off-by: Jeff Layton jlayton@kernel.org Reviewed-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 111 ++++++++++++++++++++++---------------------- fs/nfsd/trace.h | 4 +- 2 files changed, 58 insertions(+), 57 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index b95b1be5b2e43..fb7ada3f7410e 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -334,16 +334,59 @@ nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may) return nf; }
+static void +nfsd_file_fsync(struct nfsd_file *nf) +{ + struct file *file = nf->nf_file; + + if (!file || !(file->f_mode & FMODE_WRITE)) + return; + if (vfs_fsync(file, 1) != 0) + nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id)); +} + +static int +nfsd_file_check_write_error(struct nfsd_file *nf) +{ + struct file *file = nf->nf_file; + + if (!file || !(file->f_mode & FMODE_WRITE)) + return 0; + return filemap_check_wb_err(file->f_mapping, READ_ONCE(file->f_wb_err)); +} + +static void +nfsd_file_hash_remove(struct nfsd_file *nf) +{ + trace_nfsd_file_unhash(nf); + + if (nfsd_file_check_write_error(nf)) + nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id)); + rhashtable_remove_fast(&nfsd_file_rhash_tbl, &nf->nf_rhash, + nfsd_file_rhash_params); +} + +static bool +nfsd_file_unhash(struct nfsd_file *nf) +{ + if (test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags)) { + nfsd_file_hash_remove(nf); + return true; + } + return false; +} + static bool nfsd_file_free(struct nfsd_file *nf) { s64 age = ktime_to_ms(ktime_sub(ktime_get(), nf->nf_birthtime)); bool flush = false;
+ trace_nfsd_file_free(nf); + this_cpu_inc(nfsd_file_releases); this_cpu_add(nfsd_file_total_age, age);
- trace_nfsd_file_put_final(nf); if (nf->nf_mark) nfsd_file_mark_put(nf->nf_mark); if (nf->nf_file) { @@ -377,27 +420,6 @@ nfsd_file_check_writeback(struct nfsd_file *nf) mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK); }
-static int -nfsd_file_check_write_error(struct nfsd_file *nf) -{ - struct file *file = nf->nf_file; - - if (!file || !(file->f_mode & FMODE_WRITE)) - return 0; - return filemap_check_wb_err(file->f_mapping, READ_ONCE(file->f_wb_err)); -} - -static void -nfsd_file_flush(struct nfsd_file *nf) -{ - struct file *file = nf->nf_file; - - if (!file || !(file->f_mode & FMODE_WRITE)) - return; - if (vfs_fsync(file, 1) != 0) - nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id)); -} - static void nfsd_file_lru_add(struct nfsd_file *nf) { set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags); @@ -411,31 +433,18 @@ static void nfsd_file_lru_remove(struct nfsd_file *nf) trace_nfsd_file_lru_del(nf); }
-static void -nfsd_file_hash_remove(struct nfsd_file *nf) -{ - trace_nfsd_file_unhash(nf); - - if (nfsd_file_check_write_error(nf)) - nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id)); - rhashtable_remove_fast(&nfsd_file_rhash_tbl, &nf->nf_rhash, - nfsd_file_rhash_params); -} - -static bool -nfsd_file_unhash(struct nfsd_file *nf) +struct nfsd_file * +nfsd_file_get(struct nfsd_file *nf) { - if (test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags)) { - nfsd_file_hash_remove(nf); - return true; - } - return false; + if (likely(refcount_inc_not_zero(&nf->nf_ref))) + return nf; + return NULL; }
static void -nfsd_file_unhash_and_dispose(struct nfsd_file *nf, struct list_head *dispose) +nfsd_file_unhash_and_queue(struct nfsd_file *nf, struct list_head *dispose) { - trace_nfsd_file_unhash_and_dispose(nf); + trace_nfsd_file_unhash_and_queue(nf); if (nfsd_file_unhash(nf)) { /* caller must call nfsd_file_dispose_list() later */ nfsd_file_lru_remove(nf); @@ -473,7 +482,7 @@ nfsd_file_put(struct nfsd_file *nf) nfsd_file_unhash_and_put(nf);
if (!test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) { - nfsd_file_flush(nf); + nfsd_file_fsync(nf); nfsd_file_put_noref(nf); } else if (nf->nf_file && test_bit(NFSD_FILE_GC, &nf->nf_flags)) { nfsd_file_put_noref(nf); @@ -482,14 +491,6 @@ nfsd_file_put(struct nfsd_file *nf) nfsd_file_put_noref(nf); }
-struct nfsd_file * -nfsd_file_get(struct nfsd_file *nf) -{ - if (likely(refcount_inc_not_zero(&nf->nf_ref))) - return nf; - return NULL; -} - static void nfsd_file_dispose_list(struct list_head *dispose) { @@ -498,7 +499,7 @@ nfsd_file_dispose_list(struct list_head *dispose) while(!list_empty(dispose)) { nf = list_first_entry(dispose, struct nfsd_file, nf_lru); list_del_init(&nf->nf_lru); - nfsd_file_flush(nf); + nfsd_file_fsync(nf); nfsd_file_put_noref(nf); } } @@ -512,7 +513,7 @@ nfsd_file_dispose_list_sync(struct list_head *dispose) while(!list_empty(dispose)) { nf = list_first_entry(dispose, struct nfsd_file, nf_lru); list_del_init(&nf->nf_lru); - nfsd_file_flush(nf); + nfsd_file_fsync(nf); if (!refcount_dec_and_test(&nf->nf_ref)) continue; if (nfsd_file_free(nf)) @@ -712,7 +713,7 @@ __nfsd_file_close_inode(struct inode *inode, struct list_head *dispose) nfsd_file_rhash_params); if (!nf) break; - nfsd_file_unhash_and_dispose(nf, dispose); + nfsd_file_unhash_and_queue(nf, dispose); count++; } while (1); rcu_read_unlock(); @@ -914,7 +915,7 @@ __nfsd_file_cache_purge(struct net *net) nf = rhashtable_walk_next(&iter); while (!IS_ERR_OR_NULL(nf)) { if (!net || nf->nf_net == net) - nfsd_file_unhash_and_dispose(nf, &dispose); + nfsd_file_unhash_and_queue(nf, &dispose); nf = rhashtable_walk_next(&iter); }
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 191b206379b76..5faec08ac7cf7 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -819,10 +819,10 @@ DEFINE_EVENT(nfsd_file_class, name, \ TP_PROTO(struct nfsd_file *nf), \ TP_ARGS(nf))
-DEFINE_NFSD_FILE_EVENT(nfsd_file_put_final); +DEFINE_NFSD_FILE_EVENT(nfsd_file_free); DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash); DEFINE_NFSD_FILE_EVENT(nfsd_file_put); -DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash_and_dispose); +DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash_and_queue);
TRACE_EVENT(nfsd_file_alloc, TP_PROTO(
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 22ae4c114f77b55a4c5036e8f70409a0799a08f8 ]
We don't really care whether there are hashed entries when it comes to scheduling the laundrette. They might all be non-gc entries, after all. We only want to schedule it if there are entries on the LRU.
Switch to using list_lru_count, and move the check into nfsd_file_gc_worker. The other callsite in nfsd_file_put doesn't need to count entries, since it only schedules the laundrette after adding an entry to the LRU.
Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index fb7ada3f7410e..522e900a88605 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -210,12 +210,9 @@ static const struct rhashtable_params nfsd_file_rhash_params = { static void nfsd_file_schedule_laundrette(void) { - if ((atomic_read(&nfsd_file_rhash_tbl.nelems) == 0) || - test_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 0) - return; - - queue_delayed_work(system_wq, &nfsd_filecache_laundrette, - NFSD_LAUNDRETTE_DELAY); + if (test_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags)) + queue_delayed_work(system_wq, &nfsd_filecache_laundrette, + NFSD_LAUNDRETTE_DELAY); }
static void @@ -665,7 +662,8 @@ static void nfsd_file_gc_worker(struct work_struct *work) { nfsd_file_gc(); - nfsd_file_schedule_laundrette(); + if (list_lru_count(&nfsd_file_lru)) + nfsd_file_schedule_laundrette(); }
static unsigned long
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit d7064eaf688cfe454c50db9f59298463d80d403c ]
Add a tracepoint to capture the number of filecache-triggered fsync calls and which files needed it. Also, record when an fsync triggers a write verifier reset.
Examples:
<...>-97 [007] 262.505611: nfsd_file_free: inode=0xffff888171e08140 ref=0 flags=GC may=WRITE nf_file=0xffff8881373d2400 <...>-97 [007] 262.505612: nfsd_file_fsync: inode=0xffff888171e08140 ref=0 flags=GC may=WRITE nf_file=0xffff8881373d2400 ret=0 <...>-97 [007] 262.505623: nfsd_file_free: inode=0xffff888171e08dc0 ref=0 flags=GC may=WRITE nf_file=0xffff8881373d1e00 <...>-97 [007] 262.505624: nfsd_file_fsync: inode=0xffff888171e08dc0 ref=0 flags=GC may=WRITE nf_file=0xffff8881373d1e00 ret=0
Signed-off-by: Chuck Lever chuck.lever@oracle.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 5 ++++- fs/nfsd/trace.h | 31 +++++++++++++++++++++++++++++++ 2 files changed, 35 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 522e900a88605..6b8873b0c2c38 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -335,10 +335,13 @@ static void nfsd_file_fsync(struct nfsd_file *nf) { struct file *file = nf->nf_file; + int ret;
if (!file || !(file->f_mode & FMODE_WRITE)) return; - if (vfs_fsync(file, 1) != 0) + ret = vfs_fsync(file, 1); + trace_nfsd_file_fsync(nf, ret); + if (ret) nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id)); }
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 5faec08ac7cf7..235ea38da8644 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -1151,6 +1151,37 @@ DEFINE_EVENT(nfsd_file_lruwalk_class, name, \ DEFINE_NFSD_FILE_LRUWALK_EVENT(nfsd_file_gc_removed); DEFINE_NFSD_FILE_LRUWALK_EVENT(nfsd_file_shrinker_removed);
+TRACE_EVENT(nfsd_file_fsync, + TP_PROTO( + const struct nfsd_file *nf, + int ret + ), + TP_ARGS(nf, ret), + TP_STRUCT__entry( + __field(void *, nf_inode) + __field(int, nf_ref) + __field(int, ret) + __field(unsigned long, nf_flags) + __field(unsigned char, nf_may) + __field(struct file *, nf_file) + ), + TP_fast_assign( + __entry->nf_inode = nf->nf_inode; + __entry->nf_ref = refcount_read(&nf->nf_ref); + __entry->ret = ret; + __entry->nf_flags = nf->nf_flags; + __entry->nf_may = nf->nf_may; + __entry->nf_file = nf->nf_file; + ), + TP_printk("inode=%p ref=%d flags=%s may=%s nf_file=%p ret=%d", + __entry->nf_inode, + __entry->nf_ref, + show_nf_flags(__entry->nf_flags), + show_nfsd_may_flags(__entry->nf_may), + __entry->nf_file, __entry->ret + ) +); + #include "cache.h"
TRACE_DEFINE_ENUM(RC_DROPIT);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit 18ebd35b61b4693a0ddc270b6d4f18def232e770 ]
vfs_lock_file() expects the struct file_lock to be fully initialised by the caller. Re-exported NFSv3 has been seen to Oops if the fl_file field is NULL.
Fixes: aec158242b87 ("lockd: set fl_owner when unlocking files") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Reviewed-by: Jeff Layton jlayton@kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=216582 Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svcsubs.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c index 720684345817c..e3b6229e7ae5c 100644 --- a/fs/lockd/svcsubs.c +++ b/fs/lockd/svcsubs.c @@ -176,7 +176,7 @@ nlm_delete_file(struct nlm_file *file) } }
-static int nlm_unlock_files(struct nlm_file *file, fl_owner_t owner) +static int nlm_unlock_files(struct nlm_file *file, const struct file_lock *fl) { struct file_lock lock;
@@ -184,12 +184,15 @@ static int nlm_unlock_files(struct nlm_file *file, fl_owner_t owner) lock.fl_type = F_UNLCK; lock.fl_start = 0; lock.fl_end = OFFSET_MAX; - lock.fl_owner = owner; - if (file->f_file[O_RDONLY] && - vfs_lock_file(file->f_file[O_RDONLY], F_SETLK, &lock, NULL)) + lock.fl_owner = fl->fl_owner; + lock.fl_pid = fl->fl_pid; + lock.fl_flags = FL_POSIX; + + lock.fl_file = file->f_file[O_RDONLY]; + if (lock.fl_file && vfs_lock_file(lock.fl_file, F_SETLK, &lock, NULL)) goto out_err; - if (file->f_file[O_WRONLY] && - vfs_lock_file(file->f_file[O_WRONLY], F_SETLK, &lock, NULL)) + lock.fl_file = file->f_file[O_WRONLY]; + if (lock.fl_file && vfs_lock_file(lock.fl_file, F_SETLK, &lock, NULL)) goto out_err; return 0; out_err: @@ -226,7 +229,7 @@ nlm_traverse_locks(struct nlm_host *host, struct nlm_file *file, if (match(lockhost, host)) {
spin_unlock(&flctx->flc_lock); - if (nlm_unlock_files(file, fl->fl_owner)) + if (nlm_unlock_files(file, fl)) return 1; goto again; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 01d53a88c08951f88f2a42f1f1e6568928e0590e ]
With the addition of POSIX ACLs to struct nfsd_attrs, we no longer return an error if setting the ACL fails. Ensure we return the na_aclerr error on SETATTR if there is one.
Fixes: c0cbe70742f4 ("NFSD: add posix ACLs to struct nfsd_attrs") Cc: Neil Brown neilb@suse.de Reported-by: Yongcheng Yang yoyang@redhat.com Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 32fccca7de185..86ab9b8ac2daf 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1135,6 +1135,8 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, 0, (time64_t)0); if (!status) status = nfserrno(attrs.na_labelerr); + if (!status) + status = nfserrno(attrs.na_aclerr); out: nfsd_attrs_free(&attrs); fh_drop_write(&cstate->current_fh);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xiu Jianfeng xiujianfeng@huawei.com
[ Upstream commit 85a0d0c9a58002ef7d1bf5e3ea630f4fbd42a4f0 ]
Use struct_size() helper to simplify the code, no functional changes.
Signed-off-by: Xiu Jianfeng xiujianfeng@huawei.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index d490b19d95ddf..9a8038bfaa0d5 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -1833,13 +1833,12 @@ static struct nfsd4_session *alloc_session(struct nfsd4_channel_attrs *fattrs, int numslots = fattrs->maxreqs; int slotsize = slot_bytes(fattrs); struct nfsd4_session *new; - int mem, i; + int i;
- BUILD_BUG_ON(NFSD_MAX_SLOTS_PER_SESSION * sizeof(struct nfsd4_slot *) - + sizeof(struct nfsd4_session) > PAGE_SIZE); - mem = numslots * sizeof(struct nfsd4_slot *); + BUILD_BUG_ON(struct_size(new, se_slots, NFSD_MAX_SLOTS_PER_SESSION) + > PAGE_SIZE);
- new = kzalloc(sizeof(*new) + mem, GFP_KERNEL); + new = kzalloc(struct_size(new, se_slots, numslots), GFP_KERNEL); if (!new) return NULL; /* allocate each struct nfsd4_slot and data cache in one piece */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 75c7940d2a86d3f1b60a0a265478cb8fc887b970 ]
Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svc4proc.c | 1 + fs/lockd/svcproc.c | 1 + 2 files changed, 2 insertions(+)
diff --git a/fs/lockd/svc4proc.c b/fs/lockd/svc4proc.c index 284b019cb6529..b72023a6b4c16 100644 --- a/fs/lockd/svc4proc.c +++ b/fs/lockd/svc4proc.c @@ -52,6 +52,7 @@ nlm4svc_retrieve_args(struct svc_rqst *rqstp, struct nlm_args *argp, *filp = file;
/* Set up the missing parts of the file_lock structure */ + lock->fl.fl_flags = FL_POSIX; lock->fl.fl_file = file->f_file[mode]; lock->fl.fl_pid = current->tgid; lock->fl.fl_start = (loff_t)lock->lock_start; diff --git a/fs/lockd/svcproc.c b/fs/lockd/svcproc.c index e35c05e278061..32784f508c810 100644 --- a/fs/lockd/svcproc.c +++ b/fs/lockd/svcproc.c @@ -77,6 +77,7 @@ nlmsvc_retrieve_args(struct svc_rqst *rqstp, struct nlm_args *argp,
/* Set up the missing parts of the file_lock structure */ mode = lock_to_openmode(&lock->fl); + lock->fl.fl_flags = FL_POSIX; lock->fl.fl_file = file->f_file[mode]; lock->fl.fl_pid = current->tgid; lock->fl.fl_lmops = &nlmsvc_lock_operations;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 69efce009f7df888e1fede3cb2913690eb829f52 ]
Shared locks are set on O_RDONLY descriptors and exclusive locks are set on O_WRONLY ones. nlmsvc_unlock however calls vfs_lock_file twice, once for each descriptor, but it doesn't reset fl_file. Ensure that it does.
Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svclock.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c index 9c1aa75441e1c..9eae99e08e699 100644 --- a/fs/lockd/svclock.c +++ b/fs/lockd/svclock.c @@ -659,11 +659,13 @@ nlmsvc_unlock(struct net *net, struct nlm_file *file, struct nlm_lock *lock) nlmsvc_cancel_blocked(net, file, lock);
lock->fl.fl_type = F_UNLCK; - if (file->f_file[O_RDONLY]) - error = vfs_lock_file(file->f_file[O_RDONLY], F_SETLK, + lock->fl.fl_file = file->f_file[O_RDONLY]; + if (lock->fl.fl_file) + error = vfs_lock_file(lock->fl.fl_file, F_SETLK, &lock->fl, NULL); - if (file->f_file[O_WRONLY]) - error = vfs_lock_file(file->f_file[O_WRONLY], F_SETLK, + lock->fl.fl_file = file->f_file[O_WRONLY]; + if (lock->fl.fl_file) + error |= vfs_lock_file(lock->fl.fl_file, F_SETLK, &lock->fl, NULL);
return (error < 0)? nlm_lck_denied_nolocks : nlm_granted;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 9f27783b4dd235ef3c8dbf69fc6322777450323c ]
We currently do a lock_to_openmode call based on the arguments from the NLM_UNLOCK call, but that will always set the fl_type of the lock to F_UNLCK, and the O_RDONLY descriptor is always chosen.
Fix it to use the file_lock from the block instead.
Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svclock.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c index 9eae99e08e699..4e30f3c509701 100644 --- a/fs/lockd/svclock.c +++ b/fs/lockd/svclock.c @@ -699,9 +699,10 @@ nlmsvc_cancel_blocked(struct net *net, struct nlm_file *file, struct nlm_lock *l block = nlmsvc_lookup_block(file, lock); mutex_unlock(&file->f_mutex); if (block != NULL) { - mode = lock_to_openmode(&lock->fl); - vfs_cancel_lock(block->b_file->f_file[mode], - &block->b_call->a_args.lock.fl); + struct file_lock *fl = &block->b_call->a_args.lock.fl; + + mode = lock_to_openmode(fl); + vfs_cancel_lock(block->b_file->f_file[mode], fl); status = nlmsvc_unlink_block(block); nlmsvc_release_block(block); }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Brian Foster bfoster@redhat.com
[ Upstream commit 79a1d88a36f77374c77fd41a4386d8c2736b8704 ]
_nfsd_copy_file_range() calls vfs_fsync_range() with an offset and count (bytes written), but the former wants the start and end bytes of the range to sync. Fix it up.
Fixes: eac0b17a77fb ("NFSD add vfs_fsync after async copy is done") Signed-off-by: Brian Foster bfoster@redhat.com Tested-by: Dai Ngo dai.ngo@oracle.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 86ab9b8ac2daf..cf558d951ec68 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1632,6 +1632,7 @@ static ssize_t _nfsd_copy_file_range(struct nfsd4_copy *copy, u64 src_pos = copy->cp_src_pos; u64 dst_pos = copy->cp_dst_pos; int status; + loff_t end;
/* See RFC 7862 p.67: */ if (bytes_total == 0) @@ -1651,8 +1652,8 @@ static ssize_t _nfsd_copy_file_range(struct nfsd4_copy *copy, /* for a non-zero asynchronous copy do a commit of data */ if (nfsd4_copy_is_async(copy) && copy->cp_res.wr_bytes_written > 0) { since = READ_ONCE(dst->f_wb_err); - status = vfs_fsync_range(dst, copy->cp_dst_pos, - copy->cp_res.wr_bytes_written, 0); + end = copy->cp_dst_pos + copy->cp_res.wr_bytes_written - 1; + status = vfs_fsync_range(dst, copy->cp_dst_pos, end, 0); if (!status) status = filemap_check_wb_err(dst->f_mapping, since); if (!status)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dai Ngo dai.ngo@oracle.com
[ Upstream commit a1049eb47f20b9eabf9afb218578fff16b4baca6 ]
Refactoring courtesy_client_reaper to generic low memory shrinker so it can be used for other purposes.
Signed-off-by: Dai Ngo dai.ngo@oracle.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 25 ++++++++++++++++--------- 1 file changed, 16 insertions(+), 9 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 9a8038bfaa0d5..8fdf5ab5b9e47 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4361,7 +4361,7 @@ nfsd4_init_slabs(void) }
static unsigned long -nfsd_courtesy_client_count(struct shrinker *shrink, struct shrink_control *sc) +nfsd4_state_shrinker_count(struct shrinker *shrink, struct shrink_control *sc) { int cnt; struct nfsd_net *nn = container_of(shrink, @@ -4374,7 +4374,7 @@ nfsd_courtesy_client_count(struct shrinker *shrink, struct shrink_control *sc) }
static unsigned long -nfsd_courtesy_client_scan(struct shrinker *shrink, struct shrink_control *sc) +nfsd4_state_shrinker_scan(struct shrinker *shrink, struct shrink_control *sc) { return SHRINK_STOP; } @@ -4401,8 +4401,8 @@ nfsd4_init_leases_net(struct nfsd_net *nn) nn->nfs4_max_clients = max_t(int, max_clients, NFS4_CLIENTS_PER_GB);
atomic_set(&nn->nfsd_courtesy_clients, 0); - nn->nfsd_client_shrinker.scan_objects = nfsd_courtesy_client_scan; - nn->nfsd_client_shrinker.count_objects = nfsd_courtesy_client_count; + nn->nfsd_client_shrinker.scan_objects = nfsd4_state_shrinker_scan; + nn->nfsd_client_shrinker.count_objects = nfsd4_state_shrinker_count; nn->nfsd_client_shrinker.seeks = DEFAULT_SEEKS; return register_shrinker(&nn->nfsd_client_shrinker); } @@ -6151,17 +6151,24 @@ laundromat_main(struct work_struct *laundry) }
static void -courtesy_client_reaper(struct work_struct *reaper) +courtesy_client_reaper(struct nfsd_net *nn) { struct list_head reaplist; - struct delayed_work *dwork = to_delayed_work(reaper); - struct nfsd_net *nn = container_of(dwork, struct nfsd_net, - nfsd_shrinker_work);
nfs4_get_courtesy_client_reaplist(nn, &reaplist); nfs4_process_client_reaplist(&reaplist); }
+static void +nfsd4_state_shrinker_worker(struct work_struct *work) +{ + struct delayed_work *dwork = to_delayed_work(work); + struct nfsd_net *nn = container_of(dwork, struct nfsd_net, + nfsd_shrinker_work); + + courtesy_client_reaper(nn); +} + static inline __be32 nfs4_check_fh(struct svc_fh *fhp, struct nfs4_stid *stp) { if (!fh_match(&fhp->fh_handle, &stp->sc_file->fi_fhandle)) @@ -7997,7 +8004,7 @@ static int nfs4_state_create_net(struct net *net) INIT_LIST_HEAD(&nn->blocked_locks_lru);
INIT_DELAYED_WORK(&nn->laundromat_work, laundromat_main); - INIT_DELAYED_WORK(&nn->nfsd_shrinker_work, courtesy_client_reaper); + INIT_DELAYED_WORK(&nn->nfsd_shrinker_work, nfsd4_state_shrinker_worker); get_net(net);
return 0;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dai Ngo dai.ngo@oracle.com
[ Upstream commit 3959066b697b5dfbb7141124ae9665337d4bc638 ]
Add XDR encode and decode function for CB_RECALL_ANY.
Signed-off-by: Dai Ngo dai.ngo@oracle.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4callback.c | 72 ++++++++++++++++++++++++++++++++++++++++++ fs/nfsd/state.h | 1 + fs/nfsd/xdr4.h | 5 +++ fs/nfsd/xdr4cb.h | 6 ++++ 4 files changed, 84 insertions(+)
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c index 39989c14c8a1e..4eae2c5af2edf 100644 --- a/fs/nfsd/nfs4callback.c +++ b/fs/nfsd/nfs4callback.c @@ -76,6 +76,17 @@ static __be32 *xdr_encode_empty_array(__be32 *p) * 1 Protocol" */
+static void encode_uint32(struct xdr_stream *xdr, u32 n) +{ + WARN_ON_ONCE(xdr_stream_encode_u32(xdr, n) < 0); +} + +static void encode_bitmap4(struct xdr_stream *xdr, const __u32 *bitmap, + size_t len) +{ + WARN_ON_ONCE(xdr_stream_encode_uint32_array(xdr, bitmap, len) < 0); +} + /* * nfs_cb_opnum4 * @@ -328,6 +339,24 @@ static void encode_cb_recall4args(struct xdr_stream *xdr, hdr->nops++; }
+/* + * CB_RECALLANY4args + * + * struct CB_RECALLANY4args { + * uint32_t craa_objects_to_keep; + * bitmap4 craa_type_mask; + * }; + */ +static void +encode_cb_recallany4args(struct xdr_stream *xdr, + struct nfs4_cb_compound_hdr *hdr, struct nfsd4_cb_recall_any *ra) +{ + encode_nfs_cb_opnum4(xdr, OP_CB_RECALL_ANY); + encode_uint32(xdr, ra->ra_keep); + encode_bitmap4(xdr, ra->ra_bmval, ARRAY_SIZE(ra->ra_bmval)); + hdr->nops++; +} + /* * CB_SEQUENCE4args * @@ -482,6 +511,26 @@ static void nfs4_xdr_enc_cb_recall(struct rpc_rqst *req, struct xdr_stream *xdr, encode_cb_nops(&hdr); }
+/* + * 20.6. Operation 8: CB_RECALL_ANY - Keep Any N Recallable Objects + */ +static void +nfs4_xdr_enc_cb_recall_any(struct rpc_rqst *req, + struct xdr_stream *xdr, const void *data) +{ + const struct nfsd4_callback *cb = data; + struct nfsd4_cb_recall_any *ra; + struct nfs4_cb_compound_hdr hdr = { + .ident = cb->cb_clp->cl_cb_ident, + .minorversion = cb->cb_clp->cl_minorversion, + }; + + ra = container_of(cb, struct nfsd4_cb_recall_any, ra_cb); + encode_cb_compound4args(xdr, &hdr); + encode_cb_sequence4args(xdr, cb, &hdr); + encode_cb_recallany4args(xdr, &hdr, ra); + encode_cb_nops(&hdr); +}
/* * NFSv4.0 and NFSv4.1 XDR decode functions @@ -520,6 +569,28 @@ static int nfs4_xdr_dec_cb_recall(struct rpc_rqst *rqstp, return decode_cb_op_status(xdr, OP_CB_RECALL, &cb->cb_status); }
+/* + * 20.6. Operation 8: CB_RECALL_ANY - Keep Any N Recallable Objects + */ +static int +nfs4_xdr_dec_cb_recall_any(struct rpc_rqst *rqstp, + struct xdr_stream *xdr, + void *data) +{ + struct nfsd4_callback *cb = data; + struct nfs4_cb_compound_hdr hdr; + int status; + + status = decode_cb_compound4res(xdr, &hdr); + if (unlikely(status)) + return status; + status = decode_cb_sequence4res(xdr, cb); + if (unlikely(status || cb->cb_seq_status)) + return status; + status = decode_cb_op_status(xdr, OP_CB_RECALL_ANY, &cb->cb_status); + return status; +} + #ifdef CONFIG_NFSD_PNFS /* * CB_LAYOUTRECALL4args @@ -783,6 +854,7 @@ static const struct rpc_procinfo nfs4_cb_procedures[] = { #endif PROC(CB_NOTIFY_LOCK, COMPOUND, cb_notify_lock, cb_notify_lock), PROC(CB_OFFLOAD, COMPOUND, cb_offload, cb_offload), + PROC(CB_RECALL_ANY, COMPOUND, cb_recall_any, cb_recall_any), };
static unsigned int nfs4_cb_counts[ARRAY_SIZE(nfs4_cb_procedures)]; diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index eadd7f465bf52..e30882f8b8516 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -636,6 +636,7 @@ enum nfsd4_cb_op { NFSPROC4_CLNT_CB_OFFLOAD, NFSPROC4_CLNT_CB_SEQUENCE, NFSPROC4_CLNT_CB_NOTIFY_LOCK, + NFSPROC4_CLNT_CB_RECALL_ANY, };
/* Returns true iff a is later than b: */ diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 8f323d9071f06..24934cf90a84f 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -896,6 +896,11 @@ struct nfsd4_operation { union nfsd4_op_u *); };
+struct nfsd4_cb_recall_any { + struct nfsd4_callback ra_cb; + u32 ra_keep; + u32 ra_bmval[1]; +};
#endif
diff --git a/fs/nfsd/xdr4cb.h b/fs/nfsd/xdr4cb.h index 547cf07cf4e08..0d39af1b00a0f 100644 --- a/fs/nfsd/xdr4cb.h +++ b/fs/nfsd/xdr4cb.h @@ -48,3 +48,9 @@ #define NFS4_dec_cb_offload_sz (cb_compound_dec_hdr_sz + \ cb_sequence_dec_sz + \ op_dec_sz) +#define NFS4_enc_cb_recall_any_sz (cb_compound_enc_hdr_sz + \ + cb_sequence_enc_sz + \ + 1 + 1 + 1) +#define NFS4_dec_cb_recall_any_sz (cb_compound_dec_hdr_sz + \ + cb_sequence_dec_sz + \ + op_dec_sz)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dai Ngo dai.ngo@oracle.com
[ Upstream commit 44df6f439a1790a5f602e3842879efa88f346672 ]
The delegation reaper is called by nfsd memory shrinker's on the 'count' callback. It scans the client list and sends the courtesy CB_RECALL_ANY to the clients that hold delegations.
To avoid flooding the clients with CB_RECALL_ANY requests, the delegation reaper sends only one CB_RECALL_ANY request to each client per 5 seconds.
Signed-off-by: Dai Ngo dai.ngo@oracle.com [ cel: moved definition of RCA4_TYPE_MASK_RDATA_DLG ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 88 ++++++++++++++++++++++++++++++++++++++++++-- fs/nfsd/state.h | 5 +++ include/linux/nfs4.h | 13 +++++++ 3 files changed, 102 insertions(+), 4 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 8fdf5ab5b9e47..4e05ad774c861 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -2144,6 +2144,7 @@ static void __free_client(struct kref *k) kfree(clp->cl_nii_domain.data); kfree(clp->cl_nii_name.data); idr_destroy(&clp->cl_stateids); + kfree(clp->cl_ra); kmem_cache_free(client_slab, clp); }
@@ -2871,6 +2872,36 @@ static const struct tree_descr client_files[] = { [3] = {""}, };
+static int +nfsd4_cb_recall_any_done(struct nfsd4_callback *cb, + struct rpc_task *task) +{ + switch (task->tk_status) { + case -NFS4ERR_DELAY: + rpc_delay(task, 2 * HZ); + return 0; + default: + return 1; + } +} + +static void +nfsd4_cb_recall_any_release(struct nfsd4_callback *cb) +{ + struct nfs4_client *clp = cb->cb_clp; + struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id); + + spin_lock(&nn->client_lock); + clear_bit(NFSD4_CLIENT_CB_RECALL_ANY, &clp->cl_flags); + put_client_renew_locked(clp); + spin_unlock(&nn->client_lock); +} + +static const struct nfsd4_callback_ops nfsd4_cb_recall_any_ops = { + .done = nfsd4_cb_recall_any_done, + .release = nfsd4_cb_recall_any_release, +}; + static struct nfs4_client *create_client(struct xdr_netobj name, struct svc_rqst *rqstp, nfs4_verifier *verf) { @@ -2908,6 +2939,14 @@ static struct nfs4_client *create_client(struct xdr_netobj name, free_client(clp); return NULL; } + clp->cl_ra = kzalloc(sizeof(*clp->cl_ra), GFP_KERNEL); + if (!clp->cl_ra) { + free_client(clp); + return NULL; + } + clp->cl_ra_time = 0; + nfsd4_init_cb(&clp->cl_ra->ra_cb, clp, &nfsd4_cb_recall_any_ops, + NFSPROC4_CLNT_CB_RECALL_ANY); return clp; }
@@ -4363,14 +4402,16 @@ nfsd4_init_slabs(void) static unsigned long nfsd4_state_shrinker_count(struct shrinker *shrink, struct shrink_control *sc) { - int cnt; + int count; struct nfsd_net *nn = container_of(shrink, struct nfsd_net, nfsd_client_shrinker);
- cnt = atomic_read(&nn->nfsd_courtesy_clients); - if (cnt > 0) + count = atomic_read(&nn->nfsd_courtesy_clients); + if (!count) + count = atomic_long_read(&num_delegations); + if (count) mod_delayed_work(laundry_wq, &nn->nfsd_shrinker_work, 0); - return (unsigned long)cnt; + return (unsigned long)count; }
static unsigned long @@ -6159,6 +6200,44 @@ courtesy_client_reaper(struct nfsd_net *nn) nfs4_process_client_reaplist(&reaplist); }
+static void +deleg_reaper(struct nfsd_net *nn) +{ + struct list_head *pos, *next; + struct nfs4_client *clp; + struct list_head cblist; + + INIT_LIST_HEAD(&cblist); + spin_lock(&nn->client_lock); + list_for_each_safe(pos, next, &nn->client_lru) { + clp = list_entry(pos, struct nfs4_client, cl_lru); + if (clp->cl_state != NFSD4_ACTIVE || + list_empty(&clp->cl_delegations) || + atomic_read(&clp->cl_delegs_in_recall) || + test_bit(NFSD4_CLIENT_CB_RECALL_ANY, &clp->cl_flags) || + (ktime_get_boottime_seconds() - + clp->cl_ra_time < 5)) { + continue; + } + list_add(&clp->cl_ra_cblist, &cblist); + + /* release in nfsd4_cb_recall_any_release */ + atomic_inc(&clp->cl_rpc_users); + set_bit(NFSD4_CLIENT_CB_RECALL_ANY, &clp->cl_flags); + clp->cl_ra_time = ktime_get_boottime_seconds(); + } + spin_unlock(&nn->client_lock); + + while (!list_empty(&cblist)) { + clp = list_first_entry(&cblist, struct nfs4_client, + cl_ra_cblist); + list_del_init(&clp->cl_ra_cblist); + clp->cl_ra->ra_keep = 0; + clp->cl_ra->ra_bmval[0] = BIT(RCA4_TYPE_MASK_RDATA_DLG); + nfsd4_run_cb(&clp->cl_ra->ra_cb); + } +} + static void nfsd4_state_shrinker_worker(struct work_struct *work) { @@ -6167,6 +6246,7 @@ nfsd4_state_shrinker_worker(struct work_struct *work) nfsd_shrinker_work);
courtesy_client_reaper(nn); + deleg_reaper(nn); }
static inline __be32 nfs4_check_fh(struct svc_fh *fhp, struct nfs4_stid *stp) diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index e30882f8b8516..e94634d305912 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -368,6 +368,7 @@ struct nfs4_client { #define NFSD4_CLIENT_UPCALL_LOCK (5) /* upcall serialization */ #define NFSD4_CLIENT_CB_FLAG_MASK (1 << NFSD4_CLIENT_CB_UPDATE | \ 1 << NFSD4_CLIENT_CB_KILL) +#define NFSD4_CLIENT_CB_RECALL_ANY (6) unsigned long cl_flags; const struct cred *cl_cb_cred; struct rpc_clnt *cl_cb_client; @@ -411,6 +412,10 @@ struct nfs4_client {
unsigned int cl_state; atomic_t cl_delegs_in_recall; + + struct nfsd4_cb_recall_any *cl_ra; + time64_t cl_ra_time; + struct list_head cl_ra_cblist; };
/* struct nfs4_client_reset diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h index 5b4c67c91f56a..ea88d0f462c9d 100644 --- a/include/linux/nfs4.h +++ b/include/linux/nfs4.h @@ -717,4 +717,17 @@ enum nfs4_setxattr_options { SETXATTR4_CREATE = 1, SETXATTR4_REPLACE = 2, }; + +enum { + RCA4_TYPE_MASK_RDATA_DLG = 0, + RCA4_TYPE_MASK_WDATA_DLG = 1, + RCA4_TYPE_MASK_DIR_DLG = 2, + RCA4_TYPE_MASK_FILE_LAYOUT = 3, + RCA4_TYPE_MASK_BLK_LAYOUT = 4, + RCA4_TYPE_MASK_OBJ_LAYOUT_MIN = 8, + RCA4_TYPE_MASK_OBJ_LAYOUT_MAX = 9, + RCA4_TYPE_MASK_OTHER_LAYOUT_MIN = 12, + RCA4_TYPE_MASK_OTHER_LAYOUT_MAX = 15, +}; + #endif
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 9315564747cb6a570e99196b3a4880fb817635fd ]
Clean up: NFSv2 has the only two usages of rpc_drop_reply in the NFSD code base. Since NFSv2 is going away at some point, replace these in order to simplify the "drop this reply?" check in nfsd_dispatch().
Signed-off-by: Chuck Lever chuck.lever@oracle.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsproc.c | 4 ++-- fs/nfsd/nfssvc.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 3777b5be4253a..c32a871f0e275 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -211,7 +211,7 @@ nfsd_proc_read(struct svc_rqst *rqstp) if (resp->status == nfs_ok) resp->status = fh_getattr(&resp->fh, &resp->stat); else if (resp->status == nfserr_jukebox) - return rpc_drop_reply; + __set_bit(RQ_DROPME, &rqstp->rq_flags); return rpc_success; }
@@ -246,7 +246,7 @@ nfsd_proc_write(struct svc_rqst *rqstp) if (resp->status == nfs_ok) resp->status = fh_getattr(&resp->fh, &resp->stat); else if (resp->status == nfserr_jukebox) - return rpc_drop_reply; + __set_bit(RQ_DROPME, &rqstp->rq_flags); return rpc_success; }
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 429f38c986280..325d3d3f12110 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -1060,7 +1060,7 @@ int nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp) svcxdr_init_encode(rqstp);
*statp = proc->pc_func(rqstp); - if (*statp == rpc_drop_reply || test_bit(RQ_DROPME, &rqstp->rq_flags)) + if (test_bit(RQ_DROPME, &rqstp->rq_flags)) goto out_update_drop;
if (!proc->pc_encode(rqstp, &rqstp->rq_res_stream))
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kees Cook keescook@chromium.org
[ Upstream commit e78e274eb22d966258a3845acc71d3c5b8ee2ea8 ]
When built with Control Flow Integrity, function prototypes between caller and function declaration must match. These mismatches are visible at compile time with the new -Wcast-function-type-strict in Clang[1].
There were 97 warnings produced by NFS. For example:
fs/nfsd/nfs4xdr.c:2228:17: warning: cast from '__be32 (*)(struct nfsd4_compoundargs *, struct nfsd4_access *)' (aka 'unsigned int (*)(struct nfsd4_compoundargs *, struct nfsd4_access *)') to 'nfsd4_dec' (aka 'unsigned int (*)(struct nfsd4_compoundargs *, void *)') converts to incompatible function type [-Wcast-function-type-strict] [OP_ACCESS] = (nfsd4_dec)nfsd4_decode_access, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The enc/dec callbacks were defined as passing "void *" as the second argument, but were being implicitly cast to a new type. Replace the argument with union nfsd4_op_u, and perform explicit member selection in the function body. There are no resulting binary differences.
Changes were made mechanically using the following Coccinelle script, with minor by-hand fixes for members that didn't already match their existing argument name:
@find@ identifier func; type T, opsT; identifier ops, N; @@
opsT ops[] = { [N] = (T) func, };
@already_void@ identifier find.func; identifier name; @@
func(..., -void +union nfsd4_op_u *name) { ... }
@proto depends on !already_void@ identifier find.func; type T; identifier name; position p; @@
func@p(..., T name ) { ... }
@script:python get_member@ type_name << proto.T; member; @@
coccinelle.member = cocci.make_ident(type_name.split("_", 1)[1].split(' ',1)[0])
@convert@ identifier find.func; type proto.T; identifier proto.name; position proto.p; identifier get_member.member; @@
func@p(..., - T name + union nfsd4_op_u *u ) { + T name = &u->member; ... }
@cast@ identifier find.func; type T, opsT; identifier ops, N; @@
opsT ops[] = { [N] = - (T) func, };
Cc: Chuck Lever chuck.lever@oracle.com Cc: Jeff Layton jlayton@kernel.org Cc: Gustavo A. R. Silva gustavoars@kernel.org Cc: linux-nfs@vger.kernel.org Signed-off-by: Kees Cook keescook@chromium.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 632 +++++++++++++++++++++++++++------------------- 1 file changed, 377 insertions(+), 255 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 27b38ce6f8c6e..8bbf85da1de05 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -770,16 +770,18 @@ nfsd4_decode_cb_sec(struct nfsd4_compoundargs *argp, struct nfsd4_cb_sec *cbs)
static __be32 nfsd4_decode_access(struct nfsd4_compoundargs *argp, - struct nfsd4_access *access) + union nfsd4_op_u *u) { + struct nfsd4_access *access = &u->access; if (xdr_stream_decode_u32(argp->xdr, &access->ac_req_access) < 0) return nfserr_bad_xdr; return nfs_ok; }
static __be32 -nfsd4_decode_close(struct nfsd4_compoundargs *argp, struct nfsd4_close *close) +nfsd4_decode_close(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_close *close = &u->close; if (xdr_stream_decode_u32(argp->xdr, &close->cl_seqid) < 0) return nfserr_bad_xdr; return nfsd4_decode_stateid4(argp, &close->cl_stateid); @@ -787,8 +789,9 @@ nfsd4_decode_close(struct nfsd4_compoundargs *argp, struct nfsd4_close *close)
static __be32 -nfsd4_decode_commit(struct nfsd4_compoundargs *argp, struct nfsd4_commit *commit) +nfsd4_decode_commit(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_commit *commit = &u->commit; if (xdr_stream_decode_u64(argp->xdr, &commit->co_offset) < 0) return nfserr_bad_xdr; if (xdr_stream_decode_u32(argp->xdr, &commit->co_count) < 0) @@ -798,8 +801,9 @@ nfsd4_decode_commit(struct nfsd4_compoundargs *argp, struct nfsd4_commit *commit }
static __be32 -nfsd4_decode_create(struct nfsd4_compoundargs *argp, struct nfsd4_create *create) +nfsd4_decode_create(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_create *create = &u->create; __be32 *p, status;
memset(create, 0, sizeof(*create)); @@ -844,22 +848,25 @@ nfsd4_decode_create(struct nfsd4_compoundargs *argp, struct nfsd4_create *create }
static inline __be32 -nfsd4_decode_delegreturn(struct nfsd4_compoundargs *argp, struct nfsd4_delegreturn *dr) +nfsd4_decode_delegreturn(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_delegreturn *dr = &u->delegreturn; return nfsd4_decode_stateid4(argp, &dr->dr_stateid); }
static inline __be32 -nfsd4_decode_getattr(struct nfsd4_compoundargs *argp, struct nfsd4_getattr *getattr) +nfsd4_decode_getattr(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_getattr *getattr = &u->getattr; memset(getattr, 0, sizeof(*getattr)); return nfsd4_decode_bitmap4(argp, getattr->ga_bmval, ARRAY_SIZE(getattr->ga_bmval)); }
static __be32 -nfsd4_decode_link(struct nfsd4_compoundargs *argp, struct nfsd4_link *link) +nfsd4_decode_link(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_link *link = &u->link; memset(link, 0, sizeof(*link)); return nfsd4_decode_component4(argp, &link->li_name, &link->li_namelen); } @@ -907,8 +914,9 @@ nfsd4_decode_locker4(struct nfsd4_compoundargs *argp, struct nfsd4_lock *lock) }
static __be32 -nfsd4_decode_lock(struct nfsd4_compoundargs *argp, struct nfsd4_lock *lock) +nfsd4_decode_lock(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_lock *lock = &u->lock; memset(lock, 0, sizeof(*lock)); if (xdr_stream_decode_u32(argp->xdr, &lock->lk_type) < 0) return nfserr_bad_xdr; @@ -924,8 +932,9 @@ nfsd4_decode_lock(struct nfsd4_compoundargs *argp, struct nfsd4_lock *lock) }
static __be32 -nfsd4_decode_lockt(struct nfsd4_compoundargs *argp, struct nfsd4_lockt *lockt) +nfsd4_decode_lockt(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_lockt *lockt = &u->lockt; memset(lockt, 0, sizeof(*lockt)); if (xdr_stream_decode_u32(argp->xdr, &lockt->lt_type) < 0) return nfserr_bad_xdr; @@ -940,8 +949,9 @@ nfsd4_decode_lockt(struct nfsd4_compoundargs *argp, struct nfsd4_lockt *lockt) }
static __be32 -nfsd4_decode_locku(struct nfsd4_compoundargs *argp, struct nfsd4_locku *locku) +nfsd4_decode_locku(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_locku *locku = &u->locku; __be32 status;
if (xdr_stream_decode_u32(argp->xdr, &locku->lu_type) < 0) @@ -962,8 +972,9 @@ nfsd4_decode_locku(struct nfsd4_compoundargs *argp, struct nfsd4_locku *locku) }
static __be32 -nfsd4_decode_lookup(struct nfsd4_compoundargs *argp, struct nfsd4_lookup *lookup) +nfsd4_decode_lookup(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_lookup *lookup = &u->lookup; return nfsd4_decode_component4(argp, &lookup->lo_name, &lookup->lo_len); }
@@ -1143,8 +1154,9 @@ nfsd4_decode_open_claim4(struct nfsd4_compoundargs *argp, }
static __be32 -nfsd4_decode_open(struct nfsd4_compoundargs *argp, struct nfsd4_open *open) +nfsd4_decode_open(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_open *open = &u->open; __be32 status; u32 dummy;
@@ -1171,8 +1183,10 @@ nfsd4_decode_open(struct nfsd4_compoundargs *argp, struct nfsd4_open *open) }
static __be32 -nfsd4_decode_open_confirm(struct nfsd4_compoundargs *argp, struct nfsd4_open_confirm *open_conf) +nfsd4_decode_open_confirm(struct nfsd4_compoundargs *argp, + union nfsd4_op_u *u) { + struct nfsd4_open_confirm *open_conf = &u->open_confirm; __be32 status;
if (argp->minorversion >= 1) @@ -1190,8 +1204,10 @@ nfsd4_decode_open_confirm(struct nfsd4_compoundargs *argp, struct nfsd4_open_con }
static __be32 -nfsd4_decode_open_downgrade(struct nfsd4_compoundargs *argp, struct nfsd4_open_downgrade *open_down) +nfsd4_decode_open_downgrade(struct nfsd4_compoundargs *argp, + union nfsd4_op_u *u) { + struct nfsd4_open_downgrade *open_down = &u->open_downgrade; __be32 status;
memset(open_down, 0, sizeof(*open_down)); @@ -1209,8 +1225,9 @@ nfsd4_decode_open_downgrade(struct nfsd4_compoundargs *argp, struct nfsd4_open_d }
static __be32 -nfsd4_decode_putfh(struct nfsd4_compoundargs *argp, struct nfsd4_putfh *putfh) +nfsd4_decode_putfh(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_putfh *putfh = &u->putfh; __be32 *p;
if (xdr_stream_decode_u32(argp->xdr, &putfh->pf_fhlen) < 0) @@ -1229,7 +1246,7 @@ nfsd4_decode_putfh(struct nfsd4_compoundargs *argp, struct nfsd4_putfh *putfh) }
static __be32 -nfsd4_decode_putpubfh(struct nfsd4_compoundargs *argp, void *p) +nfsd4_decode_putpubfh(struct nfsd4_compoundargs *argp, union nfsd4_op_u *p) { if (argp->minorversion == 0) return nfs_ok; @@ -1237,8 +1254,9 @@ nfsd4_decode_putpubfh(struct nfsd4_compoundargs *argp, void *p) }
static __be32 -nfsd4_decode_read(struct nfsd4_compoundargs *argp, struct nfsd4_read *read) +nfsd4_decode_read(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_read *read = &u->read; __be32 status;
memset(read, 0, sizeof(*read)); @@ -1254,8 +1272,9 @@ nfsd4_decode_read(struct nfsd4_compoundargs *argp, struct nfsd4_read *read) }
static __be32 -nfsd4_decode_readdir(struct nfsd4_compoundargs *argp, struct nfsd4_readdir *readdir) +nfsd4_decode_readdir(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_readdir *readdir = &u->readdir; __be32 status;
memset(readdir, 0, sizeof(*readdir)); @@ -1276,15 +1295,17 @@ nfsd4_decode_readdir(struct nfsd4_compoundargs *argp, struct nfsd4_readdir *read }
static __be32 -nfsd4_decode_remove(struct nfsd4_compoundargs *argp, struct nfsd4_remove *remove) +nfsd4_decode_remove(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_remove *remove = &u->remove; memset(&remove->rm_cinfo, 0, sizeof(remove->rm_cinfo)); return nfsd4_decode_component4(argp, &remove->rm_name, &remove->rm_namelen); }
static __be32 -nfsd4_decode_rename(struct nfsd4_compoundargs *argp, struct nfsd4_rename *rename) +nfsd4_decode_rename(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_rename *rename = &u->rename; __be32 status;
memset(rename, 0, sizeof(*rename)); @@ -1295,22 +1316,25 @@ nfsd4_decode_rename(struct nfsd4_compoundargs *argp, struct nfsd4_rename *rename }
static __be32 -nfsd4_decode_renew(struct nfsd4_compoundargs *argp, clientid_t *clientid) +nfsd4_decode_renew(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + clientid_t *clientid = &u->renew; return nfsd4_decode_clientid4(argp, clientid); }
static __be32 nfsd4_decode_secinfo(struct nfsd4_compoundargs *argp, - struct nfsd4_secinfo *secinfo) + union nfsd4_op_u *u) { + struct nfsd4_secinfo *secinfo = &u->secinfo; secinfo->si_exp = NULL; return nfsd4_decode_component4(argp, &secinfo->si_name, &secinfo->si_namelen); }
static __be32 -nfsd4_decode_setattr(struct nfsd4_compoundargs *argp, struct nfsd4_setattr *setattr) +nfsd4_decode_setattr(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_setattr *setattr = &u->setattr; __be32 status;
memset(setattr, 0, sizeof(*setattr)); @@ -1324,8 +1348,9 @@ nfsd4_decode_setattr(struct nfsd4_compoundargs *argp, struct nfsd4_setattr *seta }
static __be32 -nfsd4_decode_setclientid(struct nfsd4_compoundargs *argp, struct nfsd4_setclientid *setclientid) +nfsd4_decode_setclientid(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_setclientid *setclientid = &u->setclientid; __be32 *p, status;
memset(setclientid, 0, sizeof(*setclientid)); @@ -1367,8 +1392,10 @@ nfsd4_decode_setclientid(struct nfsd4_compoundargs *argp, struct nfsd4_setclient }
static __be32 -nfsd4_decode_setclientid_confirm(struct nfsd4_compoundargs *argp, struct nfsd4_setclientid_confirm *scd_c) +nfsd4_decode_setclientid_confirm(struct nfsd4_compoundargs *argp, + union nfsd4_op_u *u) { + struct nfsd4_setclientid_confirm *scd_c = &u->setclientid_confirm; __be32 status;
if (argp->minorversion >= 1) @@ -1382,8 +1409,9 @@ nfsd4_decode_setclientid_confirm(struct nfsd4_compoundargs *argp, struct nfsd4_s
/* Also used for NVERIFY */ static __be32 -nfsd4_decode_verify(struct nfsd4_compoundargs *argp, struct nfsd4_verify *verify) +nfsd4_decode_verify(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_verify *verify = &u->verify; __be32 *p, status;
memset(verify, 0, sizeof(*verify)); @@ -1409,8 +1437,9 @@ nfsd4_decode_verify(struct nfsd4_compoundargs *argp, struct nfsd4_verify *verify }
static __be32 -nfsd4_decode_write(struct nfsd4_compoundargs *argp, struct nfsd4_write *write) +nfsd4_decode_write(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_write *write = &u->write; __be32 status;
status = nfsd4_decode_stateid4(argp, &write->wr_stateid); @@ -1434,8 +1463,10 @@ nfsd4_decode_write(struct nfsd4_compoundargs *argp, struct nfsd4_write *write) }
static __be32 -nfsd4_decode_release_lockowner(struct nfsd4_compoundargs *argp, struct nfsd4_release_lockowner *rlockowner) +nfsd4_decode_release_lockowner(struct nfsd4_compoundargs *argp, + union nfsd4_op_u *u) { + struct nfsd4_release_lockowner *rlockowner = &u->release_lockowner; __be32 status;
if (argp->minorversion >= 1) @@ -1452,16 +1483,20 @@ nfsd4_decode_release_lockowner(struct nfsd4_compoundargs *argp, struct nfsd4_rel return nfs_ok; }
-static __be32 nfsd4_decode_backchannel_ctl(struct nfsd4_compoundargs *argp, struct nfsd4_backchannel_ctl *bc) +static __be32 nfsd4_decode_backchannel_ctl(struct nfsd4_compoundargs *argp, + union nfsd4_op_u *u) { + struct nfsd4_backchannel_ctl *bc = &u->backchannel_ctl; memset(bc, 0, sizeof(*bc)); if (xdr_stream_decode_u32(argp->xdr, &bc->bc_cb_program) < 0) return nfserr_bad_xdr; return nfsd4_decode_cb_sec(argp, &bc->bc_cb_sec); }
-static __be32 nfsd4_decode_bind_conn_to_session(struct nfsd4_compoundargs *argp, struct nfsd4_bind_conn_to_session *bcts) +static __be32 nfsd4_decode_bind_conn_to_session(struct nfsd4_compoundargs *argp, + union nfsd4_op_u *u) { + struct nfsd4_bind_conn_to_session *bcts = &u->bind_conn_to_session; u32 use_conn_in_rdma_mode; __be32 status;
@@ -1603,8 +1638,9 @@ nfsd4_decode_nfs_impl_id4(struct nfsd4_compoundargs *argp,
static __be32 nfsd4_decode_exchange_id(struct nfsd4_compoundargs *argp, - struct nfsd4_exchange_id *exid) + union nfsd4_op_u *u) { + struct nfsd4_exchange_id *exid = &u->exchange_id; __be32 status;
memset(exid, 0, sizeof(*exid)); @@ -1656,8 +1692,9 @@ nfsd4_decode_channel_attrs4(struct nfsd4_compoundargs *argp,
static __be32 nfsd4_decode_create_session(struct nfsd4_compoundargs *argp, - struct nfsd4_create_session *sess) + union nfsd4_op_u *u) { + struct nfsd4_create_session *sess = &u->create_session; __be32 status;
memset(sess, 0, sizeof(*sess)); @@ -1681,23 +1718,26 @@ nfsd4_decode_create_session(struct nfsd4_compoundargs *argp,
static __be32 nfsd4_decode_destroy_session(struct nfsd4_compoundargs *argp, - struct nfsd4_destroy_session *destroy_session) + union nfsd4_op_u *u) { + struct nfsd4_destroy_session *destroy_session = &u->destroy_session; return nfsd4_decode_sessionid4(argp, &destroy_session->sessionid); }
static __be32 nfsd4_decode_free_stateid(struct nfsd4_compoundargs *argp, - struct nfsd4_free_stateid *free_stateid) + union nfsd4_op_u *u) { + struct nfsd4_free_stateid *free_stateid = &u->free_stateid; return nfsd4_decode_stateid4(argp, &free_stateid->fr_stateid); }
#ifdef CONFIG_NFSD_PNFS static __be32 nfsd4_decode_getdeviceinfo(struct nfsd4_compoundargs *argp, - struct nfsd4_getdeviceinfo *gdev) + union nfsd4_op_u *u) { + struct nfsd4_getdeviceinfo *gdev = &u->getdeviceinfo; __be32 status;
memset(gdev, 0, sizeof(*gdev)); @@ -1717,8 +1757,9 @@ nfsd4_decode_getdeviceinfo(struct nfsd4_compoundargs *argp,
static __be32 nfsd4_decode_layoutcommit(struct nfsd4_compoundargs *argp, - struct nfsd4_layoutcommit *lcp) + union nfsd4_op_u *u) { + struct nfsd4_layoutcommit *lcp = &u->layoutcommit; __be32 *p, status;
memset(lcp, 0, sizeof(*lcp)); @@ -1753,8 +1794,9 @@ nfsd4_decode_layoutcommit(struct nfsd4_compoundargs *argp,
static __be32 nfsd4_decode_layoutget(struct nfsd4_compoundargs *argp, - struct nfsd4_layoutget *lgp) + union nfsd4_op_u *u) { + struct nfsd4_layoutget *lgp = &u->layoutget; __be32 status;
memset(lgp, 0, sizeof(*lgp)); @@ -1781,8 +1823,9 @@ nfsd4_decode_layoutget(struct nfsd4_compoundargs *argp,
static __be32 nfsd4_decode_layoutreturn(struct nfsd4_compoundargs *argp, - struct nfsd4_layoutreturn *lrp) + union nfsd4_op_u *u) { + struct nfsd4_layoutreturn *lrp = &u->layoutreturn; memset(lrp, 0, sizeof(*lrp)); if (xdr_stream_decode_bool(argp->xdr, &lrp->lr_reclaim) < 0) return nfserr_bad_xdr; @@ -1795,8 +1838,9 @@ nfsd4_decode_layoutreturn(struct nfsd4_compoundargs *argp, #endif /* CONFIG_NFSD_PNFS */
static __be32 nfsd4_decode_secinfo_no_name(struct nfsd4_compoundargs *argp, - struct nfsd4_secinfo_no_name *sin) + union nfsd4_op_u *u) { + struct nfsd4_secinfo_no_name *sin = &u->secinfo_no_name; if (xdr_stream_decode_u32(argp->xdr, &sin->sin_style) < 0) return nfserr_bad_xdr;
@@ -1806,8 +1850,9 @@ static __be32 nfsd4_decode_secinfo_no_name(struct nfsd4_compoundargs *argp,
static __be32 nfsd4_decode_sequence(struct nfsd4_compoundargs *argp, - struct nfsd4_sequence *seq) + union nfsd4_op_u *u) { + struct nfsd4_sequence *seq = &u->sequence; __be32 *p, status;
status = nfsd4_decode_sessionid4(argp, &seq->sessionid); @@ -1826,8 +1871,10 @@ nfsd4_decode_sequence(struct nfsd4_compoundargs *argp, }
static __be32 -nfsd4_decode_test_stateid(struct nfsd4_compoundargs *argp, struct nfsd4_test_stateid *test_stateid) +nfsd4_decode_test_stateid(struct nfsd4_compoundargs *argp, + union nfsd4_op_u *u) { + struct nfsd4_test_stateid *test_stateid = &u->test_stateid; struct nfsd4_test_stateid_id *stateid; __be32 status; u32 i; @@ -1852,14 +1899,16 @@ nfsd4_decode_test_stateid(struct nfsd4_compoundargs *argp, struct nfsd4_test_sta }
static __be32 nfsd4_decode_destroy_clientid(struct nfsd4_compoundargs *argp, - struct nfsd4_destroy_clientid *dc) + union nfsd4_op_u *u) { + struct nfsd4_destroy_clientid *dc = &u->destroy_clientid; return nfsd4_decode_clientid4(argp, &dc->clientid); }
static __be32 nfsd4_decode_reclaim_complete(struct nfsd4_compoundargs *argp, - struct nfsd4_reclaim_complete *rc) + union nfsd4_op_u *u) { + struct nfsd4_reclaim_complete *rc = &u->reclaim_complete; if (xdr_stream_decode_bool(argp->xdr, &rc->rca_one_fs) < 0) return nfserr_bad_xdr; return nfs_ok; @@ -1867,8 +1916,9 @@ static __be32 nfsd4_decode_reclaim_complete(struct nfsd4_compoundargs *argp,
static __be32 nfsd4_decode_fallocate(struct nfsd4_compoundargs *argp, - struct nfsd4_fallocate *fallocate) + union nfsd4_op_u *u) { + struct nfsd4_fallocate *fallocate = &u->allocate; __be32 status;
status = nfsd4_decode_stateid4(argp, &fallocate->falloc_stateid); @@ -1924,8 +1974,9 @@ static __be32 nfsd4_decode_nl4_server(struct nfsd4_compoundargs *argp, }
static __be32 -nfsd4_decode_copy(struct nfsd4_compoundargs *argp, struct nfsd4_copy *copy) +nfsd4_decode_copy(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_copy *copy = &u->copy; u32 consecutive, i, count, sync; struct nl4_server *ns_dummy; __be32 status; @@ -1982,8 +2033,9 @@ nfsd4_decode_copy(struct nfsd4_compoundargs *argp, struct nfsd4_copy *copy)
static __be32 nfsd4_decode_copy_notify(struct nfsd4_compoundargs *argp, - struct nfsd4_copy_notify *cn) + union nfsd4_op_u *u) { + struct nfsd4_copy_notify *cn = &u->copy_notify; __be32 status;
memset(cn, 0, sizeof(*cn)); @@ -2002,16 +2054,18 @@ nfsd4_decode_copy_notify(struct nfsd4_compoundargs *argp,
static __be32 nfsd4_decode_offload_status(struct nfsd4_compoundargs *argp, - struct nfsd4_offload_status *os) + union nfsd4_op_u *u) { + struct nfsd4_offload_status *os = &u->offload_status; os->count = 0; os->status = 0; return nfsd4_decode_stateid4(argp, &os->stateid); }
static __be32 -nfsd4_decode_seek(struct nfsd4_compoundargs *argp, struct nfsd4_seek *seek) +nfsd4_decode_seek(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_seek *seek = &u->seek; __be32 status;
status = nfsd4_decode_stateid4(argp, &seek->seek_stateid); @@ -2028,8 +2082,9 @@ nfsd4_decode_seek(struct nfsd4_compoundargs *argp, struct nfsd4_seek *seek) }
static __be32 -nfsd4_decode_clone(struct nfsd4_compoundargs *argp, struct nfsd4_clone *clone) +nfsd4_decode_clone(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_clone *clone = &u->clone; __be32 status;
status = nfsd4_decode_stateid4(argp, &clone->cl_src_stateid); @@ -2154,8 +2209,9 @@ nfsd4_decode_xattr_name(struct nfsd4_compoundargs *argp, char **namep) */ static __be32 nfsd4_decode_getxattr(struct nfsd4_compoundargs *argp, - struct nfsd4_getxattr *getxattr) + union nfsd4_op_u *u) { + struct nfsd4_getxattr *getxattr = &u->getxattr; __be32 status; u32 maxcount;
@@ -2173,8 +2229,9 @@ nfsd4_decode_getxattr(struct nfsd4_compoundargs *argp,
static __be32 nfsd4_decode_setxattr(struct nfsd4_compoundargs *argp, - struct nfsd4_setxattr *setxattr) + union nfsd4_op_u *u) { + struct nfsd4_setxattr *setxattr = &u->setxattr; u32 flags, maxcount, size; __be32 status;
@@ -2214,8 +2271,9 @@ nfsd4_decode_setxattr(struct nfsd4_compoundargs *argp,
static __be32 nfsd4_decode_listxattrs(struct nfsd4_compoundargs *argp, - struct nfsd4_listxattrs *listxattrs) + union nfsd4_op_u *u) { + struct nfsd4_listxattrs *listxattrs = &u->listxattrs; u32 maxcount;
memset(listxattrs, 0, sizeof(*listxattrs)); @@ -2245,113 +2303,114 @@ nfsd4_decode_listxattrs(struct nfsd4_compoundargs *argp,
static __be32 nfsd4_decode_removexattr(struct nfsd4_compoundargs *argp, - struct nfsd4_removexattr *removexattr) + union nfsd4_op_u *u) { + struct nfsd4_removexattr *removexattr = &u->removexattr; memset(removexattr, 0, sizeof(*removexattr)); return nfsd4_decode_xattr_name(argp, &removexattr->rmxa_name); }
static __be32 -nfsd4_decode_noop(struct nfsd4_compoundargs *argp, void *p) +nfsd4_decode_noop(struct nfsd4_compoundargs *argp, union nfsd4_op_u *p) { return nfs_ok; }
static __be32 -nfsd4_decode_notsupp(struct nfsd4_compoundargs *argp, void *p) +nfsd4_decode_notsupp(struct nfsd4_compoundargs *argp, union nfsd4_op_u *p) { return nfserr_notsupp; }
-typedef __be32(*nfsd4_dec)(struct nfsd4_compoundargs *argp, void *); +typedef __be32(*nfsd4_dec)(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u);
static const nfsd4_dec nfsd4_dec_ops[] = { - [OP_ACCESS] = (nfsd4_dec)nfsd4_decode_access, - [OP_CLOSE] = (nfsd4_dec)nfsd4_decode_close, - [OP_COMMIT] = (nfsd4_dec)nfsd4_decode_commit, - [OP_CREATE] = (nfsd4_dec)nfsd4_decode_create, - [OP_DELEGPURGE] = (nfsd4_dec)nfsd4_decode_notsupp, - [OP_DELEGRETURN] = (nfsd4_dec)nfsd4_decode_delegreturn, - [OP_GETATTR] = (nfsd4_dec)nfsd4_decode_getattr, - [OP_GETFH] = (nfsd4_dec)nfsd4_decode_noop, - [OP_LINK] = (nfsd4_dec)nfsd4_decode_link, - [OP_LOCK] = (nfsd4_dec)nfsd4_decode_lock, - [OP_LOCKT] = (nfsd4_dec)nfsd4_decode_lockt, - [OP_LOCKU] = (nfsd4_dec)nfsd4_decode_locku, - [OP_LOOKUP] = (nfsd4_dec)nfsd4_decode_lookup, - [OP_LOOKUPP] = (nfsd4_dec)nfsd4_decode_noop, - [OP_NVERIFY] = (nfsd4_dec)nfsd4_decode_verify, - [OP_OPEN] = (nfsd4_dec)nfsd4_decode_open, - [OP_OPENATTR] = (nfsd4_dec)nfsd4_decode_notsupp, - [OP_OPEN_CONFIRM] = (nfsd4_dec)nfsd4_decode_open_confirm, - [OP_OPEN_DOWNGRADE] = (nfsd4_dec)nfsd4_decode_open_downgrade, - [OP_PUTFH] = (nfsd4_dec)nfsd4_decode_putfh, - [OP_PUTPUBFH] = (nfsd4_dec)nfsd4_decode_putpubfh, - [OP_PUTROOTFH] = (nfsd4_dec)nfsd4_decode_noop, - [OP_READ] = (nfsd4_dec)nfsd4_decode_read, - [OP_READDIR] = (nfsd4_dec)nfsd4_decode_readdir, - [OP_READLINK] = (nfsd4_dec)nfsd4_decode_noop, - [OP_REMOVE] = (nfsd4_dec)nfsd4_decode_remove, - [OP_RENAME] = (nfsd4_dec)nfsd4_decode_rename, - [OP_RENEW] = (nfsd4_dec)nfsd4_decode_renew, - [OP_RESTOREFH] = (nfsd4_dec)nfsd4_decode_noop, - [OP_SAVEFH] = (nfsd4_dec)nfsd4_decode_noop, - [OP_SECINFO] = (nfsd4_dec)nfsd4_decode_secinfo, - [OP_SETATTR] = (nfsd4_dec)nfsd4_decode_setattr, - [OP_SETCLIENTID] = (nfsd4_dec)nfsd4_decode_setclientid, - [OP_SETCLIENTID_CONFIRM] = (nfsd4_dec)nfsd4_decode_setclientid_confirm, - [OP_VERIFY] = (nfsd4_dec)nfsd4_decode_verify, - [OP_WRITE] = (nfsd4_dec)nfsd4_decode_write, - [OP_RELEASE_LOCKOWNER] = (nfsd4_dec)nfsd4_decode_release_lockowner, + [OP_ACCESS] = nfsd4_decode_access, + [OP_CLOSE] = nfsd4_decode_close, + [OP_COMMIT] = nfsd4_decode_commit, + [OP_CREATE] = nfsd4_decode_create, + [OP_DELEGPURGE] = nfsd4_decode_notsupp, + [OP_DELEGRETURN] = nfsd4_decode_delegreturn, + [OP_GETATTR] = nfsd4_decode_getattr, + [OP_GETFH] = nfsd4_decode_noop, + [OP_LINK] = nfsd4_decode_link, + [OP_LOCK] = nfsd4_decode_lock, + [OP_LOCKT] = nfsd4_decode_lockt, + [OP_LOCKU] = nfsd4_decode_locku, + [OP_LOOKUP] = nfsd4_decode_lookup, + [OP_LOOKUPP] = nfsd4_decode_noop, + [OP_NVERIFY] = nfsd4_decode_verify, + [OP_OPEN] = nfsd4_decode_open, + [OP_OPENATTR] = nfsd4_decode_notsupp, + [OP_OPEN_CONFIRM] = nfsd4_decode_open_confirm, + [OP_OPEN_DOWNGRADE] = nfsd4_decode_open_downgrade, + [OP_PUTFH] = nfsd4_decode_putfh, + [OP_PUTPUBFH] = nfsd4_decode_putpubfh, + [OP_PUTROOTFH] = nfsd4_decode_noop, + [OP_READ] = nfsd4_decode_read, + [OP_READDIR] = nfsd4_decode_readdir, + [OP_READLINK] = nfsd4_decode_noop, + [OP_REMOVE] = nfsd4_decode_remove, + [OP_RENAME] = nfsd4_decode_rename, + [OP_RENEW] = nfsd4_decode_renew, + [OP_RESTOREFH] = nfsd4_decode_noop, + [OP_SAVEFH] = nfsd4_decode_noop, + [OP_SECINFO] = nfsd4_decode_secinfo, + [OP_SETATTR] = nfsd4_decode_setattr, + [OP_SETCLIENTID] = nfsd4_decode_setclientid, + [OP_SETCLIENTID_CONFIRM] = nfsd4_decode_setclientid_confirm, + [OP_VERIFY] = nfsd4_decode_verify, + [OP_WRITE] = nfsd4_decode_write, + [OP_RELEASE_LOCKOWNER] = nfsd4_decode_release_lockowner,
/* new operations for NFSv4.1 */ - [OP_BACKCHANNEL_CTL] = (nfsd4_dec)nfsd4_decode_backchannel_ctl, - [OP_BIND_CONN_TO_SESSION]= (nfsd4_dec)nfsd4_decode_bind_conn_to_session, - [OP_EXCHANGE_ID] = (nfsd4_dec)nfsd4_decode_exchange_id, - [OP_CREATE_SESSION] = (nfsd4_dec)nfsd4_decode_create_session, - [OP_DESTROY_SESSION] = (nfsd4_dec)nfsd4_decode_destroy_session, - [OP_FREE_STATEID] = (nfsd4_dec)nfsd4_decode_free_stateid, - [OP_GET_DIR_DELEGATION] = (nfsd4_dec)nfsd4_decode_notsupp, + [OP_BACKCHANNEL_CTL] = nfsd4_decode_backchannel_ctl, + [OP_BIND_CONN_TO_SESSION] = nfsd4_decode_bind_conn_to_session, + [OP_EXCHANGE_ID] = nfsd4_decode_exchange_id, + [OP_CREATE_SESSION] = nfsd4_decode_create_session, + [OP_DESTROY_SESSION] = nfsd4_decode_destroy_session, + [OP_FREE_STATEID] = nfsd4_decode_free_stateid, + [OP_GET_DIR_DELEGATION] = nfsd4_decode_notsupp, #ifdef CONFIG_NFSD_PNFS - [OP_GETDEVICEINFO] = (nfsd4_dec)nfsd4_decode_getdeviceinfo, - [OP_GETDEVICELIST] = (nfsd4_dec)nfsd4_decode_notsupp, - [OP_LAYOUTCOMMIT] = (nfsd4_dec)nfsd4_decode_layoutcommit, - [OP_LAYOUTGET] = (nfsd4_dec)nfsd4_decode_layoutget, - [OP_LAYOUTRETURN] = (nfsd4_dec)nfsd4_decode_layoutreturn, + [OP_GETDEVICEINFO] = nfsd4_decode_getdeviceinfo, + [OP_GETDEVICELIST] = nfsd4_decode_notsupp, + [OP_LAYOUTCOMMIT] = nfsd4_decode_layoutcommit, + [OP_LAYOUTGET] = nfsd4_decode_layoutget, + [OP_LAYOUTRETURN] = nfsd4_decode_layoutreturn, #else - [OP_GETDEVICEINFO] = (nfsd4_dec)nfsd4_decode_notsupp, - [OP_GETDEVICELIST] = (nfsd4_dec)nfsd4_decode_notsupp, - [OP_LAYOUTCOMMIT] = (nfsd4_dec)nfsd4_decode_notsupp, - [OP_LAYOUTGET] = (nfsd4_dec)nfsd4_decode_notsupp, - [OP_LAYOUTRETURN] = (nfsd4_dec)nfsd4_decode_notsupp, + [OP_GETDEVICEINFO] = nfsd4_decode_notsupp, + [OP_GETDEVICELIST] = nfsd4_decode_notsupp, + [OP_LAYOUTCOMMIT] = nfsd4_decode_notsupp, + [OP_LAYOUTGET] = nfsd4_decode_notsupp, + [OP_LAYOUTRETURN] = nfsd4_decode_notsupp, #endif - [OP_SECINFO_NO_NAME] = (nfsd4_dec)nfsd4_decode_secinfo_no_name, - [OP_SEQUENCE] = (nfsd4_dec)nfsd4_decode_sequence, - [OP_SET_SSV] = (nfsd4_dec)nfsd4_decode_notsupp, - [OP_TEST_STATEID] = (nfsd4_dec)nfsd4_decode_test_stateid, - [OP_WANT_DELEGATION] = (nfsd4_dec)nfsd4_decode_notsupp, - [OP_DESTROY_CLIENTID] = (nfsd4_dec)nfsd4_decode_destroy_clientid, - [OP_RECLAIM_COMPLETE] = (nfsd4_dec)nfsd4_decode_reclaim_complete, + [OP_SECINFO_NO_NAME] = nfsd4_decode_secinfo_no_name, + [OP_SEQUENCE] = nfsd4_decode_sequence, + [OP_SET_SSV] = nfsd4_decode_notsupp, + [OP_TEST_STATEID] = nfsd4_decode_test_stateid, + [OP_WANT_DELEGATION] = nfsd4_decode_notsupp, + [OP_DESTROY_CLIENTID] = nfsd4_decode_destroy_clientid, + [OP_RECLAIM_COMPLETE] = nfsd4_decode_reclaim_complete,
/* new operations for NFSv4.2 */ - [OP_ALLOCATE] = (nfsd4_dec)nfsd4_decode_fallocate, - [OP_COPY] = (nfsd4_dec)nfsd4_decode_copy, - [OP_COPY_NOTIFY] = (nfsd4_dec)nfsd4_decode_copy_notify, - [OP_DEALLOCATE] = (nfsd4_dec)nfsd4_decode_fallocate, - [OP_IO_ADVISE] = (nfsd4_dec)nfsd4_decode_notsupp, - [OP_LAYOUTERROR] = (nfsd4_dec)nfsd4_decode_notsupp, - [OP_LAYOUTSTATS] = (nfsd4_dec)nfsd4_decode_notsupp, - [OP_OFFLOAD_CANCEL] = (nfsd4_dec)nfsd4_decode_offload_status, - [OP_OFFLOAD_STATUS] = (nfsd4_dec)nfsd4_decode_offload_status, - [OP_READ_PLUS] = (nfsd4_dec)nfsd4_decode_read, - [OP_SEEK] = (nfsd4_dec)nfsd4_decode_seek, - [OP_WRITE_SAME] = (nfsd4_dec)nfsd4_decode_notsupp, - [OP_CLONE] = (nfsd4_dec)nfsd4_decode_clone, + [OP_ALLOCATE] = nfsd4_decode_fallocate, + [OP_COPY] = nfsd4_decode_copy, + [OP_COPY_NOTIFY] = nfsd4_decode_copy_notify, + [OP_DEALLOCATE] = nfsd4_decode_fallocate, + [OP_IO_ADVISE] = nfsd4_decode_notsupp, + [OP_LAYOUTERROR] = nfsd4_decode_notsupp, + [OP_LAYOUTSTATS] = nfsd4_decode_notsupp, + [OP_OFFLOAD_CANCEL] = nfsd4_decode_offload_status, + [OP_OFFLOAD_STATUS] = nfsd4_decode_offload_status, + [OP_READ_PLUS] = nfsd4_decode_read, + [OP_SEEK] = nfsd4_decode_seek, + [OP_WRITE_SAME] = nfsd4_decode_notsupp, + [OP_CLONE] = nfsd4_decode_clone, /* RFC 8276 extended atributes operations */ - [OP_GETXATTR] = (nfsd4_dec)nfsd4_decode_getxattr, - [OP_SETXATTR] = (nfsd4_dec)nfsd4_decode_setxattr, - [OP_LISTXATTRS] = (nfsd4_dec)nfsd4_decode_listxattrs, - [OP_REMOVEXATTR] = (nfsd4_dec)nfsd4_decode_removexattr, + [OP_GETXATTR] = nfsd4_decode_getxattr, + [OP_SETXATTR] = nfsd4_decode_setxattr, + [OP_LISTXATTRS] = nfsd4_decode_listxattrs, + [OP_REMOVEXATTR] = nfsd4_decode_removexattr, };
static inline bool @@ -3641,8 +3700,10 @@ nfsd4_encode_stateid(struct xdr_stream *xdr, stateid_t *sid) }
static __be32 -nfsd4_encode_access(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_access *access) +nfsd4_encode_access(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_access *access = &u->access; struct xdr_stream *xdr = resp->xdr; __be32 *p;
@@ -3654,8 +3715,10 @@ nfsd4_encode_access(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_ return 0; }
-static __be32 nfsd4_encode_bind_conn_to_session(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_bind_conn_to_session *bcts) +static __be32 nfsd4_encode_bind_conn_to_session(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_bind_conn_to_session *bcts = &u->bind_conn_to_session; struct xdr_stream *xdr = resp->xdr; __be32 *p;
@@ -3671,8 +3734,10 @@ static __be32 nfsd4_encode_bind_conn_to_session(struct nfsd4_compoundres *resp, }
static __be32 -nfsd4_encode_close(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_close *close) +nfsd4_encode_close(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_close *close = &u->close; struct xdr_stream *xdr = resp->xdr;
return nfsd4_encode_stateid(xdr, &close->cl_stateid); @@ -3680,8 +3745,10 @@ nfsd4_encode_close(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_c
static __be32 -nfsd4_encode_commit(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_commit *commit) +nfsd4_encode_commit(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_commit *commit = &u->commit; struct xdr_stream *xdr = resp->xdr; __be32 *p;
@@ -3694,8 +3761,10 @@ nfsd4_encode_commit(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_ }
static __be32 -nfsd4_encode_create(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_create *create) +nfsd4_encode_create(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_create *create = &u->create; struct xdr_stream *xdr = resp->xdr; __be32 *p;
@@ -3708,8 +3777,10 @@ nfsd4_encode_create(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_ }
static __be32 -nfsd4_encode_getattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_getattr *getattr) +nfsd4_encode_getattr(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_getattr *getattr = &u->getattr; struct svc_fh *fhp = getattr->ga_fhp; struct xdr_stream *xdr = resp->xdr;
@@ -3718,8 +3789,10 @@ nfsd4_encode_getattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4 }
static __be32 -nfsd4_encode_getfh(struct nfsd4_compoundres *resp, __be32 nfserr, struct svc_fh **fhpp) +nfsd4_encode_getfh(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct svc_fh **fhpp = &u->getfh; struct xdr_stream *xdr = resp->xdr; struct svc_fh *fhp = *fhpp; unsigned int len; @@ -3773,8 +3846,10 @@ nfsd4_encode_lock_denied(struct xdr_stream *xdr, struct nfsd4_lock_denied *ld) }
static __be32 -nfsd4_encode_lock(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_lock *lock) +nfsd4_encode_lock(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_lock *lock = &u->lock; struct xdr_stream *xdr = resp->xdr;
if (!nfserr) @@ -3786,8 +3861,10 @@ nfsd4_encode_lock(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_lo }
static __be32 -nfsd4_encode_lockt(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_lockt *lockt) +nfsd4_encode_lockt(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_lockt *lockt = &u->lockt; struct xdr_stream *xdr = resp->xdr;
if (nfserr == nfserr_denied) @@ -3796,8 +3873,10 @@ nfsd4_encode_lockt(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_l }
static __be32 -nfsd4_encode_locku(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_locku *locku) +nfsd4_encode_locku(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_locku *locku = &u->locku; struct xdr_stream *xdr = resp->xdr;
return nfsd4_encode_stateid(xdr, &locku->lu_stateid); @@ -3805,8 +3884,10 @@ nfsd4_encode_locku(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_l
static __be32 -nfsd4_encode_link(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_link *link) +nfsd4_encode_link(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_link *link = &u->link; struct xdr_stream *xdr = resp->xdr; __be32 *p;
@@ -3819,8 +3900,10 @@ nfsd4_encode_link(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_li
static __be32 -nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_open *open) +nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_open *open = &u->open; struct xdr_stream *xdr = resp->xdr; __be32 *p;
@@ -3913,16 +3996,20 @@ nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_op }
static __be32 -nfsd4_encode_open_confirm(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_open_confirm *oc) +nfsd4_encode_open_confirm(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_open_confirm *oc = &u->open_confirm; struct xdr_stream *xdr = resp->xdr;
return nfsd4_encode_stateid(xdr, &oc->oc_resp_stateid); }
static __be32 -nfsd4_encode_open_downgrade(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_open_downgrade *od) +nfsd4_encode_open_downgrade(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_open_downgrade *od = &u->open_downgrade; struct xdr_stream *xdr = resp->xdr;
return nfsd4_encode_stateid(xdr, &od->od_stateid); @@ -4021,8 +4108,9 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
static __be32 nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_read *read) + union nfsd4_op_u *u) { + struct nfsd4_read *read = &u->read; bool splice_ok = test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags); unsigned long maxcount; struct xdr_stream *xdr = resp->xdr; @@ -4063,8 +4151,10 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr, }
static __be32 -nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_readlink *readlink) +nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_readlink *readlink = &u->readlink; __be32 *p, *maxcount_p, zero = xdr_zero; struct xdr_stream *xdr = resp->xdr; int length_offset = xdr->buf->len; @@ -4108,8 +4198,10 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd }
static __be32 -nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_readdir *readdir) +nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_readdir *readdir = &u->readdir; int maxcount; int bytes_left; loff_t offset; @@ -4199,8 +4291,10 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4 }
static __be32 -nfsd4_encode_remove(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_remove *remove) +nfsd4_encode_remove(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_remove *remove = &u->remove; struct xdr_stream *xdr = resp->xdr; __be32 *p;
@@ -4212,8 +4306,10 @@ nfsd4_encode_remove(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_ }
static __be32 -nfsd4_encode_rename(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_rename *rename) +nfsd4_encode_rename(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_rename *rename = &u->rename; struct xdr_stream *xdr = resp->xdr; __be32 *p;
@@ -4295,8 +4391,9 @@ nfsd4_do_encode_secinfo(struct xdr_stream *xdr, struct svc_export *exp)
static __be32 nfsd4_encode_secinfo(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_secinfo *secinfo) + union nfsd4_op_u *u) { + struct nfsd4_secinfo *secinfo = &u->secinfo; struct xdr_stream *xdr = resp->xdr;
return nfsd4_do_encode_secinfo(xdr, secinfo->si_exp); @@ -4304,8 +4401,9 @@ nfsd4_encode_secinfo(struct nfsd4_compoundres *resp, __be32 nfserr,
static __be32 nfsd4_encode_secinfo_no_name(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_secinfo_no_name *secinfo) + union nfsd4_op_u *u) { + struct nfsd4_secinfo_no_name *secinfo = &u->secinfo_no_name; struct xdr_stream *xdr = resp->xdr;
return nfsd4_do_encode_secinfo(xdr, secinfo->sin_exp); @@ -4316,8 +4414,10 @@ nfsd4_encode_secinfo_no_name(struct nfsd4_compoundres *resp, __be32 nfserr, * regardless of the error status. */ static __be32 -nfsd4_encode_setattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_setattr *setattr) +nfsd4_encode_setattr(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_setattr *setattr = &u->setattr; struct xdr_stream *xdr = resp->xdr; __be32 *p;
@@ -4340,8 +4440,10 @@ nfsd4_encode_setattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4 }
static __be32 -nfsd4_encode_setclientid(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_setclientid *scd) +nfsd4_encode_setclientid(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_setclientid *scd = &u->setclientid; struct xdr_stream *xdr = resp->xdr; __be32 *p;
@@ -4364,8 +4466,10 @@ nfsd4_encode_setclientid(struct nfsd4_compoundres *resp, __be32 nfserr, struct n }
static __be32 -nfsd4_encode_write(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_write *write) +nfsd4_encode_write(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_write *write = &u->write; struct xdr_stream *xdr = resp->xdr; __be32 *p;
@@ -4381,8 +4485,9 @@ nfsd4_encode_write(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_w
static __be32 nfsd4_encode_exchange_id(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_exchange_id *exid) + union nfsd4_op_u *u) { + struct nfsd4_exchange_id *exid = &u->exchange_id; struct xdr_stream *xdr = resp->xdr; __be32 *p; char *major_id; @@ -4459,8 +4564,9 @@ nfsd4_encode_exchange_id(struct nfsd4_compoundres *resp, __be32 nfserr,
static __be32 nfsd4_encode_create_session(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_create_session *sess) + union nfsd4_op_u *u) { + struct nfsd4_create_session *sess = &u->create_session; struct xdr_stream *xdr = resp->xdr; __be32 *p;
@@ -4512,8 +4618,9 @@ nfsd4_encode_create_session(struct nfsd4_compoundres *resp, __be32 nfserr,
static __be32 nfsd4_encode_sequence(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_sequence *seq) + union nfsd4_op_u *u) { + struct nfsd4_sequence *seq = &u->sequence; struct xdr_stream *xdr = resp->xdr; __be32 *p;
@@ -4535,8 +4642,9 @@ nfsd4_encode_sequence(struct nfsd4_compoundres *resp, __be32 nfserr,
static __be32 nfsd4_encode_test_stateid(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_test_stateid *test_stateid) + union nfsd4_op_u *u) { + struct nfsd4_test_stateid *test_stateid = &u->test_stateid; struct xdr_stream *xdr = resp->xdr; struct nfsd4_test_stateid_id *stateid, *next; __be32 *p; @@ -4556,8 +4664,9 @@ nfsd4_encode_test_stateid(struct nfsd4_compoundres *resp, __be32 nfserr, #ifdef CONFIG_NFSD_PNFS static __be32 nfsd4_encode_getdeviceinfo(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_getdeviceinfo *gdev) + union nfsd4_op_u *u) { + struct nfsd4_getdeviceinfo *gdev = &u->getdeviceinfo; struct xdr_stream *xdr = resp->xdr; const struct nfsd4_layout_ops *ops; u32 starting_len = xdr->buf->len, needed_len; @@ -4609,8 +4718,9 @@ nfsd4_encode_getdeviceinfo(struct nfsd4_compoundres *resp, __be32 nfserr,
static __be32 nfsd4_encode_layoutget(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_layoutget *lgp) + union nfsd4_op_u *u) { + struct nfsd4_layoutget *lgp = &u->layoutget; struct xdr_stream *xdr = resp->xdr; const struct nfsd4_layout_ops *ops; __be32 *p; @@ -4636,8 +4746,9 @@ nfsd4_encode_layoutget(struct nfsd4_compoundres *resp, __be32 nfserr,
static __be32 nfsd4_encode_layoutcommit(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_layoutcommit *lcp) + union nfsd4_op_u *u) { + struct nfsd4_layoutcommit *lcp = &u->layoutcommit; struct xdr_stream *xdr = resp->xdr; __be32 *p;
@@ -4657,8 +4768,9 @@ nfsd4_encode_layoutcommit(struct nfsd4_compoundres *resp, __be32 nfserr,
static __be32 nfsd4_encode_layoutreturn(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_layoutreturn *lrp) + union nfsd4_op_u *u) { + struct nfsd4_layoutreturn *lrp = &u->layoutreturn; struct xdr_stream *xdr = resp->xdr; __be32 *p;
@@ -4743,8 +4855,9 @@ nfsd42_encode_nl4_server(struct nfsd4_compoundres *resp, struct nl4_server *ns)
static __be32 nfsd4_encode_copy(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_copy *copy) + union nfsd4_op_u *u) { + struct nfsd4_copy *copy = &u->copy; __be32 *p;
nfserr = nfsd42_encode_write_res(resp, ©->cp_res, @@ -4760,8 +4873,9 @@ nfsd4_encode_copy(struct nfsd4_compoundres *resp, __be32 nfserr,
static __be32 nfsd4_encode_offload_status(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_offload_status *os) + union nfsd4_op_u *u) { + struct nfsd4_offload_status *os = &u->offload_status; struct xdr_stream *xdr = resp->xdr; __be32 *p;
@@ -4811,8 +4925,9 @@ nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
static __be32 nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_read *read) + union nfsd4_op_u *u) { + struct nfsd4_read *read = &u->read; struct file *file = read->rd_nf->nf_file; struct xdr_stream *xdr = resp->xdr; int starting_len = xdr->buf->len; @@ -4848,8 +4963,9 @@ nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
static __be32 nfsd4_encode_copy_notify(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_copy_notify *cn) + union nfsd4_op_u *u) { + struct nfsd4_copy_notify *cn = &u->copy_notify; struct xdr_stream *xdr = resp->xdr; __be32 *p;
@@ -4883,8 +4999,9 @@ nfsd4_encode_copy_notify(struct nfsd4_compoundres *resp, __be32 nfserr,
static __be32 nfsd4_encode_seek(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_seek *seek) + union nfsd4_op_u *u) { + struct nfsd4_seek *seek = &u->seek; __be32 *p;
p = xdr_reserve_space(resp->xdr, 4 + 8); @@ -4895,7 +5012,8 @@ nfsd4_encode_seek(struct nfsd4_compoundres *resp, __be32 nfserr, }
static __be32 -nfsd4_encode_noop(struct nfsd4_compoundres *resp, __be32 nfserr, void *p) +nfsd4_encode_noop(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *p) { return nfserr; } @@ -4946,8 +5064,9 @@ nfsd4_vbuf_to_stream(struct xdr_stream *xdr, char *buf, u32 buflen)
static __be32 nfsd4_encode_getxattr(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_getxattr *getxattr) + union nfsd4_op_u *u) { + struct nfsd4_getxattr *getxattr = &u->getxattr; struct xdr_stream *xdr = resp->xdr; __be32 *p, err;
@@ -4970,8 +5089,9 @@ nfsd4_encode_getxattr(struct nfsd4_compoundres *resp, __be32 nfserr,
static __be32 nfsd4_encode_setxattr(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_setxattr *setxattr) + union nfsd4_op_u *u) { + struct nfsd4_setxattr *setxattr = &u->setxattr; struct xdr_stream *xdr = resp->xdr; __be32 *p;
@@ -5011,8 +5131,9 @@ nfsd4_listxattr_validate_cookie(struct nfsd4_listxattrs *listxattrs,
static __be32 nfsd4_encode_listxattrs(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_listxattrs *listxattrs) + union nfsd4_op_u *u) { + struct nfsd4_listxattrs *listxattrs = &u->listxattrs; struct xdr_stream *xdr = resp->xdr; u32 cookie_offset, count_offset, eof; u32 left, xdrleft, slen, count; @@ -5122,8 +5243,9 @@ nfsd4_encode_listxattrs(struct nfsd4_compoundres *resp, __be32 nfserr,
static __be32 nfsd4_encode_removexattr(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_removexattr *removexattr) + union nfsd4_op_u *u) { + struct nfsd4_removexattr *removexattr = &u->removexattr; struct xdr_stream *xdr = resp->xdr; __be32 *p;
@@ -5135,7 +5257,7 @@ nfsd4_encode_removexattr(struct nfsd4_compoundres *resp, __be32 nfserr, return 0; }
-typedef __be32(* nfsd4_enc)(struct nfsd4_compoundres *, __be32, void *); +typedef __be32(*nfsd4_enc)(struct nfsd4_compoundres *, __be32, union nfsd4_op_u *u);
/* * Note: nfsd4_enc_ops vector is shared for v4.0 and v4.1 @@ -5143,93 +5265,93 @@ typedef __be32(* nfsd4_enc)(struct nfsd4_compoundres *, __be32, void *); * done in the decoding phase. */ static const nfsd4_enc nfsd4_enc_ops[] = { - [OP_ACCESS] = (nfsd4_enc)nfsd4_encode_access, - [OP_CLOSE] = (nfsd4_enc)nfsd4_encode_close, - [OP_COMMIT] = (nfsd4_enc)nfsd4_encode_commit, - [OP_CREATE] = (nfsd4_enc)nfsd4_encode_create, - [OP_DELEGPURGE] = (nfsd4_enc)nfsd4_encode_noop, - [OP_DELEGRETURN] = (nfsd4_enc)nfsd4_encode_noop, - [OP_GETATTR] = (nfsd4_enc)nfsd4_encode_getattr, - [OP_GETFH] = (nfsd4_enc)nfsd4_encode_getfh, - [OP_LINK] = (nfsd4_enc)nfsd4_encode_link, - [OP_LOCK] = (nfsd4_enc)nfsd4_encode_lock, - [OP_LOCKT] = (nfsd4_enc)nfsd4_encode_lockt, - [OP_LOCKU] = (nfsd4_enc)nfsd4_encode_locku, - [OP_LOOKUP] = (nfsd4_enc)nfsd4_encode_noop, - [OP_LOOKUPP] = (nfsd4_enc)nfsd4_encode_noop, - [OP_NVERIFY] = (nfsd4_enc)nfsd4_encode_noop, - [OP_OPEN] = (nfsd4_enc)nfsd4_encode_open, - [OP_OPENATTR] = (nfsd4_enc)nfsd4_encode_noop, - [OP_OPEN_CONFIRM] = (nfsd4_enc)nfsd4_encode_open_confirm, - [OP_OPEN_DOWNGRADE] = (nfsd4_enc)nfsd4_encode_open_downgrade, - [OP_PUTFH] = (nfsd4_enc)nfsd4_encode_noop, - [OP_PUTPUBFH] = (nfsd4_enc)nfsd4_encode_noop, - [OP_PUTROOTFH] = (nfsd4_enc)nfsd4_encode_noop, - [OP_READ] = (nfsd4_enc)nfsd4_encode_read, - [OP_READDIR] = (nfsd4_enc)nfsd4_encode_readdir, - [OP_READLINK] = (nfsd4_enc)nfsd4_encode_readlink, - [OP_REMOVE] = (nfsd4_enc)nfsd4_encode_remove, - [OP_RENAME] = (nfsd4_enc)nfsd4_encode_rename, - [OP_RENEW] = (nfsd4_enc)nfsd4_encode_noop, - [OP_RESTOREFH] = (nfsd4_enc)nfsd4_encode_noop, - [OP_SAVEFH] = (nfsd4_enc)nfsd4_encode_noop, - [OP_SECINFO] = (nfsd4_enc)nfsd4_encode_secinfo, - [OP_SETATTR] = (nfsd4_enc)nfsd4_encode_setattr, - [OP_SETCLIENTID] = (nfsd4_enc)nfsd4_encode_setclientid, - [OP_SETCLIENTID_CONFIRM] = (nfsd4_enc)nfsd4_encode_noop, - [OP_VERIFY] = (nfsd4_enc)nfsd4_encode_noop, - [OP_WRITE] = (nfsd4_enc)nfsd4_encode_write, - [OP_RELEASE_LOCKOWNER] = (nfsd4_enc)nfsd4_encode_noop, + [OP_ACCESS] = nfsd4_encode_access, + [OP_CLOSE] = nfsd4_encode_close, + [OP_COMMIT] = nfsd4_encode_commit, + [OP_CREATE] = nfsd4_encode_create, + [OP_DELEGPURGE] = nfsd4_encode_noop, + [OP_DELEGRETURN] = nfsd4_encode_noop, + [OP_GETATTR] = nfsd4_encode_getattr, + [OP_GETFH] = nfsd4_encode_getfh, + [OP_LINK] = nfsd4_encode_link, + [OP_LOCK] = nfsd4_encode_lock, + [OP_LOCKT] = nfsd4_encode_lockt, + [OP_LOCKU] = nfsd4_encode_locku, + [OP_LOOKUP] = nfsd4_encode_noop, + [OP_LOOKUPP] = nfsd4_encode_noop, + [OP_NVERIFY] = nfsd4_encode_noop, + [OP_OPEN] = nfsd4_encode_open, + [OP_OPENATTR] = nfsd4_encode_noop, + [OP_OPEN_CONFIRM] = nfsd4_encode_open_confirm, + [OP_OPEN_DOWNGRADE] = nfsd4_encode_open_downgrade, + [OP_PUTFH] = nfsd4_encode_noop, + [OP_PUTPUBFH] = nfsd4_encode_noop, + [OP_PUTROOTFH] = nfsd4_encode_noop, + [OP_READ] = nfsd4_encode_read, + [OP_READDIR] = nfsd4_encode_readdir, + [OP_READLINK] = nfsd4_encode_readlink, + [OP_REMOVE] = nfsd4_encode_remove, + [OP_RENAME] = nfsd4_encode_rename, + [OP_RENEW] = nfsd4_encode_noop, + [OP_RESTOREFH] = nfsd4_encode_noop, + [OP_SAVEFH] = nfsd4_encode_noop, + [OP_SECINFO] = nfsd4_encode_secinfo, + [OP_SETATTR] = nfsd4_encode_setattr, + [OP_SETCLIENTID] = nfsd4_encode_setclientid, + [OP_SETCLIENTID_CONFIRM] = nfsd4_encode_noop, + [OP_VERIFY] = nfsd4_encode_noop, + [OP_WRITE] = nfsd4_encode_write, + [OP_RELEASE_LOCKOWNER] = nfsd4_encode_noop,
/* NFSv4.1 operations */ - [OP_BACKCHANNEL_CTL] = (nfsd4_enc)nfsd4_encode_noop, - [OP_BIND_CONN_TO_SESSION] = (nfsd4_enc)nfsd4_encode_bind_conn_to_session, - [OP_EXCHANGE_ID] = (nfsd4_enc)nfsd4_encode_exchange_id, - [OP_CREATE_SESSION] = (nfsd4_enc)nfsd4_encode_create_session, - [OP_DESTROY_SESSION] = (nfsd4_enc)nfsd4_encode_noop, - [OP_FREE_STATEID] = (nfsd4_enc)nfsd4_encode_noop, - [OP_GET_DIR_DELEGATION] = (nfsd4_enc)nfsd4_encode_noop, + [OP_BACKCHANNEL_CTL] = nfsd4_encode_noop, + [OP_BIND_CONN_TO_SESSION] = nfsd4_encode_bind_conn_to_session, + [OP_EXCHANGE_ID] = nfsd4_encode_exchange_id, + [OP_CREATE_SESSION] = nfsd4_encode_create_session, + [OP_DESTROY_SESSION] = nfsd4_encode_noop, + [OP_FREE_STATEID] = nfsd4_encode_noop, + [OP_GET_DIR_DELEGATION] = nfsd4_encode_noop, #ifdef CONFIG_NFSD_PNFS - [OP_GETDEVICEINFO] = (nfsd4_enc)nfsd4_encode_getdeviceinfo, - [OP_GETDEVICELIST] = (nfsd4_enc)nfsd4_encode_noop, - [OP_LAYOUTCOMMIT] = (nfsd4_enc)nfsd4_encode_layoutcommit, - [OP_LAYOUTGET] = (nfsd4_enc)nfsd4_encode_layoutget, - [OP_LAYOUTRETURN] = (nfsd4_enc)nfsd4_encode_layoutreturn, + [OP_GETDEVICEINFO] = nfsd4_encode_getdeviceinfo, + [OP_GETDEVICELIST] = nfsd4_encode_noop, + [OP_LAYOUTCOMMIT] = nfsd4_encode_layoutcommit, + [OP_LAYOUTGET] = nfsd4_encode_layoutget, + [OP_LAYOUTRETURN] = nfsd4_encode_layoutreturn, #else - [OP_GETDEVICEINFO] = (nfsd4_enc)nfsd4_encode_noop, - [OP_GETDEVICELIST] = (nfsd4_enc)nfsd4_encode_noop, - [OP_LAYOUTCOMMIT] = (nfsd4_enc)nfsd4_encode_noop, - [OP_LAYOUTGET] = (nfsd4_enc)nfsd4_encode_noop, - [OP_LAYOUTRETURN] = (nfsd4_enc)nfsd4_encode_noop, + [OP_GETDEVICEINFO] = nfsd4_encode_noop, + [OP_GETDEVICELIST] = nfsd4_encode_noop, + [OP_LAYOUTCOMMIT] = nfsd4_encode_noop, + [OP_LAYOUTGET] = nfsd4_encode_noop, + [OP_LAYOUTRETURN] = nfsd4_encode_noop, #endif - [OP_SECINFO_NO_NAME] = (nfsd4_enc)nfsd4_encode_secinfo_no_name, - [OP_SEQUENCE] = (nfsd4_enc)nfsd4_encode_sequence, - [OP_SET_SSV] = (nfsd4_enc)nfsd4_encode_noop, - [OP_TEST_STATEID] = (nfsd4_enc)nfsd4_encode_test_stateid, - [OP_WANT_DELEGATION] = (nfsd4_enc)nfsd4_encode_noop, - [OP_DESTROY_CLIENTID] = (nfsd4_enc)nfsd4_encode_noop, - [OP_RECLAIM_COMPLETE] = (nfsd4_enc)nfsd4_encode_noop, + [OP_SECINFO_NO_NAME] = nfsd4_encode_secinfo_no_name, + [OP_SEQUENCE] = nfsd4_encode_sequence, + [OP_SET_SSV] = nfsd4_encode_noop, + [OP_TEST_STATEID] = nfsd4_encode_test_stateid, + [OP_WANT_DELEGATION] = nfsd4_encode_noop, + [OP_DESTROY_CLIENTID] = nfsd4_encode_noop, + [OP_RECLAIM_COMPLETE] = nfsd4_encode_noop,
/* NFSv4.2 operations */ - [OP_ALLOCATE] = (nfsd4_enc)nfsd4_encode_noop, - [OP_COPY] = (nfsd4_enc)nfsd4_encode_copy, - [OP_COPY_NOTIFY] = (nfsd4_enc)nfsd4_encode_copy_notify, - [OP_DEALLOCATE] = (nfsd4_enc)nfsd4_encode_noop, - [OP_IO_ADVISE] = (nfsd4_enc)nfsd4_encode_noop, - [OP_LAYOUTERROR] = (nfsd4_enc)nfsd4_encode_noop, - [OP_LAYOUTSTATS] = (nfsd4_enc)nfsd4_encode_noop, - [OP_OFFLOAD_CANCEL] = (nfsd4_enc)nfsd4_encode_noop, - [OP_OFFLOAD_STATUS] = (nfsd4_enc)nfsd4_encode_offload_status, - [OP_READ_PLUS] = (nfsd4_enc)nfsd4_encode_read_plus, - [OP_SEEK] = (nfsd4_enc)nfsd4_encode_seek, - [OP_WRITE_SAME] = (nfsd4_enc)nfsd4_encode_noop, - [OP_CLONE] = (nfsd4_enc)nfsd4_encode_noop, + [OP_ALLOCATE] = nfsd4_encode_noop, + [OP_COPY] = nfsd4_encode_copy, + [OP_COPY_NOTIFY] = nfsd4_encode_copy_notify, + [OP_DEALLOCATE] = nfsd4_encode_noop, + [OP_IO_ADVISE] = nfsd4_encode_noop, + [OP_LAYOUTERROR] = nfsd4_encode_noop, + [OP_LAYOUTSTATS] = nfsd4_encode_noop, + [OP_OFFLOAD_CANCEL] = nfsd4_encode_noop, + [OP_OFFLOAD_STATUS] = nfsd4_encode_offload_status, + [OP_READ_PLUS] = nfsd4_encode_read_plus, + [OP_SEEK] = nfsd4_encode_seek, + [OP_WRITE_SAME] = nfsd4_encode_noop, + [OP_CLONE] = nfsd4_encode_noop,
/* RFC 8276 extended atributes operations */ - [OP_GETXATTR] = (nfsd4_enc)nfsd4_encode_getxattr, - [OP_SETXATTR] = (nfsd4_enc)nfsd4_encode_setxattr, - [OP_LISTXATTRS] = (nfsd4_enc)nfsd4_encode_listxattrs, - [OP_REMOVEXATTR] = (nfsd4_enc)nfsd4_encode_removexattr, + [OP_GETXATTR] = nfsd4_encode_getxattr, + [OP_SETXATTR] = nfsd4_encode_setxattr, + [OP_LISTXATTRS] = nfsd4_encode_listxattrs, + [OP_REMOVEXATTR] = nfsd4_encode_removexattr, };
/*
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit ac3a2585f018f10039b4a856dcb122da88c1c1c9 ]
The filecache refcounting is a bit non-standard for something searchable by RCU, in that we maintain a sentinel reference while it's hashed. This in turn requires that we have to do things differently in the "put" depending on whether its hashed, which we believe to have led to races.
There are other problems in here too. nfsd_file_close_inode_sync can end up freeing an nfsd_file while there are still outstanding references to it, and there are a number of subtle ToC/ToU races.
Rework the code so that the refcount is what drives the lifecycle. When the refcount goes to zero, then unhash and rcu free the object. A task searching for a nfsd_file is allowed to bump its refcount, but only if it's not already 0. Ensure that we don't make any other changes to it until a reference is held.
With this change, the LRU carries a reference. Take special care to deal with it when removing an entry from the list, and ensure that we only repurpose the nf_lru list_head when the refcount is 0 to ensure exclusive access to it.
Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 318 +++++++++++++++++++++++--------------------- fs/nfsd/trace.h | 51 +++---- 2 files changed, 189 insertions(+), 180 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 6b8873b0c2c38..140094a44cc40 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -323,8 +323,7 @@ nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may) if (key->gc) __set_bit(NFSD_FILE_GC, &nf->nf_flags); nf->nf_inode = key->inode; - /* nf_ref is pre-incremented for hash table */ - refcount_set(&nf->nf_ref, 2); + refcount_set(&nf->nf_ref, 1); nf->nf_may = key->need; nf->nf_mark = NULL; } @@ -376,24 +375,35 @@ nfsd_file_unhash(struct nfsd_file *nf) return false; }
-static bool +static void nfsd_file_free(struct nfsd_file *nf) { s64 age = ktime_to_ms(ktime_sub(ktime_get(), nf->nf_birthtime)); - bool flush = false;
trace_nfsd_file_free(nf);
this_cpu_inc(nfsd_file_releases); this_cpu_add(nfsd_file_total_age, age);
+ nfsd_file_unhash(nf); + + /* + * We call fsync here in order to catch writeback errors. It's not + * strictly required by the protocol, but an nfsd_file could get + * evicted from the cache before a COMMIT comes in. If another + * task were to open that file in the interim and scrape the error, + * then the client may never see it. By calling fsync here, we ensure + * that writeback happens before the entry is freed, and that any + * errors reported result in the write verifier changing. + */ + nfsd_file_fsync(nf); + if (nf->nf_mark) nfsd_file_mark_put(nf->nf_mark); if (nf->nf_file) { get_file(nf->nf_file); filp_close(nf->nf_file, NULL); fput(nf->nf_file); - flush = true; }
/* @@ -401,10 +411,9 @@ nfsd_file_free(struct nfsd_file *nf) * WARN and leak it to preserve system stability. */ if (WARN_ON_ONCE(!list_empty(&nf->nf_lru))) - return flush; + return;
call_rcu(&nf->nf_rcu, nfsd_file_slab_free); - return flush; }
static bool @@ -420,17 +429,23 @@ nfsd_file_check_writeback(struct nfsd_file *nf) mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK); }
-static void nfsd_file_lru_add(struct nfsd_file *nf) +static bool nfsd_file_lru_add(struct nfsd_file *nf) { set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags); - if (list_lru_add(&nfsd_file_lru, &nf->nf_lru)) + if (list_lru_add(&nfsd_file_lru, &nf->nf_lru)) { trace_nfsd_file_lru_add(nf); + return true; + } + return false; }
-static void nfsd_file_lru_remove(struct nfsd_file *nf) +static bool nfsd_file_lru_remove(struct nfsd_file *nf) { - if (list_lru_del(&nfsd_file_lru, &nf->nf_lru)) + if (list_lru_del(&nfsd_file_lru, &nf->nf_lru)) { trace_nfsd_file_lru_del(nf); + return true; + } + return false; }
struct nfsd_file * @@ -441,86 +456,60 @@ nfsd_file_get(struct nfsd_file *nf) return NULL; }
-static void -nfsd_file_unhash_and_queue(struct nfsd_file *nf, struct list_head *dispose) -{ - trace_nfsd_file_unhash_and_queue(nf); - if (nfsd_file_unhash(nf)) { - /* caller must call nfsd_file_dispose_list() later */ - nfsd_file_lru_remove(nf); - list_add(&nf->nf_lru, dispose); - } -} - -static void -nfsd_file_put_noref(struct nfsd_file *nf) -{ - trace_nfsd_file_put(nf); - - if (refcount_dec_and_test(&nf->nf_ref)) { - WARN_ON(test_bit(NFSD_FILE_HASHED, &nf->nf_flags)); - nfsd_file_lru_remove(nf); - nfsd_file_free(nf); - } -} - -static void -nfsd_file_unhash_and_put(struct nfsd_file *nf) -{ - if (nfsd_file_unhash(nf)) - nfsd_file_put_noref(nf); -} - +/** + * nfsd_file_put - put the reference to a nfsd_file + * @nf: nfsd_file of which to put the reference + * + * Put a reference to a nfsd_file. In the non-GC case, we just put the + * reference immediately. In the GC case, if the reference would be + * the last one, the put it on the LRU instead to be cleaned up later. + */ void nfsd_file_put(struct nfsd_file *nf) { might_sleep(); + trace_nfsd_file_put(nf);
- if (test_bit(NFSD_FILE_GC, &nf->nf_flags)) - nfsd_file_lru_add(nf); - else if (refcount_read(&nf->nf_ref) == 2) - nfsd_file_unhash_and_put(nf); - - if (!test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) { - nfsd_file_fsync(nf); - nfsd_file_put_noref(nf); - } else if (nf->nf_file && test_bit(NFSD_FILE_GC, &nf->nf_flags)) { - nfsd_file_put_noref(nf); - nfsd_file_schedule_laundrette(); - } else - nfsd_file_put_noref(nf); -} - -static void -nfsd_file_dispose_list(struct list_head *dispose) -{ - struct nfsd_file *nf; + if (test_bit(NFSD_FILE_GC, &nf->nf_flags) && + test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) { + /* + * If this is the last reference (nf_ref == 1), then try to + * transfer it to the LRU. + */ + if (refcount_dec_not_one(&nf->nf_ref)) + return; + + /* Try to add it to the LRU. If that fails, decrement. */ + if (nfsd_file_lru_add(nf)) { + /* If it's still hashed, we're done */ + if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) { + nfsd_file_schedule_laundrette(); + return; + }
- while(!list_empty(dispose)) { - nf = list_first_entry(dispose, struct nfsd_file, nf_lru); - list_del_init(&nf->nf_lru); - nfsd_file_fsync(nf); - nfsd_file_put_noref(nf); + /* + * We're racing with unhashing, so try to remove it from + * the LRU. If removal fails, then someone else already + * has our reference. + */ + if (!nfsd_file_lru_remove(nf)) + return; + } } + if (refcount_dec_and_test(&nf->nf_ref)) + nfsd_file_free(nf); }
static void -nfsd_file_dispose_list_sync(struct list_head *dispose) +nfsd_file_dispose_list(struct list_head *dispose) { - bool flush = false; struct nfsd_file *nf;
- while(!list_empty(dispose)) { + while (!list_empty(dispose)) { nf = list_first_entry(dispose, struct nfsd_file, nf_lru); list_del_init(&nf->nf_lru); - nfsd_file_fsync(nf); - if (!refcount_dec_and_test(&nf->nf_ref)) - continue; - if (nfsd_file_free(nf)) - flush = true; + nfsd_file_free(nf); } - if (flush) - flush_delayed_fput(); }
static void @@ -590,21 +579,8 @@ nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru, struct list_head *head = arg; struct nfsd_file *nf = list_entry(item, struct nfsd_file, nf_lru);
- /* - * Do a lockless refcount check. The hashtable holds one reference, so - * we look to see if anything else has a reference, or if any have - * been put since the shrinker last ran. Those don't get unhashed and - * released. - * - * Note that in the put path, we set the flag and then decrement the - * counter. Here we check the counter and then test and clear the flag. - * That order is deliberate to ensure that we can do this locklessly. - */ - if (refcount_read(&nf->nf_ref) > 1) { - list_lru_isolate(lru, &nf->nf_lru); - trace_nfsd_file_gc_in_use(nf); - return LRU_REMOVED; - } + /* We should only be dealing with GC entries here */ + WARN_ON_ONCE(!test_bit(NFSD_FILE_GC, &nf->nf_flags));
/* * Don't throw out files that are still undergoing I/O or @@ -615,40 +591,30 @@ nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru, return LRU_SKIP; }
+ /* If it was recently added to the list, skip it */ if (test_and_clear_bit(NFSD_FILE_REFERENCED, &nf->nf_flags)) { trace_nfsd_file_gc_referenced(nf); return LRU_ROTATE; }
- if (!test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags)) { - trace_nfsd_file_gc_hashed(nf); - return LRU_SKIP; + /* + * Put the reference held on behalf of the LRU. If it wasn't the last + * one, then just remove it from the LRU and ignore it. + */ + if (!refcount_dec_and_test(&nf->nf_ref)) { + trace_nfsd_file_gc_in_use(nf); + list_lru_isolate(lru, &nf->nf_lru); + return LRU_REMOVED; }
+ /* Refcount went to zero. Unhash it and queue it to the dispose list */ + nfsd_file_unhash(nf); list_lru_isolate_move(lru, &nf->nf_lru, head); this_cpu_inc(nfsd_file_evictions); trace_nfsd_file_gc_disposed(nf); return LRU_REMOVED; }
-/* - * Unhash items on @dispose immediately, then queue them on the - * disposal workqueue to finish releasing them in the background. - * - * cel: Note that between the time list_lru_shrink_walk runs and - * now, these items are in the hash table but marked unhashed. - * Why release these outside of lru_cb ? There's no lock ordering - * problem since lru_cb currently takes no lock. - */ -static void nfsd_file_gc_dispose_list(struct list_head *dispose) -{ - struct nfsd_file *nf; - - list_for_each_entry(nf, dispose, nf_lru) - nfsd_file_hash_remove(nf); - nfsd_file_dispose_list_delayed(dispose); -} - static void nfsd_file_gc(void) { @@ -658,7 +624,7 @@ nfsd_file_gc(void) ret = list_lru_walk(&nfsd_file_lru, nfsd_file_lru_cb, &dispose, list_lru_count(&nfsd_file_lru)); trace_nfsd_file_gc_removed(ret, list_lru_count(&nfsd_file_lru)); - nfsd_file_gc_dispose_list(&dispose); + nfsd_file_dispose_list_delayed(&dispose); }
static void @@ -684,7 +650,7 @@ nfsd_file_lru_scan(struct shrinker *s, struct shrink_control *sc) ret = list_lru_shrink_walk(&nfsd_file_lru, sc, nfsd_file_lru_cb, &dispose); trace_nfsd_file_shrinker_removed(ret, list_lru_count(&nfsd_file_lru)); - nfsd_file_gc_dispose_list(&dispose); + nfsd_file_dispose_list_delayed(&dispose); return ret; }
@@ -694,72 +660,111 @@ static struct shrinker nfsd_file_shrinker = { .seeks = 1, };
-/* - * Find all cache items across all net namespaces that match @inode and - * move them to @dispose. The lookup is atomic wrt nfsd_file_acquire(). +/** + * nfsd_file_queue_for_close: try to close out any open nfsd_files for an inode + * @inode: inode on which to close out nfsd_files + * @dispose: list on which to gather nfsd_files to close out + * + * An nfsd_file represents a struct file being held open on behalf of nfsd. An + * open file however can block other activity (such as leases), or cause + * undesirable behavior (e.g. spurious silly-renames when reexporting NFS). + * + * This function is intended to find open nfsd_files when this sort of + * conflicting access occurs and then attempt to close those files out. + * + * Populates the dispose list with entries that have already had their + * refcounts go to zero. The actual free of an nfsd_file can be expensive, + * so we leave it up to the caller whether it wants to wait or not. */ -static unsigned int -__nfsd_file_close_inode(struct inode *inode, struct list_head *dispose) +static void +nfsd_file_queue_for_close(struct inode *inode, struct list_head *dispose) { struct nfsd_file_lookup_key key = { .type = NFSD_FILE_KEY_INODE, .inode = inode, }; - unsigned int count = 0; struct nfsd_file *nf;
rcu_read_lock(); do { + int decrement = 1; + nf = rhashtable_lookup(&nfsd_file_rhash_tbl, &key, nfsd_file_rhash_params); if (!nf) break; - nfsd_file_unhash_and_queue(nf, dispose); - count++; + + /* If we raced with someone else unhashing, ignore it */ + if (!nfsd_file_unhash(nf)) + continue; + + /* If we can't get a reference, ignore it */ + if (!nfsd_file_get(nf)) + continue; + + /* Extra decrement if we remove from the LRU */ + if (nfsd_file_lru_remove(nf)) + ++decrement; + + /* If refcount goes to 0, then put on the dispose list */ + if (refcount_sub_and_test(decrement, &nf->nf_ref)) { + list_add(&nf->nf_lru, dispose); + trace_nfsd_file_closing(nf); + } } while (1); rcu_read_unlock(); - return count; }
/** - * nfsd_file_close_inode_sync - attempt to forcibly close a nfsd_file + * nfsd_file_close_inode - attempt a delayed close of a nfsd_file * @inode: inode of the file to attempt to remove * - * Unhash and put, then flush and fput all cache items associated with @inode. + * Close out any open nfsd_files that can be reaped for @inode. The + * actual freeing is deferred to the dispose_list_delayed infrastructure. + * + * This is used by the fsnotify callbacks and setlease notifier. */ -void -nfsd_file_close_inode_sync(struct inode *inode) +static void +nfsd_file_close_inode(struct inode *inode) { LIST_HEAD(dispose); - unsigned int count;
- count = __nfsd_file_close_inode(inode, &dispose); - trace_nfsd_file_close_inode_sync(inode, count); - nfsd_file_dispose_list_sync(&dispose); + nfsd_file_queue_for_close(inode, &dispose); + nfsd_file_dispose_list_delayed(&dispose); }
/** - * nfsd_file_close_inode - attempt a delayed close of a nfsd_file + * nfsd_file_close_inode_sync - attempt to forcibly close a nfsd_file * @inode: inode of the file to attempt to remove * - * Unhash and put all cache item associated with @inode. + * Close out any open nfsd_files that can be reaped for @inode. The + * nfsd_files are closed out synchronously. + * + * This is called from nfsd_rename and nfsd_unlink to avoid silly-renames + * when reexporting NFS. */ -static void -nfsd_file_close_inode(struct inode *inode) +void +nfsd_file_close_inode_sync(struct inode *inode) { + struct nfsd_file *nf; LIST_HEAD(dispose); - unsigned int count;
- count = __nfsd_file_close_inode(inode, &dispose); - trace_nfsd_file_close_inode(inode, count); - nfsd_file_dispose_list_delayed(&dispose); + trace_nfsd_file_close(inode); + + nfsd_file_queue_for_close(inode, &dispose); + while (!list_empty(&dispose)) { + nf = list_first_entry(&dispose, struct nfsd_file, nf_lru); + list_del_init(&nf->nf_lru); + nfsd_file_free(nf); + } + flush_delayed_fput(); }
/** * nfsd_file_delayed_close - close unused nfsd_files * @work: dummy * - * Walk the LRU list and close any entries that have not been used since + * Walk the LRU list and destroy any entries that have not been used since * the last scan. */ static void @@ -781,7 +786,7 @@ nfsd_file_lease_notifier_call(struct notifier_block *nb, unsigned long arg,
/* Only close files for F_SETLEASE leases */ if (fl->fl_flags & FL_LEASE) - nfsd_file_close_inode_sync(file_inode(fl->fl_file)); + nfsd_file_close_inode(file_inode(fl->fl_file)); return 0; }
@@ -902,6 +907,13 @@ nfsd_file_cache_init(void) goto out; }
+/** + * __nfsd_file_cache_purge: clean out the cache for shutdown + * @net: net-namespace to shut down the cache (may be NULL) + * + * Walk the nfsd_file cache and close out any that match @net. If @net is NULL, + * then close out everything. Called when an nfsd instance is being shut down. + */ static void __nfsd_file_cache_purge(struct net *net) { @@ -915,8 +927,11 @@ __nfsd_file_cache_purge(struct net *net)
nf = rhashtable_walk_next(&iter); while (!IS_ERR_OR_NULL(nf)) { - if (!net || nf->nf_net == net) - nfsd_file_unhash_and_queue(nf, &dispose); + if (!net || nf->nf_net == net) { + nfsd_file_unhash(nf); + nfsd_file_lru_remove(nf); + list_add(&nf->nf_lru, &dispose); + } nf = rhashtable_walk_next(&iter); }
@@ -1083,8 +1098,12 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, if (nf) nf = nfsd_file_get(nf); rcu_read_unlock(); - if (nf) + + if (nf) { + if (nfsd_file_lru_remove(nf)) + WARN_ON_ONCE(refcount_dec_and_test(&nf->nf_ref)); goto wait_for_construction; + }
nf = nfsd_file_alloc(&key, may_flags); if (!nf) { @@ -1117,11 +1136,11 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, goto out; } open_retry = false; - nfsd_file_put_noref(nf); + if (refcount_dec_and_test(&nf->nf_ref)) + nfsd_file_free(nf); goto retry; }
- nfsd_file_lru_remove(nf); this_cpu_inc(nfsd_file_cache_hits);
status = nfserrno(nfsd_open_break_lease(file_inode(nf->nf_file), may_flags)); @@ -1131,7 +1150,8 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, this_cpu_inc(nfsd_file_acquisitions); *pnf = nf; } else { - nfsd_file_put(nf); + if (refcount_dec_and_test(&nf->nf_ref)) + nfsd_file_free(nf); nf = NULL; }
@@ -1157,8 +1177,10 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, * If construction failed, or we raced with a call to unlink() * then unhash. */ - if (status != nfs_ok || key.inode->i_nlink == 0) - nfsd_file_unhash_and_put(nf); + if (status == nfs_ok && key.inode->i_nlink == 0) + status = nfserr_jukebox; + if (status != nfs_ok) + nfsd_file_unhash(nf); clear_bit_unlock(NFSD_FILE_PENDING, &nf->nf_flags); smp_mb__after_atomic(); wake_up_bit(&nf->nf_flags, NFSD_FILE_PENDING); diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 235ea38da8644..8db7eecde9e39 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -786,8 +786,8 @@ DEFINE_CLID_EVENT(confirmed_r); __print_flags(val, "|", \ { 1 << NFSD_FILE_HASHED, "HASHED" }, \ { 1 << NFSD_FILE_PENDING, "PENDING" }, \ - { 1 << NFSD_FILE_REFERENCED, "REFERENCED"}, \ - { 1 << NFSD_FILE_GC, "GC"}) + { 1 << NFSD_FILE_REFERENCED, "REFERENCED" }, \ + { 1 << NFSD_FILE_GC, "GC" })
DECLARE_EVENT_CLASS(nfsd_file_class, TP_PROTO(struct nfsd_file *nf), @@ -822,6 +822,7 @@ DEFINE_EVENT(nfsd_file_class, name, \ DEFINE_NFSD_FILE_EVENT(nfsd_file_free); DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash); DEFINE_NFSD_FILE_EVENT(nfsd_file_put); +DEFINE_NFSD_FILE_EVENT(nfsd_file_closing); DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash_and_queue);
TRACE_EVENT(nfsd_file_alloc, @@ -1013,35 +1014,6 @@ TRACE_EVENT(nfsd_file_open, __entry->nf_file) )
-DECLARE_EVENT_CLASS(nfsd_file_search_class, - TP_PROTO( - const struct inode *inode, - unsigned int count - ), - TP_ARGS(inode, count), - TP_STRUCT__entry( - __field(const struct inode *, inode) - __field(unsigned int, count) - ), - TP_fast_assign( - __entry->inode = inode; - __entry->count = count; - ), - TP_printk("inode=%p count=%u", - __entry->inode, __entry->count) -); - -#define DEFINE_NFSD_FILE_SEARCH_EVENT(name) \ -DEFINE_EVENT(nfsd_file_search_class, name, \ - TP_PROTO( \ - const struct inode *inode, \ - unsigned int count \ - ), \ - TP_ARGS(inode, count)) - -DEFINE_NFSD_FILE_SEARCH_EVENT(nfsd_file_close_inode_sync); -DEFINE_NFSD_FILE_SEARCH_EVENT(nfsd_file_close_inode); - TRACE_EVENT(nfsd_file_is_cached, TP_PROTO( const struct inode *inode, @@ -1119,7 +1091,6 @@ DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_del_disposed); DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_in_use); DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_writeback); DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_referenced); -DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_hashed); DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_disposed);
DECLARE_EVENT_CLASS(nfsd_file_lruwalk_class, @@ -1151,6 +1122,22 @@ DEFINE_EVENT(nfsd_file_lruwalk_class, name, \ DEFINE_NFSD_FILE_LRUWALK_EVENT(nfsd_file_gc_removed); DEFINE_NFSD_FILE_LRUWALK_EVENT(nfsd_file_shrinker_removed);
+TRACE_EVENT(nfsd_file_close, + TP_PROTO( + const struct inode *inode + ), + TP_ARGS(inode), + TP_STRUCT__entry( + __field(const void *, inode) + ), + TP_fast_assign( + __entry->inode = inode; + ), + TP_printk("inode=%p", + __entry->inode + ) +); + TRACE_EVENT(nfsd_file_fsync, TP_PROTO( const struct nfsd_file *nf,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 0b3a551fa58b4da941efeb209b3770868e2eddd7 ]
Commit fb70bf124b05 ("NFSD: Instantiate a struct file when creating a regular NFSv4 file") added the ability to cache an open fd over a compound. There are a couple of problems with the way this currently works:
It's racy, as a newly-created nfsd_file can end up with its PENDING bit cleared while the nf is hashed, and the nf_file pointer is still zeroed out. Other tasks can find it in this state and they expect to see a valid nf_file, and can oops if nf_file is NULL.
Also, there is no guarantee that we'll end up creating a new nfsd_file if one is already in the hash. If an extant entry is in the hash with a valid nf_file, nfs4_get_vfs_file will clobber its nf_file pointer with the value of op_file and the old nf_file will leak.
Fix both issues by making a new nfsd_file_acquirei_opened variant that takes an optional file pointer. If one is present when this is called, we'll take a new reference to it instead of trying to open the file. If the nfsd_file already has a valid nf_file, we'll just ignore the optional file and pass the nfsd_file back as-is.
Also rework the tracepoints a bit to allow for an "opened" variant and don't try to avoid counting acquisitions in the case where we already have a cached open file.
Fixes: fb70bf124b05 ("NFSD: Instantiate a struct file when creating a regular NFSv4 file") Cc: Trond Myklebust trondmy@hammerspace.com Reported-by: Stanislav Saner ssaner@redhat.com Reported-and-Tested-by: Ruben Vestergaard rubenv@drcmr.dk Reported-and-Tested-by: Torkil Svensgaard torkil@drcmr.dk Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 40 ++++++++++++++++++---------------- fs/nfsd/filecache.h | 5 +++-- fs/nfsd/nfs4state.c | 16 ++++---------- fs/nfsd/trace.h | 52 ++++++++++++--------------------------------- 4 files changed, 42 insertions(+), 71 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 140094a44cc40..6a62d95d5ce64 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -1070,8 +1070,8 @@ nfsd_file_is_cached(struct inode *inode)
static __be32 nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, - unsigned int may_flags, struct nfsd_file **pnf, - bool open, bool want_gc) + unsigned int may_flags, struct file *file, + struct nfsd_file **pnf, bool want_gc) { struct nfsd_file_lookup_key key = { .type = NFSD_FILE_KEY_FULL, @@ -1146,8 +1146,7 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, status = nfserrno(nfsd_open_break_lease(file_inode(nf->nf_file), may_flags)); out: if (status == nfs_ok) { - if (open) - this_cpu_inc(nfsd_file_acquisitions); + this_cpu_inc(nfsd_file_acquisitions); *pnf = nf; } else { if (refcount_dec_and_test(&nf->nf_ref)) @@ -1157,20 +1156,23 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
out_status: put_cred(key.cred); - if (open) - trace_nfsd_file_acquire(rqstp, key.inode, may_flags, nf, status); + trace_nfsd_file_acquire(rqstp, key.inode, may_flags, nf, status); return status;
open_file: trace_nfsd_file_alloc(nf); nf->nf_mark = nfsd_file_mark_find_or_create(nf, key.inode); if (nf->nf_mark) { - if (open) { + if (file) { + get_file(file); + nf->nf_file = file; + status = nfs_ok; + trace_nfsd_file_opened(nf, status); + } else { status = nfsd_open_verified(rqstp, fhp, may_flags, &nf->nf_file); trace_nfsd_file_open(nf, status); - } else - status = nfs_ok; + } } else status = nfserr_jukebox; /* @@ -1206,7 +1208,7 @@ __be32 nfsd_file_acquire_gc(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned int may_flags, struct nfsd_file **pnf) { - return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, true, true); + return nfsd_file_do_acquire(rqstp, fhp, may_flags, NULL, pnf, true); }
/** @@ -1227,28 +1229,30 @@ __be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned int may_flags, struct nfsd_file **pnf) { - return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, true, false); + return nfsd_file_do_acquire(rqstp, fhp, may_flags, NULL, pnf, false); }
/** - * nfsd_file_create - Get a struct nfsd_file, do not open + * nfsd_file_acquire_opened - Get a struct nfsd_file using existing open file * @rqstp: the RPC transaction being executed * @fhp: the NFS filehandle of the file just created * @may_flags: NFSD_MAY_ settings for the file + * @file: cached, already-open file (may be NULL) * @pnf: OUT: new or found "struct nfsd_file" object * - * The nfsd_file_object returned by this API is reference-counted - * but not garbage-collected. The object is released immediately - * one RCU grace period after the final nfsd_file_put(). + * Acquire a nfsd_file object that is not GC'ed. If one doesn't already exist, + * and @file is non-NULL, use it to instantiate a new nfsd_file instead of + * opening a new one. * * Returns nfs_ok and sets @pnf on success; otherwise an nfsstat in * network byte order is returned. */ __be32 -nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp, - unsigned int may_flags, struct nfsd_file **pnf) +nfsd_file_acquire_opened(struct svc_rqst *rqstp, struct svc_fh *fhp, + unsigned int may_flags, struct file *file, + struct nfsd_file **pnf) { - return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, false, false); + return nfsd_file_do_acquire(rqstp, fhp, may_flags, file, pnf, false); }
/* diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h index b7efb2c3ddb18..41516a4263ea5 100644 --- a/fs/nfsd/filecache.h +++ b/fs/nfsd/filecache.h @@ -60,7 +60,8 @@ __be32 nfsd_file_acquire_gc(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned int may_flags, struct nfsd_file **nfp); __be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned int may_flags, struct nfsd_file **nfp); -__be32 nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp, - unsigned int may_flags, struct nfsd_file **nfp); +__be32 nfsd_file_acquire_opened(struct svc_rqst *rqstp, struct svc_fh *fhp, + unsigned int may_flags, struct file *file, + struct nfsd_file **nfp); int nfsd_file_cache_stats_show(struct seq_file *m, void *v); #endif /* _FS_NFSD_FILECACHE_H */ diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 4e05ad774c861..5eb3e055fde43 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -5261,18 +5261,10 @@ static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp, if (!fp->fi_fds[oflag]) { spin_unlock(&fp->fi_lock);
- if (!open->op_filp) { - status = nfsd_file_acquire(rqstp, cur_fh, access, &nf); - if (status != nfs_ok) - goto out_put_access; - } else { - status = nfsd_file_create(rqstp, cur_fh, access, &nf); - if (status != nfs_ok) - goto out_put_access; - nf->nf_file = open->op_filp; - open->op_filp = NULL; - trace_nfsd_file_create(rqstp, access, nf); - } + status = nfsd_file_acquire_opened(rqstp, cur_fh, access, + open->op_filp, &nf); + if (status != nfs_ok) + goto out_put_access;
spin_lock(&fp->fi_lock); if (!fp->fi_fds[oflag]) { diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 8db7eecde9e39..0f674982785ce 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -891,43 +891,6 @@ TRACE_EVENT(nfsd_file_acquire, ) );
-TRACE_EVENT(nfsd_file_create, - TP_PROTO( - const struct svc_rqst *rqstp, - unsigned int may_flags, - const struct nfsd_file *nf - ), - - TP_ARGS(rqstp, may_flags, nf), - - TP_STRUCT__entry( - __field(const void *, nf_inode) - __field(const void *, nf_file) - __field(unsigned long, may_flags) - __field(unsigned long, nf_flags) - __field(unsigned long, nf_may) - __field(unsigned int, nf_ref) - __field(u32, xid) - ), - - TP_fast_assign( - __entry->nf_inode = nf->nf_inode; - __entry->nf_file = nf->nf_file; - __entry->may_flags = may_flags; - __entry->nf_flags = nf->nf_flags; - __entry->nf_may = nf->nf_may; - __entry->nf_ref = refcount_read(&nf->nf_ref); - __entry->xid = be32_to_cpu(rqstp->rq_xid); - ), - - TP_printk("xid=0x%x inode=%p may_flags=%s ref=%u nf_flags=%s nf_may=%s nf_file=%p", - __entry->xid, __entry->nf_inode, - show_nfsd_may_flags(__entry->may_flags), - __entry->nf_ref, show_nf_flags(__entry->nf_flags), - show_nfsd_may_flags(__entry->nf_may), __entry->nf_file - ) -); - TRACE_EVENT(nfsd_file_insert_err, TP_PROTO( const struct svc_rqst *rqstp, @@ -989,8 +952,8 @@ TRACE_EVENT(nfsd_file_cons_err, ) );
-TRACE_EVENT(nfsd_file_open, - TP_PROTO(struct nfsd_file *nf, __be32 status), +DECLARE_EVENT_CLASS(nfsd_file_open_class, + TP_PROTO(const struct nfsd_file *nf, __be32 status), TP_ARGS(nf, status), TP_STRUCT__entry( __field(void *, nf_inode) /* cannot be dereferenced */ @@ -1014,6 +977,17 @@ TRACE_EVENT(nfsd_file_open, __entry->nf_file) )
+#define DEFINE_NFSD_FILE_OPEN_EVENT(name) \ +DEFINE_EVENT(nfsd_file_open_class, name, \ + TP_PROTO( \ + const struct nfsd_file *nf, \ + __be32 status \ + ), \ + TP_ARGS(nf, status)) + +DEFINE_NFSD_FILE_OPEN_EVENT(nfsd_file_open); +DEFINE_NFSD_FILE_OPEN_EVENT(nfsd_file_opened); + TRACE_EVENT(nfsd_file_is_cached, TP_PROTO( const struct inode *inode,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 7827c81f0248e3c2f40d438b020f3d222f002171 ]
The premise that "Once an svc thread is scheduled and executing an RPC, no other processes will touch svc_rqst::rq_flags" is false. svc_xprt_enqueue() examines the RQ_BUSY flag in scheduled nfsd threads when determining which thread to wake up next.
Found via KCSAN.
Fixes: 28df0988815f ("SUNRPC: Use RMW bitops in single-threaded hot paths") Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 7 +++---- fs/nfsd/nfs4xdr.c | 2 +- net/sunrpc/auth_gss/svcauth_gss.c | 4 ++-- net/sunrpc/svc.c | 6 +++--- net/sunrpc/svc_xprt.c | 2 +- net/sunrpc/svcsock.c | 8 ++++---- net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 +- 7 files changed, 15 insertions(+), 16 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index cf558d951ec68..a89f98fa3a9d0 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -937,7 +937,7 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, * the client wants us to do more in this compound: */ if (!nfsd4_last_compound_op(rqstp)) - __clear_bit(RQ_SPLICE_OK, &rqstp->rq_flags); + clear_bit(RQ_SPLICE_OK, &rqstp->rq_flags);
/* check stateid */ status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh, @@ -2609,12 +2609,11 @@ nfsd4_proc_compound(struct svc_rqst *rqstp) cstate->minorversion = args->minorversion; fh_init(current_fh, NFS4_FHSIZE); fh_init(save_fh, NFS4_FHSIZE); - /* * Don't use the deferral mechanism for NFSv4; compounds make it * too hard to avoid non-idempotency problems. */ - __clear_bit(RQ_USEDEFERRAL, &rqstp->rq_flags); + clear_bit(RQ_USEDEFERRAL, &rqstp->rq_flags);
/* * According to RFC3010, this takes precedence over all other errors. @@ -2736,7 +2735,7 @@ nfsd4_proc_compound(struct svc_rqst *rqstp) out: cstate->status = status; /* Reset deferral mechanism for RPC deferrals */ - __set_bit(RQ_USEDEFERRAL, &rqstp->rq_flags); + set_bit(RQ_USEDEFERRAL, &rqstp->rq_flags); return rpc_success; }
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 8bbf85da1de05..d0b9fbf189ac6 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2523,7 +2523,7 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp) argp->rqstp->rq_cachetype = cachethis ? RC_REPLBUFF : RC_NOCACHE;
if (readcount > 1 || max_reply > PAGE_SIZE - auth_slack) - __clear_bit(RQ_SPLICE_OK, &argp->rqstp->rq_flags); + clear_bit(RQ_SPLICE_OK, &argp->rqstp->rq_flags);
return true; } diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c index abaa952ae7518..329eac782cc5e 100644 --- a/net/sunrpc/auth_gss/svcauth_gss.c +++ b/net/sunrpc/auth_gss/svcauth_gss.c @@ -898,7 +898,7 @@ unwrap_integ_data(struct svc_rqst *rqstp, struct xdr_buf *buf, u32 seq, struct g * rejecting the server-computed MIC in this somewhat rare case, * do not use splice with the GSS integrity service. */ - __clear_bit(RQ_SPLICE_OK, &rqstp->rq_flags); + clear_bit(RQ_SPLICE_OK, &rqstp->rq_flags);
/* Did we already verify the signature on the original pass through? */ if (rqstp->rq_deferred) @@ -970,7 +970,7 @@ unwrap_priv_data(struct svc_rqst *rqstp, struct xdr_buf *buf, u32 seq, struct gs int pad, remaining_len, offset; u32 rseqno;
- __clear_bit(RQ_SPLICE_OK, &rqstp->rq_flags); + clear_bit(RQ_SPLICE_OK, &rqstp->rq_flags);
priv_len = svc_getnl(&buf->head[0]); if (rqstp->rq_deferred) { diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 8c29181796492..26d972c54a593 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -1276,10 +1276,10 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) goto err_short_len;
/* Will be turned off by GSS integrity and privacy services */ - __set_bit(RQ_SPLICE_OK, &rqstp->rq_flags); + set_bit(RQ_SPLICE_OK, &rqstp->rq_flags); /* Will be turned off only when NFSv4 Sessions are used */ - __set_bit(RQ_USEDEFERRAL, &rqstp->rq_flags); - __clear_bit(RQ_DROPME, &rqstp->rq_flags); + set_bit(RQ_USEDEFERRAL, &rqstp->rq_flags); + clear_bit(RQ_DROPME, &rqstp->rq_flags);
svc_putu32(resv, rqstp->rq_xid);
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c index ea65036dc5068..000b737784bd9 100644 --- a/net/sunrpc/svc_xprt.c +++ b/net/sunrpc/svc_xprt.c @@ -1228,7 +1228,7 @@ static struct cache_deferred_req *svc_defer(struct cache_req *req) trace_svc_defer(rqstp); svc_xprt_get(rqstp->rq_xprt); dr->xprt = rqstp->rq_xprt; - __set_bit(RQ_DROPME, &rqstp->rq_flags); + set_bit(RQ_DROPME, &rqstp->rq_flags);
dr->handle.revisit = svc_revisit; return &dr->handle; diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c index ff8dcebbdfc4e..90f6231d6ed67 100644 --- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -309,9 +309,9 @@ static void svc_sock_setbufsize(struct svc_sock *svsk, unsigned int nreqs) static void svc_sock_secure_port(struct svc_rqst *rqstp) { if (svc_port_is_privileged(svc_addr(rqstp))) - __set_bit(RQ_SECURE, &rqstp->rq_flags); + set_bit(RQ_SECURE, &rqstp->rq_flags); else - __clear_bit(RQ_SECURE, &rqstp->rq_flags); + clear_bit(RQ_SECURE, &rqstp->rq_flags); }
/* @@ -1014,9 +1014,9 @@ static int svc_tcp_recvfrom(struct svc_rqst *rqstp) rqstp->rq_xprt_ctxt = NULL; rqstp->rq_prot = IPPROTO_TCP; if (test_bit(XPT_LOCAL, &svsk->sk_xprt.xpt_flags)) - __set_bit(RQ_LOCAL, &rqstp->rq_flags); + set_bit(RQ_LOCAL, &rqstp->rq_flags); else - __clear_bit(RQ_LOCAL, &rqstp->rq_flags); + clear_bit(RQ_LOCAL, &rqstp->rq_flags);
p = (__be32 *)rqstp->rq_arg.head[0].iov_base; calldir = p[1]; diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c index e72e36989985f..c895f80df659c 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c @@ -606,7 +606,7 @@ static int svc_rdma_has_wspace(struct svc_xprt *xprt)
static void svc_rdma_secure_port(struct svc_rqst *rqstp) { - __set_bit(RQ_SECURE, &rqstp->rq_flags); + set_bit(RQ_SECURE, &rqstp->rq_flags); }
static void svc_rdma_kill_temp_xprt(struct svc_xprt *xprt)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 5304930dbae82d259bcf7e5611db7c81e7a42eff ]
The premise that "Once an svc thread is scheduled and executing an RPC, no other processes will touch svc_rqst::rq_flags" is false. svc_xprt_enqueue() examines the RQ_BUSY flag in scheduled nfsd threads when determining which thread to wake up next.
Fixes: 9315564747cb ("NFSD: Use only RQ_DROPME to signal the need to drop a reply") Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsproc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index c32a871f0e275..96426dea7d412 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -211,7 +211,7 @@ nfsd_proc_read(struct svc_rqst *rqstp) if (resp->status == nfs_ok) resp->status = fh_getattr(&resp->fh, &resp->stat); else if (resp->status == nfserr_jukebox) - __set_bit(RQ_DROPME, &rqstp->rq_flags); + set_bit(RQ_DROPME, &rqstp->rq_flags); return rpc_success; }
@@ -246,7 +246,7 @@ nfsd_proc_write(struct svc_rqst *rqstp) if (resp->status == nfs_ok) resp->status = fh_getattr(&resp->fh, &resp->stat); else if (resp->status == nfserr_jukebox) - __set_bit(RQ_DROPME, &rqstp->rq_flags); + set_bit(RQ_DROPME, &rqstp->rq_flags); return rpc_success; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xingyuan Mo hdthky0@gmail.com
[ Upstream commit e6cf91b7b47ff82b624bdfe2fdcde32bb52e71dd ]
If signal_pending() returns true, schedule_timeout() will not be executed, causing the waiting task to remain in the wait queue. Fixed by adding a call to finish_wait(), which ensures that the waiting task will always be removed from the wait queue.
Fixes: f4e44b393389 ("NFSD: delay unmount source's export after inter-server copy completed.") Signed-off-by: Xingyuan Mo hdthky0@gmail.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index a89f98fa3a9d0..4ab063a2ac84e 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1320,6 +1320,7 @@ static __be32 nfsd4_ssc_setup_dul(struct nfsd_net *nn, char *ipaddr, /* allow 20secs for mount/unmount for now - revisit */ if (signal_pending(current) || (schedule_timeout(20*HZ) == 0)) { + finish_wait(&nn->nfsd_ssc_waitq, &wait); kfree(work); return nfserr_eagain; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dai Ngo dai.ngo@oracle.com
[ Upstream commit f385f7d244134246f984975ed34cd75f77de479f ]
Currently the nfsd-client shrinker is registered and unregistered at the time the nfsd module is loaded and unloaded. The problem with this is the shrinker is being registered before all of the relevant fields in nfsd_net are initialized when nfsd is started. This can lead to an oops when memory is low and the shrinker is called while nfsd is not running.
This patch moves the register/unregister of nfsd-client shrinker from module load/unload time to nfsd startup/shutdown time.
Fixes: 44df6f439a17 ("NFSD: add delegation reaper to react to low memory condition") Reported-by: Mike Galbraith efault@gmx.de Signed-off-by: Dai Ngo dai.ngo@oracle.com [ cel: adjusted to apply without e33c267ab70d ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 22 +++++++++++----------- fs/nfsd/nfsctl.c | 7 +------ fs/nfsd/nfsd.h | 6 ++---- 3 files changed, 14 insertions(+), 21 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 5eb3e055fde43..f2647cbc108e3 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4420,7 +4420,7 @@ nfsd4_state_shrinker_scan(struct shrinker *shrink, struct shrink_control *sc) return SHRINK_STOP; }
-int +void nfsd4_init_leases_net(struct nfsd_net *nn) { struct sysinfo si; @@ -4442,16 +4442,6 @@ nfsd4_init_leases_net(struct nfsd_net *nn) nn->nfs4_max_clients = max_t(int, max_clients, NFS4_CLIENTS_PER_GB);
atomic_set(&nn->nfsd_courtesy_clients, 0); - nn->nfsd_client_shrinker.scan_objects = nfsd4_state_shrinker_scan; - nn->nfsd_client_shrinker.count_objects = nfsd4_state_shrinker_count; - nn->nfsd_client_shrinker.seeks = DEFAULT_SEEKS; - return register_shrinker(&nn->nfsd_client_shrinker); -} - -void -nfsd4_leases_net_shutdown(struct nfsd_net *nn) -{ - unregister_shrinker(&nn->nfsd_client_shrinker); }
static void init_nfs4_replay(struct nfs4_replay *rp) @@ -8079,8 +8069,17 @@ static int nfs4_state_create_net(struct net *net) INIT_DELAYED_WORK(&nn->nfsd_shrinker_work, nfsd4_state_shrinker_worker); get_net(net);
+ nn->nfsd_client_shrinker.scan_objects = nfsd4_state_shrinker_scan; + nn->nfsd_client_shrinker.count_objects = nfsd4_state_shrinker_count; + nn->nfsd_client_shrinker.seeks = DEFAULT_SEEKS; + + if (register_shrinker(&nn->nfsd_client_shrinker)) + goto err_shrinker; return 0;
+err_shrinker: + put_net(net); + kfree(nn->sessionid_hashtbl); err_sessionid: kfree(nn->unconf_id_hashtbl); err_unconf_id: @@ -8173,6 +8172,7 @@ nfs4_state_shutdown_net(struct net *net) struct list_head *pos, *next, reaplist; struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+ unregister_shrinker(&nn->nfsd_client_shrinker); cancel_delayed_work_sync(&nn->laundromat_work); locks_end_grace(&nn->nfsd4_manager);
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index d1e581a60480c..c2577ee7ffb22 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -1457,9 +1457,7 @@ static __net_init int nfsd_init_net(struct net *net) goto out_idmap_error; nn->nfsd_versions = NULL; nn->nfsd4_minorversions = NULL; - retval = nfsd4_init_leases_net(nn); - if (retval) - goto out_drc_error; + nfsd4_init_leases_net(nn); retval = nfsd_reply_cache_init(nn); if (retval) goto out_cache_error; @@ -1469,8 +1467,6 @@ static __net_init int nfsd_init_net(struct net *net) return 0;
out_cache_error: - nfsd4_leases_net_shutdown(nn); -out_drc_error: nfsd_idmap_shutdown(net); out_idmap_error: nfsd_export_shutdown(net); @@ -1486,7 +1482,6 @@ static __net_exit void nfsd_exit_net(struct net *net) nfsd_idmap_shutdown(net); nfsd_export_shutdown(net); nfsd_netns_free_versions(net_generic(net, nfsd_net_id)); - nfsd4_leases_net_shutdown(nn); }
static struct pernet_operations nfsd_net_ops = { diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 93b42ef9ed91b..fa0144a742678 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -504,8 +504,7 @@ extern void unregister_cld_notifier(void); extern void nfsd4_ssc_init_umount_work(struct nfsd_net *nn); #endif
-extern int nfsd4_init_leases_net(struct nfsd_net *nn); -extern void nfsd4_leases_net_shutdown(struct nfsd_net *nn); +extern void nfsd4_init_leases_net(struct nfsd_net *nn);
#else /* CONFIG_NFSD_V4 */ static inline int nfsd4_is_junction(struct dentry *dentry) @@ -513,8 +512,7 @@ static inline int nfsd4_is_junction(struct dentry *dentry) return 0; }
-static inline int nfsd4_init_leases_net(struct nfsd_net *nn) { return 0; }; -static inline void nfsd4_leases_net_shutdown(struct nfsd_net *nn) {}; +static inline void nfsd4_init_leases_net(struct nfsd_net *nn) { };
#define register_cld_notifier() 0 #define unregister_cld_notifier() do { } while(0)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dai Ngo dai.ngo@oracle.com
[ Upstream commit 7c24fa225081f31bc6da6a355c1ba801889ab29a ]
Since nfsd4_state_shrinker_count always calls mod_delayed_work with 0 delay, we can replace delayed_work with work_struct to save some space and overhead.
Also add the call to cancel_work after unregister the shrinker in nfs4_state_shutdown_net.
Signed-off-by: Dai Ngo dai.ngo@oracle.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/netns.h | 2 +- fs/nfsd/nfs4state.c | 8 ++++---- 2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h index 8c854ba3285bb..51a4b7885cae2 100644 --- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h @@ -195,7 +195,7 @@ struct nfsd_net {
atomic_t nfsd_courtesy_clients; struct shrinker nfsd_client_shrinker; - struct delayed_work nfsd_shrinker_work; + struct work_struct nfsd_shrinker_work; };
/* Simple check to find out if a given net was properly initialized */ diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index f2647cbc108e3..3dd64caf06158 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4410,7 +4410,7 @@ nfsd4_state_shrinker_count(struct shrinker *shrink, struct shrink_control *sc) if (!count) count = atomic_long_read(&num_delegations); if (count) - mod_delayed_work(laundry_wq, &nn->nfsd_shrinker_work, 0); + queue_work(laundry_wq, &nn->nfsd_shrinker_work); return (unsigned long)count; }
@@ -6223,8 +6223,7 @@ deleg_reaper(struct nfsd_net *nn) static void nfsd4_state_shrinker_worker(struct work_struct *work) { - struct delayed_work *dwork = to_delayed_work(work); - struct nfsd_net *nn = container_of(dwork, struct nfsd_net, + struct nfsd_net *nn = container_of(work, struct nfsd_net, nfsd_shrinker_work);
courtesy_client_reaper(nn); @@ -8066,7 +8065,7 @@ static int nfs4_state_create_net(struct net *net) INIT_LIST_HEAD(&nn->blocked_locks_lru);
INIT_DELAYED_WORK(&nn->laundromat_work, laundromat_main); - INIT_DELAYED_WORK(&nn->nfsd_shrinker_work, nfsd4_state_shrinker_worker); + INIT_WORK(&nn->nfsd_shrinker_work, nfsd4_state_shrinker_worker); get_net(net);
nn->nfsd_client_shrinker.scan_objects = nfsd4_state_shrinker_scan; @@ -8173,6 +8172,7 @@ nfs4_state_shutdown_net(struct net *net) struct nfsd_net *nn = net_generic(net, nfsd_net_id);
unregister_shrinker(&nn->nfsd_client_shrinker); + cancel_work(&nn->nfsd_shrinker_work); cancel_delayed_work_sync(&nn->laundromat_work); locks_end_grace(&nn->nfsd4_manager);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 4bdbba54e9b1c769da8ded9abd209d765715e1d6 ]
nfsd_file_cache_purge is called when the server is shutting down, in which case, tearing things down is generally fine, but it also gets called when the exports cache is flushed.
Instead of walking the cache and freeing everything unconditionally, handle it the same as when we have a notification of conflicting access.
Fixes: ac3a2585f018 ("nfsd: rework refcounting in filecache") Reported-by: Ruben Vestergaard rubenv@drcmr.dk Reported-by: Torkil Svensgaard torkil@drcmr.dk Reported-by: Shachar Kagan skagan@nvidia.com Signed-off-by: Jeff Layton jlayton@kernel.org Tested-by: Shachar Kagan skagan@nvidia.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 61 ++++++++++++++++++++++++++------------------- 1 file changed, 36 insertions(+), 25 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 6a62d95d5ce64..68c7c82f8b3bb 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -660,6 +660,39 @@ static struct shrinker nfsd_file_shrinker = { .seeks = 1, };
+/** + * nfsd_file_cond_queue - conditionally unhash and queue a nfsd_file + * @nf: nfsd_file to attempt to queue + * @dispose: private list to queue successfully-put objects + * + * Unhash an nfsd_file, try to get a reference to it, and then put that + * reference. If it's the last reference, queue it to the dispose list. + */ +static void +nfsd_file_cond_queue(struct nfsd_file *nf, struct list_head *dispose) + __must_hold(RCU) +{ + int decrement = 1; + + /* If we raced with someone else unhashing, ignore it */ + if (!nfsd_file_unhash(nf)) + return; + + /* If we can't get a reference, ignore it */ + if (!nfsd_file_get(nf)) + return; + + /* Extra decrement if we remove from the LRU */ + if (nfsd_file_lru_remove(nf)) + ++decrement; + + /* If refcount goes to 0, then put on the dispose list */ + if (refcount_sub_and_test(decrement, &nf->nf_ref)) { + list_add(&nf->nf_lru, dispose); + trace_nfsd_file_closing(nf); + } +} + /** * nfsd_file_queue_for_close: try to close out any open nfsd_files for an inode * @inode: inode on which to close out nfsd_files @@ -687,30 +720,11 @@ nfsd_file_queue_for_close(struct inode *inode, struct list_head *dispose)
rcu_read_lock(); do { - int decrement = 1; - nf = rhashtable_lookup(&nfsd_file_rhash_tbl, &key, nfsd_file_rhash_params); if (!nf) break; - - /* If we raced with someone else unhashing, ignore it */ - if (!nfsd_file_unhash(nf)) - continue; - - /* If we can't get a reference, ignore it */ - if (!nfsd_file_get(nf)) - continue; - - /* Extra decrement if we remove from the LRU */ - if (nfsd_file_lru_remove(nf)) - ++decrement; - - /* If refcount goes to 0, then put on the dispose list */ - if (refcount_sub_and_test(decrement, &nf->nf_ref)) { - list_add(&nf->nf_lru, dispose); - trace_nfsd_file_closing(nf); - } + nfsd_file_cond_queue(nf, dispose); } while (1); rcu_read_unlock(); } @@ -927,11 +941,8 @@ __nfsd_file_cache_purge(struct net *net)
nf = rhashtable_walk_next(&iter); while (!IS_ERR_OR_NULL(nf)) { - if (!net || nf->nf_net == net) { - nfsd_file_unhash(nf); - nfsd_file_lru_remove(nf); - list_add(&nf->nf_lru, &dispose); - } + if (!net || nf->nf_net == net) + nfsd_file_cond_queue(nf, &dispose); nf = rhashtable_walk_next(&iter); }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 4102db175b5d884d133270fdbd0e59111ce688fc ]
The nfs4_file table is global, so shutting it down when a containerized nfsd is shut down is wrong and can lead to double-frees. Tear down the nfs4_file_rhltable in nfs4_state_shutdown instead of nfs4_state_shutdown_net.
Fixes: d47b295e8d76 ("NFSD: Use rhashtable for managing nfs4_file objects") Link: https://bugzilla.redhat.com/show_bug.cgi?id=2169017 Reported-by: JianHong Yin jiyin@redhat.com Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 3dd64caf06158..6c11f2701af88 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -8192,7 +8192,6 @@ nfs4_state_shutdown_net(struct net *net)
nfsd4_client_tracking_exit(net); nfs4_state_destroy_net(net); - rhltable_destroy(&nfs4_file_rhltable); #ifdef CONFIG_NFSD_V4_2_INTER_SSC nfsd4_ssc_shutdown_umount(nn); #endif @@ -8202,6 +8201,7 @@ void nfs4_state_shutdown(void) { nfsd4_destroy_callback_queue(); + rhltable_destroy(&nfs4_file_rhltable); }
static void
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dai Ngo dai.ngo@oracle.com
[ Upstream commit df24ac7a2e3a9d0bc68f1756a880e50bfe4b4522 ]
Currently nfsd4_setup_inter_ssc returns the vfsmount of the source server's export when the mount completes. After the copy is done nfsd4_cleanup_inter_ssc is called with the vfsmount of the source server and it searches nfsd_ssc_mount_list for a matching entry to do the clean up.
The problems with this approach are (1) the need to search the nfsd_ssc_mount_list and (2) the code has to handle the case where the matching entry is not found which looks ugly.
The enhancement is instead of nfsd4_setup_inter_ssc returning the vfsmount, it returns the nfsd4_ssc_umount_item which has the vfsmount embedded in it. When nfsd4_cleanup_inter_ssc is called it's passed with the nfsd4_ssc_umount_item directly to do the clean up so no searching is needed and there is no need to handle the 'not found' case.
Signed-off-by: Dai Ngo dai.ngo@oracle.com Signed-off-by: Chuck Lever chuck.lever@oracle.com [ cel: adjusted whitespace and variable/function names ] Reviewed-by: Olga Kornievskaia kolga@netapp.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 111 ++++++++++++++++------------------------ fs/nfsd/xdr4.h | 2 +- include/linux/nfs_ssc.h | 2 +- 3 files changed, 46 insertions(+), 69 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 4ab063a2ac84e..5e175133b7bc7 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1295,15 +1295,15 @@ extern void nfs_sb_deactive(struct super_block *sb); * setup a work entry in the ssc delayed unmount list. */ static __be32 nfsd4_ssc_setup_dul(struct nfsd_net *nn, char *ipaddr, - struct nfsd4_ssc_umount_item **retwork, struct vfsmount **ss_mnt) + struct nfsd4_ssc_umount_item **nsui) { struct nfsd4_ssc_umount_item *ni = NULL; struct nfsd4_ssc_umount_item *work = NULL; struct nfsd4_ssc_umount_item *tmp; DEFINE_WAIT(wait); + __be32 status = 0;
- *ss_mnt = NULL; - *retwork = NULL; + *nsui = NULL; work = kzalloc(sizeof(*work), GFP_KERNEL); try_again: spin_lock(&nn->nfsd_ssc_lock); @@ -1327,12 +1327,12 @@ static __be32 nfsd4_ssc_setup_dul(struct nfsd_net *nn, char *ipaddr, finish_wait(&nn->nfsd_ssc_waitq, &wait); goto try_again; } - *ss_mnt = ni->nsui_vfsmount; + *nsui = ni; refcount_inc(&ni->nsui_refcnt); spin_unlock(&nn->nfsd_ssc_lock); kfree(work);
- /* return vfsmount in ss_mnt */ + /* return vfsmount in (*nsui)->nsui_vfsmount */ return 0; } if (work) { @@ -1340,31 +1340,32 @@ static __be32 nfsd4_ssc_setup_dul(struct nfsd_net *nn, char *ipaddr, refcount_set(&work->nsui_refcnt, 2); work->nsui_busy = true; list_add_tail(&work->nsui_list, &nn->nfsd_ssc_mount_list); - *retwork = work; - } + *nsui = work; + } else + status = nfserr_resource; spin_unlock(&nn->nfsd_ssc_lock); - return 0; + return status; }
-static void nfsd4_ssc_update_dul_work(struct nfsd_net *nn, - struct nfsd4_ssc_umount_item *work, struct vfsmount *ss_mnt) +static void nfsd4_ssc_update_dul(struct nfsd_net *nn, + struct nfsd4_ssc_umount_item *nsui, + struct vfsmount *ss_mnt) { - /* set nsui_vfsmount, clear busy flag and wakeup waiters */ spin_lock(&nn->nfsd_ssc_lock); - work->nsui_vfsmount = ss_mnt; - work->nsui_busy = false; + nsui->nsui_vfsmount = ss_mnt; + nsui->nsui_busy = false; wake_up_all(&nn->nfsd_ssc_waitq); spin_unlock(&nn->nfsd_ssc_lock); }
-static void nfsd4_ssc_cancel_dul_work(struct nfsd_net *nn, - struct nfsd4_ssc_umount_item *work) +static void nfsd4_ssc_cancel_dul(struct nfsd_net *nn, + struct nfsd4_ssc_umount_item *nsui) { spin_lock(&nn->nfsd_ssc_lock); - list_del(&work->nsui_list); + list_del(&nsui->nsui_list); wake_up_all(&nn->nfsd_ssc_waitq); spin_unlock(&nn->nfsd_ssc_lock); - kfree(work); + kfree(nsui); }
/* @@ -1372,7 +1373,7 @@ static void nfsd4_ssc_cancel_dul_work(struct nfsd_net *nn, */ static __be32 nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp, - struct vfsmount **mount) + struct nfsd4_ssc_umount_item **nsui) { struct file_system_type *type; struct vfsmount *ss_mnt; @@ -1383,7 +1384,6 @@ nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp, char *ipaddr, *dev_name, *raw_data; int len, raw_len; __be32 status = nfserr_inval; - struct nfsd4_ssc_umount_item *work = NULL; struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
naddr = &nss->u.nl4_addr; @@ -1391,6 +1391,7 @@ nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp, naddr->addr_len, (struct sockaddr *)&tmp_addr, sizeof(tmp_addr)); + *nsui = NULL; if (tmp_addrlen == 0) goto out_err;
@@ -1433,10 +1434,10 @@ nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp, goto out_free_rawdata; snprintf(dev_name, len + 5, "%s%s%s:/", startsep, ipaddr, endsep);
- status = nfsd4_ssc_setup_dul(nn, ipaddr, &work, &ss_mnt); + status = nfsd4_ssc_setup_dul(nn, ipaddr, nsui); if (status) goto out_free_devname; - if (ss_mnt) + if ((*nsui)->nsui_vfsmount) goto out_done;
/* Use an 'internal' mount: SB_KERNMOUNT -> MNT_INTERNAL */ @@ -1444,15 +1445,12 @@ nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp, module_put(type->owner); if (IS_ERR(ss_mnt)) { status = nfserr_nodev; - if (work) - nfsd4_ssc_cancel_dul_work(nn, work); + nfsd4_ssc_cancel_dul(nn, *nsui); goto out_free_devname; } - if (work) - nfsd4_ssc_update_dul_work(nn, work, ss_mnt); + nfsd4_ssc_update_dul(nn, *nsui, ss_mnt); out_done: status = 0; - *mount = ss_mnt;
out_free_devname: kfree(dev_name); @@ -1476,7 +1474,7 @@ nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp, static __be32 nfsd4_setup_inter_ssc(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, - struct nfsd4_copy *copy, struct vfsmount **mount) + struct nfsd4_copy *copy) { struct svc_fh *s_fh = NULL; stateid_t *s_stid = ©->cp_src_stateid; @@ -1489,7 +1487,7 @@ nfsd4_setup_inter_ssc(struct svc_rqst *rqstp, if (status) goto out;
- status = nfsd4_interssc_connect(copy->cp_src, rqstp, mount); + status = nfsd4_interssc_connect(copy->cp_src, rqstp, ©->ss_nsui); if (status) goto out;
@@ -1507,45 +1505,27 @@ nfsd4_setup_inter_ssc(struct svc_rqst *rqstp, }
static void -nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *filp, +nfsd4_cleanup_inter_ssc(struct nfsd4_ssc_umount_item *nsui, struct file *filp, struct nfsd_file *dst) { - bool found = false; - long timeout; - struct nfsd4_ssc_umount_item *tmp; - struct nfsd4_ssc_umount_item *ni = NULL; struct nfsd_net *nn = net_generic(dst->nf_net, nfsd_net_id); + long timeout = msecs_to_jiffies(nfsd4_ssc_umount_timeout);
nfs42_ssc_close(filp); nfsd_file_put(dst); fput(filp);
- if (!nn) { - mntput(ss_mnt); - return; - } spin_lock(&nn->nfsd_ssc_lock); - timeout = msecs_to_jiffies(nfsd4_ssc_umount_timeout); - list_for_each_entry_safe(ni, tmp, &nn->nfsd_ssc_mount_list, nsui_list) { - if (ni->nsui_vfsmount->mnt_sb == ss_mnt->mnt_sb) { - list_del(&ni->nsui_list); - /* - * vfsmount can be shared by multiple exports, - * decrement refcnt. If the count drops to 1 it - * will be unmounted when nsui_expire expires. - */ - refcount_dec(&ni->nsui_refcnt); - ni->nsui_expire = jiffies + timeout; - list_add_tail(&ni->nsui_list, &nn->nfsd_ssc_mount_list); - found = true; - break; - } - } + list_del(&nsui->nsui_list); + /* + * vfsmount can be shared by multiple exports, + * decrement refcnt. If the count drops to 1 it + * will be unmounted when nsui_expire expires. + */ + refcount_dec(&nsui->nsui_refcnt); + nsui->nsui_expire = jiffies + timeout; + list_add_tail(&nsui->nsui_list, &nn->nfsd_ssc_mount_list); spin_unlock(&nn->nfsd_ssc_lock); - if (!found) { - mntput(ss_mnt); - return; - } }
#else /* CONFIG_NFSD_V4_2_INTER_SSC */ @@ -1553,15 +1533,13 @@ nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *filp, static __be32 nfsd4_setup_inter_ssc(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, - struct nfsd4_copy *copy, - struct vfsmount **mount) + struct nfsd4_copy *copy) { - *mount = NULL; return nfserr_inval; }
static void -nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *filp, +nfsd4_cleanup_inter_ssc(struct nfsd4_ssc_umount_item *nsui, struct file *filp, struct nfsd_file *dst) { } @@ -1702,7 +1680,7 @@ static void dup_copy_fields(struct nfsd4_copy *src, struct nfsd4_copy *dst) memcpy(dst->cp_src, src->cp_src, sizeof(struct nl4_server)); memcpy(&dst->stateid, &src->stateid, sizeof(src->stateid)); memcpy(&dst->c_fh, &src->c_fh, sizeof(src->c_fh)); - dst->ss_mnt = src->ss_mnt; + dst->ss_nsui = src->ss_nsui; }
static void cleanup_async_copy(struct nfsd4_copy *copy) @@ -1751,8 +1729,8 @@ static int nfsd4_do_async_copy(void *data) if (nfsd4_ssc_is_inter(copy)) { struct file *filp;
- filp = nfs42_ssc_open(copy->ss_mnt, ©->c_fh, - ©->stateid); + filp = nfs42_ssc_open(copy->ss_nsui->nsui_vfsmount, + ©->c_fh, ©->stateid); if (IS_ERR(filp)) { switch (PTR_ERR(filp)) { case -EBADF: @@ -1766,7 +1744,7 @@ static int nfsd4_do_async_copy(void *data) } nfserr = nfsd4_do_copy(copy, filp, copy->nf_dst->nf_file, false); - nfsd4_cleanup_inter_ssc(copy->ss_mnt, filp, copy->nf_dst); + nfsd4_cleanup_inter_ssc(copy->ss_nsui, filp, copy->nf_dst); } else { nfserr = nfsd4_do_copy(copy, copy->nf_src->nf_file, copy->nf_dst->nf_file, false); @@ -1792,8 +1770,7 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, status = nfserr_notsupp; goto out; } - status = nfsd4_setup_inter_ssc(rqstp, cstate, copy, - ©->ss_mnt); + status = nfsd4_setup_inter_ssc(rqstp, cstate, copy); if (status) return nfserr_offload_denied; } else { diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 24934cf90a84f..a034b9b62137c 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -571,7 +571,7 @@ struct nfsd4_copy { struct task_struct *copy_task; refcount_t refcount;
- struct vfsmount *ss_mnt; + struct nfsd4_ssc_umount_item *ss_nsui; struct nfs_fh c_fh; nfs4_stateid stateid; }; diff --git a/include/linux/nfs_ssc.h b/include/linux/nfs_ssc.h index 75843c00f326a..22265b1ff0800 100644 --- a/include/linux/nfs_ssc.h +++ b/include/linux/nfs_ssc.h @@ -53,6 +53,7 @@ static inline void nfs42_ssc_close(struct file *filep) if (nfs_ssc_client_tbl.ssc_nfs4_ops) (*nfs_ssc_client_tbl.ssc_nfs4_ops->sco_close)(filep); } +#endif
struct nfsd4_ssc_umount_item { struct list_head nsui_list; @@ -66,7 +67,6 @@ struct nfsd4_ssc_umount_item { struct vfsmount *nsui_vfsmount; char nsui_ipaddr[RPC_MAX_ADDRBUFLEN + 1]; }; -#endif
/* * NFS_FS
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 70f62231cdfd52357836733dd31db787e0412ab2 ]
...and remove some now-useless NULL pointer checks in its callers.
Suggested-by: NeilBrown neilb@suse.de Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 5 ++--- fs/nfsd/nfs4state.c | 4 +--- 2 files changed, 3 insertions(+), 6 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 68c7c82f8b3bb..206742bbbd682 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -451,7 +451,7 @@ static bool nfsd_file_lru_remove(struct nfsd_file *nf) struct nfsd_file * nfsd_file_get(struct nfsd_file *nf) { - if (likely(refcount_inc_not_zero(&nf->nf_ref))) + if (nf && refcount_inc_not_zero(&nf->nf_ref)) return nf; return NULL; } @@ -1106,8 +1106,7 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, rcu_read_lock(); nf = rhashtable_lookup(&nfsd_file_rhash_tbl, &key, nfsd_file_rhash_params); - if (nf) - nf = nfsd_file_get(nf); + nf = nfsd_file_get(nf); rcu_read_unlock();
if (nf) { diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 6c11f2701af88..2733eb33d5df2 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -602,9 +602,7 @@ put_nfs4_file(struct nfs4_file *fi) static struct nfsd_file * __nfs4_get_fd(struct nfs4_file *f, int oflag) { - if (f->fi_fds[oflag]) - return nfsd_file_get(f->fi_fds[oflag]); - return NULL; + return nfsd_file_get(f->fi_fds[oflag]); }
static struct nfsd_file *
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 6ba434cb1a8d403ea9aad1b667c3ea3ad8b3191f ]
There are two different flavors of the nfsd4_copy struct. One is embedded in the compound and is used directly in synchronous copies. The other is dynamically allocated, refcounted and tracked in the client struture. For the embedded one, the cleanup just involves releasing any nfsd_files held on its behalf. For the async one, the cleanup is a bit more involved, and we need to dequeue it from lists, unhash it, etc.
There is at least one potential refcount leak in this code now. If the kthread_create call fails, then both the src and dst nfsd_files in the original nfsd4_copy object are leaked.
The cleanup in this codepath is also sort of weird. In the async copy case, we'll have up to four nfsd_file references (src and dst for both flavors of copy structure). They are both put at the end of nfsd4_do_async_copy, even though the ones held on behalf of the embedded one outlive that structure.
Change it so that we always clean up the nfsd_file refs held by the embedded copy structure before nfsd4_copy returns. Rework cleanup_async_copy to handle both inter and intra copies. Eliminate nfsd4_cleanup_intra_ssc since it now becomes a no-op.
Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 23 ++++++++++------------- 1 file changed, 10 insertions(+), 13 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 5e175133b7bc7..0fe00d6d385a1 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1512,7 +1512,6 @@ nfsd4_cleanup_inter_ssc(struct nfsd4_ssc_umount_item *nsui, struct file *filp, long timeout = msecs_to_jiffies(nfsd4_ssc_umount_timeout);
nfs42_ssc_close(filp); - nfsd_file_put(dst); fput(filp);
spin_lock(&nn->nfsd_ssc_lock); @@ -1562,13 +1561,6 @@ nfsd4_setup_intra_ssc(struct svc_rqst *rqstp, ©->nf_dst); }
-static void -nfsd4_cleanup_intra_ssc(struct nfsd_file *src, struct nfsd_file *dst) -{ - nfsd_file_put(src); - nfsd_file_put(dst); -} - static void nfsd4_cb_offload_release(struct nfsd4_callback *cb) { struct nfsd4_cb_offload *cbo = @@ -1683,12 +1675,18 @@ static void dup_copy_fields(struct nfsd4_copy *src, struct nfsd4_copy *dst) dst->ss_nsui = src->ss_nsui; }
+static void release_copy_files(struct nfsd4_copy *copy) +{ + if (copy->nf_src) + nfsd_file_put(copy->nf_src); + if (copy->nf_dst) + nfsd_file_put(copy->nf_dst); +} + static void cleanup_async_copy(struct nfsd4_copy *copy) { nfs4_free_copy_state(copy); - nfsd_file_put(copy->nf_dst); - if (!nfsd4_ssc_is_inter(copy)) - nfsd_file_put(copy->nf_src); + release_copy_files(copy); spin_lock(©->cp_clp->async_lock); list_del(©->copies); spin_unlock(©->cp_clp->async_lock); @@ -1748,7 +1746,6 @@ static int nfsd4_do_async_copy(void *data) } else { nfserr = nfsd4_do_copy(copy, copy->nf_src->nf_file, copy->nf_dst->nf_file, false); - nfsd4_cleanup_intra_ssc(copy->nf_src, copy->nf_dst); }
do_callback: @@ -1811,9 +1808,9 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, } else { status = nfsd4_do_copy(copy, copy->nf_src->nf_file, copy->nf_dst->nf_file, true); - nfsd4_cleanup_intra_ssc(copy->nf_src, copy->nf_dst); } out: + release_copy_files(copy); return status; out_err: if (async_copy)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dai Ngo dai.ngo@oracle.com
[ Upstream commit 34e8f9ec4c9ac235f917747b23a200a5e0ec857b ]
The reference count of nfsd4_ssc_umount_item is not decremented on error conditions. This prevents the laundromat from unmounting the vfsmount of the source file.
This patch decrements the reference count of nfsd4_ssc_umount_item on error.
Fixes: f4e44b393389 ("NFSD: delay unmount source's export after inter-server copy completed.") Signed-off-by: Dai Ngo dai.ngo@oracle.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 0fe00d6d385a1..fe20fd0933355 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1813,13 +1813,17 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, release_copy_files(copy); return status; out_err: + if (nfsd4_ssc_is_inter(copy)) { + /* + * Source's vfsmount of inter-copy will be unmounted + * by the laundromat. Use copy instead of async_copy + * since async_copy->ss_nsui might not be set yet. + */ + refcount_dec(©->ss_nsui->nsui_refcnt); + } if (async_copy) cleanup_async_copy(async_copy); status = nfserrno(-ENOMEM); - /* - * source's vfsmount of inter-copy will be unmounted - * by the laundromat - */ goto out; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 826b67e6376c2a788e3a62c4860dcd79500a27d5 ]
We had a bug report that xfstest generic/355 was failing on NFSv4.0. This test sets various combinations of setuid/setgid modes and tests whether DIO writes will cause them to be stripped.
What I found was that the server did properly strip those bits, but the client didn't notice because it held a delegation that was not recalled. The recall didn't occur because the client itself was the one generating the activity and we avoid recalls in that case.
Clearing setuid bits is an "implicit" activity. The client didn't specifically request that we do that, so we need the server to issue a CB_RECALL, or avoid the situation entirely by not issuing a delegation.
The easiest fix here is to simply not give out a delegation if the file is being opened for write, and the mode has the setuid and/or setgid bit set. Note that there is a potential race between the mode and lease being set, so we test for this condition both before and after setting the lease.
This patch fixes generic/355, generic/683 and generic/684 for me. (Note that 355 fails only on v4.0, and 683 and 684 require NFSv4.2 to run and fail).
Reported-by: Boyang Xue bxue@redhat.com Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 2733eb33d5df2..f57137ef306bd 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -5435,6 +5435,23 @@ nfsd4_verify_deleg_dentry(struct nfsd4_open *open, struct nfs4_file *fp, return 0; }
+/* + * We avoid breaking delegations held by a client due to its own activity, but + * clearing setuid/setgid bits on a write is an implicit activity and the client + * may not notice and continue using the old mode. Avoid giving out a delegation + * on setuid/setgid files when the client is requesting an open for write. + */ +static int +nfsd4_verify_setuid_write(struct nfsd4_open *open, struct nfsd_file *nf) +{ + struct inode *inode = file_inode(nf->nf_file); + + if ((open->op_share_access & NFS4_SHARE_ACCESS_WRITE) && + (inode->i_mode & (S_ISUID|S_ISGID))) + return -EAGAIN; + return 0; +} + static struct nfs4_delegation * nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp, struct svc_fh *parent) @@ -5468,6 +5485,8 @@ nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp, spin_lock(&fp->fi_lock); if (nfs4_delegation_exists(clp, fp)) status = -EAGAIN; + else if (nfsd4_verify_setuid_write(open, nf)) + status = -EAGAIN; else if (!fp->fi_deleg_file) { fp->fi_deleg_file = nf; /* increment early to prevent fi_deleg_file from being @@ -5508,6 +5527,14 @@ nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp, if (status) goto out_unlock;
+ /* + * Now that the deleg is set, check again to ensure that nothing + * raced in and changed the mode while we weren't lookng. + */ + status = nfsd4_verify_setuid_write(open, fp->fi_deleg_file); + if (status) + goto out_unlock; + spin_lock(&state_lock); spin_lock(&fp->fi_lock); if (fp->fi_had_conflict)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dai Ngo dai.ngo@oracle.com
[ Upstream commit 81e722978ad21072470b73d8f6a50ad62c7d5b7d ]
When nfsd4_copy fails to allocate memory for async_copy->cp_src, or nfs4_init_copy_state fails, it calls cleanup_async_copy to do the cleanup for the async_copy which causes page fault since async_copy is not yet initialized.
This patche rearranges the order of initializing the fields in async_copy and adds checks in cleanup_async_copy to skip un-initialized fields.
Fixes: ce0887ac96d3 ("NFSD add nfs4 inter ssc to nfsd4_copy") Fixes: 87689df69491 ("NFSD: Shrink size of struct nfsd4_copy") Signed-off-by: Dai Ngo dai.ngo@oracle.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 12 ++++++++---- fs/nfsd/nfs4state.c | 5 +++-- 2 files changed, 11 insertions(+), 6 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index fe20fd0933355..9c51c10bcf080 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1687,9 +1687,12 @@ static void cleanup_async_copy(struct nfsd4_copy *copy) { nfs4_free_copy_state(copy); release_copy_files(copy); - spin_lock(©->cp_clp->async_lock); - list_del(©->copies); - spin_unlock(©->cp_clp->async_lock); + if (copy->cp_clp) { + spin_lock(©->cp_clp->async_lock); + if (!list_empty(©->copies)) + list_del_init(©->copies); + spin_unlock(©->cp_clp->async_lock); + } nfs4_put_copy(copy); }
@@ -1786,12 +1789,13 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, async_copy = kzalloc(sizeof(struct nfsd4_copy), GFP_KERNEL); if (!async_copy) goto out_err; + INIT_LIST_HEAD(&async_copy->copies); + refcount_set(&async_copy->refcount, 1); async_copy->cp_src = kmalloc(sizeof(*async_copy->cp_src), GFP_KERNEL); if (!async_copy->cp_src) goto out_err; if (!nfs4_init_copy_state(nn, copy)) goto out_err; - refcount_set(&async_copy->refcount, 1); memcpy(©->cp_res.cb_stateid, ©->cp_stateid.cs_stid, sizeof(copy->cp_res.cb_stateid)); dup_copy_fields(copy, async_copy); diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index f57137ef306bd..507dd2b11856b 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -990,7 +990,6 @@ static int nfs4_init_cp_state(struct nfsd_net *nn, copy_stateid_t *stid,
stid->cs_stid.si_opaque.so_clid.cl_boot = (u32)nn->boot_time; stid->cs_stid.si_opaque.so_clid.cl_id = nn->s2s_cp_cl_id; - stid->cs_type = cs_type;
idr_preload(GFP_KERNEL); spin_lock(&nn->s2s_cp_lock); @@ -1001,6 +1000,7 @@ static int nfs4_init_cp_state(struct nfsd_net *nn, copy_stateid_t *stid, idr_preload_end(); if (new_id < 0) return 0; + stid->cs_type = cs_type; return 1; }
@@ -1034,7 +1034,8 @@ void nfs4_free_copy_state(struct nfsd4_copy *copy) { struct nfsd_net *nn;
- WARN_ON_ONCE(copy->cp_stateid.cs_type != NFS4_COPY_STID); + if (copy->cp_stateid.cs_type != NFS4_COPY_STID) + return; nn = net_generic(copy->cp_clp->net, nfsd_net_id); spin_lock(&nn->s2s_cp_lock); idr_remove(&nn->s2s_cp_stateids,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit dcd779dc46540e174a6ac8d52fbed23593407317 ]
The nested if statements here make no sense, as you can never reach "else" branch in the nested statement. Fix the error handling for when there is a courtesy client that holds a conflicting deny mode.
Fixes: 3d6942715180 ("NFSD: add support for share reservation conflict to courteous server") Reported-by: 張智諺 cc85nod@gmail.com Signed-off-by: Jeff Layton jlayton@kernel.org Reviewed-by: Dai Ngo dai.ngo@oracle.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 507dd2b11856b..e8e642e5ec8f1 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -5296,16 +5296,17 @@ nfs4_upgrade_open(struct svc_rqst *rqstp, struct nfs4_file *fp, /* test and set deny mode */ spin_lock(&fp->fi_lock); status = nfs4_file_check_deny(fp, open->op_share_deny); - if (status == nfs_ok) { - if (status != nfserr_share_denied) { - set_deny(open->op_share_deny, stp); - fp->fi_share_deny |= - (open->op_share_deny & NFS4_SHARE_DENY_BOTH); - } else { - if (nfs4_resolve_deny_conflicts_locked(fp, false, - stp, open->op_share_deny, false)) - status = nfserr_jukebox; - } + switch (status) { + case nfs_ok: + set_deny(open->op_share_deny, stp); + fp->fi_share_deny |= + (open->op_share_deny & NFS4_SHARE_DENY_BOTH); + break; + case nfserr_share_denied: + if (nfs4_resolve_deny_conflicts_locked(fp, false, + stp, open->op_share_deny, false)) + status = nfserr_jukebox; + break; } spin_unlock(&fp->fi_lock);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 4c475eee02375ade6e864f1db16976ba0d96a0a2 ]
Most of the time, NFSv4 clients issue a COMMIT before the final CLOSE of an open stateid, so with NFSv4, the fsync in the nfsd_file_free path is usually a no-op and doesn't block.
We have a customer running knfsd over very slow storage (XFS over Ceph RBD). They were using the "async" export option because performance was more important than data integrity for this application. That export option turns NFSv4 COMMIT calls into no-ops. Due to the fsync in this codepath however, their final CLOSE calls would still stall (since a CLOSE effectively became a COMMIT).
I think this fsync is not strictly necessary. We only use that result to reset the write verifier. Instead of fsync'ing all of the data when we free an nfsd_file, we can just check for writeback errors when one is acquired and when it is freed.
If the client never comes back, then it'll never see the error anyway and there is no point in resetting it. If an error occurs after the nfsd_file is removed from the cache but before the inode is evicted, then it will reset the write verifier on the next nfsd_file_acquire, (since there will be an unseen error).
The only exception here is if something else opens and fsyncs the file during that window. Given that local applications work with this limitation today, I don't see that as an issue.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2166658 Fixes: ac3a2585f018 ("nfsd: rework refcounting in filecache") Reported-and-tested-by: Pierguido Lambri plambri@redhat.com Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 44 ++++++++++++-------------------------------- fs/nfsd/trace.h | 31 ------------------------------- 2 files changed, 12 insertions(+), 63 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 206742bbbd682..4a3796c6bd957 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -330,37 +330,27 @@ nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may) return nf; }
+/** + * nfsd_file_check_write_error - check for writeback errors on a file + * @nf: nfsd_file to check for writeback errors + * + * Check whether a nfsd_file has an unseen error. Reset the write + * verifier if so. + */ static void -nfsd_file_fsync(struct nfsd_file *nf) -{ - struct file *file = nf->nf_file; - int ret; - - if (!file || !(file->f_mode & FMODE_WRITE)) - return; - ret = vfs_fsync(file, 1); - trace_nfsd_file_fsync(nf, ret); - if (ret) - nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id)); -} - -static int nfsd_file_check_write_error(struct nfsd_file *nf) { struct file *file = nf->nf_file;
- if (!file || !(file->f_mode & FMODE_WRITE)) - return 0; - return filemap_check_wb_err(file->f_mapping, READ_ONCE(file->f_wb_err)); + if ((file->f_mode & FMODE_WRITE) && + filemap_check_wb_err(file->f_mapping, READ_ONCE(file->f_wb_err))) + nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id)); }
static void nfsd_file_hash_remove(struct nfsd_file *nf) { trace_nfsd_file_unhash(nf); - - if (nfsd_file_check_write_error(nf)) - nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id)); rhashtable_remove_fast(&nfsd_file_rhash_tbl, &nf->nf_rhash, nfsd_file_rhash_params); } @@ -386,23 +376,12 @@ nfsd_file_free(struct nfsd_file *nf) this_cpu_add(nfsd_file_total_age, age);
nfsd_file_unhash(nf); - - /* - * We call fsync here in order to catch writeback errors. It's not - * strictly required by the protocol, but an nfsd_file could get - * evicted from the cache before a COMMIT comes in. If another - * task were to open that file in the interim and scrape the error, - * then the client may never see it. By calling fsync here, we ensure - * that writeback happens before the entry is freed, and that any - * errors reported result in the write verifier changing. - */ - nfsd_file_fsync(nf); - if (nf->nf_mark) nfsd_file_mark_put(nf->nf_mark); if (nf->nf_file) { get_file(nf->nf_file); filp_close(nf->nf_file, NULL); + nfsd_file_check_write_error(nf); fput(nf->nf_file); }
@@ -1157,6 +1136,7 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, out: if (status == nfs_ok) { this_cpu_inc(nfsd_file_acquisitions); + nfsd_file_check_write_error(nf); *pnf = nf; } else { if (refcount_dec_and_test(&nf->nf_ref)) diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 0f674982785ce..445d00f00eab7 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -1112,37 +1112,6 @@ TRACE_EVENT(nfsd_file_close, ) );
-TRACE_EVENT(nfsd_file_fsync, - TP_PROTO( - const struct nfsd_file *nf, - int ret - ), - TP_ARGS(nf, ret), - TP_STRUCT__entry( - __field(void *, nf_inode) - __field(int, nf_ref) - __field(int, ret) - __field(unsigned long, nf_flags) - __field(unsigned char, nf_may) - __field(struct file *, nf_file) - ), - TP_fast_assign( - __entry->nf_inode = nf->nf_inode; - __entry->nf_ref = refcount_read(&nf->nf_ref); - __entry->ret = ret; - __entry->nf_flags = nf->nf_flags; - __entry->nf_may = nf->nf_may; - __entry->nf_file = nf->nf_file; - ), - TP_printk("inode=%p ref=%d flags=%s may=%s nf_file=%p ret=%d", - __entry->nf_inode, - __entry->nf_ref, - show_nf_flags(__entry->nf_flags), - show_nfsd_may_flags(__entry->nf_may), - __entry->nf_file, __entry->ret - ) -); - #include "cache.h"
TRACE_DEFINE_ENUM(RC_DROPIT);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 90d2175572470ba7f55da8447c72ddd4942923c4 ]
Currently, we're only memcpy'ing the first __be32. Ensure we copy into both words.
Fixes: 91d2e9b56cf5 ("NFSD: Clean up the nfsd_net::nfssvc_boot field") Reported-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfssvc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 325d3d3f12110..a0ecec54d3d7d 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -363,7 +363,7 @@ void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn)
do { read_seqbegin_or_lock(&nn->writeverf_lock, &seq); - memcpy(verf, nn->writeverf, sizeof(*verf)); + memcpy(verf, nn->writeverf, sizeof(nn->writeverf)); } while (need_seqretry(&nn->writeverf_lock, seq)); done_seqretry(&nn->writeverf_lock, seq); }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit fd9a2e1d513823e840960cb3bc26d8b7749d4ac2 ]
Flole observes this WARNING on occasion:
[1210423.486503] WARNING: CPU: 8 PID: 1524732 at fs/ext4/ext4_jbd2.c:75 ext4_journal_check_start+0x68/0xb0
Reported-by: flole@flole.de Suggested-by: Jan Kara jack@suse.cz Link: https://bugzilla.kernel.org/show_bug.cgi?id=217123 Fixes: 73da852e3831 ("nfsd: use vfs_iter_read/write") Reviewed-by: Jeff Layton jlayton@kernel.org Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/vfs.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 3d67dd7eab4b5..ddf424d76d410 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1117,7 +1117,9 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf, since = READ_ONCE(file->f_wb_err); if (verf) nfsd_copy_write_verifier(verf, nn); + file_start_write(file); host_err = vfs_iter_write(file, &iter, &pos, flags); + file_end_write(file); if (host_err < 0) { nfsd_reset_write_verifier(nn); trace_nfsd_writeverf_reset(nn, rqstp, host_err);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 7ff84910c66c9144cc0de9d9deed9fb84c03aff0 ]
Commit 6930bcbfb6ce dropped the setting of the file_lock range when decoding a nlm_lock off the wire. This causes the client side grant callback to miss matching blocks and reject the lock, only to rerequest it 30s later.
Add a helper function to set the file_lock range from the start and end values that the protocol uses, and have the nlm_lock decoder call that to set up the file_lock args properly.
Fixes: 6930bcbfb6ce ("lockd: detect and reject lock arguments that overflow") Reported-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jeff Layton jlayton@kernel.org Tested-by: Amir Goldstein amir73il@gmail.com Cc: stable@vger.kernel.org #6.0 Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/clnt4xdr.c | 9 +-------- fs/lockd/xdr4.c | 13 ++++++++++++- include/linux/lockd/xdr4.h | 1 + 3 files changed, 14 insertions(+), 9 deletions(-)
diff --git a/fs/lockd/clnt4xdr.c b/fs/lockd/clnt4xdr.c index 7df6324ccb8ab..8161667c976f8 100644 --- a/fs/lockd/clnt4xdr.c +++ b/fs/lockd/clnt4xdr.c @@ -261,7 +261,6 @@ static int decode_nlm4_holder(struct xdr_stream *xdr, struct nlm_res *result) u32 exclusive; int error; __be32 *p; - s32 end;
memset(lock, 0, sizeof(*lock)); locks_init_lock(fl); @@ -285,13 +284,7 @@ static int decode_nlm4_holder(struct xdr_stream *xdr, struct nlm_res *result) fl->fl_type = exclusive != 0 ? F_WRLCK : F_RDLCK; p = xdr_decode_hyper(p, &l_offset); xdr_decode_hyper(p, &l_len); - end = l_offset + l_len - 1; - - fl->fl_start = (loff_t)l_offset; - if (l_len == 0 || end < 0) - fl->fl_end = OFFSET_MAX; - else - fl->fl_end = (loff_t)end; + nlm4svc_set_file_lock_range(fl, l_offset, l_len); error = 0; out: return error; diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index 712fdfeb8ef06..5fcbf30cd2759 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -33,6 +33,17 @@ loff_t_to_s64(loff_t offset) return res; }
+void nlm4svc_set_file_lock_range(struct file_lock *fl, u64 off, u64 len) +{ + s64 end = off + len - 1; + + fl->fl_start = off; + if (len == 0 || end < 0) + fl->fl_end = OFFSET_MAX; + else + fl->fl_end = end; +} + /* * NLM file handles are defined by specification to be a variable-length * XDR opaque no longer than 1024 bytes. However, this implementation @@ -80,7 +91,7 @@ svcxdr_decode_lock(struct xdr_stream *xdr, struct nlm_lock *lock) locks_init_lock(fl); fl->fl_flags = FL_POSIX; fl->fl_type = F_RDLCK; - + nlm4svc_set_file_lock_range(fl, lock->lock_start, lock->lock_len); return true; }
diff --git a/include/linux/lockd/xdr4.h b/include/linux/lockd/xdr4.h index 9a6b55da8fd64..72831e35dca32 100644 --- a/include/linux/lockd/xdr4.h +++ b/include/linux/lockd/xdr4.h @@ -22,6 +22,7 @@ #define nlm4_fbig cpu_to_be32(NLM_FBIG) #define nlm4_failed cpu_to_be32(NLM_FAILED)
+void nlm4svc_set_file_lock_range(struct file_lock *fl, u64 off, u64 len); bool nlm4svc_decode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr); bool nlm4svc_decode_testargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); bool nlm4svc_decode_lockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 27c934dd8832dd40fd34776f916dc201e18b319b ]
The splice read calls nfsd_splice_actor to put the pages containing file data into the svc_rqst->rq_pages array. It's possible however to get a splice result that only has a partial page at the end, if (e.g.) the filesystem hands back a short read that doesn't cover the whole page.
nfsd_splice_actor will plop the partial page into its rq_pages array and return. Then later, when nfsd_splice_actor is called again, the remainder of the page may end up being filled out. At this point, nfsd_splice_actor will put the page into the array _again_ corrupting the reply. If this is done enough times, rq_next_page will overrun the array and corrupt the trailing fields -- the rq_respages and rq_next_page pointers themselves.
If we've already added the page to the array in the last pass, don't add it to the array a second time when dealing with a splice continuation. This was originally handled properly in nfsd_splice_actor, but commit 91e23b1c3982 ("NFSD: Clean up nfsd_splice_actor()") removed the check for it.
Fixes: 91e23b1c3982 ("NFSD: Clean up nfsd_splice_actor()") Cc: Al Viro viro@zeniv.linux.org.uk Reported-by: Dario Lesca d.lesca@solinos.it Tested-by: David Critch dcritch@redhat.com Link: https://bugzilla.redhat.com/show_bug.cgi?id=2150630 Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/vfs.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index ddf424d76d410..abc682854507b 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -954,8 +954,15 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf, struct page *last_page;
last_page = page + (offset + sd->len - 1) / PAGE_SIZE; - for (page += offset / PAGE_SIZE; page <= last_page; page++) + for (page += offset / PAGE_SIZE; page <= last_page; page++) { + /* + * Skip page replacement when extending the contents + * of the current page. + */ + if (page == *(rqstp->rq_next_page - 1)) + continue; svc_rqst_replace_page(rqstp, page); + } if (rqstp->rq_res.page_len == 0) // first call rqstp->rq_res.page_base = offset % PAGE_SIZE; rqstp->rq_res.page_len += sd->len;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 804d8e0a6e54427268790472781e03bc243f4ee3 ]
OPDESC() simply indexes into nfsd4_ops[] by the op's operation number, without range checking that value. It assumes callers are careful to avoid calling it with an out-of-bounds opnum value.
nfsd4_decode_compound() is not so careful, and can invoke OPDESC() with opnum set to OP_ILLEGAL, which is 10044 -- well beyond the end of nfsd4_ops[].
Reported-by: Jeff Layton jlayton@kernel.org Fixes: f4f9ef4a1b0a ("nfsd4: opdesc will be useful outside nfs4proc.c") Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index d0b9fbf189ac6..e1f2f26ba93f2 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2476,10 +2476,12 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp) for (i = 0; i < argp->opcnt; i++) { op = &argp->ops[i]; op->replay = NULL; + op->opdesc = NULL;
if (xdr_stream_decode_u32(argp->xdr, &op->opnum) < 0) return false; if (nfsd4_opnum_in_range(argp, op)) { + op->opdesc = OPDESC(op); op->status = nfsd4_dec_ops[op->opnum](argp, &op->u); if (op->status != nfs_ok) trace_nfsd_compound_decode_err(argp->rqstp, @@ -2490,7 +2492,7 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp) op->opnum = OP_ILLEGAL; op->status = nfserr_op_illegal; } - op->opdesc = OPDESC(op); + /* * We'll try to cache the result in the DRC if any one * op in the compound wants to be cached:
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 15a8b55dbb1ba154d82627547c5761cac884d810 ]
For ops with "trivial" replies, nfsd4_encode_operation will shortcut most of the encoding work and skip to just marshalling up the status. One of the things it skips is calling op_release. This could cause a memory leak in the layoutget codepath if there is an error at an inopportune time.
Have the compound processing engine always call op_release, even when op_func sets an error in op->status. With this change, we also need nfsd4_block_get_device_info_scsi to set the gd_device pointer to NULL on error to avoid a double free.
Reported-by: Zhi Li yieli@redhat.com Link: https://bugzilla.redhat.com/show_bug.cgi?id=2181403 Fixes: 34b1744c91cc ("nfsd4: define ->op_release for compound ops") Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index e1f2f26ba93f2..d62382dfc135e 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -5397,10 +5397,8 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op) __be32 *p;
p = xdr_reserve_space(xdr, 8); - if (!p) { - WARN_ON_ONCE(1); - return; - } + if (!p) + goto release; *p++ = cpu_to_be32(op->opnum); post_err_offset = xdr->buf->len;
@@ -5415,8 +5413,6 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op) op->status = encoder(resp, op->status, &op->u); if (op->status) trace_nfsd_compound_encode_err(rqstp, op->opnum, op->status); - if (opdesc && opdesc->op_release) - opdesc->op_release(&op->u); xdr_commit_encode(xdr);
/* nfsd4_check_resp_size guarantees enough room for error status */ @@ -5457,6 +5453,9 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op) } status: *p = op->status; +release: + if (opdesc && opdesc->op_release) + opdesc->op_release(&op->u); }
/*
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit b8bea9f6cdd7236c7c2238d022145e9b2f8aac22 ]
Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 4a3796c6bd957..677a8d935ccc2 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -1173,9 +1173,7 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, status = nfserr_jukebox; if (status != nfs_ok) nfsd_file_unhash(nf); - clear_bit_unlock(NFSD_FILE_PENDING, &nf->nf_flags); - smp_mb__after_atomic(); - wake_up_bit(&nf->nf_flags, NFSD_FILE_PENDING); + clear_and_wake_up_bit(NFSD_FILE_PENDING, &nf->nf_flags); goto out; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 6c31e4c98853a4ba47355ea151b36a77c42b7734 ]
Since v4 files are expected to be long-lived, there's little value in closing them out of the cache when there is conflicting access.
Change the comparator to also match the gc value in the key. Change both of the current users of that key to set the gc value in the key to "true".
Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 677a8d935ccc2..4ddc82b84f7c4 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -174,6 +174,8 @@ static int nfsd_file_obj_cmpfn(struct rhashtable_compare_arg *arg,
switch (key->type) { case NFSD_FILE_KEY_INODE: + if (test_bit(NFSD_FILE_GC, &nf->nf_flags) != key->gc) + return 1; if (nf->nf_inode != key->inode) return 1; break; @@ -694,6 +696,7 @@ nfsd_file_queue_for_close(struct inode *inode, struct list_head *dispose) struct nfsd_file_lookup_key key = { .type = NFSD_FILE_KEY_INODE, .inode = inode, + .gc = true, }; struct nfsd_file *nf;
@@ -1048,6 +1051,7 @@ nfsd_file_is_cached(struct inode *inode) struct nfsd_file_lookup_key key = { .type = NFSD_FILE_KEY_INODE, .inode = inode, + .gc = true, }; bool ret = false;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit d69b8dbfd0866abc5ec84652cc1c10fc3d4d91ef ]
test_bit returns bool, so we can just compare the result of that to the key->gc value without the "!!".
Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 4ddc82b84f7c4..d61c8223082a4 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -188,7 +188,7 @@ static int nfsd_file_obj_cmpfn(struct rhashtable_compare_arg *arg, return 1; if (!nfsd_match_cred(nf->nf_cred, key->cred)) return 1; - if (!!test_bit(NFSD_FILE_GC, &nf->nf_flags) != key->gc) + if (test_bit(NFSD_FILE_GC, &nf->nf_flags) != key->gc) return 1; if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) return 1;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit c6593366c0bf222be9c7561354dfb921c611745e ]
An error from break_lease is non-fatal, so we needn't destroy the nfsd_file in that case. Just put the reference like we normally would and return the error.
Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 29 +++++++++++++++-------------- 1 file changed, 15 insertions(+), 14 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index d61c8223082a4..43bb2fd47cf58 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -1101,7 +1101,7 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, nf = nfsd_file_alloc(&key, may_flags); if (!nf) { status = nfserr_jukebox; - goto out_status; + goto out; }
ret = rhashtable_lookup_insert_key(&nfsd_file_rhash_tbl, @@ -1110,13 +1110,11 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, if (likely(ret == 0)) goto open_file;
- nfsd_file_slab_free(&nf->nf_rcu); - nf = NULL; if (ret == -EEXIST) goto retry; trace_nfsd_file_insert_err(rqstp, key.inode, may_flags, ret); status = nfserr_jukebox; - goto out_status; + goto construction_err;
wait_for_construction: wait_on_bit(&nf->nf_flags, NFSD_FILE_PENDING, TASK_UNINTERRUPTIBLE); @@ -1126,29 +1124,25 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, trace_nfsd_file_cons_err(rqstp, key.inode, may_flags, nf); if (!open_retry) { status = nfserr_jukebox; - goto out; + goto construction_err; } open_retry = false; - if (refcount_dec_and_test(&nf->nf_ref)) - nfsd_file_free(nf); goto retry; } - this_cpu_inc(nfsd_file_cache_hits);
status = nfserrno(nfsd_open_break_lease(file_inode(nf->nf_file), may_flags)); + if (status != nfs_ok) { + nfsd_file_put(nf); + nf = NULL; + } + out: if (status == nfs_ok) { this_cpu_inc(nfsd_file_acquisitions); nfsd_file_check_write_error(nf); *pnf = nf; - } else { - if (refcount_dec_and_test(&nf->nf_ref)) - nfsd_file_free(nf); - nf = NULL; } - -out_status: put_cred(key.cred); trace_nfsd_file_acquire(rqstp, key.inode, may_flags, nf, status); return status; @@ -1178,6 +1172,13 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, if (status != nfs_ok) nfsd_file_unhash(nf); clear_and_wake_up_bit(NFSD_FILE_PENDING, &nf->nf_flags); + if (status == nfs_ok) + goto out; + +construction_err: + if (refcount_dec_and_test(&nf->nf_ref)) + nfsd_file_free(nf); + nf = NULL; goto out; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit b680cb9b737331aad271feebbedafb865504e234 ]
David Howells mentioned that he found this bit of code confusing, so sprinkle in some comments to clarify.
Reported-by: David Howells dhowells@redhat.com Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 43bb2fd47cf58..faa0c7d0253eb 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -1093,6 +1093,11 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, rcu_read_unlock();
if (nf) { + /* + * If the nf is on the LRU then it holds an extra reference + * that must be put if it's removed. It had better not be + * the last one however, since we should hold another. + */ if (nfsd_file_lru_remove(nf)) WARN_ON_ONCE(refcount_dec_and_test(&nf->nf_ref)); goto wait_for_construction;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit b2ff1bd71db2a1b193a6dde0845adcd69cbcf75e ]
The last thing that filp_close does is an fput, so don't bother taking and putting the extra reference.
Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index faa0c7d0253eb..786e06cf107ff 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -381,10 +381,8 @@ nfsd_file_free(struct nfsd_file *nf) if (nf->nf_mark) nfsd_file_mark_put(nf->nf_mark); if (nf->nf_file) { - get_file(nf->nf_file); - filp_close(nf->nf_file, NULL); nfsd_file_check_write_error(nf); - fput(nf->nf_file); + filp_close(nf->nf_file, NULL); }
/*
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 972cc0e0924598cb293b919d39c848dc038b2c28 ]
Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 786e06cf107ff..1d4c0387c4192 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -906,7 +906,8 @@ nfsd_file_cache_init(void) * @net: net-namespace to shut down the cache (may be NULL) * * Walk the nfsd_file cache and close out any that match @net. If @net is NULL, - * then close out everything. Called when an nfsd instance is being shut down. + * then close out everything. Called when an nfsd instance is being shut down, + * and when the exports table is flushed. */ static void __nfsd_file_cache_purge(struct net *net)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit dcb779fcd4ed5984ad15991d574943d12a8693d1 ]
On most filesystems, there is no reason to delay reaping an nfsd_file just because its underlying inode is still under writeback. nfsd just relies on client activity or the local flusher threads to do writeback.
The main exception is NFS, which flushes all of its dirty data on last close. Add a new EXPORT_OP_FLUSH_ON_CLOSE flag to allow filesystems to signal that they do this, and only skip closing files under writeback on such filesystems.
Also, remove a redundant NULL file pointer check in nfsd_file_check_writeback, and clean up nfs's export op flag definitions.
Signed-off-by: Jeff Layton jlayton@kernel.org Acked-by: Anna Schumaker Anna.Schumaker@Netapp.com [ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/export.c | 9 ++++++--- fs/nfsd/filecache.c | 12 +++++++++++- include/linux/exportfs.h | 1 + 3 files changed, 18 insertions(+), 4 deletions(-)
diff --git a/fs/nfs/export.c b/fs/nfs/export.c index b347e3ce0cc8e..993be63ab3015 100644 --- a/fs/nfs/export.c +++ b/fs/nfs/export.c @@ -182,7 +182,10 @@ const struct export_operations nfs_export_ops = { .fh_to_dentry = nfs_fh_to_dentry, .get_parent = nfs_get_parent, .fetch_iversion = nfs_fetch_iversion, - .flags = EXPORT_OP_NOWCC|EXPORT_OP_NOSUBTREECHK| - EXPORT_OP_CLOSE_BEFORE_UNLINK|EXPORT_OP_REMOTE_FS| - EXPORT_OP_NOATOMIC_ATTR, + .flags = EXPORT_OP_NOWCC | + EXPORT_OP_NOSUBTREECHK | + EXPORT_OP_CLOSE_BEFORE_UNLINK | + EXPORT_OP_REMOTE_FS | + EXPORT_OP_NOATOMIC_ATTR | + EXPORT_OP_FLUSH_ON_CLOSE, }; diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 1d4c0387c4192..080d796547854 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -401,13 +401,23 @@ nfsd_file_check_writeback(struct nfsd_file *nf) struct file *file = nf->nf_file; struct address_space *mapping;
- if (!file || !(file->f_mode & FMODE_WRITE)) + /* File not open for write? */ + if (!(file->f_mode & FMODE_WRITE)) return false; + + /* + * Some filesystems (e.g. NFS) flush all dirty data on close. + * On others, there is no need to wait for writeback. + */ + if (!(file_inode(file)->i_sb->s_export_op->flags & EXPORT_OP_FLUSH_ON_CLOSE)) + return false; + mapping = file->f_mapping; return mapping_tagged(mapping, PAGECACHE_TAG_DIRTY) || mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK); }
+ static bool nfsd_file_lru_add(struct nfsd_file *nf) { set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags); diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h index fe848901fcc3a..218fc5c54e901 100644 --- a/include/linux/exportfs.h +++ b/include/linux/exportfs.h @@ -221,6 +221,7 @@ struct export_operations { #define EXPORT_OP_NOATOMIC_ATTR (0x10) /* Filesystem cannot supply atomic attribute updates */ +#define EXPORT_OP_FLUSH_ON_CLOSE (0x20) /* fs flushes file data on close */ unsigned long flags; };
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit c4c649ab413ba6a785b25f0edbb12f617c87db2a ]
While we were converting the nfs4_file hashtable to use the kernel's resizable hashtable data structure, Neil Brown observed that the list variant (rhltable) would be better for managing nfsd_file items as well. The nfsd_file hash table will contain multiple entries for the same inode -- these should be kept together on a list. And, it could be possible for exotic or malicious client behavior to cause the hash table to resize itself on every insertion.
A nice simplification is that rhltable_lookup() can return a list that contains only nfsd_file items that match a given inode, which enables us to eliminate specialized hash table helper functions and use the default functions provided by the rhashtable implementation).
Since we are now storing nfsd_file items for the same inode on a single list, that effectively reduces the number of hash entries that have to be tracked in the hash table. The mininum bucket count is therefore lowered.
Light testing with fstests generic/531 show no regressions.
Suggested-by: Neil Brown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 311 ++++++++++++++++++-------------------------- fs/nfsd/filecache.h | 9 +- 2 files changed, 133 insertions(+), 187 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 080d796547854..52e67ec267965 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -73,70 +73,9 @@ static struct list_lru nfsd_file_lru; static unsigned long nfsd_file_flags; static struct fsnotify_group *nfsd_file_fsnotify_group; static struct delayed_work nfsd_filecache_laundrette; -static struct rhashtable nfsd_file_rhash_tbl +static struct rhltable nfsd_file_rhltable ____cacheline_aligned_in_smp;
-enum nfsd_file_lookup_type { - NFSD_FILE_KEY_INODE, - NFSD_FILE_KEY_FULL, -}; - -struct nfsd_file_lookup_key { - struct inode *inode; - struct net *net; - const struct cred *cred; - unsigned char need; - bool gc; - enum nfsd_file_lookup_type type; -}; - -/* - * The returned hash value is based solely on the address of an in-code - * inode, a pointer to a slab-allocated object. The entropy in such a - * pointer is concentrated in its middle bits. - */ -static u32 nfsd_file_inode_hash(const struct inode *inode, u32 seed) -{ - unsigned long ptr = (unsigned long)inode; - u32 k; - - k = ptr >> L1_CACHE_SHIFT; - k &= 0x00ffffff; - return jhash2(&k, 1, seed); -} - -/** - * nfsd_file_key_hashfn - Compute the hash value of a lookup key - * @data: key on which to compute the hash value - * @len: rhash table's key_len parameter (unused) - * @seed: rhash table's random seed of the day - * - * Return value: - * Computed 32-bit hash value - */ -static u32 nfsd_file_key_hashfn(const void *data, u32 len, u32 seed) -{ - const struct nfsd_file_lookup_key *key = data; - - return nfsd_file_inode_hash(key->inode, seed); -} - -/** - * nfsd_file_obj_hashfn - Compute the hash value of an nfsd_file - * @data: object on which to compute the hash value - * @len: rhash table's key_len parameter (unused) - * @seed: rhash table's random seed of the day - * - * Return value: - * Computed 32-bit hash value - */ -static u32 nfsd_file_obj_hashfn(const void *data, u32 len, u32 seed) -{ - const struct nfsd_file *nf = data; - - return nfsd_file_inode_hash(nf->nf_inode, seed); -} - static bool nfsd_match_cred(const struct cred *c1, const struct cred *c2) { @@ -157,55 +96,16 @@ nfsd_match_cred(const struct cred *c1, const struct cred *c2) return true; }
-/** - * nfsd_file_obj_cmpfn - Match a cache item against search criteria - * @arg: search criteria - * @ptr: cache item to check - * - * Return values: - * %0 - Item matches search criteria - * %1 - Item does not match search criteria - */ -static int nfsd_file_obj_cmpfn(struct rhashtable_compare_arg *arg, - const void *ptr) -{ - const struct nfsd_file_lookup_key *key = arg->key; - const struct nfsd_file *nf = ptr; - - switch (key->type) { - case NFSD_FILE_KEY_INODE: - if (test_bit(NFSD_FILE_GC, &nf->nf_flags) != key->gc) - return 1; - if (nf->nf_inode != key->inode) - return 1; - break; - case NFSD_FILE_KEY_FULL: - if (nf->nf_inode != key->inode) - return 1; - if (nf->nf_may != key->need) - return 1; - if (nf->nf_net != key->net) - return 1; - if (!nfsd_match_cred(nf->nf_cred, key->cred)) - return 1; - if (test_bit(NFSD_FILE_GC, &nf->nf_flags) != key->gc) - return 1; - if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) - return 1; - break; - } - return 0; -} - static const struct rhashtable_params nfsd_file_rhash_params = { .key_len = sizeof_field(struct nfsd_file, nf_inode), .key_offset = offsetof(struct nfsd_file, nf_inode), - .head_offset = offsetof(struct nfsd_file, nf_rhash), - .hashfn = nfsd_file_key_hashfn, - .obj_hashfn = nfsd_file_obj_hashfn, - .obj_cmpfn = nfsd_file_obj_cmpfn, - /* Reduce resizing churn on light workloads */ - .min_size = 512, /* buckets */ + .head_offset = offsetof(struct nfsd_file, nf_rlist), + + /* + * Start with a single page hash table to reduce resizing churn + * on light workloads. + */ + .min_size = 256, .automatic_shrinking = true, };
@@ -308,27 +208,27 @@ nfsd_file_mark_find_or_create(struct nfsd_file *nf, struct inode *inode) }
static struct nfsd_file * -nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may) +nfsd_file_alloc(struct net *net, struct inode *inode, unsigned char need, + bool want_gc) { struct nfsd_file *nf;
nf = kmem_cache_alloc(nfsd_file_slab, GFP_KERNEL); - if (nf) { - INIT_LIST_HEAD(&nf->nf_lru); - nf->nf_birthtime = ktime_get(); - nf->nf_file = NULL; - nf->nf_cred = get_current_cred(); - nf->nf_net = key->net; - nf->nf_flags = 0; - __set_bit(NFSD_FILE_HASHED, &nf->nf_flags); - __set_bit(NFSD_FILE_PENDING, &nf->nf_flags); - if (key->gc) - __set_bit(NFSD_FILE_GC, &nf->nf_flags); - nf->nf_inode = key->inode; - refcount_set(&nf->nf_ref, 1); - nf->nf_may = key->need; - nf->nf_mark = NULL; - } + if (unlikely(!nf)) + return NULL; + + INIT_LIST_HEAD(&nf->nf_lru); + nf->nf_birthtime = ktime_get(); + nf->nf_file = NULL; + nf->nf_cred = get_current_cred(); + nf->nf_net = net; + nf->nf_flags = want_gc ? + BIT(NFSD_FILE_HASHED) | BIT(NFSD_FILE_PENDING) | BIT(NFSD_FILE_GC) : + BIT(NFSD_FILE_HASHED) | BIT(NFSD_FILE_PENDING); + nf->nf_inode = inode; + refcount_set(&nf->nf_ref, 1); + nf->nf_may = need; + nf->nf_mark = NULL; return nf; }
@@ -353,8 +253,8 @@ static void nfsd_file_hash_remove(struct nfsd_file *nf) { trace_nfsd_file_unhash(nf); - rhashtable_remove_fast(&nfsd_file_rhash_tbl, &nf->nf_rhash, - nfsd_file_rhash_params); + rhltable_remove(&nfsd_file_rhltable, &nf->nf_rlist, + nfsd_file_rhash_params); }
static bool @@ -687,8 +587,8 @@ nfsd_file_cond_queue(struct nfsd_file *nf, struct list_head *dispose) * @inode: inode on which to close out nfsd_files * @dispose: list on which to gather nfsd_files to close out * - * An nfsd_file represents a struct file being held open on behalf of nfsd. An - * open file however can block other activity (such as leases), or cause + * An nfsd_file represents a struct file being held open on behalf of nfsd. + * An open file however can block other activity (such as leases), or cause * undesirable behavior (e.g. spurious silly-renames when reexporting NFS). * * This function is intended to find open nfsd_files when this sort of @@ -701,21 +601,17 @@ nfsd_file_cond_queue(struct nfsd_file *nf, struct list_head *dispose) static void nfsd_file_queue_for_close(struct inode *inode, struct list_head *dispose) { - struct nfsd_file_lookup_key key = { - .type = NFSD_FILE_KEY_INODE, - .inode = inode, - .gc = true, - }; + struct rhlist_head *tmp, *list; struct nfsd_file *nf;
rcu_read_lock(); - do { - nf = rhashtable_lookup(&nfsd_file_rhash_tbl, &key, - nfsd_file_rhash_params); - if (!nf) - break; + list = rhltable_lookup(&nfsd_file_rhltable, &inode, + nfsd_file_rhash_params); + rhl_for_each_entry_rcu(nf, tmp, list, nf_rlist) { + if (!test_bit(NFSD_FILE_GC, &nf->nf_flags)) + continue; nfsd_file_cond_queue(nf, dispose); - } while (1); + } rcu_read_unlock(); }
@@ -839,7 +735,7 @@ nfsd_file_cache_init(void) if (test_and_set_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 1) return 0;
- ret = rhashtable_init(&nfsd_file_rhash_tbl, &nfsd_file_rhash_params); + ret = rhltable_init(&nfsd_file_rhltable, &nfsd_file_rhash_params); if (ret) return ret;
@@ -907,7 +803,7 @@ nfsd_file_cache_init(void) nfsd_file_mark_slab = NULL; destroy_workqueue(nfsd_filecache_wq); nfsd_filecache_wq = NULL; - rhashtable_destroy(&nfsd_file_rhash_tbl); + rhltable_destroy(&nfsd_file_rhltable); goto out; }
@@ -926,7 +822,7 @@ __nfsd_file_cache_purge(struct net *net) struct nfsd_file *nf; LIST_HEAD(dispose);
- rhashtable_walk_enter(&nfsd_file_rhash_tbl, &iter); + rhltable_walk_enter(&nfsd_file_rhltable, &iter); do { rhashtable_walk_start(&iter);
@@ -1032,7 +928,7 @@ nfsd_file_cache_shutdown(void) nfsd_file_mark_slab = NULL; destroy_workqueue(nfsd_filecache_wq); nfsd_filecache_wq = NULL; - rhashtable_destroy(&nfsd_file_rhash_tbl); + rhltable_destroy(&nfsd_file_rhltable);
for_each_possible_cpu(i) { per_cpu(nfsd_file_cache_hits, i) = 0; @@ -1043,6 +939,35 @@ nfsd_file_cache_shutdown(void) } }
+static struct nfsd_file * +nfsd_file_lookup_locked(const struct net *net, const struct cred *cred, + struct inode *inode, unsigned char need, + bool want_gc) +{ + struct rhlist_head *tmp, *list; + struct nfsd_file *nf; + + list = rhltable_lookup(&nfsd_file_rhltable, &inode, + nfsd_file_rhash_params); + rhl_for_each_entry_rcu(nf, tmp, list, nf_rlist) { + if (nf->nf_may != need) + continue; + if (nf->nf_net != net) + continue; + if (!nfsd_match_cred(nf->nf_cred, cred)) + continue; + if (test_bit(NFSD_FILE_GC, &nf->nf_flags) != want_gc) + continue; + if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) + continue; + + if (!nfsd_file_get(nf)) + continue; + return nf; + } + return NULL; +} + /** * nfsd_file_is_cached - are there any cached open files for this inode? * @inode: inode to check @@ -1057,16 +982,20 @@ nfsd_file_cache_shutdown(void) bool nfsd_file_is_cached(struct inode *inode) { - struct nfsd_file_lookup_key key = { - .type = NFSD_FILE_KEY_INODE, - .inode = inode, - .gc = true, - }; + struct rhlist_head *tmp, *list; + struct nfsd_file *nf; bool ret = false;
- if (rhashtable_lookup_fast(&nfsd_file_rhash_tbl, &key, - nfsd_file_rhash_params) != NULL) - ret = true; + rcu_read_lock(); + list = rhltable_lookup(&nfsd_file_rhltable, &inode, + nfsd_file_rhash_params); + rhl_for_each_entry_rcu(nf, tmp, list, nf_rlist) + if (test_bit(NFSD_FILE_GC, &nf->nf_flags)) { + ret = true; + break; + } + rcu_read_unlock(); + trace_nfsd_file_is_cached(inode, (int)ret); return ret; } @@ -1076,14 +1005,12 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned int may_flags, struct file *file, struct nfsd_file **pnf, bool want_gc) { - struct nfsd_file_lookup_key key = { - .type = NFSD_FILE_KEY_FULL, - .need = may_flags & NFSD_FILE_MAY_MASK, - .net = SVC_NET(rqstp), - .gc = want_gc, - }; + unsigned char need = may_flags & NFSD_FILE_MAY_MASK; + struct net *net = SVC_NET(rqstp); + struct nfsd_file *new, *nf; + const struct cred *cred; bool open_retry = true; - struct nfsd_file *nf; + struct inode *inode; __be32 status; int ret;
@@ -1091,14 +1018,12 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, may_flags|NFSD_MAY_OWNER_OVERRIDE); if (status != nfs_ok) return status; - key.inode = d_inode(fhp->fh_dentry); - key.cred = get_current_cred(); + inode = d_inode(fhp->fh_dentry); + cred = get_current_cred();
retry: rcu_read_lock(); - nf = rhashtable_lookup(&nfsd_file_rhash_tbl, &key, - nfsd_file_rhash_params); - nf = nfsd_file_get(nf); + nf = nfsd_file_lookup_locked(net, cred, inode, need, want_gc); rcu_read_unlock();
if (nf) { @@ -1112,21 +1037,32 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, goto wait_for_construction; }
- nf = nfsd_file_alloc(&key, may_flags); - if (!nf) { + new = nfsd_file_alloc(net, inode, need, want_gc); + if (!new) { status = nfserr_jukebox; goto out; }
- ret = rhashtable_lookup_insert_key(&nfsd_file_rhash_tbl, - &key, &nf->nf_rhash, - nfsd_file_rhash_params); + rcu_read_lock(); + spin_lock(&inode->i_lock); + nf = nfsd_file_lookup_locked(net, cred, inode, need, want_gc); + if (unlikely(nf)) { + spin_unlock(&inode->i_lock); + rcu_read_unlock(); + nfsd_file_slab_free(&new->nf_rcu); + goto wait_for_construction; + } + nf = new; + ret = rhltable_insert(&nfsd_file_rhltable, &nf->nf_rlist, + nfsd_file_rhash_params); + spin_unlock(&inode->i_lock); + rcu_read_unlock(); if (likely(ret == 0)) goto open_file;
if (ret == -EEXIST) goto retry; - trace_nfsd_file_insert_err(rqstp, key.inode, may_flags, ret); + trace_nfsd_file_insert_err(rqstp, inode, may_flags, ret); status = nfserr_jukebox; goto construction_err;
@@ -1135,7 +1071,7 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
/* Did construction of this file fail? */ if (!test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) { - trace_nfsd_file_cons_err(rqstp, key.inode, may_flags, nf); + trace_nfsd_file_cons_err(rqstp, inode, may_flags, nf); if (!open_retry) { status = nfserr_jukebox; goto construction_err; @@ -1157,13 +1093,13 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, nfsd_file_check_write_error(nf); *pnf = nf; } - put_cred(key.cred); - trace_nfsd_file_acquire(rqstp, key.inode, may_flags, nf, status); + put_cred(cred); + trace_nfsd_file_acquire(rqstp, inode, may_flags, nf, status); return status;
open_file: trace_nfsd_file_alloc(nf); - nf->nf_mark = nfsd_file_mark_find_or_create(nf, key.inode); + nf->nf_mark = nfsd_file_mark_find_or_create(nf, inode); if (nf->nf_mark) { if (file) { get_file(file); @@ -1181,7 +1117,7 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, * If construction failed, or we raced with a call to unlink() * then unhash. */ - if (status == nfs_ok && key.inode->i_nlink == 0) + if (status != nfs_ok || inode->i_nlink == 0) status = nfserr_jukebox; if (status != nfs_ok) nfsd_file_unhash(nf); @@ -1208,8 +1144,11 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, * seconds after the final nfsd_file_put() in case the caller * wants to re-use it. * - * Returns nfs_ok and sets @pnf on success; otherwise an nfsstat in - * network byte order is returned. + * Return values: + * %nfs_ok - @pnf points to an nfsd_file with its reference + * count boosted. + * + * On error, an nfsstat value in network byte order is returned. */ __be32 nfsd_file_acquire_gc(struct svc_rqst *rqstp, struct svc_fh *fhp, @@ -1229,8 +1168,11 @@ nfsd_file_acquire_gc(struct svc_rqst *rqstp, struct svc_fh *fhp, * but not garbage-collected. The object is unhashed after the * final nfsd_file_put(). * - * Returns nfs_ok and sets @pnf on success; otherwise an nfsstat in - * network byte order is returned. + * Return values: + * %nfs_ok - @pnf points to an nfsd_file with its reference + * count boosted. + * + * On error, an nfsstat value in network byte order is returned. */ __be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, @@ -1251,8 +1193,11 @@ nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, * and @file is non-NULL, use it to instantiate a new nfsd_file instead of * opening a new one. * - * Returns nfs_ok and sets @pnf on success; otherwise an nfsstat in - * network byte order is returned. + * Return values: + * %nfs_ok - @pnf points to an nfsd_file with its reference + * count boosted. + * + * On error, an nfsstat value in network byte order is returned. */ __be32 nfsd_file_acquire_opened(struct svc_rqst *rqstp, struct svc_fh *fhp, @@ -1283,7 +1228,7 @@ int nfsd_file_cache_stats_show(struct seq_file *m, void *v) lru = list_lru_count(&nfsd_file_lru);
rcu_read_lock(); - ht = &nfsd_file_rhash_tbl; + ht = &nfsd_file_rhltable.ht; count = atomic_read(&ht->nelems); tbl = rht_dereference_rcu(ht->tbl, ht); buckets = tbl->size; @@ -1299,7 +1244,7 @@ int nfsd_file_cache_stats_show(struct seq_file *m, void *v) evictions += per_cpu(nfsd_file_evictions, i); }
- seq_printf(m, "total entries: %u\n", count); + seq_printf(m, "total inodes: %u\n", count); seq_printf(m, "hash buckets: %u\n", buckets); seq_printf(m, "lru entries: %lu\n", lru); seq_printf(m, "cache hits: %lu\n", hits); diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h index 41516a4263ea5..e54165a3224f0 100644 --- a/fs/nfsd/filecache.h +++ b/fs/nfsd/filecache.h @@ -29,9 +29,8 @@ struct nfsd_file_mark { * never be dereferenced, only used for comparison. */ struct nfsd_file { - struct rhash_head nf_rhash; - struct list_head nf_lru; - struct rcu_head nf_rcu; + struct rhlist_head nf_rlist; + void *nf_inode; struct file *nf_file; const struct cred *nf_cred; struct net *nf_net; @@ -40,10 +39,12 @@ struct nfsd_file { #define NFSD_FILE_REFERENCED (2) #define NFSD_FILE_GC (3) unsigned long nf_flags; - struct inode *nf_inode; /* don't deref */ refcount_t nf_ref; unsigned char nf_may; + struct nfsd_file_mark *nf_mark; + struct list_head nf_lru; + struct rcu_head nf_rcu; ktime_t nf_birthtime; };
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 92e4a6733f922f0fef1d0995f7b2d0eaff86c7ea ]
When queueing a dispose list to the appropriate "freeme" lists, it pointlessly queues the objects one at a time to an intermediate list.
Remove a few helpers and just open code a list_move to make it more clear and efficient. Better document the resulting functions with kerneldoc comments.
Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/filecache.c | 64 ++++++++++++++++----------------------------- 1 file changed, 22 insertions(+), 42 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 52e67ec267965..6b8706f23eaf0 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -401,49 +401,26 @@ nfsd_file_dispose_list(struct list_head *dispose) } }
-static void -nfsd_file_list_remove_disposal(struct list_head *dst, - struct nfsd_fcache_disposal *l) -{ - spin_lock(&l->lock); - list_splice_init(&l->freeme, dst); - spin_unlock(&l->lock); -} - -static void -nfsd_file_list_add_disposal(struct list_head *files, struct net *net) -{ - struct nfsd_net *nn = net_generic(net, nfsd_net_id); - struct nfsd_fcache_disposal *l = nn->fcache_disposal; - - spin_lock(&l->lock); - list_splice_tail_init(files, &l->freeme); - spin_unlock(&l->lock); - queue_work(nfsd_filecache_wq, &l->work); -} - -static void -nfsd_file_list_add_pernet(struct list_head *dst, struct list_head *src, - struct net *net) -{ - struct nfsd_file *nf, *tmp; - - list_for_each_entry_safe(nf, tmp, src, nf_lru) { - if (nf->nf_net == net) - list_move_tail(&nf->nf_lru, dst); - } -} - +/** + * nfsd_file_dispose_list_delayed - move list of dead files to net's freeme list + * @dispose: list of nfsd_files to be disposed + * + * Transfers each file to the "freeme" list for its nfsd_net, to eventually + * be disposed of by the per-net garbage collector. + */ static void nfsd_file_dispose_list_delayed(struct list_head *dispose) { - LIST_HEAD(list); - struct nfsd_file *nf; - while(!list_empty(dispose)) { - nf = list_first_entry(dispose, struct nfsd_file, nf_lru); - nfsd_file_list_add_pernet(&list, dispose, nf->nf_net); - nfsd_file_list_add_disposal(&list, nf->nf_net); + struct nfsd_file *nf = list_first_entry(dispose, + struct nfsd_file, nf_lru); + struct nfsd_net *nn = net_generic(nf->nf_net, nfsd_net_id); + struct nfsd_fcache_disposal *l = nn->fcache_disposal; + + spin_lock(&l->lock); + list_move_tail(&nf->nf_lru, &l->freeme); + spin_unlock(&l->lock); + queue_work(nfsd_filecache_wq, &l->work); } }
@@ -664,8 +641,8 @@ nfsd_file_close_inode_sync(struct inode *inode) * nfsd_file_delayed_close - close unused nfsd_files * @work: dummy * - * Walk the LRU list and destroy any entries that have not been used since - * the last scan. + * Scrape the freeme list for this nfsd_net, and then dispose of them + * all. */ static void nfsd_file_delayed_close(struct work_struct *work) @@ -674,7 +651,10 @@ nfsd_file_delayed_close(struct work_struct *work) struct nfsd_fcache_disposal *l = container_of(work, struct nfsd_fcache_disposal, work);
- nfsd_file_list_remove_disposal(&head, l); + spin_lock(&l->lock); + list_splice_init(&l->freeme, &head); + spin_unlock(&l->lock); + nfsd_file_dispose_list(&head); }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit d53d70084d27f56bcdf5074328f2c9ec861be596 ]
notify_change can modify the iattr structure. In particular it can end up setting ATTR_MODE when ATTR_KILL_SUID is already set, causing a BUG() if the same iattr is passed to notify_change more than once.
Make a copy of the struct iattr before calling notify_change.
Reported-by: Zhi Li yieli@redhat.com Link: https://bugzilla.redhat.com/show_bug.cgi?id=2207969 Tested-by: Zhi Li yieli@redhat.com Fixes: 34b91dda7124 ("NFSD: Make nfsd4_setattr() wait before returning NFS4ERR_DELAY") Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/vfs.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index abc682854507b..542a1adbbf2a1 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -545,7 +545,15 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp,
inode_lock(inode); for (retries = 1;;) { - host_err = __nfsd_setattr(dentry, iap); + struct iattr attrs; + + /* + * notify_change() can alter its iattr argument, making + * @iap unsuitable for submission multiple times. Make a + * copy for every loop iteration. + */ + attrs = *iap; + host_err = __nfsd_setattr(dentry, &attrs); if (host_err != -EAGAIN || !retries--) break; if (!nfsd_wait_for_delegreturn(rqstp, inode))
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@linaro.org
[ Upstream commit c034203b6a9dae6751ef4371c18cb77983e30c28 ]
The bug here is that you cannot rely on getting the same socket from multiple calls to fget() because userspace can influence that. This is a kind of double fetch bug.
The fix is to delete the svc_alien_sock() function and instead do the checking inside the svc_addsock() function.
Fixes: 3064639423c4 ("nfsd: check passed socket's net matches NFSd superblock's one") Signed-off-by: Dan Carpenter dan.carpenter@linaro.org Reviewed-by: NeilBrown neilb@suse.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsctl.c | 7 +------ include/linux/sunrpc/svcsock.h | 7 +++---- net/sunrpc/svcsock.c | 24 ++++++------------------ 3 files changed, 10 insertions(+), 28 deletions(-)
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index c2577ee7ffb22..76a60e7a75097 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -714,16 +714,11 @@ static ssize_t __write_ports_addfd(char *buf, struct net *net, const struct cred if (err != 0 || fd < 0) return -EINVAL;
- if (svc_alien_sock(net, fd)) { - printk(KERN_ERR "%s: socket net is different to NFSd's one\n", __func__); - return -EINVAL; - } - err = nfsd_create_serv(net); if (err != 0) return err;
- err = svc_addsock(nn->nfsd_serv, fd, buf, SIMPLE_TRANSACTION_LIMIT, cred); + err = svc_addsock(nn->nfsd_serv, net, fd, buf, SIMPLE_TRANSACTION_LIMIT, cred);
if (err >= 0 && !nn->nfsd_serv->sv_nrthreads && !xchg(&nn->keep_active, 1)) diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h index b7ac7fe683067..a366d3eb05315 100644 --- a/include/linux/sunrpc/svcsock.h +++ b/include/linux/sunrpc/svcsock.h @@ -57,10 +57,9 @@ int svc_recv(struct svc_rqst *, long); int svc_send(struct svc_rqst *); void svc_drop(struct svc_rqst *); void svc_sock_update_bufs(struct svc_serv *serv); -bool svc_alien_sock(struct net *net, int fd); -int svc_addsock(struct svc_serv *serv, const int fd, - char *name_return, const size_t len, - const struct cred *cred); +int svc_addsock(struct svc_serv *serv, struct net *net, + const int fd, char *name_return, const size_t len, + const struct cred *cred); void svc_init_xprt_sock(void); void svc_cleanup_xprt_sock(void); struct svc_xprt *svc_sock_create(struct svc_serv *serv, int prot); diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c index 90f6231d6ed67..cb0cfcd8a8141 100644 --- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -1342,25 +1342,10 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv, return svsk; }
-bool svc_alien_sock(struct net *net, int fd) -{ - int err; - struct socket *sock = sockfd_lookup(fd, &err); - bool ret = false; - - if (!sock) - goto out; - if (sock_net(sock->sk) != net) - ret = true; - sockfd_put(sock); -out: - return ret; -} -EXPORT_SYMBOL_GPL(svc_alien_sock); - /** * svc_addsock - add a listener socket to an RPC service * @serv: pointer to RPC service to which to add a new listener + * @net: caller's network namespace * @fd: file descriptor of the new listener * @name_return: pointer to buffer to fill in with name of listener * @len: size of the buffer @@ -1370,8 +1355,8 @@ EXPORT_SYMBOL_GPL(svc_alien_sock); * Name is terminated with '\n'. On error, returns a negative errno * value. */ -int svc_addsock(struct svc_serv *serv, const int fd, char *name_return, - const size_t len, const struct cred *cred) +int svc_addsock(struct svc_serv *serv, struct net *net, const int fd, + char *name_return, const size_t len, const struct cred *cred) { int err = 0; struct socket *so = sockfd_lookup(fd, &err); @@ -1382,6 +1367,9 @@ int svc_addsock(struct svc_serv *serv, const int fd, char *name_return,
if (!so) return err; + err = -EINVAL; + if (sock_net(so->sk) != net) + goto out; err = -EAFNOSUPPORT; if ((so->sk->sk_family != PF_INET) && (so->sk->sk_family != PF_INET6)) goto out;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit 665e89ab7c5af1f2d260834c861a74b01a30f95f ]
The below-mentioned patch was intended to simplify refcounting on the svc_serv used by locked. The goal was to only ever have a single reference from the single thread. To that end we dropped a call to lockd_start_svc() (except when creating thread) which would take a reference, and dropped the svc_put(serv) that would drop that reference.
Unfortunately we didn't also remove the svc_get() from lockd_create_svc() in the case where the svc_serv already existed. So after the patch: - on the first call the svc_serv was allocated and the one reference was given to the thread, so there are no extra references - on subsequent calls svc_get() was called so there is now an extra reference. This is clearly not consistent.
The inconsistency is also clear in the current code in lockd_get() takes *two* references, one on nlmsvc_serv and one by incrementing nlmsvc_users. This clearly does not match lockd_put().
So: drop that svc_get() from lockd_get() (which used to be in lockd_create_svc().
Reported-by: Ido Schimmel idosch@idosch.org Closes: https://lore.kernel.org/linux-nfs/ZHsI%2FH16VX9kJQX1@shredder/T/#u Fixes: b73a2972041b ("lockd: move lockd_start_svc() call into lockd_create_svc()") Signed-off-by: NeilBrown neilb@suse.de Tested-by: Ido Schimmel idosch@nvidia.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/lockd/svc.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index 59ef8a1f843f3..5579e67da17db 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -355,7 +355,6 @@ static int lockd_get(void) int error;
if (nlmsvc_serv) { - svc_get(nlmsvc_serv); nlmsvc_users++; return 0; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 262176798b18b12fd8ab84c94cfece0a6a652476 ]
Clean up: de-duplicate some common code.
Reviewed-by: Jeff Layton jlayton@kernel.org Acked-by: Tom Talpey tom@talpey.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 46 ++++++++++++++++++++++++++-------------------- 1 file changed, 26 insertions(+), 20 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index d62382dfc135e..a81938c1e3efb 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2541,6 +2541,20 @@ static __be32 *encode_change(__be32 *p, struct kstat *stat, struct inode *inode, return p; }
+static __be32 nfsd4_encode_nfstime4(struct xdr_stream *xdr, + struct timespec64 *tv) +{ + __be32 *p; + + p = xdr_reserve_space(xdr, XDR_UNIT * 3); + if (!p) + return nfserr_resource; + + p = xdr_encode_hyper(p, (s64)tv->tv_sec); + *p = cpu_to_be32(tv->tv_nsec); + return nfs_ok; +} + /* * ctime (in NFSv4, time_metadata) is not writeable, and the client * doesn't really care what resolution could theoretically be stored by @@ -3346,11 +3360,9 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, p = xdr_encode_hyper(p, dummy64); } if (bmval1 & FATTR4_WORD1_TIME_ACCESS) { - p = xdr_reserve_space(xdr, 12); - if (!p) - goto out_resource; - p = xdr_encode_hyper(p, (s64)stat.atime.tv_sec); - *p++ = cpu_to_be32(stat.atime.tv_nsec); + status = nfsd4_encode_nfstime4(xdr, &stat.atime); + if (status) + goto out; } if (bmval1 & FATTR4_WORD1_TIME_DELTA) { p = xdr_reserve_space(xdr, 12); @@ -3359,25 +3371,19 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, p = encode_time_delta(p, d_inode(dentry)); } if (bmval1 & FATTR4_WORD1_TIME_METADATA) { - p = xdr_reserve_space(xdr, 12); - if (!p) - goto out_resource; - p = xdr_encode_hyper(p, (s64)stat.ctime.tv_sec); - *p++ = cpu_to_be32(stat.ctime.tv_nsec); + status = nfsd4_encode_nfstime4(xdr, &stat.ctime); + if (status) + goto out; } if (bmval1 & FATTR4_WORD1_TIME_MODIFY) { - p = xdr_reserve_space(xdr, 12); - if (!p) - goto out_resource; - p = xdr_encode_hyper(p, (s64)stat.mtime.tv_sec); - *p++ = cpu_to_be32(stat.mtime.tv_nsec); + status = nfsd4_encode_nfstime4(xdr, &stat.mtime); + if (status) + goto out; } if (bmval1 & FATTR4_WORD1_TIME_CREATE) { - p = xdr_reserve_space(xdr, 12); - if (!p) - goto out_resource; - p = xdr_encode_hyper(p, (s64)stat.btime.tv_sec); - *p++ = cpu_to_be32(stat.btime.tv_nsec); + status = nfsd4_encode_nfstime4(xdr, &stat.btime); + if (status) + goto out; } if (bmval1 & FATTR4_WORD1_MOUNTED_ON_FILEID) { u64 ino = stat.ino;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tavian Barnes tavianator@tavianator.com
[ Upstream commit d7dbed457c2ef83709a2a2723a2d58de43623449 ]
In nfsd4_encode_fattr(), TIME_CREATE was being written out after all other times. However, they should be written out in an order that matches the bit flags in bmval1, which in this case are
#define FATTR4_WORD1_TIME_ACCESS (1UL << 15) #define FATTR4_WORD1_TIME_CREATE (1UL << 18) #define FATTR4_WORD1_TIME_DELTA (1UL << 19) #define FATTR4_WORD1_TIME_METADATA (1UL << 20) #define FATTR4_WORD1_TIME_MODIFY (1UL << 21)
so TIME_CREATE should come second.
I noticed this on a FreeBSD NFSv4.2 client, which supports creation times. On this client, file times were weirdly permuted. With this patch applied on the server, times looked normal on the client.
Fixes: e377a3e698fb ("nfsd: Add support for the birth time attribute") Link: https://unix.stackexchange.com/q/749605/56202 Signed-off-by: Tavian Barnes tavianator@tavianator.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index a81938c1e3efb..5a68c62864925 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -3364,6 +3364,11 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, if (status) goto out; } + if (bmval1 & FATTR4_WORD1_TIME_CREATE) { + status = nfsd4_encode_nfstime4(xdr, &stat.btime); + if (status) + goto out; + } if (bmval1 & FATTR4_WORD1_TIME_DELTA) { p = xdr_reserve_space(xdr, 12); if (!p) @@ -3380,11 +3385,6 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, if (status) goto out; } - if (bmval1 & FATTR4_WORD1_TIME_CREATE) { - status = nfsd4_encode_nfstime4(xdr, &stat.btime); - if (status) - goto out; - } if (bmval1 & FATTR4_WORD1_MOUNTED_ON_FILEID) { u64 ino = stat.ino;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 3903902401451b1cd9d797a8c79769eb26ac7fe5 ]
The original implementation of nfsd used signals to stop threads during shutdown. In Linux 2.3.46pre5 nfsd gained the ability to shutdown threads internally it if was asked to run "0" threads. After this user-space transitioned to using "rpc.nfsd 0" to stop nfsd and sending signals to threads was no longer an important part of the API.
In commit 3ebdbe5203a8 ("SUNRPC: discard svo_setup and rename svc_set_num_threads_sync()") (v5.17-rc1~75^2~41) we finally removed the use of signals for stopping threads, using kthread_stop() instead.
This patch makes the "obvious" next step and removes the ability to signal nfsd threads - or any svc threads. nfsd stops allowing signals and we don't check for their delivery any more.
This will allow for some simplification in later patches.
A change worth noting is in nfsd4_ssc_setup_dul(). There was previously a signal_pending() check which would only succeed when the thread was being shut down. It should really have tested kthread_should_stop() as well. Now it just does the latter, not the former.
Signed-off-by: NeilBrown neilb@suse.de Reviewed-by: Jeff Layton jlayton@kernel.org [ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/callback.c | 11 ++--------- fs/nfsd/nfs4proc.c | 8 ++++---- fs/nfsd/nfssvc.c | 12 ------------ net/sunrpc/svc_xprt.c | 20 ++++++++------------ 4 files changed, 14 insertions(+), 37 deletions(-)
diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c index 456af7d230cf1..8fe143cad4a2b 100644 --- a/fs/nfs/callback.c +++ b/fs/nfs/callback.c @@ -80,9 +80,6 @@ nfs4_callback_svc(void *vrqstp) set_freezable();
while (!kthread_freezable_should_stop(NULL)) { - - if (signal_pending(current)) - flush_signals(current); /* * Listen for a request on the socket */ @@ -112,11 +109,7 @@ nfs41_callback_svc(void *vrqstp) set_freezable();
while (!kthread_freezable_should_stop(NULL)) { - - if (signal_pending(current)) - flush_signals(current); - - prepare_to_wait(&serv->sv_cb_waitq, &wq, TASK_INTERRUPTIBLE); + prepare_to_wait(&serv->sv_cb_waitq, &wq, TASK_IDLE); spin_lock_bh(&serv->sv_cb_lock); if (!list_empty(&serv->sv_cb_list)) { req = list_first_entry(&serv->sv_cb_list, @@ -131,7 +124,7 @@ nfs41_callback_svc(void *vrqstp) } else { spin_unlock_bh(&serv->sv_cb_lock); if (!kthread_should_stop()) - schedule(); + freezable_schedule(); finish_wait(&serv->sv_cb_waitq, &wq); } } diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 9c51c10bcf080..8560a11daa47d 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -38,6 +38,7 @@ #include <linux/slab.h> #include <linux/kthread.h> #include <linux/namei.h> +#include <linux/freezer.h>
#include <linux/sunrpc/addr.h> #include <linux/nfs_ssc.h> @@ -1313,13 +1314,12 @@ static __be32 nfsd4_ssc_setup_dul(struct nfsd_net *nn, char *ipaddr, /* found a match */ if (ni->nsui_busy) { /* wait - and try again */ - prepare_to_wait(&nn->nfsd_ssc_waitq, &wait, - TASK_INTERRUPTIBLE); + prepare_to_wait(&nn->nfsd_ssc_waitq, &wait, TASK_IDLE); spin_unlock(&nn->nfsd_ssc_lock);
/* allow 20secs for mount/unmount for now - revisit */ - if (signal_pending(current) || - (schedule_timeout(20*HZ) == 0)) { + if (kthread_should_stop() || + (freezable_schedule_timeout(20*HZ) == 0)) { finish_wait(&nn->nfsd_ssc_waitq, &wait); kfree(work); return nfserr_eagain; diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index a0ecec54d3d7d..8063fab2c0279 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -952,15 +952,6 @@ nfsd(void *vrqstp)
current->fs->umask = 0;
- /* - * thread is spawned with all signals set to SIG_IGN, re-enable - * the ones that will bring down the thread - */ - allow_signal(SIGKILL); - allow_signal(SIGHUP); - allow_signal(SIGINT); - allow_signal(SIGQUIT); - atomic_inc(&nfsdstats.th_cnt);
set_freezable(); @@ -985,9 +976,6 @@ nfsd(void *vrqstp) validate_process_creds(); }
- /* Clear signals before calling svc_exit_thread() */ - flush_signals(current); - atomic_dec(&nfsdstats.th_cnt);
out: diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c index 000b737784bd9..d1eacf3358b81 100644 --- a/net/sunrpc/svc_xprt.c +++ b/net/sunrpc/svc_xprt.c @@ -674,13 +674,13 @@ static int svc_alloc_arg(struct svc_rqst *rqstp) while (rqstp->rq_pages[i] == NULL) { struct page *p = alloc_page(GFP_KERNEL); if (!p) { - set_current_state(TASK_INTERRUPTIBLE); - if (signalled() || kthread_should_stop()) { + set_current_state(TASK_IDLE); + if (kthread_should_stop()) { set_current_state(TASK_RUNNING); return -EINTR; } - schedule_timeout(msecs_to_jiffies(500)); } + freezable_schedule_timeout(msecs_to_jiffies(500)); rqstp->rq_pages[i] = p; } rqstp->rq_page_end = &rqstp->rq_pages[i]; @@ -713,7 +713,7 @@ rqst_should_sleep(struct svc_rqst *rqstp) return false;
/* are we shutting down? */ - if (signalled() || kthread_should_stop()) + if (kthread_should_stop()) return false;
/* are we freezing? */ @@ -735,18 +735,14 @@ static struct svc_xprt *svc_get_next_xprt(struct svc_rqst *rqstp, long timeout) if (rqstp->rq_xprt) goto out_found;
- /* - * We have to be able to interrupt this wait - * to bring down the daemons ... - */ - set_current_state(TASK_INTERRUPTIBLE); + set_current_state(TASK_IDLE); smp_mb__before_atomic(); clear_bit(SP_CONGESTED, &pool->sp_flags); clear_bit(RQ_BUSY, &rqstp->rq_flags); smp_mb__after_atomic();
if (likely(rqst_should_sleep(rqstp))) - time_left = schedule_timeout(timeout); + time_left = freezable_schedule_timeout(timeout); else __set_current_state(TASK_RUNNING);
@@ -761,7 +757,7 @@ static struct svc_xprt *svc_get_next_xprt(struct svc_rqst *rqstp, long timeout) if (!time_left) atomic_long_inc(&pool->sp_stats.threads_timedout);
- if (signalled() || kthread_should_stop()) + if (kthread_should_stop()) return ERR_PTR(-EINTR); return ERR_PTR(-EAGAIN); out_found: @@ -860,7 +856,7 @@ int svc_recv(struct svc_rqst *rqstp, long timeout) try_to_freeze(); cond_resched(); err = -EINTR; - if (signalled() || kthread_should_stop()) + if (kthread_should_stop()) goto out;
xprt = svc_get_next_xprt(rqstp, timeout);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit 18e4cf915543257eae2925671934937163f5639b ]
Previously a thread could exit asynchronously (due to a signal) so some care was needed to hold nfsd_mutex over the last svc_put() call. Now a thread can only exit when svc_set_num_threads() is called, and this is always called under nfsd_mutex. So no care is needed.
Not only is the mutex held when a thread exits now, but the svc refcount is elevated, so the svc_put() in svc_exit_thread() will never be a final put, so the mutex isn't even needed at this point in the code.
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfssvc.c | 23 ----------------------- include/linux/sunrpc/svc.h | 13 ------------- 2 files changed, 36 deletions(-)
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 8063fab2c0279..8907dba22c3f2 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -979,31 +979,8 @@ nfsd(void *vrqstp) atomic_dec(&nfsdstats.th_cnt);
out: - /* Take an extra ref so that the svc_put in svc_exit_thread() - * doesn't call svc_destroy() - */ - svc_get(nn->nfsd_serv); - /* Release the thread */ svc_exit_thread(rqstp); - - /* We need to drop a ref, but may not drop the last reference - * without holding nfsd_mutex, and we cannot wait for nfsd_mutex as that - * could deadlock with nfsd_shutdown_threads() waiting for us. - * So three options are: - * - drop a non-final reference, - * - get the mutex without waiting - * - sleep briefly andd try the above again - */ - while (!svc_put_not_last(nn->nfsd_serv)) { - if (mutex_trylock(&nfsd_mutex)) { - nfsd_put(net); - mutex_unlock(&nfsd_mutex); - break; - } - msleep(20); - } - return 0; }
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 5cf6543d3c8e5..1cf7a7799cc04 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -123,19 +123,6 @@ static inline void svc_put(struct svc_serv *serv) kref_put(&serv->sv_refcnt, svc_destroy); }
-/** - * svc_put_not_last - decrement non-final reference count on SUNRPC serv - * @serv: the svc_serv to have count decremented - * - * Returns: %true is refcount was decremented. - * - * If the refcount is 1, it is not decremented and instead failure is reported. - */ -static inline bool svc_put_not_last(struct svc_serv *serv) -{ - return refcount_dec_not_one(&serv->sv_refcnt.refcount); -} - /* * Maximum payload size supported by a kernel RPC server. * This is use to determine the max number of pages nfsd is
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit 9f28a971ee9fdf1bf8ce8c88b103f483be610277 ]
Now that the last nfsd thread is stopped by an explicit act of calling svc_set_num_threads() with a count of zero, we only have a limited number of places that can happen, and don't need to call nfsd_last_thread() in nfsd_put()
So separate that out and call it at the two places where the number of threads is set to zero.
Move the clearing of ->nfsd_serv and the call to svc_xprt_destroy_all() into nfsd_last_thread(), as they are really part of the same action.
nfsd_put() is now a thin wrapper around svc_put(), so make it a static inline.
nfsd_put() cannot be called after nfsd_last_thread(), so in a couple of places we have to use svc_put() instead.
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsd.h | 7 ++++++- fs/nfsd/nfssvc.c | 52 ++++++++++++++++++------------------------------ 2 files changed, 25 insertions(+), 34 deletions(-)
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index fa0144a742678..867dcfd64d426 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -96,7 +96,12 @@ int nfsd_pool_stats_open(struct inode *, struct file *); int nfsd_pool_stats_release(struct inode *, struct file *); void nfsd_shutdown_threads(struct net *net);
-void nfsd_put(struct net *net); +static inline void nfsd_put(struct net *net) +{ + struct nfsd_net *nn = net_generic(net, nfsd_net_id); + + svc_put(nn->nfsd_serv); +}
bool i_am_nfsd(void);
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 8907dba22c3f2..ee5713fca1870 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -529,9 +529,14 @@ static struct notifier_block nfsd_inet6addr_notifier = { /* Only used under nfsd_mutex, so this atomic may be overkill: */ static atomic_t nfsd_notifier_refcount = ATOMIC_INIT(0);
-static void nfsd_last_thread(struct svc_serv *serv, struct net *net) +static void nfsd_last_thread(struct net *net) { struct nfsd_net *nn = net_generic(net, nfsd_net_id); + struct svc_serv *serv = nn->nfsd_serv; + + spin_lock(&nfsd_notifier_lock); + nn->nfsd_serv = NULL; + spin_unlock(&nfsd_notifier_lock);
/* check if the notifier still has clients */ if (atomic_dec_return(&nfsd_notifier_refcount) == 0) { @@ -541,6 +546,8 @@ static void nfsd_last_thread(struct svc_serv *serv, struct net *net) #endif }
+ svc_xprt_destroy_all(serv, net); + /* * write_ports can create the server without actually starting * any threads--if we get shut down before any threads are @@ -631,7 +638,8 @@ void nfsd_shutdown_threads(struct net *net) svc_get(serv); /* Kill outstanding nfsd threads */ svc_set_num_threads(serv, NULL, 0); - nfsd_put(net); + nfsd_last_thread(net); + svc_put(serv); mutex_unlock(&nfsd_mutex); }
@@ -661,9 +669,6 @@ int nfsd_create_serv(struct net *net) serv->sv_maxconn = nn->max_connections; error = svc_bind(serv, net); if (error < 0) { - /* NOT nfsd_put() as notifiers (see below) haven't - * been set up yet. - */ svc_put(serv); return error; } @@ -706,29 +711,6 @@ int nfsd_get_nrthreads(int n, int *nthreads, struct net *net) return 0; }
-/* This is the callback for kref_put() below. - * There is no code here as the first thing to be done is - * call svc_shutdown_net(), but we cannot get the 'net' from - * the kref. So do all the work when kref_put returns true. - */ -static void nfsd_noop(struct kref *ref) -{ -} - -void nfsd_put(struct net *net) -{ - struct nfsd_net *nn = net_generic(net, nfsd_net_id); - - if (kref_put(&nn->nfsd_serv->sv_refcnt, nfsd_noop)) { - svc_xprt_destroy_all(nn->nfsd_serv, net); - nfsd_last_thread(nn->nfsd_serv, net); - svc_destroy(&nn->nfsd_serv->sv_refcnt); - spin_lock(&nfsd_notifier_lock); - nn->nfsd_serv = NULL; - spin_unlock(&nfsd_notifier_lock); - } -} - int nfsd_set_nrthreads(int n, int *nthreads, struct net *net) { int i = 0; @@ -779,7 +761,7 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct net *net) if (err) break; } - nfsd_put(net); + svc_put(nn->nfsd_serv); return err; }
@@ -794,6 +776,7 @@ nfsd_svc(int nrservs, struct net *net, const struct cred *cred) int error; bool nfsd_up_before; struct nfsd_net *nn = net_generic(net, nfsd_net_id); + struct svc_serv *serv;
mutex_lock(&nfsd_mutex); dprintk("nfsd: creating service\n"); @@ -813,22 +796,25 @@ nfsd_svc(int nrservs, struct net *net, const struct cred *cred) goto out;
nfsd_up_before = nn->nfsd_net_up; + serv = nn->nfsd_serv;
error = nfsd_startup_net(net, cred); if (error) goto out_put; - error = svc_set_num_threads(nn->nfsd_serv, NULL, nrservs); + error = svc_set_num_threads(serv, NULL, nrservs); if (error) goto out_shutdown; - error = nn->nfsd_serv->sv_nrthreads; + error = serv->sv_nrthreads; + if (error == 0) + nfsd_last_thread(net); out_shutdown: if (error < 0 && !nfsd_up_before) nfsd_shutdown_net(net); out_put: /* Threads now hold service active */ if (xchg(&nn->keep_active, 0)) - nfsd_put(net); - nfsd_put(net); + svc_put(serv); + svc_put(serv); out: mutex_unlock(&nfsd_mutex); return error;
Hi Greg,
(+Jeff & linux-nfs in Ccs)
Greg Kroah-Hartman wrote on Tue, Jun 18, 2024 at 02:40:15PM +0200:
[ Upstream commit 9f28a971ee9fdf1bf8ce8c88b103f483be610277 ]
Playing with dyad in the 'vulns' repo, I noticed this commit got reverted in the 6.1 tree by pure chance as I just happened to test it on a related commit and wondered why the 6.1 kernel was listed twice: b2c545c39877 ("Revert "nfsd: separate nfsd_last_thread() from nfsd_put()"") db5f2f4db8b7 ("Revert "nfsd: call nfsd_last_thread() before final nfsd_put()"")
See this thread for the discussion that caused that revert: https://lore.kernel.org/all/e341cb408b5663d8c91b8fa57b41bb984be43448.camel@k...
What made me look is that they got in 5.10/15 (without revert):
5.10 tree (since v5.10.220) 838a602db75d ("nfsd: call nfsd_last_thread() before final nfsd_put()") d31cd25f5501 ("nfsd: separate nfsd_last_thread() from nfsd_put()")
5.15 tree (since v5.15.154) c52fee7a1f98 ("nfsd: call nfsd_last_thread() before final nfsd_put()") 56e5eeff6cfa ("nfsd: separate nfsd_last_thread() from nfsd_put()")
I considered trying to revert them as well, but it looks like they've been fixed by this commit (upstream id): 64e6304169f1 ("nfsd: drop the nfsd_put helper") which wasn't in 6.1, so perhaps that's all there is to it and I'm worried too much?
Jeff, you're the one who suggested reverting the two back then, sorry to dump it on you but do you remember the kind of problems you ran into? Is there any chance it would have gone unoticed in the 5.15 tree for 2.5 months? (5.15.154 was April 2024)
(Bonus question: if that is really all there is, would that make sense / should we take the commits back in 6.1 with that extra fix?)
Thanks,
On Mon, 2024-06-24 at 13:14 +0900, Dominique Martinet wrote:
Hi Greg,
(+Jeff & linux-nfs in Ccs)
Greg Kroah-Hartman wrote on Tue, Jun 18, 2024 at 02:40:15PM +0200:
[ Upstream commit 9f28a971ee9fdf1bf8ce8c88b103f483be610277 ]
Playing with dyad in the 'vulns' repo, I noticed this commit got reverted in the 6.1 tree by pure chance as I just happened to test it on a related commit and wondered why the 6.1 kernel was listed twice: b2c545c39877 ("Revert "nfsd: separate nfsd_last_thread() from nfsd_put()"") db5f2f4db8b7 ("Revert "nfsd: call nfsd_last_thread() before final nfsd_put()"")
See this thread for the discussion that caused that revert: https://lore.kernel.org/all/e341cb408b5663d8c91b8fa57b41bb984be43448.camel@k...
What made me look is that they got in 5.10/15 (without revert):
5.10 tree (since v5.10.220) 838a602db75d ("nfsd: call nfsd_last_thread() before final nfsd_put()") d31cd25f5501 ("nfsd: separate nfsd_last_thread() from nfsd_put()")
5.15 tree (since v5.15.154) c52fee7a1f98 ("nfsd: call nfsd_last_thread() before final nfsd_put()") 56e5eeff6cfa ("nfsd: separate nfsd_last_thread() from nfsd_put()")
I considered trying to revert them as well, but it looks like they've been fixed by this commit (upstream id): 64e6304169f1 ("nfsd: drop the nfsd_put helper") which wasn't in 6.1, so perhaps that's all there is to it and I'm worried too much?
Jeff, you're the one who suggested reverting the two back then, sorry to dump it on you but do you remember the kind of problems you ran into? Is there any chance it would have gone unoticed in the 5.15 tree for 2.5 months? (5.15.154 was April 2024)
Sorry, I don't think I kept a record of that panic that I hit at the time. I do think that I looked at the original bug report and it looked like it was probably the same problem, but I don't remember the details.
I think I just mentioned reverting them because I didn't see the benefit in taking those into an old kernel. These are privileged anyway, so even if they are bugs I don't seem them as particularly critical.
(Bonus question: if that is really all there is, would that make sense / should we take the commits back in 6.1 with that extra fix?)
Maybe? The problem is that someone has to do the testing for this. These interfaces aren't currently part of any testsuite, so a lot of that tends to be a manual effort.
(Sorry for the slow reply)
Jeff Layton wrote on Tue, Jun 25, 2024 at 11:45:22AM -0400:
Jeff, you're the one who suggested reverting the two back then, sorry to dump it on you but do you remember the kind of problems you ran into? Is there any chance it would have gone unoticed in the 5.15 tree for 2.5 months? (5.15.154 was April 2024)
Sorry, I don't think I kept a record of that panic that I hit at the time. I do think that I looked at the original bug report and it looked like it was probably the same problem, but I don't remember the details.
I think I just mentioned reverting them because I didn't see the benefit in taking those into an old kernel. These are privileged anyway, so even if they are bugs I don't seem them as particularly critical.
Right, normal users can't stop the nfsd so you're correct it probably doesn't matter as far as security goes (and this correctly doesn't have any of the shiny new CVEs assigned); I'm now curious why it got backported to all these trees at all if nothing needs it.
Anyway, I really just started looking at it because it got reverted in 6.1 so was wondering why it didn't get reverted in other trees, but if it hadn't made it to any tree I wouldn't have cared at all...
As long as there doesn't seem to be a problem with older trees (and at least I'm not getting any weird hang or crash on shutdown on my 5.10) let's leave things as they are.
(Bonus question: if that is really all there is, would that make sense / should we take the commits back in 6.1 with that extra fix?)
Maybe? The problem is that someone has to do the testing for this. These interfaces aren't currently part of any testsuite, so a lot of that tends to be a manual effort.
Agreed, let's not add more work there if no-one can quickly test this.
If it turns out some later commit that actually needs this gets backported to 6.1 we can always bring the pair of commits back when someone notices.
Thanks!
On Thu, Jul 18, 2024 at 03:20:51AM -0400, Dominique Martinet wrote:
(Sorry for the slow reply)
Jeff Layton wrote on Tue, Jun 25, 2024 at 11:45:22AM -0400:
Jeff, you're the one who suggested reverting the two back then, sorry to dump it on you but do you remember the kind of problems you ran into? Is there any chance it would have gone unoticed in the 5.15 tree for 2.5 months? (5.15.154 was April 2024)
Sorry, I don't think I kept a record of that panic that I hit at the time. I do think that I looked at the original bug report and it looked like it was probably the same problem, but I don't remember the details.
I think I just mentioned reverting them because I didn't see the benefit in taking those into an old kernel. These are privileged anyway, so even if they are bugs I don't seem them as particularly critical.
Right, normal users can't stop the nfsd so you're correct it probably doesn't matter as far as security goes (and this correctly doesn't have any of the shiny new CVEs assigned); I'm now curious why it got backported to all these trees at all if nothing needs it.
We wanted to backport the NFSD file cache fixes from v5.19 through v6.3 to LTS v6.1, v5.15, and v5.10. Since it wasn't just a handful of specific patches that fixed the NFSD file cache, Sasha and Greg asked me to backport the full range of NFSD commits up until v6.3 to these 3 kernels. The goals were:
1. To fix the file cache 2. To create a platform in the LTS code bases to make it easier to backport the more complex upstream NFSD fixes going forward
Thus the backport of course included some commits that would not normally have been backported to an LTS kernel. That has surprised a few folks.
Anyway, I really just started looking at it because it got reverted in 6.1 so was wondering why it didn't get reverted in other trees, but if it hadn't made it to any tree I wouldn't have cared at all...
As long as there doesn't seem to be a problem with older trees (and at least I'm not getting any weird hang or crash on shutdown on my 5.10) let's leave things as they are.
The only bothersome thing seems to be the inconsistency between the LTS kernels. I haven't heard a report of a problem.
(Bonus question: if that is really all there is, would that make sense / should we take the commits back in 6.1 with that extra fix?)
Maybe? The problem is that someone has to do the testing for this. These interfaces aren't currently part of any testsuite, so a lot of that tends to be a manual effort.
Agreed, let's not add more work there if no-one can quickly test this.
If it turns out some later commit that actually needs this gets backported to 6.1 we can always bring the pair of commits back when someone notices.
Sounds like a plan.
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit b38a6023da6a12b561f0421c6a5a1f7624a1529c ]
The commits that introduced these flags neglected to update the Documentation/filesystems/nfs/exporting.rst file.
Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/filesystems/nfs/exporting.rst | 26 +++++++++++++++++++++ 1 file changed, 26 insertions(+)
diff --git a/Documentation/filesystems/nfs/exporting.rst b/Documentation/filesystems/nfs/exporting.rst index 0e98edd353b5f..6f59a364f84cd 100644 --- a/Documentation/filesystems/nfs/exporting.rst +++ b/Documentation/filesystems/nfs/exporting.rst @@ -215,3 +215,29 @@ following flags are defined: This flag causes nfsd to close any open files for this inode _before_ calling into the vfs to do an unlink or a rename that would replace an existing file. + + EXPORT_OP_REMOTE_FS - Backing storage for this filesystem is remote + PF_LOCAL_THROTTLE exists for loopback NFSD, where a thread needs to + write to one bdi (the final bdi) in order to free up writes queued + to another bdi (the client bdi). Such threads get a private balance + of dirty pages so that dirty pages for the client bdi do not imact + the daemon writing to the final bdi. For filesystems whose durable + storage is not local (such as exported NFS filesystems), this + constraint has negative consequences. EXPORT_OP_REMOTE_FS enables + an export to disable writeback throttling. + + EXPORT_OP_NOATOMIC_ATTR - Filesystem does not update attributes atomically + EXPORT_OP_NOATOMIC_ATTR indicates that the exported filesystem + cannot provide the semantics required by the "atomic" boolean in + NFSv4's change_info4. This boolean indicates to a client whether the + returned before and after change attributes were obtained atomically + with the respect to the requested metadata operation (UNLINK, + OPEN/CREATE, MKDIR, etc). + + EXPORT_OP_FLUSH_ON_CLOSE - Filesystem flushes file data on close(2) + On most filesystems, inodes can remain under writeback after the + file is closed. NFSD relies on client activity or local flusher + threads to handle writeback. Certain filesystems, such as NFS, flush + all of an inode's dirty data on last close. Exports that behave this + way should set EXPORT_OP_FLUSH_ON_CLOSE so that NFSD knows to skip + waiting for writeback when closing such files.
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit 88956eabfdea7d01d550535af120d4ef265b1d02 ]
If /proc/fs/nfsd/pool_stats is open when the last nfsd thread exits, then when the file is closed a NULL pointer is dereferenced. This is because nfsd_pool_stats_release() assumes that the pointer to the svc_serv cannot become NULL while a reference is held.
This used to be the case but a recent patch split nfsd_last_thread() out from nfsd_put(), and clearing the pointer is done in nfsd_last_thread().
This is easily reproduced by running rpc.nfsd 8 ; ( rpc.nfsd 0;true) < /proc/fs/nfsd/pool_stats
Fortunately nfsd_pool_stats_release() has easy access to the svc_serv pointer, and so can call svc_put() on it directly.
Fixes: 9f28a971ee9f ("nfsd: separate nfsd_last_thread() from nfsd_put()") Signed-off-by: NeilBrown neilb@suse.de Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfssvc.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index ee5713fca1870..2a1dd580dfb94 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -1084,11 +1084,12 @@ int nfsd_pool_stats_open(struct inode *inode, struct file *file)
int nfsd_pool_stats_release(struct inode *inode, struct file *file) { + struct seq_file *seq = file->private_data; + struct svc_serv *serv = seq->private; int ret = seq_release(inode, file); - struct net *net = inode->i_sb->s_fs_info;
mutex_lock(&nfsd_mutex); - nfsd_put(net); + svc_put(serv); mutex_unlock(&nfsd_mutex); return ret; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit 2a501f55cd641eb4d3c16a2eab0d678693fac663 ]
If write_ports_addfd or write_ports_addxprt fail, they call nfsd_put() without calling nfsd_last_thread(). This leaves nn->nfsd_serv pointing to a structure that has been freed.
So remove 'static' from nfsd_last_thread() and call it when the nfsd_serv is about to be destroyed.
Fixes: ec52361df99b ("SUNRPC: stop using ->sv_nrthreads as a refcount") Signed-off-by: NeilBrown neilb@suse.de Reviewed-by: Jeff Layton jlayton@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsctl.c | 9 +++++++-- fs/nfsd/nfsd.h | 1 + fs/nfsd/nfssvc.c | 2 +- 3 files changed, 9 insertions(+), 3 deletions(-)
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 76a60e7a75097..eec442edb6556 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -720,8 +720,10 @@ static ssize_t __write_ports_addfd(char *buf, struct net *net, const struct cred
err = svc_addsock(nn->nfsd_serv, net, fd, buf, SIMPLE_TRANSACTION_LIMIT, cred);
- if (err >= 0 && - !nn->nfsd_serv->sv_nrthreads && !xchg(&nn->keep_active, 1)) + if (err < 0 && !nn->nfsd_serv->sv_nrthreads && !nn->keep_active) + nfsd_last_thread(net); + else if (err >= 0 && + !nn->nfsd_serv->sv_nrthreads && !xchg(&nn->keep_active, 1)) svc_get(nn->nfsd_serv);
nfsd_put(net); @@ -771,6 +773,9 @@ static ssize_t __write_ports_addxprt(char *buf, struct net *net, const struct cr svc_xprt_put(xprt); } out_err: + if (!nn->nfsd_serv->sv_nrthreads && !nn->keep_active) + nfsd_last_thread(net); + nfsd_put(net); return err; } diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 867dcfd64d426..3796015dc7656 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -138,6 +138,7 @@ int nfsd_vers(struct nfsd_net *nn, int vers, enum vers_op change); int nfsd_minorversion(struct nfsd_net *nn, u32 minorversion, enum vers_op change); void nfsd_reset_versions(struct nfsd_net *nn); int nfsd_create_serv(struct net *net); +void nfsd_last_thread(struct net *net);
extern int nfsd_max_blksize;
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 2a1dd580dfb94..3d4fd40c987bd 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -529,7 +529,7 @@ static struct notifier_block nfsd_inet6addr_notifier = { /* Only used under nfsd_mutex, so this atomic may be overkill: */ static atomic_t nfsd_notifier_refcount = ATOMIC_INIT(0);
-static void nfsd_last_thread(struct net *net) +void nfsd_last_thread(struct net *net) { struct nfsd_net *nn = net_generic(net, nfsd_net_id); struct svc_serv *serv = nn->nfsd_serv;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 64e6304169f1e1f078e7f0798033f80a7fb0ea46 ]
It's not safe to call nfsd_put once nfsd_last_thread has been called, as that function will zero out the nn->nfsd_serv pointer.
Drop the nfsd_put helper altogether and open-code the svc_put in its callers instead. That allows us to not be reliant on the value of that pointer when handling an error.
Fixes: 2a501f55cd64 ("nfsd: call nfsd_last_thread() before final nfsd_put()") Reported-by: Zhi Li yieli@redhat.com Cc: NeilBrown neilb@suse.de Signed-off-by: Jeffrey Layton jlayton@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsctl.c | 31 +++++++++++++++++-------------- fs/nfsd/nfsd.h | 7 ------- 2 files changed, 17 insertions(+), 21 deletions(-)
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index eec442edb6556..f77f00c931723 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -709,6 +709,7 @@ static ssize_t __write_ports_addfd(char *buf, struct net *net, const struct cred char *mesg = buf; int fd, err; struct nfsd_net *nn = net_generic(net, nfsd_net_id); + struct svc_serv *serv;
err = get_int(&mesg, &fd); if (err != 0 || fd < 0) @@ -718,15 +719,15 @@ static ssize_t __write_ports_addfd(char *buf, struct net *net, const struct cred if (err != 0) return err;
- err = svc_addsock(nn->nfsd_serv, net, fd, buf, SIMPLE_TRANSACTION_LIMIT, cred); + serv = nn->nfsd_serv; + err = svc_addsock(serv, net, fd, buf, SIMPLE_TRANSACTION_LIMIT, cred);
- if (err < 0 && !nn->nfsd_serv->sv_nrthreads && !nn->keep_active) + if (err < 0 && !serv->sv_nrthreads && !nn->keep_active) nfsd_last_thread(net); - else if (err >= 0 && - !nn->nfsd_serv->sv_nrthreads && !xchg(&nn->keep_active, 1)) - svc_get(nn->nfsd_serv); + else if (err >= 0 && !serv->sv_nrthreads && !xchg(&nn->keep_active, 1)) + svc_get(serv);
- nfsd_put(net); + svc_put(serv); return err; }
@@ -740,6 +741,7 @@ static ssize_t __write_ports_addxprt(char *buf, struct net *net, const struct cr struct svc_xprt *xprt; int port, err; struct nfsd_net *nn = net_generic(net, nfsd_net_id); + struct svc_serv *serv;
if (sscanf(buf, "%15s %5u", transport, &port) != 2) return -EINVAL; @@ -751,32 +753,33 @@ static ssize_t __write_ports_addxprt(char *buf, struct net *net, const struct cr if (err != 0) return err;
- err = svc_xprt_create(nn->nfsd_serv, transport, net, + serv = nn->nfsd_serv; + err = svc_xprt_create(serv, transport, net, PF_INET, port, SVC_SOCK_ANONYMOUS, cred); if (err < 0) goto out_err;
- err = svc_xprt_create(nn->nfsd_serv, transport, net, + err = svc_xprt_create(serv, transport, net, PF_INET6, port, SVC_SOCK_ANONYMOUS, cred); if (err < 0 && err != -EAFNOSUPPORT) goto out_close;
- if (!nn->nfsd_serv->sv_nrthreads && !xchg(&nn->keep_active, 1)) - svc_get(nn->nfsd_serv); + if (!serv->sv_nrthreads && !xchg(&nn->keep_active, 1)) + svc_get(serv);
- nfsd_put(net); + svc_put(serv); return 0; out_close: - xprt = svc_find_xprt(nn->nfsd_serv, transport, net, PF_INET, port); + xprt = svc_find_xprt(serv, transport, net, PF_INET, port); if (xprt != NULL) { svc_xprt_close(xprt); svc_xprt_put(xprt); } out_err: - if (!nn->nfsd_serv->sv_nrthreads && !nn->keep_active) + if (!serv->sv_nrthreads && !nn->keep_active) nfsd_last_thread(net);
- nfsd_put(net); + svc_put(serv); return err; }
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 3796015dc7656..013bfa24ced21 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -96,13 +96,6 @@ int nfsd_pool_stats_open(struct inode *, struct file *); int nfsd_pool_stats_release(struct inode *, struct file *); void nfsd_shutdown_threads(struct net *net);
-static inline void nfsd_put(struct net *net) -{ - struct nfsd_net *nn = net_generic(net, nfsd_net_id); - - svc_put(nn->nfsd_serv); -} - bool i_am_nfsd(void);
struct nfsdfs_client {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit edcf9725150e42beeca42d085149f4c88fa97afd ]
The test on so_count in nfsd4_release_lockowner() is nonsense and harmful. Revert to using check_for_locks(), changing that to not sleep.
First: harmful. As is documented in the kdoc comment for nfsd4_release_lockowner(), the test on so_count can transiently return a false positive resulting in a return of NFS4ERR_LOCKS_HELD when in fact no locks are held. This is clearly a protocol violation and with the Linux NFS client it can cause incorrect behaviour.
If RELEASE_LOCKOWNER is sent while some other thread is still processing a LOCK request which failed because, at the time that request was received, the given owner held a conflicting lock, then the nfsd thread processing that LOCK request can hold a reference (conflock) to the lock owner that causes nfsd4_release_lockowner() to return an incorrect error.
The Linux NFS client ignores that NFS4ERR_LOCKS_HELD error because it never sends NFS4_RELEASE_LOCKOWNER without first releasing any locks, so it knows that the error is impossible. It assumes the lock owner was in fact released so it feels free to use the same lock owner identifier in some later locking request.
When it does reuse a lock owner identifier for which a previous RELEASE failed, it will naturally use a lock_seqid of zero. However the server, which didn't release the lock owner, will expect a larger lock_seqid and so will respond with NFS4ERR_BAD_SEQID.
So clearly it is harmful to allow a false positive, which testing so_count allows.
The test is nonsense because ... well... it doesn't mean anything.
so_count is the sum of three different counts. 1/ the set of states listed on so_stateids 2/ the set of active vfs locks owned by any of those states 3/ various transient counts such as for conflicting locks.
When it is tested against '2' it is clear that one of these is the transient reference obtained by find_lockowner_str_locked(). It is not clear what the other one is expected to be.
In practice, the count is often 2 because there is precisely one state on so_stateids. If there were more, this would fail.
In my testing I see two circumstances when RELEASE_LOCKOWNER is called. In one case, CLOSE is called before RELEASE_LOCKOWNER. That results in all the lock states being removed, and so the lockowner being discarded (it is removed when there are no more references which usually happens when the lock state is discarded). When nfsd4_release_lockowner() finds that the lock owner doesn't exist, it returns success.
The other case shows an so_count of '2' and precisely one state listed in so_stateid. It appears that the Linux client uses a separate lock owner for each file resulting in one lock state per lock owner, so this test on '2' is safe. For another client it might not be safe.
So this patch changes check_for_locks() to use the (newish) find_any_file_locked() so that it doesn't take a reference on the nfs4_file and so never calls nfsd_file_put(), and so never sleeps. With this check is it safe to restore the use of check_for_locks() rather than testing so_count against the mysterious '2'.
Fixes: ce3c4ad7f4ce ("NFSD: Fix possible sleep during nfsd4_release_lockowner()") Signed-off-by: NeilBrown neilb@suse.de Reviewed-by: Jeff Layton jlayton@kernel.org Cc: stable@vger.kernel.org # v6.2+ Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 26 +++++++++++++++----------- 1 file changed, 15 insertions(+), 11 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index e8e642e5ec8f1..c073cc23c5285 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -7836,14 +7836,16 @@ check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner) { struct file_lock *fl; int status = false; - struct nfsd_file *nf = find_any_file(fp); + struct nfsd_file *nf; struct inode *inode; struct file_lock_context *flctx;
+ spin_lock(&fp->fi_lock); + nf = find_any_file_locked(fp); if (!nf) { /* Any valid lock stateid should have some sort of access */ WARN_ON_ONCE(1); - return status; + goto out; }
inode = locks_inode(nf->nf_file); @@ -7859,7 +7861,8 @@ check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner) } spin_unlock(&flctx->flc_lock); } - nfsd_file_put(nf); +out: + spin_unlock(&fp->fi_lock); return status; }
@@ -7869,10 +7872,8 @@ check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner) * @cstate: NFSv4 COMPOUND state * @u: RELEASE_LOCKOWNER arguments * - * The lockowner's so_count is bumped when a lock record is added - * or when copying a conflicting lock. The latter case is brief, - * but can lead to fleeting false positives when looking for - * locks-in-use. + * Check if theree are any locks still held and if not - free the lockowner + * and any lock state that is owned. * * Return values: * %nfs_ok: lockowner released or not found @@ -7908,10 +7909,13 @@ nfsd4_release_lockowner(struct svc_rqst *rqstp, spin_unlock(&clp->cl_lock); return nfs_ok; } - if (atomic_read(&lo->lo_owner.so_count) != 2) { - spin_unlock(&clp->cl_lock); - nfs4_put_stateowner(&lo->lo_owner); - return nfserr_locks_held; + + list_for_each_entry(stp, &lo->lo_owner.so_stateids, st_perstateowner) { + if (check_for_locks(stp->st_stid.sc_file, lo)) { + spin_unlock(&clp->cl_lock); + nfs4_put_stateowner(&lo->lo_owner); + return nfserr_locks_held; + } } unhash_lockowner_locked(lo); while (!list_empty(&lo->lo_owner.so_stateids)) {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit 5ea9a7c5fe4149f165f0e3b624fe08df02b6c301 ]
A recent change to check_for_locks() changed it to take ->flc_lock while holding ->fi_lock. This creates a lock inversion (reported by lockdep) because there is a case where ->fi_lock is taken while holding ->flc_lock.
->flc_lock is held across ->fl_lmops callbacks, and nfsd_break_deleg_cb() is one of those and does take ->fi_lock. However it doesn't need to.
Prior to v4.17-rc1~110^2~22 ("nfsd: create a separate lease for each delegation") nfsd_break_deleg_cb() would walk the ->fi_delegations list and so needed the lock. Since then it doesn't walk the list and doesn't need the lock.
Two actions are performed under the lock. One is to call nfsd_break_one_deleg which calls nfsd4_run_cb(). These doesn't act on the nfs4_file at all, so don't need the lock.
The other is to set ->fi_had_conflict which is in the nfs4_file. This field is only ever set here (except when initialised to false) so there is no possible problem will multiple threads racing when setting it.
The field is tested twice in nfs4_set_delegation(). The first test does not hold a lock and is documented as an opportunistic optimisation, so it doesn't impose any need to hold ->fi_lock while setting ->fi_had_conflict.
The second test in nfs4_set_delegation() *is* make under ->fi_lock, so removing the locking when ->fi_had_conflict is set could make a change. The change could only be interesting if ->fi_had_conflict tested as false even though nfsd_break_one_deleg() ran before ->fi_lock was unlocked. i.e. while hash_delegation_locked() was running. As hash_delegation_lock() doesn't interact in any way with nfs4_run_cb() there can be no importance to this interaction.
So this patch removes the locking from nfsd_break_one_deleg() and moves the final test on ->fi_had_conflict out of the locked region to make it clear that locking isn't important to the test. It is still tested *after* vfs_setlease() has succeeded. This might be significant and as vfs_setlease() takes ->flc_lock, and nfsd_break_one_deleg() is called under ->flc_lock this "after" is a true ordering provided by a spinlock.
Fixes: edcf9725150e ("nfsd: fix RELEASE_LOCKOWNER") Signed-off-by: NeilBrown neilb@suse.de Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index c073cc23c5285..165acd8138abe 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4946,10 +4946,8 @@ nfsd_break_deleg_cb(struct file_lock *fl) */ fl->fl_break_time = 0;
- spin_lock(&fp->fi_lock); fp->fi_had_conflict = true; nfsd_break_one_deleg(dp); - spin_unlock(&fp->fi_lock); return false; }
@@ -5537,12 +5535,13 @@ nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp, if (status) goto out_unlock;
+ status = -EAGAIN; + if (fp->fi_had_conflict) + goto out_unlock; + spin_lock(&state_lock); spin_lock(&fp->fi_lock); - if (fp->fi_had_conflict) - status = -EAGAIN; - else - status = hash_delegation_locked(dp, fp); + status = hash_delegation_locked(dp, fp); spin_unlock(&fp->fi_lock); spin_unlock(&state_lock);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit 05eda6e75773592760285e10ac86c56d683be17f ]
It is possible for free_blocked_lock() to be called twice concurrently, once from nfsd4_lock() and once from nfsd4_release_lockowner() calling remove_blocked_locks(). This is why a kref was added.
It is perfectly safe for locks_delete_block() and kref_put() to be called in parallel as they use locking or atomicity respectively as protection. However locks_release_private() has no locking. It is safe for it to be called twice sequentially, but not concurrently.
This patch moves that call from free_blocked_lock() where it could race with itself, to free_nbl() where it cannot. This will slightly delay the freeing of private info or release of the owner - but not by much. It is arguably more natural for this freeing to happen in free_nbl() where the structure itself is freed.
This bug was found by code inspection - it has not been seen in practice.
Fixes: 47446d74f170 ("nfsd4: add refcount for nfsd4_blocked_lock") Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 165acd8138abe..228560f3fd0e0 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -318,6 +318,7 @@ free_nbl(struct kref *kref) struct nfsd4_blocked_lock *nbl;
nbl = container_of(kref, struct nfsd4_blocked_lock, nbl_kref); + locks_release_private(&nbl->nbl_lock); kfree(nbl); }
@@ -325,7 +326,6 @@ static void free_blocked_lock(struct nfsd4_blocked_lock *nbl) { locks_delete_block(&nbl->nbl_lock); - locks_release_private(&nbl->nbl_lock); kref_put(&nbl->nbl_kref, free_nbl); }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit 6412e44c40aaf8f1d7320b2099c5bdd6cb9126ac ]
Commit bb4d53d66e4b ("NFSD: use (un)lock_inode instead of fh_(un)lock for file operations") broke the NFSv3 pre/post op attributes behaviour when doing a SETATTR rpc call by stripping out the calls to fh_fill_pre_attrs() and fh_fill_post_attrs().
Fixes: bb4d53d66e4b ("NFSD: use (un)lock_inode instead of fh_(un)lock for file operations") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Reviewed-by: Jeff Layton jlayton@kernel.org Reviewed-by: NeilBrown neilb@suse.de Message-ID: 20240216012451.22725-1-trondmy@kernel.org [ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4proc.c | 4 ++++ fs/nfsd/vfs.c | 6 ++++-- 2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 8560a11daa47d..2c0de247083a9 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1107,6 +1107,7 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, }; struct inode *inode; __be32 status = nfs_ok; + bool save_no_wcc; int err;
if (setattr->sa_iattr.ia_valid & ATTR_SIZE) { @@ -1132,8 +1133,11 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
if (status) goto out; + save_no_wcc = cstate->current_fh.fh_no_wcc; + cstate->current_fh.fh_no_wcc = true; status = nfsd_setattr(rqstp, &cstate->current_fh, &attrs, 0, (time64_t)0); + cstate->current_fh.fh_no_wcc = save_no_wcc; if (!status) status = nfserrno(attrs.na_labelerr); if (!status) diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 542a1adbbf2a1..0ea05ddff0d08 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -486,7 +486,7 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, int accmode = NFSD_MAY_SATTR; umode_t ftype = 0; __be32 err; - int host_err; + int host_err = 0; bool get_write_count; bool size_change = (iap->ia_valid & ATTR_SIZE); int retries; @@ -544,6 +544,7 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, }
inode_lock(inode); + fh_fill_pre_attrs(fhp); for (retries = 1;;) { struct iattr attrs;
@@ -569,13 +570,14 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, !attr->na_aclerr && attr->na_dpacl && S_ISDIR(inode->i_mode)) attr->na_aclerr = set_posix_acl(inode, ACL_TYPE_DEFAULT, attr->na_dpacl); + fh_fill_post_attrs(fhp); inode_unlock(inode); if (size_change) put_write_access(inode); out: if (!host_err) host_err = commit_metadata(fhp); - return nfserrno(host_err); + return err != 0 ? err : nfserrno(host_err); }
#if defined(CONFIG_NFSD_V4)
On Tue, 18 Jun 2024 14:27:33 +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.10.220 release. There are 770 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 20 Jun 2024 12:32:00 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.220-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y and the diffstat can be found below.
thanks,
greg k-h
All tests passing for Tegra ...
Test results for stable-v5.10: 10 builds: 10 pass, 0 fail 26 boots: 26 pass, 0 fail 68 tests: 68 pass, 0 fail
Linux version: 5.10.220-rc1-g7927147b02fc Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra194-p3509-0000+p3668-0000, tegra20-ventana, tegra210-p2371-2180, tegra210-p3450-0000, tegra30-cardhu-a04
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon
Hi!
This is the start of the stable review cycle for the 5.10.220 release. There are 770 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Mentioning motivation behind 800 nfs patches would be welcome here.
CIP testing did not find any problems here:
https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/tree/linux-5...
Tested-by: Pavel Machek (CIP) pavel@denx.de
Best regards, Pavel
Greg Kroah-Hartman wrote on Tue, Jun 18, 2024 at 02:27:33PM +0200:
This is the start of the stable review cycle for the 5.10.220 release. There are 770 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 20 Jun 2024 12:32:00 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.220-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y and the diffstat can be found below.
(that's a lot of NFS/FS patches... I normally don't test NFS, but given the content of this cycle I tested very basic client/server code by exporting something mounting it and reading/writing to a file, so at least it's not exploding immediately)
Tested 7927147b02fc ("Linux 5.10.220-rc1") on: - arm i.MX6ULL (Armadillo 640) - arm64 i.MX8MP (Armadillo G4)
No obvious regression in dmesg or basic tests: Tested-by: Dominique Martinet dominique.martinet@atmark-techno.com
On 6/18/2024 1:27 PM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.10.220 release. There are 770 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 20 Jun 2024 12:32:00 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.220-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y and the diffstat can be found below.
thanks,
greg k-h
On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels, build tested on BMIPS_GENERIC:
Tested-by: Florian Fainelli florian.fainelli@broadcom.com
On Tue, 18 Jun 2024 at 18:11, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.10.220 release. There are 770 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 20 Jun 2024 12:32:00 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.220-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
## Build * kernel: 5.10.220-rc1 * git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc * git branch: linux-5.10.y * git commit: 7927147b02fc59fa7d326be121ef2a599edda19d * git describe: v5.10.219-771-g7927147b02fc * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.10.y/build/v5.10....
## Test Regressions (compared to v5.10.219)
## Metric Regressions (compared to v5.10.219)
## Test Fixes (compared to v5.10.219)
## Metric Fixes (compared to v5.10.219)
## Test result summary total: 128385, pass: 101019, fail: 5547, skip: 21723, xfail: 96
## Build Summary * arc: 5 total, 5 passed, 0 failed * arm: 104 total, 104 passed, 0 failed * arm64: 31 total, 31 passed, 0 failed * i386: 25 total, 25 passed, 0 failed * mips: 22 total, 22 passed, 0 failed * parisc: 3 total, 0 passed, 3 failed * powerpc: 23 total, 23 passed, 0 failed * riscv: 9 total, 9 passed, 0 failed * s390: 9 total, 9 passed, 0 failed * sh: 10 total, 10 passed, 0 failed * sparc: 6 total, 6 passed, 0 failed * x86_64: 27 total, 27 passed, 0 failed
## Test suites summary * boot * kselftest-android * kselftest-arm64 * kselftest-breakpoints * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-drivers-dma-buf * kselftest-efivarfs * kselftest-exec * kselftest-filesystems * kselftest-filesystems-binderfs * kselftest-filesystems-epoll * kselftest-firmware * kselftest-fpu * kselftest-ftrace * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-ir * kselftest-kcmp * kselftest-kexec * kselftest-kvm * kselftest-lib * kselftest-livepatch * kselftest-membarrier * kselftest-memfd * kselftest-memory-hotplug * kselftest-mincore * kselftest-mm * kselftest-mount * kselftest-mqueue * kselftest-net * kselftest-net-forwarding * kselftest-net-mptcp * kselftest-netfilter * kselftest-nsfs * kselftest-openat2 * kselftest-pid_namespace * kselftest-pidfd * kselftest-proc * kselftest-pstore * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-sigaltstack * kselftest-size * kselftest-splice * kselftest-static_keys * kselftest-sync * kselftest-sysctl * kselftest-tc-testing * kselftest-timens * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user * kselftest-user_events * kselftest-vDSO * kselftest-watchdog * kselftest-x86 * kselftest-zram * kunit * kvm-unit-tests * libgpiod * log-parser-boot * log-parser-test * ltp-commands * ltp-containers * ltp-controllers * ltp-cpuhotplug * ltp-crypto * ltp-cve * ltp-dio * ltp-fcntl-locktests * ltp-fs * ltp-fs_bind * ltp-fs_perms_simple * ltp-hugetlb * ltp-ipc * ltp-math * ltp-mm * ltp-nptl * ltp-pty * ltp-sched * ltp-smoke * ltp-smoketest * ltp-syscalls * ltp-tracing * perf * rcutorture
-- Linaro LKFT https://lkft.linaro.org
On 6/18/24 05:27, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.10.220 release. There are 770 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 20 Jun 2024 12:32:00 +0000. Anything received after that time might be too late.
[ ... ]
Chuck Lever chuck.lever@oracle.com SUNRPC: Prepare for xdr_stream-style decoding on the server-side
The ChromeOS patches robot reports a number of fixes for the patches applied in 5.5.220. This is one example, later fixed with commit 90bfc37b5ab9 ("SUNRPC: Fix svcxdr_init_decode's end-of-buffer calculation"), but there are more. Are those fixes going to be applied in a subsequent release of v5.10.y, was there a reason to not include them, or did they get lost ?
Thanks, Guenter
On Tue, Jun 25, 2024 at 07:48:00AM -0700, Guenter Roeck wrote:
On 6/18/24 05:27, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.10.220 release. There are 770 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 20 Jun 2024 12:32:00 +0000. Anything received after that time might be too late.
[ ... ]
Chuck Lever chuck.lever@oracle.com SUNRPC: Prepare for xdr_stream-style decoding on the server-side
The ChromeOS patches robot reports a number of fixes for the patches applied in 5.5.220. This is one example, later fixed with commit 90bfc37b5ab9 ("SUNRPC: Fix svcxdr_init_decode's end-of-buffer calculation"), but there are more. Are those fixes going to be applied in a subsequent release of v5.10.y, was there a reason to not include them, or did they get lost ?
I saw this as well, but when I tried to apply a few, they didn't, so I was guessing that Chuck had merged them together into the series.
I'll defer to Chuck on this, this release was all his :)
thanks,
greg k-h
Hi -
On Jun 25, 2024, at 11:04 AM, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Tue, Jun 25, 2024 at 07:48:00AM -0700, Guenter Roeck wrote:
On 6/18/24 05:27, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.10.220 release. There are 770 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 20 Jun 2024 12:32:00 +0000. Anything received after that time might be too late.
[ ... ]
Chuck Lever chuck.lever@oracle.com SUNRPC: Prepare for xdr_stream-style decoding on the server-side
The ChromeOS patches robot reports a number of fixes for the patches applied in 5.5.220. This is one example, later fixed with commit 90bfc37b5ab9 ("SUNRPC: Fix svcxdr_init_decode's end-of-buffer calculation"), but there are more. Are those fixes going to be applied in a subsequent release of v5.10.y, was there a reason to not include them, or did they get lost ?
I saw this as well, but when I tried to apply a few, they didn't, so I was guessing that Chuck had merged them together into the series.
I'll defer to Chuck on this, this release was all his :)
I did this port months ago, I've been waiting for the dust to settle on the 6.1 and 5.15 NFSD backports, so I've all but forgotten the status of individual patches.
If you (Greg or Guenter) send me a list of what you believe is missing, I can have a look at the individual cases and then run the finished result through our NFSD CI gauntlet.
-- Chuck Lever
On 6/25/24 08:13, Chuck Lever III wrote:
Hi -
On Jun 25, 2024, at 11:04 AM, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Tue, Jun 25, 2024 at 07:48:00AM -0700, Guenter Roeck wrote:
On 6/18/24 05:27, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.10.220 release. There are 770 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 20 Jun 2024 12:32:00 +0000. Anything received after that time might be too late.
[ ... ]
Chuck Lever chuck.lever@oracle.com SUNRPC: Prepare for xdr_stream-style decoding on the server-side
The ChromeOS patches robot reports a number of fixes for the patches applied in 5.5.220. This is one example, later fixed with commit 90bfc37b5ab9 ("SUNRPC: Fix svcxdr_init_decode's end-of-buffer calculation"), but there are more. Are those fixes going to be applied in a subsequent release of v5.10.y, was there a reason to not include them, or did they get lost ?
I saw this as well, but when I tried to apply a few, they didn't, so I was guessing that Chuck had merged them together into the series.
I'll defer to Chuck on this, this release was all his :)
I did this port months ago, I've been waiting for the dust to settle on the 6.1 and 5.15 NFSD backports, so I've all but forgotten the status of individual patches.
If you (Greg or Guenter) send me a list of what you believe is missing, I can have a look at the individual cases and then run the finished result through our NFSD CI gauntlet.
This is what the robot reported so far:
1242a87da0d8 SUNRPC: Fix svcxdr_init_encode's buflen calculation Fixes: bddfdbcddbe2 ("NFSD: Extract the svcxdr_init_encode() helper") 90bfc37b5ab9 SUNRPC: Fix svcxdr_init_decode's end-of-buffer calculation Fixes: 5191955d6fc6 ("SUNRPC: Prepare for xdr_stream-style decoding on the server-side") 10396f4df8b7 nfsd: hold a lighter-weight client reference over CB_RECALL_ANY Fixes: 44df6f439a17 ("NFSD: add delegation reaper to react to low memory condition")
Guenter
On Jun 25, 2024, at 12:29 PM, Guenter Roeck linux@roeck-us.net wrote:
On 6/25/24 08:13, Chuck Lever III wrote:
Hi -
On Jun 25, 2024, at 11:04 AM, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Tue, Jun 25, 2024 at 07:48:00AM -0700, Guenter Roeck wrote:
On 6/18/24 05:27, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.10.220 release. There are 770 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 20 Jun 2024 12:32:00 +0000. Anything received after that time might be too late.
[ ... ]
Chuck Lever chuck.lever@oracle.com SUNRPC: Prepare for xdr_stream-style decoding on the server-side
The ChromeOS patches robot reports a number of fixes for the patches applied in 5.5.220. This is one example, later fixed with commit 90bfc37b5ab9 ("SUNRPC: Fix svcxdr_init_decode's end-of-buffer calculation"), but there are more. Are those fixes going to be applied in a subsequent release of v5.10.y, was there a reason to not include them, or did they get lost ?
I saw this as well, but when I tried to apply a few, they didn't, so I was guessing that Chuck had merged them together into the series.
I'll defer to Chuck on this, this release was all his :)
I did this port months ago, I've been waiting for the dust to settle on the 6.1 and 5.15 NFSD backports, so I've all but forgotten the status of individual patches. If you (Greg or Guenter) send me a list of what you believe is missing, I can have a look at the individual cases and then run the finished result through our NFSD CI gauntlet.
This is what the robot reported so far:
1242a87da0d8 SUNRPC: Fix svcxdr_init_encode's buflen calculation Fixes: bddfdbcddbe2 ("NFSD: Extract the svcxdr_init_encode() helper") 90bfc37b5ab9 SUNRPC: Fix svcxdr_init_decode's end-of-buffer calculation Fixes: 5191955d6fc6 ("SUNRPC: Prepare for xdr_stream-style decoding on the server-side") 10396f4df8b7 nfsd: hold a lighter-weight client reference over CB_RECALL_ANY Fixes: 44df6f439a17 ("NFSD: add delegation reaper to react to low memory condition")
My naive search found:
Checking commit 44df6f439a17 ... upstream fix 10396f4df8b75ff6ab0aa2cd74296565466f2c8d not found 10396f4df8b75ff6ab0aa2cd74296565466f2c8d nfsd: hold a lighter-weight client reference over CB_RECALL_ANY upstream fix f385f7d244134246f984975ed34cd75f77de479f is already applied Checking commit a2071573d634 ... upstream fix f1aa2eb5ea05ccd1fd92d235346e60e90a1ed949 not found f1aa2eb5ea05ccd1fd92d235346e60e90a1ed949 sysctl: fix proc_dobool() usability Checking commit bddfdbcddbe2 ... upstream fix 1242a87da0d8cd2a428e96ca68e7ea899b0f4624 not found 1242a87da0d8cd2a428e96ca68e7ea899b0f4624 SUNRPC: Fix svcxdr_init_encode's buflen calculation Checking commit 9fe61450972d ... upstream fix 2111c3c0124f7432fe908c036a50abe8733dbf38 not found 2111c3c0124f7432fe908c036a50abe8733dbf38 namei: fix kernel-doc for struct renamedata and more Checking commit 013c1667cf78 ... upstream fix 2c0f0f3639562d6e38ee9705303c6457c4936eac not found 2c0f0f3639562d6e38ee9705303c6457c4936eac module: correctly exit module_kallsyms_on_each_symbol when fn() != 0 upstream fix 1e80d9cb579ed7edd121753eeccce82ff82521b4 not found 1e80d9cb579ed7edd121753eeccce82ff82521b4 module: potential uninitialized return in module_kallsyms_on_each_symbol() Checking commit 89ff87494c6e ... upstream fix 5c11720767f70d34357d00a15ba5a0ad052c40fe not found 5c11720767f70d34357d00a15ba5a0ad052c40fe SUNRPC: Fix a NULL pointer deref in trace_svc_stats_latency() Checking commit 5191955d6fc6 ... upstream fix 90bfc37b5ab91c1a6165e3e5cfc49bf04571b762 not found 90bfc37b5ab91c1a6165e3e5cfc49bf04571b762 SUNRPC: Fix svcxdr_init_decode's end-of-buffer calculation upstream fix b9f83ffaa0c096b4c832a43964fe6bff3acffe10 not found b9f83ffaa0c096b4c832a43964fe6bff3acffe10 SUNRPC: Fix null pointer dereference in svc_rqst_free()
I'll look into backporting the missing NFSD and SUNRPC patches.
-- Chuck Lever
On 6/25/24 12:08, Chuck Lever III wrote:
On Jun 25, 2024, at 12:29 PM, Guenter Roeck linux@roeck-us.net wrote:
On 6/25/24 08:13, Chuck Lever III wrote:
Hi -
On Jun 25, 2024, at 11:04 AM, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Tue, Jun 25, 2024 at 07:48:00AM -0700, Guenter Roeck wrote:
On 6/18/24 05:27, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.10.220 release. There are 770 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 20 Jun 2024 12:32:00 +0000. Anything received after that time might be too late.
[ ... ]
Chuck Lever chuck.lever@oracle.com SUNRPC: Prepare for xdr_stream-style decoding on the server-side
The ChromeOS patches robot reports a number of fixes for the patches applied in 5.5.220. This is one example, later fixed with commit 90bfc37b5ab9 ("SUNRPC: Fix svcxdr_init_decode's end-of-buffer calculation"), but there are more. Are those fixes going to be applied in a subsequent release of v5.10.y, was there a reason to not include them, or did they get lost ?
I saw this as well, but when I tried to apply a few, they didn't, so I was guessing that Chuck had merged them together into the series.
I'll defer to Chuck on this, this release was all his :)
I did this port months ago, I've been waiting for the dust to settle on the 6.1 and 5.15 NFSD backports, so I've all but forgotten the status of individual patches. If you (Greg or Guenter) send me a list of what you believe is missing, I can have a look at the individual cases and then run the finished result through our NFSD CI gauntlet.
This is what the robot reported so far:
1242a87da0d8 SUNRPC: Fix svcxdr_init_encode's buflen calculation Fixes: bddfdbcddbe2 ("NFSD: Extract the svcxdr_init_encode() helper") 90bfc37b5ab9 SUNRPC: Fix svcxdr_init_decode's end-of-buffer calculation Fixes: 5191955d6fc6 ("SUNRPC: Prepare for xdr_stream-style decoding on the server-side") 10396f4df8b7 nfsd: hold a lighter-weight client reference over CB_RECALL_ANY Fixes: 44df6f439a17 ("NFSD: add delegation reaper to react to low memory condition")
My naive search found:
Checking commit 44df6f439a17 ... upstream fix 10396f4df8b75ff6ab0aa2cd74296565466f2c8d not found 10396f4df8b75ff6ab0aa2cd74296565466f2c8d nfsd: hold a lighter-weight client reference over CB_RECALL_ANY upstream fix f385f7d244134246f984975ed34cd75f77de479f is already applied Checking commit a2071573d634 ... upstream fix f1aa2eb5ea05ccd1fd92d235346e60e90a1ed949 not found f1aa2eb5ea05ccd1fd92d235346e60e90a1ed949 sysctl: fix proc_dobool() usability Checking commit bddfdbcddbe2 ... upstream fix 1242a87da0d8cd2a428e96ca68e7ea899b0f4624 not found 1242a87da0d8cd2a428e96ca68e7ea899b0f4624 SUNRPC: Fix svcxdr_init_encode's buflen calculation Checking commit 9fe61450972d ... upstream fix 2111c3c0124f7432fe908c036a50abe8733dbf38 not found 2111c3c0124f7432fe908c036a50abe8733dbf38 namei: fix kernel-doc for struct renamedata and more Checking commit 013c1667cf78 ... upstream fix 2c0f0f3639562d6e38ee9705303c6457c4936eac not found 2c0f0f3639562d6e38ee9705303c6457c4936eac module: correctly exit module_kallsyms_on_each_symbol when fn() != 0 upstream fix 1e80d9cb579ed7edd121753eeccce82ff82521b4 not found 1e80d9cb579ed7edd121753eeccce82ff82521b4 module: potential uninitialized return in module_kallsyms_on_each_symbol() Checking commit 89ff87494c6e ... upstream fix 5c11720767f70d34357d00a15ba5a0ad052c40fe not found 5c11720767f70d34357d00a15ba5a0ad052c40fe SUNRPC: Fix a NULL pointer deref in trace_svc_stats_latency() Checking commit 5191955d6fc6 ... upstream fix 90bfc37b5ab91c1a6165e3e5cfc49bf04571b762 not found 90bfc37b5ab91c1a6165e3e5cfc49bf04571b762 SUNRPC: Fix svcxdr_init_decode's end-of-buffer calculation upstream fix b9f83ffaa0c096b4c832a43964fe6bff3acffe10 not found b9f83ffaa0c096b4c832a43964fe6bff3acffe10 SUNRPC: Fix null pointer dereference in svc_rqst_free()
I'll look into backporting the missing NFSD and SUNRPC patches.
My list didn't include patches with conflicts. There are a lot of them. Our robot collects those, but doesn't focus on it. It also doesn't analyze just nfds/SUNRPC patches, but all of them. I started an analysis to list all the fixes with conflicts; so far I found about 100 of them. Three are tagged SUNRPC.
Upstream commit 8e088a20dbe3 ("SUNRPC: add a missing rpc_stat for TCP TLS") upstream: v6.9-rc7 Fixes: 1548036ef120 ("nfs: make the rpc_stat per net namespace") in linux-5.4.y: 19f51adc778f in linux-5.10.y: afdbc21a92a0 in linux-5.15.y: 7ceb89f4016e in linux-6.1.y: 2b7f2d663a96 in linux-6.6.y: 260333221cf0 upstream: v6.9-rc1 Affected branches: linux-5.4.y (conflicts - backport needed) linux-5.10.y (conflicts - backport needed) linux-5.15.y (conflicts - backport needed) linux-6.1.y (conflicts - backport needed) linux-6.6.y (already applied)
Upstream commit aed28b7a2d62 ("SUNRPC: Don't dereference xprt->snd_task if it's a cookie") upstream: v5.17-rc2 Fixes: e26d9972720e ("SUNRPC: Clean up scheduling of autoclose") in linux-5.4.y: 2d6f096476e6 in linux-5.10.y: 2ab569edd883 upstream: v5.15-rc1 Affected branches: linux-5.4.y (conflicts - backport needed) linux-5.10.y (conflicts - backport needed) linux-5.15.y (already applied)
Upstream commit aad41a7d7cf6 ("SUNRPC: Don't leak sockets in xs_local_connect()") upstream: v5.18-rc6 Fixes: f00432063db1 ("SUNRPC: Ensure we flush any closed sockets before xs_xprt_free()") in linux-5.4.y: 2f8f6c393b11 in linux-5.10.y: e68b60ae29de in linux-5.15.y: 54f6834b283d upstream: v5.18-rc2 Affected branches: linux-5.4.y (conflicts - backport needed) linux-5.10.y (conflicts - backport needed) linux-5.15.y (conflicts - backport needed)
I'll send a complete list after the analysis is done.
Guenter
On Jun 25, 2024, at 3:35 PM, Guenter Roeck linux@roeck-us.net wrote:
On 6/25/24 12:08, Chuck Lever III wrote:
On Jun 25, 2024, at 12:29 PM, Guenter Roeck linux@roeck-us.net wrote:
On 6/25/24 08:13, Chuck Lever III wrote:
Hi -
On Jun 25, 2024, at 11:04 AM, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Tue, Jun 25, 2024 at 07:48:00AM -0700, Guenter Roeck wrote:
On 6/18/24 05:27, Greg Kroah-Hartman wrote: > This is the start of the stable review cycle for the 5.10.220 release. > There are 770 patches in this series, all will be posted as a response > to this one. If anyone has any issues with these being applied, please > let me know. > > Responses should be made by Thu, 20 Jun 2024 12:32:00 +0000. > Anything received after that time might be too late. >
[ ... ] > Chuck Lever chuck.lever@oracle.com > SUNRPC: Prepare for xdr_stream-style decoding on the server-side > The ChromeOS patches robot reports a number of fixes for the patches applied in 5.5.220. This is one example, later fixed with commit 90bfc37b5ab9 ("SUNRPC: Fix svcxdr_init_decode's end-of-buffer calculation"), but there are more. Are those fixes going to be applied in a subsequent release of v5.10.y, was there a reason to not include them, or did they get lost ?
I saw this as well, but when I tried to apply a few, they didn't, so I was guessing that Chuck had merged them together into the series.
I'll defer to Chuck on this, this release was all his :)
I did this port months ago, I've been waiting for the dust to settle on the 6.1 and 5.15 NFSD backports, so I've all but forgotten the status of individual patches. If you (Greg or Guenter) send me a list of what you believe is missing, I can have a look at the individual cases and then run the finished result through our NFSD CI gauntlet.
This is what the robot reported so far:
1242a87da0d8 SUNRPC: Fix svcxdr_init_encode's buflen calculation Fixes: bddfdbcddbe2 ("NFSD: Extract the svcxdr_init_encode() helper") 90bfc37b5ab9 SUNRPC: Fix svcxdr_init_decode's end-of-buffer calculation Fixes: 5191955d6fc6 ("SUNRPC: Prepare for xdr_stream-style decoding on the server-side") 10396f4df8b7 nfsd: hold a lighter-weight client reference over CB_RECALL_ANY Fixes: 44df6f439a17 ("NFSD: add delegation reaper to react to low memory condition")
My naive search found: Checking commit 44df6f439a17 ... upstream fix 10396f4df8b75ff6ab0aa2cd74296565466f2c8d not found 10396f4df8b75ff6ab0aa2cd74296565466f2c8d nfsd: hold a lighter-weight client reference over CB_RECALL_ANY upstream fix f385f7d244134246f984975ed34cd75f77de479f is already applied Checking commit a2071573d634 ... upstream fix f1aa2eb5ea05ccd1fd92d235346e60e90a1ed949 not found f1aa2eb5ea05ccd1fd92d235346e60e90a1ed949 sysctl: fix proc_dobool() usability Checking commit bddfdbcddbe2 ... upstream fix 1242a87da0d8cd2a428e96ca68e7ea899b0f4624 not found 1242a87da0d8cd2a428e96ca68e7ea899b0f4624 SUNRPC: Fix svcxdr_init_encode's buflen calculation Checking commit 9fe61450972d ... upstream fix 2111c3c0124f7432fe908c036a50abe8733dbf38 not found 2111c3c0124f7432fe908c036a50abe8733dbf38 namei: fix kernel-doc for struct renamedata and more Checking commit 013c1667cf78 ... upstream fix 2c0f0f3639562d6e38ee9705303c6457c4936eac not found 2c0f0f3639562d6e38ee9705303c6457c4936eac module: correctly exit module_kallsyms_on_each_symbol when fn() != 0 upstream fix 1e80d9cb579ed7edd121753eeccce82ff82521b4 not found 1e80d9cb579ed7edd121753eeccce82ff82521b4 module: potential uninitialized return in module_kallsyms_on_each_symbol() Checking commit 89ff87494c6e ... upstream fix 5c11720767f70d34357d00a15ba5a0ad052c40fe not found 5c11720767f70d34357d00a15ba5a0ad052c40fe SUNRPC: Fix a NULL pointer deref in trace_svc_stats_latency() Checking commit 5191955d6fc6 ... upstream fix 90bfc37b5ab91c1a6165e3e5cfc49bf04571b762 not found 90bfc37b5ab91c1a6165e3e5cfc49bf04571b762 SUNRPC: Fix svcxdr_init_decode's end-of-buffer calculation upstream fix b9f83ffaa0c096b4c832a43964fe6bff3acffe10 not found b9f83ffaa0c096b4c832a43964fe6bff3acffe10 SUNRPC: Fix null pointer dereference in svc_rqst_free() I'll look into backporting the missing NFSD and SUNRPC patches.
My list didn't include patches with conflicts. There are a lot of them. Our robot collects those, but doesn't focus on it. It also doesn't analyze just nfds/SUNRPC patches, but all of them. I started an analysis to list all the fixes with conflicts; so far I found about 100 of them. Three are tagged SUNRPC.
Upstream commit 8e088a20dbe3 ("SUNRPC: add a missing rpc_stat for TCP TLS") upstream: v6.9-rc7 Fixes: 1548036ef120 ("nfs: make the rpc_stat per net namespace") in linux-5.4.y: 19f51adc778f in linux-5.10.y: afdbc21a92a0 in linux-5.15.y: 7ceb89f4016e in linux-6.1.y: 2b7f2d663a96 in linux-6.6.y: 260333221cf0 upstream: v6.9-rc1 Affected branches: linux-5.4.y (conflicts - backport needed) linux-5.10.y (conflicts - backport needed) linux-5.15.y (conflicts - backport needed) linux-6.1.y (conflicts - backport needed) linux-6.6.y (already applied)
Upstream commit aed28b7a2d62 ("SUNRPC: Don't dereference xprt->snd_task if it's a cookie") upstream: v5.17-rc2 Fixes: e26d9972720e ("SUNRPC: Clean up scheduling of autoclose") in linux-5.4.y: 2d6f096476e6 in linux-5.10.y: 2ab569edd883 upstream: v5.15-rc1 Affected branches: linux-5.4.y (conflicts - backport needed) linux-5.10.y (conflicts - backport needed) linux-5.15.y (already applied)
Upstream commit aad41a7d7cf6 ("SUNRPC: Don't leak sockets in xs_local_connect()") upstream: v5.18-rc6 Fixes: f00432063db1 ("SUNRPC: Ensure we flush any closed sockets before xs_xprt_free()") in linux-5.4.y: 2f8f6c393b11 in linux-5.10.y: e68b60ae29de in linux-5.15.y: 54f6834b283d upstream: v5.18-rc2 Affected branches: linux-5.4.y (conflicts - backport needed) linux-5.10.y (conflicts - backport needed) linux-5.15.y (conflicts - backport needed)
I'll send a complete list after the analysis is done.
The "NFSD file cache fixes" backports focused on NFSD, not on SUNRPC, and only the NFS server side of affairs. The missing fixes you found are outside of one or both of those areas, so they can go through the usual stable backport process if the NFS client folks care to do that.
-- Chuck Lever
On Jun 25, 2024, at 3:40 PM, Chuck Lever III chuck.lever@oracle.com wrote:
On Jun 25, 2024, at 3:35 PM, Guenter Roeck linux@roeck-us.net wrote:
On 6/25/24 12:08, Chuck Lever III wrote:
On Jun 25, 2024, at 12:29 PM, Guenter Roeck linux@roeck-us.net wrote:
On 6/25/24 08:13, Chuck Lever III wrote:
Hi -
On Jun 25, 2024, at 11:04 AM, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Tue, Jun 25, 2024 at 07:48:00AM -0700, Guenter Roeck wrote: > On 6/18/24 05:27, Greg Kroah-Hartman wrote: >> This is the start of the stable review cycle for the 5.10.220 release. >> There are 770 patches in this series, all will be posted as a response >> to this one. If anyone has any issues with these being applied, please >> let me know. >> >> Responses should be made by Thu, 20 Jun 2024 12:32:00 +0000. >> Anything received after that time might be too late. >> > > [ ... ] >> Chuck Lever chuck.lever@oracle.com >> SUNRPC: Prepare for xdr_stream-style decoding on the server-side >> > The ChromeOS patches robot reports a number of fixes for the patches > applied in 5.5.220. This is one example, later fixed with commit > 90bfc37b5ab9 ("SUNRPC: Fix svcxdr_init_decode's end-of-buffer > calculation"), but there are more. Are those fixes going to be > applied in a subsequent release of v5.10.y, was there a reason to > not include them, or did they get lost ?
I saw this as well, but when I tried to apply a few, they didn't, so I was guessing that Chuck had merged them together into the series.
I'll defer to Chuck on this, this release was all his :)
I did this port months ago, I've been waiting for the dust to settle on the 6.1 and 5.15 NFSD backports, so I've all but forgotten the status of individual patches. If you (Greg or Guenter) send me a list of what you believe is missing, I can have a look at the individual cases and then run the finished result through our NFSD CI gauntlet.
This is what the robot reported so far:
1242a87da0d8 SUNRPC: Fix svcxdr_init_encode's buflen calculation Fixes: bddfdbcddbe2 ("NFSD: Extract the svcxdr_init_encode() helper") 90bfc37b5ab9 SUNRPC: Fix svcxdr_init_decode's end-of-buffer calculation Fixes: 5191955d6fc6 ("SUNRPC: Prepare for xdr_stream-style decoding on the server-side") 10396f4df8b7 nfsd: hold a lighter-weight client reference over CB_RECALL_ANY Fixes: 44df6f439a17 ("NFSD: add delegation reaper to react to low memory condition")
My naive search found: Checking commit 44df6f439a17 ... upstream fix 10396f4df8b75ff6ab0aa2cd74296565466f2c8d not found 10396f4df8b75ff6ab0aa2cd74296565466f2c8d nfsd: hold a lighter-weight client reference over CB_RECALL_ANY upstream fix f385f7d244134246f984975ed34cd75f77de479f is already applied Checking commit a2071573d634 ... upstream fix f1aa2eb5ea05ccd1fd92d235346e60e90a1ed949 not found f1aa2eb5ea05ccd1fd92d235346e60e90a1ed949 sysctl: fix proc_dobool() usability Checking commit bddfdbcddbe2 ... upstream fix 1242a87da0d8cd2a428e96ca68e7ea899b0f4624 not found 1242a87da0d8cd2a428e96ca68e7ea899b0f4624 SUNRPC: Fix svcxdr_init_encode's buflen calculation Checking commit 9fe61450972d ... upstream fix 2111c3c0124f7432fe908c036a50abe8733dbf38 not found 2111c3c0124f7432fe908c036a50abe8733dbf38 namei: fix kernel-doc for struct renamedata and more Checking commit 013c1667cf78 ... upstream fix 2c0f0f3639562d6e38ee9705303c6457c4936eac not found 2c0f0f3639562d6e38ee9705303c6457c4936eac module: correctly exit module_kallsyms_on_each_symbol when fn() != 0 upstream fix 1e80d9cb579ed7edd121753eeccce82ff82521b4 not found 1e80d9cb579ed7edd121753eeccce82ff82521b4 module: potential uninitialized return in module_kallsyms_on_each_symbol() Checking commit 89ff87494c6e ... upstream fix 5c11720767f70d34357d00a15ba5a0ad052c40fe not found 5c11720767f70d34357d00a15ba5a0ad052c40fe SUNRPC: Fix a NULL pointer deref in trace_svc_stats_latency() Checking commit 5191955d6fc6 ... upstream fix 90bfc37b5ab91c1a6165e3e5cfc49bf04571b762 not found 90bfc37b5ab91c1a6165e3e5cfc49bf04571b762 SUNRPC: Fix svcxdr_init_decode's end-of-buffer calculation upstream fix b9f83ffaa0c096b4c832a43964fe6bff3acffe10 not found b9f83ffaa0c096b4c832a43964fe6bff3acffe10 SUNRPC: Fix null pointer dereference in svc_rqst_free() I'll look into backporting the missing NFSD and SUNRPC patches.
My list didn't include patches with conflicts. There are a lot of them. Our robot collects those, but doesn't focus on it. It also doesn't analyze just nfds/SUNRPC patches, but all of them. I started an analysis to list all the fixes with conflicts; so far I found about 100 of them. Three are tagged SUNRPC.
Upstream commit 8e088a20dbe3 ("SUNRPC: add a missing rpc_stat for TCP TLS") upstream: v6.9-rc7 Fixes: 1548036ef120 ("nfs: make the rpc_stat per net namespace") in linux-5.4.y: 19f51adc778f in linux-5.10.y: afdbc21a92a0 in linux-5.15.y: 7ceb89f4016e in linux-6.1.y: 2b7f2d663a96 in linux-6.6.y: 260333221cf0 upstream: v6.9-rc1 Affected branches: linux-5.4.y (conflicts - backport needed) linux-5.10.y (conflicts - backport needed) linux-5.15.y (conflicts - backport needed) linux-6.1.y (conflicts - backport needed) linux-6.6.y (already applied)
Upstream commit aed28b7a2d62 ("SUNRPC: Don't dereference xprt->snd_task if it's a cookie") upstream: v5.17-rc2 Fixes: e26d9972720e ("SUNRPC: Clean up scheduling of autoclose") in linux-5.4.y: 2d6f096476e6 in linux-5.10.y: 2ab569edd883 upstream: v5.15-rc1 Affected branches: linux-5.4.y (conflicts - backport needed) linux-5.10.y (conflicts - backport needed) linux-5.15.y (already applied)
Upstream commit aad41a7d7cf6 ("SUNRPC: Don't leak sockets in xs_local_connect()") upstream: v5.18-rc6 Fixes: f00432063db1 ("SUNRPC: Ensure we flush any closed sockets before xs_xprt_free()") in linux-5.4.y: 2f8f6c393b11 in linux-5.10.y: e68b60ae29de in linux-5.15.y: 54f6834b283d upstream: v5.18-rc2 Affected branches: linux-5.4.y (conflicts - backport needed) linux-5.10.y (conflicts - backport needed) linux-5.15.y (conflicts - backport needed)
I'll send a complete list after the analysis is done.
The "NFSD file cache fixes" backports focused on NFSD, not on SUNRPC, and only the NFS server side of affairs. The missing fixes you found are outside of one or both of those areas, so they can go through the usual stable backport process if the NFS client folks care to do that.
Or, to cut this another way: I looked at only the patches that I submitted for v5.10.220; I will take responsibility for ensuring those all have the latest upstream fixes applied, where that is feasible.
Anything else (other subsystems, other LTS kernels) gets the normal stable backport treatment: those subsystem maintainers or their designees have to step up to handle the code work and testing if they view the fix as a priority.
-- Chuck Lever
On 6/25/24 12:49, Chuck Lever III wrote:
On Jun 25, 2024, at 3:40 PM, Chuck Lever III chuck.lever@oracle.com wrote:
On Jun 25, 2024, at 3:35 PM, Guenter Roeck linux@roeck-us.net wrote:
On 6/25/24 12:08, Chuck Lever III wrote:
On Jun 25, 2024, at 12:29 PM, Guenter Roeck linux@roeck-us.net wrote:
On 6/25/24 08:13, Chuck Lever III wrote:
Hi - > On Jun 25, 2024, at 11:04 AM, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote: > > On Tue, Jun 25, 2024 at 07:48:00AM -0700, Guenter Roeck wrote: >> On 6/18/24 05:27, Greg Kroah-Hartman wrote: >>> This is the start of the stable review cycle for the 5.10.220 release. >>> There are 770 patches in this series, all will be posted as a response >>> to this one. If anyone has any issues with these being applied, please >>> let me know. >>> >>> Responses should be made by Thu, 20 Jun 2024 12:32:00 +0000. >>> Anything received after that time might be too late. >>> >> >> [ ... ] >>> Chuck Lever chuck.lever@oracle.com >>> SUNRPC: Prepare for xdr_stream-style decoding on the server-side >>> >> The ChromeOS patches robot reports a number of fixes for the patches >> applied in 5.5.220. This is one example, later fixed with commit >> 90bfc37b5ab9 ("SUNRPC: Fix svcxdr_init_decode's end-of-buffer >> calculation"), but there are more. Are those fixes going to be >> applied in a subsequent release of v5.10.y, was there a reason to >> not include them, or did they get lost ? > > I saw this as well, but when I tried to apply a few, they didn't, so I > was guessing that Chuck had merged them together into the series. > > I'll defer to Chuck on this, this release was all his :) I did this port months ago, I've been waiting for the dust to settle on the 6.1 and 5.15 NFSD backports, so I've all but forgotten the status of individual patches. If you (Greg or Guenter) send me a list of what you believe is missing, I can have a look at the individual cases and then run the finished result through our NFSD CI gauntlet.
This is what the robot reported so far:
1242a87da0d8 SUNRPC: Fix svcxdr_init_encode's buflen calculation Fixes: bddfdbcddbe2 ("NFSD: Extract the svcxdr_init_encode() helper") 90bfc37b5ab9 SUNRPC: Fix svcxdr_init_decode's end-of-buffer calculation Fixes: 5191955d6fc6 ("SUNRPC: Prepare for xdr_stream-style decoding on the server-side") 10396f4df8b7 nfsd: hold a lighter-weight client reference over CB_RECALL_ANY Fixes: 44df6f439a17 ("NFSD: add delegation reaper to react to low memory condition")
My naive search found: Checking commit 44df6f439a17 ... upstream fix 10396f4df8b75ff6ab0aa2cd74296565466f2c8d not found 10396f4df8b75ff6ab0aa2cd74296565466f2c8d nfsd: hold a lighter-weight client reference over CB_RECALL_ANY upstream fix f385f7d244134246f984975ed34cd75f77de479f is already applied Checking commit a2071573d634 ... upstream fix f1aa2eb5ea05ccd1fd92d235346e60e90a1ed949 not found f1aa2eb5ea05ccd1fd92d235346e60e90a1ed949 sysctl: fix proc_dobool() usability Checking commit bddfdbcddbe2 ... upstream fix 1242a87da0d8cd2a428e96ca68e7ea899b0f4624 not found 1242a87da0d8cd2a428e96ca68e7ea899b0f4624 SUNRPC: Fix svcxdr_init_encode's buflen calculation Checking commit 9fe61450972d ... upstream fix 2111c3c0124f7432fe908c036a50abe8733dbf38 not found 2111c3c0124f7432fe908c036a50abe8733dbf38 namei: fix kernel-doc for struct renamedata and more Checking commit 013c1667cf78 ... upstream fix 2c0f0f3639562d6e38ee9705303c6457c4936eac not found 2c0f0f3639562d6e38ee9705303c6457c4936eac module: correctly exit module_kallsyms_on_each_symbol when fn() != 0 upstream fix 1e80d9cb579ed7edd121753eeccce82ff82521b4 not found 1e80d9cb579ed7edd121753eeccce82ff82521b4 module: potential uninitialized return in module_kallsyms_on_each_symbol() Checking commit 89ff87494c6e ... upstream fix 5c11720767f70d34357d00a15ba5a0ad052c40fe not found 5c11720767f70d34357d00a15ba5a0ad052c40fe SUNRPC: Fix a NULL pointer deref in trace_svc_stats_latency() Checking commit 5191955d6fc6 ... upstream fix 90bfc37b5ab91c1a6165e3e5cfc49bf04571b762 not found 90bfc37b5ab91c1a6165e3e5cfc49bf04571b762 SUNRPC: Fix svcxdr_init_decode's end-of-buffer calculation upstream fix b9f83ffaa0c096b4c832a43964fe6bff3acffe10 not found b9f83ffaa0c096b4c832a43964fe6bff3acffe10 SUNRPC: Fix null pointer dereference in svc_rqst_free() I'll look into backporting the missing NFSD and SUNRPC patches.
My list didn't include patches with conflicts. There are a lot of them. Our robot collects those, but doesn't focus on it. It also doesn't analyze just nfds/SUNRPC patches, but all of them. I started an analysis to list all the fixes with conflicts; so far I found about 100 of them. Three are tagged SUNRPC.
Upstream commit 8e088a20dbe3 ("SUNRPC: add a missing rpc_stat for TCP TLS") upstream: v6.9-rc7 Fixes: 1548036ef120 ("nfs: make the rpc_stat per net namespace") in linux-5.4.y: 19f51adc778f in linux-5.10.y: afdbc21a92a0 in linux-5.15.y: 7ceb89f4016e in linux-6.1.y: 2b7f2d663a96 in linux-6.6.y: 260333221cf0 upstream: v6.9-rc1 Affected branches: linux-5.4.y (conflicts - backport needed) linux-5.10.y (conflicts - backport needed) linux-5.15.y (conflicts - backport needed) linux-6.1.y (conflicts - backport needed) linux-6.6.y (already applied)
Upstream commit aed28b7a2d62 ("SUNRPC: Don't dereference xprt->snd_task if it's a cookie") upstream: v5.17-rc2 Fixes: e26d9972720e ("SUNRPC: Clean up scheduling of autoclose") in linux-5.4.y: 2d6f096476e6 in linux-5.10.y: 2ab569edd883 upstream: v5.15-rc1 Affected branches: linux-5.4.y (conflicts - backport needed) linux-5.10.y (conflicts - backport needed) linux-5.15.y (already applied)
Upstream commit aad41a7d7cf6 ("SUNRPC: Don't leak sockets in xs_local_connect()") upstream: v5.18-rc6 Fixes: f00432063db1 ("SUNRPC: Ensure we flush any closed sockets before xs_xprt_free()") in linux-5.4.y: 2f8f6c393b11 in linux-5.10.y: e68b60ae29de in linux-5.15.y: 54f6834b283d upstream: v5.18-rc2 Affected branches: linux-5.4.y (conflicts - backport needed) linux-5.10.y (conflicts - backport needed) linux-5.15.y (conflicts - backport needed)
I'll send a complete list after the analysis is done.
The "NFSD file cache fixes" backports focused on NFSD, not on SUNRPC, and only the NFS server side of affairs. The missing fixes you found are outside of one or both of those areas, so they can go through the usual stable backport process if the NFS client folks care to do that.
Or, to cut this another way: I looked at only the patches that I submitted for v5.10.220; I will take responsibility for ensuring those all have the latest upstream fixes applied, where that is feasible.
Anything else (other subsystems, other LTS kernels) gets the normal stable backport treatment: those subsystem maintainers or their designees have to step up to handle the code work and testing if they view the fix as a priority.
Sounds good to me.
Thanks, Guenter
linux-stable-mirror@lists.linaro.org