linux-stable-mirror

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2026-04-03 12:05:13 +02:00

Author	SHA1	Message	Date
Kuniyuki Iwashima	4117071642	nfsd: Fix cred ref leak in nfsd_nl_threads_set_doit(). commit `1cb968a201` upstream. syzbot reported memory leak of struct cred. [0] nfsd_nl_threads_set_doit() passes get_current_cred() to nfsd_svc(), but put_cred() is not called after that. The cred is finally passed down to _svc_xprt_create(), which calls get_cred() with the cred for struct svc_xprt. The ownership of the refcount by get_current_cred() is not transferred to anywhere and is just leaked. nfsd_svc() is also called from write_threads(), but it does not bump file->f_cred there. nfsd_nl_threads_set_doit() is called from sendmsg() and current->cred does not go away. Let's use current_cred() in nfsd_nl_threads_set_doit(). [0]: BUG: memory leak unreferenced object 0xffff888108b89480 (size 184): comm "syz-executor", pid 5994, jiffies 4294943386 hex dump (first 32 bytes): 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 369454a7): kmemleak_alloc_recursive include/linux/kmemleak.h:44 [inline] slab_post_alloc_hook mm/slub.c:4958 [inline] slab_alloc_node mm/slub.c:5263 [inline] kmem_cache_alloc_noprof+0x412/0x580 mm/slub.c:5270 prepare_creds+0x22/0x600 kernel/cred.c:185 copy_creds+0x44/0x290 kernel/cred.c:286 copy_process+0x7a7/0x2870 kernel/fork.c:2086 kernel_clone+0xac/0x6e0 kernel/fork.c:2651 __do_sys_clone+0x7f/0xb0 kernel/fork.c:2792 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xa4/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: `924f4fb003` ("NFSD: convert write_threads to netlink command") Cc: stable@vger.kernel.org Reported-by: syzbot+dd3b43aa0204089217ee@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/69744674.a00a0220.33ccc7.0000.GAE@google.com/ Tested-by: syzbot+dd3b43aa0204089217ee@syzkaller.appspotmail.com Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-03-13 17:20:35 +01:00
Anthony Iliopoulos	982815bb04	nfsd: fix return error code for nfsd_map_name_to_[ug]id [ Upstream commit `404d779466` ] idmap lookups can time out while the cache is waiting for a userspace upcall reply. In that case cache_check() returns -ETIMEDOUT to callers. The nfsd_map_name_to_[ug]id functions currently proceed with attempting to map the id to a kuid despite a potentially temporary failure to perform the idmap lookup. This results in the code returning the error NFSERR_BADOWNER which can cause client operations to return to userspace with failure. Fix this by returning the failure status before attempting kuid mapping. This will return NFSERR_JUKEBOX on idmap lookup timeout so that clients can retry the operation instead of aborting it. Fixes: `65e10f6d0a` ("nfsd: Convert idmap to use kuids and kgids") Cc: stable@vger.kernel.org Signed-off-by: Anthony Iliopoulos <ailiop@suse.com> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-04 07:21:49 -05:00
Anthony Iliopoulos	063a6f2247	nfsd: never defer requests during idmap lookup [ Upstream commit `f9c206cdc4` ] During v4 request compound arg decoding, some ops (e.g. SETATTR) can trigger idmap lookup upcalls. When those upcall responses get delayed beyond the allowed time limit, cache_check() will mark the request for deferral and cause it to be dropped. This prevents nfs4svc_encode_compoundres from being executed, and thus the session slot flag NFSD4_SLOT_INUSE never gets cleared. Subsequent client requests will fail with NFSERR_JUKEBOX, given that the slot will be marked as in-use, making the SEQUENCE op fail. Fix this by making sure that the RQ_USEDEFERRAL flag is always clear during nfs4svc_decode_compoundargs(), since no v4 request should ever be deferred. Fixes: `2f425878b6` ("nfsd: don't use the deferral service, return NFS4ERR_DELAY") Signed-off-by: Anthony Iliopoulos <ailiop@suse.com> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-04 07:20:05 -05:00
Chuck Lever	25ad5d6913	NFS: NFSERR_INVAL is not defined by NFSv2 [ Upstream commit `0ac903d1bf` ] A documenting comment in include/uapi/linux/nfs.h claims incorrectly that NFSv2 defines NFSERR_INVAL. There is no such definition in either RFC 1094 or https://pubs.opengroup.org/onlinepubs/9629799/chap7.htm NFS3ERR_INVAL is introduced in RFC 1813. NFSD returns NFSERR_INVAL for PROC_GETACL, which has no specification (yet). However, nfsd_map_status() maps nfserr_symlink and nfserr_wrong_type to nfserr_inval, which does not align with RFC 1094. This logic was introduced only recently by commit `438f81e0e9` ("nfsd: move error choice for incorrect object types to version-specific code."). Given that we have no INVAL or SERVERFAULT status in NFSv2, probably the only choice is NFSERR_IO. Fixes: `438f81e0e9` ("nfsd: move error choice for incorrect object types to version-specific code.") Reviewed-by: NeilBrown <neil@brown.name> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-04 07:20:05 -05:00
Chuck Lever	3f5d7f3865	NFSD: Remove NFSERR_EAGAIN commit `c6c209ceb8` upstream. I haven't found an NFSERR_EAGAIN in RFCs 1094, 1813, 7530, or 8881. None of these RFCs have an NFS status code that match the numeric value "11". Based on the meaning of the EAGAIN errno, I presume the use of this status in NFSD means NFS4ERR_DELAY. So replace the one usage of nfserr_eagain, and remove it from NFSD's NFS status conversion tables. As far as I can tell, NFSERR_EAGAIN has existed since the pre-git era, but was not actually used by any code until commit `f4e44b3933` ("NFSD: delay unmount source's export after inter-server copy completed."), at which time it become possible for NFSD to return a status code of 11 (which is not valid NFS protocol). Fixes: `f4e44b3933` ("NFSD: delay unmount source's export after inter-server copy completed.") Cc: stable@vger.kernel.org Reviewed-by: NeilBrown <neil@brown.name> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-01-17 16:31:15 +01:00
Edward Adam Davis	8c1cf63ed4	NFSD: net ref data still needs to be freed even if net hasn't startup commit `0b88bfa42e` upstream. When the NFSD instance doesn't to startup, the net ref data memory is not properly reclaimed, which triggers the memory leak issue reported by syzbot [1]. To avoid the problem reported in [1], the net ref data memory reclamation action is moved outside of nfsd_net_up when the net is shutdown. [1] unreferenced object 0xffff88812a39dfc0 (size 64): backtrace (crc a2262fc6): percpu_ref_init+0x94/0x1e0 lib/percpu-refcount.c:76 nfsd_create_serv+0xbe/0x260 fs/nfsd/nfssvc.c:605 nfsd_nl_listener_set_doit+0x62/0xb00 fs/nfsd/nfsctl.c:1882 genl_family_rcv_msg_doit+0x11e/0x190 net/netlink/genetlink.c:1115 genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline] genl_rcv_msg+0x2fd/0x440 net/netlink/genetlink.c:1210 BUG: memory leak Reported-by: syzbot+6ee3b889bdeada0a6226@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=6ee3b889bdeada0a6226 Fixes: `39972494e3` ("nfsd: update percpu_ref to manage references on nfsd_net") Cc: stable@vger.kernel.org Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-01-17 16:31:15 +01:00
Olga Kornievskaia	d95499900f	nfsd: check that server is running in unlock_filesystem commit `d0424066fc` upstream. If we are trying to unlock the filesystem via an administrative interface and nfsd isn't running, it crashes the server. This happens currently because nfsd4_revoke_states() access state structures (eg., conf_id_hashtbl) that has been freed as a part of the server shutdown. [ 59.465072] Call trace: [ 59.465308] nfsd4_revoke_states+0x1b4/0x898 [nfsd] (P) [ 59.465830] write_unlock_fs+0x258/0x440 [nfsd] [ 59.466278] nfsctl_transaction_write+0xb0/0x120 [nfsd] [ 59.466780] vfs_write+0x1f0/0x938 [ 59.467088] ksys_write+0xfc/0x1f8 [ 59.467395] __arm64_sys_write+0x74/0xb8 [ 59.467746] invoke_syscall.constprop.0+0xdc/0x1e8 [ 59.468177] do_el0_svc+0x154/0x1d8 [ 59.468489] el0_svc+0x40/0xe0 [ 59.468767] el0t_64_sync_handler+0xa0/0xe8 [ 59.469138] el0t_64_sync+0x1ac/0x1b0 Ensure this can't happen by taking the nfsd_mutex and checking that the server is still up, and then holding the mutex across the call to nfsd4_revoke_states(). Reviewed-by: NeilBrown <neil@brown.name> Reviewed-by: Jeff Layton <jlayton@kernel.org> Fixes: `1ac3629bf0` ("nfsd: prepare for supporting admin-revocation of state") Cc: stable@vger.kernel.org Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-01-17 16:31:15 +01:00
NeilBrown	03c68f94fa	nfsd: use correct loop termination in nfsd4_revoke_states() commit `fb321998de` upstream. The loop in nfsd4_revoke_states() stops one too early because the end value given is CLIENT_HASH_MASK where it should be CLIENT_HASH_SIZE. This means that an admin request to drop all locks for a filesystem will miss locks held by clients which hash to the maximum possible hash value. Fixes: `1ac3629bf0` ("nfsd: prepare for supporting admin-revocation of state") Cc: stable@vger.kernel.org Signed-off-by: NeilBrown <neil@brown.name> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-01-17 16:31:15 +01:00
NeilBrown	ba4811c8b4	nfsd: provide locking for v4_end_grace commit `2857bd59fe` upstream. Writing to v4_end_grace can race with server shutdown and result in memory being accessed after it was freed - reclaim_str_hashtbl in particularly. We cannot hold nfsd_mutex across the nfsd4_end_grace() call as that is held while client_tracking_op->init() is called and that can wait for an upcall to nfsdcltrack which can write to v4_end_grace, resulting in a deadlock. nfsd4_end_grace() is also called by the landromat work queue and this doesn't require locking as server shutdown will stop the work and wait for it before freeing anything that nfsd4_end_grace() might access. However, we must be sure that writing to v4_end_grace doesn't restart the work item after shutdown has already waited for it. For this we add a new flag protected with nn->client_lock. It is set only while it is safe to make client tracking calls, and v4_end_grace only schedules work while the flag is set with the spinlock held. So this patch adds a nfsd_net field "client_tracking_active" which is set as described. Another field "grace_end_forced", is set when v4_end_grace is written. After this is set, and providing client_tracking_active is set, the laundromat is scheduled. This "grace_end_forced" field bypasses other checks for whether the grace period has finished. This resolves a race which can result in use-after-free. Reported-by: Li Lingfeng <lilingfeng3@huawei.com> Closes: https://lore.kernel.org/linux-nfs/20250623030015.2353515-1-neil@brown.name/T/#t Fixes: `7f5ef2e900` ("nfsd: add a v4_end_grace file to /proc/fs/nfsd") Cc: stable@vger.kernel.org Signed-off-by: NeilBrown <neil@brown.name> Tested-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-01-17 16:31:15 +01:00
Scott Mayhew	6b7ad17f4d	NFSD: Fix permission check for read access to executable-only files commit `e901c7fce5` upstream. Commit `abc02e5602` ("NFSD: Support write delegations in LAYOUTGET") added NFSD_MAY_OWNER_OVERRIDE to the access flags passed from nfsd4_layoutget() to fh_verify(). This causes LAYOUTGET to fail for executable-only files, and causes xfstests generic/126 to fail on pNFS SCSI. To allow read access to executable-only files, what we really want is: 1. The "permissions" portion of the access flags (the lower 6 bits) must be exactly NFSD_MAY_READ 2. The "hints" portion of the access flags (the upper 26 bits) can contain any combination of NFSD_MAY_OWNER_OVERRIDE and NFSD_MAY_READ_IF_EXEC Fixes: `abc02e5602` ("NFSD: Support write delegations in LAYOUTGET") Cc: stable@vger.kernel.org # v6.6+ Signed-off-by: Scott Mayhew <smayhew@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-01-17 16:31:14 +01:00
Haoxiang Li	cb0255265f	nfsd: Drop the client reference in client_states_open() commit `1f941b2c23` upstream. In error path, call drop_client() to drop the reference obtained by get_nfsdfs_clp(). Fixes: `78599c42ae` ("nfsd4: add file to display list of client's opens") Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Haoxiang Li <lihaoxiang@isrc.iscas.ac.cn> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-01-08 10:14:51 +01:00
Andy Shevchenko	0dacf9ce79	nfsd: Mark variable __maybe_unused to avoid W=1 build break commit `ebae102897` upstream. Clang is not happy about set but (in some cases) unused variable: fs/nfsd/export.c:1027:17: error: variable 'inode' set but not used [-Werror,-Wunused-but-set-variable] since it's used as a parameter to dprintk() which might be configured a no-op. To avoid uglifying code with the specific ifdeffery just mark the variable __maybe_unused. The commit [1], which introduced this behaviour, is quite old and hence the Fixes tag points to the first of the Git era. Link: https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=0431923fb7a1 [1] Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-01-08 10:14:24 +01:00
Chuck Lever	bf4e671c65	NFSD: NFSv4 file creation neglects setting ACL commit `913f7cf77b` upstream. An NFSv4 client that sets an ACL with a named principal during file creation retrieves the ACL afterwards, and finds that it is only a default ACL (based on the mode bits) and not the ACL that was requested during file creation. This violates RFC 8881 section 6.4.1.3: "the ACL attribute is set as given". The issue occurs in nfsd_create_setattr(), which calls nfsd_attrs_valid() to determine whether to call nfsd_setattr(). However, nfsd_attrs_valid() checks only for iattr changes and security labels, but not POSIX ACLs. When only an ACL is present, the function returns false, nfsd_setattr() is skipped, and the POSIX ACL is never applied to the inode. Subsequently, when the client retrieves the ACL, the server finds no POSIX ACL on the inode and returns one generated from the file's mode bits rather than returning the originally-specified ACL. Reported-by: Aurélien Couderc <aurelien.couderc2002@gmail.com> Fixes: `c0cbe70742` ("NFSD: add posix ACLs to struct nfsd_attrs") Cc: Roland Mainz <roland.mainz@nrubsig.org> Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-01-08 10:14:24 +01:00
Chuck Lever	fcb8d118e1	NFSD: Clear SECLABEL in the suppattr_exclcreat bitmap commit `27d17641ca` upstream. >From RFC 8881: 5.8.1.14. Attribute 75: suppattr_exclcreat > The bit vector that would set all REQUIRED and RECOMMENDED > attributes that are supported by the EXCLUSIVE4_1 method of file > creation via the OPEN operation. The scope of this attribute > applies to all objects with a matching fsid. There's nothing in RFC 8881 that states that suppattr_exclcreat is or is not allowed to contain bits for attributes that are clear in the reported supported_attrs bitmask. But it doesn't make sense for an NFS server to indicate that it /doesn't/ implement an attribute, but then also indicate that clients /are/ allowed to set that attribute using OPEN(create) with EXCLUSIVE4_1. Ensure that the SECURITY_LABEL and ACL bits are not set in the suppattr_exclcreat bitmask when they are also not set in the supported_attrs bitmask. Fixes: `8c18f2052e` ("nfsd41: SUPPATTR_EXCLCREAT attribute") Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-01-08 10:14:24 +01:00
Dai Ngo	d083571404	NFSD: use correct reservation type in nfsd4_scsi_fence_client commit `6f52063db9` upstream. The reservation type argument for the pr_preempt call should match the one used in nfsd4_block_get_device_info_scsi. Fixes: `f99d4fbdae` ("nfsd: add SCSI layout support") Cc: stable@vger.kernel.org Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-01-08 10:14:19 +01:00
Shardul Bankar	7ced0c0779	nfsd: fix memory leak in nfsd_create_serv error paths [ Upstream commit `df8d829bba` ] When nfsd_create_serv() calls percpu_ref_init() to initialize nn->nfsd_net_ref, it allocates both a percpu reference counter and a percpu_ref_data structure (64 bytes). However, if the function fails later due to svc_create_pooled() returning NULL or svc_bind() returning an error, these allocations are not cleaned up, resulting in a memory leak. The leak manifests as: - Unreferenced percpu allocation (8 bytes per CPU) - Unreferenced percpu_ref_data structure (64 bytes) Fix this by adding percpu_ref_exit() calls in both error paths to properly clean up the percpu_ref_init() allocations. This patch fixes the percpu_ref leak in nfsd_create_serv() seen as an auxiliary leak in syzbot report 099461f8558eb0a1f4f3; the prepare_creds() and vsock-related leaks in the same report remain to be addressed separately. Reported-by: syzbot+099461f8558eb0a1f4f3@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=099461f8558eb0a1f4f3 Fixes: `47e988147f` ("nfsd: add nfsd_serv_try_get and nfsd_serv_put") Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-08 10:14:09 +01:00
Mike Snitzer	43c1b51442	nfsd: rename nfsd_serv_ prefixed methods and variables with nfsd_net_ [ Upstream commit `b33f7dec3a` ] Also update Documentation/filesystems/nfs/localio.rst accordingly and reduce the technical documentation debt that was previously captured in that document. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Stable-dep-of: `df8d829bba` ("nfsd: fix memory leak in nfsd_create_serv error paths") Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-08 10:14:09 +01:00
Mike Snitzer	6abdb63db7	nfsd: update percpu_ref to manage references on nfsd_net [ Upstream commit `39972494e3` ] Holding a reference on nfsd_net is what is required, it was never actually about ensuring nn->nfsd_serv available. Move waiting for outstanding percpu references from nfsd_destroy_serv() to nfsd_shutdown_net(). By moving it later it will be possible to invalidate localio clients during nfsd_file_cache_shutdown_net() via __nfsd_file_cache_purge(). Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Stable-dep-of: `df8d829bba` ("nfsd: fix memory leak in nfsd_create_serv error paths") Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-08 10:14:08 +01:00
Sergey Bashirov	bbc709db0c	NFSD/blocklayout: Fix minlength check in proc_layoutget [ Upstream commit `3524b021b0` ] The extent returned by the file system may have a smaller offset than the segment offset requested by the client. In this case, the minimum segment length must be checked against the requested range. Otherwise, the client may not be able to continue the read/write operation. Fixes: `8650b8a058` ("nfsd: pNFS block layout driver") Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-18 13:55:00 +01:00
NeilBrown	6326eb4c58	nfsd: Replace clamp_t in nfsd4_get_drc_mem() A recent change to clamp_t() in 6.1.y caused fs/nfsd/nfs4state.c to fail to compile with gcc-9. The code in nfsd4_get_drc_mem() was written with the assumption that when "max < min", clamp(val, min, max) would return max. This assumption is not documented as an API promise and the change caused a compile failure if it could be statically determined that "max < min". The relevant code was no longer present upstream when commit `1519fbc883` ("minmax.h: use BUILD_BUG_ON_MSG() for the lo < hi test in clamp()") landed there, so there is no upstream change to nfsd4_get_drc_mem() to backport. There is no clear case that the existing code in nfsd4_get_drc_mem() is functioning incorrectly. The goal of this patch is to permit the clean application of commit `1519fbc883` ("minmax.h: use BUILD_BUG_ON_MSG() for the lo < hi test in clamp()"), and any commits that depend on it, to LTS kernels without affecting the ability to compile those kernels. This is done by open-coding the __clamp() macro sans the built-in type checking. Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220745#c0 Signed-off-by: NeilBrown <neil@brown.name> Stable-dep-of: `1519fbc883` ("minmax.h: use BUILD_BUG_ON_MSG() for the lo < hi test in clamp()") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed_by: David Laight <david.laight.linux@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-12-07 06:25:02 +09:00
Olga Kornievskaia	d7be15a634	NFSD: free copynotify stateid in nfs4_free_ol_stateid() commit `4aa17144d5` upstream. Typically copynotify stateid is freed either when parent's stateid is being close/freed or in nfsd4_laundromat if the stateid hasn't been used in a lease period. However, in case when the server got an OPEN (which created a parent stateid), followed by a COPY_NOTIFY using that stateid, followed by a client reboot. New client instance while doing CREATE_SESSION would force expire previous state of this client. It leads to the open state being freed thru release_openowner-> nfs4_free_ol_stateid() and it finds that it still has copynotify stateid associated with it. We currently print a warning and is triggerred WARNING: CPU: 1 PID: 8858 at fs/nfsd/nfs4state.c:1550 nfs4_free_ol_stateid+0xb0/0x100 [nfsd] This patch, instead, frees the associated copynotify stateid here. If the parent stateid is freed (without freeing the copynotify stateids associated with it), it leads to the list corruption when laundromat ends up freeing the copynotify state later. [ 1626.839430] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP [ 1626.842828] Modules linked in: nfnetlink_queue nfnetlink_log bluetooth cfg80211 rpcrdma rdma_cm iw_cm ib_cm ib_core nfsd nfs_acl lockd grace nfs_localio ext4 crc16 mbcache jbd2 overlay uinput snd_seq_dummy snd_hrtimer qrtr rfkill vfat fat uvcvideo snd_hda_codec_generic videobuf2_vmalloc videobuf2_memops snd_hda_intel uvc snd_intel_dspcfg videobuf2_v4l2 videobuf2_common snd_hda_codec snd_hda_core videodev snd_hwdep snd_seq mc snd_seq_device snd_pcm snd_timer snd soundcore sg loop auth_rpcgss vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vmw_vmci vsock xfs 8021q garp stp llc mrp nvme ghash_ce e1000e nvme_core sr_mod nvme_keyring nvme_auth cdrom vmwgfx drm_ttm_helper ttm sunrpc dm_mirror dm_region_hash dm_log iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse dm_multipath dm_mod nfnetlink [ 1626.855594] CPU: 2 UID: 0 PID: 199 Comm: kworker/u24:33 Kdump: loaded Tainted: G B W 6.17.0-rc7+ #22 PREEMPT(voluntary) [ 1626.857075] Tainted: [B]=BAD_PAGE, [W]=WARN [ 1626.857573] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS VMW201.00V.24006586.BA64.2406042154 06/04/2024 [ 1626.858724] Workqueue: nfsd4 laundromat_main [nfsd] [ 1626.859304] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) [ 1626.860010] pc : __list_del_entry_valid_or_report+0x148/0x200 [ 1626.860601] lr : __list_del_entry_valid_or_report+0x148/0x200 [ 1626.861182] sp : ffff8000881d7a40 [ 1626.861521] x29: ffff8000881d7a40 x28: 0000000000000018 x27: ffff0000c2a98200 [ 1626.862260] x26: 0000000000000600 x25: 0000000000000000 x24: ffff8000881d7b20 [ 1626.862986] x23: ffff0000c2a981e8 x22: 1fffe00012410e7d x21: ffff0000920873e8 [ 1626.863701] x20: ffff0000920873e8 x19: ffff000086f22998 x18: 0000000000000000 [ 1626.864421] x17: 20747562202c3839 x16: 3932326636383030 x15: 3030666666662065 [ 1626.865092] x14: 6220646c756f6873 x13: 0000000000000001 x12: ffff60004fd9e4a3 [ 1626.865713] x11: 1fffe0004fd9e4a2 x10: ffff60004fd9e4a2 x9 : dfff800000000000 [ 1626.866320] x8 : 00009fffb0261b5e x7 : ffff00027ecf2513 x6 : 0000000000000001 [ 1626.866938] x5 : ffff00027ecf2510 x4 : ffff60004fd9e4a3 x3 : 0000000000000000 [ 1626.867553] x2 : 0000000000000000 x1 : ffff000096069640 x0 : 000000000000006d [ 1626.868167] Call trace: [ 1626.868382] __list_del_entry_valid_or_report+0x148/0x200 (P) [ 1626.868876] _free_cpntf_state_locked+0xd0/0x268 [nfsd] [ 1626.869368] nfs4_laundromat+0x6f8/0x1058 [nfsd] [ 1626.869813] laundromat_main+0x24/0x60 [nfsd] [ 1626.870231] process_one_work+0x584/0x1050 [ 1626.870595] worker_thread+0x4c4/0xc60 [ 1626.870893] kthread+0x2f8/0x398 [ 1626.871146] ret_from_fork+0x10/0x20 [ 1626.871422] Code: aa1303e1 aa1403e3 910e8000 97bc55d7 (d4210000) [ 1626.871892] SMP: stopping secondary CPUs Reported-by: rtm@csail.mit.edu Closes: https://lore.kernel.org/linux-nfs/d8f064c1-a26f-4eed-b4f0-1f7f608f415f@oracle.com/T/#t Fixes: `624322f1ad` ("NFSD add COPY_NOTIFY operation") Cc: stable@vger.kernel.org Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-11-24 10:35:59 +01:00
Olga Kornievskaia	3bc33097d4	nfsd: add missing FATTR4_WORD2_CLONE_BLKSIZE from supported attributes commit `4d3dbc2386` upstream. RFC 7862 Section 4.1.2 says that if the server supports CLONE it MUST support clone_blksize attribute. Fixes: `d6ca7d2643` ("NFSD: Implement FATTR4_CLONE_BLKSIZE attribute") Cc: stable@vger.kernel.org Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-11-24 10:35:59 +01:00
NeilBrown	b6bc86ce39	nfsd: fix refcount leak in nfsd_set_fh_dentry() commit `8a7348a9ed` upstream. nfsd exports a "pseudo root filesystem" which is used by NFSv4 to find the various exported filesystems using LOOKUP requests from a known root filehandle. NFSv3 uses the MOUNT protocol to find those exported filesystems and so is not given access to the pseudo root filesystem. If a v3 (or v2) client uses a filehandle from that filesystem, nfsd_set_fh_dentry() will report an error, but still stores the export in "struct svc_fh" even though it also drops the reference (exp_put()). This means that when fh_put() is called an extra reference will be dropped which can lead to use-after-free and possible denial of service. Normal NFS usage will not provide a pseudo-root filehandle to a v3 client. This bug can only be triggered by the client synthesising an incorrect filehandle. To fix this we move the assignments to the svc_fh later, after all possible error cases have been detected. Reported-and-tested-by: tianshuo han <hantianshuo233@gmail.com> Fixes: `ef7f6c4904` ("nfsd: move V4ROOT version check to nfsd_set_fh_dentry()") Signed-off-by: NeilBrown <neil@brown.name> Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-11-24 10:35:59 +01:00
Chuck Lever	a5728422b8	NFSD: Skip close replay processing if XDR encoding fails [ Upstream commit `ff8141e49c` ] The replay logic added by commit `9411b1d4c7` ("nfsd4: cleanup handling of nfsv4.0 closed stateid's") cannot be done if encoding failed due to a short send buffer; there's no guarantee that the operation encoder has actually encoded the data that is being copied to the replay cache. Reported-by: rtm@csail.mit.edu Closes: https://lore.kernel.org/linux-nfs/c3628d57-94ae-48cf-8c9e-49087a28cec9@oracle.com/T/#t Fixes: `9411b1d4c7` ("nfsd4: cleanup handling of nfsv4.0 closed stateid's") Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-11-24 10:35:50 +01:00
Chuck Lever	a4948875ed	NFSD: Fix crash in nfsd4_read_release() commit `abb1f08a21` upstream. When tracing is enabled, the trace_nfsd_read_done trace point crashes during the pynfs read.testNoFh test. Fixes: `15a8b55dbb` ("nfsd: call op_release, even when op_func returns an error") Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-11-13 15:33:54 -05:00
Chuck Lever	a156af6a4d	NFSD: Define a proc_layoutcommit for the FlexFiles layout type [ Upstream commit `4b47a8601b` ] Avoid a crash if a pNFS client should happen to send a LAYOUTCOMMIT operation on a FlexFiles layout. Reported-by: Robert Morris <rtm@csail.mit.edu> Closes: https://lore.kernel.org/linux-nfs/152f99b2-ba35-4dec-93a9-4690e625dccd@oracle.com/T/#t Cc: Thomas Haynes <loghyr@hammerspace.com> Cc: stable@vger.kernel.org Fixes: `9b9960a0ca` ("nfsd: Add a super simple flex file server") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-23 16:20:47 +02:00
Sergey Bashirov	c33da548fb	NFSD: Fix last write offset handling in layoutcommit [ Upstream commit `d68886bae7` ] The data type of loca_last_write_offset is newoffset4 and is switched on a boolean value, no_newoffset, that indicates if a previous write occurred or not. If no_newoffset is FALSE, an offset is not given. This means that client does not try to update the file size. Thus, server should not try to calculate new file size and check if it fits into the segment range. See RFC 8881, section 12.5.4.2. Sometimes the current incorrect logic may cause clients to hang when trying to sync an inode. If layoutcommit fails, the client marks the inode as dirty again. Fixes: `9cf514ccfa` ("nfsd: implement pNFS operations") Cc: stable@vger.kernel.org Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-23 16:20:41 +02:00
Sergey Bashirov	da68bc55d5	NFSD: Implement large extent array support in pNFS [ Upstream commit `f963cf2b91` ] When pNFS client in the block or scsi layout mode sends layoutcommit to MDS, a variable length array of modified extents is supplied within the request. This patch allows the server to accept such extent arrays if they do not fit within single memory page. The issue can be reproduced when writing to a 1GB file using FIO with O_DIRECT, 4K block and large I/O depth without preallocation of the file. In this case, the server returns NFSERR_BADXDR to the client. Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Stable-dep-of: `d68886bae7` ("NFSD: Fix last write offset handling in layoutcommit") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-23 16:20:41 +02:00
Sergey Bashirov	18eee64074	NFSD: Minor cleanup in layoutcommit processing [ Upstream commit `274365a51d` ] Remove dprintk in nfsd4_layoutcommit. These are not needed in day to day usage, and the information is also available in Wireshark when capturing NFS traffic. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Stable-dep-of: `d68886bae7` ("NFSD: Fix last write offset handling in layoutcommit") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-23 16:20:40 +02:00
Sergey Bashirov	47c609979b	NFSD: Rework encoding and decoding of nfsd4_deviceid [ Upstream commit `832738e4b3` ] Compilers may optimize the layout of C structures, so we should not rely on sizeof struct and memcpy to encode and decode XDR structures. The byte order of the fields should also be taken into account. This patch adds the correct functions to handle the deviceid4 structure and removes the pad field, which is currently not used by NFSD, from the runtime state. The server's byte order is preserved because the deviceid4 blob on the wire is only used as a cookie by the client. Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Stable-dep-of: `d68886bae7` ("NFSD: Fix last write offset handling in layoutcommit") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-23 16:20:40 +02:00
Sergey Bashirov	5def53c55a	nfsd: Drop dprintk in blocklayout xdr functions [ Upstream commit `e339967eec` ] Minor clean up. Instead of dprintk there are appropriate error codes. Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Stable-dep-of: `d68886bae7` ("NFSD: Fix last write offset handling in layoutcommit") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-23 16:20:40 +02:00
Sergey Bashirov	434b399044	nfsd: Use correct error code when decoding extents [ Upstream commit `26d05e1c37` ] Update error codes in decoding functions of block and scsi layout drivers to match the core nfsd code. NFS4ERR_EINVAL means that the server was able to decode the request, but the decoded values are invalid. Use NFS4ERR_BADXDR instead to indicate a decoding error. And ENOMEM is changed to nfs code NFS4ERR_DELAY. Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Stable-dep-of: `d68886bae7` ("NFSD: Fix last write offset handling in layoutcommit") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-23 16:20:40 +02:00
Scott Mayhew	779327c2be	nfsd: decouple the xprtsec policy check from check_nfsd_access() commit `e4f574ca9c` upstream. A while back I had reported that an NFSv3 client could successfully mount using '-o xprtsec=none' an export that had been exported with 'xprtsec=tls:mtls'. By "successfully" I mean that the mount command would succeed and the mount would show up in /proc/mount. Attempting to do anything futher with the mount would be met with NFS3ERR_ACCES. This was fixed (albeit accidentally) by commit `bb4f07f240` ("nfsd: Fix NFSD_MAY_BYPASS_GSS and NFSD_MAY_BYPASS_GSS_ON_ROOT") and was subsequently re-broken by commit `0813c5f012` ("nfsd: fix access checking for NLM under XPRTSEC policies"). Transport Layer Security isn't an RPC security flavor or pseudo-flavor, so we shouldn't be conflating them when determining whether the access checks can be bypassed. Split check_nfsd_access() into two helpers, and have __fh_verify() call the helpers directly since __fh_verify() has logic that allows one or both of the checks to be skipped. All other sites will continue to call check_nfsd_access(). Link: https://lore.kernel.org/linux-nfs/ZjO3Qwf_G87yNXb2@aion/ Fixes: `9280c57743` ("NFSD: Handle new xprtsec= export option") Cc: stable@vger.kernel.org Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:34:07 +02:00
Olga Kornievskaia	996b8797d6	nfsd: fix access checking for NLM under XPRTSEC policies commit `0813c5f012` upstream. When an export policy with xprtsec policy is set with "tls" and/or "mtls", but an NFS client is doing a v3 xprtsec=tls mount, then NLM locking calls fail with an error because there is currently no support for NLM with TLS. Until such support is added, allow NLM calls under TLS-secured policy. Fixes: `4cc9b9f2bf` ("nfsd: refine and rename NFSD_MAY_LOCK") Cc: stable@vger.kernel.org Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Reviewed-by: NeilBrown <neil@brown.name> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:34:06 +02:00
Olga Kornievskaia	4c4d66e811	nfsd: fix __fh_verify for localio commit `d9d6b74e4b` upstream. __fh_verify() added a call to svc_xprt_set_valid() to help do connection management but during LOCALIO path rqstp argument is NULL, leading to NULL pointer dereferencing and a crash. Fixes: `eccbbc7c00` ("nfsd: don't use sv_nrthreads in connection limiting calculations.") Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:34:06 +02:00
NeilBrown	0a1ee3c932	nfsd: don't use sv_nrthreads in connection limiting calculations. [ Upstream commit `eccbbc7c00` ] The heuristic for limiting the number of incoming connections to nfsd currently uses sv_nrthreads - allowing more connections if more threads were configured. A future patch will allow number of threads to grow dynamically so that there will be no need to configure sv_nrthreads. So we need a different solution for limiting connections. It isn't clear what problem is solved by limiting connections (as mentioned in a code comment) but the most likely problem is a connection storm - many connections that are not doing productive work. These will be closed after about 6 minutes already but it might help to slow down a storm. This patch adds a per-connection flag XPT_PEER_VALID which indicates that the peer has presented a filehandle for which it has some sort of access. i.e the peer is known to be trusted in some way. We now only count connections which have NOT been determined to be valid. There should be relative few of these at any given time. If the number of non-validated peer exceed a limit - currently 64 - we close the oldest non-validated peer to avoid having too many of these useless connections. Note that this patch significantly changes the meaning of the various configuration parameters for "max connections". The next patch will remove all of these. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Stable-dep-of: `898374fdd7` ("nfsd: unregister with rpcbind when deleting a transport") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:34:03 +02:00
NeilBrown	18744bc56b	nfsd: refine and rename NFSD_MAY_LOCK [ Upstream commit `4cc9b9f2bf` ] NFSD_MAY_LOCK means a few different things. - it means that GSS is not required. - it means that with NFSEXP_NOAUTHNLM, authentication is not required - it means that OWNER_OVERRIDE is allowed. None of these are specific to locking, they are specific to the NLM protocol. So: - rename to NFSD_MAY_NLM - set NFSD_MAY_OWNER_OVERRIDE and NFSD_MAY_BYPASS_GSS in nlm_fopen() so that NFSD_MAY_NLM doesn't need to imply these. - move the test on NFSEXP_NOAUTHNLM out of nfsd_permission() and into fh_verify where other special-case tests on the MAY flags happen. nfsd_permission() can be called from other places than fh_verify(), but none of these will have NFSD_MAY_NLM. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Stable-dep-of: `898374fdd7` ("nfsd: unregister with rpcbind when deleting a transport") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:34:03 +02:00
Chuck Lever	763d4aa418	NFSD: Replace use of NFSD_MAY_LOCK in nfsd4_lock() [ Upstream commit `6640556b0c` ] NFSv4 LOCK operations should not avoid the set of authorization checks that apply to all other NFSv4 operations. Also, the "no_auth_nlm" export option should apply only to NLM LOCK requests. It's not necessary or sensible to apply it to NFSv4 LOCK operations. Instead, set no permission bits when calling fh_verify(). Subsequent stateid processing handles authorization checks. Reported-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Stable-dep-of: `898374fdd7` ("nfsd: unregister with rpcbind when deleting a transport") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:34:03 +02:00
Pali Rohár	658bedb82e	nfsd: Fix NFSD_MAY_BYPASS_GSS and NFSD_MAY_BYPASS_GSS_ON_ROOT [ Upstream commit `bb4f07f240` ] Currently NFSD_MAY_BYPASS_GSS and NFSD_MAY_BYPASS_GSS_ON_ROOT do not bypass only GSS, but bypass any method. This is a problem specially for NFS3 AUTH_NULL-only exports. The purpose of NFSD_MAY_BYPASS_GSS_ON_ROOT is described in RFC 2623, section 2.3.2, to allow mounting NFS2/3 GSS-only export without authentication. So few procedures which do not expose security risk used during mount time can be called also with AUTH_NONE or AUTH_SYS, to allow client mount operation to finish successfully. The problem with current implementation is that for AUTH_NULL-only exports, the NFSD_MAY_BYPASS_GSS_ON_ROOT is active also for NFS3 AUTH_UNIX mount attempts which confuse NFS3 clients, and make them think that AUTH_UNIX is enabled and is working. Linux NFS3 client never switches from AUTH_UNIX to AUTH_NONE on active mount, which makes the mount inaccessible. Fix the NFSD_MAY_BYPASS_GSS and NFSD_MAY_BYPASS_GSS_ON_ROOT implementation and really allow to bypass only exports which have enabled some real authentication (GSS, TLS, or any other). The result would be: For AUTH_NULL-only export if client attempts to do mount with AUTH_UNIX flavor then it will receive access errors, which instruct client that AUTH_UNIX flavor is not usable and will either try other auth flavor (AUTH_NULL if enabled) or fails mount procedure. Similarly if client attempt to do mount with AUTH_NULL flavor and only AUTH_UNIX flavor is enabled then the client will receive access error. This should fix problems with AUTH_NULL-only or AUTH_UNIX-only exports if client attempts to mount it with other auth flavor (e.g. with AUTH_NULL for AUTH_UNIX-only export, or with AUTH_UNIX for AUTH_NULL-only export). Signed-off-by: Pali Rohár <pali@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Stable-dep-of: `898374fdd7` ("nfsd: unregister with rpcbind when deleting a transport") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:34:03 +02:00
Olga Kornievskaia	6248ff249b	nfsd: nfserr_jukebox in nlm_fopen should lead to a retry commit `a082e4b4d0` upstream. When v3 NLM request finds a conflicting delegation, it triggers a delegation recall and nfsd_open fails with EAGAIN. nfsd_open then translates EAGAIN into nfserr_jukebox. In nlm_fopen, instead of returning nlm_failed for when there is a conflicting delegation, drop this NLM request so that the client retries. Once delegation is recalled and if a local lock is claimed, a retry would lead to nfsd returning a nlm_lck_blocked error or a successful nlm lock. Fixes: `d343fce148` ("[PATCH] knfsd: Allow lockd to drop replies as appropriate") Cc: stable@vger.kernel.org # v6.6 Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:33:59 +02:00
Thorsten Blum	017addab06	NFSD: Fix destination buffer size in nfsd4_ssc_setup_dul() commit `ab1c282c01` upstream. Commit `5304877936` ("NFSD: Fix strncpy() fortify warning") replaced strncpy(,, sizeof(..)) with strlcpy(,, sizeof(..) - 1), but strlcpy() already guaranteed NUL-termination of the destination buffer and subtracting one byte potentially truncated the source string. The incorrect size was then carried over in commit `72f78ae00a` ("NFSD: move from strlcpy with unused retval to strscpy") when switching from strlcpy() to strscpy(). Fix this off-by-one error by using the full size of the destination buffer again. Cc: stable@vger.kernel.org Fixes: `5304877936` ("NFSD: Fix strncpy() fortify warning") Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:33:58 +02:00
Dai Ngo	944ec8c775	NFSD: detect mismatch of file handle and delegation stateid in OPEN op commit `9c65001c57` upstream. When the client sends an OPEN with claim type CLAIM_DELEG_CUR_FH or CLAIM_DELEGATION_CUR, the delegation stateid and the file handle must belong to the same file, otherwise return NFS4ERR_INVAL. Note that RFC8881, section 8.2.4, mandates the server to return NFS4ERR_BAD_STATEID if the selected table entry does not match the current filehandle. However returning NFS4ERR_BAD_STATEID in the OPEN causes the client to retry the operation and therefor get the client into a loop. To avoid this situation we return NFS4ERR_INVAL instead. Reported-by: Petro Pavlov <petro.pavlov@vastdata.com> Fixes: `c44c5eeb2c` ("[PATCH] nfsd4: add open state code for CLAIM_DELEGATE_CUR") Cc: stable@vger.kernel.org Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-08-20 18:30:14 +02:00
Jeff Layton	74ad36ed60	nfsd: handle get_client_locked() failure in nfsd4_setclientid_confirm() commit `908e4ead7f` upstream. Lei Lu recently reported that nfsd4_setclientid_confirm() did not check the return value from get_client_locked(). a SETCLIENTID_CONFIRM could race with a confirmed client expiring and fail to get a reference. That could later lead to a UAF. Fix this by getting a reference early in the case where there is an extant confirmed client. If that fails then treat it as if there were no confirmed client found at all. In the case where the unconfirmed client is expiring, just fail and return the result from get_client_locked(). Reported-by: lei lu <llfamsec@gmail.com> Closes: https://lore.kernel.org/linux-nfs/CAEBF3_b=UvqzNKdnfD_52L05Mqrqui9vZ2eFamgAbV0WG+FNWQ@mail.gmail.com/ Fixes: `d20c11d86d` ("nfsd: Protect session creation and client confirm using client_lock") Cc: stable@vger.kernel.org Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-08-20 18:30:14 +02:00
Jeff Layton	0174154faf	nfsd: use threads array as-is in netlink interface commit `8ea688a337` upstream. The old nfsdfs interface for starting a server with multiple pools handles the special case of a single entry array passed down from userland by distributing the threads over every NUMA node. The netlink control interface however constructs an array of length nfsd_nrpools() and fills any unprovided slots with 0's. This behavior defeats the special casing that the old interface relies on. Change nfsd_nl_threads_set_doit() to pass down the array from userland as-is. Fixes: `7f5c330b26` ("nfsd: allow passing in array of thread counts via netlink") Cc: stable@vger.kernel.org Reported-by: Mike Snitzer <snitzer@kernel.org> Closes: https://lore.kernel.org/linux-nfs/aDC-ftnzhJAlwqwh@kernel.org/ Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-06-27 11:11:40 +01:00
Li Lingfeng	d622c2ee6c	nfsd: Initialize ssc before laundromat_work to prevent NULL dereference commit `b31da62889` upstream. In nfs4_state_start_net(), laundromat_work may access nfsd_ssc through nfs4_laundromat -> nfsd4_ssc_expire_umount. If nfsd_ssc isn't initialized, this can cause NULL pointer dereference. Normally the delayed start of laundromat_work allows sufficient time for nfsd_ssc initialization to complete. However, when the kernel waits too long for userspace responses (e.g. in nfs4_state_start_net -> nfsd4_end_grace -> nfsd4_record_grace_done -> nfsd4_cld_grace_done -> cld_pipe_upcall -> __cld_pipe_upcall -> wait_for_completion path), the delayed work may start before nfsd_ssc initialization finishes. Fix this by moving nfsd_ssc initialization before starting laundromat_work. Fixes: `f4e44b3933` ("NFSD: delay unmount source's export after inter-server copy completed.") Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-06-27 11:11:15 +01:00
NeilBrown	7a75a95669	nfsd: nfsd4_spo_must_allow() must check this is a v4 compound request commit `1244f0b2c3` upstream. If the request being processed is not a v4 compound request, then examining the cstate can have undefined results. This patch adds a check that the rpc procedure being executed (rq_procinfo) is the NFSPROC4_COMPOUND procedure. Reported-by: Olga Kornievskaia <okorniev@redhat.com> Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-06-27 11:11:15 +01:00
Chuck Lever	07862d4f7c	NFSD: Implement FATTR4_CLONE_BLKSIZE attribute commit `d6ca7d2643` upstream. RFC 7862 states that if an NFS server implements a CLONE operation, it MUST also implement FATTR4_CLONE_BLKSIZE. NFSD implements CLONE, but does not implement FATTR4_CLONE_BLKSIZE. Note that in Section 12.2, RFC 7862 claims that FATTR4_CLONE_BLKSIZE is RECOMMENDED, not REQUIRED. Likely this is because a minor version is not permitted to add a REQUIRED attribute. Confusing. We assume this attribute reports a block size as a count of bytes, as RFC 7862 does not specify a unit. Reported-by: Roland Mainz <roland.mainz@nrubsig.org> Suggested-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org> Cc: stable@vger.kernel.org # v6.7+ Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-06-27 11:11:15 +01:00
Maninder Singh	2029ca75cd	NFSD: fix race between nfsd registration and exports_proc commit `f7fb730cac` upstream. As of now nfsd calls create_proc_exports_entry() at start of init_nfsd and cleanup by remove_proc_entry() at last of exit_nfsd. Which causes kernel OOPs if there is race between below 2 operations: (i) exportfs -r (ii) mount -t nfsd none /proc/fs/nfsd for 5.4 kernel ARM64: CPU 1: el1_irq+0xbc/0x180 arch_counter_get_cntvct+0x14/0x18 running_clock+0xc/0x18 preempt_count_add+0x88/0x110 prep_new_page+0xb0/0x220 get_page_from_freelist+0x2d8/0x1778 __alloc_pages_nodemask+0x15c/0xef0 __vmalloc_node_range+0x28c/0x478 __vmalloc_node_flags_caller+0x8c/0xb0 kvmalloc_node+0x88/0xe0 nfsd_init_net+0x6c/0x108 [nfsd] ops_init+0x44/0x170 register_pernet_operations+0x114/0x270 register_pernet_subsys+0x34/0x50 init_nfsd+0xa8/0x718 [nfsd] do_one_initcall+0x54/0x2e0 CPU 2 : Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010 PC is at : exports_net_open+0x50/0x68 [nfsd] Call trace: exports_net_open+0x50/0x68 [nfsd] exports_proc_open+0x2c/0x38 [nfsd] proc_reg_open+0xb8/0x198 do_dentry_open+0x1c4/0x418 vfs_open+0x38/0x48 path_openat+0x28c/0xf18 do_filp_open+0x70/0xe8 do_sys_open+0x154/0x248 Sometimes it crashes at exports_net_open() and sometimes cache_seq_next_rcu(). and same is happening on latest 6.14 kernel as well: [ 0.000000] Linux version 6.14.0-rc5-next-20250304-dirty ... [ 285.455918] Unable to handle kernel paging request at virtual address 00001f4800001f48 ... [ 285.464902] pc : cache_seq_next_rcu+0x78/0xa4 ... [ 285.469695] Call trace: [ 285.470083] cache_seq_next_rcu+0x78/0xa4 (P) [ 285.470488] seq_read+0xe0/0x11c [ 285.470675] proc_reg_read+0x9c/0xf0 [ 285.470874] vfs_read+0xc4/0x2fc [ 285.471057] ksys_read+0x6c/0xf4 [ 285.471231] __arm64_sys_read+0x1c/0x28 [ 285.471428] invoke_syscall+0x44/0x100 [ 285.471633] el0_svc_common.constprop.0+0x40/0xe0 [ 285.471870] do_el0_svc_compat+0x1c/0x34 [ 285.472073] el0_svc_compat+0x2c/0x80 [ 285.472265] el0t_32_sync_handler+0x90/0x140 [ 285.472473] el0t_32_sync+0x19c/0x1a0 [ 285.472887] Code: f9400885 93407c23 937d7c27 11000421 (f86378a3) [ 285.473422] ---[ end trace 0000000000000000 ]--- It reproduced simply with below script: while [ 1 ] do /exportfs -r done & while [ 1 ] do insmod /nfsd.ko mount -t nfsd none /proc/fs/nfsd umount /proc/fs/nfsd rmmod nfsd done & So exporting interfaces to user space shall be done at last and cleanup at first place. With change there is no Kernel OOPs. Co-developed-by: Shubham Rana <s9.rana@samsung.com> Signed-off-by: Shubham Rana <s9.rana@samsung.com> Signed-off-by: Maninder Singh <maninder1.s@samsung.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-06-27 11:11:15 +01:00
Maninder Singh	3f3eb3c342	NFSD: unregister filesystem in case genl_register_family() fails commit `ff12eb3795` upstream. With rpc_status netlink support, unregister of register_filesystem() was missed in case of genl_register_family() fails. Correcting it by making new label. Fixes: `bd9d6a3efa` ("NFSD: add rpc_status netlink support") Cc: stable@vger.kernel.org Signed-off-by: Maninder Singh <maninder1.s@samsung.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-06-27 11:11:15 +01:00
Li Lingfeng	a7fce086f6	nfsd: decrease sc_count directly if fail to queue dl_recall [ Upstream commit `a1d14d931b` ] A deadlock warning occurred when invoking nfs4_put_stid following a failed dl_recall queue operation: T1 T2 nfs4_laundromat nfs4_get_client_reaplist nfs4_anylock_blockers __break_lease spin_lock // ctx->flc_lock spin_lock // clp->cl_lock nfs4_lockowner_has_blockers locks_owner_has_blockers spin_lock // flctx->flc_lock nfsd_break_deleg_cb nfsd_break_one_deleg nfs4_put_stid refcount_dec_and_lock spin_lock // clp->cl_lock When a file is opened, an nfs4_delegation is allocated with sc_count initialized to 1, and the file_lease holds a reference to the delegation. The file_lease is then associated with the file through kernel_setlease. The disassociation is performed in nfsd4_delegreturn via the following call chain: nfsd4_delegreturn --> destroy_delegation --> destroy_unhashed_deleg --> nfs4_unlock_deleg_lease --> kernel_setlease --> generic_delete_lease The corresponding sc_count reference will be released after this disassociation. Since nfsd_break_one_deleg executes while holding the flc_lock, the disassociation process becomes blocked when attempting to acquire flc_lock in generic_delete_lease. This means: 1) sc_count in nfsd_break_one_deleg will not be decremented to 0; 2) The nfs4_put_stid called by nfsd_break_one_deleg will not attempt to acquire cl_lock; 3) Consequently, no deadlock condition is created. Given that sc_count in nfsd_break_one_deleg remains non-zero, we can safely perform refcount_dec on sc_count directly. This approach effectively avoids triggering deadlock warnings. Fixes: `230ca75845` ("nfsd: put dl_stid if fail to queue dl_recall") Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-04-25 10:47:51 +02:00

1 2 3 4 5 ...

4171 Commits