linux-stable-mirror

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2026-04-03 12:05:13 +02:00

Author	SHA1	Message	Date
Roberto Bergantinos Corpas	7e29637737	nfs: return EISDIR on nfs3_proc_create if d_alias is a dir [ Upstream commit `410666a298` ] If we found an alias through nfs3_do_create/nfs_add_or_obtain/d_splice_alias which happens to be a dir dentry, we don't return any error, and simply forget about this alias, but the original dentry we were adding and passed as parameter remains negative. This later causes an oops on nfs_atomic_open_v23/finish_open since we supply a negative dentry to do_dentry_open. This has been observed running lustre-racer, where dirs and files are created/removed concurrently with the same name and O_EXCL is not used to open files (frequent file redirection). While d_splice_alias typically returns a directory alias or NULL, we explicitly check d_is_dir() to ensure that we don't attempt to perform file operations (like finish_open) on a directory inode, which triggers the observed oops. Fixes: `7c6c5249f0` ("NFS: add atomic_open for NFSv3 to handle O_TRUNC correctly.") Reviewed-by: Olga Kornievskaia <okorniev@redhat.com> Reviewed-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Roberto Bergantinos Corpas <rbergant@redhat.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-25 11:08:26 +01:00
Sagi Grimberg	596c8b168c	fs/nfs: Fix readdir slow-start regression [ Upstream commit `42e7c876b1` ] Commit `580f236737` ("NFS: Adjust the amount of readahead performed by NFS readdir") reduces the amount of readahead name caching done by the client. The downside of this approach is that READDIR may now suffer from a slow-start issue, where initially it will fetch names that fit in a single page, then in 2, 4, 8 pages, and so on, until it reaches the maximum supported transfer size (usually 1M). This patch tries to take a balanced approach between mitigating the slow-start issue and still maintaining some efficiency gains. Fixes: `580f236737` ("NFS: Adjust the amount of readahead performed by NFS readdir") Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-04 07:20:24 -05:00
Olga Kornievskaia	b3a8b2bbc4	pNFS: fix a missing wake up while waiting on NFS_LAYOUT_DRAIN [ Upstream commit `5248d8474e` ] It is possible to have a task get stuck waiting on NFS_LAYOUT_DRAIN in the following scenario: 1. cpu a: waiter tests NFS_LAYOUT_DRAIN (1) and plh_outstanding (1) 2. cpu b: atomic_dec_and_test() -> clear bit -> wake up 3. cpu c: sets NFS_LAYOUT_DRAIN again 4. cpu a: calls wait_on_bit() and sleeps forever. To expand on this, say we have 2 outstanding pNFS write IOs that get ESTALE, which causes both to call pnfs_destroy_layout() and set the NFS_LAYOUT_DRAIN bit, but the 1st one doesn't call pnfs_put_layout_hdr() yet (as that would prevent the 2nd ESTALE write from trying to call pnfs_destroy_layout()). The 1st ESTALE write is the one that initially sets NFS_LAYOUT_DRAIN, so that new IO on this file initiates a new LAYOUTGET. Another new write would find NFS_LAYOUT_DRAIN set and plh_outstanding>0 (step 1) and would call wait_on_bit(). LAYOUTGET completes, doing step 2. Now the 2nd of the ESTALE writes calls pnfs_destroy_layout() and sets the NFS_LAYOUT_DRAIN bit (step 3). Finally, the waiting write wakes up to check the bit and goes back to sleep. The problem revolves around the fact that if NFS_LAYOUT_INVALID_STID was already set, it should not do the work of pnfs_mark_layout_stateid_invalid(), thus NFS_LAYOUT_DRAIN will not be set more than once for an invalid layout. Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com> Fixes: `880265c77a` ("pNFS: Avoid a live lock condition in pnfs_update_layout()") Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-04 07:20:23 -05:00
Mike Snitzer	8d9798d6ff	NFS/localio: use GFP_NOIO and non-memreclaim workqueue in nfs_local_commit [ Upstream commit `9bb0060f78` ] nfslocaliod_workqueue is a non-memreclaim workqueue (it isn't initialized with WQ_MEM_RECLAIM), see commit `b9f5dd57f4` ("nfs/localio: use dedicated workqueues for filesystem read and write"). Use nfslocaliod_workqueue for LOCALIO's SYNC work. Also, set PF_LOCAL_THROTTLE | PF_MEMALLOC_NOIO in nfs_local_fsync_work. Fixes: `b9f5dd57f4` ("nfs/localio: use dedicated workqueues for filesystem read and write") Signed-off-by: Mike Snitzer <snitzer@hammerspace.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-04 07:20:21 -05:00
Mike Snitzer	af87ebd343	nfs/localio: eliminate unnecessary kref in nfs_local_fsync_ctx [ Upstream commit `894f5c5593` ] nfs_local_commit() doesn't need async cleanup of nfs_local_fsync_ctx, so there is no need to use a kref. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Stable-dep-of: `9bb0060f78` ("NFS/localio: use GFP_NOIO and non-memreclaim workqueue in nfs_local_commit") Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-04 07:20:21 -05:00
Zilin Guan	0e036606b2	pnfs/blocklayout: Fix memory leak in bl_parse_scsi() [ Upstream commit `5a74af51c3` ] In bl_parse_scsi(), if the block device length is zero, the function returns immediately without releasing the file reference obtained via bl_open_path(), leading to a memory leak. Fix this by jumping to the out_blkdev_put label to ensure the file reference is properly released. Fixes: `d76c769c8d` ("pnfs/blocklayout: Don't add zero-length pnfs_block_dev") Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-23 11:18:36 +01:00
Zilin Guan	86da7efd12	pnfs/flexfiles: Fix memory leak in nfs4_ff_alloc_deviceid_node() [ Upstream commit `0c72808365` ] In nfs4_ff_alloc_deviceid_node(), if the allocation for ds_versions fails, the function jumps to the out_scratch label without freeing the already allocated dsaddrs list, leading to a memory leak. Fix this by jumping to the out_err_drain_dsaddrs label, which properly frees the dsaddrs list before cleaning up other resources. Fixes: `d67ae825a5` ("pnfs/flexfiles: Add the FlexFile Layout Driver") Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-23 11:18:36 +01:00
Trond Myklebust	49d352bc26	NFS: Fix a deadlock involving nfs_release_folio() [ Upstream commit `cce0be6eb4` ] Wang Zhaolong reports a deadlock involving NFSv4.1 state recovery waiting on kthreadd, which is attempting to reclaim memory by calling nfs_release_folio(). The latter cannot make progress due to state recovery being needed. It seems that the only safe thing to do here is to kick off a writeback of the folio, without waiting for completion, or else kicking off an asynchronous commit. Reported-by: Wang Zhaolong <wangzhaolong@huaweicloud.com> Fixes: `96780ca55e` ("NFS: fix up nfs_release_folio() to try to release the page") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-23 11:18:36 +01:00
Trond Myklebust	a316fd9d30	pNFS: Fix a deadlock when returning a delegation during open() [ Upstream commit `857bf90562` ] Ben Coddington reports seeing a hang in the following stack trace: 0 [ffffd0b50e1774e0] __schedule at ffffffff9ca05415 1 [ffffd0b50e177548] schedule at ffffffff9ca05717 2 [ffffd0b50e177558] bit_wait at ffffffff9ca061e1 3 [ffffd0b50e177568] __wait_on_bit at ffffffff9ca05cfb 4 [ffffd0b50e1775c8] out_of_line_wait_on_bit at ffffffff9ca05ea5 5 [ffffd0b50e177618] pnfs_roc at ffffffffc154207b [nfsv4] 6 [ffffd0b50e1776b8] _nfs4_proc_delegreturn at ffffffffc1506586 [nfsv4] 7 [ffffd0b50e177788] nfs4_proc_delegreturn at ffffffffc1507480 [nfsv4] 8 [ffffd0b50e1777f8] nfs_do_return_delegation at ffffffffc1523e41 [nfsv4] 9 [ffffd0b50e177838] nfs_inode_set_delegation at ffffffffc1524a75 [nfsv4] 10 [ffffd0b50e177888] nfs4_process_delegation at ffffffffc14f41dd [nfsv4] 11 [ffffd0b50e1778a0] _nfs4_opendata_to_nfs4_state at ffffffffc1503edf [nfsv4] 12 [ffffd0b50e1778c0] _nfs4_open_and_get_state at ffffffffc1504e56 [nfsv4] 13 [ffffd0b50e177978] _nfs4_do_open at ffffffffc15051b8 [nfsv4] 14 [ffffd0b50e1779f8] nfs4_do_open at ffffffffc150559c [nfsv4] 15 [ffffd0b50e177a80] nfs4_atomic_open at ffffffffc15057fb [nfsv4] 16 [ffffd0b50e177ad0] nfs4_file_open at ffffffffc15219be [nfsv4] 17 [ffffd0b50e177b78] do_dentry_open at ffffffff9c09e6ea 18 [ffffd0b50e177ba8] vfs_open at ffffffff9c0a082e 19 [ffffd0b50e177bd0] dentry_open at ffffffff9c0a0935 The issue is that the delegreturn is being asked to wait for a layout return that cannot complete because a state recovery was initiated. The state recovery cannot complete until the open() finishes processing the delegations it was given. The solution is to propagate the existing flags that indicate a non-blocking call to the function pnfs_roc(), so that it knows not to wait in this situation. Reported-by: Benjamin Coddington <bcodding@hammerspace.com> Fixes: `29ade5db12` ("pNFS: Wait on outstanding layoutreturns to complete in pnfs_roc()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-23 11:18:35 +01:00
Trond Myklebust	a8559efcd5	NFS: Fix up the automount fs_context to use the correct cred [ Upstream commit `a2a8fc27dd` ] When automounting, the fs_context should be fixed up to use the cred from the parent filesystem, since the operation is just extending the namespace. Authorisation to enter that namespace will already have been provided by the preceding lookup. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:31:19 +01:00
Scott Mayhew	e1df03e293	NFSv4: ensure the open stateid seqid doesn't go backwards [ Upstream commit `2e47c3cc64` ] We have observed an NFSv4 client receiving a LOCK reply with a status of NFS4ERR_OLD_STATEID and subsequently retrying the LOCK request with an earlier seqid value in the stateid. As this was for a new lockowner, that would imply that nfs_set_open_stateid_locked() had updated the open stateid seqid with an earlier value. Looking at nfs_set_open_stateid_locked(), if the incoming seqid is out of sequence, the task will sleep on the state->waitq for up to 5 seconds. If the task waits for the full 5 seconds, then after finishing the wait it'll update the open stateid seqid with whatever value the incoming seqid has. If there are multiple waiters in this scenario, then the last one to perform said update may not be the one with the highest seqid. Add a check to ensure that the seqid can only be incremented, and add a tracepoint to indicate when old seqids are skipped. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Reviewed-by: Benjamin Coddington <bcodding@hammerspace.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-17 16:31:19 +01:00
Trond Myklebust	a3dbaa09db	NFS: Fix inheritance of the block sizes when automounting [ Upstream commit `2b092175f5` ] Only inherit the block sizes that were actually specified as mount parameters for the parent mount. Fixes: `62a55d088c` ("NFS: Additional refactoring for fs_context conversion") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-18 13:55:18 +01:00
Trond Myklebust	d867a77a93	Expand the type of nfs_fattr->valid [ Upstream commit `ce60ab3964` ] We need to be able to track more than 32 attributes per inode. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com> Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/1e3405fca54efd0be7c91c1da77917b94f5dfcc4.1748515333.git.bcodding@redhat.com Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Stable-dep-of: `2b092175f5` ("NFS: Fix inheritance of the block sizes when automounting") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-18 13:55:18 +01:00
Trond Myklebust	612cc98698	NFS: Automounted filesystems should inherit ro,noexec,nodev,sync flags [ Upstream commit `8675c69816` ] When a filesystem is being automounted, it needs to preserve the user-set superblock mount options, such as the "ro" flag. Reported-by: Li Lingfeng <lilingfeng3@huawei.com> Link: https://lore.kernel.org/all/20240604112636.236517-3-lilingfeng@huaweicloud.com/ Fixes: `f2aedb713c` ("NFS: Add fs_context support.") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-18 13:55:18 +01:00
Trond Myklebust	2704453bd1	Revert "nfs: ignore SB_RDONLY when mounting nfs" [ Upstream commit `d4a26d34f1` ] This reverts commit `52cb7f8f17`. Silently ignoring the "ro" and "rw" mount options causes user confusion, and regressions. Reported-by: Alkis Georgopoulos <alkisg@gmail.com> Cc: Li Lingfeng <lilingfeng3@huawei.com> Fixes: `52cb7f8f17` ("nfs: ignore SB_RDONLY when mounting nfs") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-18 13:55:17 +01:00
Trond Myklebust	1caf1aa241	Revert "nfs: clear SB_RDONLY before getting superblock" [ Upstream commit `d216b698d4` ] This reverts commit `8cd9b78594`. Silently ignoring the "ro" and "rw" mount options causes user confusion, and regressions. Reported-by: Alkis Georgopoulos <alkisg@gmail.com> Cc: Li Lingfeng <lilingfeng3@huawei.com> Fixes: `8cd9b78594` ("nfs: clear SB_RDONLY before getting superblock") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-18 13:55:17 +01:00
Trond Myklebust	acd4088a25	Revert "nfs: ignore SB_RDONLY when remounting nfs" [ Upstream commit `400fa37afb` ] This reverts commit `80c4de6ab4`. Silently ignoring the "ro" and "rw" mount options causes user confusion, and regressions. Reported-by: Alkis Georgopoulos <alkisg@gmail.com> Cc: Li Lingfeng <lilingfeng3@huawei.com> Fixes: `80c4de6ab4` ("nfs: ignore SB_RDONLY when remounting nfs") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-18 13:55:17 +01:00
Jonathan Curley	59947dff0f	NFSv4/pNFS: Clear NFS_INO_LAYOUTCOMMIT in pnfs_mark_layout_stateid_invalid [ Upstream commit `e0f8058f2c` ] Fixes a crash when layout is null during this call stack: write_inode -> nfs4_write_inode -> pnfs_layoutcommit_inode pnfs_set_layoutcommit relies on the lseg refcount to keep the layout around. Need to clear NFS_INO_LAYOUTCOMMIT otherwise we might attempt to reference a null layout. Fixes: `fe1cf9469d` ("pNFS: Clear all layout segment state in pnfs_mark_layout_stateid_invalid") Signed-off-by: Jonathan Curley <jcurley@purestorage.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-18 13:55:17 +01:00
Trond Myklebust	fa561b29b7	NFS: Initialise verifiers for visible dentries in _nfs4_open_and_get_state [ Upstream commit `0f900f1100` ] Ensure that the verifiers are initialised before calling d_splice_alias() in _nfs4_open_and_get_state(). Reported-by: Michael Stoler <michael.stoler@vastdata.com> Fixes: `cf5b4059ba` ("NFSv4: Fix races between open and dentry revalidation") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-18 13:55:17 +01:00
NeilBrown	ef97a2a5c1	nfs/vfs: discard d_exact_alias() [ Upstream commit `3ff6c8707c` ] d_exact_alias() is a descendant of d_add_unique(), which was introduced 20 years ago, most likely to work around problems with NFS servers of the time. It is now not used in several situations where it was originally needed and there have been no reports of problems - presumably the old NFS servers have been improved. The only place it is now used is in NFSv4 code, and the old problematic servers are thought to have been v2/v3 only. There is no clear benefit in reusing an unhashed() dentry which happens to have the same name as the dentry we are adding. So this patch removes d_exact_alias() and the one place that it is used. Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: NeilBrown <neilb@suse.de> Link: https://lore.kernel.org/r/20250226062135.2043651-2-neilb@suse.de Signed-off-by: Christian Brauner <brauner@kernel.org> Stable-dep-of: `0f900f1100` ("NFS: Initialise verifiers for visible dentries in _nfs4_open_and_get_state") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-18 13:55:17 +01:00
Trond Myklebust	af4c780b9f	NFS: Initialise verifiers for visible dentries in nfs_atomic_open() [ Upstream commit `518c32a1bc` ] Ensure that the verifiers are initialised before calling d_splice_alias() in nfs_atomic_open(). Reported-by: Michael Stoler <michael.stoler@vastdata.com> Fixes: `809fd143de` ("NFSv4: Ensure nfs_atomic_open set the dentry verifier on ENOENT") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-18 13:55:17 +01:00
Trond Myklebust	85d84f6c98	NFS: Initialise verifiers for visible dentries in readdir and lookup [ Upstream commit `9bd545539b` ] Ensure that the verifiers are initialised before calling d_splice_alias() in both nfs_prime_dcache() and nfs_lookup(). Reported-by: Michael Stoler <michael.stoler@vastdata.com> Fixes: `a1147b8281` ("NFS: Fix up directory verifier races") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-18 13:55:17 +01:00
Trond Myklebust	ef7a9c2fae	NFS: Avoid changing nlink when file removes and attribute updates race [ Upstream commit `bd4928ec79` ] If a file removal races with another operation that updates its attributes, then skip the change to nlink, and just mark the attributes as being stale. Reported-by: Aiden Lambert <alambert48@gatech.edu> Fixes: `59a707b0d4` ("NFS: Ensure we revalidate the inode correctly after remove or rename") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-18 13:55:16 +01:00
Dai Ngo	b2e4cda71e	NFS: Fix LTP test failures when timestamps are delegated [ Upstream commit `b623390045` ] The utimes01 and utime06 tests fail when delegated timestamps are enabled, specifically in subtests that modify the atime and mtime fields using the 'nobody' user ID. The problem can be reproduced as follows: # echo "/media *(rw,no_root_squash,sync)" >> /etc/exports # exportfs -ra # mount -o rw,nfsvers=4.2 127.0.0.1:/media /tmpdir # cd /opt/ltp # ./runltp -d /tmpdir -s utimes01 # ./runltp -d /tmpdir -s utime06 This issue occurs because nfs_setattr does not verify the inode's UID against the caller's fsuid when delegated timestamps are permitted for the inode. This patch adds the UID check and, if it does not match, the request is sent to the server for permission checking. Fixes: `e12912d941` ("NFSv4: Add support for delegated atime and mtime attributes") Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-11-24 10:35:55 +01:00
Trond Myklebust	35517f62a0	NFSv4: Fix an incorrect parameter when calling nfs4_call_sync() [ Upstream commit `1f214e9c3a` ] The Smatch static checker noted that in _nfs4_proc_lookupp(), the flag RPC_TASK_TIMEOUT is being passed as an argument to nfs4_init_sequence(), which is clearly incorrect. Since LOOKUPP is an idempotent operation, nfs4_init_sequence() should not ask the server to cache the result. The RPC_TASK_TIMEOUT flag needs to be passed down to the RPC layer. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Reported-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Fixes: `76998ebb91` ("NFSv4: Observe the NFS_MOUNT_SOFTREVAL flag in _nfs4_proc_lookupp") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-11-24 10:35:55 +01:00
Yang Xiuwei	b058e49fd6	NFS: sysfs: fix leak when nfs_client kobject add fails [ Upstream commit `7a7a345652` ] If adding the second kobject fails, drop both references to avoid sysfs residue and memory leak. Fixes: `e96f9268ee` ("NFS: Make all of /sys/fs/nfs network-namespace unique") Signed-off-by: Yang Xiuwei <yangxiuwei@kylinos.cn> Reviewed-by: Benjamin Coddington <ben.coddington@hammerspace.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-11-24 10:35:54 +01:00
Trond Myklebust	bd4064f18d	NFSv2/v3: Fix error handling in nfs_atomic_open_v23() [ Upstream commit `85d2c2392a` ] When nfs_do_create() returns an EEXIST error, it means that a regular file could not be created. That could mean that a symlink needs to be resolved. If that's the case, a lookup needs to be kicked off. Reported-by: Stephen Abbene <sabbene87@gmail.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=220710 Fixes: `7c6c5249f0` ("NFS: add atomic_open for NFSv3 to handle O_TRUNC correctly.") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-11-24 10:35:54 +01:00
Al Viro	7da2c13e73	simplify nfs_atomic_open_v23() [ Upstream commit `aae9db5739` ] 1) finish_no_open() takes ERR_PTR() as dentry now. 2) caller of ->atomic_open() will call d_lookup_done() itself, no need to do it here. Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Stable-dep-of: `85d2c2392a` ("NFSv2/v3: Fix error handling in nfs_atomic_open_v23()") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-11-24 10:35:54 +01:00
Trond Myklebust	8961b12d5a	pnfs: Set transport security policy to RPC_XPRTSEC_NONE unless using TLS [ Upstream commit `8ab523ce78` ] The default setting for the transport security policy must be RPC_XPRTSEC_NONE, when using a TCP or RDMA connection without TLS. Conversely, when using TLS, the security policy needs to be set. Fixes: `6c0a8c5fcf` ("NFS: Have struct nfs_client carry a TLS policy field") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-11-24 10:35:54 +01:00
Trond Myklebust	b8031e779a	pnfs: Fix TLS logic in _nfs4_pnfs_v4_ds_connect() [ Upstream commit `28e19737e1` ] Don't try to add an RDMA transport to a client that is already marked as being a TCP/TLS transport. Fixes: `a35518cae4` ("NFSv4.1/pnfs: fix NFS with TLS in pnfs") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-11-24 10:35:54 +01:00
Scott Mayhew	25fbc3c27f	NFS: check if suid/sgid was cleared after a write as needed [ Upstream commit `9ff022f382` ] I noticed xfstests generic/193 and generic/355 started failing against knfsd after commit `e7a8ebc305` ("NFSD: Offer write delegation for OPEN with OPEN4_SHARE_ACCESS_WRITE"). I ran those same tests against ONTAP (which has had write delegation support for a lot longer than knfsd) and they fail there too... so while it's a new failure against knfsd, it isn't an entirely new failure. Add the NFS_INO_REVAL_FORCED flag so that the presence of a delegation doesn't keep the inode from being revalidated to fetch the updated mode. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-11-24 10:35:48 +01:00
Joshua Watt	dfd7e631a7	NFS4: Apply delay_retrans to async operations [ Upstream commit `7a84394f02` ] The setting of delay_retrans is applied to synchronous RPC operations because the retransmit count is stored in the same struct nfs4_exception that is passed each time an error is checked. However, for asynchronous operations (READ, WRITE, LOCKU, CLOSE, DELEGRETURN), a new struct nfs4_exception is made on the stack each time the task callback is invoked. This means that the retransmit count is always zero and thus delay_retrans never takes effect. Apply delay_retrans to these operations by tracking and updating their retransmit count. Change-Id: Ieb33e046c2b277cb979caa3faca7f52faf0568c9 Signed-off-by: Joshua Watt <jpewhacker@gmail.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-11-24 10:35:47 +01:00
Joshua Watt	ba6fdd9b4d	NFS4: Fix state renewals missing after boot [ Upstream commit `9bb3baa9d1` ] Since the last renewal time was initialized to 0 and jiffies start counting at -5 minutes, any clients connected in the first 5 minutes after a reboot would have their renewal timer set to a very long interval. If the connection was idle, this would result in the client state timing out on the server and the next call to the server would return NFS4ERR_BADSESSION. Fix this by initializing the last renewal time to the current jiffies instead of 0. Signed-off-by: Joshua Watt <jpewhacker@gmail.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-11-24 10:35:47 +01:00
Al Viro	40be5b9080	nfs4_setup_readdir(): insufficient locking for ->d_parent->d_inode dereferencing [ Upstream commit `a890a2e339` ] Theoretically it's an oopsable race, but I don't believe one can manage to hit it on real hardware; might become doable on a KVM, but it still won't be easy to attack. Anyway, it's easy to deal with - since xdr_encode_hyper() is just a call of put_unaligned_be64(), we can put that under ->d_lock and be done with that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-11-13 15:34:29 -05:00
Anthony Iliopoulos	0fc9604a42	NFSv4.1: fix mount hang after CREATE_SESSION failure [ Upstream commit `bf75ad0968` ] When client initialization goes through server trunking discovery, it schedules the state manager and then sleeps waiting for nfs_client initialization completion. The state manager can fail during state recovery, and specifically in lease establishment as nfs41_init_clientid() will bail out in case of errors returned from nfs4_proc_create_session(), without ever marking the client ready. The session creation can fail for a variety of reasons e.g. during backchannel parameter negotiation, with status -EINVAL. The error status will propagate all the way to the nfs4_state_manager but the client status will not be marked, and thus the mount process will remain blocked waiting. Fix it by adding -EINVAL error handling to nfs4_state_manager(). Signed-off-by: Anthony Iliopoulos <ailiop@suse.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-11-13 15:34:29 -05:00
Olga Kornievskaia	4904f473c4	NFSv4: handle ERR_GRACE on delegation recalls [ Upstream commit `be390f9524` ] RFC7530 states that clients should be prepared for the return of NFS4ERR_GRACE errors for non-reclaim lock and I/O requests. Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-11-13 15:34:29 -05:00
NeilBrown	0a1ee3c932	nfsd: don't use sv_nrthreads in connection limiting calculations. [ Upstream commit `eccbbc7c00` ] The heuristic for limiting the number of incoming connections to nfsd currently uses sv_nrthreads - allowing more connections if more threads were configured. A future patch will allow the number of threads to grow dynamically so that there will be no need to configure sv_nrthreads. So we need a different solution for limiting connections. It isn't clear what problem is solved by limiting connections (as mentioned in a code comment) but the most likely problem is a connection storm - many connections that are not doing productive work. These will be closed after about 6 minutes already but it might help to slow down a storm. This patch adds a per-connection flag XPT_PEER_VALID which indicates that the peer has presented a filehandle for which it has some sort of access, i.e. the peer is known to be trusted in some way. We now only count connections which have NOT been determined to be valid. There should be relatively few of these at any given time. If the number of non-validated peers exceeds a limit - currently 64 - we close the oldest non-validated peer to avoid having too many of these useless connections. Note that this patch significantly changes the meaning of the various configuration parameters for "max connections". The next patch will remove all of these. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Stable-dep-of: `898374fdd7` ("nfsd: unregister with rpcbind when deleting a transport") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:34:03 +02:00
Anthony Iliopoulos	35b11653da	NFSv4.1: fix backchannel max_resp_sz verification check [ Upstream commit `191512355e` ] When the client max_resp_sz is larger than what the server encodes in its reply, the nfs4_verify_back_channel_attrs() check fails and this causes nfs4_proc_create_session() to fail, in cases where the client page size is larger than that of the server and the server does not want to negotiate upwards. While this is not a problem with the Linux NFS server, which will reflect the proposed value in its reply irrespective of the local page size, other NFS server implementations may insist on their own max_resp_sz value, which could be smaller. Fix this by accepting smaller max_resp_sz values from the server, as this does not violate the protocol. The server is allowed to decrease but not increase the proposed size, and as such values smaller than the client-proposed ones are valid. Fixes: `43c2e885be` ("nfs4: fix channel attribute sanity-checks") Signed-off-by: Anthony Iliopoulos <ailiop@suse.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-15 12:00:16 +02:00
Jonathan Curley	f15ebc876f	NFSv4/flexfiles: Fix layout merge mirror check. [ Upstream commit `dd2fa82473` ] Typo in ff_lseg_match_mirrors makes the diff ineffective. This results in merge happening all the time. Merge happening all the time is problematic because it marks lsegs invalid. Marking lsegs invalid causes all outstanding IO to get restarted with EAGAIN and connections to get closed. Closing connections constantly triggers race conditions in the RDMA implementation... Fixes: `660d1eb223` ("pNFS/flexfile: Don't merge layout segments if the mirrors don't match") Signed-off-by: Jonathan Curley <jcurley@purestorage.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-09-19 16:35:44 +02:00
Trond Myklebust	b7c6c76c85	NFS: nfs_invalidate_folio() must observe the offset and size arguments [ Upstream commit `b7b8574225` ] If we're truncating part of the folio, then we need to write out the data on the part that is not covered by the cancellation. Fixes: `d47992f86b` ("mm: change invalidatepage prototype to accept length") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-09-19 16:35:44 +02:00
Trond Myklebust	e1651ba799	NFSv4.2: Serialise O_DIRECT i/o and copy range [ Upstream commit `ca247c8990` ] Ensure that all O_DIRECT reads and writes complete before copying a file range, so that the destination is up to date. Fixes: `a5864c999d` ("NFS: Do not serialise O_DIRECT reads and writes") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-09-19 16:35:44 +02:00
Trond Myklebust	fc0e6342ad	NFSv4.2: Serialise O_DIRECT i/o and clone range [ Upstream commit `c80ebeba11` ] Ensure that all O_DIRECT reads and writes complete before cloning a file range, so that both the source and destination are up to date. Fixes: `a5864c999d` ("NFS: Do not serialise O_DIRECT reads and writes") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-09-19 16:35:44 +02:00
Trond Myklebust	5eb9e22919	NFSv4.2: Serialise O_DIRECT i/o and fallocate() [ Upstream commit `b93128f297` ] Ensure that all O_DIRECT reads and writes complete before calling fallocate so that we don't race w.r.t. attribute updates. Fixes: `99f2378322` ("NFSv4.2: Always flush out writes in nfs42_proc_fallocate()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-09-19 16:35:44 +02:00
Trond Myklebust	abfd17844a	NFS: Serialise O_DIRECT i/o and truncate() [ Upstream commit `9eb90f4354` ] Ensure that all O_DIRECT reads and writes are complete, and prevent the initiation of new i/o until the setattr operation that will truncate the file is complete. Fixes: `a5864c999d` ("NFS: Do not serialise O_DIRECT reads and writes") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-09-19 16:35:44 +02:00
Max Kellermann	7f08d14103	fs/nfs/io: make nfs_start_io_*() killable [ Upstream commit `38a125b315` ] This allows killing processes that wait for a lock when one process is stuck waiting for the NFS server. This aims to complete the coverage of NFS operations being killable, like nfs_direct_wait() does, for example. Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Stable-dep-of: `9eb90f4354` ("NFS: Serialise O_DIRECT i/o and truncate()") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-09-19 16:35:44 +02:00
Scott Mayhew	57c1bb02b4	nfs/localio: restore creds before releasing pageio data [ Upstream commit `992203a1fb` ] Otherwise if the nfsd filecache code releases the nfsd_file immediately, it can trigger the BUG_ON(cred == current->cred) in __put_cred() when it puts the nfsd_file->nf_file->f_cred. Fixes: `b9f5dd57f4` ("nfs/localio: use dedicated workqueues for filesystem read and write") Signed-off-by: Scott Mayhew <smayhew@redhat.com> Reviewed-by: Mike Snitzer <snitzer@kernel.org> Link: https://lore.kernel.org/r/20250807164938.2395136-1-smayhew@redhat.com Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-09-19 16:35:43 +02:00
Mike Snitzer	a707c9a838	nfs/localio: add direct IO enablement with sync and async IO support [ Upstream commit `3feec68563` ] This commit simply adds the required O_DIRECT plumbing. It doesn't address the fact that NFS doesn't ensure all writes are page aligned (nor device logical block size aligned as required by O_DIRECT). Because NFS will read-modify-write for IO that isn't aligned, LOCALIO will not use O_DIRECT semantics by default if/when an application requests the use of O_DIRECT. Allow the use of O_DIRECT semantics by: 1: Adding a flag to the nfs_pgio_header struct to allow the NFS O_DIRECT layer to signal that O_DIRECT was used by the application 2: Adding a 'localio_O_DIRECT_semantics' NFS module parameter that when enabled will cause LOCALIO to use O_DIRECT semantics (this may cause IO to fail if applications do not properly align their IO). This commit is derived from code developed by Weston Andros Adamson. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Stable-dep-of: `992203a1fb` ("nfs/localio: restore creds before releasing pageio data") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-09-19 16:35:43 +02:00
Mike Snitzer	b0bf81e05b	nfs/localio: remove extra indirect nfs_to call to check {read,write}_iter [ Upstream commit `0978e5b85f` ] Push the read_iter and write_iter availability checks down to nfs_do_local_read and nfs_do_local_write respectively. This eliminates a redundant nfs_to->nfsd_file_file() call. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Stable-dep-of: `992203a1fb` ("nfs/localio: restore creds before releasing pageio data") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-09-19 16:35:43 +02:00
Trond Myklebust	526d747df4	NFSv4: Clear the NFS_CAP_XATTR flag if not supported by the server [ Upstream commit `4fb2b677fc` ] nfs_server_set_fsinfo() shouldn't assume that NFS_CAP_XATTR is unset on entry to the function. Fixes: `b78ef845c3` ("NFSv4.2: query the server for extended attribute support") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-09-19 16:35:43 +02:00
Trond Myklebust	643ccedbbe	NFSv4: Clear NFS_CAP_OPEN_XOR and NFS_CAP_DELEGTIME if not supported [ Upstream commit `b3ac334360` ] _nfs4_server_capabilities() should clear capabilities that are not supported by the server. Fixes: `d2a00cceb9` ("NFSv4: Detect support for OPEN4_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-09-19 16:35:43 +02:00

6983 Commits