linux-stable-mirror/include/linux/namei.h
Linus Torvalds 7e0e7bd60d Merge tag 'vfs-7.2-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
 "Features:

   - Reduce pipe->mutex contention by pre-allocating pages outside the
     lock in anon_pipe_write().

     anon_pipe_write() called alloc_page() once per page while holding
     pipe->mutex. The allocation can sleep doing direct reclaim and runs
     memcg charging, which extends the critical section and stalls any
     concurrent reader on the same mutex. Now up to 8 pages are
     pre-allocated before the mutex is taken, leftovers are recycled
     into the per-pipe tmp_page[] cache before unlock, and any remainder
     is released after unlock, keeping the allocator out of the critical
     section on both sides. On a writers x readers sweep with 64KB
     writes against a 1 MB pipe throughput improves 6-28% and average
     write latency drops 5-22%; under memory pressure - when the cost of
     holding the mutex across reclaim is highest - throughput improves
     21-48% and latency drops 17-33%. The microbenchmark is added to
     selftests.

   - uaccess/sockptr: fix the ignored_trailing logic in
     copy_struct_to_user() to behave as documented and the usize check
     in copy_struct_from_sockptr() for user pointers, and add
     copy_struct_{from,to}_bounce_buffer() and copy_struct_to_sockptr()
     helpers for upcoming users (IPPROTO_SMBDIRECT, IPPROTO_QUIC).

   - bpf: add a sleepable bpf_real_inode() kfunc that resolves the real
     inode backing a dentry via d_real_inode(). On overlayfs the inode
     attached to the dentry doesn't carry the underlying device
     information; this is used by the filesystem restriction BPF program
     that was merged into systemd.

   - docs: add guidelines for submitting new filesystems, motivated by
     the maintenance burden abandoned and untestable filesystems impose
     on VFS developers, blocking infrastructure work like folio
     conversions and iomap migration.

  Fixes:

   - libfs: set SB_I_NOEXEC and SB_I_NODEV by default in init_pseudo()
     and drop the now-redundant assignments in callers. This began as a
     one-line dma-buf fix for a path_noexec() warning; a pseudo
     filesystem has no reason not to set SB_I_NOEXEC. All init_pseudo()
     callers were audited: the only visible effect is on dma-buf where
     SB_I_NOEXEC silences the warning.

   - Handle set_blocksize() failures in legacy filesystems (bfs, hpfs,
     qnx4, jfs, befs, affs, isofs, minix, ntfs3, omfs). Mounting a
     device with a sector size > PAGE_SIZE crashed roughly half of them;
     the rest had the same missing error handling pattern. Plus a
     follow-up releasing the superblock buffer_head when setting the
     minix v3 block size fails.

   - mount: honour SB_NOUSER in the new mount API.

   - fs/fcntl: fix a SOFTIRQ-unsafe lock order in fasync signaling by
     switching the process-group paths of send_sigio() and send_sigurg()
     from read_lock(&tasklist_lock) to RCU, matching the single-PID
     path.

   - vfs: add an FS_USERNS_DELEGATABLE flag and set it for NFS, fixing
     delegated NFS mounts (fsopen() in a container with the mount
     performed by a privileged daemon) that broke when non-init
     s_user_ns was tied to FS_USERNS_MOUNT.

   - selftests/namespaces: fix a hang in nsid_test where an unreaped
     grandchild kept the TAP pipe write-end open, a waitpid(-1) race in
     listns_efault_test, and a false FAIL on kernels without listns()
     where the tests should SKIP.

   - filelock: fix the break_lease() stub signature for
     CONFIG_FILE_LOCKING=n.

   - init/initramfs_test: wait for the async initramfs unpacking before
     running; the test and do_populate_rootfs() share the parser state.

   - fs/coredump: reduce redundant log noise in
     validate_coredump_safety().

   - iomap: pass the correct length to fserror_report_io() in
     __iomap_write_begin().

   - backing-file: fix the backing_file_open() kerneldoc.

  Cleanups:

   - initramfs: refactor the cpio hex header parsing to use hex2bin()
     instead of the hand-rolled simple_strntoul() which is reverted, and
     extend the initramfs KUnit tests to cover header fields with 0x
     prefixes.

   - Replace __get_free_pages() and friends with kmalloc()/kzalloc()
     across quota, proc, ocfs2/dlm, nilfs2, nfs, nfsd, libfs, jfs, jbd2,
     isofs, fuse, select, namespace, configfs, binfmt_misc, bfs, and the
     do_mounts init code - part of the larger work of replacing page
     allocator calls with kmalloc().

   - Use clear_and_wake_up_bit() in unlock_buffer() and
     journal_end_buffer_io_sync() instead of open-coding the sequence.

   - Drop unused VFS exports: unexport drop_super_exclusive(), remove
     start_removing_user_path_at(), and fold __start_removing_path()
     into start_removing_path().

   - fs/read_write: narrow the __kernel_write() export with
     EXPORT_SYMBOL_FOR_MODULES().

   - vfs: uapi: retire octal and hex constants in favor of (1 << n) for
     the O_ flags. Finding a free bit for a new flag across the
     architectures was needlessly hard with the mixed bases.

   - dcache: add extra sanity checks of dead dentries in dentry_free()
     via a new DENTRY_WARN_ONCE() that also prints d_flags.

   - iov_iter: use kmemdup_array() in dup_iter() to harden the
     allocation against multiplication overflow.

   - fs/pipe: write to ->poll_usage only once.

   - vfs: remove an always-taken if-branch in find_next_fd().

   - dcache: use kmalloc_flex() for struct external_name in __d_alloc().

   - namei: use QSTR() instead of QSTR_INIT() in path_pts().

   - sync_file_range: delete dead S_ISLNK code.

   - Comment fixes: retire a stale comment in fget_task_next() and fix
     assorted spelling mistakes"

* tag 'vfs-7.2-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (73 commits)
  backing-file: fix backing_file_open() kerneldoc parameter
  iomap: pass the correct len to fserror_report_io in __iomap_write_begin
  vfs: add FS_USERNS_DELEGATABLE flag and set it for NFS
  filelock: fix break_lease() stub signature for CONFIG_FILE_LOCKING=n
  vfs: uapi: retire octal and hex numbers in favor of (1 << n) for O_ flags
  bpf: add bpf_real_inode() kfunc
  fs/read_write: Do not export __kernel_write() to the entire world
  libfs: drop redundant SB_I_NOEXEC/SB_I_NODEV in init_pseudo() callers
  libfs: set SB_I_NOEXEC and SB_I_NODEV by default in init_pseudo()
  mount: honour SB_NOUSER in the new mount API
  fs/fcntl: fix SOFTIRQ-unsafe lock order in fasync signaling
  selftests/pipe: add pipe_bench microbenchmark
  fs/pipe: pre-allocate pages outside pipe->mutex in anon_pipe_write
  fs: retire stale comment in fget_task_next()
  fs: fix spelling mistakes in comment
  bfs: replace get_zeroed_page() with kzalloc()
  binfmt_misc: replace __get_free_page() with kmalloc()
  configfs: replace __get_free_pages() with kzalloc()
  fs/namespace: use __getname() to allocate mntpath buffer
  fs/select: replace __get_free_page() with kmalloc()
  ...
2026-06-15 03:59:45 +05:30

218 lines
8.3 KiB
C

/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _LINUX_NAMEI_H
#define _LINUX_NAMEI_H
#include <linux/fs.h>
#include <linux/kernel.h>
#include <linux/path.h>
#include <linux/fcntl.h>
#include <linux/errno.h>
#include <linux/fs_struct.h>
enum { MAX_NESTED_LINKS = 8 };
#define MAXSYMLINKS 40
/* pathwalk mode */
#define LOOKUP_FOLLOW BIT(0) /* follow links at the end */
#define LOOKUP_DIRECTORY BIT(1) /* require a directory */
#define LOOKUP_AUTOMOUNT BIT(2) /* force terminal automount */
#define LOOKUP_EMPTY BIT(3) /* accept empty path [user_... only] */
#define LOOKUP_LINKAT_EMPTY BIT(4) /* Linkat request with empty path. */
#define LOOKUP_DOWN BIT(5) /* follow mounts in the starting point */
#define LOOKUP_MOUNTPOINT BIT(6) /* follow mounts in the end */
#define LOOKUP_REVAL BIT(7) /* tell ->d_revalidate() to trust no cache */
#define LOOKUP_RCU BIT(8) /* RCU pathwalk mode; semi-internal */
#define LOOKUP_CACHED BIT(9) /* Only do cached lookup */
#define LOOKUP_PARENT BIT(10) /* Looking up final parent in path */
/* 5 spare bits for pathwalk */
/* These tell filesystem methods that we are dealing with the final component... */
#define LOOKUP_OPEN BIT(16) /* ... in open */
#define LOOKUP_CREATE BIT(17) /* ... in object creation */
#define LOOKUP_EXCL BIT(18) /* ... in target must not exist */
#define LOOKUP_RENAME_TARGET BIT(19) /* ... in destination of rename() */
/* 4 spare bits for intent */
/* Scoping flags for lookup. */
#define LOOKUP_NO_SYMLINKS BIT(24) /* No symlink crossing. */
#define LOOKUP_NO_MAGICLINKS BIT(25) /* No nd_jump_link() crossing. */
#define LOOKUP_NO_XDEV BIT(26) /* No mountpoint crossing. */
#define LOOKUP_BENEATH BIT(27) /* No escaping from starting point. */
#define LOOKUP_IN_ROOT BIT(28) /* Treat dirfd as fs root. */
/* LOOKUP_* flags which do scope-related checks based on the dirfd. */
#define LOOKUP_IS_SCOPED (LOOKUP_BENEATH | LOOKUP_IN_ROOT)
/* 3 spare bits for scoping */
extern int path_pts(struct path *path);
extern int user_path_at(int, const char __user *, unsigned, struct path *);
extern int kern_path(const char *, unsigned, struct path *);
struct dentry *kern_path_parent(const char *name, struct path *parent);
extern struct dentry *start_creating_path(int, const char *, struct path *, unsigned int);
extern struct dentry *start_creating_user_path(int, const char __user *, struct path *, unsigned int);
extern void end_creating_path(const struct path *, struct dentry *);
extern struct dentry *start_removing_path(const char *, struct path *);
static inline void end_removing_path(const struct path *path, struct dentry *dentry)
{
	end_creating_path(path, dentry);
}
int vfs_path_parent_lookup(struct filename *filename, unsigned int flags,
			   struct path *parent, struct qstr *last,
			   const struct path *root);
int vfs_path_lookup(struct dentry *, struct vfsmount *, const char *,
		    unsigned int, struct path *);
extern struct dentry *try_lookup_noperm(struct qstr *, struct dentry *);
extern struct dentry *lookup_noperm(struct qstr *, struct dentry *);
extern struct dentry *lookup_noperm_unlocked(struct qstr *, struct dentry *);
extern struct dentry *lookup_noperm_positive_unlocked(struct qstr *, struct dentry *);
struct dentry *lookup_one(struct mnt_idmap *, struct qstr *, struct dentry *);
struct dentry *lookup_one_unlocked(struct mnt_idmap *idmap,
				   struct qstr *name, struct dentry *base);
struct dentry *lookup_one_positive_unlocked(struct mnt_idmap *idmap,
					    struct qstr *name,
					    struct dentry *base);
struct dentry *lookup_one_positive_killable(struct mnt_idmap *idmap,
					    struct qstr *name,
					    struct dentry *base);
struct dentry *start_creating(struct mnt_idmap *idmap, struct dentry *parent,
			      struct qstr *name);
struct dentry *start_removing(struct mnt_idmap *idmap, struct dentry *parent,
			      struct qstr *name);
struct dentry *start_creating_killable(struct mnt_idmap *idmap,
				       struct dentry *parent,
				       struct qstr *name);
struct dentry *start_removing_killable(struct mnt_idmap *idmap,
				       struct dentry *parent,
				       struct qstr *name);
struct dentry *start_creating_noperm(struct dentry *parent, struct qstr *name);
struct dentry *start_removing_noperm(struct dentry *parent, struct qstr *name);
struct dentry *start_creating_dentry(struct dentry *parent,
				     struct dentry *child);
struct dentry *start_removing_dentry(struct dentry *parent,
				     struct dentry *child);
/**
 * end_creating - finish action started with start_creating()
 * @child: dentry returned by start_creating() or vfs_mkdir()
 *
 * Unlock and release the child. This can be called after
 * start_creating() whether that function succeeded or not,
 * but it is not needed on failure.
 *
 * If vfs_mkdir() was called then the value returned from that function
 * should be given for @child rather than the original dentry, as vfs_mkdir()
 * may have provided a new dentry. If vfs_mkdir() was not called, @child is
 * the dentry returned by start_creating().
 */
static inline void end_creating(struct dentry *child)
{
	end_dirop(child);
}
/**
 * end_creating_keep - finish action started with start_creating() and return result
 * @child: dentry returned by start_creating() or vfs_mkdir()
 *
 * Unlock and return the child. Unlike end_creating(), the reference to the
 * child is not released but handed over to the caller. This can be called
 * after start_creating() whether that function succeeded or not, but it is
 * not needed on failure.
 *
 * If vfs_mkdir() was called then the value returned from that function
 * should be given for @child rather than the original dentry, as vfs_mkdir()
 * may have provided a new dentry.
 *
 * Returns: @child, which may be a dentry or an error.
 */
static inline struct dentry *end_creating_keep(struct dentry *child)
{
	if (!IS_ERR(child))
		dget(child);
	end_dirop(child);
	return child;
}
/**
 * end_removing - finish action started with start_removing()
 * @child: dentry returned by start_removing()
 *
 * Unlock and release the child.
 *
 * This is identical to end_dirop(). It can be passed the result of
 * start_removing() whether that was successful or not, but it is not needed
 * if start_removing() failed.
 */
static inline void end_removing(struct dentry *child)
{
	end_dirop(child);
}
extern int follow_down_one(struct path *);
extern int follow_down(struct path *path, unsigned int flags);
extern int follow_up(struct path *);
int start_renaming(struct renamedata *rd, int lookup_flags,
		   struct qstr *old_last, struct qstr *new_last);
int start_renaming_dentry(struct renamedata *rd, int lookup_flags,
			  struct dentry *old_dentry, struct qstr *new_last);
int start_renaming_two_dentries(struct renamedata *rd,
				struct dentry *old_dentry, struct dentry *new_dentry);
void end_renaming(struct renamedata *rd);
/**
 * mode_strip_umask - handle vfs umask stripping
 * @dir: parent directory of the new inode
 * @mode: mode of the new inode to be created in @dir
 *
 * In most filesystems, umask stripping depends on whether or not the
 * filesystem supports POSIX ACLs. If the filesystem doesn't support them,
 * umask stripping is done directly here. If the filesystem does support
 * POSIX ACLs, umask stripping is deferred until the filesystem calls
 * posix_acl_create().
 *
 * Some filesystems (like NFSv4) also want to avoid umask stripping by the
 * VFS, but don't support POSIX ACLs. Those filesystems can set SB_I_NOUMASK
 * to get this effect without declaring that they support POSIX ACLs.
 *
 * Returns: @mode, with the bits of the current umask cleared unless the
 * filesystem supports POSIX ACLs or has set SB_I_NOUMASK.
 */
static inline umode_t __must_check mode_strip_umask(const struct inode *dir, umode_t mode)
{
	if (!IS_POSIXACL(dir) && !(dir->i_sb->s_iflags & SB_I_NOUMASK))
		mode &= ~current_umask();
	return mode;
}
extern int __must_check nd_jump_link(const struct path *path);
static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
{
	((char *) name)[min(len, maxlen)] = '\0';
}
/**
 * retry_estale - determine whether the caller should retry an operation
 * @error: the error that would currently be returned
 * @flags: flags being used for next lookup attempt
 *
 * Check whether the error code is -ESTALE and, if so, decide whether to
 * retry the call based on whether @flags already has LOOKUP_REVAL set.
 *
 * Returns: true if the caller should try the operation again.
 */
static inline bool
retry_estale(const long error, const unsigned int flags)
{
	return unlikely(error == -ESTALE && !(flags & LOOKUP_REVAL));
}
#endif /* _LINUX_NAMEI_H */
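The LOOKUP_* definitions above split the flag bits into three groups (pathwalk, intent and scoping), and the "spare bits" comments record how many bits each group still has free. The userspace sketch below re-declares the flag values to check those counts and to show how LOOKUP_IS_SCOPED is meant to be tested. The BIT() stand-in, the *_USED and *_MASK group macros, and the helper functions are hypothetical illustrations rather than kernel API; the group boundaries (bits 0-15, 16-23 and 24-31) are inferred from the spare-bit comments, not defined by the header.

```c
#include <assert.h>

/* Userspace stand-in for the kernel's BIT(): an unsigned long with bit nr set. */
#define BIT(nr) (1UL << (nr))

/* Flag values copied from namei.h. */
#define LOOKUP_FOLLOW        BIT(0)
#define LOOKUP_DIRECTORY     BIT(1)
#define LOOKUP_AUTOMOUNT     BIT(2)
#define LOOKUP_EMPTY         BIT(3)
#define LOOKUP_LINKAT_EMPTY  BIT(4)
#define LOOKUP_DOWN          BIT(5)
#define LOOKUP_MOUNTPOINT    BIT(6)
#define LOOKUP_REVAL         BIT(7)
#define LOOKUP_RCU           BIT(8)
#define LOOKUP_CACHED        BIT(9)
#define LOOKUP_PARENT        BIT(10)
#define LOOKUP_OPEN          BIT(16)
#define LOOKUP_CREATE        BIT(17)
#define LOOKUP_EXCL          BIT(18)
#define LOOKUP_RENAME_TARGET BIT(19)
#define LOOKUP_NO_SYMLINKS   BIT(24)
#define LOOKUP_NO_MAGICLINKS BIT(25)
#define LOOKUP_NO_XDEV       BIT(26)
#define LOOKUP_BENEATH       BIT(27)
#define LOOKUP_IN_ROOT       BIT(28)
#define LOOKUP_IS_SCOPED     (LOOKUP_BENEATH | LOOKUP_IN_ROOT)

/* Hypothetical groupings: which bits each group actually uses. */
#define PATHWALK_USED (LOOKUP_FOLLOW | LOOKUP_DIRECTORY | LOOKUP_AUTOMOUNT | \
		       LOOKUP_EMPTY | LOOKUP_LINKAT_EMPTY | LOOKUP_DOWN | \
		       LOOKUP_MOUNTPOINT | LOOKUP_REVAL | LOOKUP_RCU | \
		       LOOKUP_CACHED | LOOKUP_PARENT)
#define INTENT_USED   (LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_EXCL | \
		       LOOKUP_RENAME_TARGET)
#define SCOPING_USED  (LOOKUP_NO_SYMLINKS | LOOKUP_NO_MAGICLINKS | \
		       LOOKUP_NO_XDEV | LOOKUP_BENEATH | LOOKUP_IN_ROOT)

/* Hypothetical group ranges, inferred from the spare-bit comments. */
#define PATHWALK_MASK 0x0000ffffUL /* bits 0-15 */
#define INTENT_MASK   0x00ff0000UL /* bits 16-23 */
#define SCOPING_MASK  0xff000000UL /* bits 24-31 */

/* Compile-time checks: every flag stays inside its group's range. */
static_assert((PATHWALK_USED & ~PATHWALK_MASK) == 0, "pathwalk flag outside bits 0-15");
static_assert((INTENT_USED & ~INTENT_MASK) == 0, "intent flag outside bits 16-23");
static_assert((SCOPING_USED & ~SCOPING_MASK) == 0, "scoping flag outside bits 24-31");

/* Count set bits by clearing the lowest set bit on each iteration. */
static unsigned int count_bits(unsigned long mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/* Number of bits in @group_mask that no flag in @used occupies. */
static unsigned int spare_bits(unsigned long group_mask, unsigned long used)
{
	return count_bits(group_mask & ~used);
}

/* Nonzero if @flags asks for dirfd-based scope checks. */
static int lookup_is_scoped(unsigned long flags)
{
	return (flags & LOOKUP_IS_SCOPED) != 0;
}
```

With these values, spare_bits() reports 5, 4 and 3 free bits for the three groups, matching the comments in the header, and any flag combination containing LOOKUP_BENEATH or LOOKUP_IN_ROOT counts as scoped.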
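end_creating() and end_creating_keep() differ only in the dget() taken before end_dirop(). The userspace model below shows why that extra reference matters, assuming end_dirop() unlocks the parent and drops one reference on a non-error child (matching the "Unlock and release the child" wording of end_creating()) and ignores error pointers. struct hyp_dentry, the dget()/dput() stand-ins and the ERR_PTR()/PTR_ERR()/IS_ERR() reimplementations are hypothetical and only mimic the kernel helpers of the same names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace versions of the error-pointer helpers: the top MAX_ERRNO
 * addresses encode negative errno values. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *) error; }
static inline long PTR_ERR(const void *ptr) { return (long) ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t) ptr >= (uintptr_t) -MAX_ERRNO;
}

/* Hypothetical dentry: only a reference count and the parent-lock state. */
struct hyp_dentry {
	int refcount;
	bool parent_locked;
};

static void dget(struct hyp_dentry *d) { d->refcount++; }
static void dput(struct hyp_dentry *d) { d->refcount--; }

/* Assumed model of end_dirop(): for a non-error child, unlock the
 * parent and drop one reference; error pointers are left alone. */
static void end_dirop(struct hyp_dentry *d)
{
	if (IS_ERR(d))
		return;
	assert(d->parent_locked);
	d->parent_locked = false;
	dput(d);
}

/* Same shape as end_creating() in namei.h. */
static inline void end_creating(struct hyp_dentry *child)
{
	end_dirop(child);
}

/* Same shape as end_creating_keep(): the extra reference balances the
 * one end_dirop() drops, so the caller keeps the child. */
static inline struct hyp_dentry *end_creating_keep(struct hyp_dentry *child)
{
	if (!IS_ERR(child))
		dget(child);
	end_dirop(child);
	return child;
}
```

In this model a successful start_creating() is represented by a dentry holding one reference with its parent locked. end_creating() leaves the count at zero, while end_creating_keep() leaves it at one, and both pass an error pointer through untouched.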
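mode_strip_umask() applies the umask only when the filesystem neither supports POSIX ACLs nor sets SB_I_NOUMASK. A minimal userspace sketch of that decision follows. The hypothetical struct hypothetical_dir replaces the IS_POSIXACL() and s_iflags checks on the inode and superblock, and the umask is passed in explicitly instead of being read with current_umask().

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical stand-in for the two properties mode_strip_umask() reads. */
struct hypothetical_dir {
	bool posix_acl;   /* models IS_POSIXACL(dir) */
	bool sb_noumask;  /* models SB_I_NOUMASK in dir->i_sb->s_iflags */
};

/* Same decision as mode_strip_umask(), with the umask passed explicitly
 * instead of being read via current_umask(). */
static mode_t strip_umask(const struct hypothetical_dir *dir, mode_t mode,
			  mode_t umask_val)
{
	if (!dir->posix_acl && !dir->sb_noumask)
		mode &= ~umask_val;
	return mode;
}
```

With a umask of 022, a plain directory turns a requested 0666 into 0644, while the ACL and SB_I_NOUMASK cases return 0666 unchanged. In the ACL case the umask is still honoured later, when the filesystem calls posix_acl_create().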
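nd_terminate_link() stores the terminating NUL at index min(len, maxlen), so the buffer must be at least maxlen + 1 bytes long; a caller with an N-byte buffer would pass N - 1 as @maxlen. The sketch below copies the helper into userspace and pairs it with a hypothetical copy_link_body() that truncates an unterminated symlink body into a fixed-size buffer. copy_link_body() and the MIN_SIZE() macro are illustrative assumptions, not kernel code; the kernel's min() additionally type-checks its arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Plain minimum; unlike the kernel's min(), no type checking. */
#define MIN_SIZE(a, b) ((a) < (b) ? (a) : (b))

/* Userspace copy of nd_terminate_link(): writes the NUL at
 * min(len, maxlen), so @name needs room for maxlen + 1 bytes. */
static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
{
	((char *) name)[MIN_SIZE(len, maxlen)] = '\0';
}

/* Hypothetical helper: copy a @len-byte symlink body (not NUL-terminated)
 * into a @bufsize-byte buffer, truncating it if it does not fit. */
static void copy_link_body(char *buf, size_t bufsize, const char *body, size_t len)
{
	assert(bufsize > 0);
	memcpy(buf, body, MIN_SIZE(len, bufsize - 1));
	nd_terminate_link(buf, len, bufsize - 1);
}
```

Copying the 6-byte body "target" into a 5-byte buffer keeps "targ" plus the terminator, and a body that fits is terminated right after its last byte.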
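retry_estale() fits a retry loop: the first attempt runs without LOOKUP_REVAL, and on -ESTALE the caller sets LOOKUP_REVAL and tries again, after which retry_estale() returns false and the loop ends. The sketch below models that loop in userspace. hypothetical_lookup() and its always_stale switch are invented stand-ins for a real path operation, and the unlikely() branch hint is dropped.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define LOOKUP_REVAL (1U << 7) /* BIT(7), as in namei.h */

/* Userspace copy of retry_estale() without the unlikely() hint. */
static inline bool retry_estale(const long error, const unsigned int flags)
{
	return error == -ESTALE && !(flags & LOOKUP_REVAL);
}

static int attempts;      /* how many times the lookup ran */
static bool always_stale; /* simulate a handle that stays stale after revalidation */

/* Hypothetical lookup: stale unless LOOKUP_REVAL forces revalidation. */
static long hypothetical_lookup(unsigned int flags)
{
	attempts++;
	if (always_stale || !(flags & LOOKUP_REVAL))
		return -ESTALE;
	return 0;
}

/* Retry once with LOOKUP_REVAL set; retry_estale() then returns false. */
static long lookup_with_retry(unsigned int lookup_flags)
{
	long error;

retry:
	error = hypothetical_lookup(lookup_flags);
	if (retry_estale(error, lookup_flags)) {
		lookup_flags |= LOOKUP_REVAL;
		goto retry;
	}
	return error;
}
```

Because LOOKUP_REVAL is part of the condition, at most one retry happens even when the operation keeps failing with -ESTALE, so the loop cannot spin.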