2850 Commits

Author SHA1 Message Date
Nathan Chancellor
f709859f33 init/Kconfig: Adjust fixed clang version for __builtin_counted_by_ref
Commit d39a1d7486 ("compiler_types: Disable __builtin_counted_by_ref
for Clang") used 22.0.0 as the fixed version for a compiler crash but
the fix was only merged in main (23.0.0) and release/22.x (22.1.0). With
the current fixed version number, prerelease or Android LLVM 22 builds
will still be able to hit the compiler crash when building the kernel.
This can be particularly disruptive when bisecting LLVM.

Use 21.1.0 as the fixed version number to ensure the fix for this crash
is always present.

Fixes: d39a1d7486 ("compiler_types: Disable __builtin_counted_by_ref for Clang")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20260223-fix-clang-version-builtin-counted-by-ref-v1-1-3ea478a24f0a@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-23 14:35:16 -08:00
Linus Torvalds
323bbfcf1e Convert 'alloc_flex' family to use the new default GFP_KERNEL argument
This is the exact same thing as the 'alloc_obj()' version, only much
smaller because there are a lot fewer users of the *alloc_flex()
interface.

As with alloc_obj() version, this was done entirely with mindless brute
force, using the same script, except using 'flex' in the pattern rather
than 'objs*'.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
Linus Torvalds
bf4afc53b7 Convert 'alloc_obj' family to use the new default GFP_KERNEL argument
This was done entirely with mindless brute force, using

    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
Kees Cook
69050f8d6d treewide: Replace kmalloc with kmalloc_obj for non-scalar types
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-21 01:02:28 -08:00
Kees Cook
d39a1d7486 compiler_types: Disable __builtin_counted_by_ref for Clang
Unfortunately, there is a corner case of __builtin_counted_by_ref()
usage that crashes[1] Clang since support was introduced in Clang 19.
Disable it prior to Clang 22. Found while tested kmalloc_obj treewide
refactoring (via kmalloc_flex() usage).

Link: https://github.com/llvm/llvm-project/issues/182575 [1]
Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-21 01:01:14 -08:00
Linus Torvalds
136114e0ab Merge tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:

 - "ocfs2: give ocfs2 the ability to reclaim suballocator free bg" saves
   disk space by teaching ocfs2 to reclaim suballocator block group
   space (Heming Zhao)

 - "Add ARRAY_END(), and use it to fix off-by-one bugs" adds the
   ARRAY_END() macro and uses it in various places (Alejandro Colomar)

 - "vmcoreinfo: support VMCOREINFO_BYTES larger than PAGE_SIZE" makes
   the vmcore code future-safe, if VMCOREINFO_BYTES ever exceeds the
   page size (Pnina Feder)

 - "kallsyms: Prevent invalid access when showing module buildid" cleans
   up kallsyms code related to module buildid and fixes an invalid
   access crash when printing backtraces (Petr Mladek)

 - "Address page fault in ima_restore_measurement_list()" fixes a
   kexec-related crash that can occur when booting the second-stage
   kernel on x86 (Harshit Mogalapalli)

 - "kho: ABI headers and Documentation updates" updates the kexec
   handover ABI documentation (Mike Rapoport)

 - "Align atomic storage" adds the __aligned attribute to atomic_t and
   atomic64_t definitions to get natural alignment of both types on
   csky, m68k, microblaze, nios2, openrisc and sh (Finn Thain)

 - "kho: clean up page initialization logic" simplifies the page
   initialization logic in kho_restore_page() (Pratyush Yadav)

 - "Unload linux/kernel.h" moves several things out of kernel.h and into
   more appropriate places (Yury Norov)

 - "don't abuse task_struct.group_leader" removes the usage of
   ->group_leader when it is "obviously unnecessary" (Oleg Nesterov)

 - "list private v2 & luo flb" adds some infrastructure improvements to
   the live update orchestrator (Pasha Tatashin)

* tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (107 commits)
  watchdog/hardlockup: simplify perf event probe and remove per-cpu dependency
  procfs: fix missing RCU protection when reading real_parent in do_task_stat()
  watchdog/softlockup: fix sample ring index wrap in need_counting_irqs()
  kcsan, compiler_types: avoid duplicate type issues in BPF Type Format
  kho: fix doc for kho_restore_pages()
  tests/liveupdate: add in-kernel liveupdate test
  liveupdate: luo_flb: introduce File-Lifecycle-Bound global state
  liveupdate: luo_file: Use private list
  list: add kunit test for private list primitives
  list: add primitives for private list manipulations
  delayacct: fix uapi timespec64 definition
  panic: add panic_force_cpu= parameter to redirect panic to a specific CPU
  netclassid: use thread_group_leader(p) in update_classid_task()
  RDMA/umem: don't abuse current->group_leader
  drm/pan*: don't abuse current->group_leader
  drm/amd: kill the outdated "Only the pthreads threading model is supported" checks
  drm/amdgpu: don't abuse current->group_leader
  android/binder: use same_thread_group(proc->tsk, current) in binder_mmap()
  android/binder: don't abuse current->group_leader
  kho: skip memoryless NUMA nodes when reserving scratch areas
  ...
2026-02-12 12:13:01 -08:00
Linus Torvalds
4cff5c05e0 Merge tag 'mm-stable-2026-02-11-19-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:

 - "powerpc/64s: do not re-activate batched TLB flush" makes
   arch_{enter|leave}_lazy_mmu_mode() nest properly (Alexander Gordeev)

   It adds a generic enter/leave layer and switches architectures to use
   it. Various hacks were removed in the process.

 - "zram: introduce compressed data writeback" implements data
   compression for zram writeback (Richard Chang and Sergey Senozhatsky)

 - "mm: folio_zero_user: clear page ranges" adds clearing of contiguous
   page ranges for hugepages. Large improvements during demand faulting
   are demonstrated (David Hildenbrand)

 - "memcg cleanups" tidies up some memcg code (Chen Ridong)

 - "mm/damon: introduce {,max_}nr_snapshots and tracepoint for damos
   stats" improves DAMOS stat's provided information, deterministic
   control, and readability (SeongJae Park)

 - "selftests/mm: hugetlb cgroup charging: robustness fixes" fixes a few
   issues in the hugetlb cgroup charging selftests (Li Wang)

 - "Fix va_high_addr_switch.sh test failure - again" addresses several
   issues in the va_high_addr_switch test (Chunyu Hu)

 - "mm/damon/tests/core-kunit: extend existing test scenarios" improves
   the KUnit test coverage for DAMON (Shu Anzai)

 - "mm/khugepaged: fix dirty page handling for MADV_COLLAPSE" fixes a
   glitch in khugepaged which was causing madvise(MADV_COLLAPSE) to
   transiently return -EAGAIN (Shivank Garg)

 - "arch, mm: consolidate hugetlb early reservation" reworks and
   consolidates a pile of straggly code related to reservation of
   hugetlb memory from bootmem and creation of CMA areas for hugetlb
   (Mike Rapoport)

 - "mm: clean up anon_vma implementation" cleans up the anon_vma
   implementation in various ways (Lorenzo Stoakes)

 - "tweaks for __alloc_pages_slowpath()" does a little streamlining of
   the page allocator's slowpath code (Vlastimil Babka)

 - "memcg: separate private and public ID namespaces" cleans up the
   memcg ID code and prevents the internal-only private IDs from being
   exposed to userspace (Shakeel Butt)

 - "mm: hugetlb: allocate frozen gigantic folio" cleans up the
   allocation of frozen folios and avoids some atomic refcount
   operations (Kefeng Wang)

 - "mm/damon: advance DAMOS-based LRU sorting" improves DAMOS's movement
   of memory betewwn the active and inactive LRUs and adds auto-tuning
   of the ratio-based quotas and of monitoring intervals (SeongJae Park)

 - "Support page table check on PowerPC" makes
   CONFIG_PAGE_TABLE_CHECK_ENFORCED work on powerpc (Andrew Donnellan)

 - "nodemask: align nodes_and{,not} with underlying bitmap ops" makes
   nodes_and() and nodes_andnot() propagate the return values from the
   underlying bit operations, enabling some cleanup in calling code
   (Yury Norov)

 - "mm/damon: hide kdamond and kdamond_lock from API callers" cleans up
   some DAMON internal interfaces (SeongJae Park)

 - "mm/khugepaged: cleanups and scan limit fix" does some cleanup work
   in khupaged and fixes a scan limit accounting issue (Shivank Garg)

 - "mm: balloon infrastructure cleanups" goes to town on the balloon
   infrastructure and its page migration function. Mainly cleanups, also
   some locking simplification (David Hildenbrand)

 - "mm/vmscan: add tracepoint and reason for kswapd_failures reset" adds
   additional tracepoints to the page reclaim code (Jiayuan Chen)

 - "Replace wq users and add WQ_PERCPU to alloc_workqueue() users" is
   part of Marco's kernel-wide migration from the legacy workqueue APIs
   over to the preferred unbound workqueues (Marco Crivellari)

 - "Various mm kselftests improvements/fixes" provides various unrelated
   improvements/fixes for the mm kselftests (Kevin Brodsky)

 - "mm: accelerate gigantic folio allocation" greatly speeds up gigantic
   folio allocation, mainly by avoiding unnecessary work in
   pfn_range_valid_contig() (Kefeng Wang)

 - "selftests/damon: improve leak detection and wss estimation
   reliability" improves the reliability of two of the DAMON selftests
   (SeongJae Park)

 - "mm/damon: cleanup kdamond, damon_call(), damos filter and
   DAMON_MIN_REGION" does some cleanup work in the core DAMON code
   (SeongJae Park)

 - "Docs/mm/damon: update intro, modules, maintainer profile, and misc"
   performs maintenance work on the DAMON documentation (SeongJae Park)

 - "mm: add and use vma_assert_stabilised() helper" refactors and cleans
   up the core VMA code. The main aim here is to be able to use the mmap
   write lock's lockdep state to perform various assertions regarding
   the locking which the VMA code requires (Lorenzo Stoakes)

 - "mm, swap: swap table phase II: unify swapin use" removes some old
   swap code (swap cache bypassing and swap synchronization) which
   wasn't working very well. Various other cleanups and simplifications
   were made. The end result is a 20% speedup in one benchmark (Kairui
   Song)

 - "enable PT_RECLAIM on more 64-bit architectures" makes PT_RECLAIM
   available on 64-bit alpha, loongarch, mips, parisc, and um. Various
   cleanups were performed along the way (Qi Zheng)

* tag 'mm-stable-2026-02-11-19-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (325 commits)
  mm/memory: handle non-split locks correctly in zap_empty_pte_table()
  mm: move pte table reclaim code to memory.c
  mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE
  mm: convert __HAVE_ARCH_TLB_REMOVE_TABLE to CONFIG_HAVE_ARCH_TLB_REMOVE_TABLE config
  um: mm: enable MMU_GATHER_RCU_TABLE_FREE
  parisc: mm: enable MMU_GATHER_RCU_TABLE_FREE
  mips: mm: enable MMU_GATHER_RCU_TABLE_FREE
  LoongArch: mm: enable MMU_GATHER_RCU_TABLE_FREE
  alpha: mm: enable MMU_GATHER_RCU_TABLE_FREE
  mm: change mm/pt_reclaim.c to use asm/tlb.h instead of asm-generic/tlb.h
  mm/damon/stat: remove __read_mostly from memory_idle_ms_percentiles
  zsmalloc: make common caches global
  mm: add SPDX id lines to some mm source files
  mm/zswap: use %pe to print error pointers
  mm/vmscan: use %pe to print error pointers
  mm/readahead: fix typo in comment
  mm: khugepaged: fix NR_FILE_PAGES and NR_SHMEM in collapse_file()
  mm: refactor vma_map_pages to use vm_insert_pages
  mm/damon: unify address range representation with damon_addr_range
  mm/cma: replace snprintf with strscpy in cma_new_area
  ...
2026-02-12 11:32:37 -08:00
Linus Torvalds
41f1a08645 Merge tag 'kbuild-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux
Pull Kbuild/Kconfig updates from Nathan Chancellor:
 "Kbuild:

   - Drop '*_probe' pattern from modpost section check allowlist, which
     hid legitimate warnings (Johan Hovold)

   - Disable -Wtype-limits altogether, instead of enabling at W=2
     (Vincent Mailhol)

   - Improve UAPI testing to skip testing headers that require a libc
     when CONFIG_CC_CAN_LINK is not set, opening up testing of headers
     with no libc dependencies to more environments (Thomas Weißschuh)

   - Update gendwarfksyms documentation with required dependencies
     (Jihan LIN)

   - Reject invalid LLVM= values to avoid unintentionally falling back
     to system toolchain (Thomas Weißschuh)

   - Add a script to help run the kernel build process in a container
     for consistent environments and testing (Guillaume Tucker)

   - Simplify kallsyms by getting rid of the relative base (Ard
     Biesheuvel)

   - Performance and usability improvements to scripts/make_fit.py
     (Simon Glass)

   - Minor various clean ups and fixes

  Kconfig:

   - Move XPM icons to individual files, clearing up GTK deprecation
     warnings (Rostislav Krasny)

   - Support

        depends on FOO if BAR

     as syntactic sugar for

        depends on FOO || !BAR

     (Nicolas Pitre, Graham Roff)

   - Refactor merge_config.sh to use awk over shell/sed/grep,
     dramatically speeding up processing large number of config
     fragments (Anders Roxell, Mikko Rapeli)"

* tag 'kbuild-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux: (39 commits)
  kbuild: remove dependency of run-command on config
  scripts/make_fit: Compress dtbs in parallel
  scripts/make_fit: Support a few more parallel compressors
  kbuild: Support a FIT_EXTRA_ARGS environment variable
  scripts/make_fit: Move dtb processing into a function
  scripts/make_fit: Support an initial ramdisk
  scripts/make_fit: Speed up operation
  rust: kconfig: Don't require RUST_IS_AVAILABLE for rustc-option
  MAINTAINERS: Add scripts/install.sh into Kbuild entry
  modpost: Amend ppc64 save/restfpr symnames for -Os build
  MIPS: tools: relocs: Ship a definition of R_MIPS_PC32
  streamline_config.pl: remove superfluous exclamation mark
  kbuild: dummy-tools: Add python3
  scripts: kconfig: merge_config.sh: warn on duplicate input files
  scripts: kconfig: merge_config.sh: use awk in checks too
  scripts: kconfig: merge_config.sh: refactor from shell/sed/grep to awk
  kallsyms: Get rid of kallsyms relative base
  mips: Add support for PC32 relocations in vmlinux
  Documentation: dev-tools: add container.rst page
  scripts: add tool to run containerized builds
  ...
2026-02-11 13:40:35 -08:00
Linus Torvalds
36ae1c45b2 Merge tag 'sched-core-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "Scheduler Kconfig space updates:

   - Further consolidate configurable preemption modes (Peter Zijlstra)

     Reduce the number of architectures that are allowed to offer
     PREEMPT_NONE and PREEMPT_VOLUNTARY, reducing the number of
     preemption models from four to just two: 'full' and 'lazy' on
     up-to-date architectures (arm64, loongarch, powerpc, riscv, s390,
     x86).

     None and voluntary are only available as legacy features on
     platforms that don't implement lazy preemption yet, or which don't
     even support preemption.

     The goal is to eventually remove cond_resched() and voluntary
     preemption altogether.

  RSEQ based 'scheduler time slice extension' support (Thomas Gleixner
  and Peter Zijlstra):

  This allows a thread to request a time slice extension when it enters
  a critical section to avoid contention on a resource when the thread
  is scheduled out inside of the critical section.

   - Add fields and constants for time slice extension
   - Provide static branch for time slice extensions
   - Add statistics for time slice extensions
   - Add prctl() to enable time slice extensions
   - Implement sys_rseq_slice_yield()
   - Implement syscall entry work for time slice extensions
   - Implement time slice extension enforcement timer
   - Reset slice extension when scheduled
   - Implement rseq_grant_slice_extension()
   - entry: Hook up rseq time slice extension
   - selftests: Implement time slice extension test
   - Allow registering RSEQ with slice extension
   - Move slice_ext_nsec to debugfs
   - Lower default slice extension
   - selftests/rseq: Add rseq slice histogram script

  Scheduler performance/scalability improvements:

   - Update rq->avg_idle when a task is moved to an idle CPU, which
     improves the scalability of various workloads (Shubhang Kaushik)

   - Reorder fields in 'struct rq' for better caching (Blake Jones)

   - Fair scheduler SMP NOHZ balancing code speedups (Shrikanth Hegde):
      - Move checking for nohz cpus after time check
      - Change likelyhood of nohz.nr_cpus
      - Remove nohz.nr_cpus and use weight of cpumask instead

   - Avoid false sharing for sched_clock_irqtime (Wangyang Guo)

   - Cleanups (Yury Norov):
      - Drop useless cpumask_empty() in find_energy_efficient_cpu()
      - Simplify task_numa_find_cpu()
      - Use cpumask_weight_and() in sched_balance_find_dst_group()

  DL scheduler updates:

   - Add a deadline server for sched_ext tasks (by Andrea Righi and Joel
     Fernandes, with fixes by Peter Zijlstra)

  RT scheduler updates:

   - Skip currently executing CPU in rto_next_cpu() (Chen Jinghuang)

  Entry code updates and performance improvements (Jinjie Ruan)

  This is part of the scheduler tree in this cycle due to inter-
  dependencies with the RSEQ based time slice extension work:

    - Remove unused syscall argument from syscall_trace_enter()
    - Rework syscall_exit_to_user_mode_work() for architecture reuse
    - Add arch_ptrace_report_syscall_entry/exit()
    - Inline syscall_exit_work() and syscall_trace_enter()

  Scheduler core updates (Peter Zijlstra):

   - Rework sched_class::wakeup_preempt() and rq_modified_*()
   - Avoid rq->lock bouncing in sched_balance_newidle()
   - Rename rcu_dereference_check_sched_domain() =>
            rcu_dereference_sched_domain()
   - <linux/compiler_types.h>: Add the __signed_scalar_typeof() helper

  Fair scheduler updates/refactoring (Peter Zijlstra and Ingo Molnar):

   - Fold the sched_avg update
   - Change rcu_dereference_check_sched_domain() to rcu-sched
   - Switch to rcu_dereference_all()
   - Remove superfluous rcu_read_lock()
   - Limit hrtick work
   - Join two #ifdef CONFIG_FAIR_GROUP_SCHED blocks
   - Clean up comments in 'struct cfs_rq'
   - Separate se->vlag from se->vprot
   - Rename cfs_rq::avg_load to cfs_rq::sum_weight
   - Rename cfs_rq::avg_vruntime to ::sum_w_vruntime & helper functions
   - Introduce and use the vruntime_cmp() and vruntime_op() wrappers for
     wrapped-signed aritmetics
   - Sort out 'blocked_load*' namespace noise

  Scheduler debugging code updates:

   - Export hidden tracepoints to modules (Gabriele Monaco)

   - Convert copy_from_user() + kstrtouint() to kstrtouint_from_user()
     (Fushuai Wang)

   - Add assertions to QUEUE_CLASS (Peter Zijlstra)

   - hrtimer: Fix tracing oddity (Thomas Gleixner)

  Misc fixes and cleanups:

   - Re-evaluate scheduling when migrating queued tasks out of throttled
     cgroups (Zicheng Qu)

   - Remove task_struct->faults_disabled_mapping (Christoph Hellwig)

   - Fix math notation errors in avg_vruntime comment (Zhan Xusheng)

   - sched/cpufreq: Use %pe format for PTR_ERR() printing
     (zenghongling)"

* tag 'sched-core-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits)
  sched: Re-evaluate scheduling when migrating queued tasks out of throttled cgroups
  sched/cpufreq: Use %pe format for PTR_ERR() printing
  sched/rt: Skip currently executing CPU in rto_next_cpu()
  sched/clock: Avoid false sharing for sched_clock_irqtime
  selftests/sched_ext: Add test for DL server total_bw consistency
  selftests/sched_ext: Add test for sched_ext dl_server
  sched/debug: Fix dl_server (re)start conditions
  sched/debug: Add support to change sched_ext server params
  sched_ext: Add a DL server for sched_ext tasks
  sched/debug: Stop and start server based on if it was active
  sched/debug: Fix updating of ppos on server write ops
  sched/deadline: Clear the defer params
  entry: Inline syscall_exit_work() and syscall_trace_enter()
  entry: Add arch_ptrace_report_syscall_entry/exit()
  entry: Rework syscall_exit_to_user_mode_work() for architecture reuse
  entry: Remove unused syscall argument from syscall_trace_enter()
  sched: remove task_struct->faults_disabled_mapping
  sched: Update rq->avg_idle when a task is moved to an idle CPU
  selftests/rseq: Add rseq slice histogram script
  hrtimer: Fix trace oddity
  ...
2026-02-10 12:50:10 -08:00
Linus Torvalds
4d84667627 Merge tag 'perf-core-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull performance event updates from Ingo Molnar:
 "x86 PMU driver updates:

   - Add support for the core PMU for Intel Diamond Rapids (DMR) CPUs
     (Dapeng Mi)

     Compared to previous iterations of the Intel PMU code, there's been
     a lot of changes, which center around three main areas:

      - Introduce the OFF-MODULE RESPONSE (OMR) facility to replace the
        Off-Core Response (OCR) facility

      - New PEBS data source encoding layout

      - Support the new "RDPMC user disable" feature

   - Likewise, a large series adds uncore PMU support for Intel Diamond
     Rapids (DMR) CPUs (Zide Chen)

     This centers around these four main areas:

      - DMR may have two Integrated I/O and Memory Hub (IMH) dies,
        separate from the compute tile (CBB) dies. Each CBB and each IMH
        die has its own discovery domain.

      - Unlike prior CPUs that retrieve the global discovery table
        portal exclusively via PCI or MSR, DMR uses PCI for IMH PMON
        discovery and MSR for CBB PMON discovery.

      - DMR introduces several new PMON types: SCA, HAMVF, D2D_ULA, UBR,
        PCIE4, CRS, CPC, ITC, OTC, CMS, and PCIE6.

      - IIO free-running counters in DMR are MMIO-based, unlike SPR.

   - Also add support for Add missing PMON units for Intel Panther Lake,
     and support Nova Lake (NVL), which largely maps to Panther Lake.
     (Zide Chen)

   - KVM integration: Add support for mediated vPMUs (by Kan Liang and
     Sean Christopherson, with fixes and cleanups by Peter Zijlstra,
     Sandipan Das and Mingwei Zhang)

   - Add Intel cstate driver to support for Wildcat Lake (WCL) CPUs,
     which are a low-power variant of Panther Lake (Zide Chen)

   - Add core, cstate and MSR PMU support for the Airmont NP Intel CPU
     (aka MaxLinear Lightning Mountain), which maps to the existing
     Airmont code (Martin Schiller)

  Performance enhancements:

   - Speed up kexec shutdown by avoiding unnecessary cross CPU calls
     (Jan H. Schönherr)

   - Fix slow perf_event_task_exit() with LBR callstacks (Namhyung Kim)

  User-space stack unwinding support:

   - Various cleanups and refactorings in preparation to generalize the
     unwinding code for other architectures (Jens Remus)

  Uprobes updates:

   - Transition from kmap_atomic to kmap_local_page (Keke Ming)

   - Fix incorrect lockdep condition in filter_chain() (Breno Leitao)

   - Fix XOL allocation failure for 32-bit tasks (Oleg Nesterov)

  Misc fixes and cleanups:

   - s390: Remove kvm_types.h from Kbuild (Randy Dunlap)

   - x86/intel/uncore: Convert comma to semicolon (Chen Ni)

   - x86/uncore: Clean up const mismatch (Greg Kroah-Hartman)

   - x86/ibs: Fix typo in dc_l2tlb_miss comment (Xiang-Bin Shi)"

* tag 'perf-core-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits)
  s390: remove kvm_types.h from Kbuild
  uprobes: Fix incorrect lockdep condition in filter_chain()
  x86/ibs: Fix typo in dc_l2tlb_miss comment
  x86/uprobes: Fix XOL allocation failure for 32-bit tasks
  perf/x86/intel/uncore: Convert comma to semicolon
  perf/x86/intel: Add support for rdpmc user disable feature
  perf/x86: Use macros to replace magic numbers in attr_rdpmc
  perf/x86/intel: Add core PMU support for Novalake
  perf/x86/intel: Add support for PEBS memory auxiliary info field in NVL
  perf/x86/intel: Add core PMU support for DMR
  perf/x86/intel: Add support for PEBS memory auxiliary info field in DMR
  perf/x86/intel: Support the 4 new OMR MSRs introduced in DMR and NVL
  perf/core: Fix slow perf_event_task_exit() with LBR callstacks
  perf/core: Speed up kexec shutdown by avoiding unnecessary cross CPU calls
  uprobes: use kmap_local_page() for temporary page mappings
  arm/uprobes: use kmap_local_page() in arch_uprobe_copy_ixol()
  mips/uprobes: use kmap_local_page() in arch_uprobe_copy_ixol()
  arm64/uprobes: use kmap_local_page() in arch_uprobe_copy_ixol()
  riscv/uprobes: use kmap_local_page() in arch_uprobe_copy_ixol()
  perf/x86/intel/uncore: Add Nova Lake support
  ...
2026-02-10 12:00:46 -08:00
Linus Torvalds
f17b474e36 Merge tag 'bpf-next-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov:

 - Support associating BPF program with struct_ops (Amery Hung)

 - Switch BPF local storage to rqspinlock and remove recursion detection
   counters which were causing false positives (Amery Hung)

 - Fix live registers marking for indirect jumps (Anton Protopopov)

 - Introduce execution context detection BPF helpers (Changwoo Min)

 - Improve verifier precision for 32bit sign extension pattern
   (Cupertino Miranda)

 - Optimize BTF type lookup by sorting vmlinux BTF and doing binary
   search (Donglin Peng)

 - Allow states pruning for misc/invalid slots in iterator loops (Eduard
   Zingerman)

 - In preparation for ASAN support in BPF arenas teach libbpf to move
   global BPF variables to the end of the region and enable arena kfuncs
   while holding locks (Emil Tsalapatis)

 - Introduce support for implicit arguments in kfuncs and migrate a
   number of them to new API. This is a prerequisite for cgroup
   sub-schedulers in sched-ext (Ihor Solodrai)

 - Fix incorrect copied_seq calculation in sockmap (Jiayuan Chen)

 - Fix ORC stack unwind from kprobe_multi (Jiri Olsa)

 - Speed up fentry attach by using single ftrace direct ops in BPF
   trampolines (Jiri Olsa)

 - Require frozen map for calculating map hash (KP Singh)

 - Fix lock entry creation in TAS fallback in rqspinlock (Kumar
   Kartikeya Dwivedi)

 - Allow user space to select cpu in lookup/update operations on per-cpu
   array and hash maps (Leon Hwang)

 - Make kfuncs return trusted pointers by default (Matt Bobrowski)

 - Introduce "fsession" support where single BPF program is executed
   upon entry and exit from traced kernel function (Menglong Dong)

 - Allow bpf_timer and bpf_wq use in all programs types (Mykyta
   Yatsenko, Andrii Nakryiko, Kumar Kartikeya Dwivedi, Alexei
   Starovoitov)

 - Make KF_TRUSTED_ARGS the default for all kfuncs and clean up their
   definition across the tree (Puranjay Mohan)

 - Allow BPF arena calls from non-sleepable context (Puranjay Mohan)

 - Improve register id comparison logic in the verifier and extend
   linked registers with negative offsets (Puranjay Mohan)

 - In preparation for BPF-OOM introduce kfuncs to access memcg events
   (Roman Gushchin)

 - Use CFI compatible destructor kfunc type (Sami Tolvanen)

 - Add bitwise tracking for BPF_END in the verifier (Tianci Cao)

 - Add range tracking for BPF_DIV and BPF_MOD in the verifier (Yazhou
   Tang)

 - Make BPF selftests work with 64k page size (Yonghong Song)

* tag 'bpf-next-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (268 commits)
  selftests/bpf: Fix outdated test on storage->smap
  selftests/bpf: Choose another percpu variable in bpf for btf_dump test
  selftests/bpf: Remove test_task_storage_map_stress_lookup
  selftests/bpf: Update task_local_storage/task_storage_nodeadlock test
  selftests/bpf: Update task_local_storage/recursion test
  selftests/bpf: Update sk_storage_omem_uncharge test
  bpf: Switch to bpf_selem_unlink_nofail in bpf_local_storage_{map_free, destroy}
  bpf: Support lockless unlink when freeing map or local storage
  bpf: Prepare for bpf_selem_unlink_nofail()
  bpf: Remove unused percpu counter from bpf_local_storage_map_free
  bpf: Remove cgroup local storage percpu counter
  bpf: Remove task local storage percpu counter
  bpf: Change local_storage->lock and b->lock to rqspinlock
  bpf: Convert bpf_selem_unlink to failable
  bpf: Convert bpf_selem_link_map to failable
  bpf: Convert bpf_selem_unlink_map to failable
  bpf: Select bpf_local_storage_map_bucket based on bpf_local_storage
  selftests/xsk: fix number of Tx frags in invalid packet
  selftests/xsk: properly handle batch ending in the middle of a packet
  bpf: Prevent reentrance into call_rcu_tasks_trace()
  ...
2026-02-10 11:26:21 -08:00
Linus Torvalds
85f24b0ace Merge tag 'hardening-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
 "Mostly small cleanups and various scattered annotations and flex array
  warning fixes that we reviewed by unlanded in other trees. Introduces
  new annotation for expanding counted_by to pointer members, now that
  compiler behavior between GCC and Clang has been normalized.

   - Various missed __counted_by annotations (Thorsten Blum)

   - Various missed -Wflex-array-member-not-at-end fixes (Gustavo A. R.
     Silva)

   - Avoid leftover tempfiles for interrupted compile-time FORTIFY tests
     (Nicolas Schier)

   - Remove non-existant CONFIG_UBSAN_REPORT_FULL from docs (Stefan
     Wiehler)

   - fortify: Use C arithmetic not FIELD_xxx() in FORTIFY_REASON defines
     (David Laight)

   - Add __counted_by_ptr attribute, tests, and first user (Bill
     Wendling, Kees Cook)

   - Update MAINTAINERS file to make hardening section not include
     pstore"

* tag 'hardening-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  MAINTAINERS: pstore: Remove L: entry
  nfp: tls: Avoid -Wflex-array-member-not-at-end warnings
  carl9170: Avoid -Wflex-array-member-not-at-end warning
  coredump: Use __counted_by_ptr for struct core_name::corename
  lkdtm/bugs: Add __counted_by_ptr() test PTR_BOUNDS
  compiler_types.h: Attributes: Add __counted_by_ptr macro
  fortify: Cleanup temp file also on non-successful exit
  fortify: Rename temporary file to match ignore pattern
  fortify: Use C arithmetic not FIELD_xxx() in FORTIFY_REASON defines
  ecryptfs: Annotate struct ecryptfs_message with __counted_by
  fs/xattr: Annotate struct simple_xattr with __counted_by
  crypto: af_alg - Annotate struct af_alg_iv with __counted_by
  Kconfig.ubsan: Remove CONFIG_UBSAN_REPORT_FULL from documentation
  drm/nouveau: fifo: Avoid -Wflex-array-member-not-at-end warning
2026-02-10 08:54:13 -08:00
Linus Torvalds
d16738a4e7 Merge tag 'kthread-for-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks
Pull kthread updates from Frederic Weisbecker:
 "The kthread code provides an infrastructure which manages the
  preferred affinity of unbound kthreads (node or custom cpumask)
  against housekeeping (CPU isolation) constraints and CPU hotplug
  events.

  One crucial missing piece is the handling of cpuset: when an isolated
  partition is created, deleted, or its CPUs updated, all the unbound
  kthreads in the top cpuset become indifferently affine to _all_ the
  non-isolated CPUs, possibly breaking their preferred affinity along
  the way.

  Solve this with performing the kthreads affinity update from cpuset to
  the kthreads consolidated relevant code instead so that preferred
  affinities are honoured and applied against the updated cpuset
  isolated partitions.

  The dispatch of the new isolated cpumasks to timers, workqueues and
  kthreads is performed by housekeeping, as per the nice Tejun's
  suggestion.

  As a welcome side effect, HK_TYPE_DOMAIN then integrates both the set
  from boot defined domain isolation (through isolcpus=) and cpuset
  isolated partitions. Housekeeping cpumasks are now modifiable with a
  specific RCU based synchronization. A big step toward making
  nohz_full= also mutable through cpuset in the future"

* tag 'kthread-for-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks: (33 commits)
  doc: Add housekeeping documentation
  kthread: Document kthread_affine_preferred()
  kthread: Comment on the purpose and placement of kthread_affine_node() call
  kthread: Honour kthreads preferred affinity after cpuset changes
  sched/arm64: Move fallback task cpumask to HK_TYPE_DOMAIN
  sched: Switch the fallback task allowed cpumask to HK_TYPE_DOMAIN
  kthread: Rely on HK_TYPE_DOMAIN for preferred affinity management
  kthread: Include kthreadd to the managed affinity list
  kthread: Include unbound kthreads in the managed affinity list
  kthread: Refine naming of affinity related fields
  PCI: Remove superfluous HK_TYPE_WQ check
  sched/isolation: Remove HK_TYPE_TICK test from cpu_is_isolated()
  cpuset: Remove cpuset_cpu_is_isolated()
  timers/migration: Remove superfluous cpuset isolation test
  cpuset: Propagate cpuset isolation update to timers through housekeeping
  cpuset: Propagate cpuset isolation update to workqueue through housekeeping
  PCI: Flush PCI probe workqueue on cpuset isolated partition change
  sched/isolation: Flush vmstat workqueues on cpuset isolated partition change
  sched/isolation: Flush memcg workqueues on cpuset isolated partition change
  cpuset: Update HK_TYPE_DOMAIN cpumask from cpuset
  ...
2026-02-09 19:57:30 -08:00
Linus Torvalds
9e355113f0 Merge tag 'vfs-7.0-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
 "This contains a mix of VFS cleanups, performance improvements, API
  fixes, documentation, and a deprecation notice.

  Scalability and performance:

   - Rework pid allocation to only take pidmap_lock once instead of
     twice during alloc_pid(), improving thread creation/teardown
     throughput by 10-16% depending on false-sharing luck. Pad the
     namespace refcount to reduce false-sharing

   - Track file lock presence via a flag in ->i_opflags instead of
     reading ->i_flctx, avoiding false-sharing with ->i_readcount on
     open/close hot paths. Measured 4-16% improvement on 24-core
     open-in-a-loop benchmarks

   - Use a consume fence in locks_inode_context() to match the
     store-release/load-consume idiom, eliminating a hardware fence on
     some architectures

   - Annotate cdev_lock with __cacheline_aligned_in_smp to prevent
     false-sharing

   - Remove a redundant DCACHE_MANAGED_DENTRY check in
     __follow_mount_rcu() that never fires since the caller already
     verifies it, eliminating a 100% mispredicted branch

   - Fix a 100% mispredicted likely() in devcgroup_inode_permission()
     that became wrong after a prior code reorder

  Bug fixes and correctness:

   - Make insert_inode_locked() wait for inode destruction instead of
     skipping, fixing a corner case where two matching inodes could
     exist in the hash

   - Move f_mode initialization before file_ref_init() in alloc_file()
     to respect the SLAB_TYPESAFE_BY_RCU ordering contract

   - Add a WARN_ON_ONCE guard in try_to_free_buffers() for folios with
     no buffers attached, preventing a null pointer dereference when
     AS_RELEASE_ALWAYS is set but no release_folio op exists

   - Fix select restart_block to store end_time as timespec64, avoiding
     truncation of tv_sec on 32-bit architectures

   - Make dump_inode() use get_kernel_nofault() to safely access inode
     and superblock fields, matching the dump_mapping() pattern

  API modernization:

   - Make posix_acl_to_xattr() allocate the buffer internally since
     every single caller was doing it anyway. Reduces boilerplate and
     unnecessary error checking across ~15 filesystems

   - Replace deprecated simple_strtoul() with kstrtoul() for the
     ihash_entries, dhash_entries, mhash_entries, and mphash_entries
     boot parameters, adding proper error handling

   - Convert chardev code to use guard(mutex) and __free(kfree) cleanup
     patterns

   - Replace min_t() with min() or umin() in VFS code to avoid silently
     truncating unsigned long to unsigned int

   - Gate LOOKUP_RCU assertions behind CONFIG_DEBUG_VFS since callers
     already check the flag

  Deprecation:

   - Begin deprecating legacy BSD process accounting (acct(2)). The
     interface has numerous footguns and better alternatives exist
     (eBPF)

  Documentation:

   - Fix and complete kernel-doc for struct export_operations, removing
     duplicated documentation between ReST and source

   - Fix kernel-doc warnings for __start_dirop() and ilookup5_nowait()

  Testing:

   - Add a kunit test for initramfs cpio handling of entries with
     filesize > PATH_MAX

  Misc:

   - Add missing <linux/init_task.h> include in fs_struct.c"

* tag 'vfs-7.0-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (28 commits)
  posix_acl: make posix_acl_to_xattr() alloc the buffer
  fs: make insert_inode_locked() wait for inode destruction
  initramfs_test: kunit test for cpio.filesize > PATH_MAX
  fs: improve dump_inode() to safely access inode fields
  fs: add <linux/init_task.h> for 'init_fs'
  docs: exportfs: Use source code struct documentation
  fs: move initializing f_mode before file_ref_init()
  exportfs: Complete kernel-doc for struct export_operations
  exportfs: Mark struct export_operations functions at kernel-doc
  exportfs: Fix kernel-doc output for get_name()
  acct(2): begin the deprecation of legacy BSD process accounting
  device_cgroup: remove branch hint after code refactor
  VFS: fix __start_dirop() kernel-doc warnings
  fs: Describe @isnew parameter in ilookup5_nowait()
  fs/namei: Remove redundant DCACHE_MANAGED_DENTRY check in __follow_mount_rcu
  fs: only assert on LOOKUP_RCU when built with CONFIG_DEBUG_VFS
  select: store end_time as timespec64 in restart block
  chardev: Switch to guard(mutex) and __free(kfree)
  namespace: Replace simple_strtoul with kstrtoul to parse boot params
  dcache: Replace simple_strtoul with kstrtoul in set_dhash_entries
  ...
2026-02-09 15:13:05 -08:00
Linus Torvalds
c84bb79f70 Merge tag 'vfs-7.0-rc1.nullfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs nullfs update from Christian Brauner:
 "Add a completely catatonic minimal pseudo filesystem called "nullfs"
  and make pivot_root() work in the initramfs.

  Currently pivot_root() does not work on the real rootfs because it
  cannot be unmounted. Userspace has to recursively delete initramfs
  contents manually before continuing boot, using the fragile
  switch_root sequence (overmount + chroot).

  Add nullfs, a minimal immutable filesystem that serves as the true
  root of the mount hierarchy. The mutable rootfs (tmpfs/ramfs) is
  mounted on top of it. This allows userspace to simply:

      chdir(new_root);
      pivot_root(".", ".");
      umount2(".", MNT_DETACH);

  without the traditional switch_root workarounds. systemd already
  handles this correctly. It tries pivot_root() first and falls back
  to MS_MOVE only when that fails.

  This also means rootfs mounts in unprivileged namespaces no longer
  need MNT_LOCKED, since the immutable nullfs guarantees nothing can be
  revealed by unmounting the covering mount.

  nullfs is a single-instance filesystem (get_tree_single()) marked
  SB_NOUSER | SB_I_NOEXEC | SB_I_NODEV with an immutable empty root
  directory. This means sooner or later it can be used to overmount
  other directories to hide their contents without any additional
  protection needed.

  We enable it unconditionally. If we see any real regression we'll
  hide it behind a boot option.

  nullfs has extensions beyond this in the future. It will serve as a
  concept to support the creation of completely empty mount namespaces -
  which is work coming up in the next cycle"

* tag 'vfs-7.0-rc1.nullfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: use nullfs unconditionally as the real rootfs
  docs: mention nullfs
  fs: add immutable rootfs
  fs: add init_pivot_root()
  fs: ensure that internal tmpfs mount gets mount id zero
2026-02-09 13:41:34 -08:00
Linus Torvalds
996812c453 Merge tag 'vfs-7.0-rc1.initrd' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs initrd removal from Christian Brauner:
 "Remove the deprecated linuxrc-based initrd code path and related dead
  code. The linuxrc initrd path was deprecated in 2020 and this series
  completes its removal. If we see real-life regressions we'll revert.

  The core change removes handle_initrd() and init_linuxrc() — the
  entire flow that ran /linuxrc from an initrd, pivoted roots, and
  handed off to the real root filesystem. With that gone, initrd_load()
  becomes void (no longer short-circuits prepare_namespace()),
  rd_load_image() is simplified to always load /initrd.image instead of
  taking a path, and rd_load_disk() is deleted.

  The /proc/sys/kernel/real-root-dev sysctl and its backing variable are
  removed since they only existed for linuxrc to communicate the real
  root device back to the kernel.

  The no-op load_ramdisk= and prompt_ramdisk= parameters are dropped,
  and noinitrd and ramdisk_start= gain deprecation warnings.

  Initramfs is entirely unaffected. The non-linuxrc initrd path
  (root=/dev/ram0) is preserved but now carries a deprecation warning
  targeting January 2027 removal"

* tag 'vfs-7.0-rc1.initrd' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  init: remove /proc/sys/kernel/real-root-dev
  initrd: remove deprecated code path (linuxrc)
  init: remove deprecated "load_ramdisk" and "prompt_ramdisk" command line parameters
2026-02-09 11:03:25 -08:00
Frederic Weisbecker
23f09dcc0a cpuset: Propagate cpuset isolation update to workqueue through housekeeping
Until now, cpuset would propagate isolated partition changes to
workqueues so that unbound workers get properly reaffined.

Since housekeeping now centralizes, synchronize and propagates isolation
cpumask changes, perform the work from that subsystem for consolidation
and consistency purposes.

For simplification purpose, the target function is adapted to take the
new housekeeping mask instead of the isolated mask.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Waiman Long <longman@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: "Michal Koutný" <mkoutny@suse.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Marco Crivellari <marco.crivellari@suse.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Waiman Long <longman@redhat.com>
Cc: cgroups@vger.kernel.org
2026-02-03 15:23:34 +01:00
Mike Rapoport (Microsoft)
d49004c5f0 arch, mm: consolidate initialization of nodes, zones and memory map
To initialize node, zone and memory map data structures every architecture
calls free_area_init() during setup_arch() and passes it an array of zone
limits.

Beside code duplication it creates "interesting" ordering cases between
allocation and initialization of hugetlb and the memory map.  Some
architectures allocate hugetlb pages very early in setup_arch() in certain
cases, some only create hugetlb CMA areas in setup_arch() and sometimes
hugetlb allocations happen mm_core_init().

With arch_zone_limits_init() helper available now on all architectures it
is no longer necessary to call free_area_init() from architecture setup
code.  Rather core MM initialization can call arch_zone_limits_init() in a
single place.

This allows to unify ordering of hugetlb vs memory map allocation and
initialization.

Remove the call to free_area_init() from architecture specific code and
place it in a new mm_core_init_early() function that is called immediately
after setup_arch().

After this refactoring it is possible to consolidate hugetlb allocations
and eliminate differences in ordering of hugetlb and memory map
initialization among different architectures.

As the first step of this consolidation move hugetlb_bootmem_alloc() to
mm_core_early_init().

Link: https://lkml.kernel.org/r/20260111082105.290734-24-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alex Shi <alexs@kernel.org>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Klara Modin <klarasmodin@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Magnus Lindholm <linmag7@gmail.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Pratyush Yadav <pratyush@kernel.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-26 20:02:18 -08:00
Sun Jian
499f86de4f init/main: read bootconfig header with get_unaligned_le32()
get_boot_config_from_initrd() scans up to 3 bytes before initrd_end to
handle GRUB 4-byte alignment.  As a result, the bootconfig header
immediately preceding the magic may be unaligned.

Read the size and checksum fields with get_unaligned_le32() instead of
casting to u32 * and using le32_to_cpu(), avoiding potential unaligned
access and silencing sparse "cast to restricted __le32" warnings.

Sparse warnings (gcc + C=1):
  init/main.c:292:16: warning: cast to restricted __le32
  init/main.c:293:16: warning: cast to restricted __le32

No functional change intended.

Link: https://lkml.kernel.org/r/20260113101532.1630770-1-sun.jian.kdev@gmail.com
Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-26 19:07:14 -08:00
Lillian Berry
a906f3ae44 init/main.c: check if rdinit was explicitly set before printing warning
The rdinit parameter is set by default, and attempted during boot even if
not specified in the command line.  Only print the warning about rdinit
being inaccessible if the rdinit value was found in command line; it's
just noise otherwise.

[akpm@linux-foundation.org: move ramdisk_execute_command_set into __initdata]
Link: https://lkml.kernel.org/r/20260111125635.53682-1-lillian@star-ark.net
Signed-off-by: Lillian Berry <lillian@star-ark.net>
Cc: Ahmad Fatoum <a.fatoum@pengutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Douglas Anderson <dianders@chromium.org>
Cc: Francesco Valla <francesco@valla.it>
Cc: Guo Weikang <guoweikang.kernel@gmail.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Huan Yang <link@vivo.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
Cc: Sascha Hauer <kernel@pengutronix.de>
Cc: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-26 19:07:14 -08:00
Christoph Hellwig
377521af03 sched: remove task_struct->faults_disabled_mapping
This reverts commit 2b69987be5 ("sched: Add
task_struct->faults_disabled_mapping"), which added this field without
review or maintainer signoff.  With bcachefs removed from the
tree it is also unused now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260122085223.487092-1-hch@lst.de
2026-01-22 11:11:21 +01:00
Thomas Gleixner
d7a5da7a0f rseq: Add fields and constants for time slice extension
Aside of a Kconfig knob add the following items:

   - Two flag bits for the rseq user space ABI, which allow user space to
     query the availability and enablement without a syscall.

   - A new member to the user space ABI struct rseq, which is going to be
     used to communicate request and grant between kernel and user space.

   - A rseq state struct to hold the kernel state of this

   - Documentation of the new mechanism

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251215155708.669472597@linutronix.de
2026-01-22 11:11:16 +01:00
Bill Wendling
150a04d817 compiler_types.h: Attributes: Add __counted_by_ptr macro
Introduce __counted_by_ptr(), which works like __counted_by(), but for
pointer struct members.

struct foo {
	int a, b, c;
	char *buffer __counted_by_ptr(bytes);
	short nr_bars;
	struct bar *bars __counted_by_ptr(nr_bars);
	size_t bytes;
};

Because "counted_by" can only be applied to pointer members in very
recent compiler versions, its application ends up needing to be distinct
from flexibe array "counted_by" annotations, hence a separate macro.

This is a reworking of Kees' previous patch [1].

Link: https://lore.kernel.org/all/20251020220118.1226740-1-kees@kernel.org/ [1]
Co-developed-by: Kees Cook <kees@kernel.org>
Signed-off-by: Bill Wendling <morbo@google.com>
Link: https://patch.msgid.link/20260116005838.2419118-1-morbo@google.com
Signed-off-by: Kees Cook <kees@kernel.org>
2026-01-17 11:00:28 -08:00
Thomas Weißschuh
2a0a30805a kbuild: uapi: drop dependency on CC_CAN_LINK
The header tests try to compile each header. Some UAPI headers depend on
libc headers so they need a full userspace toolchain to build. This
dependency is expressed in kconfig as a dependency on CC_CAN_LINK.
Many kernel builds do not satisfy CC_CAN_LINK as they only use a
minimal kernel (cross-) compiler. In those configurations the UAPI
headers are not tested at all.

However most UAPI headers do not even depend on any libc headers,
and such dependencies are undesired in any case. Also the static
analysis performed by headers_check.pl does not need CC_CAN_LINK.

Drop the hard dependency on CC_CAN_LINK and instead skip the affected
compilation step for exactly those headers which require libc.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Link: https://patch.msgid.link/20251223-uapi-nostdinc-v1-5-d91545d794f7@linutronix.de
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2026-01-16 15:02:11 -07:00
David Disseldorp
aaf7683961 initramfs_test: kunit test for cpio.filesize > PATH_MAX
initramfs unpack skips over cpio entries where namesize > PATH_MAX,
instead of returning an error. Add coverage for this behaviour.

Signed-off-by: David Disseldorp <ddiss@suse.de>
Link: https://patch.msgid.link/20260114135051.4943-2-ddiss@suse.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-14 17:05:35 +01:00
Jeff Layton
46329a9dd7 acct(2): begin the deprecation of legacy BSD process accounting
As Christian points out [1], even though it's privileged, this interface
has a lot of footguns. There are better options these days (e.g. eBPF),
so it would be good to start discouraging its use and mark it as
deprecated.

[1]: https://lore.kernel.org/linux-fsdevel/20250212-giert-spannend-8893f1eaba7d@brauner/

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://patch.msgid.link/20260106-bsd-acct-v1-1-d15564b52c83@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-14 16:50:36 +01:00
Christian Brauner
313c47f4fe fs: use nullfs unconditionally as the real rootfs
Remove the "nullfs_rootfs" boot parameter and simply always use nullfs.
The mutable rootfs will be mounted on top of it. Systems that don't use
pivot_root() to pivot away from the real rootfs will have an additional
mount stick around but that shouldn't be a problem at all. If it is
we'll rever this commit.

This also simplifies the boot process and removes the need for the
traditional switch_root workarounds.

Suggested-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-14 11:23:39 +01:00
Askar Safin
e6ce36ccc8 init: remove /proc/sys/kernel/real-root-dev
It is not used anymore.

Signed-off-by: Askar Safin <safinaskar@gmail.com>
Link: https://patch.msgid.link/20251119222407.3333257-4-safinaskar@gmail.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12 17:22:27 +01:00
Askar Safin
c350a65b56 initrd: remove deprecated code path (linuxrc)
Remove linuxrc initrd code path, which was deprecated in 2020.

Initramfs and (non-initial) RAM disks (i. e. brd) still work.

Both built-in and bootloader-supplied initramfs still work.

Non-linuxrc initrd code path (i. e. using /dev/ram as final root
filesystem) still works, but I put deprecation message into it.

Also I deprecate command line parameters "noinitrd" and "ramdisk_start=".

Signed-off-by: Askar Safin <safinaskar@gmail.com>
Link: https://patch.msgid.link/20251119222407.3333257-3-safinaskar@gmail.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12 17:22:22 +01:00
Christian Brauner
576ee5dfd4 fs: add immutable rootfs
Currently pivot_root() doesn't work on the real rootfs because it
cannot be unmounted. Userspace has to do a recursive removal of the
initramfs contents manually before continuing the boot.

Really all we want from the real rootfs is to serve as the parent mount
for anything that is actually useful such as the tmpfs or ramfs for
initramfs unpacking or the rootfs itself. There's no need for the real
rootfs to actually be anything meaningful or useful. Add a immutable
rootfs called "nullfs" that can be selected via the "nullfs_rootfs"
kernel command line option.

The kernel will mount a tmpfs/ramfs on top of it, unpack the initramfs
and fire up userspace which mounts the rootfs and can then just do:

  chdir(rootfs);
  pivot_root(".", ".");
  umount2(".", MNT_DETACH);

and be done with it. (Ofc, userspace can also choose to retain the
initramfs contents by using something like pivot_root(".", "/initramfs")
without unmounting it.)

Technically this also means that the rootfs mount in unprivileged
namespaces doesn't need to become MNT_LOCKED anymore as it's guaranteed
that the immutable rootfs remains permanently empty so there cannot be
anything revealed by unmounting the covering mount.

In the future this will also allow us to create completely empty mount
namespaces without risking to leak anything.

systemd already handles this all correctly as it tries to pivot_root()
first and falls back to MS_MOVE only when that fails.

This goes back to various discussion in previous years and a LPC 2024
presentation about this very topic.

Link: https://patch.msgid.link/20260112-work-immutable-rootfs-v2-3-88dd1c34a204@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12 16:52:09 +01:00
Paul E. McKenney
a73fc3dcc6 rcu: Clean up after the SRCU-fastification of RCU Tasks Trace
Now that RCU Tasks Trace has been re-implemented in terms of SRCU-fast,
the ->trc_ipi_to_cpu, ->trc_blkd_cpu, ->trc_blkd_node, ->trc_holdout_list,
and ->trc_reader_special task_struct fields are no longer used.

In addition, the rcu_tasks_trace_qs(), rcu_tasks_trace_qs_blkd(),
exit_tasks_rcu_finish_trace(), and rcu_spawn_tasks_trace_kthread(),
show_rcu_tasks_trace_gp_kthread(), rcu_tasks_trace_get_gp_data(),
rcu_tasks_trace_torture_stats_print(), and get_rcu_tasks_trace_gp_kthread()
functions and all the other functions that they invoke are no longer used.

Also, the TRC_NEED_QS and TRC_NEED_QS_CHECKED CPP macros are no longer used.
Neither are the rcu_tasks_trace_lazy_ms and rcu_task_ipi_delay rcupdate
module parameters and the TASKS_TRACE_RCU_READ_MB Kconfig option.

This commit therefore removes all of them.

[ paulmck: Apply Alexei Starovoitov feedback. ]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: bpf@vger.kernel.org
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2026-01-01 16:39:46 +08:00
Ihor Solodrai
90e5b38a26 kbuild: Sync kconfig when PAHOLE_VERSION changes
This patch implements kconfig re-sync when the pahole version changes
between builds, similar to how it happens for compiler version change
via CC_VERSION_TEXT.

Define PAHOLE_VERSION in the top-level Makefile and export it for
config builds. Set CONFIG_PAHOLE_VERSION default to the exported
variable.

Kconfig records the PAHOLE_VERSION value in
include/config/auto.conf.cmd [1].

The Makefile includes auto.conf.cmd, so if PAHOLE_VERSION changes
between builds, make detects a dependency change and triggers
syncconfig to update the kconfig [2].

For external module builds, add a warning message in the prepare
target, similar to the existing compiler version mismatch warning.

Note that if pahole is not installed or available, PAHOLE_VERSION is
set to 0 by pahole-version.sh, so the (un)installation of pahole is
treated as a version change.

See previous discussions for context [3].

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/kconfig/preprocess.c?h=v6.18#n91
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Makefile?h=v6.18#n815
[3] https://lore.kernel.org/bpf/8f946abf-dd88-4fac-8bb4-84fcd8d81cf0@oracle.com/

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Nicolas Schier <nsc@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Reviewed-by: Nicolas Schier <nsc@kernel.org>
Link: https://lore.kernel.org/bpf/20251219181321.1283664-6-ihor.solodrai@linux.dev
2025-12-19 10:55:40 -08:00
Kan Liang
eff95e1702 perf: Add APIs to create/release mediated guest vPMUs
Currently, exposing PMU capabilities to a KVM guest is done by emulating
guest PMCs via host perf events, i.e. by having KVM be "just" another user
of perf.  As a result, the guest and host are effectively competing for
resources, and emulating guest accesses to vPMU resources requires
expensive actions (expensive relative to the native instruction).  The
overhead and resource competition results in degraded guest performance
and ultimately very poor vPMU accuracy.

To address the issues with the perf-emulated vPMU, introduce a "mediated
vPMU", where the data plane (PMCs and enable/disable knobs) is exposed
directly to the guest, but the control plane (event selectors and access
to fixed counters) is managed by KVM (via MSR interceptions).  To allow
host perf usage of the PMU to (partially) co-exist with KVM/guest usage
of the PMU, KVM and perf will coordinate to a world switch between host
perf context and guest vPMU context near VM-Enter/VM-Exit.

Add two exported APIs, perf_{create,release}_mediated_pmu(), to allow KVM
to create and release a mediated PMU instance (per VM).  Because host perf
context will be deactivated while the guest is running, mediated PMU usage
will be mutually exclusive with perf analysis of the guest, i.e. perf
events that do NOT exclude the guest will not behave as expected.

To avoid silent failure of !exclude_guest perf events, disallow creating a
mediated PMU if there are active !exclude_guest events, and on the perf
side, disallowing creating new !exclude_guest perf events while there is
at least one active mediated PMU.

Exempt PMU resources that do not support mediated PMU usage, i.e. that are
outside the scope/view of KVM's vPMU and will not be swapped out while the
guest is running.

Guard mediated PMU with a new kconfig to help readers identify code paths
that are unique to mediated PMU support, and to allow for adding arch-
specific hooks without stubs.  KVM x86 is expected to be the only KVM
architecture to support a mediated PMU in the near future (e.g. arm64 is
trending toward a partitioned PMU implementation), and KVM x86 will select
PERF_GUEST_MEDIATED_PMU unconditionally, i.e. won't need stubs.

Immediately select PERF_GUEST_MEDIATED_PMU when KVM x86 is enabled so that
all paths are compile tested.  Full KVM support is on its way...

[sean: add kconfig and WARNing, rewrite changelog, swizzle patch ordering]
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Link: https://patch.msgid.link/20251206001720.468579-5-seanjc@google.com
2025-12-17 13:31:04 +01:00
Askar Safin
7f3b336685 init: remove deprecated "load_ramdisk" and "prompt_ramdisk" command line parameters
...which do nothing. They were deprecated (in documentation) in
6b99e6e6aa ("Documentation/admin-guide: blockdev/ramdisk: remove use of
"rdev"") in 2020 and in kernel messages in c8376994c8 ("initrd: remove
support for multiple floppies") in 2020.

Signed-off-by: Askar Safin <safinaskar@gmail.com>
Link: https://patch.msgid.link/20251119222407.3333257-2-safinaskar@gmail.com
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-12-15 14:32:54 +01:00
Linus Torvalds
509d3f4584 Merge tag 'mm-nonmm-stable-2025-12-06-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:

 - "panic: sys_info: Refactor and fix a potential issue" (Andy Shevchenko)
   fixes a build issue and does some cleanup in ib/sys_info.c

 - "Implement mul_u64_u64_div_u64_roundup()" (David Laight)
   enhances the 64-bit math code on behalf of a PWM driver and beefs up
   the test module for these library functions

 - "scripts/gdb/symbols: make BPF debug info available to GDB" (Ilya Leoshkevich)
   makes BPF symbol names, sizes, and line numbers available to the GDB
   debugger

 - "Enable hung_task and lockup cases to dump system info on demand" (Feng Tang)
   adds a sysctl which can be used to cause additional info dumping when
   the hung-task and lockup detectors fire

 - "lib/base64: add generic encoder/decoder, migrate users" (Kuan-Wei Chiu)
   adds a general base64 encoder/decoder to lib/ and migrates several
   users away from their private implementations

 - "rbree: inline rb_first() and rb_last()" (Eric Dumazet)
   makes TCP a little faster

 - "liveupdate: Rework KHO for in-kernel users" (Pasha Tatashin)
   reworks the KEXEC Handover interfaces in preparation for Live Update
   Orchestrator (LUO), and possibly for other future clients

 - "kho: simplify state machine and enable dynamic updates" (Pasha Tatashin)
   increases the flexibility of KEXEC Handover. Also preparation for LUO

 - "Live Update Orchestrator" (Pasha Tatashin)
   is a major new feature targeted at cloud environments. Quoting the
   cover letter:

      This series introduces the Live Update Orchestrator, a kernel
      subsystem designed to facilitate live kernel updates using a
      kexec-based reboot. This capability is critical for cloud
      environments, allowing hypervisors to be updated with minimal
      downtime for running virtual machines. LUO achieves this by
      preserving the state of selected resources, such as memory,
      devices and their dependencies, across the kernel transition.

      As a key feature, this series includes support for preserving
      memfd file descriptors, which allows critical in-memory data, such
      as guest RAM or any other large memory region, to be maintained in
      RAM across the kexec reboot.

   Mike Rappaport merits a mention here, for his extensive review and
   testing work.

 - "kexec: reorganize kexec and kdump sysfs" (Sourabh Jain)
   moves the kexec and kdump sysfs entries from /sys/kernel/ to
   /sys/kernel/kexec/ and adds back-compatibility symlinks which can
   hopefully be removed one day

 - "kho: fixes for vmalloc restoration" (Mike Rapoport)
   fixes a BUG which was being hit during KHO restoration of vmalloc()
   regions

* tag 'mm-nonmm-stable-2025-12-06-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (139 commits)
  calibrate: update header inclusion
  Reinstate "resource: avoid unnecessary lookups in find_next_iomem_res()"
  vmcoreinfo: track and log recoverable hardware errors
  kho: fix restoring of contiguous ranges of order-0 pages
  kho: kho_restore_vmalloc: fix initialization of pages array
  MAINTAINERS: TPM DEVICE DRIVER: update the W-tag
  init: replace simple_strtoul with kstrtoul to improve lpj_setup
  KHO: fix boot failure due to kmemleak access to non-PRESENT pages
  Documentation/ABI: new kexec and kdump sysfs interface
  Documentation/ABI: mark old kexec sysfs deprecated
  kexec: move sysfs entries to /sys/kernel/kexec
  test_kho: always print restore status
  kho: free chunks using free_page() instead of kfree()
  selftests/liveupdate: add kexec test for multiple and empty sessions
  selftests/liveupdate: add simple kexec-based selftest for LUO
  selftests/liveupdate: add userspace API selftests
  docs: add documentation for memfd preservation via LUO
  mm: memfd_luo: allow preserving memfd
  liveupdate: luo_file: add private argument to store runtime state
  mm: shmem: export some functions to internal.h
  ...
2025-12-06 14:01:20 -08:00
Linus Torvalds
7cd122b552 Merge tag 'pull-persistency' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull persistent dentry infrastructure and conversion from Al Viro:
 "Some filesystems use a kinda-sorta controlled dentry refcount leak to
  pin dentries of created objects in dcache (and undo it when removing
  those). A reference is grabbed and not released, but it's not actually
  _stored_ anywhere.

  That works, but it's hard to follow and verify; among other things, we
  have no way to tell _which_ of the increments is intended to be an
  unpaired one. Worse, on removal we need to decide whether the
  reference had already been dropped, which can be non-trivial if that
  removal is on umount and we need to figure out if this dentry is
  pinned due to e.g. unlink() not done. Usually that is handled by using
  kill_litter_super() as ->kill_sb(), but there are open-coded special
  cases of the same (consider e.g. /proc/self).

  Things get simpler if we introduce a new dentry flag
  (DCACHE_PERSISTENT) marking those "leaked" dentries. Having it set
  claims responsibility for +1 in refcount.

  The end result this series is aiming for:

   - get these unbalanced dget() and dput() replaced with new primitives
     that would, in addition to adjusting refcount, set and clear
     persistency flag.

   - instead of having kill_litter_super() mess with removing the
     remaining "leaked" references (e.g. for all tmpfs files that hadn't
     been removed prior to umount), have the regular
     shrink_dcache_for_umount() strip DCACHE_PERSISTENT of all dentries,
     dropping the corresponding reference if it had been set. After that
     kill_litter_super() becomes an equivalent of kill_anon_super().

  Doing that in a single step is not feasible - it would affect too many
  places in too many filesystems. It has to be split into a series.

  This work has really started early in 2024; quite a few preliminary
  pieces have already gone into mainline. This chunk is finally getting
  to the meat of that stuff - infrastructure and most of the conversions
  to it.

  Some pieces are still sitting in the local branches, but the bulk of
  that stuff is here"

* tag 'pull-persistency' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
  d_make_discardable(): warn if given a non-persistent dentry
  kill securityfs_recursive_remove()
  convert securityfs
  get rid of kill_litter_super()
  convert rust_binderfs
  convert nfsctl
  convert rpc_pipefs
  convert hypfs
  hypfs: swich hypfs_create_u64() to returning int
  hypfs: switch hypfs_create_str() to returning int
  hypfs: don't pin dentries twice
  convert gadgetfs
  gadgetfs: switch to simple_remove_by_name()
  convert functionfs
  functionfs: switch to simple_remove_by_name()
  functionfs: fix the open/removal races
  functionfs: need to cancel ->reset_work in ->kill_sb()
  functionfs: don't bother with ffs->ref in ffs_data_{opened,closed}()
  functionfs: don't abuse ffs_data_closed() on fs shutdown
  convert selinuxfs
  ...
2025-12-05 14:36:21 -08:00
Linus Torvalds
6dfafbd029 Merge tag 'drm-next-2025-12-03' of https://gitlab.freedesktop.org/drm/kernel
Pull drm updates from Dave Airlie:
 "There was a rather late merge of a new color pipeline feature, that
  some userspace projects are blocked on, and has seen a lot of work in
  amdgpu. This should have seen some time in -next. There is additional
  support for this for Intel, that if it arrives in the next day or two
  I'll pass it on in another pull request and you can decide if you want
  to take it.

  Highlights:
   - Arm Ethos NPU accelerator driver
   - new DRM color pipeline support
   - amdgpu will now run discrete SI/CIK cards instead of radeon, which
     enables vulkan support in userspace
   - msm gets gen8 gpu support
   - initial Xe3P support in xe

  Full detail summary:

  New driver:
   - Arm Ethos-U65/U85 accel driver

  Core:
   - support the drm color pipeline in vkms/amdgfx
   - add support for drm colorop pipeline
   - add COLOR PIPELINE plane property
   - add DRM_CLIENT_CAP_PLANE_COLOR_PIPELINE
   - throttle dirty worker with vblank
   - use drm_for_each_bridge_in_chain_scoped in drm's bridge code
   - Ensure drm_client_modeset tests are enabled in UML
   - add simulated vblank interrupt - use in drivers
   - dumb buffer sizing helper
   - move freeing of drm client memory to driver
   - crtc sharpness strength property
   - stop using system_wq in scheduler/drivers
   - support emergency restore in drm-client

  Rust:
   - make slice::as_flattened usable on all supported rustc
   - add FromBytes::from_bytes_prefix() method
   - remove redundant device ptr from Rust GEM object
   - Change how AlwaysRefCounted is implemented for GEM objects

  gpuvm:
   - Add deferred vm_bo cleanup to GPUVM (for rust)

  atomic:
   - cleanup and improve state handling interfaces

  buddy:
   - optimize block management

  dma-buf:
   - heaps: Create heap per CMA reserved location
   - improve userspace documentation

  dp:
   - add POST_LT_ADJ_REQ training sequence
   - DPCD dSC quirk for synaptics panamera devices
   - helpers to query branch DSC max throughput

  ttm:
   - Rename ttm_bo_put to ttm_bo_fini
   - allow page protection flags on risc-v
   - rework pipelined eviction fence handling

  amdgpu:
   - enable amdgpu by default for SI/CI dGPUs
   - enable DC by default on SI
   - refactor CIK/SI enablement
   - add ABM KMS property
   - Re-enable DM idle optimizations
   - DC Analog encoders support
   - Powerplay fixes for fiji/iceland
   - Enable DC on bonaire by default
   - HMM cleanup
   - Add new RAS framework
   - DML2.1 updates
   - YCbCr420 fixes
   - DC FP fixes
   - DMUB fixes
   - LTTPR fixes
   - DTBCLK fixes
   - DMU cursor offload handling
   - Userq validation improvements
   - Unify shutdown callback handling
   - Suspend improvements
   - Power limit code cleanup
   - SR-IOV fixes
   - AUX backlight fixes
   - DCN 3.5 fixes
   - HDMI compliance fixes
   - DCN 4.0.1 cursor updates
   - DCN interrupt fix
   - DC KMS full update improvements
   - Add additional HDCP traces
   - DCN 3.2 fixes
   - DP MST fixes
   - Add support for new SR-IOV mailbox interface
   - UQ reset support
   - HDP flush rework
   - VCE1 support

  amdkfd:
   - HMM cleanups
   - Relax checks on save area overallocations
   - Fix GPU mappings after prefetch

  radeon:
   - refactor CIK/SI enablement

  xe:
   - Initial Xe3P support
   - panic support on VRAM for display
   - fix stolen size check
   - Loosen used tracking restriction
   - New SR-IOV debugfs structure and debugfs updates
   - Hide the GPU madvise flag behind a VM_BIND flag
   - Always expose VRAM provisioning data on discrete GPUs
   - Allow VRAM mappings for userptr when used with SVM
   - Allow pinning of p2p dma-buf
   - Use per-tile debugfs where appropriate
   - Add documentation for Execution Queues
   - PF improvements
   - VF migration recovery redesign work
   - User / Kernel VRAM partitioning
   - Update Tile-based messages
   - Allow configfs to disable specific GT types
   - VF provisioning and migration improvements
   - use SVM range helpers in PT layer
   - Initial CRI support
   - access VF registers using dedicated MMIO view
   - limit number of jobs per exec queue
   - add sriov_admin sysfs tree
   - more crescent island specific support
   - debugfs residency counter
   - SRIOV migration work
   - runtime registers for GFX 35

  i915:
   - add initial Xe3p_LPD display version 35 support
   - Enable LNL+ content adaptive sharpness filter
   - Use optimized VRR guardband
   - Enable Xe3p LT PHY
   - enable FBC support for Xe3p_LPD display
   - add display 30.02 firmware support
   - refactor SKL+ watermark latency setup
   - refactor fbdev handling
   - call i915/xe runtime PM via function pointers
   - refactor i915/xe stolen memory/display interfaces
   - use display version instead of gfx version in display code
   - extend i915_display_info with Type-C port details
   - lots of display cleanups/refactorings
   - set O_LARGEFILE in __create_shmem
   - skuip guc communication warning on reset
   - fix time conversions
   - defeature DRRS on LNL+
   - refactor intel_frontbuffer split between i915/xe/display
   - convert inteL_rom interfaces to struct drm_device
   - unify display register polling interfaces
   - aovid lock inversion when pinning to GGTT on CHV/BXT+VTD

  panel:
   - Add KD116N3730A08/A12, chromebook mt8189
   - JT101TM023, LQ079L1SX01,
   - GLD070WX3-SL01 MIPI DSI
   - Samsung LTL106AL0, Samsung LTL106AL01
   - Raystar RFF500F-AWH-DNN
   - Winstar WF70A8SYJHLNGA
   - Wanchanglong w552946aaa
   - Samsung SOFEF00
   - Lenovo X13s panel
   - ilitek-ili9881c - add rpi 5" support
   - visionx-rm69299 - add backlight support
   - edp - support AUI B116XAN02.0

  bridge:
   - improve ref counting
   - ti-sn65dsi86 - add support for DP mode with HPD
   - synopsis: support CEC, init timer with correct freq
   - ASL CS5263 DP-to-HDMI bridge support

  nova-core:
   - introduce bitfield! macro
   - introduce safe integer converters
   - GSP inits to fully booted state on Ampere
   - Use more future-proof register for GPU identification

  nova-drm:
   - select NOVA_CORE
   - 64-bit only

  nouveau:
   - improve reclocking on tegra 186+
   - add large page and compression support

  msm:
   - GPU:
      - Gen8 support: A840 (Kaanapali) and X2-85 (Glymur)
      - A612 support
   - MDSS:
      - Added support for Glymur and QCS8300 platforms
   - DPU:
      - Enabled Quad-Pipe support, unlocking higher resolutions support
      - Added support for Glymur platform
      - Documented DPU on QCS8300 platform as supported
   - DisplayPort:
      - Added support for Glymur platform
      - Added support lame remapping inside DP block
      - Documented DisplayPort controller on QCS8300 and SM6150/QCS615
        as supported

  tegra:
   - NVJPG driver

  panfrost:
   - display JM contexts over debugfs
   - export JM contexts to userspace
   - improve error and job handling

  panthor:
   - support custom ASN_HASH for mt8196
   - support mali-G1 GPU
   - flush shmem write before mapping buffers uncached
   - make timeout per-queue instead of per-job

  mediatek:
   - MT8195/88 HDMIv2/DDCv2 support

  rockchip:
   - dsi: add support for RK3368

  amdxdna:
   - enhance runtime PM
   - last hardware error reading uapi
   - support firmware debug output
   - add resource and telemetry data uapi
   - preemption support

  imx:
   - add driver for HDMI TX Parallel audio interface

  ivpu:
   - add support for user-managed preemption buffer
   - add userptr support
   - update JSM firware API to 3.33.0
   - add better alloc/free warnings
   - fix page fault in unbind all bos
   - rework bind/unbind of imported buffers
   - enable MCA ECC signalling
   - split fw runtime and global memory buffers
   - add fdinfo memory statistics

  tidss:
   - convert to drm logging
   - logging cleanup

  ast:
   - refactor generation init paths
   - add per chip generation detect_tx_chip
   - set quirks for each chip model

  atmel-hlcdc:
   - set LCDC_ATTRE register in plane disable
   - set correct values for plane scaler

  solomon:
   - use drm helper for get_modes and move_valid

  sitronix:
   - fix output position when clearing screens

  qaic:
   - support dma-buf exports
   - support new firmware's READ_DATA implementation
   - sahara AIC200 image table update
   - add sysfs support
   - add coredump support
   - add uevents support
   - PM support

  sun4i:
   - layer refactors to decouple plane from output
   - improve DE33 support

  vc4:
   - switch to generic CEC helpers

  komeda:
   - use drm_ logging functions

  vkms:
   - configfs support for display configuration

  vgem:
   - fix fence timer deadlock

  etnaviv:
   - add HWDB entry for GC8000 Nano Ultra VIP r6205"

* tag 'drm-next-2025-12-03' of https://gitlab.freedesktop.org/drm/kernel: (1869 commits)
  Revert "drm/amd: Skip power ungate during suspend for VPE"
  drm/amdgpu: use common defines for HUB faults
  drm/amdgpu/gmc12: add amdgpu_vm_handle_fault() handling
  drm/amdgpu/gmc11: add amdgpu_vm_handle_fault() handling
  drm/amdgpu: use static ids for ACP platform devs
  drm/amdgpu/sdma6: Update SDMA 6.0.3 FW version to include UMQ protected-fence fix
  drm/amdgpu: Forward VMID reservation errors
  drm/amdgpu/gmc8: Delegate VM faults to soft IRQ handler ring
  drm/amdgpu/gmc7: Delegate VM faults to soft IRQ handler ring
  drm/amdgpu/gmc6: Delegate VM faults to soft IRQ handler ring
  drm/amdgpu/gmc6: Cache VM fault info
  drm/amdgpu/gmc6: Don't print MC client as it's unknown
  drm/amdgpu/cz_ih: Enable soft IRQ handler ring
  drm/amdgpu/tonga_ih: Enable soft IRQ handler ring
  drm/amdgpu/iceland_ih: Enable soft IRQ handler ring
  drm/amdgpu/cik_ih: Enable soft IRQ handler ring
  drm/amdgpu/si_ih: Enable soft IRQ handler ring
  drm/amd/display: fix typo in display_mode_core_structs.h
  drm/amd/display: fix Smart Power OLED not working after S4
  drm/amd/display: Move RGB-type check for audio sync to DCE HW sequence
  ...
2025-12-04 08:53:30 -08:00
Linus Torvalds
2ddcf4962c Merge tag 'kbuild-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux
Pull Kbuild updates from Nicolas Schier:

  - Enable -fms-extensions, allowing anonymous use of tagged struct or
    union in struct/union (tag kbuild-ms-extensions-6.19). An exemplary
    conversion patch is added here, too (btrfs).

    [ Editor's note: the core of this actually came in early through a
      shared branch and a few other trees    - Linus ]

  - Introduce architecture-specific CC_CAN_LINK and flags for userprogs

  - Add new packaging target 'modules-cpio-pkg' for building a initramfs
    cpio w/ kmods

  - Handle included .c files in gen_compile_commands

  - Minor kbuild changes:
     - Use objtree for module signing key path, fixing oot kmod signing
     - Improve documentation of KBUILD_BUILD_TIMESTAMP
     - Reuse KBUILD_USERCFLAGS for UAPI, instead of defining twice
     - Rename scripts/Makefile.extrawarn to Makefile.warn
     - Drop obsolete types.h check from headers_check.pl
     - Remove outdated config leak ignore entries

* tag 'kbuild-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux:
  kbuild: add target to build a cpio containing modules
  initramfs: add gen_init_cpio to hostprogs unconditionally
  kbuild: allow architectures to override CC_CAN_LINK
  init: deduplicate cc-can-link.sh invocations
  kbuild: don't enable CC_CAN_LINK if the dummy program generates warnings
  scripts: headers_install.sh: Remove two outdated config leak ignore entries
  scripts/clang-tools: Handle included .c files in gen_compile_commands
  kbuild: uapi: Drop types.h check from headers_check.pl
  kbuild: Rename Makefile.extrawarn to Makefile.warn
  MAINTAINERS, .mailmap: Update mail address for Nicolas Schier
  kbuild: uapi: reuse KBUILD_USERCFLAGS
  kbuild: doc: improve KBUILD_BUILD_TIMESTAMP documentation
  kbuild: Use objtree for module signing key path
  btrfs: send: make use of -fms-extensions for defining struct fs_path
2025-12-03 14:42:21 -08:00
Linus Torvalds
2b09f480f0 Merge tag 'core-rseq-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull rseq updates from Thomas Gleixner:
 "A large overhaul of the restartable sequences and CID management:

  The recent enablement of RSEQ in glibc resulted in regressions which
  are caused by the related overhead. It turned out that the decision to
  invoke the exit to user work was not really a decision. More or less
  each context switch caused that. There is a long list of small issues
  which sums up nicely and results in a 3-4% regression in I/O
  benchmarks.

  The other detail which caused issues due to extra work in context
  switch and task migration is the CID (memory context ID) management.
  It also requires to use a task work to consolidate the CID space,
  which is executed in the context of an arbitrary task and results in
  sporadic uncontrolled exit latencies.

  The rewrite addresses this by:

   - Removing deprecated and long unsupported functionality

   - Moving the related data into dedicated data structures which are
     optimized for fast path processing.

   - Caching values so actual decisions can be made

   - Replacing the current implementation with a optimized inlined
     variant.

   - Separating fast and slow path for architectures which use the
     generic entry code, so that only fault and error handling goes into
     the TIF_NOTIFY_RESUME handler.

   - Rewriting the CID management so that it becomes mostly invisible in
     the context switch path. That moves the work of switching modes
     into the fork/exit path, which is a reasonable tradeoff. That work
     is only required when a process creates more threads than the
     cpuset it is allowed to run on or when enough threads exit after
     that. An artificial thread pool benchmarks which triggers this did
     not degrade, it actually improved significantly.

     The main effect in migration heavy scenarios is that runqueue lock
     held time and therefore contention goes down significantly"

* tag 'core-rseq-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
  sched/mmcid: Switch over to the new mechanism
  sched/mmcid: Implement deferred mode change
  irqwork: Move data struct to a types header
  sched/mmcid: Provide CID ownership mode fixup functions
  sched/mmcid: Provide new scheduler CID mechanism
  sched/mmcid: Introduce per task/CPU ownership infrastructure
  sched/mmcid: Serialize sched_mm_cid_fork()/exit() with a mutex
  sched/mmcid: Provide precomputed maximal value
  sched/mmcid: Move initialization out of line
  signal: Move MMCID exit out of sighand lock
  sched/mmcid: Convert mm CID mask to a bitmap
  cpumask: Cache num_possible_cpus()
  sched/mmcid: Use cpumask_weighted_or()
  cpumask: Introduce cpumask_weighted_or()
  sched/mmcid: Prevent pointless work in mm_update_cpus_allowed()
  sched/mmcid: Move scheduler code out of global header
  sched: Fixup whitespace damage
  sched/mmcid: Cacheline align MM CID storage
  sched/mmcid: Use proper data structures
  sched/mmcid: Revert the complex CID management
  ...
2025-12-02 08:48:53 -08:00
Linus Torvalds
1d18101a64 Merge tag 'kernel-6.19-rc1.cred' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull cred guard updates from Christian Brauner:
 "This contains substantial credential infrastructure improvements
  adding guard-based credential management that simplifies code and
  eliminates manual reference counting in many subsystems.

  Features:

   - Kernel Credential Guards

     Add with_kernel_creds() and scoped_with_kernel_creds() guards that
     allow using the kernel credentials without allocating and copying
     them. This was requested by Linus after seeing repeated
     prepare_kernel_creds() calls that duplicate the kernel credentials
     only to drop them again later.

     The new guards completely avoid the allocation and never expose the
     temporary variable to hold the kernel credentials anywhere in
     callers.

   - Generic Credential Guards

     Add scoped_with_creds() guards for the common override_creds() and
     revert_creds() pattern. This builds on earlier work that made
     override_creds()/revert_creds() completely reference count free.

   - Prepare Credential Guards

     Add prepare credential guards for the more complex pattern of
     preparing a new set of credentials and overriding the current
     credentials with them:
      - prepare_creds()
      - modify new creds
      - override_creds()
      - revert_creds()
      - put_cred()

  Cleanups:

   - Make init_cred static since it should not be directly accessed

   - Add kernel_cred() helper to properly access the kernel credentials

   - Fix scoped_class() macro that was introduced two cycles ago

   - coredump: split out do_coredump() from vfs_coredump() for cleaner
     credential handling

   - coredump: move revert_cred() before coredump_cleanup()

   - coredump: mark struct mm_struct as const

   - coredump: pass struct linux_binfmt as const

   - sev-dev: use guard for path"

* tag 'kernel-6.19-rc1.cred' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (36 commits)
  trace: use override credential guard
  trace: use prepare credential guard
  coredump: use override credential guard
  coredump: use prepare credential guard
  coredump: split out do_coredump() from vfs_coredump()
  coredump: mark struct mm_struct as const
  coredump: pass struct linux_binfmt as const
  coredump: move revert_cred() before coredump_cleanup()
  sev-dev: use override credential guards
  sev-dev: use prepare credential guard
  sev-dev: use guard for path
  cred: add prepare credential guard
  net/dns_resolver: use credential guards in dns_query()
  cgroup: use credential guards in cgroup_attach_permissions()
  act: use credential guards in acct_write_process()
  smb: use credential guards in cifs_get_spnego_key()
  nfs: use credential guards in nfs_idmap_get_key()
  nfs: use credential guards in nfs_local_call_write()
  nfs: use credential guards in nfs_local_call_read()
  erofs: use credential guards
  ...
2025-12-01 13:45:41 -08:00
Linus Torvalds
415d34b92c Merge tag 'namespace-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull namespace updates from Christian Brauner:
 "This contains substantial namespace infrastructure changes including a new
  system call, active reference counting, and extensive header cleanups.
  The branch depends on the shared kbuild branch for -fms-extensions support.

  Features:

   - listns() system call

     Add a new listns() system call that allows userspace to iterate
     through namespaces in the system. This provides a programmatic
     interface to discover and inspect namespaces, addressing
     longstanding limitations:

     Currently, there is no direct way for userspace to enumerate
     namespaces. Applications must resort to scanning /proc/*/ns/ across
     all processes, which is:
      - Inefficient - requires iterating over all processes
      - Incomplete - misses namespaces not attached to any running
        process but kept alive by file descriptors, bind mounts, or
        parent references
      - Permission-heavy - requires access to /proc for many processes
      - No ordering or ownership information
      - No filtering per namespace type

     The listns() system call solves these problems:

       ssize_t listns(const struct ns_id_req *req, u64 *ns_ids,
                      size_t nr_ns_ids, unsigned int flags);

       struct ns_id_req {
             __u32 size;
             __u32 spare;
             __u64 ns_id;
             struct /* listns */ {
                     __u32 ns_type;
                     __u32 spare2;
                     __u64 user_ns_id;
             };
       };

     Features include:
      - Pagination support for large namespace sets
      - Filtering by namespace type (MNT_NS, NET_NS, USER_NS, etc.)
      - Filtering by owning user namespace
      - Permission checks respecting namespace isolation

   - Active Reference Counting

     Introduce an active reference count that tracks namespace
     visibility to userspace. A namespace is visible in the following
     cases:
      - The namespace is in use by a task
      - The namespace is persisted through a VFS object (namespace file
        descriptor or bind-mount)
      - The namespace is a hierarchical type and is the parent of child
        namespaces

     The active reference count does not regulate lifetime (that's still
     done by the normal reference count) - it only regulates visibility
     to namespace file handles and listns().

     This prevents resurrection of namespaces that are pinned only for
     internal kernel reasons (e.g., user namespaces held by
     file->f_cred, lazy TLB references on idle CPUs, etc.) which should
     not be accessible via (1)-(3).

   - Unified Namespace Tree

     Introduce a unified tree structure for all namespaces with:
      - Fixed IDs assigned to initial namespaces
      - Lookup based solely on inode number
      - Maintained list of owned namespaces per user namespace
      - Simplified rbtree comparison helpers

   Cleanups

    - Header Reorganization:
      - Move namespace types into separate header (ns_common_types.h)
      - Decouple nstree from ns_common header
      - Move nstree types into separate header
      - Switch to new ns_tree_{node,root} structures with helper functions
      - Use guards for ns_tree_lock

   - Initial Namespace Reference Count Optimization
      - Make all reference counts on initial namespaces a nop to avoid
        pointless cacheline ping-pong for namespaces that can never go
        away
      - Drop custom reference count initialization for initial namespaces
      - Add NS_COMMON_INIT() macro and use it for all namespaces
      - pid: rely on common reference count behavior

   - Miscellaneous Cleanups
      - Rename exit_task_namespaces() to exit_nsproxy_namespaces()
      - Rename is_initial_namespace() and make argument const
      - Use boolean to indicate anonymous mount namespace
      - Simplify owner list iteration in nstree
      - nsfs: raise SB_I_NODEV, SB_I_NOEXEC, and DCACHE_DONTCACHE explicitly
      - nsfs: use inode_just_drop()
      - pidfs: raise DCACHE_DONTCACHE explicitly
      - pidfs: simplify PIDFD_GET__NAMESPACE ioctls
      - libfs: allow to specify s_d_flags
      - cgroup: add cgroup namespace to tree after owner is set
      - nsproxy: fix free_nsproxy() and simplify create_new_namespaces()

  Fixes:

   - setns(pidfd, ...) race condition

     Fix a subtle race when using pidfds with setns(). When the target
     task exits after prepare_nsset() but before commit_nsset(), the
     namespace's active reference count might have been dropped. If
     setns() then installs the namespaces, it would bump the active
     reference count from zero without taking the required reference on
     the owner namespace, leading to underflow when later decremented.

     The fix resurrects the ownership chain if necessary - if the caller
     succeeded in grabbing passive references, the setns() should
     succeed even if the target task exits or gets reaped.

   - Return EFAULT on put_user() error instead of success

   - Make sure references are dropped outside of RCU lock (some
     namespaces like mount namespace sleep when putting the last
     reference)

   - Don't skip active reference count initialization for network
     namespace

   - Add asserts for active refcount underflow

   - Add asserts for initial namespace reference counts (both passive
     and active)

   - ipc: enable is_ns_init_id() assertions

   - Fix kernel-doc comments for internal nstree functions

   - Selftests
      - 15 active reference count tests
      - 9 listns() functionality tests
      - 7 listns() permission tests
      - 12 inactive namespace resurrection tests
      - 3 threaded active reference count tests
      - commit_creds() active reference tests
      - Pagination and stress tests
      - EFAULT handling test
      - nsid tests fixes"

* tag 'namespace-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (103 commits)
  pidfs: simplify PIDFD_GET_<type>_NAMESPACE ioctls
  nstree: fix kernel-doc comments for internal functions
  nsproxy: fix free_nsproxy() and simplify create_new_namespaces()
  selftests/namespaces: fix nsid tests
  ns: drop custom reference count initialization for initial namespaces
  pid: rely on common reference count behavior
  ns: add asserts for initial namespace active reference counts
  ns: add asserts for initial namespace reference counts
  ns: make all reference counts on initial namespace a nop
  ipc: enable is_ns_init_id() assertions
  fs: use boolean to indicate anonymous mount namespace
  ns: rename is_initial_namespace()
  ns: make is_initial_namespace() argument const
  nstree: use guards for ns_tree_lock
  nstree: simplify owner list iteration
  nstree: switch to new structures
  nstree: add helper to operate on struct ns_tree_{node,root}
  nstree: move nstree types into separate header
  nstree: decouple from ns_common header
  ns: move namespace types into separate header
  ...
2025-12-01 09:47:41 -08:00
Andy Shevchenko
aa514a297a calibrate: update header inclusion
While cleaning up some headers, I got a build error on this file:

init/calibrate.c:20:9: error: call to undeclared function 'kstrtoul'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]

Update header inclusions to follow IWYU (Include What You Use) principle.

Link: https://lkml.kernel.org/r/20251124230607.1445421-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-27 14:24:45 -08:00
Thorsten Blum
af06a40474 init: replace simple_strtoul with kstrtoul to improve lpj_setup
Replace simple_strtoul() with the recommended kstrtoul() for parsing the
'lpj=' boot parameter.

Check the return value of kstrtoul() and reject invalid values.  This adds
error handling while preserving existing behavior for valid values, and
removes use of the deprecated simple_strtoul() helper.

Link: https://lkml.kernel.org/r/20251122114539.446937-2-thorsten.blum@linux.dev
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-27 14:24:43 -08:00
Pasha Tatashin
48a1b2321d liveupdate: kho: move to kernel/liveupdate
Move KHO to kernel/liveupdate/ in preparation of placing all Live Update
core kernel related files to the same place.

[pasha.tatashin@soleen.com: disable the menu when DEFERRED_STRUCT_PAGE_INIT]
  Link: https://lkml.kernel.org/r/CA+CK2bAvh9Oa2SLfsbJ8zztpEjrgr_hr-uGgF1coy8yoibT39A@mail.gmail.com
Link: https://lkml.kernel.org/r/20251101142325.1326536-8-pasha.tatashin@soleen.com
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: Changyuan Lyu <changyuanl@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Pratyush Yadav <pratyush@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Simon Horman <horms@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-27 14:24:33 -08:00
Thomas Gleixner
8cea569ca7 sched/mmcid: Use proper data structures
Having a lot of CID functionality specific members in struct task_struct
and struct mm_struct is not really making the code easier to read.

Encapsulate the CID specific parts in data structures and keep them
separate from the stuff they are embedded in.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172549.131573768@linutronix.de
2025-11-20 12:14:52 +01:00
Al Viro
2313598222 convert ramfs and tmpfs
Quite a bit is already done by infrastructure changes (simple_link(),
simple_unlink()) - all that is left is replacing d_instantiate() +
pinning dget() (in ->symlink() and ->mknod()) with d_make_persistent(),
and, in case of shmem, using simple_unlink() and simple_link() in
->unlink() and ->link() resp., instead of open-coding those there.
Since d_make_persistent() accepts (and hashes) unhashed ones, shmem
situation gets simpler - we no longer care whether ->lookup() has hashed
the sucker.

With that done, we don't need kill_litter_super() for these filesystems
anymore - by the umount time all remaining dentries will be marked
persistent and kill_litter_super() will boil down to call of
kill_anon_super().

The same goes for devtmpfs and rootfs - they are handled by
ramfs or by shmem, depending upon config.

NB: strictly speaking, both devtmpfs and rootfs ought to use
ramfs_kill_sb() if they end up using ramfs; that's a separate
story and the only impact of "just use kill_{litter,anon}_super()"
is that we fail to free their sb->s_fs_info... on reboot.
That's orthogonal to the changes in this series - kill_litter_super()
is identical to kill_anon_super() for those at this point.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:02 -05:00
Thomas Weißschuh
deab487e0f kbuild: allow architectures to override CC_CAN_LINK
The generic test for CC_CAN_LINK assumes that all architectures use -m32
and -m64 to switch between 32-bit and 64-bit compilation. This is overly
simplistic. Architectures may use other flags (-mabi, -m31, etc.) or may
also require byte order handling (-mlittle-endian, -EL). Expressing all
of the different possibilities will be very complicated and brittle.
Instead allow architectures to supply their own logic which will be
easy to understand and evolve.

Both the boolean ARCH_HAS_CC_CAN_LINK and the string ARCH_USERFLAGS need
to be implemented as kconfig does not allow the reuse of string options.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Nicolas Schier <nsc@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20251114-kbuild-userprogs-bits-v3-3-4dee0d74d439@linutronix.de
Signed-off-by: Nicolas Schier <nsc@kernel.org>
2025-11-14 20:20:35 +01:00
Thomas Weißschuh
80623f2c83 init: deduplicate cc-can-link.sh invocations
The command to invoke scripts/cc-can-link.sh is very long and new usages
are about to be added.

Add a helper variable to make the code easier to read and maintain.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Nicolas Schier <nsc@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20251114-kbuild-userprogs-bits-v3-2-4dee0d74d439@linutronix.de
Signed-off-by: Nicolas Schier <nsc@kernel.org>
2025-11-14 20:20:34 +01:00
Alexandre Courbot
88622323dd rust: enable slice_flatten feature and provide it through an extension trait
In Rust 1.80, the previously unstable `slice::flatten` family of methods
have been stabilized and renamed to `slice::as_flattened`.

This creates an issue as we want to use `as_flattened`, but need to
support the MSRV (which at the moment is Rust 1.78) where it is named
`flatten`.

Solve this by enabling the `slice_flatten` feature, and providing an
`as_flattened` implementation through an extension trait for compiler
versions where it is not available.

The trait is then exported from the prelude, making the `as_flattened`
family of methods transparently available for all supported compiler
versions.

This extension trait can be removed once the MSRV passes 1.80.

Suggested-by: Miguel Ojeda <ojeda@kernel.org>
Link: https://lore.kernel.org/all/CANiq72kK4pG=O35NwxPNoTO17oRcg1yfGcvr3==Fi4edr+sfmw@mail.gmail.com/
Acked-by: Danilo Krummrich <dakr@kernel.org>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Message-ID: <20251110-gsp_boot-v9-8-8ae4058e3c0e@nvidia.com>
Message-ID: <20251104-b4-as-flattened-v3-1-6cb9c26b45cd@nvidia.com>
2025-11-14 20:25:57 +09:00
Douglas Anderson
032a730268 init/main.c: wrap long kernel cmdline when printing to logs
The kernel cmdline length is allowed to be longer than what printk can
handle.  When this happens the cmdline that's printed to the kernel ring
buffer at bootup is cutoff and some kernel cmdline options are "hidden"
from the logs.  This undercuts the usefulness of the log message.

Specifically, grepping for COMMAND_LINE_SIZE shows that 2048 is common and
some architectures even define it as 4096.  s390 allows a CONFIG-based
maximum up to 1MB (though it's not expected that anyone will go over the
default max of 4096 [1]).

The maximum message pr_notice() seems to be able to handle (based on
experiment) is 1021 characters.  This appears to be based on the current
value of PRINTKRB_RECORD_MAX as 1024 and the fact that pr_notice() spends
2 characters on the loglevel prefix and we have a '\n' at the end.

While it would be possible to increase the limits of printk() (and
therefore pr_notice()) somewhat, it doesn't appear possible to increase it
enough to fully include a 2048-character cmdline without breaking
userspace.  Specifically on at least two tested userspaces (ChromeOS plus
the Debian-based distro I'm typing this message on) the `dmesg` tool reads
lines from `/dev/kmsg` in 2047-byte chunks.  As per
`Documentation/ABI/testing/dev-kmsg`:

  Every read() from the opened device node receives one record
  of the kernel's printk buffer.
  ...
  Messages in the record ring buffer get overwritten as whole,
  there are never partial messages received by read().

We simply can't fit a 2048-byte cmdline plus the "Kernel command line:"
prefix plus info about time/log_level/etc in a 2047-byte read.

The above means that if we want to avoid the truncation we need to do some
type of wrapping of the cmdline when printing.

Add wrapping to the printout of the kernel command line.  By default, the
wrapping is set to 1021 characters to avoid breaking anyone, but allow
wrapping to be set lower by a Kconfig knob
"CONFIG_CMDLINE_LOG_WRAP_IDEAL_LEN".  Any tools that are correctly parsing
the cmdline today (because it is less than 1021 characters) will see no
difference in their behavior.  The format of wrapped output is designed to
be matched by anyone using "grep" to search for the cmdline and also to be
easy for tools to handle.  Anyone who is sure their tools (if any) handle
the wrapped format can choose a lower wrapping value and have prettier
output.

Setting CONFIG_CMDLINE_LOG_WRAP_IDEAL_LEN to 0 fully disables the wrapping
logic.  This means that long command lines will be truncated again, but
this config could be set if command lines are expected to be long and
userspace is known not to handle parsing logs with the wrapping.

Wrapping is based on spaces, ignoring quotes.  All lines are prefixed with
"Kernel command line: " and lines that are not the last line have a " \"
suffix added to them.  The prefix and suffix count towards the line length
for wrapping purposes.  The ideal length will be exceeded if no
appropriate place to wrap is found.

The wrapping function added here is fairly generic and could be made a
library function (somewhat like print_hex_dump()) if it's needed elsewhere
in the kernel.  However, having printk() directly incorporate this
wrapping would be unlikely to be a good idea since it would break
printouts into more than one record without any obvious common line prefix
to tie lines together.  It would also be extra overhead when, in general,
kernel log message should simply be kept smaller than 1021 bytes.  For
some discussion on this topic, see responses to the v1 posting of this
patch [2].

[akpm@linux-foundation.org: make print_kernel_cmdline __init]
[dianders@chromium.org: v4]
  Link: https://lkml.kernel.org/r/20251027082204.v4.1.I095f1e2c6c27f9f4de0b4841f725f356c643a13f@changeid
Link: https://lkml.kernel.org/r/20251023113257.v3.1.I095f1e2c6c27f9f4de0b4841f725f356c643a13f@changeid
Link: https://lore.kernel.org/r/20251021131633.26700Dd6-hca@linux.ibm.com [1]
Link: https://lore.kernel.org/r/CAD=FV=VNyt1zG_8pS64wgV8VkZWiWJymnZ-XCfkrfaAhhFSKcA@mail.gmail.com [2]
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrew Chant <achant@google.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Francesco Valla <francesco@valla.it>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: guoweikang <guoweikang.kernel@gmail.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jan Hendrik Farr <kernel@jfarr.cc>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-12 10:00:16 -08:00