mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
synced 2026-04-29 12:28:27 +02:00
6db2f7c67b2804bc13fa385ff4e462fc3b366f8f
25591 Commits

119d1cbc6e
Merge tag 'slab-for-6.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab fix from Vlastimil Babka:
 - A stable fix for kmalloc_nolock() in non-preemptible contexts on PREEMPT_RT (Swaraj Gaikwad)
* tag 'slab-for-6.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  slab: fix kmalloc_nolock() context check for PREEMPT_RT

99a3e3a1cf
slab: fix kmalloc_nolock() context check for PREEMPT_RT
On PREEMPT_RT kernels, local_lock becomes a sleeping lock. The current
check in kmalloc_nolock() only verifies we're not in NMI or hard IRQ
context, but misses the case where preemption is disabled.
When a BPF program runs from a tracepoint with preemption disabled
(preempt_count > 0), kmalloc_nolock() proceeds to call
local_lock_irqsave() which attempts to acquire a sleeping lock,
triggering:
BUG: sleeping function called from invalid context
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 6128
preempt_count: 2, expected: 0
Fix this by checking !preemptible() on PREEMPT_RT, which directly
expresses the constraint that we cannot take a sleeping lock when
preemption is disabled. This encompasses the previous checks for NMI
and hard IRQ contexts while also catching cases where preemption is
disabled.
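A minimal sketch of the check shape the changelog describes (illustrative; the exact fallback path in mm/slub.c may differ):

	if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
		/*
		 * local_lock is a sleeping lock on RT: !preemptible() covers
		 * NMI, hard IRQ and preempt-disabled contexts in one test.
		 */
		if (!preemptible())
			return NULL;
	} else if (in_nmi() || in_hardirq()) {
		return NULL;
	}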
Fixes:

c25f2fb1f4
Merge tag 'mm-hotfixes-stable-2026-01-20-13-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
 - A patch series from David Hildenbrand which fixes a few things related to hugetlb PMD sharing
 - The remainder are singletons, please see their changelogs for details
* tag 'mm-hotfixes-stable-2026-01-20-13-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm: restore per-memcg proactive reclaim with !CONFIG_NUMA
  mm/kfence: fix potential deadlock in reboot notifier
  Docs/mm/allocation-profiling: describe sysctrl limitations in debug mode
  mm: do not copy page tables unnecessarily for VM_UFFD_WP
  mm/hugetlb: fix excessive IPI broadcasts when unsharing PMD tables using mmu_gather
  mm/rmap: fix two comments related to huge_pmd_unshare()
  mm/hugetlb: fix two comments related to huge_pmd_unshare()
  mm/hugetlb: fix hugetlb_pmd_shared()
  mm: remove unnecessary and incorrect mmap lock assert
  x86/kfence: avoid writing L1TF-vulnerable PTEs
  mm/vma: do not leak memory when .mmap_prepare swaps the file
  migrate: correct lock ordering for hugetlb file folios
  panic: only warn about deprecated panic_print on write access
  fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes()
  mm: take into account mm_cid size for mm_struct static definitions
  mm: rename cpu_bitmap field to flexible_array
  mm: add missing static initializer for init_mm::mm_cid.lock

c03e9c42ae
Merge tag 'dma-mapping-6.19-2026-01-20' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux
Pull dma-mapping fixes from Marek Szyprowski:
 - minor fixes for the corner cases of the SWIOTLB pool management (Robin Murphy)
* tag 'dma-mapping-6.19-2026-01-20' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
  dma/pool: Avoid allocating redundant pools
  mm_zone: Generalise has_managed_dma()
  dma/pool: Improve pool lookup

16aca2c98a
mm: restore per-memcg proactive reclaim with !CONFIG_NUMA
Commit

9bc9ccbf4c
mm/kfence: fix potential deadlock in reboot notifier
The reboot notifier callback can deadlock when calling
cancel_delayed_work_sync() if toggle_allocation_gate() is blocked in
wait_event_idle() waiting for allocations, which might not happen on the
shutdown path.
The issue is that cancel_delayed_work_sync() waits for the work to
complete, but the work is waiting for kfence_allocation_gate > 0, which
requires allocations to happen (the gate counter is incremented by 1 on
each allocation) - allocations that may have stopped during shutdown.
Fix this by:
1. Using cancel_delayed_work() (non-sync) to avoid blocking. Now the
callback succeeds and returns.
2. Adding wake_up() to unblock any waiting toggle_allocation_gate()
3. Adding !kfence_enabled to the wait condition so the wake succeeds
The static_branch_disable() IPI will still execute after the wake, but at
this early point in shutdown (reboot notifier runs with INT_MAX priority),
the system is still functional and CPUs can respond to IPIs.
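A hedged sketch of the notifier shape the three steps above describe (identifiers follow the changelog, not necessarily mm/kfence/core.c verbatim):

	static int kfence_reboot_notify(struct notifier_block *nb,
					unsigned long action, void *data)
	{
		WRITE_ONCE(kfence_enabled, false);
		/*
		 * Non-sync cancel: do not wait for a work item that may
		 * itself be stuck in wait_event_idle().
		 */
		cancel_delayed_work(&kfence_timer);
		/*
		 * Unblock toggle_allocation_gate(); its wait condition now
		 * also checks !kfence_enabled, so this wake-up succeeds.
		 */
		wake_up(&allocation_wait);
		return NOTIFY_DONE;
	}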
Link: https://lkml.kernel.org/r/20260116-kfence_fix-v1-1-4165a055933f@debian.org
Fixes:

35e2470326
mm: do not copy page tables unnecessarily for VM_UFFD_WP
Commit

8ce720d5bd
mm/hugetlb: fix excessive IPI broadcasts when unsharing PMD tables using mmu_gather
As reported, ever since commit

a8682d500f
mm/rmap: fix two comments related to huge_pmd_unshare()
PMD page table unsharing no longer touches the refcount of a PMD page
table. Also, it is not about dropping the refcount of a "PMD page" but
the "PMD page table".
Let's just simplify by saying that the PMD page table was unmapped,
consequently also unmapping the folio that was mapped into this page table.
This code should be deduplicated in the future.
Link: https://lkml.kernel.org/r/20251223214037.580860-4-david@kernel.org
Fixes:

3937027cae
mm/hugetlb: fix two comments related to huge_pmd_unshare()
Ever since we stopped using the page count to detect shared PMD page tables, these comments are outdated.
The only reason we have to flush the TLB early is that once we drop the i_mmap_rwsem, the previously shared page table could get freed (to then get reallocated and used for another purpose). So we really have to flush the TLB before that could happen.
So let's simplify the comments a bit. The "If we unshared PMDs, the TLB flush was not recorded in mmu_gather." part was introduced in commit

90888b4ae1
mm: remove unnecessary and incorrect mmap lock assert
This check was introduced by commit

605f6586ec
mm/vma: do not leak memory when .mmap_prepare swaps the file
The current implementation of mmap() is set up such that a struct file object is obtained for the input fd in ksys_mmap_pgoff() via fget(), and its reference count decremented at the end of the function via fput().
If a merge can be achieved, we are fine to simply decrement the refcount on the file. Otherwise, in __mmap_new_file_vma(), we increment the reference count on the file via get_file() such that the fput() in ksys_mmap_pgoff() does not free the now-referenced file object.
The introduction of the f_op->mmap_prepare hook changes things, as it becomes possible for a driver to replace the file object right at the beginning of the mmap operation. The current implementation is buggy if this happens, because it unconditionally calls get_file() on the mapping's file whether or not it was replaced (and thus whether or not its reference count will be decremented at the end of ksys_mmap_pgoff()).
This results in a memory leak, and was exposed in commit

b7880cb166
migrate: correct lock ordering for hugetlb file folios
Syzbot has found a deadlock (analyzed by Lance Yang):
1) Task (5749): Holds folio_lock, then tries to acquire i_mmap_rwsem(read lock).
2) Task (5754): Holds i_mmap_rwsem(write lock), then tries to acquire
folio_lock.
migrate_pages()
-> migrate_hugetlbs()
-> unmap_and_move_huge_page() <- Takes folio_lock!
-> remove_migration_ptes()
-> __rmap_walk_file()
-> i_mmap_lock_read() <- Waits for i_mmap_rwsem(read lock)!
hugetlbfs_fallocate()
-> hugetlbfs_punch_hole() <- Takes i_mmap_rwsem(write lock)!
-> hugetlbfs_zero_partial_page()
-> filemap_lock_hugetlb_folio()
-> filemap_lock_folio()
-> __filemap_get_folio <- Waits for folio_lock!
The migration path is the one taking locks in the wrong order according to
the documentation at the top of mm/rmap.c. So expand the scope of the
existing i_mmap_lock to cover the calls to remove_migration_ptes() too.
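A sketch of the reordering described above (hugetlb file-folio path only; surrounding migration logic elided):

	/*
	 * Expanded scope: keep i_mmap_rwsem held across the rmap walk that
	 * removes migration entries, so folio_lock work only ever nests
	 * inside it, matching the hugetlbfs punch-hole path.
	 */
	i_mmap_lock_write(mapping);
	/* ... unmap the source folio, move its contents ... */
	remove_migration_ptes(src, dst, 0);
	i_mmap_unlock_write(mapping);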
This is (mostly) how it used to be after commit

6ac433f8b2
mm: rename cpu_bitmap field to flexible_array
The cpu_bitmap flexible array now contains more than just the cpu_bitmap.
In preparation for changing the static mm_struct definitions to cover for
the additional space required, change the cpu_bitmap type from "unsigned
long" to "char", require an unsigned long alignment of the flexible array,
and rename the field from "cpu_bitmap" to "flexible_array".
Introduce the MM_STRUCT_FLEXIBLE_ARRAY_INIT macro to statically initialize
the flexible array. This covers the init_mm and efi_mm static
definitions.
This is a preparation step for fixing the missing mm_cid size for static
mm_struct definitions.
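A sketch of the field change being described (other mm_struct members elided):

	struct mm_struct {
		/* ... */
		/*
		 * Was: unsigned long cpu_bitmap[]; now raw bytes carrying
		 * the cpu bitmap plus the additional trailing data, kept
		 * aligned for unsigned long access.
		 */
		char flexible_array[] __aligned(sizeof(unsigned long));
	};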
Link: https://lkml.kernel.org/r/20251224173358.647691-3-mathieu.desnoyers@efficios.com
Fixes:

12a6ddfc76
mm: add missing static initializer for init_mm::mm_cid.lock
Initialize the mm_cid.lock struct member of init_mm.
Link: https://lkml.kernel.org/r/20251224173358.647691-2-mathieu.desnoyers@efficios.com
Fixes:

6782a30d20
Merge tag 'cxl-fixes-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull Compute Express Link (CXL) fixes from Dave Jiang:
 - Recognize all ZONE_DEVICE users as physaddr consumers
 - Fix format string for extended_linear_cache_size_show()
 - Fix target list setup for multiple decoders sharing the same downstream port
 - Restore HBIW check before dereferencing platform data
 - Fix potential infinite loop in __cxl_dpa_reserve()
 - Check for invalid addresses returned from translation functions on error
* tag 'cxl-fixes-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
  cxl: Check for invalid addresses returned from translation functions on errors
  cxl/hdm: Fix potential infinite loop in __cxl_dpa_reserve()
  cxl/acpi: Restore HBIW check before dereferencing platform_data
  cxl/port: Fix target list setup for multiple decoders sharing the same dport
  cxl/region: fix format string for resource_size_t
  x86/kaslr: Recognize all ZONE_DEVICE users as physaddr consumers

f46c26f1bc
mm: numa,memblock: include <asm/numa.h> for 'numa_nodes_parsed'
The 'numa_nodes_parsed' symbol is defined in <asm/numa.h>, but this file
is not included in mm/numa_memblks.c (x86_64 build), so add it
to the includes to fix the following sparse warning:
mm/numa_memblks.c:13:12: warning: symbol 'numa_nodes_parsed' was not declared. Should it be static?
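The fix is the one-line include the changelog describes:

	#include <asm/numa.h>	/* declares numa_nodes_parsed */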
Link: https://lkml.kernel.org/r/20260108101539.229192-1-ben.dooks@codethink.co.uk
Fixes:

038a102535
mm/page_alloc: prevent pcp corruption with SMP=n
The kernel test robot has reported:
BUG: spinlock trylock failure on UP on CPU#0, kcompactd0/28
 lock: 0xffff888807e35ef0, .magic: dead4ead, .owner: kcompactd0/28, .owner_cpu: 0
CPU: 0 UID: 0 PID: 28 Comm: kcompactd0 Not tainted 6.18.0-rc5-00127-ga06157804399 #1 PREEMPT 8cc09ef94dcec767faa911515ce9e609c45db470
Call Trace:
<IRQ>
__dump_stack (lib/dump_stack.c:95)
dump_stack_lvl (lib/dump_stack.c:123)
dump_stack (lib/dump_stack.c:130)
spin_dump (kernel/locking/spinlock_debug.c:71)
do_raw_spin_trylock (kernel/locking/spinlock_debug.c:?)
_raw_spin_trylock (include/linux/spinlock_api_smp.h:89 kernel/locking/spinlock.c:138)
__free_frozen_pages (mm/page_alloc.c:2973)
___free_pages (mm/page_alloc.c:5295)
__free_pages (mm/page_alloc.c:5334)
tlb_remove_table_rcu (include/linux/mm.h:? include/linux/mm.h:3122 include/asm-generic/tlb.h:220 mm/mmu_gather.c:227 mm/mmu_gather.c:290)
? __cfi_tlb_remove_table_rcu (mm/mmu_gather.c:289)
? rcu_core (kernel/rcu/tree.c:?)
rcu_core (include/linux/rcupdate.h:341 kernel/rcu/tree.c:2607 kernel/rcu/tree.c:2861)
rcu_core_si (kernel/rcu/tree.c:2879)
handle_softirqs (arch/x86/include/asm/jump_label.h:36 include/trace/events/irq.h:142 kernel/softirq.c:623)
__irq_exit_rcu (arch/x86/include/asm/jump_label.h:36 kernel/softirq.c:725)
irq_exit_rcu (kernel/softirq.c:741)
sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1052)
</IRQ>
<TASK>
RIP: 0010:_raw_spin_unlock_irqrestore (arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:152 kernel/locking/spinlock.c:194)
free_pcppages_bulk (mm/page_alloc.c:1494)
drain_pages_zone (include/linux/spinlock.h:391 mm/page_alloc.c:2632)
__drain_all_pages (mm/page_alloc.c:2731)
drain_all_pages (mm/page_alloc.c:2747)
kcompactd (mm/compaction.c:3115)
kthread (kernel/kthread.c:465)
? __cfi_kcompactd (mm/compaction.c:3166)
? __cfi_kthread (kernel/kthread.c:412)
ret_from_fork (arch/x86/kernel/process.c:164)
? __cfi_kthread (kernel/kthread.c:412)
ret_from_fork_asm (arch/x86/entry/entry_64.S:255)
</TASK>
Matthew has analyzed the report and identified that in drain_pages_zone() we are in a section protected by spin_lock(&pcp->lock) and then get an interrupt that attempts spin_trylock() on the same lock. The code is designed to work this way without disabling IRQs and to occasionally fail the trylock, with a fallback. However, the SMP=n spinlock implementation assumes spin_trylock() will always succeed, and thus it's normally a no-op. Here the enabled lock debugging catches the problem, but otherwise it could cause a corruption of the pcp structure.
The problem has been introduced by commit

4795d205d7
mm: kmsan: fix poisoning of high-order non-compound pages
kmsan_free_page() is called by the page allocator's free_pages_prepare()
during page freeing. Its job is to poison all the memory covered by the
page. It can be called with an order-0 page, a compound high-order page
or a non-compound high-order page. But page_size() only works for order-0
and compound pages. For a non-compound high-order page it will
incorrectly return PAGE_SIZE.
The implication is that the tail pages of a high-order non-compound page
do not get poisoned at free, so any invalid access while they are free
could go unnoticed. It looks like the pages will be poisoned again at
allocation time, so that would bookend the window.
Fix this by using the order parameter to calculate the size.
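The shape of the fix, per the changelog (illustrative):

	/*
	 * Before: wrong for non-compound high-order pages, where
	 * page_size() evaluates to just PAGE_SIZE.
	 */
	size = page_size(page);
	/* After: derive the extent from the order the allocator passes in. */
	size = PAGE_SIZE << order;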
Link: https://lkml.kernel.org/r/20260104134348.3544298-1-ryan.roberts@arm.com
Fixes:

3b617fd3d3
mm/vma: enforce VMA fork limit on unfaulted,faulted mremap merge too
The is_mergeable_anon_vma() function uses vmg->middle as the source VMA. However, when merging a new VMA, this field is NULL.
In all cases except mremap(), the new VMA will either be newly established and thus lack an anon_vma, or will be an expansion of an existing VMA, so we do not care whether the VMA is CoW'd or not.
In the case of an mremap(), we can end up in a situation where we accidentally allow an unfaulted/faulted merge with a VMA that has been forked, violating the general rule that we do not permit this for reasons of anon_vma lock scalability.
Now that we have the ability to be aware of the fact that we are copying a VMA, and also know which VMA that is, we can explicitly check for this, so do so.
This is pertinent since commit

61f67c230a
mm/vma: fix anon_vma UAF on mremap() faulted, unfaulted merge
Patch series "mm/vma: fix anon_vma UAF on mremap() faulted, unfaulted merge", v2. Commit |

590b13669b
mm/zswap: fix error pointer free in zswap_cpu_comp_prepare()
crypto_alloc_acomp_node() may return ERR_PTR(), but the fail path checks
only for NULL and can pass an error pointer to crypto_free_acomp(). Use
IS_ERR_OR_NULL() to only free valid acomp instances.
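A sketch of the corrected error path (surrounding zswap_cpu_comp_prepare() context elided):

	acomp = crypto_alloc_acomp_node(pool->tfm_name, 0, 0, cpu_to_node(cpu));
	if (IS_ERR(acomp)) {
		ret = PTR_ERR(acomp);
		goto fail;
	}
	/* ... */
	fail:
	/* acomp may hold an ERR_PTR() here; only free real instances */
	if (!IS_ERR_OR_NULL(acomp))
		crypto_free_acomp(acomp);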
Link: https://lkml.kernel.org/r/20251231074638.2564302-1-pbutsykin@cloudlinux.com
Fixes:

392b3d9d59
mm/damon/sysfs-scheme: cleanup access_pattern subdirs on scheme dir setup failure
When a DAMOS-scheme DAMON sysfs directory setup fails after setup of
access_pattern/ directory, subdirectories of access_pattern/ directory are
not cleaned up. As a result, DAMON sysfs interface is nearly broken until
the system reboots, and the memory for the unremoved directory is leaked.
Cleanup the directories under such failures.
Link: https://lkml.kernel.org/r/20251225023043.18579-5-sj@kernel.org
Fixes:

dc7e1d75fd
mm/damon/sysfs-scheme: cleanup quotas subdirs on scheme dir setup failure
When a DAMOS-scheme DAMON sysfs directory setup fails after setup of
quotas/ directory, subdirectories of quotas/ directory are not cleaned up.
As a result, DAMON sysfs interface is nearly broken until the system
reboots, and the memory for the unremoved directory is leaked.
Cleanup the directories under such failures.
Link: https://lkml.kernel.org/r/20251225023043.18579-4-sj@kernel.org
Fixes:

9814cc832b
mm/damon/sysfs: cleanup attrs subdirs on context dir setup failure
When a context DAMON sysfs directory setup fails after setup of the attrs/
directory, subdirectories of the attrs/ directory are not cleaned up. As a
result, DAMON sysfs interface is nearly broken until the system reboots,
and the memory for the unremoved directory is leaked.
Cleanup the directories under such failures.
Link: https://lkml.kernel.org/r/20251225023043.18579-3-sj@kernel.org
Fixes:

a24ca8ebb0
mm/damon/sysfs: cleanup intervals subdirs on attrs dir setup failure
Patch series "mm/damon/sysfs: free setup failures generated zombie sub-sub
dirs".
Some DAMON sysfs directory setup functions generate sub and sub-sub
directories. For example, 'monitoring_attrs/' directory setup creates
'intervals/' and 'intervals/intervals_goal/' directories under the
'monitoring_attrs/' directory. When such sub-sub directories are
successfully made but the followup setup fails, the setup function should
recursively clean up the subdirectories.
However, such setup functions only drop the reference counters of the
direct sub directories. As a result, under certain setup failures, the
sub-sub directories keep having non-zero reference counters. This means the
directories cannot be removed, lingering like zombies, and the memory for
the directories cannot be freed.
The user impact of this issue is limited for the following reasons.
When the issue happens, the zombie directories still occupy their paths.
Hence attempts to generate the directories again will fail, without
additional memory leaks. This means the upper bound of the memory leak is
limited. Nonetheless, it also implies that controlling DAMON via a feature
that requires the setup-failed sysfs files will be impossible until the
system reboots.
Also, the setup operations are quite simple, so such failures would only
rarely happen and are difficult to artificially trigger.
This patch (of 4):
When attrs/ DAMON sysfs directory setup fails after setup of the
intervals/ directory, the intervals/intervals_goal/ directory is not cleaned
up. As a result, DAMON sysfs interface is nearly broken until the system
reboots, and the memory for the unremoved directory is leaked.
Cleanup the directory under such failures.
Link: https://lkml.kernel.org/r/20251225023043.18579-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20251225023043.18579-2-sj@kernel.org
Fixes:

f9132fbc2e
mm/damon/core: remove call_control in inactive contexts
If damon_call() is executed against a DAMON context that is not running,
the function returns error while keeping the damon_call_control object
linked to the context's call_controls list. Let's suppose the object is
deallocated after the damon_call(), and yet another damon_call() is
executed against the same context. The function tries to add the new
damon_call_control object to the call_controls list, which still has the
pointer to the previous damon_call_control object, which is deallocated.
As a result, use-after-free happens.
This can actually be triggered using the DAMON sysfs interface. It is not
easily exploitable, though, since it requires sysfs write permission and
some definitely weird file writes. Please refer to the report for
more details about the issue reproduction steps.
Fix the issue by making two changes. Firstly, move the final
kdamond_call() for cancelling all existing damon_call() requests on
DAMON context termination to before the ctx->kdamond reset. This ensures
any code that sees a NULL ctx->kdamond can safely assume the context
will not access damon_call() requests anymore. Secondly, let damon_call()
clean up the damon_call_control objects that were added to an
already-terminated DAMON context, before returning the error.
Link: https://lkml.kernel.org/r/20251231012315.75835-1-sj@kernel.org
Fixes:

b020191692
mm/hugetlb: ignore hugepage kernel args if hugepages are unsupported
Skip processing hugepage kernel arguments (hugepagesz, hugepages, and
default_hugepagesz) when hugepages are not supported by the architecture.
Some architectures may need to disable hugepages based on conditions
discovered during kernel boot. The hugepages_supported() helper allows
architecture code to advertise whether hugepages are supported.
Currently, normal hugepage allocation is guarded by hugepages_supported(),
but gigantic hugepages are allocated regardless of this check. This
causes problems on powerpc for fadump (firmware-assisted dump).
In the fadump (firmware-assisted dump) scenario, a production kernel crash
causes the system to boot into a special kernel whose sole purpose is to
collect the memory dump and reboot. Features such as hugepages are not
required in this environment and should be disabled.
For example, when the fadump kernel boots with the following kernel
arguments:
default_hugepagesz=1GB hugepagesz=1GB hugepages=200
Before this patch, the kernel prints the following logs:
HugeTLB: allocating 200 of page size 1.00 GiB failed. Only allocated 58 hugepages.
HugeTLB support is disabled!
HugeTLB: huge pages not supported, ignoring associated command-line parameters
hugetlbfs: disabling because there are no supported hugepage sizes
Even though the logs state that HugeTLB support is disabled, gigantic
hugepages are still allocated. This causes the fadump kernel to run out
of memory during boot.
After this patch is applied, the kernel prints the following logs for
the same command line:
HugeTLB: hugepages unsupported, ignoring default_hugepagesz=1GB cmdline
HugeTLB: hugepages unsupported, ignoring hugepagesz=1GB cmdline
HugeTLB: hugepages unsupported, ignoring hugepages=200 cmdline
HugeTLB support is disabled!
hugetlbfs: disabling because there are no supported hugepage sizes
To fix the issue, gigantic hugepage allocation should be guarded by
hugepages_supported().
Previously, two approaches were proposed to bring gigantic hugepage
allocation under hugepages_supported():
[1] Check hugepages_supported() in the generic code before allocating
gigantic hugepages
[2] Make arch_hugetlb_valid_size() return false for all hugetlb sizes
Approach [2] has two minor issues:
1. It prints misleading logs about invalid hugepage sizes
2. The kernel still processes hugepage kernel arguments unnecessarily
To control gigantic hugepage allocation, skip processing hugepage kernel
arguments (default_hugepagesz, hugepagesz and hugepages) when
hugepages_supported() returns false.
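A minimal sketch of the guard added to the command-line parsers (illustrative; the actual parser bodies are elided):

	static int __init hugepages_setup(char *s)
	{
		if (!hugepages_supported()) {
			pr_warn("HugeTLB: hugepages unsupported, ignoring hugepages=%s cmdline\n", s);
			return 1;
		}
		/* ... existing parsing ... */
		return 1;
	}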
Note for backporting: This fix is a partial reversion of the commit
mentioned in the Fixes tag and is only valid once the change referenced by
the Depends-on tag is present. When backporting this patch, the commit
mentioned in the Depends-on tag must be included first.
Link: https://lore.kernel.org/all/20250121150419.1342794-1-sourabhjain@linux.ibm.com/ [1]
Link: https://lore.kernel.org/all/20250128043358.163372-1-sourabhjain@linux.ibm.com/ [2]
Link: https://lkml.kernel.org/r/20251224115524.1272010-1-sourabhjain@linux.ibm.com
Fixes:

b9efe36b5e
mm/page_alloc: make percpu_pagelist_high_fraction reads lock-free
When page isolation loops indefinitely during memory offline, reading
/proc/sys/vm/percpu_pagelist_high_fraction blocks on pcp_batch_high_lock,
causing hung task warnings.
Make procfs reads lock-free: since percpu_pagelist_high_fraction is a
simple integer with naturally atomic reads, only writers need to serialize
via the mutex. This prevents hung task warnings when reading the procfs
file during long-running memory offline operations.
[akpm@linux-foundation.org: add comment, per Michal]
Link: https://lkml.kernel.org/r/aS_y9AuJQFydLEXo@tiehlicka
Link: https://lkml.kernel.org/r/20251201060009.1420792-1-aboorvad@linux.ibm.com
Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
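A sketch of the sysctl-handler split the changelog implies (illustrative):

	if (!write)
		/* lock-free read: a plain int, reads are naturally atomic */
		return proc_dointvec_minmax(table, write, buffer, length, ppos);

	mutex_lock(&pcp_batch_high_lock);
	/* writer path unchanged: update and redistribute pcp->high */
	ret = proc_dointvec_minmax(table, write, buffer, length, ppos);
	mutex_unlock(&pcp_batch_high_lock);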

a38be54626
mm/damon/core: get memcg reference before access
The commit

eb3f781ab7
mm: vmalloc: fix up vrealloc_node_align() kernel-doc macro name
Sphinx reports kernel-doc warning:
WARNING: ./mm/vmalloc.c:4284 expecting prototype for vrealloc_node_align_noprof(). Prototype was for vrealloc_node_align() instead
Fix the macro name in vrealloc_node_align_noprof() kernel-doc comment.
Link: https://lkml.kernel.org/r/20251219014006.16328-5-bagasdotme@gmail.com
Fixes:

6626734dd2
mm_zone: Generalise has_managed_dma()
It would be useful to be able to check for potential DMA pages beyond just
ZONE_DMA - generalise the existing has_managed_dma() function to allow
checking other zones too.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Tested-by: Vladimir Kondratiev <vladimir.kondratiev@mobileye.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/bd002d2351074e57be1ca08f03f333debac658fb.1768230104.git.robin.murphy@arm.com

269031b15c
x86/kaslr: Recognize all ZONE_DEVICE users as physaddr consumers
Commit

d6b5a8d6f1
mm/ksm: fix pte_unmap_unlock of wrong address in break_ksm_pmd_entry
On ARM32 with HIGHMEM/HIGHPTE, break_ksm_pmd_entry() triggers a BUG during
KSM unmerging because pte_unmap_unlock() is passed a pointer that may be
beyond the mapped PTE page.
The issue occurs when the PTE iteration loop completes without finding a
KSM page. After the loop, 'ptep' has been incremented past the last PTE
entry. On ARM32 LPAE with 512 PTEs per page (512 * 8 = 4096 bytes), this
means ptep points to the next page, outside the kmap'd region.
When pte_unmap_unlock(ptep, ptl) calls kunmap_local(ptep), it unmaps the
wrong page address, leaving the original kmap slot still mapped. The next
kmap_local then finds this slot unexpectedly occupied:
WARNING: mm/highmem.c:622 kunmap_local_indexed (address mismatch)
kernel BUG at mm/highmem.c:564 __kmap_local_pfn_prot (slot not empty)
Fix this by passing start_ptep to pte_unmap_unlock(), which always points
within the originally mapped PTE page.
Reproducer: Run LTP ksm03 test on ARM32 with HIGHMEM enabled. The test
triggers KSM merging followed by unmerging (writing 0 then 2 to
/sys/kernel/mm/ksm/run), which exercises break_ksm_pmd_entry().
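A sketch of the loop shape described above (variable names illustrative):

	pte_t *start_ptep, *ptep;
	spinlock_t *ptl;

	start_ptep = pte_offset_map_lock(mm, pmd, addr, &ptl);
	if (!start_ptep)
		return 0;
	for (ptep = start_ptep; addr < end; ptep++, addr += PAGE_SIZE) {
		/* ... look for a KSM page ... */
	}
	/*
	 * ptep may now point one past the mapped PTE page; unmap exactly
	 * what was mapped.
	 */
	pte_unmap_unlock(start_ptep, ptl);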
Link: https://lkml.kernel.org/r/20251220202926.318366-1-sashal@kernel.org
Fixes:

a76a5ae2c6
mm/page_owner: fix memory leak in page_owner_stack_fops->release()
The page_owner_stack_fops->open() callback invokes seq_open_private(),
therefore its corresponding ->release() callback must call
seq_release_private(). Otherwise it will cause a memory leak of struct
stack_print_ctx.
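A sketch of the pairing fix (the open callback name here is illustrative):

	static const struct file_operations page_owner_stack_fops = {
		.open		= page_owner_stack_open,	/* uses seq_open_private() */
		.read		= seq_read,
		.llseek		= seq_lseek,
		/* was seq_release, which leaked struct stack_print_ctx */
		.release	= seq_release_private,
	};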
Link: https://lkml.kernel.org/r/20251219074232.136482-1-ranxiaokai627@163.com
Fixes:

077d925b60
mm/memremap: fix spurious large folio warning for FS-DAX
This patch addresses a warning that I discovered while working on famfs,
which is an fs-dax file system that virtually always does PMD faults (next
famfs patch series coming after the holidays).
However, XFS also does PMD faults in fs-dax mode, and it also triggers the
warning. It takes some effort to get XFS to do a PMD fault, but
instructions to reproduce it are below.
The VM_WARN_ON_ONCE(folio_test_large(folio)) check in
free_zone_device_folio() incorrectly triggers for MEMORY_DEVICE_FS_DAX
when PMD (2MB) mappings are used.
FS-DAX legitimately creates large file-backed folios when handling PMD
faults. This is a core feature of FS-DAX that provides significant
performance benefits by mapping 2MB regions directly to persistent memory.
When these mappings are unmapped, the large folios are freed through
free_zone_device_folio(), which triggers the spurious warning.
The warning was introduced by commit that added support for large zone
device private folios. However, that commit did not account for FS-DAX
file-backed folios, which have always supported large (PMD-sized)
mappings.
The check distinguishes between anonymous folios (which clear
AnonExclusive flags for each sub-page) and file-backed folios. For
file-backed folios, it assumes large folios are unexpected - but this
assumption is incorrect for FS-DAX.
The fix is to exempt MEMORY_DEVICE_FS_DAX from the large folio warning,
allowing FS-DAX to continue using PMD mappings without triggering false
warnings.
Link: https://lkml.kernel.org/r/20251219123717.39330-1-john@groves.net
Fixes:

0c75714095
mm/page_alloc: report 1 as zone_batchsize for !CONFIG_MMU
Commit

6db12d5c47
mm: memcg: fix unit conversion for K() macro in OOM log
The commit

e6dbcb7c0e
mm: fixup pfnmap memory failure handling to use pgoff
The memory failure handling implementation for PFNMAP memory with no
struct pages is faulty. The VA of the mapping is determined based on
the PFN. It should instead be based on the file mapping offset.
At the occurrence of poison, the memory_failure_pfn is triggered on the
poisoned PFN. Introduce a callback function that allows mm to translate
the PFN to the corresponding file page offset. The kernel module using
the registration API must implement the callback function and provide the
translation. The translated value is then used to determine the VA
information and to send SIGBUS to the usermode process that mapped the
poisoned PFN.
The callback is also useful for notifying the driver of the poisoned
PFN, so that the driver may track it.
Link: https://lkml.kernel.org/r/20251211070603.338701-2-ankita@nvidia.com
Fixes:

02129e623c
mm/damon/vaddr: fix missing pte_unmap_unlock in damos_va_migrate_pmd_entry()
If the PTE page table lock is acquired by pte_offset_map_lock(), the lock
must be released via pte_unmap_unlock().
However, in damos_va_migrate_pmd_entry(), if damos_va_filter_out() returns
true, it immediately returns without releasing the lock.
This fixes the issue by not stopping page table traversal when
damos_va_filter_out() returns true and ensuring that the lock is released.
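A sketch of the corrected walk step (the filter call is abbreviated to a placeholder condition):

	pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
	if (!pte)
		return 0;
	/*
	 * Previously: returned here with ptl still held when
	 * damos_va_filter_out() said to skip the folio.
	 */
	if (filtered_out)
		goto unlock;
	/* ... queue the folio for migration ... */
	unlock:
	pte_unmap_unlock(pte, ptl);
	return 0;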
Link: https://lkml.kernel.org/r/20251209151034.77221-1-akinobu.mita@gmail.com
Fixes:

7838a4eb8a
mm/page_alloc: change all pageblocks migrate type on coalescing
When a page is freed it coalesces with a buddy into a higher order page
while possible. When the buddy page migrate type differs, it is expected
to be updated to match the one of the page being freed.
However, only the first pageblock of the buddy page is updated, while the
rest of the pageblocks are left unchanged.
That causes warnings in later expand() and other code paths (like below),
since an inconsistency between migration type of the list containing the
page and the page-owned pageblocks migration types is introduced.
[ 308.986589] ------------[ cut here ]------------
[ 308.987227] page type is 0, passed migratetype is 1 (nr=256)
[ 308.987275] WARNING: CPU: 1 PID: 5224 at mm/page_alloc.c:812 expand+0x23c/0x270
[ 308.987293] Modules linked in: algif_hash(E) af_alg(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) nf_tables(E) s390_trng(E) vfio_ccw(E) mdev(E) vfio_iommu_type1(E) vfio(E) sch_fq_codel(E) drm(E) i2c_core(E) drm_panel_orientation_quirks(E) loop(E) nfnetlink(E) vsock_loopback(E) vmw_vsock_virtio_transport_common(E) vsock(E) ctcm(E) fsm(E) diag288_wdt(E) watchdog(E) zfcp(E) scsi_transport_fc(E) ghash_s390(E) prng(E) aes_s390(E) des_generic(E) des_s390(E) libdes(E) sha3_512_s390(E) sha3_256_s390(E) sha_common(E) paes_s390(E) crypto_engine(E) pkey_cca(E) pkey_ep11(E) zcrypt(E) rng_core(E) pkey_pckmo(E) pkey(E) autofs4(E)
[ 308.987439] Unloaded tainted modules: hmac_s390(E):2
[ 308.987650] CPU: 1 UID: 0 PID: 5224 Comm: mempig_verify Kdump: loaded Tainted: G E 6.18.0-gcc-bpf-debug #431 PREEMPT
[ 308.987657] Tainted: [E]=UNSIGNED_MODULE
[ 308.987661] Hardware name: IBM 3906 M04 704 (z/VM 7.3.0)
[ 308.987666] Krnl PSW : 0404f00180000000 00000349976fa600 (expand+0x240/0x270)
[ 308.987676] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 RI:0 EA:3
[ 308.987682] Krnl GPRS: 0000034980000004 0000000000000005 0000000000000030 000003499a0e6d88
[ 308.987688] 0000000000000005 0000034980000005 000002be803ac000 0000023efe6c8300
[ 308.987692] 0000000000000008 0000034998d57290 000002be00000100 0000023e00000008
[ 308.987696] 0000000000000000 0000000000000000 00000349976fa5fc 000002c99b1eb6f0
[ 308.987708] Krnl Code: 00000349976fa5f0: c020008a02f2 larl %r2,000003499883abd4
00000349976fa5f6: c0e5ffe3f4b5 brasl %r14,0000034997378f60
#00000349976fa5fc: af000000 mc 0,0
>00000349976fa600: a7f4ff4c brc 15,00000349976fa498
00000349976fa604: b9040026 lgr %r2,%r6
00000349976fa608: c0300088317f larl %r3,0000034998800906
00000349976fa60e: c0e5fffdb6e1 brasl %r14,00000349976b13d0
00000349976fa614: af000000 mc 0,0
[ 308.987734] Call Trace:
[ 308.987738] [<00000349976fa600>] expand+0x240/0x270
[ 308.987744] ([<00000349976fa5fc>] expand+0x23c/0x270)
[ 308.987749] [<00000349976ff95e>] rmqueue_bulk+0x71e/0x940
[ 308.987754] [<00000349976ffd7e>] __rmqueue_pcplist+0x1fe/0x2a0
[ 308.987759] [<0000034997700966>] rmqueue.isra.0+0xb46/0xf40
[ 308.987763] [<0000034997703ec8>] get_page_from_freelist+0x198/0x8d0
[ 308.987768] [<0000034997706fa8>] __alloc_frozen_pages_noprof+0x198/0x400
[ 308.987774] [<00000349977536f8>] alloc_pages_mpol+0xb8/0x220
[ 308.987781] [<0000034997753bf6>] folio_alloc_mpol_noprof+0x26/0xc0
[ 308.987786] [<0000034997753e4c>] vma_alloc_folio_noprof+0x6c/0xa0
[ 308.987791] [<0000034997775b22>] vma_alloc_anon_folio_pmd+0x42/0x240
[ 308.987799] [<000003499777bfea>] __do_huge_pmd_anonymous_page+0x3a/0x210
[ 308.987804] [<00000349976cb08e>] __handle_mm_fault+0x4de/0x500
[ 308.987809] [<00000349976cb14c>] handle_mm_fault+0x9c/0x3a0
[ 308.987813] [<000003499734d70e>] do_exception+0x1de/0x540
[ 308.987822] [<0000034998387390>] __do_pgm_check+0x130/0x220
[ 308.987830] [<000003499839a934>] pgm_check_handler+0x114/0x160
[ 308.987838] 3 locks held by mempig_verify/5224:
[ 308.987842] #0: 0000023ea44c1e08 (vm_lock){++++}-{0:0}, at: lock_vma_under_rcu+0xb2/0x2a0
[ 308.987859] #1: 0000023ee4d41b18 (&pcp->lock){+.+.}-{2:2}, at: rmqueue.isra.0+0xad6/0xf40
[ 308.987871] #2: 0000023efe6c8998 (&zone->lock){..-.}-{2:2}, at: rmqueue_bulk+0x5a/0x940
[ 308.987886] Last Breaking-Event-Address:
[ 308.987890] [<0000034997379096>] __warn_printk+0x136/0x140
[ 308.987897] irq event stamp: 52330356
[ 308.987901] hardirqs last enabled at (52330355): [<000003499838742e>] __do_pgm_check+0x1ce/0x220
[ 308.987907] hardirqs last disabled at (52330356): [<000003499839932e>] _raw_spin_lock_irqsave+0x9e/0xe0
[ 308.987913] softirqs last enabled at (52329882): [<0000034997383786>] handle_softirqs+0x2c6/0x530
[ 308.987922] softirqs last disabled at (52329859): [<0000034997382f86>] __irq_exit_rcu+0x126/0x140
[ 308.987929] ---[ end trace 0000000000000000 ]---
[ 308.987936] ------------[ cut here ]------------
[ 308.987940] page type is 0, passed migratetype is 1 (nr=256)
[ 308.987951] WARNING: CPU: 1 PID: 5224 at mm/page_alloc.c:860 __del_page_from_free_list+0x1be/0x1e0
[ 308.987960] Modules linked in: algif_hash(E) af_alg(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) nf_tables(E) s390_trng(E) vfio_ccw(E) mdev(E) vfio_iommu_type1(E) vfio(E) sch_fq_codel(E) drm(E) i2c_core(E) drm_panel_orientation_quirks(E) loop(E) nfnetlink(E) vsock_loopback(E) vmw_vsock_virtio_transport_common(E) vsock(E) ctcm(E) fsm(E) diag288_wdt(E) watchdog(E) zfcp(E) scsi_transport_fc(E) ghash_s390(E) prng(E) aes_s390(E) des_generic(E) des_s390(E) libdes(E) sha3_512_s390(E) sha3_256_s390(E) sha_common(E) paes_s390(E) crypto_engine(E) pkey_cca(E) pkey_ep11(E) zcrypt(E) rng_core(E) pkey_pckmo(E) pkey(E) autofs4(E)
[ 308.988070] Unloaded tainted modules: hmac_s390(E):2
[ 308.988087] CPU: 1 UID: 0 PID: 5224 Comm: mempig_verify Kdump: loaded Tainted: G W E 6.18.0-gcc-bpf-debug #431 PREEMPT
[ 308.988095] Tainted: [W]=WARN, [E]=UNSIGNED_MODULE
[ 308.988100] Hardware name: IBM 3906 M04 704 (z/VM 7.3.0)
[ 308.988105] Krnl PSW : 0404f00180000000 00000349976f9e32 (__del_page_from_free_list+0x1c2/0x1e0)
[ 308.988118] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 RI:0 EA:3
[ 308.988127] Krnl GPRS: 0000034980000004 0000000000000005 0000000000000030 000003499a0e6d88
[ 308.988133] 0000000000000005 0000034980000005 0000034998d57290 0000023efe6c8300
[ 308.988139] 0000000000000001 0000000000000008 000002be00000100 000002be803ac000
[ 308.988144] 0000000000000000 0000000000000001 00000349976f9e2e 000002c99b1eb728
[ 308.988153] Krnl Code: 00000349976f9e22: c020008a06d9 larl %r2,000003499883abd4
00000349976f9e28: c0e5ffe3f89c brasl %r14,0000034997378f60
#00000349976f9e2e: af000000 mc 0,0
>00000349976f9e32: a7f4ff4e brc 15,00000349976f9cce
00000349976f9e36: b904002b lgr %r2,%r11
00000349976f9e3a: c030008a06e7 larl %r3,000003499883ac08
00000349976f9e40: c0e5fffdbac8 brasl %r14,00000349976b13d0
00000349976f9e46: af000000 mc 0,0
[ 308.988184] Call Trace:
[ 308.988188] [<00000349976f9e32>] __del_page_from_free_list+0x1c2/0x1e0
[ 308.988195] ([<00000349976f9e2e>] __del_page_from_free_list+0x1be/0x1e0)
[ 308.988202] [<00000349976ff946>] rmqueue_bulk+0x706/0x940
[ 308.988208] [<00000349976ffd7e>] __rmqueue_pcplist+0x1fe/0x2a0
[ 308.988214] [<0000034997700966>] rmqueue.isra.0+0xb46/0xf40
[ 308.988221] [<0000034997703ec8>] get_page_from_freelist+0x198/0x8d0
[ 308.988227] [<0000034997706fa8>] __alloc_frozen_pages_noprof+0x198/0x400
[ 308.988233] [<00000349977536f8>] alloc_pages_mpol+0xb8/0x220
[ 308.988240] [<0000034997753bf6>] folio_alloc_mpol_noprof+0x26/0xc0
[ 308.988247] [<0000034997753e4c>] vma_alloc_folio_noprof+0x6c/0xa0
[ 308.988253] [<0000034997775b22>] vma_alloc_anon_folio_pmd+0x42/0x240
[ 308.988260] [<000003499777bfea>] __do_huge_pmd_anonymous_page+0x3a/0x210
[ 308.988267] [<00000349976cb08e>] __handle_mm_fault+0x4de/0x500
[ 308.988273] [<00000349976cb14c>] handle_mm_fault+0x9c/0x3a0
[ 308.988279] [<000003499734d70e>] do_exception+0x1de/0x540
[ 308.988286] [<0000034998387390>] __do_pgm_check+0x130/0x220
[ 308.988293] [<000003499839a934>] pgm_check_handler+0x114/0x160
[ 308.988300] 3 locks held by mempig_verify/5224:
[ 308.988305] #0: 0000023ea44c1e08 (vm_lock){++++}-{0:0}, at: lock_vma_under_rcu+0xb2/0x2a0
[ 308.988322] #1: 0000023ee4d41b18 (&pcp->lock){+.+.}-{2:2}, at: rmqueue.isra.0+0xad6/0xf40
[ 308.988334] #2: 0000023efe6c8998 (&zone->lock){..-.}-{2:2}, at: rmqueue_bulk+0x5a/0x940
[ 308.988346] Last Breaking-Event-Address:
[ 308.988350] [<0000034997379096>] __warn_printk+0x136/0x140
[ 308.988356] irq event stamp: 52330356
[ 308.988360] hardirqs last enabled at (52330355): [<000003499838742e>] __do_pgm_check+0x1ce/0x220
[ 308.988366] hardirqs last disabled at (52330356): [<000003499839932e>] _raw_spin_lock_irqsave+0x9e/0xe0
[ 308.988373] softirqs last enabled at (52329882): [<0000034997383786>] handle_softirqs+0x2c6/0x530
[ 308.988380] softirqs last disabled at (52329859): [<0000034997382f86>] __irq_exit_rcu+0x126/0x140
[ 308.988388] ---[ end trace 0000000000000000 ]---
Link: https://lkml.kernel.org/r/20251215081002.3353900A9c-agordeev@linux.ibm.com
Link: https://lkml.kernel.org/r/20251212151457.3898073Add-agordeev@linux.ibm.com
Fixes:

6a0e5b3338
kasan: unpoison vms[area] addresses with a common tag
A KASAN tag mismatch, possibly causing a kernel panic, can be observed on
systems with a tag-based KASAN enabled and with multiple NUMA nodes. It
was reported on arm64 and reproduced on x86. It can be explained in the
following points:
1. There can be more than one virtual memory chunk.
2. Chunk's base address has a tag.
3. The base address points at the first chunk and thus inherits
the tag of the first chunk.
4. The subsequent chunks will be accessed with the tag from the
first chunk.
5. Thus, the subsequent chunks need to have their tag set to
match that of the first chunk.
Use the new vmalloc flag that disables random tag assignment in
__kasan_unpoison_vmalloc() - pass the same random tag to all the
vm_structs by tagging the pointers before they go inside
__kasan_unpoison_vmalloc(). Assigning a common tag resolves the pcpu
chunk address mismatch.
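A hedged sketch of the described tagging (helper names are illustrative, not necessarily the mm/kasan identifiers):

	/* derive one random tag up front and apply it to every area */
	u8 tag = kasan_random_tag();	/* hypothetical helper name */
	for (area = 0; area < nr_vms; area++)
		vms[area]->addr = __kasan_unpoison_vmalloc(
				set_tag(vms[area]->addr, tag),
				vms[area]->size, flags);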
[akpm@linux-foundation.org: use WARN_ON_ONCE(), per Andrey]
Link: https://lkml.kernel.org/r/CA+fCnZeuGdKSEm11oGT6FS71_vGq1vjq-xY36kxVdFvwmag2ZQ@mail.gmail.com
[maciej.wieczor-retman@intel.com: remove unneeded pr_warn()]
Link: https://lkml.kernel.org/r/919897daaaa3c982a27762a2ee038769ad033991.1764945396.git.m.wieczorretman@pm.me
Link: https://lkml.kernel.org/r/873821114a9f722ffb5d6702b94782e902883fdf.1764874575.git.m.wieczorretman@pm.me
Fixes:

6f13db031e
kasan: refactor pcpu kasan vmalloc unpoison
A KASAN tag mismatch, possibly causing a kernel panic, can be observed
on systems with a tag-based KASAN enabled and with multiple NUMA nodes.
It was reported on arm64 and reproduced on x86. It can be explained in
the following points:
1. There can be more than one virtual memory chunk.
2. Chunk's base address has a tag.
3. The base address points at the first chunk and thus inherits
the tag of the first chunk.
4. The subsequent chunks will be accessed with the tag from the
first chunk.
5. Thus, the subsequent chunks need to have their tag set to
match that of the first chunk.
Refactor code by reusing __kasan_unpoison_vmalloc in a new helper in
preparation for the actual fix.
Link: https://lkml.kernel.org/r/eb61d93b907e262eefcaa130261a08bcb6c5ce51.1764874575.git.m.wieczorretman@pm.me
Fixes:

007f5da43b
mm/kasan: fix incorrect unpoisoning in vrealloc for KASAN
Patch series "kasan: vmalloc: Fixes for the percpu allocator and
vrealloc", v3.
Patches fix two issues related to KASAN and vmalloc.
The first one, a KASAN tag mismatch, possibly resulting in a kernel panic,
can be observed on systems with a tag-based KASAN enabled and with
multiple NUMA nodes. Initially it was only noticed on x86 [1] but later a
similar issue was also reported on arm64 [2].
Specifically the problem is related to how vm_structs interact with
pcpu_chunks - both when they are allocated, assigned and when pcpu_chunk
addresses are derived.
When vm_structs are allocated they are unpoisoned, each with a different
random tag, if vmalloc support is enabled along the KASAN mode. Later
when first pcpu chunk is allocated it gets its 'base_addr' field set to
the first allocated vm_struct. With that it inherits that vm_struct's
tag.
When pcpu_chunk addresses are later derived (by pcpu_chunk_addr(), for
example in pcpu_alloc_noprof()) the base_addr field is used and offsets
are added to it. If the initial conditions are satisfied then some of the
offsets will point into memory allocated with a different vm_struct. So
while the lower bits will get accurately derived the tag bits in the top
of the pointer won't match the shadow memory contents.
The solution (proposed at v2 of the x86 KASAN series [3]) is to unpoison
the vm_structs with the same tag when allocating them for the per cpu
allocator (in pcpu_get_vm_areas()).
The second one reported by syzkaller [4] is related to vrealloc and
happens because of random tag generation when unpoisoning memory without
allocating new pages. This breaks shadow memory tracking and needs to
reuse the existing tag instead of generating a new one. At the same time
an inconsistency in used flags is corrected.
This patch (of 3):
Syzkaller reported a memory out-of-bounds bug [4]. This patch fixes two
issues:
1. In vrealloc the KASAN_VMALLOC_VM_ALLOC flag is missing when
unpoisoning the extended region. This flag is required to correctly
associate the allocation with KASAN's vmalloc tracking.
Note: In contrast, vzalloc (via __vmalloc_node_range_noprof)
explicitly sets KASAN_VMALLOC_VM_ALLOC and calls
kasan_unpoison_vmalloc() with it. vrealloc must behave consistently --
especially when reusing existing vmalloc regions -- to ensure KASAN can
track allocations correctly.
2. When vrealloc reuses an existing vmalloc region (without allocating
new pages) KASAN generates a new tag, which breaks tag-based memory
access tracking.
Introduce KASAN_VMALLOC_KEEP_TAG, a new KASAN flag that allows reusing the
tag already attached to the pointer, ensuring consistent tag behavior
during reallocation.
Pass KASAN_VMALLOC_KEEP_TAG and KASAN_VMALLOC_VM_ALLOC to the
kasan_unpoison_vmalloc inside vrealloc_node_align_noprof().
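A sketch of the call described above (flag names taken from the changelog):

	p = kasan_unpoison_vmalloc(p, size,
			KASAN_VMALLOC_PROT_NORMAL |
			KASAN_VMALLOC_VM_ALLOC |
			KASAN_VMALLOC_KEEP_TAG);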
Link: https://lkml.kernel.org/r/cover.1765978969.git.m.wieczorretman@pm.me
Link: https://lkml.kernel.org/r/38dece0a4074c43e48150d1e242f8242c73bf1a5.1764874575.git.m.wieczorretman@pm.me
Link: https://lore.kernel.org/all/e7e04692866d02e6d3b32bb43b998e5d17092ba4.1738686764.git.maciej.wieczor-retman@intel.com/ [1]
Link: https://lore.kernel.org/all/aMUrW1Znp1GEj7St@MiWiFi-R3L-srv/ [2]
Link: https://lore.kernel.org/all/CAPAsAGxDRv_uFeMYu9TwhBVWHCCtkSxoWY4xmFB_vowMbi8raw@mail.gmail.com/ [3]
Link: https://syzkaller.appspot.com/bug?extid=997752115a851cb0cf36 [4]
Fixes:

44f9a00a44
Merge tag 'slab-for-6.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab fix from Vlastimil Babka:
 - A stable fix for a missing tag reset that can happen in kfree_nolock() with KASAN+SLUB_TINY configs (Deepanshu Kartikey)
* tag 'slab-for-6.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  mm/slub: reset KASAN tag in defer_free() before accessing freed memory

40fbbd64bb
Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull shmem rename fixes from Al Viro:
 "A couple of shmem rename fixes - recent regression from tree-in-dcache
  series and older breakage from stable directory offsets stuff"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  shmem: fix recovery on rename failures
  shmem_whiteout(): fix regression from tree-in-dcache series

e1b4c6a583
shmem: fix recovery on rename failures
maple_tree insertions can fail if we are seriously short on memory;
simple_offset_rename() does not recover well if it runs into that.
The same goes for simple_offset_rename_exchange().
Moreover, shmem_whiteout() expects that if it succeeds, the caller will
progress to d_move(), i.e. that shmem_rename2() won't fail past the
successful call of shmem_whiteout().
Not hard to fix, fortunately - mtree_store() can't fail if the index we
are trying to store into is already present in the tree as a singleton.
For simple_offset_rename_exchange() that's enough - we just need to be
careful about the order of operations.
For simple_offset_rename() the solution is to preinsert the target into the
tree for new_dir; the rest can be done without any potentially failing
operations.
That preinsertion has to be done in shmem_rename2() rather than in
simple_offset_rename() itself - otherwise we'd need to deal with the
possibility of failure after successful shmem_whiteout().
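A sketch of the preinsertion idea (error handling abbreviated; helper names may differ from mm/shmem.c):

	/*
	 * In shmem_rename2(), before any destructive step: reserve the
	 * target index in new_dir's offset tree. This is the only
	 * potentially failing operation.
	 */
	ret = simple_offset_add(shmem_get_offset_ctx(new_dir), new_dentry);
	if (ret)
		return ret;
	/*
	 * From here on, mtree_store() over an existing singleton index
	 * cannot fail, so the rename cannot fail past shmem_whiteout().
	 */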
Fixes:

3010f06c52
shmem_whiteout(): fix regression from tree-in-dcache series
Now that shmem_mknod() hashes the new dentry, d_rehash() in
shmem_whiteout() should be removed.
X-paperbag: brown
Reported-by: Hugh Dickins <hughd@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Tested-by: Hugh Dickins <hughd@google.com>
Fixes:

9d9c1cfec0
Merge tag 'mm-nonmm-stable-2025-12-11-11-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc updates from Andrew Morton:
 "There are no significant series in this small merge. Please see the
  individual changelogs for details"
[ Editor's note: it's mainly ocfs2 and a couple of random fixes ]
* tag 'mm-nonmm-stable-2025-12-11-11-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm: memfd_luo: add CONFIG_SHMEM dependency
  mm: shmem: avoid build warning for CONFIG_SHMEM=n
  ocfs2: fix memory leak in ocfs2_merge_rec_left()
  ocfs2: invalidate inode if i_mode is zero after block read
  ocfs2: avoid -Wflex-array-member-not-at-end warning
  ocfs2: convert remaining read-only checks to ocfs2_emergency_state
  ocfs2: add ocfs2_emergency_state helper and apply to setattr
  checkpatch: add uninitialized pointer with __free attribute check
  args: fix documentation to reflect the correct numbers
  ocfs2: fix kernel BUG in ocfs2_find_victim_chain
  liveupdate: luo_core: fix redundant bound check in luo_ioctl()
  ocfs2: validate inline xattr size and entry count in ocfs2_xattr_ibody_list
  fs/fat: remove unnecessary wrapper fat_max_cache()
  ocfs2: replace deprecated strcpy with strscpy
  ocfs2: check tl_used after reading it from trancate log inode
  liveupdate: luo_file: don't use invalid list iterator

2516a87153
Merge tag 'mm-stable-2025-12-11-11-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more MM updates from Andrew Morton:
 - "powerpc/pseries/cmm: two smaller fixes" (David Hildenbrand) fixes a
   couple of minor things in ppc land
 - "Improve folio split related functions" (Zi Yan) some cleanups and
   minorish fixes in the folio splitting code
* tag 'mm-stable-2025-12-11-11-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm/damon/tests/core-kunit: avoid damos_test_commit stack warning
  mm: vmscan: correct nr_requested tracing in scan_folios
  MAINTAINERS: add idr core-api doc file to XARRAY
  mm/hugetlb: fix incorrect error return from hugetlb_reserve_pages()
  mm: fix CONFIG_STACK_GROWSUP typo in mm.h
  mm/huge_memory: fix folio split stats counting
  mm/huge_memory: make min_order_for_split() always return an order
  mm/huge_memory: replace can_split_folio() with direct refcount calculation
  mm/huge_memory: change folio_split_supported() to folio_check_splittable()
  mm/sparse: fix sparse_vmemmap_init_nid_early definition without CONFIG_SPARSEMEM
  powerpc/pseries/cmm: adjust BALLOON_MIGRATE when migrating pages
  powerpc/pseries/cmm: call balloon_devinfo_init() also without CONFIG_BALLOON_COMPACTION