mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
synced 2026-05-26 11:40:24 +02:00
84060ea3e0b6294abde57b85502ccf9fa65f94de
24228 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
f717acc6e9 |
Merge tag 'fixes-2025-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock fixes from Mike Rapoport: - Mark set_high_memory() as __init to fix section mismatch - Accept memory allocated in memblock_double_array() to mitigate crash of SNP guests * tag 'fixes-2025-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock: memblock: Accept allocated memory before use in memblock_double_array() mm,mm_init: Mark set_high_memory as __init |
||
|
|
da8bf5daa5 |
memblock: Accept allocated memory before use in memblock_double_array()
When increasing the array size in memblock_double_array() and the slab
is not yet available, a call to memblock_find_in_range() is used to
reserve/allocate memory. However, the range returned may not have been
accepted, which can result in a crash when booting an SNP guest:
RIP: 0010:memcpy_orig+0x68/0x130
Code: ...
RSP: 0000:ffffffff9cc03ce8 EFLAGS: 00010006
RAX: ff11001ff83e5000 RBX: 0000000000000000 RCX: fffffffffffff000
RDX: 0000000000000bc0 RSI: ffffffff9dba8860 RDI: ff11001ff83e5c00
RBP: 0000000000002000 R08: 0000000000000000 R09: 0000000000002000
R10: 000000207fffe000 R11: 0000040000000000 R12: ffffffff9d06ef78
R13: ff11001ff83e5000 R14: ffffffff9dba7c60 R15: 0000000000000c00
memblock_double_array+0xff/0x310
memblock_add_range+0x1fb/0x2f0
memblock_reserve+0x4f/0xa0
memblock_alloc_range_nid+0xac/0x130
memblock_alloc_internal+0x53/0xc0
memblock_alloc_try_nid+0x3d/0xa0
swiotlb_init_remap+0x149/0x2f0
mem_init+0xb/0xb0
mm_core_init+0x8f/0x350
start_kernel+0x17e/0x5d0
x86_64_start_reservations+0x14/0x30
x86_64_start_kernel+0x92/0xa0
secondary_startup_64_no_verify+0x194/0x19b
Mitigate this by calling accept_memory() on the memory range returned
before the slab is available.
Prior to v6.12, the accept_memory() interface used a 'start' and 'end'
parameter instead of 'start' and 'size', therefore the accept_memory()
call must be adjusted to specify 'start + size' for 'end' when applying
to kernels prior to v6.12.
Cc: stable@vger.kernel.org # see patch description, needs adjustments for <= 6.11
Fixes:
|
||
|
|
7b08b74f3d |
mm: fix folio_pte_batch() on XEN PV
On XEN PV, folio_pte_batch() can incorrectly batch beyond the end of a
folio due to a corner case in pte_advance_pfn(). Specifically, when the
PFN following the folio maps to an invalidated MFN,
expected_pte = pte_advance_pfn(expected_pte, nr);
produces a pte_none(). If the actual next PTE in memory is also
pte_none(), the pte_same() succeeds,
if (!pte_same(pte, expected_pte))
break;
the loop is not broken, and batching continues into unrelated memory.
For example, with a 4-page folio, the PTE layout might look like this:
[ 53.465673] [ T2552] folio_pte_batch: printing PTE values at addr=0x7f1ac9dc5000
[ 53.465674] [ T2552] PTE[453] = 000000010085c125
[ 53.465679] [ T2552] PTE[454] = 000000010085d125
[ 53.465682] [ T2552] PTE[455] = 000000010085e125
[ 53.465684] [ T2552] PTE[456] = 000000010085f125
[ 53.465686] [ T2552] PTE[457] = 0000000000000000 <-- not present
[ 53.465689] [ T2552] PTE[458] = 0000000101da7125
pte_advance_pfn(PTE[456]) returns a pte_none() due to invalid PFN->MFN
mapping. The next actual PTE (PTE[457]) is also pte_none(), so the loop
continues and includes PTE[457] in the batch, resulting in 5 batched
entries for a 4-page folio. This triggers the following warning:
[ 53.465751] [ T2552] page: refcount:85 mapcount:20 mapping:ffff88813ff4f6a8 index:0x110 pfn:0x10085c
[ 53.465754] [ T2552] head: order:2 mapcount:80 entire_mapcount:0 nr_pages_mapped:4 pincount:0
[ 53.465756] [ T2552] memcg:ffff888003573000
[ 53.465758] [ T2552] aops:0xffffffff8226fd20 ino:82467c dentry name(?):"libc.so.6"
[ 53.465761] [ T2552] flags: 0x2000000000416c(referenced|uptodate|lru|active|private|head|node=0|zone=2)
[ 53.465764] [ T2552] raw: 002000000000416c ffffea0004021f08 ffffea0004021908 ffff88813ff4f6a8
[ 53.465767] [ T2552] raw: 0000000000000110 ffff888133d8bd40 0000005500000013 ffff888003573000
[ 53.465768] [ T2552] head: 002000000000416c ffffea0004021f08 ffffea0004021908 ffff88813ff4f6a8
[ 53.465770] [ T2552] head: 0000000000000110 ffff888133d8bd40 0000005500000013 ffff888003573000
[ 53.465772] [ T2552] head: 0020000000000202 ffffea0004021701 000000040000004f 00000000ffffffff
[ 53.465774] [ T2552] head: 0000000300000003 8000000300000002 0000000000000013 0000000000000004
[ 53.465775] [ T2552] page dumped because: VM_WARN_ON_FOLIO((_Generic((page + nr_pages - 1), const struct page *: (const struct folio *)_compound_head(page + nr_pages - 1), struct page *: (struct folio *)_compound_head(page + nr_pages - 1))) != folio)
Original code works as expected everywhere, except on XEN PV, where
pte_advance_pfn() can yield a pte_none() after balloon inflation due to
MFNs invalidation. In XEN, pte_advance_pfn() ends up calling
__pte()->xen_make_pte()->pte_pfn_to_mfn(), which returns pte_none() when
mfn == INVALID_P2M_ENTRY.
The pte_pfn_to_mfn() documents that nastiness:
If there's no mfn for the pfn, then just create an
empty non-present pte. Unfortunately this loses
information about the original pfn, so
pte_mfn_to_pfn is asymmetric.
While such hacks should certainly be removed, we can do better in
folio_pte_batch() and simply check ahead of time how many PTEs we can
possibly batch in our folio.
This way, we can not only fix the issue but cleanup the code: removing the
pte_pfn() check inside the loop body and avoiding end_ptr comparison +
arithmetic.
Link: https://lkml.kernel.org/r/20250502215019.822-2-arkamar@atlas.cz
Fixes:
|
||
|
|
dac2a4f663 |
mm/hugetlb: copy the CMA flag when demoting
Since commit |
||
|
|
9a9794a81a |
mm, swap: fix false warning for large allocation with !THP_SWAP
The !CONFIG_THP_SWAP check existed before just fine because slot cache
would reject high order allocation and let the caller split all folios and
try again.
But slot cache is gone, so large allocation will directly go to the
allocator, and the allocator should just fail silently to inform caller to
do the folio split, this is totally fine and expected.
Remove this meaningless warning.
Link: https://lkml.kernel.org/r/20250429094803.85518-1-ryncsn@gmail.com
Fixes:
|
||
|
|
a0309faf1c |
mm: vmalloc: support more granular vrealloc() sizing
Introduce struct vm_struct::requested_size so that the requested
(re)allocation size is retained separately from the allocated area size.
This means that KASAN will correctly poison the correct spans of requested
bytes. This also means we can support growing the usable portion of an
allocation that can already be supported by the existing area's existing
allocation.
Link: https://lkml.kernel.org/r/20250426001105.it.679-kees@kernel.org
Fixes:
|
||
|
|
be6e843fc5 |
mm/huge_memory: fix dereferencing invalid pmd migration entry
When migrating a THP, concurrent access to the PMD migration entry during
a deferred split scan can lead to an invalid address access, as
illustrated below. To prevent this invalid access, it is necessary to
check the PMD migration entry and return early. In this context, there is
no need to use pmd_to_swp_entry and pfn_swap_entry_to_page to verify the
equality of the target folio. Since the PMD migration entry is locked, it
cannot be served as the target.
Mailing list discussion and explanation from Hugh Dickins: "An anon_vma
lookup points to a location which may contain the folio of interest, but
might instead contain another folio: and weeding out those other folios is
precisely what the "folio != pmd_folio((*pmd)" check (and the "risk of
replacing the wrong folio" comment a few lines above it) is for."
BUG: unable to handle page fault for address: ffffea60001db008
CPU: 0 UID: 0 PID: 2199114 Comm: tee Not tainted 6.14.0+ #4 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:split_huge_pmd_locked+0x3b5/0x2b60
Call Trace:
<TASK>
try_to_migrate_one+0x28c/0x3730
rmap_walk_anon+0x4f6/0x770
unmap_folio+0x196/0x1f0
split_huge_page_to_list_to_order+0x9f6/0x1560
deferred_split_scan+0xac5/0x12a0
shrinker_debugfs_scan_write+0x376/0x470
full_proxy_write+0x15c/0x220
vfs_write+0x2fc/0xcb0
ksys_write+0x146/0x250
do_syscall_64+0x6a/0x120
entry_SYSCALL_64_after_hwframe+0x76/0x7e
The bug is found by syzkaller on an internal kernel, then confirmed on
upstream.
Link: https://lkml.kernel.org/r/20250421113536.3682201-1-gavinguo@igalia.com
Link: https://lore.kernel.org/all/20250414072737.1698513-1-gavinguo@igalia.com/
Link: https://lore.kernel.org/all/20250418085802.2973519-1-gavinguo@igalia.com/
Fixes:
|
||
|
|
42e31f0daf |
mm,mm_init: Mark set_high_memory as __init
set_high_memory() touches arch_zone_lowest_possible_pfn which is marked as __initdata, which creates a section mismatch. Since the only user of the function is free_area_init() which is also marked as __init, mark set_high_memory() as __init as well. Signed-off-by: Oscar Salvador <osalvador@suse.de> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202505060901.Qcs06UoB-lkp@intel.com/ Link: https://lore.kernel.org/r/20250506111012.108743-1-osalvador@suse.de Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> |
||
|
|
9910affec3 |
Merge tag 'slab-for-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab fix from Vlastimil Babka: - Stable fix to avoid bugs due to leftover obj_ext after allocation profiling is disabled at runtime (Zhenhua Huang) * tag 'slab-for-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: mm, slab: clean up slab->obj_exts always |
||
|
|
4b5256f990 |
Merge tag 'fixes-2025-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock fixes from Mike Rapoport:
"Fixes for nid setting in memmap_init_reserved_pages():
- pass 'size' rather than 'end' to memblock_set_node() as that
function expects
- fix a corner case when memblock.reserved is doubled at
memmap_init_reserved_pages() and the newly reserved block
won't have nid assigned"
* tag 'fixes-2025-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
memblock tests: add test for memblock_set_node
mm/memblock: repeat setting reserved region nid if array is doubled
mm/memblock: pass size instead of end to memblock_set_node()
|
||
|
|
be8250786c |
mm, slab: clean up slab->obj_exts always
When memory allocation profiling is disabled at runtime or due to an
error, shutdown_mem_profiling() is called: slab->obj_exts which
previously allocated remains.
It won't be cleared by unaccount_slab() because of
mem_alloc_profiling_enabled() not true. It's incorrect, slab->obj_exts
should always be cleaned up in unaccount_slab() to avoid following error:
[...]BUG: Bad page state in process...
..
[...]page dumped because: page still charged to cgroup
[andriy.shevchenko@linux.intel.com: fold need_slab_obj_ext() into its only user]
Fixes:
|
||
|
|
2d900efff9 |
mm/migrate: fix sleep in atomic for large folios and buffer heads
The large folio + buffer head noref migration scenarios are being naughty and blocking while holding a spinlock. As a consequence of the pagecache lookup path taking the folio lock this serializes against migration paths, so they can wait for each other. For the private_lock atomic case, a new BH_Migrate flag is introduced which enables the lookup to bail. This allows the critical region of the private_lock on the migration path to be reduced to the way it was before |
||
|
|
6fea5fabd3 |
Merge tag 'mm-hotfixes-stable-2025-04-19-21-24' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc hotfixes from Andrew Morton: "16 hotfixes. 2 are cc:stable and the remainder address post-6.14 issues or aren't considered necessary for -stable kernels. All patches are basically for MM although five are alterations to MAINTAINERS" [ Basic counting skills are clearly not a strictly necessary requirement for kernel maintainers. - Linus ] * tag 'mm-hotfixes-stable-2025-04-19-21-24' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: MAINTAINERS: add section for locking of mm's and VMAs mm: vmscan: fix kswapd exit condition in defrag_mode mm: vmscan: restore high-cpu watermark safety in kswapd MAINTAINERS: add Pedro as reviewer to the MEMORY MAPPING section mm/memory: move sanity checks in do_wp_page() after mapcount vs. refcount stabilization mm, hugetlb: increment the number of pages to be reset on HVO writeback: fix false warning in inode_to_wb() docs: ABI: replace mcroce@microsoft.com with new Meta address mm/gup: fix wrongly calculated returned value in fault_in_safe_writeable() MAINTAINERS: add memory advice section MAINTAINERS: add mmap trace events to MEMORY MAPPING mm: memcontrol: fix swap counter leak from offline cgroup MAINTAINERS: add MM subsection for the page allocator MAINTAINERS: update SLAB ALLOCATOR maintainers fs/dax: fix folio splitting issue by resetting old folio order + _nr_pages mm/page_alloc: fix deadlock on cpu_hotplug_lock in __accept_page() |
||
|
|
3bf8a4598f |
Merge tag 'hardening-v6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening fixes from Kees Cook: - lib/prime_numbers: KUnit test should not select PRIME_NUMBERS (Geert Uytterhoeven) - ubsan: Fix panic from test_ubsan_out_of_bounds (Mostafa Saleh) - ubsan: Remove 'default UBSAN' from UBSAN_INTEGER_WRAP (Nathan Chancellor) - string: Add load_unaligned_zeropad() code path to sized_strscpy() (Peter Collingbourne) - kasan: Add strscpy() test to trigger tag fault on arm64 (Vincenzo Frascino) - Disable GCC randstruct for COMPILE_TEST * tag 'hardening-v6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: lib/prime_numbers: KUnit test should not select PRIME_NUMBERS ubsan: Fix panic from test_ubsan_out_of_bounds lib/Kconfig.ubsan: Remove 'default UBSAN' from UBSAN_INTEGER_WRAP hardening: Disable GCC randstruct for COMPILE_TEST kasan: Add strscpy() test to trigger tag fault on arm64 string: Add load_unaligned_zeropad() code path to sized_strscpy() |
||
|
|
a1f0220f33 |
mm: vmscan: fix kswapd exit condition in defrag_mode
Vlastimil points out an issue with kswapd in defrag_mode not waking up
kcompactd reliably.
Background: When kswapd is woken for any higher-order request, it
initially checks those high-order watermarks to decide if work is
necesary. However, it cannot (efficiently) meet the contiguity goal of
such a request by itself. So once it has reclaimed a compaction gap, it
adjusts the request down to check for free order-0 pages, then wakes
kcompactd to coalesce them into larger blocks.
In defrag_mode, the initial watermark check needs to be analogously
against free pageblocks. However, once kswapd drops the high-order to
hand off contiguity work, it also needs to fall back to base page
watermarks - otherwise it'll keep reclaiming until blocks are freed.
While it appears kcompactd is woken up frequently enough to do most of the
compaction work, kswapd ends up overreclaiming by quite a bit:
DEFRAGMODE DEFRAGMODE-thispatch
Hugealloc Time mean 79381.34 ( +0.00%) 88126.12 ( +11.02%)
Hugealloc Time stddev 85852.16 ( +0.00%) 135366.75 ( +57.67%)
Kbuild Real time 249.35 ( +0.00%) 226.71 ( -9.04%)
Kbuild User time 1249.16 ( +0.00%) 1249.37 ( +0.02%)
Kbuild System time 171.76 ( +0.00%) 166.93 ( -2.79%)
THP fault alloc 51666.87 ( +0.00%) 52685.60 ( +1.97%)
THP fault fallback 16970.00 ( +0.00%) 15951.87 ( -6.00%)
Direct compact fail 166.53 ( +0.00%) 178.93 ( +7.40%)
Direct compact success 17.13 ( +0.00%) 4.13 ( -71.69%)
Compact daemon scanned migrate 3095413.33 ( +0.00%) 9231239.53 ( +198.22%)
Compact daemon scanned free 2155966.53 ( +0.00%) 7053692.87 ( +227.17%)
Compact direct scanned migrate 265642.47 ( +0.00%) 68388.33 ( -74.26%)
Compact direct scanned free 130252.60 ( +0.00%) 55634.87 ( -57.29%)
Compact total migrate scanned 3361055.80 ( +0.00%) 9299627.87 ( +176.69%)
Compact total free scanned 2286219.13 ( +0.00%) 7109327.73 ( +210.96%)
Alloc stall 1890.80 ( +0.00%) 6297.60 ( +232.94%)
Pages kswapd scanned 9043558.80 ( +0.00%) 5952576.73 ( -34.18%)
Pages kswapd reclaimed 1891708.67 ( +0.00%) 1030645.00 ( -45.52%)
Pages direct scanned 1017090.60 ( +0.00%) 2688047.60 ( +164.29%)
Pages direct reclaimed 92682.60 ( +0.00%) 309770.53 ( +234.22%)
Pages total scanned 10060649.40 ( +0.00%) 8640624.33 ( -14.11%)
Pages total reclaimed 1984391.27 ( +0.00%) 1340415.53 ( -32.45%)
Swap out 884585.73 ( +0.00%) 417781.93 ( -52.77%)
Swap in 287106.27 ( +0.00%) 95589.73 ( -66.71%)
File refaults 551697.60 ( +0.00%) 426474.80 ( -22.70%)
Link: https://lkml.kernel.org/r/20250416135142.778933-3-hannes@cmpxchg.org
Fixes:
|
||
|
|
3844818145 |
mm: vmscan: restore high-cpu watermark safety in kswapd
Vlastimil points out that commit |
||
|
|
8bdea2fce9 |
mm/memory: move sanity checks in do_wp_page() after mapcount vs. refcount stabilization
In __folio_remove_rmap() for RMAP_LEVEL_PMD/RMAP_LEVEL_PUD and with
CONFIG_PAGE_MAPCOUNT we first decrement the folio mapcount (and recompute
mapped shared vs. mapped exclusively) to then adjust the entire mapcount.
This means that another process might stumble in do_wp_page() over a
PTE-mapped PMD folio that is indicated as "exclusively mapped", but still
has an entire mapcount (PMD mapping), because it is racing with the
process that is unmapping the folio (PMD mapping). Note that do_wp_page()
will back off once it detects the remaining folio reference from the
process that is in the process of unmapping the folio.
This will trigger the early VM_WARN_ON_ONCE(folio_entire_mapcount(folio))
check in do_wp_page(), that can easily be reproduced by looping a couple
of times over allocating a PMD THP, forking a child where we immediately
unmap it again, and writing in the parent concurrently to the THP.
[ 252.738129][T16470] ------------[ cut here ]------------
[ 252.739267][T16470] WARNING: CPU: 3 PID: 16470 at mm/memory.c:3738 do_wp_page+0x2a75/0x2c00
[ 252.740968][T16470] Modules linked in:
[ 252.741958][T16470] CPU: 3 UID: 0 PID: 16470 Comm: ...
...
[ 252.765841][T16470] <TASK>
[ 252.766419][T16470] ? srso_alias_return_thunk+0x5/0xfbef5
[ 252.767558][T16470] ? rcu_is_watching+0x12/0x60
[ 252.768525][T16470] ? srso_alias_return_thunk+0x5/0xfbef5
[ 252.769645][T16470] ? srso_alias_return_thunk+0x5/0xfbef5
[ 252.770778][T16470] ? lock_acquire+0x33/0x80
[ 252.771697][T16470] ? __handle_mm_fault+0x5e8/0x3e40
[ 252.772735][T16470] ? __handle_mm_fault+0x5e8/0x3e40
[ 252.773781][T16470] __handle_mm_fault+0x1869/0x3e40
[ 252.774839][T16470] handle_mm_fault+0x22a/0x640
[ 252.775808][T16470] do_user_addr_fault+0x618/0x1000
[ 252.776847][T16470] exc_page_fault+0x68/0xd0
[ 252.777775][T16470] asm_exc_page_fault+0x26/0x30
While we could adjust the sequence in __folio_remove_rmap(), let's rater
move the mapcount sanity checks after the mapcount vs. refcount
stabilization phase. With this fix, a simple reproducer is happy.
While at it, convert the two VM_WARN_ON_ONCE() we are moving to
VM_WARN_ON_ONCE_FOLIO().
Link: https://lkml.kernel.org/r/20250415095007.569836-1-david@redhat.com
Fixes:
|
||
|
|
274fe92de2 |
mm, hugetlb: increment the number of pages to be reset on HVO
commit |
||
|
|
8c03ebd7cd |
mm/gup: fix wrongly calculated returned value in fault_in_safe_writeable()
Not like fault_in_readable() or fault_in_writeable(), in
fault_in_safe_writeable() local variable 'start' is increased page by page
to loop till the whole address range is handled. However, it mistakenly
calculates the size of the handled range with 'uaddr - start'.
Fix it here.
Andreas said:
: In gfs2, fault_in_iov_iter_writeable() is used in
: gfs2_file_direct_read() and gfs2_file_read_iter(), so this potentially
: affects buffered as well as direct reads. This bug could cause those
: gfs2 functions to spin in a loop.
Link: https://lkml.kernel.org/r/20250410035717.473207-1-bhe@redhat.com
Link: https://lkml.kernel.org/r/20250410035717.473207-2-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Fixes:
|
||
|
|
6b956934ad |
mm: memcontrol: fix swap counter leak from offline cgroup
commit |
||
|
|
4067196a52 |
mm/page_alloc: fix deadlock on cpu_hotplug_lock in __accept_page()
When the last page in the zone is accepted, __accept_page() calls
static_branch_dec(). This function takes cpu_hotplug_lock, which can lead
to a deadlock if the allocation occurs during CPU bringup path as
_cpu_up() also takes the lock.
To prevent this deadlock, defer static_branch_dec() to a workqueue.
Call static_branch_dec() only when the workqueue is not yet initialized.
Workqueues are initialized before CPU bring up, so this will not conflict
with the first scenario.
Link: https://lkml.kernel.org/r/20250329171030.3942298-1-kirill.shutemov@linux.intel.com
Fixes:
|
||
|
|
a54f4a97e3 |
Merge tag 'slab-for-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab fix from Vlastimil Babka: - Stable fix adding zero initialization of slab->obj_ext to prevent crashes with allocation profiling (Suren Baghdasaryan) * tag 'slab-for-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: slab: ensure slab->obj_exts is clear in a newly allocated slab page |
||
|
|
62d32440ac |
kasan: Add strscpy() test to trigger tag fault on arm64
When we invoke strscpy() with a maximum size of N bytes, it assumes that: - It can always read N bytes from the source. - It always write N bytes (zero-padded) to the destination. On aarch64 with Memory Tagging Extension enabled if we pass an N that is bigger then the source buffer, it would previously trigger an MTE fault. Implement a KASAN KUnit test that triggers the issue with the previous implementation of read_word_at_a_time() on aarch64 with MTE enabled. Cc: Will Deacon <will@kernel.org> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Co-developed-by: Peter Collingbourne <pcc@google.com> Signed-off-by: Peter Collingbourne <pcc@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Link: https://linux-review.googlesource.com/id/If88e396b9e7c058c1a4b5a252274120e77b1898a Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250403000703.2584581-3-pcc@google.com Signed-off-by: Kees Cook <kees@kernel.org> |
||
|
|
d2f5819b6e |
slab: ensure slab->obj_exts is clear in a newly allocated slab page
ktest recently reported crashes while running several buffered io tests
with __alloc_tagging_slab_alloc_hook() at the top of the crash call stack.
The signature indicates an invalid address dereference with low bits of
slab->obj_exts being set. The bits were outside of the range used by
page_memcg_data_flags and objext_flags and hence were not masked out
by slab_obj_exts() when obtaining the pointer stored in slab->obj_exts.
The typical crash log looks like this:
00510 Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
00510 Mem abort info:
00510 ESR = 0x0000000096000045
00510 EC = 0x25: DABT (current EL), IL = 32 bits
00510 SET = 0, FnV = 0
00510 EA = 0, S1PTW = 0
00510 FSC = 0x05: level 1 translation fault
00510 Data abort info:
00510 ISV = 0, ISS = 0x00000045, ISS2 = 0x00000000
00510 CM = 0, WnR = 1, TnD = 0, TagAccess = 0
00510 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
00510 user pgtable: 4k pages, 39-bit VAs, pgdp=0000000104175000
00510 [0000000000000010] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
00510 Internal error: Oops: 0000000096000045 [#1] SMP
00510 Modules linked in:
00510 CPU: 10 UID: 0 PID: 7692 Comm: cat Not tainted 6.15.0-rc1-ktest-g189e17946605 #19327 NONE
00510 Hardware name: linux,dummy-virt (DT)
00510 pstate: 20001005 (nzCv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
00510 pc : __alloc_tagging_slab_alloc_hook+0xe0/0x190
00510 lr : __kmalloc_noprof+0x150/0x310
00510 sp : ffffff80c87df6c0
00510 x29: ffffff80c87df6c0 x28: 000000000013d1ff x27: 000000000013d200
00510 x26: ffffff80c87df9e0 x25: 0000000000000000 x24: 0000000000000001
00510 x23: ffffffc08041953c x22: 000000000000004c x21: ffffff80c0002180
00510 x20: fffffffec3120840 x19: ffffff80c4821000 x18: 0000000000000000
00510 x17: fffffffec3d02f00 x16: fffffffec3d02e00 x15: fffffffec3d00700
00510 x14: fffffffec3d00600 x13: 0000000000000200 x12: 0000000000000006
00510 x11: ffffffc080bb86c0 x10: 0000000000000000 x9 : ffffffc080201e58
00510 x8 : ffffff80c4821060 x7 : 0000000000000000 x6 : 0000000055555556
00510 x5 : 0000000000000001 x4 : 0000000000000010 x3 : 0000000000000060
00510 x2 : 0000000000000000 x1 : ffffffc080f50cf8 x0 : ffffff80d801d000
00510 Call trace:
00510 __alloc_tagging_slab_alloc_hook+0xe0/0x190 (P)
00510 __kmalloc_noprof+0x150/0x310
00510 __bch2_folio_create+0x5c/0xf8
00510 bch2_folio_create+0x2c/0x40
00510 bch2_readahead+0xc0/0x460
00510 read_pages+0x7c/0x230
00510 page_cache_ra_order+0x244/0x3a8
00510 page_cache_async_ra+0x124/0x170
00510 filemap_readahead.isra.0+0x58/0xa0
00510 filemap_get_pages+0x454/0x7b0
00510 filemap_read+0xdc/0x418
00510 bch2_read_iter+0x100/0x1b0
00510 vfs_read+0x214/0x300
00510 ksys_read+0x6c/0x108
00510 __arm64_sys_read+0x20/0x30
00510 invoke_syscall.constprop.0+0x54/0xe8
00510 do_el0_svc+0x44/0xc8
00510 el0_svc+0x18/0x58
00510 el0t_64_sync_handler+0x104/0x130
00510 el0t_64_sync+0x154/0x158
00510 Code: d5384100 f9401c01 b9401aa3 b40002e1 (f8227881)
00510 ---[ end trace 0000000000000000 ]---
00510 Kernel panic - not syncing: Oops: Fatal exception
00510 SMP: stopping secondary CPUs
00510 Kernel Offset: disabled
00510 CPU features: 0x0000,000000e0,00000410,8240500b
00510 Memory Limit: none
Investigation indicates that these bits are already set when we allocate
slab page and are not zeroed out after allocation. We are not yet sure
why these crashes start happening only recently but regardless of the
reason, not initializing a field that gets used later is wrong. Fix it
by initializing slab->obj_exts during slab page allocation.
Fixes:
|
||
|
|
a995199384 |
mm: fix apply_to_existing_page_range()
In the case of apply_to_existing_page_range(), apply_to_pte_range() is
reached with 'create' set to false. When !create, the loop over the PTE
page table is broken.
apply_to_pte_range() will only move to the next PTE entry if 'create' is
true or if the current entry is not pte_none().
This means that the user of apply_to_existing_page_range() will not have
'fn' called for any entries after the first pte_none() in the PTE page
table.
Fix the loop logic in apply_to_pte_range().
There are no known runtime issues from this, but the fix is trivial enough
for stable@ even without a known buggy user.
Link: https://lkml.kernel.org/r/20250409094043.1629234-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Fixes:
|
||
|
|
8c56c5dbcf |
mm: (un)track_pfn_copy() fix + doc improvements
We got a late smatch warning and some additional review feedback.
smatch warnings:
mm/memory.c:1428 copy_page_range() error: uninitialized symbol 'pfn'.
We actually use the pfn only when it is properly initialized; however, we
may pass an uninitialized value to a function -- although it will not use
it that likely still is UB in C.
So let's just fix it by always initializing pfn in the caller of
track_pfn_copy(), and improving the documentation of track_pfn_copy().
While at it, clarify the doc of untrack_pfn_copy(), that internal checks
make sure if we actually have to untrack anything.
Link: https://lkml.kernel.org/r/20250408085950.976103-1-david@redhat.com
Fixes:
|
||
|
|
8ab1b16023 |
mm: fix filemap_get_folios_contig returning batches of identical folios
filemap_get_folios_contig() is supposed to return distinct folios found
within [start, end]. Large folios in the Xarray become multi-index
entries. xas_next() can iterate through the sub-indexes before finding a
sibling entry and breaking out of the loop.
This can result in a returned folio_batch containing an indeterminate
number of duplicate folios, which forces the callers to skeptically handle
the returned batch. This is inefficient and incurs a large maintenance
overhead.
We can fix this by calling xas_advance() after we have successfully adding
a folio to the batch to ensure our Xarray is positioned such that it will
correctly find the next folio - similar to filemap_get_read_batch().
Link: https://lkml.kernel.org/r/Z-8s1-kiIDkzgRbc@fedora
Fixes:
|
||
|
|
9e2bd67773 |
mm/hugetlb: add a line break at the end of the format string
Missing line break at the end of the format string. Link: https://lkml.kernel.org/r/20250407103017.2979821-1-18810879172@163.com Signed-off-by: wangxuewen <wangxuewen@kylinos.cn> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
aabf58bfaa |
mm/hugetlb: fix set_max_huge_pages() when there are surplus pages
In set_max_huge_pages(), min_count is computed taking into account surplus
huge pages, which might lead in some cases to not be able to free huge
pages and end up accounting them as surplus instead.
One way to solve it is to subtract surplus_huge_pages directly, but we
cannot do it blindly because there might be surplus pages that are also
free pages, which might happen when we fail to restore the vmemmap for
optimized hvo pages. So we could be subtracting the same page twice.
In order to work this around, let us first compute the number of free
persistent pages, and use that along with surplus pages to compute
min_count.
Steps to reproduce:
1) create 5 hugetlb folios in Node0
2) run a program to use all the hugetlb folios
3) echo 0 > nr_hugepages for Node0 to free the hugetlb folios. Thus
the 5 hugetlb folios in Node0 are accounted as surplus.
4) create 5 hugetlb folios in Node1
5) echo 0 > nr_hugepages for Node1 to free the hugetlb folios
The result:
Node0 Node1
Total 5 5
Free 0 5
Surp 5 5
The result with this patch:
Node0 Node1
Total 5 0
Free 0 0
Surp 5 0
Link: https://lkml.kernel.org/r/20250409055957.3774471-1-tujinjiang@huawei.com
Link: https://lkml.kernel.org/r/20250407124706.2688092-1-tujinjiang@huawei.com
Fixes:
|
||
|
|
60580e0bd5 |
mm/cma: report base address of single range correctly
The cma_declare_contiguous_nid code was refactored by commit |
||
|
|
90abee6d78 |
mm: page_alloc: speed up fallbacks in rmqueue_bulk()
The test robot identified |
||
|
|
e2ffee91c4 |
mm/kasan: add module decription
Modules without a description now cause a warning:
WARNING: modpost: missing MODULE_DESCRIPTION() in mm/kasan/kasan_test.o
[akpm@linux-foundation.org: update description text, per Andrey]
Link: https://lkml.kernel.org/r/20250324173242.1501003-9-arnd@kernel.org
Fixes:
|
||
|
|
41e6ddcaa0 |
mm/vma: add give_up_on_oom option on modify/merge, use in uffd release
Currently, if a VMA merge fails due to an OOM condition arising on commit
merge or a failure to duplicate anon_vma's, we report this so the caller
can handle it.
However there are cases where the caller is only ostensibly trying a
merge, and doesn't mind if it fails due to this condition.
Since we do not want to introduce an implicit assumption that we only
actually modify VMAs after OOM conditions might arise, add a 'give up on
oom' option and make an explicit contract that, should this flag be set, we
absolutely will not modify any VMAs should OOM arise and just bail out.
Since it'd be very unusual for a user to try to vma_modify() with this flag
set but be specifying a range within a VMA which ends up being split (which
can fail due to rlimit issues, not only OOM), we add a debug warning for
this condition.
The motivating reason for this is uffd release - syzkaller (and Pedro
Falcato's VERY astute analysis) found a way in which an injected fault on
allocation, triggering an OOM condition on commit merge, would result in
uffd code becoming confused and treating an error value as if it were a VMA
pointer.
To avoid this, we make use of this new VMG flag to ensure that this never
occurs, utilising the fact that, should we be clearing entire VMAs, we do
not wish an OOM event to be reported to us.
Many thanks to Pedro Falcato for his excellent analysis and Jann Horn for
his insightful and intelligent analysis of the situation, both of whom were
instrumental in this fix.
Link: https://lkml.kernel.org/r/20250321100937.46634-1-lorenzo.stoakes@oracle.com
Reported-by: syzbot+20ed41006cf9d842c2b5@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/67dc67f0.050a0220.25ae54.001e.GAE@google.com/
Fixes:
|
||
|
|
382360d289 |
mm/hugetlb: fix nid mismatch in alloc_surplus_hugetlb_folio()
It's wrong to use nid directly since the nid may be changed in allocation.
Use folio_nid() to obtain the nid of folio instead.
Fix:
|
||
|
|
c5bb27e2da |
mm/page_alloc: avoid second trylock of zone->lock
spin_trylock followed by spin_lock will cause extra write cache access.
If the lock is contended it may cause unnecessary cache line bouncing and
will execute redundant irq restore/save pair. Therefore, check
alloc/fpi_flags first and use spin_trylock or spin_lock.
Link: https://lkml.kernel.org/r/20250331002809.94758-1-alexei.starovoitov@gmail.com
Fixes:
|
||
|
|
a84edd52f0 |
mm/compaction: fix bug in hugetlb handling pathway
The compaction code doesn't take references on pages until we're certain we should attempt to handle it. In the hugetlb case, isolate_or_dissolve_huge_page() may return -EBUSY without taking a reference to the folio associated with our pfn. If our folio's refcount drops to 0, compound_nr() becomes unpredictable, making low_pfn and nr_scanned unreliable. The user-visible effect is minimal - this should rarely happen (if ever). Fix this by storing the folio statistics earlier on the stack (just like the THP and Buddy cases). Also revert commit |
||
|
|
51339d99c0 |
locking/local_lock, mm: replace localtry_ helpers with local_trylock_t type
Partially revert commit |
||
|
|
eac8ea8736 |
mm/memblock: repeat setting reserved region nid if array is doubled
Commit |
||
|
|
06eaa824fd |
mm/memblock: pass size instead of end to memblock_set_node()
The second parameter of memblock_set_node() is size instead of end.
Since it iterates from lower address to higher address, finally the node
id is correct. But during the process, some of them are wrong.
Pass size instead of end.
Fixes:
|
||
|
|
6f110a5e4f |
Disable SLUB_TINY for build testing
... and don't error out so hard on missing module descriptions. Before commit |
||
|
|
8fa7292fee |
treewide: Switch/rename to timer_delete[_sync]()
timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree over and remove the historical wrapper inlines. Conversion was done with coccinelle plus manual fixups where necessary. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
|
|
8c7c1b5506 |
Merge tag 'mm-stable-2025-04-02-22-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more MM updates from Andrew Morton: - The series "mm: fixes for fallouts from mem_init() cleanup" from Mike Rapoport fixes a couple of issues with the just-merged "arch, mm: reduce code duplication in mem_init()" series - The series "MAINTAINERS: add my isub-entries to MM part." from Mike Rapoport does some maintenance on MAINTAINERS - The series "remove tlb_remove_page_ptdesc()" from Qi Zheng does some cleanup work to the page mapping code - The series "mseal system mappings" from Jeff Xu permits sealing of "system mappings", such as vdso, vvar, vvar_vclock, vectors (arm compat-mode), sigpage (arm compat-mode) - Plus the usual shower of singleton patches * tag 'mm-stable-2025-04-02-22-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (31 commits) mseal sysmap: add arch-support txt mseal sysmap: enable s390 selftest: test system mappings are sealed mseal sysmap: update mseal.rst mseal sysmap: uprobe mapping mseal sysmap: enable arm64 mseal sysmap: enable x86-64 mseal sysmap: generic vdso vvar mapping selftests: x86: test_mremap_vdso: skip if vdso is msealed mseal sysmap: kernel config and header change mm: pgtable: remove tlb_remove_page_ptdesc() x86: pgtable: convert to use tlb_remove_ptdesc() riscv: pgtable: unconditionally use tlb_remove_ptdesc() mm: pgtable: convert some architectures to use tlb_remove_ptdesc() mm: pgtable: change pt parameter of tlb_remove_ptdesc() to struct ptdesc* mm: pgtable: make generic tlb_remove_table() use struct ptdesc microblaze/mm: put mm_cmdline_setup() in .init.text section mm/memory_hotplug: fix call folio_test_large with tail page in do_migrate_range MAINTAINERS: mm: add entry for secretmem MAINTAINERS: mm: add entry for numa memblocks and numa emulation ... |
||
|
|
204e9a18f1 |
Merge tag 'mm-hotfixes-stable-2025-04-02-21-57' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM hotfixes from Andrew Morton: "Five hotfixes. Three are cc:stable and the remainder address post-6.14 issues or aren't considered necessary for -stable kernels. All patches are for MM" * tag 'mm-hotfixes-stable-2025-04-02-21-57' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm: zswap: fix crypto_free_acomp() deadlock in zswap_cpu_comp_dead() mm/hugetlb: move hugetlb_sysctl_init() to the __init section mm: page_isolation: avoid calling folio_hstate() without hugetlb_lock mm/hugetlb_vmemmap: fix memory loads ordering mm/userfaultfd: fix release hang over concurrent GUP |
||
|
|
2985dae1e5 |
mm/page_alloc: Fix try_alloc_pages
Fix an obvious bug. try_alloc_pages() should set_page_refcounted. [ Not so obvious: it was probably correct at the time it was written but was at some point then rebased on top of v6.14-rc1. And at that point there was a semantic conflict with commit |
||
|
|
3491aa0478 |
Merge tag 'vfio-v6.15-rc1' of https://github.com/awilliam/linux-vfio
Pull VFIO updates from Alex Williamson: - Relax IGD support code to match display class device rather than specifically requiring a VGA device (Tomita Moeko) - Accelerate DMA mapping of device MMIO by iterating at PMD and PUD levels to take advantage of huge pfnmap support added in v6.12 (Alex Williamson) - Extend virtio vfio-pci variant driver to include migration support for block devices where enabled by the PF (Yishai Hadas) - Virtualize INTx PIN register for devices where the platform does not route legacy PCI interrupts for the device and the interrupt is reported as IRQ_NOTCONNECTED (Alex Williamson) * tag 'vfio-v6.15-rc1' of https://github.com/awilliam/linux-vfio: vfio/pci: Handle INTx IRQ_NOTCONNECTED vfio/virtio: Enable support for virtio-block live migration vfio/type1: Use mapping page mask for pfnmaps mm: Provide address mask in struct follow_pfnmap_args vfio/type1: Use consistent types for page counts vfio/type1: Use vfio_batch for vaddr_get_pfns() vfio/type1: Convert all vaddr_get_pfns() callers to use vfio_batch vfio/type1: Catch zero from pin_user_pages_remote() vfio/pci: match IGD devices in display controller class |
||
|
|
9342bc134a |
mm/memory_hotplug: fix call folio_test_large with tail page in do_migrate_range
We triggered the below BUG: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x2 pfn:0x240402 head: order:9 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 flags: 0x1ffffe0000000040(head|node=1|zone=3|lastcpupid=0x1ffff) page_type: f4(hugetlb) page dumped because: VM_BUG_ON_PAGE(page->compound_head & 1) ------------[ cut here ]------------ kernel BUG at ./include/linux/page-flags.h:310! Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP Modules linked in: CPU: 7 UID: 0 PID: 166 Comm: sh Not tainted 6.14.0-rc7-dirty #374 Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : const_folio_flags+0x3c/0x58 lr : const_folio_flags+0x3c/0x58 Call trace: const_folio_flags+0x3c/0x58 (P) do_migrate_range+0x164/0x720 offline_pages+0x63c/0x6fc memory_subsys_offline+0x190/0x1f4 device_offline+0xc0/0x13c state_store+0x90/0xd8 dev_attr_store+0x18/0x2c sysfs_kf_write+0x44/0x54 kernfs_fop_write_iter+0x120/0x1cc vfs_write+0x240/0x378 ksys_write+0x70/0x108 __arm64_sys_write+0x1c/0x28 invoke_syscall+0x48/0x10c el0_svc_common.constprop.0+0x40/0xe0 When allocating a hugetlb folio, between the folio is taken from buddy and prep_compound_page() is called, start_isolate_page_range() and do_migrate_range() is called. When do_migrate_range() scans the head page of the hugetlb folio, the compound_head field isn't set, so scans the tail page next. And at this time, the compound_head field of tail page is set, folio_test_large() is called by tail page, thus triggers VM_BUG_ON(). To fix it, get folio refcount before calling folio_test_large(). Link: https://lkml.kernel.org/r/20250324131750.1551884-1-tujinjiang@huawei.com Fixes: |
||
|
|
7790c9c926 |
memblock: don't release high memory to page allocator when HIGHMEM is off
Nathan Chancellor reports the following crash on a MIPS system with CONFIG_HIGHMEM=n: Linux version 6.14.0-rc6-00359-g6faea3422e3b (nathan@ax162) (mips-linux-gcc (GCC) 14.2.0, GNU ld (GNU Binutils) 2.42) #1 SMP Fri Mar 21 08:12:02 MST 2025 earlycon: uart8250 at I/O port 0x3f8 (options '38400n8') printk: legacy bootconsole [uart8250] enabled Config serial console: console=ttyS0,38400n8r CPU0 revision is: 00019300 (MIPS 24Kc) FPU revision is: 00739300 MIPS: machine is mti,malta Software DMA cache coherency enabled Initial ramdisk at: 0x8fad0000 (5360128 bytes) OF: reserved mem: Reserved memory: No reserved-memory node in the DT Primary instruction cache 2kB, VIPT, 2-way, linesize 16 bytes. Primary data cache 2kB, 2-way, VIPT, no aliases, linesize 16 bytes Zone ranges: DMA [mem 0x0000000000000000-0x0000000000ffffff] Normal [mem 0x0000000001000000-0x000000001fffffff] Movable zone start for each node Early memory node ranges node 0: [mem 0x0000000000000000-0x000000000fffffff] node 0: [mem 0x0000000090000000-0x000000009fffffff] Initmem setup node 0 [mem 0x0000000000000000-0x000000009fffffff] On node 0, zone Normal: 16384 pages in unavailable ranges random: crng init done percpu: Embedded 3 pages/cpu s18832 r8192 d22128 u49152 Kernel command line: rd_start=0xffffffff8fad0000 rd_size=5360128 console=ttyS0,38400n8r printk: log buffer data + meta data: 32768 + 102400 = 135168 bytes Dentry cache hash table entries: 65536 (order: 4, 262144 bytes, linear) Inode-cache hash table entries: 32768 (order: 3, 131072 bytes, linear) Writing ErrCtl register=00000000 Readback ErrCtl register=00000000 Built 1 zonelists, mobility grouping on. Total pages: 16384 mem auto-init: stack:all(zero), heap alloc:off, heap free:off Unhandled kernel unaligned access[#1]: CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.14.0-rc6-00359-g6faea3422e3b #1 Hardware name: mti,malta $ 0 : 00000000 00000001 81cb0880 00129027 $ 4 : 00000001 0000000a 00000002 00129026 $ 8 : ffffdfff 80101e00 00000002 00000000 $12 : 81c9c224 81c63e68 00000002 00000000 $16 : 805b1e00 00025800 81cb0880 00000002 $20 : 00000000 81c63e64 0000000a 81f10000 $24 : 81c63e64 81c63e60 $28 : 81c60000 81c63de0 00000001 81cc9d20 Hi : 00000000 Lo : 00000000 epc : 814a227c __free_pages_ok+0x144/0x3c0 ra : 81cc9d20 memblock_free_all+0x1d4/0x27c Status: 10000002 KERNEL EXL Cause : 00800410 (ExcCode 04) BadVA : 00129026 PrId : 00019300 (MIPS 24Kc) Modules linked in: Process swapper (pid: 0, threadinfo=(ptrval), task=(ptrval), tls=00000000) Stack : 81f10000 805a9e00 81c80000 00000000 00000002 814aa240 000003ff 00000400 00000000 81f10000 81c9c224 00003b1f 81c80000 81c63e60 81ca0000 81c63e64 81f10000 0000000a 0000001f 81cc9d20 81f10000 81cc96d8 00000000 81c80000 81c9c224 81c63e60 81c63e64 00000000 81f10000 00024000 00028000 00025c00 90000000 a0000000 00000002 00000017 00000000 00000000 81f10000 81f10000 ... Call Trace: [<814a227c>] __free_pages_ok+0x144/0x3c0 [<81cc9d20>] memblock_free_all+0x1d4/0x27c [<81cc6764>] mm_core_init+0x100/0x138 [<81cb4ba4>] start_kernel+0x4a0/0x6e4 Code: 1080ffd5 02003825 2467ffff <8ce30000> 7c630500 1060ffd4 00000000 8ce30000 7c630180 The crash happens because commit |
||
|
|
2ebc3b68ac |
mm/mm_init: init holes in the end of the memory map for FLATMEM
Patch series "mm: fixes for fallouts from mem_init() cleanup". These are the fixes for fallouts from mem_init() cleanup reported by Nathan Chancellor and kbuild. The details are in the commit messages. This patch (of 2): Kernel test robot reports the following crash on 32-bit system with FLATMEM and DEBUG_VM_PGFLAGS enabled: [ 0.478822][ T0] kernel BUG at include/linux/page-flags.h:536! [ 0.479312][ T0] Oops: invalid opcode: 0000 [#1] PREEMPT SMP [ 0.479768][ T0] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.14.0-rc6-00357-g8268af309d07 #1 [ 0.480470][ T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 [ 0.481260][ T0] EIP: reserve_bootmem_region (include/linux/page-flags.h:536) [ 0.481683][ T0] Code: 5d c3 01 f1 89 c8 ba e1 38 f4 c3 e8 1e 37 8e fc 0f 0b b8 90 e2 62 c4 e8 e2 05 5e fc 01 f1 89 c8 ba be 85 f7 c3 e8 04 37 8e fc <0f> 0b b8 80 e2 62 c4 e8 c8 05 5e fc 55 89 e5 53 57 56 83 ec 10 89 [ 0.483177][ T0] EAX: 00000000 EBX: c425df50 ECX: 00000000 EDX: 00000000 [ 0.483712][ T0] ESI: 017ffc00 EDI: ffffffff EBP: c425df34 ESP: c425df2c [ 0.484248][ T0] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00210046 [ 0.484846][ T0] CR0: 80050033 CR2: 00000000 CR3: 04b48000 CR4: 00000090 [ 0.485376][ T0] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 [ 0.485907][ T0] DR6: fffe0ff0 DR7: 00000400 [ 0.486253][ T0] Call Trace: [ 0.486494][ T0] ? __die_body (arch/x86/kernel/dumpstack.c:478) [ 0.486822][ T0] ? die (arch/x86/kernel/dumpstack.c:?) [ 0.487099][ T0] ? do_trap (arch/x86/kernel/traps.c:? arch/x86/kernel/traps.c:197) [ 0.487409][ T0] ? do_error_trap (arch/x86/kernel/traps.c:217) [ 0.487752][ T0] ? reserve_bootmem_region (include/linux/page-flags.h:536) [ 0.488153][ T0] ? exc_overflow (arch/x86/kernel/traps.c:301) [ 0.488490][ T0] ? handle_invalid_op (arch/x86/kernel/traps.c:254) [ 0.488869][ T0] ? reserve_bootmem_region (include/linux/page-flags.h:536) [ 0.489271][ T0] ? exc_invalid_op (arch/x86/kernel/traps.c:316) [ 0.489619][ T0] ? handle_exception (arch/x86/entry/entry_32.S:1055) [ 0.489996][ T0] ? exc_overflow (arch/x86/kernel/traps.c:301) [ 0.490332][ T0] ? reserve_bootmem_region (include/linux/page-flags.h:536) [ 0.490733][ T0] ? exc_overflow (arch/x86/kernel/traps.c:301) [ 0.491068][ T0] ? reserve_bootmem_region (include/linux/page-flags.h:536) [ 0.491470][ T0] memmap_init_reserved_pages (mm/memblock.c:2203) [ 0.491887][ T0] free_low_memory_core_early (mm/memblock.c:?) [ 0.492302][ T0] memblock_free_all (mm/memblock.c:2272 include/linux/atomic/atomic-arch-fallback.h:546 include/linux/atomic/atomic-long.h:123 include/linux/atomic/atomic-instrumented.h:3261 include/linux/mm.h:67 mm/memblock.c:2273) [ 0.492659][ T0] mem_init (arch/x86/mm/init_32.c:735) [ 0.492952][ T0] mm_core_init (mm/mm_init.c:2730) [ 0.493271][ T0] start_kernel (init/main.c:958) [ 0.493604][ T0] i386_start_kernel (arch/x86/kernel/head32.c:79) [ 0.493969][ T0] startup_32_smp (arch/x86/kernel/head_32.S:292) The crash happens because after commit |
||
|
|
bd145bdd26 |
mm/page_alloc: replace flag check with PageHWPoison() in check_new_page_bad()
This patch replaces the direct check for the __PG_HWPOISON flag with the PageHWPoison() macro, improving code readability and maintaining consistency with other parts of the memory management code. Link: https://lkml.kernel.org/r/20250320063346.489030-1-ye.liu@linux.dev Signed-off-by: Ye Liu <liuye@kylinos.cn> Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
7f29070f4c |
mm/damon/core: simplify control flow in damon_register_ops()
The function logic is not complex, so using goto is unnecessary. Replace it with a straightforward if-else to simplify control flow and improve readability. Link: https://lkml.kernel.org/r/Z9vxcPCw8tDsjKw1@OneApple Signed-off-by: Taotao Chen <chentaotao@didiglobal.com> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |