Commit Graph

9386 Commits

Author SHA1 Message Date
Wei Yang
19fc795e9d maple_tree: simplify split calculation
commit 4f6a6bed0b upstream.

Patch series "simplify split calculation", v3.


This patch (of 3):

The current calculation for splitting nodes tries to enforce a minimum
span on the leaf nodes.  This code is complex and never worked correctly
to begin with, due to the min value being passed as 0 for all leaves.

The calculation should just split the data as equally as possible
between the new nodes.  Note that b_end will be one more than the data,
so the left side is still favoured in the calculation.

The current code may also lead to a deficient node by not leaving enough
data for the right side of the split. This issue is also addressed with
the split calculation change.
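
As a rough illustration of the "split as equally as possible" rule above,
here is a minimal sketch. It is not the maple tree code: calc_split(),
left_count() and right_count() are hypothetical helpers, and b_end is
assumed here to be the number of entries in the oversized node, with the
split value being the last index kept on the left.

```c
/*
 * Hypothetical stand-in for the split calculation: b_end is assumed to be
 * the entry count of the oversized node, and the returned value is the
 * last index that stays in the left node.
 */
static unsigned int calc_split(unsigned int b_end)
{
	return b_end / 2;
}

/* Number of entries the left node keeps: indexes 0..split inclusive. */
static unsigned int left_count(unsigned int b_end)
{
	return calc_split(b_end) + 1;
}

/* Number of entries that move to the right node. */
static unsigned int right_count(unsigned int b_end)
{
	return b_end - left_count(b_end);
}
```

Under these assumptions an even count of 10 splits 6/4 and an odd count of
11 splits 6/5, so the left side is favoured without starving the right.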

[Liam.Howlett@Oracle.com: rephrase the change log]
Link: https://lkml.kernel.org/r/20241113031616.10530-1-richard.weiyang@gmail.com
Link: https://lkml.kernel.org/r/20241113031616.10530-2-richard.weiyang@gmail.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17 10:05:42 +01:00
Steven Rostedt
d0bf6bfc9c atomic64: Use arch_spin_locks instead of raw_spin_locks
commit 6c8ad3ab45 upstream.

raw_spin_locks can be traced by lockdep or tracing itself. Atomic64
operations can be used in the tracing infrastructure. When an architecture
does not have true atomic64 operations it can use the generic version that
disables interrupts and uses spin_locks.

The tracing ring buffer code uses atomic64 operations for the time
keeping. But because some architectures use the default operations, the
locking inside the atomic operations can cause an infinite recursion.

As the atomic64 implementation is architecture specific, it should not be
using raw_spin_locks() but instead arch_spin_locks, as that is the purpose
of arch_spin_locks: to be used in architecture specific implementations of
generic infrastructure like atomic64 operations.

Note, by switching from raw_spin_locks to arch_spin_locks, the locks taken
to emulate the atomic64 operations will not have lockdep, mmio, or any
kind of checks done on them. They will not even disable preemption,
although the code will disable interrupts preventing the tasks that hold
the locks from being preempted. As the locks held are done so for very
short periods of time, and the logic is only done to emulate atomic64, not
having them be instrumented should not be an issue.

Cc: stable@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andreas Larsson <andreas@gaisler.com>
Link: https://lore.kernel.org/20250122144311.64392baf@gandalf.local.home
Fixes: c84897c0ff ("ring-buffer: Remove 32bit timestamp logic")
Closes: https://lore.kernel.org/all/86fb4f86-a0e4-45a2-a2df-3154acc4f086@gaisler.com/
Reported-by: Ludwig Rydberg <ludwig.rydberg@gaisler.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17 10:05:14 +01:00
Carlos Llamas
149dd2acff lockdep: Fix upper limit for LOCKDEP_*_BITS configs
[ Upstream commit e638072e61 ]

Lockdep has a set of configs used to determine the size of the static
arrays that it uses. However, the upper limit that was initially set up
for these configs is too high (30 bit shift). This equates to several
GiB of static memory for individual symbols. Using such high values
leads to linker errors:

  $ make defconfig
  $ ./scripts/config -e PROVE_LOCKING --set-val LOCKDEP_BITS 30
  $ make olddefconfig all
  [...]
  ld: kernel image bigger than KERNEL_IMAGE_SIZE
  ld: section .bss VMA wraps around address space

Adjust the upper limits to the maximum values that avoid these issues.
The need for anything more likely points to a problem elsewhere. Note
that LOCKDEP_CHAINS_BITS was intentionally left out as its upper limit
had a different symptom and has already been fixed [1].
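
To make the "several GiB" figure concrete, the arithmetic can be sketched
as below. static_array_bytes() is a hypothetical helper, and the 4-byte
element size used in the note after it is only an illustrative assumption;
the real lockdep element sizes are not stated in this message.

```c
#include <stdint.h>

/* Bytes needed by a static array of (1 << bits) elements of elem_size bytes. */
static uint64_t static_array_bytes(unsigned int bits, uint64_t elem_size)
{
	return ((uint64_t)1 << bits) * elem_size;
}
```

With bits = 30, even a 1-byte element needs 1 GiB, and an assumed 4-byte
element needs 4 GiB of static memory for a single symbol.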

Reported-by: J. R. Okajima <hooanon05g@gmail.com>
Closes: https://lore.kernel.org/all/30795.1620913191@jrobl/ [1]
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/20241024183631.643450-2-cmllamas@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-17 10:04:44 +01:00
Herbert Xu
f939008a52 rhashtable: Fix rhashtable_try_insert test
[ Upstream commit 9d4f8e54ce ]

The test on whether rhashtable_insert_one did an insertion relies
on the value returned by rhashtable_lookup_one.  Unfortunately that
value is overwritten after rhashtable_insert_one returns.  Fix this
by moving the test before data gets overwritten.

Simplify the test as only data == NULL matters.

Finally move atomic_inc back within the lock as otherwise it may
be reordered with the atomic_dec on the removal side, potentially
leading to an underflow.

Reported-by: Michael Kelley <mhklinux@outlook.com>
Fixes: e1d3422c95 ("rhashtable: Fix potential deadlock by moving schedule_work outside lock")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Tested-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-08 09:57:35 +01:00
Breno Leitao
eb2e58484b rhashtable: Fix potential deadlock by moving schedule_work outside lock
[ Upstream commit e1d3422c95 ]

Move the hash table growth check and work scheduling outside the
rht lock to prevent a possible circular locking dependency.

The original implementation could trigger a lockdep warning due to
a potential deadlock scenario involving nested locks between
rhashtable bucket, rq lock, and dsq lock. By relocating the
growth check and work scheduling after releasing the rht lock, we break
this potential deadlock chain.

This change expands the flexibility of rhashtable by removing
restrictive locking that previously limited its use in scheduler
and workqueue contexts.

It is important to note that this calls rht_grow_above_75(), which reads
from struct rhashtable without holding the lock. If this is a problem, we
can move the check under the lock and schedule the work after the lock.

Fixes: f0e1a0643a ("sched_ext: Implement BPF extensible scheduler class")
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>

Modified so that atomic_inc is also moved outside of the bucket
lock along with the growth above 75% check.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-08 09:57:29 +01:00
Yang Erkun
90daa2f835 maple_tree: reload mas before the second call for mas_empty_area
commit 1fd8bc7cd8 upstream.

Change the LONG_MAX in simple_offset_add to 1024, and then run the following:

[root@fedora ~]# mkdir /tmp/dir
[root@fedora ~]# for i in {1..1024}; do touch /tmp/dir/$i; done
touch: cannot touch '/tmp/dir/1024': Device or resource busy
[root@fedora ~]# rm /tmp/dir/123
[root@fedora ~]# touch /tmp/dir/1024
[root@fedora ~]# rm /tmp/dir/100
[root@fedora ~]# touch /tmp/dir/1025
touch: cannot touch '/tmp/dir/1025': Device or resource busy

After file 100 is deleted, its entry is actually empty, but the later
create fails unexpectedly.

mas_alloc_cyclic has two chances to find an empty entry.  It first searches
the range from range_lo to range_hi; if no empty entry exists and range_lo >
min, it retries with the range from min to range_hi.  However, the first
call to mas_empty_area may mark mas as EBUSY, and the second call to
mas_empty_area will then return immediately without searching.  Fix this by
reloading mas before the second call to mas_empty_area.
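
The failure mode described above can be sketched with a toy cursor that
keeps a sticky error. This is not the maple tree API: toy_state,
toy_empty_area() and toy_alloc_cyclic() are hypothetical names, and the
"reload" is modelled simply as clearing the recorded error.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy search state; 'err' is sticky, like an error recorded in a cursor. */
struct toy_state {
	int err;
};

/* Find a free slot in used[lo..hi]; record -EBUSY in the state on failure. */
static int toy_empty_area(struct toy_state *st, const bool *used,
			  size_t lo, size_t hi, size_t *found)
{
	if (st->err)
		return st->err;	/* a stale error short-circuits the search */

	for (size_t i = lo; i <= hi; i++) {
		if (!used[i]) {
			*found = i;
			return 0;
		}
	}
	st->err = -EBUSY;
	return st->err;
}

/* Cyclic allocation: search [next, hi] first, then wrap to [min, hi]. */
static int toy_alloc_cyclic(const bool *used, size_t min, size_t hi,
			    size_t next, size_t *found)
{
	struct toy_state st = { 0 };
	int ret = toy_empty_area(&st, used, next, hi, found);

	if (ret && next > min) {
		st.err = 0;	/* reset the state before the second search */
		ret = toy_empty_area(&st, used, min, hi, found);
	}
	return ret;
}
```

Without the reset before the second call, the stale -EBUSY would be
returned even though a free slot exists below 'next'.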

[Liam.Howlett@Oracle.com: fix mas_alloc_cyclic() second search]
  Link: https://lore.kernel.org/all/20241216060600.287B4C4CED0@smtp.kernel.org/
  Link: https://lkml.kernel.org/r/20241216190113.1226145-2-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20241214093005.72284-1-yangerkun@huaweicloud.com
Fixes: 9b6713cc75 ("maple_tree: Add mtree_alloc_cyclic()")
Signed-off-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-09 13:33:50 +01:00
Thomas Gleixner
7b7a9175c6 timekeeping: Remove CONFIG_DEBUG_TIMEKEEPING
commit d44d26987b upstream.

Since 135225a363 timekeeping_cycles_to_ns() handles large offsets which
would lead to 64bit multiplication overflows correctly. It's also protected
against negative motion of the clocksource unconditionally, which was
exclusive to x86 before.

timekeeping_advance() handles large offsets already correctly.

That means the value of CONFIG_DEBUG_TIMEKEEPING which analyzed these cases
is very close to zero. Remove all of it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <jstultz@google.com>
Link: https://lore.kernel.org/all/20241031120328.536010148@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-14 20:04:16 +01:00
Kees Cook
9fcb73b960 lib: stackinit: hide never-taken branch from compiler
commit 5c3793604f upstream.

The never-taken branch leads to an invalid bounds condition, which is by
design. To avoid the unwanted warning from the compiler, hide the
variable from the optimizer.

../lib/stackinit_kunit.c: In function 'do_nothing_u16_zero':
../lib/stackinit_kunit.c:51:49: error: array subscript 1 is outside array bounds of 'u16[0]' {aka 'short unsigned int[]'} [-Werror=array-bounds=]
   51 | #define DO_NOTHING_RETURN_SCALAR(ptr)           *(ptr)
      |                                                 ^~~~~~
../lib/stackinit_kunit.c:219:24: note: in expansion of macro 'DO_NOTHING_RETURN_SCALAR'
  219 |                 return DO_NOTHING_RETURN_ ## which(ptr + 1);    \
      |                        ^~~~~~~~~~~~~~~~~~

Link: https://lkml.kernel.org/r/20241117113813.work.735-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-14 20:03:32 +01:00
Marco Elver
9bfeeeff2c stackdepot: fix stack_depot_save_flags() in NMI context
commit 031e04bdc8 upstream.

Per documentation, stack_depot_save_flags() was meant to be usable from
NMI context if STACK_DEPOT_FLAG_CAN_ALLOC is unset.  However, it still
would try to take the pool_lock in an attempt to save a stack trace in the
current pool (if space is available).

This could result in deadlock if an NMI is handled while pool_lock is
already held.  To avoid deadlock, only try to take the lock in NMI context
and give up if unsuccessful.

The documentation is fixed to clearly convey this.
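
The "try the lock and give up" pattern can be sketched in userspace C as
follows. This is only an illustration with a C11 atomic_flag standing in
for pool_lock; toy_trylock(), toy_lock(), toy_unlock() and toy_save() are
hypothetical names and not the stackdepot or kernel spinlock API.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy lock standing in for a pool lock; not the kernel's spinlock API. */
static atomic_flag toy_pool_lock = ATOMIC_FLAG_INIT;

static bool toy_trylock(void)
{
	/* test_and_set returns the previous value: false means we got it. */
	return !atomic_flag_test_and_set(&toy_pool_lock);
}

static void toy_lock(void)
{
	while (!toy_trylock())
		;	/* spin until the holder releases the lock */
}

static void toy_unlock(void)
{
	atomic_flag_clear(&toy_pool_lock);
}

/*
 * Save into the pool. In (simulated) NMI context only a trylock is
 * attempted, so an interrupted lock holder cannot deadlock against us.
 */
static bool toy_save(bool in_nmi, int *pool_count)
{
	if (in_nmi) {
		if (!toy_trylock())
			return false;	/* give up instead of spinning */
	} else {
		toy_lock();
	}
	(*pool_count)++;
	toy_unlock();
	return true;
}
```

If the lock is already held when toy_save() runs with in_nmi set, the call
returns false and leaves the pool untouched rather than spinning forever.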

Link: https://lkml.kernel.org/r/Z0CcyfbPqmxJ9uJH@elver.google.com
Link: https://lkml.kernel.org/r/20241122154051.3914732-1-elver@google.com
Fixes: 4434a56ec2 ("stackdepot: make fast paths lock-less again")
Signed-off-by: Marco Elver <elver@google.com>
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-14 20:03:32 +01:00
Wei Yang
29aa92f514 maple_tree: refine mas_store_root() on storing NULL
commit 0ea120b278 upstream.

Currently, when storing NULL on mas_store_root(), the behavior could be
improved.

Storing NULLs over the entire tree may result in a node being used to
store a single range.  Further stores of NULL may cause the node and
tree to be corrupt and cause incorrect behaviour.  Fixing the store to
the root null fixes the issue by ensuring that a range of 0 - ULONG_MAX
results in an empty tree.

Users of the tree may experience incorrect values returned if the tree
was expanded to store values, then overwritten by all NULLS, then
continued to store NULLs over the empty area.

For example, possible cases are:

  * storing NULL at any range results in a new node
  * storing NULL at range [m, n] where m > 0 to a single entry tree results
    in a new node with range [m, n] set to NULL
  * storing NULL at range [m, n] where m > 0 to an empty tree results in
    consecutive NULL slots
  * it allows for multiple NULL entries by expanding root
    to store NULLs to an empty tree

This patch tries to improve:

  * memory efficiency, by setting an empty tree instead of using a node
  * removing the possibility of consecutive NULL slots, which would
    prohibit extended null in later operations

Link: https://lkml.kernel.org/r/20241031231627.14316-5-richard.weiyang@gmail.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09 10:41:01 +01:00
Jinjie Ruan
3213b92754 kunit: string-stream: Fix a UAF bug in kunit_init_suite()
commit 39e21403c9 upstream.

In kunit_debugfs_create_suite(), if alloc_string_stream() fails in the
kunit_suite_for_each_test_case() loop, "suite->log = stream" has already
been assigned. The error path only frees the suite->log stream memory but
does not set the pointer to NULL, so the later string_stream_clear() of
suite->log in kunit_init_suite() causes the UAF bug below.

Set stream pointer to NULL after free to fix it.

	Unable to handle kernel paging request at virtual address 006440150000030d
	Mem abort info:
	  ESR = 0x0000000096000004
	  EC = 0x25: DABT (current EL), IL = 32 bits
	  SET = 0, FnV = 0
	  EA = 0, S1PTW = 0
	  FSC = 0x04: level 0 translation fault
	Data abort info:
	  ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
	  CM = 0, WnR = 0, TnD = 0, TagAccess = 0
	  GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
	[006440150000030d] address between user and kernel address ranges
	Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
	Dumping ftrace buffer:
	   (ftrace buffer empty)
	Modules linked in: iio_test_gts industrialio_gts_helper cfg80211 rfkill ipv6 [last unloaded: iio_test_gts]
	CPU: 5 UID: 0 PID: 6253 Comm: modprobe Tainted: G    B   W        N 6.12.0-rc4+ #458
	Tainted: [B]=BAD_PAGE, [W]=WARN, [N]=TEST
	Hardware name: linux,dummy-virt (DT)
	pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
	pc : string_stream_clear+0x54/0x1ac
	lr : string_stream_clear+0x1a8/0x1ac
	sp : ffffffc080b47410
	x29: ffffffc080b47410 x28: 006440550000030d x27: ffffff80c96b5e98
	x26: ffffff80c96b5e80 x25: ffffffe461b3f6c0 x24: 0000000000000003
	x23: ffffff80c96b5e88 x22: 1ffffff019cdf4fc x21: dfffffc000000000
	x20: ffffff80ce6fa7e0 x19: 032202a80000186d x18: 0000000000001840
	x17: 0000000000000000 x16: 0000000000000000 x15: ffffffe45c355cb4
	x14: ffffffe45c35589c x13: ffffffe45c03da78 x12: ffffffb810168e75
	x11: 1ffffff810168e74 x10: ffffffb810168e74 x9 : dfffffc000000000
	x8 : 0000000000000004 x7 : 0000000000000003 x6 : 0000000000000001
	x5 : ffffffc080b473a0 x4 : 0000000000000000 x3 : 0000000000000000
	x2 : 0000000000000001 x1 : ffffffe462fbf620 x0 : dfffffc000000000
	Call trace:
	 string_stream_clear+0x54/0x1ac
	 __kunit_test_suites_init+0x108/0x1d8
	 kunit_exec_run_tests+0xb8/0x100
	 kunit_module_notify+0x400/0x55c
	 notifier_call_chain+0xfc/0x3b4
	 blocking_notifier_call_chain+0x68/0x9c
	 do_init_module+0x24c/0x5c8
	 load_module+0x4acc/0x4e90
	 init_module_from_file+0xd4/0x128
	 idempotent_init_module+0x2d4/0x57c
	 __arm64_sys_finit_module+0xac/0x100
	 invoke_syscall+0x6c/0x258
	 el0_svc_common.constprop.0+0x160/0x22c
	 do_el0_svc+0x44/0x5c
	 el0_svc+0x48/0xb8
	 el0t_64_sync_handler+0x13c/0x158
	 el0t_64_sync+0x190/0x194
	Code: f9400753 d2dff800 f2fbffe0 d343fe7c (38e06b80)
	---[ end trace 0000000000000000 ]---
	Kernel panic - not syncing: Oops: Fatal exception
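
The free-then-clear idiom behind the fix can be sketched as below. The
types and helpers (toy_stream, toy_suite, toy_suite_release_log(),
toy_suite_clear_log()) are hypothetical stand-ins, not the KUnit
structures, and the sketch assumes the later user checks the pointer for
NULL before touching it.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for a suite and its log stream. */
struct toy_stream {
	int len;
};

struct toy_suite {
	struct toy_stream *log;
};

/* Error-path cleanup: free the stream and clear the pointer to it. */
static void toy_suite_release_log(struct toy_suite *suite)
{
	free(suite->log);
	suite->log = NULL;	/* later users can see there is no stream */
}

/* Later user: only touches the stream if it still exists. */
static int toy_suite_clear_log(struct toy_suite *suite)
{
	if (!suite->log)
		return -1;	/* nothing to clear */
	suite->log->len = 0;
	return 0;
}
```

Without the NULL assignment, toy_suite_clear_log() would write through a
dangling pointer after the release, which is the same class of bug as the
trace above.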

Link: https://lore.kernel.org/r/20241112080314.407966-1-ruanjinjie@huawei.com
Cc: stable@vger.kernel.org
Fixes: a3fdf78478 ("kunit: string-stream: Decouple string_stream from kunit")
Suggested-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Reviewed-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09 10:41:01 +01:00
Zichen Xie
5d28fac593 kunit: Fix potential null dereference in kunit_device_driver_test()
commit 435c20eed5 upstream.

kunit_kzalloc() may return a NULL pointer; dereferencing it without a
NULL check may lead to a NULL dereference.
Add a NULL check for test_state.

Link: https://lore.kernel.org/r/20241115054335.21673-1-zichenxie0106@gmail.com
Fixes: d03c720e03 ("kunit: Add APIs for managing devices")
Signed-off-by: Zichen Xie <zichenxie0106@gmail.com>
Cc: stable@vger.kernel.org
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09 10:41:00 +01:00
Bartosz Golaszewski
9517bc76ff lib: string_helpers: silence snprintf() output truncation warning
commit a508ef4b1d upstream.

The output of ".%03u" with the unsigned int in range [0, 4294967295] may
get truncated if the target buffer is not 12 bytes. This can't really
happen here as the 'remainder' variable cannot exceed 999 but the
compiler doesn't know it. To make it happy just increase the buffer to
where the warning goes away.
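
The 12-byte figure can be checked with a short sketch. frac_buf_size() is a
hypothetical helper, and the result for the largest value assumes a 32-bit
unsigned int.

```c
#include <stdio.h>

/* Bytes needed (including the trailing NUL) to format value as ".%03u". */
static int frac_buf_size(unsigned int value)
{
	/* With a zero size, snprintf only reports the length it would write. */
	return snprintf(NULL, 0, ".%03u", value) + 1;
}
```

A remainder of at most 999 needs 5 bytes (".999" plus NUL), while the
worst case the compiler has to assume is '.', ten digits and the NUL.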

Fixes: 3c9f3681d0 ("[SCSI] lib: add generic helper to print sizes rounded to the correct SI range")
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Andy Shevchenko <andy@kernel.org>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Kees Cook <kees@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/r/20241101205453.9353-1-brgl@bgdev.pl
Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-05 14:03:01 +01:00
Jan Hendrik Farr
e46d4caa77 Compiler Attributes: disable __counted_by for clang < 19.1.3
commit f06e108a3d upstream.

This patch disables __counted_by for clang versions < 19.1.3 because
of the two issues listed below. It does this by introducing
CONFIG_CC_HAS_COUNTED_BY.

1. clang < 19.1.2 has a bug that can lead to __bdos returning 0:
https://github.com/llvm/llvm-project/pull/110497

2. clang < 19.1.3 has a bug that can lead to __bdos being off by 4:
https://github.com/llvm/llvm-project/pull/112636

Fixes: c8248faf3c ("Compiler Attributes: counted_by: Adjust name and identifier expansion")
Cc: stable@vger.kernel.org # 6.6.x: 16c31dd7fd: Compiler Attributes: counted_by: bump min gcc version
Cc: stable@vger.kernel.org # 6.6.x: 2993eb7a8d: Compiler Attributes: counted_by: fixup clang URL
Cc: stable@vger.kernel.org # 6.6.x: 231dc3f0c9: lkdtm/bugs: Improve warning message for compilers without counted_by support
Cc: stable@vger.kernel.org # 6.6.x
Reported-by: Nathan Chancellor <nathan@kernel.org>
Closes: https://lore.kernel.org/all/20240913164630.GA4091534@thelio-3990X/
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202409260949.a1254989-oliver.sang@intel.com
Link: https://lore.kernel.org/all/Zw8iawAF5W2uzGuh@archlinux/T/#m204c09f63c076586a02d194b87dffc7e81b8de7b
Suggested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Jan Hendrik Farr <kernel@jfarr.cc>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Reviewed-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://lore.kernel.org/r/20241029140036.577804-2-kernel@jfarr.cc
Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-05 14:02:45 +01:00
Sabyrzhan Tasbolatov
e691826a3d kasan: move checks to do_strncpy_from_user
[ Upstream commit ae193dd793 ]

Patch series "kasan: migrate the last module test to kunit", v4.

copy_user_test() is the last KUnit-incompatible test with
CONFIG_KASAN_MODULE_TEST requirement, which we are going to migrate to
KUnit framework and delete the former test and Kconfig as well.

In this patch series:

	- [1/3] move kasan_check_write() and check_object_size() to
		do_strncpy_from_user() to cover with KASAN checks with
		multiple conditions in strncpy_from_user().

	- [2/3] migrated copy_user_test() to KUnit, where we can also test
		strncpy_from_user() due to [1/3].

		KUnits have been tested on:
		- x86_64 with CONFIG_KASAN_GENERIC. Passed
		- arm64 with CONFIG_KASAN_SW_TAGS. 1 fail. See [1]
		- arm64 with CONFIG_KASAN_HW_TAGS. 1 fail. See [1]
		[1] https://lore.kernel.org/linux-mm/CACzwLxj21h7nCcS2-KA_q7ybe+5pxH0uCDwu64q_9pPsydneWQ@mail.gmail.com/

	- [3/3] delete CONFIG_KASAN_MODULE_TEST and documentation occurrences.

This patch (of 3):

Since commit 2865baf54077 ("x86: support user address masking instead of
non-speculative conditional"), do_strncpy_from_user() is called from
multiple places, so we should sanitize the kernel *dst memory and size,
which was previously done in strncpy_from_user().

Link: https://lkml.kernel.org/r/20241016131802.3115788-1-snovitoll@gmail.com
Link: https://lkml.kernel.org/r/20241016131802.3115788-2-snovitoll@gmail.com
Fixes: 2865baf540 ("x86: support user address masking instead of non-speculative conditional")
Signed-off-by: Sabyrzhan Tasbolatov <snovitoll@gmail.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alex Shi <alexs@kernel.org>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Hu Haowen <2023002089@link.tyut.edu.cn>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Marco Elver <elver@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05 14:02:12 +01:00
Masami Hiramatsu (Google)
cb6fcef8b4 objpool: fix to make percpu slot allocation more robust
Since gfp & GFP_ATOMIC == GFP_ATOMIC is true for GFP_KERNEL | GFP_HIGH, it
will use kmalloc if the user specifies that combination.  The reason for
combining __vmalloc_node() and kmalloc_node() is that vmalloc does not
support all GFP flags, especially GFP_ATOMIC.  So we should check whether
gfp & (GFP_ATOMIC | GFP_KERNEL) != GFP_ATOMIC for vmalloc first.  This
ensures the caller can sleep.  And for robustness, even if vmalloc fails,
it should retry with kmalloc to allocate it.
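
The difference between the two flag tests can be sketched with toy bit
values. The TOY_* constants below are illustrative assumptions, loosely
modelled on GFP_ATOMIC being a combination of a "high priority" bit and a
reclaim bit that GFP_KERNEL also contains; they are not the kernel's real
definitions, and old_is_atomic() and may_use_vmalloc() are hypothetical
helpers.

```c
#include <stdbool.h>

/* Illustrative bit values only; the kernel's real GFP bits differ. */
#define TOY_GFP_HIGH		0x01u
#define TOY_GFP_KSWAPD_RECLAIM	0x02u
#define TOY_GFP_DIRECT_RECLAIM	0x04u
#define TOY_GFP_IO		0x08u
#define TOY_GFP_FS		0x10u

#define TOY_GFP_ATOMIC	(TOY_GFP_HIGH | TOY_GFP_KSWAPD_RECLAIM)
#define TOY_GFP_KERNEL	(TOY_GFP_KSWAPD_RECLAIM | TOY_GFP_DIRECT_RECLAIM | \
			 TOY_GFP_IO | TOY_GFP_FS)

/* Old test: true whenever every GFP_ATOMIC bit is present, even with extras. */
static bool old_is_atomic(unsigned int gfp)
{
	return (gfp & TOY_GFP_ATOMIC) == TOY_GFP_ATOMIC;
}

/* New test: vmalloc is tried unless the masked flags are exactly GFP_ATOMIC. */
static bool may_use_vmalloc(unsigned int gfp)
{
	return (gfp & (TOY_GFP_ATOMIC | TOY_GFP_KERNEL)) != TOY_GFP_ATOMIC;
}
```

With these values, TOY_GFP_KERNEL | TOY_GFP_HIGH passes the old test and
would be treated as atomic, while the new test still allows vmalloc for it
and only rejects a plain TOY_GFP_ATOMIC request.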

Link: https://lkml.kernel.org/r/173008598713.1262174.2959179484209897252.stgit@mhiramat.roam.corp.google.com
Fixes: aff1871bfc ("objpool: fix choosing allocation for percpu slots")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Closes: https://lore.kernel.org/all/CAHk-=whO+vSH+XVRio8byJU8idAWES0SPGVZ7KAVdc4qrV0VUA@mail.gmail.com/
Cc: Leo Yan <leo.yan@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Wu <wuqiang.matt@bytedance.com>
Cc: Mikel Rychliski <mikel@mikelr.com>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-07 14:14:58 -08:00
Linus Torvalds
3dfffd506e Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
 "The important one is a change to the way in which we handle protection
  keys around signal delivery so that we're more closely aligned with
  the x86 behaviour, however there is also a revert of the previous fix
  to disable software tag-based KASAN with GCC, since a workaround
  materialised shortly afterwards.

  I'd love to say we're done with 6.12, but we're aware of some
  longstanding fpsimd register corruption issues that we're almost at
  the bottom of resolving.

  Summary:

   - Fix handling of POR_EL0 during signal delivery so that pushing the
     signal context doesn't fail based on the pkey configuration of the
     interrupted context and align our user-visible behaviour with that
     of x86.

   - Fix a bogus pointer being passed to the CPU hotplug code from the
     Arm SDEI driver.

   - Re-enable software tag-based KASAN with GCC by using an alternative
     implementation of '__no_sanitize_address'"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: signal: Improve POR_EL0 handling to avoid uaccess failures
  firmware: arm_sdei: Fix the input parameter of cpuhp_remove_state()
  Revert "kasan: Disable Software Tag-Based KASAN with GCC"
  kasan: Fix Software Tag-Based KASAN with GCC
2024-11-01 07:54:11 -10:00
Linus Torvalds
d56239a82e Merge tag 'vfs-6.12-rc6.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
Pull filesystem fixes from Christian Brauner:
 "VFS:

   - Fix copy_page_from_iter_atomic() if KMAP_LOCAL_FORCE_MAP=y is set

   - Add a get_tree_bdev_flags() helper that allows modifying, e.g.,
     whether errors are logged into the filesystem context during
     superblock creation. This is used by erofs to fix a userspace
     regression where an error is currently logged when it's used on a
     regular file, which is a newly allowed mode in erofs.

  netfs:

   - Fix the sysfs debug path in the documentation.

   - Fix iov_iter_get_pages*() for folio queues by skipping the page
     extraction if we're at the end of a folio.

  afs:

   - Fix moving subdirectories to different parent directory.

  autofs:

   - Fix handling of AUTOFS_DEV_IOCTL_TIMEOUT_CMD ioctl in
     validate_dev_ioctl(). The actual ioctl number, not the ioctl
     command needs to be checked for autofs"

* tag 'vfs-6.12-rc6.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
  iov_iter: fix copy_page_from_iter_atomic() if KMAP_LOCAL_FORCE_MAP
  autofs: fix thinko in validate_dev_ioctl()
  iov_iter: Fix iov_iter_get_pages*() for folio_queue
  afs: Fix missing subdir edit when renamed between parent dirs
  doc: correcting the debug path for cachefiles
  erofs: use get_tree_bdev_flags() to avoid misleading messages
  fs/super.c: introduce get_tree_bdev_flags()
2024-11-01 07:37:10 -10:00
Linus Torvalds
7fbaacafbc Merge tag 'slab-for-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab fixes from Vlastimil Babka:

 - Fix for a slub_kunit test warning with MEM_ALLOC_PROFILING_DEBUG (Pei
   Xiao)

 - Fix for a MTE-based KASAN BUG in krealloc() (Qun-Wei Lin)

* tag 'slab-for-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  mm: krealloc: Fix MTE false alarm in __do_krealloc
  slub/kunit: fix a WARNING due to unwrapped __kmalloc_cache_noprof
2024-10-29 16:24:02 -10:00
Hugh Dickins
c749d9b7eb iov_iter: fix copy_page_from_iter_atomic() if KMAP_LOCAL_FORCE_MAP
generic/077 on x86_32 CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP=y with highmem,
on huge=always tmpfs, issues a warning and then hangs (interruptibly):

WARNING: CPU: 5 PID: 3517 at mm/highmem.c:622 kunmap_local_indexed+0x62/0xc9
CPU: 5 UID: 0 PID: 3517 Comm: cp Not tainted 6.12.0-rc4 #2
...
copy_page_from_iter_atomic+0xa6/0x5ec
generic_perform_write+0xf6/0x1b4
shmem_file_write_iter+0x54/0x67

Fix copy_page_from_iter_atomic() by limiting it in that case
(include/linux/skbuff.h skb_frag_must_loop() does similar).

But going forward, perhaps CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP is too
surprising, has outlived its usefulness, and should just be removed?

Fixes: 908a1ad894 ("iov_iter: Handle compound highmem pages in copy_page_from_iter_atomic()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Link: https://lore.kernel.org/r/dd5f0c89-186e-18e1-4f43-19a60f5a9774@google.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-28 13:39:35 +01:00
Linus Torvalds
c2cd8e4592 Merge tag 'probes-fixes-v6.12-rc4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes fixes from Masami Hiramatsu:

 - objpool: Fix choosing allocation for percpu slots

   Fixes to allocate objpool's percpu slots correctly according to the
   GFP flag. It checks whether "any bit" in GFP_ATOMIC is set to choose
   the vmalloc source, but it should check whether "all bits" in the
   GFP_ATOMIC flag are set, because GFP_ATOMIC is a combined flag.

 - tracing/probes: Fix MAX_TRACE_ARGS limit handling

   If more than MAX_TRACE_ARGS are passed for creating a probe event,
   the entries over MAX_TRACE_ARGS in trace_arg array are not
   initialized. Thus if the kernel accesses those entries, it crashes.
   This rejects creating the event if the number of arguments is over
   MAX_TRACE_ARGS.

 - tracing: Consider the NUL character when validating the event length

   A strlen() is used when parsing the event name, and the original code
   does not consider the terminal null byte. Thus it can pass the name
   one byte longer than the buffer. This fixes to check it correctly.
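
The off-by-one described in the last item can be sketched as a length
check. name_fits() is a hypothetical helper, not the tracing code; it only
illustrates that the comparison has to leave room for the NUL byte.

```c
#include <stdbool.h>
#include <string.h>

/* True if name plus its terminating NUL fits in buf_size bytes. */
static bool name_fits(const char *name, size_t buf_size)
{
	/* strlen() excludes the NUL, so the length must be strictly smaller. */
	return strlen(name) < buf_size;
}
```

A 4-byte buffer therefore accepts "abc" but must reject "abcd", which a
"<=" comparison would wrongly let through.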

* tag 'probes-fixes-v6.12-rc4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Consider the NULL character when validating the event length
  tracing/probes: Fix MAX_TRACE_ARGS limit handling
  objpool: fix choosing allocation for percpu slots
2024-10-24 13:51:58 -07:00
David Howells
e65a0dc1ca iov_iter: Fix iov_iter_get_pages*() for folio_queue
p9_get_mapped_pages() uses iov_iter_get_pages_alloc2() to extract pages
from an iterator when performing a zero-copy request and under some
circumstances, this crashes with odd page errors[1], for example, I see:

    page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xbcf0
    flags: 0x2000000000000000(zone=1)
    ...
    page dumped because: VM_BUG_ON_FOLIO(((unsigned int) folio_ref_count(folio) + 127u <= 127u))
    ------------[ cut here ]------------
    kernel BUG at include/linux/mm.h:1444!

This is because, unlike in iov_iter_extract_folioq_pages(), the
iter_folioq_get_pages() helper function doesn't skip the current folio
when iov_offset points to the end of it, but rather extracts the next
page beyond the end of the folio and adds it to the list.  Reading will
then clobber the contents of this page, leading to system corruption,
and if the page is not in use, put_page() may try to clean up the unused
page.

This can be worked around by copying the iterator before each
extraction[2] and using iov_iter_advance() on the original as the
advance function steps over the page we're at the end of.

Fix this by skipping the page extraction if we're at the end of the
folio.

This was reproduced in the ktest environment[3] by forcing 9p to use the
fscache caching mode and then reading a file through 9p.

Fixes: db0aa2e956 ("mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios")
Reported-by: Antony Antony <antony@phenome.org>
Closes: https://lore.kernel.org/r/ZxFQw4OI9rrc7UYc@Antony2201.local/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Eric Van Hensbergen <ericvh@kernel.org>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: v9fs@lists.linux.dev
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/ZxFEi1Tod43pD6JC@moon.secunet.de/ [1]
Link: https://lore.kernel.org/r/2299159.1729543103@warthog.procyon.org.uk/ [2]
Link: https://github.com/koverstreet/ktest.git [3]
Tested-by: Antony Antony <antony.antony@secunet.com>
Link: https://lore.kernel.org/r/3327438.1729678025@warthog.procyon.org.uk
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-24 13:50:27 +02:00
Marco Elver
237ab03e30 Revert "kasan: Disable Software Tag-Based KASAN with GCC"
This reverts commit 7aed6a2c51.

Now that __no_sanitize_address attribute is fixed for KASAN_SW_TAGS with
GCC, allow re-enabling KASAN_SW_TAGS with GCC.

Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrew Pinski <pinskia@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Link: https://lore.kernel.org/r/20241021120013.3209481-2-elver@google.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-10-23 16:04:30 +01:00
Pei Xiao
2b059d0d1e slub/kunit: fix a WARNING due to unwrapped __kmalloc_cache_noprof
'modprobe slub_kunit' will have a warning as shown below. The root cause
is that __kmalloc_cache_noprof was directly used, which resulted in no
alloc_tag being allocated. This caused current->alloc_tag to be null,
leading to a warning in alloc_tag_add_check.

Let's add an alloc_hook layer to __kmalloc_cache_noprof specifically
within lib/slub_kunit.c, which is the only user of this internal slub
function outside kmalloc implementation itself.

[58162.947016] WARNING: CPU: 2 PID: 6210 at
./include/linux/alloc_tag.h:125 alloc_tagging_slab_alloc_hook+0x268/0x27c
[58162.957721] Call trace:
[58162.957919]  alloc_tagging_slab_alloc_hook+0x268/0x27c
[58162.958286]  __kmalloc_cache_noprof+0x14c/0x344
[58162.958615]  test_kmalloc_redzone_access+0x50/0x10c [slub_kunit]
[58162.959045]  kunit_try_run_case+0x74/0x184 [kunit]
[58162.959401]  kunit_generic_run_threadfn_adapter+0x2c/0x4c [kunit]
[58162.959841]  kthread+0x10c/0x118
[58162.960093]  ret_from_fork+0x10/0x20
[58162.960363] ---[ end trace 0000000000000000 ]---

Signed-off-by: Pei Xiao <xiaopei01@kylinos.cn>
Fixes: a0a44d9175 ("mm, slab: don't wrap internal functions with alloc_hooks()")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2024-10-23 09:50:58 +02:00
Viktor Malik
aff1871bfc objpool: fix choosing allocation for percpu slots
objpool intends to use vmalloc for default (non-atomic) allocations of
percpu slots and objects. However, the condition checking if GFP flags
set any bit of GFP_ATOMIC is wrong because GFP_ATOMIC is a combination of bits
(__GFP_HIGH|__GFP_KSWAPD_RECLAIM) and so `pool->gfp & GFP_ATOMIC` will
be true if either bit is set. Since GFP_ATOMIC and GFP_KERNEL share the
___GFP_KSWAPD_RECLAIM bit, kmalloc will be used in cases when GFP_KERNEL
is specified, i.e. in all current usages of objpool.

This may lead to unexpected OOM errors since kmalloc cannot allocate
large amounts of memory.

For instance, objpool is used by fprobe rethook which in turn is used by
BPF kretprobe.multi and kprobe.session probe types. Trying to attach
these to all kernel functions with libbpf using

    SEC("kprobe.session/*")
    int kprobe(struct pt_regs *ctx)
    {
        [...]
    }

fails on objpool slot allocation with ENOMEM.

Fix the condition to truly use vmalloc by default.
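
The difference between "any bit set" and "all bits set" can be sketched
as follows. The DEMO_* bit values below are made up for illustration and
do not match the kernel's real GFP definitions; only the composition of
the masks follows the description above:

```c
#include <stdbool.h>

/* Illustrative bit values only; the real GFP bits differ. */
#define DEMO_GFP_HIGH            0x01u
#define DEMO_GFP_KSWAPD_RECLAIM  0x02u
#define DEMO_GFP_DIRECT_RECLAIM  0x04u
#define DEMO_GFP_IO              0x08u
#define DEMO_GFP_FS              0x10u

/* GFP_ATOMIC is a combination of bits, as the commit message notes. */
#define DEMO_GFP_ATOMIC (DEMO_GFP_HIGH | DEMO_GFP_KSWAPD_RECLAIM)
/* GFP_KERNEL shares the kswapd-reclaim bit with GFP_ATOMIC. */
#define DEMO_GFP_KERNEL (DEMO_GFP_KSWAPD_RECLAIM | DEMO_GFP_DIRECT_RECLAIM | \
			 DEMO_GFP_IO | DEMO_GFP_FS)

/* Buggy test: true if ANY bit of the atomic mask is set. */
static bool is_atomic_buggy(unsigned int gfp)
{
	return (gfp & DEMO_GFP_ATOMIC) != 0;
}

/* Corrected test: true only if ALL bits of the atomic mask are set. */
static bool is_atomic_fixed(unsigned int gfp)
{
	return (gfp & DEMO_GFP_ATOMIC) == DEMO_GFP_ATOMIC;
}
```

In this sketch is_atomic_buggy(DEMO_GFP_KERNEL) is true, which mirrors
how kmalloc ended up being chosen for GFP_KERNEL callers, whereas
is_atomic_fixed(DEMO_GFP_KERNEL) is false.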

Link: https://lore.kernel.org/all/20240826060718.267261-1-vmalik@redhat.com/

Fixes: b4edb8d2d4 ("lib: objpool added: ring-array based lockless MPMC")
Signed-off-by: Viktor Malik <vmalik@redhat.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Matt Wu <wuqiang.matt@bytedance.com>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-10-22 14:22:42 +09:00
Linus Torvalds
a777c32ca4 Merge tag 'v6.12-p4' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
 "Fix a regression in mpi that broke RSA"

* tag 'v6.12-p4' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: lib/mpi - Fix an "Uninitialized scalar variable" issue
2024-10-21 09:59:43 -07:00
Linus Torvalds
4e6bd4a33a Merge tag 'rust-fixes-6.12-2' of https://github.com/Rust-for-Linux/linux
Pull rust fixes from Miguel Ojeda:
 "Toolchain and infrastructure:

   - Fix several issues with the 'rustc-option' macro. It includes a
     refactor from Masahiro of three '{cc,rust}-*' macros, which is not
     a fix but avoids repeating the same commands (which would be
     several lines in the case of 'rustc-option').

   - Fix conditions for 'CONFIG_HAVE_CFI_ICALL_NORMALIZE_INTEGERS'. It
     includes the addition of 'CONFIG_RUSTC_LLVM_VERSION', which is not
     a fix but is needed for the actual fix.

  And a trivial grammar fix"

* tag 'rust-fixes-6.12-2' of https://github.com/Rust-for-Linux/linux:
  cfi: fix conditions for HAVE_CFI_ICALL_NORMALIZE_INTEGERS
  kbuild: rust: add `CONFIG_RUSTC_LLVM_VERSION`
  kbuild: fix issues with rustc-option
  kbuild: refactor cc-option-yn, cc-disable-warning, rust-option-yn macros
  lib/Kconfig.debug: fix grammar in RUST_BUILD_ASSERT_ALLOW
2024-10-19 08:32:47 -07:00
Linus Torvalds
3d5ad2d4ec Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Daniel Borkmann:

 - Fix BPF verifier to not affect subreg_def marks in its range
   propagation (Eduard Zingerman)

 - Fix a truncation bug in the BPF verifier's handling of
   coerce_reg_to_size_sx (Dimitar Kanaliev)

 - Fix the BPF verifier's delta propagation between linked registers
   under 32-bit addition (Daniel Borkmann)

 - Fix a NULL pointer dereference in BPF devmap due to missing rxq
   information (Florian Kauer)

 - Fix a memory leak in bpf_core_apply (Jiri Olsa)

 - Fix an UBSAN-reported array-index-out-of-bounds in BTF parsing for
   arrays of nested structs (Hou Tao)

 - Fix build ID fetching where memory areas backing the file were
   created with memfd_secret (Andrii Nakryiko)

 - Fix BPF task iterator tid filtering which was incorrectly using pid
   instead of tid (Jordan Rome)

 - Several fixes for BPF sockmap and BPF sockhash redirection in
   combination with vsocks (Michal Luczaj)

 - Fix riscv BPF JIT and make BPF_CMPXCHG fully ordered (Andrea Parri)

 - Fix riscv BPF JIT under CONFIG_CFI_CLANG to prevent the possibility
   of an infinite BPF tailcall (Pu Lehui)

 - Fix a build warning from resolve_btfids that bpf_lsm_key_free cannot
   be resolved (Thomas Weißschuh)

 - Fix a bug in kfunc BTF caching for modules where the wrong BTF object
   was returned (Toke Høiland-Jørgensen)

 - Fix a BPF selftest compilation error in cgroup-related tests with
   musl libc (Tony Ambardar)

 - Several fixes to BPF link info dumps to fill missing fields (Tyrone
   Wu)

 - Add BPF selftests for kfuncs from multiple modules, checking that the
   correct kfuncs are called (Simon Sundberg)

 - Ensure that internal and user-facing bpf_redirect flags don't overlap
   (Toke Høiland-Jørgensen)

 - Switch to use kvzalloc to allocate BPF verifier environment (Rik van
   Riel)

 - Use raw_spinlock_t in BPF ringbuf to fix a sleep in atomic splat
   under RT (Wander Lairson Costa)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (38 commits)
  lib/buildid: Handle memfd_secret() files in build_id_parse()
  selftests/bpf: Add test case for delta propagation
  bpf: Fix print_reg_state's constant scalar dump
  bpf: Fix incorrect delta propagation between linked registers
  bpf: Properly test iter/task tid filtering
  bpf: Fix iter/task tid filtering
  riscv, bpf: Make BPF_CMPXCHG fully ordered
  bpf, vsock: Drop static vsock_bpf_prot initialization
  vsock: Update msg_count on read_skb()
  vsock: Update rx_bytes on read_skb()
  bpf, sockmap: SK_DROP on attempted redirects of unsupported af_vsock
  selftests/bpf: Add asserts for netfilter link info
  bpf: Fix link info netfilter flags to populate defrag flag
  selftests/bpf: Add test for sign extension in coerce_subreg_to_size_sx()
  selftests/bpf: Add test for truncation after sign extension in coerce_reg_to_size_sx()
  bpf: Fix truncation bug in coerce_reg_to_size_sx()
  selftests/bpf: Assert link info uprobe_multi count & path_size if unset
  bpf: Fix unpopulated path_size when uprobe_multi fields unset
  selftests/bpf: Fix cross-compiling urandom_read
  selftests/bpf: Add test for kfunc module order
  ...
2024-10-18 16:27:14 -07:00
Linus Torvalds
4d939780b7 Merge tag 'mm-hotfixes-stable-2024-10-17-16-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
 "28 hotfixes. 13 are cc:stable. 23 are MM.

  It is the usual shower of unrelated singletons - please see the
  individual changelogs for details"

* tag 'mm-hotfixes-stable-2024-10-17-16-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (28 commits)
  maple_tree: add regression test for spanning store bug
  maple_tree: correct tree corruption on spanning store
  mm/mglru: only clear kswapd_failures if reclaimable
  mm/swapfile: skip HugeTLB pages for unuse_vma
  selftests: mm: fix the incorrect usage() info of khugepaged
  MAINTAINERS: add Jann as memory mapping/VMA reviewer
  mm: swap: prevent possible data-race in __try_to_reclaim_swap
  mm: khugepaged: fix the incorrect statistics when collapsing large file folios
  MAINTAINERS: kasan, kcov: add bugzilla links
  mm: don't install PMD mappings when THPs are disabled by the hw/process/vma
  mm: huge_memory: add vma_thp_disabled() and thp_disabled_by_hw()
  Docs/damon/maintainer-profile: update deprecated awslabs GitHub URLs
  Docs/damon/maintainer-profile: add missing '_' suffixes for external web links
  maple_tree: check for MA_STATE_BULK on setting wr_rebalance
  mm: khugepaged: fix the arguments order in khugepaged_collapse_file trace point
  mm/damon/tests/sysfs-kunit.h: fix memory leak in damon_sysfs_test_add_targets()
  mm: remove unused stub for can_swapin_thp()
  mailmap: add an entry for Andy Chiu
  MAINTAINERS: add memory mapping/VMA co-maintainers
  fs/proc: fix build with GCC 15 due to -Werror=unterminated-string-initialization
  ...
2024-10-17 16:33:06 -07:00
Andrii Nakryiko
5ac9b4e935 lib/buildid: Handle memfd_secret() files in build_id_parse()
From memfd_secret(2) manpage:

  The memory areas backing the file created with memfd_secret(2) are
  visible only to the processes that have access to the file descriptor.
  The memory region is removed from the kernel page tables and only the
  page tables of the processes holding the file descriptor map the
  corresponding physical memory. (Thus, the pages in the region can't be
  accessed by the kernel itself, so that, for example, pointers to the
  region can't be passed to system calls.)

We need to handle this special case gracefully in build ID fetching
code. Return -EFAULT whenever secretmem file is passed to build_id_parse()
family of APIs. Original report and repro can be found in [0].

  [0] https://lore.kernel.org/bpf/ZwyG8Uro%2FSyTXAni@ly-workstation/

Fixes: de3ec364c3 ("lib/buildid: add single folio-based file reader abstraction")
Reported-by: Yi Lai <yi1.lai@intel.com>
Suggested-by: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Link: https://lore.kernel.org/bpf/20241017175431.6183-A-hca@linux.ibm.com
Link: https://lore.kernel.org/bpf/20241017174713.2157873-1-andrii@kernel.org
2024-10-17 21:30:32 +02:00
Linus Torvalds
6efbea77b3 Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:

 - Disable software tag-based KASAN when compiling with GCC, as
   functions are incorrectly instrumented leading to a crash early
   during boot

 - Fix pkey configuration for kernel threads when POE is enabled

 - Fix invalid memory accesses in uprobes when targeting load-literal
   instructions

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  kasan: Disable Software Tag-Based KASAN with GCC
  Documentation/protection-keys: add AArch64 to documentation
  arm64: set POR_EL0 for kernel threads
  arm64: probes: Fix uprobes for big-endian kernels
  arm64: probes: Fix simulate_ldr*_literal()
  arm64: probes: Remove broken LDR (literal) uprobe support
2024-10-17 09:51:03 -07:00
Lorenzo Stoakes
bea07fd631 maple_tree: correct tree corruption on spanning store
Patch series "maple_tree: correct tree corruption on spanning store", v3.

There has been a nasty yet subtle maple tree corruption bug that appears
to have been in existence since the inception of the algorithm.

This bug seems far more likely to happen since commit f8d112a4e6
("mm/mmap: avoid zeroing vma tree in mmap_region()"), which is the point
at which reports started to be submitted concerning this bug.

We were made definitely aware of the bug thanks to the kind efforts of
Bert Karwatzki who helped enormously in my being able to track this down
and identify the cause of it.

The bug arises when an attempt is made to perform a spanning store across
two leaf nodes, where the right leaf node is the rightmost child of the
shared parent, AND the store completely consumes the rightmost node.

This results in mas_wr_spanning_store() mistakenly duplicating the new and
existing entries at the maximum pivot within the range, and thus maple
tree corruption.

The fix patch corrects this by detecting this scenario and disallowing the
mistaken duplicate copy.

The fix patch commit message goes into great detail as to how this occurs.

This series also includes a test which reliably reproduces the issue, and
asserts that the fix works correctly.

Bert has kindly tested the fix and confirmed it resolved his issues.  Also
Mikhail Gavrilov kindly reported what appears to be precisely the same
bug, which this fix should also resolve.


This patch (of 2):

There has been a subtle bug present in the maple tree implementation from
its inception.

This arises from how stores are performed - when a store occurs, it will
overwrite overlapping ranges and adjust the tree as necessary to
accommodate this.

A range may always ultimately span two leaf nodes.  In this instance we
walk the two leaf nodes, determine which elements are not overwritten to
the left and to the right of the start and end of the ranges respectively
and then rebalance the tree to contain these entries and the newly
inserted one.

This kind of store is dubbed a 'spanning store' and is implemented by
mas_wr_spanning_store().

In order to reach this stage, mas_store_gfp() invokes
mas_wr_preallocate(), mas_wr_store_type() and mas_wr_walk() in turn to
walk the tree and update the object (mas) to traverse to the location
where the write should be performed, determining its store type.

When a spanning store is required, mas_wr_walk() returns false stopping at
the parent node which contains the target range, and mas_wr_store_type()
marks the mas->store_type as wr_spanning_store to denote this fact.

When we go to perform the store in mas_wr_spanning_store(), we first
determine the elements AFTER the END of the range we wish to store (that
is, to the right of the entry to be inserted) - we do this by walking to
the NEXT pivot in the tree (i.e.  r_mas.last + 1), starting at the node we
have just determined contains the range over which we intend to write.

We then turn our attention to the entries to the left of the entry we are
inserting, whose state is represented by l_mas, and copy these into a 'big
node', which is a special node which contains enough slots to contain two
leaf nodes' worth of data.

We then copy the entry we wish to store immediately after this - the copy
and the insertion of the new entry is performed by mas_store_b_node().

After this we copy the elements to the right of the end of the range which
we are inserting, if we have not exceeded the length of the node (i.e. 
r_mas.offset <= r_mas.end).

Herein lies the bug - under very specific circumstances, this logic can
break and corrupt the maple tree.

Consider the following tree:

Height
  0                             Root Node
                                 /      \
                 pivot = 0xffff /        \ pivot = ULONG_MAX
                               /          \
  1                       A [-----]       ...
                             /   \
             pivot = 0x4fff /     \ pivot = 0xffff
                           /       \
  2 (LEAVES)          B [-----]  [-----] C
                                      ^--- Last pivot 0xffff.

Now imagine we wish to store an entry in the range [0x4000, 0xffff] (note
that all ranges expressed in maple tree code are inclusive):

1. mas_store_gfp() descends the tree, finds node A at <=0xffff, then
   determines that this is a spanning store across nodes B and C. The mas
   state is set such that the current node from which we traverse further
   is node A.

2. In mas_wr_spanning_store() we try to find elements to the right of pivot
   0xffff by searching for an index of 0x10000:

    - mas_wr_walk_index() invokes mas_wr_walk_descend() and
      mas_wr_node_walk() in turn.

        - mas_wr_node_walk() loops over entries in node A until EITHER it
          finds an entry whose pivot equals or exceeds 0x10000 OR it
          reaches the final entry.

        - Since no entry has a pivot equal to or exceeding 0x10000, pivot
          0xffff is selected, leading to node C.

    - mas_wr_walk_traverse() resets the mas state to traverse node C. We
      loop around and invoke mas_wr_walk_descend() and mas_wr_node_walk()
      in turn once again.

         - Again, we reach the last entry in node C, which has a pivot of
           0xffff.

3. We then copy the elements to the left of 0x4000 in node B to the big
   node via mas_store_b_node(), and insert the new [0x4000, 0xffff] entry
   too.

4. We determine whether we have any entries to copy from the right of the
   end of the range by checking r_mas.offset <= r_mas.end. With r_mas set
   up at the entry at pivot 0xffff, this condition holds, and so we
   DUPLICATE the entry at pivot 0xffff.

5. BUG! The maple tree is corrupted with a duplicate entry.

This requires a very specific set of circumstances - we must be spanning
the last element in a leaf node, which is the last element in the parent
node, i.e. performing a spanning store across two leaf nodes with a range
that ends at that shared pivot.

A potential solution to this problem would simply be to reset the walk
each time we traverse r_mas, however given the rarity of this situation it
seems that would be rather inefficient.

Instead, this patch detects if the right hand node is populated, i.e.  has
anything we need to copy.

We do so by only copying elements from the right of the entry being
inserted when the maximum value present exceeds the last, rather than
basing this on offset position.
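
As a rough sketch of the two conditions, assuming a heavily simplified
stand-in for the right-hand walk state (struct demo_rmas and both helper
names are hypothetical; the real struct ma_state carries more fields):

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the r_mas state described above. */
struct demo_rmas {
	unsigned long last;   /* inclusive end of the range being stored */
	unsigned long max;    /* maximum value covered by the right node */
	unsigned char offset; /* slot offset reached by the walk */
	unsigned char end;    /* offset of the last occupied slot */
};

/* Old, position-based test: copy if the walk is still within the node. */
static bool copy_right_old(const struct demo_rmas *r)
{
	return r->offset <= r->end;
}

/* New, value-based test: copy only if data exists beyond the store. */
static bool copy_right_new(const struct demo_rmas *r)
{
	return r->max > r->last;
}
```

For the example tree, a store ending at 0xffff into a right node whose
maximum is also 0xffff has offset == end, so the old test asks for a
copy (the duplicate), while the new test does not.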

The patch also updates some comments and eliminates the unused bool return
value in mas_wr_walk_index().

The work performed in commit f8d112a4e6 ("mm/mmap: avoid zeroing vma
tree in mmap_region()") seems to have made the probability of this event
much more likely, which is the point at which reports started to be
submitted concerning this bug.

The motivation for this change arose from Bert Karwatzki's report of
encountering mm instability after the release of kernel v6.12-rc1 which,
after the use of CONFIG_DEBUG_VM_MAPLE_TREE and similar configuration
options, was identified as maple tree corruption.

After Bert very generously provided his time and ability to reproduce this
event consistently, I was able to finally identify that the issue
discussed in this commit message was occurring for him.

Link: https://lkml.kernel.org/r/cover.1728314402.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/48b349a2a0f7c76e18772712d0997a5e12ab0a3b.1728314403.git.lorenzo.stoakes@oracle.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: Bert Karwatzki <spasswolf@web.de>
Closes: https://lore.kernel.org/all/20241001023402.3374-1-spasswolf@web.de/
Tested-by: Bert Karwatzki <spasswolf@web.de>
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Closes: https://lore.kernel.org/all/CABXGCsOPwuoNOqSMmAvWO2Fz4TEmPnjFj-b7iF+XFRu1h7-+Dg@mail.gmail.com/
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-17 08:35:10 -07:00
Sidhartha Kumar
a6e0ceb7bf maple_tree: check for MA_STATE_BULK on setting wr_rebalance
It is possible for a bulk operation (MA_STATE_BULK is set) to enter the
new_end < mt_min_slots[type] case and set wr_rebalance as a store type. 
This is incorrect as bulk stores do not rebalance per write, but rather
after all of the writes are done through the mas_bulk_rebalance()
path.  Therefore, add a check to make sure MA_STATE_BULK is not set before
we return wr_rebalance as the store type.
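
A simplified sketch of the added check, assuming a made-up flag value
and a two-value enum (DEMO_MA_STATE_BULK, enum demo_store_type and the
helper are illustrative only; the real mas_wr_store_type() considers
more cases and store types than shown here):

```c
#include <stdbool.h>

/* Hypothetical flag value standing in for MA_STATE_BULK. */
#define DEMO_MA_STATE_BULK 0x1u

/* Only two store types are modelled; the second is a placeholder. */
enum demo_store_type {
	DEMO_WR_REBALANCE,
	DEMO_WR_OTHER,
};

/*
 * Pick rebalance only when the node would become deficient AND the
 * state is not in bulk mode, since bulk stores rebalance once at the end.
 */
static enum demo_store_type demo_pick_store_type(unsigned int mas_flags,
						 unsigned int new_end,
						 unsigned int min_slots)
{
	if (new_end < min_slots && !(mas_flags & DEMO_MA_STATE_BULK))
		return DEMO_WR_REBALANCE;

	return DEMO_WR_OTHER;
}
```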

Also add a test to make sure wr_rebalance is never the store type when
doing bulk operations via mas_expected_entries().

This is a hotfix for this rc; however, it has no userspace effects as there
are no users of the bulk insertion mode.

Link: https://lkml.kernel.org/r/20241011214451.7286-1-sidhartha.kumar@oracle.com
Fixes: 5d659bbb52 ("maple_tree: introduce mas_wr_store_type()")
Suggested-by: Liam Howlett <liam.howlett@oracle.com>
Signed-off-by: Sidhartha <sidhartha.kumar@oracle.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Liam Howlett <liam.howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-17 00:28:09 -07:00
Florian Westphal
dc783ba4b9 lib: alloc_tag_module_unload must wait for pending kfree_rcu calls
Ben Greear reports the following splat:
 ------------[ cut here ]------------
 net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload
 WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0
 Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat
...
 Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
 RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0
  codetag_unload_module+0x19b/0x2a0
  ? codetag_load_module+0x80/0x80

nf_nat module exit calls kfree_rcu on those addresses, but the free
operation is likely still pending by the time alloc_tag checks for leaks.

Waiting for outstanding kfree_rcu operations to complete before checking
resolves this warning.

Reproducer:
unshare -n iptables-nft -t nat -A PREROUTING -p tcp
grep nf_nat /proc/allocinfo # will list 4 allocations
rmmod nft_chain_nat
rmmod nf_nat                # will WARN.

[akpm@linux-foundation.org: add comment]
Link: https://lkml.kernel.org/r/20241007205236.11847-1-fw@strlen.de
Fixes: a473573964 ("lib: code tagging module support")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reported-by: Ben Greear <greearb@candelatech.com>
Closes: https://lore.kernel.org/netdev/bdaaef9d-4364-4171-b82b-bcfc12e207eb@candelatech.com/
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-17 00:28:07 -07:00
Qianqiang Liu
cd843399d7 crypto: lib/mpi - Fix an "Uninitialized scalar variable" issue
The "err" variable may be returned without an initialized value.

Fixes: 8e3a67f2de ("crypto: lib/mpi - Add error checks to extension")
Signed-off-by: Qianqiang Liu <qianqiang.liu@163.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-10-16 13:38:16 +08:00
Will Deacon
7aed6a2c51 kasan: Disable Software Tag-Based KASAN with GCC
Syzbot reports a KASAN failure early during boot on arm64 when building
with GCC 12.2.0 and using the Software Tag-Based KASAN mode:

  | BUG: KASAN: invalid-access in smp_build_mpidr_hash arch/arm64/kernel/setup.c:133 [inline]
  | BUG: KASAN: invalid-access in setup_arch+0x984/0xd60 arch/arm64/kernel/setup.c:356
  | Write of size 4 at addr 03ff800086867e00 by task swapper/0
  | Pointer tag: [03], memory tag: [fe]

Initial triage indicates that the report is a false positive and a
thorough investigation of the crash by Mark Rutland revealed the root
cause to be a bug in GCC:

  > When GCC is passed `-fsanitize=hwaddress` or
  > `-fsanitize=kernel-hwaddress` it ignores
  > `__attribute__((no_sanitize_address))`, and instruments functions
  > we require are not instrumented.
  >
  > [...]
  >
  > All versions [of GCC] I tried were broken, from 11.3.0 to 14.2.0
  > inclusive.
  >
  > I think we have to disable KASAN_SW_TAGS with GCC until this is
  > fixed

Disable Software Tag-Based KASAN when building with GCC by making
CC_HAS_KASAN_SW_TAGS depend on !CC_IS_GCC.

Cc: Andrey Konovalov <andreyknvl@gmail.com>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: syzbot+908886656a02769af987@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/000000000000f362e80620e27859@google.com
Link: https://lore.kernel.org/r/ZvFGwKfoC4yVjN_X@J2N7QTR9R3
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218854
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20241014161100.18034-1-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2024-10-15 11:38:10 +01:00
Timo Grautstueck
ab8851431b lib/Kconfig.debug: fix grammar in RUST_BUILD_ASSERT_ALLOW
Just a grammar fix in lib/Kconfig.debug, under the config option
RUST_BUILD_ASSERT_ALLOW.

Reported-by: Miguel Ojeda <ojeda@kernel.org>
Closes: https://github.com/Rust-for-Linux/linux/issues/1006
Fixes: ecaa6ddff2 ("rust: add `build_error` crate")
Signed-off-by: Timo Grautstueck <timo.grautstueck@web.de>
Link: https://lore.kernel.org/r/20241006140244.5509-1-timo.grautstueck@web.de
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-10-07 19:13:03 +02:00
Linus Torvalds
f6785e0ccf Merge tag 'slab-for-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab fixes from Vlastimil Babka:
 "Fixes for issues introduced in this merge window: kobject memory leak,
  unsuppressed warning and possible lockup in new slub_kunit tests,
  misleading code in kvfree_rcu_queue_batch()"

* tag 'slab-for-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  slub/kunit: skip test_kfree_rcu when the slub kunit test is built-in
  mm, slab: suppress warnings in test_leak_destroy kunit test
  rcu/kvfree: Refactor kvfree_rcu_queue_batch()
  mm, slab: fix use of SLAB_SUPPORTS_SYSFS in kmem_cache_release()
2024-10-04 12:05:39 -07:00
Linus Torvalds
20c2474fa5 Merge tag 'vfs-6.12-rc2.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
 "vfs:

   - Ensure that iter_folioq_get_pages() advances to the next slot
     otherwise it will end up using the same folio with an out-of-bound
     offset.

  iomap:

   - Don't unshare delalloc extents which can't be reflinked, and thus
     can't be shared.

   - Constrain the file range passed to iomap_file_unshare() directly in
     iomap instead of requiring the callers to do it.

  netfs:

   - Use folioq_count instead of folioq_nr_slot to prevent an
     uninitialized value warning in netfs_clear_buffer().

   - Fix missing wakeup after issuing writes by scheduling the write
     collector only if all the subrequest queues are empty and thus no
     writes are pending.

   - Fix two minor documentation bugs"

* tag 'vfs-6.12-rc2.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  iomap: constrain the file range passed to iomap_file_unshare
  iomap: don't bother unsharing delalloc extents
  netfs: Fix missing wakeup after issuing writes
  Documentation: add missing folio_queue entry
  folio_queue: fix documentation
  netfs: Fix a KMSAN uninit-value error in netfs_clear_buffer
  iov_iter: fix advancing slot in iter_folioq_get_pages()
2024-10-03 09:22:50 -07:00
Al Viro
5f60d5f6bb move asm/unaligned.h to linux/unaligned.h
asm/unaligned.h is always an include of asm-generic/unaligned.h;
might as well move that thing to linux/unaligned.h and include
that - there's nothing arch-specific in that header.

auto-generated by the following:

for i in `git grep -l -w asm/unaligned.h`; do
	sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
done
for i in `git grep -l -w asm-generic/unaligned.h`; do
	sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
done
git mv include/asm-generic/unaligned.h include/linux/unaligned.h
git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
2024-10-02 17:23:23 -04:00
Vlastimil Babka
cac39b0706 slub/kunit: skip test_kfree_rcu when the slub kunit test is built-in
Guenter Roeck reports that the new slub kunit tests added by commit
4e1c44b3db ("kunit, slub: add test_kfree_rcu() and
test_leak_destroy()") cause a lockup on boot on several architectures
when the kunit tests are configured to be built-in and not modules.

The test_kfree_rcu test invokes kfree_rcu() and boot sequence inspection
showed the runner for built-in kunit tests kunit_run_all_tests() is
called before setting system_state to SYSTEM_RUNNING and calling
rcu_end_inkernel_boot(), so this seems like a likely cause. So while I
was unable to reproduce the problem myself, skipping the test when the
slub_kunit module is built-in should avoid the issue.

An alternative fix, moving the call to kunit_run_all_tests() a bit later
in the boot, was tried, but it broke tests with functions marked as
__init because free_initmem() had already been done.

Fixes: 4e1c44b3db ("kunit, slub: add test_kfree_rcu() and test_leak_destroy()")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/all/6fcb1252-7990-4f0d-8027-5e83f0fb9409@roeck-us.net/
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: rcu@vger.kernel.org
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Gow <davidgow@google.com>
Cc: Rae Moar <rmoar@google.com>
Cc: linux-kselftest@vger.kernel.org
Cc: kunit-dev@googlegroups.com
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2024-10-02 16:28:46 +02:00
Vlastimil Babka
3f1dd33f99 mm, slab: suppress warnings in test_leak_destroy kunit test
The test_leak_destroy kunit test intends to test the detection of stray
objects in kmem_cache_destroy(), which normally produces a warning. The
other slab kunit tests suppress the warnings in the kunit test context,
so suppress warnings and related printk output in this test as well.
Automated test running environments then don't need to learn to filter
the warnings.

Also rename the test's kmem_cache; the name was wrongly copy-pasted from
test_kfree_rcu.

Fixes: 4e1c44b3db ("kunit, slub: add test_kfree_rcu() and test_leak_destroy()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202408251723.42f3d902-oliver.sang@intel.com
Reported-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Closes: https://lore.kernel.org/all/CAB=+i9RHHbfSkmUuLshXGY_ifEZg9vCZi3fqr99+kmmnpDus7Q@mail.gmail.com/
Reported-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/all/6fcb1252-7990-4f0d-8027-5e83f0fb9409@roeck-us.net/
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2024-10-02 16:28:46 +02:00
Omar Sandoval
0d24852bd7 iov_iter: fix advancing slot in iter_folioq_get_pages()
iter_folioq_get_pages() decides to advance to the next folioq slot when
it has reached the end of the current folio. However, it is checking
offset, which is the beginning of the current part, instead of
iov_offset, which is adjusted to the end of the current part, so it
doesn't advance the slot when it's supposed to. As a result, on the next
iteration, we'll use the same folio with an out-of-bounds offset and
return an unrelated page.
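
The difference between the two checks can be shown with a small
userspace model. It is a sketch only: `walk_pages()`, `struct page_ref`,
`PAGE_SZ` and the `use_end_offset` switch are hypothetical names, and
folios are reduced to an array of byte sizes. With `use_end_offset` set
to 0 the model tests the start of the current part (the buggy check);
with 1 it tests the end of the part (the fixed check).

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SZ 4096u	/* assumed page size for this model */

/* One extracted page: which folio slot it came from, and its index. */
struct page_ref { size_t slot; size_t page; };

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/*
 * Walk folios of the given sizes from the start of slot 0 and record
 * up to maxpages page references covering at most maxsize bytes.
 * Returns the number of references written to out[].
 */
static size_t walk_pages(const size_t *folio_size, size_t nslots,
			 size_t maxsize, size_t maxpages,
			 int use_end_offset, struct page_ref *out)
{
	size_t slot = 0, iov_offset = 0, extracted = 0, n = 0;

	assert(folio_size != NULL && out != NULL);

	while (n < maxpages && extracted < maxsize && slot < nslots) {
		size_t fsize = folio_size[slot];
		size_t offset = iov_offset;	/* start of this part */
		size_t part = PAGE_SZ - offset % PAGE_SZ;

		part = min_sz(part, min_sz(maxsize - extracted, fsize - offset));
		iov_offset += part;		/* now the end of this part */
		extracted += part;

		out[n].slot = slot;
		out[n].page = offset / PAGE_SZ;
		n++;

		/* Advance once the folio is used up. */
		if ((use_end_offset ? iov_offset : offset) >= fsize) {
			iov_offset = 0;
			slot++;
		}
	}
	return n;
}
```

For two one-page folios and an 8192-byte request, the fixed check yields
(slot 0, page 0) then (slot 1, page 0). The buggy check stays on slot 0
for a second iteration and records page index 1, which lies past the end
of a one-page folio.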

This manifested as various crashes and other failures in 9pfs in drgn's
VM testing setup and BPF CI.

Fixes: db0aa2e956 ("mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios")
Link: https://lore.kernel.org/linux-fsdevel/20240923183432.1876750-1-chantr4@gmail.com/
Tested-by: Manu Bretelle <chantr4@gmail.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Link: https://lore.kernel.org/r/cbaf141ba6c0e2e209717d02746584072844841a.1727722269.git.osandov@fb.com
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Tested-by: Leon Romanovsky <leon@kernel.org>
Tested-by: Joey Gouly <joey.gouly@arm.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-01 11:49:57 +02:00
Linus Torvalds
9c44575c78 Merge tag 'bitmap-for-6.12' of https://github.com/norov/linux
Pull bitmap updates from Yury Norov:

 - switch all bitmap APIs from inline to __always_inline (Brian Norris)

   The __always_inline series improves on code generation, and now with
   the latest compiler versions is required to avoid compilation
   warnings. It spent enough time in my backlog, and I'm thankful to Brian
   Norris for taking over and moving it forward.

 - introduce GENMASK_U128() macro (Anshuman Khandual)

   GENMASK_U128() is a prerequisite needed for arm64 development
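
As a rough illustration of what a 128-bit contiguous bit mask looks
like, here is a userspace sketch. It is not the kernel macro: the
function `genmask_u128()`, the `u128` typedef and the `hi64()`/`lo64()`
helpers are hypothetical, and the code assumes a compiler that provides
`unsigned __int128` (GCC and Clang do on 64-bit targets).

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;	/* compiler extension, assumed available */

/* Build a mask with bits l..h (inclusive) set, for 0 <= l <= h <= 127. */
static u128 genmask_u128(unsigned int h, unsigned int l)
{
	assert(l <= h && h < 128);
	/*
	 * (2^h * 2) - 2^l. The shift is split in two so that h == 127
	 * never shifts by the full width; the unsigned result wraps to 0
	 * and the subtraction then sets bits l..127.
	 */
	return (((u128)1 << h) << 1) - ((u128)1 << l);
}

/* Helpers to inspect the two 64-bit halves of a 128-bit value. */
static uint64_t hi64(u128 v) { return (uint64_t)(v >> 64); }
static uint64_t lo64(u128 v) { return (uint64_t)v; }
```

For example, `genmask_u128(64, 63)` sets one bit on each side of the
64-bit boundary, so `hi64()` of the result is 1 and `lo64()` has only
bit 63 set.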

* tag 'bitmap-for-6.12' of https://github.com/norov/linux:
  lib/test_bits.c: Add tests for GENMASK_U128()
  uapi: Define GENMASK_U128
  nodemask: Switch from inline to __always_inline
  cpumask: Switch from inline to __always_inline
  bitmap: Switch from inline to __always_inline
  find: Switch from inline to __always_inline
2024-09-27 12:10:45 -07:00
Linus Torvalds
eee280841e Merge tag 'mm-hotfixes-stable-2024-09-27-09-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
 "19 hotfixes.  13 are cc:stable.

  There's a focus on fixes for the memfd_pin_folios() work which was
  added into 6.11. Apart from that, the usual shower of singleton fixes"

* tag 'mm-hotfixes-stable-2024-09-27-09-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  ocfs2: fix uninit-value in ocfs2_get_block()
  zram: don't free statically defined names
  memory tiers: use default_dram_perf_ref_source in log message
  Revert "list: test: fix tests for list_cut_position()"
  kselftests: mm: fix wrong __NR_userfaultfd value
  compiler.h: specify correct attribute for .rodata..c_jump_table
  mm/damon/Kconfig: update DAMON doc URL
  mm: kfence: fix elapsed time for allocated/freed track
  ocfs2: fix deadlock in ocfs2_get_system_file_inode
  ocfs2: reserve space for inline xattr before attaching reflink tree
  mm: migrate: annotate data-race in migrate_folio_unmap()
  mm/hugetlb: simplify refs in memfd_alloc_folio
  mm/gup: fix memfd_pin_folios alloc race panic
  mm/gup: fix memfd_pin_folios hugetlb page allocation
  mm/hugetlb: fix memfd_pin_folios resv_huge_pages leak
  mm/hugetlb: fix memfd_pin_folios free_huge_pages leak
  mm/filemap: fix filemap_get_folios_contig THP panic
  mm: make SPLIT_PTE_PTLOCKS depend on SMP
  tools: fix shared radix-tree build
2024-09-27 10:27:22 -07:00
Guenter Roeck
c509f67df3 Revert "list: test: fix tests for list_cut_position()"
This reverts commit e620799c41.

The commit introduces unit test failures.

     Expected cur == &entries[i], but
         cur == 0000037fffadfd80
         &entries[i] == 0000037fffadfd60
     # list_test_list_cut_position: pass:0 fail:1 skip:0 total:1
     not ok 21 list_test_list_cut_position
     # list_test_list_cut_before: EXPECTATION FAILED at lib/list-test.c:444
     Expected cur == &entries[i], but
         cur == 0000037fffa9fd70
         &entries[i] == 0000037fffa9fd60
     # list_test_list_cut_before: EXPECTATION FAILED at lib/list-test.c:444
     Expected cur == &entries[i], but
         cur == 0000037fffa9fd80
         &entries[i] == 0000037fffa9fd70

Revert it.

Link: https://lkml.kernel.org/r/20240922150507.553814-1-linux@roeck-us.net
Fixes: e620799c41 ("list: test: fix tests for list_cut_position()")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Cc: I Hsin Cheng <richard120310@gmail.com>
Cc: David Gow <davidgow@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-26 14:01:44 -07:00
Linus Torvalds
11a299a793 Merge tag 'for-6.12/block-20240925' of git://git.kernel.dk/linux
Pull more block updates from Jens Axboe:

 - Improve blk-integrity segment counting and merging (Keith)

 - NVMe pull request via Keith:
      - Multipath fixes (Hannes)
      - Sysfs attribute list NULL terminate fix (Shin'ichiro)
      - Remove problematic read-back (Keith)

 - Fix for a regression with the IO scheduler switching freezing from
   6.11 (Damien)

 - Use a raw spinlock for sbitmap, as it may get called from preempt
   disabled context (Ming)

 - Cleanup for bd_claiming waiting, using var_waitqueue() rather than
   the bit waitqueues, as that more accurately describes what it does
   (Neil)

 - Various cleanups (Kanchan, Qiu-ji, David)

* tag 'for-6.12/block-20240925' of git://git.kernel.dk/linux:
  nvme: remove CC register read-back during enabling
  nvme: null terminate nvme_tls_attrs
  nvme-multipath: avoid hang on inaccessible namespaces
  nvme-multipath: system fails to create generic nvme device
  lib/sbitmap: define swap_lock as raw_spinlock_t
  block: Remove unused blk_limits_io_{min,opt}
  drbd: Fix atomicity violation in drbd_uuid_set_bm()
  block: Fix elv_iosched_local_module handling of "none" scheduler
  block: remove bogus union
  block: change wait on bd_claiming to use a var_waitqueue
  blk-integrity: improved sg segment mapping
  block: unexport blk_rq_count_integrity_sg
  nvme-rdma: use request to get integrity segments
  scsi: use request to get integrity segments
  block: provide a request helper for user integrity segments
  blk-integrity: consider entire bio list for merging
  blk-integrity: properly account for segments
  blk-mq: set the nr_integrity_segments from bio
  blk-mq: unconditional nr_integrity_segments
2024-09-25 14:56:40 -07:00
Linus Torvalds
68e5c7d4ce Merge tag 'kbuild-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:

 - Support cross-compiling linux-headers Debian package and kernel-devel
   RPM package

 - Add support for the linux-debug Pacman package

 - Improve module rebuilding speed by factoring out the common code to
   scripts/module-common.c

 - Separate device tree build rules into scripts/Makefile.dtbs

 - Add a new script to generate modules.builtin.ranges, which is useful
   for tracing tools to find symbols in built-in modules

 - Refactor Kconfig and misc tools

 - Update Kbuild and Kconfig documentation

* tag 'kbuild-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (51 commits)
  kbuild: doc: replace "gcc" in external module description
  kbuild: doc: describe the -C option precisely for external module builds
  kbuild: doc: remove the description about shipped files
  kbuild: doc: drop section numbering, use references in modules.rst
  kbuild: doc: throw out the local table of contents in modules.rst
  kbuild: doc: remove outdated description of the limitation on -I usage
  kbuild: doc: remove description about grepping CONFIG options
  kbuild: doc: update the description about Kbuild/Makefile split
  kbuild: remove unnecessary export of RUST_LIB_SRC
  kbuild: remove append operation on cmd_ld_ko_o
  kconfig: cache expression values
  kconfig: use hash table to reuse expressions
  kconfig: refactor expr_eliminate_dups()
  kconfig: add comments to expression transformations
  kconfig: change some expr_*() functions to bool
  scripts: move hash function from scripts/kconfig/ to scripts/include/
  kallsyms: change overflow variable to bool type
  kallsyms: squash output_address()
  kbuild: add install target for modules.builtin.ranges
  scripts: add verifier script for builtin module range data
  ...
2024-09-24 13:02:06 -07:00
Linus Torvalds
9ab27b0186 Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk updates from Stephen Boyd:
 "The core clk framework is left largely untouched this time around
  except for support for the newly ratified DT property
  'assigned-clock-rates-u64'.

  I'm much more excited about the support for loading DT overlays from
  KUnit tests so that we can test how the clk framework parses DT nodes
  during clk registration. The clk framework has some places that are
  highly DeviceTree dependent so this charts the path to extend the
  KUnit tests to cover even more framework code in the future. I've got
  some more tests on the list that use the DT overlay support, but they
  uncovered issues with clk unregistration that I'm still working on
  fixing.

  Outside the core, the clk driver update pile is dominated by Qualcomm
  and Renesas SoCs, making it fairly usual. Looking closer, there are
  fixes for things all over the place, like adding missing clk
  frequencies or moving defines for the number of clks out of DT binding
  headers into the drivers. There are even conversions of DT bindings to
  YAML and migration away from strings to describe clk topology. Overall
  it doesn't look unusual so I expect the new drivers to be where we'll
  have fixes in the coming weeks.

  Core:
   - KUnit tests for clk registration and fixed rate basic clk type
   - A couple more devm helpers, one consumer and one provider
   - Support for assigned-clock-rates-u64

  New Drivers:
   - Camera, display and GPU clocks on Qualcomm SM4450
   - Camera clocks on Qualcomm SM8150
   - Rockchip rk3576 clks
   - Microchip SAM9X7 clks
   - Renesas RZ/V2H(P) (R9A09G057) clks

  Updates:
   - Mark a bunch of struct freq_tbl const to reduce .data usage
   - Add Qualcomm MSM8226 A7PLL and Regera PLL support
   - Fix the Qualcomm Lucid 5LPE PLL configuration sequence to not reuse
     Trion, as they do differ
   - A number of fixes to the Qualcomm SM8550 display clock driver
   - Fold Qualcomm SM8650 display clock driver into SM8550 one
   - Add missing clocks and GDSCs needed for audio on Qualcomm MSM8998
   - Add missing USB MP resets, GPLL9, and QUPv3 DFS to Qualcomm SC8180X
   - Fix sdcc clk frequency tables on Qualcomm SC8180X
   - Drop the Qualcomm SM8150 gcc_cpuss_ahb_clk_src
   - Mark Qualcomm PCIe GDSCs as RET_ON on sm8250 and sm8540 to avoid
     them turning off during suspend
   - Use the HW_CTRL mechanism on Qualcomm SM8550 video clock controller
     GDSCs
   - Get rid of CLK_NR_CLKS defines in Rockchip DT binding headers
   - Some fixes for Rockchip rk3228 and rk3588
   - Exynos850: Add clock for Thermal Management Unit
   - Exynos7885: Fix duplicated ID in the header, add missing TOP PLLs
     and add clocks for USB block in the FSYS clock controller
   - ExynosAutov9: Add DPUM clock controller
   - ExynosAutov920: Add new (first) clock controllers: TOP and PERIC0
     (and a bit more complete bindings)
   - Use clk_hw pointer instead of fw_name for acm_aud_clk[0-1]_sel
     clocks on i.MX8Q as parents in ACM provider
   - Add i.MX95 NETCMIX support to the block control provider
   - Fix parents for ENETx_REF_SEL clocks on i.MX6UL
   - Add USB clocks, resets and power domains on Renesas RZ/G3S
   - Add Generic Timer (GTM), I2C Bus Interface (RIIC), SD/MMC Host
     Interface (SDHI) and Watchdog Timer (WDT) clocks and resets on
     Renesas RZ/V2H
   - Add PCIe, PWM, and CAN-FD clocks on Renesas R-Car V4M
   - Add LCD controller clocks and resets on Renesas RZ/G2UL
   - Add DMA clocks and resets on Renesas RZ/G3S
   - Add fractional multiplication PLL support on Renesas R-Car Gen4
   - Document support for the Renesas RZ/G2M v3.0 (r8a774a3) SoC
   - Support for the Microchip SAM9X7 SoC as follows:
   - Updates for the Microchip PLL drivers
   - DT binding documentation updates (for the new clock driver and for
     the slow clock controller that SAM9X7 is using)
   - A fix for the Microchip SAMA7G5 clock driver to avoid allocating
     more memory than necessary
   - Constify some Amlogic structs
   - Add SM1 eARC clocks for Amlogic
   - Introduce a symbol namespace for Amlogic clock specific symbols
   - Add reset controller support to audiomix block control on i.MX
   - Add CLK_SET_RATE_PARENT flag to all audiomix clocks and to i.MX7D
     lcdif_pixel_src clock
   - Fix parent clocks for earc_phy and audpll on i.MX8MP
   - Fix default parents for enet[12]_ref_sel on i.MX6UL
   - Add ops in composite 8M and 93 that allow no-op on disable
   - Add check for PCC present bit on composite 7ULP register
   - Fix fractional part for fracn-gppll on prepare in i.MX
   - Fix clock tree update for TF-A managed clocks on i.MX8M
   - Drop CLK_SET_PARENT_GATE for DRAM mux on i.MX7D
   - Add the SAI7 IPG clock for i.MX8MN
   - Mark the 'nand_usdhc_bus' clock as non-critical on i.MX8MM
   - Add LVDS bypass clocks on i.MX8QXP
   - Add muxes for MIPI and PHY ref clocks on i.MX
   - Reorder dc0_bypass0_clk, lcd_pxl and dc1_disp clocks on i.MX8QXP
   - Add 1039.5MHz and 800MHz rates to fracn-gppll table on i.MX
   - Add CLK_SET_RATE_PARENT for media_disp pixel clocks on i.MX8QXP
   - Add some module descriptions to the i.MX generic and the i.MXRT1050
     driver
   - Fix return value for bypass for composite i.MX7ULP
   - Move Mediatek clk bindings to clock/
   - Convert some more clk bindings to dt schema"

* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (180 commits)
  clk: Switch back to struct platform_driver::remove()
  dt-bindings: clock, reset: fix top-comment indentation rk3576 headers
  clk: rockchip: remove unused mclk_pdm0_p/pdm0_p definitions
  clk: provide devm_clk_get_optional_enabled_with_rate()
  clk: fixed-rate: add devm_clk_hw_register_fixed_rate_parent_data()
  clk: imx6ul: fix clock parent for IMX6UL_CLK_ENETx_REF_SEL
  clk: renesas: r9a09g057: Add clock and reset entries for GTM/RIIC/SDHI/WDT
  clk: renesas: rzv2h: Add support for dynamic switching divider clocks
  clk: renesas: r9a08g045: Add clocks, resets and power domains for USB
  clk: rockchip: fix error for unknown clocks
  clk: rockchip: rk3588: drop unused code
  clk: rockchip: Add clock controller for the RK3576
  clk: rockchip: Add new pll type pll_rk3588_ddr
  dt-bindings: clock, reset: Add support for rk3576
  dt-bindings: clock: rockchip,rk3588-cru: drop unneeded assigned-clocks
  clk: rockchip: rk3588: Fix 32k clock name for pmu_24m_32k_100m_src_p
  clk: imx95: enable the clock of NETCMIX block control
  dt-bindings: clock: add RMII clock selection
  dt-bindings: clock: add i.MX95 NETCMIX block control
  clk: imx: imx8: Use clk_hw pointer for self registered clock in clk_parent_data
  ...
2024-09-23 15:01:48 -07:00
Linus Torvalds
b3f391fddf Merge tag 'bcachefs-2024-09-21' of git://evilpiepirate.org/bcachefs
Pull bcachefs updates from Kent Overstreet:

 - rcu_pending, btree key cache rework: this solves lock contention in
   the key cache, eliminating the biggest source of the srcu lock hold
   time warnings, and drastically improving performance on some metadata
   heavy workloads - on multithreaded creates we're now 3-4x faster than
   xfs.

 - We're now using an rhashtable instead of the system inode hash table;
   this is another significant performance improvement on multithreaded
   metadata workloads, eliminating more lock contention.

 - for_each_btree_key_in_subvolume_upto(): new helper for iterating over
   keys within a specific subvolume, eliminating a lot of open coded
   "subvolume_get_snapshot()" and also fixing another source of srcu
   lock time warnings, by running each loop iteration in its own
   transaction (as the existing for_each_btree_key() does).

 - More work on btree_trans locking asserts; we now assert that we don't
   hold btree node locks when trans->locked is false, which is important
   because we don't use lockdep for tracking individual btree node
   locks.

 - Some cleanups and improvements in the bset.c btree node lookup code,
   from Alan.

 - Rework of btree node pinning, which we use in backpointers fsck. The
   old hacky implementation, where the shrinker just skipped over nodes
   in the pinned range, was causing OOMs; instead we now use another
   shrinker with a much higher seeks number for pinned nodes.

 - Rebalance now uses BCH_WRITE_ONLY_SPECIFIED_DEVS; this fixes an issue
   where rebalance would sometimes fall back to allocating from the full
   filesystem, which is not what we want when it's trying to move data
   to a specific target.

 - Use __GFP_ACCOUNT, GFP_RECLAIMABLE for btree node, key cache
   allocations.

 - Idmap mounts are now supported (Hongbo Li)

 - Rename whiteouts are now supported (Hongbo Li)

 - Erasure coding can now handle devices being marked as failed, or
   forcibly removed. We still need the evacuate path for erasure coding,
   but it's getting very close to ready for people to start using.

* tag 'bcachefs-2024-09-21' of git://evilpiepirate.org/bcachefs: (99 commits)
  bcachefs: return err ptr instead of null in read sb clean
  bcachefs: Remove duplicated include in backpointers.c
  bcachefs: Don't drop devices with stripe pointers
  bcachefs: bch2_ec_stripe_head_get() now checks for change in rw devices
  bcachefs: bch_fs.rw_devs_change_count
  bcachefs: bch2_dev_remove_stripes()
  bcachefs: bch2_trigger_ptr() calculates sectors even when no device
  bcachefs: improve error messages in bch2_ec_read_extent()
  bcachefs: improve error message on too few devices for ec
  bcachefs: improve bch2_new_stripe_to_text()
  bcachefs: ec_stripe_head.nr_created
  bcachefs: bch_stripe.disk_label
  bcachefs: stripe_to_mem()
  bcachefs: EIO errcode cleanup
  bcachefs: Rework btree node pinning
  bcachefs: split up btree cache counters for live, freeable
  bcachefs: btree cache counters should be size_t
  bcachefs: Don't count "skipped access bit" as touched in btree cache scan
  bcachefs: Failed devices no longer require mounting in degraded mode
  bcachefs: bch2_dev_rcu_noerror()
  ...
2024-09-23 10:05:41 -07:00