Commit Graph

1325981 Commits

Author SHA1 Message Date
Bartosz Golaszewski
bd1be0d58b clocksource/drivers/timer-integrator-ap: Add missing Kconfig dependency on OF
[ Upstream commit 2246464821 ]

This driver accesses the of_aliases global variable declared in
linux/of.h and defined in drivers/base/of.c. It requires OF support or
will cause a link failure. Add the missing Kconfig dependency.

Closes: https://lore.kernel.org/oe-kbuild-all/202601152233.og6LdeUo-lkp@intel.com/
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/20260116111723.10585-1-bartosz.golaszewski@oss.qualcomm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:57 -05:00
Niklas Söderlund
88c7679218 clocksource/drivers/sh_tmu: Always leave device running after probe
[ Upstream commit b1278972b0 ]

The TMU device can be used as both a clocksource and a clockevent
provider. The driver tries to be smart and power itself on and off, as
well as enabling and disabling its clock when it's not in operation.
This behavior is slightly altered if the TMU is used as an early
platform device in which case the device is left powered on after probe,
but the clock is still enabled and disabled at runtime.

This has worked for a long time, but recent improvements in PREEMPT_RT
and PROVE_LOCKING have highlighted an issue. As the TMU registers itself
as a clockevent provider, clockevents_register_device(), it needs to use
raw spinlocks internally as this is the context of which the clockevent
framework interacts with the TMU driver. However in the context of
holding a raw spinlock the TMU driver can't really manage its power
state or clock with calls to pm_runtime_*() and clk_*() as these calls
end up in other platform drivers using regular spinlocks to control
power and clocks.

This mix of spinlock contexts trips a lockdep warning.

    =============================
    [ BUG: Invalid wait context ]
    6.18.0-arm64-renesas-09926-gee959e7c5e34 #1 Not tainted
    -----------------------------
    swapper/0/0 is trying to lock:
    ffff000008c9e180 (&dev->power.lock){-...}-{3:3}, at: __pm_runtime_resume+0x38/0x88
    other info that might help us debug this:
    context-{5:5}
    1 lock held by swapper/0/0:
    ccree e6601000.crypto: ARM CryptoCell 630P Driver: HW version 0xAF400001/0xDCC63000, Driver version 5.0
     #0: ffff8000817ec298
    ccree e6601000.crypto: ARM ccree device initialized
     (tick_broadcast_lock){-...}-{2:2}, at: __tick_broadcast_oneshot_control+0xa4/0x3a8
    stack backtrace:
    CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.18.0-arm64-renesas-09926-gee959e7c5e34 #1 PREEMPT
    Hardware name: Renesas Salvator-X 2nd version board based on r8a77965 (DT)
    Call trace:
     show_stack+0x14/0x1c (C)
     dump_stack_lvl+0x6c/0x90
     dump_stack+0x14/0x1c
     __lock_acquire+0x904/0x1584
     lock_acquire+0x220/0x34c
     _raw_spin_lock_irqsave+0x58/0x80
     __pm_runtime_resume+0x38/0x88
     sh_tmu_clock_event_set_oneshot+0x84/0xd4
     clockevents_switch_state+0xfc/0x13c
     tick_broadcast_set_event+0x30/0xa4
     __tick_broadcast_oneshot_control+0x1e0/0x3a8
     tick_broadcast_oneshot_control+0x30/0x40
     cpuidle_enter_state+0x40c/0x680
     cpuidle_enter+0x30/0x40
     do_idle+0x1f4/0x280
     cpu_startup_entry+0x34/0x40
     kernel_init+0x0/0x130
     do_one_initcall+0x0/0x230
     __primary_switched+0x88/0x90

For non-PREEMPT_RT builds this is not really an issue, but for
PREEMPT_RT builds where normal spinlocks can sleep this might be an
issue. Be cautious and always leave the power and clock running after
probe.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20251202221341.1856773-1-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:57 -05:00
Alexei Starovoitov
dea9989a3f bpf: Recognize special arithmetic shift in the verifier
[ Upstream commit bffacdb80b ]

cilium bpf_wiregard.bpf.c when compiled with -O1 fails to load
with the following verifier log:

192: (79) r2 = *(u64 *)(r10 -304)     ; R2=pkt(r=40) R10=fp0 fp-304=pkt(r=40)
...
227: (85) call bpf_skb_store_bytes#9          ; R0=scalar()
228: (bc) w2 = w0                     ; R0=scalar() R2=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
229: (c4) w2 s>>= 31                  ; R2=scalar(smin=0,smax=umax=0xffffffff,smin32=-1,smax32=0,var_off=(0x0; 0xffffffff))
230: (54) w2 &= -134                  ; R2=scalar(smin=0,smax=umax=umax32=0xffffff7a,smax32=0x7fffff7a,var_off=(0x0; 0xffffff7a))
...
232: (66) if w2 s> 0xffffffff goto pc+125     ; R2=scalar(smin=umin=umin32=0x80000000,smax=umax=umax32=0xffffff7a,smax32=-134,var_off=(0x80000000; 0x7fffff7a))
...
238: (79) r4 = *(u64 *)(r10 -304)     ; R4=scalar() R10=fp0 fp-304=scalar()
239: (56) if w2 != 0xffffff78 goto pc+210     ; R2=0xffffff78 // -136
...
258: (71) r1 = *(u8 *)(r4 +0)
R4 invalid mem access 'scalar'

The error might confuse most bpf authors, since fp-304 slot had 'pkt'
pointer at insn 192 and became 'scalar' at 238. That happened because
bpf_skb_store_bytes() clears all packet pointers including those in
the stack. On the first glance it might look like a bug in the source
code, since ctx->data pointer should have been reloaded after the call
to bpf_skb_store_bytes().

The relevant part of cilium source code looks like this:

// bpf/lib/nodeport.h
int dsr_set_ipip6()
{
	if (ctx_adjust_hroom(...))
		return DROP_INVALID; // -134
	if (ctx_store_bytes(...))
		return DROP_WRITE_ERROR; // -141
	return 0;
}

bool dsr_fail_needs_reply(int code)
{
	if (code == DROP_FRAG_NEEDED) // -136
		return true;
	return false;
}

tail_nodeport_ipv6_dsr()
{
	ret = dsr_set_ipip6(...);
	if (!IS_ERR(ret)) {
		...
	} else {
		if (dsr_fail_needs_reply(ret))
			return dsr_reply_icmp6(...);
	}
}

The code doesn't have arithmetic shift by 31 and it reloads ctx->data
every time it needs to access it. So it's not a bug in the source code.

The reason is DAGCombiner::foldSelectCCToShiftAnd() LLVM transformation:

  // If this is a select where the false operand is zero and the compare is a
  // check of the sign bit, see if we can perform the "gzip trick":
  // select_cc setlt X, 0, A, 0 -> and (sra X, size(X)-1), A
  // select_cc setgt X, 0, A, 0 -> and (not (sra X, size(X)-1)), A

The conditional branch in dsr_set_ipip6() and its return values
are optimized into BPF_ARSH plus BPF_AND:

227: (85) call bpf_skb_store_bytes#9
228: (bc) w2 = w0
229: (c4) w2 s>>= 31   ; R2=scalar(smin=0,smax=umax=0xffffffff,smin32=-1,smax32=0,var_off=(0x0; 0xffffffff))
230: (54) w2 &= -134   ; R2=scalar(smin=0,smax=umax=umax32=0xffffff7a,smax32=0x7fffff7a,var_off=(0x0; 0xffffff7a))

after insn 230 the register w2 can only be 0 or -134,
but the verifier approximates it, since there is no way to
represent two scalars in bpf_reg_state.
After fallthough at insn 232 the w2 can only be -134,
hence the branch at insn
239: (56) if w2 != -136 goto pc+210
should be always taken, and trapping insn 258 should never execute.
LLVM generated correct code, but the verifier follows impossible
path and rejects valid program. To fix this issue recognize this
special LLVM optimization and fork the verifier state.
So after insn 229: (c4) w2 s>>= 31
the verifier has two states to explore:
one with w2 = 0 and another with w2 = 0xffffffff
which makes the verifier accept bpf_wiregard.c

A similar pattern exists were OR operation is used in place of the AND
operation, the verifier detects that pattern as well by forking the
state before the OR operation with a scalar in range [-1,0].

Note there are 20+ such patterns in bpf_wiregard.o compiled
with -O1 and -O2, but they're rarely seen in other production
bpf programs, so push_stack() approach is not a concern.

Reported-by: Hao Sun <sunhao.th@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Co-developed-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260112201424.816836-2-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:57 -05:00
Sami Tolvanen
4e3e57dbf4 bpf: crypto: Use the correct destructor kfunc type
[ Upstream commit b40a5d724f ]

With CONFIG_CFI enabled, the kernel strictly enforces that indirect
function calls use a function pointer type that matches the target
function. I ran into the following type mismatch when running BPF
self-tests:

  CFI failure at bpf_obj_free_fields+0x190/0x238 (target:
    bpf_crypto_ctx_release+0x0/0x94; expected type: 0xa488ebfc)
  Internal error: Oops - CFI: 00000000f2008228 [#1]  SMP
  ...

As bpf_crypto_ctx_release() is also used in BPF programs and using
a void pointer as the argument would make the verifier unhappy, add
a simple stub function with the correct type and register it as the
destructor kfunc instead.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Tested-by: Viktor Malik <vmalik@redhat.com>
Link: https://lore.kernel.org/r/20260110082548.113748-7-samitolvanen@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:57 -05:00
Martin Schiller
caa44eb450 perf/x86/cstate: Add Airmont NP
[ Upstream commit 3006911f28 ]

From the perspective of Intel cstate residency counters, the Airmont NP
(aka Lightning Mountain) is identical to the Airmont.

Signed-off-by: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://patch.msgid.link/20251124074846.9653-4-ms@dev.tdt.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:57 -05:00
Martin Schiller
af1e3d63c8 perf/x86/msr: Add Airmont NP
[ Upstream commit 63dbadcafc ]

Like Airmont, the Airmont NP (aka Intel / MaxLinear Lightning Mountain)
supports SMI_COUNT MSR.

Signed-off-by: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://patch.msgid.link/20251124074846.9653-2-ms@dev.tdt.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:56 -05:00
Cupertino Miranda
17440ef493 bpf: verifier improvement in 32bit shift sign extension pattern
[ Upstream commit d18dec4b89 ]

This patch improves the verifier to correctly compute bounds for
sign extension compiler pattern composed of left shift by 32bits
followed by a sign right shift by 32bits.  Pattern in the verifier was
limitted to positive value bounds and would reset bound computation for
negative values.  New code allows both positive and negative values for
sign extension without compromising bound computation and verifier to
pass.

This change is required by GCC which generate such pattern, and was
detected in the context of systemd, as described in the following GCC
bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119731

Three new tests were added in verifier_subreg.c.

Signed-off-by: Cupertino Miranda  <cupertino.miranda@oracle.com>
Signed-off-by: Andrew Pinski  <andrew.pinski@oss.qualcomm.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Cc: David Faust  <david.faust@oracle.com>
Cc: Jose Marchesi  <jose.marchesi@oracle.com>
Cc: Elena Zannoni  <elena.zannoni@oracle.com>
Link: https://lore.kernel.org/r/20251202180220.11128-2-cupertino.miranda@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:56 -05:00
Sam James
5ab9a95868 sparc: don't reference obsolete termio struct for TC* constants
[ Upstream commit be0bccffcd ]

Similar in nature to commit ab10727660 ("powerpc: Fix struct termio related ioctl macros").

glibc-2.42 drops the legacy termio struct, but the ioctls.h header still
defines some TC* constants in terms of termio (via sizeof). Hardcode the
values instead.

This fixes building Python for example, which falls over like:
  ./Modules/termios.c:1119:16: error: invalid application of 'sizeof' to incomplete type 'struct termio'

Link: https://bugs.gentoo.org/961769
Link: https://bugs.gentoo.org/962600
Signed-off-by: Sam James <sam@gentoo.org>
Reviewed-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:56 -05:00
Andreas Larsson
8e6b0e2adf sparc: Synchronize user stack on fork and clone
[ Upstream commit e38eba3b77 ]

Flush all uncommitted user windows before calling the generic syscall
handlers for clone, fork, and vfork.

Prior to entering the arch common handlers sparc_{clone|fork|vfork}, the
arch-specific syscall wrappers for these syscalls will attempt to flush
all windows (including user windows).

In the window overflow trap handlers on both SPARC{32|64},
if the window can't be stored (i.e due to MMU related faults) the routine
backups the user window and increments a thread counter (wsaved).

By adding a synchronization point after the flush attempt, when fault
handling is enabled, any uncommitted user windows will be flushed.

Link: https://sourceware.org/bugzilla/show_bug.cgi?id=31394
Closes: https://lore.kernel.org/sparclinux/fe5cc47167430007560501aabb28ba154985b661.camel@physik.fu-berlin.de/
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Ludwig Rydberg <ludwig.rydberg@gaisler.com>
Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Link: https://lore.kernel.org/r/20260119144753.27945-2-ludwig.rydberg@gaisler.com
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:56 -05:00
Luke Wang
bfc2420bac block: decouple secure erase size limit from discard size limit
[ Upstream commit ee81212f74 ]

Secure erase should use max_secure_erase_sectors instead of being limited
by max_discard_sectors. Separate the handling of REQ_OP_SECURE_ERASE from
REQ_OP_DISCARD to allow each operation to use its own size limit.

Signed-off-by: Luke Wang <ziniu.wang_1@nxp.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:56 -05:00
Yu Kuai
1f0221070f blk-mq-debugfs: add missing debugfs_mutex in blk_mq_debugfs_register_hctxs()
[ Upstream commit 9d20fd6ce1 ]

In blk_mq_update_nr_hw_queues(), debugfs_mutex is not held while
creating debugfs entries for hctxs. Hence add debugfs_mutex there,
it's safe because queue is not frozen.

Signed-off-by: Yu Kuai <yukuai@fnnas.com>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:56 -05:00
Jason Andryuk
7fb2e1fa6e xenbus: Use .freeze/.thaw to handle xenbus devices
[ Upstream commit e08dd1ee49 ]

The goal is to fix s2idle and S3 for Xen PV devices.  A domain resuming
from s3 or s2idle disconnects its PV devices during resume.  The
backends are not expecting this and do not reconnect.

b3e96c0c75 ("xen: use freeze/restore/thaw PM events for suspend/
resume/chkpt") changed xen_suspend()/do_suspend() from
PMSG_SUSPEND/PMSG_RESUME to PMSG_FREEZE/PMSG_THAW/PMSG_RESTORE, but the
suspend/resume callbacks remained.

.freeze/restore are used with hiberation where Linux restarts in a new
place in the future.  .suspend/resume are useful for runtime power
management for the duration of a boot.

The current behavior of the callbacks works for an xl save/restore or
live migration where the domain is restored/migrated to a new location
and connecting to a not-already-connected backend.

Change xenbus_pm_ops to use .freeze/thaw/restore and drop the
.suspend/resume hook.  This matches the use in drivers/xen/manage.c for
save/restore and live migration.  With .suspend/resume empty, PV devices
are left connected during s2idle and s3, so PV devices are not changed
and work after resume.

Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
Acked-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Message-ID: <20251119224731.61497-2-jason.andryuk@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:56 -05:00
Ata İlhan Köktürk
6ac7d51ae0 ACPI: battery: fix incorrect charging status when current is zero
[ Upstream commit bb1256e0dd ]

On some laptops, such as the Huawei Matebook series, the embedded
controller continues to report "Charging" status even when the
charge threshold is reached and no current is being drawn.

This incorrect reporting prevents the system from switching to battery
power profiles, leading to significantly higher power (e.g., 18W instead
of 7W during browsing) and missed remaining battery time estimation.

Validate the "Charging" state by checking if rate_now is zero. If the
hardware reports charging but the current is zero, report "Not Charging"
to user space.

Signed-off-by: Ata İlhan Köktürk <atailhan2006@gmail.com>
[ rjw: Whitespace fix, braces added to an inner if (), new comment rewrite ]
[ rjw: Changelog edits ]
Link: https://patch.msgid.link/20260129144856.43058-1-atailhan2006@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:56 -05:00
Jakob Riemenschneider
7aa95b6f14 ACPI: x86: s2idle: Invoke Microsoft _DSM Function 9 (Turn On Display)
[ Upstream commit 229ecbaac6 ]

Windows 11, version 22H2 introduced a new function index (Function 9) to
the Microsoft LPS0 _DSM, titled "Turn On Display Notification".

According to Microsoft documentation, this function signals to the system
firmware that the OS intends to turn on the display when exiting Modern
Standby. This allows the firmware to release Power Limits (PLx) earlier.

Crucially, this patch fixes a functional issue observed on the Lenovo Yoga
Slim 7i Aura (15ILL9), where system fans and keyboard backlights fail to
resume after suspend. Investigation linked shows the EC on this device
turns off these components during sleep but requires the Function 9
notification to wake them up again.

This patch defines the new function index (ACPI_MS_TURN_ON_DISPLAY) and
invokes it in acpi_s2idle_restore_early_lps0(). The execution order is
updated to match the logic of an "intent" signal:

 1. LPS0 Exit (Function 6)
 2. Turn On Display Intent (Function 9)
 3. Modern Standby Exit (Function 8)
 4. Screen On (Function 4)

Invoking Function 9 before the Modern Standby Exit ensures the firmware
has time to restore power rails and functionality (like fans) before the
software fully exits the sleep state.

Link: https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/modern-standby-firmware-notifications#turn-on-display-notification-function-9
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220505
Suggested-by: Antheas Kapenekakis <antheas@antheas.dev>
Signed-off-by: Jakob Riemenschneider <riemenschneiderjakob@gmail.com>
Link: https://patch.msgid.link/20260127200121.1292216-1-riemenschneiderjakob@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:55 -05:00
Sebastian Andrzej Siewior
e3f21fdea4 perf/cxlpmu: Replace IRQF_ONESHOT with IRQF_NO_THREAD
[ Upstream commit ab26d9c855 ]

Passing IRQF_ONESHOT ensures that the interrupt source is masked until
the secondary (threaded) handler is done. If only a primary handler is
used then the flag makes no sense because the interrupt can not fire
(again) while its handler is running.
The flag also disallows force-threading of the primary handler and the
irq-core will warn about this.

The intention here was probably not allowing forced-threading.

Replace IRQF_ONESHOT with IRQF_NO_THREAD.

Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:55 -05:00
Thomas Richter
7f5a66a18b s390/perf: Disable register readout on sampling events
[ Upstream commit b2c04fc123 ]

Running commands
 # ./perf record  -IR0,R1 -a sleep 1
extracts and displays register value of general purpose register r1 and r0.
However the value displayed of any register is random and does not
reflect the register value recorded at the time of the sample interrupt.

The sampling device driver on s390 creates a very large buffer
for the hardware to store the samples. Only when that large buffer
gets full an interrupt is generated and many hundreds of sample
entries are processed and copied to the kernel ring buffer and
eventually get copied to the perf tool. It is during the copy
to the kernel ring buffer that each sample is processed (on s390)
and at that time the register values are extracted.
This is not the original goal, the register values should be read
when the samples are created not when the samples are copied to the
kernel ring buffer.

Prevent this event from being installed in the first place and
return -EOPNOTSUPP. This is already the case for PERF_SAMPLE_REGS_USER.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:55 -05:00
Konrad Dybcio
e35db916e8 cpufreq: dt-platdev: Block the driver from probing on more QC platforms
[ Upstream commit 7b78189907 ]

Add a number of QC platforms to the blocklist, they all use either the
qcom-cpufreq-hw driver.

Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:55 -05:00
Jiasheng Jiang
dec123825c md-cluster: fix NULL pointer dereference in process_metadata_update
[ Upstream commit f150e753cb ]

The function process_metadata_update() blindly dereferences the 'thread'
pointer (acquired via rcu_dereference_protected) within the wait_event()
macro.

While the code comment states "daemon thread must exist", there is a valid
race condition window during the MD array startup sequence (md_run):

1. bitmap_load() is called, which invokes md_cluster_ops->join().
2. join() starts the "cluster_recv" thread (recv_daemon).
3. At this point, recv_daemon is active and processing messages.
4. However, mddev->thread (the main MD thread) is not initialized until
   later in md_run().

If a METADATA_UPDATED message is received from a remote node during this
specific window, process_metadata_update() will be called while
mddev->thread is still NULL, leading to a kernel panic.

To fix this, we must validate the 'thread' pointer. If it is NULL, we
release the held lock (no_new_dev_lockres) and return early, safely
ignoring the update request as the array is not yet fully ready to
process it.

Link: https://lore.kernel.org/linux-raid/20260117145903.28921-1-jiashengjiangcool@gmail.com
Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com>
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:55 -05:00
Magnus Lindholm
d4ca6ca2c6 alpha: fix user-space corruption during memory compaction
[ Upstream commit dd5712f337 ]

Alpha systems can suffer sporadic user-space crashes and heap
corruption when memory compaction is enabled.

Symptoms include SIGSEGV, glibc allocator failures (e.g. "unaligned
tcache chunk"), and compiler internal errors. The failures disappear
when compaction is disabled or when using global TLB invalidation.

The root cause is insufficient TLB shootdown during page migration.
Alpha relies on ASN-based MM context rollover for instruction cache
coherency, but this alone is not sufficient to prevent stale data or
instruction translations from surviving migration.

Fix this by introducing a migration-specific helper that combines:
  - MM context invalidation (ASN rollover),
  - immediate per-CPU TLB invalidation (TBI),
  - synchronous cross-CPU shootdown when required.

The helper is used only by migration/compaction paths to avoid changing
global TLB semantics.

Additionally, update flush_tlb_other(), pte_clear(), to use
READ_ONCE()/WRITE_ONCE() for correct SMP memory ordering.

This fixes observed crashes on both UP and SMP Alpha systems.

Reviewed-by: Ivan Kokshaysky <ink@unseen.parts>
Tested-by: Matoro Mahri <matoro_mailinglist_kernel@matoro.tk>
Tested-by: Michael Cree <mcree@orcon.net.nz>
Signed-off-by: Magnus Lindholm <linmag7@gmail.com>
Link: https://lore.kernel.org/r/20260102173603.18247-2-linmag7@gmail.com
Signed-off-by: Magnus Lindholm <linmag7@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:55 -05:00
Daniel Tang
33f53ac8e9 powercap: intel_rapl: Add PL4 support for Ice Lake
[ Upstream commit 54b3cd55a5 ]

Microsoft Surface Pro 7 firmware throttles the processor upon
boot/resume. Userspace needs to be able to restore the correct value.

Link: https://github.com/linux-surface/linux-surface/issues/706
Signed-off-by: Daniel Tang <danielzgtg.opensource@gmail.com>
Link: https://patch.msgid.link/6088605.ChMirdbgyp@daniel-desktop3
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:54 -05:00
Armin Wolf
61103a85e6 ACPICA: Abort AML bytecode execution when executing AML_FATAL_OP
[ Upstream commit 026ad376a6 ]

The ACPI specification states that when executing AML_FATAL_OP,
the OS should log the fatal error event and shutdown in a timely
fashion.

Windows complies with this requirement by immediatly entering a
Bso_d, effectively aborting the execution of the AML bytecode in
question.

ACPICA however might continue with the AML bytecode execution
should acpi_os_signal() simply return AE_OK. This will cause issues
because ACPI BIOS implementations might assume that the Fatal()
operator does not return.

Fix this by aborting the AML bytecode execution in such a case
by returning AE_ERROR. Also turn struct acpi_signal_fatal_info into a
local variable because of its small size (12 bytes) and to ensure
that acpi_os_signal() always receives valid information about the
fatal ACPI BIOS error.

Link: https://github.com/acpica/acpica/commit/d516c7758ba6
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/3325491.5fSG56mABF@rafael.j.wysocki
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:54 -05:00
Ai Chao
5e7c8b5cce ACPI: resource: Add JWIPC JVC9100 to irq1_level_low_skip_override[]
[ Upstream commit ba6ded26df ]

Like the JWIPC JVC9100 has its serial IRQ (10 and 11) described
as ActiveLow in the DSDT, which the kernel overrides to EdgeHigh which
breaks the serial.

irq 10, level, active-low, shared, skip-override
irq 11, level, active-low, shared, skip-override

Add the JVC9100 to the irq1_level_low_skip_override[] quirk table to fix
this.

Signed-off-by: Ai Chao <aichao@kylinos.cn>
Link: https://patch.msgid.link/20260113072719.4154485-1-aichao@kylinos.cn
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:54 -05:00
Tuo Li
b803811485 ACPI: processor: Fix NULL-pointer dereference in acpi_processor_errata_piix4()
[ Upstream commit f132e089fe ]

In acpi_processor_errata_piix4(), the pointer dev is first assigned an IDE
device and then reassigned an ISA device:

  dev = pci_get_subsys(..., PCI_DEVICE_ID_INTEL_82371AB, ...);
  dev = pci_get_subsys(..., PCI_DEVICE_ID_INTEL_82371AB_0, ...);

If the first lookup succeeds but the second fails, dev becomes NULL. This
leads to a potential null-pointer dereference when dev_dbg() is called:

  if (errata.piix4.bmisx)
    dev_dbg(&dev->dev, ...);

To prevent this, use two temporary pointers and retrieve each device
independently, avoiding overwriting dev with a possible NULL value.

Signed-off-by: Tuo Li <islituo@gmail.com>
[ rjw: Subject adjustment, added an empty code line ]
Link: https://patch.msgid.link/20260111163214.202262-1-islituo@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:54 -05:00
Mauro Carvalho Chehab
be10c1bdf6 EFI/CPER: don't go past the ARM processor CPER record buffer
[ Upstream commit eae21beecb ]

There's a logic inside GHES/CPER to detect if the section_length
is too small, but it doesn't detect if it is too big.

Currently, if the firmware receives an ARM processor CPER record
stating that a section length is big, kernel will blindly trust
section_length, producing a very long dump. For instance, a 67
bytes record with ERR_INFO_NUM set 46198 and section length
set to 854918320 would dump a lot of data going a way past the
firmware memory-mapped area.

Fix it by adding a logic to prevent it to go past the buffer
if ERR_INFO_NUM is too big, making it report instead:

	[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
	[Hardware Error]: event severity: recoverable
	[Hardware Error]:  Error 0, type: recoverable
	[Hardware Error]:   section_type: ARM processor error
	[Hardware Error]:   MIDR: 0xff304b2f8476870a
	[Hardware Error]:   section length: 854918320, CPER size: 67
	[Hardware Error]:   section length is too big
	[Hardware Error]:   firmware-generated error record is incorrect
	[Hardware Error]:   ERR_INFO_NUM is 46198

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
[ rjw: Subject and changelog tweaks ]
Link: https://patch.msgid.link/41cd9f6b3ace3cdff7a5e864890849e4b1c58b63.1767871950.git.mchehab+huawei@kernel.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:54 -05:00
Mauro Carvalho Chehab
242c652849 APEI/GHES: ARM processor Error: don't go past allocated memory
[ Upstream commit 87880af2d2 ]

If the BIOS generates a very small ARM Processor Error, or
an incomplete one, the current logic will fail to deferrence

	err->section_length
and
	ctx_info->size

Add checks to avoid that. With such changes, such GHESv2
records won't cause OOPSes like this:

[    1.492129] Internal error: Oops: 0000000096000005 [#1]  SMP
[    1.495449] Modules linked in:
[    1.495820] CPU: 0 UID: 0 PID: 9 Comm: kworker/0:0 Not tainted 6.18.0-rc1-00017-gabadcc3553dd-dirty #18 PREEMPT
[    1.496125] Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 02/02/2022
[    1.496433] Workqueue: kacpi_notify acpi_os_execute_deferred
[    1.496967] pstate: 814000c5 (Nzcv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[    1.497199] pc : log_arm_hw_error+0x5c/0x200
[    1.497380] lr : ghes_handle_arm_hw_error+0x94/0x220

0xffff8000811c5324 is in log_arm_hw_error (../drivers/ras/ras.c:75).
70		err_info = (struct cper_arm_err_info *)(err + 1);
71		ctx_info = (struct cper_arm_ctx_info *)(err_info + err->err_info_num);
72		ctx_err = (u8 *)ctx_info;
73
74		for (n = 0; n < err->context_info_num; n++) {
75			sz = sizeof(struct cper_arm_ctx_info) + ctx_info->size;
76			ctx_info = (struct cper_arm_ctx_info *)((long)ctx_info + sz);
77			ctx_len += sz;
78		}
79

and similar ones while trying to access section_length on an
error dump with too small size.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
[ rjw: Subject tweaks ]
Link: https://patch.msgid.link/7fd9f38413be05ee2d7cfdb0dc31ea2274cf1a54.1767871950.git.mchehab+huawei@kernel.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:54 -05:00
Mauro Carvalho Chehab
b6be51a124 APEI/GHES: ensure that won't go past CPER allocated record
[ Upstream commit fa2408a24f ]

The logic at ghes_new() prevents allocating too large records, by
checking if they're bigger than GHES_ESTATUS_MAX_SIZE (currently, 64KB).
Yet, the allocation is done with the actual number of pages from the
CPER bios table location, which can be smaller.

Yet, a bad firmware could send data with a different size, which might
be bigger than the allocated memory, causing an OOPS:

    Unable to handle kernel paging request at virtual address fff00000f9b40000
    Mem abort info:
      ESR = 0x0000000096000007
      EC = 0x25: DABT (current EL), IL = 32 bits
      SET = 0, FnV = 0
      EA = 0, S1PTW = 0
      FSC = 0x07: level 3 translation fault
    Data abort info:
      ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
      CM = 0, WnR = 0, TnD = 0, TagAccess = 0
      GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
    swapper pgtable: 4k pages, 52-bit VAs, pgdp=000000008ba16000
    [fff00000f9b40000] pgd=180000013ffff403, p4d=180000013fffe403, pud=180000013f85b403, pmd=180000013f68d403, pte=0000000000000000
    Internal error: Oops: 0000000096000007 [#1]  SMP
    Modules linked in:
    CPU: 0 UID: 0 PID: 303 Comm: kworker/0:1 Not tainted 6.19.0-rc1-00002-gda407d200220 #34 PREEMPT
    Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 02/02/2022
    Workqueue: kacpi_notify acpi_os_execute_deferred
    pstate: 214020c5 (nzCv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
    pc : hex_dump_to_buffer+0x30c/0x4a0
    lr : hex_dump_to_buffer+0x328/0x4a0
    sp : ffff800080e13880
    x29: ffff800080e13880 x28: ffffac9aba86f6a8 x27: 0000000000000083
    x26: fff00000f9b3fffc x25: 0000000000000004 x24: 0000000000000004
    x23: ffff800080e13905 x22: 0000000000000010 x21: 0000000000000083
    x20: 0000000000000001 x19: 0000000000000008 x18: 0000000000000010
    x17: 0000000000000001 x16: 00000007c7f20fec x15: 0000000000000020
    x14: 0000000000000008 x13: 0000000000081020 x12: 0000000000000008
    x11: ffff800080e13905 x10: ffff800080e13988 x9 : 0000000000000000
    x8 : 0000000000000000 x7 : 0000000000000001 x6 : 0000000000000020
    x5 : 0000000000000030 x4 : 00000000fffffffe x3 : 0000000000000000
    x2 : ffffac9aba78c1c8 x1 : ffffac9aba76d0a8 x0 : 0000000000000008
    Call trace:
     hex_dump_to_buffer+0x30c/0x4a0 (P)
     print_hex_dump+0xac/0x170
     cper_estatus_print_section+0x90c/0x968
     cper_estatus_print+0xf0/0x158
     __ghes_print_estatus+0xa0/0x148
     ghes_proc+0x1bc/0x220
     ghes_notify_hed+0x5c/0xb8
     notifier_call_chain+0x78/0x148
     blocking_notifier_call_chain+0x4c/0x80
     acpi_hed_notify+0x28/0x40
     acpi_ev_notify_dispatch+0x50/0x80
     acpi_os_execute_deferred+0x24/0x48
     process_one_work+0x15c/0x3b0
     worker_thread+0x2d0/0x400
     kthread+0x148/0x228
     ret_from_fork+0x10/0x20
    Code: 6b14033f 540001ad a94707e2 f100029f (b8747b44)
    ---[ end trace 0000000000000000 ]---

Prevent that by taking the actual allocated are into account when
checking for CPER length.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
[ rjw: Subject tweaks ]
Link: https://patch.msgid.link/4e70310a816577fabf37d94ed36cde4ad62b1e0a.1767871950.git.mchehab+huawei@kernel.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:54 -05:00
Mauro Carvalho Chehab
7780c0bad2 EFI/CPER: don't dump the entire memory region
[ Upstream commit 55cc6fe571 ]

The current logic at cper_print_fw_err() doesn't check if the
error record length is big enough to handle offset. On a bad firmware,
if the ofset is above the actual record, length -= offset will
underflow, making it dump the entire memory.

The end result can be:

 - the logic taking a lot of time dumping large regions of memory;
 - data disclosure due to the memory dumps;
 - an OOPS, if it tries to dump an unmapped memory region.

Fix it by checking if the section length is too small before doing
a hex dump.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
[ rjw: Subject tweaks ]
Link: https://patch.msgid.link/1752b5ba63a3e2f148ddee813b36c996cc617e86.1767871950.git.mchehab+huawei@kernel.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:53 -05:00
Hou Wenlong
0cb30030b4 x86/xen/pvh: Enable PAE mode for 32-bit guest only when CONFIG_X86_PAE is set
[ Upstream commit db9aded979 ]

The PVH entry is available for 32-bit KVM guests, and 32-bit KVM guests
do not depend on CONFIG_X86_PAE. However, mk_early_pgtbl_32() builds
different pagetables depending on whether CONFIG_X86_PAE is set.
Therefore, enabling PAE mode for 32-bit KVM guests without
CONFIG_X86_PAE being set would result in a boot failure during CR3
loading.

Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Message-ID: <d09ce9a134eb9cbc16928a5b316969f8ba606b81.1768017442.git.houwenlong.hwl@antgroup.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:53 -05:00
Md Haris Iqbal
7aac0a30dc rnbd-srv: Zero the rsp buffer before using it
[ Upstream commit 69d26698e4 ]

Before using the data buffer to send back the response message, zero it
completely. This prevents any stray bytes to be picked up by the client
side when there the message is exchanged between different protocol
versions.

Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:53 -05:00
Jinqian Yang
5dbe1f1435 arm64: Add support for TSV110 Spectre-BHB mitigation
[ Upstream commit e3baa5d4b3 ]

The TSV110 processor is vulnerable to the Spectre-BHB (Branch History
Buffer) attack, which can be exploited to leak information through
branch prediction side channels. This commit adds the MIDR of TSV110
to the list for software mitigation.

Signed-off-by: Jinqian Yang <yangjinqian1@huawei.com>
Reviewed-by: Zenghui Yu <zenghui.yu@linux.dev>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:53 -05:00
Robin Murphy
fc38a1f0d5 perf/arm-cmn: Support CMN-600AE
[ Upstream commit 12a94953c3 ]

The functional safety features of CMN-600AE have little to no impact on
the PMU relative to the base CMN-600 design, so for simplicity we can
reasonably just treat it as the same thing. The only obvious difference
is that the revision numbers aren't aligned, so we may hide some aliases
for events which do actually exist, but those can still be specified via
the underlying "type,eventid" format so it's not too big a deal.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Tested-by: Michal Simek <michal.simek@amd.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:53 -05:00
Heiko Carstens
c0ec4299b1 s390/purgatory: Add -Wno-default-const-init-unsafe to KBUILD_CFLAGS
[ Upstream commit b4780fe4dd ]

Add -Wno-default-const-init-unsafe to purgatory KBUILD_CFLAGS, similar
to scripts/Makefile.extrawarn, since clang generates warnings for the
dummy variable in typecheck():

    CC      arch/s390/purgatory/purgatory.o
      arch/s390/include/asm/ptrace.h:221:9: warning: default initialization of an object of type 'typeof (regs->psw)' (aka 'const psw_t') leaves the object uninitialized [-Wdefault-const-init-var-unsafe]
        221 |         return psw_bits(regs->psw).pstate;
            |                ^
      arch/s390/include/asm/ptrace.h:98:2: note: expanded from macro 'psw_bits'
         98 |         typecheck(psw_t, __psw);                \
            |         ^
      include/linux/typecheck.h:11:12: note: expanded from macro 'typecheck'
         11 |         typeof(x) __dummy2; \
            |                   ^

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:53 -05:00
Kaushlendra Kumar
bcbfd4d818 tools/power cpupower: Reset errno before strtoull()
[ Upstream commit f9bd3762cf ]

cpuidle_state_get_one_value() never cleared errno before calling
strtoull(), so a prior ERANGE caused every cpuidle counter read to
return zero. Reset errno to 0 before the conversion so each sysfs read
is evaluated independently.

Link: https://lore.kernel.org/r/20251201121745.3776703-1-kaushlendra.kumar@intel.com
Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:53 -05:00
Henrique Carvalho
ab6564f416 smb: client: prevent races in ->query_interfaces()
[ Upstream commit c3c06e42e1 ]

It was possible for two query interface works to be concurrently trying
to update the interfaces.

Prevent this by checking and updating iface_last_update under
iface_lock.

Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:52 -05:00
Andreas Gruenbacher
5d2c4f182e gfs2: fiemap page fault fix
[ Upstream commit e411d74cc5 ]

In gfs2_fiemap(), we are calling iomap_fiemap() while holding the inode
glock.  This can lead to recursive glock taking if the fiemap buffer is
memory mapped to the same inode and accessing it triggers a page fault.

Fix by disabling page faults for iomap_fiemap() and faulting in the
buffer by hand if necessary.

Fixes xfstest generic/742.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:52 -05:00
Henrique Carvalho
a20d029585 smb: client: add proper locking around ses->iface_last_update
[ Upstream commit e97dcac3dc ]

There is a missing ses->iface_lock in cifs_setup_session,
around ses->iface_last_update.

Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:52 -05:00
jinbaohong
0a45c428ad btrfs: handle user interrupt properly in btrfs_trim_fs()
[ Upstream commit bfb670b918 ]

When a fatal signal is pending or the process is freezing,
btrfs_trim_block_group() and btrfs_trim_free_extents() return -ERESTARTSYS.
Currently this is treated as a regular error: the loops continue to the
next iteration and count it as a block group or device failure.

Instead, break out of the loops immediately and return -ERESTARTSYS to
userspace without counting it as a failure. Also skip the device loop
entirely if the block group loop was interrupted.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Robbie Ko <robbieko@synology.com>
Signed-off-by: jinbaohong <jinbaohong@synology.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:52 -05:00
Qu Wenruo
b034e842e3 btrfs: fallback to buffered IO if the data profile has duplication
[ Upstream commit 7c2830f00c ]

[BACKGROUND]
Inspired by a recent kernel bug report, which is related to direct IO
buffer modification during writeback, that leads to contents mismatch of
different RAID1 mirrors.

[CAUSE AND PROBLEMS]
The root cause is exactly the same explained in commit 968f19c5b1
("btrfs: always fallback to buffered write if the inode requires
checksum"), that we can not trust direct IO buffer which can be modified
halfway during writeback.

Unlike data checksum verification, if this happened on inodes without
data checksum but has the data has extra mirrors, it will lead to
stealth data mismatch on different mirrors.

This will be way harder to detect without data checksum.

Furthermore for RAID56, we can even have data without checksum and data
with checksum mixed inside the same full stripe.

In that case if the direct IO buffer got changed halfway for the
nodatasum part, the data with checksum immediately lost its ability to
recover, e.g.:

" " = Good old data or parity calculated using good old data
"X" = Data modified during writeback

              0                32K                      64K
  Data 1      |                                         |  Has csum
  Data 2      |XXXXXXXXXXXXXXXX                         |  No csum
  Parity      |                                         |

In above case, the parity is calculated using data 1 (has csum, from
page cache, won't change during writeback), and old data 2 (has no csum,
direct IO write).

After parity is calculated, but before submission to the storage, direct
IO buffer of data 2 is modified, causing the range [0, 32K) of data 2
has a different content.

Now all data is submitted to the storage, and the fs got fully synced.

Then the device of data 1 is lost, has to be rebuilt from data 2 and
parity. But since the data 2 has some modified data, and the parity is
calculated using old data, the recovered data is no the same for data 1,
causing data checksum mismatch.

[FIX]
Fix the problem by checking the data allocation profile.
If our data allocation profile is either RAID0 or SINGLE, we can allow
true zero-copy direct IO and the end user is fully responsible for any
race.

However this is not going to fix all situations, as it's still possible
to race with balance where the fs got a new data profile after the data
allocation profile check.
But this fix should still greatly reduce the window of the original bug.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=99171
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:52 -05:00
Ezrak1e
67288113c5 dlm: validate length in dlm_search_rsb_tree
[ Upstream commit 080e5563f8 ]

The len parameter in dlm_dump_rsb_name() is not validated and comes
from network messages. When it exceeds DLM_RESNAME_MAXLEN, it can
cause out-of-bounds write in dlm_search_rsb_tree().

Add length validation to prevent potential buffer overflow.

Signed-off-by: Ezrak1e <ezrakiez@gmail.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:52 -05:00
Jori Koolstra
f57ccd4657 minix: Add required sanity checking to minix_check_superblock()
[ Upstream commit 8c97a6ddc9 ]

The fs/minix implementation of the minix filesystem does not currently
support any other value for s_log_zone_size than 0. This is also the
only value supported in util-linux; see mkfs.minix.c line 511. In
addition, this patch adds some sanity checking for the other minix
superblock fields, and moves the minix_blocks_needed() checks for the
zmap and imap also to minix_check_super_block().

This also closes a related syzbot bug report.

Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl>
Link: https://patch.msgid.link/20251208153947.108343-1-jkoolstra@xs4all.nl
Reviewed-by: Jan Kara <jack@suse.cz>
Reported-by: syzbot+5ad0824204c7bf9b67f2@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=5ad0824204c7bf9b67f2
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:52 -05:00
Adrian Hunter
91705db664 i3c: mipi-i3c-hci: Reset RING_OPERATION1 fields during init
[ Upstream commit 78f63ae4a8 ]

The MIPI I3C HCI specification does not define reset values for
RING_OPERATION1 fields, and some controllers (e.g., Intel) do not clear
them during a software reset.  Ensure the ring pointers are explicitly
set to zero during bus initialization to avoid inconsistent state.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260113072702.16268-2-adrian.hunter@intel.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:51 -05:00
Frank Li
689d85945c i3c: master: svc: Initialize 'dev' to NULL in svc_i3c_master_ibi_isr()
[ Upstream commit 3c9ffb4db7 ]

Initialize the 'dev' pointer to NULL in svc_i3c_master_ibi_isr() and add
a NULL check in the error path.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/r/202512131016.YCKIsDXM-lkp@intel.com/
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20251215200852.3079073-1-Frank.Li@nxp.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:51 -05:00
Tetsuo Handa
d209ebaee9 hfsplus: pretend special inodes as regular files
[ Upstream commit ed8889ca21 ]

Since commit af153bb63a ("vfs: catch invalid modes in may_open()")
requires any inode be one of S_IFDIR/S_IFLNK/S_IFREG/S_IFCHR/S_IFBLK/
S_IFIFO/S_IFSOCK type, use S_IFREG for special inodes.

Reported-by: syzbot <syzbot+895c23f6917da440ed0d@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=895c23f6917da440ed0d
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Link: https://lore.kernel.org/r/d0a07b1b-8b73-4002-8e29-e2bd56871262@I-love.SAKURA.ne.jp
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:51 -05:00
Jeffrey Bencteux
5632d14b2f audit: add missing syscalls to read class
[ Upstream commit bcb90a2834 ]

The "at" variant of getxattr() and listxattr() are missing from the
audit read class. Calling getxattrat() or listxattrat() on a file to
read its extended attributes will bypass audit rules such as:

-w /tmp/test -p rwa -k test_rwa

The current patch adds missing syscalls to the audit read class.

Signed-off-by: Jeffrey Bencteux <jeff@bencteux.fr>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:51 -05:00
Deepakkumar Karn
727e5140e0 fs/buffer: add alert in try_to_free_buffers() for folios without buffers
[ Upstream commit b68f91ef3b ]

try_to_free_buffers() can be called on folios with no buffers attached
when filemap_release_folio() is invoked on a folio belonging to a mapping
with AS_RELEASE_ALWAYS set but no release_folio operation defined.

In such cases, folio_needs_release() returns true because of the
AS_RELEASE_ALWAYS flag, but the folio has no private buffer data. This
causes try_to_free_buffers() to call drop_buffers() on a folio with no
buffers, leading to a null pointer dereference.

Adding a check in try_to_free_buffers() to return early if the folio has no
buffers attached, with WARN_ON_ONCE() to alert about the misconfiguration.
This provides defensive hardening.

Signed-off-by: Deepakkumar Karn <dkarn@redhat.com>
Link: https://patch.msgid.link/20251211131211.308021-1-dkarn@redhat.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:51 -05:00
Viacheslav Dubeyko
773855c2ef hfsplus: fix volume corruption issue for generic/498
[ Upstream commit 9a8c4ad447 ]

The xfstests' test-case generic/498 leaves HFS+ volume
in corrupted state:

sudo ./check generic/498
FSTYP -- hfsplus
PLATFORM -- Linux/x86_64 hfsplus-testing-0001 6.18.0-rc1+ #18 SMP PREEMPT_DYNAMIC Thu Dec 4 12:24:45 PST 2025
MKFS_OPTIONS -- /dev/loop51
MOUNT_OPTIONS -- /dev/loop51 /mnt/scratch

generic/498 _check_generic_filesystem: filesystem on /dev/loop51 is inconsistent
(see XFSTESTS-2/xfstests-dev/results//generic/498.full for details)

Ran: generic/498
Failures: generic/498
Failed 1 of 1 tests

sudo fsck.hfsplus -d /dev/loop51
** /dev/loop51
Using cacheBlockSize=32K cacheTotalBlock=1024 cacheSize=32768K.
Executing fsck_hfs (version 540.1-Linux).
** Checking non-journaled HFS Plus Volume.
The volume name is untitled
** Checking extents overflow file.
** Checking catalog file.
Invalid leaf record count
(It should be 16 instead of 2)
** Checking multi-linked files.
CheckHardLinks: found 1 pre-Leopard file inodes.
** Checking catalog hierarchy.
** Checking extended attributes file.
** Checking volume bitmap.
** Checking volume information.
Verify Status: VIStat = 0x0000, ABTStat = 0x0000 EBTStat = 0x0000
CBTStat = 0x8000 CatStat = 0x00000000
** Repairing volume.
** Rechecking volume.
** Checking non-journaled HFS Plus Volume.
The volume name is untitled
** Checking extents overflow file.
** Checking catalog file.
** Checking multi-linked files.
CheckHardLinks: found 1 pre-Leopard file inodes.
** Checking catalog hierarchy.
** Checking extended attributes file.
** Checking volume bitmap.
** Checking volume information.
** The volume untitled was repaired successfully.

The generic/498 test executes such steps on final phase:

mkdir $SCRATCH_MNT/A
mkdir $SCRATCH_MNT/B
mkdir $SCRATCH_MNT/A/C
touch $SCRATCH_MNT/B/foo
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/B/foo

ln $SCRATCH_MNT/B/foo $SCRATCH_MNT/A/C/foo
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/A

"Simulate a power failure and mount the filesystem
to check that what we explicitly fsync'ed exists."

_flakey_drop_and_remount

The FSCK tool complains about "Invalid leaf record count".
HFS+ b-tree header contains leaf_count field is updated
by hfs_brec_insert() and hfs_brec_remove(). The hfs_brec_insert()
is involved into hard link creation process. However,
modified in-core leaf_count field is stored into HFS+
b-tree header by hfs_btree_write() method. But,
unfortunately, hfs_btree_write() hasn't been called
by hfsplus_cat_write_inode() and hfsplus_file_fsync()
stores not fully consistent state of the Catalog File's
b-tree.

This patch adds calling hfs_btree_write() method in
the hfsplus_cat_write_inode() with the goal of
storing consistent state of Catalog File's b-tree.
Finally, it makes FSCK tool happy.

sudo ./check generic/498
FSTYP         -- hfsplus
PLATFORM      -- Linux/x86_64 hfsplus-testing-0001 6.18.0-rc1+ #22 SMP PREEMPT_DYNAMIC Sat Dec  6 17:01:31 PST 2025
MKFS_OPTIONS  -- /dev/loop51
MOUNT_OPTIONS -- /dev/loop51 /mnt/scratch

generic/498 33s ...  31s
Ran: generic/498
Passed all 1 tests

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20251207035821.3863657-1-slava@dubeyko.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:51 -05:00
Jeffrey Bencteux
3e762a0371 audit: add fchmodat2() to change attributes class
[ Upstream commit 4f493a6079 ]

fchmodat2(), introduced in version 6.6 is currently not in the change
attribute class of audit. Calling fchmodat2() to change a file
attribute in the same fashion than chmod() or fchmodat() will bypass
audit rules such as:

-w /tmp/test -p rwa -k test_rwa

The current patch adds fchmodat2() to the change attributes class.

Signed-off-by: Jeffrey Bencteux <jeff@bencteux.fr>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:50 -05:00
Bhavik Sachdev
9c1c09c3d1 statmount: permission check should return EPERM
[ Upstream commit fccbe38a5d ]

Currently, statmount() returns ENOENT when caller is not CAP_SYS_ADMIN
in the user namespace owner of target mount namespace. This should be
EPERM instead.

Suggested-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
Link: https://patch.msgid.link/20251129091455.757724-2-b.sachdev1904@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:50 -05:00
Anthony Pighin (Nokia)
0fd52d901b rtc: interface: Alarm race handling should not discard preceding error
[ Upstream commit 81be22cd4a ]

Commit 795cda8338 ("rtc: interface: Fix long-standing race when setting
alarm") should not discard any errors from the preceding validations.

Prior to that commit, if the alarm feature was disabled, or the
set_alarm failed, a meaningful error code would be returned to the
caller for further action.

After, more often than not, the __rtc_read_time will cause a success
return code instead, misleading the caller.

An example of this is when timer_enqueue is called for a rtc-abx080x
device. Since that driver does not clear the alarm feature bit, but
instead relies on the set_alarm operation to return invalid, the discard
of the return code causes very different behaviour; i.e.
    hwclock: select() to /dev/rtc0 to wait for clock tick timed out

Fixes: 795cda8338 ("rtc: interface: Fix long-standing race when setting alarm")
Signed-off-by: Anthony Pighin (Nokia) <anthony.pighin@nokia.com>
Reviewed-by: Esben Haabendal <esben@geanix.com>
Tested-by: Nick Bowler <nbowler@draconx.ca>
Link: https://patch.msgid.link/BN0PR08MB6951415A751F236375A2945683D1A@BN0PR08MB6951.namprd08.prod.outlook.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:50 -05:00
Ian Rogers
70a69f675b libperf build: Always place libperf includes first
[ Upstream commit 8c5b40678c ]

When building tools/perf the CFLAGS can contain a directory for the
installed headers.

As the headers may be being installed while building libperf.a this can
cause headers to be partially installed and found in the include path
while building an object file for libperf.a.

The installed header may reference other installed headers that are
missing given the partial nature of the install and then the build fails
with a missing header file.

Avoid this by ensuring the libperf source headers are always first in
the CFLAGS.

Fixes: 3143504918 ("libperf: Make libperf.a part of the perf build")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:20:50 -05:00