The blamed commit increased the needed headroom to account for
alignment. This means that the size required to always align a Tx buffer
was added inside the dpaa2_eth_needed_headroom() function. By doing
that, a manual adjustment of the pointer passed to PTR_ALIGN() was no
longer correct since the 'buffer_start' variable was already pointing
to the start of the skb's memory.
The behavior of the dpaa2-eth driver without this patch was to drop
frames on Tx even when the headroom was matching the 128 bytes
necessary. Fix this by removing the manual adjustment of 'buffer_start' from
the PTR_ALIGN() call.
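The arithmetic can be sketched in user space as follows. This is only an
illustration: align_up() is a hypothetical stand-in for PTR_ALIGN(), the
64-byte alignment is an assumed value, and the "old scheme" models the manual
adjustment as subtracting one alignment unit, which is an assumption about
its exact form.

```c
#include <stdbool.h>
#include <stdint.h>

#define TX_BUF_ALIGN 64u /* assumed alignment, for illustration only */

/* Round addr up to the next multiple of align (align must be a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Old scheme (assumed form): step back one alignment unit, then round up. */
static bool old_scheme_fits(uintptr_t head, uintptr_t buffer_start)
{
    return align_up(buffer_start - TX_BUF_ALIGN, TX_BUF_ALIGN) >= head;
}

/* Fixed scheme: round buffer_start itself up. */
static bool new_scheme_fits(uintptr_t head, uintptr_t buffer_start)
{
    return align_up(buffer_start, TX_BUF_ALIGN) >= head;
}
```

When buffer_start already equals the start of the buffer, the old scheme
always lands before it and the frame would be rejected, while the fixed
scheme never does.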
Closes: https://lore.kernel.org/netdev/70f0dcd9-1906-4d13-82df-7bbbbe7194c6@app.fastmail.com/T/#u
Fixes: f422abe3f2 ("dpaa2-eth: increase the needed headroom to account for alignment")
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Tested-by: Mathew McBride <matt@traverse.com.au>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251016135807.360978-1-ioana.ciornei@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The ENETC RX ring uses the page halves flipping mechanism: each page is
split into two halves for the RX ring to use. ENETC_RXB_TRUESIZE is
defined as 2048 to indicate the size of half a page. However, the page
size is configurable: on ARM64 platforms, PAGE_SIZE defaults to 4K,
but it can be configured to 16K or 64K.
When PAGE_SIZE is set to 16K or 64K, ENETC_RXB_TRUESIZE is not correct,
and the RX ring will always use the first half of the page. This is not
consistent with the description in the relevant kernel doc and commit
messages.
This issue is invisible in most cases, but if users want to increase
PAGE_SIZE to receive a Jumbo frame with a single buffer for some use
cases, it will not work as expected, because the buffer size of each
RX BD is fixed to 2048 bytes.
Based on the above two points, we expect to correct ENETC_RXB_TRUESIZE
to (PAGE_SIZE >> 1), as described in the comment.
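A minimal sketch of the effect, assuming the halves are flipped by XOR-ing
the page offset with the buffer size. rxb_truesize() and flip_offset() are
hypothetical helpers, not driver functions.

```c
#include <stddef.h>

/* Hypothetical helper: half-page buffer size for a given page size. */
static size_t rxb_truesize(size_t page_size)
{
    return page_size >> 1;
}

/* Flip to the other buffer in the page (assumed XOR-based flipping). */
static size_t flip_offset(size_t offset, size_t truesize)
{
    return offset ^ truesize;
}
```

With a fixed size of 2048 on a 16K page, flipping from offset 0 gives 2048,
which is still inside the first 8K half. With PAGE_SIZE >> 1 it gives 8192,
the start of the second half.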
Fixes: d4fd0404c1 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://patch.msgid.link/20251016080131.3127122-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After applying the workaround for err050089, the LS1028A platform
experiences RCU stalls on RT kernels. This issue is caused by the
recursive acquisition of the read lock enetc_mdio_lock. Listed below are some
of the call stacks identified under the enetc_poll path that may lead to
a deadlock:
enetc_poll
  -> enetc_lock_mdio
  -> enetc_clean_rx_ring OR napi_complete_done
     -> napi_gro_receive
        -> enetc_start_xmit
           -> enetc_lock_mdio
           -> enetc_map_tx_buffs
           -> enetc_unlock_mdio
  -> enetc_unlock_mdio
After enetc_poll acquires the read lock, a higher-priority writer attempts
to acquire the lock, causing preemption. The writer detects that a
read lock is already held and is scheduled out. However, readers under
enetc_poll cannot acquire the read lock again because a writer is already
waiting, leading to a thread hang.
Currently, the deadlock is avoided by adjusting enetc_lock_mdio to prevent
recursive lock acquisition.
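The sequence can be replayed with a toy model of a writer-preferring
read/write lock. This is a sketch only: struct toy_rwlock and its helpers are
hypothetical and single-threaded, and they are not the kernel's rwlock
implementation.

```c
#include <stdbool.h>

/* Toy model of a writer-preferring read/write lock. */
struct toy_rwlock {
    int readers;         /* number of read holders */
    bool writer_waiting; /* a writer is queued behind the readers */
};

/* A new reader is admitted only while no writer is queued. */
static bool toy_read_trylock(struct toy_rwlock *l)
{
    if (l->writer_waiting)
        return false;
    l->readers++;
    return true;
}

static void toy_read_unlock(struct toy_rwlock *l)
{
    l->readers--;
}

/* The writer succeeds only once every reader has left; otherwise it queues. */
static bool toy_write_trylock(struct toy_rwlock *l)
{
    if (l->readers > 0) {
        l->writer_waiting = true;
        return false;
    }
    l->writer_waiting = false;
    return true;
}

/* Replays the scenario above; returns true if it ends in a deadlock. */
static bool nested_read_deadlocks(void)
{
    struct toy_rwlock l = { 0, false };
    bool outer = toy_read_trylock(&l);   /* poll path takes the read lock */
    bool writer = toy_write_trylock(&l); /* writer preempts and queues */
    bool inner = toy_read_trylock(&l);   /* nested read from the xmit path */

    return outer && !writer && !inner;
}
```

The outer reader cannot finish until the nested read succeeds, and the nested
read cannot succeed while the writer is queued, so nobody makes progress.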
Fixes: 6d36ecdbc4 ("net: enetc: take the MDIO lock only once per NAPI poll cycle")
Signed-off-by: Jianpeng Chang <jianpeng.chang.cn@windriver.com>
Acked-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20251015021427.180757-1-jianpeng.chang.cn@windriver.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull arm64 fixes from Catalin Marinas:
- Explicitly encode the XZR register if the value passed to
write_sysreg_s() is 0.
The GIC CDEOI instruction is encoded as a system register write with
XZR as the source register. However, clang does not honour the "Z"
register constraint, leading to incorrect code generation
- Ensure the interrupts (DAIF.IF) are unmasked when completing
single-step of a suspended breakpoint before calling
exit_to_user_mode().
With pseudo-NMIs, interrupts are (additionally) masked at the PMR_EL1
register, handled by local_irq_*()
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: debug: always unmask interrupts in el0_softstp()
arm64/sysreg: Fix GIC CDEOI instruction encoding
Pull RISC-V fixes from Paul Walmsley:
- Disable CFI with Rust for any platform other than x86 and ARM64
- Keep task mm_cpumasks up-to-date to avoid triggering M-mode firmware
warnings if the kernel tries to send an IPI to an offline CPU
- Improve kprobe address validation performance and avoid desyncs
(following x86)
- Avoid duplicate device probes by avoiding DT hardware probing when
ACPI is enabled in early boot
- Use the correct set of dependencies for
CONFIG_ARCH_HAS_ELF_CORE_EFLAGS, avoiding an allnoconfig warning
- Fix a few other minor issues
* tag 'riscv-for-linux-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: kprobes: convert one final __ASSEMBLY__ to __ASSEMBLER__
riscv: Respect dependencies of ARCH_HAS_ELF_CORE_EFLAGS
riscv: acpi: avoid errors caused by probing DT devices when ACPI is used
riscv: kprobes: Fix probe address validation
riscv: entry: fix typo in comment 'instruciton' -> 'instruction'
RISC-V: clear hot-unplugged cores from all task mm_cpumasks to avoid rfence errors
riscv: kgdb: Ensure that BUFMAX > NUMREGBYTES
rust: cfi: only 64-bit arm and x86 support CFI_CLANG
Regardless of the DeviceContext of a device, we can't give any
guarantees about the DeviceContext of its parent device.
This is very subtle, since it's only caused by a simple typo, i.e.
Self::from_raw(parent)
which preserves the DeviceContext in this case, vs.
Device::from_raw(parent)
which discards the DeviceContext.
(I should have noticed it doing the correct thing in auxiliary::Device
subsequently, but somehow missed it.)
Hence, fix both Device::parent() and auxiliary::Device::parent().
Cc: stable@vger.kernel.org
Fixes: a4c9f71e34 ("rust: device: implement Device::parent()")
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
This fixes the following build error
CLNG-BPF [test_progs] verifier_global_ptr_args.bpf.o
progs/verifier_global_ptr_args.c:228:5: error: redefinition of 'off' as
different kind of symbol
228 | u32 off;
| ^
The symbol 'off' was previously defined in
tools/testing/selftests/bpf/tools/include/vmlinux.h, which includes an
enum i40e_ptp_gpio_pin_state from
drivers/net/ethernet/intel/i40e/i40e_ptp.c:
enum i40e_ptp_gpio_pin_state {
end = -2,
invalid = -1,
off = 0,
in_A = 1,
in_B = 2,
out_A = 3,
out_B = 4,
};
This enum is included when CONFIG_I40E is enabled. As of commit
032676ff82 ("LoongArch: Update Loongson-3 default config file"),
CONFIG_I40E is set in the defconfig, which leads to the conflict.
Renaming the local variable avoids the redefinition and allows the
build to succeed.
Suggested-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Brahmajit Das <listout@listout.xyz>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20251017171551.53142-1-listout@listout.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
We intend that EL0 exception handlers unmask all DAIF exceptions
before calling exit_to_user_mode().
When completing single-step of a suspended breakpoint, we do not call
local_daif_restore(DAIF_PROCCTX) before calling exit_to_user_mode(),
leaving all DAIF exceptions masked.
When pseudo-NMIs are not in use this is benign.
When pseudo-NMIs are in use, this is unsound. At this point interrupts
are masked by both DAIF.IF and PMR_EL1, and subsequent irq flag
manipulation may not work correctly. For example, a subsequent
local_irq_enable() within exit_to_user_mode_loop() will only unmask
interrupts via PMR_EL1 (leaving those masked via DAIF.IF), and
anything depending on interrupts being unmasked (e.g. delivery of
signals) will not work correctly.
This was detected by CONFIG_ARM64_DEBUG_PRIORITY_MASKING.
Move the call to `try_step_suspended_breakpoints()` outside of the check
so that interrupts can be unmasked even if we don't call the step handler.
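The two masking layers can be modelled as below. This is a simplified sketch:
struct irq_state and the toy_* helpers are hypothetical, and they only mimic
the behaviour described above when pseudo-NMIs are in use.

```c
#include <stdbool.h>

/* Toy model of the two masking layers used with pseudo-NMIs. */
struct irq_state {
    bool daif_if_masked; /* interrupts masked via DAIF.IF */
    bool pmr_masked;     /* interrupts masked via PMR_EL1 */
};

/* Modelled after local_irq_enable() with pseudo-NMIs: only the PMR changes. */
static void toy_local_irq_enable(struct irq_state *s)
{
    s->pmr_masked = false;
}

/* Modelled after local_daif_restore(DAIF_PROCCTX): both layers are cleared. */
static void toy_daif_restore_procctx(struct irq_state *s)
{
    s->daif_if_masked = false;
    s->pmr_masked = false;
}

/* An interrupt can be delivered only when neither layer masks it. */
static bool irqs_deliverable(const struct irq_state *s)
{
    return !s->daif_if_masked && !s->pmr_masked;
}
```

Starting from both layers masked, toy_local_irq_enable() alone leaves
interrupts undeliverable, which is the state the fix avoids.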
Fixes: 0ac7584c08 ("arm64: debug: split single stepping exception entry")
Cc: <stable@vger.kernel.org> # 6.17
Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
[catalin.marinas@arm.com: added Mark's rewritten commit log and some whitespace]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The GIC CDEOI system instruction requires the Rt field to be set to 0b11111
otherwise the instruction behaviour becomes CONSTRAINED UNPREDICTABLE.
Currently, its usage is encoded as a system register write, with a constant
0 value:
write_sysreg_s(0, GICV5_OP_GIC_CDEOI)
While compiling with GCC, the 0 constant value, through these asm
constraints and modifiers ('x' modifier and 'Z' constraint combo):
asm volatile(__msr_s(r, "%x0") : : "rZ" (__val));
forces the compiler to issue the XZR register for the MSR operation (ie
that corresponds to Rt == 0b11111) issuing the right instruction encoding.
Unfortunately LLVM does not yet understand that modifier/constraint
combo, so it ends up issuing a register other than XZR for the MSR
source, which in turn means that it encodes the GIC CDEOI instruction
wrongly and the instruction behaviour becomes CONSTRAINED UNPREDICTABLE,
which we must prevent.
Add a conditional to the write_sysreg_s() macro that detects whether it
is passed a constant 0 value and, if so, issues an MSR write with XZR as the
source register, explicitly doing what the asm modifier/constraint combo is
meant to achieve and fixing the LLVM compilation issue.
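The detection of a constant 0 can be sketched in user space without any
inline asm. pick_msr_source() is a hypothetical macro, not the kernel one; it
relies on __builtin_constant_p(), a GCC/Clang extension, and only reports
which form would be chosen.

```c
enum msr_source { SRC_XZR, SRC_GPR };

/*
 * Pick the XZR form only when the argument is a compile-time constant
 * equal to zero. __builtin_constant_p() does not evaluate its argument.
 */
#define pick_msr_source(v) \
    ((__builtin_constant_p(v) && (v) == 0) ? SRC_XZR : SRC_GPR)

/* A volatile object is never a compile-time constant: general register path. */
static enum msr_source source_for_runtime_zero(void)
{
    volatile unsigned long runtime_zero = 0;

    return pick_msr_source(runtime_zero);
}
```

A literal 0 selects the XZR form, while a value only known at run time falls
back to a general-purpose register even if it happens to be zero.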
Fixes: 7ec80fb3f0 ("irqchip/gic-v5: Add GICv5 PPI support")
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Cc: Sascha Bischoff <sascha.bischoff@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Commit 29d6d30f5c ("Btrfs: send, don't send rmdir for same target
multiple times") fixed an issue where a send stream contained a rmdir
operation for the same directory multiple times. After that fix we keep
track of the last directory for which we sent a rmdir operation and
compare with it before sending a rmdir for the parent inode of a deleted
hardlink we are processing. But there is still a corner case: in
between rmdir operations for the same inode we can find deleted hardlinks
for other parent inodes, so tracking just the last inode for which we sent
a rmdir operation is not enough.
Hardlinks of a file in the same directory are stored in the same INODE_REF
item, but if the number of hardlinks is too large and can not fit in a
leaf, we use INODE_EXTREF items to store them. The key of an INODE_EXTREF
item is (inode_id, INODE_EXTREF, hash[name, parent ino]), so between two
hardlinks for the same parent directory, we can find others for other
parent directories. For example for the reproducer below we get the
following (from a btrfs inspect-internal dump-tree output):
item 0 key (259 INODE_EXTREF 2309449) itemoff 16257 itemsize 26
index 6925 parent 257 namelen 8 name: foo.6923
item 1 key (259 INODE_EXTREF 2311350) itemoff 16231 itemsize 26
index 6588 parent 258 namelen 8 name: foo.6587
item 2 key (259 INODE_EXTREF 2457395) itemoff 16205 itemsize 26
index 6611 parent 257 namelen 8 name: foo.6609
(...)
So tracking the last directory's inode number does not work in this case
since we process a link for parent inode 257, then for 258 and then back
again for 257, and that second time we process a deleted link for 257 we
think we have not yet sent a rmdir operation.
Fix this by using a rbtree to keep track of all the directories for which
we have already sent rmdir operations, and add those directories to the
'check_dirs' ref list in process_recorded_refs() only if the directory is
not yet in the rbtree, otherwise skip it since it means we have already
sent a rmdir operation for that directory.
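The difference between the two tracking schemes can be sketched as follows.
This is a simplification: a small array with linear search stands in for the
rbtree, every deleted link is treated as triggering a rmdir, and all names
are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DIRS 16

/* Small array standing in for the rbtree; enough for a sketch. */
struct rmdir_set {
    uint64_t inos[MAX_DIRS];
    size_t count;
};

/* Returns true if ino was newly added, false if present or the set is full. */
static bool rmdir_set_add(struct rmdir_set *s, uint64_t ino)
{
    for (size_t i = 0; i < s->count; i++)
        if (s->inos[i] == ino)
            return false;
    if (s->count == MAX_DIRS)
        return false;
    s->inos[s->count++] = ino;
    return true;
}

/* Count rmdir operations when only the last directory is remembered. */
static size_t rmdirs_tracking_last(const uint64_t *parents, size_t n)
{
    uint64_t last = 0; /* 0 is assumed not to be a valid inode number */
    size_t sent = 0;

    for (size_t i = 0; i < n; i++) {
        if (parents[i] != last) {
            sent++;
            last = parents[i];
        }
    }
    return sent;
}

/* Count rmdir operations when every handled directory is remembered. */
static size_t rmdirs_tracking_all(const uint64_t *parents, size_t n)
{
    struct rmdir_set seen = { .count = 0 };
    size_t sent = 0;

    for (size_t i = 0; i < n; i++)
        if (rmdir_set_add(&seen, parents[i]))
            sent++;
    return sent;
}
```

For the parent sequence 257, 258, 257 from the dump above, remembering only
the last directory sends three rmdir operations, while remembering all of
them sends two.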
The following test script reproduces the problem:
$ cat test.sh
#!/bin/bash
DEV=/dev/sdi
MNT=/mnt/sdi
mkfs.btrfs -f $DEV
mount $DEV $MNT
mkdir $MNT/a $MNT/b
echo 123 > $MNT/a/foo
for ((i = 1; i <= 1000; i++)); do
ln $MNT/a/foo $MNT/a/foo.$i
ln $MNT/a/foo $MNT/b/foo.$i
done
btrfs subvolume snapshot -r $MNT $MNT/snap1
btrfs send $MNT/snap1 -f /tmp/base.send
rm -r $MNT/a $MNT/b
btrfs subvolume snapshot -r $MNT $MNT/snap2
btrfs send -p $MNT/snap1 $MNT/snap2 -f /tmp/incremental.send
umount $MNT
mkfs.btrfs -f $DEV
mount $DEV $MNT
btrfs receive $MNT -f /tmp/base.send
btrfs receive $MNT -f /tmp/incremental.send
rm -f /tmp/base.send /tmp/incremental.send
umount $MNT
When running it, it fails like this:
$ ./test.sh
(...)
At subvol snap1
At snapshot snap2
ERROR: rmdir o257-9-0 failed: No such file or directory
CC: <stable@vger.kernel.org>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Ting-Chang Hou <tchou@synology.com>
[ Updated changelog ]
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Pull io_uring fixes from Jens Axboe:
- Revert of a change that went into an older kernel, and which has been
reported to cause a regression for some write workloads on LVM while
a snapshot is being created
- Fix a regression from this merge window, where some compilers (and/or
certain .config options) would cause an earlier evaluation of a
dereference which would then cause a NULL pointer dereference.
I was only able to reproduce this with OPTIMIZE_FOR_SIZE=y, but David
Howells hit it with just KASAN enabled. Depending on how things
inlined, this makes sense
- Fix for a missing lock around a mem region unregistration
- Fix for ring resizing with the same placement after resize
* tag 'io_uring-6.18-20251016' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
io_uring/rw: check for NULL io_br_sel when putting a buffer
io_uring: fix unexpected placement on same size resizing
io_uring: protect mem region deregistration
Revert "io_uring/rw: drop -EOPNOTSUPP check in __io_complete_rw_common()"
Format the kernel-doc for MISC_CG_RES_TDX correctly to
avoid a kernel-doc warning:
Warning: include/linux/misc_cgroup.h:26 Enum value
'MISC_CG_RES_TDX' not described in enum 'misc_res_type'
Fixes: 7c035bea94 ("KVM: TDX: Register TDX host key IDs to cgroup misc controller")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Pull block fixes from Jens Axboe:
- NVMe pull request via Keith:
- iostats accounting fixed on multipath retries (Amit)
- secure concatenation response fixup (Martin)
- tls partial record fixup (Wilfred)
- Fix for a lockdep reported issue with the elevator lock and
blk group frozen operations
- Fix for a regression in this merge window, where updating
'nr_requests' would not do the right thing for queues with
shared tags
* tag 'block-6.18-20251016' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
nvme/tcp: handle tls partially sent records in write_space()
block: Remove elevator_lock usage from blkg_conf frozen operations
blk-mq: fix stale tag depth for shared sched tags in blk_mq_update_nr_requests()
nvme-auth: update sc_c in host response
nvme-multipath: Skip nr_active increments in RETRY disposition
Pull mmc cleanup from Ulf Hansson:
"Move rpmb_frame struct and constants to rpmb common header
This helps us to avoid sharing an immutable branch between our git
trees. I was planning to send it before rc1, but I didn't make it"
* tag 'mmc-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
rpmb: move rpmb_frame struct and constants to common header
Pull sound fixes from Takashi Iwai:
"A collection of small fixes. All changes are rather boring
device-specific fixes and quirks:
- A few fixes for missing NULL checks
- ASoC NAU8821 fixes for jack and irq handling
- Various fixes for ASoC TAS2781, IDT821034, sc8280xp, max9809x,
wcd938x, and SoundWire
- Usual HD-audio and USB-audio quirks"
* tag 'sound-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (27 commits)
ALSA: hda/realtek: Fix mute led for HP Omen 17-cb0xxx
ALSA: usb-audio: fix vendor quirk for Logitech H390
ALSA: usb-audio: add volume quirks for MS LifeChat LX-3000
ASoC: amd/sdw_utils: avoid NULL deref when devm_kasprintf() fails
ASoC: max98090/91: fixed max98091 ALSA widget powering up/down
ASoC: dt-bindings: Add compatible string fsl,imx-audio-tlv320
ASoC: codecs: wcd938x-sdw: remove redundant runtime pm calls
ASoC: sdw_utils: add rt1321 part id to codec_info_list
ALSA: usb-audio: Fix NULL pointer deference in try_to_register_card
ALSA: firewire: amdtp-stream: fix enum kernel-doc warnings
ALSA: usb-audio: add mixer_playback_min_mute quirk for Logitech H390
ASoC: nau8821: Avoid unnecessary blocking in IRQ handler
ASoC: nau8821: Add DMI quirk to bypass jack debounce circuit
ASoC: nau8821: Consistently clear interrupts before unmasking
ASoC: nau8821: Generalize helper to clear IRQ status
ASoC: nau8821: Cancel jdet_work before handling jack ejection
ASoC: codecs: Fix gain setting ranges for Renesas IDT821034 codec
ASoC: tas2781: Update ti,tas2781.yaml for adding tas58xx
ASoC: tas2781: Support more newly-released amplifiers tas58xx in the driver
ASoC: qcom: sc8280xp: Add support for QCS615
...
Pull drm fixes from Dave Airlie:
"As per usual xe/amdgpu are the leaders, with some i915 and then a
bunch of scattered fixes. There are a bunch of stability fixes for
some older amdgpu cards.
draw:
- Avoid color truncation
gpuvm:
- Avoid kernel-doc warning
sched:
- Avoid double free
i915:
- Skip GuC communication warning if reset is in progress
- Couple frontbuffer related fixes
- Deactivate PSR only on LNL and when selective fetch enabled
xe:
- Increase global invalidation timeout to handle some workloads
- Fix NPD while evicting BOs in an array of VM binds
- Fix resizable BAR to account for possibly needing to move BARs
other than the LMEMBAR
- Fix error handling in xe_migrate_init()
- Fix atomic fault handling with mixed mappings or if the page is
already in VRAM
- Enable media samplers power gating for platforms before Xe2
- Fix de-registering exec queue from GuC when unbinding
- Ensure data migration to system if indicated by madvise with SVM
- Fix kerneldoc for kunit change
- Always account for cacheline alignment on migration
- Drop bogus assertion on eviction
amdgpu:
- Backlight fix
- SI fixes
- CIK fix
- Make CE support debug only
- IP discovery fix
- Ring reset fixes
- GPUVM fault memory barrier fix
- Drop unused structures in amdgpu_drm.h
- JPEG debugfs fix
- VRAM handling fixes for GPUs without VRAM
- GC 12 MES fixes
amdkfd:
- MES fix
ast:
- Fix display output after reboot
bridge:
- lt9211: Fix version check
panthor:
- Fix MCU suspend
qaic:
- Init bootlog in correct order
- Treat remaining == 0 as error in find_and_map_user_pages()
- Lock access to DBC request queue
rockchip:
- vop2: Fix destination size in atomic check"
* tag 'drm-fixes-2025-10-17' of https://gitlab.freedesktop.org/drm/kernel: (44 commits)
drm/sched: Fix potential double free in drm_sched_job_add_resv_dependencies
drm/xe/evict: drop bogus assert
drm/xe/migrate: don't misalign current bytes
drm/xe/kunit: Fix kerneldoc for parameterized tests
drm/xe/svm: Ensure data will be migrated to system if indicated by madvise.
drm/gpuvm: Fix kernel-doc warning for drm_gpuvm_map_req.map
drm/i915/psr: Deactivate PSR only on LNL and when selective fetch enabled
drm/ast: Blank with VGACR17 sync enable, always clear VGACRB6 sync off
accel/qaic: Synchronize access to DBC request queue head & tail pointer
accel/qaic: Treat remaining == 0 as error in find_and_map_user_pages()
accel/qaic: Fix bootlog initialization ordering
drm/rockchip: vop2: use correct destination rectangle height check
drm/draw: fix color truncation in drm_draw_fill24
drm/xe/guc: Check GuC running state before deregistering exec queue
drm/xe: Enable media sampler power gating
drm/xe: Handle mixed mappings and existing VRAM on atomic faults
drm/xe/migrate: Fix an error path
drm/xe: Move rebar to be done earlier
drm/xe: Don't allow evicting of BOs in same VM in array of VM binds
drm/xe: Increase global invalidation timeout to 1000us
...
Pull i2c fixes from Wolfram Sang:
- PM cleanup after all prerequisites are merged with rc1
- usbio: missing addition after all dependencies are in
- slimpro: DT binding schema conversion
* tag 'i2c-for-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
dt-bindings: i2c: Convert apm,xgene-slimpro-i2c to DT schema
i2c: usbio: Add ACPI device-id for MTL-CVF devices
i2c: Remove redundant pm_runtime_mark_last_busy() calls
TEE QTEE fixes for v6.18
- Adds ARCH_QCOM dependency for the QTEE driver
- Fixing return values for copy_from_user() failures
- Guarding against potential off by one read
* tag 'tee-qcomtee-fixes-for-v6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jenswi/linux-tee:
tee: QCOMTEE should depend on ARCH_QCOM
tee: qcom: return -EFAULT instead of -EINVAL if copy_from_user() fails
tee: qcom: prevent potential off by one read
Due to the wider deployment of ->sync_state() support, for PM domains
for example, we are receiving reports about the sync_state() pending
message that is logged in fw_devlink_dev_sync_state(), in particular
because it is printed at the warning level, which is questionable.
Even if it certainly is useful to know that the ->sync_state() condition
could not be met, there may be nothing wrong with it. For example, a driver
may be built as a module and still be waiting to be initialized/probed. For
this reason let's move to the info level for now.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reported-by: Sebin Francis <sebin.francis@ti.com>
Reported-by: Diederik de Haas <didi.debian@cknow.org>
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Reviewed-by: Kevin Hilman <khilman@baylibre.com>
Acked-by: Saravana Kannan <saravanak@google.com>
Reviewed-by: Sebin Francis <sebin.francis@ti.com>
Tested-by: Sebin Francis <sebin.francis@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When executing kvm_riscv_vcpu_aia_has_interrupts(), the vCPU may have
migrated while the IMSIC VS-file has not been updated yet. Currently,
the HGEIP CSR has to be read from imsic->vsfile_cpu (the pCPU
before migration) via on_each_cpu_mask(), but this triggers an
IPI call, and repeated IPIs within a short period of time are expensive
on many-core systems.
A simple solution is to just let the vCPU execute and update the correct
IMSIC VS-file via kvm_riscv_vcpu_aia_imsic_update().
Fixes: 4cec89db80 ("RISC-V: KVM: Move HGEI[E|P] CSR access to IMSIC virtualization")
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Anup Patel <anup@brainfault.org>
Tested-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20251016012659.82998-1-fangyu.yu@linux.alibaba.com
Signed-off-by: Anup Patel <anup@brainfault.org>
Robert recently reported two corrupted images that can cause system
crashes, which are related to the new encoded extents introduced
in Linux 6.15:
- The first one [1] has plen != 0 (e.g. plen == 0x2000000) but
(plen & Z_EROFS_EXTENT_PLEN_MASK) == 0. It is used to represent
special extents such as sparse extents (!EROFS_MAP_MAPPED), but
previously only plen == 0 was handled;
- The second one [2] has pa 0xffffffffffdcffed and plen 0xb4000,
then "cur [0xfffffffffffff000] += bvec.bv_len [0x1000]" in
"} while ((cur += bvec.bv_len) < end);" wraps around, causing an
out-of-bound access of pcl->compressed_bvecs[] in
z_erofs_submit_queue(). EROFS only supports 48-bit physical block
addresses (up to 1EiB for 4k blocks), so add a sanity check to
enforce this.
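Both conditions can be sketched as small predicates. This is an illustration
only: TOY_PLEN_MASK is an assumed value rather than the real
Z_EROFS_EXTENT_PLEN_MASK, and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PLEN_MASK 0xffffffULL /* assumed value, for illustration only */
#define TOY_PA_BLOCK_BITS 48

/* True when the masked length is zero, i.e. no physical data is encoded. */
static bool toy_extent_is_unmapped(uint64_t plen)
{
    return (plen & TOY_PLEN_MASK) == 0;
}

/*
 * True when [pa, pa + plen) stays within 48-bit block addresses.
 * blkbits must be below 16 so that the shift stays within 64 bits.
 */
static bool toy_extent_in_range(uint64_t pa, uint64_t plen, unsigned int blkbits)
{
    uint64_t limit = 1ULL << (TOY_PA_BLOCK_BITS + blkbits);

    return pa < limit && plen <= limit - pa;
}
```

With 4k blocks (blkbits == 12) the limit is 2^60 bytes, so the second
reported image, with pa 0xffffffffffdcffed, is rejected before any offset
arithmetic can wrap around.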
Fixes: 1d191b4ca5 ("erofs: implement encoded extent metadata")
Reported-by: Robert Morris <rtm@csail.mit.edu>
Closes: https://lore.kernel.org/r/75022.1759355830@localhost [1]
Closes: https://lore.kernel.org/r/80524.1760131149@localhost [2]
Reviewed-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
On all platforms set_clock_selection() writes to a GRF register. This
requires certain clocks running and thus should happen before the
clocks are disabled.
This has been noticed on RK3576 Sige5, which hangs during system suspend
when trying to suspend the second network interface. Note, that
suspending the first interface works, because the second device ensures
that the necessary clocks for the GRF are enabled.
Cc: stable@vger.kernel.org
Fixes: 2f2b60a0ec ("net: ethernet: stmmac: dwmac-rk: Add gmac support for rk3588")
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251014-rockchip-network-clock-fix-v1-1-c257b4afdf75@collabora.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Driver Changes:
- Increase global invalidation timeout to handle some workloads
(Kenneth Graunke)
- Fix NPD while evicting BOs in an array of VM binds (Matthew Brost)
- Fix resizable BAR to account for possibly needing to move BARs other
than the LMEMBAR (Lucas De Marchi)
- Fix error handling in xe_migrate_init() (Thomas Hellström)
- Fix atomic fault handling with mixed mappings or if the page is
already in VRAM (Matthew Brost)
- Enable media samplers power gating for platforms before Xe2 (Vinay
Belgaumkar)
- Fix de-registering exec queue from GuC when unbinding (Matthew Brost)
- Ensure data migration to system if indicated by madvise with SVM
(Thomas Hellström)
- Fix kerneldoc for kunit change (Matt Roper)
- Always account for cacheline alignment on migration (Matthew Auld)
- Drop bogus assertion on eviction (Matthew Auld)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/rch735eqkmprfyutk3ux2fsqa3e5ve4p77w7a5j66qdpgyquxr@ao3wzcqtpn6s
Creating FDB entries is possible from a non-initial user namespace when
having CAP_NET_ADMIN, yet, when deleting FDB entries, processes receive
an EPERM because the capability is always checked against the initial
user namespace. This restricts the FDB management from unprivileged
containers.
Drop the netlink_capable check in rtnl_fdb_del as it was originally
dropped in c5c351088a and reintroduced in 1690be63a2 without
intention.
This patch was tested using a container on GyroidOS, where it was
possible to delete FDB entries from an unprivileged user namespace and
private network namespace.
Fixes: 1690be63a2 ("bridge: Add vlan support to static neighbors")
Reviewed-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
Tested-by: Harshal Gohel <hg@simonwunderlich.de>
Signed-off-by: Johannes Wiesböck <johannes.wiesboeck@aisec.fraunhofer.de>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20251015201548.319871-1-johannes.wiesboeck@aisec.fraunhofer.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Apply the formatting guidelines introduced in the previous commit to
make the file `rustfmt`-clean again.
Reviewed-by: Benno Lossin <lossin@kernel.org>
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
`rustfmt`, by default, formats imports in a way that is prone to conflicts
while merging and rebasing, since in some cases it condenses several
items into the same line.
For instance, Linus mentioned [1] that the following case:
use crate::{
    fmt,
    page::AsPageIter,
};
is compressed by `rustfmt` into:
use crate::{fmt, page::AsPageIter};
which is undesirable.
Similarly, `rustfmt` may put several items in the same line even if the
braces span already multiple lines, e.g.:
use kernel::{
    acpi, c_str,
    device::{property, Core},
    of, platform,
};
The options that control the formatting behavior around imports are
generally unstable, and `rustfmt` releases do not allow the use of nightly
features, unlike the compiler and other Rust tooling [2].
For the moment, we can introduce a workaround to prevent `rustfmt`
from compressing the example above -- the "trailing empty comment":
use crate::{
    fmt,
    page::AsPageIter, //
};
which is reminiscent of the trailing comma behavior in other formatters.
We already used empty comments for formatting purposes in the past,
e.g. in commit b9b701fce4 ("rust: clarify the language unstable features
in use").
In addition, `rustfmt` actually reformats with a vertical layout (i.e. it
does not put two items in the same line) when seeing such a comment,
i.e. it doesn't just preserve the formatting, which is good in the sense
that we can use it to easily reformat some imports, since it matches
the style we generally want to have.
A Git merge driver would help (suggested by Gary and Wedson), though
maintainers would need to set it up, the diffs would still be larger
and the formatting rules for imports would remain hard to predict.
Thus document the style that we will follow in the coding guidelines
by introducing a new section and explain how the trailing empty comment
works there too.
We discussed the issue with upstream Rust in our usual Rust <-> Rust
for Linux meeting [3], and there have also been a few other discussions
in parallel in issues [4][5] and Zulip [6]. We will see what happens,
but upstream Rust has already created a subteam of `rustfmt` to try
to overcome the bandwidth issue [7], which is a good signal, and some
organization work has already started (e.g. tracking issues). We will
continue our discussions with them about it.
Cc: Caleb Cartwright <caleb.cartwright@outlook.com>
Cc: Yacin Tmimi <yacintmimi@gmail.com>
Cc: Manish Goregaokar <manishsmail@gmail.com>
Cc: Deadbeef <ent3rm4n@gmail.com>
Cc: Cameron Steffen <cam.steffen94@gmail.com>
Cc: Jieyou Xu <jieyouxu@outlook.com>
Link: https://lore.kernel.org/all/CAHk-=wgO7S_FZUSBbngG5vtejWOpzDfTTBkVvP3_yjJmFddbzA@mail.gmail.com/ [1]
Link: https://github.com/rust-lang/rustfmt/issues/4884 [2]
Link: https://hackmd.io/iSCyY3JTTz-g8YM-nnzTTA [3]
Link: https://github.com/rust-lang/rustfmt/issues/4991 [4]
Link: https://github.com/rust-lang/rustfmt/issues/3361 [5]
Link: https://rust-lang.zulipchat.com/#narrow/channel/392734-council/topic/rustfmt.20maintenance/near/543815381 [6]
Link: https://github.com/rust-lang/team/pull/2017 [7]
Reviewed-by: Benno Lossin <lossin@kernel.org>
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
The 'accel' parameter of mlx5e_txwqe_build_eseg_csum() and the similar
'state' parameter of mlx5e_accel_tx_ids_len() were NULL when called
from mlx5i_sq_xmit() and were causing kernel panics from that context.
Fix that by passing in a local empty mlx5e_accel_tx_state variable, thus
guaranteeing that 'accel' is never NULL. Also remove an unnecessary
check from mlx5e_tx_wqe_inline_mode().
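The pattern can be sketched as below. struct toy_accel_state and both
functions are hypothetical stand-ins with invented fields; they only show why
passing an empty local object is safer than passing NULL to a callee that
dereferences its argument.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver's accel state; fields are invented. */
struct toy_accel_state {
    bool offload_active;
    unsigned int ids_len;
};

/* Callee that dereferences its argument unconditionally. */
static unsigned int toy_accel_ids_len(const struct toy_accel_state *state)
{
    return state->offload_active ? state->ids_len : 0;
}

/* Caller with no offload context: pass an empty local instead of NULL. */
static unsigned int toy_xmit_without_offload(void)
{
    struct toy_accel_state accel = { 0 };

    return toy_accel_ids_len(&accel);
}
```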
Fixes: e5a1861a29 ("net/mlx5e: Implement PSP Tx data path")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/1760511923-890650-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some network drivers assume skb_shinfo(skb)->hwtstamps is zero after napi_get_frags().
We must clear it in napi_reuse_skb(), otherwise the following can happen:
1) A packet is received, and skb_shinfo(skb)->hwtstamps is populated
   because a bit in the receive descriptor announced hwtstamp
   availability for this packet.
2) Packet is given to the GRO layer via napi_gro_frags().
3) Packet is merged to a prior one held in GRO queues.
4) skb is saved after some cleanup in napi->skb via a call
   to napi_reuse_skb().
5) Next packet is received 10 seconds later, gets the recycled skb
   from napi_get_frags().
6) The receive descriptor does not announce hwtstamp availability.
   Driver does not clear shinfo->hwtstamps.
7) We have in shinfo->hwtstamps an old timestamp.
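
The sequence above can be reproduced with a small user-space model. This is
a sketch under simplifying assumptions: 'struct buf', driver_rx() and
reuse_buf() are hypothetical names that stand in for the skb, the driver
receive path and napi_reuse_skb().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily reduced model of an skb and its shared info. */
struct shared_info {
    uint64_t hwtstamp;
};

struct buf {
    struct shared_info shinfo;
    unsigned int len;
};

/* Driver receive path: writes the timestamp only when the descriptor
 * announces one, and assumes the field is already zero otherwise. */
static void driver_rx(struct buf *b, bool has_tstamp, uint64_t tstamp)
{
    if (has_tstamp)
        b->shinfo.hwtstamp = tstamp;
}

/* Recycle the buffer; clear_tstamp selects the fixed behaviour. */
static void reuse_buf(struct buf *b, bool clear_tstamp)
{
    b->len = 0;
    if (clear_tstamp)
        b->shinfo.hwtstamp = 0;
}

/* Timestamp seen on a second packet that carries no timestamp. */
static uint64_t second_packet_tstamp(bool clear_on_reuse)
{
    struct buf b = {0};

    driver_rx(&b, true, 1000);      /* first packet, timestamped */
    reuse_buf(&b, clear_on_reuse);  /* merged by GRO, buffer recycled */
    driver_rx(&b, false, 0);        /* second packet, no timestamp */
    return b.shinfo.hwtstamp;
}
```

Without the clear, the second packet reports the first packet's timestamp;
with it, the field is zero as the drivers expect.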
Fixes: ac45f602ee ("net: infrastructure for hardware time stamping")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20251015063221.4171986-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When building with Clang 20 or newer, there are some objtool warnings
from unexpected fallthroughs to other functions:
vmlinux.o: warning: objtool: mlx5e_mpwrq_mtts_per_wqe() falls through to next function mlx5e_mpwrq_max_num_entries()
vmlinux.o: warning: objtool: mlx5e_mpwrq_max_log_rq_size() falls through to next function mlx5e_get_linear_rq_headroom()
LLVM 20 contains an (admittedly problematic [1]) optimization [2] to
convert divide by zero into the equivalent of __builtin_unreachable(),
which invokes undefined behavior and destroys code generation when it is
encountered in a control flow graph.
mlx5e_mpwrq_umr_entry_size() returns 0 in the default case of an
unrecognized mlx5e_mpwrq_umr_mode value. mlx5e_mpwrq_mtts_per_wqe(),
which is inlined into mlx5e_mpwrq_max_log_rq_size(), uses the result of
mlx5e_mpwrq_umr_entry_size() in a divide operation without checking for
zero, so LLVM is able to infer there will be a divide by zero in this
case and invokes undefined behavior. While there is some proposed work
to isolate this undefined behavior and avoid the destructive code
generation that results in these objtool warnings, code should still be
defensive against divide by zero.
As the WARN_ONCE() implies that an invalid value should be handled
gracefully, return 1 instead of 0 in the default case so that the
result of this division operation is always valid.
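
A minimal sketch of the idea, with hypothetical names and sizes (the enum
values and the 8 and 16 byte entry sizes are assumptions for illustration,
not the mlx5e definitions):

```c
/* Hypothetical modes; the real driver enum has different members. */
enum umr_mode {
    UMR_MODE_A,
    UMR_MODE_B,
};

/* Size of one entry for a given mode. */
static unsigned int umr_entry_size(int mode)
{
    switch (mode) {
    case UMR_MODE_A:
        return 8;
    case UMR_MODE_B:
        return 16;
    default:
        /* Unrecognized mode: return 1 rather than 0 so that callers
         * dividing by the result never divide by zero. */
        return 1;
    }
}

/* Caller that divides by the entry size without checking it. */
static unsigned int entries_per_wqe(unsigned int wqe_bytes, int mode)
{
    return wqe_bytes / umr_entry_size(mode);
}
```

Because every path through umr_entry_size() now returns a non-zero value,
the compiler has no divide by zero to reason about in the caller.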
Fixes: 168723c1f8 ("net/mlx5e: xsk: Use umr_mode to calculate striding RQ parameters")
Link: https://lore.kernel.org/CAGG=3QUk8-Ak7YKnRziO4=0z=1C_7+4jF+6ZeDQ9yF+kuTOHOQ@mail.gmail.com/ [1]
Link: https://github.com/llvm/llvm-project/commit/37932643abab699e8bb1def08b7eb4eae7ff1448 [2]
Closes: https://github.com/ClangBuiltLinux/linux/issues/2131
Closes: https://github.com/ClangBuiltLinux/linux/issues/2132
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20251014-mlx5e-avoid-zero-div-from-mlx5e_mpwrq_umr_entry_size-v1-1-dc186b8819ef@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
TX frames aren't padded and unknown memory is sent into the ether.
Theoretically, it isn't even guaranteed that the extra memory exists
and can be sent out, which could cause further problems. In practice,
I found that plenty of tailroom exists in the skb itself (in my test
with ping at least) and skb_padto() easily succeeds, so use it here.
In the event of -ENOMEM drop the frame like other drivers do.
The use of one more padding byte instead of a USB zero-length packet
is retained to avoid regression. I have a dodgy Etron xHCI controller
which doesn't seem to support sending ZLPs at all.
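
The length calculation and padding can be sketched in user-space C. The
60-byte minimum frame length and the 64-byte USB packet size are
assumptions made for the example, and pad_tx_frame() is a hypothetical
helper, not the driver function; in the kernel, skb_padto() does the
zero-filling.

```c
#include <stddef.h>
#include <string.h>

#define MIN_FRAME_LEN 60 /* assumed minimum Ethernet frame, without FCS */
#define USB_PKT_SIZE  64 /* assumed bulk endpoint max packet size */

/* Returns the number of bytes to transfer, or 0 if the buffer has no
 * room for the padding (the caller would drop the frame). Padding bytes
 * are zero-filled so no stale memory is sent. */
static size_t pad_tx_frame(unsigned char *buf, size_t len, size_t cap)
{
    size_t count = len < MIN_FRAME_LEN ? MIN_FRAME_LEN : len;

    /* One extra byte instead of a zero-length packet when the transfer
     * would end exactly on a USB packet boundary. */
    if (count % USB_PKT_SIZE == 0)
        count++;

    if (count > cap)
        return 0;
    if (count > len)
        memset(buf + len, 0, count - len);
    return count;
}
```

A 42-byte frame is sent as 60 bytes with 18 zero bytes appended, and a
64-byte frame is sent as 65 bytes.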
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251014203528.3f9783c4.michal.pecio@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
amd-drm-fixes-6.18-2025-10-16:
amdgpu:
- Backlight fix
- SI fixes
- CIK fix
- Make CE support debug only
- IP discovery fix
- Ring reset fixes
- GPUVM fault memory barrier fix
- Drop unused structures in amdgpu_drm.h
- JPEG debugfs fix
- VRAM handling fixes for GPUs without VRAM
- GC 12 MES fixes
amdkfd:
- MES fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20251016132224.2534946-1-alexander.deucher@amd.com
When scheduling the deferred balance callbacks, check SCX_RQ_BAL_CB_PENDING
instead of SCX_RQ_BAL_PENDING. This way schedule_deferred() properly tests
whether there is already a pending request for queue_balance_callback() to
be invoked at the end of .balance().
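
A simplified model of why the flag under test matters. The flag names, bit
values, 'struct rq' layout and both functions below are hypothetical and
only mirror the roles described above; this is not the sched_ext code.

```c
#include <stdbool.h>

/* Hypothetical bit values for two similarly named flags. */
enum {
    RQ_BAL_PENDING    = 1 << 0,
    RQ_BAL_CB_PENDING = 1 << 1,
};

struct rq {
    unsigned int flags;
    int callbacks_queued;
};

/* Request a deferred balance callback. The test must use the same flag
 * that is set here and consumed at the end of balance; returns false
 * when a request is already pending. */
static bool schedule_deferred(struct rq *rq)
{
    if (rq->flags & RQ_BAL_CB_PENDING)
        return false;
    rq->flags |= RQ_BAL_CB_PENDING;
    return true;
}

/* End of balance: queue the callback once if a request is pending. */
static void maybe_queue_balance_callback(struct rq *rq)
{
    if (!(rq->flags & RQ_BAL_CB_PENDING))
        return;
    rq->callbacks_queued++;
    rq->flags &= ~RQ_BAL_CB_PENDING;
}
```

Testing RQ_BAL_PENDING in schedule_deferred() instead would look at a bit
that this path never sets, so the "already pending" check could not work.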
Fixes: a8ad873113 ("sched_ext: defer queue_balance_callback() until after ops.dispatch")
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
With TLS enabled, records that are encrypted and appended to TLS TX
list can fail to see a retry if the underlying TCP socket is busy, for
example, hitting an EAGAIN from tcp_sendmsg_locked(). This is not known
to the NVMe TCP driver, as the TLS layer successfully generated a record.
Typically, the TLS write_space() callback would ensure such records are
retried, but in the NVMe TCP Host driver, write_space() invokes
nvme_tcp_write_space(). This causes a partially sent record in the TLS TX
list to time out after not being retried.
This patch fixes the above by calling queue->write_space(), which calls
into the TLS layer to retry any pending records.
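
The callback chaining can be modelled roughly as follows. All types and
functions here are hypothetical simplifications; the only assumption taken
from the text is that the queue keeps the socket's original write_space()
callback and can call it to reach the TLS layer.

```c
struct sock;
typedef void (*write_space_fn)(struct sock *sk);

/* Hypothetical, reduced socket: one callback plus two counters. */
struct sock {
    write_space_fn write_space;
    int tls_retries;
    int driver_wakeups;
};

struct queue {
    struct sock *sk;
    write_space_fn write_space; /* saved original callback */
};

/* Stand-in for the TLS layer retrying pending records. */
static void tls_write_space(struct sock *sk)
{
    sk->tls_retries++;
}

/* Stand-in for the driver's own write_space handler. */
static void driver_write_space(struct sock *sk)
{
    sk->driver_wakeups++;
}

/* Install the driver callback, remembering the one it replaces. */
static void queue_install(struct queue *q, struct sock *sk)
{
    q->sk = sk;
    q->write_space = sk->write_space;
    sk->write_space = driver_write_space;
}

/* Explicitly call the saved callback so pending records get retried. */
static void queue_retry_pending(struct queue *q)
{
    q->write_space(q->sk);
}
```

Once the driver callback is installed, the socket's write_space() no longer
reaches the TLS handler on its own; only the explicit call through the
saved pointer does.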
Fixes: be8e82caa6 ("nvme-tcp: enable TLS handshake upcall")
Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
ASoC: Fixes for v6.18
A moderately large collection of driver specific fixes, plus a few new
quirks and device IDs. The NAU8821 changes are a little large but more
in mechanical ways than in ways that are complex.