linux-stable-mirror

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2026-03-03 18:28:01 +01:00

Author	SHA1	Message	Date
Lizhi Hou	75c151ceaa	accel/amdxdna: Use a different name for latest firmware Using legacy driver with latest firmware causes a power off issue. Fix this by assigning a different filename (npu_7.sbin) to the latest firmware. The driver attempts to load the latest firmware first and falls back to the previous firmware version if loading fails. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/5009 Fixes: `f1eac46fe5` ("accel/amdxdna: Update firmware version check for latest firmware") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260225204752.2711734-1-lizhi.hou@amd.com	2026-02-25 13:51:31 -08:00
Lizhi Hou	901ec34709	accel/amdxdna: Validate command buffer payload count The count field in the command header is used to determine the valid payload size. Verify that the valid payload does not exceed the remaining buffer space. Fixes: `aac243092b` ("accel/amdxdna: Add command execution") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260219211946.1920485-1-lizhi.hou@amd.com	2026-02-23 09:24:21 -08:00
Lizhi Hou	03808abb1d	accel/amdxdna: Prevent ubuf size overflow The ubuf size calculation may overflow, resulting in an undersized allocation and possible memory corruption. Use check_add_overflow() helpers to validate the size calculation before allocation. Fixes: `bd72d4acda` ("accel/amdxdna: Support user space allocated buffer") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260217192815.1784689-1-lizhi.hou@amd.com	2026-02-23 09:24:21 -08:00
Lizhi Hou	1110a94967	accel/amdxdna: Fix out-of-bounds memset in command slot handling The remaining space in a command slot may be smaller than the size of the command header. Clearing the command header with memset() before verifying the available slot space can result in an out-of-bounds write and memory corruption. Fix this by moving the memset() call after the size validation. Fixes: `3d32eb7a5e` ("accel/amdxdna: Fix cu_idx being cleared by memset() during command setup") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260217185415.1781908-1-lizhi.hou@amd.com	2026-02-23 09:24:20 -08:00
Lizhi Hou	07efce5a66	accel/amdxdna: Fix command hang on suspended hardware context When a hardware context is suspended, the job scheduler is stopped. If a command is submitted while the context is suspended, the job is queued in the scheduler but aie2_sched_job_run() is never invoked to restart the hardware context. As a result, the command hangs. Fix this by modifying the hardware context suspend routine to keep the job scheduler running so that queued jobs can trigger context restart properly. Fixes: `aac243092b` ("accel/amdxdna: Add command execution") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260211205341.722982-1-lizhi.hou@amd.com	2026-02-23 09:24:19 -08:00
Lizhi Hou	fdb65acfe6	accel/amdxdna: Fix suspend failure after enabling turbo mode Enabling turbo mode disables hardware clock gating. Suspend requires hardware clock gating to be re-enabled, otherwise suspend will fail. Fix this by calling aie2_runtime_cfg() from aie2_hw_stop() to re-enable clock gating during suspend. Also ensure that firmware is initialized in aie2_hw_start() before modifying clock-gating settings during resume. Fixes: `f4d7b8a6bc` ("accel/amdxdna: Enhance power management settings") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260211204716.722788-1-lizhi.hou@amd.com	2026-02-23 09:24:19 -08:00
Lizhi Hou	1aa82181a3	accel/amdxdna: Fix dead lock for suspend and resume When an application issues a query IOCTL while auto suspend is running, a deadlock can occur. The query path holds dev_lock and then calls pm_runtime_resume_and_get(), which waits for the ongoing suspend to complete. Meanwhile, the suspend callback attempts to acquire dev_lock and blocks, resulting in a deadlock. Fix this by releasing dev_lock before calling pm_runtime_resume_and_get() and reacquiring it after the call completes. Also acquire dev_lock in the resume callback to keep the locking consistent. Fixes: `063db45183` ("accel/amdxdna: Enhance runtime power management") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260211204644.722758-1-lizhi.hou@amd.com	2026-02-23 09:24:17 -08:00
Mario Limonciello	57aa3917a3	accel/amdxdna: Reduce log noise during process termination During process termination, several error messages are logged that are not actual errors but expected conditions when a process is killed or interrupted. This creates unnecessary noise in the kernel log. The specific scenarios are: 1. HMM invalidation returns -ERESTARTSYS when the wait is interrupted by a signal during process cleanup. This is expected when a process is being terminated and should not be logged as an error. 2. Context destruction returns -ENODEV when the firmware or device has already stopped, which commonly occurs during cleanup if the device was already torn down. This is also an expected condition during orderly shutdown. Downgrade these expected error conditions from error level to debug level to reduce log noise while still keeping genuine errors visible. Fixes: `97f2757383` ("accel/amdxdna: Fix potential NULL pointer dereference in context cleanup") Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260210164521.1094274-3-mario.limonciello@amd.com	2026-02-23 09:24:16 -08:00
Lizhi Hou	8363c02863	accel/amdxdna: Fix crash when destroying a suspended hardware context If userspace issues an ioctl to destroy a hardware context that has already been automatically suspended, the driver may crash because the mailbox channel pointer is NULL for the suspended context. Fix this by checking the mailbox channel pointer in aie2_destroy_context() before accessing it. Fixes: `97f2757383` ("accel/amdxdna: Fix potential NULL pointer dereference in context cleanup") Reviewed-by: Karol Wachowski <karol.wachowski@linux.intel.com> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260206060306.4050531-1-lizhi.hou@amd.com	2026-02-23 09:24:15 -08:00
Lizhi Hou	c68a6af400	accel/amdxdna: Switch to always use chained command Preempt commands are only supported when submitted as chained commands. To ensure preempt support works consistently, always submit commands in chained command format. Set force_cmdlist to true so that single commands are filled using the chained command layout, enabling correct handling of preempt commands. Fixes: `3a0ff7b98a` ("accel/amdxdna: Support preemption requests") Reviewed-by: Karol Wachowski <karol.wachowski@linux.intel.com> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260206060251.4050512-1-lizhi.hou@amd.com	2026-02-23 09:24:15 -08:00
Lizhi Hou	08fe1b5166	accel/amdxdna: Remove buffer size check when creating command BO Large command buffers may be used, and they do not always need to be mapped or accessed by the driver. Performing a size check at command BO creation time unnecessarily rejects valid use cases. Remove the buffer size check from command BO creation, and defer vmap and size validation to the paths where the driver actually needs to map and access the command buffer. Fixes: `ac49797c18` ("accel/amdxdna: Add GEM buffer object management") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260206060237.4050492-1-lizhi.hou@amd.com	2026-02-23 09:23:41 -08:00
Maxime Ripard	c17ee635fd	Merge drm/drm-fixes into drm-misc-fixes 7.0-rc1 was just released, let's merge it to kick the new release cycle. Signed-off-by: Maxime Ripard <mripard@kernel.org>	2026-02-23 10:09:45 +01:00
Kees Cook	189f164e57	Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses Conversion performed via this Coccinelle script: // SPDX-License-Identifier: GPL-2.0-only // Options: --include-headers-for-types --all-includes --include-headers --keep-comments virtual patch @gfp depends on patch && !(file in "tools") && !(file in "samples")@ identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex, kzalloc_obj,kzalloc_objs,kzalloc_flex, kvmalloc_obj,kvmalloc_objs,kvmalloc_flex, kvzalloc_obj,kvzalloc_objs,kvzalloc_flex}; @@ ALLOC(... - , GFP_KERNEL ) $ make coccicheck MODE=patch COCCI=gfp.cocci Build and boot tested x86_64 with Fedora 42's GCC and Clang: Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-22 08:26:33 -08:00
Linus Torvalds	32a92f8c89	Convert more 'alloc_obj' cases to default GFP_KERNEL arguments This converts some of the visually simpler cases that have been split over multiple lines. I only did the ones that are easy to verify the resulting diff by having just that final GFP_KERNEL argument on the next line. Somebody should probably do a proper coccinelle script for this, but for me the trivial script actually resulted in an assertion failure in the middle of the script. I probably had made it a bit _too_ trivial. So after fighting that far a while I decided to just do some of the syntactically simpler cases with variations of the previous 'sed' scripts. The more syntactically complex multi-line cases would mostly really want whitespace cleanup anyway. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 20:03:00 -08:00
Linus Torvalds	323bbfcf1e	Convert 'alloc_flex' family to use the new default GFP_KERNEL argument This is the exact same thing as the 'alloc_obj()' version, only much smaller because there are a lot fewer users of the alloc_flex() interface. As with alloc_obj() version, this was done entirely with mindless brute force, using the same script, except using 'flex' in the pattern rather than 'objs'. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Linus Torvalds	bf4afc53b7	Convert 'alloc_obj' family to use the new default GFP_KERNEL argument This was done entirely with mindless brute force, using git grep -l '\<k[vmz]alloc_objs(., GFP_KERNEL)' \| xargs sed -i 's/$alloc_objs(.*$, GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Kees Cook	69050f8d6d	treewide: Replace kmalloc with kmalloc_obj for non-scalar types This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(PTR, FAM, COUNT, ...) (where TYPE may also be VAR) The resulting allocations no longer return "void ", instead returning "TYPE ". Signed-off-by: Kees Cook <kees@kernel.org>	2026-02-21 01:02:28 -08:00
Dan Carpenter	7be41fb00e	accel: ethosu: Fix shift overflow in cmd_to_addr() The "((cmd[0] & 0xff0000) << 16)" shift is zero. This was intended to be (((u64)cmd[0] & 0xff0000) << 16). Move the cast to the correct location. Fixes: `5a5e9c0228` ("accel: Add Arm Ethos-U NPU driver") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://patch.msgid.link/aQGmY64tWcwOGFP4@stanley.mountain Signed-off-by: Rob Herring (Arm) <robh@kernel.org>	2026-02-18 16:28:08 -06:00
Linus Torvalds	505d195b0f	Merge tag 'char-misc-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc/IIO driver updates from Greg KH: "Here is the big set of char/misc/iio and other smaller driver subsystem changes for 7.0-rc1. Lots of little things in here, including: - Loads of iio driver changes and updates and additions - gpib driver updates - interconnect driver updates - i3c driver updates - hwtracing (coresight and intel) driver updates - deletion of the obsolete mwave driver - binder driver updates (rust and c versions) - mhi driver updates (causing a merge conflict, see below) - mei driver updates - fsi driver updates - eeprom driver updates - lots of other small char and misc driver updates and cleanups All of these have been in linux-next for a while, with no reported issues" * tag 'char-misc-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (297 commits) mux: mmio: fix regmap leak on probe failure rust_binder: return p from rust_binder_transaction_target_node() drivers: android: binder: Update ARef imports from sync::aref rust_binder: fix needless borrow in context.rs iio: magn: mmc5633: Fix Kconfig for combination of I3C as module and driver builtin iio: sca3000: Fix a resource leak in sca3000_probe() iio: proximity: rfd77402: Add interrupt handling support iio: proximity: rfd77402: Document device private data structure iio: proximity: rfd77402: Use devm-managed mutex initialization iio: proximity: rfd77402: Use kernel helper for result polling iio: proximity: rfd77402: Align polling timeout with datasheet iio: cros_ec: Allow enabling/disabling calibration mode iio: frequency: ad9523: correct kernel-doc bad line warning iio: buffer: buffer_impl.h: fix kernel-doc warnings iio: gyro: itg3200: Fix unchecked return value in read_raw MAINTAINERS: add entry for ADE9000 driver iio: accel: sca3000: remove unused last_timestamp field iio: accel: adxl372: remove unused int2_bitmask field iio: adc: ad7766: Use iio_trigger_generic_data_rdy_poll() iio: magnetometer: Remove IRQF_ONESHOT ...	2026-02-17 09:11:04 -08:00
Lizhi Hou	69674c1c70	accel/amdxdna: Move RPM resume into job run function Currently, amdxdna_pm_resume_get() is called during job creation, and amdxdna_pm_suspend_put() is called when the hardware notifies job completion. If a job is canceled before it is run, no hardware completion notification is generated, resulting in an unbalanced runtime PM resume/suspend pair. Fix this by moving amdxdna_pm_resume_get() to the job run path, ensuring runtime PM is only resumed for jobs that are actually executed. Fixes: `063db45183` ("accel/amdxdna: Enhance runtime power management") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260204171118.3165607-1-lizhi.hou@amd.com	2026-02-04 13:08:48 -08:00
Lizhi Hou	d19d963d2a	accel/amdxdna: Fix incorrect DPM level after suspend/resume The suspend routine sets the DPM level to 0, which unintentionally overwrites the previously saved DPM level. As a result, the device always resumes with DPM level 0 instead of restoring the original value. Fix this by ensuring the suspend path does not overwrite the saved DPM level, allowing the correct DPM level to be restored during resume. Fixes: `f4d7b8a6bc` ("accel/amdxdna: Enhance power management settings") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260204171048.3165580-1-lizhi.hou@amd.com	2026-02-04 13:08:35 -08:00
Lizhi Hou	750817a7c4	accel/amdxdna: Fix incorrect error code returned for failed chain command The driver currently returns an incorrect error code when a chain command fails. In this case, ERT_CMD_STATE_ERROR is expected to be reported for failed chain commands. Fixes: `aac243092b` ("accel/amdxdna: Add command execution") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260203184037.2751889-1-lizhi.hou@amd.com	2026-02-03 11:35:53 -08:00
Lizhi Hou	b853007fdc	accel/amdxdna: Remove hardware context status One newly supported command does not require hardware context configuration to be performed upfront. As a result, checking hardware context status causes this command to fail incorrectly. Remove hardware context status handling entirely. For other commands, if userspace submits a request without configuring the hardware context first, the firmware will report an error or time out as appropriate. Fixes: `aac243092b` ("accel/amdxdna: Add command execution") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260202212450.2681273-1-lizhi.hou@amd.com	2026-02-03 09:20:24 -08:00
Zishun Yi	84dd57fb03	accel/amdxdna: Fix memory leak in amdxdna_ubuf_map The amdxdna_ubuf_map() function allocates memory for sg and internal sg table structures, but it fails to free them if subsequent operations (sg_alloc_table_from_pages or dma_map_sgtable) fail. Fixes: `bd72d4acda` ("accel/amdxdna: Support user space allocated buffer") Signed-off-by: Zishun Yi <zishun.yi.dev@gmail.com> Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> Reviewed-by: Min Ma <mamin506@gmail.com> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260129171022.68578-1-zishun.yi.dev@gmail.com	2026-01-30 11:52:59 -08:00
Lizhi Hou	f1370241fe	accel/amdxdna: Stop job scheduling across aie2_release_resource() Running jobs on a hardware context while it is in the process of releasing resources can lead to use-after-free and crashes. Fix this by stopping job scheduling before calling aie2_release_resource() and restarting it after the release completes. Additionally, aie2_sched_job_run() now checks whether the hardware context is still active. Fixes: `4fd6ca90fc` ("accel/amdxdna: Refactor hardware context destroy routine") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260130003255.2083255-1-lizhi.hou@amd.com	2026-01-30 11:52:53 -08:00
Lizhi Hou	a9162439ad	accel/amdxdna: Hold mm structure across iommu_sva_unbind_device() Some tests trigger a crash in iommu_sva_unbind_device() due to accessing iommu_mm after the associated mm structure has been freed. Fix this by taking an explicit reference to the mm structure after successfully binding the device, and releasing it only after the device is unbound. This ensures the mm remains valid for the entire SVA bind/unbind lifetime. Fixes: `be462c97b7` ("accel/amdxdna: Add hardware context") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260128002356.1858122-1-lizhi.hou@amd.com	2026-01-30 11:52:45 -08:00
Greg Kroah-Hartman	740aeaf40f	Merge tag 'mhi-for-v6.20' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mani/mhi into char-misc-next Manivannan writes: MHI Host -------- - Add support for loading dual ELF image format firmware to Qcom Trust Management Engine Lit (TME-L) supported devices like QCC2072, which require separate ELF header for SBL and WLAN firmware segments in a single firmware. - Remove the MHI auto_queue feature support. This feature was added to offload the queuing of buffers from the client drivers to the MHI stack, but it caused a lot of race over the time. So remove this feature from the QRTR client driver and also from the MHI stack/controller drivers. - Move the .probe() and .remove() callbacks from driver level to bus level. MHI Endpoint ------------ - Move the .probe() and .remove() callbacks from driver level to bus level. * tag 'mhi-for-v6.20' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mani/mhi: bus: mhi: ep: Use bus callbacks for .probe() and .remove() bus: mhi: host: Use bus callbacks for .probe() and .remove() bus: mhi: host: Drop the auto_queue support net: qrtr: Drop the MHI auto_queue feature for IPCR DL channels mhi: host: Add support for loading dual ELF image format	2026-01-20 16:40:49 +01:00
Dave Airlie	37b812b7fd	Merge tag 'drm-misc-next-2026-01-15' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for 6.20: Core Changes: - atomic: Introduce Gamma/Degamma LUT size check - gem: Fix a leak in drm_gem_get_unmapped_area - gpuvm: API sanitation for Rust bindings - panic: Few corner-cases fixes Driver Changes: - Replace system workqueue with percpu equivalent - amdxdna: Update message buffer allocation requirements, Update firmware version check - imagination: Add AM62P support - ivpu: Implement warm boot flow - rockchip: Get rid of atomic_check fixups, Add Rockchip RK3506 Support - rocket: Cleanups - bridge: - dw-hdmi-qp: Add support for HPD-less setups - panel: - mantix: Various power management related improvements - new panels: Innolux G150XGE-L05, - dma-buf: - cma: Call clear_page instead of memset Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://patch.msgid.link/20260115-lilac-dragon-of-opposition-ac0a30@houat	2026-01-16 11:04:03 +10:00
Lizhi Hou	b36178488d	accel/amdxdna: Fix notifier_wq flushing warning Create notifier_wq with WQ_MEM_RECLAIM flag to fix the possible warning. workqueue: WQ_MEM_RECLAIM amdxdna_js:drm_sched_free_job_work [gpu_sched] is flushing !WQ_MEM_RECLAIM notifier_wq:0x0 Fixes: `e486147c91` ("accel/amdxdna: Add BO import and export") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260113173624.256053-1-lizhi.hou@amd.com	2026-01-14 09:07:33 -08:00
Quentin Schulz	57d8ae1569	accel/rocket: factor out code with find_core_for_dev in rocket_remove There already is a function to return the offset of the core for a given struct device, so let's reuse that function instead of reimplementing the same logic. There's one change in behavior when a struct device is passed which doesn't match any core's. Before, we would continue through rocket_remove() but now we exit early, to match what other callers of find_core_for_dev() (rocket_device_runtime_resume/suspend()) are doing. This however should never happen. Aside from that, no intended change in behavior. Signed-off-by: Quentin Schulz <quentin.schulz@cherry.de> Reviewed-by: Tomeu Vizoso <tomeu@tomeuvizoso.net> Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net> Link: https://patch.msgid.link/20251215-rocket-reuse-find-core-v1-1-be86a1d2734c@cherry.de	2026-01-10 17:49:14 +01:00
Quentin Schulz	34f4495a7f	accel/rocket: fix unwinding in error path in rocket_probe When rocket_core_init() fails (as could be the case with EPROBE_DEFER), we need to properly unwind by decrementing the counter we just incremented and if this is the first core we failed to probe, remove the rocket DRM device with rocket_device_fini() as well. This matches the logic in rocket_remove(). Failing to properly unwind results in out-of-bounds accesses. Fixes: `0810d5ad88` ("accel/rocket: Add job submission IOCTL") Cc: stable@vger.kernel.org Signed-off-by: Quentin Schulz <quentin.schulz@cherry.de> Reviewed-by: Tomeu Vizoso <tomeu@tomeuvizoso.net> Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net> Link: https://patch.msgid.link/20251215-rocket-error-path-v1-2-eec3bf29dc3b@cherry.de	2026-01-10 17:49:14 +01:00
Quentin Schulz	f509a081f6	accel/rocket: fix unwinding in error path in rocket_core_init When rocket_job_init() is called, iommu_group_get() has already been called, therefore we should call iommu_group_put() and make the iommu_group pointer NULL. This aligns with what's done in rocket_core_fini(). If pm_runtime_resume_and_get() somehow fails, not only should rocket_job_fini() be called but we should also unwind everything done before that, that is, disable PM, put the iommu_group, NULLify it and then call rocket_job_fini(). This is exactly what's done in rocket_core_fini() so let's call that function instead of duplicating the code. Fixes: `0810d5ad88` ("accel/rocket: Add job submission IOCTL") Cc: stable@vger.kernel.org Signed-off-by: Quentin Schulz <quentin.schulz@cherry.de> Reviewed-by: Tomeu Vizoso <tomeu@tomeuvizoso.net> Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net> Link: https://patch.msgid.link/20251215-rocket-error-path-v1-1-eec3bf29dc3b@cherry.de	2026-01-10 17:49:14 +01:00
Lizhi Hou	f1eac46fe5	accel/amdxdna: Update firmware version check for latest firmware The latest firmware increases the major version number. Update aie2_check_protocol() to accept and support the new firmware version. Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20251219014356.2234241-2-lizhi.hou@amd.com	2026-01-08 09:48:43 -08:00
Lizhi Hou	1222e06934	accel/amdxdna: Update message DMA buffer allocation The latest firmware requires the message DMA buffer to - have a minimum size of 8K - use a power-of-two size - be aligned to the buffer size - not cross 64M boundary Update the buffer allocation logic to meet these requirements and support the latest firmware. Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20251219014356.2234241-1-lizhi.hou@amd.com	2026-01-08 09:48:34 -08:00
Karol Wachowski	44e4c88951	accel/ivpu: Implement warm boot flow for NPU6 and unify boot handling Starting from NPU6, the driver can pass boot parameters address through the AON retention register and toggle between cold/warm boot types using the boot_type parameter, while setting the cold boot entry point in both cases. Refactor the existing cold/warm boot handling to be consistent with the new NPU6 boot flow requirements and still maintain compatibility with older boot flows. This will allow firmware to remove support for legacy warm boot starting from NPU6. Fixes: `550f4dd2ce` ("accel/ivpu: Add support for Nova Lake's NPU") Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com> Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Reviewed-by: Andrzej Kacprowski <andrzej.kacprowski@linux.intel.com> Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com> Link: https://patch.msgid.link/20251230142116.540026-1-maciej.falkowski@linux.intel.com	2026-01-08 11:49:37 +01:00
Manivannan Sadhasivam	51731792a2	net: qrtr: Drop the MHI auto_queue feature for IPCR DL channels MHI stack offers the 'auto_queue' feature, which allows the MHI stack to auto queue the buffers for the RX path (DL channel). Though this feature simplifies the client driver design, it introduces race between the client drivers and the MHI stack. For instance, with auto_queue, the 'dl_callback' for the DL channel may get called before the client driver is fully probed. This means, by the time the dl_callback gets called, the client driver's structures might not be initialized, leading to NULL ptr dereference. Currently, the drivers have to workaround this issue by initializing the internal structures before calling mhi_prepare_for_transfer_autoqueue(). But even so, there is a chance that the client driver's internal code path may call the MHI queue APIs before mhi_prepare_for_transfer_autoqueue() is called, leading to similar NULL ptr dereference. This issue has been reported on the Qcom X1E80100 CRD machines affecting boot. So to properly fix all these races, drop the MHI 'auto_queue' feature altogether and let the client driver (QRTR) manage the RX buffers manually. In the QRTR driver, queue the RX buffers based on the ring length during probe and recycle the buffers in 'dl_callback' once they are consumed. This also warrants removing the setting of 'auto_queue' flag from controller drivers. Currently, this 'auto_queue' feature is only enabled for IPCR DL channel. So only the QRTR client driver requires the modification. Fixes: `227fee5fc9` ("bus: mhi: core: Add an API for auto queueing buffers for DL channel") Fixes: `68a838b84e` ("net: qrtr: start MHI channel after endpoit creation") Reported-by: Johan Hovold <johan@kernel.org> Closes: https://lore.kernel.org/linux-arm-msm/ZyTtVdkCCES0lkl4@hovoldconsulting.com Suggested-by: Chris Lew <quic_clew@quicinc.com> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com> Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Reviewed-by: Loic Poulain <loic.poulain@oss.qualcomm.com> Acked-by: Jeff Johnson <jjohnson@kernel.org> # drivers/net/wireless/ath/... Acked-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20251218-qrtr-fix-v2-1-c7499bfcfbe0@oss.qualcomm.com	2025-12-31 16:24:04 +05:30
Dave Airlie	7bc0f871f9	Merge tag 'drm-misc-next-2025-12-19' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for 6.20: Core Changes: - dma-buf: Add tracepoints - sched: Introduce new helpers Driver Changes: - amdxdna: Enable hardware context priority, Remove (obsolete and never public) NPU2 Support, Race condition fix - rockchip: Add RK3368 HDMI Support - rz-du: Add RZ/V2H(P) MIPI-DSI Support - panels: - st7571: Introduce SPI support - New panels: Sitronix ST7920, Samsung LTL106HL02, LG LH546WF1-ED01, HannStar HSD156JUW2 Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://patch.msgid.link/20251219-arcane-quaint-skunk-e383b0@houat	2025-12-26 19:00:41 +10:00
Dave Airlie	6c8e404891	Merge tag 'drm-misc-next-2025-12-12' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for 6.19: UAPI Changes: - panfrost: Add PANFROST_BO_SYNC ioctl - panthor: Add PANTHOR_BO_SYNC ioctl Core Changes: - atomic: Add drm_device pointer to drm_private_obj - bridge: Introduce drm_bridge_unplug, drm_bridge_enter, and drm_bridge_exit - dma-buf: Improve sg_table debugging - dma-fence: Add new helpers, and use them when needed - dp_mst: Avoid out-of-bounds access with VCPI==0 - gem: Reduce page table overhead with transparent huge pages - panic: Report invalid panic modes - sched: Add TODO entries - ttm: Various cleanups - vblank: Various refactoring and cleanups - Kconfig cleanups - Removed support for kdb Driver Changes: - amdxdna: Fix race conditions at suspend, Improve handling of zero tail pointers, Fix cu_idx being overwritten during command setup - ast: Support imported cursor buffers - - panthor: Enable timestamp propagation, Multiple improvements and fixes to improve the overall robustness, notably of the scheduler. - panels: - panel-edp: Support for CSW MNE007QB3-1, AUO B140HAN06.4, AUO B140QAX01.H Signed-off-by: Dave Airlie <airlied@redhat.com> [airlied: fix mm conflict] From: Maxime Ripard <mripard@redhat.com> Link: https://patch.msgid.link/20251212-spectacular-agama-of-abracadabra-aaef32@penduick	2025-12-26 18:15:33 +10:00
Lizhi Hou	332070795b	accel/amdxdna: Enable hardware context priority Newer firmware supports hardware context priority. Set the priority based on application input. Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20251217171719.2139025-1-lizhi.hou@amd.com	2025-12-18 10:36:44 -08:00
Lizhi Hou	7818618a09	accel/amdxdna: Enable temporal sharing only mode Newer firmware versions prefer temporal sharing only mode. In this mode, the driver no longer needs to manage AIE array column allocation. Instead, a new field, num_unused_col, is added to the hardware context creation request to specify how many columns will not be used by this hardware context. Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20251217191150.2145937-1-lizhi.hou@amd.com	2025-12-18 10:36:33 -08:00
Lizhi Hou	3ef9384103	accel/amdxdna: Remove NPU2 support NPU2 hardware was never publicly released and is now obsolete. Remove all remaining NPU2 support from the driver. Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20251217190818.2145781-1-lizhi.hou@amd.com	2025-12-18 10:36:22 -08:00
Lizhi Hou	e8c28e16c3	accel/amdxdna: Remove amdxdna_flush() amdxdna_flush() was introduced to ensure that the device does not access a process address space after it has been freed. However, this is no longer necessary because the driver now increments the mm reference count when a command is submitted and decrements it only after the command has completed. This guarantees that the process address space remains valid for the entire duration of command execution. Remove amdxdna_flush to simplify the teardown path. Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20251216031311.2033399-1-lizhi.hou@amd.com	2025-12-17 08:28:53 -08:00
Karol Wachowski	2359fe9313	accel/ivpu: Validate scatter-gather size against buffer size Validate scatter-gather table size matches buffer object size before mapping. Break mapping early if the table exceeds buffer size to prevent overwriting existing mappings. Also validate the table is not smaller than buffer size to avoid unmapped regions that trigger MMU translation faults. Log error and fail mapping operation on size mismatch to prevent data corruption from mismatched host memory locations and NPU addresses. Unmap any partially mapped buffer on failure. Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com> Link: https://patch.msgid.link/20251215070933.520377-1-karol.wachowski@linux.intel.com	2025-12-16 08:10:19 +01:00
Mario Limonciello (AMD)	7bbf6d15e9	accel/amdxdna: Block running under a hypervisor SVA support is required, which isn't configured by hypervisor solutions. Closes: https://github.com/QubesOS/qubes-issues/issues/10275 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4656 Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20251213054513.87925-1-superm1@kernel.org Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>	2025-12-15 13:00:03 -06:00
Lizhi Hou	97f2757383	accel/amdxdna: Fix potential NULL pointer dereference in context cleanup aie_destroy_context() is invoked during error handling in aie2_create_context(). However, aie_destroy_context() assumes that the context's mailbox channel pointer is non-NULL. If mailbox channel creation fails, the pointer remains NULL and calling aie_destroy_context() can lead to a NULL pointer dereference. In aie2_create_context(), replace aie_destroy_context() with a function which request firmware to remove the context created previously. Fixes: `be462c97b7` ("accel/amdxdna: Add hardware context") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20251212183244.1826318-1-lizhi.hou@amd.com	2025-12-15 09:52:59 -08:00
Lizhi Hou	343f5683cf	accel/amdxdna: Fix race where send ring appears full due to delayed head update The firmware sends a response and interrupts the driver before advancing the mailbox send ring head pointer. As a result, the driver may observe the response and attempt to send a new request before the firmware has updated the head pointer. In this window, the send ring still appears full, causing the driver to incorrectly fail the send operation. This race can be triggered more easily in a multithreaded environment, leading to unexpected and spurious "send ring full" failures. To address this, poll the send ring head pointer for up to 100us before returning a full-ring condition. This allows the firmware time to update the head pointer. Fixes: `b87f920b93` ("accel/amdxdna: Support hardware mailbox") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20251211045125.1724604-1-lizhi.hou@amd.com	2025-12-12 09:49:10 -08:00
Lizhi Hou	3d32eb7a5e	accel/amdxdna: Fix cu_idx being cleared by memset() during command setup For one command type, cu_idx is assigned before calling memset() on the command structure. This results in cu_idx being overwritten, causing the firmware to receive an incomplete or invalid command and leading to unexpected command failures. Fix this by moving the memset() call before initializing cu_idx so that all fields are populated in the correct order. Fixes: `71829d7f2f` ("accel/amdxdna: Use MSG_OP_CHAIN_EXEC_NPU when supported") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20251209211639.1636888-1-lizhi.hou@amd.com	2025-12-10 09:07:18 -08:00
Lizhi Hou	00ffe45ece	accel/amdxdna: Fix race condition when checking rpm_on When autosuspend is triggered, driver rpm_on flag is set to indicate that a suspend/resume is already in progress. However, when a userspace application submits a command during this narrow window, amdxdna_pm_resume_get() may incorrectly skip the resume operation because the rpm_on flag is still set. This results in commands being submitted while the device has not actually resumed, causing unexpected behavior. The set_dpm() is called by suspend/resume, it relied on rpm_on flag to avoid calling into rpm suspend/resume recursivly. So to fix this, remove the use of the rpm_on flag entirely. Instead, introduce aie2_pm_set_dpm() which explicitly resumes the device before invoking set_dpm(). With this change, set_dpm() is called directly inside the suspend or resume execution path. Otherwise, aie2_pm_set_dpm() is called. Fixes: `063db45183` ("accel/amdxdna: Enhance runtime power management") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20251208165356.1549237-1-lizhi.hou@amd.com	2025-12-09 11:51:03 -08:00
Lizhi Hou	cd77d5a4aa	accel/amdxdna: Fix tail-pointer polling in mailbox_get_msg() In mailbox_get_msg(), mailbox_reg_read_non_zero() is called to poll for a non-zero tail pointer. This assumed that a zero value indicates an error. However, certain corner cases legitimately produce a zero tail pointer. To handle these cases, remove mailbox_reg_read_non_zero(). The zero tail pointer will be treated as a valid rewind event. Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20251204181603.793824-1-lizhi.hou@amd.com	2025-12-08 08:29:34 -08:00
Lizhi Hou	3d3ac202c7	accel/amdxdna: Poll MPNPU_PWAITMODE after requesting firmware suspend After issuing a firmware suspend request, the driver must ensure that the suspend operation has completed before proceeding. Add polling of the MPNPU_PWAITMODE register to confirm that the firmware has fully entered the suspended state. This prevents race conditions where subsequent operations assume the firmware is idle before it has actually completed its suspend sequence. Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20251202165427.507414-1-lizhi.hou@amd.com	2025-12-02 16:31:14 -08:00

1 2 3 4 5 ...

1024 Commits