linux-stable-mirror

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2026-04-03 12:05:13 +02:00

Author	SHA1	Message	Date
Shawn Lin	76cce803bf	arm64: dts: rockchip: Fix rk3588 PCIe range mappings [ Upstream commit `46c56b7371` ] The pcie bus address should be mapped 1:1 to the cpu side MMIO address, so that there is no same address allocated from normal system memory. Otherwise it's broken if the same address assigned to the EP for DMA purpose.Fix it to sync with the vendor BSP. Fixes: `0acf4fa7f1` ("arm64: dts: rockchip: add PCIe3 support for rk3588") Fixes: `8d81b77f4c` ("arm64: dts: rockchip: add rk3588 PCIe2 support") Cc: stable@vger.kernel.org Cc: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Link: https://patch.msgid.link/1767600929-195341-2-git-send-email-shawn.lin@rock-chips.com Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:26 +01:00
Shawn Lin	497b04231b	arm64: dts: rockchip: Fix rk356x PCIe range mappings [ Upstream commit `f63ea193a4` ] The pcie bus address should be mapped 1:1 to the cpu side MMIO address, so that there is no same address allocated from normal system memory. Otherwise it's broken if the same address assigned to the EP for DMA purpose.Fix it to sync with the vendor BSP. Fixes: `568a67e742` ("arm64: dts: rockchip: Fix rk356x PCIe register and range mappings") Fixes: `66b51ea7d7` ("arm64: dts: rockchip: Add rk3568 PCIe2x1 controller") Cc: stable@vger.kernel.org Cc: Andrew Powers-Holmes <aholmes@omnom.net> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Link: https://patch.msgid.link/1767600929-195341-1-git-send-email-shawn.lin@rock-chips.com Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:26 +01:00
Jinhui Guo	48b3f08e68	iommu/vt-d: Skip dev-iotlb flush for inaccessible PCIe device without scalable mode [ Upstream commit `42662d1983` ] PCIe endpoints with ATS enabled and passed through to userspace (e.g., QEMU, DPDK) can hard-lock the host when their link drops, either by surprise removal or by a link fault. Commit `4fc82cd907` ("iommu/vt-d: Don't issue ATS Invalidation request when device is disconnected") adds pci_dev_is_disconnected() to devtlb_invalidation_with_pasid() so ATS invalidation is skipped only when the device is being safely removed, but it applies only when Intel IOMMU scalable mode is enabled. With scalable mode disabled or unsupported, a system hard-lock occurs when a PCIe endpoint's link drops because the Intel IOMMU waits indefinitely for an ATS invalidation that cannot complete. Call Trace: qi_submit_sync qi_flush_dev_iotlb __context_flush_dev_iotlb.part.0 domain_context_clear_one_cb pci_for_each_dma_alias device_block_translation blocking_domain_attach_dev iommu_deinit_device __iommu_group_remove_device iommu_release_device iommu_bus_notifier blocking_notifier_call_chain bus_notify device_del pci_remove_bus_device pci_stop_and_remove_bus_device pciehp_unconfigure_device pciehp_disable_slot pciehp_handle_presence_or_link_change pciehp_ist Commit `81e921fd32` ("iommu/vt-d: Fix NULL domain on device release") adds intel_pasid_teardown_sm_context() to intel_iommu_release_device(), which calls qi_flush_dev_iotlb() and can also hard-lock the system when a PCIe endpoint's link drops. Call Trace: qi_submit_sync qi_flush_dev_iotlb __context_flush_dev_iotlb.part.0 intel_context_flush_no_pasid device_pasid_table_teardown pci_pasid_table_teardown pci_for_each_dma_alias intel_pasid_teardown_sm_context intel_iommu_release_device iommu_deinit_device __iommu_group_remove_device iommu_release_device iommu_bus_notifier blocking_notifier_call_chain bus_notify device_del pci_remove_bus_device pci_stop_and_remove_bus_device pciehp_unconfigure_device pciehp_disable_slot pciehp_handle_presence_or_link_change pciehp_ist Sometimes the endpoint loses connection without a link-down event (e.g., due to a link fault); killing the process (virsh destroy) then hard-locks the host. Call Trace: qi_submit_sync qi_flush_dev_iotlb __context_flush_dev_iotlb.part.0 domain_context_clear_one_cb pci_for_each_dma_alias device_block_translation blocking_domain_attach_dev __iommu_attach_device __iommu_device_set_domain __iommu_group_set_domain_internal iommu_detach_group vfio_iommu_type1_detach_group vfio_group_detach_container vfio_group_fops_release __fput pci_dev_is_disconnected() only covers safe-removal paths; pci_device_is_present() tests accessibility by reading vendor/device IDs and internally calls pci_dev_is_disconnected(). On a ConnectX-5 (8 GT/s, x2) this costs ~70 µs. Since __context_flush_dev_iotlb() is only called on {attach,release}_dev paths (not hot), add pci_device_is_present() there to skip inaccessible devices and avoid the hard-lock. Fixes: `37764b952e` ("iommu/vt-d: Global devTLB flush when present context entry changed") Fixes: `81e921fd32` ("iommu/vt-d: Fix NULL domain on device release") Cc: stable@vger.kernel.org Signed-off-by: Jinhui Guo <guojinhui.liam@bytedance.com> Link: https://lore.kernel.org/r/20251211035946.2071-2-guojinhui.liam@bytedance.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:26 +01:00
Minseong Kim	9b5df5eedc	Input: synaptics_i2c - guard polling restart in resume [ Upstream commit `870c2e7cd8` ] synaptics_i2c_resume() restarts delayed work unconditionally, even when the input device is not opened. Guard the polling restart by taking the input device mutex and checking input_device_enabled() before re-queuing the delayed work. Fixes: `eef3e4cab7` ("Input: add driver for Synaptics I2C touchpad") Signed-off-by: Minseong Kim <ii4gsp@gmail.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260121063738.799967-1-ii4gsp@gmail.com Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:26 +01:00
Marco Crivellari	432612bf47	Input: synaptics_i2c - replace use of system_wq with system_dfl_wq [ Upstream commit `b3ee88e277` ] Currently if a user enqueues a work item using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use again of WORK_CPU_UNBOUND. This lack of consistency cannot be addressed without refactoring the API. This patch continues the effort to refactor worqueue APIs, which has begun with the change introducing new workqueues and a new alloc_workqueue flag: commit `128ea9f6cc` ("workqueue: Add system_percpu_wq and system_dfl_wq") commit `930c2ea566` ("workqueue: Add new WQ_PERCPU flag") This specific workload do not benefit from a per-cpu workqueue, so use the default unbound workqueue (system_dfl_wq) instead. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Link: https://patch.msgid.link/20251106141955.218911-4-marco.crivellari@suse.com Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Stable-dep-of: `870c2e7cd8` ("Input: synaptics_i2c - guard polling restart in resume") Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:26 +01:00
Marco Crivellari	c8272cfe36	workqueue: Add system_percpu_wq and system_dfl_wq [ Upstream commit `128ea9f6cc` ] Currently, if a user enqueue a work item using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use again of WORK_CPU_UNBOUND. This lack of consistentcy cannot be addressed without refactoring the API. system_wq is a per-CPU worqueue, yet nothing in its name tells about that CPU affinity constraint, which is very often not required by users. Make it clear by adding a system_percpu_wq. system_unbound_wq should be the default workqueue so as not to enforce locality constraints for random work whenever it's not required. Adding system_dfl_wq to encourage its use when unbound work should be used. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org> Stable-dep-of: `870c2e7cd8` ("Input: synaptics_i2c - guard polling restart in resume") Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:26 +01:00
Jan Kara	9eea2f57d1	ext4: always allocate blocks only from groups inode can use [ Upstream commit `4865c768b5` ] For filesystems with more than 2^32 blocks inodes using indirect block based format cannot use blocks beyond the 32-bit limit. ext4_mb_scan_groups_linear() takes care to not select these unsupported groups for such inodes however other functions selecting groups for allocation don't. So far this is harmless because the other selection functions are used only with mb_optimize_scan and this is currently disabled for inodes with indirect blocks however in the following patch we want to enable mb_optimize_scan regardless of inode format. Reviewed-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Pedro Falcato <pfalcato@suse.de> Cc: stable@kernel.org Link: https://patch.msgid.link/20260114182836.14120-3-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:25 +01:00
Baokun Li	ec23aa36dd	ext4: implement linear-like traversal across order xarrays [ Upstream commit `a3ce570a5d` ] Although we now perform ordered traversal within an xarray, this is currently limited to a single xarray. However, we have multiple such xarrays, which prevents us from guaranteeing a linear-like traversal where all groups on the right are visited before all groups on the left. For example, suppose we have 128 block groups, with a target group of 64, a target length corresponding to an order of 1, and available free groups of 16 (order 1) and group 65 (order 8): For linear traversal, when no suitable free block is found in group 64, it will search in the next block group until group 127, then start searching from 0 up to block group 63. It ensures continuous forward traversal, which is consistent with the unidirectional rotation behavior of HDD platters. Additionally, the block group lock contention during freeing block is unavoidable. The goal increasing from 0 to 64 indicates that previously scanned groups (which had no suitable free space and are likely to free blocks later) and skipped groups (which are currently in use) have newly freed some used blocks. If we allocate blocks in these groups, the probability of competing with other processes increases. For non-linear traversal, we first traverse all groups in order_1. If only group 16 has free space in this list, we first traverse [63, 128), then traverse [0, 64) to find the available group 16, and then allocate blocks in group 16. Therefore, it cannot guarantee continuous traversal in one direction, thus increasing the probability of contention. So refactor ext4_mb_scan_groups_xarray() to ext4_mb_scan_groups_xa_range() to only traverse a fixed range of groups, and move the logic for handling wrap around to the caller. The caller first iterates through all xarrays in the range [start, ngroups) and then through the range [0, start). This approach simulates a linear scan, which reduces contention between freeing blocks and allocating blocks. Assume we have the following groups, where "\|" denotes the xarray traversal start position: order_1_groups: AB \| CD order_2_groups: EF \| GH Traversal order: Before: C > D > A > B > G > H > E > F After: C > D > G > H > A > B > E > F Performance test data follows: \|CPU: Kunpeng 920 \| P80 \| P1 \| \|Memory: 512GB \|------------------------\|-------------------------\| \|960GB SSD (0.5GB/s)\| base \| patched \| base \| patched \| \|-------------------\|-------\|----------------\|--------\|----------------\| \|mb_optimize_scan=0 \| 19555 \| 20049 (+2.5%) \| 315636 \| 316724 (-0.3%) \| \|mb_optimize_scan=1 \| 15496 \| 19342 (+24.8%) \| 323569 \| 328324 (+1.4%) \| \|CPU: AMD 9654 * 2 \| P96 \| P1 \| \|Memory: 1536GB \|------------------------\|-------------------------\| \|960GB SSD (1GB/s) \| base \| patched \| base \| patched \| \|-------------------\|-------\|----------------\|--------\|----------------\| \|mb_optimize_scan=0 \| 53192 \| 52125 (-2.0%) \| 212678 \| 215136 (+1.1%) \| \|mb_optimize_scan=1 \| 37636 \| 50331 (+33.7%) \| 214189 \| 209431 (-2.2%) \| Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-18-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Stable-dep-of: `4865c768b5` ("ext4: always allocate blocks only from groups inode can use") Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:25 +01:00
Baokun Li	5abec3fee5	ext4: refactor choose group to scan group [ Upstream commit `6347558764` ] This commit converts the `choose group` logic to `scan group` using previously prepared helper functions. This allows us to leverage xarrays for ordered non-linear traversal, thereby mitigating the "bouncing" issue inherent in the `choose group` mechanism. This also decouples linear and non-linear traversals, leading to cleaner and more readable code. Key changes: * ext4_mb_choose_next_group() is refactored to ext4_mb_scan_groups(). * Replaced ext4_mb_good_group() with ext4_mb_scan_group() in non-linear traversals, and related functions now return error codes instead of group info. * Added ext4_mb_scan_groups_linear() for performing linear scans starting from a specific group for a set number of times. * Linear scans now execute up to sbi->s_mb_max_linear_groups times, so ac_groups_linear_remaining is removed as it's no longer used. * ac->ac_criteria is now used directly instead of passing cr around. Also, ac->ac_criteria is incremented directly after groups scan fails for the corresponding criteria. * Since we're now directly scanning groups instead of finding a good group then scanning, the following variables and flags are no longer needed, s_bal_cX_groups_considered is sufficient. s_bal_p2_aligned_bad_suggestions s_bal_goal_fast_bad_suggestions s_bal_best_avail_bad_suggestions EXT4_MB_CR_POWER2_ALIGNED_OPTIMIZED EXT4_MB_CR_GOAL_LEN_FAST_OPTIMIZED EXT4_MB_CR_BEST_AVAIL_LEN_OPTIMIZED Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-17-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Stable-dep-of: `4865c768b5` ("ext4: always allocate blocks only from groups inode can use") Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:25 +01:00
Baokun Li	d99d714f71	ext4: convert free groups order lists to xarrays [ Upstream commit `f7eaacbb4e` ] While traversing the list, holding a spin_lock prevents load_buddy, making direct use of ext4_try_lock_group impossible. This can lead to a bouncing scenario where spin_is_locked(grp_A) succeeds, but ext4_try_lock_group() fails, forcing the list traversal to repeatedly restart from grp_A. In contrast, linear traversal directly uses ext4_try_lock_group(), avoiding this bouncing. Therefore, we need a lockless, ordered traversal to achieve linear-like efficiency. Therefore, this commit converts both average fragment size lists and largest free order lists into ordered xarrays. In an xarray, the index represents the block group number and the value holds the block group information; a non-empty value indicates the block group's presence. While insertion and deletion complexity remain O(1), lookup complexity changes from O(1) to O(nlogn), which may slightly reduce single-threaded performance. Additionally, xarray insertions might fail, potentially due to memory allocation issues. However, since we have linear traversal as a fallback, this isn't a major problem. Therefore, we've only added a warning message for insertion failures here. A helper function ext4_mb_find_good_group_xarray() is added to find good groups in the specified xarray starting at the specified position start, and when it reaches ngroups-1, it wraps around to 0 and then to start-1. This ensures an ordered traversal within the xarray. Performance test results are as follows: Single-process operations on an empty disk show negligible impact, while multi-process workloads demonstrate a noticeable performance gain. \|CPU: Kunpeng 920 \| P80 \| P1 \| \|Memory: 512GB \|------------------------\|-------------------------\| \|960GB SSD (0.5GB/s)\| base \| patched \| base \| patched \| \|-------------------\|-------\|----------------\|--------\|----------------\| \|mb_optimize_scan=0 \| 20097 \| 19555 (-2.6%) \| 316141 \| 315636 (-0.2%) \| \|mb_optimize_scan=1 \| 13318 \| 15496 (+16.3%) \| 325273 \| 323569 (-0.5%) \| \|CPU: AMD 9654 * 2 \| P96 \| P1 \| \|Memory: 1536GB \|------------------------\|-------------------------\| \|960GB SSD (1GB/s) \| base \| patched \| base \| patched \| \|-------------------\|-------\|----------------\|--------\|----------------\| \|mb_optimize_scan=0 \| 53603 \| 53192 (-0.7%) \| 214243 \| 212678 (-0.7%) \| \|mb_optimize_scan=1 \| 20887 \| 37636 (+80.1%) \| 213632 \| 214189 (+0.2%) \| [ Applied spelling fixes per discussion on the ext4-list see thread referened in the Link tag. --tytso] Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-16-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Stable-dep-of: `4865c768b5` ("ext4: always allocate blocks only from groups inode can use") Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:25 +01:00
Baokun Li	007967dab6	ext4: factor out ext4_mb_scan_group() [ Upstream commit `9c08e42db9` ] Extract ext4_mb_scan_group() to make the code clearer and to prepare for the later conversion of 'choose group' to 'scan groups'. No functional changes. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-15-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Stable-dep-of: `4865c768b5` ("ext4: always allocate blocks only from groups inode can use") Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:25 +01:00
Baokun Li	2475d11bc4	ext4: factor out ext4_mb_might_prefetch() [ Upstream commit `5abd85f667` ] Extract ext4_mb_might_prefetch() to make the code clearer and to prepare for the later conversion of 'choose group' to 'scan groups'. No functional changes. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-14-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Stable-dep-of: `4865c768b5` ("ext4: always allocate blocks only from groups inode can use") Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:25 +01:00
Baokun Li	ca4c90dd2a	ext4: factor out __ext4_mb_scan_group() [ Upstream commit `45704f92e5` ] Extract __ext4_mb_scan_group() to make the code clearer and to prepare for the later conversion of 'choose group' to 'scan groups'. No functional changes. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-13-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Stable-dep-of: `4865c768b5` ("ext4: always allocate blocks only from groups inode can use") Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:25 +01:00
Baokun Li	f0bb92429c	ext4: add ext4_try_lock_group() to skip busy groups [ Upstream commit `e9eec6f339` ] When ext4 allocates blocks, we used to just go through the block groups one by one to find a good one. But when there are tons of block groups (like hundreds of thousands or even millions) and not many have free space (meaning they're mostly full), it takes a really long time to check them all, and performance gets bad. So, we added the "mb_optimize_scan" mount option (which is on by default now). It keeps track of some group lists, so when we need a free block, we can just grab a likely group from the right list. This saves time and makes block allocation much faster. But when multiple processes or containers are doing similar things, like constantly allocating 8k blocks, they all try to use the same block group in the same list. Even just two processes doing this can cut the IOPS in half. For example, one container might do 300,000 IOPS, but if you run two at the same time, the total is only 150,000. Since we can already look at block groups in a non-linear way, the first and last groups in the same list are basically the same for finding a block right now. Therefore, add an ext4_try_lock_group() helper function to skip the current group when it is locked by another process, thereby avoiding contention with other processes. This helps ext4 make better use of having multiple block groups. Also, to make sure we don't skip all the groups that have free space when allocating blocks, we won't try to skip busy groups anymore when ac_criteria is CR_ANY_FREE. Performance test data follows: Test: Running will-it-scale/fallocate2 on CPU-bound containers. Observation: Average fallocate operations per container per second. \|CPU: Kunpeng 920 \| P80 \| \|Memory: 512GB \|-------------------------\| \|960GB SSD (0.5GB/s)\| base \| patched \| \|-------------------\|-------\|-----------------\| \|mb_optimize_scan=0 \| 2667 \| 4821 (+80.7%) \| \|mb_optimize_scan=1 \| 2643 \| 4784 (+81.0%) \| \|CPU: AMD 9654 * 2 \| P96 \| \|Memory: 1536GB \|-------------------------\| \|960GB SSD (1GB/s) \| base \| patched \| \|-------------------\|-------\|-----------------\| \|mb_optimize_scan=0 \| 3450 \| 15371 (+345%) \| \|mb_optimize_scan=1 \| 3209 \| 6101 (+90.0%) \| Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-2-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Stable-dep-of: `4865c768b5` ("ext4: always allocate blocks only from groups inode can use") Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:25 +01:00
Joonwon Kang	01d9a8c261	mailbox: Prevent out-of-bounds access in fw_mbox_index_xlate() [ Upstream commit `fcd7f96c78` ] Although it is guided that `#mbox-cells` must be at least 1, there are many instances of `#mbox-cells = <0>;` in the device tree. If that is the case and the corresponding mailbox controller does not provide `fw_xlate` and of_xlate` function pointers, `fw_mbox_index_xlate()` will be used by default and out-of-bounds accesses could occur due to lack of bounds check in that function. Cc: stable@vger.kernel.org Signed-off-by: Joonwon Kang <joonwonkang@google.com> Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:24 +01:00
Anup Patel	ac75c29c9c	mailbox: Allow controller specific mapping using fwnode [ Upstream commit `ba879dfc05` ] Introduce optional fw_node() callback which allows a mailbox controller driver to provide controller specific mapping using fwnode. The Linux OF framework already implements fwnode operations for the Linux DD framework so the fw_xlate() callback works fine with device tree as well. Acked-by: Jassi Brar <jassisinghbrar@gmail.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Anup Patel <apatel@ventanamicro.com> Link: https://lore.kernel.org/r/20250818040920.272664-6-apatel@ventanamicro.com Signed-off-by: Paul Walmsley <pjw@kernel.org> Stable-dep-of: `fcd7f96c78` ("mailbox: Prevent out-of-bounds access in fw_mbox_index_xlate()") Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:24 +01:00
Peng Fan	b981602648	mailbox: Use guard/scoped_guard for con_mutex [ Upstream commit `16da9a653c` ] Use guard and scoped_guard for con_mutex to simplify code. Signed-off-by: Peng Fan <peng.fan@nxp.com> Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com> Stable-dep-of: `fcd7f96c78` ("mailbox: Prevent out-of-bounds access in fw_mbox_index_xlate()") Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:24 +01:00
Peng Fan	186ad0c303	mailbox: Use dev_err when there is error [ Upstream commit `8da4988b6e` ] Use dev_err to show the error log instead of using dev_dbg. Signed-off-by: Peng Fan <peng.fan@nxp.com> Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com> Stable-dep-of: `fcd7f96c78` ("mailbox: Prevent out-of-bounds access in fw_mbox_index_xlate()") Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:24 +01:00
Tudor Ambarus	0469ec15c9	mailbox: remove unused header files [ Upstream commit `4de14ec76b` ] There's nothing used from these header files, remove their inclusion. Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org> Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com> Stable-dep-of: `fcd7f96c78` ("mailbox: Prevent out-of-bounds access in fw_mbox_index_xlate()") Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:24 +01:00
Tudor Ambarus	c7b57c4d29	mailbox: sort headers alphabetically [ Upstream commit `db824c1119` ] Sorting headers alphabetically helps locating duplicates, and makes it easier to figure out where to insert new headers. Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org> Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com> Stable-dep-of: `fcd7f96c78` ("mailbox: Prevent out-of-bounds access in fw_mbox_index_xlate()") Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:24 +01:00
Tudor Ambarus	492c8eb02e	mailbox: don't protect of_parse_phandle_with_args with con_mutex [ Upstream commit `8c71c61fc6` ] There are no concurrency problems if multiple consumers parse the phandle, don't gratuiously protect the parsing with the mutex used for the controllers list. Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org> Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com> Stable-dep-of: `fcd7f96c78` ("mailbox: Prevent out-of-bounds access in fw_mbox_index_xlate()") Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:24 +01:00
Zhang Yi	2920ec61c9	ext4: don't set EXT4_GET_BLOCKS_CONVERT when splitting before submitting I/O [ Upstream commit `feaf2a80e7` ] When allocating blocks during within-EOF DIO and writeback with dioread_nolock enabled, EXT4_GET_BLOCKS_PRE_IO was set to split an existing large unwritten extent. However, EXT4_GET_BLOCKS_CONVERT was set when calling ext4_split_convert_extents(), which may potentially result in stale data issues. Assume we have an unwritten extent, and then DIO writes the second half. [UUUUUUUUUUUUUUUU] on-disk extent U: unwritten extent [UUUUUUUUUUUUUUUU] extent status tree \|<- ->\| ----> dio write this range First, ext4_iomap_alloc() call ext4_map_blocks() with EXT4_GET_BLOCKS_PRE_IO, EXT4_GET_BLOCKS_UNWRIT_EXT and EXT4_GET_BLOCKS_CREATE flags set. ext4_map_blocks() find this extent and call ext4_split_convert_extents() with EXT4_GET_BLOCKS_CONVERT and the above flags set. Then, ext4_split_convert_extents() calls ext4_split_extent() with EXT4_EXT_MAY_ZEROOUT, EXT4_EXT_MARK_UNWRIT2 and EXT4_EXT_DATA_VALID2 flags set, and it calls ext4_split_extent_at() to split the second half with EXT4_EXT_DATA_VALID2, EXT4_EXT_MARK_UNWRIT1, EXT4_EXT_MAY_ZEROOUT and EXT4_EXT_MARK_UNWRIT2 flags set. However, ext4_split_extent_at() failed to insert extent since a temporary lack -ENOSPC. It zeroes out the first half but convert the entire on-disk extent to written since the EXT4_EXT_DATA_VALID2 flag set, but left the second half as unwritten in the extent status tree. [0000000000SSSSSS] data S: stale data, 0: zeroed [WWWWWWWWWWWWWWWW] on-disk extent W: written extent [WWWWWWWWWWUUUUUU] extent status tree Finally, if the DIO failed to write data to the disk, the stale data in the second half will be exposed once the cached extent entry is gone. Fix this issue by not passing EXT4_GET_BLOCKS_CONVERT when splitting an unwritten extent before submitting I/O, and make ext4_split_convert_extents() to zero out the entire extent range to zero for this case, and also mark the extent in the extent status tree for consistency. Fixes: `b8a8684502` ("ext4: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate") Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Baokun Li <libaokun1@huawei.com> Cc: stable@kernel.org Message-ID: <20251129103247.686136-4-yi.zhang@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:24 +01:00
Yang Erkun	3e00915433	ext4: correct the comments place for EXT4_EXT_MAY_ZEROOUT [ Upstream commit `cc742fd1d1` ] Move the comments just before we set EXT4_EXT_MAY_ZEROOUT in ext4_split_convert_extents. Signed-off-by: Yang Erkun <yangerkun@huawei.com> Message-ID: <20251112084538.1658232-4-yangerkun@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Stable-dep-of: `feaf2a80e7` ("ext4: don't set EXT4_GET_BLOCKS_CONVERT when splitting before submitting I/O") Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:23 +01:00
Johan Hovold	97ac11dbcb	drm/tegra: dsi: fix device leak on probe [ Upstream commit `bfef062695` ] Make sure to drop the reference taken when looking up the companion (ganged) device and its driver data during probe(). Note that holding a reference to a device does not prevent its driver data from going away so there is no point in keeping the reference. Fixes: `e94236cde4` ("drm/tegra: dsi: Add ganged mode support") Fixes: `221e3638fe` ("drm/tegra: Fix reference leak in tegra_dsi_ganged_probe") Cc: stable@vger.kernel.org # 3.19: `221e3638fe` Cc: Thierry Reding <treding@nvidia.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Thierry Reding <treding@nvidia.com> Link: https://patch.msgid.link/20251121164201.13188-1-johan@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:23 +01:00
Damien Le Moal	ce22aaed01	ata: libata-scsi: avoid Non-NCQ command starvation [ Upstream commit `0ea84089db` ] When a non-NCQ command is issued while NCQ commands are being executed, ata_scsi_qc_issue() indicates to the SCSI layer that the command issuing should be deferred by returning SCSI_MLQUEUE_XXX_BUSY. This command deferring is correct and as mandated by the ACS specifications since NCQ and non-NCQ commands cannot be mixed. However, in the case of a host adapter using multiple submission queues, when the target device is under a constant load of NCQ commands, there are no guarantees that requeueing the non-NCQ command will be executed later and it may be deferred again repeatedly as other submission queues can constantly issue NCQ commands from different CPUs ahead of the non-NCQ command. This can lead to very long delays for the execution of non-NCQ commands, and even complete starvation for these commands in the worst case scenario. Since the block layer and the SCSI layer do not distinguish between queueable (NCQ) and non queueable (non-NCQ) commands, libata-scsi SAT implementation must ensure forward progress for non-NCQ commands in the presence of NCQ command traffic. This is similar to what SAS HBAs with a hardware/firmware based SAT implementation do. Implement such forward progress guarantee by limiting requeueing of non-NCQ commands from ata_scsi_qc_issue(): when a non-NCQ command is received and NCQ commands are in-flight, do not force a requeue of the non-NCQ command by returning SCSI_MLQUEUE_XXX_BUSY and instead return 0 to indicate that the command was accepted but hold on to the qc using the new deferred_qc field of struct ata_port. This deferred qc will be issued using the work item deferred_qc_work running the function ata_scsi_deferred_qc_work() once all in-flight commands complete, which is checked with the port qc_defer() callback return value indicating that no further delay is necessary. This check is done using the helper function ata_scsi_schedule_deferred_qc() which is called from ata_scsi_qc_complete(). This thus excludes this mechanism from all internal non-NCQ commands issued by ATA EH. When a port deferred_qc is non NULL, that is, the port has a command waiting for the device queue to drain, the issuing of all incoming commands (both NCQ and non-NCQ) is deferred using the regular busy mechanism. This simplifies the code and also avoids potential denial of service problems if a user issues too many non-NCQ commands. Finally, whenever ata EH is scheduled, regardless of the reason, a deferred qc is always requeued so that it can be retried once EH completes. This is done by calling the function ata_scsi_requeue_deferred_qc() from ata_eh_set_pending(). This avoids the need for any special processing for the deferred qc in case of NCQ error, link or device reset, or device timeout. Reported-by: Xingui Yang <yangxingui@huawei.com> Reported-by: Igor Pylypiv <ipylypiv@google.com> Fixes: `bdb01301f3` ("scsi: Add host and host template flag 'host_tagset'") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Tested-by: Igor Pylypiv <ipylypiv@google.com> Tested-by: Xingui Yang <yangxingui@huawei.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:23 +01:00
Damien Le Moal	e44a354024	ata: libata: Introduce ata_port_eh_scheduled() [ Upstream commit `7aae547bbe` ] Introduce the inline helper function ata_port_eh_scheduled() to test if EH is pending (ATA_PFLAG_EH_PENDING port flag is set) or running (ATA_PFLAG_EH_IN_PROGRESS port flag is set) for a port. Use this helper in ata_port_wait_eh() and __ata_scsi_queuecmd() to replace the hardcoded port flag tests. No functional changes. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Link: https://lore.kernel.org/r/20250704104601.310643-1-dlemoal@kernel.org Signed-off-by: Niklas Cassel <cassel@kernel.org> Stable-dep-of: `0ea84089db` ("ata: libata-scsi: avoid Non-NCQ command starvation") Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:23 +01:00
Damien Le Moal	8709745762	ata: libata: Remove ATA_DFLAG_ZAC device flag [ Upstream commit `a0f26fcc38` ] The ATA device flag ATA_DFLAG_ZAC is used to indicate if a devie is a host managed or host aware zoned device. However, this flag is not used in the hot path and only used during device scanning/revalidation and for inquiry and sense SCSI command translation. Save one bit from struct ata_device flags field by replacing this flag with the internal helper function ata_dev_is_zac(). This function returns true if the device class is ATA_DEV_ZAC (host managed ZAC device case) or if its identify data reports it supports the zoned command set (host aware ZAC device case). Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Niklas Cassel <cassel@kernel.org> Stable-dep-of: `0ea84089db` ("ata: libata-scsi: avoid Non-NCQ command starvation") Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:23 +01:00
Damien Le Moal	4e0e3c784f	ata: libata-scsi: Remove struct ata_scsi_args [ Upstream commit `2365278e03` ] The data structure struct ata_scsi_args is used to pass the target ATA device, the SCSI command to simulate and the device identification data to ata_scsi_rbuf_fill() and to its actor function. This method of passing information does not improve the code in any way and in fact increases the number of pointer dereferences for no gains. Drop this data structure by modifying the interface of ata_scsi_rbuf_fill() and its actor function to take an ATA device and a SCSI command as argument. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20241022024537.251905-6-dlemoal@kernel.org Signed-off-by: Niklas Cassel <cassel@kernel.org> Stable-dep-of: `0ea84089db` ("ata: libata-scsi: avoid Non-NCQ command starvation") Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:23 +01:00
Damien Le Moal	bcb93aeed8	ata: libata-scsi: Document all VPD page inquiry actors [ Upstream commit `47000e84b3` ] Add the missing kdoc comments for the ata_scsiop_inq_XX functions used to emulate access to VPD pages. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20241022024537.251905-5-dlemoal@kernel.org Signed-off-by: Niklas Cassel <cassel@kernel.org> Stable-dep-of: `0ea84089db` ("ata: libata-scsi: avoid Non-NCQ command starvation") Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:23 +01:00
Damien Le Moal	b057df3393	ata: libata-scsi: Refactor ata_scsiop_maint_in() [ Upstream commit `4ab7bb9763` ] Move the check for MI_REPORT_SUPPORTED_OPERATION_CODES from ata_scsi_simulate() into ata_scsiop_maint_in() to simplify ata_scsi_simulate() code. Furthermore, since an rbuff fill actor function returning a non-zero value causes no data to be returned for the command, directly return an error (return 1) for invalid command formt after setting the invalid field in cdb error. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20241022024537.251905-4-dlemoal@kernel.org Signed-off-by: Niklas Cassel <cassel@kernel.org> Stable-dep-of: `0ea84089db` ("ata: libata-scsi: avoid Non-NCQ command starvation") Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:22 +01:00
Damien Le Moal	d62d732121	ata: libata-scsi: Refactor ata_scsiop_read_cap() [ Upstream commit `44bdde151a` ] Move the check for the scsi command service action being SAI_READ_CAPACITY_16 from ata_scsi_simulate() into ata_scsiop_read_cap() to simplify ata_scsi_simulate() for processing capacity reading commands (READ_CAPACITY and SERVICE_ACTION_IN_16). Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20241022024537.251905-3-dlemoal@kernel.org Signed-off-by: Niklas Cassel <cassel@kernel.org> Stable-dep-of: `0ea84089db` ("ata: libata-scsi: avoid Non-NCQ command starvation") Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:22 +01:00
Damien Le Moal	2b3717897c	ata: libata-scsi: Refactor ata_scsi_simulate() [ Upstream commit `b055e3be63` ] Factor out the code handling the INQUIRY command in ata_scsi_simulate() using the function ata_scsi_rbuf_fill() with the new actor ata_scsiop_inquiry(). This new actor function calls the existing actors to handle the standard inquiry as well as extended inquiry (VPD page access). Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20241022024537.251905-2-dlemoal@kernel.org Signed-off-by: Niklas Cassel <cassel@kernel.org> Stable-dep-of: `0ea84089db` ("ata: libata-scsi: avoid Non-NCQ command starvation") Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:22 +01:00
Sean Christopherson	2657439265	KVM: x86: Ignore -EBUSY when checking nested events from vcpu_block() [ Upstream commit `ead63640d4` ] Ignore -EBUSY when checking nested events after exiting a blocking state while L2 is active, as exiting to userspace will generate a spurious userspace exit, usually with KVM_EXIT_UNKNOWN, and likely lead to the VM's demise. Continuing with the wakeup isn't perfect either, as something has gone sideways if a vCPU is awakened in L2 with an injected event (or worse, a nested run pending), but continuing on gives the VM a decent chance of surviving without any major side effects. As explained in the Fixes commits, it _should_ be impossible for a vCPU to be put into a blocking state with an already-injected event (exception, IRQ, or NMI). Unfortunately, userspace can stuff MP_STATE and/or injected events, and thus put the vCPU into what should be an impossible state. Don't bother trying to preserve the WARN, e.g. with an anti-syzkaller Kconfig, as WARNs can (hopefully) be added in paths where _KVM_ would be violating x86 architecture, e.g. by WARNing if KVM attempts to inject an exception or interrupt while the vCPU isn't running. Cc: Alessandro Ratti <alessandro@0x65c.net> Cc: stable@vger.kernel.org Fixes: `26844fee6a` ("KVM: x86: never write to memory from kvm_vcpu_check_block()") Fixes: `45405155d8` ("KVM: x86: WARN if a vCPU gets a valid wakeup that KVM can't yet inject") Link: https://syzkaller.appspot.com/text?tag=ReproC&x=10d4261a580000 Reported-by: syzbot+1522459a74d26b0ac33a@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/671bc7a7.050a0220.455e8.022a.GAE@google.com Link: https://patch.msgid.link/20260109030657.994759-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:22 +01:00
Ricardo Ribalda	630a61552e	media: dw9714: Fix powerup sequence [ Upstream commit `401aec35ac` ] We have experienced seen multiple I2C errors while doing stress test on the module: dw9714 i2c-PRP0001:01: dw9714_vcm_resume I2C failure: -5 dw9714 i2c-PRP0001:01: I2C write fail Inspecting the powerup sequence we found that it does not match the documentation at: https://blog.arducam.com/downloads/DW9714A-DONGWOON(Autofocus_motor_manual).pdf """ (2) DW9714A requires waiting time of 12ms after power on. During this waiting time, the offset calibration of internal amplifier is operating for minimization of output offset current . """ This patch increases the powerup delay to follow the documentation. Fixes: `9d00ccabfb` ("media: i2c: dw9714: Fix occasional probe errors") Signed-off-by: Ricardo Ribalda <ribalda@chromium.org> Reviewed-by: Hans de Goede <johannes.goede@oss.qualcomm.com> Tested-by: Neil Sun <neil.sun@lcfuturecenter.com> Reported-by: Naomi Huang <naomi.huang@lcfuturecenter.com> Cc: stable@vger.kernel.org Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:22 +01:00
Matthias Fend	a4cf3497fb	media: dw9714: add support for powerdown pin [ Upstream commit `03dca18424` ] Add support for the powerdown pin (xSD), which can be used to put the VCM driver into power down mode. This is useful, for example, if the VCM driver's power supply cannot be controlled. The use of the powerdown pin is optional. Signed-off-by: Matthias Fend <matthias.fend@emfend.at> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Stable-dep-of: `401aec35ac` ("media: dw9714: Fix powerup sequence") Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:22 +01:00
Matthias Fend	1158193e76	media: dw9714: move power sequences to dedicated functions [ Upstream commit `1eefe42e9d` ] Move the power-up and power-down sequences to their own functions. This is a preparation for the upcoming powerdown pin support. Signed-off-by: Matthias Fend <matthias.fend@emfend.at> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Stable-dep-of: `401aec35ac` ("media: dw9714: Fix powerup sequence") Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:22 +01:00
Zilin Guan	3ca2f09061	media: tegra-video: Fix memory leak in __tegra_channel_try_format() [ Upstream commit `43e5302d22` ] The state object allocated by __v4l2_subdev_state_alloc() must be freed with __v4l2_subdev_state_free() when it is no longer needed. In __tegra_channel_try_format(), two error paths return directly after v4l2_subdev_call() fails, without freeing the allocated 'sd_state' object. This violates the requirement and causes a memory leak. Fix this by introducing a cleanup label and using goto statements in the error paths to ensure that __v4l2_subdev_state_free() is always called before the function returns. Fixes: `56f64b8235` ("media: tegra-video: Use zero crop settings if subdev has no get_selection") Fixes: `1ebaeb0983` ("media: tegra-video: Add support for external sensor capture") Cc: stable@vger.kernel.org Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:22 +01:00
Ilpo Järvinen	a29639a342	PCI: Use resource_set_range() that correctly sets ->end [ Upstream commit `11721c45a8` ] __pci_read_base() sets resource start and end addresses when resource is larger than 4G but pci_bus_addr_t or resource_size_t are not capable of representing 64-bit PCI addresses. This creates a problematic resource that has non-zero flags but the start and end addresses do not yield to resource size of 0 but 1. Replace custom resource addresses setup with resource_set_range() that correctly sets end address as -1 which results in resource_size() returning 0. For consistency, also use resource_set_range() in the other branch that does size based resource setup. Fixes: `23b13bc76f` ("PCI: Fail safely if we can't handle BARs larger than 4GB") Link: https://lore.kernel.org/all/20251207215359.28895-1-ansuelsmth@gmail.com/T/#m990492684913c5a158ff0e5fc90697d8ad95351b Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Cc: stable@vger.kernel.org Cc: Christian Marangi <ansuelsmth@gmail.com> Link: https://patch.msgid.link/20251208145654.5294-1-ilpo.jarvinen@linux.intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:21 +01:00
Ilpo Järvinen	889b5cb678	resource: Add resource set range and size helpers [ Upstream commit `9fb6fef0fb` ] Setting the end address for a resource with a given size lacks a helper and is therefore coded manually unlike the getter side which has a helper for resource size calculation. Also, almost all callsites that calculate the end address for a resource also set the start address right before it like this: res->start = start_addr; res->end = res->start + size - 1; Add resource_set_range(res, start_addr, size) that sets the start address and calculates the end address to simplify this often repeated fragment. Also add resource_set_size() for the cases where setting the start address of the resource is not necessary but mention in its kerneldoc that resource_set_range() is preferred when setting both addresses. Link: https://lore.kernel.org/r/20240614100606.15830-2-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Stable-dep-of: `11721c45a8` ("PCI: Use resource_set_range() that correctly sets ->end") Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:21 +01:00
Niklas Cassel	52d89a64dd	Revert "PCI: qcom: Don't wait for link if we can detect Link Up" [ Upstream commit `e9ce5b3804` ] This reverts commit `36971d6c5a`. While this fake hotplugging was a nice idea, it has shown that this feature does not handle PCIe switches correctly: pci_bus 0004:43: busn_res: can not insert [bus 43-41] under [bus 42-41] (conflicts with (null) [bus 42-41]) pci_bus 0004:43: busn_res: [bus 43-41] end is updated to 43 pci_bus 0004:43: busn_res: can not insert [bus 43] under [bus 42-41] (conflicts with (null) [bus 42-41]) pci 0004:42:00.0: devices behind bridge are unusable because [bus 43] cannot be assigned for them pci_bus 0004:44: busn_res: can not insert [bus 44-41] under [bus 42-41] (conflicts with (null) [bus 42-41]) pci_bus 0004:44: busn_res: [bus 44-41] end is updated to 44 pci_bus 0004:44: busn_res: can not insert [bus 44] under [bus 42-41] (conflicts with (null) [bus 42-41]) pci 0004:42:02.0: devices behind bridge are unusable because [bus 44] cannot be assigned for them pci_bus 0004:45: busn_res: can not insert [bus 45-41] under [bus 42-41] (conflicts with (null) [bus 42-41]) pci_bus 0004:45: busn_res: [bus 45-41] end is updated to 45 pci_bus 0004:45: busn_res: can not insert [bus 45] under [bus 42-41] (conflicts with (null) [bus 42-41]) pci 0004:42:06.0: devices behind bridge are unusable because [bus 45] cannot be assigned for them pci_bus 0004:46: busn_res: can not insert [bus 46-41] under [bus 42-41] (conflicts with (null) [bus 42-41]) pci_bus 0004:46: busn_res: [bus 46-41] end is updated to 46 pci_bus 0004:46: busn_res: can not insert [bus 46] under [bus 42-41] (conflicts with (null) [bus 42-41]) pci 0004:42:0e.0: devices behind bridge are unusable because [bus 46] cannot be assigned for them pci_bus 0004:42: busn_res: [bus 42-41] end is updated to 46 pci_bus 0004:42: busn_res: can not insert [bus 42-46] under [bus 41] (conflicts with (null) [bus 41]) pci 0004:41:00.0: devices behind bridge are unusable because [bus 42-46] cannot be assigned for them pcieport 0004:40:00.0: bridge has subordinate 41 but max busn 46 During the initial scan, PCI core doesn't see the switch and since the Root Port is not hot plug capable, the secondary bus number gets assigned as the subordinate bus number. This means, the PCI core assumes that only one bus will appear behind the Root Port since the Root Port is not hot plug capable. This works perfectly fine for PCIe endpoints connected to the Root Port, since they don't extend the bus. However, if a PCIe switch is connected, then there is a problem when the downstream busses starts showing up and the PCI core doesn't extend the subordinate bus number and bridge resources after initial scan during boot. The long term plan is to migrate this driver to the upcoming pwrctrl APIs that are supposed to handle this problem elegantly. Suggested-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Tested-by: Shawn Lin <shawn.lin@rock-chips.com> Acked-by: Shawn Lin <shawn.lin@rock-chips.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20251222064207.3246632-11-cassel@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:21 +01:00
Krishna chaitanya chundru	29ff60fda5	PCI: qcom: Don't wait for link if we can detect Link Up [ Upstream commit `36971d6c5a` ] If we have a 'global' IRQ for Link Up events, we need not wait for the link to be up during PCI initialization, which reduces startup time. Check for 'global' IRQ, and if present, set 'use_linkup_irq', so dw_pcie_host_init() doesn't wait for the link to come up. Link: https://lore.kernel.org/r/20241123-remove_wait2-v5-2-b5f9e6b794c2@quicinc.com Signed-off-by: Krishna chaitanya chundru <quic_krichai@quicinc.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Stable-dep-of: `e9ce5b3804` ("Revert "PCI: qcom: Don't wait for link if we can detect Link Up"") Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:21 +01:00
Niklas Cassel	ff6c9a40e4	Revert "PCI: dw-rockchip: Don't wait for link since we can detect Link Up" [ Upstream commit `fc6298086b` ] This reverts commit `ec9fd499b9`. While this fake hotplugging was a nice idea, it has shown that this feature does not handle PCIe switches correctly: pci_bus 0004:43: busn_res: can not insert [bus 43-41] under [bus 42-41] (conflicts with (null) [bus 42-41]) pci_bus 0004:43: busn_res: [bus 43-41] end is updated to 43 pci_bus 0004:43: busn_res: can not insert [bus 43] under [bus 42-41] (conflicts with (null) [bus 42-41]) pci 0004:42:00.0: devices behind bridge are unusable because [bus 43] cannot be assigned for them pci_bus 0004:44: busn_res: can not insert [bus 44-41] under [bus 42-41] (conflicts with (null) [bus 42-41]) pci_bus 0004:44: busn_res: [bus 44-41] end is updated to 44 pci_bus 0004:44: busn_res: can not insert [bus 44] under [bus 42-41] (conflicts with (null) [bus 42-41]) pci 0004:42:02.0: devices behind bridge are unusable because [bus 44] cannot be assigned for them pci_bus 0004:45: busn_res: can not insert [bus 45-41] under [bus 42-41] (conflicts with (null) [bus 42-41]) pci_bus 0004:45: busn_res: [bus 45-41] end is updated to 45 pci_bus 0004:45: busn_res: can not insert [bus 45] under [bus 42-41] (conflicts with (null) [bus 42-41]) pci 0004:42:06.0: devices behind bridge are unusable because [bus 45] cannot be assigned for them pci_bus 0004:46: busn_res: can not insert [bus 46-41] under [bus 42-41] (conflicts with (null) [bus 42-41]) pci_bus 0004:46: busn_res: [bus 46-41] end is updated to 46 pci_bus 0004:46: busn_res: can not insert [bus 46] under [bus 42-41] (conflicts with (null) [bus 42-41]) pci 0004:42:0e.0: devices behind bridge are unusable because [bus 46] cannot be assigned for them pci_bus 0004:42: busn_res: [bus 42-41] end is updated to 46 pci_bus 0004:42: busn_res: can not insert [bus 42-46] under [bus 41] (conflicts with (null) [bus 41]) pci 0004:41:00.0: devices behind bridge are unusable because [bus 42-46] cannot be assigned for them pcieport 0004:40:00.0: bridge has subordinate 41 but max busn 46 During the initial scan, PCI core doesn't see the switch and since the Root Port is not hot plug capable, the secondary bus number gets assigned as the subordinate bus number. This means, the PCI core assumes that only one bus will appear behind the Root Port since the Root Port is not hot plug capable. This works perfectly fine for PCIe endpoints connected to the Root Port, since they don't extend the bus. However, if a PCIe switch is connected, then there is a problem when the downstream busses starts showing up and the PCI core doesn't extend the subordinate bus number and bridge resources after initial scan during boot. The long term plan is to migrate this driver to the upcoming pwrctrl APIs that are supposed to handle this problem elegantly. Suggested-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Tested-by: Shawn Lin <shawn.lin@rock-chips.com> Acked-by: Shawn Lin <shawn.lin@rock-chips.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20251222064207.3246632-9-cassel@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:21 +01:00
Niklas Cassel	fdf5e16605	PCI: dw-rockchip: Don't wait for link since we can detect Link Up [ Upstream commit `ec9fd499b9` ] The Root Complex specific device tree binding for pcie-dw-rockchip has the 'sys' interrupt marked as required. The driver requests the 'sys' IRQ unconditionally, and errors out if not provided. Thus, we can unconditionally set 'use_linkup_irq', so dw_pcie_host_init() doesn't wait for the link to come up. This will skip the wait for link up (since the bus will be enumerated once the link up IRQ is triggered), which reduces the bootup time. Link: https://lore.kernel.org/r/20250113-rockchip-no-wait-v1-1-25417f37b92f@kernel.org Signed-off-by: Niklas Cassel <cassel@kernel.org> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Stable-dep-of: `fc6298086b` ("Revert "PCI: dw-rockchip: Don't wait for link since we can detect Link Up"") Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:21 +01:00
Johan Hovold	1f23a48ff2	memory: mtk-smi: fix device leak on larb probe [ Upstream commit `9dae65913b` ] Make sure to drop the reference taken when looking up the SMI device during larb probe on late probe failure (e.g. probe deferral) and on driver unbind. Fixes: `cc8bbe1a83` ("memory: mediatek: Add SMI driver") Fixes: `038ae37c51` ("memory: mtk-smi: add missing put_device() call in mtk_smi_device_link_common") Cc: stable@vger.kernel.org # 4.6: `038ae37c51` Cc: stable@vger.kernel.org # 4.6 Cc: Yong Wu <yong.wu@mediatek.com> Cc: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://patch.msgid.link/20251121164624.13685-3-johan@kernel.org Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:21 +01:00
Johan Hovold	984992f31c	memory: mtk-smi: fix device leaks on common probe [ Upstream commit `6cfa038bdd` ] Make sure to drop the reference taken when looking up the SMI device during common probe on late probe failure (e.g. probe deferral) and on driver unbind. Fixes: `4740475770` ("memory: mtk-smi: Add device link for smi-sub-common") Fixes: `038ae37c51` ("memory: mtk-smi: add missing put_device() call in mtk_smi_device_link_common") Cc: stable@vger.kernel.org # 5.16: `038ae37c51` Cc: stable@vger.kernel.org # 5.16 Cc: Yong Wu <yong.wu@mediatek.com> Cc: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://patch.msgid.link/20251121164624.13685-2-johan@kernel.org Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:20 +01:00
Yazen Ghannam	a7df760f70	x86/acpi/boot: Correct acpi_is_processor_usable() check again [ Upstream commit `adbf61cc47` ] ACPI v6.3 defined a new "Online Capable" MADT LAPIC flag. This bit is used in conjunction with the "Enabled" MADT LAPIC flag to determine if a CPU can be enabled/hotplugged by the OS after boot. Before the new bit was defined, the "Enabled" bit was explicitly described like this (ACPI v6.0 wording provided): "If zero, this processor is unusable, and the operating system support will not attempt to use it" This means that CPU hotplug (based on MADT) is not possible. Many BIOS implementations follow this guidance. They may include LAPIC entries in MADT for unavailable CPUs, but since these entries are marked with "Enabled=0" it is expected that the OS will completely ignore these entries. However, QEMU will do the same (include entries with "Enabled=0") for the purpose of allowing CPU hotplug within the guest. Comment from QEMU function pc_madt_cpu_entry(): /* ACPI spec says that LAPIC entry for non present * CPU may be omitted from MADT or it must be marked * as disabled. However omitting non present CPU from * MADT breaks hotplug on linux. So possible CPUs * should be put in MADT but kept disabled. */ Recent Linux topology changes broke the QEMU use case. A following fix for the QEMU use case broke bare metal topology enumeration. Rework the Linux MADT LAPIC flags check to allow the QEMU use case only for guests and to maintain the ACPI spec behavior for bare metal. Remove an unnecessary check added to fix a bare metal case introduced by the QEMU "fix". [ bp: Change logic as Michal suggested. ] [ mingo: Removed misapplied -stable tag. ] Fixes: `fed8d8773b` ("x86/acpi/boot: Correct acpi_is_processor_usable() check") Fixes: `f0551af021` ("x86/topology: Ignore non-present APIC IDs in a present package") Closes: https://lore.kernel.org/r/20251024204658.3da9bf3f.michal.pecio@gmail.com Reported-by: Michal Pecio <michal.pecio@gmail.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Michal Pecio <michal.pecio@gmail.com> Tested-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Link: https://lore.kernel.org/20251111145357.4031846-1-yazen.ghannam@amd.com Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:20 +01:00
Bjorn Helgaas	2129ce65c8	PCI: Correct PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 value [ Upstream commit `39195990e4` ] `fb82437fdd` ("PCI: Change capability register offsets to hex") incorrectly converted the PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 value from decimal 52 to hex 0x32: -#define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 52 /* v2 endpoints with link end here / +#define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 0x32 / end of v2 EPs w/ link */ This broke PCI capabilities in a VMM because subsequent ones weren't DWORD-aligned. Change PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 to the correct value of 0x34. `fb82437fdd` was from Baruch Siach <baruch@tkos.co.il>, but this was not Baruch's fault; it's a mistake I made when applying the patch. Fixes: `fb82437fdd` ("PCI: Change capability register offsets to hex") Reported-by: David Woodhouse <dwmw2@infradead.org> Closes: https://lore.kernel.org/all/3ae392a0158e9d9ab09a1d42150429dd8ca42791.camel@infradead.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:20 +01:00
Kohei Enju	d2c31d8e03	bpf: Fix stack-out-of-bounds write in devmap [ Upstream commit `b7bf516c3e` ] get_upper_ifindexes() iterates over all upper devices and writes their indices into an array without checking bounds. Also the callers assume that the max number of upper devices is MAX_NEST_DEV and allocate excluded_devices[1+MAX_NEST_DEV] on the stack, but that assumption is not correct and the number of upper devices could be larger than MAX_NEST_DEV (e.g., many macvlans), causing a stack-out-of-bounds write. Add a max parameter to get_upper_ifindexes() to avoid the issue. When there are too many upper devices, return -EOVERFLOW and abort the redirect. To reproduce, create more than MAX_NEST_DEV(8) macvlans on a device with an XDP program attached using BPF_F_BROADCAST \| BPF_F_EXCLUDE_INGRESS. Then send a packet to the device to trigger the XDP redirect path. Reported-by: syzbot+10cc7f13760b31bd2e61@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/698c4ce3.050a0220.340abe.000b.GAE@google.com/T/ Fixes: `aeea1b86f9` ("bpf, devmap: Exclude XDP broadcast to master device") Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Kohei Enju <kohei@enjuk.jp> Link: https://lore.kernel.org/r/20260225053506.4738-1-kohei@enjuk.jp Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:20 +01:00
Fuad Tabba	80ad264da0	bpf, arm64: Force 8-byte alignment for JIT buffer to prevent atomic tearing [ Upstream commit `ef06fd16d4` ] struct bpf_plt contains a u64 target field. Currently, the BPF JIT allocator requests an alignment of 4 bytes (sizeof(u32)) for the JIT buffer. Because the base address of the JIT buffer can be 4-byte aligned (e.g., ending in 0x4 or 0xc), the relative padding logic in build_plt() fails to ensure that target lands on an 8-byte boundary. This leads to two issues: 1. UBSAN reports misaligned-access warnings when dereferencing the structure. 2. More critically, target is updated concurrently via WRITE_ONCE() in bpf_arch_text_poke() while the JIT'd code executes ldr. On arm64, 64-bit loads/stores are only guaranteed to be single-copy atomic if they are 64-bit aligned. A misaligned target risks a torn read, causing the JIT to jump to a corrupted address. Fix this by increasing the allocation alignment requirement to 8 bytes (sizeof(u64)) in bpf_jit_binary_pack_alloc(). This anchors the base of the JIT buffer to an 8-byte boundary, allowing the relative padding math in build_plt() to correctly align the target field. Fixes: `b2ad54e153` ("bpf, arm64: Implement bpf_arch_text_poke() for arm64") Signed-off-by: Fuad Tabba <tabba@google.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20260226075525.233321-1-tabba@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:20 +01:00
Mark Harmstone	283cc83f63	btrfs: fix compat mask in error messages in btrfs_check_features() [ Upstream commit `587bb33b10` ] Commit `d7f67ac9a9` ("btrfs: relax block-group-tree feature dependency checks") introduced a regression when it comes to handling unsupported incompat or compat_ro flags. Beforehand we only printed the flags that we didn't recognize, afterwards we printed them all, which is less useful. Fix the error handling so it behaves like it used to. Fixes: `d7f67ac9a9` ("btrfs: relax block-group-tree feature dependency checks") Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Mark Harmstone <mark@harmstone.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-13 17:20:20 +01:00

... 5 6 7 8 9 ...

1326813 Commits