linux-stable-mirror

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2026-04-03 12:05:13 +02:00

Author	SHA1	Message	Date
Longfang Liu	dbf9416360	hisi_acc_vfio_pci: update status after RAS error [ Upstream commit `8be14dd48d` ] After a RAS error occurs on the accelerator device, the accelerator device will be reset. The live migration state will be abnormal after reset, and the original state needs to be restored during the reset process. Therefore, reset processing needs to be performed in a live migration scenario. Signed-off-by: Longfang Liu <liulongfang@huawei.com> Link: https://lore.kernel.org/r/20260122020205.2884497-3-liulongfang@huawei.com Signed-off-by: Alex Williamson <alex@shazbot.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-04 07:21:20 -05:00
Anthony Pighin (Nokia)	3d2b60a8d6	vfio/pci: Lock upstream bridge for vfio_pci_core_disable() [ Upstream commit `962ae6892d` ] The commit `7e89efc6e9` ("Lock upstream bridge for pci_reset_function()") added locking of the upstream bridge to the reset function. To catch paths that are not properly locked, the commit `920f646892` ("Warn on missing cfg_access_lock during secondary bus reset") added a warning if the PCI configuration space was not locked during a secondary bus reset request. When a VFIO PCI device is released from userspace ownership, an attempt to reset the PCI device function may be made. If so, and the upstream bridge is not locked, the release request results in a warning: pcieport 0000:00:00.0: unlocked secondary bus reset via: pci_reset_bus_function+0x188/0x1b8 Add missing upstream bridge locking to vfio_pci_core_disable(). Fixes: `7e89efc6e9` ("PCI: Lock upstream bridge for pci_reset_function()") Signed-off-by: Anthony Pighin <anthony.pighin@nokia.com> Link: https://lore.kernel.org/r/BN0PR08MB695171D3AB759C65B6438B5D838DA@BN0PR08MB6951.namprd08.prod.outlook.com Signed-off-by: Alex Williamson <alex@shazbot.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-04 07:20:19 -05:00
Kevin Tian	ce19b17163	vfio/pci: Disable qword access to the PCI ROM bar [ Upstream commit `dc85a46928` ] Commit `2b938e3db3` ("vfio/pci: Enable iowrite64 and ioread64 for vfio pci") enables qword access to the PCI bar resources. However, certain devices (e.g. Intel X710) have been observed to have problems with qword accesses to the ROM bar, e.g. triggering PCI AER errors. This is triggered by QEMU, which caches the ROM content by simply doing a pread() of the remaining size until it gets the full contents. The other bars would only perform operations at the same access width as their guest drivers. Instead of trying to identify all broken devices, universally disable qword access to the ROM bar, i.e. go back to the old way, which worked reliably for years. Reported-by: Farrah Chen <farrah.chen@intel.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220740 Fixes: `2b938e3db3` ("vfio/pci: Enable iowrite64 and ioread64 for vfio pci") Cc: stable@vger.kernel.org Signed-off-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Farrah Chen <farrah.chen@intel.com> Link: https://lore.kernel.org/r/20251218081650.555015-2-kevin.tian@intel.com Signed-off-by: Alex Williamson <alex@shazbot.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-01-08 10:15:05 +01:00
Zilin Guan	a09b30ddd4	vfio/pds: Fix memory leak in pds_vfio_dirty_enable() [ Upstream commit `665077d78d` ] pds_vfio_dirty_enable() allocates memory for region_info. If interval_tree_iter_first() returns NULL, the function returns -EINVAL immediately without freeing the allocated memory, causing a memory leak. Fix this by jumping to the out_free_region_info label to ensure region_info is freed. Fixes: `2e7c6feb4e` ("vfio/pds: Add multi-region support") Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Link: https://lore.kernel.org/r/20251225143150.1117366-1-zilin@seu.edu.cn Signed-off-by: Alex Williamson <alex@shazbot.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-08 10:14:35 +01:00
Alex Williamson	45d7cf5174	vfio/pci: Use RCU for error/request triggers to avoid circular locking [ Upstream commit `98693e0897` ] Thanks to a device generating an ACS violation during bus reset, lockdep reported the following circular locking issue: CPU0: SET_IRQS (MSI/X): holds igate, acquires memory_lock CPU1: HOT_RESET: holds memory_lock, acquires pci_bus_sem CPU2: AER: holds pci_bus_sem, acquires igate This results in a potential 3-way deadlock. Remove the pci_bus_sem->igate leg of the triangle by using RCU to peek at the eventfd rather than locking it with igate. Fixes: `3be3a074cf` ("vfio-pci: Don't use device_lock around AER interrupt setup") Signed-off-by: Alex Williamson <alex.williamson@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20251124223623.2770706-1-alex@shazbot.org Signed-off-by: Alex Williamson <alex@shazbot.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-18 13:55:12 +01:00
Timothy Pearson	47ed17f2cb	vfio/pci: Fix INTx handling on legacy non-PCI 2.3 devices [ Upstream commit `8b9f128947` ] PCI devices prior to PCI 2.3 both use level interrupts and do not support interrupt masking, leading to a failure when passed through to a KVM guest on at least the ppc64 platform. This failure manifests as receiving and acknowledging a single interrupt in the guest, while the device continues to assert the level interrupt indicating a need for further servicing. When lazy IRQ masking is used on DisINTx- (non-PCI 2.3) hardware, the following sequence occurs: * Level IRQ assertion on device * IRQ marked disabled in kernel * Host interrupt handler exits without clearing the interrupt on the device * Eventfd is delivered to userspace * Guest processes IRQ and clears device interrupt * Device de-asserts INTx, then re-asserts INTx while the interrupt is masked * Newly asserted interrupt acknowledged by kernel VMM without being handled * Software mask removed by VFIO driver * Device INTx still asserted, host controller does not see new edge after EOI The behavior is now platform-dependent. Some platforms (amd64) will continue to spew IRQs for as long as the INTX line remains asserted, therefore the IRQ will be handled by the host as soon as the mask is dropped. Others (ppc64) will only send the one request, and if it is not handled no further interrupts will be sent. The former behavior theoretically leaves the system vulnerable to interrupt storm, and the latter will result in the device stalling after receiving exactly one interrupt in the guest. Work around this by disabling lazy IRQ masking for DisINTx- INTx devices. Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com> Link: https://lore.kernel.org/r/333803015.1744464.1758647073336.JavaMail.zimbra@raptorengineeringinc.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-11-13 15:34:30 -05:00
Alex Mastro	76ff15eace	vfio: return -ENOTTY for unsupported device feature [ Upstream commit `16df67f218` ] The two implementers of vfio_device_ops.device_feature, vfio_cdx_ioctl_feature and vfio_pci_core_ioctl_feature, return -ENOTTY in the fallthrough case when the feature is unsupported. For consistency, the base case, vfio_ioctl_device_feature, should do the same when device_feature == NULL, indicating an implementation has no feature extensions. Signed-off-by: Alex Mastro <amastro@fb.com> Link: https://lore.kernel.org/r/20250908-vfio-enotty-v1-1-4428e1539e2e@fb.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-11-13 15:34:28 -05:00
Zilin Guan	81b43dd85c	vfio/pds: replace bitmap_free with vfree [ Upstream commit `acb59a4bb8` ] host_seq_bmp is allocated with vzalloc but is currently freed with bitmap_free, which uses kfree internally. This mismatch prevents the resource from being released properly and may result in memory leaks or other issues. Fix this by freeing host_seq_bmp with vfree to match the vzalloc allocation. Fixes: `f232836a91` ("vfio/pds: Add support for dirty page tracking") Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Link: https://lore.kernel.org/r/20250913153154.1028835-1-zilin@seu.edu.cn Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-15 12:00:14 +02:00
Artem Sadovnikov	410e899811	vfio/mlx5: fix possible overflow in tracking max message size [ Upstream commit `b306019848` ] MLX cap pg_track_log_max_msg_size consists of 5 bits, value of which is used as power of 2 for max_msg_size. This can lead to multiplication overflow between max_msg_size (u32) and integer constant, and afterwards incorrect value is being written to rq_size. Fix this issue by extending integer constant to u64 type. Found by Linux Verification Center (linuxtesting.org) with SVACE. Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Artem Sadovnikov <a.sadovnikov@ispras.ru> Reviewed-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20250701144017.2410-2-a.sadovnikov@ispras.ru Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-08-20 18:30:48 +02:00
Keith Busch	bcbad21fe9	vfio/type1: conditional rescheduling while pinning [ Upstream commit `b1779e4f20` ] A large DMA mapping request can loop through dma address pinning for many pages. In cases where THP can not be used, the repeated vmf_insert_pfn can be costly, so let the task reschedule as needed to prevent CPU stalls. Failure to do so has potentially harmful side effects, like increased memory pressure as unrelated rcu tasks are unable to make their reclaim callbacks and result in OOM conditions. rcu: INFO: rcu_sched self-detected stall on CPU rcu: 36-....: (20999 ticks this GP) idle=b01c/1/0x4000000000000000 softirq=35839/35839 fqs=3538 rcu: hardirqs softirqs csw/system rcu: number: 0 107 0 rcu: cputime: 50 0 10446 ==> 10556(ms) rcu: (t=21075 jiffies g=377761 q=204059 ncpus=384) ... <TASK> ? asm_sysvec_apic_timer_interrupt+0x16/0x20 ? walk_system_ram_range+0x63/0x120 ? walk_system_ram_range+0x46/0x120 ? pgprot_writethrough+0x20/0x20 lookup_memtype+0x67/0xf0 track_pfn_insert+0x20/0x40 vmf_insert_pfn_prot+0x88/0x140 vfio_pci_mmap_huge_fault+0xf9/0x1b0 [vfio_pci_core] __do_fault+0x28/0x1b0 handle_mm_fault+0xef1/0x2560 fixup_user_fault+0xf5/0x270 vaddr_get_pfns+0x169/0x2f0 [vfio_iommu_type1] vfio_pin_pages_remote+0x162/0x8e0 [vfio_iommu_type1] vfio_iommu_type1_ioctl+0x1121/0x1810 [vfio_iommu_type1] ? futex_wake+0x1c1/0x260 x64_sys_call+0x234/0x17a0 do_syscall_64+0x63/0x130 ? exc_page_fault+0x63/0x130 entry_SYSCALL_64_after_hwframe+0x4b/0x53 Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20250715184622.3561598-1-kbusch@meta.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-08-20 18:30:47 +02:00
Alex Williamson	fa1607f943	vfio/pci: Separate SR-IOV VF dev_set [ Upstream commit `e908f58b6b` ] In the below noted Fixes commit we introduced a reflck mutex to allow better scaling between devices for open and close. The reflck was based on the hot reset granularity, device level for root bus devices which cannot support hot reset or bus/slot reset otherwise. Overlooked in this were SR-IOV VFs, where there's also no bus reset option, but the default for a non-root-bus, non-slot-based device is bus level reflck granularity. The reflck mutex has since become the dev_set mutex (via commit `2cd8b14aaa` ("vfio/pci: Move to the device set infrastructure")) and is our defacto serialization for various operations and ioctls. It still seems to be the case though that sets of vfio-pci devices really only need serialization relative to hot resets affecting the entire set, which is not relevant to SR-IOV VFs. As described in the Closes link below, this serialization contributes to startup latency when multiple VFs sharing the same "bus" are opened concurrently. Mark the device itself as the basis of the dev_set for SR-IOV VFs. Reported-by: Aaron Lewis <aaronlewis@google.com> Closes: https://lore.kernel.org/all/20250626180424.632628-1-aaronlewis@google.com Tested-by: Aaron Lewis <aaronlewis@google.com> Fixes: `e309df5b0c` ("vfio/pci: Parallelize device open and release") Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250626225623.1180952-1-alex.williamson@redhat.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-08-15 12:14:02 +02:00
Brett Creeley	1df8150ab4	vfio/pds: Fix missing detach_ioas op [ Upstream commit `fe24d5bc63` ] When CONFIG_IOMMUFD is enabled and a device is bound to the pds_vfio_pci driver, the following WARN_ON() trace is seen and probe fails: WARNING: CPU: 0 PID: 5040 at drivers/vfio/vfio_main.c:317 __vfio_register_dev+0x130/0x140 [vfio] <...> pds_vfio_pci 0000:08:00.1: probe with driver pds_vfio_pci failed with error -22 This is because the driver's vfio_device_ops.detach_ioas isn't set. Fix this by using the generic vfio_iommufd_physical_detach_ioas function. Fixes: `38fe3975b4` ("vfio/pds: Initial support for pds VFIO driver") Signed-off-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20250702163744.69767-1-brett.creeley@amd.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-08-15 12:14:02 +02:00
Jacob Pan	12964e77c8	vfio: Prevent open_count decrement to negative [ Upstream commit `982ddd59ed` ] When vfio_df_close() is called with open_count=0, it triggers a warning in vfio_assert_device_open() but still decrements open_count to -1. This allows a subsequent open to incorrectly pass the open_count == 0 check, leading to unintended behavior, such as setting df->access_granted = true. For example, running an IOMMUFD compat no-IOMMU device with VFIO tests (https://github.com/awilliam/tests/blob/master/vfio-noiommu-pci-device-open.c) results in a warning and a failed VFIO_GROUP_GET_DEVICE_FD ioctl on the first run, but the second run succeeds incorrectly. Add checks to avoid decrementing open_count below zero. Fixes: `05f37e1c03` ("vfio: Pass struct vfio_device_file * to vfio_device_open/close()") Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com> Link: https://lore.kernel.org/r/20250618234618.1910456-2-jacob.pan@linux.microsoft.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-08-15 12:14:02 +02:00
Jacob Pan	7b2db63ad8	vfio: Fix unbalanced vfio_df_close call in no-iommu mode [ Upstream commit `b25e271b37` ] For devices with no-iommu enabled in IOMMUFD VFIO compat mode, the group open path skips vfio_df_open(), leaving open_count at 0. This causes a warning in vfio_assert_device_open(device) when vfio_df_close() is called during group close. The correct behavior is to skip only the IOMMUFD bind in the device open path for no-iommu devices. Commit `6086efe734` omitted vfio_df_open(), which was too broad. This patch restores the previous behavior, ensuring the vfio_df_open is called in the group open path. Fixes: `6086efe734` ("vfio-iommufd: Move noiommu compat validation out of vfio_iommufd_bind()") Suggested-by: Alex Williamson <alex.williamson@redhat.com> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250618234618.1910456-1-jacob.pan@linux.microsoft.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-08-15 12:14:02 +02:00
Longfang Liu	be1e0287ac	hisi_acc_vfio_pci: bugfix the problem of uninstalling driver [ Upstream commit `db6525a857` ] In a live migration scenario, if the number of VFs at the destination is greater than the source, the recovery operation will fail and qemu will not be able to complete the process and exit after shutting down the device FD. This will cause the driver to be unable to be unloaded normally due to abnormal reference counting of the live migration driver caused by the abnormal closing operation of fd. Therefore, make sure the migration file descriptor references are always released when the device is closed. Fixes: `b0eed08590` ("hisi_acc_vfio_pci: Add support for VFIO live migration") Signed-off-by: Longfang Liu <liulongfang@huawei.com> Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20250510081155.55840-5-liulongfang@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-07-10 16:05:00 +02:00
Longfang Liu	bac4641756	hisi_acc_vfio_pci: bugfix cache write-back issue [ Upstream commit `e63c466398` ] At present, cache write-back is placed in the device data copy stage after stopping the device operation. Writing back to the cache at this stage will cause the data obtained by the cache to be written back to be empty. In order to ensure that the cache data is written back successfully, the data needs to be written back into the stop device stage. Fixes: `b0eed08590` ("hisi_acc_vfio_pci: Add support for VFIO live migration") Signed-off-by: Longfang Liu <liulongfang@huawei.com> Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20250510081155.55840-4-liulongfang@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-07-10 16:05:00 +02:00
Li RongQing	46e6822489	vfio/type1: Fix error unwind in migration dirty bitmap allocation [ Upstream commit `4518e5a60c` ] When setting up dirty page tracking at the vfio IOMMU backend for device migration, if an error is encountered allocating a tracking bitmap, the unwind loop fails to free previously allocated tracking bitmaps. This occurs because the wrong loop index is used to generate the tracking object. This results in unintended memory usage for the life of the current DMA mappings where bitmaps were successfully allocated. Use the correct loop index to derive the tracking object for freeing during unwind. Fixes: `d6a4c18566` ("vfio iommu: Implementation of ioctl for dirty pages tracking") Signed-off-by: Li RongQing <lirongqing@baidu.com> Link: https://lore.kernel.org/r/20250521034647.2877-1-lirongqing@baidu.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-06-19 15:31:52 +02:00
Longfang Liu	59a834592d	hisi_acc_vfio_pci: bugfix live migration function without VF device driver [ Upstream commit `2777a40998` ] If the VF device driver is not loaded in the Guest OS and we attempt to perform device data migration, the address of the migrated data will be NULL. The live migration recovery operation on the destination side will access a null address value, which will cause access errors. Therefore, live migration of VMs without added VF device drivers does not require device data migration. In addition, when the queue address data obtained by the destination is empty, device queue recovery processing will not be performed. Fixes: `b0eed08590` ("hisi_acc_vfio_pci: Add support for VFIO live migration") Signed-off-by: Longfang Liu <liulongfang@huawei.com> Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20250510081155.55840-6-liulongfang@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-06-19 15:31:50 +02:00
Longfang Liu	89729b8152	hisi_acc_vfio_pci: add eq and aeq interruption restore [ Upstream commit `3495cec078` ] In order to ensure that the task packets of the accelerator device are not lost during the migration process, it is necessary to send an EQ and AEQ command to the device after the live migration is completed and to update the completion position of the task queue. This lets the device recheck the completed task data and, if there are uncollected packets, resend a task completion interrupt to the software. Fixes: `b0eed08590` ("hisi_acc_vfio_pci: Add support for VFIO live migration") Signed-off-by: Longfang Liu <liulongfang@huawei.com> Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20250510081155.55840-3-liulongfang@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-06-19 15:31:50 +02:00
Longfang Liu	884a76e813	hisi_acc_vfio_pci: fix XQE dma address error [ Upstream commit `8bb7170c5a` ] The dma addresses of EQE and AEQE are wrong after migration and results in guest kernel-mode encryption services failure. Comparing the definition of hardware registers, we found that there was an error when the data read from the register was combined into an address. Therefore, the address combination sequence needs to be corrected. Even after fixing the above problem, we still have an issue where the Guest from an old kernel can get migrated to new kernel and may result in wrong data. In order to ensure that the address is correct after migration, if an old magic number is detected, the dma address needs to be updated. Fixes: `b0eed08590` ("hisi_acc_vfio_pci: Add support for VFIO live migration") Signed-off-by: Longfang Liu <liulongfang@huawei.com> Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20250510081155.55840-2-liulongfang@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-06-19 15:31:50 +02:00
Alex Williamson	66e8f1d64b	vfio/pci: Handle INTx IRQ_NOTCONNECTED [ Upstream commit `860be250fc` ] Some systems report INTx as not routed by setting pdev->irq to IRQ_NOTCONNECTED, resulting in a -ENOTCONN error when trying to setup eventfd signaling. Include this in the set of conditions for which the PIN register is virtualized to zero. Additionally consolidate vfio_pci_get_irq_count() to use this virtualized value in reporting INTx support via ioctl and sanity checking ioctl paths since pdev->irq is re-used when the device is in MSI mode. The combination of these results in both the config space of the device and the ioctl interface behaving as if the device does not support INTx. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20250311230623.1264283-1-alex.williamson@redhat.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-29 11:02:15 +02:00
Alex Williamson	afa5cdce06	vfio/pci: Align huge faults to order commit `c1d9dac0db` upstream. The vfio-pci huge_fault handler doesn't make any attempt to insert a mapping containing the faulting address, it only inserts mappings if the faulting address and resulting pfn are aligned. This works in a lot of cases, particularly in conjunction with QEMU where DMA mappings linearly fault the mmap. However, there are configurations where we don't get that linear faulting and pages are faulted on-demand. The scenario reported in the bug below is such a case, where the physical address width of the CPU is greater than that of the IOMMU, resulting in a VM where guest firmware has mapped device MMIO beyond the address width of the IOMMU. In this configuration, the MMIO is faulted on demand and tracing indicates that occasionally the faults generate a VM_FAULT_OOM. Given the use case, this results in a "error: kvm run failed Bad address", killing the VM. The host is not under memory pressure in this test, therefore it's suspected that VM_FAULT_OOM is actually the result of a NULL return from __pte_offset_map_lock() in the get_locked_pte() path from insert_pfn(). This suggests a potential race inserting a pte concurrent to a pmd, and maybe indicates some deficiency in the mm layer properly handling such a case. Nevertheless, Peter noted the inconsistency of vfio-pci's huge_fault handler where our mapping granularity depends on the alignment of the faulting address relative to the order rather than aligning the faulting address to the order to more consistently insert huge mappings. This change not only uses the page tables more consistently and efficiently, but as any fault to an aligned page results in the same mapping, the race condition suspected in the VM_FAULT_OOM is avoided. Reported-by: Adolfo <adolfotregosa@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220057 Fixes: `09dfc8a5f2` ("vfio/pci: Fallback huge faults for unaligned pfn") Cc: stable@vger.kernel.org Tested-by: Adolfo <adolfotregosa@gmail.com> Co-developed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20250502224035.3183451-1-alex.williamson@redhat.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-05-18 08:24:47 +02:00
Greg Kroah-Hartman	61749c0359	Revert "vfio/platform: check the bounds of read/write syscalls" This reverts commit `61ba518195`. It had been committed multiple times to the tree, and isn't needed again. Link: https://lore.kernel.org/r/a082db2605514513a0a8568382d5bd2b6f1877a0.camel@cyberus-technology.de Reported-by: Stefan Nürnberger <stefan.nuernberger@cyberus-technology.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-02-21 14:01:47 +01:00
Ankit Agrawal	44e35bfd2e	vfio/nvgrace-gpu: Expose the blackwell device PF BAR1 to the VM [ Upstream commit `6a9eb2d125` ] There is a HW defect on Grace Hopper (GH) to support the Multi-Instance GPU (MIG) feature [1] that necessitated the presence of a 1G region carved out from the device memory and mapped as uncached. The 1G region is shown as a fake BAR (comprising region 2 and 3) to work around the issue. The Grace Blackwell systems (GB) differ from GH systems in the following aspects: 1. The aforementioned HW defect is fixed on GB systems. 2. There is a usable BAR1 (region 2 and 3) on GB systems for the GPUdirect RDMA feature [2]. This patch accommodates those GB changes by showing the 64b physical device BAR1 (region 2 and 3) to the VM instead of the fake one. This takes care of both the differences. Moreover, the entire device memory is exposed on GB as cacheable to the VM as there is no carveout required. Link: https://www.nvidia.com/en-in/technologies/multi-instance-gpu/ [1] Link: https://docs.nvidia.com/cuda/gpudirect-rdma/ [2] Cc: Kevin Tian <kevin.tian@intel.com> CC: Jason Gunthorpe <jgg@nvidia.com> Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Ankit Agrawal <ankita@nvidia.com> Link: https://lore.kernel.org/r/20250124183102.3976-3-ankita@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-21 14:01:26 +01:00
Ankit Agrawal	18457b697f	vfio/nvgrace-gpu: Read dvsec register to determine need for uncached resmem [ Upstream commit `bd53764a60` ] NVIDIA's recently introduced Grace Blackwell (GB) Superchip is a continuation of the Grace Hopper (GH) superchip that provides a cache coherent access to CPU and GPU to each other's memory with an internal proprietary chip-to-chip cache coherent interconnect. There is a HW defect on GH systems to support the Multi-Instance GPU (MIG) feature [1] that necessitated the presence of a 1G region with uncached mapping carved out from the device memory. The 1G region is shown as a fake BAR (comprising region 2 and 3) to work around the issue. This is fixed on the GB systems. The presence of the fix for the HW defect is communicated by the device firmware through the DVSEC PCI config register with ID 3. The module reads this to take a different codepath on GB vs GH. Scan through the DVSEC registers to identify the correct one and use it to determine the presence of the fix. Save the value in the device's nvgrace_gpu_pci_core_device structure. Link: https://www.nvidia.com/en-in/technologies/multi-instance-gpu/ [1] CC: Jason Gunthorpe <jgg@nvidia.com> CC: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Ankit Agrawal <ankita@nvidia.com> Link: https://lore.kernel.org/r/20250124183102.3976-2-ankita@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-21 14:01:26 +01:00
Ramesh Thomas	758a5e1dc6	vfio/pci: Enable iowrite64 and ioread64 for vfio pci [ Upstream commit `2b938e3db3` ] Definitions of ioread64 and iowrite64 macros in asm/io.h called by vfio pci implementations are enclosed inside check for CONFIG_GENERIC_IOMAP. They don't get defined if CONFIG_GENERIC_IOMAP is defined. Include linux/io-64-nonatomic-lo-hi.h to define iowrite64 and ioread64 macros when they are not defined. io-64-nonatomic-lo-hi.h maps the macros to generic implementation in lib/iomap.c. The generic implementation does 64 bit rw if readq/writeq is defined for the architecture, otherwise it would do 32 bit back to back rw. Note that there are two versions of the generic implementation that differs in the order the 32 bit words are written if 64 bit support is not present. This is not the little/big endian ordering, which is handled separately. This patch uses the lo followed by hi word ordering which is consistent with current back to back implementation in the vfio/pci code. Signed-off-by: Ramesh Thomas <ramesh.thomas@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20241210131938.303500-2-ramesh.thomas@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-21 14:01:26 +01:00
Alex Williamson	61ba518195	vfio/platform: check the bounds of read/write syscalls commit `ce9ff21ea8` upstream. count and offset are passed from user space and not checked, only offset is capped to 40 bits, which can be used to read/write out of bounds of the device. Fixes: `6e3f264560` (“vfio/platform: read and write support for the device fd”) Cc: stable@vger.kernel.org Reported-by: Mostafa Saleh <smostafa@google.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Mostafa Saleh <smostafa@google.com> Tested-by: Mostafa Saleh <smostafa@google.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-02-17 10:05:38 +01:00
Alex Williamson	a20fcaa230	vfio/platform: check the bounds of read/write syscalls commit `ce9ff21ea8` upstream. count and offset are passed from user space and not checked, only offset is capped to 40 bits, which can be used to read/write out of bounds of the device. Fixes: `6e3f264560` (“vfio/platform: read and write support for the device fd”) Cc: stable@vger.kernel.org Reported-by: Mostafa Saleh <smostafa@google.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Mostafa Saleh <smostafa@google.com> Tested-by: Mostafa Saleh <smostafa@google.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-02-01 18:39:38 +01:00
Alex Williamson	f3e8a16c23	vfio/pci: Fallback huge faults for unaligned pfn commit `09dfc8a5f2` upstream. The PFN must also be aligned to the fault order to insert a huge pfnmap. Test the alignment and fallback when unaligned. Fixes: `f9e54c3a2f` ("vfio/pci: implement huge_fault support") Link: https://bugzilla.kernel.org/show_bug.cgi?id=219619 Reported-by: Athul Krishna <athul.krishna.kr@protonmail.com> Reported-by: Precific <precification@posteo.de> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: Precific <precification@posteo.de> Link: https://lore.kernel.org/r/20250102183416.1841878-1-alex.williamson@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-01-17 13:40:45 +01:00
Yishai Hadas	ad4095d125	vfio/mlx5: Align the page tracking max message size with the device capability [ Upstream commit `9c7c5430bc` ] Align the page tracking maximum message size with the device's capability instead of relying on PAGE_SIZE. This adjustment resolves a mismatch on systems where PAGE_SIZE is 64K, but the firmware only supports a maximum message size of 4K. Now that we rely on the device's capability for max_message_size, we must account for potential future increases in its value. Key considerations include: - Supporting message sizes that exceed a single system page (e.g., an 8K message on a 4K system). - Ensuring the RQ size is adjusted to accommodate at least 4 WQEs/messages, in line with the device specification. The above has been addressed as part of the patch. Fixes: `79c3cf2799` ("vfio/mlx5: Init QP based resources for dirty tracking") Reviewed-by: Cédric Le Goater <clg@redhat.com> Tested-by: Yingshun Cui <yicui@redhat.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20241205122654.235619-1-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-14 20:04:14 +01:00
Giovanni Cabiddu	15bfacdb85	vfio/qat: fix overflow check in qat_vf_resume_write() commit `9283b73925` upstream. The unsigned variable `size_t len` is cast to the signed type `loff_t` when passed to the function check_add_overflow(). This function considers the type of the destination, which is of type loff_t (signed), potentially leading to an overflow. This issue is similar to the one described in the link below. Remove the cast. Note that even if check_add_overflow() is bypassed, by setting `len` to a value that is greater than LONG_MAX (which is considered as a negative value after the cast), the function copy_from_user(), invoked a few lines later, will not perform any copy and return `len` as (len > INT_MAX) causing qat_vf_resume_write() to fail with -EFAULT. Fixes: `bb208810b1` ("vfio/qat: Add vfio_pci driver for Intel QAT SR-IOV VF devices") CC: stable@vger.kernel.org # 6.10+ Link: https://lore.kernel.org/all/138bd2e2-ede8-4bcc-aa7b-f3d9de167a37@moroto.mountain Reported-by: Zijie Zhao <zzjas98@gmail.com> Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Xin Zeng <xin.zeng@intel.com> Link: https://lore.kernel.org/r/20241021123843.42979-1-giovanni.cabiddu@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-12-09 10:41:04 +01:00
Avihai Horon	1ef195178f	vfio/pci: Properly hide first-in-list PCIe extended capability [ Upstream commit `fe4bf8d0b6` ] There are cases where a PCIe extended capability should be hidden from the user. For example, an unknown capability (i.e., capability with ID greater than PCI_EXT_CAP_ID_MAX) or a capability that is intentionally chosen to be hidden from the user. Hiding a capability is done by virtualizing and modifying the 'Next Capability Offset' field of the previous capability so it points to the capability after the one that should be hidden. The special case where the first capability in the list should be hidden is handled differently because there is no previous capability that can be modified. In this case, the capability ID and version are zeroed while leaving the next pointer intact. This hides the capability and leaves an anchor for the rest of the capability list. However, today, hiding the first capability in the list is not done properly if the capability is unknown, as struct vfio_pci_core_device->pci_config_map is set to the capability ID during initialization but the capability ID is not properly checked later when used in vfio_config_do_rw(). This leads to the following warning [1] and to an out-of-bounds access to ecap_perms array. Fix it by checking cap_id in vfio_config_do_rw(), and if it is greater than PCI_EXT_CAP_ID_MAX, use an alternative struct perm_bits for direct read only access instead of the ecap_perms array. Note that this is safe since the above is the only case where cap_id can exceed PCI_EXT_CAP_ID_MAX (except for the special capabilities, which are already checked before). [1] WARNING: CPU: 118 PID: 5329 at drivers/vfio/pci/vfio_pci_config.c:1900 vfio_pci_config_rw+0x395/0x430 [vfio_pci_core] CPU: 118 UID: 0 PID: 5329 Comm: simx-qemu-syste Not tainted 6.12.0+ #1 (snip) Call Trace: <TASK> ? show_regs+0x69/0x80 ? __warn+0x8d/0x140 ? vfio_pci_config_rw+0x395/0x430 [vfio_pci_core] ? report_bug+0x18f/0x1a0 ? handle_bug+0x63/0xa0 ? 
exc_invalid_op+0x19/0x70 ? asm_exc_invalid_op+0x1b/0x20 ? vfio_pci_config_rw+0x395/0x430 [vfio_pci_core] ? vfio_pci_config_rw+0x244/0x430 [vfio_pci_core] vfio_pci_rw+0x101/0x1b0 [vfio_pci_core] vfio_pci_core_read+0x1d/0x30 [vfio_pci_core] vfio_device_fops_read+0x27/0x40 [vfio] vfs_read+0xbd/0x340 ? vfio_device_fops_unl_ioctl+0xbb/0x740 [vfio] ? __rseq_handle_notify_resume+0xa4/0x4b0 __x64_sys_pread64+0x96/0xc0 x64_sys_call+0x1c3d/0x20d0 do_syscall_64+0x4d/0x120 entry_SYSCALL_64_after_hwframe+0x76/0x7e Fixes: `89e1f7d4c6` ("vfio: Add PCI device driver") Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20241124142739.21698-1-avihaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-05 14:02:29 +01:00
Yishai Hadas	2b24c64923	vfio/mlx5: Fix unwind flows in mlx5vf_pci_save/resume_device_data() [ Upstream commit `cb04444c24` ] Fix unwind flows in mlx5vf_pci_save_device_data() and mlx5vf_pci_resume_device_data() to avoid freeing the migf pointer at the 'end' label, as this will be handled by fput(migf->filp) through mlx5vf_release_file(). To ensure mlx5vf_release_file() functions correctly, move the initialization of migf fields (such as migf->lock) to occur before any potential unwind flow, as these fields may be accessed within mlx5vf_release_file(). Fixes: `9945a67ea4` ("vfio/mlx5: Refactor PD usage") Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20241114095318.16556-3-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-05 14:02:28 +01:00
Yishai Hadas	c44f1b2ddf	vfio/mlx5: Fix an unwind issue in mlx5vf_add_migration_pages() [ Upstream commit `22e87bf3f7` ] Fix an unwind issue in mlx5vf_add_migration_pages(). If a set of pages is allocated but fails to be added to the SG table, they need to be freed to prevent a memory leak. Any pages successfully added to the SG table will be freed as part of mlx5vf_free_data_buffer(). Fixes: `6fadb02126` ("vfio/mlx5: Implement vfio_pci driver for mlx5 devices") Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20241114095318.16556-2-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-05 14:02:28 +01:00
Al Viro	cb787f4ac0	[tree-wide] finally take no_llseek out no_llseek had been defined to NULL two years ago, in commit `868941b144` ("fs: remove no_llseek") To quote that commit, At -rc1 we'll need do a mechanical removal of no_llseek - git grep -l -w no_llseek | grep -v porting.rst | while read i; do sed -i '/\<no_llseek\>/d' $i done would do it. Unfortunately, that hadn't been done. Linus, could you do that now, so that we could finally put that thing to rest? All instances are of the form .llseek = no_llseek, so it's obviously safe. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2024-09-27 08:18:43 -07:00
Linus Torvalds	7bc21c5e1f	Merge tag 'vfio-v6.12-rc1' of https://github.com/awilliam/linux-vfio Pull VFIO updates from Alex Williamson: "Just a few cleanups this cycle: - Remove several unused structure and function declarations, and unused variables (Dr. David Alan Gilbert, Yue Haibing, Zhang Zekun) - Constify unmodified structure in mdev (Hongbo Li) - Convert to unsigned type to catch overflow with less fanfare than passing a negative value to kcalloc() (Dan Carpenter)" * tag 'vfio-v6.12-rc1' of https://github.com/awilliam/linux-vfio: vfio/pci: clean up a type in vfio_pci_ioctl_pci_hot_reset_groups() vfio/mdev: Constify struct kobj_type vfio: mdev: Remove unused function declarations vfio/fsl-mc: Remove unused variable 'hwirq' vfio/pci: Remove unused struct 'vfio_pci_mmap_vma'	2024-09-24 12:07:47 -07:00
Linus Torvalds	f8ffbc365f	Merge tag 'pull-stable-struct_fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull 'struct fd' updates from Al Viro: "Just the 'struct fd' layout change, with conversion to accessor helpers" * tag 'pull-stable-struct_fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: add struct fd constructors, get rid of __to_fd() struct fd: representation change introduce fd_file(), convert all accessors to it.	2024-09-23 09:35:36 -07:00
Alex Williamson	f9e54c3a2f	vfio/pci: implement huge_fault support With the addition of pfnmap support in vmf_insert_pfn_{pmd,pud}() we can take advantage of PMD and PUD faults to PCI BAR mmaps and create more efficient mappings. PCI BARs are always a power of two and will typically get at least PMD alignment without userspace even trying. Userspace alignment for PUD mappings is also not too difficult. Consolidate faults through a single handler with a new wrapper for standard single page faults. The pre-faulting behavior of commit `d71a989cf5` ("vfio/pci: Insert full vma on mmap'd MMIO fault") is removed in this refactoring since huge_fault will cover the bulk of the faults and results in more efficient page table usage. We also want to avoid that pre-faulted single page mappings preempt huge page mappings. Link: https://lkml.kernel.org/r/20240826204353.2228736-20-peterx@redhat.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Gavin Shan <gshan@redhat.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Niklas Schnelle <schnelle@linux.ibm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-09-17 01:07:00 -07:00
Peter Xu	a77f9489f1	vfio: use the new follow_pfnmap API Use the new API that can understand huge pfn mappings. Link: https://lkml.kernel.org/r/20240826204353.2228736-14-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Gavin Shan <gshan@redhat.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Niklas Schnelle <schnelle@linux.ibm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-09-17 01:06:59 -07:00
Dan Carpenter	aab439ffa1	vfio/pci: clean up a type in vfio_pci_ioctl_pci_hot_reset_groups() The "array_count" value comes from the copy_from_user() in vfio_pci_ioctl_pci_hot_reset(). If the user passes a value larger than INT_MAX then we'll pass a negative value to kcalloc() which triggers an allocation failure and a stack trace. It's better to make the type unsigned so that if (array_count > count) returns -EINVAL instead. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/262ada03-d848-4369-9c37-81edeeed2da2@stanley.mountain Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2024-09-12 14:15:07 -06:00
Hongbo Li	27a8204b26	vfio/mdev: Constify struct kobj_type This 'struct kobj_type' is not modified. It is only used in kobject_init_and_add() which takes a 'const struct kobj_type *ktype' parameter. Constifying this structure moves it to a read-only section, which can improve overall security. ``` [Before] text data bss dec hex filename 2372 600 0 2972 b9c drivers/vfio/mdev/mdev_sysfs.o [After] text data bss dec hex filename 2436 568 0 3004 bbc drivers/vfio/mdev/mdev_sysfs.o ``` Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20240904011837.2010444-1-lihongbo22@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2024-09-06 08:20:52 -06:00
Zhang Zekun	7555c7d2cf	vfio: mdev: Remove unused function declarations The definitions of mdev_bus_register() and mdev_bus_unregister() have been removed since commit `6c7f98b334` ("vfio/mdev: Remove vfio_mdev.c"). So, let's remove the unused declarations. Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240812120823.10968-1-zhangzekun11@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2024-09-03 08:42:07 -06:00
Yue Haibing	a7aaa65f9c	vfio/fsl-mc: Remove unused variable 'hwirq' Commit `7447d911af` ("vfio/fsl-mc: Block calling interrupt handler without trigger") left this variable unused, so remove it. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Link: https://lore.kernel.org/r/20240730141133.525771-1-yuehaibing@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2024-09-03 08:42:06 -06:00
Dr. David Alan Gilbert	e1bf0f2ac9	vfio/pci: Remove unused struct 'vfio_pci_mmap_vma' 'vfio_pci_mmap_vma' has been unused since commit `aac6db75a9` ("vfio/pci: Use unmap_mapping_range()") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240727160307.1000476-1-linux@treblig.org Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2024-09-03 08:42:06 -06:00
Al Viro	1da91ea87a	introduce fd_file(), convert all accessors to it. For any changes of struct fd representation we need to turn existing accesses to fields into calls of wrappers. Accesses to struct fd::flags are very few (3 in linux/file.h, 1 in net/socket.c, 3 in fs/overlayfs/file.c and 3 more in explicit initializers). Those can be dealt with in the commit converting to new layout; accesses to struct fd::file are too many for that. This commit converts (almost) all of f.file to fd_file(f). It's not entirely mechanical ('file' is used as a member name more than just in struct fd) and it does not even attempt to distinguish the uses in pointer context from those in boolean context; the latter will be eventually turned into a separate helper (fd_empty()). NOTE: mass conversion to fd_empty(), tempting as it might be, is a bad idea; better do that piecewise in commit that convert from fdget...() to CLASS(...). [conflicts in fs/fhandle.c, kernel/bpf/syscall.c, mm/memcontrol.c caught by git; fs/stat.c one got caught by git grep] [fs/xattr.c conflict] Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2024-08-12 22:00:43 -04:00
Linus Torvalds	c2a96b7f18	Merge tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the big set of driver core changes for 6.11-rc1. Lots of stuff in here, with not a huge diffstat, but apis are evolving which required lots of files to be touched. Highlights of the changes in here are: - platform remove callback api final fixups (Uwe took many releases to get here, finally!) - Rust bindings for basic firmware apis and initial driver-core interactions. It's not all that useful for a "write a whole driver in rust" type of thing, but the firmware bindings do help out the phy rust drivers, and the driver core bindings give a solid base on which others can start their work. There is still a long way to go here before we have a multitude of rust drivers being added, but it's a great first step. - driver core const api changes. This reached across all bus types, and there are some fix-ups for some not-common bus types that linux-next and 0-day testing shook out. This work is being done to help make the rust bindings more safe, as well as the C code, moving toward the end-goal of allowing us to put driver structures into read-only memory. We aren't there yet, but are getting closer. 
- minor devres cleanups and fixes found by code inspection - arch_topology minor changes - other minor driver core cleanups All of these have been in linux-next for a very long time with no reported problems" * tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (55 commits) ARM: sa1100: make match function take a const pointer sysfs/cpu: Make crash_hotplug attribute world-readable dio: Have dio_bus_match() callback take a const * zorro: make match function take a const pointer driver core: module: make module_[add|remove]_driver take a const * driver core: make driver_find_device() take a const * driver core: make driver_[create|remove]_file take a const * firmware_loader: fix soundness issue in `request_internal` firmware_loader: annotate doctests as `no_run` devres: Correct code style for functions that return a pointer type devres: Initialize an uninitialized struct member devres: Fix memory leakage caused by driver API devm_free_percpu() devres: Fix devm_krealloc() wasting memory driver core: platform: Switch to use kmemdup_array() driver core: have match() callback in struct bus_type take a const * MAINTAINERS: add Rust device abstractions to DRIVER CORE device: rust: improve safety comments MAINTAINERS: add Danilo as FIRMWARE LOADER maintainer MAINTAINERS: add Rust FW abstractions to FIRMWARE LOADER firmware: rust: improve safety comments ...	2024-07-25 10:42:22 -07:00
Linus Torvalds	3c3ff7be97	Merge tag 'powerpc-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Remove support for 40x CPUs & platforms - Add support to the 64-bit BPF JIT for cpu v4 instructions - Fix PCI hotplug driver crash on powernv - Fix doorbell emulation for KVM on PAPR guests (nestedv2) - Fix KVM nested guest handling of some less used SPRs - Online NUMA nodes with no CPU/memory if they have a PCI device attached - Reduce memory overhead of enabling kfence on 64-bit Radix MMU kernels - Reimplement the iommu table_group_ops for pseries for VFIO SPAPR TCE Thanks to: Anjali K, Artem Savkov, Athira Rajeev, Breno Leitao, Brian King, Celeste Liu, Christophe Leroy, Esben Haabendal, Gaurav Batra, Gautam Menghani, Haren Myneni, Hari Bathini, Jeff Johnson, Krishna Kumar, Krzysztof Kozlowski, Nathan Lynch, Nicholas Piggin, Nick Bowler, Nilay Shroff, Rob Herring (Arm), Shawn Anastasio, Shivaprasad G Bhat, Sourabh Jain, Srikar Dronamraju, Timothy Pearson, Uwe Kleine-König, and Vaibhav Jain. * tag 'powerpc-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (57 commits) Documentation/powerpc: Mention 40x is removed powerpc: Remove 40x leftovers macintosh/therm_windtunnel: fix module unload. 
powerpc: Check only single values are passed to CPU/MMU feature checks powerpc/xmon: Fix disassembly CPU feature checks powerpc: Drop clang workaround for builtin constant checks powerpc64/bpf: jit support for signed division and modulo powerpc64/bpf: jit support for sign extended mov powerpc64/bpf: jit support for sign extended load powerpc64/bpf: jit support for unconditional byte swap powerpc64/bpf: jit support for 32bit offset jmp instruction powerpc/pci: Hotplug driver bridge support pci/hotplug/pnv_php: Fix hotplug driver crash on Powernv powerpc/configs: Update defconfig with now user-visible CONFIG_FSL_IFC powerpc: add missing MODULE_DESCRIPTION() macros macintosh/mac_hid: add MODULE_DESCRIPTION() KVM: PPC: add missing MODULE_DESCRIPTION() macros powerpc/kexec: Use of_property_read_reg() powerpc/64s/radix/kfence: map __kfence_pool at page granularity powerpc/pseries/iommu: Define spapr_tce_table_group_ops only with CONFIG_IOMMU_API ...	2024-07-19 21:00:33 -07:00
Linus Torvalds	f66b07c561	Merge tag 'vfio-v6.11-rc1' of https://github.com/awilliam/linux-vfio Pull VFIO updates from Alex Williamson: - Add support for 8-byte accesses when using read/write through the device regions. This fills a gap for userspace drivers that might not be able to use access through mmap to perform native register width accesses (Gerd Bayer) - Add missing MODULE_DESCRIPTION to vfio-mdev sample drivers and replace a non-standard MODULE_INFO usage (Jeff Johnson) * tag 'vfio-v6.11-rc1' of https://github.com/awilliam/linux-vfio: vfio-mdev: add missing MODULE_DESCRIPTION() macros vfio/pci: Fix typo in macro to declare accessors vfio/pci: Support 8-byte PCI loads and stores vfio/pci: Extract duplicated code into macro	2024-07-19 11:53:09 -07:00
Linus Torvalds	ebcfbf02ab	Merge tag 'iommu-updates-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu updates from Will Deacon: "Core: - Support for the "ats-supported" device-tree property - Removal of the 'ops' field from 'struct iommu_fwspec' - Introduction of iommu_paging_domain_alloc() and partial conversion of existing users - Introduce 'struct iommu_attach_handle' and provide corresponding IOMMU interfaces which will be used by the IOMMUFD subsystem - Remove stale documentation - Add missing MODULE_DESCRIPTION() macro - Misc cleanups Allwinner Sun50i: - Ensure bypass mode is disabled on H616 SoCs - Ensure page-tables are allocated below 4GiB for the 32-bit page-table walker - Add new device-tree compatible strings AMD Vi: - Use try_cmpxchg64() instead of cmpxchg64() when updating pte Arm SMMUv2: - Print much more useful information on context faults - Fix Qualcomm TBU probing when CONFIG_ARM_SMMU_QCOM_DEBUG=n - Add new Qualcomm device-tree bindings Arm SMMUv3: - Support for hardware update of access/dirty bits and reporting via IOMMUFD - More driver rework from Jason, this time updating the PASID/SVA support to prepare for full IOMMUFD support - Add missing MODULE_DESCRIPTION() macro - Minor fixes and cleanups NVIDIA Tegra: - Fix for benign fwspec initialisation issue exposed by rework on the core branch Intel VT-d: - Use try_cmpxchg64() instead of cmpxchg64() when updating pte - Use READ_ONCE() to read volatile descriptor status - Remove support for handling Execute-Requested requests - Avoid calling iommu_domain_alloc() - Minor fixes and refactoring Qualcomm MSM: - Updates to the device-tree bindings" * tag 'iommu-updates-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (72 commits) iommu/tegra-smmu: Pass correct fwnode to iommu_fwspec_init() iommu/vt-d: Fix identity map bounds in si_domain_init() iommu: Move IOMMU_DIRTY_NO_CLEAR define dt-bindings: iommu: Convert msm,iommu-v0 to yaml iommu/vt-d: Fix aligned 
pages in calculate_psi_aligned_address() iommu/vt-d: Limit max address mask to MAX_AGAW_PFN_WIDTH docs: iommu: Remove outdated Documentation/userspace-api/iommu.rst arm64: dts: fvp: Enable PCIe ATS for Base RevC FVP iommu/of: Support ats-supported device-tree property dt-bindings: PCI: generic: Add ats-supported property iommu: Remove iommu_fwspec ops OF: Simplify of_iommu_configure() ACPI: Retire acpi_iommu_fwspec_ops() iommu: Resolve fwspec ops automatically iommu/mediatek-v1: Clean up redundant fwspec checks RDMA/usnic: Use iommu_paging_domain_alloc() wifi: ath11k: Use iommu_paging_domain_alloc() wifi: ath10k: Use iommu_paging_domain_alloc() drm/msm: Use iommu_paging_domain_alloc() vhost-vdpa: Use iommu_paging_domain_alloc() ...	2024-07-19 09:59:58 -07:00
Yi Liu	5a88a3f67e	vfio/pci: Init the count variable in collecting hot-reset devices The count variable is used without initialization, which results in mistakes in the device counting and crashes userspace if the get hot reset info path is triggered. Fixes: `f6944d4a0b` ("vfio/pci: Collect hot-reset devices to local buffer") Link: https://bugzilla.kernel.org/show_bug.cgi?id=219010 Reported-by: Žilvinas Žaltiena <zaltys@natrix.lt> Cc: Beld Zhang <beldzhang@gmail.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20240710004150.319105-1-yi.l.liu@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2024-07-10 08:47:46 -06:00
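
The "vfio/platform: check the bounds of read/write syscalls" entries in this log describe a user-supplied offset and count that were not validated against the device region. The kind of check involved can be sketched in userspace C roughly as follows. This is an illustration only, not the upstream patch: `clamp_region_access()` and `run_examples()` are hypothetical names, and the choice to clamp the count (rather than reject the request) is an assumption made for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): validate a user-supplied
 * offset against a region of region_size bytes and clamp count so the
 * access cannot run past the end of the region.
 * Returns 0 and stores the clamped length in *clamped, or -EINVAL when
 * the offset lies outside the region.
 */
static int clamp_region_access(uint64_t region_size, uint64_t off,
                               size_t count, size_t *clamped)
{
    uint64_t remaining;

    if (off >= region_size)
        return -EINVAL;

    /* No underflow here: off < region_size was checked above. */
    remaining = region_size - off;

    /* When remaining is the smaller value it is <= count, so it fits in size_t. */
    *clamped = (count < remaining) ? count : (size_t)remaining;
    return 0;
}

/* Small usage example: a 4 KiB region accessed in, near, and past its bounds. */
static int run_examples(void)
{
    size_t n = 0;

    assert(clamp_region_access(0x1000, 0x0, 16, &n) == 0 && n == 16);
    assert(clamp_region_access(0x1000, 0xff8, 16, &n) == 0 && n == 8);
    assert(clamp_region_access(0x1000, 0x1000, 4, &n) == -EINVAL);
    return 0;
}
```

Checking the offset before subtracting keeps the unsigned arithmetic from wrapping, which is the same class of problem several other entries in this log address (for example the unsigned `array_count` and the `check_add_overflow()` cast).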

1239 Commits