linux-stable-mirror

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2026-06-21 15:43:21 +02:00

Author	SHA1	Message	Date
Arnd Bergmann	0ee8ac903e	RDMA/hfi1: Open-code rvt_set_ibdev_name() clang warns about a function missing a printf attribute: include/rdma/rdma_vt.h:457:47: error: diagnostic behavior may be improved by adding the 'format(printf, 2, 3)' attribute to the declaration of 'rvt_set_ibdev_name' [-Werror,-Wmissing-format-attribute] 447 | static inline void rvt_set_ibdev_name(struct rvt_dev_info *rdi, | __attribute__((format(printf, 2, 3))) 448 | const char *fmt, const char *name, 449 | const int unit) The helper was originally added as an abstraction for the hfi1 and qib drivers needing the same thing, but now qib is gone, and hfi1 is the only remaining user of rdma_vt. Avoid the warning and allow the compiler to check the format string by open-coding the helper and directly assigning the device name. Fixes: `5084c8ff21` ("IB/{rdmavt, hfi1, qib}: Self determine driver name") Link: https://patch.msgid.link/r/20260602140453.3542427-1-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-06-05 12:38:42 -03:00
Jason Gunthorpe	55d984dae6	RDMA/umem: Make ib_umem_is_contiguous() safe on 32 bit Sashiko points out that roundup_pow_of_two() only uses unsigned long, but dma_addr_t can be u64. Change this algorithm to be simpler: compute the page size; if any page size is found and it results in a single block, then it is contiguous. Link: https://patch.msgid.link/r/3-v1-88303e9e509f+f7-ib_umem_types_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-06-05 12:36:33 -03:00
Jason Gunthorpe	09ea6837a0	RDMA/umem: Be careful about boundary conditions in ib_umem_find_best_pgsz() Several corner cases, especially important on 32 bits: - umem->iova is u64, the function argument should pass in u64 or iova will be truncated - Check that the length is not too large for the iova - Check that lengths > 4G don't overflow the GENMASK Link: https://patch.msgid.link/r/2-v1-88303e9e509f+f7-ib_umem_types_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-06-05 12:36:33 -03:00
Jason Gunthorpe	bad4e98893	RDMA: Update the query_device() op This op hasn't followed the normal pattern of passing NULL for udata when invoked by the kernel. Instead the kernel caller creates a dummy ib_udata on the stack and passes that in. It does not seem to currently be a bug, but this flow should be modernized to use the new API flow and in the process accept NULL as well. Only mlx4 uses an input request structure, have every other driver call ib_is_udata_in_empty() to enforce the lack of request structs. Use ib_respond_empty_udata() in every driver that does not use a response struct. Ensure a check for NULL udata before calling ib_respond_udata() in bnxt_re, efa, and mlx5. Make mlx4 safe to be called with NULL. Link: https://patch.msgid.link/r/2-v1-922fa8e828ba+f7-ib_udata_stack_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-06-03 15:12:43 -03:00
Jason Gunthorpe	43b57d73eb	RDMA/core: Don't make a dummy ib_udata on the stack in create_qp Sashiko points out the udata for destruction has to be created using uverbs_get_cleared_udata(). Move it to ib_core_uverbs.c so that the core qp code can call it. Rework the call chain to pass the struct uverbs_attr_bundle right up to the driver op callback. Fixes a possible wild stack reference in drivers during error unwinding, mlx5 can call rdma_udata_to_drv_context() from destroy_qp() when destroying a QP. Fixes: `00a79d6b99` ("RDMA/core: Configure selinux QP during creation") Link: https://patch.msgid.link/r/1-v1-922fa8e828ba+f7-ib_udata_stack_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-06-03 15:12:43 -03:00
Cyrill Gorcunov	b548a6c4ee	RDMA/irdma: Fix typo in SQ completions generation When we generate completion for SQ the opcode while being properly read from ring buffer is ignored when written back to completion. Seems to be a simple typo. Link: https://patch.msgid.link/r/ahjB87k54bYdFbft@grain Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Reviewed-by: Jacob Moroni <jmoroni@google.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-06-03 15:05:29 -03:00
Maoyi Xie	ba7c4912f7	RDMA/hns: drop dead empty check in setup_root_hem() setup_root_hem() reads the first entry of head->root and checks the returned pointer against NULL: root_hem = list_first_entry(&head->root, struct hns_roce_hem_item, list); if (!root_hem) return -ENOMEM; list_first_entry() never returns NULL. On an empty list it returns container_of(head, ..., list), a non-NULL garbage pointer that aliases the head. So the check is dead. The only caller adds an entry to head.root right before invoking setup_root_hem(): list_add(&root_hem->list, &head.root); ret = setup_root_hem(..., &head, ...); So head.root is guaranteed non-empty on entry. Drop the check. Link: https://patch.msgid.link/r/20260526054653.2054800-1-maoyixie.tju@gmail.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Maoyi Xie <maoyixie.tju@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-06-03 15:04:40 -03:00
Tristan Madani	d6ab440240	RDMA/rxe: Copy WQE to local buffer in non-SRQ receive path For non-SRQ QPs, the responder reads WQE fields directly from the shared queue buffer mapped into userspace. This allows a malicious user to modify fields like num_sge or sge entries while the kernel is processing the WQE, leading to out-of-bounds reads in rxe_resp_check_length() and copy_data(). Introduce get_recv_wqe() that validates num_sge and copies the WQE to a kernel-local buffer before processing, matching the approach already used for SRQ WQEs in get_srq_wqe(). The srq_wqe buffer is reused since SRQ and non-SRQ paths are mutually exclusive per QP. Fixes: `8700e3e7c4` ("Soft RoCE driver") Link: https://patch.msgid.link/r/20260518215040.1598586-3-tristan@talencesecurity.com Signed-off-by: Tristan Madani <tristan@talencesecurity.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-29 20:33:59 -03:00
Tristan Madani	22b8fbded6	RDMA/rxe: Fix TOCTOU heap overflow in get_srq_wqe get_srq_wqe() reads wqe->dma.num_sge from the shared receive queue buffer, which is mapped into userspace. It validates num_sge against max_sge, but then re-reads the same field to calculate the memcpy size. A concurrent userspace thread can modify num_sge between validation and use, causing a heap buffer overflow when copying the WQE into qp->resp.srq_wqe. Read num_sge into a local variable and use it for both the bounds check and the size calculation. Fixes: `8700e3e7c4` ("Soft RoCE driver") Link: https://patch.msgid.link/r/20260518215040.1598586-2-tristan@talencesecurity.com Signed-off-by: Tristan Madani <tristan@talencesecurity.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-29 20:32:48 -03:00
Jiri Pirko	3b6384dac1	RDMA/umem: Block plain userspace memory registration under CoCo bounce When a device requires DMA bounce buffering inside a Confidential Computing guest, __ib_umem_get_va() cannot work. The DMA mapping layer redirects all mappings through swiotlb bounce buffers, so the device receives DMA addresses pointing to bounce buffer memory rather than the user's pages. Since RDMA devices access registered memory directly without CPU involvement, there is no opportunity for swiotlb to synchronize between the bounce buffer and the original pages. The registration would already fail later on, since the umem mapping is requested with DMA_ATTR_REQUIRE_COHERENT and gets rejected under is_swiotlb_force_bounce() with -EIO. Fail early with -EOPNOTSUPP instead, so the user gets a specific error code to react to. Link: https://patch.msgid.link/r/20260517141311.2409230-3-jiri@resnulli.us Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-29 20:27:29 -03:00
Jiri Pirko	d7a40b5194	RDMA/uverbs: Expose CoCo DMA bounce requirement to userspace In CoCo guests, guest memory is encrypted and untrusted (T=0) devices cannot DMA to it directly; such transfers must go through unencrypted bounce buffers. RDMA registers user pages for direct device access, bypassing the DMA layer and thus any bouncing, so registered memory does not work in this configuration. Until trusted (T=1) device detection is available, conservatively flag every device attached to a CoCo guest. Expose the condition to userspace as IB_UVERBS_DEVICE_CC_DMA_BOUNCE in device_cap_flags_ex so applications can avoid memory registration and fall back to copying buffers through send/recv. Link: https://patch.msgid.link/r/20260517141311.2409230-2-jiri@resnulli.us Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-29 20:27:29 -03:00
Jiri Pirko	93ce776a5b	RDMA/mlx5: Use UMEM attribute for QP doorbell record Add an optional mlx5 driver-namespace UMEM attribute on QP create so userspace can supply the doorbell record umem explicitly, symmetric to the CQ side. Resolve it inside mlx5_ib_db_map_user() and use it as a private DBR page when present; otherwise take the existing UHW share-or-pin path that preserves per-page DBR sharing across CQ/QP/SRQ in the same process. Add mlx5's first UVERBS_OBJECT_QP UAPI definition chain to attach the new attr. Link: https://patch.msgid.link/r/20260529134312.2836341-17-jiri@resnulli.us Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-29 20:20:00 -03:00
Jiri Pirko	d6ab1d2439	RDMA/mlx5: Use UMEM attribute for CQ doorbell record Add an optional mlx5 driver-namespace UMEM attribute on CQ create so userspace can supply the doorbell record buffer explicitly. mlx5_ib_db_map_user() resolves the attribute (or falls back to the legacy UHW VA) into a struct ib_uverbs_buffer_desc and runs a unified lookup-then-pin: VA-typed descriptors share a per-page umem across CQ/QP/SRQ in the same process, FD-typed descriptors are pinned per call. Link: https://patch.msgid.link/r/20260529134312.2836341-16-jiri@resnulli.us Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-29 20:19:59 -03:00
Jiri Pirko	2cc10972f5	RDMA/umem: Add ib_umem_is_contiguous() stub for !CONFIG_INFINIBAND_USER_MEM ib_umem_is_contiguous() is defined under #ifdef CONFIG_INFINIBAND_USER_MEM, but the #else branch lacks a stub. Add the missing inline to fix a potentially broken build. Fixes: `c897c2c8b8` ("RDMA/core: Add umem "is_contiguous" and "start_dma_addr" helpers") Link: https://patch.msgid.link/r/20260529134312.2836341-15-jiri@resnulli.us Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-29 20:19:59 -03:00
Jiri Pirko	38fc5bab6c	RDMA/mlx5: Use UMEM attributes for QP buffers in create_qp Use the per-attribute UMEM helpers to pin QP buffer umems on demand. The QP-type predicate selects between the BUF and RQ_BUF attrs; raw-packet SQ uses its own dedicated SQ_BUF attr. Link: https://patch.msgid.link/r/20260529134312.2836341-14-jiri@resnulli.us Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-29 20:19:59 -03:00
Jiri Pirko	cd767d980c	RDMA/uverbs: Use UMEM attributes for QP creation Apply the per-attribute UMEM model to the QP create method. Add three optional UMEM attributes that drivers pick from based on how their user ABI lays out the QP rings: - CREATE_QP_BUF_UMEM is a single user buffer that backs both the SQ and RQ of one QP. This is the common case where userspace pins one contiguous WQE region for the QP. - CREATE_QP_SQ_BUF_UMEM and CREATE_QP_RQ_BUF_UMEM are a pair of user buffers backing the SQ and RQ independently, used when the two rings live in physically distinct user allocations and must be pinned and addressed separately. Existing drivers would map their current umems as follows: - mlx5: BUF for normal QPs (one ucmd->buf_addr covers SQ+RQ); for IB_QPT_RAW_PACKET and IB_QP_CREATE_SOURCE_QPN, the RQ side comes from ucmd->buf_addr (RQ-sized) via RQ_BUF and the SQ from ucmd->sq_buf_addr via SQ_BUF. - mlx4: BUF, single ucmd.buf_addr covering SQ+RQ. - hns: BUF, single ucmd.buf_addr covering SQ + ext-SGE + RQ. - erdma: BUF, single ureq.qbuf_va sliced by the kernel into SQ at offset 0 and RQ at rq_offset. - bnxt_re: SQ_BUF (ureq->qpsva) + RQ_BUF (ureq->qprva, the RQ side is skipped when the QP uses an SRQ). - vmw_pvrdma: SQ_BUF (sbuf_addr) + RQ_BUF (rbuf_addr, the RQ side is skipped when the QP uses an SRQ). - qedr: SQ_BUF (sq_addr) + RQ_BUF (rq_addr) for whichever side the QP type actually has (no SQ for XRC_TGT/GSI; no RQ for XRC_INI/XRC_TGT/SRQ). - ionic: SQ_BUF (req.sq.addr) + RQ_BUF (req.rq.addr); both are skipped when the rings are placed in CMB instead of host memory. - mana: raw-packet QP uses SQ_BUF (sq_buf_addr) only; the RC path uses multiple per-queue user buffers (ucmd.queue_buf[]) that do not fit the SQ/RQ pair semantics of these attrs and stays on the legacy UHW path. - efa, irdma, hfi1, ocrdma, mthca, cxgb4 and usnic do not pin a QP WQE buffer via umem; none of these attributes apply. 
Link: https://patch.msgid.link/r/20260529134312.2836341-13-jiri@resnulli.us Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-29 20:19:59 -03:00
Jiri Pirko	7c5bbaaf44	RDMA/uverbs: Remove legacy umem field from struct ib_cq Now that all drivers use a helper to get the umem and manage its lifetime, the legacy umem field in struct ib_cq is no longer needed. Remove it, along with the ib_umem_get_cq_tmp() helper that populated it and its handling in both the error and destroy paths. Link: https://patch.msgid.link/r/20260529134312.2836341-12-jiri@resnulli.us Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-29 20:19:59 -03:00
Jiri Pirko	c0a94fecec	RDMA/mlx4: Use ib_umem_get_cq_buf() for user CQ buffer Pin the user CQ buffer with ib_umem_get_cq_buf() and take ownership of the umem in the driver; fall back to ib_umem_get_va() for the legacy UHW VA path. Apply the same ownership pattern to the resize path. Link: https://patch.msgid.link/r/20260529134312.2836341-11-jiri@resnulli.us Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-29 20:19:59 -03:00
Jiri Pirko	c1837879e4	RDMA/bnxt_re: Use ib_umem_get_cq_buf_or_va() for user CQ buffer Pin the user CQ buffer with ib_umem_get_cq_buf_or_va() and take ownership of the umem in the driver. Apply the same ownership pattern to the resize path. Link: https://patch.msgid.link/r/20260529134312.2836341-10-jiri@resnulli.us Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-29 20:19:58 -03:00
Jiri Pirko	ffdc91c993	RDMA/mlx5: Use ib_umem_get_cq_buf_or_va() for user CQ buffer Pin the user CQ buffer with ib_umem_get_cq_buf_or_va() and take ownership of the umem in the driver. Apply the same ownership pattern to the resize path. Link: https://patch.msgid.link/r/20260529134312.2836341-9-jiri@resnulli.us Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-29 20:19:58 -03:00
Jiri Pirko	e3433474db	RDMA/efa: Use ib_umem_get_cq_buf() for user CQ buffer Pin the user CQ buffer with ib_umem_get_cq_buf() and take ownership of the umem in the driver. Fall back to the existing kernel-DMA path on NULL. Link: https://patch.msgid.link/r/20260529134312.2836341-8-jiri@resnulli.us Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-29 20:19:58 -03:00
Jiri Pirko	f6d2e53ca5	RDMA/uverbs: Add CQ buffer UMEM attribute and driver helpers Add UVERBS_ATTR_CREATE_CQ_BUF_UMEM and two driver-facing wrappers, ib_umem_get_cq_buf() and ib_umem_get_cq_buf_or_va(), that pin a CQ buffer umem from it. The wrappers reuse the existing legacy CQ buffer-attr filler. Link: https://patch.msgid.link/r/20260529134312.2836341-7-jiri@resnulli.us Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-29 20:19:58 -03:00
Jiri Pirko	057bb70f53	RDMA/uverbs: Push out CQ buffer umem processing into a helper Extract the UVERBS_ATTR_CREATE_CQ_BUFFER_* parser from the CQ create handler into uverbs_create_cq_get_buffer_desc(), and wrap it in ib_umem_get_cq_tmp(), the umem-producing helper the cq_create handler now calls. ib_umem_get_cq_tmp() is temporary; subsequent patches replace it with driver-owned ib_umem_get_cq_buf*() wrappers built on the same parser, and remove it once all CQ drivers have switched. Link: https://patch.msgid.link/r/20260529134312.2836341-6-jiri@resnulli.us Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-29 20:19:58 -03:00
Jiri Pirko	df67f15c44	RDMA/umem: Route ib_umem_get_va() through ib_umem_get_attr_or_va() ib_umem_get_va() is now redundant: ib_umem_get_attr_or_va() with attrs=NULL and attr_id=0 covers the exact same path. Make it a static inline wrapper instead of a separately exported symbol. Link: https://patch.msgid.link/r/20260529134312.2836341-5-jiri@resnulli.us Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-29 20:19:57 -03:00
Jiri Pirko	3cfdff484d	RDMA/core: Introduce generic buffer descriptor infrastructure for umem Introduce a per-attribute UVERBS_ATTR_UMEM model so each uverbs command's umem set is explicit in its UAPI definition. Add driver-facing wrapper helpers that pin a umem on demand from an attribute or a VA addr; the driver owns the returned umem and releases it from its destroy/error paths. Link: https://patch.msgid.link/r/20260529134312.2836341-4-jiri@resnulli.us Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-29 20:19:57 -03:00
Jiri Pirko	28fb701c64	RDMA/umem: Split ib_umem_get_va() into a thin wrapper around __ib_umem_get_va() The follow-up patch is going to introduce ib_umem_get_desc(), the canonical desc-to-umem helper, which needs to pin a userspace VA without going through the exported ib_umem_get_va() helper so later on ib_umem_get_va() would use the ib_umem_get_desc() flow too. Move the existing ib_umem_get_va() to a static __ib_umem_get_va() and have ib_umem_get_va() as a thin wrapper that calls it. Link: https://patch.msgid.link/r/20260529134312.2836341-3-jiri@resnulli.us Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-29 20:19:57 -03:00
Jiri Pirko	af84105686	RDMA/umem: Rename ib_umem_get() to ib_umem_get_va() The new umem getter family being introduced in follow-up patches needs a fitting name for the central all-source helper that resolves attributes, legacy fillers and a UHW VA fallback. Rename the existing VA-pinning helper ib_umem_get() to ib_umem_get_va() so the name is freed up. The new name is consistent with the names of the rest of the helpers that are about to be introduced. Link: https://patch.msgid.link/r/20260529134312.2836341-2-jiri@resnulli.us Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-29 20:19:57 -03:00
Purushothaman Ramalingam	ded0abacdc	RDMA/rxe: Fix typos in comments Fix typos found by codespell in driver comments: rxe.c: s/mangagement/management/ rxe_param.h: s/interations/iterations/ rxe_resp.c: s/recive/receive/ No functional change. Link: https://patch.msgid.link/r/20260527104527.3222-1-purush.ramalingam@gmail.com Signed-off-by: Purushothaman Ramalingam <purush.ramalingam@gmail.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-29 10:39:10 -03:00
Dave Hansen	57d8790620	MAINTAINERS: Remove bouncing Intel RDMA ethernet protocol maintainer The email for Krzysztof Czurylo is bouncing. Remove the entry. Link: https://patch.msgid.link/r/20260526205140.32714-1-dave.hansen@linux.intel.com Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-29 10:38:06 -03:00
Jason Gunthorpe	9733e9f580	RDMA/core: Move flow related functions to ib_uverbs_support.ko mlx5 uses these as part of the driver implementation, move them to the support module instead. Link: https://patch.msgid.link/r/6-v3-43aba1969751+1988-ib_uverbs_support_ko_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-26 10:11:43 -03:00
Jason Gunthorpe	1a76adc9b3	RDMA/core: Move ucaps into ib_uverbs_support.ko mlx5 uses these; move them into the support module from ib_uverbs.ko. Link: https://patch.msgid.link/r/5-v3-43aba1969751+1988-ib_uverbs_support_ko_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-26 10:11:43 -03:00
Jason Gunthorpe	2274d8cb49	RDMA/core: Make a new module for the uverbs components needed by drivers To maintain the split where ib_uverbs.ko should not be depended on by drivers, add a new module ib_uverbs_support.ko which contains the driver called functions that are too large or too rare to be placed in ib_uverbs_core.ko Start by moving most of rdma_core.c into this module, making some adjustments to split it from the actual uverbs FD code. This was not done originally because we lacked EXPORT_SYMBOL_NS and I had a fear that drivers would abuse this interface surface. Link: https://patch.msgid.link/r/4-v3-43aba1969751+1988-ib_uverbs_support_ko_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-26 10:09:37 -03:00
Jason Gunthorpe	2c296df0ad	RDMA/core: Remove uverbs_async_event_release() Instead of having an alternative fops release always use the standard uverbs_uobject_fd_release() and route the special async behavior back up through uverbs_obj_fd_type ops pointer. This removes a dependency where the technically lower level rdma_core.c is referring to a symbol from uverbs_std_types_async_fd.c. Link: https://patch.msgid.link/r/3-v3-43aba1969751+1988-ib_uverbs_support_ko_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-26 10:09:37 -03:00
Jason Gunthorpe	ab2d2b2872	RDMA/core: Move many of the little EXPORTs from uverbs_ioctl into ib_core_uverbs Not as many drivers need these functions but it does free efa from the ib_uverbs.ko dependency and follows the general design better. Link: https://patch.msgid.link/r/2-v3-43aba1969751+1988-ib_uverbs_support_ko_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-26 10:07:26 -03:00
Jason Gunthorpe	33d1117eb5	RDMA/core: Do not compile ib_core_uverbs without USER_ACCESS Remove the entire ib_core_uverbs.c from the build if CONFIG_INFINIBAND_USER_ACCESS is not set. These functions are only used to support uverbs and are never callable even if they happen to get linked in. Provide inlines for the missing ones to return errors to further push code elimination in drivers. Link: https://patch.msgid.link/r/1-v3-43aba1969751+1988-ib_uverbs_support_ko_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-26 10:07:25 -03:00
Jason Gunthorpe	e312f0ff9e	Merge tag 'v7.1-rc5' into rdma.git for-next For dependencies in the following patches Resolve conflicts, use the goto labels from the rc tag. * tag 'v7.1-rc5': (1526 commits) Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-25 13:48:00 -03:00
Tao Cui	b86fd95805	RDMA/counter: Fix incorrect port index in rdma_counter_init() error cleanup The error cleanup loop in rdma_counter_init() iterates with variable 'i' but accesses dev->port_data[port] instead of dev->port_data[i]. This causes the failed port's hstats to be freed multiple times while leaking hstats of previously initialized ports. Fixes: `56594ae1d2` ("RDMA/core: Annotate destroy of mutex to ensure that it is released as unlocked") Link: https://patch.msgid.link/r/20260520104546.1776253-3-cuitao@kylinos.cn Signed-off-by: Tao Cui <cuitao@kylinos.cn> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-25 12:40:08 -03:00
Tao Cui	4fbc823000	RDMA/counter: Fix num_counters leak on bind_qp failure in alloc_and_bind() When __rdma_counter_bind_qp() fails in alloc_and_bind(), the error path jumps to err_mode which frees the counter without decrementing port_counter->num_counters. The only place that decrements is rdma_counter_free(), which is unreachable since the counter was never successfully bound. This leak accumulates across repeated failures, permanently preventing the port from switching to AUTO mode (-EBUSY in __counter_set_mode()) and blocking the MANUAL→NONE auto-revert in rdma_counter_free(). When the mode was NONE before the call, the MANUAL mode set by __counter_set_mode() also leaks since the revert logic is never reached. Add an err_bind label between the num_counters increment and the existing err_mode label. It decrements num_counters and mirrors the MANUAL→NONE revert from rdma_counter_free(), ensuring the port state is fully restored on bind failure. Link: https://patch.msgid.link/r/20260520104546.1776253-2-cuitao@kylinos.cn Signed-off-by: Tao Cui <cuitao@kylinos.cn> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-25 12:40:08 -03:00
Lianfa Weng	bbd97d71e5	RDMA/hns: Fix log flood after cmd_mbox failure hns_roce_cmd_mbox() is the command interface between driver and hardware. When hardware is abnormal, the unlimited error printings after hns_roce_cmd_mbox() failure will cause log flood and even system crash. Replace ibdev_err() and ibdev_warn() with their ratelimited versions in the error handling path after hns_roce_cmd_mbox() (and its wrappers hns_roce_create_hw_ctx/hns_roce_destroy_hw_ctx) fails. Fixes: `9a4435375c` ("IB/hns: Add driver files for hns RoCE driver") Link: https://patch.msgid.link/r/20260520055759.2354037-4-huangjunxian6@hisilicon.com Signed-off-by: Lianfa Weng <wenglianfa@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-25 11:39:04 -03:00
Lianfa Weng	3f19c2a385	RDMA/hns: Fix warning in poll cq direct mode CQs allocated by ib_alloc_cq() always have a comp_handler. Though in direct mode this handler is never expected to be called, it is still called when the driver is reset, triggering the following WARN_ONCE(): Call trace: ib_cq_completion_direct+0x38/0x60 hns_roce_cq_completion+0x54/0x90 [hns_roce_hw_v2] hns_roce_handle_device_err+0x1c8/0x340 [hns_roce_hw_v2] hns_roce_hw_v2_uninit_instance.constprop.0+0x34/0x70 [hns_roce_hw_v2] hns_roce_hw_v2_reset_notify+0xc4/0xe0 [hns_roce_hw_v2] hclge_notify_roce_client+0x60/0xbc [hclge] hclge_reset_rebuild+0x48/0x34c [hclge] hclge_reset_subtask+0xcc/0xec [hclge] hclge_reset_service_task+0x80/0x160 [hclge] hclge_service_task+0x50/0x80 [hclge] process_one_work+0x1cc/0x4d0 worker_thread+0x154/0x414 kthread+0x104/0x144 ret_from_fork+0x10/0x18 Fixes: `f295e4cece` ("RDMA/hns: Delete unnecessary callback functions for cq") Link: https://patch.msgid.link/r/20260520055759.2354037-3-huangjunxian6@hisilicon.com Signed-off-by: Lianfa Weng <wenglianfa@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-25 11:39:04 -03:00
Guangshuo Li	9a8826fdfb	IB/mlx4: Fix refcount leak in add_port() error path After kobject_init_and_add(), the lifetime of the embedded struct kobject is expected to be managed through the kobject core reference counting. In add_port(), failure paths after kobject_init_and_add() must not free struct mlx4_port directly, because the embedded kobject is then managed by the kobject core. Freeing it directly leaves the kobject reference counting unbalanced and can lead to incorrect lifetime handling. Allocate the pkey and gid attribute arrays before kobject_init_and_add(), so failures before kobject initialization can be handled by directly freeing the allocated memory. Once kobject_init_and_add() has been called, unwind later failures by removing any successfully created sysfs groups, calling kobject_del(), and then releasing the embedded kobject with kobject_put(). Fixes: `c1e7e46612` ("IB/mlx4: Add iov directory in sysfs under the ib device") Link: https://patch.msgid.link/r/20260518021910.972900-1-lgs201920130244@gmail.com Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-25 11:25:17 -03:00
Zhu Yanjun	35744ab3d0	RDMA/rxe: Fix a use-after-free problem in rxe_mmap rxe_mmap() removes a rxe_mmap_info struct from the pending_mmaps list and releases pending_lock while the struct's kref is still at 1: list_del_init(&ip->pending_mmaps); spin_unlock_bh(&rxe->pending_lock); /* ref == 1, no lock held */ ret = remap_vmalloc_range(vma, ip->obj, 0); /* walks PTEs */ [...] rxe_vma_open(vma); /* kref_get, ref → 2 */ remap_vmalloc_range_partial() walks PTEs without any lock. A concurrent DESTROY_CQ ioctl on another CPU calls: kref_put(&q->ip->ref, rxe_mmap_release) /* ref 1→0 */ vfree(ip->obj) /* clears vmalloc PTEs mid-walk */ kfree(ip) /* frees rxe_mmap_info */ This yields: 1. Kernel crash, vmalloc_to_page() returns NULL when vfree wins the per-PTE race -> vm_insert_page(NULL) → GPF in validate_page_before_insert 2. Page UAF, vmalloc_to_page() reads a stale PTE before vfree clears it. User VMA holds a PTE to a free'd page which might eventually get reallocated later by vmalloc which allows the attacker to get a clean page-level UAF. It is worth noting that even though a page-level UAF is possible given the strong primitive, it is statistically very difficult to achieve given the very short time window (after the last insert_page and before the kref_get). 
The call trace are as below: Oops: general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] SMP KASAN NOPTI KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f] CPU: 0 UID: 1000 PID: 413 Comm: poc Not tainted 7.0.0-rc5-dirty #28 PREEMPT(lazy) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:validate_page_before_insert+0x32/0x300 Code: e5 41 57 41 56 49 89 fe 41 55 41 54 53 48 89 f3 e8 93 b5 a3 ff 48 8d 7b 08 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 7b 02 00 00 4c 8b 63 08 31 ff 4d 89 e5 41 83 e5 RSP: 0018:ffff88811b15f2f0 EFLAGS: 00000202 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000008 RBP: ffff88811b15f318 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8881181eee00 R13: 0000000000000000 R14: ffff8881181eee00 R15: ffff8881181eee20 FS: 00007b1e000f76c0(0000) GS:ffff8884268e0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007b1e00a24ac0 CR3: 0000000116eb3000 CR4: 00000000000006f0 Call Trace: <TASK> insert_page+0x8f/0x190 ? __pfx_insert_page+0x10/0x10 ? kasan_save_alloc_info+0x38/0x60 vm_insert_page+0x2e7/0x400 remap_vmalloc_range_partial+0x212/0x3e0 remap_vmalloc_range+0x6e/0xb0 ? __kasan_check_write+0x14/0x30 rxe_mmap+0x2e9/0x5d0 ib_uverbs_mmap+0x1ad/0x2c0 __mmap_region+0x12c2/0x2ad0 ? __pfx___mmap_region+0x10/0x10 ? __sanitizer_cov_trace_switch+0x58/0xb0 ? mas_prev_slot+0x360/0x39c0 ? __sanitizer_cov_trace_switch+0x58/0xb0 ? mas_next_slot+0x1e5b/0x2f40 ? __sanitizer_cov_trace_cmp8+0x18/0x30 ? unmapped_area_topdown+0x4dd/0x610 ? kfree+0x1b1/0x440 ? free_cpumask_var+0x16/0x30 ? __kasan_slab_free+0x7d/0xa0 ? __sanitizer_cov_trace_cmp8+0x18/0x30 mmap_region+0x2e6/0x3c0 do_mmap+0xa3e/0x12a0 ? __pfx_do_mmap+0x10/0x10 ? __kasan_check_write+0x14/0x30 ? 
down_write_killable+0xba/0x160 ? __pfx_down_write_killable+0x10/0x10 ? __sanitizer_cov_trace_cmp4+0x16/0x30 vm_mmap_pgoff+0x2d4/0x4a0 ? __pfx_vm_mmap_pgoff+0x10/0x10 ? fget+0x1bf/0x270 ksys_mmap_pgoff+0x40c/0x690 ? __sanitizer_cov_trace_const_cmp4+0x16/0x30 ? __pfx_ksys_mmap_pgoff+0x10/0x10 ? __kasan_check_write+0x14/0x30 ? _raw_spin_trylock+0xbb/0x130 ? __pfx__raw_spin_trylock+0x10/0x10 __x64_sys_mmap+0x135/0x1e0 x64_sys_call+0x1c14/0x2790 do_syscall_64+0xd2/0x1050 ? rcu_core+0x352/0x7d0 ? rcu_core_si+0xe/0x20 ? handle_softirqs+0x1aa/0x650 ? __sanitizer_cov_trace_cmp4+0x16/0x30 ? fpregs_assert_state_consistent+0xe1/0x160 ? irqentry_exit+0xb1/0x670 entry_SYSCALL_64_after_hwframe+0x76/0x7e Link: https://patch.msgid.link/r/20260515002537.6209-1-yanjun.zhu@linux.dev Reported-and-tested-by: nasm <n4sm@protonmail.com> Suggested-by: nasm <n4sm@protonmail.com> Fixes: `8700e3e7c4` ("Soft RoCE driver") Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-25 11:25:16 -03:00
Jacob Moroni	5ebb3ed757	RDMA/irdma: Fix out-of-bounds write in irdma_copy_user_pgaddrs The irdma_copy_user_pgaddrs function loops through all of the umem DMA blocks to populate the PBLEs and stops when either the last DMA block is reached or palloc->total_cnt is reached. The issue is that the palloc->total_cnt check only works for non-zero values. When irdma_setup_pbles is called with lvl==0, it calls irdma_copy_user_pgaddrs with palloc->total_cnt==0, so the only way to break out of the loop is to reach the last umem DMA block. The loop can therefore write past the end of the fixed-size, 4-entry iwmr->pgaddrmem array that is used in the lvl==0 case. For QP/CQ/SRQ rings, the value of lvl is determined by a separate input (for example, req.cq_pages in the case of a CQ). So, we must perform explicit checking to ensure we don't overflow the pgaddrmem array if the user provides a umem that consists of more blocks than their provided req.cq_pages. Fixes: `b48c24c2d7` ("RDMA/irdma: Implement device supported verb APIs") Link: https://patch.msgid.link/r/20260512183852.614045-1-jmoroni@google.com Signed-off-by: Jacob Moroni <jmoroni@google.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-25 10:50:42 -03:00
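The following simplified, userspace sketch illustrates the failure mode described in the irdma fix and the kind of explicit bound check it calls for. The names `copy_pgaddrs` and `PGADDRMEM_ENTRIES` and the flat-array model are hypothetical. The real function walks umem DMA blocks and writes PBLEs, and the actual patch may express the check differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the fixed 4-entry iwmr->pgaddrmem array. */
#define PGADDRMEM_ENTRIES 4

/*
 * Copy DMA block addresses into dst, which holds dst_cap entries.
 * total_cnt is the entry count the caller allocated; 0 means "level 0",
 * where dst is the fixed array and total_cnt cannot serve as a limit.
 * Returns the number of entries written, or -1 if there are more blocks
 * than fit (instead of writing past the end of dst).
 */
static int copy_pgaddrs(uint64_t *dst, size_t dst_cap,
                        const uint64_t *blocks, size_t nblocks,
                        uint32_t total_cnt)
{
    /* A check like "if (++cnt == total_cnt) break;" never fires when
     * total_cnt == 0, so the bound must come from dst_cap instead. */
    size_t limit = (total_cnt && total_cnt < dst_cap) ? total_cnt : dst_cap;

    assert(dst != NULL);
    if (nblocks > limit)
        return -1; /* user supplied more blocks than the ring was sized for */

    for (size_t i = 0; i < nblocks; i++)
        dst[i] = blocks[i];
    return (int)nblocks;
}
```

With a 4-entry array and `total_cnt == 0`, passing six blocks returns -1 and leaves the array untouched. Passing four blocks copies all of them.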
Linus Torvalds	e7ae89a0c9	Linux 7.1-rc5 v7.1-rc5	2026-05-24 13:48:06 -07:00
Shiraz Saleem	d28654518c	RDMA/mana_ib: Use ib_get_eth_speed for reporting port speed Replace hardcoded IB_WIDTH_4X/IB_SPEED_EDR with ib_get_eth_speed() to report the actual link speed in mana_ib_query_port(). Fixes: `4bda1d5332` ("RDMA/mana_ib: Implement port parameters") Link: https://patch.msgid.link/r/20260512094056.264827-1-kotaranov@linux.microsoft.com Signed-off-by: Shiraz Saleem <shirazsaleem@microsoft.com> Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-24 17:24:24 -03:00
Rosen Penev	992ad0c012	RDMA/rtrs: Use flexible array for client path stats Store the client path statistics in the RTRS client path allocation instead of allocating them separately. This ties the stats lifetime directly to the path and removes a separate allocation failure path. Keep freeing the per-CPU stats data separately, but do not free the embedded stats object from error paths or the stats kobject release handler. Link: https://patch.msgid.link/r/20260511041812.378030-1-rosenp@gmail.com Assisted-by: Codex:GPT-5.5 Signed-off-by: Rosen Penev <rosenp@gmail.com> Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-05-24 17:17:10 -03:00
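The RTRS change stores the stats inside the path allocation. The sketch below shows the general flexible-array-member pattern in userspace C, with `calloc`/`free` standing in for the kernel allocators (in kernel code the size would typically be computed with `struct_size()`). The names `clt_path`, `path_stats`, `path_alloc`, and `path_free` are hypothetical and much simpler than the real RTRS structures.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stats object; only the per-CPU data is a separate allocation. */
struct path_stats {
    unsigned long reconnects;
    unsigned long *pcpu;      /* freed separately from the path */
};

/* Hypothetical path with the stats as a trailing flexible array member,
 * so a single allocation covers both the path and its stats. */
struct clt_path {
    int id;
    struct path_stats stats[];
};

static struct clt_path *path_alloc(int id, size_t ncpus)
{
    struct clt_path *p;

    assert(ncpus > 0);
    p = calloc(1, sizeof(*p) + sizeof(p->stats[0]));
    if (!p)
        return NULL;
    p->id = id;
    p->stats[0].pcpu = calloc(ncpus, sizeof(*p->stats[0].pcpu));
    if (!p->stats[0].pcpu) {
        free(p);              /* embedded stats go away with the path */
        return NULL;
    }
    return p;
}

static void path_free(struct clt_path *p)
{
    if (!p)
        return;
    free(p->stats[0].pcpu);   /* per-CPU data: still a separate free */
    free(p);                  /* path + embedded stats: one free */
}
```

Because the stats share the path's allocation, the stats can no longer fail to allocate on their own, and their lifetime cannot outlast the path. This matches the motivation given in the commit message.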
Linus Torvalds	6a97c4d526	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm fixes from Paolo Bonzini: "arm64: - Fix ITS EventID sanitisation when restoring an interrupt translation table. - Fix PPI memory leak when failing to initialise a vcpu. - Correctly return an error when the validation of a hypervisor trace descriptor fails, and limit this validation to protected mode only. RISC-V: - Fix invalid HVA warning in steal-time recording - Return SBI_ERR_FAILURE to guest upon OOM in pmu_event_info() and pmu_snapshot_set_shmem() - Fix NULL pointer dereference in SBI v0.1 SEND_IPI handler - Fix sign extension of value for MMIO loads s390: - Fix bugs in vSIE (nested virtualization) and UCONTROL, caused by the page table rewrite. x86: - Apply erratum #1235 workaround (disable AVIC IPI virtualization) on Hygon Family 18h, just like on AMD Family 17h. - When KVM_CAP_X86_APIC_BUS_CYCLES_NS is queried on a specific VM, return the VM's configured APIC bus frequency instead of the default. This is less confusing (read: not wrong) and makes it easier to fill in CPUID information that communicates the APIC bus frequency to the guest. 
Selftests: - Do not include glibc-internal <bits/endian.h>; it worked by chance and broke building KVM selftests with musl" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: SVM: Disable AVIC IPI virtualization on Hygon Family 18h (erratum #1235) KVM: selftests: Verify that KVM returns the configured APIC cycle length KVM: x86: Return the VM's configured APIC bus frequency when queried KVM: selftests: elf: Include <endian.h> instead of <bits/endian.h> KVM: s390: Properly reset zero bit in PGSTE KVM: s390: vsie: Fix redundant rmap entries KVM: s390: vsie: Fix unshadowing logic KVM: s390: Fix leaking kvm_s390_mmu_cache in case of errors KVM: s390: vsie: Fix memory leak when unshadowing KVM: arm64: Fix nVHE/pKVM hyp tracing error on invalid desc KVM: arm64: vgic: Free private_irqs when init fails after allocation KVM: arm64: vgic-its: Reject restored DTE with out-of-range num_eventid_bits RISC-V: KVM: Fix sign extension for MMIO loads RISC-V: KVM: Fix NULL pointer dereference in SBI v0.1 SEND_IPI handler riscv: kvm: return SBI_ERR_FAILURE for pmu_event_info() when OOM riscv: kvm: return SBI_ERR_FAILURE for pmu_snapshot_set_shmem() when OOM RISC-V: KVM: Fix invalid HVA warning in steal-time recording	2026-05-24 12:50:36 -07:00
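One of the RISC-V KVM fixes above concerns sign extension of MMIO load values. The sketch below shows only the general technique, not the actual KVM patch. On RISC-V, signed loads (LB, LH, LW) replicate the loaded value's top bit into the upper register bits, while unsigned loads (LBU, LHU, LWU) zero-extend. The function name `mmio_load_extend` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Widen a len-byte MMIO load result to a 64-bit register value.
 * is_signed selects sign extension (LB/LH/LW-style) versus
 * zero extension (LBU/LHU/LWU-style).
 */
static uint64_t mmio_load_extend(uint64_t data, unsigned int len, int is_signed)
{
    unsigned int bits = len * 8;

    assert(len >= 1 && len <= 8);
    if (bits == 64)
        return data;                      /* full width: nothing to extend */

    data &= (UINT64_C(1) << bits) - 1;    /* keep only the loaded bytes */
    if (is_signed) {
        uint64_t sign = UINT64_C(1) << (bits - 1);
        data = (data ^ sign) - sign;      /* propagate the sign bit upward */
    }
    return data;
}
```

The XOR-and-subtract form avoids right-shifting a negative signed value, which is implementation-defined in C.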
Linus Torvalds	3526d74623	Merge tag 'x86-urgent-2026-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: - On SEV guests, handle set_memory_{encrypted,decrypted}() failures more conservatively by assuming that all affected pages are unencrypted (Carlos López) - Disable broadcast TLB flush when PCID is disabled (Tom Lendacky) - Fix VMX vs. hrtimer_rearm_deferred() regression (Peter Zijlstra) - Move IRQ/NMI dispatch code from KVM into x86 core, to prepare for a KVM x2apic fix (Peter Zijlstra) - Fix incorrect munmap() size on map_vdso() failure (Guilherme Giacomo Simoes) * tag 'x86-urgent-2026-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: virt: sev-guest: Explicitly leak pages in unknown state x86/mm: Disable broadcast TLB flush when PCID is disabled x86/kvm/vmx: Fix VMX vs hrtimer_rearm_deferred() x86/kvm/vmx: Move IRQ/NMI dispatch from KVM into x86 core x86/vdso: Fix incorrect size in munmap() on map_vdso() failure	2026-05-24 11:00:45 -07:00
Linus Torvalds	a674bf74b3	Merge tag 'irq-urgent-2026-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irqchip driver fixes from Ingo Molnar: - Fix the hardware probing error path of the renesas-rzt2h irqchip driver - Fix the exynos-combiner irqchip driver on -rt kernels by turning the IRQ controller spinlock into a raw spinlock * tag 'irq-urgent-2026-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/renesas-rzt2h: Use pm_runtime_put_sync() in probe error path irqchip/exynos-combiner: Switch to raw_spinlock	2026-05-24 10:55:21 -07:00
Linus Torvalds	ee651da6d3	Merge tag 'core-urgent-2026-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull debugobjects fix from Ingo Molnar: - Fix debugobjects regression on -rt kernels: don't fill the pool (which uses a coarse lock) if ->pi_blocked_on, because that messes up the priority inheritance of callers * tag 'core-urgent-2026-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: debugobjects: Do not fill_pool() if pi_blocked_on	2026-05-24 10:48:55 -07:00


1446336 Commits