clang warns about a function missing a printf attribute:
include/rdma/rdma_vt.h:457:47: error: diagnostic behavior may be improved by adding the 'format(printf, 2, 3)' attribute to the declaration of 'rvt_set_ibdev_name' [-Werror,-Wmissing-format-attribute]
447 | static inline void rvt_set_ibdev_name(struct rvt_dev_info *rdi,
| __attribute__((format(printf, 2, 3)))
448 | const char *fmt, const char *name,
449 | const int unit)
The helper was originally added as an abstraction for the hfi1 and
qib drivers needing the same thing, but now qib is gone, and hfi1
is the only remaining user of rdma_vt.
Avoid the warning and allow the compiler to check the format string by
open-coding the helper and directly assigning the device name.
Fixes: 5084c8ff21 ("IB/{rdmavt, hfi1, qib}: Self determine driver name")
Link: https://patch.msgid.link/r/20260602140453.3542427-1-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Kees Cook <kees@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Sashiko points out the roundup_pow_of_two() only uses unsigned long but
dma_addr_t can be u64.
Change this algorithm to be simpler, compute the page size, if any page
size is found and it results in a single block then it is contiguous.
Link: https://patch.msgid.link/r/3-v1-88303e9e509f+f7-ib_umem_types_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Several corner cases, especially important on 32 bits:
- umem->iova is u64, the function argument should pass in u64 or
iova will be truncated
- Check that the length is not too large for the iova
- Check that lengths > 4G don't overflow the GENMASK
Link: https://patch.msgid.link/r/2-v1-88303e9e509f+f7-ib_umem_types_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
This op hasn't followed the normal pattern of passing NULL for udata when
invoked by the kernel. Instead the kernel caller creates a dummy ib_udata
on the stack and passes that in. It does not seem to currently be a bug,
but this flow should be modernized to use the new API flow and in the
process accept NULL as well.
Only mlx4 uses an input request structure, have every other driver call
ib_is_udata_in_empty() to enforce the lack of request structs.
Use ib_respond_empty_udata() in every driver that does not use a response
struct.
Ensure a check for NULL udata before calling ib_respond_udata() in
bnxt_re, efa, and mlx5.
Make mlx4 safe to be called with NULL.
Link: https://patch.msgid.link/r/2-v1-922fa8e828ba+f7-ib_udata_stack_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Sashiko points out the udata for destruction has to be created using
uverbs_get_cleared_udata(). Move it to ib_core_uverbs.c so that the core
qp code can call it. Rework the call chain to pass the struct
uverbs_attr_bundle right up to the driver op callback.
Fixes a possible wild stack reference in drivers during error unwinding,
mlx5 can call rdma_udata_to_drv_context() from destroy_qp() when
destroying a QP.
Fixes: 00a79d6b99 ("RDMA/core: Configure selinux QP during creation")
Link: https://patch.msgid.link/r/1-v1-922fa8e828ba+f7-ib_udata_stack_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
setup_root_hem() reads the first entry of head->root and checks
the returned pointer against NULL:
root_hem = list_first_entry(&head->root,
struct hns_roce_hem_item, list);
if (!root_hem)
return -ENOMEM;
list_first_entry() never returns NULL. On an empty list it returns
container_of(head, ..., list), a non-NULL garbage pointer that
aliases the head. So the check is dead.
The only caller adds an entry to head.root right before invoking
setup_root_hem():
list_add(&root_hem->list, &head.root);
ret = setup_root_hem(..., &head, ...);
So head.root is guaranteed non-empty on entry. Drop the check.
Link: https://patch.msgid.link/r/20260526054653.2054800-1-maoyixie.tju@gmail.com
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Maoyi Xie <maoyixie.tju@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
For non-SRQ QPs, the responder reads WQE fields directly from the
shared queue buffer mapped into userspace. This allows a malicious
user to modify fields like num_sge or sge entries while the kernel
is processing the WQE, leading to out-of-bounds reads in
rxe_resp_check_length() and copy_data().
Introduce get_recv_wqe() that validates num_sge and copies the WQE
to a kernel-local buffer before processing, matching the approach
already used for SRQ WQEs in get_srq_wqe(). The srq_wqe buffer is
reused since SRQ and non-SRQ paths are mutually exclusive per QP.
Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://patch.msgid.link/r/20260518215040.1598586-3-tristan@talencesecurity.com
Signed-off-by: Tristan Madani <tristan@talencesecurity.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
get_srq_wqe() reads wqe->dma.num_sge from the shared receive queue
buffer, which is mapped into userspace. It validates num_sge against
max_sge, but then re-reads the same field to calculate the memcpy
size. A concurrent userspace thread can modify num_sge between
validation and use, causing a heap buffer overflow when copying the
WQE into qp->resp.srq_wqe.
Read num_sge into a local variable and use it for both the bounds
check and the size calculation.
Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://patch.msgid.link/r/20260518215040.1598586-2-tristan@talencesecurity.com
Signed-off-by: Tristan Madani <tristan@talencesecurity.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
When a device requires DMA bounce buffering inside a Confidential
Computing guest, __ib_umem_get_va() cannot work. The DMA mapping layer
redirects all mappings through swiotlb bounce buffers, so the device
receives DMA addresses pointing to bounce buffer memory rather than the
user's pages. Since RDMA devices access registered memory directly without
CPU involvement, there is no opportunity for swiotlb to synchronize
between the bounce buffer and the original pages.
The registration would already fail later on, since the umem mapping is
requested with DMA_ATTR_REQUIRE_COHERENT and gets rejected under
is_swiotlb_force_bounce() with -EIO. Fail early with -EOPNOTSUPP instead,
so the user gets a specific error code to react to.
Link: https://patch.msgid.link/r/20260517141311.2409230-3-jiri@resnulli.us
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
In CoCo guests, guest memory is encrypted and untrusted (T=0) devices
cannot DMA to it directly; such transfers must go through unencrypted
bounce buffers. RDMA registers user pages for direct device access,
bypassing the DMA layer and thus any bouncing, so registered memory does
not work in this configuration.
Until trusted (T=1) device detection is available, conservatively flag
every device attached to a CoCo guest. Expose the condition to userspace
as IB_UVERBS_DEVICE_CC_DMA_BOUNCE in device_cap_flags_ex so applications
can avoid memory registration and fall back to copying buffers through
send/recv.
Link: https://patch.msgid.link/r/20260517141311.2409230-2-jiri@resnulli.us
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Add an optional mlx5 driver-namespace UMEM attribute on QP
create so userspace can supply the doorbell record umem
explicitly, symmetric to the CQ side. Resolve it inside
mlx5_ib_db_map_user() and use it as a private DBR page when
present; otherwise take the existing UHW share-or-pin path
that preserves per-page DBR sharing across CQ/QP/SRQ in the
same process.
Add mlx5's first UVERBS_OBJECT_QP UAPI definition chain to
attach the new attr.
Link: https://patch.msgid.link/r/20260529134312.2836341-17-jiri@resnulli.us
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Add an optional mlx5 driver-namespace UMEM attribute on CQ
create so userspace can supply the doorbell record buffer
explicitly. mlx5_ib_db_map_user() resolves the attribute (or
falls back to the legacy UHW VA) into a struct
ib_uverbs_buffer_desc and runs a unified lookup-then-pin:
VA-typed descriptors share a per-page umem across CQ/QP/SRQ
in the same process, FD-typed descriptors are pinned per call.
Link: https://patch.msgid.link/r/20260529134312.2836341-16-jiri@resnulli.us
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Apply the per-attribute UMEM model to the QP create method. Add
three optional UMEM attributes that drivers pick from based on
how their user ABI lays out the QP rings:
- CREATE_QP_BUF_UMEM is a single user buffer that backs both
the SQ and RQ of one QP. This is the common case where
userspace pins one contiguous WQE region for the QP.
- CREATE_QP_SQ_BUF_UMEM and CREATE_QP_RQ_BUF_UMEM are a pair
of user buffers backing the SQ and RQ independently, used
when the two rings live in physically distinct user
allocations and must be pinned and addressed separately.
Existing drivers would map their current umems as follows:
- mlx5: BUF for normal QPs (one ucmd->buf_addr covers SQ+RQ);
for IB_QPT_RAW_PACKET and IB_QP_CREATE_SOURCE_QPN, the RQ
side comes from ucmd->buf_addr (RQ-sized) via RQ_BUF and
the SQ from ucmd->sq_buf_addr via SQ_BUF.
- mlx4: BUF, single ucmd.buf_addr covering SQ+RQ.
- hns: BUF, single ucmd.buf_addr covering SQ + ext-SGE + RQ.
- erdma: BUF, single ureq.qbuf_va sliced by the kernel into
SQ at offset 0 and RQ at rq_offset.
- bnxt_re: SQ_BUF (ureq->qpsva) + RQ_BUF (ureq->qprva, the
RQ side is skipped when the QP uses an SRQ).
- vmw_pvrdma: SQ_BUF (sbuf_addr) + RQ_BUF (rbuf_addr, the RQ
side is skipped when the QP uses an SRQ).
- qedr: SQ_BUF (sq_addr) + RQ_BUF (rq_addr) for whichever
side the QP type actually has (no SQ for XRC_TGT/GSI; no
RQ for XRC_INI/XRC_TGT/SRQ).
- ionic: SQ_BUF (req.sq.addr) + RQ_BUF (req.rq.addr); both
are skipped when the rings are placed in CMB instead of
host memory.
- mana: raw-packet QP uses SQ_BUF (sq_buf_addr) only; the RC
path uses multiple per-queue user buffers (ucmd.queue_buf[])
that do not fit the SQ/RQ pair semantics of these attrs and
stays on the legacy UHW path.
- efa, irdma, hfi1, ocrdma, mthca, cxgb4 and usnic do not pin
a QP WQE buffer via umem; none of these attributes apply.
Link: https://patch.msgid.link/r/20260529134312.2836341-13-jiri@resnulli.us
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Now that all drivers use helper to get umem and manage the lifetime,
legacy umem field in struct ib_cq is no longer needed. Remove it
along with ib_umem_get_cq_tmp() helper that populated it and both
error and destroy paths.
Link: https://patch.msgid.link/r/20260529134312.2836341-12-jiri@resnulli.us
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Extract the UVERBS_ATTR_CREATE_CQ_BUFFER_* parser from the CQ
create handler into uverbs_create_cq_get_buffer_desc(), and wrap
it in ib_umem_get_cq_tmp(), the umem-producing helper the cq_create
handler now calls.
ib_umem_get_cq_tmp() is temporary; subsequent patches replace it
with driver-owned ib_umem_get_cq_buf*() wrappers built on the
same parser, and remove it once all CQ drivers have switched.
Link: https://patch.msgid.link/r/20260529134312.2836341-6-jiri@resnulli.us
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Introduce a per-attribute UVERBS_ATTR_UMEM model so each uverbs
command's umem set is explicit in its UAPI definition. Add
driver-facing wrapper helpers that pin a umem on demand from an
attribute or a VA addr; the driver owns the returned umem and
releases it from its destroy/error paths.
Link: https://patch.msgid.link/r/20260529134312.2836341-4-jiri@resnulli.us
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The follow-up patch is going to introduce ib_umem_get_desc(),
the canonical desc-to-umem helper, which needs to pin a userspace VA
without going through the exported ib_umem_get_va() helper so later on
ib_umem_get_va() would use the ib_umem_get_desc() flow too.
Move the existing ib_umem_get_va() to a static __ib_umem_get_va()
and have ib_umem_get_va() as a thin wrapper that calls it.
Link: https://patch.msgid.link/r/20260529134312.2836341-3-jiri@resnulli.us
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The new umem getter family being introduced in follow-up patches
need a fitting name for the central all-source helper that resolves
attributes, legacy fillers and a UHW VA fallback.
Rename the existing VA-pinning helper ib_umem_get() to ib_umem_get_va()
so the name is freed up. The new name is consistent with names of rest
of the helpers that are about to be introduced.
Link: https://patch.msgid.link/r/20260529134312.2836341-2-jiri@resnulli.us
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
To maintain the split where ib_uverbs.ko should not be depended on by
drivers, add a new module ib_uverbs_support.ko which contains the driver
called functions that are too large or too rare to be placed in
ib_uverbs_core.ko
Start by moving most of rdma_core.c into this module, making some
adjustments to split it from the actual uverbs FD code.
This was not done originally because we lacked EXPORT_SYMBOL_NS and I had
a fear that drivers would abuse this interface surface.
Link: https://patch.msgid.link/r/4-v3-43aba1969751+1988-ib_uverbs_support_ko_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Instead of having an alternative fops release always use the standard
uverbs_uobject_fd_release() and route the special async behavior back up
through uverbs_obj_fd_type ops pointer.
This removes a dependency where the technically lower level rdma_core.c is
referring to a symbol from uverbs_std_types_async_fd.c.
Link: https://patch.msgid.link/r/3-v3-43aba1969751+1988-ib_uverbs_support_ko_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Remove the entire ib_core_uverbs.c from the build if
CONFIG_INFINIBAND_USER_ACCESS is not set. These functions are only used to
support uverbs and are never callable even if they happen to get linked
in.
Provide inlines for the missing ones to return errors to further push code
elimination in drivers.
Link: https://patch.msgid.link/r/1-v3-43aba1969751+1988-ib_uverbs_support_ko_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
For dependencies in the following patches
Resolve conflicts, use the goto labels from the rc tag.
* tag 'v7.1-rc5': (1526 commits)
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The error cleanup loop in rdma_counter_init() iterates with variable
'i' but accesses dev->port_data[port] instead of dev->port_data[i].
This causes the failed port's hstats to be freed multiple times while
leaking hstats of previously initialized ports.
Fixes: 56594ae1d2 ("RDMA/core: Annotate destroy of mutex to ensure that it is released as unlocked")
Link: https://patch.msgid.link/r/20260520104546.1776253-3-cuitao@kylinos.cn
Signed-off-by: Tao Cui <cuitao@kylinos.cn>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
When __rdma_counter_bind_qp() fails in alloc_and_bind(), the error path
jumps to err_mode which frees the counter without decrementing
port_counter->num_counters. The only place that decrements is
rdma_counter_free(), which is unreachable since the counter was never
successfully bound.
This leak accumulates across repeated failures, permanently preventing
the port from switching to AUTO mode (-EBUSY in __counter_set_mode())
and blocking the MANUAL→NONE auto-revert in rdma_counter_free(). When
the mode was NONE before the call, the MANUAL mode set by
__counter_set_mode() also leaks since the revert logic is never
reached.
Add an err_bind label between the num_counters increment and the
existing err_mode label. It decrements num_counters and mirrors the
MANUAL→NONE revert from rdma_counter_free(), ensuring the port state
is fully restored on bind failure.
Link: https://patch.msgid.link/r/20260520104546.1776253-2-cuitao@kylinos.cn
Signed-off-by: Tao Cui <cuitao@kylinos.cn>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
hns_roce_cmd_mbox() is the command interface between driver and
hardware. When hardware is abnormal, the unlimited error printings
after hns_roce_cmd_mbox() failure will cause log flood and even
system crash.
Replace ibdev_err() and ibdev_warn() with their ratelimited versions
in the error handling path after hns_roce_cmd_mbox() (and its wrappers
hns_roce_create_hw_ctx/hns_roce_destroy_hw_ctx) fails.
Fixes: 9a4435375c ("IB/hns: Add driver files for hns RoCE driver")
Link: https://patch.msgid.link/r/20260520055759.2354037-4-huangjunxian6@hisilicon.com
Signed-off-by: Lianfa Weng <wenglianfa@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
CQs allocated by ib_alloc_cq() always have a comp_handler. Though
in direct mode this handler is never expected to be called, it
is still called when the driver is reset, triggering the following
WARN_ONCE():
Call trace:
ib_cq_completion_direct+0x38/0x60
hns_roce_cq_completion+0x54/0x90 (hns_roce_hw_v2]
hns_roce_handle_device_err+Ox1c8/0x340 [hns_roce_hw_v2]
hns_roce_hw_v2_uninit_instance.constprop.0+0x34/0x70 [hns_roce_hw_v2]
hns_roce_hw_v2_reset_notify+0xc4/0xe0 [hns_roce_hw_v2]
hclge_notify_roce_client+0x60/0xbc [hclge]
hclge_reset_rebuild+0x48/0x34c [hclge]
hclge_reset_subtask+0xcc/0xec [hclge]
hclge_reset_service_task+0x80/0x160 [hclge]
hclge_service_task+0x50/0x80 (hclge]
process_one_work+0x1cc/0x4d0
worker_thread+0x154/0x414
kthread+0x104/0x144
ret_from_fork+0x10/0x18
Fixes: f295e4cece ("RDMA/hns: Delete unnecessary callback functions for cq")
Link: https://patch.msgid.link/r/20260520055759.2354037-3-huangjunxian6@hisilicon.com
Signed-off-by: Lianfa Weng <wenglianfa@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
After kobject_init_and_add(), the lifetime of the embedded struct
kobject is expected to be managed through the kobject core reference
counting.
In add_port(), failure paths after kobject_init_and_add() must not free
struct mlx4_port directly, because the embedded kobject is then managed
by the kobject core. Freeing it directly leaves the kobject reference
counting unbalanced and can lead to incorrect lifetime handling.
Allocate the pkey and gid attribute arrays before kobject_init_and_add(),
so failures before kobject initialization can be handled by directly
freeing the allocated memory. Once kobject_init_and_add() has been
called, unwind later failures by removing any successfully created sysfs
groups, calling kobject_del(), and then releasing the embedded kobject
with kobject_put().
Fixes: c1e7e46612 ("IB/mlx4: Add iov directory in sysfs under the ib device")
Link: https://patch.msgid.link/r/20260518021910.972900-1-lgs201920130244@gmail.com
Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The irdma_copy_user_pgaddrs function loops through all of the umem DMA
blocks to populate the PBLEs and will stop when either the last DMA
block is reached or palloc->total_cnt is reached. The issue is that
the logic for checking palloc->total_cnt would only work for non-zero
values.
When irdma_setup_pbles is called with lvl==0, it
calls irdma_copy_user_pgaddrs with palloc->total_cnt==0, which means
the only way to break out of the loop is to reach the last umem DMA
block, which means it could end up going beyond the fixed size of 4
iwmr->pgaddrmem array that is used in the lvl==0 case.
In the case of QP/CQ/SRQ rings, the value of lvl is determined by a
separate input (for example, req.cq_pages in the case of a CQ). So,
we must perform explicit checking to ensure we don't overflow the
pgaddrmem array if the user provides a umem that consists of more
blocks than their provided req.cq_pages.
Fixes: b48c24c2d7 ("RDMA/irdma: Implement device supported verb APIs")
Link: https://patch.msgid.link/r/20260512183852.614045-1-jmoroni@google.com
Signed-off-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Store the client path statistics in the RTRS client path allocation
instead of allocating them separately.
This ties the stats lifetime directly to the path and removes a separate
allocation failure path. Keep freeing the per-CPU stats data separately,
but do not free the embedded stats object from error paths or the stats
kobject release handler.
Link: https://patch.msgid.link/r/20260511041812.378030-1-rosenp@gmail.com
Assisted-by: Codex:GPT-5.5
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Pull kvm fixes from Paolo Bonzini:
"arm64:
- Fix ITS EventID sanitisation when restoring an interrupt
translation table.
- Fix PPI memory leak when failing to initialise a vcpu.
- Correctly return an error when the validation of a hypervisor trace
descriptor fails, and limit this validation to protected mode only.
RISC-V:
- Fix invalid HVA warning in steal-time recording
- Return SBI_ERR_FAILURE to guest upon OOM in pmu_event_info() and
pmu_snapshot_set_shmem()
- Fix NULL pointer dereference in SBI v0.1 SEND_IPI handler
- Fix sign extension of value for MMIO loads
s390:
- Fix bugs in vSIE (nested virtualization) and UCONTROL, caused by
the page table rewrite.
x86:
- Apply erratum #1235 workaround (disable AVIC IPI virtualization) on
Hygon Family 18h, just like on AMD Family 17h.
- When KVM_CAP_X86_APIC_BUS_CYCLES_NS is queried on a specific VM,
return the VM's configured APIC bus frequency instead of the
default. This is less confusing (read: not wrong) and makes it
easier to fill in CPUID information that communicates the APIC bus
frequency to the guest.
Selftests:
- Do not include glibc-internal <bits/endian.h>; it worked by chance
and broke building KVM selftests with musl"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: SVM: Disable AVIC IPI virtualization on Hygon Family 18h (erratum #1235)
KVM: selftests: Verify that KVM returns the configured APIC cycle length
KVM: x86: Return the VM's configured APIC bus frequency when queried
KVM: selftests: elf: Include <endian.h> instead of <bits/endian.h>
KVM: s390: Properly reset zero bit in PGSTE
KVM: s390: vsie: Fix redundant rmap entries
KVM: s390: vsie: Fix unshadowing logic
KVM: s390: Fix leaking kvm_s390_mmu_cache in case of errors
KVM: s390: vsie: Fix memory leak when unshadowing
KVM: arm64: Fix nVHE/pKVM hyp tracing error on invalid desc
KVM: arm64: vgic: Free private_irqs when init fails after allocation
KVM: arm64: vgic-its: Reject restored DTE with out-of-range num_eventid_bits
RISC-V: KVM: Fix sign extension for MMIO loads
RISC-V: KVM: Fix NULL pointer dereference in SBI v0.1 SEND_IPI handler
riscv: kvm: return SBI_ERR_FAILURE for pmu_event_info() when OOM
riscv: kvm: return SBI_ERR_FAILURE for pmu_snapshot_set_shmem() when OOM
RISC-V: KVM: Fix invalid HVA warning in steal-time recording
Pull x86 fixes from Ingo Molnar:
- On SEV guests, handle set_memory_{encrypted,decrypted}() failures
more conservatively by assuming that all affected pages are
unencrypted (Carlos López)
- Disable broadcast TLB flush when PCID is disabled (Tom Lendacky)
- Fix VMX vs. hrtimer_rearm_deferred() regression (Peter Zijlstra)
- Move IRQ/NMI dispatch code from KVM into x86 core, to prepare for a
KVM x2apic fix (Peter Zijlstra)
- Fix incorrect munmap() size on map_vdso() failure (Guilherme Giacomo
Simoes)
* tag 'x86-urgent-2026-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
virt: sev-guest: Explicitly leak pages in unknown state
x86/mm: Disable broadcast TLB flush when PCID is disabled
x86/kvm/vmx: Fix VMX vs hrtimer_rearm_deferred()
x86/kvm/vmx: Move IRQ/NMI dispatch from KVM into x86 core
x86/vdso: Fix incorrect size in munmap() on map_vdso() failure
Pull irqchip driver fixes from Ingo Molnar:
- Fix the hardware probing error path of the renesas-rzt2h
irqchip driver
- Fix the exynos-combiner irqchip driver on -rt kernels
by turning the IRQ controller spinlock into a raw spinlock
* tag 'irq-urgent-2026-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/renesas-rzt2h: Use pm_runtime_put_sync() in probe error path
irqchip/exynos-combiner: Switch to raw_spinlock
Pull debugobjects fix from Ingo Molnar::
- Fix debugobjects regression on -rt kernels: don't fill the pool
(which uses a coarse lock) if ->pi_blocked_on, because that messes up
the priority inheritance of callers
* tag 'core-urgent-2026-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
debugobjects: Do not fill_pool() if pi_blocked_on