linux-stable-mirror

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2026-06-21 15:43:21 +02:00

Author	SHA1	Message	Date
Linus Torvalds	9c87e61e3c	Merge tag 'bpf-next-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Pull bpf updates from Alexei Starovoitov: "Major changes: - Recover from BPF arena page faults using a scratch page and add ptep_try_set() for lockless empty-slot installs on x86 and arm64. This allows BPF kfuncs to access arena pointers directly. The 'arena_direct_access' stable branch was created for this work and was pulled into sched-ext and bpf-next trees (Tejun Heo, Kumar Kartikeya Dwivedi) - Lift old restriction and support 6+ arguments in BPF programs and kfuncs on x86 and arm64 (Yonghong Song, Puranjay Mohan) Other features and fixes: - Add 24-bit BTF vlen and reclaim unused bits in the BTF UAPI to ease addition of new BTF kinds (Alan Maguire) - Raise the maximum BPF call chain depth from 8 to 16 frames (Alexei Starovoitov) - Refactor object relationship tracking in the verifier and fix a dynptr use-after-free bug (Amery Hung) - Harden the signed program loader and reject exclusive maps as inner maps (Daniel Borkmann) - Replace the verifier min/max bounds fields with a circular number (cnum) representation and improve 32->64 bit range refinements (Eduard Zingerman) - Introduce the arena library and runtime (libarena) with a buddy allocator, rbtree and SPMC queue data structures, ASAN support and a parallel test harness. Allow subprograms to return arena pointers and switch to a BTF type-tag based __arena annotation (Emil Tsalapatis) - Cache build IDs in the sleepable stackmap path and avoid faultable build ID reads under mm locks (Ihor Solodrai) - Introduce the tracing_multi link to attach a single BPF program to many kernel functions at once. Allow specifying the uprobe_multi target via FD (Jiri Olsa) - Extend the bpf_list family of kfuncs with bpf_list_add/del(), and bpf_list_is_first/is_last/empty() (Kaitao Cheng) - Extend the BPF syscall with common attributes support for prog_load, btf_load and map_create (Leon Hwang) - Wrap rhashtable as BPF map (Mykyta Yatsenko, Herbert Xu) - Add sleepable support for tracepoint programs and fix deadlocks in LRU map due to NMI reentry (Mykyta Yatsenko) - Fix OOB access in bpf_flow_keys, fix nullness analysis of inner arrays, enforce write checks for global subprograms (Nuoqi Gui) - Report the maximum combined stack depth and print a breakdown of instructions processed per subprogram (Paul Chaignon) - Add an XDP load-balancer benchmark and arm64 JIT support for stack arguments (Puranjay Mohan) - Add kfuncs to traverse over wakeup_sources (Samuel Wu) - Allow sleepable BPF programs to use LPM trie maps directly (Vlad Poenaru) - Many more fixes and cleanups across the verifier, BTF, sockmap, devmap, bpffs, security hooks, s390/riscv/loongarch JITs, rqspinlock, libbpf, bpftool, selftests" * tag 'bpf-next-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (336 commits) selftests/bpf: Work around llvm stack overflow in crypto progs selftests/bpf: add test for bpf_msg_pop_data() overflow bpf, sockmap: fix integer overflow in bpf_msg_pop_data() bounds check sockmap: Fix use-after-free in udp_bpf_recvmsg() bpf, sockmap: keep sk_msg copy state in sync bpf, sockmap: Fix wrong rsge offset in bpf_msg_push_data() bpf, sockmap: reject overflowing copy + len in bpf_msg_push_data() selftsets/bpf: Retry map update on helper_fill_hashmap() selftests/bpf: Add test for sleepable lsm_cgroup rejection selftests/bpf: Add test to verify the fix for bpf_setsockopt() helper bpf: Fix bpf_get/setsockopt to tos for ipv4-mapped ipv6 socket selftests/bpf: Avoid static LLVM linking for cross builds selftests/bpf: Use common CFLAGS for urandom_read selftests/bpf: Initialize operation name before use tools/bpf: build: Append extra cflags libbpf: Initialize CFLAGS before including Makefile.include bpftool: Append extra host flags bpftool: Avoid adding EXTRA_CFLAGS to HOST_CFLAGS bpftool: Pass host flags to bootstrap libbpf selftests/bpf: correct CONFIG_PPC64 macro name in comment ...	2026-06-17 09:18:14 +01:00
Linus Torvalds	b85966adbf	Merge tag 'net-next-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core & protocols: - Work on removing rtnl_lock protection throughout the stack continues. In this chapter: - don't use rtnl_lock for IPv6 multicast routing configuration - don't take rtnl_lock in ethtool for modern drivers - prepare Qdisc dump callbacks for rtnl_lock removal - Support dumping just ifindex + name of all interfaces, under RCU. It's a common operation for Netlink CLI tools (when translating names to ifindexes) and previously required full rtnl_lock. - Support dumping qdiscs and page pools for a specific netdev. Even tho user space wants a dump of all netdevs, most of the time, the OOO programming model results in repeating the dump for each netdev. Which, in absence of a cache, leads to a O(n^2) behavior. - Flush nexthops once on multi-nexthop removal (e.g. when device goes down), another O(n^2) -> O(n) improvement. - Rehash locally generated traffic to a different nexthop on retransmit timeout. - Honor oif when choosing nexthop for locally generated IPv6 traffic. - Convert TCP Auth Option to crypto library, and drop non-RFC algos. - Increase subflow limits in MPTCP to 64 and endpoint limit to 256. - Support MPTCP signaling of IPv6 address + port (ADD_ADDR). We need to selectively skip reporting of the standard TCP Timestamp option, because they won't fit into the header space together (12 + 30 > 40). - Support using bridge neighbor suppression, Duplicate Address Detection, Gratuitous ARP and unsolicited NA forwarding - in EVPN deployments, e.g. VXLAN fabrics (IPv4 and IPv6). - Improve link state reporting for upper netdevs (e.g. macvlan) over tunnel devices (again, mostly for EVPN deployments). - Support binding GENEVE tunnels to a local address. - Speed up UDP tunnel destruction (remove one synchronize_rcu()). - Support exponential field encoding in multicast (IGMPv3 and MLDv2). - Support attaching PSP crypto offload to containers (veth, netkit). - Add a new IPSec Netlink message XFRM_MSG_MIGRATE_STATE that allows migrating individual IPsec SAs independently of their policies. The existing XFRM_MSG_MIGRATE is tightly coupled to policy+SA migration, lacks SPI for unique SA identification, and cannot express reqid changes or migrate Transport mode selectors. The new interface identifies the SA via SPI and mark, supports reqid changes, address family changes, encap removal, and uses an atomic create+install flow under x->lock to prevent SN/IV reuse during AEAD SA migration. - Implement GRO/GSO support for PPPoE. - Convert sockopt callbacks in a number of protocols to iov_iter. Cross-tree stuff: - Remove support for Crypto TFM cloning (unblocked after the TCP Auth Option rework). This feature regressed performance for all crypto API users, since it changed crypto transformation objects into reference-counted objects. - Add FCrypt-PCBC implementation to rxrpc and remove it from the global crypto API as obsolete and insecure. Wireless: - Major rework of station bandwidth handling, fixing issues with lower capability than AP. - Cleanups for EMLSR spec issues (drafts differed). - More Neighbor Awareness Networking (Wi-Fi Aware) work (multicast, schedule improvements, multi-station etc.) - Some Ultra High Reliability (UHR) / IEEE 802.11bn (D1.4) work (e.g. non-primary channel access, UHR DBE support). - Fine Timing Measurement ranging (i.e. distance measurement) APIs. Netfilter: - Use per-rule hash initval in nf_conncount. This avoids unnecessary lock contention with short keys (e.g. conntrack zones) in different namespaces. - Various safety improvements, both in packet parsing and object lifetimes. Notably add refcounts to conntrack timeout policy. Deletions: - Remove TLS + sockmap integration. TLS wants to pin user pages to avoid a copy, and sockmap wants to write to the input stream. More work on this integration is clearly needed, and we can't find any users (original author admitted that they never deployed it). - Remove support for TLS offload with TCP Offload Engine (the far more common opportunistic offload is retained). The locking looks unfixable (driver sleeps under TCP spin locks) and people from the vendor that added this are AWOL. - Remove more ATM code, trying to leave behind only what PPPoATM needs, AAL5 and br2684 with permanent circuits. - Remove AppleTalk. Let it join hamradio in our out of tree protocol graveyard, I mean, repository. - Disable 32-bit x_tables compatibility (32bit binaries on 64bit kernel) interface in user namespaces. To be deleted completely, soon. - Remove 5/10 MHz support from cfg80211/mac80211. Drivers: - Software: - Support DEVMEM/DMABUF Tx over NETMEM_TX_NO_DMA devices (netkit) - bonding: add knob to strictly follow 802.3ad for link state - New drivers: - Alibaba Elastic Ethernet Adaptor (cloud vNIC). - NXP NETC switch within i.MX94. - DPLL: - Add operational state to pins (implement in zl3073x). - Add generic DPLL type, for daisy-chaining DPLLs (implement in ice). - Ethernet high-speed NICs: - Huawei (hinic3): - enhance tc flow offload support with queue selection, tunnels - nVidia/Mellanox: - avoid over-copying payload to the skb's linear part (up to 60% win for LRO on slow CPUs like ARM64 V2) - expose more per-queue stats over the standard API - support additional, unprivileged PFs in the DPU configuration - support Socket Direct (multi-PF) with switchdev offloads - add a pool / frag allocator for DMA mapped buffers for control objects, save memory on systems with 64kB page size - take advantage of the ability to dynamically change RSS table size, even when table is configured by the user - increase the max RSS table size for even traffic distribution - Ethernet NICs: - Marvell/Aquantia: - AQC113 PTP support - Realtek USB (r8152): - support 10Gbit Link Speeds and Energy-Efficient Ethernet (EEE) - support firmware loaded (for RTL8157/RTL8159) - support for the RTL8159 - Intel (ixgbe): - support Energy-Efficient Ethernet (EEE) on E610 devices - Ethernet switches: - Airoha: - support multiple netdevs on a single GDM block / port - Marvell (mv88e6xxx): - support SERDES of mv88e6321 - Microchip (ksz8/9): - rework the driver callbacks to remove one indirection layer - Motorcomm (yt921x): - support port rate policing - support TBF qdisc offload - support ACL/flower offload - nVidia/Mellanox: - expose per-PG rx_discards - Realtek: - rtl8365mb: bridge offloading and VLAN support - Ethernet PHYs: - Airoha: - support Airoha AN8801R Gigabit PHYs. - Micrel: - implement 3 low-loss cable tunables - Realtek: - support MDI swapping for RTL8226-CG - support MDIO for RTL931x - Qualcomm: - at803x: Rx and Tx clock management for IPQ5018 PHY - Motorcomm: - support YT8522 100M RMII PHY - set drive strength in YT8531s RGMII - TI: - dp83822: add optional external PHY clock - Bluetooth: - hci_sync: add support for HCI_LE_Set_Host_Feature [v2] - SMP: use AES-CMAC library API - Intel: - support Product level reset - support smart trigger dump - Mediatek: - add event filter to filter specific event - Realtek: - fix RTL8761B/BU broken LE extended scan - WiFi: - Broadcom (b43): - new support for a 11n device - MediaTek (mt76): - support mt7927 - mt792x: broken usb transport detection - mt7921: regulatory improvements - Qualcomm (ath9k): - GPIO interface improvements - Qualcomm (ath12k): - WDS support - replace dynamic memory allocation in WMI Rx path - thermal throttling/cooling device support - 6 GHz incumbent interference detection - channel 177 in 5 GHz - Realtek (rt89): - RTL8922AU support - USB 3 mode switch for performance - better monitor radiotap support - RTL8922DE preparations" * tag 'net-next-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1778 commits) ipv4: fib_rule: Move fib4_rules_exit() to ->exit(). net: serialize netif_running() check in enqueue_to_backlog() net: skmsg: preserve sg.copy across SG transforms appletalk: move the protocol out of tree appletalk: stop storing per-interface state in struct net_device selftests/bpf: test that TLS crypto is rejected on a sockmap socket selftests/bpf: drop the unused kTLS program from test_sockmap selftests/bpf: remove sockmap + ktls tests tls: remove dead sockmap (psock) handling from the SW path tls: reject the combination of TLS and sockmap atm: remove orphaned uAPI for deleted drivers, protocols and SVCs atm: remove unused ATM PHY operations atm: remove the unused pre_send and send_bh device operations atm: remove the unused change_qos device operation atm: remove SVC socket support and the signaling daemon interface atm: remove the local ATM (NSAP) address registry atm: remove dead SONET PHY ioctls atm: remove the unused send_oam / push_oam callbacks atm: remove AAL3/4 transport support net: dsa: sja1105: fix lastused timestamp in flower stats ...	2026-06-17 08:17:00 +01:00
Dragos Tatulea	5c4adb7fb4	netdev: expose io_uring rx_page_order order via netlink This adds observability for the io_uring zcrx rx-buf-len configuration. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Yael Chemla <ychemla@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Acked-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://patch.msgid.link/20260612211709.1456966-3-dtatulea@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-06-15 12:58:18 -07:00
Jiri Olsa	26330a9226	bpf: Add support to specify uprobe_multi target via file descriptor Allow uprobe_multi link to identify the target binary by an already opened file descriptor. Adding new BPF_F_UPROBE_MULTI_PATH_FD flag and the path_fd field for the attr.link_create.uprobe_multi struct. When the flag is set, we resolve the target from path_fd, without the flag, we keep the existing string path behavior. I don't see a use case for supporting O_PATH file descriptors, because we need to read the binary first to get probes offsets, so I'm using the CLASS(fd, f), which fails for O_PATH fds. Assisted-by: Codex:GPT-5.4 Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20260611114230.950379-4-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-06-14 17:24:25 -07:00
Linus Torvalds	50b900c564	Merge tag 'vfs-7.2-rc1.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull openat2 updates from Christian Brauner: "Features: - Add O_EMPTYPATH to openat(2)/openat2(2). To get an operable file descriptor from an O_PATH file descriptor it is possible to use openat(fd, ".", O_DIRECTORY) for directories, but other file types require going through open("/proc/<pid>/fd/<nr>") and thus depend on a functioning procfs. With O_EMPTYPATH an empty path string is accepted and LOOKUP_EMPTY is set at path resolution time, allowing to reopen the file behind the file descriptor directly. Selftests are included. - Add an OPENAT2_REGULAR flag for openat2(2) which refuses to open anything but regular files with the new EFTYPE error code. This implements the "ability to only open regular files" feature requested by userspace via uapi-group.org and protects services from being redirected to fifos, device nodes, and friends. All atomic_open implementations were audited for OPENAT2_REGULAR handling. Explicit checks were added to ceph, gfs2, nfs (v4), and cifs/smb - these are the filesystems whose atomic_open can encounter an existing non-regular file and would otherwise call finish_open() on it or return a misleading error code. The remaining implementations (9p, fuse, vboxsf, nfs v2/v3) only call finish_open() on freshly created files and use finish_no_open() for lookup hits, letting the VFS catch non-regular files via the do_open() safety net. Cleanups: - Migrate the openat2 selftests to the kselftest harness and move them under selftests/filesystems/. The tests were written in the early days of selftests' TAP support and the modern kselftest harness is much easier to follow and maintain. The contents of the tests are unchanged and the new emptypath tests are ported on top. - Make the LAST_XXX last-type constants private to fs/namei.c. The only user outside of fs/namei.c was ksmbd which only needs to know whether the last component is a regular one, so vfs_path_parent_lookup() now performs the LAST_NORM check internally. The ints are replaced with a dedicated enum last_type" * tag 'vfs-7.2-rc1.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: vfs: replace ints with enum last_type for LAST_XXX vfs: make LAST_XXX private to fs/namei.c selftests: openat2: port emptypath_test to kselftest harness kselftest/openat2: test for OPENAT2_REGULAR flag openat2: new OPENAT2_REGULAR flag support openat2: introduce EFTYPE error code selftest: add tests for O_EMPTYPATH vfs: add O_EMPTYPATH to openat(2)/openat2(2) selftests: openat2: migrate to kselftest harness selftests: openat2: switch from custom ARRAY_LEN to ARRAY_SIZE selftests: openat2: move helpers to header selftests: move openat2 tests to selftests/filesystems/	2026-06-15 03:11:05 +05:30
Jakub Kicinski	dad4d4b92a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-7.1-rc8). Conflicts: drivers/net/ethernet/wangxun/txgbe/txgbe_aml.c `f67aead16e` ("net: txgbe: rework service event handling") `57d39faed4` ("net: txgbe: improve functions of AML 40G devices") net/rds/info.c `512db8267b` ("rds: mark snapshot pages dirty in rds_info_getsockopt()") `6e94eeb2a2` ("rds: convert to getsockopt_iter") Adjacent changes: include/net/sock.h `1ee90b77b7` ("net: guard timestamp cmsgs to real error queue skbs") `f0de88303d` ("net: make is_skb_wmem() available to modules") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-06-11 14:33:35 -07:00
Louis Scalbert	32b0b89533	bonding: 3ad: add lacp_strict configuration knob When an 802.3ad (LACP) bonding interface has no slaves in the collecting/distributing state, the bonding master still reports carrier as up as long as at least 'min_links' slaves have carrier. In this situation, only one slave is effectively used for TX/RX, while traffic received on other slaves is dropped. Upper-layer daemons therefore consider the interface operational, even though traffic may be blackholed if the lack of LACP negotiation means the partner is not ready to deal with traffic. Introduce a configuration knob to control this behavior. It allows the bonding master to assert carrier only when at least 'min_links' slaves are in Collecting_Distributing state. The default mode preserves the existing behavior. This patch only introduces the knob; its behavior is implemented in the subsequent commit. Fixes: `655f8919d5` ("bonding: add min links parameter to 802.3ad") Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com> Acked-by: Jay Vosburgh <jv@jvosburgh.net> Link: https://patch.msgid.link/20260603150331.1919611-4-louis.scalbert@6wind.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-06-10 07:53:26 -07:00
Louis Scalbert	0134432215	tools: missed broadcast_neigh if_link uapi header Add missing IFLA_BOND_BROADCAST_NEIGH in if_link uapi header. Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com> Acked-by: Jay Vosburgh <jv@jvosburgh.net> Link: https://patch.msgid.link/20260603150331.1919611-2-louis.scalbert@6wind.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-06-10 07:53:08 -07:00
Jiri Olsa	ba042ed644	bpf: Add support for tracing_multi link session Adding support to use session attachment with tracing_multi link. Adding new BPF_TRACE_FSESSION_MULTI program attach type, that follows the BPF_TRACE_FSESSION behaviour but on the tracing_multi link. Such program is called on entry and exit of the attached function and allows to pass cookie value from entry to exit execution. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20260606123955.345967-16-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-06-07 10:03:01 -07:00
Jiri Olsa	46b42af27d	bpf: Add support for tracing_multi link cookies Add support to specify cookies for tracing_multi link. Cookies are provided in array where each value is paired with provided BTF ID value with the same array index. Such cookie can be retrieved by bpf program with bpf_get_attach_cookie helper call. We need to sort cookies array together with ids array in check_dup_ids, to keep the id->cookie relation. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20260606123955.345967-15-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-06-07 10:03:01 -07:00
Jiri Olsa	c1d32dea5d	bpf: Add support for tracing multi link Adding new link to allow to attach program to multiple function BTF IDs. The link is represented by struct bpf_tracing_multi_link. To configure the link, new fields are added to bpf_attr::link_create to pass array of BTF IDs; struct { __aligned_u64 ids; __u32 cnt; } tracing_multi; Each BTF ID represents function (BTF_KIND_FUNC) that the link will attach bpf program to. We use previously added bpf_trampoline_multi_attach/detach functions to attach/detach the link. The linkinfo/fdinfo callbacks will be implemented in following changes. Note this is supported only for archs (x86_64) with ftrace direct and have single ops support. CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS && CONFIG_HAVE_SINGLE_FTRACE_DIRECT_OPS Note using sort_r (instead of plain sort) in check_dup_ids, because we will use the swap callback in following changes. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20260606123955.345967-14-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-06-07 10:03:01 -07:00
Jiri Olsa	d14e6b4346	bpf: Add multi tracing attach types Adding new program attach types multi tracing attachment: BPF_TRACE_FENTRY_MULTI BPF_TRACE_FEXIT_MULTI and their base support in verifier code. Programs with such attach type will use specific link attachment interface coming in following changes. This was suggested by Andrii some (long) time ago and turned out to be easier than having special program flag for that. Bpf programs with such types have 'bpf_multi_func' function set as their attach_btf_id and keep module reference when it's specified by attach_prog_fd. They are also accepted as sleepable programs during verification, and the real validation for specific BTF_IDs/functions will happen during the multi link attachment in following changes. Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20260606123955.345967-11-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-06-07 10:03:01 -07:00
Leon Hwang	786be2b059	bpf: Check tail zero of bpf_prog_info Since there're 4 bytes padding at the end of struct bpf_prog_info, they won't be checked by bpf_check_uarg_tail_zero(). pahole -C bpf_prog_info ./vmlinux struct bpf_prog_info { ... __u32 attach_btf_obj_id; /* 220 4 / __u32 attach_btf_id; / 224 4 / / size: 232, cachelines: 4, members: 38 / / sum members: 224 / / sum bitfield members: 1 bits, bit holes: 1, sum bit holes: 31 bits / / padding: 4 / / forced alignments: 9 / / last cacheline: 40 bytes */ } __attribute__((__aligned__(8))); If a future kernel extension adds a new 4-byte field, older userspace programs allocating this structure on the stack might inadvertently pass uninitialized stack garbage into the new field, permanently breaking backward compatibility. -- sashiko [1] Fix it by changing sizeof(info) to offsetofend(struct bpf_prog_info, attach_btf_id). And, add "__u32 :32" to the tail of struct bpf_prog_info. [1] https://lore.kernel.org/bpf/20260513224823.6494FC19425@smtp.kernel.org/ Fixes: `aba64c7da9` ("bpf: Add verified_insns to bpf_prog_info and fdinfo") Acked-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20260605155249.20772-3-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-06-05 15:21:24 -07:00
Leon Hwang	e2a49fdb1b	bpf: Check tail zero of bpf_map_info Since there're 4 bytes padding at the end of struct bpf_map_info, they won't be checked by bpf_check_uarg_tail_zero(). pahole -C bpf_map_info ./vmlinux struct bpf_map_info { ... __u64 hash __attribute__((__aligned__(8))); /* 88 8 / __u32 hash_size; / 96 4 / / size: 104, cachelines: 2, members: 18 / / padding: 4 / / forced alignments: 1 / / last cacheline: 40 bytes */ } __attribute__((__aligned__(8))); If a future kernel extension adds a new 4-byte field, older userspace programs allocating this structure on the stack might inadvertently pass uninitialized stack garbage into the new field, permanently breaking backward compatibility. -- sashiko [1] Fix it by changing sizeof(info) to offsetofend(struct bpf_map_info, hash_size). And, add "__u32 :32" to the tail of struct bpf_map_info. [1] https://lore.kernel.org/bpf/20260513224823.6494FC19425@smtp.kernel.org/ Fixes: `ea2e6467ac` ("bpf: Return hashes of maps in BPF_OBJ_GET_INFO_BY_FD") Acked-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20260605155249.20772-2-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-06-05 15:21:24 -07:00
Mykyta Yatsenko	16b4d3e2fb	bpf: Implement resizable hashmap basic functions Use rhashtable_lookup_likely() for lookups, rhashtable_remove_fast() for deletes, and rhashtable_lookup_get_insert_fast() for inserts. Updates modify values in place under RCU rather than allocating a new element and swapping the pointer (as regular htab does). This trades read consistency for performance: concurrent readers may see partial updates. BPF_F_LOCK support and special-field handling (timers, kptrs, etc.) follow in a later commit. Initialize rhashtable with bpf_mem_alloc element cache. Require BPF_F_NO_PREALLOC. Limit max_entries to 2^31. Free elements via rhashtable_free_and_destroy(). Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Link: https://lore.kernel.org/r/20260605-rhash-v7-4-5b8e05f8630d@meta.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-06-05 08:00:08 -07:00
Wang Yaxin	5921629dd6	tools headers UAPI: sync linux/taskstats.h for procacct.c After commit `9b93f7e327` ("tools/getdelays: use the static UAPI headers from tools/include/uapi"), the Makefile was changed to use -I../include/uapi/ instead of -I../../usr/include to ensure tools always use the up-to-date UAPI headers. However, only linux/taskstats.h was added to tools/include/uapi/ in commit `e5bbb35a07` ("tools headers UAPI: sync linux/taskstats.h"), but linux/acct.h was missing. This causes procacct.c to fail to compile with: procacct.c:234:37: error: 'AGROUP' undeclared (first use in this function) gcc -I../include/uapi/ getdelays.c -o getdelays gcc -I../include/uapi/ procacct.c -o procacct procacct.c: In function `print_procacct': procacct.c:234:37: error: `AGROUP' undeclared (first use in this function) did you mean `NOGROUP'? 234 \| , t->version >= 12 ? (t->ac_flag & AGROUP ? 'P' : 'T') : '?' \| ^~~~~~ \| NOGROUP procacct.c:234:37: note: each undeclared ident because procacct.c uses the AGROUP macro defined in linux/acct.h. Add the missing linux/acct.h to complete the static UAPI header set. Link: https://lore.kernel.org/20260527213558929EhiHHy9EDTMjmg3uuDOMi@zte.com.cn Fixes: `9b93f7e327` ("tools/getdelays: use the static UAPI headers from tools/include/uapi") Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn> Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Cc: Fan Yu <fan.yu9@zte.com.cn> Cc: Jonathan Corbet <corbet@lwn.net> Cc: xu xin <xu.xin16@zte.com.cn> Cc: Yang Yang <yang.yang29@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-06-03 16:25:50 -07:00
Jori Koolstra	c0329020da	selftest: add tests for O_EMPTYPATH Add tests for the new O_EMPTYPATH flag of openat(2)/openat2(2). Also, the current openat2 tests include a helper header file that defines the necessary structs and constants to use openat2(2), such as struct open_how. This may result in conflicting definitions when the system header openat2.h is present as well. So add openat2.h generated by 'make headers' to the uapi header files in ./tools/include and remove the helper file definitions of the current openat2 selftests. Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl> Link: https://patch.msgid.link/20260424114611.1678641-3-jkoolstra@xs4all.nl Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>	2026-05-21 10:53:37 +02:00
Leon Hwang	f28771c069	bpf: Extend BPF syscall with common attributes support Add generic BPF syscall support for passing common attributes. The initial set of common attributes includes: 1. 'log_buf': User-provided buffer for storing logs. 2. 'log_size': Size of the log buffer. 3. 'log_level': Log verbosity level. 4. 'log_true_size': Actual log size reported by kernel. The common-attribute pointer and its size are passed as the 4th and 5th syscall arguments. A new command bit, 'BPF_COMMON_ATTRS' ('1 << 16'), indicates that common attributes are supplied. This commit adds syscall and uapi plumbing. Command-specific handling is added in follow-up patches. Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20260512153157.28382-2-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-05-12 12:44:38 -07:00
Alexei Starovoitov	7e033543a2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf 7.1-rc3 Cross-merge BPF and other fixes after downstream PR. Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-05-10 13:24:49 -07:00
Paul Chaignon	0c7ae13069	tools/headers: Regenerate stddef.h to fix BPF selftests With commit `dacbfc1678` ("crypto: af_alg - Annotate struct af_alg_iv with __counted_by"), two selftests, test_tag and crypto_sanity, now indirectly rely on the __counted_by macro. On systems with commit `dacbfc1678` in the installed UAPI headers, the selftests build fails with: In file included from tools/testing/selftests/bpf/prog_tests/crypto_sanity.c:7: /usr/include/linux/if_alg.h:45:22: error: expected ‘:’, ‘,’, ‘;’, ‘}’ or ‘__attribute__’ before ‘__counted_by’ 45 \| __u8 iv[] __counted_by(ivlen); \| ^~~~~~~~~~~~ This patch fixes it by regenerating stddef.h in tools/include using the instructions from commit `a778f5d46b` ("tools/headers: Pull in stddef.h to uapi to fix BPF selftests build in CI"). Fixes: `dacbfc1678` ("crypto: af_alg - Annotate struct af_alg_iv with __counted_by") Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Tested-by: Ihor Solodrai <ihor.solodrai@linux.dev> Link: https://lore.kernel.org/r/8da8ef16055aa452d940668ed5359ce54adc6b0b.1777715500.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-05-02 09:37:44 -07:00
Alan Maguire	f7a6b9eaff	bpf: Extend BTF UAPI vlen, kinds to use unused bits BTF maximum vlen is encoded using 16 bits with a maximum vlen of 65535. This has sufficed for structs, function parameters and enumerated type values. However, with upcoming BTF location information - in particular information about inline sites - this limit is surpassed. Use bits 16-23 - currently unused in BTF info - to extend to 24 bits, giving a max vlen of (2^24 - 1), or 16 million. Also extend BTF kind encoding from 5 to 7 bits, giving a maximum available number of kinds of 128. Since with the BTF location work we use another 3 kinds, we are fast approaching the current limit of 32. Convert BTF_MAX_* values to enums to allow them to be encoded in kernel BTF; this will allow us to detect if the running kernel supports a 24-bit vlen or not. Add one for max _possible_ (not used) kind. Fix up a few places in the kernel where a 16-bit vlen is assumed; remove BTF_INFO_MASK as now all bits are used. The vlen expansion was suggested by Andrii in [1]; the kind expansion is tackled here too as it may be needed also to support new kinds in BTF. [1] https://lore.kernel.org/bpf/CAEf4BzZx=X6vGqcA8SPU6D+v6k+TR=ZewebXMuXtpmML058piw@mail.gmail.com/ Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Acked-by: Mykyta Yatsenko <yatsenko@meta.com> Link: https://lore.kernel.org/r/20260417143023.1551481-2-alan.maguire@oracle.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-20 17:52:48 -07:00
Linus Torvalds	df8f6181ab	Merge tag 'perf-tools-for-v7.1-2026-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools updates from Namhyung Kim: "perf report: - Add 'comm_nodigit' sort key to combine similar threads that only have different numbers in the comm. In the following example, the 'comm_nodigit' will have samples from all threads starting with "bpfrb/" into an entry "bpfrb/<N>". $ perf report -s comm_nodigit,comm -H ... # # Overhead CommandNoDigit / Command # ........... ........................ # 20.30% swapper 20.30% swapper 13.37% chrome 13.37% chrome 10.07% bpfrb/<N> 7.47% bpfrb/0 0.70% bpfrb/1 0.47% bpfrb/3 0.46% bpfrb/2 0.25% bpfrb/4 0.23% bpfrb/5 0.20% bpfrb/6 0.14% bpfrb/10 0.07% bpfrb/7 - Support flat layout for symfs. The --symfs option is to specify the location of debugging symbol files. The default 'hierarchy' layout would search the symbol file using the same path of the original file under the symfs root. The new 'flat' layout would search only in the root directory. - Update 'simd' sort key for ARM SIMD flags to cover ASE/SME and more predicate flags. perf stat: - Add --pmu-filter option to select specific PMUs. This would be useful when you measure metrics from multiple instance of uncore PMUs with similar names. # perf stat -M cpa_p0_avg_bw Performance counter stats for 'system wide': 19,417,779,115 hisi_sicl0_cpa0/cpa_cycles/ # 0.00 cpa_p0_avg_bw 0 hisi_sicl0_cpa0/cpa_p0_wr_dat/ 0 hisi_sicl0_cpa0/cpa_p0_rd_dat_64b/ 0 hisi_sicl0_cpa0/cpa_p0_rd_dat_32b/ 19,417,751,103 hisi_sicl10_cpa0/cpa_cycles/ # 0.00 cpa_p0_avg_bw 0 hisi_sicl10_cpa0/cpa_p0_wr_dat/ 0 hisi_sicl10_cpa0/cpa_p0_rd_dat_64b/ 0 hisi_sicl10_cpa0/cpa_p0_rd_dat_32b/ 19,417,730,679 hisi_sicl2_cpa0/cpa_cycles/ # 0.31 cpa_p0_avg_bw 75,635,749 hisi_sicl2_cpa0/cpa_p0_wr_dat/ 18,520,640 hisi_sicl2_cpa0/cpa_p0_rd_dat_64b/ 0 hisi_sicl2_cpa0/cpa_p0_rd_dat_32b/ 19,417,674,227 hisi_sicl8_cpa0/cpa_cycles/ # 0.00 cpa_p0_avg_bw 0 hisi_sicl8_cpa0/cpa_p0_wr_dat/ 0 hisi_sicl8_cpa0/cpa_p0_rd_dat_64b/ 0 hisi_sicl8_cpa0/cpa_p0_rd_dat_32b/ 19.417734480 seconds time elapsed With --pmu-filter, users can select only hisi_sicl2_cpa0 PMU. # perf stat --pmu-filter hisi_sicl2_cpa0 -M cpa_p0_avg_bw Performance counter stats for 'system wide': 6,234,093,559 cpa_cycles # 0.60 cpa_p0_avg_bw 50,548,465 cpa_p0_wr_dat 7,552,182 cpa_p0_rd_dat_64b 0 cpa_p0_rd_dat_32b 6.234139320 seconds time elapsed Data type profiling: - Quality improvements by tracking register state more precisely - Ensure array members to get the type - Handle more cases for global variables Vendor event/metric updates: - Update various Intel events and metrics - Add NVIDIA Tegra 410 Olympus events Internal changes: - Verify perf.data header for maliciously crafted files - Update perf test to cover more usages and make them robust - Move a couple of copied kernel headers not to annoy objtool build - Fix a bug in map sorting in name order - Remove some unused codes Misc: - Fix module symbol resolution with non-zero text address - Add -t/--threads option to `perf bench mem mmap` - Track duration of exit() syscall by `perf trace -s` - Add core.addr2line-timeout and core.addr2line-disable-warn config items" tag 'perf-tools-for-v7.1-2026-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (131 commits) perf loongarch: Fix build failure with CONFIG_LIBDW_DWARF_UNWIND perf annotate: Use jump__delete when freeing LoongArch jumps perf test: Fixes for check branch stack sampling perf test: Fix inet_pton probe failure and unroll call graph perf build: fix "argument list too long" in second location perf header: Add sanity checks to HEADER_BPF_BTF processing perf header: Sanity check HEADER_BPF_PROG_INFO perf header: Sanity check HEADER_PMU_CAPS perf header: Sanity check HEADER_HYBRID_TOPOLOGY perf header: Sanity check HEADER_CACHE perf header: Sanity check HEADER_GROUP_DESC perf header: Sanity check HEADER_PMU_MAPPINGS perf header: Sanity check HEADER_MEM_TOPOLOGY perf header: Sanity check HEADER_NUMA_TOPOLOGY perf header: Sanity check HEADER_CPU_TOPOLOGY perf header: Sanity check HEADER_NRCPUS and HEADER_CPU_DOMAIN_INFO perf header: Bump up the max number of command line args allowed perf header: Validate nr_domains when reading HEADER_CPU_DOMAIN_INFO perf sample: Fix documentation typo perf arm_spe: Improve SIMD flags setting ...	2026-04-18 09:24:56 -07:00
Linus Torvalds	01f492e181	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "Arm: - Add support for tracing in the standalone EL2 hypervisor code, which should help both debugging and performance analysis. This uses the new infrastructure for 'remote' trace buffers that can be exposed by non-kernel entities such as firmware, and which came through the tracing tree - Add support for GICv5 Per Processor Interrupts (PPIs), as the starting point for supporting the new GIC architecture in KVM - Finally add support for pKVM protected guests, where pages are unmapped from the host as they are faulted into the guest and can be shared back from the guest using pKVM hypercalls. Protected guests are created using a new machine type identifier. As the elusive guestmem has not yet delivered on its promises, anonymous memory is also supported This is only a first step towards full isolation from the host; for example, the CPU register state and DMA accesses are not yet isolated. Because this does not really yet bring fully what it promises, it is hidden behind CONFIG_ARM_PKVM_GUEST + 'kvm-arm.mode=protected', and also triggers TAINT_USER when a VM is created. Caveat emptor - Rework the dreaded user_mem_abort() function to make it more maintainable, reducing the amount of state being exposed to the various helpers and rendering a substantial amount of state immutable - Expand the Stage-2 page table dumper to support NV shadow page tables on a per-VM basis - Tidy up the pKVM PSCI proxy code to be slightly less hard to follow - Fix both SPE and TRBE in non-VHE configurations so that they do not generate spurious, out of context table walks that ultimately lead to very bad HW lockups - A small set of patches fixing the Stage-2 MMU freeing in error cases - Tighten-up accepted SMC immediate value to be only #0 for host SMCCC calls - The usual cleanups and other selftest churn LoongArch: - Use CSR_CRMD_PLV for kvm_arch_vcpu_in_kernel() - Add DMSINTC irqchip in kernel support RISC-V: - Fix steal time shared memory alignment checks - Fix vector context allocation leak - Fix array out-of-bounds in pmu_ctr_read() and pmu_fw_ctr_read_hi() - Fix double-free of sdata in kvm_pmu_clear_snapshot_area() - Fix integer overflow in kvm_pmu_validate_counter_mask() - Fix shift-out-of-bounds in make_xfence_request() - Fix lost write protection on huge pages during dirty logging - Split huge pages during fault handling for dirty logging - Skip CSR restore if VCPU is reloaded on the same core - Implement kvm_arch_has_default_irqchip() for KVM selftests - Factored-out ISA checks into separate sources - Added hideleg to struct kvm_vcpu_config - Factored-out VCPU config into separate sources - Support configuration of per-VM HGATP mode from KVM user space s390: - Support for ESA (31-bit) guests inside nested hypervisors - Remove restriction on memslot alignment, which is not needed anymore with the new gmap code - Fix LPSW/E to update the bear (which of course is the breaking event address register) x86: - Shut up various UBSAN warnings on reading module parameter before they were initialized - Don't zero-allocate page tables that are used for splitting hugepages in the TDP MMU, as KVM is guaranteed to set all SPTEs in the page table and thus write all bytes - As an optimization, bail early when trying to unsync 4KiB mappings if the target gfn can just be mapped with a 2MiB hugepage x86 generic: - Copy single-chunk MMIO write values into struct kvm_vcpu (more precisely struct kvm_mmio_fragment) to fix use-after-free stack bugs where KVM would dereference stack pointer after an exit to userspace - Clean up and comment the emulated MMIO code to try to make it easier to maintain (not necessarily "easy", but "easier") - Move VMXON+VMXOFF and EFER.SVME toggling out of KVM (not all of VMX and SVM enabling) as it is needed for trusted I/O - Advertise support for AVX512 Bit Matrix Multiply (BMM) instructions - Immediately fail the build if a required #define is missing in one of KVM's headers that is included multiple times - Reject SET_GUEST_DEBUG with -EBUSY if there's an already injected exception, mostly to prevent syzkaller from abusing the uAPI to trigger WARNs, but also because it can help prevent userspace from unintentionally crashing the VM - Exempt SMM from CPUID faulting on Intel, as per the spec - Misc hardening and cleanup changes x86 (AMD): - Fix and optimize IRQ window inhibit handling for AVIC; make it per-vCPU so that KVM doesn't prematurely re-enable AVIC if multiple vCPUs have to-be-injected IRQs - Clean up and optimize the OSVW handling, avoiding a bug in which KVM would overwrite state when enabling virtualization on multiple CPUs in parallel. This should not be a problem because OSVW should usually be the same for all CPUs - Drop a WARN in KVM_MEMORY_ENCRYPT_REG_REGION where KVM complains about a "too large" size based purely on user input - Clean up and harden the pinning code for KVM_MEMORY_ENCRYPT_REG_REGION - Disallow synchronizing a VMSA of an already-launched/encrypted vCPU, as doing so for an SNP guest will crash the host due to an RMP violation page fault - Overhaul KVM's APIs for detecting SEV+ guests so that VM-scoped queries are required to hold kvm->lock, and enforce it by lockdep. Fix various bugs where sev_guest() was not ensured to be stable for the whole duration of a function or ioctl - Convert a pile of kvm->lock SEV code to guard() - Play nicer with userspace that does not enable KVM_CAP_EXCEPTION_PAYLOAD, for which KVM needs to set CR2 and DR6 as a response to ioctls such as KVM_GET_VCPU_EVENTS (even if the payload would end up in EXITINFO2 rather than CR2, for example). Only set CR2 and DR6 when consumption of the payload is imminent, but on the other hand force delivery of the payload in all paths where userspace retrieves CR2 or DR6 - Use vcpu->arch.cr2 when updating vmcb12's CR2 on nested #VMEXIT instead of vmcb02->save.cr2. The value is out of sync after a save/restore or after a #PF is injected into L2 - Fix a class of nSVM bugs where some fields written by the CPU are not synchronized from vmcb02 to cached vmcb12 after VMRUN, and so are not up-to-date when saved by KVM_GET_NESTED_STATE - Fix a class of bugs where the ordering between KVM_SET_NESTED_STATE and KVM_SET_{S}REGS could cause vmcb02 to be incorrectly initialized after save+restore - Add a variety of missing nSVM consistency checks - Fix several bugs where KVM failed to correctly update VMCB fields on nested #VMEXIT - Fix several bugs where KVM failed to correctly synthesize #UD or #GP for SVM-related instructions - Add support for save+restore of virtualized LBRs (on SVM) - Refactor various helpers and macros to improve clarity and (hopefully) make the code easier to maintain - Aggressively sanitize fields when copying from vmcb12, to guard against unintentionally allowing L1 to utilize yet-to-be-defined features - Fix several bugs where KVM botched rAX legality checks when emulating SVM instructions. There are remaining issues in that KVM doesn't handle size prefix overrides for 64-bit guests - Fail emulation of VMRUN/VMLOAD/VMSAVE if mapping vmcb12 fails instead of somewhat arbitrarily synthesizing #GP (i.e. don't double down on AMD's architectural but sketchy behavior of generating #GP for "unsupported" addresses) - Cache all used vmcb12 fields to further harden against TOCTOU bugs x86 (Intel): - Drop obsolete branch hint prefixes from the VMX instruction macros - Use ASM_INPUT_RM() in __vmcs_writel() to coerce clang into using a register input when appropriate - Code cleanups guest_memfd: - Don't mark guest_memfd folios as accessed, as guest_memfd doesn't support reclaim, the memory is unevictable, and there is no storage to write back to LoongArch selftests: - Add KVM PMU test cases s390 selftests: - Enable more memory selftests x86 selftests: - Add support for Hygon CPUs in KVM selftests - Fix a bug in the MSR test where it would get false failures on AMD/Hygon CPUs with exactly one of RDPID or RDTSCP - Add an MADV_COLLAPSE testcase for guest_memfd as a regression test for a bug where the kernel would attempt to collapse guest_memfd folios against KVM's will" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (373 commits) KVM: x86: use inlines instead of macros for is_sev_*guest x86/virt: Treat SVM as unsupported when running as an SEV+ guest KVM: SEV: Goto an existing error label if charging misc_cg for an ASID fails KVM: SVM: Move lock-protected allocation of SEV ASID into a separate helper KVM: SEV: use mutex guard in snp_handle_guest_req() KVM: SEV: use mutex guard in sev_mem_enc_unregister_region() KVM: SEV: use mutex guard in sev_mem_enc_ioctl() KVM: SEV: use mutex guard in snp_launch_update() KVM: SEV: Assert that kvm->lock is held when querying SEV+ support KVM: SEV: Document that checking for SEV+ guests when reclaiming memory is "safe" KVM: SEV: Hide "struct kvm_sev_info" behind CONFIG_KVM_AMD_SEV=y KVM: SEV: WARN on unhandled VM type when initializing VM KVM: LoongArch: selftests: Add PMU overflow interrupt test KVM: LoongArch: selftests: Add basic PMU event counting test KVM: LoongArch: selftests: Add cpucfg read/write helpers LoongArch: KVM: Add DMSINTC inject msi to vCPU LoongArch: KVM: Add DMSINTC device support LoongArch: KVM: Make vcpu_is_preempted() as a macro rather than function LoongArch: KVM: Move host CSR_GSTAT save and restore in context switch LoongArch: KVM: Move host CSR_EENTRY save and restore in context switch ...	2026-04-17 07:18:03 -07:00
Linus Torvalds	440d6635b2	Merge tag 'mm-nonmm-stable-2026-04-15-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - "pid: make sub-init creation retryable" (Oleg Nesterov) Make creation of init in a new namespace more robust by clearing away some historical cruft which is no longer needed. Also some documentation fixups - "selftests/fchmodat2: Error handling and general" (Mark Brown) Fix and a cleanup for the fchmodat2() syscall selftest - "lib: polynomial: Move to math/ and clean up" (Andy Shevchenko) - "hung_task: Provide runtime reset interface for hung task detector" (Aaron Tomlin) Give administrators the ability to zero out /proc/sys/kernel/hung_task_detect_count - "tools/getdelays: use the static UAPI headers from tools/include/uapi" (Thomas Weißschuh) Teach getdelays to use the in-kernel UAPI headers rather than the system-provided ones - "watchdog/hardlockup: Improvements to hardlockup" (Mayank Rungta) Several cleanups and fixups to the hardlockup detector code and its documentation - "lib/bch: fix undefined behavior from signed left-shifts" (Josh Law) A couple of small/theoretical fixes in the bch code - "ocfs2/dlm: fix two bugs in dlm_match_regions()" (Junrui Luo) - "cleanup the RAID5 XOR library" (Christoph Hellwig) A quite far-reaching cleanup to this code. I can't do better than to quote Christoph: "The XOR library used for the RAID5 parity is a bit of a mess right now. The main file sits in crypto/ despite not being cryptography and not using the crypto API, with the generic implementations sitting in include/asm-generic and the arch implementations sitting in an asm/ header in theory. The latter doesn't work for many cases, so architectures often build the code directly into the core kernel, or create another module for the architecture code. Change this to a single module in lib/ that also contains the architecture optimizations, similar to the library work Eric Biggers has done for the CRC and crypto libraries later. After that it changes to better calling conventions that allow for smarter architecture implementations (although none is contained here yet), and uses static_call to avoid indirection function call overhead" - "lib/list_sort: Clean up list_sort() scheduling workarounds" (Kuan-Wei Chiu) Clean up this library code by removing a hacky thing which was added for UBIFS, which UBIFS doesn't actually need - "Fix bugs in extract_iter_to_sg()" (Christian Ehrhardt) Fix a few bugs in the scatterlist code, add in-kernel tests for the now-fixed bugs and fix a leak in the test itself - "kdump: Enable LUKS-encrypted dump target support in ARM64 and PowerPC" (Coiby Xu) Enable support of the LUKS-encrypted device dump target on arm64 and powerpc - "ocfs2: consolidate extent list validation into block read callbacks" (Joseph Qi) Cleanup, simplify, and make more robust ocfs2's validation of extent list fields (Kernel test robot loves mounting corrupted fs images!) * tag 'mm-nonmm-stable-2026-04-15-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (127 commits) ocfs2: validate group add input before caching ocfs2: validate bg_bits during freefrag scan ocfs2: fix listxattr handling when the buffer is full doc: watchdog: fix typos etc update Sean's email address ocfs2: use get_random_u32() where appropriate ocfs2: split transactions in dio completion to avoid credit exhaustion ocfs2: remove redundant l_next_free_rec check in __ocfs2_find_path() ocfs2: validate extent block list fields during block read ocfs2: remove empty extent list check in ocfs2_dx_dir_lookup_rec() ocfs2: validate dx_root extent list fields during block read ocfs2: fix use-after-free in ocfs2_fault() when VM_FAULT_RETRY ocfs2: handle invalid dinode in ocfs2_group_extend .get_maintainer.ignore: add Askar ocfs2: validate bg_list extent bounds in discontig groups checkpatch: exclude forward declarations of const structs tools/accounting: handle truncated taskstats netlink messages taskstats: set version in TGID exit notifications ocfs2/heartbeat: fix slot mapping rollback leaks on error paths arm64,ppc64le/kdump: pass dm-crypt keys to kdump kernel ...	2026-04-16 20:11:56 -07:00
Linus Torvalds	7c8a4671dc	Merge tag 'vfs-7.1-rc1.mount.v2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs mount updates from Christian Brauner: - Add FSMOUNT_NAMESPACE flag to fsmount() that creates a new mount namespace with the newly created filesystem attached to a copy of the real rootfs. This returns a namespace file descriptor instead of an O_PATH mount fd, similar to how OPEN_TREE_NAMESPACE works for open_tree(). This allows creating a new filesystem and immediately placing it in a new mount namespace in a single operation, which is useful for container runtimes and other namespace-based isolation mechanisms. This accompanies OPEN_TREE_NAMESPACE and avoids a needless detour via OPEN_TREE_NAMESPACE to get the same effect. Will be especially useful when you mount an actual filesystem to be used as the container rootfs. - Currently, creating a new mount namespace always copies the entire mount tree from the caller's namespace. For containers and sandboxes that intend to build their mount table from scratch this is wasteful: they inherit a potentially large mount tree only to immediately tear it down. This series adds support for creating a mount namespace that contains only a clone of the root mount, with none of the child mounts. Two new flags are introduced: - CLONE_EMPTY_MNTNS (0x400000000) for clone3(), using the 64-bit flag space - UNSHARE_EMPTY_MNTNS (0x00100000) for unshare() Both flags imply CLONE_NEWNS. The resulting namespace contains a single nullfs root mount with an immutable empty directory. The intended workflow is to then mount a real filesystem (e.g., tmpfs) over the root and build the mount table from there. - Allow MOVE_MOUNT_BENEATH to target the caller's rootfs, allowing to switch out the rootfs without pivot_root(2). The traditional approach to switching the rootfs involves pivot_root(2) or a chroot_fs_refs()-based mechanism that atomically updates fs->root for all tasks sharing the same fs_struct. This has consequences for fork(), unshare(CLONE_FS), and setns(). This series instead decomposes root-switching into individually atomic, locally-scoped steps: fd_tree = open_tree(-EBADF, "/newroot", OPEN_TREE_CLONE \| OPEN_TREE_CLOEXEC); fchdir(fd_tree); move_mount(fd_tree, "", AT_FDCWD, "/", MOVE_MOUNT_BENEATH \| MOVE_MOUNT_F_EMPTY_PATH); chroot("."); umount2(".", MNT_DETACH); Since each step only modifies the caller's own state, the fork/unshare/setns races are eliminated by design. A key step to making this possible is to remove the locked mount restriction. Originally MOVE_MOUNT_BENEATH doesn't support mounting beneath a mount that is locked. The locked mount protects the underlying mount from being revealed. This is a core mechanism of unshare(CLONE_NEWUSER \| CLONE_NEWNS). The mounts in the new mount namespace become locked. That effectively makes the new mount table useless as the caller cannot ever get rid of any of the mounts no matter how useless they are. We can lift this restriction though. We simply transfer the locked property from the top mount to the mount beneath. This works because what we care about is to protect the underlying mount aka the parent. The mount mounted between the parent and the top mount takes over the job of protecting the parent mount from the top mount mount. This leaves us free to remove the locked property from the top mount which can consequently be unmounted: unshare(CLONE_NEWUSER \| CLONE_NEWNS) and we inherit a clone of procfs on /proc then currently we cannot unmount it as: umount -l /proc will fail with EINVAL because the procfs mount is locked. After this series we can now do: mount --beneath -t tmpfs tmpfs /proc umount -l /proc after which a tmpfs mount has been placed beneath the procfs mount. The tmpfs mount has become locked and the procfs mount has become unlocked. This means you can safely modify an inherited mount table after unprivileged namespace creation. Afterwards we simply make it possible to move a mount beneath the rootfs allowing to upgrade the rootfs. Removing the locked restriction makes this very useful for containers created with unshare(CLONE_NEWUSER \| CLONE_NEWNS) to reshuffle an inherited mount table safely and MOVE_MOUNT_BENEATH makes it possible to switch out the rootfs instead of using the costly pivot_root(2). * tag 'vfs-7.1-rc1.mount.v2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: selftests/namespaces: remove unused utils.h include from listns_efault_test selftests/fsmount_ns: add missing TARGETS and fix cap test selftests/empty_mntns: fix wrong CLONE_EMPTY_MNTNS hex value in comment selftests/empty_mntns: fix statmount_alloc() signature mismatch selftests/statmount: remove duplicate wait_for_pid() mount: always duplicate mount selftests/filesystems: add MOVE_MOUNT_BENEATH rootfs tests move_mount: allow MOVE_MOUNT_BENEATH on the rootfs move_mount: transfer MNT_LOCKED selftests/filesystems: add clone3 tests for empty mount namespaces selftests/filesystems: add tests for empty mount namespaces namespace: allow creating empty mount namespaces selftests: add FSMOUNT_NAMESPACE tests selftests/statmount: add statmount_alloc() helper tools: update mount.h header mount: add FSMOUNT_NAMESPACE mount: simplify __do_loopback() mount: start iterating from start of rbtree	2026-04-14 19:59:25 -07:00
Linus Torvalds	91a4855d6c	Merge tag 'net-next-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core & protocols: - Support HW queue leasing, allowing containers to be granted access to HW queues for zero-copy operations and AF_XDP - Number of code moves to help the compiler with inlining. Avoid output arguments for returning drop reason where possible - Rework drop handling within qdiscs to include more metadata about the reason and dropping qdisc in the tracepoints - Remove the rtnl_lock use from IP Multicast Routing - Pack size information into the Rx Flow Steering table pointer itself. This allows making the table itself a flat array of u32s, thus making the table allocation size a power of two - Report TCP delayed ack timer information via socket diag - Add ip_local_port_step_width sysctl to allow distributing the randomly selected ports more evenly throughout the allowed space - Add support for per-route tunsrc in IPv6 segment routing - Start work of switching sockopt handling to iov_iter - Improve dynamic recvbuf sizing in MPTCP, limit burstiness and avoid buffer size drifting up - Support MSG_EOR in MPTCP - Add stp_mode attribute to the bridge driver for STP mode selection. This addresses concerns about call_usermodehelper() usage - Remove UDP-Lite support (as announced in 2023) - Remove support for building IPv6 as a module. Remove the now unnecessary function calling indirection Cross-tree stuff: - Move Michael MIC code from generic crypto into wireless, it's considered insecure but some WiFi networks still need it Netfilter: - Switch nft_fib_ipv6 module to no longer need temporary dst_entry object allocations by using fib6_lookup() + RCU. Florian W reports this gets us ~13% higher packet rate - Convert IPVS's global __ip_vs_mutex to per-net service_mutex and switch the service tables to be per-net. Convert some code that walks the service lists to use RCU instead of the service_mutex - Add more opinionated input validation to lower security exposure - Make IPVS hash tables to be per-netns and resizable Wireless: - Finished assoc frame encryption/EPPKE/802.1X-over-auth - Radar detection improvements - Add 6 GHz incumbent signal detection APIs - Multi-link support for FILS, probe response templates and client probing - New APIs and mac80211 support for NAN (Neighbor Aware Networking, aka Wi-Fi Aware) so less work must be in firmware Driver API: - Add numerical ID for devlink instances (to avoid having to create fake bus/device pairs just to have an ID). Support shared devlink instances which span multiple PFs - Add standard counters for reporting pause storm events (implement in mlx5 and fbnic) - Add configuration API for completion writeback buffering (implement in mana) - Support driver-initiated change of RSS context sizes - Support DPLL monitoring input frequency (implement in zl3073x) - Support per-port resources in devlink (implement in mlx5) Misc: - Expand the YAML spec for Netfilter Drivers - Software: - macvlan: support multicast rx for bridge ports with shared source MAC address - team: decouple receive and transmit enablement for IEEE 802.3ad LACP "independent control" - Ethernet high-speed NICs: - nVidia/Mellanox: - support high order pages in zero-copy mode (for payload coalescing) - support multiple packets in a page (for systems with 64kB pages) - Broadcom 25-400GE (bnxt): - implement XDP RSS hash metadata extraction - add software fallback for UDP GSO, lowering the IOMMU cost - Broadcom 800GE (bnge): - add link status and configuration handling - add various HW and SW statistics - Marvell/Cavium: - NPC HW block support for cn20k - Huawei (hinic3): - add mailbox / control queue - add rx VLAN offload - add driver info and link management - Ethernet NICs: - Marvell/Aquantia: - support reading SFP module info on some AQC100 cards - Realtek PCI (r8169): - add support for RTL8125cp - Realtek USB (r8152): - support for the RTL8157 5Gbit chip - add 2500baseT EEE status/configuration support - Ethernet NICs embedded and off-the-shelf IP: - Synopsys (stmmac): - cleanup and reorganize SerDes handling and PCS support - cleanup descriptor handling and per-platform data - cleanup and consolidate MDIO defines and handling - shrink driver memory use for internal structures - improve Tx IRQ coalescing - improve TCP segmentation handling - add support for Spacemit K3 - Cadence (macb): - support PHYs that have inband autoneg disabled with GEM - support IEEE 802.3az EEE - rework usrio capabilities and handling - AMD (xgbe): - improve power management for S0i3 - improve TX resilience for link-down handling - Virtual: - Google cloud vNIC: - support larger ring sizes in DQO-QPL mode - improve HW-GRO handling - support UDP GSO for DQO format - PCIe NTB: - support queue count configuration - Ethernet PHYs: - automatically disable PHY autonomous EEE if MAC is in charge - Broadcom: - add BCM84891/BCM84892 support - Micrel: - support for LAN9645X internal PHY - Realtek: - add RTL8224 pair order support - support PHY LEDs on RTL8211F-VD - support spread spectrum clocking (SSC) - Maxlinear: - add PHY-level statistics via ethtool - Ethernet switches: - Maxlinear (mxl862xx): - support for bridge offloading - support for VLANs - support driver statistics - Bluetooth: - large number of fixes and new device IDs - Mediatek: - support MT6639 (MT7927) - support MT7902 SDIO - WiFi: - Intel (iwlwifi): - UNII-9 and continuing UHR work - MediaTek (mt76): - mt7996/mt7925 MLO fixes/improvements - mt7996 NPU support (HW eth/wifi traffic offload) - Qualcomm (ath12k): - monitor mode support on IPQ5332 - basic hwmon temperature reporting - support IPQ5424 - Realtek: - add USB RX aggregation to improve performance - add USB TX flow control by tracking in-flight URBs - Cellular: - IPA v5.2 support" * tag 'net-next-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1561 commits) net: pse-pd: fix kernel-doc function name for pse_control_find_by_id() wireguard: device: use exit_rtnl callback instead of manual rtnl_lock in pre_exit wireguard: allowedips: remove redundant space tools: ynl: add sample for wireguard wireguard: allowedips: Use kfree_rcu() instead of call_rcu() MAINTAINERS: Add netkit selftest files selftests/net: Add additional test coverage in nk_qlease selftests/net: Split netdevsim tests from HW tests in nk_qlease tools/ynl: Make YnlFamily closeable as a context manager net: airoha: Add missing PPE configurations in airoha_ppe_hw_init() net: airoha: Fix VIP configuration for AN7583 SoC net: caif: clear client service pointer on teardown net: strparser: fix skb_head leak in strp_abort_strp() net: usb: cdc-phonet: fix skb frags[] overflow in rx_complete() selftests/bpf: add test for xdp_master_redirect with bond not up net, bpf: fix null-ptr-deref in xdp_master_redirect() for down master net: airoha: Remove PCE_MC_EN_MASK bit in REG_FE_PCE_CFG configuration sctp: disable BH before calling udp_tunnel_xmit_skb() sctp: fix missing encap_port propagation for GSO fragments net: airoha: Rely on net_device pointer in ETS callbacks ...	2026-04-14 18:36:10 -07:00
Linus Torvalds	f5ad410100	Merge tag 'bpf-next-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Pull bpf updates from Alexei Starovoitov: - Welcome new BPF maintainers: Kumar Kartikeya Dwivedi, Eduard Zingerman while Martin KaFai Lau reduced his load to Reviwer. - Lots of fixes everywhere from many first time contributors. Thank you All. - Diff stat is dominated by mechanical split of verifier.c into multiple components: - backtrack.c: backtracking logic and jump history - states.c: state equivalence - cfg.c: control flow graph, postorder, strongly connected components - liveness.c: register and stack liveness - fixups.c: post-verification passes: instruction patching, dead code removal, bpf_loop inlining, finalize fastcall 8k line were moved. verifier.c still stands at 20k lines. Further refactoring is planned for the next release. - Replace dynamic stack liveness with static stack liveness based on data flow analysis. This improved the verification time by 2x for some programs and equally reduced memory consumption. New logic is in liveness.c and supported by constant folding in const_fold.c (Eduard Zingerman, Alexei Starovoitov) - Introduce BTF layout to ease addition of new BTF kinds (Alan Maguire) - Use kmalloc_nolock() universally in BPF local storage (Amery Hung) - Fix several bugs in linked registers delta tracking (Daniel Borkmann) - Improve verifier support of arena pointers (Emil Tsalapatis) - Improve verifier tracking of register bounds in min/max and tnum domains (Harishankar Vishwanathan, Paul Chaignon, Hao Sun) - Further extend support for implicit arguments in the verifier (Ihor Solodrai) - Add support for nop,nop5 instruction combo for USDT probes in libbpf (Jiri Olsa) - Support merging multiple module BTFs (Josef Bacik) - Extend applicability of bpf_kptr_xchg (Kaitao Cheng) - Retire rcu_trace_implies_rcu_gp() (Kumar Kartikeya Dwivedi) - Support variable offset context access for 'syscall' programs (Kumar Kartikeya Dwivedi) - Migrate bpf_task_work and dynptr to kmalloc_nolock() (Mykyta Yatsenko) - Fix UAF in in open-coded task_vma iterator (Puranjay Mohan) * tag 'bpf-next-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (241 commits) selftests/bpf: cover short IPv4/IPv6 inputs with adjust_room bpf: reject short IPv4/IPv6 inputs in bpf_prog_test_run_skb selftests/bpf: Use memfd_create instead of shm_open in cgroup_iter_memcg selftests/bpf: Add test for cgroup storage OOB read bpf: Fix OOB in pcpu_init_value selftests/bpf: Fix reg_bounds to match new tnum-based refinement selftests/bpf: Add tests for non-arena/arena operations bpf: Allow instructions with arena source and non-arena dest registers bpftool: add missing fsession to the usage and docs of bpftool docs/bpf: add missing fsession attach type to docs bpf: add missing fsession to the verifier log bpf: Move BTF checking logic into check_btf.c bpf: Move backtracking logic to backtrack.c bpf: Move state equivalence logic to states.c bpf: Move check_cfg() into cfg.c bpf: Move compute_insn_live_regs() into liveness.c bpf: Move fixup/post-processing logic from verifier.c into fixups.c bpf: Simplify do_check_insn() bpf: Move checks for reserved fields out of the main pass bpf: Delete unused variable ...	2026-04-14 18:04:04 -07:00
Linus Torvalds	88b29f3f57	Merge tag 'modules-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux Pull module updates from Sami Tolvanen: "Kernel symbol flags: - Replace the separate _gpl symbol sections (__ksymtab_gpl and __kcrctab_gpl) with a unified symbol table and a new __kflagstab section. This section stores symbol flags, such as the GPL-only flag, as an 8-bit bitset for each exported symbol. This is a cleanup that simplifies symbol lookup in the module loader by avoiding table fragmentation and will allow a cleaner way to add more flags later if needed. Module signature UAPI: - Move struct module_signature to the UAPI headers to allow reuse by tools outside the kernel proper, such as kmod and scripts/sign-file. This also renames a few constants for clarity and drops unused signature types as preparation for hash-based module integrity checking work that's in progress. Sysfs: - Add a /sys/module/<module>/import_ns sysfs attribute to show the symbol namespaces imported by loaded modules. This makes it easier to verify driver API access at runtime on systems that care about such things (e.g. Android). Cleanups and fixes: - Force sh_addr to 0 for all sections in module.lds. This prevents non-zero section addresses when linking modules with 'ld.bfd -r', which confused elfutils. - Fix a memory leak of charp module parameters on module unload when the kernel is configured with CONFIG_SYSFS=n. - Override the -EEXIST error code returned by module_init() to userspace. This prevents confusion with the errno reserved by the module loader to indicate that a module is already loaded. - Simplify the warning message and drop the stack dump on positive returns from module_init(). - Drop unnecessary extern keywords from function declarations and synchronize parse_args() arguments with their implementation" tag 'modules-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux: (23 commits) module: Simplify warning on positive returns from module_init() module: Override -EEXIST module return documentation: remove references to _gpl sections module: remove _gpl sections from vmlinux and modules module: deprecate usage of _gpl sections in module loader module: use kflagstab instead of _gpl sections module: populate kflagstab in modpost module: add kflagstab section to vmlinux and modules module: define ksym_flags enumeration to represent kernel symbol flags selftests/bpf: verify_pkcs7_sig: Use 'struct module_signature' from the UAPI headers sign-file: use 'struct module_signature' from the UAPI headers tools uapi headers: add linux/module_signature.h module: Move 'struct module_signature' to UAPI module: Give MODULE_SIG_STRING a more descriptive name module: Give 'enum pkey_id_type' a more specific name module: Drop unused signature types extract-cert: drop unused definition of PKEY_ID_PKCS7 docs: symbol-namespaces: mention sysfs attribute module: expose imported namespaces via sysfs module: Remove extern keyword from param prototypes ...	2026-04-14 17:16:38 -07:00
Paolo Bonzini	e74c3a8891	Merge tag 'kvmarm-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 7.1 * New features: - Add support for tracing in the standalone EL2 hypervisor code, which should help both debugging and performance analysis. This comes with a full infrastructure for 'remote' trace buffers that can be exposed by non-kernel entities such as firmware. - Add support for GICv5 Per Processor Interrupts (PPIs), as the starting point for supporting the new GIC architecture in KVM. - Finally add support for pKVM protected guests, with anonymous memory being used as a backing store. About time! * Improvements and bug fixes: - Rework the dreaded user_mem_abort() function to make it more maintainable, reducing the amount of state being exposed to the various helpers and rendering a substantial amount of state immutable. - Expand the Stage-2 page table dumper to support NV shadow page tables on a per-VM basis. - Tidy up the pKVM PSCI proxy code to be slightly less hard to follow. - Fix both SPE and TRBE in non-VHE configurations so that they do not generate spurious, out of context table walks that ultimately lead to very bad HW lockups. - A small set of patches fixing the Stage-2 MMU freeing in error cases. - Tighten-up accepted SMC immediate value to be only #0 for host SMCCC calls. - The usual cleanups and other selftest churn.	2026-04-13 11:49:54 +02:00
Daniel Borkmann	7789c6bb76	net: Add queue-create operation Add a ynl netdev family operation called queue-create that creates a new queue on a netdevice: name: queue-create attribute-set: queue flags: [admin-perm] do: request: attributes: - ifindex - type - lease reply: &queue-create-op attributes: - id This is a generic operation such that it can be extended for various use cases in future. Right now it is mandatory to specify ifindex, the queue type which is enforced to rx and a lease. The newly created queue id is returned to the caller. A queue from a virtual device can have a lease which refers to another queue from a physical device. This is useful for memory providers and AF_XDP operations which take an ifindex and queue id to allow applications to bind against virtual devices in containers. The lease couples both queues together and allows to proxy the operations from a virtual device in a container to the physical device. In future, the nested lease attribute can be lifted and made optional for other use-cases such as dynamic queue creation for physical netdevs. The lack of lease and the specification of the physical device as an ifindex will imply that we need a real queue to be allocated. Similarly, the queue type enforcement to rx can then be lifted as well to support tx. An early implementation had only driver-specific integration [0], but in order for other virtual devices to reuse, it makes sense to have this as a generic API in core net. For leasing queues, the virtual netdev must have real_num_rx_queues less than num_rx_queues at the time of calling queue-create. The queue-type must be rx as only rx queues are supported for leasing for now. We also enforce that the queue-create ifindex must point to a virtual device, and that the nested lease attribute's ifindex must point to a physical device. The nested lease attribute set contains a netns-id attribute which is optional and can specify a netns-id relative to the caller's netns. It requires cap_net_admin and if the netns-id attribute is not specified, the lease ifindex will be retrieved from the current netns. Also, it is modeled as an s32 type similarly as done elsewhere in the stack. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Co-developed-by: David Wei <dw@davidwei.uk> Signed-off-by: David Wei <dw@davidwei.uk> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://bpfconf.ebpf.io/bpfconf2025/bpfconf2025_material/lsfmmbpf_2025_netkit_borkmann.pdf [0] Link: https://patch.msgid.link/20260402231031.447597-2-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-09 18:21:45 -07:00
Alexei Starovoitov	891a05ccba	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf 7.0-rc6+ Cross-merge BPF and other fixes after downstream PR. Minor conflict in kernel/bpf/verifier.c Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-03 08:14:13 -07:00
Arnaldo Carvalho de Melo	7f8969aa73	perf beauty: Move copy of fadvise.h from tools/include/ to tools/perf/trace/beauty/include/ As it is not really used when compiling anything, just being parsed to collect number->string tables for 'perf trace'. $ git grep fadvise.h tools/ tools/perf/Makefile.perf:$(fadvise_advice_array): $(beauty_uapi_linux_dir)/fadvise.h $(fadvise_advice_tbl) tools/perf/check-headers.sh: "include/uapi/linux/fadvise.h" tools/perf/trace/beauty/fadvise.sh:grep -E $regex ${header_dir}/fadvise.h \| \ tools/perf/trace/beauty/fadvise.sh:# tools/include/uapi/linux/fadvise.h for details. $ Link: https://lore.kernel.org/r/CAP-5=fVBNQVF8k3JUQjH1nkP69ZVp8BqP+uwygcx=xO0zC4xrg@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2026-03-31 20:23:06 -07:00
Eyal Birger	f9a80c7ce4	bpf: Clarify BPF_RB_NO_WAKEUP behavior for bpf_ringbuf_discard() Clarify bpf_ringbuf_discard() documentation for BPF_RB_NO_WAKEUP. Discarded ring buffer records are still left in the ring buffer and are only skipped when user space consumes them. This can matter when BPF_RB_NO_WAKEUP is used: a later submit relying on adaptive wakeup might not wake the consumer, because the discarded record still needs to be consumed first. Scenario: epoll_wait(rb_fd); // blocks rec = bpf_ringbuf_reserve(&rb, ...); bpf_ringbuf_discard(rec, BPF_RB_NO_WAKEUP); rec = bpf_ringbuf_reserve(&rb, ...); bpf_ringbuf_submit(rec, 0); // valid record, but no wakeup Document this in bpf_ringbuf_discard() to make the interaction between discarded records, user-space consumption, and adaptive wakeups explicit. Reported-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20260331130612.3762433-1-eyal.birger@gmail.com ---- v2: adapt wording per feedback from Andrii.	2026-03-31 15:46:34 -07:00
Thomas Weißschuh	e5bbb35a07	tools headers UAPI: sync linux/taskstats.h Patch series "tools/getdelays: use the static UAPI headers from tools/include/uapi". The include directory ../../usr/include is only present if an in-tree kernel build with CONFIG_HEADERS_INSTALL was done before. Otherwise the system UAPI headers are used, which most likely are not the most recent ones. To make sure to always have access to up-to-date UAPI headers, use the static copy in tools/include/uapi. This patch (of 2): To give the accounting tools access to the new fields introduced in commit `503efe850c` ("delayacct: add timestamp of delay max") Link: https://lkml.kernel.org/r/20260307-accounting-taskstats-h-v1-0-0b75915c6ce5@weissschuh.net Link: https://lkml.kernel.org/r/20260307-accounting-taskstats-h-v1-1-0b75915c6ce5@weissschuh.net Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Jiang Kun <jiang.kun2@zte.com.cn> Cc: Wang Yaxin <wang.yaxin@zte.com.cn> Cc: xu xin <xu.xin16@zte.com.cn> Cc: Yang Yang <yang.yang29@zte.com.cn> Cc: kernel test robot <lkp@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-03-27 21:19:43 -07:00
Alan Maguire	222edc843c	btf: Add BTF kind layout encoding to UAPI BTF kind layouts provide information to parse BTF kinds. By separating parsing BTF from using all the information it provides, we allow BTF to encode new features even if they cannot be used by readers. This will be helpful in particular for cases where older tools are used to parse newer BTF with kinds the older tools do not recognize; the BTF can still be parsed in such cases using kind layout. The intent is to support encoding of kind layouts optionally so that tools like pahole can add this information. For each kind, we record - length of singular element following struct btf_type - length of each of the btf_vlen() elements following - a (currently unused) flags field The ideas here were discussed at [1], [2]; hence Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20260326145444.2076244-2-alan.maguire@oracle.com [1] https://lore.kernel.org/bpf/CAEf4BzYjWHRdNNw4B=eOXOs_ONrDwrgX4bn=Nuc1g8JPFC34MA@mail.gmail.com/ [2] https://lore.kernel.org/bpf/20230531201936.1992188-1-alan.maguire@oracle.com/	2026-03-26 13:53:56 -07:00
Thomas Weißschuh	d2d7561dc6	tools uapi headers: add linux/module_signature.h This header is going to be used from scripts/sign-file. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Reviewed-by: Petr Pavlu <petr.pavlu@suse.com> Reviewed-by: Nicolas Schier <nsc@kernel.org> Signed-off-by: Sami Tolvanen <samitolvanen@google.com>	2026-03-24 21:42:37 +00:00
Arnaldo Carvalho de Melo	3c71ae8ec9	tools headers UAPI: Sync linux/kvm.h with the kernel sources To pick the changes in: `da142f3d37` ("KVM: Remove subtle "struct kvm_stats_desc" pseudo-overlay") That just rebuilds perf, as these patches don't add any new KVM ioctl to be harvested for the 'perf trace' ioctl syscall argument beautifiers. This addresses this perf build warning: Warning: Kernel ABI header differences: diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h Please see tools/include/uapi/README for further details. Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2026-03-22 18:31:54 -03:00
Sascha Bischoff	c547c51ff4	KVM: arm64: gic-v5: Add ARM_VGIC_V5 device to KVM headers This is the base GICv5 device which is to be used with the KVM_CREATE_DEVICE ioctl to create a GICv5-based vgic. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20260319154937.3619520-9-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-19 18:21:27 +00:00
Christian Brauner	fc1a05de00	tools: update mount.h header Update the mount.h header so we can rely on it in the selftests. Link: https://patch.msgid.link/20260122-work-fsmount-namespace-v1-4-5ef0a886e646@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-03-12 13:33:54 +01:00
Arnaldo Carvalho de Melo	9cd284105b	tools headers UAPI: Sync linux/kvm.h with the kernel sources To pick the changes in: `f7ab71f178` ("KVM: s390: Add explicit padding to struct kvm_s390_keyop") `0ee4ddc164` ("KVM: s390: Storage key manipulation IOCTL") `fa9893fadb` ("KVM: Introduce KVM_EXIT_SNP_REQ_CERTS for SNP certificate-fetching") `f174a9ffcd` ("KVM: arm64: Add exit to userspace on {LD,ST}64B* outside of memslots") That just rebuilds perf, as these patches add just one new KVM ioctl, but for S390, that is not being considered by tools/perf/trace/beauty/kvm_ioctl.sh so far. This addresses this perf build warning: Warning: Kernel ABI header differences: diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h Please see tools/include/uapi/README for further details. Cc: Arnd Bergmann <arnd@arndb.de> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Michael Roth <michael.roth@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2026-03-04 12:32:19 -03:00
Arnaldo Carvalho de Melo	ecd5a2fd4c	perf beauty: Update the linux/perf_event.h copy with the kernel sources Update it as one comment got realigned, probably in a merge, so no changes in perf tooling, just silences this build warning: Warning: Kernel ABI header differences: diff -u tools/include/uapi/linux/perf_event.h include/uapi/linux/perf_event.h Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2026-03-04 11:51:36 -03:00
Linus Torvalds	4d84667627	Merge tag 'perf-core-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull performance event updates from Ingo Molnar: "x86 PMU driver updates: - Add support for the core PMU for Intel Diamond Rapids (DMR) CPUs (Dapeng Mi) Compared to previous iterations of the Intel PMU code, there's been a lot of changes, which center around three main areas: - Introduce the OFF-MODULE RESPONSE (OMR) facility to replace the Off-Core Response (OCR) facility - New PEBS data source encoding layout - Support the new "RDPMC user disable" feature - Likewise, a large series adds uncore PMU support for Intel Diamond Rapids (DMR) CPUs (Zide Chen) This centers around these four main areas: - DMR may have two Integrated I/O and Memory Hub (IMH) dies, separate from the compute tile (CBB) dies. Each CBB and each IMH die has its own discovery domain. - Unlike prior CPUs that retrieve the global discovery table portal exclusively via PCI or MSR, DMR uses PCI for IMH PMON discovery and MSR for CBB PMON discovery. - DMR introduces several new PMON types: SCA, HAMVF, D2D_ULA, UBR, PCIE4, CRS, CPC, ITC, OTC, CMS, and PCIE6. - IIO free-running counters in DMR are MMIO-based, unlike SPR. - Also add support for Add missing PMON units for Intel Panther Lake, and support Nova Lake (NVL), which largely maps to Panther Lake. (Zide Chen) - KVM integration: Add support for mediated vPMUs (by Kan Liang and Sean Christopherson, with fixes and cleanups by Peter Zijlstra, Sandipan Das and Mingwei Zhang) - Add Intel cstate driver to support for Wildcat Lake (WCL) CPUs, which are a low-power variant of Panther Lake (Zide Chen) - Add core, cstate and MSR PMU support for the Airmont NP Intel CPU (aka MaxLinear Lightning Mountain), which maps to the existing Airmont code (Martin Schiller) Performance enhancements: - Speed up kexec shutdown by avoiding unnecessary cross CPU calls (Jan H. Schönherr) - Fix slow perf_event_task_exit() with LBR callstacks (Namhyung Kim) User-space stack unwinding support: - Various cleanups and refactorings in preparation to generalize the unwinding code for other architectures (Jens Remus) Uprobes updates: - Transition from kmap_atomic to kmap_local_page (Keke Ming) - Fix incorrect lockdep condition in filter_chain() (Breno Leitao) - Fix XOL allocation failure for 32-bit tasks (Oleg Nesterov) Misc fixes and cleanups: - s390: Remove kvm_types.h from Kbuild (Randy Dunlap) - x86/intel/uncore: Convert comma to semicolon (Chen Ni) - x86/uncore: Clean up const mismatch (Greg Kroah-Hartman) - x86/ibs: Fix typo in dc_l2tlb_miss comment (Xiang-Bin Shi)" * tag 'perf-core-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits) s390: remove kvm_types.h from Kbuild uprobes: Fix incorrect lockdep condition in filter_chain() x86/ibs: Fix typo in dc_l2tlb_miss comment x86/uprobes: Fix XOL allocation failure for 32-bit tasks perf/x86/intel/uncore: Convert comma to semicolon perf/x86/intel: Add support for rdpmc user disable feature perf/x86: Use macros to replace magic numbers in attr_rdpmc perf/x86/intel: Add core PMU support for Novalake perf/x86/intel: Add support for PEBS memory auxiliary info field in NVL perf/x86/intel: Add core PMU support for DMR perf/x86/intel: Add support for PEBS memory auxiliary info field in DMR perf/x86/intel: Support the 4 new OMR MSRs introduced in DMR and NVL perf/core: Fix slow perf_event_task_exit() with LBR callstacks perf/core: Speed up kexec shutdown by avoiding unnecessary cross CPU calls uprobes: use kmap_local_page() for temporary page mappings arm/uprobes: use kmap_local_page() in arch_uprobe_copy_ixol() mips/uprobes: use kmap_local_page() in arch_uprobe_copy_ixol() arm64/uprobes: use kmap_local_page() in arch_uprobe_copy_ixol() riscv/uprobes: use kmap_local_page() in arch_uprobe_copy_ixol() perf/x86/intel/uncore: Add Nova Lake support ...	2026-02-10 12:00:46 -08:00
Matt Bobrowski	752b807028	bpf: add new BPF_CGROUP_ITER_CHILDREN control option Currently, the BPF cgroup iterator supports walking descendants in either pre-order (BPF_CGROUP_ITER_DESCENDANTS_PRE) or post-order (BPF_CGROUP_ITER_DESCENDANTS_POST). These modes perform an exhaustive depth-first search (DFS) of the hierarchy. In scenarios where a BPF program may need to inspect only the direct children of a given parent cgroup, a full DFS is unnecessarily expensive. This patch introduces a new BPF cgroup iterator control option, BPF_CGROUP_ITER_CHILDREN. This control option restricts the traversal to the immediate children of a specified parent cgroup, allowing for more targeted and efficient iteration, particularly when exhaustive depth-first search (DFS) traversal is not required. Signed-off-by: Matt Bobrowski <mattbobrowski@google.com> Link: https://lore.kernel.org/r/20260127085112.3608687-1-mattbobrowski@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-01-27 09:05:54 -08:00
Menglong Dong	2d419c4465	bpf: add fsession support The fsession is something that similar to kprobe session. It allow to attach a single BPF program to both the entry and the exit of the target functions. Introduce the struct bpf_fsession_link, which allows to add the link to both the fentry and fexit progs_hlist of the trampoline. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Co-developed-by: Leon Hwang <leon.hwang@linux.dev> Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20260124062008.8657-2-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-01-24 18:49:35 -08:00
Dapeng Mi	d2bdcde962	perf/x86/intel: Add support for PEBS memory auxiliary info field in DMR With the introduction of the OMR feature, the PEBS memory auxiliary info field for load and store latency events has been restructured for DMR. The memory auxiliary info field's bit[8] indicates whether a L2 cache miss occurred for a memory load or store instruction. If bit[8] is 0, it signifies no L2 cache miss, and bits[7:0] specify the exact cache data source (up to the L2 cache level). If bit[8] is 1, bits[7:0] represent the OMR encoding, indicating the specific L3 cache or memory region involved in the memory access. A significant enhancement is OMR encoding provides up to 8 fine-grained memory regions besides the cache region. A significant enhancement for OMR encoding is the ability to provide up to 8 fine-grained memory regions in addition to the cache region, offering more detailed insights into memory access regions. For detailed information on the memory auxiliary info encoding, please refer to section 16.2 "PEBS LOAD LATENCY AND STORE LATENCY FACILITY" in the ISE documentation. This patch ensures that the PEBS memory auxiliary info field is correctly interpreted and utilized in DMR. Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260114011750.350569-3-dapeng1.mi@linux.intel.com	2026-01-15 10:04:26 +01:00
Alexei Starovoitov	e3d0dbb3b5	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after rc5 Cross-merge BPF and other fixes after downstream PR. No conflicts. Adjacent: Auto-merging MAINTAINERS Auto-merging Makefile Auto-merging kernel/bpf/verifier.c Auto-merging kernel/sched/ext.c Auto-merging mm/memcontrol.c Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-01-14 15:22:01 -08:00
Thomas Gleixner	2e4b28c48f	treewide: Update email address In a vain attempt to consolidate the email zoo switch everything to the kernel.org account. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-01-11 06:09:11 -10:00
Leon Hwang	2b421662c7	bpf: Introduce BPF_F_CPU and BPF_F_ALL_CPUS flags Introduce BPF_F_CPU and BPF_F_ALL_CPUS flags and check them for following APIs: * 'map_lookup_elem()' * 'map_update_elem()' * 'generic_map_lookup_batch()' * 'generic_map_update_batch()' And, get the correct value size for these APIs. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20260107022022.12843-2-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-01-06 20:48:32 -08:00
Namhyung Kim	34524cde88	tools headers: Sync UAPI KVM headers with kernel sources To pick up changes from: `ad9c62bd89` ("KVM: arm64: VM exit to userspace to handle SEA") `8e8678e740` ("KVM: s390: Add capability that forwards operation exceptions") `e0c26d47de` ("Merge tag 'kvm-s390-next-6.19-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD") `7a61d61396` ("KVM: SEV: Publish supported SEV-SNP policy bits") This should be used to beautify DRM syscall arguments and it addresses these tools/perf build warnings: Warning: Kernel ABI header differences: diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h Please see tools/include/uapi/README. Cc: kvm@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-12-24 11:42:13 -08:00
Alexei Starovoitov	ec439c3801	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after 6.19-rc1 Cross-merge BPF and other fixes after downstream PR. Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-12-16 21:29:38 -08:00

1 2 3 4 5 ...

1332 Commits