linux-stable-mirror

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2026-06-21 15:43:21 +02:00

Author	SHA1	Message	Date
Linus Torvalds	9c87e61e3c	Merge tag 'bpf-next-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Pull bpf updates from Alexei Starovoitov: "Major changes: - Recover from BPF arena page faults using a scratch page and add ptep_try_set() for lockless empty-slot installs on x86 and arm64. This allows BPF kfuncs to access arena pointers directly. The 'arena_direct_access' stable branch was created for this work and was pulled into sched-ext and bpf-next trees (Tejun Heo, Kumar Kartikeya Dwivedi) - Lift old restriction and support 6+ arguments in BPF programs and kfuncs on x86 and arm64 (Yonghong Song, Puranjay Mohan) Other features and fixes: - Add 24-bit BTF vlen and reclaim unused bits in the BTF UAPI to ease addition of new BTF kinds (Alan Maguire) - Raise the maximum BPF call chain depth from 8 to 16 frames (Alexei Starovoitov) - Refactor object relationship tracking in the verifier and fix a dynptr use-after-free bug (Amery Hung) - Harden the signed program loader and reject exclusive maps as inner maps (Daniel Borkmann) - Replace the verifier min/max bounds fields with a circular number (cnum) representation and improve 32->64 bit range refinements (Eduard Zingerman) - Introduce the arena library and runtime (libarena) with a buddy allocator, rbtree and SPMC queue data structures, ASAN support and a parallel test harness. Allow subprograms to return arena pointers and switch to a BTF type-tag based __arena annotation (Emil Tsalapatis) - Cache build IDs in the sleepable stackmap path and avoid faultable build ID reads under mm locks (Ihor Solodrai) - Introduce the tracing_multi link to attach a single BPF program to many kernel functions at once. 
Allow specifying the uprobe_multi target via FD (Jiri Olsa) - Extend the bpf_list family of kfuncs with bpf_list_add/del(), and bpf_list_is_first/is_last/empty() (Kaitao Cheng) - Extend the BPF syscall with common attributes support for prog_load, btf_load and map_create (Leon Hwang) - Wrap rhashtable as BPF map (Mykyta Yatsenko, Herbert Xu) - Add sleepable support for tracepoint programs and fix deadlocks in LRU map due to NMI reentry (Mykyta Yatsenko) - Fix OOB access in bpf_flow_keys, fix nullness analysis of inner arrays, enforce write checks for global subprograms (Nuoqi Gui) - Report the maximum combined stack depth and print a breakdown of instructions processed per subprogram (Paul Chaignon) - Add an XDP load-balancer benchmark and arm64 JIT support for stack arguments (Puranjay Mohan) - Add kfuncs to traverse over wakeup_sources (Samuel Wu) - Allow sleepable BPF programs to use LPM trie maps directly (Vlad Poenaru) - Many more fixes and cleanups across the verifier, BTF, sockmap, devmap, bpffs, security hooks, s390/riscv/loongarch JITs, rqspinlock, libbpf, bpftool, selftests" * tag 'bpf-next-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (336 commits) selftests/bpf: Work around llvm stack overflow in crypto progs selftests/bpf: add test for bpf_msg_pop_data() overflow bpf, sockmap: fix integer overflow in bpf_msg_pop_data() bounds check sockmap: Fix use-after-free in udp_bpf_recvmsg() bpf, sockmap: keep sk_msg copy state in sync bpf, sockmap: Fix wrong rsge offset in bpf_msg_push_data() bpf, sockmap: reject overflowing copy + len in bpf_msg_push_data() selftsets/bpf: Retry map update on helper_fill_hashmap() selftests/bpf: Add test for sleepable lsm_cgroup rejection selftests/bpf: Add test to verify the fix for bpf_setsockopt() helper bpf: Fix bpf_get/setsockopt to tos for ipv4-mapped ipv6 socket selftests/bpf: Avoid static LLVM linking for cross builds selftests/bpf: Use common CFLAGS for urandom_read selftests/bpf: Initialize 
operation name before use tools/bpf: build: Append extra cflags libbpf: Initialize CFLAGS before including Makefile.include bpftool: Append extra host flags bpftool: Avoid adding EXTRA_CFLAGS to HOST_CFLAGS bpftool: Pass host flags to bootstrap libbpf selftests/bpf: correct CONFIG_PPC64 macro name in comment ...	2026-06-17 09:18:14 +01:00
Linus Torvalds	c071a4fbb0	Merge tag 'trace-latency-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing latency updates from Steven Rostedt: - Dump the stack to the buffer on timerlat uret threshold event Record the stack trace in the buffer for THREAD_URET as well as THREAD_CONTEXT when the threshold is hit. Otherwise, if the threshold was not hit at task wakeup, but was at task return, it will not produce a stack trace, making it harder to debug. - Have osnoise trace prints print to all buffers The osnoise tracer is allowed to print to the main buffer. Add an osnoise_print() helper function and use trace_array_vprintk() to print osnoise output. * tag 'trace-latency-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing/osnoise: Array printk init and cleanup tracing/osnoise: Dump stack on timerlat uret threshold event	2026-06-16 17:38:19 +05:30
Linus Torvalds	18ecdd4d0a	Merge tag 'probes-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes updates from Masami Hiramatsu: - BTF support for dereferencing pointers Add syntax to the parsing of eprobes to typecast structure pointer trace event fields, enabling BTF-based dereferencing instead of relying on manual offsets. - Improvements and robustness enhancements - Use flexible array for entry fetch code. Store probe entry fetch instructions in the probe_entry_arg allocation via a flexible array member to simplify memory allocation and lifetime management. - Replace BUG_ON with lockdep_assert_held in uprobe_buffer functions Replace BUG_ON() calls with lockdep_assert_held() in uprobe buffer enable/disable paths to prevent kernel crashes and better verify lock ownership. - Ensure the uprobe buffer size is bigger than event size. Add a BUILD_BUG_ON() assertion to guarantee that the per-CPU uprobe working buffer size is always larger than the maximum probe event size. * tag 'probes-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing/eprobes: Allow use of BTF names to dereference pointers tracing: Replace BUG_ON with lockdep_assert_held in uprobe_buffer functions tracing: Use flexible array for entry fetch code tracing/probes: Ensure the uprobe buffer size is bigger than event size	2026-06-16 17:33:20 +05:30
Linus Torvalds	2cbf335f8c	Merge tag 'sched-core-2026-06-14' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "SMP load-balancing updates: - A large series to introduce infrastructure for cache-aware load balancing, with the goal of co-locating tasks that share data within the same Last Level Cache (LLC) domain. By improving cache locality, the scheduler can reduce cache bouncing and cache misses, ultimately improving data access efficiency. Implemented by Chen Yu and Tim Chen, based on early prototype work by Peter Zijlstra, with fixes by Jianyong Wu, Peter Zijlstra and Shrikanth Hegde. - A series to simplify CONFIG_SCHED_SMT ifdef usage (Shrikanth Hegde) Fair scheduler updates: - A series to improve SD_ASYM_CPUCAPACITY scheduling by introducing SMT awareness (Andrea Righi, K Prateek Nayak) - A series to optimize cfs_rq and sched_entity allocation for better data locality (Zecheng Li) - A preparatory series to change fair/cgroup scheduling to a single runqueue, without the final change (Peter Zijlstra) - Auto-manage ext/fair dl_server bandwidth (Andrea Righi) - Fix cpu_util runnable_avg arithmetic (Hongyan Xia) - Optimize update_tg_load_avg()'s rate-limiting code (Rik van Riel) - Allow account_cfs_rq_runtime() to throttle current hierarchy (K Prateek Nayak) - Update util_est after updating util_avg during dequeue, to fix the util signal update logic, which reduces signal noise (Vincent Guittot) Scheduler topology updates: - Allow multiple domains to claim sched_domain_shared (K Prateek Nayak) - Add parameter to split LLC (Peter Zijlstra) Core scheduler updates: - Use trace_call__<tp>() to save a static branch (Gabriele Monaco) Scheduler statistics updates: - Drop now-stale mul_u64_u64_div_u64() cputime over-approximation guard (Nicolas Pitre) Deadline scheduler updates: - Reject debugfs dl_server writes for offline CPUs (Andrea Righi) - Fix replenishment logic for non-deferred servers (Yuri Andriaccio) RT scheduling 
updates: - Turn RT_PUSH_IPI default off for non PREEMPT_RT (Steven Rostedt) - Update default bandwidth for real-time tasks to 1.0 (Yuri Andriaccio) Proxy scheduling updates: - A series to implement Optimized Donor Migration for Proxy Execution (John Stultz, Peter Zijlstra) - Various proxy scheduling cleanups and fixes (Peter Zijlstra, K Prateek Nayak) Misc fixes, improvements and cleanups by Aaron Lu, Andrea Righi, Zenghui Yu, Chen Yu, Guanyou.Chen, John Stultz, Shrikanth Hegde, Peter Zijlstra, Liang Luo and Yiyang Chen" * tag 'sched-core-2026-06-14' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip: (91 commits) sched/fair: Fix newidle vs core-sched sched/deadline: Use task_on_rq_migrating() helper sched/core: Combine separate 'else' and 'if' statements sched/fair: Fix cpu_util runnable_avg arithmetic sched/fair: Unify cfs_rq throttling via account_cfs_rq_runtime() sched/fair: Move the throttled tasks to a local list in tg_unthrottle_up() sched/fair: Call update_curr() before unthrottling the hierarchy sched/fair: Use throttled_csd_list for local unthrottle sched/fair: Convert cfs bandwidth throttling to use guards sched/fair: Allocate cfs_tg_state with percpu allocator sched/fair: Remove task_group->se pointer array sched/fair: Co-locate cfs_rq and sched_entity in cfs_tg_state sched: restore timer_slack_ns when resetting RT policy on fork MAINTAINERS: Fix spelling mistake in Peter's name sched: Simplify ttwu_runnable() sched/proxy: Remove superfluous clear_task_blocked_in() sched/proxy: Remove PROXY_WAKING sched/proxy: Switch proxy to use p->is_blocked sched/proxy: Only return migrate when needed sched: Be more strict about p->is_blocked ...	2026-06-15 14:50:18 +05:30
Jiri Olsa	26330a9226	bpf: Add support to specify uprobe_multi target via file descriptor Allow the uprobe_multi link to identify the target binary by an already opened file descriptor. Add the new BPF_F_UPROBE_MULTI_PATH_FD flag and the path_fd field to the attr.link_create.uprobe_multi struct. When the flag is set, we resolve the target from path_fd; without the flag, we keep the existing string path behavior. I don't see a use case for supporting O_PATH file descriptors, because we need to read the binary first to get the probe offsets, so I'm using CLASS(fd, f), which fails for O_PATH fds. Assisted-by: Codex:GPT-5.4 Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20260611114230.950379-4-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-06-14 17:24:25 -07:00
Jiri Olsa	65d81609e9	bpf: Use user_path_at for path resolution in uprobe_multi Resolve the uprobe_multi user path with user_path_at() instead of copying the string with strndup_user() and passing it to kern_path(). This removes the temporary allocation and keeps the lookup logic in one helper. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20260611114230.950379-3-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-06-14 17:24:25 -07:00
Jiri Olsa	4d87a251d4	bpf: Guard __get_user acesss with access_ok for uprobe_multi data As reported by sashiko [1], we need to use access_ok() to check the user-space data bounds before we use __get_user() to read the data. [1] https://lore.kernel.org/bpf/20260610145235.CB1441F00893@smtp.kernel.org/ Fixes: `0b779b61f6` ("bpf: Add cookies support for uprobe_multi link") Fixes: `89ae89f53d` ("bpf: Add multi uprobe link") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20260611114230.950379-2-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-06-14 17:24:25 -07:00
Linus Torvalds	acb7500801	Merge tag 'trace-rv-v7.1-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull runtime verifier fixes from Steven Rostedt: - Fix reset ordering on per-task destruction Reset the task before dropping the slot instead of after, which was causing out-of-bound memory accesses. - Fix HA monitor synchronization and cleanup Ensure synchronous cleanup for HA monitors by running timer callbacks in RCU read-side critical sections and using synchronize_rcu() during destruction. - Avoid armed timers after tasks exit Add automatic cleanup for per-task HA monitors to prevent timers from firing after task exit. - Fix memory ordering for DA/HA monitors Fix race conditions during monitor start by using release-acquire semantics for the monitoring flag. - Fix initialization for DA/HA monitors Ensure monitors are not initialized relying on potentially corrupted state like the monitoring flag, that is not reset by all monitors type and may have an unknown state in monitors reusing the storage (per-task). - Fix memory safety in per-task and per-object monitors Prevent use-after-free and out-of-bounds access by synchronizing with in-flight tracepoint probes using tracepoint_synchronize_unregister() before freeing monitor storage or releasing task slots. - Adjust monitors for preemptible tracepoints Fix monitors that relied on tracepoints disabling preemption. Explicitly disable task migration when per-CPU monitors handle events to avoid accessing the wrong state and update the opid monitor logic. - Fix incorrect __user specifier usage Remove __user from a non-pointer variable in the extract_params() helper. - Fix bugs in the rv tool Ensure strings are NUL-terminated, fix substring matching in monitor searches, and improve cleanup and exit status handling. - Fix several bugs in rvgen Fix LTL literal stringification, subparsers' options handling, and suffix stripping in dot2k. 
* tag 'trace-rv-v7.1-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: verification/rvgen: Fix ltl2k writing True as a literal verification/rvgen: Fix options shared among commands verification/rvgen: Fix suffix strip in dot2k tools/rv: Fix cleanup after failed trace setup tools/rv: Fix substring match when listing container monitors tools/rv: Fix substring match bug in monitor name search tools/rv: Ensure monitor name and desc are NUL-terminated rv: Use 0 to check preemption enabled in opid rv: Prevent task migration while handling per-CPU events rv: Ensure synchronous cleanup for HA monitors rv: Add automatic cleanup handlers for per-task HA monitors rv: Do not rely on clean monitor when initialising HA rv: Fix monitor start ordering and memory ordering for monitoring flag rv: Ensure all pending probes terminate on per-obj monitor destroy rv: Prevent in-flight per-task handlers from using invalid slots rv: Reset per-task DA monitors before releasing the slot rv: Fix __user specifier usage in extract_params()	2026-06-09 17:20:00 -07:00
Jiri Olsa	8abecdafd5	bpf: Add support for tracing_multi link fdinfo Adding tracing_multi link fdinfo support with following output: pos: 0 flags: 02000000 mnt_id: 19 ino: 3087 link_type: tracing_multi link_id: 9 prog_tag: 599ba0e317244f86 prog_id: 94 attach_type: 59 cnt: 10 obj-id btf-id cookie func 1 91593 8 bpf_fentry_test1+0x4/0x10 1 91595 9 bpf_fentry_test2+0x4/0x10 1 91596 7 bpf_fentry_test3+0x4/0x20 1 91597 5 bpf_fentry_test4+0x4/0x20 1 91598 4 bpf_fentry_test5+0x4/0x20 1 91599 2 bpf_fentry_test6+0x4/0x20 1 91600 3 bpf_fentry_test7+0x4/0x10 1 91601 1 bpf_fentry_test8+0x4/0x10 1 91602 10 bpf_fentry_test9+0x4/0x10 1 91594 6 bpf_fentry_test10+0x4/0x10 Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20260606123955.345967-17-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-06-07 10:03:02 -07:00
Jiri Olsa	ba042ed644	bpf: Add support for tracing_multi link session Adding support to use session attachment with tracing_multi link. Adding new BPF_TRACE_FSESSION_MULTI program attach type, that follows the BPF_TRACE_FSESSION behaviour but on the tracing_multi link. Such program is called on entry and exit of the attached function and allows to pass cookie value from entry to exit execution. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20260606123955.345967-16-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-06-07 10:03:01 -07:00
Jiri Olsa	46b42af27d	bpf: Add support for tracing_multi link cookies Add support to specify cookies for tracing_multi link. Cookies are provided in array where each value is paired with provided BTF ID value with the same array index. Such cookie can be retrieved by bpf program with bpf_get_attach_cookie helper call. We need to sort cookies array together with ids array in check_dup_ids, to keep the id->cookie relation. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20260606123955.345967-15-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-06-07 10:03:01 -07:00
Jiri Olsa	c1d32dea5d	bpf: Add support for tracing multi link Adding new link to allow to attach program to multiple function BTF IDs. The link is represented by struct bpf_tracing_multi_link. To configure the link, new fields are added to bpf_attr::link_create to pass array of BTF IDs; struct { __aligned_u64 ids; __u32 cnt; } tracing_multi; Each BTF ID represents function (BTF_KIND_FUNC) that the link will attach bpf program to. We use previously added bpf_trampoline_multi_attach/detach functions to attach/detach the link. The linkinfo/fdinfo callbacks will be implemented in following changes. Note this is supported only for archs (x86_64) with ftrace direct and have single ops support. CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS && CONFIG_HAVE_SINGLE_FTRACE_DIRECT_OPS Note using sort_r (instead of plain sort) in check_dup_ids, because we will use the swap callback in following changes. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20260606123955.345967-14-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-06-07 10:03:01 -07:00
Jiri Olsa	2cd298c106	ftrace: Add add_ftrace_hash_entry function Renaming __add_hash_entry to add_ftrace_hash_entry and making it global, it will be used in following changes outside ftrace.c object. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20260606123955.345967-4-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-06-07 10:03:00 -07:00
Jiri Olsa	af7c323650	ftrace: Add ftrace_hash_remove function Add the ftrace_hash_remove function, which removes all entries from a struct ftrace_hash object without freeing them. It will be used in following changes where entries are allocated as part of another structure and are freed separately. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20260606123955.345967-3-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-06-07 10:03:00 -07:00
Jiri Olsa	e57f13eaab	ftrace: Add ftrace_hash_count function Adding external ftrace_hash_count function so we could get hash count outside of ftrace object. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20260606123955.345967-2-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-06-07 10:03:00 -07:00
Gabriele Monaco	d9022172c1	rv: Use 0 to check preemption enabled in opid Tracepoint handlers no longer run with preemption disabled by default since `a46023d561` ("tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast"). As a result, the opid monitor should now treat a preemption count of 1 as preemption disabled. Change the rule for preempt_off to preempt > 0. Reviewed-by: Nam Cao <namcao@linutronix.de> Link: https://lore.kernel.org/r/20260601153840.124372-11-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>	2026-06-03 12:33:25 +02:00
Gabriele Monaco	700782ec8f	rv: Add automatic cleanup handlers for per-task HA monitors Depending on the model, hybrid automata monitors may start timers, which may remain active on an exiting task and cause false positives or even access freed memory. Add an enable/disable hook in the HA code, currently only populated by the per-task handler for registration and deregistration. This hooks to the sched_process_exit event and ensures the timer is stopped for every exiting task. The handler is enabled automatically but may be disabled, for instance if the monitor uses the event for another purpose (in which case the monitor should still manually ensure timers are stopped). Fixes: `f5587d1b6e` ("rv: Add Hybrid Automata monitor type") Reviewed-by: Nam Cao <namcao@linutronix.de> Link: https://lore.kernel.org/r/20260601153840.124372-8-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>	2026-06-03 12:33:24 +02:00
Gabriele Monaco	4793e8a6e2	rv: Fix __user specifier usage in extract_params() The attributes variables extracted from syscalls in the helper are both defined with the __user specifier although only the actual pointer to user data should be marked. Remove the __user specifier from attr. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202604150820.Ny143u6X-lkp@intel.com Fixes: `b133207deb` ("rv: Add nomiss deadline monitor") Reviewed-by: Wen Yang <wen.yang@linux.dev> Reviewed-by: Nam Cao <namcao@linutronix.de> Link: https://lore.kernel.org/r/20260601153840.124372-2-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>	2026-06-03 12:33:23 +02:00
Steven Rostedt	69efd863a7	tracing/eprobes: Allow use of BTF names to dereference pointers Add syntax to the parsing of eprobes to be able to typecast a trace event field that is a pointer to a structure. Currently, a dereference must be a number, where the user has to figure out manually the offset of a member of a structure that they want to dereference. But for event probes that record a field that happens to be a pointer to a structure, it cannot dereference these values with BTF naming, but must use numerical offsets. For example, to find out what device a sk_buff is pointing to in the net_dev_xmit trace event, one must first use gdb to find the offsets of the members of the structures: (gdb) p &((struct sk_buff *)0)->dev $1 = (struct net_device **) 0x10 (gdb) p &((struct net_device *)0)->name $2 = (char (*)[16]) 0x118 And then use the raw numbers to dereference: # echo 'e:xmit net.net_dev_xmit +0x118(+0x10($skbaddr)):string' >> dynamic_events If BTF is in the kernel, then instead, the skbaddr can be typecast to sk_buff and use the normal dereference logic. # echo 'e:xmit net.net_dev_xmit (sk_buff)skbaddr->dev->name:string' >> dynamic_events # echo 1 > events/eprobes/xmit/enable # cat trace [..] sshd-session-1022 [000] b..2. 860.249343: xmit: (net.net_dev_xmit) arg1="enp7s0" sshd-session-1022 [000] b..2. 860.250061: xmit: (net.net_dev_xmit) arg1="enp7s0" sshd-session-1022 [000] b..2. 860.250142: xmit: (net.net_dev_xmit) arg1="enp7s0" sshd-session-1022 [000] b..2. 860.263553: xmit: (net.net_dev_xmit) arg1="enp7s0" sshd-session-1022 [000] b..2. 860.283820: xmit: (net.net_dev_xmit) arg1="enp7s0" sshd-session-1022 [000] b..2. 860.302716: xmit: (net.net_dev_xmit) arg1="enp7s0" sshd-session-1022 [000] b..2. 860.322905: xmit: (net.net_dev_xmit) arg1="enp7s0" sshd-session-1022 [000] b..2. 860.342828: xmit: (net.net_dev_xmit) arg1="enp7s0" sshd-session-1022 [000] b..2. 860.362268: xmit: (net.net_dev_xmit) arg1="enp7s0" sshd-session-1022 [000] b..2. 860.382335: xmit: (net.net_dev_xmit) arg1="enp7s0" sshd-session-1022 [000] b..2. 860.400856: xmit: (net.net_dev_xmit) arg1="enp7s0" sshd-session-1022 [000] b..2. 860.419893: xmit: (net.net_dev_xmit) arg1="enp7s0" The syntax is simply: (STRUCT)(FIELD)->MEMBER[->MEMBER..] Also add comments around the #else and #endif of #ifdef CONFIG_PROBE_EVENTS_BTF_ARGS to know what they are for. Link: https://lore.kernel.org/all/20260601130746.2139d926@gandalf.local.home/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>	2026-06-02 23:36:22 +09:00
Yash Suthar	585abc02be	tracing: Replace BUG_ON with lockdep_assert_held in uprobe_buffer functions Replace BUG_ON(!mutex_is_locked(&event_mutex)) with lockdep_assert_held(&event_mutex) in uprobe_buffer_enable() and uprobe_buffer_disable(). BUG_ON() crashes the kernel, and mutex_is_locked() only checks whether any task holds the lock, not whether the calling task does. lockdep_assert_held() checks that the current task holds the lock and warns instead of crashing when it does not. Link: https://lore.kernel.org/all/20260521192846.8306-1-yashsuthar983@gmail.com/ Signed-off-by: Yash Suthar <yashsuthar983@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>	2026-06-01 23:35:21 +09:00
Rosen Penev	cf24cbb4e5	tracing: Use flexible array for entry fetch code Store probe entry fetch instructions in the probe_entry_arg allocation instead of allocating a separate instruction array. This keeps the entry fetch code tied to the entry argument lifetime while leaving regular probe_arg instruction arrays separately allocated and freed. Assisted-by: Codex:GPT-5.5 Link: https://lore.kernel.org/all/20260520215817.16560-1-rosenp@gmail.com/ Signed-off-by: Rosen Penev <rosenp@gmail.com> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>	2026-06-01 23:35:21 +09:00
Masami Hiramatsu (Google)	4304c81652	tracing/probes: Ensure the uprobe buffer size is bigger than event size Add BUILD_BUG_ON() to ensure the uprobe per-CPU working buffer size is bigger than the event size. Link: https://lore.kernel.org/all/177849383209.8038.1902170479780501237.stgit@devnote2/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>	2026-06-01 23:35:21 +09:00
Masami Hiramatsu (Google)	85e0f27dd1	tracing/probes: Point the error offset correctly for eprobe argument error Fix the error offset so that it points to the correct position for an eprobe argument error. In the cleanup commit `1b8b0cd754` ("tracing/probes: Move event parameter fetching code to common parser"), due to incorrect backward compatibility aimed at conforming to the test specifications, the error location was set to 0 when a non-existent formal parameter was specified for an eprobe. However, this should be corrected in both the test and the implementation to point to the correct error position. Link: https://lore.kernel.org/all/177967567399.209006.1451571244515632097.stgit@devnote2/ Fixes: `1b8b0cd754` ("tracing/probes: Move event parameter fetching code to common parser") Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt <rostedt@goodmis.org>	2026-05-30 22:45:50 +09:00
Crystal Wood	9cb99c5986	tracing/osnoise: Array printk init and cleanup None of the calls to trace_array_printk_buf() will do anything if we don't initialize the buffer on instance creation (unless some other tracer called it), so do that. Add an osnoise_print() function to facilitate adding debug prints (without tainting). Use trace_array_printk() instead of trace_array_printk_buf(), as we're only writing to the main buffer (of a non-main instance) anyway -- and trace_array_printk_buf() skips the check to make sure we're not printing to the global instance. Link: https://patch.msgid.link/20260511223035.1475676-1-crwood@redhat.com Signed-off-by: Crystal Wood <crwood@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2026-05-29 10:40:27 -04:00
Alexei Starovoitov	eb19eead36	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf 7.1-rc5 Cross-merge BPF and other fixes after downstream PR. Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-05-25 06:33:30 -07:00
Linus Torvalds	23884007af	Merge tag 'trace-v7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Avoid NULL return from hist_field_name() The return value of hist_field_name() is passed directly to strcat(), which does not handle a NULL pointer. Return a zero-length string when size is greater than the limit. This is used only to output already created histograms and no field currently is greater than the limit. But it should still not return NULL. - Do not call map->ops->elt_free() on allocation failure When elt_alloc() fails, it should not call the map->ops->elt_free() function if it exists, as that function may not be able to handle the free on allocation failures. The ->elt_free() should only be called when elt_alloc() succeeds. * tag 'trace-v7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Do not call map->ops->elt_free() if elt_alloc() fails tracing: Avoid NULL return from hist_field_name() on truncation	2026-05-22 06:09:58 -07:00
Masami Hiramatsu (Google)	8f0f5c4fb9	tracing: Do not call map->ops->elt_free() if elt_alloc() fails In paths where tracing_map_elt_alloc() failed to allocate objects, the map->ops->elt_alloc() call was never successful. In this case, map->ops->elt_free() should not be called. Link: https://sashiko.dev/#/patchset/20260520223101.34710-1-rosenp%40gmail.com Cc: stable@vger.kernel.org Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Rosen Penev <rosenp@gmail.com> Reported-by: Sashiko <sashiko-bot@kernel.org> Fixes: `2734b62952` ("tracing: Add per-element variable support to tracing_map") Link: https://patch.msgid.link/177933895460.108746.5396070821443932634.stgit@devnote2 Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2026-05-21 11:29:03 -04:00
Thomas Weißschuh	057caace52	tracing: Create output file from cmd_check_undefined As the output file is currently never created, the check will run every time, even if the inputs have not changed. Create an empty output file which allows make to skip the execution when it is not necessary. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Vincent Donnefort <vdonnefort@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Link: https://patch.msgid.link/20260520-tracing-ringbuffer-check-v1-1-d979cfab1338@weissschuh.net Fixes: `1211907ac0` ("tracing: Generate undef symbols allowlist for simple_ring_buffer") Fixes: `58b4bd1839` ("tracing: Adjust cmd_check_undefined to show unexpected undefined symbols") Reviewed-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2026-05-21 08:31:55 -04:00
Vincent Donnefort	a0a2f42a37	tracing: Fix unload_page for simple_ring_buffer init rollback The unload_page callback expects the return value of load_page() as its argument: ret = load_page(va); unload(ret). Fix the rollback code in simple_ring_buffer_init_mm() where the descriptor's VA is used instead of the loaded page address. Link: https://patch.msgid.link/20260512141614.1759430-1-vdonnefort@google.com Fixes: `635923081c` ("tracing: load/unload page callbacks for simple_ring_buffer") Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2026-05-21 08:26:22 -04:00
David Carlier	c2d2856cf6	tracing: Fix nr_subbufs initialization in simple_ring_buffer_init_mm() nr_subbufs in the ring buffer metadata is always initialized to zero because it is assigned from cpu_buffer->nr_pages before the page initialization loop has run. While nr_subbufs is not currently read by the kernel, it should reflect the actual buffer geometry in the meta page for correctness. Move the assignment after the page loop so that cpu_buffer->nr_pages holds the final count. Link: https://patch.msgid.link/20260512135420.99194-1-devnexen@gmail.com Fixes: `34e5b958bd` ("tracing: Introduce simple_ring_buffer") Reviewed-by: Vincent Donnefort <vdonnefort@google.com> Assisted-by: Claude:claude-opus-4-7 Signed-off-by: David Carlier <devnexen@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2026-05-21 08:24:59 -04:00
Masami Hiramatsu (Google)	a494d3c8d5	ring-buffer: Flush and stop persistent ring buffer on panic On real hardware, panic and machine reboot may not flush hardware cache to memory. This means the persistent ring buffer, which relies on a coherent state of memory, may not have its events written to the buffer and they may be lost. Moreover, there may be inconsistency with the counters which are used for validation of the integrity of the persistent ring buffer which may cause all data to be discarded. To avoid this issue, stop recording of the ring buffer on panic and flush the cache of the ring buffer's memory. Fixes: `e645535a95` ("tracing: Add option to use memmapped memory for trace boot instance") Cc: stable@vger.kernel.org Cc: Will Deacon <will@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Ian Rogers <irogers@google.com> Link: https://patch.msgid.link/177751969602.2136606.12031934362587643488.stgit@mhiramat.tok.corp.google.com Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2026-05-21 08:20:58 -04:00
Steven Rostedt	a254b6d13b	ring-buffer: Fix reporting of missed events in iterator When tracing is active while reading the trace file, if the iterator reading the buffer detects that the writer has passed the iterator head, it will reset and set a "missed events" flag. This flag is passed to the output processing to show the user that events were missed: CPU:4 [LOST EVENTS] The problem is that the flag is reset after it is checked in ring_buffer_iter_dropped(). But the "trace" file iterates over all the CPU ring buffers and it will check if they are dropped when figuring out which buffer to print next. This prematurely clears the missed_events flag if the CPU buffer with the missed events is not the one that is printed next. On the iteration where the CPU buffer with the missed events is printed, the check if it had missed events would return false and the output does not show that events were missed. Do not reset the missed_events flag when checking if there were missed events, but instead clear it when moving the iterator head to the next event. Cc: stable@vger.kernel.org Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260520220801.4fd09d13@fedora Fixes: `c9b7a4a72f` ("ring-buffer/tracing: Have iterator acknowledge dropped events") Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2026-05-21 08:20:29 -04:00
Crystal Wood	e11c9c8365	tracing/osnoise: Dump stack on timerlat uret threshold event Dump the saved IRQ stack trace regardless of whether the event was THREAD_CONTEXT or THREAD_URET. In the uret case, the latency presumably had not yet crossed the threshold at IRQ time (or else it would have dumped the stack at thread wakeup time, unless we're racing with a change to the threshold), but it may have at least contributed -- and this is possible with THREAD_CONTEXT as well. In any case, it helps with writing reliable rtla tests if we always get a stack trace on a threshold event. Cc: John Kacur <jkacur@redhat.com> Cc: Tomas Glozar <tglozar@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Cc: Wander Lairson Costa <wander@redhat.com> Link: https://patch.msgid.link/20260511223143.1477332-1-crwood@redhat.com Signed-off-by: Crystal Wood <crwood@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2026-05-20 16:29:25 -04:00
David Carlier	576ec047d2	tracing: Avoid NULL return from hist_field_name() on truncation hist_field_name() returns "" everywhere except the fully-qualified VAR_REF/EXPR case, where snprintf() truncation returns NULL early and bypasses the bottom NULL->"" guard. Callers don't expect NULL: strcat(expr, hist_field_name(field, 0)) at trace_events_hist.c:1758 and the strcmp() in the sort-key match loop at :4804 both deref it. system and event_name are bounded by MAX_EVENT_NAME_LEN, but the field name on a VAR_REF is kstrdup'd from a histogram variable name parsed out of the trigger string and has no length cap, so a long enough var name in a fully qualified reference can reach the truncation path. Keep the length check but leave field_name as "" on overflow. Link: https://patch.msgid.link/20260508195747.25492-1-devnexen@gmail.com Fixes: `5ec1d1e97d` ("tracing: Rebuild full_name on each hist_field_name() call") Signed-off-by: David Carlier <devnexen@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2026-05-20 16:10:56 -04:00
Yiyang Chen	ea19506013	sched/clock: Provide !HAVE_UNSTABLE_SCHED_CLOCK stub for sched_clock_stable() When CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is disabled, sched_clock() is already assumed to provide stable semantics, but the public header doesn't provide a sched_clock_stable() stub for that case. Add a header stub that always returns true and clean up the duplicate local stub in ring_buffer.c, so callers can use sched_clock_stable() unconditionally. Signed-off-by: Yiyang Chen <cyyzero16@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Link: https://patch.msgid.link/56e45338858946cd9581b75c8bd45dd37dba52c5.1778773587.git.cyyzero16@gmail.com	2026-05-19 12:17:35 +02:00
Linus Torvalds	e5d505e366	Merge tag 'trace-v7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Add more functions to the remote allowed list randconfig found more functions that are allowed for the remote code for s390 and arm. Add them to the allowed list. - Fix remote_test error path If one of the simple ring buffers fails to load, the code is supposed to rollback its initialized buffers. Instead of rolling back the buffers for the failed load, it uses the global variable and rolls back all the successfully loaded buffers. * tag 'trace-v7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Fix desc in error path for the trace remote test module ring-buffer remote: Avoid unexpected symbol warnings (arm, s390)	2026-05-17 12:02:31 -07:00
Vincent Donnefort	55a0005518	tracing: Fix desc in error path for the trace remote test module During initialisation in remote_test_load(), if one of the simple_ring_buffer fails to initialise, the error path attempts to rollback initialised buffers. However, the rollback incorrectly uses the global pointer to the trace descriptor, which is only set upon successful load completion. Fix the error path by using the local pointer to the descriptor. Link: https://patch.msgid.link/20260515201616.337469-1-vdonnefort@google.com Fixes: `ea908a2b79` ("tracing: Add a trace remote module for testing") Reported-by: Sashiko <sashiko-bot@kernel.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> base-commit: `5d6919055d` Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2026-05-16 16:11:04 -04:00
Arnd Bergmann	96350db80e	ring-buffer remote: Avoid unexpected symbol warnings (arm, s390) The now more verbose check found more architecture-specific symbols missing from the whitelist during randconfig testing on s390 and 32-bit arm: Unexpected symbols in kernel/trace/simple_ring_buffer.o: U __aeabi_unwind_cpp_pr1 Unexpected symbols in kernel/trace/simple_ring_buffer.o: U __s390_indirect_jump_r1 U __s390_indirect_jump_r10 U __s390_indirect_jump_r14 U __s390_indirect_jump_r2 U __s390_indirect_jump_r5 U __s390_indirect_jump_r7 U __s390_indirect_jump_r8 U __s390_indirect_jump_r9 make[6]: *** [/home/arnd/arm-soc/kernel/trace/Makefile:160: kernel/trace/simple_ring_buffer.o.checked] Error 1 Add these to the list and keep it roughly sorted into sanitizer and architecture symbols. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Vincent Donnefort <vdonnefort@google.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: https://patch.msgid.link/20260515105717.1023007-1-arnd@kernel.org Fixes: `1211907ac0` ("tracing: Generate undef symbols allowlist for simple_ring_buffer") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2026-05-15 14:59:30 -04:00
Masami Hiramatsu (Google)	657b594b20	fprobe: Fix unregister_fprobe() to wait for RCU grace period Commit `4346ba1604` ("fprobe: Rewrite fprobe on function-graph tracer") changed fprobe to register struct fprobe on an RCU hlist, but it did not wait for an RCU grace period. Thus a use-after-free is possible if the fprobe is released right after unregistering. This can happen with the fprobe event and the sample module code. To fix this issue, add synchronize_rcu() in unregister_fprobe(). Note that BPF is OK because fprobe is used as a part of bpf_kprobe_multi_link. This unregisters its fprobe in bpf_kprobe_multi_link_release() and it is deallocated via bpf_kprobe_multi_link_dealloc(), which is invoked from the bpf_link_defer_dealloc_rcu_gp() RCU callback. For BPF, this also introduces unregister_fprobe_async(), which does NOT wait for an RCU grace period. Link: https://lore.kernel.org/all/177813998919.256460.2809243930741138224.stgit@mhiramat.tok.corp.google.com/ Fixes: `4346ba1604` ("fprobe: Rewrite fprobe on function-graph tracer") Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>	2026-05-11 19:04:46 +09:00
Alexei Starovoitov	7e033543a2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf 7.1-rc3 Cross-merge BPF and other fixes after downstream PR. Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-05-10 13:24:49 -07:00
Steven Rostedt	b2aa3b4d64	tracing/probes: Limit size of event probe to 3K There is currently no maximum size for an event probe. One could make an event greater than PAGE_SIZE, which makes the event useless: if it is bigger than the largest event that can be recorded into the ring buffer, it will never be recorded. An event probe should never need to be greater than 3K, so make that the max size. As long as the max is less than the max that can be recorded onto the ring buffer, it should be fine. Cc: stable@vger.kernel.org Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Fixes: `93ccae7a22` ("tracing/kprobes: Support basic types on dynamic events") Link: https://patch.msgid.link/20260428122302.706610ba@gandalf.local.home Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2026-04-29 16:07:38 -04:00
Breno Leitao	3b75dd76e6	tracing: branch: Fix inverted check on stat tracer registration init_annotated_branch_stats() and all_annotated_branch_stats() check the return value of register_stat_tracer() with "if (!ret)", but register_stat_tracer() returns 0 on success and a negative errno on failure. The inverted check causes the warning to be printed on every successful registration, e.g.: Warning: could not register annotated branches stats while leaving real failures silent. The initcall also returned a hard-coded 1 instead of the actual error. Invert the check and propagate ret so that the warning fires on real errors and the initcall reports the correct status. Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: https://patch.msgid.link/20260420-tracing-v1-1-d8f4cd0d6af1@debian.org Fixes: `002bb86d8d` ("tracing/ftrace: separate events tracing and stats tracing engine") Signed-off-by: Breno Leitao <leitao@debian.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2026-04-28 14:28:29 -04:00
Linus Torvalds	27d128c1cf	Merge tag 'trace-ring-buffer-v7.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull ring-buffer fix from Steven Rostedt: - Fix accounting of persistent ring buffer rewind On boot up, the head page is moved back to the earliest point of the saved ring buffer. This is because the ring buffer being read by user space on a crash may not save the part it read. Rewinding the head page back to the earliest saved position helps keep those events from being lost. The number of events is also read during boot up and displayed in the stats file in the tracefs directory. It's also used for other accounting as well. On boot up, the "reader page" is accounted for but a rewind may put it back into the buffer and then the reader page may be accounted for again. Save off the original reader page and skip accounting it when scanning the pages in the ring buffer. * tag 'trace-ring-buffer-v7.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: ring-buffer: Do not double count the reader_page	2026-04-24 15:17:23 -07:00
Masami Hiramatsu (Google)	92d5a60672	ring-buffer: Do not double count the reader_page The cpu_buffer->reader_page is updated if there are unwound pages. After that update, we should skip the page if it is the original reader_page, because the original reader_page is already checked. Cc: stable@vger.kernel.org Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Ian Rogers <irogers@google.com> Link: https://patch.msgid.link/177701353063.2223789.1471163147644103306.stgit@mhiramat.tok.corp.google.com Fixes: `ca296d32ec` ("tracing: ring_buffer: Rewind persistent ring buffer on reboot") Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2026-04-24 15:34:39 -04:00
Linus Torvalds	1e18ed5727	Merge tag 'trace-ring-buffer-v7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull ring-buffer fix from Steven Rostedt: - Make undefsyms_base.c into a real file The file undefsyms_base.c is used to catch any symbols used by a remote ring buffer that is made for use of a pKVM hypervisor. As it doesn't share the same text as the rest of the kernel, referencing any symbols within the kernel will make it fail to be built for the standalone hypervisor. A file was created by the Makefile that checked for any symbols that could cause issues. There's no reason to have this file created by the Makefile, just create it as a normal file instead. * tag 'trace-ring-buffer-v7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Make undefsyms_base.c a first-class citizen	2026-04-22 14:47:52 -07:00
Mykyta Yatsenko	57918341dd	bpf: Add sleepable support for classic tracepoint programs Add trace_call_bpf_faultable(), a variant of trace_call_bpf() for faultable tracepoints that supports sleepable BPF programs. It uses rcu_tasks_trace for lifetime protection and bpf_prog_run_array_sleepable() for per-program RCU flavor selection, following the uprobe_prog_run() pattern. Restructure perf_syscall_enter() and perf_syscall_exit() to run BPF programs before perf event processing. Previously, BPF ran after the per-cpu perf trace buffer was allocated under preempt_disable, requiring cleanup via perf_swevent_put_recursion_context() on filter. Now BPF runs in faultable context before preempt_disable, reading syscall arguments from local variables instead of the per-cpu trace record, removing the dependency on buffer allocation. This allows sleepable BPF programs to execute and avoids unnecessary buffer allocation when BPF filters the event. The perf event submission path (buffer allocation, fill, submit) remains under preempt_disable as before. Since BPF no longer runs within the buffer allocation context, the fake_regs output parameter to perf_trace_buf_alloc() is no longer needed and is replaced with NULL. Add an attach-time check in __perf_event_set_bpf_prog() to reject sleepable BPF_PROG_TYPE_TRACEPOINT programs on non-syscall tracepoints, since only syscall tracepoints run in faultable context. This prepares the classic tracepoint runtime and attach paths for sleepable programs. The verifier changes to allow loading sleepable BPF_PROG_TYPE_TRACEPOINT programs are in a subsequent patch. 
To: Peter Zijlstra <peterz@infradead.org> To: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> # for BPF bits Acked-by: Steven Rostedt <rostedt@goodmis.org> Link: https://lore.kernel.org/bpf/20260422-sleepable_tracepoints-v13-3-99005dff21ef@meta.com Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>	2026-04-22 22:44:29 +02:00
Mykyta Yatsenko	439ebd5b57	bpf: Add sleepable support for raw tracepoint programs Rework __bpf_trace_run() to support sleepable BPF programs by using explicit RCU flavor selection, following the uprobe_prog_run() pattern. For sleepable programs, use rcu_read_lock_tasks_trace() for lifetime protection with migrate_disable(). For non-sleepable programs, use the regular rcu_read_lock_dont_migrate(). Remove the preempt_disable_notrace/preempt_enable_notrace pair from the faultable tracepoint BPF probe wrapper in bpf_probe.h, since migration protection and RCU locking are now handled per-program inside __bpf_trace_run(). Adapt bpf_prog_test_run_raw_tp() for sleepable programs: reject BPF_F_TEST_RUN_ON_CPU since sleepable programs cannot run in hardirq or preempt-disabled context, and call __bpf_prog_test_run_raw_tp() directly instead of via smp_call_function_single(). Rework __bpf_prog_test_run_raw_tp() to select RCU flavor per-program and add per-program recursion context guard for private stack safety. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/bpf/20260422-sleepable_tracepoints-v13-1-99005dff21ef@meta.com Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>	2026-04-22 22:44:24 +02:00
Paolo Bonzini	5335e318ad	tracing: Make undefsyms_base.c a first-class citizen Linus points out that dumping undefsyms_base.c from the Makefile is rather ugly, and that a much better course of action would be to have this file as a first-class citizen in the git tree. This allows some extra cleanup in the Makefile, and the removal of the .gitignore file in kernel/trace. Cc: Marc Zyngier <maz@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/CAHk-=wieqGd_XKpu8UxDoyADZx8TDe8CF3RmkUXt5N_9t5Pf_w@mail.gmail.com Link: https://lore.kernel.org/all/20260421095446.2951646-1-maz@kernel.org/ Link: https://patch.msgid.link/20260421100455.324333-1-pbonzini@redhat.com Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2026-04-22 11:24:41 -04:00
Masami Hiramatsu (Google)	476c5bbae6	tracing/fprobe: Fix to unregister ftrace_ops if it is empty on module unloading Fix fprobe to unregister ftrace_ops if corresponding type of fprobe does not exist on the fprobe_ip_table and it is expected to be empty when unloading modules. Since ftrace thinks that the empty hash means everything to be traced, if we set fprobes only on the unloaded module, all functions are traced unexpectedly after unloading module. e.g. # modprobe xt_LOG.ko # echo 'f:test log_tg*' > dynamic_events # echo 1 > events/fprobes/test/enable # cat enabled_functions log_tg [xt_LOG] (1) tramp: 0xffffffffa0004000 (fprobe_ftrace_entry+0x0/0x490) ->fprobe_ftrace_entry+0x0/0x490 log_tg_check [xt_LOG] (1) tramp: 0xffffffffa0004000 (fprobe_ftrace_entry+0x0/0x490) ->fprobe_ftrace_entry+0x0/0x490 log_tg_destroy [xt_LOG] (1) tramp: 0xffffffffa0004000 (fprobe_ftrace_entry+0x0/0x490) ->fprobe_ftrace_entry+0x0/0x490 # rmmod xt_LOG # wc -l enabled_functions 34085 enabled_functions Link: https://lore.kernel.org/all/177669368776.132053.10042301916765771279.stgit@mhiramat.tok.corp.google.com/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>	2026-04-22 09:24:13 +09:00
Masami Hiramatsu (Google)	0ac0058a74	tracing/fprobe: Check the same type fprobe on table as the unregistered one Commit `2c67dc457b` ("tracing: fprobe: optimization for entry only case") introduced a different ftrace_ops for entry-only fprobes. However, when unregistering an fprobe, the kernel only checks if another fprobe exists at the same address, without checking which type of fprobe it is. If different fprobes are registered at the same address, the same address will be registered in both fgraph_ops and ftrace_ops, but only one of them will be deleted when unregistering. (the one removed first will not be deleted from the ops). This results in junk entries remaining in either fgraph_ops or ftrace_ops. For example: ======= cd /sys/kernel/tracing # 'Add entry and exit events on the same place' echo 'f:event1 vfs_read' >> dynamic_events echo 'f:event2 vfs_read%return' >> dynamic_events # 'Enable both of them' echo 1 > events/fprobes/enable cat enabled_functions vfs_read (2) ->arch_ftrace_ops_list_func+0x0/0x210 # 'Disable and remove exit event' echo 0 > events/fprobes/event2/enable echo -:event2 >> dynamic_events # 'Disable and remove all events' echo 0 > events/fprobes/enable echo > dynamic_events # 'Add another event' echo 'f:event3 vfs_open%return' > dynamic_events cat dynamic_events f:fprobes/event3 vfs_open%return echo 1 > events/fprobes/enable cat enabled_functions vfs_open (1) tramp: 0xffffffffa0001000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60 subops: {ent:fprobe_fgraph_entry+0x0/0x620 ret:fprobe_return+0x0/0x150} vfs_read (1) tramp: 0xffffffffa0001000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60 subops: {ent:fprobe_fgraph_entry+0x0/0x620 ret:fprobe_return+0x0/0x150} ======= As you can see, an entry for vfs_read remains. To fix this issue, when unregistering, the kernel should also check whether fprobes of the same type still exist at the same address, and if not, delete the entry from either fgraph_ops or ftrace_ops. 
Link: https://lore.kernel.org/all/177669367993.132053.10553046138528674802.stgit@mhiramat.tok.corp.google.com/ Fixes: `2c67dc457b` ("tracing: fprobe: optimization for entry only case") Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>	2026-04-22 00:03:10 +09:00
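
The stale-entry problem described in commit 0ac0058a74 above can be modeled in user space. The sketch below is not kernel code and does not use the fprobe API: every name in it (`probe_type`, `register_probe()`, `unregister_probe()`, the `filters` array, the address `0x1000`) is a hypothetical stand-in, with one filter per probe type playing the role of ftrace_ops and fgraph_ops. The `match_type` flag switches between the old check (any probe left at the address) and the check the commit message asks for (a probe of the same type left at the address).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: each probe type is backed by its own address filter. */
enum probe_type { PROBE_ENTRY_ONLY, PROBE_WITH_EXIT, PROBE_NR_TYPES };

#define MAX_PROBES 8

struct probe {
	unsigned long addr;
	enum probe_type type;
	bool in_use;
};

struct filter {
	unsigned long addr[MAX_PROBES];
	size_t len;
};

static struct probe table[MAX_PROBES];        /* all registered probes */
static struct filter filters[PROBE_NR_TYPES]; /* one filter per type */

static bool filter_has(const struct filter *f, unsigned long addr)
{
	for (size_t i = 0; i < f->len; i++)
		if (f->addr[i] == addr)
			return true;
	return false;
}

static void filter_add(struct filter *f, unsigned long addr)
{
	if (!filter_has(f, addr) && f->len < MAX_PROBES)
		f->addr[f->len++] = addr;
}

static void filter_del(struct filter *f, unsigned long addr)
{
	for (size_t i = 0; i < f->len; i++) {
		if (f->addr[i] == addr) {
			/* Order does not matter: move the last entry down. */
			f->addr[i] = f->addr[--f->len];
			return;
		}
	}
}

static int register_probe(unsigned long addr, enum probe_type type)
{
	for (size_t i = 0; i < MAX_PROBES; i++) {
		if (table[i].in_use)
			continue;
		table[i] = (struct probe){ .addr = addr, .type = type, .in_use = true };
		filter_add(&filters[type], addr);
		return 0;
	}
	return -1; /* table full */
}

/* Is a probe still registered at addr? With match_type, only the same type counts. */
static bool probe_remains(unsigned long addr, enum probe_type type, bool match_type)
{
	for (size_t i = 0; i < MAX_PROBES; i++) {
		if (!table[i].in_use || table[i].addr != addr)
			continue;
		if (!match_type || table[i].type == type)
			return true;
	}
	return false;
}

static void unregister_probe(unsigned long addr, enum probe_type type, bool match_type)
{
	for (size_t i = 0; i < MAX_PROBES; i++) {
		if (table[i].in_use && table[i].addr == addr && table[i].type == type) {
			table[i].in_use = false;
			break;
		}
	}
	/* Drop the address from this type's filter only if nothing still needs it. */
	if (!probe_remains(addr, type, match_type))
		filter_del(&filters[type], addr);
}

/* Replays the sequence from the commit message; returns leftover filter entries. */
size_t stale_entries(bool match_type)
{
	int rc;

	memset(table, 0, sizeof(table));
	memset(filters, 0, sizeof(filters));

	rc = register_probe(0x1000, PROBE_ENTRY_ONLY);
	assert(rc == 0);
	rc = register_probe(0x1000, PROBE_WITH_EXIT);
	assert(rc == 0);
	(void)rc;

	unregister_probe(0x1000, PROBE_WITH_EXIT, match_type);
	unregister_probe(0x1000, PROBE_ENTRY_ONLY, match_type);

	return filters[PROBE_ENTRY_ONLY].len + filters[PROBE_WITH_EXIT].len;
}
```

In this model, `stale_entries(false)` leaves one address behind in the exit-type filter, which mirrors the leftover vfs_read entry in the commit's transcript, while `stale_entries(true)` leaves both filters empty.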

7193 Commits