Commit Graph

7193 Commits

Author SHA1 Message Date
Linus Torvalds 9c87e61e3c Merge tag 'bpf-next-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov:
 "Major changes:

   - Recover from BPF arena page faults using a scratch page and add
     ptep_try_set() for lockless empty-slot installs on x86 and arm64.

     This allows BPF kfuncs to access arena pointers directly.

     The 'arena_direct_access' stable branch was created for this work
     and was pulled into sched-ext and bpf-next trees (Tejun Heo, Kumar
     Kartikeya Dwivedi)

   - Lift old restriction and support 6+ arguments in BPF programs and
     kfuncs on x86 and arm64 (Yonghong Song, Puranjay Mohan)

  Other features and fixes:

   - Add 24-bit BTF vlen and reclaim unused bits in the BTF UAPI to ease
     addition of new BTF kinds (Alan Maguire)

   - Raise the maximum BPF call chain depth from 8 to 16 frames (Alexei
     Starovoitov)

   - Refactor object relationship tracking in the verifier and fix a
     dynptr use-after-free bug (Amery Hung)

   - Harden the signed program loader and reject exclusive maps as inner
     maps (Daniel Borkmann)

   - Replace the verifier min/max bounds fields with a circular number
     (cnum) representation and improve 32->64 bit range refinements
     (Eduard Zingerman)

   - Introduce the arena library and runtime (libarena) with a buddy
     allocator, rbtree and SPMC queue data structures, ASAN support and
     a parallel test harness. Allow subprograms to return arena pointers
     and switch to a BTF type-tag based __arena annotation (Emil
     Tsalapatis)

   - Cache build IDs in the sleepable stackmap path and avoid faultable
     build ID reads under mm locks (Ihor Solodrai)

   - Introduce the tracing_multi link to attach a single BPF program to
     many kernel functions at once. Allow specifying the uprobe_multi
     target via FD (Jiri Olsa)

   - Extend the bpf_list family of kfuncs with bpf_list_add/del(), and
     bpf_list_is_first/is_last/empty() (Kaitao Cheng)

   - Extend the BPF syscall with common attributes support for
     prog_load, btf_load and map_create (Leon Hwang)

   - Wrap rhashtable as BPF map (Mykyta Yatsenko, Herbert Xu)

   - Add sleepable support for tracepoint programs and fix deadlocks in
     LRU map due to NMI reentry (Mykyta Yatsenko)

   - Fix OOB access in bpf_flow_keys, fix nullness analysis of inner
     arrays, enforce write checks for global subprograms (Nuoqi Gui)

   - Report the maximum combined stack depth and print a breakdown of
     instructions processed per subprogram (Paul Chaignon)

   - Add an XDP load-balancer benchmark and arm64 JIT support for stack
     arguments (Puranjay Mohan)

   - Add kfuncs to traverse over wakeup_sources (Samuel Wu)

   - Allow sleepable BPF programs to use LPM trie maps directly (Vlad
     Poenaru)

   - Many more fixes and cleanups across the verifier, BTF, sockmap,
     devmap, bpffs, security hooks, s390/riscv/loongarch JITs,
     rqspinlock, libbpf, bpftool, selftests"

* tag 'bpf-next-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (336 commits)
  selftests/bpf: Work around llvm stack overflow in crypto progs
  selftests/bpf: add test for bpf_msg_pop_data() overflow
  bpf, sockmap: fix integer overflow in bpf_msg_pop_data() bounds check
  sockmap: Fix use-after-free in udp_bpf_recvmsg()
  bpf, sockmap: keep sk_msg copy state in sync
  bpf, sockmap: Fix wrong rsge offset in bpf_msg_push_data()
  bpf, sockmap: reject overflowing copy + len in bpf_msg_push_data()
  selftsets/bpf: Retry map update on helper_fill_hashmap()
  selftests/bpf: Add test for sleepable lsm_cgroup rejection
  selftests/bpf: Add test to verify the fix for bpf_setsockopt() helper
  bpf: Fix bpf_get/setsockopt to tos for ipv4-mapped ipv6 socket
  selftests/bpf: Avoid static LLVM linking for cross builds
  selftests/bpf: Use common CFLAGS for urandom_read
  selftests/bpf: Initialize operation name before use
  tools/bpf: build: Append extra cflags
  libbpf: Initialize CFLAGS before including Makefile.include
  bpftool: Append extra host flags
  bpftool: Avoid adding EXTRA_CFLAGS to HOST_CFLAGS
  bpftool: Pass host flags to bootstrap libbpf
  selftests/bpf: correct CONFIG_PPC64 macro name in comment
  ...
2026-06-17 09:18:14 +01:00
Linus Torvalds c071a4fbb0 Merge tag 'trace-latency-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing latency updates from Steven Rostedt:

 - Dump the stack to the buffer on timerlat uret threashold event

   Record the stack trace in the buffer for THREAD_URET as well as
   THREAD_CONTEXT when the threshold is hit. Otherwise, if the threshold
   was not hit at task wakeup, but was at task return, it will not
   produce a stack trace making it harder to debug.

 - Have osnoise trace prints print to all buffers

   The osnoise tracer is allowed to print to the main buffer. Add a
   osnoise_print() helper function and use trace_array_vprintk() to
   print osnoise output.

* tag 'trace-latency-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/osnoise: Array printk init and cleanup
  tracing/osnoise: Dump stack on timerlat uret threshold event
2026-06-16 17:38:19 +05:30
Linus Torvalds 18ecdd4d0a Merge tag 'probes-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes updates from Masami Hiramatsu:

 - BTF support for dereferencing pointers

   Add syntax to the parsing of eprobes to typecast structure pointer
   trace event fields, enabling BTF-based dereferencing instead of
   relying on manual offsets.

 - Improvements and robustness enhancements

    - Use flexible array for entry fetch code.

      Store probe entry fetch instructions in the probe_entry_arg
      allocation via a flexible array member to simplify memory
      allocation and lifetime management.

    - Replace BUG_ON with lockdep_assert_held in uprobe_buffer functions

      Replace BUG_ON() calls with lockdep_assert_held() in uprobe buffer
      enable/disable paths to prevent kernel crashes and better verify
      lock ownership.

    - Ensure the uprobe buffer size is bigger than event size.

      Add a BUILD_BUG_ON() assertion to guarantee that the per-CPU
      uprobe working buffer size is always larger than the maximum probe
      event size.

* tag 'probes-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/eprobes: Allow use of BTF names to dereference pointers
  tracing: Replace BUG_ON with lockdep_assert_held in uprobe_buffer functions
  tracing: Use flexible array for entry fetch code
  tracing/probes: Ensure the uprobe buffer size is bigger than event size
2026-06-16 17:33:20 +05:30
Linus Torvalds 2cbf335f8c Merge tag 'sched-core-2026-06-14' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "SMP load-balancing updates:

   - A large series to introduce infrastructure for cache-aware load
     balancing, with the goal of co-locating tasks that share data
     within the same Last Level Cache (LLC) domain. By improving cache
     locality, the scheduler can reduce cache bouncing and cache misses,
     ultimately improving data access efficiency.

     Implemented by Chen Yu and Tim Chen, based on early prototype work
     by Peter Zijlstra, with fixes by Jianyong Wu, Peter Zijlstra and
     Shrikanth Hegde.

   - A series to simplify CONFIG_SCHED_SMT ifdef usage (Shrikanth Hegde)

  Fair scheduler updates:

   - A series to improve SD_ASYM_CPUCAPACITY scheduling by introducing
     SMT awareness (Andrea Righi, K Prateek Nayak)

   - A series to optimize cfs_rq and sched_entity allocation for better
     data locality (Zecheng Li)

   - A preparatory series to change fair/cgroup scheduling to a single
     runqueue, without the final change (Peter Zijlstra)

   - Auto-manage ext/fair dl_server bandwidth (Andrea Righi)

   - Fix cpu_util runnable_avg arithmetic (Hongyan Xia)

   - Optimize update_tg_load_avg()'s rate-limiting code (Rik van Riel)

   - Allow account_cfs_rq_runtime() to throttle current hierarchy
     (K Prateek Nayak)

   - Update util_est after updating util_avg during dequeue, to fix the
     util signal update logic, which reduces signal noise (Vincent
     Guittot)

  Scheduler topology updates:

   - Allow multiple domains to claim sched_domain_shared (K Prateek
     Nayak)

   - Add parameter to split LLC (Peter Zijlstra)

  Core scheduler updates:

   - Use trace_call__<tp>() to save a static branch (Gabriele Monaco)

  Scheduler statistics updates:

   - Drop now-stale mul_u64_u64_div_u64() cputime over-approximation
     guard (Nicolas Pitre)

  Deadline scheduler updates:

   - Reject debugfs dl_server writes for offline CPUs (Andrea Righi)

   - Fix replenishment logic for non-deferred servers (Yuri Andriaccio)

  RT scheduling updates:

   - Turn RT_PUSH_IPI default off for non PREEMPT_RT (Steven Rostedt)

   - Update default bandwidth for real-time tasks to 1.0 (Yuri
     Andriaccio)

  Proxy scheduling updates:

   - A series to implement Optimized Donor Migration for Proxy Execution
     (John Stultz, Peter Zijlstra)

   - Various proxy scheduling cleanups and fixes (Peter Zijlstra,
     K Prateek Nayak)

  Misc fixes, improvements and cleanups by Aaron Lu, Andrea Righi,
  Zenghui Yu, Chen Yu, Guanyou.Chen, John Stultz, Shrikanth Hegde,
  Peter Zijlstra, Liang Luo and Yiyang Chen"

* tag 'sched-core-2026-06-14' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip: (91 commits)
  sched/fair: Fix newidle vs core-sched
  sched/deadline: Use task_on_rq_migrating() helper
  sched/core: Combine separate 'else' and 'if' statements
  sched/fair: Fix cpu_util runnable_avg arithmetic
  sched/fair: Unify cfs_rq throttling via account_cfs_rq_runtime()
  sched/fair: Move the throttled tasks to a local list in tg_unthrottle_up()
  sched/fair: Call update_curr() before unthrottling the hierarchy
  sched/fair: Use throttled_csd_list for local unthrottle
  sched/fair: Convert cfs bandwidth throttling to use guards
  sched/fair: Allocate cfs_tg_state with percpu allocator
  sched/fair: Remove task_group->se pointer array
  sched/fair: Co-locate cfs_rq and sched_entity in cfs_tg_state
  sched: restore timer_slack_ns when resetting RT policy on fork
  MAINTAINERS: Fix spelling mistake in Peter's name
  sched: Simplify ttwu_runnable()
  sched/proxy: Remove superfluous clear_task_blocked_in()
  sched/proxy: Remove PROXY_WAKING
  sched/proxy: Switch proxy to use p->is_blocked
  sched/proxy: Only return migrate when needed
  sched: Be more strict about p->is_blocked
  ...
2026-06-15 14:50:18 +05:30
Jiri Olsa 26330a9226 bpf: Add support to specify uprobe_multi target via file descriptor
Allow uprobe_multi link to identify the target binary by an already
opened file descriptor.

Adding new BPF_F_UPROBE_MULTI_PATH_FD flag and the path_fd field for
the attr.link_create.uprobe_multi struct.

When the flag is set, we resolve the target from path_fd, without the
flag, we keep the existing string path behavior.

I don't see a use case for supporting O_PATH file descriptors, because
we need to read the binary first to get probes offsets, so I'm using
the CLASS(fd, f), which fails for O_PATH fds.

Assisted-by: Codex:GPT-5.4
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260611114230.950379-4-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-14 17:24:25 -07:00
Jiri Olsa 65d81609e9 bpf: Use user_path_at for path resolution in uprobe_multi
Resolve the uprobe_multi user path with user_path_at() instead of copying
the string with strndup_user() and passing it to kern_path(). This removes
the temporary allocation and keeps the lookup logic in one helper.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260611114230.950379-3-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-14 17:24:25 -07:00
Jiri Olsa 4d87a251d4 bpf: Guard __get_user acesss with access_ok for uprobe_multi data
As reported by sashiko [1] we need to use access_ok to check the user
space data bounds before we use __get-user to get it.

[1] https://lore.kernel.org/bpf/20260610145235.CB1441F00893@smtp.kernel.org/
Fixes: 0b779b61f6 ("bpf: Add cookies support for uprobe_multi link")
Fixes: 89ae89f53d ("bpf: Add multi uprobe link")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260611114230.950379-2-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-14 17:24:25 -07:00
Linus Torvalds acb7500801 Merge tag 'trace-rv-v7.1-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull runtime verifier fixes from Steven Rostedt:

 - Fix reset ordering on per-task destruction

   Reset the task before dropping the slot instead of after, which was
   causing out-of-bound memory accesses.

 - Fix HA monitor synchronization and cleanup

   Ensure synchronous cleanup for HA monitors by running timer callbacks
   in RCU read-side critical sections and using synchronize_rcu() during
   destruction.

 - Avoid armed timers after tasks exit

   Add automatic cleanup for per-task HA monitors to prevent timers from
   firing after task exit.

 - Fix memory ordering for DA/HA monitors

   Fix race conditions during monitor start by using release-acquire
   semantics for the monitoring flag.

 - Fix initialization for DA/HA monitors

   Ensure monitors are not initialized relying on potentially corrupted
   state like the monitoring flag, that is not reset by all monitors
   type and may have an unknown state in monitors reusing the storage
   (per-task).

 - Fix memory safety in per-task and per-object monitors

   Prevent use-after-free and out-of-bounds access by synchronizing with
   in-flight tracepoint probes using tracepoint_synchronize_unregister()
   before freeing monitor storage or releasing task slots.

 - Adjust monitors for preemptible tracepoints

   Fix monitors that relied on tracepoints disabling preemption.
   Explicitly disable task migration when per-CPU monitors handle events
   to avoid accessing the wrong state and update the opid monitor logic.

 - Fix incorrect __user specifier usage

   Remove __user from a non-pointer variable in the extract_params()
   helper.

 - Fix bugs in the rv tool

   Ensure strings are NUL-terminated, fix substring matching in monitor
   searches, and improve cleanup and exit status handling.

 - Fix several bugs in rvgen

   Fix LTL literal stringification, subparsers' options handling, and
   suffix stripping in dot2k.

* tag 'trace-rv-v7.1-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  verification/rvgen: Fix ltl2k writing True as a literal
  verification/rvgen: Fix options shared among commands
  verification/rvgen: Fix suffix strip in dot2k
  tools/rv: Fix cleanup after failed trace setup
  tools/rv: Fix substring match when listing container monitors
  tools/rv: Fix substring match bug in monitor name search
  tools/rv: Ensure monitor name and desc are NUL-terminated
  rv: Use 0 to check preemption enabled in opid
  rv: Prevent task migration while handling per-CPU events
  rv: Ensure synchronous cleanup for HA monitors
  rv: Add automatic cleanup handlers for per-task HA monitors
  rv: Do not rely on clean monitor when initialising HA
  rv: Fix monitor start ordering and memory ordering for monitoring flag
  rv: Ensure all pending probes terminate on per-obj monitor destroy
  rv: Prevent in-flight per-task handlers from using invalid slots
  rv: Reset per-task DA monitors before releasing the slot
  rv: Fix __user specifier usage in extract_params()
2026-06-09 17:20:00 -07:00
Jiri Olsa 8abecdafd5 bpf: Add support for tracing_multi link fdinfo
Adding tracing_multi link fdinfo support with following output:

pos:    0
flags:  02000000
mnt_id: 19
ino:    3087
link_type:      tracing_multi
link_id:        9
prog_tag:       599ba0e317244f86
prog_id:        94
attach_type:    59
cnt:    10
obj-id   btf-id  cookie  func
1        91593   8       bpf_fentry_test1+0x4/0x10
1        91595   9       bpf_fentry_test2+0x4/0x10
1        91596   7       bpf_fentry_test3+0x4/0x20
1        91597   5       bpf_fentry_test4+0x4/0x20
1        91598   4       bpf_fentry_test5+0x4/0x20
1        91599   2       bpf_fentry_test6+0x4/0x20
1        91600   3       bpf_fentry_test7+0x4/0x10
1        91601   1       bpf_fentry_test8+0x4/0x10
1        91602   10      bpf_fentry_test9+0x4/0x10
1        91594   6       bpf_fentry_test10+0x4/0x10

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-17-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-07 10:03:02 -07:00
Jiri Olsa ba042ed644 bpf: Add support for tracing_multi link session
Adding support to use session attachment with tracing_multi link.

Adding new BPF_TRACE_FSESSION_MULTI program attach type, that follows
the BPF_TRACE_FSESSION behaviour but on the tracing_multi link.

Such program is called on entry and exit of the attached function
and allows to pass cookie value from entry to exit execution.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-16-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-07 10:03:01 -07:00
Jiri Olsa 46b42af27d bpf: Add support for tracing_multi link cookies
Add support to specify cookies for tracing_multi link.

Cookies are provided in array where each value is paired with provided
BTF ID value with the same array index.

Such cookie can be retrieved by bpf program with bpf_get_attach_cookie
helper call.

We need to sort cookies array together with ids array in check_dup_ids,
to keep the id->cookie relation.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-15-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-07 10:03:01 -07:00
Jiri Olsa c1d32dea5d bpf: Add support for tracing multi link
Adding new link to allow to attach program to multiple function
BTF IDs. The link is represented by struct bpf_tracing_multi_link.

To configure the link, new fields are added to bpf_attr::link_create
to pass array of BTF IDs;

  struct {
    __aligned_u64 ids;
    __u32         cnt;
  } tracing_multi;

Each BTF ID represents function (BTF_KIND_FUNC) that the link will
attach bpf program to.

We use previously added bpf_trampoline_multi_attach/detach functions
to attach/detach the link.

The linkinfo/fdinfo callbacks will be implemented in following changes.

Note this is supported only for archs (x86_64) with ftrace direct and
have single ops support.

  CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS &&
  CONFIG_HAVE_SINGLE_FTRACE_DIRECT_OPS

Note using sort_r (instead of plain sort) in check_dup_ids, because we
will use the swap callback in following changes.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-14-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-07 10:03:01 -07:00
Jiri Olsa 2cd298c106 ftrace: Add add_ftrace_hash_entry function
Renaming __add_hash_entry to add_ftrace_hash_entry and making it global,
it will be used in following changes outside ftrace.c object.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-4-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-07 10:03:00 -07:00
Jiri Olsa af7c323650 ftrace: Add ftrace_hash_remove function
Adding ftrace_hash_remove function that removes all entries
from struct ftrace_hash object without freeing them.

It will be used in following changes where entries are allocated
as part of another structure and are free-ed separately.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-3-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-07 10:03:00 -07:00
Jiri Olsa e57f13eaab ftrace: Add ftrace_hash_count function
Adding external ftrace_hash_count function so we could get hash
count outside of ftrace object.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-2-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-07 10:03:00 -07:00
Gabriele Monaco d9022172c1 rv: Use 0 to check preemption enabled in opid
Tracepoint handlers no longer run with preemption disabled by default
since a46023d561 ("tracing: Guard __DECLARE_TRACE() use of
__DO_TRACE_CALL() with SRCU-fast"), the opid monitor should now count 1
in the preemption count as preemption disabled.

Change the rule for preempt_off to preempt > 0.

Reviewed-by: Nam Cao <namcao@linutronix.de>
Link: https://lore.kernel.org/r/20260601153840.124372-11-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
2026-06-03 12:33:25 +02:00
Gabriele Monaco 700782ec8f rv: Add automatic cleanup handlers for per-task HA monitors
Hybrid automata monitors may start timers, depending on the model, these
may remain active on an exiting task and cause false positives or even
access freed memory.

Add an enable/disable hook in the HA code, currently only populated by
the per-task handler for registration and deregistration.
This hooks to the sched_process_exit event and ensures the timer is
stopped for every exiting task. The handler is enabled automatically but
may be disabled, for instance if the monitor uses the event for another
purpose (but should still manually ensure timers are stopped).

Fixes: f5587d1b6e ("rv: Add Hybrid Automata monitor type")
Reviewed-by: Nam Cao <namcao@linutronix.de>
Link: https://lore.kernel.org/r/20260601153840.124372-8-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
2026-06-03 12:33:24 +02:00
Gabriele Monaco 4793e8a6e2 rv: Fix __user specifier usage in extract_params()
The attributes variables extracted from syscalls in the helper are both
defined with the __user specifier although only the actual pointer to
user data should be marked.

Remove the __user specifier from attr.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202604150820.Ny143u6X-lkp@intel.com
Fixes: b133207deb ("rv: Add nomiss deadline monitor")
Reviewed-by: Wen Yang <wen.yang@linux.dev>
Reviewed-by: Nam Cao <namcao@linutronix.de>
Link: https://lore.kernel.org/r/20260601153840.124372-2-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
2026-06-03 12:33:23 +02:00
Steven Rostedt 69efd863a7 tracing/eprobes: Allow use of BTF names to dereference pointers
Add syntax to the parsing of eprobes to be able to typecast a trace event
field that is a pointer to a structure.

Currently, a dereference must be a number, where the user has to figure
out manually the offset of a member of a structure that they want to
dereference.

But for event probes that records a field that happens to be a pointer to
a structure, it cannot dereference these values with BTF naming, but
must use numerical offsets.

For example, to find out what device a sk_buff is pointing to in the
net_dev_xmit trace event, one must first use gdb to find the offsets of the
members of the structures:

 (gdb) p &((struct sk_buff *)0)->dev
 $1 = (struct net_device **) 0x10
 (gdb) p &((struct net_device *)0)->name
 $2 = (char (*)[16]) 0x118

And then use the raw numbers to dereference:

  # echo 'e:xmit net.net_dev_xmit +0x118(+0x10($skbaddr)):string' >> dynamic_events

If BTF is in the kernel, then instead, the skbaddr can be typecast to
sk_buff and use the normal dereference logic.

  # echo 'e:xmit net.net_dev_xmit (sk_buff)skbaddr->dev->name:string' >> dynamic_events
  # echo 1 > events/eprobes/xmit/enable
  # cat trace
[..]
    sshd-session-1022    [000] b..2.   860.249343: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.250061: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.250142: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.263553: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.283820: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.302716: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.322905: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.342828: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.362268: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.382335: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.400856: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.419893: xmit: (net.net_dev_xmit) arg1="enp7s0"

The syntax is simply: (STRUCT)(FIELD)->MEMBER[->MEMBER..]

Also add comments around the #else and #endif of #ifdef CONFIG_PROBE_EVENTS_BTF_ARGS
to know what they are for.

Link: https://lore.kernel.org/all/20260601130746.2139d926@gandalf.local.home/

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2026-06-02 23:36:22 +09:00
Yash Suthar 585abc02be tracing: Replace BUG_ON with lockdep_assert_held in uprobe_buffer functions
Replace BUG_ON(!mutex_is_locked(&event_mutex)) with
lockdep_assert_held(&event_mutex) in uprobe_buffer_enable() and
uprobe_buffer_disable().

BUG_ON() will crash the kernel. mutex_is_locked() only checks
if any task holds lock,but not the caller task. lockdep_assert_held()
also check current task for lock and no crash on true condition.

Link: https://lore.kernel.org/all/20260521192846.8306-1-yashsuthar983@gmail.com/

Signed-off-by: Yash Suthar <yashsuthar983@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2026-06-01 23:35:21 +09:00
Rosen Penev cf24cbb4e5 tracing: Use flexible array for entry fetch code
Store probe entry fetch instructions in the probe_entry_arg
allocation instead of allocating a separate instruction array.

This keeps the entry fetch code tied to the entry argument lifetime while
leaving regular probe_arg instruction arrays separately allocated and
freed.

Assisted-by: Codex:GPT-5.5
Link: https://lore.kernel.org/all/20260520215817.16560-1-rosenp@gmail.com/

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2026-06-01 23:35:21 +09:00
Masami Hiramatsu (Google) 4304c81652 tracing/probes: Ensure the uprobe buffer size is bigger than event size
Add BUILD_BUG_ON() to ensure the uprobe per-CPU working buffer
size is bigger than the event size.

Link: https://lore.kernel.org/all/177849383209.8038.1902170479780501237.stgit@devnote2/

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2026-06-01 23:35:21 +09:00
Masami Hiramatsu (Google) 85e0f27dd1 tracing/probes: Point the error offset correctly for eprobe argument error
Fix to point the error offset correctly for eprobe argument error.
In the cleanup commit 1b8b0cd754 ("tracing/probes: Move event parameter
fetching code to common parser"), due to incorrect backward compatibility
aimed at conforming to the test specifications, the error location was set
to 0 when a non-existent formal parameter was specified for Eprobe.
However, this should be corrected in both the test and the implementation
to point correct error position.

Link: https://lore.kernel.org/all/177967567399.209006.1451571244515632097.stgit@devnote2/

Fixes: 1b8b0cd754 ("tracing/probes: Move event parameter fetching code to common parser")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
2026-05-30 22:45:50 +09:00
Crystal Wood 9cb99c5986 tracing/osnoise: Array printk init and cleanup
None of the calls to trace_array_printk_buf() will do anything
if we don't initialize the buffer on instance creation (unless
some other tracer called it), so do that.

Add an osnoise_print() function to facilitate adding debug prints
(without tainting).

Use trace_array_printk() instead of trace_array_printk_buf(), as we're
only writing to the main buffer (of a non-main instance) anyway -- and
trace_array_printk_buf() skips the check to make sure we're not printing
to the global instance.

Link: https://patch.msgid.link/20260511223035.1475676-1-crwood@redhat.com
Signed-off-by: Crystal Wood <crwood@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-05-29 10:40:27 -04:00
Alexei Starovoitov eb19eead36 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf 7.1-rc5
Cross-merge BPF and other fixes after downstream PR.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-05-25 06:33:30 -07:00
Linus Torvalds 23884007af Merge tag 'trace-v7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:

 - Avoid NULL return from hist_field_name()

   The function hist_field_name() is directly passed to a strcat() which
   does not handle "NULL" characters. Return a zero length string when
   size is greater than the limit.

   This is used only to output already created histograms and no field
   currently is greater than the limit. But it should still not return
   NULL.

 - Do not call map->ops->elt_free() on allocation failure

   When elt_alloc() fails, it should not call the map->ops->elt_free()
   function if it exists, as that function may not be able to handle the
   free on allocation failures. The ->elt_free() should only be called
   when elt_alloc() succeeds.

* tag 'trace-v7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Do not call map->ops->elt_free() if elt_alloc() fails
  tracing: Avoid NULL return from hist_field_name() on truncation
2026-05-22 06:09:58 -07:00
Masami Hiramatsu (Google) 8f0f5c4fb9 tracing: Do not call map->ops->elt_free() if elt_alloc() fails
In paths where tracing_map_elt_alloc() failed to allocate objects,
the map->ops->elt_alloc() call was never successful. In this case,
map->ops->elt_free() should not be called.

Link: https://sashiko.dev/#/patchset/20260520223101.34710-1-rosenp%40gmail.com

Cc: stable@vger.kernel.org
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Rosen Penev <rosenp@gmail.com>
Reported-by: Sashiko <sashiko-bot@kernel.org>
Fixes: 2734b62952 ("tracing: Add per-element variable support to tracing_map")
Link: https://patch.msgid.link/177933895460.108746.5396070821443932634.stgit@devnote2
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-05-21 11:29:03 -04:00
Thomas Weißschuh 057caace52 tracing: Create output file from cmd_check_undefined
As the output file is currently never created, the check will run every
time, even if the inputs have not changed.

Create an empty output file which allows make to skip the execution when
it is not necessary.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20260520-tracing-ringbuffer-check-v1-1-d979cfab1338@weissschuh.net
Fixes: 1211907ac0 ("tracing: Generate undef symbols allowlist for simple_ring_buffer")
Fixes: 58b4bd1839 ("tracing: Adjust cmd_check_undefined to show unexpected undefined symbols")
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-05-21 08:31:55 -04:00
Vincent Donnefort a0a2f42a37 tracing: Fix unload_page for simple_ring_buffer init rollback
The unload_page callback expects the return value of load_page() as its
argument: ret = load_page(va); unload(ret). Fix the rollback code in
simple_ring_buffer_init_mm() where the descriptor's VA is used instead
of the loaded page address.

Link: https://patch.msgid.link/20260512141614.1759430-1-vdonnefort@google.com
Fixes: 635923081c ("tracing: load/unload page callbacks for simple_ring_buffer")
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-05-21 08:26:22 -04:00
David Carlier c2d2856cf6 tracing: Fix nr_subbufs initialization in simple_ring_buffer_init_mm()
nr_subbufs in the ring buffer metadata is always initialized to zero
because it is assigned from cpu_buffer->nr_pages before the page
initialization loop has run. While nr_subbufs is not currently read
by the kernel, it should reflect the actual buffer geometry in the
meta page for correctness.

Move the assignment after the page loop so that cpu_buffer->nr_pages
holds the final count.

Link: https://patch.msgid.link/20260512135420.99194-1-devnexen@gmail.com
Fixes: 34e5b958bd ("tracing: Introduce simple_ring_buffer")
Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-05-21 08:24:59 -04:00
Masami Hiramatsu (Google) a494d3c8d5 ring-buffer: Flush and stop persistent ring buffer on panic
On real hardware, panic and machine reboot may not flush hardware cache
to memory. This means the persistent ring buffer, which relies on a
coherent state of memory, may not have its events written to the buffer
and they may be lost. Moreover, there may be inconsistency with the
counters which are used for validation of the integrity of the
persistent ring buffer which may cause all data to be discarded.

To avoid this issue, stop recording of the ring buffer on panic and
flush the cache of the ring buffer's memory.

Fixes: e645535a95 ("tracing: Add option to use memmapped memory for trace boot instance")
Cc: stable@vger.kernel.org
Cc: Will Deacon <will@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Ian Rogers <irogers@google.com>
Link: https://patch.msgid.link/177751969602.2136606.12031934362587643488.stgit@mhiramat.tok.corp.google.com
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-05-21 08:20:58 -04:00
Steven Rostedt a254b6d13b ring-buffer: Fix reporting of missed events in iterator
When tracing is active while reading the trace file, if the iterator
reading the buffer detects that the writer has passed the iterator head,
it will reset and set a "missed events" flag. This flag is passed to the
output processing to show the user that events were missed:

  CPU:4 [LOST EVENTS]

The problem is that the flag is reset after it is checked in
ring_buffer_iter_dropped(). But the "trace" file iterates over all the CPU
ring buffers and it will check if they are dropped when figuring out which
buffer to print next. This prematurely clears the missed_events flag if
the CPU buffer with the missed events is not the one that is printed next.

On the iteration where the CPU buffer with the missed events is printed,
the check if it had missed events would return false and the output does
not show that events were missed.

Do not reset the missed_events flag when checking if there were missed
events, but instead clear it when moving the iterator head to the next
event.

Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20260520220801.4fd09d13@fedora
Fixes: c9b7a4a72f ("ring-buffer/tracing: Have iterator acknowledge dropped events")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-05-21 08:20:29 -04:00
Crystal Wood e11c9c8365 tracing/osnoise: Dump stack on timerlat uret threshold event
Dump the saved IRQ stack trace regardless of whether the event was
THREAD_CONTEXT or THREAD_URET.

In the uret case, the latency presumably had not yet crossed the
threshold at IRQ time (or else it would have dumped the stack at thread
wakeup time, unless we're racing with a change to the threshold), but it
may have at least contributed -- and this is possible with THREAD_CONTEXT
as well.

In any case, it helps with writing reliable rtla tests if we always get
a stack trace on a threshold event.

Cc: John Kacur <jkacur@redhat.com>
Cc: Tomas Glozar <tglozar@redhat.com>
Cc: Costa Shulyupin <costa.shul@redhat.com>
Cc: Wander Lairson Costa <wander@redhat.com>
Link: https://patch.msgid.link/20260511223143.1477332-1-crwood@redhat.com
Signed-off-by: Crystal Wood <crwood@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-05-20 16:29:25 -04:00
David Carlier 576ec047d2 tracing: Avoid NULL return from hist_field_name() on truncation
hist_field_name() returns "" everywhere except the fully-qualified
VAR_REF/EXPR case, where snprintf() truncation returns NULL early
and bypasses the bottom NULL->"" guard. Callers don't expect NULL:
strcat(expr, hist_field_name(field, 0)) at trace_events_hist.c:1758
and the strcmp() in the sort-key match loop at :4804 both deref it.

system and event_name are bounded by MAX_EVENT_NAME_LEN, but the
field name on a VAR_REF is kstrdup'd from a histogram variable
name parsed out of the trigger string and has no length cap, so
a long enough var name in a fully qualified reference can reach
the truncation path.

Keep the length check but leave field_name as "" on overflow.

Link: https://patch.msgid.link/20260508195747.25492-1-devnexen@gmail.com
Fixes: 5ec1d1e97d ("tracing: Rebuild full_name on each hist_field_name() call")
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-05-20 16:10:56 -04:00
Yiyang Chen ea19506013 sched/clock: Provide !HAVE_UNSTABLE_SCHED_CLOCK stub for sched_clock_stable()
When CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is disabled, sched_clock() is
already assumed to provide stable semantics, but the public header
doesn't provide a sched_clock_stable() stub for that case.

Add a header stub that always returns true and clean up the duplicate
local stub in ring_buffer.c, so callers can use sched_clock_stable()
unconditionally.

Signed-off-by: Yiyang Chen <cyyzero16@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Link: https://patch.msgid.link/56e45338858946cd9581b75c8bd45dd37dba52c5.1778773587.git.cyyzero16@gmail.com
2026-05-19 12:17:35 +02:00
Linus Torvalds e5d505e366 Merge tag 'trace-v7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:

 - Add more functions to the remote allowed list

   randconfig found more functions that are allowed for the remote code
   for s390 and arm. Add them to the allowed list.

 - Fix remote_test error path

   If one of the simple ring buffers fails to load, the code is supposed
   to rollback its initialized buffers. Instead of rolling back the
   buffers for the failed load, it uses the global variable and rolls
   back all the successfully loaded buffers.

* tag 'trace-v7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Fix desc in error path for the trace remote test module
  ring-buffer remote: Avoid unexpected symbol warnings (arm, s390)
2026-05-17 12:02:31 -07:00
Vincent Donnefort 55a0005518 tracing: Fix desc in error path for the trace remote test module
During initialisation in remote_test_load(), if one of the
simple_ring_buffer fails to initialise, the error path attempts to
rollback initialised buffers. However, the rollback incorrectly uses the
global pointer to the trace descriptor, which is only set upon
successful load completion. Fix the error path by using the local
pointer to the descriptor.

Link: https://patch.msgid.link/20260515201616.337469-1-vdonnefort@google.com
Fixes: ea908a2b79 ("tracing: Add a trace remote module for testing")
Reported-by: Sashiko <sashiko-bot@kernel.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>

base-commit: 5d6919055d
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-05-16 16:11:04 -04:00
Arnd Bergmann 96350db80e ring-buffer remote: Avoid unexpected symbol warnings (arm, s390)
The now more verbose check found more architecture specific symbol
missing from the whitelist, during randconfig testing on s390
and 32-bit arm:

Unexpected symbols in kernel/trace/simple_ring_buffer.o:
         U __aeabi_unwind_cpp_pr1

Unexpected symbols in kernel/trace/simple_ring_buffer.o:
                 U __s390_indirect_jump_r1
                 U __s390_indirect_jump_r10
                 U __s390_indirect_jump_r14
                 U __s390_indirect_jump_r2
                 U __s390_indirect_jump_r5
                 U __s390_indirect_jump_r7
                 U __s390_indirect_jump_r8
                 U __s390_indirect_jump_r9
make[6]: *** [/home/arnd/arm-soc/kernel/trace/Makefile:160: kernel/trace/simple_ring_buffer.o.checked] Error 1

Add these to the list and keep it roughly sorted into sanitizer
and architecture symbols.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://patch.msgid.link/20260515105717.1023007-1-arnd@kernel.org
Fixes: 1211907ac0 ("tracing: Generate undef symbols allowlist for simple_ring_buffer")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-05-15 14:59:30 -04:00
Masami Hiramatsu (Google) 657b594b20 fprobe: Fix unregister_fprobe() to wait for RCU grace period
Commit 4346ba1604 ("fprobe: Rewrite fprobe on function-graph tracer")
changed fprobe to register struct fprobe to an rcu-hlist, but it forgot
to wait for RCU GP. Thus there can be use-after-free if the fprobe is
released right after unregistering. This can be happened on fprobe
event and sample module code.

To fix this issue, add synchronize_rcu() in unregister_fprobe().

Note that BPF is OK because fprobe is used as a part of
bpf_kprobe_multi_link. This unregisters its fprobe in
bpf_kprobe_multi_link_release() and it is deallocated via
bpf_kprobe_multi_link_dealloc(), which is invoked from
bpf_link_defer_dealloc_rcu_gp() RCU callback.

For BPF, this also introduced unregister_fprobe_async() which does
NOT wait for RCU grace priod.

Link: https://lore.kernel.org/all/177813998919.256460.2809243930741138224.stgit@mhiramat.tok.corp.google.com/

Fixes: 4346ba1604 ("fprobe: Rewrite fprobe on function-graph tracer")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2026-05-11 19:04:46 +09:00
Alexei Starovoitov 7e033543a2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf 7.1-rc3
Cross-merge BPF and other fixes after downstream PR.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-05-10 13:24:49 -07:00
Steven Rostedt b2aa3b4d64 tracing/probes: Limit size of event probe to 3K
There currently isn't a max limit an event probe can be. One could make an
event greater than PAGE_SIZE, which makes the event useless because if
it's bigger than the max event that can be recorded into the ring buffer,
then it will never be recorded.

A event probe should never need to be greater than 3K, so make that the
max size. As long as the max is less than the max that can be recorded
onto the ring buffer, it should be fine.

Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Fixes: 93ccae7a22 ("tracing/kprobes: Support basic types on dynamic events")
Link: https://patch.msgid.link/20260428122302.706610ba@gandalf.local.home
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-04-29 16:07:38 -04:00
Breno Leitao 3b75dd76e6 tracing: branch: Fix inverted check on stat tracer registration
init_annotated_branch_stats() and all_annotated_branch_stats() check the
return value of register_stat_tracer() with "if (!ret)", but
register_stat_tracer() returns 0 on success and a negative errno on
failure. The inverted check causes the warning to be printed on every
successful registration, e.g.:

  Warning: could not register annotated branches stats

while leaving real failures silent. The initcall also returned a
hard-coded 1 instead of the actual error.

Invert the check and propagate ret so that the warning fires on real
errors and the initcall reports the correct status.

Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: https://patch.msgid.link/20260420-tracing-v1-1-d8f4cd0d6af1@debian.org
Fixes: 002bb86d8d ("tracing/ftrace: separate events tracing and stats tracing engine")
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-04-28 14:28:29 -04:00
Linus Torvalds 27d128c1cf Merge tag 'trace-ring-buffer-v7.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ring-buffer fix from Steven Rostedt:

 - Fix accounting of persistent ring buffer rewind

   On boot up, the head page is moved back to the earliest point of the
   saved ring buffer. This is because the ring buffer being read by user
   space on a crash may not save the part it read. Rewinding the head
   page back to the earliest saved position helps keep those events from
   being lost.

   The number of events is also read during boot up and displayed in the
   stats file in the tracefs directory. It's also used for other
   accounting as well. On boot up, the "reader page" is accounted for
   but a rewind may put it back into the buffer and then the reader page
   may be accounted for again.

   Save off the original reader page and skip accounting it when
   scanning the pages in the ring buffer.

* tag 'trace-ring-buffer-v7.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ring-buffer: Do not double count the reader_page
2026-04-24 15:17:23 -07:00
Masami Hiramatsu (Google) 92d5a60672 ring-buffer: Do not double count the reader_page
Since the cpu_buffer->reader_page is updated if there are unwound
pages. After that update, we should skip the page if it is the
original reader_page, because the original reader_page is already
checked.

Cc: stable@vger.kernel.org
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Ian Rogers <irogers@google.com>
Link: https://patch.msgid.link/177701353063.2223789.1471163147644103306.stgit@mhiramat.tok.corp.google.com
Fixes: ca296d32ec ("tracing: ring_buffer: Rewind persistent ring buffer on reboot")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-04-24 15:34:39 -04:00
Linus Torvalds 1e18ed5727 Merge tag 'trace-ring-buffer-v7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ring-buffer fix from Steven Rostedt:

 - Make undefsyms_base.c into a real file

   The file undefsyms_base.c is used to catch any symbols used by a
   remote ring buffer that is made for use of a pKVM hypervisor. As it
   doesn't share the same text as the rest of the kernel, referencing
   any symbols within the kernel will make it fail to be built for the
   standalone hypervisor.

   A file was created by the Makefile that checked for any symbols that
   could cause issues. There's no reason to have this file created by
   the Makefile, just create it as a normal file instead.

* tag 'trace-ring-buffer-v7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Make undefsyms_base.c a first-class citizen
2026-04-22 14:47:52 -07:00
Mykyta Yatsenko 57918341dd bpf: Add sleepable support for classic tracepoint programs
Add trace_call_bpf_faultable(), a variant of trace_call_bpf() for
faultable tracepoints that supports sleepable BPF programs. It uses
rcu_tasks_trace for lifetime protection and
bpf_prog_run_array_sleepable() for per-program RCU flavor selection,
following the uprobe_prog_run() pattern.

Restructure perf_syscall_enter() and perf_syscall_exit() to run BPF
programs before perf event processing. Previously, BPF ran after the
per-cpu perf trace buffer was allocated under preempt_disable,
requiring cleanup via perf_swevent_put_recursion_context() on filter.
Now BPF runs in faultable context before preempt_disable, reading
syscall arguments from local variables instead of the per-cpu trace
record, removing the dependency on buffer allocation. This allows
sleepable BPF programs to execute and avoids unnecessary buffer
allocation when BPF filters the event. The perf event submission
path (buffer allocation, fill, submit) remains under preempt_disable
as before. Since BPF no longer runs within the buffer allocation
context, the fake_regs output parameter to perf_trace_buf_alloc()
is no longer needed and is replaced with NULL.

Add an attach-time check in __perf_event_set_bpf_prog() to reject
sleepable BPF_PROG_TYPE_TRACEPOINT programs on non-syscall
tracepoints, since only syscall tracepoints run in faultable context.

This prepares the classic tracepoint runtime and attach paths for
sleepable programs. The verifier changes to allow loading sleepable
BPF_PROG_TYPE_TRACEPOINT programs are in a subsequent patch.

To: Peter Zijlstra <peterz@infradead.org>
To: Steven Rostedt <rostedt@goodmis.org>

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> # for BPF bits
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Link: https://lore.kernel.org/bpf/20260422-sleepable_tracepoints-v13-3-99005dff21ef@meta.com
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
2026-04-22 22:44:29 +02:00
Mykyta Yatsenko 439ebd5b57 bpf: Add sleepable support for raw tracepoint programs
Rework __bpf_trace_run() to support sleepable BPF programs by using
explicit RCU flavor selection, following the uprobe_prog_run() pattern.

For sleepable programs, use rcu_read_lock_tasks_trace() for lifetime
protection with migrate_disable(). For non-sleepable programs, use the
regular rcu_read_lock_dont_migrate().

Remove the preempt_disable_notrace/preempt_enable_notrace pair from
the faultable tracepoint BPF probe wrapper in bpf_probe.h, since
migration protection and RCU locking are now handled per-program
inside __bpf_trace_run().

Adapt bpf_prog_test_run_raw_tp() for sleepable programs: reject
BPF_F_TEST_RUN_ON_CPU since sleepable programs cannot run in hardirq
or preempt-disabled context, and call __bpf_prog_test_run_raw_tp()
directly instead of via smp_call_function_single(). Rework
__bpf_prog_test_run_raw_tp() to select RCU flavor per-program and
add per-program recursion context guard for private stack safety.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/20260422-sleepable_tracepoints-v13-1-99005dff21ef@meta.com
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
2026-04-22 22:44:24 +02:00
Paolo Bonzini 5335e318ad tracing: Make undefsyms_base.c a first-class citizen
Linus points out that dumping undefsyms_base.c form the Makefile
is rather ugly, and that a much better course of action would be
to have this file as a first-class citizen in the git tree.

This allows some extra cleanup in the Makefile, and the removal of
the .gitignore file in kernel/trace.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/CAHk-=wieqGd_XKpu8UxDoyADZx8TDe8CF3RmkUXt5N_9t5Pf_w@mail.gmail.com
Link: https://lore.kernel.org/all/20260421095446.2951646-1-maz@kernel.org/
Link: https://patch.msgid.link/20260421100455.324333-1-pbonzini@redhat.com
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-04-22 11:24:41 -04:00
Masami Hiramatsu (Google) 476c5bbae6 tracing/fprobe: Fix to unregister ftrace_ops if it is empty on module unloading
Fix fprobe to unregister ftrace_ops if corresponding type of fprobe
does not exist on the fprobe_ip_table and it is expected to be empty
when unloading modules.

Since ftrace thinks that the empty hash means everything to be traced,
if we set fprobes only on the unloaded module, all functions are traced
unexpectedly after unloading module.
e.g.

 # modprobe xt_LOG.ko
 # echo 'f:test log_tg*' > dynamic_events
 # echo 1 > events/fprobes/test/enable
 # cat enabled_functions
log_tg [xt_LOG] (1)             tramp: 0xffffffffa0004000 (fprobe_ftrace_entry+0x0/0x490) ->fprobe_ftrace_entry+0x0/0x490
log_tg_check [xt_LOG] (1)               tramp: 0xffffffffa0004000 (fprobe_ftrace_entry+0x0/0x490) ->fprobe_ftrace_entry+0x0/0x490
log_tg_destroy [xt_LOG] (1)             tramp: 0xffffffffa0004000 (fprobe_ftrace_entry+0x0/0x490) ->fprobe_ftrace_entry+0x0/0x490
 # rmmod xt_LOG
 # wc -l enabled_functions
34085 enabled_functions

Link: https://lore.kernel.org/all/177669368776.132053.10042301916765771279.stgit@mhiramat.tok.corp.google.com/

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2026-04-22 09:24:13 +09:00
Masami Hiramatsu (Google) 0ac0058a74 tracing/fprobe: Check the same type fprobe on table as the unregistered one
Commit 2c67dc457b ("tracing: fprobe: optimization for entry only case")
introduced a different ftrace_ops for entry-only fprobes.

However, when unregistering an fprobe, the kernel only checks if another
fprobe exists at the same address, without checking which type of fprobe
it is.
If different fprobes are registered at the same address, the same address
will be registered in both fgraph_ops and ftrace_ops, but only one of
them will be deleted when unregistering. (the one removed first will not
be deleted from the ops).

This results in junk entries remaining in either fgraph_ops or ftrace_ops.
For example:
 =======
 cd /sys/kernel/tracing

 # 'Add entry and exit events on the same place'
 echo 'f:event1 vfs_read' >> dynamic_events
 echo 'f:event2 vfs_read%return' >> dynamic_events

 # 'Enable both of them'
 echo 1 > events/fprobes/enable
 cat enabled_functions
vfs_read (2)            ->arch_ftrace_ops_list_func+0x0/0x210

 # 'Disable and remove exit event'
 echo 0 > events/fprobes/event2/enable
 echo -:event2 >> dynamic_events

 # 'Disable and remove all events'
 echo 0 > events/fprobes/enable
 echo > dynamic_events

 # 'Add another event'
 echo 'f:event3 vfs_open%return' > dynamic_events
 cat dynamic_events
f:fprobes/event3 vfs_open%return

 echo 1 > events/fprobes/enable
 cat enabled_functions
vfs_open (1)            tramp: 0xffffffffa0001000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60    subops: {ent:fprobe_fgraph_entry+0x0/0x620 ret:fprobe_return+0x0/0x150}
vfs_read (1)            tramp: 0xffffffffa0001000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60    subops: {ent:fprobe_fgraph_entry+0x0/0x620 ret:fprobe_return+0x0/0x150}
 =======

As you can see, an entry for the vfs_read remains.

To fix this issue, when unregistering, the kernel should also check if
there is the same type of fprobes still exist at the same address, and
if not, delete its entry from either fgraph_ops or ftrace_ops.

Link: https://lore.kernel.org/all/177669367993.132053.10553046138528674802.stgit@mhiramat.tok.corp.google.com/

Fixes: 2c67dc457b ("tracing: fprobe: optimization for entry only case")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2026-04-22 00:03:10 +09:00