The branch counters logging (A.K.A LBR event logging) introduces a
per-counter indication of precise event occurrences in LBRs. The kernel
only dumps the number of occurrences into a record. The perf tool has
to map the number to the corresponding event.
Add evlist__update_br_cntr() to go through the evlist to pick the
events that are configured to be logged. Assign a logical idx to track
them, and add the total number of the events in the leader event.
The total number will be used to allocate the space to save the branch
counters for a block. The logical idx will be used to locate the
corresponding event quickly in the following patches.
It only needs to iterate the evlist once. The
evsel__has_branch_counters() is also optimized.
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240813160208.2493643-4-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When retire_latency value is used in a metric formula, evsel would fork
a 'perf record' process with "-e" and "-W" options. 'perf record' will
collect required retire_latency values in parallel while 'perf stat' is
collecting counting values.
At the point of time that 'perf stat' stops counting, evsel would stop
'perf record' by sending sigterm signal to 'perf record' process.
Sampled data will be processed to get retire latency value. Another
thread is required to synchronize between 'perf stat' and 'perf record'
when we pass data through pipe.
Retire_latency evsel is not opened for 'perf stat' so that there is no
counter wasted on it. This commit includes code suggested by Namhyung to
adjust reading size for groups that include retire_latency evsels.
In current :R parsing implementation, the parser would recognize events
with retire_latency modifier and insert them into the evlist like a
normal event. Ideally, we need to avoid counting these events.
In this commit, at the time when a retire_latency evsel is read, set the
retire latency value processed from the sampled data to count value.
This sampled retire latency value will be used for metric calculation
and final event count print out. No special metric calculation and event
print out code required for retire_latency events.
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Weilin Wang <weilin.wang@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Link: https://lore.kernel.org/r/20240720062102.444578-4-weilin.wang@intel.com
[ Squashed the 3rd and 4th commit in the series to keep it building patch by patch ]
[ Constified the 'struct perf_tool' pointer in process_sample_event() ]
[ Use perf_tool__init(&tool, false) to address a segfault I reported and Ian/Weilin diagnosed ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tool events unnecessarily open a dummy perf event which is useless
even with `perf record` which will still open a dummy event. Change
the behavior of tool events so:
- duration_time - call `rdclock` on open and then report the count as
a delta since the start in evsel__read_counter. This moves code out
of builtin-stat making it more general purpose.
- user_time/system_time - open the fd as either `/proc/pid/stat` or
`/proc/stat` for cases like system wide. evsel__read_counter will
read the appropriate field out of the procfs file. These values
were previously supplied by wait4, if the procfs read fails then
the wait4 values are used, assuming the process/thread terminated.
By reading user_time and system_time this way, interval mode, per
PID and per CPU can be supported although there are restrictions
given what the files provide (e.g. per PID can't be combined with
per CPU).
Opening any of the tool events for `perf record` is changed to return
invalid.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Weilin Wang <weilin.wang@intel.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Dmitrii Dolgov <9erthalion6@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linux.dev>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240503232849.17752-1-irogers@google.com
Since get_states() assumes the existence of libtraceevent, so move
to where it should belong, i.e, util/trace-event-parse.c, and also
rename it to parse_task_states().
Leave evsel_getstate() untouched as it fits well in the evsel
category.
Also make some necessary tweaks for python support, and get it
verified with: perf test python.
Signed-off-by: Ze Gao <zegao@tencent.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240123070210.1669843-2-zegao@tencent.com
Perf uses a hard coded string "RSDTtXZPI" to index the sched_switch
prev_state field raw bitmask value. This works well except for when
the kernel changes this string, in which case this will break again.
Instead we add a new way to parse task state string from tracepoint
print format already recorded by perf, which eliminates the further
dependencies with this hardcode and unmaintainable macro, and this
is exactly what libtraceevent[1] does for now.
So we borrow the print flags parsing logic from libtraceevent[1].
And in get_states(), we walk the print arguments until the
__print_flags() for the target state field is found, and use that to
build the states string for future parsing.
[1]: https://lore.kernel.org/linux-trace-devel/20231224140732.7d41698d@rorschach.local.home/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Ze Gao <zegao@tencent.com>
Link: https://lore.kernel.org/r/20240122070859.1394479-4-zegao@tencent.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Add a new branch filter, "counter", for the branch counter option. It is
used to mark the events which should be logged in the branch. If it is
applied with the -j option, the counters of all the events should be
logged in the branch. If the legacy kernel doesn't support the new
branch sample type, switching off the branch counter filter.
The stored counter values in each branch are displayed right after the
regular branch stack information via perf report -D.
Usage examples:
# perf record -e "{branch-instructions,branch-misses}:S" -j any,counter
Only the first event, branch-instructions, collect the LBR. Both
branch-instructions and branch-misses are marked as logged events. The
occurrences information of them can be found in the branch stack
extension space of each branch.
# perf record -e "{cpu/branch-instructions,branch_type=any/,cpu/branch-misses,branch_type=counter/}"
Only the first event, branch-instructions, collect the LBR. Only the
branch-misses event is marked as a logged event.
Committer notes:
I noticed 'perf test "Sample parsing"' failing, reported to the list and
Kan provided a patch that checks if the evsel has a leader and that
evsel->evlist is set, the comment in the source code further explains
it.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tinghao Zhang <tinghao.zhang@intel.com>
Link: https://lore.kernel.org/r/20231025201626.3000228-8-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
evsel__increase_rlimit() helper does nothing with evsel, and description
of the functionality is inaccurate, rename it and move to util/rlimit.c.
By the way, fix a checkppatch warning about misplaced license tag:
WARNING: Misplaced SPDX-License-Identifier tag - use line 1 instead
#160: FILE: tools/perf/util/rlimit.h:3:
/* SPDX-License-Identifier: LGPL-2.1 */
No functional change.
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20231023033144.1011896-1-yangjihong1@huawei.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Noticed with:
make EXTRA_CFLAGS="-fsanitize=address" BUILD_BPF_SKEL=1 CORESIGHT=1 O=/tmp/build/perf-tools-next -C tools/perf install-bin
Direct leak of 45 byte(s) in 1 object(s) allocated from:
#0 0x7f213f87243b in strdup (/lib64/libasan.so.8+0x7243b)
#1 0x63d15f in evsel__set_filter util/evsel.c:1371
#2 0x63d15f in evsel__append_filter util/evsel.c:1387
#3 0x63d15f in evsel__append_tp_filter util/evsel.c:1400
#4 0x62cd52 in evlist__append_tp_filter util/evlist.c:1145
#5 0x62cd52 in evlist__append_tp_filter_pids util/evlist.c:1196
#6 0x541e49 in trace__set_filter_loop_pids /home/acme/git/perf-tools/tools/perf/builtin-trace.c:3646
#7 0x541e49 in trace__set_filter_pids /home/acme/git/perf-tools/tools/perf/builtin-trace.c:3670
#8 0x541e49 in trace__run /home/acme/git/perf-tools/tools/perf/builtin-trace.c:3970
#9 0x541e49 in cmd_trace /home/acme/git/perf-tools/tools/perf/builtin-trace.c:5141
#10 0x5ef1a2 in run_builtin /home/acme/git/perf-tools/tools/perf/perf.c:323
#11 0x4196da in handle_internal_command /home/acme/git/perf-tools/tools/perf/perf.c:377
#12 0x4196da in run_argv /home/acme/git/perf-tools/tools/perf/perf.c:421
#13 0x4196da in main /home/acme/git/perf-tools/tools/perf/perf.c:537
#14 0x7f213e84a50f in __libc_start_call_main (/lib64/libc.so.6+0x2750f)
Free it on evsel__exit().
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/20230719202951.534582-2-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf tools fixes for v6.4: 2nd batch
- Fix BPF CO-RE naming convention for checking the availability of fields on
'union perf_mem_data_src' on the running kernel.
- Remove the use of llvm-strip on BPF skel object files, not needed, fixes a
build breakage when the llvm package, that contains it in most distros, isn't
installed.
- Fix tools that use both evsel->{bpf_counter_list,bpf_filters}, removing them from a
union.
- Remove extra "--" from the 'perf ftrace latency' --use-nsec option,
previously it was working only when using the '-n' alternative.
- Don't stop building when both binutils-devel and a C++ compiler isn't
available to compile the alternative C++ demangle support code, disable that
feature instead.
- Sync the linux/in.h and coresight-pmu.h header copies with the kernel sources.
- Fix relative include path to cs-etm.h.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
'struct evsel' uses a union for the two lists. This turned out to be
error prone.
For example:
[root@quaco ~]# perf stat --bpf-prog 5246
Error: cpu-clock event does not have sample flags 66c660
failed to set filter "(null)" on event cpu-clock with 2 (No such file or directory)
[root@quaco ~]# perf stat --bpf-prog 5246
Fix this issue by separating the two lists.
Fixes: 56ec9457a4 ("perf bpf filter: Implement event sample filtering")
Signed-off-by: Song Liu <song@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: kernel-team@meta.com
Link: https://lore.kernel.org/r/20230519235757.3613719-1-song@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
'perf stat' with no arguments will use default events and metrics. These
events may fail to open even with kernel and hypervisor disabled. When
these fail then the permissions error appears even though they were
implicitly selected. This is particularly a problem with the automatic
selection of the TopdownL1 metric group on certain architectures like
Skylake:
$ perf stat true
Error:
Access to performance monitoring and observability operations is limited.
Consider adjusting /proc/sys/kernel/perf_event_paranoid setting to open
access to performance monitoring and observability operations for processes
without CAP_PERFMON, CAP_SYS_PTRACE or CAP_SYS_ADMIN Linux capability.
More information can be found at 'Perf events and tool security' document:
https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
perf_event_paranoid setting is 2:
-1: Allow use of (almost) all events by all users
Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>= 0: Disallow raw and ftrace function tracepoint access
>= 1: Disallow CPU event access
>= 2: Disallow kernel profiling
To make the adjusted perf_event_paranoid setting permanent preserve it
in /etc/sysctl.conf (e.g. kernel.perf_event_paranoid = <setting>)
$
This patch adds skippable evsels that when they fail to open won't cause
termination and will appear as "<not supported>" in output. The
TopdownL1 events, from the metric group, are marked as skippable. This
turns the failure above to:
$ perf stat perf bench internals synthesize
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
Average synthesis took: 49.287 usec (+- 0.083 usec)
Average num. events: 3.000 (+- 0.000)
Average time per event 16.429 usec
Average data synthesis took: 49.641 usec (+- 0.085 usec)
Average num. events: 11.000 (+- 0.000)
Average time per event 4.513 usec
Performance counter stats for 'perf bench internals synthesize':
1,222.38 msec task-clock:u # 0.993 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
162 page-faults:u # 132.529 /sec
774,445,184 cycles:u # 0.634 GHz (49.61%)
1,640,969,811 instructions:u # 2.12 insn per cycle (59.67%)
302,052,148 branches:u # 247.102 M/sec (59.69%)
1,807,718 branch-misses:u # 0.60% of all branches (59.68%)
5,218,927 CPU_CLK_UNHALTED.REF_XCLK:u # 4.269 M/sec
# 17.3 % tma_frontend_bound
# 56.4 % tma_retiring
# nan % tma_backend_bound
# nan % tma_bad_speculation (60.01%)
536,580,469 IDQ_UOPS_NOT_DELIVERED.CORE:u # 438.965 M/sec (60.33%)
<not supported> INT_MISC.RECOVERY_CYCLES_ANY:u
5,223,936 CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE:u # 4.274 M/sec (40.31%)
774,127,250 CPU_CLK_UNHALTED.THREAD:u # 633.297 M/sec (50.34%)
1,746,579,518 UOPS_RETIRED.RETIRE_SLOTS:u # 1.429 G/sec (50.12%)
1,940,625,702 UOPS_ISSUED.ANY:u # 1.588 G/sec (49.70%)
1.231055525 seconds time elapsed
0.258327000 seconds user
0.965749000 seconds sys
$
The event INT_MISC.RECOVERY_CYCLES_ANY:u is skipped as it can't be
opened with paranoia 2 on Skylake. With a lower paranoia, or as root,
all events/metrics are computed.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20230502223851.2234828-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Do defensive programming by using zfree() to initialize freed pointers
to NULL, so that eventual use after free result in a NULL pointer deref
instead of more subtle behaviour.
Also remove one NULL test before free(), as it accepts a NULL arg and we
get one line shaved not doing it explicitely.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Seen in "perf stat --bpf-counters --for-each-cgroup test" running in a
container:
libbpf: Failed to bump RLIMIT_MEMLOCK (err = -1), you might need to do it explicitly!
libbpf: Error in bpf_object__probe_loading():Operation not permitted(1). Couldn't load trivial BPF program. Make sure your kernel supports BPF (CONFIG_BPF_SYSCALL=y) and/or that RLIMIT_MEMLOCK is set to big enough value.
libbpf: failed to load object 'bperf_cgroup_bpf'
libbpf: failed to load BPF skeleton 'bperf_cgroup_bpf': -1
Failed to load cgroup skeleton
#0 0x55f28a650981 in list_empty tools/include/linux/list.h:189
#1 0x55f28a6593b4 in evsel__exit util/evsel.c:1518
#2 0x55f28a6596af in evsel__delete util/evsel.c:1544
#3 0x55f28a89d166 in bperf_cgrp__destroy util/bpf_counter_cgroup.c:283
#4 0x55f28a899e9a in bpf_counter__destroy util/bpf_counter.c:816
#5 0x55f28a659455 in evsel__exit util/evsel.c:1520
#6 0x55f28a6596af in evsel__delete util/evsel.c:1544
#7 0x55f28a640d4d in evlist__purge util/evlist.c:148
#8 0x55f28a640ea6 in evlist__delete util/evlist.c:169
#9 0x55f28a4efbf2 in cmd_stat tools/perf/builtin-stat.c:2598
#10 0x55f28a6050c2 in run_builtin tools/perf/perf.c:330
#11 0x55f28a605633 in handle_internal_command tools/perf/perf.c:384
#12 0x55f28a6059fb in run_argv tools/perf/perf.c:428
#13 0x55f28a6061d3 in main tools/perf/perf.c:562
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230410205659.3131608-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Show the branch speculation info if provided by the branch recording
hardware feature. This can be useful for optimizing code further.
The speculation info is appended to the end of the list of fields so any
existing tools that use "/" as a delimiter for access fields via an index
remain unaffected. Also show "-" instead of "N/A" when speculation info
is unavailable because "/" is used as the field separator.
E.g.
$ perf record -j any,u,save_type ./test_branch
$ perf script --fields brstacksym
Before:
[...]
check_match+0x60/strcmp+0x0/P/-/-/0/CALL
do_lookup_x+0x3c5/check_match+0x0/P/-/-/0/CALL
[...]
After:
[...]
check_match+0x60/strcmp+0x0/P/-/-/0/CALL/NON_SPEC_CORRECT_PATH
do_lookup_x+0x3c5/check_match+0x0/P/-/-/0/CALL/NON_SPEC_CORRECT_PATH
[...]
The bitfield swapping scheme used duing sample parsing has changed
because of the addition of new branch flags, namely "spec", "new_type"
and "priv". Earlier, these were all part of the "reserved" field but
now, each of these fields get swapped separately. Change the expected
flag values accordingly for the test to pass.
E.g.
$ perf test -v 27
Before:
27: Sample parsing :
--- start ---
test child forked, pid 61979
parsing failed for sample_type 0x800
test child finished with -1
---- end ----
Sample parsing: FAILED!
After:
27: Sample parsing :
--- start ---
test child forked, pid 63293
test child finished with 0
---- end ----
Sample parsing: Ok
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: x86@kernel.org
Link: https://lore.kernel.org/r/56e272583552526e999ba0b536ac009ae3613966.1675333809.git.sandipan.das@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>