VMS.VMSDevelopPerfEventUsage History

November 08, 2012, at 03:35 AM by -
!Using HW Performance Counters via perf_event Linux kernel calls

Nina Engelhardt, Merten Sach, Sean Halle

!!1 Motivation and Purpose

Several projects in the AES group require measurements of various events inside the CPU cores and the inter-core communication network. One approach is to use the Time Stamp Counter (TSC) to measure intervals. However, doing so has produced apparent inconsistencies. The TSC is also limited: it reports only clock cycles, and it reports all of them, including time spent in the kernel and time during which the recording thread is swapped out.

The desired functionality includes measuring:
* time (excluding swap-out periods and system calls, and adjusting for frequency changes)
* instructions executed (excluding kernel instructions)
* cache events

After a survey of approaches to using the hardware performance counters built into x86 processors, we chose the perf_event system, which is built into the Linux kernel.

!!2 Performance Counters for Linux

Since Linux kernel version 2.6.31, the “performance counters” subsystem has been part of the mainline kernel. It is a unified interface that allows:
* machine-independent access to a set of predefined performance counters that, depending on the hardware’s capabilities, are either implemented by hardware performance counters or estimated by sampling;
* machine-specific access through “raw” mode, where the hardware counter configuration bits are given directly.
Usually this system is used via the “perf” tool, which works similarly to other profiling tools,
by launching a given command as a child process and measuring its progress. However, if more
specific measurements are needed, the performance counters can be set up and measurements
made at specific points in the application code.
!!!2.1 Prerequisites
* Kernel compiled with CONFIG_PERF_COUNTERS=y (renamed CONFIG_PERF_EVENTS in later kernels)
* /proc/sys/kernel/perf_event_paranoid contains -1 (for access by non-privileged users)
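The second prerequisite can also be checked at run time, before any counter is opened. The following is a minimal sketch; the helper name read_paranoid_level is hypothetical. Reading the file needs no special privileges, and because the file typically exists only on kernels built with perf support, a failed read is also a hint that the first prerequisite is not met.

```c
#include <stdio.h>

/* Hypothetical helper: reads the integer stored in a perf_event_paranoid-style
 * file. Returns 0 and stores the value in *level on success, or -1 if the
 * file cannot be opened or does not start with an integer. */
int read_paranoid_level(const char *path, int *level)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int ok = (fscanf(f, "%d", level) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

Typical use is read_paranoid_level("/proc/sys/kernel/perf_event_paranoid", &level), followed by a warning if the call fails or level is greater than -1.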
!!!2.2 Usage
A counter is set up with a number of properties, which are set through the fields of struct perf_event_attr, defined in <linux/perf_event.h>. Of particular interest is the field config, which contains the bit pattern indicating the kind of counter to set up. This can be either one of the predefined counter types, such as PERF_COUNT_HW_INSTRUCTIONS, or a raw value [?].
The system call also allows choosing between process and core behavior: a counter can be attached to a specific process, to a core, or to both. When attached to a process but not a core, the kernel stops the counter when the process loses its current core and starts it again the next time the process is scheduled, on whichever core that is. When attached to a process and a core, the counter counts only while the process is scheduled on the target core. When attached only to a core, the counter logs events regardless of which process is running.
Thanks to these features, it is possible to track all events from a process (and only that
process), no matter how it happens to be scheduled; or it is possible to separately track the
usage of each core by a process.
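The three attachment modes correspond to the pid and cpu arguments of perf_event_open. The sketch below uses hypothetical names (attach_mode, attach_args, open_counter). The mapping follows the perf_event_open(2) man page: pid = 0 means the calling process, cpu = -1 means any core, and pid = -1 means every process on the given core. The last mode needs elevated privileges or a perf_event_paranoid value below 1. glibc provides no wrapper for this system call, so the sketch calls syscall() directly.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* glibc has no perf_event_open() wrapper, so invoke the raw system call. */
static int perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
                           int group_fd, unsigned long flags)
{
    return (int)syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Hypothetical names for the three attachment modes described above. */
enum attach_mode {
    FOLLOW_PROCESS,   /* process only: counts on whichever core it runs */
    PROCESS_ON_CORE,  /* process and core: only while it runs on that core */
    WHOLE_CORE        /* core only: every process running on that core */
};

/* Translates an attachment mode into the (pid, cpu) pair for perf_event_open. */
void attach_args(enum attach_mode mode, int core, pid_t *pid, int *cpu)
{
    switch (mode) {
    case FOLLOW_PROCESS:  *pid = 0;  *cpu = -1;   break;
    case PROCESS_ON_CORE: *pid = 0;  *cpu = core; break;
    case WHOLE_CORE:      *pid = -1; *cpu = core; break;
    }
}

/* Opens an ungrouped counter (group_fd = -1, flags = 0) in the given mode.
 * Returns the file descriptor, or -1 with errno set on failure. */
int open_counter(struct perf_event_attr *attr, enum attach_mode mode, int core)
{
    pid_t pid;
    int cpu;
    attach_args(mode, core, &pid, &cpu);
    return perf_event_open(attr, pid, cpu, -1, 0);
}
```

The sample code in section 2.5 uses the PROCESS_ON_CORE combination (pid 0, cpu i) once per core.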
!!!2.3 Details of format
There are two files of interest. linux/perf_event.h defines struct perf_event_attr; it is the actual code compiled into the kernel, so it is the newest and most up-to-date description. design.txt gives a full explanation of how the counter system works, including the bit formats of particular fields. Although design.txt was written for an older version with fewer features, it is still accurate. perf_event.h contains newer fields that design.txt does not explain; these must be set to 0 to indicate that they are not used. In fact, every field that is not set to a specific value must be set to 0, even fields that the documentation claims are not used.
design.txt can be found at: [[https://github.com/torvalds/linux/blob/master/tools/perf/design.txt]]
The config field selects the events recorded by the hardware counter. It can hold one of the predefined constants, or, in raw mode, a bit pattern supplied by user code that is written directly into the hardware counter configuration register. Outside raw mode, design.txt gives the meaning of the bit fields.
struct perf_event_attr has several fields that design.txt does not explain. It is important to zero them out; even fields marked “reserved” must contain zero, otherwise opening the counter fails. The meaning of the type field of perf_event_attr is not explained either, but a good guess is that it corresponds to the “perf_event_types” in design.txt. When it is set to zero (the value of PERF_TYPE_HARDWARE), the counters work.
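A minimal sketch of filling in struct perf_event_attr for both kinds of config described above, assuming the current <linux/perf_event.h>. There, raw mode is selected with type = PERF_TYPE_RAW, which may differ from the bit layout described in design.txt. The helper names are hypothetical. Both helpers start from a fully zeroed struct, and the static assertion records that the type value 0 mentioned above is PERF_TYPE_HARDWARE.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>

/* Type 0 is PERF_TYPE_HARDWARE: the predefined, generic hardware events. */
_Static_assert(PERF_TYPE_HARDWARE == 0, "expected PERF_TYPE_HARDWARE == 0");

/* Hypothetical helper: zero every byte (reserved fields and any padding
 * included), then fill in only type, size and config. */
static void init_attr(struct perf_event_attr *attr, uint32_t type, uint64_t config)
{
    memset(attr, 0, sizeof(*attr));
    attr->type = type;
    attr->size = sizeof(*attr);  /* tells the kernel which struct layout was used */
    attr->config = config;
}

/* Predefined counter, e.g. PERF_COUNT_HW_INSTRUCTIONS. */
void init_predefined_counter(struct perf_event_attr *attr, uint64_t which)
{
    init_attr(attr, PERF_TYPE_HARDWARE, which);
}

/* Raw mode: raw_bits are passed through to the hardware counter
 * configuration register, so their meaning is CPU-specific. */
void init_raw_counter(struct perf_event_attr *attr, uint64_t raw_bits)
{
    init_attr(attr, PERF_TYPE_RAW, raw_bits);
}
```

Any further settings (disabled, exclude_hv and so on) would be assigned after the helper returns, so everything left untouched stays zero.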
!!!2.4 Things that don’t work
* Some combinations of parameters to the system call don’t work, even though the explanation in design.txt seems mostly accurate. We happened to find one combination that works, as shown in the code below, and no longer touch it.
* The group counter functionality has not been made to work yet.
* When pinning threads, try making the system call that creates the counter file handles before pinning the threads to the cores.
* A single counter can’t count all tasks on all CPUs (the perf_event_open(2) man page documents pid = -1 combined with cpu = -1 as invalid).
* The explanations in design.txt are a bit confusing; we found one combination that works and just use that.
* It is supposed to be possible to set up a group of counters whose values are all returned together. Supposedly, when creating the file handle for a counter, one can pass the handle of a different counter to attach it to. In that case, first create a group leader, passing “-1” as the handle to attach to; then, for the other counters, pass the file descriptor returned by the leader. But this didn’t work: it failed with the error message “fail”.
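For reference, here is a sketch of grouping as the perf_event_open(2) man page describes it. It has not been tried on the VMS setup, where grouping failed as described above, and the helper and struct names are hypothetical. The leader is opened with group_fd = -1 and disabled = 1; the member is opened with the leader’s descriptor and disabled = 0, so it starts counting only when the leader is enabled. read_format = PERF_FORMAT_GROUP makes a single read() on the leader return all values at once. The group would be started with ioctl(leader, PERF_EVENT_IOC_ENABLE, PERF_IOC_FLAG_GROUP).

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Layout of read() on a leader opened with read_format = PERF_FORMAT_GROUP
 * and no other read_format bits; sized for exactly two counters. */
struct group_read {
    uint64_t nr;         /* number of counters in the group */
    uint64_t values[2];  /* one value per counter, leader first */
};

static int open_event(uint64_t config, int cpu, int group_fd, int disabled)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));        /* unused fields must be zero */
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = config;
    attr.disabled = disabled;
    attr.exclude_hv = 1;
    attr.read_format = PERF_FORMAT_GROUP;
    /* pid 0: the calling process, counted only while on `cpu` */
    return (int)syscall(__NR_perf_event_open, &attr, 0, cpu, group_fd, 0);
}

/* Opens cycles (leader) and instructions (member) as one group on `cpu`.
 * Returns the leader fd and stores the member fd, or returns -1. */
int open_cycles_instrs_group(int cpu, int *member_fd)
{
    int leader = open_event(PERF_COUNT_HW_CPU_CYCLES, cpu, -1, 1);
    if (leader < 0)
        return -1;
    *member_fd = open_event(PERF_COUNT_HW_INSTRUCTIONS, cpu, leader, 0);
    if (*member_fd < 0) {
        close(leader);
        return -1;
    }
    return leader;
}

/* Reads both counters with one read() on the leader. */
int read_group(int leader_fd, uint64_t *cycles, uint64_t *instrs)
{
    struct group_read buf;
    if (read(leader_fd, &buf, sizeof(buf)) != (ssize_t)sizeof(buf) || buf.nr != 2)
        return -1;
    *cycles = buf.values[0];
    *instrs = buf.values[1];
    return 0;
}
```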
!!!2.5 Sample code
Here is sample code for using the perf counters:
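    /* The includes, the NUM_CORES default and the wrapper function are
     * assumptions added so that this fragment compiles on its own. */
    #define _GNU_SOURCE
    #include <linux/perf_event.h>  /* struct perf_event_attr, PERF_* constants */
    #include <stdio.h>
    #include <string.h>
    #include <sys/syscall.h>       /* __NR_perf_event_open */
    #include <unistd.h>            /* syscall() */
    #ifndef NUM_CORES
    #define NUM_CORES 4            /* assumption: set to the number of cores */
    #endif
    int cycles_counter_fd[NUM_CORES];
    int instrs_counter_fd[NUM_CORES];
    void setup_perf_counters(void)  /* hypothetical wrapper name */
    {
        struct perf_event_attr hw_event;
        int i;
        //setup performance counters: every field not set below must be zero
        memset(&hw_event, 0, sizeof(hw_event));
        hw_event.type = PERF_TYPE_HARDWARE;
        hw_event.size = sizeof(hw_event);
        hw_event.disabled = 1;        /* start disabled; enable later via ioctl */
        hw_event.freq = 0;            /* plain counting, no frequency-based sampling */
        hw_event.inherit = 1;         /* children inherit it */
        hw_event.pinned = 1;          /* must always be on PMU */
        hw_event.exclusive = 0;       /* may share the PMU with other groups */
        hw_event.exclude_user = 0;    /* count user-space events */
        hw_event.exclude_kernel = 0;  /* count kernel events */
        hw_event.exclude_hv = 1;      /* don't count hypervisor */
        hw_event.exclude_idle = 0;    /* count while idle too */
        hw_event.mmap = 0;            /* no mmap records */
        hw_event.comm = 0;            /* no comm records */
        for (i = 0; i < NUM_CORES; i++) {
            hw_event.config = PERF_COUNT_HW_CPU_CYCLES;    /* 0x0: cycles */
            cycles_counter_fd[i] = (int)syscall(__NR_perf_event_open, &hw_event,
                    0,   //pid_t pid: the calling process
                    i,   //int cpu
                    -1,  //int group_fd: no group
                    0);  //unsigned long flags
            if (cycles_counter_fd[i] < 0) {
                fprintf(stderr, "On core %d: ", i);
                perror("Failed to open cycles counter");
            }
            hw_event.config = PERF_COUNT_HW_INSTRUCTIONS;  /* 0x1: instrs */
            instrs_counter_fd[i] = (int)syscall(__NR_perf_event_open, &hw_event,
                    0, i, -1, 0);  /* same pid, cpu, group_fd and flags as above */
            if (instrs_counter_fd[i] < 0) {
                fprintf(stderr, "On core %d: ", i);
                perror("Failed to open instrs counter");
            }
        }
    }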
For example, PERF_TYPE_HARDWARE in hw_event.type = PERF_TYPE_HARDWARE; is a constant that selects hardware counters instead of software events, as explained in design.txt.
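Because the sample code opens the counters with disabled = 1, they do not count until they are enabled. The following is a minimal sketch of starting, stopping and reading one counter with the ioctl() requests defined in <linux/perf_event.h>; the helper names are hypothetical. With read_format left at 0 (as in the sample code), read() returns a single 64-bit count.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Zeroes and starts a counter that was opened with disabled = 1. */
int start_counter(int fd)
{
    if (ioctl(fd, PERF_EVENT_IOC_RESET, 0) < 0)
        return -1;
    return ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) < 0 ? -1 : 0;
}

/* Stops a counter; its value can still be read afterwards. */
int stop_counter(int fd)
{
    return ioctl(fd, PERF_EVENT_IOC_DISABLE, 0) < 0 ? -1 : 0;
}

/* With read_format = 0, read() returns exactly one uint64_t count. */
int read_counter(int fd, uint64_t *value)
{
    return read(fd, value, sizeof(*value)) == (ssize_t)sizeof(*value) ? 0 : -1;
}
```

For example, start_counter(cycles_counter_fd[i]) could be called once per core after setup, and read_counter(cycles_counter_fd[i], &v) at each measurement point.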
!!!2.6 Integration in VMS
VMS Slave Virtual Processors (VPs) loop through a number of stages during execution. These
stages are controlled by the Master VPs, so we set up counters in the master environment and
insert counter reads between each stage.
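Below is a sketch of how a Master VP could bracket one stage of a Slave VP with counter reads. It assumes that the per-core descriptors from the sample code are already open and enabled, and that the stage runs on the core whose descriptors are read. The struct and function names are hypothetical and not part of VMS.

```c
#include <stdint.h>
#include <unistd.h>

/* Hypothetical snapshot of one core's counters at a stage boundary. */
struct counter_snapshot {
    uint64_t cycles;
    uint64_t instrs;
};

/* Reads both counters of one core (read_format = 0: one uint64_t each).
 * Returns 0 on success, -1 if either read comes back short. */
int take_snapshot(int cycles_fd, int instrs_fd, struct counter_snapshot *s)
{
    if (read(cycles_fd, &s->cycles, sizeof(s->cycles)) != (ssize_t)sizeof(s->cycles))
        return -1;
    if (read(instrs_fd, &s->instrs, sizeof(s->instrs)) != (ssize_t)sizeof(s->instrs))
        return -1;
    return 0;
}

/* Events attributed to the stage that ran between two snapshots on one core. */
struct counter_snapshot stage_delta(const struct counter_snapshot *before,
                                    const struct counter_snapshot *after)
{
    struct counter_snapshot d = {
        after->cycles - before->cycles,
        after->instrs - before->instrs
    };
    return d;
}
```

In the master loop, take_snapshot(cycles_counter_fd[core], instrs_counter_fd[core], &before) would precede a stage and a second snapshot would follow it; stage_delta() then gives the cycles and instructions of that stage alone.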