Performance plays an important role in any computer program. It’s one of the things that keeps users with your software. Imagine if your software took minutes to start even on a powerful machine, or showed visible slowdowns while doing important work. Either case would reflect badly on your application. The operating system kernel is even more performance-critical, because if it lags, the whole system lags. It’s the developer’s responsibility to write code that performs as well as it reasonably can.
To write programs that provide good performance, we should know which part of our program is becoming a bottleneck. That way, we can focus our efforts on optimizing that region of our code. There are a lot of tools out there to help you as a developer profile your program, and better understand which part of your code needs attention. This article discusses one of the tools to help you profile your program on Linux.
The perf command in Linux gives you access to various tools integrated into the Linux kernel. These tools can help you collect and analyze the performance data about your program or system. The perf profiler is fast, lightweight, and precise.
Using the perf command requires the perf package to be installed. On Fedora, you can install it with this command:
sudo dnf install perf
Once you have the required package installed, fire up your terminal and execute the perf command. Run without arguments, it prints a usage summary and a list of the most commonly used subcommands.
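As a quick sanity check before profiling anything, a script can confirm that perf is actually on the PATH. This is only a sketch; the variable name perf_status is made up for illustration.

```shell
# Confirm that perf is on the PATH before going further.
# perf_status is a hypothetical variable name used only in this sketch.
if command -v perf >/dev/null 2>&1; then
    perf_status="installed"
    perf --version    # prints the version of the installed perf tool
else
    perf_status="missing"
    echo "perf not found; on Fedora, install it with: sudo dnf install perf"
fi
```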
The perf command offers many subcommands you can use to profile your code. Let’s go through some of the ones that come in handy most often.
Listing the events
The perf list command shows the events that can be traced by the perf command.
There are a lot of events that can be traced via the perf command. Broadly, these events can be software events such as context switches or page faults, or hardware events that originate from the processor itself, like L1 cache misses or the number of clock cycles.
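The full event list can be long, so it may help to look at one category at a time. The sketch below assumes the sw and hw filter arguments that perf list accepts; the exact events printed depend on your kernel and processor.

```shell
# List events one category at a time instead of all at once.
# "sw" selects software events, "hw" selects hardware events.
categories="sw hw"
if command -v perf >/dev/null 2>&1; then
    for category in $categories; do
        echo "== perf list $category =="
        perf list "$category" || echo "perf list $category failed"
    done
else
    echo "perf is not installed"
fi
```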
Counting events with perf stat
The perf stat command counts the events related to a particular process or command. To count events for a single command, pass that command to perf stat as an argument.
When the command exits, perf stat lists the counters for the different types of events that occurred during its execution.
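A minimal sketch of counting events for one command might look like this. The ls workload is only an illustrative choice, not necessarily the command the article had in mind; substitute your own program.

```shell
# Count events for one short-lived command.
# "ls" is only an illustrative workload; replace it with your own program.
workload="ls"
if command -v perf >/dev/null 2>&1; then
    # Everything after "--" is the command that perf stat runs and measures.
    perf stat -- "$workload" ||
        echo "perf stat failed; check kernel.perf_event_paranoid or your permissions"
else
    echo "perf is not installed"
fi
```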
To get system-wide statistics for all the cores on your system, run perf stat with the -a option.
The command collects event statistics until you press Ctrl+C, and then reports them.
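If you would rather not stop the measurement by hand, one option is to give perf stat a sleep command to time the run. This is a sketch under a couple of assumptions: the five-second duration is arbitrary, and system-wide counting usually needs root or a permissive kernel.perf_event_paranoid setting.

```shell
# System-wide counting across all cores for a fixed period.
# duration is an arbitrary illustrative value, in seconds.
duration=5
if command -v perf >/dev/null 2>&1; then
    # "sleep" bounds the measurement, so no Ctrl+C is needed.
    perf stat -a -- sleep "$duration" ||
        echo "perf stat -a failed; it usually needs root or a lower kernel.perf_event_paranoid"
else
    echo "perf is not installed"
fi
```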
The stat command gives you the option to select only specific events for reporting. To select the events, run the stat command with the -e option followed by a comma-separated list of events to report. For example:
perf stat -e cycles,page-faults -a
This command provides statistics about the events named cycles and page-faults for all the cores on the system.
To get the stats for a specific process, run perf stat with the -p option followed by the process ID (PID) of the process for which you want performance statistics.
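Attaching to a running process could be sketched as follows. To keep the example runnable anywhere, it uses the PID of the current shell as a stand-in; in practice you would put the PID of the process you care about in target_pid. The sleep command only times the measurement.

```shell
# Attach perf stat to an already running process for a fixed period.
# $$ (the current shell) is only a stand-in target; use your own PID.
target_pid=$$
duration=2    # seconds to measure; arbitrary illustrative value
if command -v perf >/dev/null 2>&1; then
    perf stat -p "$target_pid" -- sleep "$duration" ||
        echo "perf stat -p failed; check that you may inspect PID $target_pid"
else
    echo "perf is not installed"
fi
```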
Sampling with perf record
The perf record command samples event data and saves it to a file. The command operates in a per-thread mode and by default records the data for the cycles event. To collect system-wide data for all the CPUs instead, run perf record with the -a option.
The record command collects samples until you press Ctrl+C. That data is stored in a file named perf.data by default. To store the data in some other file, pass the name of the file to the command using the -o option. To see the recorded data, run the perf report command.
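Putting recording and reporting together, a short session might look like the sketch below. The file name system.data and the five-second duration are made up for illustration, and system-wide recording usually needs root privileges. The --stdio option asks perf report for plain text output, which is easier to use in a script than the interactive view.

```shell
# Record system-wide samples for a fixed period, then print a text report.
# data_file and duration are hypothetical values chosen for this sketch.
data_file="system.data"
duration=5
if command -v perf >/dev/null 2>&1; then
    if perf record -a -o "$data_file" -- sleep "$duration"; then
        # -i selects the input file written by "perf record -o".
        perf report -i "$data_file" --stdio | head -n 40
    else
        echo "perf record failed; system-wide sampling usually needs root"
    fi
else
    echo "perf is not installed"
fi
```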
The report contains four columns, each with a specific meaning:
- Overhead: the percentage of overall samples collected in the corresponding function
- Command: the command to which the samples belong
- Shared object: the name of the image from where the samples came
- Symbol: the symbol name which constitutes the sample, and the privilege level at which the sample was taken. There are 5 privilege levels: [.] user level, [k] kernel level, [g] guest kernel level (virtualization), [u] guest OS userspace, and [H] hypervisor.
The command helps you display the most costly functions. You can then focus on these functions to optimize them further.
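The columns described above can also be used to group the report. As a rough sketch, the command below groups samples by command and shared object only, which gives a coarser picture before drilling down to individual symbols. It assumes a perf.data file already exists in the current directory.

```shell
# Group recorded samples by command and shared object.
# Assumes perf.data was produced earlier by "perf record".
sort_keys="comm,dso"
if command -v perf >/dev/null 2>&1 && [ -f perf.data ]; then
    perf report --stdio --sort "$sort_keys" | head -n 30
else
    echo "perf is not installed or perf.data does not exist yet"
fi
```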
Finding code to optimize
For example, let’s sample the data for a firefox process. In this example, the firefox process is running as PID 2750, so that PID is passed to perf record with the -p option.
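A sketch of that sampling step, using the PID from this example, could look like the following. The ten-second duration is an assumption added so the recording stops on its own; replace 2750 with the PID of your own process.

```shell
# Sample one process by PID for a fixed period.
# 2750 is the firefox PID from this example; duration is arbitrary.
target_pid=2750
duration=10
if command -v perf >/dev/null 2>&1 && kill -0 "$target_pid" 2>/dev/null; then
    # Samples are written to perf.data in the current directory.
    perf record -p "$target_pid" -- sleep "$duration" ||
        echo "perf record failed; check your permissions"
else
    echo "perf is not installed, or PID $target_pid is not accessible"
fi
```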
Executing the perf report command then lists the various symbols in decreasing order of their overhead.
With this data, we can identify the functions that generate the highest overhead in our code, and start optimizing them.
This has been a brief introduction to using perf to profile programs and systems. The perf command has many other options that let you run benchmarks on the system as well as annotate code. For further information on the perf command, visit the Perf wiki.
Image courtesy Hani Jajeh – originally posted to Unsplash as Untitled.
On Fedora 24, perf is a standalone package and not part of linux-tools:
Thanks for pointing that out, Andrey.
I was working on Fedora 23 while writing this post and was able to get the package through linux-tools.
$ sudo dnf install linux-tools
No package linux-tools available.
Error: Unable to find a match.
Do we have to install another repository to have access to perf?
@Odaiwai, no, you don’t need any other repository to install perf. It can also be installed as a standalone package by running
sudo dnf install perf
With Fedora 24, I was unable to install linux-tools (it did not exist), but was able to install linuxdoc-tools:
Description : Linuxdoc-tools is a text formatting suite based on SGML (Standard
: Generalized Markup Language), using the LinuxDoc document type.
: Linuxdoc-tools allows you to produce LaTeX, HTML, GNU info, LyX, RTF,
: plain text (via groff), and other format outputs from a single SGML
: source. Linuxdoc-tools is intended for writing technical software
Linuxdoc-tools is not relevant
As Andrey said above, perf is a standalone package. It can be installed using
Oops!, that should of course be: