I’ve been working with C++ a lot for my PhD, and part of my work involves writing performant C++ code which can be used across various processors. This is particularly true in robotics, where both memory and compute can be limited resources.

So how does one handle investigating potential memory leaks and CPU slowdowns? Turns out a lot of people have this problem and have built a host of tools to tackle this. Valgrind seems to be the most popular toolkit, but for the sake of cross-platform ease and reporting, I decided to go with Google’s Gperftools.

Installing Gperftools

Installation of Gperftools is actually quite easy if you’re on a Unix-based system like Linux or macOS. We can simply use Homebrew:

brew install google-perftools graphviz ghostscript

We also install graphviz and ghostscript to help with report generation. This is something I really like about Gperftools since it allows me to visualize graphically what is happening under the hood, rather than pore over thousands of lines of logs or print statements.

For Debian systems, you could also use apt, but I like using brew for various reasons, particularly cross-platform scripting.
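
For reference, the Debian/Ubuntu equivalent would look something like this (I believe libgoogle-perftools-dev is the right package name, but double-check for your release):

sudo apt install libgoogle-perftools-dev graphviz ghostscript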

Linking with CMake

Since any non-trivial C++ program will use a build system, and I prefer CMake, it’s worth showing how to link Gperftools via CMake.

Within the CMakeLists.txt file, we first need to find the required library. Gperftools has two main libraries for heap and CPU profiling that need to be linked:

find_library(GPERFTOOLS_TCMALLOC NAMES tcmalloc)
find_library(GPERFTOOLS_PROFILER NAMES profiler)

NOTE: You can provide HINTS to the find_library command to point it to the path where the libraries are located.
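
For example, on an Apple Silicon Mac where Homebrew installs into /opt/homebrew, something along these lines should work (the path is purely illustrative; point it at wherever the libraries actually live on your machine):

find_library(GPERFTOOLS_TCMALLOC NAMES tcmalloc HINTS /opt/homebrew/lib)
find_library(GPERFTOOLS_PROFILER NAMES profiler HINTS /opt/homebrew/lib)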

Once the system finds them, it is then just a simple matter of linking them (the CMake version of -l<library>):

target_link_libraries(${PROJECT_NAME} ${GPERFTOOLS_TCMALLOC} ${GPERFTOOLS_PROFILER})
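
Putting it all together, a minimal CMakeLists.txt might look roughly like this (the project and source file names are just placeholders):

cmake_minimum_required(VERSION 3.16)
project(profiling_demo CXX)

add_executable(${PROJECT_NAME} main.cpp)

# Locate the Gperftools heap and CPU profiling libraries
find_library(GPERFTOOLS_TCMALLOC NAMES tcmalloc)
find_library(GPERFTOOLS_PROFILER NAMES profiler)

# Link them into the executable
target_link_libraries(${PROJECT_NAME} ${GPERFTOOLS_TCMALLOC} ${GPERFTOOLS_PROFILER})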

pprof Setup

This is where I found things a bit difficult. pprof is now developed in Go, so I first had to install Go and then install pprof with

go install github.com/google/pprof@latest
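
One small thing to remember: go install drops the binary into $(go env GOPATH)/bin (usually ~/go/bin), so that directory needs to be on your PATH for the pprof command to resolve:

export PATH="$PATH:$(go env GOPATH)/bin"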

Even with the new development process, some features and options of pprof didn’t seem to work for me. More on that when we show how to use it.

Profiling

This is the easy part! Linking the above libraries gives the executable everything it needs, but Gperftools is designed so that it won’t actually log any profiling data unless we set some environment variables.

This is great design, since your executables can stay linked against the libraries with no extra performance penalty. However, if you are using your system in production, it is advisable not to link the libraries at all, in case the environment variables are accidentally set.
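
One way to handle that (just a sketch with a made-up option name, not anything official from Gperftools) is to guard the linking behind a CMake option, so profiling builds are opt-in and production builds never touch the libraries:

option(ENABLE_PROFILING "Link Gperftools profiling libraries" OFF)

if(ENABLE_PROFILING)
    find_library(GPERFTOOLS_TCMALLOC NAMES tcmalloc)
    find_library(GPERFTOOLS_PROFILER NAMES profiler)
    target_link_libraries(${PROJECT_NAME} ${GPERFTOOLS_TCMALLOC} ${GPERFTOOLS_PROFILER})
endif()

Configuring with -DENABLE_PROFILING=ON then gives a profiling build, while the default configuration stays clean.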

For our use case, we can set the environment variables and call the executable as:

CPUPROFILE=./<cpuprof-dir>/cpuprof HEAPPROFILE=./<heapprof-dir>/heapprof ./<executable>

Here, CPUPROFILE tells Gperftools where to store the CPU profiling data (in this case, a file named cpuprof inside <cpuprof-dir>). Similarly, HEAPPROFILE tells it where to store the heap profiling data, using heapprof as the filename prefix (e.g. ./<heapprof-dir>/heapprof.0001.heap).

The above command will run the executable and Gperftools will do what is needed to collect the necessary information.
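
As an aside, if you only care about a specific region of code rather than the whole run, Gperftools also exposes a small C API you can call directly. Here is a minimal sketch (the output names are just examples; I stuck to the environment-variable approach myself):

#include <gperftools/heap-profiler.h>  // HeapProfilerStart / HeapProfilerStop
#include <gperftools/profiler.h>       // ProfilerStart / ProfilerStop

#include <vector>

int main() {
    ProfilerStart("cpuprof");        // CPU samples go to ./cpuprof
    HeapProfilerStart("heapprof");   // heap dumps go to ./heapprof.0001.heap, ...

    std::vector<double> data(1'000'000, 3.14);  // some work worth profiling
    (void)data;                                 // silence unused-variable warnings

    HeapProfilerStop();
    ProfilerStop();                  // flush the CPU profile to disk
    return 0;
}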

Analyzing Profiles

For me, the easiest way to analyze the profiler information has been to render it as a PDF. This can be easily achieved with

pprof -pdf <executable> <cpuprof-dir>/cpuprof > cpuprof.pdf
pprof -pdf <executable> <heapprof-dir>/heapprof.0001.heap > heapprof.0001.pdf

And there you have it! You can now open those PDF files and view the nice graphical representation of all the function calls, the call frequency, memory usage, etc.
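
If PDF isn’t what you want, the Go version of pprof also offers a plain-text summary and an interactive web view (the latter also relies on Graphviz); I haven’t exercised every output mode, so treat these as examples:

pprof -text <executable> <cpuprof-dir>/cpuprof
pprof -http=localhost:8080 <executable> <cpuprof-dir>/cpuprof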

A few things to be wary of, though, since I initially had some difficulty with pprof:

  • Everything other than the Go version either would not work or threw random errors.
  • Passing -png instead of -pdf to get PNG output refused to work. This could be a missing library, but I couldn’t find anything online, and PDF rendering worked great, so c’est la vie.
  • On newer macOS systems, you’ll probably get a ton of errors thrown at you from a tool called otool-classic. This is a red herring: the PDF will still render, and the errors stem from Apple’s newer approach of packaging libraries as text-based stubs (TBD files) instead of shipping multiple copies of the .dylib files.

Conclusion

I hope this guide has been useful. I have written it primarily as a reference for my future self to inevitably come back to, so I don’t spend another whole weekend figuring things out. This way I can focus on analyzing the performance of my code rather than dealing with system incompatibilities.

On a positive note, I managed to figure out an issue in my code which was causing the heap to blow up to more than 40 GB. Now my code takes a modest 9 MB to run (over a 4,000x improvement!), so a big win for profiling.