Diagnosing memory utilization, leaks and stalls in production
Finding out which parts of our code are consuming (or leaking!) memory can be a challenge without the right tools. Until recently most of us relied on Valgrind/Massif, but its heavy performance impact made it impractical for production systems.
Fortunately, this has been changing as more systems are upgraded to Linux releases with 4.x kernels, where it's easy to set up tools like eBPF and perf that give us system-wide tracing in production with an overhead most businesses can tolerate.
In this session we will show how to use these tracing tools to create so-called memory FlameGraphs, which let us visualize memory utilization within our application so that we can understand what's causing unexpected growth. We will also review a script to detect memory leaks and go through other uses of eBPF and perf, such as observing iTLB/dTLB misses, checking memory saturation by measuring the CPU's retired instructions per cycle (IPC), and observing L1/L2/L3 cache hit/miss ratios.
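As a rough illustration of the derived metrics mentioned above, the following minimal sketch computes IPC and a cache miss ratio from raw counter values. The counter names mirror the generic perf events `instructions`, `cycles`, `cache-references` and `cache-misses`; the sample numbers and the helper names `ipc` and `miss_ratio` are hypothetical and are not taken from the talk.

```python
def ipc(instructions: int, cycles: int) -> float:
    """Retired instructions per CPU cycle."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


def miss_ratio(misses: int, references: int) -> float:
    """Fraction of cache references that missed."""
    if references <= 0:
        raise ValueError("references must be positive")
    return misses / references


# Hypothetical counter values, e.g. as one might read them off a counting run.
sample = {
    "instructions": 1_200_000,
    "cycles": 2_000_000,
    "cache-references": 50_000,
    "cache-misses": 5_000,
}

sample_ipc = ipc(sample["instructions"], sample["cycles"])
sample_miss = miss_ratio(sample["cache-misses"], sample["cache-references"])

print(f"IPC: {sample_ipc:.2f}")
print(f"cache miss ratio: {sample_miss:.1%}")
```

A low IPC alongside a high miss ratio is commonly read as a hint that the CPU is waiting on memory rather than doing useful work, though what counts as "low" depends on the workload and the processor.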
By the end of this talk, attendees will have learned about some powerful tools for introspecting memory utilization and performance, and will be better equipped to diagnose leaks and stalls live in a production system.