Automatically Find Memory Leaks in Native Gems
I have a more in-depth article on how ruby_memcheck works internally.
This article was adapted as a talk at RubyKaigi 2022.
Ruby supports gems written in native languages, such as C, C++, or Rust. These gems are known as “native gems”. There are several reasons why a gem author would choose to write a native gem over writing one in Ruby, such as higher performance or access to a native library. However, native gems have several drawbacks, one of which is the risk of memory leaks. High-level languages like Ruby have a garbage collector, which manages memory and ensures it is released once you’re done using it. Native languages such as C don’t have a garbage collector, so the developer must manage and release memory manually. If the developer forgets to release a piece of memory, a memory leak occurs.
Impacts of Memory Leaks
Here’s a memory usage graph of a production Shopify service over time on a weekend. We see linear memory growth and usage peaking at over 3GB before containers are killed due to running out of memory. This service is on Kubernetes, which has self-healing capabilities, so the killed containers will be replaced with new ones. However, if this service wasn’t self-healing, then we could’ve experienced downtime.
Here’s a memory usage graph of the same service after the memory leaks were fixed. We see that the memory usage is relatively flat at around 1.1GB.
Using ruby_memcheck to Automatically Find Memory Leaks
The memory leaks in that service were in a native gem called liquid-c and were automatically found using the ruby_memcheck tool. ruby_memcheck is a tool built on top of Valgrind Memcheck, which finds memory leaks and memory errors inside native binaries. Unfortunately, we cannot use Valgrind Memcheck directly on Ruby, since Ruby doesn’t free all of its memory during shutdown. This is deliberate: the system reclaims all of the memory after shutdown anyway, so freeing it is unnecessary and would only make Ruby’s shutdown slower. However, it means that Valgrind Memcheck ends up reporting tens of thousands of false positives.
ruby_memcheck uses several aggressive heuristics to filter out these false positives. In most cases, these heuristics filter out all of the false positives that originate from Ruby and leave only the memory leaks detected in the native gem. However, since these heuristics are designed to filter out all potential false positives, they can also produce false negatives (real memory leaks that are filtered out).
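To make the idea of filtering leak reports by origin concrete, here is a minimal Ruby sketch. It is not ruby_memcheck's actual implementation. It assumes one deliberately simple heuristic: keep a report only if at least one frame in its stack lives in the native gem's binary. The `Frame` and `LeakReport` structs, the `leaks_from_gem` method, the gem name `my_gem`, and all of the sample data are hypothetical and exist only for illustration.

```ruby
# Hypothetical, simplified model of a leak report: the number of bytes lost
# and the call stack, where each frame records the binary it belongs to.
Frame = Struct.new(:function, :object)
LeakReport = Struct.new(:bytes, :frames)

# Assumed heuristic (for illustration only): keep a report only if at least
# one stack frame comes from the native gem's binary. Reports whose stacks
# never enter the gem are treated as originating from Ruby and are dropped.
def leaks_from_gem(reports, gem_binary)
  reports.select do |report|
    report.frames.any? do |frame|
      # Compare the file name without its extension, e.g. "my_gem.so" -> "my_gem".
      File.basename(frame.object, ".*") == gem_binary
    end
  end
end

# Made-up sample data standing in for parsed Valgrind output.
reports = [
  # Memory allocated by Ruby itself and not freed at shutdown: a false positive.
  LeakReport.new(4096, [
    Frame.new("malloc", "/usr/lib/valgrind/vgpreload_memcheck.so"),
    Frame.new("ruby_xmalloc", "/usr/local/bin/ruby"),
  ]),
  # Memory allocated inside the (hypothetical) native gem and never freed.
  LeakReport.new(64, [
    Frame.new("malloc", "/usr/lib/valgrind/vgpreload_memcheck.so"),
    Frame.new("parse_document", "/gems/my_gem/lib/my_gem.so"),
    Frame.new("vm_call_cfunc", "/usr/local/bin/ruby"),
  ]),
]

leaks_from_gem(reports, "my_gem").each do |report|
  puts "#{report.bytes} bytes leaked via #{report.frames.map(&:function).join(' <- ')}"
end
```

With this sample data, only the report that passes through `my_gem.so` survives the filter. A stricter rule would remove more noise from Ruby, but it would also be more likely to hide a real leak, which is the false-negative trade-off described above.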
Does It Work?
Yes! It’s found memory leaks in popular native gems such as nokogiri, liquid-c, protobuf, gRPC, and libxml2. You can find more details about the memory leaks it has found in the repo.