Finding Memory Leaks in the Ruby Ecosystem

This blog post is adapted from a talk that Adam Hess and I gave at RubyKaigi 2024.

Until recently, Ruby lacked a mechanism for detecting native-level memory leaks in Ruby itself and in native gems. When Ruby terminates, it does not free the objects that are still alive or the memory used by its virtual machine, because the system will reclaim all of that memory anyway and freeing it explicitly would only make shutdown slower. However, this meant that it was impossible to tell whether a piece of memory had been leaked or had simply not been cleaned up.

This made it hard to analyze Ruby applications that suffer from memory leaks, which cause them to consume increasing amounts of memory until the system runs out and terminates the application. This is undesirable: it wastes system resources, which costs money and performance, and it can lead to downtime for a web server.

In 2021, I developed a tool called ruby_memcheck to detect memory leaks in native gems using their test suites. ruby_memcheck found memory leaks in popular and commonly used native gems such as nokogiri, liquid-c, protobuf, and gRPC. However, ruby_memcheck relies on heuristics that can cause false negatives, as I describe in more detail in the following section and in another blog post.

In this blog post, we’ll look at RUBY_FREE_AT_EXIT, a Ruby feature that Adam Hess and I started working on in 2023. RUBY_FREE_AT_EXIT allows memory leak checkers to work on Ruby without the limitations of ruby_memcheck’s heuristics.

ruby_memcheck

ruby_memcheck wraps Valgrind memcheck, a tool for finding memory leaks in native applications, to find memory leaks in native gems. However, we can’t use Valgrind memcheck directly on Ruby, because Ruby doesn’t free its memory during shutdown, which leads Valgrind memcheck to report thousands of false-positive memory leaks. As mentioned above, Ruby skips this work because the system reclaims all of a program’s memory after it terminates anyway, so explicitly freeing the memory would only make shutdown slower.

Since there are tens, if not hundreds, of places where these kinds of “memory leaks” occur in Ruby at shutdown, creating a feature to free all of the memory at shutdown would have been very time-consuming. So instead, I opted to create ruby_memcheck, which uses heuristics to determine whether a reported memory leak is a false positive from Ruby or a real memory leak from the native gem. Of course, the heuristic is not perfect and can cause false negatives (i.e., it can filter out real memory leaks).

Even with this hacky heuristic, ruby_memcheck successfully identified memory leaks in popular and commonly used native gems like Nokogiri, liquid-c, gRPC, and Protobuf.

Another limitation of ruby_memcheck is that it is tied to Valgrind, which in practice limits it to Linux. This means that we cannot use faster memory checkers such as Google’s sanitizers, nor support other operating systems, for example by using the macOS leaks tool.

If you want to learn more about how ruby_memcheck works, read my blog post.

Implementing `RUBY_FREE_AT_EXIT`

In 2023, Adam Hess from GitHub collaborated with me to develop a feature in Ruby that frees all memory at shutdown. We implemented RUBY_FREE_AT_EXIT, which instructs Ruby to free all of its memory at shutdown when the RUBY_FREE_AT_EXIT environment variable is set. Gating this behind an environment variable lets Ruby keep its fast shutdown by default and only free memory at shutdown when the feature is enabled.

The implementation is fairly straightforward. When cleaning up the Ruby VM at shutdown, we free all of the parts of the VM when RUBY_FREE_AT_EXIT is enabled. A snippet of the implementation looks like this:

```c
int
ruby_vm_destruct(rb_vm_t *vm)
{
    if (rb_free_on_exit) {
        rb_free_default_rand_key();
        rb_free_encoded_insn_data();
        rb_free_global_enc_table();
        rb_free_loaded_builtin_table();

        rb_free_shared_fiber_pool();
        rb_free_static_symid_str();
        rb_free_transcoder_table();
        rb_free_vm_opt_tables();
        rb_free_warning();

...
```
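The same flag-gated idea can be sketched in plain C outside of Ruby. This is a minimal, hypothetical illustration rather than Ruby's code: `struct runtime`, `flag_enabled`, `runtime_init`, and `runtime_destruct` are invented names, and treating any non-empty value of `RUBY_FREE_AT_EXIT` as "enabled" is an assumption of this sketch, not a description of how Ruby parses the variable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical runtime state: stand-ins for VM tables, not Ruby's structures. */
struct runtime {
    bool free_at_exit;
    char *symbol_table;
    int *opt_table;
};

/* Assumption of this sketch: any non-empty value enables the flag. */
bool flag_enabled(const char *value) {
    return value != NULL && value[0] != '\0';
}

void runtime_init(struct runtime *rt) {
    rt->free_at_exit = flag_enabled(getenv("RUBY_FREE_AT_EXIT"));
    rt->symbol_table = malloc(1024);
    rt->opt_table = malloc(256 * sizeof(int));
    /* A real runtime would handle allocation failure; the sketch just asserts. */
    assert(rt->symbol_table != NULL && rt->opt_table != NULL);
}

/* Returns how many tables were released, so the behavior is observable. */
int runtime_destruct(struct runtime *rt) {
    if (!rt->free_at_exit) {
        /* Default fast path: skip the work and let the OS reclaim the memory. */
        return 0;
    }
    free(rt->symbol_table);
    rt->symbol_table = NULL;
    free(rt->opt_table);
    rt->opt_table = NULL;
    return 2;
}
```

By default `runtime_destruct` does nothing and leaves the memory to the operating system, which is exactly the situation in which a leak checker cannot tell real leaks apart from memory that was never cleaned up.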

Circular dependencies

However, it wasn’t quite that straightforward, because circular dependencies made it tricky. For example, we free Ruby objects before we free the VM because freeing Ruby objects may need the VM to be alive (e.g. executing finalizers), but things like Threads and the main Ractor are all Ruby objects, so we cannot free those objects until after most of the VM has been freed.

We solved this by identifying the objects that need to stay alive longer (threads, mutexes, fibers, and the main Ractor) and freeing all other objects first:

```c
switch (BUILTIN_TYPE(obj)) {
  case T_DATA:
    if (rb_obj_is_thread(obj)) break;
    if (rb_obj_is_mutex(obj)) break;
    if (rb_obj_is_fiber(obj)) break;
    if (rb_obj_is_main_ractor(obj)) break;

    obj_free(objspace, obj);
    break;
```

We then free the VM, and finally we go back and free the leftover objects that got skipped.
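This ordering can be sketched as a three-step teardown. In the hypothetical C sketch below, `OBJ_THREAD` stands in for every object the VM still depends on (threads, mutexes, fibers, and the main Ractor), and objects are only marked as freed rather than passed to `free`, so the order of operations can be checked. None of these names come from Ruby's source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object kinds. OBJ_THREAD stands in for everything the VM still
   needs during teardown (threads, mutexes, fibers, and the main Ractor). */
enum obj_kind { OBJ_STRING, OBJ_ARRAY, OBJ_THREAD };

struct object {
    enum obj_kind kind;
    bool freed; /* set instead of calling free() so the order is observable */
};

struct vm {
    struct object *current_thread; /* the VM refers to a live object */
    bool freed;
};

bool needed_by_vm(const struct object *obj) {
    return obj->kind == OBJ_THREAD;
}

/* Returns how many objects had to be deferred until after the VM was freed. */
size_t teardown(struct object *objs, size_t n, struct vm *vm) {
    size_t deferred = 0;

    /* Step 1: free every object the VM does not depend on. */
    for (size_t i = 0; i < n; i++) {
        if (needed_by_vm(&objs[i])) {
            deferred++;
            continue;
        }
        objs[i].freed = true;
    }

    /* Step 2: free the VM; the objects it refers to must still be alive. */
    assert(vm->current_thread == NULL || !vm->current_thread->freed);
    vm->freed = true;

    /* Step 3: go back and free the objects skipped in step 1. */
    for (size_t i = 0; i < n; i++) {
        if (!objs[i].freed) {
            objs[i].freed = true;
        }
    }
    return deferred;
}
```

The assertion in step 2 captures the circular dependency: if the thread object had been freed in step 1, the VM teardown would be touching freed memory.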

Impacts of `RUBY_FREE_AT_EXIT`

After implementing RUBY_FREE_AT_EXIT, we ran Ruby’s tests and specs through Valgrind and the macOS leaks tool to find memory leaks. Through this feature, we were able to find over 30 memory leaks originating inside Ruby itself.

List of memory leak PRs found using RUBY_FREE_AT_EXIT

Let’s now take a closer look at one of the memory leaks that was found using RUBY_FREE_AT_EXIT and subsequently fixed.

Memory leak in Regexp timeout

One of the memory leaks we discovered with RUBY_FREE_AT_EXIT occurs when a regular expression match times out (ticket #20228). Consider the following code:

```ruby
# Set a very short global regexp timeout of 1ms
Regexp.timeout = 0.001
# Create a regular expression and a string such that matching
# the regular expression against the string will time out
regex = /^(a*)*$/
str = "a" * 1000000 + "x"

# Show the memory usage 10 times to demonstrate the growth
# in memory usage
10.times do
  # Run the match 100 times to make the memory leak more
  # obvious
  100.times do
    # The match will time out and raise Regexp::TimeoutError,
    # so we rescue it and move on
    begin
      regex =~ str
    rescue Regexp::TimeoutError
    end
  end

  # Output the memory usage (RSS, in kilobytes) of the current Ruby process
  puts `ps -o rss= -p #{$$}`
end
```

Before the fix, the memory used by the Ruby process (reported by ps in kilobytes) increases linearly by about 300 megabytes per iteration, ending at around 3 gigabytes.

Graphing the memory usage makes the linear growth easy to see:

Graph of memory usage before the memory leak was fixed

After the fix, memory grows slightly at the beginning but quickly levels off at around 56 megabytes.

And we can see this visually in a graph:

Graph of memory usage after the memory leak was fixed

Fix for memory leak in Regexp timeout

This memory leak was fixed in GitHub PR #9765. The diff is quite large, but the major changes include:

The function that checks for timeouts was changed from raising an error when the match times out to returning a boolean indicating whether the match timed out. A raise jumps (via longjmp) out of the function and into the Ruby frame containing the rescue, so it bypasses any cleanup of the memory allocated for the match and leaks that memory. Returning a boolean instead allows the matcher to clean up before Regexp::TimeoutError is raised.

```diff
// This function is periodically called during regexp matching
-void
-rb_reg_check_timeout(regex_t *reg, void *end_time_)
+bool
+rb_reg_timeout_p(regex_t *reg, void *end_time_)
 {
     rb_hrtime_t *end_time = (rb_hrtime_t *)end_time_;

@@ -4631,10 +4664,18 @@ rb_reg_check_timeout(regex_t *reg, void *end_time_)
     }
     else {
         if (*end_time < rb_hrtime_now()) {
-            // timeout is exceeded
-            rb_raise(rb_eRegexpTimeoutError, "regexp match timeout");
+            // Timeout has exceeded
+            return true;
         }
     }
+
+    return false;
+}
```
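To see why raising from inside the matcher leaks memory, here is a small, self-contained C sketch that uses `setjmp`/`longjmp` (the mechanism CRuby's exceptions are built on) to mimic a raise. The allocation counter, `match_raising`, and `match_returning` are hypothetical stand-ins, not Onigmo or Ruby functions.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stdlib.h>

static int live_allocations;  /* instrumentation standing in for a leak checker */
static jmp_buf rescue_point;  /* stands in for the Ruby frame with the rescue */

static void *tracked_malloc(size_t size) {
    live_allocations++;
    return malloc(size);
}

static void tracked_free(void *ptr) {
    live_allocations--;
    free(ptr);
}

/* Old behavior: "raise" by jumping straight out of the matcher. */
static void match_raising(void) {
    void *stack = tracked_malloc(64);
    /* ... the match runs until the deadline passes ... */
    longjmp(rescue_point, 1);
    tracked_free(stack); /* never reached: this is the leak */
}

/* New behavior: report the timeout so the matcher can clean up first. */
static bool match_returning(void) {
    void *stack = tracked_malloc(64);
    bool timed_out = true; /* pretend the deadline passed */
    tracked_free(stack);
    return timed_out;
}

/* Returns how many allocations one timed-out match leaves behind. */
int leaked_by_raising(void) {
    int before = live_allocations;
    if (setjmp(rescue_point) == 0) {
        match_raising();
    }
    /* Execution resumes here after longjmp, like entering a rescue block. */
    return live_allocations - before;
}

int leaked_by_returning(void) {
    int before = live_allocations;
    bool timed_out = match_returning();
    /* Cleanup is done, so this is where the real code would raise. */
    assert(timed_out);
    return live_allocations - before;
}
```

Each call to `leaked_by_raising` leaves one buffer behind, which mirrors how every timed-out match in the Ruby example above added to the process's memory.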

We then changed the macro that checks for interrupts during the regular expression match, such as timeouts and thread interrupts. Previously, it periodically called rb_reg_check_timeout, which raised once the timeout was reached. Now it checks whether the match has timed out and, if it has, jumps to a label called timeout.

```diff
 #ifdef RUBY

 # define CHECK_INTERRUPT_IN_MATCH_AT do { \
   msa->counter++; \
   if (msa->counter >= 128) { \
     msa->counter = 0; \
-    rb_reg_check_timeout(reg, &msa->end_time);  \
+    if (rb_reg_timeout_p(reg, &msa->end_time)) { \
+      goto timeout; \
+    } \
     rb_thread_check_ints(); \
   } \
 } while(0)
```
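The macro keeps the timeout check cheap by consulting the clock only once every 128 steps. A minimal C sketch of that idea follows; it assumes `clock()` as the time source and a `long` deadline in clock ticks, both choices of this sketch (Ruby itself uses `rb_hrtime_now()`), and `struct match_state`, `step_timed_out`, and `run_match` are hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <time.h>

#define CHECK_EVERY 128       /* same interval as the macro above */
#define NO_DEADLINE LONG_MAX  /* hypothetical: "no timeout configured" */

struct match_state {
    unsigned counter;     /* steps since the clock was last read */
    unsigned clock_reads; /* instrumentation: how often the clock was read */
    long deadline;        /* in clock() ticks */
};

/* Reading the clock on every backtracking step would be costly, so only
   consult it every CHECK_EVERY steps, and report the timeout rather than raise. */
bool step_timed_out(struct match_state *st) {
    if (++st->counter < CHECK_EVERY) {
        return false;
    }
    st->counter = 0;
    st->clock_reads++;
    return (long)clock() > st->deadline;
}

/* Returns how many steps ran before a timeout (or `steps` if none occurred). */
unsigned run_match(struct match_state *st, unsigned steps) {
    assert(st != NULL);
    for (unsigned i = 0; i < steps; i++) {
        if (step_timed_out(st)) {
            return i;
        }
        /* ... one step of backtracking work would go here ... */
    }
    return steps;
}
```

Like `rb_reg_timeout_p`, `step_timed_out` only reports the timeout; deciding what to free and when to raise is left to the caller.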

We then added a label called timeout that the match jumps to when it times out. This label frees memory before invoking HANDLE_REG_TIMEOUT_IN_MATCH_AT, which raises Regexp::TimeoutError.
```diff
   STACK_SAVE;
   xfree(xmalloc_base);
   return ONIGERR_UNEXPECTED_BYTECODE;
+
+ timeout:
+  xfree(xmalloc_base);
+  xfree(stk_base);
+  HANDLE_REG_TIMEOUT_IN_MATCH_AT;
 }
```
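The timeout label is an instance of the common C idiom of funnelling every exit path through cleanup code before reporting an error. A hypothetical sketch of that idiom follows: `match`, `buf_alloc`, `buf_free`, and the `MATCH_*` codes are invented for illustration, and a counter of live buffers stands in for a leak checker.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical result codes for this sketch. */
enum { MATCH_OK = 0, MATCH_NOMEM = -1, MATCH_TIMEOUT = -2 };

static int live_buffers; /* instrumentation standing in for a leak checker */

static void *buf_alloc(size_t n) {
    void *p = malloc(n);
    if (p != NULL) {
        live_buffers++;
    }
    return p;
}

static void buf_free(void *p) {
    if (p != NULL) {
        live_buffers--;
        free(p);
    }
}

/* Runs a pretend match that times out at `timeout_step` (never if negative). */
int match(int timeout_step) {
    int result = MATCH_OK;
    void *xmalloc_base = buf_alloc(256);
    void *stk_base = buf_alloc(1024);
    if (xmalloc_base == NULL || stk_base == NULL) {
        result = MATCH_NOMEM;
        goto cleanup;
    }

    for (int step = 0; step < 1000; step++) {
        if (step == timeout_step) {
            goto timeout;
        }
        /* ... one step of matching work would go here ... */
    }
    goto cleanup;

timeout:
    /* Like the label in the diff: record the timeout, then fall into cleanup. */
    result = MATCH_TIMEOUT;
cleanup:
    buf_free(xmalloc_base);
    buf_free(stk_base);
    /* Only now, with both buffers freed, would the real code raise an error. */
    return result;
}
```

Because the error is reported only after both buffers are released, a timed-out match leaves nothing behind, which is the property the fix restores.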

How you can use `RUBY_FREE_AT_EXIT`

As we’ve seen, this feature has been effective at finding memory leaks within Ruby. But how can you, as a Ruby user, take advantage of it?

There are two ways the community can use RUBY_FREE_AT_EXIT:

Finding native-level memory leaks in native gems.
Finding native-level memory leaks in Ruby apps.

Finding native-level memory leaks in native gems

It’s no secret that manual memory management is difficult, which is why RUBY_FREE_AT_EXIT has found so many memory leaks in Ruby, and why ruby_memcheck has found so many in native gems, including popular and commonly used ones such as nokogiri, liquid-c, protobuf, and gRPC. Memory management is especially tricky when exceptions are raised, because an exception can jump out of native stack frames, unexpectedly interrupting execution and skipping any cleanup code that follows.

If you maintain a native gem, please try running your test suite with RUBY_FREE_AT_EXIT under a memory leak checker (Valgrind, the macOS leaks tool, or ASAN) to find memory leaks.

The ruby_memcheck gem now detects Ruby versions that support RUBY_FREE_AT_EXIT and disables its heuristics on them (while continuing to use the heuristics on older Ruby versions). ruby_memcheck offers an easy way to run Valgrind memcheck with your existing minitest or RSpec test suite.

Finding native-level memory leaks in Ruby apps

If you suspect that native-level memory leaks (which can come from native gems or from Ruby itself) are affecting your Ruby app, you can use RUBY_FREE_AT_EXIT together with a memory leak checker to help find them. Note that this will not find Ruby-level memory leaks: if your memory bloat comes from creating too many Ruby objects and holding onto them, this approach will not help.

Request for the Ruby community

I’ll conclude this post the same way I concluded my ruby_memcheck blog post: if you are a maintainer of a Ruby gem with native extensions, please test with ruby_memcheck to make sure that your gem does not have memory leaks. Together, let’s make Ruby a more efficient and stable platform for everyone!