Reworking Memory Management in CRuby
This blog post was adapted from our paper and talk at the International Symposium on Memory Management 2025.
Click here to read the paper
We would first like to acknowledge the late Chris Seaton, who initiated our collaboration with the Australian National University on this project. We are thankful for his contribution, vision, and leadership. Without him, none of this would have been possible.
Background
The Australian National University (ANU) and Shopify are collaborating on integrating the Memory Management Toolkit (MMTk) with Ruby. We are supporting the project and working alongside ANU researchers to explore how to build a next-generation garbage collector for Ruby.
If you’re not familiar with MMTk, it offers a highly modular, VM-neutral framework for rapidly building high-performance garbage collectors. Once a language plugs into MMTk, it can leverage a wide range of built-in garbage collection algorithms, ranging from canonical collectors such as NoGC, Mark and Sweep, and Immix to more performant collectors such as Generational Immix and Sticky Immix. Many of these algorithms are considerably more sophisticated than the Mark and Sweep algorithm used in Ruby and have the potential to deliver significant performance gains.
There are currently two implementations of MMTk in Ruby: one is a fork of Ruby maintained by the MMTk team (in the mmtk/ruby and mmtk/mmtk-ruby repositories), and the other lives inside Ruby and uses the modular GC framework (in the ruby/mmtk repository). You might be wondering why there are two implementations. The MMTk team’s implementation is much more advanced, with around 5 years of development behind it. The team continues to use it to experiment with and develop new techniques that further leverage MMTk’s capabilities and improve performance. The implementation upstreamed to Ruby uses the modular GC framework and is designed to be part of an ecosystem of garbage collectors for Ruby. It is a reimplementation that borrows techniques and knowledge from the MMTk team’s implementation, but it still lags well behind it.
In this blog post, we will follow the paper and mostly focus on the MMTk team’s implementation. However, if you want to learn more about the modular GC framework, you can watch this talk at RubyKaigi 2025 or read this blog post.
Challenges
In the paper, we discuss some of the challenges we faced and solutions we used while integrating MMTk with Ruby. In this blog post, we highlight some of these challenges, but please read the paper if you want the entire picture.
Copying Garbage Collector
When Ruby 2.7 introduced a moving garbage collector, it marked the first time that objects could be moved in memory. To facilitate this, each data type in Ruby needed additional code to update its references to an object after that object has been moved. To ensure backwards compatibility, each data type had to opt in to a new API that supports object movement, and all existing types would pin the objects they refer to. A pinned object cannot move.
This pinning system works for Ruby’s default (built-in) garbage collector, because it has a marking phase to determine objects that are live and objects that are pinned followed by a compaction phase to move non-pinned objects. However, many of MMTk’s algorithms combine the marking and moving phases, meaning that an object is moved the moment it is marked. For algorithms like Immix, objects can be pinned, but they must be specified ahead of time. One solution would be to scan the heap twice: first to determine which objects get pinned, and again to mark all live objects and move the unpinned objects. However, this is inefficient because it essentially involves scanning the whole Ruby heap twice.
Fortunately, it’s been more than 5 years since a moving garbage collector was introduced to Ruby, so almost all the types in Ruby and many native gems support it. We introduced a new concept called Potentially Pinning Parents, or PPP for short. An object is a PPP if it could potentially contain references that cannot be moved. Earlier this year, we made an effort to reduce PPP objects. In fact, as of the time of writing, there are no user-facing Ruby objects that are PPPs except for ones defined in native gems (which we do not have any control over). There are still a few internal Ruby objects that are PPPs, but we are working on eliminating those as well.
Since we now know whether an object is a PPP at allocation time, MMTk keeps a list of PPP objects that are alive. Using that list, during a garbage collection cycle, it inspects every PPP object to determine the child objects that should be pinned before moving on to the phase that marks and moves objects. Since the set of PPP objects is now small, this phase can be completed very quickly.
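The two phases described above can be sketched as a toy simulation. This is a simplified illustration, not MMTk’s actual implementation: the Obj and Heap classes and their method names are hypothetical, and "moving" is only recorded by name rather than performed.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Obj:
    name: str
    children: list = field(default_factory=list)
    pinned: bool = False


class Heap:
    """Hypothetical toy heap; only tracks what the sketch needs."""

    def __init__(self):
        self.ppps = []  # PPP objects, recorded at allocation time

    def allocate(self, name, children=(), is_ppp=False):
        obj = Obj(name, list(children))
        if is_ppp:
            self.ppps.append(obj)
        return obj

    def collect(self, roots):
        # Phase 1: visit only the (small) PPP list and pin each PPP's children.
        for parent in self.ppps:
            for child in parent.children:
                child.pinned = True

        # Phase 2: a single trace that "moves" each unpinned object
        # the moment it is marked; pinned objects stay in place.
        moved, stayed, seen = [], [], set()
        stack = list(roots)
        while stack:
            obj = stack.pop()
            if id(obj) in seen:
                continue
            seen.add(id(obj))
            (stayed if obj.pinned else moved).append(obj.name)
            stack.extend(obj.children)
        return moved, stayed


heap = Heap()
handle = heap.allocate("handle")
text = heap.allocate("text")
ext = heap.allocate("ext", children=[handle], is_ppp=True)  # e.g. a native-gem type
root = heap.allocate("root", children=[ext, text])
moved, stayed = heap.collect([root])
```

In this sketch only "handle" stays in place, because it is the child of a PPP; every other reachable object is free to move during the single trace.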
Finalization
Before Ruby 3.2, all Ruby objects were allocated out of the garbage collector in fixed 40-byte slots. This meant that any additional data for the object needed to be allocated externally, usually through the system using malloc. In Ruby 3.2, we introduced Variable Width Allocation, which allows us to allocate dynamically sized slots through the garbage collector. However, because of legacy reasons and technical limitations of Variable Width Allocation, there are still many cases where we need to allocate memory from the system through malloc.
One of the superpowers of MMTk is that it supports parallelism in the garbage collector. Unlike Ruby’s default garbage collector, MMTk can split the work that needs to be done during a GC cycle (marking, sweeping, moving, etc.) into small chunks (MMTk calls these “work packets”) and process these work packets in parallel across multiple CPU cores.
It’s important to note, however, that while MMTk can perform its GC work in parallel, it does not run concurrently with the VM. In other words, MMTk is a parallel collector but not a concurrent one: Ruby code cannot run while the garbage collector is running, so the Ruby VM must still be stopped for the duration of each collection.
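To make the idea of work packets concrete, here is a minimal sketch that splits a sweep over mark bits into fixed-size packets and hands them to a pool of workers. The function names and packet size are assumptions made for illustration and are not MMTk’s API. Note also that threads in standard CPython builds do not execute bytecode in parallel, so this shows the structure of the technique rather than its speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def make_packets(items, packet_size):
    """Split the work into small, independent chunks ("work packets")."""
    return [items[i:i + packet_size] for i in range(0, len(items), packet_size)]


def sweep_packet(mark_bits):
    """Process one packet: count the objects whose mark bit is unset (dead)."""
    return sum(1 for marked in mark_bits if not marked)


def parallel_sweep(mark_bits, packet_size=4, workers=4):
    # The caller blocks until every packet is done, mirroring a
    # stop-the-world collector that is parallel but not concurrent.
    packets = make_packets(mark_bits, packet_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sweep_packet, packets))


mark_bits = [True, False] * 5  # 10 objects, every second one is dead
dead_count = parallel_sweep(mark_bits)
```

Because each packet is independent, workers never need to coordinate beyond picking up the next packet, which is what makes this style of collector easy to spread across cores.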
There were many challenges that we had to overcome to move from a serial garbage collector to a parallel one, including removing dependence on thread-local variables and fixing race conditions. Those issues were at least apparent as crashes and unexpected behavior. We also ran into a trickier problem: our garbage collection cycles were slower the more threads we used!
This was counterintuitive, because if each CPU core does less work, then shouldn’t it run faster? We looked at performance profiles more closely, and saw that it was the finalization phase that was slower. The finalization phase iterates over all dead objects to run code to do things like reclaim memory or close file descriptors. Specifically, we found that the culprit was free, the function that frees memory allocated through malloc. For the following table, we freed 100 million 32-byte pieces of memory using free. We measured the time taken (in milliseconds) with the work split across a varying number of threads and using various implementations of malloc. We see that glibc, jemalloc, and tcmalloc all scale negatively with the number of threads. The only allocator that offers any scalability is mimalloc, but even there we see little to no gain beyond roughly a 4x speedup. This is likely due to mimalloc’s design for a fast free that maximizes concurrency.
Threads | glibc | jemalloc | tcmalloc | mimalloc |
---|---|---|---|---|
1 | 1,263 | 3,935 | 4,988 | 903 |
2 | 5,002 | 11,719 | 13,539 | 493 |
3 | 5,787 | 17,606 | 11,374 | 346 |
4 | 6,790 | 22,478 | 17,295 | 265 |
5 | 8,058 | 17,785 | 291 | |
6 | 7,473 | 19,227 | 243 | |
10 | 9,400 | 23,350 | 230 | |
100 | 11,260 | 24,195 | 228 |
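As a quick worked check of the scaling claim, the sketch below computes the speedup relative to one thread, using only the rows for 1 to 4 threads from the table above (the rows where all four columns are populated). The speedup helper is a name introduced here for illustration.

```python
# Time in milliseconds to free 100 million 32-byte allocations,
# copied from the 1-4 thread rows of the table above.
times_ms = {
    "glibc":    {1: 1263, 2: 5002, 3: 5787, 4: 6790},
    "jemalloc": {1: 3935, 2: 11719, 3: 17606, 4: 22478},
    "tcmalloc": {1: 4988, 2: 13539, 3: 11374, 4: 17295},
    "mimalloc": {1: 903, 2: 493, 3: 346, 4: 265},
}


def speedup(allocator, threads):
    """Single-thread time divided by N-thread time; below 1.0 means slower."""
    t = times_ms[allocator]
    return t[1] / t[threads]


for name in times_ms:
    print(f"{name:9s} speedup at 4 threads: {speedup(name, 4):.2f}x")
```

A value below 1.0 means that adding threads made the work slower. In these rows, mimalloc is the only allocator whose value rises above 1.0.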
Another difference between MMTk and the default GC is that if an object does not require finalization (i.e. it does not have any resources that need to be reclaimed), then we don’t need to visit it at all, further improving performance. MMTk can use a bump pointer allocator, which increments a pointer for every allocation until it reaches the end of the allocation space. Meanwhile, the default GC in Ruby uses a freelist allocator, which uses a linked list of free slots to allocate objects into. Since building the freelist requires visiting all dead objects anyway, the default GC won’t be able to take advantage of this improvement.
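The difference between the two allocation strategies can be sketched as follows. This is a simplified model with hypothetical class names: addresses are plain integers, and a Python list stands in for the linked list of free slots.

```python
class BumpAllocator:
    """Allocates by advancing a cursor until the space is exhausted."""

    def __init__(self, size):
        self.cursor = 0
        self.limit = size

    def alloc(self, nbytes):
        if self.cursor + nbytes > self.limit:
            return None  # out of space; a real collector would trigger a GC
        addr = self.cursor
        self.cursor += nbytes
        return addr


class FreeListAllocator:
    """Allocates from a list of free slots that sweeping must rebuild."""

    def __init__(self, slot_count):
        self.free = list(range(slot_count))

    def alloc(self):
        return self.free.pop() if self.free else None

    def sweep(self, dead_slots):
        # Every dead slot has to be visited to put it back on the freelist.
        visited = 0
        for slot in dead_slots:
            self.free.append(slot)
            visited += 1
        return visited


bump = BumpAllocator(size=64)
first = bump.alloc(40)     # placed at offset 0
second = bump.alloc(16)    # placed right after the first allocation
overflow = bump.alloc(16)  # does not fit in the remaining 8 bytes

freelist = FreeListAllocator(slot_count=3)
slot = freelist.alloc()
visited = freelist.sweep([slot])
```

The bump allocator never needs to look at dead objects to allocate again after its space is reclaimed wholesale, whereas the freelist allocator touches every dead slot in sweep, whether or not that slot needs finalization.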
The solution to this challenge was to avoid using malloc. Instead, MMTk allocates the buffers for common types (Array, String, and MatchData objects) as hidden Ruby objects. Since these buffer objects are now Ruby objects, they are also allocated through MMTk. As a result, these buffers now have automatic memory management, rather than the manual memory management that malloc requires. This means that Array, String, and MatchData need to mark their buffer objects to keep those buffers alive in the marking phase, but, in return, they no longer need to do anything during the finalization phase.
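The trade-off can be illustrated with a toy mark phase. All class and function names below are hypothetical, and the finalizer is only counted, not run. An array whose buffer is a GC-managed object reports that buffer as a child during marking, while an array with an externally allocated buffer must be visited during finalization when it dies.

```python
class HeapObj:
    needs_finalizer = False

    def __init__(self):
        self.marked = False

    def children(self):
        return []


class Buffer(HeapObj):
    """Hidden GC-managed object that holds an array's storage."""

    def __init__(self, capacity):
        super().__init__()
        self.data = [None] * capacity


class ArrayWithGcBuffer(HeapObj):
    def __init__(self, capacity):
        super().__init__()
        self.buffer = Buffer(capacity)

    def children(self):
        return [self.buffer]  # marking keeps the buffer alive


class ArrayWithMallocBuffer(HeapObj):
    needs_finalizer = True  # its buffer would have to be released with free


def collect(heap, roots):
    # Mark phase: trace everything reachable from the roots.
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if not obj.marked:
            obj.marked = True
            stack.extend(obj.children())
    # Finalization: only dead objects that own external resources need work.
    finalized = sum(1 for o in heap if not o.marked and o.needs_finalizer)
    survivors = [o for o in heap if o.marked]
    for o in survivors:
        o.marked = False  # reset for the next cycle
    return survivors, finalized


live = ArrayWithGcBuffer(4)
dead = ArrayWithGcBuffer(4)
legacy = ArrayWithMallocBuffer()
heap = [live, live.buffer, dead, dead.buffer, legacy]
survivors, finalized = collect(heap, roots=[live])
```

Here the dead array and its buffer are reclaimed without any finalizer work; only the dead object with the externally allocated buffer is counted for finalization.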
Future Work & Conclusion
In this blog post, we looked at a few of the challenges we encountered in integrating MMTk with Ruby and the solutions we used. We hope that sharing our experiences can provide insights for Ruby developers, garbage collector researchers, and language designers.
Work continues in MMTk’s fork of Ruby to experiment with more optimized memory layouts, new techniques for object movement, and integrations between JIT compilers and the garbage collector. We are also using the lessons we learned with MMTk to make improvements to upstream Ruby.