In 2023, I wrote about how we’ve tuned Ruby’s garbage collector for Shopify’s monolith, including how we implemented out-of-band garbage collection to reduce the impact of major collection on latency.

While the latency improvements were massive, we weren’t entirely satisfied with the heuristics used to trigger out-of-band garbage collection. They were purely based on averages, so we had to trade latency for capacity. More importantly, they didn’t fully eliminate major collection from request cycles; they only made it very rare.

But in December 2023, while discussing this with Koichi Sasada, we came up with a new idea.

Disabling Major GC Entirely

If we want major GC to never trigger during a request cycle, why not disable it entirely?

In March 2024, during our annual Ruby Infrastructure team gathering, we fleshed out the details of the new feature we wanted, and Matthew Valentine-House started working on a proof of concept, which we then deployed to a small percentage of our production servers to see how effective it could be.

First, we needed a way to entirely prevent the garbage collector from automatically performing a major collection, and also to stop it from promoting objects to the old generation. Ideally, in a web application, aside from some in-memory caches, no object allocated as part of a request should survive longer than the request itself. Any object that does is probably something that should be eagerly loaded during boot, or some state that is leaking between requests. As such, any object promoted to the old generation during a request cycle is very unlikely to be immortal, so promoting it is wasteful.

We also needed a way to ask the GC whether it would have run a major collection so that we could manually trigger it outside of the request cycle, and only exactly as much as needed.

The initial proposal was for three new methods, GC.disable_major, GC.enable_major and GC.needs_major?.

After some back and forth with other Ruby committers, it became a single new method: GC.config(rgengc_allow_full_mark: true/false). We also exposed a new key in GC.latest_gc_info, :need_major_by, for checking whether a major GC needs to run: GC.latest_gc_info(:need_major_by).
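As a rough sketch of how these two pieces fit together outside of any particular web server: the `respond_to?` guard and the `pending_reason` variable are my own additions for illustration, so that the snippet also runs on Rubies older than 3.4.

```ruby
# Sketch only. GC.config was added in Ruby 3.4, so the guard keeps the
# snippet runnable on older versions, where it simply does nothing.
if GC.respond_to?(:config)
  # Stop the GC from starting major collections on its own.
  GC.config(rgengc_allow_full_mark: false)
end

# ... application code allocates objects here ...

# nil when no major collection is pending, otherwise a Symbol naming the reason.
pending_reason = GC.latest_gc_info(:need_major_by)

# An explicit GC.start still runs a major collection, even while
# automatic ones are disabled.
GC.start if pending_reason
```

The point of the split is that the GC only records that it wanted a major collection, and the application decides when to actually pay for it.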

This new feature was released as part of Ruby 3.4.0-preview2.

Effectiveness

Since Shopify’s monolith runs on Ruby’s master branch, we don’t have to wait for the December release to use these new features. So recently I went to work on enabling the new out-of-band GC implementation on 50% of production servers, and the results are amazing on all metrics.

First, as we anticipated, the time spent in GC during request cycles at the very tail end (p95/p99/p99.99) dropped very significantly.

However, more surprisingly, it also improved median latency.

The overall impact on service latency is of course more modest, but still very nice, with a 5% reduction of average latency and a 10% reduction of p99 latency.

The impact on capacity, however, is less significant than we had hoped for. During the day, when there are frequent deploys, this doesn’t make much of a difference. However, when deploys pause for a few hours, the new out-of-band collector runs much less often than the old implementation.

Implementation

In addition to being more effective, this new implementation is also radically simple, thanks to the hooks provided by Pitchfork:

# pitchfork.conf.rb

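after_worker_fork do |_server, _worker|
  # Never run a major GC automatically inside this worker.
  GC.config(rgengc_allow_full_mark: false)
end

after_request_complete do |_server, _worker, _rack_env|
  # Between requests, run the major GC that the collector asked for, if any.
  if GC.latest_gc_info(:need_major_by)
    GC.start
  end
end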

Next Steps?

Now that major collection is out of the picture, the next step is to look at minor collections.

We can’t disable minor collection, as otherwise large requests that allocate a lot would run out of memory. However, we could try to additionally use heuristics from GC.stat to eagerly trigger minor garbage collection out-of-band, so that the majority of requests don’t have to spend any time at all in GC.

But the potential gains are much smaller because minor collection is quite fast even on our monolith.
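To give an idea of what such a heuristic could look like, here is a minimal sketch. The `OutOfBandMinorGC` module, its method names and the 20% threshold are all hypothetical and untuned. We haven’t settled on which GC.stat counters to use, so treat the free-slot ratio below as an assumption rather than a recommendation.

```ruby
# Hypothetical helper, not part of Ruby or Pitchfork.
module OutOfBandMinorGC
  # Assumed threshold: collect when fewer than 20% of heap slots are free.
  FREE_SLOTS_RATIO = 0.20

  # Decide from GC.stat counters whether the heap is close to full.
  # Accepts a stat hash so the decision can be checked with fixed numbers.
  def self.needed?(stat = GC.stat)
    available = stat.fetch(:heap_available_slots)
    return false if available.zero?

    stat.fetch(:heap_free_slots).fdiv(available) < FREE_SLOTS_RATIO
  end

  # Request a minor collection if the heuristic says so.
  # Returns true when a collection was requested, false otherwise.
  def self.run_if_needed
    return false unless needed?

    GC.start(full_mark: false)
    true
  end
end
```

Calling `OutOfBandMinorGC.run_if_needed` from the same `after_request_complete` hook would keep the collection outside of the request cycle. Note that the free-slot counter is only refreshed as the GC sweeps, so it is a rough signal at best.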