<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://railsatscale.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://railsatscale.com/" rel="alternate" type="text/html" /><updated>2026-04-15T09:46:23+00:00</updated><id>https://railsatscale.com/feed.xml</id><title type="html">Rails at Scale</title><subtitle>The Ruby and Rails Infrastructure team at Shopify exists to help ensure that Ruby and Rails are 100-year tools that will continue to merit being our toolchain of choice.</subtitle><author><name>Shopify Engineering</name></author><entry><title type="html">Using Perfetto in ZJIT</title><link href="https://railsatscale.com/2026-03-27-using-perfetto-in-zjit/" rel="alternate" type="text/html" title="Using Perfetto in ZJIT" /><published>2026-03-27T00:00:00+00:00</published><updated>2026-03-27T00:00:00+00:00</updated><id>https://railsatscale.com/2026-03-27-using-perfetto-in-zjit/</id><content type="html" xml:base="https://railsatscale.com/2026-03-27-using-perfetto-in-zjit/"><![CDATA[<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<p>Look! A trace of slow events in a benchmark! Hover over the image to see it get bigger.</p>

<style>
img.hover-zoom:hover {
  transform: scale(2);
  transition: transform 0.1s ease-in;
}
img.hover-zoom:not(:hover) {
  transition: transform 0.1s ease-out;
}
</style>

<figure><img src="demo.png" alt="A sneak preview of what the trace looks like." class="hover-zoom"><figcaption>A sneak preview of what the trace looks like.</figcaption></figure>

<p>Now read on to see what the slow events are and how we got this pretty picture.</p>

<h2 id="the-rules">The rules</h2>

<p>The first rule of just-in-time compilers is: you stay in JIT code. The second
rule of JIT is: you STAY in JIT code!</p>

<p>When control leaves the compiled code to run in the interpreter—what the ZJIT
team calls either a “side-exit” or a “deopt”, depending on who you talk
to—things slow down. In a well-tuned system, this should happen pretty
rarely. Right now, because we’re still bringing up the compiler and runtime
system, it happens more than we would like.</p>

<p>We’re reducing the number of exits over time.</p>

<h2 id="lies-damned-lies-and-statistics">Lies, damned lies, and statistics</h2>

<p>We can track our side-exit reduction progress with <code class="language-plaintext highlighter-rouge">--zjit-stats</code>, which,
on process exit, prints out a tidy summary of the counters for all of the bad
stuff we track. It’s got side-exits. It’s got calls to C code. It’s got calls
to slow-path runtime helpers. It’s got everything.</p>

<p>Here is a chopped-up sample of stats output for the Lobsters benchmark,
which is a large Rails app:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ WARMUP_ITRS=0 MIN_BENCH_ITRS=20 MIN_BENCH_TIME=0 ruby --zjit-stats benchmarks/lobsters/benchmark.rb
...
***ZJIT: Printing ZJIT statistics on exit***
...
Top-20 side exit reasons (100.0% of total 12,549,876):
                   guard_type_failure: 6,020,734 (48.0%)
                  guard_shape_failure: 5,556,147 (44.3%)
  block_param_proxy_not_iseq_or_ifunc:   445,358 ( 3.5%)
                   unhandled_hir_insn:   215,168 ( 1.7%)
                        compile_error:   181,474 ( 1.4%)
...
compiled_iseq_count:                               5,581
failed_iseq_count:                                     2
compile_time:                                    1,443ms
...
guard_type_count:                            133,425,094
guard_type_exit_ratio:                              4.5%
guard_shape_count:                            49,386,694
guard_shape_exit_ratio:                            11.3%
...
code_region_bytes:                            31,571,968
side_exit_size_ratio:                              33.1%
zjit_alloc_bytes:                             19,329,659
total_mem_bytes:                              50,901,627
...
ratio_in_zjit:                                     82.8%
$
</code></pre></div></div>

<p>(I’ve cut out significant chunks of the stats output and replaced them with
<code class="language-plaintext highlighter-rouge">...</code> because it’s overwhelming the first time you see it.)</p>

<p>The first thing you might note is that the thing I just described as terrible
for performance is happening <em>over twelve million times</em>. The second thing you
might notice is that despite this, we seem to be staying in JIT code a high
percentage of the time. Or are we? Is 80% high? Is a 4.5% class guard miss
ratio high? What about 11% for shapes? It’s hard to say.</p>

<p>The counters are great because they’re <em>quick</em> and they’re reasonably stable
proxies for performance. There’s no substitute for painstaking measurements on
a quiet machine, but if the counter for Bad Slow Thing goes down (and others do
not go up), we’re probably doing a good job.</p>

<p>But they’re not great for building intuition. For intuition, we want more
tangible-feeling numbers. We want to see things.</p>

<h2 id="building-intuition">Building intuition</h2>

<p>The third thing you might do is ask yourself “where are these exits
coming from?” Unfortunately, counters cannot tell you that. For that, we
want stack traces. These let us know where in the guest (Ruby) code an exit
is triggered.</p>

<p>Ideally, we would also want some notion of time: we would want to know not just
where these events happen but also when. Are the exits happening early, at
application boot? During warmup? Even during what should be steady-state
application time? Hard to say.</p>

<p>So we need more tools. Thankfully, <a href="https://perfetto.dev/">Perfetto</a> exists.
Perfetto is a system for visualizing and analyzing traces and profiles that your
application generates. It has both a web UI and a command-line UI.</p>

<p>We can emit traces for Perfetto and visualize them there.</p>

<h2 id="a-look-at-perfetto">A look at Perfetto</h2>

<p>Take a look at this <a href="https://ui.perfetto.dev/#!/?url=https://railsatscale.com/2026-03-27-using-perfetto-in-zjit/perfetto-36885.fxt">sample ZJIT Perfetto
trace</a>
generated by running Ruby with <code class="language-plaintext highlighter-rouge">--zjit-trace-exits</code><sup id="fnref:sampled"><a href="#fn:sampled" class="footnote" rel="footnote" role="doc-noteref">1</a></sup>. What do you see?</p>

<p>I see a couple arrows on the left. Arrows indicate “instant” point-in-time
events. Then I see a mess of purple to the right of that until the end of the
trace.</p>

<p>Hover over an arrow. Find out that each arrow is a side-exit. Scream silently.</p>

<p>But it’s a friendly arrow. It tells you what the side-exit reason is. If you
click it, it even tells you the stack trace in the pop-up panel on the bottom.
If we click a couple of them, maybe we can learn more.</p>

<p>We can also zoom by mousing over the track, holding Ctrl, and scrolling. That
will let us look closer. But there are so many…</p>

<p>Fortunately, Perfetto also provides a SQL interface to the traces. We can write
a query to aggregate all of the side exit events from the <code class="language-plaintext highlighter-rouge">slice</code> table and
line them up with the topmost method from the backtrace arguments in the <code class="language-plaintext highlighter-rouge">args</code>
table:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span>
  <span class="n">s</span><span class="p">.</span><span class="n">name</span> <span class="k">AS</span> <span class="n">reason</span><span class="p">,</span>
  <span class="n">a</span><span class="p">.</span><span class="n">display_value</span> <span class="k">AS</span> <span class="k">method</span><span class="p">,</span>
  <span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span> <span class="k">AS</span> <span class="k">count</span>
<span class="k">FROM</span> <span class="n">slice</span> <span class="n">s</span>
<span class="k">JOIN</span> <span class="n">args</span> <span class="n">a</span> <span class="k">ON</span> <span class="n">a</span><span class="p">.</span><span class="n">arg_set_id</span> <span class="o">=</span> <span class="n">s</span><span class="p">.</span><span class="n">arg_set_id</span> <span class="k">AND</span> <span class="n">a</span><span class="p">.</span><span class="k">key</span> <span class="o">=</span> <span class="s1">'0'</span>
<span class="k">GROUP</span> <span class="k">BY</span> <span class="n">s</span><span class="p">.</span><span class="n">name</span><span class="p">,</span> <span class="n">a</span><span class="p">.</span><span class="n">display_value</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="k">count</span> <span class="k">DESC</span>
</code></pre></div></div>

<p>This pulls up a query box at the bottom showing us that there are a couple big
hotspots:</p>

<figure><img src="method-query.png" alt="Query results showing in columns left to right: reason for side-exit, method
that exited, and count. The top three are above 1k but it quickly falls off
after that." class="hover-zoom"><figcaption>Query results showing in columns left to right: reason for side-exit, method
that exited, and count. The top three are above 1k but it quickly falls off
after that.</figcaption></figure>

<p>It even has a helpful option to export the results as a Markdown table so I can
paste (an edited version) into this blog post:</p>

<div style="overflow-x: auto; font-size: 0.65em; margin-left: max(-10em, calc(-50vw + 50%)); margin-right: max(-10em, calc(-50vw + 50%));">

  <table>
    <thead>
      <tr>
        <th>reason</th>
        <th>method</th>
        <th>count</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>GuardShape(ShapeId(2475))</td>
        <td>ActiveModel::AttributeRegistration::ClassMethods#attribute_types</td>
        <td>5119</td>
      </tr>
      <tr>
        <td>GuardShape(ShapeId(2099268))</td>
        <td>ActiveRecord::ConnectionAdapters::AbstractAdapter#extended_type_map_key</td>
        <td>2295</td>
      </tr>
      <tr>
        <td>GuardType(FalseClass)</td>
        <td>ActiveModel::Type::Value#cast</td>
        <td>1025</td>
      </tr>
      <tr>
        <td>GuardShape(ShapeId(2099698))</td>
        <td>ActiveRecord::Associations#association_instance_get</td>
        <td>904</td>
      </tr>
      <tr>
        <td>BlockParamProxyNotIseqOrIfunc</td>
        <td>ActiveRecord::AttributeMethods::Read#_read_attribute</td>
        <td>902</td>
      </tr>
      <tr>
        <td>GuardShape(ShapeId(526450))</td>
        <td>Rack::Request::Env#get_header</td>
        <td>636</td>
      </tr>
      <tr>
        <td>GuardType(Class[class_exact*:Class@VALUE(0x128c60100)])</td>
        <td>ActiveRecord::Base._reflections</td>
        <td>622</td>
      </tr>
      <tr>
        <td>GuardType(ObjectSubclass[class_exact:Story])</td>
        <td>ActiveRecord::Associations#association</td>
        <td>565</td>
      </tr>
      <tr>
        <td>GuardShape(ShapeId(2098982))</td>
        <td>ActiveRecord::Reflection::AssociationReflection#polymorphic?</td>
        <td>510</td>
      </tr>
      <tr>
        <td>GuardType(StringSubclass[class_exact:ActiveSupport::SafeBuffer])</td>
        <td>ActionView::OutputBuffer#&lt;&lt;</td>
        <td>500</td>
      </tr>
      <tr>
        <td>GuardShape(ShapeId(2475))</td>
        <td>ActiveRecord::AttributeMethods::PrimaryKey::ClassMethods#primary_key</td>
        <td>492</td>
      </tr>
      <tr>
        <td>GuardType(ObjectSubclass[class_exact:ActiveModel::Type::String])</td>
        <td>ActiveModel::Type::Value#deserialize</td>
        <td>442</td>
      </tr>
      <tr>
        <td>GuardShape(ShapeId(2098982))</td>
        <td>ActiveRecord::Reflection::AssociationReflection#deprecated?</td>
        <td>376</td>
      </tr>
      <tr>
        <td>GuardType(ObjectSubclass[class_exact:Bundler::Dependency])</td>
        <td>Gem::Dependency#matches_spec?</td>
        <td>355</td>
      </tr>
      <tr>
        <td>UnhandledHIRInvokeBuiltin</td>
        <td>Time#initialize</td>
        <td>346</td>
      </tr>
    </tbody>
  </table>

</div>

<p>Looks like we should figure out why we’re getting so many shape misses; fixing
that will clear up a lot of exits. (Hint: it’s because once we make our first guess about
what we think the object shape will be, we don’t re-assess… <strong>yet</strong>.)</p>

<p>This has been a taste of Perfetto. There’s probably a lot more to explore.
Please join the <a href="https://zjit.zulipchat.com">ZJIT Zulip</a> and let us know if you have any cool
tracing or exploring tricks.</p>

<p>Now I’ll explain how you too can use Perfetto from your system. Adding support
to ZJIT was pretty straightforward.</p>

<h2 id="implementation">Implementation</h2>

<p>The first thing is that you’ll need some way to get trace data out of your
system. We write to a file with a well-known location
(<code class="language-plaintext highlighter-rouge">/tmp/perfetto-PID.fxt</code>), but you could do any number of things. Perhaps you
can stream events over a socket to another process, or to a server that
aggregates them, or store them internally and expose a webserver that serves
them over the internet, or… anything, really.</p>

<p>Once you have that, you need a couple lines of code to emit the data. Perfetto
accepts a number of formats. For example, in his <a href="https://thume.ca/2023/12/02/tracing-methods/">excellent blog post</a>,
Tristan Hume opens with such a simple snippet of code for logging Chromium
Trace JSON-formatted events (lightly modified by me):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">event_name</span> <span class="o">=</span> <span class="bp">...</span>
<span class="n">timestamp</span> <span class="o">=</span> <span class="bp">...</span>
<span class="n">duration</span> <span class="o">=</span> <span class="bp">...</span>
<span class="n">f</span> <span class="o">=</span> <span class="nf">open</span><span class="p">(</span><span class="sh">'</span><span class="s">trace.json</span><span class="sh">'</span><span class="p">,</span><span class="sh">'</span><span class="s">a</span><span class="sh">'</span><span class="p">)</span>
<span class="n">f</span><span class="p">.</span><span class="nf">write</span><span class="p">(</span><span class="sh">"</span><span class="s">[</span><span class="se">\n</span><span class="sh">"</span><span class="p">)</span>

<span class="c1"># ... emit some events here ...
</span>
<span class="c1"># Log a single event
</span><span class="n">f</span><span class="p">.</span><span class="nf">write</span><span class="p">(</span><span class="sh">'</span><span class="s">{</span><span class="sh">"</span><span class="s">name</span><span class="sh">"</span><span class="s">: </span><span class="sh">"</span><span class="s">%s</span><span class="sh">"</span><span class="s">, </span><span class="sh">"</span><span class="s">ts</span><span class="sh">"</span><span class="s">: %d, </span><span class="sh">"</span><span class="s">dur</span><span class="sh">"</span><span class="s">: %d, </span><span class="sh">"</span><span class="s">cat</span><span class="sh">"</span><span class="s">: </span><span class="sh">"</span><span class="s">hi</span><span class="sh">"</span><span class="s">, </span><span class="sh">"</span><span class="s">ph</span><span class="sh">"</span><span class="s">: </span><span class="sh">"</span><span class="s">X</span><span class="sh">"</span><span class="s">, </span><span class="sh">"</span><span class="s">pid</span><span class="sh">"</span><span class="s">: 1, </span><span class="sh">"</span><span class="s">tid</span><span class="sh">"</span><span class="s">: 1, </span><span class="sh">"</span><span class="s">args</span><span class="sh">"</span><span class="s">: {}},</span><span class="se">\n</span><span class="sh">'</span> <span class="o">%</span>
  <span class="p">(</span><span class="n">event_name</span><span class="p">,</span> <span class="n">timestamp</span><span class="p">,</span> <span class="n">duration</span><span class="p">))</span>

<span class="c1"># ... emit some events here ...
</span>
<span class="c1"># ... at process exit, close the file ...
</span><span class="n">f</span><span class="p">.</span><span class="nf">write</span><span class="p">(</span><span class="sh">"</span><span class="s">]</span><span class="sh">"</span><span class="p">)</span> <span class="c1"># this closing ] isn't actually required
</span><span class="n">f</span><span class="p">.</span><span class="nf">close</span><span class="p">()</span>
</code></pre></div></div>

<p>This snippet is great. It shows, end-to-end, how to write a stream containing one
event. It is a <em>complete</em> (X) event, as opposed to either:</p>

<ul>
  <li>two discrete timestamped <em>begin</em> (B) and <em>end</em> (E) events that book-end
something, or</li>
  <li>an <em>instant</em> (i) event that has no duration, or</li>
  <li>a couple other event types in the <a href="https://docs.google.com/document/d/1CvAClvFfyA5R-PhYUmn5OOQtYMH4h6I0nSsKchNAySU/preview">Chromium Trace Event Format doc</a>
</li>
</ul>
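
<p>To make the other event types concrete, here is a small sketch in the same
spirit (in Ruby this time, and not code from ZJIT or from Tristan’s post): it logs a
begin/end pair and an instant event in the Chromium Trace JSON format. The event
names, timestamps, and pid/tid values are made up for illustration.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Illustrative only: the same append-to-a-JSON-array scheme as above,
# but using begin (B) / end (E) and instant (i) events.
require "json"

f = File.open("trace.json", "a")
f.write("[\n")

# Book-end a hypothetical compile with discrete begin and end events.
f.write(JSON.generate({ name: "compile", ph: "B", ts: 100, pid: 1, tid: 1, cat: "zjit" }) + ",\n")
f.write(JSON.generate({ name: "compile", ph: "E", ts: 950, pid: 1, tid: 1, cat: "zjit" }) + ",\n")

# Log a hypothetical side-exit as an instant event with no duration.
f.write(JSON.generate({ name: "side_exit", ph: "i", ts: 1200, pid: 1, tid: 1, cat: "zjit", s: "t" }) + ",\n")

f.close
</code></pre></div></div>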

<p>It was enough to get me started. Since it’s JSON, and we have a lot of side
exits, the trace quickly ballooned to 8GB for a benchmark lasting only a few
seconds. Not great. Now, part of this is our fault—we should side-exit
less—and part of it is just the verbosity of JSON.</p>

<p>Thankfully, Perfetto ingests more compact binary formats, such as the <a href="https://fuchsia.dev/fuchsia-src/reference/tracing/trace-format">Fuchsia
trace format</a>.
In addition to being more compact, FXT even supports string interning. After
modifying the tracer to emit FXT, we ended up with closer to 100MB for the same
benchmark.</p>
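
<p>String interning does a lot of the work there. As a rough illustration of the
idea (and explicitly <em>not</em> the FXT wire format), interning means each repeated
string is written once and later events refer to it by a small id:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Illustrative only: map repeated strings to small integer ids.
# A reason like "GuardShape(ShapeId(2475))" that appears millions of times
# is stored once; every later event records only its id.
def intern(table, str)
  table[str] ||= table.size + 1  # assign the next id the first time we see str
end

table = {}
intern(table, "GuardShape(ShapeId(2475))")  # => 1
intern(table, "GuardShape(ShapeId(2475))")  # => 1 again, nothing new stored
</code></pre></div></div>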

<p>We can reduce the size further by <em>sampling</em>—not writing every exit to the trace, but
instead only every <em>K</em>-th exit (for some, probably prime, K). This is why we provide
the <code class="language-plaintext highlighter-rouge">--zjit-trace-exits-sample-rate=K</code> option.</p>
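
<p>The sampling logic itself is tiny. Here is a minimal sketch of the idea in Ruby
(ZJIT’s actual implementation is in Rust, and the names below are made up):</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Illustrative only: keep a counter and record every K-th exit.
SAMPLE_RATE = 97  # some (probably prime) K, as set by --zjit-trace-exits-sample-rate

$exit_count = 0

def maybe_record_exit(reason, backtrace)
  $exit_count += 1
  return unless ($exit_count % SAMPLE_RATE).zero?
  record_exit_event(reason, backtrace)  # hypothetical helper that appends to the trace file
end
</code></pre></div></div>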

<p>Check out the <a href="https://github.com/ruby/ruby/blob/eb8051185122d4b7bc9c6a6df694a85f34ced681/zjit/src/stats.rs#L988">trace writer</a> implementation as of the time this article
was written.</p>

<h2 id="tracing-more-things">Tracing more things</h2>

<p>We could trace:</p>

<ul>
  <li>When methods get compiled</li>
  <li>How big the generated code is</li>
  <li>How long each compile phase takes</li>
  <li>When (and where) invalidation events happen</li>
  <li>When (and where) allocations happen from JITed code</li>
  <li>Garbage collection events</li>
  <li>and more!</li>
</ul>

<h2 id="conclusion">Conclusion</h2>

<p>Visualizations are awesome. Get your data in the right format so you can ask
the right questions easily. Thanks, Perfetto!</p>

<p>Also, it looks like visualizations are now available in Perfetto canary. Time to
go make some fun histograms…</p>
<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:sampled">
      <p>This is also sampled/strobed, so not every exit is in there. This
is just 1/K of them for some K that I don’t remember. <a href="#fnref:sampled" class="reversefootnote" role="doc-backlink">↩</a></p>
    </li>
  </ol>
</div>
</body></html>]]></content><author><name>Max Bernstein</name></author><category term="posts" /><category term="2026-03-27-using-perfetto-in-zjit" /><summary type="html"><![CDATA[We added Perfetto tracing support to ZJIT so we could visualize and query slow events. Take a look at the pretty colors and see how you can add this to your system too.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://railsatscale.com/2026-03-27-using-perfetto-in-zjit/5e25b1e658f8301440e1e91eafbb48286c0748f0.png" /><media:content medium="image" url="https://railsatscale.com/2026-03-27-using-perfetto-in-zjit/5e25b1e658f8301440e1e91eafbb48286c0748f0.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Engineering Rigor in the AI Age: Building a Benchmark You Can Trust</title><link href="https://railsatscale.com/2026-03-18-engineering-rigor-in-the-ai-age-building-a-benchmark-you-can-trust/" rel="alternate" type="text/html" title="Engineering Rigor in the AI Age: Building a Benchmark You Can Trust" /><published>2026-03-18T00:00:00+00:00</published><updated>2026-03-18T00:00:00+00:00</updated><id>https://railsatscale.com/2026-03-18-engineering-rigor-in-the-ai-age-building-a-benchmark-you-can-trust/</id><content type="html" xml:base="https://railsatscale.com/2026-03-18-engineering-rigor-in-the-ai-age-building-a-benchmark-you-can-trust/"><![CDATA[<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<p>The Rails Infrastructure team has been <a href="https://railsatscale.com/2026-03-09-faster-bundler/" target="_blank">working on making Bundler faster</a> and our work has paid off. A cold <code class="language-plaintext highlighter-rouge">bundle install</code> is 3x faster on a Gemfile with 452 gems compared to Bundler 2.7. But “faster” only means something if everyone agrees on what’s being measured. If two people are running benchmarks with different definitions of “cold install” or different cache states, the results aren’t comparable. We needed a shared tool that would give us confidence, both internally and externally, that we’re tackling the right problems and actually making Bundler faster. And along the way we learned when Claude can be helpful and when to not outsource our own expertise and thinking.</p>

<h2 id="what-affects-bundle-install-time">What affects bundle install time</h2>

<p>Before building anything, we had to understand what variables go into <code class="language-plaintext highlighter-rouge">bundle install</code> performance. There are more than you’d expect:</p>

<ul>
  <li>
<strong>Number of gems</strong>: A Gemfile with 35 gems takes less time to install than a Gemfile with 500 gems.</li>
  <li>
<strong>Depth of dependencies</strong>: A flat Gemfile with no transitive dependencies resolves faster than a deep dependency tree.</li>
  <li>
<strong>Native extensions</strong>: Gems like <code class="language-plaintext highlighter-rouge">bigdecimal</code> take orders of magnitude longer to install than pure Ruby gems (<code class="language-plaintext highlighter-rouge">bigdecimal</code> alone takes 3 seconds to install). The ratio of native extension gems to pure Ruby gems in your Gemfile changes the install profile significantly. It’s rare to have a Gemfile without at least a few dependencies on native extensions.</li>
  <li>
<strong>How Ruby is compiled</strong>: Optimization flags, compiler version, and platform all affect gem compilation time.</li>
  <li>
<strong>Network time</strong>: Downloading from rubygems.org introduces latency and rate limiting that can skew results.</li>
  <li>
<strong>Number of cores</strong>: Bundler parallelizes installs across worker threads, so core count matters.</li>
  <li>
<strong>Endpoint security software</strong>: On company-issued, managed machines, security software that scans file writes adds measurable time to every gem install. Running the same benchmark on a personal, non-managed device vs. a managed device produced very different numbers with no code change. If your benchmarks aren’t reproducible across machines, this is worth checking.</li>
</ul>

<p>We needed a way to remove as many of these variables as possible so when we made changes, we could trust our benchmarks were correct. The goal was to remove guesswork, ensure everyone is testing from the same starting point, and provide a straightforward way to run benchmarks when making changes.</p>

<h2 id="what-we-built">What we built</h2>

<p>It took a few weeks to get reliable results, and as part of building the benchmark we also implemented a full <a href="https://github.com/eileencodes/bundler-perf-toolkit" target="_blank">toolkit</a> that includes scripts for installing, benchmarking, and profiling Ruby package managers.</p>

<p>Getting a reliable benchmark took a lot of iteration. The first version of the benchmark was basic, and every time we ran it we’d find something that wasn’t quite right. We worked with Claude to make the original benchmark and tweak it as we found issues with the runs. In some cases we had to do the tedious work of debugging the benchmark ourselves.</p>

<p>Back in 2018 when I was working on improving Rails integration test performance, I kept an <a href="https://github.com/eileencodes/integration_performance_test" target="_blank">entire repo</a> with all my benchmarks and profile scripts so I could track changes over time, but also so I could share it with the community. When your benchmark is open source, you’re not working in a vacuum and everyone else can check your assumptions. That lesson stuck with me, and it’s why this toolkit follows a similar pattern.</p>

<p>The toolkit includes everything you need to benchmark and profile Bundler.</p>

<ul>
  <li>
<strong>Setup scripts:</strong> Install scripts for both macOS and Linux that let you choose which package managers you want to benchmark.</li>
  <li>
<strong>Benchmarking tool:</strong> The benchmark tool uses <a href="https://github.com/sharkdp/hyperfine" target="_blank">hyperfine</a> for statistical timing with standard deviation, min/max, and outlier detection.
    <ul>
      <li>It supports running against multiple branches and package managers, switching the Ruby version, changing the number of iterations, and provides multiple Gemfile scenarios.</li>
      <li>It automatically runs both warm and cold scenarios and outputs how much faster or slower each is than the baseline.</li>
      <li>It includes a fake gemserver. Thanks to Claude I was able to quickly build a fake gemserver that served real gems based on Aaron’s <a href="https://github.com/tenderlove/slow-gemserver" target="_blank">slow-gemserver</a> and use that to eliminate deviations caused by network round trips to <a href="http://rubygems.org" target="_blank">rubygems.org</a> and/or rate limiting.</li>
    </ul>
  </li>
  <li>
<strong>Profiler tool for Bundler:</strong> The profiler tool can currently only profile Bundler, but it includes everything you need to profile with either <a href="https://github.com/mstange/samply" target="_blank">Samply</a> or <a href="https://github.com/jhawthorn/vernier" target="_blank">Vernier</a>.
    <ul>
      <li>It supports switching the Ruby version, running with cold or warm cache mode, choosing the Gemfile scenario, and changing the output path of the profile.</li>
      <li>It also can optionally use the fake gemserver to avoid profiling network time.</li>
    </ul>
  </li>
</ul>

<h2 id="how-we-defined-what-to-measure">How we defined what to measure</h2>

<p>As part of building this benchmark we also needed to define what we wanted to measure, so that everyone shares the same understanding of the scenarios we are trying to improve.</p>

<p><strong>Cold</strong> is defined as a first-ever install. Before each iteration, hyperfine’s <code class="language-plaintext highlighter-rouge">--prepare</code> hook nukes all caches: download cache, compact index cache, installed gems, bundle home, and removes the lockfile. The install has to resolve dependencies, download every gem, and install from scratch. This is the case where nothing is compiled or installed.</p>

<p><strong>Warm</strong> is defined as reinstalling gems that have been downloaded previously. The benchmark setup first runs one full cold install to populate the download cache. Then for each timed iteration, the <code class="language-plaintext highlighter-rouge">--prepare</code> hook removes only installed gems and the <code class="language-plaintext highlighter-rouge">.bundle</code> directory, keeping the download cache and lockfile intact. The install runs with <code class="language-plaintext highlighter-rouge">BUNDLE_FROZEN=1</code> so it skips resolution and only extracts and installs.</p>
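
<p>A rough sketch of how those two <code class="language-plaintext highlighter-rouge">--prepare</code> hooks might differ is below. The paths and directory layout are illustrative assumptions, not a copy of the toolkit’s code.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Illustrative only: cold resets everything, warm keeps the download cache and lockfile.
cache_root = ".caches/master"  # hypothetical per-label cache directory

cold_prepare = [
  "rm -rf #{cache_root}/gems",         # installed gems
  "rm -rf #{cache_root}/bundle-home",  # bundle home, including compact index and download caches
  "rm -f Gemfile.lock",                # force a full resolve
].join(" &amp;&amp; ")

warm_prepare = [
  "rm -rf #{cache_root}/gems",         # installed gems only
  "rm -rf .bundle",                    # project-local Bundler state
].join(" &amp;&amp; ")                         # download cache and Gemfile.lock stay intact

# hyperfine then gets --prepare cold_prepare for the cold runs, and
# --prepare warm_prepare (plus BUNDLE_FROZEN=1) for the warm runs.
</code></pre></div></div>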

<p><strong>Getting “warm” right was harder than it sounds.</strong> <a href="https://github.com/ruby/rubygems" target="_blank">Bundler</a>, <a href="https://github.com/gel-rb/gel" target="_blank">gel</a>, <a href="https://github.com/spinel-coop/rv" target="_blank">rv</a>, and <a href="https://github.com/tobi/scint" target="_blank">scint</a> package managers all have different cache structures. Early on Bundler’s warm results were barely faster than cold, and we spent time debugging before realizing it was a cache isolation issue in the benchmark itself, not a Bundler problem. The benchmark was wrong, not the code. Interestingly, this was a case where AI wasn’t that helpful. Claude kept missing this specific environment variable, so we had to debug the hard way (this is also why the script has a <code class="language-plaintext highlighter-rouge">BENCH_DEBUG</code> mode). But it paid off because we gained a better understanding of which environment variables affect the caches.</p>

<p>In the future we may want to define other cache scenarios to measure. There are scenarios between our pre-defined cold and warm scenarios like <code class="language-plaintext highlighter-rouge">bundle update</code> or having the gems installed but no lockfile so resolution needs to occur again. The beauty of this toolkit being open source is that if there’s a scenario you want to test, we can easily add that to the benchmark script.</p>

<h2 id="using-the-benchmark-script">Using the benchmark script</h2>

<p>In order to support multiple tools we implemented a <code class="language-plaintext highlighter-rouge">--run</code> argument that is specified as a <code class="language-plaintext highlighter-rouge">LABEL:TOOL[:PATH]</code> triple. The label isolates caches, the tool selects the package manager, and the path optionally points to a local checkout or git worktree. Multiple runs can be compared in a single invocation, with the first treated as baseline.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ruby run_benchmark.rb <span class="se">\</span>
  <span class="nt">--run</span> master:bundler:~/rubygems <span class="se">\</span>
  <span class="nt">--run</span> patched:bundler:~/rubygems-patched <span class="se">\</span>
  <span class="nt">--scenario</span> rails <span class="se">\</span>
  <span class="nt">--iterations</span> 5 <span class="se">\</span>
  <span class="nt">--source</span> http://localhost:9292
</code></pre></div></div>

<p>Caches are fully isolated per label under <code class="language-plaintext highlighter-rouge">.caches/&lt;label&gt;/</code> using environment variables, so comparing two Bundler versions in the same invocation won’t contaminate results. The comparison output shows relative speed so you can quickly see how many times faster or slower your change is than the baseline.</p>
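
<p>Conceptually, that isolation amounts to pointing the cache-related environment variables at per-label directories. The exact variable set below is an assumption for illustration rather than the toolkit’s actual list.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Illustrative only: build a per-label environment so runs cannot share caches.
def isolated_env(label)
  root = File.expand_path(".caches/#{label}")
  {
    "BUNDLE_USER_HOME" => "#{root}/bundle-home",  # compact index and download caches
    "GEM_HOME"         => "#{root}/gems",         # where gems get installed
    "BUNDLE_PATH"      => "#{root}/gems",
  }
end

# Each benchmarked command then runs with its own environment, e.g.:
#   system(isolated_env("master"), "bundle install")
</code></pre></div></div>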

<p>Each scenario is just a directory containing a Gemfile. The <code class="language-plaintext highlighter-rouge">rails</code> scenario represents a typical Rails application with 35 gems. The <code class="language-plaintext highlighter-rouge">large</code> scenario is a stress test with 452 gems. You can add your own by creating a directory with a Gemfile and passing <code class="language-plaintext highlighter-rouge">--scenario yourdir</code> to the script.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ruby run_benchmark.rb --run bundler27:bundler:~/bundler27 --run master:bundler:~/rubygems/ --scenario large --iterations 3 --source http://localhost:9292 --ruby /usr/local/bin/ruby
Benchmark matrix
Ruby: ruby 4.0.1 (2026-01-13 revision e04267a14b) +PRISM [x86_64-linux]
Source: http://localhost:9292
Iterations: 3
Runs:
  bundler27: bundler (/home/ubuntu/bundler27)
  master: bundler (/home/ubuntu/rubygems)

=== Scenario: large (452 gems, bundler27) ===
  Running cold benchmark (3 runs)...
Benchmark 1: bundler27 (cold)
  Time (mean ± σ):     51.368 s ±  0.058 s    [User: 106.055 s, System: 22.181 s]
  Range (min … max):   51.332 s … 51.435 s    3 runs

  Running warm benchmark (3 runs)...
Benchmark 1: bundler27 (warm)
  Time (mean ± σ):      8.895 s ±  0.150 s    [User: 7.743 s, System: 4.658 s]
  Range (min … max):    8.742 s …  9.042 s    3 runs

  Cold median: 51.34s  Warm median: 8.9s

=== Scenario: large (452 gems, master) ===
  Running cold benchmark (3 runs)...
Benchmark 1: master (cold)
  Time (mean ± σ):     16.012 s ±  0.023 s    [User: 107.344 s, System: 21.568 s]
  Range (min … max):   15.987 s … 16.033 s    3 runs

  Running warm benchmark (3 runs)...
Benchmark 1: master (warm)
  Time (mean ± σ):      7.202 s ±  0.027 s    [User: 4.908 s, System: 3.061 s]
  Range (min … max):    7.173 s …  7.228 s    3 runs

  Cold median: 16.02s  Warm median: 7.2s

Results written to /home/ubuntu/bundler-bench/results/bundler27_20260318_161305.json
Results written to /home/ubuntu/bundler-bench/results/master_20260318_161305.json

=== Comparison Summary ===

Scenario: large (452 gems)
                             Cold     +/-                        Warm     +/-
  ------------------------------------------------------------------------------
  bundler27                51.34s   0.06s  baseline             8.90s   0.15s  baseline
  master                   16.02s   0.02s  3.21x faster         7.20s   0.03s  1.24x faster
</code></pre></div></div>

<p><em>Note these numbers will vary across macOS and Linux, as well as on machines with endpoint security software. While we aimed to reduce many variables, you still may not see the same numbers; however, the speedup should be between 2-3.5x for a cold bundle install and 1-1.5x for a warm one. This script was run on an AWS sandbox and therefore has no other traffic or endpoint security altering the numbers. It is also using the fake gemserver, so network round trips aren’t involved.</em></p>

<h3 id="using-the-profiling-script">Using the profiling script</h3>

<p>Benchmarks tell you whether something got faster. Profiles tell you why it’s slow. The profiling tool runs a single <code class="language-plaintext highlighter-rouge">bundle install</code> under <a href="https://github.com/jhawthorn/vernier" target="_blank">Vernier</a> or <a href="https://github.com/mstange/samply" target="_blank">samply</a> to produce flamegraphs.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ruby profile_bundler.rb <span class="se">\</span>
  <span class="nt">--run</span> master:bundler:~/rubygems <span class="se">\</span>
  <span class="nt">--scenario</span> rails <span class="se">\</span>
  <span class="nt">--mode</span> warm <span class="se">\</span>
  <span class="nt">--profiler</span> vernier
</code></pre></div></div>

<p>It supports both cold and warm modes so you can profile the specific phase you’re investigating. Profiles are written to <code class="language-plaintext highlighter-rouge">profiles/</code> with filenames that include the label, scenario, mode, platform, and timestamp so you can compare across runs and machines.</p>

<p>Here’s an example of the Vernier output for the <code class="language-plaintext highlighter-rouge">master</code> branch on the AWS Linux sandbox in cold cache mode.</p>

<figure><img src="./vernier-linux-cold-bundler-master.png" alt="Vernier flamegraph for cold bundle install on Linux"><figcaption>Vernier flamegraph for cold bundle install on Linux</figcaption></figure>

<p><em>See all the yellow bars? Those are native extensions compiling and blocking the threads from doing other work.</em></p>

<h3 id="setup-scripts">Setup scripts</h3>

<p>Reproducing someone else’s benchmark results is only possible if you’re starting from the same place. The repository includes setup scripts for both macOS and Linux which will install Ruby, hyperfine, and profiling tools, and clone the repos you need:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>./setup-benchmark-mac.sh <span class="nt">--tools</span> bundler,bundler27,gel,scint
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">--tools</code> flag lets you pick which package managers or versions to install. It defaults to <code class="language-plaintext highlighter-rouge">bundler,bundler27</code> so you can compare the current master against the last stable release without extra setup. We wanted anyone on the team (or in the community) to be able to spin up a fresh machine and get comparable results without hunting down the right Ruby version, compiler flags, or repo branches.</p>

<h2 id="what-we-learned">What we learned</h2>

<p>A shared benchmark is a source of truth. When someone says “my change is faster” and someone else disagrees, you need a neutral tool that everyone agreed on beforehand. Without that, performance discussions turn into competing anecdotes. The toolkit gives us a way to settle those disagreements with data instead of intuition.</p>

<p>Reproducibility matters just as much. If you can’t reproduce similar results on someone else’s machine, you can’t verify the claims. “It’s faster on my machine” isn’t useful if it’s not faster for everyone, or worse, faster on Linux but slower on macOS. When results differ across machines, we can start asking why instead of arguing about whether.</p>

<p>Back in 2018 I gave a talk called <a href="https://www.youtube.com/watch?v=oT74HLvDo_A" target="_blank">How to Performance</a> which was on the surface about how I sped up integration tests in Rails, but really it was a talk about how to write benchmarks you could trust so you know when you actually made something faster. Many of the lessons I learned back then came up again during this project.</p>

<p>Profiles and guesswork are only one part of the equation. A profile can show you a hot spot, and you can write a fix that looks faster, but without a proper benchmark you don’t actually know. You don’t know if the gain on macOS is a regression on Linux. You don’t know if “cold” got faster but “warm” got slower. You don’t know if the improvement holds across different Gemfile sizes. The benchmark is what turns a hypothesis into evidence.</p>

<p>This matters even more now than it did in 2018. Engineering rigor is more important in the AI world than it was before. It’s easy to generate output that looks correct or looks faster. It’s easy to make a benchmark that looks reasonable but cheats on the warm caches. AI is good at producing plausible code and plausible explanations. Humans are good at critical thinking and using our gut to know when something doesn’t look right. We have taste, discernment, and scrutiny; AI has data.</p>

<p>That’s not a dig on AI. I used Claude extensively throughout this project. It was great at writing the setup scripts, which are uninteresting and error prone, and it wrote the original benchmark tool. But it also got things wrong. The warm cache bug I mentioned earlier? Claude missed setting <code class="language-plaintext highlighter-rouge">BUNDLE_USER_HOME</code> in the environment, which meant Bundler was writing to the system bundle home instead of the isolated one. Warm caches on the master branch looked broken because they were being shared across runs. I spent time debugging Bundler before I realized the benchmark itself was wrong. Claude didn’t catch it because it doesn’t have the deep institutional knowledge of how Bundler’s cache layers interact. I caught it because I knew what the numbers should look like and they didn’t add up.</p>

<p>That’s not a reason to stop using AI. It’s a reminder to not outsource our thinking and to always test our assumptions. Applying engineering rigor is how we can be sure the work we’re doing, whether it’s us or AI doing it, is valid and achieves our goals.</p>

<p>The toolkit is available at <a href="https://github.com/eileencodes/bundler-perf-toolkit" target="_blank">bundler-perf-toolkit</a>. If you’re working on Bundler performance or just curious about how your Gemfile affects install times, give it a try. We welcome PRs with new scenarios, corrections to cache handling if you spot something we got wrong, and support for other tools to test against.</p>
</body></html>]]></content><author><name>Eileen Alayce</name></author><category term="posts" /><category term="2026-03-18-engineering-rigor-in-the-ai-age-building-a-benchmark-you-can-trust" /><summary type="html"><![CDATA[The Rails Infrastructure team built an open-source benchmarking toolkit to reliably measure Bundler performance improvements, and along the way learned that AI is great for scaffolding tools but engineering rigor — trusting your gut when numbers don't add up — is something you can't outsource.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://railsatscale.com/2026-03-18-engineering-rigor-in-the-ai-age-building-a-benchmark-you-can-trust/755f034da9ce9efc4494dca56129e6586b81262c.png" /><media:content medium="image" url="https://railsatscale.com/2026-03-18-engineering-rigor-in-the-ai-age-building-a-benchmark-you-can-trust/755f034da9ce9efc4494dca56129e6586b81262c.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">How ZJIT removes redundant object loads and stores</title><link href="https://railsatscale.com/2026-03-18-how-zjit-removes-redundant-object-loads-and-stores/" rel="alternate" type="text/html" title="How ZJIT removes redundant object loads and stores" /><published>2026-03-18T00:00:00+00:00</published><updated>2026-03-18T00:00:00+00:00</updated><id>https://railsatscale.com/2026-03-18-how-zjit-removes-redundant-object-loads-and-stores/</id><content type="html" xml:base="https://railsatscale.com/2026-03-18-how-zjit-removes-redundant-object-loads-and-stores/"><![CDATA[<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<h2 id="intro">Intro</h2>
<p>Since the <a href="https://railsatscale.com/2025-12-24-launch-zjit/">post</a> at the end of last year, ZJIT has grown and
changed in some exciting ways. This is the story of how a new, self-contained
optimization pass causes ZJIT performance to surpass YJIT on an interesting
<a href="https://rubybench.github.io/benchmarks/ruby-bench.html#setivar">microbenchmark</a>. It has been 10 months since ZJIT was merged
into Ruby, and we’re now beginning to see the design differences between YJIT
and ZJIT manifest themselves in performance divergences. In this post, we will
explore the details of one new optimization in ZJIT called load-store
optimization. This implementation is part of ZJIT’s optimizer in HIR. Recall
that the structure of ZJIT looks roughly like the following.</p>

<pre><code class="language-mermaid">flowchart LR
        A(["Ruby"])
        A --&gt; B(["YARV"])
        B --&gt; C(["HIR"])
        C --&gt; D(["LIR"])
        D --&gt; E(["Assembly"])
</code></pre>

<p>This post will focus on optimization passes in HIR, or “High-level” Intermediate
Representation. At the HIR level, we have two capabilities that are distinct
from other compilation stages. Our optimizations in HIR typically utilize the
benefits of our <a href="https://bernsteinbear.com/blog/ssa/">SSA</a> representation in addition to the HIR
instruction effect system.</p>

<p>These are the current analysis passes in ZJIT without load-store optimization,
as well as the order in which the passes are executed.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">run_pass!</span><span class="p">(</span><span class="n">type_specialize</span><span class="p">);</span>
<span class="nd">run_pass!</span><span class="p">(</span><span class="n">inline</span><span class="p">);</span>
<span class="nd">run_pass!</span><span class="p">(</span><span class="n">optimize_getivar</span><span class="p">);</span>
<span class="nd">run_pass!</span><span class="p">(</span><span class="n">optimize_c_calls</span><span class="p">);</span>
<span class="nd">run_pass!</span><span class="p">(</span><span class="n">fold_constants</span><span class="p">);</span>
<span class="nd">run_pass!</span><span class="p">(</span><span class="n">clean_cfg</span><span class="p">);</span>
<span class="nd">run_pass!</span><span class="p">(</span><span class="n">remove_redundant_patch_points</span><span class="p">);</span>
<span class="nd">run_pass!</span><span class="p">(</span><span class="n">eliminate_dead_code</span><span class="p">);</span>
</code></pre></div></div>

<p>Here’s where load-store optimization gets added.</p>

<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  run_pass!(type_specialize);
  run_pass!(inline);
  run_pass!(optimize_getivar);
  run_pass!(optimize_c_calls);
<span class="gi">+ run_pass!(optimize_load_store);
</span>  run_pass!(fold_constants);
  run_pass!(clean_cfg);
  run_pass!(remove_redundant_patch_points);
  run_pass!(eliminate_dead_code);
</code></pre></div></div>

<h2 id="overview">Overview</h2>
<p>Ruby is an object-oriented programming language, so CRuby needs to have some
notion of object loads, modifications, and stores. In fact, this is a topic
already covered by another Rails at Scale <a href="https://railsatscale.com/2023-10-24-memoization-pattern-and-object-shapes/">blog post</a>. The shape
system provides performance improvements in CRuby (both interpreter and JIT),
but there is still plenty of opportunity to improve JIT performance. Sometimes
optimizing interpreter opcodes one at a time leaves repeated loads or stores
that can be cleaned up with a program analysis optimization pass. Before getting
into the weeds about this pass, let’s talk performance.</p>

<h3 id="results">Results</h3>
<p>The <code class="language-plaintext highlighter-rouge">setivar</code> <a href="https://rubybench.github.io/benchmarks/ruby-bench.html#setivar">benchmark</a> for ZJIT changes dramatically on
2026-03-06. This is when load-store optimization landed in ZJIT. At the time of
this writing, ZJIT takes an average of <code class="language-plaintext highlighter-rouge">2ms</code> per iteration on this benchmark,
while YJIT takes an average of <code class="language-plaintext highlighter-rouge">5ms</code>.</p>

<figure><img src="benchmark.png" alt='This graph shows ZJIT (yellow) and YJIT (green) as "times faster than interpreter" (blue). You can see the moment where load-store optimization is implemented and ZJIT overtakes YJIT.'><figcaption>This graph shows ZJIT (yellow) and YJIT (green) as "times faster than interpreter" (blue). You can see the moment where load-store optimization is implemented and ZJIT overtakes YJIT.</figcaption></figure>

<p>This is the second time that ZJIT has clearly surpassed YJIT. The first example
is <a href="https://rubybench.github.io/benchmarks/ruby-bench.html#object-new">here</a>.</p>

<p>At a high level, this means that ZJIT is over twice as fast as YJIT for repeated
instance variable assignment, and more than <strong>25 times</strong> faster than the
interpreter!</p>
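
<p>For a feel of what this benchmark stresses, the hot loop is repeated instance
variable writes. The snippet below is only a sketch of that shape, not the actual
<code class="language-plaintext highlighter-rouge">setivar</code> benchmark source.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Illustrative only: a loop dominated by instance variable assignment,
# roughly the shape of code the setivar microbenchmark exercises.
class Counter
  def store_loop(n)
    i = 0
    while i &lt; n
      @value = i
      i += 1
    end
    @value
  end
end

Counter.new.store_loop(1_000_000)
</code></pre></div></div>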

<h3 id="a-troubling-development">A Troubling Development</h3>
<p>However, there’s an important question we have to address - why should an
optimization pass for object loads and stores have anything to do with instance
variable assignment? It turns out that ZJIT’s High-level Intermediate Representation
(HIR) uses <code class="language-plaintext highlighter-rouge">LoadField</code> and <code class="language-plaintext highlighter-rouge">StoreField</code> instructions both for object
instance variables and for object shapes. We’re going to have to dig deeper
into CRuby shapes and ZJIT HIR internals in order to make sense of this.</p>

<h3 id="background">Background</h3>
<p>So far, we’ve learned that HIR has <code class="language-plaintext highlighter-rouge">LoadField</code> and <code class="language-plaintext highlighter-rouge">StoreField</code> instructions.
We’ve claimed that they are multi-purpose and that the performance wins come
from optimizing object shapes, but that they can also apply to object instance
variables. Because the algorithm works just as well for both situations, the
rest of this post will focus on object instance variables. This allows us to
demonstrate concepts in pure Ruby to make things more approachable.</p>

<h4 id="example">Example</h4>
<p>Let’s start with a simple example we can all agree on. Clearly this code
snippet has a double store, and we can safely remove one of the <code class="language-plaintext highlighter-rouge">@a = value</code>
calls.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">C</span>
  <span class="k">def</span> <span class="nf">initialize</span>
    <span class="n">value</span> <span class="o">=</span> <span class="mi">1</span>
    <span class="vi">@a</span> <span class="o">=</span> <span class="n">value</span>
    <span class="vi">@a</span> <span class="o">=</span> <span class="n">value</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Here’s the same code snippet showing the assignment we remove. In HIR terms, we
have elided a redundant <code class="language-plaintext highlighter-rouge">StoreField</code> instruction.</p>

<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  class C
    def initialize
      value = 1
      @a = value
<span class="gd">-     @a = value
</span>    end
  end
</code></pre></div></div>

<p>When should we remove <code class="language-plaintext highlighter-rouge">LoadField</code> and <code class="language-plaintext highlighter-rouge">StoreField</code> instructions? The HIR code
snippets will come later. For now, we only need to know the mapping between Ruby
and HIR for instance variable loads and stores.</p>

<table>
  <thead>
    <tr>
      <th>Ruby</th>
      <th>HIR</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">@var = value</code></td>
      <td><code class="language-plaintext highlighter-rouge">StoreField var, @obj@offset, value</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">@var</code></td>
      <td><code class="language-plaintext highlighter-rouge">LoadField var, @obj@offset</code></td>
    </tr>
  </tbody>
</table>

<blockquote>
  <p>Note: In a class’s <code class="language-plaintext highlighter-rouge">initialize</code> method, instance variable operations are
likely to cause <code class="language-plaintext highlighter-rouge">LoadField</code> and <code class="language-plaintext highlighter-rouge">StoreField</code> instructions due to shape
transitions. Outside of an initialize method, the loads and stores are more
likely to be related to the instance variables themselves. More complicated
Ruby code snippets would make clear which kind of <code class="language-plaintext highlighter-rouge">LoadField</code> or
<code class="language-plaintext highlighter-rouge">StoreField</code> is involved, but they would overly clutter this post.</p>
</blockquote>

<h4 id="cases">Cases</h4>
<p>Let’s consider every edge case for our algorithm through short Ruby snippets
to illustrate scenarios where we can and cannot elide <code class="language-plaintext highlighter-rouge">LoadField</code> or
<code class="language-plaintext highlighter-rouge">StoreField</code> HIR instructions.</p>

<blockquote>
  <p>Note: The following examples could replace the <code class="language-plaintext highlighter-rouge">value</code> variable with the
constant <code class="language-plaintext highlighter-rouge">1</code>, but in ZJIT this could cause other optimizations such as
constant folding to interfere with our load-store demonstrations. We will use
these more complex code snippets in case the reader wants to follow along with
<a href="http://tryzjit.fly.dev/">a compiler explorer</a>.</p>
</blockquote>

<h5 id="redundant-store">Redundant Store</h5>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">C</span>
  <span class="k">def</span> <span class="nf">initialize</span>
    <span class="n">value</span> <span class="o">=</span> <span class="mi">1</span>
    <span class="vi">@a</span> <span class="o">=</span> <span class="n">value</span>
    <span class="c1"># This store is redundant and should be elided in HIR</span>
    <span class="vi">@a</span> <span class="o">=</span> <span class="n">value</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<h5 id="redundant-load">Redundant Load</h5>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">C</span>
  <span class="k">def</span> <span class="nf">initialize</span>
    <span class="n">value</span> <span class="o">=</span> <span class="mi">1</span>
    <span class="vi">@a</span> <span class="o">=</span> <span class="n">value</span>
    <span class="c1"># We already know that this load is `value` and should be replaced</span>
    <span class="vi">@a</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<h5 id="redundant-store-with-aliasing">Redundant Store with Aliasing</h5>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">C</span>
  <span class="nb">attr_accessor</span> <span class="ss">:a</span>

  <span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
    <span class="vi">@a</span> <span class="o">=</span> <span class="n">value</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="k">class</span> <span class="nc">D</span>
  <span class="nb">attr_accessor</span> <span class="ss">:a</span>

  <span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
    <span class="vi">@a</span> <span class="o">=</span> <span class="n">value</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="k">def</span> <span class="nf">multi_object_test</span>
  <span class="n">x</span> <span class="o">=</span> <span class="no">C</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
  <span class="n">y</span> <span class="o">=</span> <span class="no">D</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
  <span class="n">new_x_val</span> <span class="o">=</span> <span class="mi">2</span>
  <span class="n">new_y_val</span> <span class="o">=</span> <span class="mi">3</span>
  <span class="n">x</span><span class="p">.</span><span class="nf">a</span> <span class="o">=</span> <span class="n">new_x_val</span>
  <span class="n">y</span><span class="p">.</span><span class="nf">a</span> <span class="o">=</span> <span class="n">new_y_val</span>
  <span class="c1"># We would like to elide this (but currently do not)</span>
  <span class="n">x</span><span class="p">.</span><span class="nf">a</span> <span class="o">=</span> <span class="n">new_x_val</span>
<span class="k">end</span>
</code></pre></div></div>
<p>With variables pointing to distinct objects, we could elide the second store to
object <code class="language-plaintext highlighter-rouge">x</code>. This is not currently implemented, but is a possible improvement
with a technique called <a href="https://bernsteinbear.com/blog/toy-tbaa/">type-based alias analysis</a>.</p>

<h5 id="required-store-with-aliasing">Required Store with Aliasing</h5>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">C</span>
  <span class="nb">attr_accessor</span> <span class="ss">:a</span>

  <span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
    <span class="vi">@a</span> <span class="o">=</span> <span class="n">value</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="k">def</span> <span class="nf">multi_object_test</span>
  <span class="n">x</span> <span class="o">=</span> <span class="no">C</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
  <span class="n">y</span> <span class="o">=</span> <span class="n">x</span>
  <span class="n">new_x_val</span> <span class="o">=</span> <span class="mi">2</span>
  <span class="n">new_y_val</span> <span class="o">=</span> <span class="mi">3</span>
  <span class="n">x</span><span class="p">.</span><span class="nf">a</span> <span class="o">=</span> <span class="n">new_x_val</span>
  <span class="n">y</span><span class="p">.</span><span class="nf">a</span> <span class="o">=</span> <span class="n">new_y_val</span>
  <span class="c1"># We should not elide the second `x.a` assignment because the `y.a` assignment modifies `x`</span>
  <span class="c1"># The `x.a` store after this comment is no longer redundant</span>
  <span class="n">x</span><span class="p">.</span><span class="nf">a</span> <span class="o">=</span> <span class="n">new_x_val</span>
<span class="k">end</span>
</code></pre></div></div>
<p>With multiple variables aliasing the same object, we cannot elide
the second store to <code class="language-plaintext highlighter-rouge">x</code>. While technically we could elide <code class="language-plaintext highlighter-rouge">y.a = new_y_val</code> and
the initial <code class="language-plaintext highlighter-rouge">y = x</code> assignment, these improvements are out of scope for this
post. The key point here is that aliasing needs to be considered. If we assume
that <code class="language-plaintext highlighter-rouge">y</code> and <code class="language-plaintext highlighter-rouge">x</code> reference different objects and elide the second
<code class="language-plaintext highlighter-rouge">x.a = new_x_val</code> call, we alter program behavior.</p>

<h5 id="required-store-with-effects">Required Store with Effects</h5>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">scary_method</span><span class="p">(</span><span class="n">obj</span><span class="p">)</span>
  <span class="n">obj</span><span class="p">.</span><span class="nf">a</span> <span class="o">=</span> <span class="s2">"We have modified the object. The second store is no longer redundant"</span>
<span class="k">end</span>

<span class="k">class</span> <span class="nc">C</span>
  <span class="nb">attr_accessor</span> <span class="ss">:a</span>

  <span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
    <span class="vi">@a</span> <span class="o">=</span> <span class="n">value</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="k">def</span> <span class="nf">effectful_operations_between_stores_test</span>
  <span class="n">x</span> <span class="o">=</span> <span class="no">C</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
  <span class="n">x</span><span class="p">.</span><span class="nf">a</span> <span class="o">=</span> <span class="mi">5</span>
  <span class="n">scary_method</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
  <span class="c1"># We want to elide this but `scary_method` can modify `x`</span>
  <span class="n">x</span><span class="p">.</span><span class="nf">a</span> <span class="o">=</span> <span class="mi">5</span>
<span class="k">end</span>
</code></pre></div></div>
<p>In this case, the second store looks redundant, but it might not be. An
arbitrary Ruby method (or C call, or some HIR instructions) could modify the <code class="language-plaintext highlighter-rouge">x</code>
object, breaking the assumptions we can make about its state. In such cases, we
cannot perform load-store optimization.</p>

<h3 id="the-algorithm">The Algorithm</h3>

<h4 id="key-idea">Key Idea</h4>
<p>With these cases, we have covered everything needed to implement our load-store
optimization algorithm. The algorithm is a lightweight
<a href="https://en.wikipedia.org/wiki/Abstract_interpretation">abstract interpretation</a> over objects. This approach allows us to
minimize the computation required to perform our optimization pass while
ensuring soundness. In layperson’s terms, this means that every load we replace
and every store we eliminate will not change program behavior, but that we will
potentially miss some loads or stores that could be eliminated.</p>

<h4 id="tricky-details">Tricky Details</h4>

<h5 id="basic-blocks">Basic Blocks</h5>
<p>Our load-store optimization pass scans through basic blocks, searches for
redundant loads and stores, and updates the HIR instructions accordingly.
Unnecessary <code class="language-plaintext highlighter-rouge">StoreField</code> operations are elided, and unnecessary <code class="language-plaintext highlighter-rouge">LoadField</code>
operations are replaced with the instruction already holding the value. While
one key benefit of ZJIT is that it can optimize entire methods, load-store
optimization is (for now) block-local only.</p>
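<p>For example (an illustrative case in the spirit of the snippets above, not one of the benchmark programs), a branch splits a method into several basic blocks, so a load whose matching store lives in an earlier block is not (yet) elided:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>class C
  attr_accessor :a

  def initialize(value, flag)
    @a = value
    # The branch below ends the current basic block.
    puts "flag was set" if flag
    # This load sits in a later block, so the block-local pass cannot see the
    # store above and leaves the LoadField in place.
    @a
  end
end
</code></pre></div></div>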

<h5 id="loadfield-and-storefield-distinctions">LoadField and StoreField Distinctions</h5>
<p>So far, we’ve talked about elision and instruction removal. We can get away with
deleting <code class="language-plaintext highlighter-rouge">StoreField</code> instructions because no other instructions point to
<code class="language-plaintext highlighter-rouge">StoreField</code> instructions. Conversely, <code class="language-plaintext highlighter-rouge">LoadField</code> instructions <em>do</em> have
dependencies and are referenced by other instructions. These references need to
be fixed up. Each reference to <code class="language-plaintext highlighter-rouge">LoadField</code> gets replaced with the cached value
that was the target of a load.</p>

<h5 id="the-writebarrier-instruction">The WriteBarrier Instruction</h5>
<p>ZJIT has <code class="language-plaintext highlighter-rouge">WriteBarrier</code> instructions to support garbage collection. These also
can modify objects and act similarly to stores. We need to handle this case in
our algorithm.</p>

<h5 id="pointer-intricacies">Pointer Intricacies</h5>
<p>The pseudo-code we are about to introduce uses the term “offset” to denote the
number of bytes from the object’s base address in memory. We use this to
detect redundant loads and stores, as well as to clear the cache around effectful
instructions and write barriers. However, it is not immediately obvious that
simply checking offsets would be enough. How can we be sure that the memory
regions we are tracking remain untouched by some other instruction? Fortunately,
HIR instructions <em>always</em> point to the base of an object and use offsets that
are in bounds of the object. If we have two offsets that are not equal, they
cannot reference the same region of memory. If the offsets are equal, then
object aliasing must be considered.</p>
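<p>As a concrete illustration (again in the style of the earlier examples), two different instance variables live at different offsets from the same base pointer, so a store to one can never clobber a cached value for the other:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>class C
  attr_accessor :a, :b

  def initialize(value)
    @a = value
    # @b lives at a different offset than @a, so this store cannot alias it...
    @b = 2
    # ...and this load is still replaceable with `value`.
    @a
  end
end
</code></pre></div></div>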

<h4 id="algorithm-sketch">Algorithm Sketch</h4>
<p>Here’s the pseudo-code for a given basic block.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>For each HIR instruction in the basic block
    initialize an empty cache as a hashmap
    
    if instruction is `LoadField`
        check if the object, offset, and value triple is in the cache
        if so, delete instruction and replace references to it with the loaded value
        else, cache the loaded value with the object, offset pair as a key
        
    if instruction is `StoreField`
        check if the object, offset, and value triple is in the cache
        if so, delete the instruction
        else, remove each cache entry with the same offset (the flags field) to avoid aliasing issues
        
    if instruction is `WriteBarrier`
        # This instruction is needed for the garbage collector and is complex
        # It works similarly to `StoreField` in practice
        # This instruction is never removed but the cache cleaning is still needed
        remove each cache entry with the same offset to avoid aliasing issues
        
    if instruction can modify objects
        flush the cache
        
    else
        continue
          
return the pruned HIR instructions
</code></pre></div></div>
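<p>To make the sketch concrete, here is a toy Ruby model of the per-block cache. This is purely illustrative: the real pass is written in Rust and works on HIR, and the instruction structs and names below are invented for the example.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical instruction shapes, just for this sketch.
Load  = Struct.new(:id, :object, :offset)
Store = Struct.new(:id, :object, :offset, :value)

def optimize_block(insns)
  cache = {}          # (object, offset) pair =&gt; id of the value it holds
  replacements = {}   # deleted load id =&gt; cached value id
  kept = []

  insns.each do |insn|
    case insn
    when Load
      key = [insn.object, insn.offset]
      if cache.key?(key)
        replacements[insn.id] = cache[key]   # redundant load: reuse the cached value
      else
        cache[key] = insn.id                 # remember what this load produced
        kept &lt;&lt; insn
      end
    when Store
      key = [insn.object, insn.offset]
      next if cache[key] == insn.value       # redundant store: drop it
      # A store at this offset may alias fields of other objects at the same
      # offset, so forget every entry that shares the offset.
      cache.delete_if { |(_obj, off), _val| off == insn.offset }
      cache[key] = insn.value
      kept &lt;&lt; insn
    else
      cache.clear                            # anything effectful: flush the cache
      kept &lt;&lt; insn
    end
  end

  [kept, replacements]
end
</code></pre></div></div>

<p>Running this over a list of such toy instructions returns the pruned list plus the substitutions a caller would apply to each deleted load’s users.</p>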

<h4 id="source-code">Source Code</h4>
<p>The source at the time of this writing can be found <a href="https://github.com/ruby/ruby/blob/a47827c854fe94b2a582e994c0ea2ff239439267/zjit/src/hir.rs#L4952">here</a>.</p>

<h3 id="hir-improvements">HIR Improvements</h3>
<p>After the optimization, here are examples of how the HIR changes.</p>

<p>This is the new HIR for our first redundant load example.</p>

<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  fn initialize@../scripts/double_load.rb:3:
  bb1():
    EntryPoint interpreter
    v1:BasicObject = LoadSelf
    v2:NilClass = Const Value(nil)
    Jump bb3(v1, v2)
  bb2():
    EntryPoint JIT(0)
    v5:BasicObject = LoadArg :self@0
    v6:NilClass = Const Value(nil)
    Jump bb3(v5, v6)
  bb3(v8:BasicObject, v9:NilClass):
    v13:Fixnum[1] = Const Value(1)
    PatchPoint SingleRactorMode
    v30:HeapBasicObject = GuardType v8, HeapBasicObject
    v31:CShape = LoadField v30, :_shape_id@0x4
    v32:CShape[0x80000] = GuardBitEquals v31, CShape(0x80000)
    StoreField v30, :@a@0x10, v13
    WriteBarrier v30, v13
    v35:CShape[0x80008] = Const CShape(0x80008)
    StoreField v30, :_shape_id@0x4, v35
<span class="gd">-   v20:HeapBasicObject = RefineType v8, HeapBasicObject
</span>    PatchPoint SingleRactorMode
<span class="gd">-   v38:CShape = LoadField v20, :_shape_id@0x4
-   v39:CShape[0x80008] = GuardBitEquals v38, CShape(0x80008)
-   v40:BasicObject = LoadField v20, :@a@0x10
</span>    CheckInterrupts
<span class="gd">-   Return v40
</span><span class="gi">+   Return v13
</span></code></pre></div></div>
<p>This is the new HIR for our first redundant store example.</p>

<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">bb1():
</span>  EntryPoint interpreter
  v1:BasicObject = LoadSelf
  v2:NilClass = Const Value(nil)
  Jump bb3(v1, v2)
<span class="p">bb2():
</span>  EntryPoint JIT(0)
  v5:BasicObject = LoadArg :self@0
  v6:NilClass = Const Value(nil)
  Jump bb3(v5, v6)
<span class="p">bb3(v8:BasicObject, v9:NilClass):
</span>  v13:Fixnum[1] = Const Value(1)
  PatchPoint SingleRactorMode
  v35:HeapBasicObject = GuardType v8, HeapBasicObject
  v36:CShape = LoadField v35, :_shape_id@0x4
  v37:CShape[0x80000] = GuardBitEquals v36, CShape(0x80000)
  StoreField v35, :@a@0x10, v13
  WriteBarrier v35, v13
  v40:CShape[0x80008] = Const CShape(0x80008)
  StoreField v35, :_shape_id@0x4, v40
  v20:HeapBasicObject = RefineType v8, HeapBasicObject
  PatchPoint NoEPEscape(initialize)
  PatchPoint SingleRactorMode
<span class="gd">- v43:CShape = LoadField v20, :_shape_id@0x4
- v44:CShape[0x80008] = GuardBitEquals v43, CShape(0x80008)
- StoreField v20, :@a@0x10, v13
</span>  WriteBarrier v20, v13
  CheckInterrupts
  Return v13
</code></pre></div></div>

<p>And that’s load-store optimization!</p>

<h3 id="design-discussion">Design Discussion</h3>
<p>You may notice that our optimization is pruning the graph of loads and stores
on an object. We are solving a very similar problem to the SSA form baked into
the HIR. While it would be great to have “more SSA” at the object level, this
comes at a cost. Computing SSA at this level could necessitate structural
changes to HIR and make things less ergonomic or more confusing in regions of
the codebase outside of load-store optimization. In fact, this question of “more
SSA” is a complex design decision and contentious topic with a
<a href="https://en.wikipedia.org/wiki/Sea_of_nodes">rich</a> <a href="https://www.jikesrvm.org/JavaDoc/org/jikesrvm/compilers/opt/ssa/SSA.html">history</a> in compilers such as V8 or Jikes
RVM. So far, we’ve decided to use a lightweight SSA representation in ZJIT: it
makes us work a bit harder in certain optimization passes, but in exchange it
keeps the design simpler across the rest of HIR.</p>

<h2 id="future-work">Future Work</h2>

<p>There’s still a lot of exciting work to be done and there are improvements to
be made before we hit diminishing returns. Dead store elimination utilizes many
of the same ideas and could help improve object initialization performance. We
could implement <a href="https://bernsteinbear.com/blog/toy-tbaa/">type-based alias analysis</a>, though this
requires care, as <a href="https://phrack.org/issues/70/9#article">type confusion bugs</a> are quite
dangerous in JIT compilers. See section 4.1 in the Phrack article for further
details.</p>
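<p>For instance, a dead store elimination pass built on the same ideas could drop the first assignment below, because it is overwritten before anything can observe it (a hypothetical example; this is not implemented yet):</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>class C
  def initialize
    @a = 0   # dead store: overwritten before any read or method call can see it
    @a = 1
  end
end
</code></pre></div></div>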

<h2 id="conclusion">Conclusion</h2>
<p>Thanks for reading the first post about ZJIT’s optimizer. We have lots more to
come, so stay tuned.</p>
</body></html>]]></content><author><name>Jacob Denbeaux</name></author><category term="posts" /><category term="2026-03-18-how-zjit-removes-redundant-object-loads-and-stores" /><summary type="html"><![CDATA[ZJIT's optimizer now removes redundant object loads and stores, improving JIT performance of CRuby's shape system. This post explains how the optimization works.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://railsatscale.com/2026-03-18-how-zjit-removes-redundant-object-loads-and-stores/a8af7ce28b60c651dd883d4404993de7f3eb7c3d.png" /><media:content medium="image" url="https://railsatscale.com/2026-03-18-how-zjit-removes-redundant-object-loads-and-stores/a8af7ce28b60c651dd883d4404993de7f3eb7c3d.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Faster bundler</title><link href="https://railsatscale.com/2026-03-09-faster-bundler/" rel="alternate" type="text/html" title="Faster bundler" /><published>2026-03-09T00:00:00+00:00</published><updated>2026-03-09T00:00:00+00:00</updated><id>https://railsatscale.com/2026-03-09-faster-bundler/</id><content type="html" xml:base="https://railsatscale.com/2026-03-09-faster-bundler/"><![CDATA[<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<p>At Shopify, we want our development environments to be fast.
Installing dependencies is slow, especially in an application as large as Shopify. <code class="language-plaintext highlighter-rouge">bun</code> and <code class="language-plaintext highlighter-rouge">uv</code> have dramatically
improved install times for TypeScript and Python dependencies. What if we could do the same for Bundler and the Ruby community?</p>

<p>Our team at Shopify has been working on a series of improvements to Bundler and RubyGems.
Bundler <strong>downloads gems up to 200% faster. Cloning git gems is now 3x faster</strong> in our monolith.</p>

<p>We were also able to <strong>decrease the overall <code class="language-plaintext highlighter-rouge">bundle install</code> time by 3.5x</strong> in one of our applications
by precompiling gems thanks to <a href="https://github.com/shopify/cibuildgem">cibuildgem</a>, a new
precompilation toolchain we’d love you to try!</p>

<p>Here’s an overview of the improvements we’ve made in the last few months:</p>

<h2 id="faster-gem-downloads">Faster gem downloads</h2>

<p>One impactful change was deceptively simple. Bundler’s HTTP fetcher had a connection pool size of 1.
This meant that during parallel gem installation, every thread was fighting over a single HTTP connection.</p>

<figure><img src="connection-pool.png" alt="A profile that shows the threads waiting for the only connection to be available"><figcaption>A profile that shows the threads waiting for the only connection to be available</figcaption></figure>
<blockquote>
  <p>Pink spikes are threads waiting for the connection to be available.</p>
</blockquote>

<p>By increasing the pool of HTTP connections, Bundler can download more gems in parallel.
The speed gain with this change is even more dramatic during peak hours when RubyGems.org is under heavy load,
or when you are geographically far from the CDN, where latency amplifies the cost of waiting on a single connection.</p>
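<p>The concept is what the <code class="language-plaintext highlighter-rouge">connection_pool</code> gem provides generically. Here is a minimal sketch of parallel downloads sharing a pool (illustrative only, not Bundler’s actual code; the host and gem paths are made up):</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>require "net/http"
require "connection_pool"

# With a pool of 5 connections, several download threads can make progress
# at once instead of queueing behind a single connection.
POOL = ConnectionPool.new(size: 5, timeout: 5) do
  Net::HTTP.start("rubygems.org", 443, use_ssl: true)
end

gem_paths = ["/gems/rake-13.3.0.gem", "/gems/rack-3.2.3.gem"]  # placeholders

threads = gem_paths.map do |path|
  Thread.new do
    POOL.with { |http| http.get(path).body }
  end
end
threads.each(&amp;:join)
</code></pre></div></div>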

<p>To benchmark this change, we opted to only measure download and extraction time (no compilation of native extensions)
and built a local gem server where we can control latency at will.</p>

<p>This is the result in a freshly generated Rails application when all gems are served with a 100ms latency.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Scenario: rails (164 gems)
                             Cold     +/-                        Warm     +/-
  ------------------------------------------------------------------------------
  5 HTTP connections       5.86s   0.10s  baseline             4.17s   0.02s  baseline
  1 HTTP connection       19.80s   0.02s  237.6% slower        4.16s   0.02s  0.3% faster
</code></pre></div></div>

<h2 id="hotspots-and-optimizations">Hotspots and optimizations</h2>

<p>We regularly profile Bundler with different Gemfiles and identify hotspots we might optimize.
While no single optimization is dramatic, their collective impact has been significant.</p>

<p>One such optimization involves gem installation. A <code class="language-plaintext highlighter-rouge">.gem</code> file is a compressed tarball — and gzip has a built-in
integrity check: if decompression succeeds, the content is guaranteed to be intact. Despite this, RubyGems was
walking every entry in the tarball and reading all bytes upfront as an explicit corruption check, before proceeding with
installation. This redundant verification step was thrown away entirely, since a successful decompression already
provides the same guarantee.</p>

<figure><img src="verify-gz.png" alt="A profile that shows the time spent verifying the tarball's content"><figcaption>A profile that shows the time spent verifying the tarball's content</figcaption></figure>
<blockquote>
  <p>9-17% of the time installing a gem is spent verifying the tarball’s content.</p>
</blockquote>
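<p>Gzip stores a CRC-32 of the uncompressed data in its trailer, and Ruby’s <code class="language-plaintext highlighter-rouge">Zlib::GzipReader</code> checks it as the stream is read to the end. A small sketch of the guarantee the change relies on:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>require "zlib"
require "stringio"

data = Zlib.gzip("gem contents")
data[-5] = (data[-5].ord ^ 0xFF).chr   # flip bits in the stored CRC-32

begin
  Zlib::GzipReader.new(StringIO.new(data)).read
rescue Zlib::Error =&gt; e
  # Decompression itself reports the corruption, so a separate upfront scan
  # of every tarball entry adds nothing.
  puts e.class
end
</code></pre></div></div>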

<p>Another hotspot during installation is the check RubyGems performs to determine whether a gem
includes a RubyGems plugin and, if so, whether its plugin file needs to be regenerated. The vast majority of gems
don’t include a RubyGems plugin, yet every gem pays the cost of a <code class="language-plaintext highlighter-rouge">Dir.glob</code> with an expensive pattern just to
handle the small minority that do.</p>

<p>It turns out that unconditionally regenerating the plugin file is faster than performing this upfront check.</p>

<figure><img src="verify-plugin.png" alt="A profile that shows how frequently Bundler is spending time checking whether checking whether regenerating a gem plugin is required"><figcaption>A profile that shows how frequently Bundler is spending time checking whether checking whether regenerating a gem plugin is required</figcaption></figure>
<blockquote>
  <p>Bundler checking whether regenerating a gem plugin is required</p>
</blockquote>

<h2 id="parallel-git-clones">Parallel git clones</h2>

<p>Many Rails applications depend on gems sourced directly from git repositories. This is particularly useful if a gem
has upstream changes that aren’t yet released. Previously, Bundler would fetch each git repository sequentially,
even though there’s no technical limitation on fetching them all at once.</p>
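<p>A minimal sketch of the idea (illustrative only, not Bundler’s code; a real implementation also needs to bound concurrency and handle failures):</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>repos = [
  "https://github.com/rails/rails.git",
  "https://github.com/rack/rack.git",
]

# Clone every git-sourced gem at once instead of one after another.
threads = repos.map do |url|
  Thread.new do
    system("git", "clone", "--quiet", url) or raise "clone failed: #{url}"
  end
end
threads.each(&amp;:join)
</code></pre></div></div>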

<p>Shopify’s Core Rails monolith includes 33 git gems. After introducing this change to parallelize <code class="language-plaintext highlighter-rouge">git clone</code>,
we saw a 3x performance improvement for fetching git gems.</p>

<table>
  <thead>
    <tr>
      <th> </th>
      <th>Bundler 2.7.2</th>
      <th>Bundler 4.0.7</th>
      <th>Performance improvement</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Fetching 33 git gems</td>
      <td>121.57s</td>
      <td>38.75s</td>
      <td>3.1x faster (68% less time)</td>
    </tr>
  </tbody>
</table>

<h2 id="native-extensions">Native extensions</h2>

<p>By far the biggest bottleneck when running <code class="language-plaintext highlighter-rouge">bundle install</code> is the compilation of native extensions.
Many gems in the Ruby ecosystem include C code that must be compiled on each developer’s machine when installed.
Common examples are <code class="language-plaintext highlighter-rouge">json</code>, <code class="language-plaintext highlighter-rouge">date</code>, and <code class="language-plaintext highlighter-rouge">bigdecimal</code>.
Even if your Gemfile doesn’t directly depend on native extensions, it’s likely they will be included in your
<code class="language-plaintext highlighter-rouge">Gemfile.lock</code> as transitive dependencies.</p>

<figure><img src="build-extension.png" alt="A profile that shows the time spent compiling a gem"><figcaption>A profile that shows the time spent compiling a gem</figcaption></figure>
<blockquote>
  <p>An installer thread spending 92% of the time compiling the gem.</p>
</blockquote>

<p>To illustrate how slow compilation is, we can run <code class="language-plaintext highlighter-rouge">bundle install</code> on a freshly generated Rails application.</p>

<table>
  <thead>
    <tr>
      <th> </th>
      <th> </th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Total number of gems</strong></td>
      <td>126</td>
    </tr>
    <tr>
      <td><strong>Gems with native extensions</strong></td>
      <td>18</td>
    </tr>
    <tr>
      <td>
<strong>Time to <code class="language-plaintext highlighter-rouge">bundle install</code></strong><sup id="fnref:1"><a href="#fn:1" class="footnote" rel="footnote" role="doc-noteref">1</a></sup>
</td>
      <td>~13 seconds</td>
    </tr>
    <tr>
      <td><strong>Time to <code class="language-plaintext highlighter-rouge">bundle install</code> (without compilation)</strong></td>
      <td>~2 seconds (15%)</td>
    </tr>
    <tr>
      <td><strong>Time to <code class="language-plaintext highlighter-rouge">bundle install</code> (only native extensions)</strong></td>
      <td>~11 seconds (85%)</td>
    </tr>
  </tbody>
</table>

<p>Installing the 18 native extension gems accounts for 85% of the time spent running <code class="language-plaintext highlighter-rouge">bundle install</code>.</p>

<h2 id="precompiled-gems">Precompiled gems</h2>

<p>Remember when Nokogiri used to take forever to install? Those days are behind us thanks to the amazing work of its
maintainer, Mike Dalessio. Mike updated the gem’s publishing pipeline to precompile its native extensions into
platform-specific binaries and releases separate gems for each supported platform (macOS, Windows, Linux). Now Nokogiri
installs as fast as pure Ruby gems.</p>

<p>Imagine if we extended this to the rest of the Ruby ecosystem. <strong>If the community works together</strong> to ship precompiled
binaries for our most popular native-extension gems, everyone will benefit from a lightning-fast <code class="language-plaintext highlighter-rouge">bundle install</code>.</p>

<p>One way to build binary gems is with the popular <a href="https://github.com/rake-compiler/rake-compiler-dock">Rake-compiler-dock</a>
toolchain, which provides a cross-compilation environment and allows compilation to run inside Docker containers.
However, cross-compiling can be brittle and presents hard-to-debug issues. Compiling on the target platform is
ultimately far more reliable.</p>

<p>Many CI providers now offer free access to cloud machines. GitHub Actions, for example, is widely popular, and the
Ruby community has built many easy-to-use actions around it (e.g., <code class="language-plaintext highlighter-rouge">ruby/setup-ruby</code>). Could we apply the same approach
and leverage those machines to natively compile binary gems?</p>

<h2 id="introducing-cibuildgem">Introducing cibuildgem</h2>

<p>At Shopify, we wanted to build an easy-to-use tool to help developers release gems with precompiled binaries using
a native compilation approach via GitHub Workflows.</p>

<p><a href="https://github.com/Shopify/cibuildgem">cibuildgem</a> lets you generate a standard GitHub Actions workflow.
Once triggered, multiple jobs run to:</p>

<ol>
  <li>Compile the binaries and package the gems</li>
  <li>Run a matrix of test suites</li>
  <li>Verify the <code class="language-plaintext highlighter-rouge">.gem</code> files are not corrupted and installable</li>
  <li>Release the gems to RubyGems.org</li>
</ol>

<figure><img src="cibuildgem.png" alt="A screenshot of the GitHub workflow when cibuildgem is triggered"><figcaption>A screenshot of the GitHub workflow when cibuildgem is triggered</figcaption></figure>
<blockquote>
  <p>Releasing a binary gem with cibuildgem</p>
</blockquote>

<p>We aimed to make cibuildgem easy and fast to set up. Since many gems with native extensions are already configured with
Rake Compiler for development compilation, we chose to piggyback on that so cibuildgem can run without any extra
configuration for most gems.</p>
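<p>For reference, the rake-compiler setup it piggybacks on is usually just a few lines in the gem’s Rakefile (the gem and extension names below are placeholders):</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Rakefile
require "rake/extensiontask"

Rake::ExtensionTask.new("my_gem") do |ext|
  ext.lib_dir = "lib/my_gem"   # where the compiled .so/.bundle ends up
end
</code></pre></div></div>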

<p>The workflow generated by cibuildgem is intentionally standard.</p>
<ul>
  <li>Want to compile your gem on Linux AArch64? Add it to the
matrix.</li>
  <li>Want to trigger the workflow automatically when pushing a new
git tag? No problem — tweak it to your liking.</li>
</ul>

<p>We also wanted to ensure that the binaries compiled by cibuildgem would work in a macOS development environment and a
Linux production environment on a real Rails application.</p>

<p>As an <strong>experiment</strong>, we used <a href="https://github.com/shopify/cibuildgem">cibuildgem</a> to compile dozens of open-source gems and
publish them under a “namespace” on RubyGems.org (e.g., <code class="language-plaintext highlighter-rouge">sassc</code> -&gt; <code class="language-plaintext highlighter-rouge">precompiled-sassc</code>).</p>

<p>The goal was to see how much performance improvement we could get with precompiled binaries. To test this, we created a
<a href="https://github.com/shopify/precompiled_gems">Bundler plugin</a> that hijacked the Bundler resolver to download the gems
with precompiled binaries we had just published. For example, it would force-install <code class="language-plaintext highlighter-rouge">precompiled-json</code> if the <code class="language-plaintext highlighter-rouge">json</code>
gem was requested anywhere in the dependency tree.</p>

<p>We tested this and deployed it on an internal application that includes 235 gems. By precompiling 17 of them,
we saw a 3.5x performance improvement.</p>

<table>
  <thead>
    <tr>
      <th> </th>
      <th>Without precompiled binaries</th>
      <th>With some precompiled binaries</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">bundle install</code></td>
      <td>24.2s</td>
      <td>7.0s (3.5x faster)<sup id="fnref:2"><a href="#fn:2" class="footnote" rel="footnote" role="doc-noteref">2</a></sup>
</td>
    </tr>
  </tbody>
</table>

<p>This experiment demonstrates how much faster <code class="language-plaintext highlighter-rouge">bundle install</code> could be when gems are precompiled.
It has also given us confidence that cibuildgem builds compatible binaries for macOS and Linux.</p>

<p>In fact, a few gems at Shopify are now released with precompiled binaries (<a href="https://rubygems.org/gems/stack_frames">stack_frames</a>,
<a href="https://rubygems.org/gems/heap-profiler">heap_profiler</a>, <a href="https://rubygems.org/gems/rubydex">rubydex</a>) thanks to
cibuildgem.</p>

<p>If you maintain a gem with a native extension, we’d love for you to <a href="https://github.com/shopify/cibuildgem">give it a try</a>
and share your feedback ❤️!</p>
<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1">
      <p>Network speed and computation power affects those results. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
    </li>
    <li id="fn:2">
      <p>5 gems are still being compiled, so we could decrease the install time even further. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
    </li>
  </ol>
</div>
</body></html>]]></content><author><name>[&quot;Edouard Chin&quot;, &quot;Eileen Alayce&quot;]</name></author><category term="posts" /><category term="2026-03-09-faster-bundler" /><summary type="html"><![CDATA[How Shopify contributed a series of improvements to Bundler and RubyGems to make gem installation significantly faster.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://railsatscale.com/2026-03-09-faster-bundler/5e11197b5102f0dbffc178dbfb80e6c5a543d578.png" /><media:content medium="image" url="https://railsatscale.com/2026-03-09-faster-bundler/5e11197b5102f0dbffc178dbfb80e6c5a543d578.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">ZJIT is now available in Ruby 4.0</title><link href="https://railsatscale.com/2025-12-24-launch-zjit/" rel="alternate" type="text/html" title="ZJIT is now available in Ruby 4.0" /><published>2025-12-24T00:00:00+00:00</published><updated>2025-12-24T00:00:00+00:00</updated><id>https://railsatscale.com/2025-12-24-launch-zjit/</id><content type="html" xml:base="https://railsatscale.com/2025-12-24-launch-zjit/"><![CDATA[<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<p>ZJIT is a new just-in-time (JIT) Ruby compiler built into the reference Ruby
implementation, <a href="https://en.wikipedia.org/wiki/YARV">YARV</a>, by the same compiler group that brought you YJIT.
We (Aaron Patterson, Aiden Fox Ivey, Alan Wu, Jacob Denbeaux, Kevin Menard, Max
Bernstein, Maxime Chevalier-Boisvert, Randy Stauner, Stan Lo, and Takashi
Kokubun) have been working on ZJIT since the beginning of this year.</p>

<p>In case you missed the last post, we’re building a new compiler for Ruby
because we want to both raise the performance ceiling (bigger compilation unit
size and SSA IR) and encourage more outside contribution (by becoming a more
traditional method compiler).</p>

<p>It’s been a long time since we gave an official update on ZJIT. Things are
going well. We’re excited to share our progress with you. We’ve done a lot
<a href="/2025-05-14-merge-zjit/">since May</a>.</p>

<h2 id="in-brief">In brief</h2>

<p>ZJIT is compiled by default—but not enabled by default—in Ruby 4.0. Enable
it by passing the <code class="language-plaintext highlighter-rouge">--zjit</code> flag or the <code class="language-plaintext highlighter-rouge">RUBY_ZJIT_ENABLE</code> environment variable
or calling <code class="language-plaintext highlighter-rouge">RubyVM::ZJIT.enable</code> after starting your application.</p>
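<p>Concretely, any of the following works (the same flag, environment variable, and method listed above):</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># From the command line:
#   ruby --zjit my_app.rb
#   RUBY_ZJIT_ENABLE=1 ruby my_app.rb
# Or from inside the running process, e.g. in an initializer:
RubyVM::ZJIT.enable
</code></pre></div></div>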

<p>It’s faster than the interpreter, but not yet as fast as YJIT. <strong>Yet.</strong> But we
have a plan, and we have some more specific numbers below. The TL;DR is we have
a great new foundation and now need to pull out all the Ruby-specific stops to
match YJIT.</p>

<p>We encourage you to experiment with ZJIT, but maybe hold off on deploying it in
production for now. This is a very new compiler. You should expect crashes and
wild performance degradations (or, perhaps, improvements). Please test locally,
try to run CI, etc, and let us know what you run into on <a href="https://bugs.ruby-lang.org/projects/ruby-master/issues?set_filter=1&amp;tracker_id=1">the Ruby issue
tracker</a> (or, if you don’t want to make a Ruby Bugs account, we would
also take reports <a href="https://github.com/Shopify/ruby/issues">on GitHub</a>).</p>

<h2 id="state-of-the-compiler">State of the compiler</h2>

<p>To underscore how much has happened since the <a href="/2025-05-14-merge-zjit/">announcement of being merged
into CRuby</a>, we present to you a series of comparisons:</p>

<h3 id="side-exits">Side-exits</h3>

<p>Back in May, we could not side-exit from JIT code into the interpreter. This
meant that the code we were running had to continue to have the same
preconditions (expected types, no method redefinitions, etc) or the JIT would
safely abort. <strong>Now,</strong> we can side-exit and use this feature liberally.</p>

<blockquote>
  <p>For example, we gracefully handle the phase transition from integer to string;
a guard instruction fails and transfers control to the interpreter.</p>

  <div class="language-ruby highlighter-rouge">
<div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">add</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span>
  <span class="n">x</span> <span class="o">+</span> <span class="n">y</span>
<span class="k">end</span>

<span class="n">add</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span>
<span class="n">add</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span>
<span class="n">add</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span>
<span class="n">add</span> <span class="s2">"three"</span><span class="p">,</span> <span class="s2">"four"</span>
</code></pre></div>  </div>
</blockquote>

<p>This enables running a lot more code!</p>

<h3 id="more-code">More code</h3>

<p>Back in May, we could only run a handful of small benchmarks. <strong>Now,</strong> we can
run all sorts of code, including passing the full Ruby test suite, the test
suite and shadow traffic of a large application at Shopify, and the test suite
of GitHub.com! Also a bank, apparently.</p>

<p>Back in May, we did not optimize much; we only really optimized operations
on fixnums (small integers) and method sends to the <code class="language-plaintext highlighter-rouge">main</code> object. <strong>Now,</strong>
we optimize a lot more: all sorts of method sends, instance variable reads
and writes, attribute accessor/reader/writer use, struct reads and writes,
object allocations, certain string operations, optional parameters, and more.</p>

<blockquote>
  <p>For example, we can <a href="https://en.wikipedia.org/wiki/Constant_folding">constant-fold</a> numeric operations. Because we also have a
(small, limited) inliner borrowed from YJIT, we can constant-fold the entirety
of <code class="language-plaintext highlighter-rouge">add</code> down to <code class="language-plaintext highlighter-rouge">3</code>—and still handle redefinitions of <code class="language-plaintext highlighter-rouge">one</code>, <code class="language-plaintext highlighter-rouge">two</code>,
<code class="language-plaintext highlighter-rouge">Integer#+</code>, …</p>

  <div class="language-ruby highlighter-rouge">
<div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">one</span>
  <span class="mi">1</span>
<span class="k">end</span>

<span class="k">def</span> <span class="nf">two</span>
  <span class="mi">2</span>
<span class="k">end</span>

<span class="k">def</span> <span class="nf">add</span>
  <span class="n">one</span> <span class="o">+</span> <span class="n">two</span>
<span class="k">end</span>
</code></pre></div>  </div>
</blockquote>

<h3 id="register-spilling">Register spilling</h3>

<p>Back in May, we could not compile many large functions due to limitations of
our backend that we borrowed from YJIT. <strong>Now,</strong> we can compile absolutely
enormous functions just fine. And quickly, too. Though we have not been
focusing specifically on compiler performance, we compile even large methods in
under a millisecond.</p>

<h3 id="c-methods">C methods</h3>

<p>Back in May, we could not even optimize calls to built-in C methods. <strong>Now,</strong>
we have a feature similar to JavaScriptCore’s DOMJIT, which allows us to emit
inline HIR versions of certain well-known C methods. This allows the optimizer
to reason about these methods and their effects (more on this in a future post)
much more… er, effectively.</p>

<blockquote>
  <p>For example, <code class="language-plaintext highlighter-rouge">Integer#succ</code>, which is defined as adding <code class="language-plaintext highlighter-rouge">1</code> to an integer, is a
C method. It’s used in <code class="language-plaintext highlighter-rouge">Integer#times</code> to drive the <code class="language-plaintext highlighter-rouge">while</code> loop. Instead of
emitting a call to it, our C method “inliner” can emit our existing <code class="language-plaintext highlighter-rouge">FixnumAdd</code>
instruction and take advantage of the rest of the type inference and
constant-folding.</p>

  <div class="language-rust highlighter-rouge">
<div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">inline_integer_succ</span><span class="p">(</span><span class="n">fun</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="nn">hir</span><span class="p">::</span><span class="n">Function</span><span class="p">,</span>
                       <span class="n">block</span><span class="p">:</span> <span class="nn">hir</span><span class="p">::</span><span class="n">BlockId</span><span class="p">,</span>
                       <span class="n">recv</span><span class="p">:</span> <span class="nn">hir</span><span class="p">::</span><span class="n">InsnId</span><span class="p">,</span>
                       <span class="n">args</span><span class="p">:</span> <span class="o">&amp;</span><span class="p">[</span><span class="nn">hir</span><span class="p">::</span><span class="n">InsnId</span><span class="p">],</span>
                       <span class="n">state</span><span class="p">:</span> <span class="nn">hir</span><span class="p">::</span><span class="n">InsnId</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Option</span><span class="o">&lt;</span><span class="nn">hir</span><span class="p">::</span><span class="n">InsnId</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="k">if</span> <span class="o">!</span><span class="n">args</span><span class="nf">.is_empty</span><span class="p">()</span> <span class="p">{</span> <span class="k">return</span> <span class="nb">None</span><span class="p">;</span> <span class="p">}</span>
    <span class="k">if</span> <span class="n">fun</span><span class="nf">.likely_a</span><span class="p">(</span><span class="n">recv</span><span class="p">,</span> <span class="nn">types</span><span class="p">::</span><span class="n">Fixnum</span><span class="p">,</span> <span class="n">state</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">let</span> <span class="n">left</span> <span class="o">=</span> <span class="n">fun</span><span class="nf">.coerce_to</span><span class="p">(</span><span class="n">block</span><span class="p">,</span> <span class="n">recv</span><span class="p">,</span> <span class="nn">types</span><span class="p">::</span><span class="n">Fixnum</span><span class="p">,</span> <span class="n">state</span><span class="p">);</span>
        <span class="k">let</span> <span class="n">right</span> <span class="o">=</span> <span class="n">fun</span><span class="nf">.push_insn</span><span class="p">(</span><span class="n">block</span><span class="p">,</span> <span class="nn">hir</span><span class="p">::</span><span class="nn">Insn</span><span class="p">::</span><span class="n">Const</span> <span class="p">{</span> <span class="n">val</span><span class="p">:</span> <span class="nn">hir</span><span class="p">::</span><span class="nn">Const</span><span class="p">::</span><span class="nf">Value</span><span class="p">(</span><span class="nn">VALUE</span><span class="p">::</span><span class="nf">fixnum_from_usize</span><span class="p">(</span><span class="mi">1</span><span class="p">))</span> <span class="p">});</span>
        <span class="k">let</span> <span class="n">result</span> <span class="o">=</span> <span class="n">fun</span><span class="nf">.push_insn</span><span class="p">(</span><span class="n">block</span><span class="p">,</span> <span class="nn">hir</span><span class="p">::</span><span class="nn">Insn</span><span class="p">::</span><span class="n">FixnumAdd</span> <span class="p">{</span> <span class="n">left</span><span class="p">,</span> <span class="n">right</span><span class="p">,</span> <span class="n">state</span> <span class="p">});</span>
        <span class="k">return</span> <span class="nf">Some</span><span class="p">(</span><span class="n">result</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="nb">None</span>
<span class="p">}</span>
</code></pre></div>  </div>
</blockquote>

<h3 id="fewer-c-calls">Fewer C calls</h3>

<p>Back in May, the machine code ZJIT generated called a lot of C functions from
the CRuby runtime to implement our HIR instructions in LIR. We have pared this
down significantly and now “open code” the implementations in LIR.</p>

<blockquote>
  <p>For example, <code class="language-plaintext highlighter-rouge">GuardNotFrozen</code> used to call out to <code class="language-plaintext highlighter-rouge">rb_obj_frozen_p</code>. Now, it
requires that its input is a heap-allocated object and can instead do a load, a
test, and a conditional jump.</p>

  <div class="language-rust highlighter-rouge">
<div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">gen_guard_not_frozen</span><span class="p">(</span><span class="n">jit</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">JITState</span><span class="p">,</span>
                        <span class="n">asm</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="n">Assembler</span><span class="p">,</span>
                        <span class="n">recv</span><span class="p">:</span> <span class="n">Opnd</span><span class="p">,</span>
                        <span class="n">state</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">FrameState</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="n">Opnd</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">recv</span> <span class="o">=</span> <span class="n">asm</span><span class="nf">.load</span><span class="p">(</span><span class="n">recv</span><span class="p">);</span>
    <span class="c1">// It's a heap object, so check the frozen flag</span>
    <span class="k">let</span> <span class="n">flags</span> <span class="o">=</span> <span class="n">asm</span><span class="nf">.load</span><span class="p">(</span><span class="nn">Opnd</span><span class="p">::</span><span class="nf">mem</span><span class="p">(</span><span class="mi">64</span><span class="p">,</span> <span class="n">recv</span><span class="p">,</span> <span class="n">RUBY_OFFSET_RBASIC_FLAGS</span><span class="p">));</span>
    <span class="n">asm</span><span class="nf">.test</span><span class="p">(</span><span class="n">flags</span><span class="p">,</span> <span class="p">(</span><span class="n">RUBY_FL_FREEZE</span> <span class="k">as</span> <span class="nb">u64</span><span class="p">)</span><span class="nf">.into</span><span class="p">());</span>
    <span class="c1">// Side-exit if frozen</span>
    <span class="n">asm</span><span class="nf">.jnz</span><span class="p">(</span><span class="nf">side_exit</span><span class="p">(</span><span class="n">jit</span><span class="p">,</span> <span class="n">state</span><span class="p">,</span> <span class="n">GuardNotFrozen</span><span class="p">));</span>
    <span class="n">recv</span>
<span class="p">}</span>
</code></pre></div>  </div>
</blockquote>

<h3 id="more-teammates">More teammates</h3>

<p>Back in May, we had four people working full-time on the compiler. <strong>Now,</strong> we
have more internally at Shopify—and also more from the community! We have
had several interested people reach out, learn about ZJIT, and successfully
land complex changes. For this reason, we have opened up <a href="https://zjit.zulipchat.com">a chat
room</a> to discuss and improve ZJIT.</p>

<h3 id="a-cool-graph-visualization-tool">A cool graph visualization tool</h3>

<p>You <em>have to</em> check out our intern Aiden’s <a href="/2025-11-19-adding-iongraph-support/">integration of Iongraph into
ZJIT</a>. Now we have clickable, zoomable,
scrollable graphs of all our functions and all our optimization passes. It’s
great!</p>

<p>Try zooming (Ctrl-scroll), clicking the different optimization passes on the
left, clicking the instruction IDs in each basic block (definitions and uses),
and seeing how the IR for the below Ruby code changes over time.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Point</span>
  <span class="nb">attr_accessor</span> <span class="ss">:x</span><span class="p">,</span> <span class="ss">:y</span>
  <span class="k">def</span> <span class="nf">initialize</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span>
    <span class="vi">@x</span> <span class="o">=</span> <span class="n">x</span>
    <span class="vi">@y</span> <span class="o">=</span> <span class="n">y</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="no">P</span> <span class="o">=</span> <span class="no">Point</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">).</span><span class="nf">freeze</span>

<span class="k">def</span> <span class="nf">test</span> <span class="o">=</span> <span class="no">P</span><span class="p">.</span><span class="nf">x</span> <span class="o">+</span> <span class="no">P</span><span class="p">.</span><span class="nf">y</span>
</code></pre></div></div>

<iframe title="Iongraph Viewer" aria-label="Interactive compiler graph visualization" src="viewer.html" width="100%" height="400"></iframe>

<h3 id="more">More</h3>

<p>…and so, so many garbage collection fixes.</p>

<p>There’s still a lot to do, though.</p>

<h2 id="to-do">To do</h2>

<p>We’re going to optimize <code class="language-plaintext highlighter-rouge">invokeblock</code> (<code class="language-plaintext highlighter-rouge">yield</code>) and <code class="language-plaintext highlighter-rouge">invokesuper</code> (<code class="language-plaintext highlighter-rouge">super</code>)
instructions, each of which behaves similarly, but not identically, to a
normal <code class="language-plaintext highlighter-rouge">send</code> instruction. These are pretty common.</p>

<p>We’re going to optimize <code class="language-plaintext highlighter-rouge">setinstancevariable</code> in the case where we have to
transition the object’s shape. This will help normal <code class="language-plaintext highlighter-rouge">@a = b</code> situations. It
will also help <code class="language-plaintext highlighter-rouge">@a ||= b</code>, but I think we can even do better with the latter
using some kind of value numbering.</p>

<p>We only optimize monomorphic calls right now—cases where a method send only
sees one class of receiver while being profiled. We’re going to optimize
polymorphic sends, too. Right now we’re laying the groundwork (a new register
allocator; see below) to make this much easier. It’s not as much of an
immediate focus, though, because most (high 80s, low 90s percent) of sends are
monomorphic.</p>
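<p>A small illustration of the distinction (the method and receivers are arbitrary):</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def shout(x) = x.to_s.upcase

# Monomorphic: the `to_s` call site inside `shout` only ever sees Integers.
10.times { |i| shout(i) }

# Polymorphic: the same call site now also sees Symbols and Floats.
[1, :two, 3.0].each { |v| shout(v) }
</code></pre></div></div>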

<p>We’re in the middle of re-writing the register allocator after reading the
entire history of linear scan papers and several implementations. That will
unlock performance improvements and also allow us to make the IRs easier to
use.</p>

<p>We don’t handle phase changes particularly well yet; if your method call
patterns change significantly after your code has been compiled, we will
frequently side-exit into the interpreter. Instead, we would like to use these
side-exits as additional profile information and re-compile the function.</p>

<p>Right now we have a lot of traffic to the VM frame. JIT frame pushes are
reasonably fast, but with every effectful operation, we have to flush our local
variable state and stack state to the VM frame. The instances in which code
might want to read this reified frame state are rare: frame unwinding due to
exceptions, <code class="language-plaintext highlighter-rouge">Binding#local_variable_get</code>, etc. In the future, we will instead
defer writing this state until it needs to be read.</p>
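<p>For example, reflective access like this is one of the rare cases where the reified frame genuinely has to be up to date:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def lookup
  secret = 42
  # Reading a local through the frame means the JIT's register and stack state
  # must have been flushed back to the VM frame before this call.
  binding.local_variable_get(:secret)
end
</code></pre></div></div>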

<p>We only have a limited inliner that inlines constants, <code class="language-plaintext highlighter-rouge">self</code>, and parameters.
In the fullness of time, we will add a general-purpose method inlining
facility. This will allow us to reduce the number of polymorphic sends, do some
branch folding, and cut down on the number of method sends overall.</p>

<p>We only support optimizing positional parameters, required keyword parameters,
and optional parameters right now but we will work on optimizing optional
keyword arguments as well. Most of this work is in marshaling the complex
Ruby calling convention into one coherent form that the JIT can understand.</p>

<h2 id="performance">Performance</h2>

<p>We have public performance numbers for a selection of macro- and
micro-benchmarks on <a href="https://rubybench.github.io/">rubybench</a>. Here is a screenshot of what those
per-benchmark graphs look like. The Y axis is speedup multiplier vs the
interpreter and the X axis is time. Higher is better:</p>

<figure><img src="benchmark.png" alt="A line chart of ZJIT performance on railsbench improving over time, passing
interpreter performance, catching up to YJIT"><figcaption>A line chart of ZJIT performance on railsbench improving over time, passing
interpreter performance, catching up to YJIT</figcaption></figure>

<p>You can see that we are improving performance on nearly all benchmarks over
time. Some of this comes from optimizing in a similar way as YJIT does
today (e.g. specializing ivar reads and writes), and some of it is optimizing
in a way that takes advantage of ZJIT’s high-level IR (e.g. constant folding,
branch folding, more precise type inference).</p>

<p>We are using both raw time numbers and also our internal performance counters
(e.g. number of calls to C functions from generated code) to drive
optimization.</p>

<h2 id="try-it-out">Try it out</h2>

<p>While Ruby now ships with ZJIT compiled into the binary by default, it is not
<em>enabled</em> by default at run-time. Because it is still faster and more battle-tested,
YJIT remains the default compiler choice in Ruby 4.0.</p>

<p>If you want to run your test suite with ZJIT to see what happens, you
absolutely can. Enable it by passing the <code class="language-plaintext highlighter-rouge">--zjit</code> flag or the
<code class="language-plaintext highlighter-rouge">RUBY_ZJIT_ENABLE</code> environment variable or calling <code class="language-plaintext highlighter-rouge">RubyVM::ZJIT.enable</code> after
starting your application.</p>

<h2 id="on-yjit">On YJIT</h2>

<p>We devoted a lot of our resources this year to developing ZJIT. While we did
not spend much time on YJIT (outside of a great <a href="/2025-05-21-fast-allocations-in-ruby-3-5/">allocation speed
up</a>), YJIT isn’t going anywhere soon.</p>

<h2 id="thank-you">Thank you</h2>

<p>This compiler was made possible by contributions to your <del>PBS station</del> open
source project from programmers like you. Thank you!</p>

<ul>
  <li>Aaron Patterson</li>
  <li>Abrar Habib</li>
  <li>Aiden Fox Ivey</li>
  <li>Alan Wu</li>
  <li>Alex Rocha</li>
  <li>André Luiz Tiago Soares</li>
  <li>Benoit Daloze</li>
  <li>Charlotte Wen</li>
  <li>Daniel Colson</li>
  <li>Donghee Na</li>
  <li>Eileen Uchitelle</li>
  <li>Étienne Barrié</li>
  <li>Godfrey Chan</li>
  <li>Goshanraj Govindaraj</li>
  <li>Hiroshi SHIBATA</li>
  <li>Hoa Nguyen</li>
  <li>Jacob Denbeaux</li>
  <li>Jean Boussier</li>
  <li>Jeremy Evans</li>
  <li>John Hawthorn</li>
  <li>Ken Jin</li>
  <li>Kevin Menard</li>
  <li>Max Bernstein</li>
  <li>Max Leopold</li>
  <li>Maxime Chevalier-Boisvert</li>
  <li>Nobuyoshi Nakada</li>
  <li>Peter Zhu</li>
  <li>Randy Stauner</li>
  <li>Satoshi Tagomori</li>
  <li>Shannon Skipper</li>
  <li>Stan Lo</li>
  <li>Takashi Kokubun</li>
  <li>Tavian Barnes</li>
  <li>Tobias Lütke</li>
</ul>

<p>(via a lightly touched up <code class="language-plaintext highlighter-rouge">git log --pretty="%an" zjit | sort -u</code>)</p>
</body></html>]]></content><author><name>[&quot;Max Bernstein&quot;]</name></author><category term="posts" /><category term="2025-12-24-launch-zjit" /><summary type="html"><![CDATA[ZJIT is now available with the release of Ruby 4.0. Here's an update of our progress.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://railsatscale.com/2025-12-24-launch-zjit/72bca59f7ad87072e57ded2ddc5c7ddc5f00ba46.png" /><media:content medium="image" url="https://railsatscale.com/2025-12-24-launch-zjit/72bca59f7ad87072e57ded2ddc5c7ddc5f00ba46.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Introducing Aliki: A Modern Theme for Ruby Documentation</title><link href="https://railsatscale.com/2025-12-22-introducing-aliki-a-modern-theme-for-ruby-documentation/" rel="alternate" type="text/html" title="Introducing Aliki: A Modern Theme for Ruby Documentation" /><published>2025-12-22T00:00:00+00:00</published><updated>2025-12-22T00:00:00+00:00</updated><id>https://railsatscale.com/2025-12-22-introducing-aliki-a-modern-theme-for-ruby-documentation/</id><content type="html" xml:base="https://railsatscale.com/2025-12-22-introducing-aliki-a-modern-theme-for-ruby-documentation/"><![CDATA[<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<p>Ruby has always been a joy to write. But for a long time, reading Ruby documentation on <a href="https://docs.ruby-lang.org">docs.ruby-lang.org</a> hasn’t really matched that experience.</p>

<p>Last year, I brought a <a href="https://st0012.dev/ruby-3-4-docs">new look to the Darkfish theme</a> by updating its visuals and improving mobile support. It was a visible improvement, but it wasn’t enough.</p>

<p>So this year, I built something new from the ground up. Starting with RDoc 7.0.0, Aliki is now the default theme for <a href="https://github.com/ruby/rdoc">RDoc</a>.</p>

<p>This release also coincides with Ruby’s 30th anniversary and the <a href="https://www.ruby-lang.org/en/news/2025/12/22/redesign-site-identity/">redesign of ruby-lang.org</a>—a great moment to give Ruby’s documentation a fresh look as we head into the next chapter with Ruby 4.0.</p>

<figure><img src="./desktop-class-light.png" alt="Screenshot of docs.ruby-lang.org with Aliki theme - desktop view" width="100%"><figcaption>Screenshot of docs.ruby-lang.org with Aliki theme - desktop view</figcaption></figure>

<h2 id="why-a-new-theme">Why a New Theme?</h2>

<p>Even after last year’s improvements, I still didn’t enjoy using docs.ruby-lang.org as much as I wanted. Every time I needed to look something up, the experience felt dated.</p>

<p>And it was difficult to further improve Darkfish because:</p>

<ul>
  <li>It lacks documentation, especially around the original design decisions</li>
  <li>Some of the patterns it uses were outdated</li>
  <li>Some third-party themes build on Darkfish, so updating it too much risks breaking them</li>
</ul>

<p>RDoc itself added more constraints: It can’t depend on any gem that doesn’t ship with Ruby itself, and it can’t run a modern JavaScript build pipeline.</p>

<p>RDoc was created in the pre-Node.js era and hasn’t evolved with frontend tooling. Adopting modern toolchains would raise the dependency requirements for everyone—Ruby’s documentation generation pipeline, gems like IRB and Reline, and so on.</p>

<p>So all JavaScript, CSS, and HTML had to be written directly—no frameworks, no build tools, no npm packages.</p>

<p>And honestly, given all the constraints, this project wouldn’t have been possible without AI coding agents.
Last year, just getting Darkfish’s code block styling right took me hours. It was a struggle for me to implement the look I wanted, and then to make it work with surrounding elements.
At that pace, building an entire new theme wasn’t realistic.</p>

<p>This year, however, I discovered that I could try three different UI styles in an hour with AI agents. So I decided to take on the impossible task.</p>

<p>My goal was simple: make docs.ruby-lang.org look modern and actually enjoyable to use.</p>

<p>I collected all the features I wished a documentation site would have, gathered feedback from Rubyists around me, cherry-picked the improvements the community added to Darkfish last year (SEO, search enhancements, etc.), and put them together into a new theme.</p>

<p>Okay, enough backstory. Let’s see what Aliki brings:</p>

<h2 id="search">Search</h2>

<p>The old search wasn’t intuitive. It supported fuzzy matching, but getting the sorting right was difficult—searching <code class="language-plaintext highlighter-rouge">Arr</code> never actually got you the <code class="language-plaintext highlighter-rouge">Array</code> class as the first result.</p>

<p>After a few patch-ups, it was still buggy, so I rewrote it with a new UI:</p>

<ul>
  <li>
    <figure><img src="./desktop-search-dropdown.png" alt="Aliki search dropdown on desktop showing type-aware ranking with classes, modules, methods, and constants" width="70%"><figcaption>Aliki search dropdown on desktop showing type-aware ranking with classes, modules, methods, and constants</figcaption></figure>
  </li>
  <li>
    <figure><img src="./mobile-search-dropdown.jpeg" alt="Aliki search dropdown on mobile showing full-screen search modal" width="50%"><figcaption>Aliki search dropdown on mobile showing full-screen search modal</figcaption></figure>
  </li>
</ul>

<p>Some notable new features/improvements:</p>

<ul>
  <li>
<strong>Case-aware ranking</strong>: If you search <code class="language-plaintext highlighter-rouge">parse</code> (lowercase), methods show up first. If you search <code class="language-plaintext highlighter-rouge">Parser</code> (capitalized), classes and modules come first.</li>
  <li>
<strong>Fuzzy matching</strong>: This existed before, but fuzzy results used to pollute the top of the list. Now a smarter ranking system makes sure exact and substring matches show up before fuzzy results (see the sketch after this list).</li>
  <li>
<strong>Constants included</strong>: You can now search for constants, along with classes, modules, and methods.</li>
  <li>
<strong>Type labels</strong>: Each result shows whether it’s a class, module, method, or constant.</li>
  <li>
<strong>Keyboard support</strong>: Did you know you can press <code class="language-plaintext highlighter-rouge">/</code> to focus the search bar? This existed in Darkfish too, but I thought it was worth mentioning.</li>
</ul>
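
<p>As a rough illustration of how this ranking behaves (this is not Aliki’s actual search code, which is written in plain JavaScript with no dependencies), here is a sketch in Ruby:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Illustrative sketch of case-aware, fuzzy-last ranking.
Entry = Struct.new(:name, :type) # type: :class, :module, :method, or :constant

# Subsequence match: every query character appears, in order, in the name.
def fuzzy_match?(query, name)
  idx = 0
  name.downcase.each_char do |c|
    idx += 1 if idx &lt; query.length &amp;&amp; c == query[idx].downcase
  end
  idx == query.length
end

def rank(query, entries)
  capitalized = query[0] == query[0].upcase

  scored = entries.filter_map do |entry|
    # Exact and substring matches always rank above fuzzy matches.
    match_score =
      if entry.name == query then 0
      elsif entry.name.downcase.start_with?(query.downcase) then 1
      elsif entry.name.downcase.include?(query.downcase) then 2
      elsif fuzzy_match?(query, entry.name) then 3
      end
    next unless match_score

    # Case-aware ranking: capitalized queries prefer classes and modules,
    # lowercase queries prefer methods.
    type_score =
      if capitalized
        [:class, :module].include?(entry.type) ? 0 : 1
      else
        entry.type == :method ? 0 : 1
      end

    [match_score, type_score, entry]
  end

  scored.sort_by { |match, type, _| [match, type] }.map(&amp;:last)
end

entries = [Entry.new("Array", :class), Entry.new("Arrangement#arrange", :method)]
rank("Arr", entries).first.name # =&gt; "Array"
</code></pre></div></div>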

<h2 id="dark-mode">Dark Mode</h2>

<p>Aliki has a light/dark toggle. It saves your preference and respects your OS dark mode setting by default.</p>

<figure><img src="./desktop-hash-class-dark.png" alt="Ruby Hash class documentation page in dark mode" width="100%"><figcaption>Ruby Hash class documentation page in dark mode</figcaption></figure>

<p><br></p>

<figure><img src="./desktop-hash-class-light.png" alt="Ruby Hash class documentation page in light mode" width="100%"><figcaption>Ruby Hash class documentation page in light mode</figcaption></figure>

<h2 id="layout">Layout</h2>

<p>The layout has three columns:</p>

<ul>
  <li>
<strong>Left sidebar</strong>: Navigation for pages, ancestors, methods, and class/module index</li>
  <li>
<strong>Center</strong>: The documentation content</li>
  <li>
<strong>Right sidebar</strong>: A table of contents generated from headings, with the current section highlighted as you scroll</li>
</ul>

<p>Sidebar sections can collapse. For example, when you’re on a class or module page, the pages section is automatically collapsed so you can focus on the relevant navigation, with page documents still accessible if you need them.</p>

<figure><img src="./desktop-collapsible-sidebar.gif" alt="Animated demonstration of collapsible sidebar sections in Aliki" width="100%"><figcaption>Animated demonstration of collapsible sidebar sections in Aliki</figcaption></figure>

<p>Speaking of pages, I also reorganized the pages list this year. It used to be a long, rather flat list—now pages are grouped and much easier to navigate. In the coming year, we’ll continue improving page documentation so it feels more like a coherent guide rather than a collection of loosely related pages.</p>

<p>On mobile, the layout is a single column with a hamburger menu and a full-screen search modal—same as before.</p>

<h2 id="code-features">Code Features</h2>

<p><strong>Code blocks now have copy buttons:</strong></p>

<figure><img src="./desktop-code-block.gif" alt="Animated demonstration of code block copy button in Aliki" width="100%"><figcaption>Animated demonstration of code block copy button in Aliki</figcaption></figure>

<p><strong>C code is now highlighted too:</strong></p>

<figure><img src="./desktop-c-highlight.png" alt="Screenshot of C syntax highlighting" width="100%"><figcaption>Screenshot of C syntax highlighting</figcaption></figure>

<h2 id="for-gem-documentation">For Gem Documentation</h2>

<p>Aliki works for any gem, not just Ruby core. If you generate documentation with RDoc 7.0+, your users will see this theme automatically.</p>

<p>You can also customize the footer links now. For example, in your <code class="language-plaintext highlighter-rouge">.rdoc_options</code>:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">footer_content</span><span class="pi">:</span>
  <span class="na">DOCUMENTATION</span><span class="pi">:</span>
    <span class="na">Home</span><span class="pi">:</span> <span class="s">index.html</span>
  <span class="na">RESOURCES</span><span class="pi">:</span>
    <span class="na">GitHub Repository</span><span class="pi">:</span> <span class="s">https://github.com/your/repo</span>
    <span class="na">Issue Tracker</span><span class="pi">:</span> <span class="s">https://github.com/your/repo/issues</span>
</code></pre></div></div>

<p>This is useful for linking to your gem’s repository, issue tracker, or other resources.</p>

<figure><img src="./desktop-footer.png" alt="Aliki footer showing customizable documentation and resource links" width="80%"><figcaption>Aliki footer showing customizable documentation and resource links</figcaption></figure>

<p>To keep using Darkfish in your project, you can switch back with:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">generator_name</span><span class="pi">:</span> <span class="s">darkfish</span>
</code></pre></div></div>

<h2 id="acknowledgments">Acknowledgments</h2>

<p>Thanks to <a href="https://github.com/tompng">@tompng</a> and <a href="https://github.com/earlopain">@earlopain</a> for reviewing the code and helping polish things up.</p>

<h2 id="try-it-out">Try It Out</h2>

<p>You can see Aliki at <a href="https://docs.ruby-lang.org/en/master/">docs.ruby-lang.org/en/master/</a> or <a href="https://ruby.github.io/rdoc/">ruby.github.io/rdoc/</a>.</p>

<p>If you find issues or have suggestions, <a href="https://github.com/ruby/rdoc/issues">open an issue</a> on GitHub.</p>

<h2 id="whats-next">What’s Next</h2>

<p>Now that reading docs is enjoyable again, the next step for RDoc is to make writing docs enjoyable too.</p>

<h2 id="about-the-name">About the Name</h2>

<p>Aliki is my cat. I’m not good at naming things, so I just named the new theme after her.</p>

<figure><img src="./aliki.jpg" alt="Photo of Aliki the cat" width="50%"><figcaption>Photo of Aliki the cat</figcaption></figure>
</body></html>]]></content><author><name>Stan Lo</name></author><category term="posts" /><category term="2025-12-22-introducing-aliki-a-modern-theme-for-ruby-documentation" /><summary type="html"><![CDATA[Ruby's documentation gets a fresh look. Starting with RDoc 7.0.0, Aliki is the new default theme—bringing dark mode, better search, and a modern layout to docs.ruby-lang.org and gem documentation.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://railsatscale.com/2025-12-22-introducing-aliki-a-modern-theme-for-ruby-documentation/bce1e97fa3e899489dc0871076d888a8a58e4e4a.png" /><media:content medium="image" url="https://railsatscale.com/2025-12-22-introducing-aliki-a-modern-theme-for-ruby-documentation/bce1e97fa3e899489dc0871076d888a8a58e4e4a.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Rails’s Swappable Migration Backend for Schema Changes at Scale</title><link href="https://railsatscale.com/2025-12-08-swappable-migration-backends-in-rails/" rel="alternate" type="text/html" title="Rails’s Swappable Migration Backend for Schema Changes at Scale" /><published>2025-12-08T00:00:00+00:00</published><updated>2025-12-08T00:00:00+00:00</updated><id>https://railsatscale.com/2025-12-08-swappable-migration-backends-in-rails/</id><content type="html" xml:base="https://railsatscale.com/2025-12-08-swappable-migration-backends-in-rails/"><![CDATA[<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<p>This post explores Rails’s swappable migration backend, a little-known feature that lets applications
customize how migrations run. At Shopify, we relied on monkey patches and a brittle SQL parser
to make Rails migrations work with our <em>Schema Migrations Service</em>. We developed the swappable
backend feature to more simply adapt Rails’s migration runner to our needs. We’ll cover why and
how we built this, and how Shopify uses it to power database migrations at scale.</p>

<hr>

<p>At Shopify, we run hundreds of database migrations across many Rails applications every week. Each
migration needs to be vetted for safety and executed in a way that doesn’t cause downtime for our
merchants. For years, we relied on bespoke tooling and <a href="https://github.com/shopify/lhm">LHMs</a> to
perform online schema changes at scale. In 2021, Shopify’s database team began designing a new,
centralized system for running schema migrations, the <em>Schema Migrations Service</em>. One of their goals was
to enable developers to use vanilla Rails migrations to perform schema changes safely and with zero downtime.</p>

<p>Our database team built the schema migrations gem to solve this problem, but the implementation wasn’t simple.
The gem relied on monkey patches and a complicated <a href="https://github.com/ruby/racc">RACC</a> parser to handle safety checking migrations
and submitting them to the schema migrations service. Shopify’s Rails Infrastructure team took the
opportunity to build something into the framework that would help us address our schema migration needs
more elegantly. We built the swappable migration backend (available as of Rails 7.0) to give applications
flexibility over how their migrations execute. Let’s dive into how Shopify uses this feature to power safe
database migrations at scale.</p>

<h2 id="why-production-migrations-require-a-different-approach">Why Production Migrations Require a Different Approach</h2>

<p>When you run <code class="language-plaintext highlighter-rouge">bin/rails db:migrate</code> in development, Rails executes your migration methods directly
against the database. Each call to <code class="language-plaintext highlighter-rouge">create_table</code>, <code class="language-plaintext highlighter-rouge">add_column</code>, or <code class="language-plaintext highlighter-rouge">add_index</code> immediately
translates to SQL that modifies your schema. This works great for local development, but at
Shopify’s scale, we can’t afford to run schema changes this way in production.</p>

<p>LHMs are a tool for performing online schema migrations. This means that migrations can be
performed without locking tables, enabling the system to stay up while the migration is running.
We used LHMs for many years to perform schema changes without downtime, but this also meant
that we couldn’t use Rails’s native migration API.</p>

<p>Shopify’s database team decided to build a <em>Schema Migrations Service</em> to allow developers to
return to using vanilla Rails migrations, while ensuring that schema changes were still performed
online behind the scenes. The idea was also to improve the developer experience around migrations
by:</p>

<ol>
  <li>Requiring migrations to <strong>pass safety checks</strong> before execution (e.g. blocking column-change
operations, ensuring a migration only operated on a single table, etc.).</li>
  <li>
<strong>Submitting migrations to a centralized manager</strong> to more easily orchestrate schema changes
  across multiple database shards, with better testing and retry behaviour.</li>
  <li>
<strong>Providing developers with more insight</strong> into which migrations were running, their progress,
  etc. from a comprehensive UI.</li>
</ol>

<p>The schema migrations gem built by our DB team handled safety checking migrations and submitting
them to the centralized manager. The initial implementation, however, relied heavily on monkey patches
to existing migration codepaths in Rails. Rather than executing migration SQL, the gem patched Rails
to capture any SQL statements. It relied on a RACC parser to extract schema change operations from the
SQL, safety check them, and then transform them into a JSON DDL (<a href="https://www.ibm.com/docs/en/i/7.5.0?topic=programming-data-definition-language">Data Definition Language</a>) to
be sent to the manager.</p>

<p>The Rails Infrastructure team realized that this was a great opportunity to make Rails’s migration
execution more flexible, so that we could meet Shopify’s schema migration needs without needing to
monkey patch a bunch of code or maintain a complicated RACC parser.</p>

<h2 id="building-railss-swappable-migration-strategy">Building Rails’s Swappable Migration Strategy</h2>

<p>When we started this project in early 2022, we explored several approaches that would allow us to
move away from monkey patching Rails in the gem. One idea was to use static analysis, and try to
parse migration files without running them. Another was to propose schema definition objects for every
migration operation, where Rails would expose Ruby representations of schema changes (like
<code class="language-plaintext highlighter-rouge">AddColumnDefinition</code>, <code class="language-plaintext highlighter-rouge">CreateTableDefinition</code>, etc.) that could be translated into any format:
SQL, JSON, or otherwise.</p>

<p>The Rails Core team had concerns about the complexity that schema definitions would introduce to
Active Record, and we soon pivoted to a simpler approach: the <a href="https://refactoring.guru/design-patterns/strategy">strategy pattern</a>.
Instead of fundamentally changing how migrations represent schema changes, we’d introduce an intermediary
object between migrations and the connection adapter that could customize execution behavior. This
was a cleaner abstraction that solved our problem without requiring massive changes to Active
Record’s internals.</p>

<p>In June 2022, we opened <a href="https://github.com/rails/rails/pull/45324">a pull request to Rails</a>
proposing this “execution strategy” pattern for migrations. The PR introduced a strategy object
between the <code class="language-plaintext highlighter-rouge">Migration</code> class and the connection adapter. Instead of migrations directly delegating
schema statement commands to the connection via <code class="language-plaintext highlighter-rouge">method_missing</code>, they would delegate to a strategy
object that could be swapped out.</p>

<p>For example, suppose you call a method like <code class="language-plaintext highlighter-rouge">create_table</code> in a migration. Rails routes
that call through a migration strategy object, which by default, is
<a href="https://github.com/rails/rails/blob/873fb486e78227cb27a855503fcb38ff35b3d1ae/activerecord/lib/active_record/migration/default_strategy.rb"><code class="language-plaintext highlighter-rouge">ActiveRecord::Migration::DefaultStrategy</code></a>:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="nn">ActiveRecord</span>
  <span class="k">class</span> <span class="nc">Migration</span>
    <span class="k">class</span> <span class="nc">DefaultStrategy</span> <span class="o">&lt;</span> <span class="no">ExecutionStrategy</span>
      <span class="kp">private</span>
        <span class="k">def</span> <span class="nf">method_missing</span><span class="p">(</span><span class="nb">method</span><span class="p">,</span> <span class="o">...</span><span class="p">)</span>
          <span class="n">connection</span><span class="p">.</span><span class="nf">send</span><span class="p">(</span><span class="nb">method</span><span class="p">,</span> <span class="o">...</span><span class="p">)</span>
        <span class="k">end</span>

        <span class="k">def</span> <span class="nf">respond_to_missing?</span><span class="p">(</span><span class="nb">method</span><span class="p">,</span> <span class="n">include_private</span> <span class="o">=</span> <span class="kp">false</span><span class="p">)</span>
          <span class="n">connection</span><span class="p">.</span><span class="nf">respond_to?</span><span class="p">(</span><span class="nb">method</span><span class="p">,</span> <span class="n">include_private</span><span class="p">)</span> <span class="o">||</span> <span class="k">super</span>
        <span class="k">end</span>

        <span class="k">def</span> <span class="nf">connection</span>
          <span class="n">migration</span><span class="p">.</span><span class="nf">connection</span>
        <span class="k">end</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>The default strategy sends migration methods to the connection, which executes SQL against your database.
This is how migrations worked before, so most Rails developers are unaware that there’s now a strategy
object working behind the scenes! However, the migration strategy class
<a href="https://guides.rubyonrails.org/configuring.html#config-active-record-migration-strategy">can be configured</a> to customize how migrations are executed. As of Rails 7.0, you
can set <code class="language-plaintext highlighter-rouge">config.active_record.migration_strategy</code> in your environment configuration (for example, in
<code class="language-plaintext highlighter-rouge">config/environments/production.rb</code>). Pass it either a class object or a string with the class name:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># lib/custom_migration_strategy.rb</span>

<span class="k">class</span> <span class="nc">CustomMigrationStrategy</span> <span class="o">&lt;</span> <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Migration</span><span class="o">::</span><span class="no">DefaultStrategy</span>
  <span class="k">def</span> <span class="nf">drop_table</span><span class="p">(</span><span class="o">*</span><span class="p">)</span>
    <span class="k">raise</span> <span class="s2">"Dropping tables is not supported!"</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># config/environments/production.rb</span>

<span class="no">Rails</span><span class="p">.</span><span class="nf">application</span><span class="p">.</span><span class="nf">configure</span> <span class="k">do</span>
  <span class="n">config</span><span class="p">.</span><span class="nf">active_record</span><span class="p">.</span><span class="nf">migration_strategy</span> <span class="o">=</span> <span class="no">CustomMigrationStrategy</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Now, when you run <code class="language-plaintext highlighter-rouge">bin/rails db:migrate</code>, Rails will delegate all migration methods to your custom
strategy, giving you complete control over how migrations are executed.</p>

<p><strong>Note</strong>: Outside of production, you will likely want to stick with the default strategy. This setup lets
you safely use advanced migration tooling in production while keeping things fast and simple for local
development. We do this at Shopify.</p>
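
<p>For example, here is a hypothetical migration that would run normally in development but be rejected by <code class="language-plaintext highlighter-rouge">CustomMigrationStrategy</code> in production:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># db/migrate/20240101000000_drop_legacy_widgets.rb (hypothetical example)
class DropLegacyWidgets &lt; ActiveRecord::Migration[7.0]
  def change
    # With the default strategy (development), this executes DROP TABLE.
    # With CustomMigrationStrategy (production), #drop_table raises instead.
    drop_table :legacy_widgets
  end
end
</code></pre></div></div>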

<h2 id="serializing-production-migrations-to-json">Serializing Production Migrations to JSON</h2>

<p>Once Rails supported swappable migration backends, we implemented a custom strategy that serialized
migrations as JSON, making them easy to submit to a remote manager. To accomplish
this, our gem introduced a <code class="language-plaintext highlighter-rouge">JsonSerializationStrategy</code> class. This class implemented each schema
change method available in migrations, using Rails’s schema definition APIs to build the necessary
schema objects. We then converted these objects into JSON payloads that described each schema operation.
Here’s an example of how we capture <code class="language-plaintext highlighter-rouge">create_table</code> operations:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">JsonSerializationStrategy</span> <span class="o">&lt;</span> <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Migration</span><span class="o">::</span><span class="no">DefaultStrategy</span>
  <span class="nb">attr_accessor</span> <span class="ss">:connection</span><span class="p">,</span> <span class="ss">:operations</span>

  <span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="n">connection</span><span class="p">)</span>
    <span class="vi">@connection</span> <span class="o">=</span> <span class="n">connection</span>
    <span class="vi">@operations</span> <span class="o">=</span> <span class="p">[]</span>
  <span class="k">end</span>

  <span class="k">def</span> <span class="nf">create_table</span><span class="p">(</span><span class="o">...</span><span class="p">)</span>
    <span class="n">td</span> <span class="o">=</span> <span class="n">connection</span><span class="p">.</span><span class="nf">build_create_table_definition</span><span class="p">(</span><span class="o">...</span><span class="p">)</span>
    <span class="n">ddl</span> <span class="o">=</span> <span class="n">connection</span><span class="p">.</span><span class="nf">schema_creation</span><span class="p">.</span><span class="nf">accept</span><span class="p">(</span><span class="n">td</span><span class="p">)</span>
    <span class="n">definition</span> <span class="o">=</span> <span class="n">extract_table_definition</span><span class="p">(</span><span class="n">td</span><span class="p">.</span><span class="nf">name</span><span class="p">,</span> <span class="n">ddl</span><span class="p">)</span>

    <span class="n">operations</span> <span class="o">&lt;&lt;</span> <span class="p">{</span>
      <span class="ss">type: :sql</span><span class="p">,</span>
      <span class="ss">op: :create_table</span><span class="p">,</span>
      <span class="ss">params: </span><span class="p">{</span>
        <span class="ss">name: </span><span class="n">td</span><span class="p">.</span><span class="nf">name</span><span class="p">,</span>
        <span class="ss">definition: </span><span class="n">definition</span><span class="p">,</span>
      <span class="p">},</span>
    <span class="p">}</span>
  <span class="k">end</span>

  <span class="kp">private</span>

  <span class="k">def</span> <span class="nf">extract_table_definition</span><span class="p">(</span><span class="n">table_name</span><span class="p">,</span> <span class="n">ddl</span><span class="p">)</span>
    <span class="n">table_name_pattern</span> <span class="o">=</span> <span class="sr">/^CREATE TABLE </span><span class="si">#{</span><span class="n">connection</span><span class="p">.</span><span class="nf">quote_table_name</span><span class="p">(</span><span class="n">table_name</span><span class="p">.</span><span class="nf">to_s</span><span class="p">)</span><span class="si">}</span><span class="sr"> /</span>
    <span class="n">ddl</span><span class="p">.</span><span class="nf">sub</span><span class="p">(</span><span class="n">table_name_pattern</span><span class="p">,</span> <span class="s2">""</span><span class="p">)</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Here’s a simplified look at how migrations are run in production, using the swappable strategy:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">ExternalMigrationsRunner</span>
  <span class="k">def</span> <span class="nf">upload_migration</span><span class="p">(</span><span class="n">migration</span><span class="p">)</span>
    <span class="c1"># Run the migration, but since we're using the JsonSerializationStrategy,</span>
    <span class="c1"># we won't execute SQL; instead, the strategy captures all operations as JSON</span>
    <span class="n">runnable_migration</span> <span class="o">=</span> <span class="n">migration</span><span class="p">.</span><span class="nf">migration_class</span><span class="p">.</span><span class="nf">new</span>
    <span class="k">if</span> <span class="n">runnable_migration</span><span class="p">.</span><span class="nf">respond_to?</span><span class="p">(</span><span class="ss">:change</span><span class="p">)</span>
      <span class="n">runnable_migration</span><span class="p">.</span><span class="nf">change</span>
    <span class="k">elsif</span> <span class="n">runnable_migration</span><span class="p">.</span><span class="nf">respond_to?</span><span class="p">(</span><span class="ss">:up</span><span class="p">)</span>
      <span class="n">runnable_migration</span><span class="p">.</span><span class="nf">up</span>
    <span class="k">end</span>

    <span class="c1"># Extract the serialized operations from the strategy</span>
    <span class="n">operations</span> <span class="o">=</span> <span class="n">runnable_migration</span><span class="p">.</span><span class="nf">execution_strategy</span><span class="p">.</span><span class="nf">operations</span>

    <span class="c1"># Upload to the migrations service via API</span>
    <span class="no">ApiClient</span><span class="p">.</span><span class="nf">upload_migration</span><span class="p">(</span>
      <span class="ss">name: </span><span class="n">migration</span><span class="p">.</span><span class="nf">name</span><span class="p">,</span>
      <span class="ss">database: </span><span class="n">database_name</span><span class="p">,</span>
      <span class="ss">identifier: </span><span class="n">migration</span><span class="p">.</span><span class="nf">version</span><span class="p">,</span>
      <span class="ss">operations: </span><span class="n">operations</span><span class="p">,</span>  <span class="c1"># JSON representation of schema changes</span>
      <span class="ss">table_name: </span><span class="n">migration</span><span class="p">.</span><span class="nf">table</span><span class="p">,</span>
      <span class="ss">author: </span><span class="n">migration</span><span class="p">.</span><span class="nf">author</span>
    <span class="p">)</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
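
<p>To make the serialized payload more concrete, here is a hypothetical (and simplified) <code class="language-plaintext highlighter-rouge">operations</code> array for a migration that creates a <code class="language-plaintext highlighter-rouge">posts</code> table, following the <code class="language-plaintext highlighter-rouge">create_table</code> handler shown above:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical contents of execution_strategy.operations for a
# `create_table :posts` migration (column definitions simplified):
[
  {
    type: :sql,
    op: :create_table,
    params: {
      name: "posts",
      definition: "(`id` bigint NOT NULL AUTO_INCREMENT PRIMARY KEY, " \
                  "`title` varchar(255) NOT NULL, " \
                  "`created_at` datetime(6) NOT NULL, " \
                  "`updated_at` datetime(6) NOT NULL)"
    }
  }
]
</code></pre></div></div>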

<h3 id="configuring-the-migration-strategy-automatically">Configuring the Migration Strategy Automatically</h3>

<p>Rather than requiring each application to configure the migration strategy in their config file for
production, the schema migrations gem leveraged an initializer to set this automatically:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># lib/schema_migrations/railtie.rb</span>
<span class="nb">require</span> <span class="s2">"rails/railtie"</span>

<span class="k">class</span> <span class="nc">Railtie</span> <span class="o">&lt;</span> <span class="no">Rails</span><span class="o">::</span><span class="no">Railtie</span>
  <span class="o">...</span>

  <span class="n">initializer</span> <span class="s2">"schema_migrations.migration_strategy_config"</span> <span class="k">do</span> <span class="o">|</span><span class="n">app</span><span class="o">|</span>
    <span class="k">next</span> <span class="k">unless</span> <span class="no">Rails</span><span class="p">.</span><span class="nf">env</span><span class="p">.</span><span class="nf">production?</span>

    <span class="n">app</span><span class="p">.</span><span class="nf">config</span><span class="p">.</span><span class="nf">active_record</span><span class="p">.</span><span class="nf">migration_strategy</span> <span class="o">=</span> <span class="no">JsonSerializationStrategy</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>This initializer ensures that any application that includes the schema migrations gem has its migrations
intercepted and serialized in production environments.</p>

<h2 id="reimagining-safety-checks-from-sql-parsing-to-runtime-analysis">Reimagining Safety Checks: From SQL Parsing to Runtime Analysis</h2>

<p>While working on the upstream strategy feature, our team was simultaneously tackling another critical
problem: safety checks. Before any migration runs in production at Shopify, the gem performs safety
checks to catch common mistakes that could cause downtime, such as:</p>

<ul>
  <li>Adding a <code class="language-plaintext highlighter-rouge">NOT NULL</code> column without a default value (check out <a href="https://shopify.engineering/add-not-null-colums-to-database">this blog post</a>
if you’re interested in learning more)</li>
  <li>Renaming a column (breaks downstream consumers)</li>
  <li>Changing a column type in an incompatible way</li>
</ul>

<p>These checks run in development too, giving developers immediate feedback before they deploy.</p>

<p>The old implementation of the gem’s safety checker relied on a RACC parser to analyze SQL strings, which
was brittle: every time SQL syntax changed or we encountered a new edge case, the parser had to be updated.
We wanted a standalone workflow for safety checking migrations, separate from the migrations
actually being executed and submitted to the manager. Consequently, we couldn’t rely on the migration strategy
to do this. Instead, we settled on a new approach that would allow us to move away from the RACC parser and
reduce a lot of the complexity. We developed a <code class="language-plaintext highlighter-rouge">MigrationOperationRecorder</code> that “runs” a migration and
records all method calls performed:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">MigrationOperationRecorder</span>
  <span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="n">migration_class</span><span class="p">)</span>
    <span class="vi">@migration</span> <span class="o">=</span> <span class="n">migration_class</span><span class="p">.</span><span class="nf">new</span>
  <span class="k">end</span>

  <span class="k">def</span> <span class="nf">record</span>
    <span class="n">singleton_class</span> <span class="o">=</span> <span class="vi">@migration</span><span class="p">.</span><span class="nf">singleton_class</span>
    <span class="n">singleton_class</span><span class="p">.</span><span class="nf">include</span><span class="p">(</span><span class="no">RecordMigrationOperations</span><span class="p">)</span>

    <span class="k">if</span> <span class="vi">@migration</span><span class="p">.</span><span class="nf">respond_to?</span><span class="p">(</span><span class="ss">:change</span><span class="p">)</span>
      <span class="vi">@migration</span><span class="p">.</span><span class="nf">change</span>
    <span class="k">elsif</span> <span class="vi">@migration</span><span class="p">.</span><span class="nf">respond_to?</span><span class="p">(</span><span class="ss">:up</span><span class="p">)</span>
      <span class="vi">@migration</span><span class="p">.</span><span class="nf">up</span>
    <span class="k">end</span>

    <span class="vi">@migration</span><span class="p">.</span><span class="nf">method_calls</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">RecordMigrationOperations</code> module works by leveraging the same <code class="language-plaintext highlighter-rouge">method_missing</code> mechanism that Rails
uses for migrations. Since <code class="language-plaintext highlighter-rouge">ActiveRecord::Migration</code> uses <code class="language-plaintext highlighter-rouge">method_missing</code> to route commands to the execution
strategy, we define <code class="language-plaintext highlighter-rouge">RecordMigrationOperations#method_missing</code> to store the method call instead:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="nn">RecordMigrationOperations</span>
  <span class="k">def</span> <span class="nf">method_missing</span><span class="p">(</span><span class="nb">method</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">options</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">block</span><span class="p">)</span>
    <span class="vi">@method_calls</span> <span class="o">&lt;&lt;</span> <span class="no">MigrationOperation</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span>
      <span class="ss">method: </span><span class="nb">method</span><span class="p">,</span>
      <span class="ss">args: </span><span class="n">args</span><span class="p">,</span>
      <span class="ss">options: </span><span class="n">options</span>
    <span class="p">)</span>
  <span class="k">end</span>

  <span class="k">def</span> <span class="nf">method_calls</span>
    <span class="vi">@method_calls</span> <span class="o">||=</span> <span class="p">[]</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
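
<p>As a hypothetical usage example (assuming <code class="language-plaintext highlighter-rouge">MigrationOperation</code> exposes readers for its fields), recording a simple migration might look like this:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical usage of the recorder; names follow the snippets above.
class AddEmailToUsers &lt; ActiveRecord::Migration[7.0]
  def change
    add_column :users, :email, :string, null: false, default: ""
    add_index :users, :email, unique: true
  end
end

operations = MigrationOperationRecorder.new(AddEmailToUsers).record
operations.map(&amp;:method)  # =&gt; [:add_column, :add_index]
operations.first.args     # =&gt; [:users, :email, :string]
operations.first.options  # =&gt; { null: false, default: "" }
</code></pre></div></div>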

<p>Once operations are recorded, individual safety checks can inspect the migration data.
Here’s an example of the <code class="language-plaintext highlighter-rouge">SingleTableCheck</code>:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">SingleTableCheck</span> <span class="o">&lt;</span> <span class="no">BaseSafetyCheck</span>
  <span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="n">migration</span><span class="p">)</span>
    <span class="vi">@inspected_migration</span> <span class="o">=</span> <span class="n">migration</span>
  <span class="k">end</span>

  <span class="k">def</span> <span class="nf">check</span>
    <span class="c1"># @inspected_migration is a specialized object containing info</span>
    <span class="c1"># about all of the operations the migration performs, as returned</span>
    <span class="c1"># from MigrationOperationRecorder#record</span>
    <span class="n">tables</span> <span class="o">=</span> <span class="vi">@inspected_migration</span><span class="p">.</span><span class="nf">tables</span>

    <span class="k">return</span> <span class="k">if</span> <span class="n">tables</span><span class="p">.</span><span class="nf">one?</span>

    <span class="k">raise</span> <span class="no">SafetyCheckError</span><span class="p">,</span>
      <span class="s2">"You must work with exactly one table per migration. "</span> <span class="p">\</span>
      <span class="s2">"Split tables </span><span class="si">#{</span><span class="n">tables</span><span class="p">.</span><span class="nf">to_sentence</span><span class="si">}</span><span class="s2"> into </span><span class="si">#{</span><span class="n">tables</span><span class="p">.</span><span class="nf">length</span><span class="si">}</span><span class="s2"> migrations."</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>This check accesses <code class="language-plaintext highlighter-rouge">@inspected_migration.tables</code>, which is extracted during the analysis phase,
and validates that exactly one table is involved. If the check fails, it raises a <code class="language-plaintext highlighter-rouge">SafetyCheckError</code>
with a clear message telling developers how to fix the issue.</p>
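
<p>For illustration, here is a hypothetical sketch of how <code class="language-plaintext highlighter-rouge">tables</code> could be derived from the recorded operations (the gem’s real analysis is more involved, handling cases like <code class="language-plaintext highlighter-rouge">rename_table</code> and block-form table definitions):</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical: derive the table names a migration touches from the
# operations captured by MigrationOperationRecorder#record.
TABLE_METHODS = %i[
  create_table drop_table add_column remove_column change_column
  add_index remove_index add_reference remove_reference
].freeze

def tables(method_calls)
  method_calls
    .select { |op| TABLE_METHODS.include?(op.method) }
    .map { |op| op.args.first.to_s }
    .uniq
end
</code></pre></div></div>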

<h3 id="why-not-use-a-migration-strategy-for-safety-checking">Why Not Use a Migration Strategy for Safety Checking?</h3>

<p>You might wonder why we used <code class="language-plaintext highlighter-rouge">method_missing</code> for the <code class="language-plaintext highlighter-rouge">MigrationOperationRecorder</code> instead of
creating another strategy pattern. Couldn’t we use our newly built feature for safety checking?
The answer comes down to separation of concerns and simplicity. Safety checking and migration
execution serve different purposes:</p>

<ul>
  <li>
    <p><strong>Migration execution</strong> needs to be swappable because different environments (development vs.
production) require different behaviours. In development, we execute SQL directly. In production,
we serialize to JSON and submit to a remote service.</p>
  </li>
  <li>
    <p><strong>Safety checking</strong> needs to happen the same way everywhere. We’re analyzing which operations the
migration is performing, not executing schema changes. The same safety checks run in development,
CI, and production.</p>
  </li>
</ul>

<p>Using <code class="language-plaintext highlighter-rouge">method_missing</code> for safety checks gives us a simpler implementation that automatically
captures all migration DSL methods without needing to explicitly enumerate them all. A strategy
pattern would have required us to implement every migration method explicitly. Given that we only
wanted to record the migration methods being called and their arguments, opting for a simpler
<code class="language-plaintext highlighter-rouge">method_missing</code> approach made more sense.</p>

<h2 id="per-adapter-migration-strategies">Per-Adapter Migration Strategies</h2>

<p>One challenge with using a global migration strategy is that it’s insufficient for applications
using multiple database systems. Since its inception, Shopify has primarily used MySQL, but more
recently we’ve been exploring running non-MySQL databases. Different databases have different
requirements for how migrations should be serialized, which means that the migration strategy
needs to be tailored to the database the migrations are running against.</p>

<p>We could make this work by having our gem’s migration strategy inspect the database adapter at runtime
and dispatch to the appropriate serialization logic. This is not ideal, though; we’d be reimplementing
adapter dispatch logic that Rails can handle natively. It felt like this was a missing piece in our
upstream solution, so last month, we opened a <a href="https://github.com/rails/rails/pull/56204">PR</a> to add <strong>per-adapter migration strategies</strong>
to Rails. This feature will be available in Rails 8.2.</p>

<p>Instead of setting one global strategy:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">config</span><span class="p">.</span><span class="nf">active_record</span><span class="p">.</span><span class="nf">migration_strategy</span> <span class="o">=</span> <span class="no">JsonSerializationStrategy</span>
</code></pre></div></div>

<p>You can now register strategies directly on adapter classes:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">ActiveSupport</span><span class="p">.</span><span class="nf">on_load</span><span class="p">(</span><span class="ss">:active_record_trilogyadapter</span><span class="p">)</span> <span class="k">do</span>
  <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">ConnectionAdapters</span><span class="o">::</span><span class="no">TrilogyAdapter</span><span class="p">.</span><span class="nf">migration_strategy</span> <span class="o">=</span>
    <span class="no">MysqlStrategy</span>
<span class="k">end</span>

<span class="no">ActiveSupport</span><span class="p">.</span><span class="nf">on_load</span><span class="p">(</span><span class="ss">:active_record_postgresqladapter</span><span class="p">)</span> <span class="k">do</span>
  <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">ConnectionAdapters</span><span class="o">::</span><span class="no">PostgreSQLAdapter</span><span class="p">.</span><span class="nf">migration_strategy</span> <span class="o">=</span>
    <span class="no">PostgreSQLStrategy</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Rails automatically selects the correct strategy based on the database adapter in use
for each migration. For example, if you’re running migrations against a MySQL database
configured with the <a href="https://github.com/trilogy-libraries/trilogy">Trilogy</a> adapter, Rails chooses <code class="language-plaintext highlighter-rouge">MysqlStrategy</code>. If your
migrations are running against a PostgreSQL database, Rails selects <code class="language-plaintext highlighter-rouge">PostgreSQLStrategy</code>.
If the current adapter does not have a strategy configured, Rails will fall back to using
the global strategy.</p>

<h2 id="making-rails-work-for-you">Making Rails Work for You</h2>

<p>One of Rails’s design philosophies is <strong>convention over configuration</strong>. The majority of Rails
apps don’t need to think about how their migrations are performed, so we keep things simple
with a default migration strategy. When an application does need to customize how its
migrations run, the framework provides a clear extension point. Applications can opt into
configurable behaviour as their requirements evolve.</p>

<p>This is also a story about how working in the open benefits everyone. We could have kept our
monkey patches internal to Shopify, continuing to patch Rails as needed. Instead, we built a
more maintainable solution for ourselves while also providing the Rails community with a new tool
for customizing migration behaviour. If you’re running into limitations with Rails for your specific
use case, consider whether there’s an opportunity for an upstream contribution that could solve
your problem while benefitting the rest of the community.</p>
</body></html>]]></content><author><name>Adrianna Chang</name></author><category term="posts" /><category term="2025-12-08-swappable-migration-backends-in-rails" /><summary type="html"><![CDATA[This post explores Rails’s swappable migration backend, a little-known feature that lets applications customize how migrations run. At Shopify, we relied on monkey patches and a brittle SQL parser to make Rails migrations work with our Schema Migrations Service. We developed the swappable backend feature to more simply adapt Rails’s migration runner to our needs. We’ll cover why and how we built this, and how Shopify uses it to power database migrations at scale.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://railsatscale.com/2025-12-08-swappable-migration-backends-in-rails/611a79dd61cc7897be8aee3ea79fd49f58e62b33.png" /><media:content medium="image" url="https://railsatscale.com/2025-12-08-swappable-migration-backends-in-rails/611a79dd61cc7897be8aee3ea79fd49f58e62b33.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Adding Iongraph support to ZJIT</title><link href="https://railsatscale.com/2025-11-19-adding-iongraph-support/" rel="alternate" type="text/html" title="Adding Iongraph support to ZJIT" /><published>2025-11-19T00:00:00+00:00</published><updated>2025-11-19T00:00:00+00:00</updated><id>https://railsatscale.com/2025-11-19-adding-iongraph-support/</id><content type="html" xml:base="https://railsatscale.com/2025-11-19-adding-iongraph-support/"><![CDATA[<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<p>ZJIT adds support for Iongraph, which offers a web-based, pass-by-pass viewer
with a stable layout, better navigation, and quality-of-life features like
labeled backedges and clickable operands.</p>

<h2 id="prelude">Prelude</h2>

<p>I’m an intern on the ZJIT team for the fall term. I also have a rather bad habit
of being chronically on <a href="https://lobste.rs/">lobste.rs</a>.</p>

<p>While idly browsing, I spotted an article by <a href="https://bvisness.me/">Ben Visness</a>
titled
<em><a href="https://spidermonkey.dev/blog/2025/10/28/iongraph-web.html">Who needs Graphviz when you can build it yourself?</a></em>,
which covers his work on creating a novel graph viewer called <em>Iongraph</em>.</p>

<figure><img src="iongraph-spidermonkey-example.png" alt="Iongraph used to visualize an IR graph for the SpiderMonkey JIT inside Firefox."><figcaption>Iongraph used to visualize an IR graph for the SpiderMonkey JIT inside Firefox.</figcaption></figure>

<p>Immediately, I was intrigued. I like looking at new technology and I wondered
what it might be like to integrate the work done in Iongraph with ZJIT, getting
all sorts of novel and interesting features for free. I suspected that it could
also help other engineers to reason about how their optimizations might affect
the control flow graph of a given function.</p>

<p>Also, it just looks really cool. It’s got nice colours, good built-in CSS, and
is built in a fairly extensible way. The underlying code isn’t hard to read if
you need to make changes to it.</p>

<h2 id="investigating-further">Investigating further</h2>

<p>Iongraph is compelling for a few reasons.</p>

<p>It supports stable layouts, which means that removing or adding nodes (something
that can happen when you run an optimization pass) doesn’t shift the location of
other nodes to an extreme degree. Iongraph also gives all sorts of interactive
options, like clickable operands, scrollable graphs, or arrow keys to navigate
between different nodes.</p>

<p>An especially useful feature is the ability to switch between different compiled
methods with a small selector. In our codebase, ZJIT compiles each method on its
own, so using a tool like this allows us to inspect method level optimizations
all in one pane of a web browser. Of course, there are other great features,
like loop header highlighting or being able to click on optimization passes to
see what the control flow graph looks like after they’re applied.</p>

<h2 id="proposal">Proposal</h2>

<p>Roughly an hour after I read through said article, I noticed that my mentor,
<a href="https://bernsteinbear.com/">Max</a>, had also posted it in an internal team chat,
mentioning that it would be cool to support it.</p>

<p>Of course, I was tempted by this project. As is common for interns, I
wanted to take on a new, shiny project despite not knowing what developing it
would actually involve. After talking to Max further, he clarified that
this would require significant infrastructure work — or at the very least,
more work than was initially apparent.</p>

<h2 id="building">Building</h2>

<h3 id="a-json-library-inside-zjit">A JSON library inside ZJIT?</h3>

<p>Looking into the Iongraph format, I figured that I would have to use some sort
of JSON crate. Since ZJIT as a project doesn’t rely strictly on using Rust
tooling like <code class="language-plaintext highlighter-rouge">cargo</code>, directly adding <code class="language-plaintext highlighter-rouge">serde_json</code> as a dependency was out of
the question. Another compelling option was vendoring it (or a smaller JSON
library), but that was likely to include features that we did not want or
introduce licensing issues.</p>

<p>After a quick discussion, I settled on implementing the functionality myself. I
read a bit of the JSON specification and got a sense of how I wanted to design
the library’s API. Ultimately, I opted for readability and usability over
raw performance. I think this is a reasonable trade-off, given that the
serialization code is not on the critical path of the compiler. The interface is
also clean enough that the internals could be swapped out later with minimal
fuss if more performance is ever needed.</p>

<p>In designing the serializer, I chose to target
<a href="https://datatracker.ietf.org/doc/html/rfc8259">RFC 8259</a>, which provides more
freedom than previous specifications. As the RFC notes, historical
specifications constrained the top-level value to be an array or an object;
this spec (and my implementation) drops that constraint. I also opted to
avoid comments, encode strictly in UTF-8, and escape control characters.
Notably, RFC 8259 does not impose a limit on the precision of numbers; it only
disallows infinity, negative infinity, and <code class="language-plaintext highlighter-rouge">NaN</code>.</p>
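
<p>As a rough illustration of those rules, here is what a minimal encoder following them might look like, sketched in Ruby rather than the Rust that ZJIT’s serializer is actually written in:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch of the rules above: any top-level value is allowed, output is UTF-8,
# control characters are escaped, and non-finite floats are rejected.
def to_json_value(value)
  case value
  when nil         then "null"
  when true, false then value.to_s
  when Integer     then value.to_s
  when Float
    raise ArgumentError, "NaN and infinities are not valid JSON" unless value.finite?
    value.to_s
  when String      then quote(value)
  when Symbol      then quote(value.to_s)
  when Array       then "[#{value.map { |v| to_json_value(v) }.join(',')}]"
  when Hash
    "{#{value.map { |k, v| "#{quote(k.to_s)}:#{to_json_value(v)}" }.join(',')}}"
  else
    raise ArgumentError, "unsupported type: #{value.class}"
  end
end

def quote(string)
  escaped = string.each_char.map do |c|
    case c
    when '"'  then '\"'
    when "\\" then "\\\\"
    when "\n" then "\\n"
    when "\r" then "\\r"
    when "\t" then "\\t"
    else c.ord &lt; 0x20 ? format("\\u%04x", c.ord) : c
    end
  end.join
  "\"#{escaped}\""
end
</code></pre></div></div>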

<h3 id="computing-control-flow-graph-properties">Computing control flow graph properties</h3>

<p>With JSON serialization handled, the more challenging work was computing the
graph metadata that Iongraph requires. The format expects explicit successor and
predecessor relationships, loop headers, and back edge sources — information that
ZJIT doesn’t normally compute since it’s not needed for compilation at this stage
of compiler development.</p>

<p>One constraint I had to contend with was that the Iongraph format needs the user
to manually provide the successor and predecessor nodes for a given node in a
control flow graph. In ZJIT, we compile individual methods at a time as
<code class="language-plaintext highlighter-rouge">Function</code>s (our internal representation) that hold a graph of <code class="language-plaintext highlighter-rouge">Block</code>s. Each
<code class="language-plaintext highlighter-rouge">Block</code> is a basic block that you would find in a compiler textbook. (One caveat
to understand is that we use extended basic blocks, meaning that blocks can have
jump instructions at any point in their contained instructions — not just at
the end.)</p>

<p>The process of computing successors and predecessors is fairly simple. As you
iterate through the list of blocks, all blocks referenced as the target of a
jump-like instruction (whether conditional or unconditional) are added to the
successor set. Then for each successor, update their predecessor set to include
the block currently being operated on.</p>

<p>The next task I had to solve was computing the loop headers and back edge
sources.</p>

<p>Computing both of these requires first computing the dominators
for the blocks in a control flow graph. We can state that a block <em>i</em> dominates a
block <em>j</em> if all paths in the control flow graph that reach <em>j</em> must go through
<em>i</em>. Several algorithms exist for computing dominators, ranging from simple
iterative options to more complicated ones. The simplest is a fixed-point
iteration that is very straightforward to implement but perhaps not the most
efficient: it runs in time quadratic in the number of blocks, and it is the one
I will discuss shortly. In <a href="https://www.cs.tufts.edu/~nr/cs257/archive/keith-cooper/dom14.pdf"><em>A Simple, Fast Dominance
Algorithm</em></a>
by Cooper, Harvey, and Kennedy, both this iterative solution and a version
optimized to use less space are described. A third option is the
Lengauer-Tarjan algorithm, which has better worst-case bounds than both
the iterative and tuned implementations.</p>

<p>Based on the goals of the project, I opted to use the iterative algorithm, since
it performs well and doesn’t incur serious memory use penalties for a small
number of blocks in a control flow graph. It can be described as such:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">dom</span> <span class="o">=</span> <span class="p">{}</span>
<span class="n">nodes</span><span class="p">.</span><span class="nf">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">node</span><span class="o">|</span>
  <span class="k">if</span> <span class="n">entry_nodes</span><span class="p">.</span><span class="nf">include?</span><span class="p">(</span><span class="n">node</span><span class="p">)</span>
    <span class="n">dom</span><span class="p">[</span><span class="n">node</span><span class="p">]</span> <span class="o">=</span> <span class="no">Set</span><span class="p">[</span><span class="n">node</span><span class="p">]</span>
  <span class="k">else</span>
    <span class="n">dom</span><span class="p">[</span><span class="n">node</span><span class="p">]</span> <span class="o">=</span> <span class="n">nodes</span><span class="p">.</span><span class="nf">to_set</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="n">changed</span> <span class="o">=</span> <span class="kp">true</span>
<span class="k">while</span> <span class="n">changed</span>
  <span class="n">changed</span> <span class="o">=</span> <span class="kp">false</span>
  <span class="n">nodes</span><span class="p">.</span><span class="nf">reverse_post_order</span><span class="p">.</span><span class="nf">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">node</span><span class="o">|</span>
    <span class="n">preds</span> <span class="o">=</span> <span class="n">predecessors</span><span class="p">(</span><span class="n">node</span><span class="p">)</span>
    <span class="n">pred_doms</span> <span class="o">=</span> <span class="n">preds</span><span class="p">.</span><span class="nf">map</span> <span class="p">{</span> <span class="o">|</span><span class="nb">p</span><span class="o">|</span> <span class="n">dom</span><span class="p">[</span><span class="nb">p</span><span class="p">]</span> <span class="p">}</span>

    <span class="c1"># Intersection of all predecessor dominators</span>
    <span class="n">intersection</span> <span class="o">=</span> <span class="k">if</span> <span class="n">pred_doms</span><span class="p">.</span><span class="nf">empty?</span>
                     <span class="no">Set</span><span class="p">.</span><span class="nf">new</span>
                   <span class="k">else</span>
                     <span class="n">pred_doms</span><span class="p">.</span><span class="nf">reduce</span><span class="p">(</span><span class="ss">:&amp;</span><span class="p">)</span>
                   <span class="k">end</span>

    <span class="c1"># Union with {node}</span>
    <span class="n">new_set</span> <span class="o">=</span> <span class="n">intersection</span> <span class="o">|</span> <span class="no">Set</span><span class="p">[</span><span class="n">node</span><span class="p">]</span>

    <span class="k">if</span> <span class="n">new_set</span> <span class="o">!=</span> <span class="n">dom</span><span class="p">[</span><span class="n">node</span><span class="p">]</span>
      <span class="n">dom</span><span class="p">[</span><span class="n">node</span><span class="p">]</span> <span class="o">=</span> <span class="n">new_set</span>
      <span class="n">changed</span> <span class="o">=</span> <span class="kp">true</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Implementing this is fairly simple, and with the small number of blocks in a
typical control flow graph it runs quickly enough to be totally acceptable.</p>

<p>To compute successors we use the following snippet:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">successors</span><span class="p">:</span> <span class="n">BTreeSet</span><span class="o">&lt;</span><span class="n">BlockId</span><span class="o">&gt;</span> <span class="o">=</span> <span class="n">block</span>
    <span class="py">.insns</span>
    <span class="nf">.iter</span><span class="p">()</span>
    <span class="nf">.map</span><span class="p">(|</span><span class="o">&amp;</span><span class="n">insn_id</span><span class="p">|</span> <span class="n">uf</span><span class="nf">.find_const</span><span class="p">(</span><span class="n">insn_id</span><span class="p">))</span>
    <span class="nf">.filter_map</span><span class="p">(|</span><span class="n">insn_id</span><span class="p">|</span> <span class="p">{</span>
        <span class="k">Self</span><span class="p">::</span><span class="nf">extract_jump_target</span><span class="p">(</span><span class="o">&amp;</span><span class="n">function</span><span class="py">.insns</span><span class="p">[</span><span class="n">insn_id</span><span class="na">.0</span><span class="p">])</span>
    <span class="p">})</span>
    <span class="nf">.collect</span><span class="p">();</span>
</code></pre></div></div>

<p>Here we go through all the instructions in a given block. We use a union-find
data structure to map instructions to their canonical representatives (since
some optimizations may have merged or aliased instructions). We then filter with
<code class="language-plaintext highlighter-rouge">extract_jump_target</code>, which returns an <code class="language-plaintext highlighter-rouge">Option</code> containing a <code class="language-plaintext highlighter-rouge">BlockId</code> for
jump-like instructions.</p>

<p>After finding successors, we can set the predecessors by iterating through the
nodes in the successor set and adding the current node to their predecessor
sets.</p>
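<p>As a rough sketch (assuming a <code class="language-plaintext highlighter-rouge">successors</code> hash that maps each node to the successor set computed above; the names here are illustrative rather than the actual Iongraph code), that inversion looks something like this:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>require "set"

# Invert the successor relation: every edge from node to succ
# makes node a predecessor of succ.
predecessors = Hash.new { |h, k| h[k] = Set.new }

successors.each do |node, succs|
  succs.each { |succ| predecessors[succ] &lt;&lt; node }
end
</code></pre></div></div>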

<p>The last important piece we need is the loop depth of each block.</p>

<p>To compute it, we first need to know how to identify a natural loop at all.</p>

<p>We identify natural loops by detecting back edges. A back edge occurs when a
block has a predecessor that is dominated by that block (all paths to the
predecessor pass through this block). When we find such an edge, the target
block is a loop header and the predecessor is the source of a back edge. The
natural loop consists of all blocks on paths from the back edge source to the
loop header (excluding the header itself). Each block within this natural loop
then has its loop depth incremented.</p>
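<p>As a minimal sketch (reusing the <code class="language-plaintext highlighter-rouge">dom</code> sets from above and the illustrative <code class="language-plaintext highlighter-rouge">predecessors</code> hash from the previous snippet; <code class="language-plaintext highlighter-rouge">loop_depth</code> is a made-up name, not the exact field in the real implementation), the back edge detection and depth bookkeeping look roughly like this:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>loop_depth = Hash.new(0)

nodes.each do |header|
  predecessors[header].each do |source|
    # Back edge: the source of the edge is dominated by its target.
    next unless dom[source].include?(header)

    # Collect the natural loop: walk backwards from the back edge source,
    # stopping at the header (which is excluded, as described above).
    body = Set.new
    worklist = [source]
    until worklist.empty?
      block = worklist.pop
      next if block == header || body.include?(block)
      body &lt;&lt; block
      worklist.concat(predecessors[block].to_a)
    end

    body.each { |block| loop_depth[block] += 1 }
  end
end
</code></pre></div></div>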

<p>These additional computations are used by the Iongraph layout engine to
determine where a given block should sit vertically and where lines should be
routed within the graph. Loop headers and back edge sources are also marked!</p>

<h2 id="the-final-result">The final result</h2>

<p>You can click around this demo graph, showing a simple example from ZJIT, to get
a sense of how Iongraph works! Operands are clickable to jump to their definition.
You can click on the optimization phases on the left side - note that only the
passes that are not grayed out made changes. The graph is also zoomable and
scrollable!</p>

<iframe title="Iongraph Viewer" aria-label="Interactive compiler graph visualization" src="/assets/iongraph/viewer.html" width="100%" height="400"></iframe>

<p>Hopefully this post was educational! I learned a lot implementing this feature
and enjoyed doing so.</p>

<p>If you would like to do some work on ZJIT (and learn a lot in the process), you
are welcome to make pull requests to
<a href="https://github.com/ruby/ruby/">github.com/ruby/ruby/</a> with the commit prefix
<code class="language-plaintext highlighter-rouge">ZJIT:</code>. You can find issues
<a href="https://github.com/Shopify/ruby/issues?q=is%3Aissue%20state%3Aopen%20zjit">here</a>.</p>

<p>Also, feel free to join our <a href="https://zjit.zulipchat.com">Zulip</a>!</p>
</body></html>]]></content><author><name>Aiden Fox Ivey</name></author><category term="posts" /><category term="2025-11-19-adding-iongraph-support" /><summary type="html"><![CDATA[ZJIT adds support for Iongraph, which offers a web-based, pass-by-pass viewer with a stable layout, better navigation, and quality-of-life features like labeled backedges and clickable operands.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://railsatscale.com/2025-11-19-adding-iongraph-support/23fbc0c0a7f8aaf5817e2d13a010ee10f2414ed2.png" /><media:content medium="image" url="https://railsatscale.com/2025-11-19-adding-iongraph-support/23fbc0c0a7f8aaf5817e2d13a010ee10f2414ed2.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Reworking Memory Management in CRuby</title><link href="https://railsatscale.com/2025-09-16-reworking-memory-management-in-cruby/" rel="alternate" type="text/html" title="Reworking Memory Management in CRuby" /><published>2025-09-16T00:00:00+00:00</published><updated>2025-09-16T00:00:00+00:00</updated><id>https://railsatscale.com/2025-09-16-reworking-memory-management-in-cruby/</id><content type="html" xml:base="https://railsatscale.com/2025-09-16-reworking-memory-management-in-cruby/"><![CDATA[<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<blockquote>
  <p>This blog post was adapted from our <a href="https://dl.acm.org/doi/10.1145/3735950.3735960">paper</a> and <a href="https://www.youtube.com/live/es03mF_1vQM?t=10181s">talk</a> at the International Symposium on Memory Management 2025.</p>
</blockquote>

<details style="width: 100%">
  <summary>Click here to read the paper</summary>

  <object data="paper.pdf" type="application/pdf" style="width: 100%; height: 75vh;">
    <p>
      This browser does not support displaying PDFs. Please download the PDF to view it:
      <a href="paper.pdf">Download PDF</a>.
    </p>
  </object>
</details>

<p><br></p>

<p>We would first like to acknowledge the <a href="/2022-12-07-farewell-to-a-friend">late Chris Seaton</a>, who initiated our collaboration with the Australian National University on this project. We are thankful for his contribution, vision, and leadership. Without him, none of this would have been possible.</p>

<h2 id="background">Background</h2>

<p>The Australian National University (ANU) and Shopify are collaborating on integrating the Memory Management Toolkit (MMTk) with Ruby. We are supporting the project and working alongside ANU researchers to explore how to build a next-generation garbage collector for Ruby.</p>

<p>If you’re not familiar with MMTk, it offers a highly modular, VM-neutral framework for rapidly building high-performance garbage collectors. Once a language plugs into MMTk, it can leverage a wide range of built-in garbage collection algorithms, from canonical collectors such as NoGC, Mark and Sweep, and Immix to more performant collectors such as Generational Immix and Sticky Immix. Many of these algorithms are considerably more sophisticated than the Mark and Sweep algorithm used in Ruby and have the potential to deliver significant performance gains.</p>

<p>There are currently two implementations of MMTk in Ruby: one is maintained by the MMTk team and is a fork of Ruby (in the <a href="https://github.com/mmtk/ruby">mmtk/ruby</a> and <a href="https://github.com/mmtk/mmtk-ruby">mmtk/mmtk-ruby</a> repositories), while the other lives inside Ruby using the modular GC framework (in the <a href="https://github.com/ruby/mmtk">ruby/mmtk</a> repository). You might be wondering: why are there two implementations? The MMTk team’s implementation is much more advanced, with around 5 years of development. They continue to use it to experiment and develop new techniques to further leverage MMTk’s powers and improve performance. The implementation upstreamed to Ruby uses the modular GC framework and is designed to be part of an ecosystem of garbage collectors for Ruby. It is a reimplementation that borrows techniques and knowledge from the MMTk team’s implementation, but it still lags quite far behind.</p>

<p>In this blog post, we will follow the paper and mostly focus on the MMTk team’s implementation. However, if you want to learn more about the modular GC framework, you can <a href="https://www.youtube.com/watch?v=04axm4JcaT4">watch this talk at RubyKaigi 2025</a> or <a href="/2025-01-08-new-for-ruby-3-4-modular-garbage-collectors-and-mmtk/">read this blog post</a>.</p>

<h2 id="challenges">Challenges</h2>

<p>In the paper, we discuss some of the challenges we faced and solutions we used while integrating MMTk with Ruby. In this blog post, we highlight some of these challenges, but please read the paper if you want the entire picture.</p>

<h3 id="copying-garbage-collector">Copying Garbage Collector</h3>

<p>When Ruby 2.7 introduced a moving garbage collector, it marked the first time that an object’s memory location could change. To facilitate this, each of Ruby’s data types needed additional code to update object addresses after the objects have been moved. To ensure backwards compatibility, each data type needed to opt in to a new API that supports object movement, and all the existing types would pin the objects they refer to. A pinned object cannot move.</p>

<p>This pinning system works for Ruby’s default (built-in) garbage collector, because it has a marking phase to determine objects that are live and objects that are pinned followed by a compaction phase to move non-pinned objects. However, many of MMTk’s algorithms combine the marking and moving phases, meaning that an object is moved the moment it is marked. For algorithms like Immix, objects can be pinned, but they must be specified ahead of time. One solution would be to scan the heap twice: first to determine which objects get pinned, and again to mark all live objects and move the unpinned objects. However, this is inefficient because it essentially involves scanning the whole Ruby heap twice.</p>

<p>Fortunately, it’s been more than 5 years since a moving garbage collector was introduced to Ruby, so almost all the types in Ruby and many native gems support it. We introduced a new concept called Potentially Pinning Parents, or PPP for short. An object is a PPP if it could potentially contain references to objects that cannot be moved. Earlier this year, we made an effort to reduce the number of PPP objects. In fact, as of the time of writing, there are no user-facing Ruby objects that are PPPs except for ones defined in native gems (which we do not have any control over). There are still a few internal Ruby objects that are PPPs, but we are working on eliminating those as well.</p>

<p>Since we now know whether an object is a PPP at allocation time, MMTk keeps a list of PPP objects that are alive. Using that list, during a garbage collection cycle, it inspects every PPP object to determine the child objects that should be pinned before moving on to the phase that marks and moves objects. Since the set of PPP objects is now small, this pinning phase can be completed very quickly.</p>
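<p>Conceptually (in Ruby-flavored pseudocode; <code class="language-plaintext highlighter-rouge">ppp_objects</code>, <code class="language-plaintext highlighter-rouge">live?</code>, <code class="language-plaintext highlighter-rouge">references</code>, and <code class="language-plaintext highlighter-rouge">pin</code> are illustrative names, not MMTk’s actual API), the pre-pass looks like this:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Before the combined mark-and-move phase, walk only the (small) set of
# live PPP objects and pin every object they reference.
ppp_objects.each do |parent|
  next unless live?(parent)

  references(parent).each do |child|
    pin(child)  # a pinned object will not be moved during this GC cycle
  end
end
</code></pre></div></div>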

<h3 id="finalization">Finalization</h3>

<p>Before Ruby 3.2, all Ruby objects were allocated out of the garbage collector in fixed 40-byte slots. This meant that any additional data for the object needed to be allocated externally, usually through the system using <code class="language-plaintext highlighter-rouge">malloc</code>. In Ruby 3.2, we introduced <a href="https://shopify.engineering/ruby-variable-width-allocation">Variable Width Allocation</a> which allows us to allocate dynamic slot sizes through the garbage collector. However, because of legacy reasons and technical limitations of Variable Width Allocation, there are still many cases where we need to allocate memory out of the system through <code class="language-plaintext highlighter-rouge">malloc</code>.</p>

<p>One of the superpowers of MMTk is that it supports parallelism in the garbage collector. Unlike Ruby’s default garbage collector, MMTk can split the work that needs to be done during a GC cycle (marking, sweeping, moving, etc.) into small chunks (MMTk calls these “work packets”) and process these work packets in parallel across multiple CPU cores.</p>

<p>It’s important to note, however, that while MMTk can perform its GC work in parallel, it does not run concurrently with the VM. In that sense, MMTk is a parallelized GC implementation, but it is not a concurrent one: Ruby code cannot run while the garbage collector is running, so the Ruby VM must still be stopped.</p>

<p>There were many challenges that we had to overcome to move from a serial garbage collector to a parallel one, including removing dependence on thread-local variables and eliminating race conditions. Those issues at least showed up as crashes and unexpected behavior; we also ran into a trickier problem: our garbage collection cycles got slower the more threads we used!</p>

<p>This was counterintuitive: if each CPU core does less work, shouldn’t it run faster? We looked at performance profiles more closely and saw that it was the finalization phase that was slower. The finalization phase iterates over all dead objects to run code that reclaims memory or closes file descriptors. Specifically, we found that the culprit was <code class="language-plaintext highlighter-rouge">free</code>, the function that frees memory allocated through <code class="language-plaintext highlighter-rouge">malloc</code>. In the following table, we freed 100 million 32-byte pieces of memory using <code class="language-plaintext highlighter-rouge">free</code> and measured the time taken (in milliseconds) with the work split across a varying number of threads, using various implementations of <code class="language-plaintext highlighter-rouge">malloc</code>. glibc, jemalloc, and tcmalloc all scale negatively with the number of threads. The only allocator that offers any scalability is mimalloc, but we see little to no gain past a factor of 4. This is likely due to mimalloc’s design for a fast <code class="language-plaintext highlighter-rouge">free</code> that maximizes concurrency.</p>

<table>
  <thead>
    <tr>
      <th>Threads</th>
      <th>glibc (ms)</th>
      <th>jemalloc (ms)</th>
      <th>tcmalloc (ms)</th>
      <th>mimalloc (ms)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1</td>
      <td>1,263</td>
      <td>3,935</td>
      <td>4,988</td>
      <td>903</td>
    </tr>
    <tr>
      <td>2</td>
      <td>5,002</td>
      <td>11,719</td>
      <td>13,539</td>
      <td>493</td>
    </tr>
    <tr>
      <td>3</td>
      <td>5,787</td>
      <td>17,606</td>
      <td>11,374</td>
      <td>346</td>
    </tr>
    <tr>
      <td>4</td>
      <td>6,790</td>
      <td>22,478</td>
      <td>17,295</td>
      <td>265</td>
    </tr>
    <tr>
      <td>5</td>
      <td>8,058</td>
      <td> </td>
      <td>17,785</td>
      <td>291</td>
    </tr>
    <tr>
      <td>6</td>
      <td>7,473</td>
      <td> </td>
      <td>19,227</td>
      <td>243</td>
    </tr>
    <tr>
      <td>10</td>
      <td>9,400</td>
      <td> </td>
      <td>23,350</td>
      <td>230</td>
    </tr>
    <tr>
      <td>100</td>
      <td>11,260</td>
      <td> </td>
      <td>24,195</td>
      <td>228</td>
    </tr>
  </tbody>
</table>

<p>Another difference between MMTk and the default GC is that if an object does not require finalization (i.e. it does not have any resources that need to be reclaimed), then we don’t need to visit it at all, further improving performance. MMTk can use a bump pointer allocator, which increments a pointer for every allocation until it reaches the end of the allocation space. Meanwhile, the default GC in Ruby uses a freelist allocator, which uses a linked list of free slots to allocate objects into. Since building the freelist requires visiting all dead objects anyway, the default GC won’t be able to take advantage of this improvement.</p>
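<p>As a toy illustration (this is not how MMTk or CRuby actually implement allocation), the difference between the two strategies looks like this:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Bump pointer allocation: a cursor simply advances through the space, so
# dead objects never need to be visited to make room for new ones.
class BumpAllocator
  def initialize(capacity)
    @cursor = 0
    @capacity = capacity
  end

  def allocate(size)
    return nil if @cursor + size &gt; @capacity  # space exhausted

    address = @cursor
    @cursor += size
    address
  end
end

# Freelist allocation: new objects reuse previously freed slots, so the
# sweep phase must visit every dead object just to rebuild this list.
class FreelistAllocator
  def initialize
    @free_slots = []
  end

  def add_free_slot(address)  # called while sweeping a dead object
    @free_slots.push(address)
  end

  def allocate
    @free_slots.pop  # nil means we need to grow the heap or run GC
  end
end
</code></pre></div></div>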

<p>The solution to this challenge was to avoid using <code class="language-plaintext highlighter-rouge">malloc</code>. Instead, MMTk allocates the buffer for common types (<code class="language-plaintext highlighter-rouge">Array</code>, <code class="language-plaintext highlighter-rouge">String</code>, and <code class="language-plaintext highlighter-rouge">MatchData</code> objects) using hidden Ruby objects. Since these buffer objects are now Ruby objects, they are also allocated through MMTk. As a result, these buffers now have automatic memory management, rather than manual memory management like <code class="language-plaintext highlighter-rouge">malloc</code>. This means that <code class="language-plaintext highlighter-rouge">Array</code>, <code class="language-plaintext highlighter-rouge">String</code>, and <code class="language-plaintext highlighter-rouge">MatchData</code> need to mark their buffer objects to keep those buffers alive in the marking phase, but, in return, they no longer need to do anything during the finalization phase.</p>

<h2 id="future-work--conclusion">Future Work &amp; Conclusion</h2>

<p>In this blog post, we looked at a few of the challenges we encountered in integrating MMTk with Ruby and the solutions we used. We hope that sharing our experiences can provide insights for Ruby developers, garbage collector researchers, and language designers.</p>

<p>Work continues in MMTk’s fork of Ruby to experiment with more optimized memory layouts, new techniques for object movement, and integrations between JIT compilers and the garbage collector. We are also using the lessons we learned with MMTk to make improvements to upstream Ruby.</p>
</body></html>]]></content><author><name>Peter Zhu</name></author><category term="posts" /><category term="2025-09-16-reworking-memory-management-in-cruby" /><summary type="html"><![CDATA[Shopify sponsors and collaborates with academia to take Ruby to new heights. In this post, we give an overview of what we've built in collaboration with the Australian National University.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://railsatscale.com/2025-09-16-reworking-memory-management-in-cruby/3562da255a44c58e971e22633979facbb07865d2.png" /><media:content medium="image" url="https://railsatscale.com/2025-09-16-reworking-memory-management-in-cruby/3562da255a44c58e971e22633979facbb07865d2.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">How Ruby Executes JIT Code: The Hidden Mechanics Behind the Magic</title><link href="https://railsatscale.com/2025-09-08-how-ruby-executes-jit-code-the-hidden-mechanics-behind-the-magic/" rel="alternate" type="text/html" title="How Ruby Executes JIT Code: The Hidden Mechanics Behind the Magic" /><published>2025-09-08T00:00:00+00:00</published><updated>2025-09-08T00:00:00+00:00</updated><id>https://railsatscale.com/2025-09-08-how-ruby-executes-jit-code-the-hidden-mechanics-behind-the-magic/</id><content type="html" xml:base="https://railsatscale.com/2025-09-08-how-ruby-executes-jit-code-the-hidden-mechanics-behind-the-magic/"><![CDATA[<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<p>Ever since YJIT’s introduction, I’ve felt simultaneously close to and distant from Ruby’s JIT compiler. I know how to enable it in my Ruby programs. I know it makes my Ruby programs run faster by compiling parts of them into machine code. But my understanding of YJIT, or of Ruby JIT compilers in general, ended about there.</p>

<p>A few months ago, my colleague <a href="https://bernsteinbear.com/">Max Bernstein</a> wrote <a href="https://railsatscale.com/2025-05-14-merge-zjit/">ZJIT has been merged into Ruby</a> to explain how ZJIT compiles Ruby’s bytecode to HIR, LIR, and then to native code.
It sheds some light on how JIT compilers compile our programs, which is why I started to <a href="https://github.com/ruby/ruby/pulls?q=is%3Apr+author%3Ast0012+ZJIT+">contribute to ZJIT in July</a>.
But many questions remained unanswered until I dug into the source code and asked the JIT experts around me (<a href="https://bernsteinbear.com/">Max</a>, <a href="https://github.com/k0kubun">Kokubun</a>, and <a href="https://alanwu.space/">Alan</a>).</p>

<p>So I want to use this post to answer some questions and fill some mental gaps you might also have about JIT compilers for Ruby:</p>

<ol>
  <li><strong>Where does JIT-compiled code actually live?</strong></li>
  <li><strong>How does Ruby actually execute JIT code?</strong></li>
  <li><strong>How does Ruby decide what to compile?</strong></li>
  <li><strong>Why does JIT-compiled code fall back to the interpreter?</strong></li>
</ol>

<p>While we use ZJIT (Ruby’s experimental next-generation JIT) as our reference, these concepts apply equally to YJIT.</p>

<h2 id="where-jit-compiled-code-actually-lives">Where JIT-Compiled Code Actually Lives</h2>

<h3 id="ruby-iseqs-and-yarv-bytecode">Ruby ISEQs and YARV Bytecode</h3>

<p>When Ruby loads your code, it compiles each method into an Instruction Sequence (ISEQ) - a data structure containing <a href="https://en.wikipedia.org/wiki/YARV">YARV</a> (CRuby virtual machine) bytecode instructions.</p>

<p>(If you’re not familiar with YARV instructions or want to learn more, <a href="https://kddnewton.com/">Kevin Newton</a> wrote a <a href="https://kddnewton.com/2022/11/30/advent-of-yarv-part-0.html">great blog series</a> to introduce them)</p>

<p>Let’s start with a simple example:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">foo</span>
  <span class="n">bar</span>
<span class="k">end</span>

<span class="k">def</span> <span class="nf">bar</span>
  <span class="mi">42</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Running <code class="language-plaintext highlighter-rouge">ruby --dump=insn example.rb</code> shows us the bytecode:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>== disasm: #&lt;ISeq:foo@example.rb:1 (1,0)-(3,3)&gt;
0000 putself                                                          (   2)[LiCa]
0001 opt_send_without_block                 &lt;calldata!mid:bar, argc:0, FCALL|VCALL|ARGS_SIMPLE&gt;
0003 leave                                  [Re]

== disasm: #&lt;ISeq:bar@example.rb:5 (5,0)-(7,3)&gt;
0000 putobject                              42                        (   6)[LiCa]
0002 leave                                  [Re]
</code></pre></div></div>

<h3 id="jit-compiled-code-lives-on-iseq-too">JIT-Compiled Code Lives on ISEQ Too</h3>

<p>I assumed JIT-compiled code would replace bytecode—after all, native code is faster. But Ruby keeps both, for good reason.</p>

<p>Here’s what an ISEQ looks like initially:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ISEQ (foo method)
├── body
│   ├── bytecode: [putself, opt_send_without_block, leave]
│   ├── jit_entry: NULL  // No JIT code yet
│   ├── jit_entry_calls: 0  // Call counter
</code></pre></div></div>

<p>After the method is called repeatedly and gets JIT-compiled:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ISEQ (foo method)
├── body
│   ├── bytecode: [putself, opt_send_without_block, leave]  // Still here!
│   ├── jit_entry: 0x7f8b2c001000  // Pointer to native machine code
│   ├── jit_entry_calls: 35  // Reached compilation threshold
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">jit_entry</code> field is the gateway to native code. When it’s NULL, Ruby interprets bytecode. When it points to compiled code, Ruby can jump directly to machine instructions.
But the bytecode never goes away - Ruby needs it for de-optimization, which we will explore a bit later.</p>

<h2 id="the-execution-switch-from-bytecode-to-native-code">The Execution Switch: From Bytecode to Native Code</h2>

<p>This is easier than I expected. Since each ISEQ points to its JIT compiled code when it’s available, Ruby simply
checks the <code class="language-plaintext highlighter-rouge">jit_entry</code> field on every ISEQ it’s going to execute:</p>

<figure><img src="./jit-compiled-execution.svg" alt="JIT-compiled code execution"><figcaption>JIT-compiled code execution</figcaption></figure>

<p>When there’s no JIT code (<code class="language-plaintext highlighter-rouge">jit_entry</code> is NULL), it continues interpreting. Otherwise, it runs the compiled native code.</p>
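<p>In Ruby-flavored pseudocode (the real check happens in C inside the VM; <code class="language-plaintext highlighter-rouge">jump_to_native</code> and <code class="language-plaintext highlighter-rouge">interpret_bytecode</code> are made-up names for illustration), the decision is just:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def execute(iseq)
  if iseq.jit_entry                  # non-NULL once the method is compiled
    jump_to_native(iseq.jit_entry)   # run the compiled machine code
  else
    interpret_bytecode(iseq)         # fall back to the YARV interpreter
  end
end
</code></pre></div></div>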

<h2 id="how-ruby-decides-what-to-compile">How Ruby Decides What to Compile</h2>

<p>Ruby doesn’t compile methods randomly or all at once. Instead, methods earn compilation through repeated use. In ZJIT, this happens in two phases:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="p">(</span><span class="n">body</span><span class="o">-&gt;</span><span class="n">jit_entry</span> <span class="o">==</span> <span class="nb">NULL</span> <span class="o">&amp;&amp;</span> <span class="n">rb_zjit_enabled_p</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">body</span><span class="o">-&gt;</span><span class="n">jit_entry_calls</span><span class="o">++</span><span class="p">;</span>

    <span class="c1">// Phase 1: Profile the method</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">body</span><span class="o">-&gt;</span><span class="n">jit_entry_calls</span> <span class="o">==</span> <span class="n">rb_zjit_profile_threshold</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">rb_zjit_profile_enable</span><span class="p">(</span><span class="n">iseq</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Phase 2: Compile to native code</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">body</span><span class="o">-&gt;</span><span class="n">jit_entry_calls</span> <span class="o">==</span> <span class="n">rb_zjit_call_threshold</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">rb_zjit_compile_iseq</span><span class="p">(</span><span class="n">iseq</span><span class="p">,</span> <span class="nb">false</span><span class="p">);</span>
        <span class="c1">// After this, jit_entry points to machine code</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>As of now, ZJIT’s default profile threshold is <code class="language-plaintext highlighter-rouge">25</code> and compile threshold is <code class="language-plaintext highlighter-rouge">30</code> (both may change in the future). So a method’s lifecycle may look like this:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Calls:     0 ─────────── 25 ────────── 30 ─────────────────►
           │              │             │
Mode:      └─ Interpret ──┴── Profile ──┴─ Native Code (JIT compiled)
</code></pre></div></div>

<p>This is why we need to “warm up” the program before we get peak performance with a JIT.</p>

<h3 id="when-jit-code-gives-up-understanding-de-optimization">When JIT Code Gives Up: Understanding De-optimization</h3>

<p>JIT code makes assumptions to run fast. When those assumptions break, Ruby must “de-optimize” - return control to the interpreter. It’s a safety mechanism that ensures your code always produces correct results.</p>

<p>Consider this method:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">add</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">)</span>
  <span class="n">a</span> <span class="o">+</span> <span class="n">b</span>
<span class="k">end</span>
</code></pre></div></div>

<p>which would generate these instructions:</p>

<pre><code class="language-txt">== disasm: #&lt;ISeq:add@test.rb:1 (1,0)-(3,3)&gt;
0000 getlocal_WC_0                          a@0                       (   2)[LiCa]
0002 getlocal_WC_0                          b@1
0004 opt_plus                               &lt;calldata!mid:+, argc:1, ARGS_SIMPLE&gt;[CcCr]
0006 leave                                                            (   3)[Re]
</code></pre>

<p>Because Ruby doesn’t know ahead of time what operands <code class="language-plaintext highlighter-rouge">opt_plus</code> will be called with, the underlying C function <code class="language-plaintext highlighter-rouge">vm_opt_plus</code> needs to handle all the classes (like String, Array, Float, Integer, etc.) that can respond to <code class="language-plaintext highlighter-rouge">+</code>.</p>

<p>But if profiling shows <code class="language-plaintext highlighter-rouge">add</code> is always called with integers (Fixnums), a JIT compiler can generate optimized code that <em>only</em> handles integer addition. That code includes “guards” to check this assumption:</p>

<figure><img src="./jit-deopt.svg" alt="JIT type guard"><figcaption>JIT type guard</figcaption></figure>

<p>When the assumption is broken, like when <code class="language-plaintext highlighter-rouge">add(1.5, 2)</code> is called:</p>

<ol>
  <li>The guard check fails</li>
  <li>JIT code jumps to a “side exit”</li>
  <li>The side exit restores interpreter state (stack, instruction pointer, etc.)</li>
  <li>Control returns to the interpreter</li>
  <li>The interpreter executes <code class="language-plaintext highlighter-rouge">opt_plus</code> and calls the <code class="language-plaintext highlighter-rouge">vm_opt_plus</code> function</li>
</ol>
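<p>Put together, the specialized code behaves roughly like this Ruby-flavored pseudocode (not actual ZJIT output; <code class="language-plaintext highlighter-rouge">side_exit</code> and <code class="language-plaintext highlighter-rouge">fixnum_add</code> are illustrative names):</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def add_compiled(a, b)
  # Guards: the fast path is only valid when both operands are Fixnums.
  return side_exit(:opt_plus) unless a.is_a?(Integer) &amp;&amp; b.is_a?(Integer)

  fixnum_add(a, b)  # specialized integer addition, no dynamic dispatch
end
</code></pre></div></div>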

<p>Other triggers for falling back include:</p>

<ul>
  <li>
<strong>TracePoint activation</strong> - TracePoint needs bytecode execution for properly emitting events (more details below)</li>
  <li>
<strong>Redefined core methods</strong> - Someone changed what <code class="language-plaintext highlighter-rouge">+</code> means on Integer</li>
  <li>
<strong>Ractor usage</strong> - Running multiple Ractors changes the behaviour of some YARV instructions, so the compiled code could behave differently from the interpreter in that situation</li>
</ul>

<p>These assumption checks, or patch points as we call them in ZJIT, make sure your program performs correctly when any of the assumptions change.</p>

<h2 id="answering-some-additional-questions">Answering Some Additional Questions</h2>

<p><strong>Why does enabling TracePoint slow everything down?</strong></p>

<p>(<a href="https://docs.ruby-lang.org/en/master/TracePoint.html">TracePoint</a> is a Ruby class that can be used to register callbacks on specific Ruby execution events. It’s commonly used in debugging/development tools.)</p>

<p>Most of TracePoint’s events are triggered by corresponding YARV bytecode. When TracePoint is activated, instructions in ISEQs will be replaced with their <code class="language-plaintext highlighter-rouge">trace_*</code> counterparts. For example, <code class="language-plaintext highlighter-rouge">opt_plus</code> is replaced with <code class="language-plaintext highlighter-rouge">trace_opt_plus</code>.</p>

<p>If Ruby only executes the compiled machine code, then those events wouldn’t be triggered correctly. Therefore, when ZJIT and YJIT compilers detect TracePoint’s activation, they immediately throw away the optimized code to force Ruby to interpret YARV instructions instead.</p>
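<p>For example, enabling even a simple TracePoint like the one below is enough to make them discard their compiled code:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Print every method call while the trace is enabled.
trace = TracePoint.new(:call) do |tp|
  puts "#{tp.defined_class}##{tp.method_id}"
end

trace.enable
add(1, 2)      # runs through the interpreter so the :call event can fire
trace.disable
</code></pre></div></div>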

<p><strong>Why doesn’t Ruby just compile everything?</strong></p>

<p>Many methods are called rarely. Compiling them would waste memory and compilation time for no performance benefit. Also, compiling methods without profiling would mean that JIT compilers either make wrong assumptions that get invalidated quickly, or make assumptions that aren’t specific enough and miss further optimization opportunities.</p>

<h2 id="final-notes">Final Notes</h2>

<p>I hope this post helped you understand JIT compilers, a now essential part of Ruby, a little bit more.</p>

<p>If you want to learn more about Ruby’s new JIT compiler: ZJIT, I highly recommend giving <a href="https://railsatscale.com/2025-05-14-merge-zjit/">ZJIT has been merged into Ruby</a> a read.
And if you want to learn more about Ruby’s YARV instructions, <a href="https://kddnewton.com/">Kevin Newton</a>’s <a href="https://kddnewton.com/2022/11/30/advent-of-yarv-part-0.html">Advent of YARV series</a> is the best resource.</p>

</body></html>]]></content><author><name>Stan Lo</name></author><category term="posts" /><category term="2025-09-08-how-ruby-executes-jit-code-the-hidden-mechanics-behind-the-magic" /><summary type="html"><![CDATA[Where does JIT-compiled code live? How does Ruby switch between bytecode and native execution? Why does TracePoint slow everything down? This post answers the JIT questions most Ruby developers have but rarely see explained.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://railsatscale.com/2025-09-08-how-ruby-executes-jit-code-the-hidden-mechanics-behind-the-magic/9538a6aafba35405a437f22d8b09eafcac9618af.png" /><media:content medium="image" url="https://railsatscale.com/2025-09-08-how-ruby-executes-jit-code-the-hidden-mechanics-behind-the-magic/9538a6aafba35405a437f22d8b09eafcac9618af.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>