YJIT 3.4: Even Faster and More Memory-Efficient
It’s 2025, and this year again, the YJIT team brings you a new version of YJIT that is even faster, more stable, and more memory-efficient.
A new baseline
Last year’s YJIT release delivered an impressive performance boost which earned us multiple shoutouts on social media. I was pleasantly surprised to hear that many large businesses running Rails in production had upgraded to the latest version of Ruby, in part because they were excited to get better performance. I distinctly remember, when I started at Shopify, before YJIT was a thing, that most Ruby deployments were several versions behind. Seeing many people deploying the latest Ruby with YJIT enabled left me with a warm fuzzy feeling that we had made a difference in terms of Ruby adoption. Performance really is the carrot that gets people excited.
Historically, we’ve compared the performance of YJIT to that of the CRuby interpreter, which makes for some impressive numbers. For instance, as of this writing, on our x86-64 benchmarking setup, YJIT 3.4 is ~92% faster than the interpreter across the headline benchmarks we track. However, this year marks the 4th YJIT release, and as such, we’re going with the assumption that many production deployments already have YJIT enabled, and will be upgrading from Ruby 3.3 + YJIT to Ruby 3.4 + YJIT, rather than upgrading from a deployment with YJIT disabled.
Going with this assumption, we’ve decided to change the way we track performance numbers on speed.yjit.org, and start tracking how fast the latest version of YJIT is compared to the version included in the last Ruby release, rather than only looking at how fast YJIT is compared to the interpreter. This turned out to be pretty important, because there have been cases where we’ve found (and fixed) performance regressions in the CRuby interpreter. This is a whole other discussion, but if we only compare YJIT’s performance against that of the interpreter, and the interpreter slows down without anyone noticing, it will make YJIT look better, but it definitely wouldn’t be a net win for the Ruby community.
Many small improvements
Compared to the previous release, YJIT 3.4 is 5-7% faster than YJIT 3.3.6 on our benchmarks. These are not earth-shattering numbers, but for many use cases your code should run noticeably faster. This better performance also comes with a significant reduction in memory usage, many additional bug fixes, and some minor quality of life improvements. We’ll go over some of these changes in the following sections.
Quality of life
There is now a --yjit-mem-size=N command-line option to set the amount of memory overhead YJIT is allowed to use. This behaves much more intuitively than the old --yjit-exec-mem-size=N option, which did not account for all of the metadata that YJIT needs to allocate.
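As a quick illustration, you might launch a deployment with a memory cap and then inspect YJIT’s own counters at run-time. The exact statistics exposed by RubyVM::YJIT.runtime_stats vary between Ruby versions, so the sketch below only lists what is available rather than assuming specific keys:

```ruby
# Launching with a cap on YJIT's total memory use (machine code + metadata):
#   ruby --yjit --yjit-mem-size=128 app.rb
#
# At run-time, RubyVM::YJIT.runtime_stats reports size/usage counters.
# The exact keys differ between Ruby versions, so we just list them here.
if defined?(RubyVM::YJIT) && RubyVM::YJIT.enabled?
  p RubyVM::YJIT.runtime_stats.keys
end
```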
Another small quality of life improvement is the addition of a YJIT compilation log, which can be enabled via --yjit-log. This allows you to see which pieces of Ruby code are compiled at different times as your program is running. We’ve also made it so that the tail of the log can be accessed at run-time as a Ruby object. This was done to make it possible to monitor it on our fleet of servers in production.
The compilation log was mainly created for our own use; it’s probably not something most users of YJIT will want to look at. However, it can be useful for detecting whether your production workloads compile new Ruby code dynamically at run-time, which is often undesirable.
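As a rough sketch of what that kind of monitoring could look like, assuming the log tail is exposed through a RubyVM::YJIT.log accessor (check the YJIT README for the exact API and entry format in your Ruby version):

```ruby
# Run the program with the log enabled:
#   ruby --yjit --yjit-log app.rb
#
# Assumption: the tail of the compilation log is readable via
# RubyVM::YJIT.log; see the YJIT README for the exact accessor and format.
if defined?(RubyVM::YJIT) && RubyVM::YJIT.respond_to?(:log)
  Array(RubyVM::YJIT.log).last(5).each do |entry|
    # Each entry describes a piece of Ruby code that was recently compiled.
    p entry
  end
end
```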
We’ve also improved support for profiling using the perf tool. You can now get a more detailed estimate of the number of cycles for code generated by YJIT.
For more information on how to use YJIT and its command-line options, you can take a look at the YJIT README.
Improved inlining
A significant contributor to YJIT 3.4’s better performance comes from an improved ability to inline small C and Ruby methods.
YJIT is able to generate specialized machine code for many core C methods that are frequently called. For example, it can inline String#empty? and Array#length. YJIT 3.4 adds more specializations of this type. In practice, this can make a fairly significant difference. On the lobsters benchmark, for example, we are able to inline over 56.3% of C method calls. On the liquid-render benchmark, this number is as high as 82.5%.
With this new version of YJIT, we’ve also added the ability to inline many simple/trivial Ruby methods. For example, we can inline empty methods, methods returning a constant, methods returning a string, methods returning self, and methods that directly return one of their arguments. This allows us to inline 4.8% of Ruby calls on the lobsters benchmark and 7.6% on liquid-render. This is also particularly helpful for improving the performance of Sorbet type annotations.
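To make this concrete, here are a few hypothetical methods illustrating the shapes that YJIT 3.4 can now inline at their call sites (the class and method names are made up for this example):

```ruby
# Hypothetical examples of the trivial method shapes YJIT 3.4 can inline.
class Widget
  def noop; end              # empty method

  def kind                   # method returning a constant
    :widget
  end

  def label                  # method returning a string
    "widget"
  end

  def itself_widget          # method returning self
    self
  end

  def pick(first, _second)   # method directly returning one of its arguments
    first
  end
end
```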
Pure-Ruby core methods
Early on in YJIT’s development, we found that there was a large performance penalty when calling back and forth between C and Ruby code. This has been particularly challenging because core methods such as Array#each and Array#map are written in C and repeatedly call into Ruby blocks. We’ve found that YJIT could do a better job of optimizing these methods if they were written in Ruby, but if we rewrite that code in Ruby it tends to slow down the interpreter, making this a difficult tradeoff.
With YJIT 3.4, we now have the ability to switch the implementation of specific core methods to pure-Ruby versions, but only when YJIT is enabled. That way, we can get the best of both worlds. We’ve optimized Array#each, Array#map, Array#select, Array#filter, and Integer#downto this way. My colleague Takashi Kokubun even went the extra mile and made it so that these methods appear like C methods in cases where backtrace compatibility is expected.
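For intuition, a pure-Ruby definition of Array#each has roughly the following shape. This is only an illustrative sketch under a hypothetical name, not CRuby’s actual source; the point is that because the loop and the yield both live in Ruby code, YJIT can compile and optimize the whole thing together instead of bouncing between C and JIT-compiled code:

```ruby
# Illustrative sketch only (hypothetical name, not CRuby's actual source):
# the rough shape of a pure-Ruby Array#each.
class Array
  def each_sketch
    return to_enum(:each_sketch) { size } unless block_given?

    i = 0
    while i < size
      yield self[i]
      i += 1
    end
    self
  end
end

[1, 2, 3].each_sketch { |x| puts x }
```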
Binary serialization
Something we’ve started experimenting with in 2024 is writing more pure-Ruby gems. Since we work in the web space, we deal with many gems that are used to serialize and deserialize text and binary data. We wrote our own pure-Ruby implementation of protobuf. This was a useful exercise because it allowed us to identify specific performance pain points and potential areas of improvement.
Some small but significant changes we made include adding YJIT fast paths for core methods such as String#setbyte and String#getbyte. We’ve added a fast path for appending bytes to binary strings using String#<<. There is a new String#append_as_bytes method that avoids resetting the encoding of binary strings when appending bytes (Feature #20594). We’ve also improved YJIT support for bitwise operations. This makes it possible to write pure-Ruby gems that manipulate binary data much faster than was possible before. We’ve experimented with writing a pure-Ruby implementation of the protobuf spec and measured that it runs about 14% faster on x86-64 with Ruby 3.4 than with Ruby 3.3.
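As a small sketch of the kind of byte-level code these fast paths speed up, here is a protobuf-style varint encoder built on a binary string buffer (this assumes, per Feature #20594, that append_as_bytes accepts Integer byte values):

```ruby
# Sketch: encode an unsigned integer as a protobuf-style varint into a
# binary (ASCII-8BIT) buffer. append_as_bytes appends raw bytes without
# resetting the buffer's encoding (assuming Integer arguments are
# accepted as byte values, per Feature #20594).
def encode_varint(value, buffer = "".b)
  while value >= 0x80
    buffer.append_as_bytes((value & 0x7F) | 0x80)
    value >>= 7
  end
  buffer.append_as_bytes(value)
  buffer
end

buf = encode_varint(300)
p buf.bytes       # => [172, 2]
p buf.getbyte(0)  # => 172
p buf.encoding    # the buffer stays binary (ASCII-8BIT)
```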
Improved type propagation
We’ve also made some small tweaks to give YJIT access to better type information to drive its optimizations. We used to not propagate the Array, Hash, and String class types because individual objects can be given singleton classes at run-time. Now we have an invalidation mechanism that allows us to handle this without losing out in the general case.
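As a small illustration of why this is tricky, any object can acquire a singleton class at run-time, so a specialization that assumes a value is a plain Array instance has to be prepared for code like this:

```ruby
# Giving one specific array a singleton method means it is no longer a
# plain Array instance under the hood, even though .class still says Array.
arr = [1, 2, 3]
p arr.class            # => Array

def arr.shout
  "I have a singleton class now"
end

p arr.singleton_class  # => #<Class:#<Array:0x...>>
p arr.shout            # => "I have a singleton class now"
```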
Improved register allocator
The YJIT 3.4 register allocator is more sophisticated than what we shipped with 3.3. We can now allocate registers for local variables and pass arguments in registers.
Lazy frame pushing
Ruby code makes a lot of calls to C functions, both as part of C extensions and as part of the Ruby runtime library. One tricky aspect here is that many core methods could technically raise an exception. This used to be a challenge for us, because it meant that we always had to push a Ruby stack frame to be prepared for this eventuality. YJIT 3.4 can now speculatively skip pushing a frame and then lazily push one only when it’s needed. This is used for String#byteslice, String#setbyte, and Class#superclass, among others.
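Roughly speaking, the common, non-raising calls below can run without a Ruby frame being pushed, and a frame only needs to be materialized on the rare path that actually raises:

```ruby
s = +"hello"
p s.byteslice(0, 2)   # => "he"   (common case: no exception raised)
s.setbyte(0, 72)      # in-bounds write: fast path
p s                   # => "Hello"

begin
  s.setbyte(99, 0)    # out of range: raises, so the slow path kicks in
rescue IndexError => e
  p e.class           # => IndexError
end
```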
Reduced memory usage
The memory usage situation looks much better with YJIT 3.4 than it did with 3.3. According to our benchmarks on Linux/x86-64, turning on YJIT with Ruby 3.3 increases memory usage by 21% on average. With YJIT 3.4, we see a large decrease in memory usage compared to Ruby 3.3 with YJIT enabled. We can’t take all of the credit for this, because much of the improvement comes from changes on the CRuby side. In particular, a change was made to stop eagerly allocating stack space, and this made a huge difference in overall memory usage. Going from Ruby 3.3 to 3.4 (without YJIT) reduces memory usage by about 8% on average. Note that this interpreter-side reduction holds on x86-64 but not on arm64.
Regardless of which platform you use, you should expect YJIT 3.4 to use slightly less memory than YJIT 3.3 on average, even though it compiles more code. Part of the improvement comes from a compressed context representation. A large proportion of the memory used by YJIT is not compiled machine code, but rather metadata associated with the compiled code. We’ve found a way to compress this metadata so that it takes much less memory, and we’re also able to eliminate duplicate metadata in some cases.
Performance on benchmarks
The graph below shows a comparison of YJIT 3.4’s performance against YJIT 3.3.6 on an x86-64 machine (Xeon Platinum 8488C) on our largest benchmarks. As can be seen, YJIT 3.4 is faster on almost every benchmark. Sometimes not by much, sometimes by a significant margin.
What’s particularly great is that YJIT 3.4 uses less memory than YJIT 3.3 on most benchmarks, as shown in the following graph:
Performance in production
This year, Shopify processed a record $11.5 billion in global sales for its merchants over the BFCM (Black Friday Cyber Monday) weekend, and our app servers handled over 80 million requests per minute on Black Friday while running a prerelease version of YJIT 3.4. YJIT has been deployed to all of Shopify’s StoreFront Renderer (SFR) infrastructure for all of 2023 and 2024. For context, SFR renders all Shopify storefronts, which is the first thing buyers see when they navigate to a store hosted by Shopify. It is mostly written in Ruby, depends on over 220 Ruby gems, renders millions of Shopify stores in over 175 countries, and is served by multiple clusters distributed worldwide.
The above graph is a snapshot from our dashboard comparing the performance of the Ruby 3.4.0 interpreter vs YJIT 3.4.0 over the last 12 hours. The Y-axis shows the speedup YJIT provides compared to the interpreter. This is computed based on the total end-to-end time needed to generate a response, including time the SFR servers spend doing I/O, waiting on databases, and other operations YJIT cannot optimize. Given that the speedup figures that we see tend to change over time depending on web traffic, this is not a very scientific comparison, but we’re happy to report slightly better performance than last year and no significant change in memory usage. The p50 and p99 numbers also look quite a bit better than what we saw at this time last year.
Looking forward
In 2025, the YJIT team is investigating what other design decisions could yield more Ruby performance improvements in the future. We’re looking at multiple different avenues and also considering some potentially significant design changes. Stay tuned!
Conclusion
The Ruby 3.4 release is available for you to download from the Ruby releases page. This year’s release brings you better performance, potentially lower memory usage, and some quality of life improvements. A small but notable change is the addition of a more intuitive --yjit-mem-size command-line option to replace the quirky --yjit-exec-mem-size. The YJIT team wishes you all a happy new year and joyful hacking!
I’d like to give a big thanks to Shopify, Ruby & Rails Infrastructure, and the YJIT team. I have the privilege of working with many incredibly talented programmers and managers, and this project would not be possible without them. If you’re using YJIT in production, please give us a shout out on Twitter/X. It’s always very rewarding for us to hear about YJIT being deployed in the wild!