How Ruby Executes JIT Code: The Hidden Mechanics Behind the Magic
Ever since YJIT’s introduction, I’ve felt simultaneously close to and distant from Ruby’s JIT compiler. I know how to enable it in my Ruby programs. I know it makes my Ruby programs run faster by compiling parts of them into machine code. But my understanding of YJIT, or of JIT compilers in Ruby in general, seemed to end there.
A few months ago, my colleague Max Bernstein wrote ZJIT has been merged into Ruby, explaining how ZJIT compiles Ruby’s bytecode to HIR, to LIR, and then to native code. It sheds some light on how JIT compilers compile our programs, and it’s part of why I started contributing to ZJIT in July. But I still had many unanswered questions until I dug into the source code and asked the JIT experts around me (Max, Kokubun, and Alan).
So I want to use this post to answer some questions (and fill some mental gaps) you might also have about JIT compilers for Ruby:
- Where does JIT-compiled code actually live?
- How does Ruby actually execute JIT code?
- How does Ruby decide what to compile?
- Why does JIT-compiled code fall back to the interpreter?
While we use ZJIT (Ruby’s experimental next-generation JIT) as our reference, these concepts apply equally to YJIT.
Where JIT-Compiled Code Actually Lives
Ruby ISEQs and YARV Bytecode
When Ruby loads your code, it compiles each method into an Instruction Sequence (ISEQ) - a data structure containing YARV (CRuby’s virtual machine) bytecode instructions.
(If you’re not familiar with YARV instructions or want to learn more, Kevin Newton wrote a great blog series introducing them.)
Let’s start with a simple example:
def foo
  bar
end

def bar
  42
end
Running ruby --dump=insns example.rb shows us the bytecode:
== disasm: #<ISeq:foo@example.rb:1 (1,0)-(3,3)>
0000 putself ( 2)[LiCa]
0001 opt_send_without_block <calldata!mid:bar, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0003 leave [Re]
== disasm: #<ISeq:bar@example.rb:5 (5,0)-(7,3)>
0000 putobject 42 ( 6)[LiCa]
0002 leave [Re]
JIT-Compiled Code Lives on ISEQ Too
I assumed JIT-compiled code would replace bytecode—after all, native code is faster. But Ruby keeps both, for good reason.
Here’s what an ISEQ looks like initially:
ISEQ (foo method)
├── body
│ ├── bytecode: [putself, opt_send_without_block, leave]
│ ├── jit_entry: NULL // No JIT code yet
│ ├── jit_entry_calls: 0 // Call counter
After the method is called repeatedly and gets JIT-compiled:
ISEQ (foo method)
├── body
│ ├── bytecode: [putself, opt_send_without_block, leave] // Still here!
│ ├── jit_entry: 0x7f8b2c001000 // Pointer to native machine code
│ ├── jit_entry_calls: 35 // Reached compilation threshold
The jit_entry field is the gateway to native code. When it’s NULL, Ruby interprets bytecode. When it points to compiled code, Ruby can jump directly to machine instructions.
But the bytecode never goes away - Ruby needs it for de-optimization, which we will explore a bit later.
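Under the hood, both the bytecode and the JIT fields live on the ISEQ’s body. Here’s a simplified sketch of the relevant fields, modeled on struct rb_iseq_constant_body in CRuby’s vm_core.h (the real struct has many more members):
// Simplified sketch; the real struct rb_iseq_constant_body in
// CRuby's vm_core.h has many more members than shown here.
struct rb_iseq_constant_body {
    const VALUE *iseq_encoded;     // the YARV bytecode (never thrown away)
    unsigned int iseq_size;        // its length

    // NULL until the method is JIT-compiled, then a pointer to native code.
    rb_jit_func_t jit_entry;
    // Incremented on each call; drives the thresholds we'll see below.
    unsigned long jit_entry_calls;
};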
The Execution Switch: From Bytecode to Native Code
This is simpler than I expected. Since each ISEQ points to its JIT-compiled code when it’s available, Ruby simply checks the jit_entry field on every ISEQ it’s about to execute. When there’s no JIT code (jit_entry is NULL), it continues interpreting; otherwise, it runs the compiled native code.
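Here’s a minimal sketch of that check, modeled on the jit_exec() function in CRuby’s vm.c (simplified; the call-counting logic shown in the next section is omitted):
// Simplified from jit_exec() in CRuby's vm.c: if the ISEQ has JIT
// code, call into it; otherwise tell the caller to interpret.
static inline VALUE
jit_exec(rb_execution_context_t *ec)
{
    const rb_iseq_t *iseq = ec->cfp->iseq;
    rb_jit_func_t func = ISEQ_BODY(iseq)->jit_entry;

    if (func == NULL) {
        return Qundef;  // no native code yet: run the bytecode instead
    }
    return func(ec, ec->cfp);  // jump straight into machine code
}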
How Ruby Decides What to Compile
Ruby doesn’t compile methods randomly or all at once. Instead, methods earn compilation through repeated use. In ZJIT, this happens in two phases:
if (body->jit_entry == NULL && rb_zjit_enabled_p) {
    body->jit_entry_calls++;

    // Phase 1: Profile the method
    if (body->jit_entry_calls == rb_zjit_profile_threshold) {
        rb_zjit_profile_enable(iseq);
    }

    // Phase 2: Compile to native code
    if (body->jit_entry_calls == rb_zjit_call_threshold) {
        rb_zjit_compile_iseq(iseq, false);
        // After this, jit_entry points to machine code
    }
}
As of now, ZJIT’s default profile threshold is 25 and its compile threshold is 30 (both may change in the future). So a method’s lifecycle may look like this:
Calls: 0 ─────────── 25 ────────── 30 ─────────────────►
│ │ │
Mode: └─ Interpret ──┴── Profile ──┴─ Native Code (JIT compiled)
This is why we need to “warm up” the program before we get peak performance with JIT.
When JIT Code Gives Up: Understanding De-optimization
JIT code makes assumptions to run fast. When those assumptions break, Ruby must “de-optimize” - return control to the interpreter. It’s a safety mechanism that ensures your code always produces correct results.
Consider this method:
def add(a, b)
  a + b
end
which would generate these instructions:
== disasm: #<ISeq:add@test.rb:1 (1,0)-(3,3)>
0000 getlocal_WC_0 a@0 ( 2)[LiCa]
0002 getlocal_WC_0 b@1
0004 opt_plus <calldata!mid:+, argc:1, ARGS_SIMPLE>[CcCr]
0006 leave ( 3)[Re]
Because Ruby doesn’t know beforehand what opt_plus will be called with, the underlying C function vm_opt_plus needs to handle various classes (like String, Array, Float, Integer, etc.) that can respond to +.
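Here’s an abridged sketch of that function, adapted from vm_opt_plus() in CRuby’s vm_insnhelper.c (several branches elided):
// Adapted from vm_opt_plus() in CRuby's vm_insnhelper.c. This generic
// fast path has to re-check the operand classes on every single call.
static VALUE
vm_opt_plus(VALUE recv, VALUE obj)
{
    if (FIXNUM_2_P(recv, obj) &&
        BASIC_OP_UNREDEFINED_P(BOP_PLUS, INTEGER_REDEFINED_OP_FLAG)) {
        return rb_fix_plus_fix(recv, obj);  // Integer + Integer
    }
    else if (FLONUM_2_P(recv, obj) &&
             BASIC_OP_UNREDEFINED_P(BOP_PLUS, FLOAT_REDEFINED_OP_FLAG)) {
        return DBL2NUM(RFLOAT_VALUE(recv) + RFLOAT_VALUE(obj));
    }
    // ... similar branches for heap Float, String, and Array receivers ...
    else {
        return Qundef;  // fall back to a full method dispatch
    }
}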
But if profiling shows that add is always called with integers (Fixnums), JIT compilers can generate optimized code that handles only integer addition, along with “guards” that check the assumption still holds.
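Here’s what that guarded fast path logically looks like, rendered as C for readability. To be clear, the real artifact is machine code emitted by ZJIT’s backend; jitted_add and side_exit are hypothetical names, not actual ZJIT symbols:
// Illustrative only: jitted_add and side_exit are hypothetical names,
// and overflow handling is omitted for brevity.
VALUE jitted_add(VALUE a, VALUE b)
{
    // Guards: verify the profiled assumption that both operands are Fixnums.
    if (!FIXNUM_P(a) || !FIXNUM_P(b)) {
        return side_exit();  // assumption broken: deoptimize
    }
    // Fast path: untag, add, retag. No class checks, no method lookup.
    return LONG2FIX(FIX2LONG(a) + FIX2LONG(b));
}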
When the assumption is broken, like when add(1.5, 2) is called:
- The guard check fails
- JIT code jumps to a “side exit”
- The side exit restores interpreter state (stack, instruction pointer, etc.)
- Control returns to the interpreter
- The interpreter executes opt_plus and calls the vm_opt_plus function
Other triggers for falling back include:
- TracePoint activation - TracePoint needs bytecode execution to properly emit events (more details below)
- Redefined core methods - someone changed what + means on Integer
- Ractor usage - running multiple Ractors changes some YARV instructions’ behaviour, so the compiled code could behave differently from the interpreter in that situation
These assumption checks, or patch points as we call them in ZJIT, make sure your program performs correctly when any of the assumptions change.
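For instance, redefining + on Integer flips a VM-wide flag and notifies the JIT, which invalidates any compiled code that relied on the original definition. Here’s a rough sketch of that flow, modeled on the basic-operator redefinition handling in CRuby’s vm.c (the rb_zjit_bop_redefined hook name is my assumption for illustration):
// Rough sketch modeled on CRuby's vm.c. When a core ("basic operator")
// method like Integer#+ is redefined, the VM records the redefinition
// and notifies the JITs. rb_zjit_bop_redefined is an assumed name here.
static void
bop_redefined(int flag, enum ruby_basic_operators bop)
{
    // Interpreter fast paths (like vm_opt_plus above) stop taking their shortcuts.
    ruby_vm_redefined_flag[bop] |= flag;
    // The JIT patches the affected code so it side-exits instead of
    // using the now-stale fast path.
    rb_zjit_bop_redefined(flag, bop);
}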
Answering Some Additional Questions
Why does enabling TracePoint slow everything down?
(TracePoint is a Ruby class that can be used to register callbacks on specific Ruby execution events. It’s commonly used in debugging/development tools.)
Most of TracePoint’s events are triggered by corresponding YARV bytecode. When TracePoint is activated, instructions in ISEQs are replaced with their trace_* counterparts; for example, opt_plus is replaced with trace_opt_plus.
If Ruby only executes the compiled machine code, then those events wouldn’t be triggered correctly. Therefore, when ZJIT and YJIT compilers detect TracePoint’s activation, they immediately throw away the optimized code to force Ruby to interpret YARV instructions instead.
Why doesn’t Ruby just compile everything?
Many methods are called rarely. Compiling them would waste memory and compilation time for no performance benefit. Also, compiling methods without profiling would mean that JIT compilers either make wrong assumptions that get invalidated pretty quickly, or don’t make specific enough assumptions that miss further optimization opportunities.
Final Notes
I hope this post helped you understand JIT compilers, a now essential part of Ruby, a little bit more.
If you want to learn more about Ruby’s new JIT compiler, ZJIT, I highly recommend giving ZJIT has been merged into Ruby a read. And if you want to learn more about Ruby’s YARV instructions, Kevin Newton’s Advent of YARV series is the best resource.