<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://railsatscale.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://railsatscale.com/" rel="alternate" type="text/html" /><updated>2026-04-15T09:46:23+00:00</updated><id>https://railsatscale.com/feed.xml</id><title type="html">Rails at Scale</title><subtitle>The Ruby and Rails Infrastructure team at Shopify exists to help ensure that Ruby and Rails are 100-year tools that will continue to merit being our toolchain of choice.</subtitle><author><name>Shopify Engineering</name></author><entry><title type="html">Using Perfetto in ZJIT</title><link href="https://railsatscale.com/2026-03-27-using-perfetto-in-zjit/" rel="alternate" type="text/html" title="Using Perfetto in ZJIT" /><published>2026-03-27T00:00:00+00:00</published><updated>2026-03-27T00:00:00+00:00</updated><id>https://railsatscale.com/2026-03-27-using-perfetto-in-zjit/</id><content type="html" xml:base="https://railsatscale.com/2026-03-27-using-perfetto-in-zjit/"><![CDATA[<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<p>Look! A trace of slow events in a benchmark! Hover over the image to see it get bigger.</p>

<style>
img.hover-zoom:hover {
  transform: scale(2);
  transition: transform 0.1s ease-in;
}
img.hover-zoom:not(:hover) {
  transition: transform 0.1s ease-out;
}
</style>

<figure><img src="demo.png" alt="A sneak preview of what the trace looks like." class="hover-zoom"><figcaption>A sneak preview of what the trace looks like.</figcaption></figure>

<p>Now read on to see what the slow events are and how we got this pretty picture.</p>

<h2 id="the-rules">The rules</h2>

<p>The first rule of just-in-time compilers is: you stay in JIT code. The second
rule of JIT is: you STAY in JIT code!</p>

<p>When control leaves the compiled code to run in the interpreter—what the ZJIT
team calls either a “side-exit” or a “deopt”, depending on who you talk
to—things slow down. In a well-tuned system, this should happen pretty
rarely. Right now, because we’re still bringing up the compiler and runtime
system, it happens more than we would like.</p>

<p>We’re reducing the number of exits over time.</p>

<h2 id="lies-damned-lies-and-statistics">Lies, damned lies, and statistics</h2>

<p>We can track our side-exit reduction progress with <code class="language-plaintext highlighter-rouge">--zjit-stats</code>, which,
on process exit, prints out a tidy summary of the counters for all of the bad
stuff we track. It’s got side-exits. It’s got calls to C code. It’s got calls
to slow-path runtime helpers. It’s got everything.</p>

<p>Here is a chopped-up sample of stats output for the Lobsters benchmark,
which is a large Rails app:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ WARMUP_ITRS=0 MIN_BENCH_ITRS=20 MIN_BENCH_TIME=0 ruby --zjit-stats benchmarks/lobsters/benchmark.rb
...
***ZJIT: Printing ZJIT statistics on exit***
...
Top-20 side exit reasons (100.0% of total 12,549,876):
                   guard_type_failure: 6,020,734 (48.0%)
                  guard_shape_failure: 5,556,147 (44.3%)
  block_param_proxy_not_iseq_or_ifunc:   445,358 ( 3.5%)
                   unhandled_hir_insn:   215,168 ( 1.7%)
                        compile_error:   181,474 ( 1.4%)
...
compiled_iseq_count:                               5,581
failed_iseq_count:                                     2
compile_time:                                    1,443ms
...
guard_type_count:                            133,425,094
guard_type_exit_ratio:                              4.5%
guard_shape_count:                            49,386,694
guard_shape_exit_ratio:                            11.3%
...
code_region_bytes:                            31,571,968
side_exit_size_ratio:                              33.1%
zjit_alloc_bytes:                             19,329,659
total_mem_bytes:                              50,901,627
...
ratio_in_zjit:                                     82.8%
$
</code></pre></div></div>

<p>(I’ve cut out significant chunks of the stats output and replaced them with
<code class="language-plaintext highlighter-rouge">...</code> because it’s overwhelming the first time you see it.)</p>

<p>The first thing you might note is that the thing I just described as terrible
for performance is happening <em>over twelve million times</em>. The second thing you
might notice is that despite this, we seem to be staying in JIT code a high
percentage of the time. Or are we? Is 80% high? Is a 4.5% class guard miss
ratio high? What about 11% for shapes? It’s hard to say.</p>

<p>The counters are great because they’re <em>quick</em> and they’re reasonably stable
proxies for performance. There’s no substitute for painstaking measurements on
a quiet machine, but if the counter for Bad Slow Thing goes down (and others do
not go up), we’re probably doing a good job.</p>

<p>But they’re not great for building intuition. For intuition, we want more
tangible-feeling numbers. We want to see things.</p>

<h2 id="building-intuition">Building intuition</h2>

<p>The third thing you might do is ask yourself “where are these exits
coming from?” Unfortunately, counters cannot tell you that. For that, we
want stack traces. These let us know where in the guest (Ruby) code an exit
is triggered.</p>

<p>Ideally, we would also want some notion of time: we would want to know not just
where these events happen but also when. Are the exits happening early, at
application boot? During warmup? Even during what should be steady-state
application time? Hard to say.</p>

<p>So we need more tools. Thankfully, <a href="https://perfetto.dev/">Perfetto</a> exists.
Perfetto is a system for visualizing and analyzing traces and profiles that your
application generates. It has both a web UI and a command-line UI.</p>

<p>We can emit traces for Perfetto and visualize them there.</p>

<h2 id="a-look-at-perfetto">A look at Perfetto</h2>

<p>Take a look at this <a href="https://ui.perfetto.dev/#!/?url=https://railsatscale.com/2026-03-27-using-perfetto-in-zjit/perfetto-36885.fxt">sample ZJIT Perfetto
trace</a>
generated by running Ruby with <code class="language-plaintext highlighter-rouge">--zjit-trace-exits</code><sup id="fnref:sampled"><a href="#fn:sampled" class="footnote" rel="footnote" role="doc-noteref">1</a></sup>. What do you see?</p>

<p>I see a couple arrows on the left. Arrows indicate “instant” point-in-time
events. Then I see a mess of purple to the right of that until the end of the
trace.</p>

<p>Hover over an arrow. Find out that each arrow is a side-exit. Scream silently.</p>

<p>But it’s a friendly arrow. It tells you what the side-exit reason is. If you
click it, it even tells you the stack trace in the pop-up panel on the bottom.
If we click a couple of them, maybe we can learn more.</p>

<p>We can also zoom by mousing over the track, holding Ctrl, and scrolling. That
will let us look closer. But there are so many…</p>

<p>Fortunately, Perfetto also provides a SQL interface to the traces. We can write
a query to aggregate all of the side exit events from the <code class="language-plaintext highlighter-rouge">slice</code> table and
line them up with the topmost method from the backtrace arguments in the <code class="language-plaintext highlighter-rouge">args</code>
table:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span>
  <span class="n">s</span><span class="p">.</span><span class="n">name</span> <span class="k">AS</span> <span class="n">reason</span><span class="p">,</span>
  <span class="n">a</span><span class="p">.</span><span class="n">display_value</span> <span class="k">AS</span> <span class="k">method</span><span class="p">,</span>
  <span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span> <span class="k">AS</span> <span class="k">count</span>
<span class="k">FROM</span> <span class="n">slice</span> <span class="n">s</span>
<span class="k">JOIN</span> <span class="n">args</span> <span class="n">a</span> <span class="k">ON</span> <span class="n">a</span><span class="p">.</span><span class="n">arg_set_id</span> <span class="o">=</span> <span class="n">s</span><span class="p">.</span><span class="n">arg_set_id</span> <span class="k">AND</span> <span class="n">a</span><span class="p">.</span><span class="k">key</span> <span class="o">=</span> <span class="s1">'0'</span>
<span class="k">GROUP</span> <span class="k">BY</span> <span class="n">s</span><span class="p">.</span><span class="n">name</span><span class="p">,</span> <span class="n">a</span><span class="p">.</span><span class="n">display_value</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="k">count</span> <span class="k">DESC</span>
</code></pre></div></div>

<p>This pulls up a query box at the bottom showing us that there are a couple big
hotspots:</p>

<figure><img src="method-query.png" alt="Query results showing in columns left to right: reason for side-exit, method
that exited, and count. The top three are above 1k but it quickly falls off
after that." class="hover-zoom"><figcaption>Query results showing in columns left to right: reason for side-exit, method
that exited, and count. The top three are above 1k but it quickly falls off
after that.</figcaption></figure>

<p>It even has a helpful option to export the results as a Markdown table so I can
paste (an edited version) into this blog post:</p>

<div style="overflow-x: auto; font-size: 0.65em; margin-left: max(-10em, calc(-50vw + 50%)); margin-right: max(-10em, calc(-50vw + 50%));">

  <table>
    <thead>
      <tr>
        <th>reason</th>
        <th>method</th>
        <th>count</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>GuardShape(ShapeId(2475))</td>
        <td>ActiveModel::AttributeRegistration::ClassMethods#attribute_types</td>
        <td>5119</td>
      </tr>
      <tr>
        <td>GuardShape(ShapeId(2099268))</td>
        <td>ActiveRecord::ConnectionAdapters::AbstractAdapter#extended_type_map_key</td>
        <td>2295</td>
      </tr>
      <tr>
        <td>GuardType(FalseClass)</td>
        <td>ActiveModel::Type::Value#cast</td>
        <td>1025</td>
      </tr>
      <tr>
        <td>GuardShape(ShapeId(2099698))</td>
        <td>ActiveRecord::Associations#association_instance_get</td>
        <td>904</td>
      </tr>
      <tr>
        <td>BlockParamProxyNotIseqOrIfunc</td>
        <td>ActiveRecord::AttributeMethods::Read#_read_attribute</td>
        <td>902</td>
      </tr>
      <tr>
        <td>GuardShape(ShapeId(526450))</td>
        <td>Rack::Request::Env#get_header</td>
        <td>636</td>
      </tr>
      <tr>
        <td>GuardType(Class[class_exact*:Class@VALUE(0x128c60100)])</td>
        <td>ActiveRecord::Base._reflections</td>
        <td>622</td>
      </tr>
      <tr>
        <td>GuardType(ObjectSubclass[class_exact:Story])</td>
        <td>ActiveRecord::Associations#association</td>
        <td>565</td>
      </tr>
      <tr>
        <td>GuardShape(ShapeId(2098982))</td>
        <td>ActiveRecord::Reflection::AssociationReflection#polymorphic?</td>
        <td>510</td>
      </tr>
      <tr>
        <td>GuardType(StringSubclass[class_exact:ActiveSupport::SafeBuffer])</td>
        <td>ActionView::OutputBuffer#&lt;&lt;</td>
        <td>500</td>
      </tr>
      <tr>
        <td>GuardShape(ShapeId(2475))</td>
        <td>ActiveRecord::AttributeMethods::PrimaryKey::ClassMethods#primary_key</td>
        <td>492</td>
      </tr>
      <tr>
        <td>GuardType(ObjectSubclass[class_exact:ActiveModel::Type::String])</td>
        <td>ActiveModel::Type::Value#deserialize</td>
        <td>442</td>
      </tr>
      <tr>
        <td>GuardShape(ShapeId(2098982))</td>
        <td>ActiveRecord::Reflection::AssociationReflection#deprecated?</td>
        <td>376</td>
      </tr>
      <tr>
        <td>GuardType(ObjectSubclass[class_exact:Bundler::Dependency])</td>
        <td>Gem::Dependency#matches_spec?</td>
        <td>355</td>
      </tr>
      <tr>
        <td>UnhandledHIRInvokeBuiltin</td>
        <td>Time#initialize</td>
        <td>346</td>
      </tr>
    </tbody>
  </table>

</div>

<p>Looks like we should figure out why we’re getting so many shape misses; fixing
that will clear up a lot of exits. (Hint: it’s because once we make our first guess about
what we think the object shape will be, we don’t re-assess… <strong>yet</strong>.)</p>

<p>This has been a taste of Perfetto. There’s probably a lot more to explore.
Please join the <a href="https://zjit.zulipchat.com">ZJIT Zulip</a> and let us know if you have any cool
tracing or exploring tricks.</p>

<p>Now I’ll explain how you too can use Perfetto from your system. Adding support
to ZJIT was pretty straightforward.</p>

<h2 id="implementation">Implementation</h2>

<p>The first thing is that you’ll need some way to get trace data out of your
system. We write to a file with a well-known location
(<code class="language-plaintext highlighter-rouge">/tmp/perfetto-PID.fxt</code>), but you could do any number of things. Perhaps you
can stream events over a socket to another process, or to a server that
aggregates them, or store them internally and expose a webserver that serves
them over the internet, or… anything, really.</p>

<p>Once you have that, you need a couple lines of code to emit the data. Perfetto
accepts a number of formats. For example, in his <a href="https://thume.ca/2023/12/02/tracing-methods/">excellent blog post</a>,
Tristan Hume opens with such a simple snippet of code for logging Chromium
Trace JSON-formatted events (lightly modified by me):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">event_name</span> <span class="o">=</span> <span class="bp">...</span>
<span class="n">timestamp</span> <span class="o">=</span> <span class="bp">...</span>
<span class="n">duration</span> <span class="o">=</span> <span class="bp">...</span>
<span class="n">f</span> <span class="o">=</span> <span class="nf">open</span><span class="p">(</span><span class="sh">'</span><span class="s">trace.json</span><span class="sh">'</span><span class="p">,</span><span class="sh">'</span><span class="s">a</span><span class="sh">'</span><span class="p">)</span>
<span class="n">f</span><span class="p">.</span><span class="nf">write</span><span class="p">(</span><span class="sh">"</span><span class="s">[</span><span class="se">\n</span><span class="sh">"</span><span class="p">)</span>

<span class="c1"># ... emit some events here ...
</span>
<span class="c1"># Log a single event
</span><span class="n">f</span><span class="p">.</span><span class="nf">write</span><span class="p">(</span><span class="sh">'</span><span class="s">{</span><span class="sh">"</span><span class="s">name</span><span class="sh">"</span><span class="s">: </span><span class="sh">"</span><span class="s">%s</span><span class="sh">"</span><span class="s">, </span><span class="sh">"</span><span class="s">ts</span><span class="sh">"</span><span class="s">: %d, </span><span class="sh">"</span><span class="s">dur</span><span class="sh">"</span><span class="s">: %d, </span><span class="sh">"</span><span class="s">cat</span><span class="sh">"</span><span class="s">: </span><span class="sh">"</span><span class="s">hi</span><span class="sh">"</span><span class="s">, </span><span class="sh">"</span><span class="s">ph</span><span class="sh">"</span><span class="s">: </span><span class="sh">"</span><span class="s">X</span><span class="sh">"</span><span class="s">, </span><span class="sh">"</span><span class="s">pid</span><span class="sh">"</span><span class="s">: 1, </span><span class="sh">"</span><span class="s">tid</span><span class="sh">"</span><span class="s">: 1, </span><span class="sh">"</span><span class="s">args</span><span class="sh">"</span><span class="s">: {}},</span><span class="se">\n</span><span class="sh">'</span> <span class="o">%</span>
  <span class="p">(</span><span class="n">event_name</span><span class="p">,</span> <span class="n">timestamp</span><span class="p">,</span> <span class="n">duration</span><span class="p">))</span>

<span class="c1"># ... emit some events here ...
</span>
<span class="c1"># ... at process exit, close the file ...
</span><span class="n">f</span><span class="p">.</span><span class="nf">write</span><span class="p">(</span><span class="sh">"</span><span class="s">]</span><span class="sh">"</span><span class="p">)</span> <span class="c1"># this closing ] isn't actually required
</span><span class="n">f</span><span class="p">.</span><span class="nf">close</span><span class="p">()</span>
</code></pre></div></div>

<p>This snippet is great. It shows, end-to-end, how to write a stream containing one
event. It is a <em>complete</em> (X) event, as opposed to either:</p>

<ul>
  <li>two discrete timestamped <em>begin</em> (B) and <em>end</em> (E) events that book-end
something, or</li>
  <li>an <em>instant</em> (i) event that has no duration, or</li>
  <li>a couple other event types in the <a href="https://docs.google.com/document/d/1CvAClvFfyA5R-PhYUmn5OOQtYMH4h6I0nSsKchNAySU/preview">Chromium Trace Event Format doc</a>
</li>
</ul>
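
<p>To make the other event types concrete, here is a small sketch in the same
spirit (in Ruby this time, and not code from ZJIT or from Tristan’s post): it logs a
begin/end pair and an instant event in the Chromium Trace JSON format. The event
names, timestamps, and pid/tid values are made up for illustration.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Illustrative only: the same append-to-a-JSON-array scheme as above,
# but using begin (B) / end (E) and instant (i) events.
require "json"

f = File.open("trace.json", "a")
f.write("[\n")

# Book-end a hypothetical compile with discrete begin and end events.
f.write(JSON.generate({ name: "compile", ph: "B", ts: 100, pid: 1, tid: 1, cat: "zjit" }) + ",\n")
f.write(JSON.generate({ name: "compile", ph: "E", ts: 950, pid: 1, tid: 1, cat: "zjit" }) + ",\n")

# Log a hypothetical side-exit as an instant event with no duration.
f.write(JSON.generate({ name: "side_exit", ph: "i", ts: 1200, pid: 1, tid: 1, cat: "zjit", s: "t" }) + ",\n")

f.close
</code></pre></div></div>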

<p>It was enough to get me started. Since it’s JSON, and we have a lot of side
exits, the trace quickly ballooned to 8GB for a benchmark lasting only a few
seconds. Not great. Now, part of this is our fault—we should side-exit
less—and part of it is just the verbosity of JSON.</p>

<p>Thankfully, Perfetto ingests more compact binary formats, such as the <a href="https://fuchsia.dev/fuchsia-src/reference/tracing/trace-format">Fuchsia
trace format</a>.
In addition to being more compact, FXT even supports string interning. After
modifying the tracer to emit FXT, we ended up with closer to 100MB for the same
benchmark.</p>
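
<p>String interning does a lot of the work there. As a rough illustration of the
idea (and explicitly <em>not</em> the FXT wire format), interning means each repeated
string is written once and later events refer to it by a small id:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Illustrative only: map repeated strings to small integer ids.
# A reason like "GuardShape(ShapeId(2475))" that appears millions of times
# is stored once; every later event records only its id.
def intern(table, str)
  table[str] ||= table.size + 1  # assign the next id the first time we see str
end

table = {}
intern(table, "GuardShape(ShapeId(2475))")  # => 1
intern(table, "GuardShape(ShapeId(2475))")  # => 1 again, nothing new stored
</code></pre></div></div>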

<p>We can reduce the size further by <em>sampling</em>—not writing every exit to the trace, but
instead only every <em>K</em>-th exit (for some, probably prime, K). This is why we provide
the <code class="language-plaintext highlighter-rouge">--zjit-trace-exits-sample-rate=K</code> option.</p>
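
<p>The sampling logic itself is tiny. Here is a minimal sketch of the idea in Ruby
(ZJIT’s actual implementation is in Rust, and the names below are made up):</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Illustrative only: keep a counter and record every K-th exit.
SAMPLE_RATE = 97  # some (probably prime) K, as set by --zjit-trace-exits-sample-rate

$exit_count = 0

def maybe_record_exit(reason, backtrace)
  $exit_count += 1
  return unless ($exit_count % SAMPLE_RATE).zero?
  record_exit_event(reason, backtrace)  # hypothetical helper that appends to the trace file
end
</code></pre></div></div>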

<p>Check out the <a href="https://github.com/ruby/ruby/blob/eb8051185122d4b7bc9c6a6df694a85f34ced681/zjit/src/stats.rs#L988">trace writer</a> implementation as of the time this article
was written.</p>

<h2 id="tracing-more-things">Tracing more things</h2>

<p>We could trace:</p>

<ul>
  <li>When methods get compiled</li>
  <li>How big the generated code is</li>
  <li>How long each compile phase takes</li>
  <li>When (and where) invalidation events happen</li>
  <li>When (and where) allocations happen from JITed code</li>
  <li>Garbage collection events</li>
  <li>and more!</li>
</ul>

<h2 id="conclusion">Conclusion</h2>

<p>Visualizations are awesome. Get your data in the right format so you can ask
the right questions easily. Thanks, Perfetto!</p>

<p>Also, it looks like visualizations are now available in Perfetto canary. Time to
go make some fun histograms…</p>
<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:sampled">
      <p>This is also sampled/strobed, so not every exit is in there. This
is just 1/K of them for some K that I don’t remember. <a href="#fnref:sampled" class="reversefootnote" role="doc-backlink">↩</a></p>
    </li>
  </ol>
</div>
</body></html>]]></content><author><name>Max Bernstein</name></author><category term="posts" /><category term="2026-03-27-using-perfetto-in-zjit" /><summary type="html"><![CDATA[We added Perfetto tracing support to ZJIT so we could visualize and query slow events. Take a look at the pretty colors and see how you can add this to your system too.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://railsatscale.com/2026-03-27-using-perfetto-in-zjit/5e25b1e658f8301440e1e91eafbb48286c0748f0.png" /><media:content medium="image" url="https://railsatscale.com/2026-03-27-using-perfetto-in-zjit/5e25b1e658f8301440e1e91eafbb48286c0748f0.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Engineering Rigor in the AI Age: Building a Benchmark You Can Trust</title><link href="https://railsatscale.com/2026-03-18-engineering-rigor-in-the-ai-age-building-a-benchmark-you-can-trust/" rel="alternate" type="text/html" title="Engineering Rigor in the AI Age: Building a Benchmark You Can Trust" /><published>2026-03-18T00:00:00+00:00</published><updated>2026-03-18T00:00:00+00:00</updated><id>https://railsatscale.com/2026-03-18-engineering-rigor-in-the-ai-age-building-a-benchmark-you-can-trust/</id><content type="html" xml:base="https://railsatscale.com/2026-03-18-engineering-rigor-in-the-ai-age-building-a-benchmark-you-can-trust/"><![CDATA[<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<p>The Rails Infrastructure team has been <a href="https://railsatscale.com/2026-03-09-faster-bundler/" target="_blank">working on making Bundler faster</a> and our work has paid off. A cold <code class="language-plaintext highlighter-rouge">bundle install</code> is 3x faster on a Gemfile with 452 gems compared to Bundler 2.7. But “faster” only means something if everyone agrees on what’s being measured. If two people are running benchmarks with different definitions of “cold install” or different cache states, the results aren’t comparable. We needed a shared tool that would give us confidence, both internally and externally, that we’re tackling the right problems and actually making Bundler faster. And along the way we learned when Claude can be helpful and when to not outsource our own expertise and thinking.</p>

<h2 id="what-affects-bundle-install-time">What affects bundle install time</h2>

<p>Before building anything, we had to understand what variables go into <code class="language-plaintext highlighter-rouge">bundle install</code> performance. There are more than you’d expect:</p>

<ul>
  <li>
<strong>Number of gems</strong>: A Gemfile with 35 gems takes less time to install than a Gemfile with 500 gems.</li>
  <li>
<strong>Depth of dependencies</strong>: A flat Gemfile with no transitive dependencies resolves faster than a deep dependency tree.</li>
  <li>
<strong>Native extensions</strong>: Gems like <code class="language-plaintext highlighter-rouge">bigdecimal</code> take orders of magnitude longer to install than pure Ruby gems (<code class="language-plaintext highlighter-rouge">bigdecimal</code> alone takes 3 seconds to install). The ratio of native extension gems to pure Ruby gems in your Gemfile changes the install profile significantly. It’s rare to have a Gemfile without at least a few dependencies on native extensions.</li>
  <li>
<strong>How Ruby is compiled</strong>: Optimization flags, compiler version, and platform all affect gem compilation time.</li>
  <li>
<strong>Network time</strong>: Downloading from rubygems.org introduces latency and rate limiting that can skew results.</li>
  <li>
<strong>Number of cores</strong>: Bundler parallelizes installs across worker threads, so core count matters.</li>
  <li>
<strong>Endpoint security software</strong>: On company-issued, managed machines, security software that scans file writes adds measurable time to every gem install. Running the same benchmark on a personal, non-managed device vs. a managed device produced very different numbers with no code change. If your benchmarks aren’t reproducible across machines, this is worth checking.</li>
</ul>

<p>We needed a way to remove as many of these variables as possible so when we made changes, we could trust our benchmarks were correct. The goal was to remove guesswork, ensure everyone is testing from the same starting point, and provide a straightforward way to run benchmarks when making changes.</p>

<h2 id="what-we-built">What we built</h2>

<p>It took a few weeks to get reliable results, and as part of building the benchmark we also implemented a full <a href="https://github.com/eileencodes/bundler-perf-toolkit" target="_blank">toolkit</a> that includes scripts for installing, benchmarking, and profiling Ruby package managers.</p>

<p>Getting a reliable benchmark took a lot of iteration. The first version of the benchmark was basic, and every time we ran it we’d find something that wasn’t quite right. We worked with Claude to make the original benchmark and tweak it as we found issues with the runs. In some cases we had to do the tedious work of debugging the benchmark ourselves.</p>

<p>Back in 2018 when I was working on improving Rails integration test performance, I kept an <a href="https://github.com/eileencodes/integration_performance_test" target="_blank">entire repo</a> with all my benchmarks and profile scripts so I could track changes over time, but also so I could share it with the community. When your benchmark is open source, you’re not working in a vacuum and everyone else can check your assumptions. That lesson stuck with me, and it’s why this toolkit follows a similar pattern.</p>

<p>The toolkit includes everything you need to benchmark and profile Bundler.</p>

<ul>
  <li>
<strong>Setup scripts:</strong> Install scripts for both macOS and Linux that let you choose which package managers you want to benchmark.</li>
  <li>
<strong>Benchmarking tool:</strong> The benchmark tool uses <a href="https://github.com/sharkdp/hyperfine" target="_blank">hyperfine</a> for statistical timing with standard deviation, min/max, and outlier detection.
    <ul>
      <li>It supports running against multiple branches and package managers, switching the Ruby version, changing the number of iterations, and provides multiple Gemfile scenarios.</li>
      <li>It automatically runs both warm and cold scenarios and outputs how much faster or slower each is than the baseline.</li>
      <li>It includes a fake gemserver. Thanks to Claude I was able to quickly build a fake gemserver that served real gems based on Aaron’s <a href="https://github.com/tenderlove/slow-gemserver" target="_blank">slow-gemserver</a> and use that to eliminate deviations caused by network round trips to <a href="http://rubygems.org" target="_blank">rubygems.org</a> and/or rate limiting.</li>
    </ul>
  </li>
  <li>
<strong>Profiler tool for Bundler:</strong> The profiler tool can currently only profile Bundler, but it includes everything you need to profile with either <a href="https://github.com/mstange/samply" target="_blank">Samply</a> or <a href="https://github.com/jhawthorn/vernier" target="_blank">Vernier</a>.
    <ul>
      <li>It supports switching the Ruby version, running with cold or warm cache mode, choosing the Gemfile scenario, and changing the output path of the profile.</li>
      <li>It also can optionally use the fake gemserver to avoid profiling network time.</li>
    </ul>
  </li>
</ul>

<h2 id="how-we-defined-what-to-measure">How we defined what to measure</h2>

<p>As part of building this benchmark we also needed to define what we wanted to measure, so that everyone shares the same understanding of the scenarios we are trying to improve.</p>

<p><strong>Cold</strong> is defined as a first-ever install. Before each iteration, hyperfine’s <code class="language-plaintext highlighter-rouge">--prepare</code> hook nukes all caches: download cache, compact index cache, installed gems, bundle home, and removes the lockfile. The install has to resolve dependencies, download every gem, and install from scratch. This is the case where nothing is compiled or installed.</p>

<p><strong>Warm</strong> is defined as reinstalling gems that have been downloaded previously. The benchmark setup first runs one full cold install to populate the download cache. Then for each timed iteration, the <code class="language-plaintext highlighter-rouge">--prepare</code> hook removes only installed gems and the <code class="language-plaintext highlighter-rouge">.bundle</code> directory, keeping the download cache and lockfile intact. The install runs with <code class="language-plaintext highlighter-rouge">BUNDLE_FROZEN=1</code> so it skips resolution and only extracts and installs.</p>
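
<p>A rough sketch of how those two <code class="language-plaintext highlighter-rouge">--prepare</code> hooks might differ is below. The paths and directory layout are illustrative assumptions, not a copy of the toolkit’s code.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Illustrative only: cold resets everything, warm keeps the download cache and lockfile.
cache_root = ".caches/master"  # hypothetical per-label cache directory

cold_prepare = [
  "rm -rf #{cache_root}/gems",         # installed gems
  "rm -rf #{cache_root}/bundle-home",  # bundle home, including compact index and download caches
  "rm -f Gemfile.lock",                # force a full resolve
].join(" &amp;&amp; ")

warm_prepare = [
  "rm -rf #{cache_root}/gems",         # installed gems only
  "rm -rf .bundle",                    # project-local Bundler state
].join(" &amp;&amp; ")                         # download cache and Gemfile.lock stay intact

# hyperfine then gets --prepare cold_prepare for the cold runs, and
# --prepare warm_prepare (plus BUNDLE_FROZEN=1) for the warm runs.
</code></pre></div></div>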

<p><strong>Getting “warm” right was harder than it sounds.</strong> <a href="https://github.com/ruby/rubygems" target="_blank">Bundler</a>, <a href="https://github.com/gel-rb/gel" target="_blank">gel</a>, <a href="https://github.com/spinel-coop/rv" target="_blank">rv</a>, and <a href="https://github.com/tobi/scint" target="_blank">scint</a> package managers all have different cache structures. Early on Bundler’s warm results were barely faster than cold, and we spent time debugging before realizing it was a cache isolation issue in the benchmark itself, not a Bundler problem. The benchmark was wrong, not the code. Interestingly, this was a case where AI wasn’t that helpful. Claude kept missing this specific environment variable, so we had to debug the hard way (this is also why the script has a <code class="language-plaintext highlighter-rouge">BENCH_DEBUG</code> mode). But it paid off because we gained a better understanding of which environment variables affect the caches.</p>

<p>In the future we may want to define other cache scenarios to measure. There are scenarios between our pre-defined cold and warm scenarios like <code class="language-plaintext highlighter-rouge">bundle update</code> or having the gems installed but no lockfile so resolution needs to occur again. The beauty of this toolkit being open source is that if there’s a scenario you want to test, we can easily add that to the benchmark script.</p>

<h2 id="using-the-benchmark-script">Using the benchmark script</h2>

<p>In order to support multiple tools we implemented a <code class="language-plaintext highlighter-rouge">--run</code> argument that is specified as a <code class="language-plaintext highlighter-rouge">LABEL:TOOL[:PATH]</code> triple. The label isolates caches, the tool selects the package manager, and the path optionally points to a local checkout or git worktree. Multiple runs can be compared in a single invocation, with the first treated as baseline.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ruby run_benchmark.rb <span class="se">\</span>
  <span class="nt">--run</span> master:bundler:~/rubygems <span class="se">\</span>
  <span class="nt">--run</span> patched:bundler:~/rubygems-patched <span class="se">\</span>
  <span class="nt">--scenario</span> rails <span class="se">\</span>
  <span class="nt">--iterations</span> 5 <span class="se">\</span>
  <span class="nt">--source</span> http://localhost:9292
</code></pre></div></div>

<p>Caches are fully isolated per label under <code class="language-plaintext highlighter-rouge">.caches/&lt;label&gt;/</code> using environment variables, so comparing two Bundler versions in the same invocation won’t contaminate results. The comparison output shows relative speed so you can quickly see how many times faster or slower your change is than the baseline.</p>
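
<p>Conceptually, that isolation amounts to pointing the cache-related environment variables at per-label directories. The exact variable set below is an assumption for illustration rather than the toolkit’s actual list.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Illustrative only: build a per-label environment so runs cannot share caches.
def isolated_env(label)
  root = File.expand_path(".caches/#{label}")
  {
    "BUNDLE_USER_HOME" => "#{root}/bundle-home",  # compact index and download caches
    "GEM_HOME"         => "#{root}/gems",         # where gems get installed
    "BUNDLE_PATH"      => "#{root}/gems",
  }
end

# Each benchmarked command then runs with its own environment, e.g.:
#   system(isolated_env("master"), "bundle install")
</code></pre></div></div>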

<p>Each scenario is just a directory containing a Gemfile. The <code class="language-plaintext highlighter-rouge">rails</code> scenario represents a typical Rails application with 35 gems. The <code class="language-plaintext highlighter-rouge">large</code> scenario is a stress test with 452 gems. You can add your own by creating a directory with a Gemfile and passing <code class="language-plaintext highlighter-rouge">--scenario yourdir</code> to the script.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ruby run_benchmark.rb --run bundler27:bundler:~/bundler27 --run master:bundler:~/rubygems/ --scenario large --iterations 3 --source http://localhost:9292 --ruby /usr/local/bin/ruby
Benchmark matrix
Ruby: ruby 4.0.1 (2026-01-13 revision e04267a14b) +PRISM [x86_64-linux]
Source: http://localhost:9292
Iterations: 3
Runs:
  bundler27: bundler (/home/ubuntu/bundler27)
  master: bundler (/home/ubuntu/rubygems)

=== Scenario: large (452 gems, bundler27) ===
  Running cold benchmark (3 runs)...
Benchmark 1: bundler27 (cold)
  Time (mean ± σ):     51.368 s ±  0.058 s    [User: 106.055 s, System: 22.181 s]
  Range (min … max):   51.332 s … 51.435 s    3 runs

  Running warm benchmark (3 runs)...
Benchmark 1: bundler27 (warm)
  Time (mean ± σ):      8.895 s ±  0.150 s    [User: 7.743 s, System: 4.658 s]
  Range (min … max):    8.742 s …  9.042 s    3 runs

  Cold median: 51.34s  Warm median: 8.9s

=== Scenario: large (452 gems, master) ===
  Running cold benchmark (3 runs)...
Benchmark 1: master (cold)
  Time (mean ± σ):     16.012 s ±  0.023 s    [User: 107.344 s, System: 21.568 s]
  Range (min … max):   15.987 s … 16.033 s    3 runs

  Running warm benchmark (3 runs)...
Benchmark 1: master (warm)
  Time (mean ± σ):      7.202 s ±  0.027 s    [User: 4.908 s, System: 3.061 s]
  Range (min … max):    7.173 s …  7.228 s    3 runs

  Cold median: 16.02s  Warm median: 7.2s

Results written to /home/ubuntu/bundler-bench/results/bundler27_20260318_161305.json
Results written to /home/ubuntu/bundler-bench/results/master_20260318_161305.json

=== Comparison Summary ===

Scenario: large (452 gems)
                             Cold     +/-                        Warm     +/-
  ------------------------------------------------------------------------------
  bundler27                51.34s   0.06s  baseline             8.90s   0.15s  baseline
  master                   16.02s   0.02s  3.21x faster         7.20s   0.03s  1.24x faster
</code></pre></div></div>

<p><em>Note these numbers will vary across macOS and Linux, as well as on machines with endpoint security software. While we aimed to reduce many variables, you still may not see the same numbers; however, the speedup should be between 2-3.5x for a cold bundle install and 1-1.5x for a warm one. This script was run on an AWS sandbox and therefore has no other traffic or endpoint security altering the numbers. It is also using the fake gemserver, so network round trips aren’t involved.</em></p>

<h3 id="using-the-profiling-script">Using the profiling script</h3>

<p>Benchmarks tell you whether something got faster. Profiles tell you why it’s slow. The profiling tool runs a single <code class="language-plaintext highlighter-rouge">bundle install</code> under <a href="https://github.com/jhawthorn/vernier" target="_blank">Vernier</a> or <a href="https://github.com/mstange/samply" target="_blank">samply</a> to produce flamegraphs.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ruby profile_bundler.rb <span class="se">\</span>
  <span class="nt">--run</span> master:bundler:~/rubygems <span class="se">\</span>
  <span class="nt">--scenario</span> rails <span class="se">\</span>
  <span class="nt">--mode</span> warm <span class="se">\</span>
  <span class="nt">--profiler</span> vernier
</code></pre></div></div>

<p>It supports both cold and warm modes so you can profile the specific phase you’re investigating. Profiles are written to <code class="language-plaintext highlighter-rouge">profiles/</code> with filenames that include the label, scenario, mode, platform, and timestamp so you can compare across runs and machines.</p>

<p>Here’s an example of the Vernier output for the <code class="language-plaintext highlighter-rouge">master</code> branch on the AWS Linux sandbox in cold cache mode.</p>

<figure><img src="./vernier-linux-cold-bundler-master.png" alt="Vernier flamegraph for cold bundle install on Linux"><figcaption>Vernier flamegraph for cold bundle install on Linux</figcaption></figure>

<p><em>See all the yellow bars? Those are native extensions compiling and blocking the threads from doing other work.</em></p>

<h3 id="setup-scripts">Setup scripts</h3>

<p>Reproducing someone else’s benchmark results is only possible if you’re starting from the same place. The repository includes setup scripts for both macOS and Linux which will install Ruby, hyperfine, and profiling tools, and clone the repos you need:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>./setup-benchmark-mac.sh <span class="nt">--tools</span> bundler,bundler27,gel,scint
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">--tools</code> flag lets you pick which package managers or versions to install. It defaults to <code class="language-plaintext highlighter-rouge">bundler,bundler27</code> so you can compare the current master against the last stable release without extra setup. We wanted anyone on the team (or in the community) to be able to spin up a fresh machine and get comparable results without hunting down the right Ruby version, compiler flags, or repo branches.</p>

<h2 id="what-we-learned">What we learned</h2>

<p>A shared benchmark is a source of truth. When someone says “my change is faster” and someone else disagrees, you need a neutral tool that everyone agreed on beforehand. Without that, performance discussions turn into competing anecdotes. The toolkit gives us a way to settle those disagreements with data instead of intuition.</p>

<p>Reproducibility matters just as much. If you can’t reproduce similar results on someone else’s machine, you can’t verify the claims. “It’s faster on my machine” isn’t useful if it’s not faster for everyone, or worse, faster on Linux but slower on macOS. When results differ across machines, we can start asking why instead of arguing about whether.</p>

<p>Back in 2018 I gave a talk called <a href="https://www.youtube.com/watch?v=oT74HLvDo_A" target="_blank">How to Performance</a> which was on the surface about how I sped up integration tests in Rails, but really it was a talk about how to write benchmarks you could trust so you know when you actually made something faster. Many of the lessons I learned back then came up again during this project.</p>

<p>Profiles and guesswork are only one part of the equation. A profile can show you a hot spot, and you can write a fix that looks faster, but without a proper benchmark you don’t actually know. You don’t know if the gain on macOS is a regression on Linux. You don’t know if “cold” got faster but “warm” got slower. You don’t know if the improvement holds across different Gemfile sizes. The benchmark is what turns a hypothesis into evidence.</p>

<p>This matters even more now than it did in 2018. Engineering rigor is more important in the AI world than it was before. It’s easy to generate output that looks correct or looks faster. It’s easy to make a benchmark that looks reasonable but cheats on the warm caches. AI is good at producing plausible code and plausible explanations. Humans are good at critical thinking and using our gut to know when something doesn’t look right. We have taste, discernment, and scrutiny; AI has data.</p>

<p>That’s not a dig on AI. I used Claude extensively throughout this project. It was great at writing the setup scripts, which are uninteresting and error prone, and it wrote the original benchmark tool. But it also got things wrong. The warm cache bug I mentioned earlier? Claude missed setting <code class="language-plaintext highlighter-rouge">BUNDLE_USER_HOME</code> in the environment, which meant Bundler was writing to the system bundle home instead of the isolated one. Warm caches on the master branch looked broken because they were being shared across runs. I spent time debugging Bundler before I realized the benchmark itself was wrong. Claude didn’t catch it because it doesn’t have the deep institutional knowledge of how Bundler’s cache layers interact. I caught it because I knew what the numbers should look like and they didn’t add up.</p>

<p>That’s not a reason to stop using AI. It’s a reminder to not outsource our thinking and to always test our assumptions. Applying engineering rigor is how we can be sure the work we’re doing, whether it’s us or AI doing it, is valid and achieves our goals.</p>

<p>The toolkit is available at <a href="https://github.com/eileencodes/bundler-perf-toolkit" target="_blank">bundler-perf-toolkit</a>. If you’re working on Bundler performance or just curious about how your Gemfile affects install times, give it a try. We welcome PRs with new scenarios, corrections to cache handling if you spot something we got wrong, and support for other tools to test against.</p>
</body></html>]]></content><author><name>Eileen Alayce</name></author><category term="posts" /><category term="2026-03-18-engineering-rigor-in-the-ai-age-building-a-benchmark-you-can-trust" /><summary type="html"><![CDATA[The Rails Infrastructure team built an open-source benchmarking toolkit to reliably measure Bundler performance improvements, and along the way learned that AI is great for scaffolding tools but engineering rigor — trusting your gut when numbers don't add up — is something you can't outsource.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://railsatscale.com/2026-03-18-engineering-rigor-in-the-ai-age-building-a-benchmark-you-can-trust/755f034da9ce9efc4494dca56129e6586b81262c.png" /><media:content medium="image" url="https://railsatscale.com/2026-03-18-engineering-rigor-in-the-ai-age-building-a-benchmark-you-can-trust/755f034da9ce9efc4494dca56129e6586b81262c.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">How ZJIT removes redundant object loads and stores</title><link href="https://railsatscale.com/2026-03-18-how-zjit-removes-redundant-object-loads-and-stores/" rel="alternate" type="text/html" title="How ZJIT removes redundant object loads and stores" /><published>2026-03-18T00:00:00+00:00</published><updated>2026-03-18T00:00:00+00:00</updated><id>https://railsatscale.com/2026-03-18-how-zjit-removes-redundant-object-loads-and-stores/</id><content type="html" xml:base="https://railsatscale.com/2026-03-18-how-zjit-removes-redundant-object-loads-and-stores/"><![CDATA[<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<h2 id="intro">Intro</h2>
<p>Since the <a href="https://railsatscale.com/2025-12-24-launch-zjit/">post</a> at the end of last year, ZJIT has grown and
changed in some exciting ways. This is the story of how a new, self-contained
optimization pass causes ZJIT performance to surpass YJIT on an interesting
<a href="https://rubybench.github.io/benchmarks/ruby-bench.html#setivar">microbenchmark</a>. It has been 10 months since ZJIT was merged
into Ruby, and we’re now beginning to see the design differences between YJIT
and ZJIT manifest themselves in performance divergences. In this post, we will
explore the details of one new optimization in ZJIT called load-store
optimization. This implementation is part of ZJIT’s optimizer in HIR. Recall
that the structure of ZJIT looks roughly like the following.</p>

<pre><code class="language-mermaid">flowchart LR
        A(["Ruby"])
        A --&gt; B(["YARV"])
        B --&gt; C(["HIR"])
        C --&gt; D(["LIR"])
        D --&gt; E(["Assembly"])
</code></pre>

<p>This post will focus on optimization passes in HIR, or “High-level” Intermediate
Representation. At the HIR level, we have two capabilities that are distinct
from other compilation stages. Our optimizations in HIR typically utilize the
benefits of our <a href="https://bernsteinbear.com/blog/ssa/">SSA</a> representation in addition to the HIR
instruction effect system.</p>

<p>These are the current analysis passes in ZJIT without load-store optimization,
as well as the order in which the passes are executed.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">run_pass!</span><span class="p">(</span><span class="n">type_specialize</span><span class="p">);</span>
<span class="nd">run_pass!</span><span class="p">(</span><span class="n">inline</span><span class="p">);</span>
<span class="nd">run_pass!</span><span class="p">(</span><span class="n">optimize_getivar</span><span class="p">);</span>
<span class="nd">run_pass!</span><span class="p">(</span><span class="n">optimize_c_calls</span><span class="p">);</span>
<span class="nd">run_pass!</span><span class="p">(</span><span class="n">fold_constants</span><span class="p">);</span>
<span class="nd">run_pass!</span><span class="p">(</span><span class="n">clean_cfg</span><span class="p">);</span>
<span class="nd">run_pass!</span><span class="p">(</span><span class="n">remove_redundant_patch_points</span><span class="p">);</span>
<span class="nd">run_pass!</span><span class="p">(</span><span class="n">eliminate_dead_code</span><span class="p">);</span>
</code></pre></div></div>

<p>Here’s where load-store optimization gets added.</p>

<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  run_pass!(type_specialize);
  run_pass!(inline);
  run_pass!(optimize_getivar);
  run_pass!(optimize_c_calls);
<span class="gi">+ run_pass!(optimize_load_store);
</span>  run_pass!(fold_constants);
  run_pass!(clean_cfg);
  run_pass!(remove_redundant_patch_points);
  run_pass!(eliminate_dead_code);
</code></pre></div></div>

<h2 id="overview">Overview</h2>
<p>Ruby is an object-oriented programming language, so CRuby needs to have some
notion of object loads, modifications, and stores. In fact, this is a topic
already covered by another Rails at Scale <a href="https://railsatscale.com/2023-10-24-memoization-pattern-and-object-shapes/">blog post</a>. The shape
system provides performance improvements in CRuby (both interpreter and JIT),
but there is still plenty of opportunity to improve JIT performance. Sometimes
optimizing interpreter opcodes one at a time leaves repeated loads or stores
that can be cleaned up with a program analysis optimization pass. Before getting
into the weeds about this pass, let’s talk performance.</p>

<h3 id="results">Results</h3>
<p>The <code class="language-plaintext highlighter-rouge">setivar</code> <a href="https://rubybench.github.io/benchmarks/ruby-bench.html#setivar">benchmark</a> for ZJIT changes dramatically on
2026-03-06. This is when load-store optimization landed in ZJIT. At the time of
this writing, ZJIT takes an average of <code class="language-plaintext highlighter-rouge">2ms</code> per iteration on this benchmark,
while YJIT takes an average of <code class="language-plaintext highlighter-rouge">5ms</code>.</p>

<figure><img src="benchmark.png" alt='This graph shows ZJIT (yellow) and YJIT (green) as "times faster than interpreter" (blue). You can see the moment where load-store optimization is implemented and ZJIT overtakes YJIT.'><figcaption>This graph shows ZJIT (yellow) and YJIT (green) as "times faster than interpreter" (blue). You can see the moment where load-store optimization is implemented and ZJIT overtakes YJIT.</figcaption></figure>

<p>This is the second time that ZJIT has clearly surpassed YJIT. The first example
is <a href="https://rubybench.github.io/benchmarks/ruby-bench.html#object-new">here</a>.</p>

<p>At a high level, this means that ZJIT is over twice as fast as YJIT for repeated
instance variable assignment, and more than <strong>25 times</strong> faster than the
interpreter!</p>
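
<p>For a feel of what this benchmark stresses, the hot loop is repeated instance
variable writes. The snippet below is only a sketch of that shape, not the actual
<code class="language-plaintext highlighter-rouge">setivar</code> benchmark source.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Illustrative only: a loop dominated by instance variable assignment,
# roughly the shape of code the setivar microbenchmark exercises.
class Counter
  def store_loop(n)
    i = 0
    while i &lt; n
      @value = i
      i += 1
    end
    @value
  end
end

Counter.new.store_loop(1_000_000)
</code></pre></div></div>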

<h3 id="a-troubling-development">A Troubling Development</h3>
<p>However, there’s an important question we have to address - why should an
optimization pass for object loads and stores have anything to do with instance
variable assignment? It turns out that ZJIT’s High-level Intermediate Representation
(HIR) uses <code class="language-plaintext highlighter-rouge">LoadField</code> and <code class="language-plaintext highlighter-rouge">StoreField</code> instructions both for object
instance variables and for object shapes. We’re going to have to dig deeper
into CRuby shapes and ZJIT HIR internals in order to make sense of this.</p>

<h3 id="background">Background</h3>
<p>So far, we’ve learned that HIR has <code class="language-plaintext highlighter-rouge">LoadField</code> and <code class="language-plaintext highlighter-rouge">StoreField</code> instructions.
We’ve claimed that they are multi-purpose and that the performance wins come
from optimizing object shapes, but that they can also apply to object instance
variables. Because the algorithm works just as well for both situations, the
rest of this post will focus on object instance variables. This allows us to
demonstrate concepts in pure Ruby to make things more approachable.</p>

<h4 id="example">Example</h4>
<p>Let’s start with a simple example we can all agree on. Clearly this code
snippet has a double store, and we can safely remove one of the <code class="language-plaintext highlighter-rouge">@a = value</code>
calls.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">C</span>
  <span class="k">def</span> <span class="nf">initialize</span>
    <span class="n">value</span> <span class="o">=</span> <span class="mi">1</span>
    <span class="vi">@a</span> <span class="o">=</span> <span class="n">value</span>
    <span class="vi">@a</span> <span class="o">=</span> <span class="n">value</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Here’s the same code snippet showing the assignment we remove. In HIR terms, we
have elided a redundant <code class="language-plaintext highlighter-rouge">StoreField</code> instruction.</p>

<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  class C
    def initialize
      value = 1
      @a = value
<span class="gd">-     @a = value
</span>    end
  end
</code></pre></div></div>

<p>When should we remove <code class="language-plaintext highlighter-rouge">LoadField</code> and <code class="language-plaintext highlighter-rouge">StoreField</code> instructions? The HIR code
snippets will come later. For now, we only need to know the mapping between Ruby
and HIR for instance variable loads and stores.</p>

<table>
  <thead>
    <tr>
      <th>Ruby</th>
      <th>HIR</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">@var = value</code></td>
      <td><code class="language-plaintext highlighter-rouge">StoreField var, @obj@offset, value</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">@var</code></td>
      <td><code class="language-plaintext highlighter-rouge">LoadField var, @obj@offset</code></td>
    </tr>
  </tbody>
</table>

<blockquote>
  <p>Note: In a class’s <code class="language-plaintext highlighter-rouge">initialize</code> method, instance variable operations are
likely to cause <code class="language-plaintext highlighter-rouge">LoadField</code> and <code class="language-plaintext highlighter-rouge">StoreField</code> instructions due to shape
transitions. Outside of an initialize method, the loads and stores are more
likely to be related to the instance variables themselves. More complicated
Ruby code snippets would make clear which kind of <code class="language-plaintext highlighter-rouge">LoadField</code> or
<code class="language-plaintext highlighter-rouge">StoreField</code> is involved, but they would overly clutter this post.</p>
</blockquote>

<h4 id="cases">Cases</h4>
<p>Let’s consider every edge case for our algorithm through short Ruby snippets
to illustrate scenarios where we can and cannot elide <code class="language-plaintext highlighter-rouge">LoadField</code> or
<code class="language-plaintext highlighter-rouge">StoreField</code> HIR instructions.</p>

<blockquote>
  <p>Note: The following examples could replace the <code class="language-plaintext highlighter-rouge">value</code> variable with the
constant <code class="language-plaintext highlighter-rouge">1</code>, but in ZJIT this could cause other optimizations such as
constant folding to interfere with our load-store demonstrations. We will use
these more complex code snippets in case the reader wants to follow along with
<a href="http://tryzjit.fly.dev/">a compiler explorer</a>.</p>
</blockquote>

<h5 id="redundant-store">Redundant Store</h5>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">C</span>
  <span class="k">def</span> <span class="nf">initialize</span>
    <span class="n">value</span> <span class="o">=</span> <span class="mi">1</span>
    <span class="vi">@a</span> <span class="o">=</span> <span class="n">value</span>
    <span class="c1"># This store is redundant and should be elided in HIR</span>
    <span class="vi">@a</span> <span class="o">=</span> <span class="n">value</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<h5 id="redundant-load">Redundant Load</h5>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">C</span>
  <span class="k">def</span> <span class="nf">initialize</span>
    <span class="n">value</span> <span class="o">=</span> <span class="mi">1</span>
    <span class="vi">@a</span> <span class="o">=</span> <span class="n">value</span>
    <span class="c1"># We already know that this load is `value` and should be replaced</span>
    <span class="vi">@a</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<h5 id="redundant-store-with-aliasing">Redundant Store with Aliasing</h5>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">C</span>
  <span class="nb">attr_accessor</span> <span class="ss">:a</span>

  <span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
    <span class="vi">@a</span> <span class="o">=</span> <span class="n">value</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="k">class</span> <span class="nc">D</span>
  <span class="nb">attr_accessor</span> <span class="ss">:a</span>

  <span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
    <span class="vi">@a</span> <span class="o">=</span> <span class="n">value</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="k">def</span> <span class="nf">multi_object_test</span>
  <span class="n">x</span> <span class="o">=</span> <span class="no">C</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
  <span class="n">y</span> <span class="o">=</span> <span class="no">D</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
  <span class="n">new_x_val</span> <span class="o">=</span> <span class="mi">2</span>
  <span class="n">new_y_val</span> <span class="o">=</span> <span class="mi">3</span>
  <span class="n">x</span><span class="p">.</span><span class="nf">a</span> <span class="o">=</span> <span class="n">new_x_val</span>
  <span class="n">y</span><span class="p">.</span><span class="nf">a</span> <span class="o">=</span> <span class="n">new_y_val</span>
  <span class="c1"># We would like to elide this (but currently do not)</span>
  <span class="n">x</span><span class="p">.</span><span class="nf">a</span> <span class="o">=</span> <span class="n">new_x_val</span>
<span class="k">end</span>
</code></pre></div></div>
<p>With variables pointing to distinct objects, we could elide the second store to
object <code class="language-plaintext highlighter-rouge">x</code>. This is not currently implemented, but is a possible improvement
with a technique called <a href="https://bernsteinbear.com/blog/toy-tbaa/">type-based alias analysis</a>.</p>

<h5 id="required-store-with-aliasing">Required Store with Aliasing</h5>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">C</span>
  <span class="nb">attr_accessor</span> <span class="ss">:a</span>

  <span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
    <span class="vi">@a</span> <span class="o">=</span> <span class="n">value</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="k">def</span> <span class="nf">multi_object_test</span>
  <span class="n">x</span> <span class="o">=</span> <span class="no">C</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
  <span class="n">y</span> <span class="o">=</span> <span class="n">x</span>
  <span class="n">new_x_val</span> <span class="o">=</span> <span class="mi">2</span>
  <span class="n">new_y_val</span> <span class="o">=</span> <span class="mi">3</span>
  <span class="n">x</span><span class="p">.</span><span class="nf">a</span> <span class="o">=</span> <span class="n">new_x_val</span>
  <span class="n">y</span><span class="p">.</span><span class="nf">a</span> <span class="o">=</span> <span class="n">new_y_val</span>
  <span class="c1"># We should not elide the second `x.a` assignment because the `y.a` assignment modifies `x`</span>
  <span class="c1"># The `x.a` store after this comment is no longer redundant</span>
  <span class="n">x</span><span class="p">.</span><span class="nf">a</span> <span class="o">=</span> <span class="n">new_x_val</span>
<span class="k">end</span>
</code></pre></div></div>
<p>With multiple variables aliasing the same object, we cannot elide
the second store to <code class="language-plaintext highlighter-rouge">x</code>. While technically we could elide <code class="language-plaintext highlighter-rouge">y.a = new_y_val</code> and
the initial <code class="language-plaintext highlighter-rouge">y = x</code> assignment, these improvements are out of scope for this
post. The key point here is that aliasing needs to be considered. If we assume
that <code class="language-plaintext highlighter-rouge">y</code> and <code class="language-plaintext highlighter-rouge">x</code> reference different objects and elide the second
<code class="language-plaintext highlighter-rouge">x.a = new_x_val</code> call, we alter program behavior.</p>

<h5 id="required-store-with-effects">Required Store with Effects</h5>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">scary_method</span><span class="p">(</span><span class="n">obj</span><span class="p">)</span>
  <span class="n">obj</span><span class="p">.</span><span class="nf">a</span> <span class="o">=</span> <span class="s2">"We have modified the object. The second store is no longer redundant"</span>
<span class="k">end</span>

<span class="k">class</span> <span class="nc">C</span>
  <span class="nb">attr_accessor</span> <span class="ss">:a</span>

  <span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
    <span class="vi">@a</span> <span class="o">=</span> <span class="n">value</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="k">def</span> <span class="nf">effectful_operations_between_stores_test</span>
  <span class="n">x</span> <span class="o">=</span> <span class="no">C</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
  <span class="n">x</span><span class="p">.</span><span class="nf">a</span> <span class="o">=</span> <span class="mi">5</span>
  <span class="n">scary_method</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
  <span class="c1"># We want to elide this but `scary_method` can modify `x`</span>
  <span class="n">x</span><span class="p">.</span><span class="nf">a</span> <span class="o">=</span> <span class="mi">5</span>
<span class="k">end</span>
</code></pre></div></div>
<p>In this case, the second store looks redundant, but it might not be. An
arbitrary Ruby method (or C call, or some HIR instructions) could modify the <code class="language-plaintext highlighter-rouge">x</code>
object, breaking the assumptions we can make about its state. In such cases, we
cannot perform load-store optimization.</p>

<h3 id="the-algorithm">The Algorithm</h3>

<h4 id="key-idea">Key Idea</h4>
<p>With these cases, we have covered everything needed to implement our load-store
optimization algorithm. The algorithm is a lightweight
<a href="https://en.wikipedia.org/wiki/Abstract_interpretation">abstract interpretation</a> over objects. This approach allows us to
minimize the computation required to perform our optimization pass while
ensuring soundness. In layperson’s terms, this means that every load we replace
and every store we eliminate will not change program behavior, but that we will
potentially miss some loads or stores that could be eliminated.</p>

<h4 id="tricky-details">Tricky Details</h4>

<h5 id="basic-blocks">Basic Blocks</h5>
<p>Our load-store optimization pass scans through basic blocks, searches for
redundant loads and stores, and updates the HIR instructions accordingly.
Unnecessary <code class="language-plaintext highlighter-rouge">StoreField</code> operations are elided, and unnecessary <code class="language-plaintext highlighter-rouge">LoadField</code>
operations are replaced with the instruction already holding the value. While
one key benefit of ZJIT is that it can optimize entire methods, load-store
optimization is (for now) block-local only.</p>
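<p>For example (an illustrative case in the spirit of the snippets above, not one of the benchmark programs), a branch splits a method into several basic blocks, so a load whose matching store lives in an earlier block is not (yet) elided:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>class C
  attr_accessor :a

  def initialize(value, flag)
    @a = value
    # The branch below ends the current basic block.
    puts "flag was set" if flag
    # This load sits in a later block, so the block-local pass cannot see the
    # store above and leaves the LoadField in place.
    @a
  end
end
</code></pre></div></div>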

<h5 id="loadfield-and-storefield-distinctions">LoadField and StoreField Distinctions</h5>
<p>So far, we’ve talked about elision and instruction removal. We can get away with
deleting <code class="language-plaintext highlighter-rouge">StoreField</code> instructions because no other instructions point to
<code class="language-plaintext highlighter-rouge">StoreField</code> instructions. Conversely, <code class="language-plaintext highlighter-rouge">LoadField</code> instructions <em>do</em> have
dependencies and are referenced by other instructions. These references need to
be fixed up. Each reference to <code class="language-plaintext highlighter-rouge">LoadField</code> gets replaced with the cached value
that was the target of a load.</p>

<h5 id="the-writebarrier-instruction">The WriteBarrier Instruction</h5>
<p>ZJIT has <code class="language-plaintext highlighter-rouge">WriteBarrier</code> instructions to support garbage collection. These also
can modify objects and act similarly to stores. We need to handle this case in
our algorithm.</p>

<h5 id="pointer-intricacies">Pointer Intricacies</h5>
<p>The pseudo-code we are about to introduce uses the term “offset” to denote the
number of bytes from the object’s base address in memory. We use this to
detect redundant loads and stores, as well as to clear the cache around effectful
instructions and write barriers. However, it is not immediately obvious that
simply checking offsets would be enough. How can we be sure that the memory
regions we are tracking remain untouched by some other instruction? Fortunately,
HIR instructions <em>always</em> point to the base of an object and use offsets that
are in bounds of the object. If we have two offsets that are not equal, they
cannot reference the same region of memory. If the offsets are equal, then
object aliasing must be considered.</p>
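<p>As a concrete illustration (again in the style of the earlier examples), two different instance variables live at different offsets from the same base pointer, so a store to one can never clobber a cached value for the other:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>class C
  attr_accessor :a, :b

  def initialize(value)
    @a = value
    # @b lives at a different offset than @a, so this store cannot alias it...
    @b = 2
    # ...and this load is still replaceable with `value`.
    @a
  end
end
</code></pre></div></div>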

<h4 id="algorithm-sketch">Algorithm Sketch</h4>
<p>Here’s the pseudo-code for a given basic block.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>For each HIR instruction in the basic block
    initialize an empty cache as a hashmap
    
    if instruction is `LoadField`
        check if the object, offset, and value triple is in the cache
        if so, delete instruction and replace references to it with the loaded value
        else, cache the loaded value with the object, offset pair as a key
        
    if instruction is `StoreField`
        check if the object, offset, and value triple is in the cache
        if so, delete the instruction
        else, remove each cache entry with the same offset (the flags field) to avoid aliasing issues
        
    if instruction is `WriteBarrier`
        # This instruction is needed for the garbage collector and is complex
        # It works similarly to `StoreField` in practice
        # This instruction is never removed but the cache cleaning is still needed
        remove each cache entry with the same offset to avoid aliasing issues
        
    if instruction can modify objects
        flush the cache
        
    else
        continue
          
return the pruned HIR instructions
</code></pre></div></div>
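<p>To make the sketch concrete, here is a toy Ruby model of the per-block cache. This is purely illustrative: the real pass is written in Rust and works on HIR, and the instruction structs and names below are invented for the example.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical instruction shapes, just for this sketch.
Load  = Struct.new(:id, :object, :offset)
Store = Struct.new(:id, :object, :offset, :value)

def optimize_block(insns)
  cache = {}          # (object, offset) pair =&gt; id of the value it holds
  replacements = {}   # deleted load id =&gt; cached value id
  kept = []

  insns.each do |insn|
    case insn
    when Load
      key = [insn.object, insn.offset]
      if cache.key?(key)
        replacements[insn.id] = cache[key]   # redundant load: reuse the cached value
      else
        cache[key] = insn.id                 # remember what this load produced
        kept &lt;&lt; insn
      end
    when Store
      key = [insn.object, insn.offset]
      next if cache[key] == insn.value       # redundant store: drop it
      # A store at this offset may alias fields of other objects at the same
      # offset, so forget every entry that shares the offset.
      cache.delete_if { |(_obj, off), _val| off == insn.offset }
      cache[key] = insn.value
      kept &lt;&lt; insn
    else
      cache.clear                            # anything effectful: flush the cache
      kept &lt;&lt; insn
    end
  end

  [kept, replacements]
end
</code></pre></div></div>

<p>Running this over a list of such toy instructions returns the pruned list plus the substitutions a caller would apply to each deleted load’s users.</p>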

<h4 id="source-code">Source Code</h4>
<p>The source at the time of this writing can be found <a href="https://github.com/ruby/ruby/blob/a47827c854fe94b2a582e994c0ea2ff239439267/zjit/src/hir.rs#L4952">here</a>.</p>

<h3 id="hir-improvements">HIR Improvements</h3>
<p>After the optimization, here are examples of how the HIR changes.</p>

<p>This is the new HIR for our first redundant load example.</p>

<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  fn initialize@../scripts/double_load.rb:3:
  bb1():
    EntryPoint interpreter
    v1:BasicObject = LoadSelf
    v2:NilClass = Const Value(nil)
    Jump bb3(v1, v2)
  bb2():
    EntryPoint JIT(0)
    v5:BasicObject = LoadArg :self@0
    v6:NilClass = Const Value(nil)
    Jump bb3(v5, v6)
  bb3(v8:BasicObject, v9:NilClass):
    v13:Fixnum[1] = Const Value(1)
    PatchPoint SingleRactorMode
    v30:HeapBasicObject = GuardType v8, HeapBasicObject
    v31:CShape = LoadField v30, :_shape_id@0x4
    v32:CShape[0x80000] = GuardBitEquals v31, CShape(0x80000)
    StoreField v30, :@a@0x10, v13
    WriteBarrier v30, v13
    v35:CShape[0x80008] = Const CShape(0x80008)
    StoreField v30, :_shape_id@0x4, v35
<span class="gd">-   v20:HeapBasicObject = RefineType v8, HeapBasicObject
</span>    PatchPoint SingleRactorMode
<span class="gd">-   v38:CShape = LoadField v20, :_shape_id@0x4
-   v39:CShape[0x80008] = GuardBitEquals v38, CShape(0x80008)
-   v40:BasicObject = LoadField v20, :@a@0x10
</span>    CheckInterrupts
<span class="gd">-   Return v40
</span><span class="gi">+   Return v13
</span></code></pre></div></div>
<p>This is the new HIR for our first redundant store example.</p>

<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">bb1():
</span>  EntryPoint interpreter
  v1:BasicObject = LoadSelf
  v2:NilClass = Const Value(nil)
  Jump bb3(v1, v2)
<span class="p">bb2():
</span>  EntryPoint JIT(0)
  v5:BasicObject = LoadArg :self@0
  v6:NilClass = Const Value(nil)
  Jump bb3(v5, v6)
<span class="p">bb3(v8:BasicObject, v9:NilClass):
</span>  v13:Fixnum[1] = Const Value(1)
  PatchPoint SingleRactorMode
  v35:HeapBasicObject = GuardType v8, HeapBasicObject
  v36:CShape = LoadField v35, :_shape_id@0x4
  v37:CShape[0x80000] = GuardBitEquals v36, CShape(0x80000)
  StoreField v35, :@a@0x10, v13
  WriteBarrier v35, v13
  v40:CShape[0x80008] = Const CShape(0x80008)
  StoreField v35, :_shape_id@0x4, v40
  v20:HeapBasicObject = RefineType v8, HeapBasicObject
  PatchPoint NoEPEscape(initialize)
  PatchPoint SingleRactorMode
<span class="gd">- v43:CShape = LoadField v20, :_shape_id@0x4
- v44:CShape[0x80008] = GuardBitEquals v43, CShape(0x80008)
- StoreField v20, :@a@0x10, v13
</span>  WriteBarrier v20, v13
  CheckInterrupts
  Return v13
</code></pre></div></div>

<p>And that’s load-store optimization!</p>

<h3 id="design-discussion">Design Discussion</h3>
<p>You may notice that our optimization is pruning the graph of loads and stores
on an object. We are solving a very similar problem to the SSA form baked into
the HIR. While it would be great to have “more SSA” at the object level, this
comes at a cost. Computing SSA at this level could necessitate structural
changes to HIR and make things less ergonomic or more confusing in regions of
the codebase outside of load-store optimization. In fact, this question of “more
SSA” is a complex design decision and contentious topic with a
<a href="https://en.wikipedia.org/wiki/Sea_of_nodes">rich</a> <a href="https://www.jikesrvm.org/JavaDoc/org/jikesrvm/compilers/opt/ssa/SSA.html">history</a> in compilers such as V8 or Jikes
RVM. So far, we’ve decided to use a lightweight SSA representation in ZJIT: it
makes us work a bit harder in certain optimization passes, but in exchange it
keeps the design simpler across the rest of HIR.</p>

<h2 id="future-work">Future Work</h2>

<p>There’s still a lot of exciting work to be done and there are improvements to
be made before we hit diminishing returns. Dead store elimination utilizes many
of the same ideas and could help improve object initialization performance. We
could implement <a href="https://bernsteinbear.com/blog/toy-tbaa/">type-based alias analysis</a>, though this
requires care, as <a href="https://phrack.org/issues/70/9#article">type confusion bugs</a> are quite
dangerous in JIT compilers. See section 4.1 in the Phrack article for further
details.</p>
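<p>For instance, a dead store elimination pass built on the same ideas could drop the first assignment below, because it is overwritten before anything can observe it (a hypothetical example; this is not implemented yet):</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>class C
  def initialize
    @a = 0   # dead store: overwritten before any read or method call can see it
    @a = 1
  end
end
</code></pre></div></div>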

<h2 id="conclusion">Conclusion</h2>
<p>Thanks for reading the first post about ZJIT’s optimizer. We have lots more to
come, so stay tuned.</p>
</body></html>]]></content><author><name>Jacob Denbeaux</name></author><category term="posts" /><category term="2026-03-18-how-zjit-removes-redundant-object-loads-and-stores" /><summary type="html"><![CDATA[ZJIT's optimizer now removes redundant object loads and stores, improving JIT performance of CRuby's shape system. This post explains how the optimization works.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://railsatscale.com/2026-03-18-how-zjit-removes-redundant-object-loads-and-stores/a8af7ce28b60c651dd883d4404993de7f3eb7c3d.png" /><media:content medium="image" url="https://railsatscale.com/2026-03-18-how-zjit-removes-redundant-object-loads-and-stores/a8af7ce28b60c651dd883d4404993de7f3eb7c3d.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Faster bundler</title><link href="https://railsatscale.com/2026-03-09-faster-bundler/" rel="alternate" type="text/html" title="Faster bundler" /><published>2026-03-09T00:00:00+00:00</published><updated>2026-03-09T00:00:00+00:00</updated><id>https://railsatscale.com/2026-03-09-faster-bundler/</id><content type="html" xml:base="https://railsatscale.com/2026-03-09-faster-bundler/"><![CDATA[<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<p>At Shopify, we want our development environments to be fast.
Installing dependencies is slow, especially in an application as large as Shopify. <code class="language-plaintext highlighter-rouge">bun</code> and <code class="language-plaintext highlighter-rouge">uv</code> have dramatically
improved install times for TypeScript and Python dependencies. What if we could do the same for Bundler and the Ruby community?</p>

<p>Our team at Shopify has been working on a series of improvements to Bundler and RubyGems.
Bundler <strong>downloads gems up to 200% faster. Cloning git gems is now 3x faster</strong> in our monolith.</p>

<p>We were also able to <strong>decrease the overall <code class="language-plaintext highlighter-rouge">bundle install</code> time by 3.5x</strong> in one of our applications
by precompiling gems thanks to <a href="https://github.com/shopify/cibuildgem">cibuildgem</a>, a new
precompilation toolchain we’d love you to try!</p>

<p>Here’s an overview of the improvements we’ve made in the last few months:</p>

<h2 id="faster-gem-downloads">Faster gem downloads</h2>

<p>One impactful change was deceptively simple. Bundler’s HTTP fetcher had a connection pool size of 1.
This meant that during parallel gem installation, every thread was fighting over a single HTTP connection.</p>

<figure><img src="connection-pool.png" alt="A profile that shows the threads waiting for the only connection to be available"><figcaption>A profile that shows the threads waiting for the only connection to be available</figcaption></figure>
<blockquote>
  <p>Pink spikes are threads waiting for the connection to be available.</p>
</blockquote>

<p>By increasing the pool of HTTP connections, Bundler can download more gems in parallel.
The speed gain with this change is even more dramatic during peak hours when RubyGems.org is under heavy load,
or when you are geographically far from the CDN, where latency amplifies the cost of waiting on a single connection.</p>
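<p>The concept is what the <code class="language-plaintext highlighter-rouge">connection_pool</code> gem provides generically. Here is a minimal sketch of parallel downloads sharing a pool (illustrative only, not Bundler’s actual code; the host and gem paths are made up):</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>require "net/http"
require "connection_pool"

# With a pool of 5 connections, several download threads can make progress
# at once instead of queueing behind a single connection.
POOL = ConnectionPool.new(size: 5, timeout: 5) do
  Net::HTTP.start("rubygems.org", 443, use_ssl: true)
end

gem_paths = ["/gems/rake-13.3.0.gem", "/gems/rack-3.2.3.gem"]  # placeholders

threads = gem_paths.map do |path|
  Thread.new do
    POOL.with { |http| http.get(path).body }
  end
end
threads.each(&amp;:join)
</code></pre></div></div>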

<p>To benchmark this change, we opted to only measure download and extraction time (no compilation of native extensions)
and built a local gem server where we can control latency at will.</p>

<p>This is the result in a freshly generated Rails application when all gems are served with a 100ms latency.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Scenario: rails (164 gems)
                             Cold     +/-                        Warm     +/-
  ------------------------------------------------------------------------------
  5 HTTP connections       5.86s   0.10s  baseline             4.17s   0.02s  baseline
  1 HTTP connection       19.80s   0.02s  237.6% slower        4.16s   0.02s  0.3% faster
</code></pre></div></div>

<h2 id="hotspots-and-optimizations">Hotspots and optimizations</h2>

<p>We regularly profile Bundler with different Gemfiles and identify hotspots we might optimize.
While no single optimization is dramatic, their collective impact has been significant.</p>

<p>One such optimization involves gem installation. A <code class="language-plaintext highlighter-rouge">.gem</code> file is a compressed tarball — and gzip has a built-in
integrity check: if decompression succeeds, the content is guaranteed to be intact. Despite this, RubyGems was
walking every entry in the tarball and reading all bytes upfront as an explicit corruption check, before proceeding with
installation. This redundant verification step was thrown away entirely, since a successful decompression already
provides the same guarantee.</p>

<figure><img src="verify-gz.png" alt="A profile that shows the time spent verifying the tarball's content"><figcaption>A profile that shows the time spent verifying the tarball's content</figcaption></figure>
<blockquote>
  <p>9-17% of the time installing a gem is spent verifying the tarball’s content.</p>
</blockquote>
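<p>Gzip stores a CRC-32 of the uncompressed data in its trailer, and Ruby’s <code class="language-plaintext highlighter-rouge">Zlib::GzipReader</code> checks it as the stream is read to the end. A small sketch of the guarantee the change relies on:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>require "zlib"
require "stringio"

data = Zlib.gzip("gem contents")
data[-5] = (data[-5].ord ^ 0xFF).chr   # flip bits in the stored CRC-32

begin
  Zlib::GzipReader.new(StringIO.new(data)).read
rescue Zlib::Error =&gt; e
  # Decompression itself reports the corruption, so a separate upfront scan
  # of every tarball entry adds nothing.
  puts e.class
end
</code></pre></div></div>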

<p>Another hotspot during installation is the check RubyGems performs to determine whether a gem
includes a RubyGems plugin and, if so, whether its plugin file needs to be regenerated. The vast majority of gems
don’t include a RubyGems plugin, yet every gem pays the cost of a <code class="language-plaintext highlighter-rouge">Dir.glob</code> with an expensive pattern just to
handle the small minority that do.</p>

<p>It turns out that unconditionally regenerating the plugin file is faster than performing this upfront check.</p>

<figure><img src="verify-plugin.png" alt="A profile that shows how frequently Bundler is spending time checking whether checking whether regenerating a gem plugin is required"><figcaption>A profile that shows how frequently Bundler is spending time checking whether checking whether regenerating a gem plugin is required</figcaption></figure>
<blockquote>
  <p>Bundler checking whether regenerating a gem plugin is required</p>
</blockquote>

<h2 id="parallel-git-clones">Parallel git clones</h2>

<p>Many Rails applications depend on gems sourced directly from git repositories. This is particularly useful if a gem
has upstream changes that aren’t yet released. Previously, Bundler would fetch each git repository sequentially,
even though there’s no technical limitation on fetching them all at once.</p>
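<p>A minimal sketch of the idea (illustrative only, not Bundler’s code; a real implementation also needs to bound concurrency and handle failures):</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>repos = [
  "https://github.com/rails/rails.git",
  "https://github.com/rack/rack.git",
]

# Clone every git-sourced gem at once instead of one after another.
threads = repos.map do |url|
  Thread.new do
    system("git", "clone", "--quiet", url) or raise "clone failed: #{url}"
  end
end
threads.each(&amp;:join)
</code></pre></div></div>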

<p>Shopify’s Core Rails monolith includes 33 git gems. After introducing this change to parallelize <code class="language-plaintext highlighter-rouge">git clone</code>,
we saw a 3x performance improvement for fetching git gems.</p>

<table>
  <thead>
    <tr>
      <th> </th>
      <th>Bundler 2.7.2</th>
      <th>Bundler 4.0.7</th>
      <th>Performance improvement</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Fetching 33 git gems</td>
      <td>121.57s</td>
      <td>38.75s</td>
      <td>3.1x faster (68% less time)</td>
    </tr>
  </tbody>
</table>

<h2 id="native-extensions">Native extensions</h2>

<p>By far the biggest bottleneck when running <code class="language-plaintext highlighter-rouge">bundle install</code> is the compilation of native extensions.
Many gems in the Ruby ecosystem include C code that must be compiled on each developer’s machine when installed.
Common examples are <code class="language-plaintext highlighter-rouge">json</code>, <code class="language-plaintext highlighter-rouge">date</code>, and <code class="language-plaintext highlighter-rouge">bigdecimal</code>.
Even if your Gemfile doesn’t directly depend on native extensions, it’s likely they will be included in your
<code class="language-plaintext highlighter-rouge">Gemfile.lock</code> as transitive dependencies.</p>

<figure><img src="build-extension.png" alt="A profile that shows the time spent compiling a gem"><figcaption>A profile that shows the time spent compiling a gem</figcaption></figure>
<blockquote>
  <p>An installer thread spending 92% of the time compiling the gem.</p>
</blockquote>

<p>To illustrate how slow compilation is, we can run <code class="language-plaintext highlighter-rouge">bundle install</code> on a freshly generated Rails application.</p>

<table>
  <thead>
    <tr>
      <th> </th>
      <th> </th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Total number of gems</strong></td>
      <td>126</td>
    </tr>
    <tr>
      <td><strong>Gems with native extensions</strong></td>
      <td>18</td>
    </tr>
    <tr>
      <td>
<strong>Time to <code class="language-plaintext highlighter-rouge">bundle install</code></strong><sup id="fnref:1"><a href="#fn:1" class="footnote" rel="footnote" role="doc-noteref">1</a></sup>
</td>
      <td>~13 seconds</td>
    </tr>
    <tr>
      <td><strong>Time to <code class="language-plaintext highlighter-rouge">bundle install</code> (without compilation)</strong></td>
      <td>~2 seconds (15%)</td>
    </tr>
    <tr>
      <td><strong>Time to <code class="language-plaintext highlighter-rouge">bundle install</code> (only native extensions)</strong></td>
      <td>~11 seconds (85%)</td>
    </tr>
  </tbody>
</table>

<p>Installing the 18 native extension gems accounts for 85% of the time spent running <code class="language-plaintext highlighter-rouge">bundle install</code>.</p>

<h2 id="precompiled-gems">Precompiled gems</h2>

<p>Remember when Nokogiri used to take forever to install? Those days are behind us thanks to the amazing work of its
maintainer, Mike Dalessio. Mike updated the gem’s publishing pipeline to precompile its native extensions into
platform-specific binaries and releases separate gems for each supported platform (macOS, Windows, Linux). Now Nokogiri
installs as fast as pure Ruby gems.</p>

<p>Imagine if we extended this to the rest of the Ruby ecosystem. <strong>If the community works together</strong> to ship precompiled
binaries for our most popular native-extension gems, everyone will benefit from a lightning-fast <code class="language-plaintext highlighter-rouge">bundle install</code>.</p>

<p>One way to build binary gems is with the popular <a href="https://github.com/rake-compiler/rake-compiler-dock">Rake-compiler-dock</a>
toolchain, which provides a cross-compilation environment and allows compilation to run inside Docker containers.
However, cross-compiling can be brittle and presents hard-to-debug issues. Compiling on the target platform is
ultimately far more reliable.</p>

<p>Many CI providers now offer free access to cloud machines. GitHub Actions, for example, is widely popular, and the
Ruby community has built many easy-to-use actions around it (e.g., <code class="language-plaintext highlighter-rouge">ruby/setup-ruby</code>). Could we apply the same approach
and leverage those machines to natively compile binary gems?</p>

<h2 id="introducing-cibuildgem">Introducing cibuildgem</h2>

<p>At Shopify, we wanted to build an easy-to-use tool to help developers release gems with precompiled binaries using
a native compilation approach via GitHub Workflows.</p>

<p><a href="https://github.com/Shopify/cibuildgem">cibuildgem</a> lets you generate a standard GitHub Actions workflow.
Once triggered, multiple jobs run to:</p>

<ol>
  <li>Compile the binaries and package the gems</li>
  <li>Run a matrix of test suites</li>
  <li>Verify the <code class="language-plaintext highlighter-rouge">.gem</code> files are not corrupted and installable</li>
  <li>Release the gems to RubyGems.org</li>
</ol>

<figure><img src="cibuildgem.png" alt="A screenshot of the GitHub workflow when cibuildgem is triggered"><figcaption>A screenshot of the GitHub workflow when cibuildgem is triggered</figcaption></figure>
<blockquote>
  <p>Releasing a binary gem with cibuildgem</p>
</blockquote>

<p>We aimed to make cibuildgem easy and fast to set up. Since many gems with native extensions are already configured with
Rake Compiler for development compilation, we chose to piggyback on that so cibuildgem can run without any extra
configuration for most gems.</p>
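<p>For reference, the rake-compiler setup it piggybacks on is usually just a few lines in the gem’s Rakefile (the gem and extension names below are placeholders):</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Rakefile
require "rake/extensiontask"

Rake::ExtensionTask.new("my_gem") do |ext|
  ext.lib_dir = "lib/my_gem"   # where the compiled .so/.bundle ends up
end
</code></pre></div></div>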

<p>The workflow generated by cibuildgem is intentionally standard.</p>
<ul>
  <li>Want to compile your gem on Linux AArch64? Add it to the
matrix.</li>
  <li>Want to trigger the workflow automatically when pushing a new
git tag? No problem — tweak it to your liking.</li>
</ul>

<p>We also wanted to ensure that the binaries compiled by cibuildgem would work in a macOS development environment and a
Linux production environment on a real Rails application.</p>

<p>As an <strong>experiment</strong>, we used <a href="https://github.com/shopify/cibuildgem">cibuildgem</a> to compile dozens of open-source gems and
publish them under a “namespace” on RubyGems.org (e.g., <code class="language-plaintext highlighter-rouge">sassc</code> -&gt; <code class="language-plaintext highlighter-rouge">precompiled-sassc</code>).</p>

<p>The goal was to see how much performance improvement we could get with precompiled binaries. To test this, we created a
<a href="https://github.com/shopify/precompiled_gems">Bundler plugin</a> that hijacked the Bundler resolver to download the gems
with precompiled binaries we had just published. For example, it would force-install <code class="language-plaintext highlighter-rouge">precompiled-json</code> if the <code class="language-plaintext highlighter-rouge">json</code>
gem was requested anywhere in the dependency tree.</p>

<p>We tested this and deployed it on an internal application that includes 235 gems. By precompiling 17 of them,
we saw a 3.5x performance improvement.</p>

<table>
  <thead>
    <tr>
      <th> </th>
      <th>Without precompiled binaries</th>
      <th>With some precompiled binaries</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">bundle install</code></td>
      <td>24.2s</td>
      <td>7.0s (3.5x faster)<sup id="fnref:2"><a href="#fn:2" class="footnote" rel="footnote" role="doc-noteref">2</a></sup>
</td>
    </tr>
  </tbody>
</table>

<p>This experiment demonstrates how much faster <code class="language-plaintext highlighter-rouge">bundle install</code> could be when gems are precompiled.
It has also given us confidence that cibuildgem builds compatible binaries for macOS and Linux.</p>

<p>In fact, a few gems at Shopify are now released with precompiled binaries (<a href="https://rubygems.org/gems/stack_frames">stack_frames</a>,
<a href="https://rubygems.org/gems/heap-profiler">heap_profiler</a>, <a href="https://rubygems.org/gems/rubydex">rubydex</a>) thanks to
cibuildgem.</p>

<p>If you maintain a gem with a native extension, we’d love for you to <a href="https://github.com/shopify/cibuildgem">give it a try</a>
and share your feedback ❤️!</p>
<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1">
      <p>Network speed and computation power affects those results. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
    </li>
    <li id="fn:2">
      <p>5 gems are still being compiled, so we could decrease the install time even further. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
    </li>
  </ol>
</div>
</body></html>]]></content><author><name>[&quot;Edouard Chin&quot;, &quot;Eileen Alayce&quot;]</name></author><category term="posts" /><category term="2026-03-09-faster-bundler" /><summary type="html"><![CDATA[How Shopify contributed a series of improvements to Bundler and RubyGems to make gem installation significantly faster.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://railsatscale.com/2026-03-09-faster-bundler/5e11197b5102f0dbffc178dbfb80e6c5a543d578.png" /><media:content medium="image" url="https://railsatscale.com/2026-03-09-faster-bundler/5e11197b5102f0dbffc178dbfb80e6c5a543d578.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">ZJIT is now available in Ruby 4.0</title><link href="https://railsatscale.com/2025-12-24-launch-zjit/" rel="alternate" type="text/html" title="ZJIT is now available in Ruby 4.0" /><published>2025-12-24T00:00:00+00:00</published><updated>2025-12-24T00:00:00+00:00</updated><id>https://railsatscale.com/2025-12-24-launch-zjit/</id><content type="html" xml:base="https://railsatscale.com/2025-12-24-launch-zjit/"><![CDATA[<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<p>ZJIT is a new just-in-time (JIT) Ruby compiler built into the reference Ruby
implementation, <a href="https://en.wikipedia.org/wiki/YARV">YARV</a>, by the same compiler group that brought you YJIT.
We (Aaron Patterson, Aiden Fox Ivey, Alan Wu, Jacob Denbeaux, Kevin Menard, Max
Bernstein, Maxime Chevalier-Boisvert, Randy Stauner, Stan Lo, and Takashi
Kokubun) have been working on ZJIT since the beginning of this year.</p>

<p>In case you missed the last post, we’re building a new compiler for Ruby
because we want to both raise the performance ceiling (bigger compilation unit
size and SSA IR) and encourage more outside contribution (by becoming a more
traditional method compiler).</p>

<p>It’s been a long time since we gave an official update on ZJIT. Things are
going well. We’re excited to share our progress with you. We’ve done a lot
<a href="/2025-05-14-merge-zjit/">since May</a>.</p>

<h2 id="in-brief">In brief</h2>

<p>ZJIT is compiled by default—but not enabled by default—in Ruby 4.0. Enable
it by passing the <code class="language-plaintext highlighter-rouge">--zjit</code> flag or the <code class="language-plaintext highlighter-rouge">RUBY_ZJIT_ENABLE</code> environment variable
or calling <code class="language-plaintext highlighter-rouge">RubyVM::ZJIT.enable</code> after starting your application.</p>
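<p>Concretely, any of the following works (the same flag, environment variable, and method listed above):</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># From the command line:
#   ruby --zjit my_app.rb
#   RUBY_ZJIT_ENABLE=1 ruby my_app.rb
# Or from inside the running process, e.g. in an initializer:
RubyVM::ZJIT.enable
</code></pre></div></div>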

<p>It’s faster than the interpreter, but not yet as fast as YJIT. <strong>Yet.</strong> But we
have a plan, and we have some more specific numbers below. The TL;DR is we have
a great new foundation and now need to pull out all the Ruby-specific stops to
match YJIT.</p>

<p>We encourage you to experiment with ZJIT, but maybe hold off on deploying it in
production for now. This is a very new compiler. You should expect crashes and
wild performance degradations (or, perhaps, improvements). Please test locally,
try to run CI, etc, and let us know what you run into on <a href="https://bugs.ruby-lang.org/projects/ruby-master/issues?set_filter=1&amp;tracker_id=1">the Ruby issue
tracker</a> (or, if you don’t want to make a Ruby Bugs account, we would
also take reports <a href="https://github.com/Shopify/ruby/issues">on GitHub</a>).</p>

<h2 id="state-of-the-compiler">State of the compiler</h2>

<p>To underscore how much has happened since the <a href="/2025-05-14-merge-zjit/">announcement of being merged
into CRuby</a>, we present to you a series of comparisons:</p>

<h3 id="side-exits">Side-exits</h3>

<p>Back in May, we could not side-exit from JIT code into the interpreter. This
meant that the code we were running had to continue to have the same
preconditions (expected types, no method redefinitions, etc) or the JIT would
safely abort. <strong>Now,</strong> we can side-exit and use this feature liberally.</p>

<blockquote>
  <p>For example, we gracefully handle the phase transition from integer to string;
a guard instruction fails and transfers control to the interpreter.</p>

  <div class="language-ruby highlighter-rouge">
<div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">add</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span>
  <span class="n">x</span> <span class="o">+</span> <span class="n">y</span>
<span class="k">end</span>

<span class="n">add</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span>
<span class="n">add</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span>
<span class="n">add</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span>
<span class="n">add</span> <span class="s2">"three"</span><span class="p">,</span> <span class="s2">"four"</span>
</code></pre></div>  </div>
</blockquote>

<p>This enables running a lot more code!</p>

<h3 id="more-code">More code</h3>

<p>Back in May, we could only run a handful of small benchmarks. <strong>Now,</strong> we can
run all sorts of code, including passing the full Ruby test suite, the test
suite and shadow traffic of a large application at Shopify, and the test suite
of GitHub.com! Also a bank, apparently.</p>

<p>Back in May, we did not optimize much; we only really optimized operations
on fixnums (small integers) and method sends to the <code class="language-plaintext highlighter-rouge">main</code> object. <strong>Now,</strong>
we optimize a lot more: all sorts of method sends, instance variable reads
and writes, attribute accessor/reader/writer use, struct reads and writes,
object allocations, certain string operations, optional parameters, and more.</p>

<blockquote>
  <p>For example, we can <a href="https://en.wikipedia.org/wiki/Constant_folding">constant-fold</a> numeric operations. Because we also have a
(small, limited) inliner borrowed from YJIT, we can constant-fold the entirety
of <code class="language-plaintext highlighter-rouge">add</code> down to <code class="language-plaintext highlighter-rouge">3</code>—and still handle redefinitions of <code class="language-plaintext highlighter-rouge">one</code>, <code class="language-plaintext highlighter-rouge">two</code>,
<code class="language-plaintext highlighter-rouge">Integer#+</code>, …</p>

  <div class="language-ruby highlighter-rouge">
<div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">one</span>
  <span class="mi">1</span>
<span class="k">end</span>

<span class="k">def</span> <span class="nf">two</span>
  <span class="mi">2</span>
<span class="k">end</span>

<span class="k">def</span> <span class="nf">add</span>
  <span class="n">one</span> <span class="o">+</span> <span class="n">two</span>
<span class="k">end</span>
</code></pre></div>  </div>
</blockquote>

<h3 id="register-spilling">Register spilling</h3>

<p>Back in May, we could not compile many large functions due to limitations of
our backend that we borrowed from YJIT. <strong>Now,</strong> we can compile absolutely
enormous functions just fine. And quickly, too. Though we have not been
focusing specifically on compiler performance, we compile even large methods in
under a millisecond.</p>

<h3 id="c-methods">C methods</h3>

<p>Back in May, we could not even optimize calls to built-in C methods. <strong>Now,</strong>
we have a feature similar to JavaScriptCore’s DOMJIT, which allows us to emit
inline HIR versions of certain well-known C methods. This allows the optimizer
to reason about these methods and their effects (more on this in a future post)
much more… er, effectively.</p>

<blockquote>
  <p>For example, <code class="language-plaintext highlighter-rouge">Integer#succ</code>, which is defined as adding <code class="language-plaintext highlighter-rouge">1</code> to an integer, is a
C method. It’s used in <code class="language-plaintext highlighter-rouge">Integer#times</code> to drive the <code class="language-plaintext highlighter-rouge">while</code> loop. Instead of
emitting a call to it, our C method “inliner” can emit our existing <code class="language-plaintext highlighter-rouge">FixnumAdd</code>
instruction and take advantage of the rest of the type inference and
constant-folding.</p>

  <div class="language-rust highlighter-rouge">
<div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">inline_integer_succ</span><span class="p">(</span><span class="n">fun</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="nn">hir</span><span class="p">::</span><span class="n">Function</span><span class="p">,</span>
                       <span class="n">block</span><span class="p">:</span> <span class="nn">hir</span><span class="p">::</span><span class="n">BlockId</span><span class="p">,</span>
                       <span class="n">recv</span><span class="p">:</span> <span class="nn">hir</span><span class="p">::</span><span class="n">InsnId</span><span class="p">,</span>
                       <span class="n">args</span><span class="p">:</span> <span class="o">&amp;</span><span class="p">[</span><span class="nn">hir</span><span class="p">::</span><span class="n">InsnId</span><span class="p">],</span>
                       <span class="n">state</span><span class="p">:</span> <span class="nn">hir</span><span class="p">::</span><span class="n">InsnId</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Option</span><span class="o">&lt;</span><span class="nn">hir</span><span class="p">::</span><span class="n">InsnId</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="k">if</span> <span class="o">!</span><span class="n">args</span><span class="nf">.is_empty</span><span class="p">()</span> <span class="p">{</span> <span class="k">return</span> <span class="nb">None</span><span class="p">;</span> <span class="p">}</span>
    <span class="k">if</span> <span class="n">fun</span><span class="nf">.likely_a</span><span class="p">(</span><span class="n">recv</span><span class="p">,</span> <span class="nn">types</span><span class="p">::</span><span class="n">Fixnum</span><span class="p">,</span> <span class="n">state</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">let</span> <span class="n">left</span> <span class="o">=</span> <span class="n">fun</span><span class="nf">.coerce_to</span><span class="p">(</span><span class="n">block</span><span class="p">,</span> <span class="n">recv</span><span class="p">,</span> <span class="nn">types</span><span class="p">::</span><span class="n">Fixnum</span><span class="p">,</span> <span class="n">state</span><span class="p">);</span>
        <span class="k">let</span> <span class="n">right</span> <span class="o">=</span> <span class="n">fun</span><span class="nf">.push_insn</span><span class="p">(</span><span class="n">block</span><span class="p">,</span> <span class="nn">hir</span><span class="p">::</span><span class="nn">Insn</span><span class="p">::</span><span class="n">Const</span> <span class="p">{</span> <span class="n">val</span><span class="p">:</span> <span class="nn">hir</span><span class="p">::</span><span class="nn">Const</span><span class="p">::</span><span class="nf">Value</span><span class="p">(</span><span class="nn">VALUE</span><span class="p">::</span><span class="nf">fixnum_from_usize</span><span class="p">(</span><span class="mi">1</span><span class="p">))</span> <span class="p">});</span>
        <span class="k">let</span> <span class="n">result</span> <span class="o">=</span> <span class="n">fun</span><span class="nf">.push_insn</span><span class="p">(</span><span class="n">block</span><span class="p">,</span> <span class="nn">hir</span><span class="p">::</span><span class="nn">Insn</span><span class="p">::</span><span class="n">FixnumAdd</span> <span class="p">{</span> <span class="n">left</span><span class="p">,</span> <span class="n">right</span><span class="p">,</span> <span class="n">state</span> <span class="p">});</span>
        <span class="k">return</span> <span class="nf">Some</span><span class="p">(</span><span class="n">result</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="nb">None</span>
<span class="p">}</span>
</code></pre></div>  </div>
</blockquote>

<h3 id="fewer-c-calls">Fewer C calls</h3>

<p>Back in May, the machine code ZJIT generated called a lot of C functions from
the CRuby runtime to implement our HIR instructions in LIR. We have pared this
down significantly and now “open code” the implementations in LIR.</p>

<blockquote>
  <p>For example, <code class="language-plaintext highlighter-rouge">GuardNotFrozen</code> used to call out to <code class="language-plaintext highlighter-rouge">rb_obj_frozen_p</code>. Now, it
requires that its input is a heap-allocated object and can instead do a load, a
test, and a conditional jump.</p>

  <div class="language-rust highlighter-rouge">
<div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">gen_guard_not_frozen</span><span class="p">(</span><span class="n">jit</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">JITState</span><span class="p">,</span>
                        <span class="n">asm</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="n">Assembler</span><span class="p">,</span>
                        <span class="n">recv</span><span class="p">:</span> <span class="n">Opnd</span><span class="p">,</span>
                        <span class="n">state</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">FrameState</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="n">Opnd</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">recv</span> <span class="o">=</span> <span class="n">asm</span><span class="nf">.load</span><span class="p">(</span><span class="n">recv</span><span class="p">);</span>
    <span class="c1">// It's a heap object, so check the frozen flag</span>
    <span class="k">let</span> <span class="n">flags</span> <span class="o">=</span> <span class="n">asm</span><span class="nf">.load</span><span class="p">(</span><span class="nn">Opnd</span><span class="p">::</span><span class="nf">mem</span><span class="p">(</span><span class="mi">64</span><span class="p">,</span> <span class="n">recv</span><span class="p">,</span> <span class="n">RUBY_OFFSET_RBASIC_FLAGS</span><span class="p">));</span>
    <span class="n">asm</span><span class="nf">.test</span><span class="p">(</span><span class="n">flags</span><span class="p">,</span> <span class="p">(</span><span class="n">RUBY_FL_FREEZE</span> <span class="k">as</span> <span class="nb">u64</span><span class="p">)</span><span class="nf">.into</span><span class="p">());</span>
    <span class="c1">// Side-exit if frozen</span>
    <span class="n">asm</span><span class="nf">.jnz</span><span class="p">(</span><span class="nf">side_exit</span><span class="p">(</span><span class="n">jit</span><span class="p">,</span> <span class="n">state</span><span class="p">,</span> <span class="n">GuardNotFrozen</span><span class="p">));</span>
    <span class="n">recv</span>
<span class="p">}</span>
</code></pre></div>  </div>
</blockquote>

<h3 id="more-teammates">More teammates</h3>

<p>Back in May, we had four people working full-time on the compiler. <strong>Now,</strong> we
have more internally at Shopify—and also more from the community! We have
had several interested people reach out, learn about ZJIT, and successfully
land complex changes. For this reason, we have opened up <a href="https://zjit.zulipchat.com">a chat
room</a> to discuss and improve ZJIT.</p>

<h3 id="a-cool-graph-visualization-tool">A cool graph visualization tool</h3>

<p>You <em>have to</em> check out our intern Aiden’s <a href="/2025-11-19-adding-iongraph-support/">integration of Iongraph into
ZJIT</a>. Now we have clickable, zoomable,
scrollable graphs of all our functions and all our optimization passes. It’s
great!</p>

<p>Try zooming (Ctrl-scroll), clicking the different optimization passes on the
left, clicking the instruction IDs in each basic block (definitions and uses),
and seeing how the IR for the below Ruby code changes over time.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Point</span>
  <span class="nb">attr_accessor</span> <span class="ss">:x</span><span class="p">,</span> <span class="ss">:y</span>
  <span class="k">def</span> <span class="nf">initialize</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span>
    <span class="vi">@x</span> <span class="o">=</span> <span class="n">x</span>
    <span class="vi">@y</span> <span class="o">=</span> <span class="n">y</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="no">P</span> <span class="o">=</span> <span class="no">Point</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">).</span><span class="nf">freeze</span>

<span class="k">def</span> <span class="nf">test</span> <span class="o">=</span> <span class="no">P</span><span class="p">.</span><span class="nf">x</span> <span class="o">+</span> <span class="no">P</span><span class="p">.</span><span class="nf">y</span>
</code></pre></div></div>

<iframe title="Iongraph Viewer" aria-label="Interactive compiler graph visualization" src="viewer.html" width="100%" height="400"></iframe>

<h3 id="more">More</h3>

<p>…and so, so many garbage collection fixes.</p>

<p>There’s still a lot to do, though.</p>

<h2 id="to-do">To do</h2>

<p>We’re going to optimize <code class="language-plaintext highlighter-rouge">invokeblock</code> (<code class="language-plaintext highlighter-rouge">yield</code>) and <code class="language-plaintext highlighter-rouge">invokesuper</code> (<code class="language-plaintext highlighter-rouge">super</code>)
instructions, each of which behaves similarly, but not identically, to a
normal <code class="language-plaintext highlighter-rouge">send</code> instruction. These are pretty common.</p>

<p>We’re going to optimize <code class="language-plaintext highlighter-rouge">setinstancevariable</code> in the case where we have to
transition the object’s shape. This will help normal <code class="language-plaintext highlighter-rouge">@a = b</code> situations. It
will also help <code class="language-plaintext highlighter-rouge">@a ||= b</code>, but I think we can even do better with the latter
using some kind of value numbering.</p>

<p>We only optimize monomorphic calls right now—cases where a method send only
sees one class of receiver while being profiled. We’re going to optimize
polymorphic sends, too. Right now we’re laying the groundwork (a new register
allocator; see below) to make this much easier. It’s not as much of an
immediate focus, though, because most (high 80s, low 90s percent) of sends are
monomorphic.</p>
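<p>A small illustration of the distinction (the method and receivers are arbitrary):</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def shout(x) = x.to_s.upcase

# Monomorphic: the `to_s` call site inside `shout` only ever sees Integers.
10.times { |i| shout(i) }

# Polymorphic: the same call site now also sees Symbols and Floats.
[1, :two, 3.0].each { |v| shout(v) }
</code></pre></div></div>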

<p>We’re in the middle of re-writing the register allocator after reading the
entire history of linear scan papers and several implementations. That will
unlock performance improvements and also allow us to make the IRs easier to
use.</p>

<p>We don’t handle phase changes particularly well yet; if your method call
patterns change significantly after your code has been compiled, we will
frequently side-exit into the interpreter. Instead, we would like to use these
side-exits as additional profile information and re-compile the function.</p>

<p>Right now we have a lot of traffic to the VM frame. JIT frame pushes are
reasonably fast, but with every effectful operation, we have to flush our local
variable state and stack state to the VM frame. The instances in which code
might want to read this reified frame state are rare: frame unwinding due to
exceptions, <code class="language-plaintext highlighter-rouge">Binding#local_variable_get</code>, etc. In the future, we will instead
defer writing this state until it needs to be read.</p>
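<p>For example, reflective access like this is one of the rare cases where the reified frame genuinely has to be up to date:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def lookup
  secret = 42
  # Reading a local through the frame means the JIT's register and stack state
  # must have been flushed back to the VM frame before this call.
  binding.local_variable_get(:secret)
end
</code></pre></div></div>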

<p>We only have a limited inliner that inlines constants, <code class="language-plaintext highlighter-rouge">self</code>, and parameters.
In the fullness of time, we will add a general-purpose method inlining
facility. This will allow us to reduce the number of polymorphic sends, do some
branch folding, and cut down on the number of method sends overall.</p>

<p>We only support optimizing positional parameters, required keyword parameters,
and optional parameters right now but we will work on optimizing optional
keyword arguments as well. Most of this work is in marshaling the complex
Ruby calling convention into one coherent form that the JIT can understand.</p>

<h2 id="performance">Performance</h2>

<p>We have public performance numbers for a selection of macro- and
micro-benchmarks on <a href="https://rubybench.github.io/">rubybench</a>. Here is a screenshot of what those
per-benchmark graphs look like. The Y axis is speedup multiplier vs the
interpreter and the X axis is time. Higher is better:</p>

<figure><img src="benchmark.png" alt="A line chart of ZJIT performance on railsbench improving over time, passing
interpreter performance, catching up to YJIT"><figcaption>A line chart of ZJIT performance on railsbench improving over time, passing
interpreter performance, catching up to YJIT</figcaption></figure>

<p>You can see that we are improving performance on nearly all benchmarks over
time. Some of this comes from optimizing in a similar way as YJIT does
today (e.g. specializing ivar reads and writes), and some of it is optimizing
in a way that takes advantage of ZJIT’s high-level IR (e.g. constant folding,
branch folding, more precise type inference).</p>

<p>We are using both raw time numbers and also our internal performance counters
(e.g. number of calls to C functions from generated code) to drive
optimization.</p>

<h2 id="try-it-out">Try it out</h2>

<p>While Ruby now ships with ZJIT compiled into the binary by default, it is not
<em>enabled</em> by default at run-time. Because it is still faster and more battle-tested,
YJIT remains the default compiler choice in Ruby 4.0.</p>

<p>If you want to run your test suite with ZJIT to see what happens, you
absolutely can. Enable it by passing the <code class="language-plaintext highlighter-rouge">--zjit</code> flag or the
<code class="language-plaintext highlighter-rouge">RUBY_ZJIT_ENABLE</code> environment variable or calling <code class="language-plaintext highlighter-rouge">RubyVM::ZJIT.enable</code> after
starting your application.</p>

<h2 id="on-yjit">On YJIT</h2>

<p>We devoted a lot of our resources this year to developing ZJIT. While we did
not spend much time on YJIT (outside of a great <a href="/2025-05-21-fast-allocations-in-ruby-3-5/">allocation speed
up</a>), YJIT isn’t going anywhere soon.</p>

<h2 id="thank-you">Thank you</h2>

<p>This compiler was made possible by contributions to your <del>PBS station</del> open
source project from programmers like you. Thank you!</p>

<ul>
  <li>Aaron Patterson</li>
  <li>Abrar Habib</li>
  <li>Aiden Fox Ivey</li>
  <li>Alan Wu</li>
  <li>Alex Rocha</li>
  <li>André Luiz Tiago Soares</li>
  <li>Benoit Daloze</li>
  <li>Charlotte Wen</li>
  <li>Daniel Colson</li>
  <li>Donghee Na</li>
  <li>Eileen Uchitelle</li>
  <li>Étienne Barrié</li>
  <li>Godfrey Chan</li>
  <li>Goshanraj Govindaraj</li>
  <li>Hiroshi SHIBATA</li>
  <li>Hoa Nguyen</li>
  <li>Jacob Denbeaux</li>
  <li>Jean Boussier</li>
  <li>Jeremy Evans</li>
  <li>John Hawthorn</li>
  <li>Ken Jin</li>
  <li>Kevin Menard</li>
  <li>Max Bernstein</li>
  <li>Max Leopold</li>
  <li>Maxime Chevalier-Boisvert</li>
  <li>Nobuyoshi Nakada</li>
  <li>Peter Zhu</li>
  <li>Randy Stauner</li>
  <li>Satoshi Tagomori</li>
  <li>Shannon Skipper</li>
  <li>Stan Lo</li>
  <li>Takashi Kokubun</li>
  <li>Tavian Barnes</li>
  <li>Tobias Lütke</li>
</ul>

<p>(via a lightly touched up <code class="language-plaintext highlighter-rouge">git log --pretty="%an" zjit | sort -u</code>)</p>
</body></html>]]></content><author><name>[&quot;Max Bernstein&quot;]</name></author><category term="posts" /><category term="2025-12-24-launch-zjit" /><summary type="html"><![CDATA[ZJIT is now available with the release of Ruby 4.0. Here's an update of our progress.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://railsatscale.com/2025-12-24-launch-zjit/72bca59f7ad87072e57ded2ddc5c7ddc5f00ba46.png" /><media:content medium="image" url="https://railsatscale.com/2025-12-24-launch-zjit/72bca59f7ad87072e57ded2ddc5c7ddc5f00ba46.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Introducing Aliki: A Modern Theme for Ruby Documentation</title><link href="https://railsatscale.com/2025-12-22-introducing-aliki-a-modern-theme-for-ruby-documentation/" rel="alternate" type="text/html" title="Introducing Aliki: A Modern Theme for Ruby Documentation" /><published>2025-12-22T00:00:00+00:00</published><updated>2025-12-22T00:00:00+00:00</updated><id>https://railsatscale.com/2025-12-22-introducing-aliki-a-modern-theme-for-ruby-documentation/</id><content type="html" xml:base="https://railsatscale.com/2025-12-22-introducing-aliki-a-modern-theme-for-ruby-documentation/"><![CDATA[<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<p>Ruby has always been a joy to write. But for a long time, reading Ruby documentation on <a href="https://docs.ruby-lang.org">docs.ruby-lang.org</a> hasn’t really matched that experience.</p>

<p>Last year, I brought a <a href="https://st0012.dev/ruby-3-4-docs">new look to the Darkfish theme</a> by updating its visuals and improving mobile support. It was a visible improvement, but it wasn’t enough.</p>

<p>So this year, I built something new from the ground up. Starting with RDoc 7.0.0, Aliki is now the default theme for <a href="https://github.com/ruby/rdoc">RDoc</a>.</p>

<p>This release also coincides with Ruby’s 30th anniversary and the <a href="https://www.ruby-lang.org/en/news/2025/12/22/redesign-site-identity/">redesign of ruby-lang.org</a>—a great moment to give Ruby’s documentation a fresh look as we head into the next chapter with Ruby 4.0.</p>

<figure><img src="./desktop-class-light.png" alt="Screenshot of docs.ruby-lang.org with Aliki theme - desktop view" width="100%"><figcaption>Screenshot of docs.ruby-lang.org with Aliki theme - desktop view</figcaption></figure>

<h2 id="why-a-new-theme">Why a New Theme?</h2>

<p>Even after last year’s improvements, I still didn’t enjoy using docs.ruby-lang.org as much as I wanted. Every time I needed to look something up, the experience felt dated.</p>

<p>And it was difficult to further improve Darkfish because:</p>

<ul>
  <li>It lacks documentation, especially around the original design decisions</li>
  <li>Some of the patterns it uses were outdated</li>
  <li>Some third-party themes build on Darkfish, so updating it too much risks breaking them</li>
</ul>

<p>RDoc itself added more constraints: It can’t depend on any gem that doesn’t ship with Ruby itself, and it can’t run a modern JavaScript build pipeline.</p>

<p>RDoc was created in the pre-Node.js era and hasn’t evolved with frontend tooling. Adopting modern toolchains would raise the dependency requirements for everyone—Ruby’s documentation generation pipeline, gems like IRB and Reline, and so on.</p>

<p>So all JavaScript, CSS, and HTML had to be written directly—no frameworks, no build tools, no npm packages.</p>

<p>And honestly, given all the constraints, this project wouldn’t have been possible without AI coding agents.
Last year, just getting Darkfish’s code block styling right took me hours. It was a struggle for me to implement the look I wanted, and then to make it work with surrounding elements.
At that pace, building an entire new theme wasn’t realistic.</p>

<p>This year, however, I discovered that I could try three different UI styles in an hour with AI agents. So I decided to take on the impossible task.</p>

<p>My goal was simple: make docs.ruby-lang.org look modern and actually enjoyable to use.</p>

<p>I collected all the features I wished a documentation site would have, gathered feedback from Rubyists around me, cherry-picked the improvements the community added to Darkfish last year (SEO, search enhancements, etc.), and put them together into a new theme.</p>

<p>Okay, enough backstory. Let’s see what Aliki brings:</p>

<h2 id="search">Search</h2>

<p>The old search wasn’t intuitive. It supported fuzzy matching, but getting the sorting right was difficult—searching <code class="language-plaintext highlighter-rouge">Arr</code> never actually got you the <code class="language-plaintext highlighter-rouge">Array</code> class as the first result.</p>

<p>After a few patch-ups, it was still buggy, so I rewrote it with a new UI:</p>

<ul>
  <li>
    <figure><img src="./desktop-search-dropdown.png" alt="Aliki search dropdown on desktop showing type-aware ranking with classes, modules, methods, and constants" width="70%"><figcaption>Aliki search dropdown on desktop showing type-aware ranking with classes, modules, methods, and constants</figcaption></figure>
  </li>
  <li>
    <figure><img src="./mobile-search-dropdown.jpeg" alt="Aliki search dropdown on mobile showing full-screen search modal" width="50%"><figcaption>Aliki search dropdown on mobile showing full-screen search modal</figcaption></figure>
  </li>
</ul>

<p>Some notable new features/improvements:</p>

<ul>
  <li>
<strong>Case-aware ranking</strong>: If you search <code class="language-plaintext highlighter-rouge">parse</code> (lowercase), methods show up first. If you search <code class="language-plaintext highlighter-rouge">Parser</code> (capitalized), classes and modules come first.</li>
  <li>
<strong>Fuzzy matching</strong>: This existed before, but fuzzy results used to pollute the top of the list. Now a smarter ranking system makes sure exact and substring matches show up before fuzzy results (see the sketch after this list).</li>
  <li>
<strong>Constants included</strong>: You can now search for constants, along with classes, modules, and methods.</li>
  <li>
<strong>Type labels</strong>: Each result shows whether it’s a class, module, method, or constant.</li>
  <li>
<strong>Keyboard support</strong>: Did you know you can press <code class="language-plaintext highlighter-rouge">/</code> to focus the search bar? This existed in Darkfish too, but I thought it was worth mentioning.</li>
</ul>
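
<p>As a rough illustration of how this ranking behaves (this is not Aliki’s actual search code, which is written in plain JavaScript with no dependencies), here is a sketch in Ruby:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Illustrative sketch of case-aware, fuzzy-last ranking.
Entry = Struct.new(:name, :type) # type: :class, :module, :method, or :constant

# Subsequence match: every query character appears, in order, in the name.
def fuzzy_match?(query, name)
  idx = 0
  name.downcase.each_char do |c|
    idx += 1 if idx &lt; query.length &amp;&amp; c == query[idx].downcase
  end
  idx == query.length
end

def rank(query, entries)
  capitalized = query[0] == query[0].upcase

  scored = entries.filter_map do |entry|
    # Exact and substring matches always rank above fuzzy matches.
    match_score =
      if entry.name == query then 0
      elsif entry.name.downcase.start_with?(query.downcase) then 1
      elsif entry.name.downcase.include?(query.downcase) then 2
      elsif fuzzy_match?(query, entry.name) then 3
      end
    next unless match_score

    # Case-aware ranking: capitalized queries prefer classes and modules,
    # lowercase queries prefer methods.
    type_score =
      if capitalized
        [:class, :module].include?(entry.type) ? 0 : 1
      else
        entry.type == :method ? 0 : 1
      end

    [match_score, type_score, entry]
  end

  scored.sort_by { |match, type, _| [match, type] }.map(&amp;:last)
end

entries = [Entry.new("Array", :class), Entry.new("Arrangement#arrange", :method)]
rank("Arr", entries).first.name # =&gt; "Array"
</code></pre></div></div>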

<h2 id="dark-mode">Dark Mode</h2>

<p>Aliki has a light/dark toggle. It saves your preference and respects your OS dark mode setting by default.</p>

<figure><img src="./desktop-hash-class-dark.png" alt="Ruby Hash class documentation page in dark mode" width="100%"><figcaption>Ruby Hash class documentation page in dark mode</figcaption></figure>

<p><br></p>

<figure><img src="./desktop-hash-class-light.png" alt="Ruby Hash class documentation page in light mode" width="100%"><figcaption>Ruby Hash class documentation page in light mode</figcaption></figure>

<h2 id="layout">Layout</h2>

<p>The layout has three columns:</p>

<ul>
  <li>
<strong>Left sidebar</strong>: Navigation for pages, ancestors, methods, and class/module index</li>
  <li>
<strong>Center</strong>: The documentation content</li>
  <li>
<strong>Right sidebar</strong>: A table of contents generated from headings, with the current section highlighted as you scroll</li>
</ul>

<p>Sidebar sections can collapse. For example, when you’re on a class or module page, the pages section is automatically collapsed so you can focus on the relevant navigation, with page documents still accessible if you need them.</p>

<figure><img src="./desktop-collapsible-sidebar.gif" alt="Animated demonstration of collapsible sidebar sections in Aliki" width="100%"><figcaption>Animated demonstration of collapsible sidebar sections in Aliki</figcaption></figure>

<p>Speaking of pages, I also reorganized the pages list this year. It used to be a long, rather flat list—now pages are grouped and much easier to navigate. In the coming year, we’ll continue improving page documentation so it feels more like a coherent guide rather than a collection of loosely related pages.</p>

<p>On mobile, the layout is a single column with a hamburger menu and a full-screen search modal—same as before.</p>

<h2 id="code-features">Code Features</h2>

<p><strong>Code blocks now have copy buttons:</strong></p>

<figure><img src="./desktop-code-block.gif" alt="Animated demonstration of code block copy button in Aliki" width="100%"><figcaption>Animated demonstration of code block copy button in Aliki</figcaption></figure>

<p><strong>C code is now highlighted too:</strong></p>

<figure><img src="./desktop-c-highlight.png" alt="Screenshot of C syntax highlighting" width="100%"><figcaption>Screenshot of C syntax highlighting</figcaption></figure>

<h2 id="for-gem-documentation">For Gem Documentation</h2>

<p>Aliki works for any gem, not just Ruby core. If you generate documentation with RDoc 7.0+, your users will see this theme automatically.</p>

<p>You can also customize the footer links now. For example, in your <code class="language-plaintext highlighter-rouge">.rdoc_options</code>:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">footer_content</span><span class="pi">:</span>
  <span class="na">DOCUMENTATION</span><span class="pi">:</span>
    <span class="na">Home</span><span class="pi">:</span> <span class="s">index.html</span>
  <span class="na">RESOURCES</span><span class="pi">:</span>
    <span class="na">GitHub Repository</span><span class="pi">:</span> <span class="s">https://github.com/your/repo</span>
    <span class="na">Issue Tracker</span><span class="pi">:</span> <span class="s">https://github.com/your/repo/issues</span>
</code></pre></div></div>

<p>This is useful for linking to your gem’s repository, issue tracker, or other resources.</p>

<figure><img src="./desktop-footer.png" alt="Aliki footer showing customizable documentation and resource links" width="80%"><figcaption>Aliki footer showing customizable documentation and resource links</figcaption></figure>

<p>To keep using Darkfish in your project, you can switch back with:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">generator_name</span><span class="pi">:</span> <span class="s">darkfish</span>
</code></pre></div></div>

<h2 id="acknowledgments">Acknowledgments</h2>

<p>Thanks to <a href="https://github.com/tompng">@tompng</a> and <a href="https://github.com/earlopain">@earlopain</a> for reviewing the code and helping polish things up.</p>

<h2 id="try-it-out">Try It Out</h2>

<p>You can see Aliki at <a href="https://docs.ruby-lang.org/en/master/">docs.ruby-lang.org/en/master/</a> or <a href="https://ruby.github.io/rdoc/">ruby.github.io/rdoc/</a>.</p>

<p>If you find issues or have suggestions, <a href="https://github.com/ruby/rdoc/issues">open an issue</a> on GitHub.</p>

<h2 id="whats-next">What’s Next</h2>

<p>Now that reading docs is enjoyable again, the next step for RDoc is to make writing docs enjoyable too.</p>

<h2 id="about-the-name">About the Name</h2>

<p>Aliki is my cat. I’m not good at naming things, so I just named the new theme after her.</p>

<figure><img src="./aliki.jpg" alt="Photo of Aliki the cat" width="50%"><figcaption>Photo of Aliki the cat</figcaption></figure>
</body></html>]]></content><author><name>Stan Lo</name></author><category term="posts" /><category term="2025-12-22-introducing-aliki-a-modern-theme-for-ruby-documentation" /><summary type="html"><![CDATA[Ruby's documentation gets a fresh look. Starting with RDoc 7.0.0, Aliki is the new default theme—bringing dark mode, better search, and a modern layout to docs.ruby-lang.org and gem documentation.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://railsatscale.com/2025-12-22-introducing-aliki-a-modern-theme-for-ruby-documentation/bce1e97fa3e899489dc0871076d888a8a58e4e4a.png" /><media:content medium="image" url="https://railsatscale.com/2025-12-22-introducing-aliki-a-modern-theme-for-ruby-documentation/bce1e97fa3e899489dc0871076d888a8a58e4e4a.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Rails’s Swappable Migration Backend for Schema Changes at Scale</title><link href="https://railsatscale.com/2025-12-08-swappable-migration-backends-in-rails/" rel="alternate" type="text/html" title="Rails’s Swappable Migration Backend for Schema Changes at Scale" /><published>2025-12-08T00:00:00+00:00</published><updated>2025-12-08T00:00:00+00:00</updated><id>https://railsatscale.com/2025-12-08-swappable-migration-backends-in-rails/</id><content type="html" xml:base="https://railsatscale.com/2025-12-08-swappable-migration-backends-in-rails/"><![CDATA[<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<p>This post explores Rails’s swappable migration backend, a little-known feature that lets applications
customize how migrations run. At Shopify, we relied on monkey patches and a brittle SQL parser
to make Rails migrations work with our <em>Schema Migrations Service</em>. We developed the swappable
backend feature to more simply adapt Rails’s migration runner to our needs. We’ll cover why and
how we built this, and how Shopify uses it to power database migrations at scale.</p>

<hr>

<p>At Shopify, we run hundreds of database migrations across many Rails applications every week. Each
migration needs to be vetted for safety and executed in a way that doesn’t cause downtime for our
merchants. For years, we relied on bespoke tooling and <a href="https://github.com/shopify/lhm">LHMs</a> to
perform online schema changes at scale. In 2021, Shopify’s database team began designing a new,
centralized system for running schema migrations, the <em>Schema Migrations Service</em>. One of their goals was
to enable developers to use vanilla Rails migrations to perform schema changes safely and with zero downtime.</p>

<p>Our database team built the schema migrations gem to solve this problem, but the implementation wasn’t simple.
The gem relied on monkey patches and a complicated <a href="https://github.com/ruby/racc">RACC</a> parser to handle safety checking migrations
and submitting them to the schema migrations service. Shopify’s Rails Infrastructure team took the
opportunity to build something into the framework that would help us address our schema migration needs
more elegantly. We built the swappable migration backend (available as of Rails 7.0) to give applications
flexibility over how their migrations execute. Let’s dive into how Shopify uses this feature to power safe
database migrations at scale.</p>

<h2 id="why-production-migrations-require-a-different-approach">Why Production Migrations Require a Different Approach</h2>

<p>When you run <code class="language-plaintext highlighter-rouge">bin/rails db:migrate</code> in development, Rails executes your migration methods directly
against the database. Each call to <code class="language-plaintext highlighter-rouge">create_table</code>, <code class="language-plaintext highlighter-rouge">add_column</code>, or <code class="language-plaintext highlighter-rouge">add_index</code> immediately
translates to SQL that modifies your schema. This works great for local development, but at
Shopify’s scale, we can’t afford to run schema changes this way in production.</p>

<p>LHMs are a tool for performing online schema migrations. This means that migrations can be
performed without locking tables, enabling the system to stay up while the migration is running.
We used LHMs for many years to perform schema changes without downtime, but this also meant
that we couldn’t use Rails’s native migration API.</p>

<p>Shopify’s database team decided to build a <em>Schema Migrations Service</em> to allow developers to
return to using vanilla Rails migrations, while ensuring that schema changes were still performed
online behind the scenes. The idea was also to improve the developer experience around migrations
by:</p>

<ol>
  <li>Requiring migrations to <strong>pass safety checks</strong> before execution (e.g. blocking column-change
operations, ensuring a migration only operated on a single table, etc.).</li>
  <li>
<strong>Submitting migrations to a centralized manager</strong> to more easily orchestrate schema changes
  across multiple database shards, with better testing and retry behaviour.</li>
  <li>
<strong>Providing developers with more insight</strong> into which migrations were running, their progress,
  etc. from a comprehensive UI.</li>
</ol>

<p>The schema migrations gem built by our DB team handled safety checking migrations and submitting
them to the centralized manager. The initial implementation, however, relied heavily on monkey patches
to existing migration codepaths in Rails. Rather than executing migration SQL, the gem patched Rails
to capture any SQL statements. It relied on a RACC parser to extract schema change operations from the
SQL, safety check them, and then transform them into a JSON DDL (<a href="https://www.ibm.com/docs/en/i/7.5.0?topic=programming-data-definition-language">Data Definition Language</a>) to
be sent to the manager.</p>

<p>The Rails Infrastructure team realized that this was a great opportunity to make Rails’s migration
execution more flexible, so that we could meet Shopify’s schema migration needs without needing to
monkey patch a bunch of code or maintain a complicated RACC parser.</p>

<h2 id="building-railss-swappable-migration-strategy">Building Rails’s Swappable Migration Strategy</h2>

<p>When we started this project in early 2022, we explored several approaches that would allow us to
move away from monkey patching Rails in the gem. One idea was to use static analysis, and try to
parse migration files without running them. Another was to propose schema definition objects for every
migration operation, where Rails would expose Ruby representations of schema changes (like
<code class="language-plaintext highlighter-rouge">AddColumnDefinition</code>, <code class="language-plaintext highlighter-rouge">CreateTableDefinition</code>, etc.) that could be translated into any format:
SQL, JSON, or otherwise.</p>

<p>The Rails Core team had concerns about the complexity that schema definitions would introduce to
Active Record, and we soon pivoted to a simpler approach: the <a href="https://refactoring.guru/design-patterns/strategy">strategy pattern</a>.
Instead of fundamentally changing how migrations represent schema changes, we’d introduce an intermediary
object between migrations and the connection adapter that could customize execution behavior. This
was a cleaner abstraction that solved our problem without requiring massive changes to Active
Record’s internals.</p>

<p>In June 2022, we opened <a href="https://github.com/rails/rails/pull/45324">a pull request to Rails</a>
proposing this “execution strategy” pattern for migrations. The PR introduced a strategy object
between the <code class="language-plaintext highlighter-rouge">Migration</code> class and the connection adapter. Instead of migrations directly delegating
schema statement commands to the connection via <code class="language-plaintext highlighter-rouge">method_missing</code>, they would delegate to a strategy
object that could be swapped out.</p>

<p>For example, suppose you call a method like <code class="language-plaintext highlighter-rouge">create_table</code> in a migration. Rails routes
that call through a migration strategy object, which by default, is
<a href="https://github.com/rails/rails/blob/873fb486e78227cb27a855503fcb38ff35b3d1ae/activerecord/lib/active_record/migration/default_strategy.rb"><code class="language-plaintext highlighter-rouge">ActiveRecord::Migration::DefaultStrategy</code></a>:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="nn">ActiveRecord</span>
  <span class="k">class</span> <span class="nc">Migration</span>
    <span class="k">class</span> <span class="nc">DefaultStrategy</span> <span class="o">&lt;</span> <span class="no">ExecutionStrategy</span>
      <span class="kp">private</span>
        <span class="k">def</span> <span class="nf">method_missing</span><span class="p">(</span><span class="nb">method</span><span class="p">,</span> <span class="o">...</span><span class="p">)</span>
          <span class="n">connection</span><span class="p">.</span><span class="nf">send</span><span class="p">(</span><span class="nb">method</span><span class="p">,</span> <span class="o">...</span><span class="p">)</span>
        <span class="k">end</span>

        <span class="k">def</span> <span class="nf">respond_to_missing?</span><span class="p">(</span><span class="nb">method</span><span class="p">,</span> <span class="n">include_private</span> <span class="o">=</span> <span class="kp">false</span><span class="p">)</span>
          <span class="n">connection</span><span class="p">.</span><span class="nf">respond_to?</span><span class="p">(</span><span class="nb">method</span><span class="p">,</span> <span class="n">include_private</span><span class="p">)</span> <span class="o">||</span> <span class="k">super</span>
        <span class="k">end</span>

        <span class="k">def</span> <span class="nf">connection</span>
          <span class="n">migration</span><span class="p">.</span><span class="nf">connection</span>
        <span class="k">end</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>The default strategy sends migration methods to the connection, which executes SQL against your database.
This is how migrations worked before, so most Rails developers are unaware that there’s now a strategy
object working behind the scenes! However, the migration strategy class
<a href="https://guides.rubyonrails.org/configuring.html#config-active-record-migration-strategy">can be configured</a> to customize how migrations are executed. As of Rails 7.0, you
can set <code class="language-plaintext highlighter-rouge">config.active_record.migration_strategy</code> in your environment configuration (for example, in
<code class="language-plaintext highlighter-rouge">config/environments/production.rb</code>). Pass it either a class object or a string with the class name:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># lib/custom_migration_strategy.rb</span>

<span class="k">class</span> <span class="nc">CustomMigrationStrategy</span> <span class="o">&lt;</span> <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Migration</span><span class="o">::</span><span class="no">DefaultStrategy</span>
  <span class="k">def</span> <span class="nf">drop_table</span><span class="p">(</span><span class="o">*</span><span class="p">)</span>
    <span class="k">raise</span> <span class="s2">"Dropping tables is not supported!"</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># config/environments/production.rb</span>

<span class="no">Rails</span><span class="p">.</span><span class="nf">application</span><span class="p">.</span><span class="nf">configure</span> <span class="k">do</span>
  <span class="n">config</span><span class="p">.</span><span class="nf">active_record</span><span class="p">.</span><span class="nf">migration_strategy</span> <span class="o">=</span> <span class="no">CustomMigrationStrategy</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Now, when you run <code class="language-plaintext highlighter-rouge">bin/rails db:migrate</code>, Rails will delegate all migration methods to your custom
strategy, giving you complete control over how migrations are executed.</p>

<p><strong>Note</strong>: Outside of production, you will likely want to stick with the default strategy. This setup lets
you safely use advanced migration tooling in production while keeping things fast and simple for local
development. We do this at Shopify.</p>
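
<p>For example, here is a hypothetical migration that would run normally in development but be rejected by <code class="language-plaintext highlighter-rouge">CustomMigrationStrategy</code> in production:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># db/migrate/20240101000000_drop_legacy_widgets.rb (hypothetical example)
class DropLegacyWidgets &lt; ActiveRecord::Migration[7.0]
  def change
    # With the default strategy (development), this executes DROP TABLE.
    # With CustomMigrationStrategy (production), #drop_table raises instead.
    drop_table :legacy_widgets
  end
end
</code></pre></div></div>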

<h2 id="serializing-production-migrations-to-json">Serializing Production Migrations to JSON</h2>

<p>Once Rails supported swappable migration backends, we implemented a custom strategy that serialized
migrations as JSON, making them easy to submit to a remote manager. To accomplish
this, our gem introduced a <code class="language-plaintext highlighter-rouge">JsonSerializationStrategy</code> class. This class implemented each schema
change method available in migrations, using Rails’s schema definition APIs to build the necessary
schema objects. We then converted these objects into JSON payloads that described each schema operation.
Here’s an example of how we capture <code class="language-plaintext highlighter-rouge">create_table</code> operations:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">JsonSerializationStrategy</span> <span class="o">&lt;</span> <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Migration</span><span class="o">::</span><span class="no">DefaultStrategy</span>
  <span class="nb">attr_accessor</span> <span class="ss">:connection</span><span class="p">,</span> <span class="ss">:operations</span>

  <span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="n">connection</span><span class="p">)</span>
    <span class="vi">@connection</span> <span class="o">=</span> <span class="n">connection</span>
    <span class="vi">@operations</span> <span class="o">=</span> <span class="p">[]</span>
  <span class="k">end</span>

  <span class="k">def</span> <span class="nf">create_table</span><span class="p">(</span><span class="o">...</span><span class="p">)</span>
    <span class="n">td</span> <span class="o">=</span> <span class="n">connection</span><span class="p">.</span><span class="nf">build_create_table_definition</span><span class="p">(</span><span class="o">...</span><span class="p">)</span>
    <span class="n">ddl</span> <span class="o">=</span> <span class="n">connection</span><span class="p">.</span><span class="nf">schema_creation</span><span class="p">.</span><span class="nf">accept</span><span class="p">(</span><span class="n">td</span><span class="p">)</span>
    <span class="n">definition</span> <span class="o">=</span> <span class="n">extract_table_definition</span><span class="p">(</span><span class="n">td</span><span class="p">.</span><span class="nf">name</span><span class="p">,</span> <span class="n">ddl</span><span class="p">)</span>

    <span class="n">operations</span> <span class="o">&lt;&lt;</span> <span class="p">{</span>
      <span class="ss">type: :sql</span><span class="p">,</span>
      <span class="ss">op: :create_table</span><span class="p">,</span>
      <span class="ss">params: </span><span class="p">{</span>
        <span class="ss">name: </span><span class="n">td</span><span class="p">.</span><span class="nf">name</span><span class="p">,</span>
        <span class="ss">definition: </span><span class="n">definition</span><span class="p">,</span>
      <span class="p">},</span>
    <span class="p">}</span>
  <span class="k">end</span>

  <span class="kp">private</span>

  <span class="k">def</span> <span class="nf">extract_table_definition</span><span class="p">(</span><span class="n">table_name</span><span class="p">,</span> <span class="n">ddl</span><span class="p">)</span>
    <span class="n">table_name_pattern</span> <span class="o">=</span> <span class="sr">/^CREATE TABLE </span><span class="si">#{</span><span class="n">connection</span><span class="p">.</span><span class="nf">quote_table_name</span><span class="p">(</span><span class="n">table_name</span><span class="p">.</span><span class="nf">to_s</span><span class="p">)</span><span class="si">}</span><span class="sr"> /</span>
    <span class="n">ddl</span><span class="p">.</span><span class="nf">sub</span><span class="p">(</span><span class="n">table_name_pattern</span><span class="p">,</span> <span class="s2">""</span><span class="p">)</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Here’s a simplified look at how migrations are run in production, using the swappable strategy:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">ExternalMigrationsRunner</span>
  <span class="k">def</span> <span class="nf">upload_migration</span><span class="p">(</span><span class="n">migration</span><span class="p">)</span>
    <span class="c1"># Run the migration, but since we're using the JsonSerializationStrategy,</span>
    <span class="c1"># we won't execute SQL; instead, the strategy captures all operations as JSON</span>
    <span class="n">runnable_migration</span> <span class="o">=</span> <span class="n">migration</span><span class="p">.</span><span class="nf">migration_class</span><span class="p">.</span><span class="nf">new</span>
    <span class="k">if</span> <span class="n">runnable_migration</span><span class="p">.</span><span class="nf">respond_to?</span><span class="p">(</span><span class="ss">:change</span><span class="p">)</span>
      <span class="n">runnable_migration</span><span class="p">.</span><span class="nf">change</span>
    <span class="k">elsif</span> <span class="n">runnable_migration</span><span class="p">.</span><span class="nf">respond_to?</span><span class="p">(</span><span class="ss">:up</span><span class="p">)</span>
      <span class="n">runnable_migration</span><span class="p">.</span><span class="nf">up</span>
    <span class="k">end</span>

    <span class="c1"># Extract the serialized operations from the strategy</span>
    <span class="n">operations</span> <span class="o">=</span> <span class="n">runnable_migration</span><span class="p">.</span><span class="nf">execution_strategy</span><span class="p">.</span><span class="nf">operations</span>

    <span class="c1"># Upload to the migrations service via API</span>
    <span class="no">ApiClient</span><span class="p">.</span><span class="nf">upload_migration</span><span class="p">(</span>
      <span class="ss">name: </span><span class="n">migration</span><span class="p">.</span><span class="nf">name</span><span class="p">,</span>
      <span class="ss">database: </span><span class="n">database_name</span><span class="p">,</span>
      <span class="ss">identifier: </span><span class="n">migration</span><span class="p">.</span><span class="nf">version</span><span class="p">,</span>
      <span class="ss">operations: </span><span class="n">operations</span><span class="p">,</span>  <span class="c1"># JSON representation of schema changes</span>
      <span class="ss">table_name: </span><span class="n">migration</span><span class="p">.</span><span class="nf">table</span><span class="p">,</span>
      <span class="ss">author: </span><span class="n">migration</span><span class="p">.</span><span class="nf">author</span>
    <span class="p">)</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
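
<p>To make the serialized payload more concrete, here is a hypothetical (and simplified) <code class="language-plaintext highlighter-rouge">operations</code> array for a migration that creates a <code class="language-plaintext highlighter-rouge">posts</code> table, following the <code class="language-plaintext highlighter-rouge">create_table</code> handler shown above:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical contents of execution_strategy.operations for a
# `create_table :posts` migration (column definitions simplified):
[
  {
    type: :sql,
    op: :create_table,
    params: {
      name: "posts",
      definition: "(`id` bigint NOT NULL AUTO_INCREMENT PRIMARY KEY, " \
                  "`title` varchar(255) NOT NULL, " \
                  "`created_at` datetime(6) NOT NULL, " \
                  "`updated_at` datetime(6) NOT NULL)"
    }
  }
]
</code></pre></div></div>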

<h3 id="configuring-the-migration-strategy-automatically">Configuring the Migration Strategy Automatically</h3>

<p>Rather than requiring each application to configure the migration strategy in their config file for
production, the schema migrations gem leveraged an initializer to set this automatically:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># lib/schema_migrations/railtie.rb</span>
<span class="nb">require</span> <span class="s2">"rails/railtie"</span>

<span class="k">class</span> <span class="nc">Railtie</span> <span class="o">&lt;</span> <span class="no">Rails</span><span class="o">::</span><span class="no">Railtie</span>
  <span class="o">...</span>

  <span class="n">initializer</span> <span class="s2">"schema_migrations.migration_strategy_config"</span> <span class="k">do</span> <span class="o">|</span><span class="n">app</span><span class="o">|</span>
    <span class="k">next</span> <span class="k">unless</span> <span class="no">Rails</span><span class="p">.</span><span class="nf">env</span><span class="p">.</span><span class="nf">production?</span>

    <span class="n">app</span><span class="p">.</span><span class="nf">config</span><span class="p">.</span><span class="nf">active_record</span><span class="p">.</span><span class="nf">migration_strategy</span> <span class="o">=</span> <span class="no">JsonSerializationStrategy</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>This initializer ensures that any application that includes the schema migrations gem has its migrations
intercepted and serialized in production environments.</p>

<h2 id="reimagining-safety-checks-from-sql-parsing-to-runtime-analysis">Reimagining Safety Checks: From SQL Parsing to Runtime Analysis</h2>

<p>While working on the upstream strategy feature, our team was simultaneously tackling another critical
problem: safety checks. Before any migration runs in production at Shopify, the gem performs safety
checks to catch common mistakes that could cause downtime, such as:</p>

<ul>
  <li>Adding a <code class="language-plaintext highlighter-rouge">NOT NULL</code> column without a default value (check out <a href="https://shopify.engineering/add-not-null-colums-to-database">this blog post</a>
if you’re interested in learning more)</li>
  <li>Renaming a column (breaks downstream consumers)</li>
  <li>Changing a column type in an incompatible way</li>
</ul>

<p>These checks run in development too, giving developers immediate feedback before they deploy.</p>

<p>The old implementation of the gem’s safety checker relied on a RACC parser to analyze SQL strings, which
was brittle: every time SQL syntax changed or we encountered a new edge case, the parser had to be updated.
We wanted a standalone workflow for safety checking migrations, separate from the migrations
actually being executed and submitted to the manager. Consequently, we couldn’t rely on the migration strategy
to do this. Instead, we settled on a new approach that would allow us to move away from the RACC parser and
reduce a lot of the complexity. We developed a <code class="language-plaintext highlighter-rouge">MigrationOperationRecorder</code> that “runs” a migration and
records all method calls performed:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">MigrationOperationRecorder</span>
  <span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="n">migration_class</span><span class="p">)</span>
    <span class="vi">@migration</span> <span class="o">=</span> <span class="n">migration_class</span><span class="p">.</span><span class="nf">new</span>
  <span class="k">end</span>

  <span class="k">def</span> <span class="nf">record</span>
    <span class="n">singleton_class</span> <span class="o">=</span> <span class="vi">@migration</span><span class="p">.</span><span class="nf">singleton_class</span>
    <span class="n">singleton_class</span><span class="p">.</span><span class="nf">include</span><span class="p">(</span><span class="no">RecordMigrationOperations</span><span class="p">)</span>

    <span class="k">if</span> <span class="vi">@migration</span><span class="p">.</span><span class="nf">respond_to?</span><span class="p">(</span><span class="ss">:change</span><span class="p">)</span>
      <span class="vi">@migration</span><span class="p">.</span><span class="nf">change</span>
    <span class="k">elsif</span> <span class="vi">@migration</span><span class="p">.</span><span class="nf">respond_to?</span><span class="p">(</span><span class="ss">:up</span><span class="p">)</span>
      <span class="vi">@migration</span><span class="p">.</span><span class="nf">up</span>
    <span class="k">end</span>

    <span class="vi">@migration</span><span class="p">.</span><span class="nf">method_calls</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">RecordMigrationOperations</code> module works by leveraging the same <code class="language-plaintext highlighter-rouge">method_missing</code> mechanism that Rails
uses for migrations. Since <code class="language-plaintext highlighter-rouge">ActiveRecord::Migration</code> uses <code class="language-plaintext highlighter-rouge">method_missing</code> to route commands to the execution
strategy, we define <code class="language-plaintext highlighter-rouge">RecordMigrationOperations#method_missing</code> to store the method call instead:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="nn">RecordMigrationOperations</span>
  <span class="k">def</span> <span class="nf">method_missing</span><span class="p">(</span><span class="nb">method</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">options</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">block</span><span class="p">)</span>
    <span class="vi">@method_calls</span> <span class="o">&lt;&lt;</span> <span class="no">MigrationOperation</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span>
      <span class="ss">method: </span><span class="nb">method</span><span class="p">,</span>
      <span class="ss">args: </span><span class="n">args</span><span class="p">,</span>
      <span class="ss">options: </span><span class="n">options</span>
    <span class="p">)</span>
  <span class="k">end</span>

  <span class="k">def</span> <span class="nf">method_calls</span>
    <span class="vi">@method_calls</span> <span class="o">||=</span> <span class="p">[]</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
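
<p>As a hypothetical usage example (assuming <code class="language-plaintext highlighter-rouge">MigrationOperation</code> exposes readers for its fields), recording a simple migration might look like this:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical usage of the recorder; names follow the snippets above.
class AddEmailToUsers &lt; ActiveRecord::Migration[7.0]
  def change
    add_column :users, :email, :string, null: false, default: ""
    add_index :users, :email, unique: true
  end
end

operations = MigrationOperationRecorder.new(AddEmailToUsers).record
operations.map(&amp;:method)  # =&gt; [:add_column, :add_index]
operations.first.args     # =&gt; [:users, :email, :string]
operations.first.options  # =&gt; { null: false, default: "" }
</code></pre></div></div>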

<p>Once operations are recorded, individual safety checks can inspect the migration data.
Here’s an example of the <code class="language-plaintext highlighter-rouge">SingleTableCheck</code>:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">SingleTableCheck</span> <span class="o">&lt;</span> <span class="no">BaseSafetyCheck</span>
  <span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="n">migration</span><span class="p">)</span>
    <span class="vi">@inspected_migration</span> <span class="o">=</span> <span class="n">migration</span>
  <span class="k">end</span>

  <span class="k">def</span> <span class="nf">check</span>
    <span class="c1"># @inspected_migration is a specialized object containing info</span>
    <span class="c1"># about all of the operations the migration performs, as returned</span>
    <span class="c1"># from MigrationOperationRecorder#record</span>
    <span class="n">tables</span> <span class="o">=</span> <span class="vi">@inspected_migration</span><span class="p">.</span><span class="nf">tables</span>

    <span class="k">return</span> <span class="k">if</span> <span class="n">tables</span><span class="p">.</span><span class="nf">one?</span>

    <span class="k">raise</span> <span class="no">SafetyCheckError</span><span class="p">,</span>
      <span class="s2">"You must work with exactly one table per migration. "</span> <span class="p">\</span>
      <span class="s2">"Split tables </span><span class="si">#{</span><span class="n">tables</span><span class="p">.</span><span class="nf">to_sentence</span><span class="si">}</span><span class="s2"> into </span><span class="si">#{</span><span class="n">tables</span><span class="p">.</span><span class="nf">length</span><span class="si">}</span><span class="s2"> migrations."</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>This check accesses <code class="language-plaintext highlighter-rouge">@inspected_migration.tables</code>, which is extracted during the analysis phase,
and validates that exactly one table is involved. If the check fails, it raises a <code class="language-plaintext highlighter-rouge">SafetyCheckError</code>
with a clear message telling developers how to fix the issue.</p>
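
<p>For illustration, here is a hypothetical sketch of how <code class="language-plaintext highlighter-rouge">tables</code> could be derived from the recorded operations (the gem’s real analysis is more involved, handling cases like <code class="language-plaintext highlighter-rouge">rename_table</code> and block-form table definitions):</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical: derive the table names a migration touches from the
# operations captured by MigrationOperationRecorder#record.
TABLE_METHODS = %i[
  create_table drop_table add_column remove_column change_column
  add_index remove_index add_reference remove_reference
].freeze

def tables(method_calls)
  method_calls
    .select { |op| TABLE_METHODS.include?(op.method) }
    .map { |op| op.args.first.to_s }
    .uniq
end
</code></pre></div></div>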

<h3 id="why-not-use-a-migration-strategy-for-safety-checking">Why Not Use a Migration Strategy for Safety Checking?</h3>

<p>You might wonder why we used <code class="language-plaintext highlighter-rouge">method_missing</code> for the <code class="language-plaintext highlighter-rouge">MigrationOperationRecorder</code> instead of
creating another strategy pattern. Couldn’t we use our newly built feature for safety checking?
The answer comes down to separation of concerns and simplicity. Safety checking and migration
execution serve different purposes:</p>

<ul>
  <li>
    <p><strong>Migration execution</strong> needs to be swappable because different environments (development vs.
production) require different behaviours. In development, we execute SQL directly. In production,
we serialize to JSON and submit to a remote service.</p>
  </li>
  <li>
    <p><strong>Safety checking</strong> needs to happen the same way everywhere. We’re analyzing which operations the
migration is performing, not executing schema changes. The same safety checks run in development,
CI, and production.</p>
  </li>
</ul>

<p>Using <code class="language-plaintext highlighter-rouge">method_missing</code> for safety checks gives us a simpler implementation that automatically
captures all migration DSL methods without needing to explicitly enumerate them all. A strategy
pattern would have required us to implement every migration method explicitly. Given that we only
wanted to record the migration methods being called and their arguments, opting for a simpler
<code class="language-plaintext highlighter-rouge">method_missing</code> approach made more sense.</p>

<h2 id="per-adapter-migration-strategies">Per-Adapter Migration Strategies</h2>

<p>One challenge with using a global migration strategy is that it’s insufficient for applications
using multiple database systems. Since its inception, Shopify has primarily used MySQL, but more
recently we’ve been exploring running non-MySQL databases. Different databases have different
requirements for how migrations should be serialized, which means that the migration strategy
needs to be tailored to the database the migrations are running against.</p>

<p>We could make this work by having our gem’s migration strategy inspect the database adapter at runtime
and dispatch to the appropriate serialization logic. This is not ideal, though; we’d be reimplementing
adapter dispatch logic that Rails can handle natively. It felt like this was a missing piece in our
upstream solution, so last month, we opened a <a href="https://github.com/rails/rails/pull/56204">PR</a> to add <strong>per-adapter migration strategies</strong>
to Rails. This feature will be available in Rails 8.2.</p>

<p>Instead of setting one global strategy:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">config</span><span class="p">.</span><span class="nf">active_record</span><span class="p">.</span><span class="nf">migration_strategy</span> <span class="o">=</span> <span class="no">JsonSerializationStrategy</span>
</code></pre></div></div>

<p>You can now register strategies directly on adapter classes:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">ActiveSupport</span><span class="p">.</span><span class="nf">on_load</span><span class="p">(</span><span class="ss">:active_record_trilogyadapter</span><span class="p">)</span> <span class="k">do</span>
  <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">ConnectionAdapters</span><span class="o">::</span><span class="no">TrilogyAdapter</span><span class="p">.</span><span class="nf">migration_strategy</span> <span class="o">=</span>
    <span class="no">MysqlStrategy</span>
<span class="k">end</span>

<span class="no">ActiveSupport</span><span class="p">.</span><span class="nf">on_load</span><span class="p">(</span><span class="ss">:active_record_postgresqladapter</span><span class="p">)</span> <span class="k">do</span>
  <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">ConnectionAdapters</span><span class="o">::</span><span class="no">PostgreSQLAdapter</span><span class="p">.</span><span class="nf">migration_strategy</span> <span class="o">=</span>
    <span class="no">PostgreSQLStrategy</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Rails automatically selects the correct strategy based on the database adapter in use
for each migration. For example, if you’re running migrations against a MySQL database
configured with the <a href="https://github.com/trilogy-libraries/trilogy">Trilogy</a> adapter, Rails chooses <code class="language-plaintext highlighter-rouge">MysqlStrategy</code>. If your
migrations are running against a PostgreSQL database, Rails selects <code class="language-plaintext highlighter-rouge">PostgreSQLStrategy</code>.
If the current adapter does not have a strategy configured, Rails will fall back to using
the global strategy.</p>

<h2 id="making-rails-work-for-you">Making Rails Work for You</h2>

<p>One of Rails’s design philosophies is <strong>convention over configuration</strong>. The majority of Rails
apps don’t need to think about how their migrations are performed, so we keep things simple
with a default migration strategy. When an application does need to customize how its
migrations run, the framework provides a clear extension point. Applications can opt into
configurable behaviour as their requirements evolve.</p>

<p>This is also a story about how working in the open benefits everyone. We could have kept our
monkey patches internal to Shopify, continuing to patch Rails as needed. Instead, we built a
more maintainable solution for ourselves while also providing the Rails community with a new tool
for customizing migration behaviour. If you’re running into limitations with Rails for your specific
use case, consider whether there’s an opportunity for an upstream contribution that could solve
your problem while benefitting the rest of the community.</p>
</body></html>]]></content><author><name>Adrianna Chang</name></author><category term="posts" /><category term="2025-12-08-swappable-migration-backends-in-rails" /><summary type="html"><![CDATA[This post explores Rails’s swappable migration backend, a little-known feature that lets applications customize how migrations run. At Shopify, we relied on monkey patches and a brittle SQL parser to make Rails migrations work with our Schema Migrations Service. We developed the swappable backend feature to more simply adapt Rails’s migration runner to our needs. We’ll cover why and how we built this, and how Shopify uses it to power database migrations at scale.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://railsatscale.com/2025-12-08-swappable-migration-backends-in-rails/611a79dd61cc7897be8aee3ea79fd49f58e62b33.png" /><media:content medium="image" url="https://railsatscale.com/2025-12-08-swappable-migration-backends-in-rails/611a79dd61cc7897be8aee3ea79fd49f58e62b33.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Adding Iongraph support to ZJIT</title><link href="https://railsatscale.com/2025-11-19-adding-iongraph-support/" rel="alternate" type="text/html" title="Adding Iongraph support to ZJIT" /><published>2025-11-19T00:00:00+00:00</published><updated>2025-11-19T00:00:00+00:00</updated><id>https://railsatscale.com/2025-11-19-adding-iongraph-support/</id><content type="html" xml:base="https://railsatscale.com/2025-11-19-adding-iongraph-support/"><![CDATA[<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<p>ZJIT adds support for Iongraph, which offers a web-based, pass-by-pass viewer
with a stable layout, better navigation, and quality-of-life features like
labeled backedges and clickable operands.</p>

<h2 id="prelude">Prelude</h2>

<p>I’m an intern on the ZJIT team for the fall term. I also have a rather bad habit
of being chronically on <a href="https://lobste.rs/">lobste.rs</a>.</p>

<p>While idly browsing, I spotted an article by <a href="https://bvisness.me/">Ben Visness</a>
titled
<em><a href="https://spidermonkey.dev/blog/2025/10/28/iongraph-web.html">Who needs Graphviz when you can build it yourself?</a></em>,
which covers his work on creating a novel graph viewer called <em>Iongraph</em>.</p>

<figure><img src="iongraph-spidermonkey-example.png" alt="Iongraph used to visualize an IR graph for the SpiderMonkey JIT inside Firefox."><figcaption>Iongraph used to visualize an IR graph for the SpiderMonkey JIT inside Firefox.</figcaption></figure>

<p>Immediately, I was intrigued. I like looking at new technology and I wondered
what it might be like to integrate the work done in Iongraph with ZJIT, getting
all sorts of novel and interesting features for free. I suspected that it could
also help other engineers to reason about how their optimizations might affect
the control flow graph of a given function.</p>

<p>Also, it just looks really cool. It’s got nice colours, good built-in CSS, and
is built in a fairly extensible way. The underlying code isn’t hard to read if
you need to make changes to it.</p>

<h2 id="investigating-further">Investigating further</h2>

<p>Iongraph is compelling for a few reasons.</p>

<p>It supports stable layouts, which means that removing or adding nodes (something
that can happen when you run an optimization pass) doesn’t shift the location of
other nodes to an extreme degree. Iongraph also gives all sorts of interactive
options, like clickable operands, scrollable graphs, or arrow keys to navigate
between different nodes.</p>

<p>An especially useful feature is the ability to switch between different compiled
methods with a small selector. In our codebase, ZJIT compiles each method on its
own, so using a tool like this allows us to inspect method level optimizations
all in one pane of a web browser. Of course, there are other great features,
like loop header highlighting or being able to click on optimization passes to
see what the control flow graph looks like after they’re applied.</p>

<h2 id="proposal">Proposal</h2>

<p>Roughly an hour after I read through said article, I noticed that my mentor,
<a href="https://bernsteinbear.com/">Max</a>, had also posted it in an internal team chat,
mentioning that it would be cool to support it.</p>

<p>Of course, I was tempted by this project. As is common for interns, I
wanted to take on a new, shiny project despite not knowing what developing it
would actually involve. After talking to Max further, he clarified that
this would require significant infrastructure work — or at the very least,
more work than was initially apparent.</p>

<h2 id="building">Building</h2>

<h3 id="a-json-library-inside-zjit">A JSON library inside ZJIT?</h3>

<p>Looking into the Iongraph format, I figured that I would have to use some sort
of JSON crate. Since ZJIT as a project doesn’t rely strictly on using Rust
tooling like <code class="language-plaintext highlighter-rouge">cargo</code>, directly adding <code class="language-plaintext highlighter-rouge">serde_json</code> as a dependency was out of
the question. Another compelling option was vendoring it (or a smaller JSON
library), but that was likely to include features that we did not want or
introduce licensing issues.</p>

<p>After a quick discussion, I settled on implementing the functionality myself. I
read a bit of the JSON specification and got a sense of how I wanted to design
the library’s API. Ultimately, I opted for readability and usability over
raw performance. I think this is a reasonable trade-off, given that the
serialization code is not on the critical path of the compiler. The interface is
also clean enough that the internals could be swapped out later with minimal
fuss if more performance is ever needed.</p>

<p>In designing the serializer, I chose to target
<a href="https://datatracker.ietf.org/doc/html/rfc8259">RFC 8259</a>, which provides more
freedom than previous specifications. As the RFC notes, historical
specifications constrained the top-level value to be an array or an object;
this spec (and my implementation) drops that constraint. I also opted to
avoid comments, encode strictly in UTF-8, and escape control characters.
Notably, RFC 8259 does not impose a limit on the precision of numbers; it only
disallows infinity, negative infinity, and <code class="language-plaintext highlighter-rouge">NaN</code>.</p>
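
<p>As a rough illustration of those rules, here is what a minimal encoder following them might look like, sketched in Ruby rather than the Rust that ZJIT’s serializer is actually written in:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch of the rules above: any top-level value is allowed, output is UTF-8,
# control characters are escaped, and non-finite floats are rejected.
def to_json_value(value)
  case value
  when nil         then "null"
  when true, false then value.to_s
  when Integer     then value.to_s
  when Float
    raise ArgumentError, "NaN and infinities are not valid JSON" unless value.finite?
    value.to_s
  when String      then quote(value)
  when Symbol      then quote(value.to_s)
  when Array       then "[#{value.map { |v| to_json_value(v) }.join(',')}]"
  when Hash
    "{#{value.map { |k, v| "#{quote(k.to_s)}:#{to_json_value(v)}" }.join(',')}}"
  else
    raise ArgumentError, "unsupported type: #{value.class}"
  end
end

def quote(string)
  escaped = string.each_char.map do |c|
    case c
    when '"'  then '\"'
    when "\\" then "\\\\"
    when "\n" then "\\n"
    when "\r" then "\\r"
    when "\t" then "\\t"
    else c.ord &lt; 0x20 ? format("\\u%04x", c.ord) : c
    end
  end.join
  "\"#{escaped}\""
end
</code></pre></div></div>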

<h3 id="computing-control-flow-graph-properties">Computing control flow graph properties</h3>

<p>With JSON serialization handled, the more challenging work was computing the
graph metadata that Iongraph requires. The format expects explicit successor and
predecessor relationships, loop headers, and back edge sources — information that
ZJIT doesn’t normally compute since it’s not needed for compilation at this stage
of compiler development.</p>

<p>One constraint I had to contend with was that the Iongraph format needs the user
to manually provide the successor and predecessor nodes for a given node in a
control flow graph. In ZJIT, we compile individual methods at a time as
<code class="language-plaintext highlighter-rouge">Function</code>s (our internal representation) that hold a graph of <code class="language-plaintext highlighter-rouge">Block</code>s. Each
<code class="language-plaintext highlighter-rouge">Block</code> is a basic block that you would find in a compiler textbook. (One caveat
to understand is that we use extended basic blocks, meaning that blocks can have
jump instructions at any point in their contained instructions — not just at
the end.)</p>

<p>The process of computing successors and predecessors is fairly simple. As you
iterate through the list of blocks, all blocks referenced as the target of a
jump-like instruction (whether conditional or unconditional) are added to the
successor set. Then for each successor, update their predecessor set to include
the block currently being operated on.</p>

<p>The next task I had to solve was computing the loop headers and back edge
sources.</p>

<p>Computing both of these requires first computing the dominators
for the blocks in a control flow graph. We can state that a block <em>i</em> dominates a
block <em>j</em> if all paths in the control flow graph that reach <em>j</em> must go through
<em>i</em>. Several algorithms exist for computing dominators, ranging from simple
iterative options to more complicated ones. The simplest is a fixed-point
iteration that is very straightforward to implement but perhaps not the most
efficient: it runs in time quadratic in the number of blocks, and it is the one
I will discuss shortly. In <a href="https://www.cs.tufts.edu/~nr/cs257/archive/keith-cooper/dom14.pdf"><em>A Simple, Fast Dominance
Algorithm</em></a>
by Cooper, Harvey, and Kennedy, both this iterative solution and a version
optimized to use less space are described. A third option is the
Lengauer-Tarjan algorithm, which has better worst-case bounds than both
the iterative and tuned implementations.</p>

<p>Based on the goals of the project, I opted to use the iterative algorithm, since
it performs well and doesn’t incur serious memory use penalties for a small
number of blocks in a control flow graph. It can be described as such:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">dom</span> <span class="o">=</span> <span class="p">{}</span>
<span class="n">nodes</span><span class="p">.</span><span class="nf">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">node</span><span class="o">|</span>
  <span class="k">if</span> <span class="n">entry_nodes</span><span class="p">.</span><span class="nf">include?</span><span class="p">(</span><span class="n">node</span><span class="p">)</span>
    <span class="n">dom</span><span class="p">[</span><span class="n">node</span><span class="p">]</span> <span class="o">=</span> <span class="no">Set</span><span class="p">[</span><span class="n">node</span><span class="p">]</span>
  <span class="k">else</span>
    <span class="n">dom</span><span class="p">[</span><span class="n">node</span><span class="p">]</span> <span class="o">=</span> <span class="n">nodes</span><span class="p">.</span><span class="nf">to_set</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="n">changed</span> <span class="o">=</span> <span class="kp">true</span>
<span class="k">while</span> <span class="n">changed</span>
  <span class="n">changed</span> <span class="o">=</span> <span class="kp">false</span>
  <span class="n">nodes</span><span class="p">.</span><span class="nf">reverse_post_order</span><span class="p">.</span><span class="nf">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">node</span><span class="o">|</span>
    <span class="n">preds</span> <span class="o">=</span> <span class="n">predecessors</span><span class="p">(</span><span class="n">node</span><span class="p">)</span>
    <span class="n">pred_doms</span> <span class="o">=</span> <span class="n">preds</span><span class="p">.</span><span class="nf">map</span> <span class="p">{</span> <span class="o">|</span><span class="nb">p</span><span class="o">|</span> <span class="n">dom</span><span class="p">[</span><span class="nb">p</span><span class="p">]</span> <span class="p">}</span>

    <span class="c1"># Intersection of all predecessor dominators</span>
    <span class="n">intersection</span> <span class="o">=</span> <span class="k">if</span> <span class="n">pred_doms</span><span class="p">.</span><span class="nf">empty?</span>
                     <span class="no">Set</span><span class="p">.</span><span class="nf">new</span>
                   <span class="k">else</span>
                     <span class="n">pred_doms</span><span class="p">.</span><span class="nf">reduce</span><span class="p">(</span><span class="ss">:&amp;</span><span class="p">)</span>
                   <span class="k">end</span>

    <span class="c1"># Union with {node}</span>
    <span class="n">new_set</span> <span class="o">=</span> <span class="n">intersection</span> <span class="o">|</span> <span class="no">Set</span><span class="p">[</span><span class="n">node</span><span class="p">]</span>

    <span class="k">if</span> <span class="n">new_set</span> <span class="o">!=</span> <span class="n">dom</span><span class="p">[</span><span class="n">node</span><span class="p">]</span>
      <span class="n">dom</span><span class="p">[</span><span class="n">node</span><span class="p">]</span> <span class="o">=</span> <span class="n">new_set</span>
      <span class="n">changed</span> <span class="o">=</span> <span class="kp">true</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Implementing this is fairly simple, and with the small number of blocks in a
typical control flow graph it runs quickly enough to be totally acceptable.</p>

<p>To compute successors we use the following snippet:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">successors</span><span class="p">:</span> <span class="n">BTreeSet</span><span class="o">&lt;</span><span class="n">BlockId</span><span class="o">&gt;</span> <span class="o">=</span> <span class="n">block</span>
    <span class="py">.insns</span>
    <span class="nf">.iter</span><span class="p">()</span>
    <span class="nf">.map</span><span class="p">(|</span><span class="o">&amp;</span><span class="n">insn_id</span><span class="p">|</span> <span class="n">uf</span><span class="nf">.find_const</span><span class="p">(</span><span class="n">insn_id</span><span class="p">))</span>
    <span class="nf">.filter_map</span><span class="p">(|</span><span class="n">insn_id</span><span class="p">|</span> <span class="p">{</span>
        <span class="k">Self</span><span class="p">::</span><span class="nf">extract_jump_target</span><span class="p">(</span><span class="o">&amp;</span><span class="n">function</span><span class="py">.insns</span><span class="p">[</span><span class="n">insn_id</span><span class="na">.0</span><span class="p">])</span>
    <span class="p">})</span>
    <span class="nf">.collect</span><span class="p">();</span>
</code></pre></div></div>

<p>Here we go through all the instructions in a given block. We use a union-find
data structure to map instructions to their canonical representatives (since
some optimizations may have merged or aliased instructions). We then filter with
<code class="language-plaintext highlighter-rouge">extract_jump_target</code>, which returns an <code class="language-plaintext highlighter-rouge">Option</code> containing a <code class="language-plaintext highlighter-rouge">BlockId</code> for
jump-like instructions.</p>

<p>After finding successors, we can set the predecessors by iterating through the
nodes in the successor set and adding the current node to their predecessor
sets.</p>
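<p>As a rough sketch (assuming a <code class="language-plaintext highlighter-rouge">successors</code> hash that maps each node to the successor set computed above; the names here are illustrative rather than the actual Iongraph code), that inversion looks something like this:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>require "set"

# Invert the successor relation: every edge from node to succ
# makes node a predecessor of succ.
predecessors = Hash.new { |h, k| h[k] = Set.new }

successors.each do |node, succs|
  succs.each { |succ| predecessors[succ] &lt;&lt; node }
end
</code></pre></div></div>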

<p>The last important piece we need is the loop depth of each block.</p>

<p>To compute it, we first need to know how to identify a natural loop at all.</p>

<p>We identify natural loops by detecting back edges. A back edge occurs when a
block has a predecessor that is dominated by that block (all paths to the
predecessor pass through this block). When we find such an edge, the target
block is a loop header and the predecessor is the source of a back edge. The
natural loop consists of all blocks on paths from the back edge source to the
loop header (excluding the header itself). Each block within this natural loop
then has its loop depth incremented.</p>
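<p>As a minimal sketch (reusing the <code class="language-plaintext highlighter-rouge">dom</code> sets from above and the illustrative <code class="language-plaintext highlighter-rouge">predecessors</code> hash from the previous snippet; <code class="language-plaintext highlighter-rouge">loop_depth</code> is a made-up name, not the exact field in the real implementation), the back edge detection and depth bookkeeping look roughly like this:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>loop_depth = Hash.new(0)

nodes.each do |header|
  predecessors[header].each do |source|
    # Back edge: the source of the edge is dominated by its target.
    next unless dom[source].include?(header)

    # Collect the natural loop: walk backwards from the back edge source,
    # stopping at the header (which is excluded, as described above).
    body = Set.new
    worklist = [source]
    until worklist.empty?
      block = worklist.pop
      next if block == header || body.include?(block)
      body &lt;&lt; block
      worklist.concat(predecessors[block].to_a)
    end

    body.each { |block| loop_depth[block] += 1 }
  end
end
</code></pre></div></div>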

<p>These additional computations are used by the Iongraph layout engine to
determine where a given block should sit vertically and where lines should be
routed within the graph. Loop headers and back edge sources are also marked!</p>

<h2 id="the-final-result">The final result</h2>

<p>You can click around this demo graph, showing a simple example from ZJIT, to get
a sense of how Iongraph works! Operands are clickable to jump to their definition.
You can click on the optimization phases on the left side - note that only the
passes that are not grayed out made changes. The graph is also zoomable and
scrollable!</p>

<iframe title="Iongraph Viewer" aria-label="Interactive compiler graph visualization" src="/assets/iongraph/viewer.html" width="100%" height="400"></iframe>

<p>Hopefully this post was educational! I learned a lot implementing this feature
and enjoyed doing so.</p>

<p>If you would like to do some work on ZJIT (and learn a lot in the process), you
are welcome to make pull requests to
<a href="https://github.com/ruby/ruby/">github.com/ruby/ruby/</a> with the commit prefix
<code class="language-plaintext highlighter-rouge">ZJIT:</code>. You can find issues
<a href="https://github.com/Shopify/ruby/issues?q=is%3Aissue%20state%3Aopen%20zjit">here</a>.</p>

<p>Also, feel free to join our <a href="https://zjit.zulipchat.com">Zulip</a>!</p>
</body></html>]]></content><author><name>Aiden Fox Ivey</name></author><category term="posts" /><category term="2025-11-19-adding-iongraph-support" /><summary type="html"><![CDATA[ZJIT adds support for Iongraph, which offers a web-based, pass-by-pass viewer with a stable layout, better navigation, and quality-of-life features like labeled backedges and clickable operands.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://railsatscale.com/2025-11-19-adding-iongraph-support/23fbc0c0a7f8aaf5817e2d13a010ee10f2414ed2.png" /><media:content medium="image" url="https://railsatscale.com/2025-11-19-adding-iongraph-support/23fbc0c0a7f8aaf5817e2d13a010ee10f2414ed2.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Reworking Memory Management in CRuby</title><link href="https://railsatscale.com/2025-09-16-reworking-memory-management-in-cruby/" rel="alternate" type="text/html" title="Reworking Memory Management in CRuby" /><published>2025-09-16T00:00:00+00:00</published><updated>2025-09-16T00:00:00+00:00</updated><id>https://railsatscale.com/2025-09-16-reworking-memory-management-in-cruby/</id><content type="html" xml:base="https://railsatscale.com/2025-09-16-reworking-memory-management-in-cruby/"><![CDATA[<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<blockquote>
  <p>This blog post was adapted from our <a href="https://dl.acm.org/doi/10.1145/3735950.3735960">paper</a> and <a href="https://www.youtube.com/live/es03mF_1vQM?t=10181s">talk</a> at the International Symposium on Memory Management 2025.</p>
</blockquote>

<details style="width: 100%">
  <summary>Click here to read the paper</summary>

  <object data="paper.pdf" type="application/pdf" style="width: 100%; height: 75vh;">
    <p>
      This browser does not support displaying PDFs. Please download the PDF to view it:
      <a href="paper.pdf">Download PDF</a>.
    </p>
  </object>
</details>

<p><br></p>

<p>We would first like to acknowledge the <a href="/2022-12-07-farewell-to-a-friend">late Chris Seaton</a>, who initiated our collaboration with the Australian National University on this project. We are thankful for his contribution, vision, and leadership. Without him, none of this would have been possible.</p>

<h2 id="background">Background</h2>

<p>The Australian National University (ANU) and Shopify are collaborating on integrating the Memory Management Toolkit (MMTk) with Ruby. We are supporting the project and working alongside ANU researchers to explore how to build a next-generation garbage collector for Ruby.</p>

<p>If you’re not familiar with MMTk, it offers a highly modular, VM-neutral framework for rapidly building high-performance garbage collectors. Once a language plugs into MMTk, it can leverage a wide range of built-in garbage collection algorithms, from canonical collectors such as NoGC, Mark and Sweep, and Immix to more performant collectors such as Generational Immix and Sticky Immix. Many of these algorithms are considerably more sophisticated than the Mark and Sweep algorithm used in Ruby and have the potential to deliver significant performance gains.</p>

<p>There are currently two implementations of MMTk in Ruby: one is maintained by the MMTk team and is a fork of Ruby (in the <a href="https://github.com/mmtk/ruby">mmtk/ruby</a> and <a href="https://github.com/mmtk/mmtk-ruby">mmtk/mmtk-ruby</a> repositories), while the other lives inside Ruby using the modular GC framework (in the <a href="https://github.com/ruby/mmtk">ruby/mmtk</a> repository). You might be wondering: why are there two implementations? The MMTk team’s implementation is much more advanced, with around 5 years of development. They continue to use it to experiment and develop new techniques to further leverage MMTk’s powers and improve performance. The implementation upstreamed to Ruby uses the modular GC framework and is designed to be part of an ecosystem of garbage collectors for Ruby. It is a reimplementation that borrows techniques and knowledge from the MMTk team’s implementation, but it still lags quite far behind.</p>

<p>In this blog post, we will follow the paper and mostly focus on the MMTk team’s implementation. However, if you want to learn more about the modular GC framework, you can <a href="https://www.youtube.com/watch?v=04axm4JcaT4">watch this talk at RubyKaigi 2025</a> or <a href="/2025-01-08-new-for-ruby-3-4-modular-garbage-collectors-and-mmtk/">read this blog post</a>.</p>

<h2 id="challenges">Challenges</h2>

<p>In the paper, we discuss some of the challenges we faced and solutions we used while integrating MMTk with Ruby. In this blog post, we highlight some of these challenges, but please read the paper if you want the entire picture.</p>

<h3 id="copying-garbage-collector">Copying Garbage Collector</h3>

<p>When Ruby 2.7 introduced a moving garbage collector, it marked the first time that an object’s memory location could change. To facilitate this, each of Ruby’s data types needed additional code to update object addresses after the objects have been moved. To ensure backwards compatibility, each data type needed to opt in to a new API that supports object movement, and all the existing types would pin the objects they refer to. A pinned object cannot move.</p>

<p>This pinning system works for Ruby’s default (built-in) garbage collector, because it has a marking phase to determine objects that are live and objects that are pinned followed by a compaction phase to move non-pinned objects. However, many of MMTk’s algorithms combine the marking and moving phases, meaning that an object is moved the moment it is marked. For algorithms like Immix, objects can be pinned, but they must be specified ahead of time. One solution would be to scan the heap twice: first to determine which objects get pinned, and again to mark all live objects and move the unpinned objects. However, this is inefficient because it essentially involves scanning the whole Ruby heap twice.</p>

<p>Fortunately, it’s been more than 5 years since a moving garbage collector was introduced to Ruby, so almost all the types in Ruby and many native gems support it. We introduced a new concept called Potentially Pinning Parents, or PPP for short. An object is a PPP if it could potentially contain references to objects that cannot be moved. Earlier this year, we made an effort to reduce the number of PPP objects. In fact, as of the time of writing, there are no user-facing Ruby objects that are PPPs except for ones defined in native gems (which we do not have any control over). There are still a few internal Ruby objects that are PPPs, but we are working on eliminating those as well.</p>

<p>Since we now know whether an object is a PPP at allocation time, MMTk keeps a list of PPP objects that are alive. Using that list, during a garbage collection cycle, it inspects every PPP object to determine the child objects that should be pinned before moving on to the phase that marks and moves objects. Since the set of PPP objects is now small, this pinning phase can be completed very quickly.</p>
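<p>Conceptually (in Ruby-flavored pseudocode; <code class="language-plaintext highlighter-rouge">ppp_objects</code>, <code class="language-plaintext highlighter-rouge">live?</code>, <code class="language-plaintext highlighter-rouge">references</code>, and <code class="language-plaintext highlighter-rouge">pin</code> are illustrative names, not MMTk’s actual API), the pre-pass looks like this:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Before the combined mark-and-move phase, walk only the (small) set of
# live PPP objects and pin every object they reference.
ppp_objects.each do |parent|
  next unless live?(parent)

  references(parent).each do |child|
    pin(child)  # a pinned object will not be moved during this GC cycle
  end
end
</code></pre></div></div>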

<h3 id="finalization">Finalization</h3>

<p>Before Ruby 3.2, all Ruby objects were allocated out of the garbage collector in fixed 40-byte slots. This meant that any additional data for the object needed to be allocated externally, usually through the system using <code class="language-plaintext highlighter-rouge">malloc</code>. In Ruby 3.2, we introduced <a href="https://shopify.engineering/ruby-variable-width-allocation">Variable Width Allocation</a> which allows us to allocate dynamic slot sizes through the garbage collector. However, because of legacy reasons and technical limitations of Variable Width Allocation, there are still many cases where we need to allocate memory out of the system through <code class="language-plaintext highlighter-rouge">malloc</code>.</p>

<p>One of the superpowers of MMTk is that it supports parallelism in the garbage collector. Unlike Ruby’s default garbage collector, MMTk can split the work that needs to be done during a GC cycle (marking, sweeping, moving, etc.) into small chunks (MMTk calls these “work packets”) and process these work packets in parallel across multiple CPU cores.</p>

<p>It’s important to note, however, that while MMTk can perform its GC work in parallel, it does not run concurrently with the VM. In that sense, MMTk is a parallelized GC implementation, but it is not a concurrent one: Ruby code cannot run while the garbage collector is running, so the Ruby VM must still be stopped.</p>

<p>There were many challenges that we had to overcome to move from a serial garbage collector to a parallel one, including removing dependence on thread-local variables and eliminating race conditions. Those issues at least showed up as crashes and unexpected behavior; we also ran into a trickier problem: our garbage collection cycles got slower the more threads we used!</p>

<p>This was counterintuitive: if each CPU core does less work, shouldn’t it run faster? We looked at performance profiles more closely and saw that it was the finalization phase that was slower. The finalization phase iterates over all dead objects to run code that reclaims memory or closes file descriptors. Specifically, we found that the culprit was <code class="language-plaintext highlighter-rouge">free</code>, the function that frees memory allocated through <code class="language-plaintext highlighter-rouge">malloc</code>. In the following table, we freed 100 million 32-byte pieces of memory using <code class="language-plaintext highlighter-rouge">free</code> and measured the time taken (in milliseconds) with the work split across a varying number of threads, using various implementations of <code class="language-plaintext highlighter-rouge">malloc</code>. glibc, jemalloc, and tcmalloc all scale negatively with the number of threads. The only allocator that offers any scalability is mimalloc, but we see little to no gain past a factor of 4. This is likely due to mimalloc’s design for a fast <code class="language-plaintext highlighter-rouge">free</code> that maximizes concurrency.</p>

<table>
  <thead>
    <tr>
      <th>Threads</th>
      <th>glibc (ms)</th>
      <th>jemalloc (ms)</th>
      <th>tcmalloc (ms)</th>
      <th>mimalloc (ms)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1</td>
      <td>1,263</td>
      <td>3,935</td>
      <td>4,988</td>
      <td>903</td>
    </tr>
    <tr>
      <td>2</td>
      <td>5,002</td>
      <td>11,719</td>
      <td>13,539</td>
      <td>493</td>
    </tr>
    <tr>
      <td>3</td>
      <td>5,787</td>
      <td>17,606</td>
      <td>11,374</td>
      <td>346</td>
    </tr>
    <tr>
      <td>4</td>
      <td>6,790</td>
      <td>22,478</td>
      <td>17,295</td>
      <td>265</td>
    </tr>
    <tr>
      <td>5</td>
      <td>8,058</td>
      <td> </td>
      <td>17,785</td>
      <td>291</td>
    </tr>
    <tr>
      <td>6</td>
      <td>7,473</td>
      <td> </td>
      <td>19,227</td>
      <td>243</td>
    </tr>
    <tr>
      <td>10</td>
      <td>9,400</td>
      <td> </td>
      <td>23,350</td>
      <td>230</td>
    </tr>
    <tr>
      <td>100</td>
      <td>11,260</td>
      <td> </td>
      <td>24,195</td>
      <td>228</td>
    </tr>
  </tbody>
</table>

<p>Another difference between MMTk and the default GC is that if an object does not require finalization (i.e. it does not have any resources that need to be reclaimed), then we don’t need to visit it at all, further improving performance. MMTk can use a bump pointer allocator, which increments a pointer for every allocation until it reaches the end of the allocation space. Meanwhile, the default GC in Ruby uses a freelist allocator, which uses a linked list of free slots to allocate objects into. Since building the freelist requires visiting all dead objects anyway, the default GC won’t be able to take advantage of this improvement.</p>
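<p>As a toy illustration (this is not how MMTk or CRuby actually implement allocation), the difference between the two strategies looks like this:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Bump pointer allocation: a cursor simply advances through the space, so
# dead objects never need to be visited to make room for new ones.
class BumpAllocator
  def initialize(capacity)
    @cursor = 0
    @capacity = capacity
  end

  def allocate(size)
    return nil if @cursor + size &gt; @capacity  # space exhausted

    address = @cursor
    @cursor += size
    address
  end
end

# Freelist allocation: new objects reuse previously freed slots, so the
# sweep phase must visit every dead object just to rebuild this list.
class FreelistAllocator
  def initialize
    @free_slots = []
  end

  def add_free_slot(address)  # called while sweeping a dead object
    @free_slots.push(address)
  end

  def allocate
    @free_slots.pop  # nil means we need to grow the heap or run GC
  end
end
</code></pre></div></div>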

<p>The solution to this challenge was to avoid using <code class="language-plaintext highlighter-rouge">malloc</code>. Instead, MMTk allocates the buffer for common types (<code class="language-plaintext highlighter-rouge">Array</code>, <code class="language-plaintext highlighter-rouge">String</code>, and <code class="language-plaintext highlighter-rouge">MatchData</code> objects) using hidden Ruby objects. Since these buffer objects are now Ruby objects, they are also allocated through MMTk. As a result, these buffers now have automatic memory management, rather than manual memory management like <code class="language-plaintext highlighter-rouge">malloc</code>. This means that <code class="language-plaintext highlighter-rouge">Array</code>, <code class="language-plaintext highlighter-rouge">String</code>, and <code class="language-plaintext highlighter-rouge">MatchData</code> need to mark their buffer objects to keep those buffers alive in the marking phase, but, in return, they no longer need to do anything during the finalization phase.</p>

<h2 id="future-work--conclusion">Future Work &amp; Conclusion</h2>

<p>In this blog post, we looked at a few of the challenges we encountered in integrating MMTk with Ruby and the solutions we used. We hope that sharing our experiences can provide insights for Ruby developers, garbage collector researchers, and language designers.</p>

<p>Work continues in MMTk’s fork of Ruby to experiment with more optimized memory layouts, new techniques for object movement, and integrations between JIT compilers and the garbage collector. We are also using the lessons we learned with MMTk to make improvements to upstream Ruby.</p>
</body></html>]]></content><author><name>Peter Zhu</name></author><category term="posts" /><category term="2025-09-16-reworking-memory-management-in-cruby" /><summary type="html"><![CDATA[Shopify sponsors and collaborates with academia to take Ruby to new heights. In this post, we give an overview of what we've built in collaboration with the Australian National University.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://railsatscale.com/2025-09-16-reworking-memory-management-in-cruby/3562da255a44c58e971e22633979facbb07865d2.png" /><media:content medium="image" url="https://railsatscale.com/2025-09-16-reworking-memory-management-in-cruby/3562da255a44c58e971e22633979facbb07865d2.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">How Ruby Executes JIT Code: The Hidden Mechanics Behind the Magic</title><link href="https://railsatscale.com/2025-09-08-how-ruby-executes-jit-code-the-hidden-mechanics-behind-the-magic/" rel="alternate" type="text/html" title="How Ruby Executes JIT Code: The Hidden Mechanics Behind the Magic" /><published>2025-09-08T00:00:00+00:00</published><updated>2025-09-08T00:00:00+00:00</updated><id>https://railsatscale.com/2025-09-08-how-ruby-executes-jit-code-the-hidden-mechanics-behind-the-magic/</id><content type="html" xml:base="https://railsatscale.com/2025-09-08-how-ruby-executes-jit-code-the-hidden-mechanics-behind-the-magic/"><![CDATA[<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<p>Ever since YJIT’s introduction, I’ve felt simultaneously close to and distant from Ruby’s JIT compiler. I know how to enable it in my Ruby programs. I know it makes my Ruby programs run faster by compiling parts of them into machine code. But my understanding of YJIT, or of Ruby JIT compilers in general, ended about there.</p>

<p>A few months ago, my colleague <a href="https://bernsteinbear.com/">Max Bernstein</a> wrote <a href="https://railsatscale.com/2025-05-14-merge-zjit/">ZJIT has been merged into Ruby</a> to explain how ZJIT compiles Ruby’s bytecode to HIR, LIR, and then to native code.
It sheds some light on how JIT compilers compile our programs, which is why I started to <a href="https://github.com/ruby/ruby/pulls?q=is%3Apr+author%3Ast0012+ZJIT+">contribute to ZJIT in July</a>.
But many questions remained unanswered until I dug into the source code and asked the JIT experts around me (<a href="https://bernsteinbear.com/">Max</a>, <a href="https://github.com/k0kubun">Kokubun</a>, and <a href="https://alanwu.space/">Alan</a>).</p>

<p>So I want to use this post to answer some questions and fill some mental gaps you might also have about JIT compilers for Ruby:</p>

<ol>
  <li><strong>Where does JIT-compiled code actually live?</strong></li>
  <li><strong>How does Ruby actually execute JIT code?</strong></li>
  <li><strong>How does Ruby decide what to compile?</strong></li>
  <li><strong>Why does JIT-compiled code fall back to the interpreter?</strong></li>
</ol>

<p>While we use ZJIT (Ruby’s experimental next-generation JIT) as our reference, these concepts apply equally to YJIT.</p>

<h2 id="where-jit-compiled-code-actually-lives">Where JIT-Compiled Code Actually Lives</h2>

<h3 id="ruby-iseqs-and-yarv-bytecode">Ruby ISEQs and YARV Bytecode</h3>

<p>When Ruby loads your code, it compiles each method into an Instruction Sequence (ISEQ) - a data structure containing <a href="https://en.wikipedia.org/wiki/YARV">YARV</a> (CRuby virtual machine) bytecode instructions.</p>

<p>(If you’re not familiar with YARV instructions or want to learn more, <a href="https://kddnewton.com/">Kevin Newton</a> wrote a <a href="https://kddnewton.com/2022/11/30/advent-of-yarv-part-0.html">great blog series</a> to introduce them)</p>

<p>Let’s start with a simple example:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">foo</span>
  <span class="n">bar</span>
<span class="k">end</span>

<span class="k">def</span> <span class="nf">bar</span>
  <span class="mi">42</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Running <code class="language-plaintext highlighter-rouge">ruby --dump=insn example.rb</code> shows us the bytecode:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>== disasm: #&lt;ISeq:foo@example.rb:1 (1,0)-(3,3)&gt;
0000 putself                                                          (   2)[LiCa]
0001 opt_send_without_block                 &lt;calldata!mid:bar, argc:0, FCALL|VCALL|ARGS_SIMPLE&gt;
0003 leave                                  [Re]

== disasm: #&lt;ISeq:bar@example.rb:5 (5,0)-(7,3)&gt;
0000 putobject                              42                        (   6)[LiCa]
0002 leave                                  [Re]
</code></pre></div></div>

<h3 id="jit-compiled-code-lives-on-iseq-too">JIT-Compiled Code Lives on ISEQ Too</h3>

<p>I assumed JIT-compiled code would replace bytecode—after all, native code is faster. But Ruby keeps both, for good reason.</p>

<p>Here’s what an ISEQ looks like initially:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ISEQ (foo method)
├── body
│   ├── bytecode: [putself, opt_send_without_block, leave]
│   ├── jit_entry: NULL  // No JIT code yet
│   ├── jit_entry_calls: 0  // Call counter
</code></pre></div></div>

<p>After the method is called repeatedly and gets JIT-compiled:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ISEQ (foo method)
├── body
│   ├── bytecode: [putself, opt_send_without_block, leave]  // Still here!
│   ├── jit_entry: 0x7f8b2c001000  // Pointer to native machine code
│   ├── jit_entry_calls: 35  // Reached compilation threshold
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">jit_entry</code> field is the gateway to native code. When it’s NULL, Ruby interprets bytecode. When it points to compiled code, Ruby can jump directly to machine instructions.
But the bytecode never goes away - Ruby needs it for de-optimization, which we will explore a bit later.</p>

<h2 id="the-execution-switch-from-bytecode-to-native-code">The Execution Switch: From Bytecode to Native Code</h2>

<p>This is easier than I expected. Since each ISEQ points to its JIT compiled code when it’s available, Ruby simply
checks the <code class="language-plaintext highlighter-rouge">jit_entry</code> field on every ISEQ it’s going to execute:</p>

<figure><img src="./jit-compiled-execution.svg" alt="JIT-compiled code execution"><figcaption>JIT-compiled code execution</figcaption></figure>

<p>When there’s no JIT code (<code class="language-plaintext highlighter-rouge">jit_entry</code> is NULL), it continues interpreting. Otherwise, it runs the compiled native code.</p>
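<p>In Ruby-flavored pseudocode (the real check happens in C inside the VM; <code class="language-plaintext highlighter-rouge">jump_to_native</code> and <code class="language-plaintext highlighter-rouge">interpret_bytecode</code> are made-up names for illustration), the decision is just:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def execute(iseq)
  if iseq.jit_entry                  # non-NULL once the method is compiled
    jump_to_native(iseq.jit_entry)   # run the compiled machine code
  else
    interpret_bytecode(iseq)         # fall back to the YARV interpreter
  end
end
</code></pre></div></div>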

<h2 id="how-ruby-decides-what-to-compile">How Ruby Decides What to Compile</h2>

<p>Ruby doesn’t compile methods randomly or all at once. Instead, methods earn compilation through repeated use. In ZJIT, this happens in two phases:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="p">(</span><span class="n">body</span><span class="o">-&gt;</span><span class="n">jit_entry</span> <span class="o">==</span> <span class="nb">NULL</span> <span class="o">&amp;&amp;</span> <span class="n">rb_zjit_enabled_p</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">body</span><span class="o">-&gt;</span><span class="n">jit_entry_calls</span><span class="o">++</span><span class="p">;</span>

    <span class="c1">// Phase 1: Profile the method</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">body</span><span class="o">-&gt;</span><span class="n">jit_entry_calls</span> <span class="o">==</span> <span class="n">rb_zjit_profile_threshold</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">rb_zjit_profile_enable</span><span class="p">(</span><span class="n">iseq</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Phase 2: Compile to native code</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">body</span><span class="o">-&gt;</span><span class="n">jit_entry_calls</span> <span class="o">==</span> <span class="n">rb_zjit_call_threshold</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">rb_zjit_compile_iseq</span><span class="p">(</span><span class="n">iseq</span><span class="p">,</span> <span class="nb">false</span><span class="p">);</span>
        <span class="c1">// After this, jit_entry points to machine code</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>As of now, ZJIT’s default profile threshold is <code class="language-plaintext highlighter-rouge">25</code> and compile threshold is <code class="language-plaintext highlighter-rouge">30</code> (both may change in the future). So a method’s lifecycle may look like this:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Calls:     0 ─────────── 25 ────────── 30 ─────────────────►
           │              │             │
Mode:      └─ Interpret ──┴── Profile ──┴─ Native Code (JIT compiled)
</code></pre></div></div>

<p>This is why we need to “warm up” the program before we get peak performance with a JIT.</p>

<h3 id="when-jit-code-gives-up-understanding-de-optimization">When JIT Code Gives Up: Understanding De-optimization</h3>

<p>JIT code makes assumptions to run fast. When those assumptions break, Ruby must “de-optimize” - return control to the interpreter. It’s a safety mechanism that ensures your code always produces correct results.</p>

<p>Consider this method:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">add</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">)</span>
  <span class="n">a</span> <span class="o">+</span> <span class="n">b</span>
<span class="k">end</span>
</code></pre></div></div>

<p>which would generate these instructions:</p>

<pre><code class="language-txt">== disasm: #&lt;ISeq:add@test.rb:1 (1,0)-(3,3)&gt;
0000 getlocal_WC_0                          a@0                       (   2)[LiCa]
0002 getlocal_WC_0                          b@1
0004 opt_plus                               &lt;calldata!mid:+, argc:1, ARGS_SIMPLE&gt;[CcCr]
0006 leave                                                            (   3)[Re]
</code></pre>

<p>Because Ruby doesn’t know ahead of time what operands <code class="language-plaintext highlighter-rouge">opt_plus</code> will be called with, the underlying C function <code class="language-plaintext highlighter-rouge">vm_opt_plus</code> needs to handle all the classes (like String, Array, Float, Integer, etc.) that can respond to <code class="language-plaintext highlighter-rouge">+</code>.</p>

<p>But if profiling shows <code class="language-plaintext highlighter-rouge">add</code> is always called with integers (Fixnums), a JIT compiler can generate optimized code that <em>only</em> handles integer addition. That code includes “guards” to check this assumption:</p>

<figure><img src="./jit-deopt.svg" alt="JIT type guard"><figcaption>JIT type guard</figcaption></figure>

<p>When the assumption is broken, like when <code class="language-plaintext highlighter-rouge">add(1.5, 2)</code> is called:</p>

<ol>
  <li>The guard check fails</li>
  <li>JIT code jumps to a “side exit”</li>
  <li>The side exit restores interpreter state (stack, instruction pointer, etc.)</li>
  <li>Control returns to the interpreter</li>
  <li>The interpreter executes <code class="language-plaintext highlighter-rouge">opt_plus</code> and calls the <code class="language-plaintext highlighter-rouge">vm_opt_plus</code> function</li>
</ol>
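<p>Put together, the specialized code behaves roughly like this Ruby-flavored pseudocode (not actual ZJIT output; <code class="language-plaintext highlighter-rouge">side_exit</code> and <code class="language-plaintext highlighter-rouge">fixnum_add</code> are illustrative names):</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def add_compiled(a, b)
  # Guards: the fast path is only valid when both operands are Fixnums.
  return side_exit(:opt_plus) unless a.is_a?(Integer) &amp;&amp; b.is_a?(Integer)

  fixnum_add(a, b)  # specialized integer addition, no dynamic dispatch
end
</code></pre></div></div>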

<p>Other triggers for falling back include:</p>

<ul>
  <li>
<strong>TracePoint activation</strong> - TracePoint needs bytecode execution for properly emitting events (more details below)</li>
  <li>
<strong>Redefined core methods</strong> - Someone changed what <code class="language-plaintext highlighter-rouge">+</code> means on Integer</li>
  <li>
<strong>Ractor usage</strong> - Running multiple Ractors changes the behaviour of some YARV instructions, so the compiled code could behave differently from the interpreter in that situation</li>
</ul>

<p>These assumption checks, or patch points as we call them in ZJIT, make sure your program performs correctly when any of the assumptions change.</p>

<h2 id="answering-some-additional-questions">Answering Some Additional Questions</h2>

<p><strong>Why does enabling TracePoint slow everything down?</strong></p>

<p>(<a href="https://docs.ruby-lang.org/en/master/TracePoint.html">TracePoint</a> is a Ruby class that can be used to register callbacks on specific Ruby execution events. It’s commonly used in debugging/development tools.)</p>

<p>Most of TracePoint’s events are triggered by corresponding YARV bytecode. When TracePoint is activated, instructions in ISEQs will be replaced with their <code class="language-plaintext highlighter-rouge">trace_*</code> counterparts. For example, <code class="language-plaintext highlighter-rouge">opt_plus</code> is replaced with <code class="language-plaintext highlighter-rouge">trace_opt_plus</code>.</p>

<p>If Ruby only executes the compiled machine code, then those events wouldn’t be triggered correctly. Therefore, when ZJIT and YJIT compilers detect TracePoint’s activation, they immediately throw away the optimized code to force Ruby to interpret YARV instructions instead.</p>
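<p>For example, enabling even a simple TracePoint like the one below is enough to make them discard their compiled code:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Print every method call while the trace is enabled.
trace = TracePoint.new(:call) do |tp|
  puts "#{tp.defined_class}##{tp.method_id}"
end

trace.enable
add(1, 2)      # runs through the interpreter so the :call event can fire
trace.disable
</code></pre></div></div>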

<p><strong>Why doesn’t Ruby just compile everything?</strong></p>

<p>Many methods are called rarely. Compiling them would waste memory and compilation time for no performance benefit. Also, compiling methods without profiling would mean that JIT compilers either make wrong assumptions that get invalidated quickly, or make assumptions that aren’t specific enough and miss further optimization opportunities.</p>

<h2 id="final-notes">Final Notes</h2>

<p>I hope this post helped you understand JIT compilers, a now essential part of Ruby, a little bit more.</p>

<p>If you want to learn more about Ruby’s new JIT compiler: ZJIT, I highly recommend giving <a href="https://railsatscale.com/2025-05-14-merge-zjit/">ZJIT has been merged into Ruby</a> a read.
And if you want to learn more about Ruby’s YARV instructions, <a href="https://kddnewton.com/">Kevin Newton</a>’s <a href="https://kddnewton.com/2022/11/30/advent-of-yarv-part-0.html">Advent of YARV series</a> is the best resource.</p>

</body></html>]]></content><author><name>Stan Lo</name></author><category term="posts" /><category term="2025-09-08-how-ruby-executes-jit-code-the-hidden-mechanics-behind-the-magic" /><summary type="html"><![CDATA[Where does JIT-compiled code live? How does Ruby switch between bytecode and native execution? Why does TracePoint slow everything down? This post answers the JIT questions most Ruby developers have but rarely see explained.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://railsatscale.com/2025-09-08-how-ruby-executes-jit-code-the-hidden-mechanics-behind-the-magic/9538a6aafba35405a437f22d8b09eafcac9618af.png" /><media:content medium="image" url="https://railsatscale.com/2025-09-08-how-ruby-executes-jit-code-the-hidden-mechanics-behind-the-magic/9538a6aafba35405a437f22d8b09eafcac9618af.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>