Rails’s Swappable Migration Backend for Schema Changes at Scale

This post explores Rails’s swappable migration backend, a little-known feature that lets applications customize how migrations run. At Shopify, we relied on monkey patches and a brittle SQL parser to make Rails migrations work with our Schema Migrations Service. We developed the swappable backend feature to more simply adapt Rails’s migration runner to our needs. We’ll cover why and how we built this, and how Shopify uses it to power database migrations at scale.

At Shopify, we run hundreds of database migrations across many Rails applications every week. Each migration needs to be vetted for safety and executed in a way that doesn’t cause downtime for our merchants. For years, we relied on bespoke tooling and LHMs to perform online schema changes at scale. In 2021, Shopify’s database team began designing a new, centralized system for running schema migrations, the Schema Migrations Service. One of their goals was to enable developers to use vanilla Rails migrations to perform schema changes safely and with zero downtime.

Our database team built the schema migrations gem to solve this problem, but the implementation wasn’t simple. The gem relied on monkey patches and a complicated RACC parser to handle safety checking migrations and submitting them to the schema migrations service. Shopify’s Rails Infrastructure team took the opportunity to build something into the framework that would help us address our schema migration needs more elegantly. We built the swappable migration backend (available as of Rails 7.1) to give applications flexibility over how their migrations execute. Let’s dive into how Shopify uses this feature to power safe database migrations at scale.

Why Production Migrations Require a Different Approach

When you run bin/rails db:migrate in development, Rails executes your migration methods directly against the database. Each call to create_table, add_column, or add_index immediately translates to SQL that modifies your schema. This works great for local development, but at Shopify’s scale, we can’t afford to run schema changes this way in production.

LHM (Large Hadron Migrator) is a tool for performing online schema migrations. This means that migrations can be performed without locking tables, enabling the system to stay up while the migration is running. We used LHMs for many years to perform schema changes without downtime, but this also meant that we couldn’t use Rails’s native migration API.

Shopify’s database team decided to build a Schema Migrations Service to allow developers to return to using vanilla Rails migrations, while ensuring that schema changes were still performed online behind the scenes. The idea was also to improve the developer experience around migrations by:

Requiring migrations to pass safety checks before execution (e.g. blocking column-change operations, ensuring a migration only operated on a single table, etc.).
Submitting migrations to a centralized manager to more easily orchestrate schema changes across multiple database shards, with better testing and retry behaviour.
Providing developers with more insight into which migrations were running, their progress, etc. from a comprehensive UI.

The schema migrations gem built by our DB team handled safety checking migrations and submitting them to the centralized manager. The initial implementation, however, relied heavily on monkey patches to existing migration codepaths in Rails. Rather than executing migration SQL, the gem patched Rails to capture any SQL statements. It relied on a RACC parser to extract schema change operations from the SQL, safety check them, and then transform them into a JSON DDL (Data Definition Language) to be sent to the manager.

The Rails Infrastructure team realized that this was a great opportunity to make Rails’s migration execution more flexible, so that we could meet Shopify’s schema migration needs without monkey patching a bunch of code or maintaining a complicated RACC parser.

Building Rails’s Swappable Migration Strategy

When we started this project in early 2022, we explored several approaches that would allow us to move away from monkey patching Rails in the gem. One idea was to use static analysis, and try to parse migration files without running them. Another was to propose schema definition objects for every migration operation, where Rails would expose Ruby representations of schema changes (like AddColumnDefinition, CreateTableDefinition, etc.) that could be translated into any format: SQL, JSON, or otherwise.

The Rails Core team had concerns about the complexity that schema definitions would introduce to Active Record, and we soon pivoted to a simpler approach: the strategy pattern. Instead of fundamentally changing how migrations represent schema changes, we’d introduce an intermediary object between migrations and the connection adapter that could customize execution behavior. This was a cleaner abstraction that solved our problem without requiring massive changes to Active Record’s internals.

In June 2022, we opened a pull request to Rails proposing this “execution strategy” pattern for migrations. The PR introduced a strategy object between the Migration class and the connection adapter. Instead of migrations directly delegating schema statement commands to the connection via method_missing, they would delegate to a strategy object that could be swapped out.

For example, suppose you call a method like create_table in a migration. Rails routes that call through a migration strategy object, which, by default, is ActiveRecord::Migration::DefaultStrategy:

module ActiveRecord
  class Migration
    class DefaultStrategy < ExecutionStrategy
      private
        def method_missing(method, ...)
          connection.send(method, ...)
        end

        def respond_to_missing?(method, include_private = false)
          connection.respond_to?(method, include_private) || super
        end

        def connection
          migration.connection
        end
    end
  end
end
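To see the whole delegation chain in one place without booting Rails, here is a minimal plain-Ruby sketch. FakeConnection, TinyMigration, and the Sketch* classes are hypothetical stand-ins invented for illustration; only the shape of the delegation (migration, then strategy, then connection) mirrors the Rails code above.

```ruby
# Plain-Ruby sketch of the delegation chain: migration -> strategy -> connection.
# Every class here is a hypothetical stand-in, not a Rails class.

class FakeConnection
  attr_reader :executed

  def initialize
    @executed = []
  end

  # Stands in for a schema statement that would normally run SQL.
  def create_table(name)
    @executed << "CREATE TABLE #{name}"
  end
end

class SketchExecutionStrategy
  def initialize(migration)
    @migration = migration
  end

  private
    attr_reader :migration
end

class SketchDefaultStrategy < SketchExecutionStrategy
  private
    # Forward any method the connection understands to the connection.
    def method_missing(method, *args, **kwargs, &block)
      return super unless connection.respond_to?(method)
      connection.public_send(method, *args, **kwargs, &block)
    end

    def respond_to_missing?(method, include_private = false)
      connection.respond_to?(method, include_private) || super
    end

    def connection
      migration.connection
    end
end

class TinyMigration
  attr_reader :connection

  def initialize(connection, strategy_class = SketchDefaultStrategy)
    @connection = connection
    @strategy = strategy_class.new(self)
  end

  def change
    create_table(:users)
  end

  private
    # The migration does not define create_table itself; it hands the call
    # to whichever strategy object it was constructed with.
    def method_missing(method, *args, **kwargs, &block)
      return super unless @strategy.respond_to?(method)
      @strategy.send(method, *args, **kwargs, &block)
    end

    def respond_to_missing?(method, include_private = false)
      @strategy.respond_to?(method, include_private) || super
    end
end

connection = FakeConnection.new
TinyMigration.new(connection).change
puts connection.executed.inspect
```

Because the migration only ever talks to the strategy, passing a different strategy class to TinyMigration.new changes what happens to create_table without touching the migration itself.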

The default strategy sends migration methods to the connection, which executes SQL against your database. This is how migrations worked before, so most Rails developers are unaware that there’s now a strategy object working behind the scenes! However, the migration strategy class can be configured to customize how migrations are executed. As of Rails 7.1, you can set config.active_record.migration_strategy in your environment configuration (for example, in config/environments/production.rb). Pass it either a class object or a string with the class name:

# lib/custom_migration_strategy.rb

class CustomMigrationStrategy < ActiveRecord::Migration::DefaultStrategy
  def drop_table(*)
    raise "Dropping tables is not supported!"
  end
end

# config/environments/production.rb

Rails.application.configure do
  config.active_record.migration_strategy = CustomMigrationStrategy
end

Now, when you run bin/rails db:migrate, Rails will delegate all migration methods to your custom strategy, giving you complete control over how migrations are executed.
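The effect of overriding a single method can be sketched in plain Ruby. In this hypothetical version the strategy wraps a connection directly (in Rails it wraps the migration), and LoggingConnection, SketchDefaultStrategy, and NoDropStrategy are names made up for the example.

```ruby
# Sketch: a custom strategy overrides one operation and inherits the rest.
# All classes are hypothetical stand-ins; the strategy wraps a connection
# directly to keep the example short.

class LoggingConnection
  attr_reader :log

  def initialize
    @log = []
  end

  def add_column(table, column, type)
    @log << [:add_column, table, column, type]
  end

  def drop_table(table)
    @log << [:drop_table, table]
  end
end

class SketchDefaultStrategy
  def initialize(connection)
    @connection = connection
  end

  private
    def method_missing(method, *args, &block)
      return super unless @connection.respond_to?(method)
      @connection.public_send(method, *args, &block)
    end

    def respond_to_missing?(method, include_private = false)
      @connection.respond_to?(method, include_private) || super
    end
end

class NoDropStrategy < SketchDefaultStrategy
  # Defined explicitly, so this call never reaches method_missing.
  def drop_table(*)
    raise "Dropping tables is not supported!"
  end
end

connection = LoggingConnection.new
strategy = NoDropStrategy.new(connection)

strategy.add_column(:users, :nickname, :string) # delegated to the connection

blocked_message = nil
begin
  strategy.drop_table(:users)
rescue RuntimeError => e
  blocked_message = e.message
end

puts connection.log.inspect
puts blocked_message
```

Only drop_table is intercepted; every other schema statement still falls through to the connection, which is why subclassing the default strategy is a convenient starting point.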

Note: Outside of production, you will likely want to stick with the default strategy. This lets you use advanced migration tooling in production while keeping local development fast and simple. We do this at Shopify.

Serializing Production Migrations to JSON

Once Rails supported swappable migration backends, we implemented a custom strategy that serialized migrations as JSON, making them easy to submit to a remote manager. To accomplish this, our gem introduced a JsonSerializationStrategy class. This class implemented each schema change method available in migrations, using Rails’s schema definition APIs to build the necessary schema objects. We then converted these objects into JSON payloads that described each schema operation. Here’s an example of how we capture create_table operations:

class JsonSerializationStrategy < ActiveRecord::Migration::DefaultStrategy
  attr_reader :operations

  # Rails instantiates the strategy with the migration; the inherited private
  # `connection` helper resolves to `migration.connection`.
  def initialize(migration)
    super
    @operations = []
  end

  def create_table(...)
    td = connection.build_create_table_definition(...)
    ddl = connection.schema_creation.accept(td)
    definition = extract_table_definition(td.name, ddl)

    operations << {
      type: :sql,
      op: :create_table,
      params: {
        name: td.name,
        definition: definition,
      },
    }
  end

  private

  def extract_table_definition(table_name, ddl)
    table_name_pattern = /^CREATE TABLE #{connection.quote_table_name(table_name.to_s)} /
    ddl.sub(table_name_pattern, "")
  end
end
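To make the prefix-stripping step concrete, here is a small standalone sketch of what extract_table_definition does to a DDL string. The quote_table_name helper below is a hypothetical stand-in that hard-codes MySQL-style backtick quoting, and the sample DDL is invented for illustration; this variant also wraps the quoted name in Regexp.escape, which the snippet above does not do.

```ruby
# Sketch of the prefix-stripping step with hard-coded MySQL-style quoting.
# quote_table_name is a hypothetical stand-in for the adapter method.

def quote_table_name(name)
  "`#{name}`"
end

def extract_table_definition(table_name, ddl)
  # Regexp.escape guards against table names containing regex metacharacters.
  quoted = Regexp.escape(quote_table_name(table_name.to_s))
  ddl.sub(/\ACREATE TABLE #{quoted} /, "")
end

ddl = "CREATE TABLE `users` (`id` bigint NOT NULL AUTO_INCREMENT PRIMARY KEY, `name` varchar(255))"
definition = extract_table_definition(:users, ddl)
puts definition
```

What remains is just the parenthesized column list and any trailing table options, which is what ends up in the definition field of the JSON payload alongside the table name.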

Here’s a simplified look at how migrations are run in production, using the swappable strategy:

class ExternalMigrationsRunner
  def upload_migration(migration)
    # Run the migration, but since we're using the JsonSerializationStrategy,
    # we won't execute SQL; instead, the strategy captures all operations as JSON
    runnable_migration = migration.migration_class.new
    if runnable_migration.respond_to?(:change)
      runnable_migration.change
    elsif runnable_migration.respond_to?(:up)
      runnable_migration.up
    end

    # Extract the serialized operations from the strategy
    operations = runnable_migration.execution_strategy.operations

    # Upload to the migrations service via API
    ApiClient.upload_migration(
      name: migration.name,
      database: database_name,
      identifier: migration.version,
      operations: operations,  # JSON representation of schema changes
      table_name: migration.table,
      author: migration.author
    )
  end
end
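The capture-then-upload flow can be approximated end to end in plain Ruby. The sketch below is an assumption-laden miniature: CapturingStrategy, CreateWidgets, and build_payload are hypothetical names, the migration calls its strategy explicitly rather than through method_missing, and the payload fields are a simplified subset of those shown above.

```ruby
require "json"

# Sketch: a capturing strategy records operations instead of executing them,
# and a helper turns the result into a JSON payload. All names are hypothetical.

class CapturingStrategy
  attr_reader :operations

  def initialize
    @operations = []
  end

  def create_table(name, **options)
    record(:create_table, name: name.to_s, options: options)
  end

  def add_index(table, columns, **options)
    record(:add_index, table: table.to_s, columns: Array(columns).map(&:to_s), options: options)
  end

  private

  def record(op, **params)
    @operations << { type: :sql, op: op, params: params }
  end
end

class CreateWidgets
  attr_reader :execution_strategy

  def initialize(execution_strategy)
    @execution_strategy = execution_strategy
  end

  def change
    execution_strategy.create_table(:widgets, id: :bigint)
    execution_strategy.add_index(:widgets, :sku, unique: true)
  end
end

def build_payload(migration, version:)
  migration.change # no SQL runs; the strategy only records
  JSON.generate({
    name: migration.class.name,
    identifier: version,
    operations: migration.execution_strategy.operations
  })
end

payload = build_payload(CreateWidgets.new(CapturingStrategy.new), version: 20240101000000)
puts payload
```

The migration body is the same whether it executes SQL or is serialized; only the strategy object decides which of the two happens.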

Configuring the Migration Strategy Automatically

Rather than requiring each application to configure the migration strategy in their config file for production, the schema migrations gem leveraged an initializer to set this automatically:

# lib/schema_migrations/railtie.rb
require "rails/railtie"

class Railtie < Rails::Railtie
  ...

  initializer "schema_migrations.migration_strategy_config" do |app|
    next unless Rails.env.production?

    app.config.active_record.migration_strategy = JsonSerializationStrategy
  end
end

This initializer ensures that any application that includes the schema migrations gem has its migrations intercepted and serialized in production environments.

Reimagining Safety Checks: From SQL Parsing to Runtime Analysis

While working on the upstream strategy feature, our team was simultaneously tackling another critical problem: safety checks. Before any migration runs in production at Shopify, the gem performs safety checks to catch common mistakes that could cause downtime, such as:

Adding a NOT NULL column without a default value (check out this blog post if you’re interested in learning more)
Renaming a column (breaks downstream consumers)
Changing a column type in an incompatible way

These checks run in development too, giving developers immediate feedback before they deploy.
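As an illustration of the first check in that list, here is a rough sketch of how a NOT NULL without default rule might be expressed over recorded operations. RecordedOperation, UnsafeMigrationError, and NotNullWithoutDefaultCheck are hypothetical names, and the rule is deliberately simplified; it is not the gem's actual implementation.

```ruby
# Sketch of a "NOT NULL column without a default" check.
# All names are hypothetical; the rule is intentionally simplified.

RecordedOperation = Struct.new(:name, :args, :options, keyword_init: true)

class UnsafeMigrationError < StandardError; end

class NotNullWithoutDefaultCheck
  def initialize(operations)
    @operations = operations
  end

  def check
    offenders = @operations.select { |op| unsafe_add_column?(op) }
    return if offenders.empty?

    # args are [table, column, type] for add_column.
    columns = offenders.map { |op| "#{op.args[0]}.#{op.args[1]}" }
    raise UnsafeMigrationError,
      "NOT NULL columns need a default: #{columns.join(', ')}"
  end

  private

  def unsafe_add_column?(op)
    op.name == :add_column &&
      op.options[:null] == false &&
      !op.options.key?(:default)
  end
end

safe = RecordedOperation.new(
  name: :add_column, args: [:users, :nickname, :string],
  options: { null: false, default: "" }
)
unsafe = RecordedOperation.new(
  name: :add_column, args: [:users, :age, :integer],
  options: { null: false }
)

NotNullWithoutDefaultCheck.new([safe]).check # passes silently

error_message = nil
begin
  NotNullWithoutDefaultCheck.new([safe, unsafe]).check
rescue UnsafeMigrationError => e
  error_message = e.message
end
puts error_message
```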

The old implementation of the gem’s safety checker relied on a RACC parser to analyze SQL strings, which was brittle: every time SQL syntax changed or we encountered a new edge case, the parser had to be updated. We also wanted safety checking to be a standalone workflow, separate from executing migrations and submitting them to the manager, so we couldn’t rely on the migration strategy for it. Instead, we settled on an approach that let us drop the RACC parser and remove much of the complexity. We developed a MigrationOperationRecorder that “runs” a migration and records all method calls performed:

class MigrationOperationRecorder
  def initialize(migration_class)
    @migration = migration_class.new
  end

  def record
    singleton_class = @migration.singleton_class
    singleton_class.include(RecordMigrationOperations)

    if @migration.respond_to?(:change)
      @migration.change
    elsif @migration.respond_to?(:up)
      @migration.up
    end

    @migration.method_calls
  end
end

The RecordMigrationOperations module works by leveraging the same method_missing mechanism that Rails uses for migrations. Since ActiveRecord::Migration uses method_missing to route commands to the execution strategy, we define RecordMigrationOperations#method_missing to store the method call instead:

module RecordMigrationOperations
  def method_missing(method, *args, **options, &block)
    # Use the memoized reader: @method_calls is nil until method_calls is called.
    method_calls << MigrationOperation.new(
      method: method,
      args: args,
      options: options
    )
  end

  def method_calls
    @method_calls ||= []
  end
end
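The recording trick can be tried outside Rails. In this plain-Ruby sketch, SketchMigration plays the role of the framework base class whose method_missing would normally execute the call, and RecordCalls and RecordedCall are hypothetical names; the point is only that a module included into the instance's singleton class takes precedence over the class's own method_missing.

```ruby
# Plain-Ruby sketch of recording DSL calls through method_missing.
# SketchMigration, AddNicknameToUsers, RecordCalls, and RecordedCall are
# hypothetical names used only for this example.

RecordedCall = Struct.new(:name, :args, :options, keyword_init: true)

module RecordCalls
  def method_calls
    @method_calls ||= []
  end

  private

  # Store the call instead of executing it.
  def method_missing(name, *args, **options, &block)
    method_calls << RecordedCall.new(name: name, args: args, options: options)
    nil
  end
end

class SketchMigration
  private

  # Stand-in for the framework behaviour, which would run the statement.
  def method_missing(name, *args, **options, &block)
    raise "would execute #{name} against the database"
  end
end

class AddNicknameToUsers < SketchMigration
  def change
    add_column :users, :nickname, :string, null: false, default: ""
    add_index :users, :nickname, unique: true
  end
end

migration = AddNicknameToUsers.new
# Modules included into the singleton class sit ahead of the migration's own
# class in the method lookup chain, so RecordCalls#method_missing wins.
migration.singleton_class.include(RecordCalls)
migration.change

migration.method_calls.each do |call|
  puts "#{call.name} #{call.args.inspect} #{call.options.inspect}"
end
```

Without the include, calling change on the same class raises, which shows that the recorder replaces execution rather than running alongside it.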

Once operations are recorded, individual safety checks can inspect the migration data. Here’s an example of the SingleTableCheck:

class SingleTableCheck < BaseSafetyCheck
  def initialize(migration)
    @inspected_migration = migration
  end

  def check
    # @inspected_migration is a specialized object containing info
    # about all of the operations the migration performs, as returned
    # from MigrationOperationRecorder#record
    tables = @inspected_migration.tables

    return if tables.one?

    raise SafetyCheckError,
      "You must work with exactly one table per migration. " \
      "Split tables #{tables.to_sentence} into #{tables.length} migrations."
  end
end

This check accesses @inspected_migration.tables, which is extracted during the analysis phase, and validates that exactly one table is involved. If the check fails, it raises a SafetyCheckError with a clear message telling developers how to fix the issue.
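How the list of tables might be derived from recorded operations is not shown above, so the following is only a guess at the general idea. It assumes the table name is the first positional argument of each recorded call, uses a plain join in place of Active Support's to_sentence, and all names (Operation, tables_for, check_single_table!) are hypothetical.

```ruby
# Sketch: derive the touched tables from recorded operations, then enforce
# the one-table rule. Names are hypothetical stand-ins.

Operation = Struct.new(:name, :args, keyword_init: true)

class SafetyCheckError < StandardError; end

def tables_for(operations)
  # Assumes the table name is the first positional argument, which holds for
  # calls such as add_column, add_index, and create_table.
  operations.map { |op| op.args.first.to_s }.uniq
end

def check_single_table!(operations)
  tables = tables_for(operations)
  return if tables.one?

  raise SafetyCheckError,
    "You must work with exactly one table per migration. " \
    "Split tables #{tables.join(', ')} into #{tables.length} migrations."
end

single = [
  Operation.new(name: :add_column, args: [:users, :nickname, :string]),
  Operation.new(name: :add_index, args: [:users, :nickname])
]
mixed = single + [Operation.new(name: :add_column, args: [:orders, :note, :text])]

check_single_table!(single) # passes silently

failure = nil
begin
  check_single_table!(mixed)
rescue SafetyCheckError => e
  failure = e.message
end
puts failure
```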

Why Not Use a Migration Strategy for Safety Checking?

You might wonder why we used method_missing for the MigrationOperationRecorder instead of creating another strategy pattern. Couldn’t we use our newly built feature for safety checking? The answer comes down to separation of concerns and simplicity. Safety checking and migration execution serve different purposes:

Migration execution needs to be swappable because different environments (development vs. production) require different behaviours. In development, we execute SQL directly. In production, we serialize to JSON and submit to a remote service.
Safety checking needs to happen the same way everywhere. We’re analyzing which operations the migration is performing, not executing schema changes. The same safety checks run in development, CI, and production.

Using method_missing for safety checks gives us a simpler implementation that automatically captures all migration DSL methods without needing to explicitly enumerate them all. A strategy pattern would have required us to implement every migration method explicitly. Given that we only wanted to record the migration methods being called and their arguments, opting for a simpler method_missing approach made more sense.

Per-Adapter Migration Strategies

One challenge with using a global migration strategy is that it’s insufficient for applications using multiple database systems. Since its inception, Shopify has primarily used MySQL, but more recently we’ve been exploring running non-MySQL databases. Different databases have different requirements for how migrations should be serialized, which means that the migration strategy needs to be tailored to the database the migrations are running against.

We could make this work by having our gem’s migration strategy inspect the database adapter at runtime and dispatch to the appropriate serialization logic. This is not ideal, though; we’re reimplementing adapter dispatch logic that Rails can handle natively. It felt like this was a missing piece in our upstream solution, so last month, we opened a PR to add per-adapter migration strategies to Rails. This feature will be available in Rails 8.2.

Instead of setting one global strategy:

config.active_record.migration_strategy = JsonSerializationStrategy

You can now register strategies directly on adapter classes:

ActiveSupport.on_load(:active_record_trilogyadapter) do
  ActiveRecord::ConnectionAdapters::TrilogyAdapter.migration_strategy =
    MysqlStrategy
end

ActiveSupport.on_load(:active_record_postgresqladapter) do
  ActiveRecord::ConnectionAdapters::PostgreSQLAdapter.migration_strategy =
    PostgreSQLStrategy
end

Rails automatically selects the correct strategy based on the database adapter in use for each migration. For example, if you’re running migrations against a MySQL database configured with the Trilogy adapter, Rails chooses MysqlStrategy. If your migrations are running against a PostgreSQL database, Rails selects PostgreSQLStrategy. If the current adapter does not have a strategy configured, Rails will fall back to using the global strategy.
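The lookup order described here (adapter-specific strategy first, global strategy as the fallback) can be modelled in a few lines of plain Ruby. This is a sketch of the described behaviour only, not Rails's implementation; the Sketch* adapters, SketchConfig, GlobalStrategy, and strategy_for are all hypothetical.

```ruby
# Sketch of per-adapter strategy lookup with a global fallback.
# This models the described behaviour; it is not Rails's implementation.

class GlobalStrategy; end
class MysqlStrategy; end
class PostgreSQLStrategy; end

class SketchAdapter
  class << self
    # Class-level instance variable: each adapter class has its own value,
    # and subclasses start out with nil.
    attr_accessor :migration_strategy
  end
end

class SketchTrilogyAdapter < SketchAdapter; end
class SketchPostgreSQLAdapter < SketchAdapter; end
class SketchSQLiteAdapter < SketchAdapter; end

module SketchConfig
  class << self
    attr_accessor :global_migration_strategy
  end
end

def strategy_for(adapter_class)
  adapter_class.migration_strategy || SketchConfig.global_migration_strategy
end

SketchConfig.global_migration_strategy = GlobalStrategy
SketchTrilogyAdapter.migration_strategy = MysqlStrategy
SketchPostgreSQLAdapter.migration_strategy = PostgreSQLStrategy

[SketchTrilogyAdapter, SketchPostgreSQLAdapter, SketchSQLiteAdapter].each do |adapter|
  puts "#{adapter.name} -> #{strategy_for(adapter).name}"
end
```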

Making Rails Work for You

One of Rails’s design philosophies is convention over configuration. The majority of Rails apps don’t need to think about how their Rails migrations are performed, so we keep things simple with a default migration strategy. When an application needs to customize how its migrations run, the framework provides a clear extension point. Applications can opt into configurable behaviour as their requirements evolve.

This is also a story about how working in the open benefits everyone. We could have kept our monkey patches internal to Shopify, continuing to patch Rails as needed. Instead, we built a more maintainable solution for ourselves while also providing the Rails community with a new tool for customizing migration behaviour. If you’re running into limitations with Rails for your specific use case, consider whether there’s an opportunity for an upstream contribution that could solve your problem while benefitting the rest of the community.