How We Used a SQLite Memory DB for Rails Benchmarking

Here at Shopify, we recently adapted the Lobste.rs source code into a Rails benchmark for YJIT. But we want very little database time in our benchmarks, since we’re trying to optimise Ruby. One way to do that is to use SQLite rather than a separate database like MySQL or PostgresSQL. In fact, faster yet is an in-memory SQLite database. It doesn’t even write to disk. But that’s an unusual setup, and wants some code to set it up. Want to see how we did it?

But Where Does Data Come From?

An in-memory database isn’t shared between processes. Instead, there needs to be one process, with its own database.

So that means no database migrations since they’d get their own separate blank database. It means no running rake db:seed since the seed data would evaporate the moment rake finished.

So initially we configured SQLite the normal way with a production.sqlite3 file on disk. We ran Lobste.rs’ surprisingly cool equivalent of db:seed to fill it with fake data. And then we had a nice on-disk database sitting around.

Which we couldn’t use, of course. The in-memory database starts out blank every time a new process happens.

To create your own on-disk database, use a standard SQLite config/database.yml with a filename for the database rather than “:memory:”. Then you could run bin/rails db:create db:migrate db:seed and then copy the resulting file. You’ll see what we did with that file below.

But Isn’t a Rails App Multiple Processes?

We know we can’t use migrations or db:seed. But isn’t the app already multiple processes? Depends how you do it.

We used a small, tight benchmarking setup. We didn’t use a real application server like Puma or Unicorn, and we definitely didn’t use multiple Rails servers. Just one process.

We didn’t have a separate load-test process. Rack, Ruby’s low-level HTTP server API, has a small, clean API for creating mock web requests and we built those in advance to send into the app. That way there’s a minimum of multi-process communication, just hashes that we send into a method call, all in one process.

And in Rack, running your app is literally just calling it like a method, though we wrapped it in another small method to track cookies from request to request.

This wouldn’t work if we were trying to measure performance for the app server, or socket overhead, or HTTP parsing. But that’s exactly what we didn’t want for this benchmark. We wanted to measure the Rails framework and the application. This setup was perfect.

Does the Data Just Magically Show Up?

This is Rails. Doesn’t it magically fix anything we stop paying attention to? Alas, no.

Instead we need to load the schema and the data when the process starts up. You can dump the database as SQL and execute it. That’s what we did at first. But there’s a better way.

SQLite has an “online backup” operation that fits what we want perfectly. You can back up one SQLite database into another. Ruby’s SQLite3 gem exposes the operation so we can use it.

And so every time we start up we can “back up” the on-disk SQLite binary file into the in-memory database before the benchmark starts. It takes a few seconds, but then we don’t have to do the same thing during the benchmark.

Enough Exposition, This Isn’t a Recipe Web Site

So what does the code look like?

We need a database.yml for the memory-only configuration:

# config/database.yml
production:
  adapter: sqlite3
  pool: 5
  timeout: 5000
  database: ":memory:"

We also need to load the disk database into the memory database:

mem_db = ActiveRecord::Base.connection.raw_connection
file_db = SQLite3::Database.new('db/production.sqlite3')  # Use your filename here
b = SQLite3::Backup.new(mem_db, 'main', file_db, 'main')
b.step(-1) # import until finished
b.finish # destroy the Backup object

And mostly that’s it. Just keep in mind that you’ll have to execute the DB backup code on every process start. You could also use this method for unit tests or other situations where you want really fast DB access and don’t mind a bit of startup delay. But SQLite3::Backup is going to be faster than parsing YAML and inserting via Active Record!