Let's Have an Unhurried Conversation about Optimizing Queries in Rails

Learn about ActiveRecord callbacks & the mechanisms rails has in place for getting around them

by Dan McClory

tl;dr: This is a post about ways to temporarily circumvent ActiveRecord callbacks in Rails. There are tons of questions about this on Stack Overflow but no article comprehensively compares the options. We'll go over this flowchart:

Flow chart to help you decide when its a good time to write SQL

Here's a problem we had: On one of our projects, we needed to be able to upload a CSV file and update about 5,000 records at a time. This is not a staggering amount of data, it's actually a pretty small amount. But though the update worked, it was slow. So slow. Like, ten minutes slow. Even though the data was stored, the app also timed out and we could see that the logs were spewing hundreds and hundreds of redundant looking queries.

After we looked at it for a few minutes, we realized it was slow because we had a bunch of ActiveRecord callbacks that were redundantly triggering other callbacks on associated models.

In the normal flow of work, the callbacks perform aggregations of data that we needed, e.g: total of different categories, totals of those categories. They worked fine when we changed one record at time. But when we were bulk updating a ton of records, it became A PROBLEM.

Because we only needed to compute the aggregations once, after everything had been loaded, we knew that all of these callback executions were completely redundant. So we explored a number of ways to get around them.

Generic Example code: Blockbuster Edition

For the rest of this article, we'll use a system that models a DVD inventory for a retail chain of DVD rental shops.

class Dvd < ApplicationRecord
  belongs_to :retail_location

  # also has the property :number_of_copies

  after save: :update_inventory_count

  def update_inventory_count

class RetailLocation < ApplicationRecord
  has_many :dvds
  belongs_to :region

  # also has the property :total_dvds

  after_save :update_regional_counts

  def refresh_counts
    self.update(total_dvds: dvds.sum(:num_copies))

  def update_regional_counts

class Region < ApplicationRecord
  has_many :retail_locations

  def refresh_counts
    self.update(total_dvds: retail_locations.sum(:total_dvds))

Obviously this is kind of a silly example, but hopefully you can also see how saving a Dvd is going to trigger updates on its RetailLocation and the associated Region. And hopefully you can also see how if we were updating a few thousand DVDs, and doing nothing to the counts, repeatedly updating the count would be pointless.

In general we want to minimize the number of SQL calls, because each call is going to add a few more milliseconds. If we're doing a hundred calls for associated records in the callbacks, that could add nearly a second for a request. And if those calls are redundant and we're doing them over and over for thousands of records, that's going to bring the app to a halt. Ideally, we'll be able to get it down to 1 query.

So what options do we have for reducing the number of SQL calls? Especially when we take into account these two constraints:

  1. We can't get rid of the callbacks altogether, we can only disable or circumvent them. (Though given more time, it would be nice to move them into service objects)
  2. We can't set up a background job. (In some situations this is a perfect solution! But it was not for this project in this case.)

So that leaves us with really three types of options:

  1. disabling the callbacks temporarily.
  2. skipping the callbacks
  3. generating our own SQL. (there are a few ways to do this, and we'll compare them.)

skip_callback - It sounds good, but don't use it

One strategy, suggested in this article, is to wrap your code between calls to skip_callback and set_callback

When you set a callback on a model, like the after :save on our DVD class, ActiveRecord calls an internal method set_callback, which is actually responsible for managing lists of methods to be called after all the different events. It has a corresponding method, skip_callback, which has essentially the opposite behavior: it causes callbacks methods to be skipped for an event.



The docs for skip_callback imply that this method really should only be used when you're defining the class. Though it technically does work, in general you shouldn't use skip_callback to temporarily skip callbacks. It is appropriate in a case where you know that you always want to skip a callback.

And for our DVD example, where we're trying to do as few SQL calls as possible, this doesn't work, because save will still be called once per model.

Methods which Skip Callbacks - :thumbsup:

Rails defines ~10 methods for saving models & skipping callbacks. The most promising for our case is: update_all. (update_columns is good if you have a model loaded and want to save it's records without triggering the callbacks)

update_all works on a relation, so you can update all the records that match a query. For example, imagine that all the DVDs which have been scratched need to be flagged for removal.

When you save a bunch of records, you don't need to have them loaded. ActiveRecord will be smart enough to write an 'UPDATE' statement for you. That's promising, it's very close to our problem! You use update_all like so:

DVD.where(scratched: true).update_all(flag_for_destruction: true)

# this would also work & is totally fine
dvds = DVD.where(scratched: true)
# ... some other code
dvds.update_all(flag_for_destruction: true)

You can pass SQL fragments to update_all and ActiveRecord will generate a query that updates each record that matches the query. For example, this code will lower case the descriptions of the first 100 dvds:

# this ruby code:
DVD.limit(100).update_all('description = lower(description)')
# will generate & execute this SQL:
UPDATE "dvds" SET description = lower(description) WHERE "dvds"."id" IN (SELECT "dvds"."id" FROM "dvds" ORDER BY "dvds"."created_at" ASC LIMIT 100)

update_all works best when 1) you can make a selection based on some logical grouping that can be expressed in a select statement and 2) it you only need to change that can be uniformly applied to all the selected records. But for this use case it works great.

It doesn't work as well if you need to update arbitrary fields with arbitrary data. For that, you'll need to -

Write (or Generate) Your Own SQL - :sunglasses:

What if you have a set of 'known entities' that can't be reduced to a simple query? If, as in our original example, you're trying to update certain fields for all the records in a CSV file, you'll need to write your own SQL.

Imagine that someone from a franchise location needs to be able to upload a file with changes to prices. (I'll freely admit that this stretches the bounds of the DVD example, but stay with me a little bit longer!)

You can use ActiveRecord::Base.conection.execute to execute a query string. A little bit of interpolation will make it possible to construct a giant query that updates each record.

# assume that you dvd_updates is a list of pairs - [Dvd UPCs, new rental prices]
# eg: dvd_updates[0] == ["085391164197", 2.99]
conn = DVD.connection

query = {|isbn, price|
  "SELECT from dvds where isbn = #{conn.quote(isbn)} UPDATE price = #{conn.quote(price)}"

conn.execute query

Writing sql is a little outside the scope of this post, but fortunately there are scads of resources out there. You can also use Arel, the library that ActiveRecord uses to construct SQL. The learning curve on that library is a little steep, but the tradeoff for that investment would be less brittle code. In our original case, we stopped short of using Arel, but your mileage may vary.

In Conclusion - :wave:

That's it! Good luck speeding up any of the redundant queries you're working on!

And remember: Rails is still cool, even if people aren't paying attention to it as much anymore. Go forth and do cool stuff!

Thanks to Ray Zane and Patrick Smith for reading drafts of this post.

We'd love to keep in touch!

Opt in for occasional updates from us. Privacy Policy