Fast-Talking ActiveRecord Migrations

Most migrations do very little and run very quickly. Add a column or two; run in a second or two. Other migrations need to do much more, but we still want them to run just as quickly. And, if they’re going to take a little longer, we want to know why.

In Ace Ventura, as Ace is about to search a drained dolphin tank for clues of the crime, he says to those nearby, “If I’m not back in five minutes, just wait longer.” While good for a laugh, that’s not good “advice” from a migration. Here’s a migration that reminds me of Ace:

 1 class AddApplicationToCe < ActiveRecord::Migration
 2 
 3   def self.up
 4     add_column :continuing_educations, :origin_application, :string
 5     ContinuingEducation.find_each do |ce|
 6       ce.update_attribute(:origin_application, "bank")
 7     end
 8   end
 9 
10   def self.down
11     remove_column :continuing_educations, :origin_application
12   end
13 
14 end

A fairly typical migration, right? Add a column and give it a value for all existing records. But, look at the output from running it:

1 ==  AddApplicationToCe: migrating =================================
2 -- add_column(:continuing_educations, :origin_application, :string)
3    -> 2.9832s
4 ==  AddApplicationToCe: migrated (724.5134s) ======================

12 minutes?! That makes for a long feedback cycle, not to mention a long deploy. What’s more, that 12 minutes silently passes between the add_column call and the end of the migration. Another developer who pulls down this migration and runs it is going to wonder what’s taking so long.

Be Talkative

Whether this migration runs for 12 minutes or 12 seconds, we can be open about what it’s doing by using say or say_with_time.

Use say to display a message about what’s being done in the block that you pass to it and Use say_with_time to also display the elapsed time after the block executes.

We’ll use say_with_time:

 1 class AddApplicationToCe < ActiveRecord::Migration
 2   def self.up
 3     add_column :continuing_educations, :origin_application, :string
 4 
 5     say_with_time "Setting origin_application for all existing CEs" do
 6       ContinuingEducation.find_each do |ce|
 7         ce.update_attribute(:origin_application, "bank")
 8       end
 9     end
10   end
11 
12   ...
13 end

Here’s the output:

1 ==  AddApplicationToCe: migrating =================================
2 -- add_column(:continuing_educations, :origin_application, :string)
3    -> 2.1238s
4 -- Setting origin_application for all existing CEs
5    -> 722.3896s
6    -> 254703 rows
7 ==  AddApplicationToCe: migrated (724.5134s) ======================

That’s much more informative, but it isn’t any faster.

I Feel the Need…the Need for Speed

Like Maverick and Goose from Top Gun, we feel the need for speed. In Rails 2.3.x find_each finds records in batches of one thousand and loads them into memory. Though there are no guarantees, hopefully garbage collection will run and free that memory.

Since we’re working with several hundred thousand records in this migration, using find_each is a great way to avoid exhausting memory. We’re not just finding records, though. We’re also updating them—one at a time—with the same value. Oh.

If you have any experience with SQL, you know that it operates on sets of records. ActiveRecord enables us to work that way when we need to with methods like update_all, which constructs one SQL UPDATE statement and sends it to the database without loading the affected records.

Update all records without having to find and load any of them first? Sounds perfect. Let’s try it:

 1 class AddApplicationToCe < ActiveRecord::Migration
 2   def self.up
 3     add_column :continuing_educations, :origin_application, :string
 4 
 5     say_with_time "Setting origin_application for all existing CEs" do
 6       ContinuingEducation.update_all "origin_application = 'bank'"
 7     end
 8   end
 9 
10   ...
11 end

And the result?

1 ==  AddApplicationToCe: migrating =================================
2 -- add_column(:continuing_educations, :origin_application, :string)
3    -> 2.1238s
4 -- Setting origin_application for all existing CEs
5    -> 3.5165s
6    -> 254703 rows
7 ==  AddApplicationToCe: migrated (5.6485s) ========================

128x faster! I can wait 6s for feedback or as part of a deploy. We’re not quite done, though.

Dotting the i’s and Crossing the t’s

In our migration, we add a column and then use its ActiveRecord model to set its value for all existing records. To be sure we have immediate access to this new column via the model, we can call reset_column_information. This method tells ActiveRecord to re-read the model’s column information from its backing database table.

 1 class AddApplicationToCe < ActiveRecord::Migration
 2   def self.up
 3     add_column :continuing_educations, :origin_application, :string
 4 
 5     ContinuingEducation.reset_column_information
 6     say_with_time "Setting origin_application for all existing CEs" do
 7       ContinuingEducation.update_all "origin_application = 'bank'"
 8     end
 9   end
10 
11   ...
12 end

I don’t think I’ve ever seen a migration fail due to stale column information, but I wouldn’t chance it, since Murphy’s Law will probably only be proven true during a production deploy.

Review

  • Use say or say_with_time to be talkative about what migrations are doing when they do more than usual or take longer than a few seconds.

  • Ensure migrations do their work quickly for rapid feedback in development and when deploying.

  • Use update_all and related methods let you work closer to SQL metal when you need their speed and efficiency of working with large sets of data.
  • Craig Demyanovich, Software Craftsman

    Craig Demyanovich is an avid hockey player and loves visiting new places with his wife Sandy.