Winning at Consistency

Do you use validates_uniqueness_of in Rails? Do you feel confident that it works to prevent duplicate records? If you’re like most of us, you won’t have given it much thought, but since I’m asking, you’re second-guessing yourself.

And indeed you should. More and more, the Rails documentation and community blogs reflect the problems with validates_uniqueness_of (see Further Reading below), and the fact that you can’t depend on it to prevent duplicates in your database.

“You can’t?!?” Right. It’s called validates_uniqueness_of, and it will perform that validation… right up until your project hits production.

First, let’s take a look at what happens normally. When you save an ActiveRecord object, your validations run first. So if a User model calls validates_uniqueness_of on its email, ActiveRecord will check the database to see if this User is the only one with that email address. If so, great! Save away!

App Server                          Database               Site Visitor
|  --- is the record unique? --->   |                      |
|               <--- yes! -------   |                      |
|  --- Great! save! ------------>   |                      |
|                 ----------  things went great :)  --->   |

Otherwise, the save fails, and errors are populated on the object to let you know what went wrong.

App Server                          Database                Site Visitor
|  --- is the record unique? --->   |                       |
|                <--- no! -------   |                       |
|                 --------------  we had problems :(  --->  |

Everything looks good so far. Now what’s the big deal about your app going to production? Well, with any reasonably sized application, you’ll need multiple application server processes or at least threads (Mongrel, Passenger, Tomcat, etc.) to handle incoming traffic. And that’s when the problem strikes.

The Problem

Impatient Ian creates an account, quadruple-clicking the “Create Account” button after filling out his info. Let’s say either there isn’t any JavaScript double-click protection on that button or he has JavaScript turned off.

Somewhere in the intertubes, those four requests are all vying for your app servers’ attention, and two of them happen to hit the app servers simultaneously.

Each process checks the database to see if the newly constructed but unsaved object is unique for the given scoping, as before. Neither has saved yet, so the database tells both of them “yes”, and both of them go ahead and save. Hooray!

App Server 1                        Database               Site Visitor
|  --- is the record unique? --->   |                      |
|                <--- yes! ------   |                      |

App Server 2                        Database               Site Visitor
|  --- is the record unique? --->   |                      |
|                <--- yes! ------   |                      |


App Server 1                        Database               Site Visitor
|  --- Great! save! ------------>   |                      |
|                 ----------  things went great :)  --->   |

App Server 2                        Database               Site Visitor
|  --- Great! save! ------------>   |                      |
|                 ----------  things went great :)  --->   |

Wait. Not hooray—the opposite of that. Now we have duplicate records in our database, after we explicitly said that we didn’t want that to happen.

So that’s the problem: validates_uniqueness_of doesn’t work as our intuitions might lead us to expect. What’s the solution? Well, if you need to make sure you’ve got no duplicate records, you’ll want a database-level constraint: a unique index.
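To make the difference concrete, here is a minimal sketch in plain Ruby that replays the interleaving from the diagram against a toy in-memory table. ToyTable, UniqueViolation, and replay_double_submit are made-up names for illustration; this is not ActiveRecord and not a real database, just an approximation of why a check made inside the insert closes the gap that a separate check leaves open.

```ruby
# Toy stand-in for a database table; not ActiveRecord and not a real database.
class ToyTable
  class UniqueViolation < StandardError; end

  def initialize(unique)
    @unique = unique # true simulates a unique index on the column
    @rows = []
  end

  # What the validation asks: "is there already a row with this value?"
  def exists?(value)
    @rows.include?(value)
  end

  # With a unique index, the check is part of the insert itself, so there is
  # no gap between checking and saving for another request to slip into.
  def insert(value)
    if @unique && @rows.include?(value)
      raise UniqueViolation, "duplicate value: #{value}"
    end
    @rows << value
  end

  def count(value)
    @rows.count(value)
  end
end

# Replays the interleaving from the diagram: both requests pass the
# uniqueness check before either one saves.
def replay_double_submit(table, email)
  request_1_ok = !table.exists?(email)
  request_2_ok = !table.exists?(email)

  [request_1_ok, request_2_ok].map do |ok|
    next :invalid unless ok
    begin
      table.insert(email)
      :saved
    rescue ToyTable::UniqueViolation
      :rejected_by_index
    end
  end
end

email = "ian@example.com"

without_index = ToyTable.new(false)
outcomes_without_index = replay_double_submit(without_index, email)

with_index = ToyTable.new(true)
outcomes_with_index = replay_double_submit(with_index, email)

puts "no index:     #{outcomes_without_index.inspect}, rows: #{without_index.count(email)}"
puts "unique index: #{outcomes_with_index.inspect}, rows: #{with_index.count(email)}"
```

Without the index, both requests save and the table ends up with two rows. With it, the second insert is rejected even though its earlier check said “yes”.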

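has_one has similar problems: without a unique index on the foreign key, nothing at the database level stops two rows from pointing at the same parent. There are dozens of possible root causes of those duplicates in application logic, though, so I’ll leave that as an exercise for the reader.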

A Solution: consistency_fail

Consistency_fail is a brand-new gem I wrote that aims to make it easier to fix these problems. By installing the gem for Rails 3…

gem install consistency_fail

…or alternatively for Rails 2.3…

gem install consistency_fail -v=0.1.1

… you get a consistency_fail executable that will print out a report of the indexes you’re missing.

Here’s an example. We’ve got two models in this Rails 3 project, both of which have missing indexes:

class User < ActiveRecord::Base
  validates_uniqueness_of :email

  has_one :address
end

class Address < ActiveRecord::Base
  validates_uniqueness_of :street, :scope => [:zip]

  belongs_to :user
end

So, getting an exhaustive list of missing indexes is as easy as running consistency_fail:

sad_panda % consistency_fail

There are calls to validates_uniqueness_of that are not backed by unique indexes.
--------------------------------------------------------------------------------
Model    Table Columns
--------------------------------------------------------------------------------
Address  addresses (street, zip)
User     users (email)
--------------------------------------------------------------------------------

There are calls to has_one that are not backed by unique indexes.
--------------------------------------------------------------------------------
Model  Table Columns
--------------------------------------------------------------------------------
User   addresses (user_id)
--------------------------------------------------------------------------------

The first column of the report, labeled Model, shows you where the call to validates_uniqueness_of or has_one issues from, and the Table Columns column shows you the table with the database columns that need a unique index. Multiple column names in parentheses mean that a composite unique index is required.

For performance reasons, you’ll want to think carefully about how you order the columns in a multicolumn unique index, but for our purposes, any ordering will work to enforce uniqueness.
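As a rough sketch, a migration covering the three entries in the report above might look like the following. The class name AddMissingUniqueIndexes is made up, and the self.up / self.down style assumes a Rails 2.3 or Rails 3 project like the ones discussed here (newer Rails versions expect a versioned migration superclass). The column order in the composite index is arbitrary, per the note above.

```ruby
# Hypothetical migration; the tables and columns come from the report,
# the class name is invented for illustration.
class AddMissingUniqueIndexes < ActiveRecord::Migration
  def self.up
    # Backs validates_uniqueness_of :email on User
    add_index :users, :email, :unique => true

    # Backs validates_uniqueness_of :street, :scope => [:zip] on Address
    add_index :addresses, [:street, :zip], :unique => true

    # Backs has_one :address on User: at most one address row per user
    add_index :addresses, :user_id, :unique => true
  end

  def self.down
    remove_index :addresses, :column => :user_id
    remove_index :addresses, :column => [:street, :zip]
    remove_index :users, :column => :email
  end
end
```

If the tables already contain duplicates, the database will refuse to create the unique index, so existing rows need to be cleaned up before running a migration like this.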

It’s worth noting here that adding these indexes protects our data, but at a cost: a database-level uniqueness violation can now bubble up as an exception to the top level, potentially showing our user an error page.

Unless, that is, we want to hack ActiveRecord up to catch the database-specific exception (done it!) and give a nice explanation. So, buyer beware: for the time being, if handling these user-facing errors is a concern, there’s application-specific code that will need to be written.
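The general shape of that application-specific code can be sketched in plain Ruby. Everything here is a stand-in: DuplicateRecord, ToyUser, and save_with_friendly_errors are hypothetical names, and the toy save! only imitates what a unique index would do. In a Rails 3 or later app, the exception to rescue would be ActiveRecord::RecordNotUnique rather than the stand-in class.

```ruby
# Stand-in for the database's unique-violation error. In a Rails 3+ app the
# real class to rescue would be ActiveRecord::RecordNotUnique.
class DuplicateRecord < StandardError; end

# Hypothetical minimal record whose save! raises the way a unique index would.
class ToyUser
  SAVED_EMAILS = [] # shared toy "table" of emails that have been saved

  attr_reader :email, :errors

  def initialize(email)
    @email = email
    @errors = []
  end

  def save!
    raise DuplicateRecord if SAVED_EMAILS.include?(email)
    SAVED_EMAILS << email
    true
  end
end

# Turns a database-level rejection into the same shape as a failed
# validation: a false return value plus a message on the record.
def save_with_friendly_errors(record)
  record.save!
rescue DuplicateRecord
  record.errors << "Email has already been taken"
  false
end

first = ToyUser.new("ian@example.com")
second = ToyUser.new("ian@example.com")

first_saved = save_with_friendly_errors(first)
second_saved = save_with_friendly_errors(second)

puts "first saved?  #{first_saved}"
puts "second saved? #{second_saved}, errors: #{second.errors.inspect}"
```

The calling code can then re-render the form with the error message, just as it would after a failed validation, instead of letting the exception reach the user as an error page.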

I’d love to hear whether you find this reporting from consistency_fail to be valuable, and what improvements you’d like to see.

Further reading:

Colin Jones, Director of Software Services

Colin Jones is particularly interested in web security and functional programming, using languages like Clojure.