The Repository Pattern

A recurring theme on the 8th Light blog is decoupling your applications from implementation details. A detail can be anything from UI elements to a database or even a framework. Decoupling your application from the details lets you defer implementation decisions until you have a clearer picture of what you really need, and it also makes your system easier to test. In this post I want to take a closer look at the Repository Pattern as a way to decouple ourselves from one of those details: the database.

Right now you might be having the same reaction I did when I first heard the database referred to as a detail. It was something along the lines of, "WHAT? But every application needs a database! That's where all the stuff goes. How can it be a detail?!?"

Of course we do need data, but the decision about how to store that data is most certainly a detail: do we use MySQL, PostgreSQL, Redis, flat files, or even a combination of different options? There is a huge benefit in delaying that decision as long as possible, because it gives us more opportunities to learn how we're going to use the data and the ways we'll need to access it. Then we can make informed decisions about the details of our database(s).

It's Just An Interface

At its core the Repository pattern is a simple interface. It exists as a layer between your application and your data source so that your application doesn't need to be concerned with the implementation of data storage. As illustrated in the diagram below, instead of talking directly to the database your application uses the interface created by the Repository. It doesn't matter whether I'm using PostgreSQL or a simple in-memory data structure.

Repository Diagram

Repository, meet Sinatra

Following is a scaled-down example of how I implemented a Repository in a Sinatra application. It was a small internal application and we implemented the Repository pattern for a couple of reasons: 1) I wasn't sure what database I was going to use, and 2) I wanted to keep my tests fast by using in-memory objects.

The first thing I needed for my app was a User, so I created a MemoryRepository::UserRepository that allowed me to store and retrieve users for testing and development. (Since I'm storing everything in memory, all of the records are just User objects in a hash keyed by unique ids.)

module MemoryRepository
  class UserRepository
    def initialize
      @records = {}
      @id = 1
    end

    def model_class
      MemoryRepository::User
    end

    def new(attributes = {})
      model_class.new(attributes)
    end

    def save(object)
      object.id = @id
      @records[@id] = object
      @id += 1
      return object
    end

    def find_by_id(n)
      @records[n.to_i]
    end
  end
end

There shouldn't be any surprises in these methods, except maybe for model_class, which is used again in the new method. My repository's primary responsibility is storing objects, so it shouldn't really need to be concerned with creating them. However, there was a good chance we were going to end up using DataMapper or ActiveRecord, so it wouldn't be unreasonable for someone to ask the datastore class to create a new object of the type it holds. To avoid surprises for future devs, the new method delegates that request to the User class returned by model_class.
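The listing above refers to MemoryRepository::User but never shows it. Here is a minimal sketch, assuming a hypothetical User that accepts an attributes hash and exposes id and name accessors (the name attribute is an assumption for illustration). The repository is repeated so the example runs on its own, and the last lines show a round trip through new, save and find_by_id:

```ruby
module MemoryRepository
  # Hypothetical model: the post does not show this class.
  class User
    attr_accessor :id, :name

    def initialize(attributes = {})
      @id = attributes[:id]
      @name = attributes[:name]
    end
  end

  # The in-memory repository from above, repeated so this runs standalone.
  class UserRepository
    def initialize
      @records = {}
      @id = 1
    end

    def model_class
      MemoryRepository::User
    end

    def new(attributes = {})
      model_class.new(attributes)
    end

    def save(object)
      object.id = @id
      @records[@id] = object
      @id += 1
      object
    end

    def find_by_id(n)
      @records[n.to_i]
    end
  end
end

repo = MemoryRepository::UserRepository.new
user = repo.save(repo.new(name: "Ada"))
# Route params arrive as strings, which is why find_by_id calls to_i.
found = repo.find_by_id("1")
puts found.name # prints "Ada"
```

Callers never reference MemoryRepository::User directly; they ask the repository for a new object, which is what keeps a later swap to a DataMapper model invisible to them.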

Next up is the Repository class. This is the class my app talks to, and it lets the app stay blissfully unaware of any datastore on the other side. Again, I'm just using a simple hash: I can register any repository I want to use under a type and then retrieve it with Repository.for.

class Repository
  def self.register(type, repo)
    repositories[type] = repo
  end

  def self.repositories
    @repositories ||= {}
  end

  def self.for(type)
    repositories[type]
  end
end
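One consequence of the plain hash lookup is that Repository.for returns nil for a type nobody registered, so a typo only surfaces later as a NoMethodError on nil. A variation you might consider (not what the post uses) is to look the type up with Hash#fetch so the mistake fails immediately with a descriptive KeyError:

```ruby
# A variation on the Repository registry: `for` uses Hash#fetch so an
# unregistered type raises KeyError right away instead of returning nil.
class Repository
  def self.register(type, repo)
    repositories[type] = repo
  end

  def self.repositories
    @repositories ||= {}
  end

  def self.for(type)
    repositories.fetch(type) do
      raise KeyError, "no repository registered for #{type.inspect}"
    end
  end
end

Repository.register(:user, Object.new) # any object stands in for a repository

error =
  begin
    Repository.for(:usr) # typo in the type name
  rescue KeyError => e
    e
  end
puts error.message # prints "no repository registered for :usr"
```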

In order to tie the Repository and my UserRepository together I add the following lines to an environment.rb file (or a config file or just at the top of the main Sinatra app):

configure :test, :development do
  Repository.register(:user, MemoryRepository::UserRepository.new)
end
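Because configure runs once, the registry holds a single in-memory repository for the whole test run, so records saved by one test are still there in the next. One way to keep tests isolated, sketched here in plain Ruby, is to register a fresh instance before each test (for example from an RSpec before hook). The reset_repositories! helper, the InMemoryUsers class and its count method are all hypothetical, added only for this demonstration:

```ruby
class Repository
  def self.register(type, repo)
    repositories[type] = repo
  end

  def self.repositories
    @repositories ||= {}
  end

  def self.for(type)
    repositories[type]
  end
end

# Hypothetical stand-in for MemoryRepository::UserRepository.
class InMemoryUsers
  def initialize
    @records = {}
    @id = 1
  end

  def save(object)
    @records[@id] = object
    @id += 1
    object
  end

  # Hypothetical: only here so the example can inspect the store.
  def count
    @records.size
  end
end

# Hypothetical helper: call before each test so every example starts empty.
def reset_repositories!
  Repository.register(:user, InMemoryUsers.new)
end

reset_repositories!
Repository.for(:user).save("first test's user")
leaked = Repository.for(:user).count # 1: would leak into the next test

reset_repositories!
fresh = Repository.for(:user).count # 0: a new instance replaced the old one
```

Since the app only ever asks Repository.for(:user), swapping the instance underneath it is all the reset needs to do.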

Now to put it to use. Let's say I have a Sinatra action that takes a user to their home page and I want to grab that user from the database so I can say "Hello, #{@user.name}". Here's how I use my Repository to do that:

get "/user/:id" do
  @user = Repository.for(:user).find_by_id(params[:id])
  erb :"users/show" # renders views/users/show.erb
end
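The action assumes the user exists; find_by_id returns nil for an unknown id, and @user.name in the template would then raise. In Sinatra you could guard with halt 404 unless @user. Because the action only talks to Repository.for, its logic can also be exercised against a stub with no database at all. In this sketch, greeting_for is a hypothetical helper mirroring the route body, and StubUser and StubUserRepository are hypothetical test doubles:

```ruby
class Repository
  def self.register(type, repo)
    repositories[type] = repo
  end

  def self.repositories
    @repositories ||= {}
  end

  def self.for(type)
    repositories[type]
  end
end

# Hypothetical stub: answers find_by_id from a fixed list, no datastore.
StubUser = Struct.new(:id, :name)

class StubUserRepository
  def initialize(users)
    @users = users.map { |u| [u.id, u] }.to_h
  end

  def find_by_id(n)
    @users[n.to_i]
  end
end

# Hypothetical helper mirroring the route: nil means "respond with a 404".
def greeting_for(id)
  user = Repository.for(:user).find_by_id(id)
  user && "Hello, #{user.name}"
end

Repository.register(:user, StubUserRepository.new([StubUser.new(7, "Grace")]))
puts greeting_for("7")        # prints "Hello, Grace"
p greeting_for("8")           # prints nil
```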

Exciting, right? Well, not really, at least not yet. The real action happens as soon as we decide to implement another data store for Users.

For production I ended up using DataMapper wired to PostgreSQL. Using RSpec's shared_examples (which I've written about previously), I was able to quickly build out my new DataMapperRepository::UserRepository based on the methods I'd already implemented in my in-memory UserRepository class.
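The idea behind shared examples is that one set of behavioral checks runs against every repository implementation. In RSpec that body would live in a shared_examples "a user repository" block pulled into each spec with it_behaves_like; the sketch below swaps in a plain-Ruby method instead so it runs without RSpec. The check_user_repository method, the User struct and MemoryUserRepository are hypothetical, and the checks assume a model with id and name:

```ruby
# Hypothetical plain-Ruby stand-in for an RSpec shared example group:
# the same checks run against any user repository passed in.
def check_user_repository(repo)
  user = repo.new(name: "Linus")
  saved = repo.save(user)
  raise "save should return the object" unless saved.equal?(user)
  raise "save should assign an id" if saved.id.nil?

  # Compare by attribute, not identity: a database-backed repository
  # may hand back a freshly loaded object.
  found = repo.find_by_id(saved.id.to_s) # params arrive as strings
  raise "find_by_id should return the saved user" unless found && found.name == "Linus"

  true
end

# Hypothetical model and in-memory repository to run the checks against.
User = Struct.new(:id, :name)

class MemoryUserRepository
  def initialize
    @records = {}
    @id = 1
  end

  def model_class
    User
  end

  def new(attributes = {})
    model_class.new(attributes[:id], attributes[:name])
  end

  def save(object)
    object.id = @id
    @records[@id] = object
    @id += 1
    object
  end

  def find_by_id(n)
    @records[n.to_i]
  end
end

puts check_user_repository(MemoryUserRepository.new) # prints "true"
```

A DataMapper-backed repository would be passed to the same method; if it satisfies these checks, the application code can use either one.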

module DataMapperRepository
  class UserRepository
    def model_class
      DataMapperRepository::User
    end

    def new(attributes = {})
      model_class.new(attributes)
    end

    def save(object)
      object.save
      return object
    end

    def find_by_id(n)
      model_class.get(n)
    end
  end
end

As you can see, a few of the methods changed, but my main application code knows nothing about the data stores and won't change at all. It will continue to just call Repository.for(:user), and as long as every new UserRepository class implements the same methods then it doesn't matter what's on the other side.
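Since the only contract is "implements the same methods", a cheap safety net is to check for those methods when a repository is registered, so a half-finished implementation fails at boot rather than mid-request. A minimal duck-typing sketch, assuming the four methods used in this post; USER_REPOSITORY_METHODS, missing_methods and the two sample classes are hypothetical:

```ruby
# Hypothetical list of the methods every user repository in this post provides.
USER_REPOSITORY_METHODS = %i[model_class new save find_by_id].freeze

# Returns the required methods that a candidate repository does not respond to.
def missing_methods(repo, required = USER_REPOSITORY_METHODS)
  required.reject { |m| repo.respond_to?(m) }
end

class CompleteRepo
  def model_class; end
  def new(attributes = {}); end
  def save(object); end
  def find_by_id(n); end
end

class IncompleteRepo
  def save(object); end
end

p missing_methods(CompleteRepo.new)   # prints []
p missing_methods(IncompleteRepo.new) # prints [:model_class, :new, :find_by_id]
```

Note that new here is an instance method on the repository, not Class#new, so respond_to?(:new) only succeeds when the repository defines it. Repository.register could raise when missing_methods returns anything.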

The only lines of code I need to add to start using the new UserRepository are back in my environment file:

configure :production do
  Repository.register(:user, DataMapperRepository::UserRepository.new)
end

Mix & Match Databases

I won't go into detail, but the other real advantage of the Repository pattern shows up when you use different types of databases. For example, suppose I used PostgreSQL for Users and Redis for Tasks. Would my main application care? Nope, it would just continue to call Repository.for(:user) and Repository.for(:task).
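To make that concrete, here is a sketch with two hypothetical in-memory stores standing in for PostgreSQL and Redis (no real database client is used). UserStore, TaskStore, its for_user query and the summary method are all assumptions for illustration; the point is that the application code only goes through Repository.for:

```ruby
class Repository
  def self.register(type, repo)
    repositories[type] = repo
  end

  def self.repositories
    @repositories ||= {}
  end

  def self.for(type)
    repositories[type]
  end
end

# Hypothetical store for users (think PostgreSQL); returns names for brevity.
class UserStore
  def initialize(users)
    @users = users
  end

  def find_by_id(n)
    @users[n.to_i]
  end
end

# Hypothetical store for tasks (think Redis).
class TaskStore
  def initialize(tasks)
    @tasks = tasks
  end

  # Hypothetical query method, not part of the post's interface.
  def for_user(user_id)
    @tasks.select { |t| t[:user_id] == user_id }
  end
end

Repository.register(:user, UserStore.new({ 1 => "Ada" }))
Repository.register(:task, TaskStore.new([{ user_id: 1, title: "Write post" },
                                          { user_id: 2, title: "Review post" }]))

# Application code: unaware of which store backs each type.
def summary(user_id)
  name = Repository.for(:user).find_by_id(user_id)
  titles = Repository.for(:task).for_user(user_id).map { |t| t[:title] }
  "#{name}: #{titles.join(', ')}"
end

puts summary(1) # prints "Ada: Write post"
```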

Conclusion and Other Resources

This was of course a very simple example of the pattern, but you can see more complex (and real-world) samples in a pair of 8th Light's open source projects. The repositories that we use for an internal project were split out into gems, and you can look through those on GitHub: artisan-repository, artisan-memory-repository and artisan-ar-repository (for ActiveRecord). Several 8th Light craftsmen have also contributed to Hyperion, which provides a uniform API to multiple datasources using simple key-value pairs. (Hyperion even takes most of the legwork out of setting up the Repository pattern.)

Like any design pattern, it really only makes sense to implement the Repository pattern if you have a good reason. However, the ability to defer your database decision until you've had the chance to learn how you're going to use your data, and then painlessly change it if your needs change, seems like a very good reason.

Mike Ebert, Software Craftsman

Mike Ebert is in constant pursuit of the cleanest and simplest solution.