Saturday, January 17, 2009

Eric Evans on Repositories


Domain Driven Design by Eric Evans is the bible of that school of software development. It’s one of the most influential books in the realm of enterprise application architecture. Evans brings a real clarity of purpose to both the analysis and implementation of business software. I read it back in 2004 when it was first published and, along with Martin Fowler’s ‘Patterns of Enterprise Application Architecture’, it has probably had the most influence on the way I think about building business systems. I’m not alone; you really have to read it if you want to be taken seriously as an application architect.

There’s been a bit of a backlash recently against a tendency in DDD circles to treat ‘the blue book’ almost too literally as a bible, but that shouldn’t detract from what is a fantastic piece of work.

The term ‘repository’ as a way of encapsulating object persistence is well defined in the book and Evans’s definition is often referred to when discussing the repository pattern. I thought it was worth re-reading the chapter on Repositories and summarising it here so that I have a baseline for any further discussions.

Part II of the book describes a way of modelling user domains using object oriented programming. The technique is to describe your model in terms of entities and value types grouped together in aggregates. In simple terms an aggregate is an object graph that has a lifecycle determined by the root entity. For example, an aggregate might have a root of customer with related orders, order-lines, address etc. An order does not have an existence separate from a customer. If the customer were deleted you would expect the rest of the graph (orders, order-lines etc.) to be deleted as well. A product on the other hand, while it has a relationship with an order-line, also has a life cycle independent of that order-line.
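A minimal C# sketch of the customer aggregate just described. The class and member names are hypothetical, chosen only to illustrate the idea; they are not taken from the book.

```csharp
using System.Collections.Generic;

// Hypothetical classes; a sketch of the customer aggregate described above.
public class Product
{
    // Product is its own aggregate root: it outlives any order-line that refers to it.
    public int Id { get; set; }
    public string Name { get; set; }
}

public class OrderLine
{
    public Product Product { get; set; }   // a reference out of the aggregate
    public int Quantity { get; set; }
}

public class Order
{
    private readonly List<OrderLine> lines = new List<OrderLine>();

    public IList<OrderLine> Lines { get { return lines.AsReadOnly(); } }

    public void AddLine(Product product, int quantity)
    {
        lines.Add(new OrderLine { Product = product, Quantity = quantity });
    }
}

public class Customer
{
    // Customer is the aggregate root: orders are created and reached through it,
    // so removing the customer removes the whole graph.
    private readonly List<Order> orders = new List<Order>();

    public int Id { get; set; }
    public IList<Order> Orders { get { return orders.AsReadOnly(); } }

    public Order PlaceOrder(Product product, int quantity)
    {
        var order = new Order();
        order.AddLine(product, quantity);
        orders.Add(order);
        return order;
    }
}
```

Orders are only added through the root, while Product is merely referenced, which mirrors the difference in life cycle between the two.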

Repositories are responsible for persisting entities and value types. They are described in their own section in chapter 6 and are said to have the following advantages:

  • “They present clients with a simple model for obtaining persistent objects and managing their life cycle.”
  • “They decouple application and domain design from persistence technology, multiple database strategies, or even multiple data sources.”
  • “They communicate design decisions about object access.”
  • “They allow easy substitution of a dummy implementation, for use in testing (typically using an in-memory collection)”

Thus the core purpose of the repository is to encapsulate persistence. The client should appear to be simply using an entity collection and all the details of object relational mapping and specific data access APIs should be hidden behind that collection-like interface. Repositories should only be provided for aggregate roots:

“For each type of object that needs global access, create an object that can provide the illusion of an in-memory collection of all objects of that type. Set up access through a well-known global interface. Provide methods to add and remove objects, which will encapsulate the actual insertion or removal of data in the data store. Provide methods that select objects based on some criteria and return fully instantiated objects or collections of objects whose attribute values meet the criteria, thereby encapsulating the actual storage and query technology. Provide repositories only for aggregate roots that actually need direct access. Keep the client focused on the model, delegating all object storage and access to the Repositories.”
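As a rough illustration of that ‘illusion of an in-memory collection’, here is a small C# sketch. IRepository, InMemoryRepository and their members are hypothetical names, and the generic shape is a simplification for brevity; Evans describes a repository per aggregate root rather than this exact interface.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical collection-like contract: add, remove and select by criteria.
public interface IRepository<T>
{
    void Add(T item);
    void Remove(T item);
    IList<T> Find(Func<T, bool> criteria);
}

// The dummy implementation Evans mentions for testing: a plain in-memory list.
// A production implementation would hide an ORM or data access API behind
// the same interface.
public class InMemoryRepository<T> : IRepository<T>
{
    private readonly List<T> items = new List<T>();

    public void Add(T item)
    {
        items.Add(item);
    }

    public void Remove(T item)
    {
        items.Remove(item);
    }

    public IList<T> Find(Func<T, bool> criteria)
    {
        // Return a copy so callers cannot modify the underlying store.
        return items.Where(criteria).ToList();
    }
}
```

The client only ever sees Add, Remove and Find, so swapping the in-memory list for a database-backed implementation should not change client code.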

Transactions should not be a concern of the repository. He suggests that the client should handle them: “Leave transaction control to the client”. Interestingly, Evans does not mention the Unit of Work pattern in the repository discussion although it’s implied in the section on transactions.
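One way to picture ‘leave transaction control to the client’ is the sketch below. UnitOfWork and ShipmentRepository are hypothetical classes invented for this illustration, and a list of strings stands in for the database; Evans does not prescribe this design.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical unit of work: collects pending changes and applies them on Commit.
public class UnitOfWork
{
    private readonly List<Action> pending = new List<Action>();

    public void Register(Action change)
    {
        pending.Add(change);
    }

    public void Commit()
    {
        foreach (var change in pending)
        {
            change();
        }
        pending.Clear();
    }
}

// The repository records what should happen but never commits;
// that decision stays with the client.
public class ShipmentRepository
{
    private readonly UnitOfWork unitOfWork;
    private readonly List<string> store;   // stands in for a database table

    public ShipmentRepository(UnitOfWork unitOfWork, List<string> store)
    {
        this.unitOfWork = unitOfWork;
        this.store = store;
    }

    public void Add(string shipment)
    {
        unitOfWork.Register(() => store.Add(shipment));
    }
}
```

In this sketch nothing reaches the store until the client calls Commit, so the repository has no say over when, or whether, the work is made permanent.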

Entity creation should not be the concern of the repository. Keep the concept of ‘factory’ and ‘repository’ distinct, although, in theory, a repository might use a factory internally.

In chapter 9, Evans describes ‘specifications’ as a way of encapsulating queries as part of the domain model. Much of the detail is concerned with providing a single interface for both in-memory (Java in his case) and repository level (SQL) querying. The core point is that specification definition is a domain concern and is best decoupled from the repository, although earlier he does say that in simple situations it might make sense to have methods on the repository to encapsulate queries.
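A specification can be sketched in C# roughly as follows. Evans’s examples are in Java with hand-written SQL; using a LINQ expression to serve both the in-memory and the query side is an assumption of this sketch, and Invoice, ISpecification and OverdueInvoiceSpecification are hypothetical names.

```csharp
using System;
using System.Linq.Expressions;

public class Invoice
{
    public decimal Amount { get; set; }
    public DateTime DueDate { get; set; }
    public bool Paid { get; set; }
}

// Hypothetical specification contract: one definition of the rule,
// usable in memory or handed to a LINQ provider.
public interface ISpecification<T>
{
    Expression<Func<T, bool>> Criteria { get; }
    bool IsSatisfiedBy(T candidate);
}

public class OverdueInvoiceSpecification : ISpecification<Invoice>
{
    private readonly DateTime today;

    public OverdueInvoiceSpecification(DateTime today)
    {
        this.today = today;
    }

    // The rule itself lives in the domain, not in the repository.
    public Expression<Func<Invoice, bool>> Criteria
    {
        get { return invoice => !invoice.Paid && invoice.DueDate < today; }
    }

    // In-memory check: compile the same expression and run it.
    public bool IsSatisfiedBy(Invoice candidate)
    {
        return Criteria.Compile()(candidate);
    }
}
```

A repository that is handed such a specification could pass Criteria to Queryable.Where, leaving any translation to SQL to the LINQ provider.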

There is also an excellent discussion of the compromises that may have to be made to co-ordinate the object and relational schemas.

So the core message is that repositories are a collection-like interface that encapsulates persistence and that queries should be encapsulated in the domain as specifications.

Re-reading the section on repositories and thinking about my own use of the term ‘repository’ in software I’ve been building recently tells me that I’m mostly in agreement with Evans. I think I’ve neglected to enforce the aggregate root rule; my repositories will persist any entity in my domain. In practice I don’t have well defined aggregates in my domain and that’s something I should improve. I’ve also allowed unit of work concerns to leak into my repositories. That’s something I’m keen to correct. As for the debate about exposing IQueryable`1, Evans doesn’t have much to say. Obviously, Java doesn’t have anything similar to LINQ so it wouldn’t be an option in any case, but the emphasis on treating the repository as a domain collection does fit quite nicely with the pattern of having specifications implemented as extension methods of a repository that implements IQueryable`1.
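The extension-method pattern mentioned above might look something like this. It is only a sketch: Account, IQueryableRepository, InMemoryQueryableRepository and the specification methods are hypothetical names, and the in-memory implementation relies on LINQ-to-Objects via AsQueryable.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Account
{
    public string Owner { get; set; }
    public decimal Balance { get; set; }
}

// Hypothetical repository that exposes its contents as IQueryable<T>.
public interface IQueryableRepository<T>
{
    IQueryable<T> GetAll();
}

// In-memory stand-in, useful for tests; a real implementation would
// return the IQueryable<T> of whatever LINQ provider the ORM supplies.
public class InMemoryQueryableRepository<T> : IQueryableRepository<T>
{
    private readonly List<T> items;

    public InMemoryQueryableRepository(IEnumerable<T> items)
    {
        this.items = new List<T>(items);
    }

    public IQueryable<T> GetAll()
    {
        return items.AsQueryable();
    }
}

// Specifications written as composable extension methods.
public static class AccountSpecifications
{
    public static IQueryable<Account> Overdrawn(this IQueryable<Account> accounts)
    {
        return accounts.Where(a => a.Balance < 0m);
    }

    public static IQueryable<Account> OwnedBy(this IQueryable<Account> accounts, string owner)
    {
        return accounts.Where(a => a.Owner == owner);
    }
}
```

A query then reads as repository.GetAll().OwnedBy("alice").Overdrawn(), and the same extension methods work unchanged against the in-memory list.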

8 comments:

Colin Jack said...

Hi,

Yeah, on the four points you give it's worth considering how many generic repositories fulfill. Simple, guess so because queries move elsewhere. Decouple from persistence to allow multiple DBs/data sources, not so much. Communicate design decisions, less than concrete. Allow easy substitution, yes...but only if you can substitute all queries.

UOW is actually awkward as far as DDD, something that's been discussed in ALT.NET forum recently and that I've blogged vaguely about. Essentially you want to ensure all invariants for an aggregate are met before persisting the aggregate and UOW's don't normally aim for that usage model.

"Keep the concept of ‘factory’ and ‘repository’ distinct, although, in theory, a repository might use a factory internally." -> Guess it depends what you mean but normally repositories wouldn't create objects at all.

"The core point is that specification definition is a domain concern and is best decoupled from the repository, although earlier he does say that in simple situations it might make sense to have methods on the repository to encapsulate queries." -> You can do both, have the named methods and then have them use specifications if that helps (which is what I often prefer). Specifications written using Linq are useful tho.

"That’s something I’m keen to correct. As for the debate about exposing IQueryable`1, Evans doesn’t have much to say. " -> There have been lots of discussions on the DDD forum. To me repositories are about aggregates, so returning IQueryable is not in keeping.

Ta,

Colin

Mike Hadlow said...

Hi Colin,

Thanks for another excellent comment; this series of posts is turning into 'Colin Jack educates Mike Hadlow about DDD'. Sorry I'm such an argumentative and poor student :)

"... on the four points ..." Just because you have a generic repository interface doesn't mean that you can't have diverse implementations. It's certainly possible to use diverse data sources. Of course the design of the generic repository imposes certain expectations about the data provider. Returning IQueryable`1 means that it must have a LINQ provider, for example. Substituting generic repositories for testing has worked well for me. I simply return an in-memory object graph from the IQueryable`1 method. The specification extension methods then work with LINQ-to-Objects automatically; there's no need to substitute them.

"UOW is actually awkward as far as DDD..." you've lost me there. Do you have a pointer to a fuller explanation?

"Keep the concept of 'factory' and 'repository' distinct.. " Yes, absolutely. I was just paraphrasing the book. Evans suggests that a repository might 'create' (reconstitute from the DB) objects using the same factory that the client uses for creating them, but of course using an ORM like NHibernate means that you would never do that.

"The core point..." Yes, I guess that's what Evans says. I didn't phrase it very well, he is quite happy to mix and match repository methods and specifications on a repository. My thoughts are that the only thing that differentiates repositories are their queries. If you wrap all your queries up as specifications then surely what you are left with is a generic repository?

"That's something I'm keen to correct... " Yes, but we're talking about IQueryable`1 where the type argument is the aggregate root. It's still a query on the aggregate.

Mike Hadlow said...

Greg Young has an excellent post arguing against generic repositories here: http://codebetter.com/blogs/gregyoung/archive/2009/01/16/ddd-the-generic-repository.aspx

Bryan Watts said...

One of the points that has been danced around is that IQueryable`1 is a framework-level abstraction. With no other persistence-related decisions, you can confidently use it for queries where there would otherwise be nothing. IRepository`1 is the natural other half of the story, the persistent aspect of the abstraction.

This gels with DDD, as it removes technical concerns from the domain; the focus is then on which queries to issue and when to make changes.

That statement is predicated upon some assumptions I'd like to see you discuss (@Mike and @Colin) about who consumes repositories.

I venture to say domain services are the target consumers; they provide operations in the Ubiquitous Language on top of the query/persistence elements. As such, generalization is very very good, instead of YAGNI.

Thoughts?

Mike Hadlow said...

Bryan, that's also my view, but it's a minority one. Most of the comments I've read about returning IQueryable`1 from a repository are against the idea. They say it allows query logic to leak into other parts of the application and means that the execution of the query can take place at some unspecified time.

As for who consumes repositories, there's an argument that it should only be domain services. I'm a bit more relaxed and have controllers consume repositories, but I can see that at certain scales enforcing the former rule might make sense.

I also think that much of the argument is about terminology. One man's domain service is another man's repository. What to me is a generic repository consumed by domain services is to others a repository abstraction consumed by repositories.

Daniel Fernandes said...

Something to bear in mind: the argument that IQueryable is part of the framework, and should therefore be used to provide querying logic, is beside the point.

What should matter is that the Repository Pattern is a technique/practice which is abstract in its own right and which requires adherence to some principles. Those principles, such as SOLID, are not about what's technically possible but about what constitutes good design. Some say good design is design that is easy to change, but some might think differently, hence this discussion :)

Mike Boldischar said...

I'm reading through Domain Driven Design. One thing that bugs me is the idea of a "value" type. Why not just call those objects immutable entities? Maybe it's just a personal preference, but in my opinion, building "values" into the design adds little value. They only encapsulate a subset of entity values. Why not just specify immutable entities? Any thoughts?

Mike Hadlow said...

Mike,

It's more about identity than immutability. Entities have an independent identity. You can ask for an individual entity by its ID. Value types only exist in terms of their parent entities and have no separate identity. Think of a line (Entity) made up of some points (Value), or a contact (Entity) that has an address (Value).

A common anti-pattern is to have entities with large numbers of properties with basic types (int, string). Often they map 1-to-1 to a database table. Individual basic types, or groups of them, usually have some meaning in terms of the business and should be factored into value types.

Take a contact for example:

public class Contact
{
    // Every table column surfaces as a basic-type property.
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Address1 { get; set; }
    public string Address2 { get; set; }
    public string City { get; set; }
    public string Postcode { get; set; }
}

It's an entity (has an Id) that has basic type properties that map directly to table columns. It might make more sense as:

// Entity: has its own identity (Id).
public class Contact
{
    public int Id { get; set; }
    public Name Name { get; set; }
    public Address Address { get; set; }
}

// Value type: no Id, only meaningful as part of an entity.
public class Name
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

// Value type: no Id, only meaningful as part of an entity.
public class Address
{
    public string Address1 { get; set; }
    public string Address2 { get; set; }
    public string City { get; set; }
    public string Postcode { get; set; }
}

Here, Name and Address are value types; they don't have an Id. Why is this good? Because we're decoupling stuff to do with names from stuff to do with addresses. We can reuse name and address in other entities, but they make no sense as independent things. Note that the database table wouldn't change; we're breaking the 1-to-1 class-to-table link that's often the naive starting point for most object-relational mapping.
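Because a value type has no identity of its own, two instances are usually compared by their contents rather than by an Id. The sketch below shows one way that might look; the constructor, the private setters and the reduced set of fields are assumptions made for brevity, not part of the example above.

```csharp
using System;

// Hypothetical value type: equal when all its fields are equal.
public sealed class Address : IEquatable<Address>
{
    public Address(string address1, string city, string postcode)
    {
        Address1 = address1;
        City = city;
        Postcode = postcode;
    }

    public string Address1 { get; private set; }
    public string City { get; private set; }
    public string Postcode { get; private set; }

    public bool Equals(Address other)
    {
        return !ReferenceEquals(other, null)
            && Address1 == other.Address1
            && City == other.City
            && Postcode == other.Postcode;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Address);
    }

    public override int GetHashCode()
    {
        // Combine the field hashes so that equal values hash alike.
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + (Address1 ?? "").GetHashCode();
            hash = hash * 31 + (City ?? "").GetHashCode();
            hash = hash * 31 + (Postcode ?? "").GetHashCode();
            return hash;
        }
    }
}
```

An entity such as Contact would keep comparing by Id, so two contacts sharing an equal Address would still be different contacts.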

One problem I have with Evans's book is that it's long on theory and short on examples, and I think the Entity/Value distinction is a great example of this. He explains the difference in the abstract at great length, but without enough concrete examples to really cement what he's saying.