Monday, January 12, 2009

Should my repository expose IQueryable?

This post is an attempt to describe an interesting point of difference about the way a generic repository can be implemented. I'm writing it to lay out my argument that exposing IQueryable on a generic repository is a good thing. I know a lot of people disagree, so I'm hoping I can spark a debate and win it learn why I'm wrong.

I talked about the generic repository pattern a while back. My IRepository interface looks something like this:

public interface IRepository<T> where T : class
{
    T GetById(int id);
    IQueryable<T> GetAll();
    void InsertOnSubmit(T entity);
    void DeleteOnSubmit(T entity);
    void SubmitChanges();
}

It assumes that the underlying data access model is based on Unit of Work, but with that caveat, it will happily wrap any data access technology that supports a LINQ provider. I've used it successfully with both LINQ to SQL and NHibernate. I've found returning an IQueryable<T> very useful in my applications, there's a nice pattern where I can chain extension methods:

var articles = articleRepository
    .GetAll()
    .ThatMatch(criteria)
    .ToPagedList(pageNumber, pageSize);

However, returning IQueryable<T> is controversial. Quite a few people I respect think that it allows data access concerns to leak into the wider application. Since the way IQueryable<T> is resolved depends on the LINQ provider, some part of my application may fail at runtime. What may work with LINQ to Objects in my unit tests may fail with LINQ to SQL or NHibernate.Linq because not every expression can be parsed into SQL. Also, we are allowing what can be passed to the database to be specified at any point in our application rather than containing that specification within our repository. That makes it possible to write articleRepository.GetAll().AsEnumerable() when we might have millions of articles in the database. Not good.

But in our brave new ORM world, hasn't the horse already bolted?

We are happy to use NHibernate or LINQ to SQL to track changes to our domain entities, and we trust these tools to write the updates correctly back to the database. We also rely on them to lazy load our domain object graph as required. Sure it means we can have an inefficient conversation with the database if we're not careful (the N+1 problem for example), but those are minor concerns compared with the tremendous benefits of ORMs. Behind the scenes the ORM is fiddling with our POCO domain entities so they're not really POCO any more, but we accept that.

Allowing instances of IQueryable<T> to escape from the repository is a similar case. We're saying, "I don't care when the query execution takes place, or what the query looks like, just give me the correct collection of entities". We are giving up control of what queries can be sent to our database, but does this really matter? The benefits are huge if we can live with this loss of control. It's real persistence ignorance. Rather than being concerned about exactly what queries are being sent down the wire we are simply filtering collections in a natural style and leaving it to our infrastructure to work out how to give us what we want.

We can successively filter collections knowing that only the entities we need will be loaded. In the above example I can have query specifications that are specific to that entity (ThatMatch) and others that work on any collection (ToPagedList), mix and match as needed, and only load a single page of entities from the database or whatever backing store we've configured.

As an aside, LINQ to SQL has a really nice (but nasty at the same time) feature that allows you to do this:

var orders = customer.Orders.Where(order => order.Price > 10.0);

It will only lazy load the orders where the price is greater than 10. You can't do this with NHibernate. The nasty part is that you have to implement your entity collection properties as EntitySet (which implements IQueryable) and that really is letting a data access concern leak into the application. But there's no reason why a proxy implementation couldn't do the same thing, so long as we define our collections as IQueryable<T>.

So let us embrace IQueryable<T>. Do not fear lazy evaluation.

35 comments:

Michael Wagg said...

Whenever this argument comes up I always think it might be a question of the scale of the application or the size of the team working on it.

On a small/medium project with a small enough team I think this approach works well. But as the codebase or the team grows I think this approach starts to break down.

My reason for not liking it is not the lazy evaluation, but the fact that you can write your query logic all over your app. My main issue with it is sooner or later your going to have two queries in two different places that are supposed to be doing the same thing but are actually different either because they were written different in the first place or because one of them has been refactored or bug fixed.

On a large app I like to be able to say all queries are in the repositories so it is easy to discover existing ones and obvious were new ones can be added. This does require some discipline of course so you don't end up with lots of similar queries.

Ben Taylor said...

Fortunately, I was going to write what Michael Wagg wrote. Now I don't have to. Nice. +1 for what he said ;)

Mike Hadlow said...

Michael, good point, but isn't that the same for any piece of code, query or not. Don't repeat yourself is always the rule. Using IQueryable<T> doesn't preclude factoring queries. I personally like to encapsulate them as extension methods and I would submit that this method makes it easier, not harder, to wrap up common queries because they can be typed to interfaces as well as concrete entity types:

IQueryable<T> InOrder(this IQueryable<T> items) where T : IOrderable

Allows me to write a query in terms of IOrderable and then apply it to any entity that implements that interface.

You can have a well known place in your application where these extension methods go, so they are easy to find.

Michael Wagg said...

I think you're right it does open up a lot of new ways of organising it.

I definately agree on the single well known place for queries, I guess its then just a question of communicating that to new devs. In my opinion the repository is a good place for that because its an obvious place.

But just to be clear I don't personally stick with one or the other religiously, its a judgement call on each project.

I guess my point is that its not lazy execution that bothers me about this approach, its the potential for abuse when an inexperienced dev is under pressure.

Mike Hadlow said...

Michael, sorry, I wince when I hear the, "new devs will mess it up" argument. Surely pair programming, code reviews and tools like NDepends are the solution, rather than rejecting particular architectural practices or frameworks?

Stu Campbell said...

Another +1 for Michael's view.

Although I really love the flexibility offered by IQueryable of T, it does require discipline to avoid query logic leaking into other parts of app.

In an MVC web app I think it is sometimes permissable to expose a Repository to a controller (instead of going through a separate App Service layer). If that Repository exposes IQueryable of T then I'm not as comfortable doing so because the temptation is too great to avoid further manipulation of the expression tree - especially if the developer is in a rush.

But you can't deny the sheer sexiness of IQueryable of T + Specifications. :)

Michael Wagg said...

Yeah I agree with you and in fact I'd like to reteract the inexperienced dev argument :-)

But I still do think both approaches have a place. In the right context having all query logic behind a repository makes sense. Having said that I would say the number of systems I've worked on that justified that approach can be counted on one hand.

Mike Hadlow said...

Stu, yes, you can not expose IQueryable<T> *if* you are worried about the expression tree being modified outside of the repository.

But you've stated it as if it's obviously *a bad thing*. I'd like to explore that. What are the arguments against modifying the expression tree outside the repository? Why is it bad?

Stu Campbell said...

Let me just say that I've got two projects at the moment one of which exposes IQueryable and one which doesn't. And I like both of them :)

In the one that exposes IQueryable, writing and testing specifications is a joy. I just have to be disciplined - which as Michael said, is not really a problem in a small team.

Anyway, my discomfort isn't with having the tree-modifying code outside of the Repository - after all, my Specs live outside the repository. It's with the code being written in the controllers because that breaks SoC.

But the more I think about it the more I agree that we'd be throwing the baby out with the bath water to dismiss the design just because it can be subverted. As you say there are tools and processes to help us there.

Colin Jack said...

I'm with Michael's original point.

If you've got a small/simple domain then I think the generic repository + IQueryable approach maybe makes sense. To be honest though having used this approach I'm not a fan at all, don't see any real advantages over the original repositories and specifications approach.

On top of this for more complex projects I'd certainly stay away from these patterns.

Mike Hadlow said...

Colin, Can you elaborate? Isn't it simply an issue of appropriate factoring? For example accessing the generic repository directly in a controller vs insulating it in a domain service? Surely even if you go with your original repository + specification pattern there's probably some generic abstraction within it.

Do you consider the use of IQueryable<T> a data access concern only?

jnystrom said...

If I am not using LINQ, should I add methods on the Repository to get the exact data I want (like GetNewOrders)? I can still use the service layer as mainly a proxy for the repo.

Daniel Fernandes said...

Indeed Michael is right, any design decision should be dependent on resources and that includes time for delivery.
In a small project with a straight forward Domain Model then IQueryable is just fine to be exposed directly and indeed classes can be used to compose for instance complex predicates to be sent to a IQeryable[[T]] IRepository[[T]].All(Predicate[[T]] pred).
The situation comes to a point though where either the Domain Model is more complex or the project is quite large and there is a need to be more disciplined.
In addition to possibly unnecessary duplication of querying logic by directly using IQueryable the other danger is that nothing stops a developer to do a IQueryable[[T]].AsEnumerable() anywhere. Some may argue that then it's the fault of the developer and not of the technology...

Mike Hadlow said...

Daniel,

If you're going to return IQueryable'1 there's no need to pass a predicate to your 'All' method. IQueryable gives you access to the expression.

Why does allowing IQueryable'1 to be returned make code duplication any more likely? I can just as easily write duplicate logic inside a specification. It's the "new devs won't understand it" argument again.

Writing .AsEnumerable() is the same as passing a specification that just says 'get everything'. How is that safer?

Daniel Fernandes said...

@Mike

Forget the All(Predicate`1 pred), but my point remains as noted by Michael.

Re AsEnumerable():
I'm just saying that there are maybe situations it feels it is necessary to do AsEnumerable() because you have to check calculated properties and this won't work as translated into a Linq
database. A specialized Query object can deal with that an issue optimized database queries.

I will have to agree though that Linq is great and it would be fantastic if we didn't have to deal with blooming relational databases.

jnystrom said...

Any ideas when you cant use LINQ? I was wondering about the Repositories and putting methods on them that would do filtering in the DB, ideas? Not sure if this breaks the repository pattern idea

Think Before Coding said...

I'm not fond of generic repositories. But it's still possible to return IQueryable from properties or methods on a specific repository.

IMHO it's better to encapsulate the business selection logic in the repository methods, but enable code that use the result to order or page it.

Ordering and paging are presentation problems that usually polute domain model implementations. Linq IQueryable is a powerfull way to make it transparent AND enficient.

jnystrom said...

I understand in a perfect world we would not want the filtering to happen in the repo, but with not being able to use LINQ I dont want to get all data and load it up in mem then filter. I want to do the filtering in the db at that point, right?

jnystrom said...

For instance, if I only need uncomplete orders, I dont want to get every order in the system first, then filter in code. Since I dont have LINQ, I dont have the delayed execution, I would need to get the data in the repo. Why not just let the db filter at that point, if I do NOT have the ability to use LINQ?

Mike Hadlow said...

jnystrom, Think Before Coding,

The paging and filtering example is a good one. I think it's a great reason *for* returning IQueryable'1 from the repository. You can separate UI concerns from the domain model and still have efficient fetching from the DB.

But the it doesn't stop there. After embracing IQueryable'1 I've discovered one nice refactoring after another.

I think part of the disagreements were having are about semantics. I'm starting to agree with the DDD guys that 'repository' is probably the wrong name a generic persistence store. How about IPersist'1 instead? If we accept that 'repository' is an invention of DDD gurus like Fowler and Evans, then we should try to keep to its original meaning.

I'm now thinking I would keep IRepository'1/IPersist'1 but use it internally in a repository, especially in larger applications. Do we return IQueryable'1 from these repositories? Well the paging/sorting argument says we should.

jnystrom said...

I completely agree that in a perfect world I would use IQueryable, but if I am not using LINQ, how can I use IQueryable without loading a TON of data into memory when calling the repository methods?

Mike Hadlow said...

jnystrom, sorry I didn't reply to that particular point before. You are right, you most definately should not return IQueryable'1 from your repository if your data access technology does not support LINQ. A different specification based pattern would probably be more appropriate. What are you using for data access?

Colin Jack said...

"I think part of the disagreements were having are about semantics. I'm starting to agree with the DDD guys that 'repository' is probably the wrong name a generic persistence store. How about IPersist'1 instead? If we accept that 'repository' is an invention of DDD gurus like Fowler and Evans, then we should try to keep to its original meaning."

Yeah I like that way of looking at it, what we're moving back to is a domain focussed DAO approach (with slightly different constraints).

It might have advantages in some situations but I'm not sure I view it as a repository.



"I'm now thinking I would keep IRepository'1/IPersist'1 but use it internally in a repository, especially in larger applications. Do we return IQueryable'1 from these repositories? Well the paging/sorting argument says we should."

So yeah thats what I've been doing for a couple of years, basically. I have concrete domain focussed repositories like ICustomerRepository that have specific methods like GetActiveAccounts (or that take specifications). They entirely encapsulate persistence and return aggregate routes or aggregated values.

Internally these repositories used one or more repository helper classes, which did most of the heavy lifting.

This was pre-linq and paging/sorting was handled by the caller creating an object (kind like a specification) that specified how many results to go on and how to sort them and so on.

The solution was very DRY and, I think, worked well. Never blogged about it though, not sure why. I did mention it on DDD forum recently though (http://tech.groups.yahoo.com/group/domaindrivendesign/message/9053) and Tobin did a good job of writing it up.

Mike Hadlow said...

Colin, Thanks for pointing to that that discussion on the DDD list, it was very useful.

A blog post with some concrete examples would be excellent. I'd very much like to see an actual implementation of the patterns you are talking about. Do you have an open source project that I could look at?

I also like your point about segregating the interface to pull out IDelete, IPersist. That could play very nicely in an DI scenario.

This debate has really moved my ideas about repository forward. I'm still not convinced that returning IQueryable'1 is a bad idea, but at least I'm far more aware of the arguments against it.

Michael Wagg said...

I agree the discussion around this post has been great.

I'll be at the OpenSource .Net Exchange at Skillsmatter so hopefully I'll get a chance to chat to you then.

jnystrom said...

Thanks Mike, I am thinking of just using ADO.NET directly. Any advice with Data Access with .NET (using MONO) and MySQL would be appreciated. I read LINQ to Entities might work...not sure?

Colin Jack said...

"A blog post with some concrete examples would be excellent. I'd very much like to see an actual implementation of the patterns you are talking about. Do you have an open source project that I could look at?"

I could definitely blog about it but I never open-sourced it, actually I couldn't open source it because it was for a company but it wouldn't be hard to do cause it's just a few classes talking to NHibenate but I just never got around to it.

Ben said...

Interesting way to think about it. I'm currently playing with doing this myself.

David Nelson said...

I was going to comment that a distinction needs to be made between returning IQueryable from the data access layer and returning IQueryable from the business layer. But it seems that conclusion was eventually reached without my help :)

I think it also needs to be pointed out that IQueryable != direct data access; at least, not necessarily. All IQueryable does is take expression trees instead of functions in its implementation of the standard query operators. This allows the query to be processed in whatever way you like. So, for example, you could write an IQueryable implementation which enforces that any query on an entity set has a predicate which results in the query being indexable (see the Facebook API at http://wiki.developers.facebook.com/index.php/FQL#The_Query_Language for an example of this approach, although it is obviously not .NET or LINQ based). This can be a very powerful mechanism for "having your cake and eating it too."

Mike Hadlow said...

David,

I agree. One way of seeing this discussion is as the collision of functional and object oriented program paradigms. Much of the attraction of functional programming, from which LINQ takes most of its ideas, is that execution can be defered and optimised at runtime. This doesn't always sit well with many of the hard won principles of object oriented design. On the one hand we don't want to insert artificial boundaries for optimisation by saying, for instance, that querying should not be specified outside of a repository, because that removes our tools ability to do optimisation. On the other hand, with our OO hats on, we want to cleanly separate our concerns.

Tim Acheson said...

I would certainly reccomend exposing an IQueryable interface! It's so versatile. This is surely the future of web services and APIs.

Serge said...

The keyword here is "LINQ-Capable", as this will not work for anything other than that. Relying on LINQ is a good thing until something better comes along. Besides, i've stumbled upon a few good DAL architectures such as NetTiers which provide custom querying in a very easy and extensible way without relying on LINQ expressions. But besides that point, arguing whether to use IQueryable or not is like arguing whether to use Helper classes or not. You do create a leaky abstraction this way, but it will work as long as its under control and doesn't overflow your repositories. Kind of like Helper classes which make sense as long as they are controlled.

Bealer said...

We expose IQueryable through our generic ISession, which restricts usage a little.

The best thing is that it allows us to easily unit test complex queries.

We can just mock our ISession.QueryFor, to return a new List.AsQueryable().

You can then simply assert the collection having been run through the query returns what you expect.

James Morcom said...

I've had a thought on this. What if we instead exposed a special enumerable type from our repository (someone will come up with a better name I'm sure):

IRepositoryEnumerable : IEnumerable

Then we could implement our own extension methods specifically for the things we want our application to be able to do at it's end. I.e.

OrderBy(this IRepositoryEnumerable, ...)

AsPaged(this IRepositoryEnumerable, ...)

so we can make sure the implementation of those specific methods is efficient without giving the application access to potentially "dangerous" Queryable extensions.

James Morcom said...

N.b. the above should have <T> generic parameters but Blogger stripped them thinking they were HTML tags.