Tuesday, October 02, 2007

Why are you still hand coding your data access layer?

At last it seems that the dreaded DataSet is dead. There are many reasons why you should always think twice before choosing the DataSet as the core of your application architecture; I covered most of them a couple of years ago here. In my freelancing work I've found that none of my recent clients have used DataSets, preferring instead some kind of Domain Model or Active Record data access mechanism, with Active Record becoming by far the favorite. It's also worth noting that most Microsoft shops call the Active Record class a 'business object' or 'data object'; almost nobody says 'Active Record'.

A core part of an Active Record based architecture is some kind of Data Access Layer that does Object Relational Mapping (ORM). Everyone writes their own, and that's the main point of this post: you shouldn't need to. If you are like the majority of my clients, your application features thousands of lovingly hand-crafted lines of code like this:


These hand-written data access layers are the most common source of bugs in most business applications. There are several reasons for this, the most obvious being the hand-coded string literals for stored procedure names, stored procedure parameter names and column names. People do use an enumeration instead of string literal column names, mainly for performance reasons, but that doesn't stop the enumeration and the columns in the stored procedure's select statement getting out of sync. There's also the overhead of matching SQL Server types to .NET types and dealing with null values. But the worst offense of all is the tedium and the waste of time. Writing and maintaining these Data Access Layers is the programmer's equivalent of Dante's inner ring of hell, and you don't have to do it.
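A minimal sketch of the kind of hand-written mapping described above, in Python with the standard sqlite3 module standing in for .NET and SQL Server. The Customer table, its columns and the function names are invented for illustration, and an inline SELECT stands in for a stored procedure.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    id: int
    name: str
    email: Optional[str]  # maps a nullable column


def open_demo_database() -> sqlite3.Connection:
    """Create an in-memory database with one hypothetical Customer row."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # allows access to columns by name
    conn.execute(
        "CREATE TABLE Customer ("
        "CustomerId INTEGER PRIMARY KEY, Name TEXT NOT NULL, Email TEXT)"
    )
    conn.execute(
        "INSERT INTO Customer (CustomerId, Name, Email) VALUES (1, 'Ada', NULL)"
    )
    return conn


def get_customer_by_id(
    conn: sqlite3.Connection, customer_id: int
) -> Optional[Customer]:
    # Hand-typed SQL: table and column names are plain string literals.
    row = conn.execute(
        "SELECT CustomerId, Name, Email FROM Customer WHERE CustomerId = ?",
        (customer_id,),
    ).fetchone()
    if row is None:
        return None
    # Hand-typed mapping: each key must match the SELECT list exactly,
    # and nothing checks that until this line runs.
    return Customer(
        id=row["CustomerId"],
        name=row["Name"],
        email=row["Email"],  # SQL NULL arrives as None
    )
```

Every table needs its own copy of this pattern, so a renamed column means hunting down each string literal that mentions it.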

If you're like me, you resent any kind of repetitive coding that the computer could do just as easily, but much faster and more accurately. At some time in your career you've probably written a little program that queries the database schema and generates your Active Record classes and Data Access Layer. I've done this twice, once back in the days of Visual Basic client server systems and more recently for .NET. The second attempt got quite sophisticated, handling object graphs and change tracking, but I never really got it to the stage of a real Data Access Framework, one that could look after all my persistence needs. I used it to generate the bulk of the code and then hand coded the tricky bits, such as complex queries and relationships. I'm not the only one who's been down this road. A whole army of very clever people, cleverer than you or me, have devoted large amounts of time to this problem, which is great because it means that you and I don't have to bother any more. These tools are now robust and mature enough that it's riskier to do it yourself than to use one of them.
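The "little program that queries the database schema" can be sketched in a few lines. This is an illustration in Python against SQLite, not the generator described above: the type map, the Customer table and the function names are all assumptions, and it emits only a bare data class with none of the object graph or change tracking support.

```python
import sqlite3

# Assumed mapping from SQLite declared column types to Python type names.
TYPE_MAP = {"INTEGER": "int", "TEXT": "str", "REAL": "float"}


def generate_record_class(conn: sqlite3.Connection, table: str) -> str:
    """Return Python source for a dataclass mirroring the columns of `table`.

    `table` is interpolated into the statement, so it must be a trusted name.
    """
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
    if not columns:
        raise ValueError(f"no such table: {table}")
    lines = ["@dataclass", f"class {table}:"]
    for _cid, name, declared_type, notnull, _default, pk in columns:
        py_type = TYPE_MAP.get(declared_type.upper(), "object")
        # Columns that allow NULL and are not the key become Optional.
        if not notnull and not pk:
            py_type = f"Optional[{py_type}]"
        lines.append(f"    {name}: {py_type}")
    return "\n".join(lines) + "\n"


def demo_source() -> str:
    """Generate the class source for a hypothetical Customer table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Customer ("
        "CustomerId INTEGER PRIMARY KEY, Name TEXT NOT NULL, Email TEXT)"
    )
    return generate_record_class(conn, "Customer")
```

Calling demo_source() returns the text of a Customer dataclass with the fields CustomerId: int, Name: str and Email: Optional[str]. Because the class is derived from the schema, regenerating it after a schema change keeps the two in step.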

But how do you choose which one to use? There are two basic approaches: code generators and runtime ORM engines. Code generators, like mine, are the easiest to create, so there are more of them out there. Runtime ORM engines are a much trickier engineering problem, but they will probably win in the end because they're easier for the end developer to use. Amongst the code generators, the ones I hear of the most are various CodeSmith templates like .netTiers; LLBLGen by Frans Bouma, who's also a very active participant in community discussions around ORM; SubSonic, which is attempting to be a Rails for .NET; and Microsoft's very own Guidance Automation Toolkit. All are well regarded and you probably wouldn't go too far wrong choosing any of them.

Among the runtime ORM engines, I hear NHibernate mentioned more than anything else. Hibernate is huge in the Java world so the NHibernate port has plenty of real world experience to fall back on. It's been used in a number of large scale projects and is the core of the Castle project's ActiveRecord rails-a-like data access solution. I haven't used it in anger, but my few experiments with it have been quite fun.

I haven't mentioned the elephant in the room yet: LINQ to SQL, coming with .NET 3.5. Microsoft have taken a long time to join the ORM party. A couple of years ago there was much talk of ObjectSpaces, a Hibernate style ORM tool that never saw the light of day. LINQ is a very elegant attempt to integrate declarative query style syntax into imperative .NET languages like C#. LINQ to SQL makes good use of it, especially, as you'd expect, with its query syntax. In other ways LINQ to SQL is very much a traditional ORM along the lines of Hibernate. It features some code generation tools to create your Active Record objects, a runtime mapper that creates SQL on the fly, identity management and lazy loading; all the features you'd expect. If you're starting a new project and you're prepared to risk using the beta of Visual Studio 2008, then I would choose LINQ to SQL in favor of any of the other alternatives, not least because it's what everyone will be using in a couple of years' time.
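Two of the runtime features listed above, SQL created on the fly and identity management, can be sketched in a toy mapper. This is a Python illustration only and says nothing about how LINQ to SQL or NHibernate work internally. The Session class, the Customer class and the conventions (table name equals class name, first field is the primary key) are assumptions, and lazy loading and query syntax are left out.

```python
import sqlite3
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Customer:
    CustomerId: int
    Name: str
    Email: Optional[str]


class Session:
    """Toy runtime mapper: builds SQL from class metadata, caches instances."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.identity_map = {}  # (class, key) -> already loaded instance

    def get(self, cls, key):
        # Identity management: one object per database row per session.
        cached = self.identity_map.get((cls, key))
        if cached is not None:
            return cached
        # Assumed conventions: table name == class name, and the first
        # dataclass field is the primary key column.
        columns = [field.name for field in fields(cls)]
        sql = (
            f"SELECT {', '.join(columns)} "
            f"FROM {cls.__name__} WHERE {columns[0]} = ?"
        )
        row = self.conn.execute(sql, (key,)).fetchone()
        if row is None:
            return None
        instance = cls(*row)  # column order matches field order
        self.identity_map[(cls, key)] = instance
        return instance
```

Nothing here is generated ahead of time: the SELECT statement is assembled from the class definition at the moment of the call, which is the main difference from the code generator approach.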


Adam V. said...

The link to the .png isn't working (I'm getting "forbidden".)

Mike Hadlow said...

Hi Adam. Thanks, it seems to have happened since I changed my blog to have its own domain name 'coderant.net'. Now all the images don't work :(

Mike Hadlow said...

The answer for now is to abandon the custom domain and move back to mikehadlow.blogspot.com. Shame but at least the images are back.

Slevdi said...

There are *too many* ORMs out there. I want to choose the easiest one to use for now, but be able to swap in different ones at runtime. E.g. I might develop using fakes or mocks, do integration tests with NHibernate or EasyObjects, then next year decide I need the MS Entity Framework when it arrives, LLBLGen Pro or even the awesome Deklarit for some/all of my live projects.

Thus, I need an ORM Provider Strategy. Any ideas on where to start?

Mike Hadlow said...

Slevdi, that's just good architecture. You should always architect your apps in a layered fashion, separating out the different concerns. I'm a big fan of removing dependencies with the Inversion of Control pattern and then binding my application together with an IoC container. If you've done all that, it should be pretty simple to swap in different ORM strategies. If you're thinking about using a provider pattern in your apps, be sure to check this out first: Castle Project's Windsor Container; it's awesome.
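A rough sketch of what that looks like, in Python rather than .NET. All the names here (CustomerRepository, GreetingService and so on) are made up for the example, and plain constructor injection stands in for a real container such as Windsor. The service depends only on the repository interface, so a fake and a database-backed implementation are interchangeable.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class Customer:
    id: int
    name: str


class CustomerRepository(Protocol):
    """The only thing the rest of the application knows about persistence."""

    def get(self, customer_id: int) -> Optional[Customer]: ...
    def add(self, customer: Customer) -> None: ...


class InMemoryCustomerRepository:
    """Fake for development and unit tests."""

    def __init__(self):
        self._items = {}

    def get(self, customer_id: int) -> Optional[Customer]:
        return self._items.get(customer_id)

    def add(self, customer: Customer) -> None:
        self._items[customer.id] = customer


class SqliteCustomerRepository:
    """Database-backed implementation; an ORM would sit behind this class."""

    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS Customer ("
            "CustomerId INTEGER PRIMARY KEY, Name TEXT NOT NULL)"
        )

    def get(self, customer_id: int) -> Optional[Customer]:
        row = self._conn.execute(
            "SELECT CustomerId, Name FROM Customer WHERE CustomerId = ?",
            (customer_id,),
        ).fetchone()
        return Customer(*row) if row else None

    def add(self, customer: Customer) -> None:
        self._conn.execute(
            "INSERT INTO Customer (CustomerId, Name) VALUES (?, ?)",
            (customer.id, customer.name),
        )


class GreetingService:
    """Receives its repository from outside instead of creating one."""

    def __init__(self, customers: CustomerRepository):
        self._customers = customers

    def greet(self, customer_id: int) -> str:
        customer = self._customers.get(customer_id)
        return f"Hello, {customer.name}!" if customer else "Hello, stranger!"
```

Swapping persistence strategies then means changing the one place where GreetingService is constructed, which is the job a container takes over in a larger application.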

As for there being too many ORMs out there, I think that's a symptom of a healthy interest in solving this problem, as well as of the fact that Microsoft has been so tardy in joining the ORM party. But not for much longer. I think the impact of LINQ to SQL will be pretty heavy for all the ORM vendors, as it will become the default choice for pretty much every Microsoft shop.