At last it seems that the dreaded DataSet is dead. There are many reasons why you should always think twice before choosing the DataSet as the core of your application architecture; I covered most of them a couple of years ago here. In my freelancing work I've found that none of my recent clients have used DataSets, preferring instead some kind of Domain Model or Active Record data access mechanism, with Active Record by far the favorite. It's also worth noting that in most Microsoft shops the Active Record class is called a 'business object' or 'data object'; almost nobody says 'Active Record'.
A core part of an Active Record based architecture is some kind of Data Access Layer that does Object Relational Mapping (ORM). Everyone seems to write their own, which brings me to the main point of this post: you shouldn't need to. If you are like the majority of my clients, your application features thousands of lovingly hand-crafted lines of code that call stored procedures by name, add each parameter by name, and copy each column of the result, again by name, into the properties of a business object.
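To make that concrete, here is a minimal sketch of a hand-written load/save pair for a hypothetical `Customer` table. It uses Python's built-in `sqlite3` module only so that it runs without a SQL Server instance. SQLite has no stored procedures, so inline SQL stands in for the procedure call. The table, column, and parameter names are all invented for illustration.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    """Hypothetical business object; Email maps to a nullable column."""
    customer_id: int
    name: str
    email: Optional[str]


def get_customer(conn: sqlite3.Connection, customer_id: int) -> Optional[Customer]:
    cursor = conn.cursor()
    cursor.row_factory = sqlite3.Row  # lets us index rows by column name
    # The SQL text, column names and parameter name are hand-typed string
    # literals that must match the schema exactly; nothing checks them.
    cursor.execute(
        "SELECT CustomerId, Name, Email FROM Customer WHERE CustomerId = :customerId",
        {"customerId": customer_id},
    )
    row = cursor.fetchone()
    if row is None:
        return None
    return Customer(
        customer_id=int(row["CustomerId"]),
        name=str(row["Name"]),
        # Manual null handling, the counterpart of a DBNull check in ADO.NET.
        email=None if row["Email"] is None else str(row["Email"]),
    )


def insert_customer(conn: sqlite3.Connection, customer: Customer) -> None:
    # The same names typed by hand a second time, in a second place.
    conn.execute(
        "INSERT INTO Customer (CustomerId, Name, Email) "
        "VALUES (:customerId, :name, :email)",
        {"customerId": customer.customer_id, "name": customer.name, "email": customer.email},
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, Name TEXT NOT NULL, Email TEXT)"
)
insert_customer(conn, Customer(1, "Alice", None))
print(get_customer(conn, 1))  # Customer(customer_id=1, name='Alice', email=None)
```

Multiply this by every table and every query in an application and the scale of the problem becomes clear.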
These hand-written data access layers are the most common source of bugs in most business applications. There are several reasons for this, the most obvious being the hand-coded string literals for stored procedure names, stored procedure parameter names and column names. Some people use an enumeration of column ordinals instead of string literal column names, mainly for performance, but nothing stops the enumeration and the columns in the stored procedure's SELECT statement from drifting out of sync. There's also the overhead of mapping SQL Server types to .NET types and dealing with null values. But the worst offense of all is the tedium and the wasted time. Writing and maintaining these data access layers is the programmer's equivalent of the innermost circle of Dante's hell, and you don't have to do it.
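The enumeration problem is easy to reproduce. This minimal Python sketch (standard `sqlite3` module, with a hypothetical `Customer` table and `CustomerColumn` enumeration) reads a column by ordinal, then shows what happens when someone reorders the SELECT list without touching the enumeration.

```python
import sqlite3
from enum import IntEnum


class CustomerColumn(IntEnum):
    """Hand-maintained column ordinals for the customer query (hypothetical)."""
    CUSTOMER_ID = 0
    NAME = 1
    EMAIL = 2


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (CustomerId INTEGER, Name TEXT, Email TEXT)")
conn.execute("INSERT INTO Customer VALUES (1, 'Alice', 'alice@example.com')")

# The query as first written: its column order matches the enumeration.
original_sql = "SELECT CustomerId, Name, Email FROM Customer"
# A later edit swaps two columns, and nobody updates the enumeration.
edited_sql = "SELECT CustomerId, Email, Name FROM Customer"


def read_name(sql: str) -> str:
    row = conn.execute(sql).fetchone()
    # Reading by ordinal avoids a name lookup, but it is only correct
    # while the SELECT list keeps the order the enumeration assumes.
    return row[CustomerColumn.NAME]


print(read_name(original_sql))  # Alice
print(read_name(edited_sql))    # alice@example.com: wrong, and no error is raised
```

The second call does not fail. It quietly returns the wrong column, which is exactly the kind of bug that slips through to production.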
If you're like me, you resent any kind of repetitive coding that the computer could do just as easily, but much, much faster and more accurately. At some point in your career you've probably written a little program that queries the database schema and generates your Active Record classes and Data Access Layer. Yes, I've done this twice, once back in the days of Visual Basic client-server systems and more recently for .NET. The second attempt got quite sophisticated, handling object graphs and change tracking, but I never got it to the stage of a real data access framework, one that could look after all my persistence needs. I used it to generate the bulk of the code and then hand-coded the tricky bits, such as complex queries and relationships. I'm not the only one who's been down this road: a whole army of very clever people, cleverer than you or me, have devoted large amounts of time to this problem, which is great because it means that you and I don't have to bother any more. These tools are now robust and mature enough that it's riskier to write your own than to use one of them.
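Here is a minimal sketch of the kind of little generator described above. It is in Python against SQLite, chosen only so the example is self-contained. It reads column metadata with SQLite's `PRAGMA table_info` and emits the source of a dataclass. The `generate_class` function and its type mapping are assumptions for illustration. A generator for SQL Server would read `INFORMATION_SCHEMA.COLUMNS` or the system catalog instead, and would also emit the data access methods.

```python
import sqlite3

# Assumed mapping from declared SQLite column types to Python type names.
TYPE_MAP = {"INTEGER": "int", "TEXT": "str", "REAL": "float", "BLOB": "bytes"}


def generate_class(conn: sqlite3.Connection, table: str) -> str:
    """Return Python source for a dataclass that mirrors `table`'s columns."""
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    # The table name is interpolated directly, so it must come from a trusted source.
    columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
    lines = [
        "from dataclasses import dataclass",
        "from typing import Optional",
        "",
        "",
        "@dataclass",
        f"class {table}:",
    ]
    for _cid, name, col_type, notnull, _default, pk in columns:
        py_type = TYPE_MAP.get(col_type.upper(), "object")
        # Nullable, non-key columns become Optional so that NULL maps to None.
        if not notnull and not pk:
            py_type = f"Optional[{py_type}]"
        lines.append(f"    {name}: {py_type}")
    return "\n".join(lines) + "\n"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, Name TEXT NOT NULL, Email TEXT)"
)
source = generate_class(conn, "Customer")
print(source)
```

For this table the generated class declares `CustomerId: int`, `Name: str` and `Email: Optional[str]`. Because the class is regenerated from the schema, it cannot silently drift out of sync with the columns the way hand-written code can.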
But how do you choose which one to use? There are two basic approaches: code generators and runtime ORM engines. Code generators, like mine, are easier to build, so there are more of them out there. Runtime ORM engines are a much trickier engineering problem, but they will probably win in the end because they're easier for the end developer to use. Among the code generators, the ones I hear about most are various CodeSmith templates such as .netTiers; LLBLGen, by Frans Bouma, who is also a very active participant in community discussions around ORM; SubSonic, which is attempting to be a Rails for .NET; and Microsoft's very own Guidance Automation Toolkits. All are well regarded, and you probably wouldn't go far wrong choosing any of them.
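To make the difference between the two approaches concrete, here is a toy runtime mapper in Python with `sqlite3`. Instead of generating a class per table ahead of time, it inspects the class with `dataclasses.fields` at runtime and builds the SQL on each call. The `Mapper` class and its conventions (table named after the class, columns named after the fields, first field as primary key) are hypothetical and far simpler than any real engine.

```python
import sqlite3
from dataclasses import astuple, dataclass, fields
from typing import List, Optional, Type, TypeVar

T = TypeVar("T")


class Mapper:
    """Toy runtime mapper: SQL is derived from each dataclass's fields at call
    time, so no per-class data access code is generated or hand-written.
    Assumed conventions: table = class name, columns = field names,
    first field = primary key."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    @staticmethod
    def _columns(entity_type: type) -> List[str]:
        return [f.name for f in fields(entity_type)]

    def insert(self, obj: object) -> None:
        entity_type = type(obj)
        cols = self._columns(entity_type)
        placeholders = ", ".join("?" for _ in cols)
        sql = f"INSERT INTO {entity_type.__name__} ({', '.join(cols)}) VALUES ({placeholders})"
        self.conn.execute(sql, astuple(obj))

    def get(self, entity_type: Type[T], key: object) -> Optional[T]:
        cols = self._columns(entity_type)
        sql = f"SELECT {', '.join(cols)} FROM {entity_type.__name__} WHERE {cols[0]} = ?"
        row = self.conn.execute(sql, (key,)).fetchone()
        # Column order matches field order because both come from fields().
        return entity_type(*row) if row is not None else None


@dataclass
class Customer:
    CustomerId: int
    Name: str
    Email: Optional[str] = None


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, Name TEXT NOT NULL, Email TEXT)"
)
mapper = Mapper(conn)
mapper.insert(Customer(1, "Alice"))
print(mapper.get(Customer, 1))  # Customer(CustomerId=1, Name='Alice', Email=None)
```

Adding a field to `Customer` (plus the matching column) needs no other code change and no regeneration step. That convenience for the end developer is what runtime engines offer, at the cost of doing the mapping work on every call.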
Among the runtime ORM engines, NHibernate is the one I hear mentioned most. Hibernate is huge in the Java world, so the NHibernate port has plenty of real-world experience to draw on. It's been used in a number of large-scale projects and is the core of the Castle Project's ActiveRecord, a Rails-like data access solution. I haven't used it in anger, but my few experiments with it have been quite fun.
I haven't yet mentioned the elephant in the room: LINQ to SQL, which ships with .NET 3.5. Microsoft have taken a long time to join the ORM party. A couple of years ago there was much talk of ObjectSpaces, a Hibernate-style ORM tool that never saw the light of day. LINQ is a very elegant attempt to integrate declarative, query-style syntax into imperative .NET languages like C#. LINQ to SQL makes good use of it, especially, as you'd expect, in its query syntax. In other respects LINQ to SQL is very much a traditional ORM along the lines of Hibernate: it features code generation tools to create your Active Record objects, a runtime mapper that creates SQL on the fly, identity management and lazy loading; all the features you'd expect. If you're starting a new project and you're prepared to risk using the beta of Visual Studio 2008, then I would choose LINQ to SQL over any of the alternatives, not least because it's what everyone will be using in a couple of years' time.
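Identity management and lazy loading are general ORM patterns rather than anything unique to LINQ to SQL. Here is a minimal Python sketch of both, assuming a hypothetical `Session` class and two invented tables. It is not LINQ to SQL's API, although there the `DataContext` plays roughly the session's role. The identity map guarantees that loading the same row twice yields the same object. Lazy loading defers the orders query until the `orders` property is first read.

```python
import sqlite3
from typing import Dict, List, Optional


class Order:
    def __init__(self, order_id: int, customer_id: int, total: float) -> None:
        self.order_id = order_id
        self.customer_id = customer_id
        self.total = total


class Customer:
    def __init__(self, session: "Session", customer_id: int, name: str) -> None:
        self._session = session
        self.customer_id = customer_id
        self.name = name
        self._orders: Optional[List[Order]] = None  # None means "not loaded yet"

    @property
    def orders(self) -> List[Order]:
        # Lazy loading: the orders query runs on first access, not when
        # the customer itself is loaded.
        if self._orders is None:
            self._orders = self._session.load_orders(self.customer_id)
        return self._orders


class Session:
    """Hypothetical unit of work holding an identity map of loaded customers."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self._customers: Dict[int, Customer] = {}
        self.queries_run = 0  # counted only to make the behavior observable

    def _query(self, sql: str, params: tuple) -> list:
        self.queries_run += 1
        return self.conn.execute(sql, params).fetchall()

    def get_customer(self, customer_id: int) -> Optional[Customer]:
        # Identity map: each row is materialized at most once per session,
        # so repeated lookups return the very same object.
        if customer_id in self._customers:
            return self._customers[customer_id]
        rows = self._query(
            "SELECT CustomerId, Name FROM Customer WHERE CustomerId = ?", (customer_id,)
        )
        if not rows:
            return None
        customer = Customer(self, *rows[0])
        self._customers[customer_id] = customer
        return customer

    def load_orders(self, customer_id: int) -> List[Order]:
        rows = self._query(
            "SELECT OrderId, CustomerId, Total FROM Orders "
            "WHERE CustomerId = ? ORDER BY OrderId",
            (customer_id,),
        )
        return [Order(*row) for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Orders (OrderId INTEGER PRIMARY KEY, CustomerId INTEGER, Total REAL);
    INSERT INTO Customer VALUES (1, 'Alice');
    INSERT INTO Orders VALUES (10, 1, 9.5), (11, 1, 20.0);
    """
)
session = Session(conn)
first = session.get_customer(1)
second = session.get_customer(1)
print(first is second, session.queries_run)  # True 1
print(len(first.orders), session.queries_run)  # 2 2
```

Both lookups hit the database only once. The orders are fetched by a single extra query the first time they are needed, and never again within the session.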