Wednesday, September 13, 2006

How to do database source control and builds

Most business applications are fundamentally database applications. A business application's database is as much a part of the application as its C# source code. Unfortunately, in many development shops the database is versioned and controlled very differently from the in-memory source code. For example:

  • Often a single 'development' database is shared between the developers. This means that if a developer makes a schema change he runs the risk of breaking his colleagues' environments, at least until the next build. Often this leads to a fear of changing the database schema.
  • The database is often versioned independently of the in-memory source code. This makes it hard to deploy a specific build, because the database backups and the in-memory source are often not synchronised.
  • The database is often not source controlled, or if it is, it is done as a single build mega-script. This makes it impossible to package linked database and in-memory source changes as a single 'feature' or 'fix', and hard or impossible to roll back a single failing feature. Also, because the mega-script is usually created with the DBMS's 'script database' feature, it gets checked into source control under the build manager's login, or whatever login the build scripts run under. This makes it impossible to track database changes. I can't go to a single table's create script in SourceSafe, look for the last time it was checked in, find out who made the change, and then read the comments to find out why it was changed. The best I can manage is to look through every single version of a huge script file searching for a change, but even then there's no way of knowing who made it or why.

How should you manage your database development then? Well the best thing is to treat it as much as you can just like all your other source code:

  • Every developer should develop against their own database instance.
  • A database backup should be stored with the executables from the build. That database backup should be a backup of a database built from the sql scripts retrieved from source control at the same label as all the other source.
  • The database should be maintained as object scripts. Each table, stored procedure, view and function should have its own script. To make a change to their local database, a developer should check out the object they want to change, apply the change to their database, unit test it, and then check it in as a package with any other source for that feature or fix. The developer should label the package and comment it with a link to the feature or defect number.
  • Before a developer starts work on the next feature or fix he should get the source from the last build label (which, with continuous integration should also be the latest version) and restore his local database from the same labelled backup.
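As a concrete sketch, a per-object script layout in source control might look like the tree below (the folder and file names are only illustrative, not something any particular tool requires):

```
/database
  /tables
    Customer.sql
    Order.sql
    OrderLine.sql
  /views
    vw_OrderSummary.sql
  /procedures
    usp_PlaceOrder.sql
  /functions
    fn_OrderTotal.sql
```

With this layout, source control history answers "who changed the Order table, when, and why" directly from the history of `Order.sql`.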

In order to do this you need tools that allow you to do the following:

  • Build a database from object-level source files. Most DBMSs will choke if you simply run the scripts without first working out a build order by examining the object references. The tool must automate this for you; it's far too onerous to do manually.
  • Upgrade an existing database to a new schema version by calculating 'alter' scripts.
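The build-order problem is essentially a topological sort of the object dependency graph. Here is a minimal sketch in Python, assuming a hypothetical hand-written dependency map (a real tool would parse the references out of the SQL scripts themselves):

```python
# Order per-object scripts so each object is created after the objects
# it references. Python's standard-library graphlib does the sorting.
from graphlib import TopologicalSorter

# Hypothetical map: object name -> objects its script references.
dependencies = {
    "dbo.Customer": set(),
    "dbo.Product": set(),
    "dbo.Order": {"dbo.Customer"},                 # FK to Customer
    "dbo.OrderLine": {"dbo.Order", "dbo.Product"}, # FKs to Order, Product
    "dbo.vw_OrderSummary": {"dbo.Order", "dbo.OrderLine"},
}

# static_order yields each object only after all of its dependencies,
# so the per-object scripts can be run in this order on an empty database.
build_order = list(TopologicalSorter(dependencies).static_order())
print(build_order)
```

A cycle in the references (which some DBMSs allow via deferred constraints) would raise `graphlib.CycleError`, which is exactly the case where a tool has to fall back on creating objects first and adding constraints afterwards.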

Here's where I plug a product that a couple of guys I know and worked with at NTL have developed: DbGhost. It allows you to adopt all the good practices I've listed above. I've had it adopted on several projects now, and they still haven't given me any commission or offered me a lucrative consultancy contract :(


secretGeek said...

Hey! I've just been writing a blog post titled "Tools for: Version Control of Stored Procedures, DB Schema etc."

-- have you been watching over my shoulder or something??

Unknown said...

Hi Mike,

Thanks for the plug! Send me an email and we'll forward you some "commission" ;)



hometoast said...

Thanks for this (and your link in the comment over at Coding Horror) - I've been looking to get our place to do this. We're mainly on Informix here (sigh) and a lot of our database is 'config data' which needs to be versioned too. Thanks for the jumping-off point.

melenik said...

As far as I understand, DbGhost works only with SQL Server. Can you recommend similar tools for Oracle databases?

Mike Hadlow said...

Hi mnick,

I don't know any equivalent tools for Oracle, and I'm afraid that I know almost nothing about the Oracle development world, so I can't even recommend where you could ask that question.


Anonymous said...

Working on "isolated", small databases has its drawbacks too:

1) Code/queries that work perfectly with the developer's 10,000-row tables don't with the test database's 10^9-row tables.

2) Code/queries that work perfectly when only *one* user is connected to the database don't when 1000 users are connected (or just two, sometimes...)

3) Development must take into account the production configuration - for example, if the production DB is partitioned/clustered/etc. and the development one is not, due to lack of hardware resources, issues could again surface only in the test phase (hoping the test database mirrors the production one too...)

Catching these issues only in the test phase could be too late, and could require lots of changes.
Databases are a "shared resource" by definition - be careful when working in isolation.

Mike Hadlow said...

Hi Anonymous,

Of course you need to do integration and performance testing. But that shouldn't be done by individual developers; it should be done on a copy of the production environment, with full data and client load. I don't think developing with individual databases negates that.

Unknown said...

I pretty much recommend the same. Just one question: why do you recommend one script per object rather than one script with all objects?

Mike Hadlow said...

K Oyedeji,

In the same way that having all your C# code in one huge file would be very hard to maintain, having all your database objects in one huge script is very hard to manage under source-control.

With tools that can understand object dependencies and automatically execute scripts in the correct order, there is really no requirement for a single database creation script.

Unknown said...

Thanks for your response Mike - any such tools you would recommend? Thanks

Mike Hadlow said...

K Oyedeji,

As I mentioned in the post, I know the guys who make DbGhost, which I've used on several projects and would recommend for managing database builds.

It's also worth checking out the tools from Redgate and Microsoft's own Visual Studio Database Edition.

Nathan Rozentals said...

Hey Mike,
For those who can't cough up the dollars for Visual Studio for Database Professionals, I've released an open source tool.
Let me know what you think.
- nathan

Mike Hadlow said...

Thanks for that Nathan. Looks very interesting. I'll try and put aside some time to check it out.

Anonymous said...

Hi Mike,

Thanks for the post, great as always. Very helpful; I'm trying to get my head around making my DBs first-class citizens, and hopefully I will convince my boss that it's a good idea and worth the time/effort to set up.

At present we all work on the same DB for testing and it is not version controlled at all, (at least not to the best of my knowledge).

We are a small shop (4 people) and they feel that it is more overhead than it is worth; the same goes for TDD. I feel differently, but I don't have the practical experience to back it up.

Mike Hadlow said...

Hi Jon, I keep the Suteki Shop DB creation scripts under source control, and that's for a team of (mostly) one. Good luck with bringing your team into the 21st century :)

Andreas Kleffel said...

Funny, this post is from 2006 and now, 10 years later, the practices described here are still NOT common practice. I especially like the term "single build mega-script" - I know a lot of companies (some of which I worked for) that still work like this - and I think it's still the majority!

I wonder (as I am also developing a tool for db version control): are tools too expensive or too complex, so that we don't see a 90% adoption rate?