Wednesday, October 04, 2006

Nullability Voodoo

We had a good discussion in the office today about nulls. People often use null in a database to mean something in business terms. For example if 'EndDate' is null it means that the task hasn't ended yet. But this kind of 'nullability voodoo' is bad, you're not being explicit about meaning and someone looking at your database schema has to know more implicit rules beyond what the schema itself can provide. Of course that's always going to be the case, but keeping explicitness (is that a word?) to a maximum will save you lots of time and money later. Nullability usually means something in a business sense that is better represented in some other way. Rather than using the nullability of EndDate to mean that the task hasn't completed, consider giving the task a status instead. I've maintained systems where complex rules about various attributes had to be interpreted to mean some kind of status to know how painfull this can be.

If you must represent nullable types in managed code, avoid using the SqlTypes. I've found numerous problems with them, they don't implicitly cast or behave like the basic value types and in any case, who wants to drag a reference to System.Data up into their domain layer. I haven't used the new nullable types in .net 2.0 so I can't really comment on them, but effectively it's a way of giving nullability to value types and has a nasty hackish smell about it. In any case you should be very carefull of equating TSQL null (an empty value) with C# null (a zero pointer) they mean different things and it can make code very tricky when you constantly have to test for null everywhere.

It's worth checking out the 'null object' pattern if you've got a business case for an entity that has to represent itself as a null value. It means that you can factor all your null processing into one class.

5 comments:

Mel said...

To me, it's very simple. Null means "I don't know". Perhaps I don't know some piece of information because we haven't reached that step in the process yet. But there is a difference between empty and null. For instance, there's a difference between me not knowing your middle name, and knowing that you don't HAVE one.

Mark Brady said...

In fact, Nulls in a database have certain very useful side-effects. Oracle (maybe other databases) doesn't index nulls. SO if you have a column where only one value is important -- like say you have an employee list you never purge and an Is_Active_Employee Bit Column that's indexed. In this case, for an established business, the number of employees you've had is >>> than the number of employees you have now. Say you have 1% 1's and 99% 0's.

That index is 99x larger than it needs to be. It's indexing all the 1's which is fine since they are 1% of the table but it's also loaded with 0's for rows you COULD NEVER use that index to find. If you want's all the inactive employees, the indexed would be ignored and a FTS would be performed. So 1 and NULL would be MUCH more efficient from a database perspective.

Mike Hadlow said...

Mel,

OK, so you've got a rule that says null = 'I don't know', so you're only going to use null values in a very specific business sense. The problem with such conventions is that you have to communicate them very clearly.

Mark,

Thanks, I wasn't aware of that behaviour in Oracle. But even if were to use true=1, false=null in your database deliberately for performance reasons, it would be a mistake to allow that to propogate into your application code. Having a Nullable<bool> and trying to enforce the convention true=true, but null=false and false is an illegal value is going to lead you into a whole world of pain.

Mark Brady said...

I don't disagree that I would convert Null to 0 in a view and hide the original table. Doing it in the app layer is ok, but not everything which queries your database will go through that. Better to keep such conversions at the lowest level possible -- keeps it in sync with every higher level.

As another thought. We have a situation with exactly as you suggest, a status code which is updated instead of using a NULL enddate as a flag. We've had LOTS of issues with deadlocking... because some process... well everyone should know how deadlocks happen but... suffice it to say that when you have lots of simultaneous transactions on the same rows of a busy table, you have to be very careful of the order of your resource usage.

Mike Hadlow said...

Hi Mark,

Personally I'm an enthusiastic user of ORM tools, so any mapping I might do between application code and database is done in one place and one place only.

As for your locking issue, well that's something that you obviously have to investigate in the context of the technology you are using, but it still shouldn't give you an excuse for weird and non-intuitive property values in your domain layer.