Friday, 30 October 2009

Great myths of database design

I have a really pressing deadline and consequently very little time to spare – but I’ve found something sufficiently annoying that not to rant about it would be illogical.  Basically it’s big denormalised tables.  There, I feel better now…almost.

Many times over the course of my IT career, when enquiring as to why someone has not bothered to design their database properly, I have heard something along the lines of “we decided against a normalised design for performance reasons” or “it’s not appropriate to the domain model”.  I have yet to see a case where either claim was actually true (and I’ve been round the block a few times).  The real reason is usually either a lack of competence or simply that things were done in such a hurry that good design didn’t seem a priority.

Of course, if anyone is actually reading this, at least one person will be thinking “but my case is special!”.  To these people I say: “no it isn’t”.  Unless you’re working for Google, the SharePoint team at Microsoft, or a similar mass-market product/service vendor with a requirement to scale out to massive volume and throughput, it is highly unlikely that your project requires that level of ‘innovation’.  Let me make a bold assertion right now: you are probably not as smart as Edgar Codd (and if you are, then it’s likely that you agree with me).

The relational model is a rare thing in the IT world: a methodology that has lasted more than 5 years.  Dr Codd also coined the term OLAP (amongst other things), so what he didn’t know about managing data probably wasn’t worth knowing.  That’s not to say that things haven’t moved on since then (e.g. the removal of business logic from the data layer, moving the conceptual layer into the domain model, etc.) but the core principles of normal form have not changed.  Every database should exist in normal form somewhere, even if application- or task-specific abstractions need to be used (and in my experience they rarely do).
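
To make this concrete, here is a rough sketch (the table and column names are invented purely for illustration, and the syntax is SQL Server flavoured): a denormalised orders table with customer details repeated on every row, followed by the same data in normal form.

    -- Denormalised: customer details are repeated on every order row,
    -- so a change of email address means updating many rows.
    CREATE TABLE OrderFlat (
        OrderId       int           NOT NULL PRIMARY KEY,
        OrderDate     datetime      NOT NULL,
        CustomerName  nvarchar(100) NOT NULL,
        CustomerEmail nvarchar(255) NOT NULL,
        CustomerPhone nvarchar(50)  NULL
    );

    -- Normalised: each customer fact is stored exactly once,
    -- and orders simply reference the customer.
    CREATE TABLE Customer (
        CustomerId int           NOT NULL PRIMARY KEY,
        Name       nvarchar(100) NOT NULL,
        Email      nvarchar(255) NOT NULL,
        Phone      nvarchar(50)  NULL
    );

    CREATE TABLE [Order] (
        OrderId    int      NOT NULL PRIMARY KEY,
        CustomerId int      NOT NULL REFERENCES Customer (CustomerId),
        OrderDate  datetime NOT NULL
    );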

My recommended metrics for table design are that tables should ideally have between 3 and 12 fields.  The practical limit is 20 – anything more than that and you need to break the table apart and review whether your view of the data accurately reflects the entity boundaries (if these metrics sound familiar, it’s because they are similar to Juval Lowy’s metrics for service design in WCF applications).
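
To show what breaking a table along entity boundaries might look like, here is another hypothetical sketch: a wide table that has drifted past the limit usually turns out to contain several entities in disguise.

    -- Instead of one Employee table with 20+ columns (name, address,
    -- contact details and so on), split along the entity boundaries so
    -- that each table falls back into the 3-12 column range.
    CREATE TABLE Employee (
        EmployeeId int          NOT NULL PRIMARY KEY,
        FirstName  nvarchar(50) NOT NULL,
        LastName   nvarchar(50) NOT NULL,
        HireDate   date         NOT NULL
    );

    CREATE TABLE EmployeeAddress (
        EmployeeAddressId int           NOT NULL PRIMARY KEY,
        EmployeeId        int           NOT NULL REFERENCES Employee (EmployeeId),
        Line1             nvarchar(100) NOT NULL,
        City              nvarchar(50)  NOT NULL,
        Postcode          nvarchar(20)  NOT NULL
    );

    CREATE TABLE EmployeeContact (
        EmployeeContactId int           NOT NULL PRIMARY KEY,
        EmployeeId        int           NOT NULL REFERENCES Employee (EmployeeId),
        Email             nvarchar(255) NOT NULL,
        Phone             nvarchar(50)  NULL
    );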

By keeping table sizes reasonable it becomes much easier to manage indexes.  In most cases I would strongly recommend that all foreign keys are indexed, as well as any fields which are likely to form the basis of search criteria.  There is nothing whatsoever wrong with indexing every field in a table.  Further, I would recommend not using natural keys as primary keys – use surrogates instead (for further information see Jamie Thompson’s article here).  Implicit in this is that all foreign keys should reference the surrogate key.
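
Again as a sketch (hypothetical tables, SQL Server flavoured syntax, reusing the [Order] table from above): a surrogate primary key with the natural key demoted to a unique constraint, and explicit indexes on the foreign keys and the likely search columns – foreign keys do not get indexed automatically.

    CREATE TABLE Product (
        ProductId   int           IDENTITY(1,1) NOT NULL PRIMARY KEY, -- surrogate key
        ProductCode nvarchar(20)  NOT NULL UNIQUE,                    -- natural key: unique, but not the PK
        Name        nvarchar(100) NOT NULL
    );

    CREATE TABLE OrderLine (
        OrderLineId int IDENTITY(1,1) NOT NULL PRIMARY KEY,
        OrderId     int NOT NULL REFERENCES [Order] (OrderId),   -- foreign keys reference the surrogates
        ProductId   int NOT NULL REFERENCES Product (ProductId),
        Quantity    int NOT NULL
    );

    -- Foreign keys are not indexed automatically, so do it explicitly...
    CREATE INDEX IX_OrderLine_OrderId   ON OrderLine (OrderId);
    CREATE INDEX IX_OrderLine_ProductId ON OrderLine (ProductId);

    -- ...and index any column that will routinely appear in search criteria.
    CREATE INDEX IX_Product_Name ON Product (Name);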

One further thing I should clarify is that the conceptual model does not have to match the logical model.  To put this in context: normalised databases do not preclude domain-driven design.  This is what we have data persistence frameworks for (i.e. to act as a broker between the two models).
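
To labour the point with one more hypothetical example (reusing the Customer and [Order] tables sketched above): even a plain view can act as that broker, presenting the flattened, conceptual shape an application wants while the logical model underneath stays normalised – a persistence framework just does the same job in a richer way.

    -- The application consumes a flattened 'conceptual' shape...
    CREATE VIEW CustomerOrderSummary
    AS
    SELECT  o.OrderId,
            o.OrderDate,
            c.Name  AS CustomerName,
            c.Email AS CustomerEmail
    FROM    [Order] AS o
    INNER JOIN Customer AS c
            ON c.CustomerId = o.CustomerId;
    -- ...while the tables underneath remain normalised.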

There, I feel much better now.
