All / Technology & Society

For Data Wizards Everywhere

Today’s discussion topic is the similitude between querying databases and casting spells.   If you are not a person who deals frequently with databases, that probably means nothing to you.  Actually, I imagine many of you who do deal frequently with databases are equally mystified.  I shall endeavor to explain.

Databases are made up of tables.  A table is a set of lists of stuff that are all related to each other.  For instance:

Name Profession Weapon Concealed
Hamlet Prince of Denmark sword in the graveyard
Othello Venetian General fluffy pillow in but a patient list
Benedick the Prince [Don Pedro]’s jester rapier-like wit in the arbor

You might think it’s a good thing to have all these things in one big table.  But one big massive table starts to get confusing.  For instance, if you want to list Hamlet’s profession as ‘murderer’ as well as ‘Prince of Denmark’, you need an extra line.  (But hang on a second!  I hear you cry.  It’s OTHELLO who’s the murderer!  Well, go tell that to Rosencrantz and Guildenstern.  Oh wait, you can’t.  Because they are dead.  And how were they killed?  In an extremely snarky way, that’s how.)  An extra line means a whole lot of duplicate data because you have to repeat all the other fields twice.  (Nobody wants to sit through two Hamlets in one go.  You’d die of sad, or get an acute case of cursed spite, or something.)  This will do bad things: it will mean that when you count the number of people in the table you’ll get four instead of three, and you won’t be able to tell for certain whether the second Hamlet is a whole new Hamlet who also happens to like hanging out in graveyards with swords, or just the same guy with some extra information attached (unless you are clever and add a unique identifier).  Plus when it comes time to update things, say Hamlet gives up the sword and starts gallivanting about with some bolas, if you only update ONE of the Hamlet records the other one will be wrong.  Because of all these things, storing everything in one massive table will make your database run really, really slow.  Like I said, nobody wants to sit through two Hamlets in one go.

The worst, though, is where somebody has decided to store two (or more, if they’re feeling adventurous) types of data in one single field.  Imagine that instead of separate fields for Name, Profession, Weapon and place of concealment, I’d just stuck them all in one giant column all willy-nilly.  Perhaps titled ‘ALL THE THINGS EVER.’  How would you ever know whose weapon went where?  Surely such things don’t happen, you say.  I inform you with deeply felt personal regret that they do.  Fortunately there are workarounds.  Unfortunately they require a thorough understanding of the more esoteric methods of database divination.  (Here’s what I found when I Googled ‘datamancy’.  Please allow yourself a moment of distraction.)

So instead of having one massive table (or, I shudder to think, one massive towering column) with all the information in the entire universe in it, usually it makes more sense to make a whole bunch of different smallish lists.  This makes the world run more smoothly.

And the whole point of the world running smoothly, in database terms, is to ask it questions.  But you have to ask those questions in the right way.

Getting information from a database can be very like casting a spell.  You must be extremely precise with your commands or you can get a very unexpected result indeed.  To prize out of the database what you are seeking, you must think about all the ways in which the question you’re asking might be interpreted, not just what you think you’ve asked.  For the database cannot read what you meant, only what you actually said.  Often this means the successful querent breaks things down into smaller steps, or remembers to account for certain particular conditions.  If this is all sounding a bit ‘first anoint your cauldron with the sacred smoke’ and ‘perform when the moon is in the seventh house’, you’re not wrong.  Though it’ll look a bit more like ‘show me only records made in the last year’ and ‘exclude records with no attachments’.

The truly arcane spell-like properties of databases really come out, though, when you have to manipulate a table like the one I mentioned earlier, with a whole bunch of different and demonstrably unrelated things all stored in the same table as if it’s Mary Poppins’ handbag.  For then, then you get to replicate the table.  In spell terms, this is like you having all your ingredients in one giant bowl, except what you really need is each of them separately for different bits of the spell.  So, you duplicate the bowl a bunch of times (because you are a magic data-wizard), then tell each bowl to show you only one ingredient at a time.  The other ingredients are still there, mind, but you’ve magically made them inert. Then you can do with each one as you will.

And then sometimes despite your best intentions, what you thought was going to be a beautiful spell (read: efficient query) ends up producing something that looks like a pigweasel.  Or you accidentally use too much power and cause deep disturbances in the fabric of the universe (the database universe.)  Data query languages are strange and terrible forces, my friends.  Be wary of attempting to wield such power.

Advertisements

One thought on “For Data Wizards Everywhere

Share Your Thoughts

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s