Tuesday, June 03, 2008

Towards a new database model

Over the past few days I've been pondering the way databases get used, and I think I have a way to help make things better by shifting things around a bit. I'm going to dive right in to deep territory here...

The current crop of SQL type databases are all batch oriented. They just don't scale well because of this. We need to update the model from one of data that gets visited by the occasional query to one where the queries are always running and updating their results.

If you find yourself requerying a database without having changed the data yourself, you're wasting a huge amount of resources. You really only want to know what's changed, and it's silly to re-examine everything that isn't modified since the last run of the query.

Imagine a table in a database. If you recorded the initial state of the table, and all of the subsequent operations, you could perfectly replicate the state of the table.

If you then wrote code to implement this type of logic, it takes a lot less code to keep valid replicas. You would have a stream of changes instead of a set of facts that constantly needed to be reexamined.

If you ran a query against such a table, then ran the query against the stream of changes, you could have a running output that worked vastly more efficiently, since you only have to examine records that have changed.

I'm not a computer scientist, nor an engineer, but it seems to me that this model should be looked into a bit, and has some promise to save us all some time and effort.

Who knows, it might even save Twitter.

No comments: