Databases, Dimensions, and Change

While working on a pet project, I was forced to make a decision about data storage. Being somewhat conservative and lazy, I decided to (mis)use a row-oriented RDBMS for tagged data objects. That is, one table in the database holds a row per object, and relationships defined in other tables annotate each object with any number of tags.
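For the sake of illustration, that layout can be sketched with Python's built-in sqlite3 module. The table names (objects, tags, object_tags), the sample rows, and the helper objects_with_all_tags are all made up for this example and are not the project's actual schema; treat it as one plausible shape for the idea, not the real thing.

```python
import sqlite3

# Hypothetical schema: objects, tags, and object_tags are illustrative
# names, not taken from the original project.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objects (id INTEGER PRIMARY KEY, body TEXT NOT NULL);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE object_tags (
        object_id INTEGER NOT NULL REFERENCES objects(id),
        tag_id INTEGER NOT NULL REFERENCES tags(id),
        PRIMARY KEY (object_id, tag_id)
    );
""")

# Made-up sample data: three objects, two tags.
conn.executemany("INSERT INTO objects (id, body) VALUES (?, ?)",
                 [(1, "first post"), (2, "second post"), (3, "third post")])
conn.executemany("INSERT INTO tags (id, name) VALUES (?, ?)",
                 [(1, "kittens"), (2, "current-month")])
conn.executemany("INSERT INTO object_tags (object_id, tag_id) VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1), (3, 2)])


def objects_with_all_tags(conn, tag_names):
    """Return the ids of objects that carry every tag in tag_names."""
    names = sorted(set(tag_names))
    if not names:
        return []
    placeholders = ", ".join("?" for _ in names)
    query = f"""
        SELECT ot.object_id
        FROM object_tags AS ot
        JOIN tags AS t ON t.id = ot.tag_id
        WHERE t.name IN ({placeholders})
        GROUP BY ot.object_id
        HAVING COUNT(DISTINCT t.id) = ?
        ORDER BY ot.object_id
    """
    rows = conn.execute(query, (*names, len(names))).fetchall()
    return [row[0] for row in rows]


print(objects_with_all_tags(conn, ["kittens", "current-month"]))
```

Even a plain "has all of these tags" query needs a join, a GROUP BY, and a HAVING clause, and the query text has to be rebuilt for each number of tags.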

Yuck.

Google’s press release and paper on BigTable were the first place I noticed column-oriented tables. Ever since, I’ve kept tabs on alternative database systems, matching each with hypothetical problems.

What I want is a system for storing structured data, tagging the data, and performing queries based on combinations of tags. Tags don’t have to be fluffy things like “kittens” — tags can be used to signal a particular trait, such as “all posts in the current month”.

At this point, there appears to be only one open-source, column-oriented database that is still being actively developed and could be used behind a web application [1]. It’s unclear whether it supports bitmap indices [2] or is capable of handling heavy web traffic, but I’ll fight off NIH syndrome long enough to find out.
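To make the appeal of bitmap indices for tag queries concrete, here is a toy sketch in plain Python. It keeps one arbitrary-precision integer per tag and one bit per object, so a combination of tags becomes a handful of bitwise operations. The class TagBitmapIndex and its methods are hypothetical names invented for this example; it says nothing about how MonetDB or any other real database implements its indices.

```python
from collections import defaultdict


class TagBitmapIndex:
    """Toy bitmap index (hypothetical): one int per tag, one bit per object."""

    def __init__(self):
        self.bitmaps = defaultdict(int)  # tag name -> bitmap of object positions
        self.size = 0  # number of object positions allocated so far

    def add(self, tags):
        """Register a new object with the given tags and return its position."""
        position = self.size
        self.size += 1
        for tag in tags:
            self.bitmaps[tag] |= 1 << position
        return position

    def query(self, all_of=(), none_of=()):
        """Return positions of objects with every tag in all_of and none in none_of."""
        result = (1 << self.size) - 1  # start with every object selected
        for tag in all_of:
            result &= self.bitmaps.get(tag, 0)
        for tag in none_of:
            result &= ~self.bitmaps.get(tag, 0)
        return [i for i in range(self.size) if (result >> i) & 1]


index = TagBitmapIndex()
index.add({"kittens", "current-month"})  # position 0
index.add({"kittens"})                   # position 1
index.add({"current-month"})             # position 2

print(index.query(all_of=["kittens", "current-month"]))
print(index.query(all_of=["kittens"], none_of=["current-month"]))
```

With this made-up data, the first query selects only position 0 and the second only position 1. A real index would also have to deal with deletions, compression of sparse bitmaps, and persistence, none of which this sketch attempts.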

___
[1] MonetDB, http://www.monetdb.nl/
[2] bitmap indices, http://en.wikipedia.org/wiki/Bitmap_index