Friday, September 17, 2010

"FriendFeed" Model Schema-Less Data Storage

Modern application developers face a challenge for which traditional RDBMSs are ill-suited. The data collected by applications is a moving target, with additional data points added often as applications mature. Also, many applications aggregate data from multiple sources, where the data points captured may differ or change beyond the control of the application developer. Traditional RDBMSs, and the tools and checks built around them, prefer static schemas. For example, NetBeans has a wizard to create Java entity beans from tables, but no simple way to update the entity beans when the schema changes. While creating the bean can take 5 minutes, adding a column takes hours. What's an architect to do?

One solution is to stuff an entire record into a LOB column in a traditional RDBMS (or whatever the equivalent of the LOB is in the particular RDBMS). This, however, makes indexing difficult, because the database cannot see the individual fields inside the LOB.
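To make the trade-off concrete, here is a minimal sketch of the LOB approach. It assumes SQLite and JSON as stand-ins for whatever RDBMS and serialization format are actually in use; the table name `entities` and the helpers `save` and `find_by` are made up for illustration.

```python
import json
import sqlite3

# In-memory SQLite stands in for the real RDBMS (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, body BLOB NOT NULL)")

def save(entity):
    """Serialize the whole record into the single LOB column."""
    conn.execute(
        "INSERT OR REPLACE INTO entities (id, body) VALUES (?, ?)",
        (entity["id"], json.dumps(entity).encode("utf-8")),
    )

def find_by(field, value):
    """With no index, every row has to be fetched and parsed."""
    matches = []
    for (body,) in conn.execute("SELECT body FROM entities ORDER BY id"):
        entity = json.loads(body)
        if entity.get(field) == value:
            matches.append(entity)
    return matches

# Records with different fields live side by side; no schema change is needed.
save({"id": "1", "user": "alice", "title": "first post"})
save({"id": "2", "user": "bob", "title": "hello", "link": "http://example.com"})
```

Adding the `link` field to the second record required no change to the table, but `find_by` has to read and parse every row to answer a query on any field.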

Oracle has built-in libraries to allow indexing and searching of XML data in BLOB and CLOB columns, but the performance is poor (though better than extracting the whole LOB and parsing it in, say, Java). Columns for indexing could be added to the LOB table; however, if additional indexes are needed at a later date, the table schema must be altered.

The solution to the indexing problem is to create a separate table for each index. Details of how this was accomplished at FriendFeed can be found here.
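The idea of one table per index can be sketched as follows. This is only an illustration under stated assumptions, not FriendFeed's actual implementation: it uses SQLite and JSON, and the names `entities`, `index_user`, `put`, and `find_by_user` are hypothetical.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- The record itself is an opaque blob keyed by id.
    CREATE TABLE entities (
        id   TEXT PRIMARY KEY,
        body BLOB NOT NULL
    );
    -- One small table per searchable field; it holds only the
    -- indexed value and a pointer back to the record.
    CREATE TABLE index_user (
        user      TEXT NOT NULL,
        entity_id TEXT NOT NULL,
        PRIMARY KEY (user, entity_id)
    );
""")

def put(entity):
    """Store the record and refresh its row in the index table."""
    with db:  # commit on success, roll back on error
        db.execute(
            "INSERT OR REPLACE INTO entities (id, body) VALUES (?, ?)",
            (entity["id"], json.dumps(entity).encode("utf-8")),
        )
        db.execute("DELETE FROM index_user WHERE entity_id = ?", (entity["id"],))
        if "user" in entity:
            db.execute(
                "INSERT INTO index_user (user, entity_id) VALUES (?, ?)",
                (entity["user"], entity["id"]),
            )

def find_by_user(user):
    """Look up ids through the index table, then load the blobs."""
    rows = db.execute(
        "SELECT e.body FROM index_user AS i "
        "JOIN entities AS e ON e.id = i.entity_id "
        "WHERE i.user = ? ORDER BY e.id",
        (user,),
    )
    return [json.loads(body) for (body,) in rows]

put({"id": "1", "user": "alice", "title": "first post"})
put({"id": "2", "user": "bob", "title": "hello"})
put({"id": "3", "user": "alice", "title": "second post", "link": "http://example.com"})
```

In this sketch, indexing another field later would mean creating another table like `index_user` and backfilling it from the stored records; the `entities` table itself is never altered.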

Note also that if value-comparison searches are not needed, a persistent hash map like Project Voldemort can be used instead.
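As a rough illustration of the access pattern only, the sketch below uses Python's standard-library `shelve` module as a local stand-in for a persistent hash map. It is not Project Voldemort's API; the file path and record contents are made up for the example.

```python
import os
import shelve
import tempfile

# A throwaway location for the on-disk map (hypothetical path).
path = os.path.join(tempfile.mkdtemp(), "records")

# Write: the whole record is stored under its key.
with shelve.open(path) as store:
    store["1"] = {"user": "alice", "title": "first post"}

# Read: reopen the map and fetch by key. There is no way to ask
# "which records have user == alice" without scanning everything.
with shelve.open(path) as store:
    record = store["1"]
    missing = store.get("2")
```

Lookups by key are all that this kind of store offers, which is why it only fits when value-comparison searches are not needed.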
