About these ads

Conor O'Mahony's Database Diary

Your source of IBM database software news (DB2, Informix, Hadoop, & more)

What you need to know about Column-Oriented Database Systems

with one comment

Column-oriented database systems (Column Stores) have attracted a lot of attention in the past few years. Vendors have quoted impressive performance and storage gains over row-oriented database systems. In some cases, vendors have even claimed as much as 1000x performance improvement.

For many years Sybase IQ led the way for Column Stores. But they have since been joined by a long list of vendors, including Infobright, Paraccel, and Vertica. The claims being made by these vendors are attention-grabbing. But they don’t tell the whole story. Before you run out and get yourself a Column Store, you should be aware of the following:

  • Performance issues when queries involve several columns.
    Column stores can be faster than row-oriented stores when the number of columns involved is small (it depends on a number of factors, including the number of columns, the use of indexing, the specifics of the query, and so on). However, you can run into performance issues with Column Stores as the number of columns increases, due to the re-composition overhead. In fact, the performance degradation can be quite significant. If your queries involve more than a few columns (either columns for retrieving data or columns for query predicates), you need to be aware of this potential issue. If you are evaluating a column Store, make sure to test queries that involve more than a couple of columns.
  • Overhead associated with inserting data.
    When you create a new data record, you are creating a data row. However, a Column Store does not have a row-orientation. Instead, a Column Store must decompose that data row into the individual column values, and store each of those column values individually. This adds up to a lot more block updates for a Column Store than a row-oriented store. As you can imagine, this is quite a bit of additional work. You should also keep in mind that the values in Column Stores are typically sorted for fast selection and retrieval, which means even more work for data insert and update operations. So what does all this mean? Essentially these limitations make it difficult to have real-time or near real-time data analysis with Column Stores (unless the Column Store vendor uses an approach like Vertica where they have an “update area” in memory that is essentially a row store cache, where real-time inserts are first written, then asynchronously written to disk).

Column Stores are great as analytic data marts where queries do not involve many columns. In such situations, you can enjoy performance gains. However, for more involved usage, you may run into issues. For instance, a Column Store is almost certainly not up to supporting thousands of simultaneous users and mixed query workloads, which are common in Enterprise Data Warehouse (EDW) environments. Sometimes people can get blinded by Column Store success for relatively simple data mart environments. You should be aware that these performance gains do not necessarily translate to larger, more complex environments. In fact, they may not even translate to other simple data marts with different schemas, or where your queries involve more than a couple of columns. The bottom line here is that you need to know both the benefits and the limitations of a Column Store, and make the right decision for your particular situation.

About these ads

Written by Conor O'Mahony

September 6, 2011 at 10:46 am

One Response

Subscribe to comments with RSS.

  1. Hi Conor,

    One of your key points is something that I agree with entirely – that organizations need to pick the right tool for their particular use case. There is now a wide and growing range of database and data management products on the market that are designed to solve different problems, and the days of one-size-fits-all are long over.

    Companies use columnar databases specifically for analytics, as it drastically reduces the amount of I/O and hence queries are much faster. The data is purposefully stored in columns since, in analytic queries, only a small subset of columns is typically used. Just as row-oriented databases do not efficiently extract a single column, column-oriented databases do not efficiently extract all columns. They are used for two different purposes. As to the number of columns, it doesn’t matter how many there may be. If the database needs to retrieve more data (more columns for example), it will take more time, just as a traditional row database would take more time to retrieve more rows. (As a shameless promotion, Infobright’s technology is based on eliminating the need to access data to respond to a query based on the automatic intelligence we create about the data. No indexes, no data partitioning, no cubes…)

    As to the idea of picking the right tool, Infobright’s columnar database is designed for analyzing large volumes of “machine-generated data” such as Weblogs, network logs and event data, telecom call records, sensor data etc. We would never suggest that it is suited for an enterprise data warehouse or transactional applications. Tools like Hadoop or MongoDB and many others (many are open source) are driving huge innovation for Big Data analytics, but focused on particular use cases.

    The more options there are, the more confusing it can become. There are a growing number of excellent reports that can help educate people on “what to use when” and help make sense of it.

    Regards

    Susan Davis
    VP Marketing and Product Management
    Infobright

    Susan Davis

    September 7, 2011 at 4:24 pm


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 66 other followers

%d bloggers like this: