Cloud computing is certainly a hot topic these days. If an organization is not already using cloud computing, it has plans to do so. The economics, agility, and value offered by cloud computing are just too persuasive for IT organizations to ignore.
Even the high-profile Amazon outage couldn’t slow cloud computing’s relentless march towards mainstream adoption. If anything, that outage helped make cloud computing more robust by highlighting the need for hardened policies and procedures around provisioning in the cloud.
IBM recently announced updates to a set of products that make it easy to deploy DB2 and InfoSphere Warehouse on private clouds:
- IBM Workload Deployer (previously known as WebSphere CloudBurst), which is a hardware/software appliance that streamlines the deployment and management of software on private clouds.
- IBM Transactional Database Pattern, which works with the IBM Workload Deployer to generate DB2 instances that are suitable for transactional workloads.
- IBM Data Mart Pattern, which generates InfoSphere Warehouse instances for data mart workloads.
These patterns do more than just deploy virtual images with pre-configured software. You should instead think of them as mini-applications for configuring and deploying cloud-based database instances. Users specify information about the database, and then the pattern builds and deploys the database instance.
The Transactional Database Pattern is for OLTP deployments. It includes templates for sizing the virtual machine, database backup scheduling, database deployment cloning capabilities, and tooling (including Data Studio). The Data Mart Pattern incorporates the features of the Transactional Database Pattern, together with deep compression and data movement tools. It is, of course, configured and optimized for data mart workloads in a virtual environment.
Oracle garnered a lot of headlines a couple of weeks ago with their Oracle Database Appliance. It didn’t take long for SmarterQuestions to indicate why the IBM Smart Analytics Systems are A Smarter Database System for SMB Clients.
Recently, IBM added the following systems:
- IBM Smart Analytics System 5710, which is an x86-based Linux system
- IBM Smart Analytics System 7710, which is a Power Systems-based UNIX system
- IBM Smart Analytics System 9710, which is a mainframe-based system
These systems include everything you need to quickly set up a data warehouse environment, and to quickly have your business analysts working with the data.
On top of the servers and storage, each system includes database and data warehouse software, Cognos software, cubing services, data mining capabilities, and text analytic capabilities. The systems are available on your platform of choice (Linux, UNIX, or mainframe). They are also competitively priced: the starting price for the 5710 is under $50k, just like the Oracle appliance. However, the IBM system includes all of the necessary software, whereas with the Oracle appliance you have to purchase the Oracle Database software separately, and that software is not exactly inexpensive.
If you want to learn more, please visit the IBM Smart Analytics Systems Web page.
Column-oriented database systems (Column Stores) have attracted a lot of attention in the past few years. Vendors have quoted impressive performance and storage gains over row-oriented database systems. In some cases, vendors have even claimed as much as 1000x performance improvement.
For many years Sybase IQ led the way for Column Stores. But it has since been joined by a long list of vendors, including Infobright, ParAccel, and Vertica. The claims being made by these vendors are attention-grabbing. But they don’t tell the whole story. Before you run out and get yourself a Column Store, you should be aware of the following:
- Performance issues when queries involve several columns.
Column Stores can be faster than row-oriented stores when the number of columns involved is small (it depends on a number of factors, including the number of columns, the use of indexing, the specifics of the query, and so on). However, you can run into performance issues with Column Stores as the number of columns increases, because the column values must be re-composed into rows. In fact, the performance degradation can be quite significant. If your queries involve more than a few columns (either columns for retrieving data or columns for query predicates), you need to be aware of this potential issue. If you are evaluating a Column Store, make sure to test queries that involve more than a couple of columns.
- Overhead associated with inserting data.
When you create a new data record, you are creating a data row. However, a Column Store does not have a row-orientation. Instead, a Column Store must decompose that data row into the individual column values, and store each of those column values individually. This adds up to a lot more block updates for a Column Store than for a row-oriented store. As you can imagine, this is quite a bit of additional work. You should also keep in mind that the values in Column Stores are typically sorted for fast selection and retrieval, which means even more work for data insert and update operations. So what does all this mean? Essentially these limitations make it difficult to have real-time or near real-time data analysis with Column Stores (unless the Column Store vendor uses an approach like Vertica's, with an “update area” in memory that is essentially a row store cache, where real-time inserts are first written and then asynchronously written to disk).
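To make the two points above a little more concrete, here is a minimal toy sketch in Python. It is not how any particular product works: the class names, the five-column schema, and the counters are all assumptions made for illustration, and the counters are only a rough stand-in for block I/O. The sketch stores the same records in a row layout and in a column layout, then counts how many storage locations an insert touches and how many values a scan has to read.

```python
# Toy model only: the counters stand in for block I/O; no real DBMS is this simple.
COLUMNS = ["id", "region", "product", "qty", "price"]  # hypothetical schema


class RowStore:
    def __init__(self):
        self.rows = []         # each record is stored contiguously
        self.writes = 0        # storage locations touched by inserts
        self.values_read = 0   # values pulled from storage by scans

    def insert(self, record):
        self.rows.append(tuple(record[c] for c in COLUMNS))
        self.writes += 1       # the whole row lands in one place

    def scan(self, wanted):
        idx = [COLUMNS.index(c) for c in wanted]
        out = []
        for row in self.rows:
            self.values_read += len(row)  # whole row is read, even unwanted columns
            out.append(tuple(row[i] for i in idx))
        return out


class ColumnStore:
    def __init__(self):
        self.cols = {c: [] for c in COLUMNS}  # one list per column
        self.writes = 0
        self.values_read = 0

    def insert(self, record):
        for c in COLUMNS:      # decompose the row into individual column values
            self.cols[c].append(record[c])
            self.writes += 1   # one touch per column

    def scan(self, wanted):
        vectors = []
        for c in wanted:
            self.values_read += len(self.cols[c])  # only requested columns are read
            vectors.append(self.cols[c])
        return list(zip(*vectors))  # re-composition: stitch columns back into rows


def demo():
    records = [
        {"id": 1, "region": "EMEA", "product": "widget", "qty": 3, "price": 9.5},
        {"id": 2, "region": "APAC", "product": "gadget", "qty": 1, "price": 20.0},
        {"id": 3, "region": "EMEA", "product": "gadget", "qty": 7, "price": 20.0},
    ]
    row_store, col_store = RowStore(), ColumnStore()
    for r in records:
        row_store.insert(r)
        col_store.insert(r)

    # Narrow query: a single column.
    assert row_store.scan(["qty"]) == col_store.scan(["qty"])
    narrow = (row_store.values_read, col_store.values_read)

    # Wider query: four of the five columns.
    wide_cols = ["region", "product", "qty", "price"]
    assert row_store.scan(wide_cols) == col_store.scan(wide_cols)
    wide = (row_store.values_read - narrow[0], col_store.values_read - narrow[1])

    return row_store.writes, col_store.writes, narrow, wide


if __name__ == "__main__":
    print(demo())
```

In this toy model the column layout touches five locations per insert instead of one, and its read advantage shrinks as the query asks for more columns. The sketch deliberately leaves out sorting, compression, indexing, and the CPU cost of stitching rows back together, all of which affect real systems, so treat it as an illustration of the trade-off rather than a benchmark.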
Column Stores are great as analytic data marts where queries do not involve many columns. In such situations, you can enjoy performance gains. However, for more involved usage, you may run into issues. For instance, a Column Store is almost certainly not up to supporting thousands of simultaneous users and mixed query workloads, which are common in Enterprise Data Warehouse (EDW) environments. Sometimes people can get blinded by Column Store success for relatively simple data mart environments. You should be aware that these performance gains do not necessarily translate to larger, more complex environments. In fact, they may not even translate to other simple data marts with different schemas, or where your queries involve more than a couple of columns. The bottom line here is that you need to know both the benefits and the limitations of a Column Store, and make the right decision for your particular situation.
Here is a video where Philip Howard, Research Director at Bloor Research, evaluates performance, scalability, administration, and cost considerations for IBM Smart Analytics System and Oracle Exadata [for data warehouse environments]. This video is packed with great practical advice for evaluating these products.