Conor O'Mahony's Database Diary

Your source of IBM database software news (DB2, Informix, Hadoop, & more)

Win a Trip to the IDUG Conference of your Choice

The International DB2 User Group (IDUG) is a user-run organization. If you want independent information about DB2, IDUG is the place to go. This year, IDUG has conferences in the US (Denver), Germany (Berlin), and Australia (Sydney). The good news is that the DB2Night Show is holding a contest, and the prize is an all-expenses-paid trip to the IDUG conference of your choice. The contest aims to identify new users who can speak about their experiences with DB2. It’s a talent contest of sorts, where the talent is sharing your experiences. If you have ever considered speaking at a conference, this contest is the ideal way to see how you might do in a fun setting.

Written by Conor O'Mahony

January 25, 2012 at 2:01 pm

Anatomy of an Oracle Marketing Claim

with 9 comments

Yesterday, Oracle announced a new TPC-C benchmark result. They claim:

In this benchmark, the Sun Fire X4800 M2 server equipped with eight Intel® Xeon® E7-8870 processors and 4TB of Samsung’s Green DDR3 memory, is nearly 3x faster than the best published eight-processor result posted by an IBM p570 server equipped with eight Power 6 processors and running DB2. Moreover, Oracle Database 11g running on the Sun Fire X4800 M2 server is nearly 60 percent faster than the best DB2 result running on IBM’s x86 server.

Let’s have a closer look at this claim, starting with the first part: “nearly 3x faster than the best published eight-processor result posted by an IBM p570 server”. Interestingly, Oracle do not lead by comparing their new leading x86 result with IBM’s leading x86 result. Instead they choose to compare their new result to an IBM result from 2007, exploiting the fact that even though this IBM result was on a different platform, it uses the same number of processors. Of course, we all know that the advances in hardware, storage, networking, and software technology over half a decade are simply too great to form any basis for reasonable comparison. Thankfully, most people will see straight through this shallow attempt by Oracle to make themselves look better than they are. I cannot imagine any reasonable person claiming that Oracle’s x86 solutions offer 3x the performance of IBM’s Power Systems solutions, when comparing today’s technology. I’m sure most people will agree that this first comparison is simply meaningless.

Okay, now let’s look at the second claim: “nearly 60 percent faster than the best DB2 result running on IBM’s x86 server”. Oracle now compare their new leading x86 result with IBM’s leading x86 result. However, if you look at the benchmark details, you will see that IBM’s result uses half the number of CPU processors, CPU cores, and CPU threads. If you look at performance per core, the Oracle result achieves 60,046 tpmC per CPU core, while the IBM result achieves 75,367 tpmC per core. While Oracle claims to be 60% faster, if you take into account relevant system size and determine the performance per core, IBM is actually 25% faster than Oracle.

Finally, let’s not forget the price/performance metric from these benchmark results. This new Oracle result achieved US$.98/tpmC, whereas the leading IBM x86 result achieved US$.59/tpmC. That’s correct, when you determine the cost of processing each transaction for these two benchmark results IBM is 39% less expensive than Oracle. (BTW, I haven’t had a chance yet to determine if Oracle Used their Usual TPC Price/Performance Tactics for this benchmark result, as the result details are not yet available to me; but if they have, the IBM system will prove to be even less expensive again than the Oracle system.)

Benchmark results are as of January 17, 2012: Source: Transaction Processing Performance Council (TPC), www.tpc.org.
Oracle result: Oracle Sun Fire X4800 M2 server (8 chips/80 cores/160 threads) – 4,803,718 tpmC, US$.98/tpmC, available 06/26/12.
IBM results: IBM System p 570 server (8 chips/16 cores/32 threads) – 1,616,162 tpmC, US$3.54/tpmC, available 11/21/2007. IBM System x3850 X5 (4 chips/40 cores/80 threads) – 3,014,684 tpmC, US$.59/tpmC, available 09/22/11.
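
For readers who want to check the arithmetic behind the per-core and price/performance comparisons, here is a minimal sketch in Python. It uses only the figures from the benchmark footnote above; the variable and function names are my own, chosen for illustration.

```python
# Figures as cited in the benchmark footnote (TPC-C, as of January 17, 2012).
oracle = {"tpmc": 4_803_718, "cores": 80, "usd_per_tpmc": 0.98}  # Sun Fire X4800 M2
ibm = {"tpmc": 3_014_684, "cores": 40, "usd_per_tpmc": 0.59}     # System x3850 X5


def tpmc_per_core(result):
    """Throughput normalized by the number of CPU cores."""
    return result["tpmc"] / result["cores"]


# Oracle's headline claim: raw throughput, ignoring system size.
raw_speedup = oracle["tpmc"] / ibm["tpmc"] - 1

# The same two results, normalized per core.
per_core_advantage = tpmc_per_core(ibm) / tpmc_per_core(oracle) - 1

# How much less each tpmC costs on the IBM system.
price_advantage = 1 - ibm["usd_per_tpmc"] / oracle["usd_per_tpmc"]

print(f"Oracle raw throughput lead: {raw_speedup:.1%}")
print(f"Oracle tpmC per core:       {tpmc_per_core(oracle):,.0f}")
print(f"IBM tpmC per core:          {tpmc_per_core(ibm):,.0f}")
print(f"IBM per-core lead:          {per_core_advantage:.1%}")
print(f"IBM cost per tpmC saving:   {price_advantage:.1%}")
```

The raw lead comes out just under 60%, the per-core lead for IBM at roughly 25%, and the cost saving per tpmC at a little over 39%, which matches the numbers quoted in the post.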

Written by Conor O'Mahony

January 18, 2012 at 11:01 am

Top Posts of 2011

It’s that time of year again. Here are the top posts from this blog in 2011, as judged by number of views.

  1. IBM DB2 Welcomes Oracle Database/HP Itanium Customers
  2. New IBM DB2 vs. Oracle Database Advertising Campaign
  3. A Closer Examination of Oracle’s “Database Performance” Advertisement
  4. Comparing Price for Oracle Exadata and IBM Smart Analytics System
  5. IBM DB2 Strikes Another Blow to Oracle Database

As you can see, there is a strong DB2/Oracle Database competitive theme running through these popular topics. And here are the top posts of 2011, as judged by reader participation. In other words, as judged by the number of comments (or perhaps the amount of controversy).

  1. New IBM DB2 vs. Oracle Database Advertising Campaign (20 comments)
  2. A Closer Examination of Oracle’s “Database Performance” Advertisement (19 comments)
  3. The Future of the NoSQL, SQL, and RDBMS Markets (12 comments)
  4. Update on the IBM DB2 “SQL Skin” for Migrating from Sybase ASE (8 comments)
  5. Industry Benchmark Result for DB2 pureScale: SAP Transaction Banking (TRBK) Benchmark (7 comments)

Written by Conor O'Mahony

December 19, 2011 at 11:00 am

Posted in Uncategorized

Deploying DB2 and InfoSphere Warehouse on Private Clouds

Cloud computing is certainly a hot topic these days. If an organization is not already using cloud computing, it has plans to do so. The economics, agility, and value offered by cloud computing are just too persuasive for IT organizations to ignore.

Even the high-profile Amazon outage couldn’t slow cloud computing’s relentless march towards mainstream adoption. If anything, that outage helped make cloud computing more robust by highlighting the need for hardened policies and procedures around provisioning in the cloud.

IBM recently announced updates to a set of products that make it easy to deploy DB2 and InfoSphere Warehouse on private clouds:

  • IBM Workload Deployer (previously known as WebSphere CloudBurst), which is a hardware/software appliance that streamlines the deployment and management of software on private clouds.
  • IBM Transactional Database Pattern, which works with the IBM Workload Deployer to generate DB2 instances that are suitable for transactional workloads.
  • IBM Data Mart Pattern, which generates InfoSphere Warehouse instances for data mart workloads.

These patterns consist of more than just deploying virtual images with pre-configured software. You should instead think of them as being like mini-applications for configuring and deploying cloud-based database instances. Users specify information about the database, and then the pattern builds and deploys the database instance.

The Transactional Database Pattern is for OLTP deployments. It includes templates for sizing the virtual machine, database backup scheduling, database deployment cloning capabilities, and tooling (including Data Studio). The Data Mart Pattern incorporates the features of the OLTP pattern, together with deep compression and data movement tools. But, of course, it is configured and optimized for data mart workloads in a virtual environment.

Written by Conor O'Mahony

December 12, 2011 at 5:40 pm

Need Help Determining Hadoop Split Sizes? Use Adaptive MapReduce Instead!

IBM is actively working on adaptive features for the Map and Reduce phases of its InfoSphere BigInsights product (which is based on Apache Hadoop). In some cases, this involves applying techniques commonly found in mature data management products, and in some cases it involves developing new techniques. While a number of these adaptive features are still under development, there are some features in the product today. For instance, BigInsights currently includes an Adaptive Mapper capability that allows Mappers to successively process multiple splits for a job, and avoid the start-up costs for subsequent splits.

When a MapReduce job begins, Hadoop divides the data into multiple splits. It then creates Mapper tasks for each split. Hadoop deploys the first wave of Mapper tasks to the available processors. Then, as Mapper tasks complete, Hadoop deploys the next Mapper tasks in the queue to the available processors. However, each Mapper task has a start-up cost, and that start-up cost is repeated each time a Mapper task starts.

With BigInsights, there is not a separate Mapper task for each split. Instead, BigInsights creates Mapper tasks on each available processor, and those Mapper tasks successively process the splits. This means that BigInsights significantly reduces the Mapper start-up cost. You can see the results of a benchmark for a set-similarity join workload in the following chart. In this case, the tasks have a high start-up cost. The AM bar (Adaptive Mapper) in the chart is based on a 32MB split size. You can see that by avoiding the recurring start-up costs, you can significantly improve performance.

Adaptive MapReduce Benchmark: Set-Similarity Join Workload

Of course, if you chose the largest split size (2GB), you would achieve similar results to the Adaptive Mapper. However, you might then expose yourself to the imbalanced workloads that sometimes accompany very large splits.
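
To make the start-up cost argument concrete, here is a toy model in Python. It is only a sketch under simplifying assumptions of my own: every split takes the same time to process, every processor slot is identical, and the data size, throughput, and start-up cost are made-up numbers rather than figures from the benchmark. It does not model the workload imbalance mentioned above.

```python
import math


def classic_makespan(num_splits, slots, startup_s, work_per_split_s):
    """One Mapper task per split: every split pays the start-up cost."""
    waves = math.ceil(num_splits / slots)
    return waves * (startup_s + work_per_split_s)


def adaptive_makespan(num_splits, slots, startup_s, work_per_split_s):
    """One long-lived Mapper per slot: start-up is paid once, then the
    Mapper processes its splits one after another."""
    splits_per_mapper = math.ceil(num_splits / slots)
    return startup_s + splits_per_mapper * work_per_split_s


# Hypothetical job: 64 GB of input, 16 slots, 5 s start-up, 32 MB/s per slot.
DATA_MB, SLOTS, STARTUP_S, MB_PER_S = 64 * 1024, 16, 5, 32

for split_mb in (32, 2048):
    n = math.ceil(DATA_MB / split_mb)
    work = split_mb / MB_PER_S
    print(f"{split_mb:>5} MB splits: "
          f"classic {classic_makespan(n, SLOTS, STARTUP_S, work):.0f} s, "
          f"adaptive {adaptive_makespan(n, SLOTS, STARTUP_S, work):.0f} s")
```

With these made-up numbers, small splits are expensive when each one starts its own Mapper, while the adaptive approach with small splits lands close to the classic approach with 2GB splits. That is the same shape as the behavior described above, without having to pick a large split size up front.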

The following chart shows the results of a benchmark for a join query on TERASORT records. Again the AM bar (Adaptive Mapper) in the chart is based on a 32MB split size.

Adaptive MapReduce Benchmark: TERASORT Join Workload

In this case, the Adaptive Mapper results in a more modest performance improvement, although it is still an improvement. The key benefit of these Adaptive MapReduce features is that they eliminate some of the hassles associated with determining the split sizes, while also improving performance.

As I mentioned earlier in this post, a number of additional Adaptive MapReduce features are currently in development for future versions of BigInsights. I look forward to telling you about them when they are released…

In the meantime, make sure to check out the free online Hadoop courses at Big Data University. I previously blogged about my experiences with these courses in Hadoop Fundamentals Course on BigDataUniversity.com.

Written by Conor O'Mahony

December 7, 2011 at 1:07 pm

Comparing HDFS and GPFS for Hadoop

Here is a chart that compares the performance of Hadoop Distributed File System (HDFS) with General Parallel File System-Shared Nothing Cluster (GPFS-SNC) for certain Hadoop-based workloads (it comes from the Understanding Big Data book). As you can see, GPFS-SNC easily out-performs HDFS. In fact, the book claims that a 10-node GPFS-SNC-based Hadoop cluster can match the performance of a 16-node HDFS-based Hadoop cluster.

Comparing HDFS and GPFS for Hadoop Workloads

GPFS was developed by IBM in the 1990s for high-performance computing applications. It has been used in many of the world’s fastest computers (including Blue Gene and Watson). Recently, IBM extended GPFS to develop GPFS-SNC, which is suitable for Hadoop environments. A key difference between GPFS-SNC and HDFS is that GPFS-SNC is a kernel-level file system, whereas HDFS runs on top of the operating system. This means that GPFS-SNC offers several advantages over HDFS, including:

  • Better performance
  • Storage flexibility
  • Concurrent read/write
  • Improved security

If you are interested in seeing how GPFS-SNC performs in your Hadoop cluster, please contact IBM. Although GPFS-SNC is not in the current release of InfoSphere BigInsights (IBM’s Hadoop-based product), GPFS-SNC is currently available to select clients as a technology preview.

Written by Conor O'Mahony

November 30, 2011 at 1:07 pm

Informix Users are Going to San Diego

It has just been announced that next year’s International Informix Users Group (IIUG) conference will be held in San Diego, California on 22 – 25 April. The IIUG Conference continues to offer incredible value. Sign up soon to get the $695 early bird rate, and if you sign up for free IIUG membership, you even get $100 off that rate. $595 for a conference of this length and quality is amazing value. But you’re going to have to act fast to get this discount rate!

And don’t forget that San Diego is a great city to visit. Not only is it a wonderful city with an ideal year-round climate, but it also has a fantastic array of attractions like the world-famous San Diego Zoo, SeaWorld, LEGOLAND, and the Zoo Safari Park (a personal favorite).

International Informix Users Group (IIUG) Conference

Written by Conor O'Mahony

November 30, 2011 at 9:22 am

Posted in DBA, IIUG, Informix

Highlights from the IDUG EMEA Conference

I’m still in the afterglow of the International DB2 User Group (IDUG) conference in Prague, Czech Republic. It was another great conference at a great facility in a great city. The conference organizers should be commended on a truly outstanding event. It’s incredible to think that the conference organizers are user volunteers, and not professional conference planners! I’m already looking forward to the next IDUG EMEA conference in Berlin next year. If you are interested in a more in-depth discussion of the conference, including lessons learned from the technical sessions, Norberto Filho will be appearing on the DB2Night Show on Friday 02 December 2011. Even if you were at the conference, there was so much happening there that you are sure to learn something new from Norberto’s experiences.

Written by Conor O'Mahony

November 30, 2011 at 8:29 am

Posted in DB2 for LUW, DB2 for z/OS, DBA, IDUG

IBM is Baking NoSQL Capabilities into DB2 and Informix

IBM recently revealed its plan to integrate certain NoSQL capabilities into IBM DB2 and Informix. In particular, it is working to integrate graph store and key:value store capabilities into the flagship IBM database products. IBM is not yet indicating when these new capabilities will be available.

IBM does not plan to integrate all NoSQL technologies into DB2 and Informix. After all, there are many NoSQL technologies, and quite a few of them are clearly not suitable for integration into IBM’s products. The following chart summarizes the NoSQL product landscape. This landscape includes more than 100 products across a number of database categories. IBM is saying that they will integrate certain NoSQL capabilities into their products and work hand-in-hand with other NoSQL technologies.

NoSQL Landscape

Readers of this blog will know that these developments are consistent with my view that certain NoSQL technologies will eventually find themselves integrated into the major relational database products. In much the same way as the major relational database products fended off the challenge of object databases by adding features like stored procedures and user-defined functions, I expect the major relational database products to fend off the NoSQL challenge with similar tactics. And don’t forget that the major relational database products have already integrated XML capabilities, providing XQuery as an alternate query language. It’s not too much of a stretch to imagine how several of these NoSQL capabilities might be supported in an optimized way as part of a relational database product.

I look forward to blogging more about this topic as news about it emerges…

Written by Conor O'Mahony

November 21, 2011 at 9:00 am

IBM DB2 Analytics Accelerator—Bringing Netezza to the Mainframe

Now that the IBM Information on Demand (IOD) and International DB2 User Group (IDUG) conferences are behind me, I have time to blog about some of the great announcements from those conferences. Probably the announcement that generated the most interest among conference attendees is the new release of the IBM DB2 Analytics Accelerator (IDAA). This product takes advantage of Netezza to accelerate analytics queries on DB2 for z/OS.

The way it works is… you specify the data whose analysis you want to speed up, and a copy of that data is placed on Netezza (DB2 for z/OS continues to be the system of record for all data). Then, when DB2 for z/OS receives a query, an optimizer determines whether that query should be handled by DB2 for z/OS or by IBM Netezza. Here is a chart from the IDUG conference that summarizes the query execution flow.

IBM DB2 Analytics Accelerator
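
To illustrate the general idea of routing a query to one of two engines, here is a deliberately simplified Python sketch. Everything in it is hypothetical: the table names, the cost threshold, and the routing rule itself are my own assumptions for illustration, and they are not IBM's actual optimizer logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    tables: frozenset       # tables the query reads
    estimated_cost: float   # hypothetical optimizer cost estimate


# Hypothetical: tables whose data has been copied to the accelerator.
ACCELERATED_TABLES = frozenset({"SALES", "STORES"})
# Hypothetical: below this cost, the query is not worth offloading.
COST_THRESHOLD = 1000.0


def route(query, accelerated=ACCELERATED_TABLES, threshold=COST_THRESHOLD):
    """Offload only if every table is on the accelerator and the query
    looks expensive; otherwise stay on the system of record."""
    if query.tables <= accelerated and query.estimated_cost >= threshold:
        return "accelerator"
    return "db2"


print(route(Query(frozenset({"SALES", "STORES"}), 50_000.0)))  # expensive, all copied
print(route(Query(frozenset({"SALES"}), 10.0)))                # cheap lookup
print(route(Query(frozenset({"SALES", "ORDERS"}), 50_000.0)))  # ORDERS not copied
```

The point of the sketch is only that the application sends every query to one place, and the decision about where to run it is made on its behalf.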

Conceptually, you could almost think of the IBM DB2 Analytics Accelerator as a mainframe specialty processor for analytics. I know it’s not actually a specialty processor, but it does perform the processing involved with complex analytics queries. It also makes life easier for database administrators who often struggle with long-running complex queries, by providing them with an accelerator that does not require additional tuning. To see how much faster it is, here is another chart from the IDUG conference. It shows the experiences of IBM DB2 Analytics Accelerator Beta program participants.

IBM DB2 Analytics Accelerator Performance

If you run complex analytical queries on DB2 for z/OS, it is almost certainly worth your while to learn more about the IBM DB2 Analytics Accelerator.

Written by Conor O'Mahony

November 18, 2011 at 9:00 am

What will Happen to “In-Memory” when Storage Class Memory Arrives?

During this week’s keynote address at the International DB2 User Group (IDUG) conference in Prague, Namik Hrle talked about Storage Class Memory. Storage Class Memory is a technology in development that promises the performance of Solid State Drive (SSD) technology at the low cost of Hard Disk Drive (HDD) technology. It also promises compelling breakthroughs in space and power consumption. Storage Class Memory is essentially the marriage of scalable non-volatile memory technology and ultra high-density technology. Here is a table that projects the 2020 characteristics of Storage Class Memory:

Storage Class Memory

This table was actually created in 2008. From what Mr. Hrle says, we are tracking ahead of this schedule and will have these capabilities available sooner than 2020.

The performance limitations of disk-based systems have led to the addition of many database and data warehouse “features” (clever optimizations that address these limitations, and provide acceptable performance). If Storage Class Memory delivers on its random and sequential I/O performance promises, as well as its cost promises, many of these optimizations will become either less important, or perhaps unnecessary. In fact, it makes you wonder if our industry’s current fixation with in-memory capabilities may be short-sighted. Several vendors have in-memory database product visions that will not be realized until the latter half of this decade, which is a similar time frame to the projected availability of low-cost Storage Class Memory. Certainly food for thought…

Written by Conor O'Mahony

November 17, 2011 at 10:17 am

Posted in Cost, Performance

Comparing “New Big Data” with IMS on the Mainframe

with 2 comments

While it does not come up often in today’s data management conversations, the IMS database software is at the heart of many major corporations around the world. For many people, it is the undisputed leader for mission-critical, enterprise transaction and data-serving workloads. IMS users routinely handle peaks of 100 million transactions in a day, and there are quite a few users who report more than 3,000 days without unplanned outages. That’s more than 8 years without an unplanned outage!

IBM recently announced IMS 12, claiming peak performance at a remarkable 66,000 transactions per second. The new release features improved performance and CPU efficiency for most IMS use cases, and a significant improvement in performance for certain use cases. For instance, the Fast Path Secondary Index means that workloads that use this secondary index are 60% faster.

It is interesting to compare the performance of IMS with the headline-grabbing “big data” solutions that are all the rage today. For instance, at the end of August this year, we read how Beyonce Pregnancy News Births New Twitter Record Of 8,868 Tweets Per Second. I am not saying that IMS can replace the infrastructure of Twitter. Far from it. However, I am saying that, when you consider that IMS can handle 66,000 transactions per second, the relative performance levels of the “new big data” solutions when compared with IMS are food for thought. Especially when you consider the very significant infrastructure in place at Twitter, and the staff needed to manage that infrastructure. And don’t forget that IMS supports these performance levels with full read-write capability, full data integrity, and mainframe-level security.
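
For a rough sense of scale, here is a short Python sketch of the arithmetic behind the figures quoted in this post. It uses only numbers stated above. As already noted, an IMS transaction and a tweet are very different units of work, so the ratio is a conversation starter and not a benchmark.

```python
SECONDS_PER_DAY = 24 * 60 * 60

ims_peak_tps = 66_000             # IMS 12 peak, as claimed by IBM
twitter_record_tps = 8_868        # tweets-per-second record cited above
ims_daily_peak_txns = 100_000_000
days_without_outage = 3_000

# A 100-million-transaction day, averaged over the whole day.
avg_tps_on_peak_day = ims_daily_peak_txns / SECONDS_PER_DAY

# 3,000 days expressed in years.
years_without_outage = days_without_outage / 365.25

# Ratio of the two headline per-second figures.
peak_ratio = ims_peak_tps / twitter_record_tps

print(f"Average rate on a 100M-transaction day: {avg_tps_on_peak_day:,.0f} per second")
print(f"3,000 days without an outage: {years_without_outage:.1f} years")
print(f"IMS peak vs. Twitter record: {peak_ratio:.1f}x")
```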

I appreciate that many of today’s Web-scale businesses begin with capital investments that preclude the hardware and software investments required for something like IMS. These new businesses need to be relatively agile, and depend upon the low barrier of entry that x86-based systems and open source/inexpensive software afford. However, I still think it interesting to put this “new big data” in perspective.

Written by Conor O'Mahony

November 9, 2011 at 2:17 pm

IBM Champions Delivering Sessions at the IOD Conference

IBM has great leaders among its user base. They may be technical leaders, whose technical expertise puts them in an elite group of people. They may be community leaders, who bring users together to help one another. They may be academic leaders, who are molding the next generation of innovators. IBM strives to recognize these leaders in its IBM Champion program.

At the Information On Demand (IOD) Conference later this month, 59 IBM Champions will be delivering an impressive lineup of more than 80 sessions across the Business Leadership, Information Management, Enterprise Content Management, and Business Analytics tracks. To see a list of all the IBM Champion-delivered sessions at the IOD Conference, check out the online Roadmap for IBM Champion Sessions.

Written by Conor O'Mahony

October 13, 2011 at 2:57 pm

Posted in Uncategorized

New IBM Smart Analytics Systems

Oracle garnered a lot of headlines a couple of weeks ago with their Oracle Database Appliance. It didn’t take long for SmarterQuestions to indicate why the IBM Smart Analytics Systems are A Smarter Database System for SMB Clients.

Recently, IBM added the following systems:

  • IBM Smart Analytics System 5710, which is an x86-based Linux system
  • IBM Smart Analytics System 7710, which is a Power Systems-based UNIX system
  • IBM Smart Analytics System 9710, which is a mainframe-based system

These systems include everything you need to quickly set up a data warehouse environment, and to quickly have your business analysts working with the data.

On top of the servers and storage, these systems include database and data warehouse software, Cognos software, cubing services, data mining capabilities, and text analytic capabilities. They are available on your platform of choice (Linux, UNIX, or mainframe). They are also competitively priced: the starting price for the 5710 is under $50k, just like the Oracle appliance. However, the IBM system includes all of the necessary software, whereas with the Oracle appliance you have to purchase the Oracle Database software separately. And the Oracle Database software is not exactly inexpensive.

If you want to learn more, please visit the IBM Smart Analytics Systems Web page.

Written by Conor O'Mahony

October 13, 2011 at 11:26 am

Top 10 Reasons to Attend IDUG EMEA in Prague this November

Here are my personal top 10 reasons to attend the upcoming International DB2 User Group (IDUG) conference in Prague, Czech Republic this November.

  1. 100+ of the best technical sessions about DB2, featuring IBM developers, industry experts, and users like you
  2. IBM keynote on the future of relational database software
  3. Official IBM certification tests at no additional cost
  4. Pre-conference seminars on preparing for DB2 certification tests at no additional cost
  5. Pre-conference workshop on preparing for DB2 10 for z/OS upgrades at no additional cost
  6. Conference exhibit hall with the world’s top DB2 tool vendors, consulting firms and solution providers
  7. Post-conference day-long educational seminars
  8. It’s a great way to meet and get to know fellow DB2 users
  9. It’s a great way to speak directly with the DB2 developers
  10. Prague is one of the most beautiful cities in the world

Registration is now open at http://bit.ly/IDUGEMEA. If you register before 17 October 2011, you can take advantage of the early bird discount and save 275 Euro + VAT.

Written by Conor O'Mahony

October 7, 2011 at 9:16 am
