Archive for August 2010
IBM DB2 recently beat Oracle in the race to 10 million tpmC for the TPC-C benchmark. For fun, some of my engineering colleagues spent time putting this in “real world” terms. They wanted to be able to explain to their friends and family just how big the system is, without using jargon like tpmC.
Remember that the DB2 benchmark system processed 10.6 million new orders per minute. Well, if you extrapolate those numbers, such a system would be able to process the transactions generated if every man, woman, and child on the planet went to McDonald's three times a day to purchase a Big Mac™. Not only that, but all of those transactions would be recorded in real time in a single database system. Of course, I am not advocating that everyone on the planet go out and exclusively dine in that manner. But if a retailer wants to cater for that eventuality, then we have the system for you.
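For the curious, here is a back-of-the-envelope version of that claim in Python. The population figure and the transaction-mix percentage are my own assumptions (roughly 6.8 billion people in 2010, and New-Order transactions making up about 45% of the standard TPC-C mix), not numbers from the benchmark disclosure:

```python
# Rough sanity check of the "Big Mac" claim -- my own assumptions, not IBM's.
world_population = 6.8e9                  # assumed ~2010 world population
purchases_per_day = world_population * 3  # three Big Macs per person per day
purchases_per_minute = purchases_per_day / (24 * 60)

# tpmC counts only New-Order transactions, which are roughly 45% of the
# TPC-C transaction mix, so the system's total transaction rate is higher.
new_orders_per_minute = 10.6e6            # figure quoted above
total_tx_per_minute = new_orders_per_minute / 0.45

print(f"purchases needed: {purchases_per_minute / 1e6:.1f}M/min")
print(f"total TPC-C rate: {total_tx_per_minute / 1e6:.1f}M/min")
```

On those assumptions, the system's total transaction rate comfortably covers everyone's thrice-daily burger run.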
Next, they looked at these numbers from another angle. The TPC-C benchmark attempts to model the transactions of a hypothetical wholesale supplier. It includes information for sales districts, where each district is served by a sales warehouse. This benchmark system was able to process transactions for almost 1 million different sales distribution warehouses. Well, Walmart, the world's largest retailer, currently has 8,446 stores. (Now, this is not intended to be a rigorous calculation; it is a simple and quick calculation that makes the questionable assumption that Walmart has one distribution warehouse for each store and ignores the relative size of those warehouses.) If you do the math, this benchmark system can handle roughly 120 times as many distribution warehouses as would be needed by the world's largest retailer. Yes, that's right… more than 100 times the number needed by the world's largest.
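If you want to check that math yourself, the calculation is a one-liner (taking "almost 1 million" as a round 1,000,000):

```python
# Warehouse head-room versus the world's largest retailer.
warehouses_supported = 1_000_000  # "almost 1 million" from the benchmark
walmart_stores = 8_446            # one assumed warehouse per store
ratio = warehouses_supported / walmart_stores
print(f"{ratio:.0f} times")       # roughly 118, i.e. about 120 times
```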
At this stage, you are probably wondering about the applicability of these benchmark results. After all, would you really ever need a system that handles transactions, in real time, for everyone on the planet eating three times a day? The answer is that today very, very few organizations need to process anything approaching that volume of transactions (and those that do are often secretive). However, who knows what tomorrow may bring? After all, the world is becoming more instrumented all the time. Every day, there are new electronic devices that can create transactions (or simply supply information): electronic health monitoring devices, electronic utility meters, electronic sensors in automobiles, and so on. We are only at the beginning of this era of machine-generated data and Big Data. Just think back to the volume of data that your IT department was processing 5 or 10 years ago. Now apply that rate of growth to the next 5 or 10 years. If the pattern holds, such large systems might be here sooner than you think. And it will be good if you can be confident that your systems can scale to meet those needs.
I got a Twitter Direct Message from Ian Bjorhovde (@idbjorh) yesterday, asking why IBM’s latest TPC-C benchmark result uses shared-nothing partitioning (Database Partitioning Feature) rather than shared-disk clustering (DB2 pureScale). After all, partitioning is typically used in warehouse situations, while clustering is typically used in OLTP situations like those being measured in the TPC-C benchmark. He also went on to remind me that IBM spent the last year touting DB2 pureScale as the Oracle RAC killer. So why is IBM not using DB2 pureScale in this situation?
The answer is actually quite straightforward. IBM evaluated the various options—including DB2 pureScale—and determined that for the purposes of reclaiming TPC-C throughput leadership, a system that uses shared-nothing partitioning would achieve the best results. In other words, IBM chose a system that is optimized for the intended workload, just as you would do with any project. You see, the TPC-C data set is highly partitionable (90% of transactions for a given customer are serviced by the same sales warehouse). And it only makes sense for IBM to take advantage of this fact in its system design (just as Oracle takes advantage of that fact to route transactions to specific nodes in its Oracle RAC cluster). So the bottom line here is that IBM chose the system that best suits the intended workload, and in this case that system is based on shared-nothing partitioning.
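To make the idea concrete, here is a minimal sketch, in Python, of how transaction routing in a shared-nothing system can exploit that partitionability. The modulo hash and the partition count are illustrative stand-ins, not DB2's actual distribution algorithm:

```python
# Illustrative shared-nothing routing sketch (not DB2's actual scheme).
NUM_PARTITIONS = 64  # hypothetical number of database partitions

def partition_for(warehouse_id: int) -> int:
    """Every row for a warehouse lives on exactly one partition."""
    return warehouse_id % NUM_PARTITIONS

def route_transaction(warehouse_id: int) -> int:
    # Because ~90% of TPC-C transactions touch only the customer's home
    # warehouse, they can run entirely on one node, with no cross-node
    # locking or coordination.
    return partition_for(warehouse_id)

# All work for warehouse 4711 lands on the same node every time.
node = route_transaction(4711)
```

The point of the sketch is simply that when the data and the workload partition cleanly on the same key, each node can work independently, which is exactly what a throughput benchmark rewards.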
Remember the infamous advertisement in which Oracle claimed it would be announcing a TPC-C benchmark result with XX million tpmC, hinting at a massive double-digit result? Oracle then ended up announcing a result with more than 7 million tpmC. At the time, a few people joked that they didn't expect the first X in Oracle's boast to be a zero. Well, last week IBM became the first vendor to break the 10 million tpmC barrier for the TPC-C benchmark.
The new benchmark result is impressive in many regards. It provides 35% greater throughput¹ than the Oracle/Sun result. But this great leap forward in throughput is only part of the story. An equally important consideration is system efficiency: the IBM system used only half as many CPU cores as the Oracle/Sun system. If you do the math, the IBM result delivers 2.7 times the productivity per CPU core of the Oracle/Sun result. And because software is typically licensed per CPU core, that has the potential to add up to a lot of savings, both in initial purchase price and in ongoing maintenance costs.
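Doing that math explicitly, using only the two ratios stated above:

```python
# Per-core productivity, derived from the ratios in the paragraph above.
throughput_ratio = 1.35  # IBM delivered 35% greater throughput
core_ratio = 0.5         # with half as many CPU cores
per_core_advantage = throughput_ratio / core_ratio
print(f"{per_core_advantage:.1f}x the throughput per core")  # 2.7x
```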
But the aspect of this result that really astounded me was the price/performance metric. The IBM result achieved a 41% lower cost per transaction than the best Oracle/Sun TPC-C performance result. And don't forget the games that Oracle played with its price/performance metric. This means that IBM is 41% better even when Oracle resorts to such tactics!
You may recall that, when Oracle talked about their benchmark result, they boasted that their system needed less energy than IBM's system. Of course, what they neglected to mention was that their system used Solid State Disk (SSD) technology while, at the time, IBM's system used only spinning disk. Well, now the IBM system also uses SSD technology. On this level playing field, the IBM system requires 35% less energy per transaction (Watts/tpmC) than the Oracle/Sun system, according to published Oracle energy usage data².
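The 35% figure can be reproduced from the raw numbers in the footnote below:

```python
# Reproducing the energy comparison from the footnote's raw figures.
ibm_watts, ibm_w_per_tpmc = 65_130, 0.006282
sun_watts, sun_w_per_tpmc = 73_932, 0.009668

ibm_tpmc = ibm_watts / ibm_w_per_tpmc  # ~10.37 million tpmC
sun_tpmc = sun_watts / sun_w_per_tpmc  # ~7.65 million tpmC
savings = 1 - ibm_w_per_tpmc / sun_w_per_tpmc
print(f"IBM uses {savings:.0%} less energy per transaction")
```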
I’m sure that, like me, everyone in the DB2 community is delighted that DB2 has reclaimed its position at the top of the TPC-C benchmark in such an emphatic manner. It is further proof of DB2's performance leadership. In fact, since 1 January 2003, DB2 has enjoyed more time leading the TPC-C, TPC-H 10TB, and SAP 3-Tier SD benchmarks than all other vendors combined.
² Energy claims for either system are not related to official TPC-Energy results and should not be compared to TPC-Energy results. Energy comparisons are between the IBM and Oracle/Sun system configurations referenced above. IBM POWER7 energy consumption = 65130 Watts, 0.006282 Watts/tpmC; Oracle/Sun system consumption = 73932 Watts, 0.009668 Watts/tpmC. IBM energy estimate based on IBM calculations using customer-available energy estimation tools for IBM servers, storage energy estimation reports available from IBM Techline services, and published component active power consumption specifications. Oracle energy estimate from Oracle-published results available at http://www.oracle.com/features/strategic-focus-report.pdf.
IBM has just announced the availability of DB2 pureScale on System x at a Smarter Systems Tour event in Beijing.
DB2 pureScale is the Oracle RAC killer that IBM announced last year. It brings IBM's industry-leading clustering architecture from the mainframe to distributed systems. DB2 pureScale's benefits include centralized lock management, which eliminates the need for the locality of reference that is sometimes hardcoded into applications that use Oracle RAC. Also, when nodes go offline, whether for planned or unplanned reasons, centralized lock management improves the rate at which data becomes available again to the application. DB2 pureScale uses Remote Direct Memory Access (RDMA) for inter-node communication, improving system efficiency. When you combine all of this with superior scale-out efficiency, DB2 pureScale makes for a very exciting product.
When it was first announced, DB2 pureScale was made available on IBM Power Systems (high-end Unix servers). Many people I talked with, both analysts and clients, immediately asked when it would be available on x86-based hardware. Well, now it is available on servers ranging from the IBM System x3650 M3 to the IBM System x3850 X5.
For more information, see the DB2 pureScale Web page.