Industry Benchmark Result for DB2 pureScale: SAP Transaction Banking (TRBK) Benchmark
A couple of years ago, IBM introduced the pureScale feature, which provides application cluster transparency (allowing you to create shared-disk database clusters). At the time, IBM had taken their industry-leading clustering architecture from the mainframe, and brought it to Unix environments. IBM subsequently also brought it to Linux environments.
Today, IBM announced its first public industry benchmark result for this cluster technology. IBM achieved a record result for the SAP Transaction Banking (TRBK) Benchmark, processing more than 56 million posting transactions per hour and more than 22 million balanced accounts per hour. The results were achieved using IBM DB2® 9.7 on SUSE Linux® Enterprise Server. The cluster contained five IBM System x 3690 X5 database servers, and used the IBM System Storage® DS8800 disk system. The servers were configured to take over workload in case of a single system failure, thereby supporting high application availability. For more details, see the official certification from SAP.

It's interesting, but why not use the standard 3-tier SAP SD benchmark?
krids
September 13, 2011 at 2:39 am
Hi krids,
This was more of an “opportunistic” benchmark run than a strategically planned one. We were performing a Proof of Concept (PoC) in conjunction with a deal cycle at a major bank, and that bank wanted the PoC to use the framework for the SAP Transaction Banking (TRBK) Benchmark. So in this case we decided to perform an official benchmark run, rather than just a private PoC. As I said, we had to do this for other reasons, and decided that it was a good “opportunity” to get an official industry benchmark out there for pureScale.
Regards,
Conor.
Conor O'Mahony
September 13, 2011 at 7:13 am
CF: 75%, Member 0: 26%, Member 1: 26%, Member 2: 26%, Member 3: 25%
75% – catastrophic RDMA usage. Is there any detail about the node interconnect and traffic? Is it InfiniBand?
Triffids
September 16, 2011 at 11:06 am
Hi Triffids, not catastrophic at all actually. In fact, quite the opposite. Folks who have been using DB2 on z/OS will recognize that having the CF not use up all the CPU it has available actually means there is plenty of room for more work. What is reported in the SAP results is the CPU usage as measured by the OS (think vmstat), but that’s not representative of the actual work being performed by the CF threads. The reason is that (just like on the mainframe) the CF in pureScale has threads that are constantly polling for work to do. If they find no work, they continue to poll for work. This keeps the CF looking busy, but in fact there are lots of cycles available inside the thread to do real work.
Think of it this way. If I order a pizza and then go to my front door and open it to see if the pizza guy is there (no pizza guy), close the door, open the door to see if he is there (not yet), close the door, open the door, and so on, then someone might think I’m really busy. Why not just wait for the doorbell to ring? Well, I’m certainly using up more energy opening and closing the door to see if anyone is there, but I’m guaranteed that when he does arrive at my door, I’ll take the pizza faster than if I were sitting on my couch waiting for the doorbell to ring (in computer terms, sleeping while waiting for an interrupt to arrive).
So this is actually a very fast way to ensure optimal communication between the member and the CF. It makes the CF look busy from the outside, but it’s really not, and it has lots more room to grow (and do more real work). And inside DB2 there are monitors that let you see how much real work is going on compared to “busy work”, so you know how to size your CF (these numbers are not reported as part of the SAP benchmark process).
Regards,
Chris
Chris Eaton
September 19, 2011 at 4:16 pm
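The polling behaviour described in the comment above can be sketched in a few lines of Python. This is only an illustration of the general technique (a thread that spins looking for work instead of sleeping until it is woken), not the pureScale CF implementation; the names `polling_worker` and `run_demo` and the counters are hypothetical.

```python
import queue
import threading


def polling_worker(work, stop, stats):
    """Spin on the queue instead of sleeping until work arrives."""
    # Keep polling until we are told to stop AND the queue is drained.
    while not stop.is_set() or not work.empty():
        try:
            item = work.get_nowait()    # poll: returns immediately
        except queue.Empty:
            stats["empty_polls"] += 1   # burns CPU, but does no real work
            continue
        stats["useful_polls"] += 1      # a poll that found real work
        stats["total"] += item


def run_demo(n_items=1000):
    """Feed n_items integers to a polling worker and return its counters."""
    work = queue.Queue()
    stop = threading.Event()
    stats = {"empty_polls": 0, "useful_polls": 0, "total": 0}
    worker = threading.Thread(target=polling_worker, args=(work, stop, stats))
    worker.start()
    for i in range(n_items):
        work.put(i)
    stop.set()      # all work is queued; worker exits once the queue is empty
    worker.join()
    return stats


if __name__ == "__main__":
    print(run_demo())
```

From the operating system's point of view the worker thread is busy the whole time, whether a given poll finds work or not. Only the split between `useful_polls` and `empty_polls` shows how much of that time went to real work.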
Strange post. We all know that pureScale uses uDAPL to transfer data blocks. There are no loops in uDAPL concepts …
You can check it in the test: there is a “Part I: Day Processing” with much less CPU utilization on the CF node.
Triffids
September 19, 2011 at 5:56 pm
Triffids, not sure what you are referring to here, but the CF threads run in a tight loop looking for work. Here is an old but still relevant article on how the CF works on the mainframe (and remember, pureScale uses the same architecture as DB2 for z/OS data sharing). Just change the word engine to CPU and you can go from mainframe speak to LUW speak.
http://www.redbooks.ibm.com/abstracts/TIPS0237.html
Note the paragraph:
However, unlike z/OS, which is interrupt-driven and therefore gives up a shared engine when it has no work to process, Coupling Facility Control Code (CFCC) runs in a polling loop, constantly looking to see if new work has arrived. Because CF response times are so short (typically around 100 or more times faster than DASD), running in this fashion allows the CF to deliver faster response times than if it was interrupt-driven. As a result, a CF LP will use as much of a shared engine as PR/SM will allow, even if it has no work to do.
So that’s why, in most cases, the CF looks much busier than it really is: the polling loop spends CPU cycles on what are effectively “no-ops”, so there is still lots of room to do more real work. There is lots of literature out there on how the CF works, since it’s been around for so long.
Note also that it is quite reasonable for the CF to be busier during the Night processing for this benchmark since the night processing is more update intensive I believe (doing account balance activity) but there is still plenty of room left. Also note that the rules of the SAP TRBK benchmark require that the night processing not finish too quickly (I won’t go into details here) but suffice it to say that we had to artificially slow DB2 down during the night processing so that it would meet the benchmark requirements and not finish too quickly (obviously not what you would do in the real world).
If you would like some facts on why the CF is so much better in terms of scalability and availability compared to distributed locking and distributed caching like in RAC, feel free, Triffids, to reach out to me directly and I can show you why this architecture is superior in many ways. You can easily find ways to contact me by googling Chris Eaton DB2 or by going to idug.org and downloading any of my presentations from there (they usually have my email on the front page).
Chris Eaton
September 19, 2011 at 10:40 pm
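To make the "looks busier than it really is" point concrete, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that every poll costs about the same CPU time whether or not it finds work, which a real system will not satisfy exactly. The function name `effective_utilization` and the poll counts are hypothetical; the thread does not report the CF's actual ratio of useful to empty polls.

```python
def effective_utilization(os_cpu_pct, useful_polls, empty_polls):
    """Estimate the share of CPU spent on real work.

    Assumption (illustrative only): each poll, useful or empty,
    consumes roughly the same amount of CPU time.
    """
    total_polls = useful_polls + empty_polls
    if total_polls == 0:
        return 0.0
    return os_cpu_pct * useful_polls / total_polls


if __name__ == "__main__":
    # Hypothetical poll counts: one poll in four finds real work.
    print(effective_utilization(75.0, useful_polls=1, empty_polls=3))
```

Under these made-up numbers, an OS-reported 75% would correspond to a much smaller share of CPU doing real work, which is the distinction between OS-level CPU usage and actual CF work drawn in the comments above.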
There is also an option to disable polling in the mainframe coupling facility (CF) if desired.
Timothy Sipples
October 3, 2011 at 10:46 pm