The internet buzz seems to be that Larry Ellison, CEO, Oracle Corporation and John Fowler, EVP, Sun Microsystems, Inc. will be announcing a new product, the world’s first OLTP database machine with Sun’s brand new FlashFire technology on Tuesday, September 15, 2009, 1 p.m. PT.
Both Sun and Oracle have Webcast invitations on their websites:
I plan on being at the Oracle Conference Center for the launch and will try to tweet the highlights. First Oracle Database 11g Release 2, now an OLTP database machine. Are there more innovations up Oracle’s sleeve? I guess we’ll have to wait and see.
Even though Oracle OpenWorld 2009 is a few months away, I thought I would take a moment to mention that the Oracle Real-World Performance Group will again be hosting three sessions. Hopefully you are no stranger to our Oracle database performance sessions and this year we have what I think will be a very exciting and enlightening session: The Terabyte Hour with the Real-World Performance Group. If you are the slightest bit interested in seeing just how fast the Oracle Database Machine really is and how it can devour flat files in no time, rip through and bend data at amazing speeds, this is the session for you. All the operations will be done live for you to observe. No smoke. No mirrors. Pure Exadata performance revealed.
|Session Title:||Real-World Database Performance Roundtable|
|Session Abstract:||This session is a panel discussion including Oracle’s Real-World Performance Group and other invited performance experts. To make the hour productive, attendees need to write their questions on postcards and hand them to the panel at the beginning of the session. The questions should stick to the subject matter of real-world database performance. The panel members look forward to meeting you and answering your questions.|
|Session Title:||The Terabyte Hour with the Real-World Performance Group|
|Session Abstract:||Last year at Oracle OpenWorld, Oracle launched the Oracle Database Machine, a complete package of software, servers, and storage with the power to tackle large-scale business intelligence problems immediately and scale linearly as your data warehouse grows. In this session, Oracle’s Real-World Performance Group demonstrates how to use an Oracle Database Machine to load, transform, and query a 1-terabyte database in less than an hour. The demonstration shows techniques for exploiting full database parallelism in a simple but optimal manner.|
|Session Title:||Current Trends in Real-World Database Performance|
|Session Abstract:||The year 2009 has been an exciting one for Oracle’s Real-World Performance Group. The group has been challenged by bigger databases, new performance challenges, and now the Oracle Database Machine with Oracle Exadata Storage Server. This session focuses on some of the real-world performance ideas and solutions that have worked over the last year, including performance design philosophies, best practices, and a few tricks and tips.|
Oracle put out a press release today entitled “Customers are Choosing the Oracle Database Machine” mentioning the new Exadata and Oracle Database Machine customers. I’ve quoted a few parts of it below. Oracle cites twenty initial customers.
Initial Oracle Exadata customers including Amtrak, Allegro Group, Automobile Association of the UK, CTC, Garanti Bank, Giant Eagle, HISCOM (Hokuriku Coca Cola), KnowledgeBase Marketing, Loyalty Partner Solutions, M-Tel, MTN Group, Nagase, NS Solutions, NTT Data, OK Systems, Research in Motion, SoftBank Mobile, Screwfix, ThomsonReuters, and True Telecom, confirm the benefits Oracle Exadata products bring to their Oracle data warehouses.
“The HP Oracle Database Machine beat the competing solutions we tested on bandwidth, load rate, disk capacity, and transparency. In addition, Allegro Group saw a significant performance boost from the new data warehouse. A query that used to take 24 hours to complete now runs in less than 30 minutes on the HP Oracle Database Machine, and that’s without any manual query tuning.” — Christian Maar, CIO of Poznań, Poland-based Allegro Group
“After carefully testing various options for a new data warehouse platform we chose the HP Oracle Database Machine over Netezza. Oracle Exadata was able to speed up one of our critical processes from days to minutes. The HP Oracle Database Machine will allow us to improve service levels and expand our service offerings. We also plan to consolidate our current data warehouse solutions onto the Oracle Exadata platform. This should eliminate several servers and a number of storage arrays and help reduce our operating overhead and improve margins.” — Brian Camp, Sr. VP of Infrastructure Services, KnowledgeBase Marketing
“We anticipate the move of our Data Warehouse to Oracle Database 11g running on our first HP Oracle Database Machine with Oracle Exadata will deliver a substantial boost in performance and scalability, simply and easily. Our business users expect to benefit from faster access to information more quickly than ever before. The resulting agility should make a huge difference to our business.” — Andreas Berninger, Chief Operating, Loyalty Partner Solutions
“The biggest technological challenge we face when we architect a database is how to create a system that performs fast with huge volumes. Oracle Exadata helps solve our performance demands. It's highly available and reliable, and it can essentially scale linearly. All of the queries we tested were faster with Oracle Exadata. The smallest performance boost we experienced was 10 times; the fastest was 72 times faster.” — Plamen Zyumbyulev, Head of Database Administration, M-Tel
“A key component of RIM's manufacturing process is extensive testing of each handheld device. This testing generates large volumes of data, which is extensively analyzed by our quality and test engineers and business users to ensure RIM is producing the highest quality devices for our customers. The HP Oracle Database Machine is an ideal platform to store and analyze this data since it provides the performance, scalability and storage capacity for our requirements. It’s a cost-effective platform to meet our speed and scalability needs and is an integral component used for analysis in our manufacturing process.” — Ketan Parekh, Manager Database Systems, Research in Motion
“The benchmark result of Oracle Exadata was amazing! Since it is based on Oracle Database 11g, we determined that it is compatible with other systems and the most suitable solution for our increasing data infrastructure.” — Keiichiro Shimizu, General Manager, Business Base Management Dept., Information System Div., SoftBank Mobile Corp.
“Oracle Exadata is among the most successful new product introductions in Oracle's history,” said Willie Hardie, vice president of Database Product Marketing, Oracle. “Repeatedly in customer proof of concepts and benchmarks, Oracle Exadata has delivered extreme performance for customers' data warehouses.”
Oracle Corporation had its F4Q09 earnings call today and the Exadata comments started right away with the earnings press release:
“The Exadata Database Machine is well on its way to being the most successful new product launch in Oracle’s 30 year history,” said Oracle CEO Larry Ellison. “Several of Teradata’s largest customers are performance testing — then buying — Oracle Exadata Database Machines. In a recent competitive benchmark, a Teradata machine took over six hours to process a query that our Exadata Database Machine ran in less than 30 minutes. They bought Exadata.”
During the earnings call Larry Ellison discusses Exadata and the competition:
…I’m going to talk about Exadata again. I said last quarter that Exadata is shaping up to be our most exciting and successful new product introduction in Oracle’s 30 year history and [in the] last quarter Exadata continues to grow and win competitive deals in the marketplace against our three primary competitors. It’s turning out that Teradata is our number one competitor…Netezza and IBM are kind of tied for second.
Ellison describes some of the Exadata sales from this quarter which include:
- A well-known California SmartPhone and computer manufacturer (win vs. Netezza) who commented that Exadata ran about 100 times faster in some cases than their standard Oracle environment
- Research in Motion
- A large East Coast insurance company
- Thomson Reuters
- A Japanese telco (biggest Teradata customer in Japan) who benchmarked Exadata and found it to be dramatically faster than Teradata
- Barclays Capital (UK)
- A number of banks in Western Europe and Germany
Larry Ellison follows with:
It was just a great quarter for Exadata, a product that is relatively new to the marketplace that is persuading people to move from their existing environments because Exadata is faster and the hardware costs less.
In the Q&A Larry Ellison responds to John DiFucci on Exadata:
By the way every customer I mentioned and alluded to were actual sales. Now some of these, because the Exadata product is so new, quite often will install in kind of a try and buy situation, but I can’t think of a case where we installed the machine that they didn’t buy. So we’re winning these benchmarks. Sometimes we’re beating Teradata. I think in my quote, I said we’ve beat Teradata on one of the queries by 20 to one. So we think it’s a brand new technology, we think we’re a lot faster than the competition. The benchmarks are proving out with real customer data, we’re proving to be much faster than the competition. Every single deal I mentioned were cases where the customer bought the system. There are obviously other evaluations going on and we expect the Exadata sales to accelerate.
Oracle and HP have taken back the #1 spot by setting a new performance record in the 1TB TPC-H benchmark. The HP/Oracle result puts the Oracle database ahead of both the Exasol (currently #2 & #3) and ParAccel (currently #4) results in the race for performance at the 1TB scale factor and places Oracle in the >1 million queries per hour (QphH) club, which is no small achievement. Compared to the next best result from HP/Oracle (currently #5), this result has over 9X the query throughput (1,166,976 QphH vs. 123,323 QphH) at around 1/4 the cost (5.42 USD vs. 20.54 USD) demonstrating significantly more performance for the money.
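The two ratios quoted above can be checked directly from the published figures. This is only a quick sketch of the arithmetic: the variable names are made up for illustration, and the USD figures are assumed to be the TPC-H price/performance values (price per QphH) for each result.

```python
# Figures quoted above for the two HP/Oracle 1TB TPC-H results.
new_qphh = 1_166_976   # the new record result
old_qphh = 123_323     # the next best HP/Oracle result
new_price = 5.42       # USD, assumed to be price per QphH
old_price = 20.54      # USD, assumed to be price per QphH

# How many times more query throughput the new result delivers.
throughput_ratio = new_qphh / old_qphh

# The new price expressed as a fraction of the old price.
price_ratio = new_price / old_price

print(f"throughput: {throughput_ratio:.2f}x the older result")
print(f"price: {price_ratio:.2f} of the older result")
```

The throughput ratio comes out a little under 9.5 and the price ratio a little over one quarter, which matches the "over 9X" and "around 1/4 the cost" wording.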
Some of the interesting bits from the hardware side:
- 4 HP BladeSystem c7000 Enclosures
- 64 HP ProLiant BL460c Servers
- 128 Quad-Core Intel Xeon X5450 “Harpertown” Processors (512 cores)
- 2TB Total System Memory (RAM)
- 6 HP Oracle Exadata Storage Servers
As you can see, this was a 64 node Oracle Real Application Cluster (RAC), each node having 2 processors (8 cores). This is also the first TPC-H benchmark from Oracle that used Exadata as the storage platform.
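The hardware list is internally consistent, which is easy to confirm with a few multiplications. A small sketch using only the counts listed above (the variable names are illustrative, and the even spread of servers across enclosures is an assumption):

```python
# Hardware counts quoted in the list above.
nodes = 64              # HP ProLiant BL460c servers, one RAC node each
sockets_per_node = 2    # processors per node
cores_per_socket = 4    # quad-core Xeon X5450
enclosures = 4          # HP BladeSystem c7000 enclosures

total_sockets = nodes * sockets_per_node
total_cores = total_sockets * cores_per_socket

# Assumes the servers are spread evenly across the enclosures.
servers_per_enclosure = nodes // enclosures

print(f"{total_sockets} processors, {total_cores} cores")
print(f"{servers_per_enclosure} servers per enclosure")
```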
Congratulations to the HP/Oracle team on a great accomplishment!
A few weeks ago I read Curt Monash’s report on interpreting the results of data warehouse proofs-of-concept (POCs) and I have to say, I’m quite surprised that this topic hasn’t been covered more by analysts in the data warehousing space. I understand that analysts are not database performance engineers, but where do they think that the performance claims of 10x to 100x or more come from? Do they actually investigate these claims or just report on them? I cannot say that I have ever seen any database analyst offer any technical insight into these boasts of performance. If some exist, be sure to leave a comment and point me to them.
Oracle Exadata Performance Architect Kevin Closson has blogged about a 485x performance increase of Oracle Exadata vs. Oracle Exadata and his follow-up post to explain exactly where the 485x performance gain comes from gave me the nudge to finish this post that had been sitting in my drafts folder since I first read Curt’s post.
Customer Benchmarketing Claims
I thought I would compile a list of what the marketing folks at other database vendors are saying about the performance of their products. Each of these statements has been taken from the given vendor’s website.
- Netezza: 10-100 times faster than traditional solutions…but it is not uncommon to see performance differences as large as 200x to even 400x or more when compared to existing Oracle systems
- Greenplum: often 10 to 100 times faster than traditional solutions
- DATAllegro: 10-100x performance over traditional platforms
- Vertica: Performs 30x-200x faster than other solutions
- ParAccel: 20X – 200X performance gains
- EXASolution: can perform up to 100 times faster than with traditional databases
- Kognitio WX2: Tests have shown to out-perform other database / data warehouse solutions by 10-60 times
Certainly seems these vendors are positioning themselves against traditional database solutions, whatever that means. And differences as large as 400x against Oracle? What is it exactly they are comparing?
Investigative Research On Netezza’s Performance Claims
Using my favorite Internet search engine I came across this presentation by Netezza dated October 2007. On slide 21 Netezza is comparing an NPS 8150 (112 SPU, up to 4.5 TB of user data) server to IBM DB2 UDB on a p680 with 12 CPUs (the existing solution). Not being extremely familiar with the IBM hardware mentioned, I thought I’d research to see exactly what an IBM p680 server consists of. The first link in my search results took me to here where the web page states:
The IBM eServer pSeries 680 has been withdrawn from the market, effective March 28, 2003.
Searching a bit more I came across this page which states that the 12 CPUs in the pSeries 680 are RS64 IV microprocessors. According to Wikipedia the “RS64-IV or Sstar was introduced in 2000 at 600 MHz, later increased to 750 MHz”. Given that at best, the p680 had 12 CPUs running at 750 MHz and the NPS 8150 had 112 440GX PowerPC processors I would give the compute advantage to Netezza by a significant margin. I guess it is cool to brag how your most current hardware beat up on some old used and abused server that has already been served its end-of-life notice. I found it especially intriguing that Netezza is boasting about beating out an IBM p680 server that had been end-of-lifed more than four years prior to the presentation’s date. Perhaps they don’t have any more recent bragging to do?
Going back one slide to #20 you will notice a comparison of Netezza and Oracle. Netezza clearly states they used a NPS 8250 (224 SPUs, up to 9 TB of user data) against Oracle 10g RAC running on Sun/EMC. Well ok…Sun/EMC what??? Obviously there were at least 2 Sun servers, since Oracle 10g RAC is involved, but they don’t mention the server models at all, nor the storage, nor the storage connectivity to the hosts. Was this two or more Sun Netra X1s or what??? Netezza boasts a 449x improvement in a “direct comparison on one day’s worth of data”. What exactly is being compared is up to the imagination. I guess this could be one query or many queries, but the marketeers intentionally fail to mention which. They don’t even mention the data set size being compared. Given that Netezza can read data off the 224 drives at 60-70 MB/s, the NPS 8250 has a total scan rate of over 13 GB/s. I can tell you first hand that there are very few Sun/EMC solutions that are configured to support 13 GB/s of I/O bandwidth. Most configurations of that vintage probably don’t support 1/10th of that I/O bandwidth (1.3 GB/s).
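The 13 GB/s figure is simply the per-drive read rate multiplied by the number of drives. A minimal sketch of that arithmetic, using the drive count and per-drive range quoted above and assuming decimal units (1 GB = 1000 MB); the variable names are illustrative:

```python
# Figures quoted above for the NPS 8250.
drives = 224                 # the 224 drives mentioned in the text
per_drive_mb_s = (60, 70)    # quoted per-drive read rate range, MB/s

# Aggregate scan rate in GB/s at each end of the range (1 GB = 1000 MB).
low_gb_s, high_gb_s = (drives * rate / 1000 for rate in per_drive_mb_s)

# One tenth of the low end, for comparison with an older Sun/EMC setup.
tenth_gb_s = low_gb_s / 10

print(f"aggregate scan rate: {low_gb_s:.2f} to {high_gb_s:.2f} GB/s")
print(f"one tenth of the low end: {tenth_gb_s:.2f} GB/s")
```

Even the low end of the range lands above 13 GB/s, so any comparison system would need storage connectivity in that class before the comparison says anything about the database software.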
Here are a few more comparisons that I have seen in Netezza presentations:
- NPS 8100 (112 SPUs/4.5 TB max) vs. SAS on Sun E5500/6 CPUs/6GB RAM
- NPS 8100 (112 SPUs/4.5 TB max) vs. Oracle 8i on Sun E6500/12 CPUs/8 GB RAM
- NPS 8400 (448 SPUs/18 TB max) vs. Oracle on Sun (exact hardware not mentioned)
- NPS 8100 (112 SPUs/4.5 TB max) vs. IBM SP2 (database not mentioned)
- NPS 8150z (112 SPUs/5.5 TB max) vs. Oracle 9i on Sun/8 CPUs
- NPS 8250z (224 SPUs/11 TB max) vs. Oracle 9i on Sun/8 CPUs
As you can see, Netezza has a way of finding the oldest hardware around and then comparing it to its latest, greatest NPS. Just like Netezza’s slogan, [The Power to ]Question Everything™, I suggest you question these benchmarketing reports. Database software is only as capable as the hardware it runs on and when Netezza targets the worst performing and oldest systems out there, they are bound to get some good marketing numbers. If they compete against the latest, greatest database software running on the latest, greatest hardware, sized competitively for the NPS being used, the results are drastically different. I can vouch for that one first hand having done several POCs against Netezza.
One Benchmarketing Claim To Rule Them All
Now, one of my favorite benchmarketing reports is one from Vertica. Michael Stonebraker’s blog post on customer benchmarks contains the following table:
Take a good look at the Query 2 results. Vertica takes a query that runs in 4.5 hours (16,200 seconds) in the current row store down to 1 second, for a performance gain of 16,200x. Great googly moogly batman, that is reaching ludicrous speed. Heck, who needs 100x or 400x when you do 16,200x. That surely warrants an explanation of the techniques involved there. It’s much, much more than simply column store vs. row store. It does raise the question (at least to me): why doesn’t Vertica run every query in 1 second? I mean, come on, why doesn’t that 19 minute row store query score better than a 30x gain? Obviously there is a bit of the magic pixie dust going on here with, what I would refer to as “creative solutions” (in reality it is likely just a very well designed projection/materialized view, but showing the query and telling us how it was possible would make it less unimpressive [sic]).
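For anyone who wants to check the numbers, here is a small sketch of the arithmetic behind the two queries mentioned above. It uses only the figures quoted in this post; the variable names are illustrative and nothing here comes from Vertica's own material.

```python
# Query 2 as described above: 4.5 hours in the row store, 1 second in Vertica.
row_store_seconds = 4.5 * 60 * 60
vertica_seconds = 1
speedup = row_store_seconds / vertica_seconds

# The 19 minute row store query with a 30x gain implies this Vertica runtime.
implied_seconds = 19 * 60 / 30

print(f"Query 2 speedup: {speedup:,.0f}x")
print(f"19 minute query at 30x: {implied_seconds:.0f} seconds")
```

So the same product is credited with a 1 second answer on one query and roughly a 38 second answer on another, which is why the spread in gains says more about the individual queries and their physical design than about the engine in general.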
What Is Really Going On Here
First of all, you will notice that not one of these benchmarketing claims is against a vendor-run system. Each and every one of these claims is against an existing customer system. The main reason for this is that most vendors’ licensing agreements prohibit publishing benchmark results without prior consent from the vendor. It seems the creative types have found that taking the numbers from the existing production system is not prohibited in the license agreement, so they compare that to their latest, greatest hardware/software and execute or supervise the execution of a benchmark on their solution. Obviously this is a one-sided apples-to-bicycles comparison, but quite favorable for bragging rights for the new guy.
I’ve been doing customer benchmarks and proofs of concept (POCs) for almost 5 years at Oracle. I can guarantee you that Netezza has never even come close to getting 10x-100x the performance over Oracle running on a competitive hardware platform. Now I can say that it is not uncommon for Oracle running on a balanced system to perform 10x to 1000x (ok, in extreme cases) over an existing poorly performing Oracle system. All it takes is a very unbalanced system with no I/O bandwidth, no parallel query, no compression, and poor or no use of partitioning, and you have created a springboard for any vendor to look good.
One More Juicy Marketing Tidbit
While searching the Internet for creative marketing reports I have to admit that the crew at ParAccel probably takes the cake (and not in an impressive way). On one of their web pages they have these bullet points (plus a few more uninteresting ones):
- All operations are done in parallel (A non-parallel DBMS must scan all of the data sequentially)
- Adaptive compression makes disks faster…
Ok, so I can kinda, sorta see the point that a non-parallel DBMS must do something sequentially…not sure how else it would do it, but then again, I don’t know any enterprise database that is not capable of parallel operations. However, I’m going to need a bit of help on the second point there…how exactly does compression make disks faster? Disks are disks. Whether or not compression is involved has nothing to do with how fast a disk is. Perhaps they mean that compression can increase the logical read rate from a disk given that compression allows more data to be stored in the same “space” on the disk, but that clearly is not what they have written. Reminds me of DATAllegro’s faster-than-wirespeed claims on scan performance. Perhaps these marketing guys should have their numbers and wording validated by some engineers.
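The charitable reading, that compression raises the logical read rate while the disk itself is no faster, can be sketched in a few lines. The 70 MB/s physical rate and the 3:1 compression ratio below are assumptions picked purely for illustration, not figures from ParAccel, and the CPU cost of decompression is ignored.

```python
def logical_scan_rate(physical_mb_s: float, compression_ratio: float) -> float:
    """Uncompressed MB delivered per second when reading compressed blocks."""
    return physical_mb_s * compression_ratio


physical = 70.0   # hypothetical physical disk read rate, MB/s
ratio = 3.0       # hypothetical compression ratio (3:1)

print(f"physical rate: {physical:.0f} MB/s (unchanged by compression)")
print(f"logical rate:  {logical_scan_rate(physical, ratio):.0f} MB/s")
```

The disk moves the same number of bytes per second either way; only the amount of user data represented by those bytes changes.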
Do You Believe In Magic Or Word Games?
Credible performance claims need to be accounted for and explained. Neil Raden from Hired Brains Research offers guidance for evaluating benchmarks and interpreting market messaging in his paper, Questions to Ask a Data Warehouse Appliance Vendor. I think Neil shares the same opinion of these silly benchmarketing claims. Give his paper a read.