Oracle Exadata: In Response to Chuck Hollis

Chuck Hollis, VP and Global Marketing CTO at EMC, has written a couple of blog posts offering his thoughts on Oracle Exadata. The first was “Oracle Does Hardware”, written the day after the product launch. The second, unimpressively titled “I Annoy Kevin Closson at Oracle”, appeared on Monday, October 20th, in response to a blog post by Exadata Performance Architect Kevin Closson, who had commented on Chuck’s first post and on some comments left on Kevin’s blog.

Clearly Stated Intentions

Since Chuck had disabled comments for his “I Annoy Kevin” post, I’m going to write my comments here. I have no intention of getting into a fact-less debate turned flame war, but I will make some direct comments, with supporting facts and numbers, while keeping it professional.

Storage Arrays: Bottleneck or Not?

Chuck thinks:

“…array-based storage technology is not the bottleneck; our work with Oracle [on the Oracle Optimized Warehouse Initiative] and other DW/BI environments routinely shows that we can feed data to a server just as fast as it can take it.”

First, let me comment on the Optimized Warehouse Initiative. Some good things have come out of this effort; I believe it has raised awareness about sizing storage for BI/DW workloads. All too often, storage for BI/DW is sized by capacity, not by I/O bandwidth. The initiative focuses on building balanced systems: systems that can execute queries and workloads such that no one component (CPU, storage connectivity, disk array, disk drives) becomes the bottleneck prematurely. The industry seems to agree: IBM has the Balanced Warehouse, and Microsoft has a reference architecture for Project Madison as well.

So the question comes back to: is array-based storage technology the bottleneck or not? I would argue it is. Perhaps I would use a word other than “bottleneck”, but let’s be clear on the overall challenge here: to read data off disk quickly and return it to the database host efficiently, so it can be processed as fast as possible.

Let’s start at the bottom of the stack: hard disk drives. If the challenge is to scan lots of data fast, then how fast data can be read off disk is the first important metric to consider. In the white paper Deploying EMC CLARiiON CX4-960 for Data Warehouse/Decision Support System (DSS) Workloads, EMC reports a drive scan rate (for a BI/DW workload) of 20 MB/s using 8+1 RAID-5 and 33 MB/s using a 2+1 RAID-5 LUN configuration. Oracle Exadata delivers drive scan rates around 85 MB/s, a difference of 2.5X to 4.25X. To understand the performance impact of this, I’ve put together a few tables of data based on these real workload numbers.
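Spelled out, those multipliers are simple division; here is a trivial sketch of the arithmetic, using the per-drive rates just cited:

```python
# Per-drive scan rates cited above (MB/s).
cx4_8plus1 = 20   # EMC CX4-960, 8+1 RAID 5
cx4_2plus1 = 33   # EMC CX4-960, 2+1 RAID 5
exadata    = 85   # Oracle Exadata

print(f"Exadata vs. 2+1 RAID 5: {exadata / cx4_2plus1:.2f}X")  # ~2.58X
print(f"Exadata vs. 8+1 RAID 5: {exadata / cx4_8plus1:.2f}X")  # 4.25X
```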

Hardware Specs and Numbers for Data Warehouse Workloads

| Storage | RAID | Raw:Usable Ratio | Disk Drives | Disk Scan Rate |
|---------|------|------------------|-------------|----------------|
| EMC CX4-960 | 8+1 RAID 5 | 9:8 | 146 GB FC 15k RPM | 20 MB/s |
| EMC CX4-960 | 2+1 RAID 5 | 3:2 | 146 GB FC 15k RPM | 33 MB/s |
| EMC CX4-960 | 8+1 RAID 5 | 9:8 | 300 GB FC 15k RPM | 20 MB/s |
| EMC CX4-960 | 2+1 RAID 5 | 3:2 | 300 GB FC 15k RPM | 33 MB/s |
| Oracle Exadata | ASM Mirroring | 2:1 | 450 GB SAS 15k RPM | 85 MB/s |

Sizing By Capacity

| Storage | RAID | Total Usable Space | Disk Drive | Number of Drives | Total Scan Rate |
|---------|------|--------------------|------------|------------------|-----------------|
| EMC CX4-960 | 8+1 RAID 5 | 18 TB | 146 GB | 139 | 2.8 GB/s |
| EMC CX4-960 | 2+1 RAID 5 | 18 TB | 146 GB | 185 | 6.1 GB/s* |
| EMC CX4-960 | 8+1 RAID 5 | 18 TB | 300 GB | 68 | 1.4 GB/s |
| EMC CX4-960 | 2+1 RAID 5 | 18 TB | 300 GB | 90 | 3.0 GB/s |
| Oracle Exadata | ASM Mirroring | 18 TB | 450 GB | 80 | 6.8 GB/s |

* I’m not sure the CX4-960 array head is capable of 6.1 GB/s, so it likely takes at least two CX4-960 array heads to deliver this throughput to the host(s).

Sizing By Scan Rate

| Storage | RAID | Total Scan Rate | Disk Drive | Number of Drives | Total Usable Space |
|---------|------|-----------------|------------|------------------|--------------------|
| EMC CX4-960 | 8+1 RAID 5 | 3.00 GB/s | 146 GB | 150 | 19.46 TB |
| EMC CX4-960 | 2+1 RAID 5 | 3.00 GB/s | 146 GB | 90 | 8.76 TB |
| EMC CX4-960 | 8+1 RAID 5 | 3.00 GB/s | 300 GB | 150 | 40.00 TB |
| EMC CX4-960 | 2+1 RAID 5 | 3.00 GB/s | 300 GB | 90 | 18.00 TB |
| Oracle Exadata | ASM Mirroring | 3.00 GB/s | 450 GB | 36 | 8.10 TB |

A Few Comments On The Above Data Points

Please note that “Total Usable Space” is a rough number for the total protected disk space one could use for a database if each drive were filled to capacity. It does not take into consideration losses such as formatting, space for sort/temp, and so on. I would use a 60% rule for estimating database data space versus total usable space, meaning that 18 TB of total usable space equates to roughly 10 TB (max) of space for database data (compression not accounted for).
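As a quick illustration of that rule of thumb (a back-of-the-envelope sketch, not a sizing tool):

```python
# Roughly 60% of "total usable space" ends up available for database
# data; formatting, sort/temp, etc. consume the rest (compression not
# accounted for).
def db_data_space_tb(total_usable_tb, rule=0.60):
    return total_usable_tb * rule

print(db_data_space_tb(18))  # ~10.8 TB, i.e. about 10 TB max
```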

I’d also like to note that in the Sizing By Capacity table, “Total Scan Rate” is a disk-only calculation; whether a single CX4-960 array head can move data at that rate is in question. Based on the numbers in the EMC white paper, the CX4-960 head appears capable of 3 GB/s, but I would question whether it is capable of much more than that, hence the asterisk (*).

Looking At The Numbers

If you look at the numbers for Sizing By Capacity, you can see that for the given fixed capacity, Exadata provides the fastest scan rate while using only 80 disk drives. The next closest scan rate is just 700 MB/s less, but it uses 105 more disk drives (185 vs. 80). Quite a big difference.

When it comes to delivering I/O bandwidth, Exadata clearly stands out. Targeting a scan rate of 3 GB/s, Exadata delivers it using only 36 drives, just 3 Exadata Storage Servers. Delivering the same scan rate with the CX4 would take 2.5X as many drives (90 vs. 36) using 2+1 RAID 5.
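The sizing-by-scan-rate arithmetic behind these numbers is simple enough to sketch. Here is a minimal back-of-the-envelope in Python, assuming the scan rate scales linearly with spindle count (real configurations round up to whole RAID groups or storage servers):

```python
# Target aggregate scan rate, and the per-drive rates discussed above.
target_mbs = 3000  # 3 GB/s

configs = {
    # name: (per-drive scan rate MB/s, drive size GB, raw:usable ratio)
    "CX4-960, 2+1 RAID 5, 300 GB FC":     (33.3, 300, 3 / 2),
    "Exadata, ASM mirroring, 450 GB SAS": (85.0, 450, 2 / 1),
}

for name, (mbs, gb, ratio) in configs.items():
    drives = target_mbs / mbs               # spindles needed for the target
    usable_tb = drives * gb / ratio / 1000  # capacity left after protection
    print(f"{name}: ~{drives:.0f} drives, ~{usable_tb:.1f} TB usable")

# -> ~90 drives / ~18 TB usable for the CX4; ~35 drives for Exadata,
#    which rounds up to 36 (3 storage servers x 12 drives) / 8.1 TB.
```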

So are storage arrays the bottleneck? You can draw your own conclusions, but I think the numbers speak to the performance advantage of Oracle Exadata when it comes to delivering I/O bandwidth and fast scan rates. Consider this: what would the storage topology look like if you wanted to deliver a scan rate of 74 GB/s, as we did for Oracle OpenWorld with 84 HP Oracle Exadata Storage Servers (6 HP Oracle Database Machines)? Honestly, I would struggle to think where I would put the 185 or so 4Gb HBAs (74 GB/s at roughly 400 MB/s of wire speed per HBA) needed to achieve that.

Space-Saving RAID or Wasteful Mirroring?

This leads me to another comment by Chuck in his second post:

“[with Exadata] The disk is mirrored, no support of any space-saving RAID options — strange, for such a large machine”

And this one in his first post:

“If it were me, I’d want a RAID 5 (or 6) option.”

And his comment on Kevin’s blog:

“The fixed ratio of 12 disks (6 usable) per server element strikes us as a bit wasteful….And, I know this only matters to storage people, but there’s the minor matter of having two copies of everything, rather than the more efficient parity RAID approaches. Gets your attention when you’re talking 10-40TB usable, it does.”

Currently Exadata uses ASM mirroring for fault tolerance, so there is a 2:1 ratio of raw disk to usable disk; however, I don’t think it matters much. The logic is that when one sizes for a given scan rate, Exadata uses fewer spindles than the other configurations even though the disk protection is mirroring rather than space-saving RAID 5. I find it strange to worry about space savings when disks keep getting bigger while many keep the same performance characteristics as their predecessors. Space is cheap. Spindles are expensive. When one builds a configuration that satisfies the I/O scan rate requirement, chances are the storage capacity requirement has been well exceeded, even when using mirroring.

Perhaps Chuck likes space-saving RAID 5, but I think using fewer drives (0.4 times as many, 36 vs. 90) to deliver the same scan rate is hardly wasteful. You know what really gets my attention? Having 40 TB of total usable space on 15 HP Oracle Exadata Storage Servers (180 450 GB SAS drives) and being able to scan it at 15 GB/s, compared to, say, a CX4 with 200 drives at 300 GB using 2+1 RAID 5 that can only scan them at 6.6 GB/s. I’d also be willing to bet that the latter would require at least 2, if not 3, CX4-960 array heads and at least 30 4Gb HBAs running at wire speed (400 MB/s).

Exadata Is Smart Storage

Chuck comments:

“Leaving hardware issues aside, how much of the software functionality shown here is available on generic servers, operating systems and storage that Oracle supports today? I was under the impression that most of this great stuff was native to Oracle products, and not a function of specific tin …

If the Exadata product has unique and/or specialized Oracle logic, well, that’s a different case.”

After reading that, I would say Chuck has not read the Technical Overview of the HP Oracle Exadata Storage Server. Not only does Exadata have a very fast scan rate; it has intelligence: a combination of brawn and brains that is not available with other storage platforms. The Oracle Exadata Storage Server Software (say that 5 times fast!!!) is not an Oracle database. It is storage software, not database software. The intelligence and specialized logic is that Exadata Smart Scans return only the relevant rows and columns of a query, allowing better use of I/O bandwidth and increased database performance, because the database host(s) do not issue I/O requests for data the query does not need and then process it after the fact. There are a couple of slides (18 & 19) with a simple example of the benefits of Smart Scans in the HP Oracle Exadata Storage Server technical overview slide deck. It is worth the read.
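To make the Smart Scan idea concrete, here is a conceptual sketch in plain Python (not Oracle code; the table and sizes are invented for illustration) of why filtering and projecting at the storage tier shrinks what must travel to the database host:

```python
# Hypothetical table: 100,000 orders, each row carrying a wide "notes"
# column that the query below never asks for.
rows = [
    {"order_id": i,
     "region": "EMEA" if i % 4 else "APAC",
     "amount": i * 10.0,
     "notes": "x" * 200}
    for i in range(100_000)
]

# Conventional storage: every block read from disk is shipped to the
# database host, which then filters and projects.
conventional_bytes = sum(len(str(r)) for r in rows)

# Smart Scan: the storage cell applies the predicate (region = 'APAC')
# and the projection (order_id, amount); only matches are shipped.
smart_rows = [(r["order_id"], r["amount"])
              for r in rows if r["region"] == "APAC"]
smart_bytes = sum(len(str(t)) for t in smart_rows)

print(f"conventional: ~{conventional_bytes / 1e6:.1f} MB shipped")
print(f"smart scan:   ~{smart_bytes / 1e6:.2f} MB shipped")
```

Same disks, same scan; the difference is where the filtering happens and how many bytes cross the wire.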

It Will Be Interesting Indeed

Chuck concludes his second post with:

“The real focus here should be software, not hardware.”

Personally, I think the focus should be on solutions that perform and scale, and I think the HP Oracle Exadata Storage Server is a great solution for Oracle data warehouses that require large amounts of I/O bandwidth.

Ending On A Good Note

While many of Chuck’s comments do not seem well researched, I will say that having a conventional mid-range storage array that can deliver 3 GB/s is not a bad thing at all. I’ve seen many Oracle customers with only a fraction of that, and there are probably some small data warehouses out there that run fine with 3 GB/s of I/O bandwidth. However, I think those would run even faster on Oracle Exadata, and I’ve never had a customer complain about queries running too fast.

15 comments

  1. Chuck Hollis

    Hi

    Thanks for the commentary.

    A small correction to your logic — when we scale out with larger DWs, we use multiple smaller arrays rather than a single big one. Both the Exadata and the external disk array approach deliver linear scaling of bandwidth.

    It then gets down to which does the best job for the money, yes?

    Both Exadata and external arrays support RAID 1 configurations if needed. My argument about space-saving approaches is that customers should have a choice as to their storage requirements, something not offered with Exadata.

    Additionally, there’s only one choice of storage medium on the Exadata. No FC in different sizes and speeds, no choice of SATA drives, no enterprise flash — just a single disk choice, period.

    That means no potential for tiering, saving money, archival versions of the database.

    Look, guys, the economic picture isn’t exactly rosy out there, is it? And, trust me, customers will be looking to make choices to save some money here and there. At the very least, you should offer them the choice.

    And don’t get me started with things like the need for snaps, etc. in these environments.

    As far as the ‘brain vs. brawn’ handwaving, since you’re running your software on a generic Linux environment on commodity x64 servers, why all the fuss? I’m having a hard time believing that any “secret software sauce” can only run on a specific flavor of obviously generic components.

    The customers who I talk to who really, really need speed from their DW environment have already moved on to specialists in the field, so I wonder where the Exadata fits in?

    It doesn’t appear to be fastest, nor the cheapest, nor the most functional, nor the most highly-available solution out there. It does, however, run Oracle software.

    Good luck out there, guys, defending this beast.

    It should be interesting to see how it all turns out in a year or so.

    – Chuck

  2. Chuck Hollis

    Oh, BTW, my bad on the comments.

    TypePad switched me to a new platform, and changed my defaults, and I wasn’t paying attention.

    Problem fixed now — thanks!

  3. Kevin Closson

    “Additionally, there’s only one choice of storage medium on the Exadata. No FC in different sizes and speeds, no choice of SATA drives, no enterprise flash — just a single disk choice, period.”

    Chuck,

    Are you recommending we consider offering SATA and Fibre Channel?

  4. Greg Rahn

    @Chuck

    Let me correct your small correction:
    There are no statements in that post saying EMC cannot scale out with multiple smaller arrays. In fact, I have done so on projects already. What I am saying is that it takes much more, from both the storage tier and the database tier, to deliver the same I/O throughput rates, not to mention more connectivity between the database tier and the storage tier, because all the data has to travel back to the database hosts. So yes, it certainly can be done, and yes, it will certainly take more hardware to do it.

    Let’s toss out a scan rate number: 15 GB/s. That is 15 HP Oracle Exadata Storage Servers using 180 SAS drives, or 5 CX4-960 arrays, each with 6 DAEs, for a total of 450 FC drives with 2+1 RAID 5. Sound right? In terms of rack units it is 30U for Exadata versus 120U for the CX4-960 arrays, so the CX4-960 arrays take 4X the data center space. So again, it certainly can be done.

    Now I have to ding you on your comment of “no choice of SATA drives”. There are 1 TB SATA drives offered. Please read the Oracle Exadata datasheet. And BTW, Exadata can scan those at twice the rate of the CX4’s 2+1 RAID 5 15k FC drives. So yes, there is “potential for tiering, saving money, archival versions of the database”.

    The phrase is “brains and brawn”, and you get both with Oracle Exadata. The Oracle Exadata Storage Server Software does run on Oracle Enterprise Linux on commodity x64 servers, currently the HP ProLiant DL180 G5 Storage Server. The DL180 is used because it has a good balance of disk bandwidth and CPU power, so customers get the best results from it. I see no reason that should be looked on as a negative. Apple’s OS X runs only on the hardware that Apple sells, even though it is the exact same industry-standard hardware that Windows runs on (and, in fact, does). Being a MacBook Pro owner, I have never once thought of it as a negative that OS X only runs on Apple hardware. I’ve only thought to myself: why didn’t I switch to OS X sooner?

  5. Pingback: Log Buffer #121: a Carnival of the Vanities for DBAs
  6. Chuck Hollis

    So, by now, near as I can tell, you’ve hit the brick wall of customer skepticism.

    You may have the hearts and minds of a subset of the Oracle DBA world, but this approach is not winning any fans with the crew that’s responsible for the infrastructure.

    With the customers I’ve spoken to, it’s boiled down to two key objections:

    1 — Horribly inefficient use of hardware resources, which translates into multiple costs: acquisition costs, operational costs, etc. You can discount the upfront costs, but I doubt that you can cover maintenance, floorspace, power, cooling, etc.

    2 — Can’t be operationalized in a consistent fashion. IT infrastructure guys like to have one way of doing things: management, backup, DR, provisioning, etc. With this approach, you’re foisting yet another stovepipe on them, and they don’t like it. They’d much prefer to have standard infrastructure for DW/BI that they can manage like the rest of their landscape.

    If you could somehow make enough of a case that the benefits outweigh the pains, maybe you’d stand a chance.

    But you haven’t — most of what’s out there is hand-waving. And I see customer after customer tell me they’ve taken a close look at this approach, and said “no thanks”.

    I’m sure a few unlucky souls will start down this path, and we’ll be reading about these few in breathless press releases. But, right now, it looks like this one has stalled on the launch pad, so to speak.

    Best regards!

    – Chuck

  7. Greg Rahn

    Welcome back, Chuck. I see you have brought nothing but marketing hot air and propaganda to the discussion, spreading fear, uncertainty and doubt (FUD) about a product you have apparently read little about.

    I find it a bit entertaining that you call Exadata a “horribly inefficient use of hardware resources” and say that floorspace cannot be discounted. On what technical information is that based? Given that Exadata scans HDDs 2.5x faster (85 MB/s vs. 33 MB/s) than an EMC CX4 for a database workload, it would seem that it is the CX4 that is “horribly inefficient”.

    But let’s say, for argument’s sake, that a CX4 could be driven to scan HDDs at 85 MB/s, the same rate as Oracle Exadata. If we compare storage that can be scanned at 3 GB/s (based on the aforementioned EMC paper, the CX4-960 array head is capable of this), one could use 3 HP Oracle Exadata Storage Servers containing a total of 36 HDDs and 6 quad-core Intel Xeon processors, taking up 6 rack units, or a CX4-960 array head with 4 quad-core Intel processors (2 in each SP/storage processor) and 3 DAEs with a total of 36 HDDs, taking up 15 rack units (2U SPS, 4U SPE, and 3U for each DAE). So even assuming the same number of HDDs and the same scan rate, the EMC CX4 solution takes up much more space (9U more). By the numbers, Exadata is more efficient and takes up less space.

    BTW Chuck, if you (or EMC engineers) have performance numbers showing the EMC CX4-960 array head can push more than 3 GB/s, do let me know. That would be a good move up from the 1.2 GB/s that the CX3 array heads max out at, which is not even enough to saturate the 4x4Gb FCP they are plumbed to the hosts with.

    There will probably be some IT infrastructure guys (SAN admins) who balk at Exadata, just as they balked at ASM, but I think the business people will be very pleased when their queries run 10x to 100x faster (or more) on Exadata, and the SAN admins will eventually come around. Business has a way of driving technology. Just by pure data bandwidth capacity, a single HP Oracle Database Machine probably has more than 4x the I/O bandwidth (14 GB/s to the DW database grid) of most SAN installations out there. I haven’t seen many SAN configurations that push more than 2 GB/s, let alone 3.5 GB/s. If there are such customers, feel free to showcase them. May I also suggest you read the beta customers’ reports about their half HP Oracle Database Machine experiences; two of them use EMC storage today. With only 6 HP Oracle Exadata Storage Servers they are seeing up to 72x performance gains. Hard to argue with those numbers.

    If you come across any customers that are doubtful, be sure to tell them to see their Oracle account rep about doing a POC on Exadata, but warn them: it may be a bit like driving the Ariel Atom (2m:50s) compared to their current setup, which is probably connected to a SAN.

  8. Chuck Hollis

    Hi — I’m having our engineering team get precise numbers as to actual scan rates achievable with current CX4 technology. They laughed at your 33MB number — shouldn’t let software people get near hardware, they said …

    I said “play fair” and don’t include flash drives. I’ll be back to you soon with the data.

    Oh, BTW, just be aware that the scan rates are very different for Oracle compared with, say, something that’s optimized for the role, such as Vertica, Greenplum, ParAccel, DATAllegro et al. Oracle keeps getting server-limited; the others can go much farther/faster.

    Be back soon with the goodies!

    – Chuck

  9. Chuck Hollis

    Go re-read your white paper again.

    The passage you obviously picked up on was a theoretical discussion OF A SINGLE DRIVE, not an array.

    To make matters even more interesting, you then do all this nice math on a bogus misinterpretation. Utter rubbish.

    Stick to software, guys …

  10. Greg Rahn

    The 33MB/s number comes from an EMC white paper (page 14):

    For the thin LUN (two-way striped), it takes 3 ms to move the head, and 12 ms to move the 512 KB worth of bits. The net MB/s from that drive is therefore 33 MB/s (1000 x 512 KB /(3+12)).

    It is not my number. If I have taken it out of context, do provide technical clarity.

    If my math is incorrect, do provide the correct equations, calculations, assumptions, etc. None of the array spec sheets list a maximum observable I/O bandwidth number. They used to contain maximum throughput-from-cache numbers, which were great marketing numbers but horrible engineering numbers.
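    For reference, here is the quoted calculation spelled out (a quick sketch of the white paper’s example arithmetic):

    ```python
    # 512 KB transferred per I/O: ~3 ms to move the head, ~12 ms to move
    # the bits, per the EMC white paper's thin-LUN example.
    seek_ms, transfer_ms, io_kb = 3, 12, 512

    mb_per_s = (io_kb / 1024) / ((seek_ms + transfer_ms) / 1000)
    print(f"~{mb_per_s:.1f} MB/s per drive")  # ~33.3 MB/s
    ```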

    Based on my information, DATAllegro v3 was able to max out a CX3 head at 1200 MB/s with just 12 SATA drives (plus 1 spare). Greenplum’s numbers might be interesting, but Vertica and ParAccel are probably less so, as they are more memory-based column databases where drive scan throughput matters much less.

  11. Greg Rahn

    P.S. Flash drives will have no bearing on how much data the array head can deliver to a host. The SPs still have to move the data no matter what the media type.

  12. Chuck Hollis

    Hi. You misread the white paper. You can see by reading it that it was an example discussion of how to do the math, not actual numbers.

    Go read it carefully, please. I’ll be waiting for your comments …

    So, here are some ACTUAL NUMBERS you may want to go chew on:

    A given disk drive sitting in a CX (or any other well-designed storage array) will deliver between 85MB/sec (SATA) and up to 150MB/sec (15k FC). Flash, considerably more, but that’s a separate discussion.

    Add drives into a controller until you hit the bandwidth limit of the controller.

    Here are some rules of thumb to help you:

    AX4-5 is ~800MB/sec
    CX4-240 is ~1.1GB/sec
    CX4-960 is ~2.8-3GB/sec

    Now, let me please point out — this is without the overhead of disk mirroring (mandatory in your approach), or being forced to use SATA drives, or only being able to get 6 usable drives per server. Maybe you can deliver great performance, but at what cost?

    So, mentally I’m seeing the closest substitute here in the EMC portfolio being an AX4-5 with about 12 SATA drives (let’s add two for parity, shall we?) — so 14 drives delivering approx. ~800MB/sec.

    Your approach would require two bricks with 12 drives each to equal the usable capacity (wasting 12 minus 2, or 10, drives in the process), and the cost of two beefy HP servers as compared to a single low-cost AX4-5 controller. We’re talking several thousands of dollars difference at the “brick” level — and, by the time we get really big, we’re talking hundreds of thousands of dollars difference. Not to mention power, cooling, etc.

    I’d suggest taking a look at what one of these looks like — it sits in a rack just like your server-based storage brick.

    Can your two mirrored bricks deliver ~800 MB/sec sustained sequential throughput? I’m just assuming that’s the case, because — if not — your approach gets even more expensive. Hint: your bottleneck will be either the HP I/O card in the server or the IB connection.

    Dude, it’s not even a fair fight.

  13. Greg Rahn

    I think I know where the paths may have diverged…

    I initially wrote:

    In the white paper Deploying EMC CLARiiON CX4-960 for Data Warehouse/Decision Support System (DSS) Workloads EMC reports a drive scan rate [for a BI/DW workload] of 20 MB/s using 8+1 RAID-5 and 33 MB/s using a 2+1 RAID-5 LUN configuration. Oracle Exadata delivers drive scan rates around 85 MB/s, a difference of 2.5X to 4.25X.

    I was also (correctly) asserting that the CX4-960 head was only capable of 3GB/s. Note this passage in that same paper:

    Ninety drives of the 300 GB organized as “thin RAID” will deliver in theory 3000 MB/s (90 x 33.3 MB/s)

    Now, it seems way too coincidental that the array head has a max throughput of 3 GB/s and that is exactly the number the math works out to. The 33 MB/s figure would appear to be an observed number (by someone); I also recognize that FC-AL drives are capable of more, but for whatever reason (a discussion for another day) the workload didn’t exhibit higher rates.

    On to the hardware comparison…
    Each HP Oracle Exadata Storage Server can scan its 12 SAS drives at 1 GB/s (technically a bit more, but 1 GB/s is a nice round marketing number), 25% more than the AX4-5 can do. Exadata scans the SATA drives at 750 MB/s. The IB fabric can easily deliver data to the database grid at those rates as well; however, in nearly all cases the data sent to the database grid is much less, because Exadata sends only the rows and columns required for the query. So unless you query every column of every row, the data sent will be much, much less than what is read from disk. This dramatically reduces the amount of data sent from the storage to the DB grid and the number of rows the DB grid has to deal with, so compared to FC-attached storage, less DB CPU is required to do the same work. This is where “smart scans” and “query processing closer to the data” come into play, and it is what differentiates Exadata from other storage.

  14. Pingback: StorageRap: Props to my Sto-blog homeys
  15. Florin M.

    Hi Guys,

    Well, I have to tell you that this is a very important post for both sides.
    In my opinion:
    - Oracle is very good at database technologies (better than EMC)
    - EMC is very good at storage technologies (better than Oracle)
    But let’s look at the whole picture: Oracle Exadata is a solution for Oracle software suites and, as I understand it, delivers better results for Oracle Database performance than “traditional” storage like the CX4 or DMX3, or IBM’s DS5300 or DS8300.
    So where is the issue? I can’t use Exadata for my file server, I can’t boot Red Hat Linux from it, and so on. Currently Exadata is a response to the Oracle market (a big market, by the way) for better performance at a lower price, without too much noise around tuning the storage, OS, and database to get that performance.
    Maybe EMC can get better results with a specific setup, but to get those results they spend 5-15 days or more tuning the whole system (storage, FC switches, OS, database, etc.), whereas I might get a similar result from a 2-3 day installation of Exadata, which is more economical.
    The performance and reliability of the application is what everyone looks at in the end. And more and more, at lower prices.

    So what is storage?
    - Hardware and
    - Software
    From my perspective, the software is what gives storage its value right now, in 2009 (starting from the controller OS, replication, snaps, etc.).
    Coming back to Exadata, I haven’t seen any documents on disaster recovery, performance tools, update mechanisms, security, and the like, areas where I think Oracle Exadata has a very big gap against the competition.

    Let’s see what comes next. XIV? What will IBM do with it, and what will EMC develop from its better understanding of the data stored in its disk systems?

    Best regards,
    Florin
