Partway Researched With A Chance Of FUD

I tend to keep the content of this blog fairly technical and engineering-focused, but every now and then I have to venture off and do an editorial post.  Recently some of the ParAccel management decided to fire up the FUD machine on the ParAccel blog and take aim at Oracle’s Exadata, making the following claims:

“There are 12 SAS disks in the storage server with a speed of about 75 MB/s [The SUN Oracle Exadata Storage Server datasheet claims 125 MB/s but we think that is far-fetched.]” -Rick Glick, Vice President of Technology and Architecture (link)

“We stand by the 75MB/sec as a conservative, reliable number. We see higher numbers in disk tests, but never anywhere near 125MB/sec.” -Barry Zane, Chief Technology Officer (link)

Far Fetched Or Fact?

As a database performance engineer, I strive to be extremely detailed and well researched with my work. Clearly, these comments from Rick and Barry were not well researched as is evident from information publicly available on the Internet.

The first bit of documentation I would research before making such comments would be the hard disk drive specification sheet. The 12 drives in the Exadata Storage Server, a Sun Fire X4275, are 3.5-inch 15K RPM SAS 2.0 6Gb/sec 600GB drives. Looking at the drive spec sheet, it clearly states that the sustained sequential read is 122 MB/sec (at ID) to 204 MB/sec (at OD) [that’s Inner Diameter & Outer Diameter]. Seems to me that Oracle’s claim of 1500MB/s per Exadata Storage Server (125MB/s for each of the 12 SAS drives) is certainly between 122MB/s and 204MB/s.
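The arithmetic behind that comparison is short enough to check directly. Here is a minimal Python sketch that uses only the figures quoted above; the variable names are mine and purely illustrative.

```python
# Figures quoted in this post; the names are illustrative, not from any datasheet.
DRIVES_PER_SERVER = 12
SPEC_MIN_MB_S = 122         # sustained sequential read at the inner diameter
SPEC_MAX_MB_S = 204         # sustained sequential read at the outer diameter
CLAIMED_SERVER_MB_S = 1500  # Oracle's per-server claim
PARACCEL_MB_S = 75          # ParAccel's per-drive figure

# Per-drive rate implied by the per-server claim.
per_drive = CLAIMED_SERVER_MB_S / DRIVES_PER_SERVER
within_spec = SPEC_MIN_MB_S <= per_drive <= SPEC_MAX_MB_S

# Per-server rate implied by ParAccel's per-drive figure.
paraccel_server = PARACCEL_MB_S * DRIVES_PER_SERVER

print(f"claimed per-drive rate: {per_drive:.0f} MB/s (within spec range: {within_spec})")
print(f"per-server rate at {PARACCEL_MB_S} MB/s per drive: {paraccel_server} MB/s")
```

The 125MB/s per drive implied by the datasheet sits just above the slowest part of the platter, while 75MB/s per drive would put the whole server at 900MB/s.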

Now granted, one might think that vendors overstate their performance claims, so it may be worthwhile to search the Internet for a third-party evaluation of this hard disk. I went to a fairly well known Internet search engine to try to find more information using a highly sophisticated and complex set of search keywords.  To my astonishment, there at the top of the search results page was a write-up by a third party. I would encourage reading the entire article, but if you want to just skip to page 5 [Benchmarks – HD Tune Pro] you will be presented with data that shows the minimum (120MB/s), average (167MB/s) and maximum (200MB/s) read throughput for sequential read tests performed by the author for the hard disk drive in dispute. It looks to me like those numbers are completely in line with the Sun spec sheet – no exaggeration going on here. At this point there should be exactly zero doubt that the drives themselves, with the proper SAS controller, are physically capable of read rates of 125MB/s and more.

Stand By Or Sit Down?

Interestingly enough, after both Kevin Closson and I commented, calling out this ill-researched assertion on the physics of HDDs, Barry Zane responded:

As I see it, there are three possibilities:

  1. Disk vendors are overly optimistic in their continuous sequential read rates.
  2. The newer class of SAS2 compatible 15Krpm drives and controllers are faster than the older generation we’ve measured.
  3. Our disk access patterns are not getting all the available performance.

Let’s drill into each of these possibilities:

  1. Perhaps vendors are overly optimistic, but how overly optimistic could they possibly be? I mean, really, 125MB/s is easily between the spec sheet rates of 122MB/s and 204MB/s. Truly 75MB/s is a lowball number for these drives. Even the SAS drives in Exadata V1 deliver more than 75MB/s per drive, and the HDD is not the limiting factor in the scan throughput (a good understanding of the hardware components should lead you to what is). Even the Western Digital 300GB 10K RPM VelociRaptor disk drive has benchmarks that show a maximum sequential data transfer rate of more than 120MB/s and a sustained minimum of 75MB/s even on the innermost part of the platter, and that is a SATA drive commonly used in PCs!
  2. Barry states that ParAccel has neither experience nor metrics (measurements) with these drives, or seemingly any drives like them, yet Barry calls “75MB/sec as a conservative, reliable number”.  Just how reliable a number can it possibly be when you have exactly zero data points and zero experience with the HDDs in dispute?  Is this a debate that can be won by strength of personality or does it actually require data, numbers and facts?
  3. Perhaps the ParAccel database has disk access patterns that cannot drive the scan rates that Exadata can, but should one assert that because the ParAccel database may not drive that I/O rate, Exadata can’t either, even when said rate is within the realm of physical capability? I certainly would think not.  Not unless the intention is simply to promote FUD.

So, as I see it, there are exactly two possibilities: either one has technical knowledge of what they are talking about (and actual data/facts to support it), or they do not and are just making things up.  At this point I think the answer is quite clear in this situation: Rick and Barry had no data to support their (incorrect) assertions.

And The Truth Shall Be Revealed

Three weeks after Barry’s “three possibilities” comment, Barry reveals the real truth:

…we [ParAccel] have gotten a number of newer servers with SAS2 drives…[and] the newer generation of disk drives are faster than my experience…Exadata’s claim of 1500MB/sec per server seems completely reasonable…My apologies for any confusion created.

As it has come to pass, my assertion that ParAccel had absolutely no experience and thus no data to support their claims is validated (not that I really had any doubts).  Spreading FUD generally does cause unnecessary confusion, but then again, that is usually the intention.  I would expect such nonsense from folks with marketing in their title, but I hold a higher bar for people with technology in their titles.  This was a simple debate about physical disk drive characteristics (and not software) and that is something anyone could get concrete factual data on (assuming they actually take the time and effort).

And Isn’t It Ironic… Don’t You Think?

The same day I read Barry’s “truth comment” I also read Jerome Pineau’s blog post on social media marketing.  I could not help but recognize (and laugh about) the irony of the situation.  Jerome lists several tips on being successful in SMM and the first two really stood out to me:

  1. Do not profess expertise on topics you know little about. Eventually, it will show.
  2. Always remain honest. Never lie. Your most important asset is credibility. You can fix almost any mistake except credibility damage.

Truly, truly ironic…

8 comments

  1. chet

    Since I have zero expertise on the top 90% of your post, I won’t say anything. :)

    And Isn’t It Ironic…

    I’m not sure why people don’t get this yet. If you do offer an opinion on something you should clearly state, IANAExpert or something along those lines.

    I’ve never had a problem with not knowing something, but I tend to keep my mouth shut about it.

    BTW, I think you should do more “editorial” type pieces…very well done.

  2. Greg Rahn

    @Chet
    Unfortunately in the data warehousing arena, it seems common (and seemingly acceptable) to make up some FUD and sling it at some other vendor. Much of the stuff is just so silly, it isn’t worth an engineer’s time to blast holes in it. Take for instance these statements from a “white paper” authored by Richard Burns, Senior Consultant at Teradata that was published around this time last year relating to Exadata V1:

    The enterprise class SAS disks used by Exadata, rotating at 15K RPM, are capable of delivering data to requestors at about 80 MBps. To maximize I/O throughput, Exadata reads data off disk in large chunks. Exadata defaults to 8MB data blocks, with an option for 4MB blocks. At an 8MB block size, ten concurrent I/Os saturate a drive, even without allowances for seek time.

    Out of those four sentences, three of them contain inaccurate (wrong) assertions. The most entertaining bit is the last sentence. Richard Burns apparently thinks that the throughput capacity of a HDD (in MB/s) divided by the I/O size results in the number of concurrent I/Os…but he would be very wrong. This is, however, some very creative math. I guess that is the difference between a consultant and an engineer. Engineers usually write papers for SIGMOD, VLDB, or similar and consultants write “white paper” FUD.
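To make the unit mismatch concrete, here is a minimal Python sketch. Dividing throughput by I/O size gives a rate (I/Os per second), not a count of concurrent I/Os. By Little's law, the number of I/Os in flight is that rate multiplied by the time each I/O spends in the system. The 80MB/s and 8MB figures come from the quoted paper; the function names and the latency values are illustrative assumptions, not measurements.

```python
def ios_per_second(throughput_mb_s: float, io_size_mb: float) -> float:
    """Throughput divided by I/O size is a rate, in I/Os per second."""
    return throughput_mb_s / io_size_mb


def ios_in_flight(rate_per_s: float, latency_s: float) -> float:
    """Little's law: average number in the system = arrival rate * time in system."""
    return rate_per_s * latency_s


# Figures from the quoted white paper.
rate = ios_per_second(throughput_mb_s=80, io_size_mb=8)
print(f"{rate:.0f} I/Os per second")

# Assumed per-I/O latencies, for illustration only.
for latency_s in (0.1, 0.5, 1.0):
    print(f"latency {latency_s:.1f} s -> {ios_in_flight(rate, latency_s):.0f} I/O(s) in flight")
```

In this idealized model, an 8MB read at 80MB/s takes 0.1 seconds, so a single outstanding I/O already accounts for the full rate. The "ten" in the paper is I/Os per second, not I/Os in flight.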

  3. chet

    can I do an internship with you or something? do you have an opening as a mentor?

    Now you’re going to make me learn hardware. Like I don’t have enough to do? :)

  4. Barry Zane

    Greg, I’m not sure I follow what your point is. As described in the full response, for any database query that is disk-bound, faster disks will improve performance proportionally for any database. So, if one database system is 10X faster than another database system, then it is still 10X faster if each leverages the continuous improvements made in hardware (e.g. disks, CPUs, controllers, etc). I’m not naming names, though. 

    Interesting that Teradata’s white paper also quotes 80 MB/sec. This is a company that has sold billions of dollars worth of hardware systems with disk drives. It reinforces that their experience matches my experience, but both are now eclipsed by the newer drives. Additionally, it wouldn’t be in their interest to downplay the performance of the drives they rely on either. We should all thank Seagate and the other drive vendors for their improvements. These are the class of drives our customers deploy.

    However, it is certainly unclear what he means by “saturate”. Certainly, if a drive delivers 125MB/sec and there are 5 concurrent requests, then each request will get 25MB/sec, more or less. On the other hand, combining that with the point of 8MB blocks is a little confusing unless he’s saying that big reads allow the percentage of time spent seeking to be ignored, which is true.

    Thanks for posting the full link. I invite any reader to go to it. Good, bad or indifferent, you probably should have included a link to the Teradata document to allow the reader to draw their own conclusions – http://www.teradata.com/t/assets/0/206/276/87e1747c-7ccf-4be3-b812-1dba03dce5d7.pdf

  5. Greg Rahn

    @Barry

    Are you referring to the point of this blog post? If so, it baffles me that you don’t (or choose not to) follow my point, but let me spell it out for you as simply as I can: Rick made and you supported incorrect assertions, pushed them as fact, without any research or data. If you don’t have either experience or data on something, you certainly should not be blogging on it unless you really have a desire for zero credibility. Take the time, do the research, get some numbers from the lab and then speak/blog intelligently.

    You specifically mention:

    “We see higher numbers [than 75MB/s] in disk tests, but never anywhere near 125MB/sec.”

    Perhaps you would like to elaborate on how ParAccel runs their disk tests (and on what hardware). To demonstrate how poorly researched your comment is, I’ve run a simple disk test (output below) on a drive in an Exadata V1 HP ProLiant DL180 G5 server, which is able to perform at over 170MB/s, almost 100MB/s more than you have claimed to have observed in tests! So it really raises the question: Why are your numbers so off base? Granted, application performance won’t achieve what a micro benchmark can, but still… It’s simply a matter of physics, not fiction.

    There really is nothing interesting in that “white paper” from Teradata other than the amount of rubbish and FUD it contains. Papers are written by people, not companies, so the author’s experience and knowledge have little to nothing to do with how many dollars of hardware Teradata has sold. Have you any idea what I/O rates a Teradata system gets per drive? They’d be lucky to get 20MB/s with their I/O patterns (around 64K and random). Curt Monash reports “15 MB/second on their fastest disks” in this post. Why do you think that the Teradata 5555 system has 100 146GB (and maybe now the 300GB) 15K RPM FCAL drives per node (a 3/5 clique has 3 two-socket quad-core Harpertown nodes [plus 1 spare] with 5 arrays, each with 60 HDDs in it)? Perhaps it would now seem obvious why Teradata is excited about SSD. With I/O patterns like that I would be too!
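A rough model shows why small random reads land so far below the sequential rate. The Python sketch below estimates per-drive throughput as I/O size divided by service time, where service time is average seek plus average rotational latency plus transfer time. The 15K RPM and 64K figures come from the comment above. The 3.5 ms average seek and the 125MB/s media rate are assumptions for illustration, and the model ignores queueing, caching and command reordering.

```python
def random_read_mb_s(io_bytes: int,
                     rpm: int = 15000,
                     avg_seek_s: float = 0.0035,   # assumed average seek time
                     media_mb_s: float = 125.0) -> float:  # assumed media transfer rate
    """Estimate per-drive throughput for one-at-a-time random reads."""
    rotational_s = 0.5 * 60.0 / rpm             # half a revolution on average
    transfer_s = io_bytes / (media_mb_s * 1e6)  # time to move the data itself
    service_s = avg_seek_s + rotational_s + transfer_s
    return io_bytes / service_s / 1e6           # decimal MB/s


for label, size in (("64KB", 64 * 1024), ("1MB", 1024 * 1024), ("8MB", 8 * 1024 * 1024)):
    print(f"{label:>4} random reads: {random_read_mb_s(size):6.1f} MB/s")
```

Under these assumptions the 64KB case comes out at roughly 11MB/s, the same order of magnitude as the 15 to 20MB/s figures mentioned above, while the estimate climbs toward the media rate as the I/O size grows.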

    I digress…I’ve probably done more than enough technical research for you at this point…

    [root@exadata-v1 ~]# dd if=/dev/cciss/c0d11 of=/dev/null bs=1M iflag=direct count=50000
    50000+0 records in
    50000+0 records out
    52428800000 bytes (52 GB) copied, 302.569 seconds, 173 MB/s
    
    [root@exadata-v1 ~]# collectl -sD | grep c0d11
    # DISK STATISTICS (/sec)
    #           Pct
    #Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
    cciss/c0d11 170604      0 1499  114       0      0    0    0     113     5     3      0   99
    cciss/c0d11 173180      0 1522  114       0      0    0    0     113     5     3      0   99
    cciss/c0d11 172684      0 1518  114       0      0    0    0     113     5     3      0   99
    cciss/c0d11 171488      0 1507  114       0      0    0    0     113     5     3      0   99
    cciss/c0d11 172156      0 1513  114       0      0    0    0     113     5     3      0   98
    cciss/c0d11 172932      0 1520  114       0      0    0    0     113     5     3      0   99
    cciss/c0d11 172700      0 1517  114       0      0    0    0     113     5     3      0   99
    cciss/c0d11 170512      0 1499  114       0      0    0    0     113     5     3      0   98
    cciss/c0d11 172776      0 1518  114       0      0    0    0     113     5     3      0   99
    cciss/c0d11 173584      0 1526  114       0      0    0    0     113     5     3      0   99
    cciss/c0d11 170356      0 1497  114       0      0    0    0     113     5     3      0   99
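
The two outputs above can be cross-checked against each other with a little arithmetic. A minimal Python sketch, using only numbers copied from the dd summary line and the read columns (KBytes and IOs) of the collectl samples; the variable names are illustrative.

```python
# From the dd summary line above.
bytes_copied = 52_428_800_000
elapsed_s = 302.569
dd_mb_s = bytes_copied / elapsed_s / 1e6  # dd reports decimal megabytes

# (read KBytes/sec, read IOs/sec) pairs from the collectl samples above.
samples = [
    (170604, 1499), (173180, 1522), (172684, 1518), (171488, 1507),
    (172156, 1513), (172932, 1520), (172700, 1517), (170512, 1499),
    (172776, 1518), (173584, 1526), (170356, 1497),
]

avg_kb_s = sum(kb for kb, _ in samples) / len(samples)
avg_kb_per_io = sum(kb for kb, _ in samples) / sum(ios for _, ios in samples)

print(f"dd average:        {dd_mb_s:.1f} MB/s")
print(f"collectl average:  {avg_kb_s:.0f} KB/s")
print(f"average read size: {avg_kb_per_io:.1f} KB per I/O")
```

The dd figure works out to about 173MB/s, the collectl samples average about 172,000 KB/s, and the average read size comes out just under 114 KB per I/O, in line with the Size column.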
    
  6. Kevin Closson

    @Barry Zane

    Barry and company stated that our knowledge (not belief) of how 3.5-inch 600GB 15K RPM SAS drives perform was in error. Barry further said:

    “If we see in the neighborhood of 125GB/sec from the newer drives, I’m buying the beer. :)”

    Barry, I feel you got a Get Out of Jail Free card and you do owe me a beer. You folks should not have closed down responses on the Partway There post especially after admitting that testing finally had proven to you what everyone else already knew.

    Bad mojo. But I still want my beer.

    The views expressed in this comment are my own and do not necessarily reflect the views of Oracle. The views and opinions expressed by others on this comment thread are theirs, not mine.
