Oracle Exadata Storage Server and the HP Oracle Database Machine

Unless you have been living under a rock, you know that Larry Ellison announced the Oracle Exadata Storage Server and the HP Oracle Database Machine at Oracle OpenWorld 2008. There seems to be quite a bit of interest and excitement about the product, and I for one am extremely excited about it, especially after having used it. If you were an OOW attendee, hopefully you were able to see the HP Oracle Database Machine live demo in the Moscone North lobby. Kevin Closson and I were both working the live demo Thursday morning, and Doug Burns snapped a few photos of Kevin and me doing the demo.

HP Oracle Database Machine Demos

In order to demonstrate Oracle Exadata, we had an HP Oracle Database Machine set up with some live demos. This Database Machine was split into two parts: the first had two Oracle database servers and two Oracle Exadata servers, and the second had six Oracle database servers and 12 Oracle Exadata servers. A table scan query was started on the two-cell configuration, and the same query was then started on the 12-cell configuration. The scan rates were displayed on the screen, and one could see that each Exadata cell was scanning at around 1 GB/s, for a total aggregate of around 14 GB/s. Not too bad for a single 42U rack of gear. This demo also showed that table scan time scales linearly with the number of Exadata cells: 60 seconds on two cells vs. 10 seconds on 12. With six times the number of Exadata cells, the table scan time was cut by a factor of six.
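
As a rough check of those numbers, here is a minimal Python sketch of the scaling arithmetic. It assumes every cell scans at the same rate and that the 60-second figure belongs to the two-cell configuration; the function names are purely illustrative.

```python
# Figures quoted in the demo description; everything else is illustrative.
PER_CELL_GB_S = 1.0            # approximate scan rate of one Exadata cell
SMALL_CELLS, LARGE_CELLS = 2, 12
SMALL_SCAN_SECONDS = 60.0      # assumed to be the two-cell scan time


def aggregate_rate(cells, per_cell=PER_CELL_GB_S):
    """Aggregate scan rate in GB/s if every cell scans at the same rate."""
    return cells * per_cell


def scaled_scan_time(base_seconds, base_cells, new_cells):
    """Scan time for the same table if throughput scales linearly with cells."""
    return base_seconds * base_cells / new_cells


if __name__ == "__main__":
    print("whole rack, GB/s:", aggregate_rate(SMALL_CELLS + LARGE_CELLS))
    print("12-cell scan, s:",
          scaled_scan_time(SMALL_SCAN_SECONDS, SMALL_CELLS, LARGE_CELLS))
```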

The second live demo was to execute a query consisting of a four-table join (PRODUCTS, STORES, ORDERS, ORDER_ITEMS) with data based on that of one of the trial customers. The query was to find how many items whose name contained the string “chili sauce” were sold yesterday in four southwestern states. The ORDER_ITEMS table contained just under 2 billion rows for that day and the ORDERS table contained 130 million rows for the day. This query’s execution time was less than 20 seconds. The execution plan for this query was all table scans; no indexes were used.

When One HP Oracle Database Machine Is Not Enough

As a demonstration of the linear scalability of Oracle Exadata, a configuration of six (6) HP Oracle Database Machines, for a total of 84 Exadata cells, was assembled. We loaded 14 days’ worth of POS (point of sale) data onto one Database Machine and executed a query that full table scanned the entire 14 days. Another 14 days of data were loaded and a second Database Machine was added to the configuration. The query was run again, now against 28 days across two Database Machines. This process was repeated, loading 14 more days of data and adding another Database Machine each time, until 84 days were loaded across six Database Machines. As expected, all six executions of the query were nearly identical in execution time, demonstrating the scalability of the product. The amazing bit about all this was that with six Database Machines and 84 days of data (around 163 billion rows), the physical I/O scan rate was over 74 GB/s (266.4 TB/hour) sustained. To put that in perspective, it equates to scanning 1 TB of uncompressed data in just 13.5 seconds. In this case, Oracle’s compression was used, so the time to scan 1 TB of user data was just over 3 seconds. Now that is extreme performance!!!
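
The conversions in that paragraph can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, assuming decimal units (1 TB = 1000 GB); the compression bound is an inference from the "just over 3 seconds" figure, not a number stated by Oracle.

```python
SCAN_RATE_GB_S = 74.0   # sustained physical scan rate quoted above
GB_PER_TB = 1000.0      # decimal units, which the quoted figures imply

# 74 GB/s sustained for an hour, expressed in TB
tb_per_hour = SCAN_RATE_GB_S * 3600 / GB_PER_TB

# time to physically scan 1 TB of uncompressed data
seconds_per_tb = GB_PER_TB / SCAN_RATE_GB_S

# "just over 3 seconds" per TB of user data means the effective
# compression ratio is a little below this upper bound (an inference)
compression_ratio_upper_bound = seconds_per_tb / 3.0

if __name__ == "__main__":
    print(f"{tb_per_hour:.1f} TB/hour")
    print(f"{seconds_per_tb:.1f} s per uncompressed TB")
    print(f"compression ratio just under {compression_ratio_upper_bound:.1f}x")
```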

As I’m getting ready to post this, I see Kevin has beaten me to it. Man, that guy is an extreme blogging machine.

Initial Customer Experiences

Several Oracle customers had a 1/2 HP Oracle Database Machine* (see Kevin’s comments below) to do testing with their data and their workloads. These are the ones that were highlighted in Larry’s keynote.

M-Tel

  • Currently runs on two IBM P570s with EMC CX-30 storage
  • 4.5TB of Call Data Records
  • Exadata speedup: 10x to 72x (average 28x)
  • “Every query was faster on Exadata compared to our current systems. The smallest performance improvement was 10x and the biggest one was 72x.”

LGR Telecommunications

  • Currently runs on HP Superdome and XP24000 storage
  • 220TB of Call Data Records
  • “Call Data Records queries that used to run over 30 minutes now complete in under 1 minute. That’s extreme performance.”

CME Group

  • “Oracle Exadata outperforms anything we’ve tested to date by 10 to 15 times. This product flat out screams.”

Giant Eagle

  • Currently runs on IBM P570 (13 CPUs) and EMC CLARiiON and DMX storage
  • 5TB of retail data
  • Exadata speedup: 3x to 50x (average 16x)

30 comments

  1. Pingback: On SAGE and Oracle’s new 11g SQL Tuning Workshop Education Content « H.Tonguç Yılmaz - Oracle Blog
  2. Pingback: Oracle Database Machine performance and compression | DBMS2 -- DataBase Management System Services
  3. Kevin Closson

    Even though most of us on the Exadata team have routinely stated that the Beta participants received “a half rack”, doing so is not precise. The Beta participants received a configuration with 4 database servers and 6 Exadata Storage Servers. A production HP Oracle Database Machine has 8 database servers and 14 Exadata Storage Servers. Likewise the Exadata Storage Server Software was executing on Proliant DL185 hardware which is significantly less powerful than the production DL180 G5 hardware. So, less and less-powerful. Just FYI.

  4. Kevin Closson

    Eek, I have to apologize.

    I had wires crossed between Beta1 and Beta2 specifically the host hardware for Exadata Storage Server. So, while we did only ship them 6 Exadata cells (as opposed to the 7 of a true half rack) in the Beta2 program, we did not ship them Proliant DL185 hardware as that was the Beta1 platform. Sorry. Having said that, customers should be glad that the platform is the DL180 G5 because it is significantly more capable of handling Exadata Storage Server Software as a workload.

  5. Allan Nelson

    We have an Oracle E-Business suite database with a monstrous appendage in the form of 486 separate and distinct reports that the users can run anytime they please. There are also some large ad-hoc query loads and some miscellaneous cube building.

    With this setup it seems dreamy for a brute force approach. Will this machine run E-Business suite 11.5.10.1?

  6. Greg Rahn

    @Allan

    If E-Business suite 11.5.10.1 is certified with Oracle Database 11.1.0.7 + RAC then I guess yes.

    This setup allows for disk scan rates that normally would take quite a large amount of FC (fibre channel) storage.

  7. hrishy

    Hi Greg

    I read somewhere that this beast ships query results to the server rather than blocks, à la Netezza with its Snippet Processing Unit. Is that the reason Oracle is able to full scan tables with billions of records and yet have a response time of seconds?

    What’s the physics behind this?

    regards
    Hrishy

  8. Greg Rahn

    @hrishy

    You are correct. Exadata filters rows and columns. I know that Netezza filters at least rows. Not sure about columns.

    The physics is to do sequential I/O on lots of drives and be able to deliver it to the database host. The SAS drives can do sequential scans at around 85 MB/s.

    Netezza has 1 PPC core for every 1 7200 RPM drive. Exadata has 8 Intel Xeon cores for every 12 15,000 RPM drives, and that does not account for any Oracle database CPUs.
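
    The per-cell numbers in this reply can be tied together with a short Python sketch. It uses only the figures quoted here (12 drives and 8 cores per cell, about 85 MB/s sequential per drive); the variable names are illustrative.

```python
DRIVES_PER_CELL = 12      # 15,000 RPM SAS drives per Exadata cell
CORES_PER_CELL = 8        # Intel Xeon cores per Exadata cell
SEQ_MB_S_PER_DRIVE = 85   # approximate sequential scan rate per drive

# 12 drives scanning sequentially give roughly the 1 GB/s quoted per cell
cell_scan_mb_s = DRIVES_PER_CELL * SEQ_MB_S_PER_DRIVE

# storage-side CPU available per drive on each platform
exadata_cores_per_drive = CORES_PER_CELL / DRIVES_PER_CELL
netezza_cores_per_drive = 1 / 1   # one PPC core per 7200 RPM drive

if __name__ == "__main__":
    print(cell_scan_mb_s, "MB/s per cell")
    print(round(exadata_cores_per_drive, 2), "Exadata cores per drive")
    print(netezza_cores_per_drive, "Netezza cores per drive")
```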

  9. César Augusto de Oliveira

    Hi,
    Here in Brazil we have deployed an Oracle 9i/DMX3/AIX 5.3 solution that delivers up to 1.6 GB/s per storage array. This is around 60% faster than Oracle Exadata.
    We believe that Oracle Exadata can be a good choice only if it is cheaper.

    Thanks !

  10. Greg Rahn

    @César

    If your storage is delivering 1.6 GB/s, this is 60% faster than one Oracle Exadata Storage Server, which delivers 1 GB/s, but the minimum production deployment would be two (for redundancy), so the I/O scan rate would then be 2 GB/s, which would be 25% faster than your current system. This would also be using just 24 hard disk drives and 4U of rack space. How many drives does your system have and how many rack units of space does it take? My guess is that your system has at least twice that many drives and takes up much more space.

  11. César Augusto de Oliveira

    Hi,
    We have just one EMC DMX3 storage array delivering up to 1.6 GB/s to a data warehouse cellphone traffic system (our real maximum transfer rate is 1586.4 MB/s per storage array), but if we add another array and split our system, theoretically we can get 3.2 GB/s, which is around 60% faster than Oracle Exadata.
    Obviously we have more than 24 hard disks, but our storage is shared with other, non-DW database systems. When we planned this environment we created chunks of 8 MB, for example, and we changed disk parameters such as queue depth, max_transfer and others to support a disk-intensive approach.
    What is the real gain we could expect if we changed from EMC technology, with our disk distribution planned like Teradata AMPs, to Exadata or XIV-based technology?
    Remember that with EMC technology we have other good tools like BCV… Our analysis needs to show all the gains, not only performance!

    Thanks in advance …

  12. Greg Rahn

    @César

    Just as you would double your EMC DMX3, you can double and scale out Oracle Exadata, so your solution is not really 60% faster. Let’s make it a comparison of equals. With your current solution you can scan at 1.6 GB/s, two HP Oracle Exadata Storage Servers scan at 2 GB/s. If you double your solution to get 3.2 GB/s then use 4 HP Oracle Exadata Storage Servers to get 4 GB/s. In any case, it takes less hardware to get a faster and more scalable I/O throughput rate with Oracle Exadata.

    As I have written above, using 84 Oracle Exadata Storage Servers and 1008 hard disk drives, we achieved an I/O scan rate of 74 GB/s. How many EMC DMX3 cabinets and drives would it take to achieve that? Think about this: if one used 4 Gbit HBAs and could read at the wire speed of 400 MB/s, one would need 185 single-port 4 Gbit HBAs. Not very feasible, now is it?
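
    The HBA count follows directly from the quoted rates. A minimal Python sketch of that arithmetic, assuming decimal units (1 GB = 1000 MB) and ignoring any protocol overhead:

```python
import math

TARGET_GB_S = 74   # aggregate scan rate achieved with 84 cells
HBA_MB_S = 400     # wire speed assumed for one 4 Gbit FC HBA port
CELLS = 84
DRIVES = 1008

# single-port HBAs needed to carry 74 GB/s at wire speed
hbas_needed = math.ceil(TARGET_GB_S * 1000 / HBA_MB_S)

# sanity checks on the hardware counts
drives_per_cell = DRIVES // CELLS
mb_s_per_drive = TARGET_GB_S * 1000 / DRIVES

if __name__ == "__main__":
    print(hbas_needed, "HBAs")
    print(drives_per_cell, "drives per cell")
    print(f"{mb_s_per_drive:.1f} MB/s per drive")
```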

    Exadata allows I/O Resource Management (IORM) as just one of the tools to manage I/O. That is something that cannot be done with any other storage platform because it is specific to the iDB protocol Oracle wrote. Also, the ZDP protocol is very lightweight: it transfers 1 GB/s with only 2% CPU utilization. Fibre-attached storage requires much more CPU to move the same amount of data.

    If you have not, I would suggest reading the technical information found on
    http://www.oracle.com/technology/products/bi/db/exadata

  13. César Augusto de Oliveira

    Greg, we understood…
    What I am saying is that it is quite possible to beat Oracle Exadata, IBM XIV, or any other solution just by using best practices for parallel query, parallel scan I/O, good distribution of disks and controllers, and disk-intensive I/O planning (spindles, queue depth, read-ahead, release-behind, big chunks, outer-edge formatting, dedicated redo disks, etc.).
    Of course, if you are planning to perform 8 MB I/Os, that is why you get low CPU activity… All the effort is in disk, not CPU!
    Thinking about customers like us…

    The final question is: how expensive (money, DC size, good product support, backup speed and effort to migrate) are “N” Exadata servers versus “N/1.6” EMC DMX3 boxes?
    50 EMC DMX3 boxes or 80 Oracle Exadata? 5 EMC DMX3 boxes or 8 Oracle Exadata?

    Today EMC costs are high, but with the grid-like solutions that you are delivering, the final question is only price, and price is negotiated, provided you have the know-how to deploy a good computing environment!

    César Augusto de Oliveira

  14. Greg Rahn

    @César

    I’m in database performance engineering, not sales, so I can not help you with the price, but I can help you with the technical bits.

    Any solution can probably “beat” any other solution given enough time, money and engineering, but I do not think that is a discussion worth having. The point is that with Oracle Exadata you get more with less: more total I/O bandwidth, sequential I/O, 80 MB/s per-disk scan rates (using SAS drives), Smart Scans, less cabling between DB hosts and storage, etc. All that, and it is much simpler to get that performance. No need to worry about LUN configuration, stripe size/depth, RAID choice, multi-pathing, database file placement, etc.

    But don’t believe it because I told you…see the comments from the two customers in the Telco industry that used it with their data and their workloads and compare it to what they use in production for hardware today.

  15. César Augusto de Oliveira

    Greg, I have been a performance specialist on Oracle/Teradata/Unix/storage for 16 years too, so we know the concepts that Exadata/XIV-like technologies encapsulate.

    César Augusto de Oliveira
    Telecom Itália Móbile

  16. Greg Rahn

    @César

    My comment was not meant to challenge your experience; it was simply to mention some of the things that one has to deal with when managing storage. Not everyone does as good a job as it seems you have. I’m sure you’ve seen those systems. Bottom line is this: if you are happy with your performance, then I am as well.

    Thanks for your comments.

  17. Dharmendra

    Greg,

    A few questions on Exadata/HP DB Machine.

    1) Since only filtered rows are now returned to the DB instance, the Oracle instance would require a smaller SGA. Right?
    2) Can you please provide the list of operations that are and are not Smart Scan enabled? The technical paper on Exadata lists only two: predicate filtering and star transformation. So I am wondering whether operations like SORT and HASH/sort-merge joins are Smart Scan enabled.

    Thanks.

  18. Greg Rahn

    @Dharmendra
    1) Rows retrieved via Parallel Execution (PX) direct path reads are not read into the SGA, with or without Exadata storage; they are read into PGA memory. Smart Scan is not used with serial reads, so those behave the same with or without Exadata. So to answer your question: Smart Scans have no effect on what blocks are returned to the SGA.

    2) Exadata 11.1 Smart Scan operations include:

    – Restriction (filtering of rows)
    – Projection (filtering of columns)
    – Join Filters (Bloom Filters) commonly used in Fact/Dimension joins found in Star Schemas. This is different than Star Transformation.

    Other join operations such as HASH, SORT, SORT-MERGE, NESTED LOOP, etc. are done by the Oracle Database Grid.

  19. Dharmendra

    @Greg Rahn
    Thanks Greg!

    1) Is there a roadmap/plan for other join operations such as HASH and SORT to be included in the near future?

    2) Does Smart Scan require PQ to be enabled at the session/table level? If not, what DOP is used? What is the maximum number of concurrent operations that can run with the same DOP (before it decides to downgrade the DOP or queue the other operations)?

    Thanks

  20. Greg Rahn

    @Dharmendra

    1) I can’t give an official comment on that, but my guess would be no. Here is why: in order to do a HASH join in Exadata all the rows for a given hash key need to be on the same Exadata Storage Server, which is not the case today. This is commonly known as hash distribution (shared nothing databases generally do this), which is slightly different than hash partitioning. Until then, only the DB grid can do the HASH join as it has visibility into all the data because of the shared storage architecture.

    2) Smart Scan requires PQ. How one invokes PQ can vary: it can be done by altering the session, setting the parallel attribute on the table, or using a hint in the query. Exadata has no effect on, nor knowledge of, what DOP is used; it simply services the I/O requests.
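
    To illustrate the hash distribution point in answer 1, here is a toy Python model, not a description of how ASM or Exadata actually place data. It contrasts key-agnostic placement (a stand-in for striping) with placement by join key. The cell count, row layout and function names are all hypothetical.

```python
from collections import defaultdict

NUM_CELLS = 4  # hypothetical number of storage cells


def stripe_round_robin(rows, num_cells=NUM_CELLS):
    """Place rows on cells in arrival order, ignoring the join key."""
    cells = defaultdict(list)
    for i, row in enumerate(rows):
        cells[i % num_cells].append(row)
    return cells


def hash_distribute(rows, num_cells=NUM_CELLS):
    """Place rows by join key so equal keys always share a cell.

    key % num_cells stands in for a real hash function.
    """
    cells = defaultdict(list)
    for row in rows:
        cells[row["key"] % num_cells].append(row)
    return cells


def cells_holding_key(cells, key):
    """Return the sorted ids of cells that hold at least one row with this key."""
    return sorted(c for c, rows in cells.items()
                  if any(r["key"] == key for r in rows))


rows = [{"key": k, "val": i} for i, k in enumerate([7, 3, 7, 5, 7, 3, 5, 7])]

if __name__ == "__main__":
    # Striped: key 7 is spread over several cells, so no single cell can join it.
    print(cells_holding_key(stripe_round_robin(rows), 7))
    # Hash distributed: every row with key 7 lives on one cell.
    print(cells_holding_key(hash_distribute(rows), 7))
```

    With striped placement, only a layer that sees all cells (the DB grid) has every row for a given key, which is the reason given above for doing the HASH join there.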

  21. Dharmendra

    @Greg Rahn
    Thanks again Greg!

    Regarding 2, aren’t Smart Scans done at the Exadata layer?

    Can you tell me how this (made up) query would be parallelized at the Exadata level, i.e. which operations would be executed in Exadata (Smart Scans) and which at the host (DB grid) level?

    What needs to be done at the table/session/query level to get the best performance for this query on the DB Machine?

    select r.reg_name, p.prod_name, sum(s.rev_amt)
    from sales s, region r, product p
    where s.reg_id = r.reg_id
    and s.prd_id = p.prd_id
    and r.region_name in ('ASIA PACIFIC', 'EMEA')
    and s.sale_date between to_date('01/01/2001', 'MM/DD/YYYY')
    and to_date('12/31/2001', 'MM/DD/YYYY')
    group by r.reg_name, p.prod_name
    order by 1, 2, 3 desc

    Sales table is huge (1TB) with 10 years of sales data and partitioned on sales date ( one partition per quarter ).

  22. Greg Rahn

    @Dharmendra

    First, Smart Scans are performed in Exadata on the data that PQ requests. Second, Exadata does not execute or parallelize queries; it simply scans and filters data. Exadata is storage software, not database software. It just knows how to apply some operations that the database software tells it to (like filtering/restriction).

    Nothing is really required, it just works. You don’t “tune” for Exadata.

    For this specific query I would guess the following would take place:
    – partition elimination (this is not Exadata specific but since this query has a predicate on SALES.SALE_DATE and that is the partition key, then it will eliminate the partitions not required and only scan the 4 quarterly partitions for 2001)
    – projection (smart scan will only send the required columns back to the DB grid)
    – possible use of bloom filter (depending on the selectivity of the REGION_NAME predicate, a bloom filter may be created from the values for REG_ID and pushed down to Exadata to be used as a Smart Scan filter)

    The results, post Smart Scan, will be sent to the DB grid and the joins performed there. So assuming a Bloom/Join filter on REGION.REG_ID, then SALES and REGION would be joined, aggregated, and ordered in the DB grid.

    My guess is this query would run a few seconds. Assuming an equal distribution of data across years, 1/10 of 1 TB is 100 GB. The scan rate is 14 GB/s for a one-rack DB Machine, so the table scan of 100 GB of SALES would probably take about 7 seconds, and the join would also be quite fast with 64 Harpertown CPU cores doing that and the aggregation.
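
    A back-of-envelope version of this estimate in Python, using only figures quoted in this thread (1 TB over 10 years, 14 GB/s per rack, 8 database servers, 64 cores). It assumes decimal units and an even spread of data across years, and it ignores compression, which would reduce the bytes physically scanned.

```python
TABLE_GB = 1000.0      # SALES is described as 1 TB
YEARS_OF_DATA = 10
RACK_SCAN_GB_S = 14.0  # scan rate of a one-rack DB Machine
DB_SERVERS = 8         # database servers in a production rack
TOTAL_DB_CORES = 64    # Harpertown cores quoted above

# partition elimination leaves one year (four quarterly partitions)
scan_gb = TABLE_GB / YEARS_OF_DATA

# raw physical scan time at the full rack rate
raw_scan_seconds = scan_gb / RACK_SCAN_GB_S

# cores per database server implied by the quoted totals
cores_per_server = TOTAL_DB_CORES // DB_SERVERS

if __name__ == "__main__":
    print(f"{scan_gb:.0f} GB to scan")
    print(f"{raw_scan_seconds:.1f} s raw scan time")
    print(cores_per_server, "cores per database server")
```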

  23. Dharmendra

    @Greg Rahn
    Thanks Greg again for detailed explanation.

    >First, Smart Scans are performed in Exadata on the data that PQ requests.
    >Nothing is really required, it just works. You don’t “tune” for Exadata.

    As I understand from your answers so far, to enable Smart Scan, PQ needs to be enabled at the table/session/query level; otherwise data would be fetched in the normal fashion (i.e. all the data would be returned to the DB grid). Correct?

    In the Technical Overview document, each Exadata cell is assigned a set of disks. How is data distributed across the Exadata cells? Is it done through ASM (1 MB per disk/LUN in the disk group)?

    TIA.

  24. Greg Rahn

    @Dharmendra

    Correct. Statements that use parallel execution use Smart Scan, non-PX (serial) statements do not.

    As the Technical Overview of Exadata whitepaper states on page 21:

    ASM automatically stripes the database data across Exadata disks and cells to ensure a balanced I/O load and optimum performance.

    And Best Practices for Migrating to Oracle Exadata Storage Server states on page 7:

    Oracle Exadata Storage Server performs best when scanning at least 4MB contiguous chunks. To ensure this occurs at an ASM level, the disk group’s allocation unit size should be set to 4MB.

  25. Oozy

    @Greg Rahn

    I have a question about smart scan in joins: if smart scan can filter rows, why in joins – after finding REG_IDs corresponding to REGION_NAMEs – doesn’t smart scan do “normal” row filtering, and instead it uses bloom filter???

    Many thanks in advance!

  26. Greg Rahn

    @Oozy

    For “normal” smart scans the list of values is known at parse time and is generally small in number. Most people don’t write an IN predicate with 5,000 values for example. With a join, the list of values could be fairly large (compared to writing it by hand) so it is much more efficient to apply this filtering with a bloom filter. Think of it as bulk set based elimination. Do note that a bloom filter can return false positives but no false negatives, so the database still needs to apply this predicate filter as well. The benefit of the bloom filter being applied in the Exadata storage is that it significantly reduces the number of rows returned to the database. This saves on channel bandwidth as well as greatly reducing the work the database has to do for this join operation.
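
    The two-stage filtering described here can be sketched in Python. This is a generic Bloom filter for illustration only, not Oracle's implementation; the bit-array size, hash scheme, table contents and all names are assumptions.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, value):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value):
        return all(self.bits[pos] for pos in self._positions(value))


# Database side: REG_ID values that survive the REGION_NAME predicate (made up).
region_ids = {3, 17, 42}
bloom = BloomFilter()
for reg_id in region_ids:
    bloom.add(reg_id)

# Made-up fact rows: (sale_id, reg_id).
sales_rows = [(i, i % 100) for i in range(1000)]

# Storage side: the Bloom filter discards most non-matching rows early.
candidates = [row for row in sales_rows if bloom.might_contain(row[1])]

# Database side: the exact predicate removes any false positives.
joined = [row for row in candidates if row[1] in region_ids]

if __name__ == "__main__":
    print(len(sales_rows), "rows scanned")
    print(len(candidates), "rows returned by the storage-side filter")
    print(len(joined), "rows after the exact join predicate")
```

    Every row that truly matches passes the storage-side filter, so the exact check on the database side only has to discard the occasional false positive.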

  27. Oozy

    @Greg Rahn

    Many thanks for your answer, now I understand! I would be grateful for answering one more question… :)

    Column projection – I assume that this only works if a table row is located in more than 1 oracle block (this means either row chaining or row migration)? If each and every row in a table fits into 1 oracle block, is column projection relevant?

  28. Oozy

    Greg,

    We know that Smart Scan requires Parallel Query to work. Sometimes we want to disable parallelism for a query, for various reasons (in particular: bugs…:)). Is it possible that in future releases of Exadata, Smart Scan will also be supported for noparallel queries?

    Many thanks in advance!

  29. Greg Rahn

    @Oozy

    I cannot comment on what could happen in the future. I would say that if you are hitting bugs, there should be an SR and bug number to reference (bugs do not get fixed unless they are logged with support). I have yet to hit any issue where it was required to disable parallel query to work around a bug. That in itself would just kill performance anyway.
