Oracle Exadata and Netezza TwinFin Compared – An Engineer’s Analysis

There seems to be little debate that Oracle’s launch of the Oracle Exadata Storage Server and the Sun Oracle Database Machine has created buzz in the database marketplace. Apparently there is so much buzz and excitement around these products that two competing vendors, Teradata and Netezza, have both authored publications containing a significant amount of discussion of the Oracle Database with Real Application Clusters (RAC) and Oracle Exadata. Both of these vendor papers are well structured, but make no mistake: these are marketing publications written with the intent to be critical of Exadata and to argue that their own products are potentially better. Hence, both of these papers are obviously biased to support their purpose.

My intent with this blog post is simply to discuss some of the claims, analyze them for factual accuracy, and briefly comment on them. After all, Netezza clearly states in their publication:

The information shared in this paper is made available in the spirit of openness. Any inaccuracies result from our mistakes, not an intent to mislead.

In the interest of full disclosure, my employer is Oracle Corporation, however, this is a personal blog and what I write here are my own ideas and words (see disclaimer on the right column). For those of you who don’t know, I’m a database performance engineer with the Real-World Performance Group which is part of Server Technologies. I’ve been working with Exadata since before it was launched publicly and have worked on dozens of data warehouse proofs-of-concept (PoCs) running on the Exadata powered Sun Oracle Database Machine. My thoughts and comments are presented purely from an engineer’s standpoint.

The following writings are the basis of my discussion:

  1. Teradata: Exadata – the Sequel: Exadata V2 is Still Oracle
  2. Daniel Abadi: Defending Oracle Exadata
  3. Netezza: Oracle Exadata and Netezza TwinFin™ Compared

If you have not read Daniel Abadi’s blog post I strongly suggest you do before proceeding further. I think it is very well written and is presented from a vendor-neutral point of view, so there is no marketing gobbledygook to sort through. Several of the points in the Teradata paper which he discusses are also presented (or similarly presented) in the Netezza eBook, so you can relate his responses to those arguments as well. Since I feel Daniel Abadi did an excellent job pointing out the major flaws in the Teradata paper, I’m going to limit my discussion to the Netezza eBook.

Understanding Exadata Smart Scan

As a prerequisite for the discussion of the Netezza and Teradata papers, it’s imperative that we take a minute to understand the basics of Exadata Smart Scan. The Smart Scan optimizations include the following:

  • Data Elimination via Storage Indexes
  • Restriction/Row Filtering/Predicate Filtering
  • Projection/Column Filtering
  • Join Processing/Filtering via Bloom Filters and Bloom Pruning

The premise of these optimizations is to reduce query processing times in the following ways:

  • I/O Elimination – don’t read data off storage that is not needed
  • Payload Reduction – don’t send data to the Oracle Database Servers that is not needed
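
A quick way to verify whether a given statement actually benefited from these optimizations is to look at the offload-related columns in V$SQL. Here is a minimal sketch (the SQL_ID literal is just a placeholder for whatever statement you are checking):

select sql_id,
       io_cell_offload_eligible_bytes,  -- bytes eligible for Smart Scan offload
       io_interconnect_bytes            -- bytes actually shipped to the database servers
from   v$sql
where  sql_id = 'abcd1234efgh5';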

OK. Now that you have a basic understanding, let’s dive into the claims…

Netezza’s Claims

Let’s discuss a few of Netezza’s claims against Exadata:

Claim: Exadata Smart Scan does not work with index-organized tables or clustered tables.

While this is a true statement, its intent is clearly to mislead. Both of these structures are designed for OLTP workloads, not data warehousing. In fact, anyone who actually reads the Oracle Database 11.2 documentation for index-organized tables will find the following (source):

Index-organized tables are ideal for OLTP applications, which require fast primary key access

If one were to research table clusters you would find the Oracle Database 11.2 documentation offers the following guidelines (source):

Typically, clustering tables is not appropriate in the following situations:

  • The tables are frequently updated.
  • The tables frequently require a full table scan.
  • The tables require truncating.

As anyone can see from the Oracle Database 11.2 documentation, neither of these structures is appropriate for data warehousing.

Apparently this was not what Netezza really wanted you to know, so they uncovered a note on IOTs from almost a decade ago – 2001, the Oracle 9i time frame – which, while it clearly states:

[an IOT] enables extremely fast access to table data for primary key based [OLTP] queries

it also suggests that an IOT may be used as a fact table. Clearly this information is quite old and outdated and should probably be removed. What was a recommendation for Oracle Database 9i Release 1 in 2001 is not necessarily a recommendation for Oracle Database 11g Release 2 in 2010. Technology changes, so the most recent recommendations are the appropriate basis for discussion, not outdated guidance from nearly 10 years ago. Besides, the Oracle Database Machine runs 11g Release 2, not 9i Release 1.

Bottom line: I’d say this “limitation” has an impact on a nice round number of Exadata data warehouse customers – exactly zero (zero literally being a round number). IOTs and clustered tables are both structures optimized for fast primary key access, like the type of access in OLTP workloads, not data warehousing. The argument that Smart Scan does not work for these structures is really no argument at all.

Claim: Exadata Smart Scan does not work with the TIMESTAMP datatype.

Phil Francisco seems to have left out some very important context in making this accusation, because this is not at all what the cited blog post by Christian Antognini discusses. This post clearly states the discussion is about:

What happens [with Smart Scan] when predicates contain functions or expressions?

Nowhere does that post state in isolation that Smart Scan does not work with the TIMESTAMP datatype. What the blog post does state is this:

when a TIMESTAMP datatype is involved [with datetime functions], offloading almost never happens

While the Netezza paper references what the blog post author has written, some very important context has been omitted. In doing so, Netezza has taken a specific reference and turned it into a misleading generalization.

The reality is that Smart Scan does indeed work for the TIMESTAMP datatype and here is a basic example to demonstrate such:

SQL> describe t
 Name           Null?    Type
 -------------- -------- ------------------
 ID             NOT NULL NUMBER
 N                       NUMBER
 BF                      BINARY_FLOAT
 BD                      BINARY_DOUBLE
 D                       DATE
 T                       TIMESTAMP(6)
 S                       VARCHAR2(4000)    

SQL> SELECT * FROM t WHERE t = to_timestamp('01-01-2010','DD-MM-YYYY');

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

----------------------------------------------------------------------------------
| Id  | Operation                 | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |      |     1 |    52 |     4   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS STORAGE FULL| T    |     1 |    52 |     4   (0)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - storage("T"=TIMESTAMP' 2010-01-01 00:00:00.000000000')
       filter("T"=TIMESTAMP' 2010-01-01 00:00:00.000000000')

You can see that the Smart Scan offload is taking place by the presence of the storage clause (highlighted) in the Predicate Information section above. What Christian Antognini did observe is bug 9682721, and the bugfix resolves the datetime function offload issues for all but a couple of scenarios (which he blogs about here), and those operations can be (and usually are) expressed differently. For example, an expression using ADD_MONTHS() can easily be expressed using BETWEEN or an equivalent range predicate, as shown below.
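
To illustrate that kind of rewrite, here is a hedged sketch reusing table T from the example above (the date boundaries are arbitrary). Applying the datetime function to a literal rather than to the column keeps the predicate in a form the storage cells can evaluate:

-- function wrapped around the column (the form subject to the offload issues)
select count(*) from t
where  add_months(t, 1) >= timestamp '2010-02-01 00:00:00';

-- the same filter expressed directly against the column values (offloadable)
select count(*) from t
where  t >= timestamp '2010-01-01 00:00:00';

The two forms are equivalent for these boundaries (month-end edge cases aside), and the second requires no datetime function on the column at all.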

Bottom line: Exadata Smart Scan does work with the TIMESTAMP datatype.

Claim: When transactions (insert, update, delete) are operating against the data warehouse concurrent with query activity, smart scans are disabled. Dirty buffers turn off smart scan.

Yet again, Netezza presents only a half-truth. While it is true that an active transaction disables Smart Scan, they fail to clarify that Smart Scan is only disabled for those blocks that contain an active transaction – the rest of the blocks can still be Smart Scanned. The amount of data impacted by inserts, updates and deletes will generally be a very small fraction of the total data in a data warehouse. Also, data that is inserted via direct path operations is not subject to MVCC (the method Oracle uses for read consistency) because the blocks used are new blocks, so no read-consistent view is needed.
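
For illustration, a direct path load might look like the following sketch (the staging table name is hypothetical). The APPEND hint requests direct path inserts, which write into new blocks above the segment’s high water mark instead of modifying existing buffers:

-- load into new blocks via direct path; the staging table name is made up
insert /*+ APPEND */ into transaction_detail
select * from transaction_detail_staging;

commit;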

Bottom line: While this claim is partially true, it clearly attempts to overstate the impact of this scenario in a very negative way. Not having Smart Scan for a small number of blocks will have a negligible impact on performance.

Also see Daniel Abadi: Exadata does NOT Support Active Data Warehousing

Claim: Using [a shared-disk] architecture for a data warehouse platform raises concern that contention for the shared resource imposes limits on the amount of data the database can process and the number of queries it can run concurrently.

It is unclear what resource Netezza is referring to here; they simply state “the shared resource”. You know the one? Yes, that one… Perhaps they mean the disks themselves, but that is completely unknown. Anyway…

Exadata uses at least a 4 MB Automatic Storage Management (ASM) allocation unit (AU) [more on ASM basics]. This means that there is at least 4 MB of contiguous physical data laid out on the HDD, which translates into 4 MB of contiguous data streamed off of disk for full table scans before the head needs to perform a seek. With such large I/O requests the HDDs are able to spend nearly all of their time transferring data and very little time finding it, and that is what matters most. Clearly if Exadata is able to stream data off of disk at 125 MB/s per disk (near physics speed for this type of workload) then any alleged “contention” is really not an issue. In many multi-user data warehouse workloads for PoCs, I’ve observed each Exadata Storage Server perform very close to or at the data sheet physical HDD I/O rate of 1500 MB/s per server.
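
For reference, the AU size is a disk group attribute chosen when the disk group is created. A minimal, generic sketch (the disk group name and disk string are placeholders, not an actual Exadata configuration):

create diskgroup data normal redundancy
   disk '/dev/rdsk/disk01'
   attribute 'au_size'        = '4M',   -- 4 MB allocation units
             'compatible.asm' = '11.2';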

Bottom line: The scalability differences between shared-nothing and shared-disk are very much exaggerated. By doing large sequential I/Os the disk spends its time returning data, not finding it. Simply put – there really is no “contention”.

Also see Daniel Abadi: 1) Exadata does NOT Enable High Concurrency & 2) Exadata is NOT Intelligent Storage; Exadata is NOT Shared-Nothing

Claim: Analytical queries, such as “find all shopping baskets sold last month in Washington State, Oregon and California containing product X with product Y and with a total value more than $35” must retrieve much larger data sets, all of which must be moved from storage to database.

I find it so ironic that Netezza mentions this type of query, as a nearly identical (but more complex) one was used by my group at Oracle OpenWorld 2009 in our The Terabyte Hour with the Real-World Performance Group session. The exact analytical query we ran live for the audience to demonstrate the features of Oracle Exadata and the Oracle Database Machine was, “What were the most popular items in the baskets of shoppers who visited stores in California in the first week of May and did not buy bananas?”

Let’s translate the Netezza analytical question into some tables and SQL to see what the general shape of this query may look like:

select
   count(*)  -- using count(*) for simplicity of the example
from (
   select
      td.transaction_id,
      sum(td.sales_dollar_amt) total_sales_amt,
      sum(case when p.product_description in ('Brand #42 beer') then 1 else 0 end) count_productX,
      sum(case when p.product_description in ('Brand #42 frozen pizza') then 1 else 0 end) count_productY
   from transaction_detail td
      join d_store s   on (td.store_key = s.store_key)
      join d_product p on (td.product_key = p.product_key)
   where
      s.store_state in ('CA','OR','WA') and
      td.transaction_time >= timestamp '2010-07-01 00:00:00' and
      td.transaction_time <  timestamp '2010-08-01 00:00:00'
   group by td.transaction_id
) x
where
   total_sales_amt > 35 and
   count_productX > 0 and
   count_productY > 0

To me, this isn’t a particularly complex analytical question/query. As written, it’s just a 3 table join (could be 4 if I added a D_DATE I suppose), but it doesn’t require anything fancy – just a simple GROUP BY with a CASE in the SELECT to count how many times Product X and Product Y appear in a given basket.

Netezza claims that analytical queries like this one must move all the data from storage to the database, but that simply is not true. Here is why:

  1. Simple range partitioning on the event timestamp (a very common data warehousing practice for those databases that support partitioning), or even Exadata Storage Indexes, will eliminate any I/O for data other than the one month window that is required for this query.
  2. A bloom filter can be created and pushed into Exadata to be used as a storage filter for the list of STORE_KEY values that represent the three state store restriction.

Applying both #1 and #2, the only fact table data returned to the database is the rows for stores in Washington State, Oregon and California for last month. Clearly this is only a subset of the data in the entire fact table.
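
To make #1 concrete, here is a sketch of what the fact table DDL might look like with simple monthly range partitioning on the event timestamp (the column list is trimmed to the columns used in the query above):

create table transaction_detail (
   transaction_id   number,
   store_key        number,
   product_key      number,
   transaction_time timestamp,
   sales_dollar_amt number
)
partition by range (transaction_time) (
   partition p_2010_07 values less than (timestamp '2010-08-01 00:00:00'),
   partition p_2010_08 values less than (timestamp '2010-09-01 00:00:00')
);

With this layout, the two TRANSACTION_TIME predicates prune away every partition except July 2010 before a single block is read.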

This is just one example, but there are obviously different representations of the same data and query that could be used. I chose what I thought was the most raw, unprocessed, uncooked form simply because Netezza seems to boast about brute force type of operations. Even then, considering a worst case scenario, Exadata does not have to move all the data back to the database. Other data/table designs that I’ve seen from customers in the retail business would allow even less data to be returned.

Bottom line: There are numerous ways that Exadata can restrict the data that is sent to the database servers, and it’s likely that any query with predicate restrictions can do so. It is certainly possible even with the analytical question that Netezza mentions.

Claim: To evenly distribute data across Exadata’s grid of storage servers requires administrators trained and experienced in designing, managing and maintaining complex partitions, files, tablespaces, indices, tables and block/extent sizes.

Interestingly enough, the author of the Teradata paper seems to have a better grasp than Netezza on how data distribution and ASM work, describing it on page 9:

Distribution of data on Exadata storage is managed by Oracle’s Automatic Storage Manager (ASM). By default, ASM stripes each Oracle data partition across all available disks on every Exadata cell.

So if by default ASM evenly stripes data across all available disks on the Exadata Storage Servers (and it does, in a round-robin manner), what exactly is so difficult here? What training and experience are really required for something that does data distribution automatically? I can only assert that Phil Francisco has not read Introduction to Oracle Automatic Storage Management, nor even the Teradata paper which describes this behavior (though it would seem he has, since he mentions it on his blog). It’s claims like this that really make me question how genuine his “no intent to mislead” statement really is.

Bottom line: Administrators need not worry about data distribution with Exadata and ASM – it is done automatically and evenly for you.

Conclusion

I’m always extremely reluctant to believe much of what vendors say about other vendors, especially when they preface their publication with statements like “One caveat: Netezza has no direct access to an Exadata machine” and “Any inaccuracies result from our mistakes, not an intent to mislead”, yet still feel qualified enough to write about said technology and claim it as fact. I also find it interesting that both Teradata and Netezza have published anti-Exadata papers, but neither has published an anti-vendor paper about the other (that I know of). Perhaps Exadata is much more of a competitor than either of them lets on. They do protest too much, methinks.

The list of claims I’ve discussed certainly is not an exhaustive list by any means but I think it is fairly representative of the quality found in Netezza’s paper. While sometimes the facts are correct, the arguments are overstated and misleading. Other times, the facts are simply wrong. Netezza clearly attempts to create the illusion of problems simply where they do not exist.

Hopefully this blog post has left you a more knowledgeable person when it comes to Oracle and Exadata. I’ve provided fact and example wherever possible and kept assertions to a minimum.

I’d like to end with a quote from Daniel Abadi’s response to the Teradata paper which I find more than applicable to the Netezza paper as well:

Many of the claims and inferences made in the paper about Exadata are overstated, and the reader needs to be careful not to be misled into believing in the existence of problems that don’t actually present themselves on realistic data sets and workloads.

Courteous and professional comments are welcome. Anonymous comments are discouraged. Snark and flame will end up in the recycle bin. Comment accordingly.

27 comments

  1. Bradd Piontek

    How fitting that I saw this this morning, as Oracle is coming in today to give their dog and pony show on Exadata :) Great stuff, Greg!!!!

    I enjoyed your analysis but do have one question. Exadata V2 marketing claims it is also for OLTP workloads. My assumption is that is where some of the competitors are picking on some of the OLTP-only features of oracle as they relate to Exadata.

  2. Greg Rahn

    @Bradd

    The main Exadata functionality that makes it a viable solution for OLTP is the Smart Flash Cache. This was not present in the V1 Exadata systems, which is why V1 was for data warehousing only. With the addition of the Smart Flash Cache in Exadata V2, the storage can now support a large number of IOPS, which is a requirement for most OLTP workloads. The big difference between the Smart Flash Cache and other storage caches is that the software is smart enough not to pollute the cache with table scan blocks that will only be used once. Also see: Exadata Smart Flash Cache and the Sun Oracle Database Machine Technical White Paper (PDF).
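
    As an aside, the flash caching behavior can also be influenced per segment via the storage clause. A minimal sketch (the table name is made up; CELL_FLASH_CACHE is the relevant Exadata storage attribute in 11.2):

    alter table orders storage (cell_flash_cache keep);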

    I’m doubtful that either Netezza or Teradata will adopt too much in terms of OLTP features – both are pure data warehouse solutions and are unable to serve both workloads. I do know that nearly every database vendor is investigating the advantages of flash/SSD storage though.

  3. Bradd Piontek

    Greg,
    I completely understand the differences between V1 and V2. I was just wondering if that was why the DW appliance vendors were being misleading (likely on purpose). Otherwise, why bring up OLTP-related ‘short-comings’ of Exadata :)

    Dumb, and totally unrelated question. How do I get a cool avatar picture to show up on my posts?

  4. Greg Rahn

    @Bradd

    IMO the reason that the DW vendors are being misleading is because it’s very hard to come up with things that differentiate themselves from the competition. When you can’t make yourself look good (or good enough), try and make the competition look bad. Desperate times call for desperate measures. This is basic marketing FUD 101. I find it unfortunate that Netezza and Teradata find it acceptable to stoop to such low levels.

    Check out Gravatar.

  5. Mike Kearney

    Greg, what was the total value of Brand #42 beer sold in each basket?

    For the sake of brevity and good manners, we didn’t want to dominate Greg’s comments with a point-by-point analysis of the above posting. We did want to tackle one point here and let you know that we’ve posted our deeper analysis at http://www.enzeecommunity.com/blogs/nzblog/2010/08/26/talkin-bout-my-generation.

    Our discussion point from our eBook (www.netezza.com/exadata-twinfin-compared) — Analytical queries, such as “find all shopping baskets sold last month in Washington State, Oregon and California containing product X with product Y and with a total value more than $35” must retrieve much larger data sets, all of which must be moved from storage to database.

    Greg shows some nice SQL to demonstrate how Exadata processes the beer and pizza query. Give the business an answer and they always come back with a new question: “Greg, what was the total value of Brand #42 beer sold in each basket?”

    Greg can now update his SQL with the clause:

    sum(case when p.product_description in ('Brand #42 beer') then td.sales_dollar_amt else 0 end) sum_productX,

    and re-run the query. Business users love IT when we give them a fast performing system but are less forgiving when a query that yesterday ran blazingly fast today slows to a snail’s pace. Exadata cannot push down the newly introduced sum for parallel processing by its storage nodes, as the join must be processed first and the storage nodes cannot process joins. Any function or calculation that uses columns from two or more tables must be evaluated on the RAC database servers. The query performance is going to degrade significantly, sending the database expert back to the Oracle documentation in an attempt to find a new way to resolve the amended query so it completes in a time acceptable to the business.

    We invite you to visit our blog for further discussion on the points Greg made above at: http://www.enzeecommunity.com/blogs/nzblog/2010/08/26/talkin-bout-my-generation

  6. Greg Rahn

    @Mike

    Thanks for polluting the Internet with more marketing hot air and FUD from the Netezza side. It’s nice to see you guys try so hard to attack the competition. Exadata must really be a fierce competitor to warrant all of this.

    BTW – adding the sum of the beer to that query has a trivial impact and the sum is done in parallel. I tested it on my Exadata system – something that you admittedly cannot do.

    BTW2 – no matter how hard you try, no matter what connection you think you have found, trying to make a stink that Smart Scan doesn’t work on IOTs just demonstrates your lack of understanding of when and where these structures should be used. If you think IOTs are so important for DW, then tell your engineers to have Netezza support them.

  7. Uwe Hesse

    “One caveat: Netezza has no direct access to an Exadata machine” (!)
    Yes, quite a caveat :-)
    The boldness to write a criticizing technical review about a product without having access to it is amazing.

  8. Brian Ganly

    Hi,

    Interesting article written from an Oracle point of view as opposed to Netezza. I would not say this is an unbiased article. From a customer point of view I have used both Oracle (14 years) and Netezza (5 years) for Data Warehousing and I would veer towards Netezza for the following reasons.

    For what you pay:
    o It is fast
    o It is simple to use
    o Netezza is an easy company to do business with
    o Exceptional support

    The Oracle Exadata strapline, “Brawny Hardware, Brainy Software”, is where I have problems.
    o Brawny Hardware – Excellent.
    o Brainy Software – The same complexity of indexes and partitioning along with all the other overhead of running Oracle that there always has been.

    This overhead translates into higher project development cost and higher operating cost.
    An example would be a 10 Billion row Call Data Record table with 15 local indexes and 10 partitions. Just looking at the table, partitions and indexes, ignoring any Oracle or Operating System storage management you get a breakdown of objects to manage something like

    Oracle
    Table……………1
    Columns…………25
    Partitions………10
    Indexes…………15
    Index Partitions..150
    #Objects……….201

    Netezza
    Table…………..1
    Columns………..25
    Distribute on key
    #Objects……….26

    So the Oracle DBA has to look carefully at the indexes and their data content, then design and build them using the most suitable index types for the columns being indexed. The number of indexes has an impact on the loading speed of the table and will need to be managed over time. Partitions will need to be managed and added to.

    The Netezza developer can be quickly taught about data types and get on with the business of naming the table, deciding on the columns and their data types, and putting the data into the database. My experience is that this is a big saving on development time and cost, and performance is still excellent. This, combined with excellent support, makes a compelling case for Netezza.

    When a database becomes just a database, and you can start to think about the business implications of the data rather than worry about data volume or system performance, then you have a valuable business tool.

  9. Greg Rahn

    @Brian

    This article may be slightly biased; however, the most important part is that it is factually accurate (I actually understand the Oracle Database and Exadata internals, versus Netezza thinking they understand) and the facts are not presented in a misleading way. I’m also not peddling the product I work with like a religious fanatic.

    With Exadata, there really is no need for indexes in a data warehouse. Features like Exadata Storage Indexes (similar to Netezza Zone Maps) aid beyond partitioning in eliminating unnecessary I/O, are created automatically, and require zero maintenance. And to be honest, most Oracle shops abuse indexing in a data warehouse – it simply is not the solution. The reason they resort to over-indexing is that their hardware is simply not sized to do parallel table scans – it lacks the I/O bandwidth capacity. As a result they use indexes to reduce I/O, but that results in retrieving data from the table by rowid, one block at a time (a very random I/O pattern). Compare this to an optimized table scan (Exadata Smart Scan) where the table is streamed off of disk using large sequential I/O accesses. There certainly are people who have figured this out, but many seem not to have. In fact, I know of an Oracle Database Machine customer who migrated over 80 TB of table data off their previous Oracle system that contained zero indexes – but that system also had the I/O bandwidth capacity necessary to deliver the desired query performance.
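
    One simple way to observe storage indexes at work is the session statistic Oracle exposes for them (statistic name as of 11.2). After running a query, something like the following shows how many bytes of physical I/O they eliminated:

    select n.name, s.value
    from   v$statname n
           join v$mystat s on (n.statistic# = s.statistic#)
    where  n.name = 'cell physical IO bytes saved by storage index';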

    I’ve seen similar types of comparison numbers but I think it’s a weak argument. People can make things as easy or difficult as they desire. Just like choosing a distribution key, the partition key is chosen once, when the table is created. Oracle has interval partitioning, which automatically creates partitions for new keys, so there is no need to continually create partitions. But even so, when I worked on building and supporting an Oracle data warehouse at my prior job, we used range partitioning and spent less than 5 minutes a month creating new partitions. Such tasks can easily be scripted and automated as well. We also had a very index-light design, relied on Oracle’s Parallel Execution and compression, leveraged simple partitioning, and had hardware that could deliver the I/O rates we needed.
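
    To illustrate the interval partitioning point, here is a minimal sketch (the table and column names are made up). After the initial partition, the database automatically creates a new monthly partition the first time a row arrives for that month:

    create table sales_fact (
       sale_date date,
       amount    number
    )
    partition by range (sale_date)
    interval (numtoyminterval(1, 'MONTH')) (
       partition p_initial values less than (date '2010-01-01')
    );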

  10. Thomas Teske

    The discussion is interesting; however, what’s the benefit to a customer? Getting back to that thought would help shed more light on the relevance of some of the questions and answers provided above. None of the Q&A is inadequate, but the relevance is not stated.

    My suggestion: let’s add relevance – should you have a question, say why you think it is specifically important.

  11. Chris Craft

    Regarding relevance… The “Brainy Software” is relevant to customers because it means less work for the DBA, and less labor cost for the company to achieve better performance for end users. Brainy Software does not mean doing things in Exadata the same way you did in Oracle9i. As Greg indicated, we tend to use FAR FEWER indexes (and often NO indexes) with Exadata.

    Brian Ganly gives a telecom example that I find particularly funny, since we just completed a large evaluation involving CDR (Call Detail Record) data for a telecom customer. Our partitioning used the same key columns as our competitor used in their “distribution key”, so about the same amount of work went into analyzing which key columns to use. We didn’t use ANY indexes, so all of the comments about “needing” indexes are not correct.

    The point about “Brainy Software” is that we have some cool new features in Exadata that will save time, labor and money, while providing vastly improved performance. Most things are backward compatible with earlier Oracle releases, so you’ve got some flexibility, but you should be adapting your approach to the new product. If Brian Ganly ever gets a chance to work with Exadata, I’m afraid he’s going to pick up where he left off 5 years ago and try to manage Exadata the same way he managed Oracle9i.

  12. Stephen Lee

    Actually, with Exadata, existing customers can choose to use fewer indexes and partitions, yet the query response time for large aggregations or data scans still shows very impressive performance. I ran a simple full-table-scan test query with a SUM function on a table with over 100 billion rows and about 10 columns, with only one PK index, on a DB grid with only 8 CPU cores from an Exadata V1 machine with X2 software (without the flash disk), and it took only a little over 12 minutes. That is quite fast to me. When I ran a similar query on a similar table with only 40 billion rows on a 10gR1 database with 48 CPU cores on an AIX p595 with a high-end Tier-1 Hitachi SAN, it took more than 50 minutes. BTW, I do not work for Oracle either.

  13. Amir Riaz

    Don’t use indexes with Oracle Exadata – I may have been the first to advocate that after the Exadata V1 release, and I wrote as much in the Oracle Exadata group on LinkedIn. I couldn’t believe how angry people got about it, especially the Oracle- and OLTP-oriented people, and especially Exadata’s competitors. Some of them even complained to my manager (since we do some business with them) and I had to remove those posts. It’s good that Greg has written the same things on his blog. Nice work, Greg, and keep up the good work. I don’t work for Oracle either, but I respect them for their product and their willingness to listen.

  14. Greg Rahn

    @Amir Riaz

    I think the best way to put it is that indexes (for data warehousing) should be used when there is a requirement to do so – most frequently for known queries that are extremely selective and that need to support high concurrency/throughput. This is a big change from the way many have run their Oracle DW – smothering the DW with indexes because the I/O is undersized. Too many indexes is a symptom, not the root cause.

  15. Bibhu Datta Rout

    Greg, it was really a great article. I am also in a fix about what to choose between Exadata and Netezza for my enterprise warehouse application, which will hold more than 150 billion records once all data is migrated to the warehouse platform. Since you are working for Oracle, I think you could request your employer to purchase Netezza software with the best possible hardware combination, compare it with a similarly configured Exadata, and publish the results on the Internet. You could probably show the performance of many queries under similar conditions. It would be a great help for people like us who are completely dependent upon the sales/marketing guys from software vendors. For my company, money won’t be a constraint, but performance is definitely a very big risk. It would save a lot of our time. Regards, Bibhu

  16. Gabramel

    Greg,

    Nice article. I am just reading the Netezza paper.

    You don’t appear to have debunked the following statement.

    “Exadata is unable to process this three table join in its MPP tier and instead must inefficiently move all the data required by the calculation across the network to Oracle RAC.”

    Not many queries exist where data is only required from two tables. Are Oracle suggesting we need to change the way data is structured to enable best use of Exadata – increasing TCO significantly?

    Thanks & Nice post.

  17. Greg Rahn

    @Bibhu

    Unfortunately, your time and your workload are what is required to best evaluate the technologies. I certainly cannot get a Netezza system and publish benchmarks on it. Netezza themselves don’t even publish benchmarks – they are a member of the TPC but have yet to publish a single one.

  18. Pingback: Debunking More Netezza FUD About Exadata | Structured Data
  19. DIBYA SANYAL

    Greg,
    Factual and to-the-point article. I have been doing a DB comparison between Oracle Exadata, Netezza and Teradata for an upcoming engagement. Your article helps a lot, specifically the index-abuse part :) One QQ: any best practices around migrating a ~100 TB Oracle 10g warehouse to Oracle Exadata? Pointers to existing material would be great!
    Thanks and great post!

  20. obaid

    Hi,
    I have a DWH on Exadata without using a single index, and nobody can say that it is slow.
    This is just for your information.
