A few weeks back one of the Vertica developers put up a blog post on counting triangles in an undirected graph with reciprocal edges. The author was comparing the size of the data and the elapsed times to run this calculation on Hadoop and Vertica and put up the work on github and encouraged others: “do try this at home.” So I did.
Vertica draws attention to the fact that their compression brought the size of the 86,220,856 tuples down to 560MB in size, from a flat file size of 1,263,234,543 bytes resulting in around a 2.25X compression ratio. My first task was to load the data and see how Oracle’s Hybrid Columnar Compression would compare. Below is a graph of the sizes.
As you can see, Oracle’s default HCC query compression (query high) compresses the data over 2X more than Vertica and even HCC query low compression beats out Vertica’s compression number.
Query Elapsed Times
The closest gear I had to Vertica’s hardware was an Exadata X2-2 system — both use 2 socket, 12 core Westmere-EP nodes. While one may try to argue that Exadata may somehow influence the execution times, I’ll point out that I was using In-Memory Parallel Execution so no table data was even read from spinning disk or Exadata Flash Cache — it’s all memory resident in the database nodes’ buffer cache. This seems to be inline with how Vertica executed their tests though not explicitly stated (it’s a reasonable assertion).
After I loaded the data and gathered table stats, I fired off the exact same SQL query that Vertica used to count triangles to see how Oracle would compare. I ran the query on 1, 2 and 4 nodes just like Vertica. Below is a graph of the results.
As you can see, the elapsed times are reasonably close but overall in the favor of Oracle winning 2 of the 3 scale points as well as having a lower sum of the three executions: Vertica 519 seconds, Oracle 487 seconds — advantage Oracle of 32 seconds.
It Should Go Faster!
As a database performance engineer I was thinking to myself, “it really should go faster!” I took a few minutes to look over things to see what could make this perform better. You might think I was looking at parameters or something like that, but you would be wrong. After a few minutes of looking at the query and the execution plan it became obvious to me — it could go faster! I made a rather subtle change to the SQL query and reran my experiments. With the modified SQL query Oracle was now executing twice as fast on 1 node than Vertica was on 4 nodes. Also, on 4 nodes, the elapsed time came in at just 14 seconds, compared to the 97 seconds Vertica reported — a difference of almost 7X! Below are the combined results.
What’s The Go Fast Trick?
I was thinking a bit more about the problem at hand — we need to count vertices but not count them twice since they are reciprocal. Given that for any edge, it exists in both directions, the query can be structured like Vertica wrote it — doing the filtering with a join predicate like e1.source < e2.source to eliminate the duplicates or we can simply use a single table filter predicate like source < dest before the join takes place. One of the first things they taught me in query planning and optimization class was to filter early! That notation pays off big here because the early filter cuts the rows going into the first join as well as the output of the first join by a factor of 2 — 1.8 billion rows output vs. 3.6 billion. That’s a huge savings not only in the first join, but also in the second join as well.
Here is what my revised query looks like:
with e1 as (select * from edges where source < dest), e2 as (select * from edges where source < dest), e3 as (select * from edges where source > dest) select count(*) from e1 join e2 on (e1.dest = e2.source) join e3 on (e2.dest = e3.source) where e3.dest = e1.source
First, I’d like to thank the Vertica team for throwing the challenge out there and being kind enough to provide the data, code and their elapsed times. I always enjoy a challenge — especially one that I can improve upon. Now, I’m not going to throw any product marketing nonsense out there as that is certainly not my style (and there certainly is more than enough of that already), but rather I’ll just let the numbers do the talking. I’d also like to point out that this experiment was done without any structure other than the table. And in full disclosure, all of my SQL commands are available as well.
The other comment that I would make is that the new and improved execution times really make a mockery of the exercise when comparing to Hadoop MapReduce or Pig, but I would also mention that this test case is extremely favorable for parallel pipelined databases that can perform all in-memory operations and given the data set is so small, this is the obviously the case. Overall, in my opinion, a poor problem choice to compare the three technologies as it obviously (over) highlights the right tool for the job cliche.
Experiments performed on Oracle Database 188.8.131.52.
Github source code: https://gist.github.com/grahn/1289188
As Oracle OpenWorld is just around the corner and you are probably getting your session schedule together, I thought I’d pass on this (nearly complete) list of sessions of which one or more of the presenters is an OakTable member. There is no doubt in my mind that these sessions will contain some of the best technical content presented at OpenWorld so be sure and pre-register for these sessions today.
Hope to see you there!
There seems to be little debate that Oracle’s launch of the Oracle Exadata Storage Server and the Sun Oracle Database Machine has created buzz in the database marketplace. Apparently there is so much buzz and excitement around these products that two competing vendors, Teradata and Netezza, have both authored publications that contain a significant amount of discussion about the Oracle Database with Real Application Clusters (RAC) and Oracle Exadata. Both of these vendor papers are well structured but make no mistake, these are marketing publications written with the intent to be critical of Exadata and discuss how their product is potentially better. Hence, both of these papers are obviously biased to support their purpose.
My intent with this blog post is simply to discuss some of the claims, analyze them for factual accuracy, and briefly comment on them. After all, Netezza clearly states in their publication:
The information shared in this paper is made available in the spirit of openness. Any inaccuracies result from our mistakes, not an intent to mislead.
In the interest of full disclosure, my employer is Oracle Corporation, however, this is a personal blog and what I write here are my own ideas and words (see disclaimer on the right column). For those of you who don’t know, I’m a database performance engineer with the Real-World Performance Group which is part of Server Technologies. I’ve been working with Exadata since before it was launched publicly and have worked on dozens of data warehouse proofs-of-concept (PoCs) running on the Exadata powered Sun Oracle Database Machine. My thoughts and comments are presented purely from an engineer’s standpoint.
The following writings are the basis of my discussion:
- Teradata: Exadata – the Sequel: Exadata V2 is Still Oracle
- Daniel Abadi: Defending Oracle Exadata
- Netezza: Oracle Exadata and Netezza TwinFin™ Compared
If you have not read Daniel Abadi’s blog post I strongly suggest you do before proceeding further. I think it is very well written and is presented from a vendor neutral point of view so there is no marketing gobbledygook to sort through. Several of the points in the Teradata writing which he discusses are also presented (or similarly presented) in the Netezza eBook, so you can relate his responses to those arguments as well. Since I feel Daniel Abadi did an excellent job pointing out the major flaws with the Teradata paper, I’m going to limit my discuss to the Netezza eBook.
Understanding Exadata Smart Scan
As a prerequisite for the discussion of the Netezza and Teradata papers, it’s imperative that we take a minute to understand the basics of Exadata Smart Scan. The Smart Scan optimizations include the following:
- Data Elimination via Storage Indexes
- Restriction/Row Filtering/Predicate Filtering
- Projection/Column Filtering
- Join Processing/Filtering via Bloom Filters and Bloom Pruning
The premise of these optimizations is reduce query processing times in the following ways:
- I/O Elimination – don’t read data off storage that is not needed
- Payload Reduction – don’t send data to the Oracle Database Servers that is not needed
OK. Now that you have a basic understanding, let’s dive into the claims…
Let’s discuss a few of Netezza claims against Exadata:
Claim: Exadata Smart Scan does not work with index-organized tables or clustered tables.
While this is a true statement, its intent is clearly to mislead you. Both of these structures are designed for OLTP workloads, not data warehousing. In fact, if one were to actually read the Oracle Database 11.2 documentation for index-organized tables you would find the following (source):
Index-organized tables are ideal for OLTP applications, which require fast primary key access
If one were to research table clusters you would find the Oracle Database 11.2 documentation offers the following guidelines (source):
Typically, clustering tables is not appropriate in the following situations:
- The tables are frequently updated.
- The tables frequently require a full table scan.
- The tables require truncating.
As anyone can see from the Oracle Database 11.2 Documentation, neither of these structures are appropriate for data warehousing.
Apparently this was not what Netezza really wanted you to know so they uncovered a note on IOTs from almost a decade ago, dating back to 2001 – Oracle 9i time frame, that while it clearly states:
[an IOT] enables extremely fast access to table data for primary key based [OLTP] queries
it also suggests that an IOT may be used as a fact table. Clearly this information is quite old and outdated and should probably be removed. What was a recommendation for Oracle Database 9i Release 1 in 2001 is not necessarily a recommendation for Oracle Database 11g Release 2 in 2010. Technology changes so using the most recent recommendations as a basis for discussion is appropriate, not some old, outdated stuff from nearly 10 years ago. Besides, the Oracle Database Machine runs version 11g Release 2, not 9i Release 1.
Bottom line: I’d say this “limitation” has an impact on a nice round number of Exadata data warehouse customers – exactly zero (zero literally being a round number). IOTs and clustered tables are both structures optimized for fast primary key access, like the type of access in OLTP workloads, not data warehousing. The argument that Smart Scan does not work for these structures is really no argument at all.
Claim: Exadata Smart Scan does not work with the TIMESTAMP datatype.
Phil Francisco seems to have left out some very important context in making this accusation, because this is not at all what the cited blog post by Christian Antognini discusses. This post clearly states the discussion is about:
What happens [with Smart Scan] when predicates contain functions or expressions?
Nowhere at all does that post make an isolated reference that Smart Scan does not work with the TIMESTAMP datatype. What this blog post does state is this:
when a TIMESTAMP datatype is involved [with datetime functions], offloading almost never happens
While the Netezza paper references what the blog post author has written, some very important context has been omitted. In doing so, Netezza has taken a specific reference and turned it into a misleading generalization.
The reality is that Smart Scan does indeed work for the TIMESTAMP datatype and here is a basic example to demonstrate such:
SQL> describe t Name Null? Type -------------- -------- ------------------ ID NOT NULL NUMBER N NUMBER BF BINARY_FLOAT BD BINARY_DOUBLE D DATE T TIMESTAMP(6) S VARCHAR2(4000) SQL> SELECT * FROM t WHERE t = to_timestamp('01-01-2010','DD-MM-YYYY'); Execution Plan ---------------------------------------------------------- Plan hash value: 1601196873 ---------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 52 | 4 (0)| 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL| T | 1 | 52 | 4 (0)| 00:00:01 | ---------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("T"=TIMESTAMP' 2010-01-01 00:00:00.000000000') filter("T"=TIMESTAMP' 2010-01-01 00:00:00.000000000')
You can see that the Smart Scan offload is taking place by the presence of the storage clause (highlighted) in the Predicate Information section above. What Christian Antognini did observe is bug 9682721 and the bugfix resolves the datetime function offload issues for all but a couple scenarios (which he blogs about here) and those operations can (and usually are) expressed differently. For example, an expression using ADD_MONTHS() can easily be expressed using BETWEEN.
Bottom line: Exadata Smart Scan does work with the TIMESTAMP datatype.
Claim: When transactions (insert, update, delete) are operating against the data warehouse concurrent with query activity, smart scans are disabled. Dirty buffers turn off smart scan.
Yet again, Netezza presents only a half-truth. While it is true that an active transaction disables Smart Scan, they fail to further clarify that Smart Scan is only disabled for those blocks that contain an active transaction – the rest of the blocks are able to be Smart Scanned. The amount of data that is impacted by insert, update, delete will generally be a very small fraction of the total data in a data warehouse. Also, data that is inserted via direct path operations is not subject to MVCC (the method Oracle uses for read consistency) as the blocks that are used are new blocks so no read consistent view is needed.
Bottom line: While this claim is partially true, it clearly attempts to overstate the impact of this scenario in a very negative way. Not having Smart Scan for small number of blocks will have a negligible impact on performance.
Also see Daniel Abadi: Exadata does NOT Support Active Data Warehousing
Claim: Using [a shared-disk] architecture for a data warehouse platform raises concern that contention for the shared resource imposes limits on the amount of data the database can process and the number of queries it can run concurrently.
It is unclear what resource Netezza is referring to here, it simply states “the shared resource”. You know the one? Yes, that one… Perhaps they mean the disks themselves, but that is completely unknown. Anyway…
Exadata uses at least a 4 MB Automatic Storage Management (ASM) allocation unit (AU) [more on ASM basics]. This means that there is at least 4 MB of contiguous physical data laid out on the HDD which translates into 4 MB of contiguous data streamed off of disk for full table scans before the head needs to perform a seek. With such large I/O requests the HDDs are able to spend nearly all the time transferring data, and very little time finding it and that is what matters most. Clearly if Exadata is able to stream data off of disk at 125 MB/s per disk (near physics speed for this type of workload) then any alleged “contention” is really not an issue. In many multi-user data warehouse workloads for PoCs, I’ve observed that each Exadata Storage Server is able to perform very close or at the data sheet physical HDD I/O rate of 1500 MB/s per server.
Bottom line: The scalability differences between shared-nothing and shared-disk are very much exaggerated. By doing large sequential I/Os the disk spends its time returning data, not finding it. Simply put – there really is no “contention”.
Also see Daniel Abadi: 1) Exadata does NOT Enable High Concurrency & 2) Exadata is NOT Intelligent Storage; Exadata is NOT Shared-Nothing
Claim: Analytical queries, such as “find all shopping baskets sold last month in Washington State, Oregon and California containing product X with product Y and with a total value more than $35″ must retrieve much larger data sets, all of which must be moved from storage to database.
I find it so ironic that Netezza mentions this type of query as nearly an identical (but more complex) one was used by my group at Oracle OpenWorld 2009 in our The Terabyte Hour with the Real-World Performance Group session. The exact analytical query we ran live for the audience to demonstrate the features of Oracle Exadata and the Oracle Database Machine was, “What were the most popular items in the baskets of shoppers who visited stores in California in the first week of May and did not buy bananas?”
Let’s translate the Netezza analytical question into some tables and SQL to see what the general shape of this query may look like:
select count(*) -- using count(*) for simplicity of the example from ( select td.transaction_id, sum(td.sales_dollar_amt) total_sales_amt, sum(case when p.product_description in ('Brand #42 beer') then 1 else 0 end) count_productX, sum(case when p.product_description in ('Brand #42 frozen pizza') then 1 else 0 end) count_productY from transaction_detail td join d_store s on (td.store_key = s.store_key) join d_product p on (td.product_key = p.product_key) where s.store_state in ('CA','OR','WA') and td.transaction_time >= timestamp '2010-07-01 00:00:00' and td.transaction_time < timestamp '2010-08-01 00:00:00' group by td.transaction_id ) x where total_sales_amt > 35 and count_productX > 0 and count_productY > 0
To me, this isn’t a particularly complex analytical question/query. As written, it’s just a 3 table join (could be 4 if I added a D_DATE I suppose), but it doesn’t require anything fancy – just a simple GROUP BY with a CASE in the SELECT to count how many times Product X and Product Y appear in a given basket.
Netezza claims that analytical queries like this one must move all the data from storage to the database, but that simply is not true. Here is why:
- Simple range partitioning on the event timestamp (a very common data warehousing practice for those databases that support partitioning), or even Exadata Storage Indexes, will eliminate any I/O for data other than the one month window that is required for this query.
- A bloom filter can be created and pushed into Exadata to be used as a storage filter for the list of STORE_KEY values that represent the three state store restriction.
Applying both of #1 and #2, the only data that is returned to the database for the fact table are rows for stores in Washington State, Oregon and California for last month. Clearly this is only a subset of the data for the entire fact table.
This is just one example, but there are obviously different representations of the same data and query that could be used. I chose what I thought was the most raw, unprocessed, uncooked form simply because Netezza seems to boast about brute force type of operations. Even then, considering a worst case scenario, Exadata does not have to move all the data back to the database. Other data/table designs that I’ve seen from customers in the retail business would allow even less data to be returned.
Bottom line: There are numerous ways that Exadata can restrict the data that is set to the database servers and it’s likely that any query with any predicate restrictions can do so. Certainly it is possible even with the analytic question that Netezza mentions.
Claim: To evenly distribute data across Exadata’s grid of storage servers requires administrators trained and experienced in designing, managing and maintaining complex partitions, files, tablespaces, indices, tables and block/extent sizes.
Interestingly enough, the author of the Teradata paper seems to have a better grasp than Netezza on how data distribution and ASM work describing it on page 9:
Distribution of data on Exadata storage is managed by Oracle’s Automatic Storage Manager (ASM). By default, ASM stripes each Oracle data partition across all available disks on every Exadata cell.
So if by default ASM evenly stripes data across all available disks on Exadata Storage Server (and it does, in a round robin manner) what exactly is so difficult here? What training and experience is really required for something that does data distribution automatically? I can only assert that Phil Francisco has not even read the Teradata paper (but it would seem he has since he even mentions it on his blog), let alone Introduction to Oracle Automatic Storage Management. It’s claims like this that really make me question how genuine his “no intent to mislead” statement really is.
Bottom line: Administrators need not worry about data distribution with Exadata and ASM – it is done automatically and evenly for you.
I’m always extremely reluctant to believe much of what vendors say about other vendors, especially when they preface their publication with something like: “One caveat: Netezza has no direct access to an Exadata machine“, and “Any inaccuracies result from our mistakes, not an intent to mislead” yet they still feel qualified enough to write about said technology and claim it as fact. I also find it interesting that both Teradata and Netezza have published anti-Exadata papers, but neither Netezza nor Teradata have published anti-vendor papers about each other (that I know of). Perhaps Exadata is much more of a competitor than either of them let on. They do protest too much, methinks.
The list of claims I’ve discussed certainly is not an exhaustive list by any means but I think it is fairly representative of the quality found in Netezza’s paper. While sometimes the facts are correct, the arguments are overstated and misleading. Other times, the facts are simply wrong. Netezza clearly attempts to create the illusion of problems simply where they do not exist.
Hopefully this blog post has left you a more knowledgeable person when it comes to Oracle and Exadata. I’ve provided fact and example wherever possible and kept assertions to a minimum.
I’d like to end with a quote from Daniel Abadi’s response to the Teradata paper which I find more than applicable to the Netezza paper as well:
Many of the claims and inferences made in the paper about Exadata are overstated, and the reader needs to be careful not to be mislead into believing in the existence problems that don’t actually present themselves on realistic data sets and workloads.
Courteous and professional comments are welcome. Anonymous comments are discouraged. Snark and flame will end up in the recycle bin. Comment accordingly.
[back to Introduction]
Leveraging Oracle’s Parallel Execution (PX) in your Oracle data warehouse is probably the most important feature/technology one can use to speed up operations on large data sets. PX is not, however, “go fast” magic pixi dust for any old operation (if thats what you think, you probably don’t understand the parallel computing paradigm). With Oracle PX, a large task is broken up into smaller parts, sub-tasks if you will, and each sub-task is then worked on in parallel. The goal of Oracle PX: divide and conquer. This allows a significant amount of hardware resources to be engaged in solving a single problem and is what allows the Oracle database to scale up and out when working with large data sets.
I though I’d touch on some basics and add my observations but this is by far not an exhaustive write up on Oracle’s Parallel Execution. There is an entire chapter in the Oracle Database documentation on PX as well as several white papers. I’ve listed all these in the Resources section at the bottom of this post. Read them, but as always, feel free to post questions/comments here. Discussion adds great value.
A Basic Example of Parallel Execution
Consider a simple one table query like the one below.
You can see that the PX Coordinator (also known as the Query Coordinator or QC) breaks up the “work” into several chunks and those chunks are worked on by the PX Server Processes. The technical term for the chunk a PX Server Process works on is called a granule. Granules can either be block-based or partition-based.
When To Use Parallel Execution
PX is a key component in data warehousing as that is where large data sets usually exist. The most common operations that use PX are queries (SELECTs) and data loads (INSERTs or CTAS). PX is most commonly controlled by using the PARALLEL attribute on the object, although it can be controlled by hints or even Oracle’s Database Resource Manager. If you are not using PX in your Oracle data warehouse than you are probably missing out on a shedload of performance opportunity.
When an object has its PARALLEL attribute set or the PARALLEL hint is used queries will leverage PX, but to leverage PX for DML operations (INSERT/DELETE/UPDATE) remember to alter your session by using the command:
alter session [enable|force] parallel dml;
Do Not Fear Parallel Execution
Since Oracle’s PX is designed to take advantage of multiple CPUs (or CPU cores) at a time, it can leverage significant hardware resources, if available. From my experiences in talking with Oracle DBAs, the ability for PX to do this scares them. This results in DBAs implementing a relatively small degree of parallelism (DOP) for a system that could possibly support a much higher level (based on #CPUs). Often times though, the system that PX is being run on is not a balanced system and frequently has much more CPU power than disk and channel bandwidth, so data movement from disk becomes the bottleneck well before the CPUs are busy. This results in many statements like “Parallel Execution doesn’t work” or similar because the user/DBA isn’t observing a decrease in execution time with more parallelism. Bottom line: if the hardware resources are not available, the software certainly can not scale.
Just for giggles (and education), here is a snippet from top(1) from a node from an Oracle Database Machine running a single query (across all 8 database nodes) at DOP 256.
top - 20:46:44 up 5 days, 3:48, 1 user, load average: 36.27, 37.41, 35.75 Tasks: 417 total, 43 running, 373 sleeping, 0 stopped, 1 zombie Cpu(s): 95.6%us, 1.6%sy, 0.0%ni, 2.2%id, 0.0%wa, 0.2%hi, 0.4%si, 0.0%st Mem: 74027752k total, 21876824k used, 52150928k free, 440692k buffers Swap: 16771852k total, 0k used, 16771852k free, 13770844k cached USER PID PR NI VIRT SHR RES S %CPU %MEM TIME+ COMMAND oracle 16132 16 0 16.4g 5.2g 5.4g R 63.8 7.6 709:33.02 ora_p011_orcl oracle 16116 16 0 16.4g 4.9g 5.1g R 60.9 7.2 698:35.63 ora_p003_orcl oracle 16226 15 0 16.4g 4.9g 5.1g R 59.9 7.2 702:01.01 ora_p028_orcl oracle 16110 16 0 16.4g 4.9g 5.1g R 58.9 7.2 697:20.51 ora_p000_orcl oracle 16122 15 0 16.3g 4.9g 5.0g R 56.9 7.0 694:54.61 ora_p006_orcl
(Quite the TIME+ column there, huh!)
In this post I’ve been a bit light on the technicals of PX, but that is mostly because 1) this is a fundamentals post and 2) there is a ton of more detail in the referenced documentation and I really don’t feel like republishing what already exists. Bottom line, Oracle Parallel Execution is a must for scaling performance in your Oracle data warehouse. Take the time to understand how to leverage it to maximize performance in your environment and feel free to start a discussion here if you have questions.
[back to Introduction]
Partitioning is an essential performance feature for an Oracle data warehouse because partition elimination (or partition pruning) generally results in the elimination of a significant amount of table data to be scanned. This results in a need for less system resources and improved query performance. Someone once told me “the fastest I/O is the one that never happens.” This is precisely the reason that partitioning is a must for Oracle data warehouses – it’s a huge I/O eliminator. I frequently refer to partition elimination as the anti-index. An index is used to find a small amount data that is required; partitioning is used to eliminate vasts amounts of data that is not required.
Main Uses For Partitioning
I would classify the main reasons to use partitioning in your Oracle data warehouse into these four areas:
- Data Elimination
- Partition-Wise Joins
- Manageability (Partition Exchange Load, Local Indexes, etc.)
- Information Lifecycle Management (ILM)
The most common partitioning design pattern found in Oracle data warehouses is to partition the fact tables by range (or interval) on the event date/time column. This allows for partition elimination of all the data not in the desired time window in queries. For example: If I have a fact table that contains point of sale (POS) data, each line item for a given transaction has a time stamp of when the item was scanned. Let’s say this value is stored in column EVENT_TS which is a DATE or TIMESTAMP data type. In most cases it would make sense to partition by range on EVENT_TS using one day partitions. This means every query that uses a predicate filter on EVENT_TS (which should be nearly every one) can eliminate significant amounts of data that is not required to satisfy the query predicate. If you want to look at yesterday’s sales numbers, there is no need to bring back rows from last week or last month!
Depending on the schema design of your data warehouse you may also chose to subpartition a table. This allows one to further segment a table to allow for even more data elimination or it can allow for partition-wise joins which allow for reduced usage of CPU and memory resources by minimizing the amount of data exchanged between parallel execution server processes. In third normal form (3NF) schemas it is very beneficial to use hash partitioning or subpartitioning to allow for partition-wise joins (see Oracle Parallel Execution: Interconnect Myths And Misunderstandings) for this exact reason. Dimensional models (star schemas) may also benefit from hash subpartitioning and partition-wise joins. Generally it is best to hash subpartition on a join key column to a very large dimension, like CUSTOMER, so that a partition-wise join will be used between the fact table and the large dimension table.
Managing large objects can be challenging for a variety of reasons which is why Oracle Partitioning allows for many operations to be done at a global or partition (or subpartition) level. This makes it much easier to deal with tables or indexes of large sizes. It also is transparent to applications so the SQL that runs against a non-partitioned object will run as-is against a partitioned object. Some of the key features include:
- Partition Exchange Load – Data can be loaded “out of line” and exchanged into a partitioned table.
- Local Indexes – It takes much less time to build local indexes than global indexes.
- Compression – Can be applied at the segment level so it’s possible to have a mix of compressed and non-compressed partitions.
- Segment Moves/Rebuilds/Truncates/Drops – Each partition (or subpartition) is a segment and can be operated on individually and independently of the other partitions in the table.
- Information Lifecycle Management (ILM) – Partitioning allows implementation of an ILM strategy.
I’d classify partitioning as a “must have” for Oracle data warehouses for both the performance and manageability reasons described above. Partitioning should lower query response time as well as resource utilization do to “smart” data access (only go after the data the query needs). There are additional partitioning design patterns as well and the Oracle documentation contains descriptions of them as well as examples.
Oracle Documentation References:
The first thing that comes to most people’s mind when database table compression is mentioned is the savings it yields in terms of disk space. While reducing the footprint of data on disk is relevant, I would argue it is the lesser of the benefits for data warehouses. Disk capacity is very cheap and generally plentiful, however, disk bandwidth (scan speed) is proportional to the number of spindles, no mater what the disk capacity and thus is more expensive. Table compression reduces the footprint on the disk drives that a given data set occupies so the amount of physical data that must be read off the disk platters is reduced when compared to the uncompressed version. For example, if 4000 GB of raw data can compress to 1000 GB, it can be read off the same disk drives 4X as fast because it is reading and transferring 1/4 of the data off the spindles (relative to the uncompressed size). Likewise, table compression allows for the database buffer cache to contain more data without having to increase the memory allocation because more rows can be stored in a compressed block/page compared to an uncompressed block/page.
Row major table compression comes in two flavors with the Oracle database: BASIC and OLTP. In 11.1 these were also known by the key phrases COMPRESS or COMPRESS FOR DIRECT_LOAD OPERATIONS and COMPRESS FOR ALL OPERATIONS. The BASIC/DIRECT_LOAD compression has been part of the Oracle database since version 9 and ALL OPERATIONS/OLTP compression was introduced in 11.1 with the Advanced Compression option.
Oracle row major table compression works by storing the column values for a given block in a symbol table at the beginning of the block. The more repeated values per block, even across columns, the better the compression ratio. Sorting data can increase the compression ratio as ordering the data will generally allow more repeat values per block. Specific compression ratios and gains from sorting data are very data dependent but compression ratios are generally between 2x and 4x.
Compression does add some CPU overhead when direct path loading data, but there is no measurable performance overhead when reading data as the Oracle database can operate on compressed blocks directly without having to first uncompress the block. The additional CPU required when bulk loading data is generally well worth the down wind gains for data warehouses. This is because most data in a well designed data warehouse is write once, read many times. Insert only and infrequently modified tables are ideal candidates for BASIC compression. If the tables have significant DML performed against them, then OLTP compression would be advised (or no compression).
Given that most Oracle data warehouses that I have seen are constrained by I/O bandwidth (see Balanced Hardware Configuration) it is highly recommended to leverage compression so the logical table scan rate can increase proportionally to the compression ratio. This will result in faster table and partition scans on the same hardware.
Oracle Documentation References:
- Performance Tuning Guide: Table Compression
- VLDB and Partitioning Guide: Partitioning and Table Compression
- Guidelines for Managing Tables: Consider Using Table Compression
- Table Compression in Oracle9i Release 2: A Performance Analysis
- Table Compression in Oracle Database 10g Release 2
- Advanced Compression with Oracle Database 11g Release 2
[back to Introduction]
If you want to build a house that will stand the test of time, you need to build on a solid foundation. The same goes for architecting computer systems that run databases. If the underlying hardware is not sized appropriately it will likely lead to people blaming software. All too often I see data warehouse systems that are poorly architected for the given workload requirements. I frequently tell people, “you can’t squeeze blood from a turnip“, meaning if the hardware resources are not there for the software to use, how can you expect the software to scale?
Undersizing data warehouse systems has become an epidemic with open platforms – platforms that let you run on any brand and configuration of hardware. This problem has been magnified over time as the size of databases have grown significantly, and generally outpacing the experience of those managing them. This has caused the “big three” database vendors to come up with suggested or recommended hardware configurations for their database platforms:
- Oracle: Optimized Warehouse Initiative
- Microsoft: SQL Server Fast Track Data Warehouse
- IBM: Balanced Configuration Unit (BCU)
Simply put, the reasoning behind those initiatives was to help customers architect systems that are well balanced and sized appropriately for the size of their data warehouse.
Balanced Hardware Configurations
The foundation for a well performing data warehouse (or any system for that matter) is the hardware that it runs on. There are three main hardware resources to consider when sizing your data warehouse hardware. Those are:
- Number of CPUs
- Number of storage devices (HDDs or SSDs)
- I/O bandwidth between CPUs and storage devices
NB: I’ve purposely left off memory (RAM) as most systems are pretty well sized at 2GB or 4GB per CPU core these days.
A balanced system has the following characteristics:
As you can see, each of the three components are sized proportionally to each other. This allows for the max system throughput capacity as no single resource will become the bottleneck before any other. This was one of the critical design decisions that went into the Oracle Database Machine.
Most DBAs and System Admins know what the disk capacity numbers are for their systems, but when it comes to I/O bandwidth or scan rates, most are unaware of what the system is capable of in theory, let alone in practice. Perhaps I/O bandwidth utilization should be included in the system metrics that are collected for your databases. You do collect system metrics, right?
There are several “exchanges” that data must flow through from storage devices to host CPUs, many of which could become bottlenecks. Those include:
- Back-end Fibre Channel loops (the fibre between the drive shelves and the storage array server processor)
- Front-end Fibre Channel ports
- Storage array server processors (SP)
- Host HBAs
One should understand the throughput capacity of each of these components to ensure that one (or more) of them do not restrict the flow of data to the CPUs prematurely.
Unbalanced Hardware Configurations
All too frequently systems are not architected as balanced systems and the system ends up being constrained in one of the following three scenarios:
From the production systems that I have seen, the main deficiency is in I/O bandwidth (both I/O Channel and HDD). I believe there are several reasons for this. First, too many companies capacity plan for their data warehouse based on the size the data occupies on disk alone. That is, they purchase the number of HDDs for the system based on the drive capacity, not on the I/O bandwidth requirement. Think of it like this: If you were to purchase 2 TB of mirrored disk capacity (4 TB total) would you rather purchase 28 x 146 GB drives or 14 x 300 GB drives (or even 4 x 1 TB drives)? You may ask: Well, what is the difference (other than price); in each case you have the same net capacity, correct? Indeed, both configurations do have the same capacity, but I/O bandwidth (how fast you can read data off the HDDs) is proportional to the number of HDDs, not the capacity. Thus it should be slightly obvious then that 28 HDDs can deliver 2X the disk I/O bandwidth that 14 HDDs can. This means that it will take 2X as long to read the same amount of data off of 14 HDDs as 28 HDDs.
Unfortunately what tends to happen is that the bean counter types will see only two things:
- The disk capacity (space) you want to purchase (or the capacity that is required)
- The price per MB/GB/TB
This is where someone worthy of the the title systems architect needs to stand up and explain the concept of I/O bandwidth and the impact it has on data warehouse performance (your systems architect does know this, correct?). This is generally a difficult discussion because I/O bandwidth is not a line item on a purchase order, it is a derived metric that requires both thought and engineering (which means someone had to do some thinking about the requirements for this system!).
When sizing the hardware for your data warehouse consider your workload and understand following (and calculate numbers for them!):
- What rate (in MB/GB per second) can the CPUs consume data?
- What rate can storage devices produce data (scan rate)?
- What rate can the data be delivered from the storage array(s) to the host HBAs?
If you are unable to answer these questions in theory then you need to sit down and do some calculations. Then you need to use some micro benchmarks (like Oracle ORION) and prove out those calculations. This will give you the “speed limit” and an metric by which you can measure your database workload against. All computer systems much obey the laws of physics! There is no way around that.
Kevin Closson has several good blog posts on related topics including:
- SAN Admins: Please Give Me As Much Capacity From As Few Spindles As Possible!
- Hard Drives Are Arcane Technology. So Why Can’t I Realize Their Full Bandwidth Potential?
as well as numerous others.
Oracle Documentation References:
At the 2009 Oracle OpenWorld Unconference back in October I lead a chalk and talk session entitled The Core Performance Fundamentals Of Oracle Data Warehousing. Since this was a chalk and talk I spared the audience any powerpoint slides but I had several people request that make it into a presentation so they could share it with others. After some thought, I decided that a series of blog posts would probably be a better way to share this information, especially since I tend to use slides as a speaking outline, not a condensed version of a white paper. This will be the first of a series of posts discussing what I consider to be the key features and technologies behind well performing Oracle data warehouses.
As an Oracle database performance engineer who has done numerous customer data warehouse benchmarks and POCs over the past 5+ years, I’ve seen many data warehouse systems that have been plagued with problems on nearly every DBMS commonly used in data warehousing. Interestingly enough, many of these systems were facing many of the same problems. I’ve compiled a list of topics that I consider to be key features and/or technologies for Oracle data warehouses:
Core Performance Fundamental Topics
- Balanced Hardware Configuration
- Table Compression
- Parallel Execution
- Data Loading
- Row vs. Set Processing
- Indexing and Materalized Views
In the upcoming posts, I’ll deep dive into each one of these topics discussing why these areas are key for a well performing Oracle data warehouse. Stay tuned…
The Solaris x86-64 port of Oracle Database 11g Release 2 can now be downloaded from OTN. Get it while it’s hot!