[back to Introduction]
Leveraging Oracle’s Parallel Execution (PX) in your Oracle data warehouse is probably the most important feature/technology one can use to speed up operations on large data sets. PX is not, however, “go fast” magic pixi dust for any old operation (if thats what you think, you probably don’t understand the parallel computing paradigm). With Oracle PX, a large task is broken up into smaller parts, sub-tasks if you will, and each sub-task is then worked on in parallel. The goal of Oracle PX: divide and conquer. This allows a significant amount of hardware resources to be engaged in solving a single problem and is what allows the Oracle database to scale up and out when working with large data sets.
I though I’d touch on some basics and add my observations but this is by far not an exhaustive write up on Oracle’s Parallel Execution. There is an entire chapter in the Oracle Database documentation on PX as well as several white papers. I’ve listed all these in the Resources section at the bottom of this post. Read them, but as always, feel free to post questions/comments here. Discussion adds great value.
A Basic Example of Parallel Execution
Consider a simple one table query like the one below.
You can see that the PX Coordinator (also known as the Query Coordinator or QC) breaks up the “work” into several chunks and those chunks are worked on by the PX Server Processes. The technical term for the chunk a PX Server Process works on is called a granule. Granules can either be block-based or partition-based.
When To Use Parallel Execution
PX is a key component in data warehousing as that is where large data sets usually exist. The most common operations that use PX are queries (SELECTs) and data loads (INSERTs or CTAS). PX is most commonly controlled by using the PARALLEL attribute on the object, although it can be controlled by hints or even Oracle’s Database Resource Manager. If you are not using PX in your Oracle data warehouse than you are probably missing out on a shedload of performance opportunity.
When an object has its PARALLEL attribute set or the PARALLEL hint is used queries will leverage PX, but to leverage PX for DML operations (INSERT/DELETE/UPDATE) remember to alter your session by using the command:
alter session [enable|force] parallel dml;
Do Not Fear Parallel Execution
Since Oracle’s PX is designed to take advantage of multiple CPUs (or CPU cores) at a time, it can leverage significant hardware resources, if available. From my experiences in talking with Oracle DBAs, the ability for PX to do this scares them. This results in DBAs implementing a relatively small degree of parallelism (DOP) for a system that could possibly support a much higher level (based on #CPUs). Often times though, the system that PX is being run on is not a balanced system and frequently has much more CPU power than disk and channel bandwidth, so data movement from disk becomes the bottleneck well before the CPUs are busy. This results in many statements like “Parallel Execution doesn’t work” or similar because the user/DBA isn’t observing a decrease in execution time with more parallelism. Bottom line: if the hardware resources are not available, the software certainly can not scale.
Just for giggles (and education), here is a snippet from top(1) from a node from an Oracle Database Machine running a single query (across all 8 database nodes) at DOP 256.
top - 20:46:44 up 5 days, 3:48, 1 user, load average: 36.27, 37.41, 35.75 Tasks: 417 total, 43 running, 373 sleeping, 0 stopped, 1 zombie Cpu(s): 95.6%us, 1.6%sy, 0.0%ni, 2.2%id, 0.0%wa, 0.2%hi, 0.4%si, 0.0%st Mem: 74027752k total, 21876824k used, 52150928k free, 440692k buffers Swap: 16771852k total, 0k used, 16771852k free, 13770844k cached USER PID PR NI VIRT SHR RES S %CPU %MEM TIME+ COMMAND oracle 16132 16 0 16.4g 5.2g 5.4g R 63.8 7.6 709:33.02 ora_p011_orcl oracle 16116 16 0 16.4g 4.9g 5.1g R 60.9 7.2 698:35.63 ora_p003_orcl oracle 16226 15 0 16.4g 4.9g 5.1g R 59.9 7.2 702:01.01 ora_p028_orcl oracle 16110 16 0 16.4g 4.9g 5.1g R 58.9 7.2 697:20.51 ora_p000_orcl oracle 16122 15 0 16.3g 4.9g 5.0g R 56.9 7.0 694:54.61 ora_p006_orcl
(Quite the TIME+ column there, huh!)
In this post I’ve been a bit light on the technicals of PX, but that is mostly because 1) this is a fundamentals post and 2) there is a ton of more detail in the referenced documentation and I really don’t feel like republishing what already exists. Bottom line, Oracle Parallel Execution is a must for scaling performance in your Oracle data warehouse. Take the time to understand how to leverage it to maximize performance in your environment and feel free to start a discussion here if you have questions.
[back to Introduction]
Partitioning is an essential performance feature for an Oracle data warehouse because partition elimination (or partition pruning) generally results in the elimination of a significant amount of table data to be scanned. This results in a need for less system resources and improved query performance. Someone once told me “the fastest I/O is the one that never happens.” This is precisely the reason that partitioning is a must for Oracle data warehouses – it’s a huge I/O eliminator. I frequently refer to partition elimination as the anti-index. An index is used to find a small amount data that is required; partitioning is used to eliminate vasts amounts of data that is not required.
Main Uses For Partitioning
I would classify the main reasons to use partitioning in your Oracle data warehouse into these four areas:
- Data Elimination
- Partition-Wise Joins
- Manageability (Partition Exchange Load, Local Indexes, etc.)
- Information Lifecycle Management (ILM)
The most common partitioning design pattern found in Oracle data warehouses is to partition the fact tables by range (or interval) on the event date/time column. This allows for partition elimination of all the data not in the desired time window in queries. For example: If I have a fact table that contains point of sale (POS) data, each line item for a given transaction has a time stamp of when the item was scanned. Let’s say this value is stored in column EVENT_TS which is a DATE or TIMESTAMP data type. In most cases it would make sense to partition by range on EVENT_TS using one day partitions. This means every query that uses a predicate filter on EVENT_TS (which should be nearly every one) can eliminate significant amounts of data that is not required to satisfy the query predicate. If you want to look at yesterday’s sales numbers, there is no need to bring back rows from last week or last month!
Depending on the schema design of your data warehouse you may also chose to subpartition a table. This allows one to further segment a table to allow for even more data elimination or it can allow for partition-wise joins which allow for reduced usage of CPU and memory resources by minimizing the amount of data exchanged between parallel execution server processes. In third normal form (3NF) schemas it is very beneficial to use hash partitioning or subpartitioning to allow for partition-wise joins (see Oracle Parallel Execution: Interconnect Myths And Misunderstandings) for this exact reason. Dimensional models (star schemas) may also benefit from hash subpartitioning and partition-wise joins. Generally it is best to hash subpartition on a join key column to a very large dimension, like CUSTOMER, so that a partition-wise join will be used between the fact table and the large dimension table.
Managing large objects can be challenging for a variety of reasons which is why Oracle Partitioning allows for many operations to be done at a global or partition (or subpartition) level. This makes it much easier to deal with tables or indexes of large sizes. It also is transparent to applications so the SQL that runs against a non-partitioned object will run as-is against a partitioned object. Some of the key features include:
- Partition Exchange Load – Data can be loaded “out of line” and exchanged into a partitioned table.
- Local Indexes – It takes much less time to build local indexes than global indexes.
- Compression – Can be applied at the segment level so it’s possible to have a mix of compressed and non-compressed partitions.
- Segment Moves/Rebuilds/Truncates/Drops – Each partition (or subpartition) is a segment and can be operated on individually and independently of the other partitions in the table.
- Information Lifecycle Management (ILM) – Partitioning allows implementation of an ILM strategy.
I’d classify partitioning as a “must have” for Oracle data warehouses for both the performance and manageability reasons described above. Partitioning should lower query response time as well as resource utilization do to “smart” data access (only go after the data the query needs). There are additional partitioning design patterns as well and the Oracle documentation contains descriptions of them as well as examples.
Oracle Documentation References:
[back to Introduction]
If you want to build a house that will stand the test of time, you need to build on a solid foundation. The same goes for architecting computer systems that run databases. If the underlying hardware is not sized appropriately it will likely lead to people blaming software. All too often I see data warehouse systems that are poorly architected for the given workload requirements. I frequently tell people, “you can’t squeeze blood from a turnip“, meaning if the hardware resources are not there for the software to use, how can you expect the software to scale?
Undersizing data warehouse systems has become an epidemic with open platforms – platforms that let you run on any brand and configuration of hardware. This problem has been magnified over time as the size of databases have grown significantly, and generally outpacing the experience of those managing them. This has caused the “big three” database vendors to come up with suggested or recommended hardware configurations for their database platforms:
- Oracle: Optimized Warehouse Initiative
- Microsoft: SQL Server Fast Track Data Warehouse
- IBM: Balanced Configuration Unit (BCU)
Simply put, the reasoning behind those initiatives was to help customers architect systems that are well balanced and sized appropriately for the size of their data warehouse.
Balanced Hardware Configurations
The foundation for a well performing data warehouse (or any system for that matter) is the hardware that it runs on. There are three main hardware resources to consider when sizing your data warehouse hardware. Those are:
- Number of CPUs
- Number of storage devices (HDDs or SSDs)
- I/O bandwidth between CPUs and storage devices
NB: I’ve purposely left off memory (RAM) as most systems are pretty well sized at 2GB or 4GB per CPU core these days.
A balanced system has the following characteristics:
As you can see, each of the three components are sized proportionally to each other. This allows for the max system throughput capacity as no single resource will become the bottleneck before any other. This was one of the critical design decisions that went into the Oracle Database Machine.
Most DBAs and System Admins know what the disk capacity numbers are for their systems, but when it comes to I/O bandwidth or scan rates, most are unaware of what the system is capable of in theory, let alone in practice. Perhaps I/O bandwidth utilization should be included in the system metrics that are collected for your databases. You do collect system metrics, right?
There are several “exchanges” that data must flow through from storage devices to host CPUs, many of which could become bottlenecks. Those include:
- Back-end Fibre Channel loops (the fibre between the drive shelves and the storage array server processor)
- Front-end Fibre Channel ports
- Storage array server processors (SP)
- Host HBAs
One should understand the throughput capacity of each of these components to ensure that one (or more) of them do not restrict the flow of data to the CPUs prematurely.
Unbalanced Hardware Configurations
All too frequently systems are not architected as balanced systems and the system ends up being constrained in one of the following three scenarios:
From the production systems that I have seen, the main deficiency is in I/O bandwidth (both I/O Channel and HDD). I believe there are several reasons for this. First, too many companies capacity plan for their data warehouse based on the size the data occupies on disk alone. That is, they purchase the number of HDDs for the system based on the drive capacity, not on the I/O bandwidth requirement. Think of it like this: If you were to purchase 2 TB of mirrored disk capacity (4 TB total) would you rather purchase 28 x 146 GB drives or 14 x 300 GB drives (or even 4 x 1 TB drives)? You may ask: Well, what is the difference (other than price); in each case you have the same net capacity, correct? Indeed, both configurations do have the same capacity, but I/O bandwidth (how fast you can read data off the HDDs) is proportional to the number of HDDs, not the capacity. Thus it should be slightly obvious then that 28 HDDs can deliver 2X the disk I/O bandwidth that 14 HDDs can. This means that it will take 2X as long to read the same amount of data off of 14 HDDs as 28 HDDs.
Unfortunately what tends to happen is that the bean counter types will see only two things:
- The disk capacity (space) you want to purchase (or the capacity that is required)
- The price per MB/GB/TB
This is where someone worthy of the the title systems architect needs to stand up and explain the concept of I/O bandwidth and the impact it has on data warehouse performance (your systems architect does know this, correct?). This is generally a difficult discussion because I/O bandwidth is not a line item on a purchase order, it is a derived metric that requires both thought and engineering (which means someone had to do some thinking about the requirements for this system!).
When sizing the hardware for your data warehouse consider your workload and understand following (and calculate numbers for them!):
- What rate (in MB/GB per second) can the CPUs consume data?
- What rate can storage devices produce data (scan rate)?
- What rate can the data be delivered from the storage array(s) to the host HBAs?
If you are unable to answer these questions in theory then you need to sit down and do some calculations. Then you need to use some micro benchmarks (like Oracle ORION) and prove out those calculations. This will give you the “speed limit” and an metric by which you can measure your database workload against. All computer systems much obey the laws of physics! There is no way around that.
Kevin Closson has several good blog posts on related topics including:
- SAN Admins: Please Give Me As Much Capacity From As Few Spindles As Possible!
- Hard Drives Are Arcane Technology. So Why Can’t I Realize Their Full Bandwidth Potential?
as well as numerous others.
Oracle Documentation References:
At the 2009 Oracle OpenWorld Unconference back in October I lead a chalk and talk session entitled The Core Performance Fundamentals Of Oracle Data Warehousing. Since this was a chalk and talk I spared the audience any powerpoint slides but I had several people request that make it into a presentation so they could share it with others. After some thought, I decided that a series of blog posts would probably be a better way to share this information, especially since I tend to use slides as a speaking outline, not a condensed version of a white paper. This will be the first of a series of posts discussing what I consider to be the key features and technologies behind well performing Oracle data warehouses.
As an Oracle database performance engineer who has done numerous customer data warehouse benchmarks and POCs over the past 5+ years, I’ve seen many data warehouse systems that have been plagued with problems on nearly every DBMS commonly used in data warehousing. Interestingly enough, many of these systems were facing many of the same problems. I’ve compiled a list of topics that I consider to be key features and/or technologies for Oracle data warehouses:
Core Performance Fundamental Topics
- Balanced Hardware Configuration
- Table Compression
- Parallel Execution
- Data Loading
- Row vs. Set Processing
- Indexing and Materalized Views
In the upcoming posts, I’ll deep dive into each one of these topics discussing why these areas are key for a well performing Oracle data warehouse. Stay tuned…
A few weeks ago I read Curt Monash’s report on interpreting the results of data warehouse proofs-of-concept (POCs) and I have to say, I’m quite surprised that this topic hasn’t been covered more by analysts in the data warehousing space. I understand that analysts are not database performance engineers, but where do they think that the performance claims of 10x to 100x or more come from? Do they actually investigate these claims or just report on them? I can not say that I have ever seen any database analyst offer any technical insight into these boasts of performance. If some exist be sure to leave a comment and point me to them.
Oracle Exadata Performance Architect Kevin Closson has blogged about a 485x performance increase of Oracle Exadata vs. Oracle Exadata and his follow-up post to explain exactly where the 485x performance gain comes from gave me the nudge to finish this post that had been sitting in my drafts folder since I first read Curt’s post.
Customer Bechmarketing Claims
I thought I would compile a list of what the marketing folks at other database vendors are saying about the performance of their products. Each of these statements have been taken from the given vendor’s website.
- Netezza: 10-100 times faster than traditional solutions…but it is not uncommon to see performance differences as large as 200x to even 400x or more when compared to existing Oracle systems
- Greenplum: often 10 to 100 times faster than traditional solutions
- DATAllegro: 10-100x performance over traditional platforms
- Vertica: Performs 30x-200x faster than other solutions
- ParAccel: 20X – 200X performance gains
- EXASolution: can perform up to 100 times faster than with traditional databases
- Kognitio WX2: Tests have shown to out-perform other database / data warehouse solutions by 10-60 times
Certainly seems these vendors are a positioning themselves against traditional database solutions, whatever that means. And differences as large as 400x against Oracle? What is it exactly they are comparing?
Investigative Research On Netezza’s Performance Claims
Using my favorite Internet search engine I came across this presentation by Netezza dated October 2007. On slide 21 Netezza is comparing an NPS 8150 (112 SPU, up to 4.5 TB of user data) server to IBM DB2 UDB on a p680 with 12 CPUs (the existing solution). Not being extremely familiar with the IBM hardware mentioned, I thought I’d research to see exactly what an IBM p680 server consists of. The first link in my search results took me to here where the web page states:
The IBM eServer pSeries 680 has been withdrawn from the market, effective March 28, 2003.
Searching a bit more I came across this page which states that the 12 CPUs in the pSeries 680 are RS64 IV microprocessors. According to Wikipedia the “RS64-IV or Sstar was introduced in 2000 at 600 MHz, later increased to 750 MHz”. Given that at best, the p680 had 12 CPUs running at 750 MHz and the NPS 8150 had 112 440GX PowerPC processors I would give the compute advantage to Netezza by a significant margin. I guess it is cool to brag how your most current hardware beat up on some old used and abused server who has already been served its end-of-life notice. I found it especially intriguing that Netezza is boasting about beating out an IBM p680 server that has been end-of-lifed more than four years prior to the presentation’s date. Perhaps they don’t have any more recent bragging to do?
Going back one slide to #20 you will notice a comparison of Netezza and Oracle. Netezza clearly states they used a NPS 8250 (224 SPUs, up to 9 TB of user data) against Oracle 10g RAC running on Sun/EMC. Well ok…Sun/EMC what??? Obviously there were at least 2 Sun servers, since Oracle 10g RAC is involved, but they don’t mention the server models at all, nor the storage, nor the storage connectivity to the hosts. Was this two or more Sun Netra X1s or what??? Netezza boasts a 449x improvement in a “direct comparison on one day’s worth of data”. What exactly is being compared is up to the imagination. I guess this could be one query or many queries, but the marketeers intentionally fail to mention. They don’t even mention the data set size being compared. Given that Netezza can read data off the 224 drives at 60-70 MB/s, the NPS 8250 has a total scan rate of over 13 GB/s. I can tell you first hand that there are very few Sun/EMC solutions that are configured to support 13 GB/s of I/O bandwidth. Most configurations of that vintage probably don’t support 1/10th of that I/O bandwidth (1.3 GB/s).
Here are a few more comparisons that I have seen in Netezza presentations:
- NPS 8100 (112 SPUs/4.5 TB max) vs. SAS on Sun E5500/6 CPUs/6GB RAM
- NPS 8100 (112 SPUs/4.5 TB max) vs. Oracle 8i on Sun E6500/12 CPUs/8 GB RAM
- NPS 8400 (448 SPUs/18 TB max) vs. Oracle on Sun (exact hardware not mentioned)
- NPS 8100 (112 SPUs/4.5 TB max) vs. IBM SP2 (database not mentioned)
- NPS 8150z (112 SPUs/5.5 TB max) vs. Oracle 9i on Sun/8 CPUs
- NPS 8250z (224 SPUs/11 TB max) vs. Oracle 9i on Sun/8 CPUs
As you can see, Netezza has a way of finding the oldest hardware around and then comparing it to its latest, greatest NPS. Just like Netezza slogan, [The Power to ]Question Everything™, I suggest you question these benchmarketing reports. Database software is only as capable as the hardware it runs on and when Netezza targets the worst performing and oldest systems out there, they are bound to get some good marketing numbers. If they compete against the latest, greatest database software running on the latest, greatest hardware, sized competitively for the NPS being used, the results are drastically different. I can vouch for that one first hand having done several POCs against Netezza.
One Benchmarketing Claim To Rule Them All
Now, one of my favorite benchmarketing reports is one from Vertica. Michael Stonebraker’s blog post on customer benchmarks contains the following table:
Take a good look at the Query 2 results. Vertica takes a query running in the current row store from running in 4.5 hours (16,200 seconds) to 1 second for a performance gain of 16,200x. Great googly moogly batman, that is reaching ludicrous speed. Heck, who needs 100x or 400x when you do 16,200x. That surely warrants an explanation of the techniques involved there. It’s much, much more than simply column store vs. row store. It does raise the question (at least to me): why Vertica doesn’t run every query in 1 second. I mean, come on, why doesn’t that 19 minute row store query score better than a 30x gain? Obviously there is a bit of the magic pixie dust going on here with, what I would refer to as “creative solutions” (in reality it is likely just a very well designed projection/materaizied view, but by showing the query and telling us how it was possible would make it less unimpressive [sic]).
What Is Really Going On Here
First of all, you will notice that not one of these benchmarketing claims is against a vendor run system. Each and every one of these claims are against existing customer systems. The main reason for this is that most vendors prohibit benchmark results being published with out prior consent from the vendor in the licensing agreement. Seems the creative types have found that taking the numbers from the existing, production system is not prohibited in the license agreement so they compare that to their latest, greatest hardware/software and execute or supervise the execution of a benchmark on their solution. Obviously this is a one sided apples to bicycles comparison, but quite favorable for bragging rights for the new guy.
I’ve been doing customer benchmarks and proof of concepts (POCs) for almost 5 years at Oracle. I can guarantee you that Netezza has never even come close to getting 10x-100x the performance over Oracle running on a competitive hardware platform. Now I can say that it is not uncommon for Oracle running on a balanced system to perform 10x to 1000x (ok, in extreme cases) over an existing poorly performing Oracle system. All it takes is to have a very unbalanced system with no I/O bandwidth, not be using parallel query, not use compression, poor or no use of partitioning and you have created a springboard for any vendor to look good.
One More Juicy Marketing Tidbit
While searching the Internet for creative marketing reports I have to admit that the crew at ParAccel probably takes the cake (and not in an impressive way). On one of their web pages they have these bullet points (plus a few more uninteresting ones):
- All operations are done in parallel (A non-parallel DBMS must scan all of the data sequentially)
- Adaptive compression makes disks faster…
Ok, so I can kinda, sorta see the point that a non-parallel DBMS must do something sequentially…not sure how else it would do it, but then again, I don’t know any enterprise database that is not capable of parallel operations. However, I’m going to need a bit of help on the second point there…how exactly does compression make disks faster? Disks are disks. Whether or not compression is involved has nothing to do with how fast a disk is. Perhaps they mean that compression can increase the logical read rate from a disk given that compression allows more data to be stored in the same “space” on the disk, but that clearly is not what they have written. Reminds me of DATAllegro’s faster-than-wirespeed claims on scan performance. Perhaps these marketing guys should have their numbers and wording validated by some engineers.
Do You Believe In Magic Or Word Games?
Creditable performance claims need to be accounted for and explained. Neil Raden from Hired Brains Research offers guidance for evaluating benchmarks and interpreting market messaging in his paper, Questions to Ask a Data Warehouse Appliance Vendor. I think Neil shares the same opinion of these silly benchmarketing claims. Give his paper a read.