The Core Performance Fundamentals Of Oracle Data Warehousing – Parallel Execution
[back to Introduction]
Leveraging Oracle’s Parallel Execution (PX) in your Oracle data warehouse is probably the most important feature/technology one can use to speed up operations on large data sets. PX is not, however, “go fast” magic pixi dust for any old operation (if thats what you think, you probably don’t understand the parallel computing paradigm). With Oracle PX, a large task is broken up into smaller parts, sub-tasks if you will, and each sub-task is then worked on in parallel. The goal of Oracle PX: divide and conquer. This allows a significant amount of hardware resources to be engaged in solving a single problem and is what allows the Oracle database to scale up and out when working with large data sets.
I though I’d touch on some basics and add my observations but this is by far not an exhaustive write up on Oracle’s Parallel Execution. There is an entire chapter in the Oracle Database documentation on PX as well as several white papers. I’ve listed all these in the Resources section at the bottom of this post. Read them, but as always, feel free to post questions/comments here. Discussion adds great value.
A Basic Example of Parallel Execution
Consider a simple one table query like the one below.

You can see that the PX Coordinator (also known as the Query Coordinator or QC) breaks up the “work” into several chunks and those chunks are worked on by the PX Server Processes. The technical term for the chunk a PX Server Process works on is called a granule. Granules can either be block-based or partition-based.
When To Use Parallel Execution
PX is a key component in data warehousing as that is where large data sets usually exist. The most common operations that use PX are queries (SELECTs) and data loads (INSERTs or CTAS). PX is most commonly controlled by using the PARALLEL attribute on the object, although it can be controlled by hints or even Oracle’s Database Resource Manager. If you are not using PX in your Oracle data warehouse than you are probably missing out on a shedload of performance opportunity.
When an object has its PARALLEL attribute set or the PARALLEL hint is used queries will leverage PX, but to leverage PX for DML operations (INSERT/DELETE/UPDATE) remember to alter your session by using the command:
alter session [enable|force] parallel dml;
Do Not Fear Parallel Execution
Since Oracle’s PX is designed to take advantage of multiple CPUs (or CPU cores) at a time, it can leverage significant hardware resources, if available. From my experiences in talking with Oracle DBAs, the ability for PX to do this scares them. This results in DBAs implementing a relatively small degree of parallelism (DOP) for a system that could possibly support a much higher level (based on #CPUs). Often times though, the system that PX is being run on is not a balanced system and frequently has much more CPU power than disk and channel bandwidth, so data movement from disk becomes the bottleneck well before the CPUs are busy. This results in many statements like “Parallel Execution doesn’t work” or similar because the user/DBA isn’t observing a decrease in execution time with more parallelism. Bottom line: if the hardware resources are not available, the software certainly can not scale.
Just for giggles (and education), here is a snippet from top(1) from a node from an Oracle Database Machine running a single query (across all 8 database nodes) at DOP 256.
top - 20:46:44 up 5 days, 3:48, 1 user, load average: 36.27, 37.41, 35.75 Tasks: 417 total, 43 running, 373 sleeping, 0 stopped, 1 zombie Cpu(s): 95.6%us, 1.6%sy, 0.0%ni, 2.2%id, 0.0%wa, 0.2%hi, 0.4%si, 0.0%st Mem: 74027752k total, 21876824k used, 52150928k free, 440692k buffers Swap: 16771852k total, 0k used, 16771852k free, 13770844k cached USER PID PR NI VIRT SHR RES S %CPU %MEM TIME+ COMMAND oracle 16132 16 0 16.4g 5.2g 5.4g R 63.8 7.6 709:33.02 ora_p011_orcl oracle 16116 16 0 16.4g 4.9g 5.1g R 60.9 7.2 698:35.63 ora_p003_orcl oracle 16226 15 0 16.4g 4.9g 5.1g R 59.9 7.2 702:01.01 ora_p028_orcl oracle 16110 16 0 16.4g 4.9g 5.1g R 58.9 7.2 697:20.51 ora_p000_orcl oracle 16122 15 0 16.3g 4.9g 5.0g R 56.9 7.0 694:54.61 ora_p006_orcl
(Quite the TIME+ column there, huh!)
Summary
In this post I’ve been a bit light on the technicals of PX, but that is mostly because 1) this is a fundamentals post and 2) there is a ton of more detail in the referenced documentation and I really don’t feel like republishing what already exists. Bottom line, Oracle Parallel Execution is a must for scaling performance in your Oracle data warehouse. Take the time to understand how to leverage it to maximize performance in your environment and feel free to start a discussion here if you have questions.
References
- Concepts: Parallel Execution
- VLDB and Partitioning Guide: Using Parallel Execution
- Parallelism and Scalability for Data Warehousing
- Oracle Database Parallel Execution Fundamentals in Oracle 11g Release 2
- Parallel Execution and Workload Management
Greg,
thank you for this short article about Parallel Execution! Regarding “Do not fear Parallel Execution”: What do you think resp. what is your experience about the feature Automatic DOP introduced in 11g? Isn’t that something to reduce the risk of overloading the system with too much or inappropriate Parallel Execution?
@Uwe Hesse
Auto DOP is a feature to simplify the decision of “how parallel” to set things as well as to allow the system to achieve the best query execution time as well as query throughput without overloading the system (go fast, but stay in control). Since this is a new feature in 11.2, I’ve only a couple data points from customer POCs but it looks good thus far. I think it’s a great feature to evaluate and consider.
Thanks for this much needed series, and thanks for directing me towards these pages from the OTN forum.
best regards
Fahd
@zvika
I’ll have to look into creating a test case using MTI with error logging to see how it behaves.
thanks a lot greg
i will wait for your input .
Thanks for the great article, I was actually reluctant to use parallel execution as it might slow other queries. But now not.
Greg,
Nice article. One of the area where I see lots of challanges in PQ is multiple user adaptation (when you have multiple load streams running concurrently ).
I haven’t get my feet wet on 11g yet but on 10g , if I enable MUA then I loose consistency in elapsed time as Oracle starts downgrading PQ if one job is running with Max DOP. If I disable it I run a risk of overloading/crashing system.
What do you recommend to achive consistent load times using PQ? Can you make subsequent jobs queue up until one is completed?
Thanks.
@Dharmendra
Two options depending on the version:
- database resource manager using active session limit and/or max degree of parallelism
- in >=11.2 auto DOP and parallel statement queuing
Hi,
we are using AUTO DOP and overall it seems to be working fine however, we do have situations where Oracle decides to parallelize operations which run much faster in serial. Do you know of any settings/parameters that we could tweak in order for Oracle to stop doing this?
In my mind, Oracle is choosing a parallel plan because it estimates that it has a lower cost however when the plan is actually executed things dont go as planned. What could be causing the optimizer to cost things in this way?
Thanks,
Roberto.
You can control this via the PARALLEL_MIN_TIME_THRESHOLD parameter.
Without seeing the serial and parallel plans along with the execution stats of each, I’d be only guessing as to why this is happening.
Greg,
Thanks for replying. Is it not possible to get execution stats using dbms_xplan.display_cursor? I have statistics_level=ALL yet dbms_xplan is still showing:
- Warning: basic plan statistics not available. These are only collected when:
* hint ‘gather_plan_statistics’ is used for the statement or
* parameter ‘statistics_level’ is set to ‘ALL’, at session or system level
Roberto.
It is possible. See http://structureddata.org/2007/11/21/troubleshooting-bad-execution-plans/.
But much better would be the SQL Monitor output. See http://structureddata.org/2008/01/06/oracle-11g-real-time-sql-monitoring-using-dbms_sqltunereport_sql_monitor/
Greg,
That was a fast reply, thanks!
If I have statistics_level=ALL I should not have to specify the /*+ gather_plan_statistics */ hint, right?
Roberto.
Either one should work.
just as an FYI, I can only get execution statistics if the statement is still in the cursor cache. If I try pulling it from AWR then I get the following warning:
Warning: basic plan statistics not available. These are only collected when:
* hint ‘gather_plan_statistics’ is used for the statement or
* parameter ‘statistics_level’ is set to ‘ALL’, at session or system level
Correct – that’s the way it works.
[...] excellent scenario where parallel degree execution makes a lot of sense – data warehouses. (blog). One of my favorite lines, Greg confirms what many of us already know “PX is not, however, [...]
Could you please correct the links mentioned at the bottom of the page, now error 404 not found is returned.
I’ve updated the links. Thanks.
Greg,
have you done the MTI test case? I cannot get a MTI to do parallel inserts using AutoDOP. When I change the statement to a standard insert, it does parallelise the insert. Have you made similar observations?
Thanks!
Chris