The Core Performance Fundamentals Of Oracle Data Warehousing – Parallel Execution

[back to Introduction]

Leveraging Oracle’s Parallel Execution (PX) in your Oracle data warehouse is probably the most important feature/technology one can use to speed up operations on large data sets.  PX is not, however, “go fast” magic pixi dust for any old operation (if thats what you think, you probably don’t understand the parallel computing paradigm). With Oracle PX, a large task is broken up into smaller parts, sub-tasks if you will, and each sub-task is then worked on in parallel.  The goal of Oracle PX: divide and conquer.  This allows a significant amount of hardware resources to be engaged in solving a single problem and is what allows the Oracle database to scale up and out when working with large data sets.

I though I’d touch on some basics and add my observations but this is by far not an exhaustive write up on Oracle’s Parallel Execution.  There is an entire chapter in the Oracle Database documentation on PX as well as several white papers.  I’ve listed all these in the Resources section at the bottom of this post.  Read them, but as always, feel free to post questions/comments here.  Discussion adds great value.

A Basic Example of Parallel Execution

Consider a simple one table query like the one below.

You can see that the PX Coordinator (also known as the Query Coordinator or QC) breaks up the “work” into several chunks and those chunks are worked on by the PX Server Processes.  The technical term for the chunk a PX Server Process works on is called a granule.  Granules can either be block-based or partition-based.

When To Use Parallel Execution

PX is a key component in data warehousing as that is where large data sets usually exist.  The most common operations that use PX are queries (SELECTs) and data loads (INSERTs or CTAS).  PX is most commonly controlled by using the PARALLEL attribute on the object, although it can be controlled by hints or even Oracle’s Database Resource Manager.  If you are not using PX in your Oracle data warehouse than you are probably missing out on a shedload of performance opportunity.

When an object has its PARALLEL attribute set or the PARALLEL hint is used queries will leverage PX, but to leverage PX for DML operations (INSERT/DELETE/UPDATE) remember to alter your session by using the command:

alter session [enable|force] parallel dml;

Do Not Fear Parallel Execution

Since Oracle's PX is designed to take advantage of multiple CPUs (or CPU cores) at a time, it can leverage significant hardware resources, if available.  From my experiences in talking with Oracle DBAs, the ability for PX to do this scares them. This results in DBAs implementing a relatively small degree of parallelism (DOP) for a system that could possibly support a much higher level (based on #CPUs).  Often times though, the system that PX is being run on is not a balanced system and frequently has much more CPU power than disk and channel bandwidth, so data movement from disk becomes the bottleneck well before the CPUs are busy.  This results in many statements like "Parallel Execution doesn't work" or similar because the user/DBA isn't observing a decrease in execution time with more parallelism.  Bottom line:  if the hardware resources are not available, the software certainly can not scale.

Just for giggles (and education), here is a snippet from top(1) from a node from an Oracle Database Machine running a single query (across all 8 database nodes) at DOP 256.

top - 20:46:44 up 5 days,  3:48,  1 user,  load average: 36.27, 37.41, 35.75
Tasks: 417 total,  43 running, 373 sleeping,   0 stopped,   1 zombie
Cpu(s): 95.6%us,  1.6%sy,  0.0%ni,  2.2%id,  0.0%wa,  0.2%hi,  0.4%si,  0.0%st
Mem:  74027752k total, 21876824k used, 52150928k free,   440692k buffers
Swap: 16771852k total,        0k used, 16771852k free, 13770844k cached

USER       PID  PR  NI  VIRT  SHR  RES S %CPU %MEM    TIME+  COMMAND
oracle   16132  16   0 16.4g 5.2g 5.4g R 63.8  7.6 709:33.02 ora_p011_orcl
oracle   16116  16   0 16.4g 4.9g 5.1g R 60.9  7.2 698:35.63 ora_p003_orcl
oracle   16226  15   0 16.4g 4.9g 5.1g R 59.9  7.2 702:01.01 ora_p028_orcl
oracle   16110  16   0 16.4g 4.9g 5.1g R 58.9  7.2 697:20.51 ora_p000_orcl
oracle   16122  15   0 16.3g 4.9g 5.0g R 56.9  7.0 694:54.61 ora_p006_orcl

(Quite the TIME+ column there, huh!)

Summary

In this post I've been a bit light on the technicals of PX, but that is mostly because 1) this is a fundamentals post and 2) there is a ton of more detail in the referenced documentation and I really don't feel like republishing what already exists. Bottom line, Oracle Parallel Execution is a must for scaling performance in your Oracle data warehouse.  Take the time to understand how to leverage it to maximize performance in your environment and feel free to start a discussion here if you have questions.

References

20 comments

  1. Uwe Hesse

    Greg,
    thank you for this short article about Parallel Execution! Regarding “Do not fear Parallel Execution”: What do you think resp. what is your experience about the feature Automatic DOP introduced in 11g? Isn’t that something to reduce the risk of overloading the system with too much or inappropriate Parallel Execution?

  2. Greg Rahn

    @Uwe Hesse

    Auto DOP is a feature to simplify the decision of “how parallel” to set things as well as to allow the system to achieve the best query execution time as well as query throughput without overloading the system (go fast, but stay in control). Since this is a new feature in 11.2, I’ve only a couple data points from customer POCs but it looks good thus far. I think it’s a great feature to evaluate and consider.

  3. Kuldeep

    Thanks for the great article, I was actually reluctant to use parallel execution as it might slow other queries. But now not.

  4. Dharmendra

    Greg,

    Nice article. One of the area where I see lots of challanges in PQ is multiple user adaptation (when you have multiple load streams running concurrently ).
    I haven’t get my feet wet on 11g yet but on 10g , if I enable MUA then I loose consistency in elapsed time as Oracle starts downgrading PQ if one job is running with Max DOP. If I disable it I run a risk of overloading/crashing system.

    What do you recommend to achive consistent load times using PQ? Can you make subsequent jobs queue up until one is completed?

    Thanks.

  5. Greg Rahn

    @Dharmendra

    Two options depending on the version:
    - database resource manager using active session limit and/or max degree of parallelism
    - in >=11.2 auto DOP and parallel statement queuing

  6. Roberto

    Hi,
    we are using AUTO DOP and overall it seems to be working fine however, we do have situations where Oracle decides to parallelize operations which run much faster in serial. Do you know of any settings/parameters that we could tweak in order for Oracle to stop doing this?
    In my mind, Oracle is choosing a parallel plan because it estimates that it has a lower cost however when the plan is actually executed things dont go as planned. What could be causing the optimizer to cost things in this way?

    Thanks,
    Roberto.

  7. Roberto

    Greg,
    Thanks for replying. Is it not possible to get execution stats using dbms_xplan.display_cursor? I have statistics_level=ALL yet dbms_xplan is still showing:

    - Warning: basic plan statistics not available. These are only collected when:
    * hint ‘gather_plan_statistics’ is used for the statement or
    * parameter ‘statistics_level’ is set to ‘ALL’, at session or system level

    Roberto.

  8. Roberto

    Greg,

    That was a fast reply, thanks!

    If I have statistics_level=ALL I should not have to specify the /*+ gather_plan_statistics */ hint, right?

    Roberto.

  9. Roberto

    just as an FYI, I can only get execution statistics if the statement is still in the cursor cache. If I try pulling it from AWR then I get the following warning:

    Warning: basic plan statistics not available. These are only collected when:
    * hint ‘gather_plan_statistics’ is used for the statement or
    * parameter ‘statistics_level’ is set to ‘ALL’, at session or system level

  10. Pingback: Hint Abuse: SQL Server vs Oracle
  11. adam

    Could you please correct the links mentioned at the bottom of the page, now error 404 not found is returned.

  12. Chris

    Greg,
    have you done the MTI test case? I cannot get a MTI to do parallel inserts using AutoDOP. When I change the statement to a standard insert, it does parallelise the insert. Have you made similar observations?
    Thanks!
    Chris

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s