Category: Performance

Visualizing Active Session History (ASH) Data With R

One of the easiest ways to understand something is to see a visualization. Looking at Active Session History (ASH) data is no exception, so I’ll dive into how to do so with R and how I used R plots to visually present a problem and confirm a hypothesis. But first some background…

Background

DBAs frequently use the Automatic Workload Repository (AWR) as an entry point for troubleshooting performance problems, and in this case the adventure started the same way. In the Top 5 Timed Foreground Events section of the AWR report, the log file sync event was showing up as the #3 event. This needed deeper investigation, as oftentimes the cause of longer log file sync times is longer log file parallel write times.

Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                           Avg
                                                          wait   % DB
Event                                 Waits     Time(s)   (ms)   time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
log file sync                     3,155,253       9,197      3    6.4 Commit

Drilling into this a bit deeper, the two log file events reported the following in the Foreground Wait Events and Background Wait Events sections of the AWR report:

Foreground Wait Events
-> s  - second, ms - millisecond -    1000th of a second
-> Only events with Total Wait Time (s) >= .001 are shown
-> ordered by wait time desc, waits desc (idle events last)
-> %Timeouts: value of 0 indicates value was < .5%.  Value of null is truly 0

Background Wait Events
-> ordered by wait time desc, waits desc (idle events last)
-> Only events with Total Wait Time (s) >= .001 are shown
-> %Timeouts: value of 0 indicates value was < .5%.  Value of null is truly 0

                                                             Avg
                                        %Time Total Wait    wait    Waits   % bg
Event                             Waits -outs   Time (s)    (ms)     /txn   time
-------------------------- ------------ ----- ---------- ------- -------- ------
log file parallel write         659,157     0        526       1      0.2   18.2

It is almost always worth looking at the breakdown of these times, as they are averages. We really want to understand the entire histogram of these wait events. For that we can look at the Wait Event Histogram section of the AWR report, shown below.

Wait Event Histogram
-> Units for Total Waits column: K is 1000, M is 1000000, G is 1000000000
-> % of Waits: value of .0 indicates value was <.05%; value of null is truly 0
-> % of Waits: column heading of <=1s is truly <1024ms, >1s is truly >=1024ms
-> Ordered by Event (idle events last)

                                                    % of Waits
                                 -----------------------------------------------
                           Total
Event                      Waits   <1ms  <2ms  <4ms  <8ms <16ms <32ms  <=1s   >1s
-------------------------- ------ ----- ----- ----- ----- ----- ----- ----- -----
log file parallel write    661.6K  84.7   9.7   4.7    .4    .1    .3    .0
log file sync              3138.K  14.0  42.8  30.4   7.9   2.3   2.6    .1

Wait Event Histogram Detail (64 msec to 2 sec)
-> Units for Total Waits column: K is 1000, M is 1000000, G is 1000000000
-> Units for % of Total Waits:
   ms is milliseconds
   s is 1024 milliseconds (approximately 1 second)
-> % of Total Waits: total waits for all wait classes, including Idle
-> % of Total Waits: value of .0 indicates value was <.05%; value of null is truly 0
-> Ordered by Event (only non-idle events are displayed)

                                                 % of Total Waits
                                 -----------------------------------------------
                           Waits
                           64ms
Event                      to 2s <32ms <64ms <1/8s <1/4s <1/2s   <1s   <2s  >=2s
-------------------------- ----- ----- ----- ----- ----- ----- ----- ----- -----
log file parallel write       52 100.0    .0    .0
log file sync               3535  99.9    .1    .0

One thing that you should notice here is that there are two Wait Event Histogram sections: the buckets less than 32ms and the buckets greater than 32ms. It is also important to note that only the absence of a value means no timings fell into that bucket, so even though the report shows .0 there are still events in that bucket (read the section description, as it mentions this).
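To make the “.0 is not empty” point concrete, here is a minimal Python sketch of how waits land in doubling buckets and how a small but non-zero percentage rounds to .0 on display. The bucket bounds come from the AWR column headers; the wait times, `bucket_label` and `format_pct` are made up for illustration and are not how AWR itself is implemented.

```python
from collections import Counter

# Upper bounds (ms) taken from the AWR column headers; the last bucket is open-ended.
BOUNDS_MS = [1, 2, 4, 8, 16, 32]
LABELS = [f"<{b}ms" for b in BOUNDS_MS] + [">=32ms"]

def bucket_label(wait_ms):
    """Return the label of the first bucket whose upper bound exceeds the wait."""
    for bound in BOUNDS_MS:
        if wait_ms < bound:
            return f"<{bound}ms"
    return ">=32ms"

def format_pct(count, total):
    """Mimic the report's display: blank for an empty bucket, one decimal otherwise."""
    if count == 0:
        return ""
    text = f"{100.0 * count / total:.1f}"
    return text[1:] if text.startswith("0.") else text

# Hypothetical wait times in milliseconds (not taken from the report above).
waits_ms = [0.5] * 9990 + [3.0] * 7 + [70.0] * 3

counts = Counter(bucket_label(w) for w in waits_ms)
row = {label: format_pct(counts.get(label, 0), len(waits_ms)) for label in LABELS}

for label in LABELS:
    print(f"{label:>7}: {row[label]:>5}  ({counts.get(label, 0)} waits)")
```

In this made-up data the open-ended bucket prints as .0 even though it holds three waits, while the truly empty buckets print as blanks.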

We can see from the second histogram section that there were 52 times that log file parallel write was over 64ms as well as 3535 times log file sync was over 64ms. At this point a hypothesis is formed that the two events are correlated — that is, the belief is the long log file parallel write events may be causing the long log file sync. To find data that supports the hypothesis (or not) we can look at the Active Session History (ASH) data to get a more granular view of the wait events.

Diving Into The ASHes With R

If the hypothesis is correct, the ASH data should show times where we observe long log file parallel write and long log file sync waits. One could write SQL against V$ACTIVE_SESSION_HISTORY to collect all the samples that may demonstrate the hypothesis — e.g. collect a list of sample ids for each event and examine them for overlap, but a visualization is worth a thousand words.
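The “collect sample ids for each event and examine them for overlap” idea can be sketched in a few lines of Python. This is only an illustration: the rows below are invented stand-ins for ASH samples (sample id, event, TIME_WAITED in microseconds), and the 10ms threshold and the helper `long_wait_sample_ids` are assumptions, not something taken from the actual analysis.

```python
# Hypothetical ASH-like rows: (sample_id, event, time_waited in microseconds).
samples = [
    (101, "log file parallel write", 450),
    (101, "log file sync", 900),
    (102, "log file parallel write", 21000),
    (102, "log file sync", 23000),
    (102, "log file sync", 19500),
    (103, "log file sync", 1200),
    (104, "log file parallel write", 18000),
    (104, "log file sync", 800),
]

LONG_WAIT_US = 10_000  # assumed threshold for a "long" wait: 10 ms

def long_wait_sample_ids(rows, event, threshold_us=LONG_WAIT_US):
    """Return the set of sample ids in which the event had at least one long wait."""
    return {sid for sid, ev, waited in rows if ev == event and waited >= threshold_us}

lgwr_ids = long_wait_sample_ids(samples, "log file parallel write")
sync_ids = long_wait_sample_ids(samples, "log file sync")

# Sample ids where both events were slow at the same time.
overlap = lgwr_ids & sync_ids
print(sorted(overlap))
```

A set intersection answers the yes/no question, but it says nothing about magnitude or timing, which is where a plot earns its keep.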

One way to get quick and easy visualizations from ASH data is to use R. R has become a very popular tool for those doing statistical analysis and it has some quite useful graphing and plotting packages built in. R can connect to Oracle via a JDBC package which makes importing data trivial.

Here is a plot that I put together using R for ~300 sample ids (~5 minutes) from ASH (recall that TIME_WAITED is in microseconds):

Rplot01

As you can see from the plots, nearly every time there is a long log file parallel write there are also numerous long log file sync events. This data supports the hypothesis.

Averages Suck

Most any statistician (or anyone keen on numbers and stats) will tell you averages suck. This does not mean averages are not useful, but one has to keep in mind that averages can hide infrequent outliers (which may make them a tad bit evil). For instance, AWR is reporting an average of 1ms for log file parallel write, but we can see that several waits are falling in the 20ms range in the 5 minute capture. If we zoom in on the graph (set the y axis at a max of 3ms), we can understand why:

Rplot02

Most of the log file parallel write events are coming in around 0.5ms so even with some 20ms outliers it still yields an average of 1ms, thus hiding the magnitude of impact for the outliers. This is why drilling down into the ASH data was important for us to understand the scope of the issue.
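The arithmetic is easy to check with a small Python sketch. The counts below are invented to mirror the shape described above (mostly 0.5ms waits plus a handful at 20ms); they are not the real ASH numbers.

```python
from statistics import mean, median

# Hypothetical distribution: 975 waits at 0.5 ms and 25 outliers at 20 ms.
waits_ms = [0.5] * 975 + [20.0] * 25

avg = mean(waits_ms)       # (975 * 0.5 + 25 * 20) / 1000
med = median(waits_ms)     # what a typical wait looks like
worst = max(waits_ms)      # what the average hides
slow_share = sum(w >= 20.0 for w in waits_ms) / len(waits_ms)

print(f"mean={avg:.4f}ms median={med}ms max={worst}ms slow={slow_share:.1%}")
```

With these made-up numbers the mean comes out just under 1ms even though one wait in forty is 40 times slower than the median.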

With More Data Comes More Questions

At this point, the visualization of ASH data shows a strong correlation between log file sync and log file parallel write outliers, but the root cause has not yet been identified. Perhaps there is more insight lurking in the ASH data?

Given this is an OLTP workload and log file parallel write is an IO operation, perhaps it’s worth looking at another IO operation like, say, db file sequential read – single block IO times.

In the below plot, I’ve taken a 60 second window of ASH data and plotted all the db file sequential read events.

Rplot11

Interestingly, there are several samples where the IOs are taking significantly longer than most — the majority are below 10ms but we see a few samples where there are groups in the 20ms or more range. Let’s add the log file sync events to the plot.

Rplot12

Hmmm… even more interesting. The data is showing a correlation between log file sync and db file sequential read. Any bets on what the plot looks like if log file parallel write is added?

Rplot13

Very interesting. The data is showing us that all 3 events are correlated strongly. Clearly we are on to something here…and using R to plot the ASH data was an easy way to present and understand it visually.

Summary

While this blog post did not go into a specific root cause, it was a healthy journey in debugging and data driven analysis. Keep in mind that even when the AWR averages look good, the event histograms should still be reviewed to see if there are outliers. R allows an easy way to put scatter plots of event times together to get a nice visual of what is going on.

Source Code

Here is the code I used to do the analysis. It’s pretty straightforward and well commented (I think). Enjoy!

I’ll also mention that I use the RStudio IDE for R.

Counting Triangles Faster

A few weeks back one of the Vertica developers put up a blog post on counting triangles in an undirected graph with reciprocal edges. The author was comparing the size of the data and the elapsed times to run this calculation on Hadoop and Vertica and put up the work on github and encouraged others: “do try this at home.” So I did.

Compression

Vertica draws attention to the fact that their compression brought the size of the 86,220,856 tuples down to 560MB, from a flat file size of 1,263,234,543 bytes, resulting in around a 2.25X compression ratio. My first task was to load the data and see how Oracle’s Hybrid Columnar Compression would compare. Below is a graph of the sizes.

As you can see, Oracle’s default HCC query compression (query high) compresses the data over 2X more than Vertica and even HCC query low compression beats out Vertica’s compression number.

Query Elapsed Times

The closest gear I had to Vertica’s hardware was an Exadata X2-2 system; both use 2 socket, 12 core Westmere-EP nodes. While one may try to argue that Exadata may somehow influence the execution times, I’ll point out that I was using In-Memory Parallel Execution, so no table data was even read from spinning disk or Exadata Flash Cache. It’s all memory resident in the database nodes’ buffer cache. This seems to be in line with how Vertica executed their tests, though that is not explicitly stated (it’s a reasonable assumption).

After I loaded the data and gathered table stats, I fired off the exact same SQL query that Vertica used to count triangles to see how Oracle would compare. I ran the query on 1, 2 and 4 nodes just like Vertica. Below is a graph of the results.

As you can see, the elapsed times are reasonably close but overall in Oracle’s favor: Oracle wins 2 of the 3 scale points and has a lower sum of the three executions (Vertica 519 seconds, Oracle 487 seconds), an advantage of 32 seconds for Oracle.

It Should Go Faster!

As a database performance engineer I was thinking to myself, “it really should go faster!” I took a few minutes to look over things to see what could make this perform better. You might think I was looking at parameters or something like that, but you would be wrong. After a few minutes of looking at the query and the execution plan it became obvious to me that it could go faster! I made a rather subtle change to the SQL query and reran my experiments. With the modified SQL query Oracle was now executing twice as fast on 1 node as Vertica was on 4 nodes. Also, on 4 nodes, the elapsed time came in at just 14 seconds, compared to the 97 seconds Vertica reported, a difference of almost 7X! Below are the combined results.

What’s The Go Fast Trick?

I was thinking a bit more about the problem at hand: we need to count triangles, but not count any of them more than once, since the edges are reciprocal. Given that every edge exists in both directions, the query can be structured like Vertica wrote it, doing the filtering with a join predicate like e1.source < e2.source to eliminate the duplicates, or we can simply use a single table filter predicate like source < dest before the join takes place. One of the first things they taught me in query planning and optimization class was to filter early! That notion pays off big here because the early filter cuts the rows going into the first join as well as the output of the first join by a factor of 2 (1.8 billion rows output vs. 3.6 billion). That’s a huge savings not only in the first join, but in the second join as well.

Here is what my revised query looks like:

with
  e1 as (select * from edges where source < dest),
  e2 as (select * from edges where source < dest),
  e3 as (select * from edges where source > dest)
select count(*)
from e1
join e2 on (e1.dest = e2.source)
join e3 on (e2.dest = e3.source)
where e3.dest = e1.source
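To see why filtering early shrinks the intermediate results, here is a small Python sketch on a toy graph. Everything in it is an illustration under stated assumptions: `join` is a bare-bones hash join, the graph is a complete graph on 4 vertices, and `count_filter_late` is a deliberately naive stand-in that filters only at the very end (it is not Vertica’s query, which filters in its join predicates). `count_filter_early` follows the shape of the revised query above.

```python
from collections import defaultdict

def reciprocal_edges(undirected_pairs):
    """Store every undirected edge in both directions, like the edges table."""
    return list(undirected_pairs) + [(b, a) for a, b in undirected_pairs]

def join(paths, edges):
    """Hash join: extend each path with every edge leaving the path's last vertex."""
    by_source = defaultdict(list)
    for source, dest in edges:
        by_source[source].append(dest)
    return [path + (dest,) for path in paths for dest in by_source[path[-1]]]

def count_filter_late(edges):
    """Join everything first, then keep closed paths with ordered vertices."""
    two_paths = join(edges, edges)
    three_paths = join(two_paths, edges)
    triangles = [p for p in three_paths if p[3] == p[0] and p[0] < p[1] < p[2]]
    return len(triangles), len(two_paths)

def count_filter_early(edges):
    """Filter each input on source < dest (or >) before joining, as in the revised query."""
    ascending = [(s, d) for s, d in edges if s < d]    # e1 and e2
    descending = [(s, d) for s, d in edges if s > d]   # e3
    two_paths = join(ascending, ascending)
    three_paths = join(two_paths, descending)
    triangles = [p for p in three_paths if p[3] == p[0]]
    return len(triangles), len(two_paths)

# Toy graph: the complete graph on 4 vertices, which contains 4 triangles.
edges = reciprocal_edges([(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)])

late = count_filter_late(edges)
early = count_filter_early(edges)
print("filter late :", late)    # (triangles, rows out of the first join)
print("filter early:", early)
```

Both versions count the same triangles; the difference is how many rows the first join has to produce and hand to the second join. The reduction on this toy graph is exaggerated by the naive baseline, so don’t read the ratio as a prediction for the real data set.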

Summary

First, I’d like to thank the Vertica team for throwing the challenge out there and being kind enough to provide the data, code and their elapsed times. I always enjoy a challenge — especially one that I can improve upon. Now, I’m not going to throw any product marketing nonsense out there as that is certainly not my style (and there certainly is more than enough of that already), but rather I’ll just let the numbers do the talking. I’d also like to point out that this experiment was done without any structure other than the table. And in full disclosure, all of my SQL commands are available as well.

The other comment that I would make is that the new and improved execution times really make a mockery of the exercise when comparing to Hadoop MapReduce or Pig, but I would also mention that this test case is extremely favorable for parallel pipelined databases that can perform all operations in memory, and given that the data set is so small, this is obviously the case. Overall, in my opinion, it is a poor problem choice for comparing the three technologies, as it obviously (over) highlights the right tool for the job cliché.

Experiments performed on Oracle Database 11.2.0.2.

Github source code: https://gist.github.com/grahn/1289188

Exadata Smart Flash Logging Explained

I’ve seen some posts on the blogosphere where people attempt to explain (or should I say guess) how Exadata Smart Flash Logging works and most of them are wrong. Hopefully this post will help clear up some of the misconceptions out there.

The following is an excerpt from the paper entitled “Exadata Smart Flash Cache Features and the Oracle Exadata Database Machine” that goes into technical detail on the Exadata Smart Flash Logging feature.

Smart Flash Logging works as follows. When receiving a redo log write request, Exadata will do
parallel writes to the on-disk redo logs as well as a small amount of space reserved in the flash
hardware. When either of these writes has successfully completed the database will be
immediately notified of completion. If the disk drives hosting the logs experience slow response
times, then the Exadata Smart Flash Cache will provide a faster log write response time.
Conversely, if the Exadata Smart Flash Cache is temporarily experiencing slow response times
(e.g., due to wear leveling algorithms), then the disk drive will provide a faster response time.
Given the speed advantage the Exadata flash hardware has over disk drives, log writes should be
written to Exadata Smart Flash Cache, almost all of the time, resulting in very fast redo write
performance. This algorithm will significantly smooth out redo write response times and provide
overall better database performance.

The Exadata Smart Flash Cache is not used as a permanent store for redo data – it is just a
temporary store for the purpose of providing fast redo write response time. The Exadata Smart
Flash Cache is a cache for storing redo data until this data is safely written to disk. The Exadata
Storage Server comes with a substantial amount of flash storage. A small amount is allocated for
database logging and the remainder will be used for caching user data. The best practices and
configuration of redo log sizing, duplexing and mirroring do not change when using Exadata
Smart Flash Logging. Smart Flash Logging handles all crash and recovery scenarios without
requiring any additional or special administrator intervention beyond what would normally be
needed for recovery of the database from redo logs. From an end user perspective, the system
behaves in a completely transparent manner and the user need not be aware that flash is being
used as a temporary store for redo. The only behavioral difference will be consistently low
latencies for redo log writes.

By default, 512 MB of the Exadata flash is allocated to Smart Flash Logging. Relative to the 384
GB of flash in each Exadata cell this is an insignificant investment for a huge performance
benefit. This default allocation will be sufficient for most situations. Statistics are maintained to
indicate the number and frequency of redo writes serviced by flash and those that could not be
serviced, due to, for example, insufficient flash space being allocated for Smart Flash Logging.
For a database with a high redo generation rate, or when many databases are consolidated on to
one Exadata Database Machine, the size of the flash allocated to Smart Flash Logging may need
to be enlarged. In addition, for consolidated deployments, the Exadata I/O Resource Manager
(IORM) has been enhanced to enable or disable Smart Flash Logging for the different databases
running on the Database Machine, reserving flash for the most performance critical databases.
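The “whichever write completes first” behavior described in the excerpt can be illustrated with a tiny Python sketch. This is not Exadata code; the latencies are invented and `acknowledged_latency_ms` is a hypothetical helper that simply models the database being notified when the faster of the two parallel writes finishes.

```python
def acknowledged_latency_ms(disk_ms, flash_ms):
    """The database is notified as soon as either of the two parallel writes completes."""
    return min(disk_ms, flash_ms)

# Hypothetical per-write latencies in milliseconds. Write 3 hits a slow disk,
# write 4 hits a flash hiccup (for example, wear leveling).
disk_ms = [3.0, 3.2, 45.0, 2.9, 3.1]
flash_ms = [0.4, 0.5, 0.4, 12.0, 0.4]

acked = [acknowledged_latency_ms(d, f) for d, f in zip(disk_ms, flash_ms)]

print("disk only :", max(disk_ms), "ms worst case")
print("flash only:", max(flash_ms), "ms worst case")
print("first wins:", max(acked), "ms worst case", acked)
```

In this made-up series an outlier only reaches the database if both devices are slow for the same write, which is the smoothing effect the paper describes.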

Critical Skills for Performance Work

I was just watching John Rauser’s keynote “What is a Career in Big Data?” from last week’s Strata Conference New York and I have to say it’s an amazing talk. I would highly recommend it to anyone who does any type of data analysis, including any type of performance analysis.

I found many of the “critical skill” points John made to have a strong correlation to performance analysis work as well. Some quotations that really stand out to me:

On writing:

“[writing]…it’s the first major difference between mediocrity and greatness.” [10:39]

“If it isn’t written down, it never happened…if your writing is so opaque that people can not understand your work, then you may as well never have done it.” [10:50]

On skepticism:

“If you take a skeptical attitude toward your analysis you’ll look just as hard for data that refutes your hypothesis as you will for data that confirms it. A skeptic attacks the same question from many different angles and dramatically increases their confidence in the results.” [11:40]

The main reason that I wanted to highlight the critical skills of writing and skepticism is I see (read) way too many blog posts from Oracle users that fail on both of those skills. The writing simply fails to clearly communicate the issue at hand, what analysis was done and what data was used to draw the conclusion. Many blog posts also are written without any level of skepticism–they simply “report” their findings and fail to question their own work for accuracy. I hope this talk inspires you to raise your bar when it comes to performance work. Enjoy!

Real-World Performance Videos on YouTube – Data Warehousing

Here are some videos of a data warehouse demo that the Real-World Performance Group has been running for a while now and we thought it was time to put them on YouTube. Hope you find them informative.

Migrate a 1TB Data warehouse in 20 Minutes (Part 1)

Migrate a 1TB Data warehouse in 20 Minutes (Part 2)

Migrate a 1TB Data warehouse in 20 Minutes (Part 3)

Migrate a 1TB Data warehouse in 20 Minutes (Part 4)

Reading Parallel Execution Plans With Bloom Pruning And Composite Partitioning

You’ve probably heard sayings like “things aren’t always what they seem” and “people lie”. Well, sometimes execution plans lie. It’s not really by intent, but it is sometimes difficult (or impossible) to represent everything in a query execution tree in a nice tabular format like dbms_xplan gives.

One of the optimizations that was introduced back in 10gR2 was the use of bloom filters. Bloom filters can be used in two ways: 1) for filtering or 2) for partition pruning (bloom pruning) starting with 11g. Frequently the data models used in data warehousing are dimensional models (star or snowflake) and most Oracle warehouses use simple range (or interval) partitioning on the fact table date key column as that is the filter that yields the largest I/O reduction from partition pruning (most queries in a time series star schema include a time window, right!). As a result, it is imperative that the join between the date dimension and the fact table results in partition pruning.

Let’s consider a basic two table join between a date dimension and a fact table. For these examples I’m using STORE_SALES and DATE_DIM which are TPC-DS tables (I frequently use TPC-DS for experiments as it uses a dimensional (star) model and has a data generator.) STORE_SALES contains a 5 year window of data ranging from 1998-01-02 to 2003-01-02.

Range Partitioned STORE_SALES

For this example I used range partitioning on STORE_SALES.SS_SOLD_DATE_SK using 60 one month partitions (plus 1 partition for NULL SS_SOLD_DATE_SK values) that align with the date dimension (DATE_DIM) on calendar month boundaries. STORE_SALES has the parallel attribute (PARALLEL 16 in this case) set on the table to enable Oracle’s Parallel Execution (PX). Let’s look at the execution time and plan for our test query:

SQL> select
  2    max(ss_sales_price)
  3  from
  4    store_sales ss,
  5    date_dim d
  6  where
  7    ss_sold_date_sk = d_date_sk and
  8    d_year = 2000
  9  ;

MAX(SS_SALES_PRICE)
-------------------
                200

Elapsed: 00:00:41.67

SQL> select * from table(dbms_xplan.display_cursor(format=>'basic +parallel +partition +predicate'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------
EXPLAINED SQL STATEMENT:
------------------------
select   max(ss_sales_price) from   store_sales ss,   date_dim d where
 ss_sold_date_sk=d_date_sk and   d_year = 2000

Plan hash value: 934332680

--------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name        | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |             |       |       |        |      |            |
|   1 |  SORT AGGREGATE               |             |       |       |        |      |            |
|   2 |   PX COORDINATOR              |             |       |       |        |      |            |
|   3 |    PX SEND QC (RANDOM)        | :TQ10001    |       |       |  Q1,01 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE            |             |       |       |  Q1,01 | PCWP |            |
|*  5 |      HASH JOIN                |             |       |       |  Q1,01 | PCWP |            |
|   6 |       BUFFER SORT             |             |       |       |  Q1,01 | PCWC |            |
|   7 |        PART JOIN FILTER CREATE| :BF0000     |       |       |  Q1,01 | PCWP |            |
|   8 |         PX RECEIVE            |             |       |       |  Q1,01 | PCWP |            |
|   9 |          PX SEND BROADCAST    | :TQ10000    |       |       |        | S->P | BROADCAST  |
|* 10 |           TABLE ACCESS FULL   | DATE_DIM    |       |       |        |      |            |
|  11 |       PX BLOCK ITERATOR       |             |:BF0000|:BF0000|  Q1,01 | PCWC |            |
|* 12 |        TABLE ACCESS FULL      | STORE_SALES |:BF0000|:BF0000|  Q1,01 | PCWP |            |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("SS_SOLD_DATE_SK"="D_DATE_SK")
  10 - filter("D_YEAR"=2000)
  12 - access(:Z>=:Z AND :Z<=:Z)

In this execution plan you can see the creation of the bloom filter on line 7 which is populated from the values of D_DATE_SK from DATE_DIM. That bloom filter is then used to partition prune on the STORE_SALES table. This is why we see :BF0000 in the Pstart/Pstop columns.
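Conceptually, bloom pruning works something like the Python sketch below. To be clear, this is a simplified model and not Oracle’s implementation: the 30-day “months”, the single stand-in hash function (`bit_of`), the 32-bit vector and the key layout are all assumptions chosen to keep the example small and deterministic.

```python
NUM_PARTITIONS = 60        # one partition per month over 5 years
DAYS_PER_PARTITION = 30    # simplification: every "month" has 30 days
NUM_BITS = 32              # smaller than the partition count, so collisions occur

def partition_of(date_sk):
    """Map a date key to the partition that would hold its fact rows."""
    return date_sk // DAYS_PER_PARTITION

def bit_of(partition_id):
    """Stand-in for a real hash function over the partition id."""
    return partition_id % NUM_BITS

# Hypothetical date dimension: (d_date_sk, d_year), 360 keys per year from 1998.
date_dim = [(sk, 1998 + sk // 360) for sk in range(NUM_PARTITIONS * DAYS_PER_PARTITION)]

# Build side (PART JOIN FILTER CREATE): set a bit for every partition that a
# qualifying dimension key maps to.
bloom = [False] * NUM_BITS
for d_date_sk, d_year in date_dim:
    if d_year == 2000:
        bloom[bit_of(partition_of(d_date_sk))] = True

# Probe side: only partitions whose bit is set get scanned.
scanned = [p for p in range(NUM_PARTITIONS) if bloom[bit_of(p)]]
needed = sorted({partition_of(sk) for sk, year in date_dim if year == 2000})

print("needed :", needed)
print("scanned:", scanned)
```

The property that matters is that a bloom filter can produce false positives (extra partitions scanned) but never false negatives, so every partition the join needs is still read.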

Range-Hash Composite Partitioned STORE_SALES

For the next experiment, I kept the same range partitioning scheme but also added hash subpartitioning using the SS_ITEM_SK column (using 4 hash subpartitions per range partition). STORE_SALES2 has 61 range partitions x 4 hash subpartitions for a total of 244 aggregate partitions. Let’s look at the execution plan for our test query:

SQL> select
  2    max(ss_sales_price)
  3  from
  4    store_sales2 ss,
  5    date_dim d
  6  where
  7    ss_sold_date_sk = d_date_sk and
  8    d_year = 2000
  9  ;

MAX(SS_SALES_PRICE)
-------------------
                200

Elapsed: 00:00:41.06

SQL> select * from table(dbms_xplan.display_cursor(format=>'basic +parallel +partition +predicate'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------
EXPLAINED SQL STATEMENT:
------------------------
select   max(ss_sales_price) from   store_sales2 ss,   date_dim d where
  ss_sold_date_sk=d_date_sk and   d_year = 2000

Plan hash value: 2496395846

---------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |       |       |        |      |            |
|   1 |  SORT AGGREGATE               |              |       |       |        |      |            |
|   2 |   PX COORDINATOR              |              |       |       |        |      |            |
|   3 |    PX SEND QC (RANDOM)        | :TQ10001     |       |       |  Q1,01 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE            |              |       |       |  Q1,01 | PCWP |            |
|*  5 |      HASH JOIN                |              |       |       |  Q1,01 | PCWP |            |
|   6 |       BUFFER SORT             |              |       |       |  Q1,01 | PCWC |            |
|   7 |        PART JOIN FILTER CREATE| :BF0000      |       |       |  Q1,01 | PCWP |            |
|   8 |         PX RECEIVE            |              |       |       |  Q1,01 | PCWP |            |
|   9 |          PX SEND BROADCAST    | :TQ10000     |       |       |        | S->P | BROADCAST  |
|* 10 |           TABLE ACCESS FULL   | DATE_DIM     |       |       |        |      |            |
|  11 |       PX BLOCK ITERATOR       |              |     1 |     4 |  Q1,01 | PCWC |            |
|* 12 |        TABLE ACCESS FULL      | STORE_SALES2 |     1 |   244 |  Q1,01 | PCWP |            |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("SS_SOLD_DATE_SK"="D_DATE_SK")
  10 - filter("D_YEAR"=2000)
  12 - access(:Z>=:Z AND :Z<=:Z)

Once again you can see the creation of the bloom filter from DATE_DIM on line 7; however, you will notice that we no longer see :BF0000 as our Pstart and Pstop values. In fact, it may appear that partition pruning is not taking place at all, as we see 1/244 as our Pstart/Pstop values. However, if we compare the execution times of the range and range/hash queries, they are within a second of each other (41.67 vs. 41.06 seconds), thus there really is no way that partition (bloom) pruning is not taking place. After all, if this plan read all 5 years of data it would take roughly 5 times as long as reading just 1 year, and that certainly is not the case. Would you have guessed that partition pruning is taking place had we not worked through the range only experiment first? Hmmm…

So What Is Going On?

Before we dive in, let’s quickly look at what the execution plans would look like if PX was not used (using serial execution).

--
-- Range Partitioned, Serial Execution
--

---------------------------------------------------------------------
| Id  | Operation                     | Name        | Pstart| Pstop |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT              |             |       |       |
|   1 |  SORT AGGREGATE               |             |       |       |
|*  2 |   HASH JOIN                   |             |       |       |
|   3 |    PART JOIN FILTER CREATE    | :BF0000     |       |       |
|*  4 |     TABLE ACCESS FULL         | DATE_DIM    |       |       |
|   5 |    PARTITION RANGE JOIN-FILTER|             |:BF0000|:BF0000|
|   6 |     TABLE ACCESS FULL         | STORE_SALES |:BF0000|:BF0000|
---------------------------------------------------------------------
              
--
-- Range-Hash Composite Partitioned, Serial Execution
--
                                       
----------------------------------------------------------------------
| Id  | Operation                     | Name         | Pstart| Pstop |
----------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |       |       |
|   1 |  SORT AGGREGATE               |              |       |       |
|*  2 |   HASH JOIN                   |              |       |       |
|   3 |    PART JOIN FILTER CREATE    | :BF0000      |       |       |
|*  4 |     TABLE ACCESS FULL         | DATE_DIM     |       |       |
|   5 |    PARTITION RANGE JOIN-FILTER|              |:BF0000|:BF0000|
|   6 |     PARTITION HASH ALL        |              |     1 |     4 |
|   7 |      TABLE ACCESS FULL        | STORE_SALES2 |     1 |   244 |
----------------------------------------------------------------------

When using composite partitioning, pruning is placed on one of the partition iterators. When the two nested partition iterators (range/hash in this case) are changed into a block iterator (line 11, PX BLOCK ITERATOR), we have to pick a “victim” in the query plan tree since only one node in the plan now needs to carry the pruning information (with PX the pruning is really done by the QC, not the row source like in serial plans). As a result, the information associated with the victimized partition iterator is lost in the explain plan. This is why there is no :BF0000 for Pstart/Pstop in the plan in this case. It is probably more accurate to have the parallel plans for both range and range/hash look like this:

---------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |       |       |        |      |            |
|   1 |  SORT AGGREGATE               |              |       |       |        |      |            |
|   2 |   PX COORDINATOR              |              |       |       |        |      |            |
|   3 |    PX SEND QC (RANDOM)        | :TQ10001     |       |       |  Q1,01 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE            |              |       |       |  Q1,01 | PCWP |            |
|*  5 |      HASH JOIN                |              |       |       |  Q1,01 | PCWP |            |
|   6 |       BUFFER SORT             |              |       |       |  Q1,01 | PCWC |            |
|   7 |        PART JOIN FILTER CREATE| :BF0000      |       |       |  Q1,01 | PCWP |            |
|   8 |         PX RECEIVE            |              |       |       |  Q1,01 | PCWP |            |
|   9 |          PX SEND BROADCAST    | :TQ10000     |       |       |        | S->P | BROADCAST  |
|* 10 |           TABLE ACCESS FULL   | DATE_DIM     |       |       |        |      |            |
|  11 |       PX BLOCK ITERATOR       |              |       |       |  Q1,01 | PCWC |            |
|* 12 |        TABLE ACCESS FULL      | STORE_SALES  |:BF0000|:BF0000|  Q1,01 | PCWP |            |
---------------------------------------------------------------------------------------------------

Here the bloom pruning is shown on the TABLE ACCESS FULL row source. This is because there is no Pstart/Pstop for a PX BLOCK ITERATOR row source (it’s block ranges, so partition information is lost; it had been contained in the level above this).

Hopefully this helps you understand and correctly identify execution plans that contain bloom pruning even though at first glance you may not think they do. If you are uncertain, use the execution stats for the query, looking at metrics like the amount of data read and the execution times, to provide some empirical insight.