Bloom Filters

October 23, 2007
By

The other day I was reading the 11g Database VLDB and Partitioning Guide and came across the below execution plan for a partial partition-wise join between sales and customers.

---------------------------------------------------------------------------
| Id  | Operation                             | Name      | Pstart| Pstop |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |           |       |       |
|   1 |  PX COORDINATOR                       |           |       |       |
|   2 |   PX SEND QC (RANDOM)                 | :TQ10002  |       |       |
|*  3 |    FILTER                             |           |       |       |
|   4 |     HASH GROUP BY                     |           |       |       |
|   5 |      PX RECEIVE                       |           |       |       |
|   6 |       PX SEND HASH                    | :TQ10001  |       |       |
|   7 |        HASH GROUP BY                  |           |       |       |
|*  8 |         HASH JOIN                     |           |       |       |
|   9 |          PART JOIN FILTER CREATE      | :BF0000   |       |       |
|  10 |           PX RECEIVE                  |           |       |       |
|  11 |            PX SEND PARTITION (KEY)    | :TQ10000  |       |       |
|  12 |             PX BLOCK ITERATOR         |           |       |       |
|  13 |              TABLE ACCESS FULL        | CUSTOMERS |       |       |
|  14 |          PX PARTITION HASH JOIN-FILTER|           |:BF0000|:BF0000|
|* 15 |           TABLE ACCESS FULL           | SALES     |:BF0000|:BF0000|
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(COUNT(SYS_OP_CSR(SYS_OP_MSR(COUNT(*)),0))>100)
   8 - access("S"."CUST_ID"="C"."CUST_ID")
  15 - filter("S"."TIME_ID"<=TO_DATE(' 1999-10-01 00:00:00',
                                     'syyyy-mm-dd hh24:mi:ss') AND
              "S"."TIME_ID">=TO_DATE(' 1999-07-01 00:00:00',
                                     'syyyy-mm-dd hh24:mi:ss'))

At first, it may seem just like another parallel execution plan, but if you look again you may notice that the Pstart and Pstop values are :BF0000 (id 15) and :BF0000 also appears on as part of the PART JOIN FILTER CREATE (id 9). So what exactly is :BF0000? This is a bloom filter being applied as part of the DFO (data flow operator).

It’s not necessary to completely understand bloom filters to take advantage of them, but the next time you see :BF0000 in your parallel execution plan, you will be able to recognize what it is.

Tags: ,

4 Responses to Bloom Filters

  1. Christian Antognini on October 23, 2007 at 11:04 pm

    Hi Greg

    I guess you know it… But for the other readers that might ignore it just a short information from my part. Bloom filters are already available in 10gR2. Unfortunately, for some unknown reasons (at least for me), Oracle never documented such a wonderful feature!

    Take care,
    Chris

  2. Greg Rahn on October 24, 2007 at 8:13 am

    You are correct Christian, bloom filters are in 10gR2 and I don’t believe they were specifically documented but probably fell under “Optimizer Enhancements” or similar.

  3. Ichiro on June 14, 2010 at 1:17 am

    I agree if SALES and/or CUSTOMERS were created as partitioned tables. But I’m not sure if they’re not partitioned. In your test, BF was usefule as “HASH GROUP BY” execution.

    • Greg Rahn on June 14, 2010 at 5:57 am

      @Ichiro

      The SALES tables is partitioned on CUST_ID and you can tell this because the execution plan has BF0000 in the Pstart/Pstop column and CUST_ID is the only predicate on SALES (see Predicate Information section). In this case the BF populated by CUST_IDs from CUSTOMER and applied to SALES to prune on the partitioned table (we call this bloom pruning). The BF has nothing to do with the HASH GROUP BY row source.

Leave a Reply

Your email address will not be published. Required fields are marked *

*