There Is No Time Like ‘%NOW%’ To Use Dynamic Sampling
I recently came across a query in which the Optimizer was making a poor cardinality estimate, which in turn caused inefficient join type, which in turn caused the query to run excessively long. This post is a reenactment of my troubleshooting.
The Suspect SQL
The original SQL was quite large and had a fairly complex plan so I simplified it down to this test case for the purpose of this blog post:
select [...]
from fact_table al1
where al1.region = '003' and
al1.order_type = 'Z010' and
al1.business_type in ('002', '003', '007', '009') and
(not (al1.cust_po like '%MATERIAL HANDLING%' or
al1.cust_po like '%SECURITY%' or
al1.cust_po like '%SUMMER OF CREATIVITY%' or
al1.cust_po like '%TEST%'));
----------------------------------------------------------------------------
| Id | Operation | Name | Rows |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | SORT AGGREGATE | | 1 |
| 2 | PARTITION LIST SINGLE | | 9 |
| 3 | PARTITION HASH ALL | | 9 |
|* 4 | TABLE ACCESS BY LOCAL INDEX ROWID| FACT_TABLE | 9 |
| 5 | BITMAP CONVERSION TO ROWIDS | | |
|* 6 | BITMAP INDEX SINGLE VALUE | FACT_BX10 | |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter("AL1"."CUST_PO" NOT LIKE '%MATERIAL HANDLING%' AND
"AL1"."CUST_PO" NOT LIKE '%SECURITY%' AND
"AL1"."CUST_PO" NOT LIKE '%SUMMER OF CREATIVITY%' AND
"AL1"."CUST_PO" NOT LIKE '%TEST%')
6 - access("AL1"."ORDER_TYPE"='Z010')
Looking at the plan I would surmise that the cardinality estimate for the fact table (line 4) must surely be greater than 9. To find out how far the Optimizer’s guess is off, I ran this query which contained all the filter predicates for FACT_TABLE:
select count(*)
from fact_table al1
where al1.region = '003' and
al1.order_type = 'Z010' and
(not (al1.cust_po like '%MATERIAL HANDLING%' or
al1.cust_po like '%SECURITY%' or
al1.cust_po like '%SUMMER OF CREATIVITY%' or
al1.cust_po like '%TEST%'));
COUNT(*)
----------
1,324,510
As you can see, the cardinality estimate in this case is way off. The Optimizer estimated 9 rows and in reality the query returns 1.3 million rows, only a difference of 6 orders of magnitude (10^6 or 1,000,000). How could this be? Let’s try and understand why and where the cardinality estimate went wrong.
Bite Size Chunks
I find the easiest way to debug these issues is to use start with one predicate then add one predicate at a time, noting the cardinality estimate and comparing it to the actual cardinality value.
One Predicate, Two Predicate, Red Predicate, Blue Predicate
explain plan for select count (*) from fact_table al1 where al1.region = '003' / select * from table(dbms_xplan.display(format=>'BASIC ROWS PREDICATE'));
-------------------------------------------------------------- | Id | Operation | Name | Rows | -------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 1 | SORT AGGREGATE | | 1 | | 2 | PARTITION LIST SINGLE| | 141M| | 3 | PARTITION HASH ALL | | 141M| | 4 | TABLE ACCESS FULL | FACT_TABLE | 141M| -------------------------------------------------------------- COUNT(*) ----------- 141,821,991
Looks good so far. REGION is the list partition key so we’d expect an accurate estimate.
explain plan for
select count (*)
from fact_table al1
where al1.region = '003' and
al1.order_type = 'Z010'
/
select *
from table(dbms_xplan.display(format=>'BASIC ROWS PREDICATE'));
------------------------------------------------------------
| Id | Operation | Name | Rows |
------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | SORT AGGREGATE | | 1 |
| 2 | PARTITION LIST SINGLE | | 1456K|
| 3 | PARTITION HASH ALL | | 1456K|
| 4 | BITMAP CONVERSION COUNT | | 1456K|
|* 5 | BITMAP INDEX SINGLE VALUE| FACT_BX10 | |
------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("AL1"."ORDER_TYPE"='Z010')
COUNT(*)
----------
1,324,642
No issues here. 1,456,000 statistically equivalent to 1,324,642.
explain plan for
select count (*)
from fact_table al1
where al1.region = '003' and
al1.order_type = 'Z010' and
(not (al1.cust_po like '%MATERIAL HANDLING%'))
/
select *
from table(dbms_xplan.display(format=>'BASIC ROWS PREDICATE'));
----------------------------------------------------------------------------
| Id | Operation | Name | Rows |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | SORT AGGREGATE | | 1 |
| 2 | PARTITION LIST SINGLE | | 72803 |
| 3 | PARTITION HASH ALL | | 72803 |
|* 4 | TABLE ACCESS BY LOCAL INDEX ROWID| FACT_TABLE | 72803 |
| 5 | BITMAP CONVERSION TO ROWIDS | | |
|* 6 | BITMAP INDEX SINGLE VALUE | FACT_BX10 | |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter("AL1"."CUST_PO" NOT LIKE '%MATERIAL HANDLING%')
6 - access("AL1"."ORDER_TYPE"='Z010')
COUNT(*)
----------
1,324,642
With the addition of the NOT LIKE predicate we start to see a bit of a difference. This plan has a 20x reduction from the previous cardinality estimate (1,456,000/72,803 = 20). Let’s add one more NOT LIKE predicate and see what we get.
explain plan for
select count (*)
from fact_table al1
where al1.region = '003' and
al1.order_type = 'Z010' and
(not (al1.cust_po like '%MATERIAL HANDLING%' or
al1.cust_po like '%SECURITY%'))
/
select *
from table(dbms_xplan.display(format=>'BASIC ROWS PREDICATE'));
----------------------------------------------------------------------------
| Id | Operation | Name | Rows |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | SORT AGGREGATE | | 1 |
| 2 | PARTITION LIST SINGLE | | 3640 |
| 3 | PARTITION HASH ALL | | 3640 |
|* 4 | TABLE ACCESS BY LOCAL INDEX ROWID| FACT_TABLE | 3640 |
| 5 | BITMAP CONVERSION TO ROWIDS | | |
|* 6 | BITMAP INDEX SINGLE VALUE | FACT_BX10 | |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter("AL1"."CUST_PO" NOT LIKE '%MATERIAL HANDLING%' AND
"AL1"."CUST_PO" NOT LIKE '%SECURITY%')
6 - access("AL1"."ORDER_TYPE"='Z010')
COUNT(*)
----------
1,324,642
With the addition of a second NOT LIKE predicate the cardinality estimate has dropped to 3,640 from 72,803, a 20x reduction.
explain plan for
select count (*)
from fact_table al1
where al1.region = '003' and
al1.order_type = 'Z010' and
(not (al1.cust_po like '%MATERIAL HANDLING%' or
al1.cust_po like '%SECURITY%' or
al1.cust_po like '%SUMMER OF CREATIVITY%'))
/
select *
from table(dbms_xplan.display(format=>'BASIC ROWS PREDICATE'));
----------------------------------------------------------------------------
| Id | Operation | Name | Rows |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | SORT AGGREGATE | | 1 |
| 2 | PARTITION LIST SINGLE | | 182 |
| 3 | PARTITION HASH ALL | | 182 |
|* 4 | TABLE ACCESS BY LOCAL INDEX ROWID| FACT_TABLE | 182 |
| 5 | BITMAP CONVERSION TO ROWIDS | | |
|* 6 | BITMAP INDEX SINGLE VALUE | FACT_BX10 | |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter("AL1"."CUST_PO" NOT LIKE '%MATERIAL HANDLING%' AND
"AL1"."CUST_PO" NOT LIKE '%SECURITY%' AND
"AL1"."CUST_PO" NOT LIKE '%SUMMER OF CREATIVITY%')
6 - access("AL1"."ORDER_TYPE"='Z010')
COUNT(*)
----------
1,324,642
With the addition of the third NOT LIKE predicate the cardinality estimate has dropped from 3,640 to 182, another 20x reduction. Looks like we may have found the issue. Each NOT LIKE predicate appears to result in a 20x reduction (5% selectivity) from the previous estimate. The original query had all four NOT LIKE predicates on it and had a cardinality estimate of 9. If we work the math: 182 * 5% = 9.
Looking at the query we can see there are four NOT LIKE predicates each with a leading and trailing wild card (%). Since DBMS_STATS does not gather table column information on parts of strings, each of the NOT LIKE predicates will have a default selectivity guess of 5%. Given this query has four NOT LIKE predicates, the total reduction for those four predicates will be 5%^4 = 1/160,000 = 0.00000625 which is quite significant, and in this case not representative and the root cause of the original query’s suboptimal access and join type.
Dynamic Sampling To The Rescue
Dynamic Sampling was designed with cases like this in mind. That is, cases where the Optimizer has to resort to selectivity guesses and could very well guess poorly. Substring guessing is not simply done as the substring could appear anywhere in the string. Let’s see what the cardinality estimate is when I add a dynamic_sampling hint to the query.
explain plan for
select /*+ dynamic_sampling(al1 4) */
count (*)
from fact_table al1
where al1.region = '003' and
al1.order_type = 'Z010' and
(not (al1.cust_po like '%MATERIAL HANDLING%' or
al1.cust_po like '%SECURITY%' or
al1.cust_po like '%SUMMER OF CREATIVITY%' or
al1.cust_po like '%TEST%'))
/
select *
from table(dbms_xplan.display(format=>'BASIC ROWS PREDICATE NOTE'));
--------------------------------------------------------------
| Id | Operation | Name | Rows |
--------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | SORT AGGREGATE | | 1 |
| 2 | PARTITION LIST SINGLE| | 1606K|
| 3 | PARTITION HASH ALL | | 1606K|
|* 4 | TABLE ACCESS FULL | FACT_TABLE | 1606K|
--------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter("AL1"."ORDER_TYPE"='Z010' AND
"AL1"."CUST_PO" NOT LIKE '%MATERIAL HANDLING%' AND
"AL1"."CUST_PO" NOT LIKE '%SECURITY%' AND
"AL1"."CUST_PO" NOT LIKE '%SUMMER OF CREATIVITY%' AND
"AL1"."CUST_PO" NOT LIKE '%TEST%')
Note
-----
- dynamic sampling used for this statement
With a level 4 dynamic_sampling hint the Optimizer estimates 1.6 million rows, very close to the actual value of 1.3 million rows. This estimate is surely close enough to give us the optimal access and join type such that the original query should perform optimally.
Summary
There are case where the Optimizer guesses and sometimes can guess poorly. When this is the case, dynamic sampling can be used to give the Optimizer a better cardinality guess. Dynamic sampling is probably best suited for queries that run minutes or longer, such that the overhead of the dynamic sampling query is only a fraction of the total run time. In these cases, the minimal overhead of dynamic sampling can well out weigh the cost of a suboptimal plan.
All tests were performed on 11.1.0.6.0.
10gR2, 11gR1, Data Warehousing, Execution Plans, Optimizer, Oracle, Performance, SQL Tuning, Statistics, VLDB
Trackbacks
Comments
I work on a relatively large datawarehouse (multi-terrabyte). The number one tuning hint I tell people is to help Oracle get the cardinality right along each step of the operation. If Oracle gets that right, it will usually come up with a good plan as far as join orders, join types and access paths.
This hint is one of my favorites because often we have data that violates the core assumption that no two columns are correalted (hence a reason for extended statistics in 11g.)
It’s not perfect. It only helps improve cardinalities of a single table. It does not effect (near as I can tell in 9i/10g) the join cardinality calculation.
The thing I love about this hint though is that it is basically maintenance proof. We are not “locking” the optimizer into a certain order, join type or access path. We’re just saying maybe you should sample the data first before you decide.
When new functionality is added to the optimizer, we don’t have to re-evaulate hinted queries because this hint does not advocate a single approach. Purely delightful.
You are completely spot on Steve. Sometimes sampling data is the only way to get the true relationship between the columns.
Currently multi-column/extended statistics only work for single equality predicates so dynamic sampling may be one of the only ways to get the cardinality estimate close.
I find that dynamic sampling to be very seductive. For quite a while I’ve been telling report developers who have problems with execution plans to turn up the dynamic sampling setting through a hint to see if the plan improves — sometime the default sampling rate is not high enough, and frankly it’s often an easy win.
If they’d just learn to apply dynamic sampling by default then I’d be practically out of a job.
Great post! I liked the “methodology” of adding the predicates one at a time.
Why did you choose level 4 for dynamic sampling? According to the docs (11g), when you use a hint like in your case, level 1 could be enough.
Todor-
I had chosen 4 because I had initially used
optimizer_dynamic_sampling = 4. I do recognize that they are not equals. I did just run a quick test and ran through all 10 levels of the hint. Here are the results:Interesting enough, a larger sample does not always mean a more accurate estimate. This is probably data dependent. For some reason, level 1 did not produce a note in the explain plan that dynamic sampling was used, and thus the low estimate of 9. I’ll investigate this.
OK, so playing out with the level seems to be worth the effort.
As for level 1 – could it be that level 1 (OPTIMIZIER HINT) is only for the unanalyzed tables and it is missing in the docs?
my first question was (and still is), why the optimizer is not regarding the NOT operator ?
In my opinion the estimated selectivity for the negated LIKE predicate shoud be:
s(NOT LIKE =1-s( LIKE )
but instead it seems to be equal.
Applying this to the presented query the total reduction for those four predicates should only be 95%^4 (about 0.8) instead of 5%^4 resulting into an about 100.000 times greater cardinality estimation.
Regards,
Rainer Stenzel
additional question and remark after some tests:
Was the presented scenario investigated with Oracle 9i ?
With Oracle 10g I found modified selectivities
with literals:
5% for column LIKE wild_card_prefixed_pattern
50% for column NOT LIKE wild_card_prefixed_pattern
respective with bind variables:
1% for column LIKE pattern_bind_variable
5% for column NOT LIKE pattern_bind_variable
undocumented and consequently quite wired.
Regards,
Rainer Stenzel
@Rainer
Q: Why the optimizer is not regarding the NOT operator ?
In my opinion the estimated selectivity for the negated LIKE predicate should be:
s(NOT LIKE)=1-s(LIKE)
A: Because that is not currently the logic in the code. They are not equivalent today.
I only performed my tests on 11.1.0.6.
If you check out Cost-Based Oracle Fundamentals by Jonathan Lewis, page 133, table 6-6 “Fixed Percentages Used for Selectivity on Character Expressions”, you can see his observations.
Hello Greg,
unfortunately I did not see your hint to 11.1.0.6 first time and can not test on 11g by myself currently.I did some tests on 10g again and found:
1) if using literals (or bind peeking ?) the optimizer
1.1) is applying s(column NOT LIKE pattern)=1-s(column LIKE pattern) as expected and
1.2) is using different fixed selectivities for wildcard prefixed patterns and not wildcard prefixed patterns.
1.3) is using different fixed selectivities for non wildcard prefixed patterns with varying length.
2) is using the same selectivity (5%) for LIKE :v as for NOT LIKE :v when using bind variables and (however) not using bind peeking.
May be things changed again with using 11g and adaptive cursor sharing,
==> but at least on 10g I would expect a better cardinality estimation by the optimizer than presented from your example.
Have a look at the different cardinality assumptions at the example below: