Archive

Archive for the ‘Troubleshooting’ Category

Glenn Fawcetts's Oracle Analysis 101

December 10th, 2008

Glenn Fawcett has put together a nice slide deck entitled “Oracle Analysis 101” that discusses techniques to collect and analyze performance data related to Oracle databases. It’s a good read so check it out. Glenn is also a great resource for Oracle on Sun Niagara 2 Processors (CMT) so be sure to peek at his other blog posts as well.

Post to Twitter Tweet This Post

Performance, Troubleshooting

DBMS_STATS, METHOD_OPT and FOR ALL INDEXED COLUMNS

October 14th, 2008

I’ve written before on choosing an optimal stats gathering strategy but I recently came across a scenario that I didn’t directly blog about and think it deserves attention. As I mentioned in that previous post, one should only deviate from the defaults when they have a reason to, and fully understand that reason and the effect of that decision.

Understanding METHOD_OPT

The METHOD_OPT parameter of DBMS_STATS controls two things:

  1. on which columns statistics will be collected
  2. on which columns histograms will be collected (and how many buckets)

It is very important to understand #1 and how the choice of METHOD_OPT effects the collection of column statistics.

Prerequisite: Where Do I Find Column Statistics?

Understanding where to find column statistics is vital for troubleshooting bad execution plans. These views will be the arrows in your quiver:

  • USER_TAB_COL_STATISTICS
  • USER_PART_COL_STATISTICS
  • USER_SUBPART_COL_STATISTICS

Depending on if the table is partitioned or subpartitioned, and depending on what GRANULARITY the stats were gathered with, the latter two of those views may or may not be populated.

The Bane of METHOD_OPT: FOR ALL INDEXED COLUMNS

If you are using FOR ALL INDEXED COLUMNS as part of your METHOD_OPT you probably should not be. Allow me to explain. Using MENTOD_OPT=>'FOR ALL INDEXED COLUMNS SIZE AUTO' (a common METHOD_OPT I see) tells DBMS_STATS: “only gather stats on columns that participate in an index and based on data distribution and the workload of those indexed columns decide if a histogram should be created and how many buckets it should contain“. Is that really what you want? My guess is probably not. Let me work through a few examples to explain why.

I’m going to start with this table.

SQL> exec dbms_random.initialize(1);

PL/SQL procedure successfully completed.

SQL> create table t1
  2  as
  3  select
  4    column_value                    pk,
  5    round(dbms_random.value(1,2))   a,
  6    round(dbms_random.value(1,5))   b,
  7    round(dbms_random.value(1,10))  c,
  8    round(dbms_random.value(1,100)) d,
  9    round(dbms_random.value(1,100)) e
 10  from table(counter(1,1000000))
 11  /

Table created.

SQL> begin
  2    dbms_stats.gather_table_stats(
  3      ownname => user ,
  4      tabname => 'T1' ,
  5      estimate_percent => 100 ,
  6      cascade => true);
  7  end;
  8  /

PL/SQL procedure successfully completed.

SQL> select
  2    COLUMN_NAME, NUM_DISTINCT, HISTOGRAM, NUM_BUCKETS,
  3    to_char(LAST_ANALYZED,'yyyy-dd-mm hh24:mi:ss') LAST_ANALYZED
  4  from user_tab_col_statistics
  5  where table_name='T1'
  6  /

COLUMN_NAME NUM_DISTINCT HISTOGRAM       NUM_BUCKETS LAST_ANALYZED
----------- ------------ --------------- ----------- -------------------
PK               1000000 NONE                      1 2008-13-10 18:39:51
A                      2 NONE                      1 2008-13-10 18:39:51
B                      5 NONE                      1 2008-13-10 18:39:51
C                     10 NONE                      1 2008-13-10 18:39:51
D                    100 NONE                      1 2008-13-10 18:39:51
E                    100 NONE                      1 2008-13-10 18:39:51

6 rows selected.

This 6 column table contains 1,000,000 rows of randomly generated numbers. I’ve queried USER_TAB_COL_STATISTICS to display some of the important attributes (NDV, Histogram, Number of Buckets, etc).

I’m going to now put an index on T1(PK), delete the stats and recollect stats using two different METHOD_OPT parameters that each use 'FOR ALL INDEXED COLUMNS'.

SQL> create unique index PK_T1 on T1(PK);

Index created.

SQL> begin
  2    dbms_stats.delete_table_stats(user,'T1');
  3
  4    dbms_stats.gather_table_stats(
  5      ownname => user ,
  6      tabname => 'T1' ,
  7      estimate_percent => 100 ,
  8      method_opt => 'for all indexed columns' ,
  9      cascade => true);
 10  end;
 11  /

PL/SQL procedure successfully completed.

SQL> select COLUMN_NAME, NUM_DISTINCT, HISTOGRAM, NUM_BUCKETS,
  2  to_char(LAST_ANALYZED,'yyyy-dd-mm hh24:mi:ss') LAST_ANALYZED
  3  from user_tab_col_statistics
  4  where table_name='T1'
  5  /

COLUMN_NAME NUM_DISTINCT HISTOGRAM       NUM_BUCKETS LAST_ANALYZED
----------- ------------ --------------- ----------- -------------------
PK               1000000 HEIGHT BALANCED          75 2008-13-10 18:41:10

SQL> begin
  2    dbms_stats.delete_table_stats(user,'T1');
  3
  4    dbms_stats.gather_table_stats(
  5      ownname => user ,
  6      tabname => 'T1' ,
  7      estimate_percent => 100 ,
  8      method_opt => 'for all indexed columns size auto' ,
  9      cascade => true);
 10  end;
 11  /

PL/SQL procedure successfully completed.

SQL> select COLUMN_NAME, NUM_DISTINCT, HISTOGRAM, NUM_BUCKETS,
  2  to_char(LAST_ANALYZED,'yyyy-dd-mm hh24:mi:ss') LAST_ANALYZED
  3  from user_tab_col_statistics
  4  where table_name='T1'
  5  /

COLUMN_NAME NUM_DISTINCT HISTOGRAM       NUM_BUCKETS LAST_ANALYZED
----------- ------------ --------------- ----------- -------------------
PK               1000000 NONE                      1 2008-13-10 18:41:12

Notice that in both cases only column PK has stats on it. Columns A,B,C,D and E do not have any stats collected on them. Also note that when no SIZE clause is specified, it defaults to 75 buckets.

Now one might think that is no big deal or perhaps they do not realize this is happening because they do not look at their stats. Let’s see what we get for cardinality estimates from the Optimizer for a few scenarios.

SQL> select /*+ gather_plan_statistics */
  2    count(*)
  3  from t1
  4  where a=1
  5  /

  COUNT(*)
----------
    500227

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------
SQL_ID  4df0g0r99zmba, child number 0
-------------------------------------
select /*+ gather_plan_statistics */   count(*) from t1 where a=1

Plan hash value: 3724264953

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.24 |    3466 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |  10000 |    500K|00:00:00.50 |    3466 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("A"=1)

Notice the E-Rows estimate for T1. The Optimizer is estimating 10,000 rows when in reality there is 500,227. The estimate is off by more than an order of magnitude (50x). Normally the calculation for the cardinality would be (for a one table single equality predicate):
number of rows in T1 * 1/NDV = 1,000,000 * 1/2 = 500,000
but in this case 10,000 is the estimate. Strangely enough (or not), 10,000 is exactly 0.01 (1%) of 1,000,000. Because there are no column stats for T1.A, the Optimizer is forced to make a guess, and that guess is 1%.

As you can see from the 10053 trace (below), since there are no statistics on the column, defaults are used. In this case they yield very poor cardinality estimations.

SINGLE TABLE ACCESS PATH
  -----------------------------------------
  BEGIN Single Table Cardinality Estimation
  -----------------------------------------
  Column (#2): A(NUMBER)  NO STATISTICS (using defaults)
    AvgLen: 13.00 NDV: 31250 Nulls: 0 Density: 3.2000e-05
  Table: T1  Alias: T1
    Card: Original: 1000000  Rounded: 10000  Computed: 10000.00  Non Adjusted: 10000.00
  -----------------------------------------
  END   Single Table Cardinality Estimation
  -----------------------------------------

Now that I’ve demonstrated how poor the cardinality estimation was with a single equality predicate, let’s see what two equality predicates gives us for a cardinality estimate.

SQL> select /*+ gather_plan_statistics */
  2    count(*)
  3  from t1
  4  where a=1
  5    and b=3
  6  /

  COUNT(*)
----------
    124724

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------
SQL_ID  ctq8q59qdymw6, child number 0
-------------------------------------
select /*+ gather_plan_statistics */   count(*) from t1 where a=1   and b=3

Plan hash value: 3724264953

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.19 |    3466 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |    100 |    124K|00:00:00.25 |    3466 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("A"=1 AND "B"=3))

Yikes. In this case the cardinality estimate is 100 when the actual number of rows is 124,724, a difference of over 3 orders of magnitude (over 1000x). Where did the 100 row estimate come from? In this case there are two equality predicates so the selectivity is calculated as 1% * 1% or 0.01 * 0.01 = 0.0001. 1,000,000 * 0.0001 = 100. Funny that. (The 1% is the default selectivity for an equality predicate w/o stats.)

Now let’s add a derived predicate as well and check the estimates.

SQL> select /*+ gather_plan_statistics */
  2    count(*)
  3  from t1
  4  where a=1
  5    and b=3
  6    and d+e > 50
  7  /

  COUNT(*)
----------
    109816

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------
SQL_ID  5x200q9rqvvfu, child number 0
-------------------------------------
select /*+ gather_plan_statistics */   count(*) from t1 where a=1   and b=3
 and d+e > 50

Plan hash value: 3724264953

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.22 |    3466 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |      5 |    109K|00:00:00.33 |    3466 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("A"=1 AND "B"=3 AND "D"+"E">50))

Doh! The cardinality estimate is now 5, but the actual number of rows being returned is 109,816. Not good at all. The Optimizer estimated 5 rows because it used a default selectivity of 1% (for A=1) * 1% (for B=3) * 5% (for D+E > 50) * 1,000,000 rows. Now can you see why column statistics are very important? All it takes is a few predicates and the cardinality estimation becomes very small, very fast. Now consider this:

  • What is likely to happen in a data warehouse where the queries are 5+ table joins and the fact table columns do not have indexes?
  • Would the Optimizer choose the correct driving table?
  • Would nested loops plans probably be chosen when it is really not appropriate?

Hopefully you can see where this is going. If you don’t, here is the all too common chain of events:

  • Non representative (or missing) statistics lead to
  • Poor cardinality estimates which leads to
  • Poor access path selection which leads to
  • Poor join method selection which leads to
  • Poor join order selection which leads to
  • Poor SQL execution times

Take 2: Using the Defaults

Now I’m going to recollect stats with a default METHOD_OPT and run through the 3 execution plans again:

SQL> begin
  2    dbms_stats.delete_table_stats(user,'t1');
  3
  4    dbms_stats.gather_table_stats(
  5      ownname => user ,
  6      tabname => 'T1' ,
  7      estimate_percent => 100 ,
  8      degree => 8,
  9      cascade => true);
 10  end;
 11  /

PL/SQL procedure successfully completed.

SQL> select column_name, num_distinct, histogram, NUM_BUCKETS,
  2  to_char(LAST_ANALYZED,'yyyy-dd-mm hh24:mi:ss') LAST_ANALYZED
  3  from user_tab_col_statistics where table_name='T1'
  4  /

COLUMN_NAME NUM_DISTINCT HISTOGRAM       NUM_BUCKETS LAST_ANALYZED
----------- ------------ --------------- ----------- -------------------
PK               1000000 NONE                      1 2008-13-10 19:44:32
A                      2 FREQUENCY                 2 2008-13-10 19:44:32
B                      5 FREQUENCY                 5 2008-13-10 19:44:32
C                     10 FREQUENCY                10 2008-13-10 19:44:32
D                    100 NONE                      1 2008-13-10 19:44:32
E                    100 NONE                      1 2008-13-10 19:44:32

6 rows selected.
SQL> select /*+ gather_plan_statistics */
  2    count(*)
  3  from t1
  4  where a=1
  5  /

  COUNT(*)
----------
    500227

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------
SQL_ID  4df0g0r99zmba, child number 0
-------------------------------------
select /*+ gather_plan_statistics */   count(*) from t1 where a=1

Plan hash value: 3724264953

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.20 |    3466 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |    500K|    500K|00:00:00.50 |    3466 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("A"=1)
SQL> select /*+ gather_plan_statistics */
  2    count(*)
  3  from t1
  4  where a=1
  5    and b=3
  6  /

  COUNT(*)
----------
    124724

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------
SQL_ID  ctq8q59qdymw6, child number 0
-------------------------------------
select /*+ gather_plan_statistics */   count(*) from t1 where a=1   and b=3

Plan hash value: 3724264953

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.14 |    3466 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |    124K|    124K|00:00:00.25 |    3466 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("B"=3 AND "A"=1))
SQL> select /*+ gather_plan_statistics */
  2    count(*)
  3  from t1
  4  where a=1
  5    and b=3
  6    and d+e > 50
  7  /

  COUNT(*)
----------
    109816

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------
SQL_ID  5x200q9rqvvfu, child number 0
-------------------------------------
select /*+ gather_plan_statistics */   count(*) from t1 where a=1   and b=3
 and d+e>50

Plan hash value: 3724264953

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.17 |    3466 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |   6236 |    109K|00:00:00.22 |    3466 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("B"=3 AND "A"=1 AND "D"+"E">50))

As you can see, the first two queries have spot on cardinality estimates, but the the third query isn’t as good as it uses a column combination and there are no stats on D+E columns, only D and E individually. I’m going to rerun the third query with dynamic sampling set to 4 (in 10g it defaults to 2) and reevaluate the cardinality estimate.

SQL> alter session set optimizer_dynamic_sampling=4;

Session altered.

SQL> select /*+ gather_plan_statistics */
  2    count(*)
  3  from t1
  4  where a=1
  5    and b=3
  6    and d+e > 50
  7  /

  COUNT(*)
----------
    109816

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------
SQL_ID  5x200q9rqvvfu, child number 1
-------------------------------------
select /*+ gather_plan_statistics */   count(*) from t1 where a=1   and b=3
 and d+e > 50

Plan hash value: 3724264953

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.17 |    3466 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |    102K|    109K|00:00:00.22 |    3466 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("B"=3 AND "A"=1 AND "D"+"E">50))

Note
-----
   - dynamic sampling used for this statement

Bingo! Close enough to call statistically equivalent.

Summary

I hope this little exercise demonstrates how important it is to have representative statistics and that when statistics are representative the Optimizer can very often accurately estimate the cardinality and thus choose the best plan for the query. Remember these points:

  • Recent statistics do not necessarily equate to representative statistics.
  • Statistics are required on all columns to yield good plans – not just indexed columns.
  • You probably should not be using METHOD_OPT => 'FOR ALL INDEXED COLUMNS SIZE AUTO', especially in a data warehouse where indexes are used sparingly.
  • Dynamic Sampling can assist with cardinality estimates where existing stats are not enough.

Tests performed on 10.2.0.4

Post to Twitter Tweet This Post

10gR2, 11gR1, Data Warehousing, Execution Plans, Optimizer, Oracle, Performance, SQL Tuning, Statistics, Troubleshooting

Understanding Performance

September 8th, 2008

There has been some debate in forumsphere/blogosphere centered around Steve Karam’s observation of a 20x elapsed time difference in an update statement “by only changing the block size”. At this point in time, it is pretty much understood (I hope) that this performance delta is directly related to bug 6918210. This bug manifests its nasty head when the following conditions exist:

  • table in an ASSM tablespace
  • 16KB or 32KB block size
  • row migrations at the ASSM storage layer

You might ask what are “row migrations“? Row migrations are generally caused by too low of a PCTFREE setting on the table. When row has grown in length due to an UPDATE and no longer fits in the current block it is migrated to a new block in which there is enough space for it. This migration is an insert followed by an delete at the storage layer.

Jonathan Lewis gets the credit for putting the test case (alt. URL) together that reproduces performance bug. Since then I’ve run the test case numerous times, with some slight modifications, but I would like to note directly some results that I observed by modifying PCTFREE. These tests were executed on 10.2.0.4 using 200,000 rows vs. the 830,000 rows that the original test case contains. The reason for that is I could produce the scenarios I wanted and I didn’t have to wait for hours when the bug was present.

BLOCK SIZE
PCT_FREE
BLOCKS
(BEFORE/AFTER)
BYTES
(BEFORE/AFTER)
AVG_ROW_LEN
(BEFORE/AFTER)
UPDATE TIME
4096
10
742 / 2758
3,039,232 / 11,296,768
5 / 11
00:00:26.71
4096
35
994 / 994
4,071,424 / 4,071,424
5 / 11
00:00:08.09
8192
10
370 / 1378
3,031,040 / 11,288,576
5 / 11
00:00:27.36
8192
35
496 / 496
4,063,232 / 4,063,232
5 / 11
00:00:08.99
16384
10
184 / 814
3,014,656 / 13,336,576
5 / 11
00:10:59.95
16384
35
247 / 247
4,049,318 / 4,049,318
5 / 11
00:00:08.81

In Steve’s case, he started with a 16KB block and a default PCTFREE setting of 10 (he has not noted otherwise). Because the test case consists of a two column table where the second column starts out NULL and then is updated to the first column’s value, the default PCTFREE setting is not optimal. As you can see from the table, with a value of PCTFREE 10, quite a few row migrations take place. This is evident by the increase of the number of blocks in the table. Notice that when PCTFREE 35 is used, the number of blocks does not change. This is because the row can be expanded in length (in place) and thus no row migrations take place. It is also worth noting that when row migrations take place, the table grows in size to around 4x the original size. Most importantly, performance is best when PCTFREE is optimal, no matter what the block size.

Does Block Size Matter?

In this case block size matters when bug 6918210 is encountered, but what really matters is an appropriate setting of PCTFREE. As the metrics demonstrate, performance is equal between each of the three block sizes and a given PCTFREE setting, the exception being the bug. Performance is best with an optimal PCTFREE setting, no matter what the block size.

How Would One Have Discovered This Was A Bug?

First and foremost, when drastic performance deltas are observed, they should be throughly investigated. If it is too good to be true, it probably is. Because this bug is in the ASSM layer it is difficult to diagnose. This is because that layer does not contain the same level of instrumentation that other layers do. Even if you look at the 10046 trace or the ASH data, it will not provide sufficient information. So what does one do in this case? Well, if you can not get debug info from inside the database, then you have to look outside. Specifically, getting a stack trace using pstack. When the 16KB/PCTFREE 10 update was running I took some stack traces. What I noticed was this:

$ pstack 23233
#0  0x09ed3e26 in kdb_count_pdml_itl ()
#1  0x09ed42ad in kdb_diff_pdml_itl ()
#2  0x09ef02a7 in kdt_bseg_srch_cbk ()
#3  0x089d2ddd in ktspfpblk ()
#4  0x089d32b9 in ktspfsrch ()
#5  0x089ce530 in ktspscan_bmb ()
#6  0x089ccff0 in ktspgsp_cbk1 ()
#7  0x089cd612 in ktspgsp_cbk ()
#8  0x09ef2946 in kdtgsp ()
#9  0x09ef0f61 in kdtgrs ()
#10 0x09ee848b in kdtInsRow ()
#11 0x09ea724b in kdumrp ()
#12 0x09ea35f0 in kduurp ()
#13 0x09e96dd0 in kdusru ()
#14 0x095b4ad4 in updrowFastPath ()
#15 0x0a1eb0f4 in __PGOSF287_qerupFRowProcedure ()
#16 0x0a1eb7d0 in qerupFetch ()
#17 0x0a215c26 in qerstFetch ()
#18 0x095a42b4 in updaul ()
#19 0x095a6aa3 in updThreePhaseExe ()
#20 0x095a594f in updexe ()
#21 0x0a3c68f1 in opiexe ()
#22 0x0a4cd1db in kpoal8 ()
#23 0x08cd5138 in opiodr ()
#24 0x0acdb78d in ttcpip ()
#25 0x08cd1212 in opitsk ()
#26 0x08cd3d51 in opiino ()
#27 0x08cd5138 in opiodr ()
#28 0x08cca5f2 in opidrv ()
#29 0x0962f797 in sou2o ()
#30 0x08299b3b in opimai_real ()
#31 0x08299a2b in main ()

I would not expect anyone to know what these function do, but some might have some guesses based on parts of the names. While that may be useful, it is not necessary. What is important is that this stack trace was identical for nearly the entire 11 minutes that 16KB/PCTFREE 10 update ran. The most important part of performance troubleshooting is gathering good information. Even if you do not know how to exactly interpret it, it is likely that someone in Oracle Support can.

Now for those of you who are really paying attention, you may notice that the two functions (ktspscan_bmb & ktspfsrch) that Jonathan lists in his key statistics are in this stack trace. This is not just coincidence.

Troubleshoot Like A Real Expert

One of the guys who frequently uses (and blogs about) this type of performance troubleshooting approach is Tanel Poder. I would have really liked to try out his DTrace script but I did not have a Solaris host readily available, so I was stuck with old unsexy pstack on Linux. This brings me to an awesome quote from his recent blog post using DTrace:

…the key point of this article is that I could have started guessing or setting various events or undocumented parameters for probing Oracle here and there, for figuring out what’s going on. But I used a simple systematic approach instead. When Oracle Wait Interface and V$SESSTAT counters didn’t help, looked into process stack next… And if I hadn’t figured out the cause myself, I would have had some solid evidence for support.

I can not stress enough how important the approach matters. People who follow this approach will ultimately be successful and also much better off than those who just hack away. Understanding the why of performance is key.

Post to Twitter Tweet This Post

Oracle, Performance, Troubleshooting

Oracle 11g: Incremental Global Statistics On Partitioned Tables

July 16th, 2008

Previously I blogged about the new and improved DBMS_STATS.AUTO_SAMPLE_SIZE used to calculate NDV in Oracle 11g and now I wanted to touch on another new feature of DBMS_STATS in 11g: Incremental Global Statistics On Partitioned Tables.

Before Incremental Global Stats (Two-Pass Method)

When DBMS_STATS.GATHER_TABLE_STATS collects statistics on a partitioned table, generally it does so at the partition and table (global) level (the default behavior can be modified by changing the GRANULARITY parameter). This is done in two steps. First, partition level stats are gathered by scanning the partition(s) that have stale or empty stats, then a full table scan is executed to gather the global statistics. As more partitions are added to a given table, the longer the execution time for GATHER_TABLE_STATS, due to the full table scan requited for global stats.

Using Incremental Global Stats (Synopsis-Based Method)

Incremental Global Stats works by collecting stats on partitions and storing a synopsis which is the statistics metadata for that partition and the columns for that partition. This synopsis is stored in the SYSAUX tablespace, but is quite small (only a few kilobytes). Global stats are then created not by reading the entire table, but by aggregating the synopses from each partition. Incremental Global Stats, in conjunction with the new 11g DBMS_STATS.AUTO_SAMPLE_SIZE, yield a significant reduction in the time to collect statistics and produce near perfect accuracy.

Turning On Incremental Global Stats

Incremental Global Stats can only be used for partitioned tables and is activated by this command:

SQL> exec DBMS_STATS.SET_TABLE_PREFS(user,'FOO','INCREMENTAL','TRUE')

-- To see the value for INCREMENTAL for a given table:

SQL> select dbms_stats.get_prefs('INCREMENTAL', tabname=>'FOO') from dual;

DBMS_STATS.GET_PREFS('INCREMENTAL',TABNAME=>'FOO')
--------------------------------------------------
TRUE

You may also use any of the other DBMS_STATS.SET_*_PREFS as well.

A Real-World Example

To demonstrate the benefit of Incremental Global Statistics, I created a range partitioned table consisting of 60 range partitions. The target table starts empty and one million (1,000,000) rows are inserted into a single partition of the table and then statistics are gathered. This is done 60 times, simulating loading 60 one day partitions (one at a time) emulating a daily ETL/ELT process over 60 days.


Incremental_Stats.png

Elapsed Times
Partitions
Incremental=FALSE
Incremental=TRUE
1
00:00:20.36
00:00:21.14
10
00:02:27.25
00:00:37.76
20
00:04:46.23
00:00:49.83
30
00:07:05.47
00:01:01.80
40
00:09:11.09
00:01:23.33
50
00:11:33.18
00:01:30.40
60
00:13:18.15
00:01:40.28
Cumulative Elapsed Time
06:42:21.20
01:00:53.80

As you can see from the chart and the table, without Incremental Global Stats the time to gather stats increases pretty much linearly with the number of partitions, but with Incremental Global Stats the elapse time only slightly increases. The big difference is in the cumulative elapsed time: It takes 6 hours 42 minutes without Incremental Global Stats, but only 1 hour with. Quite a significant savings over time!

Revisiting The Math

For this experiment the time to gather stats without Incremental Global Stats is:
(time to scan & gather for 1 partition) + (time to scan and gather for entire table)
When Incremental Global Stats is used the time to gather stats is:
(time to scan & gather for 1 partition) + (time to aggregate all synopses)

The Diff Test

I exported the stats into a stats table and then ran the diff to compare the two runs. This will show us how comparable the two methods of stats gathering are.

SQL> set long 500000 longchunksize 500000
SQL> select report, maxdiffpct from
     table(dbms_stats.diff_table_stats_in_stattab(user,'CATALOG_SALES','STATS_DEFAULT','STATS_INC'));

REPORT
------------------------------------------------------------------------------------
MAXDIFFPCT
----------
###############################################################################

STATISTICS DIFFERENCE REPORT FOR:
.................................

TABLE	      : CATALOG_SALES
OWNER	      : TPCDS
SOURCE A      : User statistics table STATS_DEFAULT
	      : Statid	   :
	      : Owner	   : TPCDS
SOURCE B      : User statistics table STATS_INC
	      : Statid	   :
	      : Owner	   : TPCDS
PCTTHRESHOLD  : 10
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

NO DIFFERENCE IN TABLE / (SUB)PARTITION STATISTICS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

COLUMN STATISTICS DIFFERENCE:
.............................

COLUMN_NAME	SRC NDV     DENSITY    HIST NULLS   LEN  MIN   MAX   SAMPSIZ
...............................................................................

CS_BILL_ADDR_SK A   1001152 .000000998 NO   148640  5	 C102  C402  5.9E+07
		B   1001176 .000000998 NO   148613  5	 C102  C402  5.9E+07
CS_BILL_CDEMO_S A   1868160 .000000535 NO   148646  6	 C102  C4025 5.9E+07
		B   1878320 .000000532 NO   148753  6	 C102  C4025 5.9E+07
CS_BILL_CUSTOME A   1942528 .000000514 NO   148104  6	 C102  C403  5.9E+07
		B   1949464 .000000512 NO   148192  6	 C102  C403  5.9E+07
CS_BILL_HDEMO_S A   7200    .000138888 NO   148227  4	 C102  C249  5.9E+07
		B   7200    .000138888 NO   148250  4	 C102  C249  5.9E+07
CS_CALL_CENTER_ A   30	    .033333333 NO   148310  3	 C102  C11F  5.9E+07
		B   30	    .033333333 NO   148272  3	 C102  C11F  5.9E+07
CS_CATALOG_PAGE A   11092   .000090155 NO   148111  5	 C102  C3023 5.9E+07
		B   11092   .000090155 NO   148154  5	 C102  C3023 5.9E+07
CS_EXT_LIST_PRI A   1133824 .000000881 NO   148461  6	 C102  C3036 5.9E+07
		B   1131680 .000000883 NO   148368  6	 C102  C3036 5.9E+07
CS_EXT_WHOLESAL A   394880  .000002532 NO   148842  5	 C102  C302  5.9E+07
		B   394880  .000002532 NO   148772  5	 C102  C302  5.9E+07
CS_ITEM_SK	A   205888  .000004857 NO   0	    5	 C102  C3152 5.9E+07
		B   205408  .000004868 NO   0	    5	 C102  C3152 5.9E+07
CS_LIST_PRICE	A   29896   .000033449 NO   148438  5	 C102  C204  5.9E+07
		B   29896   .000033449 NO   148458  5	 C102  C204  5.9E+07
CS_ORDER_NUMBER A   7151104 .000000139 NO   0	    6	 C102  C4102 5.9E+07
		B   7122072 .000000140 NO   0	    6	 C102  C4102 5.9E+07
CS_PROMO_SK	A   1000    .001       NO   148617  4	 C102  C20B  5.9E+07
		B   1000    .001       NO   148693  4	 C102  C20B  5.9E+07
CS_QUANTITY	A   100     .01        NO   148737  3	 C102  C202  5.9E+07
		B   100     .01        NO   148751  3	 C102  C202  5.9E+07
CS_SHIP_ADDR_SK A   1001088 .000000998 NO   148150  5	 C102  C402  5.9E+07
		B   1001152 .000000998 NO   148235  5	 C102  C402  5.9E+07
CS_SHIP_CDEMO_S A   1870592 .000000534 NO   148918  6	 C102  C4025 5.9E+07
		B   1878272 .000000532 NO   148862  6	 C102  C4025 5.9E+07
CS_SHIP_CUSTOME A   1938816 .000000515 NO   148300  6	 C102  C403  5.9E+07
		B   1948928 .000000513 NO   148309  6	 C102  C403  5.9E+07
CS_SHIP_DATE_SK A   1884    .000530785 NO   148674  6	 C4032 C4032 5.9E+07
		B   1884    .000530785 NO   148608  6	 C4032 C4032 5.9E+07
CS_SHIP_HDEMO_S A   7200    .000138888 NO   148172  4	 C102  C249  5.9E+07
		B   7200    .000138888 NO   148161  4	 C102  C249  5.9E+07
CS_SHIP_MODE_SK A   20	    .05        NO   148437  3	 C102  C115  5.9E+07
		B   20	    .05        NO   148486  3	 C102  C115  5.9E+07
CS_SOLD_DATE_SK A   1595    .000626959 NO   0	    6	 C4032 C4032 5.9E+07
		B   1587    .000630119 NO   0	    6	 C4032 C4032 5.9E+07
CS_WAREHOUSE_SK A   15	    .066666666 NO   148651  3	 C102  C110  5.9E+07
		B   15	    .066666666 NO   148620  3	 C102  C110  5.9E+07
CS_WHOLESALE_CO A   9901    .000100999 NO   149054  4	 C102  C202  5.9E+07
		B   9901    .000100999 NO   149099  4	 C102  C202  5.9E+07

The stats diff shows that for many columns the NDV is identical and the others are statistically equivalent (close enough to be the same). I will certainly be adding this feature to my “conviction must use list” for Oracle 11g.

Further Reading

If you are interested in the bits and bytes of how the synopsis-based method works, I would suggest you read the whitepaper, Efficient and Scalable Statistics Gathering for Large Databases in Oracle 11g that was presented on this topic at SIGMOD 2008.

Post to Twitter Tweet This Post

10gR2, 11gR1, Data Warehousing, Execution Plans, Optimizer, Oracle, Statistics, Troubleshooting, VLDB

Using Bitmap Indexes Effectively

May 29th, 2008

Recently I was reading this thread, “Trying to make use of bitmap indexes” on the Oracle Forum. Before I had finished a working example, Jonathan Lewis had posted his response which was on par with my thoughts. Since this is a topic I see frequently, I thought I would finish my experiment and publish it here.

What We Are Given

The author of the original post had stated that the table in question contains about 16 million rows and states: “The table contains three IDEAL columns for bitmap indexes the first of which
may have only two, the second three and the third four distinct values. I was planning to change the index type on these columns to BITMAP [from B-tree]
.” To keep the focus of this post narrow, I’m only going to discuss whether or not one should consider bitmap indexes for queries, and not discuss potential update related issues.

The Data

For this experiment, I’m going to create a table that has three columns with the given NDV from above and add in a few extra filler columns to pad it out a bit. Since I do not know the exact table structure, I’ll just go with a simple example. In reality, the posters table may be wider, but for this example, it is what it is.

create table bm_test
nologging compress
as
select
  round(dbms_random.value(1, 2)) a  -- NDV 2
, round(dbms_random.value(1, 3)) b  -- NDV 3
, round(dbms_random.value(1, 4)) c  -- NDV 4
, to_char(800000+100000*dbms_random.normal,'fm000000000000') c3
, to_char(800000+100000*dbms_random.normal,'fm000000000000') c4
, to_char(15000+2000*dbms_random.normal,'fm000000') c5
, to_char(80000+10000*dbms_random.normal,'fm000000000000') c6
from dual
connect by level <= 16000000
/

desc bm_test
 Name		   Null?    Type
 ----------------- -------- ------------
 A			    NUMBER
 B			    NUMBER
 C			    NUMBER
 C3			    VARCHAR2(13)
 C4			    VARCHAR2(13)
 C5			    VARCHAR2(7)
 C6			    VARCHAR2(13)

exec dbms_stats.gather_table_stats(user,'BM_TEST');

create bitmap index bm1 on bm_test(a);
create bitmap index bm2 on bm_test(b);
create bitmap index bm3 on bm_test(c);

select a, b, c, count(*)
from bm_test
group by a,b,c
order by a,b,c;

         A          B          C   COUNT(*)
---------- ---------- ---------- ----------
         1          1          1     333292
         1          1          2     666130
         1          1          3     666092
         1          1          4     333585
         1          2          1     668594
         1          2          2    1332121
         1          2          3    1332610
         1          2          4     668608
         1          3          1     333935
         1          3          2     666055
         1          3          3     666619
         1          3          4     333106
         2          1          1     333352
         2          1          2     665038
         2          1          3     665000
         2          1          4     333995
         2          2          1     669120
         2          2          2    1332744
         2          2          3    1332766
         2          2          4     668411
         2          3          1     333891
         2          3          2     665924
         2          3          3     664799
         2          3          4     334213

24 rows selected.

select segment_name,
       segment_type,
       sum(blocks) blocks,
       sum(bytes)/1024/1024 mb
from user_segments
where segment_name like 'BM%'
group by segment_name, segment_type;

SEGMENT_NAME SEGMENT_TYPE     BLOCKS         MB
------------ ------------ ---------- ----------
BM_TEST      TABLE            102592      801.5
BM1          INDEX               768          6
BM2          INDEX              1152          9
BM3          INDEX              1408         11

select object_name, object_id
from user_objects
where object_name like 'BM%'

OBJECT_NAME   OBJECT_ID
------------ ----------
BM_TEST           54744
BM1               54745
BM2               54746
BM3               54747

The Queries And Execution Plans

The original post did not contain any queries or predicates, so for the purpose of this example I’m going to assume that there are exactly three predicates, one on each of column A, B and C, and that each predicate is a single equality (e.g. A=1 and B=1 and C=1). Looking at the data distribution from the query above, we observe there are approximately three different grouping counts: the lower around 333,000 the middle around 666,000 and the upper around 1,300,000. I will choose tuples from each of these groupings for the three test cases.

Query A

select *
from bm_test
where a=1 and b=1 and c=1;

333292 rows selected.
Plan hash value: 3643416817

----------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |       |       | 23314 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID | BM_TEST |   326K|    17M| 23314   (1)| 00:04:40 |
|   2 |   BITMAP CONVERSION TO ROWIDS|         |       |       |            |          |
|   3 |    BITMAP AND                |         |       |       |            |          |
|*  4 |     BITMAP INDEX SINGLE VALUE| BM3     |       |       |            |          |
|*  5 |     BITMAP INDEX SINGLE VALUE| BM2     |       |       |            |          |
|*  6 |     BITMAP INDEX SINGLE VALUE| BM1     |       |       |            |          |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("C"=1)
   5 - access("B"=1)
   6 - access("A"=1)

Query B

select *
from bm_test
where a=1 and b=1 and c=2;

666130 rows selected.
Plan hash value: 3202922749

----------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |       |       | 27105 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID | BM_TEST |   653K|    34M| 27105   (1)| 00:05:26 |
|   2 |   BITMAP CONVERSION TO ROWIDS|         |       |       |            |          |
|   3 |    BITMAP AND                |         |       |       |            |          |
|*  4 |     BITMAP INDEX SINGLE VALUE| BM2     |       |       |            |          |
|*  5 |     BITMAP INDEX SINGLE VALUE| BM1     |       |       |            |          |
|*  6 |     BITMAP INDEX SINGLE VALUE| BM3     |       |       |            |          |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("B"=1)
   5 - access("A"=1)
   6 - access("C"=2)

Query C

select *
from bm_test
where a=1 and b=2 and c=2;

1332121 rows selected.
Plan hash value: 1873942893

-----------------------------------------------------------------------------
| Id  | Operation         | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |         |       |       | 28243 (100)|          |
|*  1 |  TABLE ACCESS FULL| BM_TEST |  1377K|    72M| 28243   (2)| 00:05:39 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("C"=2 AND "B"=2 AND "A"=1))

As you can see from the execution plans, Query A and B use the bitmap indexes and Query C uses a Full Table Scan. Of the 16,000,000 rows, Query A returns 333,292 (2.08%), Query B returns 666,130 (4.16%) and Query C returns 1,332,121 rows (8.33%). I think it is important to note that the change in the execution plan from index access to table scan is due to the costing, not directly due to the percentage of data returned.

Execution Times

I’m going to gather two sets of execution times. The first will be with a cold buffer cache, and the second with a warm buffer cache. All elapsed times are in seconds.

Query Execution Plan Cold Cache Warm Cache
A Bitmap Index 38 3
B Bitmap Index 40 4
C FTS 16 16

As you can see from the execution times, there is a significant difference (approx. 11x) between the cold and warm cache executions of each Query A and Query B. The other observation is that Query C (FTS) is faster than Query A (Index Access) on a cold cache. We surely need to account for this. One observation I made (from iostat) is that the I/O throughput rate for Query A and Query B was around 23MB/s while the I/O rate for Query C was around the 55MB/s range during the cold cache execution. None of the queries used the Parallel Query Option.

Lets take a look at the tkprof output from both the cold and warm cache executions of Query A and see if we can find where the time is being spent. The traces were collected using event 10046, level 8.

Query A TKPROF – Warm Cache

select /* warm cache */ *
from bm_test
where a=1 and b=1 and c=1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch     3334      2.20       2.17          0     102184          0      333292
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     3336      2.20       2.18          0     102184          0      333292

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 31

Rows     Row Source Operation
-------  ---------------------------------------------------
 333292  TABLE ACCESS BY INDEX ROWID BM_TEST (cr=102184 pr=0 pw=0 time=19332 us cost=23314 size=17945290 card=326278)
 333292   BITMAP CONVERSION TO ROWIDS (cr=1162 pr=0 pw=0 time=2329 us)
     92    BITMAP AND  (cr=1162 pr=0 pw=0 time=1691 us)
    642     BITMAP INDEX SINGLE VALUE BM3 (cr=367 pr=0 pw=0 time=104 us)(object id 54747)
    697     BITMAP INDEX SINGLE VALUE BM2 (cr=396 pr=0 pw=0 time=92 us)(object id 54746)
    727     BITMAP INDEX SINGLE VALUE BM1 (cr=399 pr=0 pw=0 time=117 us)(object id 54745)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                    3337        0.00          0.00
  SQL*Net message from client                  3337        0.00          1.04

When the cache is warm, there are no physical reads that take place. This would explain the fast execution of the query.

Note: For Bitmap execution plans, the number that appears in the rows column is actually bitmap fragments (compressed rowids), not actual rows. This is why the number looks suspiciously small.

Query A TKPROF – Cold Cache

select /* cold cache */ *
from bm_test
where a=1 and b=1 and c=1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch     3334     11.44      36.22      99722     102184          0      333292
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     3336     11.45      36.22      99722     102184          0      333292

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 31

Rows     Row Source Operation
-------  ---------------------------------------------------
 333292  TABLE ACCESS BY INDEX ROWID BM_TEST (cr=102184 pr=99722 pw=99722 time=294694 us cost=23314 size=17945290 card=326278)
 333292   BITMAP CONVERSION TO ROWIDS (cr=1162 pr=1041 pw=1041 time=2490 us)
     92    BITMAP AND  (cr=1162 pr=1041 pw=1041 time=5104 us)
    642     BITMAP INDEX SINGLE VALUE BM3 (cr=367 pr=324 pw=324 time=1840 us)(object id 54747)
    697     BITMAP INDEX SINGLE VALUE BM2 (cr=396 pr=351 pw=351 time=1817 us)(object id 54746)
    727     BITMAP INDEX SINGLE VALUE BM1 (cr=399 pr=366 pw=366 time=1534 us)(object id 54745)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                    3336        0.00          0.00
  SQL*Net message from client                  3336        0.00          1.12
  db file sequential read                     99722        0.04         30.60

As you can see the majority of the time was spent on db file sequential read doing the 99,722 physical reads. This explains the difference in elapsed time between the cold and warm cache executions of Query A: it comes down to physical I/O. But why does Query C run in half the time that Query A runs in when the cache is cold, given that Query C is doing a FTS and Query A is not? Shouldn’t the FTS plan be slower than the index plan?

Looking at the raw trace file for Query A, we observe the following:

WAIT #2: nam='db file sequential read' ela= 241 file#=1 block#=1770152 blocks=1 obj#=54744 tim=1212013191665924
WAIT #2: nam='db file sequential read' ela= 232 file#=1 block#=1770153 blocks=1 obj#=54744 tim=1212013191666240
WAIT #2: nam='db file sequential read' ela= 351 file#=1 block#=1770156 blocks=1 obj#=54744 tim=1212013191666650
WAIT #2: nam='db file sequential read' ela= 240 file#=1 block#=1770157 blocks=1 obj#=54744 tim=1212013191666948
WAIT #2: nam='db file sequential read' ela= 298 file#=1 block#=1770158 blocks=1 obj#=54744 tim=1212013191667306

As you can see, the table is being read sequentially 1 block at a time. Let’s examine the TKPROF from Query C.

Query C TKPROF

select *
from bm_test
where a=1 and b=2 and c=2

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch    13323      5.99      11.17     102592     115831          0     1332121
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    13325      5.99      11.17     102592     115831          0     1332121

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 31

Rows     Row Source Operation
-------  ---------------------------------------------------
1332121  TABLE ACCESS FULL BM_TEST
(cr=115831 pr=102592 pw=102592 time=102744 us cost=28243 size=75768825 card=1377615)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                   13325        0.00          0.01
  SQL*Net message from client                 13325        0.00          4.23
  db file sequential read                         2        0.02          0.03
  direct path read                              952        0.08          5.20

The majority of the time is spent on direct path read.

Let’s dig deeper and look at the raw trace file from Query C.

WAIT #2: nam='direct path read' ela= 6029 file number=1 first dba=1785609 block cnt=128 obj#=54744 tim=1212013229612857
WAIT #2: nam='direct path read' ela= 8638 file number=1 first dba=1787017 block cnt=128 obj#=54744 tim=1212013229628256
WAIT #2: nam='direct path read' ela= 7019 file number=1 first dba=1789193 block cnt=128 obj#=54744 tim=1212013229642410
WAIT #2: nam='direct path read' ela= 9276 file number=1 first dba=1791497 block cnt=128 obj#=54744 tim=1212013229658400
WAIT #2: nam='direct path read' ela= 6173 file number=1 first dba=1792777 block cnt=128 obj#=54744 tim=1212013229671314

As you can see with Query C, the read size is 128 blocks or 1MB (128 blocks * 8k block), the largest I/O that Oracle will issue. This explains the difference in the observed I/O throughput (23MB/s vs. 55MB/s): the bitmap index plan reads the table 1 block at a time, and the FTS reads (most of) it 128 blocks at a time. It makes good sense that if the read throughput rate is ~2x (23MB/s vs. 55MB/s) then the execution time would be ~0.5 as long (38 seconds vs. 16 seconds). The larger I/O size will have a higher throughput rate compared to a smaller I/O size. The exact breakdown of the multi-block reads are:

BLOCK_COUNT      COUNT TOTAL_BLOCKS
----------- ---------- ------------
          7          2           14
          8        106          848
          9         34          306
         16         10          160
         33          8          264
        119         42         4998
        128        750        96000
            ---------- ------------
sum                952       102590

Making Sense Of All The Observations

If we look at the tkprof output again from Query A, we see there are 99,722 waits on db file sequential read. Of those 99,722 waits, 98,681 are on the table (grep is our friend here using the raw trace file and the event and object number), the remaining are for the indexes. This tells us that 98,681 out of 102,592 total blocks of the table were retrieved, just 1 block at a time. Basically we have done a very inefficient full table scan. This explains our two observations: 1) why the FTS is faster than the index access plan with a cold cache and 2) why the FTS has a higher read throughput than the index access plan. It all comes down to efficient physical I/O.

The Big Picture

Just because a column has a low NDV does not necessarily mean it is an ideal candidate for a bitmap index. Just like B-tree indexes, bitmap indexes are best leveraged when the combination of them makes it very selective (returns only a small number of rows). The classic example of using a bitmap index on a gender column (male/female) is a horrible one in my opinion. If there are only two values, and there is an even distribution of data, 50% selectivity is too large and thus not a good candidate for a bitmap index. Would you use any index to access 50% of a table?

Bitmap indexes can be very useful in making queries run fast, but if the BITMAP CONVERSION TO ROWIDS returns a large list of rowids, you may find that a FTS (or partition scan) may yield better performance, but may use more I/O resources. It comes down to a trade off: If there is a high buffer cache hit rate for the objects in the bitmap plans, it will run reasonably fast and requite less physical I/O. If the objects are unlikely to be in the buffer cache, a FTS will yield better performance as long as it is not bottlenecked on I/O bandwidth.

Post to Twitter Tweet This Post

10gR2, 11gR1, Data Warehousing, Execution Plans, Optimizer, Performance, SQL Tuning, Statistics, Troubleshooting, VLDB

Choosing An Optimal Stats Gathering Strategy

March 26th, 2008

Recently the Oracle Optimizer Development Team put out a White Paper entitled Upgrading from Oracle Database 9i to 10g: What to expect from the Optimizer. This paper discusses the main differences between 9i and 10g in the subject area of the Optimizer and Statistics. As G.I. Joe said, “Now we know! And knowing is half the battle.” The other half of the battle is successfully applying that knowledge to the databases that you manage. Statistics are input to the Oracle Optimizer and the foundation of good plans. If the statistics supplied to the Oracle Optimizer are non-representative we can probably expect GIGO (Garbage In, Garbage Out). On the other hand, if the statistics are representative, chances quite good that the Oracle Optimizer will choose the optimal plan. In this post I’d like to discuss my thoughts on how to choose an optimal stats gathering strategy.

Suggested Readings

If you haven’t done so already, I would first suggest reading the following:

Start With A Clean Slate

My first recommendation is to unset any Optimizer related parameters that exist in your init.ora, unless you have specific recommendations from the application vendor. This includes (but is not limited to):

  • optimizer_index_caching
  • optimizer_index_cost_adj
  • optimizer_mode
  • db_file_multiblock_read_count

In almost every case, the defaults for these parameters are more than acceptable.

The same goes for any events and undocumented/hidden/underscore parameters that are set. Hidden parameters and events should only be used to temporarily work around bugs under the guidance and direction of Oracle Support and Development. Contrary to what you may find on the Internet via your favorite search engine, hidden parameters are not meant to be tuning mechanisms and are not a source of magic performance gains. They are mechanisms that developers have instrumented into their code to debug problems and only those developers know and understand the full impact of changing hidden parameters.

High Level Strategy

  • Start With the Defaults: In most cases, the defaults for Optimizer parameters and DBMS_STATS are adequate. If you are upgrading from 9i to 10g, do your homework and note the differences in the defaults. Test them to see if they work well for your data and your execution plans.
  • Dealing With Suboptimal Execution Plans: There may be cases of query plan regression. It is very important to be diligent about finding root cause. Often times many plan regressions surface from the same root cause. This means if you can correctly diagnose and resolve the root cause, you have the potential to resolve many plan regressions.
  • Adjust Strategy to Cope with the Exceptions: Once it is understood why the suboptimal plan was chosen, a resolution can be tested and implemented.

Start With The Defaults

I can not stress enough how important it is to start with the defaults for Optimizer and DBMS_STATS parameters. The Real-World Performance Group has dealt with numerous cases where customers are not using the default values for Optimizer parameters and by simply setting them back to the default values performance increases . If you are planning to regression test your system because you are upgrading your database, there is no better time to do a reset of these parameters. It will also put you in a much better place to troubleshoot if you run into issues. Give the software the best chance to do what it can for you. Don’t try and be too clever for your own good.

One of the most common problems I’ve seen is that customers have chosen a magic fixed percentage for ESTIMATE_PERCENT in DBMS_STATS. Why do I call it magic? Because the customers had no systematic reasoning for the value they chose.

In the Real-World Performance Group Roundtable session at Oracle OpenWorld 2007 the topic of DBMS_STATS came up and I asked “How many people are using ESTIMATE_PERCENT => DBMS_STATS.AUTO_SAMPLE_SIZE?” A handful of people raised their hands. I then asked, “For those of you who are using a fixed value for ESTIMATE_PERCENT, who can explain how they chose their number, other than just picking it out of the air.” Not one person raised their hand. Scary! The moral of the story: You should have a documented reason (and test case) to deviate from the defaults.

Dealing With Suboptimal Execution Plans

Probably the most common root cause (I’d say >90%) for suboptimal execution plans is poor cardinality estimates by the Optimizer. Poor cardinality estimates are generally the result of non-representative statistics. This means that the root cause of most Optimizer related issues are actually stats related, reaffirming how important it is to have representative stats. For more details on troubleshooting poor cardinality estimates, I would suggest reading my post on Troubleshooting Bad Execution Plans.

In 10g, Automatic SQL Tuning was introduced via Enterprise Manager (which uses the package DBMS_SQLTUNE). I would highly recommend that you evaluate this tool (if you have it licensed). I’ve found that it can often come up with quite good suggestions and fixes.

Another option that is available to help get more accurate cardinality estimates is dynamic sampling. This is probably an underused option which can help with getting more accurate cardinality estimates when there is data correlation, etc. Dynamic sampling is most appropriate for DSS and data warehouse databases, where queries run for minutes, not seconds. See my post There Is No Time Like ‘%NOW%’ To Use Dynamic Sampling for a real-world example.

Adjust Strategy To Cope With The Exceptions

There are three scenarios that seem to be rather common:

  1. Non-representative NDV with skewed data
  2. Out-of-range predicates and partitioned tables
  3. Bind peeking when histograms exist

Non-Representative NDV With Skewed Data

There are cases where the defaults for DBMS_STATS (ESTIMATE_PERCENT => DBMS_STATS.AUTO_SAMPLE_SIZE) may not yield representative NDV in 9i and 10g. One specific case that I’ve seen repeatedly is when there is a large number of values and significant data skew. In this case a fixed sample size that yields representative NDV should be chosen. For a more in-depth review of this see my post on the new 11g DBMS_STATS.AUTO_SAMPLE_SIZE, it goes through an example of the 10g AUTO_SAMPLE_SIZE NDV/skew issue.

While having accurate NDV statistics is desirable, do not be come obsessed with having perfect NDV statistics. The goal is to have the Optimizer choose the desired plan, not have perfect NDV. Having more accurate NDV may not change the plan. This is a case when less than perfect may be good enough. Don’t lose focus of the goal.

Out-of-range Predicates And Partitioned Tables

Another case that frequently comes up is usually related to out-of-range predicates with partitioned tables that do not have representative stats. Don Seiler’s write-up of his real-world case is a poster child for this exception. If you are bulk loading data into a partitioned table, it is necessary that if statistics exist, they are representative. This problem generally surfaces when statistics have been collected on an empty partition (so all stats are zeros) and now the partition has been bulk loaded. There are a few options here:

  • Stats are gathered immediately after the loading directly into the target table.
  • Data is loaded into a staging table, stats are gathered, and the staging table is partition exchanged into the target table.
  • Stats are cloned or copied (see DBMS_STATS.COPY_TABLE_STATS) from a similar partition.
  • There are no statistics and dynamic sampling will kick in (assuming it is set to the default of 2 or higher).

From what I have seen, this case generally shows up if the query plan has partition elimination to a single partition. This is because when only one partitioned is accessed only partition stats are used, but when more than one partition is accessed, both the global/table stats and partition stats are used.

Bind Peeking When Histograms Exist

The combination of these two seem to be a frequent cause of issue in 10g with OLTP systems. In my opinion, these two features should not be used together (or used with complete understanding and extreme caution). My reasoning for this is when histograms exist, execution plans can vary based on the filter predicates. Well designed OLTP systems use bind variables for execution plan reuse. On one hand plans can change, and on the other hand plans will be reused. This seems like a complete conflict of interest, hence my position of one or the other. Oh, and lets not overlook the fact that if you have a RAC database it’s possible to have a different plan on each instance depending on what the first value peeked is. Talk about a troubleshooting nightmare.

Unlike many, I think that disabling bind peeking is not the right answer. This does not address the root cause, it attempts to curb the symptom. If you are running an OLTP system, and you are using the nightly GATHER_STATS_JOB be mindful that it uses it’s own set of parameters: it overrides most of the parameters. The doc says:

When GATHER AUTO is specified, the only additional valid parameters are stattab, statid, objlist and statown; all other parameter settings are ignored.

It may be best to change the default value of the METHOD_OPT via DBMS_STATS.SET_PARAM to 'FOR ALL COLUMNS SIZE REPEAT' and gather stats with your own job. Why REPEAT and not SIZE 1? You may find that a histogram is needed somewhere and using SIZE 1 will remove it the next time stats are gathered. Of course, the other option is to specify the value for METHOD_OPT in your gather stats script.

Common Pitfalls And Problems

  • Disabling The 10g GATHER_STATS_JOB: As many of you know in 10g the GATHER_STATS_JOB was introduced. Since many customers have custom stats gathering scripts in place, many have chosen to disable this job. Disabling the GATHER_STATS_JOB entirely is not recommended because it also gathers dictionary stats (SYS/SYSTEM schemas). If you wish to collect your statistics manually, then you should change the value of AUTOSTATS_TARGET to ORACLE instead of AUTO (DBMS_STATS.SET_PARAM('AUTOSTATS_TARGET','ORACLE')). This will keep the dictionary stats up to date and allow you to manually gather stats on your schemas as you have done so in 9i.
  • Representative Statistics: When troubleshooting bad execution plans it is important to evaluate if the statistics are representative. Many times customers respond with “I just gathered statsitics” or “The statistics are recent“. Recently gathered statistics does not equate to representative statistics. Albert Einstein once said “The definition of insanity is doing the same thing over and over again and expecting different results”. It applies here as well.
  • Too Much Time Spent On Stats Gathering: Often when customers say their stats gathering is taking too long I ask to see their DBMS_STATS script. Generally there are three reasons that stats gathering is taking too long:
    1. Stats are gathered with too fine of setting for GRANULARITY: It is usually unnecessary to gather subpartition stats for composite partitioned tables. Don’t spend time collecting stats that are not needed. Don’t override the default for GRANULARITY unless you have a reason: the default probably is sufficient.
    2. Stats are gathered with unnecessarily large ESTIMATE_PERCENT: Use DBMS_STATS.AUTO_SAMPLE_SIZE to start with and adjust if necessary. No need to sample 10% when 1% or less yields representative statistics. Give DBMS_STATS.AUTO_SAMPLE_SIZE the chance to choose the best sample size.
    3. Stats are gathered more frequently than required or on data that hasn’t changed: The GATHER_STATS_JOB uses OPTIONS => 'GATHER AUTO' so it only gathers statistics on objects with more than a 10% change with a predefined set of options. If you are gathering statistics on tables/partitions that haven’t changed, or haven’t changed significantly, you may be spending time gathering unnecessary statistics. For example, there is no need to gather partition stats on last months (or older) if the data in the partition is no longer volatile.
  • Not Understanding The Data: The answers are almost always in the data…skew, correlation, etc. Many operational DBAs don’t have an in-depth understanding of the data they are managing. If this is the case, grab an engineer or analyst that is familiar with the data and work together. Two smart people working on a problem is almost always better than one!
  • Do Not Mess With Optimizer Parameters: If an execution plan is not choosing an index, understand why. Use the tools available. The GATHER_PLAN_STATISTICS hint is a prefect place to start. Fiddling with Optimizer parameters is not the solution.

Post to Twitter Tweet This Post

10gR2, Execution Plans, Optimizer, Oracle, Statistics, Troubleshooting

Awareness Test

March 21st, 2008

I came across this video the other day and I immediately thought of rule #3 from my Got Root Cause? post:

Be detail oriented, but do not become too obsessed with any one detail.

Watch the video and see if you pass the Awareness Test. Most importantly, remember the punch line of the video the next time you are troubleshooting.



Post to Twitter Tweet This Post

Oracle, Troubleshooting

ANSI Outer Joins And Lateral Views

February 18th, 2008

A few months ago the Oracle Optimizer Team did a blog post entitled Outerjoins in Oracle. In the Lateral View section of that post they go through some examples and discuss how a query is transformed with the ANSI outer join syntax. I thought it would be useful to go through an example that recently came through the Real-World Performance Group. For simplicity purposes and so that you can play along at home, the test case has been recreated to use EMP and DEPT which have been created and populated via the $ORACLE_HOME/rdbms/admin/utlsampl.sql script.

The Three Test Cases

Consider the following three SQL statements:

Query A: Oracle Outer Join Syntax

SELECT d.dname, d.deptno, e.ename
FROM   dept d, emp e
WHERE  d.deptno = e.deptno(+) and
       d.deptno in (10,40)

Query B: ANSI Outer Join Syntax Version 1

SELECT d.dname, d.deptno, e.ename
FROM   dept d LEFT OUTER JOIN emp e
ON     d.deptno = e.deptno
WHERE  d.deptno in (10,40)

Query C: ANSI Outer Join Syntax Version 2

SELECT d.dname, d.deptno, e.ename
FROM   dept d LEFT OUTER JOIN emp e
ON     d.deptno = e.deptno and
       d.deptno in (10,40)

Do note the slight difference between the two ANSI versions: Query B has the filter predicate in the WHERE clause, where Query C has the filter predicate in the ON clause.

Query Results

Query A

DNAME              DEPTNO ENAME
-------------- ---------- ----------
ACCOUNTING             10 CLARK
ACCOUNTING             10 KING
ACCOUNTING             10 MILLER
OPERATIONS             40

4 rows selected.

Query B

DNAME              DEPTNO ENAME
-------------- ---------- ----------
ACCOUNTING             10 CLARK
ACCOUNTING             10 KING
ACCOUNTING             10 MILLER
OPERATIONS             40

4 rows selected.

Query C

DNAME              DEPTNO ENAME
-------------- ---------- ----------
ACCOUNTING             10 CLARK
ACCOUNTING             10 KING
ACCOUNTING             10 MILLER
RESEARCH               20
SALES                  30
OPERATIONS             40

6 rows selected.

Whoa! Query C returned 6 rows, while Query A and Query B returned 4 rows. Must be a wrong results bug…or is it?

Execution Plans

To start troubleshooting the difference in results sets, lets examine the execution plan of each query.

Query A

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
Plan hash value: 3713469723

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     9 |   198 |     5  (20)| 00:00:01 |
|*  1 |  HASH JOIN OUTER   |      |     9 |   198 |     5  (20)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| DEPT |     2 |    26 |     2   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| EMP  |    14 |   126 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("D"."DEPTNO"="E"."DEPTNO"(+))
   2 - filter("D"."DEPTNO"=10 OR "D"."DEPTNO"=40)

Query B

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
Plan hash value: 3713469723

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     9 |   198 |     5  (20)| 00:00:01 |
|*  1 |  HASH JOIN OUTER   |      |     9 |   198 |     5  (20)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| DEPT |     2 |    26 |     2   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| EMP  |    14 |   126 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("D"."DEPTNO"="E"."DEPTNO"(+))
   2 - filter("D"."DEPTNO"=10 OR "D"."DEPTNO"=40)

Query C

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------
Plan hash value: 498633241

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |     4 |    80 |    10   (0)| 00:00:01 |
|   1 |  NESTED LOOPS OUTER  |      |     4 |    80 |    10   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL  | DEPT |     4 |    52 |     2   (0)| 00:00:01 |
|   3 |   VIEW               |      |     1 |     7 |     2   (0)| 00:00:01 |
|*  4 |    FILTER            |      |       |       |            |          |
|*  5 |     TABLE ACCESS FULL| EMP  |     1 |     9 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("D"."DEPTNO"=10 OR "D"."DEPTNO"=40)
   5 - filter("D"."DEPTNO"="E"."DEPTNO" AND ("E"."DEPTNO"=10 OR
              "E"."DEPTNO"=40))

For some reason (which we need to investigate!), Query C has a different execution plan. Perhaps this is why the result sets are different.

The 10053 Trace

Perhaps the 10053 trace from Query C can help us understand why the execution plan is different. Looking at the trace file we find that Query C has been transformed (the below has been modified for formatting purposes):

SELECT d.dname, d.deptno, lv.ename
FROM dept d,
LATERAL(
 (SELECT e.deptno, e.ename
  FROM   emp e
  WHERE  d.deptno=e.deptno and
        (d.deptno=10 or d.deptno=40))
)(+) lv

You will notice that the transformation of Query C contains a Lateral View. The Oracle Optimizer Team’s post on Outerjoins in Oracle gives us the definition of a Lateral View:

A lateral view is an inline view that contains correlation referring to other tables that precede it in the FROM clause.

What does this mean? It means that Query C is not the same query as Query B and Query A. Just the slight change of the ANSI syntax causes the meaning of the business question being answered to change! Query C applies the deptno in (10,40) filter to EMP first (returning 3 rows in which deptno=10, there are no deptno=40 rows) and then outer joins that result set to DEPT but does not apply the deptno in (10,40) filter to DEPT, essentially resulting in this query, which is most likely not what the user had intended:

SELECT d.dname, d.deptno, sq.ename
FROM   dept d,
   (SELECT e.deptno, e.ename
    FROM   emp e
    WHERE  e.deptno in (10,40)) sq
WHERE d.deptno=sq.deptno (+)

Summary

Filters specified in the ON clause of outer joins are transformed internally into lateral views and will be applied before the join. While ANSI join syntax is preferred by some, be certain that your query matches the business question being asked!

Post to Twitter Tweet This Post

Execution Plans, Optimizer, Oracle, SQL Tuning, Troubleshooting

Oracle 11g: Real-Time SQL Monitoring Using DBMS_SQLTUNE.REPORT_SQL_MONITOR

January 6th, 2008

Many times a DBA wants to know where a SQL statement is in its execution plan and where the time is being spent. There are a few ways to find out this information, but an 11g new feature makes gathering this information extremely easy. Oracle 11g Real-Time SQL Monitoring allows you to monitor the performance of SQL statements while they are executing as well as see the breakdown of time and resources used for recently completed statements. It is on by default when STATISTICS_LEVEL is set to to ALL or TYPICAL (the default value) and monitors statements that consume more than 5 seconds of CPU or IO time, as well as any parallel execution (PQ, PDML, PDDL). One can override the default actions by using the MONITOR or NO_MONITOR hint. The 11g Documentation has a text version of a SQL Monitor Report but the report output can be html, text or xml.

Real-Time SQL Monitoring Report Walkthrough

To demonstrate the Real-Time SQL Monitoring feature, I started a parallel query and every 60 seconds or so I captured a Real-Time SQL Monitoring report in html format using DBMS_SQLTUNE.REPORT_SQL_MONITOR. Reports 1 through 4 are captures while the query is executing, Report 5 is a post execution capture. Each of the below links will open in a new window.

As you browse through the SQL Monitoring Reports you will see which operation(s) of the execution plan is/are active, how long they have been active, as well as wait events and database time. You can mouse over each of the colored bars to get more detail. As the SQL Monitoring Report notes, this query was executed using a DOP of 8, but there were 16 slaves, 2 sets of 8 which act in a producer/consumer pair. In the SQL Plan Monitoring Details section you can see which operations were performed by each slave set, as well as the QC, by the colors of the Id column. This report also makes it very easy to see that there was some skew in the work for the first slave set. Slaves p003 and p004 performed much more IO than the other slaves and it stands out by the length of the colored bars. The “Active Period” column allows one to see which operations were active, for how long, and at what point of the overall execution. I feel this report gives a great visualization of the execution and a visual breakdown of DB Time and Wait Activity. I’m quite certain I will be using this feature frequently when troubleshooting execution plans and as well as db time drill down.

I’m really excited about Real-Time SQL Monitoring’s ability to capture the Estimated and Actual number of rows for each row source. This eliminates the need to run a query with a GATHER_PLAN_STATISTICS hint as I discussed in my post: Troubleshooting Bad Execution Plans.

More Information

There are a few slides on Real-Time SQL Monitoring in the DBAs’ New Best Friend: Advanced SQL Tuning Features of Oracle Database 11g (starting on page 27) presentation from OOW 2007. As the presentation mentions, Real-Time SQL Monitoring will also be a part of 11g Enterprise Manager Grid Control.

Addendum (2008/02/12)
If you want to get a SQL Monitor report for a statement you just ran in your session (similar to dbms_xplan.display_cursor) then use this command:

select
DBMS_SQLTUNE.REPORT_SQL_MONITOR(
   session_id=>sys_context('userenv','sid'),
   report_level=>'ALL') as report
from dual;

Post to Twitter Tweet This Post

11gR1, Execution Plans, Oracle, Performance, SQL Tuning, Troubleshooting

Oracle Myth Busting: Show, Don't Tell

December 16th, 2007

Richard Foote has recently started blogging (as of December 11th) and one of his recent posts discusses Oracle Myths and Information Pollution. I find this topic very interesting as I’m always amazed at the number of people who make changes to their production database based on the results from their favorite Internet search engine, and don’t even bother to test it themselves first! Like Richard, I’d encourage anyone who gets information from the Internet to do your diligence and understand the why, and not just the results someone else observed. I guess one could associate it to third grade arithmetic, where having the answer (or making a claim) is not sufficient, one must show their work to get full credit. This way even if the answer (or claim) is incorrect, one can understand the line of thought and where the “miscalculation” took place.

I’m also adding Richard Foote to my blogroll.

Bloggers’ Note: In this post I was going to use a reference to masses of lemmings running off a cliff, but then I found out that is a myth also. Amazing what a little research turns up…

Post to Twitter Tweet This Post

Oracle, Troubleshooting

Oracle Optimizer Development Team Starts A Blog

December 5th, 2007

Since the introduction of the Cost Based Optimizer (CBO) in Oracle 7.0, people have been both fascinated and terrified by it and the statistics that feed it. There has long been a belief that a degree in witchcraft or black magic is required to successfully work with the CBO. Some people feel this shroud of mystery is caused in part by the lack of documentation or detailed examples about the Optimizer. In order to help remove this shroud and the black magic that surround the Optimizer, the development team responsible for it have started a blog. The blog postings will give in-depth descriptions of the technology behind the CBO, as well as practical advice like best practices for using the Optimizer. Their first blog post, Why are there more cursors in 11g for my query containing bind variables? went up on December 3rd and it discusses the controversial topic of Bind Peeking and how it has been enhanced in 11g. I recommend you check out their blog at http://optimizermagic.blogspot.com.

Post to Twitter Tweet This Post

Execution Plans, Optimizer, Oracle, Performance, Statistics, Troubleshooting

Troubleshooting Bad Execution Plans

November 21st, 2007

One of the most common performance issues DBAs encounter are bad execution plans. Many try to resolve bad executions plans by setting optimizer related parameters or even hidden underscore parameters. Some even try to decipher a long and complex 10053 trace in hopes to find an answer. While changing parameters or analyzing a 10053 trace might be useful for debugging at some point, I feel there is a much more simple way to start to troubleshoot bad execution plans.

Verify The Query Matches The Business Question

This seems like an obvious thing to do, but I’ve seen numerous cases where the SQL query does not match the business question being asked. Do a quick sanity check verifying things like: join columns, group by, subqueries, etc. The last thing you want to do is consume time trying to debug a bad plan for an improperly written SQL query. Frequently I’ve found that this is the case for many of those “I’ve never got it to run to completion” queries.

What Influences The Execution Plan

I think it’s important to understand what variables influence the Optimizer in order to focus the debugging effort. There are quite a number of variables, but frequently the cause of the problem ones are: (1) non-default optimizer parameters and (2) non-representative object/system statistics. Based on my observations I would say that the most abused Optimizer parameters are:

  • OPTIMIZER_INDEX_CACHING
  • OPTIMIZER_INDEX_COST_ADJ
  • DB_FILE_MULTIBLOCK_READ_COUNT

Many see setting these as a solution to get the Optimizer to choose an index plan over a table scan plan, but this is problematic in several ways:

  1. This is a global change to a local problem
  2. Although it appears to solve one problem, it is unknown how many bad execution plans resulted from this change
  3. The root cause of why the index plan was not chosen is unknown, just that tweaking parameters gave the desired result
  4. Using non-default parameters makes it almost impossible to correctly and effectively troubleshoot the root cause

Object and system statistics can have a large influence on execution plans, but few actually take the time to sanity check them during triage. These statistics exist in views like:

  • ALL_TAB_COL_STATISTICS
  • ALL_PART_COL_STATISTICS
  • ALL_INDEXES
  • SYS.AUX_STATS$

Using GATHER_PLAN_STATISTICS With DBMS_XPLAN.DISPLAY_CURSOR

As a first step of triage, I would suggest executing the query with a GATHER_PLAN_STATISTICS hint followed by a call to DBMS_XPLAN.DISPLAY_CURSOR. The GATHER_PLAN_STATISTICS hint allows for the collection of extra metrics during the execution of the query. Specifically, it shows us the Optimizer’s estimated number of rows (E-Rows) and the actual number of rows (A-Rows) for each row source. If the estimates are vastly different from the actual, one probably needs to investigate why. For example: In the below plan, look at line 8. The Optimizer estimates 5,899 rows and the row source actually returns 5,479,000 rows. If the estimate is off by three orders of magnitude (1000), chances are the plan will be sub-optimal. Do note that with Nested Loop Joins you need to multiply the Starts column by the E-Rows column to get the A-Rows values (see line 10).

select /*+ gather_plan_statistics */ ... from ... ;
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));

------------------------------------------------------------------------------------------
|  Id | Operation                              | Name         | Starts | E-Rows | A-Rows |
------------------------------------------------------------------------------------------
|   1 | SORT GROUP BY                          |              |     1  |      1 | 1      |
|*  2 |  FILTER                                |              |     1  |        | 1728K  |
|   3 |   NESTED LOOPS                         |              |     1  |      1 | 1728K  |
|*  4 |    HASH JOIN                           |              |     1  |      1 | 1728K  |
|   5 |     PARTITION LIST SINGLE              |              |     1  |   6844 | 3029   |
|*  6 |      INDEX RANGE SCAN                  | PROV_IX13    |     1  |   6844 | 3029   |
|   7 |     PARTITION LIST SINGLE              |              |     1  |   5899 | 5479K  |
|*  8 |      TABLE ACCESS BY LOCAL INDEX ROWID | SERVICE      |     1  |   5899 | 5479K  |
|*  9 |       INDEX SKIP SCAN                  | SERVICE_IX8  |     1  |   4934 | 5479K  |
|  10 |    PARTITION LIST SINGLE               |              |  1728K |      1 | 1728K  |
|* 11 |     INDEX RANGE SCAN                   | CLAIM_IX7    |  1728K |      1 | 1728K  |
------------------------------------------------------------------------------------------

Using The CARDINALITY Hint

Now that I’ve demonstrated how to compare the cardinality estimates to the actual number of rows, what are the debugging options? If one asserts that the Optimizer will choose the optimal plan if it can accurately estimate the number of rows, one can test using the not so well (un)documented CARDINALITY hint. The CARDINALITY hint tells the Optimizer how many rows are coming out of a row source. The hint is generally used like such:

select /*+ cardinality(a 100) */ * from dual a;

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   100 |   200 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| DUAL |   100 |   200 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

In this case I told the Optimizer that DUAL would return 100 rows (when in reality it returns 1 row) as seen in the Rows column from the autotrace output. The CARDINALITY hint is one tool one can use to give the Optimizer accurate information. I usually find this the best way to triage a bad plan as it is not a global change, it only effects a single execution of a statement in my session. If luck has it that using a CARDINALITY hint yields an optimal plan, one can move on to debugging where the cardinality is being miscalculated. Generally the bad cardinality is the result of non-representative table/column stats, but it also may be due to data correlation or other factors. This is where it pays off to know and understand the size and shape of the data. If the Optimizer still chooses a bad plan even with the correct cardinality estimates, it’s time to place a call to Oracle Support as more in-depth debugging is likely required.

Where Cardinality Can Go Wrong

There are several common scenarios that can lead to inaccurate cardinality estimates. Some of those on the list are:

  1. Data skew: Is the NDV inaccurate due to data skew and a poor dbms_stats sample?
  2. Data correlation: Are two or more predicates related to each other?
  3. Out-of-range values: Is the predicate within the range of known values?
  4. Use of functions in predicates: Is the 5% cardinality guess for functions accurate?
  5. Stats gathering strategies: Is your stats gathering strategy yielding representative stats?

Some possible solutions to these issues are:

  1. Data skew: Choose a sample size that yields accurate NDV. Use DBMS_STATS.AUTO_SAMPLE_SIZE in 11g.
  2. Data correlation: Use Extended Stats in 11g. If <= 10.2.0.3 use a CARDINALITY hint if possible.
  3. Out-of-range values: Gather or manually set the statistics.
  4. Use of functions in predicates: Use a CARDINALITY hint where possible.
  5. Stats gathering strategies: Use AUTO_SAMPLE_SIZE. Adjust only where necessary. Be mindful of tables with skewed data.

How To Best Work With Oracle Support

If you are unable to get to the root cause on your own, it is likely that you will be in contact with Oracle Support. To best assist the support analyst I would recommend you gather the following in addition to the query text:

  1. Output from the GATHER_PLAN_STATISTICS and DBMS_XPLAN.DISPLAY_CURSOR
  2. SQLTXPLAN output. See Metalink Note 215187.1
  3. 10053 trace output. See Metalink Note 225598.1
  4. DDL for all objects used (and dependencies) in the query. This is best gotten as a expdp (data pump) using CONTENT=METADATA_ONLY. This will also include the object statistics.
  5. Output from: select pname, pval1 from sys.aux_stats$ where sname='SYSSTATS_MAIN';
  6. A copy of your init.ora

Having this data ready before you even make the call (or create the SR on-line) should give you a jump on getting a quick(er) resolution.

Summary

While this blog post is not meant to be a comprehensive troubleshooting guide for bad execution plans, I do hope that it does help point you in the right direction the next time you encounter one. Many of the Optimizer issues I’ve seen are due to incorrect cardinality estimates, quite often due to inaccurate NDV or the result of data correlation. I believe that if you use a systematic approach you will find that debugging bad execution plans may be as easy as just getting the cardinality estimate correct.

Post to Twitter Tweet This Post

Execution Plans, Optimizer, Oracle, Performance, SQL Tuning, Statistics, Troubleshooting