Category: Troubleshooting

Visualizing Active Session History (ASH) Data With R

One of the easiest ways to understand something is to see a visualization. Looking at Active Session History (ASH) data is no exception, so I’ll dive into how to do that with R and how I used R plots to visually present a problem and confirm a hypothesis. But first, some background…

Background

Frequently DBAs use the Automatic Workload Repository (AWR) as an entry point for troubleshooting performance problems, and in this case the adventure started the same way. In the AWR report’s Top 5 Timed Foreground Events, the log file sync event was showing up as the #3 event. This needed deeper investigation, as often the cause of longer log file sync times is longer log file parallel write times.

Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                           Avg
                                                          wait   % DB
Event                                 Waits     Time(s)   (ms)   time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
log file sync                     3,155,253       9,197      3    6.4 Commit

Drilling into this a bit deeper, the log file parallel write event reported the following in the Background Wait Events section of the AWR report:

Background Wait Events
-> s  - second, ms - millisecond -    1000th of a second
-> ordered by wait time desc, waits desc (idle events last)
-> Only events with Total Wait Time (s) >= .001 are shown
-> %Timeouts: value of 0 indicates value was < .5%.  Value of null is truly 0

                                                             Avg
                                        %Time Total Wait    wait    Waits   % bg
Event                             Waits -outs   Time (s)    (ms)     /txn   time
-------------------------- ------------ ----- ---------- ------- -------- ------
log file parallel write         659,157     0        526       1      0.2   18.2

It is generally worth looking at the breakdown of these times, as they are averages. We really want to understand the entire histogram of these wait events. For that we can look at the Wait Event Histogram section of the AWR report, as below.

Wait Event Histogram
-> Units for Total Waits column: K is 1000, M is 1000000, G is 1000000000
-> % of Waits: value of .0 indicates value was <.05%.  Value of null is truly 0
-> % of Waits: column heading of <=1s is truly <1024ms, >1s is truly >=1024ms
-> Ordered by Event (idle events last)

                                                    % of Waits
                                 -----------------------------------------------
                           Total
Event                      Waits   <1ms  <2ms  <4ms  <8ms <16ms <32ms  <=1s   >1s
-------------------------- ------ ----- ----- ----- ----- ----- ----- ----- -----
log file parallel write    661.6K  84.7   9.7   4.7    .4    .1    .3    .0
log file sync              3138.K  14.0  42.8  30.4   7.9   2.3   2.6    .1

Wait Event Histogram Detail (64 msec to 2 sec)
-> Units for Total Waits column: K is 1000, M is 1000000, G is 1000000000
-> Units for % of Total Waits:
   ms is milliseconds
   s is 1024 milliseconds (approximately 1 second)
-> % of Total Waits: total waits for all wait classes, including Idle
-> % of Total Waits: value of .0 indicates value was <.05%.  Value of null is truly 0
-> Ordered by Event (only non-idle events are displayed)

                                                 % of Total Waits
                                 -----------------------------------------------
                           Waits
                           64ms
Event                      to 2s <32ms <64ms <1/8s <1/4s <1/2s   <1s   <2s  >=2s
-------------------------- ----- ----- ----- ----- ----- ----- ----- ----- -----
log file parallel write       52 100.0    .0    .0
log file sync               3535  99.9    .1    .0

One thing you should notice here is that there are two sections of the Wait Event Histogram: the buckets less than 32ms and the buckets greater than 32ms. It is also important to note that only the absence of a value means no timings fell into that bucket — so even though the report shows .0, there are still events in that bucket (the section description mentions this).

We can see from the second histogram section that there were 52 times that log file parallel write was over 64ms, as well as 3535 times that log file sync was over 64ms. At this point a hypothesis is formed that the two events are correlated; that is, the belief is that the long log file parallel write events may be causing the long log file sync waits. To find data that supports (or refutes) the hypothesis, we can look at the Active Session History (ASH) data to get a more granular view of the wait events.

Diving Into The ASHes With R

If the hypothesis is correct, the ASH data should show times where we observe long log file parallel write and long log file sync waits. One could write SQL against V$ACTIVE_SESSION_HISTORY to collect all the samples that may demonstrate the hypothesis — e.g. collect a list of sample ids for each event and examine them for overlap, but a visualization is worth a thousand words.

One way to get quick and easy visualizations from ASH data is to use R. R has become a very popular tool for those doing statistical analysis and it has some quite useful graphing and plotting packages built in. R can connect to Oracle via a JDBC package which makes importing data trivial.

Here is a plot that I put together using R for ~300 sample ids (~5 minutes) from ASH (recall that TIME_WAITED is in microseconds):

[Figure Rplot01: log file sync and log file parallel write TIME_WAITED plotted by ASH sample id]

As you can see from the plots, nearly every time there is a long log file parallel write there are also numerous long log file sync events. This data supports the hypothesis.

Averages Suck

Most any statistician (or anyone keen on numbers and stats) will tell you averages suck. This does not mean averages are not useful, but one has to keep in mind averages can hide infrequent outliers (which may make them a tad bit evil). For instance, AWR is reporting an average of 1ms for log file parallel write but we can see that there are several that are falling in the 20ms range in the 5 minute capture. If we zoom in on the graph (set the y axis at a max of 3ms), we can understand why:
[Figure Rplot02: the same plot with the y axis capped at 3ms]

Most of the log file parallel write events are coming in around 0.5ms so even with some 20ms outliers it still yields an average of 1ms, thus hiding the magnitude of impact for the outliers. This is why drilling down into the ASH data was important for us to understand the scope of the issue.

With More Data Comes More Questions

At this point, the visualization of ASH data shows a strong correlation between log file sync and log file parallel write outliers, but the root cause has not yet been identified. Perhaps there is more insight lurking in the ASH data?

Given this is an OLTP workload and log file parallel write is an IO operation, perhaps it’s worth looking at another IO operation like, say, db file sequential read – single block IO times.
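Pulling that event over a narrow window is the same idea as before (again just a hedged sketch, with an assumed 60 second window):

select sample_id,
       sample_time,
       session_id,
       time_waited                -- microseconds
from   v$active_session_history
where  event = 'db file sequential read'
and    sample_time > systimestamp - interval '60' second
and    time_waited > 0
order  by sample_id;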

In the below plot, I’ve taken a 60 second window of ASH data and plotted all the db file sequential read events.

[Figure Rplot11: db file sequential read times over a 60 second ASH window]
Interestingly, there are several samples where the IOs are taking significantly longer than most — the majority are below 10ms but we see a few samples where there are groups in the 20ms or more range. Let’s add the log file sync events to the plot.
[Figure Rplot12: db file sequential read and log file sync times]
Hmmm… even more interesting. The data is showing a correlation between log file sync and db file sequential read. Any bets on what the plot looks like if log file parallel write is added?

[Figure Rplot13: db file sequential read, log file sync and log file parallel write times]

Very interesting. The data is showing us that all three events are strongly correlated. Clearly we are on to something here… and using R to plot the ASH data was an easy way to present and understand it visually.

Summary

While this blog post did not go into the specific root cause, it was a healthy journey in debugging and data-driven analysis. Keep in mind that even when AWR averages look good, the event histograms should also be reviewed to see if there are outliers. R provides an easy way to put scatter plots of event times together to get a nice visual of what is going on.

Source Code

Here is the code I used to do the analysis. It’s pretty straightforward and well commented (I think). Enjoy!

I’ll also mention that I use the RStudio IDE for R.

Creating Optimizer Trace Files

Many Oracle DBAs are probably familiar with what Optimizer trace files are and likely know how to create them. When I say “Optimizer trace”, more than likely you think of event 10053, right? SQL like this is probably familiar, then:

alter session set tracefile_identifier='MY_10053';
alter session set events '10053 trace name context forever';
select /* hard parse comment */ * from emp where ename = 'SCOTT';
alter session set events '10053 trace name context off';

In 11g, a new diagnostic events infrastructure was implemented and there are various levels of debug output that you can control for SQL compilation. ORADEBUG shows us the hierarchy.

SQL> oradebug doc component SQL_Compiler

  SQL_Compiler                    SQL Compiler
    SQL_Parser                    SQL Parser (qcs)
    SQL_Semantic                  SQL Semantic Analysis (kkm)
    SQL_Optimizer                 SQL Optimizer
      SQL_Transform               SQL Transformation (kkq, vop, nso)
        SQL_MVRW                  SQL Materialized View Rewrite
        SQL_VMerge                SQL View Merging (kkqvm)
        SQL_Virtual               SQL Virtual Column (qksvc, kkfi)
      SQL_APA                     SQL Access Path Analysis (apa)
      SQL_Costing                 SQL Cost-based Analysis (kko, kke)
        SQL_Parallel_Optimization SQL Parallel Optimization (kkopq)
    SQL_Code_Generator            SQL Code Generator (qka, qkn, qke, kkfd, qkx)
      SQL_Parallel_Compilation    SQL Parallel Compilation (kkfd)
      SQL_Expression_Analysis     SQL Expression Analysis (qke)
      SQL_Plan_Management         SQL Plan Managment (kkopm)
    MPGE                          MPGE (qksctx)

My personal preference for Optimizer tracing is to stick with the most detailed level, in this case SQL_Compiler vs. just SQL_Optimizer.

Given that, we can do the following in 11g:

alter session set tracefile_identifier='MY_SQL_Compiler_TRACE';
alter session set events 'trace [SQL_Compiler.*]';
select /* hard parse comment */ * from emp where ename = 'SCOTT';
alter session set events 'trace [SQL_Compiler.*] off';

One of the big drawbacks of using the 10053 event or the SQL_Compiler event is that two things need to happen: 1) you have to have the SQL text and 2) a hard parse needs to take place (so there is actually a SQL compilation). What if you want to get an Optimizer trace file for a statement that has already executed in your database and is in the cursor cache? Chances are you know how to do #1 & #2, but it’s kind of a pain, right? Even more of a pain if the query is pages of SQL, or you don’t have the application schema password, etc.

In 11gR2 (11.2) there was a procedure added to DBMS_SQLDIAG called DUMP_TRACE. The DUMP_TRACE procedure didn’t make the DBMS_SQLDIAG documentation but here is the declaration:

-- $ORACLE_HOME/rdbms/admin/dbmsdiag.sql
-------------------------------- dump_trace ---------------------------------
-- NAME: 
--     dump_trace - Dump Optimizer Trace
--
-- DESCRIPTION:
--     This procedure dumps the optimizer or compiler trace for a give SQL 
--     statement identified by a SQL ID and an optional child number. 
--
-- PARAMETERS:
--     p_sql_id          (IN)  -  identifier of the statement in the cursor 
--                                cache
--     p_child_number    (IN)  -  child number
--     p_component       (IN)  -  component name
--                                Valid values are Optimizer and Compiler
--                                The default is Optimizer
--     p_file_id         (IN)  -  file identifier
------------------------------------------------------------------------------
PROCEDURE dump_trace(
              p_sql_id         IN varchar2,
              p_child_number   IN number   DEFAULT 0,
              p_component      IN varchar2 DEFAULT 'Optimizer',
              p_file_id        IN varchar2 DEFAULT null);

As you can see, you can specify either Optimizer or Compiler as the component name which is the equivalent of the SQL_Compiler or SQL_Optimizer events. Conveniently you can use P_FILE_ID to add a trace file identifier to your trace file. The four commands used above can be simplified to just a single call. For example:

SQL> begin
  2    dbms_sqldiag.dump_trace(p_sql_id=>'6yf5xywktqsa7',
  3                            p_child_number=>0,
  4                            p_component=>'Compiler',
  5                            p_file_id=>'MY_TRACE_DUMP');
  6  end;
  7  /

PL/SQL procedure successfully completed.
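As a side note, one way to locate the resulting trace file is to check V$DIAG_INFO; the P_FILE_ID value should appear in the trace file name. A minimal, hedged example:

select name, value
from   v$diag_info
where  name in ('Diag Trace', 'Default Trace File');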

If we look at the trace file we can see that DBMS_SQLDIAG.DUMP_TRACE added in a comment /* SQL Analyze(1443,0) */ and did the hard parse for us (Thanks!).

Enabling tracing for cur#=9 sqlid=as9bkjstppk0a recursive
Parsing cur#=9 sqlid=as9bkjstppk0a len=91 
sql=/* SQL Analyze(1443,0) */ select /* hard parse comment */ * from emp where ename = 'SCOTT'
End parsing of cur#=9 sqlid=as9bkjstppk0a
Semantic Analysis cur#=9 sqlid=as9bkjstppk0a
OPTIMIZER INFORMATION

******************************************
----- Current SQL Statement for this session (sql_id=as9bkjstppk0a) -----
/* SQL Analyze(1443,0) */ select /* hard parse comment */ * from emp where ename = 'SCOTT'
----- PL/SQL Stack -----
----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
0x16fd3a368       145  package body SYS.DBMS_SQLTUNE_INTERNAL
0x16fd3a368     12085  package body SYS.DBMS_SQLTUNE_INTERNAL
0x18e7fead8      1229  package body SYS.DBMS_SQLDIAG
0x16fdbddd0         1  anonymous block
*******************************************

Hopefully you don’t find yourself having to get too many Optimizer Trace Dumps, but if you do and you’re on 11.2, the hard work has been done for you.

Footnote:
Due to a bug in DBMS_ASSERT, you will need to specify a value for P_COMPONENT. If you leave it NULL, it will error like this:

SQL> begin
  2    dbms_sqldiag.dump_trace(p_sql_id=>'6yf5xywktqsa7',
  3                            p_child_number=>0,
  4                            p_component=>NULL,
  5                            p_file_id=>'MY_TRACE_DUMP');
  6  end;
  7  /
begin
*
ERROR at line 1:
ORA-44003: invalid SQL name
ORA-06512: at 'SYS.DBMS_ASSERT', line 160
ORA-06512: at 'SYS.DBMS_SQLDIAG', line 1182
ORA-06512: at line 2

Reading Parallel Execution Plans With Bloom Pruning And Composite Partitioning

You’ve probably heard sayings like “sometimes things aren’t always what they seem” and “people lie”. Well, sometimes execution plans lie. It’s not really by intent, but it is sometimes difficult (or impossible) to represent everything in a query execution tree in a nice tabular format like the one dbms_xplan gives.

One of the optimizations that was introduced back in 10gR2 was the use of bloom filters. Bloom filters can be used in two ways: 1) for filtering or 2) for partition pruning (bloom pruning) starting with 11g. Frequently the data models used in data warehousing are dimensional models (star or snowflake) and most Oracle warehouses use simple range (or interval) partitioning on the fact table date key column as that is the filter that yields the largest I/O reduction from partition pruning (most queries in a time series star schema include a time window, right!). As a result, it is imperative that the join between the date dimension and the fact table results in partition pruning.

Let’s consider a basic two table join between a date dimension and a fact table. For these examples I’m using STORE_SALES and DATE_DIM which are TPC-DS tables (I frequently use TPC-DS for experiments as it uses a dimensional (star) model and has a data generator.) STORE_SALES contains a 5 year window of data ranging from 1998-01-02 to 2003-01-02.

Range Partitioned STORE_SALES

For this example I used range partitioning on STORE_SALES.SS_SOLD_DATE_SK with 60 one-month partitions (plus one partition for NULL SS_SOLD_DATE_SK values) that align with the date dimension (DATE_DIM) on calendar month boundaries. STORE_SALES has the parallel attribute (PARALLEL 16 in this case) set on the table to enable Oracle’s Parallel Execution (PX).
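The exact DDL isn’t reproduced in this post, but a minimal sketch of such a partitioning scheme might look like the following; the column list, partition names and boundary values are illustrative assumptions, not the original definition:

create table store_sales (
  ss_sold_date_sk   number,
  ss_item_sk        number,
  ss_sales_price    number(7,2)
  -- ... remaining TPC-DS columns ...
)
parallel 16
partition by range (ss_sold_date_sk)
(
  partition ss_199801 values less than (2450845),  -- one partition per month (boundary values are assumptions)
  -- ... 59 more monthly partitions ...
  partition ss_null   values less than (maxvalue)  -- NULL ss_sold_date_sk rows sort into the top partition
);

Let’s look at the execution time and plan for our test query: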

SQL> select
  2    max(ss_sales_price)
  3  from
  4    store_sales ss,
  5    date_dim d
  6  where
  7    ss_sold_date_sk = d_date_sk and
  8    d_year = 2000
  9  ;

MAX(SS_SALES_PRICE)
-------------------
                200

Elapsed: 00:00:41.67

SQL> select * from table(dbms_xplan.display_cursor(format=>'basic +parallel +partition +predicate'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------
EXPLAINED SQL STATEMENT:
------------------------
select   max(ss_sales_price) from   store_sales ss,   date_dim d where
 ss_sold_date_sk=d_date_sk and   d_year = 2000

Plan hash value: 934332680

---------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |             |       |       |        |      |            |
|   1 |  SORT AGGREGATE               |             |       |       |        |      |            |
|   2 |   PX COORDINATOR              |             |       |       |        |      |            |
|   3 |    PX SEND QC (RANDOM)        | :TQ10001    |       |       |  Q1,01 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE            |             |       |       |  Q1,01 | PCWP |            |
|*  5 |      HASH JOIN                |             |       |       |  Q1,01 | PCWP |            |
|   6 |       BUFFER SORT             |             |       |       |  Q1,01 | PCWC |            |
|   7 |        PART JOIN FILTER CREATE| :BF0000     |       |       |  Q1,01 | PCWP |            |
|   8 |         PX RECEIVE            |             |       |       |  Q1,01 | PCWP |            |
|   9 |          PX SEND BROADCAST    | :TQ10000    |       |       |        | S->P | BROADCAST  |
|* 10 |           TABLE ACCESS FULL   | DATE_DIM    |       |       |        |      |            |
|  11 |       PX BLOCK ITERATOR       |             |:BF0000|:BF0000|  Q1,01 | PCWC |            |
|* 12 |        TABLE ACCESS FULL      | STORE_SALES |:BF0000|:BF0000|  Q1,01 | PCWP |            |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("SS_SOLD_DATE_SK"="D_DATE_SK")
  10 - filter("D_YEAR"=2000)
  12 - access(:Z>=:Z AND :Z<=:Z)

In this execution plan you can see the creation of the bloom filter on line 7, which is populated from the values of D_DATE_SK from DATE_DIM. That bloom filter is then used to prune partitions on the STORE_SALES table. This is why we see :BF0000 in the Pstart/Pstop columns.

Range-Hash Composite Partitioned STORE_SALES

For the next experiment, I kept the same range partitioning scheme but also added hash subpartitioning on the SS_ITEM_SK column (using 4 hash subpartitions per range partition). STORE_SALES2 has 61 range partitions x 4 hash subpartitions for a total of 244 aggregate partitions.
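Again only a hedged sketch (not the original DDL), the difference from the previous definition is the added subpartitioning clause:

create table store_sales2 (
  ss_sold_date_sk   number,
  ss_item_sk        number,
  ss_sales_price    number(7,2)
  -- ... remaining TPC-DS columns ...
)
parallel 16
partition by range (ss_sold_date_sk)
subpartition by hash (ss_item_sk) subpartitions 4
(
  partition ss_199801 values less than (2450845),  -- same monthly boundaries as before (assumed values)
  -- ... 59 more monthly partitions ...
  partition ss_null   values less than (maxvalue)
);

Let’s look at the execution plan for our test query: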

SQL> select
  2    max(ss_sales_price)
  3  from
  4    store_sales2 ss,
  5    date_dim d
  6  where
  7    ss_sold_date_sk = d_date_sk and
  8    d_year = 2000
  9  ;

MAX(SS_SALES_PRICE)
-------------------
                200

Elapsed: 00:00:41.06

SQL> select * from table(dbms_xplan.display_cursor(format=>'basic +parallel +partition +predicate'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------
EXPLAINED SQL STATEMENT:
------------------------
select   max(ss_sales_price) from   store_sales2 ss,   date_dim d where
  ss_sold_date_sk=d_date_sk and   d_year = 2000

Plan hash value: 2496395846

---------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |       |       |        |      |            |
|   1 |  SORT AGGREGATE               |              |       |       |        |      |            |
|   2 |   PX COORDINATOR              |              |       |       |        |      |            |
|   3 |    PX SEND QC (RANDOM)        | :TQ10001     |       |       |  Q1,01 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE            |              |       |       |  Q1,01 | PCWP |            |
|*  5 |      HASH JOIN                |              |       |       |  Q1,01 | PCWP |            |
|   6 |       BUFFER SORT             |              |       |       |  Q1,01 | PCWC |            |
|   7 |        PART JOIN FILTER CREATE| :BF0000      |       |       |  Q1,01 | PCWP |            |
|   8 |         PX RECEIVE            |              |       |       |  Q1,01 | PCWP |            |
|   9 |          PX SEND BROADCAST    | :TQ10000     |       |       |        | S->P | BROADCAST  |
|* 10 |           TABLE ACCESS FULL   | DATE_DIM     |       |       |        |      |            |
|  11 |       PX BLOCK ITERATOR       |              |     1 |     4 |  Q1,01 | PCWC |            |
|* 12 |        TABLE ACCESS FULL      | STORE_SALES2 |     1 |   244 |  Q1,01 | PCWP |            |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("SS_SOLD_DATE_SK"="D_DATE_SK")
  10 - filter("D_YEAR"=2000)
  12 - access(:Z>=:Z AND :Z<=:Z)

Once again you can see the creation of the bloom filter from DATE_DIM on line 7; however, you will notice that we no longer see :BF0000 as our Pstart and Pstop values. In fact, it may appear that partition pruning is not taking place at all, as we see 1/244 as our Pstart/Pstop values. However, if we compare the execution times between the range and range-hash queries, you will note they are essentially identical, so there really is no way that partition (bloom) pruning is not taking place. After all, if this plan read all 5 years of data it would take roughly 5 times as long as reading just 1 year, and that certainly is not the case. Would you have guessed that partition pruning is taking place had we not worked through the range-only experiment first? Hmmm…

So What Is Going On?

Before we dive in, let’s quickly look at what the execution plans would look like if PX was not used (using serial execution).

--
-- Range Partitioned, Serial Execution
--

---------------------------------------------------------------------
| Id  | Operation                     | Name        | Pstart| Pstop |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT              |             |       |       |
|   1 |  SORT AGGREGATE               |             |       |       |
|*  2 |   HASH JOIN                   |             |       |       |
|   3 |    PART JOIN FILTER CREATE    | :BF0000     |       |       |
|*  4 |     TABLE ACCESS FULL         | DATE_DIM    |       |       |
|   5 |    PARTITION RANGE JOIN-FILTER|             |:BF0000|:BF0000|
|   6 |     TABLE ACCESS FULL         | STORE_SALES |:BF0000|:BF0000|
---------------------------------------------------------------------
              
--
-- Range-Hash Composite Partitioned, Serial Execution
--
                                       
----------------------------------------------------------------------
| Id  | Operation                     | Name         | Pstart| Pstop |
----------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |       |       |
|   1 |  SORT AGGREGATE               |              |       |       |
|*  2 |   HASH JOIN                   |              |       |       |
|   3 |    PART JOIN FILTER CREATE    | :BF0000      |       |       |
|*  4 |     TABLE ACCESS FULL         | DATE_DIM     |       |       |
|   5 |    PARTITION RANGE JOIN-FILTER|              |:BF0000|:BF0000|
|   6 |     PARTITION HASH ALL        |              |     1 |     4 |
|   7 |      TABLE ACCESS FULL        | STORE_SALES2 |     1 |   244 |
----------------------------------------------------------------------

When using composite partitioning, pruning is placed on one of the partition iterators. When the two nested partition iterators (range/hash in this case) are changed into a block iterator (line 11 – PX BLOCK ITERATOR), we have to pick a “victim” in the query plan tree, since only one node in the plan now needs to carry the pruning information (with PX the pruning is really done by the QC, not by the row source as in serial plans). As a result, the information associated with the victimized partition iterator is lost in the explain plan. This is why there is no :BF0000 for Pstart/Pstop in the plan in this case. It is probably more accurate to have the parallel plans for both range and range-hash look like this:

---------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |       |       |        |      |            |
|   1 |  SORT AGGREGATE               |              |       |       |        |      |            |
|   2 |   PX COORDINATOR              |              |       |       |        |      |            |
|   3 |    PX SEND QC (RANDOM)        | :TQ10001     |       |       |  Q1,01 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE            |              |       |       |  Q1,01 | PCWP |            |
|*  5 |      HASH JOIN                |              |       |       |  Q1,01 | PCWP |            |
|   6 |       BUFFER SORT             |              |       |       |  Q1,01 | PCWC |            |
|   7 |        PART JOIN FILTER CREATE| :BF0000      |       |       |  Q1,01 | PCWP |            |
|   8 |         PX RECEIVE            |              |       |       |  Q1,01 | PCWP |            |
|   9 |          PX SEND BROADCAST    | :TQ10000     |       |       |        | S->P | BROADCAST  |
|* 10 |           TABLE ACCESS FULL   | DATE_DIM     |       |       |        |      |            |
|  11 |       PX BLOCK ITERATOR       |              |       |       |  Q1,01 | PCWC |            |
|* 12 |        TABLE ACCESS FULL      | STORE_SALES  |:BF0000|:BF0000|  Q1,01 | PCWP |            |
---------------------------------------------------------------------------------------------------

Here the bloom pruning sits on the TABLE ACCESS FULL row source. This is because there is no Pstart/Pstop for a PX BLOCK ITERATOR row source (it works on block ranges, so the partition information is lost – it had been contained in the level above).

Hopefully this helps you understand and correctly identify execution plans that contain bloom pruning, even though at first glance you may not think they do. If you are uncertain, use the execution stats for the query, looking at metrics like the amount of data read and the execution times, to provide some empirical insight.
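For example, a quick sanity check against V$SQL (a hedged sketch; substitute the real SQL_IDs of the two test queries) should show roughly the same I/O and elapsed time for the range and range-hash variants if bloom pruning is happening in both:

select sql_id,
       child_number,
       executions,
       round(elapsed_time/1e6, 2) elapsed_secs,
       disk_reads,
       physical_read_bytes        -- available in 11g and later
from   v$sql
where  sql_id in ('&range_sql_id', '&range_hash_sql_id');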

Glenn Fawcett’s Oracle Analysis 101

Glenn Fawcett has put together a nice slide deck entitled “Oracle Analysis 101” that discusses techniques to collect and analyze performance data related to Oracle databases. It’s a good read so check it out. Glenn is also a great resource for Oracle on Sun Niagara 2 Processors (CMT) so be sure to peek at his other blog posts as well.

DBMS_STATS, METHOD_OPT and FOR ALL INDEXED COLUMNS

I’ve written before on choosing an optimal stats gathering strategy but I recently came across a scenario that I didn’t directly blog about and think it deserves attention. As I mentioned in that previous post, one should only deviate from the defaults when they have a reason to, and fully understand that reason and the effect of that decision.

Understanding METHOD_OPT

The METHOD_OPT parameter of DBMS_STATS controls two things:

  1. on which columns statistics will be collected
  2. on which columns histograms will be collected (and how many buckets)

It is very important to understand #1 and how the choice of METHOD_OPT affects the collection of column statistics.
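For reference, here are a few illustrative METHOD_OPT values; these calls use a hypothetical table T and are not part of the test case that follows:

begin
  -- the 10g/11g default: stats on every column, histograms decided by Oracle
  dbms_stats.gather_table_stats(user, 'T', method_opt => 'for all columns size auto');

  -- stats on every column, no histograms
  dbms_stats.gather_table_stats(user, 'T', method_opt => 'for all columns size 1');

  -- stats on every column, plus an explicit 254-bucket histogram on column B
  dbms_stats.gather_table_stats(user, 'T',
    method_opt => 'for all columns size 1 for columns b size 254');
end;
/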

Prerequisite: Where Do I Find Column Statistics?

Understanding where to find column statistics is vital for troubleshooting bad execution plans. These views will be the arrows in your quiver:

  • USER_TAB_COL_STATISTICS
  • USER_PART_COL_STATISTICS
  • USER_SUBPART_COL_STATISTICS

Depending on whether the table is partitioned or subpartitioned, and depending on what GRANULARITY the stats were gathered with, the latter two of those views may or may not be populated.
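For example, a partitioned table’s partition-level column statistics can be checked like this (a hedged example against a hypothetical partitioned table SALES):

select partition_name, column_name, num_distinct, histogram, num_buckets
from   user_part_col_statistics
where  table_name = 'SALES'
order  by partition_name, column_name;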

The Bane of METHOD_OPT: FOR ALL INDEXED COLUMNS

If you are using FOR ALL INDEXED COLUMNS as part of your METHOD_OPT you probably should not be. Allow me to explain. Using METHOD_OPT=>'FOR ALL INDEXED COLUMNS SIZE AUTO' (a common METHOD_OPT I see) tells DBMS_STATS: “only gather stats on columns that participate in an index, and based on the data distribution and the workload of those indexed columns decide if a histogram should be created and how many buckets it should contain”. Is that really what you want? My guess is probably not. Let me work through a few examples to explain why.

I’m going to start with this table.

SQL> exec dbms_random.initialize(1);

PL/SQL procedure successfully completed.

SQL> create table t1
  2  as
  3  select
  4    column_value                    pk,
  5    round(dbms_random.value(1,2))   a,
  6    round(dbms_random.value(1,5))   b,
  7    round(dbms_random.value(1,10))  c,
  8    round(dbms_random.value(1,100)) d,
  9    round(dbms_random.value(1,100)) e
 10  from table(counter(1,1000000))
 11  /

Table created.

SQL> begin
  2    dbms_stats.gather_table_stats(
  3      ownname => user ,
  4      tabname => 'T1' ,
  5      estimate_percent => 100 ,
  6      cascade => true);
  7  end;
  8  /

PL/SQL procedure successfully completed.

SQL> select
  2    COLUMN_NAME, NUM_DISTINCT, HISTOGRAM, NUM_BUCKETS,
  3    to_char(LAST_ANALYZED,'yyyy-dd-mm hh24:mi:ss') LAST_ANALYZED
  4  from user_tab_col_statistics
  5  where table_name='T1'
  6  /

COLUMN_NAME NUM_DISTINCT HISTOGRAM       NUM_BUCKETS LAST_ANALYZED
----------- ------------ --------------- ----------- -------------------
PK               1000000 NONE                      1 2008-13-10 18:39:51
A                      2 NONE                      1 2008-13-10 18:39:51
B                      5 NONE                      1 2008-13-10 18:39:51
C                     10 NONE                      1 2008-13-10 18:39:51
D                    100 NONE                      1 2008-13-10 18:39:51
E                    100 NONE                      1 2008-13-10 18:39:51

6 rows selected.

This 6 column table contains 1,000,000 rows of randomly generated numbers. I’ve queried USER_TAB_COL_STATISTICS to display some of the important attributes (NDV, Histogram, Number of Buckets, etc).

I’m going to now put an index on T1(PK), delete the stats and recollect stats using two different METHOD_OPT parameters that each use 'FOR ALL INDEXED COLUMNS'.

SQL> create unique index PK_T1 on T1(PK);

Index created.

SQL> begin
  2    dbms_stats.delete_table_stats(user,'T1');
  3
  4    dbms_stats.gather_table_stats(
  5      ownname => user ,
  6      tabname => 'T1' ,
  7      estimate_percent => 100 ,
  8      method_opt => 'for all indexed columns' ,
  9      cascade => true);
 10  end;
 11  /

PL/SQL procedure successfully completed.

SQL> select COLUMN_NAME, NUM_DISTINCT, HISTOGRAM, NUM_BUCKETS,
  2  to_char(LAST_ANALYZED,'yyyy-dd-mm hh24:mi:ss') LAST_ANALYZED
  3  from user_tab_col_statistics
  4  where table_name='T1'
  5  /

COLUMN_NAME NUM_DISTINCT HISTOGRAM       NUM_BUCKETS LAST_ANALYZED
----------- ------------ --------------- ----------- -------------------
PK               1000000 HEIGHT BALANCED          75 2008-13-10 18:41:10

SQL> begin
  2    dbms_stats.delete_table_stats(user,'T1');
  3
  4    dbms_stats.gather_table_stats(
  5      ownname => user ,
  6      tabname => 'T1' ,
  7      estimate_percent => 100 ,
  8      method_opt => 'for all indexed columns size auto' ,
  9      cascade => true);
 10  end;
 11  /

PL/SQL procedure successfully completed.

SQL> select COLUMN_NAME, NUM_DISTINCT, HISTOGRAM, NUM_BUCKETS,
  2  to_char(LAST_ANALYZED,'yyyy-dd-mm hh24:mi:ss') LAST_ANALYZED
  3  from user_tab_col_statistics
  4  where table_name='T1'
  5  /

COLUMN_NAME NUM_DISTINCT HISTOGRAM       NUM_BUCKETS LAST_ANALYZED
----------- ------------ --------------- ----------- -------------------
PK               1000000 NONE                      1 2008-13-10 18:41:12

Notice that in both cases only column PK has stats on it. Columns A, B, C, D and E do not have any stats collected on them. Also note that when no SIZE clause is specified, it defaults to 75 buckets.

Now one might think that is no big deal or perhaps they do not realize this is happening because they do not look at their stats. Let’s see what we get for cardinality estimates from the Optimizer for a few scenarios.

SQL> select /*+ gather_plan_statistics */
  2    count(*)
  3  from t1
  4  where a=1
  5  /

  COUNT(*)
----------
    500227

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------
SQL_ID  4df0g0r99zmba, child number 0
-------------------------------------
select /*+ gather_plan_statistics */   count(*) from t1 where a=1

Plan hash value: 3724264953

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.24 |    3466 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |  10000 |    500K|00:00:00.50 |    3466 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("A"=1)

Notice the E-Rows estimate for T1. The Optimizer is estimating 10,000 rows when in reality there are 500,227. The estimate is off by more than an order of magnitude (50x). Normally the calculation for the cardinality would be (for a one-table, single equality predicate):
number of rows in T1 * 1/NDV = 1,000,000 * 1/2 = 500,000
but in this case 10,000 is the estimate. Strangely enough (or not), 10,000 is exactly 0.01 (1%) of 1,000,000. Because there are no column stats for T1.A, the Optimizer is forced to make a guess, and that guess is 1%.

As you can see from the 10053 trace (below), since there are no statistics on the column, defaults are used. In this case they yield very poor cardinality estimations.

SINGLE TABLE ACCESS PATH
  -----------------------------------------
  BEGIN Single Table Cardinality Estimation
  -----------------------------------------
  Column (#2): A(NUMBER)  NO STATISTICS (using defaults)
    AvgLen: 13.00 NDV: 31250 Nulls: 0 Density: 3.2000e-05
  Table: T1  Alias: T1
    Card: Original: 1000000  Rounded: 10000  Computed: 10000.00  Non Adjusted: 10000.00
  -----------------------------------------
  END   Single Table Cardinality Estimation
  -----------------------------------------

Now that I’ve demonstrated how poor the cardinality estimation was with a single equality predicate, let’s see what two equality predicates give us for a cardinality estimate.

SQL> select /*+ gather_plan_statistics */
  2    count(*)
  3  from t1
  4  where a=1
  5    and b=3
  6  /

  COUNT(*)
----------
    124724

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------
SQL_ID  ctq8q59qdymw6, child number 0
-------------------------------------
select /*+ gather_plan_statistics */   count(*) from t1 where a=1   and b=3

Plan hash value: 3724264953

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.19 |    3466 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |    100 |    124K|00:00:00.25 |    3466 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("A"=1 AND "B"=3))

Yikes. In this case the cardinality estimate is 100 when the actual number of rows is 124,724, a difference of over 3 orders of magnitude (over 1000x). Where did the 100 row estimate come from? In this case there are two equality predicates so the selectivity is calculated as 1% * 1% or 0.01 * 0.01 = 0.0001. 1,000,000 * 0.0001 = 100. Funny that. (The 1% is the default selectivity for an equality predicate w/o stats.)

Now let’s add a derived predicate as well and check the estimates.

SQL> select /*+ gather_plan_statistics */
  2    count(*)
  3  from t1
  4  where a=1
  5    and b=3
  6    and d+e > 50
  7  /

  COUNT(*)
----------
    109816

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------
SQL_ID  5x200q9rqvvfu, child number 0
-------------------------------------
select /*+ gather_plan_statistics */   count(*) from t1 where a=1   and b=3
 and d+e > 50

Plan hash value: 3724264953

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.22 |    3466 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |      5 |    109K|00:00:00.33 |    3466 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("A"=1 AND "B"=3 AND "D"+"E">50))

Doh! The cardinality estimate is now 5, but the actual number of rows being returned is 109,816. Not good at all. The Optimizer estimated 5 rows because it used a default selectivity of 1% (for A=1) * 1% (for B=3) * 5% (for D+E > 50) * 1,000,000 rows. Now can you see why column statistics are very important? All it takes is a few predicates and the cardinality estimation becomes very small, very fast. Now consider this:

  • What is likely to happen in a data warehouse where the queries are 5+ table joins and the fact table columns do not have indexes?
  • Would the Optimizer choose the correct driving table?
  • Would nested loops plans probably be chosen when it is really not appropriate?

Hopefully you can see where this is going. If you don’t, here is the all too common chain of events:

  • Non representative (or missing) statistics lead to
  • Poor cardinality estimates which leads to
  • Poor access path selection which leads to
  • Poor join method selection which leads to
  • Poor join order selection which leads to
  • Poor SQL execution times

Take 2: Using the Defaults

Now I’m going to recollect stats with a default METHOD_OPT and run through the 3 execution plans again:

SQL> begin
  2    dbms_stats.delete_table_stats(user,'t1');
  3
  4    dbms_stats.gather_table_stats(
  5      ownname => user ,
  6      tabname => 'T1' ,
  7      estimate_percent => 100 ,
  8      degree => 8,
  9      cascade => true);
 10  end;
 11  /

PL/SQL procedure successfully completed.

SQL> select column_name, num_distinct, histogram, NUM_BUCKETS,
  2  to_char(LAST_ANALYZED,'yyyy-dd-mm hh24:mi:ss') LAST_ANALYZED
  3  from user_tab_col_statistics where table_name='T1'
  4  /

COLUMN_NAME NUM_DISTINCT HISTOGRAM       NUM_BUCKETS LAST_ANALYZED
----------- ------------ --------------- ----------- -------------------
PK               1000000 NONE                      1 2008-13-10 19:44:32
A                      2 FREQUENCY                 2 2008-13-10 19:44:32
B                      5 FREQUENCY                 5 2008-13-10 19:44:32
C                     10 FREQUENCY                10 2008-13-10 19:44:32
D                    100 NONE                      1 2008-13-10 19:44:32
E                    100 NONE                      1 2008-13-10 19:44:32

6 rows selected.
SQL> select /*+ gather_plan_statistics */
  2    count(*)
  3  from t1
  4  where a=1
  5  /

  COUNT(*)
----------
    500227

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------
SQL_ID  4df0g0r99zmba, child number 0
-------------------------------------
select /*+ gather_plan_statistics */   count(*) from t1 where a=1

Plan hash value: 3724264953

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.20 |    3466 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |    500K|    500K|00:00:00.50 |    3466 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("A"=1)
SQL> select /*+ gather_plan_statistics */
  2    count(*)
  3  from t1
  4  where a=1
  5    and b=3
  6  /

  COUNT(*)
----------
    124724

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------
SQL_ID  ctq8q59qdymw6, child number 0
-------------------------------------
select /*+ gather_plan_statistics */   count(*) from t1 where a=1   and b=3

Plan hash value: 3724264953

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.14 |    3466 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |    124K|    124K|00:00:00.25 |    3466 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("B"=3 AND "A"=1))
SQL> select /*+ gather_plan_statistics */
  2    count(*)
  3  from t1
  4  where a=1
  5    and b=3
  6    and d+e > 50
  7  /

  COUNT(*)
----------
    109816

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------
SQL_ID  5x200q9rqvvfu, child number 0
-------------------------------------
select /*+ gather_plan_statistics */   count(*) from t1 where a=1   and b=3
 and d+e>50

Plan hash value: 3724264953

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.17 |    3466 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |   6236 |    109K|00:00:00.22 |    3466 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("B"=3 AND "A"=1 AND "D"+"E">50))

As you can see, the first two queries have spot-on cardinality estimates, but the third query isn’t as good because it uses a column combination and there are no stats on the D+E expression, only on D and E individually. I’m going to rerun the third query with dynamic sampling set to 4 (in 10g it defaults to 2) and reevaluate the cardinality estimate.

SQL> alter session set optimizer_dynamic_sampling=4;

Session altered.

SQL> select /*+ gather_plan_statistics */
  2    count(*)
  3  from t1
  4  where a=1
  5    and b=3
  6    and d+e > 50
  7  /

  COUNT(*)
----------
    109816

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------
SQL_ID  5x200q9rqvvfu, child number 1
-------------------------------------
select /*+ gather_plan_statistics */   count(*) from t1 where a=1   and b=3
 and d+e > 50

Plan hash value: 3724264953

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.17 |    3466 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |    102K|    109K|00:00:00.22 |    3466 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("B"=3 AND "A"=1 AND "D"+"E">50))

Note
-----
   - dynamic sampling used for this statement

Bingo! Close enough to call statistically equivalent.

Summary

I hope this little exercise demonstrates how important it is to have representative statistics and that when statistics are representative the Optimizer can very often accurately estimate the cardinality and thus choose the best plan for the query. Remember these points:

  • Recent statistics do not necessarily equate to representative statistics.
  • Statistics are required on all columns to yield good plans – not just indexed columns.
  • You probably should not be using METHOD_OPT => 'FOR ALL INDEXED COLUMNS SIZE AUTO', especially in a data warehouse where indexes are used sparingly.
  • Dynamic Sampling can assist with cardinality estimates where existing stats are not enough.

Tests performed on 10.2.0.4

Understanding Performance

There has been some debate in the forums and blogosphere centered around Steve Karam’s observation of a 20x elapsed time difference in an update statement “by only changing the block size”. At this point in time, it is pretty much understood (I hope) that this performance delta is directly related to bug 6918210. This bug rears its nasty head when the following conditions exist:

  • table in an ASSM tablespace
  • 16KB or 32KB block size
  • row migrations at the ASSM storage layer

You might ask: what are “row migrations”? Row migrations are generally caused by too low a PCTFREE setting on the table. When a row has grown in length due to an UPDATE and no longer fits in its current block, it is migrated to a new block in which there is enough space for it. This migration is an insert followed by a delete at the storage layer.
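As a side note, and only as a hedged sketch (it is not part of the test case below), migrated or chained rows on a hypothetical table T can be spotted like this:

-- requires the CHAINED_ROWS table created by $ORACLE_HOME/rdbms/admin/utlchain.sql
analyze table t list chained rows into chained_rows;

select count(*) migrated_or_chained
from   chained_rows
where  table_name = 'T';

-- a system-wide symptom: this statistic climbing while the workload runs
select name, value
from   v$sysstat
where  name = 'table fetch continued row';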

Jonathan Lewis gets the credit for putting together the test case (alt. URL) that reproduces the performance bug. Since then I’ve run the test case numerous times, with some slight modifications, but I would like to note directly some results that I observed by modifying PCTFREE. These tests were executed on 10.2.0.4 using 200,000 rows vs. the 830,000 rows in the original test case. The reason for that is I could produce the scenarios I wanted and I didn’t have to wait for hours when the bug was present.

BLOCK SIZE  PCT_FREE  BLOCKS           BYTES                    AVG_ROW_LEN      UPDATE TIME
                      (BEFORE/AFTER)   (BEFORE/AFTER)           (BEFORE/AFTER)
----------  --------  ---------------  -----------------------  ---------------  -----------
      4096        10  742 / 2758       3,039,232 / 11,296,768   5 / 11           00:00:26.71
      4096        35  994 / 994        4,071,424 / 4,071,424    5 / 11           00:00:08.09
      8192        10  370 / 1378       3,031,040 / 11,288,576   5 / 11           00:00:27.36
      8192        35  496 / 496        4,063,232 / 4,063,232    5 / 11           00:00:08.99
     16384        10  184 / 814        3,014,656 / 13,336,576   5 / 11           00:10:59.95
     16384        35  247 / 247        4,049,318 / 4,049,318    5 / 11           00:00:08.81

In Steve’s case, he started with a 16KB block and a default PCTFREE setting of 10 (he has not noted otherwise). Because the test case consists of a two-column table where the second column starts out NULL and is then updated to the first column’s value, the default PCTFREE setting is not optimal. As you can see from the table, with a value of PCTFREE 10, quite a few row migrations take place. This is evident from the increase in the number of blocks in the table. Notice that when PCTFREE 35 is used, the number of blocks does not change. This is because the row can be expanded in length (in place) and thus no row migrations take place. It is also worth noting that when row migrations take place, the table grows to around 4x its original size. Most importantly, performance is best when PCTFREE is optimal, no matter what the block size.

Does Block Size Matter?

In this case block size matters when bug 6918210 is encountered, but what really matters is an appropriate setting of PCTFREE. As the metrics demonstrate, performance is equal across the three block sizes for a given PCTFREE setting, the exception being the bug. Performance is best with an optimal PCTFREE setting, no matter what the block size.
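As a hedged sketch (the table and index names here are hypothetical, not from Jonathan’s test case), adjusting PCTFREE on an existing table only affects newly formatted blocks, so a rebuild is typically needed as well:

alter table t pctfree 35;
alter table t move;          -- rebuild the table so existing blocks honor the new PCTFREE
alter index pk_t rebuild;    -- indexes are left unusable by the move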

How Would One Have Discovered This Was A Bug?

First and foremost, when drastic performance deltas are observed, they should be thoroughly investigated. If it is too good to be true, it probably is. Because this bug is in the ASSM layer it is difficult to diagnose, as that layer does not contain the same level of instrumentation that other layers do. Even if you look at the 10046 trace or the ASH data, it will not provide sufficient information. So what does one do in this case? Well, if you can not get debug info from inside the database, then you have to look outside. Specifically, that means getting a stack trace using pstack. When the 16KB/PCTFREE 10 update was running I took some stack traces. What I noticed was this:

$ pstack 23233
#0  0x09ed3e26 in kdb_count_pdml_itl ()
#1  0x09ed42ad in kdb_diff_pdml_itl ()
#2  0x09ef02a7 in kdt_bseg_srch_cbk ()
#3  0x089d2ddd in ktspfpblk ()
#4  0x089d32b9 in ktspfsrch ()
#5  0x089ce530 in ktspscan_bmb ()
#6  0x089ccff0 in ktspgsp_cbk1 ()
#7  0x089cd612 in ktspgsp_cbk ()
#8  0x09ef2946 in kdtgsp ()
#9  0x09ef0f61 in kdtgrs ()
#10 0x09ee848b in kdtInsRow ()
#11 0x09ea724b in kdumrp ()
#12 0x09ea35f0 in kduurp ()
#13 0x09e96dd0 in kdusru ()
#14 0x095b4ad4 in updrowFastPath ()
#15 0x0a1eb0f4 in __PGOSF287_qerupFRowProcedure ()
#16 0x0a1eb7d0 in qerupFetch ()
#17 0x0a215c26 in qerstFetch ()
#18 0x095a42b4 in updaul ()
#19 0x095a6aa3 in updThreePhaseExe ()
#20 0x095a594f in updexe ()
#21 0x0a3c68f1 in opiexe ()
#22 0x0a4cd1db in kpoal8 ()
#23 0x08cd5138 in opiodr ()
#24 0x0acdb78d in ttcpip ()
#25 0x08cd1212 in opitsk ()
#26 0x08cd3d51 in opiino ()
#27 0x08cd5138 in opiodr ()
#28 0x08cca5f2 in opidrv ()
#29 0x0962f797 in sou2o ()
#30 0x08299b3b in opimai_real ()
#31 0x08299a2b in main ()

I would not expect anyone to know what these functions do, but some might have some guesses based on parts of the names. While that may be useful, it is not necessary. What is important is that this stack trace was identical for nearly the entire 11 minutes that the 16KB/PCTFREE 10 update ran. The most important part of performance troubleshooting is gathering good information. Even if you do not know how to interpret it exactly, it is likely that someone in Oracle Support can.

Now for those of you who are really paying attention, you may notice that the two functions (ktspscan_bmb & ktspfsrch) that Jonathan lists in his key statistics are in this stack trace. This is not just coincidence.

Troubleshoot Like A Real Expert

One of the guys who frequently uses (and blogs about) this type of performance troubleshooting approach is Tanel Poder. I would have really liked to try out his DTrace script but I did not have a Solaris host readily available, so I was stuck with old unsexy pstack on Linux. This brings me to an awesome quote from his recent blog post using DTrace:

…the key point of this article is that I could have started guessing or setting various events or undocumented parameters for probing Oracle here and there, for figuring out what’s going on. But I used a simple systematic approach instead. When Oracle Wait Interface and V$SESSTAT counters didn’t help, looked into process stack next… And if I hadn’t figured out the cause myself, I would have had some solid evidence for support.

I can not stress enough how much the approach matters. People who follow this approach will ultimately be successful and also much better off than those who just hack away. Understanding the why of performance is key.