Choosing An Optimal Stats Gathering Strategy

Recently the Oracle Optimizer Development Team put out a White Paper entitled Upgrading from Oracle Database 9i to 10g: What to expect from the Optimizer. This paper discusses the main differences between 9i and 10g in the subject area of the Optimizer and Statistics. As G.I. Joe said, “Now we know! And knowing is half the battle.” The other half of the battle is successfully applying that knowledge to the databases that you manage. Statistics are input to the Oracle Optimizer and the foundation of good plans. If the statistics supplied to the Oracle Optimizer are non-representative, we can probably expect GIGO (Garbage In, Garbage Out). On the other hand, if the statistics are representative, chances are quite good that the Oracle Optimizer will choose the optimal plan. In this post I’d like to discuss my thoughts on how to choose an optimal stats gathering strategy.

Suggested Readings

If you haven’t done so already, I would first suggest reading the following:

Start With A Clean Slate

My first recommendation is to unset any Optimizer related parameters that exist in your init.ora, unless you have specific recommendations from the application vendor. This includes (but is not limited to):

  • optimizer_index_caching
  • optimizer_index_cost_adj
  • optimizer_mode
  • db_file_multiblock_read_count

In almost every case, the defaults for these parameters are more than acceptable.
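
For example, here is a minimal sketch of how one might identify non-default settings and reset them (the 10g SPFILE reset syntax is shown; the parameters are simply the ones listed above):

  -- List parameters that are not at their default value
  SELECT name, value
  FROM   v$parameter
  WHERE  isdefault = 'FALSE';

  -- Reset a parameter back to its default (takes effect on the next restart)
  ALTER SYSTEM RESET optimizer_index_cost_adj SCOPE=SPFILE SID='*';
  ALTER SYSTEM RESET db_file_multiblock_read_count SCOPE=SPFILE SID='*';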

The same goes for any events and undocumented/hidden/underscore parameters that are set. Hidden parameters and events should only be used to temporarily work around bugs under the guidance and direction of Oracle Support and Development. Contrary to what you may find on the Internet via your favorite search engine, hidden parameters are not meant to be tuning mechanisms and are not a source of magic performance gains. They are mechanisms that developers have instrumented into their code to debug problems and only those developers know and understand the full impact of changing hidden parameters.

High Level Strategy

  • Start With the Defaults: In most cases, the defaults for Optimizer parameters and DBMS_STATS are adequate. If you are upgrading from 9i to 10g, do your homework and note the differences in the defaults. Test them to see if they work well for your data and your execution plans.
  • Dealing With Suboptimal Execution Plans: There may be cases of query plan regression. It is very important to be diligent about finding the root cause. Oftentimes many plan regressions surface from the same root cause. This means that if you can correctly diagnose and resolve the root cause, you have the potential to resolve many plan regressions.
  • Adjust Strategy to Cope with the Exceptions: Once it is understood why the suboptimal plan was chosen, a resolution can be tested and implemented.

Start With The Defaults

I cannot stress enough how important it is to start with the defaults for Optimizer and DBMS_STATS parameters. The Real-World Performance Group has dealt with numerous cases where customers were not using the default values for Optimizer parameters and simply setting them back to the defaults increased performance. If you are planning to regression test your system because you are upgrading your database, there is no better time to do a reset of these parameters. It will also put you in a much better place to troubleshoot if you run into issues. Give the software the best chance to do what it can for you. Don’t try to be too clever for your own good.

One of the most common problems I’ve seen is that customers have chosen a magic fixed percentage for ESTIMATE_PERCENT in DBMS_STATS. Why do I call it magic? Because the customers had no systematic reasoning for the value they chose.

In the Real-World Performance Group Roundtable session at Oracle OpenWorld 2007 the topic of DBMS_STATS came up and I asked, “How many people are using ESTIMATE_PERCENT => DBMS_STATS.AUTO_SAMPLE_SIZE?” A handful of people raised their hands. I then asked, “For those of you who are using a fixed value for ESTIMATE_PERCENT, who can explain how they chose their number, other than just picking it out of the air?” Not one person raised their hand. Scary! The moral of the story: you should have a documented reason (and test case) to deviate from the defaults.
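
As a minimal sketch (the schema and table names are hypothetical), a default-style gather that lets DBMS_STATS choose the sample size looks like this:

  BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(
      ownname          => 'SALES_OWNER',                -- hypothetical schema
      tabname          => 'SALES',                      -- hypothetical table
      estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,  -- the default
      cascade          => TRUE);
  END;
  /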

Dealing With Suboptimal Execution Plans

Probably the most common root cause (I’d say >90%) for suboptimal execution plans is poor cardinality estimates by the Optimizer. Poor cardinality estimates are generally the result of non-representative statistics. This means that the root cause of most Optimizer related issues is actually stats related, reaffirming how important it is to have representative stats. For more details on troubleshooting poor cardinality estimates, I would suggest reading my post on Troubleshooting Bad Execution Plans.

In 10g, Automatic SQL Tuning was introduced via Enterprise Manager (which uses the package DBMS_SQLTUNE). I would highly recommend that you evaluate this tool (if you have it licensed). I’ve found that it can often come up with quite good suggestions and fixes.

Another option that is available to help get more accurate cardinality estimates is dynamic sampling. It is probably underused, and it can help get more accurate cardinality estimates when there is data correlation between columns, among other cases. Dynamic sampling is most appropriate for DSS and data warehouse databases, where queries run for minutes, not seconds. See my post There Is No Time Like ‘%NOW%’ To Use Dynamic Sampling for a real-world example.
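
As an illustration (the table, columns, and level 4 are hypothetical choices, not recommendations), dynamic sampling can be enabled for a session or for a single statement:

  -- Session level
  ALTER SESSION SET optimizer_dynamic_sampling = 4;

  -- Statement level, via hint (s is the alias of the sampled table)
  SELECT /*+ dynamic_sampling(s 4) */ COUNT(*)
  FROM   sales s
  WHERE  s.cust_state = 'CA'
  AND    s.cust_zip LIKE '94%';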

Adjust Strategy To Cope With The Exceptions

There are three scenarios that seem to be rather common:

  1. Non-representative NDV with skewed data
  2. Out-of-range predicates and partitioned tables
  3. Bind peeking when histograms exist

Non-Representative NDV With Skewed Data

There are cases where the defaults for DBMS_STATS (ESTIMATE_PERCENT => DBMS_STATS.AUTO_SAMPLE_SIZE) may not yield representative NDV in 9i and 10g. One specific case that I’ve seen repeatedly is when there is a large number of distinct values and significant data skew. In this case a fixed sample size that yields representative NDV should be chosen. For a more in-depth review of this, see my post on the new 11g DBMS_STATS.AUTO_SAMPLE_SIZE, which goes through an example of the 10g AUTO_SAMPLE_SIZE NDV/skew issue.
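
For illustration only (the object names and the 5% figure are hypothetical; the right value is whatever testing shows yields representative NDV for your data):

  BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(
      ownname          => 'SALES_OWNER',
      tabname          => 'ORDER_LINES',
      estimate_percent => 5,   -- a fixed sample chosen after testing, not a magic number
      cascade          => TRUE);
  END;
  /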

While having accurate NDV statistics is desirable, do not become obsessed with having perfect NDV statistics. The goal is to have the Optimizer choose the desired plan, not to have perfect NDV. Having more accurate NDV may not change the plan. This is a case where less than perfect may be good enough. Don’t lose focus of the goal.

Out-of-range Predicates And Partitioned Tables

Another case that frequently comes up is usually related to out-of-range predicates with partitioned tables that do not have representative stats. Don Seiler’s write-up of his real-world case is a poster child for this exception. If you are bulk loading data into a partitioned table, it is necessary that any statistics that exist are representative. This problem generally surfaces when statistics have been collected on an empty partition (so all stats are zeros) and the partition has since been bulk loaded. There are a few options here:

  • Stats are gathered on the target table (partition) immediately after the data is loaded directly into it.
  • Data is loaded into a staging table, stats are gathered, and the staging table is partition exchanged into the target table.
  • Stats are cloned or copied (see DBMS_STATS.COPY_TABLE_STATS) from a similar partition; a sketch of this follows the list.
  • There are no statistics and dynamic sampling will kick in (assuming it is set to the default of 2 or higher).
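
As a rough sketch of the copy option (the object and partition names are hypothetical; COPY_TABLE_STATS is available in 11g and in later 10.2 patch sets):

  BEGIN
    DBMS_STATS.COPY_TABLE_STATS(
      ownname     => 'SALES_OWNER',
      tabname     => 'SALES',
      srcpartname => 'SALES_2008_01',   -- a representative, already-analyzed partition
      dstpartname => 'SALES_2008_02');  -- the freshly loaded partition
  END;
  /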

From what I have seen, this case generally shows up when the query plan has partition elimination to a single partition. This is because when only one partition is accessed only partition stats are used, but when more than one partition is accessed, both the global/table stats and partition stats are used.

Bind Peeking When Histograms Exist

The combination of these two seems to be a frequent cause of issues in 10g with OLTP systems. In my opinion, these two features should not be used together (or only used with complete understanding and extreme caution). My reasoning is that when histograms exist, execution plans can vary based on the filter predicates, while well-designed OLTP systems use bind variables for execution plan reuse. On one hand plans can change, and on the other hand plans will be reused. This seems like a complete conflict of interest, hence my position of one or the other. Oh, and let’s not overlook the fact that if you have a RAC database it’s possible to have a different plan on each instance depending on which value is peeked first. Talk about a troubleshooting nightmare.

Unlike many, I think that disabling bind peeking is not the right answer. This does not address the root cause; it attempts to curb the symptom. If you are running an OLTP system and you are using the nightly GATHER_STATS_JOB, be mindful that it uses its own set of parameters: it overrides most of the parameters. The doc says:

When GATHER AUTO is specified, the only additional valid parameters are stattab, statid, objlist and statown; all other parameter settings are ignored.

It may be best to change the default value of METHOD_OPT via DBMS_STATS.SET_PARAM to 'FOR ALL COLUMNS SIZE REPEAT' and gather stats with your own job. Why REPEAT and not SIZE 1? You may find that a histogram is needed somewhere, and using SIZE 1 will remove it the next time stats are gathered. Of course, the other option is to specify the value for METHOD_OPT in your gather stats script.
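
A minimal sketch of changing the DBMS_STATS default and verifying it:

  BEGIN
    DBMS_STATS.SET_PARAM('METHOD_OPT', 'FOR ALL COLUMNS SIZE REPEAT');
  END;
  /

  -- Verify the current default
  SELECT DBMS_STATS.GET_PARAM('METHOD_OPT') FROM dual;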

Common Pitfalls And Problems

  • Disabling The 10g GATHER_STATS_JOB: As many of you know, the GATHER_STATS_JOB was introduced in 10g. Since many customers have custom stats gathering scripts in place, many have chosen to disable this job. Disabling the GATHER_STATS_JOB entirely is not recommended because it also gathers dictionary stats (SYS/SYSTEM schemas). If you wish to collect your statistics manually, then you should change the value of AUTOSTATS_TARGET to ORACLE instead of AUTO (DBMS_STATS.SET_PARAM('AUTOSTATS_TARGET','ORACLE')). This will keep the dictionary stats up to date and allow you to manually gather stats on your schemas as you did in 9i.
  • Representative Statistics: When troubleshooting bad execution plans it is important to evaluate whether the statistics are representative. Many times customers respond with “I just gathered statistics” or “The statistics are recent”. Recently gathered statistics do not equate to representative statistics. Albert Einstein once said, “The definition of insanity is doing the same thing over and over again and expecting different results.” It applies here as well.
  • Too Much Time Spent On Stats Gathering: Often when customers say their stats gathering is taking too long I ask to see their DBMS_STATS script. Generally there are three reasons that stats gathering is taking too long:
    1. Stats are gathered with too fine a setting for GRANULARITY: It is usually unnecessary to gather subpartition stats for composite partitioned tables. Don’t spend time collecting stats that are not needed. Don’t override the default for GRANULARITY unless you have a reason: the default is probably sufficient.
    2. Stats are gathered with unnecessarily large ESTIMATE_PERCENT: Use DBMS_STATS.AUTO_SAMPLE_SIZE to start with and adjust if necessary. No need to sample 10% when 1% or less yields representative statistics. Give DBMS_STATS.AUTO_SAMPLE_SIZE the chance to choose the best sample size.
    3. Stats are gathered more frequently than required or on data that hasn’t changed: The GATHER_STATS_JOB uses OPTIONS => 'GATHER AUTO' so it only gathers statistics on objects with more than a 10% change, with a predefined set of options. If you are gathering statistics on tables/partitions that haven’t changed, or haven’t changed significantly, you may be spending time gathering unnecessary statistics. For example, there is no need to gather partition stats on last month’s partition (or older ones) if the data in those partitions is no longer volatile.
  • Not Understanding The Data: The answers are almost always in the data…skew, correlation, etc. Many operational DBAs don’t have an in-depth understanding of the data they are managing. If this is the case, grab an engineer or analyst that is familiar with the data and work together. Two smart people working on a problem is almost always better than one!
  • Do Not Mess With Optimizer Parameters: If an execution plan is not choosing an index, understand why. Use the tools available. The GATHER_PLAN_STATISTICS hint is a perfect place to start (see the sketch after this list). Fiddling with Optimizer parameters is not the solution.
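
To illustrate that last point, here is a sketch (the query and table are hypothetical) of comparing estimated and actual row counts with GATHER_PLAN_STATISTICS and DBMS_XPLAN.DISPLAY_CURSOR:

  SELECT /*+ gather_plan_statistics */ COUNT(*)
  FROM   orders o
  WHERE  o.status = 'SHIPPED';

  -- Comparing E-Rows vs. A-Rows in the output highlights cardinality misestimates
  SELECT *
  FROM   TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));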

29 comments

  1. Pingback: Choosing An Optimal Stats Gathering Strategy auf maol Jeopardy!
  2. Pingback: Oracle Point, Oracle Life. » Aggregation of CBO Issues with Oracle
  3. TongucY

    Hi Greg, another great post, thank you for sharing :)

    I want to have your comments on below two when you are available;

    1- when it comes to ETL, may we also advise that setting the CASCADE option to TRUE on tables or partitions is unnecessary, since the stats on the bitmap index subpartitions will be re-gathered during rebuilds? I tried to build a small example here;

    http://tonguc.yilmaz.googlepages.com/dbms_stats_cascade_needed.txt

    2- regarding the cost of keeping the GRANULARITY option as ALL, as you advised against in this post, do you have an example? I am asking since we couldn’t find any specific motivation to leave this option at its default.

    Best regards.

  4. Greg Rahn

    Tonguc-

    1) You are correct. There is no need to gather stats on an index that was just built. In 10g the default on CREATE INDEX is to compute statistics, so why throw that work away? The high level approach can be: load, gather table stats (CASCADE=>FALSE), build indexes (a rough sketch follows this reply).

    2) I mentioned that GRANULARITY should be left at the default (AUTO), which determines the granularity based on the partitioning type. I have never come across a case where ALL is required.
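
    A rough sketch of that sequence (the object names are hypothetical, not taken from the discussion above):

      -- 1. Load the data (e.g. a direct path load), then gather table stats only
      BEGIN
        DBMS_STATS.GATHER_TABLE_STATS(
          ownname => 'SALES_OWNER',
          tabname => 'SALES_STAGE',
          cascade => FALSE);   -- skip index stats here
      END;
      /

      -- 2. Build the indexes; in 10g CREATE INDEX computes index statistics by default
      CREATE INDEX sales_stage_cust_ix ON sales_stage (cust_id);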

  5. Pingback: A DDL tacking application, if CASCADE option of DBMS_STATS needed everytime, ORA_HASH post update and others « H.Tonguç Yılmaz - Oracle Blog
  6. David Aldridge

    It seems to me that dynamic sampling delivers superior results to using data dictionary statistics in so many cases, such as for multi-column predicates and multi-partition queries, that we can end up relying on it to the extent that we might almost not collect fact table statistics at all.

    OK, maybe an exaggeration, but I’m in the habit at the moment of collecting only the most basic statistics (‘for all columns size 1’) and setting optimizer_dynamic_sampling to 4. Now I’ll admit that I haven’t done an in-depth analysis of exactly what Oracle means by “tables for which standard selectivity estimation used a guess for some predicate that is a potential dynamic sampling predicate” (http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/stats.htm#i43032) but the practical effect has been that over the past year I do not recall a situation where we have relied on dynamic sampling in which query optimization could be improved with better data dictionary statistics.

    I have a feeling that dynamic sampling may be the thing of the future, and I hope that Oracle are working on extending the usefulness and efficiency of that (dynamic sampling result caching?) as well as on providing multi-column statistics (11g) etc..

  7. Pingback: Choosing An Optimal Stats Gathering Strategy « The Oracle Sponge
  8. John

    Hi. Nice paper.
    You mentioned that setting AUTOSTATS_TARGET to ORACLE would only do dictionary tables. Could you please be more specific on which schemas are gathered if the target is set to ORACLE? If I were to make a custom script, I would most likely want to exclude those schemas which the AUTO job is already hitting. Is there a way to tell which of the many database schemas are Oracle owned?

    TIA

  9. Greg Rahn

    John-

    The documentation states: “Statistics are collected for all Oracle owned objects.” This includes both SYSTEM and SYS schemas as well as some others (but I don’t have the complete list off the top of my head). I’d think it is safe to say that if you did not create the schema, then do not include it in your custom script. That list is probably also version dependent, so I’d be hesitant to rely on it too much. Better to rely on your known list.

  10. Mark Gokman

    Hi Greg,
    The paragraph below seems a little contradictory:
    “Unlike many, I think that disabling bind peeking is not the right answer. This does not address the root cause, it attempts to curb the symptom. If you are running an OLTP system, and you are using the nightly GATHER_STATS_JOB or using the default value for METHOD_OPT, you may want to change the default value of the METHOD_OPT via DBMS_STATS.SET_PARAM to ‘FOR ALL COLUMNS SIZE REPEAT’. Why REPEAT and not SIZE 1? You may find that a histogram is needed somewhere and using SIZE 1 will remove it the next time stats are gathered. Of course, the other option is to specify the value for METHOD_OPT in your gather stats script.”

    Earlier you said that bind peeking and histograms should not be used together. So, disabling one of them would make sense, but you seem to be recommending to stay with bind peeking even when using bind variables, and also to use the REPEAT option for histograms, in essence allowing bind peeking and histograms to coexist in some cases. It seems to me that migrating from 9i to 10g this approach can cause unpleasant surprises when default bind peeking is activated in 10g. I also don’t see why you said “This does not address the root cause, it attempts to curb the symptom.” What is the root cause and what is the symptom? The problem is that the first execution plan will be applied to all subsequent executions of the statement while bind values will be different. This seems to relate more to bind peeking than to the histograms. Of course the root cause is in the data distribution, but again with bind peeking we are making things even worse.
    Could you comment?

  11. Greg Rahn

    Mark-

    To be more precise, it is probably not a good idea to use bind variable predicates on columns on which a histogram exists. It is plausible to use histograms in an OLTP system if the application uses a literal value for that column in the predicate and not a bind value. Perhaps this is where the confusion came in. It is not a global binary “use” or “don’t use”, but rather a “don’t use together for a given column.” Hopefully that adds some clarity. For this reason, I suggested using REPEAT, for the case where a histogram is desired and the application has been modified to use a literal value. Using “SIZE 1” would remove the desired histogram on subsequent stats gathers.

    From what I have seen, bind peeking in itself does not generally seem to present an issue. I would guess that for the most part an equal or better plan can be chosen using bind peeking. Of course, this discounts the cases when histograms exist. Given this, would you be inclined to blame bind peeking? I personally do not. This is why my position is not to disable bind peeking, but rather control the collection of histograms to avoid the issue that arises from the combination of the two, for a given column.

  12. Pingback: Appunti sul funzionamento di CBO in Oracle « Oracle and other
  13. raj

    Hi Greg,

    When gathering stats with dbms_stats, I think I’m understanding that if cascade=>false, then index stats, for the related tables, will not be gathered. But if cascade=>true, then the estimate_percent listed will also be used for the related indexes. Is this correct?

    If so, what is the best way to compute stats on all related indexes for a table… irrelevant of the estimate_percent used?

  14. Greg Rahn

    @Raj

    You are correct. If cascade=>true then index stats are also gathered using the specified estimate_percent. If cascade=>false then only table stats are collected, not index stats.

    I am not sure what you mean by “best” (what is your goal?), but when index statistics are gathered as part of a table statistics collection, they are gathered serially, one after another. If you wish to consume more resources to shorten the duration of the collection, you could gather index statistics separately using DBMS_STATS.GATHER_INDEX_STATS in multiple, concurrent database sessions. You can select the granularity that works for you: a table at a time or an index at a time.
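
    For example, a minimal sketch (the schema and index names are hypothetical), with each block run from a different session:

      -- Session 1
      BEGIN
        DBMS_STATS.GATHER_INDEX_STATS(ownname => 'SALES_OWNER', indname => 'SALES_CUST_IX');
      END;
      /

      -- Session 2
      BEGIN
        DBMS_STATS.GATHER_INDEX_STATS(ownname => 'SALES_OWNER', indname => 'SALES_PROD_IX');
      END;
      /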

  15. Randolf Geist

    Greg,

    regarding the options mentioned how to gather statistics after bulk loading a partition:

    “Data is loaded into a staging table, stats are gathered, and the staging table is partition exchanged into the target table.”

    I think it is worth noting that this option has to be checked carefully, as it effectively disables the (default) SIZE AUTO feature introduced in 10g. So if you e.g. migrate from 9i to 10g and want to take advantage of the SIZE AUTO feature, then the data loading needs to be changed in such a way that the statistics are gathered on the actual table (partition) after exchanging the partition. Otherwise the SIZE AUTO feature will be reduced to SIZE 1, as usually no (or maybe even misleading) column usage information from the staging table is used to gather the statistics.

    I’ve recently posted a note about this along with a test case here: Exchange partition and the SIZE AUTO option of DBMS_STATS

    Regards,
    Randolf

  16. Greg Rahn

    @Randolf

    Thanks much for sharing that scenario. This is why it is extremely important to have a strategy that works for you: what works for someone else may not work for you due to limitations like this. A similar scenario is also true for 11g with the use of incremental stats: they can only be gathered on a partitioned table and would not work in a partition exchange scenario.

  17. Pingback: DBMS_STATS, METHOD_OPT and FOR ALL INDEXED COLUMNS | Structured Data
  18. Pingback: Gathering Statistics « Oracle Database Blog
  19. Pingback: Happy Anniversary To The Structured Data Blog | Structured Data
  20. Pingback: 2009 Year-End Zeitgeist
  21. CJ

    Great article.. but I wish you had more clearly highlighted the importance of TIMING of gathering stats. WHEN to gather stats is a critical part of the stats gathering strategy, and one that needs planning in advance.

    For example – you cannot allow the default gather_stats_job to overlap with your ETL batch run. Another example – a staging table will need to be analyzed as soon as it is populated and before it is used in any queries.

  22. Greg Rahn

    @CJ

    You are absolutely correct. I tend to advise people that for a DW the stats gathering should be part of the data flow (ETL/ELT) process and to disable the default job. For trickle fed or OLTP databases the default stats job generally works well.
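
    For reference, a sketch of disabling the default 10g job (run as a suitably privileged user, e.g. SYS; confirm this fits your environment before doing so):

      BEGIN
        DBMS_SCHEDULER.DISABLE('GATHER_STATS_JOB');
      END;
      /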

  23. Pingback: Data Warehousing and Statistics in Oracle 11g – Automatic Optimizer Statistics Collection « RNM
  24. Pingback: Global statistics high/low values when using DBMS_STATS.COPY_TABLE_STATS « RNM
  25. בורסה

    Excellent post. I used to be checking continuously this weblog and I am inspired! Extremely helpful info specifically the last phase :) I deal with such information much. I used to be seeking this certain info for a long time. Thank you and best of luck.

  26. DBA

    We use estimate_percent=100 on a 2-node RAC. We have highly DML-prone tables ranging from 4 GB to 50 GB. It takes a lot of time to calculate statistics. I always wonder whether we really need to sample 100%.
    Any inputs from you guys as to what could be a more appropriate sample size?

  27. JOHN

    What is the best way to gather statistics on only a table in the minimum amount of time?
    Please let me know.

  28. Greg Rahn

    What version?
    Define “best”?

    In 11g, the defaults are the best place to start:
    dbms_stats.gather_table_stats('owner', 'table_name');
