My Strata + Hadoop World Schedule

Next week is Strata + Hadoop World, which is bound to be exciting for those who deal with big data on a daily basis.  I’ll be spending my time talking about Cloudera Impala at various places, so I’m posting my schedule for those interested in hearing about fast SQL on Hadoop.  Hope to see you there!

Office Hour with Greg Rahn @ the Cloudera Booth
10/29/2013 11:00am – 11:30am EDT (30 minutes)
Room: 3rd Floor, Mercury Ballroom, Booth #403
http://www.cloudera.com/content/cloudera/en/new/strata-hadoop-world.html

Office Hour with Greg Rahn
10/29/2013 2:35pm – 3:05pm EDT (30 minutes)
Room: Table B
http://strataconf.com/stratany2013/public/schedule/detail/32297

@ParquetFormat and Impala NYC Meetup
10/29/2013 6:30pm – 9:30pm EDT
Projective Space LES – 72 Allen Street – New York, NY 10002
https://www.eventbrite.com/event/7852706657

Practical Performance Analysis and Tuning for Cloudera Impala
10/30/2013 2:35pm – 3:15pm EDT (40 minutes)
Room: Murray Hill Suite (capacity: 300)
http://strataconf.com/stratany2013/public/schedule/detail/30551

Upcoming Talks: OakTable World and Strata + Hadoop World

I haven’t had much time over the past year to write many blog posts, but in the next few months I’ll be giving a few talks about what I’ve been working on during that time: Cloudera Impala, an open source MPP SQL query engine for Hadoop.  Hope to see you at one of them.

OakTable World – September 23-24, 2013, San Francisco, CA
SQL on Hadoop – 11:00am Tuesday, September 24th.

Strata + Hadoop World – October 28-30, 2013, New York, NY
Practical Performance Analysis and Tuning for Cloudera Impala – 2:35pm Wednesday, October 30th.

OakTable World

This weekend begins Oracle’s OpenWorld and there is something off the beaten path going on — OakTable World. There is a pretty good lineup of speakers, including myself, who will be talking on subjects that probably would not qualify for OpenWorld sessions for various reasons. My talk is entitled “Beyond the Relational Database” and in it I’ll be talking about non-relational (Big Data) technologies and what they have to offer. Be sure to bring an open mind! Hope to see you there.

To make things easy, here is an iCal file for my session.

Linux 6 Transparent Huge Pages and Hadoop Workloads

This past week I spent some time setting up and running various Hadoop workloads on my CDH cluster. After some Hadoop jobs had been running for several minutes, I noticed something quite alarming — the system CPU percentages were extremely high.

Platform Details

This cluster comprises 2s8c16t Xeon L5630 nodes with 96 GB of RAM running CentOS Linux 6.2 with Java 1.6.0_30. The details are:

$ cat /etc/redhat-release
CentOS release 6.2 (Final)

$ uname -a
Linux chaos 2.6.32-220.7.1.el6.x86_64 #1 SMP Wed Mar 7 00:52:02 GMT 2012 x86_64 x86_64 x86_64 GNU/Linux

$ java -version
java version "1.6.0_30"
Java(TM) SE Runtime Environment (build 1.6.0_30-b12)
Java HotSpot(TM) 64-Bit Server VM (build 20.5-b03, mixed mode)

Observations

Shortly after I kicked off some Hadoop jobs, I noticed the system CPU percentages were extremely high. This certainly isn’t normal for this type of workload and pointed to something being wrong, or a bug, somewhere. Because the issue was related to kernel code (hence the high system times), I fired up perf top and tried to see where in the kernel all this time was being spent (thanks @kevinclosson). Here is a single iteration from perf top which was representative of what I was seeing:

PerfTop:   16096 irqs/sec  kernel:92.6%  exact:  0.0% [1000Hz cycles],  (all, 16 CPUs)
-------------------------------------------------------------------------------------------------------------------

             samples  pcnt function                                                              DSO
             _______ _____ _____________________________________________________________________ __________________

           223182.00 93.8% _spin_lock_irq                                                        [kernel.kallsyms] 
             3879.00  1.6% _spin_lock_irqsave                                                    [kernel.kallsyms] 
             3260.00  1.4% compaction_alloc                                                      [kernel.kallsyms] 
             1992.00  0.8% compact_zone                                                          [kernel.kallsyms] 
             1714.00  0.7% SpinPause                                                             libjvm.so
              716.00  0.3% get_pageblock_flags_group                                             [kernel.kallsyms] 
              596.00  0.3% ParallelTaskTerminator::offer_termination(TerminatorTerminator*)      libjvm.so
              169.00  0.1% _cond_resched                                                         [kernel.kallsyms] 
              114.00  0.0% _spin_lock                                                            [kernel.kallsyms] 
              101.00  0.0% hrtimer_interrupt                                                     [kernel.kallsyms]

At this point I decided to take a 60 second capture using perf record using the following command:

$ sudo perf record -a -g -F 1000 sleep 60

After I had the capture, I built a flame graph using Brendan Gregg’s tools (because I am a big fan of performance data visualizations).

Looking at the functions listed in the Flame Graph (below), it looked like the issue was related to virtual memory; the Linux source shows that many of these functions live in linux/mm/compaction.c.

Flame graph cropped

The issue seemed to be related to virtual memory; however, this Hadoop job was using just 8 mappers per node and the Java heap was set to 1GB, so there was plenty of “leftover” memory on the system. Why, then, would this system be thrashing in the VM kernel code?

Experiment

While eating dinner and having a few beers, something came to mind — Linux 6 has a new feature called Transparent Huge Pages, or THP for short. And like all new features deemed to add benefit, it is enabled by default. THP can be disabled by running the following command:

echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled

That change, and only that change, is exactly what I made when I returned from dinner. I then fired off my Hadoop job and watched anxiously. To my pleasant surprise, the elevated sys CPU times were now gone and things looked much more like I wanted them to.

I’ve flipped back and forth several times and have had nothing but high sys times with THP enabled, so it’s pretty reproducible on my system.

Thoughts

I’m not 100% sure why THP is choking up my system (maybe a bug rather than a feature), but I’m certainly interested to hear if others have seen similar behavior on Linux 6 with data-intensive workloads like Hadoop and THP enabled. Other thoughts, experiment results, etc. are also very welcome.

To put into perspective how bad it gets, here are two screen captures of Cloudera Manager which highlight the ugly sys CPU times (see the middle chart; green = sys, blue = usr) when THP is enabled.

Do note the time scales are not identical.

Transparent Huge Pages enabled:
Transparent Huge Pages enabled

Transparent Huge Pages disabled:
Transparent Huge Pages disabled

Update:

The issue seems to be related to transparent hugepage compaction and is actually documented on the Cloudera website (though my google-fu did not turn it up), which recommends the following:

echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag

I did stumble across the bug listed on the Red Hat Bugzilla site in the Google cache, but it is not publicly accessible for some reason (bummer).
https://bugzilla.redhat.com/show_bug.cgi?id=805593

Just confirming: after I disabled THP defrag on my cluster, the high sys CPU times are no longer present.

Update 2:

Just for documentation’s sake, here are some performance captures comparing THP enabled and disabled.

# echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
# collectl -scm -oT -i2
waiting for 2 second sample...
#         <----CPU[HYPER]----->
#Time     cpu sys inter  ctxsw Free Buff Cach Inac Slab  Map 
10:33:08   84   4 46083   6964   5G 324M  61G  60G 563M  25G 
10:33:10   71   7 39933  25281   5G 324M  61G  60G 566M  24G 
10:33:12   89   4 48545  15724   5G 324M  61G  60G 566M  25G 
10:33:14   81   7 44566   8224   7G 324M  61G  60G 566M  23G 
10:33:16   81   4 44604   8815   8G 324M  62G  60G 566M  22G 
10:33:18   87   7 46906  20430   8G 324M  62G  60G 566M  22G 
10:33:20   79   6 43565  11260   6G 324M  62G  61G 565M  24G 
10:33:22   75   6 41113  13180   3G 325M  62G  61G 565M  26G 
10:33:24   64   5 36610   9745   2G 325M  62G  61G 565M  27G 
10:33:26   60   4 34439   7500   1G 325M  62G  61G 565M  28G 
10:33:28   74   5 40507   9870   1G 324M  61G  60G 564M  30G 
10:33:30   73   6 42778   7023   6G 324M  60G  59G 561M  25G 
10:33:32   86   5 46904  11836   5G 324M  61G  59G 561M  26G 
10:33:34   78   3 43803   9378   5G 324M  61G  59G 559M  25G 
10:33:36   83   4 44566  11408   6G 324M  61G  60G 560M  24G 
10:33:38   62   4 35228   7060   7G 324M  61G  60G 559M  23G 
10:33:40   75   7 42878  16457  10G 324M  61G  60G 559M  21G 
10:33:42   88   7 47898  13636   7G 324M  61G  60G 560M  23G 
10:33:44   83   6 45221  17253   5G 324M  61G  60G 560M  25G 
10:33:46   66   4 36586   6875   3G 324M  61G  60G 560M  26G 
10:33:48   66   4 37690   9938   2G 324M  61G  60G 559M  28G 
10:33:50   66   3 37199   6981   1G 324M  61G  60G 559M  28G 

# echo always > /sys/kernel/mm/redhat_transparent_hugepage/enabled
# collectl -scm -oT -i2
waiting for 2 second sample...
#         <----CPU[HYPER]----->
#Time     cpu sys inter  ctxsw Free Buff Cach Inac Slab  Map 
10:51:31   99  81 51547  14961  24G 326M  53G  51G 536M  15G 
10:51:33   92  81 49928  11377  24G 326M  52G  51G 536M  15G 
10:51:35   59  58 39357   2440  24G 326M  52G  51G 536M  15G 
10:51:37   54  53 36825   1639  24G 326M  52G  51G 536M  15G 
10:51:39   88  87 49293   2284  24G 326M  52G  51G 536M  15G 
10:51:41   95  94 50295   1638  24G 326M  52G  51G 536M  15G 
10:51:43   99  98 51780   1838  24G 326M  52G  51G 536M  15G 
10:51:45   97  95 50492   2412  24G 326M  52G  51G 536M  15G 
10:51:47  100  96 50902   2732  24G 326M  52G  51G 536M  15G 
10:51:49  100  89 51097   4748  24G 326M  52G  51G 536M  15G 
10:51:51  100  71 51198  36708  24G 326M  52G  51G 536M  15G 
10:51:53   99  56 51807  50767  24G 326M  52G  51G 536M  15G 
10:51:55  100  51 51363  66095  24G 326M  52G  51G 536M  15G 
10:51:57  100  48 51691  73226  24G 326M  52G  51G 536M  15G 
10:51:59   99  36 52350  87560  24G 326M  52G  51G 536M  15G 
10:52:01   99  51 51809  42327  24G 325M  52G  51G 536M  15G 
10:52:03  100  50 51582  62493  24G 325M  52G  51G 536M  15G 
10:52:05   99  44 52135  69813  24G 326M  52G  50G 536M  15G 
10:52:07   99  39 51505  65393  24G 326M  52G  50G 536M  16G 
10:52:09   98  39 52778  54844  24G 326M  52G  50G 536M  16G 
10:52:11   98  62 51456  30880  24G 326M  52G  50G 536M  16G 
10:52:13  100  83 51014  21095  24G 326M  52G  50G 536M  16G

Update: 2013-09-17
Oracle comments on disabling THP in Oracle Linux: Performance Issues with Transparent Huge Pages (THP)

Pitfalls of Using Parallel Execution with SQL Developer

[This post was originally published on 2012/02/29 and was hidden shortly thereafter. I'm un-hiding it as of 2012/05/30 with some minor edits.]

Many Oracle Database users like tools with GUI interfaces because they add features and functionality that are not easily available from command line interfaces like SQL*Plus. One of the more popular tools, in my experience, is Oracle SQL Developer, in part because it’s a free tool from Oracle. Given SQL Developer’s current design (as of version 3.1.07.42), some issues frequently show up when using it with Oracle Databases that use Parallel Execution. SQL Developer also contains a bug that exacerbates the issue.

The Issue

The crux of the issue with SQL Developer (and possibly other similar tools) and Parallel Execution comes down to how the application uses cursors. By default, SQL Developer has the array fetch size set to 50. This means that for any cursor SQL Developer opens for scrolling, it will fetch the first 50 rows, and when you scroll to the bottom of those rows in a grid, it will fetch the next 50 rows, and so on. The array size can be controlled by going into Properties and changing Database -> Advanced -> SQL Array Fetch Size, which allows a maximum setting of 500 rows. This is good in the sense that the JDBC application can fetch an array of rows with a single JDBC database call; however, with Parallel Execution, the PX servers used for the cursor will not be released until the cursor is canceled or the last row is fetched. Currently the only way to force reading to the end of the cursor in SQL Developer is to issue a Control+End in the data grid. As a result, any action that uses Parallel Execution and has not fetched all the rows or been canceled/closed will squat those Parallel Execution resources and prevent them from being used by other users. If enough users have open cursors backed by Parallel Execution, it is possible to consume all of the Parallel Execution servers, and Parallel Execution requests will be forced to Serial Execution because resources are not available, even if the system is completely idle.
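
On the server side, a quick way to see how much of the PX server pool is tied up at any point is to look at V$PX_PROCESS; a minimal sketch (squatted servers show up as IN USE even though the squatting session may be sitting idle):

-- count Parallel Execution servers by state
select status, count(*)
from   v$px_process
group  by status;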

The SQL Developer Bug

When experimenting with SQL Developer for this blog post I also found and filed a bug (bug 13706186) because it leaks cursors when a user browses data in a table by expanding Tables (in the left pane), clicking on a table name and then the Data tab. Unfortunately this bug adds insult to injury if the table is decorated with a parallel degree attribute because the leaked cursors do not release the Parallel Execution servers until the session is closed, thus preventing other sessions from using them.

This bug is easily demonstrated using the SCOTT schema, but any schema or table will do as long as the table has more rows than the array fetch size. For my example, I’m using a copy of the EMP table, called EMP2, which contains 896 rows and was created with a short SQL script.

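A sketch of one such script, assuming the standard SCOTT.EMP table (14 rows, doubled six times to reach 896) and a parallel degree of 2 on the table, which is what pulls PX servers into the Data grid query:

-- build EMP2 as a copy of EMP and double it until it holds 896 rows
create table emp2 as select * from emp;

begin
  for i in 1 .. 6 loop                    -- 14 * 2^6 = 896 rows
    insert into emp2 select * from emp2;
  end loop;
  commit;
end;
/

-- decorate the table with a parallel degree so queries against it use PX servers
alter table emp2 parallel (degree 2);
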
The steps to demonstrate this issue are as follows:

  1. Set up the EMP2 table using the above script or equivalent.
  2. Use SQL Developer to connect to the SCOTT schema or equivalent.
  3. Expand Tables in the Browser in the left pane.
  4. Click EMP2.
  5. Click on the Data tab of EMP2.
  6. Check the open cursors.
  7. Close the EMP2 tab in the right pane.
  8. Check the open cursors.
  9. Go to Step #4 and repeat.

I’m going to repeat this process two times for a total of three open and close operations. I’ll use this query to show the open cursors for the Data grid query on the EMP2 table (adjust as necessary if you are not using my example):

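One way to write such a query is against V$SQL_MONITOR, which gets an entry for every execution of a parallel statement (SQL_EXEC_ID distinguishes the executions) along with the PX server counts. This is just a sketch; the username and text filters are assumptions from my example:

-- list monitored executions of the EMP2 Data grid query
select sid,
       sql_id,
       sql_exec_id,
       status,
       px_servers_requested,
       px_servers_allocated
from   v$sql_monitor
where  username = 'SCOTT'
and    sql_text like '%EMP2%'
and    px_server# is null                 -- coordinator rows only
order  by sql_exec_id;
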
If we look at the output (scott_emp2_cursors.txt, below the EM graphic) from the query, we’ll see that the first time the EMP2 Data tab is opened, it opens two identical cursors (sql_exec_id 16777216 & 16777217). After closing the EMP2 Data tab, 16777216 is still open. The second time the EMP2 Data tab is opened, two more identical cursors are opened (sql_exec_id 16777218 & 16777219). The third time, two more cursors are opened (sql_exec_id 16777220 & 16777221). After closing the tabs we still see three cursors open (sql_exec_id 16777216, 16777218 & 16777220), each of which is squatting two PX servers.

This behavior can also be seen in 11g Enterprise Manager (or dbconsole) on the SQL Monitoring page by sorting the statements by time — notice the (leaked cursor) statements with the green spinning pinwheels after all tabs have been closed (all parallel statements are monitored by default).

Sqlmon Cursor Leak

By the way, the cursor leak applies to tables without a parallel degree setting as well, but it has a more significant impact if the table is parallel because PX servers are a shared resource.

(scott_emp2_cursors.txt below)

My Thoughts

Obviously the cursor leak is a SQL Developer bug and needs fixing, but in the interim, DBAs should be aware that this behavior can have a global impact because Parallel Execution servers are shared by all database users. Also, if SQL Developer users run Parallel Queries and keep the results grid open but do not fetch all the rows using the Control+End functionality, those Parallel Execution servers will be unavailable to other users and could negatively impact other users’ queries leveraging Parallel Execution.

Personally I’d like to see a few enhancements to SQL Developer to avoid these pitfalls:

  1. Disable Parallel Execution for Table Data browsing.

    Browsing data in a table via a scrollable grid is a “small data” problem and does not require the “big data” resources of Parallel Execution. This could easily be done by adding a NOPARALLEL hint when SQL Developer builds the query string.

  2. Add a property with functionality to read all rows without requiring the Control+End command (always fetch until end of cursor), or until a max number of rows is read (or a max amount of memory is used for the result set), then close the cursor.

    By fetching until end of cursor or fetching a max number of rows and closing the cursor, the client will release any Parallel Execution resources it may have used. Obviously fetching all rows could be a problem with large result sets and cause SQL Developer to run out of memory and crash, which would be a bad user experience, but not releasing PX resources can easily lead to many bad user experiences.

    I’ve seen the issue of potentially large result sets dealt with in other JDBC based GUI tools that connect to parallel databases by the tool appending a “LIMIT X” clause to queries, where the user can control the value for “X” in a property setting. To the best of my knowledge, no other parallel database supports cursors the way Oracle does (being able to fetch rows, pause, then resume fetching), so there is no issue with squatting resources there (once they start fetching they must continue until the last row is fetched or the statement is canceled). As of release 11.2, Oracle does not support the LIMIT clause, but this functionality could be implemented in the client by using some upper limit on “array fetch size” * “number of fetches” or by wrapping queries with a “select * from ([query text]) where rownum <= X” or similar, as sketched below.
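
A sketch of what that client-side rewrite could look like, with 500 standing in for the user-controlled limit:

-- wrap the original query text and cap the number of rows returned
select *
from   ( select * from emp2 )             -- original query text goes here
where  rownum <= 500;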

There are some “clever” server-side ways to deal with this as well, such as adding a logon trigger that disables parallel query if the V$SESSION.PROGRAM name is “SQL Developer”, but a robust, clean client-side solution is preferred by me and likely other DBAs as well. It’s really just a simple matter of programming.
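
For completeness, a sketch of that kind of server-side workaround, assuming the trigger owner has a direct grant on V_$SESSION and accepting the usual logon-trigger caveats:

create or replace trigger no_px_for_sqldev
after logon on database
declare
  v_program varchar2(64);
begin
  -- look up this session's program name
  select program
    into v_program
    from v$session
   where sid = sys_context('userenv', 'sid');

  -- force serial execution for SQL Developer sessions
  if v_program like 'SQL Developer%' then
    execute immediate 'alter session disable parallel query';
  end if;
end;
/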

Summary

When using SQL Developer or similar tools, be aware of the potential to squat Parallel Execution resources if the client tool has open cursors. Educate your SQL Developer users on how they can play well with others in an Oracle Database using Parallel Execution by closing unneeded tabs with open cursors. Be aware of the impact of the cursor leak bug in SQL Developer 3.1.07.42 (and possibly previous releases) until it is fixed.

Personally I’d like to see an enhancement to deal with this behavior and I don’t think it would require much programming. It certainly would allow DBAs to feel more confident that SQL Developer is a tool that can be used on production systems and not result in any issues. What are your thoughts? Do my enhancement requests seem reasonable and warranted?

Visualizing Active Session History (ASH) Data With R

One of the easiest ways to understand something is to see a visualization. Looking at Active Session History (ASH) data is no exception and I’ll dive into how to do so with R and how I used R plots to visually present a problem and confirm a hypothesis. But first some background…

Background

Frequently DBAs use the Automatic Workload Repository (AWR) as an entry point for troubleshooting performance problems, and in this case the adventure started the same way. In the AWR report’s Top 5 Timed Foreground Events, the log file sync event was showing up as the #3 event. This needed deeper investigation, as oftentimes the cause of longer log file sync times is longer log file parallel write times.

Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                           Avg
                                                          wait   % DB
Event                                 Waits     Time(s)   (ms)   time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
log file sync                     3,155,253       9,197      3    6.4 Commit

Drilling into this a bit deeper, the log file parallel write event reported the following in the Background Wait Events section of the AWR report (log file sync appears in the Top 5 above):

Background Wait Events
-> s  - second, ms - millisecond -    1000th of a second
-> ordered by wait time desc, waits desc (idle events last)
-> Only events with Total Wait Time (s) >= .001 are shown
-> %Timeouts: value of 0 indicates value was < .5%.  Value of null is truly 0

                                                             Avg
                                        %Time Total Wait    wait    Waits   % bg
Event                             Waits -outs   Time (s)    (ms)     /txn   time
-------------------------- ------------ ----- ---------- ------- -------- ------
log file parallel write         659,157     0        526       1      0.2   18.2

It is always worth looking at the breakdown of these times, as they are averages; we really want to understand the entire histogram of these wait events. For that we can look at the Wait Event Histogram sections of the AWR report, below.

Wait Event Histogram
-> Units for Total Waits column: K is 1000, M is 1000000, G is 1000000000
-> % of Waits: value of .0 indicates value was < .05%.  Value of null is truly 0
-> % of Waits: column heading of <=1s is truly <1024ms, >1s is truly >=1024ms
-> Ordered by Event (idle events last)

                                                    % of Waits
                                 -----------------------------------------------
                           Total
Event                      Waits  <1ms  <2ms  <4ms  <8ms <16ms <32ms  <=1s   >1s
-------------------------- ------ ----- ----- ----- ----- ----- ----- ----- -----
log file parallel write    661.6K  84.7   9.7   4.7    .4    .1    .3    .0
log file sync              3138.K  14.0  42.8  30.4   7.9   2.3   2.6    .1

Wait Event Histogram Detail (64 msec to 2 sec)
-> Units for Total Waits column: K is 1000, M is 1000000, G is 1000000000
-> Units for % of Total Waits:
   ms is milliseconds
   s is 1024 milliseconds (approximately 1 second)
-> % of Total Waits: total waits for all wait classes, including Idle
-> % of Total Waits: value of .0 indicates value was < .05%.  Value of null is truly 0
-> Ordered by Event (only non-idle events are displayed)

                                                 % of Total Waits
                                 -----------------------------------------------
                           Waits
                           64ms
Event                      to 2s <32ms <64ms <1/8s <1/4s <1/2s   <1s   <2s  >=2s
-------------------------- ----- ----- ----- ----- ----- ----- ----- ----- -----
log file parallel write       52 100.0    .0    .0
log file sync               3535  99.9    .1    .0

One thing you should notice here is that there are two Wait Event Histogram sections: the buckets less than 32ms and the buckets greater than 32ms. It is also important to note that only the absence of a value means no timings fell into that bucket — so even though the report shows .0, there are still events in that bucket (read the section description, as it mentions this).

We can see from the second histogram section that there were 52 times that log file parallel write was over 64ms, as well as 3535 times that log file sync was over 64ms. At this point a hypothesis is formed that the two events are correlated — that is, the long log file parallel write events may be causing the long log file sync waits. To find data that supports the hypothesis (or not), we can look at the Active Session History (ASH) data to get a more granular view of the wait events.

Diving Into The ASHes With R

If the hypothesis is correct, the ASH data should show times where we observe long log file parallel write and long log file sync waits. One could write SQL against V$ACTIVE_SESSION_HISTORY to collect all the samples that may demonstrate the hypothesis — e.g. collect a list of sample ids for each event and examine them for overlap, but a visualization is worth a thousand words.

One way to get quick and easy visualizations from ASH data is to use R. R has become a very popular tool for those doing statistical analysis and it has some quite useful graphing and plotting packages built in. R can connect to Oracle via a JDBC package which makes importing data trivial.

Here is a plot that I put together using R for ~300 sample ids (~5 minutes) from ASH (recall that TIME_WAITED is in microseconds):

Rplot01

As you can see from the plots, nearly every time there is a long log file parallel write there are also numerous long log file sync events. This data supports the hypothesis.

Averages Suck

Most any statistician (or anyone keen on numbers and stats) will tell you averages suck. This does not mean averages are not useful, but one has to keep in mind that averages can hide infrequent outliers (which may make them a tad bit evil). For instance, AWR is reporting an average of 1ms for log file parallel write, but we can see several waits falling in the 20ms range in the 5-minute capture. If we zoom in on the graph (set the y axis to a max of 3ms), we can understand why:
Rplot02

Most of the log file parallel write events are coming in around 0.5ms, so even with some 20ms outliers the average still works out to about 1ms, hiding the magnitude of the outliers’ impact. This is why drilling down into the ASH data was important for understanding the scope of the issue.
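
To see numerically how the outliers get washed out, the average can be compared with the rest of the distribution over the same ASH window; a sketch:

-- average vs. median vs. max for log file parallel write, in milliseconds
select round(avg(time_waited) / 1000, 2)                                          as avg_ms,
       round(percentile_cont(0.5) within group (order by time_waited) / 1000, 2)  as median_ms,
       round(max(time_waited) / 1000, 2)                                          as max_ms
from   v$active_session_history
where  event = 'log file parallel write'
and    session_state = 'WAITING'
and    time_waited > 0                    -- only samples with a fixed-up wait time
and    sample_time > systimestamp - interval '5' minute;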

With More Data Comes More Questions

At this point, the visualization of ASH data shows a strong correlation between log file sync and log file parallel write outliers, but the root cause has not yet been identified. Perhaps there is more insight lurking in the ASH data?

Given this is an OLTP workload and log file parallel write is an IO operation, perhaps it’s worth looking at another IO operation like, say, db file sequential read – single block IO times.

In the below plot, I’ve taken a 60 second window of ASH data and plotted all the db file sequential read events.

Rplot11
Interestingly, there are several samples where the IOs are taking significantly longer than most — the majority are below 10ms, but we see a few groups of samples in the 20ms-or-more range. Let’s add the log file sync events to the plot.
Rplot12
Hmmm… even more interesting. The data is showing a correlation between log file sync and db file sequential read. Any bets on what the plot looks like if log file parallel write is added?

Rplot13

Very interesting. The data is showing us that all 3 events are correlated strongly. Clearly we are on to something here…and using R to plot the ASH data was an easy way to present and understand it visually.

Summary

While this blog post did not go into the specific root cause, it was a healthy journey in debugging and data-driven analysis. Keep in mind that even when the AWR averages look good, the event histograms should also be reviewed for outliers. R provides an easy way to put together scatter plots of event times and get a nice visual of what is going on.

Source Code

Here is the code I used to do the analysis. It’s pretty straightforward and well commented (I think). Enjoy!

I’ll also mention that I use the RStudio IDE for R.

UKOUG – A Deep Dive into the SQL Monitoring Report

The UK Oracle User Group Conference 2011 is just around the corner and I just realized that I haven’t yet highlighted this, other than the “I’m speaking” banner on my blog.  I’ll be speaking on one of my favorite and most used reports — the SQL Monitor Report.  Below are the session details [direct link].  Hope to see you there!

Presentation Title: A Deep Dive into the SQL Monitoring Report
Speaker: Mr Greg Rahn
Company: Oracle
Presentation abstract: The SQL Monitoring Report was introduced in Oracle Database 11g and has become the single most used diagnostic report by the Oracle Real-World Performance Group and Oracle database development for SQL statement performance issues. This session will start with a technical overview of the SQL Monitoring Report and what metrics and information it provides. From there we’ll deep dive into numerous examples from the field explaining how this report was used to diagnose and validate performance issues. There is a wealth of information in the SQL Monitoring Report and this session will provide the necessary knowledge to best leverage it.
Presentation begins: 05/12/2011 11:05
Presentation duration: 60
Presentation content level: 2 (1 = in-depth, 5 = strategic overview)
Audience experience: All experience levels
Audience function: Product Expert, Product User
Speaker biography: Greg Rahn is a database performance engineer in the Real-World Performance Group at Oracle Corporation. He joined the Real-World Performance Group in 2004 and has been working with Oracle databases since 1997. His primary focus is performance and scalability for Oracle data warehouses, specializing in Parallel Execution and Query Optimizer subject areas.
Related topics: Oracle RDBMS: Features and Options, Oracle RDBMS: Performance Management
Hall: Hall 5