One of the easiest ways to understand something is to see a visualization. Looking at Active Session History (ASH) data is no exception and I’ll dive into how to do so with R and how I used R plots to visually present a problem and confirm a hypothesis. But first some background…
Frequently DBAs use the Automatic Workload Repository (AWR) as an entry point for troubleshooting performance problems and in this case the adventure started the same way. In the AWR report Top 5 Timed Foreground Events, the log file sync event was showing up as the #3 event. This needed deeper investigation as often times the cause for longer log file sync times is related to longer log file parallel write times.
Top 5 Timed Foreground Events ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Avg wait % DB Event Waits Time(s) (ms) time Wait Class ------------------------------ ------------ ----------- ------ ------ ---------- log file sync 3,155,253 9,197 3 6.4 Commit
Drilling into this a bit deeper the two log file events reported the following in the Foreground Wait Events and Background Wait Events sections of the AWR report:
Foreground Wait Events -> s - second, ms - millisecond - 1000th of a second -> Only events with Total Wait Time (s) >= .001 are shown -> ordered by wait time desc, waits desc (idle events last) -> %Timeouts: value of 0 indicates value was ordered by wait time desc, waits desc (idle events last) -> Only events with Total Wait Time (s) >= .001 are shown -> %Timeouts: value of 0 indicates value was < .5%. Value of null is truly 0 Avg %Time Total Wait wait Waits % bg Event Waits -outs Time (s) (ms) /txn time -------------------------- ------------ ----- ---------- ------- -------- ------ log file parallel write 659,157 0 526 1 0.2 18.2
It is generally always worth looking at the breakdown of these times, as they are averages. We really want to understand the entire histogram of these wait events. For that we can look at the Wait Event Histogram section of the AWR report as below.
Wait Event Histogram -> Units for Total Waits column: K is 1000, M is 1000000, G is 1000000000 -> % of Waits: value of .0 indicates value was % of Waits: column heading of <=1s is truly 1s is truly >=1024ms -> Ordered by Event (idle events last) % of Waits ----------------------------------------------- Total Event Waits <1ms <2ms <4ms <8ms <16ms <32ms 1s -------------------------- ------ ----- ----- ----- ----- ----- ----- ----- ----- log file parallel write 661.6K 84.7 9.7 4.7 .4 .1 .3 .0 log file sync 3138.K 14.0 42.8 30.4 7.9 2.3 2.6 .1 Wait Event Histogram Detail (64 msec to 2 sec) -> Units for Total Waits column: K is 1000, M is 1000000, G is 1000000000 -> Units for % of Total Waits: ms is milliseconds s is 1024 milliseconds (approximately 1 second) -> % of Total Waits: total waits for all wait classes, including Idle -> % of Total Waits: value of .0 indicates value was Ordered by Event (only non-idle events are displayed) % of Total Waits ----------------------------------------------- Waits 64ms Event to 2s <32ms <64ms <1/8s <1/4s <1/2s <1s =2s -------------------------- ----- ----- ----- ----- ----- ----- ----- ----- ----- log file parallel write 52 100.0 .0 .0 log file sync 3535 99.9 .1 .0
One thing that you should notice here is there are two sections of Wait Event Histogram; the buckets less than 32ms and buckets greater than 32ms. It is also important to note that only the absence of a value means no timings fell into that bucket — so even though the report shows .0 there are still events in that bucket (read the section description as it mentions this).
We can see from the second histogram section that there were 52 times that log file parallel write was over 64ms as well as 3535 times log file sync was over 64ms. At this point a hypothesis is formed that the two events are correlated — that is, the belief is the long log file parallel write events may be causing the long log file sync. To find data that supports the hypothesis (or not) we can look at the Active Session History (ASH) data to get a more granular view of the wait events.
Diving Into The ASHes With R
If the hypothesis is correct, the ASH data should show times where we observe long log file parallel write and long log file sync waits. One could write SQL against V$ACTIVE_SESSION_HISTORY to collect all the samples that may demonstrate the hypothesis — e.g. collect a list of sample ids for each event and examine them for overlap, but a visualization is worth a thousand words.
One way to get quick and easy visualizations from ASH data is to use R. R has become a very popular tool for those doing statistical analysis and it has some quite useful graphing and plotting packages built in. R can connect to Oracle via a JDBC package which makes importing data trivial.
Here is a plot that I put together using R for ~300 sample ids (~5 minutes) from ASH (recall that TIME_WAITED is in microseconds):
As you can see from the plots, nearly every time there is a long log file parallel write there are also numerous long log file sync events. This data supports the hypothesis.
Most any statistician (or anyone keen on numbers and stats) will tell you averages suck. This does not mean averages are not useful, but one has to keep in mind averages can hide infrequent outliers (which may make them a tad bit evil). For instance, AWR is reporting an average of 1ms for log file parallel write but we can see that there are several that are falling in the 20ms range in the 5 minute capture. If we zoom in on the graph (set the y axis at a max of 3ms), we can understand why:
Most of the log file parallel write events are coming in around 0.5ms so even with some 20ms outliers it still yields an average of 1ms, thus hiding the magnitude of impact for the outliers. This is why drilling down into the ASH data was important for us to understand the scope of the issue.
With More Data Comes More Questions
At this point, the visualization of ASH data shows a strong correlation between log file sync and log file parallel write outliers, but the root cause has not yet been identified. Perhaps there is more insight lurking in the ASH data?
Given this is an OLTP workload and log file parallel write is an IO operation, perhaps it’s worth looking at another IO operation like, say, db file sequential read – single block IO times.
In the below plot, I’ve taken a 60 second window of ASH data and plotted all the db file sequential read events.
Interestingly, there are several samples where the IOs are taking significantly longer than most — the majority are below 10ms but we see a few samples where there are groups in the 20ms or more range. Let’s add the log file sync events to the plot.
Hmmm… even more interesting. The data is showing a correlation between log file sync and db file sequential read. Any bets on what the plot looks like if log file parallel write is added?
Very interesting. The data is showing us that all 3 events are correlated strongly. Clearly we are on to something here…and using R to plot the ASH data was an easy way to present and understand it visually.
While this blog post did not go into specific root cause, it was a healthy journey in debugging and data driven analysis. Keep in mind that just because AWR averages look good, don’t overlook the fact that the event histograms should also be reviewed to see if there are outliers. R allows an easy way to put scatter plots of event times together to get a nice visual of what is going on.
Here is the code I used to do the analysis. It’s pretty straight forward and well commented (I think). Enjoy!
I’ll also mention that I use the RStudio IDE for R.
I’ve seen some posts on the blogosphere where people attempt to explain (or should I say guess) how Exadata Smart Flash Logging works and most of them are wrong. Hopefully this post will help clear up some the misconceptions out there.
The following is an excerpt from the paper entitled “Exadata Smart Flash Cache Features and the Oracle Exadata Database Machine” that goes into technical detail on the Exadata Smart Flash Logging feature.
Smart Flash Logging works as follows. When receiving a redo log write request, Exadata will do
parallel writes to the on-disk redo logs as well as a small amount of space reserved in the flash
hardware. When either of these writes has successfully completed the database will be
immediately notified of completion. If the disk drives hosting the logs experience slow response
times, then the Exadata Smart Flash Cache will provide a faster log write response time.
Conversely, if the Exadata Smart Flash Cache is temporarily experiencing slow response times
(e.g., due to wear leveling algorithms), then the disk drive will provide a faster response time.
Given the speed advantage the Exadata flash hardware has over disk drives, log writes should be
written to Exadata Smart Flash Cache, almost all of the time, resulting in very fast redo write
performance. This algorithm will significantly smooth out redo write response times and provide
overall better database performance.
The Exadata Smart Flash Cache is not used as a permanent store for redo data – it is just a
temporary store for the purpose of providing fast redo write response time. The Exadata Smart
Flash Cache is a cache for storing redo data until this data is safely written to disk. The Exadata
Storage Server comes with a substantial amount of flash storage. A small amount is allocated for
database logging and the remainder will be used for caching user data. The best practices and
configuration of redo log sizing, duplexing and mirroring do not change when using Exadata
Smart Flash Logging. Smart Flash Logging handles all crash and recovery scenarios without
requiring any additional or special administrator intervention beyond what would normally be
needed for recovery of the database from redo logs. From an end user perspective, the system
behaves in a completely transparent manner and the user need not be aware that flash is being
used as a temporary store for redo. The only behavioral difference will be consistently low
latencies for redo log writes.
By default, 512 MB of the Exadata flash is allocated to Smart Flash Logging. Relative to the 384
GB of flash in each Exadata cell this is an insignificant investment for a huge performance
benefit. This default allocation will be sufficient for most situations. Statistics are maintained to
indicate the number and frequency of redo writes serviced by flash and those that could not be
serviced, due to, for example, insufficient flash space being allocated for Smart Flash Logging.
For a database with a high redo generation rate, or when many databases are consolidated on to
one Exadata Database Machine, the size of the flash allocated to Smart Flash Logging may need
to be enlarged. In addition, for consolidated deployments, the Exadata I/O Resource Manager
(IORM) has been enhanced to enable or disable Smart Flash Logging for the different databases
running on the Database Machine, reserving flash for the most performance critical databases.
In addition, here are some OLTP demos that demonstrate how much performance and throughput can be wasted by poor design and suboptimal database programming.
OLTP Performance – The Trouble with Parsing
OLTP Performance – Concurrent Mid-Tier Connections
The internet buzz seems to be that Larry Ellison, CEO, Oracle Corporation and John Fowler, EVP, Sun Microsystems, Inc. will be announcing a new product, the world’s first OLTP database machine with Sun’s brand new FlashFire technology on Tuesday, September 15, 2009, 1 p.m. PT.
Both Sun and Oracle have Webcast invitations on their websites:
I plan on being at the Oracle Conference Center for the launch and will try and Tweet the highlights. First Oracle Database 11g Release 2, now an OLTP database machine. Are there more innovations up Oracle’s sleeve? I guess we’ll have to wait and see.
Personally I believe the array interface is one of the most overlooked methods to increase database application scalability. Any time an application is selecting or inserting more than a single row, performance benefits are generally observed by using the array interface. The Oracle array interface exists for Oracle Call Interface (OCI), PL/SQL, and JDBC.
When more than one row is being sent between the client and the server, performance can be greatly enhanced by batching these rows together in a single network roundtrip rather than having each row sent in individual network roundtrips. This is in particular useful for INSERT and SELECT statements, which frequently process multiple rows and the feature is commonly known as the array interface.
To use the array interface, the client application will need to represent data as arrays rather than individual variables containing the data of a single row. For queries, most APIs perform automated array processing such that a client side fetch will return rows from an automatically buffered array fetch. This is known as prefetching. For INSERT statements, most APIs require the application code actually contain the array in the client side and use an array version of the bind and execute calls.
JDBC Array Inserts
I used batchInsert.java to insert 10,000 rows into the EMP table varying the batch size from 1 up to 50. Below is a graph of the elapsed times at each batch size.
As one can see by the results, leveraging the array interface for INSERT statements has significant performance gains, even with a small batch size, compared to the single row operation. Update Batching is discussed in more detail in the JDBC Developer’s Guide.
JDBC Array Selects
I ran some tests using rowPrefetch.java which executes a SELECT against the 10,000 row EMP table without any predicate. The prefetch batch size was tested at values between 1 and 10. Below is a graph of the elapsed times.
Setting the prefetch batch size to 5 yielded almost a 2x gain in response time compared to the single row operation. Please see the Row Prefetching section of the JDBC Developer’s Guide for more details on the topic.
The simple tests that were performed demonstrate that using the array interface yields noticable performance gains for both INSERT and SELECT statements. The amount of performance gain will vary by application, but I’m most certain that gains will be observed if the operation is on more than a single row. One should notice that leveraging the array interface not only reduces elapsed time, but also reduces the amount of CPU the operation consumes. That sounds like a win-win to me and why I consider it an important part of database application scalability. After all, the JDBC array interface is not in the Performance Extensions chapter for nothing, right?
The simple tests I performed were run on a single CPU host running Windows XP Pro, Oracle 10.2.0.3. If your application doesn’t reside on the same host as the database, it is likely that you may see even greater gains do to the reduction of the network overhead. As always, your mileage may vary.