Exadata Smart Flash Logging Explained

I’ve seen some posts in the blogosphere where people attempt to explain (or should I say guess at) how Exadata Smart Flash Logging works, and most of them are wrong. Hopefully this post will help clear up some of the misconceptions out there.

The following is an excerpt from the paper entitled “Exadata Smart Flash Cache Features and the Oracle Exadata Database Machine” that goes into technical detail on the Exadata Smart Flash Logging feature.

Smart Flash Logging works as follows. When receiving a redo log write request, Exadata will do
parallel writes to the on-disk redo logs as well as a small amount of space reserved in the flash
hardware. When either of these writes has successfully completed the database will be
immediately notified of completion. If the disk drives hosting the logs experience slow response
times, then the Exadata Smart Flash Cache will provide a faster log write response time.
Conversely, if the Exadata Smart Flash Cache is temporarily experiencing slow response times
(e.g., due to wear leveling algorithms), then the disk drive will provide a faster response time.
Given the speed advantage the Exadata flash hardware has over disk drives, log writes should be
written to Exadata Smart Flash Cache, almost all of the time, resulting in very fast redo write
performance. This algorithm will significantly smooth out redo write response times and provide
overall better database performance.
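The first-completion scheme described above can be sketched in a few lines. This is my own illustration, not Oracle code; `disk_write` and `flash_write` are hypothetical stand-ins for the two real I/O paths:

```python
import queue
import threading
import time

def smart_flash_log_write(redo_buffer, disk_write, flash_write):
    """Write the same redo buffer to both destinations in parallel and
    return as soon as EITHER copy is durable (first completion wins)."""
    done = queue.Queue()

    def attempt(name, write_fn):
        write_fn(redo_buffer)   # blocks until this copy is on stable media
        done.put(name)          # report completion

    for name, fn in (("disk", disk_write), ("flash", flash_write)):
        threading.Thread(target=attempt, args=(name, fn), daemon=True).start()

    return done.get()           # the database is acknowledged at this point

# Simulated latencies: disk momentarily slow (50 ms), flash fast (1 ms).
# The caller observes roughly the flash latency, not the disk stall.
winner = smart_flash_log_write(b"redo", disk_write=lambda b: time.sleep(0.05),
                               flash_write=lambda b: time.sleep(0.001))
```

Whichever path finishes first sets the observed write latency, which is why a transient stall on either side is hidden from LGWR.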

The Exadata Smart Flash Cache is not used as a permanent store for redo data – it is just a
temporary store for the purpose of providing fast redo write response time. The Exadata Smart
Flash Cache is a cache for storing redo data until this data is safely written to disk. The Exadata
Storage Server comes with a substantial amount of flash storage. A small amount is allocated for
database logging and the remainder will be used for caching user data. The best practices and
configuration of redo log sizing, duplexing and mirroring do not change when using Exadata
Smart Flash Logging. Smart Flash Logging handles all crash and recovery scenarios without
requiring any additional or special administrator intervention beyond what would normally be
needed for recovery of the database from redo logs. From an end user perspective, the system
behaves in a completely transparent manner and the user need not be aware that flash is being
used as a temporary store for redo. The only behavioral difference will be consistently low
latencies for redo log writes.

By default, 512 MB of the Exadata flash is allocated to Smart Flash Logging. Relative to the 384
GB of flash in each Exadata cell this is an insignificant investment for a huge performance
benefit. This default allocation will be sufficient for most situations. Statistics are maintained to
indicate the number and frequency of redo writes serviced by flash and those that could not be
serviced, due to, for example, insufficient flash space being allocated for Smart Flash Logging.
For a database with a high redo generation rate, or when many databases are consolidated on to
one Exadata Database Machine, the size of the flash allocated to Smart Flash Logging may need
to be enlarged. In addition, for consolidated deployments, the Exadata I/O Resource Manager
(IORM) has been enhanced to enable or disable Smart Flash Logging for the different databases
running on the Database Machine, reserving flash for the most performance critical databases.
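As a back-of-envelope illustration of why 512 MB is usually plenty (my own arithmetic, not an Oracle sizing formula): flash log entries can be recycled as soon as the disk copy lands, so the allocation only has to absorb redo for the duration of a disk stall. Even in the worst case of a complete stall:

```python
def flash_log_headroom_seconds(flash_log_mb=512.0, redo_mb_per_sec=50.0):
    """How long the flash log allocation could absorb ALL redo if the
    disk copies stalled entirely -- a worst case, since in normal
    operation space is reclaimed as soon as each disk write completes."""
    return flash_log_mb / redo_mb_per_sec

# At a heavy 50 MB/s redo generation rate, the default 512 MB allocation
# covers a disk stall of more than ten seconds.
headroom = flash_log_headroom_seconds()   # 512 / 50 = 10.24 seconds
```

Disk stalls are typically measured in milliseconds, not seconds, which is why the default only needs enlarging under very high redo rates or heavy consolidation.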

19 comments

  1. Kevin Closson

    I never knew spinning media had a problem with large, sequential writes. But then I never knew MLC flash was optimized for large, sequential writes either. Hmmm…

  2. Anantha

    @Kevin, Greg said Smart Flash Logging (not Flash Cache) is an Exadata-only feature. Oracle recognizes that spinning disk can do streaming writes well, but since every disk in an Exadata hosts online redo, archived redo, data, index, and temp data, you can still pay the penalty of seeking to the right part of the disk. All they are doing is issuing a parallel write and acking the write whenever one of them completes; if disk wins, then so be it.

    BTW, your tone has decidedly turned anti-Oracle since leaving the nest ;-)

  3. Greg Rahn

    Darius asked if Smart Flash Log was Exadata-only or if it was like Database Smart Flash Cache, which works on any Oracle install. I commented on the first part of that question; Kevin just elaborated on the latter part.

  4. Kevin Closson

    Anantha,

    I’m not decidedly anti-Oracle. I am, however, decidedly anti-hubris and that is why I left Oracle. I didn’t agree with the ever-increasing over-selling and mis-positioning of products like Exadata when I worked in that development organization any more than I do now. I couldn’t speak freely about my position on those matters then but I can now.

    This is Greg’s blog and he does not share my position though so I’ll leave it at that.

  5. robin chatterjee

    Hi Kevin, isn’t the flash in exadata generally 2 to 3 times faster than disk for writes ? here I guess since log entries are flushed much more frequently than database block there should be an advantage to faster write speeds for these kind of writes.

  6. Kevin Closson

    Hello Robin,

    Exadata Smart Flash Cache random accesses are much faster than mechanical drive accesses, yes. Redo is sequential I/O. But, besides that point, no matter what is done to accelerate redo on Exadata, it still cannot sustain more than 25,000 random writes (DBWR) in a full rack. It is a read-optimized platform, not an OLTP/ERP platform.

  7. Greg Rahn

    From experience, many ERP apps are much more read than write (even as much as 90% read/10% write), and many OLTP apps are certainly majority read as well. To qualify what you are saying: if the application is DBWR write intensive, then any platform that cannot buffer those writes with memory will perform significantly worse than those that can, or those that use memory-based storage.

  8. Anantha

    @Kevin, it must have been h*ll. I wonder how you spent all those years writing gloriously about Oracle DB at Oracle.

    Every product is oversold by the sales team, that’s their job. It is my responsibility as the customer to sieve through and make my evaluation. I can always think of many scenarios to make Exadata perform poorly, for that matter any server/appliance. The key is to find the weak spot and squeeze it until it keels over.

    We bought Exas because it gave us a prebuilt solution, at a reasonable premium, for OLTP that needed HA. It was a nightmare getting a good RAC cluster built out at the 3-letter outsourcer. For that alone it was worth it for us.

    For all your rhetoric about why Exa is a poor match for OLTP our experience has been good. I’ve spoken to other customers that are seeing good results as well. Know that not all of us are geniuses that can ‘roll our own’ and tune endlessly, we settle for prebuilt. That is fine by me.

  9. Kevin Closson

    @Anantha, it is apparent that you are taking things personally. You’re hunting me down, as evidenced by the droplet you left on my blog http://kevinclosson.wordpress.com/2012/01/30/emc-oracle-related-reading-material-of-interest/#comment-38165 . I let your comment through on my blog and my friend Greg has here too, so you are being heard.

    I suspect you didn’t actually *read* my “glorious” writings. If you did you’d see how I consistently and concisely pointed out the limitations from day-one.

    Understanding the limitations is important. All technology has limitations.

    Consider the limitations I wrote of in the Winter paper http://kevinclosson.wordpress.com/2009/02/03/announcing-a-wintercorp-paper-about-oracle-exadata-storage-server/ . I showed that serial execution left storage underutilized by 66% for 40% of the processing period. Crystal clear. My IOUG webcasts were the same style (available from my blog). Ever seen my level-headedness about DBFS? Didn’t think so. See: https://docs.google.com/open?id=1YyRduWMt3dZAQTUlMtoTTByNp9YJgPxglQqbfZFWHnnzw2Cxw-ZLF9LzOUHc I could go on and on but I think I would be wasting my time…because you’ve taken the topic personally. But you shouldn’t.

    I don’t think buying Exadata makes anyone a fool. I fully understand the motivations for buying Exadata. In fact, the reason you cite has been what I routinely state as the biggest single value add of Exadata: it’s pre-built. You are correct. RAC is tremendously difficult to get running, full stop. It’s for that reason I always espoused the NFS storage provisioning model for RAC. If you can mount an NFS volume you’re pretty much there.

    You, and those like you, are not fools for espousing Exadata–especially for the reason you state. Not to mention it pads the resume.

    I happen to think there are better pre-built solutions that are RAC-ready. I especially call out VCE vBlock not because it has EMC in it, but because it is fully integrated and has large-memory Xeon 5600 nodes (384GB) available and a lot less lock-in.

    May I ask if you’ve found the need to provision flash grid disk yet? Would you ever post an AWR harvested from the peak utilization of your busiest Exadata configuration? Would you otherwise stop taking the topic personally?

  10. Anantha

    We haven’t had a need for flash grid disk, yet. BTW, technology discussions are never personal to me. I have always believed no matter what choice I make today I’m wrong in 5 years. I’m at a point in my career where I no longer need to pad my resume, and didn’t have that need in the past as well.

    As far as the AWR, I will check and post. I’m always willing to learn from the masters of the craft :-)

  11. Costel Toba

    I see that the technical argument is losing strength around hypothetical facts.
    Fact(s):
    - One F20 card used in Exadata for flash cache can sustain 88K IOPS for random writes (4K); specs here: http://www.oracle.com/us/products/servers-storage/storage/flash-storage/f20-data-sheet-403555.pdf
    - One full Exadata rack has 56 F20 cards, which provides a theoretical throughput of 4.9 million IOPS for random writes (4K)
    The point that needs to be taken from this article is that with Exadata Smart Flash Logging the spinning disk bottleneck is eliminated. In fact, the days of spinning disk technology are beginning to darken and, no offence, the future is not that bright for the spinning disk vendors either.

  12. vibhu

    Hi all,
    What is the exact meaning of the detail given below? Please describe more.

    configuration of redo log sizing, duplexing and mirroring do not change when using Exadata
    Smart Flash Logging. Smart Flash Logging handles all crash and recovery scenarios without
    requiring any additional or special administrator intervention beyond what would normally be
    needed for recovery of the database from redo logs

    Thanks
    vibhu

  13. Pingback: Exadata Smart Flash Cache – a note of understanding | Saurabh K. Gupta's Oracle Blog
