The Core Performance Fundamentals Of Oracle Data Warehousing – Table Compression

[back to Introduction]
Editor’s note: This blog post does not cover Exadata Hybrid Columnar Compression.

The first thing that comes to most people’s minds when database table compression is mentioned is the savings it yields in terms of disk space. While reducing the footprint of data on disk is relevant, I would argue it is the lesser of the benefits for data warehouses. Disk capacity is very cheap and generally plentiful; disk bandwidth (scan speed), however, is proportional to the number of spindles, no matter what the disk capacity, and is thus more expensive. Table compression reduces the footprint a given data set occupies on the disk drives, so the amount of physical data that must be read off the disk platters is reduced compared to the uncompressed version. For example, if 4000 GB of raw data compresses to 1000 GB, it can be read off the same disk drives 4X as fast, because only 1/4 of the data (relative to the uncompressed size) is read and transferred off the spindles. Likewise, table compression allows the database buffer cache to contain more data without increasing the memory allocation, because more rows can be stored in a compressed block/page than in an uncompressed block/page.

Row major table compression comes in two flavors in the Oracle database: BASIC and OLTP. In 11.1 these were also known by the key phrases COMPRESS (or COMPRESS FOR DIRECT_LOAD OPERATIONS) and COMPRESS FOR ALL OPERATIONS, respectively. BASIC/DIRECT_LOAD compression has been part of the Oracle database since version 9, while ALL OPERATIONS/OLTP compression was introduced in 11.1 with the Advanced Compression option.
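
As a quick illustration (the table names here are hypothetical), the two flavors can be declared as follows, using the 11.2 keywords with the equivalent 11.1 phrases noted in comments:

    -- BASIC compression: compresses rows loaded via direct path operations.
    -- 11.1 syntax: COMPRESS or COMPRESS FOR DIRECT_LOAD OPERATIONS
    create table sales_hist
    compress basic
    as select * from sales;

    -- OLTP compression: also compresses conventionally inserted rows.
    -- 11.1 syntax: COMPRESS FOR ALL OPERATIONS (requires the Advanced Compression option)
    create table sales_live
    compress for oltp
    as select * from sales;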

Oracle row major table compression works by storing repeated column values for a given block in a symbol table at the beginning of the block; the rows then reference the symbol table entries rather than repeating the values. The more repeated values per block, even across columns, the better the compression ratio. Sorting the data can increase the compression ratio, as ordering generally places more repeated values in each block. Specific compression ratios, and the gains from sorting, are very data dependent, but compression ratios are generally between 2x and 4x.
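
A simple way to see the effect of sorting for yourself is to load the same data twice, once as-is and once ordered by one or more low-cardinality columns, and compare the segment sizes. A sketch (the table and column names are hypothetical):

    -- Load the same data unsorted and sorted into BASIC compressed tables.
    create table sales_comp compress basic
    as select * from sales;

    create table sales_comp_sorted compress basic
    as select * from sales order by cust_id, prod_id;

    -- Compare the resulting segment sizes.
    select segment_name, round(bytes/1024/1024) as mb
    from   user_segments
    where  segment_name in ('SALES_COMP', 'SALES_COMP_SORTED');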

Compression does add some CPU overhead when direct path loading data, but there is no measurable performance overhead when reading data, as the Oracle database can operate on compressed blocks directly without having to first uncompress them. The additional CPU required when bulk loading data is generally well worth the downstream gains for data warehouses, because most data in a well designed data warehouse is written once and read many times. Insert-only and infrequently modified tables are ideal candidates for BASIC compression. If the tables have significant DML performed against them, then OLTP compression would be advised (or no compression).
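
One caveat worth remembering is that BASIC compression only compresses rows that arrive via direct path operations, so bulk loads should use direct-path inserts. A minimal sketch (the staging table name is hypothetical):

    -- The APPEND hint requests a direct-path insert, so the new rows are compressed;
    -- a conventional insert into a BASIC compressed table stores rows uncompressed.
    insert /*+ append */ into sales_hist
    select * from sales_staging;

    -- The loading session cannot query the table again until it commits.
    commit;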

Given that most Oracle data warehouses I have seen are constrained by I/O bandwidth (see Balanced Hardware Configuration), it is highly recommended to leverage compression so the logical table scan rate can increase proportionally to the compression ratio. This results in faster table and partition scans on the same hardware.
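
To verify which tables are compressed and how, the data dictionary exposes the settings (in 11.2 the COMPRESS_FOR column reports BASIC or OLTP; 11.1 uses the older phrases):

    -- Check the compression attributes of tables (names here are illustrative).
    select table_name, compression, compress_for
    from   user_tables
    where  table_name like 'SALES%';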

Oracle Documentation References:

10 comments

  1. Sachin

    Nice post!
    One question: when does the decompression process take place? In memory, after reading the data? Do sequential/scattered reads know which data blocks need to be retrieved? Most of the reads even in a warehouse are index reads from fact tables .. so will the indexes have their rowids updated?

  2. Pete Scott

    Compression has long been in my top 3 must-haves for a DW (with partitions and bitmap indexes – but with more Exadata work I am starting to revise that list)

    Sometimes what at a quick glance looks like an insert-only table may in fact be updated – take slowly changing dimension tables, where the “old” record needs to be closed off (updated) when a new record for the same dimensional key comes in. But the inventive mind can always come up with ways to get around that :-) – and without resorting to advanced compression.
    Sadly, “updateable” facts are quite common – for example, processes that run through multiple steps (e.g. support calls changing states: open, call back, resolve, close) or stock movement tracking. But again there are ways to avoid the need to locate and update; for example, the 11g pivot operator can be used to rotate a sequence of state changes for a call_id, and this is surprisingly efficient – especially so if the table design (indexing and partitioning, for example) can minimise the blocks accessed to pivot the result set.

  3. Greg Rahn

    @Pete Scott

    There will always be times when tables need to be updated (e.g. SCD), however, with compression bulk update operations should be avoided. Updates 1) don’t mix well with compression and 2) are logging operations, and thus slower than nologging inserts. IMO updatable facts are a symptom of poor design (as you mention, “sadly”) – events in the past don’t change; new events take place and should be appended to the table of events.

  4. Pete Scott

    @Greg Rahn
    Exactly – we avoid having to update facts by either using a rotated view if that is needed or employing a ‘status’ dimension and going the traditional appended-fact route.
    For SCDs we have been known to bulk write to a new table and partition exchange it back… but at the end of the day it is working out what is best for the typical data load pattern

  5. Pingback: The Core Performance Fundamentals Of Oracle Data Warehousing – Introduction
  6. Dominic Brooks

    Hi Greg,

    Just to let you know that sadly the last three documentation references you mention are no longer at their original locations.

    I did a very quick search (because I wanted to read the material myself) and the only one which seemed to still be easily locatable was the Advanced Compression with 11gR2 whitepaper:
    http://www.oracle.com/technetwork/database/features/storage/advanced-compression-whitepaper-130502.pdf – despite links all over the place (including internally within Oracle) to the other docs.

    Cheers,
    Dominic
