The Core Performance Fundamentals Of Oracle Data Warehousing – Data Loading

[back to Introduction]

Getting flat file data into your Oracle data warehouse is likely a daily (or possibly more frequent) task, but it certainly does not have to be a difficult one. Bulk data loading rates are governed by the following operations and hardware resources:

  1. How fast the data can be read
  2. How fast the data can be written out
  3. How much CPU power is available

I’m always a bit amazed (and depressed) when I hear people complain that their data loading rates are slow and they proceed to tell me things like:

  • The source files reside on a shared NFS filer (or similar) and it has just a single GbE (1 Gigabit Ethernet) network path to the Oracle database host(s).
  • The source files reside on an internal disk volume consisting of a two-disk mirror (or a volume with very few spindles).

Maybe it’s not entirely obvious so let me spell it out (as I did in this tweet):

One cannot load data into a database faster than it can be delivered from the source. Database systems must obey the laws of physics!

Or putting it another way: Don’t fall victim to slow data loading because of a slow-performing data source.

Given a source system that can deliver data at fast enough rates, the loading rate becomes a function of points #2 and #3. The database operations can be simplified to:

  1. Read lines from flat files
  2. Parse lines into columns and internal data types, and optionally compress
  3. Write rows/columns out to disk

In most cases a reasonably sized system becomes CPU bound, not write bound, on data loads, as almost all Oracle data warehouses use compression, which increases CPU consumption but reduces the IO requirement for the writes. Or putting it another way: bulk loading into a compressed table should be a CPU bound operation, not a disk (write) bound operation.
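
To make that concrete, here is a minimal sketch of what a compressed load target might look like (the sales_fact table and its columns are hypothetical, used purely for illustration):

create table sales_fact (
  sale_id number,
  cust_id number,
  amount  number
)
parallel
nologging
compress;

-- "compress" (basic table compression) compresses direct-path loaded rows at the cost of extra CPU;
-- on Exadata, "compress for query high" (Hybrid Columnar Compression) is another option.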

Data Loading Best Practices (What To Do and Why To Do It)

Oracle offers two methods to load data from flat files: 1) SQL*Loader and 2) External Tables.  I would highly recommend that bulk loads (especially PDML loads) be done via External Tables, and that SQL*Loader only be used for non-parallel loads (PARALLEL=false) with small amounts of data (not bulk loads).  The high-level reasons for this recommendation are that External Tables offer nearly all the SQL functionality of a heap table and allow numerous optimizations that SQL*Loader does not, and that there are some undesirable side effects (mostly in the space management layer) from using PARALLEL=true with SQL*Loader.
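
To make the External Table approach concrete, here is a hedged sketch (the directory, file, and column names are made up for illustration). The flat files are exposed as a read-only table that any serial or parallel SQL statement can select from:

create directory load_dir as '/staging/flat_files';

create table et_sales_load (
  sale_id number,
  cust_id number,
  amount  number
)
organization external (
  type oracle_loader
  default directory load_dir
  access parameters (
    records delimited by newline
    fields terminated by '|'
    missing field values are null
  )
  location ('sales_1.dat', 'sales_2.dat')
)
reject limit unlimited
parallel;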

In order to avoid the reading of the flat files becoming the bottleneck, use a filesystem that is backed by numerous spindles (more than enough to provide the desired loading rate) and consider using compressed files in conjunction with the external table preprocessor.  The preprocessor is especially useful if there are proportionally more CPU resources on the database system than network or disk bandwidth, because compressing the source files allows more logical rows to be “delivered” for a given file size.  Something else that may not be obvious: put the flat files on a filesystem mounted with directio mount options.  This prevents the file system cache from being flooded with data that will (likely) never be read again (how many times do you load the same files?).  Another option, available starting with Oracle 11.2, is DBFS (Database File System), which is frequently used with the Oracle Database Machine & Exadata and provides a fast, scalable solution for staging flat files.
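
If the source files are gzip’d, the 11.2 preprocessor can decompress them on the fly. Below is a hedged variation on the external table sketch above (the exec_dir directory and the file names are assumptions, and zcat must be executable on the database host):

create directory exec_dir as '/bin';

create table et_sales_load_gz (
  sale_id number,
  cust_id number,
  amount  number
)
organization external (
  type oracle_loader
  default directory load_dir
  access parameters (
    records delimited by newline
    preprocessor exec_dir:'zcat'
    fields terminated by '|'
    missing field values are null
  )
  location ('sales_1.dat.gz', 'sales_2.dat.gz')
)
reject limit unlimited
parallel;

Keep in mind that a compressed file cannot be split into granules for parallel scanning, so provide at least as many source files as the desired degree of parallelism.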

In order to achieve the best loading speeds, be sure to do the following (a combined sketch follows the list):

  • Use External Tables
  • Use a staging filesystem (and network) fast enough to meet your loading speed requirements (and consider directio mount options)
  • Use Parallel Execution (parallel CTAS or PDML INSERT)
  • Use direct-path loads (nologging CTAS or INSERT /*+ APPEND */)
  • Use a large enough initial/next extent size (8MB is usually enough)
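
As a hedged example that ties these points together using the hypothetical tables sketched earlier, a parallel direct-path INSERT would look something like this (a nologging CTAS, as shown further down, accomplishes the same when the target table does not yet exist):

alter session enable parallel dml;

insert /*+ append parallel(sales_fact) */ into sales_fact
select /*+ parallel(et_sales_load) */ *
from et_sales_load;

-- direct-path loaded rows are only visible after the transaction commits
commit;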

If you follow these basic recommendations you should be able to achieve loading speeds that easily meet your requirements (otherwise you likely just need more hardware).

Loading Data At Ludicrous Speed

I’ve yet to come across a reasonably designed system that is capable of becoming write bound, as systems simply either 1) do not have enough CPU to do so or 2) are unable to read the source flat files anywhere near fast enough to do so.  I have, however, conducted experiments to test the write throughput of a Sun Oracle Database Machine (Exadata V2) by using flat files cached completely in the filesystem cache and referencing them numerous times in the External Table DDL.  The results should be quite eye-opening for many, especially those who think the Oracle database cannot load data fast.  Loading into an uncompressed table, I was able to load just over 1TB of flat file data (over 7.8 billion rows) in a mere 4.6 minutes (275 seconds).  This experiment does not represent typical loading rates, as it’s unlikely the source files are on a filesystem as fast as main memory, but it does demonstrate that if the flat file data can be delivered at such rates, the Oracle software and hardware can easily load it at close to physics speed (the maximum speed the hardware is capable of).

SQL> create table fastload
  2  pctfree 0
  3  parallel
  4  nologging
  5  nocompress
  6  storage(initial 8m next 8m)
  7  tablespace ts_smallfile
  8  as
  9  select * from et_fastload;

Table created.

Elapsed: 00:04:35.49

SQL> select count(*) from fastload;

     COUNT(*)
-------------
7,874,466,950

Elapsed: 00:01:06.54

SQL> select ceil(sum(bytes)/1024/1024) mb from user_segments where segment_name='FASTLOAD';

       MB
---------
1,058,750

SQL> exec dbms_stats.gather_table_stats(user,'FASTLOAD');

PL/SQL procedure successfully completed.

SQL> select num_rows,blocks,avg_row_len from user_tables where table_name='FASTLOAD';

  NUM_ROWS     BLOCKS AVG_ROW_LEN
---------- ---------- -----------
7874466950  135520008         133

Just so you don’t think I’m making this stuff up, check out the SQL Monitor Report for the execution, noting the IO throughput graph from the Metrics tab (10GB/s write throughput isn’t half bad).

So as you can see, flat file data loading has really become more of a data delivery problem than a data loading problem.  If the Oracle Database, specifically the Exadata powered Oracle Database Machine, can bulk load data from an external table whose files reside in the filesystem cache at a rate of 13TB per hour (give or take), you probably don’t have to worry too much about meeting your data loading rate business requirements (wink).

Note: Loading rates will vary slightly depending on table definition, number of columns, data types, compression type, etc.

17 comments

  1. Statistique

    “Loading into an uncompressed table, I was able load just over 1TB of flat file data (over 7.8 billion rows) in a mear 4.6 minutes (275 seconds). ”

    It is pretty impressive indeed !

  2. Karl Arao

    Hi Greg,

    Nice post. I noticed in the report http://structureddata.org/files/fastload_sqlmon.html, on the Activity tab, that you’ve got Average Active Sessions (AAS) way higher than the # of CPUs.

    I also see this kind of behavior on other ETL workloads (with parallelism), but those environments only have 16 CPUs and I’m seeing AAS at around 60-90 when doing a massive insert job. Yours reaches a max of 320 (but doing a CTAS).

    I also noticed that ETL workloads are usually characterized by
    1) HIGH “CPU” with HIGH “PX%”, and “Direct Path%” events
    But it could also be
    2) LOW “CPU” with HIGH “PX%”, and “Direct Path%” events

    What can you say about this,
    and about the high AAS value being composed mainly of “PX%” and “Direct Path%” events?

  3. zvika

    I posted a question on your last post and then read this post.
    I still think that my system is write bound, as I increased the parallel DOP of the external table with no improvement.
    Here is the question I posted:
    Thanks a lot for this post.
    We are trying to make a process scalable with parallel execution, but with no luck.
    We insert from an external table (DOP 8) into our fact table (partitioned by range and subpartitioned by list, DOP 10).
    We are using insert with APPEND, and our insert goes into 2 tables (using the INSERT ALL syntax) with an error logging table.
    This process today is not scalable; as the data grows we will still do the same throughput (about 10k rows per second).
    I tried to use parallel DML, but I saw that even when Oracle opens PX threads, only one thread is inserting into the table.
    So it seems that without changing the application (maybe splitting the data or something) I can’t make this insert scalable.
    Am I right?
    Zvika

  4. Greg Rahn

    @Karl Arao

    The number of active sessions can be more than the number of CPUs. In this case I have 16 Nehalem-EP processors with SMT enabled for a total of 128 logical CPU cores (64 physical cores), so it seems this graph isn’t 100% correct – the “CPU cores” line should be at 128, not 64. I’m running at a DOP of 256 with a single slave set, so there are 256 PX Servers working on this load. I’m not sure exactly why this is, but note the amount of area over 256 seems to identically match the gap below, so I suspect it may be related to the sampling or something. I’ll see if I can get a better answer from the code owners.

    I can say that there is too little information to go on to make any reasonable comments. Need more information.

  5. Greg Rahn

    @Kevin Closson

    It was normal redundancy, and you can see the 2x diff (for the mirror payload) in the SQL Monitor Report Metrics tab of Write Bytes vs IO Interconnect Bytes. Write Bytes tracks only the primary extent while IO Interconnect Bytes includes the IO for the mirror extent.

  6. Kevin Closson

    Yep, I know. I just suspected that there may be a reader or two that have not shined their golden magic decoder ring to account for mirrored writes on Exadata :-)

  7. Pingback: Bloemlezing 17 « De Kadenzer Courant
  8. Pingback: Log Buffer #187, a Carnival of the Vanities for DBAs | The Pythian Blog
  9. Stefano

    Also, a simple trick during ELT for DW population: remember to set indexes to unusable before the data load and transformation and then, at the end, rebuild the indexes.

  10. Ofir Manor

    Hi Greg,
    this is just mind blowing…
    Do you understand what happens in the last 70 secs of the SQL?
    CPU goes to 1, writes have ended, etc… looks a bit weird… otherwise the numbers would be even nicer :)

  11. Pingback: Seeing Exadata in action « Karl Arao's Blog
  12. Andy Thomp

    Thanks for the load suggestions. But what if I am loading from Oracle into Exadata? How can I get the data into flat files quickly? Spool is slow as hell, and even Pro*C is not sufficient for a few terabytes.
    Do you recommend any tool for that?

  13. João Skrock

    Could you please explain in more detail the phrase
    “there are some undesirable side effects (mostly in the space management layer)”?

    I will need to use SQL*Loader (with PARALLEL=TRUE) because we have LOBs.

  14. Greg Rahn

    @João Skrock

    When you use LOBs, you have no other choice. The reason not to use PARALLEL=TRUE, especially with a large number of processes, is that there can be lots of extents created and the metadata overhead can slow down queries later.
