Upscaling Your Database Application Performance: The Array Interface

Personally I believe the array interface is one of the most overlooked methods to increase database application scalability. Any time an application is selecting or inserting more than a single row, performance benefits are generally observed by using the array interface. The Oracle array interface exists for Oracle Call Interface (OCI), PL/SQL, and JDBC.

Designing Applications For Performance And Scalability tells us:

When more than one row is being sent between the client and the server, performance can be greatly enhanced by batching these rows together in a single network roundtrip rather than having each row sent in individual network roundtrips. This is in particular useful for INSERT and SELECT statements, which frequently process multiple rows and the feature is commonly known as the array interface.

To use the array interface, the client application will need to represent data as arrays rather than individual variables containing the data of a single row. For queries, most APIs perform automated array processing such that a client side fetch will return rows from an automatically buffered array fetch. This is known as prefetching. For INSERT statements, most APIs require the application code actually contain the array in the client side and use an array version of the bind and execute calls.

To demonstrate the performance benefit of using the array interface I wrote two simple Java programs: batchInsert.java and rowPrefetch.java.

JDBC Array Inserts

I used batchInsert.java to insert 10,000 rows into the EMP table varying the batch size from 1 up to 50. Below is a graph of the elapsed times at each batch size.
JDBC Update Batching Performance
As one can see by the results, leveraging the array interface for INSERT statements has significant performance gains, even with a small batch size, compared to the single row operation. Update Batching is discussed in more detail in the JDBC Developer’s Guide.

JDBC Array Selects

I ran some tests using rowPrefetch.java which executes a SELECT against the 10,000 row EMP table without any predicate. The prefetch batch size was tested at values between 1 and 10. Below is a graph of the elapsed times.
JDBC Prefetch Performance
Setting the prefetch batch size to 5 yielded almost a 2x gain in response time compared to the single row operation. Please see the Row Prefetching section of the JDBC Developer’s Guide for more details on the topic.

Summary

The simple tests that were performed demonstrate that using the array interface yields noticable performance gains for both INSERT and SELECT statements. The amount of performance gain will vary by application, but I’m most certain that gains will be observed if the operation is on more than a single row. One should notice that leveraging the array interface not only reduces elapsed time, but also reduces the amount of CPU the operation consumes. That sounds like a win-win to me and why I consider it an important part of database application scalability. After all, the JDBC array interface is not in the Performance Extensions chapter for nothing, right?

Test Environment

The simple tests I performed were run on a single CPU host running Windows XP Pro, Oracle 10.2.0.3. If your application doesn’t reside on the same host as the database, it is likely that you may see even greater gains do to the reduction of the network overhead. As always, your mileage may vary.

16 comments

  1. Dominic

    You preaching to the converted here. Just wish I could influence my Java developers more.

    Did you compare the batch performance to a class using an Oracle object type?

    I know that when at a previous company we evaluated a prototype of batch versus bulk using oracle object type and either FORALL or just a SELECT … FROM TABLE … CAST the latter was significantly quicker again than the batch with similar array sizes. Only trouble is that you’ve got a whole bunch of ugly fiddling with StructDescriptors and/or ArrayDescriptors that just looks bad.

  2. Greg Rahn

    I hope notes like this one help educate Java database developers that there are programming design decisions that have much more influence over performance compared to what a DBA can do after the application is written.

    I didn’t do any tests to compare what you mentioned but I may look into it.

  3. Pingback: Upscaling your JDBC app « OraStory
  4. Dominic

    Hi Greg,

    Took me a while because my Java skills are rusty and were poor before they became rusty.

    But have put together a comparison for the straight Oracle object type insert as mentioned above.

    Here

    Hope that’s ok.

    My batch insert performance was pretty similar to yours. In comparison the Oracle collection of object type was twice the speed of the fastest batch.

    Cheers,
    Dominic

  5. jan van mourik

    Nice, very interesting!
    I’d like to see the tkprof output for a few testcases too (or Hotsos profiler if possible). Did you trace these testcases?

    Regards, jan

  6. jan van mourik

    Thanks Greg. It was also part laziness on my side. Since I’m not a java programmer I had some trouble getting it to work…
    Btw, my results inserting 10,000 rows (emp + pk, no fk):
    Batch size Time(s)
    1 652
    10 65
    20 32
    40 16
    50 14
    100 7
    200 3.8
    400 3.47
    500 2.89
    1000 2.7

    Notice how it scales up to batch size 200!!

    Thanks Greg!!!

    jan

  7. jan van mourik

    Ran a few more tests, this time connecting to the database on my laptop (10g). Also, I added a table EMP2, with a FK from emp2.mgr to emp.empno. So I’m using two preparedStatements. This to see if the batching would cause any problems with the FK. The rows inserted look the same otherwise, and I’m inserting the same number of rows in both tables. The results:

    10,000 rows into both emp and emp2
    batch elapsed in sec
    1 3.266 sec
    10 0.563
    50 0.328
    100 0.297
    200 0.453
    1000 0.437

    100,000 rows
    1 40.9 sec
    100 3.344

    1,000,000 rows
    1 376.532 sec
    100 45
    200 36.922
    500 32.875
    1000 29.563

    Big difference from my previous test, where I accessed a database on another machine over a very slow network. Nice…

    jan

  8. Greg Rahn

    @oracle tom

    I am not familiar with the native JDBC 2.0 implementation method, so I would suggest that you try both and see how they compare. Just turn sql trace on in your test app and then tkprof the trace file.

  9. Chris

    @Greg Rahn
    I believe that the batching API available with JDBC 2.0 onwards ‘batches’ DML statements by putting a BEGIN and END around them, hence submitting them to the database in an anonymous PL/SQL block. IMHO techniques to reduce network round trips, which the JDBC batching API and array interface help to achieve, should be added to the scalability and performance “Holy trinity” of connection management, cursor management and good schema design.

  10. Chris

    I’ve looked at the attached Java code and this is using the JDBC batch API available with JDBC 2.0 onwards. I thought this might have been the native array interface, this is faster as it allows arrays of values to be passed to stored procedures, which can then use bulk binds. Unsurprisingly there is a good example on Ask Tom: http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:712625135727. Both techniques eliminate network round trips, but the second will be a lot more cursor friendly. The obvious advantage of the JDBC 2.0 batch API is that it is portable and most ORMs (object relational mapping) tools and frameworks support this. As far as I’m aware use native array interface requires hand cranked JDBC code.

  11. Pingback: How to set OCI row prefetch count in oraacess.xml in Oracle 12.1 | Pierre blog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s