Upscaling Your Database Application Performance: The Array Interface

April 30th, 2007

Personally I believe the array interface is one of the most overlooked methods to increase database application scalability. Any time an application is selecting or inserting more than a single row, performance benefits are generally observed by using the array interface. The Oracle array interface exists for Oracle Call Interface (OCI), PL/SQL, and JDBC.

Designing Applications For Performance And Scalability tells us:

When more than one row is being sent between the client and the server, performance can be greatly enhanced by batching these rows together in a single network roundtrip rather than having each row sent in individual network roundtrips. This is in particular useful for INSERT and SELECT statements, which frequently process multiple rows and the feature is commonly known as the array interface.

To use the array interface, the client application will need to represent data as arrays rather than individual variables containing the data of a single row. For queries, most APIs perform automated array processing such that a client side fetch will return rows from an automatically buffered array fetch. This is known as prefetching. For INSERT statements, most APIs require the application code actually contain the array in the client side and use an array version of the bind and execute calls.

To demonstrate the performance benefit of using the array interface I wrote two simple Java programs: batchInsert.java and rowPrefetch.java.

JDBC Array Inserts
I used batchInsert.java to insert 10,000 rows into the EMP table varying the batch size from 1 up to 50. Below is a graph of the elapsed times at each batch size.
JDBC Update Batching Performance
As one can see by the results, leveraging the array interface for INSERT statements has significant performance gains, even with a small batch size, compared to the single row operation. Update Batching is discussed in more detail in the JDBC Developer’s Guide.

JDBC Array Selects
I ran some tests using rowPrefetch.java which executes a SELECT against the 10,000 row EMP table without any predicate. The prefetch batch size was tested at values between 1 and 10. Below is a graph of the elapsed times.
JDBC Prefetch Performance
Setting the prefetch batch size to 5 yielded almost a 2x gain in response time compared to the single row operation. Please see the Row Prefetching section of the JDBC Developer’s Guide for more details on the topic.

Summary
The simple tests that were performed demonstrate that using the array interface yields noticable performance gains for both INSERT and SELECT statements. The amount of performance gain will vary by application, but I’m most certain that gains will be observed if the operation is on more than a single row. One should notice that leveraging the array interface not only reduces elapsed time, but also reduces the amount of CPU the operation consumes. That sounds like a win-win to me and why I consider it an important part of database application scalability. After all, the JDBC array interface is not in the Performance Extensions chapter for nothing, right?

Test Enviroment
The simple tests I performed were run on a single CPU host running Windows XP Pro, Oracle 10.2.0.3. If your application doesn’t reside on the same host as the database, it is likely that you may see even greater gains do to the reduction of the network overhead. As always, your mileage may vary.

Post to Twitter Tweet This Post

Performance

  1. Trackbacks

  1. Comments

  2. April 30th, 2007 at 08:45 | #1

    You preaching to the converted here. Just wish I could influence my Java developers more.

    Did you compare the batch performance to a class using an Oracle object type?

    I know that when at a previous company we evaluated a prototype of batch versus bulk using oracle object type and either FORALL or just a SELECT … FROM TABLE … CAST the latter was significantly quicker again than the batch with similar array sizes. Only trouble is that you’ve got a whole bunch of ugly fiddling with StructDescriptors and/or ArrayDescriptors that just looks bad.

  3. April 30th, 2007 at 12:55 | #2

    I hope notes like this one help educate Java database developers that there are programming design decisions that have much more influence over performance compared to what a DBA can do after the application is written.

    I didn’t do any tests to compare what you mentioned but I may look into it.

  4. May 1st, 2007 at 08:20 | #3

    Hi Greg,

    Took me a while because my Java skills are rusty and were poor before they became rusty.

    But have put together a comparison for the straight Oracle object type insert as mentioned above.

    Here

    Hope that’s ok.

    My batch insert performance was pretty similar to yours. In comparison the Oracle collection of object type was twice the speed of the fastest batch.

    Cheers,
    Dominic

  5. Suying
    June 2nd, 2007 at 21:10 | #4

    It’s quite useful to me. After testing, I will give more comments.

  6. thangadurai
    September 25th, 2007 at 22:33 | #5

    its very userful to all, who are working in both 10g And 11g databases.

  7. jan van mourik
    October 25th, 2007 at 10:35 | #6

    Nice, very interesting!
    I’d like to see the tkprof output for a few testcases too (or Hotsos profiler if possible). Did you trace these testcases?

    Regards, jan

  8. October 25th, 2007 at 11:00 | #7

    I don’t think I have the trace files any more but the code is there and one can easily run the test case. The batchInsert.java has the sql_trace line in it.

  9. jan van mourik
    October 26th, 2007 at 12:21 | #8

    Thanks Greg. It was also part laziness on my side. Since I’m not a java programmer I had some trouble getting it to work…
    Btw, my results inserting 10,000 rows (emp + pk, no fk):
    Batch size Time(s)
    1 652
    10 65
    20 32
    40 16
    50 14
    100 7
    200 3.8
    400 3.47
    500 2.89
    1000 2.7

    Notice how it scales up to batch size 200!!

    Thanks Greg!!!

    jan

  10. jan van mourik
    October 27th, 2007 at 12:32 | #9

    Ran a few more tests, this time connecting to the database on my laptop (10g). Also, I added a table EMP2, with a FK from emp2.mgr to emp.empno. So I’m using two preparedStatements. This to see if the batching would cause any problems with the FK. The rows inserted look the same otherwise, and I’m inserting the same number of rows in both tables. The results:

    10,000 rows into both emp and emp2
    batch elapsed in sec
    1 3.266 sec
    10 0.563
    50 0.328
    100 0.297
    200 0.453
    1000 0.437

    100,000 rows
    1 40.9 sec
    100 3.344

    1,000,000 rows
    1 376.532 sec
    100 45
    200 36.922
    500 32.875
    1000 29.563

    Big difference from my previous test, where I accessed a database on another machine over a very slow network. Nice…

    jan

  11. June 30th, 2008 at 03:57 | #10

    Is there a difference in performance if the JDBC 2.0 batching or the oracle implemntation is used? (in java)

  12. July 2nd, 2008 at 21:53 | #11

    @oracle tom

    I am not familiar with the native JDBC 2.0 implementation method, so I would suggest that you try both and see how they compare. Just turn sql trace on in your test app and then tkprof the trace file.

  13. Chris
    August 24th, 2009 at 13:51 | #12

    @Greg Rahn
    I believe that the batching API available with JDBC 2.0 onwards ‘batches’ DML statements by putting a BEGIN and END around them, hence submitting them to the database in an anonymous PL/SQL block. IMHO techniques to reduce network round trips, which the JDBC batching API and array interface help to achieve, should be added to the scalability and performance “Holy trinity” of connection management, cursor management and good schema design.

  14. Chris
    September 23rd, 2009 at 14:08 | #13

    I’ve looked at the attached Java code and this is using the JDBC batch API available with JDBC 2.0 onwards. I thought this might have been the native array interface, this is faster as it allows arrays of values to be passed to stored procedures, which can then use bulk binds. Unsurprisingly there is a good example on Ask Tom: http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:712625135727. Both techniques eliminate network round trips, but the second will be a lot more cursor friendly. The obvious advantage of the JDBC 2.0 batch API is that it is portable and most ORMs (object relational mapping) tools and frameworks support this. As far as I’m aware use native array interface requires hand cranked JDBC code.

  15. October 2nd, 2009 at 08:44 | #14

    @Chris
    In my comment above, I linked to an example done with the clumsy struct / array descriptor classes (clumsy in my non-existent java anyway).

    Here’s a link to that example – it was twice as fast as the timings for my batch size of 50.