Automatic DB_FILE_MULTIBLOCK_READ_COUNT
Note: Originally this experiment was from a post I wrote on the Oracle Forum: Database – General. I recommend that you read Jonathan Lewis’ summarization of the thread instead of reading all 671 posts (as of today). You will spend much less time and get more out of the discussion.
One of the new features that was released in 10gR2 is the automatic DB_FILE_MULTIBLOCK_READ_COUNT. Below are portions from the documentation that describe this feature.
Oracle Database 10g New Features
The DB_FILE_MULTIBLOCK_READ_COUNT parameter controls the amount of block prefetching done in the buffer cache during scan operations, such as full table scan and index fast full scan. The value of this parameter can have a significant impact on the overall database performance. This feature enables Oracle Database to automatically select the appropriate value for this parameter depending on the operating system optimal I/O size and the size of the buffer cache.
This feature simplifies manageability by automating the tuning of DB_FILE_MULTIBLOCK_READ_COUNT initialization parameter.
Oracle Database Performance Tuning Guide
This parameter specifies the number of blocks that are read in a single I/O during a full table scan or index fast full scan. The optimizer uses the value of DB_FILE_MULTIBLOCK_READ_COUNT to cost full table scans and index fast full scans. Larger values result in a cheaper cost for full table scans and can result in the optimizer choosing a full table scan over an index scan. If this parameter is not set explicitly (or is set is 0), the optimizer will use a default value of 8 when costing full table scans and index fast full scans.
Be Aware of the Bug
Although the documentation states:
If this value is not set explicitly (or is set to 0)…
there is a bug (5768025) if one sets DB_FILE_MULTIBLOCK_READ_COUNT to 0. This will result in making all muti-block I/O requests 1 block (db file sequential read), thus completely disabling the advantage of DB_FILE_MULTIBLOCK_READ_COUNT. Be aware!!! My recommendation: just don’t set it if you want to enable it.
Read I/O Request Size
Currently, the maximum read I/O request size that Oracle can issue to the OS is 1 Megabyte (1MB). The equation for the maximum read I/O request size from the Oracle database is db_file_multiblock_read_count * db_block_size. For example, if you are using a db_block_size of 8192 (8k) and db_file_multiblock_read_count is set to 64 the maximum read size request would be 8192 * 64 = 524,288 bytes or 0.5MB. One could set db_file_multiblock_read_count = 128 to achieve a 1MB read size, but that is the absolute maximum possible.
The advantage of using the automatic DB_FILE_MULTIBLOCK_READ_COUNT is that the database can leverage the benefits of a large read I/O request size without over influencing the cost based optimizer toward full table scans.
The Experiment of Block Size and Automatic DB_FILE_MULTIBLOCK_READ_COUNT
The purpose of this experiment will be to provide metrics so we can answer the question:
Does block size have any impact on elapsed time for a FTS query with 100% physical I/Os when using the automatic DB_FILE_MULTIBLOCK_READ_COUNT?
The experiment:
- 4 identical tables, with block sizes of 2k, 4k, 8k and 16k
- DB_FILE_MULTIBLOCK_READ_COUNT will be unset, letting the Oracle database choose the best size
- cold db cache so forcing 100% physical reads
- ASM storage, so no file system cache
- query will be: select * from table;
For the data in the table I’m going to use the WEB_RETURNS (SF=100GB) table from TPC-DS. The flat file is 1053529104 bytes (~1GB) as reported from the ls command.
-- tablespace create statements create tablespace tpcds_8k datafile '+GROUP1' size 1500m; create tablespace tpcds_2k datafile '+GROUP1' size 1500m blocksize 2k; create tablespace tpcds_4k datafile '+GROUP1' size 1500m blocksize 4k; create tablespace tpcds_16k datafile '+GROUP1' size 1500m blocksize 16k; -- table create statements create table web_returns_8k tablespace tpcds_8k as select * from web_returns_et; create table web_returns_2k tablespace tpcds_2k as select * from web_returns_et; create table web_returns_4k tablespace tpcds_4k as select * from web_returns_et; create table web_returns_16k tablespace tpcds_16k as select * from web_returns_et; -- segment size select segment_name, sum(bytes)/1024/1024 mb from user_segments group by segment_name; SEGMENT_NAME MB -------------------- ---------- WEB_RETURNS_2K 976 WEB_RETURNS_4K 920 WEB_RETURNS_8K 896 WEB_RETURNS_16K 880 SQL> desc WEB_RETURNS_16K Name Null? Type ----------------------------------------- -------- ---------------------------- WR_RETURNED_DATE_SK NUMBER(38) WR_RETURNED_TIME_SK NUMBER(38) WR_ITEM_SK NUMBER(38) WR_REFUNDED_CUSTOMER_SK NUMBER(38) WR_REFUNDED_CDEMO_SK NUMBER(38) WR_REFUNDED_HDEMO_SK NUMBER(38) WR_REFUNDED_ADDR_SK NUMBER(38) WR_RETURNING_CUSTOMER_SK NUMBER(38) WR_RETURNING_CDEMO_SK NUMBER(38) WR_RETURNING_HDEMO_SK NUMBER(38) WR_RETURNING_ADDR_SK NUMBER(38) WR_WEB_PAGE_SK NUMBER(38) WR_REASON_SK NUMBER(38) WR_ORDER_NUMBER NUMBER(38) WR_RETURN_QUANTITY NUMBER(38) WR_RETURN_AMT NUMBER(7,2) WR_RETURN_TAX NUMBER(7,2) WR_RETURN_AMT_INC_TAX NUMBER(7,2) WR_FEE NUMBER(7,2) WR_RETURN_SHIP_COST NUMBER(7,2) WR_REFUNDED_CASH NUMBER(7,2) WR_REVERSED_CHARGE NUMBER(7,2) WR_ACCOUNT_CREDIT NUMBER(7,2) WR_NET_LOSS NUMBER(7,2)
I’m using a Pro*C program to execute each query and fetch the rows with an array size of 100. This way I don’t have to worry about spool space, or overhead of SQL*Plus formatting. I have 4 files that contain the queries for each of the 4 tables for each of the 4 block sizes.
Output from a run is such:
BEGIN_TIMESTAMP QUERY_FILE ELAPSED_SECONDS ROW_COUNT ----------------- -------------------------------- --------------- --------- 20080604 22:22:19 2.sql 125.696083 7197670 20080604 22:24:25 4.sql 125.439680 7197670 20080604 22:26:30 8.sql 125.502804 7197670 20080604 22:28:36 16.sql 125.251398 7197670
As you can see, no matter what the block size, the execution time is the same (discounting fractions of a second).
The TKPROF Output
Below is the TKPROF output from each of the 4 executions.
TKPROF: Release 11.1.0.6.0 - Production on Wed Jun 4 22:35:07 2008 Copyright (c) 1982, 2007, Oracle. All rights reserved. Trace file: v11_ora_12162.trc Sort options: default ******************************************************************************** count = number of times OCI procedure was executed cpu = cpu time in seconds executing elapsed = elapsed time in seconds executing disk = number of physical reads of buffers from disk query = number of buffers gotten for consistent read current = number of buffers gotten in current mode (usually for update) rows = number of rows processed by the fetch or execute call ******************************************************************************** /* 2.sql */ select * from web_returns_2k call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 1 0.00 0.00 0 0 0 0 Execute 1 0.00 0.00 0 0 0 0 Fetch 71978 25.39 26.42 493333 560355 0 7197670 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 71980 25.39 26.42 493333 560355 0 7197670 Misses in library cache during parse: 0 Optimizer mode: ALL_ROWS Parsing user id: 50 Rows Row Source Operation ------- --------------------------------------------------- 7197670 TABLE ACCESS FULL WEB_RETURNS_2K (cr=560355 pr=493333 pw=493333 time=88067 us cost=96149 size=770150690 card=7197670) Elapsed times include waiting on following events: Event waited on Times Max. Wait Total Waited ---------------------------------------- Waited ---------- ------------ SQL*Net message to client 71980 0.00 0.16 SQL*Net message from client 71980 0.00 93.20 db file sequential read 3 0.00 0.01 direct path read 1097 0.04 0.13 SQL*Net more data to client 71976 0.00 1.88 ******************************************************************************** /* 4.sql */ select * from web_returns_4k call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 2 0.00 0.00 0 0 0 0 Execute 2 0.00 0.03 0 0 0 0 Fetch 71978 24.98 25.92 232603 302309 0 7197670 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 71982 24.98 25.96 232603 302309 0 7197670 Misses in library cache during parse: 0 Parsing user id: 50 Rows Row Source Operation ------- --------------------------------------------------- 7197670 TABLE ACCESS FULL WEB_RETURNS_4K (cr=302309 pr=232603 pw=232603 time=84876 us cost=51644 size=770150690 card=7197670) Elapsed times include waiting on following events: Event waited on Times Max. Wait Total Waited ---------------------------------------- Waited ---------- ------------ SQL*Net message to client 71981 0.00 0.15 SQL*Net message from client 71981 0.00 93.19 db file sequential read 2 0.00 0.01 direct path read 1034 0.02 0.19 SQL*Net more data to client 71976 0.00 1.85 rdbms ipc reply 1 0.03 0.03 ******************************************************************************** /* 8.sql */ select * from web_returns_8k call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 2 0.00 0.00 0 0 0 0 Execute 2 0.00 0.01 0 0 0 0 Fetch 71978 24.61 25.71 113157 183974 0 7197670 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 71982 24.61 25.73 113157 183974 0 7197670 Misses in library cache during parse: 0 Parsing user id: 50 Rows Row Source Operation ------- --------------------------------------------------- 7197670 TABLE ACCESS FULL WEB_RETURNS_8K (cr=183974 pr=113157 pw=113157 time=85549 us cost=31263 size=770150690 card=7197670) Elapsed times include waiting on following events: Event waited on Times Max. Wait Total Waited ---------------------------------------- Waited ---------- ------------ SQL*Net message to client 71981 0.00 0.15 SQL*Net message from client 71981 0.00 93.32 db file sequential read 1 0.01 0.01 direct path read 999 0.01 0.17 SQL*Net more data to client 71976 0.00 1.83 rdbms ipc reply 1 0.01 0.01 ******************************************************************************** /* 16.sql */ select * from web_returns_16k call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 1 0.00 0.00 0 0 0 0 Execute 1 0.00 0.00 0 0 0 0 Fetch 71978 24.74 25.59 55822 127217 0 7197670 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 71980 24.74 25.59 55822 127217 0 7197670 Misses in library cache during parse: 0 Optimizer mode: ALL_ROWS Parsing user id: 50 Rows Row Source Operation ------- --------------------------------------------------- 7197670 TABLE ACCESS FULL WEB_RETURNS_16K (cr=127217 pr=55822 pw=55822 time=82996 us cost=21480 size=770150690 card=7197670) Elapsed times include waiting on following events: Event waited on Times Max. Wait Total Waited ---------------------------------------- Waited ---------- ------------ SQL*Net message to client 71980 0.00 0.15 SQL*Net message from client 71980 0.00 93.39 db file sequential read 1 0.00 0.00 direct path read 981 0.01 0.16 SQL*Net more data to client 71976 0.00 1.84 ********************************************************************************
Raw Trace File Metrics
select FILE_ID,TABLESPACE_NAME from dba_data_files where TABLESPACE_NAME like 'TPC%' FILE_ID TABLESPACE_NAME ---------- --------------- 16 TPCDS_8K 17 TPCDS_2K 18 TPCDS_4K 19 TPCDS_16K 2k: WAIT #2: nam='direct path read' ela= 37 file number=17 first dba=33280 block cnt=512 obj#=55839 tim=1212643347820647 4k: WAIT #2: nam='direct path read' ela= 33 file number=18 first dba=16640 block cnt=256 obj#=55840 tim=1212643474070675 8k: WAIT #1: nam='direct path read' ela= 30 file number=16 first dba=8320 block cnt=128 obj#=55838 tim=1212643599631927 16k:WAIT #2: nam='direct path read' ela= 39 file number=19 first dba=55040 block cnt=64 obj#=55841 tim=1212643838893785
The raw trace file shows us that for each block size the reads are optimized to 1MB. For example, with a 2k block, 512 blocks are read at a time. The cnt= is the number of blocks read with a single multi-block read.
|
Block Size
|
MBRC
|
I/O Size
|
|---|---|---|
|
2,048
|
512
|
1MB
|
|
4,096
|
256
|
1MB
|
|
8,192
|
128
|
1MB
|
|
16,384
|
64
|
1MB
|
So What Does This Experiment Demonstrate?
When using the automatic DB_FILE_MULTIBLOCK_READ_COUNT, it actually is not the blocksize that really matters, but the I/O request size. More importantly, the Oracle database can decide the optimal MBRC no matter what the blocksize, demonstrating there is no advantage to a larger (or even smaller) blocksize in this case.
Think of it like this: If I grab $100 from a bucket of coins given these rules:
- with each grab, exactly $1 is retrieved
- the same denomination of coin is always retrieved for a given “run”
- the time to complete the task is only related to the number of grabs, not the number of coins obtained
Regardless of the denomination of the coins grabbed, I need to grab 100 times. I could grab 4 quarters, or 10 dimes or 20 nickels or 100 pennies and each grab “performs” the same.
11gR1, Data Warehousing, Execution Plans, Optimizer, Oracle, Performance, SQL Tuning, Statistics
Trackbacks
Comments
I think an important “miss” in the documentation is that if system statistics are gathered then the value of MBRC from sys.aux_stats$ is used to determine the multiblock read size. Since it’s recommended that those stats are collected I’m surprised that the docs don’t emphasize that more.
Hi Greg,
I think that your test is not appropriate. There are serial direct reads because of 11g version. Size of diect reads is driven by _db_file_direct_io_count parameter (1MB), but NOT by db_file_multiblock_read_count. If you run your tests at 10g database, you will see the “db file scattered read” events for which db_file_multiblock_read_count really works.
Hi
You write “This will result in making all muti-block I/O requests 1 block (db file sequential read),”. I think it should have been “db file scatter read” since this is multi block operation in oracle. The “db file sequential read” is always a singe block read.
Size of direct reads is not driven by _db_file_direct_io_count but db_file_multiblock_read_count (mbrc), just like buffer cache reads.
However mbrc is not necessarily automatically set to a value leading to 1MB ios. It is set to min(_max_io_size , buffers/sessions). This can lead to strange values (meaning not a power of 2) so this is something to monitor. Also it indicates that with a fixed buffer cache size, there is more “chance” to have a smaller value for mbrc with a bigger default block size (as we have less buffers with a bigger block size).
I think Greg’s experiment is not about the type of read (direct/cached) but about the impact of various block sizes on scan performance when using a fixed io size.
Peter-
If you set
db_file_multiblock_read_count=0and hit the bug, what normally would be a multi-block read (FTS/FFS) will be 1 block reads and thus a “db file sequential read”. In this case one will never see a “db file scatter read” because only 1 block is ever read. Give it a try and I think you will see what I mean. Maybe I was unclear by using 1 block and multi-block read in relation to each other.Vyacheslav-
My test is valid as pointed out by Vlad: the test is not about direct vs. cached reads. I specifically mention in my test case that I am forcing 100% physical reads. This is by intent. The point of this experiment is that it is not the block size that matters, rather the I/O size. You can do the experiment in 10g and not use direct reads, and the same observations can be made. Now, since 11g did introduce the serial direct read, the results should not be compared between versions, but the same behavior can still be observed. It just falls under a different event name.
Greg, Vlad,
Thank you for clarification.
Hi Greg,
>> the Oracle database can decide the optimal MBRC no matter what the blocksize, demonstrating there is no advantage to a larger (or even smaller) blocksize in this case.
It would be interesting to observe the impact of the back-end RAID (or ASM stripe size) on this test.
Also, let’s not forget that this only applies to full-scan reads, and MBRC will not help many OLTP systems.
@Donald K. Burleson
>> It would be interesting to observe the impact of the back-end RAID (or ASM stripe size) on this test.
What would be interesting about it?
What metrics would you want to observe?
What would you expect to happen?
I don’t think it is any secret that when reading a significant amount of data (e.g FTS) that larger I/Os are more efficient. MBRC exists for this exact reason. Unfortunately there are many layers that I/Os can be fragmented before the I/O actually gets to the physical disk. Sometimes they are reassembled at the storage level, but not always. This doesn’t present a problem in itself per se, but it does lower the efficiency of each physical disk. Some storage also uses read-ahead/prefetch algorithms to accelerate such scans.
>> let’s not forget that this only applies to full-scan reads
I don’t think anyone is forgetting. I clearly stated this experiment was run with 100% physical I/O and a FTS. I would hope that neither of these would be present in an OLTP database. This is most applicable to a DSS workload (partition scans) where large reads are common.
With respect to OLTP db’s running ERP or MRP type apps, I can say I’ve observed:
They aren’t completely OLTP, there’s some amount of batching.
Some FTS (and Index FFS) happen due to purposeful trade-offs on whether an index is there. Some issues also happen by accident or functional creep.
So, your test is a little bit applicable to OLTP too, and thank you for it, and thanks to Oracle for making the best of odd situations by default (absent bugs, of course).
Greg,
Has any recent experimentation been done with supporting >1MB for MBRC?
Many moons ago, on a now dead & buried Oracle port, we raised the MBRC limit to 4MB and saw real benefit for OLAP/DSS/DW work that hoovered up large chunks of data. Given the advances in the h/w since then, this might be worth revisiting.
@Doug
I am not sure if there has been. I agree that there have been some advances in hardware over the years and it may be of value to revisit it. If I find anything out, I’ll post it here.
Hey, you can all refer to my website http://www.orainsights.com to see some new insights about db_file_multiblock_read_count and 11g
Ido