As a member of the Real-World Performance Group at Oracle I have participated in quite a number of Exadata POCs over the past two years. Often those POCs are constrained in a number of ways (time, schema/app modifications, etc.) because the objective is a proof, not a full-blown migration. As a result there is often significant performance left on the table just waiting to be fully exploited: the kind of performance that really makes a database performance engineer excited, mind-blowing performance. This includes, but is not limited to, data model changes, SQL query modifications and re-engineering batch processes. The reason these types of modifications get me so excited is that design decisions are often influenced by the then-current deployment platform, and with the Exadata-powered Oracle Database Machine those restrictions are frequently lifted. You see, with Exadata the rules change, and so should your design decisions. Sure, you could just pluck-and-plop an existing Oracle data warehouse database onto an Oracle Database Machine and it would likely run much faster than it does on your current system, and you will be wowed, but you may well be shouting four-letter expletives describing how fast it is if you do some re-engineering. This is why I'd like to highlight (my emphasis) this quote from a recent Pythian news update:
Pythian provides LinkShare with consulting and technical expertise for the planning, configuration, deployment, management, administration and ongoing operational support of their migration project. This includes re-engineering the database, adjusting the data model, redefining table structures, creating new indexing schemes and re-writing and tuning SQL queries, among other tasks. The project is scheduled for completion later this year and the results will be unveiled at Oracle OpenWorld in September 2010.
Hats off to both Pythian and LinkShare for realizing that they can capitalize on the opportunity to re-engineer with Exadata and fully exploit the power of the Oracle Database Machine platform. I can’t wait until Oracle OpenWorld to hear just how awesome their performance deltas are. Don’t just shoot for a level 5 performance (porting only) increase with Exadata, do a little re-engineering and turn it all the way up to 11 for that extra push over the cliff, Spinal Tap style!
[back to Introduction]
Getting flat file data into your Oracle data warehouse is likely a daily (or possibly more frequent) task, but it certainly does not have to be a difficult one. Bulk data loading rates are governed by the following operations and hardware resources:
- How fast can the data be read
- How fast can data be written out
- How much CPU power is available
I’m always a bit amazed (and depressed) when I hear people complain that their data loading rates are slow and they proceed to tell me things like:
- The source files reside on a shared NFS filer (or similar) and it has just a single GbE (1 Gigabit Ethernet) network path to the Oracle database host(s).
- The source files reside on this internal disk volume which consists of a two disk mirror (or a volume with very few spindles).
Maybe it’s not entirely obvious so let me spell it out (as I did in this tweet):
One can not load data into a database faster than it can be delivered from the source. Database systems must obey the laws of physics!
Or putting it another way: Don’t fall victim to slow data loading because of a slow performing data source.
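The delivery-side limit is easy to put numbers on. The sketch below is a back-of-the-envelope calculation (the sustained rates are illustrative assumptions; real-world numbers for a GbE link or a small mirror are often lower still):

```python
# How long does it take just to *deliver* 1 TB of flat file data over
# common source paths? Rates below are assumed sustained throughputs,
# not vendor peak numbers.

def delivery_time_minutes(data_gb, rate_mb_per_sec):
    """Minimum minutes to move data_gb gigabytes at a sustained MB/s rate."""
    return (data_gb * 1024) / rate_mb_per_sec / 60

one_tb = 1024  # GB

# Single GbE link: ~125 MB/s theoretical, ~110 MB/s realistic sustained
print(round(delivery_time_minutes(one_tb, 110)))  # ~159 minutes

# Two-disk mirror: reads at roughly one disk's sequential rate, say ~100 MB/s
print(round(delivery_time_minutes(one_tb, 100)))  # ~175 minutes
```

No matter how fast the database can write, those source configurations cap the load of 1 TB at well over two and a half hours.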
Given a system that can provide data at fast enough rates, the loading rate becomes a function of points #2 and #3. The database operations can be simplified to:
- Read lines from flat files
- Process lines into columns, internal data types and optionally compress
- Write rows/columns out to disk
In most cases a reasonably sized system becomes CPU bound, not write bound, on data loads because almost all Oracle data warehouses use compression, which increases CPU consumption but reduces the I/O requirement for the writes. Or putting it another way: bulk loading into a compressed table should be a CPU bound operation, not a disk (write) bound operation.
Data Loading Best Practices (What To Do and Why To Do It)
Oracle offers two methods to load data from flat files: 1) SQL*Loader and 2) External Tables. I would highly recommend that bulk loads (especially PDML loads) be done via External Tables, and that SQL*Loader only be used for non-parallel loads (PARALLEL=false) with small amounts of data (not bulk loads). The high-level reason for this recommendation is that External Tables have nearly all the SQL functionality of a heap table and allow many more optimizations than SQL*Loader does, and there are some undesirable side effects (mostly in the space management layer) from using PARALLEL=true with SQL*Loader.
In order to avoid the reading of the flat files becoming the bottleneck, use a filesystem that is backed by numerous spindles (more than enough to provide the desired loading rate) and consider using compressed files in conjunction with the external table preprocessor. The preprocessor is especially useful if there are proportionally more CPU resources on the database system than network or disk bandwidth, because compressing the source files allows more logical rows to be "delivered" for a given file size. Something else that may not be obvious is to put the flat files on a filesystem that is mounted using directio mount options. This prevents the filesystem cache from being flooded with data that will (likely) never be read again (how many times do you load the same files?). Another option, available starting with Oracle 11.2, is DBFS (Database Filesystem), a fast and scalable solution for staging flat files that is frequently used with Exadata-powered Oracle Database Machines.
In order to achieve the best loading speeds be sure to:
- Use External Tables
- Use a staging filesystem (and network) fast enough to meet your loading speed requirements (and consider directio mount options)
- Use Parallel Execution (parallel CTAS or PDML INSERT)
- Use direct-path loads (nologging CTAS or INSERT /*+ APPEND */)
- Use a large enough initial/next extent size (8MB is usually enough)
If you follow these basic recommendations you should be able to achieve loading speeds that easily meet your requirements (otherwise you likely just need more hardware).
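Put together, a load following these recommendations might look like the sketch below. This is illustrative only; the directory path, file names, column list and DOP are all hypothetical stand-ins:

```sql
-- Hypothetical staging directory and external table over the flat files
create or replace directory stage_dir as '/stage/flat_files';

create table et_sales (
  sale_id   number,
  event_ts  date,
  amount    number
)
organization external (
  type oracle_loader
  default directory stage_dir
  access parameters (
    records delimited by newline
    fields terminated by '|'
  )
  location ('sales_1.dat', 'sales_2.dat', 'sales_3.dat', 'sales_4.dat')
)
reject limit unlimited
parallel;

-- Parallel, direct-path load via nologging CTAS
-- (an INSERT /*+ APPEND */ into an existing table works similarly)
create table sales
  parallel
  nologging
  storage (initial 8m next 8m)
as
select * from et_sales;
```

Supplying multiple LOCATION files (or one large file with granule-friendly access) gives PX something to divide among the PX server processes.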
Loading Data At Ludicrous Speed
I've yet to come across a reasonably designed system that is capable of becoming write bound, as systems simply either 1) do not have enough CPU to do so or 2) are unable to read the source flat files anywhere near fast enough to do so. I have, however, conducted experiments to test write throughput of a Sun Oracle Database Machine (Exadata V2) by using flat files cached completely in the filesystem cache and referencing them numerous times in the External Table DDL. The results should be quite eye opening for many, especially those who think the Oracle database can not load data fast. Loading into an uncompressed table, I was able to load just over 1TB of flat file data (over 7.8 billion rows) in a mere 4.6 minutes (275 seconds). This experiment does not represent typical loading speed rates as it's unlikely the source files are on a filesystem as fast as main memory, but it does demonstrate that if the flat file data could be delivered at such rates, the Oracle software and hardware can easily load it at close to physics speed (the maximum speed the hardware is capable of).
SQL> create table fastload
  2  pctfree 0
  3  parallel
  4  nologging
  5  nocompress
  6  storage(initial 8m next 8m)
  7  tablespace ts_smallfile
  8  as
  9  select * from et_fastload;

Table created.

Elapsed: 00:04:35.49

SQL> select count(*) from fastload;

     COUNT(*)
-------------
7,874,466,950

Elapsed: 00:01:06.54

SQL> select ceil(sum(bytes)/1024/1024) mb from user_segments where segment_name='FASTLOAD';

       MB
---------
1,058,750

SQL> exec dbms_stats.gather_table_stats(user,'FASTLOAD');

PL/SQL procedure successfully completed.

SQL> select num_rows,blocks,avg_row_len from user_tables where table_name='FASTLOAD';

  NUM_ROWS     BLOCKS AVG_ROW_LEN
---------- ---------- -----------
7874466950  135520008         133
Just so you don’t think I’m making this stuff up, check out the SQL Monitor Report for the execution, noting the IO throughput graph from the Metrics tab (10GB/s write throughput isn’t half bad).
So as you can see, flat file data loading has really become more of a data delivery problem rather than a data loading problem. If the Oracle Database, specifically the Exadata powered Oracle Database Machine, can bulk load data from an external table whose files reside in the filesystem cache at a rate of 13TB per hour (give or take), you probably don’t have to worry too much about meeting your data loading rate business requirements (wink).
Note: Loading rates will vary slightly depending on table definition, number of columns, data types, compression type, etc.
[back to Introduction]
Leveraging Oracle's Parallel Execution (PX) in your Oracle data warehouse is probably the most important feature/technology one can use to speed up operations on large data sets. PX is not, however, "go fast" magic pixie dust for any old operation (if that's what you think, you probably don't understand the parallel computing paradigm). With Oracle PX, a large task is broken up into smaller parts, sub-tasks if you will, and each sub-task is then worked on in parallel. The goal of Oracle PX: divide and conquer. This allows a significant amount of hardware resources to be engaged in solving a single problem and is what allows the Oracle database to scale up and out when working with large data sets.
I thought I'd touch on some basics and add my observations, but this is by no means an exhaustive write-up on Oracle's Parallel Execution. There is an entire chapter in the Oracle Database documentation on PX as well as several white papers. I've listed all these in the Resources section at the bottom of this post. Read them, but as always, feel free to post questions/comments here. Discussion adds great value.
A Basic Example of Parallel Execution
Consider a simple one table query like the one below.
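A minimal sketch of such a query, using an illustrative SALES table and an explicit parallel hint (the table name and DOP are stand-ins, not from the original example):

```sql
-- Force a DOP of 8 for this statement only; with the table's PARALLEL
-- attribute set, the hint would be unnecessary.
select /*+ parallel(s, 8) */ count(*)
from sales s;
```

Running this with a DOP of 8 engages eight PX server processes to scan SALES concurrently, with the coordinator aggregating their partial counts.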
You can see that the PX Coordinator (also known as the Query Coordinator or QC) breaks up the "work" into several chunks and those chunks are worked on by the PX server processes. The technical term for the chunk a PX server process works on is a granule. Granules can be either block-based or partition-based.
When To Use Parallel Execution
PX is a key component in data warehousing as that is where large data sets usually exist. The most common operations that use PX are queries (SELECTs) and data loads (INSERTs or CTAS). PX is most commonly controlled by using the PARALLEL attribute on the object, although it can be controlled by hints or even Oracle's Database Resource Manager. If you are not using PX in your Oracle data warehouse then you are probably missing out on a shedload of performance opportunity.
When an object has its PARALLEL attribute set or the PARALLEL hint is used, queries will leverage PX; but to leverage PX for DML operations (INSERT/DELETE/UPDATE), remember to alter your session using the command:
alter session [enable|force] parallel dml;
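For example, a parallel direct-path insert session might look like this sketch (the SALES/ET_SALES table names and DOP are illustrative):

```sql
alter session enable parallel dml;

insert /*+ append parallel(sales, 8) */ into sales
select * from et_sales;

-- Direct-path inserts must be committed before the session
-- can query the loaded table.
commit;
```

Without the ALTER SESSION, the query side of the statement may still run in parallel but the insert itself executes serially, which is an easy way to silently lose most of the benefit.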
Do Not Fear Parallel Execution
Since Oracle's PX is designed to take advantage of multiple CPUs (or CPU cores) at a time, it can leverage significant hardware resources, if available. From my experience talking with Oracle DBAs, the ability for PX to do this scares them. This results in DBAs implementing a relatively small degree of parallelism (DOP) for a system that could possibly support a much higher level (based on #CPUs). Oftentimes, though, the system PX is being run on is not a balanced system and frequently has much more CPU power than disk and channel bandwidth, so data movement from disk becomes the bottleneck well before the CPUs are busy. This results in many statements like "Parallel Execution doesn't work" or similar because the user/DBA isn't observing a decrease in execution time with more parallelism. Bottom line: if the hardware resources are not available, the software certainly cannot scale.
Just for giggles (and education), here is a snippet from top(1) from a node from an Oracle Database Machine running a single query (across all 8 database nodes) at DOP 256.
top - 20:46:44 up 5 days, 3:48, 1 user, load average: 36.27, 37.41, 35.75
Tasks: 417 total, 43 running, 373 sleeping, 0 stopped, 1 zombie
Cpu(s): 95.6%us, 1.6%sy, 0.0%ni, 2.2%id, 0.0%wa, 0.2%hi, 0.4%si, 0.0%st
Mem: 74027752k total, 21876824k used, 52150928k free, 440692k buffers
Swap: 16771852k total, 0k used, 16771852k free, 13770844k cached

USER       PID  PR  NI  VIRT  SHR  RES S %CPU %MEM    TIME+  COMMAND
oracle   16132  16   0 16.4g 5.2g 5.4g R 63.8  7.6 709:33.02 ora_p011_orcl
oracle   16116  16   0 16.4g 4.9g 5.1g R 60.9  7.2 698:35.63 ora_p003_orcl
oracle   16226  15   0 16.4g 4.9g 5.1g R 59.9  7.2 702:01.01 ora_p028_orcl
oracle   16110  16   0 16.4g 4.9g 5.1g R 58.9  7.2 697:20.51 ora_p000_orcl
oracle   16122  15   0 16.3g 4.9g 5.0g R 56.9  7.0 694:54.61 ora_p006_orcl
(Quite the TIME+ column there, huh!)
In this post I’ve been a bit light on the technicals of PX, but that is mostly because 1) this is a fundamentals post and 2) there is a ton of more detail in the referenced documentation and I really don’t feel like republishing what already exists. Bottom line, Oracle Parallel Execution is a must for scaling performance in your Oracle data warehouse. Take the time to understand how to leverage it to maximize performance in your environment and feel free to start a discussion here if you have questions.
[back to Introduction]
Partitioning is an essential performance feature for an Oracle data warehouse because partition elimination (or partition pruning) generally eliminates a significant amount of table data from being scanned. This results in a need for fewer system resources and improved query performance. Someone once told me "the fastest I/O is the one that never happens." This is precisely the reason that partitioning is a must for Oracle data warehouses – it's a huge I/O eliminator. I frequently refer to partition elimination as the anti-index. An index is used to find the small amount of data that is required; partitioning is used to eliminate the vast amounts of data that are not required.
Main Uses For Partitioning
I would classify the main reasons to use partitioning in your Oracle data warehouse into these four areas:
- Data Elimination
- Partition-Wise Joins
- Manageability (Partition Exchange Load, Local Indexes, etc.)
- Information Lifecycle Management (ILM)
The most common partitioning design pattern found in Oracle data warehouses is to partition the fact tables by range (or interval) on the event date/time column. This allows for partition elimination of all the data not in the desired time window in queries. For example: If I have a fact table that contains point of sale (POS) data, each line item for a given transaction has a time stamp of when the item was scanned. Let’s say this value is stored in column EVENT_TS which is a DATE or TIMESTAMP data type. In most cases it would make sense to partition by range on EVENT_TS using one day partitions. This means every query that uses a predicate filter on EVENT_TS (which should be nearly every one) can eliminate significant amounts of data that is not required to satisfy the query predicate. If you want to look at yesterday’s sales numbers, there is no need to bring back rows from last week or last month!
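A sketch of that design pattern, using 11g interval partitioning so new daily partitions are created automatically (the POS_FACT table, columns and dates here are illustrative, not from a real schema):

```sql
-- One partition per day on the event timestamp
create table pos_fact (
  txn_id    number,
  event_ts  date,
  item_id   number,
  amount    number
)
partition by range (event_ts)
interval (numtodsinterval(1, 'DAY'))
(
  partition p_initial values less than (to_date('2010-01-01', 'YYYY-MM-DD'))
);

-- A predicate on EVENT_TS lets the optimizer prune to a single partition:
select sum(amount)
from pos_fact
where event_ts >= to_date('2010-06-01', 'YYYY-MM-DD')
  and event_ts <  to_date('2010-06-02', 'YYYY-MM-DD');
```

The query above touches one day's segment regardless of how many years of history the table holds.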
Depending on the schema design of your data warehouse you may also choose to subpartition a table. This allows one to further segment a table to allow for even more data elimination, or it can allow for partition-wise joins, which reduce CPU and memory usage by minimizing the amount of data exchanged between parallel execution server processes. In third normal form (3NF) schemas it is very beneficial to use hash partitioning or subpartitioning to allow for partition-wise joins (see Oracle Parallel Execution: Interconnect Myths And Misunderstandings) for this exact reason. Dimensional models (star schemas) may also benefit from hash subpartitioning and partition-wise joins. Generally it is best to hash subpartition on a join key column to a very large dimension, like CUSTOMER, so that a partition-wise join will be used between the fact table and the large dimension table.
Managing large objects can be challenging for a variety of reasons which is why Oracle Partitioning allows for many operations to be done at a global or partition (or subpartition) level. This makes it much easier to deal with tables or indexes of large sizes. It also is transparent to applications so the SQL that runs against a non-partitioned object will run as-is against a partitioned object. Some of the key features include:
- Partition Exchange Load – Data can be loaded “out of line” and exchanged into a partitioned table.
- Local Indexes – It takes much less time to build local indexes than global indexes.
- Compression – Can be applied at the segment level so it’s possible to have a mix of compressed and non-compressed partitions.
- Segment Moves/Rebuilds/Truncates/Drops – Each partition (or subpartition) is a segment and can be operated on individually and independently of the other partitions in the table.
- Information Lifecycle Management (ILM) – Partitioning allows implementation of an ILM strategy.
I'd classify partitioning as a "must have" for Oracle data warehouses for both the performance and manageability reasons described above. Partitioning should lower query response time as well as resource utilization due to "smart" data access (only go after the data the query needs). There are additional partitioning design patterns as well, and the Oracle documentation contains descriptions of them along with examples.
Oracle Documentation References:
The first thing that comes to most people's mind when database table compression is mentioned is the savings it yields in terms of disk space. While reducing the footprint of data on disk is relevant, I would argue it is the lesser of the benefits for data warehouses. Disk capacity is very cheap and generally plentiful; however, disk bandwidth (scan speed) is proportional to the number of spindles, no matter what the disk capacity, and thus is more expensive. Table compression reduces the footprint on the disk drives that a given data set occupies, so the amount of physical data that must be read off the disk platters is reduced compared to the uncompressed version. For example, if 4000 GB of raw data can compress to 1000 GB, it can be read off the same disk drives 4X as fast because only 1/4 of the data is being read and transferred off the spindles (relative to the uncompressed size). Likewise, table compression allows the database buffer cache to contain more data without having to increase the memory allocation, because more rows can be stored in a compressed block/page compared to an uncompressed block/page.
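The arithmetic behind that 4X claim is simple enough to sketch (the bandwidth and ratio numbers below are purely illustrative):

```python
def scan_seconds(data_gb, raw_rate_mb_s, compression_ratio=1):
    """Seconds to scan a logical data set, given physical disk bandwidth
    (MB/s) and a compression ratio (logical size / on-disk size)."""
    physical_mb = data_gb * 1024 / compression_ratio
    return physical_mb / raw_rate_mb_s

# 4000 GB of raw data over 1000 MB/s of physical disk bandwidth:
print(scan_seconds(4000, 1000))      # uncompressed: 4096.0 seconds
print(scan_seconds(4000, 1000, 4))   # 4:1 compressed: 1024.0 seconds
```

Same spindles, same physical scan rate, one quarter of the wall-clock time for the same logical data.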
Row major table compression comes in two flavors with the Oracle database: BASIC and OLTP. In 11.1 these were also known by the key phrases COMPRESS or COMPRESS FOR DIRECT_LOAD OPERATIONS and COMPRESS FOR ALL OPERATIONS. The BASIC/DIRECT_LOAD compression has been part of the Oracle database since version 9 and ALL OPERATIONS/OLTP compression was introduced in 11.1 with the Advanced Compression option.
Oracle row major table compression works by storing the column values for a given block in a symbol table at the beginning of the block. The more repeated values per block, even across columns, the better the compression ratio. Sorting data can increase the compression ratio as ordering the data will generally allow more repeat values per block. Specific compression ratios and gains from sorting data are very data dependent but compression ratios are generally between 2x and 4x.
Compression does add some CPU overhead when direct path loading data, but there is no measurable performance overhead when reading data as the Oracle database can operate on compressed blocks directly without having to first uncompress the block. The additional CPU required when bulk loading data is generally well worth the down wind gains for data warehouses. This is because most data in a well designed data warehouse is write once, read many times. Insert only and infrequently modified tables are ideal candidates for BASIC compression. If the tables have significant DML performed against them, then OLTP compression would be advised (or no compression).
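In DDL terms, the choice between the two flavors might be sketched like this (table names and columns are illustrative; COMPRESS BASIC is the 11.2 spelling, while bare COMPRESS or COMPRESS FOR DIRECT_LOAD OPERATIONS is equivalent in earlier releases):

```sql
-- BASIC compression: ideal for insert-only / bulk-loaded warehouse tables,
-- applied during direct-path operations such as this CTAS
create table sales_hist
  compress basic
  parallel
  nologging
as
select * from sales;

-- OLTP compression (requires the Advanced Compression option):
-- maintains compression under conventional DML
create table sales_current (
  sale_id  number,
  amount   number
) compress for oltp;
```

Remember that BASIC compression only compresses rows written via direct-path operations; conventional inserts into a BASIC-compressed table land uncompressed.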
Given that most Oracle data warehouses that I have seen are constrained by I/O bandwidth (see Balanced Hardware Configuration) it is highly recommended to leverage compression so the logical table scan rate can increase proportionally to the compression ratio. This will result in faster table and partition scans on the same hardware.
Oracle Documentation References:
- Performance Tuning Guide: Table Compression
- VLDB and Partitioning Guide: Partitioning and Table Compression
- Guidelines for Managing Tables: Consider Using Table Compression
- Table Compression in Oracle9i Release 2: A Performance Analysis
- Table Compression in Oracle Database 10g Release 2
- Advanced Compression with Oracle Database 11g Release 2
I tend to keep the content of this blog fairly technical and engineering focused, but every now and then I have to venture off and do an editorial post. Recently some of the ParAccel management decided to fire up the FUD machine on the ParAccel blog and take aim at Oracle’s Exadata making the following claims:
“There are 12 SAS disks in the storage server with a speed of about 75 MB/s [The SUN Oracle Exadata Storage Server datasheet claims 125 MB/s but we think that is far-fetched.]” -Rick Glick, Vice President of Technology and Architecture (link)
“We stand by the 75MB/sec as a conservative, reliable number. We see higher numbers in disk tests, but never anywhere near 125MB/sec.” -Barry Zane, Chief Technology Officer (link)
Far Fetched Or Fact?
As a database performance engineer, I strive to be extremely detailed and well researched with my work. Clearly, these comments from Rick and Barry were not well researched as is evident from information publicly available on the Internet.
The first bit of documentation I would research before making such comments would be the hard disk drive specification sheet. The 12 drives in the Exadata Storage Server, a Sun Fire X4275, are 3.5-inch 15K RPM SAS 2.0 6Gb/sec 600GB drives. Looking at the drive spec sheet, it clearly states that the sustained sequential read is 122 MB/sec (at ID) to 204 MB/sec (at OD) [that's Inner Diameter & Outer Diameter]. Seems to me that Oracle’s claim of 1500MB/s per Exadata Storage Server (125MB/s for each of the 12 SAS drives) is certainly between 122MB/s and 204MB/s.
Now granted, one might think that vendors overstate their performance claims, so it may be resourceful to search the Internet for some third party evaluation of this hard disk. I went to a fairly well known Internet search engine to try to find more information, using a highly sophisticated and complex set of search keywords. To my astonishment, there at the top of the search results page was a write up by a third party. I would encourage reading the entire article, but if you want to just skip to page 5 [Benchmarks - HD Tune Pro] you will be presented with data that shows the minimum (120MB/s), average (167MB/s) and maximum (200MB/s) read throughput for sequential read tests performed by the author for the hard disk drive in dispute. Those numbers are completely in line with the Sun spec sheet – no exaggeration going on here. At this point there should be exactly zero doubt that the drives themselves, with the proper SAS controller, are easily physically capable of 125MB/s read rates and more.
Stand By Or Sit Down?
As I see it, there are three possibilities:
- Disk vendors are overly optimistic in their continuous sequential read rates.
- The newer class of SAS2 compatible 15Krpm drives and controllers are faster than the older generation we’ve measured.
- Our disk access patterns are not getting all the available performance.
Let’s drill into each of these possibilities:
- Perhaps vendors are overly optimistic, but how overly optimistic could they possibly be? I mean, really, 125MB/s is easily between the spec sheet rates of 122MB/s and 204MB/s. Truly 75MB/s is a low ball number for these drives. Even the Exadata V1 SAS drives deliver more than 75MB/s per drive, and the HDD is not the limiting factor in the scan throughput (a good understanding of the hardware components should lead you to what is). Even the Western Digital 300GB 10K RPM VelociRaptor disk drive has benchmarks that show a maximum sequential data transfer rate of more than 120 MB/s and a sustained minimum of 75MB/s even on the innermost part of the platter, and that is a SATA drive commonly used in PCs!
- Barry states that ParAccel has no experience nor metrics (measurements) with these drives or seemingly any drives like them, but yet Barry calls “75MB/sec as a conservative, reliable number”. Just how reliable of a number can it possibly be when you have exactly zero data points and zero experience with the HDDs in dispute? Is this a debate that can be won by strength of personality or does it actually require data, numbers and facts?
- Perhaps the ParAccel database has disk access patterns that can not drive the scan rates that Exadata can, but should one assert that because ParAccel database may not drive that IO rate, Exadata can’t, even when said rate is within the realm of physical capability? I certainly would think not. Not unless the intention is simply to promote FUD.
So, as I see it, there are exactly two possibilities: either one has technical knowledge of what one is talking about (and actual data/facts to support it), or one does not and is just making things up. At this point I think the answer is quite clear in this situation: Rick and Barry had no data to support their (incorrect) assertions.
And The Truth Shall Be Revealed
Three weeks after Barry’s “three possibilities” comment, Barry reveals the real truth:
…we [ParAccel] have gotten a number of newer servers with SAS2 drives…[and] the newer generation of disk drives are faster than my experience…Exadata’s claim of 1500MB/sec per server seems completely reasonable…My apologies for any confusion created.
As it has come to pass, my assertion that ParAccel had absolutely no experience and thus no data to support their claims is validated (not that I really had any doubts). Spreading FUD generally does cause unnecessary confusion, but then again, that is usually the intention. I would expect such nonsense from folks with marketing in their title, but I hold a higher bar for people with technology in their titles. This was a simple debate about physical disk drive characteristics (and not software) and that is something anyone could get concrete factual data on (assuming they actually take the time and effort).
And Isn’t It Ironic… Don’t You Think?
The same day I read Barry’s “truth comment” I also read Jerome Pineau’s blog post on social media marketing. I could not help but recognize (and laugh about) the irony of the situation. Jerome lists several tips on being successful in SMM and the first two really stood out to me:
- Do not profess expertise on topics you know little about. Eventually, it will show.
- Always remain honest. Never lie. Your most important asset is credibility. You can fix almost any mistake except credibility damage.
Truly, truly ironic…
Another year in the books and another year on the Structured Data blog. Hopefully 2009 treated you well and 2010 will bring good things in addition. I thought I’d throw a few Top 5 lists together to reminisce about 2009. Enjoy!
Top 5 Most Visited Blog Posts of 2009
- DBMS_STATS, METHOD_OPT and FOR ALL INDEXED COLUMNS
- Choosing An Optimal Stats Gathering Strategy
- Top 10 Oracle 11gR2 New Features
- Troubleshooting Bad Execution Plans
- Oracle 11g: Real-Time SQL Monitoring Using DBMS_SQLTUNE.REPORT_SQL_MONITOR
Top 5 Most Popular Search Queries of 2009
- structured data
- oracle 11gr2 new features
- oracle analytic functions
- dbms_stats method_opt
[back to Introduction]
If you want to build a house that will stand the test of time, you need to build on a solid foundation. The same goes for architecting computer systems that run databases. If the underlying hardware is not sized appropriately it will likely lead to people blaming software. All too often I see data warehouse systems that are poorly architected for the given workload requirements. I frequently tell people, “you can’t squeeze blood from a turnip“, meaning if the hardware resources are not there for the software to use, how can you expect the software to scale?
Undersizing data warehouse systems has become an epidemic with open platforms – platforms that let you run on any brand and configuration of hardware. This problem has been magnified over time as the size of databases has grown significantly, generally outpacing the experience of those managing them. This has caused the "big three" database vendors to come up with suggested or recommended hardware configurations for their database platforms:
- Oracle: Optimized Warehouse Initiative
- Microsoft: SQL Server Fast Track Data Warehouse
- IBM: Balanced Configuration Unit (BCU)
Simply put, the reasoning behind those initiatives was to help customers architect systems that are well balanced and sized appropriately for the size of their data warehouse.
Balanced Hardware Configurations
The foundation for a well performing data warehouse (or any system for that matter) is the hardware that it runs on. There are three main hardware resources to consider when sizing your data warehouse hardware. Those are:
- Number of CPUs
- Number of storage devices (HDDs or SSDs)
- I/O bandwidth between CPUs and storage devices
NB: I’ve purposely left off memory (RAM) as most systems are pretty well sized at 2GB or 4GB per CPU core these days.
A balanced system has the following characteristics:
As you can see, each of the three components is sized proportionally to the others. This allows for maximum system throughput capacity, as no single resource will become the bottleneck before any other. This was one of the critical design decisions that went into the Oracle Database Machine.
Most DBAs and System Admins know what the disk capacity numbers are for their systems, but when it comes to I/O bandwidth or scan rates, most are unaware of what the system is capable of in theory, let alone in practice. Perhaps I/O bandwidth utilization should be included in the system metrics that are collected for your databases. You do collect system metrics, right?
There are several “exchanges” that data must flow through from storage devices to host CPUs, many of which could become bottlenecks. Those include:
- Back-end Fibre Channel loops (the fibre between the drive shelves and the storage array server processor)
- Front-end Fibre Channel ports
- Storage array server processors (SP)
- Host HBAs
One should understand the throughput capacity of each of these components to ensure that one (or more) of them does not restrict the flow of data to the CPUs prematurely.
Unbalanced Hardware Configurations
All too frequently systems are not architected as balanced systems and the system ends up being constrained in one of the following three scenarios:
From the production systems that I have seen, the main deficiency is in I/O bandwidth (both I/O Channel and HDD). I believe there are several reasons for this. First, too many companies capacity plan for their data warehouse based on the size the data occupies on disk alone. That is, they purchase the number of HDDs for the system based on the drive capacity, not on the I/O bandwidth requirement. Think of it like this: If you were to purchase 2 TB of mirrored disk capacity (4 TB total) would you rather purchase 28 x 146 GB drives or 14 x 300 GB drives (or even 4 x 1 TB drives)? You may ask: Well, what is the difference (other than price); in each case you have the same net capacity, correct? Indeed, both configurations do have the same capacity, but I/O bandwidth (how fast you can read data off the HDDs) is proportional to the number of HDDs, not the capacity. Thus it should be slightly obvious then that 28 HDDs can deliver 2X the disk I/O bandwidth that 14 HDDs can. This means that it will take 2X as long to read the same amount of data off of 14 HDDs as 28 HDDs.
Unfortunately what tends to happen is that the bean counter types will see only two things:
- The disk capacity (space) you want to purchase (or the capacity that is required)
- The price per MB/GB/TB
This is where someone worthy of the title systems architect needs to stand up and explain the concept of I/O bandwidth and the impact it has on data warehouse performance (your systems architect does know this, correct?). This is generally a difficult discussion because I/O bandwidth is not a line item on a purchase order; it is a derived metric that requires both thought and engineering (which means someone had to do some thinking about the requirements for this system!).
When sizing the hardware for your data warehouse consider your workload and understand following (and calculate numbers for them!):
- What rate (in MB/GB per second) can the CPUs consume data?
- What rate can storage devices produce data (scan rate)?
- What rate can the data be delivered from the storage array(s) to the host HBAs?
If you are unable to answer these questions in theory then you need to sit down and do some calculations. Then you need to use some micro benchmarks (like Oracle ORION) and prove out those calculations. This will give you the "speed limit" and a metric against which you can measure your database workload. All computer systems must obey the laws of physics! There is no way around that.
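As a sketch of that exercise, the calculation reduces to "the slowest component wins." All the per-component rates below are illustrative assumptions; substitute your own measured numbers (e.g. from ORION runs):

```python
# Back-of-the-envelope balanced-system check. The system-wide scan rate
# is bounded by whichever resource saturates first.

def system_scan_limit(cpu_consume_mb_s, hdd_count, hdd_scan_mb_s,
                      channel_mb_s):
    """Return (limiting rate in MB/s, name of the bottleneck component)."""
    limits = {
        'cpu':     cpu_consume_mb_s,        # rate CPUs can consume data
        'disk':    hdd_count * hdd_scan_mb_s,  # aggregate spindle scan rate
        'channel': channel_mb_s,            # storage-to-host path bandwidth
    }
    bottleneck = min(limits, key=limits.get)
    return limits[bottleneck], bottleneck

# Assumed: 16 cores at ~200 MB/s each, 14 HDDs at ~100 MB/s,
# 2 x 4Gb FC HBAs (~800 MB/s combined)
rate, component = system_scan_limit(16 * 200, 14, 100, 800)
print(rate, component)  # 800 channel
```

In this hypothetical configuration the CPUs could consume 3200 MB/s and the spindles could produce 1400 MB/s, but the FC channel caps the whole system at 800 MB/s; adding drives or cores would buy nothing until the channel is widened.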
Kevin Closson has several good blog posts on related topics including:
- SAN Admins: Please Give Me As Much Capacity From As Few Spindles As Possible!
- Hard Drives Are Arcane Technology. So Why Can’t I Realize Their Full Bandwidth Potential?
as well as numerous others.
Oracle Documentation References:
The HP-UX Itanium and AIX (PPC64) ports of Oracle Database 11g Release 2 can now be downloaded from OTN. Happy Holidays!!!