If you want to build a house that will stand the test of time, you need to build it on a solid foundation. The same goes for architecting computer systems that run databases: if the underlying hardware is not sized appropriately, people will likely end up blaming the software. All too often I see data warehouse systems that are poorly architected for the given workload requirements. I frequently tell people, “you can’t squeeze blood from a turnip”, meaning that if the hardware resources are not there for the software to use, how can you expect the software to scale?
Undersizing data warehouse systems has become an epidemic on open platforms, that is, platforms that let you run on any brand and configuration of hardware. The problem has been magnified over time as databases have grown significantly in size, generally outpacing the experience of those managing them. This has led the “big three” database vendors to publish suggested or recommended hardware configurations for their database platforms:
- Oracle: Optimized Warehouse Initiative
- Microsoft: SQL Server Fast Track Data Warehouse
- IBM: Balanced Configuration Unit (BCU)
Simply put, the reasoning behind these initiatives is to help customers architect systems that are well balanced and sized appropriately for their data warehouse.
Balanced Hardware Configurations
The foundation of a well performing data warehouse (or any system, for that matter) is the hardware it runs on. There are three main hardware resources to consider when sizing data warehouse hardware:
- Number of CPUs
- Number of storage devices (HDDs or SSDs)
- I/O bandwidth between CPUs and storage devices
NB: I’ve purposely left off memory (RAM) as most systems are pretty well sized at 2GB or 4GB per CPU core these days.
In a balanced system, each of these three components is sized proportionally to the others. This allows for maximum system throughput, since no single resource becomes the bottleneck before any other. This was one of the critical design decisions that went into the Oracle Database Machine.
Most DBAs and System Admins know what the disk capacity numbers are for their systems, but when it comes to I/O bandwidth or scan rates, most are unaware of what the system is capable of in theory, let alone in practice. Perhaps I/O bandwidth utilization should be included in the system metrics that are collected for your databases. You do collect system metrics, right?
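If you are not collecting scan-rate numbers today, they are easy to sample. Here is a minimal sketch that derives a device's read bandwidth from the Linux kernel's per-device counters; the device name sdb and the 5-second interval are placeholder assumptions, and /proc/diskstats reports sectors in 512-byte units:

```python
import time

def sectors_read(device):
    """Return the cumulative sectors-read counter for a block device."""
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == device:
                return int(fields[5])  # 6th field: sectors read
    raise ValueError(f"device {device!r} not found")

def scan_rate_mb_s(device, interval=5):
    """Sample the counter twice and derive read bandwidth in MB/s."""
    before = sectors_read(device)
    time.sleep(interval)
    after = sectors_read(device)
    return (after - before) * 512 / interval / 2**20

print(f"sdb read rate: {scan_rate_mb_s('sdb'):.1f} MB/s")
```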
There are several “exchanges” that data must flow through from storage devices to host CPUs, many of which could become bottlenecks. Those include:
- Back-end Fibre Channel loops (the fibre between the drive shelves and the storage array server processor)
- Front-end Fibre Channel ports
- Storage array server processors (SP)
- Host HBAs
One should understand the throughput capacity of each of these components to ensure that none of them restricts the flow of data to the CPUs prematurely.
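As a back-of-the-envelope illustration, the effective bandwidth of the whole path is simply the minimum across its components. All of the figures below are assumptions for a hypothetical 4Gb Fibre Channel configuration (a 4Gb FC link moves roughly 400 MB/s of data), not measurements:

```python
# Component throughputs in MB/s; every value is an illustrative assumption.
path_mb_s = {
    "back-end FC loops (4 x 4Gb)":  4 * 400,
    "front-end FC ports (4 x 4Gb)": 4 * 400,
    "storage processors (2)":       2 * 1200,  # vendor-quoted SP limit
    "host HBAs (2 x 4Gb)":          2 * 400,
}

bottleneck = min(path_mb_s, key=path_mb_s.get)
print(f"effective I/O bandwidth: {path_mb_s[bottleneck]} MB/s "
      f"(limited by {bottleneck})")
```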
Unbalanced Hardware Configurations
All too frequently, systems are not architected as balanced systems and end up constrained by one of the three resources: CPU, storage devices, or the I/O channel between them.
From the production systems I have seen, the main deficiency is I/O bandwidth (both the I/O channel and the HDDs). I believe there are several reasons for this. First, too many companies capacity plan for their data warehouse based solely on the size the data occupies on disk. That is, they purchase the number of HDDs based on drive capacity, not on the I/O bandwidth requirement. Think of it like this: if you were to purchase 2 TB of mirrored disk capacity (4 TB total), would you rather purchase 28 x 146 GB drives or 14 x 300 GB drives (or even 4 x 1 TB drives)? You may ask: what is the difference (other than price)? In each case you have the same net capacity, correct? Indeed, both configurations have the same capacity, but I/O bandwidth (how fast you can read data off the HDDs) is proportional to the number of HDDs, not their capacity. It should be obvious, then, that 28 HDDs can deliver 2X the disk I/O bandwidth of 14 HDDs, which means it will take 2X as long to read the same amount of data off 14 HDDs as off 28.
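To put rough numbers on that comparison, here is a small sketch; the 70 MB/s sustained scan rate per drive is an assumption (check your drives' spec sheets for real figures):

```python
PER_DRIVE_MB_S = 70  # assumed sustained scan rate per HDD

def scan_seconds(data_gb, num_drives, per_drive=PER_DRIVE_MB_S):
    """Time to scan data_gb of data striped evenly across num_drives."""
    aggregate_mb_s = num_drives * per_drive
    return data_gb * 1024 / aggregate_mb_s

for drives in (28, 14, 4):
    print(f"{drives:2d} drives -> {drives * PER_DRIVE_MB_S:4d} MB/s, "
          f"1 TB scan in {scan_seconds(1024, drives):5.0f} s")
```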
Unfortunately what tends to happen is that the bean counter types will see only two things:
- The disk capacity (space) you want to purchase (or the capacity that is required)
- The price per MB/GB/TB
This is where someone worthy of the title systems architect needs to stand up and explain the concept of I/O bandwidth and the impact it has on data warehouse performance (your systems architect does know this, correct?). This is generally a difficult discussion because I/O bandwidth is not a line item on a purchase order; it is a derived metric that requires both thought and engineering (which means someone has to do some thinking about the requirements for the system!).
When sizing the hardware for your data warehouse, consider your workload and understand the following (and calculate numbers for them!):
- What rate (in MB/GB per second) can the CPUs consume data?
- What rate can storage devices produce data (scan rate)?
- What rate can the data be delivered from the storage array(s) to the host HBAs?
If you are unable to answer these questions in theory, then you need to sit down and do some calculations. Then you need to use some micro-benchmarks (like Oracle ORION) to prove out those calculations. This will give you the “speed limit” and a metric against which you can measure your database workload. All computer systems must obey the laws of physics! There is no way around that.
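As a sketch of that exercise, the calculation below compares the three rates and reports the smallest as the system's speed limit. Every figure here is an assumed placeholder; substitute your own vendor-quoted or ORION-measured numbers:

```python
# All rates in MB/s; every value below is an assumption, not a measurement.
cpu_cores     = 16
mb_s_per_core = 200    # assumed per-core data consumption rate
hdd_count     = 28
mb_s_per_hdd  = 70     # assumed per-drive scan rate
delivery_mb_s = 1600   # assumed array-to-HBA path bandwidth

rates = {
    "CPU consumption": cpu_cores * mb_s_per_core,
    "storage scan":    hdd_count * mb_s_per_hdd,
    "delivery path":   delivery_mb_s,
}
limit = min(rates, key=rates.get)
print(f"system speed limit: {rates[limit]} MB/s ({limit})")
```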
Kevin Closson has several good blog posts on related topics including:
- SAN Admins: Please Give Me As Much Capacity From As Few Spindles As Possible!
- Hard Drives Are Arcane Technology. So Why Can’t I Realize Their Full Bandwidth Potential?
as well as numerous others.
At the 2009 Oracle OpenWorld Unconference back in October, I led a chalk and talk session entitled The Core Performance Fundamentals Of Oracle Data Warehousing. Since this was a chalk and talk, I spared the audience any PowerPoint slides, but several people requested that I turn it into a presentation so they could share it with others. After some thought, I decided that a series of blog posts would probably be a better way to share this information, especially since I tend to use slides as a speaking outline, not as a condensed version of a white paper. This is the first of a series of posts discussing what I consider to be the key features and technologies behind well performing Oracle data warehouses.
As an Oracle database performance engineer who has done numerous customer data warehouse benchmarks and POCs over the past 5+ years, I've seen data warehouse systems plagued with problems on nearly every DBMS commonly used in data warehousing. Interestingly enough, many of these systems were facing the same problems. I've compiled a list of topics that I consider to be key features and/or technologies for Oracle data warehouses:
Core Performance Fundamental Topics
- Balanced Hardware Configuration
- Table Compression
- Parallel Execution
- Data Loading
- Row vs. Set Processing
- Indexing and Materialized Views
In the upcoming posts, I’ll deep dive into each one of these topics discussing why these areas are key for a well performing Oracle data warehouse. Stay tuned…
The Solaris x86-64 port of Oracle Database 11g Release 2 can now be downloaded from OTN. Get it while it’s hot!
The internet buzz seems to be that Larry Ellison, CEO, Oracle Corporation and John Fowler, EVP, Sun Microsystems, Inc. will be announcing a new product, the world’s first OLTP database machine with Sun’s brand new FlashFire technology on Tuesday, September 15, 2009, 1 p.m. PT.
Both Sun and Oracle have Webcast invitations on their websites.
I plan on being at the Oracle Conference Center for the launch and will try and Tweet the highlights. First Oracle Database 11g Release 2, now an OLTP database machine. Are there more innovations up Oracle’s sleeve? I guess we’ll have to wait and see.
In catching up on blog posts, I see that Jonathan Lewis, Christian Antognini and Nuno Souto picked up on the new deferred segment creation feature in Oracle 11gR2. In keeping with the theme, I thought I'd put together the top 10 new features in Oracle Database 11g Release 2 (11.2) that I consider significant.
- Analytic Functions 2.0
- Recursive WITH Clause
- Preprocessing Data for ORACLE_LOADER Access Driver in External Tables
- In-Memory Parallel Execution
- Auto Degree of Parallelism (Auto DOP) and Parallel Statement Queuing
- Significant Performance Improvement of MV On-Commit Fast Refresh
- Database Resource Manager Instance Caging
- ASM Intelligent Data Placement
- Database File System (DBFS)
- Hybrid Columnar Compression
In future posts I’ll dive into some of these in more technical detail but for now I thought I’d throw my list out there to raise awareness of the things I am looking at as a database performance engineer.
Just a quick post to spread the news that Oracle Database 11g Release 2 is now generally available. Download it for Linux today. See the press release for the usual details.
Here is an interesting tidbit of info from a PC World article:
The new release is the product of some 1,500 developers and 15 million hours of testing, according to Mark Townsend, vice president of database product management.
Update: In reading Lowering your IT Costs with Oracle Database 11g Release 2 I noticed this interesting tidbit:
With Oracle Database 11g Release 2, the Exadata Storage servers also enable new hybrid columnar compression technology that provides up to a 10 times compression ratio, without any loss of query performance. And, for pure historical data, a new archival level of hybrid columnar compression can be used that provides up to 40 times compression ratios.
Hybrid columnar compression is a new method for organizing how data is stored. Instead of storing the data in traditional rows, the data is grouped, ordered and stored one column at a time. This type of format offers a higher degree of compression for data that is loaded into a table for data warehousing queries. Different compression techniques can also be used on different partitions within a table.
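As a toy illustration of why column organization compresses better (this is generic column-major compression with zlib, not Oracle's actual hybrid columnar implementation), compare the same rows compressed in row-major versus column-major order:

```python
import random
import zlib

random.seed(42)
rows = [
    (random.randrange(10**6),                       # high-cardinality key
     random.choice(["OPEN", "SHIPPED", "CLOSED"]),  # low-cardinality status
     random.choice(["CA", "NY", "TX", "WA"]))       # low-cardinality state
    for _ in range(20000)
]

# Row-major: values interleaved record by record.
row_major = "\n".join("|".join(map(str, r)) for r in rows).encode()
# Column-major: all values of each column stored contiguously.
col_major = "\n".join(
    "|".join(str(r[c]) for r in rows) for c in range(3)
).encode()

print("row-major compressed:   ", len(zlib.compress(row_major)))
print("column-major compressed:", len(zlib.compress(col_major)))
```

Grouping similar values together gives the compressor longer runs of redundancy to exploit, which is the same intuition behind the feature described in the quote above.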
Cool, Exadata now has column organized storage. Things are certainly getting very interesting in Exadata land.
Be sure to check out the Oracle Database 11g Release 2 New Features Guide for more goodies.
Update 2: For those attending OOW 2009, there will be a session on the new technology – Oracle’s Hybrid Columnar Compression: The Next-Generation Compression Technology (S311358) on Tuesday 10/13/2009 13:00 – 14:00.
Oracle and HP have taken back the #1 spot by setting a new performance record in the 1TB TPC-H benchmark. The HP/Oracle result puts the Oracle database ahead of both the Exasol (currently #2 and #3) and ParAccel (currently #4) results in the race for performance at the 1TB scale factor, and places Oracle in the >1 million queries per hour (QphH) club, which is no small achievement. Compared to the next best result from HP/Oracle (currently #5), this result has over 9X the query throughput (1,166,976 QphH vs. 123,323 QphH) at around 1/4 the price/performance (5.42 USD/QphH vs. 20.54 USD/QphH), demonstrating significantly more performance for the money.
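Purely as a sanity check on those figures (assuming the prices quoted are the TPC-H price/performance metric in USD per QphH):

```python
# Quick check of the ratios quoted above.
qphh_new, qphh_old = 1_166_976, 123_323
price_new, price_old = 5.42, 20.54  # USD per QphH

print(f"throughput ratio: {qphh_new / qphh_old:.1f}x")    # ~9.5x
print(f"price/perf ratio: {price_new / price_old:.2f}x")  # ~0.26x, about 1/4
```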
Some of the interesting bits from the hardware side:
- 4 HP BladeSystem c7000 Enclosures
- 64 HP ProLiant BL460c Servers
- 128 Quad-Core Intel Xeon X5450 “Harpertown” Processors (512 cores)
- 2TB Total System Memory (RAM)
- 6 HP Oracle Exadata Storage Servers
As you can see, this was a 64 node Oracle Real Application Cluster (RAC), each node having 2 processors (8 cores). This is also the first TPC-H benchmark from Oracle that used Exadata as the storage platform.
Congratulations to the HP/Oracle team on a great accomplishment!