The Core Performance Fundamentals Of Oracle Data Warehousing – Balanced Hardware Configuration

[back to Introduction]

If you want to build a house that will stand the test of time, you need to build it on a solid foundation. The same goes for architecting computer systems that run databases. If the underlying hardware is not sized appropriately, the software will likely get the blame. All too often I see data warehouse systems that are poorly architected for the given workload requirements. I frequently tell people, “you can’t squeeze blood from a turnip”, meaning that if the hardware resources are not there for the software to use, how can you expect the software to scale?

Undersizing data warehouse systems has become an epidemic with open platforms – platforms that let you run on any brand and configuration of hardware. The problem has been magnified over time as databases have grown significantly in size, generally outpacing the experience of those managing them. This has led the “big three” database vendors to publish suggested or recommended hardware configurations for their data warehouse platforms.

Simply put, the reasoning behind those initiatives was to help customers architect systems that are well balanced and appropriately sized for their data warehouse.

Balanced Hardware Configurations

The foundation of a well-performing data warehouse (or any system, for that matter) is the hardware it runs on. There are three main hardware resources to consider when sizing a data warehouse:

  1. Number of CPUs
  2. Number of storage devices (HDDs or SSDs)
  3. I/O bandwidth between CPUs and storage devices

 

NB: I’ve purposely left off memory (RAM) as most systems are pretty well sized at 2GB or 4GB per CPU core these days.
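
To make the relationship between these three resources concrete, here is a minimal back-of-the-envelope sizing sketch in Python. The per-core consumption rate, per-disk scan rate, and channel bandwidth are illustrative assumptions, not vendor figures; plug in numbers you have measured or pulled from your hardware documentation.

```python
# Back-of-the-envelope balanced-system sizing (all rates in MB/s).
# Every constant below is an illustrative assumption -- substitute measured values.

CPU_CORES           = 16    # host CPU cores available to the database
MB_PER_SEC_PER_CORE = 200   # assumed rate at which one core can consume data
MB_PER_SEC_PER_DISK = 70    # assumed sequential scan rate of one HDD
CHANNEL_MB_PER_SEC  = 1600  # assumed usable host-to-storage channel bandwidth

# 1. How fast can the CPUs consume data?
cpu_demand = CPU_CORES * MB_PER_SEC_PER_CORE            # 3200 MB/s

# 2. How many disks are needed to feed the CPUs at that rate?
disks_needed = -(-cpu_demand // MB_PER_SEC_PER_DISK)    # ceiling division -> 46

# 3. Is the I/O channel wide enough to carry that rate?
channel_ok = CHANNEL_MB_PER_SEC >= cpu_demand

print(f"CPU demand:       {cpu_demand} MB/s")
print(f"Disks needed:     {disks_needed} (at {MB_PER_SEC_PER_DISK} MB/s each)")
print(f"Channel capacity: {CHANNEL_MB_PER_SEC} MB/s "
      f"({'sufficient' if channel_ok else 'undersized'})")
```

In this assumed example the 16 cores could consume 3,200 MB/s, so a balanced configuration needs roughly 46 such disks and an I/O channel that can actually deliver 3,200 MB/s to the hosts; the 1,600 MB/s channel above would be the bottleneck.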

A balanced system has the following characteristics:

[Figure: a balanced configuration – CPU capacity, I/O channel bandwidth, and disk throughput sized proportionally to one another]

 

As you can see, each of the three components is sized proportionally to the others. This allows for maximum system throughput, as no single resource becomes a bottleneck before the others. This was one of the critical design decisions that went into the Oracle Database Machine.

Most DBAs and System Admins know what the disk capacity numbers are for their systems, but when it comes to I/O bandwidth or scan rates, most are unaware of what the system is capable of in theory, let alone in practice. Perhaps I/O bandwidth utilization should be included in the system metrics that are collected for your databases. You do collect system metrics, right?
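
If you are not already tracking I/O bandwidth, one crude way to sample it on Linux is to difference the sectors-read counters in /proc/diskstats over an interval. This is a minimal sketch (it assumes Linux, 512-byte diskstats sectors, and hypothetical device names like sdb/sdc); a proper monitoring tool, or simply iostat, is preferable in practice:

```python
import time

# Hypothetical device names -- list the devices backing your data warehouse LUNs.
DEVICES = {"sdb", "sdc", "sdd"}
INTERVAL_SECONDS = 10

def sectors_read():
    """Sum the sectors-read counter (field 6 of /proc/diskstats) for the chosen devices."""
    total = 0
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] in DEVICES:
                total += int(fields[5])   # sectors read since boot (512-byte sectors)
    return total

before = sectors_read()
time.sleep(INTERVAL_SECONDS)
after = sectors_read()

mb_per_sec = (after - before) * 512 / (1024 * 1024) / INTERVAL_SECONDS
print(f"Read bandwidth over the last {INTERVAL_SECONDS}s: {mb_per_sec:.1f} MB/s")
```

Run it during a representative query window and compare the result with what the hardware should, in theory, be capable of delivering.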

There are several “exchanges” that data must flow through from storage devices to host CPUs, many of which could become bottlenecks. Those include:

  • Back-end Fibre Channel loops (the fibre between the drive shelves and the storage array server processor)
  • Front-end Fibre Channel ports
  • Storage array server processors (SP)
  • Host HBAs

One should understand the throughput capacity of each of these components to ensure that none of them restricts the flow of data to the CPUs prematurely.
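
A useful exercise is to write down the aggregate throughput of each of these exchanges and see where the path narrows. The sketch below does exactly that; the component counts and per-component rates (e.g., roughly 400 MB/s of usable bandwidth per 4 Gb FC port) are assumptions for illustration, not specifications for any particular array:

```python
# Aggregate usable throughput of each stage in the data path (MB/s).
# All figures are assumptions for illustration -- substitute your own.
data_path = {
    "HDDs (28 x 70 MB/s)":               28 * 70,    # 1960
    "Back-end FC loops (4 x 400 MB/s)":   4 * 400,   # 1600
    "Storage array SPs (2 x 1200 MB/s)":  2 * 1200,  # 2400
    "Front-end FC ports (4 x 400 MB/s)":  4 * 400,   # 1600
    "Host HBAs (2 x 400 MB/s)":           2 * 400,   # 800
}

bottleneck = min(data_path, key=data_path.get)
print(f"End-to-end ceiling: {data_path[bottleneck]} MB/s, limited by: {bottleneck}")
for stage, mb_per_sec in data_path.items():
    print(f"  {stage:<36} {mb_per_sec:>5} MB/s")
```

In this made-up configuration the 28 disks could stream nearly 2 GB/s, but two host HBAs cap the system at about 800 MB/s; that is exactly the kind of premature restriction this exercise is meant to expose.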

 

Unbalanced Hardware Configurations

All too frequently, systems are not architected in a balanced fashion and end up constrained by one of the following three scenarios:

[Figure: three unbalanced configurations – the system constrained by CPU, by I/O channel bandwidth, or by disk throughput]

 

From the production systems I have seen, the main deficiency is I/O bandwidth (both I/O channel and HDD). I believe there are several reasons for this. First, too many companies capacity plan for their data warehouse based solely on the space the data occupies on disk. That is, they purchase the number of HDDs for the system based on drive capacity, not on the I/O bandwidth requirement. Think of it like this: if you were to purchase 2 TB of mirrored disk capacity (4 TB raw), would you rather purchase 28 x 146 GB drives or 14 x 300 GB drives (or even 4 x 1 TB drives)? You may ask: what is the difference (other than price), since in each case you have the same net capacity? Indeed, both configurations have the same capacity, but I/O bandwidth (how fast you can read data off the HDDs) is proportional to the number of HDDs, not their capacity. It should be obvious, then, that 28 HDDs can deliver roughly 2X the disk I/O bandwidth that 14 HDDs can, which means it will take roughly twice as long to read the same amount of data off 14 HDDs as off 28 HDDs.
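
The arithmetic behind that claim is simple enough to sanity-check. Assuming an illustrative sequential scan rate of about 70 MB/s per drive (a ballpark figure, not a spec; measure your own drives):

```python
# Same usable capacity, very different scan rates.
# 70 MB/s per drive is an assumed ballpark sequential rate -- measure your own.
MB_PER_SEC_PER_DRIVE = 70
DATA_TO_SCAN_GB = 1024          # scan 1 TB of table data

for drives in (28, 14, 4):
    bandwidth = drives * MB_PER_SEC_PER_DRIVE              # aggregate MB/s
    minutes = DATA_TO_SCAN_GB * 1024 / bandwidth / 60
    print(f"{drives:>2} drives: {bandwidth:>4} MB/s -> {minutes:.1f} minutes to scan 1 TB")
```

Each option provides the same 2 TB of usable mirrored capacity, but under these assumed per-drive rates the 28-drive configuration scans a terabyte in roughly 9 minutes, the 14-drive configuration in roughly 18, and the 4 x 1 TB option in about an hour.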

Unfortunately, what tends to happen is that the bean-counter types see only two things:

  1. The disk capacity (space) you want to purchase (or the capacity that is required)
  2. The price per MB/GB/TB

This is where someone worthy of the title systems architect needs to stand up and explain the concept of I/O bandwidth and the impact it has on data warehouse performance (your systems architect does know this, correct?). This is generally a difficult discussion because I/O bandwidth is not a line item on a purchase order; it is a derived metric that requires both thought and engineering (which means someone has to actually think about the requirements for the system!).

 

Summary

When sizing the hardware for your data warehouse, consider your workload and understand the following (and calculate numbers for each!):

  1. At what rate (in MB or GB per second) can the CPUs consume data?
  2. At what rate can the storage devices produce data (scan rate)?
  3. At what rate can data be delivered from the storage array(s) to the host HBAs?

 

If you are unable to answer these questions in theory, then you need to sit down and do some calculations. Then you need to run some micro-benchmarks (like Oracle ORION) to prove out those calculations. This will give you the “speed limit”, a metric against which you can measure your database workload. All computer systems must obey the laws of physics! There is no way around that.
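
As a trivial illustration of that “speed limit”, the three answers reduce to a single number: the smallest of the three rates. The figures below are placeholders; the point is that the calculation is easy once you have done the work of obtaining the inputs.

```python
# The system "speed limit" is simply the slowest of the three rates (MB/s).
# Placeholder figures -- derive yours from calculation, then verify them with a
# micro-benchmark such as Oracle ORION.
cpu_consume_rate  = 16 * 200   # cores x assumed MB/s per core     = 3200
storage_scan_rate = 28 * 70    # drives x assumed MB/s per drive   = 1960
hba_delivery_rate = 4 * 400    # HBA ports x assumed MB/s per port = 1600

speed_limit = min(cpu_consume_rate, storage_scan_rate, hba_delivery_rate)
print(f"Theoretical speed limit: {speed_limit} MB/s")
```

If your workload needs to scan data faster than that number, no amount of software tuning will get around it.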

Additional Readings

Kevin Closson has several good blog posts on related topics, as well as numerous others.

 


22 comments

  1. Pingback: A Good Day ... - Doug's Oracle Blog
  2. Tanel Poder

    I liked the “You can’t squeeze blood from a turnip” saying.

    When I say something like this I’m blunt so people would remember – “You can’t make bread out of shit” ;-)

  3. Simon Haslam

    Thanks for a thought provoking article Greg. I’m glad there are more discussions about system performance planning again. Even topics like short-stroking of disks are back on the agenda (the number of times I’ve been regarded as a crank and told such things don’t matter with modern disks!).

    There is, however, a contra-argument that says the system may be imbalanced providing the over-sized parts are relatively cheap and recognised as being over-specified. At a purely hardware level, on systems that aren’t full to capacity, this tends to mean you will have more cores than I/O bandwidth, partly because cores come in bundles of 4/6/8+ and processors as 1/2/4+ (though of course CPU isn’t a good example if your database is licensed by processor!).

    When I used to work for a large SI in the ’90s it was commonplace for most projects to have a capacity planning phase, e.g. one guy I knew even wrote a book on the subject (ISBN-13: 978-0849327780), which considered the entire system at a high level. These days the approach more often seems to be “hardware’s cheap – we’ll upgrade it later if we need to”. Now that’s fine if you understand where a system is deficient, but unfortunately I’ve been to a number of sites where people have just randomly thrown in a hardware upgrade (most commonly memory to an I/O-bound system) and then wondered why it made no difference…

  4. Pingback: Oracle Infogram: Podcasts, Optimizer, PeopleSoft, APEX, DBFS, Index Performance, Performance, EBS, Charity
  5. Greg Rahn

    @Simon Haslam
    Capacity planning has seemingly gone the way of weather forecasting from the days of old…the forecasts are guesses and are not accurate until the day arrives. Very unfortunate. The biggest issue I see is that IT shops just don’t build a realistic performance model and rarely have valid input for that model. They also don’t use realistic hardware constants, and generally focus on sizing the wrong resource. Surprisingly I’ve done some bar napkin straw man numbers for projects that are more accurate than “well researched” spreadsheet fodder. I guess that is something that comes with seeing and building lots of DW systems.

  6. Greg Rahn

    @Tanel Poder
    Reminds me of a quote a good friend and colleague of mine uses: “You’d think that would work…but you’d be wrong”. Blunt is the new black :)

  7. Karl Arao

    Thanks for this Greg! I know this post is first of the series.. I suggest you also put the “balanced hardware configuration” on a real world context like the one written by Krishna Manoharan,

    http://dsstos.blogspot.com/2009/09/download-link-for-storage-design-for.html

    I’ve got some questions on his post, and the most important is the first one (about requirements and some usual predicaments) which will define what you’ll have on your “balanced” configuration..

    I hope you could also shed some light on that area.. :)

  8. Greg Rahn

    @Karl Arao
    Krishna’s post/ppt is just really the basics of capacity planning and understanding what hardware can deliver. The problem I see with IT shops and planning is that there is either no planning, or it is done incorrectly. Very few shops take the time to build and test; it is more like build and deploy, then “tune”. Let me put this into a slightly different context: If you were tasked with building a bridge from point A to point B, what information would you need? Would things like the amount of traffic, both peak and average, and the type of traffic (train, bus, trucks, cars, bicycle & foot) be important? It would to me. I see IT shops that design a foot bridge (because it is cheaper) and then try and put cars onto it, and then it falls down and they complain. Capacity planning and building computer systems is not about guessing, it is about engineering and calculations. That being said, what calculations go into your system design?

  9. Karl Arao

    @Greg Rahn
    Hi Greg,

    I saw your reply just now.

    “The problem I see with IT shops and planning is that there is either no planning, or it is done incorrectly.”
    <– Yes, I've said here (http://www.freelists.org/post/oracle-l/Experiencesthoughts-about-hardware-recommendations) I don't like the feeling of guessing or just giving something for the sake of coming up with a hardware recommendation and technical proposal

    "Capacity planning and building computer systems is not about guessing, it is about engineering and calculations."
    <– Yes, I agree with you on this.

    But to build a balanced configuration. On the hardware perspective, I would do what Krishna did on his example.. But I have to match it with the Oracle requirements. The usual issue for me is if you'll have a new environment and you can't document the requirements because you don't have the existing baselines.. (and you don't want to guess)

    – On usual cases it is a new system or application and although you'll have an idea on how many will be using the application, you'll not be able to build enough statistics or numbers to get the requirements you need for you to match it to the storage capacity/performance… this is also the same with CPU…

    – And also you may not have a reference environment (or even close to it) for the new application.. So given this predicament it would be safer to put an assumption to the proposal with something like,
    "With the information available, there is a significant risk the CPU/storage subsystem may not provide enough capacity to meet the application requirements."

    But here's the catch, you still have to recommend the CPUs/storage.. & you have no choice.. :)
    I would like to know what would you do with this kind of scenario?

  10. Greg Rahn

    @Karl Arao

    If your question is “how would I recommend sizing systems for which there is not sufficient information”, the answer is easy: I don’t. I require a workload profile to offer sizing guidelines. That may take the shape of a scaled-down test rather than a full-size one, but as an engineer, I find it impossible to build good systems without requirements.

  11. Pingback: The Core Performance Fundamentals Of Oracle Data Warehousing – Introduction
  12. Pingback: The Core Performance Fundamentals Of Oracle Data Warehousing – Parallel Execution
  13. LNey

    Greg,

    You have put good information here for us to reflect upon. However, you will agree that you can have the most powerful machines (hardware), correctly sized and all, but if your code is not “tight” it is just not going to WORK! Good engineering begins with a good understanding of the questions you are asking of your DB. Good engineering is testing to destruction, as Jonathan will tell you. Notice the sequence… questions and testing… then everything else.

  14. Pingback: Configurer sa baie de stockage pour une base de données « EASYTEAM LE BLOG
  15. nm

    Hi Greg,
    I know this is an old article but here goes anyway. You said the following in your post:

    “At what rate (in MB or GB per second) can the CPUs consume data?”

    How does one go about computing that – does each CPU, depending on the type, have an average number for how much data it can consume, much like disks have average IOPS? Or is this something that can be computed from an AWR report by looking at how many IOPS are being performed and multiplying that by the database block size?
    thanks

  16. Greg Rahn

    The rate at which CPUs can consume data can be tested in a lab with some relatively simple micro-benchmark tests. It does vary for CPU type, etc.

  17. josh

    Hi Greg,
    I know this is an old topic, but could you point out what kind of micro-benchmark we can use to get the rate of CPU consumption? I’ve been trying to find out how the calculation of an average 200 MB/s came about.

  18. Pingback: Configurer sa baie de stockage pour une base de données | EASYTEAM
