Counting Triangles Faster

A few weeks back one of the Vertica developers put up a blog post on counting triangles in an undirected graph with reciprocal edges. The author was comparing the size of the data and the elapsed times to run this calculation on Hadoop and Vertica and put up the work on github and encouraged others: “do try this at home.” So I did.

Compression

Vertica draws attention to the fact that their compression brought the size of the 86,220,856 tuples down to 560MB in size, from a flat file size of 1,263,234,543 bytes resulting in around a 2.25X compression ratio. My first task was to load the data and see how Oracle’s Hybrid Columnar Compression would compare. Below is a graph of the sizes.

As you can see, Oracle’s default HCC query compression (query high) compresses the data over 2X more than Vertica and even HCC query low compression beats out Vertica’s compression number.

Query Elapsed Times

The closest gear I had to Vertica’s hardware was an Exadata X2-2 system — both use 2 socket, 12 core Westmere-EP nodes. While one may try to argue that Exadata may somehow influence the execution times, I’ll point out that I was using In-Memory Parallel Execution so no table data was even read from spinning disk or Exadata Flash Cache — it’s all memory resident in the database nodes’ buffer cache. This seems to be inline with how Vertica executed their tests though not explicitly stated (it’s a reasonable assertion).

After I loaded the data and gathered table stats, I fired off the exact same SQL query that Vertica used to count triangles to see how Oracle would compare. I ran the query on 1, 2 and 4 nodes just like Vertica. Below is a graph of the results.

As you can see, the elapsed times are reasonably close but overall in the favor of Oracle winning 2 of the 3 scale points as well as having a lower sum of the three executions: Vertica 519 seconds, Oracle 487 seconds — advantage Oracle of 32 seconds.

It Should Go Faster!

As a database performance engineer I was thinking to myself, “it really should go faster!” I took a few minutes to look over things to see what could make this perform better. You might think I was looking at parameters or something like that, but you would be wrong. After a few minutes of looking at the query and the execution plan it became obvious to me — it could go faster! I made a rather subtle change to the SQL query and reran my experiments. With the modified SQL query Oracle was now executing twice as fast on 1 node than Vertica was on 4 nodes. Also, on 4 nodes, the elapsed time came in at just 14 seconds, compared to the 97 seconds Vertica reported — a difference of almost 7X! Below are the combined results.

What’s The Go Fast Trick?

I was thinking a bit more about the problem at hand — we need to count vertices but not count them twice since they are reciprocal. Given that for any edge, it exists in both directions, the query can be structured like Vertica wrote it — doing the filtering with a join predicate like e1.source < e2.source to eliminate the duplicates or we can simply use a single table filter predicate like source < dest before the join takes place. One of the first things they taught me in query planning and optimization class was to filter early! That notation pays off big here because the early filter cuts the rows going into the first join as well as the output of the first join by a factor of 2 — 1.8 billion rows output vs. 3.6 billion. That’s a huge savings not only in the first join, but also in the second join as well.

Here is what my revised query looks like:

with
  e1 as (select * from edges where source < dest),
  e2 as (select * from edges where source < dest),
  e3 as (select * from edges where source > dest)
select count(*)
from e1
join e2 on (e1.dest = e2.source)
join e3 on (e2.dest = e3.source)
where e3.dest = e1.source

Summary

First, I’d like to thank the Vertica team for throwing the challenge out there and being kind enough to provide the data, code and their elapsed times. I always enjoy a challenge — especially one that I can improve upon. Now, I’m not going to throw any product marketing nonsense out there as that is certainly not my style (and there certainly is more than enough of that already), but rather I’ll just let the numbers do the talking. I’d also like to point out that this experiment was done without any structure other than the table. And in full disclosure, all of my SQL commands are available as well.

The other comment that I would make is that the new and improved execution times really make a mockery of the exercise when comparing to Hadoop MapReduce or Pig, but I would also mention that this test case is extremely favorable for parallel pipelined databases that can perform all in-memory operations and given the data set is so small, this is the obviously the case. Overall, in my opinion, a poor problem choice to compare the three technologies as it obviously (over) highlights the right tool for the job cliche.

Experiments performed on Oracle Database 11.2.0.2.

Github source code: https://gist.github.com/grahn/1289188

14 comments

  1. Pingback: Secrets of Oracle’s Automatic Degree of Parallelism | The Pythian Blog
  2. Mats Persson

    Is the timing of the entire work including importing the data from text or just the query with all the data already in memory? If not then I am curious to see the numbers for the entire job from no databases and no process running to completed query?

  3. Greg Rahn

    The time reported is for the query portion only.

    Why are you curious for the latter? To my knowledge, nobody runs a database system that way — at least Oracle people don’t.

  4. Carl Bonta

    Hi Greg,
    Your performance was impressive. I was curious to know how many triangles were actually found. Do you have that information?

  5. Atish Lahiri

    Thank you for this very interesting article, Greg.

    Your site ranks right up at the top.

    Best wishes,

    Atish

  6. sean mikha

    Was there opportunity to re-write the Vertica query for increased performance?

  7. Pingback: Counting triangles smarter (or how to beat Big Data vendors at their own game) | Datablend
  8. Dan

    I know this post is ancient. But as a Vertica DBA, I just want to point out that the assumption that the Vertica test was run against data already in memory is not correct. Vertica has no DB buffer cache similar to Oracle. There is an optimization that loads to memory (Vertica WOS) in small amounts until an async process consolidates to disk, but that was bypassed in the Vertica example with the “direct” keyword in the COPY statement.
    The only way part of the data would be in memory is from the linux os buffer cache. I am almost certain most of the data in that test was read from disk.

  9. Greg Rahn

    @Dan- We can all opine, but the data set is very small and I’m sure the query was run more than once before the measurement was taken, so it’s a very high probability the data was in the OS filesystem cache. That counts as “in-memory” for me.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s