ANSI Outer Joins And Lateral Views
February 18, 2008A few months ago the Oracle Optimizer Team did a blog post entitled Outerjoins in Oracle. In the Lateral View section of that post they go through some examples and discuss how a query is transformed with the ANSI outer join syntax. I thought it would be useful to go through an example that recently came through the Real-World Performance Group. For simplicity purposes and so that you can play along at home, the test case has been recreated to use EMP and DEPT which have been created and populated via the $ORACLE_HOME/rdbms/admin/utlsampl.sql script.
The Three Test Cases
Consider the following three SQL statements:
Query A: Oracle Outer Join Syntax
SELECT d.dname, d.deptno, e.ename
FROM dept d, emp e
WHERE d.deptno = e.deptno(+) and
d.deptno in (10,40)
Query B: ANSI Outer Join Syntax Version 1
SELECT d.dname, d.deptno, e.ename FROM dept d LEFT OUTER JOIN emp e ON d.deptno = e.deptno WHERE d.deptno in (10,40)
Query C: ANSI Outer Join Syntax Version 2
SELECT d.dname, d.deptno, e.ename
FROM dept d LEFT OUTER JOIN emp e
ON d.deptno = e.deptno and
d.deptno in (10,40)
Do note the slight difference between the two ANSI versions: Query B has the filter predicate in the WHERE clause, where Query C has the filter predicate in the ON clause.
Query Results
Query A
DNAME DEPTNO ENAME -------------- ---------- ---------- ACCOUNTING 10 CLARK ACCOUNTING 10 KING ACCOUNTING 10 MILLER OPERATIONS 40 4 rows selected.
Query B
DNAME DEPTNO ENAME -------------- ---------- ---------- ACCOUNTING 10 CLARK ACCOUNTING 10 KING ACCOUNTING 10 MILLER OPERATIONS 40 4 rows selected.
Query C
DNAME DEPTNO ENAME -------------- ---------- ---------- ACCOUNTING 10 CLARK ACCOUNTING 10 KING ACCOUNTING 10 MILLER RESEARCH 20 SALES 30 OPERATIONS 40 6 rows selected.
Whoa! Query C returned 6 rows, while Query A and Query B returned 4 rows. Must be a wrong results bug…or is it?
Execution Plans
To start troubleshooting the difference in results sets, lets examine the execution plan of each query.
Query A
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
Plan hash value: 3713469723
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9 | 198 | 5 (20)| 00:00:01 |
|* 1 | HASH JOIN OUTER | | 9 | 198 | 5 (20)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| DEPT | 2 | 26 | 2 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| EMP | 14 | 126 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("D"."DEPTNO"="E"."DEPTNO"(+))
2 - filter("D"."DEPTNO"=10 OR "D"."DEPTNO"=40)
Query B
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
Plan hash value: 3713469723
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9 | 198 | 5 (20)| 00:00:01 |
|* 1 | HASH JOIN OUTER | | 9 | 198 | 5 (20)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| DEPT | 2 | 26 | 2 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| EMP | 14 | 126 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("D"."DEPTNO"="E"."DEPTNO"(+))
2 - filter("D"."DEPTNO"=10 OR "D"."DEPTNO"=40)
Query C
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------
Plan hash value: 498633241
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 80 | 10 (0)| 00:00:01 |
| 1 | NESTED LOOPS OUTER | | 4 | 80 | 10 (0)| 00:00:01 |
| 2 | TABLE ACCESS FULL | DEPT | 4 | 52 | 2 (0)| 00:00:01 |
| 3 | VIEW | | 1 | 7 | 2 (0)| 00:00:01 |
|* 4 | FILTER | | | | | |
|* 5 | TABLE ACCESS FULL| EMP | 1 | 9 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter("D"."DEPTNO"=10 OR "D"."DEPTNO"=40)
5 - filter("D"."DEPTNO"="E"."DEPTNO" AND ("E"."DEPTNO"=10 OR
"E"."DEPTNO"=40))
For some reason (which we need to investigate!), Query C has a different execution plan. Perhaps this is why the result sets are different.
The 10053 Trace
Perhaps the 10053 trace from Query C can help us understand why the execution plan is different. Looking at the trace file we find that Query C has been transformed (the below has been modified for formatting purposes):
SELECT d.dname, d.deptno, lv.ename
FROM dept d,
LATERAL(
(SELECT e.deptno, e.ename
FROM emp e
WHERE d.deptno=e.deptno and
(d.deptno=10 or d.deptno=40))
)(+) lv
You will notice that the transformation of Query C contains a Lateral View. The Oracle Optimizer Team’s post on Outerjoins in Oracle gives us the definition of a Lateral View:
A lateral view is an inline view that contains correlation referring to other tables that precede it in the FROM clause.
What does this mean? It means that Query C is not the same query as Query B and Query A. Just the slight change of the ANSI syntax causes the meaning of the business question being answered to change! Query C applies the deptno in (10,40) filter to EMP first (returning 3 rows in which deptno=10, there are no deptno=40 rows) and then outer joins that result set to DEPT but does not apply the deptno in (10,40) filter to DEPT, essentially resulting in this query, which is most likely not what the user had intended:
SELECT d.dname, d.deptno, sq.ename
FROM dept d,
(SELECT e.deptno, e.ename
FROM emp e
WHERE e.deptno in (10,40)) sq
WHERE d.deptno=sq.deptno (+)
Summary
Filters specified in the ON clause of outer joins are transformed internally into lateral views and will be applied before the join. While ANSI join syntax is preferred by some, be certain that your query matches the business question being asked!
16 Responses to “ANSI Outer Joins And Lateral Views”
Do you think this should be categorized into a bug? or some freaky side effects of query transformation?
By Dion Cho on Feb 18, 2008
Dion-
None of the above. It is the result of different ANSI syntax and different meanings of the statements.
MySQL correctly yields the same results (as Oracle) for the two queries:
Query B
mysql> SELECT D.DNAME, D.DEPTNO, E.ENAME -> FROM DEPT D LEFT OUTER JOIN EMP E -> ON D.DEPTNO = E.DEPTNO -> WHERE D.DEPTNO IN (10,40); +------------+--------+--------+ | DNAME | DEPTNO | ENAME | +------------+--------+--------+ | ACCOUNTING | 10 | CLARK | | ACCOUNTING | 10 | KING | | ACCOUNTING | 10 | MILLER | | OPERATIONS | 40 | NULL | +------------+--------+--------+ 4 rows in set (0.03 sec)Query C
mysql> SELECT D.DNAME, D.DEPTNO, E.ENAME -> FROM DEPT D LEFT OUTER JOIN EMP E -> ON D.DEPTNO = E.DEPTNO AND -> D.DEPTNO IN (10,40) ; +------------+--------+--------+ | DNAME | DEPTNO | ENAME | +------------+--------+--------+ | ACCOUNTING | 10 | CLARK | | ACCOUNTING | 10 | KING | | ACCOUNTING | 10 | MILLER | | RESEARCH | 20 | NULL | | SALES | 30 | NULL | | OPERATIONS | 40 | NULL | +------------+--------+--------+ 6 rows in set (0.07 sec)By Greg Rahn on Feb 18, 2008
This article from Jonathan Gennick explains this phenomenon too, but from a different (not performance) angle.
Regards,
Rob.
By Rob van Wijk on Feb 18, 2008
Hi Greg
I don’t understand the following sentence really mean.”Query C applies the deptno in (10,40) filter to EMP first (returning 3 rows in which deptno=10, there are no deptno=40 rows) and then outer joins that result set to DEPT but does not apply the deptno in (10,40) filter to DEPT”
Because from the explain plan of Query C,we can clearly see deptno in (10.40) not only filter to emp but also filter dept
filter(”D”.”DEPTNO”=10 OR “D”.”DEPTNO”=40)
Is there any thing I’ve mistaken? Would you be kindly explain more clearly?
Thanks,
Smallkoala
By smallkoala on Feb 20, 2008
Smallkoala-
The filter on DEPTNO is not applied to DEPT. If you look carefully at the plan for Query C, you will see there is no filter on line 2 where DEPT is listed. The filter you mention is on line 4 which is part of the lateral view. This is why there are rows returned from DEPT that have DEPTNO values other than 10 and 40 (the results contain DEPTNO values of 20 and 30 as well!). This is because the filter is part of the ON clause, not the WHERE clause in the ANSI outer join.
By Greg Rahn on Feb 21, 2008
Hi
If you are interested I found some side effects using ANSI outer joins and Oracle outer joins along the same lines
Fun and games
http://oraclerob.wordpress.com/2007/11/07/are-ansi-sql-and-oracle-sql-the-same/
By Rob on Mar 27, 2008
Hmm, I can’t see why the confusion here?
Firstly, left outer join means that we want to keep all the rows from the “left” table regardless of what the join criteria matches, so why would one expect adding another join condition (even though “d.deptno in (10,40)” isn’t technically a join condition) to restrict the result set ?
Secondly, is this not bad practice anyway? In the on-clause should be specified join conditions, “d.deptno in (10,40)” is not a join condition its simply a filter so the place for that would always be the where clause, not the on clause.
By simon on May 29, 2008
Simon-
The confusion comes from the fact that if one adds a filter predicate in addition to join predicates in the ON clause it changes the meaning of the query. There are cases where having a filter predicate in the ON clause is what the user may want, but more often then not, it is done by mistake.
The comment by Rob van Wijk has a link that also describes this phaenomenon.
By Greg Rahn on May 29, 2008
Hi Greg
Yes of course moving the condition from the where clause to the on clause changes the semantics, but my point was why would one not expect that to be the case? If doing this didn’t change the semantics then it would have broken what fundamentally a left outer join means - now that would be confusing!
By simon on May 30, 2008
Simon-
I believe the reason there is confusion is that with regular joins this is not observed.
For example:
By Greg Rahn on May 30, 2008
Hi Greg
With respect, is that argument not akin to comparing “apples and oranges” ? - an apple is green, but that doesn’t mean that it is confusing that an orange isn’t green, because an apple isn’t an orange.
Likewise, a regular join isn’t a left outer join, and it doesn’t have the characteristics of one.
The fundamental contract of a regular join is that rows that don’t meet the joining conditions WILL NOT appear in the result set. So when you add “d.deptno in (10,40)” to the joining conditions of your regular join it means that you DO NOT then see rows that don’t meet that condition in your result set.
The fundamental contract of a left outer join is that rows from the “left” table that don’t meet the joining conditions WILL still appear in your result set. So when you add “d.deptno in (10,40)” to the joining conditions of your left-outer join you DO still see rows that don’t meet that condition in your result set.
I think this is a helpful thread and I agree that there is plenty of scope for someone to inadvertently produce results they didn’t intend by incorrectly using these constructs, problems relating to which can be entirely avoided by what I would consider the good practice of putting only true joining conditions in the on clause and putting filtering conditions in the where clause.
Anyway, thanks for the interesting site Greg.
By simon on May 30, 2008
Simon-
It’s not at all about comparing apples and oranges. It’s about syntax that doesn’t change the result set with a JOIN can very well change the result set with an OUTER JOIN because it changes the meaning of the query. I agree 100% with your comments about good practice. The facts are that many people do not realize this (as I’ve seen this issue in a number of organizations) hence the whole reason I wrote this post: to educate and inform.
Cheers.
By Greg Rahn on May 30, 2008
To Simon: But there are times when you *must* put the filtering in the ON clause. For the LEFT OUTER JOIN, it’s the right-hand side. (So in the example above, if we wanted all departments, and also any employee whose name began with “M”. The NAME LIKE ‘M%’ would of course have to go in the ON clause.
So to even an experienced SQL coder, it’s easy to make the mistake that a filter on the left-side table would also take effect, but it doesn’t.
By Doug on Jun 12, 2008
Hi Doug
It is never necessary to put the filtering in the ON clause of a regular join, and this is what I was referring to - because the gist of this thread is that when a regular join with filtering conditions in the ON clause is converted to a left outer join the behaviour changes, and this was apparently confusing some people. So my suggestion is don’t put filtering conditions in the ON clause of a regular join. In fact I think maybe it would be a good idea if this (non-joining conditions in the ON clause of a regular join) was made illegal SQL.
I know I’m repeating myself here, but still not sure how an experienced sql coder can easily make the mistake of thinking that “left” table filtering conditions in the ON clause of a left outer join would have an effect though, as the fundamental contract of a left outer join is that rows from the “left” table that don’t meet the joining conditions WILL still appear in the result set.
By simon on Jun 27, 2008
Hello Simon,
I concur with Doug’s post of 6/12. It is sometimes necessary to include filtering in the join clause. Often for clarity’s sake this is done in an inline view. Using Doug’s example:
SELECT d.dname, d.deptno, e.ename
FROM dept d LEFT OUTER JOIN emp e
ON d.deptno = e.deptno and
e.name like ‘M%’;
is the same as:
SELECT d.dname, d.deptno, e.ename
FROM dept d LEFT OUTER JOIN
(select deptno, ename from emp
where ename like ‘M%’) e
ON d.deptno = e.deptno;
By Dale on Aug 21, 2008
Hi Dale
Yes of course I agree it is sometimes necessary to put the filtering in the ON clause of a LEFT OUTER join. My point was that it is never necessary to put the filtering in the ON clause of a REGULAR join. Therefore, if you avoid doing so you avoid any confusion over the differing semantics of filtering in the ON clause of regular and left-outer joins
By simon on Aug 28, 2008