Oracle Analytic Functions
Recently, I’ve been quite busy with performance projects and haven’t had the spare time I would like to keep up on my blog. Now that those projects are behind me, I wanted to blog about the use of one feature that made a significant difference in performance on a number of queries for this project. This feature was the Oracle Analytic Functions. This functionality has been present since Oracle 8.1.6 and enhancements have been made to them from release to release. For those of you with an OLTP background, the Oracle Analytic Functions may be somewhat unfamiliar, but those with a data warehousing background are hopefully leveraging these functions. One powerful benefit of the Oracle Analytic Functions is that the aggregation and windowing can usually be done in a single pass over the table.
I’ve chosen one example that I consider a more complex one. It was more complex than I had previously worked with, so I had to turn to an Oracle developer that specializes in the Oracle Analytic Functions to get the answers and thus would like to share them with the Oracle user community.
The Original Query
Let’s look at the original query:
SELECT P.PRS_ID,
P.GEO_CD,
P.OBLIGATION_ID,
P.ORDER_NR,
P.COUNTRY_NM,
P.REGION_NM,
P.ORDER_CRE_TS,
P.PREV_OBL_TS1,
CASE
WHEN (Q.ORDER_CRE_TS BETWEEN
NVL(P.PREV_OBL_TS,P.ORDER_CRE_TS)
AND P.ORDER_CRE_TS)
THEN NULL
ELSE P.PREV_OBL_TS
END PREV_OBL_TS
FROM TAB2 P,
TAB1 Q
WHERE P.PRS_ID = Q.PRS_ID
AND Q.ORDER_CRE_TS BETWEEN
NVL(P.PREV_OBL_TS,Q.ORDER_CRE_TS)
AND (SELECT MAX(ORDER_CRE_TS)
FROM TAB1 C
WHERE P.PRS_ID = C.PRS_ID
AND C.ORDER_CRE_TS > P.PREV_OBL_TS
AND C.ORDER_CRE_TS <= P.ORDER_CRE_TS);
This appears to be a pretty basic two table query with a simple subquery that is looking for a max timestamp for a given PRS_ID.
If we take a look at the execution plan for this query you will notice that the subquery is causing a second pass to TAB1 as there are no indexes on either table.
---------------------------------------
| Id | Operation | Name |
---------------------------------------
| 0 | SELECT STATEMENT | |
|* 1 | FILTER | |
| 2 | HASH GROUP BY | |
|* 3 | HASH JOIN | |
|* 4 | HASH JOIN | |
| 5 | TABLE ACCESS FULL| TAB2 |
| 6 | TABLE ACCESS FULL| TAB1 |
| 7 | TABLE ACCESS FULL | TAB1 |
---------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("Q"."ORDER_CRE_TS"<=MAX("ORDER_CRE_TS"))
3 - access("P"."PRS_ID"="Q"."PRS_ID")
filter("Q"."ORDER_CRE_TS">=NVL("P"."PREV_OBL_TS","Q"."ORDER_CRE_TS"))
4 - access("P"."PRS_ID"="C"."PRS_ID")
filter(("C"."ORDER_CRE_TS">"P"."PREV_OBL_TS" AND
"C"."ORDER_CRE_TS"<="P"."ORDER_CRE_TS"))
A first, quick reaction might be to recommend to put an index on TAB1(PRS_ID,ORDER_CRE_TS), such that the subquery could be satisfied with an index-only lookup. The challenge here is that we have a pretty big data set w/o much data elimination (say >200,000,000 rows) from the TAB1/TAB2 table join and if we were to use an index it would probably cause the execution plan to do a nested-loops join which would probably be pretty slow, but perhaps faster than a second full table scan of TAB1. Adding that index is probably also considered a "custom solution" because we don't know how many other ad-hoc queries in this data warehouse could leverage this index, if any. Adding "custom" indexes in a data warehouse could also become a problem as the more indexes that exist, the slower the loads are - even if partitioning is used with local index rebuilds.
Now Enter Oracle Analytic Functions
The original query can be rewritten to leverage a window function. Here is the rewritten query:
SELECT PRS_ID,
GEO_CD,
OBLIGATION_ID,
ORDER_NR,
COUNTRY_NM,
REGION_NM,
ORDER_CRE_TS,
PREV_OBL_TS1,
PREV_OBL_TS
FROM (SELECT P.PRS_ID,
P.GEO_CD,
P.OBLIGATION_ID,
P.ORDER_NR,
P.COUNTRY_NM,
P.REGION_NM,
P.ORDER_CRE_TS,
P.PREV_OBL_TS1,
CASE
WHEN (Q.ORDER_CRE_TS BETWEEN
NVL(P.PREV_OBL_TS,P.ORDER_CRE_TS)
AND P.ORDER_CRE_TS) THEN NULL
ELSE P.PREV_OBL_TS
END PREV_OBL_TS,
NVL(P.PREV_OBL_TS,Q.ORDER_CRE_TS) NVL_DATE,
Q.ORDER_CRE_TS Q_ORDER_CRE_TS,
MAX(CASE
WHEN Q.ORDER_CRE_TS > P.PREV_OBL_TS
AND Q.ORDER_CRE_TS <= P.ORDER_CRE_TS
THEN Q.ORDER_CRE_TS
END)
OVER(PARTITION BY Q.PRS_ID,
P.PREV_OBL_TS,
P.ORDER_CRE_TS) MAX_ORDER_CRE_TS
FROM TAB2 P,
TAB1 Q
WHERE P.PRS_ID = Q.PRS_ID)
WHERE Q_ORDER_CRE_TS >= NVL_DATE
AND Q_ORDER_CRE_TS <= MAX_ORDER_CRE_TS;
Notice the original subquery for the MAX(ORDER_CRE_TS) has been incorporated into the "MAX(CASE WHEN)OVER(PARTITION BY)" clause and the original date filter is now applied in the outer select query.
Now lets look at the execution plan for the rewritten query:
--------------------------------------
| Id | Operation | Name |
--------------------------------------
| 0 | SELECT STATEMENT | |
|* 1 | VIEW | |
| 2 | WINDOW SORT | |
|* 3 | HASH JOIN | |
| 4 | TABLE ACCESS FULL| TAB2 |
| 5 | TABLE ACCESS FULL| TAB1 |
--------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(("Q_ORDER_CRE_TS">="NVL_DATE" AND
"Q_ORDER_CRE_TS"<="MAX_ORDER_CRE_TS"))
3 - access("P"."PRS_ID"="Q"."PRS_ID")
The requirement for the second pass to TAB1 in the original query has now been eliminated and the MAX() is able to be done in a single pass using a WINDOW SORT.
Another alternative is to manually push part of the date filter predicate to the inner select.
SELECT PRS_ID,
GEO_CD,
OBLIGATION_ID,
ORDER_NR,
COUNTRY_NM,
REGION_NM,
ORDER_CRE_TS,
PREV_OBL_TS1,
PREV_OBL_TS
FROM (SELECT P.PRS_ID,
P.GEO_CD,
P.OBLIGATION_ID,
P.ORDER_NR,
P.COUNTRY_NM,
P.REGION_NM,
P.ORDER_CRE_TS,
P.PREV_OBL_TS1,
CASE
WHEN (Q.ORDER_CRE_TS BETWEEN
NVL(P.PREV_OBL_TS,P.ORDER_CRE_TS)
AND P.ORDER_CRE_TS)
THEN NULL
ELSE P.PREV_OBL_TS
END PREV_OBL_TS,
Q.ORDER_CRE_TS Q_ORDER_CRE_TS,
MAX(CASE
WHEN Q.ORDER_CRE_TS > P.PREV_OBL_TS
AND Q.ORDER_CRE_TS <= P.ORDER_CRE_TS
THEN Q.ORDER_CRE_TS
END)
OVER(PARTITION BY Q.PRS_ID,
P.PREV_OBL_TS,
P.ORDER_CRE_TS ) MAX_ORDER_CRE_TS
FROM TAB2 P,
TAB1 Q
WHERE P.PRS_ID = Q.PRS_ID
AND Q.ORDER_CRE_TS >= NVL(P.PREV_OBL_TS,Q.ORDER_CRE_TS))
WHERE Q_ORDER_CRE_TS <= MAX_ORDER_CRE_TS;
--------------------------------------
| Id | Operation | Name |
--------------------------------------
| 0 | SELECT STATEMENT | |
|* 1 | VIEW | |
| 2 | WINDOW SORT | |
|* 3 | HASH JOIN | |
| 4 | TABLE ACCESS FULL| TAB2 |
| 5 | TABLE ACCESS FULL| TAB1 |
--------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("Q_ORDER_CRE_TS"<="MAX_ORDER_CRE_TS")
3 - access("P"."PRS_ID"="Q"."PRS_ID")
filter("Q"."ORDER_CRE_TS">=NVL("P"."PREV_OBL_TS","Q"."ORDER_CRE_TS"))
In this case we get exactly the same execution plan but personally I like to push predicates as deep as possible even though the Optimizer may choose to do it as well.
Summary
As demonstrated by example, the Oracle Analytic Functions are quite powerful from a Business Intelligence point of view as well as a performance view. I would encourage leveraging them as much as possible in a data warehouse environment.
I completely agree with you and
[self promotion start] :)
I have written an article about analytic functions (4 cases from real world reports) here
http://www.gplivna.eu/papers/using_analytic_functions_in_reports.htm
[self promotion end]
Hi, i have a query suppose a table has two columns code ,scrip with two rows
code,scrip
A1 AAAA
A1 BBBB
Result should be
A1,AAAA,BBBB
how can i get mentioned result. will anyone help me.
@Ashish
In 11.2 you can use the LISTAGG function like such:
If you are on an older release see Tom Kyte’s example.
SQL> select key, listagg(value,’,') within group (order by key) as comma_list
from t1
group by key;
ORA-00923: FROM keyword not found where expected
not working in oracle db
Introduced in 11.2:
http://download.oracle.com/docs/cd/E11882_01/server.112/e17128/chapter1.htm#FEATURENO08750
Details:
http://download.oracle.com/docs/cd/E14072_01/server.112/e10592/functions087.htm
SQL> create table t1 (key varchar2(8), value varchar2(16)); Table created. SQL> insert into t1 values ('A1','AAA'); 1 row created. SQL> insert into t1 values ('A1','BBB'); 1 row created. SQL> commit; Commit complete. SQL> select key, listagg(value,',') within group (order by key) as comma_list from t1 group by key; KEY COMMA_LIST ---------------- -------------------------------------------------- A1 AAA,BBB