<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Oracle Exadata Storage Server and the HP Oracle Database Machine</title>
	<atom:link href="http://structureddata.org/2008/09/28/oracle-exadata-storage-server-and-the-hp-oracle-database-machine/feed/" rel="self" type="application/rss+xml" />
	<link>http://structureddata.org/2008/09/28/oracle-exadata-storage-server-and-the-hp-oracle-database-machine/</link>
	<description>Oracle Database Performance And Scalability Blog</description>
	<lastBuildDate>Mon, 01 Mar 2010 22:11:26 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Greg Rahn</title>
		<link>http://structureddata.org/2008/09/28/oracle-exadata-storage-server-and-the-hp-oracle-database-machine/comment-page-1/#comment-275</link>
		<dc:creator>Greg Rahn</dc:creator>
		<pubDate>Fri, 25 Sep 2009 15:19:17 +0000</pubDate>
		<guid isPermaLink="false">http://structureddata.org/?p=133#comment-275</guid>
		<description>&lt;a href=&quot;#comment-10074&quot; rel=&quot;nofollow&quot;&gt;@Oozy&lt;/a&gt;

I cannot comment on what could happen in the future.  I would say that if you are hitting bugs, there should be an SR and bug number to reference (bugs do not get fixed unless they are logged with support).  I have yet to hit any issue where disabling parallel query was required to work around a bug.  That in itself would just kill performance anyway.</description>
		<content:encoded><![CDATA[<p><a href="#comment-10074" rel="nofollow">@Oozy</a></p>
<p>I cannot comment on what could happen in the future.  I would say that if you are hitting bugs, there should be an SR and bug number to reference (bugs do not get fixed unless they are logged with support).  I have yet to hit any issue where disabling parallel query was required to work around a bug.  That in itself would just kill performance anyway.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Oozy</title>
		<link>http://structureddata.org/2008/09/28/oracle-exadata-storage-server-and-the-hp-oracle-database-machine/comment-page-1/#comment-276</link>
		<dc:creator>Oozy</dc:creator>
		<pubDate>Fri, 25 Sep 2009 11:10:24 +0000</pubDate>
		<guid isPermaLink="false">http://structureddata.org/?p=133#comment-276</guid>
		<description>Greg,

We know that Smart Scan requires Parallel Query to work. Sometimes we want to disable parallelism for a query, for various reasons (in particular: bugs...:)). Is it possible that in future releases of Exadata, Smart Scan will also be supported for noparallel queries?

Many thanks in advance!</description>
		<content:encoded><![CDATA[<p>Greg,</p>
<p>We know that Smart Scan requires Parallel Query to work. Sometimes we want to disable parallelism for a query, for various reasons (in particular: bugs&#8230;:)). Is it possible that in future releases of Exadata, Smart Scan will also be supported for noparallel queries?</p>
<p>Many thanks in advance!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Greg Rahn</title>
		<link>http://structureddata.org/2008/09/28/oracle-exadata-storage-server-and-the-hp-oracle-database-machine/comment-page-1/#comment-277</link>
		<dc:creator>Greg Rahn</dc:creator>
		<pubDate>Tue, 22 Sep 2009 17:15:14 +0000</pubDate>
		<guid isPermaLink="false">http://structureddata.org/?p=133#comment-277</guid>
		<description>&lt;a href=&quot;#comment-10033&quot; rel=&quot;nofollow&quot;&gt;@Oozy&lt;/a&gt;

The Exadata Smart Scan column projection works on all rows in all blocks.  The advantage of this is a reduction in data sent back to the database grid.</description>
		<content:encoded><![CDATA[<p><a href="#comment-10033" rel="nofollow">@Oozy</a></p>
<p>The Exadata Smart Scan column projection works on all rows in all blocks.  The advantage of this is a reduction in data sent back to the database grid.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Oozy</title>
		<link>http://structureddata.org/2008/09/28/oracle-exadata-storage-server-and-the-hp-oracle-database-machine/comment-page-1/#comment-279</link>
		<dc:creator>Oozy</dc:creator>
		<pubDate>Tue, 22 Sep 2009 08:00:11 +0000</pubDate>
		<guid isPermaLink="false">http://structureddata.org/?p=133#comment-279</guid>
		<description>&lt;a href=&quot;#comment-10032&quot; rel=&quot;nofollow&quot;&gt;@Greg Rahn&lt;/a&gt;

Many thanks for your answer, now I understand! I would be grateful if you could answer one more question... :)

Column projection - I assume that this only works if a table row is located in more than one Oracle block (which means either row chaining or row migration)? If each and every row in a table fits into one Oracle block, is column projection relevant?</description>
		<content:encoded><![CDATA[<p><a href="#comment-10032" rel="nofollow">@Greg Rahn</a></p>
<p>Many thanks for your answer, now I understand! I would be grateful if you could answer one more question&#8230; :)</p>
<p>Column projection &#8211; I assume that this only works if a table row is located in more than one Oracle block (which means either row chaining or row migration)? If each and every row in a table fits into one Oracle block, is column projection relevant?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Greg Rahn</title>
		<link>http://structureddata.org/2008/09/28/oracle-exadata-storage-server-and-the-hp-oracle-database-machine/comment-page-1/#comment-278</link>
		<dc:creator>Greg Rahn</dc:creator>
		<pubDate>Tue, 22 Sep 2009 05:22:04 +0000</pubDate>
		<guid isPermaLink="false">http://structureddata.org/?p=133#comment-278</guid>
		<description>&lt;a href=&quot;#comment-10027&quot; rel=&quot;nofollow&quot;&gt;@Oozy&lt;/a&gt;

For &quot;normal&quot; smart scans the list of values is known at parse time and is generally small in number.  Most people don&#039;t write an IN predicate with 5,000 values for example.  With a join, the list of values could be fairly large (compared to writing it by hand) so it is much more efficient to apply this filtering with a bloom filter.  Think of it as bulk set based elimination.  Do note that a bloom filter can return false positives but no false negatives, so the database still needs to apply this predicate filter as well.  The benefit of the bloom filter being applied in the Exadata storage is that it significantly reduces the number of rows returned to the database.  This saves on channel bandwidth as well as greatly reducing the work the database has to do for this join operation.</description>
		<content:encoded><![CDATA[<p><a href="#comment-10027" rel="nofollow">@Oozy</a></p>
<p>For &#8220;normal&#8221; smart scans the list of values is known at parse time and is generally small in number.  Most people don&#8217;t write an IN predicate with 5,000 values for example.  With a join, the list of values could be fairly large (compared to writing it by hand) so it is much more efficient to apply this filtering with a bloom filter.  Think of it as bulk set based elimination.  Do note that a bloom filter can return false positives but no false negatives, so the database still needs to apply this predicate filter as well.  The benefit of the bloom filter being applied in the Exadata storage is that it significantly reduces the number of rows returned to the database.  This saves on channel bandwidth as well as greatly reducing the work the database has to do for this join operation.</p>
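<p>The "false positives but no false negatives" behaviour described above can be illustrated with a toy Bloom filter. This is only a minimal sketch, not how Exadata or the database implements it: the <code>BloomFilter</code> class, its sizes, the hash choice, and the sample rows are all hypothetical.</p>

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, value):
        # Derive num_hashes bit positions by salting a SHA-256 hash.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value):
        return all(self.bits[pos] for pos in self._positions(value))


# Hypothetical REG_IDs that survived the REGION_NAME predicate.
matching_reg_ids = {3, 7}
bloom = BloomFilter()
for reg_id in matching_reg_ids:
    bloom.add(reg_id)

# Hypothetical SALES rows as (reg_id, rev_amt) pairs.
sales_rows = [(reg_id, 10.0 * reg_id) for reg_id in range(1, 21)]

# "Storage side": probabilistic filter; a few extra rows may slip through.
sent_to_db = [row for row in sales_rows if bloom.might_contain(row[0])]

# "Database side": the exact predicate removes any false positives.
joined = [row for row in sent_to_db if row[0] in matching_reg_ids]

print(len(sales_rows), len(sent_to_db), joined)
```

<p>Every row whose REG_ID was added to the filter always passes the storage-side check, which is why the exact check on the database side only ever has to discard rows, never recover them.</p>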
]]></content:encoded>
	</item>
	<item>
		<title>By: Oozy</title>
		<link>http://structureddata.org/2008/09/28/oracle-exadata-storage-server-and-the-hp-oracle-database-machine/comment-page-1/#comment-280</link>
		<dc:creator>Oozy</dc:creator>
		<pubDate>Mon, 21 Sep 2009 21:05:23 +0000</pubDate>
		<guid isPermaLink="false">http://structureddata.org/?p=133#comment-280</guid>
		<description>&lt;a href=&quot;#comment-9979&quot; rel=&quot;nofollow&quot;&gt;@Greg Rahn&lt;/a&gt;

I have a question about Smart Scan in joins: if Smart Scan can filter rows, why, in joins, after finding the REG_IDs corresponding to the REGION_NAMEs, doesn&#039;t Smart Scan do &quot;normal&quot; row filtering instead of using a bloom filter?

Many thanks in advance!</description>
		<content:encoded><![CDATA[<p><a href="#comment-9979" rel="nofollow">@Greg Rahn</a></p>
<p>I have a question about Smart Scan in joins: if Smart Scan can filter rows, why, in joins, after finding the REG_IDs corresponding to the REGION_NAMEs, doesn&#8217;t Smart Scan do &#8220;normal&#8221; row filtering instead of using a bloom filter?</p>
<p>Many thanks in advance!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Greg Rahn</title>
		<link>http://structureddata.org/2008/09/28/oracle-exadata-storage-server-and-the-hp-oracle-database-machine/comment-page-1/#comment-281</link>
		<dc:creator>Greg Rahn</dc:creator>
		<pubDate>Tue, 15 Sep 2009 04:42:44 +0000</pubDate>
		<guid isPermaLink="false">http://structureddata.org/?p=133#comment-281</guid>
		<description>&lt;a href=&quot;#comment-9983&quot; rel=&quot;nofollow&quot;&gt;@Dharmendra&lt;/a&gt;

Correct.  Statements that use parallel execution use Smart Scan; non-PX (serial) statements do not.

As the &lt;a href=&quot;http://www.oracle.com/technology/products/bi/db/exadata/pdf/exadata-technical-whitepaper.pdf&quot; rel=&quot;nofollow&quot;&gt;Technical Overview of Exadata whitepaper&lt;/a&gt; states on page 21:
&lt;blockquote&gt;ASM automatically stripes the database data across Exadata disks and cells to ensure a balanced I/O load and optimum performance.&lt;/blockquote&gt;
And &lt;a href=&quot;http://www.oracle.com/technology/products/bi/db/exadata/pdf/migration-to-exadata-whitepaper.pdf&quot; rel=&quot;nofollow&quot;&gt;Best Practices for Migrating to Oracle Exadata Storage Server&lt;/a&gt; states on page 7:
&lt;blockquote&gt;
Oracle Exadata Storage Server performs best when scanning at least 4MB contiguous chunks.  To ensure this occurs at an ASM level, the disk group&#039;s allocation unit size should be set to 4MB.&lt;/blockquote&gt;</description>
		<content:encoded><![CDATA[<p><a href="#comment-9983" rel="nofollow">@Dharmendra</a></p>
<p>Correct.  Statements that use parallel execution use Smart Scan; non-PX (serial) statements do not.</p>
<p>As the <a href="http://www.oracle.com/technology/products/bi/db/exadata/pdf/exadata-technical-whitepaper.pdf" rel="nofollow">Technical Overview of Exadata whitepaper</a> states on page 21:</p>
<blockquote><p>ASM automatically stripes the database data across Exadata disks and cells to ensure a balanced I/O load and optimum performance.</p></blockquote>
<p>And <a href="http://www.oracle.com/technology/products/bi/db/exadata/pdf/migration-to-exadata-whitepaper.pdf" rel="nofollow">Best Practices for Migrating to Oracle Exadata Storage Server</a> states on page 7:</p>
<blockquote><p>
Oracle Exadata Storage Server performs best when scanning at least 4MB contiguous chunks.  To ensure this occurs at an ASM level, the disk group&#8217;s allocation unit size should be set to 4MB.</p></blockquote>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dharmendra</title>
		<link>http://structureddata.org/2008/09/28/oracle-exadata-storage-server-and-the-hp-oracle-database-machine/comment-page-1/#comment-285</link>
		<dc:creator>Dharmendra</dc:creator>
		<pubDate>Tue, 15 Sep 2009 01:13:45 +0000</pubDate>
		<guid isPermaLink="false">http://structureddata.org/?p=133#comment-285</guid>
		<description>&lt;a href=&quot;#comment-9979&quot; rel=&quot;nofollow&quot;&gt;@Greg Rahn &lt;/a&gt;
Thanks again, Greg, for the detailed explanation.

&gt;First, Smart Scans are performed in Exadata on the data that PQ requests.
&gt;Nothing is really required, it just works. You don’t “tune” for Exadata.

As I understand from your answers so far, to enable Smart Scan, PQ needs to be enabled at the table/session/query level; otherwise data would be fetched in the normal fashion (i.e. all the data would be returned to the DB grid). Correct?


In the Technical Overview document, each Exadata cell is assigned a set of disks. How is data distributed across the Exadata cells? Is it done through ASM (1MB per disk/LUN in the disk group)?

TIA.</description>
		<content:encoded><![CDATA[<p><a href="#comment-9979" rel="nofollow">@Greg Rahn </a><br />
Thanks again, Greg, for the detailed explanation.</p>
<p>&gt;First, Smart Scans are performed in Exadata on the data that PQ requests.<br />
&gt;Nothing is really required, it just works. You don’t “tune” for Exadata.</p>
<p>As I understand from your answers so far, to enable Smart Scan, PQ needs to be enabled at the table/session/query level; otherwise data would be fetched in the normal fashion (i.e. all the data would be returned to the DB grid). Correct?</p>
<p>In the Technical Overview document, each Exadata cell is assigned a set of disks. How is data distributed across the Exadata cells? Is it done through ASM (1MB per disk/LUN in the disk group)?</p>
<p>TIA.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Greg Rahn</title>
		<link>http://structureddata.org/2008/09/28/oracle-exadata-storage-server-and-the-hp-oracle-database-machine/comment-page-1/#comment-284</link>
		<dc:creator>Greg Rahn</dc:creator>
		<pubDate>Mon, 14 Sep 2009 18:15:09 +0000</pubDate>
		<guid isPermaLink="false">http://structureddata.org/?p=133#comment-284</guid>
		<description>&lt;a href=&quot;#comment-9974&quot; rel=&quot;nofollow&quot;&gt;@Dharmendra&lt;/a&gt;

First, Smart Scans are performed in Exadata on the data that PQ requests.  Second, Exadata does not execute or parallelize queries; it simply scans and filters data.   Exadata is storage software, not database software.  It just knows how to apply some operations that the database software tells it to (like filtering/restriction).

Nothing is really required, it just works.  You don&#039;t &quot;tune&quot; for Exadata.

For this specific query I would guess the following would take place:
- partition elimination (this is not Exadata specific but since this query has a predicate on SALES.SALE_DATE and that is the partition key, then it will eliminate the partitions not required and only scan the 4 quarterly partitions for 2001)
- projection (smart scan will only send the required columns back to the DB grid)
- possible use of bloom filter (depending on the selectivity of the REGION_NAME predicate, a bloom filter may be created from the values for REG_ID and pushed down to Exadata to be used as a Smart Scan filter)

The results, post Smart Scan, will be sent to the DB grid and the joins performed there.  So assuming a Bloom/Join filter on REGION.REG_ID, then SALES and REGION would be joined, aggregated, and ordered in the DB grid.

My guess is this query would run for a few seconds.  Assuming an equal distribution of data across years, 1/10 of 1TB is 100GB.  The scan rate is 14GB/s for a 1 rack DB Machine, so the table scan of 100GB of SALES would probably take around 7 seconds, and the join would also be quite fast with 64 Harpertown CPU cores doing that and the aggregation.</description>
		<content:encoded><![CDATA[<p><a href="#comment-9974" rel="nofollow">@Dharmendra</a></p>
<p>First, Smart Scans are performed in Exadata on the data that PQ requests.  Second, Exadata does not execute or parallelize queries; it simply scans and filters data.   Exadata is storage software, not database software.  It just knows how to apply some operations that the database software tells it to (like filtering/restriction).</p>
<p>Nothing is really required, it just works.  You don&#8217;t &#8220;tune&#8221; for Exadata.</p>
<p>For this specific query I would guess the following would take place:<br />
- partition elimination (this is not Exadata specific but since this query has a predicate on SALES.SALE_DATE and that is the partition key, then it will eliminate the partitions not required and only scan the 4 quarterly partitions for 2001)<br />
- projection (smart scan will only send the required columns back to the DB grid)<br />
- possible use of bloom filter (depending on the selectivity of the REGION_NAME predicate, a bloom filter may be created from the values for REG_ID and pushed down to Exadata to be used as a Smart Scan filter)</p>
<p>The results, post Smart Scan, will be sent to the DB grid and the joins performed there.  So assuming a Bloom/Join filter on REGION.REG_ID, then SALES and REGION would be joined, aggregated, and ordered in the DB grid.</p>
<p>My guess is this query would run for a few seconds.  Assuming an equal distribution of data across years, 1/10 of 1TB is 100GB.  The scan rate is 14GB/s for a 1 rack DB Machine, so the table scan of 100GB of SALES would probably take around 7 seconds, and the join would also be quite fast with 64 Harpertown CPU cores doing that and the aggregation.</p>
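<p>As a rough back-of-envelope check of the figures quoted above, the partition and scan arithmetic can be worked through as follows. This is only a sketch: it assumes decimal units (1TB = 1000GB), evenly sized quarterly partitions, and that the quoted full-rack scan rate is actually sustained. It ignores any further reduction from projection or bloom filtering, which affect the data returned rather than the data scanned.</p>

```python
# Figures taken from the thread; units and even data distribution are assumptions.
table_size_gb = 1000.0       # 1TB SALES table, decimal units assumed
years_of_data = 10
partitions_per_year = 4      # one partition per quarter
scan_rate_gb_per_s = 14.0    # quoted scan rate for a 1 rack DB Machine

total_partitions = years_of_data * partitions_per_year
partitions_scanned = partitions_per_year  # only the four quarters of 2001

# Partition elimination leaves 4 of 40 partitions to scan.
scanned_gb = table_size_gb * partitions_scanned / total_partitions
scan_seconds = scanned_gb / scan_rate_gb_per_s

print(f"partitions scanned: {partitions_scanned} of {total_partitions}")
print(f"data scanned: {scanned_gb:.0f} GB")
print(f"scan time at {scan_rate_gb_per_s:.0f} GB/s: {scan_seconds:.1f} s")
```

<p>With these inputs the scan covers 100GB and takes roughly 7 seconds at 14GB/s.</p>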
]]></content:encoded>
	</item>
	<item>
		<title>By: Dharmendra</title>
		<link>http://structureddata.org/2008/09/28/oracle-exadata-storage-server-and-the-hp-oracle-database-machine/comment-page-1/#comment-283</link>
		<dc:creator>Dharmendra</dc:creator>
		<pubDate>Mon, 14 Sep 2009 15:05:20 +0000</pubDate>
		<guid isPermaLink="false">http://structureddata.org/?p=133#comment-283</guid>
		<description>&lt;a href=&quot;#comment-9970&quot; rel=&quot;nofollow&quot;&gt;@Greg Rahn &lt;/a&gt;
Thanks again Greg!

Regarding 2, aren&#039;t Smart Scans done at the Exadata layer?

Can you tell me how this (made-up) query would be parallelized at the Exadata level, i.e. which operations will be executed in Exadata (Smart Scans) and which at the host (DB grid) level?

What is required at the table/session/query level to get the best performance for this query on the DB Machine?


select  r.reg_name, p.prod_name, sum(s.rev_amt)
from sales s, region r, product p
where s.reg_id = r.reg_id
and   s.prd_id = p.prd_id
and   r.region_name in ( &#039;ASIA PACIFIC&#039;,&#039;EMEA&#039; )
and   s.sale_date between to_date(&#039;01/01/2001&#039;, &#039;MM/DD/YYYY&#039;)
                  and     to_date(&#039;12/31/2001&#039;, &#039;MM/DD/YYYY&#039;)
group by r.reg_name, p.prod_name
order by 1, 2, 3 desc

The SALES table is huge (1TB), holds 10 years of sales data, and is partitioned on sale date (one partition per quarter).</description>
		<content:encoded><![CDATA[<p><a href="#comment-9970" rel="nofollow">@Greg Rahn </a><br />
Thanks again Greg!</p>
<p>Regarding 2, aren&#8217;t Smart Scans done at the Exadata layer?</p>
<p>Can you tell me how this (made-up) query would be parallelized at the Exadata level, i.e. which operations will be executed in Exadata (Smart Scans) and which at the host (DB grid) level?</p>
<p>What is required at the table/session/query level to get the best performance for this query on the DB Machine?</p>
<p>select  r.reg_name, p.prod_name, sum(s.rev_amt)<br />
from sales s, region r, product p<br />
where s.reg_id = r.reg_id<br />
and   s.prd_id = p.prd_id<br />
and   r.region_name in ( 'ASIA PACIFIC','EMEA' )<br />
and   s.sale_date between to_date('01/01/2001', 'MM/DD/YYYY')<br />
                  and     to_date('12/31/2001', 'MM/DD/YYYY')<br />
group by r.reg_name, p.prod_name<br />
order by 1, 2, 3 desc</p>
<p>The SALES table is huge (1TB), holds 10 years of sales data, and is partitioned on sale date (one partition per quarter).</p>
]]></content:encoded>
	</item>
</channel>
</rss>
