Re: Comparing apples to apples on Exadata

From: Tanel Poder <tanel@xxxxxxxxxxxxxx>
To: wbeldma@xxxxxx
Date: Fri, 15 Dec 2017 01:06:37 +0200

In the fast case, the direct path read (thus also smart scan) has kicked
in. In the slow case it hasn't and regular buffered IO via buffer cache is
used.

One indicator of that is the fast query having a *Cell offload* column in
its stats, the slow query doesn't (therefore there was no smart scan).

The other indicator is that you have "*cell...physical read*" wait events
showing up in the slow scenario. Smart scan can be so fast asynchronously
feeding data for your query, so that in the fast case you see only CPU
usage and no IO wait events at all.

There are quite a few different inputs that affect the direct path read
decision, affecting Oracle's estimation of how much IO it would have to do
in either case. Usually bigger buffer cache and smaller segment sizes
(partitioned vs non-partitioned tables!) cause more buffered reads and
bigger segments & smaller buffer cache end up favoring direct path
reads/smart scans more.

Also, there are more reasons why smart scan doesn't get used even if direct
path reads have kicked in.

I have some high level explanations and a list of reasons affecting the
decision here:

https://blog.tanelpoder.com/2012/09/03/optimizer-statistics-driven-direct-path-read-decision-for-full-table-scans-_direct_read_decision_statistics_driven/

And there's plenty of low level geekery & details straight from the source:

https://blogs.oracle.com/smartscan-deep-dive/when-bloggers-get-it-wrong-part-1
https://blogs.oracle.com/smartscan-deep-dive/when-bloggers-get-it-wrong-part-2

--
Tanel Poder
https://blog.tanelpoder.com

On Thu, Dec 14, 2017 at 11:14 PM, Will Beldman <wbeldma@xxxxxx> wrote:

I have two (nearly) identical databases on Exadata.

I have a simple query and can force it to use the same execution plan on
both databases.

On database 1, the query finishes consistently around 1 second. The cost
analyzer shows

Global Stats

Elapsed
Time(s)

Cpu
Time(s)

IO
Waits(s)

Other
Waits(s)

Fetch
Calls

Buffer
Gets

Read
Reqs

Read
Bytes

Cell
Offload

0.97

0.94

0.02

0.01

1

380K

6026

3GB

95.13%

SQL Plan Monitoring Details (Plan Hash Value=1373192284)

Id

Operation

Name

Rows
(Estim)

Cost

Time
Active(s)

Start
Active

Execs

Rows
(Actual)

Read
Reqs

Read
Bytes

Cell
Offload

Mem
(Max)

Activity
(%)

Activity Detail
(# samples)

0

SELECT STATEMENT

1

+1

1

3

1

. SORT GROUP BY

3

105K

2

+0

1

3

2048

100.00

Cpu (1)

2

.. TABLE ACCESS STORAGE FULL

##TABLE_NAME##

4M

105K

1

+1

1

4M

6026

3GB

95.13%

3M

On database 2, the query finishes anywhere between 10 seconds to 60
seconds. The cost analyzer shows
Global Stats

Elapsed
Time(s)

Cpu
Time(s)

IO
Waits(s)

Application
Waits(s)

Cluster
Waits(s)

Fetch
Calls

Buffer
Gets

Read
Reqs

Read
Bytes

Cell
Offload

21

10

12

0.00

0.00

1

4M

62926

4GB

81.24%

SQL Plan Monitoring Details (Plan Hash Value=1373192284)

Id

Operation

Name

Rows
(Estim)

Cost

Time
Active(s)

Start
Active

Execs

Rows
(Actual)

Read
Reqs

Read
Bytes

Cell
Offload

Mem
(Max)

Activity
(%)

Activity Detail
(# samples)

0

SELECT STATEMENT

19

+2

1

3

1

. SORT GROUP BY

3

118K

19

+2

1

3

2048

2

.. TABLE ACCESS STORAGE FULL

##TABLE_NAME##

4M

118K

20

+1

1

4M

62926

4GB

81.24%

3M

100.00

Cpu (9)
cell single block physical read (11)

I can't explain why one database spends so little time on the data
retrieval while the other one spends almost all it's time trying to
retrieve the data.

I am guessing it is due to smart scan offloading. If I force data reads by
adding the hint +OPT_PARAM('cell_offload_processing' 'false') and run the
query a few times, I can get similar execution times on both databases.

Database 1:

Global Stats

Elapsed
Time(s)

Cpu
Time(s)

IO
Waits(s)

Fetch
Calls

Buffer
Gets

Read
Reqs

Read
Bytes

3.86

2.96

0.91

1

380K

6015

3GB

SQL Plan Monitoring Details (Plan Hash Value=1373192284)

Id

Operation

Name

Rows
(Estim)

Cost

Time
Active(s)

Start
Active

Execs

Rows
(Actual)

Read
Reqs

Read
Bytes

Mem
(Max)

Activity
(%)

Activity Detail
(# samples)

0

SELECT STATEMENT

2

+2

1

3

1

. SORT GROUP BY

3

105K

2

+2

1

3

2048

2

.. TABLE ACCESS STORAGE FULL

##TABLE_NAME##

4M

105K

4

+0

1

4M

6015

3GB

100.00

Cpu (4)

Database 2:

Global Stats

Elapsed
Time(s)

Cpu
Time(s)

IO
Waits(s)

Cluster
Waits(s)

Fetch
Calls

Buffer
Gets

Read
Reqs

Read
Bytes

9.40

7.11

2.28

0.00

1

4M

9546

324MB

SQL Plan Monitoring Details (Plan Hash Value=1373192284)

Id

Operation

Name

Rows
(Estim)

Cost

Time
Active(s)

Start
Active

Execs

Rows
(Actual)

Read
Reqs

Read
Bytes

Mem
(Max)

Activity
(%)

Activity Detail
(# samples)

0

SELECT STATEMENT

8

+2

1

3

1

. SORT GROUP BY

3

118K

8

+2

1

3

2048

2

.. TABLE ACCESS STORAGE FULL

##TABLE_NAME##

4M

118K

9

+2

1

4M

9546

324MB

100.00

Cpu (7)
cell multiblock physical read (1)
cell single block physical read (1)

Is Smart Scan really the culprit here and if so, why isn't database 2
using it as well as database 1? If not, how else can I account for such
wild discrepancies for such similar data?

Follow-Ups:
- Re: Comparing apples to apples on Exadata
  - From: Jonathan Lewis

References:
- Comparing apples to apples on Exadata
  - From: Will Beldman

Re: Comparing apples to apples on Exadata

Other related posts: