Re: Strange cell offload behavior

From: Fairlie Rego <fairlie.rego@xxxxxxxxx>
To: xt.and.r@xxxxxxxxx
Date: Wed, 26 May 2021 07:31:42 +1000

Well I have environments where we have applied the below fix on top of 19.10
(since we can't afford the risk)

*Bug 31626438 LARGE BUFFER CACHE ON EXADATA DISABLES SERIAL SMART SCAN
OFFLOAD*

REDISCOVERY INFORMATION:
If finding it difficult to get serial scan to offload when a large buffer
cache is configured, this bug might be rediscovered.

Large is a subjective and relative term and the published notes do not
indicate
what "large" means

Ta
Fairlie
On Wed, May 26, 2021 at 5:32 AM Sayan Malakshinov <xt.and.r@xxxxxxxxx>
wrote:

Hi Lok,
+ Roger MacNicol in BCC,

Yes, you are absolutely correct, from your trace we can see:
kcbstt 9836 - small table threshold (blocks)
keep_nb 0  - no keep pool
kcbnbh 255727 - buffer cache size (blocks)
nblks 541438 - object size in blocks
vlot 500  - very large object threshold, 255727 * 500% = VLOT=1278635
kcbimd: ...kcbnbh 25572 - object is bigger than 10% of buffer cache
(255727*10%=25572), so it "is medium"
ntcache 0 - object is not cached (0 blocks in the buffer cache)

Hi @Roger, we read your

NSMTIO shows the decision (but* we don't give any internal details of

the cost model used*)

But could you give any ideas, please, why in this simple case (segment
size = 212% of buffer, no blocks in the buffer cache, no prewarm, no
prefetch, serial read) it chooses buffer reads?
Logically, we can make an assumption that it's related to the estimated
returning "Bytes" from the plan:
more bytes to return - less effective smart scan, but when E-Rows=1 ?!

PS. don't want to speculate but interesting switching point: more or less
1kB :D

On Tue, May 25, 2021 at 9:33 PM Lok P <loknath.73@xxxxxxxxx> wrote:

We have sga_target set as 6GB and no specific value for dba_cache_size
has been set.

Below two lines suggest that the object was > medium table threshold and
< very large table threshold. But the immediate next line states it's going
with buffer cache reads. But nothing mentioned about why it decided to do
so?

NSMTIO: qertbFetch:[MTT < OBJECT_SIZE < VLOT]: Checking cost to read from
caches(local/remote) and checking storage reduction factors (OLTP/EHCC Comp)

NSMTIO: kcbdpc:NoDirectRead:[CACHE_READ]: tsn: 53, objd: 1063361, objn:
1063361

Should i also trace the one with less number of columns which is going
for smartscan and compare? But still if this trace won't give details on
how it decided that. No meaning in tracing that. And also the odd part is
increasing the number of columns in the SELECT list should actually make it
more favorable for smartscan because of more block number but somehow it's
working the other way here.

On Tue, May 25, 2021 at 9:50 PM Lok P <loknath.73@xxxxxxxxx> wrote:

Thank you so much Sayan.

I tried tracing ,and below is the key section of the trace
(10358+NSMTIO) for the original query(i.e. will all the columns). I was
trying to understand it , but found it a bit confusing. I do see it has
taken the final decision on doing cache scan but not cell offload but not
able to understand the cause. Can you guide me here on what exactly it's
pointing to? Why are we failing to get the benefit of SMART SCAN?

NSMTIO: fple: FALSE, sage: TRUE, isTableSpaceOnSage: 1, invalid table
sapce number: FALSE, prmsdec: TRUE, is kernel txn table space encrypted:
-1,is enc_ktid encrypted: -1, is sage enabled based on data -layer checks:
1, isQesSageEnabled: TRUE

*** 2021-05-25T07:15:15.503247-04:00

kcbism: islarge 1 next 0 nblks 914171 type 3, bpid 65535, kcbisdbfc 0
kcbnhl 8192 kcbstt 9836 keep_nb 0 kcbnbh 255727 kcbnwp 4

NSMTIO: fple: FALSE, sage: TRUE, isTableSpaceOnSage: 1, invalid table
sapce number: FALSE, prmsdec: TRUE, is kernel txn table space encrypted:
-1,is enc_ktid encrypted: -1, is sage enabled based on data -layer checks:
1, isQesSageEnabled: TRUE

NSMTIO: fple: FALSE, sage: TRUE, isTableSpaceOnSage: 1, invalid table
sapce number: FALSE, prmsdec: TRUE, is kernel txn table space encrypted:
-1,is enc_ktid encrypted: -1, is sage enabled based on data -layer checks:
1, isQesSageEnabled: TRUE

NSMTIO: fple: FALSE, sage: TRUE, isTableSpaceOnSage: 1, invalid table
sapce number: FALSE, prmsdec: TRUE, is kernel txn table space encrypted:
-1,is enc_ktid encrypted: -1, is sage enabled based on data -layer checks:
1, isQesSageEnabled: TRUE

NSMTIO: fple: FALSE, sage: TRUE, isTableSpaceOnSage: 1, invalid table
sapce number: FALSE, prmsdec: TRUE, is kernel txn table space encrypted:
-1,is enc_ktid encrypted: -1, is sage enabled based on data -layer checks:
1, isQesSageEnabled: TRUE

kcbism: islarge 0 next 0 nblks 36 type 3, bpid 65535, kcbisdbfc 0 kcbnhl
8192 kcbstt 9836 keep_nb 0 kcbnbh 255727 kcbnwp 4

kcbism: islarge 1 next 0 nblks 541438 type 2, bpid 3, kcbisdbfc 0 kcbnhl
8192 kcbstt 9836 keep_nb 0 kcbnbh 255727 kcbnwp 4

kcbimd: nblks 541438 kcbstt 9836 kcbnbh 25572 bpid 3 kcbisdbfc 0
is_medium 0

kcbivlo: nblks 541438 vlot 500 pnb 255727 kcbisdbfc 0 is_large 0

NSMTIO: qertbFetch:[MTT < OBJECT_SIZE < VLOT]: Checking cost to read
from caches(local/remote) and checking storage reduction factors (OLTP/EHCC
Comp)

NSMTIO: kcbdpc:NoDirectRead:[CACHE_READ]: tsn: 53, objd: 1063361, objn:
1063361

ckpt: 0, nblks: 541438, ntcache: 0, ntdist:0

Direct Path for pdb 0 tsn 53 objd 1063361 objn 1063361

Direct Path 0 ckpt 0, nblks 541438 ntcache 0 ntdist 0

Direct Path mndb 0 tdiob 859 txiob 0 tciob 133

Direct path diomrc 8 dios 2 kcbisdbfc 0

kcbdpc: kx 8 kc 8 lhs 4 rhs NSMTIO: Additional Info: VLOT=1278635

Object# = 1063361, Object_Size = 541438 blocks

SqlId = XXXXXXX, plan_hash_value = 529217307, Partition# = 0

NSMTIO: fple: FALSE, sage: TRUE, isTableSpaceOnSage: 1, invalid table
sapce number: FALSE, prmsdec: TRUE, is kernel txn table space encrypted:
-1,is enc_ktid encrypted: -1, is sage enabled based on data -layer checks:
1, isQesSageEnabled: TRUE

NSMTIO: kcbzib: Cache Scan triggered for tsn = 53, rdba=0x27c04404, fn =
159, kobjd = 1063361, block cnt = 124, noncontig = FALSE pre-warm = FALSE,
prefetch = FALSE

NSMTIO: kcbzib: Cache Scan triggered for tsn = 53, rdba=0x27c0440c, fn =
159, kobjd = 1063361, block cnt = 116, noncontig = FALSE pre-warm = FALSE,
prefetch = FALSE

NSMTIO: kcbzib: Cache Scan triggered for tsn = 53, rdba=0x27c04414, fn =
159, kobjd = 1063361, block cnt = 108, noncontig = FALSE pre-warm = FALSE,
prefetch = FALSE

NSMTIO: kcbzib: Cache Scan triggered for tsn = 53, rdba=0x27c0441c, fn =
159, kobjd = 1063361, block cnt = 100, noncontig = FALSE pre-warm = FALSE,
prefetch = FALSE

On Mon, May 24, 2021 at 4:39 AM Sayan Malakshinov <xt.and.r@xxxxxxxxx>
wrote:

Hi Lok,

https://blogs.oracle.com/smartscan-deep-dive/when-bloggers-get-it-wrong-part-1

https://blogs.oracle.com/smartscan-deep-dive/when-bloggers-get-it-wrong-part-2

https://blogs.oracle.com/smartscan-deep-dive/correct-syntax-for-the-table_stats-hint
Have you read this series of articles by Roger MacNicol?
You just need to analyze or provide the following trace events for your
queries:
https://github.com/xtender/xt_scripts/blob/master/trace_nsmtio.sql
-- Buffer cache decision making:
alter session set events '10358 trace name context forever, level 2';
-- Direct I/O decision making:
alter session set events 'trace [NSMTIO] disk highest';

On Sun, May 23, 2021 at 7:43 PM Lok P <loknath.73@xxxxxxxxx> wrote:

Hello Listers,

Its version 19C of oracle and optimizer_features_enable 19.1.0.

While working on a separate issue, we came across a situation which is
a bit odd. A simple SELECT query fetching ~13 columns from a table - TAB1
showing odd behaviour. Sometimes within seconds and sometimes more than a
minute. When , It was running for more than a minute , it happened to be
clear that it's not doing cell offloading from its sql monitor. And then
by
just removing one column from its SELECT list , the smartscan started
happening and the query finished in a few seconds.

Then I started running the same query for all the ~13 columns in the
SELECT list but this time by setting session level "serial_direct_read"=
always. And here we see the smart scan again started happening and the
query finished in quick time. Is this behaviour because of any bug ?
Because I have not seen any such criteria in which the smartscan is
restricted by the number of columns in the SELECT list.

Attached is the sample table script with the test case which produces
the above scenario.  Table TAB1 is a non partition table with size ~7GB
and
there are ~64million rows in it.

Note- Its a third party database and we see few of the optimizer
parameters set to non default as its clear from the outline section. But
at
least none of these should be impacting the choice of smartscan.

SELECT      A_ID,       A_ACTN,       RNM,       FNM,       OVL,
NWVL,       COL1,       COL2,       COL3,       COL4,       COL5,
COL6,       COL7
  FROM USER2.TAB1
WHERE     OVL <> NWVL
       AND (   (RNM = 'XXXX' AND COL1 = '1')
            OR (RNM = 'YYYY' AND COL1 = '1'))

Regards
Lok

--
Best regards,
Sayan Malakshinov
Oracle performance tuning engineer
Oracle ACE Associate
http://orasql.org

--
Best regards,
Sayan Malakshinov
Oracle performance tuning engineer
Oracle ACE Associate
http://orasql.org

--
Fairlie Rego
Executive Database Architect
www.technoconsulting.com.au
http://www.linkedin.com/in/fairlierego
https://fairlierego.wordpress.com/

Twitter @fairlierego

References:
- Strange cell offload behavior
  - From: Lok P
- Re: Strange cell offload behavior
  - From: Sayan Malakshinov
- Re: Strange cell offload behavior
  - From: Lok P
- Re: Strange cell offload behavior
  - From: Lok P
- Re: Strange cell offload behavior
  - From: Sayan Malakshinov

Re: Strange cell offload behavior

Other related posts: