Re: indexing null values curious case?

From: Jonathan Lewis <jonathan@xxxxxxxxxxxxxxxxxx>
To: Oracle L <oracle-l@xxxxxxxxxxxxx>
Date: Thu, 21 Nov 2019 14:16:08 +0000

"Function-based" indexes don't do skip scans - it's one of the many
bugs/limitations/failings of FBIs.
See: https://jonathanlewis.wordpress.com/2014/02/25/fbi-skip-scan/

Regards
Jonathan Lewis

________________________________________
From: Vishnu Potukanuma <vishnupotukanuma@xxxxxxxxx>
Sent: 21 November 2019 13:05
To: Jonathan Lewis
Cc: Oracle L
Subject: Re: indexing null values curious case?

Hi Jonathan,

Thanks for your email, your response was invaluable, I kept on thinking what
was happening.
It never occurred to me that the null values could overlap, yap, yes it is the
correct behavior.

But the following still bugs me and could not get a valid explanation as to why
oracle doesn't consider a skip scan even in the best case scenario.

Regarding mark2 is null.
I just created a same test but this time favouring index skip scan, even in the
best possible way with only one distinct value for mark1.
the test is as follows:
create table temp (roll number, name varchar2(20), mark1 number, mark2 number,
mark3 number);
insert into temp select rownum, dbms_random.string(0,20), 1,
dbms_random.value(0,100), dbms_random.value(0,100) from dual connect by level <
100000;
here i made sure that mark1 has only 1 distinct value.
insert into temp values (1000012, 'VISHNU',1, null, 1000);
commit;
created the index in the best possible case as well.
create index idx on temp(mark1, mark2,1);
exec dbms_stats.gather_table_statS('VISHNU','TEMP',cascade=>true);

now the query goes with the full table scan..
SQL> select * from temp where mark2 is null;
   1000012 VISHNU 1    1000
Execution Plan
----------------------------------------------------------
Plan hash value: 1896031711
--------------------------------------------------------------------------
| Id  | Operation  | Name | Rows  | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  | |     1 |    71 |   308   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEMP |     1 |    71 |   308   (1)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("MARK2" IS NULL)

I was wondering why the optimizer was not even considering other scans, and
took the optimizer trace, it clearly didn't consider skip scan as an option and
ended up with the full table scan, it was considering only a index full scan.

=======================================
SPD: BEGIN context at query block level
=======================================
Query Block SEL$1 (#0)
Return code in qosdSetupDirCtx4QB: NOCTX
=====================================
SPD: END context at query block level
=====================================
Access path analysis for TEMP
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for TEMP[TEMP]
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE

kkecdn: Single Table Predicate:"TEMP"."MARK2" IS NULL
  Column (#4): MARK2(NUMBER)
    AvgLen: 21 NDV: 99999 Nulls: 1 Density: 0.000010 Min: 0.000284 Max:
99.999341
  Table: TEMP  Alias: TEMP
    Card: Original: 100000.000000  Rounded: 1  Computed: 1.000000  Non
Adjusted: 1.000000
  Scan IO  Cost (Disk) =   307.000000
  Scan CPU Cost (Disk) =   29018761.440000
  Cost of predicates:
    io = NOCOST, cpu = 20.000000, sel = 0.000010 flag = 2048  ("TEMP"."MARK2"
IS NULL)
  Total Scan IO  Cost  =   307.000000 (scan (Disk))
                         + 0.000000 (io filter eval) (= 0.000000 (per row) *
100000.000000 (#rows))
                       =   307.000000
  Total Scan CPU  Cost =   29018761.440000 (scan (Disk))
                         + 2000000.000000 (cpu filter eval) (= 20.000000 (per
row) * 100000.000000 (#rows))
                       =   31018761.440000
  Access Path: TableScan
    Cost:  307.870629  Resp: 307.870629  Degree: 0
      Cost_io: 307.000000  Cost_cpu: 31018761
      Resp_io: 307.000000  Resp_cpu: 31018761
kkofmx: index filter:"TEMP"."MARK2" IS NULL

****** Costing Index IDX
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER
  Access Path: index (FullScan)
    Index: IDX
    resc_io: 533.000000  resc_cpu: 23795958
    ix_sel: 1.000000  ix_sel_with_filters: 1.0000e-05
    Cost: 533.724036  Resp: 533.724036  Degree: 1
  Best:: AccessPath: TableScan
         Cost: 307.870629  Degree: 1  Resp: 307.870629  Card: 1.000000  Bytes:
0.000000

***************************************

Then again i tried with the hint index skip scan:
select /*+INDEX_SS(A IDX)*/ * from temp A where mark2 is null;
In this case optimizer simply ignored the skip_scan hint and and from the
optimizer trace 10053 12, clearly it didn't consider skip scan.:

=====================================
Access path analysis for TEMP
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for TEMP[A]
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
kkecdn: Single Table Predicate:"A"."MARK2" IS NULL
  Column (#4): MARK2(NUMBER)
    AvgLen: 21 NDV: 99999 Nulls: 1 Density: 0.000010 Min: 0.000284 Max:
99.999341
  Table: TEMP  Alias: A
    Card: Original: 100000.000000  Rounded: 1  Computed: 1.000000  Non
Adjusted: 1.000000
kkofmx: index filter:"A"."MARK2" IS NULL
****** Costing Index IDX
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER
  Access Path: index (FullScan)
    Index: IDX
    resc_io: 533.000000  resc_cpu: 23795958
    ix_sel: 1.000000  ix_sel_with_filters: 1.0000e-05
    Cost: 533.724036  Resp: 533.724036  Degree: 1
  Best:: AccessPath: IndexRange
  Index: IDX
         Cost: 533.724036  Degree: 1  Resp: 533.724036  Card: 1.000000  Bytes:
0.000000
***************************************

  Hint Report:
    Query Block: SEL$1
      Table: ("A"@"SEL$1")
        Unused: INDEX_SS(A IDX)
  End Hint Report

This time, i went an another step ahead, giving more chance.
modified the mark1 as not null.
alter table temp modify (mark1 not null);
recreated and index and collected the statistics (this may not be necessary).
now with not null constraint we are 100% sure that mark1 is never null, and now
since only 1 distinct value, and it would have been straight forward to have
selected a skip scan, provided I even added 1 (virtual column) to the index.
even in this case, the optimizer goes with a full table scan, even with the
hint.

Thanks,
Vishnu

On Thu, Nov 21, 2019 at 5:48 PM Jonathan Lewis
<jonathan@xxxxxxxxxxxxxxxxxx<mailto:jonathan@xxxxxxxxxxxxxxxxxx>> wrote:
Vishnu,

Your STUDENTS example looks like a bug.  (The stats for the NULL and '' case
are identical, by the way, so it's likely to be related to the way that Oracle
stores the '' as the DEFAULT for the column definition).

re:
select * from temp where mark1 is null; --> it always goes with a full table
scan
It's got to; Oracle know that there are some rows with mark1 null, some with
mark2 null, so it have to asume that the two sets of rows may overlap to give a
row which does not appear in thei index.

select * from temp where mark1 is null and mark2 = 123123;
Now you're looking for a row which (if it exists) will be in the index.  You
could equally change the mark2 = 123123 to mark2 is not null.

re:
all i did was add an extra column to the index (virtual column).
So now you have a column in the index which is NEVER null, so every row in the
table will appear in the index, so the index can be used for mark1 is null

re
mark2 is null
The possibly paths are
index skip scan
index full scan
index fast full scan
tablescan

The choice of tablescan is almost certainly dictated by cost.

BOTTOM LINE - your students case looks like a bug relating to the special case
of the way Oracle handles the DEFAULT for a constant empty string. The rest are
correct behaviour.

Regards
Jonathan Lewis

________________________________________
From: oracle-l-bounce@xxxxxxxxxxxxx<mailto:oracle-l-bounce@xxxxxxxxxxxxx>
<oracle-l-bounce@xxxxxxxxxxxxx<mailto:oracle-l-bounce@xxxxxxxxxxxxx>> on behalf
of Vishnu Potukanuma
<vishnupotukanuma@xxxxxxxxx<mailto:vishnupotukanuma@xxxxxxxxx>>
Sent: 21 November 2019 11:07
To: Andy Sayer
Cc: oracle-l@xxxxxxxxxxxxx<mailto:oracle-l@xxxxxxxxxxxxx>
Subject: Re: indexing null values curious case?

Apparently looks like this entire section dealing with the null is not
developed in its entirety.. the following is a different case as well regarding
the nulls.

even with a multi-column index, considering another scenario as the following:
create table temp (roll number, name varchar2(20), mark1 number, mark2 number);
insert into temp select rownum, dbms_random.string(0,20),
round(dbms_random.value(0,100), round(dbms_random.value(0,200)) from dual
connect by level < 1000000;
insert into temp (roll, name, mark2) select rownum, dbms_random.string(0,20),
round(dbms_random.value(0,200) from dual connect by level < 10;
insert into temp (roll, name, mark1) select rownum, dbms_random.string(0,20),
round(dbms_random.value(0,200) from dual connect by level < 10;
commit;
create index idx on temp(mark1, mark2);
Here i have only 10 rows from each of the columns as null;

exec dbms_Stats.gather_table_stats('VISHNU','TEMP',CASCADE=>TRUE);

from the leaf blocks we can find that the nulls are stored in the index.
The leaf blocks containing the NULLs is as follows:
row#0[8019] flag: -------, lock: 0, len=13
col 0; len 2; (2):  c1 02
col 1; NULL
col 2; len 6; (6):  01 c0 21 de 00 00
row#1[7992] flag: -------, lock: 2, len=14
col 0; len 3; (3):  c2 02 04
col 1; NULL
col 2; len 6; (6):  01 c0 21 de 00 02
row#2[8006] flag: -------, lock: 0, len=13
col 0; NULL
col 1; len 2; (2):  c1 02
col 2; len 6; (6):  01 c0 21 de 00 01
row#3[7978] flag: -------, lock: 2, len=14
col 0; NULL
col 1; len 3; (3):  c2 0d 20
col 2; len 6; (6):  01 c0 21 de 00 03

if we issue a query like select * from temp where mark1 is null; --> it always
goes with a full table scan even when 10 rows only have null values and
apparently the index leaf blocks do contain entries regarding the null values),
for the leading as well as the tail columns as shown above. But the wierd thing
is if we include both the columns mark1, mark2 in the predicate clause as
following:
select * from temp where mark1 is null and mark2 = 123123;
the optimizer selects the index.

now comes even better part.
drop the index  and create the index as follows:
create index idx on temp(mark1,mark2,1);
all i did was add an extra column to the index (virtual column).
collected the statistics and ran the query back again.
select * from temp where mark1 is null;
this time it selects the index properly which makes me wonder if they have
included or written the code or made the optimizer to consider the nulls in the
predicate clause only in the presence of virtual columns.

Another part I believe is that Oracle's statistics (including histogram doesn't
maintain any statistics regarding nulls, which makes sense in a way), as these
are only explicit cases and the concept of statistics  optimizer everything
resolves only around the values that are present but not null..

also an another distinct behavior observed is this regarding of whether the
index is
create index idx on temp(mark1,mark2)
or create index idx on temp(mark1,mark2,1);

any query with predicates involving only mark2 regarding nulls goes with full
table scan forget skip scans etc etc.
queries like
select * from temp where mark1 is not null and mark2 is null;
select * from temp where mark2 is null;
apparently some explanation for this can be the way oracle doesn't include
nulls in the histograms.. and I am tired for the day!

I can't make any assumptions as of which of the optimizer code has to be
developed just with these observations, but the way oracle handles nulls should
be improved consistently.

can someone please tell me if i am missing any thing here? Oracle is a like a
ocean. so many concepts no wonder i became bald at a very young age.

Thanks,
Vishnu

On Thu, Nov 21, 2019 at 2:05 PM Vishnu Potukanuma
<vishnupotukanuma@xxxxxxxxx<mailto:vishnupotukanuma@xxxxxxxxx><mailto:vishnupotukanuma@xxxxxxxxx<mailto:vishnupotukanuma@xxxxxxxxx>>>
wrote:
This is reproducible case and probably hitting an undeveloped section of oracle
code or a bug. Sure, let me see what I can do.

Thanks,
Vishnu

On Thu, Nov 21, 2019 at 1:55 PM Andy Sayer
<andysayer@xxxxxxxxx<mailto:andysayer@xxxxxxxxx><mailto:andysayer@xxxxxxxxx<mailto:andysayer@xxxxxxxxx>>>
wrote:
Hi Vishnu

File your reproducible test case with Oracle support if you suspect you’re
hitting a bug.

It is not expected for the existence of a special index to change the result of
a query.

If you want to find null values and the selectivity is decent for an index then
I would use an index on (column_name,0).

Thanks,
Andy

On Thu, 21 Nov 2019 at 08:18, Vishnu Potukanuma
<vishnupotukanuma@xxxxxxxxx<mailto:vishnupotukanuma@xxxxxxxxx><mailto:vishnupotukanuma@xxxxxxxxx<mailto:vishnupotukanuma@xxxxxxxxx>>>
wrote:
Hi,

If all the columns in the index are null then the entries are not stored in the
index. This is a known fact and empty strings are stored as null values.
combining both breaks the consistency part where the query returns wrong
results.

this is really a worst case as no one creates such a index as
(column_name,null) or (column_name,'') as this really doesn't make sense but
this breaks the consistency aspect of database  as the query gives wrong result
as the optimizer consider index even when the leading column is null.

the scenario is basically as following:
Create table students (student_id number, name varchar2(20);
insert into students select rownum, dbms_random.string(0,20) form dual connect
by level < 1000000;
insert into students (name) select dbms_random.string(0,20) from dual connect
by level < 100;
commit;
create index idx on student(student_id,null);
exec dbms_stats.gather_Table_stats('VISHNU','students',cascade=>True);
select * from students where student_id is null;
in this case it goes with the full table scan which is correct since both nulls
are not stored.

so now we drop the index idx;
Here we create the index as this.
create index idx on student(student_id,'');
exec dbms_stats.gather_Table_stats('VISHNU','students',cascade=>True);
select * from students where student_id is null;
now Oracle retrives the results as 0, with the index scan.

the leaf blocks store the details as following since the empty strings are
treated as nulls
row#0[8019] flag: -------, lock: 0, len=13
col 0; len 2; (2):  c1 02
col 1; NULL
col 2; len 6; (6):  01 c0 21 dc 00 00
row#1[8006] flag: -------, lock: 2, len=13
col 0; len 2; (2):  c1 03
col 1; NULL
col 2; len 6; (6):  01 c0 21 dc 00 02
but things as expected, index doesn't store entries when the leading column is
null.

here the oracle goes with the index scan and returns zero results. which is
inconsistent result.
optimizer trace considers that index with (student_id, '');

  Access Path: TableScan
    Cost:  21017.655599  Resp: 21017.655599  Degree: 0
      Cost_io: 20995.000000  Cost_cpu: 807173669
      Resp_io: 20995.000000  Resp_cpu: 807173669
  Access Path: index (index (FFS))
    Index: IDX
    resc_io: 1039.000000  resc_cpu: 237414695
    ix_sel: 0.000000  ix_sel_with_filters: 1.000000
  Access Path: index (FFS)
    Cost:  1045.663711  Resp: 1045.663711  Degree: 1
      Cost_io: 1039.000000  Cost_cpu: 237414695
      Resp_io: 1039.000000  Resp_cpu: 237414695
****** Costing Index IDX
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SCAN
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER
  Access Path: index (IndexOnly)
    Index: IDX
    resc_io: 5.000000  resc_cpu: 235407
    ix_sel: 6.6556e-04  ix_sel_with_filters: 6.6556e-04
    Cost: 5.006607  Resp: 5.006607  Degree: 1
  Best:: AccessPath: IndexRange
  Index: IDX
         Cost: 5.006607  Degree: 1  Resp: 5.006607  Card: 999.000000  Bytes:
0.000000

probably we are hitting an undeveloped section of optimizer? producing a wrong
plan is ok, but producing wrong results?

Thanks,
vishnu
--
//www.freelists.org/webpage/oracle-l

--
//www.freelists.org/webpage/oracle-l

References:
- indexing null values curious case?
  - From: Vishnu Potukanuma
- Re: indexing null values curious case?
  - From: Andy Sayer
- Re: indexing null values curious case?
  - From: Vishnu Potukanuma
- Re: indexing null values curious case?
  - From: Vishnu Potukanuma
- Re: indexing null values curious case?
  - From: Jonathan Lewis
- Re: indexing null values curious case?
  - From: Vishnu Potukanuma

Re: indexing null values curious case?

Other related posts: