BUG: Row_Number() - Help with a test case

  • From: "amwilson@xxxxxxxxxxxx" <amwilson@xxxxxxxxxxxx>
  • To: oracle-l@xxxxxxxxxxxxx
  • Date: Tue, 08 May 2007 11:32:48 +0800

Hi All,

I've posted this on the OTN forums but haven't had much help, so apologies to
anyone who's already seen this.

We've come across a bug in our 10.1 instances when selecting from a view which
uses the ROW_NUMBER() function. I'd like to produce a small test case to send to
our DBA's. I've searched Metalink and have not found any bugs which relate to
this behaviour.

I've distilled the situation into as small a test case as I can produce. I'd be
grateful if some people here could cast their eye over it and tell me:

a) If you are able to reproduce this behaviour using my test script.
b) Any ideas on how to produce a smaller test case, or to clarify exactly what
causes this.

The situation may be summarised as follows: a table T with an ID column (not
nullable, non-unique), some kind of flag column (in this case called CLOSED),
which is nullable, and a date field used for ordering the partitions.

After creating and analyzing the test table, running a query of the form:


with v as (
   select closed
   , row_number() over (
      partition by id
      order by some_date desc
   ) rn
   from t
)
select count(*)
from v
where rn = 1
and closed is null

...a number of times produces non-deterministic results (the result seems to
toggle from one value to another). In my test case I have created a UNION ALL
query, containing the above query repeated 5 times.

I've noticed that:
- The number of distinct ID's seems to affect whether or not this happens.
- There needs to be at least ~1% of values in the CLOSED column which are null.
- The number of rows in the table seems to affect this (you may need to increase
the number to see this happen).

Here is my test case:

REM Start of test case
prompt Enter number of rows: 
accept numrows

drop table t;

create table t (
   id number not null
   , closed varchar2(1)
   , some_date date not null
)
nologging;

insert /*+ append */
into t (
   id
   , closed
   , some_date
)
-- ~10,000 distinct ID's
select trunc(dbms_random.value * 10000)
-- ~1% of records have CLOSED not null
, case sign(trunc(dbms_random.value * 100) - 1)
   when -1 then null
   else 'Y'
end
, sysdate
from dual
-- Number of rows
connect by level <= &numrows;

-- Some versions seem to require the commit before analysing
commit;

exec dbms_stats.gather_table_stats(user, 'T');

with v as (
   select closed
   , row_number() over (
      partition by id
      order by some_date desc
   ) rn
   from t
)
select count(*) from v where rn = 1 and closed is null union all
select count(*) from v where rn = 1 and closed is null union all
select count(*) from v where rn = 1 and closed is null union all
select count(*) from v where rn = 1 and closed is null union all
select count(*) from v where rn = 1 and closed is null;
REM End of test case

And here are the results I see under 10.2.0.2.0:
SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - Prod
PL/SQL Release 10.2.0.2.0 - Production
CORE    10.2.0.2.0      Production
TNS for 32-bit Windows: Version 10.2.0.2.0 - Production
NLSRTL Version 10.2.0.2.0 - Production

SQL> ed
Wrote file afiedt.buf

  1  explain plan for
  2  with v as (
  3     select closed
  4     , row_number() over (
  5        partition by id
  6        order by some_date desc
  7     ) rn
  8     from t
  9  )
 10  select count(*) from v where rn = 1 and closed is null union all
 11  select count(*) from v where rn = 1 and closed is null union all
 12  select count(*) from v where rn = 1 and closed is null union all
 13  select count(*) from v where rn = 1 and closed is null union all
 14* select count(*) from v where rn = 1 and closed is null
SQL> /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------
Plan hash value: 490403055

-------------------------------------------------------------------------------------------
| Id  | Operation                  | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|
Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |      |     5 |    75 |       |  8250  (81)|
00:01:40 |
|   1 |  UNION-ALL                 |      |       |       |       |            |
         |
|   2 |   SORT AGGREGATE           |      |     1 |    15 |       |            |
         |
|*  3 |    VIEW                    |      |   305K|  4477K|       |  1650   (2)|
00:00:20 |
|*  4 |     WINDOW SORT PUSHED RANK|      |   305K|  3880K|    14M|  1650   (2)|
00:00:20 |
|   5 |      TABLE ACCESS FULL     | T    |   305K|  3880K|       |   180   (4)|
00:00:03 |
|   6 |   SORT AGGREGATE           |      |     1 |    15 |       |            |
         |
|*  7 |    VIEW                    |      |   305K|  4477K|       |  1650   (2)|
00:00:20 |
|*  8 |     WINDOW SORT PUSHED RANK|      |   305K|  3880K|    14M|  1650   (2)|
00:00:20 |
|   9 |      TABLE ACCESS FULL     | T    |   305K|  3880K|       |   180   (4)|
00:00:03 |
|  10 |   SORT AGGREGATE           |      |     1 |    15 |       |            |
         |
|* 11 |    VIEW                    |      |   305K|  4477K|       |  1650   (2)|
00:00:20 |
|* 12 |     WINDOW SORT PUSHED RANK|      |   305K|  3880K|    14M|  1650   (2)|
00:00:20 |
|  13 |      TABLE ACCESS FULL     | T    |   305K|  3880K|       |   180   (4)|
00:00:03 |
|  14 |   SORT AGGREGATE           |      |     1 |    15 |       |            |
         |
|* 15 |    VIEW                    |      |   305K|  4477K|       |  1650   (2)|
00:00:20 |
|* 16 |     WINDOW SORT PUSHED RANK|      |   305K|  3880K|    14M|  1650   (2)|
00:00:20 |
|  17 |      TABLE ACCESS FULL     | T    |   305K|  3880K|       |   180   (4)|
00:00:03 |
|  18 |   SORT AGGREGATE           |      |     1 |    15 |       |            |
         |
|* 19 |    VIEW                    |      |   305K|  4477K|       |  1650   (2)|
00:00:20 |
|* 20 |     WINDOW SORT PUSHED RANK|      |   305K|  3880K|    14M|  1650   (2)|
00:00:20 |
|  21 |      TABLE ACCESS FULL     | T    |   305K|  3880K|       |   180   (4)|
00:00:03 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("RN"=1 AND "CLOSED" IS NULL)
   4 - filter(ROW_NUMBER() OVER ( PARTITION BY "ID" ORDER BY
              INTERNAL_FUNCTION("SOME_DATE") DESC )<=1)
   7 - filter("RN"=1 AND "CLOSED" IS NULL)
   8 - filter(ROW_NUMBER() OVER ( PARTITION BY "ID" ORDER BY
              INTERNAL_FUNCTION("SOME_DATE") DESC )<=1)
  11 - filter("RN"=1 AND "CLOSED" IS NULL)
  12 - filter(ROW_NUMBER() OVER ( PARTITION BY "ID" ORDER BY
              INTERNAL_FUNCTION("SOME_DATE") DESC )<=1)
  15 - filter("RN"=1 AND "CLOSED" IS NULL)
  16 - filter(ROW_NUMBER() OVER ( PARTITION BY "ID" ORDER BY
              INTERNAL_FUNCTION("SOME_DATE") DESC )<=1)
  19 - filter("RN"=1 AND "CLOSED" IS NULL)
  20 - filter(ROW_NUMBER() OVER ( PARTITION BY "ID" ORDER BY
              INTERNAL_FUNCTION("SOME_DATE") DESC )<=1)

Note
-----
   - 'PLAN_TABLE' is old version

SQL> ed

  1  with v as (
  2     select closed
  3     , row_number() over (
  4        partition by id
  5        order by some_date desc
  6     ) rn
  7     from t
  8  )
  9  select count(*) from v where rn = 1 and closed is null union all
 10  select count(*) from v where rn = 1 and closed is null union all
 11  select count(*) from v where rn = 1 and closed is null union all
 12  select count(*) from v where rn = 1 and closed is null union all
 13* select count(*) from v where rn = 1 and closed is null
SQL> /

  COUNT(*)
----------
       103
       111
       103
       111
       103

I've reproduced this at work under the following versions, with the given number
of rows in the test table:
8i - 300,000 rows
9i - 600,000 rows
10.1.0.3.0 - 300,000 rows
10.1.0.5.0 - 300,000 rows
10.2.0.2.0 - 300,000 rows

The behaviour seems erratic and sometimes does not manifest, but I'm unable to
pin down the exact reason why.  This test case is the best I can come up with,
and seems to display the bug fairly consistently.

Any help would be greatly appreciated.

Thanks,
Anthony


--
//www.freelists.org/webpage/oracle-l


Other related posts: