RE: how many LIOs is too many

  • From: Jonathan Lewis <jonathan@xxxxxxxxxxxxxxxxxx>
  • To: ORACLE-L <oracle-l@xxxxxxxxxxxxx>
  • Date: Wed, 22 Jan 2014 10:07:49 +0000


Two important starting checks:

Does the index chosen by the optimizer identify roughly the correct number of 
rows in the table before applying the maudrecno and memrecno tests, or is it 
collecting far more rows than it needs and discarding most of them? If the 
index is ridiculously inefficient then you need an index on, or at least 
starting with, (maudrecno, memrecno) to minimise the work of identifying 
candidate rows.

Does the query first collect far too many rows (after applying all predicates), 
sort them, and then discard most of them before returning them, or is the 
surviving number of rows typically less than the maximum 500 allowed by the 
query?  If the latter then any sensible use of a (maudrecno, memrecno) index 
will be sufficient; if the former then you need to ensure that Oracle walks the 
index with a "stopkey" and doesn't sort the data.
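
As a sketch of the shape you're aiming for - t1, the bind variable names, and 
the select list are placeholders, not your real objects - the index and query 
would look something like this:

create index t1_i1 on t1(maudrecno, memrecno);

select  *
from    (
        select  /* your column list */ *
        from    t1
        where   maudrecno > :last_maudrecno
        or      (maudrecno = :last_maudrecno and memrecno > :last_memrecno)
        order by
                maudrecno, memrecno
        )
where   rownum <= 500;

With the right index in place the optimizer can satisfy the "order by" from 
the index order and stop at the 500th row.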

I modeled your data and found an optimizer bug in 11.2.0.4 (I haven't checked 
yet to see if it's documented, but it's clearly a bug). However I was able to 
get the following execution plan automatically:

--------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |       |      1 |        |     15 |00:00:00.01 |      10 |
|*  1 |  COUNT STOPKEY                 |       |      1 |        |     15 |00:00:00.01 |      10 |
|   2 |   VIEW                         |       |      1 |     15 |     15 |00:00:00.01 |      10 |
|   3 |    CONCATENATION               |       |      1 |        |     15 |00:00:00.01 |      10 |
|   4 |     TABLE ACCESS BY INDEX ROWID| T1    |      1 |      9 |      9 |00:00:00.01 |       6 |
|*  5 |      INDEX RANGE SCAN          | T1_I1 |      1 |      9 |      9 |00:00:00.01 |       4 |
|   6 |     TABLE ACCESS BY INDEX ROWID| T1    |      1 |     16 |      6 |00:00:00.01 |       4 |
|*  7 |      INDEX RANGE SCAN          | T1_I1 |      1 |     80 |      6 |00:00:00.01 |       3 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=15)
   5 - access("MAUDRECNO"=49995 AND "MEMRECNO">10)
   7 - access("MAUDRECNO">49995)
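
(For reference, the Starts/A-Rows/Buffers figures above come from rowsource 
execution statistics reported by the standard dbms_xplan package; nothing here 
is specific to my model:)

alter session set statistics_level = all;

-- run the query of interest, then report the last cursor
-- executed by this session with its runtime statistics:
select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));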


Notice how Oracle has done two consecutive range scans: the first got 9 rows 
(which were all the rows matching the two-column predicate) and the second got 
just 6 more rows (because my stopkey was 15) even though (trust me) there were 
another 1,000 rows matching its predicate.  Because the optimizer re-arranged 
the predicates and ordered the concatenation correctly, the 15 rows would have 
been supplied to the inline view in the correct order, so the "order by" didn't 
need a sort.

This is the plan you want.

The BUG:  if I decreased my boundary value (the 49995) the optimizer stopped 
doing a concatenation and switched to a single full scan of the index. This 
also enabled it to avoid a sort, and to avoid collecting more data than 
necessary - but it walked through 2,200 index leaf blocks before getting to the 
right leaf block. The reason I know it's a bug is that the cost of the full 
scan was 4, when it should have been much larger (of the order of 2,000).


Your strategy:  create the correct index on the table and check the execution 
plan; if necessary add a /*+ use_concat */ hint inside the inline view to get 
the concatenation plan, then generate a SQL Plan Baseline for that plan, 
attaching it to the original statement text.
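
A sketch of that last step, assuming you've run the hinted version and noted 
its sql_id and plan_hash_value from v$sql (the substitution variables are 
placeholders for your values):

declare
        l_plans pls_integer;
begin
        -- attach the hinted cursor's plan to the original (unhinted) SQL text
        l_plans := dbms_spm.load_plans_from_cursor_cache(
                        sql_id          => '&hinted_sql_id',
                        plan_hash_value => &hinted_plan_hash_value,
                        sql_text        => '&original_sql_text'
                   );
        dbms_output.put_line('Plans loaded: ' || l_plans);
end;
/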




Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
@jloracle
________________________________
From: Ram Raman [veeeraman@xxxxxxxxx]
Sent: 22 January 2014 00:07
To: Jonathan Lewis
Cc: ORACLE-L
Subject: Re: how many LIOs is too many


There is no index available on (maudrecno, memrecno) combo. I have uploaded the 
same document again with more information, including the list of all indexes on 
the table:

https://drive.google.com/file/d/0B9YC82qZ8_3eMF9HZXVHeVhOcDA/edit?usp=sharing



On Tue, Jan 21, 2014 at 3:49 PM, Jonathan Lewis 
<jonathan@xxxxxxxxxxxxxxxxxx> wrote:


It looks as if that code is trying to sweep through the table picking up "the 
oldest data not yet processed", remembering to start from where it ended on the 
last pass (based on a two-part key).  It also looks as if the developer 
expected it to use an index on (maudrecno, memrecno) to walk the table in index 
order so that it could stop after 500 suitable rows and return the data without 
sorting.  If that's the case it's not obvious why the optimizer is choosing the 
wrong index.

As someone else said, concurrency (in the same blocks) also pushes up the 
number of LIOs, so if this is a "find the recent activity" query it is 
constantly scanning through recent data which is either not yet committed or 
may be in need of block cleanout, and therefore produces lots of extra LIOs as 
visits to the undo segment.

Off the top of my head I'm not certain that the optimizer can use the predicate 
at line 4 as an access predicate to do a "count stopkey" with "sort order by 
nosort" - but I think it should be able to if you've got the right index in 
place.  So my next move would be to check whether the index exists, run a test 
to see if it can do what I think it can, and if so create a SQL Plan Baseline 
to force the use of that index.
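
(Checking what indexes exist, and their column order, is just a dictionary 
query - the table name below is a placeholder:)

select  index_name, column_position, column_name
from    user_ind_columns
where   table_name = 'YOUR_TABLE'
order by
        index_name, column_position;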


Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
@jloracle
________________________________
From: Ram Raman [veeeraman@xxxxxxxxx]
Sent: 21 January 2014 21:11
To: Jonathan Lewis
Cc: ORACLE-L
Subject: Re: how many LIOs is too many

Thanks to everyone who responded. Jonathan, how do you say that it is 1,800 
exec/hr? I did (789388 / (sysdate - first_load_time)); with about 33 days, it 
came to about 1,000 exec/hr.
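
In SQL terms the calculation is roughly this (v$sqlarea stores first_load_time 
as a character string in the format shown; the sql_id is a placeholder):

select  executions,
        round(executions /
              ((sysdate - to_date(first_load_time, 'YYYY-MM-DD/HH24:MI:SS')) * 24)
             ) as exec_per_hour
from    v$sqlarea
where   sql_id = '&target_sql_id';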

I have some more information about the SQL:  
https://drive.google.com/file/d/0B9YC82qZ8_3eNGxFeHdQdlJiN1k/edit?usp=sharing



In this case the number of LIOs per execution is probably not the important bit 
- the important bit is that the query seems to take a little over 2 CPU seconds 
per execution.
At 1,800 executions per hour (rather than the "few thousand" you suggest), this 
would be equivalent to eliminating one of your CPUs. Unless you've got at least 
8 (real) CPUs you don't want to be running this query; if you've got a small 
number of real CPUs which use threads to fake themselves up to look like lots 
of CPUs, you really don't want to be running this query.

To answer your questions:
1) Yes - and the bizarre thing is that the code fetched a couple of hundred 
rows in order, processed and updated the first one (which took it off the 
list), then re-ran the query to fetch a couple of hundred again.  If you can't 
see the code, try tracing it (and read the trace file) to see what the process 
does next after fetching the 500.
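
(One standard way to switch on the trace, assuming you know the target 
session's sid and serial# - both are placeholders here:)

begin
        -- extended SQL trace including wait events and bind values
        dbms_monitor.session_trace_enable(
                session_id => &sid,
                serial_num => &serial,
                waits      => true,
                binds      => true
        );
end;
/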

2) There is no formula that can be generally applied - especially if you're 
interested in precision.  Rick's suggestion is a generous over-estimate and 
talks about the "final set" - but doesn't make clear that the "final set" may 
have to allow for an aggregate:  your query might be required to aggregate 
20,000 rows down to 500, and any estimate should be about the 20,000, not the 
500. In your case (as Mark indicates) you may be acquiring and sorting a very 
large volume of data and then discarding all but 500 rows, and it's the volume 
acquired that matters, not the volume returned.

Bottom line - don't worry about the details at present: that query (with its 
current plan) is clearly too CPU-intensive to be run thousands of times per 
hour.
a) find out what it's doing and why - you may be able to reduce the execution 
count or fetched volume
b) review the current execution plan to see if there is a better path possible
c) review the SQL to see if it can be re-written

