No longer any need to send a full test case. Your observation has been known
for a very long time and doesn't need to be demonstrated again.
First thought, of course, is that an application should not be dropping and
creating segments at all (let alone frequently). If this type of processing
appears to be needed then the requirement should be re-examined with global
temporary tables in mind.
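For example, a workload that repeatedly creates, fills, and drops a scratch table can often be rewritten around a global temporary table that is created once and reused. A minimal sketch (the table and column names here are hypothetical, not from the thread):

```sql
-- Created ONCE, not per run; each session sees only its own rows,
-- so no segments are repeatedly allocated and dropped.
create global temporary table gtt_scratch (
    id      number,
    payload varchar2(100)
)
on commit delete rows;   -- rows vanish at commit; use PRESERVE ROWS to keep them for the session

-- Per run: just insert and query - no DDL, no new permanent segments.
insert into gtt_scratch
select object_id, object_name from user_objects;
```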
Second, if one is dropping and creating segments frequently, they should be
assigned to a tablespace that (a) has a uniform extent size of 1MB (or more if
the O/S allows for a larger read) and (b) is not used by permanent objects.
Feature (a) ensures that Oracle has the best chance of reading the segment as
efficiently as possible, and once the tablescan is doing the largest read that
the various components of the O/S will handle, it's over-optimistic to assume
that changing the order in which extents are accessed will make any significant
difference in a multi-user system. If (some of) the segments you are dropping
and recreating are guaranteed to be very small and you really don't want to
"waste" 1MB per segment then assign the small objects to a tablespace with a
small uniform extent size.
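A sketch of such dedicated tablespaces (the names, sizes, and file paths are purely illustrative):

```sql
-- Dedicated tablespace for transient segments: 1MB uniform extents,
-- so every full extent can be covered by a single large multiblock read.
create tablespace transient_ts
    datafile '/u01/oradata/DB/transient_ts01.dbf' size 512m
    extent management local uniform size 1m
    segment space management auto;

-- A separate tablespace with small uniform extents for tiny transient
-- objects, so they don't "waste" 1MB apiece.
create tablespace transient_small_ts
    datafile '/u01/oradata/DB/transient_small_ts01.dbf' size 64m
    extent management local uniform size 64k
    segment space management auto;
```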
A significant problem, though, with having a good idea is that it's very
difficult to think of all the ramifications and handle all the peripheral
cases. For example: "shrink space compact" deletes rows from the table
backwards from the last block of the last extent, inserting them forwards from
the first block of the first extent - should the shrink code be modified to use
the L1 bitmaps to decide where to delete from and where to insert to? What
impact would that have on the potential for dropping extents?
Consider an idea that sounds much more useful: the idea that you should be able
to do a tablescan "backwards" has been around for (literally) years - after
all, the most recent, hence most interesting, data will be in the last
extent(s) of the table - it's a feature that hasn't been made (publicly - but
see https://jonathanlewis.wordpress.com/2016/03/14/quiz-2/ ;) available despite
the fact that from time to time people complain about performance problems
because the end of a table has been changing a lot while their tablescan has
been reading the unchanging start of the table. (And that's a good reason why
reading by extent id (in reverse order) would be nicer than reading by L1
address.)
Regards
Jonathan Lewis
________________________________________
From: Vishnu Potukanuma <vishnupotukanuma@xxxxxxxxx>
Sent: 26 November 2019 16:30
To: Jonathan Lewis
Cc: Oracle L
Subject: Re: Full table scan -- uniform extent allocation - extent allocation
map issue?
Hi Jonathan,
The trace file indicates autoallocate - yes, I must have pasted an old one
from when I was testing autoallocate (with autoallocate this issue diminishes
as the extent sizes become large).
Oracle allocates extents to a segment from anywhere in the data file (uniform
size) as the demand for space arises, which is the logical thing to do. But
because Oracle reads extents during a full table scan in the order they were
allocated rather than the order in which they are stored, the I/O becomes
random rather than sequential over time when extent sizes are small and the
workload involves frequent segment allocation and deallocation. If other
sessions doing index range scans are already saturating the IOPS at the
storage layer (especially on HDDs, as those reads are random in nature), this
reduces the overall throughput of the full table scan.
I mean that when full table scans are slow we usually end up looking at
various things; this could be a factor in very rare cases (OLTP) and could be
common in warehouses where segments are frequently dropped and created, and
this random extent allocation can grow right under our noses without our ever
detecting it. There are various other scenarios and edge cases I am looking
at, but I am really interested in why a session blindly reads the extents in
the order they were allocated rather than in the order they are stored based
on the L1 DBA.
Please give me some time (I have to get dinner before it's too late); I will
send the complete test case with which this can be replicated.
Thanks,
Vishnu
On Tue, Nov 26, 2019 at 8:58 PM Jonathan Lewis
<jonathan@xxxxxxxxxxxxxxxxxx> wrote:
insert into table10 select * from table1;
drop table table1;
--- create a tablespace with 350mb in size and uniform extent allocation,
The extent information you've supplied shows us that you've created the final
table in a tablespace using autoallocate - the early extents are 64KB, the next
1MB, the last few you've shown are 8MB.
If you want to supply a test case that has a chance of reproducing the
behaviour then you need to tell us WHAT uniform extent size, preferably giving
us a script that allows us to change a couple of names (tablespace / file) and
create the tablespace.
-- create 10 tables with 30 mb in size.
Needs one sample create table script - did you create it with "initial 30M"
(obviously not)? Did you create it with "create as select ..."? Did you do
anything odd with pctfree? How did you populate it? Do you need the tables to
allocate 30MB, or should the highwater mark be pushed to 30MB as well?
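A complete test case would need to pin down something like the following - every detail here (CTAS, the row source, the row width and count) is an assumption about what the original poster might have run, not a quote from the thread:

```sql
-- One of the 10 identical ~30MB tables, with all storage-relevant
-- choices made explicit so the test is reproducible.
create table table1
pctfree 10
tablespace test_uniform_ts        -- the 350MB uniform-extent tablespace
as
select rownum               as id,
       rpad('x', 200, 'x')  as padding
from   dual
connect by level <= 130000;       -- enough ~200-byte rows to push the HWM to roughly 30MB
```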
-- insert into table10 select * From table1;
That's probably going to work - because we'd probably assume you wanted all 10
tables to look alike.
-- drop table table1;
That should work, but do you have your recyclebin enabled? That may affect
where the next insert puts the data (especially if someone doesn't follow your
instruction that the tablespace should be 350MB).
-- similarly perform the same randomly
If my results don't match your results, might that be because your results are
highlighting a flaw in your system, or because I didn't choose the same random
order as you did ? Don't supply a "random" test if you want to demonstrate a
reproducible anomaly. Tell us the exact order you used to insert and drop.
-- interestingly the fix is to read the extents one after the other based on
the L1 DBA rather than the order in which they are allocated to the segment
Are you saying you think this is what Oracle does, or what you think Oracle
should do ?
Regards
Jonathan Lewis
________________________________________
From: oracle-l-bounce@xxxxxxxxxxxxx <oracle-l-bounce@xxxxxxxxxxxxx> on behalf
of Vishnu Potukanuma <vishnupotukanuma@xxxxxxxxx>
Sent: 26 November 2019 14:12
To: Oracle L
Subject: Full table scan -- uniform extent allocation - extent allocation map
issue?
Hi,
I observed some interesting behaviour today while working on other aspects,
and my hunch appears to be true. I will start by saying that there are too
many variables and scenarios where things can be optimal, but I am talking
about the worst case scenario, where segments are frequently created and
dropped in a tablespace.
This works by exploiting certain aspects of the database, such as how Oracle
maintains the extent map in the segment header, how extent allocation is done,
and how a full table scan reads the segment. Inherently there is nothing we
can do in this case as there are too many variables. The impact may be small
in terms of overall response times, but it typically reduces the lifespan of
the disks, as the disk spindles end up moving back and forth between higher
and lower tracks over and over again instead of sweeping in a simple arc from
higher tracks to lower. In the worst case this could add delay when the
segment is very large and the extent sizes are small (the probability of this
occurring is very low, but there is still a chance).
When Oracle performs a full table scan, the process starts by reading the
extent map and then reads the extents one by one until the last extent or the
high water mark is reached. But what if the extents allocated to the segment
are stored in random areas, just like a bad clustering_factor?
Interestingly, the fix is to read the extents one after the other based on the
L1 DBA rather than the order in which they were allocated to the segment. In
most cases Oracle does a pretty good job of allocating extents close to each
other, but things can get messy.
This is the simplest way to explain it; the scenario can be built up gradually
from here and varied however we like:
create a tablespace 350MB in size with uniform extent allocation;
create 10 tables, each 30MB in size;
insert into table10 select * from table1;
drop table table1;
Similarly, perform the same steps randomly so that the end result is only one
table. Here we are just simulating a case where the extents subsequently
allocated to the segment are in different, random regions of the datafile,
and thus their locations on disk are as well (which may or may not be true
depending on several factors, such as how fragmented the disks are; I won't go
into that as it adds too much complexity). Now we perform a full table scan:
Oracle reads the segment header to get the extent map and reads the extents
one by one.
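Pinned down as a deterministic script rather than "performed randomly", the setup might look like the following sketch - the tablespace/file names, the exact drop order, and the row source are all assumptions, not the original poster's actual script:

```sql
create tablespace test_uniform_ts
    datafile '/u01/oradata/DB/test_uniform_ts01.dbf' size 350m
    extent management local uniform size 64k;

-- 10 tables of roughly 30MB each (the row source is illustrative).
create table table1 tablespace test_uniform_ts as
select rownum id, rpad('x', 200, 'x') padding
from dual connect by level <= 130000;
-- ... repeat for table2 .. table10 ...

-- A FIXED, recorded interleaving of inserts and drops, so the surviving
-- table10 picks up freed extents scattered all over the file.  PURGE
-- bypasses the recyclebin so the freed space is immediately reusable.
insert into table10 select * from table1;  drop table table1 purge;
insert into table10 select * from table4;  drop table table4 purge;
insert into table10 select * from table2;  drop table table2 purge;
-- ... continue in the recorded order until only table10 remains.
```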
Notice the block numbers in the traces.
WAIT #139711646963136: nam='db file sequential read' ela= 612 file#=5
block#=71810 blocks=1 obj#=75252 tim=248445129880
WAIT #139711646963136: nam='db file scattered read' ela= 763 file#=5
block#=71811 blocks=5 obj#=75252 tim=248445130757
WAIT #139711646963136: nam='db file scattered read' ela= 771 file#=5
block#=71816 blocks=8 obj#=75252 tim=248445131732
....
....
WAIT #139711646963136: nam='db file scattered read' ela= 1941 file#=5
block#=44736 blocks=64 obj#=75252 tim=248445270874
WAIT #139711646963136: nam='db file scattered read' ela= 1788 file#=5
block#=44802 blocks=62 obj#=75252 tim=248445273696
WAIT #139711646963136: nam='db file scattered read' ela= 1902 file#=5
block#=44864 blocks=64 obj#=75252 tim=248445276573
WAIT #139711646963136: nam='db file scattered read' ela= 1756 file#=5
block#=44930 blocks=62 obj#=75252 tim=248445279335
....
WAIT #139711646963136: nam='db file scattered read' ela= 3078 file#=5
block#=132 blocks=128 obj#=75252 tim=248445887737
WAIT #139711646963136: nam='db file scattered read' ela= 3045 file#=5
block#=260 blocks=124 obj#=75252 tim=248445893065
WAIT #139711646963136: nam='db file scattered read' ela= 3399 file#=5
block#=384 blocks=128 obj#=75252 tim=248445898641
....
...
WAIT #139711646963136: nam='db file scattered read' ela= 12269 file#=5
block#=97540 blocks=124 obj#=75252 tim=248446182677
WAIT #139711646963136: nam='db file scattered read' ela= 17440 file#=5
block#=97664 blocks=128 obj#=75252 tim=248446202405
WAIT #139711646963136: nam='db file scattered read' ela= 3282 file#=5
block#=97792 blocks=128 obj#=75252 tim=248446208027
...
...
WAIT #139711646963136: nam='db file scattered read' ela= 3279 file#=5
block#=61058 blocks=128 obj#=75252 tim=248446281696
WAIT #139711646963136: nam='db file scattered read' ela= 3115 file#=5
block#=61186 blocks=126 obj#=75252 tim=248446287141
WAIT #139711646963136: nam='db file scattered read' ela= 2990 file#=5
block#=61312 blocks=128 obj#=75252 tim=248446292359
WAIT #139711646963136: nam='db file scattered read' ela= 3109 file#=5
block#=61440 blocks=128 obj#=75252 tim=248446297711
....
...
WAIT #139711646963136: nam='db file scattered read' ela= 3125 file#=5
block#=7812 blocks=128 obj#=75252 tim=248446303109
WAIT #139711646963136: nam='db file scattered read' ela= 3246 file#=5
block#=7940 blocks=124 obj#=75252 tim=248446309728
WAIT #139711646963136: nam='db file scattered read' ela= 3180 file#=5
block#=8064 blocks=128 obj#=75252 tim=248446315169
the L1 DBA extent map:
Auxillary Map
--------------------------------------------------------
Extent 0 : L1 dba: 0x01411880 Data dba: 0x01411883
Extent 1 : L1 dba: 0x01411880 Data dba: 0x01411888
....
Extent 35 : L1 dba: 0x0140ae80 Data dba: 0x0140ae82
Extent 36 : L1 dba: 0x0140af00 Data dba: 0x0140af02
Extent 37 : L1 dba: 0x0140af80 Data dba: 0x0140af82
...
Extent 38 : L1 dba: 0x0140b000 Data dba: 0x0140b002
Extent 39 : L1 dba: 0x0140b080 Data dba: 0x0140b082
Extent 40 : L1 dba: 0x0140b100 Data dba: 0x0140b102
Extent 41 : L1 dba: 0x0140b180 Data dba: 0x0140b182
Extent 42 : L1 dba: 0x0140b200 Data dba: 0x0140b202
Extent 43 : L1 dba: 0x0140b280 Data dba: 0x0140b282
...
...
Extent 92 : L1 dba: 0x01417c80 Data dba: 0x01417c84
Extent 93 : L1 dba: 0x01418080 Data dba: 0x01418084
...
Extent 94 : L1 dba: 0x0140ee80 Data dba: 0x0140ee82
Extent 95 : L1 dba: 0x01401e80 Data dba: 0x01401e84
Extent 96 : L1 dba: 0x01402280 Data dba: 0x01402284
Extent 97 : L1 dba: 0x01402680 Data dba: 0x01402684
Extent 98 : L1 dba: 0x01402a80 Data dba: 0x01402a84
When it comes to an index range scan things are as expected: since the rowids
are sorted, it reads all the blocks in one arc (I am not talking about
prefetching or batched I/O - db file parallel read, which exploits IOPS at the
storage layer - but a very simplistic case with all prefetching etc. turned
off). For full table scans, though, things appear to be different: the scan
simply follows the extent map and not the order of the L1 DBAs.
Maybe this can be explained by something, but I am still not able to figure
out any valid reason why we should not perform a full table scan based on the
storage order of the extent map, with the exception of the last extent. Please
let me know if I have missed anything.
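The mismatch between allocation order and storage order is easy to see from the data dictionary (the owner and segment names below are placeholders for whatever was used in the test):

```sql
-- extent_id reflects allocation order; block_id is the extent's position
-- in the file (storage order).  If the two orderings disagree, a scan
-- that follows extent_id will jump back and forth through the file.
select extent_id, file_id, block_id, blocks
from   dba_extents
where  owner        = 'VISHNU'
and    segment_name = 'TABLE10'
order  by extent_id;
```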
Thanks,
Vishnu
--
//www.freelists.org/webpage/oracle-l