Re: Index rebuilding

From: "Richard Foote" <richard.foote@xxxxxxxxxxx>
To: <oracle-l@xxxxxxxxxxxxx>
Date: Mon, 15 Nov 2004 23:03:39 +1000
Hi Tony,

Some comments embedded


>
> Two examples in Richard's paper (fantastic work) that does
> not match my observation are:

Thank you !!

See explainations below why things "don't match".

>
> 1.  What happens when the left most leaf split, when a new row with new
> max value in inserted?
>
> Will the two sibling be 50-50 filled like all other nodes and leafs?
> According to Richards paper it is 90-10 or 99-1 split. My observation is
> that it (#rows-1) to 1.  Yes only one row in the right most block.

You've misunderstood my slides with regard to 90-10 splits. I use the term 
"90-10" because that's the name Oracle associates to the related statistic 
in statspack reports and the such. I prefer the term "99-1" split because 
Oracle leaves behind a full block and only places the "1" leaf entry in the 
new block. It's the "1" that's important not the "99", which is obviously 
dependent on the number of leaf entries in the full leaf block.

In summary, I agree totally with your observation and the presentation 
doesn't claim anything to the contrary (although I can see how there can be 
confusion: the white paper lays it out more clearly).

> In fact what Richard is saying is, it is more like 99 to 1.
> What my observation is, it is (total rows before split -1) to 1.

No, I didn't (mean to) suggest it's 99-1 or 90-10 or any such ratio. The 
presentation (I thought) clearly says Oracle leaves behind "full" blocks and 
only the new index entry is placed in the new leaf block. I just prefer the 
term 99-1 (rather than 90-10) as it suggests only 1 index entry goes into 
the new block and "the rest" remains in the previous leaf block. The term 
90-10 (to me anyways) suggests more than 1 index entry is placed in the new 
block which is not true.

Obviously 99-1 is not a particularly clear term either !!

>
>
>
> 2. Reusable Empty/deleted Blocks
>
> Richards test was
> 1.  Insert 10,000 rows
> 2.  Del 1st 9,990 rows
> 3. Analyse index
> 4.  Insert another 10,000 rows of increasing value
> 5.  Re analyse index.
>
> The test I carried out was to insert 500,000 rows but to start deleting
> individual rows when the number of rows reached 1000. ie a rolling 1000
> records of about 499,000 inserts and deletes.
>
> create table t2 (col1 number);
> create UNIQUE index t2_IDX on t2(col1)  ;
> alter table t2 add constraint T2_PK PRIMARY KEY (col1);
>
> REM 500,000 Rolling Inserts & Dels with rolling 1000 records
>
> begin
> for i in  100000..600000 loop
>    delete t2 where col1 = i-1000 ;
>    insert into t2 values(i);
> end loop;
> end;
> /

You've forgotten one very important thing with this example, the commit !!

The above loop is treated by Oracle as one logical transaction, therefore 
the deleted entries have never been committed and all belong to the *same* 
transaction. The problem here of course is that deleted entries can not be 
cleaned out if they haven't been committed (for hopefully obvious reasons, 
consider the implications in a normal scenario with a rollback command).

Therefore in your example, Oracle has absolutely no alternative but to keep 
growing the index and your results are totally predicable.

To give the Oracle a chance to reclaim the space, place a commit in the 
loop, so that each insert / delete combination represent different logical 
transactions (or commit after every x iterations). Now look at the size of 
your resultant index and wow, what a difference it all now makes !!

>
> On analysing the index and selecting from index_stats , I get
>
>
> SQL> select lf_blks, del_lf_rows from index_stats;
>   LF_BLKS DEL_LF_ROWS
> ---------- -----------
>       939      499001
> ------^^^^^^
>
> which shows that there is a lot of unused deleted blocks.
>
> Is he wrong or am I wrong?  All I can say right now is, that is what
> Richard observed and this is what I observe. (I am on 9.2.0.4 on Windows
> XP).

Well I guess we're both right as they're both totally *different* scenarios. 
What's perhaps missing is the somewhat important implication of what happens 
when index entries are deleted but not committed. That's the fundamental 
difference between the two tests and why it appears we get differing 
results. Deleted index entries need to be committed to be re-used.

>
> However! if you execute gather stats using DBMS_STATS and
> and select from user_indexes, you'll see
>
>
> INDEX_NAME BLEVEL LEAF   DISTINCT  CLUSTERING NUM   AVG LEAF      AVG DATA
>                  BLOCKS KEYS      FACTOR     ROWS  BLKS PER KEY  BLKS PER 
> KEY
> ---------- ------ ------ --------- ---------- ----- ------------- ------------
>    T2_IDX      2      3      1000          5  1000             1 1
>
> ----------------------^^^^^
>
> Ah huh, so the leaf blocks are being reused after all.
> NOTE: in this case, we are testing a Right Hand growth index ie index
> column is based on a ascending sequence #

Actually, you're missing the importance of a vital stat in the above 
analysis, the *blevel* of *2*. This shows that the leaf count is not quite 
telling us the whole story ...

>
> So it really depends on the test case and how you validate your hypothesis
> Check it out yourself.
>

Quite true. It's also important that the right hypothesis is used to 
validate a test case. This can be extremely difficult. That's why I try to 
show *how* I come to a hypothesis by giving examples exactly as you've done 
with this post. That way people can test things for themselves in their own 
environments, on their own data and data loads and see whether the same 
holds true and importantly whether something was not fully considered in the 
test case.

So thank you very much for raising these questions and importantly, 
providing information on why these questions were raised. This gives me the 
information to make a informed response.

> Questions
> ----------
> Q1.  So can someone explain the difference in results when using 
> Index_stats
> and User_indexes?

Validate Structure is showing more of the "truth" than dbms_stats. The 3 in 
DBA_INDEXES.LEAF_BLOCKS represents the number of leaf blocks that actually 
contain non-deleted index entries, whereas INDEX_STATS shows us the total 
number of index blocks currently in the index structure, including those 
re-placed on the freelist due to "empty" leaf blocks that have had all index 
entries deleted. These "empty" blocks can be reused by Oracle in a different 
location within the index structure with subsequent insertions (remembering 
in your example this wasn't initially possible as all insertions/deletions 
were in the same transaction).

>
> Q2.  From index_stats
> select lf_rows, del_lf_rows, lf_blks  from index_stats;
>
>   LF_ROWS DEL_LF_ROWS    LF_BLKS
> ---------- ----------- ----------
>    500001      499001        939
>
> Does that mean, _actual_ rows left =  LF_ROWS - DEL_LF_ROWS  ?

Yes !!

If you checkout my presentation again, it mentions the fact that deleted 
entires are counted in most of the index_stats statistics (eg. non-deleted 
rows = lf_rows - del_lf_rows, pct_used by non-deleted rows = ((used_space - 
del_lf_rows_len) / btree_space) * 100 etc. )

Hope this helps to clear things up a little.

Cheers

Richard 


--
//www.freelists.org/webpage/oracle-l
Follow-Ups:
- Re: Index rebuilding
  - From: Tony Jambu
Re: Index rebuilding

Other related posts: