One last thing: when breaking up your threads across multiple levels of
partitioning, a given thread should handle whole higher-level partitions
(all of their subpartitions) rather than splitting a higher-level partition
across threads. This may fall out naturally from the way you write the SQL
generating SQL. Use it in determining the number of threads as well. And
don’t worry about duplicating any partition references from the test run in
the final run; those just won’t have any work to do.
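One rough sketch (not the only way) of how that can fall out of the SQL
generating SQL, reusing the hypothetical DEL_TAB from the reply quoted below,
with stats gathered on DEL_TAB and 8 threads per list partition picked purely
for illustration. The PARTITION BY keeps each thread inside a single
higher-level partition, and ordering by row count keeps the threads roughly
balanced by rows rather than by subpartition count:

select s.partition_name,
       s.subpartition_name,
       s.num_rows                                 as rows_to_delete,
       ntile(8) over (partition by s.partition_name
                      order by s.num_rows desc)   as thread_no
from   user_tab_subpartitions s
where  s.table_name = 'DEL_TAB'
and    s.num_rows > 0
order  by s.partition_name, thread_no;

Each thread_no group then becomes one script of per-subpartition
delete/commit statements.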
mwf
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Mark W. Farnham
Sent: Sunday, March 28, 2021 7:21 AM
To: loknath.73@xxxxxxxxx; 'Oracle L'
Subject: RE: Inserting with billion of rows faster
First, you should read some incarnation of Tim Gorman’s paper “Scaling to
Infinity.” By my lights he wrote about that better than anyone else. This is
partly because he is a very good writer and partly because he implemented it,
or taught someone else to implement it, enough times in challenging flavors
and varieties that his paper works out the order, method, and details,
anticipating pretty much anything that could go bump in the night through a
designed order of operations.
Without doing a proportional test of significance, do not discard out of hand
the possibility that deleting 50 million rows is faster than copying 20
billion rows. AND if the structure of your partitioning is such that most of
the partitions are certain NOT to be involved in your proposed delete, your
20 billion might drop dramatically. However, do notice that you may have a
significant challenge if you have any global index. Slap a clone onto a test
machine and test whether any global indexes can be rebuilt in a reasonable
time.
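That test on the clone can be as simple as the following (index name
hypothetical, parallel degree picked purely for illustration):

set timing on
alter index big_tab_pk unusable;
alter index big_tab_pk rebuild parallel 8 nologging;
alter index big_tab_pk noparallel;   -- reset the dictionary degree afterwards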
Second, if you cannot crop that 20 billion down significantly, or a global
index rebuild makes scaling by partition exchange impossible, you may find it
is worth it to build a secondary table with your same partition structure
whose only other payload (attribute column) is the rowid to be deleted. In
the new table, let’s call the rowid from the old table that you are inserting
rowid_payload, remembering that when you delete you need the attribute column
holding the rowid from the old table, not the rowid pseudo-column of the
delete table. (And yes, I have described this to someone without saying that,
and the next phone call was “No rows were deleted.”)
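A minimal sketch of that delete table, purely for illustration: assume the
broken table is BIG_TAB, list partitioned on region_cd and hash subpartitioned
on acct_id (2 list partitions, 2048 hash subpartitions, as in the original
description), and that a load_ts column happens to identify the rows to be
removed. All of these names are hypothetical:

create table del_tab (
  region_cd     varchar2(2) not null,
  acct_id       number      not null,
  rowid_payload rowid       not null
)
partition by list (region_cd)
subpartition by hash (acct_id) subpartitions 2048
(
  partition p_us values ('US'),
  partition p_eu values ('EU')
);

insert /*+ append */ into del_tab (region_cd, acct_id, rowid_payload)
select t.region_cd, t.acct_id, t.rowid
from   big_tab t
where  t.load_ts >= date '2021-03-10';   -- hypothetical predicate: whatever
                                         -- identifies the rows to be deleted
commit;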
IF the maximum number of rows to be deleted from any partition is of a
reasonable commit size, you then generate SQL from the dictionary:
delete from {partition_name reference in broken table} where rowid in (select
rowid_payload from {partition_name reference in delete table});
commit;
(Make sure you commit after each partition; that is the point.)
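One way to generate those statements from the dictionary, continuing with the
hypothetical BIG_TAB / DEL_TAB names and assuming DEL_TAB was created with
BIG_TAB’s list partition names, so only the hash subpartitions need to be
matched by position:

select 'delete from big_tab subpartition (' || b.subpartition_name || ')'
    || ' where rowid in (select rowid_payload from del_tab subpartition ('
    || d.subpartition_name || '));' || chr(10)
    || 'commit;'                                    as stmt
from   user_tab_subpartitions b
       join user_tab_subpartitions d
         on d.partition_name        = b.partition_name
        and d.subpartition_position = b.subpartition_position
where  b.table_name = 'BIG_TAB'
and    d.table_name = 'DEL_TAB'
order  by b.partition_name, b.subpartition_position;

Spool that to a file and run the file.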
IF the maximum number of rows to be deleted from a single lowest-level
partition is “too big,” then it is trickier. Worry about that only if you
have to. Consider running the delete on the single biggest delete-list
partition just to see whether it will work before you guess it is too big.
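Finding that single biggest delete-list subpartition can be as simple as this
(names hypothetical; the counts come from optimizer stats on DEL_TAB):

exec dbms_stats.gather_table_stats(user, 'DEL_TAB', granularity => 'ALL')

select *
from  ( select partition_name, subpartition_name, num_rows
        from   user_tab_subpartitions
        where  table_name = 'DEL_TAB'
        order  by num_rows desc )
where  rownum = 1;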
Take the one from the list and run it. Compute the rows deleted per unit of
time. Take the next two, compute again. Then 4, 8, i.e. doubling the number
of logical parallel jobs, until the rate degrades. Say it degrades between 16
and 32. Try 24 next, and use a binary search to approach the optimal number
of logical threads to run. Don’t be too fussy. IF 16 versus 32 turns out to
be the breakpoint and 24 is better than 16, I’d call that binary search done
and use 24. If 24 is slower, I’d call it done with 16 as the answer.
Remember that the rate is the total deletes per unit time, not the deletes per
thread per unit time. In case of a tie, I would go with the smaller number of
threads.
Remember to break up your threads by total rows to be deleted, not partition
count.
Good luck, and remember to consider whether copying to keep the good rows
plus a partition exchange will work for you.
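If that route looks viable, a bare-bones sketch for a single hash
subpartition might look like the following (names hypothetical; Tim Gorman’s
paper works through the real order of operations, local index builds, and
constraint handling):

create table big_tab_stage as
  select t.*
  from   big_tab subpartition (sys_subp_0001) t
  where  t.rowid not in (select rowid_payload
                         from   del_tab subpartition (sys_subp_0001));

alter table big_tab
  exchange subpartition sys_subp_0001
  with table big_tab_stage
  without validation
  update global indexes;   -- or rebuild the global indexes afterwards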
mwf
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Lok P
Sent: Sunday, March 28, 2021 1:44 AM
To: Oracle L
Subject: Re: Inserting with billion of rows faster
The delete we are now using to remove the duplicates from the target is
something like the code below, but it's taking a lot of time, so we are
wondering whether there is any faster way to achieve it.
DECLARE
  type rowidArray is table of rowid index by binary_integer;
  type emRec is record
    (rowid rowidArray);

  l_record     emRec;
  l_array_size number := 10000;   -- rows fetched (and deleted) per batch
  l_done       boolean;
  l_cnt        number := 1;
  l_processed  number := 0;

  cursor c is
    select rp.rowid
    from   dp, rp
    where  .....................;
BEGIN
  dbms_application_info.set_module('Deletes', 'Starting...');
  open c;
  loop
    dbms_application_info.set_action('Processing ' || l_cnt || ' thru ' ||
        (l_cnt + l_array_size - 1) || ' (' || l_processed || ')');

    -- fetch the next batch of rowids, then delete them in one bulk bind
    fetch c bulk collect into l_record.rowid limit l_array_size;
    l_done := c%notfound;

    forall i in 1 .. l_record.rowid.count
      delete from ...........
      where  rowid = l_record.rowid(i);

    l_processed := l_processed + SQL%ROWCOUNT;
    commit;                         -- commit after each batch
    exit when (l_done);
    l_cnt := l_cnt + l_array_size;
  end loop;
  close c;
  dbms_application_info.set_action('Processed ' || l_processed ||
      ' deletes FROM .......');
end;
/
On Sun, Mar 28, 2021 at 11:07 AM Lok P <loknath.73@xxxxxxxxx> wrote:
Hi Listers, we have one process in which we load ~50 million rows daily
(using direct path insert, i.e. INSERT with the APPEND hint) into a target
table which holds ~20 billion rows (~1TB in size) and is list-hash composite
partitioned: ~2 list partitions and ~2048 hash subpartitions. It has 4
indexes, one of which is a composite primary key comprising 5 columns. That
data load job had been failing unnoticed for ~15-20 days, so roughly a
billion rows accumulated, and the load process is now failing with ORA-01628
(max # extents (32765) reached for rollback segment).
So we thought of running it with a bulk collect method (APPEND_VALUES hint)
and committing in chunks of ~1 million. But during that process we ended up
seeing the data load run significantly slower; it was inserting ~1 million
rows in ~40 minutes. We were not able to understand the reason, but the wait
events were all showing "cell single block physical read" and the object was
the target load object, meaning it was the INSERT that was struggling. So we
have now ended up with partial data loaded to the target, i.e. around ~40-50
million rows. And as the table has a primary key, we have to delete those
~40-50 million rows from the target table and then reload them.
I was thinking we could do the delete with the bulk method in chunks of 1
million, but again that will happen in a single thread and will be
significantly slow. So what is the best way to have those ~40-50 million rows
deleted from the target table (which holds a total of ~20 billion rows)?
And then I was also thinking we could make the indexes unusable, perform the
delete, and then do the data load (which would happen with almost zero UNDO
in the absence of indexes), but it seems the DELETE will need the PK index to
fetch the rowids, so we can't get rid of the PK index. So what is the best
approach for the delete and the data load here without breaking data
consistency in the target table?
In short, I wanted to understand the best/fastest approach for the delete and
the data load in this situation.
Regards
Lok