Re: Big Update on Busy Table

  • From: "Job Miller" <dmarc-noreply@xxxxxxxxxxxxx> (Redacted sender "jobmiller@xxxxxxxxx" for DMARC)
  • To: "dmarc-noreply@xxxxxxxxxxxxx" <dmarc-noreply@xxxxxxxxxxxxx>, ORACLE-L <oracle-l@xxxxxxxxxxxxx>
  • Date: Fri, 18 Jul 2014 11:52:02 -0700

Charlotte,

That's what dbms_parallel_execute is for:

This package enables the user to incrementally update table data in parallel,
in two high-level steps:
        1. Group sets of rows in the table into smaller-sized chunks.
        2. Run a user-specified statement on these chunks in parallel, and
           commit when finished processing each chunk.
This package introduces the notion of a parallel execution task. This task
groups the various steps associated with the parallel execution of a PL/SQL
block, which is typically updating table data.

Examples
The following examples run on the Human Resources (HR) schema of the Oracle
Database Sample Schemas. They require that the HR schema be created with the
CREATE JOB system privilege.
Chunk by ROWID
This example shows the most common usage of this package. After calling the 
RUN_TASK Procedure, it checks for errors and re-runs in the case of error.
DECLARE
  l_sql_stmt VARCHAR2(1000);
  l_try      NUMBER;
  l_status   NUMBER;
BEGIN
  -- Create the TASK
  DBMS_PARALLEL_EXECUTE.CREATE_TASK ('mytask');

  -- Chunk the table by ROWID
  DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_ROWID('mytask', 'HR', 'EMPLOYEES', true, 100);

  -- Execute the DML in parallel
  l_sql_stmt := 'update /*+ ROWID (dda) */ EMPLOYEES e
                    SET e.salary = e.salary + 10
                  WHERE rowid BETWEEN :start_id AND :end_id';
  DBMS_PARALLEL_EXECUTE.RUN_TASK('mytask', l_sql_stmt, DBMS_SQL.NATIVE,
                                 parallel_level => 10);

  -- If there is an error, RESUME it for at most 2 times.
  l_try := 0;
  l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS('mytask');
  WHILE (l_try < 2 AND l_status != DBMS_PARALLEL_EXECUTE.FINISHED)
  LOOP
    l_try := l_try + 1;
    DBMS_PARALLEL_EXECUTE.RESUME_TASK('mytask');
    l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS('mytask');
  END LOOP;

  -- Done with processing; drop the task
  DBMS_PARALLEL_EXECUTE.DROP_TASK('mytask');
END;
/
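
If the task still hasn't reached FINISHED after the resumes, the per-chunk
status and errors can be inspected through the USER_PARALLEL_EXECUTE_CHUNKS
view, roughly like this (column names from memory, so double-check them on
your version):

    SELECT chunk_id, status, error_code, error_message
      FROM user_parallel_execute_chunks
     WHERE task_name = 'mytask'
       AND status = 'PROCESSED_WITH_ERROR'
     ORDER BY chunk_id;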


________________________________
 From: Charlotte Hammond <dmarc-noreply@xxxxxxxxxxxxx>
To: ORACLE-L <oracle-l@xxxxxxxxxxxxx> 
Sent: Thursday, July 17, 2014 4:36 PM
Subject: RE: Big Update on Busy Table
 


Everyone - thanks for your suggestions!

No, unfortunately I can't get any downtime on this table.   It's very busy 24x7 
- running the update as a single statement would take well over an hour and 
lock many rows.   The update is about 400 million rows of a 2 billion row table.

Mark - I'm doing something similar to what you suggest, although my driving
table uses ROWIDs rather than blocks.   I'm updating 20,000 rows each time and
setting a flag against these ROWIDs when each chunk completes.  One blessing is
that I don't need to care what's already in the column I'm updating, so it
doesn't matter if it changes before or after I run each individual chunk
transaction, or that the total update is spread over many hours.
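
For anyone following along, a minimal sketch of that driving-table pattern
could look roughly like the block below. The names DRV, BIG_TABLE and SOME_COL
are invented for illustration and are not Charlotte's actual objects:

    -- Hypothetical driving table: one row per target ROWID plus a done flag
    --   CREATE TABLE drv (rid ROWID, done CHAR(1) DEFAULT 'N');
    DECLARE
      TYPE t_rids IS TABLE OF ROWID;
      l_rids t_rids;
    BEGIN
      LOOP
        SELECT rid BULK COLLECT INTO l_rids
          FROM drv
         WHERE done = 'N'
           AND ROWNUM <= 20000;
        EXIT WHEN l_rids.COUNT = 0;

        FORALL i IN 1 .. l_rids.COUNT            -- apply the real change
          UPDATE big_table
             SET some_col = 'new value'
           WHERE rowid = l_rids(i);

        FORALL i IN 1 .. l_rids.COUNT            -- flag these ROWIDs as done
          UPDATE drv SET done = 'Y' WHERE rid = l_rids(i);

        COMMIT;                                  -- keep each transaction short
      END LOOP;
    END;
    /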

Your suggestion of using blocks sounds better as that way I can localize the 
disk I/O for each chunk although the way I've selected the ROWIDs kind of does 
that anyway (but more by chance than design).  I'll take your suggestions on 
board and look at improving this to squeeze a bit more throughput.

I'm also looking to parallelize this by having a few processes each with its 
own driving table with a subset of the overall rows/blocks to be updated.
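
One illustrative way to carve a driving table into disjoint subsets, so that
rows in the same block stay with the same worker, is to mod the block number
of each rowid (again, DRV and the subset tables are made-up names):

    -- e.g. four subsets, one per worker process
    CREATE TABLE drv_0 AS
      SELECT rid FROM drv
       WHERE MOD(DBMS_ROWID.ROWID_BLOCK_NUMBER(rid), 4) = 0;
    -- ... and likewise drv_1, drv_2, drv_3 with = 1, 2, 3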

Thanks!
Charlotte



RE: Big Update on Busy Table
        * From: "Mark W. Farnham" <mwf@xxxxxxxx>
        * To: <fuzzy.graybeard@xxxxxxxxx>, <oracle-l@xxxxxxxxxxxxx>
        * Date: Thu, 17 Jul 2014 05:37:21 -0400
Hans identified a key question in 2) below. IF there is an easy way to identify
chunks remaining to be updated and especially IF there is an efficient way to
group rows in the same database block together, then doing so in chunks of at
least 1,000 differs from slow-by-slow by 3 orders of magnitude. IF the driving
select is cheap, this should be sufficient.

IF there is a lot of work to the select to identify the rows that need to be
updated, but you can rely on the OLTP portion of the job not updating this
column in a way that means you should no longer be updating it in your batch
job, then creating an interim table containing the rowids to be updated,
indexed by a nullable key of the block of each row to be updated, is a pretty
nifty way to do this. Since only you will be updating the table of the list to
be updated in the real table, your first select then only operates as a select
on the OLTP table.

What you deposit in the (at this point unindexed) "list of rows to be updated"
table is the rowid, two columns containing the block id extracted from the
rowid (one initially null), and, if variable, the new value for the column for
each row, however you functionally determine that. Let's call this the ACTION
table with columns RID, ALL_BLOCKS, BLOCK_ACTION, [NEW_VALUE]. Then create
single-column indexes on ACTION.ALL_BLOCKS and ACTION.BLOCK_ACTION.

At this point it helps to know the minimum, maximum, and average rows per block
to be updated. If you don't mind the commit sizes being a bit variable, just
update the stats and use the average.

For a useful total number of rows, update BLOCK_ACTION to the ALL_BLOCKS value
and the ALL_BLOCKS value to NULL where ALL_BLOCKS is not null. Then run your
actual update where rowid in (select RID from ACTION where BLOCK_ACTION is not
null), update ACTION setting BLOCK_ACTION to null where BLOCK_ACTION is not
null, and COMMIT. Rinse and repeat until ALL_BLOCKS is null in every row. [IF
you are updating to a constant NEW_VALUE, just leave that column out of this
whole thing and use the constant.]

I've suspended disbelief that you cannot find a useful window to do this as a
monolith and/or that a monolith would just be too doggone big. (IF you can find
an off-hours window and the frequency at which rows need to be updated is not
horrible, the CTAS solution already mentioned is likely good. IF you are only
updating several million rows of a 10 billion row table that is probably not
the case unless partition pruning comes into play.)

Block-oriented disappearing-index batch nibblers are one of the design answers
to avoiding slow-by-slow or excessively sized monolith operations. Remember
that the general-case limitation is that no one else can be allowed to change
the required new_value, or whether a row should be updated, until all the
selected rows are complete. They CAN change anything else about the row. Often
(but not always) when a batch operation needs to be done on a column these
criteria are met.

Slow-by-slow (TKYTE) has badness about it. Sooner or later you do need to do
chunks of set operations in reasonable sizes. What is reasonable changes over
time.

mwf
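
A rough sketch of one pass of the ACTION-table approach Mark describes might
look like the following. BIG_TABLE, TARGET_COL and the 10,000-row limit are
invented for illustration, and a real pass would pick whole blocks (via the
ALL_BLOCKS index) rather than a bare ROWNUM limit:

    -- One-time setup (names per Mark's description):
    --   CREATE TABLE action (rid ROWID, all_blocks NUMBER,
    --                        block_action NUMBER, new_value VARCHAR2(100));
    --   CREATE INDEX action_all_blocks_i   ON action (all_blocks);
    --   CREATE INDEX action_block_action_i ON action (block_action);

    -- 1) Flag roughly one batch: move block ids from ALL_BLOCKS to BLOCK_ACTION
    UPDATE action
       SET block_action = all_blocks,
           all_blocks   = NULL
     WHERE all_blocks IS NOT NULL
       AND ROWNUM <= 10000;

    -- 2) Apply the real change to exactly those rows
    --    (or SET t.target_col = <constant> if NEW_VALUE is left out)
    UPDATE big_table t
       SET t.target_col = (SELECT a.new_value FROM action a WHERE a.rid = t.rowid)
     WHERE t.rowid IN (SELECT rid FROM action WHERE block_action IS NOT NULL);

    -- 3) Mark the batch done, commit, repeat until ALL_BLOCKS is empty
    UPDATE action SET block_action = NULL WHERE block_action IS NOT NULL;
    COMMIT;

From: oracle-l-bounce@xxxxxxxxxxxxx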
[mailto:oracle-l-bounce@xxxxxxxxxxxxx]
On Behalf Of Hans Forbrich
Sent: Wednesday, July 16, 2014 5:36 PM
To: oracle-l@xxxxxxxxxxxxx
Subject: Re: Big Update on Busy Table

A couple of thoughts:

1) Is it really going to take too long to just get it over and done with on
the live table?  Perhaps an off-hours one-shot transaction?  I've occasionally
rethought my bulk operations and realized that the overall hit might not be
'that bad', but your situation is, of course, unknown to me.

2) How are you planning on getting those 'few thousand rows at a time'?  And
how are you planning on identifying those which have been done and are not to
be repeated?

3) Is this something that can be handled through PL/SQL's bulk operations?
Perhaps with a "SELECT FOR UPDATE OF"?
(http://morganslibrary.com/reference/plsql/array_processing.html#apbc and
other areas of Morgan's Library) (a small sketch follows below)

/Hans
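
A compact sketch of that bulk pattern, with invented names (BIG_TABLE,
SOME_COL, NEEDS_UPDATE). Note that with FOR UPDATE the row locks are held
until the single COMMIT at the end, so it trades lock duration for not
re-reading rows that change mid-run:

    DECLARE
      CURSOR c IS
        SELECT rowid AS rid
          FROM big_table
         WHERE needs_update = 'Y'
           FOR UPDATE OF some_col;
      TYPE t_rids IS TABLE OF ROWID;
      l_rids t_rids;
    BEGIN
      OPEN c;
      LOOP
        FETCH c BULK COLLECT INTO l_rids LIMIT 1000;
        EXIT WHEN l_rids.COUNT = 0;
        FORALL i IN 1 .. l_rids.COUNT
          UPDATE big_table
             SET some_col = 'new value'
           WHERE rowid = l_rids(i);
      END LOOP;
      CLOSE c;
      COMMIT;  -- one commit; committing per batch would mean dropping FOR UPDATE
    END;
    /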
On 16/07/2014 3:07 PM, Charlotte Hammond wrote:

Hi All,

I need to run an update of one column of millions of rows in a busy OLTP
table.  To prevent the transaction blocking other sessions for too long I'm
planning to break it into updates of a few thousand rows at a time with a
commit in between.   This will prevent any one row being locked for more than
about a second.   However it is tending towards slow-by-slow processing:  is
there a better way of updating this table without creating long lived TX
locks?

Thanks!
Charlotte
