Re: Big Update/DML

  • From: Tim Gorman <tim.evdbt@xxxxxxxxx>
  • To: oracle-l@xxxxxxxxxxxxx
  • Date: Wed, 26 Aug 2020 11:05:28 -0700

The video, paper, and presentations were already based on 15-year-old concepts in 2012, but I guess the point is that a big UPDATE statement is the wrong approach to this problem.  An INSERT will always outperform an UPDATE (or DELETE) any day of the week, regardless of degree of parallelism, OS platform, RDBMS platform, or data structure.  And any UPDATE (or DELETE) can be converted into an INSERT with a little thought, particularly with all of the tools we have in Oracle database.
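
For illustration, a minimal sketch of the idea, with invented table and column names: the "update" becomes a column expression in a CREATE TABLE ... AS SELECT, and the rebuilt table then replaces the original.

    -- instead of:  UPDATE big_table SET new_col = old_col;
    CREATE TABLE big_table_new
      NOLOGGING
      PARALLEL 8
    AS
    SELECT t.col1,
           t.col2,
           t.old_col,
           t.old_col AS new_col     -- the "update" happens here
    FROM   big_table t;

    -- validate, rebuild indexes/constraints/grants, then swap names:
    -- ALTER TABLE big_table     RENAME TO big_table_old;
    -- ALTER TABLE big_table_new RENAME TO big_table;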

Updating billions of rows is certain to leave behind row chaining and all sorts of other bad things, even if the UPDATE itself completes successfully, which is doubtful.

However, rebuilding the table with the corrected data in the desired format (i.e. compressed or non-compressed, partitioned or non-partitioned) leaves behind all sorts of awesomeness.



On 8/26/2020 8:22 AM, Jonathan Lewis wrote:


I haven't re-read / re-viewed the presentation, Tim, but an update might be in order when you consider the possibilities offered by 19c with online conversion of simple heap tables into partitioned tables - with data filtering etc: https://jonathanlewis.wordpress.com/2017/06/09/12-2-partitions/
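
As a rough sketch of what that 12.2+ conversion looks like (table name and partitioning key invented for illustration):

    ALTER TABLE big_table
      MODIFY PARTITION BY RANGE (created_dt)
        ( PARTITION p_2019 VALUES LESS THAN (DATE '2020-01-01'),
          PARTITION p_2020 VALUES LESS THAN (DATE '2021-01-01'),
          PARTITION p_max  VALUES LESS THAN (MAXVALUE) )
      ONLINE;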

Maybe OTW 21 if IRL conferences ever get going again.

Regards
Jonathan Lewis


On Wed, Aug 26, 2020 at 4:16 PM Tim Gorman <tim.evdbt@xxxxxxxxx> wrote:

    If you've got the patience, I offer a video HERE
    <https://www.youtube.com/watch?v=pvbTAgq_BBY> entitled "The
    Fastest UPDATE Is An INSERT" from Oak Table World 2012.  If you
    prefer to read presentations or white papers instead of videos,
    then HERE
    <http://evdbt.com/download/presentation-scaling-to-infinity-partitioning-data-warehouses-on-oracle-database/>
    and HERE
    <http://evdbt.com/download/paper-scaling-to-infinity-partitioning-data-warehouses-on-oracle-database-2/>
    cover much of the same topic, though not as directly geared toward
    optimizing UPDATE operations as the video is.

    If you're lucky, those big tables that you want to update are
    partitioned already, and so you can just test and run the
    correction from partition to partition, a bit at a time.
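
    A rough sketch of that per-partition approach (all names invented):
    rewrite one partition's data, with the correction folded into the
    SELECT, then swap it in with a partition exchange.

        -- build the corrected copy of one partition
        CREATE TABLE scratch_p_2020
          NOLOGGING
          PARALLEL 8
        AS
        SELECT t.col1, t.col2, t.old_col, t.old_col AS new_col
        FROM   big_table PARTITION (p_2020) t;

        -- create matching indexes on scratch_p_2020, then swap it in
        ALTER TABLE big_table
          EXCHANGE PARTITION p_2020 WITH TABLE scratch_p_2020
          INCLUDING INDEXES WITHOUT VALIDATION;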

    If you're unlucky, those big tables that you want to update are
    not partitioned, so here is your chance to correct that by
    creating a new partitioned table at the same time as you correct
    the data errors.
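
    If you go that route, a minimal sketch (again with invented names
    and a made-up partitioning key) is to build the new partitioned
    table directly from the old one, folding the correction into the
    SELECT:

        CREATE TABLE big_table_part
          NOLOGGING
          PARTITION BY RANGE (created_dt)
            ( PARTITION p_2019 VALUES LESS THAN (DATE '2020-01-01'),
              PARTITION p_2020 VALUES LESS THAN (DATE '2021-01-01'),
              PARTITION p_max  VALUES LESS THAN (MAXVALUE) )
          PARALLEL 8
        AS
        SELECT t.col1, t.col2, t.old_col, t.old_col AS new_col
        FROM   big_table t;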

    The first time I used this technique, with Oracle 8.0 back in
    1997, we updated a few columns on a multi-billion row
    range-partitioned table in a single afternoon, including dreaming
    it up and testing first.





    On 8/26/2020 7:30 AM, Reen, Elizabeth (Redacted sender
    elizabeth.reen for DMARC) wrote:

    Be careful with how you do parallelism.  Done correctly, it will
    speed things up.  Done incorrectly, you will have a locking
    nightmare.  Are you updating the columns with the same value?  If
    so, the default value option might be very useful.
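
    (For what it's worth, a hedged example with an invented column:
    since 11g, adding a column with a DEFAULT and NOT NULL is a
    metadata-only change, so no existing rows are physically touched.)

        ALTER TABLE big_table ADD (new_col VARCHAR2(1) DEFAULT 'N' NOT NULL);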

    Liz

    *From:* oracle-l-bounce@xxxxxxxxxxxxx *On Behalf Of* Sanjay Mishra
    *Sent:* Tuesday, August 25, 2020 11:29 PM
    *To:* oracle-l@xxxxxxxxxxxxx
    *Subject:* Big Update/DML

    Hi Everyone

    I am working on 19c and need to do a one-time update on multiple
    tables containing 3-4 billion records; some tables are compressed
    for OLTP and some are uncompressed.  The tables have multiple
    columns, but I am updating only one newly added column, populating
    it with data from another column in the same table.  The
    environment is Exadata with a buffer cache of 60G and a CPU_COUNT
    of 30.

    Updates with parallel DML enabled at a high degree are taking
    several hours to a day per table and are generating a lot of UNDO.

     1. Would dropping the indexes help the elapsed time, even though
        the column being updated has no relation to any indexed
        column?
     2. Would a compressed table help in this scenario versus an
        uncompressed table?  A table compressed for OLTP is around
        800G, and another table of the same kind is 4 TB without
        compression.  I am trying to see whether compression can
        reduce I/O or buffer cache usage, from both the table and the
        index perspective.
     3. Would adding more SGA or CPU help by allowing more parallel
        threads, to reduce the elapsed time?

    I was checking and found that dbms_parallel_execute could be a
    good solution.  Can anyone who has used it for a big update share
    sample code to try?
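
    For reference, a minimal dbms_parallel_execute sketch; the owner,
    table, and column names are placeholders, and the chunk size and
    parallel level would need tuning for the environment.

        DECLARE
          l_task  VARCHAR2(30) := 'fix_new_col';
        BEGIN
          DBMS_PARALLEL_EXECUTE.create_task(l_task);

          -- split the table into rowid-range chunks of ~10,000 blocks
          DBMS_PARALLEL_EXECUTE.create_chunks_by_rowid(
            task_name   => l_task,
            table_owner => 'APP',
            table_name  => 'BIG_TABLE',
            by_row      => FALSE,
            chunk_size  => 10000);

          -- run the update chunk by chunk with 8 job slaves
          DBMS_PARALLEL_EXECUTE.run_task(
            task_name      => l_task,
            sql_stmt       => 'UPDATE app.big_table
                                  SET new_col = old_col
                                WHERE rowid BETWEEN :start_id AND :end_id',
            language_flag  => DBMS_SQL.native,
            parallel_level => 8);

          DBMS_PARALLEL_EXECUTE.drop_task(l_task);
        END;
        /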

    TIA

    Sanjay


