Re: Inserting with billion of rows faster

  • From: Tim Gorman <tim.evdbt@xxxxxxxxx>
  • To: Lothar Flatz <l.flatz@xxxxxxxxxx>, loknath.73@xxxxxxxxx
  • Date: Mon, 29 Mar 2021 12:49:53 -0700

Lok,

Short answer is "yes".  To resolve your immediate backlog, you'll need to load into a separate staging table without indexes.  Then, you'll need to build the indexing on that staging table prior to performing EXCHANGE PARTITION into the main table, allowing the newly built indexing to "plug into" the local indexes on the main table.  CREATE INDEX or ALTER INDEX ... REBUILD is far faster than maintaining an index transactionally during a DML operation, because these index build/rebuild operations themselves employ direct-path methods and parallelism, which is not true for transactional index maintenance.
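For illustration only (a rough sketch with made-up table, column, and index names, not taken from your system), the index build on an already-loaded staging table might look like this; the build itself runs direct-path and in parallel:

    -- build an index on the loaded staging table; unlike per-row index
    -- maintenance during DML, the build uses direct-path operations and parallelism
    CREATE INDEX stage_tab_ix1 ON stage_tab (load_date, cust_id)
      LOCAL NOLOGGING PARALLEL 16;

    -- reset the parallel degree so ordinary queries are not parallelized unintentionally
    ALTER INDEX stage_tab_ix1 NOPARALLEL;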

This short-term solution can be leveraged into a long-term solution, which might become obvious as you get deeper into the immediate solution.

I sincerely hope that you do not have any global indexes on your main table, because the only way to avoid rebuilding them completely (which requires application downtime) is to use the UPDATE GLOBAL INDEXES clause on the EXCHANGE PARTITION operation, which will perform very slowly and might also run into the same ORA-01628 error.
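If you did have to keep a global index through the exchange, the clause goes on the exchange statement itself; a rough sketch with placeholder object names:

    -- swap the loaded and indexed staging table in for the target partition,
    -- maintaining global indexes as part of the operation (slow, but avoids a full rebuild)
    ALTER TABLE main_tab
      EXCHANGE PARTITION p_list1 WITH TABLE stage_tab
      INCLUDING INDEXES WITHOUT VALIDATION
      UPDATE GLOBAL INDEXES;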

If you do have global indexes, then don't despair; you might not actually need them.  One of the most common uses for global indexes is unique indexes that exist only for uniqueness enforcement but are never or rarely queried.  That can be another email thread on this forum, if you wish.

As Lothar correctly notes, the main table is not designed for data loading and data lifecycle management.  If it were, then it would be range-partitioned by a DATE column.  Sub-partitioning is largely irrelevant from the standpoint of data loading and lifecycle management.

The question is what it was in fact designed for.  With only two list partitions, I sincerely hope that the partition key column for the list partitioning is referenced with an equality predicate by every SQL statement that uses this table.  Likewise for the partition key column of the hash subpartitioning, and additionally I hope the range of values is large enough to make it worthwhile.
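In other words, the current scheme only pays off when statements can prune on both keys, roughly like this (placeholder column names):

    -- equality on the list key prunes to one of the two list partitions;
    -- equality on the hash key prunes further to a single hash subpartition
    SELECT *
      FROM main_tab
     WHERE list_key_col = 'CURRENT'
       AND hash_key_col = :some_id;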

Hope this helps,

-Tim


On 3/29/2021 11:29 AM, Lothar Flatz wrote:

Lok,

I am afraid exchange partition will generally not help much in your case.  Of course, from the point of view of manageability, a table of this size should be range-partitioned by date.
In addition, I rather wonder about the usefulness of the list partitioning if one partition holds ~18 billion rows out of a total of ~20 billion.
700 million rows, however, are not that much.  When I was an RWPG member I once had the opportunity to load 2 billion rows on standard hardware; it took 15 minutes.
As Tim mentioned, it was likely not a direct-path load in your case.
Can you share an execution plan of the insert?
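(For example, if the SQL_ID of the insert is known, something like the following would display the plan actually used; the SQL_ID value here is only a placeholder:)

    SELECT *
      FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('0abc123xyz456', NULL, 'TYPICAL'));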

Thanks

Lothar


Am 29.03.2021 um 20:11 schrieb Lok P:
Thank you Tim. Actually we have a target table holding ~20 billion rows which is list-hash composite partitioned. It has two list partitions and each list partition has ~2048 hash subpartitions. And we want to insert around ~700+ million rows into that target table, which is failing with ORA-01628.

Now I am unable to fully understand how we can utilize the partition exchange method here to load those ~700 million rows, because I see that even though all of those rows are meant for one of the list partitions (which itself holds ~18 billion of the total ~20 billion rows), they are spread across the ~2048 hash subpartitions. So do you mean to say we should load the ~18 billion rows into a stage table of similar structure (the same list-hash composite partitioning, but without indexes), also load the additional ~700 million rows that are missing (which should succeed, since with no indexes the APPEND insert generates almost zero UNDO), and then truncate the list partition and do the partition exchange with the stage table?

On Mon, Mar 29, 2021 at 2:45 AM Tim Gorman <tim.evdbt@xxxxxxxxx <mailto:tim.evdbt@xxxxxxxxx>> wrote:

    Lok,

    >> Ora-01628 max extent 32765 reached for rollback segment

    If you are running out of space on your rollback segments, then
    you are not actually using direct path insert, APPEND or
    APPEND_VALUES hint notwithstanding.  The only undo you should be
    generating is for the DDL, not the DML, and that should not max
    out anything.
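    (A quick sanity check, as a sketch with placeholder names: after a
    genuinely direct-path insert, the loading session cannot re-read the
    object until it commits, so a follow-up query in the same session
    raises ORA-12838.)

        INSERT /*+ APPEND */ INTO main_tab SELECT * FROM stage_tab;
        -- before COMMIT, in the same session:
        SELECT COUNT(*) FROM main_tab;   -- ORA-12838 here proves the load was direct-path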

    Long story short:  you really can not effectively perform
    direct-path APPEND insert directly to that "live" table or
    partition with indexes.  In your case, you need to perform the
    direct-path insert to a hash-partitioned table of the same
    logical "shape" as the list partition you are targeting, then
    build indexes to match the main table when you are done loading,
    and then finally use ALTER TABLE ... EXCHANGE PARTITION to swap
    it all with the "live" partition(s) in the main table.  If it
    helps, HERE
    <http://evdbt.com/download/presentation-scaling-to-infinity-partitioning-data-warehouses-on-oracle-database/>
    is a presentation on this technique, and HERE
    <http://evdbt.com/download/paper-scaling-to-infinity-partitioning-data-warehouses-on-oracle-database-2/>
    is a corresponding white paper on the technique.  Also, HERE
    <http://evdbt.com/download/exchpart-sql/> is a PL/SQL package
    called EXCHPART which contains procedures to automate the moves
    mentioned in the white paper and presentation.  Please don't
    consider EXCHPART as anything more than a template; it is not
    intended as working code, particularly since it must be adapted
    to your local coding culture and standards.  Hopefully it is a
    good start?
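    A very condensed sketch of that flow, with placeholder names (the
    white paper and EXCHPART cover the real details):

        -- staging table hash-partitioned to match the subpartitioning of the target list partition
        CREATE TABLE stage_tab
          PARTITION BY HASH (hash_key_col) PARTITIONS 2048
          AS SELECT * FROM main_tab WHERE 1 = 0;

        -- direct-path load, with no indexes in place yet
        INSERT /*+ APPEND */ INTO stage_tab SELECT * FROM incoming_data;
        COMMIT;

        -- build local indexes matching the main table's, then swap the segments
        CREATE INDEX stage_tab_ix1 ON stage_tab (cust_id) LOCAL NOLOGGING PARALLEL 8;
        ALTER TABLE main_tab
          EXCHANGE PARTITION p_list1 WITH TABLE stage_tab
          INCLUDING INDEXES WITHOUT VALIDATION;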

    Oh also, there is no such thing as a NOLOGGING hint; nologging is
    an attribute on the table or partition which is effective only
    with direct-path insert operations, never with conventional-path
    INSERT, UPDATE, or DELETE operations.
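    (In other words, roughly the following, with placeholder names: the
    attribute lives on the segment, and it only takes effect when the
    insert is direct-path.)

        ALTER TABLE stage_tab NOLOGGING;               -- attribute on the segment, not a hint
        INSERT /*+ APPEND */ INTO stage_tab
          SELECT * FROM incoming_data;                 -- direct-path insert honors NOLOGGING
        COMMIT;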

    Hope this helps...

    Thanks!

    -Tim


    On 3/27/2021 10:37 PM, Lok P wrote:
    Hi Listers, we have one process in which we load ~50 million rows
    daily (using direct-path insert, INSERT /*+ APPEND */) into a target
    table which holds ~20 billion rows (~1 TB in size) and is
    list-hash composite partitioned: ~2 list partitions and ~2048
    hash subpartitions.  It has 4 indexes, one of which is a
    composite primary key comprising 5 columns.  That data load job
    had been failing for ~15-20 days without being noticed, so it
    accumulated roughly a billion rows, and the load process is now
    failing with (Ora-01628 max extent 32765 reached for rollback segment).

    So we thought of running it with a bulk-collect method
    (APPEND_VALUES hint) and committing in chunks of ~1 million rows.
    But with that approach we ended up seeing the data load become
    significantly slower: it was inserting ~1 million rows in
    ~40 minutes.  We were not able to understand the reason, but
    the wait events were all showing "cell single block physical
    read" and the object was the target load object, meaning it was
    the INSERT itself that was struggling.  So we have now ended up
    with partial data loaded to the target, i.e. around ~40-50 million
    rows.  And as the table has a primary key, we have to delete those
    ~40-50 million rows from the target table and then reload them.

    I was thinking whether we should do the delete in a bulk method
    with a chunk size of 1 million, but again that will happen in a
    single thread and will be significantly slow.  So what is the best
    way to have those ~40-50 million rows deleted from the target
    table (which holds a total of ~20 billion rows)?

    I was also thinking that we could make the indexes unusable,
    perform the delete, and then do the data load (which would
    happen with almost zero UNDO in the absence of indexes), but
    in that case it seems the DELETE will need the PK index to
    fetch the rowids, so we can't get rid of the PK index.  So
    what is the best approach for the delete and the data load
    here without breaking data consistency in the target table?

    In short, I wanted to understand the best/fastest approach for
    the delete and data load in this situation.

    Regards
    Lok


