RE: "Re-sequencing" a pseudo-key

  • From: Iggy Fernandez <iggy_fernandez@xxxxxxxxxxx>
  • To: "dedba@xxxxxxxxxx" <dedba@xxxxxxxxxx>, "kibeha@xxxxxxxxx" <kibeha@xxxxxxxxx>
  • Date: Sun, 12 Oct 2014 18:21:52 -0700

+42
The most efficient way to accomplish something is to eliminate the need to do 
it.
Date: Mon, 13 Oct 2014 10:46:04 +1000
From: dedba@xxxxxxxxxx
To: kibeha@xxxxxxxxx
CC: oracle-l@xxxxxxxxxxxxx
Subject: Re: "Re-sequencing" a pseudo-key

    Hi Kim,

    

    Am I right in surmising that your legacy system is a sales
    processing system? If so, given that you restarted with negative
    numbers several years ago and expect to reach zero 1 to 1½ years
    from now, did you consider archiving all rows with positive RECIDs?
    

    

    To do this, you could simply clone the database and remove the
    positive RECIDs from production. When production hits the maximum
    positive RECID several years from now, you could copy the remaining
    negatives from production to the archival clone and remove the
    negatives from production. When production hits 0 again in a
    decade's time or so, you store the now complete archival clone on
    some permanent storage and start with a new clone. You would of
    course need two incarnations of your legacy app: the live one, and
    one connecting to the archive with only reporting functionality
    enabled.
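    A minimal sketch of that split, assuming every table carries a
    RECID column (the table name here is purely illustrative):

        -- On production, after taking the archival clone, drop the
        -- archived positive range (repeat per RECID-bearing table):
        DELETE FROM salesline WHERE recid > 0;
        COMMIT;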

    

    This way historical data from the past decade is still online,
    albeit immutable, and new data can continue to be added. "Recent"
    historical data (i.e. the negative RECIDs) is in both the live and
    the archival database, providing a good overlap for reporting.
    Really old data will be available through a second interface for
    long-term trend studies etc., and ancient data on request.

    

    Just a thought.

    

    Cheers,

    Tony

    

    On 12/10/14 06:02, Kim Berg Hansen wrote:

    
    
      Hi, Sergey
        

        
        Thanks for the ideas - I'm copying this to the list for the
          benefit of others.
        

        
        Unfortunately the idea won't work for us; I cannot change
          how the application uses sequences.
        

        
        You see, the application is written in an old legacy
          programming language. The compiler translates it into a legacy
          "executable" with embedded SQL (which we cannot touch). Then
          the runtime engine executes the compiled legacy code with the
          SQL in it.
        

        
        Part of that is that whenever a RECID is needed (no matter
          which table), the runtime engine gets the nextval from the one
          master sequence (which is defined as INCREMENT BY 10) and uses
          that value for the INSERT it executes. Then the engine uses
          the next 9 values for the next 9 INSERTs (no matter which
          table) and only then gets sequence.nextval again.
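        A minimal sketch of what that could look like on the Oracle
          side (the sequence name and starting point are assumptions;
          the real object belongs to the legacy app):

            -- Assumed name and range; the engine calls NEXTVAL once,
            -- uses the value for one INSERT, then hands out the
            -- following 9 values itself before calling NEXTVAL again.
            CREATE SEQUENCE master_recid_seq
              START WITH   -2147483648
              MINVALUE     -2147483648
              INCREMENT BY 10;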
        

        
        And the engine uses 32-bit signed integers for the RECIDs,
          and there will never be a 64-bit updated version, thus our
          deadline in about a year to 1½. The numbers are "running out"
          faster than the number of rows we have, partly because (as
          Iggy pointed out) numbers are "lost" when the engine calls
          nextval but then only inserts a couple of rows before closing
          the session (similar to how a sequence cache in Oracle may
          "lose" numbers when the cache is aged out of the shared pool
          or at database restart).
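        To put the deadline in numbers: a 32-bit signed RECID tops
          out at 2,147,483,647, and a quick check against the data
          dictionary (assumed sequence name) gives a rough idea of how
          much of the negative range remains:

            -- LAST_NUMBER is the last value written to disk (ahead of
            -- the current value when the sequence is cached), so this
            -- is only an approximation.
            SELECT last_number,
                   0 - last_number AS values_left_until_zero
            FROM   user_sequences
            WHERE  sequence_name = 'MASTER_RECID_SEQ';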
        

        
        But mostly it is because the application inserts rows into
          tables like SalesHeader and SalesLine, and then, when the
          order is invoiced, the data is inserted into tables like
          InvoiceHeader and InvoiceLine, after which SalesHeader and
          SalesLine are deleted. So the application as such "wastes"
          numbers on rows that are deleted and reinserted in new tables.
          Again it is unfortunately not possible to change this - the
          only option to avoid it would be to replace the system
          completely...
        

        
        I definitely wish I could give each table a separate
          sequence, but I cannot change how this runtime engine works
          when it executes our compiled legacy code :-(
        

        Regards

        Kim Berg Hansen
        http://dspsd.blogspot.com
        kibeha@xxxxxxxxx
        @kibeha

          On Sat, Oct 11, 2014 at 3:41 AM, Sergey Kosourikhin
            <kosour@xxxxxxxxx> wrote:

            
              Hi Kim,
                

                
                If we look at this task from a slightly different
                  angle, I would say this task is not about
                  re-sequencing but rather about de-sequencing.
                

                
                I mean, your main issue is that the application uses
                  only one sequence, which hits the limit too fast. So
                  the solution would be to have a separate sequence for
                  every table. And it's not a one-off operation;
                  actually, it should be something like a nightly job
                  during an outage period. (Well, I don't know your
                  application - it may even be possible to run it
                  without an outage under some conditions.)
                

                
                So, if we want to provide a solution for
                  de-sequencing, one of the implementations would be:
                

                
                1. Create individual sequences for every table.
                2. Create a mapping table. (Every time you reset the
                  main sequence, call that an "incarnation". Currently
                  you have incarnation 1; each reset of the main
                  sequence increments the incarnation by 1.) Within an
                  incarnation all your parent rec_id's are unique across
                  the database, as they come from the same sequence.
                  That means that to re-sequence a child rec_id you
                  don't need to know which table it came from - only
                  whether it belongs to the current incarnation or a
                  past one. So, in the simplest form the mapping table
                  could have this structure (see the DDL sketch below):
                  - table_name
                  - incarnation_id
                  - old_rec_id
                  - new_rec_id  -> populated from the table-related
                    sequence
                  - is_processed (default 0)
                
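                A minimal sketch of such a mapping table (names and
                  datatypes are assumptions, not Sergey's exact design):

                    CREATE TABLE recid_map (
                      table_name     VARCHAR2(30) NOT NULL,
                      incarnation_id NUMBER       NOT NULL,
                      old_rec_id     NUMBER       NOT NULL,
                      new_rec_id     NUMBER       NOT NULL, -- from the table's own sequence
                      is_processed   NUMBER(1)    DEFAULT 0 NOT NULL
                    );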

                
                Once the outage starts, you:
                1. Scan your tables and take all rec_id's newer than
                  the maximum from the previous incarnation.
                2. Populate the mapping table with these rec_id's and
                  the table-related sequences.
                3. You don't need to worry about FKs, as a rec_id is
                  unique within an incarnation - just update them
                  correctly using the mapping table (if it's bigger than
                  the latest from the previous incarnation, update it;
                  if less, skip it).
                4. Once you have populated the mapping table, you can
                  start updating (re-sequencing) the rec_id's in your
                  tables using the new_rec_id column.
                5. I would suggest doing it in batches of, say, 1000
                  rec_ids per transaction; before the commit, you set
                  is_processed = 1 for those old_rec_id's. You can
                  create a metadata-driven stored proc which updates all
                  tables for a given rec_id (see the sketch after this
                  list).
                6. To speed up the process you can create the mapping
                  table list-partitioned by table name and
                  subpartitioned by the is_processed column, with row
                  movement enabled. That way the partition of
                  non-processed rows shrinks as you re-sequence the
                  source tables (a partitioned DDL sketch follows
                  below).
                7. Once you have processed all rows, you can reset
                  your main sequence to the greatest of all the
                  table-related sequences + 1 (a reset sketch follows
                  below).
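                A sketch of steps 4-5 for a single table, batching
                  1000 updates per transaction (recid_map, salesline and
                  the column names are assumptions):

                    DECLARE
                      TYPE t_ids IS TABLE OF NUMBER;
                      l_old t_ids;
                      l_new t_ids;
                      CURSOR c IS
                        SELECT old_rec_id, new_rec_id
                        FROM   recid_map
                        WHERE  table_name = 'SALESLINE'
                        AND    is_processed = 0;
                    BEGIN
                      OPEN c;
                      LOOP
                        FETCH c BULK COLLECT INTO l_old, l_new LIMIT 1000;
                        EXIT WHEN l_old.COUNT = 0;
                        -- re-sequence this batch in the source table
                        FORALL i IN 1 .. l_old.COUNT
                          UPDATE salesline
                          SET    recid = l_new(i)
                          WHERE  recid = l_old(i);
                        -- mark the batch processed, then commit (step 5)
                        FORALL i IN 1 .. l_old.COUNT
                          UPDATE recid_map
                          SET    is_processed = 1
                          WHERE  table_name = 'SALESLINE'
                          AND    old_rec_id = l_old(i);
                        COMMIT;
                      END LOOP;
                      CLOSE c;
                    END;
                    /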
                
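                And for step 6, a sketch of the partitioned variant
                  of the same mapping table (partition names and the
                  listed tables are assumptions):

                    CREATE TABLE recid_map (
                      table_name     VARCHAR2(30) NOT NULL,
                      incarnation_id NUMBER       NOT NULL,
                      old_rec_id     NUMBER       NOT NULL,
                      new_rec_id     NUMBER       NOT NULL,
                      is_processed   NUMBER(1)    DEFAULT 0 NOT NULL
                    )
                    PARTITION BY LIST (table_name)
                    SUBPARTITION BY LIST (is_processed)
                    SUBPARTITION TEMPLATE (
                      SUBPARTITION todo VALUES (0),
                      SUBPARTITION done VALUES (1)
                    )
                    (
                      PARTITION p_salesline   VALUES ('SALESLINE'),
                      PARTITION p_invoiceline VALUES ('INVOICELINE')
                    )
                    ENABLE ROW MOVEMENT;
                    -- row movement lets a row migrate from the "todo"
                    -- subpartition to "done" when is_processed flips 0 -> 1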

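                For step 7, a sketch of the sequence reset; since
                  Oracle (pre-18c) cannot set a sequence's current value
                  directly, drop and recreate it during the outage
                  (names assumed):

                    DECLARE
                      l_start NUMBER;
                    BEGIN
                      -- greatest re-sequenced value across all tables, plus 1
                      SELECT MAX(new_rec_id) + 1 INTO l_start FROM recid_map;
                      EXECUTE IMMEDIATE 'DROP SEQUENCE master_recid_seq';
                      EXECUTE IMMEDIATE 'CREATE SEQUENCE master_recid_seq'
                        || ' START WITH ' || l_start || ' INCREMENT BY 10';
                    END;
                    /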
                
                You can run this procedure during an outage every day
                  and during weekends/holidays. It's not a big problem
                  if you can't de-sequence all the rows in a particular
                  outage - that just means you skip one sequence reset
                  and continue at the next outage.
                

                
                

                
                Consider whether you need to keep all the processed
                  rows in the mapping table (it looks to me at this
                  moment that you only need the last row to start the
                  next de-sequencing run).
                

                
                If I didn't explain something clearly enough, feel
                  free to ask me for details.
                

                
                P.S. Sorry that I can't reply to oracle-l - I have
                  some issues with posting there - I am more of a
                  reader :)
                

                
                Regards,
                Sergey.