Re: Data load question

  • From: "jaromir nemec" <jaromir@xxxxxxxxxxxx>
  • To: <dubey.sandeep@xxxxxxxxx>, "Mike Spalinger" <mike.spalinger@xxxxxxxxx>, <sfaroult@xxxxxxxxxxxx>
  • Date: Mon, 10 Apr 2006 00:52:12 +0200

Hi Sandeep,



However, I want b.id and c.id as pk.



I'm not sure what you mean with PK in this context. What result you would expect if the table a will be initially populated as follows?




insert into a values ('A1','B1');

insert into a values ('A1','B1');



If the problem is that the generated ID start from 1 (and violating so in the next run the PK) you may use the sequence solution proposed by Stephane, alternatively you may use a max subquery to get the last generated value, see example below (the tricky part here is to get the max efficiently if the tables b and c get large)



slightly modified Mike's solution



insert all

when table_num = 1 then

  into b (id, col1)

  values (a_seq, col1)

when table_num >= 1 then

  into c (id, col2, b_id)

  values (b_seq, col2, a_seq)

select col1,

      col2,

      row_number() over (partition by col1 order by col1) as table_num,

      (select nvl(max(id),0) from b) +

      dense_rank() over (order by col1) as a_seq,

      (select nvl(max(id),0) from c) +

      row_number() over (order by col1, col2) as b_seq

 from a;



I'd prefer the sequence based solution if there are potentially more parallel job performing the load. The MAX solution works well if the load is performed serially one job at a time. The decision is more a matter of taste depending on your preferences of dealing with redundant data.



If you plan to treat the col1 as a natural key and the id plays the surrogate key a more complicated solution with a lookup in the table b (to get the information whether the natural key is new and need a new surrogate key) will be needed.



something like this.

insert all

when table_num = 1 and new_pk_b = 1  then

  into b (id, col1)

  values (a_seq, col1)

when table_num >= 1 then

  into c (id, col2, b_id)

  values (b_seq, col2, a_seq)

select a.col1,

      a.col2,

      row_number() over (partition by a.col1 order by a.col1) as table_num,

      nvl(b.id,

       (select nvl(max(id),0) from b) +

dense_rank() over (partition by case when b.col1 is not NULL then 0 else 1 end order by a.col1)

      ) as a_seq,

      (select nvl(max(id),0) from c) +

      row_number() over (order by a.col1, a.col2) as b_seq,

      case when b.col1 is not NULL then 0 else 1 end new_pk_b

 from a, b

 where a.col1 = b.col1(+);





Depending on the question above you may need to deploy additionally some de-dup processing.



Regards,



Jaromir D.B. Nemec


----- Original Message ----- From: "Sandeep Dubey" <dubey.sandeep@xxxxxxxxx>
To: "Mike Spalinger" <mike.spalinger@xxxxxxxxx>
Cc: <oracle-l@xxxxxxxxxxxxx>
Sent: Friday, April 07, 2006 10:50 PM
Subject: Re: Data load question




--
//www.freelists.org/webpage/oracle-l


Other related posts: