Re: Data load question

From: "jaromir nemec" <jaromir@xxxxxxxxxxxx>
To: <dubey.sandeep@xxxxxxxxx>, "Mike Spalinger" <mike.spalinger@xxxxxxxxx>, <sfaroult@xxxxxxxxxxxx>
Date: Mon, 10 Apr 2006 00:52:12 +0200

Hi Sandeep,

However, I want b.id and c.id as pk.

I'm not sure what you mean with PK in this context. What result you would expect if the table a will be initially populated as follows?

insert into a values ('A1','B1');

insert into a values ('A1','B1');

If the problem is that the generated ID start from 1 (and violating so in the next run the PK) you may use the sequence solution proposed by Stephane, alternatively you may use a max subquery to get the last generated value, see example below (the tricky part here is to get the max efficiently if the tables b and c get large)

slightly modified Mike's solution

insert all

when table_num = 1 then

  into b (id, col1)

  values (a_seq, col1)

when table_num >= 1 then

  into c (id, col2, b_id)

  values (b_seq, col2, a_seq)

select col1,

      col2,

      row_number() over (partition by col1 order by col1) as table_num,

      (select nvl(max(id),0) from b) +

      dense_rank() over (order by col1) as a_seq,

      (select nvl(max(id),0) from c) +

      row_number() over (order by col1, col2) as b_seq

 from a;

I'd prefer the sequence based solution if there are potentially more parallel job performing the load. The MAX solution works well if the load is performed serially one job at a time. The decision is more a matter of taste depending on your preferences of dealing with redundant data.

If you plan to treat the col1 as a natural key and the id plays the surrogate key a more complicated solution with a lookup in the table b (to get the information whether the natural key is new and need a new surrogate key) will be needed.

something like this.

insert all

when table_num = 1 and new_pk_b = 1  then

  into b (id, col1)

  values (a_seq, col1)

when table_num >= 1 then

  into c (id, col2, b_id)

  values (b_seq, col2, a_seq)

select a.col1,

      a.col2,

      row_number() over (partition by a.col1 order by a.col1) as table_num,

      nvl(b.id,

       (select nvl(max(id),0) from b) +

dense_rank() over (partition by case when b.col1 is not NULL then 0 else 1 end order by a.col1)

      ) as a_seq,

      (select nvl(max(id),0) from c) +

      row_number() over (order by a.col1, a.col2) as b_seq,

      case when b.col1 is not NULL then 0 else 1 end new_pk_b

 from a, b

 where a.col1 = b.col1(+);

Depending on the question above you may need to deploy additionally some de-dup processing.

Regards,

Jaromir D.B. Nemec

----- Original Message ----- From: "Sandeep Dubey" <dubey.sandeep@xxxxxxxxx> To: "Mike Spalinger" <mike.spalinger@xxxxxxxxx> Cc: <oracle-l@xxxxxxxxxxxxx> Sent: Friday, April 07, 2006 10:50 PM Subject: Re: Data load question

--
//www.freelists.org/webpage/oracle-l

References:
- Data load question
  - From: Sandeep Dubey
- Re: Data load question
  - From: Mike Spalinger
- Re: Data load question
  - From: Sandeep Dubey

Re: Data load question

Other related posts: