AW: Fastest way to count exact number of rows in a very large table

From: "ahmed.fikri@xxxxxxxxxxx" <ahmed.fikri@xxxxxxxxxxx>
To: "gogala.mladen@xxxxxxxxx" <gogala.mladen@xxxxxxxxx>, "elizabeth.reen@xxxxxxxx" <elizabeth.reen@xxxxxxxx>, Ashoke Mandal <ramukam1983@xxxxxxxxx>
Date: Mon, 5 Oct 2020 22:34:52 +0200 (CEST)

If you are just interested in getting the exact number of rows in a large
partitioned table, then the fasted way (in my opinion) is to use PL/SQL. L
ooping the partition serially will not help (this is more slower than
select count(*) from huge_table).You have to create a pipelined function
and use the parallel_enable feature.
Pseudo code( give only the Idea, if you adjust it you can reduce the time
significantly):
create or replace package pkg_test as
  type xxx IS RECORD(column_value VARCHAR2(200));
  type t_parallel_test_ref_cursor IS REF CURSOR RETURN xxx;
  function fu_count(p_cursor t_parallel_test_ref_cursor);
end;
/
create or replace package body pkg_test as
function fu_count_part(p_part_nameVARCHAR2, p_dop NUMBER) RETURN NUMBER IS
  v_count number := 0;
BEGIN
  -- you can try here to use bind variable to avoid hard parsing (just make
sure that the partition pruning works
  EXECUTE IMMEDIATE 'select /*+ look how to use parallel hint */  count(*)
from huge_table partition('||p_part_name||')' into v_count;
  RETURN v_count;
EXCEPTION
  when others then
    return 0;
END fu_count_part;

function fu_count(p_cursor t_parallel_test_ref_cursor) RETURN t2_list
PIPELINED PARALLEL_ENABLE(PARTITION p_cursor BY HASH(column_value)) IS
ret t2;
  x VARCHAR2(33);
BEGIN
  LOOP
  FETCH p_cursor INTO x;
  exit when p_cursor%NOTFOUND;
  ret.id <http://ret.id> := fu_count_part(x);
PIPE ROW(ret);
END LOOP;
END fu_count;
end pkg_test

And then this will return the total count:
SELECT count(*) FROM pkg_test.fu_count(CURSOR(SELECT /*+ parallel(t 16) */
partition_name COLUMN_VALUE FROM table_hold_all_partitions_name t),16);

If you really want to cut the query time to less than 10 minutes, just try
getting the code above working.

Best regards
Ahmed

-----Original-Nachricht-----
Betreff: Re: Fastest way to count exact number of rows in a very large
table
Datum: 2020-10-05T21:58:55+0200
Von: "Mladen Gogala" <gogala.mladen@xxxxxxxxx>
An: "elizabeth.reen@xxxxxxxx" <elizabeth.reen@xxxxxxxx>, "Ashoke Mandal"
<ramukam1983@xxxxxxxxx>

You can also try the following:
    SELECT /*+ PARALLEL(16) */  100000*count(*) FROM TABLE SAMPLE(0.001);
That would give almost correct count of rows in the table. Accuracy would
be similar to SELECT NUM_ROWS from USER_TABLES WHERE
TABLE_NAME=<TABLE_NAME>;

BTW, has anyone played with APPROX_FOR_COUNT_DISTINCT parameter in 19c?
Regards

On Mon, 2020-10-05 at 19:40 +0000, Reen, Elizabeth wrote:

                Since it is partitioned, why don’t you just count each
  partition separately?  Have you run stats on the old partitions?
  Assuming that prior years do not change, you should be able to get a
  count there.  If you do an import, then you can get the number of rows
  from the log.  A transportable tablespace, will not mount if there is
  something wrong.



  Liz



  Liz Reen
  CPB Database Administration



  Dev Servers:  Oracle  |  ORasS  |  Golden Gate  |  Sybase  |  MS-SQL  |
  MSaaS  |  Neo4j  |  HBase  -  Prod Servers: Golden Gate  |  Neo4j  |
  HBase  -  Team Email:  *CPB Database  -  Hotline:  1 718 248 1416



  ServiceNow  For: Oracle | OraaS | Golden Gate : ICG NA EX US CPB ORACLE
  APPLICATION DBAS  – For Sybase | MS-SQL | Neo4j | HBase: ICG NA EX CPB
  SYBASE  APPLICATION DBAS





  From: [gmail.com] Ashoke Mandal <ramukam1983@xxxxxxxxx>
  Sent: Monday, October 5, 2020 3:14 PM
  To: Reen, Elizabeth [ICG-IT]
  Cc: andysayer@xxxxxxxxx; ahmed.fikri@xxxxxxxxxxx; list, oracle
  Subject: Re: Fastest way to count exact number of rows in a very large
  table



  Hello  Ahmed/Andy/Jackson/Mark/Gogala/Liz, Thanks for your response to my
  posting. Even though my question was how to improve the query time of a
  query to check row count in a big table, you have also brought up many
  good points related to cross platform migration.



  Here is some information regarding my table:

  We have used transportable tablespace for data migration.

  The primary key consists of three columns (UT_ID, UT_SEQ, TEST_DATE).

  This table is partitioned by date and has one partition for every month.
  So, 12 partitions for every calendar year and has data for the last 20
  years.

  After adding a parallel hint as described below the query time went down
  from 2.2 hours to 42 min. Let me know if you have any more
  recommendations to improve the query time for this select statement.



  select /*+ parallel */ to_char(count(*), '999,999,999,999') from
  test_data;



  Ashoke



  On Mon, Oct 5, 2020 at 10:40 AM Reen, Elizabeth <elizabeth.reen@xxxxxxxx
  <mailto:elizabeth.reen@xxxxxxxx> > wrote:

                  We just completed such a transition.  We kept the Oracle
    version the same so we could see the impact of Linux.  Transportable
    tablespaces was how we did it.  We were able to move a 17 terabyte
    database in under 10 hours.





    Liz





    From: [External] oracle-l-bounce@xxxxxxxxxxxxx
    <mailto:oracle-l-bounce@xxxxxxxxxxxxx> <oracle-l-bounce@xxxxxxxxxxxxx
    <mailto:oracle-l-bounce@xxxxxxxxxxxxx> > On Behalf Of [External] Andy
    Sayer
    Sent: Friday, October 2, 2020 3:09 PM
    To: ahmed.fikri@xxxxxxxxxxx <mailto:ahmed.fikri@xxxxxxxxxxx>
    Cc: list, oracle; ramukam1983@xxxxxxxxx <mailto:ramukam1983@xxxxxxxxx>
    Subject: Re: Fastest way to count exact number of rows in a very large
    table



    Just because a table has the same number of rows, it doesn’t mean it
    has the same data. With 108 billion rows, your data is going to be
    changing quickly, in order to get accurate counts at the right point in
    time you’re going to end up keeping your application offline for a
    window before and after your migration.



    What you need to do is determine where you expect data to go missing
    and work out a way to check.



    This will depend on how you’re doing your migration, I would suggest
    you use Cross-Platform Transportable Tablespaces (Doc Id 371556.1) as
    that would allow you to do a physical import and just convert the files
    to the right endianness. This starts by making sure all data has been
    written to your data files (so they can be read only on the source
    system). As you’re working with the physical data files rather than the
    logical data (rows in tables), the only way you’re going to loose rows
    is by corrupting your files. You can check for corruption using RMAN
    once you’ve imported the converted files. No need to count all your
    rows, and no need to hope that that’s all you need to compare.



    Hope that helps,

    Andy







    On Fri, 2 Oct 2020 at 19:38, ahmed.fikri@xxxxxxxxxxx
    <mailto:ahmed.fikri@xxxxxxxxxxx> <ahmed.fikri@xxxxxxxxxxx
    <mailto:ahmed.fikri@xxxxxxxxxxx> > wrote:

      Hi Ashoke,







      could you send the execute plan of the query too? I think there is no
      general approach for that, it depends on several factors: whether the
      table has indexes (normal/bitmap) and in case the table has indexes
      the size of the table compared to the existing index...... But
      generally parallel processing should help.







      Best regards



      Ahmed















      -----Original-Nachricht-----



      Betreff: Fastest way to count exact number of rows in a very large
      table



      Datum: 2020-10-02T19:45:19+0200



      Von: "Ashoke Mandal" <ramukam1983@xxxxxxxxx
      <mailto:ramukam1983@xxxxxxxxx> >



      An: "ORACLE-L" <oracle-l@xxxxxxxxxxxxx
      <mailto:oracle-l@xxxxxxxxxxxxx> >

















      Dear All,

      I have a table with 108 billion rows and migrating this database from
      Oracle 11g on Solaris to Oracle 12c on Linux.







      After the migration I need to compare the row count of this table in
      both the source DB and the destination DB. It takes almost two hours
      to get the row count from this table.



      SQL> select to_char(count(*), '999,999,999,999') from test_data;

      TO_CHAR(COUNT(*)
      ----------------
       108,424,262,144
      Elapsed: 02:22:46.18







      Could you please suggest some tips to get the row count faster so
      that it reduces the cut-over downtime.







      Thanks,



      Ashoke











--

Mladen Gogala
Database Consultant
Tel: (347) 321-1217

Follow-Ups:
- RE: Fastest way to count exact number of rows in a very large table
  - From: Reen, Elizabeth

References:
- Fastest way to count exact number of rows in a very large table
  - From: Ashoke Mandal
- AW: Fastest way to count exact number of rows in a very large table
  - From: ahmed.fikri@xxxxxxxxxxx
- Re: Fastest way to count exact number of rows in a very large table
  - From: Andy Sayer
- RE: Fastest way to count exact number of rows in a very large table
  - From: Reen, Elizabeth
- Re: Fastest way to count exact number of rows in a very large table
  - From: Ashoke Mandal
- RE: Fastest way to count exact number of rows in a very large table
  - From: Reen, Elizabeth
- Re: Fastest way to count exact number of rows in a very large table
  - From: Mladen Gogala

AW: Fastest way to count exact number of rows in a very large table

Other related posts: