need help trying to make this an analytical

  • From: "Michael McMullen" <ganstadba@xxxxxxxxxxx>
  • To: <oracle-l@xxxxxxxxxxxxx>
  • Date: Thu, 27 Oct 2005 12:58:10 -0400

Tried on various platforms/os - 1st version 8.1.7.3 - latest 9.2.0.7

Problem: Users are assigned a username to log onto the system. When they log 
off a record is cut showing columns below. A combination of nasipaddress and 
acctsessionid is unique to the session but not to the table. Sometimes the 
server spits out dups. Users shouldn't be allowed to log onto the system from 
more than one computer but that has been subverted. So I'm trying to figure out 
how to find overlapping date values per user (infostartdate is when they log 
on, recdate is when they log off).
Basically, for each username I have to look at all of their sessions and see if 
any of them overlap. We have > 4 million users with each days logs about 4 Gb. 
Though my test table is ~1Gb. I don't think I can do it one pass with an 
analytical but am hoping someone can figure it out. In 8i I cancelled after 
14hrs. The hash join is taking forever. Trying it on 9i. I've tried numerous 
indexing strategies but really they don't make sense. Two full scans is 
required. I'm now trying it as a cursor, with my cursor looping through each 
distinct username, so I'm avoiding the self join. It's taken about 2 hrs to get 
1/2 way.

My query is as follows.

SELECT DISTINCT a.username, a.nasipaddress, a.acctsessionid,
                   a.infostartdate, a.recdate
              FROM r_monitor.ar_unique a, r_monitor.ar_unique b
             WHERE a.username = b.username
               AND a.nasipaddress != b.nasipaddress
               AND a.acctsessionid != b.acctsessionid
               AND (    (a.infostartdate <= b.recdate)
                    AND (a.recdate >= b.infostartdate)
                   )


PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------

----------------------------------------------------------------------------
| Id  | Operation            |  Name       | Rows  | Bytes |TempSpc| Cost  |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |   291K|    43M|       | 82809 |
|   1 |  SORT UNIQUE         |             |   291K|    43M|    96M| 82809 |
|*  2 |   HASH JOIN          |             |   291K|    43M|  1083M| 75896 |
|   3 |    TABLE ACCESS FULL | AR_UNIQUE   |    12M|   940M|       | 13586 |
|   4 |    TABLE ACCESS FULL | AR_UNIQUE   |    12M|   940M|       | 13586 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("A"."USERNAME"="B"."USERNAME")
       filter("A"."NASIPADDRESS"<>"B"."NASIPADDRESS" AND
              "A"."ACCTSESSIONID"<>"B"."ACCTSESSIONID" AND
              "A"."INFOSTARTDATE"<="B"."RECDATE" AND 
"A"."RECDATE">="B"."INFOSTARTDATE")

Note: cpu costing is off

20 rows selected.



describe ar_unique
 Name                                                  Null?    Type
 ----------------------------------------------------- -------- 
------------------------------------
 USERNAME                                                       VARCHAR2(200)
 NASIPADDRESS                                          NOT NULL VARCHAR2(20)
 ACCTSESSIONID                                         NOT NULL VARCHAR2(200)
 INFOSTARTDATE                                                  DATE
 RECDATE                                                        DATE

Other related posts: