Re: Process consuming a lot of CPU, UTL_FILE and WAIT EVENTS relationship

  • From: J.Velikanovs@xxxxxxxx
  • To: oracle-l@xxxxxxxxxxxxx
  • Date: Sun, 27 Jun 2004 15:35:12 +0300

Hi Diego, Hi ALL
I would like to pay your attention to the aspect of measuring CPU time 
from OS point of view.
A totally agree with list members who responded to you (Jonathan, John) 
that CPU time can be unaccounted within SQL Trace file because of 
granularity and other time accounting effect in oracle code. 
But it also can appear, that OS lying to you regarding CPU consumption by 
this process and it actually doesn?t consume CPU continuously. 
It can be that problem in your case not DB tier, but application tier or 
even network.
Seams there no absolute truth on this planet ;)


Some time ago, I ran into situation like you described.
After investigation it is appears that most of OS utils like ?ps?, ?top? 
etc. don?t show current situation of CPU consumption ;)

As we can see from your output, it is seams that process pid=17676 
consumes 55% of CPU time.
=======================================
CPU TTY  PID USERNAME PRI NI   SIZE    RES STATE    TIME %WCPU  %CPU 
COMMAND
 1   ? 17676 erp    216 20 33984K  2912K run     55:14 51.29 51.20 
oracleERP
=======================================
But actually, this information is misleading ;)
Most of OS utils show !!! AVERAGE !!! CPU utilization per process for 
indeterminate time period.

Solaris "man ps"
=======================================
     pcpu  The ratio of CPU time used recently to CPU time avail-
           able  in  the  same period, expressed as a percentage.
           The  meaning  of  ``recently''  in  this  context   is
           unspecified.   The CPU time available is determined in
           an unspecified manner.
=======================================

Possibly your process actually wait on SQL*Net event (application tier) 
for longer time periods (as it is appears in the SQL Trace file) and 
consume 100% of CPU time for short time intervals. For such process OS 
will show you continuously 20-80% CPU consumption.

"ps" as well as "top" command show misleading information on HPUX, 
Solaris, Tru64 and some others platforms. 
Truth is - there is not so easy to get correct information for CPU 
consumption by process.
On HPUX glance utility can be used so see correct ONLINE information about 
CPU utilization on HPUX (there diffrent utils on dffrent platforms).

I have created TESTCASE for checking lie of OS utilities (source of test 
is attached at the end):

Take a look on the output below:
Till row ?-==< SLEEP >==-? process which CPU utilization (second column) 
monitored by ?ps? command actually consuming 100% of CPU, because it 
running loop. As you can see ?ps? command doesn?t show 100%, but show 
progressively CPU time increasing. 
After ?-==< SLEEP >==-? row, mentioned process actually doesn?t consume 
any CPU (wait on .sleep call) but ?ps? shows decreasing of CPU 
consummation. 
If your process time to time consumes 100% of CPU for short periods then 
OS will sow you CPU utilization continuously.

It can appear that OS show you CPU utilization on database tier, but 
actual bottle neck is application server or network, as was in my case.
Any comments?

Best regards,
Jurijs

TESTCASE output
=======================================
bash-2.03$ ./test.sh 10000000 10000
PID = 21287
20040627_134840 0.00
20040627_134841 4.69
20040627_134842 9.29
20040627_134843 13.62
20040627_134844 17.79
20040627_134845 21.76
20040627_134847 25.53
20040627_134848 29.12
20040627_134849 32.48
20040627_134850 35.73
20040627_134851 38.82
20040627_134852 41.76
20040627_134853 44.50
20040627_134854 47.16
-==< SLEEP >==-
20040627_134855 49.69
20040627_134856 49.71
20040627_134857 47.28
20040627_134858 44.97
20040627_134859 42.78
20040627_134900 40.69
20040627_134901 36.81
20040627_134902 35.06
20040627_134903 33.35
20040627_134904 31.72
=======================================


Source of TESTCASE:
test.sh
=======================================
java C $1 $2 &
v_pid=$!
echo PID = $v_pid
while [ 1 ] ; do 
v_ps=`UNIX95= ps -p $v_pid -o pcpu | grep -v CPU`
v_da=`date +%Y%m%d_%H%M%S`
echo $v_da $v_ps
sleep 1
done

C.java
=======================================
bash-2.03$ cat C.java 
public class C
{
 public static void main(String[] args)
 {
  for(int i=0;i < Integer.parseInt(args[0]);++i);
  System.out.println ("-==< SLEEP >==-");
  try {java.lang.Thread.sleep(Integer.parseInt(args[1]));} catch 
(InterruptedException e) {}
 }
}
bash-2.03$





"Diego Cutrone" <diegocutrone@xxxxxxxxxxxx>
Sent by: oracle-l-bounce@xxxxxxxxxxxxx
23.06.2004 22:51
Please respond to oracle-l
 
        To:     "Oracle List" <oracle-l@xxxxxxxxxxxxx>
        cc: 
        Subject:        Process consuming a lot of CPU, UTL_FILE and WAIT 
EVENTS relationship


Hi List,

I have a process that has been consuming a lot of CPU time in out HP 
server.
As you can see below, at the moment I took the snapshot it had already 
been executing for 55 minutes.

CPU TTY  PID USERNAME PRI NI   SIZE    RES STATE    TIME %WCPU  %CPU 
COMMAND
 1   ? 17676 erp    216 20 33984K  2912K run     55:14 51.29 51.20 
oracleERP


This is what I've got so far:

*** 10046 trace event

*** SESSION ID:(196.28282) 2004-06-23 10:15:56.166
WAIT #1: nam='SQL*Net message to client' ela= 0 p1=1952673792 p2=1 p3=0
APPNAME mod='WSCMTI' mh=1720951941 act='Concurrent Request' ah=1021472160
=====================
PARSING IN CURSOR #1 len=48 dep=0 uid=183 oct=47 lid=183 tim=1425833216 
hv=3412908754 ad='190e52e8'
begin fnd_file_private.logfile_get(:s, :b); end;
END OF STMT
EXEC #1:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=4,tim=1425833216
WAIT #1: nam='SQL*Net message from client' ela= 0 p1=1952673792 p2=1 p3=0
WAIT #1: nam='SQL*Net message to client' ela= 0 p1=1952673792 p2=1 p3=0
EXEC #1:c=0,e=1,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=4,tim=1425833217
WAIT #1: nam='SQL*Net message from client' ela= 0 p1=1952673792 p2=1 p3=0
WAIT #1: nam='SQL*Net message to client' ela= 0 p1=1952673792 p2=1 p3=0
EXEC #1:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=4,tim=1425833217
WAIT #1: nam='SQL*Net message from client' ela= 0 p1=1952673792 p2=1 
p3=0=1952673792 p2=1 p3=0
EXEC #1:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=4,tim=1425833217
WAIT #1: nam='SQL*Net message from client' ela= 0 p1=1952673792 p2=1 p3=0
WAIT #1: nam='SQL*Net message to client' ela= 0 p1=1952673792 p2=1 p3=0
EXEC #1:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=4,tim=1425833217
WAIT #1: nam='SQL*Net message from client' ela= 0 p1=1952673792 p2=1 p3=0
WAIT #1: nam='SQL*Net message to client' ela= 0 p1=1952673792 p2=1 p3=0
EXEC #1:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=4,tim=1425833217
WAIT #1: nam='SQL*Net message from client' ela= 0 p1=1952673792 p2=1 p3=0
WAIT #1: nam='SQL*Net message to client' ela= 0 p1=1952673792 p2=1 p3=0
EXEC #1:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=4,tim=1425833217
WAIT #1: nam='SQL*Net message from client' ela= 0 p1=1952673792 p2=1 p3=0
WAIT #1: nam='SQL*Net message to client' ela= 0 p1=1952673792 p2=1 p3=0
EXEC #1:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=4,tim=1425833217


and keeps going on like this.


*** Then checked the fnd_file_private.logfile_get procedure:

package body APPS.FND_FILE_PRIVATE as
/* $Header: AFCPPPRB.pls 115.5 2001/06/16 09:40:34 pkm ship      $ */

LOG  utl_file.file_type;
OUT  utl_file.file_type;

BUFFER_SIZE constant number := 32500;

LOG_FNAME varchar2(255);
OUT_FNAME varchar2(255);
TEMP_DIR varchar2(255);

NEXT_LOG_LINE varchar2(32767);
NEXT_OUT_LINE varchar2(32767);

procedure LOGFILE_GET(STATUS in out varchar2, TEXT in out varchar2) is
CR varchar2(2);
begin

 CR := '
';
 TEXT := '';
 while nvl(lengthb(TEXT), 0) + nvl(lengthb(NEXT_LOG_LINE), 0) < 
BUFFER_SIZE loop
  TEXT := concat(TEXT, NEXT_LOG_LINE);
  NEXT_LOG_LINE := '';
  utl_file.get_line(LOG, NEXT_LOG_LINE);
  NEXT_LOG_LINE := concat(NEXT_LOG_LINE, CR);
 end loop;

 STATUS := 'OK';

 exception
 when NO_DATA_FOUND then
  if nvl(length(TEXT), 0) > 0 then
   STATUS := 'OK';
   return;
  else
   STATUS := 'EOF';
   return;
  end if;

 when UTL_FILE.INVALID_FILEHANDLE then
  fnd_message.set_name('FND', 'CONC-TEMPFILE_INVALID_HANDLE');
  fnd_message.set_token('TEMP_FILE', LOG_FNAME, FALSE);
         raise_application_error(-20104, fnd_message.get);

 when UTL_FILE.INVALID_OPERATION then
  fnd_message.set_name('FND', 'CONC-TEMPFILE_INVALID_OPERATN');
  fnd_message.set_token('TEMP_FILE', LOG_FNAME, FALSE);
         raise_application_error(-20105, fnd_message.get);

 when UTL_FILE.READ_ERROR then
  fnd_message.set_name('FND', 'CONC-TEMPFILE_READ_ERROR');
  fnd_message.set_token('TEMP_FILE', LOG_FNAME, FALSE);
  raise_application_error(-20106, fnd_message.get);


 when OTHERS then
  raise;

end LOGFILE_GET;
...


*** And the session wait events are:

EVENT                                    TOTAL_WAITS TIME_WAITED
---------------------------------------- ----------- -----------
SQL*Net message from client                 20045610      409293
SQL*Net message to client                     20045610        8731
latch free                                                    79711 4083
SQL*Net more data from client                            2           6
db file scattered read                                        80 2
buffer busy waits                                               2  0
file open                                                           1  0
SQL*Net more data to client                               5           0



My guess is that the session is performing many utl_file.get_line as seen 
before, and that's why it's taking so much CPU time.
But what it's not clear to me is why Oracle does not show that in the wait 
events......

Why is the EXEC showing up in the 10046?
"EXEC #1:c=0,e=1,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=4,tim=1425833217"


Can anybody elaborate on this?
Any opinions?


Thanks,
Diego








----------------------------------------------------------------
Please see the official ORACLE-L FAQ: http://www.orafaq.com
----------------------------------------------------------------
To unsubscribe send email to:  oracle-l-request@xxxxxxxxxxxxx
put 'unsubscribe' in the subject line.
--
Archives are at //www.freelists.org/archives/oracle-l/
FAQ is at //www.freelists.org/help/fom-serve/cache/1.html
-----------------------------------------------------------------


----------------------------------------------------------------
Please see the official ORACLE-L FAQ: http://www.orafaq.com
----------------------------------------------------------------
To unsubscribe send email to:  oracle-l-request@xxxxxxxxxxxxx
put 'unsubscribe' in the subject line.
--
Archives are at //www.freelists.org/archives/oracle-l/
FAQ is at //www.freelists.org/help/fom-serve/cache/1.html
-----------------------------------------------------------------

Other related posts: