Hi Diego, Hi ALL I would like to pay your attention to the aspect of measuring CPU time from OS point of view. A totally agree with list members who responded to you (Jonathan, John) that CPU time can be unaccounted within SQL Trace file because of granularity and other time accounting effect in oracle code. But it also can appear, that OS lying to you regarding CPU consumption by this process and it actually doesn?t consume CPU continuously. It can be that problem in your case not DB tier, but application tier or even network. Seams there no absolute truth on this planet ;) Some time ago, I ran into situation like you described. After investigation it is appears that most of OS utils like ?ps?, ?top? etc. don?t show current situation of CPU consumption ;) As we can see from your output, it is seams that process pid=17676 consumes 55% of CPU time. ======================================= CPU TTY PID USERNAME PRI NI SIZE RES STATE TIME %WCPU %CPU COMMAND 1 ? 17676 erp 216 20 33984K 2912K run 55:14 51.29 51.20 oracleERP ======================================= But actually, this information is misleading ;) Most of OS utils show !!! AVERAGE !!! CPU utilization per process for indeterminate time period. Solaris "man ps" ======================================= pcpu The ratio of CPU time used recently to CPU time avail- able in the same period, expressed as a percentage. The meaning of ``recently'' in this context is unspecified. The CPU time available is determined in an unspecified manner. ======================================= Possibly your process actually wait on SQL*Net event (application tier) for longer time periods (as it is appears in the SQL Trace file) and consume 100% of CPU time for short time intervals. For such process OS will show you continuously 20-80% CPU consumption. "ps" as well as "top" command show misleading information on HPUX, Solaris, Tru64 and some others platforms. Truth is - there is not so easy to get correct information for CPU consumption by process. On HPUX glance utility can be used so see correct ONLINE information about CPU utilization on HPUX (there diffrent utils on dffrent platforms). I have created TESTCASE for checking lie of OS utilities (source of test is attached at the end): Take a look on the output below: Till row ?-==< SLEEP >==-? process which CPU utilization (second column) monitored by ?ps? command actually consuming 100% of CPU, because it running loop. As you can see ?ps? command doesn?t show 100%, but show progressively CPU time increasing. After ?-==< SLEEP >==-? row, mentioned process actually doesn?t consume any CPU (wait on .sleep call) but ?ps? shows decreasing of CPU consummation. If your process time to time consumes 100% of CPU for short periods then OS will sow you CPU utilization continuously. It can appear that OS show you CPU utilization on database tier, but actual bottle neck is application server or network, as was in my case. Any comments? Best regards, Jurijs TESTCASE output ======================================= bash-2.03$ ./test.sh 10000000 10000 PID = 21287 20040627_134840 0.00 20040627_134841 4.69 20040627_134842 9.29 20040627_134843 13.62 20040627_134844 17.79 20040627_134845 21.76 20040627_134847 25.53 20040627_134848 29.12 20040627_134849 32.48 20040627_134850 35.73 20040627_134851 38.82 20040627_134852 41.76 20040627_134853 44.50 20040627_134854 47.16 -==< SLEEP >==- 20040627_134855 49.69 20040627_134856 49.71 20040627_134857 47.28 20040627_134858 44.97 20040627_134859 42.78 20040627_134900 40.69 20040627_134901 36.81 20040627_134902 35.06 20040627_134903 33.35 20040627_134904 31.72 ======================================= Source of TESTCASE: test.sh ======================================= java C $1 $2 & v_pid=$! echo PID = $v_pid while [ 1 ] ; do v_ps=`UNIX95= ps -p $v_pid -o pcpu | grep -v CPU` v_da=`date +%Y%m%d_%H%M%S` echo $v_da $v_ps sleep 1 done C.java ======================================= bash-2.03$ cat C.java public class C { public static void main(String[] args) { for(int i=0;i < Integer.parseInt(args[0]);++i); System.out.println ("-==< SLEEP >==-"); try {java.lang.Thread.sleep(Integer.parseInt(args[1]));} catch (InterruptedException e) {} } } bash-2.03$ "Diego Cutrone" <diegocutrone@xxxxxxxxxxxx> Sent by: oracle-l-bounce@xxxxxxxxxxxxx 23.06.2004 22:51 Please respond to oracle-l To: "Oracle List" <oracle-l@xxxxxxxxxxxxx> cc: Subject: Process consuming a lot of CPU, UTL_FILE and WAIT EVENTS relationship Hi List, I have a process that has been consuming a lot of CPU time in out HP server. As you can see below, at the moment I took the snapshot it had already been executing for 55 minutes. CPU TTY PID USERNAME PRI NI SIZE RES STATE TIME %WCPU %CPU COMMAND 1 ? 17676 erp 216 20 33984K 2912K run 55:14 51.29 51.20 oracleERP This is what I've got so far: *** 10046 trace event *** SESSION ID:(196.28282) 2004-06-23 10:15:56.166 WAIT #1: nam='SQL*Net message to client' ela= 0 p1=1952673792 p2=1 p3=0 APPNAME mod='WSCMTI' mh=1720951941 act='Concurrent Request' ah=1021472160 ===================== PARSING IN CURSOR #1 len=48 dep=0 uid=183 oct=47 lid=183 tim=1425833216 hv=3412908754 ad='190e52e8' begin fnd_file_private.logfile_get(:s, :b); end; END OF STMT EXEC #1:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=4,tim=1425833216 WAIT #1: nam='SQL*Net message from client' ela= 0 p1=1952673792 p2=1 p3=0 WAIT #1: nam='SQL*Net message to client' ela= 0 p1=1952673792 p2=1 p3=0 EXEC #1:c=0,e=1,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=4,tim=1425833217 WAIT #1: nam='SQL*Net message from client' ela= 0 p1=1952673792 p2=1 p3=0 WAIT #1: nam='SQL*Net message to client' ela= 0 p1=1952673792 p2=1 p3=0 EXEC #1:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=4,tim=1425833217 WAIT #1: nam='SQL*Net message from client' ela= 0 p1=1952673792 p2=1 p3=0=1952673792 p2=1 p3=0 EXEC #1:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=4,tim=1425833217 WAIT #1: nam='SQL*Net message from client' ela= 0 p1=1952673792 p2=1 p3=0 WAIT #1: nam='SQL*Net message to client' ela= 0 p1=1952673792 p2=1 p3=0 EXEC #1:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=4,tim=1425833217 WAIT #1: nam='SQL*Net message from client' ela= 0 p1=1952673792 p2=1 p3=0 WAIT #1: nam='SQL*Net message to client' ela= 0 p1=1952673792 p2=1 p3=0 EXEC #1:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=4,tim=1425833217 WAIT #1: nam='SQL*Net message from client' ela= 0 p1=1952673792 p2=1 p3=0 WAIT #1: nam='SQL*Net message to client' ela= 0 p1=1952673792 p2=1 p3=0 EXEC #1:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=4,tim=1425833217 WAIT #1: nam='SQL*Net message from client' ela= 0 p1=1952673792 p2=1 p3=0 WAIT #1: nam='SQL*Net message to client' ela= 0 p1=1952673792 p2=1 p3=0 EXEC #1:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=4,tim=1425833217 and keeps going on like this. *** Then checked the fnd_file_private.logfile_get procedure: package body APPS.FND_FILE_PRIVATE as /* $Header: AFCPPPRB.pls 115.5 2001/06/16 09:40:34 pkm ship $ */ LOG utl_file.file_type; OUT utl_file.file_type; BUFFER_SIZE constant number := 32500; LOG_FNAME varchar2(255); OUT_FNAME varchar2(255); TEMP_DIR varchar2(255); NEXT_LOG_LINE varchar2(32767); NEXT_OUT_LINE varchar2(32767); procedure LOGFILE_GET(STATUS in out varchar2, TEXT in out varchar2) is CR varchar2(2); begin CR := ' '; TEXT := ''; while nvl(lengthb(TEXT), 0) + nvl(lengthb(NEXT_LOG_LINE), 0) < BUFFER_SIZE loop TEXT := concat(TEXT, NEXT_LOG_LINE); NEXT_LOG_LINE := ''; utl_file.get_line(LOG, NEXT_LOG_LINE); NEXT_LOG_LINE := concat(NEXT_LOG_LINE, CR); end loop; STATUS := 'OK'; exception when NO_DATA_FOUND then if nvl(length(TEXT), 0) > 0 then STATUS := 'OK'; return; else STATUS := 'EOF'; return; end if; when UTL_FILE.INVALID_FILEHANDLE then fnd_message.set_name('FND', 'CONC-TEMPFILE_INVALID_HANDLE'); fnd_message.set_token('TEMP_FILE', LOG_FNAME, FALSE); raise_application_error(-20104, fnd_message.get); when UTL_FILE.INVALID_OPERATION then fnd_message.set_name('FND', 'CONC-TEMPFILE_INVALID_OPERATN'); fnd_message.set_token('TEMP_FILE', LOG_FNAME, FALSE); raise_application_error(-20105, fnd_message.get); when UTL_FILE.READ_ERROR then fnd_message.set_name('FND', 'CONC-TEMPFILE_READ_ERROR'); fnd_message.set_token('TEMP_FILE', LOG_FNAME, FALSE); raise_application_error(-20106, fnd_message.get); when OTHERS then raise; end LOGFILE_GET; ... *** And the session wait events are: EVENT TOTAL_WAITS TIME_WAITED ---------------------------------------- ----------- ----------- SQL*Net message from client 20045610 409293 SQL*Net message to client 20045610 8731 latch free 79711 4083 SQL*Net more data from client 2 6 db file scattered read 80 2 buffer busy waits 2 0 file open 1 0 SQL*Net more data to client 5 0 My guess is that the session is performing many utl_file.get_line as seen before, and that's why it's taking so much CPU time. But what it's not clear to me is why Oracle does not show that in the wait events...... Why is the EXEC showing up in the 10046? "EXEC #1:c=0,e=1,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=4,tim=1425833217" Can anybody elaborate on this? Any opinions? Thanks, Diego ---------------------------------------------------------------- Please see the official ORACLE-L FAQ: http://www.orafaq.com ---------------------------------------------------------------- To unsubscribe send email to: oracle-l-request@xxxxxxxxxxxxx put 'unsubscribe' in the subject line. -- Archives are at //www.freelists.org/archives/oracle-l/ FAQ is at //www.freelists.org/help/fom-serve/cache/1.html ----------------------------------------------------------------- ---------------------------------------------------------------- Please see the official ORACLE-L FAQ: http://www.orafaq.com ---------------------------------------------------------------- To unsubscribe send email to: oracle-l-request@xxxxxxxxxxxxx put 'unsubscribe' in the subject line. -- Archives are at //www.freelists.org/archives/oracle-l/ FAQ is at //www.freelists.org/help/fom-serve/cache/1.html -----------------------------------------------------------------