RE: Pro C bulk insert error

  • From: "Goulet, Richard" <Richard.Goulet@xxxxxxxxxxx>
  • To: <avramil@xxxxxxxxxxxxxx>, <oracle-l@xxxxxxxxxxxxx>
  • Date: Mon, 11 Oct 2010 13:14:17 -0400

OH, I know I'm late to the party here, but yes I have seen that error
before in PRO*C with bulk inserts. 

Here's the problem:

In Pro*C if you want to do bulk inserts you have to declare an array
that will hold the data and that array is of a fixed size.  Now if you
do not take care to prevent your code from running off the high end of
the array, well it will (remember C lets aim and shoot yourself in the
foot easily).  In doing so you can overwrite just about any memory area
and a lot of folks still put there declare areas as globals.  When this
happens your most likely to clobber your connection memory area than
anything else.

Solution, if you do not have access to the source code, don't run in
bulk insert mode.  If you do, move the array declare into the function
that manages it and make sure you can't run off the high end.


Dick Goulet
Senior Oracle DBA


-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx
[mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Lou Avrami
Sent: Thursday, October 07, 2010 8:44 PM
To: oracle-l@xxxxxxxxxxxxx
Subject: Pro C bulk insert error

Hello all,

We're experiencing a very interesting error and I'm hoping someone might
have some insight into a solution.

We have an application from a third-party vendor which collects
"messages" from various remote devices.  The messages are loaded into an
Oracle 11.2.0.1 Enterprise Edition database, running on a Solaris 10
server.

Every week or so, we have an issue that causes these messages to queue
up in the application.  When the bottleneck is resolved, the queued
messages are "bulk loaded" into the database.  During these bulk load
periods, the application receives the following error message:

------ Tue Sep 21 05:49:40 2010
pid     = 28927
conn    = [wfc]
process = ideasLog
module  = DB_ALG_UpdateChunk
sqlcode = -12152
mesg    = ORA-12152: TNS:unable to send break message

The connection to the database is apparently dropped, as the following
message is:

mesg    = ORA-03114: not connected to ORACLE

The application eventually reconnects:

mesg    = Attempting to reconnect to database.
...
mesg    = reconnecting, retries 30, interval 10
...
mesg    = reconnect attempt #1
...
mesg    = reconnect successful

but in the interim, many of the queued messages are lost.

In a non-production environment the application folks discovered a way
to cause messages to queue up and subsequently recreate the error.  I
set up client and server side tracing to capture the SQL that was
actually producing the ORA-12152 errors, but when tracing was enabled,
we could not recreate the application errors.  When tracing was removed,
we COULD reproduce the errors.  Within the application, we also had the
ability to set up a diagnostic mode and capture transactions, but again,
when the app tracing was turned on, we could not reproduce the errors.
Turn the app tracing off though and the ORA-12152 errors could be
reproduced at will.

When we ran these conditions by the vendor support, they came back with
a recommendation that we turn off the "bulkDBOperation" application
setting.  The default setting is "ON", which means that 250 rows of
messages are processed at a time and are inserted into several tables.
Turning bulkDBOperation OFF only processes 1 row of messages at a time,
significantly slower.

Turn bulkDBOperation ON, the errors can be reproduced on demand, by
executing certain functions and forcing a bulk load of messages.  Turn
it off and these same functions do not produce the ORA-12152 errors and
lost messages.

So it seems that when the application is "slow" - either because
application tracing is on, Oracle tracing is on or the application is
configured to only process one message at a time - it "works", there are
no ORA-12152 and no messages are lost.  But if no tracing is enabled and
the default processing mode is used, errors are produced when messages
are queued up and the application has to process a backlog.

What I find especially interesting is that if bulkDBOperation is ON and
some sort of tracing is also enabled, there are no errors.  The
application runs significantly slower, for example it takes about 10
minutes to restart, as opposed to 1-2 minutes when there is no tracing.
But turn tracing off, with bulkDBOperation ON, and an ORA-12152 error
occurs within the first minute of processing backlog messages.

The app folks are of the opinion that this is an "Oracle' problem.
Maybe ... it just seems very strange that when there is tracing overload
on the database there are no errors, but when tracing is off errors
occur.

The vendor support folks did send along the Pro C code that does the
message processing.  I am pasting it below, preceded by the explanation
by the vendor.  Previously, they had suggested that we reconfigure the
SDU and TDU packet sizes within SQLNET.  That had no affect - there
ORA-12152 errors continued.  

Reading the vendor's explanation and looking at the code, one thing kind
of struck me - what happens if one or more of the several "buffers" that
are processed at the end of the 5 second period do NOT have 250 rows?
Does the code still try to load it 250 times?  What happens if some
buffers have 250 rows and some do not?

Looking at the nearest Pro C manual, I came across the following:

http://www.cis.njit.edu/~oracle/manuals/proc/ch13.html

Using the FOR Clause 
....
The FOR clause variable specifies the number of array elements to be
processed. Make sure the number is not larger than the smallest array
dimension. Also, the number must be positive. If it is negative or zero,
no rows are processed and Oracle issues an error message.

This may not be applicable ...
I just find it very strange that when there is overload from runing in a
diagnostic mode, there are no errors.

If anyone has some thoughts or insights into this problem, please let me
know what they are.

Thanks,
Lou Avrami

========================
By changing how long it takes to make a DB call or how long it takes
IDEAS to send a message to ideasLog, you are changing how many message
ideasLog can process. ideasLog will spend 5 seconds in its IPMH loop
gathering messages from IDEAS. If it takes longer to process those
messages in any way, then we can send as many rows to the DB at a time.
 
We have buffers of 250 rows across several tables, alertlog, eventatt,
alertprop, ..., that we are attempting to fill up during this 5 second
period. If one of the buffers fills, we flush all of them to the DB
using Oracle host arrays. This is where the bigger packets are coming
from. When IDEAS or the DB slows down, we can't get as many rows into
those buffers during that 5 seconds. So, when we flush, the packets
sizes down at the oracle layer are probably smaller. 
 
My guess is that if you turn off the bulkDBOperation, you will not see
this error from Oracle because the packet sizes will be greatly
decreased. Steve is correct though. ideasLog will be significantly
slower. 
 
Here is an example of the code that gets used when the bulkDBOperation
option is turned on. Notice that we have the ability to update 250 rows
at a time. This is one of the methods that seems to trigger the network
packet errors.

nxint16
DB_ALG_UpdateChunk_r( ALG *chunk, nxuint32 HowMany)
{
   ALG *pAlgPtr;
 
   EXEC SQL BEGIN DECLARE SECTION;
        nxint32     arr_alertlogUpd_alert[250];
        nxint32     arr_alertlogUpd_alertid[250];
        nxint32     arr_alertlogUpd_mom[250];
        nxint32     arr_alertlogUpd_mo[250];
        nxint32     arr_alertlogUpd_times[250];
        nxint32     arr_alertlogUpd_ackoper[250];
        nxint32     arr_alertlogUpd_curroper[250];
        nxint32     arr_alertlogUpd_cleartype[250];
        nxint16     arr_alertlogUpd_severity[250];
        nxint16     arr_alertlogUpd_status[250];
        nxint16     arr_alertlogUpd_clearreason[250];
        char        arr_alertlogUpd_first1[250][26];
        char        arr_alertlogUpd_last1[250][26];
        char        arr_alertlogUpd_cleared[250][26];
        char        arr_alertlogUpd_acknowledge[250][26];
        char        arr_alertlogUpd_text1[250][256];
        nxint32     rowcount;
 
   EXEC SQL END DECLARE SECTION;
 
   rowcount = HowMany;
 
   for (int i=0; i<HowMany; i++) {
 
        pAlgPtr = (ALG *) chunk;
 
        arr_alertlogUpd_alert[i]       = pAlgPtr->alert;
        arr_alertlogUpd_alertid[i]     = pAlgPtr->alertid;
        arr_alertlogUpd_mom[i]         = pAlgPtr->mom;
        arr_alertlogUpd_mo[i]          = pAlgPtr->mo;
        arr_alertlogUpd_times[i]       = pAlgPtr->times;
        arr_alertlogUpd_ackoper[i]     = pAlgPtr->ackoper;
        arr_alertlogUpd_curroper[i]    = pAlgPtr->curroper;
        arr_alertlogUpd_cleartype[i]   = pAlgPtr->cleartype;
        arr_alertlogUpd_severity[i]    = pAlgPtr->severity;
        arr_alertlogUpd_status[i]      = pAlgPtr->status;
        arr_alertlogUpd_clearreason[i] = pAlgPtr->clearreason;
 
        dbstrcpyin(arr_alertlogUpd_first1[i],      pAlgPtr->first1 );
        dbstrcpyin(arr_alertlogUpd_last1[i],       pAlgPtr->last1 );
        dbstrcpyin(arr_alertlogUpd_cleared[i],     pAlgPtr->cleared );
        dbstrcpyin(arr_alertlogUpd_acknowledge[i], pAlgPtr->acknowledge
);
        dbstrcpyin(arr_alertlogUpd_text1[i],       pAlgPtr->text1 );
 
        chunk++;
   }
 
   EXEC SQL FOR :rowcount UPDATE alertlog
            SET alertid     =      :arr_alertlogUpd_alertid,
                mom         =      :arr_alertlogUpd_mom,
                mo          =      :arr_alertlogUpd_mo,
                times       =      :arr_alertlogUpd_times,
                ackoper     =      :arr_alertlogUpd_ackoper,
                curroper    =      :arr_alertlogUpd_curroper,
                cleartype   =      :arr_alertlogUpd_cleartype,
                severity    =      :arr_alertlogUpd_severity,
                status      =      :arr_alertlogUpd_status,
                clearreason =      :arr_alertlogUpd_clearreason,
                first1      =      :arr_alertlogUpd_first1,
                last1       =      :arr_alertlogUpd_last1,
                cleared     =      :arr_alertlogUpd_cleared,
                acknowledge =      :arr_alertlogUpd_acknowledge,
                text1       =      :arr_alertlogUpd_text1
           WHERE alert = :arr_alertlogUpd_alert;
 
 
   if (DB_DB_SQLErr(&sqlca,"DB_ALG_UpdateChunk")) {
        DB_DB_Commit();
        return 1;
   }
 
   DB_DB_Commit();
   return 0;
}
--
//www.freelists.org/webpage/oracle-l


--
//www.freelists.org/webpage/oracle-l


Other related posts: