Re: Process died -- no info in trace files

  • From: Tony van Lingen <tony_vanlingen@xxxxxxxxxxxxxxxxxxxxx>
  • To: saad4u@xxxxxxxxx
  • Date: Fri, 17 Apr 2009 09:22:36 +1000

Hi Saad,

I would also expect memory problems at the OS level. Did you check the Linux messages log (/var/log/messages)? You mention that the users were running pipelines; if the same box that runs the database also runs heavy user processes, it might have run out of memory, which activates the OOM killer in the kernel. The OOM killer tries to identify the least important process, which usually turns out to be an Oracle background process that does not seem to be doing much, and kills it to resolve the out-of-memory situation. It leaves a record of this in the messages file.
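A quick way to look for that record is to scan the log for the kernel's OOM-killer messages. The sketch below assumes the two phrasings most 2.6-era and later kernels use ("... invoked oom-killer" and "Out of memory: Kill..."); the exact wording varies between kernel versions, so treat the pattern as a starting point rather than a complete match list.

```python
import re

# Assumed kernel phrasings; wording differs between kernel versions,
# so widen this pattern if your kernel logs OOM kills differently.
OOM_PATTERN = re.compile(r"invoked oom-killer|Out of memory: Kill", re.IGNORECASE)


def find_oom_events(lines):
    """Return the log lines that look like OOM-killer activity."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]
```

Pass it the file object from `open("/var/log/messages", errors="replace")` (reading that file usually needs root). Timestamps on any hits can then be compared with the "Process ... died" entries in the alert log.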

The requirements in the installation documentation should be used as a bare-minimum guideline only; you must tune them to your situation.

Cheers,
Tony

Saad Khan wrote:
I ran sysctl -A myself as root (thanks to the sysadmin's short memory for not revoking my access) and compared the results with the kernel prerequisites for the installation. All the values are above the minimum settings required, so I think we can rule this out as a possible cause.
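That comparison can be scripted so it is repeatable on both the QA and production boxes. This is a minimal sketch: the three minimums in `MINIMUMS` are assumed values recalled from the 10gR2 Linux installation guide and must be checked against the guide for your release. Multi-value settings such as `kernel.sem` are compared field by field; range-style settings such as `net.ipv4.ip_local_port_range`, and `kernel.shmmax` (which depends on physical memory), need different logic and are left out.

```python
# Assumed minimums (verify against the installation guide for your release).
MINIMUMS = {
    "kernel.shmmni": [4096],
    "kernel.sem": [250, 32000, 100, 128],  # semmsl semmns semopm semmni
    "fs.file-max": [65536],
}


def parse_sysctl(text):
    """Parse 'key = value' lines from sysctl output into lists of ints."""
    values = {}
    for line in text.splitlines():
        key, sep, raw = line.partition("=")
        if not sep:
            continue  # not a 'key = value' line
        try:
            values[key.strip()] = [int(field) for field in raw.split()]
        except ValueError:
            continue  # skip non-numeric settings such as hostnames
    return values


def below_minimum(current, minimums=MINIMUMS):
    """Return keys that are missing or below the minimum in any field."""
    failures = []
    for key, required in minimums.items():
        actual = current.get(key)
        if (
            actual is None
            or len(actual) < len(required)
            or any(a < r for a, r in zip(actual, required))
        ):
            failures.append(key)
    return failures
```

Feed `parse_sysctl` the captured output of `sysctl -a` from each box; an empty list from `below_minimum` means every listed setting meets the assumed minimums.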


I also ran the HCVE script as per Metalink Note 250262.1 <https://metalink.oracle.com/metalink/plsql/showdoc?db=NOT&id=250262.1&blackframe=1>

No solution yet! :(


On Thu, Apr 16, 2009 at 9:58 AM, Joey D'Antoni <jdanton1@xxxxxxxxx> wrote:

    Could you have your sysadmin do a sysctl -A? I suspect some of the
    needed kernel settings related to Oracle may not be set properly.

    ------------------------------------------------------------------------
    *From:* Saad Khan <saad4u@xxxxxxxxx>
    *To:* oracle-l@xxxxxxxxxxxxx
    *Sent:* Thursday, April 16, 2009 9:49:58 AM
    *Subject:* Re: Process died -- no info in trace files

    We have now seen this error on the production box as well;
    earlier it was only on QA.


    Process m000 died, see its trace file

    ksvcreate: Process(m000) creation failed


    Just thinking out loud: if it's something related to the OS, how can
    it hit two different boxes at almost the same time? Could this
    be a bug? I'm stumped. Can someone please help me?

    On Wed, Apr 15, 2009 at 9:30 PM, Jack van Zanen <jack@xxxxxxxxxxxx> wrote:

        Metalink doc *790397.1* has similar errors, but for different
        processes. Could the underlying cause be the same?

            Cause

        This is caused by a lack of OS configuration, where more memory
        is required because the OS has reached the limits set.
Jack
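The quoted cause ("the OS has reached the limits set") does not say which limits. One place worth checking, on the assumption that per-user resource limits are what it refers to, is the limits of the account that owns the Oracle processes, since a low process or open-file limit can stop new background processes from being started. This sketch uses Python's standard `resource` module; the choice of the two limits in `LIMITS` is an assumption, and memory limits such as `RLIMIT_AS` could be added the same way.

```python
import resource

# Limits chosen for illustration; extend with other RLIMIT_* constants.
LIMITS = {
    "max user processes (nproc)": resource.RLIMIT_NPROC,
    "open files (nofile)": resource.RLIMIT_NOFILE,
}


def describe(limit):
    """Render a limit value, spelling out the 'no limit' sentinel."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


def current_limits():
    """Return {name: (soft, hard)} for the calling process, as strings."""
    report = {}
    for name, which in LIMITS.items():
        soft, hard = resource.getrlimit(which)
        report[name] = (describe(soft), describe(hard))
    return report


for name, (soft, hard) in current_limits().items():
    print(f"{name}: soft={soft} hard={hard}")
```

Limits are inherited per process, so run it from a login shell of the oracle user (for example after `su - oracle`) to see what the instance would inherit when started from that account.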

        2009/4/16 Saad Khan <saad4u@xxxxxxxxx>


            Sorry, I was looking at the trace files in the bdump directory.

            When I checked the traces in udump, I found the following in
            some of them:
            Process P003 is dead (pid=25576, state=3):
            kxfpg1srv
                    could not start local P003
            *** 2009-04-15 14:03:56.381
            Process P003 is dead (pid=25580, state=3):
            kxfpg1srv
                    could not start local P003
            *** 2009-04-15 14:03:57.384
            Process P003 is dead (pid=25582, state=3):
            kxfpg1srv
                    could not start local P003
            *** 2009-04-15 14:03:58.387
            Process P003 is dead (pid=25584, state=3):
            kxfpg1srv
                    could not start local P003
            *** 2009-04-15 14:03:59.417
            Process P003 is dead (pid=25586, state=3):
            kxfpg1srv
                    could not start local P003



            Does this ring a bell?
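The excerpt shows the same slave, P003, reported dead about once a second, each time with a new PID (25576, 25580, 25582, ...). That pattern suggests the process could not be started at all and was retried repeatedly, rather than one running process crashing. A small sketch to tally those lines across a trace file, assuming the "Process NAME is dead (pid=N, state=N)" format seen above:

```python
import re
from collections import defaultdict

# Format taken from the trace excerpt above.
DEAD = re.compile(r"Process (\w+) is dead \(pid=(\d+), state=(\d+)\)")


def dead_pids_by_process(trace_text):
    """Map each slave name (e.g. P003) to the PIDs reported dead, in order."""
    pids = defaultdict(list)
    for match in DEAD.finditer(trace_text):
        name, pid, _state = match.groups()
        pids[name].append(int(pid))
    return dict(pids)
```

A long list of distinct PIDs for one name points at repeated start-up failures, which is consistent with an OS-level limit or memory shortage.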



            On Wed, Apr 15, 2009 at 3:20 PM, Stephane Faroult
            <sfaroult@xxxxxxxxxxxx> wrote:

                The wording of your post ("I really don't see anything in the
                trace file") makes me think that you are looking in the alert
                file or something similar. You should look for .trc files under
                the directory defined as "user_dump_dest" in your parameter file
                (cd ../udump from the directory where your alert file is located
                should take you to the right place).
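When there are many trace files, sorting them by modification time gets you to the ones written around the failure quickly. This sketch assumes the default layout Stephane describes, where udump sits next to bdump; the authoritative location is whatever `user_dump_dest` is set to (check it with `SHOW PARAMETER user_dump_dest` in SQL*Plus), and the function name is hypothetical.

```python
from pathlib import Path


def newest_traces(bdump_dir, count=10):
    """List the most recently modified .trc files in the udump directory
    that sits alongside bdump_dir (assumes the default ../udump layout)."""
    udump = Path(bdump_dir).resolve().parent / "udump"
    traces = sorted(udump.glob("*.trc"), key=lambda p: p.stat().st_mtime, reverse=True)
    return traces[:count]
```

Only `.trc` files are returned, newest first, so the files written at 14:03 on the day of the failure should be near the top.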

                HTH

                S Faroult

                Saad Khan wrote:
                > Hi fellows,
                >
                > I have Oracle 10g (10.2.0.4) running on Linux with the
                > Partitioning option. The users were running pipelines when I
                > was informed that they had crashed. When I checked the alert
                > log file, I could see the following error messages:
                >
                >
                > Wed Apr 15 14:03:54 2009
                > Process P003 died, see its trace file
                > Wed Apr 15 14:03:55 2009
                > Process P004 died, see its trace file
                > Wed Apr 15 14:03:56 2009
                > Process P003 died, see its trace file
                > Process P003 died, see its trace file
                > Process P003 died, see its trace file
                > Process P003 died, see its trace file
                > Wed Apr 15 14:04:02 2009
                > Process P005 died, see its trace file
                > Process P005 died, see its trace file
                >
                >
                >
                > Now, the weird thing is, I really don't see anything in the
                > trace file that points to what could have caused this.
                >
                > I checked my parameters and found that the PROCESSES parameter
                > was set to a very low value (150). I've now increased it to 400,
                > but this is just a shot in the dark. I'm totally unsure whether
                > this could be the reason.
                >
                > Can anyone please help me? It's quite urgent.
                >
                > Thanks,
                >
                >
                > Khan.
                >
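Whether PROCESSES=150 was actually reached can be checked rather than guessed: `v$resource_limit` records a high-water mark (`max_utilization`) since instance startup next to the configured limit. The sketch below holds the query as text and only evaluates rows you pass in, so it runs without a database connection; fetching the rows (via SQL*Plus or a driver such as cx_Oracle) is left out, and the 90% threshold is an arbitrary assumption.

```python
# Query assumed to be run separately; its rows are passed to near_limit().
RESOURCE_QUERY = """
SELECT resource_name, current_utilization, max_utilization, limit_value
  FROM v$resource_limit
 WHERE resource_name IN ('processes', 'sessions')
"""


def near_limit(rows, threshold=0.9):
    """Return resource names whose high-water mark reached `threshold`
    of the limit. Each row is (resource_name, current_utilization,
    max_utilization, limit_value); limit_value is text in the view and
    may be padded or read 'UNLIMITED'."""
    flagged = []
    for name, _current, high_water, limit in rows:
        limit = str(limit).strip()
        if not limit.isdigit():
            continue  # UNLIMITED or otherwise non-numeric
        if int(limit) > 0 and high_water >= threshold * int(limit):
            flagged.append(name)
    return flagged
```

If "processes" is not flagged after a failure like this one, the PROCESSES limit was probably not the cause, and the OS side (memory, per-user limits) becomes the more likely suspect.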






-- Jack van Zanen




