Linux Processes Followup - Was: Fun With Scan Listener

  • From: David Barbour <david.barbour1@xxxxxxxxx>
  • To: "Mark W. Farnham" <mwf@xxxxxxxx>
  • Date: Tue, 3 Jun 2014 18:31:01 -0500

This has probably been noticed by others, but over the weekend I had to
stop and start CRS manually on all nodes to free up an ASM disk that was in
the middle of a rebalancing operation when the electricians accidentally
cut off power to the data center (that was fun too). Part of what I may
have run into with the scan listener is outlined in Doc ID 1594606.1, "The
processes and resources started by CRS (Grid Infrastructure) do not inherit
the ulimit setting for "max user processes" from /etc/security/limits.conf
setting".

The resolution is to modify the ohasd script to set the ulimit explicitly
for all grid and database resources that are started by the Grid
Infrastructure (GI).
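
One way to confirm whether a process started by CRS actually picked up the
intended limit is to read its /proc/<pid>/limits file rather than trusting
ulimit -u in a login shell. A minimal Python sketch, assuming a Linux /proc
filesystem; the function names are illustrative and not part of any Oracle
tooling:

```python
from pathlib import Path


def _to_limit(value):
    """Convert one limits column to int, or None when it reads 'unlimited'."""
    return None if value == "unlimited" else int(value)


def parse_max_processes(limits_text):
    """Return (soft, hard) from the 'Max processes' row of /proc/<pid>/limits text."""
    prefix = "Max processes"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            # Remaining columns on the row: soft limit, hard limit, units
            fields = line[len(prefix):].split()
            return _to_limit(fields[0]), _to_limit(fields[1])
    raise ValueError("no 'Max processes' row found")


def max_processes_for_pid(pid):
    """Read the limits of a live process; needs permission to read its /proc entry."""
    return parse_max_processes(Path(f"/proc/{pid}/limits").read_text())
```

Calling max_processes_for_pid() with the PID of oraagent.bin (for example
from pgrep -f oraagent.bin) would show the soft and hard values that process
is really running with.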

Guess it's important if you ever want to start and stop nodes individually
on a busy RAC.
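
To see how close the oracle user is to "max user processes" on a busy node,
the tasks it owns can be counted from /proc. On Linux the limit is applied
per real user ID and counts threads, so the sketch below sums the Threads
value of every process whose real UID matches. It assumes a readable Linux
/proc; the helper names are made up for illustration:

```python
import os


def parse_status(status_text):
    """Extract (real_uid, thread_count) from the text of /proc/<pid>/status."""
    uid = threads = None
    for line in status_text.splitlines():
        if line.startswith("Uid:"):
            uid = int(line.split()[1])  # first value after the label is the real UID
        elif line.startswith("Threads:"):
            threads = int(line.split()[1])
    if uid is None or threads is None:
        raise ValueError("status text lacks a Uid or Threads line")
    return uid, threads


def count_tasks_for_uid(target_uid, proc_root="/proc"):
    """Sum the thread counts of all processes whose real UID is target_uid."""
    total = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                uid, threads = parse_status(fh.read())
        except (OSError, ValueError):
            continue  # process exited mid-scan or is not readable
        if uid == target_uid:
            total += threads
    return total
```

With pwd.getpwnam("oracle").pw_uid as the argument, the result can be
compared against the limit the CRS-started processes are running under.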


On Tue, Jun 3, 2014 at 5:53 PM, David Barbour <david.barbour1@xxxxxxxxx>
wrote:

> Thanks guys.  Both the note and the permissions suggestion were helpful.
> We added a database to the cluster.  There appears to have been some
> transitional understanding of the difference between RAC and SCAN and VIP
> configuration vs. traditional standalone single listener and tnsnames
> configuration as well as command usage.  However, even when that was ironed
> out, I still got the error.  It seems this database was configured with
> 2000 processes.  All the DBs on the RAC are owned by oracle.  Doc 579365.1
> lit up the dim bulb and I checked max user processes.  1024.  When I add
> up everything that's connected and everything that needs to run just to
> start the DBs & ASM, I come up with waaaaay more than 1024.  I'm greatly
> surprised (and completely thankful) that this didn't crash and burn before
> now.
>
> I love this list.
>
>
> On Tue, Jun 3, 2014 at 5:00 PM, Mark W. Farnham <mwf@xxxxxxxx> wrote:
>
>> Well this is sort of a hunt and poke around problem, but I would start
>> with checking owner, group and file permissions and who owns the various
>> processes that cannot be stopped or started.
>>
>>
>>
>> I'd check the setuid on the programs on rchr1p01 (guessing that is the
>> node in question), and if maybe someone started something as root and that
>> fubared some permissions.
>>
>>
>>
>> Nothing is jumping out of the logs, except I wonder about ohome(null),
>> but I haven't seen that error message in context, so that might be okay.
>>
>>
>>
>> mwf
>>
>>
>>
>> *From:* oracle-l-bounce@xxxxxxxxxxxxx [mailto:
>> oracle-l-bounce@xxxxxxxxxxxxx] *On Behalf Of *David Barbour
>> *Sent:* Tuesday, June 03, 2014 5:27 PM
>> *To:* oracle-l mailing list
>>
>> *Subject:* Fun With Scan Listener
>>
>>
>>
>> Oracle 11.2.0.3  RHEL 6.3  5-Node RAC
>>
>> This has me somewhat (okay -  totally) baffled.  I have a scan listener
>> that is showing as follows when I run crsctl status resource -t:
>>
>>
>> --------------------------------------------------------------------------------
>> Cluster Resources
>>
>> --------------------------------------------------------------------------------
>> ora.LISTENER_SCAN1.lsnr
>>       1        ONLINE  UNKNOWN      rchr1p01
>>
>> If I check the status via srvctl I get the following:
>>
>>  $ srvctl status scan_listener
>> SCAN Listener LISTENER_SCAN1 is enabled
>> SCAN listener LISTENER_SCAN1 is not running
>> SCAN Listener LISTENER_SCAN2 is enabled
>> SCAN listener LISTENER_SCAN2 is running on node rchr1p02
>> SCAN Listener LISTENER_SCAN3 is enabled
>> SCAN listener LISTENER_SCAN3 is running on node rchr1p03
>>
>> So I try to start it:
>>
>>  $ srvctl start scan_listener -i 1
>> PRCR-1079 : Failed to start resource ora.LISTENER_SCAN1.lsnr
>> CRS-5013: Agent "/oracle/grid/11203/bin/oraagent.bin" failed to start
>> process "/oracle/grid/11203/bin/lsnrctl" for action "clean": details at
>> "(:CLSN00008:)" in
>> "/oracle/grid/11203/log/rchr1p01/agent/crsd/oraagent_oracle/oraagent_oracle.log"
>> CRS-5013: Agent "/oracle/grid/11203/bin/oraagent.bin" failed to start
>> process "/oracle/grid/11203/bin/lsnrctl" for action "check": details at
>> "(:CLSN00008:)" in
>> "/oracle/grid/11203/log/rchr1p01/agent/crsd/oraagent_oracle/oraagent_oracle.log"
>> CRS-2680: Clean of 'ora.LISTENER_SCAN1.lsnr' on 'rchr1p01' failed
>>
>> So I try to stop it:
>>
>>  $ srvctl stop scan_listener -i 1 -f
>> PRCR-1065 : Failed to stop resource ora.LISTENER_SCAN1.lsnr
>> CRS-5013: Agent "/oracle/grid/11203/bin/oraagent.bin" failed to start
>> process "/oracle/grid/11203/bin/lsnrctl" for action "clean": details at
>> "(:CLSN00008:)" in
>> "/oracle/grid/11203/log/rchr1p01/agent/crsd/oraagent_oracle/oraagent_oracle.log"
>> CRS-5013: Agent "/oracle/grid/11203/bin/oraagent.bin" failed to start
>> process "/oracle/grid/11203/bin/lsnrctl" for action "check": details at
>> "(:CLSN00008:)" in
>> "/oracle/grid/11203/log/rchr1p01/agent/crsd/oraagent_oracle/oraagent_oracle.log"
>> CRS-2680: Clean of 'ora.LISTENER_SCAN1.lsnr' on 'rchr1p01' failed
>>
>> So I give up and check the log.
>>
>> 2014-06-03 16:19:48.666: [ora.LISTENER_SCAN1.lsnr][2617243392]
>> {1:53466:11150} [clean] clsn_agent::clean: Exception
>> SclsProcessSpawnException
>> 2014-06-03 16:19:48.666: [    AGFW][3623876352] {1:53466:11150} Agent
>> sending reply for: RESOURCE_CLEAN[ora.LISTENER_SCAN1.lsnr 1 1] ID 4100:58347
>> 2014-06-03 16:19:48.666: [ora.LISTENER_SCAN1.lsnr][2617243392]
>> {1:53466:11150} [clean] (:CLSN00106:) clsn_agent::clean }
>> 2014-06-03 16:19:48.666: [    AGFW][2617243392] {1:53466:11150} Command:
>> clean for resource: ora.LISTENER_SCAN1.lsnr 1 1 completed with status: FAIL
>> 2014-06-03 16:19:48.666: [    AGFW][3623876352] {1:53466:11150} Agent
>> sending reply for: RESOURCE_CLEAN[ora.LISTENER_SCAN1.lsnr 1 1] ID 4100:58347
>> 2014-06-03 16:19:48.666: [ora.LISTENER_SCAN1.lsnr][3087005440]
>> {1:53466:11150} [check] LsnrAgent::check {
>> 2014-06-03 16:19:48.666: [ora.LISTENER_SCAN1.lsnr][3087005440]
>> {1:53466:11150} [check] lsnrctl status LISTENER_SCAN1
>>
>> 2014-06-03 16:19:48.666: [ora.LISTENER_SCAN1.lsnr][3087005440]
>> {1:53466:11150} [check] getOracleHomeAttrib: oracle_home =
>> /oracle/grid/11203
>> 2014-06-03 16:19:48.666: [ora.LISTENER_SCAN1.lsnr][3087005440]
>> {1:53466:11150} [check] getOracleHomeAttrib: oracle_home =
>> /oracle/grid/11203
>> 2014-06-03 16:19:48.666: [ora.LISTENER_SCAN1.lsnr][3087005440]
>> {1:53466:11150} [check] Utils::getCrsHome crsHome /oracle/grid/11203
>> 2014-06-03 16:19:48.667: [ora.LISTENER_SCAN1.lsnr][3087005440]
>> {1:53466:11150} [check] Utils::execCmd 1
>> USR_ORA_ENV:ORACLE_BASE=/opt/oracle oracleHome:/oracle/grid/11203
>> CrsHome:/oracle/grid/11203
>> 2014-06-03 16:19:48.667: [ora.LISTENER_SCAN1.lsnr][3087005440]
>> {1:53466:11150} [check] Utils::getCrsHome crsHome /oracle/grid/11203
>> 2014-06-03 16:19:48.667: [ora.LISTENER_SCAN1.lsnr][3087005440]
>> {1:53466:11150} [check] Adding Environment Variables
>> ORACLE_HOME=/oracle/grid/11203
>> 2014-06-03 16:19:48.667: [ora.LISTENER_SCAN1.lsnr][3087005440]
>> {1:53466:11150} [check] Adding Environment Variables
>> TNS_ADMIN=/oracle/grid/11203/network/admin/
>> 2014-06-03 16:19:48.667: [ora.LISTENER_SCAN1.lsnr][3087005440]
>> {1:53466:11150} [check] Adding Environment variable from USR_ORA_ENV
>> ORACLE_BASE=/opt/oracle
>> 2014-06-03 16:19:48.667: [ora.LISTENER_SCAN1.lsnr][3087005440]
>> {1:53466:11150} [check] Utils:execCmd action = 3 flags = 38 ohome = (null)
>> cmdname = lsnrctl.
>> 2014-06-03 16:19:48.667: [ora.LISTENER_SCAN1.lsnr][3087005440]
>> {1:53466:11150} [check] getOracleHomeAttrib: oracle_home =
>> /oracle/grid/11203
>> 2014-06-03 16:19:48.667: [ora.LISTENER_SCAN1.lsnr][3087005440]
>> {1:53466:11150} [check] (:CLSN00008:)Utils:execCmd scls_process_spawn()
>> failed 1
>> 2014-06-03 16:19:48.667: [ora.LISTENER_SCAN1.lsnr][3087005440]
>> {1:53466:11150} [check] (:CLSN00008:) category: -2, operation: fork, loc:
>> spawnproc28, OS error: 11, other: forked failed [-1]
>> 2014-06-03 16:19:48.667: [   AGENT][3087005440] {1:53466:11150}
>> UserErrorException: Locale is
>> 2014-06-03 16:19:48.667: [ora.LISTENER_SCAN1.lsnr][3087005440]
>> {1:53466:11150} [check] clsnUtils::error Exception type=2 string=
>> CRS-5013: Agent "/oracle/grid/11203/bin/oraagent.bin" failed to start
>> process "/oracle/grid/11203/bin/lsnrctl" for action "check": details at
>> "(:CLSN00008:)" in
>> "/oracle/grid/11203/log/rchr1p01/agent/crsd/oraagent_oracle/oraagent_oracle.log"
>>
>> I'm willing to go with the part about 'UserErrorException', except I'm
>> not aware of what I'm doing wrong.  Looking through MOS docs but hoping
>> someone has a suggestion?
>>
>
>
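
The "OS error: 11" reported next to "operation: fork" in the oraagent log
above is an errno value. On Linux, 11 is EAGAIN, which fork() returns when,
among other causes, the per-user process limit (RLIMIT_NPROC) has been
reached; that fits the max user processes finding in the follow-ups at the
top of this thread. A small sketch for decoding such numbers, to be run on
the Linux node itself because errno values are platform-specific:

```python
import errno
import os


def describe_os_error(code):
    """Return (symbolic name, message) for an OS error number on this platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # 11 is the value logged as "OS error" for the failed fork.
    name, message = describe_os_error(11)
    print(f"OS error 11 -> {name}: {message}")
```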
