[racattack] How do I begin troubleshooting a 'down' RAC node: collabn1?

  • From: Hanh Nguyen <hanhnguyen2100@xxxxxxxxx>
  • To: Racattack <racattack@xxxxxxxxxxxxx>
  • Date: Sat, 12 Apr 2014 07:53:03 -0400

Hi Erik, and everyone,
That's true, a shared folder. Why couldn't I think of that? I thought
perhaps I hadn't set up any network between the two PCs, so they act
independently as two separate PCs...
Anyway, it's quite a challenge for a RAC newbie like me :-) to begin
learning RAC with my first troubleshooting exercise: one of the 2 nodes
of my desktop RAC is currently DOWN. But with some help from you and
others in the RAC Attack community, perhaps I can learn a lot from this
challenge. So here is the diagnosis:

*collabn1*: is DOWN (see attached notes)
     [root@collabn1 client]# crsctl stat res -t
     CRS-4535: Cannot communicate with Cluster Ready Services
     CRS-4000: Command Status failed, or completed with errors.
     [root@collabn1 client]#

 So I checked the log file generated today under the Grid Home directory
*/u01/app/12.1.0/grid/log/collabn1/client*:
     [root@collabn1 client]# cat crsctl_967.log
     Oracle Database 12c Clusterware Release 12.1.0.1.0 - Production Copyright 1996, 2013 Oracle. All rights reserved.
     2014-04-12 06:21:50.395: [  OCRMSG][2528155200]prom_waitconnect: CONN NOT ESTABLISHED (0,29,1,2)
     2014-04-12 06:21:50.395: [  OCRMSG][2528155200]GIPC error [29] msg [gipcretConnectionRefused]
     2014-04-12 06:21:50.395: [  OCRMSG][2528155200]prom_connect: error while waiting for connection complete [24]
I ran "ls -l /dev/asm*" and the output looks a little different on
collabn1 compared with collabn2: collabn1 does NOT show the file
"shared-45" as collabn2 does. Is this something to consider?
I restarted the network on collabn1 and the output looks good; collabn1 can
ping collabn2.
=========================================================
collabn2: is UP and running fine (see attached notes)
"crsctl stat res -t" command shows everything is fine on this node.
=========================================================

So, how should I begin troubleshooting the 'down' RAC node collabn1?
Is it possible that, because collabn1 was down, it evicted itself and
collabn2 has now become the main node?
For more info, please see the attached notes.
Hanh,



On Sat, Apr 12, 2014 at 3:15 AM, Erik Benner <erik@xxxxxxxxx> wrote:

> In Windows you can also share the folder over the network and copy it that
> way.
>
> On Apr 11, 2014, at 3:14 AM, Hanh Nguyen <hanhnguyen2100@xxxxxxxxx> wrote:
>
> Hi Erik,
> The problem with copying a >4 GB file to a USB external drive has been
> resolved. I found a way to convert the USB external drive from FAT32 (even
> though it said it was NTFS, it wasn't) to NTFS format.
> I'm importing these exported VM files into my laptop as we speak.
> Thanks again, Erik, for your help.
> Hanh,
>
>
> On Fri, Apr 11, 2014 at 12:43 AM, Hanh Nguyen <hanhnguyen2100@xxxxxxxxx> wrote:
>
>> Hi Erik,
>> The export file for both VMs is 15 GB, so I'm attempting to export one
>> VM at a time.
>> It is still running, but I expect it will also be a BIG file for just 1 VM;
>> I estimate it must be around 7-8 GB.
>> How does one copy a >4 GB file from one PC to another?
>> Hanh,
>>
>>
>> On Thu, Apr 10, 2014 at 11:54 PM, Erik Benner <erik@xxxxxxxxx> wrote:
>>
>>> If you get stuck, please let me know.
>>>
>>>
>>> Erik
>>>
>>>
>>>
>>> -------- Original message --------
>>> From: Hanh Nguyen
>>> Date: 04/10/2014 7:49 PM (GMT-08:00)
>>> To: Racattack
>>> Subject: [racattack] Re: Cloning my successful VMs Oracle12c Rac
>>> installation from 1 PC to another?
>>>
>>> Thank you, Erik. That would be wonderful because I'd have a working
>>> Oracle 12c RAC to play with :-). Thanks again.
>>> Hanh,
>>>
>>>
>>> On Thu, Apr 10, 2014 at 10:30 PM, Erik Benner <erik@xxxxxxxxx> wrote:
>>>
>>>> Yes, you should be able to export the VMs, then import them into your
>>>> other machine.
>>>>
>>>> Erik
>>>>
>>>>
>>>> > On Apr 10, 2014, at 7:28 PM, Hanh Nguyen <hanhnguyen2100@xxxxxxxxx> wrote:
>>>> >
>>>> > Hi everyone,
>>>> > Using the RAC Attack instructions from the wikibook on the web, I've
>>>> > successfully installed a 2-node Oracle RAC 12c with VM version 4.3.10 on
>>>> > my MS Windows Vista 32-bit desktop PC with 6 GB RAM. But I am having great
>>>> > difficulty duplicating this success, installing the same versions of
>>>> > Oracle 12c RAC and VM on my Windows 7 64-bit laptop, which has more RAM
>>>> > and is more portable and easier to carry around.
>>>> > So I wonder: can I clone my successful VM installation from my
>>>> > desktop PC to my laptop?
>>>> > Hanh,
>>>>
>>>>
>>>
>>
>
[root@collabn2 ~]# ps -ef|grep smon
oracle   18114     1  0 06:14 ?        00:00:00 ora_smon_RAC2
root     18798 17950  0 06:18 pts/0    00:00:00 grep smon
oracle   27878     1  0 00:10 ?        00:00:00 asm_smon_+ASM2
[root@collabn2 ~]# crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.SHARED.advm
               ONLINE  ONLINE       collabn2                 Volume device /dev/asm/shared-45 is online,STABLE
ora.DATA.dg
               ONLINE  ONLINE       collabn2                 STABLE
ora.FRA.dg
               ONLINE  ONLINE       collabn2                 STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       collabn2                 STABLE
ora.asm
               ONLINE  ONLINE       collabn2                 Started,STABLE
ora.data.shared.acfs
               ONLINE  ONLINE       collabn2                 mounted on /shared,STABLE
ora.net1.network
               ONLINE  ONLINE       collabn2                 STABLE
ora.ons
               ONLINE  ONLINE       collabn2                 STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       collabn2                 STABLE
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       collabn2                 STABLE
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       collabn2                 STABLE
ora.collabn1.vip
      1        ONLINE  OFFLINE                               STABLE
ora.collabn2.vip
      1        ONLINE  ONLINE       collabn2                 STABLE
ora.cvu
      1        ONLINE  ONLINE       collabn2                 STABLE
ora.oc4j
      1        OFFLINE OFFLINE                               STABLE
ora.rac.db
      1        ONLINE  OFFLINE                               STABLE
      2        ONLINE  ONLINE       collabn2                 Open,STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       collabn2                 STABLE
ora.scan2.vip
      1        ONLINE  ONLINE       collabn2                 STABLE
ora.scan3.vip
      1        ONLINE  ONLINE       collabn2                 STABLE
--------------------------------------------------------------------------------
[root@collabn2 ~]#
[root@collabn2 ~]# ls -l /dev/asm*
brw-rw---- 1 oracle dba 8, 17 Apr 12 06:52 /dev/asm-dsk1
brw-rw---- 1 oracle dba 8, 33 Apr 12 06:51 /dev/asm-dsk2
brw-rw---- 1 oracle dba 8, 49 Apr 12 06:52 /dev/asm-dsk3
brw-rw---- 1 oracle dba 8, 65 Apr 12 06:23 /dev/asm-dsk4

/dev/asm:
total 0
brwxrwx--- 1 root dba 251, 23041 Apr  5 17:34 shared-45
[root@collabn2 ~]#

login as: root
root@192.168.78.51's password:
Last login: Sat Apr 12 06:22:34 2014

[root@collabn1 ~]# ps -ef|grep smon
root      2487  2460  0 06:25 pts/0    00:00:00 grep smon
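Comparing the two ps listings: on collabn2 the grep finds asm_smon_+ASM2 (the ASM instance) and ora_smon_RAC2 (a database instance), while on collabn1 it finds only the grep itself, so neither ASM nor a database instance is running there. The hypothetical Python helper below pulls instance names out of such output; it assumes the ora_smon_<SID> / asm_smon_<SID> process naming seen above.

```python
import re

# Hypothetical helper: pick SMON background processes out of `ps -ef` output.
# Assumes process names like "ora_smon_RAC2" (database) and "asm_smon_+ASM2"
# (ASM), as in the listings above. The "grep smon" line itself never matches
# because it lacks the ora_/asm_ prefix.
SMON = re.compile(r"\b(ora|asm)_smon_(\S+)")


def smon_instances(ps_output):
    """Return (kind, instance_name) tuples, kind being "ora" or "asm"."""
    found = []
    for line in ps_output.splitlines():
        m = SMON.search(line)
        if m:
            found.append(m.groups())
    return found
```

Run against the collabn1 output, it returns an empty list, which is the quickest way to see that nothing above the clusterware layer has started on that node.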

[root@collabn1 ~]# . oraenv
ORACLE_SID = [root] ? +ASM1
The Oracle base has been set to /u01/app/oracle

[root@collabn1 ~]# env|grep ORA
ORACLE_SID=+ASM1
ORACLE_BASE=/u01/app/oracle
ORACLE_HOME=/u01/app/12.1.0/grid

[root@collabn1 ~]# crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.

[root@collabn1 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
[root@collabn1 ~]#
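Read together, these four messages say that Oracle High Availability Services (OHASD) is up, while CRSD, CSSD and EVMD cannot be reached. In the 11gR2/12c Clusterware stack CRSD depends on CSSD, so the CSS failure (CRS-4530) is arguably the first thing to chase, for example in the cssd log directory listed below; `crsctl stat res -t -init` also shows the lower-stack resources (ora.cssd, ora.crsd, ora.evmd, ...) directly. The hypothetical helper below only turns the codes seen in this thread into a per-daemon summary; its code-to-daemon table is copied from the messages above, not from Oracle's full message catalogue.

```python
import re

# Hypothetical mapping, built only from the `crsctl check crs` messages above:
# message code -> (daemon, reachable?)
KNOWN = {
    "CRS-4638": ("OHASD", True),   # Oracle High Availability Services is online
    "CRS-4535": ("CRSD", False),   # Cannot communicate with Cluster Ready Services
    "CRS-4530": ("CSSD", False),   # Communications failure contacting CSS daemon
    "CRS-4534": ("EVMD", False),   # Cannot communicate with Event Manager
}

CODE = re.compile(r"\b(CRS-\d{4})\b")


def daemon_status(output):
    """Map each daemon mentioned by a known CRS code to True (up) or False."""
    status = {}
    for line in output.splitlines():
        m = CODE.search(line)
        if m and m.group(1) in KNOWN:
            daemon, healthy = KNOWN[m.group(1)]
            status[daemon] = healthy
    return status
```

Codes not in the table, such as the generic CRS-4000, are simply ignored.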
[root@collabn1 ~]# cd /u01/app/12.1.0/grid/log
[root@collabn1 log]# ls -l
total 12
drwxr-xr-t 21 root   oinstall 4096 Apr  3 08:38 collabn1
drwxr-xr-x  2 oracle oinstall 4096 Mar 30 14:12 crs
drwxr-x---  4 oracle oinstall 4096 Mar 30 14:32 diag
[root@collabn1 log]# cd col*
[root@collabn1 collabn1]# ls -l
total 136
drwxr-xr-x 8 oracle oinstall  4096 Mar 30 14:31 acfs
drwxr-x--- 2 oracle oinstall  4096 Mar 30 14:28 admin
drwxrwxr-t 4 root   oinstall  4096 Mar 30 14:28 agent
-rw-rw-r-- 1 oracle oinstall 38260 Apr 12 06:33 alertcollabn1.log
drwxr-x--x 2 oracle oinstall 20480 Apr 12 06:27 client
drwxr-x--- 2 root   oinstall  4096 Mar 30 14:28 crflogd
drwxr-x--- 2 root   oinstall  4096 Mar 30 14:28 crfmond
drwxr-x--- 2 root   oinstall  4096 Mar 30 14:34 crsd
drwxr-x--- 2 oracle oinstall  4096 Apr 12 06:33 cssd
drwxr-x--- 2 root   oinstall  4096 Mar 30 14:32 ctssd
drwxr-x--- 4 oracle oinstall  4096 Mar 30 14:28 cvu
drwxr-x--- 2 oracle oinstall  4096 Mar 30 14:28 diskmon
drwxr-x--- 2 oracle oinstall  4096 Apr 12 06:22 evmd
drwxr-x--- 2 oracle oinstall  4096 Mar 30 14:32 gipcd
drwxr-x--- 2 root   oinstall  4096 Mar 30 14:28 gnsd
drwxr-x--- 2 oracle oinstall  4096 Apr 12 06:22 gpnpd
drwxr-x--- 2 oracle oinstall  4096 Mar 30 14:32 mdnsd
drwxr-x--- 2 root   oinstall  4096 Apr 11 20:50 ohasd
drwxrwxr-t 5 oracle oinstall  4096 Mar 30 15:09 racg
drwxr-x--- 2 oracle oinstall  4096 Mar 30 14:28 srvm

[root@collabn1 collabn1]# cd crsd
[root@collabn1 crsd]# ls -ltr
total 3820
-rw-r--r-- 1 root root     234 Mar 30 14:37 crsdOUT.log
-rw-r--r-- 1 root root 3907584 Apr  5 15:07 crsd.log

[root@collabn1 crsd]# cat crsdOUT.log
2014-03-30 14:34:52
Changing directory to /u01/app/12.1.0/grid/log/collabn1/crsd
2014-03-30 14:34:52
CRSD REBOOT
2014-03-30 14:37:59
Changing directory to /u01/app/12.1.0/grid/log/collabn1/crsd
2014-03-30 14:37:59
CRSD REBOOT
[root@collabn1 crsd]# 

[root@collabn1 client]# pwd
/u01/app/12.1.0/grid/log/collabn1/client
[root@collabn1 crsd]# 
[root@collabn1 crsd]# ls -ltr
...

-rw-r--r-- 1 root   root       415 Apr 12 06:21 crsctl_967.log
-rw-r--r-- 1 root   root       412 Apr 12 06:22 crsctl_1557.log
-rw-r--r-- 1 root   root       415 Apr 12 06:22 crsctl_1900.log
-rw-r--r-- 1 root   root      1625 Apr 12 06:22 crsctl_1892.log
-rw-r--r-- 1 root   root       245 Apr 12 06:22 crswrapexece.log
-rw-r--r-- 1 root   root       683 Apr 12 06:27 ocrcheck_2526.log
-rw-r--r-- 1 root   root       385 Apr 12 07:13 crsctl_3597.log
[root@collabn1 client]#
[root@collabn1 client]# cat crsctl_967.log
Oracle Database 12c Clusterware Release 12.1.0.1.0 - Production Copyright 1996, 2013 Oracle. All rights reserved.
2014-04-12 06:21:50.395: [  OCRMSG][2528155200]prom_waitconnect: CONN NOT ESTABLISHED (0,29,1,2)
2014-04-12 06:21:50.395: [  OCRMSG][2528155200]GIPC error [29] msg [gipcretConnectionRefused]
2014-04-12 06:21:50.395: [  OCRMSG][2528155200]prom_connect: error while waiting for connection complete [24]

[root@collabn1 client]# cat crsctl_1557.log
Oracle Database 12c Clusterware Release 12.1.0.1.0 - Production Copyright 1996, 2013 Oracle. All rights reserved.
2014-04-12 06:22:00.698: [  OCRMSG][468792896]prom_waitconnect: CONN NOT ESTABLISHED (0,29,1,2)
2014-04-12 06:22:00.698: [  OCRMSG][468792896]GIPC error [29] msg [gipcretConnectionRefused]
2014-04-12 06:22:00.698: [  OCRMSG][468792896]prom_connect: error while waiting for connection complete [24]
[root@collabn1 client]#

[root@collabn1 crsd]# ls -l /dev/asm*
brw-rw---- 1 oracle dba 8, 17 Apr 12 06:33 /dev/asm-dsk1
brw-rw---- 1 oracle dba 8, 33 Apr 12 06:23 /dev/asm-dsk2
brw-rw---- 1 oracle dba 8, 49 Apr 12 06:23 /dev/asm-dsk3
brw-rw---- 1 oracle dba 8, 65 Apr 12 06:23 /dev/asm-dsk4

/dev/asm:
total 0
[root@collabn1 crsd]#
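On collabn2, /dev/asm/shared-45 is the ADVM volume device that crsctl reports as ora.DATA.SHARED.advm, and the ACFS file system on it is mounted on /shared. ADVM volume devices are typically created on a node when ASM enables the volume there, and the ps output shows no ASM instance on collabn1, so the missing device is more likely a consequence of the stack being down than its cause. The hypothetical Python helper below compares two captured `ls -l /dev/asm*` listings by device name.

```python
# Hypothetical helper: compare `ls -l /dev/asm*` listings captured on two nodes
# and report device names present on the reference node but missing on the other.
def device_names(listing):
    names = set()
    for line in listing.splitlines():
        fields = line.split()
        # Block-device lines start with a mode string such as "brw-rw----";
        # "/dev/asm:" and "total 0" lines are skipped.
        if fields and fields[0].startswith("b"):
            names.add(fields[-1].rsplit("/", 1)[-1])
    return names


def missing_on(node_listing, reference_listing):
    """Names in reference_listing that do not appear in node_listing."""
    return sorted(device_names(reference_listing) - device_names(node_listing))
```

With collabn2 as the reference, the only name this should report for collabn1 is shared-45; the four asm-dsk devices are present on both nodes.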
[root@collabn1 cssd]# service network restart
Shutting down interface eth0:                              [  OK  ]
Shutting down interface eth1:                              [  OK  ]
Shutting down interface eth2:                              [  OK  ]
Shutting down loopback interface:                          [  OK  ]
Bringing up loopback interface:                            [  OK  ]
Bringing up interface eth0:                                [  OK  ]
Bringing up interface eth1:                                [  OK  ]
Bringing up interface eth2:
Determining IP information for eth2... done.
                                                           [  OK  ]
[root@collabn1 cssd]#

[root@collabn1 client]# ping collabn2
PING collabn2.racattack (192.168.78.52) 56(84) bytes of data.
64 bytes from collabn2.racattack (192.168.78.52): icmp_seq=1 ttl=64 time=0.773 ms
64 bytes from collabn2.racattack (192.168.78.52): icmp_seq=2 ttl=64 time=0.555 ms
64 bytes from collabn2.racattack (192.168.78.52): icmp_seq=3 ttl=64 time=0.417 ms
64 bytes from collabn2.racattack (192.168.78.52): icmp_seq=4 ttl=64 time=0.506 ms
64 bytes from collabn2.racattack (192.168.78.52): icmp_seq=5 ttl=64 time=0.516 ms
64 bytes from collabn2.racattack (192.168.78.52): icmp_seq=6 ttl=64 time=0.584 ms
64 bytes from collabn2.racattack (192.168.78.52): icmp_seq=7 ttl=64 time=0.716 ms
64 bytes from collabn2.racattack (192.168.78.52): icmp_seq=8 ttl=64 time=0.573 ms
64 bytes from collabn2.racattack (192.168.78.52): icmp_seq=9 ttl=64 time=0.555 ms
64 bytes from collabn2.racattack (192.168.78.52): icmp_seq=10 ttl=64 time=0.401 ms
64 bytes from collabn2.racattack (192.168.78.52): icmp_seq=11 ttl=64 time=0.393 ms
64 bytes from collabn2.racattack (192.168.78.52): icmp_seq=12 ttl=64 time=0.437 ms
64 bytes from collabn2.racattack (192.168.78.52): icmp_seq=13 ttl=64 time=0.494 ms
64 bytes from collabn2.racattack (192.168.78.52): icmp_seq=14 ttl=64 time=0.527 ms
64 bytes from collabn2.racattack (192.168.78.52): icmp_seq=15 ttl=64 time=0.419 ms
64 bytes from collabn2.racattack (192.168.78.52): icmp_seq=16 ttl=64 time=0.422 ms
64 bytes from collabn2.racattack (192.168.78.52): icmp_seq=17 ttl=64 time=0.405 ms
64 bytes from collabn2.racattack (192.168.78.52): icmp_seq=18 ttl=64 time=0.593 ms
64 bytes from collabn2.racattack (192.168.78.52): icmp_seq=19 ttl=64 time=0.636 ms
64 bytes from collabn2.racattack (192.168.78.52): icmp_seq=20 ttl=64 time=0.600 ms
64 bytes from collabn2.racattack (192.168.78.52): icmp_seq=21 ttl=64 time=0.754 ms
64 bytes from collabn2.racattack (192.168.78.52): icmp_seq=22 ttl=64 time=0.602 ms
64 bytes from collabn2.racattack (192.168.78.52): icmp_seq=23 ttl=64 time=0.654 ms
64 bytes from collabn2.racattack (192.168.78.52): icmp_seq=24 ttl=64 time=0.460 ms
64 bytes from collabn2.racattack (192.168.78.52): icmp_seq=25 ttl=64 time=0.499 ms
64 bytes from collabn2.racattack (192.168.78.52): icmp_seq=26 ttl=64 time=0.756 ms
^C
--- collabn2.racattack ping statistics ---
26 packets transmitted, 26 received, 0% packet loss, time 25544ms
rtt min/avg/max/mdev = 0.393/0.547/0.773/0.118 ms
[root@collabn1 client]#
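This ping confirms the path to collabn2's public address (192.168.78.52) with no loss and sub-millisecond round trips. CSS network heartbeats, however, travel over the private cluster interconnect, which is normally a different interface; `oifcfg getif` shows which interface is registered as cluster_interconnect, and that address is worth pinging as well. The hypothetical Python helper below extracts the summary lines from Linux iputils ping output so the two runs can be compared side by side.

```python
import re

# Hypothetical helper: parse the two summary lines printed by Linux iputils
# `ping`, in the format shown above.
PACKETS = re.compile(r"(\d+) packets transmitted, (\d+) received")
RTT = re.compile(r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms")


def ping_summary(output):
    """Return sent/received counts, loss percentage and rtt statistics (ms)."""
    p = PACKETS.search(output)
    r = RTT.search(output)
    if not (p and r):
        raise ValueError("no ping summary found")
    sent, received = int(p.group(1)), int(p.group(2))
    return {
        "sent": sent,
        "received": received,
        "loss_pct": 100.0 * (sent - received) / sent,
        "rtt_ms": dict(zip(("min", "avg", "max", "mdev"), map(float, r.groups()))),
    }
```

If every packet is lost, iputils prints no rtt line, so this sketch raises ValueError in that case rather than reporting 100% loss.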
