[racattack] Re: How do I begin to troubleshoot on a 'down' Rac node: collabn1?

From: Hanh Nguyen <hanhnguyen2100@xxxxxxxxx>
To: Racattack <racattack@xxxxxxxxxxxxx>
Date: Sun, 13 Apr 2014 17:03:24 -0400

Hi Everyone,
I think that whatever happened to collabn1 and brought down ASM and the DB
instance RAC1 on that node also caused a failover from collabn1 to collabn2:
*collabn1-vip*, the *network* and the *ONS* daemon are all enabled but are
running on collabn2, and everything on collabn2 looks normal. I think
collabn1 has been evicted and taken offline in an orderly way by CRS, the HA
service, or whichever Oracle software/services manage this failover process.
Please see some of the output below, and more detail in my attachment.
If this is the case, how do we bring collabn1 back online and add it back
to the cluster?
Hanh,

[root@collabn2 ~]# crsctl start resource ora.crsd -n collabn1 -init
CRS-2546: Server 'collabn1' is not online
CRS-4000: Command Start failed, or completed with errors.
[root@collabn2 ~]# ^C
[root@collabn2 ~]# srvctl status nodeapps
VIP collabn1-vip.racattack is enabled
VIP collabn1-vip.racattack is running on node: collabn2
VIP collabn2-vip.racattack is enabled
VIP collabn2-vip.racattack is running on node: collabn2
Network is enabled
Network is not running on node: collabn1
Network is running on node: collabn2
ONS is enabled
ONS daemon is not running on node: collabn1
ONS daemon is running on node: collabn2
[root@collabn2 ~]#
[root@collabn2 ~]# crsctl start resource ora.rac.db -n collabn1 -init
CRS-2546: Server 'collabn1' is not online
CRS-4000: Command Start failed, or completed with errors.
[root@collabn2 ~]#
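
As an aside that was not in the original message: a minimal sketch of one way to condense `srvctl status nodeapps` output like the above into a per-node list of what is not running. The function name `nodeapps_down` is hypothetical, and the parsing assumes the "... is not running on node: <name>" line format shown in the output above.

```shell
#!/bin/sh
# nodeapps_down (hypothetical helper): read `srvctl status nodeapps` output on
# stdin and print "<node>: <component>" for every "is not running" line.
# Assumption: lines look like "Network is not running on node: collabn1".
nodeapps_down() {
    awk '/ is not running on node: / {
        node = $NF                               # last field is the node name
        sub(/ is not running on node: .*/, "")   # keep only the component name
        print node ": " $0
    }'
}

# Typical use on a live cluster node (commented out because it needs srvctl):
#   srvctl status nodeapps | nodeapps_down
```

Fed the nodeapps output above, this would report the Network and the ONS daemon as not running on collabn1. It only summarizes the symptoms and says nothing about the cause.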




On Sat, Apr 12, 2014 at 2:27 PM, Hanh Nguyen <hanhnguyen2100@xxxxxxxxx> wrote:

> Hi Maaz,
> Thank you for your reply. I wasn't able to respond right away because I
> was away from my PC this morning. I should point out that, besides the
> RAC1 instance being down for a while, +ASM1 is also down on collabn1. Here
> are the Oracle processes currently running on collabn1:
>
> login as: root
> root@192.168.78.51's password:
> Last login: Sat Apr 12 08:36:35 2014 from 192.168.78.1
> [root@collabn1 ~]# ps -ef|grep oracle
> oracle    2058     1  0 12:00 ?        00:00:05 /u01/app/12.1.0/grid/bin/oraagent.bin
> oracle    2071     1  0 12:00 ?        00:00:45 /u01/app/12.1.0/grid/bin/evmd.bin
> oracle    2073     1  0 12:00 ?        00:00:00 /u01/app/12.1.0/grid/bin/mdnsd.bin
> oracle    2086     1  0 12:00 ?        00:00:07 /u01/app/12.1.0/grid/bin/gpnpd.bin
> oracle    2111  2071  0 12:00 ?        00:00:00 /u01/app/12.1.0/grid/bin/evmlogger.bin -o /u01/app/12.1.0/grid/log/[HOSTNAME]/evmd/evmlogger.info -l /u01/app/12.1.0/grid/log/[HOSTNAME]/evmd/evmlogger.log
> oracle    2117     1  3 12:00 ?        00:04:46 /u01/app/12.1.0/grid/bin/gipcd.bin
> root      4723  4696  0 14:22 pts/1    00:00:00 grep oracle
> [root@collabn1 ~]#
>
> [root@collabn1 ~]# ps -ef|grep smon
> root      4725  4696  0 14:22 pts/1    00:00:00 grep smon
> [root@collabn1 ~]#
>
>
>
> On Sat, Apr 12, 2014 at 10:54 AM, Maaz Anjum <maazanjum@xxxxxxxxx> wrote:
>
>> Hanh,
>>
>> It's great to see your enthusiasm and fervor to learn RAC using the
>> RACAttack platform - we are more than happy to help out with
>> troubleshooting.
>>
>> Are there any oracle user processes running on collabn1 at the moment? ps
>> -ef | grep oracle
>>
>> Maaz
>>
>>
>> On Sat, Apr 12, 2014 at 7:53 AM, Hanh Nguyen <hanhnguyen2100@xxxxxxxxx> wrote:
>>
>>> Hi Erik, and everyone,
>>> That's true, a shared folder. Why didn't I think of that? I thought
>>> that, since I hadn't set up any network between the two PCs, they act
>>> independently as 2 separate PCs...
>>> Anyway, it's quite a challenge for a RAC newbie like me, :-), to begin
>>> learning RAC by troubleshooting my first issue: 1 node of my 2-node
>>> desktop RAC is currently DOWN. But with some help from you and others in
>>> the RAC Attack community, perhaps I can learn a lot from this challenge.
>>> So here is the diagnosis:
>>>
>>> *collabn1*: is DOWN (see attached notes)
>>>      [root@collabn1 client]# crsctl stat res -t
>>>      CRS-4535: Cannot communicate with Cluster Ready Services
>>>      CRS-4000: Command Status failed, or completed with errors.
>>>      [root@collabn1 client]#
>>>
>>>  So I checked the log file generated today under Grid Home dir:
>>> */u01/app/12.1.0/grid/log/collabn1/client*
>>>      [root@collabn1 client]# cat crsctl_967.log
>>>      Oracle Database 12c Clusterware Release 12.1.0.1.0 - Production
>>>      Copyright 1996, 2013 Oracle. All rights reserved.
>>>      2014-04-12 06:21:50.395: [  OCRMSG][2528155200]prom_waitconnect: CONN NOT ESTABLISHED (0,29,1,2)
>>>      2014-04-12 06:21:50.395: [  OCRMSG][2528155200]GIPC error [29] msg [gipcretConnectionRefused]
>>>      2014-04-12 06:21:50.395: [  OCRMSG][2528155200]prom_connect: error while waiting for connection complete [24]
>>> I ran "ls -l /dev/asm*" and the output looks a little different on
>>> collabn1 compared with collabn2: collabn1 does NOT show the file
>>> "shared-45" that collabn2 does. Is this something to consider?
>>> I restarted the network on collabn1 and the output looks good; collabn1
>>> can ping collabn2.
>>> =========================================================
>>> collabn2: is UP and running fine (see attached notes)
>>> "crsctl stat res -t" command shows everything is fine on this node.
>>> =========================================================
>>>
>>> So, how should I begin troubleshooting the 'down' RAC node, collabn1?
>>> Is it possible that, because collabn1 was down, it has evicted itself
>>> and collabn2 has now become the main node?
>>> For more info, please see the attached notes.
>>> Hanh,
>>>
>>>
>>>
>>> On Sat, Apr 12, 2014 at 3:15 AM, Erik Benner <erik@xxxxxxxxx> wrote:
>>>
>>>> In windows you can also share the folder via the network, and copy it
>>>> that way.
>>>>
>>>>
>>>>
>>>> Sent from my iPad Air
>>>>
>>>> On Apr 11, 2014, at 3:14 AM, Hanh Nguyen <hanhnguyen2100@xxxxxxxxx>
>>>> wrote:
>>>>
>>>> Hi Erik,
>>>> The problem of copying a > 4GB file to an external USB drive has been
>>>> resolved. I found a way to convert the drive from FAT32 (even though it
>>>> said it was NTFS, it wasn't) to NTFS format.
>>>> I'm importing the exported VM files into my laptop as we speak.
>>>> Thanks again, Erik, for your help.
>>>> Hanh,
>>>>
>>>>
>>>> On Fri, Apr 11, 2014 at 12:43 AM, Hanh Nguyen <hanhnguyen2100@xxxxxxxxx
>>>> > wrote:
>>>>
>>>>> Hi Erik,
>>>>> The export file for both VMs is 15GB, so I'm attempting to export
>>>>> one VM at a time.
>>>>> It is still running, but I expect a BIG file even for just 1 VM;
>>>>> I estimate it will be around 7-8GB.
>>>>> How does one copy a > 4GB file from 1 PC to another?
>>>>> Hanh,
>>>>>
>>>>>
>>>>> On Thu, Apr 10, 2014 at 11:54 PM, Erik Benner <erik@xxxxxxxxx> wrote:
>>>>>
>>>>>> If you get  stuck please let me know.
>>>>>>
>>>>>>
>>>>>> Erik
>>>>>>
>>>>>>
>>>>>>
>>>>>> Sent from my Galaxy S®III
>>>>>>
>>>>>>
>>>>>> -------- Original message --------
>>>>>> From: Hanh Nguyen
>>>>>> Date:04/10/2014 7:49 PM (GMT-08:00)
>>>>>> To: Racattack
>>>>>> Subject: [racattack] Re: Cloning my successful VMs Oracle12c Rac
>>>>>> installation from 1 PC to another?
>>>>>>
>>>>>> Thank you, Erik. That would be wonderful because I'd have a working
>>>>>> Oracle12C RAC to play with,:-). Thanks, again.
>>>>>> Hanh,.
>>>>>>
>>>>>>
>>>>>> On Thu, Apr 10, 2014 at 10:30 PM, Erik Benner <erik@xxxxxxxxx> wrote:
>>>>>>
>>>>>>> Yes, you should be able to export the VMs, then import them into
>>>>>>> your other machine.
>>>>>>>
>>>>>>> Erik
>>>>>>>
>>>>>>>
>>>>>>> Sent from my iPad Air
>>>>>>>
>>>>>>> > On Apr 10, 2014, at 7:28 PM, Hanh Nguyen <hanhnguyen2100@xxxxxxxxx> wrote:
>>>>>>> >
>>>>>>> > Hi everyone,
>>>>>>> > Using the RAC Attack instructions from the Wiki-book on the web,
>>>>>>> > I have successfully installed a 2-node Oracle RAC 12c, VM version
>>>>>>> > 4.3.10, on my 32-bit MS-Windows Vista desktop PC with 6GB. But I am
>>>>>>> > having great difficulty duplicating this success by installing the
>>>>>>> > same version of Oracle 12c RAC and VM on my higher-RAM Windows 7
>>>>>>> > 64-bit laptop, which is more portable and easy to carry around.
>>>>>>> > So I wonder, can I clone my successful VM installation from my
>>>>>>> > desktop PC to my laptop computer?
>>>>>>> > Hanh,
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>> --
>> A life yet to be lived...
>>
>
>

[root@collabn2 ~]# crsctl status resource -t|more
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details

--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.SHARED.advm
               ONLINE  ONLINE       collabn2                 Volume device /dev/asm/shared-45 is online,STABLE
ora.DATA.dg
               ONLINE  ONLINE       collabn2                 STABLE
ora.FRA.dg
               ONLINE  ONLINE       collabn2                 STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       collabn2                 STABLE
ora.asm
               ONLINE  ONLINE       collabn2                 Started,STABLE
ora.data.shared.acfs
               ONLINE  ONLINE       collabn2                 mounted on /shared,STABLE
ora.net1.network
               ONLINE  ONLINE       collabn2                 STABLE
ora.ons
               ONLINE  ONLINE       collabn2                 STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       collabn2                 STABLE
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       collabn2                 STABLE
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       collabn2                 STABLE
ora.collabn1.vip
      1        ONLINE  INTERMEDIATE collabn2                 FAILED OVER,STABLE
ora.collabn2.vip
      1        ONLINE  ONLINE       collabn2                 STABLE
ora.cvu
      1        ONLINE  ONLINE       collabn2                 STABLE
ora.oc4j
      1        OFFLINE OFFLINE                               STABLE
ora.rac.db
      1        ONLINE  OFFLINE                               STABLE
      2        ONLINE  ONLINE       collabn2                 Open,STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       collabn2                 STABLE
ora.scan2.vip
      1        ONLINE  ONLINE       collabn2                 STABLE
ora.scan3.vip
      1        ONLINE  ONLINE       collabn2                 STABLE
--------------------------------------------------------------------------------
[root@collabn2 ~]# crsctl start resource ora.crsd -n collabn1 -init
CRS-2546: Server 'collabn1' is not online
CRS-4000: Command Start failed, or completed with errors.
[root@collabn2 ~]# ^C
[root@collabn2 ~]# srvctl status nodeapps
VIP collabn1-vip.racattack is enabled
VIP collabn1-vip.racattack is running on node: collabn2
VIP collabn2-vip.racattack is enabled
VIP collabn2-vip.racattack is running on node: collabn2
Network is enabled
Network is not running on node: collabn1
Network is running on node: collabn2
ONS is enabled
ONS daemon is not running on node: collabn1
ONS daemon is running on node: collabn2
[root@collabn2 ~]# ^C
[root@collabn2 ~]# crsctl start resource ora.rac.db -n collabn1 -init
CRS-2546: Server 'collabn1' is not online
CRS-4000: Command Start failed, or completed with errors.
[root@collabn2 ~]#
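
As a hedged illustration, not part of the original attachment: a minimal sketch for picking out of a `crsctl status resource -t` listing the resources whose Target is ONLINE but whose State is something else. The function name `degraded_resources` is hypothetical, and the parsing assumes the column layout shown above (resource name on its own line, followed by rows with an optional instance number, then Target and State).

```shell
#!/bin/sh
# degraded_resources (hypothetical helper): read `crsctl status resource -t`
# output on stdin and print "<resource>: <state>" for each row whose Target is
# ONLINE while its State is not ONLINE.
# Assumption: the tabular layout matches the listing above.
degraded_resources() {
    awk '
    /^ora\./ { res = $1; next }               # remember the current resource name
    {
        i = ($1 ~ /^[0-9]+$/) ? 2 : 1         # skip the instance number, if any
        target = $i
        state = $(i + 1)
        if (target == "ONLINE" && state != "" && state != "ONLINE")
            print res ": " state
    }'
}

# Typical use on a live cluster node (commented out because it needs crsctl):
#   crsctl status resource -t | degraded_resources
```

Applied to the listing above, this would flag ora.collabn1.vip (INTERMEDIATE, failed over to collabn2) and instance 1 of ora.rac.db (OFFLINE). Resources whose Target is OFFLINE, such as ora.oc4j, are deliberately ignored.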

