[Linux-cluster] needs helps GFS2 on 5 nodes cluster
Cao, Vinh
vinh.cao at hp.com
Wed Jan 7 22:32:46 UTC 2015
Hi Digimer,
Yes, I just did. Looks like they are failing. I'm not sure why that is.
Please see the attachment for all servers log.
By the way, I appreciate all the help I can get.
Vinh
-----Original Message-----
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer
Sent: Wednesday, January 07, 2015 4:33 PM
To: linux clustering
Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster
Quorum is enabled by default. I need to see the entire logs from all five nodes, as I mentioned in the first email. Please disable cman from starting on boot, configure fencing properly and then reboot all nodes cleanly. Start the 'tail -f -n 0 /var/log/messages' on all five nodes, then in another window, start cman on all five nodes. When things settle down, copy/paste all the log output please.
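For reference, cman's default quorum requirement is a simple majority of the expected votes. A quick sketch of the arithmetic (my own illustration, not cluster code; the `two_node=1` special case is deliberately left out):

```python
def quorum_votes(expected_votes: int) -> int:
    """Votes needed for quorum: a simple majority, floor(n/2) + 1.

    Mirrors the default cman/corosync behaviour with one vote per node;
    the two_node=1 special case (quorum with a single vote) is not
    modelled here.
    """
    return expected_votes // 2 + 1

# A 5-node cluster with 1 vote per node needs 3 nodes up to be quorate,
# which is why a single node starting alone times out "Waiting for quorum".
print(quorum_votes(5))
```

So in a five-node cluster, at least three nodes must join cman at roughly the same time; starting one node by itself will always time out waiting for quorum.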
On 07/01/15 04:29 PM, Cao, Vinh wrote:
> Hi Digimer,
>
> Here is from the logs:
> [root at ustlvcmsp1954 ~]# tail -f /var/log/messages
> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync profile loading service
> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Using quorum provider quorum_cman
> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1
> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1
> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [CPG ] chosen downlist: sender r(0) ip(10.30.197.108) ; members(old:0 left:0)
> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Completed service synchronization, ready to provide service.
> Jan 7 16:14:01 ustlvcmsp1954 rgmanager[8099]: Waiting for quorum to form
> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Unloading all Corosync service engines.
> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync extended virtual synchrony service
> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync configuration service
> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01
> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster config database access v1.01
> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync profile loading service
> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: openais checkpoint service B.01.01
> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync CMAN membership service 2.90
> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster quorum service v0.1
> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:2055.
> Jan 7 16:15:06 ustlvcmsp1954 rgmanager[8099]: Quorum formed
>
> Then it dies at:
> Starting cman... [ OK ]
> Waiting for quorum... Timed-out waiting for cluster
> [FAILED]
>
> Yes, I did make the change with <fence_daemon post_join_delay="30"/>, but the problem is still there. One thing I don't understand is why the cluster is looking for quorum.
> I don't have any quorum disk set up in the cluster.conf file.
>
> Any help I can get is appreciated.
>
> Vinh
>
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer
> Sent: Wednesday, January 07, 2015 3:59 PM
> To: linux clustering
> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster
>
> On 07/01/15 03:39 PM, Cao, Vinh wrote:
>> Hello Digimer,
>>
>> Yes, I would agree with you that RHEL 6.4 is old. We patch monthly, but I'm not sure why these servers are still at 6.4. Most of our systems are 6.6.
>>
>> Here is my cluster config. All I want is to use the cluster to get GFS2 mounted via /etc/fstab.
>> [root at ustlvcmsp1955 ~]# cat /etc/cluster/cluster.conf
>> <?xml version="1.0"?>
>> <cluster config_version="15" name="p1954_to_p1958">
>> <clusternodes>
>> <clusternode name="ustlvcmsp1954" nodeid="1"/>
>> <clusternode name="ustlvcmsp1955" nodeid="2"/>
>> <clusternode name="ustlvcmsp1956" nodeid="3"/>
>> <clusternode name="ustlvcmsp1957" nodeid="4"/>
>> <clusternode name="ustlvcmsp1958" nodeid="5"/>
>> </clusternodes>
>
> You haven't configured fencing for the nodes... If anything causes a fence, the cluster will lock up (by design).
>
>> <fencedevices>
>> <fencedevice agent="fence_vmware_soap" ipaddr="10.30.197.108" login="rhfence" name="p1954" passwd="xxxxxxxx"/>
>> <fencedevice agent="fence_vmware_soap" ipaddr="10.30.197.109" login="rhfence" name="p1955" passwd=" xxxxxxxx "/>
>> <fencedevice agent="fence_vmware_soap" ipaddr="10.30.197.110" login="rhfence" name="p1956" passwd=" xxxxxxxx "/>
>> <fencedevice agent="fence_vmware_soap" ipaddr="10.30.197.111" login="rhfence" name="p1957" passwd=" xxxxxxxx "/>
>> <fencedevice agent="fence_vmware_soap" ipaddr="10.30.197.112" login="rhfence" name="p1958" passwd=" xxxxxxxx "/>
>> </fencedevices>
>> </cluster>
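For what it's worth, the missing piece Digimer points at is a per-node <fence> stanza tying each clusternode to one of the fencedevices defined above. A hedged sketch for the first node only; the port value (the VM's name as vCenter knows it) is an assumption, not taken from this config:

```xml
<clusternode name="ustlvcmsp1954" nodeid="1">
  <fence>
    <method name="vmware">
      <!-- "p1954" references the fencedevice defined in <fencedevices>;
           port is assumed to be the VM name registered in vCenter -->
      <device name="p1954" port="ustlvcmsp1954" ssl="on"/>
    </method>
  </fence>
</clusternode>
```

The agent itself can be sanity-checked from the command line before wiring it into cluster.conf, e.g. `fence_vmware_soap -a 10.30.197.108 -l rhfence -p <password> -z -o list` to confirm the credentials work and to see the exact VM names the agent expects.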
>>
>> clustat show:
>>
>> Cluster Status for p1954_to_p1958 @ Wed Jan 7 15:38:00 2015
>> Member Status: Quorate
>>
>> Member Name ID Status
>> ------ ---- ---- ------
>> ustlvcmsp1954 1 Offline
>> ustlvcmsp1955 2 Online, Local
>> ustlvcmsp1956 3 Online
>> ustlvcmsp1957 4 Offline
>> ustlvcmsp1958 5 Online
>>
>> I need to get them all online so I can use fencing and mount the shared disk.
>>
>> Thanks,
>> Vinh
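On the /etc/fstab goal mentioned above, a GFS2 entry typically looks like the following; the device path and mount point are placeholders, not taken from this cluster:

```
# hypothetical fstab entry; _netdev delays the mount until networking
# (and therefore the cluster stack) is available at boot
/dev/clustervg/gfs2lv  /mnt/shared  gfs2  defaults,noatime,_netdev  0 0
```

Note that a GFS2 mount still depends on cman, the DLM, and working fencing being up first, which is why the cluster has to be healthy before the fstab entry can succeed.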
>
> What about the log entries from the start-up? Did you try the post_join_delay config?
>
>
>> -----Original Message-----
>> From: linux-cluster-bounces at redhat.com
>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer
>> Sent: Wednesday, January 07, 2015 3:16 PM
>> To: linux clustering
>> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster
>>
>> My first though would be to set <fence_daemon post_join_delay="30" /> in cluster.conf.
>>
>> If that doesn't work, please share your configuration file. Then, with all nodes offline, open a terminal to each node and run 'tail -f -n 0 /var/log/messages'. With that running, start all the nodes and wait for things to settle down, then paste the five nodes' output as well.
>>
>> Also, 6.4 is pretty old, why not upgrade to 6.6?
>>
>> digimer
>>
>> On 07/01/15 03:10 PM, Cao, Vinh wrote:
>>> Hello Cluster guru,
>>>
>>> I'm trying to set up a Red Hat 6.4 cluster with 5 nodes. With two
>>> nodes I don't have any issue.
>>>
>>> But with 5 nodes, when I run clustat I get 3 nodes online and the
>>> other two offline.
>>>
>>> When I start one of the offline nodes with 'service cman start', I get:
>>>
>>> [root at ustlvcmspxxx ~]# service cman status
>>>
>>> corosync is stopped
>>>
>>> [root at ustlvcmsp1954 ~]# service cman start
>>>
>>> Starting cluster:
>>>
>>> Checking if cluster has been disabled at boot... [ OK ]
>>>
>>> Checking Network Manager... [ OK ]
>>>
>>> Global setup... [ OK ]
>>>
>>> Loading kernel modules... [ OK ]
>>>
>>> Mounting configfs... [ OK ]
>>>
>>> Starting cman... [ OK ]
>>>
>>> Waiting for quorum... Timed-out waiting for cluster
>>>
>>>
>>> [FAILED]
>>>
>>> Stopping cluster:
>>>
>>> Leaving fence domain... [ OK ]
>>>
>>> Stopping gfs_controld... [ OK ]
>>>
>>> Stopping dlm_controld... [ OK ]
>>>
>>> Stopping fenced... [ OK ]
>>>
>>> Stopping cman... [ OK ]
>>>
>>> Waiting for corosync to shutdown: [ OK ]
>>>
>>> Unloading kernel modules... [ OK ]
>>>
>>> Unmounting configfs... [ OK ]
>>>
>>> Can you help?
>>>
>>> Thank you,
>>>
>>> Vinh
>>>
>>>
>>>
>>
>>
>> --
>> Digimer
>> Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education?
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>
>
--
Digimer
Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education?
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 5_nodes_cluster_fails.txt
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20150107/0edb510b/attachment.txt>