From mgrac at redhat.com Wed Jan 7 14:13:41 2015
From: mgrac at redhat.com (Marek "marx" Grac)
Date: Wed, 07 Jan 2015 15:13:41 +0100
Subject: [Linux-cluster] fence-agents-4.0.14 stable release
Message-ID: <54AD3F15.9050209@redhat.com>

Welcome to the fence-agents 4.0.14 release.

This release includes some new features and several bugfixes:

* fence_zvmip for IBM z/VM was rewritten in Python
* new fence agent for Emerson devices
* fix invalid default ports for fence_eps and fence_amt
* properly escape XML in other fields of metadata
* a lot of refactoring and cleanup

The new source tarball can be downloaded here:

   https://fedorahosted.org/releases/f/e/fence-agents/fence-agents-4.0.14.tar.xz

To report bugs or issues:

   https://bugzilla.redhat.com/

Would you like to meet the cluster team or members of its community? Join us on IRC (irc.freenode.net #linux-cluster) and share your experience with other system administrators or power users.

Thanks and congratulations to everyone who contributed to this milestone.

m,

From vinh.cao at hp.com Wed Jan 7 20:10:33 2015
From: vinh.cao at hp.com (Cao, Vinh)
Date: Wed, 7 Jan 2015 20:10:33 +0000
Subject: [Linux-cluster] needs helps GFS2 on 5 nodes cluster
Message-ID:

Hello Cluster guru,

I'm trying to set up a Red Hat 6.4 cluster with 5 nodes. With two nodes I don't have any issue.

But with 5 nodes, when I run clustat I get 3 nodes online and the other two offline.

When I start one of the offline nodes with 'service cman start', I get:

[root at ustlvcmspxxx ~]# service cman status
corosync is stopped
[root at ustlvcmsp1954 ~]# service cman start
Starting cluster:
   Checking if cluster has been disabled at boot...      [  OK  ]
   Checking Network Manager...                           [  OK  ]
   Global setup...                                       [  OK  ]
   Loading kernel modules...                             [  OK  ]
   Mounting configfs...                                  [  OK  ]
   Starting cman...                                      [  OK  ]
   Waiting for quorum... Timed-out waiting for cluster   [FAILED]
Stopping cluster:
   Leaving fence domain...                               [  OK  ]
   Stopping gfs_controld...                              [  OK  ]
   Stopping dlm_controld...                              [  OK  ]
   Stopping fenced...                                    [  OK  ]
   Stopping cman...                                      [  OK  ]
   Waiting for corosync to shutdown:                     [  OK  ]
   Unloading kernel modules...                           [  OK  ]
   Unmounting configfs...                                [  OK  ]

Can you help?

Thank you,
Vinh

From lists at alteeve.ca Wed Jan 7 20:16:28 2015
From: lists at alteeve.ca (Digimer)
Date: Wed, 07 Jan 2015 15:16:28 -0500
Subject: [Linux-cluster] needs helps GFS2 on 5 nodes cluster
In-Reply-To:
References:
Message-ID: <54AD941C.4070205@alteeve.ca>

My first thought would be to set post_join_delay in cluster.conf.

If that doesn't work, please share your configuration file. Then, with all nodes offline, open a terminal to each node and run 'tail -f -n 0 /var/log/messages'. With that running, start all the nodes and wait for things to settle down, then paste the five nodes' output as well.

Also, 6.4 is pretty old, why not upgrade to 6.6?

digimer

On 07/01/15 03:10 PM, Cao, Vinh wrote:
> Hello Cluster guru,
>
> I'm trying to set up a Red Hat 6.4 cluster with 5 nodes. With two nodes I
> don't have any issue.
>
> But with 5 nodes, when I run clustat I get 3 nodes online and the other
> two offline.
>
> When I start one of the offline nodes with 'service cman start', I get:
>
> [root at ustlvcmspxxx ~]# service cman status
> corosync is stopped
> [root at ustlvcmsp1954 ~]# service cman start
> Starting cluster:
>    Checking if cluster has been disabled at boot...      [  OK  ]
>    Checking Network Manager...                           [  OK  ]
>    Global setup...                                       [  OK  ]
>    Loading kernel modules...                             [  OK  ]
>    Mounting configfs...                                  [  OK  ]
>    Starting cman...                                      [  OK  ]
>    Waiting for quorum... Timed-out waiting for cluster   [FAILED]
> Stopping cluster:
>    Leaving fence domain...                               [  OK  ]
>    Stopping gfs_controld...                              [  OK  ]
>    Stopping dlm_controld...                              [  OK  ]
>    Stopping fenced...                                    [  OK  ]
>    Stopping cman...                                      [  OK  ]
>    Waiting for corosync to shutdown:                     [  OK  ]
>    Unloading kernel modules...                           [  OK  ]
>    Unmounting configfs...                                [  OK  ]
>
> Can you help?
>
> Thank you,
> Vinh

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?

From vinh.cao at hp.com Wed Jan 7 20:39:22 2015
From: vinh.cao at hp.com (Cao, Vinh)
Date: Wed, 7 Jan 2015 20:39:22 +0000
Subject: [Linux-cluster] needs helps GFS2 on 5 nodes cluster
In-Reply-To: <54AD941C.4070205@alteeve.ca>
References: <54AD941C.4070205@alteeve.ca>
Message-ID:

Hello Digimer,

Yes, I would agree with you that RHEL 6.4 is old. We patch monthly, but I'm not sure why these servers are still at 6.4. Most of our systems are 6.6.

Here is my cluster config. All I want is for the cluster to mount GFS2 via /etc/fstab.

root at ustlvcmsp1955 ~]# cat /etc/cluster/cluster.conf

clustat shows:

Cluster Status for p1954_to_p1958 @ Wed Jan  7 15:38:00 2015
Member Status: Quorate

 Member Name        ID   Status
 ------ ----        ---- ------
 ustlvcmsp1954      1    Offline
 ustlvcmsp1955      2    Online, Local
 ustlvcmsp1956      3    Online
 ustlvcmsp1957      4    Offline
 ustlvcmsp1958      5    Online

I need to make them all online, so I can use fencing for mounting the shared disk.

Thanks,
Vinh

-----Original Message-----
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer
Sent: Wednesday, January 07, 2015 3:16 PM
To: linux clustering
Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster

My first thought would be to set post_join_delay in cluster.conf.

If that doesn't work, please share your configuration file. Then, with all nodes offline, open a terminal to each node and run 'tail -f -n 0 /var/log/messages'. With that running, start all the nodes and wait for things to settle down, then paste the five nodes' output as well.

Also, 6.4 is pretty old, why not upgrade to 6.6?

digimer

On 07/01/15 03:10 PM, Cao, Vinh wrote:
> Hello Cluster guru,
>
> I'm trying to set up a Red Hat 6.4 cluster with 5 nodes. With two nodes
> I don't have any issue.
>
> But with 5 nodes, when I ran clustat I got 3 nodes online and the
> other two offline.
>
> When I start one of the offline nodes with 'service cman start', I get:
>
> [root at ustlvcmspxxx ~]# service cman status
> corosync is stopped
> [root at ustlvcmsp1954 ~]# service cman start
> Starting cluster:
>    Checking if cluster has been disabled at boot...      [  OK  ]
>    Checking Network Manager...                           [  OK  ]
>    Global setup...                                       [  OK  ]
>    Loading kernel modules...                             [  OK  ]
>    Mounting configfs...                                  [  OK  ]
>    Starting cman...                                      [  OK  ]
>    Waiting for quorum... Timed-out waiting for cluster   [FAILED]
> Stopping cluster:
>    Leaving fence domain...                               [  OK  ]
>    Stopping gfs_controld...                              [  OK  ]
>    Stopping dlm_controld...                              [  OK  ]
>    Stopping fenced...                                    [  OK  ]
>    Stopping cman...                                      [  OK  ]
>    Waiting for corosync to shutdown:                     [  OK  ]
>    Unloading kernel modules...                           [  OK  ]
>    Unmounting configfs...                                [  OK  ]
>
> Can you help?
>
> Thank you,
> Vinh

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
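For reference, the post_join_delay setting mentioned above lives on the fence_daemon line of cluster.conf. A minimal sketch, assuming a 30-second delay (the actual value is not given in the thread), with config_version bumped so the change is picked up:

   <cluster name="p1954_to_p1958" config_version="2">
     <!-- give joining nodes extra time before fenced gives up on them -->
     <fence_daemon post_join_delay="30"/>
     ...
   </cluster>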
-- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From lists at alteeve.ca Wed Jan 7 20:58:45 2015 From: lists at alteeve.ca (Digimer) Date: Wed, 07 Jan 2015 15:58:45 -0500 Subject: [Linux-cluster] needs helps GFS2 on 5 nodes cluster In-Reply-To: References: <54AD941C.4070205@alteeve.ca> Message-ID: <54AD9E05.2030902@alteeve.ca> On 07/01/15 03:39 PM, Cao, Vinh wrote: > Hello Digimer, > > Yes, I would agrre with you RHEL6.4 is old. We patched monthly, but I'm not sure why these servers are still at 6.4. Most of our system are 6.6. > > Here is my cluster config. All I want is using cluster to have BGFS2 mount via /etc/fstab. > root at ustlvcmsp1955 ~]# cat /etc/cluster/cluster.conf > > > > > > > > > You don't configure the fencing for the nodes... If anything causes a fence, the cluster will lock up (by design). > > > > > > > > > > clustat show: > > Cluster Status for p1954_to_p1958 @ Wed Jan 7 15:38:00 2015 > Member Status: Quorate > > Member Name ID Status > ------ ---- ---- ------ > ustlvcmsp1954 1 Offline > ustlvcmsp1955 2 Online, Local > ustlvcmsp1956 3 Online > ustlvcmsp1957 4 Offline > ustlvcmsp1958 5 Online > > I need to make them all online, so I can use fencing for mounting shared disk. > > Thanks, > Vinh What about the log entries from the start-up? Did you try the post_join_delay config? > -----Original Message----- > From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer > Sent: Wednesday, January 07, 2015 3:16 PM > To: linux clustering > Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster > > My first though would be to set in cluster.conf. > > If that doesn't work, please share your configuration file. Then, with all nodes offline, open a terminal to each node and run 'tail -f -n 0 /var/log/messages'. With that running, start all the nodes and wait for things to settle down, then paste the five nodes' output as well. > > Also, 6.4 is pretty old, why not upgrade to 6.6? > > digimer > > On 07/01/15 03:10 PM, Cao, Vinh wrote: >> Hello Cluster guru, >> >> I'm trying to setup Redhat 6.4 OS cluster with 5 nodes. With two nodes >> I don't have any issue. >> >> But with 5 nodes, when I ran clustat I got 3 nodes online and the >> other two off line. >> >> When I start the one that are off line. Service cman start. I got: >> >> [root at ustlvcmspxxx ~]# service cman status >> >> corosync is stopped >> >> [root at ustlvcmsp1954 ~]# service cman start >> >> Starting cluster: >> >> Checking if cluster has been disabled at boot... [ OK ] >> >> Checking Network Manager... [ OK ] >> >> Global setup... [ OK ] >> >> Loading kernel modules... [ OK ] >> >> Mounting configfs... [ OK ] >> >> Starting cman... [ OK ] >> >> Waiting for quorum... Timed-out waiting for cluster >> >> [FAILED] >> >> Stopping cluster: >> >> Leaving fence domain... [ OK ] >> >> Stopping gfs_controld... [ OK ] >> >> Stopping dlm_controld... [ OK ] >> >> Stopping fenced... [ OK ] >> >> Stopping cman... [ OK ] >> >> Waiting for corosync to shutdown: [ OK ] >> >> Unloading kernel modules... [ OK ] >> >> Unmounting configfs... [ OK ] >> >> Can you help? >> >> Thank you, >> >> Vinh >> >> >> > > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? 
> > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From vinh.cao at hp.com Wed Jan 7 21:29:14 2015 From: vinh.cao at hp.com (Cao, Vinh) Date: Wed, 7 Jan 2015 21:29:14 +0000 Subject: [Linux-cluster] needs helps GFS2 on 5 nodes cluster In-Reply-To: <54AD9E05.2030902@alteeve.ca> References: <54AD941C.4070205@alteeve.ca> <54AD9E05.2030902@alteeve.ca> Message-ID: Hi Digimer, Here is from the logs: [root at ustlvcmsp1954 ~]# tail -f /var/log/messages Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync profile loading service Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Using quorum provider quorum_cman Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine. Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [TOTEM ] A processor joined or left the membership and a new membership was formed. Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1 Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1 Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [CPG ] chosen downlist: sender r(0) ip(10.30.197.108) ; members(old:0 left:0) Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Completed service synchronization, ready to provide service. Jan 7 16:14:01 ustlvcmsp1954 rgmanager[8099]: Waiting for quorum to form Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Unloading all Corosync service engines. Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync extended virtual synchrony service Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync configuration service Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01 Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster config database access v1.01 Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync profile loading service Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: openais checkpoint service B.01.01 Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync CMAN membership service 2.90 Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster quorum service v0.1 Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:2055. Jan 7 16:15:06 ustlvcmsp1954 rgmanager[8099]: Quorum formed Then it die at: Starting cman... [ OK ] Waiting for quorum... Timed-out waiting for cluster [FAILED] Yes, I did made changes with: the problem is still there. One thing I don't know why cluster is looking for quorum? I did have any disk quorum setup in cluster.conf file. Any helps can I get appreciated. 
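A quick way to see exactly what cman is waiting for is to query the vote counts on a node where corosync is still running, for example (field names vary slightly between versions):

   cman_tool status | grep -i -e quorum -e votes
   cman_tool nodes

With five one-vote nodes, quorum is 3 votes, so a node started on its own will sit at "Waiting for quorum..." until at least two more members join.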
Vinh -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer Sent: Wednesday, January 07, 2015 3:59 PM To: linux clustering Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster On 07/01/15 03:39 PM, Cao, Vinh wrote: > Hello Digimer, > > Yes, I would agrre with you RHEL6.4 is old. We patched monthly, but I'm not sure why these servers are still at 6.4. Most of our system are 6.6. > > Here is my cluster config. All I want is using cluster to have BGFS2 mount via /etc/fstab. > root at ustlvcmsp1955 ~]# cat /etc/cluster/cluster.conf version="1.0"?> > > > > > > > You don't configure the fencing for the nodes... If anything causes a fence, the cluster will lock up (by design). > > > > > > > > > > clustat show: > > Cluster Status for p1954_to_p1958 @ Wed Jan 7 15:38:00 2015 Member > Status: Quorate > > Member Name ID Status > ------ ---- ---- ------ > ustlvcmsp1954 1 Offline > ustlvcmsp1955 2 Online, Local > ustlvcmsp1956 3 Online > ustlvcmsp1957 4 Offline > ustlvcmsp1958 5 Online > > I need to make them all online, so I can use fencing for mounting shared disk. > > Thanks, > Vinh What about the log entries from the start-up? Did you try the post_join_delay config? > -----Original Message----- > From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer > Sent: Wednesday, January 07, 2015 3:16 PM > To: linux clustering > Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster > > My first though would be to set in cluster.conf. > > If that doesn't work, please share your configuration file. Then, with all nodes offline, open a terminal to each node and run 'tail -f -n 0 /var/log/messages'. With that running, start all the nodes and wait for things to settle down, then paste the five nodes' output as well. > > Also, 6.4 is pretty old, why not upgrade to 6.6? > > digimer > > On 07/01/15 03:10 PM, Cao, Vinh wrote: >> Hello Cluster guru, >> >> I'm trying to setup Redhat 6.4 OS cluster with 5 nodes. With two nodes >> I don't have any issue. >> >> But with 5 nodes, when I ran clustat I got 3 nodes online and the >> other two off line. >> >> When I start the one that are off line. Service cman start. I got: >> >> [root at ustlvcmspxxx ~]# service cman status >> >> corosync is stopped >> >> [root at ustlvcmsp1954 ~]# service cman start >> >> Starting cluster: >> >> Checking if cluster has been disabled at boot... [ OK ] >> >> Checking Network Manager... [ OK ] >> >> Global setup... [ OK ] >> >> Loading kernel modules... [ OK ] >> >> Mounting configfs... [ OK ] >> >> Starting cman... [ OK ] >> >> Waiting for quorum... Timed-out waiting for cluster >> >> [FAILED] >> >> Stopping cluster: >> >> Leaving fence domain... [ OK ] >> >> Stopping gfs_controld... [ OK ] >> >> Stopping dlm_controld... [ OK ] >> >> Stopping fenced... [ OK ] >> >> Stopping cman... [ OK ] >> >> Waiting for corosync to shutdown: [ OK ] >> >> Unloading kernel modules... [ OK ] >> >> Unmounting configfs... [ OK ] >> >> Can you help? >> >> Thank you, >> >> Vinh >> >> >> > > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? 
-- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From lists at alteeve.ca Wed Jan 7 21:33:16 2015 From: lists at alteeve.ca (Digimer) Date: Wed, 07 Jan 2015 16:33:16 -0500 Subject: [Linux-cluster] needs helps GFS2 on 5 nodes cluster In-Reply-To: References: <54AD941C.4070205@alteeve.ca> <54AD9E05.2030902@alteeve.ca> Message-ID: <54ADA61C.2020509@alteeve.ca> Quorum is enabled by default. I need to see the entire logs from all five nodes, as I mentioned in the first email. Please disable cman from starting on boot, configure fencing properly and then reboot all nodes cleanly. Start the 'tail -f -n 0 /var/log/messages' on all five nodes, then in another window, start cman on all five nodes. When things settle down, copy/paste all the log output please. On 07/01/15 04:29 PM, Cao, Vinh wrote: > Hi Digimer, > > Here is from the logs: > [root at ustlvcmsp1954 ~]# tail -f /var/log/messages > Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync profile loading service > Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Using quorum provider quorum_cman > Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 > Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine. > Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [TOTEM ] A processor joined or left the membership and a new membership was formed. > Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1 > Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1 > Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [CPG ] chosen downlist: sender r(0) ip(10.30.197.108) ; members(old:0 left:0) > Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Completed service synchronization, ready to provide service. > Jan 7 16:14:01 ustlvcmsp1954 rgmanager[8099]: Waiting for quorum to form > Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Unloading all Corosync service engines. > Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync extended virtual synchrony service > Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync configuration service > Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01 > Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster config database access v1.01 > Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync profile loading service > Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: openais checkpoint service B.01.01 > Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync CMAN membership service 2.90 > Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster quorum service v0.1 > Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:2055. > Jan 7 16:15:06 ustlvcmsp1954 rgmanager[8099]: Quorum formed > > Then it die at: > Starting cman... [ OK ] > Waiting for quorum... Timed-out waiting for cluster > [FAILED] > > Yes, I did made changes with: the problem is still there. One thing I don't know why cluster is looking for quorum? > I did have any disk quorum setup in cluster.conf file. > > Any helps can I get appreciated. 
> > Vinh > > -----Original Message----- > From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer > Sent: Wednesday, January 07, 2015 3:59 PM > To: linux clustering > Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster > > On 07/01/15 03:39 PM, Cao, Vinh wrote: >> Hello Digimer, >> >> Yes, I would agrre with you RHEL6.4 is old. We patched monthly, but I'm not sure why these servers are still at 6.4. Most of our system are 6.6. >> >> Here is my cluster config. All I want is using cluster to have BGFS2 mount via /etc/fstab. >> root at ustlvcmsp1955 ~]# cat /etc/cluster/cluster.conf > version="1.0"?> >> >> >> >> >> >> >> > > You don't configure the fencing for the nodes... If anything causes a fence, the cluster will lock up (by design). > >> >> >> >> >> >> >> >> >> >> clustat show: >> >> Cluster Status for p1954_to_p1958 @ Wed Jan 7 15:38:00 2015 Member >> Status: Quorate >> >> Member Name ID Status >> ------ ---- ---- ------ >> ustlvcmsp1954 1 Offline >> ustlvcmsp1955 2 Online, Local >> ustlvcmsp1956 3 Online >> ustlvcmsp1957 4 Offline >> ustlvcmsp1958 5 Online >> >> I need to make them all online, so I can use fencing for mounting shared disk. >> >> Thanks, >> Vinh > > What about the log entries from the start-up? Did you try the post_join_delay config? > > >> -----Original Message----- >> From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer >> Sent: Wednesday, January 07, 2015 3:16 PM >> To: linux clustering >> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster >> >> My first though would be to set in cluster.conf. >> >> If that doesn't work, please share your configuration file. Then, with all nodes offline, open a terminal to each node and run 'tail -f -n 0 /var/log/messages'. With that running, start all the nodes and wait for things to settle down, then paste the five nodes' output as well. >> >> Also, 6.4 is pretty old, why not upgrade to 6.6? >> >> digimer >> >> On 07/01/15 03:10 PM, Cao, Vinh wrote: >>> Hello Cluster guru, >>> >>> I'm trying to setup Redhat 6.4 OS cluster with 5 nodes. With two nodes >>> I don't have any issue. >>> >>> But with 5 nodes, when I ran clustat I got 3 nodes online and the >>> other two off line. >>> >>> When I start the one that are off line. Service cman start. I got: >>> >>> [root at ustlvcmspxxx ~]# service cman status >>> >>> corosync is stopped >>> >>> [root at ustlvcmsp1954 ~]# service cman start >>> >>> Starting cluster: >>> >>> Checking if cluster has been disabled at boot... [ OK ] >>> >>> Checking Network Manager... [ OK ] >>> >>> Global setup... [ OK ] >>> >>> Loading kernel modules... [ OK ] >>> >>> Mounting configfs... [ OK ] >>> >>> Starting cman... [ OK ] >>> >>> Waiting for quorum... Timed-out waiting for cluster >>> >>> [FAILED] >>> >>> Stopping cluster: >>> >>> Leaving fence domain... [ OK ] >>> >>> Stopping gfs_controld... [ OK ] >>> >>> Stopping dlm_controld... [ OK ] >>> >>> Stopping fenced... [ OK ] >>> >>> Stopping cman... [ OK ] >>> >>> Waiting for corosync to shutdown: [ OK ] >>> >>> Unloading kernel modules... [ OK ] >>> >>> Unmounting configfs... [ OK ] >>> >>> Can you help? >>> >>> Thank you, >>> >>> Vinh >>> >>> >>> >> >> >> -- >> Digimer >> Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? 
>> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> > > -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From vinh.cao at hp.com Wed Jan 7 22:32:46 2015 From: vinh.cao at hp.com (Cao, Vinh) Date: Wed, 7 Jan 2015 22:32:46 +0000 Subject: [Linux-cluster] needs helps GFS2 on 5 nodes cluster In-Reply-To: <54ADA61C.2020509@alteeve.ca> References: <54AD941C.4070205@alteeve.ca> <54AD9E05.2030902@alteeve.ca> <54ADA61C.2020509@alteeve.ca> Message-ID: Hi Digimer, Yes, I just did. Looks like they are failing. I'm not sure why that is. Please see the attachment for all servers log. By the way, I do appreciated all the helps I can get. Vinh -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer Sent: Wednesday, January 07, 2015 4:33 PM To: linux clustering Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster Quorum is enabled by default. I need to see the entire logs from all five nodes, as I mentioned in the first email. Please disable cman from starting on boot, configure fencing properly and then reboot all nodes cleanly. Start the 'tail -f -n 0 /var/log/messages' on all five nodes, then in another window, start cman on all five nodes. When things settle down, copy/paste all the log output please. On 07/01/15 04:29 PM, Cao, Vinh wrote: > Hi Digimer, > > Here is from the logs: > [root at ustlvcmsp1954 ~]# tail -f /var/log/messages > Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync profile loading service > Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Using quorum provider quorum_cman > Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 > Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine. > Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [TOTEM ] A processor joined or left the membership and a new membership was formed. > Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1 > Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1 > Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [CPG ] chosen downlist: sender r(0) ip(10.30.197.108) ; members(old:0 left:0) > Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Completed service synchronization, ready to provide service. > Jan 7 16:14:01 ustlvcmsp1954 rgmanager[8099]: Waiting for quorum to form > Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Unloading all Corosync service engines. 
> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync extended virtual synchrony service > Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync configuration service > Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01 > Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster config database access v1.01 > Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync profile loading service > Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: openais checkpoint service B.01.01 > Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync CMAN membership service 2.90 > Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster quorum service v0.1 > Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:2055. > Jan 7 16:15:06 ustlvcmsp1954 rgmanager[8099]: Quorum formed > > Then it die at: > Starting cman... [ OK ] > Waiting for quorum... Timed-out waiting for cluster > [FAILED] > > Yes, I did made changes with: the problem is still there. One thing I don't know why cluster is looking for quorum? > I did have any disk quorum setup in cluster.conf file. > > Any helps can I get appreciated. > > Vinh > > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer > Sent: Wednesday, January 07, 2015 3:59 PM > To: linux clustering > Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster > > On 07/01/15 03:39 PM, Cao, Vinh wrote: >> Hello Digimer, >> >> Yes, I would agrre with you RHEL6.4 is old. We patched monthly, but I'm not sure why these servers are still at 6.4. Most of our system are 6.6. >> >> Here is my cluster config. All I want is using cluster to have BGFS2 mount via /etc/fstab. >> root at ustlvcmsp1955 ~]# cat /etc/cluster/cluster.conf > version="1.0"?> >> >> >> >> >> >> >> > > You don't configure the fencing for the nodes... If anything causes a fence, the cluster will lock up (by design). > >> >> >> >> >> >> >> >> >> >> clustat show: >> >> Cluster Status for p1954_to_p1958 @ Wed Jan 7 15:38:00 2015 Member >> Status: Quorate >> >> Member Name ID Status >> ------ ---- ---- ------ >> ustlvcmsp1954 1 Offline >> ustlvcmsp1955 2 Online, Local >> ustlvcmsp1956 3 Online >> ustlvcmsp1957 4 Offline >> ustlvcmsp1958 5 Online >> >> I need to make them all online, so I can use fencing for mounting shared disk. >> >> Thanks, >> Vinh > > What about the log entries from the start-up? Did you try the post_join_delay config? > > >> -----Original Message----- >> From: linux-cluster-bounces at redhat.com >> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer >> Sent: Wednesday, January 07, 2015 3:16 PM >> To: linux clustering >> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster >> >> My first though would be to set in cluster.conf. >> >> If that doesn't work, please share your configuration file. Then, with all nodes offline, open a terminal to each node and run 'tail -f -n 0 /var/log/messages'. With that running, start all the nodes and wait for things to settle down, then paste the five nodes' output as well. >> >> Also, 6.4 is pretty old, why not upgrade to 6.6? 
>> >> digimer >> >> On 07/01/15 03:10 PM, Cao, Vinh wrote: >>> Hello Cluster guru, >>> >>> I'm trying to setup Redhat 6.4 OS cluster with 5 nodes. With two >>> nodes I don't have any issue. >>> >>> But with 5 nodes, when I ran clustat I got 3 nodes online and the >>> other two off line. >>> >>> When I start the one that are off line. Service cman start. I got: >>> >>> [root at ustlvcmspxxx ~]# service cman status >>> >>> corosync is stopped >>> >>> [root at ustlvcmsp1954 ~]# service cman start >>> >>> Starting cluster: >>> >>> Checking if cluster has been disabled at boot... [ OK ] >>> >>> Checking Network Manager... [ OK ] >>> >>> Global setup... [ OK ] >>> >>> Loading kernel modules... [ OK ] >>> >>> Mounting configfs... [ OK ] >>> >>> Starting cman... [ OK ] >>> >>> Waiting for quorum... Timed-out waiting for cluster >>> >>> >>> [FAILED] >>> >>> Stopping cluster: >>> >>> Leaving fence domain... [ OK ] >>> >>> Stopping gfs_controld... [ OK ] >>> >>> Stopping dlm_controld... [ OK ] >>> >>> Stopping fenced... [ OK ] >>> >>> Stopping cman... [ OK ] >>> >>> Waiting for corosync to shutdown: [ OK ] >>> >>> Unloading kernel modules... [ OK ] >>> >>> Unmounting configfs... [ OK ] >>> >>> Can you help? >>> >>> Thank you, >>> >>> Vinh >>> >>> >>> >> >> >> -- >> Digimer >> Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> > > -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: 5_nodes_cluster_fails.txt URL: From lists at alteeve.ca Wed Jan 7 22:49:14 2015 From: lists at alteeve.ca (Digimer) Date: Wed, 07 Jan 2015 17:49:14 -0500 Subject: [Linux-cluster] needs helps GFS2 on 5 nodes cluster In-Reply-To: References: <54AD941C.4070205@alteeve.ca> <54AD9E05.2030902@alteeve.ca> <54ADA61C.2020509@alteeve.ca> Message-ID: <54ADB7EA.8000300@alteeve.ca> Did you configure fencing properly? On 07/01/15 05:32 PM, Cao, Vinh wrote: > Hi Digimer, > > Yes, I just did. Looks like they are failing. I'm not sure why that is. > Please see the attachment for all servers log. > > By the way, I do appreciated all the helps I can get. > > Vinh > > -----Original Message----- > From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer > Sent: Wednesday, January 07, 2015 4:33 PM > To: linux clustering > Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster > > Quorum is enabled by default. I need to see the entire logs from all five nodes, as I mentioned in the first email. Please disable cman from starting on boot, configure fencing properly and then reboot all nodes cleanly. Start the 'tail -f -n 0 /var/log/messages' on all five nodes, then in another window, start cman on all five nodes. When things settle down, copy/paste all the log output please. 
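As a sketch of that procedure (hostnames taken from the clustat output earlier in the thread; assumes root ssh access from a workstation):

   # terminal 1, on each node: watch the logs while the cluster forms
   tail -f -n 0 /var/log/messages

   # terminal 2: start cman on all five nodes at roughly the same time
   for n in ustlvcmsp1954 ustlvcmsp1955 ustlvcmsp1956 ustlvcmsp1957 ustlvcmsp1958; do
       ssh root@"$n" 'service cman start' &
   done
   wait

Starting them together matters here: 'service cman start' blocks waiting for quorum, so nodes brought up one at a time can hit exactly the timeout shown above.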
> > On 07/01/15 04:29 PM, Cao, Vinh wrote: >> Hi Digimer, >> >> Here is from the logs: >> [root at ustlvcmsp1954 ~]# tail -f /var/log/messages >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync profile loading service >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Using quorum provider quorum_cman >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine. >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [TOTEM ] A processor joined or left the membership and a new membership was formed. >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1 >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1 >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [CPG ] chosen downlist: sender r(0) ip(10.30.197.108) ; members(old:0 left:0) >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Completed service synchronization, ready to provide service. >> Jan 7 16:14:01 ustlvcmsp1954 rgmanager[8099]: Waiting for quorum to form >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Unloading all Corosync service engines. >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync extended virtual synchrony service >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync configuration service >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01 >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster config database access v1.01 >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync profile loading service >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: openais checkpoint service B.01.01 >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync CMAN membership service 2.90 >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster quorum service v0.1 >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:2055. >> Jan 7 16:15:06 ustlvcmsp1954 rgmanager[8099]: Quorum formed >> >> Then it die at: >> Starting cman... [ OK ] >> Waiting for quorum... Timed-out waiting for cluster >> [FAILED] >> >> Yes, I did made changes with: the problem is still there. One thing I don't know why cluster is looking for quorum? >> I did have any disk quorum setup in cluster.conf file. >> >> Any helps can I get appreciated. >> >> Vinh >> >> -----Original Message----- >> From: linux-cluster-bounces at redhat.com >> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer >> Sent: Wednesday, January 07, 2015 3:59 PM >> To: linux clustering >> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster >> >> On 07/01/15 03:39 PM, Cao, Vinh wrote: >>> Hello Digimer, >>> >>> Yes, I would agrre with you RHEL6.4 is old. We patched monthly, but I'm not sure why these servers are still at 6.4. Most of our system are 6.6. >>> >>> Here is my cluster config. All I want is using cluster to have BGFS2 mount via /etc/fstab. >>> root at ustlvcmsp1955 ~]# cat /etc/cluster/cluster.conf >> version="1.0"?> >>> >>> >>> >>> >>> >>> >>> >> >> You don't configure the fencing for the nodes... 
If anything causes a fence, the cluster will lock up (by design). >> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> clustat show: >>> >>> Cluster Status for p1954_to_p1958 @ Wed Jan 7 15:38:00 2015 Member >>> Status: Quorate >>> >>> Member Name ID Status >>> ------ ---- ---- ------ >>> ustlvcmsp1954 1 Offline >>> ustlvcmsp1955 2 Online, Local >>> ustlvcmsp1956 3 Online >>> ustlvcmsp1957 4 Offline >>> ustlvcmsp1958 5 Online >>> >>> I need to make them all online, so I can use fencing for mounting shared disk. >>> >>> Thanks, >>> Vinh >> >> What about the log entries from the start-up? Did you try the post_join_delay config? >> >> >>> -----Original Message----- >>> From: linux-cluster-bounces at redhat.com >>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer >>> Sent: Wednesday, January 07, 2015 3:16 PM >>> To: linux clustering >>> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster >>> >>> My first though would be to set in cluster.conf. >>> >>> If that doesn't work, please share your configuration file. Then, with all nodes offline, open a terminal to each node and run 'tail -f -n 0 /var/log/messages'. With that running, start all the nodes and wait for things to settle down, then paste the five nodes' output as well. >>> >>> Also, 6.4 is pretty old, why not upgrade to 6.6? >>> >>> digimer >>> >>> On 07/01/15 03:10 PM, Cao, Vinh wrote: >>>> Hello Cluster guru, >>>> >>>> I'm trying to setup Redhat 6.4 OS cluster with 5 nodes. With two >>>> nodes I don't have any issue. >>>> >>>> But with 5 nodes, when I ran clustat I got 3 nodes online and the >>>> other two off line. >>>> >>>> When I start the one that are off line. Service cman start. I got: >>>> >>>> [root at ustlvcmspxxx ~]# service cman status >>>> >>>> corosync is stopped >>>> >>>> [root at ustlvcmsp1954 ~]# service cman start >>>> >>>> Starting cluster: >>>> >>>> Checking if cluster has been disabled at boot... [ OK ] >>>> >>>> Checking Network Manager... [ OK ] >>>> >>>> Global setup... [ OK ] >>>> >>>> Loading kernel modules... [ OK ] >>>> >>>> Mounting configfs... [ OK ] >>>> >>>> Starting cman... [ OK ] >>>> >>>> Waiting for quorum... Timed-out waiting for cluster >>>> >>>> >>>> [FAILED] >>>> >>>> Stopping cluster: >>>> >>>> Leaving fence domain... [ OK ] >>>> >>>> Stopping gfs_controld... [ OK ] >>>> >>>> Stopping dlm_controld... [ OK ] >>>> >>>> Stopping fenced... [ OK ] >>>> >>>> Stopping cman... [ OK ] >>>> >>>> Waiting for corosync to shutdown: [ OK ] >>>> >>>> Unloading kernel modules... [ OK ] >>>> >>>> Unmounting configfs... [ OK ] >>>> >>>> Can you help? >>>> >>>> Thank you, >>>> >>>> Vinh >>>> >>>> >>>> >>> >>> >>> -- >>> Digimer >>> Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? >>> >>> -- >>> Linux-cluster mailing list >>> Linux-cluster at redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-cluster >>> >> >> > > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? 
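For reference, per-node fencing in a RHEL 6 cluster.conf looks roughly like the sketch below. The agent, address and credentials are placeholders; fence_ipmilan is only one common choice, and nothing in the thread says what out-of-band management these hosts actually have:

   <clusternodes>
     <clusternode name="ustlvcmsp1954" nodeid="1">
       <fence>
         <method name="power">
           <device name="ipmi_p1954"/>
         </method>
       </fence>
     </clusternode>
     <!-- repeat for the other four nodes -->
   </clusternodes>
   <fencedevices>
     <fencedevice name="ipmi_p1954" agent="fence_ipmilan" ipaddr="placeholder-bmc-address" login="admin" passwd="secret"/>
   </fencedevices>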
From lists at alteeve.ca Wed Jan 7 22:50:33 2015 From: lists at alteeve.ca (Digimer) Date: Wed, 07 Jan 2015 17:50:33 -0500 Subject: [Linux-cluster] needs helps GFS2 on 5 nodes cluster In-Reply-To: References: <54AD941C.4070205@alteeve.ca> <54AD9E05.2030902@alteeve.ca> <54ADA61C.2020509@alteeve.ca> Message-ID: <54ADB839.9000902@alteeve.ca> It looks like a network problem... Does your (virtual) switch support multicast properly and have you opened up the proper ports in the firewall? On 07/01/15 05:32 PM, Cao, Vinh wrote: > Hi Digimer, > > Yes, I just did. Looks like they are failing. I'm not sure why that is. > Please see the attachment for all servers log. > > By the way, I do appreciated all the helps I can get. > > Vinh > > -----Original Message----- > From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer > Sent: Wednesday, January 07, 2015 4:33 PM > To: linux clustering > Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster > > Quorum is enabled by default. I need to see the entire logs from all five nodes, as I mentioned in the first email. Please disable cman from starting on boot, configure fencing properly and then reboot all nodes cleanly. Start the 'tail -f -n 0 /var/log/messages' on all five nodes, then in another window, start cman on all five nodes. When things settle down, copy/paste all the log output please. > > On 07/01/15 04:29 PM, Cao, Vinh wrote: >> Hi Digimer, >> >> Here is from the logs: >> [root at ustlvcmsp1954 ~]# tail -f /var/log/messages >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync profile loading service >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Using quorum provider quorum_cman >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine. >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [TOTEM ] A processor joined or left the membership and a new membership was formed. >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1 >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1 >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [CPG ] chosen downlist: sender r(0) ip(10.30.197.108) ; members(old:0 left:0) >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Completed service synchronization, ready to provide service. >> Jan 7 16:14:01 ustlvcmsp1954 rgmanager[8099]: Waiting for quorum to form >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Unloading all Corosync service engines. 
>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync extended virtual synchrony service >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync configuration service >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01 >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster config database access v1.01 >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync profile loading service >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: openais checkpoint service B.01.01 >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync CMAN membership service 2.90 >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster quorum service v0.1 >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:2055. >> Jan 7 16:15:06 ustlvcmsp1954 rgmanager[8099]: Quorum formed >> >> Then it die at: >> Starting cman... [ OK ] >> Waiting for quorum... Timed-out waiting for cluster >> [FAILED] >> >> Yes, I did made changes with: the problem is still there. One thing I don't know why cluster is looking for quorum? >> I did have any disk quorum setup in cluster.conf file. >> >> Any helps can I get appreciated. >> >> Vinh >> >> -----Original Message----- >> From: linux-cluster-bounces at redhat.com >> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer >> Sent: Wednesday, January 07, 2015 3:59 PM >> To: linux clustering >> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster >> >> On 07/01/15 03:39 PM, Cao, Vinh wrote: >>> Hello Digimer, >>> >>> Yes, I would agrre with you RHEL6.4 is old. We patched monthly, but I'm not sure why these servers are still at 6.4. Most of our system are 6.6. >>> >>> Here is my cluster config. All I want is using cluster to have BGFS2 mount via /etc/fstab. >>> root at ustlvcmsp1955 ~]# cat /etc/cluster/cluster.conf >> version="1.0"?> >>> >>> >>> >>> >>> >>> >>> >> >> You don't configure the fencing for the nodes... If anything causes a fence, the cluster will lock up (by design). >> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> clustat show: >>> >>> Cluster Status for p1954_to_p1958 @ Wed Jan 7 15:38:00 2015 Member >>> Status: Quorate >>> >>> Member Name ID Status >>> ------ ---- ---- ------ >>> ustlvcmsp1954 1 Offline >>> ustlvcmsp1955 2 Online, Local >>> ustlvcmsp1956 3 Online >>> ustlvcmsp1957 4 Offline >>> ustlvcmsp1958 5 Online >>> >>> I need to make them all online, so I can use fencing for mounting shared disk. >>> >>> Thanks, >>> Vinh >> >> What about the log entries from the start-up? Did you try the post_join_delay config? >> >> >>> -----Original Message----- >>> From: linux-cluster-bounces at redhat.com >>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer >>> Sent: Wednesday, January 07, 2015 3:16 PM >>> To: linux clustering >>> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster >>> >>> My first though would be to set in cluster.conf. >>> >>> If that doesn't work, please share your configuration file. Then, with all nodes offline, open a terminal to each node and run 'tail -f -n 0 /var/log/messages'. With that running, start all the nodes and wait for things to settle down, then paste the five nodes' output as well. 
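On the firewall side of that question, the ports normally opened between RHEL 6 cluster nodes are UDP 5404-5405 for corosync/cman and TCP 21064 for the DLM (needed by GFS2 and clvmd). A sketch, to be restricted to the actual cluster subnet and saved afterwards:

   iptables -I INPUT -p udp --dport 5404:5405 -j ACCEPT
   iptables -I INPUT -p tcp --dport 21064 -j ACCEPT
   service iptables save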
>>> >>> Also, 6.4 is pretty old, why not upgrade to 6.6? >>> >>> digimer >>> >>> On 07/01/15 03:10 PM, Cao, Vinh wrote: >>>> Hello Cluster guru, >>>> >>>> I'm trying to setup Redhat 6.4 OS cluster with 5 nodes. With two >>>> nodes I don't have any issue. >>>> >>>> But with 5 nodes, when I ran clustat I got 3 nodes online and the >>>> other two off line. >>>> >>>> When I start the one that are off line. Service cman start. I got: >>>> >>>> [root at ustlvcmspxxx ~]# service cman status >>>> >>>> corosync is stopped >>>> >>>> [root at ustlvcmsp1954 ~]# service cman start >>>> >>>> Starting cluster: >>>> >>>> Checking if cluster has been disabled at boot... [ OK ] >>>> >>>> Checking Network Manager... [ OK ] >>>> >>>> Global setup... [ OK ] >>>> >>>> Loading kernel modules... [ OK ] >>>> >>>> Mounting configfs... [ OK ] >>>> >>>> Starting cman... [ OK ] >>>> >>>> Waiting for quorum... Timed-out waiting for cluster >>>> >>>> >>>> [FAILED] >>>> >>>> Stopping cluster: >>>> >>>> Leaving fence domain... [ OK ] >>>> >>>> Stopping gfs_controld... [ OK ] >>>> >>>> Stopping dlm_controld... [ OK ] >>>> >>>> Stopping fenced... [ OK ] >>>> >>>> Stopping cman... [ OK ] >>>> >>>> Waiting for corosync to shutdown: [ OK ] >>>> >>>> Unloading kernel modules... [ OK ] >>>> >>>> Unmounting configfs... [ OK ] >>>> >>>> Can you help? >>>> >>>> Thank you, >>>> >>>> Vinh >>>> >>>> >>>> >>> >>> >>> -- >>> Digimer >>> Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? >>> >>> -- >>> Linux-cluster mailing list >>> Linux-cluster at redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-cluster >>> >> >> > > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From vinh.cao at hp.com Thu Jan 8 02:48:07 2015 From: vinh.cao at hp.com (Cao, Vinh) Date: Thu, 8 Jan 2015 02:48:07 +0000 Subject: [Linux-cluster] needs helps GFS2 on 5 nodes cluster In-Reply-To: <54ADB839.9000902@alteeve.ca> References: <54AD941C.4070205@alteeve.ca> <54AD9E05.2030902@alteeve.ca> <54ADA61C.2020509@alteeve.ca> <54ADB839.9000902@alteeve.ca> Message-ID: Hi Digimer, No we're not supporting multicast. I'm trying to use Broadcast, but Redhat support is saying better to use transport=udpu. Which I did set and it is complaining time out. I did try to set broadcast, but somehow it didn't work either. Let me give broadcast a try again. Thanks, Vinh -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer Sent: Wednesday, January 07, 2015 5:51 PM To: linux clustering Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster It looks like a network problem... Does your (virtual) switch support multicast properly and have you opened up the proper ports in the firewall? On 07/01/15 05:32 PM, Cao, Vinh wrote: > Hi Digimer, > > Yes, I just did. Looks like they are failing. I'm not sure why that is. > Please see the attachment for all servers log. > > By the way, I do appreciated all the helps I can get. 
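For completeness, the UDP-unicast transport Red Hat support suggested is enabled in cluster.conf roughly like this (supported from RHEL 6.2 on; bump config_version and restart cman on every node after the change):

   <cman transport="udpu"/>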
> > Vinh > > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer > Sent: Wednesday, January 07, 2015 4:33 PM > To: linux clustering > Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster > > Quorum is enabled by default. I need to see the entire logs from all five nodes, as I mentioned in the first email. Please disable cman from starting on boot, configure fencing properly and then reboot all nodes cleanly. Start the 'tail -f -n 0 /var/log/messages' on all five nodes, then in another window, start cman on all five nodes. When things settle down, copy/paste all the log output please. > > On 07/01/15 04:29 PM, Cao, Vinh wrote: >> Hi Digimer, >> >> Here is from the logs: >> [root at ustlvcmsp1954 ~]# tail -f /var/log/messages >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync profile loading service >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Using quorum provider quorum_cman >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine. >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [TOTEM ] A processor joined or left the membership and a new membership was formed. >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1 >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1 >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [CPG ] chosen downlist: sender r(0) ip(10.30.197.108) ; members(old:0 left:0) >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Completed service synchronization, ready to provide service. >> Jan 7 16:14:01 ustlvcmsp1954 rgmanager[8099]: Waiting for quorum to form >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Unloading all Corosync service engines. >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync extended virtual synchrony service >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync configuration service >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01 >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster config database access v1.01 >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync profile loading service >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: openais checkpoint service B.01.01 >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync CMAN membership service 2.90 >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster quorum service v0.1 >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:2055. >> Jan 7 16:15:06 ustlvcmsp1954 rgmanager[8099]: Quorum formed >> >> Then it die at: >> Starting cman... [ OK ] >> Waiting for quorum... Timed-out waiting for cluster >> [FAILED] >> >> Yes, I did made changes with: the problem is still there. One thing I don't know why cluster is looking for quorum? >> I did have any disk quorum setup in cluster.conf file. >> >> Any helps can I get appreciated. 
>> >> Vinh >> >> -----Original Message----- >> From: linux-cluster-bounces at redhat.com >> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer >> Sent: Wednesday, January 07, 2015 3:59 PM >> To: linux clustering >> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster >> >> On 07/01/15 03:39 PM, Cao, Vinh wrote: >>> Hello Digimer, >>> >>> Yes, I would agrre with you RHEL6.4 is old. We patched monthly, but I'm not sure why these servers are still at 6.4. Most of our system are 6.6. >>> >>> Here is my cluster config. All I want is using cluster to have BGFS2 mount via /etc/fstab. >>> root at ustlvcmsp1955 ~]# cat /etc/cluster/cluster.conf >> version="1.0"?> >>> >>> >>> >>> >>> >>> >>> >> >> You don't configure the fencing for the nodes... If anything causes a fence, the cluster will lock up (by design). >> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> clustat show: >>> >>> Cluster Status for p1954_to_p1958 @ Wed Jan 7 15:38:00 2015 Member >>> Status: Quorate >>> >>> Member Name ID Status >>> ------ ---- ---- ------ >>> ustlvcmsp1954 1 Offline >>> ustlvcmsp1955 2 Online, Local >>> ustlvcmsp1956 3 Online >>> ustlvcmsp1957 4 Offline >>> ustlvcmsp1958 5 Online >>> >>> I need to make them all online, so I can use fencing for mounting shared disk. >>> >>> Thanks, >>> Vinh >> >> What about the log entries from the start-up? Did you try the post_join_delay config? >> >> >>> -----Original Message----- >>> From: linux-cluster-bounces at redhat.com >>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer >>> Sent: Wednesday, January 07, 2015 3:16 PM >>> To: linux clustering >>> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster >>> >>> My first though would be to set in cluster.conf. >>> >>> If that doesn't work, please share your configuration file. Then, with all nodes offline, open a terminal to each node and run 'tail -f -n 0 /var/log/messages'. With that running, start all the nodes and wait for things to settle down, then paste the five nodes' output as well. >>> >>> Also, 6.4 is pretty old, why not upgrade to 6.6? >>> >>> digimer >>> >>> On 07/01/15 03:10 PM, Cao, Vinh wrote: >>>> Hello Cluster guru, >>>> >>>> I'm trying to setup Redhat 6.4 OS cluster with 5 nodes. With two >>>> nodes I don't have any issue. >>>> >>>> But with 5 nodes, when I ran clustat I got 3 nodes online and the >>>> other two off line. >>>> >>>> When I start the one that are off line. Service cman start. I got: >>>> >>>> [root at ustlvcmspxxx ~]# service cman status >>>> >>>> corosync is stopped >>>> >>>> [root at ustlvcmsp1954 ~]# service cman start >>>> >>>> Starting cluster: >>>> >>>> Checking if cluster has been disabled at boot... [ OK ] >>>> >>>> Checking Network Manager... [ OK ] >>>> >>>> Global setup... [ OK ] >>>> >>>> Loading kernel modules... [ OK ] >>>> >>>> Mounting configfs... [ OK ] >>>> >>>> Starting cman... [ OK ] >>>> >>>> Waiting for quorum... Timed-out waiting for cluster >>>> >>>> >>>> [FAILED] >>>> >>>> Stopping cluster: >>>> >>>> Leaving fence domain... [ OK ] >>>> >>>> Stopping gfs_controld... [ OK ] >>>> >>>> Stopping dlm_controld... [ OK ] >>>> >>>> Stopping fenced... [ OK ] >>>> >>>> Stopping cman... [ OK ] >>>> >>>> Waiting for corosync to shutdown: [ OK ] >>>> >>>> Unloading kernel modules... [ OK ] >>>> >>>> Unmounting configfs... [ OK ] >>>> >>>> Can you help? 
>>>> >>>> Thank you, >>>> >>>> Vinh >>>> >>>> >>>> >>> >>> >>> -- >>> Digimer >>> Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? >>> >>> -- >>> Linux-cluster mailing list >>> Linux-cluster at redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-cluster >>> >> >> > > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From lists at alteeve.ca Thu Jan 8 07:01:40 2015 From: lists at alteeve.ca (Digimer) Date: Thu, 08 Jan 2015 02:01:40 -0500 Subject: [Linux-cluster] needs helps GFS2 on 5 nodes cluster In-Reply-To: References: <54AD941C.4070205@alteeve.ca> <54AD9E05.2030902@alteeve.ca> <54ADA61C.2020509@alteeve.ca> <54ADB839.9000902@alteeve.ca> Message-ID: <54AE2B54.4030705@alteeve.ca> Please configure fencing. If you don't, it _will_ cause you problems. On 07/01/15 09:48 PM, Cao, Vinh wrote: > Hi Digimer, > > No we're not supporting multicast. I'm trying to use Broadcast, but Redhat support is saying better to use transport=udpu. Which I did set and it is complaining time out. > I did try to set broadcast, but somehow it didn't work either. > > Let me give broadcast a try again. > > Thanks, > Vinh > > -----Original Message----- > From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer > Sent: Wednesday, January 07, 2015 5:51 PM > To: linux clustering > Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster > > It looks like a network problem... Does your (virtual) switch support multicast properly and have you opened up the proper ports in the firewall? > > On 07/01/15 05:32 PM, Cao, Vinh wrote: >> Hi Digimer, >> >> Yes, I just did. Looks like they are failing. I'm not sure why that is. >> Please see the attachment for all servers log. >> >> By the way, I do appreciated all the helps I can get. >> >> Vinh >> >> -----Original Message----- >> From: linux-cluster-bounces at redhat.com >> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer >> Sent: Wednesday, January 07, 2015 4:33 PM >> To: linux clustering >> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster >> >> Quorum is enabled by default. I need to see the entire logs from all five nodes, as I mentioned in the first email. Please disable cman from starting on boot, configure fencing properly and then reboot all nodes cleanly. Start the 'tail -f -n 0 /var/log/messages' on all five nodes, then in another window, start cman on all five nodes. When things settle down, copy/paste all the log output please. 
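To expand on the multicast/firewall question above: this corosync carries cluster traffic over UDP ports 5404 and 5405, so those need to be open between all five nodes, and if the switch really cannot carry multicast, cman on RHEL 6.2+ can be told to use UDP unicast instead. A rough sketch only; the firewall policy and exact cman attributes should be checked against the local setup and cluster.conf(5):

    # on every node: allow cluster traffic from the other members
    iptables -I INPUT -p udp -m udp --dport 5404:5405 -j ACCEPT
    service iptables save

    # multicast sanity test (omping ships with RHEL 6; run it on all nodes at once)
    omping ustlvcmsp1954 ustlvcmsp1955 ustlvcmsp1956 ustlvcmsp1957 ustlvcmsp1958

    # if multicast is simply not available, in cluster.conf either
    #   <cman transport="udpu"/>      (UDP unicast)
    # or
    #   <cman broadcast="yes"/>       (broadcast, as tried later in this thread)

Broadcast only works when all nodes share one subnet, which is worth keeping in mind with five members.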
>> >> On 07/01/15 04:29 PM, Cao, Vinh wrote: >>> Hi Digimer, >>> >>> Here is from the logs: >>> [root at ustlvcmsp1954 ~]# tail -f /var/log/messages >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync profile loading service >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Using quorum provider quorum_cman >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine. >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [TOTEM ] A processor joined or left the membership and a new membership was formed. >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1 >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1 >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [CPG ] chosen downlist: sender r(0) ip(10.30.197.108) ; members(old:0 left:0) >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Completed service synchronization, ready to provide service. >>> Jan 7 16:14:01 ustlvcmsp1954 rgmanager[8099]: Waiting for quorum to form >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Unloading all Corosync service engines. >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync extended virtual synchrony service >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync configuration service >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01 >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster config database access v1.01 >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync profile loading service >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: openais checkpoint service B.01.01 >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync CMAN membership service 2.90 >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster quorum service v0.1 >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:2055. >>> Jan 7 16:15:06 ustlvcmsp1954 rgmanager[8099]: Quorum formed >>> >>> Then it die at: >>> Starting cman... [ OK ] >>> Waiting for quorum... Timed-out waiting for cluster >>> [FAILED] >>> >>> Yes, I did made changes with: the problem is still there. One thing I don't know why cluster is looking for quorum? >>> I did have any disk quorum setup in cluster.conf file. >>> >>> Any helps can I get appreciated. >>> >>> Vinh >>> >>> -----Original Message----- >>> From: linux-cluster-bounces at redhat.com >>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer >>> Sent: Wednesday, January 07, 2015 3:59 PM >>> To: linux clustering >>> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster >>> >>> On 07/01/15 03:39 PM, Cao, Vinh wrote: >>>> Hello Digimer, >>>> >>>> Yes, I would agrre with you RHEL6.4 is old. We patched monthly, but I'm not sure why these servers are still at 6.4. Most of our system are 6.6. >>>> >>>> Here is my cluster config. All I want is using cluster to have BGFS2 mount via /etc/fstab. 
>>>> root at ustlvcmsp1955 ~]# cat /etc/cluster/cluster.conf >>> version="1.0"?> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>> >>> You don't configure the fencing for the nodes... If anything causes a fence, the cluster will lock up (by design). >>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> clustat show: >>>> >>>> Cluster Status for p1954_to_p1958 @ Wed Jan 7 15:38:00 2015 Member >>>> Status: Quorate >>>> >>>> Member Name ID Status >>>> ------ ---- ---- ------ >>>> ustlvcmsp1954 1 Offline >>>> ustlvcmsp1955 2 Online, Local >>>> ustlvcmsp1956 3 Online >>>> ustlvcmsp1957 4 Offline >>>> ustlvcmsp1958 5 Online >>>> >>>> I need to make them all online, so I can use fencing for mounting shared disk. >>>> >>>> Thanks, >>>> Vinh >>> >>> What about the log entries from the start-up? Did you try the post_join_delay config? >>> >>> >>>> -----Original Message----- >>>> From: linux-cluster-bounces at redhat.com >>>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer >>>> Sent: Wednesday, January 07, 2015 3:16 PM >>>> To: linux clustering >>>> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster >>>> >>>> My first though would be to set in cluster.conf. >>>> >>>> If that doesn't work, please share your configuration file. Then, with all nodes offline, open a terminal to each node and run 'tail -f -n 0 /var/log/messages'. With that running, start all the nodes and wait for things to settle down, then paste the five nodes' output as well. >>>> >>>> Also, 6.4 is pretty old, why not upgrade to 6.6? >>>> >>>> digimer >>>> >>>> On 07/01/15 03:10 PM, Cao, Vinh wrote: >>>>> Hello Cluster guru, >>>>> >>>>> I'm trying to setup Redhat 6.4 OS cluster with 5 nodes. With two >>>>> nodes I don't have any issue. >>>>> >>>>> But with 5 nodes, when I ran clustat I got 3 nodes online and the >>>>> other two off line. >>>>> >>>>> When I start the one that are off line. Service cman start. I got: >>>>> >>>>> [root at ustlvcmspxxx ~]# service cman status >>>>> >>>>> corosync is stopped >>>>> >>>>> [root at ustlvcmsp1954 ~]# service cman start >>>>> >>>>> Starting cluster: >>>>> >>>>> Checking if cluster has been disabled at boot... [ OK ] >>>>> >>>>> Checking Network Manager... [ OK ] >>>>> >>>>> Global setup... [ OK ] >>>>> >>>>> Loading kernel modules... [ OK ] >>>>> >>>>> Mounting configfs... [ OK ] >>>>> >>>>> Starting cman... [ OK ] >>>>> >>>>> Waiting for quorum... Timed-out waiting for cluster >>>>> >>>>> >>>>> [FAILED] >>>>> >>>>> Stopping cluster: >>>>> >>>>> Leaving fence domain... [ OK ] >>>>> >>>>> Stopping gfs_controld... [ OK ] >>>>> >>>>> Stopping dlm_controld... [ OK ] >>>>> >>>>> Stopping fenced... [ OK ] >>>>> >>>>> Stopping cman... [ OK ] >>>>> >>>>> Waiting for corosync to shutdown: [ OK ] >>>>> >>>>> Unloading kernel modules... [ OK ] >>>>> >>>>> Unmounting configfs... [ OK ] >>>>> >>>>> Can you help? >>>>> >>>>> Thank you, >>>>> >>>>> Vinh >>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> Digimer >>>> Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? >>>> >>>> -- >>>> Linux-cluster mailing list >>>> Linux-cluster at redhat.com >>>> https://www.redhat.com/mailman/listinfo/linux-cluster >>>> >>> >>> >> >> >> -- >> Digimer >> Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? 
>> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> >> >> > > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From vinh.cao at hp.com Thu Jan 8 12:17:22 2015 From: vinh.cao at hp.com (Cao, Vinh) Date: Thu, 8 Jan 2015 12:17:22 +0000 Subject: [Linux-cluster] needs helps GFS2 on 5 nodes cluster In-Reply-To: <54AE2B54.4030705@alteeve.ca> References: <54AD941C.4070205@alteeve.ca> <54AD9E05.2030902@alteeve.ca> <54ADA61C.2020509@alteeve.ca> <54ADB839.9000902@alteeve.ca> <54AE2B54.4030705@alteeve.ca> Message-ID: Hi Digimer, You are correct. I do need to configure fencing. But before fencing, I need to have these servers become member of cluster first. If they are not member of cluster set. Doesn't matter I try to configure fencing or not. My cluster won't work. Thanks for your help. Vinh -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer Sent: Thursday, January 08, 2015 2:02 AM To: linux clustering Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster Please configure fencing. If you don't, it _will_ cause you problems. On 07/01/15 09:48 PM, Cao, Vinh wrote: > Hi Digimer, > > No we're not supporting multicast. I'm trying to use Broadcast, but Redhat support is saying better to use transport=udpu. Which I did set and it is complaining time out. > I did try to set broadcast, but somehow it didn't work either. > > Let me give broadcast a try again. > > Thanks, > Vinh > > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer > Sent: Wednesday, January 07, 2015 5:51 PM > To: linux clustering > Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster > > It looks like a network problem... Does your (virtual) switch support multicast properly and have you opened up the proper ports in the firewall? > > On 07/01/15 05:32 PM, Cao, Vinh wrote: >> Hi Digimer, >> >> Yes, I just did. Looks like they are failing. I'm not sure why that is. >> Please see the attachment for all servers log. >> >> By the way, I do appreciated all the helps I can get. >> >> Vinh >> >> -----Original Message----- >> From: linux-cluster-bounces at redhat.com >> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer >> Sent: Wednesday, January 07, 2015 4:33 PM >> To: linux clustering >> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster >> >> Quorum is enabled by default. I need to see the entire logs from all five nodes, as I mentioned in the first email. Please disable cman from starting on boot, configure fencing properly and then reboot all nodes cleanly. Start the 'tail -f -n 0 /var/log/messages' on all five nodes, then in another window, start cman on all five nodes. When things settle down, copy/paste all the log output please. 
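Since fencing is the recurring theme here, the shape of a working definition may help: each clusternode gets a <fence> block that points at an entry in <fencedevices>. Everything below is a placeholder sketch (the agent, address and credentials are invented; pick the agent that matches the real hardware, e.g. fence_ipmilan or fence_ilo on HP kit):

    <clusternode name="ustlvcmsp1954" nodeid="1">
      <fence>
        <method name="1">
          <device name="ipmi_p1954"/>
        </method>
      </fence>
    </clusternode>
    <!-- ...repeat one clusternode block, and one matching device, per node... -->
    <fencedevices>
      <fencedevice agent="fence_ipmilan" name="ipmi_p1954"
                   ipaddr="10.30.197.254" login="admin" passwd="secret" lanplus="1"/>
    </fencedevices>

Once the methods are defined and the config is pushed out (ccs_config_validate is a cheap first check), the 'fence_check -f' run suggested further down the thread, or deliberately fencing one node with fence_node, is the way to prove each method actually works before GFS2 depends on it.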
>> >> On 07/01/15 04:29 PM, Cao, Vinh wrote: >>> Hi Digimer, >>> >>> Here is from the logs: >>> [root at ustlvcmsp1954 ~]# tail -f /var/log/messages >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync profile loading service >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Using quorum provider quorum_cman >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine. >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [TOTEM ] A processor joined or left the membership and a new membership was formed. >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1 >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1 >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [CPG ] chosen downlist: sender r(0) ip(10.30.197.108) ; members(old:0 left:0) >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Completed service synchronization, ready to provide service. >>> Jan 7 16:14:01 ustlvcmsp1954 rgmanager[8099]: Waiting for quorum to form >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Unloading all Corosync service engines. >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync extended virtual synchrony service >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync configuration service >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01 >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster config database access v1.01 >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync profile loading service >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: openais checkpoint service B.01.01 >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync CMAN membership service 2.90 >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster quorum service v0.1 >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:2055. >>> Jan 7 16:15:06 ustlvcmsp1954 rgmanager[8099]: Quorum formed >>> >>> Then it die at: >>> Starting cman... [ OK ] >>> Waiting for quorum... Timed-out waiting for cluster >>> >>> [FAILED] >>> >>> Yes, I did made changes with: the problem is still there. One thing I don't know why cluster is looking for quorum? >>> I did have any disk quorum setup in cluster.conf file. >>> >>> Any helps can I get appreciated. >>> >>> Vinh >>> >>> -----Original Message----- >>> From: linux-cluster-bounces at redhat.com >>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer >>> Sent: Wednesday, January 07, 2015 3:59 PM >>> To: linux clustering >>> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster >>> >>> On 07/01/15 03:39 PM, Cao, Vinh wrote: >>>> Hello Digimer, >>>> >>>> Yes, I would agrre with you RHEL6.4 is old. We patched monthly, but I'm not sure why these servers are still at 6.4. Most of our system are 6.6. >>>> >>>> Here is my cluster config. All I want is using cluster to have BGFS2 mount via /etc/fstab. 
>>>> root at ustlvcmsp1955 ~]# cat /etc/cluster/cluster.conf >>> version="1.0"?> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>> >>> You don't configure the fencing for the nodes... If anything causes a fence, the cluster will lock up (by design). >>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> clustat show: >>>> >>>> Cluster Status for p1954_to_p1958 @ Wed Jan 7 15:38:00 2015 Member >>>> Status: Quorate >>>> >>>> Member Name ID Status >>>> ------ ---- ---- ------ >>>> ustlvcmsp1954 1 Offline >>>> ustlvcmsp1955 2 Online, Local >>>> ustlvcmsp1956 3 Online >>>> ustlvcmsp1957 4 Offline >>>> ustlvcmsp1958 5 Online >>>> >>>> I need to make them all online, so I can use fencing for mounting shared disk. >>>> >>>> Thanks, >>>> Vinh >>> >>> What about the log entries from the start-up? Did you try the post_join_delay config? >>> >>> >>>> -----Original Message----- >>>> From: linux-cluster-bounces at redhat.com >>>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer >>>> Sent: Wednesday, January 07, 2015 3:16 PM >>>> To: linux clustering >>>> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster >>>> >>>> My first though would be to set in cluster.conf. >>>> >>>> If that doesn't work, please share your configuration file. Then, with all nodes offline, open a terminal to each node and run 'tail -f -n 0 /var/log/messages'. With that running, start all the nodes and wait for things to settle down, then paste the five nodes' output as well. >>>> >>>> Also, 6.4 is pretty old, why not upgrade to 6.6? >>>> >>>> digimer >>>> >>>> On 07/01/15 03:10 PM, Cao, Vinh wrote: >>>>> Hello Cluster guru, >>>>> >>>>> I'm trying to setup Redhat 6.4 OS cluster with 5 nodes. With two >>>>> nodes I don't have any issue. >>>>> >>>>> But with 5 nodes, when I ran clustat I got 3 nodes online and the >>>>> other two off line. >>>>> >>>>> When I start the one that are off line. Service cman start. I got: >>>>> >>>>> [root at ustlvcmspxxx ~]# service cman status >>>>> >>>>> corosync is stopped >>>>> >>>>> [root at ustlvcmsp1954 ~]# service cman start >>>>> >>>>> Starting cluster: >>>>> >>>>> Checking if cluster has been disabled at boot... [ OK ] >>>>> >>>>> Checking Network Manager... [ OK ] >>>>> >>>>> Global setup... [ OK ] >>>>> >>>>> Loading kernel modules... [ OK ] >>>>> >>>>> Mounting configfs... [ OK ] >>>>> >>>>> Starting cman... [ OK ] >>>>> >>>>> Waiting for quorum... Timed-out waiting for cluster >>>>> >>>>> >>>>> [FAILED] >>>>> >>>>> Stopping cluster: >>>>> >>>>> Leaving fence domain... [ OK ] >>>>> >>>>> Stopping gfs_controld... [ OK ] >>>>> >>>>> Stopping dlm_controld... [ OK ] >>>>> >>>>> Stopping fenced... [ OK ] >>>>> >>>>> Stopping cman... [ OK ] >>>>> >>>>> Waiting for corosync to shutdown: [ OK ] >>>>> >>>>> Unloading kernel modules... [ OK ] >>>>> >>>>> Unmounting configfs... [ OK ] >>>>> >>>>> Can you help? >>>>> >>>>> Thank you, >>>>> >>>>> Vinh >>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> Digimer >>>> Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? >>>> >>>> -- >>>> Linux-cluster mailing list >>>> Linux-cluster at redhat.com >>>> https://www.redhat.com/mailman/listinfo/linux-cluster >>>> >>> >>> >> >> >> -- >> Digimer >> Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? 
>> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> >> >> > > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From vinh.cao at hp.com Thu Jan 8 13:36:01 2015 From: vinh.cao at hp.com (Cao, Vinh) Date: Thu, 8 Jan 2015 13:36:01 +0000 Subject: [Linux-cluster] needs helps GFS2 on 5 nodes cluster In-Reply-To: <54AE2B54.4030705@alteeve.ca> References: <54AD941C.4070205@alteeve.ca> <54AD9E05.2030902@alteeve.ca> <54ADA61C.2020509@alteeve.ca> <54ADB839.9000902@alteeve.ca> <54AE2B54.4030705@alteeve.ca> Message-ID: Hello Digimer, The problem solved. First of all, I just want to thank you your time to stay with me on the issue I have. You are also correct about fencing. But here is it breaking down to. 1. I forgot, when I create the cluster. I didn't join these system in cluster set yet. You know one for a long while I have to setup cluster. I did write documentation about all this. But I still forget to follow it to the teeth. That is what happens. So I have to run: cman_tool join for all nodes. This is the key. 2. after join all nodes into cluster. I'm able to start cman via: service cman start 3. then configure fencing 4. then add static config mount device into /etc/fstab 5. then reboot each node one by one. They are all come back and well. I do have this error in logs: (it mean our multicast is not using. I'm using broadcast for now. But if we have multicast network not blocking, then that error would go away. That is my thought.) [TOTEM ] Received message has invalid digest... ignoring. Jan 8 08:34:33 ustlvcmsp1956 corosync[21194]: [TOTEM ] Invalid packet data Again, thanks for your helps. Vinh -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer Sent: Thursday, January 08, 2015 2:02 AM To: linux clustering Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster Please configure fencing. If you don't, it _will_ cause you problems. On 07/01/15 09:48 PM, Cao, Vinh wrote: > Hi Digimer, > > No we're not supporting multicast. I'm trying to use Broadcast, but Redhat support is saying better to use transport=udpu. Which I did set and it is complaining time out. > I did try to set broadcast, but somehow it didn't work either. > > Let me give broadcast a try again. > > Thanks, > Vinh > > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer > Sent: Wednesday, January 07, 2015 5:51 PM > To: linux clustering > Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster > > It looks like a network problem... Does your (virtual) switch support multicast properly and have you opened up the proper ports in the firewall? > > On 07/01/15 05:32 PM, Cao, Vinh wrote: >> Hi Digimer, >> >> Yes, I just did. Looks like they are failing. I'm not sure why that is. >> Please see the attachment for all servers log. >> >> By the way, I do appreciated all the helps I can get. 
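For anyone who finds this thread later, the fix described above condenses to a short sequence; the device path and mount point below are placeholders, and the broadcast-versus-multicast caveat from earlier still applies:

    # 1. join every node to the cluster (this is what 'service cman start' does internally)
    cman_tool join
    clustat                    # all five members should now show Online

    # 2. make the GFS2 mount persistent - placeholder device and mount point
    echo "/dev/vg_san/lv_vmstorage1  /VMStorage1  gfs2  defaults,noatime  0 0" >> /etc/fstab
    chkconfig gfs2 on          # the gfs2 init script mounts gfs2 fstab entries at boot, after cman
    service gfs2 start         # or simply: mount /VMStorage1

    # 3. before trusting it with VM images
    fence_check -f             # verify every configured fence method responds

The "[TOTEM ] Received message has invalid digest... ignoring" lines are what corosync prints when it receives totem packets it cannot authenticate, often traffic from a different cluster sharing the same broadcast or multicast domain, so it is worth tracking down where those packets come from rather than ignoring them forever.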
>> >> Vinh >> >> -----Original Message----- >> From: linux-cluster-bounces at redhat.com >> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer >> Sent: Wednesday, January 07, 2015 4:33 PM >> To: linux clustering >> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster >> >> Quorum is enabled by default. I need to see the entire logs from all five nodes, as I mentioned in the first email. Please disable cman from starting on boot, configure fencing properly and then reboot all nodes cleanly. Start the 'tail -f -n 0 /var/log/messages' on all five nodes, then in another window, start cman on all five nodes. When things settle down, copy/paste all the log output please. >> >> On 07/01/15 04:29 PM, Cao, Vinh wrote: >>> Hi Digimer, >>> >>> Here is from the logs: >>> [root at ustlvcmsp1954 ~]# tail -f /var/log/messages >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync profile loading service >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Using quorum provider quorum_cman >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine. >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [TOTEM ] A processor joined or left the membership and a new membership was formed. >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1 >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1 >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [CPG ] chosen downlist: sender r(0) ip(10.30.197.108) ; members(old:0 left:0) >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Completed service synchronization, ready to provide service. >>> Jan 7 16:14:01 ustlvcmsp1954 rgmanager[8099]: Waiting for quorum to form >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Unloading all Corosync service engines. >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync extended virtual synchrony service >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync configuration service >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01 >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster config database access v1.01 >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync profile loading service >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: openais checkpoint service B.01.01 >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync CMAN membership service 2.90 >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster quorum service v0.1 >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:2055. >>> Jan 7 16:15:06 ustlvcmsp1954 rgmanager[8099]: Quorum formed >>> >>> Then it die at: >>> Starting cman... [ OK ] >>> Waiting for quorum... Timed-out waiting for cluster >>> >>> [FAILED] >>> >>> Yes, I did made changes with: the problem is still there. One thing I don't know why cluster is looking for quorum? >>> I did have any disk quorum setup in cluster.conf file. >>> >>> Any helps can I get appreciated. 
>>> >>> Vinh >>> >>> -----Original Message----- >>> From: linux-cluster-bounces at redhat.com >>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer >>> Sent: Wednesday, January 07, 2015 3:59 PM >>> To: linux clustering >>> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster >>> >>> On 07/01/15 03:39 PM, Cao, Vinh wrote: >>>> Hello Digimer, >>>> >>>> Yes, I would agrre with you RHEL6.4 is old. We patched monthly, but I'm not sure why these servers are still at 6.4. Most of our system are 6.6. >>>> >>>> Here is my cluster config. All I want is using cluster to have BGFS2 mount via /etc/fstab. >>>> root at ustlvcmsp1955 ~]# cat /etc/cluster/cluster.conf >>> version="1.0"?> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>> >>> You don't configure the fencing for the nodes... If anything causes a fence, the cluster will lock up (by design). >>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> clustat show: >>>> >>>> Cluster Status for p1954_to_p1958 @ Wed Jan 7 15:38:00 2015 Member >>>> Status: Quorate >>>> >>>> Member Name ID Status >>>> ------ ---- ---- ------ >>>> ustlvcmsp1954 1 Offline >>>> ustlvcmsp1955 2 Online, Local >>>> ustlvcmsp1956 3 Online >>>> ustlvcmsp1957 4 Offline >>>> ustlvcmsp1958 5 Online >>>> >>>> I need to make them all online, so I can use fencing for mounting shared disk. >>>> >>>> Thanks, >>>> Vinh >>> >>> What about the log entries from the start-up? Did you try the post_join_delay config? >>> >>> >>>> -----Original Message----- >>>> From: linux-cluster-bounces at redhat.com >>>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer >>>> Sent: Wednesday, January 07, 2015 3:16 PM >>>> To: linux clustering >>>> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster >>>> >>>> My first though would be to set in cluster.conf. >>>> >>>> If that doesn't work, please share your configuration file. Then, with all nodes offline, open a terminal to each node and run 'tail -f -n 0 /var/log/messages'. With that running, start all the nodes and wait for things to settle down, then paste the five nodes' output as well. >>>> >>>> Also, 6.4 is pretty old, why not upgrade to 6.6? >>>> >>>> digimer >>>> >>>> On 07/01/15 03:10 PM, Cao, Vinh wrote: >>>>> Hello Cluster guru, >>>>> >>>>> I'm trying to setup Redhat 6.4 OS cluster with 5 nodes. With two >>>>> nodes I don't have any issue. >>>>> >>>>> But with 5 nodes, when I ran clustat I got 3 nodes online and the >>>>> other two off line. >>>>> >>>>> When I start the one that are off line. Service cman start. I got: >>>>> >>>>> [root at ustlvcmspxxx ~]# service cman status >>>>> >>>>> corosync is stopped >>>>> >>>>> [root at ustlvcmsp1954 ~]# service cman start >>>>> >>>>> Starting cluster: >>>>> >>>>> Checking if cluster has been disabled at boot... [ OK ] >>>>> >>>>> Checking Network Manager... [ OK ] >>>>> >>>>> Global setup... [ OK ] >>>>> >>>>> Loading kernel modules... [ OK ] >>>>> >>>>> Mounting configfs... [ OK ] >>>>> >>>>> Starting cman... [ OK ] >>>>> >>>>> Waiting for quorum... Timed-out waiting for cluster >>>>> >>>>> >>>>> [FAILED] >>>>> >>>>> Stopping cluster: >>>>> >>>>> Leaving fence domain... [ OK ] >>>>> >>>>> Stopping gfs_controld... [ OK ] >>>>> >>>>> Stopping dlm_controld... [ OK ] >>>>> >>>>> Stopping fenced... [ OK ] >>>>> >>>>> Stopping cman... [ OK ] >>>>> >>>>> Waiting for corosync to shutdown: [ OK ] >>>>> >>>>> Unloading kernel modules... [ OK ] >>>>> >>>>> Unmounting configfs... [ OK ] >>>>> >>>>> Can you help? 
>>>>> >>>>> Thank you, >>>>> >>>>> Vinh >>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> Digimer >>>> Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? >>>> >>>> -- >>>> Linux-cluster mailing list >>>> Linux-cluster at redhat.com >>>> https://www.redhat.com/mailman/listinfo/linux-cluster >>>> >>> >>> >> >> >> -- >> Digimer >> Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> >> >> > > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From lists at alteeve.ca Thu Jan 8 14:31:01 2015 From: lists at alteeve.ca (Digimer) Date: Thu, 08 Jan 2015 09:31:01 -0500 Subject: [Linux-cluster] needs helps GFS2 on 5 nodes cluster In-Reply-To: References: <54AD941C.4070205@alteeve.ca> <54AD9E05.2030902@alteeve.ca> <54ADA61C.2020509@alteeve.ca> <54ADB839.9000902@alteeve.ca> <54AE2B54.4030705@alteeve.ca> Message-ID: <54AE94A5.6000802@alteeve.ca> On 08/01/15 07:17 AM, Cao, Vinh wrote: > Hi Digimer, > > You are correct. I do need to configure fencing. But before fencing, I need to have these servers become member of cluster first. > If they are not member of cluster set. Doesn't matter I try to configure fencing or not. My cluster won't work. > > Thanks for your help. > Vinh Define the fence methods right from the start. As soon as the cluster forms, the first thing you do is run 'fence_check -f' on all nodes. If there is a problem, fix it. Only then do you add services. -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From lists at alteeve.ca Sun Jan 11 22:19:50 2015 From: lists at alteeve.ca (Digimer) Date: Sun, 11 Jan 2015 17:19:50 -0500 Subject: [Linux-cluster] HA Summit 2015 - plan wiki closed for registration In-Reply-To: <540D853F.3090109@redhat.com> References: <540D853F.3090109@redhat.com> Message-ID: <54B2F706.4090900@alteeve.ca> Spammers got through the captcha, *sigh*. If anyone wants to create an account to edit, please email me off-list and I'll get you setup ASAP. Sorry for the hassle. http://plan.alteeve.ca/index.php/Main_Page -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From lists at alteeve.ca Tue Jan 13 05:31:22 2015 From: lists at alteeve.ca (Digimer) Date: Tue, 13 Jan 2015 00:31:22 -0500 Subject: [Linux-cluster] [Planning] Organizing HA Summit 2015 In-Reply-To: <540D853F.3090109@redhat.com> References: <540D853F.3090109@redhat.com> Message-ID: <54B4ADAA.5080803@alteeve.ca> Hi all, With Fabio away for now, I (and others) are working on the final preparations for the summit. This is your chance to speak up and influence the planning! Objections/suggestions? Speak now please. :) In particular, please raise topics you want to discuss. 
Either add them to the wiki directly or email me and I will update the wiki for you. (Note that registration is closed because of spammers, if you want an account just let me know and I'll open it back up). The plan is; * Informal atmosphere with limited structure to make sure key topics are addressed. Two ways topics will be discussed; ** Someone will guide a given topic they want to raise for ~45 minutes, 15 minutes for Q&A ** "Round-table" style discussion with no one person leading (though it would be nice to have someone taking notes). People presenting are asked not to use slides. Hand-outs are fine and either a white-board or paper flip-board will be available for illustrating ideas and flushing out concepts. The summit will start at 9:00 and go until 17:00. We'll go for a semi-official summit dinner and drinks around 6pm on the 4th (location to be determined). Those staying in Brno are more than welcome to join an informal dinner and drinks (and possibly some sight-seeing, etc) the evening of the 5th. Any concerns/comments/suggestions, please speak up ASAP! -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From lists at alteeve.ca Wed Jan 14 04:38:10 2015 From: lists at alteeve.ca (Digimer) Date: Tue, 13 Jan 2015 23:38:10 -0500 Subject: [Linux-cluster] [Pacemaker] [Linux-HA] [ha-wg-technical] [RFC] Organizing HA Summit 2015 In-Reply-To: References: <540D853F.3090109@redhat.com> <20141208133608.GC18879@redhat.com> <54985122.5020907@alteeve.ca> Message-ID: <54B5F2B2.3050103@alteeve.ca> Woohoo!! Will be very nice to see you. :) I've added you. Can you give me a short sentence to introduce yourself to people who haven't met you? Madi On 13/01/15 11:33 PM, Yusuke Iida wrote: > Hi Digimer, > > I am Iida to participate from NTT along with Mori. > I want you added to the list of participants. > > I'm sorry contact is late. > > Regards, > Yusuke > > 2014-12-23 2:13 GMT+09:00 Digimer : >> It will be very nice to see you again! Will Ikeda-san be there as well? >> >> digimer >> >> On 22/12/14 03:35 AM, Keisuke MORI wrote: >>> >>> Hi all, >>> >>> Really late response but, >>> I will be joining the HA summit, with a few colleagues from NTT. >>> >>> See you guys in Brno, >>> Thanks, >>> >>> >>> 2014-12-08 22:36 GMT+09:00 Jan Pokorn? : >>>> >>>> Hello, >>>> >>>> it occured to me that if you want to use the opportunity and double >>>> as as tourist while being in Brno, it's about the right time to >>>> consider reservations/ticket purchases this early. >>>> At least in some cases it is a must, e.g., Villa Tugendhat: >>>> >>>> >>>> http://rezervace.spilberk.cz/langchange.aspx?mrsname=&languageId=2&returnUrl=%2Flist >>>> >>>> On 08/09/14 12:30 +0200, Fabio M. Di Nitto wrote: >>>>> >>>>> DevConf will start Friday the 6th of Feb 2015 in Red Hat Brno offices. >>>>> >>>>> My suggestion would be to have a 2 days dedicated HA summit the 4th and >>>>> the 5th of February. >>>> >>>> >>>> -- >>>> Jan >>>> >>>> _______________________________________________ >>>> ha-wg-technical mailing list >>>> ha-wg-technical at lists.linux-foundation.org >>>> https://lists.linuxfoundation.org/mailman/listinfo/ha-wg-technical >>>> >>> >>> >>> >> >> >> -- >> Digimer >> Papers and Projects: https://alteeve.ca/w/ >> What if the cure for cancer is trapped in the mind of a person without >> access to education? 
>> _______________________________________________ >> Linux-HA mailing list >> Linux-HA at lists.linux-ha.org >> http://lists.linux-ha.org/mailman/listinfo/linux-ha >> See also: http://linux-ha.org/ReportingProblems > > > -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From jpokorny at redhat.com Mon Jan 26 14:14:38 2015 From: jpokorny at redhat.com (Jan =?utf-8?Q?Pokorn=C3=BD?=) Date: Mon, 26 Jan 2015 15:14:38 +0100 Subject: [Linux-cluster] HA Summit Key-signing Party (was: Organizing HA Summit 2015) In-Reply-To: <54B4ADAA.5080803@alteeve.ca> References: <540D853F.3090109@redhat.com> <54B4ADAA.5080803@alteeve.ca> Message-ID: <20150126141438.GE21558@redhat.com> Hello cluster masters, On 13/01/15 00:31 -0500, Digimer wrote: > Any concerns/comments/suggestions, please speak up ASAP! I'd like to throw a key-signing party as it will be a perfect opportunity to build a web of trust amongst us. If you haven't incorporated OpenPGP to your communication with the world yet, I would recommend at least considering it, even more in the post-Snowden era. You can use it to prove authenticity/integrity of the data you emit (signing; not just for email as is the case with this one, but also for SW releases and more), provide privacy/confidentiality of interchanged data (encryption; again, typical scenario is a private email, e.g., when you responsibly report a vulnerability to the respective maintainers), or both. In case you have no experience with this technology, there are plentiful resources on GnuPG (most renowned FOSS implementation): - https://www.gnupg.org/documentation/howtos.en.html - http://cryptnet.net/fdp/crypto/keysigning_party/en/keysigning_party.html#prep (preparation steps for a key-signing party) - ... To make the verification process as smooth and as little time-consuming as possible, I would stick with a list-based method: http://cryptnet.net/fdp/crypto/keysigning_party/en/keysigning_party.html#list_based and volunteer for a role of a coordinator. What's needed? Once you have a key pair (and provided that you are using GnuPG), please run the following sequence: # figure out the key ID for the identity to be verified; # IDENTITY is either your associated email address/your name # if only single key ID matches, specific key otherwise # (you can use "gpg -K" to select a desired ID at the "sec" line) KEY=$(gpg --with-colons 'IDENTITY' | grep '^pub' | cut -d: -f5) # export the public key to a file that is suitable for exchange gpg --export -a -- $KEY > $KEY # verify that you have an expected data to share gpg --with-fingerprint -- $KEY with IDENTITY adjusted as per the instruction above, and send me the resulting $KEY file, preferably in a signed (or even encrypted[*]) email from an address associated with that very public key of yours. [*] You can find my public key at public keyservers: http://pool.sks-keyservers.net/pks/lookup?op=vindex&search=0x60BCBB4F5CD7F9EF Indeed, the trust in this key should be ephemeral/one-off (e.g., using a temporary keyring, not a universal one before we proceed with the signing :) Timeline? Best if you send me your public keys before 2015-02-02. I will then compile a list of the attendees together with their keys and publish it at https://people.redhat.com/jpokorny/keysigning/2015-ha/ so you can print it out and be ready for the party. Thanks for your cooperation, looking forward to this side-event and hope this will be beneficial to all involved. P.S. 
There's now an opportunity to visit an exhibition of the Bohemian Crown Jewels replicas directly in Brno (sorry, Google Translate only) https://translate.google.com/translate?sl=auto&tl=en&js=y&prev=_t&hl=en&ie=UTF-8&u=http%3A%2F%2Fwww.letohradekbrno.cz%2F%3Fidm%3D55 -- Jan -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 819 bytes Desc: not available URL: From lists at alteeve.ca Mon Jan 26 14:17:24 2015 From: lists at alteeve.ca (Digimer) Date: Mon, 26 Jan 2015 09:17:24 -0500 Subject: [Linux-cluster] [Pacemaker] HA Summit Key-signing Party In-Reply-To: <20150126141438.GE21558@redhat.com> References: <540D853F.3090109@redhat.com> <54B4ADAA.5080803@alteeve.ca> <20150126141438.GE21558@redhat.com> Message-ID: <54C64C74.8020402@alteeve.ca> On 26/01/15 09:14 AM, Jan Pokorn? wrote: > Hello cluster masters, > > On 13/01/15 00:31 -0500, Digimer wrote: >> Any concerns/comments/suggestions, please speak up ASAP! > > I'd like to throw a key-signing party as it will be a perfect > opportunity to build a web of trust amongst us. > > If you haven't incorporated OpenPGP to your communication with the > world yet, I would recommend at least considering it, even more in > the post-Snowden era. You can use it to prove authenticity/integrity > of the data you emit (signing; not just for email as is the case > with this one, but also for SW releases and more), provide > privacy/confidentiality of interchanged data (encryption; again, > typical scenario is a private email, e.g., when you responsibly > report a vulnerability to the respective maintainers), or both. > > In case you have no experience with this technology, there are > plentiful resources on GnuPG (most renowned FOSS implementation): > - https://www.gnupg.org/documentation/howtos.en.html > - http://cryptnet.net/fdp/crypto/keysigning_party/en/keysigning_party.html#prep > (preparation steps for a key-signing party) > - ... > > To make the verification process as smooth and as little > time-consuming as possible, I would stick with a list-based method: > http://cryptnet.net/fdp/crypto/keysigning_party/en/keysigning_party.html#list_based > and volunteer for a role of a coordinator. > > > What's needed? > Once you have a key pair (and provided that you are using GnuPG), please > run the following sequence: > > # figure out the key ID for the identity to be verified; > # IDENTITY is either your associated email address/your name > # if only single key ID matches, specific key otherwise > # (you can use "gpg -K" to select a desired ID at the "sec" line) > KEY=$(gpg --with-colons 'IDENTITY' | grep '^pub' | cut -d: -f5) > > # export the public key to a file that is suitable for exchange > gpg --export -a -- $KEY > $KEY > > # verify that you have an expected data to share > gpg --with-fingerprint -- $KEY > > with IDENTITY adjusted as per the instruction above, and send me the > resulting $KEY file, preferably in a signed (or even encrypted[*]) email > from an address associated with that very public key of yours. > > [*] You can find my public key at public keyservers: > http://pool.sks-keyservers.net/pks/lookup?op=vindex&search=0x60BCBB4F5CD7F9EF > Indeed, the trust in this key should be ephemeral/one-off > (e.g., using a temporary keyring, not a universal one before we proceed > with the signing :) > > > Timeline? > Best if you send me your public keys before 2015-02-02. 
I will then > compile a list of the attendees together with their keys and publish > it at https://people.redhat.com/jpokorny/keysigning/2015-ha/ > so you can print it out and be ready for the party. > > Thanks for your cooperation, looking forward to this side-event and > hope this will be beneficial to all involved. > > > P.S. There's now an opportunity to visit an exhibition of the Bohemian > Crown Jewels replicas directly in Brno (sorry, Google Translate only) > https://translate.google.com/translate?sl=auto&tl=en&js=y&prev=_t&hl=en&ie=UTF-8&u=http%3A%2F%2Fwww.letohradekbrno.cz%2F%3Fidm%3D55 =o, keysigning is a brilliant idea! I can put the keys in the plan wiki, too. -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From jpokorny at redhat.com Tue Jan 27 17:11:11 2015 From: jpokorny at redhat.com (Jan =?utf-8?Q?Pokorn=C3=BD?=) Date: Tue, 27 Jan 2015 18:11:11 +0100 Subject: [Linux-cluster] HA Summit Key-signing Party (was: Organizing HA Summit 2015) In-Reply-To: <20150126141438.GE21558@redhat.com> References: <540D853F.3090109@redhat.com> <54B4ADAA.5080803@alteeve.ca> <20150126141438.GE21558@redhat.com> Message-ID: <20150127171111.GA427@redhat.com> > What's needed? > Once you have a key pair (and provided that you are using GnuPG), please > run the following sequence: > > # figure out the key ID for the identity to be verified; > # IDENTITY is either your associated email address/your name > # if only single key ID matches, specific key otherwise > # (you can use "gpg -K" to select a desired ID at the "sec" line) > KEY=$(gpg --with-colons 'IDENTITY' | grep '^pub' | cut -d: -f5) Oops, sorry, somehow '-k' got lost above ^. Correct version: KEY=$(gpg -k --with-colons 'IDENTITY' | grep '^pub' | cut -d: -f5) > # export the public key to a file that is suitable for exchange > gpg --export -a -- $KEY > $KEY > > # verify that you have an expected data to share > gpg --with-fingerprint -- $KEY -- Jan -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 819 bytes Desc: not available URL: From cluster.labs at gmail.com Thu Jan 29 04:50:36 2015 From: cluster.labs at gmail.com (cluster lab) Date: Thu, 29 Jan 2015 08:20:36 +0330 Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes Message-ID: Hi, In a two node cluster, i received to different result from "qemu-img check" on just one file: node1 # qemu-img check VMStorage/x.qcow2 No errors were found on the image. Node2 # qemu-img check VMStorage/x.qcow2 qemu-img: Could not open 'VMStorage/x.qcow2" All other files are OK, and the cluster works properly. What is the problem? ==== Packages: kernel: 2.6.32-431.5.1.el6.x86_64 GFS2: gfs2-utils-3.0.12.1-23.el6.x86_64 corosync: corosync-1.4.1-17.el6.x86_64 Best Regards From lists at alteeve.ca Thu Jan 29 05:27:55 2015 From: lists at alteeve.ca (Digimer) Date: Thu, 29 Jan 2015 00:27:55 -0500 Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: References: Message-ID: <54C9C4DB.3060008@alteeve.ca> On 28/01/15 11:50 PM, cluster lab wrote: > Hi, > > In a two node cluster, i received to different result from "qemu-img > check" on just one file: > > node1 # qemu-img check VMStorage/x.qcow2 > No errors were found on the image. > > Node2 # qemu-img check VMStorage/x.qcow2 > qemu-img: Could not open 'VMStorage/x.qcow2" > > All other files are OK, and the cluster works properly. 
> What is the problem? > > ==== > Packages: > kernel: 2.6.32-431.5.1.el6.x86_64 > GFS2: gfs2-utils-3.0.12.1-23.el6.x86_64 > corosync: corosync-1.4.1-17.el6.x86_64 > > Best Regards What does 'dlm_tool ls' show? -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From cluster.labs at gmail.com Thu Jan 29 05:34:19 2015 From: cluster.labs at gmail.com (cluster lab) Date: Thu, 29 Jan 2015 09:04:19 +0330 Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: <54C9C4DB.3060008@alteeve.ca> References: <54C9C4DB.3060008@alteeve.ca> Message-ID: Node2: # dlm_tool ls dlm lockspaces name VMStorage3 id 0xb26438a2 flags 0x00000008 fs_reg change member 2 joined 1 remove 0 failed 0 seq 1,1 members 1 2 name VMStorage2 id 0xab7f09e3 flags 0x00000008 fs_reg change member 2 joined 1 remove 0 failed 0 seq 1,1 members 1 2 name VMStorage1 id 0x80525a20 flags 0x00000008 fs_reg change member 2 joined 1 remove 0 failed 0 seq 1,1 members 1 2 =========================================== Node1: # dlm_tool ls dlm lockspaces name VMStorage3 id 0xb26438a2 flags 0x00000008 fs_reg change member 2 joined 1 remove 0 failed 0 seq 2,2 members 1 2 name VMStorage2 id 0xab7f09e3 flags 0x00000008 fs_reg change member 2 joined 1 remove 0 failed 0 seq 2,2 members 1 2 name VMStorage1 id 0x80525a20 flags 0x00000008 fs_reg change member 2 joined 1 remove 0 failed 0 seq 2,2 members 1 2 On Thu, Jan 29, 2015 at 8:57 AM, Digimer wrote: > On 28/01/15 11:50 PM, cluster lab wrote: >> >> Hi, >> >> In a two node cluster, i received to different result from "qemu-img >> check" on just one file: >> >> node1 # qemu-img check VMStorage/x.qcow2 >> No errors were found on the image. >> >> Node2 # qemu-img check VMStorage/x.qcow2 >> qemu-img: Could not open 'VMStorage/x.qcow2" >> >> All other files are OK, and the cluster works properly. >> What is the problem? >> >> ==== >> Packages: >> kernel: 2.6.32-431.5.1.el6.x86_64 >> GFS2: gfs2-utils-3.0.12.1-23.el6.x86_64 >> corosync: corosync-1.4.1-17.el6.x86_64 >> >> Best Regards > > > What does 'dlm_tool ls' show? > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ > What if the cure for cancer is trapped in the mind of a person without > access to education? 
> > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From cluster.labs at gmail.com Thu Jan 29 05:39:55 2015 From: cluster.labs at gmail.com (cluster lab) Date: Thu, 29 Jan 2015 09:09:55 +0330 Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: <54C9C4DB.3060008@alteeve.ca> References: <54C9C4DB.3060008@alteeve.ca> Message-ID: Node2: # dlm_tool ls dlm lockspaces name VMStorage3 id 0xb26438a2 flags 0x00000008 fs_reg change member 2 joined 1 remove 0 failed 0 seq 1,1 members 1 2 name VMStorage2 id 0xab7f09e3 flags 0x00000008 fs_reg change member 2 joined 1 remove 0 failed 0 seq 1,1 members 1 2 name VMStorage1 id 0x80525a20 flags 0x00000008 fs_reg change member 2 joined 1 remove 0 failed 0 seq 1,1 members 1 2 =========================================== Node1: # dlm_tool ls dlm lockspaces name VMStorage3 id 0xb26438a2 flags 0x00000008 fs_reg change member 2 joined 1 remove 0 failed 0 seq 2,2 members 1 2 name VMStorage2 id 0xab7f09e3 flags 0x00000008 fs_reg change member 2 joined 1 remove 0 failed 0 seq 2,2 members 1 2 name VMStorage1 id 0x80525a20 flags 0x00000008 fs_reg change member 2 joined 1 remove 0 failed 0 seq 2,2 members 1 2 On Thu, Jan 29, 2015 at 8:57 AM, Digimer wrote: > On 28/01/15 11:50 PM, cluster lab wrote: >> >> Hi, >> >> In a two node cluster, i received to different result from "qemu-img >> check" on just one file: >> >> node1 # qemu-img check VMStorage/x.qcow2 >> No errors were found on the image. >> >> Node2 # qemu-img check VMStorage/x.qcow2 >> qemu-img: Could not open 'VMStorage/x.qcow2" >> >> All other files are OK, and the cluster works properly. >> What is the problem? >> >> ==== >> Packages: >> kernel: 2.6.32-431.5.1.el6.x86_64 >> GFS2: gfs2-utils-3.0.12.1-23.el6.x86_64 >> corosync: corosync-1.4.1-17.el6.x86_64 >> >> Best Regards > > > What does 'dlm_tool ls' show? > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ > What if the cure for cancer is trapped in the mind of a person without > access to education? > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From lists at alteeve.ca Thu Jan 29 05:55:31 2015 From: lists at alteeve.ca (Digimer) Date: Thu, 29 Jan 2015 00:55:31 -0500 Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: References: <54C9C4DB.3060008@alteeve.ca> Message-ID: <54C9CB53.4070407@alteeve.ca> That looks OK. Can you touch a file from one node and see it on the other and vice-versa? Is there anything in either node's log files when you run 'qemu-img check'? 
On 29/01/15 12:34 AM, cluster lab wrote: > Node2: # dlm_tool ls > dlm lockspaces > name VMStorage3 > id 0xb26438a2 > flags 0x00000008 fs_reg > change member 2 joined 1 remove 0 failed 0 seq 1,1 > members 1 2 > > name VMStorage2 > id 0xab7f09e3 > flags 0x00000008 fs_reg > change member 2 joined 1 remove 0 failed 0 seq 1,1 > members 1 2 > > name VMStorage1 > id 0x80525a20 > flags 0x00000008 fs_reg > change member 2 joined 1 remove 0 failed 0 seq 1,1 > members 1 2 > =========================================== > Node1: # dlm_tool ls > dlm lockspaces > name VMStorage3 > id 0xb26438a2 > flags 0x00000008 fs_reg > change member 2 joined 1 remove 0 failed 0 seq 2,2 > members 1 2 > > name VMStorage2 > id 0xab7f09e3 > flags 0x00000008 fs_reg > change member 2 joined 1 remove 0 failed 0 seq 2,2 > members 1 2 > > name VMStorage1 > id 0x80525a20 > flags 0x00000008 fs_reg > change member 2 joined 1 remove 0 failed 0 seq 2,2 > members 1 2 > > On Thu, Jan 29, 2015 at 8:57 AM, Digimer wrote: >> On 28/01/15 11:50 PM, cluster lab wrote: >>> >>> Hi, >>> >>> In a two node cluster, i received to different result from "qemu-img >>> check" on just one file: >>> >>> node1 # qemu-img check VMStorage/x.qcow2 >>> No errors were found on the image. >>> >>> Node2 # qemu-img check VMStorage/x.qcow2 >>> qemu-img: Could not open 'VMStorage/x.qcow2" >>> >>> All other files are OK, and the cluster works properly. >>> What is the problem? >>> >>> ==== >>> Packages: >>> kernel: 2.6.32-431.5.1.el6.x86_64 >>> GFS2: gfs2-utils-3.0.12.1-23.el6.x86_64 >>> corosync: corosync-1.4.1-17.el6.x86_64 >>> >>> Best Regards >> >> >> What does 'dlm_tool ls' show? >> >> -- >> Digimer >> Papers and Projects: https://alteeve.ca/w/ >> What if the cure for cancer is trapped in the mind of a person without >> access to education? >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From cluster.labs at gmail.com Thu Jan 29 09:46:57 2015 From: cluster.labs at gmail.com (cluster lab) Date: Thu, 29 Jan 2015 13:16:57 +0330 Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: <54C9CB53.4070407@alteeve.ca> References: <54C9C4DB.3060008@alteeve.ca> <54C9CB53.4070407@alteeve.ca> Message-ID: Node2 # touch /VMStorage1/test Node1 # ls /VMStorage1/test /VMStorage1/test Node1 # rm /VMStorage1/test rm: remove regular empty file `/VMStorage1/test'? y ==== Node1 # touch /VMStorage1/test Node2 # ls /VMStorage1/test /VMStorage1/test Node2 # rm /VMStorage1/test rm: remove regular empty file `/VMStorage1/test'? y No Problem .... On Thu, Jan 29, 2015 at 9:25 AM, Digimer wrote: > That looks OK. Can you touch a file from one node and see it on the other > and vice-versa? Is there anything in either node's log files when you run > 'qemu-img check'? 
> > > On 29/01/15 12:34 AM, cluster lab wrote: >> >> Node2: # dlm_tool ls >> dlm lockspaces >> name VMStorage3 >> id 0xb26438a2 >> flags 0x00000008 fs_reg >> change member 2 joined 1 remove 0 failed 0 seq 1,1 >> members 1 2 >> >> name VMStorage2 >> id 0xab7f09e3 >> flags 0x00000008 fs_reg >> change member 2 joined 1 remove 0 failed 0 seq 1,1 >> members 1 2 >> >> name VMStorage1 >> id 0x80525a20 >> flags 0x00000008 fs_reg >> change member 2 joined 1 remove 0 failed 0 seq 1,1 >> members 1 2 >> =========================================== >> Node1: # dlm_tool ls >> dlm lockspaces >> name VMStorage3 >> id 0xb26438a2 >> flags 0x00000008 fs_reg >> change member 2 joined 1 remove 0 failed 0 seq 2,2 >> members 1 2 >> >> name VMStorage2 >> id 0xab7f09e3 >> flags 0x00000008 fs_reg >> change member 2 joined 1 remove 0 failed 0 seq 2,2 >> members 1 2 >> >> name VMStorage1 >> id 0x80525a20 >> flags 0x00000008 fs_reg >> change member 2 joined 1 remove 0 failed 0 seq 2,2 >> members 1 2 >> >> On Thu, Jan 29, 2015 at 8:57 AM, Digimer wrote: >>> >>> On 28/01/15 11:50 PM, cluster lab wrote: >>>> >>>> >>>> Hi, >>>> >>>> In a two node cluster, i received to different result from "qemu-img >>>> check" on just one file: >>>> >>>> node1 # qemu-img check VMStorage/x.qcow2 >>>> No errors were found on the image. >>>> >>>> Node2 # qemu-img check VMStorage/x.qcow2 >>>> qemu-img: Could not open 'VMStorage/x.qcow2" >>>> >>>> All other files are OK, and the cluster works properly. >>>> What is the problem? >>>> >>>> ==== >>>> Packages: >>>> kernel: 2.6.32-431.5.1.el6.x86_64 >>>> GFS2: gfs2-utils-3.0.12.1-23.el6.x86_64 >>>> corosync: corosync-1.4.1-17.el6.x86_64 >>>> >>>> Best Regards >>> >>> >>> >>> What does 'dlm_tool ls' show? >>> >>> -- >>> Digimer >>> Papers and Projects: https://alteeve.ca/w/ >>> What if the cure for cancer is trapped in the mind of a person without >>> access to education? >>> >>> -- >>> Linux-cluster mailing list >>> Linux-cluster at redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-cluster >> >> > > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ > What if the cure for cancer is trapped in the mind of a person without > access to education? > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From cluster.labs at gmail.com Thu Jan 29 11:35:22 2015 From: cluster.labs at gmail.com (cluster lab) Date: Thu, 29 Jan 2015 15:05:22 +0330 Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: References: <54C9C4DB.3060008@alteeve.ca> <54C9CB53.4070407@alteeve.ca> Message-ID: I have two separate cluster, with this problem, the output of "dlm_tool ls" on other site is: dlm_tool ls dlm lockspaces name VMStorage4 id 0xfd25ae65 flags 0x00000008 fs_reg change member 2 joined 0 remove 1 failed 1 seq 2,2 members 2 3 name VMStorage3 id 0xb26438a2 flags 0x00000008 fs_reg change member 2 joined 0 remove 1 failed 1 seq 2,2 members 2 3 name VMStorage2 id 0xab7f09e3 flags 0x00000008 fs_reg change member 2 joined 0 remove 1 failed 1 seq 2,2 members 2 3 name VMStorage1 id 0x80525a20 flags 0x00000008 fs_reg change member 2 joined 0 remove 1 failed 1 seq 2,2 members 2 3 There is one fail ....!!!! On Thu, Jan 29, 2015 at 1:16 PM, cluster lab wrote: > Node2 # touch /VMStorage1/test > > Node1 # ls /VMStorage1/test > /VMStorage1/test > Node1 # rm /VMStorage1/test > rm: remove regular empty file `/VMStorage1/test'? 
y > > ==== > > Node1 # touch /VMStorage1/test > > Node2 # ls /VMStorage1/test > /VMStorage1/test > Node2 # rm /VMStorage1/test > rm: remove regular empty file `/VMStorage1/test'? y > > No Problem .... > > On Thu, Jan 29, 2015 at 9:25 AM, Digimer wrote: >> That looks OK. Can you touch a file from one node and see it on the other >> and vice-versa? Is there anything in either node's log files when you run >> 'qemu-img check'? >> >> >> On 29/01/15 12:34 AM, cluster lab wrote: >>> >>> Node2: # dlm_tool ls >>> dlm lockspaces >>> name VMStorage3 >>> id 0xb26438a2 >>> flags 0x00000008 fs_reg >>> change member 2 joined 1 remove 0 failed 0 seq 1,1 >>> members 1 2 >>> >>> name VMStorage2 >>> id 0xab7f09e3 >>> flags 0x00000008 fs_reg >>> change member 2 joined 1 remove 0 failed 0 seq 1,1 >>> members 1 2 >>> >>> name VMStorage1 >>> id 0x80525a20 >>> flags 0x00000008 fs_reg >>> change member 2 joined 1 remove 0 failed 0 seq 1,1 >>> members 1 2 >>> =========================================== >>> Node1: # dlm_tool ls >>> dlm lockspaces >>> name VMStorage3 >>> id 0xb26438a2 >>> flags 0x00000008 fs_reg >>> change member 2 joined 1 remove 0 failed 0 seq 2,2 >>> members 1 2 >>> >>> name VMStorage2 >>> id 0xab7f09e3 >>> flags 0x00000008 fs_reg >>> change member 2 joined 1 remove 0 failed 0 seq 2,2 >>> members 1 2 >>> >>> name VMStorage1 >>> id 0x80525a20 >>> flags 0x00000008 fs_reg >>> change member 2 joined 1 remove 0 failed 0 seq 2,2 >>> members 1 2 >>> >>> On Thu, Jan 29, 2015 at 8:57 AM, Digimer wrote: >>>> >>>> On 28/01/15 11:50 PM, cluster lab wrote: >>>>> >>>>> >>>>> Hi, >>>>> >>>>> In a two node cluster, i received to different result from "qemu-img >>>>> check" on just one file: >>>>> >>>>> node1 # qemu-img check VMStorage/x.qcow2 >>>>> No errors were found on the image. >>>>> >>>>> Node2 # qemu-img check VMStorage/x.qcow2 >>>>> qemu-img: Could not open 'VMStorage/x.qcow2" >>>>> >>>>> All other files are OK, and the cluster works properly. >>>>> What is the problem? >>>>> >>>>> ==== >>>>> Packages: >>>>> kernel: 2.6.32-431.5.1.el6.x86_64 >>>>> GFS2: gfs2-utils-3.0.12.1-23.el6.x86_64 >>>>> corosync: corosync-1.4.1-17.el6.x86_64 >>>>> >>>>> Best Regards >>>> >>>> >>>> >>>> What does 'dlm_tool ls' show? >>>> >>>> -- >>>> Digimer >>>> Papers and Projects: https://alteeve.ca/w/ >>>> What if the cure for cancer is trapped in the mind of a person without >>>> access to education? >>>> >>>> -- >>>> Linux-cluster mailing list >>>> Linux-cluster at redhat.com >>>> https://www.redhat.com/mailman/listinfo/linux-cluster >>> >>> >> >> >> -- >> Digimer >> Papers and Projects: https://alteeve.ca/w/ >> What if the cure for cancer is trapped in the mind of a person without >> access to education? >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster From cluster.labs at gmail.com Thu Jan 29 12:11:19 2015 From: cluster.labs at gmail.com (cluster lab) Date: Thu, 29 Jan 2015 15:41:19 +0330 Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: References: <54C9C4DB.3060008@alteeve.ca> <54C9CB53.4070407@alteeve.ca> Message-ID: Additional information may be useful: On Affected Node: # ls -lah FILE -????????? ? ? ? ? ? 
FILE On Thu, Jan 29, 2015 at 3:05 PM, cluster lab wrote: > I have two separate cluster, with this problem, > the output of "dlm_tool ls" on other site is: > > dlm_tool ls > dlm lockspaces > name VMStorage4 > id 0xfd25ae65 > flags 0x00000008 fs_reg > change member 2 joined 0 remove 1 failed 1 seq 2,2 > members 2 3 > > name VMStorage3 > id 0xb26438a2 > flags 0x00000008 fs_reg > change member 2 joined 0 remove 1 failed 1 seq 2,2 > members 2 3 > > name VMStorage2 > id 0xab7f09e3 > flags 0x00000008 fs_reg > change member 2 joined 0 remove 1 failed 1 seq 2,2 > members 2 3 > > name VMStorage1 > id 0x80525a20 > flags 0x00000008 fs_reg > change member 2 joined 0 remove 1 failed 1 seq 2,2 > members 2 3 > > There is one fail ....!!!! > > > On Thu, Jan 29, 2015 at 1:16 PM, cluster lab wrote: >> Node2 # touch /VMStorage1/test >> >> Node1 # ls /VMStorage1/test >> /VMStorage1/test >> Node1 # rm /VMStorage1/test >> rm: remove regular empty file `/VMStorage1/test'? y >> >> ==== >> >> Node1 # touch /VMStorage1/test >> >> Node2 # ls /VMStorage1/test >> /VMStorage1/test >> Node2 # rm /VMStorage1/test >> rm: remove regular empty file `/VMStorage1/test'? y >> >> No Problem .... >> >> On Thu, Jan 29, 2015 at 9:25 AM, Digimer wrote: >>> That looks OK. Can you touch a file from one node and see it on the other >>> and vice-versa? Is there anything in either node's log files when you run >>> 'qemu-img check'? >>> >>> >>> On 29/01/15 12:34 AM, cluster lab wrote: >>>> >>>> Node2: # dlm_tool ls >>>> dlm lockspaces >>>> name VMStorage3 >>>> id 0xb26438a2 >>>> flags 0x00000008 fs_reg >>>> change member 2 joined 1 remove 0 failed 0 seq 1,1 >>>> members 1 2 >>>> >>>> name VMStorage2 >>>> id 0xab7f09e3 >>>> flags 0x00000008 fs_reg >>>> change member 2 joined 1 remove 0 failed 0 seq 1,1 >>>> members 1 2 >>>> >>>> name VMStorage1 >>>> id 0x80525a20 >>>> flags 0x00000008 fs_reg >>>> change member 2 joined 1 remove 0 failed 0 seq 1,1 >>>> members 1 2 >>>> =========================================== >>>> Node1: # dlm_tool ls >>>> dlm lockspaces >>>> name VMStorage3 >>>> id 0xb26438a2 >>>> flags 0x00000008 fs_reg >>>> change member 2 joined 1 remove 0 failed 0 seq 2,2 >>>> members 1 2 >>>> >>>> name VMStorage2 >>>> id 0xab7f09e3 >>>> flags 0x00000008 fs_reg >>>> change member 2 joined 1 remove 0 failed 0 seq 2,2 >>>> members 1 2 >>>> >>>> name VMStorage1 >>>> id 0x80525a20 >>>> flags 0x00000008 fs_reg >>>> change member 2 joined 1 remove 0 failed 0 seq 2,2 >>>> members 1 2 >>>> >>>> On Thu, Jan 29, 2015 at 8:57 AM, Digimer wrote: >>>>> >>>>> On 28/01/15 11:50 PM, cluster lab wrote: >>>>>> >>>>>> >>>>>> Hi, >>>>>> >>>>>> In a two node cluster, i received to different result from "qemu-img >>>>>> check" on just one file: >>>>>> >>>>>> node1 # qemu-img check VMStorage/x.qcow2 >>>>>> No errors were found on the image. >>>>>> >>>>>> Node2 # qemu-img check VMStorage/x.qcow2 >>>>>> qemu-img: Could not open 'VMStorage/x.qcow2" >>>>>> >>>>>> All other files are OK, and the cluster works properly. >>>>>> What is the problem? >>>>>> >>>>>> ==== >>>>>> Packages: >>>>>> kernel: 2.6.32-431.5.1.el6.x86_64 >>>>>> GFS2: gfs2-utils-3.0.12.1-23.el6.x86_64 >>>>>> corosync: corosync-1.4.1-17.el6.x86_64 >>>>>> >>>>>> Best Regards >>>>> >>>>> >>>>> >>>>> What does 'dlm_tool ls' show? >>>>> >>>>> -- >>>>> Digimer >>>>> Papers and Projects: https://alteeve.ca/w/ >>>>> What if the cure for cancer is trapped in the mind of a person without >>>>> access to education? 
>>>>> >>>>> -- >>>>> Linux-cluster mailing list >>>>> Linux-cluster at redhat.com >>>>> https://www.redhat.com/mailman/listinfo/linux-cluster >>>> >>>> >>> >>> >>> -- >>> Digimer >>> Papers and Projects: https://alteeve.ca/w/ >>> What if the cure for cancer is trapped in the mind of a person without >>> access to education? >>> >>> -- >>> Linux-cluster mailing list >>> Linux-cluster at redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-cluster From rpeterso at redhat.com Thu Jan 29 13:10:11 2015 From: rpeterso at redhat.com (Bob Peterson) Date: Thu, 29 Jan 2015 08:10:11 -0500 (EST) Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: References: <54C9C4DB.3060008@alteeve.ca> <54C9CB53.4070407@alteeve.ca> Message-ID: <678887466.3468720.1422537011406.JavaMail.zimbra@redhat.com> ----- Original Message ----- > Additional information may be useful: > > On Affected Node: > # ls -lah FILE > -????????? ? ? ? ? ? FILE > This symptom often means a loss of cluster coherency. Are you using lock_dlm protocol? Bob Peterson Red Hat File Systems From cluster.labs at gmail.com Thu Jan 29 14:55:52 2015 From: cluster.labs at gmail.com (cluster lab) Date: Thu, 29 Jan 2015 18:25:52 +0330 Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: <678887466.3468720.1422537011406.JavaMail.zimbra@redhat.com> References: <54C9C4DB.3060008@alteeve.ca> <54C9CB53.4070407@alteeve.ca> <678887466.3468720.1422537011406.JavaMail.zimbra@redhat.com> Message-ID: Hi Bob, yes, it uses lock_dlm, ... On Thu, Jan 29, 2015 at 4:40 PM, Bob Peterson wrote: > ----- Original Message ----- >> Additional information may be useful: >> >> On Affected Node: >> # ls -lah FILE >> -????????? ? ? ? ? ? FILE >> > > This symptom often means a loss of cluster coherency. > Are you using lock_dlm protocol? > > Bob Peterson > Red Hat File Systems > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From rpeterso at redhat.com Thu Jan 29 15:03:37 2015 From: rpeterso at redhat.com (Bob Peterson) Date: Thu, 29 Jan 2015 10:03:37 -0500 (EST) Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: References: <54C9CB53.4070407@alteeve.ca> <678887466.3468720.1422537011406.JavaMail.zimbra@redhat.com> Message-ID: <149243946.3558706.1422543817506.JavaMail.zimbra@redhat.com> ----- Original Message ----- > Hi Bob, > > yes, it uses lock_dlm, ... > > > On Thu, Jan 29, 2015 at 4:40 PM, Bob Peterson wrote: > > ----- Original Message ----- > >> Additional information may be useful: > >> > >> On Affected Node: > >> # ls -lah FILE > >> -????????? ? ? ? ? ? 
FILE Hi, Try doing the command "stat FILE|grep Inode" from both nodes and see if both nodes come up with the same answer for "Inode:" Regards, Bob Peterson Red Hat File Systems From cluster.labs at gmail.com Thu Jan 29 15:22:45 2015 From: cluster.labs at gmail.com (cluster lab) Date: Thu, 29 Jan 2015 18:52:45 +0330 Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: <149243946.3558706.1422543817506.JavaMail.zimbra@redhat.com> References: <54C9CB53.4070407@alteeve.ca> <678887466.3468720.1422537011406.JavaMail.zimbra@redhat.com> <149243946.3558706.1422543817506.JavaMail.zimbra@redhat.com> Message-ID: On affected node: stat FILE | grep Inode stat: cannot stat `FILE': Input/output error On other node: stat PublicDNS1-OS.qcow2 | grep Inode Device: fd06h/64774d Inode: 267858 Links: 1 On Thu, Jan 29, 2015 at 6:33 PM, Bob Peterson wrote: > ----- Original Message ----- >> Hi Bob, >> >> yes, it uses lock_dlm, ... >> >> >> On Thu, Jan 29, 2015 at 4:40 PM, Bob Peterson wrote: >> > ----- Original Message ----- >> >> Additional information may be useful: >> >> >> >> On Affected Node: >> >> # ls -lah FILE >> >> -????????? ? ? ? ? ? FILE > > Hi, > > Try doing the command "stat FILE|grep Inode" from both nodes and see if > both nodes come up with the same answer for "Inode:" > > Regards, > > Bob Peterson > Red Hat File Systems > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From rpeterso at redhat.com Thu Jan 29 15:34:56 2015 From: rpeterso at redhat.com (Bob Peterson) Date: Thu, 29 Jan 2015 10:34:56 -0500 (EST) Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: References: <678887466.3468720.1422537011406.JavaMail.zimbra@redhat.com> <149243946.3558706.1422543817506.JavaMail.zimbra@redhat.com> Message-ID: <1658385774.3582814.1422545696153.JavaMail.zimbra@redhat.com> ----- Original Message ----- > On affected node: > > stat FILE | grep Inode > stat: cannot stat `FILE': Input/output error > > On other node: > stat PublicDNS1-OS.qcow2 | grep Inode > Device: fd06h/64774d Inode: 267858 Links: 1 Something funky going on. I'd check dmesg for withdraw messages, etc., on the affected node. Regards, Bob Peterson Red Hat File Systems From cluster.labs at gmail.com Sat Jan 31 04:58:08 2015 From: cluster.labs at gmail.com (cluster lab) Date: Sat, 31 Jan 2015 08:28:08 +0330 Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: <1658385774.3582814.1422545696153.JavaMail.zimbra@redhat.com> References: <678887466.3468720.1422537011406.JavaMail.zimbra@redhat.com> <149243946.3558706.1422543817506.JavaMail.zimbra@redhat.com> <1658385774.3582814.1422545696153.JavaMail.zimbra@redhat.com> Message-ID: Hi, There is n't any unusual state or message, Also GFS logs (gfs, dlm) are silent ... Is there any chance to find source of problem? On Thu, Jan 29, 2015 at 7:04 PM, Bob Peterson wrote: > ----- Original Message ----- >> On affected node: >> >> stat FILE | grep Inode >> stat: cannot stat `FILE': Input/output error >> >> On other node: >> stat PublicDNS1-OS.qcow2 | grep Inode >> Device: fd06h/64774d Inode: 267858 Links: 1 > > Something funky going on. > I'd check dmesg for withdraw messages, etc., on the affected node. 
> > Regards, > > Bob Peterson > Red Hat File Systems > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From cluster.labs at gmail.com Sat Jan 31 04:58:08 2015 From: cluster.labs at gmail.com (cluster lab) Date: Sat, 31 Jan 2015 08:28:08 +0330 Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: <1658385774.3582814.1422545696153.JavaMail.zimbra@redhat.com> References: <678887466.3468720.1422537011406.JavaMail.zimbra@redhat.com> <149243946.3558706.1422543817506.JavaMail.zimbra@redhat.com> <1658385774.3582814.1422545696153.JavaMail.zimbra@redhat.com> Message-ID: Hi, There isn't any unusual state or message; also, the GFS logs (gfs, dlm) are silent ... Is there any chance of finding the source of the problem? On Thu, Jan 29, 2015 at 7:04 PM, Bob Peterson wrote: > ----- Original Message ----- >> On affected node: >> >> stat FILE | grep Inode >> stat: cannot stat `FILE': Input/output error >> >> On other node: >> stat PublicDNS1-OS.qcow2 | grep Inode >> Device: fd06h/64774d Inode: 267858 Links: 1 > > Something funky going on. > I'd check dmesg for withdraw messages, etc., on the affected node. > > Regards, > > Bob Peterson > Red Hat File Systems > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From cluster.labs at gmail.com Sat Jan 31 05:10:17 2015 From: cluster.labs at gmail.com (cluster lab) Date: Sat, 31 Jan 2015 08:40:17 +0330 Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: References: <678887466.3468720.1422537011406.JavaMail.zimbra@redhat.com> <149243946.3558706.1422543817506.JavaMail.zimbra@redhat.com> <1658385774.3582814.1422545696153.JavaMail.zimbra@redhat.com> Message-ID: Some more information: The cluster is a three-node cluster; one of its nodes (ID == 1) was fenced because of a network failure ... After the fence, this problem appeared ... On Sat, Jan 31, 2015 at 8:28 AM, cluster lab wrote: > Hi, > > There is n't any unusual state or message, > Also GFS logs (gfs, dlm) are silent ... > > Is there any chance to find source of problem? > > On Thu, Jan 29, 2015 at 7:04 PM, Bob Peterson wrote: >> ----- Original Message ----- >>> On affected node: >>> >>> stat FILE | grep Inode >>> stat: cannot stat `FILE': Input/output error >>> >>> On other node: >>> stat PublicDNS1-OS.qcow2 | grep Inode >>> Device: fd06h/64774d Inode: 267858 Links: 1 >> >> Something funky going on. >> I'd check dmesg for withdraw messages, etc., on the affected node. >> >> Regards, >> >> Bob Peterson >> Red Hat File Systems >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster From lists at alteeve.ca Sat Jan 31 05:40:44 2015 From: lists at alteeve.ca (Digimer) Date: Sat, 31 Jan 2015 00:40:44 -0500 Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: References: <678887466.3468720.1422537011406.JavaMail.zimbra@redhat.com> <149243946.3558706.1422543817506.JavaMail.zimbra@redhat.com> <1658385774.3582814.1422545696153.JavaMail.zimbra@redhat.com> Message-ID: <54CC6ADC.1000305@alteeve.ca> Does the logs show the fence succeeded or failed? Can you please post the logs from the surviving two nodes starting just before the failure until a few minutes after? digimer On 31/01/15 12:10 AM, cluster lab wrote: > Some more information: > > Cluster is a three nodes cluster, > One of its node (ID == 1) fenced because of network failure ... > > After fence this problem borned ... > > > On Sat, Jan 31, 2015 at 8:28 AM, cluster lab wrote: >> Hi, >> >> There is n't any unusual state or message, >> Also GFS logs (gfs, dlm) are silent ... >> >> Is there any chance to find source of problem? >> >> On Thu, Jan 29, 2015 at 7:04 PM, Bob Peterson wrote: >>> ----- Original Message ----- >>>> On affected node: >>>> >>>> stat FILE | grep Inode >>>> stat: cannot stat `FILE': Input/output error >>>> >>>> On other node: >>>> stat PublicDNS1-OS.qcow2 | grep Inode >>>> Device: fd06h/64774d Inode: 267858 Links: 1 >>> >>> Something funky going on. >>> I'd check dmesg for withdraw messages, etc., on the affected node. >>> >>> Regards, >>> >>> Bob Peterson >>> Red Hat File Systems >>> >>> -- >>> Linux-cluster mailing list >>> Linux-cluster at redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-cluster > -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education?
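(A rough sketch of how the log window Digimer asks for above could be collected; the hostnames node2 and node3, the /var/log/messages path, and passwordless SSH between the machines are assumptions for the example, not details given in the thread:

    # pull the cluster-related syslog lines from each surviving node into one file per node
    for n in node2 node3; do
        ssh "$n" "grep -Ei 'fenced|dlm|gfs2|corosync' /var/log/messages" > "cluster-window-$n.log"
    done

On a cman-based cluster, running 'fence_tool ls' on a surviving node should also show whether the fence domain still lists a victim, which is a quick way to confirm that the fence actually completed.)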
From cluster.labs at gmail.com Sat Jan 31 06:51:30 2015 From: cluster.labs at gmail.com (cluster lab) Date: Sat, 31 Jan 2015 10:21:30 +0330 Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: <54CC6ADC.1000305@alteeve.ca> References: <678887466.3468720.1422537011406.JavaMail.zimbra@redhat.com> <149243946.3558706.1422543817506.JavaMail.zimbra@redhat.com> <1658385774.3582814.1422545696153.JavaMail.zimbra@redhat.com> <54CC6ADC.1000305@alteeve.ca> Message-ID: Log messages: Jan 21 17:07:31 node2 corosync[47788]: [TOTEM ] A processor failed, forming new configuration. Jan 21 17:07:43 node2 corosync[47788]: [QUORUM] Members[2]: 2 3 Jan 21 17:07:43 node2 corosync[47788]: [TOTEM ] A processor joined or left the membership and a new membership was formed. Jan 21 17:07:43 node2 kernel: dlm: closing connection to node 1 Jan 21 17:07:43 node2 corosync[47788]: [CPG ] chosen downlist: sender r(0) ip(172........) ; members(old:3 left:1) Jan 21 17:07:43 node2 corosync[47788]: [MAIN ] Completed service synchronization, ready to provide service. Jan 21 17:07:43 node2 fenced[47840]: fencing node node1 Jan 21 17:07:43 node2 kernel: GFS2: fsid=Cluster:VMStorage1.1: jid=0: Trying to acquire journal lock... Jan 21 17:07:43 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Trying to acquire journal lock... Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage1.1: jid=0: Busy Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Looking at journal... Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage3.1: jid=0: Trying to acquire journal lock... Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Trying to acquire journal lock... Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage3.1: jid=0: Busy Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VM.1: jid=0: Trying to acquire journal lock... Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VM.1: jid=0: Busy Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Looking at journal... Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Acquiring the transaction lock... Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Replaying journal... Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Acquiring the transaction lock... Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Replaying journal... Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Replayed 250 of 515 blocks Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Found 12 revoke tags Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Journal replayed in 1s Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Done Jan 21 17:07:58 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Replayed 4260 of 4803 blocks Jan 21 17:07:58 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Found 5 revoke tags Jan 21 17:07:58 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Journal replayed in 1s Jan 21 17:07:58 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Done Jan 21 17:07:31 node3 corosync[51444]: [TOTEM ] A processor failed, forming new configuration. Jan 21 17:07:43 node3 corosync[51444]: [QUORUM] Members[2]: 2 3 Jan 21 17:07:43 node3 corosync[51444]: [TOTEM ] A processor joined or left the membership and a new membership was formed. Jan 21 17:07:43 node3 corosync[51444]: [CPG ] chosen downlist: sender r(0) ip(172......) 
; members(old:3 left:1) Jan 21 17:07:43 node3 corosync[51444]: [MAIN ] Completed service synchronization, ready to provide service. Jan 21 17:07:43 node3 kernel: dlm: closing connection to node 1 Jan 21 17:07:43 node3 fenced[51496]: fencing deferred to node2 Jan 21 17:07:43 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Trying to acquire journal lock... Jan 21 17:07:43 node3 kernel: GFS2: fsid=Cluster:VMStorage2.2: jid=0: Trying to acquire journal lock... Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage2.2: jid=0: Busy Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VM.2: jid=0: Trying to acquire journal lock... Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage3.2: jid=0: Trying to acquire journal lock... Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage3.2: jid=0: Looking at journal... Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Looking at journal... Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VM.2: jid=0: Looking at journal... Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage3.2: jid=0: Done Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage4.2: jid=0: Trying to acquire journal lock... Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage4.2: jid=0: Busy Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Acquiring the transaction lock... Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Replaying journal... Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Replayed 6 of 7 blocks Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Found 1 revoke tags Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Journal replayed in 1s Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Done Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VM.2: jid=0: Done One Question: Accessing to files before acquire journal lock may cause the problem? .... On Sat, Jan 31, 2015 at 9:10 AM, Digimer wrote: > Does the logs show the fence succeeded or failed? Can you please post the > logs from the surviving two nodes starting just before the failure until a > few minutes after? > > digimer > > > On 31/01/15 12:10 AM, cluster lab wrote: >> >> Some more information: >> >> Cluster is a three nodes cluster, >> One of its node (ID == 1) fenced because of network failure ... >> >> After fence this problem borned ... >> >> >> On Sat, Jan 31, 2015 at 8:28 AM, cluster lab >> wrote: >>> >>> Hi, >>> >>> There is n't any unusual state or message, >>> Also GFS logs (gfs, dlm) are silent ... >>> >>> Is there any chance to find source of problem? >>> >>> On Thu, Jan 29, 2015 at 7:04 PM, Bob Peterson >>> wrote: >>>> >>>> ----- Original Message ----- >>>>> >>>>> On affected node: >>>>> >>>>> stat FILE | grep Inode >>>>> stat: cannot stat `FILE': Input/output error >>>>> >>>>> On other node: >>>>> stat PublicDNS1-OS.qcow2 | grep Inode >>>>> Device: fd06h/64774d Inode: 267858 Links: 1 >>>> >>>> >>>> Something funky going on. >>>> I'd check dmesg for withdraw messages, etc., on the affected node. >>>> >>>> Regards, >>>> >>>> Bob Peterson >>>> Red Hat File Systems >>>> >>>> -- >>>> Linux-cluster mailing list >>>> Linux-cluster at redhat.com >>>> https://www.redhat.com/mailman/listinfo/linux-cluster >> >> > > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ > What if the cure for cancer is trapped in the mind of a person without > access to education? 
> > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From cluster.labs at gmail.com Sat Jan 31 06:52:11 2015 From: cluster.labs at gmail.com (cluster lab) Date: Sat, 31 Jan 2015 10:22:11 +0330 Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: References: <678887466.3468720.1422537011406.JavaMail.zimbra@redhat.com> <149243946.3558706.1422543817506.JavaMail.zimbra@redhat.com> <1658385774.3582814.1422545696153.JavaMail.zimbra@redhat.com> <54CC6ADC.1000305@alteeve.ca> Message-ID: Logs as attach ... On Sat, Jan 31, 2015 at 10:21 AM, cluster lab wrote: > Log messages: > > Jan 21 17:07:31 node2 corosync[47788]: [TOTEM ] A processor failed, > forming new configuration. > Jan 21 17:07:43 node2 corosync[47788]: [QUORUM] Members[2]: 2 3 > Jan 21 17:07:43 node2 corosync[47788]: [TOTEM ] A processor joined > or left the membership and a new membership was formed. > Jan 21 17:07:43 node2 kernel: dlm: closing connection to node 1 > Jan 21 17:07:43 node2 corosync[47788]: [CPG ] chosen downlist: > sender r(0) ip(172........) ; members(old:3 left:1) > Jan 21 17:07:43 node2 corosync[47788]: [MAIN ] Completed service > synchronization, ready to provide service. > Jan 21 17:07:43 node2 fenced[47840]: fencing node node1 > Jan 21 17:07:43 node2 kernel: GFS2: fsid=Cluster:VMStorage1.1: jid=0: > Trying to acquire journal lock... > Jan 21 17:07:43 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: > Trying to acquire journal lock... > Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage1.1: jid=0: Busy > Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: > Looking at journal... > Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage3.1: jid=0: > Trying to acquire journal lock... > Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: > Trying to acquire journal lock... > Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage3.1: jid=0: Busy > Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VM.1: jid=0: Trying > to acquire journal lock... > Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VM.1: jid=0: Busy > Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: > Looking at journal... > Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: > Acquiring the transaction lock... > Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: > Replaying journal... > Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: > Acquiring the transaction lock... > Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: > Replaying journal... > Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: > Replayed 250 of 515 blocks > Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: > Found 12 revoke tags > Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: > Journal replayed in 1s > Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Done > Jan 21 17:07:58 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: > Replayed 4260 of 4803 blocks > Jan 21 17:07:58 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: > Found 5 revoke tags > Jan 21 17:07:58 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: > Journal replayed in 1s > Jan 21 17:07:58 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Done > > > Jan 21 17:07:31 node3 corosync[51444]: [TOTEM ] A processor failed, > forming new configuration. 
> Jan 21 17:07:43 node3 corosync[51444]: [QUORUM] Members[2]: 2 3 > Jan 21 17:07:43 node3 corosync[51444]: [TOTEM ] A processor joined > or left the membership and a new membership was formed. > Jan 21 17:07:43 node3 corosync[51444]: [CPG ] chosen downlist: > sender r(0) ip(172......) ; members(old:3 left:1) > Jan 21 17:07:43 node3 corosync[51444]: [MAIN ] Completed service > synchronization, ready to provide service. > Jan 21 17:07:43 node3 kernel: dlm: closing connection to node 1 > Jan 21 17:07:43 node3 fenced[51496]: fencing deferred to node2 > Jan 21 17:07:43 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: > Trying to acquire journal lock... > Jan 21 17:07:43 node3 kernel: GFS2: fsid=Cluster:VMStorage2.2: jid=0: > Trying to acquire journal lock... > Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage2.2: jid=0: Busy > Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VM.2: jid=0: Trying > to acquire journal lock... > Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage3.2: jid=0: > Trying to acquire journal lock... > Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage3.2: jid=0: > Looking at journal... > Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: > Looking at journal... > Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VM.2: jid=0: Looking > at journal... > Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage3.2: jid=0: Done > Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage4.2: jid=0: > Trying to acquire journal lock... > Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage4.2: jid=0: Busy > Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: > Acquiring the transaction lock... > Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: > Replaying journal... > Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: > Replayed 6 of 7 blocks > Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: > Found 1 revoke tags > Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: > Journal replayed in 1s > Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Done > Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VM.2: jid=0: Done > > One Question: Accessing to files before acquire journal lock may cause > the problem? > > .... > > On Sat, Jan 31, 2015 at 9:10 AM, Digimer wrote: >> Does the logs show the fence succeeded or failed? Can you please post the >> logs from the surviving two nodes starting just before the failure until a >> few minutes after? >> >> digimer >> >> >> On 31/01/15 12:10 AM, cluster lab wrote: >>> >>> Some more information: >>> >>> Cluster is a three nodes cluster, >>> One of its node (ID == 1) fenced because of network failure ... >>> >>> After fence this problem borned ... >>> >>> >>> On Sat, Jan 31, 2015 at 8:28 AM, cluster lab >>> wrote: >>>> >>>> Hi, >>>> >>>> There is n't any unusual state or message, >>>> Also GFS logs (gfs, dlm) are silent ... >>>> >>>> Is there any chance to find source of problem? >>>> >>>> On Thu, Jan 29, 2015 at 7:04 PM, Bob Peterson >>>> wrote: >>>>> >>>>> ----- Original Message ----- >>>>>> >>>>>> On affected node: >>>>>> >>>>>> stat FILE | grep Inode >>>>>> stat: cannot stat `FILE': Input/output error >>>>>> >>>>>> On other node: >>>>>> stat PublicDNS1-OS.qcow2 | grep Inode >>>>>> Device: fd06h/64774d Inode: 267858 Links: 1 >>>>> >>>>> >>>>> Something funky going on. >>>>> I'd check dmesg for withdraw messages, etc., on the affected node. 
>>>>> >>>>> Regards, >>>>> >>>>> Bob Peterson >>>>> Red Hat File Systems >>>>> >>>>> -- >>>>> Linux-cluster mailing list >>>>> Linux-cluster at redhat.com >>>>> https://www.redhat.com/mailman/listinfo/linux-cluster >>> >>> >> >> >> -- >> Digimer >> Papers and Projects: https://alteeve.ca/w/ >> What if the cure for cancer is trapped in the mind of a person without >> access to education? >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- Jan 21 17:07:43 ost-pvm2 corosync[47788]: [QUORUM] Members[2]: 2 3 Jan 21 17:07:43 ost-pvm2 corosync[47788]: [TOTEM ] A processor joined or left the membership and a new membership was formed. Jan 21 17:07:43 ost-pvm2 kernel: dlm: closing connection to node 1 Jan 21 17:07:43 ost-pvm2 corosync[47788]: [CPG ] chosen downlist: sender r(0) ip(172.16.40.22) ; members(old:3 left:1) Jan 21 17:07:43 ost-pvm2 corosync[47788]: [MAIN ] Completed service synchronization, ready to provide service. Jan 21 17:07:43 ost-pvm2 fenced[47840]: fencing node ost-pvm1 Jan 21 17:07:43 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage1.1: jid=0: Trying to acquire journal lock... Jan 21 17:07:43 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage2.1: jid=0: Trying to acquire journal lock... Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage1.1: jid=0: Busy Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage2.1: jid=0: Looking at journal... Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage3.1: jid=0: Trying to acquire journal lock... Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage4.1: jid=0: Trying to acquire journal lock... Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage3.1: jid=0: Busy Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:PVM.1: jid=0: Trying to acquire journal lock... Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:PVM.1: jid=0: Busy Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage4.1: jid=0: Looking at journal... Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage2.1: jid=0: Acquiring the transaction lock... Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage2.1: jid=0: Replaying journal... Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage4.1: jid=0: Acquiring the transaction lock... Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage4.1: jid=0: Replaying journal... 
Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage2.1: jid=0: Replayed 250 of 515 blocks Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage2.1: jid=0: Found 12 revoke tags Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage2.1: jid=0: Journal replayed in 1s Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage2.1: jid=0: Done Jan 21 17:07:58 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage4.1: jid=0: Replayed 4260 of 4803 blocks Jan 21 17:07:58 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage4.1: jid=0: Found 5 revoke tags Jan 21 17:07:58 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage4.1: jid=0: Journal replayed in 1s Jan 21 17:07:58 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage4.1: jid=0: Done From lists at alteeve.ca Sat Jan 31 07:20:29 2015 From: lists at alteeve.ca (Digimer) Date: Sat, 31 Jan 2015 02:20:29 -0500 Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: References: <678887466.3468720.1422537011406.JavaMail.zimbra@redhat.com> <149243946.3558706.1422543817506.JavaMail.zimbra@redhat.com> <1658385774.3582814.1422545696153.JavaMail.zimbra@redhat.com> <54CC6ADC.1000305@alteeve.ca> Message-ID: <54CC823D.1080602@alteeve.ca> On 31/01/15 01:52 AM, cluster lab wrote: > Jan 21 17:07:43 ost-pvm2 fenced[47840]: fencing node ost-pvm1 There are no messages about this succeeding or failing... It looks like only 15 seconds' worth of logs. Can you please share the full amount of time I mentioned before, from both nodes? -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From cluster.labs at gmail.com Sat Jan 31 07:52:48 2015 From: cluster.labs at gmail.com (cluster lab) Date: Sat, 31 Jan 2015 11:22:48 +0330 Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: <54CC823D.1080602@alteeve.ca> References: <678887466.3468720.1422537011406.JavaMail.zimbra@redhat.com> <149243946.3558706.1422543817506.JavaMail.zimbra@redhat.com> <1658385774.3582814.1422545696153.JavaMail.zimbra@redhat.com> <54CC6ADC.1000305@alteeve.ca> <54CC823D.1080602@alteeve.ca> Message-ID: Excuse me for the partial logs ... Jan 21 17:07:57 node2 fenced[47840]: fence node1 success All the other logs are about HA of the VMs, ... and I/O errors for these files ... Some new info: this problem occurred for about 4 files; three of them cause an I/O error on node 3, and one of them on node 2 ... On Sat, Jan 31, 2015 at 10:50 AM, Digimer wrote: > On 31/01/15 01:52 AM, cluster lab wrote: >> >> Jan 21 17:07:43 ost-pvm2 fenced[47840]: fencing node ost-pvm1 > > > There are no messages about this succeeding or failing... It looks like only > 15 seconds seconds worth of logs. Can you please share the full amount of > time I mentioned before, from both nodes? > > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ > What if the cure for cancer is trapped in the mind of a person without > access to education? > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster
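(Following up on the stat comparison suggested earlier in the thread, a small loop along these lines makes it easy to see which node disagrees about each of the four affected images; the paths and hostnames below are placeholders, and passwordless SSH between the nodes is assumed:

    # compare what each surviving node reports for each affected image
    for f in /VMStorage1/x.qcow2 /VMStorage2/y.qcow2; do    # replace with the real image paths
        for n in node2 node3; do
            echo "== $n $f"
            ssh "$n" "stat $f 2>&1 | grep -E 'Inode|cannot stat'"
        done
    done

A file that returns 'Input/output error' on one node but a normal Inode line on the other points at stale cached state on the failing node rather than on-disk damage. One workaround that is sometimes tried in this situation is to unmount and remount the affected GFS2 filesystem on the node that sees the error so it drops its cached state, but whether that is safe or sufficient here is a guess, not a confirmed fix.)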