From mgrac at redhat.com Wed Jan 7 14:13:41 2015
From: mgrac at redhat.com (Marek "marx" Grac)
Date: Wed, 07 Jan 2015 15:13:41 +0100
Subject: [Linux-cluster] fence-agents-4.0.14 stable release
Message-ID: <54AD3F15.9050209@redhat.com>

Welcome to the fence-agents 4.0.14 release.

This release includes some new features and several bugfixes:

* fence_zvmip for IBM z/VM was rewritten in Python
* new fence agent for Emerson devices
* fix invalid default ports for fence_eps and fence_amt
* properly escape XML in other fields of metadata
* a lot of refactoring and cleanup

The new source tarball can be downloaded here:

   https://fedorahosted.org/releases/f/e/fence-agents/fence-agents-4.0.14.tar.xz

To report bugs or issues:

   https://bugzilla.redhat.com/

Would you like to meet the cluster team or members of its community? Join us on IRC (irc.freenode.net #linux-cluster) and share your experience with other system administrators or power users.

Thanks and congratulations to everyone who contributed to this milestone.

m,

From vinh.cao at hp.com Wed Jan 7 20:10:33 2015
From: vinh.cao at hp.com (Cao, Vinh)
Date: Wed, 7 Jan 2015 20:10:33 +0000
Subject: [Linux-cluster] needs helps GFS2 on 5 nodes cluster
Message-ID:

Hello Cluster guru,

I'm trying to set up a Red Hat 6.4 cluster with 5 nodes. With two nodes I don't have any issue.

But with 5 nodes, when I run clustat I get 3 nodes online and the other two offline.

When I start one of the offline nodes with 'service cman start', I get:

[root at ustlvcmspxxx ~]# service cman status
corosync is stopped
[root at ustlvcmsp1954 ~]# service cman start
Starting cluster:
   Checking if cluster has been disabled at boot...      [  OK  ]
   Checking Network Manager...                           [  OK  ]
   Global setup...                                       [  OK  ]
   Loading kernel modules...                             [  OK  ]
   Mounting configfs...                                  [  OK  ]
   Starting cman...                                      [  OK  ]
   Waiting for quorum... Timed-out waiting for cluster   [FAILED]
Stopping cluster:
   Leaving fence domain...                               [  OK  ]
   Stopping gfs_controld...                              [  OK  ]
   Stopping dlm_controld...                              [  OK  ]
   Stopping fenced...                                    [  OK  ]
   Stopping cman...                                      [  OK  ]
   Waiting for corosync to shutdown:                     [  OK  ]
   Unloading kernel modules...                           [  OK  ]
   Unmounting configfs...                                [  OK  ]

Can you help?

Thank you,
Vinh

From lists at alteeve.ca Wed Jan 7 20:16:28 2015
From: lists at alteeve.ca (Digimer)
Date: Wed, 07 Jan 2015 15:16:28 -0500
Subject: [Linux-cluster] needs helps GFS2 on 5 nodes cluster
In-Reply-To:
References:
Message-ID: <54AD941C.4070205@alteeve.ca>

My first thought would be to set post_join_delay in cluster.conf.

If that doesn't work, please share your configuration file. Then, with all nodes offline, open a terminal to each node and run 'tail -f -n 0 /var/log/messages'. With that running, start all the nodes and wait for things to settle down, then paste the five nodes' output as well.

Also, 6.4 is pretty old, why not upgrade to 6.6?

digimer

On 07/01/15 03:10 PM, Cao, Vinh wrote:
> Hello Cluster guru,
>
> I'm trying to set up a Red Hat 6.4 cluster with 5 nodes. With two nodes I
> don't have any issue.
>
> But with 5 nodes, when I run clustat I get 3 nodes online and the other
> two offline.
>
> When I start one of the offline nodes with 'service cman start', I get:
>
> [root at ustlvcmspxxx ~]# service cman status
> corosync is stopped
> [root at ustlvcmsp1954 ~]# service cman start
> Starting cluster:
>    Checking if cluster has been disabled at boot...      [  OK  ]
>    Checking Network Manager...                           [  OK  ]
>    Global setup...                                       [  OK  ]
>    Loading kernel modules...                             [  OK  ]
>    Mounting configfs...                                  [  OK  ]
>    Starting cman...                                      [  OK  ]
>    Waiting for quorum... Timed-out waiting for cluster   [FAILED]
> Stopping cluster:
>    Leaving fence domain...                               [  OK  ]
>    Stopping gfs_controld...                              [  OK  ]
>    Stopping dlm_controld...                              [  OK  ]
>    Stopping fenced...                                    [  OK  ]
>    Stopping cman...                                      [  OK  ]
>    Waiting for corosync to shutdown:                     [  OK  ]
>    Unloading kernel modules...                           [  OK  ]
>    Unmounting configfs...                                [  OK  ]
>
> Can you help?
>
> Thank you,
> Vinh

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?

From vinh.cao at hp.com Wed Jan 7 20:39:22 2015
From: vinh.cao at hp.com (Cao, Vinh)
Date: Wed, 7 Jan 2015 20:39:22 +0000
Subject: [Linux-cluster] needs helps GFS2 on 5 nodes cluster
In-Reply-To: <54AD941C.4070205@alteeve.ca>
References: <54AD941C.4070205@alteeve.ca>
Message-ID:

Hello Digimer,

Yes, I would agree with you that RHEL 6.4 is old. We patch monthly, but I'm not sure why these servers are still at 6.4. Most of our systems are 6.6.

Here is my cluster config. All I want is for the cluster to mount GFS2 via /etc/fstab.

root at ustlvcmsp1955 ~]# cat /etc/cluster/cluster.conf

clustat shows:

Cluster Status for p1954_to_p1958 @ Wed Jan  7 15:38:00 2015
Member Status: Quorate

 Member Name        ID   Status
 ------ ----        ---- ------
 ustlvcmsp1954      1    Offline
 ustlvcmsp1955      2    Online, Local
 ustlvcmsp1956      3    Online
 ustlvcmsp1957      4    Offline
 ustlvcmsp1958      5    Online

I need to make them all online, so I can use fencing for mounting the shared disk.

Thanks,
Vinh

-----Original Message-----
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer
Sent: Wednesday, January 07, 2015 3:16 PM
To: linux clustering
Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster

My first thought would be to set post_join_delay in cluster.conf.

If that doesn't work, please share your configuration file. Then, with all nodes offline, open a terminal to each node and run 'tail -f -n 0 /var/log/messages'. With that running, start all the nodes and wait for things to settle down, then paste the five nodes' output as well.

Also, 6.4 is pretty old, why not upgrade to 6.6?

digimer

On 07/01/15 03:10 PM, Cao, Vinh wrote:
> Hello Cluster guru,
>
> I'm trying to set up a Red Hat 6.4 cluster with 5 nodes. With two nodes
> I don't have any issue.
>
> But with 5 nodes, when I ran clustat I got 3 nodes online and the
> other two offline.
>
> When I start one of the offline nodes with 'service cman start', I get:
>
> [root at ustlvcmspxxx ~]# service cman status
> corosync is stopped
> [root at ustlvcmsp1954 ~]# service cman start
> Starting cluster:
>    Checking if cluster has been disabled at boot...      [  OK  ]
>    Checking Network Manager...                           [  OK  ]
>    Global setup...                                       [  OK  ]
>    Loading kernel modules...                             [  OK  ]
>    Mounting configfs...                                  [  OK  ]
>    Starting cman...                                      [  OK  ]
>    Waiting for quorum... Timed-out waiting for cluster   [FAILED]
> Stopping cluster:
>    Leaving fence domain...                               [  OK  ]
>    Stopping gfs_controld...                              [  OK  ]
>    Stopping dlm_controld...                              [  OK  ]
>    Stopping fenced...                                    [  OK  ]
>    Stopping cman...                                      [  OK  ]
>    Waiting for corosync to shutdown:                     [  OK  ]
>    Unloading kernel modules...                           [  OK  ]
>    Unmounting configfs...                                [  OK  ]
>
> Can you help?
>
> Thank you,
> Vinh

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
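For reference, the post_join_delay setting mentioned above lives on the fence_daemon line of cluster.conf. A minimal sketch, assuming a 30-second delay (the actual value is not given in the thread), with config_version bumped so the change is picked up:

   <cluster name="p1954_to_p1958" config_version="2">
     <!-- give joining nodes extra time before fenced gives up on them -->
     <fence_daemon post_join_delay="30"/>
     ...
   </cluster>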
-- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From lists at alteeve.ca Wed Jan 7 20:58:45 2015 From: lists at alteeve.ca (Digimer) Date: Wed, 07 Jan 2015 15:58:45 -0500 Subject: [Linux-cluster] needs helps GFS2 on 5 nodes cluster In-Reply-To: References: <54AD941C.4070205@alteeve.ca> Message-ID: <54AD9E05.2030902@alteeve.ca> On 07/01/15 03:39 PM, Cao, Vinh wrote: > Hello Digimer, > > Yes, I would agrre with you RHEL6.4 is old. We patched monthly, but I'm not sure why these servers are still at 6.4. Most of our system are 6.6. > > Here is my cluster config. All I want is using cluster to have BGFS2 mount via /etc/fstab. > root at ustlvcmsp1955 ~]# cat /etc/cluster/cluster.conf > > > > > > > > > You don't configure the fencing for the nodes... If anything causes a fence, the cluster will lock up (by design). > > > > > > > > > > clustat show: > > Cluster Status for p1954_to_p1958 @ Wed Jan 7 15:38:00 2015 > Member Status: Quorate > > Member Name ID Status > ------ ---- ---- ------ > ustlvcmsp1954 1 Offline > ustlvcmsp1955 2 Online, Local > ustlvcmsp1956 3 Online > ustlvcmsp1957 4 Offline > ustlvcmsp1958 5 Online > > I need to make them all online, so I can use fencing for mounting shared disk. > > Thanks, > Vinh What about the log entries from the start-up? Did you try the post_join_delay config? > -----Original Message----- > From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer > Sent: Wednesday, January 07, 2015 3:16 PM > To: linux clustering > Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster > > My first though would be to set in cluster.conf. > > If that doesn't work, please share your configuration file. Then, with all nodes offline, open a terminal to each node and run 'tail -f -n 0 /var/log/messages'. With that running, start all the nodes and wait for things to settle down, then paste the five nodes' output as well. > > Also, 6.4 is pretty old, why not upgrade to 6.6? > > digimer > > On 07/01/15 03:10 PM, Cao, Vinh wrote: >> Hello Cluster guru, >> >> I'm trying to setup Redhat 6.4 OS cluster with 5 nodes. With two nodes >> I don't have any issue. >> >> But with 5 nodes, when I ran clustat I got 3 nodes online and the >> other two off line. >> >> When I start the one that are off line. Service cman start. I got: >> >> [root at ustlvcmspxxx ~]# service cman status >> >> corosync is stopped >> >> [root at ustlvcmsp1954 ~]# service cman start >> >> Starting cluster: >> >> Checking if cluster has been disabled at boot... [ OK ] >> >> Checking Network Manager... [ OK ] >> >> Global setup... [ OK ] >> >> Loading kernel modules... [ OK ] >> >> Mounting configfs... [ OK ] >> >> Starting cman... [ OK ] >> >> Waiting for quorum... Timed-out waiting for cluster >> >> [FAILED] >> >> Stopping cluster: >> >> Leaving fence domain... [ OK ] >> >> Stopping gfs_controld... [ OK ] >> >> Stopping dlm_controld... [ OK ] >> >> Stopping fenced... [ OK ] >> >> Stopping cman... [ OK ] >> >> Waiting for corosync to shutdown: [ OK ] >> >> Unloading kernel modules... [ OK ] >> >> Unmounting configfs... [ OK ] >> >> Can you help? >> >> Thank you, >> >> Vinh >> >> >> > > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? 
> > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From vinh.cao at hp.com Wed Jan 7 21:29:14 2015 From: vinh.cao at hp.com (Cao, Vinh) Date: Wed, 7 Jan 2015 21:29:14 +0000 Subject: [Linux-cluster] needs helps GFS2 on 5 nodes cluster In-Reply-To: <54AD9E05.2030902@alteeve.ca> References: <54AD941C.4070205@alteeve.ca> <54AD9E05.2030902@alteeve.ca> Message-ID: Hi Digimer, Here is from the logs: [root at ustlvcmsp1954 ~]# tail -f /var/log/messages Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync profile loading service Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Using quorum provider quorum_cman Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine. Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [TOTEM ] A processor joined or left the membership and a new membership was formed. Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1 Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1 Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [CPG ] chosen downlist: sender r(0) ip(10.30.197.108) ; members(old:0 left:0) Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Completed service synchronization, ready to provide service. Jan 7 16:14:01 ustlvcmsp1954 rgmanager[8099]: Waiting for quorum to form Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Unloading all Corosync service engines. Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync extended virtual synchrony service Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync configuration service Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01 Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster config database access v1.01 Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync profile loading service Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: openais checkpoint service B.01.01 Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync CMAN membership service 2.90 Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster quorum service v0.1 Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:2055. Jan 7 16:15:06 ustlvcmsp1954 rgmanager[8099]: Quorum formed Then it die at: Starting cman... [ OK ] Waiting for quorum... Timed-out waiting for cluster [FAILED] Yes, I did made changes with: the problem is still there. One thing I don't know why cluster is looking for quorum? I did have any disk quorum setup in cluster.conf file. Any helps can I get appreciated. 
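A quick way to see exactly what cman is waiting for is to query the vote counts on a node where corosync is still running, for example (field names vary slightly between versions):

   cman_tool status | grep -i -e quorum -e votes
   cman_tool nodes

With five one-vote nodes, quorum is 3 votes, so a node started on its own will sit at "Waiting for quorum..." until at least two more members join.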
Vinh -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer Sent: Wednesday, January 07, 2015 3:59 PM To: linux clustering Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster On 07/01/15 03:39 PM, Cao, Vinh wrote: > Hello Digimer, > > Yes, I would agrre with you RHEL6.4 is old. We patched monthly, but I'm not sure why these servers are still at 6.4. Most of our system are 6.6. > > Here is my cluster config. All I want is using cluster to have BGFS2 mount via /etc/fstab. > root at ustlvcmsp1955 ~]# cat /etc/cluster/cluster.conf version="1.0"?> > > > > > > > You don't configure the fencing for the nodes... If anything causes a fence, the cluster will lock up (by design). > > > > > > > > > > clustat show: > > Cluster Status for p1954_to_p1958 @ Wed Jan 7 15:38:00 2015 Member > Status: Quorate > > Member Name ID Status > ------ ---- ---- ------ > ustlvcmsp1954 1 Offline > ustlvcmsp1955 2 Online, Local > ustlvcmsp1956 3 Online > ustlvcmsp1957 4 Offline > ustlvcmsp1958 5 Online > > I need to make them all online, so I can use fencing for mounting shared disk. > > Thanks, > Vinh What about the log entries from the start-up? Did you try the post_join_delay config? > -----Original Message----- > From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer > Sent: Wednesday, January 07, 2015 3:16 PM > To: linux clustering > Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster > > My first though would be to set in cluster.conf. > > If that doesn't work, please share your configuration file. Then, with all nodes offline, open a terminal to each node and run 'tail -f -n 0 /var/log/messages'. With that running, start all the nodes and wait for things to settle down, then paste the five nodes' output as well. > > Also, 6.4 is pretty old, why not upgrade to 6.6? > > digimer > > On 07/01/15 03:10 PM, Cao, Vinh wrote: >> Hello Cluster guru, >> >> I'm trying to setup Redhat 6.4 OS cluster with 5 nodes. With two nodes >> I don't have any issue. >> >> But with 5 nodes, when I ran clustat I got 3 nodes online and the >> other two off line. >> >> When I start the one that are off line. Service cman start. I got: >> >> [root at ustlvcmspxxx ~]# service cman status >> >> corosync is stopped >> >> [root at ustlvcmsp1954 ~]# service cman start >> >> Starting cluster: >> >> Checking if cluster has been disabled at boot... [ OK ] >> >> Checking Network Manager... [ OK ] >> >> Global setup... [ OK ] >> >> Loading kernel modules... [ OK ] >> >> Mounting configfs... [ OK ] >> >> Starting cman... [ OK ] >> >> Waiting for quorum... Timed-out waiting for cluster >> >> [FAILED] >> >> Stopping cluster: >> >> Leaving fence domain... [ OK ] >> >> Stopping gfs_controld... [ OK ] >> >> Stopping dlm_controld... [ OK ] >> >> Stopping fenced... [ OK ] >> >> Stopping cman... [ OK ] >> >> Waiting for corosync to shutdown: [ OK ] >> >> Unloading kernel modules... [ OK ] >> >> Unmounting configfs... [ OK ] >> >> Can you help? >> >> Thank you, >> >> Vinh >> >> >> > > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? 
-- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From lists at alteeve.ca Wed Jan 7 21:33:16 2015 From: lists at alteeve.ca (Digimer) Date: Wed, 07 Jan 2015 16:33:16 -0500 Subject: [Linux-cluster] needs helps GFS2 on 5 nodes cluster In-Reply-To: References: <54AD941C.4070205@alteeve.ca> <54AD9E05.2030902@alteeve.ca> Message-ID: <54ADA61C.2020509@alteeve.ca> Quorum is enabled by default. I need to see the entire logs from all five nodes, as I mentioned in the first email. Please disable cman from starting on boot, configure fencing properly and then reboot all nodes cleanly. Start the 'tail -f -n 0 /var/log/messages' on all five nodes, then in another window, start cman on all five nodes. When things settle down, copy/paste all the log output please. On 07/01/15 04:29 PM, Cao, Vinh wrote: > Hi Digimer, > > Here is from the logs: > [root at ustlvcmsp1954 ~]# tail -f /var/log/messages > Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync profile loading service > Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Using quorum provider quorum_cman > Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 > Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine. > Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [TOTEM ] A processor joined or left the membership and a new membership was formed. > Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1 > Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1 > Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [CPG ] chosen downlist: sender r(0) ip(10.30.197.108) ; members(old:0 left:0) > Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Completed service synchronization, ready to provide service. > Jan 7 16:14:01 ustlvcmsp1954 rgmanager[8099]: Waiting for quorum to form > Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Unloading all Corosync service engines. > Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync extended virtual synchrony service > Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync configuration service > Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01 > Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster config database access v1.01 > Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync profile loading service > Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: openais checkpoint service B.01.01 > Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync CMAN membership service 2.90 > Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster quorum service v0.1 > Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:2055. > Jan 7 16:15:06 ustlvcmsp1954 rgmanager[8099]: Quorum formed > > Then it die at: > Starting cman... [ OK ] > Waiting for quorum... Timed-out waiting for cluster > [FAILED] > > Yes, I did made changes with: the problem is still there. One thing I don't know why cluster is looking for quorum? > I did have any disk quorum setup in cluster.conf file. > > Any helps can I get appreciated. 
> > Vinh > > -----Original Message----- > From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer > Sent: Wednesday, January 07, 2015 3:59 PM > To: linux clustering > Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster > > On 07/01/15 03:39 PM, Cao, Vinh wrote: >> Hello Digimer, >> >> Yes, I would agrre with you RHEL6.4 is old. We patched monthly, but I'm not sure why these servers are still at 6.4. Most of our system are 6.6. >> >> Here is my cluster config. All I want is using cluster to have BGFS2 mount via /etc/fstab. >> root at ustlvcmsp1955 ~]# cat /etc/cluster/cluster.conf > version="1.0"?> >> >> >> >> >> >> >> > > You don't configure the fencing for the nodes... If anything causes a fence, the cluster will lock up (by design). > >> >> >> >> >> >> >> >> >> >> clustat show: >> >> Cluster Status for p1954_to_p1958 @ Wed Jan 7 15:38:00 2015 Member >> Status: Quorate >> >> Member Name ID Status >> ------ ---- ---- ------ >> ustlvcmsp1954 1 Offline >> ustlvcmsp1955 2 Online, Local >> ustlvcmsp1956 3 Online >> ustlvcmsp1957 4 Offline >> ustlvcmsp1958 5 Online >> >> I need to make them all online, so I can use fencing for mounting shared disk. >> >> Thanks, >> Vinh > > What about the log entries from the start-up? Did you try the post_join_delay config? > > >> -----Original Message----- >> From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer >> Sent: Wednesday, January 07, 2015 3:16 PM >> To: linux clustering >> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster >> >> My first though would be to set in cluster.conf. >> >> If that doesn't work, please share your configuration file. Then, with all nodes offline, open a terminal to each node and run 'tail -f -n 0 /var/log/messages'. With that running, start all the nodes and wait for things to settle down, then paste the five nodes' output as well. >> >> Also, 6.4 is pretty old, why not upgrade to 6.6? >> >> digimer >> >> On 07/01/15 03:10 PM, Cao, Vinh wrote: >>> Hello Cluster guru, >>> >>> I'm trying to setup Redhat 6.4 OS cluster with 5 nodes. With two nodes >>> I don't have any issue. >>> >>> But with 5 nodes, when I ran clustat I got 3 nodes online and the >>> other two off line. >>> >>> When I start the one that are off line. Service cman start. I got: >>> >>> [root at ustlvcmspxxx ~]# service cman status >>> >>> corosync is stopped >>> >>> [root at ustlvcmsp1954 ~]# service cman start >>> >>> Starting cluster: >>> >>> Checking if cluster has been disabled at boot... [ OK ] >>> >>> Checking Network Manager... [ OK ] >>> >>> Global setup... [ OK ] >>> >>> Loading kernel modules... [ OK ] >>> >>> Mounting configfs... [ OK ] >>> >>> Starting cman... [ OK ] >>> >>> Waiting for quorum... Timed-out waiting for cluster >>> >>> [FAILED] >>> >>> Stopping cluster: >>> >>> Leaving fence domain... [ OK ] >>> >>> Stopping gfs_controld... [ OK ] >>> >>> Stopping dlm_controld... [ OK ] >>> >>> Stopping fenced... [ OK ] >>> >>> Stopping cman... [ OK ] >>> >>> Waiting for corosync to shutdown: [ OK ] >>> >>> Unloading kernel modules... [ OK ] >>> >>> Unmounting configfs... [ OK ] >>> >>> Can you help? >>> >>> Thank you, >>> >>> Vinh >>> >>> >>> >> >> >> -- >> Digimer >> Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? 
>> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> > > -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From vinh.cao at hp.com Wed Jan 7 22:32:46 2015 From: vinh.cao at hp.com (Cao, Vinh) Date: Wed, 7 Jan 2015 22:32:46 +0000 Subject: [Linux-cluster] needs helps GFS2 on 5 nodes cluster In-Reply-To: <54ADA61C.2020509@alteeve.ca> References: <54AD941C.4070205@alteeve.ca> <54AD9E05.2030902@alteeve.ca> <54ADA61C.2020509@alteeve.ca> Message-ID: Hi Digimer, Yes, I just did. Looks like they are failing. I'm not sure why that is. Please see the attachment for all servers log. By the way, I do appreciated all the helps I can get. Vinh -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer Sent: Wednesday, January 07, 2015 4:33 PM To: linux clustering Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster Quorum is enabled by default. I need to see the entire logs from all five nodes, as I mentioned in the first email. Please disable cman from starting on boot, configure fencing properly and then reboot all nodes cleanly. Start the 'tail -f -n 0 /var/log/messages' on all five nodes, then in another window, start cman on all five nodes. When things settle down, copy/paste all the log output please. On 07/01/15 04:29 PM, Cao, Vinh wrote: > Hi Digimer, > > Here is from the logs: > [root at ustlvcmsp1954 ~]# tail -f /var/log/messages > Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync profile loading service > Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Using quorum provider quorum_cman > Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 > Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine. > Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [TOTEM ] A processor joined or left the membership and a new membership was formed. > Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1 > Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1 > Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [CPG ] chosen downlist: sender r(0) ip(10.30.197.108) ; members(old:0 left:0) > Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Completed service synchronization, ready to provide service. > Jan 7 16:14:01 ustlvcmsp1954 rgmanager[8099]: Waiting for quorum to form > Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Unloading all Corosync service engines. 
> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync extended virtual synchrony service > Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync configuration service > Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01 > Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster config database access v1.01 > Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync profile loading service > Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: openais checkpoint service B.01.01 > Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync CMAN membership service 2.90 > Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster quorum service v0.1 > Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:2055. > Jan 7 16:15:06 ustlvcmsp1954 rgmanager[8099]: Quorum formed > > Then it die at: > Starting cman... [ OK ] > Waiting for quorum... Timed-out waiting for cluster > [FAILED] > > Yes, I did made changes with: the problem is still there. One thing I don't know why cluster is looking for quorum? > I did have any disk quorum setup in cluster.conf file. > > Any helps can I get appreciated. > > Vinh > > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer > Sent: Wednesday, January 07, 2015 3:59 PM > To: linux clustering > Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster > > On 07/01/15 03:39 PM, Cao, Vinh wrote: >> Hello Digimer, >> >> Yes, I would agrre with you RHEL6.4 is old. We patched monthly, but I'm not sure why these servers are still at 6.4. Most of our system are 6.6. >> >> Here is my cluster config. All I want is using cluster to have BGFS2 mount via /etc/fstab. >> root at ustlvcmsp1955 ~]# cat /etc/cluster/cluster.conf > version="1.0"?> >> >> >> >> >> >> >> > > You don't configure the fencing for the nodes... If anything causes a fence, the cluster will lock up (by design). > >> >> >> >> >> >> >> >> >> >> clustat show: >> >> Cluster Status for p1954_to_p1958 @ Wed Jan 7 15:38:00 2015 Member >> Status: Quorate >> >> Member Name ID Status >> ------ ---- ---- ------ >> ustlvcmsp1954 1 Offline >> ustlvcmsp1955 2 Online, Local >> ustlvcmsp1956 3 Online >> ustlvcmsp1957 4 Offline >> ustlvcmsp1958 5 Online >> >> I need to make them all online, so I can use fencing for mounting shared disk. >> >> Thanks, >> Vinh > > What about the log entries from the start-up? Did you try the post_join_delay config? > > >> -----Original Message----- >> From: linux-cluster-bounces at redhat.com >> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer >> Sent: Wednesday, January 07, 2015 3:16 PM >> To: linux clustering >> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster >> >> My first though would be to set in cluster.conf. >> >> If that doesn't work, please share your configuration file. Then, with all nodes offline, open a terminal to each node and run 'tail -f -n 0 /var/log/messages'. With that running, start all the nodes and wait for things to settle down, then paste the five nodes' output as well. >> >> Also, 6.4 is pretty old, why not upgrade to 6.6? 
>> >> digimer >> >> On 07/01/15 03:10 PM, Cao, Vinh wrote: >>> Hello Cluster guru, >>> >>> I'm trying to setup Redhat 6.4 OS cluster with 5 nodes. With two >>> nodes I don't have any issue. >>> >>> But with 5 nodes, when I ran clustat I got 3 nodes online and the >>> other two off line. >>> >>> When I start the one that are off line. Service cman start. I got: >>> >>> [root at ustlvcmspxxx ~]# service cman status >>> >>> corosync is stopped >>> >>> [root at ustlvcmsp1954 ~]# service cman start >>> >>> Starting cluster: >>> >>> Checking if cluster has been disabled at boot... [ OK ] >>> >>> Checking Network Manager... [ OK ] >>> >>> Global setup... [ OK ] >>> >>> Loading kernel modules... [ OK ] >>> >>> Mounting configfs... [ OK ] >>> >>> Starting cman... [ OK ] >>> >>> Waiting for quorum... Timed-out waiting for cluster >>> >>> >>> [FAILED] >>> >>> Stopping cluster: >>> >>> Leaving fence domain... [ OK ] >>> >>> Stopping gfs_controld... [ OK ] >>> >>> Stopping dlm_controld... [ OK ] >>> >>> Stopping fenced... [ OK ] >>> >>> Stopping cman... [ OK ] >>> >>> Waiting for corosync to shutdown: [ OK ] >>> >>> Unloading kernel modules... [ OK ] >>> >>> Unmounting configfs... [ OK ] >>> >>> Can you help? >>> >>> Thank you, >>> >>> Vinh >>> >>> >>> >> >> >> -- >> Digimer >> Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> > > -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: 5_nodes_cluster_fails.txt URL: From lists at alteeve.ca Wed Jan 7 22:49:14 2015 From: lists at alteeve.ca (Digimer) Date: Wed, 07 Jan 2015 17:49:14 -0500 Subject: [Linux-cluster] needs helps GFS2 on 5 nodes cluster In-Reply-To: References: <54AD941C.4070205@alteeve.ca> <54AD9E05.2030902@alteeve.ca> <54ADA61C.2020509@alteeve.ca> Message-ID: <54ADB7EA.8000300@alteeve.ca> Did you configure fencing properly? On 07/01/15 05:32 PM, Cao, Vinh wrote: > Hi Digimer, > > Yes, I just did. Looks like they are failing. I'm not sure why that is. > Please see the attachment for all servers log. > > By the way, I do appreciated all the helps I can get. > > Vinh > > -----Original Message----- > From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer > Sent: Wednesday, January 07, 2015 4:33 PM > To: linux clustering > Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster > > Quorum is enabled by default. I need to see the entire logs from all five nodes, as I mentioned in the first email. Please disable cman from starting on boot, configure fencing properly and then reboot all nodes cleanly. Start the 'tail -f -n 0 /var/log/messages' on all five nodes, then in another window, start cman on all five nodes. When things settle down, copy/paste all the log output please. 
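As a sketch of that procedure (hostnames taken from the clustat output earlier in the thread; assumes root ssh access from a workstation):

   # terminal 1, on each node: watch the logs while the cluster forms
   tail -f -n 0 /var/log/messages

   # terminal 2: start cman on all five nodes at roughly the same time
   for n in ustlvcmsp1954 ustlvcmsp1955 ustlvcmsp1956 ustlvcmsp1957 ustlvcmsp1958; do
       ssh root@"$n" 'service cman start' &
   done
   wait

Starting them together matters here: 'service cman start' blocks waiting for quorum, so nodes brought up one at a time can hit exactly the timeout shown above.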
> > On 07/01/15 04:29 PM, Cao, Vinh wrote: >> Hi Digimer, >> >> Here is from the logs: >> [root at ustlvcmsp1954 ~]# tail -f /var/log/messages >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync profile loading service >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Using quorum provider quorum_cman >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine. >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [TOTEM ] A processor joined or left the membership and a new membership was formed. >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1 >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1 >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [CPG ] chosen downlist: sender r(0) ip(10.30.197.108) ; members(old:0 left:0) >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Completed service synchronization, ready to provide service. >> Jan 7 16:14:01 ustlvcmsp1954 rgmanager[8099]: Waiting for quorum to form >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Unloading all Corosync service engines. >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync extended virtual synchrony service >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync configuration service >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01 >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster config database access v1.01 >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync profile loading service >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: openais checkpoint service B.01.01 >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync CMAN membership service 2.90 >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster quorum service v0.1 >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:2055. >> Jan 7 16:15:06 ustlvcmsp1954 rgmanager[8099]: Quorum formed >> >> Then it die at: >> Starting cman... [ OK ] >> Waiting for quorum... Timed-out waiting for cluster >> [FAILED] >> >> Yes, I did made changes with: the problem is still there. One thing I don't know why cluster is looking for quorum? >> I did have any disk quorum setup in cluster.conf file. >> >> Any helps can I get appreciated. >> >> Vinh >> >> -----Original Message----- >> From: linux-cluster-bounces at redhat.com >> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer >> Sent: Wednesday, January 07, 2015 3:59 PM >> To: linux clustering >> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster >> >> On 07/01/15 03:39 PM, Cao, Vinh wrote: >>> Hello Digimer, >>> >>> Yes, I would agrre with you RHEL6.4 is old. We patched monthly, but I'm not sure why these servers are still at 6.4. Most of our system are 6.6. >>> >>> Here is my cluster config. All I want is using cluster to have BGFS2 mount via /etc/fstab. >>> root at ustlvcmsp1955 ~]# cat /etc/cluster/cluster.conf >> version="1.0"?> >>> >>> >>> >>> >>> >>> >>> >> >> You don't configure the fencing for the nodes... 
If anything causes a fence, the cluster will lock up (by design). >> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> clustat show: >>> >>> Cluster Status for p1954_to_p1958 @ Wed Jan 7 15:38:00 2015 Member >>> Status: Quorate >>> >>> Member Name ID Status >>> ------ ---- ---- ------ >>> ustlvcmsp1954 1 Offline >>> ustlvcmsp1955 2 Online, Local >>> ustlvcmsp1956 3 Online >>> ustlvcmsp1957 4 Offline >>> ustlvcmsp1958 5 Online >>> >>> I need to make them all online, so I can use fencing for mounting shared disk. >>> >>> Thanks, >>> Vinh >> >> What about the log entries from the start-up? Did you try the post_join_delay config? >> >> >>> -----Original Message----- >>> From: linux-cluster-bounces at redhat.com >>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer >>> Sent: Wednesday, January 07, 2015 3:16 PM >>> To: linux clustering >>> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster >>> >>> My first though would be to set in cluster.conf. >>> >>> If that doesn't work, please share your configuration file. Then, with all nodes offline, open a terminal to each node and run 'tail -f -n 0 /var/log/messages'. With that running, start all the nodes and wait for things to settle down, then paste the five nodes' output as well. >>> >>> Also, 6.4 is pretty old, why not upgrade to 6.6? >>> >>> digimer >>> >>> On 07/01/15 03:10 PM, Cao, Vinh wrote: >>>> Hello Cluster guru, >>>> >>>> I'm trying to setup Redhat 6.4 OS cluster with 5 nodes. With two >>>> nodes I don't have any issue. >>>> >>>> But with 5 nodes, when I ran clustat I got 3 nodes online and the >>>> other two off line. >>>> >>>> When I start the one that are off line. Service cman start. I got: >>>> >>>> [root at ustlvcmspxxx ~]# service cman status >>>> >>>> corosync is stopped >>>> >>>> [root at ustlvcmsp1954 ~]# service cman start >>>> >>>> Starting cluster: >>>> >>>> Checking if cluster has been disabled at boot... [ OK ] >>>> >>>> Checking Network Manager... [ OK ] >>>> >>>> Global setup... [ OK ] >>>> >>>> Loading kernel modules... [ OK ] >>>> >>>> Mounting configfs... [ OK ] >>>> >>>> Starting cman... [ OK ] >>>> >>>> Waiting for quorum... Timed-out waiting for cluster >>>> >>>> >>>> [FAILED] >>>> >>>> Stopping cluster: >>>> >>>> Leaving fence domain... [ OK ] >>>> >>>> Stopping gfs_controld... [ OK ] >>>> >>>> Stopping dlm_controld... [ OK ] >>>> >>>> Stopping fenced... [ OK ] >>>> >>>> Stopping cman... [ OK ] >>>> >>>> Waiting for corosync to shutdown: [ OK ] >>>> >>>> Unloading kernel modules... [ OK ] >>>> >>>> Unmounting configfs... [ OK ] >>>> >>>> Can you help? >>>> >>>> Thank you, >>>> >>>> Vinh >>>> >>>> >>>> >>> >>> >>> -- >>> Digimer >>> Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? >>> >>> -- >>> Linux-cluster mailing list >>> Linux-cluster at redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-cluster >>> >> >> > > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? 
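For reference, per-node fencing in a RHEL 6 cluster.conf looks roughly like the sketch below. The agent, address and credentials are placeholders; fence_ipmilan is only one common choice, and nothing in the thread says what out-of-band management these hosts actually have:

   <clusternodes>
     <clusternode name="ustlvcmsp1954" nodeid="1">
       <fence>
         <method name="power">
           <device name="ipmi_p1954"/>
         </method>
       </fence>
     </clusternode>
     <!-- repeat for the other four nodes -->
   </clusternodes>
   <fencedevices>
     <fencedevice name="ipmi_p1954" agent="fence_ipmilan" ipaddr="placeholder-bmc-address" login="admin" passwd="secret"/>
   </fencedevices>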
From lists at alteeve.ca Wed Jan 7 22:50:33 2015 From: lists at alteeve.ca (Digimer) Date: Wed, 07 Jan 2015 17:50:33 -0500 Subject: [Linux-cluster] needs helps GFS2 on 5 nodes cluster In-Reply-To: References: <54AD941C.4070205@alteeve.ca> <54AD9E05.2030902@alteeve.ca> <54ADA61C.2020509@alteeve.ca> Message-ID: <54ADB839.9000902@alteeve.ca> It looks like a network problem... Does your (virtual) switch support multicast properly and have you opened up the proper ports in the firewall? On 07/01/15 05:32 PM, Cao, Vinh wrote: > Hi Digimer, > > Yes, I just did. Looks like they are failing. I'm not sure why that is. > Please see the attachment for all servers log. > > By the way, I do appreciated all the helps I can get. > > Vinh > > -----Original Message----- > From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer > Sent: Wednesday, January 07, 2015 4:33 PM > To: linux clustering > Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster > > Quorum is enabled by default. I need to see the entire logs from all five nodes, as I mentioned in the first email. Please disable cman from starting on boot, configure fencing properly and then reboot all nodes cleanly. Start the 'tail -f -n 0 /var/log/messages' on all five nodes, then in another window, start cman on all five nodes. When things settle down, copy/paste all the log output please. > > On 07/01/15 04:29 PM, Cao, Vinh wrote: >> Hi Digimer, >> >> Here is from the logs: >> [root at ustlvcmsp1954 ~]# tail -f /var/log/messages >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync profile loading service >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Using quorum provider quorum_cman >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine. >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [TOTEM ] A processor joined or left the membership and a new membership was formed. >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1 >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1 >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [CPG ] chosen downlist: sender r(0) ip(10.30.197.108) ; members(old:0 left:0) >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Completed service synchronization, ready to provide service. >> Jan 7 16:14:01 ustlvcmsp1954 rgmanager[8099]: Waiting for quorum to form >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Unloading all Corosync service engines. 
>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync extended virtual synchrony service >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync configuration service >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01 >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster config database access v1.01 >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync profile loading service >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: openais checkpoint service B.01.01 >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync CMAN membership service 2.90 >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster quorum service v0.1 >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:2055. >> Jan 7 16:15:06 ustlvcmsp1954 rgmanager[8099]: Quorum formed >> >> Then it die at: >> Starting cman... [ OK ] >> Waiting for quorum... Timed-out waiting for cluster >> [FAILED] >> >> Yes, I did made changes with: the problem is still there. One thing I don't know why cluster is looking for quorum? >> I did have any disk quorum setup in cluster.conf file. >> >> Any helps can I get appreciated. >> >> Vinh >> >> -----Original Message----- >> From: linux-cluster-bounces at redhat.com >> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer >> Sent: Wednesday, January 07, 2015 3:59 PM >> To: linux clustering >> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster >> >> On 07/01/15 03:39 PM, Cao, Vinh wrote: >>> Hello Digimer, >>> >>> Yes, I would agrre with you RHEL6.4 is old. We patched monthly, but I'm not sure why these servers are still at 6.4. Most of our system are 6.6. >>> >>> Here is my cluster config. All I want is using cluster to have BGFS2 mount via /etc/fstab. >>> root at ustlvcmsp1955 ~]# cat /etc/cluster/cluster.conf >> version="1.0"?> >>> >>> >>> >>> >>> >>> >>> >> >> You don't configure the fencing for the nodes... If anything causes a fence, the cluster will lock up (by design). >> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> clustat show: >>> >>> Cluster Status for p1954_to_p1958 @ Wed Jan 7 15:38:00 2015 Member >>> Status: Quorate >>> >>> Member Name ID Status >>> ------ ---- ---- ------ >>> ustlvcmsp1954 1 Offline >>> ustlvcmsp1955 2 Online, Local >>> ustlvcmsp1956 3 Online >>> ustlvcmsp1957 4 Offline >>> ustlvcmsp1958 5 Online >>> >>> I need to make them all online, so I can use fencing for mounting shared disk. >>> >>> Thanks, >>> Vinh >> >> What about the log entries from the start-up? Did you try the post_join_delay config? >> >> >>> -----Original Message----- >>> From: linux-cluster-bounces at redhat.com >>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer >>> Sent: Wednesday, January 07, 2015 3:16 PM >>> To: linux clustering >>> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster >>> >>> My first though would be to set in cluster.conf. >>> >>> If that doesn't work, please share your configuration file. Then, with all nodes offline, open a terminal to each node and run 'tail -f -n 0 /var/log/messages'. With that running, start all the nodes and wait for things to settle down, then paste the five nodes' output as well. 
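On the firewall side of that question, the ports normally opened between RHEL 6 cluster nodes are UDP 5404-5405 for corosync/cman and TCP 21064 for the DLM (needed by GFS2 and clvmd). A sketch, to be restricted to the actual cluster subnet and saved afterwards:

   iptables -I INPUT -p udp --dport 5404:5405 -j ACCEPT
   iptables -I INPUT -p tcp --dport 21064 -j ACCEPT
   service iptables save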
>>> >>> Also, 6.4 is pretty old, why not upgrade to 6.6? >>> >>> digimer >>> >>> On 07/01/15 03:10 PM, Cao, Vinh wrote: >>>> Hello Cluster guru, >>>> >>>> I'm trying to setup Redhat 6.4 OS cluster with 5 nodes. With two >>>> nodes I don't have any issue. >>>> >>>> But with 5 nodes, when I ran clustat I got 3 nodes online and the >>>> other two off line. >>>> >>>> When I start the one that are off line. Service cman start. I got: >>>> >>>> [root at ustlvcmspxxx ~]# service cman status >>>> >>>> corosync is stopped >>>> >>>> [root at ustlvcmsp1954 ~]# service cman start >>>> >>>> Starting cluster: >>>> >>>> Checking if cluster has been disabled at boot... [ OK ] >>>> >>>> Checking Network Manager... [ OK ] >>>> >>>> Global setup... [ OK ] >>>> >>>> Loading kernel modules... [ OK ] >>>> >>>> Mounting configfs... [ OK ] >>>> >>>> Starting cman... [ OK ] >>>> >>>> Waiting for quorum... Timed-out waiting for cluster >>>> >>>> >>>> [FAILED] >>>> >>>> Stopping cluster: >>>> >>>> Leaving fence domain... [ OK ] >>>> >>>> Stopping gfs_controld... [ OK ] >>>> >>>> Stopping dlm_controld... [ OK ] >>>> >>>> Stopping fenced... [ OK ] >>>> >>>> Stopping cman... [ OK ] >>>> >>>> Waiting for corosync to shutdown: [ OK ] >>>> >>>> Unloading kernel modules... [ OK ] >>>> >>>> Unmounting configfs... [ OK ] >>>> >>>> Can you help? >>>> >>>> Thank you, >>>> >>>> Vinh >>>> >>>> >>>> >>> >>> >>> -- >>> Digimer >>> Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? >>> >>> -- >>> Linux-cluster mailing list >>> Linux-cluster at redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-cluster >>> >> >> > > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From vinh.cao at hp.com Thu Jan 8 02:48:07 2015 From: vinh.cao at hp.com (Cao, Vinh) Date: Thu, 8 Jan 2015 02:48:07 +0000 Subject: [Linux-cluster] needs helps GFS2 on 5 nodes cluster In-Reply-To: <54ADB839.9000902@alteeve.ca> References: <54AD941C.4070205@alteeve.ca> <54AD9E05.2030902@alteeve.ca> <54ADA61C.2020509@alteeve.ca> <54ADB839.9000902@alteeve.ca> Message-ID: Hi Digimer, No we're not supporting multicast. I'm trying to use Broadcast, but Redhat support is saying better to use transport=udpu. Which I did set and it is complaining time out. I did try to set broadcast, but somehow it didn't work either. Let me give broadcast a try again. Thanks, Vinh -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer Sent: Wednesday, January 07, 2015 5:51 PM To: linux clustering Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster It looks like a network problem... Does your (virtual) switch support multicast properly and have you opened up the proper ports in the firewall? On 07/01/15 05:32 PM, Cao, Vinh wrote: > Hi Digimer, > > Yes, I just did. Looks like they are failing. I'm not sure why that is. > Please see the attachment for all servers log. > > By the way, I do appreciated all the helps I can get. 
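For completeness, the UDP-unicast transport Red Hat support suggested is enabled in cluster.conf roughly like this (supported from RHEL 6.2 on; bump config_version and restart cman on every node after the change):

   <cman transport="udpu"/>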
> > Vinh > > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer > Sent: Wednesday, January 07, 2015 4:33 PM > To: linux clustering > Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster > > Quorum is enabled by default. I need to see the entire logs from all five nodes, as I mentioned in the first email. Please disable cman from starting on boot, configure fencing properly and then reboot all nodes cleanly. Start the 'tail -f -n 0 /var/log/messages' on all five nodes, then in another window, start cman on all five nodes. When things settle down, copy/paste all the log output please. > > On 07/01/15 04:29 PM, Cao, Vinh wrote: >> Hi Digimer, >> >> Here is from the logs: >> [root at ustlvcmsp1954 ~]# tail -f /var/log/messages >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync profile loading service >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Using quorum provider quorum_cman >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine. >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [TOTEM ] A processor joined or left the membership and a new membership was formed. >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1 >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1 >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [CPG ] chosen downlist: sender r(0) ip(10.30.197.108) ; members(old:0 left:0) >> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Completed service synchronization, ready to provide service. >> Jan 7 16:14:01 ustlvcmsp1954 rgmanager[8099]: Waiting for quorum to form >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Unloading all Corosync service engines. >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync extended virtual synchrony service >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync configuration service >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01 >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster config database access v1.01 >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync profile loading service >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: openais checkpoint service B.01.01 >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync CMAN membership service 2.90 >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster quorum service v0.1 >> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:2055. >> Jan 7 16:15:06 ustlvcmsp1954 rgmanager[8099]: Quorum formed >> >> Then it die at: >> Starting cman... [ OK ] >> Waiting for quorum... Timed-out waiting for cluster >> [FAILED] >> >> Yes, I did made changes with: the problem is still there. One thing I don't know why cluster is looking for quorum? >> I did have any disk quorum setup in cluster.conf file. >> >> Any helps can I get appreciated. 
>> >> Vinh >> >> -----Original Message----- >> From: linux-cluster-bounces at redhat.com >> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer >> Sent: Wednesday, January 07, 2015 3:59 PM >> To: linux clustering >> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster >> >> On 07/01/15 03:39 PM, Cao, Vinh wrote: >>> Hello Digimer, >>> >>> Yes, I would agrre with you RHEL6.4 is old. We patched monthly, but I'm not sure why these servers are still at 6.4. Most of our system are 6.6. >>> >>> Here is my cluster config. All I want is using cluster to have BGFS2 mount via /etc/fstab. >>> root at ustlvcmsp1955 ~]# cat /etc/cluster/cluster.conf >> version="1.0"?> >>> >>> >>> >>> >>> >>> >>> >> >> You don't configure the fencing for the nodes... If anything causes a fence, the cluster will lock up (by design). >> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> clustat show: >>> >>> Cluster Status for p1954_to_p1958 @ Wed Jan 7 15:38:00 2015 Member >>> Status: Quorate >>> >>> Member Name ID Status >>> ------ ---- ---- ------ >>> ustlvcmsp1954 1 Offline >>> ustlvcmsp1955 2 Online, Local >>> ustlvcmsp1956 3 Online >>> ustlvcmsp1957 4 Offline >>> ustlvcmsp1958 5 Online >>> >>> I need to make them all online, so I can use fencing for mounting shared disk. >>> >>> Thanks, >>> Vinh >> >> What about the log entries from the start-up? Did you try the post_join_delay config? >> >> >>> -----Original Message----- >>> From: linux-cluster-bounces at redhat.com >>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer >>> Sent: Wednesday, January 07, 2015 3:16 PM >>> To: linux clustering >>> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster >>> >>> My first though would be to set in cluster.conf. >>> >>> If that doesn't work, please share your configuration file. Then, with all nodes offline, open a terminal to each node and run 'tail -f -n 0 /var/log/messages'. With that running, start all the nodes and wait for things to settle down, then paste the five nodes' output as well. >>> >>> Also, 6.4 is pretty old, why not upgrade to 6.6? >>> >>> digimer >>> >>> On 07/01/15 03:10 PM, Cao, Vinh wrote: >>>> Hello Cluster guru, >>>> >>>> I'm trying to setup Redhat 6.4 OS cluster with 5 nodes. With two >>>> nodes I don't have any issue. >>>> >>>> But with 5 nodes, when I ran clustat I got 3 nodes online and the >>>> other two off line. >>>> >>>> When I start the one that are off line. Service cman start. I got: >>>> >>>> [root at ustlvcmspxxx ~]# service cman status >>>> >>>> corosync is stopped >>>> >>>> [root at ustlvcmsp1954 ~]# service cman start >>>> >>>> Starting cluster: >>>> >>>> Checking if cluster has been disabled at boot... [ OK ] >>>> >>>> Checking Network Manager... [ OK ] >>>> >>>> Global setup... [ OK ] >>>> >>>> Loading kernel modules... [ OK ] >>>> >>>> Mounting configfs... [ OK ] >>>> >>>> Starting cman... [ OK ] >>>> >>>> Waiting for quorum... Timed-out waiting for cluster >>>> >>>> >>>> [FAILED] >>>> >>>> Stopping cluster: >>>> >>>> Leaving fence domain... [ OK ] >>>> >>>> Stopping gfs_controld... [ OK ] >>>> >>>> Stopping dlm_controld... [ OK ] >>>> >>>> Stopping fenced... [ OK ] >>>> >>>> Stopping cman... [ OK ] >>>> >>>> Waiting for corosync to shutdown: [ OK ] >>>> >>>> Unloading kernel modules... [ OK ] >>>> >>>> Unmounting configfs... [ OK ] >>>> >>>> Can you help? 
>>>> >>>> Thank you, >>>> >>>> Vinh >>>> >>>> >>>> >>> >>> >>> -- >>> Digimer >>> Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? >>> >>> -- >>> Linux-cluster mailing list >>> Linux-cluster at redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-cluster >>> >> >> > > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From lists at alteeve.ca Thu Jan 8 07:01:40 2015 From: lists at alteeve.ca (Digimer) Date: Thu, 08 Jan 2015 02:01:40 -0500 Subject: [Linux-cluster] needs helps GFS2 on 5 nodes cluster In-Reply-To: References: <54AD941C.4070205@alteeve.ca> <54AD9E05.2030902@alteeve.ca> <54ADA61C.2020509@alteeve.ca> <54ADB839.9000902@alteeve.ca> Message-ID: <54AE2B54.4030705@alteeve.ca> Please configure fencing. If you don't, it _will_ cause you problems. On 07/01/15 09:48 PM, Cao, Vinh wrote: > Hi Digimer, > > No we're not supporting multicast. I'm trying to use Broadcast, but Redhat support is saying better to use transport=udpu. Which I did set and it is complaining time out. > I did try to set broadcast, but somehow it didn't work either. > > Let me give broadcast a try again. > > Thanks, > Vinh > > -----Original Message----- > From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer > Sent: Wednesday, January 07, 2015 5:51 PM > To: linux clustering > Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster > > It looks like a network problem... Does your (virtual) switch support multicast properly and have you opened up the proper ports in the firewall? > > On 07/01/15 05:32 PM, Cao, Vinh wrote: >> Hi Digimer, >> >> Yes, I just did. Looks like they are failing. I'm not sure why that is. >> Please see the attachment for all servers log. >> >> By the way, I do appreciated all the helps I can get. >> >> Vinh >> >> -----Original Message----- >> From: linux-cluster-bounces at redhat.com >> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer >> Sent: Wednesday, January 07, 2015 4:33 PM >> To: linux clustering >> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster >> >> Quorum is enabled by default. I need to see the entire logs from all five nodes, as I mentioned in the first email. Please disable cman from starting on boot, configure fencing properly and then reboot all nodes cleanly. Start the 'tail -f -n 0 /var/log/messages' on all five nodes, then in another window, start cman on all five nodes. When things settle down, copy/paste all the log output please. 
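To expand on the multicast/firewall question above: this corosync carries cluster traffic over UDP ports 5404 and 5405, so those need to be open between all five nodes, and if the switch really cannot carry multicast, cman on RHEL 6.2+ can be told to use UDP unicast instead. A rough sketch only; the firewall policy and exact cman attributes should be checked against the local setup and cluster.conf(5):

    # on every node: allow cluster traffic from the other members
    iptables -I INPUT -p udp -m udp --dport 5404:5405 -j ACCEPT
    service iptables save

    # multicast sanity test (omping ships with RHEL 6; run it on all nodes at once)
    omping ustlvcmsp1954 ustlvcmsp1955 ustlvcmsp1956 ustlvcmsp1957 ustlvcmsp1958

    # if multicast is simply not available, in cluster.conf either
    #   <cman transport="udpu"/>      (UDP unicast)
    # or
    #   <cman broadcast="yes"/>       (broadcast, as tried later in this thread)

Broadcast only works when all nodes share one subnet, which is worth keeping in mind with five members.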
>> >> On 07/01/15 04:29 PM, Cao, Vinh wrote: >>> Hi Digimer, >>> >>> Here is from the logs: >>> [root at ustlvcmsp1954 ~]# tail -f /var/log/messages >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync profile loading service >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Using quorum provider quorum_cman >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine. >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [TOTEM ] A processor joined or left the membership and a new membership was formed. >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1 >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1 >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [CPG ] chosen downlist: sender r(0) ip(10.30.197.108) ; members(old:0 left:0) >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Completed service synchronization, ready to provide service. >>> Jan 7 16:14:01 ustlvcmsp1954 rgmanager[8099]: Waiting for quorum to form >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Unloading all Corosync service engines. >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync extended virtual synchrony service >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync configuration service >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01 >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster config database access v1.01 >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync profile loading service >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: openais checkpoint service B.01.01 >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync CMAN membership service 2.90 >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster quorum service v0.1 >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:2055. >>> Jan 7 16:15:06 ustlvcmsp1954 rgmanager[8099]: Quorum formed >>> >>> Then it die at: >>> Starting cman... [ OK ] >>> Waiting for quorum... Timed-out waiting for cluster >>> [FAILED] >>> >>> Yes, I did made changes with: the problem is still there. One thing I don't know why cluster is looking for quorum? >>> I did have any disk quorum setup in cluster.conf file. >>> >>> Any helps can I get appreciated. >>> >>> Vinh >>> >>> -----Original Message----- >>> From: linux-cluster-bounces at redhat.com >>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer >>> Sent: Wednesday, January 07, 2015 3:59 PM >>> To: linux clustering >>> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster >>> >>> On 07/01/15 03:39 PM, Cao, Vinh wrote: >>>> Hello Digimer, >>>> >>>> Yes, I would agrre with you RHEL6.4 is old. We patched monthly, but I'm not sure why these servers are still at 6.4. Most of our system are 6.6. >>>> >>>> Here is my cluster config. All I want is using cluster to have BGFS2 mount via /etc/fstab. 
>>>> root at ustlvcmsp1955 ~]# cat /etc/cluster/cluster.conf >>> version="1.0"?> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>> >>> You don't configure the fencing for the nodes... If anything causes a fence, the cluster will lock up (by design). >>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> clustat show: >>>> >>>> Cluster Status for p1954_to_p1958 @ Wed Jan 7 15:38:00 2015 Member >>>> Status: Quorate >>>> >>>> Member Name ID Status >>>> ------ ---- ---- ------ >>>> ustlvcmsp1954 1 Offline >>>> ustlvcmsp1955 2 Online, Local >>>> ustlvcmsp1956 3 Online >>>> ustlvcmsp1957 4 Offline >>>> ustlvcmsp1958 5 Online >>>> >>>> I need to make them all online, so I can use fencing for mounting shared disk. >>>> >>>> Thanks, >>>> Vinh >>> >>> What about the log entries from the start-up? Did you try the post_join_delay config? >>> >>> >>>> -----Original Message----- >>>> From: linux-cluster-bounces at redhat.com >>>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer >>>> Sent: Wednesday, January 07, 2015 3:16 PM >>>> To: linux clustering >>>> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster >>>> >>>> My first though would be to set in cluster.conf. >>>> >>>> If that doesn't work, please share your configuration file. Then, with all nodes offline, open a terminal to each node and run 'tail -f -n 0 /var/log/messages'. With that running, start all the nodes and wait for things to settle down, then paste the five nodes' output as well. >>>> >>>> Also, 6.4 is pretty old, why not upgrade to 6.6? >>>> >>>> digimer >>>> >>>> On 07/01/15 03:10 PM, Cao, Vinh wrote: >>>>> Hello Cluster guru, >>>>> >>>>> I'm trying to setup Redhat 6.4 OS cluster with 5 nodes. With two >>>>> nodes I don't have any issue. >>>>> >>>>> But with 5 nodes, when I ran clustat I got 3 nodes online and the >>>>> other two off line. >>>>> >>>>> When I start the one that are off line. Service cman start. I got: >>>>> >>>>> [root at ustlvcmspxxx ~]# service cman status >>>>> >>>>> corosync is stopped >>>>> >>>>> [root at ustlvcmsp1954 ~]# service cman start >>>>> >>>>> Starting cluster: >>>>> >>>>> Checking if cluster has been disabled at boot... [ OK ] >>>>> >>>>> Checking Network Manager... [ OK ] >>>>> >>>>> Global setup... [ OK ] >>>>> >>>>> Loading kernel modules... [ OK ] >>>>> >>>>> Mounting configfs... [ OK ] >>>>> >>>>> Starting cman... [ OK ] >>>>> >>>>> Waiting for quorum... Timed-out waiting for cluster >>>>> >>>>> >>>>> [FAILED] >>>>> >>>>> Stopping cluster: >>>>> >>>>> Leaving fence domain... [ OK ] >>>>> >>>>> Stopping gfs_controld... [ OK ] >>>>> >>>>> Stopping dlm_controld... [ OK ] >>>>> >>>>> Stopping fenced... [ OK ] >>>>> >>>>> Stopping cman... [ OK ] >>>>> >>>>> Waiting for corosync to shutdown: [ OK ] >>>>> >>>>> Unloading kernel modules... [ OK ] >>>>> >>>>> Unmounting configfs... [ OK ] >>>>> >>>>> Can you help? >>>>> >>>>> Thank you, >>>>> >>>>> Vinh >>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> Digimer >>>> Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? >>>> >>>> -- >>>> Linux-cluster mailing list >>>> Linux-cluster at redhat.com >>>> https://www.redhat.com/mailman/listinfo/linux-cluster >>>> >>> >>> >> >> >> -- >> Digimer >> Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? 
>> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> >> >> > > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From vinh.cao at hp.com Thu Jan 8 12:17:22 2015 From: vinh.cao at hp.com (Cao, Vinh) Date: Thu, 8 Jan 2015 12:17:22 +0000 Subject: [Linux-cluster] needs helps GFS2 on 5 nodes cluster In-Reply-To: <54AE2B54.4030705@alteeve.ca> References: <54AD941C.4070205@alteeve.ca> <54AD9E05.2030902@alteeve.ca> <54ADA61C.2020509@alteeve.ca> <54ADB839.9000902@alteeve.ca> <54AE2B54.4030705@alteeve.ca> Message-ID: Hi Digimer, You are correct. I do need to configure fencing. But before fencing, I need to have these servers become member of cluster first. If they are not member of cluster set. Doesn't matter I try to configure fencing or not. My cluster won't work. Thanks for your help. Vinh -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer Sent: Thursday, January 08, 2015 2:02 AM To: linux clustering Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster Please configure fencing. If you don't, it _will_ cause you problems. On 07/01/15 09:48 PM, Cao, Vinh wrote: > Hi Digimer, > > No we're not supporting multicast. I'm trying to use Broadcast, but Redhat support is saying better to use transport=udpu. Which I did set and it is complaining time out. > I did try to set broadcast, but somehow it didn't work either. > > Let me give broadcast a try again. > > Thanks, > Vinh > > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer > Sent: Wednesday, January 07, 2015 5:51 PM > To: linux clustering > Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster > > It looks like a network problem... Does your (virtual) switch support multicast properly and have you opened up the proper ports in the firewall? > > On 07/01/15 05:32 PM, Cao, Vinh wrote: >> Hi Digimer, >> >> Yes, I just did. Looks like they are failing. I'm not sure why that is. >> Please see the attachment for all servers log. >> >> By the way, I do appreciated all the helps I can get. >> >> Vinh >> >> -----Original Message----- >> From: linux-cluster-bounces at redhat.com >> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer >> Sent: Wednesday, January 07, 2015 4:33 PM >> To: linux clustering >> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster >> >> Quorum is enabled by default. I need to see the entire logs from all five nodes, as I mentioned in the first email. Please disable cman from starting on boot, configure fencing properly and then reboot all nodes cleanly. Start the 'tail -f -n 0 /var/log/messages' on all five nodes, then in another window, start cman on all five nodes. When things settle down, copy/paste all the log output please. 
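Since fencing is the recurring theme here, the shape of a working definition may help: each clusternode gets a <fence> block that points at an entry in <fencedevices>. Everything below is a placeholder sketch (the agent, address and credentials are invented; pick the agent that matches the real hardware, e.g. fence_ipmilan or fence_ilo on HP kit):

    <clusternode name="ustlvcmsp1954" nodeid="1">
      <fence>
        <method name="1">
          <device name="ipmi_p1954"/>
        </method>
      </fence>
    </clusternode>
    <!-- ...repeat one clusternode block, and one matching device, per node... -->
    <fencedevices>
      <fencedevice agent="fence_ipmilan" name="ipmi_p1954"
                   ipaddr="10.30.197.254" login="admin" passwd="secret" lanplus="1"/>
    </fencedevices>

Once the methods are defined and the config is pushed out (ccs_config_validate is a cheap first check), the 'fence_check -f' run suggested further down the thread, or deliberately fencing one node with fence_node, is the way to prove each method actually works before GFS2 depends on it.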
>> >> On 07/01/15 04:29 PM, Cao, Vinh wrote: >>> Hi Digimer, >>> >>> Here is from the logs: >>> [root at ustlvcmsp1954 ~]# tail -f /var/log/messages >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync profile loading service >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Using quorum provider quorum_cman >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine. >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [TOTEM ] A processor joined or left the membership and a new membership was formed. >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1 >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1 >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [CPG ] chosen downlist: sender r(0) ip(10.30.197.108) ; members(old:0 left:0) >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Completed service synchronization, ready to provide service. >>> Jan 7 16:14:01 ustlvcmsp1954 rgmanager[8099]: Waiting for quorum to form >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Unloading all Corosync service engines. >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync extended virtual synchrony service >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync configuration service >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01 >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster config database access v1.01 >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync profile loading service >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: openais checkpoint service B.01.01 >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync CMAN membership service 2.90 >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster quorum service v0.1 >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:2055. >>> Jan 7 16:15:06 ustlvcmsp1954 rgmanager[8099]: Quorum formed >>> >>> Then it die at: >>> Starting cman... [ OK ] >>> Waiting for quorum... Timed-out waiting for cluster >>> >>> [FAILED] >>> >>> Yes, I did made changes with: the problem is still there. One thing I don't know why cluster is looking for quorum? >>> I did have any disk quorum setup in cluster.conf file. >>> >>> Any helps can I get appreciated. >>> >>> Vinh >>> >>> -----Original Message----- >>> From: linux-cluster-bounces at redhat.com >>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer >>> Sent: Wednesday, January 07, 2015 3:59 PM >>> To: linux clustering >>> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster >>> >>> On 07/01/15 03:39 PM, Cao, Vinh wrote: >>>> Hello Digimer, >>>> >>>> Yes, I would agrre with you RHEL6.4 is old. We patched monthly, but I'm not sure why these servers are still at 6.4. Most of our system are 6.6. >>>> >>>> Here is my cluster config. All I want is using cluster to have BGFS2 mount via /etc/fstab. 
>>>> root at ustlvcmsp1955 ~]# cat /etc/cluster/cluster.conf >>> version="1.0"?> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>> >>> You don't configure the fencing for the nodes... If anything causes a fence, the cluster will lock up (by design). >>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> clustat show: >>>> >>>> Cluster Status for p1954_to_p1958 @ Wed Jan 7 15:38:00 2015 Member >>>> Status: Quorate >>>> >>>> Member Name ID Status >>>> ------ ---- ---- ------ >>>> ustlvcmsp1954 1 Offline >>>> ustlvcmsp1955 2 Online, Local >>>> ustlvcmsp1956 3 Online >>>> ustlvcmsp1957 4 Offline >>>> ustlvcmsp1958 5 Online >>>> >>>> I need to make them all online, so I can use fencing for mounting shared disk. >>>> >>>> Thanks, >>>> Vinh >>> >>> What about the log entries from the start-up? Did you try the post_join_delay config? >>> >>> >>>> -----Original Message----- >>>> From: linux-cluster-bounces at redhat.com >>>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer >>>> Sent: Wednesday, January 07, 2015 3:16 PM >>>> To: linux clustering >>>> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster >>>> >>>> My first though would be to set in cluster.conf. >>>> >>>> If that doesn't work, please share your configuration file. Then, with all nodes offline, open a terminal to each node and run 'tail -f -n 0 /var/log/messages'. With that running, start all the nodes and wait for things to settle down, then paste the five nodes' output as well. >>>> >>>> Also, 6.4 is pretty old, why not upgrade to 6.6? >>>> >>>> digimer >>>> >>>> On 07/01/15 03:10 PM, Cao, Vinh wrote: >>>>> Hello Cluster guru, >>>>> >>>>> I'm trying to setup Redhat 6.4 OS cluster with 5 nodes. With two >>>>> nodes I don't have any issue. >>>>> >>>>> But with 5 nodes, when I ran clustat I got 3 nodes online and the >>>>> other two off line. >>>>> >>>>> When I start the one that are off line. Service cman start. I got: >>>>> >>>>> [root at ustlvcmspxxx ~]# service cman status >>>>> >>>>> corosync is stopped >>>>> >>>>> [root at ustlvcmsp1954 ~]# service cman start >>>>> >>>>> Starting cluster: >>>>> >>>>> Checking if cluster has been disabled at boot... [ OK ] >>>>> >>>>> Checking Network Manager... [ OK ] >>>>> >>>>> Global setup... [ OK ] >>>>> >>>>> Loading kernel modules... [ OK ] >>>>> >>>>> Mounting configfs... [ OK ] >>>>> >>>>> Starting cman... [ OK ] >>>>> >>>>> Waiting for quorum... Timed-out waiting for cluster >>>>> >>>>> >>>>> [FAILED] >>>>> >>>>> Stopping cluster: >>>>> >>>>> Leaving fence domain... [ OK ] >>>>> >>>>> Stopping gfs_controld... [ OK ] >>>>> >>>>> Stopping dlm_controld... [ OK ] >>>>> >>>>> Stopping fenced... [ OK ] >>>>> >>>>> Stopping cman... [ OK ] >>>>> >>>>> Waiting for corosync to shutdown: [ OK ] >>>>> >>>>> Unloading kernel modules... [ OK ] >>>>> >>>>> Unmounting configfs... [ OK ] >>>>> >>>>> Can you help? >>>>> >>>>> Thank you, >>>>> >>>>> Vinh >>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> Digimer >>>> Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? >>>> >>>> -- >>>> Linux-cluster mailing list >>>> Linux-cluster at redhat.com >>>> https://www.redhat.com/mailman/listinfo/linux-cluster >>>> >>> >>> >> >> >> -- >> Digimer >> Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? 
>> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> >> >> > > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From vinh.cao at hp.com Thu Jan 8 13:36:01 2015 From: vinh.cao at hp.com (Cao, Vinh) Date: Thu, 8 Jan 2015 13:36:01 +0000 Subject: [Linux-cluster] needs helps GFS2 on 5 nodes cluster In-Reply-To: <54AE2B54.4030705@alteeve.ca> References: <54AD941C.4070205@alteeve.ca> <54AD9E05.2030902@alteeve.ca> <54ADA61C.2020509@alteeve.ca> <54ADB839.9000902@alteeve.ca> <54AE2B54.4030705@alteeve.ca> Message-ID: Hello Digimer, The problem solved. First of all, I just want to thank you your time to stay with me on the issue I have. You are also correct about fencing. But here is it breaking down to. 1. I forgot, when I create the cluster. I didn't join these system in cluster set yet. You know one for a long while I have to setup cluster. I did write documentation about all this. But I still forget to follow it to the teeth. That is what happens. So I have to run: cman_tool join for all nodes. This is the key. 2. after join all nodes into cluster. I'm able to start cman via: service cman start 3. then configure fencing 4. then add static config mount device into /etc/fstab 5. then reboot each node one by one. They are all come back and well. I do have this error in logs: (it mean our multicast is not using. I'm using broadcast for now. But if we have multicast network not blocking, then that error would go away. That is my thought.) [TOTEM ] Received message has invalid digest... ignoring. Jan 8 08:34:33 ustlvcmsp1956 corosync[21194]: [TOTEM ] Invalid packet data Again, thanks for your helps. Vinh -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer Sent: Thursday, January 08, 2015 2:02 AM To: linux clustering Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster Please configure fencing. If you don't, it _will_ cause you problems. On 07/01/15 09:48 PM, Cao, Vinh wrote: > Hi Digimer, > > No we're not supporting multicast. I'm trying to use Broadcast, but Redhat support is saying better to use transport=udpu. Which I did set and it is complaining time out. > I did try to set broadcast, but somehow it didn't work either. > > Let me give broadcast a try again. > > Thanks, > Vinh > > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer > Sent: Wednesday, January 07, 2015 5:51 PM > To: linux clustering > Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster > > It looks like a network problem... Does your (virtual) switch support multicast properly and have you opened up the proper ports in the firewall? > > On 07/01/15 05:32 PM, Cao, Vinh wrote: >> Hi Digimer, >> >> Yes, I just did. Looks like they are failing. I'm not sure why that is. >> Please see the attachment for all servers log. >> >> By the way, I do appreciated all the helps I can get. 
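For anyone who finds this thread later, the fix described above condenses to a short sequence; the device path and mount point below are placeholders, and the broadcast-versus-multicast caveat from earlier still applies:

    # 1. join every node to the cluster (this is what 'service cman start' does internally)
    cman_tool join
    clustat                    # all five members should now show Online

    # 2. make the GFS2 mount persistent - placeholder device and mount point
    echo "/dev/vg_san/lv_vmstorage1  /VMStorage1  gfs2  defaults,noatime  0 0" >> /etc/fstab
    chkconfig gfs2 on          # the gfs2 init script mounts gfs2 fstab entries at boot, after cman
    service gfs2 start         # or simply: mount /VMStorage1

    # 3. before trusting it with VM images
    fence_check -f             # verify every configured fence method responds

The "[TOTEM ] Received message has invalid digest... ignoring" lines are what corosync prints when it receives totem packets it cannot authenticate, often traffic from a different cluster sharing the same broadcast or multicast domain, so it is worth tracking down where those packets come from rather than ignoring them forever.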
>> >> Vinh >> >> -----Original Message----- >> From: linux-cluster-bounces at redhat.com >> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer >> Sent: Wednesday, January 07, 2015 4:33 PM >> To: linux clustering >> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster >> >> Quorum is enabled by default. I need to see the entire logs from all five nodes, as I mentioned in the first email. Please disable cman from starting on boot, configure fencing properly and then reboot all nodes cleanly. Start the 'tail -f -n 0 /var/log/messages' on all five nodes, then in another window, start cman on all five nodes. When things settle down, copy/paste all the log output please. >> >> On 07/01/15 04:29 PM, Cao, Vinh wrote: >>> Hi Digimer, >>> >>> Here is from the logs: >>> [root at ustlvcmsp1954 ~]# tail -f /var/log/messages >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync profile loading service >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Using quorum provider quorum_cman >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine. >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [TOTEM ] A processor joined or left the membership and a new membership was formed. >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1 >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1 >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [CPG ] chosen downlist: sender r(0) ip(10.30.197.108) ; members(old:0 left:0) >>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Completed service synchronization, ready to provide service. >>> Jan 7 16:14:01 ustlvcmsp1954 rgmanager[8099]: Waiting for quorum to form >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Unloading all Corosync service engines. >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync extended virtual synchrony service >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync configuration service >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01 >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster config database access v1.01 >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync profile loading service >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: openais checkpoint service B.01.01 >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync CMAN membership service 2.90 >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster quorum service v0.1 >>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:2055. >>> Jan 7 16:15:06 ustlvcmsp1954 rgmanager[8099]: Quorum formed >>> >>> Then it die at: >>> Starting cman... [ OK ] >>> Waiting for quorum... Timed-out waiting for cluster >>> >>> [FAILED] >>> >>> Yes, I did made changes with: the problem is still there. One thing I don't know why cluster is looking for quorum? >>> I did have any disk quorum setup in cluster.conf file. >>> >>> Any helps can I get appreciated. 
>>> >>> Vinh >>> >>> -----Original Message----- >>> From: linux-cluster-bounces at redhat.com >>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer >>> Sent: Wednesday, January 07, 2015 3:59 PM >>> To: linux clustering >>> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster >>> >>> On 07/01/15 03:39 PM, Cao, Vinh wrote: >>>> Hello Digimer, >>>> >>>> Yes, I would agrre with you RHEL6.4 is old. We patched monthly, but I'm not sure why these servers are still at 6.4. Most of our system are 6.6. >>>> >>>> Here is my cluster config. All I want is using cluster to have BGFS2 mount via /etc/fstab. >>>> root at ustlvcmsp1955 ~]# cat /etc/cluster/cluster.conf >>> version="1.0"?> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>> >>> You don't configure the fencing for the nodes... If anything causes a fence, the cluster will lock up (by design). >>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> clustat show: >>>> >>>> Cluster Status for p1954_to_p1958 @ Wed Jan 7 15:38:00 2015 Member >>>> Status: Quorate >>>> >>>> Member Name ID Status >>>> ------ ---- ---- ------ >>>> ustlvcmsp1954 1 Offline >>>> ustlvcmsp1955 2 Online, Local >>>> ustlvcmsp1956 3 Online >>>> ustlvcmsp1957 4 Offline >>>> ustlvcmsp1958 5 Online >>>> >>>> I need to make them all online, so I can use fencing for mounting shared disk. >>>> >>>> Thanks, >>>> Vinh >>> >>> What about the log entries from the start-up? Did you try the post_join_delay config? >>> >>> >>>> -----Original Message----- >>>> From: linux-cluster-bounces at redhat.com >>>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer >>>> Sent: Wednesday, January 07, 2015 3:16 PM >>>> To: linux clustering >>>> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster >>>> >>>> My first though would be to set in cluster.conf. >>>> >>>> If that doesn't work, please share your configuration file. Then, with all nodes offline, open a terminal to each node and run 'tail -f -n 0 /var/log/messages'. With that running, start all the nodes and wait for things to settle down, then paste the five nodes' output as well. >>>> >>>> Also, 6.4 is pretty old, why not upgrade to 6.6? >>>> >>>> digimer >>>> >>>> On 07/01/15 03:10 PM, Cao, Vinh wrote: >>>>> Hello Cluster guru, >>>>> >>>>> I'm trying to setup Redhat 6.4 OS cluster with 5 nodes. With two >>>>> nodes I don't have any issue. >>>>> >>>>> But with 5 nodes, when I ran clustat I got 3 nodes online and the >>>>> other two off line. >>>>> >>>>> When I start the one that are off line. Service cman start. I got: >>>>> >>>>> [root at ustlvcmspxxx ~]# service cman status >>>>> >>>>> corosync is stopped >>>>> >>>>> [root at ustlvcmsp1954 ~]# service cman start >>>>> >>>>> Starting cluster: >>>>> >>>>> Checking if cluster has been disabled at boot... [ OK ] >>>>> >>>>> Checking Network Manager... [ OK ] >>>>> >>>>> Global setup... [ OK ] >>>>> >>>>> Loading kernel modules... [ OK ] >>>>> >>>>> Mounting configfs... [ OK ] >>>>> >>>>> Starting cman... [ OK ] >>>>> >>>>> Waiting for quorum... Timed-out waiting for cluster >>>>> >>>>> >>>>> [FAILED] >>>>> >>>>> Stopping cluster: >>>>> >>>>> Leaving fence domain... [ OK ] >>>>> >>>>> Stopping gfs_controld... [ OK ] >>>>> >>>>> Stopping dlm_controld... [ OK ] >>>>> >>>>> Stopping fenced... [ OK ] >>>>> >>>>> Stopping cman... [ OK ] >>>>> >>>>> Waiting for corosync to shutdown: [ OK ] >>>>> >>>>> Unloading kernel modules... [ OK ] >>>>> >>>>> Unmounting configfs... [ OK ] >>>>> >>>>> Can you help? 
>>>>> >>>>> Thank you, >>>>> >>>>> Vinh >>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> Digimer >>>> Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? >>>> >>>> -- >>>> Linux-cluster mailing list >>>> Linux-cluster at redhat.com >>>> https://www.redhat.com/mailman/listinfo/linux-cluster >>>> >>> >>> >> >> >> -- >> Digimer >> Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> >> >> > > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From lists at alteeve.ca Thu Jan 8 14:31:01 2015 From: lists at alteeve.ca (Digimer) Date: Thu, 08 Jan 2015 09:31:01 -0500 Subject: [Linux-cluster] needs helps GFS2 on 5 nodes cluster In-Reply-To: References: <54AD941C.4070205@alteeve.ca> <54AD9E05.2030902@alteeve.ca> <54ADA61C.2020509@alteeve.ca> <54ADB839.9000902@alteeve.ca> <54AE2B54.4030705@alteeve.ca> Message-ID: <54AE94A5.6000802@alteeve.ca> On 08/01/15 07:17 AM, Cao, Vinh wrote: > Hi Digimer, > > You are correct. I do need to configure fencing. But before fencing, I need to have these servers become member of cluster first. > If they are not member of cluster set. Doesn't matter I try to configure fencing or not. My cluster won't work. > > Thanks for your help. > Vinh Define the fence methods right from the start. As soon as the cluster forms, the first thing you do is run 'fence_check -f' on all nodes. If there is a problem, fix it. Only then do you add services. -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From lists at alteeve.ca Sun Jan 11 22:19:50 2015 From: lists at alteeve.ca (Digimer) Date: Sun, 11 Jan 2015 17:19:50 -0500 Subject: [Linux-cluster] HA Summit 2015 - plan wiki closed for registration In-Reply-To: <540D853F.3090109@redhat.com> References: <540D853F.3090109@redhat.com> Message-ID: <54B2F706.4090900@alteeve.ca> Spammers got through the captcha, *sigh*. If anyone wants to create an account to edit, please email me off-list and I'll get you setup ASAP. Sorry for the hassle. http://plan.alteeve.ca/index.php/Main_Page -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From lists at alteeve.ca Tue Jan 13 05:31:22 2015 From: lists at alteeve.ca (Digimer) Date: Tue, 13 Jan 2015 00:31:22 -0500 Subject: [Linux-cluster] [Planning] Organizing HA Summit 2015 In-Reply-To: <540D853F.3090109@redhat.com> References: <540D853F.3090109@redhat.com> Message-ID: <54B4ADAA.5080803@alteeve.ca> Hi all, With Fabio away for now, I (and others) are working on the final preparations for the summit. This is your chance to speak up and influence the planning! Objections/suggestions? Speak now please. :) In particular, please raise topics you want to discuss. 
Either add them to the wiki directly or email me and I will update the wiki for you. (Note that registration is closed because of spammers, if you want an account just let me know and I'll open it back up). The plan is; * Informal atmosphere with limited structure to make sure key topics are addressed. Two ways topics will be discussed; ** Someone will guide a given topic they want to raise for ~45 minutes, 15 minutes for Q&A ** "Round-table" style discussion with no one person leading (though it would be nice to have someone taking notes). People presenting are asked not to use slides. Hand-outs are fine and either a white-board or paper flip-board will be available for illustrating ideas and flushing out concepts. The summit will start at 9:00 and go until 17:00. We'll go for a semi-official summit dinner and drinks around 6pm on the 4th (location to be determined). Those staying in Brno are more than welcome to join an informal dinner and drinks (and possibly some sight-seeing, etc) the evening of the 5th. Any concerns/comments/suggestions, please speak up ASAP! -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From lists at alteeve.ca Wed Jan 14 04:38:10 2015 From: lists at alteeve.ca (Digimer) Date: Tue, 13 Jan 2015 23:38:10 -0500 Subject: [Linux-cluster] [Pacemaker] [Linux-HA] [ha-wg-technical] [RFC] Organizing HA Summit 2015 In-Reply-To: References: <540D853F.3090109@redhat.com> <20141208133608.GC18879@redhat.com> <54985122.5020907@alteeve.ca> Message-ID: <54B5F2B2.3050103@alteeve.ca> Woohoo!! Will be very nice to see you. :) I've added you. Can you give me a short sentence to introduce yourself to people who haven't met you? Madi On 13/01/15 11:33 PM, Yusuke Iida wrote: > Hi Digimer, > > I am Iida to participate from NTT along with Mori. > I want you added to the list of participants. > > I'm sorry contact is late. > > Regards, > Yusuke > > 2014-12-23 2:13 GMT+09:00 Digimer : >> It will be very nice to see you again! Will Ikeda-san be there as well? >> >> digimer >> >> On 22/12/14 03:35 AM, Keisuke MORI wrote: >>> >>> Hi all, >>> >>> Really late response but, >>> I will be joining the HA summit, with a few colleagues from NTT. >>> >>> See you guys in Brno, >>> Thanks, >>> >>> >>> 2014-12-08 22:36 GMT+09:00 Jan Pokorn? : >>>> >>>> Hello, >>>> >>>> it occured to me that if you want to use the opportunity and double >>>> as as tourist while being in Brno, it's about the right time to >>>> consider reservations/ticket purchases this early. >>>> At least in some cases it is a must, e.g., Villa Tugendhat: >>>> >>>> >>>> http://rezervace.spilberk.cz/langchange.aspx?mrsname=&languageId=2&returnUrl=%2Flist >>>> >>>> On 08/09/14 12:30 +0200, Fabio M. Di Nitto wrote: >>>>> >>>>> DevConf will start Friday the 6th of Feb 2015 in Red Hat Brno offices. >>>>> >>>>> My suggestion would be to have a 2 days dedicated HA summit the 4th and >>>>> the 5th of February. >>>> >>>> >>>> -- >>>> Jan >>>> >>>> _______________________________________________ >>>> ha-wg-technical mailing list >>>> ha-wg-technical at lists.linux-foundation.org >>>> https://lists.linuxfoundation.org/mailman/listinfo/ha-wg-technical >>>> >>> >>> >>> >> >> >> -- >> Digimer >> Papers and Projects: https://alteeve.ca/w/ >> What if the cure for cancer is trapped in the mind of a person without >> access to education? 
>> _______________________________________________ >> Linux-HA mailing list >> Linux-HA at lists.linux-ha.org >> http://lists.linux-ha.org/mailman/listinfo/linux-ha >> See also: http://linux-ha.org/ReportingProblems > > > -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From jpokorny at redhat.com Mon Jan 26 14:14:38 2015 From: jpokorny at redhat.com (Jan =?utf-8?Q?Pokorn=C3=BD?=) Date: Mon, 26 Jan 2015 15:14:38 +0100 Subject: [Linux-cluster] HA Summit Key-signing Party (was: Organizing HA Summit 2015) In-Reply-To: <54B4ADAA.5080803@alteeve.ca> References: <540D853F.3090109@redhat.com> <54B4ADAA.5080803@alteeve.ca> Message-ID: <20150126141438.GE21558@redhat.com> Hello cluster masters, On 13/01/15 00:31 -0500, Digimer wrote: > Any concerns/comments/suggestions, please speak up ASAP! I'd like to throw a key-signing party as it will be a perfect opportunity to build a web of trust amongst us. If you haven't incorporated OpenPGP to your communication with the world yet, I would recommend at least considering it, even more in the post-Snowden era. You can use it to prove authenticity/integrity of the data you emit (signing; not just for email as is the case with this one, but also for SW releases and more), provide privacy/confidentiality of interchanged data (encryption; again, typical scenario is a private email, e.g., when you responsibly report a vulnerability to the respective maintainers), or both. In case you have no experience with this technology, there are plentiful resources on GnuPG (most renowned FOSS implementation): - https://www.gnupg.org/documentation/howtos.en.html - http://cryptnet.net/fdp/crypto/keysigning_party/en/keysigning_party.html#prep (preparation steps for a key-signing party) - ... To make the verification process as smooth and as little time-consuming as possible, I would stick with a list-based method: http://cryptnet.net/fdp/crypto/keysigning_party/en/keysigning_party.html#list_based and volunteer for a role of a coordinator. What's needed? Once you have a key pair (and provided that you are using GnuPG), please run the following sequence: # figure out the key ID for the identity to be verified; # IDENTITY is either your associated email address/your name # if only single key ID matches, specific key otherwise # (you can use "gpg -K" to select a desired ID at the "sec" line) KEY=$(gpg --with-colons 'IDENTITY' | grep '^pub' | cut -d: -f5) # export the public key to a file that is suitable for exchange gpg --export -a -- $KEY > $KEY # verify that you have an expected data to share gpg --with-fingerprint -- $KEY with IDENTITY adjusted as per the instruction above, and send me the resulting $KEY file, preferably in a signed (or even encrypted[*]) email from an address associated with that very public key of yours. [*] You can find my public key at public keyservers: http://pool.sks-keyservers.net/pks/lookup?op=vindex&search=0x60BCBB4F5CD7F9EF Indeed, the trust in this key should be ephemeral/one-off (e.g., using a temporary keyring, not a universal one before we proceed with the signing :) Timeline? Best if you send me your public keys before 2015-02-02. I will then compile a list of the attendees together with their keys and publish it at https://people.redhat.com/jpokorny/keysigning/2015-ha/ so you can print it out and be ready for the party. Thanks for your cooperation, looking forward to this side-event and hope this will be beneficial to all involved. P.S. 
There's now an opportunity to visit an exhibition of the Bohemian Crown Jewels replicas directly in Brno (sorry, Google Translate only) https://translate.google.com/translate?sl=auto&tl=en&js=y&prev=_t&hl=en&ie=UTF-8&u=http%3A%2F%2Fwww.letohradekbrno.cz%2F%3Fidm%3D55 -- Jan -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 819 bytes Desc: not available URL: From lists at alteeve.ca Mon Jan 26 14:17:24 2015 From: lists at alteeve.ca (Digimer) Date: Mon, 26 Jan 2015 09:17:24 -0500 Subject: [Linux-cluster] [Pacemaker] HA Summit Key-signing Party In-Reply-To: <20150126141438.GE21558@redhat.com> References: <540D853F.3090109@redhat.com> <54B4ADAA.5080803@alteeve.ca> <20150126141438.GE21558@redhat.com> Message-ID: <54C64C74.8020402@alteeve.ca> On 26/01/15 09:14 AM, Jan Pokorn? wrote: > Hello cluster masters, > > On 13/01/15 00:31 -0500, Digimer wrote: >> Any concerns/comments/suggestions, please speak up ASAP! > > I'd like to throw a key-signing party as it will be a perfect > opportunity to build a web of trust amongst us. > > If you haven't incorporated OpenPGP to your communication with the > world yet, I would recommend at least considering it, even more in > the post-Snowden era. You can use it to prove authenticity/integrity > of the data you emit (signing; not just for email as is the case > with this one, but also for SW releases and more), provide > privacy/confidentiality of interchanged data (encryption; again, > typical scenario is a private email, e.g., when you responsibly > report a vulnerability to the respective maintainers), or both. > > In case you have no experience with this technology, there are > plentiful resources on GnuPG (most renowned FOSS implementation): > - https://www.gnupg.org/documentation/howtos.en.html > - http://cryptnet.net/fdp/crypto/keysigning_party/en/keysigning_party.html#prep > (preparation steps for a key-signing party) > - ... > > To make the verification process as smooth and as little > time-consuming as possible, I would stick with a list-based method: > http://cryptnet.net/fdp/crypto/keysigning_party/en/keysigning_party.html#list_based > and volunteer for a role of a coordinator. > > > What's needed? > Once you have a key pair (and provided that you are using GnuPG), please > run the following sequence: > > # figure out the key ID for the identity to be verified; > # IDENTITY is either your associated email address/your name > # if only single key ID matches, specific key otherwise > # (you can use "gpg -K" to select a desired ID at the "sec" line) > KEY=$(gpg --with-colons 'IDENTITY' | grep '^pub' | cut -d: -f5) > > # export the public key to a file that is suitable for exchange > gpg --export -a -- $KEY > $KEY > > # verify that you have an expected data to share > gpg --with-fingerprint -- $KEY > > with IDENTITY adjusted as per the instruction above, and send me the > resulting $KEY file, preferably in a signed (or even encrypted[*]) email > from an address associated with that very public key of yours. > > [*] You can find my public key at public keyservers: > http://pool.sks-keyservers.net/pks/lookup?op=vindex&search=0x60BCBB4F5CD7F9EF > Indeed, the trust in this key should be ephemeral/one-off > (e.g., using a temporary keyring, not a universal one before we proceed > with the signing :) > > > Timeline? > Best if you send me your public keys before 2015-02-02. 
I will then > compile a list of the attendees together with their keys and publish > it at https://people.redhat.com/jpokorny/keysigning/2015-ha/ > so you can print it out and be ready for the party. > > Thanks for your cooperation, looking forward to this side-event and > hope this will be beneficial to all involved. > > > P.S. There's now an opportunity to visit an exhibition of the Bohemian > Crown Jewels replicas directly in Brno (sorry, Google Translate only) > https://translate.google.com/translate?sl=auto&tl=en&js=y&prev=_t&hl=en&ie=UTF-8&u=http%3A%2F%2Fwww.letohradekbrno.cz%2F%3Fidm%3D55 =o, keysigning is a brilliant idea! I can put the keys in the plan wiki, too. -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From jpokorny at redhat.com Tue Jan 27 17:11:11 2015 From: jpokorny at redhat.com (Jan =?utf-8?Q?Pokorn=C3=BD?=) Date: Tue, 27 Jan 2015 18:11:11 +0100 Subject: [Linux-cluster] HA Summit Key-signing Party (was: Organizing HA Summit 2015) In-Reply-To: <20150126141438.GE21558@redhat.com> References: <540D853F.3090109@redhat.com> <54B4ADAA.5080803@alteeve.ca> <20150126141438.GE21558@redhat.com> Message-ID: <20150127171111.GA427@redhat.com> > What's needed? > Once you have a key pair (and provided that you are using GnuPG), please > run the following sequence: > > # figure out the key ID for the identity to be verified; > # IDENTITY is either your associated email address/your name > # if only single key ID matches, specific key otherwise > # (you can use "gpg -K" to select a desired ID at the "sec" line) > KEY=$(gpg --with-colons 'IDENTITY' | grep '^pub' | cut -d: -f5) Oops, sorry, somehow '-k' got lost above ^. Correct version: KEY=$(gpg -k --with-colons 'IDENTITY' | grep '^pub' | cut -d: -f5) > # export the public key to a file that is suitable for exchange > gpg --export -a -- $KEY > $KEY > > # verify that you have an expected data to share > gpg --with-fingerprint -- $KEY -- Jan -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 819 bytes Desc: not available URL: From cluster.labs at gmail.com Thu Jan 29 04:50:36 2015 From: cluster.labs at gmail.com (cluster lab) Date: Thu, 29 Jan 2015 08:20:36 +0330 Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes Message-ID: Hi, In a two node cluster, i received to different result from "qemu-img check" on just one file: node1 # qemu-img check VMStorage/x.qcow2 No errors were found on the image. Node2 # qemu-img check VMStorage/x.qcow2 qemu-img: Could not open 'VMStorage/x.qcow2" All other files are OK, and the cluster works properly. What is the problem? ==== Packages: kernel: 2.6.32-431.5.1.el6.x86_64 GFS2: gfs2-utils-3.0.12.1-23.el6.x86_64 corosync: corosync-1.4.1-17.el6.x86_64 Best Regards From lists at alteeve.ca Thu Jan 29 05:27:55 2015 From: lists at alteeve.ca (Digimer) Date: Thu, 29 Jan 2015 00:27:55 -0500 Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: References: Message-ID: <54C9C4DB.3060008@alteeve.ca> On 28/01/15 11:50 PM, cluster lab wrote: > Hi, > > In a two node cluster, i received to different result from "qemu-img > check" on just one file: > > node1 # qemu-img check VMStorage/x.qcow2 > No errors were found on the image. > > Node2 # qemu-img check VMStorage/x.qcow2 > qemu-img: Could not open 'VMStorage/x.qcow2" > > All other files are OK, and the cluster works properly. 
> What is the problem? > > ==== > Packages: > kernel: 2.6.32-431.5.1.el6.x86_64 > GFS2: gfs2-utils-3.0.12.1-23.el6.x86_64 > corosync: corosync-1.4.1-17.el6.x86_64 > > Best Regards What does 'dlm_tool ls' show? -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From cluster.labs at gmail.com Thu Jan 29 05:34:19 2015 From: cluster.labs at gmail.com (cluster lab) Date: Thu, 29 Jan 2015 09:04:19 +0330 Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: <54C9C4DB.3060008@alteeve.ca> References: <54C9C4DB.3060008@alteeve.ca> Message-ID: Node2: # dlm_tool ls dlm lockspaces name VMStorage3 id 0xb26438a2 flags 0x00000008 fs_reg change member 2 joined 1 remove 0 failed 0 seq 1,1 members 1 2 name VMStorage2 id 0xab7f09e3 flags 0x00000008 fs_reg change member 2 joined 1 remove 0 failed 0 seq 1,1 members 1 2 name VMStorage1 id 0x80525a20 flags 0x00000008 fs_reg change member 2 joined 1 remove 0 failed 0 seq 1,1 members 1 2 =========================================== Node1: # dlm_tool ls dlm lockspaces name VMStorage3 id 0xb26438a2 flags 0x00000008 fs_reg change member 2 joined 1 remove 0 failed 0 seq 2,2 members 1 2 name VMStorage2 id 0xab7f09e3 flags 0x00000008 fs_reg change member 2 joined 1 remove 0 failed 0 seq 2,2 members 1 2 name VMStorage1 id 0x80525a20 flags 0x00000008 fs_reg change member 2 joined 1 remove 0 failed 0 seq 2,2 members 1 2 On Thu, Jan 29, 2015 at 8:57 AM, Digimer wrote: > On 28/01/15 11:50 PM, cluster lab wrote: >> >> Hi, >> >> In a two node cluster, i received to different result from "qemu-img >> check" on just one file: >> >> node1 # qemu-img check VMStorage/x.qcow2 >> No errors were found on the image. >> >> Node2 # qemu-img check VMStorage/x.qcow2 >> qemu-img: Could not open 'VMStorage/x.qcow2" >> >> All other files are OK, and the cluster works properly. >> What is the problem? >> >> ==== >> Packages: >> kernel: 2.6.32-431.5.1.el6.x86_64 >> GFS2: gfs2-utils-3.0.12.1-23.el6.x86_64 >> corosync: corosync-1.4.1-17.el6.x86_64 >> >> Best Regards > > > What does 'dlm_tool ls' show? > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ > What if the cure for cancer is trapped in the mind of a person without > access to education? 
> > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From cluster.labs at gmail.com Thu Jan 29 05:39:55 2015 From: cluster.labs at gmail.com (cluster lab) Date: Thu, 29 Jan 2015 09:09:55 +0330 Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: <54C9C4DB.3060008@alteeve.ca> References: <54C9C4DB.3060008@alteeve.ca> Message-ID: Node2: # dlm_tool ls dlm lockspaces name VMStorage3 id 0xb26438a2 flags 0x00000008 fs_reg change member 2 joined 1 remove 0 failed 0 seq 1,1 members 1 2 name VMStorage2 id 0xab7f09e3 flags 0x00000008 fs_reg change member 2 joined 1 remove 0 failed 0 seq 1,1 members 1 2 name VMStorage1 id 0x80525a20 flags 0x00000008 fs_reg change member 2 joined 1 remove 0 failed 0 seq 1,1 members 1 2 =========================================== Node1: # dlm_tool ls dlm lockspaces name VMStorage3 id 0xb26438a2 flags 0x00000008 fs_reg change member 2 joined 1 remove 0 failed 0 seq 2,2 members 1 2 name VMStorage2 id 0xab7f09e3 flags 0x00000008 fs_reg change member 2 joined 1 remove 0 failed 0 seq 2,2 members 1 2 name VMStorage1 id 0x80525a20 flags 0x00000008 fs_reg change member 2 joined 1 remove 0 failed 0 seq 2,2 members 1 2 On Thu, Jan 29, 2015 at 8:57 AM, Digimer wrote: > On 28/01/15 11:50 PM, cluster lab wrote: >> >> Hi, >> >> In a two node cluster, i received to different result from "qemu-img >> check" on just one file: >> >> node1 # qemu-img check VMStorage/x.qcow2 >> No errors were found on the image. >> >> Node2 # qemu-img check VMStorage/x.qcow2 >> qemu-img: Could not open 'VMStorage/x.qcow2" >> >> All other files are OK, and the cluster works properly. >> What is the problem? >> >> ==== >> Packages: >> kernel: 2.6.32-431.5.1.el6.x86_64 >> GFS2: gfs2-utils-3.0.12.1-23.el6.x86_64 >> corosync: corosync-1.4.1-17.el6.x86_64 >> >> Best Regards > > > What does 'dlm_tool ls' show? > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ > What if the cure for cancer is trapped in the mind of a person without > access to education? > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From lists at alteeve.ca Thu Jan 29 05:55:31 2015 From: lists at alteeve.ca (Digimer) Date: Thu, 29 Jan 2015 00:55:31 -0500 Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: References: <54C9C4DB.3060008@alteeve.ca> Message-ID: <54C9CB53.4070407@alteeve.ca> That looks OK. Can you touch a file from one node and see it on the other and vice-versa? Is there anything in either node's log files when you run 'qemu-img check'? 
On 29/01/15 12:34 AM, cluster lab wrote: > Node2: # dlm_tool ls > dlm lockspaces > name VMStorage3 > id 0xb26438a2 > flags 0x00000008 fs_reg > change member 2 joined 1 remove 0 failed 0 seq 1,1 > members 1 2 > > name VMStorage2 > id 0xab7f09e3 > flags 0x00000008 fs_reg > change member 2 joined 1 remove 0 failed 0 seq 1,1 > members 1 2 > > name VMStorage1 > id 0x80525a20 > flags 0x00000008 fs_reg > change member 2 joined 1 remove 0 failed 0 seq 1,1 > members 1 2 > =========================================== > Node1: # dlm_tool ls > dlm lockspaces > name VMStorage3 > id 0xb26438a2 > flags 0x00000008 fs_reg > change member 2 joined 1 remove 0 failed 0 seq 2,2 > members 1 2 > > name VMStorage2 > id 0xab7f09e3 > flags 0x00000008 fs_reg > change member 2 joined 1 remove 0 failed 0 seq 2,2 > members 1 2 > > name VMStorage1 > id 0x80525a20 > flags 0x00000008 fs_reg > change member 2 joined 1 remove 0 failed 0 seq 2,2 > members 1 2 > > On Thu, Jan 29, 2015 at 8:57 AM, Digimer wrote: >> On 28/01/15 11:50 PM, cluster lab wrote: >>> >>> Hi, >>> >>> In a two node cluster, i received to different result from "qemu-img >>> check" on just one file: >>> >>> node1 # qemu-img check VMStorage/x.qcow2 >>> No errors were found on the image. >>> >>> Node2 # qemu-img check VMStorage/x.qcow2 >>> qemu-img: Could not open 'VMStorage/x.qcow2" >>> >>> All other files are OK, and the cluster works properly. >>> What is the problem? >>> >>> ==== >>> Packages: >>> kernel: 2.6.32-431.5.1.el6.x86_64 >>> GFS2: gfs2-utils-3.0.12.1-23.el6.x86_64 >>> corosync: corosync-1.4.1-17.el6.x86_64 >>> >>> Best Regards >> >> >> What does 'dlm_tool ls' show? >> >> -- >> Digimer >> Papers and Projects: https://alteeve.ca/w/ >> What if the cure for cancer is trapped in the mind of a person without >> access to education? >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From cluster.labs at gmail.com Thu Jan 29 09:46:57 2015 From: cluster.labs at gmail.com (cluster lab) Date: Thu, 29 Jan 2015 13:16:57 +0330 Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: <54C9CB53.4070407@alteeve.ca> References: <54C9C4DB.3060008@alteeve.ca> <54C9CB53.4070407@alteeve.ca> Message-ID: Node2 # touch /VMStorage1/test Node1 # ls /VMStorage1/test /VMStorage1/test Node1 # rm /VMStorage1/test rm: remove regular empty file `/VMStorage1/test'? y ==== Node1 # touch /VMStorage1/test Node2 # ls /VMStorage1/test /VMStorage1/test Node2 # rm /VMStorage1/test rm: remove regular empty file `/VMStorage1/test'? y No Problem .... On Thu, Jan 29, 2015 at 9:25 AM, Digimer wrote: > That looks OK. Can you touch a file from one node and see it on the other > and vice-versa? Is there anything in either node's log files when you run > 'qemu-img check'? 
> > > On 29/01/15 12:34 AM, cluster lab wrote: >> >> Node2: # dlm_tool ls >> dlm lockspaces >> name VMStorage3 >> id 0xb26438a2 >> flags 0x00000008 fs_reg >> change member 2 joined 1 remove 0 failed 0 seq 1,1 >> members 1 2 >> >> name VMStorage2 >> id 0xab7f09e3 >> flags 0x00000008 fs_reg >> change member 2 joined 1 remove 0 failed 0 seq 1,1 >> members 1 2 >> >> name VMStorage1 >> id 0x80525a20 >> flags 0x00000008 fs_reg >> change member 2 joined 1 remove 0 failed 0 seq 1,1 >> members 1 2 >> =========================================== >> Node1: # dlm_tool ls >> dlm lockspaces >> name VMStorage3 >> id 0xb26438a2 >> flags 0x00000008 fs_reg >> change member 2 joined 1 remove 0 failed 0 seq 2,2 >> members 1 2 >> >> name VMStorage2 >> id 0xab7f09e3 >> flags 0x00000008 fs_reg >> change member 2 joined 1 remove 0 failed 0 seq 2,2 >> members 1 2 >> >> name VMStorage1 >> id 0x80525a20 >> flags 0x00000008 fs_reg >> change member 2 joined 1 remove 0 failed 0 seq 2,2 >> members 1 2 >> >> On Thu, Jan 29, 2015 at 8:57 AM, Digimer wrote: >>> >>> On 28/01/15 11:50 PM, cluster lab wrote: >>>> >>>> >>>> Hi, >>>> >>>> In a two node cluster, i received to different result from "qemu-img >>>> check" on just one file: >>>> >>>> node1 # qemu-img check VMStorage/x.qcow2 >>>> No errors were found on the image. >>>> >>>> Node2 # qemu-img check VMStorage/x.qcow2 >>>> qemu-img: Could not open 'VMStorage/x.qcow2" >>>> >>>> All other files are OK, and the cluster works properly. >>>> What is the problem? >>>> >>>> ==== >>>> Packages: >>>> kernel: 2.6.32-431.5.1.el6.x86_64 >>>> GFS2: gfs2-utils-3.0.12.1-23.el6.x86_64 >>>> corosync: corosync-1.4.1-17.el6.x86_64 >>>> >>>> Best Regards >>> >>> >>> >>> What does 'dlm_tool ls' show? >>> >>> -- >>> Digimer >>> Papers and Projects: https://alteeve.ca/w/ >>> What if the cure for cancer is trapped in the mind of a person without >>> access to education? >>> >>> -- >>> Linux-cluster mailing list >>> Linux-cluster at redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-cluster >> >> > > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ > What if the cure for cancer is trapped in the mind of a person without > access to education? > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From cluster.labs at gmail.com Thu Jan 29 11:35:22 2015 From: cluster.labs at gmail.com (cluster lab) Date: Thu, 29 Jan 2015 15:05:22 +0330 Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: References: <54C9C4DB.3060008@alteeve.ca> <54C9CB53.4070407@alteeve.ca> Message-ID: I have two separate cluster, with this problem, the output of "dlm_tool ls" on other site is: dlm_tool ls dlm lockspaces name VMStorage4 id 0xfd25ae65 flags 0x00000008 fs_reg change member 2 joined 0 remove 1 failed 1 seq 2,2 members 2 3 name VMStorage3 id 0xb26438a2 flags 0x00000008 fs_reg change member 2 joined 0 remove 1 failed 1 seq 2,2 members 2 3 name VMStorage2 id 0xab7f09e3 flags 0x00000008 fs_reg change member 2 joined 0 remove 1 failed 1 seq 2,2 members 2 3 name VMStorage1 id 0x80525a20 flags 0x00000008 fs_reg change member 2 joined 0 remove 1 failed 1 seq 2,2 members 2 3 There is one fail ....!!!! On Thu, Jan 29, 2015 at 1:16 PM, cluster lab wrote: > Node2 # touch /VMStorage1/test > > Node1 # ls /VMStorage1/test > /VMStorage1/test > Node1 # rm /VMStorage1/test > rm: remove regular empty file `/VMStorage1/test'? 
y > > ==== > > Node1 # touch /VMStorage1/test > > Node2 # ls /VMStorage1/test > /VMStorage1/test > Node2 # rm /VMStorage1/test > rm: remove regular empty file `/VMStorage1/test'? y > > No Problem .... > > On Thu, Jan 29, 2015 at 9:25 AM, Digimer wrote: >> That looks OK. Can you touch a file from one node and see it on the other >> and vice-versa? Is there anything in either node's log files when you run >> 'qemu-img check'? >> >> >> On 29/01/15 12:34 AM, cluster lab wrote: >>> >>> Node2: # dlm_tool ls >>> dlm lockspaces >>> name VMStorage3 >>> id 0xb26438a2 >>> flags 0x00000008 fs_reg >>> change member 2 joined 1 remove 0 failed 0 seq 1,1 >>> members 1 2 >>> >>> name VMStorage2 >>> id 0xab7f09e3 >>> flags 0x00000008 fs_reg >>> change member 2 joined 1 remove 0 failed 0 seq 1,1 >>> members 1 2 >>> >>> name VMStorage1 >>> id 0x80525a20 >>> flags 0x00000008 fs_reg >>> change member 2 joined 1 remove 0 failed 0 seq 1,1 >>> members 1 2 >>> =========================================== >>> Node1: # dlm_tool ls >>> dlm lockspaces >>> name VMStorage3 >>> id 0xb26438a2 >>> flags 0x00000008 fs_reg >>> change member 2 joined 1 remove 0 failed 0 seq 2,2 >>> members 1 2 >>> >>> name VMStorage2 >>> id 0xab7f09e3 >>> flags 0x00000008 fs_reg >>> change member 2 joined 1 remove 0 failed 0 seq 2,2 >>> members 1 2 >>> >>> name VMStorage1 >>> id 0x80525a20 >>> flags 0x00000008 fs_reg >>> change member 2 joined 1 remove 0 failed 0 seq 2,2 >>> members 1 2 >>> >>> On Thu, Jan 29, 2015 at 8:57 AM, Digimer wrote: >>>> >>>> On 28/01/15 11:50 PM, cluster lab wrote: >>>>> >>>>> >>>>> Hi, >>>>> >>>>> In a two node cluster, i received to different result from "qemu-img >>>>> check" on just one file: >>>>> >>>>> node1 # qemu-img check VMStorage/x.qcow2 >>>>> No errors were found on the image. >>>>> >>>>> Node2 # qemu-img check VMStorage/x.qcow2 >>>>> qemu-img: Could not open 'VMStorage/x.qcow2" >>>>> >>>>> All other files are OK, and the cluster works properly. >>>>> What is the problem? >>>>> >>>>> ==== >>>>> Packages: >>>>> kernel: 2.6.32-431.5.1.el6.x86_64 >>>>> GFS2: gfs2-utils-3.0.12.1-23.el6.x86_64 >>>>> corosync: corosync-1.4.1-17.el6.x86_64 >>>>> >>>>> Best Regards >>>> >>>> >>>> >>>> What does 'dlm_tool ls' show? >>>> >>>> -- >>>> Digimer >>>> Papers and Projects: https://alteeve.ca/w/ >>>> What if the cure for cancer is trapped in the mind of a person without >>>> access to education? >>>> >>>> -- >>>> Linux-cluster mailing list >>>> Linux-cluster at redhat.com >>>> https://www.redhat.com/mailman/listinfo/linux-cluster >>> >>> >> >> >> -- >> Digimer >> Papers and Projects: https://alteeve.ca/w/ >> What if the cure for cancer is trapped in the mind of a person without >> access to education? >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster From cluster.labs at gmail.com Thu Jan 29 12:11:19 2015 From: cluster.labs at gmail.com (cluster lab) Date: Thu, 29 Jan 2015 15:41:19 +0330 Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: References: <54C9C4DB.3060008@alteeve.ca> <54C9CB53.4070407@alteeve.ca> Message-ID: Additional information may be useful: On Affected Node: # ls -lah FILE -????????? ? ? ? ? ? 
FILE On Thu, Jan 29, 2015 at 3:05 PM, cluster lab wrote: > I have two separate cluster, with this problem, > the output of "dlm_tool ls" on other site is: > > dlm_tool ls > dlm lockspaces > name VMStorage4 > id 0xfd25ae65 > flags 0x00000008 fs_reg > change member 2 joined 0 remove 1 failed 1 seq 2,2 > members 2 3 > > name VMStorage3 > id 0xb26438a2 > flags 0x00000008 fs_reg > change member 2 joined 0 remove 1 failed 1 seq 2,2 > members 2 3 > > name VMStorage2 > id 0xab7f09e3 > flags 0x00000008 fs_reg > change member 2 joined 0 remove 1 failed 1 seq 2,2 > members 2 3 > > name VMStorage1 > id 0x80525a20 > flags 0x00000008 fs_reg > change member 2 joined 0 remove 1 failed 1 seq 2,2 > members 2 3 > > There is one fail ....!!!! > > > On Thu, Jan 29, 2015 at 1:16 PM, cluster lab wrote: >> Node2 # touch /VMStorage1/test >> >> Node1 # ls /VMStorage1/test >> /VMStorage1/test >> Node1 # rm /VMStorage1/test >> rm: remove regular empty file `/VMStorage1/test'? y >> >> ==== >> >> Node1 # touch /VMStorage1/test >> >> Node2 # ls /VMStorage1/test >> /VMStorage1/test >> Node2 # rm /VMStorage1/test >> rm: remove regular empty file `/VMStorage1/test'? y >> >> No Problem .... >> >> On Thu, Jan 29, 2015 at 9:25 AM, Digimer wrote: >>> That looks OK. Can you touch a file from one node and see it on the other >>> and vice-versa? Is there anything in either node's log files when you run >>> 'qemu-img check'? >>> >>> >>> On 29/01/15 12:34 AM, cluster lab wrote: >>>> >>>> Node2: # dlm_tool ls >>>> dlm lockspaces >>>> name VMStorage3 >>>> id 0xb26438a2 >>>> flags 0x00000008 fs_reg >>>> change member 2 joined 1 remove 0 failed 0 seq 1,1 >>>> members 1 2 >>>> >>>> name VMStorage2 >>>> id 0xab7f09e3 >>>> flags 0x00000008 fs_reg >>>> change member 2 joined 1 remove 0 failed 0 seq 1,1 >>>> members 1 2 >>>> >>>> name VMStorage1 >>>> id 0x80525a20 >>>> flags 0x00000008 fs_reg >>>> change member 2 joined 1 remove 0 failed 0 seq 1,1 >>>> members 1 2 >>>> =========================================== >>>> Node1: # dlm_tool ls >>>> dlm lockspaces >>>> name VMStorage3 >>>> id 0xb26438a2 >>>> flags 0x00000008 fs_reg >>>> change member 2 joined 1 remove 0 failed 0 seq 2,2 >>>> members 1 2 >>>> >>>> name VMStorage2 >>>> id 0xab7f09e3 >>>> flags 0x00000008 fs_reg >>>> change member 2 joined 1 remove 0 failed 0 seq 2,2 >>>> members 1 2 >>>> >>>> name VMStorage1 >>>> id 0x80525a20 >>>> flags 0x00000008 fs_reg >>>> change member 2 joined 1 remove 0 failed 0 seq 2,2 >>>> members 1 2 >>>> >>>> On Thu, Jan 29, 2015 at 8:57 AM, Digimer wrote: >>>>> >>>>> On 28/01/15 11:50 PM, cluster lab wrote: >>>>>> >>>>>> >>>>>> Hi, >>>>>> >>>>>> In a two node cluster, i received to different result from "qemu-img >>>>>> check" on just one file: >>>>>> >>>>>> node1 # qemu-img check VMStorage/x.qcow2 >>>>>> No errors were found on the image. >>>>>> >>>>>> Node2 # qemu-img check VMStorage/x.qcow2 >>>>>> qemu-img: Could not open 'VMStorage/x.qcow2" >>>>>> >>>>>> All other files are OK, and the cluster works properly. >>>>>> What is the problem? >>>>>> >>>>>> ==== >>>>>> Packages: >>>>>> kernel: 2.6.32-431.5.1.el6.x86_64 >>>>>> GFS2: gfs2-utils-3.0.12.1-23.el6.x86_64 >>>>>> corosync: corosync-1.4.1-17.el6.x86_64 >>>>>> >>>>>> Best Regards >>>>> >>>>> >>>>> >>>>> What does 'dlm_tool ls' show? >>>>> >>>>> -- >>>>> Digimer >>>>> Papers and Projects: https://alteeve.ca/w/ >>>>> What if the cure for cancer is trapped in the mind of a person without >>>>> access to education? 
>>>>> >>>>> -- >>>>> Linux-cluster mailing list >>>>> Linux-cluster at redhat.com >>>>> https://www.redhat.com/mailman/listinfo/linux-cluster >>>> >>>> >>> >>> >>> -- >>> Digimer >>> Papers and Projects: https://alteeve.ca/w/ >>> What if the cure for cancer is trapped in the mind of a person without >>> access to education? >>> >>> -- >>> Linux-cluster mailing list >>> Linux-cluster at redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-cluster From rpeterso at redhat.com Thu Jan 29 13:10:11 2015 From: rpeterso at redhat.com (Bob Peterson) Date: Thu, 29 Jan 2015 08:10:11 -0500 (EST) Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: References: <54C9C4DB.3060008@alteeve.ca> <54C9CB53.4070407@alteeve.ca> Message-ID: <678887466.3468720.1422537011406.JavaMail.zimbra@redhat.com> ----- Original Message ----- > Additional information may be useful: > > On Affected Node: > # ls -lah FILE > -????????? ? ? ? ? ? FILE > This symptom often means a loss of cluster coherency. Are you using lock_dlm protocol? Bob Peterson Red Hat File Systems From cluster.labs at gmail.com Thu Jan 29 14:55:52 2015 From: cluster.labs at gmail.com (cluster lab) Date: Thu, 29 Jan 2015 18:25:52 +0330 Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: <678887466.3468720.1422537011406.JavaMail.zimbra@redhat.com> References: <54C9C4DB.3060008@alteeve.ca> <54C9CB53.4070407@alteeve.ca> <678887466.3468720.1422537011406.JavaMail.zimbra@redhat.com> Message-ID: Hi Bob, yes, it uses lock_dlm, ... On Thu, Jan 29, 2015 at 4:40 PM, Bob Peterson wrote: > ----- Original Message ----- >> Additional information may be useful: >> >> On Affected Node: >> # ls -lah FILE >> -????????? ? ? ? ? ? FILE >> > > This symptom often means a loss of cluster coherency. > Are you using lock_dlm protocol? > > Bob Peterson > Red Hat File Systems > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From rpeterso at redhat.com Thu Jan 29 15:03:37 2015 From: rpeterso at redhat.com (Bob Peterson) Date: Thu, 29 Jan 2015 10:03:37 -0500 (EST) Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: References: <54C9CB53.4070407@alteeve.ca> <678887466.3468720.1422537011406.JavaMail.zimbra@redhat.com> Message-ID: <149243946.3558706.1422543817506.JavaMail.zimbra@redhat.com> ----- Original Message ----- > Hi Bob, > > yes, it uses lock_dlm, ... > > > On Thu, Jan 29, 2015 at 4:40 PM, Bob Peterson wrote: > > ----- Original Message ----- > >> Additional information may be useful: > >> > >> On Affected Node: > >> # ls -lah FILE > >> -????????? ? ? ? ? ? 
FILE Hi, Try doing the command "stat FILE|grep Inode" from both nodes and see if both nodes come up with the same answer for "Inode:" Regards, Bob Peterson Red Hat File Systems From cluster.labs at gmail.com Thu Jan 29 15:22:45 2015 From: cluster.labs at gmail.com (cluster lab) Date: Thu, 29 Jan 2015 18:52:45 +0330 Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: <149243946.3558706.1422543817506.JavaMail.zimbra@redhat.com> References: <54C9CB53.4070407@alteeve.ca> <678887466.3468720.1422537011406.JavaMail.zimbra@redhat.com> <149243946.3558706.1422543817506.JavaMail.zimbra@redhat.com> Message-ID: On affected node: stat FILE | grep Inode stat: cannot stat `FILE': Input/output error On other node: stat PublicDNS1-OS.qcow2 | grep Inode Device: fd06h/64774d Inode: 267858 Links: 1 On Thu, Jan 29, 2015 at 6:33 PM, Bob Peterson wrote: > ----- Original Message ----- >> Hi Bob, >> >> yes, it uses lock_dlm, ... >> >> >> On Thu, Jan 29, 2015 at 4:40 PM, Bob Peterson wrote: >> > ----- Original Message ----- >> >> Additional information may be useful: >> >> >> >> On Affected Node: >> >> # ls -lah FILE >> >> -????????? ? ? ? ? ? FILE > > Hi, > > Try doing the command "stat FILE|grep Inode" from both nodes and see if > both nodes come up with the same answer for "Inode:" > > Regards, > > Bob Peterson > Red Hat File Systems > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From rpeterso at redhat.com Thu Jan 29 15:34:56 2015 From: rpeterso at redhat.com (Bob Peterson) Date: Thu, 29 Jan 2015 10:34:56 -0500 (EST) Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: References: <678887466.3468720.1422537011406.JavaMail.zimbra@redhat.com> <149243946.3558706.1422543817506.JavaMail.zimbra@redhat.com> Message-ID: <1658385774.3582814.1422545696153.JavaMail.zimbra@redhat.com> ----- Original Message ----- > On affected node: > > stat FILE | grep Inode > stat: cannot stat `FILE': Input/output error > > On other node: > stat PublicDNS1-OS.qcow2 | grep Inode > Device: fd06h/64774d Inode: 267858 Links: 1 Something funky going on. I'd check dmesg for withdraw messages, etc., on the affected node. Regards, Bob Peterson Red Hat File Systems From cluster.labs at gmail.com Sat Jan 31 04:58:08 2015 From: cluster.labs at gmail.com (cluster lab) Date: Sat, 31 Jan 2015 08:28:08 +0330 Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: <1658385774.3582814.1422545696153.JavaMail.zimbra@redhat.com> References: <678887466.3468720.1422537011406.JavaMail.zimbra@redhat.com> <149243946.3558706.1422543817506.JavaMail.zimbra@redhat.com> <1658385774.3582814.1422545696153.JavaMail.zimbra@redhat.com> Message-ID: Hi, There is n't any unusual state or message, Also GFS logs (gfs, dlm) are silent ... Is there any chance to find source of problem? On Thu, Jan 29, 2015 at 7:04 PM, Bob Peterson wrote: > ----- Original Message ----- >> On affected node: >> >> stat FILE | grep Inode >> stat: cannot stat `FILE': Input/output error >> >> On other node: >> stat PublicDNS1-OS.qcow2 | grep Inode >> Device: fd06h/64774d Inode: 267858 Links: 1 > > Something funky going on. > I'd check dmesg for withdraw messages, etc., on the affected node. 
> > Regards, > > Bob Peterson > Red Hat File Systems > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From cluster.labs at gmail.com Sat Jan 31 04:58:08 2015 From: cluster.labs at gmail.com (cluster lab) Date: Sat, 31 Jan 2015 08:28:08 +0330 Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: <1658385774.3582814.1422545696153.JavaMail.zimbra@redhat.com> References: <678887466.3468720.1422537011406.JavaMail.zimbra@redhat.com> <149243946.3558706.1422543817506.JavaMail.zimbra@redhat.com> <1658385774.3582814.1422545696153.JavaMail.zimbra@redhat.com> Message-ID: Hi, There isn't any unusual state or message; also, the GFS logs (gfs, dlm) are silent ... Is there any chance of finding the source of the problem? On Thu, Jan 29, 2015 at 7:04 PM, Bob Peterson wrote: > ----- Original Message ----- >> On affected node: >> >> stat FILE | grep Inode >> stat: cannot stat `FILE': Input/output error >> >> On other node: >> stat PublicDNS1-OS.qcow2 | grep Inode >> Device: fd06h/64774d Inode: 267858 Links: 1 > > Something funky going on. > I'd check dmesg for withdraw messages, etc., on the affected node. > > Regards, > > Bob Peterson > Red Hat File Systems > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From cluster.labs at gmail.com Sat Jan 31 05:10:17 2015 From: cluster.labs at gmail.com (cluster lab) Date: Sat, 31 Jan 2015 08:40:17 +0330 Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: References: <678887466.3468720.1422537011406.JavaMail.zimbra@redhat.com> <149243946.3558706.1422543817506.JavaMail.zimbra@redhat.com> <1658385774.3582814.1422545696153.JavaMail.zimbra@redhat.com> Message-ID: Some more information: The cluster is a three-node cluster; one of its nodes (ID == 1) was fenced because of a network failure ... After the fence, this problem appeared ... On Sat, Jan 31, 2015 at 8:28 AM, cluster lab wrote: > Hi, > > There is n't any unusual state or message, > Also GFS logs (gfs, dlm) are silent ... > > Is there any chance to find source of problem? > > On Thu, Jan 29, 2015 at 7:04 PM, Bob Peterson wrote: >> ----- Original Message ----- >>> On affected node: >>> >>> stat FILE | grep Inode >>> stat: cannot stat `FILE': Input/output error >>> >>> On other node: >>> stat PublicDNS1-OS.qcow2 | grep Inode >>> Device: fd06h/64774d Inode: 267858 Links: 1 >> >> Something funky going on. >> I'd check dmesg for withdraw messages, etc., on the affected node. >> >> Regards, >> >> Bob Peterson >> Red Hat File Systems >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster From lists at alteeve.ca Sat Jan 31 05:40:44 2015 From: lists at alteeve.ca (Digimer) Date: Sat, 31 Jan 2015 00:40:44 -0500 Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: References: <678887466.3468720.1422537011406.JavaMail.zimbra@redhat.com> <149243946.3558706.1422543817506.JavaMail.zimbra@redhat.com> <1658385774.3582814.1422545696153.JavaMail.zimbra@redhat.com> Message-ID: <54CC6ADC.1000305@alteeve.ca> Does the logs show the fence succeeded or failed? Can you please post the logs from the surviving two nodes starting just before the failure until a few minutes after? digimer On 31/01/15 12:10 AM, cluster lab wrote: > Some more information: > > Cluster is a three nodes cluster, > One of its node (ID == 1) fenced because of network failure ... > > After fence this problem borned ... > > > On Sat, Jan 31, 2015 at 8:28 AM, cluster lab wrote: >> Hi, >> >> There is n't any unusual state or message, >> Also GFS logs (gfs, dlm) are silent ... >> >> Is there any chance to find source of problem? >> >> On Thu, Jan 29, 2015 at 7:04 PM, Bob Peterson wrote: >>> ----- Original Message ----- >>>> On affected node: >>>> >>>> stat FILE | grep Inode >>>> stat: cannot stat `FILE': Input/output error >>>> >>>> On other node: >>>> stat PublicDNS1-OS.qcow2 | grep Inode >>>> Device: fd06h/64774d Inode: 267858 Links: 1 >>> >>> Something funky going on. >>> I'd check dmesg for withdraw messages, etc., on the affected node. >>> >>> Regards, >>> >>> Bob Peterson >>> Red Hat File Systems >>> >>> -- >>> Linux-cluster mailing list >>> Linux-cluster at redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-cluster > -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education?
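(A rough sketch of how the log window Digimer asks for above could be collected; the hostnames node2 and node3, the /var/log/messages path, and passwordless SSH between the machines are assumptions for the example, not details given in the thread:

    # pull the cluster-related syslog lines from each surviving node into one file per node
    for n in node2 node3; do
        ssh "$n" "grep -Ei 'fenced|dlm|gfs2|corosync' /var/log/messages" > "cluster-window-$n.log"
    done

On a cman-based cluster, running 'fence_tool ls' on a surviving node should also show whether the fence domain still lists a victim, which is a quick way to confirm that the fence actually completed.)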
From cluster.labs at gmail.com Sat Jan 31 06:51:30 2015 From: cluster.labs at gmail.com (cluster lab) Date: Sat, 31 Jan 2015 10:21:30 +0330 Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: <54CC6ADC.1000305@alteeve.ca> References: <678887466.3468720.1422537011406.JavaMail.zimbra@redhat.com> <149243946.3558706.1422543817506.JavaMail.zimbra@redhat.com> <1658385774.3582814.1422545696153.JavaMail.zimbra@redhat.com> <54CC6ADC.1000305@alteeve.ca> Message-ID: Log messages: Jan 21 17:07:31 node2 corosync[47788]: [TOTEM ] A processor failed, forming new configuration. Jan 21 17:07:43 node2 corosync[47788]: [QUORUM] Members[2]: 2 3 Jan 21 17:07:43 node2 corosync[47788]: [TOTEM ] A processor joined or left the membership and a new membership was formed. Jan 21 17:07:43 node2 kernel: dlm: closing connection to node 1 Jan 21 17:07:43 node2 corosync[47788]: [CPG ] chosen downlist: sender r(0) ip(172........) ; members(old:3 left:1) Jan 21 17:07:43 node2 corosync[47788]: [MAIN ] Completed service synchronization, ready to provide service. Jan 21 17:07:43 node2 fenced[47840]: fencing node node1 Jan 21 17:07:43 node2 kernel: GFS2: fsid=Cluster:VMStorage1.1: jid=0: Trying to acquire journal lock... Jan 21 17:07:43 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Trying to acquire journal lock... Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage1.1: jid=0: Busy Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Looking at journal... Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage3.1: jid=0: Trying to acquire journal lock... Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Trying to acquire journal lock... Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage3.1: jid=0: Busy Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VM.1: jid=0: Trying to acquire journal lock... Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VM.1: jid=0: Busy Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Looking at journal... Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Acquiring the transaction lock... Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Replaying journal... Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Acquiring the transaction lock... Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Replaying journal... Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Replayed 250 of 515 blocks Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Found 12 revoke tags Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Journal replayed in 1s Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Done Jan 21 17:07:58 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Replayed 4260 of 4803 blocks Jan 21 17:07:58 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Found 5 revoke tags Jan 21 17:07:58 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Journal replayed in 1s Jan 21 17:07:58 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Done Jan 21 17:07:31 node3 corosync[51444]: [TOTEM ] A processor failed, forming new configuration. Jan 21 17:07:43 node3 corosync[51444]: [QUORUM] Members[2]: 2 3 Jan 21 17:07:43 node3 corosync[51444]: [TOTEM ] A processor joined or left the membership and a new membership was formed. Jan 21 17:07:43 node3 corosync[51444]: [CPG ] chosen downlist: sender r(0) ip(172......) 
; members(old:3 left:1) Jan 21 17:07:43 node3 corosync[51444]: [MAIN ] Completed service synchronization, ready to provide service. Jan 21 17:07:43 node3 kernel: dlm: closing connection to node 1 Jan 21 17:07:43 node3 fenced[51496]: fencing deferred to node2 Jan 21 17:07:43 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Trying to acquire journal lock... Jan 21 17:07:43 node3 kernel: GFS2: fsid=Cluster:VMStorage2.2: jid=0: Trying to acquire journal lock... Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage2.2: jid=0: Busy Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VM.2: jid=0: Trying to acquire journal lock... Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage3.2: jid=0: Trying to acquire journal lock... Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage3.2: jid=0: Looking at journal... Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Looking at journal... Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VM.2: jid=0: Looking at journal... Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage3.2: jid=0: Done Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage4.2: jid=0: Trying to acquire journal lock... Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage4.2: jid=0: Busy Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Acquiring the transaction lock... Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Replaying journal... Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Replayed 6 of 7 blocks Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Found 1 revoke tags Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Journal replayed in 1s Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Done Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VM.2: jid=0: Done One Question: Accessing to files before acquire journal lock may cause the problem? .... On Sat, Jan 31, 2015 at 9:10 AM, Digimer wrote: > Does the logs show the fence succeeded or failed? Can you please post the > logs from the surviving two nodes starting just before the failure until a > few minutes after? > > digimer > > > On 31/01/15 12:10 AM, cluster lab wrote: >> >> Some more information: >> >> Cluster is a three nodes cluster, >> One of its node (ID == 1) fenced because of network failure ... >> >> After fence this problem borned ... >> >> >> On Sat, Jan 31, 2015 at 8:28 AM, cluster lab >> wrote: >>> >>> Hi, >>> >>> There is n't any unusual state or message, >>> Also GFS logs (gfs, dlm) are silent ... >>> >>> Is there any chance to find source of problem? >>> >>> On Thu, Jan 29, 2015 at 7:04 PM, Bob Peterson >>> wrote: >>>> >>>> ----- Original Message ----- >>>>> >>>>> On affected node: >>>>> >>>>> stat FILE | grep Inode >>>>> stat: cannot stat `FILE': Input/output error >>>>> >>>>> On other node: >>>>> stat PublicDNS1-OS.qcow2 | grep Inode >>>>> Device: fd06h/64774d Inode: 267858 Links: 1 >>>> >>>> >>>> Something funky going on. >>>> I'd check dmesg for withdraw messages, etc., on the affected node. >>>> >>>> Regards, >>>> >>>> Bob Peterson >>>> Red Hat File Systems >>>> >>>> -- >>>> Linux-cluster mailing list >>>> Linux-cluster at redhat.com >>>> https://www.redhat.com/mailman/listinfo/linux-cluster >> >> > > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ > What if the cure for cancer is trapped in the mind of a person without > access to education? 
> > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From cluster.labs at gmail.com Sat Jan 31 06:52:11 2015 From: cluster.labs at gmail.com (cluster lab) Date: Sat, 31 Jan 2015 10:22:11 +0330 Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: References: <678887466.3468720.1422537011406.JavaMail.zimbra@redhat.com> <149243946.3558706.1422543817506.JavaMail.zimbra@redhat.com> <1658385774.3582814.1422545696153.JavaMail.zimbra@redhat.com> <54CC6ADC.1000305@alteeve.ca> Message-ID: Logs as attach ... On Sat, Jan 31, 2015 at 10:21 AM, cluster lab wrote: > Log messages: > > Jan 21 17:07:31 node2 corosync[47788]: [TOTEM ] A processor failed, > forming new configuration. > Jan 21 17:07:43 node2 corosync[47788]: [QUORUM] Members[2]: 2 3 > Jan 21 17:07:43 node2 corosync[47788]: [TOTEM ] A processor joined > or left the membership and a new membership was formed. > Jan 21 17:07:43 node2 kernel: dlm: closing connection to node 1 > Jan 21 17:07:43 node2 corosync[47788]: [CPG ] chosen downlist: > sender r(0) ip(172........) ; members(old:3 left:1) > Jan 21 17:07:43 node2 corosync[47788]: [MAIN ] Completed service > synchronization, ready to provide service. > Jan 21 17:07:43 node2 fenced[47840]: fencing node node1 > Jan 21 17:07:43 node2 kernel: GFS2: fsid=Cluster:VMStorage1.1: jid=0: > Trying to acquire journal lock... > Jan 21 17:07:43 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: > Trying to acquire journal lock... > Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage1.1: jid=0: Busy > Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: > Looking at journal... > Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage3.1: jid=0: > Trying to acquire journal lock... > Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: > Trying to acquire journal lock... > Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage3.1: jid=0: Busy > Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VM.1: jid=0: Trying > to acquire journal lock... > Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VM.1: jid=0: Busy > Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: > Looking at journal... > Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: > Acquiring the transaction lock... > Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: > Replaying journal... > Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: > Acquiring the transaction lock... > Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: > Replaying journal... > Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: > Replayed 250 of 515 blocks > Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: > Found 12 revoke tags > Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: > Journal replayed in 1s > Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Done > Jan 21 17:07:58 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: > Replayed 4260 of 4803 blocks > Jan 21 17:07:58 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: > Found 5 revoke tags > Jan 21 17:07:58 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: > Journal replayed in 1s > Jan 21 17:07:58 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Done > > > Jan 21 17:07:31 node3 corosync[51444]: [TOTEM ] A processor failed, > forming new configuration. 
> Jan 21 17:07:43 node3 corosync[51444]: [QUORUM] Members[2]: 2 3 > Jan 21 17:07:43 node3 corosync[51444]: [TOTEM ] A processor joined > or left the membership and a new membership was formed. > Jan 21 17:07:43 node3 corosync[51444]: [CPG ] chosen downlist: > sender r(0) ip(172......) ; members(old:3 left:1) > Jan 21 17:07:43 node3 corosync[51444]: [MAIN ] Completed service > synchronization, ready to provide service. > Jan 21 17:07:43 node3 kernel: dlm: closing connection to node 1 > Jan 21 17:07:43 node3 fenced[51496]: fencing deferred to node2 > Jan 21 17:07:43 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: > Trying to acquire journal lock... > Jan 21 17:07:43 node3 kernel: GFS2: fsid=Cluster:VMStorage2.2: jid=0: > Trying to acquire journal lock... > Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage2.2: jid=0: Busy > Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VM.2: jid=0: Trying > to acquire journal lock... > Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage3.2: jid=0: > Trying to acquire journal lock... > Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage3.2: jid=0: > Looking at journal... > Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: > Looking at journal... > Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VM.2: jid=0: Looking > at journal... > Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage3.2: jid=0: Done > Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage4.2: jid=0: > Trying to acquire journal lock... > Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage4.2: jid=0: Busy > Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: > Acquiring the transaction lock... > Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: > Replaying journal... > Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: > Replayed 6 of 7 blocks > Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: > Found 1 revoke tags > Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: > Journal replayed in 1s > Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Done > Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VM.2: jid=0: Done > > One Question: Accessing to files before acquire journal lock may cause > the problem? > > .... > > On Sat, Jan 31, 2015 at 9:10 AM, Digimer wrote: >> Does the logs show the fence succeeded or failed? Can you please post the >> logs from the surviving two nodes starting just before the failure until a >> few minutes after? >> >> digimer >> >> >> On 31/01/15 12:10 AM, cluster lab wrote: >>> >>> Some more information: >>> >>> Cluster is a three nodes cluster, >>> One of its node (ID == 1) fenced because of network failure ... >>> >>> After fence this problem borned ... >>> >>> >>> On Sat, Jan 31, 2015 at 8:28 AM, cluster lab >>> wrote: >>>> >>>> Hi, >>>> >>>> There is n't any unusual state or message, >>>> Also GFS logs (gfs, dlm) are silent ... >>>> >>>> Is there any chance to find source of problem? >>>> >>>> On Thu, Jan 29, 2015 at 7:04 PM, Bob Peterson >>>> wrote: >>>>> >>>>> ----- Original Message ----- >>>>>> >>>>>> On affected node: >>>>>> >>>>>> stat FILE | grep Inode >>>>>> stat: cannot stat `FILE': Input/output error >>>>>> >>>>>> On other node: >>>>>> stat PublicDNS1-OS.qcow2 | grep Inode >>>>>> Device: fd06h/64774d Inode: 267858 Links: 1 >>>>> >>>>> >>>>> Something funky going on. >>>>> I'd check dmesg for withdraw messages, etc., on the affected node. 
>>>>> >>>>> Regards, >>>>> >>>>> Bob Peterson >>>>> Red Hat File Systems >>>>> >>>>> -- >>>>> Linux-cluster mailing list >>>>> Linux-cluster at redhat.com >>>>> https://www.redhat.com/mailman/listinfo/linux-cluster >>> >>> >> >> >> -- >> Digimer >> Papers and Projects: https://alteeve.ca/w/ >> What if the cure for cancer is trapped in the mind of a person without >> access to education? >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- Jan 21 17:07:43 ost-pvm2 corosync[47788]: [QUORUM] Members[2]: 2 3 Jan 21 17:07:43 ost-pvm2 corosync[47788]: [TOTEM ] A processor joined or left the membership and a new membership was formed. Jan 21 17:07:43 ost-pvm2 kernel: dlm: closing connection to node 1 Jan 21 17:07:43 ost-pvm2 corosync[47788]: [CPG ] chosen downlist: sender r(0) ip(172.16.40.22) ; members(old:3 left:1) Jan 21 17:07:43 ost-pvm2 corosync[47788]: [MAIN ] Completed service synchronization, ready to provide service. Jan 21 17:07:43 ost-pvm2 fenced[47840]: fencing node ost-pvm1 Jan 21 17:07:43 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage1.1: jid=0: Trying to acquire journal lock... Jan 21 17:07:43 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage2.1: jid=0: Trying to acquire journal lock... Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage1.1: jid=0: Busy Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage2.1: jid=0: Looking at journal... Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage3.1: jid=0: Trying to acquire journal lock... Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage4.1: jid=0: Trying to acquire journal lock... Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage3.1: jid=0: Busy Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:PVM.1: jid=0: Trying to acquire journal lock... Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:PVM.1: jid=0: Busy Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage4.1: jid=0: Looking at journal... Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage2.1: jid=0: Acquiring the transaction lock... Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage2.1: jid=0: Replaying journal... Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage4.1: jid=0: Acquiring the transaction lock... Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage4.1: jid=0: Replaying journal... 
Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage2.1: jid=0: Replayed 250 of 515 blocks Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage2.1: jid=0: Found 12 revoke tags Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage2.1: jid=0: Journal replayed in 1s Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage2.1: jid=0: Done Jan 21 17:07:58 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage4.1: jid=0: Replayed 4260 of 4803 blocks Jan 21 17:07:58 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage4.1: jid=0: Found 5 revoke tags Jan 21 17:07:58 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage4.1: jid=0: Journal replayed in 1s Jan 21 17:07:58 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage4.1: jid=0: Done From lists at alteeve.ca Sat Jan 31 07:20:29 2015 From: lists at alteeve.ca (Digimer) Date: Sat, 31 Jan 2015 02:20:29 -0500 Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: References: <678887466.3468720.1422537011406.JavaMail.zimbra@redhat.com> <149243946.3558706.1422543817506.JavaMail.zimbra@redhat.com> <1658385774.3582814.1422545696153.JavaMail.zimbra@redhat.com> <54CC6ADC.1000305@alteeve.ca> Message-ID: <54CC823D.1080602@alteeve.ca> On 31/01/15 01:52 AM, cluster lab wrote: > Jan 21 17:07:43 ost-pvm2 fenced[47840]: fencing node ost-pvm1 There are no messages about this succeeding or failing... It looks like only 15 seconds' worth of logs. Can you please share the full amount of time I mentioned before, from both nodes? -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From cluster.labs at gmail.com Sat Jan 31 07:52:48 2015 From: cluster.labs at gmail.com (cluster lab) Date: Sat, 31 Jan 2015 11:22:48 +0330 Subject: [Linux-cluster] GFS2: "Could not open" the file on one of the nodes In-Reply-To: <54CC823D.1080602@alteeve.ca> References: <678887466.3468720.1422537011406.JavaMail.zimbra@redhat.com> <149243946.3558706.1422543817506.JavaMail.zimbra@redhat.com> <1658385774.3582814.1422545696153.JavaMail.zimbra@redhat.com> <54CC6ADC.1000305@alteeve.ca> <54CC823D.1080602@alteeve.ca> Message-ID: Excuse me for the partial logs ... Jan 21 17:07:57 node2 fenced[47840]: fence node1 success All the other logs are about HA of the VMs, ... and I/O errors for these files ... Some new info: this problem occurred for about 4 files; three of them cause an I/O error on node 3, and one of them on node 2 ... On Sat, Jan 31, 2015 at 10:50 AM, Digimer wrote: > On 31/01/15 01:52 AM, cluster lab wrote: >> >> Jan 21 17:07:43 ost-pvm2 fenced[47840]: fencing node ost-pvm1 > > > There are no messages about this succeeding or failing... It looks like only > 15 seconds seconds worth of logs. Can you please share the full amount of > time I mentioned before, from both nodes? > > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ > What if the cure for cancer is trapped in the mind of a person without > access to education? > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster
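(Following up on the stat comparison suggested earlier in the thread, a small loop along these lines makes it easy to see which node disagrees about each of the four affected images; the paths and hostnames below are placeholders, and passwordless SSH between the nodes is assumed:

    # compare what each surviving node reports for each affected image
    for f in /VMStorage1/x.qcow2 /VMStorage2/y.qcow2; do    # replace with the real image paths
        for n in node2 node3; do
            echo "== $n $f"
            ssh "$n" "stat $f 2>&1 | grep -E 'Inode|cannot stat'"
        done
    done

A file that returns 'Input/output error' on one node but a normal Inode line on the other points at stale cached state on the failing node rather than on-disk damage. One workaround that is sometimes tried in this situation is to unmount and remount the affected GFS2 filesystem on the node that sees the error so it drops its cached state, but whether that is safe or sufficient here is a guess, not a confirmed fix.)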