From wmodes at ucsc.edu Wed Feb 1 00:07:32 2012 From: wmodes at ucsc.edu (Wes Modes) Date: Tue, 31 Jan 2012 16:07:32 -0800 Subject: [Linux-cluster] GFS2 and VM High Availability/DRS Message-ID: <4F288244.7060904@ucsc.edu> Howdy, thanks for all your answers here. With your help (particularly Digimer), I was able to set up my little two node GFS2 cluster. I can't pretend yet to understand everything, but I have a blossoming awareness of what and why and how. The way I finally set it up for my test cluster was 1. LUN on SAN 2. configured through ESXi as RDM 3. RDM made available to OS 4. parted RDM device 5. pvcreate/vgcreate/lvcreate to create logical volume on device 6. mkfs.gfs2 to create GFS2 filesystem on volume supported by clvmd and cman, etc It works and that's great. BUT the lit says VMWare's vMotion/HA/DRS doesn't support RDM (though others say that isn't a problem) I am setting up GFS2 on CentOS running on VMWare and a SAN. We want to take advantage of VMWare's High Availability (HA) and Distributed Resource Scheduler (DRS) which allow the VM cluster to migrate a guest to another host if the guest becomes unavailable for any reason. I've come across some contradictory statements regarding the compatibility of RDMs and HA/DRS. So naturally, I have some questions: 1) If my shared cluster filesystem resides on an RDM on a SAN and is available to all of the ESXi hosts, can I use HA/DRS or not? If so, what are the limitations? If not, why not? 2) If I cannot use an RDM for the cluster filesystem, can I use VMFS so vmware can deal with it? What are the limitations of this? 3) Is there some other magic way using iSCSI connectors or something bypassing vmware? Anyone have experience with this? Wes -------------- next part -------------- An HTML attachment was scrubbed... URL: From anprice at redhat.com Wed Feb 1 13:13:46 2012 From: anprice at redhat.com (Andrew Price) Date: Wed, 01 Feb 2012 13:13:46 +0000 Subject: [Linux-cluster] gfs2-utils 3.1.4 Released Message-ID: <4F293A8A.9070504@redhat.com> Hi, gfs2-utils 3.1.4 has been released. This version features a new gfs2_lockgather script to aid diagnosis of GFS2 locking issues, more clean-ups and fixes based on static analysis results, and various other minor enhancements and bug fixes. See below for a full list of changes. The source tarball is available from: https://fedorahosted.org/released/gfs2-utils/gfs2-utils-3.1.4.tar.gz To report bugs or issues, please use: https://bugzilla.redhat.com/ Regards, Andy Price Red Hat File Systems Changes since 3.1.3: Adam Drew (1): Added gfs2_lockgather data gathering script. 
Andrew Price (30): libgfs2: Expand out calls to die() libgfs2: Push down die() into the utils and remove it gfs2_edit: Remove a useless assignment gfs2_edit: Check return value of compute_constants gfs2_edit: Fix possible uninitialized access gfs2_edit: Fix memory leak in dump_journal() gfs2_edit: Fix null pointer dereference in dump_journal gfs2_edit: Remove unused j_inode from find_journal_block() gfs2_edit: Fix memory leak in find_journal_block gfs2_edit: Check for error value from gfs2_get_bitmap gfs2_edit: Fix resource leaks in display_extended() gfs2_edit: Fix resource leak in print_block_details() gfs2_edit: Fix null pointer derefs in display_block_type() gfs2_edit: Check more error values from gfs2_get_bitmap gfs2_edit: Fix another resource leak in display_extended mkfs.gfs2: Fix use of uninitialized value in check_dev_content gfs2_convert: Fix null pointer deref in journ_space_to_rg gfs2_convert: Fix null pointer deref in conv_build_jindex fsck.gfs2: Remove unsigned comparisons with zero fsck.gfs2: Plug a leak in init_system_inodes() libgfs2: Set errno in dirent_alloc and use dir_add consistently fsck.gfs2: Plug memory leak in check_system_dir() fsck.gfs2: Fix null pointer deref in check_system_dir() fsck.gfs2: Plug a leak in find_block_ref() fsck.gfs2: Remove unused hash.c, hash.h mkfs.gfs2: Improve error messages libgfscontrol: Fix resource leaks fsck.gfs2: Plug a leak in peruse_system_dinode() fsck.gfs2: Fix unchecked malloc in gfs2_dup_set() gfs2_edit: Don't exit prematurely in display_block_type Carlos Maiolino (2): i18n: Update gfs2-utils.pot file Merge branch 'master' of ssh://git.fedorahosted.org/git/gfs2-utils Steven Whitehouse (13): gfs2_convert: clean up question asking code fsck.gfs2: Use sigaction and not signal syscall fsck.gfs2: Clean up pass calling code libgfs2: Add iovec to gfs2_buffer_head libgfs2: Add beginnings of a metadata description libgfs2: Remove struct gfs_rindex from header, etc libgfs2: Use endian defined types for GFS1 on disk structures edit: Fix up block type recognition libgfs2: Add a few structures missed from the initial version of meta.c fsck/libgfs2: Add a couple of missing header files libgfs2: Add some tables of symbolic constant names edit: Hook up gfs2_edit to use new metadata info from libgfs2 libgfs2: Add flags to metadata description From erik.redding at txstate.edu Wed Feb 1 20:02:45 2012 From: erik.redding at txstate.edu (Redding, Erik) Date: Wed, 1 Feb 2012 14:02:45 -0600 Subject: [Linux-cluster] HA-LVM and /etc/lvm/lvm.conf Message-ID: I'm having a dialog with RH support about configuring HA-LVM within RHCS and I'm trying to see if there are some limitations and thought I'd ping the mailing list on the same subject. Is an HA-LVM configuration that only uses LVM tags useful beyond a single volume group? I'm attempting to provide a database service along side a pair of HA-NFS services that utilize DRBD and LVM (but drbd isn't the issue). on two nodes, rhel-01 and rhel-02, I currently I have three volume groups: vgTest0, vgTest1, vgTestCluster vgTest0 and vgTest1 are volume groups that exist on each node, and they provide a single back-end LVM volume to a DRBD resource, so DRBD can leverage the snapshotting technique during syncs. Both nodes in the cluster have the same configuration, and DRBD is working fine. This has been in production for about a month. I recently got the SAN resource that is presented to both hosts that I want to roll in LVM so I can utilize snapshots on the volume data. 
I don't want to bother with CLVM because I have a use case for snapshotting, so HA-LVM as described: https://access.redhat.com/kb/docs/DOC-3068, I'm doing the second method. The goal is a failover cluster. I've been struggling with how to configure my /etc/lvm/lvm.conf because of the volume_list parameter: # If volume_list is defined, each LV is only activated if there is a # match against the list. # "vgname" and "vgname/lvname" are matched exactly. # "@tag" matches any tag set in the LV or VG. # "@*" matches if any tag defined on the host is also set in the LV or VG # # volume_list = [ "vg1", "vg2/lvol1", "@tag1", "@*" ] would I go with something like: volume_list = [ "vgTest0", "vgTestCluster/lvTest0", "@rhel-01", "@*" ] I don't get why I need to state a persistent volume group - and if I do, which one? I've got two persistent groups on each node. I don't use LVM on the root disk. Could I somehow expand this out to two HA-LVM volume groups? I don't see a way but thought I'd ask. Erik Redding Systems Programmer, RHCE Core Systems Texas State University-San Marcos From wmodes at ucsc.edu Wed Feb 1 21:43:09 2012 From: wmodes at ucsc.edu (Wes Modes) Date: Wed, 01 Feb 2012 13:43:09 -0800 Subject: [Linux-cluster] GFS2 and VM High Availability/DRS Message-ID: <4F29B1ED.6090500@ucsc.edu> Howdy, thanks for all your answers here. With your help (particularly Digimer), I was able to set up my little two node GFS2 cluster. I can't pretend yet to understand everything, but I have a blossoming awareness of what and why and how. The way I finally set it up for my test cluster was 1. LUN on SAN 2. configured through ESXi as RDM 3. RDM made available to OS 4. parted RDM device 5. pvcreate/vgcreate/lvcreate to create logical volume on device 6. mkfs.gfs2 to create GFS2 filesystem on volume supported by clvmd and cman, etc It works and that's great. BUT the lit says VMWare's vMotion/HA/DRS doesn't support RDM (though others say that isn't a problem) I am setting up GFS2 on CentOS running on VMWare and a SAN. We want to take advantage of VMWare's High Availability (HA) and Distributed Resource Scheduler (DRS) which allow the VM cluster to migrate a guest to another host if the guest becomes unavailable for any reason. I've come across some contradictory statements regarding the compatibility of RDMs and HA/DRS. So naturally, I have some questions: 1) If my shared cluster filesystem resides on an RDM on a SAN and is available to all of the ESXi hosts, can I use vMotion and DRS or not? If so, what are the limitations? If not, why not? 2) If I cannot use an RDM for the cluster filesystem, can I use VMFS so vmware can deal with it? What are the limitations of this? 3) Is there some other magic way using iSCSI connectors or something bypassing vmware? Anyone have experience with this? Can anyone point me to details docs on this? Wes -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From arunkp1987 at gmail.com Thu Feb 2 08:47:17 2012 From: arunkp1987 at gmail.com (Arun Purushothaman) Date: Thu, 2 Feb 2012 14:17:17 +0530 Subject: [Linux-cluster] O/P of cman_tool service Message-ID: Hi, O/p of cman_service [root at ssdgblade1 ~]# cman_tool services type level name id state fence 0 default 00010001 none [1 2] dlm 1 clvmd 00020001 none [1 2] dlm 1 rgmanager 00030001 none [1 2] dlm 1 gfs 00050001 none [1] gfs 2 gfs 00040001 none [1] Regads Arun K P On 01/02/2012, linux-cluster-request at redhat.com wrote: > Send Linux-cluster mailing list submissions to > linux-cluster at redhat.com > > To subscribe or unsubscribe via the World Wide Web, visit > https://www.redhat.com/mailman/listinfo/linux-cluster > or, via email, send a message with subject or body 'help' to > linux-cluster-request at redhat.com > > You can reach the person managing the list at > linux-cluster-owner at redhat.com > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Linux-cluster digest..." > > > Today's Topics: > > 1. Re: Nodes are getting Down while relocating service > (jose nuno neto) > 2. GFS2 and VM High Availability/DRS (Wes Modes) > 3. gfs2-utils 3.1.4 Released (Andrew Price) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Tue, 31 Jan 2012 17:25:41 -0000 (GMT) > From: "jose nuno neto" > To: "linux clustering" > Subject: Re: [Linux-cluster] Nodes are getting Down while relocating > service > Message-ID: > Content-Type: text/plain;charset=iso-8859-1 > > Hi > Well just not fully sure what logging that was > > Anyway, to help clarify, if the cluster works ok, up until you start > services, I'll investigate the services > > can you post the output of > cman_tool services > > when cluster is running ok > > Cheers > Jose > >> Hello Jose >> >> If you look the cluster.conf you can see his dosn't using drbd >> >> Like i sayed beforce >> =================================================== >> [network_problem] >> =================================================== >> Jan 28 15:50:05 ssdgblade2 openais[10324]: [TOTEM] FAILED TO RECEIVE >> Jan 28 15:50:05 ssdgblade2 openais[10324]: [TOTEM] entering GATHER state >> from 6. >> Jan 28 15:50:05 ssdgblade2 openais[10324]: [TOTEM] FAILED TO RECEIVE >> Jan 28 15:50:05 ssdgblade2 openais[10324]: [TOTEM] entering GATHER state >> from 6. >> Jan 28 15:50:06 ssdgblade2 openais[10324]: [TOTEM] FAILED TO RECEIVE >> Jan 28 15:50:06 ssdgblade2 openais[10324]: [TOTEM] entering GATHER state >> from 6. >> Jan 28 15:50:06 ssdgblade2 openais[10324]: [TOTEM] FAILED TO RECEIVE >> Jan 28 15:50:06 ssdgblade2 openais[10324]: [TOTEM] entering GATHER state >> from 6. >> Jan 28 15:50:07 ssdgblade2 openais[10324]: [TOTEM] FAILED TO RECEIVE >> Jan 28 15:50:07 ssdgblade2 openais[10324]: [TOTEM] entering GATHER state >> from 6. >> Jan 28 15:50:07 ssdgblade2 openais[10324]: [TOTEM] FAILED TO RECEIVE >> Jan 28 15:50:07 ssdgblade2 openais[10324]: [TOTEM] entering GATHER state >> from 6. >> Jan 28 15:50:08 ssdgblade2 openais[10324]: [TOTEM] FAILED TO RECEIVE >> Jan 28 15:50:08 ssdgblade2 openais[10324]: [TOTEM] entering GATHER state >> from 6. >> Jan 28 15:50:08 ssdgblade2 openais[10324]: [TOTEM] FAILED TO RECEIVE >> Jan 28 15:50:08 ssdgblade2 openais[10324]: [TOTEM] entering GATHER state >> from 6. >> Jan 28 15:50:09 ssdgblade2 openais[10324]: [TOTEM] FAILED TO RECEIVE >> Jan 28 15:50:09 ssdgblade2 openais[10324]: [TOTEM] entering GATHER state >> from 6. 
>> Jan 28 15:50:09 ssdgblade2 openais[10324]: [TOTEM] FAILED TO RECEIVE >> Jan 28 15:50:09 ssdgblade2 openais[10324]: [TOTEM] entering GATHER state >> from 6. >> Jan 28 15:50:10 ssdgblade2 openais[10324]: [TOTEM] FAILED TO RECEIVE >> Jan 28 15:50:10 ssdgblade2 openais[10324]: [TOTEM] entering GATHER state >> from 6. >> Jan 28 15:50:10 ssdgblade2 openais[10324]: [TOTEM] FAILED TO RECEIVE >> Jan 28 15:50:10 ssdgblade2 openais[10324]: [TOTEM] entering GATHER state >> from 6. >> Jan 28 15:50:11 ssdgblade2 openais[10324]: [TOTEM] FAILED TO RECEIVE >> Jan 28 15:50:11 ssdgblade2 openais[10324]: [TOTEM] entering GATHER state >> from 6. >> Jan 28 15:50:11 ssdgblade2 openais[10324]: [TOTEM] FAILED TO RECEIVE >> Jan 28 15:50:11 ssdgblade2 openais[10324]: [TOTEM] entering GATHER state >> from 6. >> ================================================================== >> >> the first think it can be utils it's stops iptables >> >> 2012/1/31 jose nuno neto >> >>> Hello >>> >>> Took a quick look on the messages and see no fence reference, there's a >>> break in token messages, recovering, cluster.conf change, comunication >>> lost again.... >>> could be the service shutdown, after cluster.conf update, forcing >>> shutdown >>> >>> do you have drbd running too? >>> >>> Cheers >>> Jose Neto >>> >>> > Hi, >>> > >>> > We are facing some issue while configuring cluster in Centos 5.5 >>> > >>> > >>> > Here is the scenario where we got stuck. >>> > >>> > Issue: >>> > >>> > All nodes in the cluster turned of if cluster services restarted or >>> > disabled or enabled. >>> > >>> > Three services should work as a clustered service, >>> > >>> > 1. Postgresql. >>> > 2. GFS (1TB SAN space which is mounted on /var/lib/pgsql) >>> > 3. Virtual IP (common IP)?IP 10.242.108.42 >>> > >>> > Even we tried adding only Virtual IP as a cluster service then also, >>> > >>> > #clusvcadm -r DBService ?m ssdgblade2.db2 (from ssdgblade1.db1) >>> > >>> > Could not relocate the service and both node get turned off. >>> > >>> > Environment >>> > >>> > CentOS 5.5 >>> > Postgresql 8.3.3 >>> > Kernel version-2.6.18-194 >>> > CentOs Cluster Suit. >>> > >>> > Hardware: >>> > >>> > 1. Chasis IBM BladeCenter E. >>> > 2. IBM HS22 blades (8 numbers)?clustering is done in blade1 and >>> blade2 >>> > 3. Blade Management Module IP is 10.242.108.58 >>> > 4. Fence device IBM Bladecenter.( login successful via telnet and >>> > web browser to management module). >>> > 5. Cisco Catalyst 2960G Switch. >>> > >>> > IP: >>> > >>> > 10.242.108.41 (ssdgblade1.db1) >>> > 10.242.108.43 (ssdgblade2.db2) >>> > >>> > Virtual IP 10.242.108.42 >>> > Multicast IP 239.192.247.38 >>> > >>> > >>> > Diagnostic Steps followed: >>> > >>> > 1. Removed postgresql and GFS from cluster service and rebooted >>> > both the server with only VIP service. Still problem exist. Can not >>> > relocate the service. >>> > 2. Tested fencing by, >>> > >>> > #fence_node ssdgblade2.db2 (from db1) >>> > #fence_node ssdgblade1.db1 (from db2) >>> > >>> > Can fence the given node. But during boot up it fence the other node. >>> > >>> > Please find the attachment for your reference. >>> > -- >>> > >>> > >>> > Thanks & Regards, >>> > >>> > *Arun K P >>> > * >>> > >>> > System Administrator >>> > >>> > *HCL Infosystems Ltd*. 
>>> > >>> > *Kolkata* >>> > >>> > Mob: +91- 9903361422 >>> > >>> > *www.hclinfosystems.in* >>> > >>> > *Technology that touches lives* *TM* >>> > ** >>> > >>> > -- >>> > This message has been scanned for viruses and >>> > dangerous content by MailScanner, and is >>> > believed to be clean. >>> > >>> > -- >>> > Linux-cluster mailing list >>> > Linux-cluster at redhat.com >>> > https://www.redhat.com/mailman/listinfo/linux-cluster >>> >>> >>> -- >>> This message has been scanned for viruses and >>> dangerous content by MailScanner, and is >>> believed to be clean. >>> >>> -- >>> Linux-cluster mailing list >>> Linux-cluster at redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-cluster >>> >> >> >> >> -- >> esta es mi vida e me la vivo hasta que dios quiera >> >> -- >> This message has been scanned for viruses and >> dangerous content by MailScanner, and is >> believed to be clean. >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > > > -- > This message has been scanned for viruses and > dangerous content by MailScanner, and is > believed to be clean. > > > > ------------------------------ > > Message: 2 > Date: Tue, 31 Jan 2012 16:07:32 -0800 > From: Wes Modes > To: linux clustering > Subject: [Linux-cluster] GFS2 and VM High Availability/DRS > Message-ID: <4F288244.7060904 at ucsc.edu> > Content-Type: text/plain; charset="iso-8859-1" > > Howdy, thanks for all your answers here. With your help (particularly > Digimer), I was able to set up my little two node GFS2 cluster. I can't > pretend yet to understand everything, but I have a blossoming awareness > of what and why and how. > > The way I finally set it up for my test cluster was > > 1. LUN on SAN > 2. configured through ESXi as RDM > 3. RDM made available to OS > 4. parted RDM device > 5. pvcreate/vgcreate/lvcreate to create logical volume on device > 6. mkfs.gfs2 to create GFS2 filesystem on volume supported by clvmd and > cman, etc > > It works and that's great. BUT the lit says VMWare's vMotion/HA/DRS > doesn't support RDM (though others say that isn't a problem) > > I am setting up GFS2 on CentOS running on VMWare and a SAN. We want to > take advantage of VMWare's High Availability (HA) and Distributed > Resource Scheduler (DRS) which allow the VM cluster to migrate a guest > to another host if the guest becomes unavailable for any reason. I've > come across some contradictory statements regarding the compatibility of > RDMs and HA/DRS. So naturally, I have some questions: > > 1) If my shared cluster filesystem resides on an RDM on a SAN and is > available to all of the ESXi hosts, can I use HA/DRS or not? If so, > what are the limitations? If not, why not? > > 2) If I cannot use an RDM for the cluster filesystem, can I use VMFS so > vmware can deal with it? What are the limitations of this? > > 3) Is there some other magic way using iSCSI connectors or something > bypassing vmware? Anyone have experience with this? > > Wes > > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > > > ------------------------------ > > Message: 3 > Date: Wed, 01 Feb 2012 13:13:46 +0000 > From: Andrew Price > To: cluster-devel at redhat.com, linux-cluster at redhat.com > Subject: [Linux-cluster] gfs2-utils 3.1.4 Released > Message-ID: <4F293A8A.9070504 at redhat.com> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > Hi, > > gfs2-utils 3.1.4 has been released. 
This version features a new > gfs2_lockgather script to aid diagnosis of GFS2 locking issues, more > clean-ups and fixes based on static analysis results, and various other > minor enhancements and bug fixes. See below for a full list of changes. > > The source tarball is available from: > > https://fedorahosted.org/released/gfs2-utils/gfs2-utils-3.1.4.tar.gz > > To report bugs or issues, please use: > > https://bugzilla.redhat.com/ > > Regards, > > Andy Price > Red Hat File Systems > > > Changes since 3.1.3: > > Adam Drew (1): > Added gfs2_lockgather data gathering script. > > Andrew Price (30): > libgfs2: Expand out calls to die() > libgfs2: Push down die() into the utils and remove it > gfs2_edit: Remove a useless assignment > gfs2_edit: Check return value of compute_constants > gfs2_edit: Fix possible uninitialized access > gfs2_edit: Fix memory leak in dump_journal() > gfs2_edit: Fix null pointer dereference in dump_journal > gfs2_edit: Remove unused j_inode from find_journal_block() > gfs2_edit: Fix memory leak in find_journal_block > gfs2_edit: Check for error value from gfs2_get_bitmap > gfs2_edit: Fix resource leaks in display_extended() > gfs2_edit: Fix resource leak in print_block_details() > gfs2_edit: Fix null pointer derefs in display_block_type() > gfs2_edit: Check more error values from gfs2_get_bitmap > gfs2_edit: Fix another resource leak in display_extended > mkfs.gfs2: Fix use of uninitialized value in check_dev_content > gfs2_convert: Fix null pointer deref in journ_space_to_rg > gfs2_convert: Fix null pointer deref in conv_build_jindex > fsck.gfs2: Remove unsigned comparisons with zero > fsck.gfs2: Plug a leak in init_system_inodes() > libgfs2: Set errno in dirent_alloc and use dir_add consistently > fsck.gfs2: Plug memory leak in check_system_dir() > fsck.gfs2: Fix null pointer deref in check_system_dir() > fsck.gfs2: Plug a leak in find_block_ref() > fsck.gfs2: Remove unused hash.c, hash.h > mkfs.gfs2: Improve error messages > libgfscontrol: Fix resource leaks > fsck.gfs2: Plug a leak in peruse_system_dinode() > fsck.gfs2: Fix unchecked malloc in gfs2_dup_set() > gfs2_edit: Don't exit prematurely in display_block_type > > Carlos Maiolino (2): > i18n: Update gfs2-utils.pot file > Merge branch 'master' of ssh://git.fedorahosted.org/git/gfs2-utils > > Steven Whitehouse (13): > gfs2_convert: clean up question asking code > fsck.gfs2: Use sigaction and not signal syscall > fsck.gfs2: Clean up pass calling code > libgfs2: Add iovec to gfs2_buffer_head > libgfs2: Add beginnings of a metadata description > libgfs2: Remove struct gfs_rindex from header, etc > libgfs2: Use endian defined types for GFS1 on disk structures > edit: Fix up block type recognition > libgfs2: Add a few structures missed from the initial version of > meta.c > fsck/libgfs2: Add a couple of missing header files > libgfs2: Add some tables of symbolic constant names > edit: Hook up gfs2_edit to use new metadata info from libgfs2 > libgfs2: Add flags to metadata description > > > > ------------------------------ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > End of Linux-cluster Digest, Vol 94, Issue 1 > ******************************************** > From jose.neto at liber4e.com Thu Feb 2 08:58:44 2012 From: jose.neto at liber4e.com (jose nuno neto) Date: Thu, 2 Feb 2012 08:58:44 -0000 (GMT) Subject: [Linux-cluster] HA-LVM and /etc/lvm/lvm.conf In-Reply-To: References: Message-ID: 
<666ec9ec3026fe86314623812970f61e.squirrel@liber4e.com> Hi I have used LVM-HA on a previous project and for Failover Cluster works fine. Didn't use DRBD with it, have tested DRBD for concurrent access to devices/filesystems For volume_list I would use something like this "@rhel-01" if you put "@*" think it will allow all tags so guess you should remove it anyway, didn't fully understand the DRDB here, you provide this LVM-HA has DRBD resources? If so you dont need LVM tags for this. just normal LVM LVM tags in redhat cluster are used to allow switching access to the Vgs from one node to others Regards Jose > I'm having a dialog with RH support about configuring HA-LVM within RHCS > and I'm trying to see if there are some limitations and thought I'd ping > the mailing list on the same subject. > > Is an HA-LVM configuration that only uses LVM tags useful beyond a single > volume group? > > I'm attempting to provide a database service along side a pair of HA-NFS > services that utilize DRBD and LVM (but drbd isn't the issue). > > on two nodes, rhel-01 and rhel-02, I currently I have three volume groups: > vgTest0, vgTest1, vgTestCluster > > vgTest0 and vgTest1 are volume groups that exist on each node, and they > provide a single back-end LVM volume to a DRBD resource, so DRBD can > leverage the snapshotting technique during syncs. Both nodes in the > cluster have the same configuration, and DRBD is working fine. This has > been in production for about a month. > > I recently got the SAN resource that is presented to both hosts that I > want to roll in LVM so I can utilize snapshots on the volume data. I > don't want to bother with CLVM because I have a use case for snapshotting, > so HA-LVM as described: https://access.redhat.com/kb/docs/DOC-3068, I'm > doing the second method. The goal is a failover cluster. > > I've been struggling with how to configure my /etc/lvm/lvm.conf because of > the volume_list parameter: > > > # If volume_list is defined, each LV is only activated if there is a > # match against the list. > # "vgname" and "vgname/lvname" are matched exactly. > # "@tag" matches any tag set in the LV or VG. > # "@*" matches if any tag defined on the host is also set in the LV or > VG > # > # volume_list = [ "vg1", "vg2/lvol1", "@tag1", "@*" ] > > would I go with something like: > volume_list = [ "vgTest0", "vgTestCluster/lvTest0", "@rhel-01", "@*" ] > > I don't get why I need to state a persistent volume group - and if I do, > which one? I've got two persistent groups on each node. I don't use LVM > on the root disk. > > Could I somehow expand this out to two HA-LVM volume groups? I don't see a > way but thought I'd ask. > > > > Erik Redding > Systems Programmer, RHCE > Core Systems > Texas State University-San Marcos > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > This message has been scanned for viruses and > dangerous content by MailScanner, and is > believed to be clean. > -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. 
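As a concrete illustration of the tag-based activation described above -- a sketch only, reusing the volume group names from Erik's mail; the exact list depends on which volume groups each node must be able to activate at boot:

# /etc/lvm/lvm.conf on rhel-01 (sketch; names taken from Erik's example)
#
# List the volume groups this node always needs locally (the DRBD
# backing VGs here, plus the root VG if / were on LVM), together with
# the node's own tag.  The shared VG (vgTestCluster) is deliberately
# not listed, so it is only activated when the cluster's lvm resource
# agent tags it for this node.  Leaving out "@*" prevents a VG tagged
# for the other node from being activated here.
volume_list = [ "vgTest0", "vgTest1", "@rhel-01" ]

On rhel-02 the tag entry would be "@rhel-02" instead. After changing lvm.conf the initrd usually has to be rebuilt (mkinitrd/dracut) so that early-boot LVM activation applies the same list, as described in the HA-LVM document Erik linked.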
From emi2fast at gmail.com Thu Feb 2 09:19:26 2012 From: emi2fast at gmail.com (emmanuel segura) Date: Thu, 2 Feb 2012 10:19:26 +0100 Subject: [Linux-cluster] HA-LVM and /etc/lvm/lvm.conf In-Reply-To: <666ec9ec3026fe86314623812970f61e.squirrel@liber4e.com> References: <666ec9ec3026fe86314623812970f61e.squirrel@liber4e.com> Message-ID: HA-LVM it's deprecated on redhat cluster 2012/2/2 jose nuno neto > Hi > > I have used LVM-HA on a previous project and for Failover Cluster works > fine. > Didn't use DRBD with it, have tested DRBD for concurrent access to > devices/filesystems > > For volume_list I would use something like this > "@rhel-01" > if you put > "@*" > think it will allow all tags > so guess you should remove it > > anyway, didn't fully understand the DRDB here, you provide this LVM-HA has > DRBD resources? If so you dont need LVM tags for this. just normal LVM > LVM tags in redhat cluster are used to allow switching access to the Vgs > from one node to others > > Regards > Jose > > > I'm having a dialog with RH support about configuring HA-LVM within RHCS > > and I'm trying to see if there are some limitations and thought I'd ping > > the mailing list on the same subject. > > > > Is an HA-LVM configuration that only uses LVM tags useful beyond a single > > volume group? > > > > I'm attempting to provide a database service along side a pair of HA-NFS > > services that utilize DRBD and LVM (but drbd isn't the issue). > > > > on two nodes, rhel-01 and rhel-02, I currently I have three volume > groups: > > vgTest0, vgTest1, vgTestCluster > > > > vgTest0 and vgTest1 are volume groups that exist on each node, and they > > provide a single back-end LVM volume to a DRBD resource, so DRBD can > > leverage the snapshotting technique during syncs. Both nodes in the > > cluster have the same configuration, and DRBD is working fine. This has > > been in production for about a month. > > > > I recently got the SAN resource that is presented to both hosts that I > > want to roll in LVM so I can utilize snapshots on the volume data. I > > don't want to bother with CLVM because I have a use case for > snapshotting, > > so HA-LVM as described: https://access.redhat.com/kb/docs/DOC-3068, I'm > > doing the second method. The goal is a failover cluster. > > > > I've been struggling with how to configure my /etc/lvm/lvm.conf because > of > > the volume_list parameter: > > > > > > # If volume_list is defined, each LV is only activated if there is a > > # match against the list. > > # "vgname" and "vgname/lvname" are matched exactly. > > # "@tag" matches any tag set in the LV or VG. > > # "@*" matches if any tag defined on the host is also set in the LV or > > VG > > # > > # volume_list = [ "vg1", "vg2/lvol1", "@tag1", "@*" ] > > > > would I go with something like: > > volume_list = [ "vgTest0", "vgTestCluster/lvTest0", "@rhel-01", "@*" ] > > > > I don't get why I need to state a persistent volume group - and if I do, > > which one? I've got two persistent groups on each node. I don't use LVM > > on the root disk. > > > > Could I somehow expand this out to two HA-LVM volume groups? I don't see > a > > way but thought I'd ask. > > > > > > > > Erik Redding > > Systems Programmer, RHCE > > Core Systems > > Texas State University-San Marcos > > > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > > This message has been scanned for viruses and > > dangerous content by MailScanner, and is > > believed to be clean. 
> > > > > -- > This message has been scanned for viruses and > dangerous content by MailScanner, and is > believed to be clean. > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- esta es mi vida e me la vivo hasta que dios quiera -------------- next part -------------- An HTML attachment was scrubbed... URL: From erik.redding at txstate.edu Thu Feb 2 22:32:28 2012 From: erik.redding at txstate.edu (Redding, Erik) Date: Thu, 2 Feb 2012 16:32:28 -0600 Subject: [Linux-cluster] HA-LVM and /etc/lvm/lvm.conf In-Reply-To: References: <666ec9ec3026fe86314623812970f61e.squirrel@liber4e.com> Message-ID: <809DF8F0-D629-4C5C-8D82-B92B1DEB00B6@txstate.edu> Jose - Ignore the DRBD+LVM aspect - act like it's / in LVM because I'm not concerned about that part. I describe it because it directly effects the LVM configuration if I turn on tagging. Here's the info on LVM+DRBD: http://www.drbd.org/users-guide/s-lvm-lv-as-drbd-backing-dev.html - it lets you gain a "roll back" technique if the DRBD sync breaks somehow because it snaps the underlying LV before it starts a sync. Emmanuel - Thanks for the insightful comment but I would appreciate it if you'd elaborate on how you'd solve for the situation. I'm open to other solutions if Red Hat has something better (and not CLVM because I require snapshots) but I don't see that there is one. As far as I can tell, this is the best solution for the requirements. Thanks, Erik Redding Core Systems Texas State University-San Marcos On Feb 2, 2012, at 3:19 AM, emmanuel segura wrote: HA-LVM it's deprecated on redhat cluster 2012/2/2 jose nuno neto > Hi I have used LVM-HA on a previous project and for Failover Cluster works fine. Didn't use DRBD with it, have tested DRBD for concurrent access to devices/filesystems For volume_list I would use something like this "@rhel-01" if you put "@*" think it will allow all tags so guess you should remove it anyway, didn't fully understand the DRDB here, you provide this LVM-HA has DRBD resources? If so you dont need LVM tags for this. just normal LVM LVM tags in redhat cluster are used to allow switching access to the Vgs from one node to others Regards Jose > I'm having a dialog with RH support about configuring HA-LVM within RHCS > and I'm trying to see if there are some limitations and thought I'd ping > the mailing list on the same subject. > > Is an HA-LVM configuration that only uses LVM tags useful beyond a single > volume group? > > I'm attempting to provide a database service along side a pair of HA-NFS > services that utilize DRBD and LVM (but drbd isn't the issue). > > on two nodes, rhel-01 and rhel-02, I currently I have three volume groups: > vgTest0, vgTest1, vgTestCluster > > vgTest0 and vgTest1 are volume groups that exist on each node, and they > provide a single back-end LVM volume to a DRBD resource, so DRBD can > leverage the snapshotting technique during syncs. Both nodes in the > cluster have the same configuration, and DRBD is working fine. This has > been in production for about a month. > > I recently got the SAN resource that is presented to both hosts that I > want to roll in LVM so I can utilize snapshots on the volume data. I > don't want to bother with CLVM because I have a use case for snapshotting, > so HA-LVM as described: https://access.redhat.com/kb/docs/DOC-3068, I'm > doing the second method. The goal is a failover cluster. 
> > I've been struggling with how to configure my /etc/lvm/lvm.conf because of > the volume_list parameter: > > > # If volume_list is defined, each LV is only activated if there is a > # match against the list. > # "vgname" and "vgname/lvname" are matched exactly. > # "@tag" matches any tag set in the LV or VG. > # "@*" matches if any tag defined on the host is also set in the LV or > VG > # > # volume_list = [ "vg1", "vg2/lvol1", "@tag1", "@*" ] > > would I go with something like: > volume_list = [ "vgTest0", "vgTestCluster/lvTest0", "@rhel-01", "@*" ] > > I don't get why I need to state a persistent volume group - and if I do, > which one? I've got two persistent groups on each node. I don't use LVM > on the root disk. > > Could I somehow expand this out to two HA-LVM volume groups? I don't see a > way but thought I'd ask. > > > > Erik Redding > Systems Programmer, RHCE > Core Systems > Texas State University-San Marcos > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > This message has been scanned for viruses and > dangerous content by MailScanner, and is > believed to be clean. > -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -- esta es mi vida e me la vivo hasta que dios quiera -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: From emi2fast at gmail.com Fri Feb 3 08:26:41 2012 From: emi2fast at gmail.com (emmanuel segura) Date: Fri, 3 Feb 2012 09:26:41 +0100 Subject: [Linux-cluster] HA-LVM and /etc/lvm/lvm.conf In-Reply-To: <809DF8F0-D629-4C5C-8D82-B92B1DEB00B6@txstate.edu> References: <666ec9ec3026fe86314623812970f61e.squirrel@liber4e.com> <809DF8F0-D629-4C5C-8D82-B92B1DEB00B6@txstate.edu> Message-ID: Jose If i understand you looking for lvm snapshot http://www.drbd.org/users-guide/s-lvm-snapshots.html 2012/2/2 Redding, Erik > Jose - Ignore the DRBD+LVM aspect - act like it's / in LVM because I'm not > concerned about that part. I describe it because it directly effects the > LVM configuration if I turn on tagging. Here's the info on LVM+DRBD: > http://www.drbd.org/users-guide/s-lvm-lv-as-drbd-backing-dev.html - it > lets you gain a "roll back" technique if the DRBD sync breaks somehow > because it snaps the underlying LV before it starts a sync. > > > > Emmanuel - Thanks for the insightful comment but I would appreciate it if > you'd elaborate on how you'd solve for the situation. I'm open to other > solutions if Red Hat has something better (and not CLVM because I require > snapshots) but I don't see that there is one. As far as I can tell, this is > the best solution for the requirements. > > > > Thanks, > Erik Redding > Core Systems > Texas State University-San Marcos > > > > > > On Feb 2, 2012, at 3:19 AM, emmanuel segura wrote: > > HA-LVM it's deprecated on redhat cluster > > 2012/2/2 jose nuno neto > >> Hi >> >> I have used LVM-HA on a previous project and for Failover Cluster works >> fine. 
>> Didn't use DRBD with it, have tested DRBD for concurrent access to >> devices/filesystems >> >> For volume_list I would use something like this >> "@rhel-01" >> if you put >> "@*" >> think it will allow all tags >> so guess you should remove it >> >> anyway, didn't fully understand the DRDB here, you provide this LVM-HA has >> DRBD resources? If so you dont need LVM tags for this. just normal LVM >> LVM tags in redhat cluster are used to allow switching access to the Vgs >> from one node to others >> >> Regards >> Jose >> >> > I'm having a dialog with RH support about configuring HA-LVM within RHCS >> > and I'm trying to see if there are some limitations and thought I'd ping >> > the mailing list on the same subject. >> > >> > Is an HA-LVM configuration that only uses LVM tags useful beyond a >> single >> > volume group? >> > >> > I'm attempting to provide a database service along side a pair of HA-NFS >> > services that utilize DRBD and LVM (but drbd isn't the issue). >> > >> > on two nodes, rhel-01 and rhel-02, I currently I have three volume >> groups: >> > vgTest0, vgTest1, vgTestCluster >> > >> > vgTest0 and vgTest1 are volume groups that exist on each node, and they >> > provide a single back-end LVM volume to a DRBD resource, so DRBD can >> > leverage the snapshotting technique during syncs. Both nodes in the >> > cluster have the same configuration, and DRBD is working fine. This has >> > been in production for about a month. >> > >> > I recently got the SAN resource that is presented to both hosts that I >> > want to roll in LVM so I can utilize snapshots on the volume data. I >> > don't want to bother with CLVM because I have a use case for >> snapshotting, >> > so HA-LVM as described: https://access.redhat.com/kb/docs/DOC-3068, I'm >> > doing the second method. The goal is a failover cluster. >> > >> > I've been struggling with how to configure my /etc/lvm/lvm.conf because >> of >> > the volume_list parameter: >> > >> > >> > # If volume_list is defined, each LV is only activated if there is a >> > # match against the list. >> > # "vgname" and "vgname/lvname" are matched exactly. >> > # "@tag" matches any tag set in the LV or VG. >> > # "@*" matches if any tag defined on the host is also set in the LV or >> > VG >> > # >> > # volume_list = [ "vg1", "vg2/lvol1", "@tag1", "@*" ] >> > >> > would I go with something like: >> > volume_list = [ "vgTest0", "vgTestCluster/lvTest0", "@rhel-01", "@*" ] >> > >> > I don't get why I need to state a persistent volume group - and if I do, >> > which one? I've got two persistent groups on each node. I don't use >> LVM >> > on the root disk. >> > >> > Could I somehow expand this out to two HA-LVM volume groups? I don't >> see a >> > way but thought I'd ask. >> > >> > >> > >> > Erik Redding >> > Systems Programmer, RHCE >> > Core Systems >> > Texas State University-San Marcos >> > >> > >> > -- >> > Linux-cluster mailing list >> > Linux-cluster at redhat.com >> > https://www.redhat.com/mailman/listinfo/linux-cluster >> > >> > -- >> > This message has been scanned for viruses and >> > dangerous content by MailScanner, and is >> > believed to be clean. >> > >> >> >> -- >> This message has been scanned for viruses and >> dangerous content by MailScanner, and is >> believed to be clean. 
>> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> > > > > -- > esta es mi vida e me la vivo hasta que dios quiera > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- esta es mi vida e me la vivo hasta que dios quiera -------------- next part -------------- An HTML attachment was scrubbed... URL: From ccaulfie at redhat.com Fri Feb 3 14:22:59 2012 From: ccaulfie at redhat.com (Christine Caulfield) Date: Fri, 03 Feb 2012 14:22:59 +0000 Subject: [Linux-cluster] Fwd: fence-agent : ipmilan : power_wait : missing in ipmi_off. In-Reply-To: <1efcbd0c-953e-4151-a1f2-4dc1a3b134be@mailpro> References: <1efcbd0c-953e-4151-a1f2-4dc1a3b134be@mailpro> Message-ID: <4F2BEDC3.7010001@redhat.com> -------- Original Message -------- Subject: fence-agent : ipmilan : power_wait : missing in ipmi_off. Date: Fri, 03 Feb 2012 15:20:47 +0100 (CET) From: Alexandre DERUMIER To: ccaulfie at redhat.com Hi, I'm working to implement a redhat cluster and I think I found a bug in ipmilan.c On this commit: fence-agents: Add power_wait to fence_ipmilan http://git.fedorahosted.org/git/?p=fence-agents.git;a=commitdiff;h=7d53eb8ab06a8713d2b52500da741b6170fbfc91 in ipmi_off , the sleep(2) is still hardcorded I think It must replace with sleep(ipmi->i_power_wait), like ipmi_on ? index 52be371..46814a8 100644 --- a/fence/agents/ipmilan/ipmilan.c +++ b/fence/agents/ipmilan/ipmilan.c @@ -473,7 +473,7 @@ ipmi_off(struct ipmi *ipmi) if (ret != 0) return ret; - sleep(2); + sleep(ipmi->i_power_wait); --retries; ret = ipmi_op(ipmi, ST_STATUS, power_status); What do you thinks about it ? Best Regards, Alexandre Derumier System Engineer aderumier at odiso.com From bubble at hoster-ok.com Fri Feb 3 18:07:14 2012 From: bubble at hoster-ok.com (Vladislav Bogdanov) Date: Fri, 03 Feb 2012 21:07:14 +0300 Subject: [Linux-cluster] Fwd: fence-agent : ipmilan : power_wait : missing in ipmi_off. In-Reply-To: <4F2BEDC3.7010001@redhat.com> References: <1efcbd0c-953e-4151-a1f2-4dc1a3b134be@mailpro> <4F2BEDC3.7010001@redhat.com> Message-ID: <4F2C2252.3040301@hoster-ok.com> Hi Christine, all, This is definitely true. I do not have so nice patches against git master, but I'd like to present my patches against 3.1.7 which I use for a quite long time (I send them as I promised year ago or so on pacemaker list). I attach them in order they are applied in my srpm, and I hope that their names and contents are self-describing. If not please do not hesitate to write me. I would describe some patches here: 01-fence-agents-3.1.7-ipmilan-uniq.patch fixes parameter uniqueness report (needed for newer pacemaker) 02-fence-agents-3.1.2-ipmilan-cycle.patch fixes return value for cycle method (nobody uses it yet?) 06-fence-agents-3.1.7-ipmilan-reset-method.patch just adds IPMI reset method, because some IPMI controllers have bugs in cycle or on-off implementations and admin may want to use reset which always work with them. 08-fence-agents-3.1.7-ipmilan-force-ops.patch and 09-fence-agents-3.1.7-ipmilan-status-recheck.patch add more workarounds against buggy IPMI controllers which may report status incorrectly right after operation is completed (Supermicro on-board ones are examples of them). 
I definitely do not like amount of function arguments I have after that all, but it is better to leave it to package maintainer to decide what to do with them Pleas do not kick me for sending non-git patches, it is over my skills to do that. Best, Vladislav 03.02.2012 17:22, Christine Caulfield wrote: > > > -------- Original Message -------- > Subject: fence-agent : ipmilan : power_wait : missing in ipmi_off. > Date: Fri, 03 Feb 2012 15:20:47 +0100 (CET) > From: Alexandre DERUMIER > To: ccaulfie at redhat.com > > Hi, > > I'm working to implement a redhat cluster and I think I found a bug in > ipmilan.c > > On this commit: > > fence-agents: Add power_wait to fence_ipmilan > > http://git.fedorahosted.org/git/?p=fence-agents.git;a=commitdiff;h=7d53eb8ab06a8713d2b52500da741b6170fbfc91 > > > in ipmi_off , the sleep(2) is still hardcorded > > I think It must replace with sleep(ipmi->i_power_wait), like ipmi_on ? > > > index 52be371..46814a8 100644 > --- a/fence/agents/ipmilan/ipmilan.c > +++ b/fence/agents/ipmilan/ipmilan.c > @@ -473,7 +473,7 @@ ipmi_off(struct ipmi *ipmi) > if (ret != 0) > return ret; > > - sleep(2); > + sleep(ipmi->i_power_wait); > --retries; > ret = ipmi_op(ipmi, ST_STATUS, power_status); > > > > What do you thinks about it ? > > > > Best Regards, > > Alexandre Derumier > System Engineer > aderumier at odiso.com > > > > > > > > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- A non-text attachment was scrubbed... Name: 01-fence-agents-3.1.7-ipmilan-uniq.patch Type: text/x-patch Size: 3335 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 02-fence-agents-3.1.2-ipmilan-cycle.patch Type: text/x-patch Size: 431 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 03-fence-agents-3.1.2-ipmilan-wait-time.patch Type: text/x-patch Size: 492 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 04-fence-agents-3.1.2-ipmilan-optname-typo.patch Type: text/x-patch Size: 762 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 05-fence-agents-3.1.2-ipmilan-spelling-fix.patch Type: text/x-patch Size: 599 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 06-fence-agents-3.1.7-ipmilan-reset-method.patch Type: text/x-patch Size: 4472 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 07-fence-agents-3.1.2-ipmilan-delay-fix.patch Type: text/x-patch Size: 617 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 08-fence-agents-3.1.7-ipmilan-force-ops.patch Type: text/x-patch Size: 4758 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 09-fence-agents-3.1.7-ipmilan-status-recheck.patch Type: text/x-patch Size: 8627 bytes Desc: not available URL: From kortux at gmail.com Fri Feb 3 23:15:03 2012 From: kortux at gmail.com (Miguel Angel Guerrero) Date: Fri, 3 Feb 2012 18:15:03 -0500 Subject: [Linux-cluster] Fencing race again in centos6.2 with rhcs_fence Message-ID: Hi all I try to setup my cluster configuration with centos 6.2 and drbd 8.4, but again i have a fencing race situation, in this case the problem is bigger, because if a hangup any node both nodes halt, if i disconnect a drbd network cable both nodes halt, i try change the outdata-peer handler with /sbin/obliterate-peer.sh with the "sleep 10" in one node but the result is the same, the next pastebin have the log output of both nodes (with rhcs_fence in debug mode) and my config files http://pastebin.com/Kr4FPScs thanks for the help -- Atte: ------------------------------------ Miguel Angel Guerrero Usuario GNU/Linux Registrado #353531 ------------------------------------ From emi2fast at gmail.com Fri Feb 3 23:28:37 2012 From: emi2fast at gmail.com (emmanuel segura) Date: Sat, 4 Feb 2012 00:28:37 +0100 Subject: [Linux-cluster] Fencing race again in centos6.2 with rhcs_fence In-Reply-To: References: Message-ID: Hi Jose The reason is that you miss the delay parameter delay in your redhat cluster CONF man fencen or fencend. i don't remember very well ==================================================== 1. fencedevice agent="fence_ipmilan" ipaddr="192.168.201.220" lanplus="1" login="ADMIN" name="ipmi1" passwd="easy"/> 2. =================================================================== 2012/2/4 Miguel Angel Guerrero > Hi all > > I try to setup my cluster configuration with centos 6.2 and drbd 8.4, > but again i have a fencing race situation, in this case the problem is > bigger, because if a hangup any node both nodes halt, if i disconnect > a drbd network cable both nodes halt, i try change the outdata-peer > handler with /sbin/obliterate-peer.sh with the "sleep 10" in one node > but the result is the same, the next pastebin have the log output of > both nodes (with rhcs_fence in debug mode) and my config files > http://pastebin.com/Kr4FPScs > > thanks for the help > > -- > Atte: > ------------------------------------ > Miguel Angel Guerrero > Usuario GNU/Linux Registrado #353531 > ------------------------------------ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- esta es mi vida e me la vivo hasta que dios quiera -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From linux at alteeve.com Sat Feb 4 02:35:22 2012 From: linux at alteeve.com (Digimer) Date: Fri, 03 Feb 2012 21:35:22 -0500 Subject: [Linux-cluster] Fencing race again in centos6.2 with rhcs_fence In-Reply-To: References: Message-ID: <4F2C996A.4030009@alteeve.com> On 02/03/2012 06:15 PM, Miguel Angel Guerrero wrote: > Hi all > > I try to setup my cluster configuration with centos 6.2 and drbd 8.4, > but again i have a fencing race situation, in this case the problem is > bigger, because if a hangup any node both nodes halt, if i disconnect > a drbd network cable both nodes halt, i try change the outdata-peer > handler with /sbin/obliterate-peer.sh with the "sleep 10" in one node > but the result is the same, the next pastebin have the log output of > both nodes (with rhcs_fence in debug mode) and my config files > http://pastebin.com/Kr4FPScs > > thanks for the help Hi, It looks like rhcs_fence is being called repeatedly: Starting at line 23, we see that it start up: Feb 3 17:48:13 wsguardian1 rhcs_fence: 74; Attempting to fence peer using RHCS from DRBD... ... Then at line 42, it starts again: Feb 3 17:48:13 wsguardian1 rhcs_fence: 74; Attempting to fence peer using RHCS from DRBD... It seems to be called five times, and never finishes. Without a successful exit, DRBD will hang. Can you share your DRBD configuration files please? The 'rhcs_fence' agent was tested on RHEL/CentOS 6.x and DRBD 8.3 only. If there is a patch needed to make it work on 8.4, I would like to sort it out and get it applied. -- Digimer E-Mail: digimer at alteeve.com Papers and Projects: https://alteeve.com From linux at alteeve.com Sat Feb 4 02:35:57 2012 From: linux at alteeve.com (Digimer) Date: Fri, 03 Feb 2012 21:35:57 -0500 Subject: [Linux-cluster] Fencing race again in centos6.2 with rhcs_fence In-Reply-To: References: Message-ID: <4F2C998D.3000602@alteeve.com> On 02/03/2012 06:15 PM, Miguel Angel Guerrero wrote: > Hi all > > I try to setup my cluster configuration with centos 6.2 and drbd 8.4, > but again i have a fencing race situation, in this case the problem is > bigger, because if a hangup any node both nodes halt, if i disconnect > a drbd network cable both nodes halt, i try change the outdata-peer > handler with /sbin/obliterate-peer.sh with the "sleep 10" in one node > but the result is the same, the next pastebin have the log output of > both nodes (with rhcs_fence in debug mode) and my config files > http://pastebin.com/Kr4FPScs > > thanks for the help Woops, you did have the config there, I am blind. Let me look at it and reply again in a few minutes. 
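For reference, the DRBD side of this hook-up normally looks something like the sketch below; the resource name and the handler install path are assumptions rather than values from Miguel's pastebin, and the placement of the fencing option should be checked against the drbd.conf man page for the 8.4 syntax:

# Sketch only; resource name "r0" and the handler path are assumptions,
# not taken from Miguel's configuration.
resource r0 {
        disk {
                # Suspend I/O and call the fence-peer handler when the
                # peer is lost (8.3-style placement; verify for 8.4).
                fencing resource-and-stonith;
        }
        handlers {
                # rhcs_fence asks the cluster's fence daemon to fence the
                # peer; as noted above, DRBD stays blocked until the
                # handler exits successfully.
                fence-peer "/sbin/rhcs_fence";
        }
}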
-- Digimer E-Mail: digimer at alteeve.com Papers and Projects: https://alteeve.com From linux at alteeve.com Sat Feb 4 02:39:30 2012 From: linux at alteeve.com (Digimer) Date: Fri, 03 Feb 2012 21:39:30 -0500 Subject: [Linux-cluster] Fencing race again in centos6.2 with rhcs_fence In-Reply-To: <4F2C998D.3000602@alteeve.com> References: <4F2C998D.3000602@alteeve.com> Message-ID: <4F2C9A62.4090305@alteeve.com> On 02/03/2012 09:35 PM, Digimer wrote: > On 02/03/2012 06:15 PM, Miguel Angel Guerrero wrote: >> Hi all >> >> I try to setup my cluster configuration with centos 6.2 and drbd 8.4, >> but again i have a fencing race situation, in this case the problem is >> bigger, because if a hangup any node both nodes halt, if i disconnect >> a drbd network cable both nodes halt, i try change the outdata-peer >> handler with /sbin/obliterate-peer.sh with the "sleep 10" in one node >> but the result is the same, the next pastebin have the log output of >> both nodes (with rhcs_fence in debug mode) and my config files >> http://pastebin.com/Kr4FPScs >> >> thanks for the help > > Woops, you did have the config there, I am blind. Let me look at it and > reply again in a few minutes. > With the cluster up and running, can you run this please and tell me what the output is? (From the other node) /usr/sbin/cman_tool kill -f wsguardian1 -- Digimer E-Mail: digimer at alteeve.com Papers and Projects: https://alteeve.com From bshepherd at voxeo.com Sun Feb 5 12:17:13 2012 From: bshepherd at voxeo.com (Ben Shepherd) Date: Sun, 05 Feb 2012 12:17:13 +0000 Subject: [Linux-cluster] corosync issue with two interface directives Message-ID: Currently have a 2 node cluster. We configured HA on 1 network to take inbound traffic with multicast in corosync and 1 VIP. This works fine (most of the time sometimes if you take the cable out both interfaces end up with the VIP but that is another story) Customer now has another network on which they want to take traffic. I have assigned the VIP on node lxnivrr45.at.inside node lxnivrr46.at.inside primitive failover-ip1 ocf:heartbeat:IPaddr params ip=" 10.251.96.185" op monitor interval="10s" primitive failover-ip2 ocf:heartbeat:IPaddr params ip="10.2.150.201" op monitor interval="10s" colocation failover-ips inf: failover-ip1 failover-ip2 property $id="cib-bootstrap-options" dc-version="1.1.5-5.el6-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" cluster-infrastructure="openais" expected-quorum-votes="2" no-quorum-policy="ignore" stonith-enabled="false" rsc_defaults $id="rsc-options" resource-stickiness="100" Current Corosync configuration is: # Please read the corosync.conf.5 manual page compatibility: whitetank totem { version: 2 secauth: on threads: 0 interface { ringnumber: 0 bindnetaddr: 10.251.96.160 #broadcast: yes mcastaddr: 239.254.6.8 mcastport: 5405 ttl: 1 } } logging { fileline: off to_stderr: no to_logfile: yes to_syslog: yes logfile: /var/log/cluster/corosync.log debug: off timestamp: on logger_subsys { subsys: AMF debug: off } } amf { mode: disabled } I am a little confused about using. Should I add the Multicast address for the 2nd Network as ring 1 or can I have 2 Interfaces on ring 0 on different networks ? 
Giving me: # Please read the corosync.conf.5 manual page compatibility: whitetank totem { version: 2 secauth: on threads: 0 interface { ringnumber: 0 bindnetaddr: 10.251.96.160 #broadcast: yes mcastaddr: 239.254.6.8 mcastport: 5405 ttl: 1 } interface { ringnumber: 0 bindnetaddr: 10.122.147.192 #broadcast: yes mcastaddr: 239.254.6.9 mcastport: 5405 ttl: 1 } } logging { fileline: off to_stderr: no to_logfile: yes to_syslog: yes logfile: /var/log/cluster/corosync.log debug: off timestamp: on logger_subsys { subsys: AMF debug: off } } amf { mode: disabled } Just need to make sure that if I lose either of the interfaces they VIP's fail over. -------------- next part -------------- An HTML attachment was scrubbed... URL: From emi2fast at gmail.com Sun Feb 5 19:14:14 2012 From: emi2fast at gmail.com (emmanuel segura) Date: Sun, 5 Feb 2012 20:14:14 +0100 Subject: [Linux-cluster] corosync issue with two interface directives In-Reply-To: References: Message-ID: I think the ringnumber must be diferent for every network 2012/2/5 Ben Shepherd > Currently have a 2 node cluster. We configured HA on 1 network to take > inbound traffic with multicast in corosync and 1 VIP. > > This works fine (most of the time sometimes if you take the cable out both > interfaces end up with the VIP but that is another story) > Customer now has another network on which they want to take traffic. I > have assigned the VIP on > > node lxnivrr45.at.inside > node lxnivrr46.at.inside > primitive failover-ip1 ocf:heartbeat:IPaddr > params ip=" 10.251.96.185" > op monitor interval="10s" > primitive failover-ip2 ocf:heartbeat:IPaddr > params ip="10.2.150.201" > op monitor interval="10s" > colocation failover-ips inf: failover-ip1 failover-ip2 > property $id="cib-bootstrap-options" > dc-version="1.1.5-5.el6-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" > cluster-infrastructure="openais" > expected-quorum-votes="2" > no-quorum-policy="ignore" > stonith-enabled="false" > rsc_defaults $id="rsc-options" > resource-stickiness="100" > > Current Corosync configuration is: > > # Please read the corosync.conf.5 manual page > compatibility: whitetank > > totem { > version: 2 > secauth: on > threads: 0 > interface { > ringnumber: 0 > bindnetaddr: 10.251.96.160 > #broadcast: yes > mcastaddr: 239.254.6.8 > mcastport: 5405 > ttl: 1 > } > } > > logging { > fileline: off > to_stderr: no > to_logfile: yes > to_syslog: yes > logfile: /var/log/cluster/corosync.log > debug: off > timestamp: on > logger_subsys { > subsys: AMF > debug: off > } > } > > amf { > mode: disabled > } > > I am a little confused about using. Should I add the Multicast address for > the 2nd Network as ring 1 or can I have 2 Interfaces on ring 0 on different > networks ? > > Giving me: > > # Please read the corosync.conf.5 manual page > compatibility: whitetank > > totem { > version: 2 > secauth: on > threads: 0 > interface { > ringnumber: 0 > bindnetaddr: 10.251.96.160 > #broadcast: yes > mcastaddr: 239.254.6.8 > mcastport: 5405 > ttl: 1 > } > interface { > ringnumber: 0 > bindnetaddr: 10.122.147.192 > #broadcast: yes > mcastaddr: 239.254.6.9 > mcastport: 5405 > ttl: 1 > } > } > > logging { > fileline: off > to_stderr: no > to_logfile: yes > to_syslog: yes > logfile: /var/log/cluster/corosync.log > debug: off > timestamp: on > logger_subsys { > subsys: AMF > debug: off > } > } > > amf { > mode: disabled > } > > Just need to make sure that if I lose either of the interfaces they VIP's > fail over. 
> > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- esta es mi vida e me la vivo hasta que dios quiera -------------- next part -------------- An HTML attachment was scrubbed... URL: From bshepherd at voxeo.com Sun Feb 5 19:35:21 2012 From: bshepherd at voxeo.com (Ben Shepherd) Date: Sun, 05 Feb 2012 19:35:21 +0000 Subject: [Linux-cluster] corosync issue with two interface directives In-Reply-To: Message-ID: Hi, OK so how does that affect the fail over. Each f the networks is important if we lose ring 0 or ring 1 we need to fail over. If I have the config stated below: # Please read the corosync.conf.5 manual page compatibility: whitetank totem { version: 2 secauth: on threads: 0 interface { ringnumber: 0 bindnetaddr: 10.251.96.160 #broadcast: yes mcastaddr: 239.254.6.8 mcastport: 5405 ttl: 1 } interface { ringnumber: 1 bindnetaddr: 10.122.147.192 #broadcast: yes mcastaddr: 239.254.6.9 mcastport: 5405 ttl: 1 } } logging { fileline: off to_stderr: no to_logfile: yes to_syslog: yes logfile: /var/log/cluster/corosync.log debug: off timestamp: on logger_subsys { subsys: AMF debug: off } } amf { mode: disabled } And I pull out the cable for the interface on ring 1 will it fail over ? Or will it use ring 1 only if ring 0 fails. I read the documentation but it is less than clear :-) I would just do it and pull the cable out but sadly it requires me to fly to Vienna to do it seems a little extravagant. From: emmanuel segura Reply-To: linux clustering Date: Sun, 5 Feb 2012 20:14:14 +0100 To: linux clustering Subject: Re: [Linux-cluster] corosync issue with two interface directives I think the ringnumber must be diferent for every network 2012/2/5 Ben Shepherd > Currently have a 2 node cluster. We configured HA on 1 network to take inbound > traffic with multicast in corosync and 1 VIP. > > This works fine (most of the time sometimes if you take the cable out both > interfaces end up with the VIP but that is another story) > Customer now has another network on which they want to take traffic. I have > assigned the VIP on > > node lxnivrr45.at.inside > node lxnivrr46.at.inside > primitive failover-ip1 ocf:heartbeat:IPaddr > params ip=" 10.251.96.185" > op monitor interval="10s" > primitive failover-ip2 ocf:heartbeat:IPaddr > params ip="10.2.150.201" > op monitor interval="10s" > colocation failover-ips inf: failover-ip1 failover-ip2 > property $id="cib-bootstrap-options" > dc-version="1.1.5-5.el6-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" > cluster-infrastructure="openais" > expected-quorum-votes="2" > no-quorum-policy="ignore" > stonith-enabled="false" > rsc_defaults $id="rsc-options" > resource-stickiness="100" > > Current Corosync configuration is: > > # Please read the corosync.conf.5 manual page > compatibility: whitetank > > totem { > version: 2 > secauth: on > threads: 0 > interface { > ringnumber: 0 > bindnetaddr: 10.251.96.160 > #broadcast: yes > mcastaddr: 239.254.6.8 > mcastport: 5405 > ttl: 1 > } > } > > logging { > fileline: off > to_stderr: no > to_logfile: yes > to_syslog: yes > logfile: /var/log/cluster/corosync.log > debug: off > timestamp: on > logger_subsys { > subsys: AMF > debug: off > } > } > > amf { > mode: disabled > } > > I am a little confused about using. Should I add the Multicast address for the > 2nd Network as ring 1 or can I have 2 Interfaces on ring 0 on different > networks ? 
> > Giving me: > > # Please read the corosync.conf.5 manual page > compatibility: whitetank > > totem { > version: 2 > secauth: on > threads: 0 > interface { > ringnumber: 0 > bindnetaddr: 10.251.96.160 > #broadcast: yes > mcastaddr: 239.254.6.8 > mcastport: 5405 > ttl: 1 > } > interface { > ringnumber: 0 > bindnetaddr: 10.122.147.192 > #broadcast: yes > mcastaddr: 239.254.6.9 > mcastport: 5405 > ttl: 1 > } > } > > logging { > fileline: off > to_stderr: no > to_logfile: yes > to_syslog: yes > logfile: /var/log/cluster/corosync.log > debug: off > timestamp: on > logger_subsys { > subsys: AMF > debug: off > } > } > > amf { > mode: disabled > } > > Just need to make sure that if I lose either of the interfaces they VIP's fail > over. > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- esta es mi vida e me la vivo hasta que dios quiera -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: From df.cluster at gmail.com Mon Feb 6 12:25:52 2012 From: df.cluster at gmail.com (Dan Frincu) Date: Mon, 6 Feb 2012 14:25:52 +0200 Subject: [Linux-cluster] corosync issue with two interface directives In-Reply-To: References: Message-ID: Hi, On Sun, Feb 5, 2012 at 9:35 PM, Ben Shepherd wrote: > Hi, > > OK so how does that affect the fail over. Each f the networks is important > if we lose ring 0 or ring 1 we need to fail over. > > If I have the config stated below: > # Please read the corosync.conf.5 manual page > compatibility: whitetank > > totem { > version: 2 > secauth: on > threads: 0 If secauth: on -> you need to set threads > 0 (normally threads == number_of_cpus_on_the_system) > interface { > ringnumber: 0 > bindnetaddr: 10.251.96.160 > #broadcast: yes > mcastaddr: 239.254.6.8 > ? ? ? ? ? ? ? ? mcastport: 5405 > ttl: 1 > } > interface { > ringnumber: 1 > bindnetaddr:?10.122.147.192 > #broadcast: yes > mcastaddr: 239.254.6.9 > ? ? ? ? ? ? ? ? mcastport: 5405 > ttl: 1 > } > } > > logging { > fileline: off > to_stderr: no > to_logfile: yes > to_syslog: yes > logfile: /var/log/cluster/corosync.log > debug: off > timestamp: on > logger_subsys { > subsys: AMF > debug: off > } > } > > amf { > mode: disabled > } > > And I pull out the cable for the interface on ring 1 will it fail over ? Or > will it use ring 1 only if ring 0 fails. You also need to add rrp_mode: active/passive for redundancy when using more than one ringnumber. When enabled, if you pull the cable on one of the network links, the other remaining network will continue to work, therefore the cluster manager will still have a path to communicate over -> no failover (from a messaging and membership standpoint). By default (your case) rrp_mode is set to none (no redundancy - you have redundant network communication, but it's not being used) so when you pull either cable you might: a) failover b) not failover a -> might happen if you pull the cable corosync uses for communication at the moment (that may be the one set in ringnumber 0 - not 100% sure) b -> might happen if you pull the cable corosync doesn't use for communication (-EDONTKNOW) > > I read the documentation but it is less than clear :-) > > I would just do it and pull the cable out but sadly it requires me to fly to > Vienna to do it seems a little extravagant. 
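For reference, a minimal totem sketch that pulls the points above together (threads raised above 0 while secauth is on, two interface blocks with distinct ringnumbers, and an explicit rrp_mode) might look like the following; the bind and multicast addresses are the ones already posted in this thread, while the rrp_mode and threads values are only illustrative assumptions, so check corosync.conf(5) before relying on them:

totem {
        version: 2
        secauth: on
        # assumption: a 2-CPU node; match this to the actual CPU count
        threads: 2
        # without rrp_mode set to active or passive the second ring is not used
        rrp_mode: passive
        interface {
                ringnumber: 0
                bindnetaddr: 10.251.96.160
                mcastaddr: 239.254.6.8
                mcastport: 5405
                ttl: 1
        }
        interface {
                ringnumber: 1
                bindnetaddr: 10.122.147.192
                mcastaddr: 239.254.6.9
                mcastport: 5405
                ttl: 1
        }
}

With a configuration along these lines, pulling one cable leaves the cluster membership intact over the surviving ring; it does not by itself move the VIPs, which is a resource-manager question picked up later in the thread.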
And if you think about doing something remotely to the network cards, have a look at http://corosync.org/doku.php?id=faq:ifdown before you do. > > From: emmanuel segura > Reply-To: linux clustering > Date: Sun, 5 Feb 2012 20:14:14 +0100 > To: linux clustering > Subject: Re: [Linux-cluster] corosync issue with two interface directives > > I think the ringnumber must be diferent for every network > > 2012/2/5 Ben Shepherd >> >> Currently have a 2 node cluster. We configured HA on 1 network to take >> inbound traffic with multicast in corosync ?and 1 VIP. >> >> This works fine (most of the time sometimes if you take the cable out both >> interfaces end up with the VIP but that is another story) The above happens because you have (see below) >> Customer now has another network on which they want to take traffic. I >> have assigned the VIP on >> >> node lxnivrr45.at.inside >> node lxnivrr46.at.inside >> primitive failover-ip1 ocf:heartbeat:IPaddr >> params ip=" 10.251.96.185" >> op monitor interval="10s" >> ?primitive failover-ip2 ocf:heartbeat:IPaddr >> params ip="10.2.150.201" >> op monitor interval="10s" >> colocation failover-ips inf: failover-ip1 failover-ip2 >> property $id="cib-bootstrap-options" >> dc-version="1.1.5-5.el6-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" >> cluster-infrastructure="openais" >> expected-quorum-votes="2" >> no-quorum-policy="ignore" >> stonith-enabled="false" ^^ this set to false (see above) Regards, Dan >> rsc_defaults $id="rsc-options" >> resource-stickiness="100" >> >> Current Corosync configuration is: >> >> # Please read the corosync.conf.5 manual page >> compatibility: whitetank >> >> totem { >> version: 2 >> secauth: on >> threads: 0 >> interface { >> ringnumber: 0 >> bindnetaddr: 10.251.96.160 >> #broadcast: yes >> mcastaddr: 239.254.6.8 >> ? ? ? ? ? ? ? ? mcastport: 5405 >> ttl: 1 >> } >> } >> >> logging { >> fileline: off >> to_stderr: no >> to_logfile: yes >> to_syslog: yes >> logfile: /var/log/cluster/corosync.log >> debug: off >> timestamp: on >> logger_subsys { >> subsys: AMF >> debug: off >> } >> } >> >> amf { >> mode: disabled >> } >> >> I am a little confused about using. Should I add the Multicast address for >> the 2nd Network as ring 1 or can I have 2 Interfaces on ring 0 on different >> networks ? >> >> Giving me: >> >> # Please read the corosync.conf.5 manual page >> compatibility: whitetank >> >> totem { >> version: 2 >> secauth: on >> threads: 0 >> interface { >> ringnumber: 0 >> bindnetaddr: 10.251.96.160 >> #broadcast: yes >> mcastaddr: 239.254.6.8 >> ? ? ? ? ? ? ? ? mcastport: 5405 >> ttl: 1 >> } >> interface { >> ringnumber: 0 >> bindnetaddr:?10.122.147.192 >> #broadcast: yes >> mcastaddr: 239.254.6.9 >> ? ? ? ? ? ? ? ? mcastport: 5405 >> ttl: 1 >> } >> } >> >> logging { >> fileline: off >> to_stderr: no >> to_logfile: yes >> to_syslog: yes >> logfile: /var/log/cluster/corosync.log >> debug: off >> timestamp: on >> logger_subsys { >> subsys: AMF >> debug: off >> } >> } >> >> amf { >> mode: disabled >> } >> >> Just need to make sure that if I lose either of the interfaces they VIP's >> fail over. 
>> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > -- > esta es mi vida e me la vivo hasta que dios quiera > -- Linux-cluster mailing list Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- Dan Frincu CCNA, RHCE From bshepherd at voxeo.com Mon Feb 6 13:22:01 2012 From: bshepherd at voxeo.com (Ben Shepherd) Date: Mon, 06 Feb 2012 13:22:01 +0000 Subject: [Linux-cluster] corosync issue with two interface directives In-Reply-To: Message-ID: Now I am even more confused. How do I configure this thing so that it fails over if either of the networks I lost. Can I setup 2 multicast address on separate networks in a non-redundant way. On 06/02/2012 12:25, "Dan Frincu" wrote: > > > >OK so how does that affect the fail over. Each f the networks is important From df.cluster at gmail.com Mon Feb 6 14:29:55 2012 From: df.cluster at gmail.com (Dan Frincu) Date: Mon, 6 Feb 2012 16:29:55 +0200 Subject: [Linux-cluster] corosync issue with two interface directives In-Reply-To: References: Message-ID: Hi, On Mon, Feb 6, 2012 at 3:22 PM, Ben Shepherd wrote: > Now I am even more confused. How do I configure this thing so that it > fails over if either of the networks I lost. > Don't really see the reasoning behind this, normally you'd want the service to be available if any of the paths is still reachable. To prevent what I would call undefined behavior, you would be better off with just one ring if you don't want redundancy. Otherwise look into setting up ping location restrictions (but this is done one layer up, in the resource manager, not in the communications layer). See http://www.clusterlabs.org/wiki/Pingd_with_resources_on_different_networks Regards, Dan > Can I setup 2 multicast address on separate networks in a non-redundant > way. Now given the statement made here, I have to ask, if they're not redundant, why use two multicast groups? > > > > On 06/02/2012 12:25, "Dan Frincu" wrote: > >> >> >> >>OK so how does that affect the fail over. 
Each f the networks is important > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- Dan Frincu CCNA, RHCE From kortux at gmail.com Mon Feb 6 15:06:51 2012 From: kortux at gmail.com (Miguel Angel Guerrero) Date: Mon, 6 Feb 2012 10:06:51 -0500 Subject: [Linux-cluster] Fencing race again in centos6.2 with rhcs_fence In-Reply-To: <4F2C9A62.4090305@alteeve.com> References: <4F2C998D.3000602@alteeve.com> <4F2C9A62.4090305@alteeve.com> Message-ID: Hi digimer The command /usr/sbin/cman_tool kill -f wsguardian1 shows me: /usr/sbin/cman_tool: unknown option: f so I tried /usr/sbin/cman_tool kill -n wsguardian1 instead. In this case node2 (wsguardian2) halted completely after I executed the command "/usr/sbin/cman_tool kill -n wsguardian1" from wsguardian2. The following pastebin has the logs from both nodes: http://pastebin.com/jHKVW9kF On Fri, Feb 3, 2012 at 9:39 PM, Digimer wrote: > On 02/03/2012 09:35 PM, Digimer wrote: >> On 02/03/2012 06:15 PM, Miguel Angel Guerrero wrote: >>> Hi all >>> >>> I try to setup my cluster configuration with centos 6.2 and drbd 8.4, >>> but again i have a fencing race situation, in this case the problem is >>> bigger, because if a hangup any node both nodes halt, if i disconnect >>> a drbd network cable both nodes halt, i try change the outdata-peer >>> handler with /sbin/obliterate-peer.sh with the "sleep 10" in one node >>> but the result is the same, the next pastebin have the log output of >>> both nodes (with rhcs_fence in debug mode) and my config files >>> http://pastebin.com/Kr4FPScs >>> >>> thanks for the help >> >> Woops, you did have the config there, I am blind. Let me look at it and >> reply again in a few minutes. >> > > With the cluster up and running, can you run this please and tell me > what the output is? > > (From the other node) > /usr/sbin/cman_tool kill -f wsguardian1 > > -- > Digimer > E-Mail: 
digimer at alteeve.com > Papers and Projects: https://alteeve.com > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- Atte: ------------------------------------ Miguel Angel Guerrero Usuario GNU/Linux Registrado #353531 ------------------------------------ From emi2fast at gmail.com Mon Feb 6 15:18:06 2012 From: emi2fast at gmail.com (emmanuel segura) Date: Mon, 6 Feb 2012 16:18:06 +0100 Subject: [Linux-cluster] Fencing race again in centos6.2 with rhcs_fence In-Reply-To: References: <4F2C998D.3000602@alteeve.com> <4F2C9A62.4090305@alteeve.com> Message-ID: Miguel, apologies for writing to you in Spanish, but as I have told you many times, the problem is in Red Hat Cluster, not in DRBD: you have to use the delay parameter on the fence device of one of the two servers. 2012/2/6 Miguel Angel Guerrero > Hi digimer > > The command /usr/sbin/cman_tool kill -f wsguardian1 show me: > /usr/sbin/cman_tool: unknown option: f > I try with /usr/sbin/cman_tool kill -n wsguardian1 > > In this case the node2 (wsguardian2) halt complety after i execute the > command "/usr/sbin/cman_tool kill -n wsguardian1" from wsguardian2 > > The next paste bin have the log from both nodes > > http://pastebin.com/jHKVW9kF > > On Fri, Feb 3, 2012 at 9:39 PM, Digimer wrote: > > On 02/03/2012 09:35 PM, Digimer wrote: > >> On 02/03/2012 06:15 PM, Miguel Angel Guerrero wrote: > >>> Hi all > >>> > >>> I try to setup my cluster configuration with centos 6.2 and drbd 8.4, > >>> but again i have a fencing race situation, in this case the problem is > >>> bigger, because if a hangup any node both nodes halt, if i disconnect > >>> a drbd network cable both nodes halt, i try change the outdata-peer > >>> handler with /sbin/obliterate-peer.sh with the "sleep 10" in one node > >>> but the result is the same, the next pastebin have the log output of > >>> both nodes (with rhcs_fence in debug mode) and my config files > >>> http://pastebin.com/Kr4FPScs > >>> > >>> thanks for the help > >> > >> Woops, you did have the config there, I am blind. Let me look at it and > >> reply again in a few minutes. > >> > > > > With the cluster up and running, can you run this please and tell me > > what the output is? > > > > (From the other node) > > /usr/sbin/cman_tool kill -f wsguardian1 > > > > -- > > Digimer > > E-Mail: digimer at alteeve.com > > Papers and Projects: https://alteeve.com > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > Atte: > ------------------------------------ > Miguel Angel Guerrero > Usuario GNU/Linux Registrado #353531 > ------------------------------------ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- esta es mi vida e me la vivo hasta que dios quiera -------------- next part -------------- An HTML attachment was scrubbed... URL: From kortux at gmail.com Mon Feb 6 15:34:29 2012 From: kortux at gmail.com (Miguel Angel Guerrero) Date: Mon, 6 Feb 2012 10:34:29 -0500 Subject: [Linux-cluster] Fencing race again in centos6.2 with rhcs_fence In-Reply-To: References: <4F2C998D.3000602@alteeve.com> <4F2C9A62.4090305@alteeve.com> Message-ID: I forgot to say that node1 (wsguardian1) stays alive the whole time.
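As a rough illustration of the delay suggestion above, cluster.conf can give one node's fence device a head start so that a simultaneous fence race has a predictable winner; the node ids, method and device names, the agent behind them and the 15-second value below are placeholders, not Miguel's actual configuration:

<clusternode name="wsguardian1" nodeid="1">
        <fence>
                <method name="power">
                        <!-- fencing wsguardian1 is delayed 15s, so it survives a mutual fence race -->
                        <device name="fence_node1" delay="15"/>
                </method>
        </fence>
</clusternode>
<clusternode name="wsguardian2" nodeid="2">
        <fence>
                <method name="power">
                        <!-- no delay: wsguardian2 is fenced first when both nodes shoot at once -->
                        <device name="fence_node2"/>
                </method>
        </fence>
</clusternode>

Only one of the two nodes should carry the delay; which node is allowed to win the race is purely a policy choice.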
On Mon, Feb 6, 2012 at 10:06 AM, Miguel Angel Guerrero wrote: > Hi digimer > > The command /usr/sbin/cman_tool kill -f wsguardian1 show me: > /usr/sbin/cman_tool: unknown option: f > I try with /usr/sbin/cman_tool kill -n wsguardian1 > > In this case the node2 (wsguardian2) halt complety after i execute the > command "/usr/sbin/cman_tool kill -n wsguardian1" from wsguardian2 > > The next paste bin have the log from both nodes > > http://pastebin.com/jHKVW9kF > > On Fri, Feb 3, 2012 at 9:39 PM, Digimer wrote: >> On 02/03/2012 09:35 PM, Digimer wrote: >>> On 02/03/2012 06:15 PM, Miguel Angel Guerrero wrote: >>>> Hi all >>>> >>>> I try to setup my cluster configuration with centos 6.2 and drbd 8.4, >>>> but again i have a fencing race situation, in this case the problem is >>>> bigger, because if a hangup any node both nodes halt, if i disconnect >>>> a drbd network cable both nodes halt, i try change the outdata-peer >>>> handler with /sbin/obliterate-peer.sh with the "sleep 10" in one node >>>> but the result is the same, the next pastebin have the log output of >>>> both nodes (with rhcs_fence in debug mode) and my config files >>>> http://pastebin.com/Kr4FPScs >>>> >>>> thanks for the help >>> >>> Woops, you did have the config there, I am blind. Let me look at it and >>> reply again in a few minutes. >>> >> >> With the cluster up and running, can you run this please and tell me >> what the output is? >> >> (From the other node) >> /usr/sbin/cman_tool kill -f wsguardian1 >> >> -- >> Digimer >> E-Mail: ? ? ? ? ? ? ?digimer at alteeve.com >> Papers and Projects: https://alteeve.com >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > Atte: > ------------------------------------ > Miguel Angel Guerrero > Usuario GNU/Linux Registrado #353531 > ------------------------------------ -- Atte: ------------------------------------ Miguel Angel Guerrero Usuario GNU/Linux Registrado #353531 ------------------------------------ From linux at alteeve.com Mon Feb 6 15:37:38 2012 From: linux at alteeve.com (Digimer) Date: Mon, 06 Feb 2012 10:37:38 -0500 Subject: [Linux-cluster] Fencing race again in centos6.2 with rhcs_fence In-Reply-To: References: <4F2C998D.3000602@alteeve.com> <4F2C9A62.4090305@alteeve.com> Message-ID: <4F2FF3C2.9080004@alteeve.com> On 02/06/2012 10:06 AM, Miguel Angel Guerrero wrote: > Hi digimer > > The command /usr/sbin/cman_tool kill -f wsguardian1 show me: > /usr/sbin/cman_tool: unknown option: f > I try with /usr/sbin/cman_tool kill -n wsguardian1 > > In this case the node2 (wsguardian2) halt complety after i execute the > command "/usr/sbin/cman_tool kill -n wsguardian1" from wsguardian2 > > The next paste bin have the log from both nodes > > http://pastebin.com/jHKVW9kF Thanks for this, Miguel. I am leaving for a business trip tomorrow and won't be back for a week, I am afraid. I am surprised that the -f switch is causing a problem as I tested the fence agent on RHEL 6.2 and CentOS 6.2. Are you running the cluster from the stock repositories, or did you port the Fedora RPMs (or install from source)? I will certainly test/fix this as soon as I return. In the meantime, if you feel comfortable with perl, it should be pretty easy to patch rhcs_fence yourself (and I'd happily apply a patch). 
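Before patching anything, it is probably worth confirming what the packaged cman_tool on these nodes actually documents; a quick check that assumes nothing beyond the stock CentOS 6.2 packages:

# report the installed cman build (3.0.12.1 in this thread)
cman_tool version
# see which switches the shipped man page lists for the kill subcommand;
# on builds without -f, "cman_tool kill -n <node>" is the documented form
man cman_tool | grep -B 1 -A 4 'kill'

If -f really is absent from this build, a patch to rhcs_fence would presumably just swap the flag it passes.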
Cheers -- Digimer E-Mail: digimer at alteeve.com Papers and Projects: https://alteeve.com From kortux at gmail.com Mon Feb 6 15:41:18 2012 From: kortux at gmail.com (Miguel Angel Guerrero) Date: Mon, 6 Feb 2012 10:41:18 -0500 Subject: [Linux-cluster] Fencing race again in centos6.2 with rhcs_fence In-Reply-To: <4F2FF3C2.9080004@alteeve.com> References: <4F2C998D.3000602@alteeve.com> <4F2C9A62.4090305@alteeve.com> <4F2FF3C2.9080004@alteeve.com> Message-ID: Hi digimer On Mon, Feb 6, 2012 at 10:37 AM, Digimer wrote: > On 02/06/2012 10:06 AM, Miguel Angel Guerrero wrote: >> Hi digimer >> >> The command /usr/sbin/cman_tool kill -f wsguardian1 show me: >> /usr/sbin/cman_tool: unknown option: f >> I try with /usr/sbin/cman_tool kill -n wsguardian1 >> >> In this case the node2 (wsguardian2) halt complety after i execute the >> command "/usr/sbin/cman_tool kill -n wsguardian1" from wsguardian2 >> >> The next paste bin have the log from both nodes >> >> http://pastebin.com/jHKVW9kF > > Thanks for this, Miguel. > > ?I am leaving for a business trip tomorrow and won't be back for a > week, I am afraid. I am surprised that the -f switch is causing a > problem as I tested the fence agent on RHEL 6.2 and CentOS 6.2. Are you > running the cluster from the stock repositories, or did you port the > Fedora RPMs (or install from source)? > > ?I will certainly test/fix this as soon as I return. In the meantime, > if you feel comfortable with perl, it should be pretty easy to patch > rhcs_fence yourself (and I'd happily apply a patch). > > Cheers > > -- > Digimer > E-Mail: ? ? ? ? ? ? ?digimer at alteeve.com > Papers and Projects: https://alteeve.com -- Atte: ------------------------------------ Miguel Angel Guerrero Usuario GNU/Linux Registrado #353531 ------------------------------------ From kortux at gmail.com Mon Feb 6 15:42:17 2012 From: kortux at gmail.com (Miguel Angel Guerrero) Date: Mon, 6 Feb 2012 10:42:17 -0500 Subject: [Linux-cluster] Fencing race again in centos6.2 with rhcs_fence In-Reply-To: References: <4F2C998D.3000602@alteeve.com> <4F2C9A62.4090305@alteeve.com> <4F2FF3C2.9080004@alteeve.com> Message-ID: I am using the official repository centos 6.2 version packages cman_tool -V cman_tool 3.0.12.1 (built Dec 7 2011 21:28:25) Copyright (C) Red Hat, Inc. 2004-2010 All rights reserved. On Mon, Feb 6, 2012 at 10:41 AM, Miguel Angel Guerrero wrote: > Hi digimer > > > On Mon, Feb 6, 2012 at 10:37 AM, Digimer wrote: >> On 02/06/2012 10:06 AM, Miguel Angel Guerrero wrote: >>> Hi digimer >>> >>> The command /usr/sbin/cman_tool kill -f wsguardian1 show me: >>> /usr/sbin/cman_tool: unknown option: f >>> I try with /usr/sbin/cman_tool kill -n wsguardian1 >>> >>> In this case the node2 (wsguardian2) halt complety after i execute the >>> command "/usr/sbin/cman_tool kill -n wsguardian1" from wsguardian2 >>> >>> The next paste bin have the log from both nodes >>> >>> http://pastebin.com/jHKVW9kF >> >> Thanks for this, Miguel. >> >> ?I am leaving for a business trip tomorrow and won't be back for a >> week, I am afraid. I am surprised that the -f switch is causing a >> problem as I tested the fence agent on RHEL 6.2 and CentOS 6.2. Are you >> running the cluster from the stock repositories, or did you port the >> Fedora RPMs (or install from source)? >> >> ?I will certainly test/fix this as soon as I return. In the meantime, >> if you feel comfortable with perl, it should be pretty easy to patch >> rhcs_fence yourself (and I'd happily apply a patch). >> >> Cheers >> >> -- >> Digimer >> E-Mail: ? ? ? ? ? ? 
?digimer at alteeve.com >> Papers and Projects: https://alteeve.com > > > > -- > Atte: > ------------------------------------ > Miguel Angel Guerrero > Usuario GNU/Linux Registrado #353531 > ------------------------------------ -- Atte: ------------------------------------ Miguel Angel Guerrero Usuario GNU/Linux Registrado #353531 ------------------------------------ From linux at alteeve.com Mon Feb 6 15:43:50 2012 From: linux at alteeve.com (Digimer) Date: Mon, 06 Feb 2012 10:43:50 -0500 Subject: [Linux-cluster] Fencing race again in centos6.2 with rhcs_fence In-Reply-To: References: <4F2C998D.3000602@alteeve.com> <4F2C9A62.4090305@alteeve.com> <4F2FF3C2.9080004@alteeve.com> Message-ID: <4F2FF536.3030104@alteeve.com> On 02/06/2012 10:42 AM, Miguel Angel Guerrero wrote: > I am using the official repository centos 6.2 version packages > > cman_tool -V > cman_tool 3.0.12.1 (built Dec 7 2011 21:28:25) > Copyright (C) Red Hat, Inc. 2004-2010 All rights reserved. That is quite odd... I am curious why it worked when I wrote the script... :) Still the same though, if you can/want to patch it, I'd be happy to have the patch. Otherwise, I will return to it as soon as I return from my trip. Cheers -- Digimer E-Mail: digimer at alteeve.com Papers and Projects: https://alteeve.com From bshepherd at voxeo.com Mon Feb 6 16:34:23 2012 From: bshepherd at voxeo.com (Ben Shepherd) Date: Mon, 06 Feb 2012 16:34:23 +0000 Subject: [Linux-cluster] corosync issue with two interface directives In-Reply-To: Message-ID: Basically traffic of both types comes in from BOTH networks. We send the traffic to the VIP's on each network. These VIPS will be held by the Active server. Traffic will go to Server 1 on both Network1 and Network2. If we lose either the interface to Network1 or the interface to Network2 we need to fail over the VIP's to the other server. We cannot keep the VIP on the active server if 1 of the networks is not working as an entire service will go down. Yes I would prefer a single ring with 2 interfaces...that fails over if either interfaces reports a problem. Can I do that ? On 06/02/2012 14:29, "Dan Frincu" wrote: >Hi, > >On Mon, Feb 6, 2012 at 3:22 PM, Ben Shepherd wrote: >> Now I am even more confused. How do I configure this thing so that it >> fails over if either of the networks I lost. >> > >Don't really see the reasoning behind this, normally you'd want the >service to be available if any of the paths is still reachable. > >To prevent what I would call undefined behavior, you would be better >off with just one ring if you don't want redundancy. > >Otherwise look into setting up ping location restrictions (but this is >done one layer up, in the resource manager, not in the communications >layer). See >http://www.clusterlabs.org/wiki/Pingd_with_resources_on_different_networks > >Regards, >Dan > >> Can I setup 2 multicast address on separate networks in a non-redundant >> way. > >Now given the statement made here, I have to ask, if they're not >redundant, why use two multicast groups? > >> >> >> >> On 06/02/2012 12:25, "Dan Frincu" wrote: >> >>> >>> >>> >>>OK so how does that affect the fail over. 
Each f the networks is >>>important >> >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > > > >-- >Dan Frincu >CCNA, RHCE > >-- >Linux-cluster mailing list >Linux-cluster at redhat.com >https://www.redhat.com/mailman/listinfo/linux-cluster From florian at hastexo.com Tue Feb 7 16:27:36 2012 From: florian at hastexo.com (Florian Haas) Date: Tue, 7 Feb 2012 17:27:36 +0100 Subject: [Linux-cluster] corosync issue with two interface directives In-Reply-To: References: Message-ID: Ben, I'm afraid you're completely missing the distinction between internal cluster communications (the "interface" definitions in corosync.conf), and the clients' communications with networked cluster resources. On Mon, Feb 6, 2012 at 5:34 PM, Ben Shepherd wrote: > Basically traffic of both types comes in from BOTH networks. > We send the traffic to the VIP's on each network. > These VIPS will be held by the Active server. > > Traffic will go to Server 1 on both Network1 and Network2. When you say Network1 and Network2, does that mean two network interfaces connected to two distinct subnets? > If we lose either the interface to Network1 or the interface to Network2 > we need to fail over the VIP's to the other server. That's what connectivity monitoring is for, which is a cluster service. Corosync doesn't concern itself with that; Pacemaker will manage it. The ocf:pacemaker:ping resource agent was designed for that purpose. > We cannot keep the VIP on the active server if 1 of the networks is not > working as an entire service will go down. > > Yes I would prefer a single ring with 2 interfaces...that fails over if > either interfaces reports a problem. No you don't; you always want your cluster to communicate over as many rings as possible. You want your cluster resource manager to fail over if there is a problem on the upstream network. I hope this helps. Try to think of cluster communications and cluster resource management as two distinct layers in the stack. Cheers, Florian -- Need help with High Availability? http://www.hastexo.com/now From lhh at redhat.com Tue Feb 7 23:13:49 2012 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 07 Feb 2012 18:13:49 -0500 Subject: [Linux-cluster] fence-virt 0.3.0 Message-ID: <4F31B02D.2040904@redhat.com> Hi, I've cut v0.3.0 of fence-virt. - Serial listener now can handle multiple domain starts/stops - Libvirt-qpid replaced with libvirt-qmf - QMFv2 management - A pacemaker backend is now available - Systemd integration - Deprecated cman/checkpoint plugin. - Easier to deploy on Fedora systems. Contributors: Zane Bitter - QMFv2 backend & Misc Fixes Kazunori INOUE - Serial listener enhancements for multiple machine start/stops Pacemaker backend Bleeding edge packages: http://koji.fedoraproject.org/koji/taskinfo?taskID=3770449 Enjoy. -- Lon From lhh at redhat.com Tue Feb 7 23:21:49 2012 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 07 Feb 2012 18:21:49 -0500 Subject: [Linux-cluster] [Cluster-devel] fence-virt 0.3.0 In-Reply-To: <4F31B02D.2040904@redhat.com> References: <4F31B02D.2040904@redhat.com> Message-ID: <4F31B20D.9050601@redhat.com> On 02/07/2012 06:13 PM, Lon Hohberger wrote: > > Bleeding edge packages: > > http://koji.fedoraproject.org/koji/taskinfo?taskID=3770449 > Clearly, I need more coffee. Source tarball here: https://sourceforge.net/projects/fence-virt/files/fence-virt-0.3.0.tar.gz/download My apologies for not posting this in the previous email. Have a wonderful day. 
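Coming back to the two-network VIP question earlier in this digest: a minimal sketch of the connectivity monitoring Dan and Florian point to, in the same crm shell syntax as Ben's existing primitives. The gateway addresses in host_list are invented stand-ins for something pingable on each customer network, and the attribute names, multiplier, interval and scores are arbitrary examples rather than a tested configuration:

primitive ping-net1 ocf:pacemaker:ping \
        params name="pingd_net1" host_list="10.251.96.161" multiplier="1000" \
        op monitor interval="15s"
primitive ping-net2 ocf:pacemaker:ping \
        params name="pingd_net2" host_list="10.122.147.193" multiplier="1000" \
        op monitor interval="15s"
clone cl-ping-net1 ping-net1
clone cl-ping-net2 ping-net2
# keep failover-ip1 (and, through the existing colocation, failover-ip2)
# only on a node that can still reach both upstream networks
location vip-needs-net1 failover-ip1 \
        rule -inf: not_defined pingd_net1 or pingd_net1 lte 0
location vip-needs-net2 failover-ip1 \
        rule -inf: not_defined pingd_net2 or pingd_net2 lte 0

Corosync's rings then remain purely a membership concern, while Pacemaker moves the VIPs whenever either upstream network becomes unreachable from the active node, which is the behaviour Ben is after.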
-- Lon From lhh at redhat.com Wed Feb 8 14:36:26 2012 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 08 Feb 2012 09:36:26 -0500 Subject: [Linux-cluster] $OCF_ERR_CONFIGURED - recovers service on another cluster node In-Reply-To: References: Message-ID: <4F32886A.4070808@redhat.com> On 01/27/2012 04:03 AM, Parvez Shaikh wrote: > Hi guys, > > I am using Red Hat Cluster Suite which comes with RHEL 5.5 - > > cman_tool version > >>6.2.0 config xxx > > Now I have a script resource in which I return $OCF_ERR_CONFIGURED; in > case of a Fatal irrecoverable error, hoping that my service would not > start on another cluster node. > > But I see that cluster, relocates it to another cluster node and > attempts to start it. > > I referred error code documentation from > http://www.linux-ha.org/doc/dev-guides/_return_codes.html > > Is there any return code which makes RHCS to give up on recovering service? > The resource must fail during the 'stop' phase if you want rgmanager to not try to recover it. There is no 'start' phase error condition that tells rgmanager to give up. The history: If you don't have a program installed or configured on host1 but try to enable a service there, it will obviously fail to start (rightfully so). However, host2 may have the configuration. So, rgmanager will then stop the service and try to start it on host2. In fact, it will systematically try every host in the cluster until: - the service starts successfully - no more hosts are available (e.g. restricted failover domain, exclusive services, or simply all hosts were tried). At this point, the service is placed in the 'stopped' state in the hopes that the next host to come online will be able to start the service - a failure during 'stop' occurs. Most errors during the stop phase will trigger an abortion of the enable request (except 'OCF_NOT_INSTALLED' when a