From carlopmart at gmail.com Fri Mar 2 07:55:10 2012 From: carlopmart at gmail.com (C. L. Martinez) Date: Fri, 2 Mar 2012 08:55:10 +0100 Subject: [Linux-cluster] Questionas about fence_vmware_soap Message-ID: Hi all, I have some doubts about using fence_vmware_soap under vSphere Infrastructure (4.1). a) Can I use this fence device under an ESXi 4.1 standalone server without using a vCenter server? b) To use fence_vmware_soap with a vCenter server, what privileges does the vCenter user need to fence cluster nodes? Are start, stop and restart enough, or do I need to configure more? Thanks. From christian.masopust at siemens.com Fri Mar 2 09:09:02 2012 From: christian.masopust at siemens.com (Masopust, Christian) Date: Fri, 2 Mar 2012 10:09:02 +0100 Subject: [Linux-cluster] Questionas about fence_vmware_soap In-Reply-To: References: Message-ID: Hi, I don't know fence_vmware_soap, but for ESXi 4.1 I've written a "fence_esxi" (based on fence_apc) which connects by ssh and simply powers the VM on/off by means of "vim-cmd". Please send me a private mail if you would like to have it :) br, christian > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On behalf of C. > L. Martinez > Sent: Friday, 02 March 2012 08:55 > To: linux-cluster at redhat.com > Subject: [Linux-cluster] Questionas about fence_vmware_soap > > Hi all, > > I have some doubts about using fence_vmware_soap under vSphere > Infrastructure (4.1). > > a) Can I use this fence device under an ESXi 4.1 standalone server > without using a vCenter server? > > b) To use fence_vmware_soap with a vCenter server, what privileges does > the vCenter user need to fence cluster nodes? Are start, stop and restart > enough, or do I need to configure more? > > Thanks. > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From scooter at cgl.ucsf.edu Fri Mar 2 14:05:35 2012 From: scooter at cgl.ucsf.edu (Scooter Morris) Date: Fri, 02 Mar 2012 06:05:35 -0800 Subject: [Linux-cluster] GFS2 bug in appending to a file? Message-ID: <4F50D3AF.4090706@cgl.ucsf.edu> Hi all, We're seeing a problem with file append using cat: "cat >> file" on a 4-node cluster with gfs2 where the file's mtime doesn't get updated. This looks exactly the same as in Bug 496716, except that bug was supposed to have been fixed in RHEL 5.5 and we're running RHEL 6.2. Did the same problem creep back in, or is this a new manifestation of the problem? We are mounting noatime, noquota, nodiratime, noacl if that helps. Thanks in advance! -- scooter -------------- next part -------------- An HTML attachment was scrubbed... URL: From swhiteho at redhat.com Fri Mar 2 14:16:50 2012 From: swhiteho at redhat.com (Steven Whitehouse) Date: Fri, 02 Mar 2012 14:16:50 +0000 Subject: [Linux-cluster] GFS2 bug in appending to a file? In-Reply-To: <4F50D3AF.4090706@cgl.ucsf.edu> References: <4F50D3AF.4090706@cgl.ucsf.edu> Message-ID: <1330697810.2745.3.camel@menhir> Hi, On Fri, 2012-03-02 at 06:05 -0800, Scooter Morris wrote: > Hi all, > We're seeing a problem with file append using cat: "cat >> file" > on a 4-node cluster with gfs2 where the file's mtime doesn't get > updated. This looks exactly the same as in Bug 496716, except that > bug was supposed to have been fixed in RHEL 5.5 and we're running RHEL > 6.2. Did the same problem creep back in, or is this a new > manifestation of the problem? We are mounting noatime, noquota, > nodiratime, noacl if that helps. > > Thanks in advance! > > -- scooter Well, the patch in question is definitely in RHEL6, so this is probably something different. Can you open a bug for it? Steve.
From thevision at pobox.com Fri Mar 2 16:27:37 2012 From: thevision at pobox.com (Greg Mortensen) Date: Fri, 2 Mar 2012 11:27:37 -0500 Subject: [Linux-cluster] Throughput drops with VMware GFS2 cluster when using fence_scsi In-Reply-To: References: <1330017228.2710.51.camel@menhir> Message-ID: This looks like it was caused by the device mapping defaulting to a round-robin path selection policy. While I couldn't find any mention of it in the Red Hat cluster documentation, I did see some MSCS postings[1] that said: Round Robin can interfere with applications that use SCSI reservations for sharing LUNs among VMs and thus is not supported with the use of LUNs with MSCS. So I changed the policy to "Most Recently Used" and was able to get sustained writes of 40MB/s and sustained reads of 100MB/s down one NIC. Regards, Greg [1] http://en.community.dell.com/techcenter/storage/w/wiki/2671.aspx From orquidea.peramor at gmail.com Fri Mar 2 19:22:32 2012 From: orquidea.peramor at gmail.com (Orquidea Salt mas) Date: Fri, 2 Mar 2012 20:22:32 +0100 Subject: [Linux-cluster] Rebélate by self-management, first project of free software by which we bet all / Rebélate por la autogestión, primer proyecto de software libre por el que apostamos todas Message-ID: English: Many already we have contributed to the first project of free software dedicated to self-management in this campaign of collective financing, it collaborates and it spreads!/ Beginning campaign collective financing http://www.goteo.org/project/rebelaos-publicacion-por-la-autogestion?lang=en Login to enter with user of social networks and for would register in Goteo : http://www.goteo.org/user/login?lang=en Rebelaos! Publication by self-management A massive publication that floods the public transport, the work centers, the parks, the consumption centers, by means of distribution of 500,000 gratuitous units, acting simultaneously in all sides and nowhere. We announce the main tool of a vestibule Web for the management of self-sustaining resources by means of Drupal, in addition in the publication there will be an article dedicated to free software, hardware, It is being prepared in English, the machinery You can see more details in the index of the publication https://n-1.cc/pg/file/read/1151902/indexresumen-de-los-contenidos-pdf . A computer system that allows us to share resources in all the scopes of our life so that we do not have to generate means different for each subject nor for each territory. A point of contact digitalis to generate projects of life outside Capitalism and to margin of the State. A tool to spread and to impel the social transformation through the resources that will set out in their contents around self-management, the autoorganización, the disobedience and the collective action. In which the capitalist system goes to the collapse, in a while immersed in a deep systemic crisis (ecological, political and economic, but mainly of values), where individual and collective of people they are being lacking of his fundamental rights, is necessary to develop a horizontal collective process where all the human beings we pruned to interact in equality of conditions and freedom.
To interact means to relate to us (as much human as economically), to communicate to us, to cover our basic needs, to generate and to protect communal properties, to know and to provide collective solutions us problematic that our lives interfere. We want abrir a breach within normality in the monotonous life state-capitalist, a day anyone, that finally will not be any day. By means of this publication we try: - To drive a horizontal collective process where all and all we pruned to interact in equality of conditions and freedom. - To create communications network between the people it jeopardize with the change and arranged to act. - To find collective solutions to problematic that our lives interfere - To facilitate the access to resources that make possible self-management. - To participate in the construction of networks of mutual support, generated horizontals, asamblearias and from the base. - To publish all this information in an attractive format stops to facilitate the access to all the society. There are 15 days remaining for the upcoming March 15, the day that will come Rebelaos!, Magazine for the selfmanagement Today, we issue the cover of Rebelaos! (Castilian version) that can be displayed on the following link: https://n-1.cc/pg/file/read/1200503/portada-15-de-marzo-rebelaos The contents of the store owners to us by 15 March. Do you? Do you keep on 15 March? In addition, we have over 200 distribution nodes, distributed throughout the Spanish state. Check the map: https://afinidadrebelde.crowdmap.com/ On the other hand, the funding campaign continues to move and still have 12 days to collect the remaining 6,000 euros. We can all make a bit for all the grains of sand become a great beach on March 15. You can access the co-financing campaign: http://www.goteo.org/project/rebelaos-publicacion-por-la-autogestion Rebel Affinity group www.rebelaos.net ------------------------------------------------------------------------------- Castellano: Muchos ya hemos aportado al primer proyecto de software libre dedicado a la la financiaci?n colectiva, colabora y diffunde !!!!! Inicio campa?a financiaci?n colectiva goteo.org www.goteo.org/project/rebelaos-publicacion-por-la-autogestion Link para registrarse en Goteo y acceder a redes sociales para colaborar en la difus?n http://www.goteo.org/user/login ?Rebelaos! Publicaci?n por la autogesti?n Una publicaci?n masiva que inunde el transporte p?blico, los centros de trabajo, los parques, los centros de consumo, mediante la distribuci?n de 500.000 ejemplares gratuitos, actuando simult?neamente en todos lados y en ninguna parte. Anunciamos la herramienta principal de un portal web para la gesti?n de recursos autogestionados mediante Drupal, adem?s en la publicaci?n habr? un art?culo dedicado al software libre, el hardware, la maquinaria... Puedes ver m?s detalles en el ?ndice de la publicaci?n https://n-1.cc/pg/file/read/1151902/indexresumen-de-los-contenidos-pdf Un sistema inf?rmatico que nos permita compartir recursos en todos los ?mbitos de nuestra vida de forma que no tengamos que generar un medio distinto para cada tema ni para cada territorio. Un punto de encuentro digital para generar proyectos de vida fuera del capitalismo y al margen del Estado. Una herramienta para difundir e impulsar la transformaci?n social a trav?s de los recursos que se propondr?n en sus contenidos en torno a la autogesti?n, la autoorganizaci?n, la desobediencia y la acci?n colectiva. 
En un momento en que el sistema capitalista se dirige al colapso, inmerso en una profunda crisis sist?mica (ecol?gica, pol?tica y econ?mica, pero principalmente de valores), donde individuos y colectivos de personas est?n siendo desprovistos de sus derechos fundamentales, es necesario desarrollar un proceso colectivo horizontal donde todos los seres humanos podamos interactuar en igualdad de condiciones y en libertad. Interactuar significa relacionarnos (tanto humana como econ?micamente), comunicarnos, cubrir nuestras necesidades b?sicas, generar y proteger bienes comunes, conocernos y dar soluciones colectivas a problem?ticas que interfieren nuestras vidas. Queremos abrir una brecha dentro de la normalidad en la mon?tona vida estatal-capitalista, un d?a cualquiera, que finalmente no ser? cualquier d?a. Mediante esta publicaci?n pretendemos: - Impulsar un proceso colectivo horizontal donde todos y todas podamos interactuar en igualdad de condiciones y en libertad. - Crear red de comunicaciones entre las personas comprometidas con el cambio y dispuestas a actuar. - Encontrar soluciones colectivas a problem?ticas que interfieren nuestras vidas. - Facilitar el acceso a recursos que posibiliten la autogesti?n. - Participar en la construcci?n de redes de apoyo mutuo, horizontales, asamblearias y generadas desde la base. - Publicar toda esta informaci?n en un formato atractivo para facilitar el acceso a toda la sociedad. Son 15 los d?as que restan para el pr?ximo 15 de marzo, d?a en el que ver? la luz ?Rebelaos!, publicaci?n por la autogesti?n. Hoy, hacemos p?blica la portada de ?Rebelaos! (versi?n en castellano) que pod?is visualizar en el siguiente enlace: https://n-1.cc/pg/file/read/1200503/portada-15-de-marzo-rebelaos El contenido de los titulares nos los guardamos para el 15 de marzo. ?Y t?? ?Te guardas el 15 de marzo? Adem?s, ya hemos superado los 200 nodos de distribuci?n, repartidos por todo el estado espa?ol. Ver el mapa: https://afinidadrebelde.crowdmap.com/ Por otro lado, la campa?a de financiaci?n contin?a avanzando y todav?a quedan 12 d?as para reunir los 6.000 euros que restan. Todas podemos aportar un poco para que todos los granitos de arena se conviertan en una gran playa el 15 de marzo. Pod?is acceder a la campa?a de cofinanciaci?n en: http://www.goteo.org/project/rebelaos-publicacion-por-la-autogestion Colectivo Afinidad Rebelde www.rebelaos.net From carlopmart at gmail.com Fri Mar 2 22:21:53 2012 From: carlopmart at gmail.com (carlopmart) Date: Fri, 02 Mar 2012 23:21:53 +0100 Subject: [Linux-cluster] Questionas about fence_vmware_soap In-Reply-To: References: Message-ID: <4F514801.4090002@gmail.com> On 03/02/2012 10:09 AM, Masopust, Christian wrote: > > Hi, > > I don't know fence_vmware_soap but for ESXi 4.1 I've written > a "fence_esxi" (based on fence_apc) which connects by ssh and > simply powers on/off the VM by means of "vim-cmd". > > Please send me a private mail if you like to have it :) > > br, > christian > Ok, I have tried to configure it (fence_vmware_soap): ...... ... and it seems it works, but .... [root at firstnode cluster]# clustat Cluster Status for VMwareCluster @ Fri Mar 2 22:16:04 2012 Member Status: Quorate Member Name ID Status ------ ---- ---- ------ firstnode.domain.com 1 Online, Local secondnode.domain.com 2 Offline ... 
and when I try to start clvmd: [root at firstnode log]# service clvmd start Starting clvmd: clvmd startup timed out and this is the group_tool output:
[root at firstnode cluster]# group_tool
fence domain
member count  1
victim count  1
victim now    2
master nodeid 1
wait state    fencing
members
dlm lockspaces
name          clvmd
id            0x4104eefa
flags         0x00000015 need_plock,kern_stop,join
change        member 0 joined 0 remove 0 failed 0 seq 0,0
members
new change    member 1 joined 1 remove 0 failed 0 seq 1,1
new status    wait_messages 0 wait_condition 1 fencing
new members   1
..
and I can't configure shared storage with the lvm tools because they don't work ... Where is the problem? Do I need to set up a quorum device? I still have not configured the second node ... -- CL Martinez carlopmart {at} gmail {d0t} com From grimme at atix.de Thu Mar 8 15:04:18 2012 From: grimme at atix.de (Marc Grimme) Date: Thu, 8 Mar 2012 16:04:18 +0100 (CET) Subject: [Linux-cluster] GFS2 not able to remove a file In-Reply-To: Message-ID: <43bc931e-e364-4e55-892b-4ef743a1b2f7@mobilix-20> Hello, I'm having a strange behavior of a GFS2 file system. I have a file I can write to and read from. But I cannot delete the file or move it. I've already done an fsck but with no effect. see below ---------------X8------------- [root at server run.old]# cat messagebus.pid 4000 [root at server run.old]# echo 4001 > messagebus.pid [root at server run.old]# cat messagebus.pid 4001 [root at server run.old]# rm messagebus.pid rm: remove regular file `messagebus.pid'? y rm: cannot remove `messagebus.pid': No such file or directory ---------------X8------------- Information about the system:
---------------X8------------------------
# cat /etc/redhat-release
CentOS release 6.2 (Final)
# modinfo gfs2
filename:    /lib/modules/2.6.32-220.4.2.el6.x86_64/kernel/fs/gfs2/gfs2.ko
license:     GPL
author:      Red Hat, Inc.
description: Global File System
srcversion:  C664A0EEE2337E08DEF7648
depends:     dlm
vermagic:    2.6.32-220.4.2.el6.x86_64 SMP mod_unload modversions
---------------X8------------------------
If you need any more information let me know. Has anybody an idea where this comes from or how I can solve it? Thanks Marc. ______________________________________________________________________________ Marc Grimme ATIX Informationstechnologie und Consulting AG | Einsteinstrasse 10 | 85716 Unterschleissheim | www.atix.de Enterprise Linux einfach online kaufen: www.linux-subscriptions.com Registergericht: Amtsgericht München, Registernummer: HRB 168930, USt.-Id.: DE209485962 | Vorstand: Thomas Merz (Vors.), Marc Grimme, Mark Hlawatschek, Jan R. Bergrath | Vorsitzender des Aufsichtsrats: Dr. Martin Buss From raju.rajsand at gmail.com Thu Mar 8 17:40:57 2012 From: raju.rajsand at gmail.com (Rajagopal Swaminathan) Date: Thu, 8 Mar 2012 23:10:57 +0530 Subject: [Linux-cluster] GFS2 not able to remove a file In-Reply-To: <43bc931e-e364-4e55-892b-4ef743a1b2f7@mobilix-20> References: <43bc931e-e364-4e55-892b-4ef743a1b2f7@mobilix-20> Message-ID: Greetings, On Thu, Mar 8, 2012 at 8:34 PM, Marc Grimme wrote: > Hello, > I'm having a strange behavior of a GFS2 file system. > > I have a file I can write to and read from. But I cannot delete the file or move it. > I've already done an fsck but with no effect. > > see below > ---------------X8------------- > [root at server run.old]# cat messagebus.pid > 4000 > [root at server run.old]# echo 4001 > messagebus.pid > [root at server run.old]# cat messagebus.pid > 4001 > [root at server run.old]# rm messagebus.pid > rm: remove regular file `messagebus.pid'?
y > rm: cannot remove `messagebus.pid': No such file or directory > ---------------X8------------- > What does lsof says? -- Regards, Rajagopal From criley at erad.com Thu Mar 8 18:59:05 2012 From: criley at erad.com (Charles Riley) Date: Thu, 8 Mar 2012 13:59:05 -0500 Subject: [Linux-cluster] Clustered LVM for storage Message-ID: Greetings, I have an aoe device with a lot of storage in it that I would like to share among four rhel 4 servers. Each of the servers will mount it's own storage, no data is shared between them. e.g the servers won't be mounting the same volumes. I could create four different raid groups on the aoe device and present a different one to each server, but that would waste space. What I'd rather do is create one big raid group and use clustered lvm to divvy the space between servers. Is it possible? Would it be enough to run just the clustered lvm daemon, or would I need to install all of the cluster suite? Are there other/better options? Thanks! Charles -------------- next part -------------- An HTML attachment was scrubbed... URL: From grimme at atix.de Thu Mar 8 20:33:11 2012 From: grimme at atix.de (Marc Grimme) Date: Thu, 8 Mar 2012 21:33:11 +0100 (CET) Subject: [Linux-cluster] GFS2 not able to remove a file In-Reply-To: Message-ID: <018a9cfe-24d4-4d8a-9765-b1be99ee416b@mobilix-20> Hello, Nothing. # lsof -b +M -n -l 2>/dev/null | grep messagebus # The server/cluster was rebooted a few times. I'm pretty sure that no application is using this file. Regards Marc. ----- Original Message ----- From: "Rajagopal Swaminathan" To: "linux clustering" Sent: Donnerstag, 8. M?rz 2012 18:40:57 Subject: Re: [Linux-cluster] GFS2 not able to remove a file Greetings, On Thu, Mar 8, 2012 at 8:34 PM, Marc Grimme wrote: > Hello, > I'm having a strange behavior of a GFS2 file system. > > I have a file I can write to and read from. But I cannot delete the file or move it. > I've already done an fsck but with no effect. > > see below > ---------------X8------------- > [root at server run.old]# cat messagebus.pid > 4000 > [root at server run.old]# echo 4001 > messagebus.pid > [root at server run.old]# cat messagebus.pid > 4001 > [root at server run.old]# rm messagebus.pid > rm: remove regular file `messagebus.pid'? y > rm: cannot remove `messagebus.pid': No such file or directory > ---------------X8------------- > What does lsof says? -- Regards, Rajagopal -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From linux at alteeve.com Thu Mar 8 20:45:43 2012 From: linux at alteeve.com (Digimer) Date: Thu, 08 Mar 2012 15:45:43 -0500 Subject: [Linux-cluster] Clustered LVM for storage In-Reply-To: References: Message-ID: <4F591A77.7000303@alteeve.com> On 03/08/2012 01:59 PM, Charles Riley wrote: > Greetings, > > I have an aoe device with a lot of storage in it that I would like to > share among four rhel 4 servers. Each of the servers will mount it's > own storage, no data is shared between them. e.g the servers won't be > mounting the same volumes. > > I could create four different raid groups on the aoe device and present > a different one to each server, but that would waste space. > What I'd rather do is create one big raid group and use clustered lvm to > divvy the space between servers. > > Is it possible? Would it be enough to run just the clustered lvm > daemon, or would I need to install all of the cluster suite? > Are there other/better options? > > Thanks! 
> > Charles In theory, it should be fine, but it will be up to you to ensure a given LV is in fact only mounted in one place at a time. Clustered LVM does require cman on RHEL/CentOS 6. As such, you need the full stack, *including* fencing. -- Digimer E-Mail: digimer at alteeve.com Papers and Projects: https://alteeve.com From jeff.sturm at eprize.com Thu Mar 8 22:59:00 2012 From: jeff.sturm at eprize.com (Jeff Sturm) Date: Thu, 8 Mar 2012 22:59:00 +0000 Subject: [Linux-cluster] Clustered LVM for storage In-Reply-To: References: Message-ID: With aoe you can use old-fashioned disk partitioning. Just run "parted" (or whatever partitioning tool you choose) and allocate storage for partitions as you see fit. The benefits of doing this are: Easier/simpler to setup than cluster suite, and you can still use all the spindles from your aoe target (for example by creating a large RAID-10 array across all disks). The downside of partitions is they aren't easy to change. You can add them safely while the storage array is in use, but each host needs to reload the partition table when you're done with changes before the new storage can be used, and that may not happen until you rmmod/modprobe the aoe driver, which you can't do while any partitions are in use, e.g. on mounted file systems. And resizing partitions is tricky because they are allocated on consecutive sectors. So if you want the flexibility of adding/removing/modifying volumes at any time, it may be worth the trouble to get Cluster Suite running so you can use CLVM. If you just want to carve it up once and forget about it, partitioning the array will be the fastest. -Jeff From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Charles Riley Sent: Thursday, March 08, 2012 1:59 PM To: linux cluster Subject: [Linux-cluster] Clustered LVM for storage Greetings, I have an aoe device with a lot of storage in it that I would like to share among four rhel 4 servers. Each of the servers will mount it's own storage, no data is shared between them. e.g the servers won't be mounting the same volumes. I could create four different raid groups on the aoe device and present a different one to each server, but that would waste space. What I'd rather do is create one big raid group and use clustered lvm to divvy the space between servers. Is it possible? Would it be enough to run just the clustered lvm daemon, or would I need to install all of the cluster suite? Are there other/better options? Thanks! Charles -------------- next part -------------- An HTML attachment was scrubbed... URL: From criley at erad.com Fri Mar 9 13:55:45 2012 From: criley at erad.com (Charles Riley) Date: Fri, 9 Mar 2012 08:55:45 -0500 Subject: [Linux-cluster] Clustered LVM for storage In-Reply-To: References: Message-ID: We're talking 36TB of storage here.. I've never had much success partitioning an array of that size (and actually being able to use all of the space) without lvm. But I might give that a try. Continuing along this line of thought: If I stop access to the array from all of the servers before I make any changes, I could probably even make use of lvm without clustering. Charles On Thu, Mar 8, 2012 at 5:59 PM, Jeff Sturm wrote: > With aoe you can use old-fashioned disk partitioning. Just run "parted" > (or whatever partitioning tool you choose) and allocate storage for > partitions as you see fit. > > The benefits of doing this are: Easier/simpler to setup than cluster > suite, and you can still use all the spindles from your aoe target (for > example by creating a large RAID-10 array across all disks). > > The downside of partitions is they aren't easy to change. You can add > them safely while the storage array is in use, but each host needs to > reload the partition table when you're done with changes before the new > storage can be used, and that may not happen until you rmmod/modprobe the > aoe driver, which you can't do while any partitions are in use, e.g. on > mounted file systems. And resizing partitions is tricky because they are > allocated on consecutive sectors. > > So if you want the flexibility of adding/removing/modifying volumes at any > time, it may be worth the trouble to get Cluster Suite running so you can > use CLVM. If you just want to carve it up once and forget about it, > partitioning the array will be the fastest. > > -Jeff > > From: linux-cluster-bounces at redhat.com [mailto: > linux-cluster-bounces at redhat.com] On Behalf Of Charles Riley > Sent: Thursday, March 08, 2012 1:59 PM > To: linux cluster > Subject: [Linux-cluster] Clustered LVM for storage > > Greetings, > > I have an aoe device with a lot of storage in it that I would like to > share among four rhel 4 servers. Each of the servers will mount it's own > storage, no data is shared between them. e.g the servers won't be mounting > the same volumes. > > I could create four different raid groups on the aoe device and present a > different one to each server, but that would waste space. > What I'd rather do is create one big raid group and use clustered lvm to > divvy the space between servers. > > Is it possible? Would it be enough to run just the clustered lvm daemon, > or would I need to install all of the cluster suite? > Are there other/better options? > > Thanks! > > Charles > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Charles Riley | eRAD | Director of Technical Solutions | O: 864.640.8648 C: 864.881.1331 -------------- next part -------------- An HTML attachment was scrubbed... URL:
From gianluca.cecchi at gmail.com Fri Mar 9 14:14:17 2012 From: gianluca.cecchi at gmail.com (Gianluca Cecchi) Date: Fri, 9 Mar 2012 15:14:17 +0100 Subject: [Linux-cluster] Problem with ping as an heuristic with qdiskd Message-ID: Hello, I have a cluster in RH EL 5.7 with a quorum disk and a heuristic. Current versions of the main cluster packages are: rgmanager-2.0.52-21.el5_7.1 cman-2.0.115-85.el5_7.3 This is the loaded heuristic: Heuristic: 'ping -c1 -w1 10.4.5.250' score=1 interval=2 tko=200 Line in cluster.conf: <heuristic program="ping -c1 -w1 10.4.5.250" score="1" interval="2" tko="200"/> where 10.4.5.250 is the gateway of the production LAN. From the ping man page: -c count Stop after sending count ECHO_REQUEST packets. With deadline (-w) option, ping waits for count ECHO_REPLY packets, until the timeout expires. -w deadline Specify a timeout, in seconds, before ping exits regardless of how many packets have been sent or received.
In this case ping does not stop after count packet are sent, it waits either for deadline expire or until count probes are answered or for some error notification from network. So I would expect that the single ping command, executed as a sanity check, should exit with a return code after at most 1 second, regardless of whether an echo reply has been received or not. And in fact I had no particular problems for many months. As a test, using an IP on an unreachable LAN (say 10.4.6.5):
date
n=0
while [ $n -lt 20 ]
do
  ping -c1 -w1 10.4.6.5
  sleep 2
  n=$(expr $n + 1)
done
date
Output is:
Fri Mar 9 11:59:02 CET 2012
PING 10.4.6.5 (10.4.6.5) 56(84) bytes of data.
--- 10.4.6.5 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1000ms
...
--- 10.4.6.5 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 999ms
Fri Mar 9 12:00:02 CET 2012
so 60 seconds.... In case of gateway reachability problems (also tested with an iptables rule that drops outgoing ICMP echo requests) I would then have: qdiskd[2780]: Heuristic: 'ping -c1 -w1 10.4.5.250' missed (1/200) The strange thing I got last night was this single line: qdiskd[22145]: Heuristic: 'ping -c1 -w1 10.4.5.250' DOWN - Exceeded timeout of 75 seconds and the node self-fenced, causing relocation of some services. So for some reason the ping command was not able to exit at all, I presume... despite the -c and -w options.... I suppose some condition caused an internal timeout defined for the monitor operation itself (defaulting to 75 seconds?), something like a pacemaker directive op monitor interval="20" timeout="40", and at this point the cluster considered the heuristic as failed altogether and self-fenced.... Is this right? My default quorumd directive is this one, btw: <quorumd ... log_facility="local4" log_level="7" tko="16" votes="1"> And in fact when for some reason I have temporary problems with my SAN, I get something like: qdiskd[1339]: qdisk cycle took more than 5 seconds to complete (34.540000) and on the other node: qdiskd[6025]: Node 1 missed an update (2/200) qdiskd[6025]: Node 1 missed an update (3/200) ... Can anyone give any insight into the message I got yesterday that I never saw before: qdiskd[22145]: Heuristic: 'ping -c1 -w1 10.4.5.250' DOWN - Exceeded timeout of 75 seconds ? Do I have to suppose a bug in the ping command? Thanks in advance, Gianluca From emi2fast at gmail.com Fri Mar 9 14:39:43 2012 From: emi2fast at gmail.com (emmanuel segura) Date: Fri, 9 Mar 2012 15:39:43 +0100 Subject: [Linux-cluster] Problem with ping as an heuristic with qdiskd In-Reply-To: References: Message-ID: Hello Gianluca, do you have a cluster private network? If your answer is yes, I recommend not using the heuristic, because if your cluster public network goes down your cluster gets into a fencing loop. Or you can do something better: use pacemaker+corosync. On 9 March 2012 15:14, Gianluca Cecchi wrote: > Hello, > I have a cluster in RH EL 5.7 with a quorum disk and a heuristic. > Current versions of the main cluster packages are: > rgmanager-2.0.52-21.el5_7.1 > cman-2.0.115-85.el5_7.3 > > This is the loaded heuristic: > > Heuristic: 'ping -c1 -w1 10.4.5.250' score=1 interval=2 tko=200 > > Line in cluster.conf: > <heuristic program="ping -c1 -w1 10.4.5.250" score="1" interval="2" tko="200"/> > > where 10.4.5.250 is the gateway of the production LAN. > From the ping man page: > -c count > Stop after sending count ECHO_REQUEST packets. With deadline (-w) > option, ping waits for count ECHO_REPLY packets, until the timeout > expires. > -w deadline > Specify a timeout, in seconds, before ping exits regardless of how many > packets have been sent or received.
In this case ping does not stop > after count packet are sent, it waits either for deadline expire or > until count probes are answered or for some error notification from > network. > > So I would expect that the single ping command, executed as a sanity > check, at most after 1 second > should exit with a code, regardless an echo reply has been received or not > And in fact I had no particular problem for many months > > As a test, putting an ip on an unreachable lan (say 10.4.6.5): > date > n=0 > while [ $n -lt 20 ] > do > ping -c1 -w1 10.4.6.5 > sleep 2 > n=$(expr $n + 1) > done > date > > Output is > Fri Mar 9 11:59:02 CET 2012 > PING 10.4.6.5 (10.4.6.5) 56(84) bytes of data. > > --- 10.4.6.5 ping statistics --- > 2 packets transmitted, 0 received, 100% packet loss, time 1000ms > > ... > > --- 10.4.6.5 ping statistics --- > 2 packets transmitted, 0 received, 100% packet loss, time 999ms > > Fri Mar 9 12:00:02 CET 2012 > > so 60 seconds.... > > In case of gateway reachability problems (also tested with an iptables > rule that drops icmp output request) I would then have: > > qdiskd[2780]: Heuristic: 'ping -c1 -w1 10.4.5.250' missed > (1/200) > > Strange thing I got yesterday night was this only line: > > qdiskd[22145]: Heuristic: 'ping -c1 -w1 10.4.5.250' DOWN - > Exceeded timeout of 75 seconds > > and the node self-fencing causing relocation of some services > So for some reason the ping command was not able to exit at all, I > presume... > despite the -c and -w options.... > > I suppose a condition that causes an internal timeout defined for the > monitor operation itself (default to 75 seconds?) > something like a pacemaker directive > op monitor interval="20" timeout="40" > > And the cluster at this point considering as heuristic failed at all > and self-fencing.... > Is this right? > > My default quorumd directive is this one, btw: > > log_facility="local4" log_level="7" tko="16" votes="1"> > > And in fact when for some reason I have temporary problems with my > SAN, I get something like: > > qdiskd[1339]: qdisk cycle took more than 5 seconds to complete > (34.540000) > > and on the other node > qdiskd[6025]: Node 1 missed an update (2/200) > qdiskd[6025]: Node 1 missed an update (3/200) > ... > > Can anyone give any insight for the message I got yesterday that I > never saw before: > qdiskd[22145]: Heuristic: 'ping -c1 -w1 10.4.5.250' DOWN - > Exceeded timeout of 75 seconds > > ? > Do I have to suppose a bug in the ping command? > > Thanks in advance, > Gianluca > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- esta es mi vida e me la vivo hasta que dios quiera -------------- next part -------------- An HTML attachment was scrubbed... URL: From gianluca.cecchi at gmail.com Fri Mar 9 15:44:03 2012 From: gianluca.cecchi at gmail.com (Gianluca Cecchi) Date: Fri, 9 Mar 2012 16:44:03 +0100 Subject: [Linux-cluster] Problem with ping as an heuristic with qdiskd In-Reply-To: References: Message-ID: On Fri, 9 Mar 2012 15:39:43 +0100, emmanuel segura wrote: > Hello Gianluca > Do you have a cluster private network? > if your answer it's yes i recommend don't use heuristic because if your cluster public network goes down > your cluster take a fencing loop > > Or you can do something better, use pacemaker+corosync My cluster is RH EL 5.7 based. Pacemaker is not an option here... And also, if I remember correctly, pacemaker in 6.2 is not officialy supported yet. Probably in 6.3? 
I do have a private network that is in place. Here we are talking about heuristic to manage fencing decisions, based on both quorum disk reachability and network serviceability (ping to gateway for the production network must be ok) see also (even if not so recent): http://magazine.redhat.com/2007/12/19/enhancing-cluster-quorum-with-qdisk/ But in the mean time I also found this thread: http://osdir.com/ml/linux-cluster/2010-05/msg00081.html and in fact during last weeks we got a nightly job consisting of a big ftp to an external site and it could be related to my problem... I have to evaluate if ping -c3 -t3 -W1 could be a better option in our new situation during night From jeff.sturm at eprize.com Fri Mar 9 15:55:36 2012 From: jeff.sturm at eprize.com (Jeff Sturm) Date: Fri, 9 Mar 2012 15:55:36 +0000 Subject: [Linux-cluster] Clustered LVM for storage In-Reply-To: References: Message-ID: Sure. As long as you don't try to make use of any LVM features that require metadata consistency (mirroring, snapshots, online resizing, etc.) you can get by using LVM without clustering. -Jeff From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Charles Riley Sent: Friday, March 09, 2012 8:56 AM To: linux clustering Subject: Re: [Linux-cluster] Clustered LVM for storage We're talking 36TB of storage here.. I've never had much success partitioning an array of that size (and actually being able to use all of the space) without lvm. But I might give that a try. Continuing along this line of thought: If I stop access to the array from all of the servers before I make any changes, I could probably even make use of lvm without clustering. Charles On Thu, Mar 8, 2012 at 5:59 PM, Jeff Sturm > wrote: With aoe you can use old-fashioned disk partitioning. Just run "parted" (or whatever partitioning tool you choose) and allocate storage for partitions as you see fit. The benefits of doing this are: Easier/simpler to setup than cluster suite, and you can still use all the spindles from your aoe target (for example by creating a large RAID-10 array across all disks). The downside of partitions is they aren't easy to change. You can add them safely while the storage array is in use, but each host needs to reload the partition table when you're done with changes before the new storage can be used, and that may not happen until you rmmod/modprobe the aoe driver, which you can't do while any partitions are in use, e.g. on mounted file systems. And resizing partitions is tricky because they are allocated on consecutive sectors. So if you want the flexibility of adding/removing/modifying volumes at any time, it may be worth the trouble to get Cluster Suite running so you can use CLVM. If you just want to carve it up once and forget about it, partitioning the array will be the fastest. -Jeff From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Charles Riley Sent: Thursday, March 08, 2012 1:59 PM To: linux cluster Subject: [Linux-cluster] Clustered LVM for storage Greetings, I have an aoe device with a lot of storage in it that I would like to share among four rhel 4 servers. Each of the servers will mount it's own storage, no data is shared between them. e.g the servers won't be mounting the same volumes. I could create four different raid groups on the aoe device and present a different one to each server, but that would waste space. 
What I'd rather do is create one big raid group and use clustered lvm to divvy the space between servers. Is it possible? Would it be enough to run just the clustered lvm daemon, or would I need to install all of the cluster suite? Are there other/better options? Thanks! Charles -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -- Charles Riley | eRAD | Director of Technical Solutions | O: 864.640.8648 C: 864.881.1331 -------------- next part -------------- An HTML attachment was scrubbed... URL: From emi2fast at gmail.com Fri Mar 9 16:29:06 2012 From: emi2fast at gmail.com (emmanuel segura) Date: Fri, 9 Mar 2012 17:29:06 +0100 Subject: [Linux-cluster] Problem with ping as an heuristic with qdiskd In-Reply-To: References: Message-ID: I'll try to be more clear. I have worked on Red Hat Cluster for 2 years and I have seen this topic so many times; this is the problem with Red Hat Cluster ping+quorum. If I have a two-node cluster with public_net+private_net, I think it's normal that my services switch if the public network is down on the node where the resource group was running. But, but, but: with ping as a heuristic you get a node fence instead. And remember, when you use a quorum disk on Red Hat Cluster, the fence decision is based on the ping/heuristic. Sorry, I'll tell you something in Italian: using ping on the qdisk only adds problems to the cluster. Nodes in a cluster should be fenced only if they lose access to the qdisk or the private network stops working; if my private network is OK and the node can still access the disks but the public network is down, the resources should simply switch over. Tell me if you need more info about the ping. On 9 March 2012 16:44, Gianluca Cecchi wrote: > On Fri, 9 Mar 2012 15:39:43 +0100, emmanuel segura wrote: > > > Hello Gianluca > > Do you have a cluster private network? > > if your answer it's yes i recommend don't use heuristic because if your > cluster public network goes down > > your cluster take a fencing loop > > > > Or you can do something better, use pacemaker+corosync > > My cluster is RH EL 5.7 based. Pacemaker is not an option here... > And also, if I remember correctly, pacemaker in 6.2 is not officialy > supported yet. Probably in 6.3? > I do have a private network that is in place. > Here we are talking about heuristic to manage fencing decisions, based > on both quorum disk reachability and network serviceability (ping to > gateway for the production network must be ok) > > see also (even if not so recent): > http://magazine.redhat.com/2007/12/19/enhancing-cluster-quorum-with-qdisk/ > > But in the mean time I also found this thread: > http://osdir.com/ml/linux-cluster/2010-05/msg00081.html > > and in fact during last weeks we got a nightly job consisting of a big > ftp to an external site and it could be related to my problem... > > I have to evaluate if > ping -c3 -t3 -W1 > could be a better option in our new situation during night > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- esta es mi vida e me la vivo hasta que dios quiera -------------- next part -------------- An HTML attachment was scrubbed...
URL: From ajb2 at mssl.ucl.ac.uk Fri Mar 9 14:45:34 2012 From: ajb2 at mssl.ucl.ac.uk (Alan Brown) Date: Fri, 09 Mar 2012 14:45:34 +0000 Subject: [Linux-cluster] Clustered LVM for storage In-Reply-To: References: Message-ID: <4F5A178E.6070509@mssl.ucl.ac.uk> On 08/03/12 22:59, Jeff Sturm wrote: > The downside of partitions is they aren't easy to change. You can add them safely while the storage array is in use, but each host needs to reload the partition table when you're done with changes before the new storage can be used, and that may not happen until you rmmod/modprobe the aoe driver Partprobe fails? From rohit2525 at gmail.com Sat Mar 10 22:36:32 2012 From: rohit2525 at gmail.com (Rohit tripathi) Date: Sun, 11 Mar 2012 04:06:32 +0530 Subject: [Linux-cluster] (no subject) Message-ID: Hi Team, I need to know the step-by-step procedure to create a GFS2 file system for my cluster. Regards, Rohit -------------- next part -------------- An HTML attachment was scrubbed... URL: From gianluca.cecchi at gmail.com Mon Mar 12 11:26:20 2012 From: gianluca.cecchi at gmail.com (Gianluca Cecchi) Date: Mon, 12 Mar 2012 12:26:20 +0100 Subject: [Linux-cluster] Problem with ping as an heuristic with qdiskd In-Reply-To: References: Message-ID: On Fri, 9 Mar 2012 17:29:06 +0100 emmanuel segura wrote: > i'll try to be more clear > i work on redhat cluster from 2 years and i seen this topic so much times Sorry, I didn't want to offend anyone. I have been working on rhcs (and other companions from other OSes) for many years too... > I think it's normal my services switch if have the public network down on the node where > the resource group was running,But But But with ping as heuristic you get a node fence AFAIK rhcs is not able to switch a service if the server loses its connectivity. Better: the /usr/share/cluster/ip.sh resource definition contains the parameter monitor_link, but it only covers a dead link on the NIC. And I have to manage rhcs... So in my opinion, if you want to test gateway reachability (that is, the production LAN where you deliver a cluster service), you are at the moment forced to use a heuristic, or to write your own resource to add to the ones composing the service, so causing a service switch in case of problems with this custom resource... but I could be wrong in my assumption... Cheers, Gianluca From emi2fast at gmail.com Mon Mar 12 11:56:56 2012 From: emi2fast at gmail.com (emmanuel segura) Date: Mon, 12 Mar 2012 12:56:56 +0100 Subject: [Linux-cluster] Problem with ping as an heuristic with qdiskd In-Reply-To: References: Message-ID: I know the cluster agent /usr/share/cluster/ip.sh cannot check the gateway. I resolved this problem with a script in my service group, so when the script fails the resources switch over. ========================================================