From fdinitto at redhat.com Sat Nov 1 05:06:39 2014 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Sat, 01 Nov 2014 06:06:39 +0100 Subject: [Linux-cluster] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: <540D853F.3090109@redhat.com> References: <540D853F.3090109@redhat.com> Message-ID: <54546A5F.8030207@redhat.com> Just a kind reminder. On 9/8/2014 12:30 PM, Fabio M. Di Nitto wrote: > All, > > it's been almost 6 years since we had a face-to-face meeting for all > developers and vendors involved in Linux HA. > > I'd like to try and organize a new event, piggy-backing on DevConf in > Brno [1]. > > DevConf will start Friday the 6th of Feb 2015 in Red Hat's Brno offices. > > My suggestion would be to have a dedicated 2-day HA summit on the 4th and > the 5th of February. > > The goal of this meeting, besides getting to know each other and > the social aspects of such events, is to tune the directions of the various HA > projects and explore common areas of improvement. > > I am also very open to the idea of extending it to 3 days, 1 dedicated > to customers/users and 2 dedicated to developers, by starting on the 3rd. > > Thoughts? > > Fabio > > PS Please hit reply-all or include me in CC just to make sure I'll see > your answer :) > > [1] http://devconf.cz/ Could you please let me know by the end of November whether you are interested? I have heard from only a few people so far. Cheers Fabio From lists at alteeve.ca Sat Nov 1 05:19:35 2014 From: lists at alteeve.ca (Digimer) Date: Sat, 01 Nov 2014 01:19:35 -0400 Subject: [Linux-cluster] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: <54546A5F.8030207@redhat.com> References: <540D853F.3090109@redhat.com> <54546A5F.8030207@redhat.com> Message-ID: <54546D67.6010606@alteeve.ca> All the cool kids will be there. You want to be a cool kid, right? :p On 01/11/14 01:06 AM, Fabio M. Di Nitto wrote: > just a kind reminder. > > On 9/8/2014 12:30 PM, Fabio M. 
Di Nitto wrote: >> All, >> >> it's been almost 6 years since we had a face-to-face meeting for all >> developers and vendors involved in Linux HA. >> >> I'd like to try and organize a new event, piggy-backing on DevConf in >> Brno [1]. >> >> DevConf will start Friday the 6th of Feb 2015 in Red Hat's Brno offices. >> >> My suggestion would be to have a dedicated 2-day HA summit on the 4th and >> the 5th of February. >> >> The goal of this meeting, besides getting to know each other and >> the social aspects of such events, is to tune the directions of the various HA >> projects and explore common areas of improvement. >> >> I am also very open to the idea of extending it to 3 days, 1 dedicated >> to customers/users and 2 dedicated to developers, by starting on the 3rd. >> >> Thoughts? >> >> Fabio >> >> PS Please hit reply-all or include me in CC just to make sure I'll see >> your answer :) >> >> [1] http://devconf.cz/ > > Could you please let me know by the end of November whether you are interested? > > I have heard from only a few people so far. > > Cheers > Fabio > _______________________________________________ > Linux-HA mailing list > Linux-HA at lists.linux-ha.org > http://lists.linux-ha.org/mailman/listinfo/linux-ha > See also: http://linux-ha.org/ReportingProblems > -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From jfriesse at redhat.com Mon Nov 3 09:02:57 2014 From: jfriesse at redhat.com (Jan Friesse) Date: Mon, 03 Nov 2014 10:02:57 +0100 Subject: [Linux-cluster] daemon cpg_join error retrying In-Reply-To: References: <68ABE774-8755-416F-829B-CED002B14D03@beekhof.net> <5451F581.5050100@redhat.com> <5453BC31.2000102@redhat.com> Message-ID: <545744C1.4080007@redhat.com> Lax, > > > >> This is just weird. What exact version of corosync are you running? Do you have latest Z stream? 
> I am running on Corosync 1.4.1 and the pacemaker version is 1.1.8-7.el6 Are you running a package version (like RHEL/CentOS) or did you compile the package yourself? If a package version, can you please send the exact version (like 1.4.1-17.1)? > How should I get access to the Z stream? Is there a specific dir I should pick this Z stream from? For RHEL you are subscribed to RHN, so you should get it automatically; the same applies to CentOS. Regards, Honza > > Thanks > Lax > > > -----Original Message----- > From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jan Friesse > Sent: Friday, October 31, 2014 9:43 AM > To: linux clustering > Subject: Re: [Linux-cluster] daemon cpg_join error retrying > > Lax, > > >> Thanks Honza. Here is what I was doing, >> >>> usual reasons for this problem: >>> 1. MTU is too high and fragmented packets are not enabled (take a >>> look at the netmtu configuration option) >> I am running with the default MTU setting, which is 1500. And I do see the interface (eth1) on the box has an MTU of 1500 too. >> > > Keep in mind that if they are not directly connected, a switch can drop packets because of the MTU. > >> >> 2. config files on nodes are not in sync and one node may contain more node entries than other nodes (this may also be the case if you have two > clusters and one cluster contains an entry of one node for the other cluster) 3. firewall is asymmetrically blocked (so a node can send but not receive). Also keep in mind that ports 5404 & 5405 may not be enough for udpu, because udpu uses one socket per remote node for sending. >> Verified my config files cluster.conf and cib.xml and both have the same >> number of node entries (2) >> >>> I would recommend disabling the firewall completely (for testing); if everything then works, you just need to adjust the firewall. 
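Honza's three usual suspects (netmtu, node lists out of sync, firewall) all map onto a handful of cluster.conf settings. A hypothetical fragment for RHEL 6 / cman is sketched below; the attribute names follow the corosync totem options passed through cluster.conf, and all values (cluster name aside, which appears in the logs above) are illustrative assumptions, not taken from the poster's actual configuration:

```xml
<!-- Hypothetical /etc/cluster/cluster.conf fragment (RHEL 6 / cman).
     netmtu is lowered as a test for paths that drop large/fragmented
     frames; token is in milliseconds, and corosync derives consensus
     from it (default 1.2 * token) unless set explicitly. -->
<cluster name="vsomcluster" config_version="2">
  <cman transport="udpu"/>
  <totem token="10000" netmtu="1402"/>
  <clusternodes>
    <clusternode name="node1" nodeid="1"/>
    <clusternode name="node2" nodeid="2"/>
  </clusternodes>
</cluster>
```

The same `<clusternodes>` list must be identical on both nodes, per point 2 of the checklist.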
>> I also ran tests with firewall off too on both the participating >> nodes, still see same issue >> >> In corosync log I see repeated set of these messages, hoping these will give some more pointers. >> >> Oct 29 22:11:02 corosync [SYNC ] Committing synchronization for >> (corosync cluster closed process group service v1.01) Oct 29 22:11:02 corosync [MAIN ] Completed service synchronization, ready to provide service. >> Oct 29 22:11:02 corosync [TOTEM ] waiting_trans_ack changed to 0 Oct >> 29 22:11:03 corosync [TOTEM ] entering GATHER state from 11. >> Oct 29 22:11:03 corosync [TOTEM ] entering GATHER state from 10. >> Oct 29 22:11:05 corosync [TOTEM ] entering GATHER state from 0. > > This is just weird. What exact version of corosync are you running? Do you have latest Z stream? > > Regards, > Honza > >> Oct 29 22:11:05 corosync [TOTEM ] got commit token Oct 29 22:11:05 >> corosync [TOTEM ] Saving state aru 1b high seq received 1b Oct 29 >> 22:11:05 corosync [TOTEM ] Storing new sequence id for ring 51708 Oct >> 29 22:11:05 corosync [TOTEM ] entering COMMIT state. >> Oct 29 22:11:05 corosync [TOTEM ] got commit token Oct 29 22:11:05 >> corosync [TOTEM ] entering RECOVERY state. >> Oct 29 22:11:05 corosync [TOTEM ] TRANS [0] member 172.28.0.64: >> Oct 29 22:11:05 corosync [TOTEM ] TRANS [1] member 172.28.0.65: >> Oct 29 22:11:05 corosync [TOTEM ] position [0] member 172.28.0.64: >> Oct 29 22:11:05 corosync [TOTEM ] previous ring seq 333572 rep >> 172.28.0.64 Oct 29 22:11:05 corosync [TOTEM ] aru 1b high delivered 1b >> received flag 1 Oct 29 22:11:05 corosync [TOTEM ] position [1] member 172.28.0.65: >> Oct 29 22:11:05 corosync [TOTEM ] previous ring seq 333572 rep >> 172.28.0.64 Oct 29 22:11:05 corosync [TOTEM ] aru 1b high delivered 1b >> received flag 1 Oct 29 22:11:05 corosync [TOTEM ] Did not need to originate any messages in recovery. 
>> Oct 29 22:11:05 corosync [TOTEM ] token retrans flag is 0 my set >> retrans flag0 retrans queue empty 1 count 0, aru ffffffff Oct 29 >> 22:11:05 corosync [TOTEM ] install seq 0 aru 0 high seq received 0 Oct >> 29 22:11:05 corosync [TOTEM ] token retrans flag is 0 my set retrans >> flag0 retrans queue empty 1 count 1, aru 0 Oct 29 22:11:05 corosync >> [TOTEM ] install seq 0 aru 0 high seq received 0 Oct 29 22:11:05 >> corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans >> queue empty 1 count 2, aru 0 Oct 29 22:11:05 corosync [TOTEM ] install >> seq 0 aru 0 high seq received 0 Oct 29 22:11:05 corosync [TOTEM ] >> token retrans flag is 0 my set retrans flag0 retrans queue empty 1 >> count 3, aru 0 Oct 29 22:11:05 corosync [TOTEM ] install seq 0 aru 0 >> high seq received 0 Oct 29 22:11:05 corosync [TOTEM ] retrans flag >> count 4 token aru 0 install seq 0 aru 0 0 Oct 29 22:11:05 corosync >> [TOTEM ] Resetting old ring state Oct 29 22:11:05 corosync [TOTEM ] >> recovery to regular 1-0 Oct 29 22:11:05 corosync [CMAN ] ais: >> confchg_fn called type = 1, seq=333576 Oct 29 22:11:05 corosync [TOTEM >> ] waiting_trans_ack changed to 1 Oct 29 22:11:05 corosync [CMAN ] >> ais: confchg_fn called type = 0, seq=333576 Oct 29 22:11:05 corosync >> [CMAN ] ais: last memb_count = 2, current = 2 Oct 29 22:11:05 >> corosync [CMAN ] memb: sending TRANSITION message. cluster_name = vsomcluster Oct 29 22:11:05 corosync [CMAN ] ais: comms send message 0x7fff8185ca00 len = 65 Oct 29 22:11:05 corosync [CMAN ] daemon: sending reply 103 to fd 24 Oct 29 22:11:05 corosync [CMAN ] daemon: sending reply 103 to fd 34 Oct 29 22:11:05 corosync [SYNC ] This node is within the primary component and will provide service. >> Oct 29 22:11:05 corosync [TOTEM ] entering OPERATIONAL state. >> Oct 29 22:11:05 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed. 
>> Oct 29 22:11:05 corosync [CMAN ] ais: deliver_fn source nodeid = 2, >> len=81, endian_conv=0 Oct 29 22:11:05 corosync [CMAN ] memb: Message >> on port 0 is 5 Oct 29 22:11:05 corosync [CMAN ] memb: got TRANSITION >> from node 2 Oct 29 22:11:05 corosync [CMAN ] memb: Got TRANSITION >> message. msg->flags=20, node->flags=20, first_trans=0 Oct 29 22:11:05 >> corosync [CMAN ] memb: add_ais_node ID=2, incarnation = 333576 Oct 29 >> 22:11:05 corosync [SYNC ] confchg entries 2 Oct 29 22:11:05 corosync >> [SYNC ] Barrier Start Received From 2 Oct 29 22:11:05 corosync [SYNC ] Barrier completion status for nodeid 1 = 0. >> Oct 29 22:11:05 corosync [SYNC ] Barrier completion status for nodeid 2 = 1. >> Oct 29 22:11:05 corosync [CMAN ] ais: deliver_fn source nodeid = 1, >> len=81, endian_conv=0 Oct 29 22:11:05 corosync [CMAN ] memb: Message >> on port 0 is 5 Oct 29 22:11:05 corosync [CMAN ] memb: got TRANSITION >> from node 1 Oct 29 22:11:05 corosync [CMAN ] Completed first >> transition with nodes on the same config versions Oct 29 22:11:05 >> corosync [CMAN ] memb: Got TRANSITION message. msg->flags=20, >> node->flags=20, first_trans=0 Oct 29 22:11:05 corosync [CMAN ] memb: >> add_ais_node ID=1, incarnation = 333576 Oct 29 22:11:05 corosync [SYNC >> ] confchg entries 2 Oct 29 22:11:05 corosync [SYNC ] Barrier Start Received From 1 Oct 29 22:11:05 corosync [SYNC ] Barrier completion status for nodeid 1 = 1. >> Oct 29 22:11:05 corosync [SYNC ] Barrier completion status for nodeid 2 = 1. >> Oct 29 22:11:05 corosync [SYNC ] Synchronization barrier completed >> Oct 29 22:11:05 corosync [SYNC ] Synchronization actions starting for >> (dummy CLM service) Oct 29 22:11:05 corosync [SYNC ] confchg entries >> 2 Oct 29 22:11:05 corosync [SYNC ] Barrier Start Received From 1 Oct >> 29 22:11:05 corosync [SYNC ] Barrier completion status for nodeid 1 = 1. >> Oct 29 22:11:05 corosync [SYNC ] Barrier completion status for nodeid 2 = 0. 
>> Oct 29 22:11:05 corosync [SYNC ] confchg entries 2 Oct 29 22:11:05 >> corosync [SYNC ] Barrier Start Received From 2 Oct 29 22:11:05 >> corosync [SYNC ] Barrier completion status for nodeid 1 = 1. >> Oct 29 22:11:05 corosync [SYNC ] Barrier completion status for nodeid 2 = 1. >> Oct 29 22:11:05 corosync [SYNC ] Synchronization barrier completed >> Oct 29 22:11:05 corosync [SYNC ] Committing synchronization for >> (dummy CLM service) Oct 29 22:11:05 corosync [SYNC ] Synchronization >> actions starting for (dummy AMF service) Oct 29 22:11:05 corosync >> [SYNC ] confchg entries 2 Oct 29 22:11:05 corosync [SYNC ] Barrier >> Start Received From 2 Oct 29 22:11:05 corosync [SYNC ] Barrier completion status for nodeid 1 = 0. >> Oct 29 22:11:05 corosync [SYNC ] Barrier completion status for nodeid 2 = 1. >> Oct 29 22:11:05 corosync [SYNC ] confchg entries 2 Oct 29 22:11:05 >> corosync [SYNC ] Barrier Start Received From 1 Oct 29 22:11:05 >> corosync [SYNC ] Barrier completion status for nodeid 1 = 1. >> Oct 29 22:11:05 corosync [SYNC ] Barrier completion status for nodeid 2 = 1. >> Oct 29 22:11:05 corosync [SYNC ] Synchronization barrier completed >> Oct 29 22:11:05 corosync [SYNC ] Committing synchronization for >> (dummy AMF service) Oct 29 22:11:05 corosync [SYNC ] Synchronization >> actions starting for (openais checkpoint service B.01.01) Oct 29 >> 22:11:05 corosync [SYNC ] confchg entries 2 Oct 29 22:11:05 corosync >> [SYNC ] confchg entries 2 Oct 29 22:11:05 corosync [SYNC ] Barrier >> Start Received From 1 Oct 29 22:11:05 corosync [SYNC ] Barrier completion status for nodeid 1 = 1. >> Oct 29 22:11:05 corosync [SYNC ] Barrier completion status for nodeid 2 = 0. >> Oct 29 22:11:05 corosync [SYNC ] confchg entries 2 Oct 29 22:11:05 >> corosync [SYNC ] Barrier Start Received From 2 Oct 29 22:11:05 >> corosync [SYNC ] Barrier completion status for nodeid 1 = 1. >> Oct 29 22:11:05 corosync [SYNC ] Barrier completion status for nodeid 2 = 1. 
>> Oct 29 22:11:05 corosync [SYNC ] Synchronization barrier completed >> Oct 29 22:11:05 corosync [SYNC ] Committing synchronization for >> (openais checkpoint service B.01.01) Oct 29 22:11:05 corosync [SYNC ] >> Synchronization actions starting for (dummy EVT service) Oct 29 >> 22:11:05 corosync [SYNC ] confchg entries 2 Oct 29 22:11:05 corosync >> [SYNC ] Barrier Start Received From 2 Oct 29 22:11:05 corosync [SYNC ] Barrier completion status for nodeid 1 = 0. >> Oct 29 22:11:05 corosync [SYNC ] Barrier completion status for nodeid 2 = 1. >> Oct 29 22:11:05 corosync [SYNC ] confchg entries 2 Oct 29 22:11:05 >> corosync [SYNC ] Barrier Start Received From 1 Oct 29 22:11:05 >> corosync [SYNC ] Barrier completion status for nodeid 1 = 1. >> Oct 29 22:11:05 corosync [SYNC ] Barrier completion status for nodeid 2 = 1. >> Oct 29 22:11:05 corosync [SYNC ] Synchronization barrier completed >> Oct 29 22:11:05 corosync [SYNC ] Committing synchronization for >> (dummy EVT service) Oct 29 22:11:05 corosync [SYNC ] Synchronization actions starting for (corosync cluster closed process group service v1.01) >> Oct 29 22:11:05 corosync [CPG ] got joinlist message from node 1 >> Oct 29 22:11:05 corosync [CPG ] comparing: sender r(0) ip(172.28.0.65) ; members(old:2 left:0) >> Oct 29 22:11:05 corosync [CPG ] comparing: sender r(0) ip(172.28.0.64) ; members(old:2 left:0) >> Oct 29 22:11:05 corosync [CPG ] chosen downlist: sender r(0) ip(172.28.0.64) ; members(old:2 left:0) >> Oct 29 22:11:05 corosync [CPG ] got joinlist message from node 2 >> Oct 29 22:11:05 corosync [SYNC ] confchg entries 2 Oct 29 22:11:05 >> corosync [SYNC ] Barrier Start Received From 1 Oct 29 22:11:05 >> corosync [SYNC ] Barrier completion status for nodeid 1 = 1. >> Oct 29 22:11:05 corosync [SYNC ] Barrier completion status for nodeid 2 = 0. 
>> Oct 29 22:11:05 corosync [SYNC ] confchg entries 2 Oct 29 22:11:05 >> corosync [SYNC ] Barrier Start Received From 2 Oct 29 22:11:05 >> corosync [SYNC ] Barrier completion status for nodeid 1 = 1. >> Oct 29 22:11:05 corosync [SYNC ] Barrier completion status for nodeid 2 = 1. >> Oct 29 22:11:05 corosync [SYNC ] Synchronization barrier completed >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[0] group:crmd\x00, ip:r(0) ip(172.28.0.65) , pid:9198 >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[1] group:attrd\x00, ip:r(0) ip(172.28.0.65) , pid:9196 >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[2] group:stonith-ng\x00, ip:r(0) ip(172.28.0.65) , pid:9194 >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[3] group:cib\x00, ip:r(0) ip(172.28.0.65) , pid:9193 >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[4] group:pcmk\x00, ip:r(0) ip(172.28.0.65) , pid:9187 >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[5] group:gfs:controld\x00, ip:r(0) ip(172.28.0.65) , pid:9111 >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[6] group:dlm:controld\x00, ip:r(0) ip(172.28.0.65) , pid:9057 >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[7] group:fenced:default\x00, ip:r(0) ip(172.28.0.65) , pid:9040 >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[8] group:fenced:daemon\x00, ip:r(0) ip(172.28.0.65) , pid:9040 >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[9] group:crmd\x00, ip:r(0) ip(172.28.0.64) , pid:14530 >> Oct 29 22:11:05 corosync [SYNC ] Committing synchronization for >> (corosync cluster closed process group service v1.01) Oct 29 22:11:05 corosync [MAIN ] Completed service synchronization, ready to provide service. 
>> >> Thanks >> Lax >> >> >> -----Original Message----- >> From: linux-cluster-bounces at redhat.com >> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jan Friesse >> Sent: Thursday, October 30, 2014 1:23 AM >> To: linux clustering >> Subject: Re: [Linux-cluster] daemon cpg_join error retrying >> >>> >>>> On 30 Oct 2014, at 9:32 am, Lax Kota (lkota) wrote: >>>> >>>> >>>>>> I wonder if there is a mismatch between the cluster name in cluster.conf and the cluster name the GFS filesystem was created with. >>>>>> How do I check the cluster name of the GFS file system? I had a similar configuration running fine in multiple other setups with no such issue. >>>> >>>>> I don't really recall. Hopefully someone more familiar with GFS2 can chime in. >>>> Ok. >>>> >>>>>> >>>>>> Also, one more issue I am seeing in another setup: a repeated >>>>>> flood of 'A processor joined or left the membership and a new >>>>>> membership was formed' messages every 4 secs. I am running with >>>>>> default TOTEM settings, with the token timeout as 10 secs, even after >>>>>> I increase the token and consensus values to be higher. It goes on >>>>>> flooding the same message after the newly defined consensus time (e.g.: >>>>>> if I increase it to 10 secs, then I see new-membership-formed >>>>>> messages every 10 secs) >>>>>> >>>>>> Oct 29 14:58:10 VSM76-VSOM64 corosync[28388]: [TOTEM ] A processor joined or left the membership and a new membership was formed. >>>>>> Oct 29 14:58:10 VSM76-VSOM64 corosync[28388]: [CPG ] chosen downlist: sender r(0) ip(172.28.0.64) ; members(old:2 left:0) >>>>>> Oct 29 14:58:10 VSM76-VSOM64 corosync[28388]: [MAIN ] Completed service synchronization, ready to provide service. >>>>>> >>>>>> Oct 29 14:58:14 VSM76-VSOM64 corosync[28388]: [TOTEM ] A processor joined or left the membership and a new membership was formed. 
>>>>>> Oct 29 14:58:14 VSM76-VSOM64 corosync[28388]: [CPG ] chosen downlist: sender r(0) ip(172.28.0.64) ; members(old:2 left:0) >>>>>> Oct 29 14:58:14 VSM76-VSOM64 corosync[28388]: [MAIN ] Completed service synchronization, ready to provide service. >>>> >>>>> It does not sound like your network is particularly healthy. >>>>> Are you using multicast or udpu? If multicast, it might be worth >>>>> trying udpu >>>> >>>> I am using udpu and I also have the firewall opened for ports 5404 & 5405. Tcpdump looks fine too, it does not complain of any issues. This is a VM environment and even if I switch to the other node within the same VM I keep getting the same failure. >>> >>> Depending on what the host and VMs are doing, that might be your problem. >>> In any case, I will defer to the corosync guys at this point. >>> >> >> Lax, >> usual reasons for this problem: >> 1. MTU is too high and fragmented packets are not enabled (take a look at the netmtu configuration option) 2. config files on nodes are not in sync and one node may contain more node entries than other nodes (this may also be the case if you have two clusters and one cluster contains an entry of one node for the other cluster) 3. firewall is asymmetrically blocked (so a node can send but not receive). Also keep in mind that ports 5404 & 5405 may not be enough for udpu, because udpu uses one socket per remote node for sending. >> >> I would recommend disabling the firewall completely (for testing); if everything then works, you just need to adjust the firewall. 
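Honza's remark that "ports 5404 & 5405 may not be enough for udpu" comes down to how UDP sending sockets work: a socket created per remote node binds to a kernel-assigned ephemeral port, so a firewall that only allows destination ports 5404/5405 can still drop the return traffic. A small stand-alone sketch (loopback addresses as stand-ins for the two nodes; no cluster software involved):

```python
# Demonstrates that per-peer UDP sending sockets get distinct,
# kernel-assigned ephemeral source ports -- the reason firewall rules
# keyed only on dports 5404/5405 are insufficient for udpu.
import socket

peers = ["127.0.0.1", "127.0.0.1"]  # stand-ins for the two cluster nodes
socks = [socket.socket(socket.AF_INET, socket.SOCK_DGRAM) for _ in peers]
for s, peer in zip(socks, peers):
    # connect() on a UDP socket sends nothing; it fixes the destination
    # and triggers an implicit bind to an ephemeral local port.
    s.connect((peer, 5405))

ports = [s.getsockname()[1] for s in socks]
print(ports)  # two distinct ephemeral source ports
for s in socks:
    s.close()
```

This is why the practical advice in the thread is to test with the firewall fully disabled, then re-enable it with rules broad enough to cover ephemeral source ports (e.g. filtering by peer address rather than by port).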
>> >> Regards, >> Honza >> >> >> >>>> >>>> Thanks >>>> Lax >>>> >>>> >>>> >>>> -----Original Message----- >>>> From: linux-cluster-bounces at redhat.com >>>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Andrew >>>> Beekhof >>>> Sent: Wednesday, October 29, 2014 3:17 PM >>>> To: linux clustering >>>> Subject: Re: [Linux-cluster] daemon cpg_join error retrying >>>> >>>> >>>>> On 30 Oct 2014, at 9:06 am, Lax Kota (lkota) wrote: >>>>> >>>>>> I wonder if there is a mismatch between the cluster name in cluster.conf and the cluster name the GFS filesystem was created with. >>>>> How do I check the cluster name of the GFS file system? I had a similar configuration running fine in multiple other setups with no such issue. >>>> >>>> I don't really recall. Hopefully someone more familiar with GFS2 can chime in. >>>> >>>>> >>>>> Also, one more issue I am seeing in another setup: a repeated flood >>>>> of 'A processor joined or left the membership and a new membership >>>>> was formed' messages every 4 secs. I am running with default >>>>> TOTEM settings, with the token timeout as 10 secs, even after I >>>>> increase the token and consensus values to be higher. It goes on >>>>> flooding the same message after the newly defined consensus time (e.g.: >>>>> if I increase it to 10 secs, then I see new-membership-formed >>>>> messages every >>>>> 10 secs) >>>>> >>>>> Oct 29 14:58:10 VSM76-VSOM64 corosync[28388]: [TOTEM ] A processor joined or left the membership and a new membership was formed. >>>>> Oct 29 14:58:10 VSM76-VSOM64 corosync[28388]: [CPG ] chosen downlist: sender r(0) ip(172.28.0.64) ; members(old:2 left:0) >>>>> Oct 29 14:58:10 VSM76-VSOM64 corosync[28388]: [MAIN ] Completed service synchronization, ready to provide service. >>>>> >>>>> Oct 29 14:58:14 VSM76-VSOM64 corosync[28388]: [TOTEM ] A processor joined or left the membership and a new membership was formed. 
>>>>> Oct 29 14:58:14 VSM76-VSOM64 corosync[28388]: [CPG ] chosen downlist: sender r(0) ip(172.28.0.64) ; members(old:2 left:0) >>>>> Oct 29 14:58:14 VSM76-VSOM64 corosync[28388]: [MAIN ] Completed service synchronization, ready to provide service. >>>> >>>> It does not sound like your network is particularly healthy. >>>> Are you using multicast or udpu? If multicast, it might be worth >>>> trying udpu >>>> >>>>> >>>>> Thanks >>>>> Lax >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: linux-cluster-bounces at redhat.com >>>>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Andrew >>>>> Beekhof >>>>> Sent: Wednesday, October 29, 2014 2:42 PM >>>>> To: linux clustering >>>>> Subject: Re: [Linux-cluster] daemon cpg_join error retrying >>>>> >>>>> >>>>>> On 30 Oct 2014, at 8:38 am, Lax Kota (lkota) wrote: >>>>>> >>>>>> Hi All, >>>>>> >>>>>> In one of my setups, I keep getting 'gfs_controld[10744]: daemon cpg_join error retrying'. I have a 2-node setup with pacemaker and corosync. >>>>> >>>>> I wonder if there is a mismatch between the cluster name in cluster.conf and the cluster name the GFS filesystem was created with. >>>>> >>>>>> >>>>>> Even after I force-kill the pacemaker processes, reboot the server and bring pacemaker back up, it keeps giving the cpg_join error. Is there any way to fix this issue? 
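Andrew's mismatch hypothesis can be checked mechanically: the GFS2 superblock stores its lock table as "clustername:fsname" (set at mkfs time), and that prefix must equal the `<cluster name="...">` attribute in cluster.conf. A minimal sketch of the comparison; the sample strings below are hypothetical stand-ins, since on a real node the lock table would come from reading the superblock (e.g. via `gfs2_tool sb <device> table` on RHEL 6):

```python
# Compare the cluster name in cluster.conf with the prefix of a GFS2
# lock table string ("clustername:fsname"). Sample inputs only.
import xml.etree.ElementTree as ET

cluster_conf = '<cluster name="vsomcluster" config_version="1"/>'  # sample
lock_table = "vsomcluster:gfs01"                                   # sample

conf_name = ET.fromstring(cluster_conf).get("name")
sb_name = lock_table.split(":", 1)[0]

if conf_name == sb_name:
    print("cluster name matches GFS2 lock table:", conf_name)
else:
    print("MISMATCH: cluster.conf=%s, gfs2 superblock=%s" % (conf_name, sb_name))
```

If the names differ, gfs_controld cannot join the expected CPG group, which would be consistent with the cpg_join retry loop reported above.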
>>>>>> >>>>>> >>>>>> Thanks >>>>>> Lax >>>>>> >>>>>> -- >>>>>> Linux-cluster mailing list >>>>>> Linux-cluster at redhat.com >>>>>> https://www.redhat.com/mailman/listinfo/linux-cluster From lkota at cisco.com Mon Nov 3 18:26:02 2014 From: lkota at cisco.com (Lax Kota (lkota)) Date: Mon, 3 Nov 2014 18:26:02 +0000 Subject: [Linux-cluster] daemon cpg_join error retrying In-Reply-To: <545744C1.4080007@redhat.com> References: <68ABE774-8755-416F-829B-CED002B14D03@beekhof.net> <5451F581.5050100@redhat.com> <5453BC31.2000102@redhat.com> <545744C1.4080007@redhat.com> Message-ID: Hi Honza, I am running on the packaged version from RHEL 6.4. The exact version is 'corosync-1.4.1-15' Thanks Lax -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jan Friesse Sent: Monday, November 03, 2014 1:03 AM To: linux clustering Subject: Re: [Linux-cluster] daemon cpg_join error retrying Lax, > > > >> This is just weird. What exact version of corosync are you running? Do you have latest Z stream? 
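The exact-version exchange above (installed "corosync-1.4.1-15" versus a hypothetical later Z-stream build like "1.4.1-17.1") boils down to comparing version-release strings numerically. A simplified sketch of that comparison; real RPM version ordering has more rules, so this is an illustration only:

```python
# Simplified version-release comparison for strings like "1.4.1-15"
# vs "1.4.1-17.1"; non-numeric fields are ignored for brevity.
def nvr_key(vr):
    version, _, release = vr.partition("-")
    def fields(s):
        return [int(p) for p in s.split(".") if p.isdigit()]
    return (fields(version), fields(release))

installed = "1.4.1-15"      # version reported in the thread
candidate = "1.4.1-17.1"    # example Z-stream build Honza mentioned
print(nvr_key(installed) < nvr_key(candidate))  # True: a newer build exists
```

In practice the authoritative answer on an RPM system comes from querying the package database (e.g. `rpm -q corosync`), which reports the full name-version-release the maintainers need.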
>> Oct 29 22:11:05 corosync [SYNC ] confchg entries 2 Oct 29 22:11:05 >> corosync [SYNC ] Barrier Start Received From 2 Oct 29 22:11:05 >> corosync [SYNC ] Barrier completion status for nodeid 1 = 1. >> Oct 29 22:11:05 corosync [SYNC ] Barrier completion status for nodeid 2 = 1. >> Oct 29 22:11:05 corosync [SYNC ] Synchronization barrier completed >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[0] group:crmd\x00, ip:r(0) ip(172.28.0.65) , pid:9198 >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[1] group:attrd\x00, ip:r(0) ip(172.28.0.65) , pid:9196 >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[2] group:stonith-ng\x00, ip:r(0) ip(172.28.0.65) , pid:9194 >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[3] group:cib\x00, ip:r(0) ip(172.28.0.65) , pid:9193 >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[4] group:pcmk\x00, ip:r(0) ip(172.28.0.65) , pid:9187 >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[5] group:gfs:controld\x00, ip:r(0) ip(172.28.0.65) , pid:9111 >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[6] group:dlm:controld\x00, ip:r(0) ip(172.28.0.65) , pid:9057 >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[7] group:fenced:default\x00, ip:r(0) ip(172.28.0.65) , pid:9040 >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[8] group:fenced:daemon\x00, ip:r(0) ip(172.28.0.65) , pid:9040 >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[9] group:crmd\x00, ip:r(0) ip(172.28.0.64) , pid:14530 >> Oct 29 22:11:05 corosync [SYNC ] Committing synchronization for >> (corosync cluster closed process group service v1.01) Oct 29 22:11:05 corosync [MAIN ] Completed service synchronization, ready to provide service. 
>> >> Thanks >> Lax >> >> >> -----Original Message----- >> From: linux-cluster-bounces at redhat.com >> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jan Friesse >> Sent: Thursday, October 30, 2014 1:23 AM >> To: linux clustering >> Subject: Re: [Linux-cluster] daemon cpg_join error retrying >> >>> >>>> On 30 Oct 2014, at 9:32 am, Lax Kota (lkota) wrote: >>>> >>>> >>>>>> I wonder if there is a mismatch between the cluster name in cluster.conf and the cluster name the GFS filesystem was created with. >>>>>> How to check cluster name of GFS file system? I had similar configuration running fine in multiple other setups with no such issue. >>>> >>>>> I don't really recall. Hopefully someone more familiar with GFS2 can chime in. >>>> Ok. >>>> >>>>>> >>>>>> Also one more issue I am seeing in one other setup a repeated >>>>>> flood of 'A processor joined or left the membership and a new >>>>>> membership was formed' messages for every 4secs. I am running >>>>>> with default TOTEM settings with token time out as 10 secs. Even >>>>>> after I increase the token, consensus values to be higher. It >>>>>> goes on flooding the same message after newer consensus defined time (eg: >>>>>> if I increase it to be 10secs, then I see new membership formed >>>>>> messages for every 10secs) >>>>>> >>>>>> Oct 29 14:58:10 VSM76-VSOM64 corosync[28388]: [TOTEM ] A processor joined or left the membership and a new membership was formed. >>>>>> Oct 29 14:58:10 VSM76-VSOM64 corosync[28388]: [CPG ] chosen downlist: sender r(0) ip(172.28.0.64) ; members(old:2 left:0) >>>>>> Oct 29 14:58:10 VSM76-VSOM64 corosync[28388]: [MAIN ] Completed service synchronization, ready to provide service. >>>>>> >>>>>> Oct 29 14:58:14 VSM76-VSOM64 corosync[28388]: [TOTEM ] A processor joined or left the membership and a new membership was formed. 
>>>>>> Oct 29 14:58:14 VSM76-VSOM64 corosync[28388]: [CPG ] chosen downlist: sender r(0) ip(172.28.0.64) ; members(old:2 left:0)
>>>>>> Oct 29 14:58:14 VSM76-VSOM64 corosync[28388]: [MAIN ] Completed service synchronization, ready to provide service.
>>>>
>>>>> It does not sound like your network is particularly healthy.
>>>>> Are you using multicast or udpu? If multicast, it might be worth
>>>>> trying udpu
>>>>
>>>> I am using udpu and I also have the firewall opened for ports 5404 & 5405. Tcpdump looks fine too, it does not complain of any issues. This is a VM environment, and even if I switch to the other node within the same VM I keep getting the same failure.
>>>
>>> Depending on what the host and VMs are doing, that might be your problem.
>>> In any case, I will defer to the corosync guys at this point.
>>>
>>
>> Lax,
>> the usual reasons for this problem are:
>> 1. The MTU is too high and fragmented packets are not enabled (take a look at the netmtu configuration option).
>> 2. The config files on the nodes are not in sync, and one node may contain more node entries than the others (this may also be the case if you have two clusters and one cluster contains an entry for a node of the other cluster).
>> 3. The firewall is asymmetrically blocked (so a node can send but not receive). Also keep in mind that ports 5404 & 5405 may not be enough for udpu, because udpu uses one socket per remote node for sending.
>>
>> I would recommend disabling the firewall completely (for testing); if everything then works, you just need to adjust the firewall.
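[Editor's note: the three checks above can be run through on each node with something like the sketch below. The config paths and the `iptables` service name are assumptions for a RHEL 6 era cman/corosync cluster; adapt them to your environment.]

```shell
# Sketch of the three checks above; paths and service names are assumptions.

# 1. MTU: compare the interface MTU with corosync's netmtu (default 1500);
#    netmtu may live in cluster.conf (cman) or corosync.conf.
for f in /etc/cluster/cluster.conf /etc/corosync/corosync.conf; do
    [ -f "$f" ] && grep -i netmtu "$f"
done

# 2. Config drift: checksum the config on every node and compare by hand.
[ -f /etc/cluster/cluster.conf ] && md5sum /etc/cluster/cluster.conf

# 3. Firewall: udpu opens one sending socket per remote node, so rules
#    pinned to ports 5404/5405 can silently drop replies. For a test
#    only, clear the firewall entirely (re-enable and re-tune after):
#        service iptables stop
status="checks done"
echo "$status"
```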
>> >> Regards, >> Honza >> >> >> >>>> >>>> Thanks >>>> Lax >>>> >>>> >>>> >>>> -----Original Message----- >>>> From: linux-cluster-bounces at redhat.com >>>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Andrew >>>> Beekhof >>>> Sent: Wednesday, October 29, 2014 3:17 PM >>>> To: linux clustering >>>> Subject: Re: [Linux-cluster] daemon cpg_join error retrying >>>> >>>> >>>>> On 30 Oct 2014, at 9:06 am, Lax Kota (lkota) wrote: >>>>> >>>>>> I wonder if there is a mismatch between the cluster name in cluster.conf and the cluster name the GFS filesystem was created with. >>>>> How to check cluster name of GFS file system? I had similar configuration running fine in multiple other setups with no such issue. >>>> >>>> I don't really recall. Hopefully someone more familiar with GFS2 can chime in. >>>> >>>>> >>>>> Also one more issue I am seeing in one other setup a repeated >>>>> flood of 'A processor joined or left the membership and a new >>>>> membership was formed' messages for every 4secs. I am running with >>>>> default TOTEM settings with token time out as 10 secs. Even after >>>>> I increase the token, consensus values to be higher. It goes on >>>>> flooding the same message after newer consensus defined time (eg: >>>>> if I increase it to be 10secs, then I see new membership formed >>>>> messages for every >>>>> 10secs) >>>>> >>>>> Oct 29 14:58:10 VSM76-VSOM64 corosync[28388]: [TOTEM ] A processor joined or left the membership and a new membership was formed. >>>>> Oct 29 14:58:10 VSM76-VSOM64 corosync[28388]: [CPG ] chosen downlist: sender r(0) ip(172.28.0.64) ; members(old:2 left:0) >>>>> Oct 29 14:58:10 VSM76-VSOM64 corosync[28388]: [MAIN ] Completed service synchronization, ready to provide service. >>>>> >>>>> Oct 29 14:58:14 VSM76-VSOM64 corosync[28388]: [TOTEM ] A processor joined or left the membership and a new membership was formed. 
>>>>> Oct 29 14:58:14 VSM76-VSOM64 corosync[28388]: [CPG ] chosen downlist: sender r(0) ip(172.28.0.64) ; members(old:2 left:0) >>>>> Oct 29 14:58:14 VSM76-VSOM64 corosync[28388]: [MAIN ] Completed service synchronization, ready to provide service. >>>> >>>> It does not sound like your network is particularly healthy. >>>> Are you using multicast or udpu? If multicast, it might be worth >>>> trying udpu >>>> >>>>> >>>>> Thanks >>>>> Lax >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: linux-cluster-bounces at redhat.com >>>>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Andrew >>>>> Beekhof >>>>> Sent: Wednesday, October 29, 2014 2:42 PM >>>>> To: linux clustering >>>>> Subject: Re: [Linux-cluster] daemon cpg_join error retrying >>>>> >>>>> >>>>>> On 30 Oct 2014, at 8:38 am, Lax Kota (lkota) wrote: >>>>>> >>>>>> Hi All, >>>>>> >>>>>> In one of my setup, I keep getting getting 'gfs_controld[10744]: daemon cpg_join error retrying'. I have a 2 Node setup with pacemaker and corosync. >>>>> >>>>> I wonder if there is a mismatch between the cluster name in cluster.conf and the cluster name the GFS filesystem was created with. >>>>> >>>>>> >>>>>> Even after I force kill the pacemaker processes and reboot the server and bring the pacemaker back up, it keeps giving cpg_join error. Is there any way to fix this issue? 
>>>>>> >>>>>> >>>>>> Thanks >>>>>> Lax >>>>>> >>>>>> -- >>>>>> Linux-cluster mailing list >>>>>> Linux-cluster at redhat.com >>>>>> https://www.redhat.com/mailman/listinfo/linux-cluster >>>>> >>>>> >>>>> -- >>>>> Linux-cluster mailing list >>>>> Linux-cluster at redhat.com >>>>> https://www.redhat.com/mailman/listinfo/linux-cluster >>>>> >>>>> -- >>>>> Linux-cluster mailing list >>>>> Linux-cluster at redhat.com >>>>> https://www.redhat.com/mailman/listinfo/linux-cluster >>>> >>>> >>>> -- >>>> Linux-cluster mailing list >>>> Linux-cluster at redhat.com >>>> https://www.redhat.com/mailman/listinfo/linux-cluster >>>> >>>> -- >>>> Linux-cluster mailing list >>>> Linux-cluster at redhat.com >>>> https://www.redhat.com/mailman/listinfo/linux-cluster >>> >>> >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From lars.ellenberg at linbit.com Wed Nov 5 15:16:58 2014 From: lars.ellenberg at linbit.com (Lars Ellenberg) Date: Wed, 5 Nov 2014 16:16:58 +0100 Subject: [Linux-cluster] [ha-wg-technical] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: <54546D67.6010606@alteeve.ca> References: <540D853F.3090109@redhat.com> <54546A5F.8030207@redhat.com> <54546D67.6010606@alteeve.ca> Message-ID: <20141105151658.GY20549@soda.linbit> On Sat, Nov 01, 2014 at 01:19:35AM -0400, Digimer wrote: > All the cool kids will be there. > > You want to be a cool kid, right? Well, no. ;-) But I'll still be there, and a few other Linbit'ers as well. Fabio, let us know what we could do to help make it happen. Lars > On 01/11/14 01:06 AM, Fabio M. Di Nitto wrote: > > just a kind reminder. > > > >On 9/8/2014 12:30 PM, Fabio M. 
Di Nitto wrote: > >> All, > >> > >> it's been almost 6 years since we had a face to face meeting for all > >> developers and vendors involved in Linux HA. > >> > >> I'd like to try and organize a new event and piggy-back with DevConf in > >> Brno [1]. > >> > >> DevConf will start Friday the 6th of Feb 2015 in Red Hat Brno offices. > >> > >> My suggestion would be to have a 2 days dedicated HA summit the 4th and > >> the 5th of February. > >> > >> The goal for this meeting is to, beside to get to know each other and > >> all social aspect of those events, tune the directions of the various HA > >> projects and explore common areas of improvements. > >> > >> I am also very open to the idea of extending to 3 days, 1 one dedicated > >> to customers/users and 2 dedicated to developers, by starting the 3rd. > >> > >> Thoughts? > >> > >> Fabio > >> > >> PS Please hit reply all or include me in CC just to make sure I'll see > >> an answer :) > >> > >> [1] http://devconf.cz/ > > > > Could you please let me know by end of Nov if you are interested or not? > > > > I have heard only from few people so far. > > > > Cheers > > Fabio From fdinitto at redhat.com Tue Nov 11 08:17:56 2014 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Tue, 11 Nov 2014 09:17:56 +0100 Subject: [Linux-cluster] [ha-wg] [ha-wg-technical] [Linux-HA] [RFC] Organizing HA Summit 2015 In-Reply-To: <20141105151658.GY20549@soda.linbit> References: <540D853F.3090109@redhat.com> <54546A5F.8030207@redhat.com> <54546D67.6010606@alteeve.ca> <20141105151658.GY20549@soda.linbit> Message-ID: <5461C634.5000503@redhat.com> On 11/5/2014 4:16 PM, Lars Ellenberg wrote: > On Sat, Nov 01, 2014 at 01:19:35AM -0400, Digimer wrote: >> All the cool kids will be there. >> >> You want to be a cool kid, right? > > Well, no. ;-) > > But I'll still be there, > and a few other Linbit'ers as well. > > Fabio, let us know what we could do to help make it happen. > I appreciate the offer. 
Assuming we achieve quorum to do the event, I'd say that I'll take care
of the meeting rooms/hotel logistics and one "lunch and learn" pizza
event. It would be nice if others could organize a dinner event.

Cheers
Fabio

> Lars
>
>> On 01/11/14 01:06 AM, Fabio M. Di Nitto wrote:
>>> just a kind reminder.
>>>
>>> On 9/8/2014 12:30 PM, Fabio M. Di Nitto wrote:
>>>> All,
>>>>
>>>> it's been almost 6 years since we had a face to face meeting for all
>>>> developers and vendors involved in Linux HA.
>>>>
>>>> I'd like to try and organize a new event and piggy-back with DevConf in
>>>> Brno [1].
>>>>
>>>> DevConf will start Friday the 6th of Feb 2015 in Red Hat Brno offices.
>>>>
>>>> My suggestion would be to have a 2 days dedicated HA summit the 4th and
>>>> the 5th of February.
>>>>
>>>> The goal for this meeting is to, beside to get to know each other and
>>>> all social aspect of those events, tune the directions of the various HA
>>>> projects and explore common areas of improvements.
>>>>
>>>> I am also very open to the idea of extending to 3 days, 1 one dedicated
>>>> to customers/users and 2 dedicated to developers, by starting the 3rd.
>>>>
>>>> Thoughts?
>>>>
>>>> Fabio
>>>>
>>>> PS Please hit reply all or include me in CC just to make sure I'll see
>>>> an answer :)
>>>>
>>>> [1] http://devconf.cz/
>>>
>>> Could you please let me know by end of Nov if you are interested or not?
>>>
>>> I have heard only from few people so far.
>>>
>>> Cheers
>>> Fabio
> _______________________________________________
> ha-wg mailing list
> ha-wg at lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/ha-wg
>

From loulou07 at 126.com  Wed Nov 12 08:20:42 2014
From: loulou07 at 126.com (=?GBK?B?s8LCpQ==?=)
Date: Wed, 12 Nov 2014 16:20:42 +0800 (CST)
Subject: [Linux-cluster] GFS2: fsid=MyCluster:gfs.1: fatal: invalid
	metadata block
Message-ID: <5251064a.25cb9.149a31720b2.Coremail.loulou07@126.com>

hi, guys

I have a two-node GFS2 cluster based on a logical volume created on the
drbd block device /dev/drbd0. The two nodes' GFS2 mount points are
exported as samba shares. Two clients mount them and copy data into them
respectively. Hours later, one client (call it clientA) has finished all
its tasks, while the other (clientB) is still copying at a very slow
write speed (2-3MB/s; in the normal case 40-100MB/s).

Suspecting that something is wrong with the gfs2 filesystem on the server
node that clientB mounts, I try to write some data into it by executing
the following command:

[root at dcs-229 ~]# dd if=/dev/zero of=./data2 bs=128k count=1000
1000+0 records in
1000+0 records out
131072000 bytes (131 MB) copied, 183.152 s, 716 kB/s

The write speed is far too slow; it almost hangs. I redo it once again,
and it hangs. Then I terminate it with Ctrl+C, and the kernel reports
error messages as follows:

Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1: fatal: invalid metadata block
Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1: bh = 25 (magic number)
Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1: function = gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 393
Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1: jid=0: Trying to acquire journal lock...
Nov 12 11:50:11 dcs-229 kernel: Pid: 12044, comm: glock_workqueue Not tainted 2.6.32-358.el6.x86_64 #1
Nov 12 11:50:11 dcs-229 kernel: Call Trace:
Nov 12 11:50:11 dcs-229 kernel: [] ? gfs2_lm_withdraw+0x102/0x130 [gfs2]
Nov 12 11:50:11 dcs-229 kernel: [] ? wake_bit_function+0x0/0x50
Nov 12 11:50:11 dcs-229 kernel: [] ? gfs2_meta_check_ii+0x45/0x50 [gfs2]
Nov 12 11:50:11 dcs-229 kernel: [] ? gfs2_meta_indirect_buffer+0xf9/0x100 [gfs2]
Nov 12 11:50:11 dcs-229 kernel: [] ? perf_event_task_sched_out+0x33/0x80
Nov 12 11:50:11 dcs-229 kernel: [] ? gfs2_inode_refresh+0x25/0x2c0 [gfs2]
Nov 12 11:50:11 dcs-229 kernel: [] ? inode_go_lock+0x88/0xf0 [gfs2]
Nov 12 11:50:11 dcs-229 kernel: [] ? do_promote+0x1bb/0x330 [gfs2]
Nov 12 11:50:11 dcs-229 kernel: [] ? finish_xmote+0x178/0x410 [gfs2]
Nov 12 11:50:11 dcs-229 kernel: [] ? glock_work_func+0x133/0x1d0 [gfs2]
Nov 12 11:50:11 dcs-229 kernel: [] ? glock_work_func+0x0/0x1d0 [gfs2]
Nov 12 11:50:11 dcs-229 kernel: [] ? worker_thread+0x170/0x2a0
Nov 12 11:50:11 dcs-229 kernel: [] ? autoremove_wake_function+0x0/0x40
Nov 12 11:50:11 dcs-229 kernel: [] ? worker_thread+0x0/0x2a0
Nov 12 11:50:11 dcs-229 kernel: [] ? kthread+0x96/0xa0
Nov 12 11:50:11 dcs-229 kernel: [] ? child_rip+0xa/0x20
Nov 12 11:50:11 dcs-229 kernel: [] ? kthread+0x0/0xa0
Nov 12 11:50:11 dcs-229 kernel: [] ? child_rip+0x0/0x20
Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1: jid=0: Failed

And the other node also reports error messages:

Nov 12 11:48:50 dcs-226 kernel: Pid: 13784, comm: glock_workqueue Not tainted 2.6.32-358.el6.x86_64 #1
Nov 12 11:48:50 dcs-226 kernel: Call Trace:
Nov 12 11:48:50 dcs-226 kernel: [] ? gfs2_lm_withdraw+0x102/0x130 [gfs2]
Nov 12 11:48:50 dcs-226 kernel: [] ? wake_bit_function+0x0/0x50
Nov 12 11:48:50 dcs-226 kernel: [] ? gfs2_meta_check_ii+0x45/0x50 [gfs2]
Nov 12 11:48:50 dcs-226 kernel: [] ? gfs2_meta_indirect_buffer+0xf9/0x100 [gfs2]
Nov 12 11:48:50 dcs-226 kernel: [] ? perf_event_task_sched_out+0x33/0x80
Nov 12 11:48:50 dcs-226 kernel: [] ? gfs2_inode_refresh+0x25/0x2c0 [gfs2]
Nov 12 11:48:50 dcs-226 kernel: [] ? inode_go_lock+0x88/0xf0 [gfs2]
Nov 12 11:48:50 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0: fatal: invalid metadata block
Nov 12 11:48:51 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0: bh = 66213 (magic number)
Nov 12 11:48:51 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0: function = gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 393
Nov 12 11:48:51 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0: about to withdraw this file system
Nov 12 11:48:51 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0: telling LM to unmount
Nov 12 11:48:51 dcs-226 kernel: [] ? do_promote+0x1bb/0x330 [gfs2]
Nov 12 11:48:51 dcs-226 kernel: [] ? finish_xmote+0x178/0x410 [gfs2]
Nov 12 11:48:51 dcs-226 kernel: [] ? glock_work_func+0x133/0x1d0 [gfs2]
Nov 12 11:48:51 dcs-226 kernel: [] ? glock_work_func+0x0/0x1d0 [gfs2]
Nov 12 11:48:51 dcs-226 kernel: [] ? worker_thread+0x170/0x2a0
Nov 12 11:48:51 dcs-226 kernel: [] ? autoremove_wake_function+0x0/0x40
Nov 12 11:48:51 dcs-226 kernel: [] ? worker_thread+0x0/0x2a0
Nov 12 11:48:51 dcs-226 kernel: [] ? kthread+0x96/0xa0
Nov 12 11:48:51 dcs-226 kernel: [] ? child_rip+0xa/0x20
Nov 12 11:48:51 dcs-226 kernel: [] ? kthread+0x0/0xa0
Nov 12 11:48:51 dcs-226 kernel: [] ? child_rip+0x0/0x20

After this, the mount points have crashed. What should I do? Can anyone
help me?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From pradiptasingha at yahoo.com  Wed Nov 12 12:10:28 2014
From: pradiptasingha at yahoo.com (Pradipta Singha)
Date: Wed, 12 Nov 2014 04:10:28 -0800
Subject: [Linux-cluster] Deployment of Redhat cluster setup 6 to provide
	HA to oracle 11g R2
Message-ID: <1415794228.40474.YahooMailNeo@web161705.mail.bf1.yahoo.com>

Hi Team,

I have to set up a 2-node Red Hat cluster 6 to provide HA to an Oracle
11g R2 database with two instances. Kindly help me to set up the cluster.
The shared file systems below (shared between both nodes) are for data
files:

/dev/mapper/vg1-lv3  gfs2  250G  2.2G  248G  1%  /u01
/dev/mapper/vg1-lv4  gfs2  175G  268M  175G  1%  /u02
/dev/mapper/vg1-lv5  gfs2   25G  259M   25G  2%  /u03
/dev/mapper/vg1-lv6  gfs2   25G  259M   25G  2%  /u04
/dev/mapper/vg1-lv7  gfs2   25G  259M   25G  2%  /u05
/dev/mapper/vg1-lv8  gfs2  300G  259M  300G  1%  /u06
/dev/mapper/vg1-lv9  gfs2  300G  1.8G  299G  1%  /u07

And the file systems below (local to each node) are for the database
binaries, on both nodes:

/dev/mapper/vg2-lv1_oracle  ext4  99G  4.5G  89G  5%  /oracle  -> one instance for the Oracle database
/dev/mapper/vg2-lv2_orafmw  ext4  99G   60M  94G  1%  /orafmw  -> another for the application instance

Note: two instances will run, one for the Oracle database and another
for the application.

Thanks
pradipta
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From rpeterso at redhat.com  Wed Nov 12 13:20:38 2014
From: rpeterso at redhat.com (Bob Peterson)
Date: Wed, 12 Nov 2014 08:20:38 -0500 (EST)
Subject: [Linux-cluster] GFS2: fsid=MyCluster:gfs.1: fatal: invalid
	metadata block
In-Reply-To: <5251064a.25cb9.149a31720b2.Coremail.loulou07@126.com>
References: <5251064a.25cb9.149a31720b2.Coremail.loulou07@126.com>
Message-ID: <1613629069.11759385.1415798438904.JavaMail.zimbra@redhat.com>

----- Original Message -----
> hi ,guys
> I have a two-nodes GFS2 cluster based on logic volume created by drbd block
> device /dev/drbd0. The two nodes' mount points of GFS2 filesystem are
> exported by samba share. Then there are two clients mounting and copying
> data into them respectively. Hours later, one client(assume just call it
> clientA) has finished all tasks, while the other client(assume just call it
> clientB) is still copying with very slow write speed(2-3MB/s, in normal case
> 40-100MB/s).
> Then I doubt that the there is something wrong with gfs2 filesystem on the > corresponding server node that clientB mount to, and I try to write some > data into it by > excute commad as follows: > [root at dcs-229 ~]# dd if=/dev/zero of=./data2 bs=128k count=1000 > 1000+0 records in > 1000+0 records out > 131072000 bytes (131 MB) copied, 183.152 s, 716 kB/s > It shows the write speed is too slow, almostly hangs up. I redo it once > again, it hangs up. Then, I terminate it with ?Ctr + c?, and kernel reports > error messages as > follows: > Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1: fatal: invalid > metadata block > Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1: bh = 25 (magic > number) > Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1: function = > gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 393 > Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1: jid=0: Trying to > acquire journal lock... > Nov 12 11:50:11 dcs-229 kernel: Pid: 12044, comm: glock_workqueue Not tainted > 2.6.32-358.el6.x86_64 #1 > Nov 12 11:50:11 dcs-229 kernel: Call Trace: > Nov 12 11:50:11 dcs-229 kernel: [] ? > gfs2_lm_withdraw+0x102/0x130 [gfs2] > Nov 12 11:50:11 dcs-229 kernel: [] ? > wake_bit_function+0x0/0x50 > Nov 12 11:50:11 dcs-229 kernel: [] ? > gfs2_meta_check_ii+0x45/0x50 [gfs2] > Nov 12 11:50:11 dcs-229 kernel: [] ? > gfs2_meta_indirect_buffer+0xf9/0x100 [gfs2] > Nov 12 11:50:11 dcs-229 kernel: [] ? > perf_event_task_sched_out+0x33/0x80 > Nov 12 11:50:11 dcs-229 kernel: [] ? > gfs2_inode_refresh+0x25/0x2c0 [gfs2] > Nov 12 11:50:11 dcs-229 kernel: [] ? > inode_go_lock+0x88/0xf0 [gfs2] > Nov 12 11:50:11 dcs-229 kernel: [] ? do_promote+0x1bb/0x330 > [gfs2] > Nov 12 11:50:11 dcs-229 kernel: [] ? > finish_xmote+0x178/0x410 [gfs2] > Nov 12 11:50:11 dcs-229 kernel: [] ? > glock_work_func+0x133/0x1d0 [gfs2] > Nov 12 11:50:11 dcs-229 kernel: [] ? > glock_work_func+0x0/0x1d0 [gfs2] > Nov 12 11:50:11 dcs-229 kernel: [] ? 
> worker_thread+0x170/0x2a0 > Nov 12 11:50:11 dcs-229 kernel: [] ? > autoremove_wake_function+0x0/0x40 > Nov 12 11:50:11 dcs-229 kernel: [] ? > worker_thread+0x0/0x2a0 > Nov 12 11:50:11 dcs-229 kernel: [] ? kthread+0x96/0xa0 > Nov 12 11:50:11 dcs-229 kernel: [] ? child_rip+0xa/0x20 > Nov 12 11:50:11 dcs-229 kernel: [] ? kthread+0x0/0xa0 > Nov 12 11:50:11 dcs-229 kernel: [] ? child_rip+0x0/0x20 > Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1: jid=0: Failed > And the other node also reports error messages: > Nov 12 11:48:50 dcs-226 kernel: Pid: 13784, comm: glock_workqueue Not tainted > 2.6.32-358.el6.x86_64 #1 > Nov 12 11:48:50 dcs-226 kernel: Call Trace: > Nov 12 11:48:50 dcs-226 kernel: [] ? > gfs2_lm_withdraw+0x102/0x130 [gfs2] > Nov 12 11:48:50 dcs-226 kernel: [] ? > wake_bit_function+0x0/0x50 > Nov 12 11:48:50 dcs-226 kernel: [] ? > gfs2_meta_check_ii+0x45/0x50 [gfs2] > Nov 12 11:48:50 dcs-226 kernel: [] ? > gfs2_meta_indirect_buffer+0xf9/0x100 [gfs2] > Nov 12 11:48:50 dcs-226 kernel: [] ? > perf_event_task_sched_out+0x33/0x80 > Nov 12 11:48:50 dcs-226 kernel: [] ? > gfs2_inode_refresh+0x25/0x2c0 [gfs2] > Nov 12 11:48:50 dcs-226 kernel: [] ? > inode_go_lock+0x88/0xf0 [gfs2] > Nov 12 11:48:50 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0: fatal: invalid > metadata block > Nov 12 11:48:51 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0: bh = 66213 > (magic number) > Nov 12 11:48:51 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0: function = > gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 393 > Nov 12 11:48:51 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0: about to withdraw > this file system > Nov 12 11:48:51 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0: telling LM to > unmount > Nov 12 11:48:51 dcs-226 kernel: [] ? do_promote+0x1bb/0x330 > [gfs2] > Nov 12 11:48:51 dcs-226 kernel: [] ? > finish_xmote+0x178/0x410 [gfs2] > Nov 12 11:48:51 dcs-226 kernel: [] ? > glock_work_func+0x133/0x1d0 [gfs2] > Nov 12 11:48:51 dcs-226 kernel: [] ? 
> glock_work_func+0x0/0x1d0 [gfs2] > Nov 12 11:48:51 dcs-226 kernel: [] ? > worker_thread+0x170/0x2a0 > Nov 12 11:48:51 dcs-226 kernel: [] ? > autoremove_wake_function+0x0/0x40 > Nov 12 11:48:51 dcs-226 kernel: [] ? > worker_thread+0x0/0x2a0 > Nov 12 11:48:51 dcs-226 kernel: [] ? kthread+0x96/0xa0 > Nov 12 11:48:51 dcs-226 kernel: [] ? child_rip+0xa/0x20 > Nov 12 11:48:51 dcs-226 kernel: [] ? kthread+0x0/0xa0 > Nov 12 11:48:51 dcs-226 kernel: [] ? child_rip+0x0/0x20 > After this, mount points has crashed. what should i do? Anyone could help me? Hi, I recommend you open a support case with Red Hat. If you're not a Red Hat customer, you can open a bugzilla record, save off the metadata for that file system (with gfs2_edit savemeta) and post a link to it in the bugzilla. The hang and the assert should not happen. Regards, Bob Peterson Red Hat File Systems From fdinitto at redhat.com Mon Nov 24 14:54:33 2014 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Mon, 24 Nov 2014 15:54:33 +0100 Subject: [Linux-cluster] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: <20141124143957.GU2508@suse.de> References: <540D853F.3090109@redhat.com> <20141124143957.GU2508@suse.de> Message-ID: <547346A9.6010901@redhat.com> On 11/24/2014 3:39 PM, Lars Marowsky-Bree wrote: > On 2014-09-08T12:30:23, "Fabio M. Di Nitto" wrote: > > Folks, Fabio, > > thanks for organizing this and getting the ball rolling. And again sorry > for being late to said game; I was busy elsewhere. > > However, it seems that the idea for such a HA Summit in Brno/Feb 2015 > hasn't exactly fallen on fertile grounds, even with the suggested > user/client day. (Or if there was a lot of feedback, it wasn't > public.) > > I wonder why that is, and if/how we can make this more attractive? > > Frankly, as might have been obvious ;-), for me the venue is an issue. > It's not easy to reach, and I'm theoretically fairly close in Germany > already. 
>
> I wonder if we could increase participation with a virtual meeting (on
> either those dates or another), similar to what the Ceph Developer
> Summit does?
>
> Those appear really productive and make it possible for a wide range of
> interested parties from all over the world to attend, regardless of
> travel times, or even just attend select sessions (that would otherwise
> make it hard to justify travel expenses & time off).
>
>
> Alternatively, would a relocation to a more connected venue help, such
> as Vienna xor Prague?
>
>
> I'd love to get some more feedback from the community.

I agree; some feedback would be useful.

> As Fabio put it, yes, I *can* suck it up and go to Brno if that's where
> everyone goes to play ;-), but I'd also prefer to have a broader
> participation.

Dates and location were chosen to piggy-back with devconf.cz and allow
people to travel for more than just the HA Summit. I'd prefer, at least
for this round, to keep the dates/location and explore the option of
allowing people to join remotely. After all, there are tons of tools,
between Google Hangouts and others, that would allow that.

Fabio

From lists at alteeve.ca  Mon Nov 24 15:06:45 2014
From: lists at alteeve.ca (Digimer)
Date: Mon, 24 Nov 2014 10:06:45 -0500
Subject: [Linux-cluster] [Pacemaker] [ha-wg] [RFC] Organizing HA Summit 2015
In-Reply-To: <547346A9.6010901@redhat.com>
References: <540D853F.3090109@redhat.com> <20141124143957.GU2508@suse.de>
	<547346A9.6010901@redhat.com>
Message-ID: <54734985.1060106@alteeve.ca>

On 24/11/14 09:54 AM, Fabio M. Di Nitto wrote:
> On 11/24/2014 3:39 PM, Lars Marowsky-Bree wrote:
>> On 2014-09-08T12:30:23, "Fabio M. Di Nitto" wrote:
>>
>> Folks, Fabio,
>>
>> thanks for organizing this and getting the ball rolling. And again sorry
>> for being late to said game; I was busy elsewhere.
>>
>> However, it seems that the idea for such a HA Summit in Brno/Feb 2015
>> hasn't exactly fallen on fertile grounds, even with the suggested
>> user/client day.
(Or if there was a lot of feedback, it wasn't >> public.) >> >> I wonder why that is, and if/how we can make this more attractive? I suspect a lot of it is that, given people's busy schedules, February seems far away. Also, I wonder how much discussion has happened outside of these lists. Is it really that there hasn't been much feedback? Fabio started this ball rolling, so I would be interested to hear what he's heard. >> Frankly, as might have been obvious ;-), for me the venue is an issue. >> It's not easy to reach, and I'm theoretically fairly close in Germany >> already. >> >> I wonder if we could increase participation with a virtual meeting (on >> either those dates or another), similar to what the Ceph Developer >> Summit does? Requested feedback given: virtual meetings are never as good, and I really don't like this idea. In my experience, just as much productive decision-making happens in the unofficial after-hours activities as during formal(ish) meetings/presentations. I think it is very important that the meeting remain in-person if at all possible. >> Those appear really productive and make it possible for a wide range of >> interested parties from all over the world to attend, regardless of >> travel times, or even just attend select sessions (that would otherwise >> make it hard to justify travel expenses & time off). >> >> Alternatively, would a relocation to a more connected venue help, such >> as Vienna xor Prague? Personally, I don't care where we meet, but I do believe Fabio already ruled out a relocation. >> I'd love to get some more feedback from the community. > > I agree. Some feedback would be useful. -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education?
From lists at alteeve.ca Mon Nov 24 15:14:26 2014 From: lists at alteeve.ca (Digimer) Date: Mon, 24 Nov 2014 10:14:26 -0500 Subject: [Linux-cluster] [Pacemaker] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: <20141124151235.GX2508@suse.de> References: <540D853F.3090109@redhat.com> <20141124143957.GU2508@suse.de> <547346A9.6010901@redhat.com> <20141124151235.GX2508@suse.de> Message-ID: <54734B52.50708@alteeve.ca> On 24/11/14 10:12 AM, Lars Marowsky-Bree wrote: > Beijing, the US, Tasmania (OK, one crazy guy), various countries in Oh, bring him! crazy++ -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From fdinitto at redhat.com Mon Nov 24 15:16:05 2014 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Mon, 24 Nov 2014 16:16:05 +0100 Subject: [Linux-cluster] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: <20141124151235.GX2508@suse.de> References: <540D853F.3090109@redhat.com> <20141124143957.GU2508@suse.de> <547346A9.6010901@redhat.com> <20141124151235.GX2508@suse.de> Message-ID: <54734BB5.3010104@redhat.com> On 11/24/2014 4:12 PM, Lars Marowsky-Bree wrote: > On 2014-11-24T15:54:33, "Fabio M. Di Nitto" wrote: > >> dates and location were chosen to piggy-back with devconf.cz and allow >> people to travel for more than just HA Summit. > > Yeah, well, devconf.cz is not such an interesting event for those who do > not wear the fedora ;-) That would be the perfect opportunity for you to convert users to SUSE ;) > >> I'd prefer, at least for this round, to keep dates/location and explore >> the option to allow people to join remotely. After all there are tons of >> tools between Google Hangouts and others that would allow that. > > That is, in my experience, the absolute worst. It creates second class > participants and is a PITA for everyone. I agree, it is still a way for people to join in tho.
> > I know that an in-person meeting is useful, but we have a large team in > Beijing, the US, Tasmania (OK, one crazy guy), various countries in > Europe etc. > Yes same here. No difference.. we have one crazy guy in Australia.. Fabio From andrew at beekhof.net Mon Nov 24 21:31:33 2014 From: andrew at beekhof.net (Andrew Beekhof) Date: Tue, 25 Nov 2014 08:31:33 +1100 Subject: [Linux-cluster] [ha-wg-technical] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: <20141124151235.GX2508@suse.de> References: <540D853F.3090109@redhat.com> <20141124143957.GU2508@suse.de> <547346A9.6010901@redhat.com> <20141124151235.GX2508@suse.de> Message-ID: > On 25 Nov 2014, at 2:12 am, Lars Marowsky-Bree wrote: > > On 2014-11-24T15:54:33, "Fabio M. Di Nitto" wrote: > >> dates and location were chosen to piggy-back with devconf.cz and allow >> people to travel for more than just HA Summit. > > Yeah, well, devconf.cz is not such an interesting event for those who do > not wear the fedora ;-) It's not necessarily the conference of choice even for people that do. I just do what I'm told :) > >> I'd prefer, at least for this round, to keep dates/location and explore >> the option to allow people to join remotely. After all there are tons of >> tools between Google Hangouts and others that would allow that. > > That is, in my experience, the absolute worst. It creates second class > participants and is a PITA for everyone. > > I know that an in-person meeting is useful, but we have a large team in > Beijing, the US, Tasmania (OK, one crazy guy), various countries in > Europe etc. > > > Regards, > Lars > > -- > Architect Storage/HA > SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg) > "Experience is the name everyone gives to their mistakes."
-- Oscar Wilde > > _______________________________________________ > ha-wg-technical mailing list > ha-wg-technical at lists.linux-foundation.org > https://lists.linuxfoundation.org/mailman/listinfo/ha-wg-technical From andrew at beekhof.net Tue Nov 25 21:31:02 2014 From: andrew at beekhof.net (Andrew Beekhof) Date: Wed, 26 Nov 2014 08:31:02 +1100 Subject: [Linux-cluster] [Cluster-devel] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: <20141125095401.GG2522@suse.de> References: <540D853F.3090109@redhat.com> <20141124143957.GU2508@suse.de> <547346A9.6010901@redhat.com> <20141124151235.GX2508@suse.de> <54734BB5.3010104@redhat.com> <20141125095401.GG2522@suse.de> Message-ID: > On 25 Nov 2014, at 8:54 pm, Lars Marowsky-Bree wrote: > > On 2014-11-24T16:16:05, "Fabio M. Di Nitto" wrote: > >>> Yeah, well, devconf.cz is not such an interesting event for those who do >>> not wear the fedora ;-) >> That would be the perfect opportunity for you to convert users to SUSE ;) > >>>> I'd prefer, at least for this round, to keep dates/location and explore >>>> the option to allow people to join remotely. After all there are tons of >>>> tools between Google Hangouts and others that would allow that. >>> That is, in my experience, the absolute worst. It creates second class >>> participants and is a PITA for everyone. >> I agree, it is still a way for people to join in tho. > > I personally disagree. In my experience, one either does a face-to-face > meeting, or a virtual one that puts everyone on the same footing. > Mixing both works really badly unless the team already knows each > other. > >>> I know that an in-person meeting is useful, but we have a large team in >>> Beijing, the US, Tasmania (OK, one crazy guy), various countries in >>> Europe etc. >> Yes same here. No difference.. we have one crazy guy in Australia.. > > Yeah, but you're already bringing him for your personal conference. > That's a bit different. ;-) > > OK, let's switch tracks a bit.
What *topics* do we actually have? Can we > fill two days? Where would we want to collect them? Personally I'm interested in talking about scaling - with pacemaker-remoted and/or a new messaging/membership layer. Other design-y topics: - SBD - degraded mode - improved notifications - containerisation of services (cgroups, docker, virt) - resource-agents (upstream releases, handling of pull requests, testing) User-facing topics could include recent features (i.e. pacemaker-remoted, crm_resource --restart) and common deployment scenarios (e.g. NFS) that people get wrong. From dvossel at redhat.com Tue Nov 25 21:46:01 2014 From: dvossel at redhat.com (David Vossel) Date: Tue, 25 Nov 2014 16:46:01 -0500 (EST) Subject: [Linux-cluster] [Pacemaker] [Cluster-devel] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: References: <540D853F.3090109@redhat.com> <20141124143957.GU2508@suse.de> <547346A9.6010901@redhat.com> <20141124151235.GX2508@suse.de> <54734BB5.3010104@redhat.com> <20141125095401.GG2522@suse.de> Message-ID: <1770308907.3548355.1416951961151.JavaMail.zimbra@redhat.com> ----- Original Message ----- > > > On 25 Nov 2014, at 8:54 pm, Lars Marowsky-Bree wrote: > > > > On 2014-11-24T16:16:05, "Fabio M. Di Nitto" wrote: > > > >>> Yeah, well, devconf.cz is not such an interesting event for those who do > >>> not wear the fedora ;-) > >> That would be the perfect opportunity for you to convert users to SUSE ;) > > > >>>> I'd prefer, at least for this round, to keep dates/location and explore > >>>> the option to allow people to join remotely. After all there are tons of > >>>> tools between Google Hangouts and others that would allow that. > >>> That is, in my experience, the absolute worst. It creates second class > >>> participants and is a PITA for everyone. > >> I agree, it is still a way for people to join in tho. > > > > I personally disagree.
In my experience, one either does a face-to-face > > meeting, or a virtual one that puts everyone on the same footing. > > Mixing both works really badly unless the team already knows each > > other. > > > >>> I know that an in-person meeting is useful, but we have a large team in > >>> Beijing, the US, Tasmania (OK, one crazy guy), various countries in > >>> Europe etc. > >> Yes same here. No difference.. we have one crazy guy in Australia.. > > > > Yeah, but you're already bringing him for your personal conference. > > That's a bit different. ;-) > > > > OK, let's switch tracks a bit. What *topics* do we actually have? Can we > > fill two days? Where would we want to collect them? > > Personally I'm interested in talking about scaling - with pacemaker-remoted > and/or a new messaging/membership layer. If we're going to talk about scaling, we should throw in our new Docker support in the same discussion. Docker lends itself well to the "pet vs cattle" analogy. I see management of Docker with pacemaker making quite a bit of sense now that we have the ability to scale into the "cattle" territory. > Other design-y topics: > - SBD > - degraded mode > - improved notifications > - containerisation of services (cgroups, docker, virt) > - resource-agents (upstream releases, handling of pull requests, testing) Yep, we definitely need to talk about the resource-agents. > > User-facing topics could include recent features (i.e. pacemaker-remoted, > crm_resource --restart) and common deployment scenarios (e.g. NFS) that > people get wrong. Adding to the list, it would be a good idea to talk about deployment integration testing, what's going on with the phd project and why it's important regardless of whether you're interested in what the project functionally does.
-- Vossel > _______________________________________________ > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org > http://oss.clusterlabs.org/mailman/listinfo/pacemaker > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org > From lists at alteeve.ca Tue Nov 25 23:06:29 2014 From: lists at alteeve.ca (Digimer) Date: Tue, 25 Nov 2014 18:06:29 -0500 Subject: [Linux-cluster] [ha-wg-technical] [Cluster-devel] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: References: <540D853F.3090109@redhat.com> <20141124143957.GU2508@suse.de> <547346A9.6010901@redhat.com> <20141124151235.GX2508@suse.de> <54734BB5.3010104@redhat.com> <20141125095401.GG2522@suse.de> Message-ID: <54750B75.6040207@alteeve.ca> On 25/11/14 04:31 PM, Andrew Beekhof wrote: >> Yeah, but you're already bringing him for your personal conference. >> That's a bit different. ;-) >> >> OK, let's switch tracks a bit. What *topics* do we actually have? Can we >> fill two days? Where would we want to collect them? > > Personally I'm interested in talking about scaling - with pacemaker-remoted and/or a new messaging/membership layer. > > Other design-y topics: > - SBD > - degraded mode > - improved notifications This may be something my company can bring to the table. We just hired a dev whose principal goal is to develop an alert system for HA. We're modelling it heavily on the fence/resource agent model with a "scan core" and "scan agents". It's sort of like existing tools, but designed specifically for HA clusters and heavily focused on not interfering with the host more than at all necessary. By Feb., it should be mostly done.
> - containerisation of services (cgroups, docker, virt) > - resource-agents (upstream releases, handling of pull requests, testing) > > User-facing topics could include recent features (i.e. pacemaker-remoted, crm_resource --restart) and common deployment scenarios (e.g. NFS) that people get wrong. > > > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ > What if the cure for cancer is trapped in the mind of a person without access to education? From andrew at beekhof.net Tue Nov 25 23:11:25 2014 From: andrew at beekhof.net (Andrew Beekhof) Date: Wed, 26 Nov 2014 10:11:25 +1100 Subject: [Linux-cluster] [ha-wg-technical] [Cluster-devel] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: <54750B75.6040207@alteeve.ca> References: <540D853F.3090109@redhat.com> <20141124143957.GU2508@suse.de> <547346A9.6010901@redhat.com> <20141124151235.GX2508@suse.de> <54734BB5.3010104@redhat.com> <20141125095401.GG2522@suse.de> <54750B75.6040207@alteeve.ca> Message-ID: <44714D54-50F1-4817-9B9D-09B64C128EC6@beekhof.net> > On 26 Nov 2014, at 10:06 am, Digimer wrote: > > On 25/11/14 04:31 PM, Andrew Beekhof wrote: >>> Yeah, but you're already bringing him for your personal conference. >>> That's a bit different. ;-) >>> >>> OK, let's switch tracks a bit. What *topics* do we actually have? Can we >>> fill two days? Where would we want to collect them? >> >> Personally I'm interested in talking about scaling - with pacemaker-remoted and/or a new messaging/membership layer. >> >> Other design-y topics: >> - SBD >> - degraded mode >> - improved notifications > > This may be something my company can bring to the table. We just hired a dev whose principal goal is to develop an alert system for HA. We're modelling it heavily on the fence/resource agent model with a "scan core" and "scan agents". It's sort of like existing tools, but designed specifically for HA clusters and heavily focused on not interfering with the host more than at all necessary. By Feb., it should be mostly done.
> > We're doing this for our own needs, but it might be a framework worth talking about, if nothing else to see if others consider it a fit. Of course, it will be entirely open source. *If* there is interest, I could put together a(n informal) talk on it with a demo. Definitely interesting > >> - containerisation of services (cgroups, docker, virt) >> - resource-agents (upstream releases, handling of pull requests, testing) >> >> User-facing topics could include recent features (i.e. pacemaker-remoted, crm_resource --restart) and common deployment scenarios (e.g. NFS) that people get wrong. > > > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ > What if the cure for cancer is trapped in the mind of a person without access to education? From lists at alteeve.ca Wed Nov 26 05:58:30 2014 From: lists at alteeve.ca (Digimer) Date: Wed, 26 Nov 2014 00:58:30 -0500 Subject: [Linux-cluster] [Cluster-devel] [ha-wg] [Linux-HA] [RFC] Organizing HA Summit 2015 In-Reply-To: <54756A76.60905@fabbione.net> References: <540D853F.3090109@redhat.com> <20141124143957.GU2508@suse.de> <547346A9.6010901@redhat.com> <20141124151235.GX2508@suse.de> <54734BB5.3010104@redhat.com> <20141125095401.GG2522@suse.de> <54756A76.60905@fabbione.net> Message-ID: <54756C06.4090508@alteeve.ca> On 26/11/14 12:51 AM, Fabio M. Di Nitto wrote: > > > On 11/25/2014 10:54 AM, Lars Marowsky-Bree wrote: >> On 2014-11-24T16:16:05, "Fabio M. Di Nitto" wrote: >> >>>> Yeah, well, devconf.cz is not such an interesting event for those who do >>>> not wear the fedora ;-) >>> That would be the perfect opportunity for you to convert users to SUSE ;) >> >>>>> I'd prefer, at least for this round, to keep dates/location and explore >>>>> the option to allow people to join remotely. After all there are tons of >>>>> tools between Google Hangouts and others that would allow that. >>>> That is, in my experience, the absolute worst. It creates second class >>>> participants and is a PITA for everyone.
>>> I agree, it is still a way for people to join in tho. >> >> I personally disagree. In my experience, one either does a face-to-face >> meeting, or a virtual one that puts everyone on the same footing. >> Mixing both works really badly unless the team already knows each >> other. >> >>>> I know that an in-person meeting is useful, but we have a large team in >>>> Beijing, the US, Tasmania (OK, one crazy guy), various countries in >>>> Europe etc. >>> Yes same here. No difference.. we have one crazy guy in Australia.. >> >> Yeah, but you're already bringing him for your personal conference. >> That's a bit different. ;-) >> >> OK, let's switch tracks a bit. What *topics* do we actually have? Can we >> fill two days? Where would we want to collect them? > > I'd say either a google doc or any random etherpad/wiki instance will do > just fine. > > As for the topics: > - corosync qdevice and plugins (network, disk, integration with sbd?, > others?) > - corosync RRP / libknet integration/replacement > - fence autodetection/autoconfiguration > > For the user-facing topics (that is if there are enough participants and > I only got 1 user confirmation so far): > > - demos, cluster 101, tutorials > - get feedback > - get feedback > - get more feedback > > Fabio I'd be happy to do a cluster 101 or similar, if there is interest. Not sure if that would be particularly appealing to anyone coming to our meeting, as I think anyone interested is probably well past 101. :) Anyway, you guys know my background, let me know if there is a topic you'd like me to cover for the user side of things. -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education?
From lists at alteeve.ca Wed Nov 26 06:10:52 2014 From: lists at alteeve.ca (Digimer) Date: Wed, 26 Nov 2014 01:10:52 -0500 Subject: [Linux-cluster] [ha-wg-technical] [ha-wg] [Linux-HA] [RFC] Organizing HA Summit 2015 In-Reply-To: <54756A76.60905@fabbione.net> References: <540D853F.3090109@redhat.com> <20141124143957.GU2508@suse.de> <547346A9.6010901@redhat.com> <20141124151235.GX2508@suse.de> <54734BB5.3010104@redhat.com> <20141125095401.GG2522@suse.de> <54756A76.60905@fabbione.net> Message-ID: <54756EEC.7050905@alteeve.ca> On 26/11/14 12:51 AM, Fabio M. Di Nitto wrote: > > > On 11/25/2014 10:54 AM, Lars Marowsky-Bree wrote: >> On 2014-11-24T16:16:05, "Fabio M. Di Nitto" wrote: >> >>>> Yeah, well, devconf.cz is not such an interesting event for those who do >>>> not wear the fedora ;-) >>> That would be the perfect opportunity for you to convert users to SUSE ;) >> >>>>> I'd prefer, at least for this round, to keep dates/location and explore >>>>> the option to allow people to join remotely. After all there are tons of >>>>> tools between Google Hangouts and others that would allow that. >>>> That is, in my experience, the absolute worst. It creates second class >>>> participants and is a PITA for everyone. >>> I agree, it is still a way for people to join in tho. >> >> I personally disagree. In my experience, one either does a face-to-face >> meeting, or a virtual one that puts everyone on the same footing. >> Mixing both works really badly unless the team already knows each >> other. >> >>>> I know that an in-person meeting is useful, but we have a large team in >>>> Beijing, the US, Tasmania (OK, one crazy guy), various countries in >>>> Europe etc. >>> Yes same here. No difference.. we have one crazy guy in Australia.. >> >> Yeah, but you're already bringing him for your personal conference. >> That's a bit different. ;-) >> >> OK, let's switch tracks a bit. What *topics* do we actually have? Can we >> fill two days? Where would we want to collect them?
> > I'd say either a google doc or any random etherpad/wiki instance will do > just fine. > > As for the topics: > - corosync qdevice and plugins (network, disk, integration with sbd?, > others?) > - corosync RRP / libknet integration/replacement > - fence autodetection/autoconfiguration > > For the user-facing topics (that is if there are enough participants and > I only got 1 user confirmation so far): > > - demos, cluster 101, tutorials > - get feedback > - get feedback > - get more feedback > > Fabio Ok, I do have a topic I want to add: merging the dozen different mailing lists, IRC channels and other support forums. This thread is a good example of how thinly the community is spread. A 'dev', 'user', 'announce' list should be enough for all HA. Likewise, one IRC channel should be enough, too. The trick will be discussing this without bikeshedding. :) digimer -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From andrew at beekhof.net Wed Nov 26 06:28:25 2014 From: andrew at beekhof.net (Andrew Beekhof) Date: Wed, 26 Nov 2014 17:28:25 +1100 Subject: [Linux-cluster] [ha-wg-technical] [ha-wg] [Linux-HA] [RFC] Organizing HA Summit 2015 In-Reply-To: <54756A76.60905@fabbione.net> References: <540D853F.3090109@redhat.com> <20141124143957.GU2508@suse.de> <547346A9.6010901@redhat.com> <20141124151235.GX2508@suse.de> <54734BB5.3010104@redhat.com> <20141125095401.GG2522@suse.de> <54756A76.60905@fabbione.net> Message-ID: > On 26 Nov 2014, at 4:51 pm, Fabio M. Di Nitto wrote: > > > > On 11/25/2014 10:54 AM, Lars Marowsky-Bree wrote: >> On 2014-11-24T16:16:05, "Fabio M.
Di Nitto" wrote: >> >>>> Yeah, well, devconf.cz is not such an interesting event for those who do >>>> not wear the fedora ;-) >>> That would be the perfect opportunity for you to convert users to SUSE ;) >> >>>>> I'd prefer, at least for this round, to keep dates/location and explore >>>>> the option to allow people to join remotely. After all there are tons of >>>>> tools between Google Hangouts and others that would allow that. >>>> That is, in my experience, the absolute worst. It creates second class >>>> participants and is a PITA for everyone. >>> I agree, it is still a way for people to join in tho. >> >> I personally disagree. In my experience, one either does a face-to-face >> meeting, or a virtual one that puts everyone on the same footing. >> Mixing both works really badly unless the team already knows each >> other. >> >>>> I know that an in-person meeting is useful, but we have a large team in >>>> Beijing, the US, Tasmania (OK, one crazy guy), various countries in >>>> Europe etc. >>> Yes same here. No difference.. we have one crazy guy in Australia.. >> >> Yeah, but you're already bringing him for your personal conference. >> That's a bit different. ;-) >> >> OK, let's switch tracks a bit. What *topics* do we actually have? Can we >> fill two days? Where would we want to collect them? > > I'd say either a google doc or any random etherpad/wiki instance will do > just fine. -ENOGOOGLE > > As for the topics: > - corosync qdevice and plugins (network, disk, integration with sbd?, > others?)
> - corosync RRP / libknet integration/replacement > - fence autodetection/autoconfiguration > > For the user-facing topics (that is if there are enough participants and > I only got 1 user confirmation so far): > > - demos, cluster 101, tutorials > - get feedback > - get feedback > - get more feedback > > Fabio > _______________________________________________ > ha-wg-technical mailing list > ha-wg-technical at lists.linux-foundation.org > https://lists.linuxfoundation.org/mailman/listinfo/ha-wg-technical From bubble at hoster-ok.com Wed Nov 26 15:53:50 2014 From: bubble at hoster-ok.com (Vladislav Bogdanov) Date: Wed, 26 Nov 2014 18:53:50 +0300 Subject: [Linux-cluster] [Pacemaker] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: <20141125095401.GG2522@suse.de> References: <540D853F.3090109@redhat.com> <20141124143957.GU2508@suse.de> <547346A9.6010901@redhat.com> <20141124151235.GX2508@suse.de> <54734BB5.3010104@redhat.com> <20141125095401.GG2522@suse.de> Message-ID: <5475F78E.1040700@hoster-ok.com> 25.11.2014 12:54, Lars Marowsky-Bree wrote:... > > OK, let's switch tracks a bit. What *topics* do we actually have? Can we > fill two days? Where would we want to collect them? > Just my 2c. - It would be interesting to get a bird's-eye view of what C APIs corosync and pacemaker currently provide to application developers (one immediate use case is in-app monitoring of cluster events). - One more (more developer-focused) topic could be "resource degraded state" support. From the user perspective it would be nice to have. One immediate example is an iSCSI connection to several portals. When some portals are not accessible, the connection may still work, but in a "degraded" state.
Best, Vladislav From raju.rajsand at gmail.com Wed Nov 26 18:43:11 2014 From: raju.rajsand at gmail.com (Rajagopal Swaminathan) Date: Thu, 27 Nov 2014 00:13:11 +0530 Subject: [Linux-cluster] [Pacemaker] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: <5475F78E.1040700@hoster-ok.com> References: <540D853F.3090109@redhat.com> <20141124143957.GU2508@suse.de> <547346A9.6010901@redhat.com> <20141124151235.GX2508@suse.de> <54734BB5.3010104@redhat.com> <20141125095401.GG2522@suse.de> <5475F78E.1040700@hoster-ok.com> Message-ID: Greetings, Guys, I am a poor Indian whom the US of A abhors, and I have successfully deployed over 5 CentOS/RHEL clusters varying from 4-6. May I know where this event is held? Why don't you shift it to India, which is much less expensive for all? I will try to invigorate all ILUG groups as much as I can. On Wed, Nov 26, 2014 at 9:23 PM, Vladislav Bogdanov wrote: > 25.11.2014 12:54, Lars Marowsky-Bree wrote:... >> >> OK, let's switch tracks a bit. What *topics* do we actually have? Can we >> fill two days? Where would we want to collect them? >> > > Just my 2c. > > - It would be interesting to get a bird's-eye view > of what C APIs corosync and pacemaker currently provide to application > developers (one immediate use case is in-app monitoring of cluster > events). > > - One more (more developer-focused) topic could be "resource degraded > state" support. From the user perspective it would be nice to have. One > immediate example is an iSCSI connection to several portals. When some > portals are not accessible, the connection may still work, but in a > "degraded" state.
> > Best, > Vladislav > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- Regards, Rajagopal From misch at schwartzkopff.org Wed Nov 26 19:00:33 2014 From: misch at schwartzkopff.org (Michael Schwartzkopff) Date: Wed, 26 Nov 2014 20:00:33 +0100 Subject: [Linux-cluster] [Pacemaker] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: References: <540D853F.3090109@redhat.com> <5475F78E.1040700@hoster-ok.com> Message-ID: <1875415.HLenkzVapo@nb003> Am Donnerstag, 27. November 2014, 00:13:11 schrieb Rajagopal Swaminathan: > Greetings, > > > Guys, I am a poor Indian whom the US of A abhors, and I have successfully > deployed over 5 CentOS/RHEL clusters varying from 4-6. > > May I know where this event is held? Brno, Slovakia. Nearest international airport: Vienna. > Why don't you shift it to India, which is much less expensive for all? Because flights to India would be more expensive for most of the participants. Greetings, -- Dr. Michael Schwartzkopff Guardinistr. 63 81375 München Tel: (0162) 1650044 Fax: (089) 620 304 13 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 230 bytes Desc: This is a digitally signed message part. URL: From mgrac at redhat.com Wed Nov 26 21:18:15 2014 From: mgrac at redhat.com (Marek "marx" Grac) Date: Wed, 26 Nov 2014 22:18:15 +0100 Subject: [Linux-cluster] [Pacemaker] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: <1875415.HLenkzVapo@nb003> References: <540D853F.3090109@redhat.com> <5475F78E.1040700@hoster-ok.com> <1875415.HLenkzVapo@nb003> Message-ID: <54764397.60701@redhat.com> On 11/26/2014 08:00 PM, Michael Schwartzkopff wrote: > Am Donnerstag, 27. November 2014, 00:13:11 schrieb Rajagopal Swaminathan: >> Greetings, >> >> >> Guys, I am a poor Indian whom the US of A abhors, and I have successfully >> deployed over 5 CentOS/RHEL clusters varying from 4-6.
>> >> May I know where this event is held? > Brno, Slovakia. Nearest international airport: Vienna. Brno is quite close to Slovakia, but it is in the Czech Republic. International airports around are Vienna, Prague and mostly low-cost ones in Brno and Bratislava m, From andrew at beekhof.net Wed Nov 26 22:26:52 2014 From: andrew at beekhof.net (Andrew Beekhof) Date: Thu, 27 Nov 2014 09:26:52 +1100 Subject: [Linux-cluster] [ha-wg-technical] [ha-wg] [Pacemaker] [Cluster-devel] [Linux-HA] [RFC] Organizing HA Summit 2015 In-Reply-To: <20141126154119.GN2522@suse.de> References: <540D853F.3090109@redhat.com> <20141124143957.GU2508@suse.de> <547346A9.6010901@redhat.com> <20141124151235.GX2508@suse.de> <54734BB5.3010104@redhat.com> <20141125095401.GG2522@suse.de> <1770308907.3548355.1416951961151.JavaMail.zimbra@redhat.com> <20141126154119.GN2522@suse.de> Message-ID: <76F44DBB-4E4B-4813-81E2-B0A5A664BD1A@beekhof.net> > On 27 Nov 2014, at 2:41 am, Lars Marowsky-Bree wrote: > > On 2014-11-25T16:46:01, David Vossel wrote: > > Okay, okay, apparently we have got enough topics to discuss. I'll > grumble a bit more about Brno, but let's get the organisation of that > thing on track ... Sigh. Always so much work! > > I'm assuming arrival on the 3rd and departure on the 6th would be the > plan? > >>> Personally I'm interested in talking about scaling - with pacemaker-remoted >>> and/or a new messaging/membership layer. >> If we're going to talk about scaling, we should throw in our new Docker support >> in the same discussion. Docker lends itself well to the "pet vs cattle" analogy. >> I see management of Docker with pacemaker making quite a bit of sense now that we >> have the ability to scale into the "cattle" territory. > > While we're on that, I'd like to throw in a heretical thought and suggest > that one might want to look at etcd and fleetd. Nod. I suspect the next evolutionary step will be to sit on a NoSQL/Big-data kind of table.... somehow.
I was intending to head down that path last year when I did all that CIB work. > >>> Other design-y topics: >>> - SBD > Point taken. I have actually not forgotten this, Andrew, and am reading > your development. I probably just need to pull the code over ... ok > >>> - degraded mode >>> - improved notifications >>> - containerisation of services (cgroups, docker, virt) >>> - resource-agents (upstream releases, handling of pull requests, testing) >> >> Yep, we definitely need to talk about the resource-agents. > > Agreed. > >>> User-facing topics could include recent features (i.e. pacemaker-remoted, >>> crm_resource --restart) and common deployment scenarios (e.g. NFS) that >>> people get wrong. >> Adding to the list, it would be a good idea to talk about deployment >> integration testing, what's going on with the phd project and why it's >> important regardless of whether you're interested in what the project functionally >> does. > > OK. So QA is within scope as well. It seems the agenda will fill up > quite nicely. > > > Regards, > Lars > > -- > Architect Storage/HA > SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg) > "Experience is the name everyone gives to their mistakes."
-- Oscar Wilde > > _______________________________________________ > ha-wg-technical mailing list > ha-wg-technical at lists.linux-foundation.org > https://lists.linuxfoundation.org/mailman/listinfo/ha-wg-technical From andrew at beekhof.net Wed Nov 26 22:28:45 2014 From: andrew at beekhof.net (Andrew Beekhof) Date: Thu, 27 Nov 2014 09:28:45 +1100 Subject: [Linux-cluster] [Pacemaker] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: <54764397.60701@redhat.com> References: <540D853F.3090109@redhat.com> <5475F78E.1040700@hoster-ok.com> <1875415.HLenkzVapo@nb003> <54764397.60701@redhat.com> Message-ID: <7BC3FDC4-8218-47B8-BAE7-8D512D4C988E@beekhof.net> > On 27 Nov 2014, at 8:18 am, Marek marx Grac wrote: > > > On 11/26/2014 08:00 PM, Michael Schwartzkopff wrote: >> Am Donnerstag, 27. November 2014, 00:13:11 schrieb Rajagopal Swaminathan: >>> Greetings, >>> >>> >>> Guys, I am a poor Indian whom US of A Abhors and have successfully >>> deployed over 5 centos/rhel clusts vaying from 4-6. >>> >>> May I Know where this event is held? >> Brno, Slovakia. Next international Airport: Vienna. > Brno is quite close to Slovakia but it is in Czech Republic. International airports around are Vienna, Prague and mostly low-cost ones in Brno and Bratislava Anyone want to meet in munich and share a car? :-) From misch at schwartzkopff.org Wed Nov 26 22:51:58 2014 From: misch at schwartzkopff.org (Michael Schwartzkopff) Date: Wed, 26 Nov 2014 23:51:58 +0100 Subject: [Linux-cluster] [Pacemaker] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: <7BC3FDC4-8218-47B8-BAE7-8D512D4C988E@beekhof.net> References: <540D853F.3090109@redhat.com> <54764397.60701@redhat.com> <7BC3FDC4-8218-47B8-BAE7-8D512D4C988E@beekhof.net> Message-ID: <3827969.fRP4QWWuTV@nb003> Am Donnerstag, 27. November 2014, 09:28:45 schrieb Andrew Beekhof: > > On 27 Nov 2014, at 8:18 am, Marek marx Grac wrote: > > > > On 11/26/2014 08:00 PM, Michael Schwartzkopff wrote: > >> Am Donnerstag, 27. 
November 2014, 00:13:11 schrieb Rajagopal Swaminathan: > >>> Greetings, > >>> > >>> > >>> Guys, I am a poor Indian whom US of A Abhors and have successfully > >>> deployed over 5 centos/rhel clusts vaying from 4-6. > >>> > >>> May I Know where this event is held? > >> > >> Brno, Slovakia. Next international Airport: Vienna. > > > > Brno is quite close to Slovakia but it is in Czech Republic. International > > airports around are Vienna, Prague and mostly low-cost ones in Brno and > > Bratislava > Anyone want to meet in munich and share a car? :-) Quite a ride: google says 6 hours. But you are welcome. I'll drive. Anyone else? Sorry. -ENOGOOGLE, I forgot. -- Dr. Michael Schwartzkopff Guardinistr. 63 81375 München Tel: (0162) 1650044 Fax: (089) 620 304 13 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 230 bytes Desc: This is a digitally signed message part. URL: From lists at alteeve.ca Wed Nov 26 22:58:51 2014 From: lists at alteeve.ca (Digimer) Date: Wed, 26 Nov 2014 17:58:51 -0500 Subject: [Linux-cluster] [Pacemaker] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: <7BC3FDC4-8218-47B8-BAE7-8D512D4C988E@beekhof.net> References: <540D853F.3090109@redhat.com> <5475F78E.1040700@hoster-ok.com> <1875415.HLenkzVapo@nb003> <54764397.60701@redhat.com> <7BC3FDC4-8218-47B8-BAE7-8D512D4C988E@beekhof.net> Message-ID: <54765B2B.3060703@alteeve.ca> On 26/11/14 05:28 PM, Andrew Beekhof wrote: > >> On 27 Nov 2014, at 8:18 am, Marek marx Grac wrote: >> >> >> On 11/26/2014 08:00 PM, Michael Schwartzkopff wrote: >>> Am Donnerstag, 27. November 2014, 00:13:11 schrieb Rajagopal Swaminathan: >>>> Greetings, >>>> >>>> >>>> Guys, I am a poor Indian whom US of A Abhors and have successfully >>>> deployed over 5 centos/rhel clusts vaying from 4-6. >>>> >>>> May I Know where this event is held? >>> Brno, Slovakia. Next international Airport: Vienna. 
>> Brno is quite close to Slovakia but it is in Czech Republic. International airports around are Vienna, Prague and mostly low-cost ones in Brno and Bratislava > > Anyone want to meet in munich and share a car? :-) I might be up for that. I've not looked into flights yet, though I do have a standing invitation for beer in Vienna, so I'm sort of planning to fly through there. Apparently there is a very convenient bus from Vienna to Brno. Why Munich? (Don't get me wrong, I loved it there last year!) -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From andrew at beekhof.net Thu Nov 27 00:40:31 2014 From: andrew at beekhof.net (Andrew Beekhof) Date: Thu, 27 Nov 2014 11:40:31 +1100 Subject: [Linux-cluster] [Pacemaker] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: <54765B2B.3060703@alteeve.ca> References: <540D853F.3090109@redhat.com> <5475F78E.1040700@hoster-ok.com> <1875415.HLenkzVapo@nb003> <54764397.60701@redhat.com> <7BC3FDC4-8218-47B8-BAE7-8D512D4C988E@beekhof.net> <54765B2B.3060703@alteeve.ca> Message-ID: <7C5D986E-1CF2-4991-AFA7-6BEB9E552D41@beekhof.net> > On 27 Nov 2014, at 9:58 am, Digimer wrote: > > On 26/11/14 05:28 PM, Andrew Beekhof wrote: >> >>> On 27 Nov 2014, at 8:18 am, Marek marx Grac wrote: >>> >>> >>> On 11/26/2014 08:00 PM, Michael Schwartzkopff wrote: >>>> Am Donnerstag, 27. November 2014, 00:13:11 schrieb Rajagopal Swaminathan: >>>>> Greetings, >>>>> >>>>> >>>>> Guys, I am a poor Indian whom US of A Abhors and have successfully >>>>> deployed over 5 centos/rhel clusts vaying from 4-6. >>>>> >>>>> May I Know where this event is held? >>>> Brno, Slovakia. Next international Airport: Vienna. >>> Brno is quite close to Slovakia but it is in Czech Republic. International airports around are Vienna, Prague and mostly low-cost ones in Brno and Bratislava >> >> Anyone want to meet in munich and share a car? :-) > > I might be up for that. 
I've not looked into flights yet, though I do have a standing invitation for beer in Vienna, so I'm sort of planning to fly through there. Apparently there is a very convenient bus from Vienna to Brno. > > Why Munich? (Don't get me wrong, I loved it there last year!) It's both a) a hub and b) where I used to live :) From lists at alteeve.ca Thu Nov 27 04:13:54 2014 From: lists at alteeve.ca (Digimer) Date: Wed, 26 Nov 2014 23:13:54 -0500 Subject: [Linux-cluster] [Pacemaker] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: <7C5D986E-1CF2-4991-AFA7-6BEB9E552D41@beekhof.net> References: <540D853F.3090109@redhat.com> <5475F78E.1040700@hoster-ok.com> <1875415.HLenkzVapo@nb003> <54764397.60701@redhat.com> <7BC3FDC4-8218-47B8-BAE7-8D512D4C988E@beekhof.net> <54765B2B.3060703@alteeve.ca> Message-ID: <5476A502.6010508@alteeve.ca> On 26/11/14 07:40 PM, Andrew Beekhof wrote: > >> On 27 Nov 2014, at 9:58 am, Digimer wrote: >> >> On 26/11/14 05:28 PM, Andrew Beekhof wrote: >>> >>>> On 27 Nov 2014, at 8:18 am, Marek marx Grac wrote: >>>> >>>> >>>> On 11/26/2014 08:00 PM, Michael Schwartzkopff wrote: >>>>> Am Donnerstag, 27. November 2014, 00:13:11 schrieb Rajagopal Swaminathan: >>>>>> Greetings, >>>>>> >>>>>> >>>>>> Guys, I am a poor Indian whom US of A Abhors and have successfully >>>>>> deployed over 5 centos/rhel clusts vaying from 4-6. >>>>>> >>>>>> May I Know where this event is held? >>>>> Brno, Slovakia. Next international Airport: Vienna. >>>> Brno is quite close to Slovakia but it is in Czech Republic. International airports around are Vienna, Prague and mostly low-cost ones in Brno and Bratislava >>> >>> Anyone want to meet in munich and share a car? :-) >> >> I might be up for that. 
(Don't get me wrong, I loved it there last year!) > > Its both a) a hub and b) where I used to live :) Ah. Well, I'll see how the prices come out. I didn't realize it was a 6h drive. On the other hand, it'd be a great way to see Europe beyond hotels/airports... Are you serious about the 6h drive though? That's quite the ride. -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From kgronlund at suse.com Thu Nov 27 12:33:30 2014 From: kgronlund at suse.com (Kristoffer =?utf-8?Q?Gr=C3=B6nlund?=) Date: Thu, 27 Nov 2014 13:33:30 +0100 Subject: [Linux-cluster] [Pacemaker] [ha-wg-technical] [ha-wg] [Cluster-devel] [Linux-HA] [RFC] Organizing HA Summit 2015 In-Reply-To: <76F44DBB-4E4B-4813-81E2-B0A5A664BD1A@beekhof.net> References: <540D853F.3090109@redhat.com> <20141124143957.GU2508@suse.de> <547346A9.6010901@redhat.com> <20141124151235.GX2508@suse.de> <54734BB5.3010104@redhat.com> <20141125095401.GG2522@suse.de> <1770308907.3548355.1416951961151.JavaMail.zimbra@redhat.com> <20141126154119.GN2522@suse.de> <76F44DBB-4E4B-4813-81E2-B0A5A664BD1A@beekhof.net> Message-ID: <87lhmw6cx1.fsf@krigpad.site> >> On 27 Nov 2014, at 2:41 am, Lars Marowsky-Bree wrote: >> >> On 2014-11-25T16:46:01, David Vossel wrote: >> >> Okay, okay, apparently we have got enough topics to discuss. I'll >> grumble a bit more about Brno, but let's get the organisation of that >> thing on track ... Sigh. Always so much work! >> Will Chris Feist be at the summit? I would be happy to have a roundtable discussion or something similar about clients, exchange ideas and so on. I don't necessarily think that there is an urgent need to unify the efforts code-wise, but I think there is a lot we could do together on the level of idea exchange without giving up our independence, so to speak ;) Of course I would be happy to talk about such things with anyone else who is interested as well. 
-- // Kristoffer Grönlund // kgronlund at suse.com From fdinitto at redhat.com Thu Nov 27 12:56:34 2014 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Thu, 27 Nov 2014 13:56:34 +0100 Subject: [Linux-cluster] [ha-wg-technical] [Pacemaker] [ha-wg] [Cluster-devel] [Linux-HA] [RFC] Organizing HA Summit 2015 In-Reply-To: <87lhmw6cx1.fsf@krigpad.site> References: <540D853F.3090109@redhat.com> <20141124143957.GU2508@suse.de> <547346A9.6010901@redhat.com> <20141124151235.GX2508@suse.de> <54734BB5.3010104@redhat.com> <20141125095401.GG2522@suse.de> <1770308907.3548355.1416951961151.JavaMail.zimbra@redhat.com> <20141126154119.GN2522@suse.de> <76F44DBB-4E4B-4813-81E2-B0A5A664BD1A@beekhof.net> <87lhmw6cx1.fsf@krigpad.site> Message-ID: <54771F82.6050800@redhat.com> On 11/27/2014 1:33 PM, Kristoffer Grönlund wrote: > >>> On 27 Nov 2014, at 2:41 am, Lars Marowsky-Bree wrote: >>> >>> On 2014-11-25T16:46:01, David Vossel wrote: >>> >>> Okay, okay, apparently we have got enough topics to discuss. I'll >>> grumble a bit more about Brno, but let's get the organisation of that >>> thing on track ... Sigh. Always so much work! >>> > > Will Chris Feist be at the summit? I would be happy to have a roundtable > discussion or something similar about clients, exchange ideas and so > on. I don't necessarily think that there is an urgent need to unify the > efforts code-wise, but I think there is a lot we could do together on > the level of idea exchange without giving up our independence, so to > speak ;) > > Of course I would be happy to talk about such things with anyone else > who is interested as well. > sorry, I keep replying from my private email address... Yes Chris will be there too. 
Fabio From lists at alteeve.ca Thu Nov 27 16:52:18 2014 From: lists at alteeve.ca (Digimer) Date: Thu, 27 Nov 2014 11:52:18 -0500 Subject: [Linux-cluster] Wiki for planning created - Re: [Pacemaker] [RFC] Organizing HA Summit 2015 In-Reply-To: <540D853F.3090109@redhat.com> References: <540D853F.3090109@redhat.com> Message-ID: <547756C2.1060504@alteeve.ca> I just created a dedicated/fresh wiki for planning and organizing: http://plan.alteeve.ca/index.php/Main_Page Other than the domain, it has no association with any existing project, so it should be a neutral enough platform. Also, it's not owned by $megacorp (I wish!), so spying/privacy shouldn't be an issue I hope. If there is concern, I can set up https. If no one else gets to it before me, I'll start collating the data from the mailing list onto that wiki tomorrow (maaaybe today, depends). The wiki requires registration, but that's it. I'm not bothering with captchas because, in my experience, spammers walk right through them anyway. I do have edits emailed to me, so I can catch and roll back any spam quickly. -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From lists at alteeve.ca Fri Nov 28 05:37:53 2014 From: lists at alteeve.ca (Digimer) Date: Fri, 28 Nov 2014 00:37:53 -0500 Subject: [Linux-cluster] [Cluster-devel] Wiki for planning created - Re: [Pacemaker] [RFC] Organizing HA Summit 2015 In-Reply-To: <5478090E.6030804@fabbione.net> References: <540D853F.3090109@redhat.com> <547756C2.1060504@alteeve.ca> <5478090E.6030804@fabbione.net> Message-ID: <54780A31.8060806@alteeve.ca> On 28/11/14 12:33 AM, Fabio M. Di Nitto wrote: > > > On 11/27/2014 5:52 PM, Digimer wrote: >> I just created a dedicated/fresh wiki for planning and organizing: >> >> http://plan.alteeve.ca/index.php/Main_Page >> >> Other than the domain, it has no association with any existing project, >> so it should be a neutral enough platform. 
Also, it's not owned by >> $megacorp (I wish!), so spying/privacy shouldn't be an issue I hope. If >> there is concern, I can setup https. >> >> If no one else gets to it before me, I'll start collating the data from >> the mailing list onto that wiki tomorrow (maaaybe today, depends). >> >> The wiki requires registration, but that's it. I'm not bothering with >> captchas because, in my experience, spammer walk right through them >> anyway. I do have edits email me, so I can catch and roll back any spam >> quickly. >> > > Awesome! thanks for taking care of it. Do you have a chance to add also > an instance of etherpad to the site? > > Mostly to do collaborative editing while we sit all around the same table. > > Otherwise we can use a public instance and copy paste info after that in > the wiki. > > Fabio Never tried setting up etherpad before, but if it runs on rhel 6, I should have no problem setting it up. -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From rajpatel at redhat.com Fri Nov 28 05:51:20 2014 From: rajpatel at redhat.com (Rajat) Date: Fri, 28 Nov 2014 11:21:20 +0530 Subject: [Linux-cluster] Cluster Overhead I/O, Network, Memory, CPU Message-ID: <54780D58.9010400@redhat.com> Hey Team, Our customer is using RHEL 5.X and RHEL 6.X as Cluster in they production stack. Customer is looking is there any doc/white paper which can share they management as cluster service usages on Disk % Network % Memory % CPU % Gratitude -- -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: vc.jpg Type: image/jpeg Size: 22087 bytes Desc: not available URL: From jpokorny at redhat.com Fri Nov 28 19:10:06 2014 From: jpokorny at redhat.com (Jan =?utf-8?Q?Pokorn=C3=BD?=) Date: Fri, 28 Nov 2014 20:10:06 +0100 Subject: [Linux-cluster] [Pacemaker] [Cluster-devel] Wiki for planning created - Re: [RFC] Organizing HA Summit 2015 In-Reply-To: <54780A31.8060806@alteeve.ca> References: <540D853F.3090109@redhat.com> <547756C2.1060504@alteeve.ca> <5478090E.6030804@fabbione.net> <54780A31.8060806@alteeve.ca> Message-ID: <20141128191006.GD31780@redhat.com> On 28/11/14 00:37 -0500, Digimer wrote: > On 28/11/14 12:33 AM, Fabio M. Di Nitto wrote: >> On 11/27/2014 5:52 PM, Digimer wrote: >>> I just created a dedicated/fresh wiki for planning and organizing: >>> >>> http://plan.alteeve.ca/index.php/Main_Page >>> >>> [...] >> >> Awesome! thanks for taking care of it. Do you have a chance to add also >> an instance of etherpad to the site? >> >> Mostly to do collaborative editing while we sit all around the same table. >> >> Otherwise we can use a public instance and copy paste info after that in >> the wiki. >> > Never tried setting up etherpad before, but if it runs on rhel 6, I should > have no problem setting it up. Provided no conspiracy is being started, there are a bunch of popular instances, e.g. http://piratepad.net/ -- Jan -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 819 bytes Desc: not available URL: From jpokorny at redhat.com Fri Nov 28 23:56:46 2014 From: jpokorny at redhat.com (Jan =?utf-8?Q?Pokorn=C3=BD?=) Date: Sat, 29 Nov 2014 00:56:46 +0100 Subject: [Linux-cluster] Cluster Overhead I/O, Network, Memory, CPU In-Reply-To: <54780D58.9010400@redhat.com> References: <54780D58.9010400@redhat.com> Message-ID: <20141128235646.GG31780@redhat.com> On 28/11/14 11:21 +0530, Rajat wrote: > Hey Team, Perhaps Friday kicked in and this was intended for internal RH lists. 
Don't cluster around this too much :) -- Jan -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 819 bytes Desc: not available URL: From fdinitto at redhat.com Sat Nov 29 05:45:03 2014 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Sat, 29 Nov 2014 06:45:03 +0100 Subject: [Linux-cluster] [Cluster-devel] [Pacemaker] Wiki for planning created - Re: [RFC] Organizing HA Summit 2015 In-Reply-To: <20141128191006.GD31780@redhat.com> References: <540D853F.3090109@redhat.com> <547756C2.1060504@alteeve.ca> <5478090E.6030804@fabbione.net> <54780A31.8060806@alteeve.ca> <20141128191006.GD31780@redhat.com> Message-ID: <54795D5F.2080203@redhat.com> On 11/28/2014 8:10 PM, Jan Pokorný wrote: > On 28/11/14 00:37 -0500, Digimer wrote: >> On 28/11/14 12:33 AM, Fabio M. Di Nitto wrote: >>> On 11/27/2014 5:52 PM, Digimer wrote: >>>> I just created a dedicated/fresh wiki for planning and organizing: >>>> >>>> http://plan.alteeve.ca/index.php/Main_Page >>>> >>>> [...] >>> >>> Awesome! thanks for taking care of it. Do you have a chance to add also >>> an instance of etherpad to the site? >>> >>> Mostly to do collaborative editing while we sit all around the same table. >>> >>> Otherwise we can use a public instance and copy paste info after that in >>> the wiki. >>> >> Never tried setting up etherpad before, but if it runs on rhel 6, I should >> have no problem setting it up. > > Provided no conspiracy to be started, there are a bunch of popular > instances, e.g. http://piratepad.net/ > Right, some of them only store etherpads for 30 days. Just be careful which one we choose, or we can make our own. 
Fabio From lists at alteeve.ca Sat Nov 29 05:50:50 2014 From: lists at alteeve.ca (Digimer) Date: Sat, 29 Nov 2014 00:50:50 -0500 Subject: [Linux-cluster] [Cluster-devel] [Pacemaker] Wiki for planning created - Re: [RFC] Organizing HA Summit 2015 In-Reply-To: <54795D5F.2080203@redhat.com> References: <540D853F.3090109@redhat.com> <547756C2.1060504@alteeve.ca> <5478090E.6030804@fabbione.net> <54780A31.8060806@alteeve.ca> <20141128191006.GD31780@redhat.com> <54795D5F.2080203@redhat.com> Message-ID: <54795EBA.2030807@alteeve.ca> On 29/11/14 12:45 AM, Fabio M. Di Nitto wrote: > > > On 11/28/2014 8:10 PM, Jan Pokorný wrote: >> On 28/11/14 00:37 -0500, Digimer wrote: >>> On 28/11/14 12:33 AM, Fabio M. Di Nitto wrote: >>>> On 11/27/2014 5:52 PM, Digimer wrote: >>>>> I just created a dedicated/fresh wiki for planning and organizing: >>>>> >>>>> http://plan.alteeve.ca/index.php/Main_Page >>>>> >>>>> [...] >>>> >>>> Awesome! thanks for taking care of it. Do you have a chance to add also >>>> an instance of etherpad to the site? >>>> >>>> Mostly to do collaborative editing while we sit all around the same table. >>>> >>>> Otherwise we can use a public instance and copy paste info after that in >>>> the wiki. >>>> >>> Never tried setting up etherpad before, but if it runs on rhel 6, I should >>> have no problem setting it up. >> >> Provided no conspiracy to be started, there are a bunch of popular >> instances, e.g. http://piratepad.net/ >> > > Right, some of them only store etherpads for 30 days. Just be careful > the one we choose or we make our own. > > Fabio I'll set one up, but I'll need a few days, I'm out of the country at the moment. It's not needed until the conference, is it? Or will you want to have it before then? -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? 
From lists at alteeve.ca Sun Nov 30 05:56:37 2014 From: lists at alteeve.ca (Digimer) Date: Sun, 30 Nov 2014 00:56:37 -0500 Subject: [Linux-cluster] [ha-wg-technical] Wiki for planning created - Re: [Pacemaker] [RFC] Organizing HA Summit 2015 In-Reply-To: <547756C2.1060504@alteeve.ca> References: <540D853F.3090109@redhat.com> <547756C2.1060504@alteeve.ca> Message-ID: <547AB195.5030100@alteeve.ca> On 27/11/14 11:52 AM, Digimer wrote: > I just created a dedicated/fresh wiki for planning and organizing: > > http://plan.alteeve.ca/index.php/Main_Page > > Other than the domain, it has no association with any existing project, > so it should be a neutral enough platform. Also, it's not owned by > $megacorp (I wish!), so spying/privacy shouldn't be an issue I hope. If > there is concern, I can setup https. > > If no one else gets to it before me, I'll start collating the data from > the mailing list onto that wiki tomorrow (maaaybe today, depends). > > The wiki requires registration, but that's it. I'm not bothering with > captchas because, in my experience, spammer walk right through them > anyway. I do have edits email me, so I can catch and roll back any spam > quickly. Ok, I was getting 3~5 spam accounts created per day. To deal with this, I set up the 'questy' captcha program with five (random) questions that should be easy to answer, even for non-English speakers. Just the same, if anyone has any trouble registering, please feel free to email me directly and I will be happy to help. Madi -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education?