From fdinitto at redhat.com Sat Nov 1 05:06:39 2014 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Sat, 01 Nov 2014 06:06:39 +0100 Subject: [Linux-cluster] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: <540D853F.3090109@redhat.com> References: <540D853F.3090109@redhat.com> Message-ID: <54546A5F.8030207@redhat.com> Just a kind reminder. On 9/8/2014 12:30 PM, Fabio M. Di Nitto wrote: > All, > > it's been almost 6 years since we had a face-to-face meeting for all > developers and vendors involved in Linux HA. > > I'd like to try and organize a new event, piggy-backing on DevConf in > Brno [1]. > > DevConf will start Friday the 6th of Feb 2015 in Red Hat's Brno offices. > > My suggestion would be to have a dedicated 2-day HA summit on the 4th and > the 5th of February. > > The goal of this meeting, besides getting to know each other and > the social aspects of such events, is to tune the directions of the various HA > projects and explore common areas of improvement. > > I am also very open to the idea of extending it to 3 days, 1 dedicated > to customers/users and 2 dedicated to developers, by starting on the 3rd. > > Thoughts? > > Fabio > > PS Please hit reply-all or include me in CC just to make sure I'll see > your answer :) > > [1] http://devconf.cz/ Could you please let me know by the end of November whether you are interested? I have heard from only a few people so far. Cheers Fabio From lists at alteeve.ca Sat Nov 1 05:19:35 2014 From: lists at alteeve.ca (Digimer) Date: Sat, 01 Nov 2014 01:19:35 -0400 Subject: [Linux-cluster] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: <54546A5F.8030207@redhat.com> References: <540D853F.3090109@redhat.com> <54546A5F.8030207@redhat.com> Message-ID: <54546D67.6010606@alteeve.ca> All the cool kids will be there. You want to be a cool kid, right? :p On 01/11/14 01:06 AM, Fabio M. Di Nitto wrote: > just a kind reminder. > > On 9/8/2014 12:30 PM, Fabio M. 
Di Nitto wrote: >> All, >> >> it's been almost 6 years since we had a face-to-face meeting for all >> developers and vendors involved in Linux HA. >> >> I'd like to try and organize a new event, piggy-backing on DevConf in >> Brno [1]. >> >> DevConf will start Friday the 6th of Feb 2015 in Red Hat's Brno offices. >> >> My suggestion would be to have a dedicated 2-day HA summit on the 4th and >> the 5th of February. >> >> The goal of this meeting, besides getting to know each other and >> the social aspects of such events, is to tune the directions of the various HA >> projects and explore common areas of improvement. >> >> I am also very open to the idea of extending it to 3 days, 1 dedicated >> to customers/users and 2 dedicated to developers, by starting on the 3rd. >> >> Thoughts? >> >> Fabio >> >> PS Please hit reply-all or include me in CC just to make sure I'll see >> your answer :) >> >> [1] http://devconf.cz/ > > Could you please let me know by the end of November whether you are interested? > > I have heard from only a few people so far. > > Cheers > Fabio > _______________________________________________ > Linux-HA mailing list > Linux-HA at lists.linux-ha.org > http://lists.linux-ha.org/mailman/listinfo/linux-ha > See also: http://linux-ha.org/ReportingProblems > -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From jfriesse at redhat.com Mon Nov 3 09:02:57 2014 From: jfriesse at redhat.com (Jan Friesse) Date: Mon, 03 Nov 2014 10:02:57 +0100 Subject: [Linux-cluster] daemon cpg_join error retrying In-Reply-To: References: <68ABE774-8755-416F-829B-CED002B14D03@beekhof.net> <5451F581.5050100@redhat.com> <5453BC31.2000102@redhat.com> Message-ID: <545744C1.4080007@redhat.com> Lax, > > > >> This is just weird. What exact version of corosync are you running? Do you have latest Z stream? 
> I am running on Corosync 1.4.1 and the pacemaker version is 1.1.8-7.el6 Are you running a package version (like RHEL/CentOS) or did you compile the package yourself? If a package version, can you please send the exact version (like 1.4.1-17.1)? > How should I get access to the Z stream? Is there a specific dir I should pick this Z stream from? For RHEL you are subscribed to RHN, so you should get it automatically; the same applies to CentOS. Regards, Honza > > Thanks > Lax > > > -----Original Message----- > From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jan Friesse > Sent: Friday, October 31, 2014 9:43 AM > To: linux clustering > Subject: Re: [Linux-cluster] daemon cpg_join error retrying > > Lax, > > >> Thanks Honza. Here is what I was doing, >> >>> usual reasons for this problem: >>> 1. MTU is too high and fragmented packets are not enabled (take a >>> look at the netmtu configuration option) >> I am running with the default MTU setting, which is 1500. And I do see the interface (eth1) on the box has an MTU of 1500 too. >> > > Keep in mind that if they are not directly connected, a switch can drop packets because of the MTU. > >> >> 2. config files on nodes are not in sync and one node may contain more node entries than other nodes (this may also be the case if you have two > clusters and one cluster contains an entry of one node for the other cluster) 3. firewall is asymmetrically blocked (so a node can send but not receive). Also keep in mind that ports 5404 & 5405 may not be enough for udpu, because udpu uses one socket per remote node for sending. >> Verified my config files cluster.conf and cib.xml and both have the same >> number of node entries (2) >> >>> I would recommend disabling the firewall completely (for testing); if everything then works, you just need to adjust the firewall. 
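Honza's three usual suspects (netmtu, node lists out of sync, firewall) all map onto a handful of cluster.conf settings. A hypothetical fragment for RHEL 6 / cman is sketched below; the attribute names follow the corosync totem options passed through cluster.conf, and all values (cluster name aside, which appears in the logs above) are illustrative assumptions, not taken from the poster's actual configuration:

```xml
<!-- Hypothetical /etc/cluster/cluster.conf fragment (RHEL 6 / cman).
     netmtu is lowered as a test for paths that drop large/fragmented
     frames; token is in milliseconds, and corosync derives consensus
     from it (default 1.2 * token) unless set explicitly. -->
<cluster name="vsomcluster" config_version="2">
  <cman transport="udpu"/>
  <totem token="10000" netmtu="1402"/>
  <clusternodes>
    <clusternode name="node1" nodeid="1"/>
    <clusternode name="node2" nodeid="2"/>
  </clusternodes>
</cluster>
```

The same `<clusternodes>` list must be identical on both nodes, per point 2 of the checklist.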
>> I also ran tests with firewall off too on both the participating >> nodes, still see same issue >> >> In corosync log I see repeated set of these messages, hoping these will give some more pointers. >> >> Oct 29 22:11:02 corosync [SYNC ] Committing synchronization for >> (corosync cluster closed process group service v1.01) Oct 29 22:11:02 corosync [MAIN ] Completed service synchronization, ready to provide service. >> Oct 29 22:11:02 corosync [TOTEM ] waiting_trans_ack changed to 0 Oct >> 29 22:11:03 corosync [TOTEM ] entering GATHER state from 11. >> Oct 29 22:11:03 corosync [TOTEM ] entering GATHER state from 10. >> Oct 29 22:11:05 corosync [TOTEM ] entering GATHER state from 0. > > This is just weird. What exact version of corosync are you running? Do you have latest Z stream? > > Regards, > Honza > >> Oct 29 22:11:05 corosync [TOTEM ] got commit token Oct 29 22:11:05 >> corosync [TOTEM ] Saving state aru 1b high seq received 1b Oct 29 >> 22:11:05 corosync [TOTEM ] Storing new sequence id for ring 51708 Oct >> 29 22:11:05 corosync [TOTEM ] entering COMMIT state. >> Oct 29 22:11:05 corosync [TOTEM ] got commit token Oct 29 22:11:05 >> corosync [TOTEM ] entering RECOVERY state. >> Oct 29 22:11:05 corosync [TOTEM ] TRANS [0] member 172.28.0.64: >> Oct 29 22:11:05 corosync [TOTEM ] TRANS [1] member 172.28.0.65: >> Oct 29 22:11:05 corosync [TOTEM ] position [0] member 172.28.0.64: >> Oct 29 22:11:05 corosync [TOTEM ] previous ring seq 333572 rep >> 172.28.0.64 Oct 29 22:11:05 corosync [TOTEM ] aru 1b high delivered 1b >> received flag 1 Oct 29 22:11:05 corosync [TOTEM ] position [1] member 172.28.0.65: >> Oct 29 22:11:05 corosync [TOTEM ] previous ring seq 333572 rep >> 172.28.0.64 Oct 29 22:11:05 corosync [TOTEM ] aru 1b high delivered 1b >> received flag 1 Oct 29 22:11:05 corosync [TOTEM ] Did not need to originate any messages in recovery. 
>> Oct 29 22:11:05 corosync [TOTEM ] token retrans flag is 0 my set >> retrans flag0 retrans queue empty 1 count 0, aru ffffffff Oct 29 >> 22:11:05 corosync [TOTEM ] install seq 0 aru 0 high seq received 0 Oct >> 29 22:11:05 corosync [TOTEM ] token retrans flag is 0 my set retrans >> flag0 retrans queue empty 1 count 1, aru 0 Oct 29 22:11:05 corosync >> [TOTEM ] install seq 0 aru 0 high seq received 0 Oct 29 22:11:05 >> corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans >> queue empty 1 count 2, aru 0 Oct 29 22:11:05 corosync [TOTEM ] install >> seq 0 aru 0 high seq received 0 Oct 29 22:11:05 corosync [TOTEM ] >> token retrans flag is 0 my set retrans flag0 retrans queue empty 1 >> count 3, aru 0 Oct 29 22:11:05 corosync [TOTEM ] install seq 0 aru 0 >> high seq received 0 Oct 29 22:11:05 corosync [TOTEM ] retrans flag >> count 4 token aru 0 install seq 0 aru 0 0 Oct 29 22:11:05 corosync >> [TOTEM ] Resetting old ring state Oct 29 22:11:05 corosync [TOTEM ] >> recovery to regular 1-0 Oct 29 22:11:05 corosync [CMAN ] ais: >> confchg_fn called type = 1, seq=333576 Oct 29 22:11:05 corosync [TOTEM >> ] waiting_trans_ack changed to 1 Oct 29 22:11:05 corosync [CMAN ] >> ais: confchg_fn called type = 0, seq=333576 Oct 29 22:11:05 corosync >> [CMAN ] ais: last memb_count = 2, current = 2 Oct 29 22:11:05 >> corosync [CMAN ] memb: sending TRANSITION message. cluster_name = vsomcluster Oct 29 22:11:05 corosync [CMAN ] ais: comms send message 0x7fff8185ca00 len = 65 Oct 29 22:11:05 corosync [CMAN ] daemon: sending reply 103 to fd 24 Oct 29 22:11:05 corosync [CMAN ] daemon: sending reply 103 to fd 34 Oct 29 22:11:05 corosync [SYNC ] This node is within the primary component and will provide service. >> Oct 29 22:11:05 corosync [TOTEM ] entering OPERATIONAL state. >> Oct 29 22:11:05 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed. 
>> Oct 29 22:11:05 corosync [CMAN ] ais: deliver_fn source nodeid = 2, >> len=81, endian_conv=0 Oct 29 22:11:05 corosync [CMAN ] memb: Message >> on port 0 is 5 Oct 29 22:11:05 corosync [CMAN ] memb: got TRANSITION >> from node 2 Oct 29 22:11:05 corosync [CMAN ] memb: Got TRANSITION >> message. msg->flags=20, node->flags=20, first_trans=0 Oct 29 22:11:05 >> corosync [CMAN ] memb: add_ais_node ID=2, incarnation = 333576 Oct 29 >> 22:11:05 corosync [SYNC ] confchg entries 2 Oct 29 22:11:05 corosync >> [SYNC ] Barrier Start Received From 2 Oct 29 22:11:05 corosync [SYNC ] Barrier completion status for nodeid 1 = 0. >> Oct 29 22:11:05 corosync [SYNC ] Barrier completion status for nodeid 2 = 1. >> Oct 29 22:11:05 corosync [CMAN ] ais: deliver_fn source nodeid = 1, >> len=81, endian_conv=0 Oct 29 22:11:05 corosync [CMAN ] memb: Message >> on port 0 is 5 Oct 29 22:11:05 corosync [CMAN ] memb: got TRANSITION >> from node 1 Oct 29 22:11:05 corosync [CMAN ] Completed first >> transition with nodes on the same config versions Oct 29 22:11:05 >> corosync [CMAN ] memb: Got TRANSITION message. msg->flags=20, >> node->flags=20, first_trans=0 Oct 29 22:11:05 corosync [CMAN ] memb: >> add_ais_node ID=1, incarnation = 333576 Oct 29 22:11:05 corosync [SYNC >> ] confchg entries 2 Oct 29 22:11:05 corosync [SYNC ] Barrier Start Received From 1 Oct 29 22:11:05 corosync [SYNC ] Barrier completion status for nodeid 1 = 1. >> Oct 29 22:11:05 corosync [SYNC ] Barrier completion status for nodeid 2 = 1. >> Oct 29 22:11:05 corosync [SYNC ] Synchronization barrier completed >> Oct 29 22:11:05 corosync [SYNC ] Synchronization actions starting for >> (dummy CLM service) Oct 29 22:11:05 corosync [SYNC ] confchg entries >> 2 Oct 29 22:11:05 corosync [SYNC ] Barrier Start Received From 1 Oct >> 29 22:11:05 corosync [SYNC ] Barrier completion status for nodeid 1 = 1. >> Oct 29 22:11:05 corosync [SYNC ] Barrier completion status for nodeid 2 = 0. 
>> Oct 29 22:11:05 corosync [SYNC ] confchg entries 2 Oct 29 22:11:05 >> corosync [SYNC ] Barrier Start Received From 2 Oct 29 22:11:05 >> corosync [SYNC ] Barrier completion status for nodeid 1 = 1. >> Oct 29 22:11:05 corosync [SYNC ] Barrier completion status for nodeid 2 = 1. >> Oct 29 22:11:05 corosync [SYNC ] Synchronization barrier completed >> Oct 29 22:11:05 corosync [SYNC ] Committing synchronization for >> (dummy CLM service) Oct 29 22:11:05 corosync [SYNC ] Synchronization >> actions starting for (dummy AMF service) Oct 29 22:11:05 corosync >> [SYNC ] confchg entries 2 Oct 29 22:11:05 corosync [SYNC ] Barrier >> Start Received From 2 Oct 29 22:11:05 corosync [SYNC ] Barrier completion status for nodeid 1 = 0. >> Oct 29 22:11:05 corosync [SYNC ] Barrier completion status for nodeid 2 = 1. >> Oct 29 22:11:05 corosync [SYNC ] confchg entries 2 Oct 29 22:11:05 >> corosync [SYNC ] Barrier Start Received From 1 Oct 29 22:11:05 >> corosync [SYNC ] Barrier completion status for nodeid 1 = 1. >> Oct 29 22:11:05 corosync [SYNC ] Barrier completion status for nodeid 2 = 1. >> Oct 29 22:11:05 corosync [SYNC ] Synchronization barrier completed >> Oct 29 22:11:05 corosync [SYNC ] Committing synchronization for >> (dummy AMF service) Oct 29 22:11:05 corosync [SYNC ] Synchronization >> actions starting for (openais checkpoint service B.01.01) Oct 29 >> 22:11:05 corosync [SYNC ] confchg entries 2 Oct 29 22:11:05 corosync >> [SYNC ] confchg entries 2 Oct 29 22:11:05 corosync [SYNC ] Barrier >> Start Received From 1 Oct 29 22:11:05 corosync [SYNC ] Barrier completion status for nodeid 1 = 1. >> Oct 29 22:11:05 corosync [SYNC ] Barrier completion status for nodeid 2 = 0. >> Oct 29 22:11:05 corosync [SYNC ] confchg entries 2 Oct 29 22:11:05 >> corosync [SYNC ] Barrier Start Received From 2 Oct 29 22:11:05 >> corosync [SYNC ] Barrier completion status for nodeid 1 = 1. >> Oct 29 22:11:05 corosync [SYNC ] Barrier completion status for nodeid 2 = 1. 
>> Oct 29 22:11:05 corosync [SYNC ] Synchronization barrier completed >> Oct 29 22:11:05 corosync [SYNC ] Committing synchronization for >> (openais checkpoint service B.01.01) Oct 29 22:11:05 corosync [SYNC ] >> Synchronization actions starting for (dummy EVT service) Oct 29 >> 22:11:05 corosync [SYNC ] confchg entries 2 Oct 29 22:11:05 corosync >> [SYNC ] Barrier Start Received From 2 Oct 29 22:11:05 corosync [SYNC ] Barrier completion status for nodeid 1 = 0. >> Oct 29 22:11:05 corosync [SYNC ] Barrier completion status for nodeid 2 = 1. >> Oct 29 22:11:05 corosync [SYNC ] confchg entries 2 Oct 29 22:11:05 >> corosync [SYNC ] Barrier Start Received From 1 Oct 29 22:11:05 >> corosync [SYNC ] Barrier completion status for nodeid 1 = 1. >> Oct 29 22:11:05 corosync [SYNC ] Barrier completion status for nodeid 2 = 1. >> Oct 29 22:11:05 corosync [SYNC ] Synchronization barrier completed >> Oct 29 22:11:05 corosync [SYNC ] Committing synchronization for >> (dummy EVT service) Oct 29 22:11:05 corosync [SYNC ] Synchronization actions starting for (corosync cluster closed process group service v1.01) >> Oct 29 22:11:05 corosync [CPG ] got joinlist message from node 1 >> Oct 29 22:11:05 corosync [CPG ] comparing: sender r(0) ip(172.28.0.65) ; members(old:2 left:0) >> Oct 29 22:11:05 corosync [CPG ] comparing: sender r(0) ip(172.28.0.64) ; members(old:2 left:0) >> Oct 29 22:11:05 corosync [CPG ] chosen downlist: sender r(0) ip(172.28.0.64) ; members(old:2 left:0) >> Oct 29 22:11:05 corosync [CPG ] got joinlist message from node 2 >> Oct 29 22:11:05 corosync [SYNC ] confchg entries 2 Oct 29 22:11:05 >> corosync [SYNC ] Barrier Start Received From 1 Oct 29 22:11:05 >> corosync [SYNC ] Barrier completion status for nodeid 1 = 1. >> Oct 29 22:11:05 corosync [SYNC ] Barrier completion status for nodeid 2 = 0. 
>> Oct 29 22:11:05 corosync [SYNC ] confchg entries 2 Oct 29 22:11:05 >> corosync [SYNC ] Barrier Start Received From 2 Oct 29 22:11:05 >> corosync [SYNC ] Barrier completion status for nodeid 1 = 1. >> Oct 29 22:11:05 corosync [SYNC ] Barrier completion status for nodeid 2 = 1. >> Oct 29 22:11:05 corosync [SYNC ] Synchronization barrier completed >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[0] group:crmd\x00, ip:r(0) ip(172.28.0.65) , pid:9198 >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[1] group:attrd\x00, ip:r(0) ip(172.28.0.65) , pid:9196 >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[2] group:stonith-ng\x00, ip:r(0) ip(172.28.0.65) , pid:9194 >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[3] group:cib\x00, ip:r(0) ip(172.28.0.65) , pid:9193 >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[4] group:pcmk\x00, ip:r(0) ip(172.28.0.65) , pid:9187 >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[5] group:gfs:controld\x00, ip:r(0) ip(172.28.0.65) , pid:9111 >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[6] group:dlm:controld\x00, ip:r(0) ip(172.28.0.65) , pid:9057 >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[7] group:fenced:default\x00, ip:r(0) ip(172.28.0.65) , pid:9040 >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[8] group:fenced:daemon\x00, ip:r(0) ip(172.28.0.65) , pid:9040 >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[9] group:crmd\x00, ip:r(0) ip(172.28.0.64) , pid:14530 >> Oct 29 22:11:05 corosync [SYNC ] Committing synchronization for >> (corosync cluster closed process group service v1.01) Oct 29 22:11:05 corosync [MAIN ] Completed service synchronization, ready to provide service. 
>> >> Thanks >> Lax >> >> >> -----Original Message----- >> From: linux-cluster-bounces at redhat.com >> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jan Friesse >> Sent: Thursday, October 30, 2014 1:23 AM >> To: linux clustering >> Subject: Re: [Linux-cluster] daemon cpg_join error retrying >> >>> >>>> On 30 Oct 2014, at 9:32 am, Lax Kota (lkota) wrote: >>>> >>>> >>>>>> I wonder if there is a mismatch between the cluster name in cluster.conf and the cluster name the GFS filesystem was created with. >>>>>> How do I check the cluster name of the GFS file system? I had a similar configuration running fine in multiple other setups with no such issue. >>>> >>>>> I don't really recall. Hopefully someone more familiar with GFS2 can chime in. >>>> Ok. >>>> >>>>>> >>>>>> Also, one more issue I am seeing in another setup: a repeated >>>>>> flood of 'A processor joined or left the membership and a new >>>>>> membership was formed' messages every 4 secs. I am running with >>>>>> default TOTEM settings, with the token timeout as 10 secs, even after >>>>>> I increase the token and consensus values to be higher. It goes on >>>>>> flooding the same message after the newly defined consensus time (e.g.: >>>>>> if I increase it to 10 secs, then I see new-membership-formed >>>>>> messages every 10 secs) >>>>>> >>>>>> Oct 29 14:58:10 VSM76-VSOM64 corosync[28388]: [TOTEM ] A processor joined or left the membership and a new membership was formed. >>>>>> Oct 29 14:58:10 VSM76-VSOM64 corosync[28388]: [CPG ] chosen downlist: sender r(0) ip(172.28.0.64) ; members(old:2 left:0) >>>>>> Oct 29 14:58:10 VSM76-VSOM64 corosync[28388]: [MAIN ] Completed service synchronization, ready to provide service. >>>>>> >>>>>> Oct 29 14:58:14 VSM76-VSOM64 corosync[28388]: [TOTEM ] A processor joined or left the membership and a new membership was formed. 
>>>>>> Oct 29 14:58:14 VSM76-VSOM64 corosync[28388]: [CPG ] chosen downlist: sender r(0) ip(172.28.0.64) ; members(old:2 left:0) >>>>>> Oct 29 14:58:14 VSM76-VSOM64 corosync[28388]: [MAIN ] Completed service synchronization, ready to provide service. >>>> >>>>> It does not sound like your network is particularly healthy. >>>>> Are you using multicast or udpu? If multicast, it might be worth >>>>> trying udpu >>>> >>>> I am using udpu and I also have the firewall opened for ports 5404 & 5405. Tcpdump looks fine too, it does not complain of any issues. This is a VM environment and even if I switch to the other node within the same VM I keep getting the same failure. >>> >>> Depending on what the host and VMs are doing, that might be your problem. >>> In any case, I will defer to the corosync guys at this point. >>> >> >> Lax, >> usual reasons for this problem: >> 1. MTU is too high and fragmented packets are not enabled (take a look at the netmtu configuration option) 2. config files on nodes are not in sync and one node may contain more node entries than other nodes (this may also be the case if you have two clusters and one cluster contains an entry of one node for the other cluster) 3. firewall is asymmetrically blocked (so a node can send but not receive). Also keep in mind that ports 5404 & 5405 may not be enough for udpu, because udpu uses one socket per remote node for sending. >> >> I would recommend disabling the firewall completely (for testing); if everything then works, you just need to adjust the firewall. 
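Honza's remark that "ports 5404 & 5405 may not be enough for udpu" comes down to how UDP sending sockets work: a socket created per remote node binds to a kernel-assigned ephemeral port, so a firewall that only allows destination ports 5404/5405 can still drop the return traffic. A small stand-alone sketch (loopback addresses as stand-ins for the two nodes; no cluster software involved):

```python
# Demonstrates that per-peer UDP sending sockets get distinct,
# kernel-assigned ephemeral source ports -- the reason firewall rules
# keyed only on dports 5404/5405 are insufficient for udpu.
import socket

peers = ["127.0.0.1", "127.0.0.1"]  # stand-ins for the two cluster nodes
socks = [socket.socket(socket.AF_INET, socket.SOCK_DGRAM) for _ in peers]
for s, peer in zip(socks, peers):
    # connect() on a UDP socket sends nothing; it fixes the destination
    # and triggers an implicit bind to an ephemeral local port.
    s.connect((peer, 5405))

ports = [s.getsockname()[1] for s in socks]
print(ports)  # two distinct ephemeral source ports
for s in socks:
    s.close()
```

This is why the practical advice in the thread is to test with the firewall fully disabled, then re-enable it with rules broad enough to cover ephemeral source ports (e.g. filtering by peer address rather than by port).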
>> >> Regards, >> Honza >> >> >> >>>> >>>> Thanks >>>> Lax >>>> >>>> >>>> >>>> -----Original Message----- >>>> From: linux-cluster-bounces at redhat.com >>>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Andrew >>>> Beekhof >>>> Sent: Wednesday, October 29, 2014 3:17 PM >>>> To: linux clustering >>>> Subject: Re: [Linux-cluster] daemon cpg_join error retrying >>>> >>>> >>>>> On 30 Oct 2014, at 9:06 am, Lax Kota (lkota) wrote: >>>>> >>>>>> I wonder if there is a mismatch between the cluster name in cluster.conf and the cluster name the GFS filesystem was created with. >>>>> How do I check the cluster name of the GFS file system? I had a similar configuration running fine in multiple other setups with no such issue. >>>> >>>> I don't really recall. Hopefully someone more familiar with GFS2 can chime in. >>>> >>>>> >>>>> Also, one more issue I am seeing in another setup: a repeated flood >>>>> of 'A processor joined or left the membership and a new membership >>>>> was formed' messages every 4 secs. I am running with default >>>>> TOTEM settings, with the token timeout as 10 secs, even after I >>>>> increase the token and consensus values to be higher. It goes on >>>>> flooding the same message after the newly defined consensus time (e.g.: >>>>> if I increase it to 10 secs, then I see new-membership-formed >>>>> messages every >>>>> 10 secs) >>>>> >>>>> Oct 29 14:58:10 VSM76-VSOM64 corosync[28388]: [TOTEM ] A processor joined or left the membership and a new membership was formed. >>>>> Oct 29 14:58:10 VSM76-VSOM64 corosync[28388]: [CPG ] chosen downlist: sender r(0) ip(172.28.0.64) ; members(old:2 left:0) >>>>> Oct 29 14:58:10 VSM76-VSOM64 corosync[28388]: [MAIN ] Completed service synchronization, ready to provide service. >>>>> >>>>> Oct 29 14:58:14 VSM76-VSOM64 corosync[28388]: [TOTEM ] A processor joined or left the membership and a new membership was formed. 
>>>>> Oct 29 14:58:14 VSM76-VSOM64 corosync[28388]: [CPG ] chosen downlist: sender r(0) ip(172.28.0.64) ; members(old:2 left:0) >>>>> Oct 29 14:58:14 VSM76-VSOM64 corosync[28388]: [MAIN ] Completed service synchronization, ready to provide service. >>>> >>>> It does not sound like your network is particularly healthy. >>>> Are you using multicast or udpu? If multicast, it might be worth >>>> trying udpu >>>> >>>>> >>>>> Thanks >>>>> Lax >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: linux-cluster-bounces at redhat.com >>>>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Andrew >>>>> Beekhof >>>>> Sent: Wednesday, October 29, 2014 2:42 PM >>>>> To: linux clustering >>>>> Subject: Re: [Linux-cluster] daemon cpg_join error retrying >>>>> >>>>> >>>>>> On 30 Oct 2014, at 8:38 am, Lax Kota (lkota) wrote: >>>>>> >>>>>> Hi All, >>>>>> >>>>>> In one of my setups, I keep getting 'gfs_controld[10744]: daemon cpg_join error retrying'. I have a 2-node setup with pacemaker and corosync. >>>>> >>>>> I wonder if there is a mismatch between the cluster name in cluster.conf and the cluster name the GFS filesystem was created with. >>>>> >>>>>> >>>>>> Even after I force-kill the pacemaker processes, reboot the server and bring pacemaker back up, it keeps giving the cpg_join error. Is there any way to fix this issue? 
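Andrew's mismatch hypothesis can be checked mechanically: the GFS2 superblock stores its lock table as "clustername:fsname" (set at mkfs time), and that prefix must equal the `<cluster name="...">` attribute in cluster.conf. A minimal sketch of the comparison; the sample strings below are hypothetical stand-ins, since on a real node the lock table would come from reading the superblock (e.g. via `gfs2_tool sb <device> table` on RHEL 6):

```python
# Compare the cluster name in cluster.conf with the prefix of a GFS2
# lock table string ("clustername:fsname"). Sample inputs only.
import xml.etree.ElementTree as ET

cluster_conf = '<cluster name="vsomcluster" config_version="1"/>'  # sample
lock_table = "vsomcluster:gfs01"                                   # sample

conf_name = ET.fromstring(cluster_conf).get("name")
sb_name = lock_table.split(":", 1)[0]

if conf_name == sb_name:
    print("cluster name matches GFS2 lock table:", conf_name)
else:
    print("MISMATCH: cluster.conf=%s, gfs2 superblock=%s" % (conf_name, sb_name))
```

If the names differ, gfs_controld cannot join the expected CPG group, which would be consistent with the cpg_join retry loop reported above.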
>>>>>> >>>>>> >>>>>> Thanks >>>>>> Lax >>>>>> >>>>>> -- >>>>>> Linux-cluster mailing list >>>>>> Linux-cluster at redhat.com >>>>>> https://www.redhat.com/mailman/listinfo/linux-cluster From lkota at cisco.com Mon Nov 3 18:26:02 2014 From: lkota at cisco.com (Lax Kota (lkota)) Date: Mon, 3 Nov 2014 18:26:02 +0000 Subject: [Linux-cluster] daemon cpg_join error retrying In-Reply-To: <545744C1.4080007@redhat.com> References: <68ABE774-8755-416F-829B-CED002B14D03@beekhof.net> <5451F581.5050100@redhat.com> <5453BC31.2000102@redhat.com> <545744C1.4080007@redhat.com> Message-ID: Hi Honza, I am running on the packaged version from RHEL 6.4. The exact version is 'corosync-1.4.1-15' Thanks Lax -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jan Friesse Sent: Monday, November 03, 2014 1:03 AM To: linux clustering Subject: Re: [Linux-cluster] daemon cpg_join error retrying Lax, > > > >> This is just weird. What exact version of corosync are you running? Do you have latest Z stream? 
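The exact-version exchange above (installed "corosync-1.4.1-15" versus a hypothetical later Z-stream build like "1.4.1-17.1") boils down to comparing version-release strings numerically. A simplified sketch of that comparison; real RPM version ordering has more rules, so this is an illustration only:

```python
# Simplified version-release comparison for strings like "1.4.1-15"
# vs "1.4.1-17.1"; non-numeric fields are ignored for brevity.
def nvr_key(vr):
    version, _, release = vr.partition("-")
    def fields(s):
        return [int(p) for p in s.split(".") if p.isdigit()]
    return (fields(version), fields(release))

installed = "1.4.1-15"      # version reported in the thread
candidate = "1.4.1-17.1"    # example Z-stream build Honza mentioned
print(nvr_key(installed) < nvr_key(candidate))  # True: a newer build exists
```

In practice the authoritative answer on an RPM system comes from querying the package database (e.g. `rpm -q corosync`), which reports the full name-version-release the maintainers need.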
>> Oct 29 22:11:05 corosync [SYNC ] confchg entries 2 Oct 29 22:11:05 >> corosync [SYNC ] Barrier Start Received From 2 Oct 29 22:11:05 >> corosync [SYNC ] Barrier completion status for nodeid 1 = 1. >> Oct 29 22:11:05 corosync [SYNC ] Barrier completion status for nodeid 2 = 1. >> Oct 29 22:11:05 corosync [SYNC ] Synchronization barrier completed >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[0] group:crmd\x00, ip:r(0) ip(172.28.0.65) , pid:9198 >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[1] group:attrd\x00, ip:r(0) ip(172.28.0.65) , pid:9196 >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[2] group:stonith-ng\x00, ip:r(0) ip(172.28.0.65) , pid:9194 >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[3] group:cib\x00, ip:r(0) ip(172.28.0.65) , pid:9193 >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[4] group:pcmk\x00, ip:r(0) ip(172.28.0.65) , pid:9187 >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[5] group:gfs:controld\x00, ip:r(0) ip(172.28.0.65) , pid:9111 >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[6] group:dlm:controld\x00, ip:r(0) ip(172.28.0.65) , pid:9057 >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[7] group:fenced:default\x00, ip:r(0) ip(172.28.0.65) , pid:9040 >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[8] group:fenced:daemon\x00, ip:r(0) ip(172.28.0.65) , pid:9040 >> Oct 29 22:11:05 corosync [CPG ] joinlist_messages[9] group:crmd\x00, ip:r(0) ip(172.28.0.64) , pid:14530 >> Oct 29 22:11:05 corosync [SYNC ] Committing synchronization for >> (corosync cluster closed process group service v1.01) Oct 29 22:11:05 corosync [MAIN ] Completed service synchronization, ready to provide service. 
>> >> Thanks >> Lax >> >> >> -----Original Message----- >> From: linux-cluster-bounces at redhat.com >> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jan Friesse >> Sent: Thursday, October 30, 2014 1:23 AM >> To: linux clustering >> Subject: Re: [Linux-cluster] daemon cpg_join error retrying >> >>> >>>> On 30 Oct 2014, at 9:32 am, Lax Kota (lkota) wrote: >>>> >>>> >>>>>> I wonder if there is a mismatch between the cluster name in cluster.conf and the cluster name the GFS filesystem was created with. >>>>>> How to check cluster name of GFS file system? I had similar configuration running fine in multiple other setups with no such issue. >>>> >>>>> I don't really recall. Hopefully someone more familiar with GFS2 can chime in. >>>> Ok. >>>> >>>>>> >>>>>> Also one more issue I am seeing in one other setup a repeated >>>>>> flood of 'A processor joined or left the membership and a new >>>>>> membership was formed' messages for every 4secs. I am running >>>>>> with default TOTEM settings with token time out as 10 secs. Even >>>>>> after I increase the token, consensus values to be higher. It >>>>>> goes on flooding the same message after newer consensus defined time (eg: >>>>>> if I increase it to be 10secs, then I see new membership formed >>>>>> messages for every 10secs) >>>>>> >>>>>> Oct 29 14:58:10 VSM76-VSOM64 corosync[28388]: [TOTEM ] A processor joined or left the membership and a new membership was formed. >>>>>> Oct 29 14:58:10 VSM76-VSOM64 corosync[28388]: [CPG ] chosen downlist: sender r(0) ip(172.28.0.64) ; members(old:2 left:0) >>>>>> Oct 29 14:58:10 VSM76-VSOM64 corosync[28388]: [MAIN ] Completed service synchronization, ready to provide service. >>>>>> >>>>>> Oct 29 14:58:14 VSM76-VSOM64 corosync[28388]: [TOTEM ] A processor joined or left the membership and a new membership was formed. 
>>>>>> Oct 29 14:58:14 VSM76-VSOM64 corosync[28388]: [CPG ] chosen downlist: sender r(0) ip(172.28.0.64) ; members(old:2 left:0)
>>>>>> Oct 29 14:58:14 VSM76-VSOM64 corosync[28388]: [MAIN ] Completed service synchronization, ready to provide service.
>>>>
>>>>> It does not sound like your network is particularly healthy.
>>>>> Are you using multicast or udpu? If multicast, it might be worth
>>>>> trying udpu
>>>>
>>>> I am using udpu and I also have the firewall opened for ports 5404 & 5405. Tcpdump looks fine too, it does not complain of any issues. This is a VM environment, and even if I switch to the other node within the same VM I keep getting the same failure.
>>>
>>> Depending on what the host and VMs are doing, that might be your problem.
>>> In any case, I will defer to the corosync guys at this point.
>>>
>>
>> Lax,
>> the usual reasons for this problem are:
>> 1. The MTU is too high and fragmented packets are not enabled (take a look at the netmtu configuration option).
>> 2. The config files on the nodes are not in sync, and one node may contain more node entries than the others (this may also be the case if you have two clusters and one cluster contains an entry for a node of the other cluster).
>> 3. The firewall is asymmetrically blocked (so a node can send but not receive). Also keep in mind that ports 5404 & 5405 may not be enough for udpu, because udpu uses one socket per remote node for sending.
>>
>> I would recommend disabling the firewall completely (for testing); if everything then works, you just need to adjust the firewall.
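[Editor's note: the three checks above can be run through on each node with something like the sketch below. The config paths and the `iptables` service name are assumptions for a RHEL 6 era cman/corosync cluster; adapt them to your environment.]

```shell
# Sketch of the three checks above; paths and service names are assumptions.

# 1. MTU: compare the interface MTU with corosync's netmtu (default 1500);
#    netmtu may live in cluster.conf (cman) or corosync.conf.
for f in /etc/cluster/cluster.conf /etc/corosync/corosync.conf; do
    [ -f "$f" ] && grep -i netmtu "$f"
done

# 2. Config drift: checksum the config on every node and compare by hand.
[ -f /etc/cluster/cluster.conf ] && md5sum /etc/cluster/cluster.conf

# 3. Firewall: udpu opens one sending socket per remote node, so rules
#    pinned to ports 5404/5405 can silently drop replies. For a test
#    only, clear the firewall entirely (re-enable and re-tune after):
#        service iptables stop
status="checks done"
echo "$status"
```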
>> >> Regards, >> Honza >> >> >> >>>> >>>> Thanks >>>> Lax >>>> >>>> >>>> >>>> -----Original Message----- >>>> From: linux-cluster-bounces at redhat.com >>>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Andrew >>>> Beekhof >>>> Sent: Wednesday, October 29, 2014 3:17 PM >>>> To: linux clustering >>>> Subject: Re: [Linux-cluster] daemon cpg_join error retrying >>>> >>>> >>>>> On 30 Oct 2014, at 9:06 am, Lax Kota (lkota) wrote: >>>>> >>>>>> I wonder if there is a mismatch between the cluster name in cluster.conf and the cluster name the GFS filesystem was created with. >>>>> How to check cluster name of GFS file system? I had similar configuration running fine in multiple other setups with no such issue. >>>> >>>> I don't really recall. Hopefully someone more familiar with GFS2 can chime in. >>>> >>>>> >>>>> Also one more issue I am seeing in one other setup a repeated >>>>> flood of 'A processor joined or left the membership and a new >>>>> membership was formed' messages for every 4secs. I am running with >>>>> default TOTEM settings with token time out as 10 secs. Even after >>>>> I increase the token, consensus values to be higher. It goes on >>>>> flooding the same message after newer consensus defined time (eg: >>>>> if I increase it to be 10secs, then I see new membership formed >>>>> messages for every >>>>> 10secs) >>>>> >>>>> Oct 29 14:58:10 VSM76-VSOM64 corosync[28388]: [TOTEM ] A processor joined or left the membership and a new membership was formed. >>>>> Oct 29 14:58:10 VSM76-VSOM64 corosync[28388]: [CPG ] chosen downlist: sender r(0) ip(172.28.0.64) ; members(old:2 left:0) >>>>> Oct 29 14:58:10 VSM76-VSOM64 corosync[28388]: [MAIN ] Completed service synchronization, ready to provide service. >>>>> >>>>> Oct 29 14:58:14 VSM76-VSOM64 corosync[28388]: [TOTEM ] A processor joined or left the membership and a new membership was formed. 
>>>>> Oct 29 14:58:14 VSM76-VSOM64 corosync[28388]: [CPG ] chosen downlist: sender r(0) ip(172.28.0.64) ; members(old:2 left:0) >>>>> Oct 29 14:58:14 VSM76-VSOM64 corosync[28388]: [MAIN ] Completed service synchronization, ready to provide service. >>>> >>>> It does not sound like your network is particularly healthy. >>>> Are you using multicast or udpu? If multicast, it might be worth >>>> trying udpu >>>> >>>>> >>>>> Thanks >>>>> Lax >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: linux-cluster-bounces at redhat.com >>>>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Andrew >>>>> Beekhof >>>>> Sent: Wednesday, October 29, 2014 2:42 PM >>>>> To: linux clustering >>>>> Subject: Re: [Linux-cluster] daemon cpg_join error retrying >>>>> >>>>> >>>>>> On 30 Oct 2014, at 8:38 am, Lax Kota (lkota) wrote: >>>>>> >>>>>> Hi All, >>>>>> >>>>>> In one of my setup, I keep getting getting 'gfs_controld[10744]: daemon cpg_join error retrying'. I have a 2 Node setup with pacemaker and corosync. >>>>> >>>>> I wonder if there is a mismatch between the cluster name in cluster.conf and the cluster name the GFS filesystem was created with. >>>>> >>>>>> >>>>>> Even after I force kill the pacemaker processes and reboot the server and bring the pacemaker back up, it keeps giving cpg_join error. Is there any way to fix this issue? 
>>>>>> >>>>>> >>>>>> Thanks >>>>>> Lax >>>>>> >>>>>> -- >>>>>> Linux-cluster mailing list >>>>>> Linux-cluster at redhat.com >>>>>> https://www.redhat.com/mailman/listinfo/linux-cluster >>>>> >>>>> >>>>> -- >>>>> Linux-cluster mailing list >>>>> Linux-cluster at redhat.com >>>>> https://www.redhat.com/mailman/listinfo/linux-cluster >>>>> >>>>> -- >>>>> Linux-cluster mailing list >>>>> Linux-cluster at redhat.com >>>>> https://www.redhat.com/mailman/listinfo/linux-cluster >>>> >>>> >>>> -- >>>> Linux-cluster mailing list >>>> Linux-cluster at redhat.com >>>> https://www.redhat.com/mailman/listinfo/linux-cluster >>>> >>>> -- >>>> Linux-cluster mailing list >>>> Linux-cluster at redhat.com >>>> https://www.redhat.com/mailman/listinfo/linux-cluster >>> >>> >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From lars.ellenberg at linbit.com Wed Nov 5 15:16:58 2014 From: lars.ellenberg at linbit.com (Lars Ellenberg) Date: Wed, 5 Nov 2014 16:16:58 +0100 Subject: [Linux-cluster] [ha-wg-technical] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: <54546D67.6010606@alteeve.ca> References: <540D853F.3090109@redhat.com> <54546A5F.8030207@redhat.com> <54546D67.6010606@alteeve.ca> Message-ID: <20141105151658.GY20549@soda.linbit> On Sat, Nov 01, 2014 at 01:19:35AM -0400, Digimer wrote: > All the cool kids will be there. > > You want to be a cool kid, right? Well, no. ;-) But I'll still be there, and a few other Linbit'ers as well. Fabio, let us know what we could do to help make it happen. Lars > On 01/11/14 01:06 AM, Fabio M. Di Nitto wrote: > > just a kind reminder. > > > >On 9/8/2014 12:30 PM, Fabio M. 
Di Nitto wrote: > >> All, > >> > >> it's been almost 6 years since we had a face to face meeting for all > >> developers and vendors involved in Linux HA. > >> > >> I'd like to try and organize a new event and piggy-back with DevConf in > >> Brno [1]. > >> > >> DevConf will start Friday the 6th of Feb 2015 in Red Hat Brno offices. > >> > >> My suggestion would be to have a 2 days dedicated HA summit the 4th and > >> the 5th of February. > >> > >> The goal for this meeting is to, beside to get to know each other and > >> all social aspect of those events, tune the directions of the various HA > >> projects and explore common areas of improvements. > >> > >> I am also very open to the idea of extending to 3 days, 1 one dedicated > >> to customers/users and 2 dedicated to developers, by starting the 3rd. > >> > >> Thoughts? > >> > >> Fabio > >> > >> PS Please hit reply all or include me in CC just to make sure I'll see > >> an answer :) > >> > >> [1] http://devconf.cz/ > > > > Could you please let me know by end of Nov if you are interested or not? > > > > I have heard only from few people so far. > > > > Cheers > > Fabio From fdinitto at redhat.com Tue Nov 11 08:17:56 2014 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Tue, 11 Nov 2014 09:17:56 +0100 Subject: [Linux-cluster] [ha-wg] [ha-wg-technical] [Linux-HA] [RFC] Organizing HA Summit 2015 In-Reply-To: <20141105151658.GY20549@soda.linbit> References: <540D853F.3090109@redhat.com> <54546A5F.8030207@redhat.com> <54546D67.6010606@alteeve.ca> <20141105151658.GY20549@soda.linbit> Message-ID: <5461C634.5000503@redhat.com> On 11/5/2014 4:16 PM, Lars Ellenberg wrote: > On Sat, Nov 01, 2014 at 01:19:35AM -0400, Digimer wrote: >> All the cool kids will be there. >> >> You want to be a cool kid, right? > > Well, no. ;-) > > But I'll still be there, > and a few other Linbit'ers as well. > > Fabio, let us know what we could do to help make it happen. > I appreciate the offer. 
Assuming we achieve quorum to do the event, I'd say that I'll take care
of the meeting rooms/hotel logistics and one "lunch and learn" pizza
event. It would be nice if others could organize a dinner event.

Cheers
Fabio

> Lars
>
>> On 01/11/14 01:06 AM, Fabio M. Di Nitto wrote:
>>> just a kind reminder.
>>>
>>> On 9/8/2014 12:30 PM, Fabio M. Di Nitto wrote:
>>>> All,
>>>>
>>>> it's been almost 6 years since we had a face to face meeting for all
>>>> developers and vendors involved in Linux HA.
>>>>
>>>> I'd like to try and organize a new event and piggy-back with DevConf in
>>>> Brno [1].
>>>>
>>>> DevConf will start Friday the 6th of Feb 2015 in Red Hat Brno offices.
>>>>
>>>> My suggestion would be to have a 2 days dedicated HA summit the 4th and
>>>> the 5th of February.
>>>>
>>>> The goal for this meeting is to, beside to get to know each other and
>>>> all social aspect of those events, tune the directions of the various HA
>>>> projects and explore common areas of improvements.
>>>>
>>>> I am also very open to the idea of extending to 3 days, 1 one dedicated
>>>> to customers/users and 2 dedicated to developers, by starting the 3rd.
>>>>
>>>> Thoughts?
>>>>
>>>> Fabio
>>>>
>>>> PS Please hit reply all or include me in CC just to make sure I'll see
>>>> an answer :)
>>>>
>>>> [1] http://devconf.cz/
>>>
>>> Could you please let me know by end of Nov if you are interested or not?
>>>
>>> I have heard only from few people so far.
>>>
>>> Cheers
>>> Fabio
> _______________________________________________
> ha-wg mailing list
> ha-wg at lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/ha-wg
>

From loulou07 at 126.com  Wed Nov 12 08:20:42 2014
From: loulou07 at 126.com (=?GBK?B?s8LCpQ==?=)
Date: Wed, 12 Nov 2014 16:20:42 +0800 (CST)
Subject: [Linux-cluster] GFS2: fsid=MyCluster:gfs.1: fatal: invalid
	metadata block
Message-ID: <5251064a.25cb9.149a31720b2.Coremail.loulou07@126.com>

hi, guys

I have a two-node GFS2 cluster based on a logical volume created on the
drbd block device /dev/drbd0. The two nodes' GFS2 mount points are
exported as samba shares. Two clients mount them and copy data into them
respectively. Hours later, one client (call it clientA) has finished all
its tasks, while the other (clientB) is still copying at a very slow
write speed (2-3MB/s; in the normal case 40-100MB/s).

Suspecting that something is wrong with the gfs2 filesystem on the server
node that clientB mounts, I try to write some data into it by executing
the following command:

[root at dcs-229 ~]# dd if=/dev/zero of=./data2 bs=128k count=1000
1000+0 records in
1000+0 records out
131072000 bytes (131 MB) copied, 183.152 s, 716 kB/s

The write speed is far too slow; it almost hangs. I redo it once again,
and it hangs. Then I terminate it with Ctrl+C, and the kernel reports
error messages as follows:

Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1: fatal: invalid metadata block
Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1: bh = 25 (magic number)
Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1: function = gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 393
Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1: jid=0: Trying to acquire journal lock...
Nov 12 11:50:11 dcs-229 kernel: Pid: 12044, comm: glock_workqueue Not tainted 2.6.32-358.el6.x86_64 #1
Nov 12 11:50:11 dcs-229 kernel: Call Trace:
Nov 12 11:50:11 dcs-229 kernel: [] ? gfs2_lm_withdraw+0x102/0x130 [gfs2]
Nov 12 11:50:11 dcs-229 kernel: [] ? wake_bit_function+0x0/0x50
Nov 12 11:50:11 dcs-229 kernel: [] ? gfs2_meta_check_ii+0x45/0x50 [gfs2]
Nov 12 11:50:11 dcs-229 kernel: [] ? gfs2_meta_indirect_buffer+0xf9/0x100 [gfs2]
Nov 12 11:50:11 dcs-229 kernel: [] ? perf_event_task_sched_out+0x33/0x80
Nov 12 11:50:11 dcs-229 kernel: [] ? gfs2_inode_refresh+0x25/0x2c0 [gfs2]
Nov 12 11:50:11 dcs-229 kernel: [] ? inode_go_lock+0x88/0xf0 [gfs2]
Nov 12 11:50:11 dcs-229 kernel: [] ? do_promote+0x1bb/0x330 [gfs2]
Nov 12 11:50:11 dcs-229 kernel: [] ? finish_xmote+0x178/0x410 [gfs2]
Nov 12 11:50:11 dcs-229 kernel: [] ? glock_work_func+0x133/0x1d0 [gfs2]
Nov 12 11:50:11 dcs-229 kernel: [] ? glock_work_func+0x0/0x1d0 [gfs2]
Nov 12 11:50:11 dcs-229 kernel: [] ? worker_thread+0x170/0x2a0
Nov 12 11:50:11 dcs-229 kernel: [] ? autoremove_wake_function+0x0/0x40
Nov 12 11:50:11 dcs-229 kernel: [] ? worker_thread+0x0/0x2a0
Nov 12 11:50:11 dcs-229 kernel: [] ? kthread+0x96/0xa0
Nov 12 11:50:11 dcs-229 kernel: [] ? child_rip+0xa/0x20
Nov 12 11:50:11 dcs-229 kernel: [] ? kthread+0x0/0xa0
Nov 12 11:50:11 dcs-229 kernel: [] ? child_rip+0x0/0x20
Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1: jid=0: Failed

And the other node also reports error messages:

Nov 12 11:48:50 dcs-226 kernel: Pid: 13784, comm: glock_workqueue Not tainted 2.6.32-358.el6.x86_64 #1
Nov 12 11:48:50 dcs-226 kernel: Call Trace:
Nov 12 11:48:50 dcs-226 kernel: [] ? gfs2_lm_withdraw+0x102/0x130 [gfs2]
Nov 12 11:48:50 dcs-226 kernel: [] ? wake_bit_function+0x0/0x50
Nov 12 11:48:50 dcs-226 kernel: [] ? gfs2_meta_check_ii+0x45/0x50 [gfs2]
Nov 12 11:48:50 dcs-226 kernel: [] ? gfs2_meta_indirect_buffer+0xf9/0x100 [gfs2]
Nov 12 11:48:50 dcs-226 kernel: [] ? perf_event_task_sched_out+0x33/0x80
Nov 12 11:48:50 dcs-226 kernel: [] ? gfs2_inode_refresh+0x25/0x2c0 [gfs2]
Nov 12 11:48:50 dcs-226 kernel: [] ? inode_go_lock+0x88/0xf0 [gfs2]
Nov 12 11:48:50 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0: fatal: invalid metadata block
Nov 12 11:48:51 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0: bh = 66213 (magic number)
Nov 12 11:48:51 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0: function = gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 393
Nov 12 11:48:51 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0: about to withdraw this file system
Nov 12 11:48:51 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0: telling LM to unmount
Nov 12 11:48:51 dcs-226 kernel: [] ? do_promote+0x1bb/0x330 [gfs2]
Nov 12 11:48:51 dcs-226 kernel: [] ? finish_xmote+0x178/0x410 [gfs2]
Nov 12 11:48:51 dcs-226 kernel: [] ? glock_work_func+0x133/0x1d0 [gfs2]
Nov 12 11:48:51 dcs-226 kernel: [] ? glock_work_func+0x0/0x1d0 [gfs2]
Nov 12 11:48:51 dcs-226 kernel: [] ? worker_thread+0x170/0x2a0
Nov 12 11:48:51 dcs-226 kernel: [] ? autoremove_wake_function+0x0/0x40
Nov 12 11:48:51 dcs-226 kernel: [] ? worker_thread+0x0/0x2a0
Nov 12 11:48:51 dcs-226 kernel: [] ? kthread+0x96/0xa0
Nov 12 11:48:51 dcs-226 kernel: [] ? child_rip+0xa/0x20
Nov 12 11:48:51 dcs-226 kernel: [] ? kthread+0x0/0xa0
Nov 12 11:48:51 dcs-226 kernel: [] ? child_rip+0x0/0x20

After this, the mount points have crashed. What should I do? Can anyone
help me?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From pradiptasingha at yahoo.com  Wed Nov 12 12:10:28 2014
From: pradiptasingha at yahoo.com (Pradipta Singha)
Date: Wed, 12 Nov 2014 04:10:28 -0800
Subject: [Linux-cluster] Deployment of Redhat cluster setup 6 to provide
	HA to oracle 11g R2
Message-ID: <1415794228.40474.YahooMailNeo@web161705.mail.bf1.yahoo.com>

Hi Team,

I have to set up a 2-node Red Hat cluster 6 to provide HA to an Oracle
11g R2 database with two instances. Kindly help me to set up the cluster.
The shared file systems below (shared between both nodes) are for data
files:

/dev/mapper/vg1-lv3  gfs2  250G  2.2G  248G  1%  /u01
/dev/mapper/vg1-lv4  gfs2  175G  268M  175G  1%  /u02
/dev/mapper/vg1-lv5  gfs2   25G  259M   25G  2%  /u03
/dev/mapper/vg1-lv6  gfs2   25G  259M   25G  2%  /u04
/dev/mapper/vg1-lv7  gfs2   25G  259M   25G  2%  /u05
/dev/mapper/vg1-lv8  gfs2  300G  259M  300G  1%  /u06
/dev/mapper/vg1-lv9  gfs2  300G  1.8G  299G  1%  /u07

And the file systems below (local to each node) are for the database
binaries, on both nodes:

/dev/mapper/vg2-lv1_oracle  ext4  99G  4.5G  89G  5%  /oracle  -> one instance for the Oracle database
/dev/mapper/vg2-lv2_orafmw  ext4  99G   60M  94G  1%  /orafmw  -> another for the application instance

Note: two instances will run, one for the Oracle database and another
for the application.

Thanks
pradipta
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From rpeterso at redhat.com  Wed Nov 12 13:20:38 2014
From: rpeterso at redhat.com (Bob Peterson)
Date: Wed, 12 Nov 2014 08:20:38 -0500 (EST)
Subject: [Linux-cluster] GFS2: fsid=MyCluster:gfs.1: fatal: invalid
	metadata block
In-Reply-To: <5251064a.25cb9.149a31720b2.Coremail.loulou07@126.com>
References: <5251064a.25cb9.149a31720b2.Coremail.loulou07@126.com>
Message-ID: <1613629069.11759385.1415798438904.JavaMail.zimbra@redhat.com>

----- Original Message -----
> hi ,guys
> I have a two-nodes GFS2 cluster based on logic volume created by drbd block
> device /dev/drbd0. The two nodes' mount points of GFS2 filesystem are
> exported by samba share. Then there are two clients mounting and copying
> data into them respectively. Hours later, one client(assume just call it
> clientA) has finished all tasks, while the other client(assume just call it
> clientB) is still copying with very slow write speed(2-3MB/s, in normal case
> 40-100MB/s).
> Then I doubt that the there is something wrong with gfs2 filesystem on the > corresponding server node that clientB mount to, and I try to write some > data into it by > excute commad as follows: > [root at dcs-229 ~]# dd if=/dev/zero of=./data2 bs=128k count=1000 > 1000+0 records in > 1000+0 records out > 131072000 bytes (131 MB) copied, 183.152 s, 716 kB/s > It shows the write speed is too slow, almostly hangs up. I redo it once > again, it hangs up. Then, I terminate it with ?Ctr + c?, and kernel reports > error messages as > follows: > Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1: fatal: invalid > metadata block > Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1: bh = 25 (magic > number) > Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1: function = > gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 393 > Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1: jid=0: Trying to > acquire journal lock... > Nov 12 11:50:11 dcs-229 kernel: Pid: 12044, comm: glock_workqueue Not tainted > 2.6.32-358.el6.x86_64 #1 > Nov 12 11:50:11 dcs-229 kernel: Call Trace: > Nov 12 11:50:11 dcs-229 kernel: [] ? > gfs2_lm_withdraw+0x102/0x130 [gfs2] > Nov 12 11:50:11 dcs-229 kernel: [] ? > wake_bit_function+0x0/0x50 > Nov 12 11:50:11 dcs-229 kernel: [] ? > gfs2_meta_check_ii+0x45/0x50 [gfs2] > Nov 12 11:50:11 dcs-229 kernel: [] ? > gfs2_meta_indirect_buffer+0xf9/0x100 [gfs2] > Nov 12 11:50:11 dcs-229 kernel: [] ? > perf_event_task_sched_out+0x33/0x80 > Nov 12 11:50:11 dcs-229 kernel: [] ? > gfs2_inode_refresh+0x25/0x2c0 [gfs2] > Nov 12 11:50:11 dcs-229 kernel: [] ? > inode_go_lock+0x88/0xf0 [gfs2] > Nov 12 11:50:11 dcs-229 kernel: [] ? do_promote+0x1bb/0x330 > [gfs2] > Nov 12 11:50:11 dcs-229 kernel: [] ? > finish_xmote+0x178/0x410 [gfs2] > Nov 12 11:50:11 dcs-229 kernel: [] ? > glock_work_func+0x133/0x1d0 [gfs2] > Nov 12 11:50:11 dcs-229 kernel: [] ? > glock_work_func+0x0/0x1d0 [gfs2] > Nov 12 11:50:11 dcs-229 kernel: [] ? 
> worker_thread+0x170/0x2a0 > Nov 12 11:50:11 dcs-229 kernel: [] ? > autoremove_wake_function+0x0/0x40 > Nov 12 11:50:11 dcs-229 kernel: [] ? > worker_thread+0x0/0x2a0 > Nov 12 11:50:11 dcs-229 kernel: [] ? kthread+0x96/0xa0 > Nov 12 11:50:11 dcs-229 kernel: [] ? child_rip+0xa/0x20 > Nov 12 11:50:11 dcs-229 kernel: [] ? kthread+0x0/0xa0 > Nov 12 11:50:11 dcs-229 kernel: [] ? child_rip+0x0/0x20 > Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1: jid=0: Failed > And the other node also reports error messages: > Nov 12 11:48:50 dcs-226 kernel: Pid: 13784, comm: glock_workqueue Not tainted > 2.6.32-358.el6.x86_64 #1 > Nov 12 11:48:50 dcs-226 kernel: Call Trace: > Nov 12 11:48:50 dcs-226 kernel: [] ? > gfs2_lm_withdraw+0x102/0x130 [gfs2] > Nov 12 11:48:50 dcs-226 kernel: [] ? > wake_bit_function+0x0/0x50 > Nov 12 11:48:50 dcs-226 kernel: [] ? > gfs2_meta_check_ii+0x45/0x50 [gfs2] > Nov 12 11:48:50 dcs-226 kernel: [] ? > gfs2_meta_indirect_buffer+0xf9/0x100 [gfs2] > Nov 12 11:48:50 dcs-226 kernel: [] ? > perf_event_task_sched_out+0x33/0x80 > Nov 12 11:48:50 dcs-226 kernel: [] ? > gfs2_inode_refresh+0x25/0x2c0 [gfs2] > Nov 12 11:48:50 dcs-226 kernel: [] ? > inode_go_lock+0x88/0xf0 [gfs2] > Nov 12 11:48:50 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0: fatal: invalid > metadata block > Nov 12 11:48:51 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0: bh = 66213 > (magic number) > Nov 12 11:48:51 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0: function = > gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 393 > Nov 12 11:48:51 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0: about to withdraw > this file system > Nov 12 11:48:51 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0: telling LM to > unmount > Nov 12 11:48:51 dcs-226 kernel: [] ? do_promote+0x1bb/0x330 > [gfs2] > Nov 12 11:48:51 dcs-226 kernel: [] ? > finish_xmote+0x178/0x410 [gfs2] > Nov 12 11:48:51 dcs-226 kernel: [] ? > glock_work_func+0x133/0x1d0 [gfs2] > Nov 12 11:48:51 dcs-226 kernel: [] ? 
> glock_work_func+0x0/0x1d0 [gfs2] > Nov 12 11:48:51 dcs-226 kernel: [] ? > worker_thread+0x170/0x2a0 > Nov 12 11:48:51 dcs-226 kernel: [] ? > autoremove_wake_function+0x0/0x40 > Nov 12 11:48:51 dcs-226 kernel: [] ? > worker_thread+0x0/0x2a0 > Nov 12 11:48:51 dcs-226 kernel: [] ? kthread+0x96/0xa0 > Nov 12 11:48:51 dcs-226 kernel: [] ? child_rip+0xa/0x20 > Nov 12 11:48:51 dcs-226 kernel: [] ? kthread+0x0/0xa0 > Nov 12 11:48:51 dcs-226 kernel: [] ? child_rip+0x0/0x20 > After this, mount points has crashed. what should i do? Anyone could help me? Hi, I recommend you open a support case with Red Hat. If you're not a Red Hat customer, you can open a bugzilla record, save off the metadata for that file system (with gfs2_edit savemeta) and post a link to it in the bugzilla. The hang and the assert should not happen. Regards, Bob Peterson Red Hat File Systems From fdinitto at redhat.com Mon Nov 24 14:54:33 2014 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Mon, 24 Nov 2014 15:54:33 +0100 Subject: [Linux-cluster] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: <20141124143957.GU2508@suse.de> References: <540D853F.3090109@redhat.com> <20141124143957.GU2508@suse.de> Message-ID: <547346A9.6010901@redhat.com> On 11/24/2014 3:39 PM, Lars Marowsky-Bree wrote: > On 2014-09-08T12:30:23, "Fabio M. Di Nitto" wrote: > > Folks, Fabio, > > thanks for organizing this and getting the ball rolling. And again sorry > for being late to said game; I was busy elsewhere. > > However, it seems that the idea for such a HA Summit in Brno/Feb 2015 > hasn't exactly fallen on fertile grounds, even with the suggested > user/client day. (Or if there was a lot of feedback, it wasn't > public.) > > I wonder why that is, and if/how we can make this more attractive? > > Frankly, as might have been obvious ;-), for me the venue is an issue. > It's not easy to reach, and I'm theoretically fairly close in Germany > already. 
>
> I wonder if we could increase participation with a virtual meeting (on
> either those dates or another), similar to what the Ceph Developer
> Summit does?
>
> Those appear really productive and make it possible for a wide range of
> interested parties from all over the world to attend, regardless of
> travel times, or even just attend select sessions (that would otherwise
> make it hard to justify travel expenses & time off).
>
>
> Alternatively, would a relocation to a more connected venue help, such
> as Vienna xor Prague?
>
>
> I'd love to get some more feedback from the community.

I agree; some feedback would be useful.

> As Fabio put it, yes, I *can* suck it up and go to Brno if that's where
> everyone goes to play ;-), but I'd also prefer to have a broader
> participation.

Dates and location were chosen to piggy-back with devconf.cz and allow
people to travel for more than just the HA Summit. I'd prefer, at least
for this round, to keep the dates/location and explore the option of
allowing people to join remotely. After all, there are tons of tools,
between Google Hangouts and others, that would allow that.

Fabio

From lists at alteeve.ca  Mon Nov 24 15:06:45 2014
From: lists at alteeve.ca (Digimer)
Date: Mon, 24 Nov 2014 10:06:45 -0500
Subject: [Linux-cluster] [Pacemaker] [ha-wg] [RFC] Organizing HA Summit 2015
In-Reply-To: <547346A9.6010901@redhat.com>
References: <540D853F.3090109@redhat.com> <20141124143957.GU2508@suse.de>
	<547346A9.6010901@redhat.com>
Message-ID: <54734985.1060106@alteeve.ca>

On 24/11/14 09:54 AM, Fabio M. Di Nitto wrote:
> On 11/24/2014 3:39 PM, Lars Marowsky-Bree wrote:
>> On 2014-09-08T12:30:23, "Fabio M. Di Nitto" wrote:
>>
>> Folks, Fabio,
>>
>> thanks for organizing this and getting the ball rolling. And again sorry
>> for being late to said game; I was busy elsewhere.
>>
>> However, it seems that the idea for such a HA Summit in Brno/Feb 2015
>> hasn't exactly fallen on fertile grounds, even with the suggested
>> user/client day.
(Or if there was a lot of feedback, it wasn't >> public.) >> >> I wonder why that is, and if/how we can make this more attractive? I suspect a lot of it is that, given people's busy schedules, February seems far away. Also, I wonder how much discussion has happened outside of these lists. Is it really that there hasn't been much feedback? Fabio started this ball rolling, so I would be interested to hear what he's heard. >> Frankly, as might have been obvious ;-), for me the venue is an issue. >> It's not easy to reach, and I'm theoretically fairly close in Germany >> already. >> >> I wonder if we could increase participation with a virtual meeting (on >> either those dates or another), similar to what the Ceph Developer >> Summit does? Requested feedback given: virtual meetings are never as good, and I really don't like this idea. In my experience, just as much productive decision-making happens in the unofficial after-hours activities as during formal(ish) meetings/presentations. I think it is very important that the meeting remain in-person if at all possible. >> Those appear really productive and make it possible for a wide range of >> interested parties from all over the world to attend, regardless of >> travel times, or even just attend select sessions (that would otherwise >> make it hard to justify travel expenses & time off). >> >> Alternatively, would a relocation to a more connected venue help, such >> as Vienna xor Prague? Personally, I don't care where we meet, but I do believe Fabio already ruled out a relocation. >> I'd love to get some more feedback from the community. > > I agree. Some feedback would be useful. -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education?
From lists at alteeve.ca Mon Nov 24 15:14:26 2014 From: lists at alteeve.ca (Digimer) Date: Mon, 24 Nov 2014 10:14:26 -0500 Subject: [Linux-cluster] [Pacemaker] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: <20141124151235.GX2508@suse.de> References: <540D853F.3090109@redhat.com> <20141124143957.GU2508@suse.de> <547346A9.6010901@redhat.com> <20141124151235.GX2508@suse.de> Message-ID: <54734B52.50708@alteeve.ca> On 24/11/14 10:12 AM, Lars Marowsky-Bree wrote: > Beijing, the US, Tasmania (OK, one crazy guy), various countries in Oh, bring him! crazy++ -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From fdinitto at redhat.com Mon Nov 24 15:16:05 2014 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Mon, 24 Nov 2014 16:16:05 +0100 Subject: [Linux-cluster] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: <20141124151235.GX2508@suse.de> References: <540D853F.3090109@redhat.com> <20141124143957.GU2508@suse.de> <547346A9.6010901@redhat.com> <20141124151235.GX2508@suse.de> Message-ID: <54734BB5.3010104@redhat.com> On 11/24/2014 4:12 PM, Lars Marowsky-Bree wrote: > On 2014-11-24T15:54:33, "Fabio M. Di Nitto" wrote: > >> dates and location were chosen to piggy-back with devconf.cz and allow >> people to travel for more than just HA Summit. > > Yeah, well, devconf.cz is not such an interesting event for those who do > not wear the fedora ;-) That would be the perfect opportunity for you to convert users to SUSE ;) > >> I'd prefer, at least for this round, to keep dates/location and explore >> the option to allow people to join remotely. After all there are tons of >> tools between Google Hangouts and others that would allow that. > > That is, in my experience, the absolute worst. It creates second class > participants and is a PITA for everyone. I agree, it is still a way for people to join in tho.
> > I know that an in-person meeting is useful, but we have a large team in > Beijing, the US, Tasmania (OK, one crazy guy), various countries in > Europe etc. > Yes same here. No difference.. we have one crazy guy in Australia.. Fabio From andrew at beekhof.net Mon Nov 24 21:31:33 2014 From: andrew at beekhof.net (Andrew Beekhof) Date: Tue, 25 Nov 2014 08:31:33 +1100 Subject: [Linux-cluster] [ha-wg-technical] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: <20141124151235.GX2508@suse.de> References: <540D853F.3090109@redhat.com> <20141124143957.GU2508@suse.de> <547346A9.6010901@redhat.com> <20141124151235.GX2508@suse.de> Message-ID: > On 25 Nov 2014, at 2:12 am, Lars Marowsky-Bree wrote: > > On 2014-11-24T15:54:33, "Fabio M. Di Nitto" wrote: > >> dates and location were chosen to piggy-back with devconf.cz and allow >> people to travel for more than just HA Summit. > > Yeah, well, devconf.cz is not such an interesting event for those who do > not wear the fedora ;-) It's not necessarily the conference of choice even for people that do. I just do what I'm told :) > >> I'd prefer, at least for this round, to keep dates/location and explore >> the option to allow people to join remotely. After all there are tons of >> tools between Google Hangouts and others that would allow that. > > That is, in my experience, the absolute worst. It creates second class > participants and is a PITA for everyone. > > I know that an in-person meeting is useful, but we have a large team in > Beijing, the US, Tasmania (OK, one crazy guy), various countries in > Europe etc. > > > Regards, > Lars > > -- > Architect Storage/HA > SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg) > "Experience is the name everyone gives to their mistakes."
-- Oscar Wilde > > _______________________________________________ > ha-wg-technical mailing list > ha-wg-technical at lists.linux-foundation.org > https://lists.linuxfoundation.org/mailman/listinfo/ha-wg-technical From andrew at beekhof.net Tue Nov 25 21:31:02 2014 From: andrew at beekhof.net (Andrew Beekhof) Date: Wed, 26 Nov 2014 08:31:02 +1100 Subject: [Linux-cluster] [Cluster-devel] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: <20141125095401.GG2522@suse.de> References: <540D853F.3090109@redhat.com> <20141124143957.GU2508@suse.de> <547346A9.6010901@redhat.com> <20141124151235.GX2508@suse.de> <54734BB5.3010104@redhat.com> <20141125095401.GG2522@suse.de> Message-ID: > On 25 Nov 2014, at 8:54 pm, Lars Marowsky-Bree wrote: > > On 2014-11-24T16:16:05, "Fabio M. Di Nitto" wrote: > >>> Yeah, well, devconf.cz is not such an interesting event for those who do >>> not wear the fedora ;-) >> That would be the perfect opportunity for you to convert users to SUSE ;) > >>>> I'd prefer, at least for this round, to keep dates/location and explore >>>> the option to allow people to join remotely. After all there are tons of >>>> tools between Google Hangouts and others that would allow that. >>> That is, in my experience, the absolute worst. It creates second class >>> participants and is a PITA for everyone. >> I agree, it is still a way for people to join in tho. > > I personally disagree. In my experience, one either does a face-to-face > meeting, or a virtual one that puts everyone on the same footing. > Mixing both works really badly unless the team already knows each > other. > >>> I know that an in-person meeting is useful, but we have a large team in >>> Beijing, the US, Tasmania (OK, one crazy guy), various countries in >>> Europe etc. >> Yes same here. No difference.. we have one crazy guy in Australia.. > > Yeah, but you're already bringing him for your personal conference. > That's a bit different. ;-) > > OK, let's switch tracks a bit.
What *topics* do we actually have? Can we > fill two days? Where would we want to collect them? Personally I'm interested in talking about scaling - with pacemaker-remoted and/or a new messaging/membership layer. Other design-y topics: - SBD - degraded mode - improved notifications - containerisation of services (cgroups, docker, virt) - resource-agents (upstream releases, handling of pull requests, testing) User-facing topics could include recent features (i.e. pacemaker-remoted, crm_resource --restart) and common deployment scenarios (e.g. NFS) that people get wrong. From dvossel at redhat.com Tue Nov 25 21:46:01 2014 From: dvossel at redhat.com (David Vossel) Date: Tue, 25 Nov 2014 16:46:01 -0500 (EST) Subject: [Linux-cluster] [Pacemaker] [Cluster-devel] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: References: <540D853F.3090109@redhat.com> <20141124143957.GU2508@suse.de> <547346A9.6010901@redhat.com> <20141124151235.GX2508@suse.de> <54734BB5.3010104@redhat.com> <20141125095401.GG2522@suse.de> Message-ID: <1770308907.3548355.1416951961151.JavaMail.zimbra@redhat.com> ----- Original Message ----- > > > On 25 Nov 2014, at 8:54 pm, Lars Marowsky-Bree wrote: > > > > On 2014-11-24T16:16:05, "Fabio M. Di Nitto" wrote: > > > >>> Yeah, well, devconf.cz is not such an interesting event for those who do > >>> not wear the fedora ;-) > >> That would be the perfect opportunity for you to convert users to SUSE ;) > > > >>>> I'd prefer, at least for this round, to keep dates/location and explore > >>>> the option to allow people to join remotely. After all there are tons of > >>>> tools between Google Hangouts and others that would allow that. > >>> That is, in my experience, the absolute worst. It creates second class > >>> participants and is a PITA for everyone. > >> I agree, it is still a way for people to join in tho. > > > > I personally disagree.
In my experience, one either does a face-to-face > > meeting, or a virtual one that puts everyone on the same footing. > > Mixing both works really badly unless the team already knows each > > other. > > > >>> I know that an in-person meeting is useful, but we have a large team in > >>> Beijing, the US, Tasmania (OK, one crazy guy), various countries in > >>> Europe etc. > >> Yes same here. No difference.. we have one crazy guy in Australia.. > > > > Yeah, but you're already bringing him for your personal conference. > > That's a bit different. ;-) > > > > OK, let's switch tracks a bit. What *topics* do we actually have? Can we > > fill two days? Where would we want to collect them? > > Personally I'm interested in talking about scaling - with pacemaker-remoted > and/or a new messaging/membership layer. If we're going to talk about scaling, we should throw in our new Docker support in the same discussion. Docker lends itself well to the "pet vs cattle" analogy. I see management of Docker with pacemaker making quite a bit of sense now that we have the ability to scale into the "cattle" territory. > Other design-y topics: > - SBD > - degraded mode > - improved notifications > - containerisation of services (cgroups, docker, virt) > - resource-agents (upstream releases, handling of pull requests, testing) Yep, we definitely need to talk about the resource-agents. > > User-facing topics could include recent features (i.e. pacemaker-remoted, > crm_resource --restart) and common deployment scenarios (e.g. NFS) that > people get wrong. Adding to the list, it would be a good idea to talk about deployment integration testing, what's going on with the phd project and why it's important regardless of whether you're interested in what the project functionally does.
-- Vossel > _______________________________________________ > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org > http://oss.clusterlabs.org/mailman/listinfo/pacemaker > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org > From lists at alteeve.ca Tue Nov 25 23:06:29 2014 From: lists at alteeve.ca (Digimer) Date: Tue, 25 Nov 2014 18:06:29 -0500 Subject: [Linux-cluster] [ha-wg-technical] [Cluster-devel] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: References: <540D853F.3090109@redhat.com> <20141124143957.GU2508@suse.de> <547346A9.6010901@redhat.com> <20141124151235.GX2508@suse.de> <54734BB5.3010104@redhat.com> <20141125095401.GG2522@suse.de> Message-ID: <54750B75.6040207@alteeve.ca> On 25/11/14 04:31 PM, Andrew Beekhof wrote: >> Yeah, but you're already bringing him for your personal conference. >> That's a bit different. ;-) >> >> OK, let's switch tracks a bit. What *topics* do we actually have? Can we >> fill two days? Where would we want to collect them? > > Personally I'm interested in talking about scaling - with pacemaker-remoted and/or a new messaging/membership layer. > > Other design-y topics: > - SBD > - degraded mode > - improved notifications This may be something my company can bring to the table. We just hired a dev whose principal goal is to develop an alert system for HA. We're modelling it heavily on the fence/resource agent model with a "scan core" and "scan agents". It's sort of like existing tools, but designed specifically for HA clusters and heavily focused on not interfering with the host more than at all necessary. By Feb., it should be mostly done.
> - containerisation of services (cgroups, docker, virt) > - resource-agents (upstream releases, handling of pull requests, testing) > > User-facing topics could include recent features (i.e. pacemaker-remoted, crm_resource --restart) and common deployment scenarios (e.g. NFS) that people get wrong. > > > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ > What if the cure for cancer is trapped in the mind of a person without access to education? From andrew at beekhof.net Tue Nov 25 23:11:25 2014 From: andrew at beekhof.net (Andrew Beekhof) Date: Wed, 26 Nov 2014 10:11:25 +1100 Subject: [Linux-cluster] [ha-wg-technical] [Cluster-devel] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: <54750B75.6040207@alteeve.ca> References: <540D853F.3090109@redhat.com> <20141124143957.GU2508@suse.de> <547346A9.6010901@redhat.com> <20141124151235.GX2508@suse.de> <54734BB5.3010104@redhat.com> <20141125095401.GG2522@suse.de> <54750B75.6040207@alteeve.ca> Message-ID: <44714D54-50F1-4817-9B9D-09B64C128EC6@beekhof.net> > On 26 Nov 2014, at 10:06 am, Digimer wrote: > > On 25/11/14 04:31 PM, Andrew Beekhof wrote: >>> Yeah, but you're already bringing him for your personal conference. >>> That's a bit different. ;-) >>> >>> OK, let's switch tracks a bit. What *topics* do we actually have? Can we >>> fill two days? Where would we want to collect them? >> >> Personally I'm interested in talking about scaling - with pacemaker-remoted and/or a new messaging/membership layer. >> >> Other design-y topics: >> - SBD >> - degraded mode >> - improved notifications > > This may be something my company can bring to the table. We just hired a dev whose principal goal is to develop an alert system for HA. We're modelling it heavily on the fence/resource agent model with a "scan core" and "scan agents". It's sort of like existing tools, but designed specifically for HA clusters and heavily focused on not interfering with the host more than at all necessary. By Feb., it should be mostly done.
> > We're doing this for our own needs, but it might be a framework worth talking about, if nothing else to see if others consider it a fit. Of course, it will be entirely open source. *If* there is interest, I could put together a(n informal) talk on it with a demo. Definitely interesting > >> - containerisation of services (cgroups, docker, virt) >> - resource-agents (upstream releases, handling of pull requests, testing) >> >> User-facing topics could include recent features (i.e. pacemaker-remoted, crm_resource --restart) and common deployment scenarios (e.g. NFS) that people get wrong. > > > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ > What if the cure for cancer is trapped in the mind of a person without access to education? From lists at alteeve.ca Wed Nov 26 05:58:30 2014 From: lists at alteeve.ca (Digimer) Date: Wed, 26 Nov 2014 00:58:30 -0500 Subject: [Linux-cluster] [Cluster-devel] [ha-wg] [Linux-HA] [RFC] Organizing HA Summit 2015 In-Reply-To: <54756A76.60905@fabbione.net> References: <540D853F.3090109@redhat.com> <20141124143957.GU2508@suse.de> <547346A9.6010901@redhat.com> <20141124151235.GX2508@suse.de> <54734BB5.3010104@redhat.com> <20141125095401.GG2522@suse.de> <54756A76.60905@fabbione.net> Message-ID: <54756C06.4090508@alteeve.ca> On 26/11/14 12:51 AM, Fabio M. Di Nitto wrote: > > > On 11/25/2014 10:54 AM, Lars Marowsky-Bree wrote: >> On 2014-11-24T16:16:05, "Fabio M. Di Nitto" wrote: >> >>>> Yeah, well, devconf.cz is not such an interesting event for those who do >>>> not wear the fedora ;-) >>> That would be the perfect opportunity for you to convert users to SUSE ;) >> >>>>> I'd prefer, at least for this round, to keep dates/location and explore >>>>> the option to allow people to join remotely. After all there are tons of >>>>> tools between Google Hangouts and others that would allow that. >>>> That is, in my experience, the absolute worst. It creates second class >>>> participants and is a PITA for everyone.
>>> I agree, it is still a way for people to join in tho. >> >> I personally disagree. In my experience, one either does a face-to-face >> meeting, or a virtual one that puts everyone on the same footing. >> Mixing both works really badly unless the team already knows each >> other. >> >>>> I know that an in-person meeting is useful, but we have a large team in >>>> Beijing, the US, Tasmania (OK, one crazy guy), various countries in >>>> Europe etc. >>> Yes same here. No difference.. we have one crazy guy in Australia.. >> >> Yeah, but you're already bringing him for your personal conference. >> That's a bit different. ;-) >> >> OK, let's switch tracks a bit. What *topics* do we actually have? Can we >> fill two days? Where would we want to collect them? > > I'd say either a google doc or any random etherpad/wiki instance will do > just fine. > > As for the topics: > - corosync qdevice and plugins (network, disk, integration with sbd?, > others?) > - corosync RRP / libknet integration/replacement > - fence autodetection/autoconfiguration > > For the user-facing topics (that is if there are enough participants and > I only got 1 user confirmation so far): > > - demos, cluster 101, tutorials > - get feedback > - get feedback > - get more feedback > > Fabio I'd be happy to do a cluster 101 or similar, if there is interest. Not sure if that would be particularly appealing to anyone coming to our meeting, as I think anyone interested is probably well past 101. :) Anyway, you guys know my background, let me know if there is a topic you'd like me to cover for the user side of things. -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education?
From lists at alteeve.ca Wed Nov 26 06:10:52 2014 From: lists at alteeve.ca (Digimer) Date: Wed, 26 Nov 2014 01:10:52 -0500 Subject: [Linux-cluster] [ha-wg-technical] [ha-wg] [Linux-HA] [RFC] Organizing HA Summit 2015 In-Reply-To: <54756A76.60905@fabbione.net> References: <540D853F.3090109@redhat.com> <20141124143957.GU2508@suse.de> <547346A9.6010901@redhat.com> <20141124151235.GX2508@suse.de> <54734BB5.3010104@redhat.com> <20141125095401.GG2522@suse.de> <54756A76.60905@fabbione.net> Message-ID: <54756EEC.7050905@alteeve.ca> On 26/11/14 12:51 AM, Fabio M. Di Nitto wrote: > > > On 11/25/2014 10:54 AM, Lars Marowsky-Bree wrote: >> On 2014-11-24T16:16:05, "Fabio M. Di Nitto" wrote: >> >>>> Yeah, well, devconf.cz is not such an interesting event for those who do >>>> not wear the fedora ;-) >>> That would be the perfect opportunity for you to convert users to SUSE ;) >> >>>>> I'd prefer, at least for this round, to keep dates/location and explore >>>>> the option to allow people to join remotely. After all there are tons of >>>>> tools between Google Hangouts and others that would allow that. >>>> That is, in my experience, the absolute worst. It creates second class >>>> participants and is a PITA for everyone. >>> I agree, it is still a way for people to join in tho. >> >> I personally disagree. In my experience, one either does a face-to-face >> meeting, or a virtual one that puts everyone on the same footing. >> Mixing both works really badly unless the team already knows each >> other. >> >>>> I know that an in-person meeting is useful, but we have a large team in >>>> Beijing, the US, Tasmania (OK, one crazy guy), various countries in >>>> Europe etc. >>> Yes same here. No difference.. we have one crazy guy in Australia.. >> >> Yeah, but you're already bringing him for your personal conference. >> That's a bit different. ;-) >> >> OK, let's switch tracks a bit. What *topics* do we actually have? Can we >> fill two days? Where would we want to collect them?
> > I'd say either a google doc or any random etherpad/wiki instance will do > just fine. > > As for the topics: > - corosync qdevice and plugins (network, disk, integration with sbd?, > others?) > - corosync RRP / libknet integration/replacement > - fence autodetection/autoconfiguration > > For the user-facing topics (that is if there are enough participants and > I only got 1 user confirmation so far): > > - demos, cluster 101, tutorials > - get feedback > - get feedback > - get more feedback > > Fabio Ok, I do have a topic I want to add: merging the dozen different mailing lists, IRC channels and other support forums. This thread is a good example of how thinly the community is spread. A 'dev', 'user', 'announce' list should be enough for all HA. Likewise, one IRC channel should be enough, too. The trick will be discussing this without bikeshedding. :) digimer -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From andrew at beekhof.net Wed Nov 26 06:28:25 2014 From: andrew at beekhof.net (Andrew Beekhof) Date: Wed, 26 Nov 2014 17:28:25 +1100 Subject: [Linux-cluster] [ha-wg-technical] [ha-wg] [Linux-HA] [RFC] Organizing HA Summit 2015 In-Reply-To: <54756A76.60905@fabbione.net> References: <540D853F.3090109@redhat.com> <20141124143957.GU2508@suse.de> <547346A9.6010901@redhat.com> <20141124151235.GX2508@suse.de> <54734BB5.3010104@redhat.com> <20141125095401.GG2522@suse.de> <54756A76.60905@fabbione.net> Message-ID: > On 26 Nov 2014, at 4:51 pm, Fabio M. Di Nitto wrote: > > > > On 11/25/2014 10:54 AM, Lars Marowsky-Bree wrote: >> On 2014-11-24T16:16:05, "Fabio M.
Di Nitto" wrote: >> >>>> Yeah, well, devconf.cz is not such an interesting event for those who do >>>> not wear the fedora ;-) >>> That would be the perfect opportunity for you to convert users to SUSE ;) >> >>>>> I'd prefer, at least for this round, to keep dates/location and explore >>>>> the option to allow people to join remotely. After all there are tons of >>>>> tools between Google Hangouts and others that would allow that. >>>> That is, in my experience, the absolute worst. It creates second class >>>> participants and is a PITA for everyone. >>> I agree, it is still a way for people to join in tho. >> >> I personally disagree. In my experience, one either does a face-to-face >> meeting, or a virtual one that puts everyone on the same footing. >> Mixing both works really badly unless the team already knows each >> other. >> >>>> I know that an in-person meeting is useful, but we have a large team in >>>> Beijing, the US, Tasmania (OK, one crazy guy), various countries in >>>> Europe etc. >>> Yes same here. No difference.. we have one crazy guy in Australia.. >> >> Yeah, but you're already bringing him for your personal conference. >> That's a bit different. ;-) >> >> OK, let's switch tracks a bit. What *topics* do we actually have? Can we >> fill two days? Where would we want to collect them? > > I'd say either a google doc or any random etherpad/wiki instance will do > just fine. -ENOGOOGLE > > As for the topics: > - corosync qdevice and plugins (network, disk, integration with sbd?, > others?)
> - corosync RRP / libknet integration/replacement > - fence autodetection/autoconfiguration > > For the user-facing topics (that is if there are enough participants and > I only got 1 user confirmation so far): > > - demos, cluster 101, tutorials > - get feedback > - get feedback > - get more feedback > > Fabio > _______________________________________________ > ha-wg-technical mailing list > ha-wg-technical at lists.linux-foundation.org > https://lists.linuxfoundation.org/mailman/listinfo/ha-wg-technical From bubble at hoster-ok.com Wed Nov 26 15:53:50 2014 From: bubble at hoster-ok.com (Vladislav Bogdanov) Date: Wed, 26 Nov 2014 18:53:50 +0300 Subject: [Linux-cluster] [Pacemaker] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: <20141125095401.GG2522@suse.de> References: <540D853F.3090109@redhat.com> <20141124143957.GU2508@suse.de> <547346A9.6010901@redhat.com> <20141124151235.GX2508@suse.de> <54734BB5.3010104@redhat.com> <20141125095401.GG2522@suse.de> Message-ID: <5475F78E.1040700@hoster-ok.com> 25.11.2014 12:54, Lars Marowsky-Bree wrote:... > > OK, let's switch tracks a bit. What *topics* do we actually have? Can we > fill two days? Where would we want to collect them? > Just my 2c. - It would be interesting to get a bird's-eye view of what C APIs corosync and pacemaker currently provide to application developers (one immediate use case is in-app monitoring of cluster events). - One more (more developer-focused) topic could be "resource degraded state" support. From the user perspective it would be nice to have. One immediate example is an iSCSI connection to several portals. When some portals are not accessible, the connection may still work, but in a "degraded" state.
Best, Vladislav From raju.rajsand at gmail.com Wed Nov 26 18:43:11 2014 From: raju.rajsand at gmail.com (Rajagopal Swaminathan) Date: Thu, 27 Nov 2014 00:13:11 +0530 Subject: [Linux-cluster] [Pacemaker] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: <5475F78E.1040700@hoster-ok.com> References: <540D853F.3090109@redhat.com> <20141124143957.GU2508@suse.de> <547346A9.6010901@redhat.com> <20141124151235.GX2508@suse.de> <54734BB5.3010104@redhat.com> <20141125095401.GG2522@suse.de> <5475F78E.1040700@hoster-ok.com> Message-ID: Greetings, Guys, I am a poor Indian whom the US of A abhors, and I have successfully deployed over 5 CentOS/RHEL clusters varying from 4-6. May I know where this event is held? Why don't you shift it to India, which is much less expensive for all? I will try to invigorate all ILUG groups as much as I can. On Wed, Nov 26, 2014 at 9:23 PM, Vladislav Bogdanov wrote: > 25.11.2014 12:54, Lars Marowsky-Bree wrote:... >> >> OK, let's switch tracks a bit. What *topics* do we actually have? Can we >> fill two days? Where would we want to collect them? >> > > Just my 2c. > > - It would be interesting to get a bird's-eye view > of what C APIs corosync and pacemaker currently provide to application > developers (one immediate use case is in-app monitoring of cluster > events). > > - One more (more developer-focused) topic could be "resource degraded > state" support. From the user perspective it would be nice to have. One > immediate example is an iSCSI connection to several portals. When some > portals are not accessible, the connection may still work, but in a > "degraded" state.
> > Best, > Vladislav > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- Regards, Rajagopal From misch at schwartzkopff.org Wed Nov 26 19:00:33 2014 From: misch at schwartzkopff.org (Michael Schwartzkopff) Date: Wed, 26 Nov 2014 20:00:33 +0100 Subject: [Linux-cluster] [Pacemaker] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: References: <540D853F.3090109@redhat.com> <5475F78E.1040700@hoster-ok.com> Message-ID: <1875415.HLenkzVapo@nb003> Am Donnerstag, 27. November 2014, 00:13:11 schrieb Rajagopal Swaminathan: > Greetings, > > > Guys, I am a poor Indian whom the US of A abhors, and I have successfully > deployed over 5 CentOS/RHEL clusters varying from 4-6. > > May I know where this event is held? Brno, Slovakia. Nearest international airport: Vienna. > Why don't you shift it to India, which is much less expensive for all? Because flights to India would be more expensive for most of the participants. Greetings, -- Dr. Michael Schwartzkopff Guardinistr. 63 81375 München Tel: (0162) 1650044 Fax: (089) 620 304 13 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 230 bytes Desc: This is a digitally signed message part. URL: From mgrac at redhat.com Wed Nov 26 21:18:15 2014 From: mgrac at redhat.com (Marek "marx" Grac) Date: Wed, 26 Nov 2014 22:18:15 +0100 Subject: [Linux-cluster] [Pacemaker] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: <1875415.HLenkzVapo@nb003> References: <540D853F.3090109@redhat.com> <5475F78E.1040700@hoster-ok.com> <1875415.HLenkzVapo@nb003> Message-ID: <54764397.60701@redhat.com> On 11/26/2014 08:00 PM, Michael Schwartzkopff wrote: > Am Donnerstag, 27. November 2014, 00:13:11 schrieb Rajagopal Swaminathan: >> Greetings, >> >> >> Guys, I am a poor Indian whom the US of A abhors, and I have successfully >> deployed over 5 CentOS/RHEL clusters varying from 4-6.
>> >> May I know where this event is held? > Brno, Slovakia. Nearest international airport: Vienna. Brno is quite close to Slovakia, but it is in the Czech Republic. International airports around are Vienna, Prague and mostly low-cost ones in Brno and Bratislava m, From andrew at beekhof.net Wed Nov 26 22:26:52 2014 From: andrew at beekhof.net (Andrew Beekhof) Date: Thu, 27 Nov 2014 09:26:52 +1100 Subject: [Linux-cluster] [ha-wg-technical] [ha-wg] [Pacemaker] [Cluster-devel] [Linux-HA] [RFC] Organizing HA Summit 2015 In-Reply-To: <20141126154119.GN2522@suse.de> References: <540D853F.3090109@redhat.com> <20141124143957.GU2508@suse.de> <547346A9.6010901@redhat.com> <20141124151235.GX2508@suse.de> <54734BB5.3010104@redhat.com> <20141125095401.GG2522@suse.de> <1770308907.3548355.1416951961151.JavaMail.zimbra@redhat.com> <20141126154119.GN2522@suse.de> Message-ID: <76F44DBB-4E4B-4813-81E2-B0A5A664BD1A@beekhof.net> > On 27 Nov 2014, at 2:41 am, Lars Marowsky-Bree wrote: > > On 2014-11-25T16:46:01, David Vossel wrote: > > Okay, okay, apparently we have got enough topics to discuss. I'll > grumble a bit more about Brno, but let's get the organisation of that > thing on track ... Sigh. Always so much work! > > I'm assuming arrival on the 3rd and departure on the 6th would be the > plan? > >>> Personally I'm interested in talking about scaling - with pacemaker-remoted >>> and/or a new messaging/membership layer. >> If we're going to talk about scaling, we should throw in our new Docker support >> in the same discussion. Docker lends itself well to the "pet vs cattle" analogy. >> I see management of Docker with pacemaker making quite a bit of sense now that we >> have the ability to scale into the "cattle" territory. > > While we're on that, I'd like to throw in a heretical thought and suggest > that one might want to look at etcd and fleetd. Nod. I suspect the next evolutionary step will be to sit on a NoSQL/Big-data kind of table.... somehow.
I was intending to head down that path last year when I did all that CIB work. > >>> Other design-y topics: >>> - SBD > Point taken. I have actually not forgotten this, Andrew, and am reading > your development. I probably just need to pull the code over ... ok > >>> - degraded mode >>> - improved notifications >>> - containerisation of services (cgroups, docker, virt) >>> - resource-agents (upstream releases, handling of pull requests, testing) >> >> Yep, we definitely need to talk about the resource-agents. > > Agreed. > >>> User-facing topics could include recent features (i.e. pacemaker-remoted, >>> crm_resource --restart) and common deployment scenarios (e.g. NFS) that >>> people get wrong. >> Adding to the list, it would be a good idea to talk about deployment >> integration testing, what's going on with the phd project and why it's >> important regardless of whether you're interested in what the project functionally >> does. > > OK. So QA is within scope as well. It seems the agenda will fill up > quite nicely. > > > Regards, > Lars > > -- > Architect Storage/HA > SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg) > "Experience is the name everyone gives to their mistakes."
-- Oscar Wilde > > _______________________________________________ > ha-wg-technical mailing list > ha-wg-technical at lists.linux-foundation.org > https://lists.linuxfoundation.org/mailman/listinfo/ha-wg-technical From andrew at beekhof.net Wed Nov 26 22:28:45 2014 From: andrew at beekhof.net (Andrew Beekhof) Date: Thu, 27 Nov 2014 09:28:45 +1100 Subject: [Linux-cluster] [Pacemaker] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: <54764397.60701@redhat.com> References: <540D853F.3090109@redhat.com> <5475F78E.1040700@hoster-ok.com> <1875415.HLenkzVapo@nb003> <54764397.60701@redhat.com> Message-ID: <7BC3FDC4-8218-47B8-BAE7-8D512D4C988E@beekhof.net> > On 27 Nov 2014, at 8:18 am, Marek marx Grac wrote: > > > On 11/26/2014 08:00 PM, Michael Schwartzkopff wrote: >> Am Donnerstag, 27. November 2014, 00:13:11 schrieb Rajagopal Swaminathan: >>> Greetings, >>> >>> >>> Guys, I am a poor Indian whom US of A Abhors and have successfully >>> deployed over 5 centos/rhel clusts vaying from 4-6. >>> >>> May I Know where this event is held? >> Brno, Slovakia. Next international Airport: Vienna. > Brno is quite close to Slovakia but it is in Czech Republic. International airports around are Vienna, Prague and mostly low-cost ones in Brno and Bratislava Anyone want to meet in munich and share a car? :-) From misch at schwartzkopff.org Wed Nov 26 22:51:58 2014 From: misch at schwartzkopff.org (Michael Schwartzkopff) Date: Wed, 26 Nov 2014 23:51:58 +0100 Subject: [Linux-cluster] [Pacemaker] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: <7BC3FDC4-8218-47B8-BAE7-8D512D4C988E@beekhof.net> References: <540D853F.3090109@redhat.com> <54764397.60701@redhat.com> <7BC3FDC4-8218-47B8-BAE7-8D512D4C988E@beekhof.net> Message-ID: <3827969.fRP4QWWuTV@nb003> Am Donnerstag, 27. November 2014, 09:28:45 schrieb Andrew Beekhof: > > On 27 Nov 2014, at 8:18 am, Marek marx Grac wrote: > > > > On 11/26/2014 08:00 PM, Michael Schwartzkopff wrote: > >> Am Donnerstag, 27. 
November 2014, 00:13:11 schrieb Rajagopal Swaminathan: > >>> Greetings, > >>> > >>> > >>> Guys, I am a poor Indian whom US of A Abhors and have successfully > >>> deployed over 5 centos/rhel clusts vaying from 4-6. > >>> > >>> May I Know where this event is held? > >> > >> Brno, Slovakia. Next international Airport: Vienna. > > > > Brno is quite close to Slovakia but it is in Czech Republic. International > > airports around are Vienna, Prague and mostly low-cost ones in Brno and > > Bratislava > Anyone want to meet in munich and share a car? :-) Quite a ride: google says 6 hours. But you are welcome. I'll drive. Anyone else? Sorry. -ENOGOOGLE, I forgot. -- Dr. Michael Schwartzkopff Guardinistr. 63 81375 München Tel: (0162) 1650044 Fax: (089) 620 304 13 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 230 bytes Desc: This is a digitally signed message part. URL: From lists at alteeve.ca Wed Nov 26 22:58:51 2014 From: lists at alteeve.ca (Digimer) Date: Wed, 26 Nov 2014 17:58:51 -0500 Subject: [Linux-cluster] [Pacemaker] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: <7BC3FDC4-8218-47B8-BAE7-8D512D4C988E@beekhof.net> References: <540D853F.3090109@redhat.com> <5475F78E.1040700@hoster-ok.com> <1875415.HLenkzVapo@nb003> <54764397.60701@redhat.com> <7BC3FDC4-8218-47B8-BAE7-8D512D4C988E@beekhof.net> Message-ID: <54765B2B.3060703@alteeve.ca> On 26/11/14 05:28 PM, Andrew Beekhof wrote: > >> On 27 Nov 2014, at 8:18 am, Marek marx Grac wrote: >> >> >> On 11/26/2014 08:00 PM, Michael Schwartzkopff wrote: >>> Am Donnerstag, 27. November 2014, 00:13:11 schrieb Rajagopal Swaminathan: >>>> Greetings, >>>> >>>> >>>> Guys, I am a poor Indian whom US of A Abhors and have successfully >>>> deployed over 5 centos/rhel clusts vaying from 4-6. >>>> >>>> May I Know where this event is held? >>> Brno, Slovakia. Next international Airport: Vienna. 
>> Brno is quite close to Slovakia but it is in Czech Republic. International airports around are Vienna, Prague and mostly low-cost ones in Brno and Bratislava > > Anyone want to meet in munich and share a car? :-) I might be up for that. I've not looked into flights yet, though I do have a standing invitation for beer in Vienna, so I'm sort of planning to fly through there. Apparently there is a very convenient bus from Vienna to Brno. Why Munich? (Don't get me wrong, I loved it there last year!) -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From andrew at beekhof.net Thu Nov 27 00:40:31 2014 From: andrew at beekhof.net (Andrew Beekhof) Date: Thu, 27 Nov 2014 11:40:31 +1100 Subject: [Linux-cluster] [Pacemaker] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: <54765B2B.3060703@alteeve.ca> References: <540D853F.3090109@redhat.com> <5475F78E.1040700@hoster-ok.com> <1875415.HLenkzVapo@nb003> <54764397.60701@redhat.com> <7BC3FDC4-8218-47B8-BAE7-8D512D4C988E@beekhof.net> <54765B2B.3060703@alteeve.ca> Message-ID: <7C5D986E-1CF2-4991-AFA7-6BEB9E552D41@beekhof.net> > On 27 Nov 2014, at 9:58 am, Digimer wrote: > > On 26/11/14 05:28 PM, Andrew Beekhof wrote: >> >>> On 27 Nov 2014, at 8:18 am, Marek marx Grac wrote: >>> >>> >>> On 11/26/2014 08:00 PM, Michael Schwartzkopff wrote: >>>> Am Donnerstag, 27. November 2014, 00:13:11 schrieb Rajagopal Swaminathan: >>>>> Greetings, >>>>> >>>>> >>>>> Guys, I am a poor Indian whom US of A Abhors and have successfully >>>>> deployed over 5 centos/rhel clusts vaying from 4-6. >>>>> >>>>> May I Know where this event is held? >>>> Brno, Slovakia. Next international Airport: Vienna. >>> Brno is quite close to Slovakia but it is in Czech Republic. International airports around are Vienna, Prague and mostly low-cost ones in Brno and Bratislava >> >> Anyone want to meet in munich and share a car? :-) > > I might be up for that. 
I've not looked into flights yet, though I do have a standing invitation for beer in Vienna, so I'm sort of planning to fly through there. Apparently there is a very convenient bus from Vienna to Brno. > > Why Munich? (Don't get me wrong, I loved it there last year!) It's both a) a hub and b) where I used to live :) From lists at alteeve.ca Thu Nov 27 04:13:54 2014 From: lists at alteeve.ca (Digimer) Date: Wed, 26 Nov 2014 23:13:54 -0500 Subject: [Linux-cluster] [Pacemaker] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015 In-Reply-To: <7C5D986E-1CF2-4991-AFA7-6BEB9E552D41@beekhof.net> References: <540D853F.3090109@redhat.com> <5475F78E.1040700@hoster-ok.com> <1875415.HLenkzVapo@nb003> <54764397.60701@redhat.com> <7BC3FDC4-8218-47B8-BAE7-8D512D4C988E@beekhof.net> <54765B2B.3060703@alteeve.ca> Message-ID: <5476A502.6010508@alteeve.ca> On 26/11/14 07:40 PM, Andrew Beekhof wrote: > >> On 27 Nov 2014, at 9:58 am, Digimer wrote: >> >> On 26/11/14 05:28 PM, Andrew Beekhof wrote: >>> >>>> On 27 Nov 2014, at 8:18 am, Marek marx Grac wrote: >>>> >>>> >>>> On 11/26/2014 08:00 PM, Michael Schwartzkopff wrote: >>>>> Am Donnerstag, 27. November 2014, 00:13:11 schrieb Rajagopal Swaminathan: >>>>>> Greetings, >>>>>> >>>>>> >>>>>> Guys, I am a poor Indian whom US of A Abhors and have successfully >>>>>> deployed over 5 centos/rhel clusts vaying from 4-6. >>>>>> >>>>>> May I Know where this event is held? >>>>> Brno, Slovakia. Next international Airport: Vienna. >>>> Brno is quite close to Slovakia but it is in Czech Republic. International airports around are Vienna, Prague and mostly low-cost ones in Brno and Bratislava >>> >>> Anyone want to meet in munich and share a car? :-) >> >> I might be up for that. 
(Don't get me wrong, I loved it there last year!) > > Its both a) a hub and b) where I used to live :) Ah. Well, I'll see how the prices come out. I didn't realize it was a 6h drive. On the other hand, it'd be a great way to see Europe beyond hotels/airports... Are you serious about the 6h drive though? That's quite the ride. -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From kgronlund at suse.com Thu Nov 27 12:33:30 2014 From: kgronlund at suse.com (Kristoffer =?utf-8?Q?Gr=C3=B6nlund?=) Date: Thu, 27 Nov 2014 13:33:30 +0100 Subject: [Linux-cluster] [Pacemaker] [ha-wg-technical] [ha-wg] [Cluster-devel] [Linux-HA] [RFC] Organizing HA Summit 2015 In-Reply-To: <76F44DBB-4E4B-4813-81E2-B0A5A664BD1A@beekhof.net> References: <540D853F.3090109@redhat.com> <20141124143957.GU2508@suse.de> <547346A9.6010901@redhat.com> <20141124151235.GX2508@suse.de> <54734BB5.3010104@redhat.com> <20141125095401.GG2522@suse.de> <1770308907.3548355.1416951961151.JavaMail.zimbra@redhat.com> <20141126154119.GN2522@suse.de> <76F44DBB-4E4B-4813-81E2-B0A5A664BD1A@beekhof.net> Message-ID: <87lhmw6cx1.fsf@krigpad.site> >> On 27 Nov 2014, at 2:41 am, Lars Marowsky-Bree wrote: >> >> On 2014-11-25T16:46:01, David Vossel wrote: >> >> Okay, okay, apparently we have got enough topics to discuss. I'll >> grumble a bit more about Brno, but let's get the organisation of that >> thing on track ... Sigh. Always so much work! >> Will Chris Feist be at the summit? I would be happy to have a roundtable discussion or something similar about clients, exchange ideas and so on. I don't necessarily think that there is an urgent need to unify the efforts code-wise, but I think there is a lot we could do together on the level of idea exchange without giving up our independence, so to speak ;) Of course I would be happy to talk about such things with anyone else who is interested as well. 
-- // Kristoffer Grönlund // kgronlund at suse.com From fdinitto at redhat.com Thu Nov 27 12:56:34 2014 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Thu, 27 Nov 2014 13:56:34 +0100 Subject: [Linux-cluster] [ha-wg-technical] [Pacemaker] [ha-wg] [Cluster-devel] [Linux-HA] [RFC] Organizing HA Summit 2015 In-Reply-To: <87lhmw6cx1.fsf@krigpad.site> References: <540D853F.3090109@redhat.com> <20141124143957.GU2508@suse.de> <547346A9.6010901@redhat.com> <20141124151235.GX2508@suse.de> <54734BB5.3010104@redhat.com> <20141125095401.GG2522@suse.de> <1770308907.3548355.1416951961151.JavaMail.zimbra@redhat.com> <20141126154119.GN2522@suse.de> <76F44DBB-4E4B-4813-81E2-B0A5A664BD1A@beekhof.net> <87lhmw6cx1.fsf@krigpad.site> Message-ID: <54771F82.6050800@redhat.com> On 11/27/2014 1:33 PM, Kristoffer Grönlund wrote: > >>> On 27 Nov 2014, at 2:41 am, Lars Marowsky-Bree wrote: >>> >>> On 2014-11-25T16:46:01, David Vossel wrote: >>> >>> Okay, okay, apparently we have got enough topics to discuss. I'll >>> grumble a bit more about Brno, but let's get the organisation of that >>> thing on track ... Sigh. Always so much work! >>> > > Will Chris Feist be at the summit? I would be happy to have a roundtable > discussion or something similar about clients, exchange ideas and so > on. I don't necessarily think that there is an urgent need to unify the > efforts code-wise, but I think there is a lot we could do together on > the level of idea exchange without giving up our independence, so to > speak ;) > > Of course I would be happy to talk about such things with anyone else > who is interested as well. > sorry, I keep replying from my private email address... Yes Chris will be there too. 
Fabio From lists at alteeve.ca Thu Nov 27 16:52:18 2014 From: lists at alteeve.ca (Digimer) Date: Thu, 27 Nov 2014 11:52:18 -0500 Subject: [Linux-cluster] Wiki for planning created - Re: [Pacemaker] [RFC] Organizing HA Summit 2015 In-Reply-To: <540D853F.3090109@redhat.com> References: <540D853F.3090109@redhat.com> Message-ID: <547756C2.1060504@alteeve.ca> I just created a dedicated/fresh wiki for planning and organizing: http://plan.alteeve.ca/index.php/Main_Page Other than the domain, it has no association with any existing project, so it should be a neutral enough platform. Also, it's not owned by $megacorp (I wish!), so spying/privacy shouldn't be an issue I hope. If there is concern, I can set up https. If no one else gets to it before me, I'll start collating the data from the mailing list onto that wiki tomorrow (maaaybe today, depends). The wiki requires registration, but that's it. I'm not bothering with captchas because, in my experience, spammers walk right through them anyway. I do have edits emailed to me, so I can catch and roll back any spam quickly. -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From lists at alteeve.ca Fri Nov 28 05:37:53 2014 From: lists at alteeve.ca (Digimer) Date: Fri, 28 Nov 2014 00:37:53 -0500 Subject: [Linux-cluster] [Cluster-devel] Wiki for planning created - Re: [Pacemaker] [RFC] Organizing HA Summit 2015 In-Reply-To: <5478090E.6030804@fabbione.net> References: <540D853F.3090109@redhat.com> <547756C2.1060504@alteeve.ca> <5478090E.6030804@fabbione.net> Message-ID: <54780A31.8060806@alteeve.ca> On 28/11/14 12:33 AM, Fabio M. Di Nitto wrote: > > > On 11/27/2014 5:52 PM, Digimer wrote: >> I just created a dedicated/fresh wiki for planning and organizing: >> >> http://plan.alteeve.ca/index.php/Main_Page >> >> Other than the domain, it has no association with any existing project, >> so it should be a neutral enough platform. 
Also, it's not owned by >> $megacorp (I wish!), so spying/privacy shouldn't be an issue I hope. If >> there is concern, I can setup https. >> >> If no one else gets to it before me, I'll start collating the data from >> the mailing list onto that wiki tomorrow (maaaybe today, depends). >> >> The wiki requires registration, but that's it. I'm not bothering with >> captchas because, in my experience, spammer walk right through them >> anyway. I do have edits email me, so I can catch and roll back any spam >> quickly. >> > > Awesome! thanks for taking care of it. Do you have a chance to add also > an instance of etherpad to the site? > > Mostly to do collaborative editing while we sit all around the same table. > > Otherwise we can use a public instance and copy paste info after that in > the wiki. > > Fabio Never tried setting up etherpad before, but if it runs on rhel 6, I should have no problem setting it up. -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From rajpatel at redhat.com Fri Nov 28 05:51:20 2014 From: rajpatel at redhat.com (Rajat) Date: Fri, 28 Nov 2014 11:21:20 +0530 Subject: [Linux-cluster] Cluster Overhead I/O, Network, Memory, CPU Message-ID: <54780D58.9010400@redhat.com> Hey Team, Our customer is using RHEL 5.X and RHEL 6.X as Cluster in they production stack. Customer is looking is there any doc/white paper which can share they management as cluster service usages on Disk % Network % Memory % CPU % Gratitude -- -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: vc.jpg Type: image/jpeg Size: 22087 bytes Desc: not available URL: From jpokorny at redhat.com Fri Nov 28 19:10:06 2014 From: jpokorny at redhat.com (Jan =?utf-8?Q?Pokorn=C3=BD?=) Date: Fri, 28 Nov 2014 20:10:06 +0100 Subject: [Linux-cluster] [Pacemaker] [Cluster-devel] Wiki for planning created - Re: [RFC] Organizing HA Summit 2015 In-Reply-To: <54780A31.8060806@alteeve.ca> References: <540D853F.3090109@redhat.com> <547756C2.1060504@alteeve.ca> <5478090E.6030804@fabbione.net> <54780A31.8060806@alteeve.ca> Message-ID: <20141128191006.GD31780@redhat.com> On 28/11/14 00:37 -0500, Digimer wrote: > On 28/11/14 12:33 AM, Fabio M. Di Nitto wrote: >> On 11/27/2014 5:52 PM, Digimer wrote: >>> I just created a dedicated/fresh wiki for planning and organizing: >>> >>> http://plan.alteeve.ca/index.php/Main_Page >>> >>> [...] >> >> Awesome! thanks for taking care of it. Do you have a chance to add also >> an instance of etherpad to the site? >> >> Mostly to do collaborative editing while we sit all around the same table. >> >> Otherwise we can use a public instance and copy paste info after that in >> the wiki. >> > Never tried setting up etherpad before, but if it runs on rhel 6, I should > have no problem setting it up. Provided no conspiracy is being started, there are a bunch of popular instances, e.g. http://piratepad.net/ -- Jan -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 819 bytes Desc: not available URL: From jpokorny at redhat.com Fri Nov 28 23:56:46 2014 From: jpokorny at redhat.com (Jan =?utf-8?Q?Pokorn=C3=BD?=) Date: Sat, 29 Nov 2014 00:56:46 +0100 Subject: [Linux-cluster] Cluster Overhead I/O, Network, Memory, CPU In-Reply-To: <54780D58.9010400@redhat.com> References: <54780D58.9010400@redhat.com> Message-ID: <20141128235646.GG31780@redhat.com> On 28/11/14 11:21 +0530, Rajat wrote: > Hey Team, Perhaps Friday kicked in and this was intended for internal RH lists. 
Don't cluster around this too much :) -- Jan -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 819 bytes Desc: not available URL: From fdinitto at redhat.com Sat Nov 29 05:45:03 2014 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Sat, 29 Nov 2014 06:45:03 +0100 Subject: [Linux-cluster] [Cluster-devel] [Pacemaker] Wiki for planning created - Re: [RFC] Organizing HA Summit 2015 In-Reply-To: <20141128191006.GD31780@redhat.com> References: <540D853F.3090109@redhat.com> <547756C2.1060504@alteeve.ca> <5478090E.6030804@fabbione.net> <54780A31.8060806@alteeve.ca> <20141128191006.GD31780@redhat.com> Message-ID: <54795D5F.2080203@redhat.com> On 11/28/2014 8:10 PM, Jan Pokorný wrote: > On 28/11/14 00:37 -0500, Digimer wrote: >> On 28/11/14 12:33 AM, Fabio M. Di Nitto wrote: >>> On 11/27/2014 5:52 PM, Digimer wrote: >>>> I just created a dedicated/fresh wiki for planning and organizing: >>>> >>>> http://plan.alteeve.ca/index.php/Main_Page >>>> >>>> [...] >>> >>> Awesome! thanks for taking care of it. Do you have a chance to add also >>> an instance of etherpad to the site? >>> >>> Mostly to do collaborative editing while we sit all around the same table. >>> >>> Otherwise we can use a public instance and copy paste info after that in >>> the wiki. >>> >> Never tried setting up etherpad before, but if it runs on rhel 6, I should >> have no problem setting it up. > > Provided no conspiracy to be started, there are a bunch of popular > instances, e.g. http://piratepad.net/ > Right, some of them only store etherpads for 30 days. Just be careful which one we choose, or we can make our own. 
Fabio From lists at alteeve.ca Sat Nov 29 05:50:50 2014 From: lists at alteeve.ca (Digimer) Date: Sat, 29 Nov 2014 00:50:50 -0500 Subject: [Linux-cluster] [Cluster-devel] [Pacemaker] Wiki for planning created - Re: [RFC] Organizing HA Summit 2015 In-Reply-To: <54795D5F.2080203@redhat.com> References: <540D853F.3090109@redhat.com> <547756C2.1060504@alteeve.ca> <5478090E.6030804@fabbione.net> <54780A31.8060806@alteeve.ca> <20141128191006.GD31780@redhat.com> <54795D5F.2080203@redhat.com> Message-ID: <54795EBA.2030807@alteeve.ca> On 29/11/14 12:45 AM, Fabio M. Di Nitto wrote: > > > On 11/28/2014 8:10 PM, Jan Pokorný wrote: >> On 28/11/14 00:37 -0500, Digimer wrote: >>> On 28/11/14 12:33 AM, Fabio M. Di Nitto wrote: >>>> On 11/27/2014 5:52 PM, Digimer wrote: >>>>> I just created a dedicated/fresh wiki for planning and organizing: >>>>> >>>>> http://plan.alteeve.ca/index.php/Main_Page >>>>> >>>>> [...] >>>> >>>> Awesome! thanks for taking care of it. Do you have a chance to add also >>>> an instance of etherpad to the site? >>>> >>>> Mostly to do collaborative editing while we sit all around the same table. >>>> >>>> Otherwise we can use a public instance and copy paste info after that in >>>> the wiki. >>>> >>> Never tried setting up etherpad before, but if it runs on rhel 6, I should >>> have no problem setting it up. >> >> Provided no conspiracy to be started, there are a bunch of popular >> instances, e.g. http://piratepad.net/ >> > > Right, some of them only store etherpads for 30 days. Just be careful > the one we choose or we make our own. > > Fabio I'll set one up, but I'll need a few days, I'm out of the country at the moment. It's not needed until the conference, is it? Or will you want to have it before then? -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? 
From lists at alteeve.ca Sun Nov 30 05:56:37 2014 From: lists at alteeve.ca (Digimer) Date: Sun, 30 Nov 2014 00:56:37 -0500 Subject: [Linux-cluster] [ha-wg-technical] Wiki for planning created - Re: [Pacemaker] [RFC] Organizing HA Summit 2015 In-Reply-To: <547756C2.1060504@alteeve.ca> References: <540D853F.3090109@redhat.com> <547756C2.1060504@alteeve.ca> Message-ID: <547AB195.5030100@alteeve.ca> On 27/11/14 11:52 AM, Digimer wrote: > I just created a dedicated/fresh wiki for planning and organizing: > > http://plan.alteeve.ca/index.php/Main_Page > > Other than the domain, it has no association with any existing project, > so it should be a neutral enough platform. Also, it's not owned by > $megacorp (I wish!), so spying/privacy shouldn't be an issue I hope. If > there is concern, I can setup https. > > If no one else gets to it before me, I'll start collating the data from > the mailing list onto that wiki tomorrow (maaaybe today, depends). > > The wiki requires registration, but that's it. I'm not bothering with > captchas because, in my experience, spammer walk right through them > anyway. I do have edits email me, so I can catch and roll back any spam > quickly. Ok, I was getting 3~5 spam accounts created per day. To deal with this, I set up the 'questy' captcha program with five (random) questions that should be easy to answer, even for non-English speakers. Just the same, if anyone has any trouble registering, please feel free to email me directly and I will be happy to help. Madi -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education?