From laszlo.budai at gmail.com Mon Aug 1 10:20:12 2011
From: laszlo.budai at gmail.com (Budai Laszlo)
Date: Mon, 01 Aug 2011 13:20:12 +0300
Subject: [Linux-cluster] service startup order
In-Reply-To:
References: <4E30AB1A.1080102@gmail.com> <4E30B25E.5010302@alteeve.com>
Message-ID: <4E367DDC.9050608@gmail.com>

Hi,

I got this answer today from Fabio M. Di Nitto:

On 7/29/2011 1:25 AM, Budai Laszlo wrote:
> > Hello,
> >
> > Please excuse my direct mail, but I've received no conclusive answer on
> > the linux-cluster mailing list, and I've seen your email on the Cluster
> > Wiki home page.
> > Please tell me what is the startup order for services in Red Hat Cluster
> > 4.5 (rgmanager-1.9.68-1)?
> > Are the services started in parallel, or are started one by one in the
> > order as these appears in cluster.conf.
> > I'm not talking about resources, or resource trees.

They are started in sequence as they appear in cluster.conf.

I am not extremely familiar with EL4 but the behavior should be pretty
much the same as other rgmanager releases.

Fabio

From david at adurotec.com Tue Aug 2 01:50:15 2011
From: david at adurotec.com (David)
Date: Mon, 01 Aug 2011 20:50:15 -0500
Subject: [Linux-cluster] Corosync fails to start using cman
Message-ID: <4E3757D7.2080607@adurotec.com>

I have the RHCS installed on CentOS6 x86_64.

One of the nodes in a 3 node cluster won't start after I moved the nodes
to a new vlan.

When I start cman this is what I get:

Starting cluster:
   Checking Network Manager... [ OK ]
   Global setup... [ OK ]
   Loading kernel modules... [ OK ]
   Mounting configfs... [ OK ]
   Starting cman... Aug 02 01:45:17 corosync [MAIN ] Corosync Cluster Engine ('1.2.3'): started and ready to provide service.
Aug 02 01:45:17 corosync [MAIN ] Corosync built-in features: nss rdma
Aug 02 01:45:17 corosync [MAIN ] Successfully read config from /etc/cluster/cluster.conf
Aug 02 01:45:17 corosync [MAIN ] Successfully parsed cman config
Aug 02 01:45:17 corosync [TOTEM ] Token Timeout (10000 ms) retransmit timeout (2380 ms)
Aug 02 01:45:17 corosync [TOTEM ] token hold (1894 ms) retransmits before loss (4 retrans)
Aug 02 01:45:17 corosync [TOTEM ] join (60 ms) send_join (0 ms) consensus (12000 ms) merge (200 ms)
Aug 02 01:45:17 corosync [TOTEM ] downcheck (1000 ms) fail to recv const (2500 msgs)
Aug 02 01:45:17 corosync [TOTEM ] seqno unchanged const (30 rotations) Maximum network MTU 1402
Aug 02 01:45:17 corosync [TOTEM ] window size per rotation (50 messages) maximum messages per rotation (17 messages)
Aug 02 01:45:17 corosync [TOTEM ] missed count const (5 messages)
Aug 02 01:45:17 corosync [TOTEM ] send threads (0 threads)
Aug 02 01:45:17 corosync [TOTEM ] RRP token expired timeout (2380 ms)
Aug 02 01:45:17 corosync [TOTEM ] RRP token problem counter (2000 ms)
Aug 02 01:45:17 corosync [TOTEM ] RRP threshold (10 problem count)
Aug 02 01:45:17 corosync [TOTEM ] RRP mode set to none.
Aug 02 01:45:17 corosync [TOTEM ] heartbeat_failures_allowed (0)
Aug 02 01:45:17 corosync [TOTEM ] max_network_delay (50 ms)
Aug 02 01:45:17 corosync [TOTEM ] HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
Aug 02 01:45:17 corosync [TOTEM ] Initializing transport (UDP/IP).
Aug 02 01:45:17 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Aug 02 01:45:17 corosync [IPC ] you are using ipc api v2
Aug 02 01:45:18 corosync [TOTEM ] Receive multicast socket recv buffer size (262142 bytes).
Aug 02 01:45:18 corosync [TOTEM ] Transmit multicast socket send buffer size (262142 bytes).
corosync: totemsrp.c:3091: memb_ring_id_create_or_load: Assertion `res == sizeof (unsigned long long)' failed.
Aug 02 01:45:18 corosync [TOTEM ] The network interface [10.50.3.70] is now up.
corosync died with signal: 6 Check cluster logs for details

Any idea what the issue could be?

Thanks
David

From linux at alteeve.com Tue Aug 2 01:56:47 2011
From: linux at alteeve.com (Digimer)
Date: Mon, 01 Aug 2011 21:56:47 -0400
Subject: [Linux-cluster] Corosync fails to start using cman
In-Reply-To: <4E3757D7.2080607@adurotec.com>
References: <4E3757D7.2080607@adurotec.com>
Message-ID: <4E37595F.5010003@alteeve.com>

On 08/01/2011 09:50 PM, David wrote:
> I have the RHCS installed on CentOS6 x86_64.
>
> One of the nodes in a 3 node cluster won't start after I moved the nodes
> to a new vlan.
>
> When I start cman this is what I get:
>
> Starting cluster:
>    Checking Network Manager... [ OK ]
>    Global setup... [ OK ]
>    Loading kernel modules... [ OK ]
>    Mounting configfs... [ OK ]
>    Starting cman... Aug 02 01:45:17 corosync [MAIN ] Corosync Cluster
> Engine ('1.2.3'): started and ready to provide service.
> Aug 02 01:45:17 corosync [MAIN ] Corosync built-in features: nss rdma
> Aug 02 01:45:17 corosync [MAIN ] Successfully read config from
> /etc/cluster/cluster.conf
> Aug 02 01:45:17 corosync [MAIN ] Successfully parsed cman config
> Aug 02 01:45:17 corosync [TOTEM ] Token Timeout (10000 ms) retransmit
> timeout (2380 ms)
> Aug 02 01:45:17 corosync [TOTEM ] token hold (1894 ms) retransmits
> before loss (4 retrans)
> Aug 02 01:45:17 corosync [TOTEM ] join (60 ms) send_join (0 ms)
> consensus (12000 ms) merge (200 ms)
> Aug 02 01:45:17 corosync [TOTEM ] downcheck (1000 ms) fail to recv const
> (2500 msgs)
> Aug 02 01:45:17 corosync [TOTEM ] seqno unchanged const (30 rotations)
> Maximum network MTU 1402
> Aug 02 01:45:17 corosync [TOTEM ] window size per rotation (50 messages)
> maximum messages per rotation (17 messages)
> Aug 02 01:45:17 corosync [TOTEM ] missed count const (5 messages)
> Aug 02 01:45:17 corosync [TOTEM ] send threads (0 threads)
> Aug 02 01:45:17 corosync [TOTEM ] RRP token expired timeout (2380 ms)
> Aug 02 01:45:17 corosync [TOTEM ] RRP token problem counter (2000 ms)
> Aug 02 01:45:17 corosync [TOTEM ] RRP threshold (10 problem count)
> Aug 02 01:45:17 corosync [TOTEM ] RRP mode set to none.
> Aug 02 01:45:17 corosync [TOTEM ] heartbeat_failures_allowed (0)
> Aug 02 01:45:17 corosync [TOTEM ] max_network_delay (50 ms)
> Aug 02 01:45:17 corosync [TOTEM ] HeartBeat is Disabled. To enable set
> heartbeat_failures_allowed > 0
> Aug 02 01:45:17 corosync [TOTEM ] Initializing transport (UDP/IP).
> Aug 02 01:45:17 corosync [TOTEM ] Initializing transmit/receive
> security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> Aug 02 01:45:17 corosync [IPC ] you are using ipc api v2
> Aug 02 01:45:18 corosync [TOTEM ] Receive multicast socket recv buffer
> size (262142 bytes).
> Aug 02 01:45:18 corosync [TOTEM ] Transmit multicast socket send buffer
> size (262142 bytes).
> corosync: totemsrp.c:3091: memb_ring_id_create_or_load: Assertion `res
> == sizeof (unsigned long long)' failed.
> Aug 02 01:45:18 corosync [TOTEM ] The network interface [10.50.3.70] is
> now up.
> corosync died with signal: 6 Check cluster logs for details
>
>
> Any idea what the issue could be?
>
> Thanks
> David

What is your cluster.conf file (please obscure passwords only), what
does `uname -n` return and what is your network configuration (interface
names and IPs)?

-- 
Digimer
E-Mail: digimer at alteeve.com
Freenode handle: digimer
Papers and Projects: http://alteeve.com
Node Assassin: http://nodeassassin.org
"At what point did we forget that the Space Shuttle was, essentially,
a program that strapped human beings to an explosion and tried to stab
through the sky with fire and math?"
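
For anyone collecting the details requested in the reply above (the
node name as printed by `uname -n`, plus interface names and IPs), the
following is a minimal sketch, not something posted in the thread. It
assumes Python 3 on Linux; the helper names are made up for
illustration, and the address listing assumes the iproute2 `ip` command
is installed (it returns an empty string otherwise). The contents of
cluster.conf still have to be pasted by hand with passwords obscured.

```python
import os
import shutil
import socket
import subprocess


def node_name():
    """Return the node name, the same value `uname -n` prints."""
    return os.uname().nodename


def interface_names():
    """Return the kernel's interface names (for example lo, eth0)."""
    return [name for _index, name in socket.if_nameindex()]


def interface_addresses():
    """Return `ip -o addr show` output, or "" if `ip` is not installed."""
    ip_path = shutil.which("ip")
    if ip_path is None:
        return ""
    result = subprocess.run(
        [ip_path, "-o", "addr", "show"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


if __name__ == "__main__":
    # Print everything in one go so it can be pasted into a reply.
    print("node name :", node_name())
    print("interfaces:", ", ".join(interface_names()))
    print(interface_addresses())
```

Comparing the printed node name and addresses against the clusternode
entries in cluster.conf is usually the first thing to check after
moving nodes to a different network.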
From linux at alteeve.com Tue Aug 2 05:34:20 2011
From: linux at alteeve.com (Digimer)
Date: Tue, 02 Aug 2011 01:34:20 -0400
Subject: [Linux-cluster] Problem with rhel6 + rgmanager + ip service
Message-ID: <4E378C5C.8010505@alteeve.com>

I'm trying to set up a trivially simple cluster using RHEL 6.1
(cman+rgmanager). I've got three interfaces, and I want a managed IP on
192.168.2.100/24 (which should match eth1). At this point though, I'd be
happy to get any IP working on any interface.

Here's my config: