[Linux-cluster] how to mount GFS volumes same time both the cluster nodes?
james cyriac
jamescyriac76 at gmail.com
Tue Nov 20 10:58:57 UTC 2012
Hi all,

I am installing a Red Hat Cluster 6 two-node cluster. The issue is that I am not
able to mount my GFS file system on both nodes at the same time.

Please find my clustat output below:
[root@saperpprod01 ~]# clustat
Cluster Status for sap-cluster1 @ Tue Nov 20 14:51:28 2012
Member Status: Quorate

 Member Name                ID   Status
 ------ ----                ---- ------
 saperpprod01               1    Online, Local, rgmanager
 saperpprod02               2    Online, rgmanager

 Service Name               Owner (Last)     State
 ------- ----               ----- ------     -----
 service:oracle             saperpprod01     started
 service:profile-gfs        saperpprod01     started
 service:sap                saperpprod01     started
[root@saperpprod01 ~]#
Oracle and SAP are fine and they fail over between both nodes. But profile-gfs
is a GFS file system, and I want the GFS mount point presented on both nodes at
the same time. Please help me with this.

Thanks in advance,
james
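For reference, GFS/GFS2 is a shared-disk cluster filesystem and can be mounted on every node concurrently; what prevents that here is managing the mount as a failover service. A minimal sketch of the usual RHEL 6 alternative (device path and mount point are assumptions, not taken from this thread; cman and clvmd must already be running on both nodes):

```shell
# Sketch only -- /dev/vg_sap/lv_profile and /sapmnt are assumed names.
# Instead of a failover <fs>/service resource, put the GFS2 mount in
# /etc/fstab on BOTH nodes and let the gfs2 init script mount it:
echo '/dev/vg_sap/lv_profile /sapmnt gfs2 defaults,noatime 0 0' >> /etc/fstab
chkconfig gfs2 on     # mount all fstab gfs2 entries at boot, after cman/clvmd
service gfs2 start    # run on each node; both mounts stay active together
```

Alternatively, rgmanager's clusterfs resource type can hold the mount in per-node services, but for a filesystem that should simply always be mounted on both nodes the fstab route is the simpler one.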
On Mon, Nov 19, 2012 at 1:16 PM, Jacek Konieczny <jajcus at jajcus.net> wrote:
> Hi,
>
> I am setting up a cluster using:
>
> Linux kernel 3.6.6
> Corosync 2.1.0
> DLM 4.0.0
> CLVMD 2.02.98
> Pacemaker 1.1.8
> DRBD 8.3.13
>
> Now I am stuck on the 'clean shutdown of a node' scenario.
>
> It goes like that:
> - resources using the shared storage are properly stopped by Pacemaker.
> - DRBD is cleanly demoted and unconfigured by Pacemaker
> - Pacemaker cleanly exits
> - CLVMD is stopped.
> - dlm_controld is stopped
> - corosync is being stopped
>
> and at this point the node is fenced (rebooted) by the dlm_controld on
> the other node. I would expect it to continue with a clean shutdown.
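The intended ordering above can be written as an explicit stop sequence (a sketch only; the init-script names are assumptions and on this hand-built stack the daemons may be stopped directly rather than via service scripts):

```shell
# Clean-shutdown order described above, as explicit commands (sketch):
service pacemaker stop      # stops resources, demotes/unconfigures DRBD
service clvmd stop          # leaves the clvmd DLM lockspace
service dlm_controld stop   # should deregister from its CPG cleanly...
service corosync stop       # ...before membership is torn down here
```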
>
> Any idea how to debug/fix it?
> Is this '541 cpg_dispatch error 9' the problem?
>
> Logs from the node being shut down (log file system mounted with the 'sync'
> option, syslog shutdown delayed as much as possible):
>
> Kernel:
> Nov 19 09:49:40 dev1n2 kernel: : [ 542.049407] block drbd0: worker
> terminated
> Nov 19 09:49:40 dev1n2 kernel: : [ 542.049412] block drbd0: Terminating
> drbd0_worker
> Nov 19 09:49:43 dev1n2 kernel: : [ 544.934390] dlm: clvmd: leaving the
> lockspace group...
> Nov 19 09:49:43 dev1n2 kernel: : [ 544.937584] dlm: clvmd: group event
> done 0 0
> Nov 19 09:49:43 dev1n2 kernel: : [ 544.937897] dlm: clvmd:
> release_lockspace final free
> Nov 19 09:49:43 dev1n2 kernel: : [ 544.961407] dlm: closing connection to
> node 2
> Nov 19 09:49:43 dev1n2 kernel: : [ 544.961431] dlm: closing connection to
> node 1
>
> User space:
> Nov 19 09:49:41 dev1n2 pacemakerd[1267]: notice: stop_child: Stopping
> cib: Sent -15 to process 1279
> Nov 19 09:49:41 dev1n2 notifyd[1139]: [notice] dev1n2[2]
> stonithd:1281:0x7fc423dfd5e0 is now disconnected from corosync
> Nov 19 09:49:41 dev1n2 cib[1279]: notice: terminate_cs_connection:
> Disconnecting from Corosync
> Nov 19 09:49:41 dev1n2 corosync[1130]: [TOTEM ] Retransmit List: 1db
> Nov 19 09:49:41 dev1n2 notifyd[1139]: [notice] dev1n2[2]
> cib:1279:0x7fc4240008d0 is now disconnected from corosync
> Nov 19 09:49:41 dev1n2 cib[1279]: notice: terminate_cs_connection:
> Disconnecting from Corosync
> Nov 19 09:49:41 dev1n2 corosync[1130]: [TOTEM ] Retransmit List: 1dd
> Nov 19 09:49:41 dev1n2 pacemakerd[1267]: notice: pcmk_shutdown_worker:
> Shutdown complete
> Nov 19 09:49:41 dev1n2 notifyd[1139]: [notice] dev1n2[2]
> pacemakerd:1267:0x7fc423bf8ed0 is now disconnected from corosync
> Nov 19 09:49:41 dev1n2 notifyd[1139]: [notice] dev1n2[2]
> pacemakerd:1267:0x7fc423bf7660 is now disconnected from corosync
> Nov 19 09:49:41 dev1n2 corosync[1130]: [TOTEM ] Retransmit List: 1de
> Nov 19 09:49:41 dev1n2 corosync[1130]: [TOTEM ] Retransmit List: 1de
> Nov 19 09:49:41 dev1n2 corosync[1130]: [TOTEM ] Retransmit List: 1e1
> Nov 19 09:49:41 dev1n2 corosync[1130]: [TOTEM ] Retransmit List: 1e1
> Nov 19 09:49:43 dev1n2 dlm_controld[1142]: 541 cpg_dispatch error 9
> Nov 19 09:49:43 dev1n2 corosync[1130]: [TOTEM ] Retransmit List: 1e7
> Nov 19 09:49:43 dev1n2 corosync[1130]: [TOTEM ] Retransmit List: 1e7
> Nov 19 09:49:43 dev1n2 corosync[1130]: [SERV ] Unloading all Corosync
> service engines.
> Nov 19 09:49:43 dev1n2 corosync[1130]: [QB ] withdrawing server sockets
> Nov 19 09:49:43 dev1n2 corosync[1130]: [SERV ] Service engine unloaded:
> corosync vote quorum service v1.0
> Nov 19 09:49:43 dev1n2 corosync[1130]: [QB ] withdrawing server sockets
> Nov 19 09:49:43 dev1n2 corosync[1130]: [SERV ] Service engine unloaded:
> corosync configuration map access
> Nov 19 09:49:43 dev1n2 corosync[1130]: [QB ] withdrawing server sockets
> Nov 19 09:49:43 dev1n2 corosync[1130]: [SERV ] Service engine unloaded:
> corosync configuration service
> Nov 19 09:49:43 dev1n2 corosync[1130]: [QB ] withdrawing server sockets
> Nov 19 09:49:43 dev1n2 corosync[1130]: [SERV ] Service engine unloaded:
> corosync cluster closed process group service v1.01
> Nov 19 09:49:43 dev1n2 corosync[1130]: [QB ] withdrawing server sockets
> Nov 19 09:49:43 dev1n2 corosync[1130]: [SERV ] Service engine unloaded:
> corosync cluster quorum service v0.1
> Nov 19 09:49:43 dev1n2 corosync[1130]: [SERV ] Service engine unloaded:
> corosync profile loading service
> Nov 19 09:49:43 dev1n2 corosync[1130]: [WD ] magically closing the
> watchdog.
> Nov 19 09:49:43 dev1n2 corosync[1130]: [SERV ] Service engine unloaded:
> corosync watchdog service
> Nov 19 09:49:43 dev1n2 corosync[1130]: [MAIN ] Corosync Cluster Engine
> exiting normally
>
>
> Logs from the surviving node:
>
> Kernel:
> Nov 19 09:49:39 dev1n1 kernel: : [80664.615988] block drbd0: conn(
> Unconnected -> WFConnection )
> Nov 19 09:49:42 dev1n1 kernel: : [80667.497187] dlm: clvmd: dlm_recover 11
> Nov 19 09:49:42 dev1n1 kernel: : [80667.497241] dlm: clvmd: dlm_clear_toss
> 1 done
> Nov 19 09:49:42 dev1n1 kernel: : [80667.497291] dlm: clvmd: remove member 2
> Nov 19 09:49:42 dev1n1 kernel: : [80667.497295] dlm: clvmd:
> dlm_recover_members 1 nodes
> Nov 19 09:49:42 dev1n1 kernel: : [80667.497298] dlm: clvmd: generation 15
> slots 1 1:1
> Nov 19 09:49:42 dev1n1 kernel: : [80667.497300] dlm: clvmd:
> dlm_recover_directory
> Nov 19 09:49:42 dev1n1 kernel: : [80667.497302] dlm: clvmd:
> dlm_recover_directory 0 in 0 new
> Nov 19 09:49:42 dev1n1 kernel: : [80667.497304] dlm: clvmd:
> dlm_recover_directory 0 out 0 messages
> Nov 19 09:49:42 dev1n1 kernel: : [80667.497307] dlm: clvmd:
> dlm_recover_masters
> Nov 19 09:49:42 dev1n1 kernel: : [80667.497309] dlm: clvmd:
> dlm_recover_masters 0 of 1
> Nov 19 09:49:42 dev1n1 kernel: : [80667.497311] dlm: clvmd:
> dlm_recover_locks 0 out
> Nov 19 09:49:42 dev1n1 kernel: : [80667.497313] dlm: clvmd:
> dlm_recover_locks 0 in
> Nov 19 09:49:42 dev1n1 kernel: : [80667.497314] dlm: clvmd:
> dlm_recover_rsbs 1 done
> Nov 19 09:49:42 dev1n1 kernel: : [80667.497366] dlm: clvmd: dlm_recover 11
> generation 15 done: 0 ms
> Nov 19 09:49:43 dev1n1 kernel: : [80668.211818] dlm: closing connection to
> node 2
> Nov 19 09:49:46 dev1n1 kernel: : [80670.779015] igb: p1p2 NIC Link is Down
>
> User space:
> Nov 19 09:49:40 dev1n1 pengine[1078]: notice: stage6: Scheduling Node
> dev1n2 for shutdown
> Nov 19 09:49:40 dev1n1 pengine[1078]: error: rsc_expand_action:
> Couldn't expand vpbx_vg_cl_demote_0
> Nov 19 09:49:40 dev1n1 pengine[1078]: error: rsc_expand_action:
> Couldn't expand vpbx_vg_cl_demote_0
> Nov 19 09:49:40 dev1n1 pengine[1078]: notice: LogActions: Stop
> stonith-dev1n1 (dev1n2)
> Nov 19 09:49:40 dev1n1 pengine[1078]: notice: process_pe_message:
> Calculated Transition 17: /var/lib/pacemaker/pengine/pe-input-1035.bz2
> Nov 19 09:49:40 dev1n1 corosync[1004]: [TOTEM ] Retransmit List: 1d1
> Nov 19 09:49:40 dev1n1 crmd[1080]: notice: run_graph: Transition 17
> (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0,
> Source=/var/lib/pacemaker/pengine/pe-input-1035.bz2): Complete
> Nov 19 09:49:40 dev1n1 crmd[1080]: notice: do_state_transition: State
> transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
> cause=C_FSA_INTERNAL origin=notify_crmd ]
> Nov 19 09:49:40 dev1n1 corosync[1004]: [TOTEM ] Retransmit List: 1d4
> Nov 19 09:49:40 dev1n1 corosync[1004]: [TOTEM ] Retransmit List: 1d6 1d8
> Nov 19 09:49:40 dev1n1 corosync[1004]: [TOTEM ] Retransmit List: 1d6
> Nov 19 09:49:40 dev1n1 corosync[1004]: [TOTEM ] Retransmit List: 1d6
> Nov 19 09:49:40 dev1n1 crmd[1080]: notice: peer_update_callback:
> do_shutdown of dev1n2 (op 63) is complete
> Nov 19 09:49:40 dev1n1 corosync[1004]: [TOTEM ] Retransmit List: 1df
> Nov 19 09:49:40 dev1n1 corosync[1004]: [TOTEM ] Retransmit List: 1df
> Nov 19 09:49:40 dev1n1 corosync[1004]: [TOTEM ] Retransmit List: 1e3
> Nov 19 09:49:40 dev1n1 corosync[1004]: [TOTEM ] Retransmit List: 1e3
> Nov 19 09:49:42 dev1n1 corosync[1004]: [TOTEM ] Retransmit List: 1e6
> Nov 19 09:49:42 dev1n1 corosync[1004]: [TOTEM ] Retransmit List: 1e9
> Nov 19 09:49:43 dev1n1 corosync[1004]: [QUORUM] Members[1]: 1
> Nov 19 09:49:43 dev1n1 crmd[1080]: notice:
> corosync_mark_unseen_peer_dead: Node 2/dev1n2 was not seen in the previous
> transition
> Nov 19 09:49:43 dev1n1 crmd[1080]: notice: crm_update_peer_state:
> corosync_mark_unseen_peer_dead: Node dev1n2[2] - state is now lost
> Nov 19 09:49:43 dev1n1 corosync[1004]: [TOTEM ] A processor joined or
> left the membership and a new membership (10.28.45.27:30736) was formed.
> Nov 19 09:49:43 dev1n1 corosync[1004]: [MAIN ] Completed service
> synchronization, ready to provide service.
> Nov 19 09:49:43 dev1n1 dlm_controld[1014]: 80664 fence request 2 pid 27225
> nodedown time 1353314983 fence_all dlm_stonith
> Nov 19 09:49:43 dev1n1 notifyd[1010]: [notice] 192.168.1.2[2]
> ip:192.168.1.2 left
> Nov 19 09:49:43 dev1n1 stonith-ng[1075]: notice: stonith_command: Client
> stonith-api.27225.b5ff8f4d wants to fence (reboot) '2' with device '(any)'
> Nov 19 09:49:43 dev1n1 stonith-ng[1075]: notice:
> initiate_remote_stonith_op: Initiating remote operation reboot for dev1n2:
> 71447261-0e53-4b20-b628-d3f026a4ae24 (0)
> Nov 19 09:49:44 dev1n1 external/ipmi[27242]: [27254]: debug: ipmitool
> output: Chassis Power Control: Reset
> Nov 19 09:49:45 dev1n1 stonith-ng[1075]: notice: log_operation:
> Operation 'reboot' [27234] (call 0 from stonith-api.27225) for host
> 'dev1n2' with device 'stonith-dev1n2' returned: 0 (OK)
> Nov 19 09:49:45 dev1n1 stonith-ng[1075]: notice: remote_op_done:
> Operation reboot of dev1n2 by dev1n1 for stonith-api.27225 at dev1n1.71447261:
> OK
> Nov 19 09:49:45 dev1n1 crmd[1080]: notice: tengine_stonith_notify: Peer
> dev1n2 was terminated (st_notify_fence) by dev1n1 for dev1n1: OK
> (ref=71447261-0e53-4b20-b628-d3f026a4ae24) by client stonith-api.27225
>
> Greets,
> Jacek
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster