[Linux-cluster] Cluster environment issue
Hiroyuki Sato
hiroysato at gmail.com
Tue May 31 03:03:58 UTC 2011
Hello,

I'm not sure whether this is useful or not.

Have you checked ``ping somewhere'' from a domU while the cluster is broken?
(I assume you are using Xen, since your kernel is 2.6.18-194.3.1.el5xen.)

If you get no response at all, you should check iptables
(e.g. temporarily disable iptables).
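For example, something along these lines (only a rough sketch; adjust the
peer name and interface to your environment):

  # from the affected domU, check basic reachability of another member
  ping -c 3 server2-priv

  # list the active iptables rules to see if cluster traffic is filtered
  iptables -L -n

  # as a temporary test only, stop iptables completely (RHEL 5)
  service iptables stop

If stopping iptables makes the nodes rejoin, re-enable it afterwards and
open the cluster/multicast traffic instead of leaving the firewall off.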
--
Hiroyuki Sato
2011/5/31 Srija <swap_project at yahoo.com>:
> Thanks for your quick reply.
>
> I talked to the network people, but they say everything is fine at their end. Is there any way, at the server end, to detect a switch restart or a loss of multicast traffic?
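> (Would something like ``tcpdump -i <cluster interface> host <multicast
> address shown by cman_tool status>'' on one of the nodes be a reasonable
> way to watch whether the multicast traffic is still arriving? I am only
> guessing here.)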
>
> I think you have already looked at the cluster.conf file. Apart from the missing quorum disk, do you think the configuration is sufficient for handling a sixteen-node cluster?
>
> thanks again .
> regards
>
> --- On Mon, 5/30/11, Kaloyan Kovachev <kkovachev at varna.net> wrote:
>
>> From: Kaloyan Kovachev <kkovachev at varna.net>
>> Subject: Re: [Linux-cluster] Cluster environment issue
>> To: "linux clustering" <linux-cluster at redhat.com>
>> Date: Monday, May 30, 2011, 4:05 PM
>> Hi,
>> when your cluster gets broken, the most likely reason is a network
>> problem (a switch restart, or multicast traffic being lost for a
>> while) on the interface where the serverX-priv IPs are configured.
>> Having a quorum disk may help by giving a quorum vote to one of the
>> servers, so it can fence the others, but the best thing to do is to
>> fix your network and preferably add a redundant link for the cluster
>> communication, to avoid the breakage in the first place.
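>>
>> For the quorum disk, the rough idea (only a sketch, the values would
>> need tuning for a sixteen-node cluster) is a small shared LUN
>> initialised with mkqdisk and a <quorumd> line in cluster.conf:
>>
>>   mkqdisk -c /dev/mapper/qdisk-lun -l newcluster_qdisk
>>
>>   <quorumd interval="1" tko="10" votes="1" label="newcluster_qdisk"/>
>>
>> (/dev/mapper/qdisk-lun is just a placeholder for your shared device.)
>> For the redundant link, bonding two NICs in active-backup mode on the
>> interface carrying the serverX-priv addresses is the usual approach.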
>>
>> On Mon, 30 May 2011 12:17:07 -0700 (PDT), Srija <swap_project at yahoo.com> wrote:
>> > Hi,
>> >
>> > I am very new to the Red Hat cluster. I need some help and
>> > suggestions for the cluster configuration.
>> > We have a sixteen-node cluster of:
>> >
>> >     OS     : Linux Server release 5.5 (Tikanga)
>> >     kernel : 2.6.18-194.3.1.el5xen
>> >
>> > The problem is that sometimes the cluster breaks. So far the only
>> > solution has been to reboot all sixteen nodes; otherwise the nodes
>> > do not rejoin.
>> >
>> > We are using clvm and no quorum disk; the quorum is the default.
>> >
>> > When the cluster breaks, clustat shows everything offline except
>> > the node on which the clustat command was executed. If we run the
>> > vgs or lvs commands, they hang.
>> >
>> > Here is at present the clustat report
>> > -------------------------------------
>> >
>> > [server1]# clustat
>> > Cluster Status for newcluster @ Mon May 30 14:55:10 2011
>> > Member Status: Quorate
>> >
>> >  Member Name                        ID   Status
>> >  ------ ----                        ---- ------
>> >  server1                               1 Online
>> >  server2                               2 Online, Local
>> >  server3                               3 Online
>> >  server4                               4 Online
>> >  server5                               5 Online
>> >  server6                               6 Online
>> >  server7                               7 Online
>> >  server8                               8 Online
>> >  server9                               9 Online
>> >  server10                             10 Online
>> >  server11                             11 Online
>> >  server12                             12 Online
>> >  server13                             13 Online
>> >  server14                             14 Online
>> >  server15                             15 Online
>> >  server16                             16 Online
>> >
>> > Here is the cman_tool status output from one server
>> > --------------------------------------------------
>> >
>> > [server1 ~]# cman_tool status
>> > Version: 6.2.0
>> > Config Version: 23
>> > Cluster Name: newcluster
>> > Cluster Id: 53322
>> > Cluster Member: Yes
>> > Cluster Generation: 11432
>> > Membership state: Cluster-Member
>> > Nodes: 16
>> > Expected votes: 16
>> > Total votes: 16
>> > Quorum: 9
>> > Active subsystems: 8
>> > Flags: Dirty
>> > Ports Bound: 0 11
>> > Node name: server1
>> > Node ID: 1
>> > Multicast addresses: xxx.xxx.xxx.xx
>> > Node addresses: 192.168.xxx.xx
>> >
>> >
>> > Here is the cluster.conf file.
>> > ------------------------------
>> >
>> > <?xml version="1.0"?>
>> > <cluster alias="newcluster" config_version="23" name="newcluster">
>> >   <fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="15"/>
>> >
>> >   <clusternodes>
>> >
>> >     <clusternode name="server1-priv" nodeid="1" votes="1">
>> >       <fence><method name="1">
>> >         <device name="ilo-server1r"/></method>
>> >       </fence>
>> >     </clusternode>
>> >
>> >     <clusternode name="server2-priv" nodeid="3" votes="1">
>> >       <fence><method name="1">
>> >         <device name="ilo-server2r"/></method>
>> >       </fence>
>> >     </clusternode>
>> >
>> >     <clusternode name="server3-priv" nodeid="2" votes="1">
>> >       <fence><method name="1">
>> >         <device name="ilo-server3r"/></method>
>> >       </fence>
>> >     </clusternode>
>> >
>> > [ ... snip ... ]
>> >
>> >     <clusternode name="server16-priv" nodeid="16" votes="1">
>> >       <fence><method name="1">
>> >         <device name="ilo-server16r"/></method>
>> >       </fence>
>> >     </clusternode>
>> >
>> >   </clusternodes>
>> >   <cman/>
>> >
>> >   <dlm plock_ownership="1" plock_rate_limit="0"/>
>> >   <gfs_controld plock_rate_limit="0"/>
>> >
>> >   <fencedevices>
>> >     <fencedevice agent="fence_ilo" hostname="server1r" login="Admin"
>> >                  name="ilo-server1r" passwd="xxxxx"/>
>> >     ..........
>> >     <fencedevice agent="fence_ilo" hostname="server16r" login="Admin"
>> >                  name="ilo-server16r" passwd="xxxxx"/>
>> >   </fencedevices>
>> >   <rm>
>> >     <failoverdomains/>
>> >     <resources/>
>> >   </rm>
>> > </cluster>
>> >
>> > Here is the lvm.conf file
>> > --------------------------
>> >
>> > devices {
>> >     dir = "/dev"
>> >     scan = [ "/dev" ]
>> >     preferred_names = [ ]
>> >     filter = [ "r/scsi.*/", "r/pci.*/", "r/sd.*/", "a/.*/" ]
>> >     cache_dir = "/etc/lvm/cache"
>> >     cache_file_prefix = ""
>> >     write_cache_state = 1
>> >     sysfs_scan = 1
>> >     md_component_detection = 1
>> >     md_chunk_alignment = 1
>> >     data_alignment_detection = 1
>> >     data_alignment = 0
>> >     data_alignment_offset_detection = 1
>> >     ignore_suspended_devices = 0
>> > }
>> >
>> > log {
>> >     verbose = 0
>> >     syslog = 1
>> >     overwrite = 0
>> >     level = 0
>> >     indent = 1
>> >     command_names = 0
>> >     prefix = " "
>> > }
>> >
>> > backup {
>> >     backup = 1
>> >     backup_dir = "/etc/lvm/backup"
>> >     archive = 1
>> >     archive_dir = "/etc/lvm/archive"
>> >     retain_min = 10
>> >     retain_days = 30
>> > }
>> >
>> > shell {
>> >     history_size = 100
>> > }
>> >
>> > global {
>> >     library_dir = "/usr/lib64"
>> >     umask = 077
>> >     test = 0
>> >     units = "h"
>> >     si_unit_consistency = 0
>> >     activation = 1
>> >     proc = "/proc"
>> >     locking_type = 3
>> >     wait_for_locks = 1
>> >     fallback_to_clustered_locking = 1
>> >     fallback_to_local_locking = 1
>> >     locking_dir = "/var/lock/lvm"
>> >     prioritise_write_locks = 1
>> > }
>> >
>> > activation {
>> >     udev_sync = 1
>> >     missing_stripe_filler = "error"
>> >     reserved_stack = 256
>> >     reserved_memory = 8192
>> >     process_priority = -18
>> >     mirror_region_size = 512
>> >     readahead = "auto"
>> >     mirror_log_fault_policy = "allocate"
>> >     mirror_image_fault_policy = "remove"
>> > }
>> >
>> > dmeventd {
>> >     mirror_library = "libdevmapper-event-lvm2mirror.so"
>> >     snapshot_library = "libdevmapper-event-lvm2snapshot.so"
>> > }
>> >
>> >
>> > If you need more information, I can provide it.
>> >
>> > Thanks for your help
>> > Priya
>> >
>> > --
>> > Linux-cluster mailing list
>> > Linux-cluster at redhat.com
>> > https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>