[Linux-cluster] Cluster environment issue

Srija swap_project at yahoo.com
Tue May 31 01:22:00 UTC 2011


Thanks for your quick reply.

I talked to the network people, but they say everything is fine at their end. Is there any way, from the server side, to detect a switch restart or a loss of multicast traffic?
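
Would something along these lines be a reasonable way to watch for it from a node? This is only a sketch; the interface name is an example, and the port is the openais/totem default, so please correct me if ours differs:

    # watch the cluster heartbeat (totem) multicast traffic on the private interface
    tcpdump -i eth1 -n udp port 5405

    # look for TOTEM token-loss / membership messages around the time of a breakage
    grep -i TOTEM /var/log/messages

If the tcpdump stream stops on several nodes at the same moment, I suppose that would point at the switch rather than at the servers.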

I think you have already looked at the cluster.conf file. Apart from the missing quorum disk, do you think the cluster configuration is sufficient for handling a sixteen-node cluster?
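
If a quorum disk is the way to go, would something of this shape be right in cluster.conf? The label, votes and heuristic below are only placeholders I would still have to tune, and I understand the device has to be initialized first with mkqdisk (mkqdisk -c <device> -l <label>):

    <quorumd interval="1" tko="10" votes="15" label="newcluster_qdisk">
        <!-- placeholder heuristic: ping the private network gateway -->
        <heuristic program="ping -c1 -w1 192.168.xxx.1" score="1" interval="2"/>
    </quorumd>

My understanding is that with votes="15" a single surviving node plus the quorum disk would still hold quorum (16 of 31 votes), and that expected_votes would then need to be raised to include the quorum disk votes, but please correct me if that is wrong for sixteen nodes.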

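Also, for the redundant link you suggested: would bonding the private interfaces be the right direction? A rough sketch of what I have in mind (interface names are examples only):

    # /etc/sysconfig/network-scripts/ifcfg-bond0
    DEVICE=bond0
    IPADDR=192.168.xxx.xx
    NETMASK=255.255.255.0
    ONBOOT=yes
    BOOTPROTO=none
    BONDING_OPTS="mode=1 miimon=100"    # active-backup with link monitoring

    # /etc/sysconfig/network-scripts/ifcfg-eth1 (and the same for the second slave)
    DEVICE=eth1
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes
    BOOTPROTO=none

The idea being that the two slave ports go to different switches, so a single switch restart cannot take down the cluster interconnect. (I believe an "alias bond0 bonding" line in /etc/modprobe.conf may also be needed on RHEL 5.)
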
Thanks again.
Regards

--- On Mon, 5/30/11, Kaloyan Kovachev <kkovachev at varna.net> wrote:

> From: Kaloyan Kovachev <kkovachev at varna.net>
> Subject: Re: [Linux-cluster] Cluster environment issue
> To: "linux clustering" <linux-cluster at redhat.com>
> Date: Monday, May 30, 2011, 4:05 PM
> Hi,
> when your cluster gets broken, the most likely reason is a network
> problem (a switch restart, or multicast traffic lost for a while) on
> the interface where the serverX-priv IPs are configured. Having a
> quorum disk may help by giving a quorum vote to one of the servers,
> so it can fence the others, but the best thing to do is to fix your
> network, and preferably add a redundant link for the cluster
> communication, to avoid the breakage in the first place.
> 
> On Mon, 30 May 2011 12:17:07 -0700 (PDT), Srija <swap_project at yahoo.com> wrote:
> > Hi,
> > 
> > I am very new to the Red Hat cluster and need some help and
> > suggestions for the cluster configuration.
> > We have a sixteen-node cluster running:
> > 
> >     OS     : Linux Server release 5.5 (Tikanga)
> >     kernel : 2.6.18-194.3.1.el5xen
> > 
> > The problem is that sometimes the cluster gets broken. So far the
> > only solution is to reboot all sixteen nodes; otherwise the nodes
> > will not rejoin.
> > 
> > We are using clvm and no quorum disk; the quorum settings are the
> > defaults.
> > 
> > When the cluster breaks, clustat shows everything offline except
> > the node where the clustat command was executed, and the vgs and
> > lvs commands hang.
> > 
> > Here is the current clustat report
> > -------------------------------------
> > 
> > [server1]# clustat
> > Cluster Status for newcluster @ Mon May 30 14:55:10 2011
> > Member Status: Quorate
> > 
> >  Member Name        ID   Status
> >  ------ ----        ---- ------
> >  server1            1    Online
> >  server2            2    Online, Local
> >  server3            3    Online
> >  server4            4    Online
> >  server5            5    Online
> >  server6            6    Online
> >  server7            7    Online
> >  server8            8    Online
> >  server9            9    Online
> >  server10           10   Online
> >  server11           11   Online
> >  server12           12   Online
> >  server13           13   Online
> >  server14           14   Online
> >  server15           15   Online
> >  server16           16   Online
> > 
> > Here is the cman_tool status output from one server
> > --------------------------------------------------
> > 
> > [server1 ~]# cman_tool status
> > Version: 6.2.0
> > Config Version: 23
> > Cluster Name: newcluster
> > Cluster Id: 53322
> > Cluster Member: Yes
> > Cluster Generation: 11432
> > Membership state: Cluster-Member
> > Nodes: 16
> > Expected votes: 16
> > Total votes: 16
> > Quorum: 9  
> > Active subsystems: 8
> > Flags: Dirty 
> > Ports Bound: 0 11  
> > Node name: server1
> > Node ID: 1
> > Multicast addresses: xxx.xxx.xxx.xx 
> > Node addresses: 192.168.xxx.xx 
> > 
> > 
> > Here is the cluster.conf file.
> > ------------------------------
> > 
> > <?xml version="1.0"?>
> > <cluster alias="newcluster" config_version="23" name="newcluster">
> > <fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="15"/>
> > 
> > <clusternodes>
> > 
> > <clusternode name="server1-priv" nodeid="1" votes="1">
> >         <fence><method name="1">
> >         <device name="ilo-server1r"/></method>
> >         </fence>
> > </clusternode>
> > 
> > <clusternode name="server2-priv" nodeid="3" votes="1">
> >         <fence><method name="1">
> >         <device name="ilo-server2r"/></method>
> >         </fence>
> > </clusternode>
> > 
> > <clusternode name="server3-priv" nodeid="2" votes="1">
> >         <fence><method name="1">
> >         <device name="ilo-server3r"/></method>
> >         </fence>
> > </clusternode>
> > 
> > [ ... snip ... ]
> > 
> > <clusternode name="server16-priv" nodeid="16" votes="1">
> >         <fence><method name="1">
> >         <device name="ilo-server16r"/></method>
> >         </fence>
> > </clusternode>
> > 
> > </clusternodes>
> > <cman/>
> > 
> > <dlm plock_ownership="1" plock_rate_limit="0"/>
> > <gfs_controld plock_rate_limit="0"/>
> > 
> > <fencedevices>
> >         <fencedevice agent="fence_ilo" hostname="server1r" login="Admin"
> >                 name="ilo-server1r" passwd="xxxxx"/>
> >         ..........
> >         <fencedevice agent="fence_ilo" hostname="server16r" login="Admin"
> >                 name="ilo-server16r" passwd="xxxxx"/>
> > </fencedevices>
> > <rm>
> > <failoverdomains/>
> > <resources/>
> > </rm></cluster>
> > 
> > Here is the lvm.conf file
> > --------------------------
> > 
> > devices {
> > 
> >     dir = "/dev"
> >     scan = [ "/dev" ]
> >     preferred_names = [ ]
> >     filter = [ "r/scsi.*/", "r/pci.*/", "r/sd.*/", "a/.*/" ]
> >     cache_dir = "/etc/lvm/cache"
> >     cache_file_prefix = ""
> >     write_cache_state = 1
> >     sysfs_scan = 1
> >     md_component_detection = 1
> >     md_chunk_alignment = 1
> >     data_alignment_detection = 1
> >     data_alignment = 0
> >     data_alignment_offset_detection = 1
> >     ignore_suspended_devices = 0
> > }
> > 
> > log {
> > 
> >     verbose = 0
> >     syslog = 1
> >     overwrite = 0
> >     level = 0
> >     indent = 1
> >     command_names = 0
> >     prefix = "  "
> > }
> > 
> > backup {
> > 
> >     backup = 1
> >     backup_dir = "/etc/lvm/backup"
> >     archive = 1
> >     archive_dir = "/etc/lvm/archive"
> >     retain_min = 10
> >     retain_days = 30
> > }
> > 
> > shell {
> > 
> >     history_size = 100
> > }
> > 
> > global {
> >     library_dir = "/usr/lib64"
> >     umask = 077
> >     test = 0
> >     units = "h"
> >     si_unit_consistency = 0
> >     activation = 1
> >     proc = "/proc"
> >     locking_type = 3
> >     wait_for_locks = 1
> >     fallback_to_clustered_locking = 1
> >     fallback_to_local_locking = 1
> >     locking_dir = "/var/lock/lvm"
> >     prioritise_write_locks = 1
> > }
> > 
> > activation {
> >     udev_sync = 1
> >     missing_stripe_filler = "error"
> >     reserved_stack = 256
> >     reserved_memory = 8192
> >     process_priority = -18
> >     mirror_region_size = 512
> >     readahead = "auto"
> >     mirror_log_fault_policy = "allocate"
> >     mirror_image_fault_policy = "remove"
> > }
> > 
> > dmeventd {
> > 
> >     mirror_library = "libdevmapper-event-lvm2mirror.so"
> >     snapshot_library = "libdevmapper-event-lvm2snapshot.so"
> > }
> > 
> > 
> > If you need more information, I can provide it.
> > 
> > Thanks for your help
> > Priya
> > 
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 



