[Linux-cluster] Cluster environment issue
Hiroyuki Sato
hiroysato at gmail.com
Tue May 31 03:03:58 UTC 2011
Hello,

I'm not sure whether this is useful or not.

Have you checked ``ping somewhere'' from a domU while the cluster is broken?
(I assume you are using Xen, since your kernel is 2.6.18-194.3.1.el5xen.)

If you get no response at all, you should check iptables
(e.g. temporarily disable iptables).
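For example, something along these lines (only a rough sketch; adjust the
peer name and interface to your environment):

  # from the affected domU, check basic reachability of another member
  ping -c 3 server2-priv

  # list the active iptables rules to see if cluster traffic is filtered
  iptables -L -n

  # as a temporary test only, stop iptables completely (RHEL 5)
  service iptables stop

If stopping iptables makes the nodes rejoin, re-enable it afterwards and
open the cluster/multicast traffic instead of leaving the firewall off.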
--
Hiroyuki Sato
2011/5/31 Srija <swap_project at yahoo.com>:
> Thanks for your quick reply.
>
> I talked to the network people, but they say everything is fine at their end. Is there any way, at the server end, to detect a switch restart or a loss of multicast traffic?
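> (Would something like ``tcpdump -i <cluster interface> host <multicast
> address shown by cman_tool status>'' on one of the nodes be a reasonable
> way to watch whether the multicast traffic is still arriving? I am only
> guessing here.)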
>
> I think you have already looked at the cluster.conf file. Apart from the missing quorum disk, do you think the configuration is sufficient for handling a sixteen-node cluster?
>
> thanks again .
> regards
>
> --- On Mon, 5/30/11, Kaloyan Kovachev <kkovachev at varna.net> wrote:
>
>> From: Kaloyan Kovachev <kkovachev at varna.net>
>> Subject: Re: [Linux-cluster] Cluster environment issue
>> To: "linux clustering" <linux-cluster at redhat.com>
>> Date: Monday, May 30, 2011, 4:05 PM
>> Hi,
>> when your cluster gets broken, the most likely reason is a network
>> problem (a switch restart, or multicast traffic being lost for a
>> while) on the interface where the serverX-priv IPs are configured.
>> Having a quorum disk may help by giving a quorum vote to one of the
>> servers, so it can fence the others, but the best thing to do is to
>> fix your network and preferably add a redundant link for the cluster
>> communication, to avoid the breakage in the first place.
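>>
>> For the quorum disk, the rough idea (only a sketch, the values would
>> need tuning for a sixteen-node cluster) is a small shared LUN
>> initialised with mkqdisk and a <quorumd> line in cluster.conf:
>>
>>   mkqdisk -c /dev/mapper/qdisk-lun -l newcluster_qdisk
>>
>>   <quorumd interval="1" tko="10" votes="1" label="newcluster_qdisk"/>
>>
>> (/dev/mapper/qdisk-lun is just a placeholder for your shared device.)
>> For the redundant link, bonding two NICs in active-backup mode on the
>> interface carrying the serverX-priv addresses is the usual approach.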
>>
>> On Mon, 30 May 2011 12:17:07 -0700 (PDT), Srija <swap_project at yahoo.com> wrote:
>> > Hi,
>> >
>> > I am very new to the Red Hat cluster. I need some help and
>> > suggestions for the cluster configuration.
>> > We have a sixteen-node cluster of:
>> >
>> >     OS     : Linux Server release 5.5 (Tikanga)
>> >     kernel : 2.6.18-194.3.1.el5xen
>> >
>> > The problem is that sometimes the cluster breaks. So far the only
>> > solution has been to reboot all sixteen nodes; otherwise the nodes
>> > do not rejoin.
>> >
>> > We are using clvm and no quorum disk; the quorum is the default.
>> >
>> > When the cluster breaks, clustat shows everything offline except
>> > the node on which the clustat command was executed. If we run the
>> > vgs or lvs commands, they hang.
>> >
>> > Here is at present the clustat report
>> > -------------------------------------
>> >
>> > [server1]# clustat
>> > Cluster Status for newcluster @ Mon May 30 14:55:10 2011
>> > Member Status: Quorate
>> >
>> >  Member Name                        ID   Status
>> >  ------ ----                        ---- ------
>> >  server1                               1 Online
>> >  server2                               2 Online, Local
>> >  server3                               3 Online
>> >  server4                               4 Online
>> >  server5                               5 Online
>> >  server6                               6 Online
>> >  server7                               7 Online
>> >  server8                               8 Online
>> >  server9                               9 Online
>> >  server10                             10 Online
>> >  server11                             11 Online
>> >  server12                             12 Online
>> >  server13                             13 Online
>> >  server14                             14 Online
>> >  server15                             15 Online
>> >  server16                             16 Online
>> >
>> > Here is the cman_tool status output from one server
>> > --------------------------------------------------
>> >
>> > [server1 ~]# cman_tool status
>> > Version: 6.2.0
>> > Config Version: 23
>> > Cluster Name: newcluster
>> > Cluster Id: 53322
>> > Cluster Member: Yes
>> > Cluster Generation: 11432
>> > Membership state: Cluster-Member
>> > Nodes: 16
>> > Expected votes: 16
>> > Total votes: 16
>> > Quorum: 9
>> > Active subsystems: 8
>> > Flags: Dirty
>> > Ports Bound: 0 11
>> > Node name: server1
>> > Node ID: 1
>> > Multicast addresses: xxx.xxx.xxx.xx
>> > Node addresses: 192.168.xxx.xx
>> >
>> >
>> > Here is the cluster.conf file.
>> > ------------------------------
>> >
>> > <?xml version="1.0"?>
>> > <cluster alias="newcluster" config_version="23" name="newcluster">
>> >   <fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="15"/>
>> >
>> >   <clusternodes>
>> >
>> >     <clusternode name="server1-priv" nodeid="1" votes="1">
>> >       <fence><method name="1">
>> >         <device name="ilo-server1r"/></method>
>> >       </fence>
>> >     </clusternode>
>> >
>> >     <clusternode name="server2-priv" nodeid="3" votes="1">
>> >       <fence><method name="1">
>> >         <device name="ilo-server2r"/></method>
>> >       </fence>
>> >     </clusternode>
>> >
>> >     <clusternode name="server3-priv" nodeid="2" votes="1">
>> >       <fence><method name="1">
>> >         <device name="ilo-server3r"/></method>
>> >       </fence>
>> >     </clusternode>
>> >
>> > [ ... snip ... ]
>> >
>> >     <clusternode name="server16-priv" nodeid="16" votes="1">
>> >       <fence><method name="1">
>> >         <device name="ilo-server16r"/></method>
>> >       </fence>
>> >     </clusternode>
>> >
>> >   </clusternodes>
>> >   <cman/>
>> >
>> >   <dlm plock_ownership="1" plock_rate_limit="0"/>
>> >   <gfs_controld plock_rate_limit="0"/>
>> >
>> >   <fencedevices>
>> >     <fencedevice agent="fence_ilo" hostname="server1r" login="Admin"
>> >                  name="ilo-server1r" passwd="xxxxx"/>
>> >     ..........
>> >     <fencedevice agent="fence_ilo" hostname="server16r" login="Admin"
>> >                  name="ilo-server16r" passwd="xxxxx"/>
>> >   </fencedevices>
>> >   <rm>
>> >     <failoverdomains/>
>> >     <resources/>
>> >   </rm>
>> > </cluster>
>> >
>> > Here is the lvm.conf file
>> > --------------------------
>> >
>> > devices {
>> >     dir = "/dev"
>> >     scan = [ "/dev" ]
>> >     preferred_names = [ ]
>> >     filter = [ "r/scsi.*/", "r/pci.*/", "r/sd.*/", "a/.*/" ]
>> >     cache_dir = "/etc/lvm/cache"
>> >     cache_file_prefix = ""
>> >     write_cache_state = 1
>> >     sysfs_scan = 1
>> >     md_component_detection = 1
>> >     md_chunk_alignment = 1
>> >     data_alignment_detection = 1
>> >     data_alignment = 0
>> >     data_alignment_offset_detection = 1
>> >     ignore_suspended_devices = 0
>> > }
>> >
>> > log {
>> >     verbose = 0
>> >     syslog = 1
>> >     overwrite = 0
>> >     level = 0
>> >     indent = 1
>> >     command_names = 0
>> >     prefix = " "
>> > }
>> >
>> > backup {
>> >     backup = 1
>> >     backup_dir = "/etc/lvm/backup"
>> >     archive = 1
>> >     archive_dir = "/etc/lvm/archive"
>> >     retain_min = 10
>> >     retain_days = 30
>> > }
>> >
>> > shell {
>> >     history_size = 100
>> > }
>> >
>> > global {
>> >     library_dir = "/usr/lib64"
>> >     umask = 077
>> >     test = 0
>> >     units = "h"
>> >     si_unit_consistency = 0
>> >     activation = 1
>> >     proc = "/proc"
>> >     locking_type = 3
>> >     wait_for_locks = 1
>> >     fallback_to_clustered_locking = 1
>> >     fallback_to_local_locking = 1
>> >     locking_dir = "/var/lock/lvm"
>> >     prioritise_write_locks = 1
>> > }
>> >
>> > activation {
>> >     udev_sync = 1
>> >     missing_stripe_filler = "error"
>> >     reserved_stack = 256
>> >     reserved_memory = 8192
>> >     process_priority = -18
>> >     mirror_region_size = 512
>> >     readahead = "auto"
>> >     mirror_log_fault_policy = "allocate"
>> >     mirror_image_fault_policy = "remove"
>> > }
>> >
>> > dmeventd {
>> >     mirror_library = "libdevmapper-event-lvm2mirror.so"
>> >     snapshot_library = "libdevmapper-event-lvm2snapshot.so"
>> > }
>> >
>> >
>> > If you need more information, I can provide it.
>> >
>> > Thanks for your help
>> > Priya
>> >
>> > --
>> > Linux-cluster mailing list
>> > Linux-cluster at redhat.com
>> > https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>