[Linux-cluster] Re: Linux-cluster Digest, Vol 64, Issue 12

akapp akapp at fnds3000.com
Thu Aug 13 03:50:55 UTC 2009


Hi all

Just an update on CentOS 5.3 and luci.

When the network is installed with IPv6 enabled, the default, well-known
127.0.0.1  localhost.localdomain  localhost entry is replaced with a new
::1  localhost.localdomain  localhost entry.

Luci lives on 127.0.0.1, so if that entry is not in /etc/hosts, luci will
not start!

I think the biggest frustration with this was that no proper logfile
entry is written.

Tks
Andre


# Do not remove the following line, or various programs
# that require network functionality will fail.
::1	localhost.localdomain	localhost	apollo
127.0.0.1	localhost.localdomain	localhost
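
For anyone hitting the same thing, a quick way to confirm and recover
(a minimal sketch, assuming the stock CentOS init script):

# make sure the IPv4 loopback entry is present
grep '^127\.0\.0\.1' /etc/hosts || \
	echo '127.0.0.1	localhost.localdomain	localhost' >> /etc/hosts
# then restart luci
service luci restart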



On Aug 11, 2009, at 6:00 PM, linux-cluster-request at redhat.com wrote:

> Send Linux-cluster mailing list submissions to
> 	linux-cluster at redhat.com
>
> To subscribe or unsubscribe via the World Wide Web, visit
> 	https://www.redhat.com/mailman/listinfo/linux-cluster
> or, via email, send a message with subject or body 'help' to
> 	linux-cluster-request at redhat.com
>
> You can reach the person managing the list at
> 	linux-cluster-owner at redhat.com
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Linux-cluster digest..."
>
>
> Today's Topics:
>
>   1. Successful installation on centos 5.3 with live kvm migration
>      (Robert Verspuy)
>   2. Centos 5.3 X64 & luci (akapp)
>   3. Re: Linux-cluster Digest, Vol 64, Issue 10 (Bob Peterson)
>   4. RHEL 4.7 fenced fails -- stuck join state: S-2,2,1 (Robert Hurst)
>   5. Re: do I have a fence DRAC device? (bergman at merctech.com)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 11 Aug 2009 14:59:13 +0200
> From: Robert Verspuy <robert at exa-omicron.nl>
> Subject: [Linux-cluster] Successful installation on centos 5.3 with
> 	live kvm migration
> To: linux clustering <linux-cluster at redhat.com>
> Message-ID: <4A816B21.2040907 at exa-omicron.nl>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Getting cluster software, including KVM virtual machines with live
> migration, working can be a very difficult task, with many obstacles.
>
> But I would like to mention to the mailing list that I just had some
> success. And because nobody is around to hear the wonderful news,
> I would like to share my happiness here ;)
>
> My setup:
> 2 NAS servers and 1 Supermicro bladeserver with 5 blades.
>
> The NAS servers are running Openfiler 2.3
> both NAS servers have:
> 1 Transcend IDE 4Gbyte flashcard (on the ide port on the mainboard).
> 3 x Transcend 4Gbyte usb sticks
> 8 SATA disks.
>
> The IDE flash card is set up as a RAID-1 mirror (md0) with one USB
> stick, providing the root FS for Openfiler.
> The other 2 USB sticks have 5 partitions each: 4 x 500 MB and 1 x 2 GB,
> mirrored together with RAID-1 (md5 through md8 are the 500 MB
> partitions, and md9 is the 2 GB partition).
> The 8 hard disks are also paired up as RAID-1 mirrors
> (md1 through md4).
>
> Then I used DRBD (8.2.7) to mirror the 4 disk RAID-1's (md1 through
> md4) and the 2 GB mirror (md9) over the network to the other NAS
> server (drbd1, drbd2, drbd3 and drbd4, plus drbd0 for the 2 GB mirror).
> The 500 MB RAID-1's are used to store the metadata of the 4 disk
> RAID-1's; the 2 GB DRBD device (drbd0) has internal metadata.
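>
> (For reference, a minimal sketch of what one of those DRBD resources
> might look like in drbd.conf; the hostnames, addresses and port below
> are placeholders, not the actual setup:)
>
> resource drbd1 {
>   protocol C;
>   on nas1 {
>     device    /dev/drbd1;
>     disk      /dev/md1;
>     meta-disk /dev/md5[0];   # external metadata on a 500 MB mirror
>     address   192.168.1.1:7789;
>   }
>   on nas2 {
>     device    /dev/drbd1;
>     disk      /dev/md1;
>     meta-disk /dev/md5[0];
>     address   192.168.1.2:7789;
>   }
> }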
>
> The 2 GB DRBD device (drbd0) is mounted as ext3 on only one server and
> is used to store all kinds of Openfiler information that is needed on
> both NAS servers, like the Openfiler config (mostly), the DHCP leases
> database and the OpenLDAP database.
> Heartbeat makes sure that one NAS server is running all the software,
> and if there are any problems it can switch over very easily.
>
> drbd1 through drbd4 are set up as LVM PVs and bound together in one
> big VG. From that VG, I created 5 x 5 GB LVs to be used as the root
> devices for blade1 through blade5.
> These LV's are striped across 2 PVs for speed (although that's still my
> only bottleneck at the moment, but more on that later...).
> These LV's are exported over iSCSI.
>
> I also created one big LV of around 600 GB, which can be mounted
> through NFS.
>
> Then a few more LV's were created (around 10 GB each, also iSCSI), one
> for every VM I want.
> For every iSCSI LV I create a separate target.
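>
> (Roughly, the LVM side of that looks like the following sketch; the
> stripe size here is just an example value:)
>
> pvcreate /dev/drbd1 /dev/drbd2 /dev/drbd3 /dev/drbd4
> vgcreate vg0 /dev/drbd1 /dev/drbd2 /dev/drbd3 /dev/drbd4
> # a root LV for one blade, striped across 2 PVs
> lvcreate -i 2 -I 64 -L 5G -n blade1 vg0
> # the big NFS LV, striped across all 4 PVs
> lvcreate -i 4 -I 64 -L 600G -n data0 vg0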
>
> The Supermicro blades can boot from an iSCSI device.
> The exact iSCSI target is given through a DHCP option;
> I only set up an initiator name in the iSCSI BIOS of the blade.
>
> On the blade LV's I installed CentOS 5.3 (latest updates),
> but with a few modifications.
>
> I changed a few things in the initrd to bind eth0 to br0 during the
> Linux boot, before Linux takes over the iSCSI session from the BIOS,
> because when you have a Linux root on iSCSI and try to attach eth0 to
> br0, you lose network connectivity for a moment and can crash Linux,
> since everything it uses comes from the network (iSCSI root).
> I also added a little script to the initrd that calls iscsiadm with a
> fixed iSCSI target, because unfortunately iscsiadm can't read the
> iSCSI settings from DHCP or the Supermicro firmware.
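>
> (In essence, those extra initrd steps boil down to something like the
> following; the IQN and portal address are placeholders:)
>
> # attach eth0 to br0 before the kernel takes over the iSCSI session
> brctl addbr br0
> brctl addif br0 eth0
> # log in to a fixed target, since iscsiadm can't take it from DHCP
> iscsiadm -m node -T iqn.2009-08.nl.example:blade1 -p 192.168.1.10:3260 -o new
> iscsiadm -m node -T iqn.2009-08.nl.example:blade1 -p 192.168.1.10:3260 --login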
>
> When the blades are booted, they all join one Red Hat cluster, with 3
> nodes needed for quorum.
> Because I have 5 blades, two can fail before everything stops working.
>
> Then I compiled the following software myself, because the versions in
> the CentOS repo and the testing repo didn't function correctly:
> libvirt 0.7.0 (./configure --prefix=/usr)
> kvm-88 (./configure --prefix=/usr --disable-xen)
>
> The /usr/share/cluster/vm.sh from the default CentOS repo is still
> based on Xen.
> I downloaded the latest from
> https://bugzilla.redhat.com/show_bug.cgi?id=412911
> but it appears that that one is not working correctly either.
> I made some changes myself.
>
> And now it's all working together very nicely.
>
> I just ran a VM on blade1, and while this VM was running bonnie++ on
> an NFS mount to the NAS server,
> I live-migrated it about 10 times to blade2 and back.
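>
> (For reference, a live migration like this is typically driven either
> through rgmanager's vm resource or directly through libvirt; the domain
> and host names below are placeholders:)
>
> clusvcadm -M vm:vm1 -m blade2                       # via the cluster
> virsh migrate --live vm1 qemu+ssh://blade2/system   # or via libvirt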
>
> During this bonnie++ run and the live migrations, I pinged the VM.
> Where the normal ping times are around 20-35 ms (I pinged through a
> VPN line from my home to the data center),
> I only saw one or two pings, right around the end of a live migration,
> that were around 40-60 ms,
> but no drops, and no errors in bonnie++.
>
> I will write up some more information about the complete setup and
> post it somewhere on my blog or something,
> but I just wanted to let everybody know that it can be done ;)
>
> If you have any questions, let me know.
>
> The only 'problem' I still have is the speed to and from the disks.
> When I update any settings on the blade server, I always do this on
> blade1. Then I shut it down, and on the NAS server I copy the content
> of the iSCSI LV to an image file on the ext3 LV.
> Then I can power up blade1, wait until it rejoins the cluster,
> and then, one by one, shut down the next blade, copy the image on the
> NAS from the ext3 LV to that blade's LV,
> and start the blade again.
>
> I use drbd1 through drbd4 as 4 PV's for a VG.
> The speed (hdparm -t on the NAS) of each PV is around 75 MB/sec
> (except for one, which is 45 MB/sec).
>
> The blade LV (/dev/vg0/blade1 for example) is striped over 2 PV's.
> The speed (hdparm -t) of /dev/vg0/blade1 is 122 MB/sec.
>
> The ext3 LV (/dev/vg0/data0) is striped over 4 PV's.
> The speed (hdparm -t) of /dev/vg0/data0 is 227 MB/sec.
>
> But when copying from the blade LV to the ext3 LV:
> dd if=/dev/vg0/blade1 of=/mnt/vg0/data0/vm/image/blade_v2.7.img
> it takes about 70 seconds, which is about 75 MB/sec,
>
> but when copying back:
> dd if=/mnt/vg0/data0/vm/image/blade_v2.7.img of=/dev/vg0/blade1
> it takes about 390 seconds, which is about 13 MB/sec.
>
> I think it has something to do with the LV being striped over 4 PV's,
> so I will try to create a new ext3 LV striped across 2 PV's and see if
> this is faster.
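>
> (For that test, something like the following sketch should do, reusing
> the vg0 naming from above; the LV name and size are just examples:)
>
> lvcreate -i 2 -I 64 -L 100G -n data1 vg0 /dev/drbd1 /dev/drbd2
> mkfs.ext3 /dev/vg0/data1
> mkdir -p /mnt/vg0/data1
> mount /dev/vg0/data1 /mnt/vg0/data1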
>
> Robert Verspuy
>
> -- 
> *Exa-Omicron*
> Patroonsweg 10
> 3892 DB Zeewolde
> Tel.: 088-OMICRON (66 427 66)
> http://www.exa-omicron.nl
>
>
>
> ------------------------------
>
> Message: 2
> Date: Tue, 11 Aug 2009 14:58:38 +0200
> From: akapp <akapp at fnds3000.com>
> Subject: [Linux-cluster] Centos 5.3 X64 & luci
> To: linux-cluster at redhat.com
> Message-ID: <34CB6BB9-6856-4E52-B946-5465F90B081C at fnds3000.com>
> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
>
> Good day
>
> I have a Sun x4100 server running Centos 5.3 X64, patched to the latest
> and greatest.
>
> When trying to start luci, it simply fails: no error in /var/log and
> nothing in /var/lib/luci/log.
>
> I have re-installed luci and ricci a couple of times now, cleaning out
> /var/lib/luci and /var/lib/ricci between installations.
>
> I have even tried the complete yum groupremove "Clustering" "Cluster
> Storage" and re-installed the complete group again.
>
> I used the ricci/luci combination with great success in 5.2, but both
> servers are giving the same problem.
>
> Any pointers will be appreciated.
>
>
> Here is a screen snippet of the problem:
>
>
>
> Installed: luci.x86_64 0:0.12.1-7.3.el5.centos.1
> Complete!
> [root at clu1 luci]# luci_admin init
> Initializing the luci server
>
>
> Creating the 'admin' user
>
> Enter password:
> Confirm password:
>
> Please wait...
> The admin password has been successfully set.
> Generating SSL certificates...
> The luci server has been successfully initialized
>
>
> You must restart the luci server for changes to take effect.
>
> Run "service luci restart" to do so
>
> [root at clu1 luci]# service luci restart
> Shutting down luci:                                        [  OK  ]
> Starting luci: Generating https SSL certificates...  done
>                                                            [FAILED]
> [root at clu1 luci]#
>
>
>
>
>
>
> Tks
> Andre
>
>
>
> ------------------------------
>
> Message: 3
> Date: Tue, 11 Aug 2009 09:24:14 -0400 (EDT)
> From: Bob Peterson <rpeterso at redhat.com>
> Subject: [Linux-cluster] Re: Linux-cluster Digest, Vol 64, Issue 10
> To: Wendell Dingus <wendell at bisonline.com>
> Cc: linux-cluster at redhat.com
> Message-ID:
> 	<1891903567.443711249997054068.JavaMail.root at zmail06.collab.prod.int.phx2.redhat.com 
> >
> 	
> Content-Type: text/plain; charset=utf-8
>
> ----- "Wendell Dingus" <wendell at bisonline.com> wrote:
> | Well, here's the entire list of blocks it ignored and the entire
> | message section.
> | Perhaps I'm just overlooking it but I'm not seeing anything in the
> | messages
> | that appears to be a block number. Maybe 1633350398 but if so it is
> | not a match.
>
> Your assumption is correct.  The block number was 1633350398, which
> is labeled "bh = " for some reason.
>
> | Anyway, since you didn't specifically say a new/fixed version of  
> fsck
> | was
> | imminent and that it would likely fix this we began plan B today. We
>
> Yesterday I pushed a newer gfs_fsck and fsck.gfs2 to their appropriate
> git source repositories.  So you can build that version from source if
> you need it right away.  But it sounds like it wouldn't have helped
> your problem anyway.  What would really be nice is if there is a way
> to recreate the problem in our lab.  In theory, this error could be
> caused by a hardware problem too.
>
> | plugged
> | in another drive, placed a GFS2 filesystem on it and am actively
> | copying files
> | off to it now. Fingers crossed that nothing will hit a disk block  
> that
> | causes
> | this again but I could be so lucky probably...
>
> It's hard to say whether you'll hit it again.
>
> Regards,
>
> Bob Peterson
> Red Hat File Systems
>
>
>
> ------------------------------
>
> Message: 4
> Date: Tue, 11 Aug 2009 10:55:48 -0400
> From: "Robert Hurst" <rhurst at bidmc.harvard.edu>
> Subject: [Linux-cluster] RHEL 4.7 fenced fails -- stuck join state:
> 	S-2,2,1
> To: "linux clustering" <linux-cluster at redhat.com>
> Message-ID: <1250002549.2782.36.camel at WSBID06223.bidmc.harvard.edu>
> Content-Type: text/plain; charset="us-ascii"
>
> Simple 4-node cluster; 2 nodes have had a GFS shared home directory
> mounted for over a month.  Today, I wanted to mount /home on a 3rd
> node, so:
>
> # service fenced start                [failed]
>
> Weird.  Checking /var/log/messages shows:
>
> Aug 11 10:19:06 cerberus kernel: Lock_Harness 2.6.9-80.9.el4_7.10  
> (built
> Jan 22 2009 18:39:16) installed
> Aug 11 10:19:06 cerberus kernel: GFS 2.6.9-80.9.el4_7.10 (built Jan 22
> 2009 18:39:32) installed
> Aug 11 10:19:06 cerberus kernel: GFS: Trying to join cluster  
> "lock_dlm",
> "ccc_cluster47:home"
> Aug 11 10:19:06 cerberus kernel: Lock_DLM (built Jan 22 2009 18:39:18)
> installed
> Aug 11 10:19:06 cerberus kernel: lock_dlm: fence domain not found;  
> check
> fenced
> Aug 11 10:19:06 cerberus kernel: GFS: can't mount proto = lock_dlm,
> table = ccc_cluster47:home, hostdata =
>
> # cman_tool services
> Service          Name                              GID LID State     Code
> Fence Domain:    "default"                           0   2 join      S-2,2,1
> []
>
> So, a fenced process is now hung:
>
> root     28302  0.0  0.0  3668  192 ?        Ss   10:19   0:00 fenced -t 120 -w
>
> Q: Any idea how to "recover" from this state, without rebooting?
>
> The other two servers are unaffected by this (thankfully) and show
> normal operations:
>
> $ cman_tool services
>
> Service          Name                              GID LID State     Code
> Fence Domain:    "default"                           2   2 run       -
> [1 12]
>
> DLM Lock Space:  "home"                              5   5 run       -
> [1 12]
>
> GFS Mount Group: "home"                              6   6 run       -
> [1 12]
>
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: https://www.redhat.com/archives/linux-cluster/attachments/20090811/7b46b120/attachment.html
>
> ------------------------------
>
> Message: 5
> Date: Tue, 11 Aug 2009 11:39:39 -0400
> From: bergman at merctech.com
> Subject: Re: [Linux-cluster] do I have a fence DRAC device?
> To: linux clustering <linux-cluster at redhat.com>
> Message-ID: <27649.1250005179 at mirchi>
> Content-Type: text/plain; charset=us-ascii
>
>
>
> In the message dated: Tue, 11 Aug 2009 14:14:03 +0200,
> The pithy ruminations from Juan Ramon Martin Blanco on
> <Re: [Linux-cluster] do I have a fence DRAC device?> were:
> =>
> => On Tue, Aug 11, 2009 at 2:03 PM, ESGLinux <esggrupos at gmail.com>  
> wrote:
> =>
> => > Thanks
> => > I'll check it when I can reboot the server.
> => >
> => > greetings,
> => >
> => You have a BMC (IPMI) on the first network interface; it can be
> => configured at boot time (I don't remember if inside the BIOS or by
> => pressing Ctrl+something during boot).
> =>
>
> Based on my notes, here's how I configured the DRAC interface on a  
> Dell 1950
> for use as a fence device:
>
> 	Configuring the card from Linux depends on having Dell's
> 	OMSA package installed. Once that's installed, use the following
> 	commands:
>
> 		racadm config -g cfgSerial -o cfgSerialTelnetEnable 1
> 		racadm config -g cfgLanNetworking -o cfgDNSRacName HOSTNAME_FOR_INTERFACE
> 		racadm config -g cfgLanNetworking -o cfgDNSDomainName DOMAINNAME_FOR_INTERFACE
> 		racadm config -g cfgUserAdmin -o cfgUserAdminPassword -i 2 PASSWORD
> 		racadm config -g cfgLanNetworking -o cfgNicEnable 1
> 		racadm config -g cfgLanNetworking -o cfgNicIpAddress WWW.XXX.YYY.ZZZ
> 		racadm config -g cfgLanNetworking -o cfgNicNetmask WWW.XXX.YYY.ZZZ
> 		racadm config -g cfgLanNetworking -o cfgNicGateway WWW.XXX.YYY.ZZZ
> 		racadm config -g cfgLanNetworking -o cfgNicUseDhcp 0
>
>
> 	I also save a backup of the configuration with:
>
> 		racadm getconfig -f ~/drac_config
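>
> 	Once the card answers on the network, the matching cluster.conf
> 	pieces look roughly like this (a sketch; the device name, node name
> 	and address are placeholders):
>
> 		<fencedevice agent="fence_drac" name="drac1"
> 		             ipaddr="WWW.XXX.YYY.ZZZ" login="root" passwd="PASSWORD"/>
>
> 		<clusternode name="node1" nodeid="1" votes="1">
> 			<fence>
> 				<method name="1">
> 					<device name="drac1"/>
> 				</method>
> 			</fence>
> 		</clusternode>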
>
>
> Hope this helps,
>
> Mark
>
> ----
> Mark Bergman                              voice: 215-662-7310
> mark.bergman at uphs.upenn.edu                 fax: 215-614-0266
> System Administrator     Section of Biomedical Image Analysis
> Department of Radiology            University of Pennsylvania
>      PGP Key: https://www.rad.upenn.edu/sbia/bergman
>
>
> => Greetings,
> => Juanra
> =>
> => >
> => > ESG
> => >
> => > 2009/8/10 Paras pradhan <pradhanparas at gmail.com>
> => >
> => > On Mon, Aug 10, 2009 at 5:24 AM, ESGLinux<esggrupos at gmail.com>  
> wrote:
> => >> > Hi all,
> => >> > I was designing a 2 node cluster and I was going to use 2  
> servers DELL
> => >> > PowerEdge 1950. I was going to buy a DRAC card to use for  
> fencing but
> => >> > running several commands in the servers I have noticed that  
> when I run
> => >> this
> => >> > command:
> => >> > #ipmitool lan print
> => >> > Set in Progress : Set Complete
> => >> > Auth Type Support : NONE MD2 MD5 PASSWORD
> => >> > Auth Type Enable : Callback : MD2 MD5
> => >> >                         : User : MD2 MD5
> => >> >                         : Operator : MD2 MD5
> => >> >                         : Admin : MD2 MD5
> => >> >                         : OEM : MD2 MD5
> => >> > IP Address Source : Static Address
> => >> > IP Address : 0.0.0.0
> => >> > Subnet Mask : 0.0.0.0
> => >> > MAC Address : 00:1e:c9:ae:6f:7e
> => >> > SNMP Community String : public
> => >> > IP Header : TTL=0x40 Flags=0x40 Precedence=0x00 TOS=0x10
> => >> > Default Gateway IP : 0.0.0.0
> => >> > Default Gateway MAC : 00:00:00:00:00:00
> => >> > Backup Gateway IP : 0.0.0.0
> => >> > Backup Gateway MAC : 00:00:00:00:00:00
> => >> > 802.1q VLAN ID : Disabled
> => >> > 802.1q VLAN Priority : 0
> => >> > RMCP+ Cipher Suites : 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
> => >> > Cipher Suite Priv Max : aaaaaaaaaaaaaaa
> => >> >                         : X=Cipher Suite Unused
> => >> >                         : c=CALLBACK
> => >> >                         : u=USER
> => >> >                         : o=OPERATOR
> => >> >                         : a=ADMIN
> => >> >                         : O=OEM
> => >> > does this mean that I already have an IPMI card (not configured)
> => >> > that I can use for fencing? If the answer is yes, where on earth
> => >> > must I configure it? I don't see where I can do it.
> => >> > If I don't have a fencing device, which one do you recommend?
> => >> > Thanks in advance
> => >> > ESG
> => >> >
> => >> > --
> => >> > Linux-cluster mailing list
> => >> > Linux-cluster at redhat.com
> => >> > https://www.redhat.com/mailman/listinfo/linux-cluster
> => >> >
> => >>
> => >> Yes, you have IPMI, and if you are using a Dell 1950, DRAC should
> => >> be there too. You can see whether you have DRAC or not when the
> => >> server starts, before the OS loads.
> => >>
> => >> I have 1850s and I am using DRAC for fencing.
> => >>
> => >>
> => >> Paras.
> => >>
> => >> --
> => >> Linux-cluster mailing list
> => >> Linux-cluster at redhat.com
> => >> https://www.redhat.com/mailman/listinfo/linux-cluster
> => >>
> => >
> => >
>
>
>
>
>
> ------------------------------
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
> End of Linux-cluster Digest, Vol 64, Issue 12
> *********************************************
>



