NTP problem for virtual RHEL 4 server on VmWare (Kenneth Holter)

karthikeyan karthik_arnold1 at yahoo.com
Sat Nov 8 06:40:16 UTC 2008


Hi Kenneth 

This is a known issue with RHEL on VMware. You can find the knowledge base article about time running slow on the VMware website.

Please find the link:

http://www.djax.co.uk/kb/linux/vmware_clock_drift.html
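
For reference, here is a minimal sketch of the kind of workaround usually suggested for VMware clock drift on RHEL 4 guests. Treat it as an assumption to verify against the KB article above and VMware's documentation, not as that article's exact fix: the grub.conf kernel line, kernel version and "clock=pit" value below are only illustrative, and the right parameter depends on your kernel version and whether the guest is 32- or 64-bit.

  # /boot/grub/grub.conf: pass an explicit, VM-friendly clock source to the
  # guest kernel ("clock=pit" is one commonly cited value for 32-bit RHEL 4
  # guests; 64-bit kernels use different options, so check the KB for yours)
  kernel /vmlinuz-2.6.9-xx.EL ro root=/dev/VolGroup00/LogVol00 clock=pit

  # /etc/ntp.conf: let ntpd correct large offsets instead of giving up, and
  # do not sync to the local undisciplined clock inside a virtual machine
  tinker panic 0
  # server 127.127.1.0   <- comment this line out in a VM

Also make sure VMware Tools time synchronisation and ntpd are not both trying to adjust the clock at the same time.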


Regards
Karthik


--- On Fri, 11/7/08, redhat-list-request at redhat.com <redhat-list-request at redhat.com> wrote:

> From: redhat-list-request at redhat.com <redhat-list-request at redhat.com>
> Subject: redhat-list Digest, Vol 57, Issue 7
> To: redhat-list at redhat.com
> Date: Friday, November 7, 2008, 10:30 PM
> Send redhat-list mailing list submissions to
> 	redhat-list at redhat.com
> 
> To subscribe or unsubscribe via the World Wide Web, visit
> 	https://www.redhat.com/mailman/listinfo/redhat-list
> or, via email, send a message with subject or body
> 'help' to
> 	redhat-list-request at redhat.com
> 
> You can reach the person managing the list at
> 	redhat-list-owner at redhat.com
> 
> When replying, please edit your Subject line so it is more
> specific
> than "Re: Contents of redhat-list digest..."
> 
> 
> Today's Topics:
> 
>    1. RE: Cluster Heart Beat Using Cross Over Cable
>       (Karchner, Craig (IT Solutions US))
>    2. Help Slick Mach make the right choice!
> (mailanky at gmail.com)
>    3. NTP problem for virtual RHEL 4 server on VmWare
> (Kenneth Holter)
>    4. Cluster Broken pipe & node Reboot (lingu)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Thu, 6 Nov 2008 09:24:12 -0800
> From: "Karchner, Craig (IT Solutions US)"
> 	<craig.a.karchner at siemens.com>
> Subject: RE: Cluster Heart Beat Using Cross Over Cable
> To: "General Red Hat Linux discussion list"
> <redhat-list at redhat.com>
> Message-ID:
> 	<13FE6613E1ADA041A0124537010C11E903742020 at USNWK102MSX.ww017.siemens.net>
> 	
> Content-Type: text/plain;	charset="us-ascii"
> 
>  
> Lingu,
> 
> I had this same problem a few weeks back.
> 
> This is how I solved it. 
> 
> Make sure your NICs are at 1G.
> 
> Add the following entries to your cluster.ccs file and write it to
> disk:
> 
> heartbeat_rate = 30
> allowed_misses = 4
> 
> My cluster.ccs file looks like this now:
> 
> cluster {
>     name = "alpha"
>     lock_gulm {
>         servers = ["server1", "server2", "server3"]
>         heartbeat_rate = 30
>         allowed_misses = 4
>     }
> }
> 
> This example procedure shows how to change configuration files in a
> CCS archive.
> 
> 1. Extract configuration files from the CCA device into the temporary
>    directory /root/alpha-new/:
> 
>    ccs_tool extract /dev/pool/alpha_cca /root/alpha-new/
> 
> 2. Make changes to the configuration files in /root/alpha-new/.
> 
> 3. Create a new CCS archive on the CCA device by using the -O
>    (override) flag to forcibly overwrite the existing CCS archive:
> 
>    ccs_tool -O create /root/alpha-new/ /dev/pool/alpha_cca
> 
> 
> 
> What you are suggesting (a crossover cable) is not supported, at
> least in GFS 6.0, which I assume you are running with RHEL 3.0.
> 
> 
> -----Original Message-----
> From: redhat-list-bounces at redhat.com
> [mailto:redhat-list-bounces at redhat.com] On Behalf Of lingu
> Sent: Thursday, November 06, 2008 7:41 AM
> To: General Red Hat Linux discussion list
> Subject: Cluster Heart Beat Using Cross Over Cable
> 
> Hi,
> 
>  I am running a two-node active/passive cluster on RHEL 3 update 8
> (64-bit) on HP boxes, with external HP storage connected via SCSI.
> The cluster had been running fine for the last 3 years, but all of a
> sudden the cluster service keeps shifting (at least once a day) from
> one node to another.
> 
>  After analysing the syslog I found that the service was being
> shifted due to some network fluctuation. Both nodes have two NICs
> bonded together, configured with the IPs below.
> 
> My network details:
> 
> 192.168.1.2 -- node 1 physical IP, class C subnet (bond0)
> 192.168.1.3 -- node 2 physical IP, class C subnet (bond0)
> 192.168.1.4 -- floating IP (cluster)
> 
>  Since it is a very critical and busy server, heartbeat signals may
> be getting missed under heavy network load, resulting in the service
> shifting from one node to another.
> 
> So I planned to connect a crossover cable for the heartbeat
> messages. Can anyone guide me, or provide a link that best explains
> how to do this and what changes I have to make in the cluster
> configuration file after connecting the crossover cable?
> 
> Regards,
> 
> Lingu
> 
> -- 
> redhat-list mailing list
> unsubscribe
> mailto:redhat-list-request at redhat.com?subject=unsubscribe
> https://www.redhat.com/mailman/listinfo/redhat-list
> 
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Fri, 7 Nov 2008 13:38:09 +0530
> From: <mailanky at gmail.com>
> Subject: Help Slick Mach make the right choice!
> To: <redhat-list at redhat.com>
> Message-ID:
> <B07101D09C1F45E0A8C6C191BEA2A154 at webchutney2>
> Content-Type: text/plain;	charset="iso-8859-1"
> 
> Hey, 
> 
> ankur has signed you up for a perfect shave!
> Simply help Slick Mach make the right choice & you
> could win a free
> Gillette Mach 3 razor.
> Click here to take the challenge.
> <http://www.slickmach.com/index.html> 
> 
> 
> ------------------------------
> 
> Message: 3
> Date: Fri, 7 Nov 2008 10:49:04 +0100
> From: "Kenneth Holter"
> <kenneho.ndu at gmail.com>
> Subject: NTP problem for virtual RHEL 4 server on VmWare
> To: redhat-list at redhat.com
> Message-ID:
> 	<c25f25140811070149u2d098492rf2c36e6b07941225 at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
> 
> Hi.
> 
> 
> One of our RHEL 4 servers running on VMware has quite a serious NTP
> problem. I know that NTP can be an issue when running Red Hat boxes
> on VMware, so as a fix I put this small script in a file in
> /etc/cron.hourly:
> 
> 
> [root at server cron.hourly]# cat ntpdate
> #!/bin/sh
> # Stop ntpd, step the clock with ntpdate, log the offset, restart ntpd.
> /etc/init.d/ntpd stop
> ntpdate 1.2.3.4 >> /tmp/time_adjust.log
> /etc/init.d/ntpd start
> 
> 
> After investigating the "/tmp/time_adjust.log" file, I was quite
> surprised by the amount of drift found on one particular server.
> Consider this extract from the file:
> 
> 6 Nov 20:00:01 ntpdate[19373]: step time server 1.2.3.4 offset -60.504153 sec
> 6 Nov 20:00:52 ntpdate[19666]: step time server 1.2.3.4 offset -8.735440 sec
> 6 Nov 20:01:00 ntpdate[19689]: step time server 1.2.3.4 offset -1.635632 sec
> 6 Nov 20:54:06 ntpdate[24198]: step time server 1.2.3.4 offset -415.894712 sec
> 6 Nov 21:01:01 ntpdate[24920]: adjust time server 1.2.3.4 offset 0.136833 sec
> 6 Nov 22:01:02 ntpdate[29943]: adjust time server 1.2.3.4 offset -0.114253 sec
> 6 Nov 23:01:01 ntpdate[2519]: adjust time server 1.2.3.4 offset -0.036345 sec
> 7 Nov 00:01:00 ntpdate[7577]: step time server 1.2.3.4 offset -1.064935 sec
> 7 Nov 01:00:57 ntpdate[12697]: step time server 1.2.3.4 offset -3.922577 sec
> 7 Nov 02:00:21 ntpdate[17733]: step time server 1.2.3.4 offset -40.421825 sec
> 7 Nov 02:01:00 ntpdate[17777]: step time server 1.2.3.4 offset -1.123175 sec
> 7 Nov 02:57:23 ntpdate[22542]: step time server 1.2.3.4 offset -218.649820 sec
> 7 Nov 03:00:36 ntpdate[22900]: step time server 1.2.3.4 offset -25.284528 sec
> 7 Nov 03:00:58 ntpdate[22940]: step time server 1.2.3.4 offset -3.104130 sec
> 7 Nov 03:52:32 ntpdate[27430]: step time server 1.2.3.4 offset -509.363952 sec
> 7 Nov 03:59:50 ntpdate[27943]: step time server 1.2.3.4 offset -71.430354 sec
> 7 Nov 04:00:52 ntpdate[28236]: step time server 1.2.3.4 offset -9.344907 sec
> 7 Nov 04:01:00 ntpdate[28259]: step time server 1.2.3.4 offset -1.237651 sec
> 7 Nov 05:01:01 ntpdate[1363]: adjust time server 1.2.3.4 offset 0.390149 sec
> 7 Nov 06:01:01 ntpdate[6419]: adjust time server 1.2.3.4 offset -0.185112 sec
> 7 Nov 07:01:02 ntpdate[11493]: adjust time server 1.2.3.4 offset -0.228884 sec
> 7 Nov 08:00:59 ntpdate[16579]: step time server 1.2.3.4 offset -2.166519 sec
> 7 Nov 09:00:38 ntpdate[21522]: step time server 1.2.3.4 offset -23.169420 sec
> 7 Nov 09:01:02 ntpdate[21558]: adjust time server 1.2.3.4 offset -0.492106 sec
> 7 Nov 09:59:26 ntpdate[26329]: step time server 1.2.3.4 offset -95.154264 sec
> 7 Nov 10:00:55 ntpdate[26639]: step time server 1.2.3.4 offset -5.997955 sec
> 7 Nov 10:01:01 ntpdate[26658]: step time server 1.2.3.4 offset -0.506367 sec
> 
> 
> Does anyone know what may be causing the RHEL box to drift as much
> as 500 seconds in only one hour?
> 
> Regards,
> Kenneth Holter
> 
> 
> ------------------------------
> 
> Message: 4
> Date: Fri, 7 Nov 2008 16:15:08 +0530
> From: lingu <hicheerup at gmail.com>
> Subject: Cluster Broken pipe & node Reboot
> To: "General Red Hat Linux discussion list"
> <redhat-list at redhat.com>
> Message-ID:
> 	<29e045b80811070245t1c303530xbf58626227638260 at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
> 
> Hi  all,
> 
>   I am running a two-node RHEL 3 U8 cluster (versions below) on HP
> servers connected via a SCSI channel to HP storage (SAN), serving an
> Oracle database.
> 
> Kernel & Cluster Version
> 
> Kernel-2.4.21-47.EL #1 SMP
> redhat-config-cluster-1.0.7-1-noarch
> clumanager-1.2.26.1-1-x86_64
> 
> 
>  Suddenly my active node got rebooted. After analysing the logs I
> found it is throwing the errors below in syslog. I want to know what
> might cause this type of error. Also, the sar output indicates there
> was no load on the server at the time the system rebooted, nor at
> the times I am getting the I/O Hang error.
> 
> Nov  3 14:23:00 cluster1 clulockd[1996]: <warning> Denied 20.1.2.162: Broken pipe
> Nov  3 14:23:00 cluster1 clulockd[1996]: <err> select error: Broken pipe
> Nov  3 14:23:06 cluster1 clulockd[1996]: <warning> Denied 20.1.2.162: Broken pipe
> Nov  3 14:23:06 cluster1 clulockd[1996]: <err> select error: Broken pipe
> Nov  3 14:23:13 cluster1 cluquorumd[1921]: <warning> Disk-TB: Detected I/O Hang!
> Nov  3 14:23:15 cluster1 clulockd[1996]: <warning> Denied 20.1.2.161: Broken pipe
> Nov  3 14:23:15 cluster1 clulockd[1996]: <err> select error: Broken pipe
> Nov  3 14:23:12 cluster1 clusvcmgrd[2011]: <err> Unable to obtain cluster lock: Connection timed out
> 
> Nov  5 17:18:00 cluster1 cluquorumd[1921]: <warning> Disk-TB: Detected I/O Hang!
> Nov  5 17:18:00 cluster1 clulockd[1996]: <warning> Denied 20.1.2.162: Broken pipe
> Nov  5 17:18:00 cluster1 clulockd[1996]: <err> select error: Broken pipe
> Nov  5 17:18:17 cluster1 clulockd[1996]: <warning> Denied 20.1.2.162: Broken pipe
> Nov  5 17:18:17 cluster1 clulockd[1996]: <err> select error: Broken pipe
> Nov  5 17:18:17 cluster1 clulockd[1996]: <warning> Potential recursive lock #0 grant to member #1, PID1962
> 
> 
>  I need someone's help in working out how to fix this error, and
> also the real cause of the errors above.
> 
> I have attached my cluster.xml file.
> 
> 
> 
> <?xml version="1.0"?>
> <cluconfig version="3.0">
>   <clumembd broadcast="yes" interval="1000000" loglevel="5" multicast="no" multicast_ipaddress="" thread="yes" tko_count="25"/>
>   <cluquorumd loglevel="7" pinginterval="5" tiebreaker_ip=""/>
>   <clurmtabd loglevel="7" pollinterval="4"/>
>   <clusvcmgrd loglevel="7"/>
>   <clulockd loglevel="7"/>
>   <cluster config_viewnumber="4" key="6672bc0a71be2ec9486f6a2f5846c172" name="ORACLECLUSTER"/>
>   <sharedstate driver="libsharedraw.so" rawprimary="/dev/raw/raw1" rawshadow="/dev/raw/raw2" type="raw"/>
>   <members>
>     <member id="0" name="cluster1" watchdog="yes"/>
>     <member id="1" name="cluster2" watchdog="yes"/>
>   </members>
>   <services>
>     <service checkinterval="10" failoverdomain="oracle_db" id="0" maxfalsestarts="0" maxrestarts="0" name="database" userscript="/etc/init.d/script_db.sh">
>       <service_ipaddresses>
>         <service_ipaddress broadcast="None" id="0" ipaddress="20.1.2.35" monitor_link="1" netmask="255.255.0.0"/>
>       </service_ipaddresses>
>       <device id="0" name="/dev/cciss/c0d0p1" sharename="">
>         <mount forceunmount="yes" fstype="ext3" mountpoint="/vol1" options="rw"/>
>       </device>
>       <device id="1" name="/dev/cciss/c0d0p2" sharename="">
>         <mount forceunmount="yes" fstype="ext3" mountpoint="/vol2" options="rw"/>
>       </device>
>       <device id="2" name="/dev/cciss/c0d0p5" sharename="">
>         <mount forceunmount="yes" fstype="ext3" mountpoint="/vol3" options="rw"/>
>       </device>
>     </service>
>   </services>
>   <failoverdomains>
>     <failoverdomain id="0" name="oracle_db" ordered="no" restricted="yes">
>       <failoverdomainnode id="0" name="cluster1"/>
>       <failoverdomainnode id="1" name="cluster2"/>
>     </failoverdomain>
>   </failoverdomains>
> </cluconfig>
> 
> Regards,
> Lingu
> 
> 
> 
> ------------------------------
> 
> -- 
> redhat-list mailing list
> Unsubscribe
> mailto:redhat-list-request at redhat.com?subject=unsubscribe
> https://www.redhat.com/mailman/listinfo/redhat-list
> 
> End of redhat-list Digest, Vol 57, Issue 7
> ******************************************


      



