NTP problem for virtual RHEL 4 server on VmWare (Kenneth Holter)

Sat Nov 8 20:30:58 UTC 2008

Sorry for the top post; sent from a BlackBerry.

Clock skew is a known issue, and the recommendation is for 32-bit. I
run ntpdate every 30 minutes to correct the skew, disable the ntp daemon,
and DO NOT use the VMware tools... DO NOT...
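
A minimal sketch of the kind of job I mean, reusing the placeholder
server 1.2.3.4 and the log file from the original post. The function
name, DRY_RUN and the "run" argument are my own additions for
illustration; they are not ntpdate features.

```shell
#!/bin/sh
# Hypothetical helper: step the clock with ntpdate and log the result.
# NTP_SERVER and LOG_FILE default to the placeholders from the original post.
NTP_SERVER="${NTP_SERVER:-1.2.3.4}"
LOG_FILE="${LOG_FILE:-/tmp/time_adjust.log}"

sync_clock() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only show what would be executed.
        echo "ntpdate $NTP_SERVER"
    else
        # ntpdate needs root and fails if ntpd still holds the NTP port.
        ntpdate "$NTP_SERVER" >> "$LOG_FILE" 2>&1
    fi
}

# Run only when invoked as "script run", so the file can also be sourced.
if [ "${1:-}" = "run" ]; then
    sync_clock
fi
```

With the daemon disabled (on RHEL, "service ntpd stop" and "chkconfig
ntpd off"), a line such as
"*/30 * * * * root /usr/local/sbin/sync_clock.sh run" in /etc/crontab
would give the 30-minute interval. The script path is an assumption.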

On 11/8/08, Le Wen <wenle at lenovo.com> wrote:
> Hi Kenneth
>
> Try adding
> clock=pmtmr notsc
> to your kernel parameters; it works for me.
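
To confirm after a reboot that the parameters were actually picked up,
something like the following could be used. has_kernel_param is a
hypothetical helper name; the only system interface it relies on is
/proc/cmdline, which holds the command line the running kernel was
booted with. On RHEL 4 the parameters themselves would normally be
appended to the kernel line in /boot/grub/grub.conf.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the kernel command line contains the
# given parameter as a whole word. The second argument is optional and
# defaults to the command line of the running kernel.
has_kernel_param() {
    param="$1"
    cmdline="${2:-$(cat /proc/cmdline)}"
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, "has_kernel_param notsc && echo present" checks the
running kernel, and passing a string as the second argument checks an
arbitrary command line instead.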
>
> karthikeyan <karthik_arnold1 at yahoo.com> wrote on 2008-11-08 14:40
> (sent by redhat-list-bounces at redhat.com to redhat-list at redhat.com):
>
> Subject: NTP problem for virtual RHEL 4 server on VmWare (Kenneth Holter)
>
> Hi Kenneth
>
> This is a known issue with RHEL on VMware. You can find the knowledge base
> article about time running slow on the VMware website.
>
> Please find the link:
>
> http://www.djax.co.uk/kb/linux/vmware_clock_drift.html
>
>
> Regards,
> Karthik
>
>
> --- On Fri, 11/7/08, redhat-list-request at redhat.com
> <redhat-list-request at redhat.com> wrote:
>
>> From: redhat-list-request at redhat.com <redhat-list-request at redhat.com>
>> Subject: redhat-list Digest, Vol 57, Issue 7
>> To: redhat-list at redhat.com
>> Date: Friday, November 7, 2008, 10:30 PM
>>
>> Today's Topics:
>>
>>    1. RE: Cluster Heart Beat Using Cross Over Cable
>>       (Karchner, Craig (IT Solutions US))
>>    2. Help Slick Mach make the right choice! (mailanky at gmail.com)
>>    3. NTP problem for virtual RHEL 4 server on VmWare (Kenneth Holter)
>>    4. Cluster Broken pipe & node Reboot (lingu)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Thu, 6 Nov 2008 09:24:12 -0800
>> From: "Karchner, Craig (IT Solutions US)"
>>                <craig.a.karchner at siemens.com>
>> Subject: RE: Cluster Heart Beat Using Cross Over Cable
>> To: "General Red Hat Linux discussion list"
>> <redhat-list at redhat.com>
>> Message-ID:
>> <13FE6613E1ADA041A0124537010C11E903742020 at USNWK102MSX.ww017.siemens.net>
>>
>> Content-Type: text/plain;              charset="us-ascii"
>>
>>
>> Lingu,
>>
>> I had this same problem a few weeks back.
>>
>> This is how I solved it.
>>
>> Make sure your NICs are running at 1G.
>>
>> Add the following entries to your cluster.ccs file and write it to
>> disk:
>>
>> heartbeat_rate = 30
>> allowed_misses = 4
>>
>> My cluster.ccs file looks like this now;
>>
>> cluster {
>>     name = "alpha"
>>     lock_gulm {
>>         servers = ["server1", "server2", "server3"]
>>         heartbeat_rate = 30
>>         allowed_misses = 4
>>     }
>> }
>>
>> This example procedure shows how to change configuration files in a CCS
>> archive.
>>
>> 1. Extract configuration files from the CCA device into temporary
>> directory /root/alpha-new/.
>>
>> ccs_tool extract /dev/pool/alpha_cca /root/alpha-new/
>>
>> 2. Make changes to the configuration files in /root/alpha-new/.
>>
>> 3. Create a new CCS archive on the CCA device by using the -O (override)
>> flag to forcibly overwrite the existing CCS archive.
>>
>> ccs_tool -O create /root/alpha-new/ /dev/pool/alpha_cca
>>
>>
>>
>> What you are suggesting (a crossover cable) is not supported, at least in
>> GFS 6.0, which I assume you are running with RHEL 3.0.
>>
>>
>> -----Original Message-----
>> From: redhat-list-bounces at redhat.com
>> [mailto:redhat-list-bounces at redhat.com] On Behalf Of lingu
>> Sent: Thursday, November 06, 2008 7:41 AM
>> To: General Red Hat Linux discussion list
>> Subject: Cluster Heart Beat Using Cross Over Cable
>>
>> Hi,
>>
>>  I am running a two-node active/passive cluster on the RHEL3 update
>> 8 64-bit OS, on HP boxes with external HP storage connected via SCSI. My
>> cluster was running fine for the last 3 years. But all of a sudden the
>> cluster service keeps shifting (at least once a day) from one node to
>> another.
>>
>>  After analysing the syslog, I found that the service was getting
>> shifted due to some network fluctuation. Both nodes have two NICs
>> bonded together and configured with the IPs below.
>>
>> My network details:
>>
>> 192.168.1.2 -- node 1 physical IP with class C subnet (bond0)
>> 192.168.1.3 -- node 2 physical IP with class C subnet (bond0)
>> 192.168.1.4 -- floating IP (cluster)
>>
>>  Since it is a very critical and busy server, heavy network load may
>> be causing some heartbeat signals to be missed, resulting in the
>> service shifting from one node to another.
>>
>> So I planned to connect a crossover cable for heartbeat messages. Can
>> anyone guide me, or provide a link that best explains how to do this
>> and what changes I have to make in the cluster configuration file
>> after connecting the crossover cable?
>>
>> Regards,
>>
>> Lingu
>>
>> --
>> redhat-list mailing list
>> unsubscribe
>> mailto:redhat-list-request at redhat.com?subject=unsubscribe
>> https://www.redhat.com/mailman/listinfo/redhat-list
>>
>>
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Fri, 7 Nov 2008 13:38:09 +0530
>> From: <mailanky at gmail.com>
>> Subject: Help Slick Mach make the right choice!
>> To: <redhat-list at redhat.com>
>> Message-ID:
>> <B07101D09C1F45E0A8C6C191BEA2A154 at webchutney2>
>> Content-Type: text/plain;              charset="iso-8859-1"
>>
>> Hey,
>>
>> ankur has signed you up for a perfect shave!
>> Simply help Slick Mach make the right choice & you
>> could win a free
>> Gillette Mach 3 razor.
>> Click here to take the challenge.
>> <http://www.slickmach.com/index.html>
>>
>>
>> ------------------------------
>>
>> Message: 3
>> Date: Fri, 7 Nov 2008 10:49:04 +0100
>> From: "Kenneth Holter"
>> <kenneho.ndu at gmail.com>
>> Subject: NTP problem for virtual RHEL 4 server on VmWare
>> To: redhat-list at redhat.com
>> Message-ID:
>> <c25f25140811070149u2d098492rf2c36e6b07941225 at mail.gmail.com>
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> Hi.
>>
>>
>> One of our RHEL 4 servers running on VmWare has a quite
>> serious NTP problem.
>> I know that NTP can be an issue when running red hat boxes
>> on VmWare, so as
>> a fix I put this small script in a file in
>> /etc/cron.hourly:
>>
>>
>> [root@server cron.hourly]# cat ntpdate
>> #!/bin/sh
>> # Stop ntpd so ntpdate can use the NTP port, step the clock, then restart ntpd.
>> /etc/init.d/ntpd stop
>> ntpdate 1.2.3.4 >> /tmp/time_adjust.log
>> /etc/init.d/ntpd start
>>
>>
>> After investigating the "/tmp/time_adjust.log" file, I was quite surprised
>> by the amount of drift found on one particular server. Consider this extract
>> from the file:
>>
>> 6 Nov 20:00:01 ntpdate[19373]: step time server 1.2.3.4 offset -60.504153 sec
>> 6 Nov 20:00:52 ntpdate[19666]: step time server 1.2.3.4 offset -8.735440 sec
>> 6 Nov 20:01:00 ntpdate[19689]: step time server 1.2.3.4 offset -1.635632 sec
>> 6 Nov 20:54:06 ntpdate[24198]: step time server 1.2.3.4 offset -415.894712 sec
>> 6 Nov 21:01:01 ntpdate[24920]: adjust time server 1.2.3.4 offset 0.136833 sec
>> 6 Nov 22:01:02 ntpdate[29943]: adjust time server 1.2.3.4 offset -0.114253 sec
>> 6 Nov 23:01:01 ntpdate[2519]: adjust time server 1.2.3.4 offset -0.036345 sec
>> 7 Nov 00:01:00 ntpdate[7577]: step time server 1.2.3.4 offset -1.064935 sec
>> 7 Nov 01:00:57 ntpdate[12697]: step time server 1.2.3.4 offset -3.922577 sec
>> 7 Nov 02:00:21 ntpdate[17733]: step time server 1.2.3.4 offset -40.421825 sec
>> 7 Nov 02:01:00 ntpdate[17777]: step time server 1.2.3.4 offset -1.123175 sec
>> 7 Nov 02:57:23 ntpdate[22542]: step time server 1.2.3.4 offset -218.649820 sec
>> 7 Nov 03:00:36 ntpdate[22900]: step time server 1.2.3.4 offset -25.284528 sec
>> 7 Nov 03:00:58 ntpdate[22940]: step time server 1.2.3.4 offset -3.104130 sec
>> 7 Nov 03:52:32 ntpdate[27430]: step time server 1.2.3.4 offset -509.363952 sec
>> 7 Nov 03:59:50 ntpdate[27943]: step time server 1.2.3.4 offset -71.430354 sec
>> 7 Nov 04:00:52 ntpdate[28236]: step time server 1.2.3.4 offset -9.344907 sec
>> 7 Nov 04:01:00 ntpdate[28259]: step time server 1.2.3.4 offset -1.237651 sec
>> 7 Nov 05:01:01 ntpdate[1363]: adjust time server 1.2.3.4 offset 0.390149 sec
>> 7 Nov 06:01:01 ntpdate[6419]: adjust time server 1.2.3.4 offset -0.185112 sec
>> 7 Nov 07:01:02 ntpdate[11493]: adjust time server 1.2.3.4 offset -0.228884 sec
>> 7 Nov 08:00:59 ntpdate[16579]: step time server 1.2.3.4 offset -2.166519 sec
>> 7 Nov 09:00:38 ntpdate[21522]: step time server 1.2.3.4 offset -23.169420 sec
>> 7 Nov 09:01:02 ntpdate[21558]: adjust time server 1.2.3.4 offset -0.492106 sec
>> 7 Nov 09:59:26 ntpdate[26329]: step time server 1.2.3.4 offset -95.154264 sec
>> 7 Nov 10:00:55 ntpdate[26639]: step time server 1.2.3.4 offset -5.997955 sec
>> 7 Nov 10:01:01 ntpdate[26658]: step time server 1.2.3.4 offset -0.506367 sec
>>
>>
>> Does anyone know what may be causing the RHEL box to drift
>> as much as 500
>> seconds in only one hour?
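
For picking the worst entries out of a log like this, a rough sketch
that prints the line with the largest absolute offset. largest_offset is
a hypothetical helper name, and it assumes the log file holds one
ntpdate entry per line with the word "offset" directly before the value.

```shell
#!/bin/sh
# Hypothetical helper: print the log line with the largest absolute offset.
# Reads ntpdate log lines (one entry per line) on standard input.
largest_offset() {
    awk '
        {
            for (i = 1; i < NF; i++) {
                if ($i == "offset") {
                    v = $(i + 1) + 0        # force a numeric value
                    if (v < 0) v = -v       # absolute value
                    if (v > max) { max = v; line = $0 }
                }
            }
        }
        END { if (line != "") print line }
    '
}
```

Usage would be along the lines of
"largest_offset < /tmp/time_adjust.log".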
>>
>> Regards,
>> Kenneth Holter
>>
>>
>> ------------------------------
>>
>> Message: 4
>> Date: Fri, 7 Nov 2008 16:15:08 +0530
>> From: lingu <hicheerup at gmail.com>
>> Subject: Cluster Broken pipe & node Reboot
>> To: "General Red Hat Linux discussion list"
>> <redhat-list at redhat.com>
>> Message-ID:
>> <29e045b80811070245t1c303530xbf58626227638260 at mail.gmail.com>
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> Hi  all,
>>
>>   I am running a two-node RHEL3U8 cluster, with the cluster version below,
>> on HP servers connected via SCSI channel to HP storage (SAN) for an Oracle
>> database server.
>>
>> Kernel & Cluster Version
>>
>> Kernel-2.4.21-47.EL #1 SMP
>> redhat-config-cluster-1.0.7-1-noarch
>> clumanager-1.2.26.1-1-x86_64
>>
>>
>>  Suddenly my active node rebooted. After analysing the logs, I found it
>> is throwing the errors below in syslog. I want to know what might cause
>> this type of error. The sar output also indicates there was no load on
>> the server at the time the system rebooted, nor at the times I am
>> getting the I/O Hang error.
>>
>> Nov  3 14:23:00 cluster1 clulockd[1996]: <warning> Denied 20.1.2.162: Broken pipe
>> Nov  3 14:23:00 cluster1 clulockd[1996]: <err> select error: Broken pipe
>> Nov  3 14:23:06 cluster1 clulockd[1996]: <warning> Denied 20.1.2.162: Broken pipe
>> Nov  3 14:23:06 cluster1 clulockd[1996]: <err> select error: Broken pipe
>> Nov  3 14:23:13 cluster1 cluquorumd[1921]: <warning> Disk-TB: Detected I/O Hang!
>> Nov  3 14:23:15 cluster1 clulockd[1996]: <warning> Denied 20.1.2.161: Broken pipe
>> Nov  3 14:23:15 cluster1 clulockd[1996]: <err> select error: Broken pipe
>> Nov  3 14:23:12 cluster1 clusvcmgrd[2011]: <err> Unable to obtain cluster lock: Connection timed out
>>
>> Nov  5 17:18:00 cluster1 cluquorumd[1921]: <warning> Disk-TB: Detected I/O Hang!
>> Nov  5 17:18:00 cluster1 clulockd[1996]: <warning> Denied 20.1.2.162: Broken pipe
>> Nov  5 17:18:00 cluster1 clulockd[1996]: <err> select error: Broken pipe
>> Nov  5 17:18:17 cluster1 clulockd[1996]: <warning> Denied 20.1.2.162: Broken pipe
>> Nov  5 17:18:17 cluster1 clulockd[1996]: <err> select error: Broken pipe
>> Nov  5 17:18:17 cluster1 clulockd[1996]: <warning> Potential recursive lock #0 grant to member #1, PID1962
>>
>>
>>  I need someone's help in working out how to fix this error and in
>> finding the real cause of the errors above.
>>
>> My cluster.xml file is attached.
>>
>>
>>
>> <?xml version="1.0"?>
>> <cluconfig version="3.0">
>>   <clumembd broadcast="yes" interval="1000000" loglevel="5" multicast="no" multicast_ipaddress="" thread="yes" tko_count="25"/>
>>   <cluquorumd loglevel="7" pinginterval="5" tiebreaker_ip=""/>
>>   <clurmtabd loglevel="7" pollinterval="4"/>
>>   <clusvcmgrd loglevel="7"/>
>>   <clulockd loglevel="7"/>
>>   <cluster config_viewnumber="4" key="6672bc0a71be2ec9486f6a2f5846c172" name="ORACLECLUSTER"/>
>>   <sharedstate driver="libsharedraw.so" rawprimary="/dev/raw/raw1" rawshadow="/dev/raw/raw2" type="raw"/>
>>   <members>
>>     <member id="0" name="cluster1" watchdog="yes"/>
>>     <member id="1" name="cluster2" watchdog="yes"/>
>>   </members>
>>   <services>
>>     <service checkinterval="10" failoverdomain="oracle_db" id="0" maxfalsestarts="0" maxrestarts="0" name="database" userscript="/etc/init.d/script_db.sh">
>>       <service_ipaddresses>
>>         <service_ipaddress broadcast="None" id="0" ipaddress="20.1.2.35" monitor_link="1" netmask="255.255.0.0"/>
>>       </service_ipaddresses>
>>       <device id="0" name="/dev/cciss/c0d0p1" sharename="">
>>         <mount forceunmount="yes" fstype="ext3" mountpoint="/vol1" options="rw"/>
>>       </device>
>>       <device id="1" name="/dev/cciss/c0d0p2" sharename="">
>>         <mount forceunmount="yes" fstype="ext3" mountpoint="/vol2" options="rw"/>
>>       </device>
>>       <device id="2" name="/dev/cciss/c0d0p5" sharename="">
>>         <mount forceunmount="yes" fstype="ext3" mountpoint="/vol3" options="rw"/>
>>       </device>
>>     </service>
>>   </services>
>>   <failoverdomains>
>>     <failoverdomain id="0" name="oracle_db" ordered="no" restricted="yes">
>>       <failoverdomainnode id="0" name="cluster1"/>
>>       <failoverdomainnode id="1" name="cluster2"/>
>>     </failoverdomain>
>>   </failoverdomains>
>> </cluconfig>
>>
>> Regards,
>> Lingu
>>
>>
>>
>> ------------------------------
>>
>> __
>> redhat-list mailing list
>> Unsubscribe
>> mailto:redhat-list-request at redhat.com?subject=unsubscribe
>> https://www.redhat.com/mailman/listinfo/redhat-list
>>
>> End of redhat-list Digest, Vol 57, Issue 7
>> ******************************************
>
>
>
>
> --
> redhat-list mailing list
> unsubscribe mailto:redhat-list-request at redhat.com?subject=unsubscribe
> https://www.redhat.com/mailman/listinfo/redhat-list
>

-- 
Sent from my mobile device