[Linux-cluster] share experience migrating cluster suite from centos 5.3 to centos 5.4

Thu Nov 5 10:32:22 UTC 2009

On Thu, Nov 5, 2009 at 10:38 AM, Gianluca Cecchi
<gianluca.cecchi at gmail.com> wrote:

> [snip]
> two other things:
> 1) I see these messages about quorum on the first node, which didn't
> appear during the previous days in the 5.3 env
> Nov  5 08:00:14 mork clurgmgrd: [2692]: <notice> Getting status
> Nov  5 08:27:08 mork qdiskd[2206]: <warning> qdiskd: read (system call) has
> hung for 40 seconds
> Nov  5 08:27:08 mork qdiskd[2206]: <warning> In 40 more seconds, we will be
> evicted
> Nov  5 09:00:15 mork clurgmgrd: [2692]: <notice> Getting status
> Nov  5 09:00:15 mork clurgmgrd: [2692]: <notice> Getting status
> Nov  5 09:48:23 mork qdiskd[2206]: <warning> qdiskd: read (system call) has
> hung for 40 seconds
> Nov  5 09:48:23 mork qdiskd[2206]: <warning> In 40 more seconds, we will be
> evicted
> Nov  5 10:00:15 mork clurgmgrd: [2692]: <notice> Getting status
> Nov  5 10:00:15 mork clurgmgrd: [2692]: <notice> Getting status
>
> Did any timings change between releases?
> The timing-related lines in my cluster.conf are the same in 5.4 as they
> were in 5.3:
>
> <cluster alias="clumm" config_version="7" name="clumm">
>         <totem token="162000"/>
>         <cman quorum_dev_poll="80000" expected_votes="3" two_node="0"/>
>         <fence_daemon clean_start="1" post_fail_delay="0"
> post_join_delay="20"/>
>
>         <quorumd device="/dev/sda" interval="5" label="clummquorum"
> log_facility="local4" log_level="7" tko="16" votes="1">
>                 <heuristic interval="2" program="ping -c1 -w1
> 192.168.122.1" score="1" tko="3000"/>
>         </quorumd>
>
> (the heuristic tko is very large because I was testing the best and safest
> way to make on-the-fly changes to the heuristic, since network maintenance
> makes the gateway disappear for periods the net-guys cannot predict...)
>
> I don't know whether this message comes from latency problems in
> my virtual env or not....
> On the host side I don't see any related message in dmesg output or in
> /var/log/messages.....
>
> 2) I saw that a new kernel was just released...... ;-(
> Any hints about possible interference with the cluster infrastructure?
>
> Gianluca
>
>
>
Item 1) is probably due to this Bugzilla entry:
https://bugzilla.redhat.com/show_bug.cgi?id=500450
whose fix was released in advisory RHSA-2009-1341
with cman-2.0.115-1.el5.x86_64.rpm.
Since I am coming from cman 2.0.98, that seems plausible.
In my case tko=16 and interval=5, so the maximum tolerated time is about
interval * tko = 80 seconds, which matches the 40 + 40 seconds I see in the
messages....
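
To double-check the arithmetic above, here is a small Python sketch that reads
the timing attributes from a trimmed copy of my cluster.conf and prints them in
seconds. The function name qdisk_timings and the trimmed CONF string are just
for illustration, and reading the log as "warn at 40 s, evict at 80 s" is my
interpretation of the messages, not something taken from the qdiskd sources.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cluster.conf lines quoted in this thread
# (only the timing-related attributes are kept).
CONF = """
<cluster alias="clumm" config_version="7" name="clumm">
  <totem token="162000"/>
  <cman quorum_dev_poll="80000" expected_votes="3" two_node="0"/>
  <quorumd device="/dev/sda" interval="5" label="clummquorum" tko="16" votes="1"/>
</cluster>
"""

def qdisk_timings(conf_xml):
    """Return the timing values from cluster.conf, all in seconds."""
    root = ET.fromstring(conf_xml)
    quorumd = root.find("quorumd")
    interval = int(quorumd.get("interval"))  # seconds between qdisk cycles
    tko = int(quorumd.get("tko"))            # missed cycles tolerated
    return {
        "qdisk_timeout_s": interval * tko,
        # quorum_dev_poll and token are expressed in milliseconds
        "quorum_dev_poll_s": int(root.find("cman").get("quorum_dev_poll")) / 1000,
        "token_s": int(root.find("totem").get("token")) / 1000,
    }

timings = qdisk_timings(CONF)
print(timings)
```

With my values the qdisk timeout comes out at 80 seconds, the same as
quorum_dev_poll, while the totem token is 162 seconds.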