[Linux-cluster] Fenced node never reboots properly
Jeroen van den Horn
J.vandenHorn at xb.nl
Tue Apr 3 08:04:57 UTC 2007
Lon,
The only change I made was to set the line
my $powerop_mode = VM_POWEROP_MODE_HARD;
instead of 'soft'. The machine is now power-cycled correctly and comes
back online.
The one issue I am still struggling with is that the heartbeat
sometimes stops. To answer an earlier question: vmware-tools are
installed and running.
After further investigation I have concluded that the ESX layer
cannot scale the vCPU back up in time. The test machines are mostly idle,
so VMware (correctly) reduces the number of CPU cycles allocated to them.
When a process suddenly needs more CPU, the machine 'chokes': the
'virtual' load spikes and the heartbeat simply does not run. For example,
we see fencing occur when the nightly cron jobs execute. I tried renicing
the cman_hbeat process, but that did not help either. I'm aware that
setting a larger post-fail delay may solve the problem, but I'd prefer to
keep this value at 0 so that a real failure is handled immediately.
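To make the timing argument concrete, here is a minimal Perl sketch of how a peer sees the heartbeat: a node is declared dead as soon as the gap since its last heartbeat exceeds a timeout. This is a hypothetical model, not code from cman or fence_vmware; the helper name first_declared_dead, the 1-second heartbeat cadence, the 30-second stall and the 20-second timeout are all illustrative assumptions rather than cman's real parameters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of cman): given a timeout and a sorted list
# of heartbeat timestamps in seconds, return the time at which a peer would
# declare the node dead, i.e. last beat before the first gap longer than
# $timeout, plus $timeout. Only gaps between beats are considered.
# Returns undef if no gap exceeds the timeout.
sub first_declared_dead {
    my ($timeout, @beats) = @_;
    for my $i (1 .. $#beats) {
        my $gap = $beats[$i] - $beats[$i - 1];
        return $beats[$i - 1] + $timeout if $gap > $timeout;
    }
    return;
}

# Idle node: one heartbeat per second for a minute.
my @idle = (0 .. 60);

# Same node while a cron job starves its vCPU: beats stop after t=10
# and only resume at t=40 (a 30-second stall).
my @starved = ((0 .. 10), (40 .. 60));

my $timeout = 20;    # illustrative value only

for my $run ([idle => \@idle], [starved => \@starved]) {
    my ($name, $beats) = @$run;
    my $dead = first_declared_dead($timeout, @$beats);
    printf "%-8s %s\n", $name,
        defined $dead ? "declared dead at t=$dead s" : "stays in the cluster";
}
```

With these numbers the idle node stays in the cluster, while the starved node is declared dead at t=30 s even though it resumes heartbeating at t=40 s. Any stall shorter than the timeout would go unnoticed in this model.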
Regards,
Jeroen
Lon Hohberger wrote:
> On Mon, Apr 02, 2007 at 09:35:16AM +0200, Jeroen van den Horn wrote:
>
>> It's ESX - it runs on HP blades (AMD). Guest OSes are all 32-bit.
>>
>>> In response to Lon's suggestion I modified the fence_vmware code and
>>> set the type of reset to HARD - cluster node now resets properly.
>>> Remaining issue is that under VMWare we are still experiencing
>>> performance issues. It's as if a node in the cluster starts 'lagging
>>> behind' (also the system clock starts drifting) and that after some
>>> time one of the nodes declares the other dead.
>>>
>
> Do you have a patch or the new fence agent? I'd like the change so I
> can commit it to CVS. Soft-reboot is *definitely* wrong; other problems
> aside.
>
> -- Lon
>
>