[Linux-cluster] Problem in clvmd/dlm_recoverd
Tom Lanyon
tom at netspot.com.au
Wed Nov 19 04:30:33 UTC 2008
On 19/11/2008, at 2:06 AM, David Teigland wrote:
> On Tue, Nov 18, 2008 at 05:14:38PM +1030, Tom Lanyon wrote:
>> We seem to be having the same problem on a 5 node virtual cluster
>> where 3 of the nodes share a GFS mount.
>>
>> A backup script runs on one node which does some heavy reads + writes
>> to this mount at which point all three nodes jump to 100% cpu (90%
>> iowait on the machine that is doing the backup, 100% system on the
>> other two) and all LVM VGs, LVs and GFS mounts lock up.
>
> Which process was using 100% cpu? If it was groupd, fenced, dlm_controld
> or gfs_controld, then yes it may be the same problem.
>
>> Is there anything that could be tuned here to avoid this issue until a
>> bug fix is released?
>
> I don't think there's any way to avoid the bug in the bz I referenced.
>
> Dave
We haven't been able to catch it quickly enough to determine which
process is using all the CPU.
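One way to catch the culprit next time would be to leave a small sampler running that records the busiest processes every few seconds. The following is only a rough sketch, not something we have run on the cluster: the function names, the log path and the 5-second interval are all made up for illustration. It reads utime and stime from /proc/<pid>/stat and reports the processes whose CPU ticks grew the most between two samples. Kernel threads have /proc entries as well, so they would show up too.

```python
import glob
import time


def parse_stat(line):
    """Return (pid, comm, cpu_ticks) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces,
    # so split on the first "(" and the last ")".
    lpar = line.index("(")
    rpar = line.rindex(")")
    pid = int(line[:lpar])
    comm = line[lpar + 1:rpar]
    rest = line[rpar + 1:].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return pid, comm, int(rest[11]) + int(rest[12])


def snapshot():
    """Map pid -> (comm, cpu_ticks) for every process currently visible."""
    snap = {}
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as f:
                pid, comm, ticks = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited between listing and reading
        snap[pid] = (comm, ticks)
    return snap


def busiest(before, after, n=5):
    """Return up to n (delta_ticks, pid, comm) tuples, largest delta first."""
    deltas = []
    for pid, (comm, ticks) in after.items():
        if pid in before:
            deltas.append((ticks - before[pid][1], pid, comm))
    deltas.sort(reverse=True)
    return deltas[:n]


def log_busiest(logfile, interval=5.0, n=5):
    """Append the busiest processes to logfile every interval seconds."""
    prev = snapshot()
    while True:
        time.sleep(interval)
        cur = snapshot()
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        with open(logfile, "a") as out:
            for ticks, pid, comm in busiest(prev, cur, n):
                out.write(f"{stamp} pid={pid} comm={comm} ticks={ticks}\n")
        prev = cur
```

Calling something like log_busiest("/var/log/cpu-sampler.log") before the backup starts would leave a trail to read after the hang. The log should go to local disk rather than the GFS mount, since the mount is exactly what locks up.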
The other option is that we're just seeing a huge number of glocks
created on the node running backups and all the others (webservers) are
just hanging whilst trying to access files. I've just done some fairly
aggressive tuning of the GFS mounts on all nodes; hopefully this fixes
it!
Regards,
Tom