[Linux-cluster] GFS2 and D state HTTPD processes

Fri Apr 9 05:02:37 UTC 2010

Looks like this bug:

GFS2 - probably lost glock call back
https://bugzilla.redhat.com/show_bug.cgi?id=498976

This is fixed in the kernel included in RHEL 5.5.
Do a "yum update" to fix it.
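
To check whether any processes are still stuck after the update, something like the following might help. It is only a minimal sketch, assuming a Python 3 interpreter and the standard Linux /proc layout; the function names parse_stat and d_state_processes are made up for illustration and are not part of any cluster tool. It prints the running kernel release and lists processes currently in D (uninterruptible sleep) state.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the LAST ')'.
    left = stat_text.index("(")
    right = stat_text.rindex(")")
    pid = int(stat_text[:left].strip())
    comm = stat_text[left + 1:right]
    state = stat_text[right + 1:].split()[0]
    return pid, comm, state


def d_state_processes(proc_root="/proc"):
    """Return a sorted list of (pid, comm) for processes in D state."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited (or stat unreadable) during the scan
        if state == "D":
            found.append((pid, comm))
    return sorted(found)


if __name__ == "__main__":
    # Kernel release, to confirm the updated kernel is actually booted.
    print("kernel:", os.uname().release)
    for pid, comm in d_state_processes():
        print(pid, comm)
```

A process that shows up in this list only briefly is normal; the hang described in this thread would be httpd entries that stay in the list across repeated runs.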

Ricardo Arguello

On Tue, Mar 2, 2010 at 6:10 AM, Emilio Arjona <emilio.ah at gmail.com> wrote:
> Thanks for your response, Steve.
>
> 2010/3/2 Steven Whitehouse <swhiteho at redhat.com>:
>> Hi,
>>
>> On Fri, 2010-02-26 at 16:52 +0100, Emilio Arjona wrote:
>>> Hi,
>>>
>>> We are experiencing some problems discussed in an old thread:
>>>
>>> http://www.mail-archive.com/linux-cluster@redhat.com/msg07091.html
>>>
>>> We have 3 clustered servers under Red Hat 5.4 accessing a GFS2 resource.
>>>
>>> fstab options:
>>> /dev/vg_cluster/lv_cluster /opt/datacluster gfs2
>>> defaults,noatime,nodiratime,noquota 0 0
>>>
>>> GFS options:
>>> plock_rate_limit="0"
>>> plock_ownership=1
>>>
>>> httpd processes sometimes end up in D state, and the only solution is
>>> to hard reset the affected server.
>>>
>>> Can anyone give me some hints to diagnose the problem?
>>>
>>> Thanks :)
>>>
>> Can you give me a rough idea of what the actual workload is and how it
>> is distributed among the director(y/ies)?
>
> We had problems with PHP sessions in the past, but we fixed them by
> configuring PHP to store the sessions in the database instead of in
> the GFS filesystem. Now we're having problems with files and
> directories in the "data" folder of Moodle LMS.
>
> "lsof -p" showed an I/O operation on the same folder on 2 of the 3
> nodes. We did a hard reset of these nodes, but some hours later the CPU
> load grew again, especially on the node that wasn't rebooted. We decided
> to reboot this node (via ssh), and then the CPU load went down to normal
> values on all nodes.
>
> I don't think the system's load is high enough to produce concurrent
> access problems. It's more likely to be some misconfiguration; in
> fact, we changed some GFS2 options to non-default values to increase
> performance (http://www.linuxdynasty.org/howto-increase-gfs2-performance-in-a-cluster.html).
>
>>
>> This is often down to contention on glocks (one per inode), possibly
>> because there is a process or processes writing a file or directory
>> which is in use (either read-only or writable) by other processes.
>>
>> If you are using PHP, then you might have to strace it to find out what
>> it is really doing.
>
> Ok, we will try to strace the D processes and post the results. Hope
> we find something!!
>
>>
>> Steve.
>>
>>> --
>>>
>>> Emilio Arjona.
>>>
>>> --
>>> Linux-cluster mailing list
>>> Linux-cluster at redhat.com
>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>>
>
>
>
> --
> Emilio Arjona.
>
>