[Linux-cluster] GFS2 with IMAP Maildir server

Flavio Junior billpp at gmail.com
Fri Jul 3 19:30:44 UTC 2009


Hi folks....

I'm trying to use GFS2 in a mail server scenario with:

- CentOS 5.3 updated
- Dovecot IMAP/Maildir
- Postfix

To make the servers active/active I'm using CTDB (http://ctdb.samba.org).

Some info that could be relevant:
[root@pinky ~]# uname -a
Linux pinky 2.6.18-128.1.16.el5 #1 SMP Tue Jun 30 06:07:26 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
[root@pinky ~]# rpm -qa | grep -E 'gfs2|clust|kernel|cman|openais'
kernel-2.6.18-128.1.16.el5
gfs2-utils-0.1.53-1.el5_3.3
modcluster-0.12.1-2.el5.centos
cluster-cim-0.12.1-2.el5.centos
kernel-devel-2.6.18-128.1.10.el5
openais-0.80.3-22.el5_3.8
system-config-cluster-1.0.55-1.0
kernel-2.6.18-128.1.6.el5
kernel-2.6.18-128.1.10.el5
kernel-devel-2.6.18-128.1.16.el5
lvm2-cluster-2.02.40-7.el5
cluster-snmp-0.12.1-2.el5.centos
kernel-headers-2.6.18-128.1.16.el5
kernel-devel-2.6.18-128.1.6.el5
cman-2.0.98-1.el5_3.4
[root@pinky ~]# grep /home /etc/fstab
/dev/homeClusterVG/home_vmail   /home           gfs2    auto,noatime,quota=off,noexec,nodev,_netdev       0 0


Everything works fine for a while, but two or three times a day some
dovecot/deliver processes hang in D state (uninterruptible sleep), and the
only way to recover is to reboot the node.

I'm not a developer and don't know much about debugging. Because of other
problems in the past I learned to use "sysrq-t", and here is the output for
two of these processes:

Pastebin: http://pastebin.ca/1483264

Jul  3 15:45:20 cerebro kernel: deliver       D ffff81007e442800     0 24420  23846                     (NOTLB)
Jul  3 15:45:20 cerebro kernel:  ffff810013885e08 0000000000000082 ffff810013885d68 0000000000000092
Jul  3 15:45:20 cerebro kernel:  ffff810013885e20 0000000000000001 ffff8100141870c0 ffff81000904b0c0
Jul  3 15:45:20 cerebro kernel:  0000052a72ff2a70 000000000000034a ffff8100141872a8 000000036caf5000
Jul  3 15:45:20 cerebro kernel: Call Trace:
Jul  3 15:45:20 cerebro kernel:  [<ffffffff88562a7d>] :dlm:dlm_posix_lock+0x172/0x210
Jul  3 15:45:20 cerebro kernel:  [<ffffffff8009eba4>] autoremove_wake_function+0x0/0x2e
Jul  3 15:45:20 cerebro kernel:  [<ffffffff88591c7a>] :gfs2:gfs2_lock+0xc3/0xcf
Jul  3 15:45:20 cerebro kernel:  [<ffffffff8003a39e>] fcntl_setlk+0x11e/0x273
Jul  3 15:45:20 cerebro kernel:  [<ffffffff800b5659>] audit_syscall_entry+0x16e/0x1a1
Jul  3 15:45:20 cerebro kernel:  [<ffffffff8002ea66>] sys_fcntl+0x269/0x2dc
Jul  3 15:45:20 cerebro kernel:  [<ffffffff8005e28d>] tracesys+0xd5/0xe0


Jul  3 15:45:21 cerebro kernel: deliver       D ffff81000238f480     0 1358  32225                     (NOTLB)
Jul  3 15:45:21 cerebro kernel:  ffff8100086cfe08 0000000000000082 ffff8100086cfd68 0000000000000092
Jul  3 15:45:21 cerebro kernel:  ffff8100086cfe20 0000000000000001 ffff81000904b0c0 ffff81007ff28100
Jul  3 15:45:21 cerebro kernel:  0000052a72ff2ca2 0000000000000232 ffff81000904b2a8 000000037ed68a00
Jul  3 15:45:21 cerebro kernel: Call Trace:
Jul  3 15:45:21 cerebro kernel:  [<ffffffff88562a7d>] :dlm:dlm_posix_lock+0x172/0x210
Jul  3 15:45:21 cerebro kernel:  [<ffffffff8009eba4>] autoremove_wake_function+0x0/0x2e
Jul  3 15:45:21 cerebro kernel:  [<ffffffff88591c7a>] :gfs2:gfs2_lock+0xc3/0xcf
Jul  3 15:45:21 cerebro kernel:  [<ffffffff8003a39e>] fcntl_setlk+0x11e/0x273
Jul  3 15:45:21 cerebro kernel:  [<ffffffff800b5659>] audit_syscall_entry+0x16e/0x1a1
Jul  3 15:45:21 cerebro kernel:  [<ffffffff8002ea66>] sys_fcntl+0x269/0x2dc
Jul  3 15:45:21 cerebro kernel:  [<ffffffff8005e28d>] tracesys+0xd5/0xe0
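
Both traces above go through the same chain (sys_fcntl, fcntl_setlk,
gfs2_lock, dlm_posix_lock), which suggests each deliver process is waiting
for a POSIX (fcntl) lock handled by the DLM. A full sysrq-t dump lists every
task, so a small filter may help pull out only the D-state ones. This is a
minimal sketch, not a tested tool: it assumes Python 3 and the syslog layout
shown above, and the names TASK_RE, FRAME_RE and blocked_traces are made up
for illustration.

```python
import re

# A task header in a sysrq-t dump: "kernel: <comm> <state> <address> ...".
# [ \t]+ keeps the match on one line.
TASK_RE = re.compile(r"kernel:[ \t]+(\S+)[ \t]+([A-Z])[ \t]+[0-9a-f]{8,16}\b")

# A call-trace entry: "[<ffffffff88562a7d>] :dlm:dlm_posix_lock+0x172/0x210".
# \s+ also matches a line break, so mail-wrapped traces still parse.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+")


def blocked_traces(log_text):
    """Return [(comm, [symbol, ...])] for every task reported in D state."""
    headers = list(TASK_RE.finditer(log_text))
    results = []
    for i, header in enumerate(headers):
        # The text belonging to this task runs up to the next task header.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(log_text)
        if header.group(2) != "D":
            continue
        symbols = FRAME_RE.findall(log_text[header.end():end])
        results.append((header.group(1), symbols))
    return results


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as log:
            for comm, symbols in blocked_traces(log.read()):
                print(comm, " <- ".join(symbols))
```

It would be run against a saved copy of the dump, for example a file cut
out of /var/log/messages.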


Before rebooting the node I went into this user's directory and ran "ls" a
few times, and everything worked as expected. I was pretty sure the command
would hang, but it didn't.
Here is the "ps ax" output:
cicero   24420  0.0  0.0   8960  1220 ?        Ds   14:46   0:00 /usr/libexec/dovecot/deliver -f cicero -d cicero
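
One possible reason "ls" did not hang: listing a directory takes no POSIX
lock, while both stuck processes are waiting inside fcntl_setlk. A probe that
actually requests an fcntl lock may be a closer test. The following is only a
sketch under assumptions: Python 3 is available (a stock CentOS 5 install may
not have it), the name probe_posix_lock is hypothetical, and the target is a
scratch file on the GFS2 mount, not a live mailbox file. The lock is taken in
a child process so the calling shell does not get stuck if the request never
returns.

```python
import subprocess
import sys

# Child program: open the file read/write, take an exclusive POSIX (fcntl)
# lock on it, then release it. lockf() with LOCK_EX waits until the lock
# is granted.
CHILD = (
    "import fcntl, sys\n"
    "with open(sys.argv[1], 'r+') as f:\n"
    "    fcntl.lockf(f, fcntl.LOCK_EX)\n"
    "    fcntl.lockf(f, fcntl.LOCK_UN)\n"
)


def probe_posix_lock(path, timeout=10.0):
    """Return 'ok', 'blocked' or 'error' for an fcntl lock attempt on path."""
    child = subprocess.Popen([sys.executable, "-c", CHILD, path])
    try:
        returncode = child.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Deliberately no kill() followed by wait(): a child stuck in D state
        # ignores signals, and waiting for it would hang this process too.
        return "blocked"
    return "ok" if returncode == 0 else "error"


if __name__ == "__main__":
    if len(sys.argv) > 1:
        print(probe_posix_lock(sys.argv[1]))
```

A "blocked" result only means the lock was not granted within the timeout;
it could also be held legitimately by another process, so it is a hint, not
proof of the hang.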

I've already rebooted that node, but if there is a way to debug this case
more deeply, just let me know; I'll probably hit the same situation again
before the end of the day.
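
To capture a bit more state before the next reboot, something like the
sketch below could list every process currently in D state together with the
kernel function it is sleeping in, read from /proc/<pid>/stat and
/proc/<pid>/wchan. It is an assumption-laden sketch (Python 3, hypothetical
function names), not part of the cluster tools.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the first '(' and the last ')'.
    left = stat_text.index("(")
    right = stat_text.rindex(")")
    pid = int(stat_text[:left])
    comm = stat_text[left + 1:right]
    state = stat_text[right + 1:].split()[0]
    return pid, comm, state


def read_text(path):
    """Read a small /proc file; return '' if the process has gone away."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""


def blocked_tasks(proc_root="/proc"):
    """Return [(pid, comm, wchan)] for processes in uninterruptible sleep."""
    if not os.path.isdir(proc_root):
        return []
    tasks = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        stat_text = read_text(os.path.join(proc_root, entry, "stat"))
        if not stat_text:
            continue
        pid, comm, state = parse_stat(stat_text)
        if state == "D":
            wchan = read_text(os.path.join(proc_root, entry, "wchan"))
            tasks.append((pid, comm, wchan))
    return tasks


if __name__ == "__main__":
    for pid, comm, wchan in blocked_tasks():
        print(pid, comm, wchan or "-")
```

Run from cron every minute or so, the output would at least show when the
first process gets stuck and in which kernel function.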


Thanks in advance.

--

Flávio do Carmo Júnior aka waKKu