Strange delay in syscall close() on a large ext3-filesystem

Thu Dec 15 14:09:12 UTC 2005

Dear mailing list,

This morning I had a very strange problem with my SuSE Groupware Server
SLOX 4.1.

Webmail access was very slow, and even with Thunderbird access to my
mailboxes did not work properly.
I rebooted the whole system, but the problem was still there.

At first I thought this might be a problem with cyrusd, but then I realized
that the error was somewhere deeper in the system, maybe in the filesystem.

cyrusd uses the directory /var/spool/imap/user to store the emails in
separate files (one per mail).

When I tried to display a mail (../user/<MYNAME>/723. ) with the "cat"
command, I saw a delay at the end of the output.
When I ran it under "strace" to see which system calls are used, I saw
that the very last system call was delayed:

# /root/strace cat 212966.

[... the mail text is displayed at normal speed ...]
[...]
) = 3021
read(3, "", 4096)                       = 0
close(3

AND NOW there is a delay. After a second or so the close finishes and the
screen looks like this:

) = 3021
read(3, "", 4096)                       = 0
close(3)                                = 0
_exit(0)                                = ?

Only now are the rest of the second-to-last line and the last line shown.
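
To put a number on the delay, strace can print the time spent in each call
when it is run with -T. The following is only a sketch, assuming the
installed strace supports -T and -e trace=...; the helper name slow_closes
and the 0.1 second threshold are made up for illustration.

```shell
# Hypothetical helper: read "strace -T" output on stdin and print the
# close() calls that took longer than the given number of seconds.
slow_closes() {
    awk -v limit="${1:-0.1}" '
        /^close\(/ {
            t = $NF                # last field looks like <1.002345>
            gsub(/[<>]/, "", t)    # strip the angle brackets
            if (t + 0 > limit + 0) print
        }'
}

# Usage (the trace goes to the pipe, the mail text to /dev/null):
#   strace -T -e trace=close cat 212966. 2>&1 >/dev/null | slow_closes 0.1
```

Filtering on the elapsed time makes it easy to run the same check over many
mail files and see whether every close() is slow or only some.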

What really puzzles me is that there is only a delay: there are no errors
in any log file (except in webmail.log, "broken pipe").

The system has the following partitions:

slox:/var/spool/imap/alarm-virus # mount
/dev/sda6 on / type ext3 (rw)
proc on /proc type proc (rw)
devpts on /dev/pts type devpts (rw,mode=0620,gid=5)
/dev/sdb1 on /var type ext3 (rw,noatime,data=writeback)
shmfs on /dev/shm type shm (rw)

slox:/var/spool/imap/alarm-virus # df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda6             30811104  16479040  12766936  57% /
/dev/sdb1            282774360 122292576 146117668  46% /var
shmfs                  1940876         0   1940876   0% /dev/shm

I took a look with tune2fs, but the output seems to be okay, right?
--- snip ---
slox:~ # tune2fs -l /dev/sdb1
tune2fs 1.28 (31-Aug-2002)
Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          1f328532-cb23-4c38-bf54-41725fbf89cf
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal filetype needs_recovery sparse_super
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              35913728
Block count:              71820582
Reserved block count:     3591029
Free blocks:              40854579
Free inodes:              35259009
First block:              0
Block size:               4096
Fragment size:            4096
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         16384
Inode blocks per group:   512
Last mount time:          Thu Dec  8 19:39:48 2005
Last write time:          Thu Dec  8 19:39:48 2005
Mount count:              1
Maximum mount count:      38
Last checked:             Thu Dec  8 18:34:47 2005
Check interval:           15552000 (6 months)
Next check after:         Tue Jun  6 19:34:47 2006
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Journal UUID:             <none>
Journal inode:            8
Journal device:           0x0000
First orphan inode:       18677862

--- snip ---
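
As a cross-check of the listing above, the number of inodes in use is the
inode count minus the free inodes: 35913728 - 35259009 = 654719, which is
under 2% of the total. A small sketch that does the same subtraction on
tune2fs output; the helper name used_inodes is made up for illustration.

```shell
# Hypothetical helper: read "tune2fs -l" output on stdin and print the
# number of inodes in use (Inode count minus Free inodes).
used_inodes() {
    awk -F: '
        /^Inode count:/ { total = $2 }
        /^Free inodes:/ { free = $2 }
        END             { print total - free }'
}

# Usage:
#   tune2fs -l /dev/sdb1 | used_inodes
```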

That particular filesystem holds a lot of mostly small files:

--- snip ---
slox:/var # find . -type f|wc
 576393  669590 18992660
--- snip ---
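
The total alone does not show how the files are spread over the
directories. The following sketch lists the directories that hold the most
files; the helper name biggest_dirs is made up for illustration, and it
assumes that no file name contains a newline.

```shell
# Hypothetical helper: print the directories under $1 (default: .) that
# hold the most regular files, largest first; $2 limits the number of lines.
biggest_dirs() {
    find "${1:-.}" -type f |
        sed 's|/[^/]*$||' |     # drop the file name, keep its directory
        sort | uniq -c |        # count files per directory
        sort -rn |
        head -n "${2:-10}"
}

# Usage:
#   biggest_dirs /var/spool/imap 20
```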

As a workaround, I did the following to solve the problem for now:

I moved the "user" directory to /var and made a symlink to it in
/var/spool/imap. This worked. Now (in my opinion) I am using another "way"
(inode) to access the files, and the speed is back to normal.
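
The workaround can be sketched as follows. The helper name
relocate_with_symlink and the target path /var/user are assumptions for
illustration, and stopping the mail server before the move is a precaution
assumed here, not something taken from the steps above.

```shell
# Hypothetical helper: move directory $1 to $2 and leave a symlink
# at the old location that points to the new one.
relocate_with_symlink() {
    src=$1
    dst=$2
    [ -d "$src" ] && [ ! -e "$dst" ] || return 1   # refuse to overwrite
    mv "$src" "$dst" && ln -s "$dst" "$src"
}

# Usage (paths assumed; stop the mail server first):
#   relocate_with_symlink /var/spool/imap/user /var/user
```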

Is it possible that the inode chain below /var/spool is damaged in some way?
It seems to me that now, after the move, a different inode chain must be
used when accessing the mail files.

I ran an fsck a few days later during a reboot, but it found nothing
remarkable at all.

It may be important that /dev/sdb1 is a RAID5 on a Dell server 2850,
dual Xeon 3 GHz.

Any help would be appreciated.

With kind regards,
Volker Dose

 --

eCONNEX AG offers effective IT solutions.
For more information, visit www.econnex.de.
____________________________________________

eCONNEX AG
Volker Dose (Network Administration)

Dänische Straße 15 - 24103 Kiel
Tel 0431 59369 0 - Fax 0431 59369 19

Valentinskamp 24 - 20354 Hamburg
Tel 040 31112 903 - Fax 040 31112 200
_____________________________________________________