
Re: htree stability and performance issues

My guess is that it is partly the Linux NFS implementation, or the way
it talks to the filesystem, or the filesystem itself. I have seen this problem without
the htree patch, on plain ext3. I was considering using htree to help with the
problem, but after this post I think I will hold off.

I think I have a somewhat similar setup to Adam's, with Maildirs and all.
Traffic goes through a Linux LVS, then to one of six
FreeBSD NFS clients that run qmail, then over NFS to a Linux storage
box. POP and webmail go through the same path on the same boxes.
As near as we can figure, the stat and listing traffic from POP and webmail was
killing the storage box. I have 32 NFS daemons on the storage
box and the load would go up to about 32: one unit of load for every NFS
process, which I have read about somewhere before. All the NFS daemons
would go into the DW state. From what it looked like, NFS was waiting
on the filesystem to stat and list people's Maildirs that had lots of files.
Another strange note is that kswapd would be using 99 percent CPU during
these NFS storms; I am not sure why, since I wasn't swapping. All of this was
happening while I had plenty of disk IO left on the storage box.

Eventually we just started to nuke old mail out of the larger dirs to get
them down to a sane size and things have cleared up.
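
For anyone wanting to find the oversized dirs before pruning, here is a minimal sketch in Python. It is not the procedure used above, just one way to do it. The function names, the 10000-entry threshold, and the assumption that mail lives in standard Maildir `cur/` and `new/` subdirectories are all illustrative choices, not details from this thread.

```python
import os

def count_entries(path):
    """Count directory entries by listing only, without stat()ing each file."""
    with os.scandir(path) as it:
        return sum(1 for _ in it)

def find_large_maildirs(root, threshold=10000):
    """Return (path, entry_count) for each cur/ or new/ directory under root
    that holds more than `threshold` entries, largest first.

    The threshold default is an arbitrary illustration value.
    """
    hits = []
    for dirpath, dirnames, _files in os.walk(root):
        for name in dirnames:
            if name in ("cur", "new"):
                full = os.path.join(dirpath, name)
                n = count_entries(full)
                if n > threshold:
                    hits.append((full, n))
        # Do not let os.walk descend into the message directories themselves;
        # that would list every large directory a second time.
        dirnames[:] = [d for d in dirnames if d not in ("cur", "new", "tmp")]
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits
```

Running this directly on the storage box rather than over NFS avoids adding to the very load it is meant to diagnose.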

Now, from what it sounds like, htree will actually make things worse in
this type of situation. Is this correct?  Is there a patch somewhere,
or a filesystem out there, that is good at handling this stat-and-list
type of load?  Or is it just NetApp time? :)


Theodore Ts'o wrote:

On Thu, Dec 18, 2003 at 01:36:25PM +1100, Adam Cassar wrote:

What's your take on the nfs client load issues? It does run for 4-5
hours, albeit at higher load (now explained by your post); however, it does
eventually die with the load going stupid (180-odd). It seems that the
patch still has some nfs interoperability problems.

Was this on the nfs *client* or the nfs *server*?

I'd really, really like to see a ps listing on the machine involved;
the output of "ps alxww" and "ps auxww" would be useful.  The question
is what processes are hung in wait, and what they're waiting on....
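
A rough sketch of how such a listing could be summarized, in Python. It assumes the header layout that procps prints for `ps alxww` (columns named `STAT`, `WCHAN`, and `COMMAND`, with `COMMAND` last); other ps implementations may differ. The function name `blocked_processes` is made up for this example.

```python
from collections import Counter

def blocked_processes(ps_text):
    """Summarize `ps alxww` output: count processes in uninterruptible sleep
    (STAT starting with 'D'), grouped by (command name, WCHAN).

    Assumes a header row containing STAT, WCHAN and COMMAND, with COMMAND
    as the final column.
    """
    lines = ps_text.strip().splitlines()
    header = lines[0].split()
    stat_i = header.index("STAT")
    wchan_i = header.index("WCHAN")
    cmd_i = header.index("COMMAND")
    counts = Counter()
    for line in lines[1:]:
        # Split into as many fields as the header has; the last field keeps
        # the full command line, spaces included.
        fields = line.split(None, len(header) - 1)
        if len(fields) < len(header):
            continue  # skip malformed or truncated rows
        if fields[stat_i].startswith("D"):
            command = fields[cmd_i].split()[0]
            counts[(command, fields[wchan_i])] += 1
    return counts
```

The text to parse can be captured with `subprocess.run(["ps", "alxww"], capture_output=True, text=True).stdout`. Grouping by WCHAN shows at a glance which kernel function the hung processes are sleeping in.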

It would also be interesting to see whether the LD_PRELOAD hack which I
sent you helped alleviate the load on the server.  With the LD_PRELOAD
hack, the access pattern of stats and opens should be restored to
the original workload, so if that makes the problem go away, then
the problem was merely that NFS doesn't degrade gracefully under load.
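
This message does not say how the LD_PRELOAD hack works. As an illustration only, the Python sketch below assumes it restores the old access pattern by handing back directory entries sorted by inode number, so that the stat calls that follow touch the inode table roughly in order. The function name is hypothetical and this is not the hack itself.

```python
import os

def stat_in_inode_order(path):
    """List a directory, sort the entries by inode number, then stat each.

    Assumption: stat()ing in inode order approximates the access pattern
    a non-htree directory produced. Returns (name, inode, size) tuples.
    """
    with os.scandir(path) as it:
        # On Unix, DirEntry.inode() comes from the directory entry itself,
        # so sorting costs no extra stat() calls.
        entries = sorted(it, key=lambda e: e.inode())
    results = []
    for entry in entries:
        st = os.lstat(entry.path)  # lstat: do not follow symlinks
        results.append((entry.name, st.st_ino, st.st_size))
    return results
```

Comparing the run time of this against a plain listing-order loop on a large directory, with a cold cache, would be one way to see whether ordering matters on a given box.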

(This is not actually earth-shattering news; I've had really strange
results trying to do heavy-duty NFS over a wireless connection,
although that's more due to Linux's NFS implementation utterly failing
to deal with dropped packets.)

I believe, although I am not sure, that there are some NFS
improvements that went into 2.6 that didn't get back-ported to 2.4.
So it might be that running 2.6.0 on the clients and/or servers might
actually help.  That would be a pretty daring move, though....

Finally, can you give me a little bit more detail on exactly what is
running on the clients and server, and the rationale for why you are
apparently trying to run incoming mail processes over NFS?  (Is that
what you're doing?  If so, it sounds rather scary...)

- Ted

_______________________________________________
Ext3-users mailing list
Ext3-users redhat com
https://www.redhat.com/mailman/listinfo/ext3-users
