[linux-lvm] Snit fight between LVM, MD and NFSD.
mauelshagen at redhat.com
Tue May 11 11:10:56 UTC 2004
This looks very much like a resource contention problem between
NFSD and pvmove (as you assumed below), causing a severe slowdown.
With LVM2/device-mapper the problem is likely to be much less visible,
because of the use of temporary mirrors for data relocation and background
copies used for mirror resynchronization.
IOW: I expect LVM2/device-mapper to be smoother ITR but of course
not free of resource contention problems.
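If you do set up a testbed, something along these lines could be used to
sample which processes sit in D state and which kernel function they wait
in. It is only a sketch under assumptions: it needs a kernel that exposes
/proc/<pid>/wchan (the 2.4 kernel in the report may not), and the names
parse_stat and blocked_processes are made up for illustration.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the first '(' and the last ')'.
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left].strip())
    comm = text[left + 1:right]
    state = text[right + 1:].split()[0]
    return pid, comm, state


def blocked_processes(proc_root="/proc"):
    """List (pid, comm, wchan) for processes in uninterruptible sleep (D)."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or the file is unreadable
        if state != "D":
            continue
        try:
            with open(os.path.join(base, "wchan")) as f:
                wchan = f.read().strip() or "-"
        except OSError:
            wchan = "?"  # wchan not available on this kernel
        found.append((pid, comm, wchan))
    return found
```

Calling blocked_processes() every few seconds during a pvmove and logging
the result should show whether pvmove and the nfsd threads alternate in D
state, and on which wait channel.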
Heinz -- The LVM Guy --
On Wed, May 05, 2004 at 08:43:45PM -0500, Dr. Greg Wettstein wrote:
> Good evening, hope the day is going well for everyone.
> We just spent the last 24 hours dealing with a rather strange
> situation on one of our big file servers. I wanted to summarize what
> happened to find out if there is an issue or whether this is a "don't
> do that" type of situation.
> The server in question is a dual 1.2 GHz PIII with 1 gigabyte of RAM
> running 2.4.26 and providing NFS services to around 100 Linux clients
> (IA32/IA64). Storage is implemented with an 8x160 Gbyte MD based RAID5
> array using a 7508 3-ware controller. LVM is used to carve the MD
> device into 5 logical volumes supporting ext3 filesystems which serve
> as the NFS export sources. LVM is up to date with whatever patches
> were relevant from the 1.0.8 distribution.
> Clients are mounted with the following options:
> Last week one of the drives in the RAID5 stripe failed. In order to
> avoid a double fault situation we migrated all the physical extents
> from the RAID5 based PV to a FC based PV on the SAN. SAN access is
> provided through a Qlogic 2300 with firmware 3.02.16 using the 6.06.10
> driver from Qlogic.
> Migration to the FC based physical volume was uneventful. The faulty
> drive was replaced this week and the extents were migrated back from
> the FC based physical volume on an LV by LV basis. All of this went
> fine until the final 150 Gbyte LV was migrated.
> Early into the migration the load on the box went high (10-12). Both
> the pvmove process and the NFSD processes were persistently stuck for
> long periods of time in D state. The pvmove process would stick in
> get_active_stripe while the NFSD processes were stuck in
> I/O patterns were very similar for NFS and the pvmove process. NFS
> clients would hang for 20-30 seconds followed by a burst of I/O. On
> the FC controllers we would see a burst of I/O from the pvmove process
> followed by 20-30 seconds of no activity. Interactive performance
> on the fileserver was good.
> We unmounted almost all of the NFS clients and reduced the situation
> to a case where we had 5-7 clients doing modest I/O, mostly listing
> directories and other common interactive functions. Load remained
> high with the NFSD processes oscillating in and out of D state with
> the pvmove process.
> We then unmounted all the clients that were accessing the filesystem
> supported by the LV which was having its physical extents migrated.
> Load patterns remained the same. We then unmounted the physical
> filesystem and the load still remained high.
> As a final test we stopped NFS services. This caused the pvmove
> process to run almost continuously with only occasional D state waits.
> We confirmed this by observing almost continuous traffic on the FC
> controller. When the pvmove completed NFS services were restarted,
> all clients were remounted and the server is running with 80-90 client
> connections with modest load.
> So it would seem that the NFSD processes and the pvmove process were
> involved in some type of resource contention problem. I would write
> this off to "LVM doesn't work well for NFS exported filesystems"
> except for the fact that we had successfully transferred 250+
> gigabytes of filesystems off the box and back onto the box without
> event before this incident.
> I would be interested in any thoughts that anyone may have. We can
> set up a testbed to try and re-create the problem if there are
> additional diagnostics that would be helpful in figuring out what was
> going on.
> Best wishes for a productive end of the week.
> As always,
> Dr. G.W. Wettstein, Ph.D. Enjellic Systems Development, LLC.
> 4206 N. 19th Ave. Specializing in information infra-structure
> Fargo, ND 58102 development.
> PH: 701-281-1686
> FAX: 701-281-3949 EMAIL: greg at enjellic.com
> "There are two ways of constructing a software design. One is to make
> it so simple that there are obviously no deficiencies; the other is to
> make it so complicated that there are no obvious deficiencies. The
> first method is far more difficult."
> -- C. A. R. Hoare
> The Emperor's Old Clothes
> CACM February 1981
*** Software bugs are stupid.
Nevertheless it needs not so stupid people to solve them ***
Heinz Mauelshagen Red Hat GmbH
Consulting Development Engineer Am Sonnenhang 11
Mauelshagen at RedHat.com +49 2626 141200