Wayne H. Badger badger at yahoo-inc.com
Thu Jan 22 22:44:46 UTC 2009

On Jan 16, 2009, at 4:36 PM, Alasdair G Kergon wrote:
> 1. You don't say what filesystem/journal configuration is involved.

We're using an ext2 filesystem.  I have also tried ext3 with journaling
and get the same result.

> 2. Is it a system where you can run test kernels or is it too 'live'  
> for that?

I can run tests.

> 3. Try all the syncing and cache flushing options before issuing the  
> lvremove.

I drop caches before running the test.  Running a sync immediately after
writing the files does not help; the lvremove still hangs.
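
For reference, a flush of this kind can be sketched as below. This is a
minimal illustration, not the script used for the test: the function name
and the `path` parameter are made up for the example, and it assumes root
privileges and the standard /proc/sys/vm/drop_caches interface.

```python
import os


def flush_and_drop_caches(path="/proc/sys/vm/drop_caches", mode=3):
    """Flush dirty data, then ask the kernel to drop clean caches.

    mode: 1 = page cache, 2 = dentries and inodes, 3 = both.
    Writing to the real /proc file requires root.
    `path` is a parameter only so the function can be pointed elsewhere
    for a dry run (an assumption of this sketch).
    """
    if mode not in (1, 2, 3):
        raise ValueError("mode must be 1, 2 or 3")
    # drop_caches only discards clean pages, so write dirty ones out first.
    os.sync()
    with open(path, "w") as f:
        f.write(f"{mode}\n")
```

Calling `flush_and_drop_caches()` as root before the writes, and `os.sync()`
alone after them, matches the order described above.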

> 4. Could it depend on the amount of io load over a short period before the
> lvremove runs, rather than something cumulative?

Smaller writes result in a quicker lvremove, one that takes time
commensurate with the amount of data written.  There seems to be a
threshold above which the slowdown occurs.  Also, waiting a long time
after the writes complete clears the problem, and the lvremove then
finishes quickly.  So it seems to be related to a burst of data of a
certain size.
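
One way to check whether that threshold tracks data still waiting to be
written back (an assumption; nothing above confirms it) would be to watch
the Dirty and Writeback counters in /proc/meminfo while the test runs.
A minimal sketch, with a hypothetical helper name:

```python
def dirty_kb(meminfo_text):
    """Return the Dirty and Writeback values (in kB) from /proc/meminfo text."""
    wanted = {"Dirty", "Writeback"}
    result = {}
    for line in meminfo_text.splitlines():
        # Each line looks like "Dirty:     120 kB".
        name, _, rest = line.partition(":")
        if name in wanted:
            result[name] = int(rest.split()[0])
    return result
```

Polling `dirty_kb(open("/proc/meminfo").read())` once a second from the end
of the writes until the lvremove returns would show whether the hang ends
when those counters drain.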

> 5. What more can you find out from ps/iostat/sysrq/lvm logging about
> what the system is doing during the lvremove?

iostat shows that there is not much I/O going on.  There are writes
going on fairly continuously during the slowdown, but the disk is
certainly not saturated.

ps shows that the lvremove is stuck in an uninterruptible sleep.
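
Uninterruptible sleep is the "D" state in the ps STAT column. A small
sketch for pulling such processes, along with the kernel function they
are waiting in, out of ps output follows. The helper name is made up for
illustration, and it assumes the procps column layout of
`ps -eo pid,stat,wchan:32,comm`.

```python
def uninterruptible(ps_output):
    """Return (pid, wchan, command) for each process in state D.

    Expects the text printed by: ps -eo pid,stat,wchan:32,comm
    """
    found = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        # The command name comes last, so it may safely contain spaces.
        fields = line.split(None, 3)
        if len(fields) == 4 and fields[1].startswith("D"):
            pid, _stat, wchan, comm = fields
            found.append((int(pid), wchan, comm))
    return found
```

The input can come from
`subprocess.run(["ps", "-eo", "pid,stat,wchan:32,comm"], capture_output=True, text=True).stdout`,
sampled a few times while the lvremove is stuck.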

> 6. What devices/drivers are involved in the stack?  (We had a
> perhaps-related issue show up with loop.)

We're using an HP P400 Smart Array controller with RAID5.  There are
four 750GB SATA disks attached to the controller.


