[linux-lvm] pvmove hangs

Allen, Jack Jack.Allen at mckesson.com
Tue Aug 17 19:26:09 UTC 2010


Hello:

        I posted this to the dm-devel list yesterday afternoon, but so
far I have not received any responses, so I thought I would ask the same
questions here, since the command that hangs is pvmove.

        A customer tried to do a pvmove and it hung, so we set up a test
system to try to duplicate the problem, and we were able to.

        A little history, and why I am asking the question on this list.
The customer needed to move from an existing SAN to a new SAN and wanted
as little downtime as possible for the Application. So they zoned the
new SAN for access by the system, added the new LUNs to the existing
Volume Group, and then ran the pvmove commands. It worked with no
problem on one of the PVs, but on the second one all I/O from the
Application hung, as did any command that accesses the LVM information,
such as vgdisplay.
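
        For readers who want the sequence spelled out, here is a rough
sketch of the steps described above. It only builds and prints the
commands; it does not run them. The Volume Group name and device paths
are placeholders, not the customer's real names, and the final vgreduce
step is an assumed follow-up that is not described in this post.

```python
def migration_commands(vg, moves):
    """Return the LVM commands for moving data off old PVs onto new ones.

    vg    -- volume group name (placeholder)
    moves -- list of (old_pv, new_pv) device-path pairs (placeholders)
    """
    commands = []
    # Initialise each new LUN as a PV and add it to the existing VG.
    for _old_pv, new_pv in moves:
        commands.append(["pvcreate", new_pv])
        commands.append(["vgextend", vg, new_pv])
    # Move the extents one PV at a time, as in the scenario above.
    for old_pv, new_pv in moves:
        commands.append(["pvmove", old_pv, new_pv])
    # Assumed follow-up, not described in the post: drop the emptied PVs.
    for old_pv, _new_pv in moves:
        commands.append(["vgreduce", vg, old_pv])
    return commands


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for cmd in migration_commands(
        "vg_app",
        [("/dev/mapper/old_lun1", "/dev/mapper/new_lun1"),
         ("/dev/mapper/old_lun2", "/dev/mapper/new_lun2")],
    ):
        print(" ".join(cmd))
```

Printing the commands first makes it easy to review the order, one
pvmove per PV, before anything touches the Volume Group.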

        On our test system we have only one SAN (EMC CX700). We put a
number of LUNs in a Volume Group and allocated Logical Volumes for the
Application. We then added some more LUNs to the Volume Group to simulate
a second SAN, and started the Application with a test program to generate
I/O. pvmove ran with no problems on one PV, but on the second PV it hung
just like on the customer's system.

        The reason I am posting to this list is that the same type of
move was done earlier on the test system running PowerPath and did not
have any problems. The OS is Red Hat EL 5.5, 32-bit. The same version of
LVM was used in both tests. I can provide other details if needed.

        Below is part of the messages file from when this happened.

Aug 13 14:18:14 mss121 multipathd: dm-25: add map (uevent)
Aug 13 14:18:14 mss121 multipathd: dm-19: add map (uevent)
Aug 13 14:18:14 mss121 multipathd: dm-22: add map (uevent)
Aug 13 14:19:53 mss121 multipathd: dm-25: add map (uevent)
Aug 13 14:19:53 mss121 multipathd: dm-19: add map (uevent)
Aug 13 14:19:53 mss121 multipathd: dm-22: add map (uevent)
Aug 13 14:21:26 mss121 multipathd: dm-25: add map (uevent)
Aug 13 14:21:26 mss121 multipathd: dm-19: add map (uevent)
Aug 13 14:21:26 mss121 multipathd: dm-22: add map (uevent)
Aug 13 14:21:26 mss121 multipathd: dm-25: remove map (uevent)
Aug 13 14:22:47 mss121 multipathd: dm-25: add map (uevent)
Aug 13 14:22:47 mss121 multipathd: dm-17: add map (uevent)
Aug 13 14:22:47 mss121 multipathd: dm-20: add map (uevent)
Aug 13 14:22:47 mss121 multipathd: dm-23: add map (uevent)

Aug 13 14:27:22 mss121 kernel: INFO: task mpdsk:22158 blocked for more than 120 seconds.
Aug 13 14:27:22 mss121 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 13 14:27:22 mss121 kernel: mpdsk         D 00000BD7  1784 22158 22151         22159 22157 (NOTLB)
Aug 13 14:27:22 mss121 kernel:        f3025e04 00000082 bde60e11 00000bd7 f3025e50 c045d1a9 f3025e50 0000000a
Aug 13 14:27:22 mss121 kernel:        f7c60000 bde67e2a 00000bd7 00007019 00000000 f7c6010c c8612700 f6ec4e40
Aug 13 14:27:22 mss121 kernel:        00000000 00000000 00000000 c12b8dc0 018dc6f2 c042cbd1 f6cb3f0c ffffffff
Aug 13 14:27:22 mss121 kernel: Call Trace:
Aug 13 14:27:22 mss121 kernel:  [<c045d1a9>] __pagevec_release+0x15/0x1d
Aug 13 14:27:22 mss121 kernel:  [<c042cbd1>] getnstimeofday+0x30/0xb6
Aug 13 14:27:22 mss121 kernel:  [<c061c156>] io_schedule+0x36/0x59
Aug 13 14:27:22 mss121 kernel:  [<c04569c0>] sync_page+0x38/0x3b
Aug 13 14:27:22 mss121 kernel:  [<c061c32d>] __wait_on_bit+0x33/0x58
Aug 13 14:27:22 mss121 kernel:  [<c0456988>] sync_page+0x0/0x3b
Aug 13 14:27:22 mss121 kernel:  [<c0456a48>] wait_on_page_bit+0x5b/0x62
Aug 13 14:27:22 mss121 kernel:  [<c043642c>] wake_bit_function+0x0/0x3c
Aug 13 14:27:22 mss121 kernel:  [<c04573cf>] wait_on_page_writeback_range+0x4d/0xf1
Aug 13 14:27:22 mss121 kernel:  [<c04934a0>] generic_osync_inode+0x93/0xbf
Aug 13 14:27:22 mss121 kernel:  [<c0457618>] sync_page_range_nolock+0x68/0x93
Aug 13 14:27:22 mss121 kernel:  [<c0458930>] generic_file_aio_write_nolock+0x71/0x83
Aug 13 14:27:22 mss121 kernel:  [<c047b301>] blkdev_file_write+0x0/0x1e
Aug 13 14:27:22 mss121 kernel:  [<c0458c8d>] generic_file_write_nolock+0x86/0x9a
Aug 13 14:27:22 mss121 kernel:  [<c04566fe>] find_get_pages_tag+0x30/0x75
Aug 13 14:27:22 mss121 kernel:  [<c0457428>] wait_on_page_writeback_range+0xa6/0xf1
Aug 13 14:27:22 mss121 kernel:  [<c04363ff>] autoremove_wake_function+0x0/0x2d
Aug 13 14:27:22 mss121 kernel:  [<c061c408>] mutex_lock+0xb/0x19
Aug 13 14:27:22 mss121 kernel:  [<c0449c52>] audit_syscall_entry+0x15a/0x18c
Aug 13 14:27:22 mss121 kernel:  [<c047b31b>] blkdev_file_write+0x1a/0x1e
Aug 13 14:27:22 mss121 kernel:  [<c0474d53>] vfs_write+0xa1/0x143
Aug 13 14:27:22 mss121 kernel:  [<c0475345>] sys_write+0x3c/0x63
Aug 13 14:27:22 mss121 kernel:  [<c0404f17>] syscall_call+0x7/0xb
Aug 13 14:27:22 mss121 kernel:  =======================

Aug 13 14:27:22 mss121 kernel: INFO: task mpdsk:22161 blocked for more than 120 seconds.
Aug 13 14:27:22 mss121 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 13 14:27:22 mss121 kernel: mpdsk         D 00000BD7  1884 22161 22151         22162 22160 (NOTLB)
Aug 13 14:27:22 mss121 kernel:        f34e2e04 00000082 baebd585 00000bd7 f34e2e50 c045d1a9 f34e2e50 0000000a
Aug 13 14:27:22 mss121 kernel:        f6eb1550 baec5e00 00000bd7 0000887b 00000000 f6eb165c c8612700 f723f040
Aug 13 14:27:22 mss121 kernel:        00000000 00000000 00000000 c12e1f80 018dc68e c042cbd1 f6cb3bdc ffffffff
Aug 13 14:27:22 mss121 kernel: Call Trace:
Aug 13 14:27:22 mss121 kernel:  [<c045d1a9>] __pagevec_release+0x15/0x1d
Aug 13 14:27:22 mss121 kernel:  [<c042cbd1>] getnstimeofday+0x30/0xb6
Aug 13 14:27:22 mss121 kernel:  [<c061c156>] io_schedule+0x36/0x59
Aug 13 14:27:22 mss121 kernel:  [<c04569c0>] sync_page+0x38/0x3b
Aug 13 14:27:22 mss121 kernel:  [<c061c32d>] __wait_on_bit+0x33/0x58
Aug 13 14:27:22 mss121 kernel:  [<c0456988>] sync_page+0x0/0x3b
Aug 13 14:27:22 mss121 kernel:  [<c0456a48>] wait_on_page_bit+0x5b/0x62
Aug 13 14:27:22 mss121 kernel:  [<c043642c>] wake_bit_function+0x0/0x3c
Aug 13 14:27:22 mss121 kernel:  [<c04573cf>] wait_on_page_writeback_range+0x4d/0xf1
Aug 13 14:27:22 mss121 kernel:  [<c04934a0>] generic_osync_inode+0x93/0xbf
Aug 13 14:27:22 mss121 kernel:  [<c0457618>] sync_page_range_nolock+0x68/0x93
Aug 13 14:27:22 mss121 kernel:  [<c0458930>] generic_file_aio_write_nolock+0x71/0x83
Aug 13 14:27:22 mss121 kernel:  [<c047b301>] blkdev_file_write+0x0/0x1e
Aug 13 14:27:22 mss121 kernel:  [<c0458c8d>] generic_file_write_nolock+0x86/0x9a
Aug 13 14:27:22 mss121 kernel:  [<c04566fe>] find_get_pages_tag+0x30/0x75
Aug 13 14:27:22 mss121 kernel:  [<c0457428>] wait_on_page_writeback_range+0xa6/0xf1
Aug 13 14:27:22 mss121 kernel:  [<c04363ff>] autoremove_wake_function+0x0/0x2d
Aug 13 14:27:22 mss121 kernel:  [<c061c408>] mutex_lock+0xb/0x19
Aug 13 14:27:22 mss121 kernel:  [<c0449c52>] audit_syscall_entry+0x15a/0x18c
Aug 13 14:27:22 mss121 kernel:  [<c047b31b>] blkdev_file_write+0x1a/0x1e
Aug 13 14:27:22 mss121 kernel:  [<c0474d53>] vfs_write+0xa1/0x143
Aug 13 14:27:22 mss121 kernel:  [<c0475345>] sys_write+0x3c/0x63
Aug 13 14:27:22 mss121 kernel:  [<c0404f17>] syscall_call+0x7/0xb
Aug 13 14:27:22 mss121 kernel:  =======================

Aug 13 14:29:22 mss121 kernel: INFO: task mpdsk:22158 blocked for more than 120 seconds.
Aug 13 14:29:22 mss121 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 13 14:29:22 mss121 kernel: mpdsk         D 00000BD7  1784 22158 22151         22159 22157 (NOTLB)
Aug 13 14:29:22 mss121 kernel:        f3025e04 00000082 bde60e11 00000bd7 f3025e50 c045d1a9 f3025e50 0000000a
Aug 13 14:29:22 mss121 kernel:        f7c60000 bde67e2a 00000bd7 00007019 00000000 f7c6010c c8612700 f6ec4e40
Aug 13 14:29:22 mss121 kernel:        00000000 00000000 00000000 c12b8dc0 018dc6f2 c042cbd1 f6cb3f0c ffffffff
Aug 13 14:29:22 mss121 kernel: Call Trace:
Aug 13 14:29:22 mss121 kernel:  [<c045d1a9>] __pagevec_release+0x15/0x1d
Aug 13 14:29:22 mss121 kernel:  [<c042cbd1>] getnstimeofday+0x30/0xb6
Aug 13 14:29:22 mss121 kernel:  [<c061c156>] io_schedule+0x36/0x59
Aug 13 14:29:22 mss121 kernel:  [<c04569c0>] sync_page+0x38/0x3b
Aug 13 14:29:22 mss121 kernel:  [<c061c32d>] __wait_on_bit+0x33/0x58
Aug 13 14:29:22 mss121 kernel:  [<c0456988>] sync_page+0x0/0x3b
Aug 13 14:29:22 mss121 kernel:  [<c0456a48>] wait_on_page_bit+0x5b/0x62
Aug 13 14:29:22 mss121 kernel:  [<c043642c>] wake_bit_function+0x0/0x3c
Aug 13 14:29:22 mss121 kernel:  [<c04573cf>] wait_on_page_writeback_range+0x4d/0xf1
Aug 13 14:29:22 mss121 kernel:  [<c04934a0>] generic_osync_inode+0x93/0xbf
Aug 13 14:29:22 mss121 kernel:  [<c0457618>] sync_page_range_nolock+0x68/0x93
Aug 13 14:29:22 mss121 kernel:  [<c0458930>] generic_file_aio_write_nolock+0x71/0x83
Aug 13 14:29:22 mss121 kernel:  [<c047b301>] blkdev_file_write+0x0/0x1e
Aug 13 14:29:22 mss121 kernel:  [<c0458c8d>] generic_file_write_nolock+0x86/0x9a
Aug 13 14:29:22 mss121 kernel:  [<c04566fe>] find_get_pages_tag+0x30/0x75
Aug 13 14:29:22 mss121 kernel:  [<c0457428>] wait_on_page_writeback_range+0xa6/0xf1
Aug 13 14:29:22 mss121 kernel:  [<c04363ff>] autoremove_wake_function+0x0/0x2d
Aug 13 14:29:22 mss121 kernel:  [<c061c408>] mutex_lock+0xb/0x19
Aug 13 14:29:22 mss121 kernel:  [<c0449c52>] audit_syscall_entry+0x15a/0x18c
Aug 13 14:29:22 mss121 kernel:  [<c047b31b>] blkdev_file_write+0x1a/0x1e
Aug 13 14:29:22 mss121 kernel:  [<c0474d53>] vfs_write+0xa1/0x143
Aug 13 14:29:22 mss121 kernel:  [<c0475345>] sys_write+0x3c/0x63
Aug 13 14:29:22 mss121 kernel:  [<c0404f17>] syscall_call+0x7/0xb
Aug 13 14:29:22 mss121 kernel:  =======================

Aug 13 14:29:22 mss121 kernel: INFO: task mpdsk:22161 blocked for more than 120 seconds.

        The mpdsk processes above are part of the Application, which is a
MUMPS database (not an RDB) that writes data blocks directly to raw
Logical Volumes (no file system involved). It would have been doing
writes during both pvmoves. I know pvmove is part of LVM2, but because it
worked with PowerPath and not with Multipath, with everything else being
the same, I am asking the questions here.
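
        In case it helps anyone looking at a longer log, here is a
small sketch for tallying the hung-task reports by process. It is only
an illustration: the regular expression is my assumption about the
message layout, based on the "INFO: task NAME:PID blocked for more"
lines above. It matches only that leading part of the message, so it
does not matter if a mail client has wrapped the rest of the line.

```python
import re
from collections import Counter

# Matches the start of a hung-task report, e.g.
# "... kernel: INFO: task mpdsk:22158 blocked for more"
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) blocked for more"
)


def count_hung_tasks(lines):
    """Count hung-task reports per (process name, pid)."""
    counts = Counter()
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            counts[(match.group("name"), int(match.group("pid")))] += 1
    return counts


if __name__ == "__main__":
    # The four report headers from the excerpt above.
    sample = [
        "Aug 13 14:27:22 mss121 kernel: INFO: task mpdsk:22158 blocked for more",
        "Aug 13 14:27:22 mss121 kernel: INFO: task mpdsk:22161 blocked for more",
        "Aug 13 14:29:22 mss121 kernel: INFO: task mpdsk:22158 blocked for more",
        "Aug 13 14:29:22 mss121 kernel: INFO: task mpdsk:22161 blocked for more",
    ]
    for (name, pid), n in sorted(count_hung_tasks(sample).items()):
        print(f"{name} pid {pid}: {n} report(s)")
```

To run it against a real log, pass an open file object instead of the
sample list (on Red Hat the messages file is normally /var/log/messages).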

_____

Jack Allen
