[linux-lvm] HELP - Activating a VG kernel panicking the system
Christopher Smith
csmith at nighthawkrad.net
Sun Feb 7 19:29:05 UTC 2010
I was using pvmove to move data around on a large (75 TB), clustered (2-node) VG when the process hung:
[CHI (UTC+0000) root@dicombackup01 ~]# pvmove -b /dev/dm-10:0-2000
[CHI (UTC+0000) root@dicombackup01 ~]# pvmove -v -i600
Checking progress every 600 seconds
Finding all volume groups
Finding volume group "vg_store"
Finding volume group "vg00"
Finding volume group "vg_dicomstore"
/dev/dm-10: Moved: 0.1%
Executing: /sbin/modprobe dm-log-clustered
Updating volume group metadata
Error locking on node dicombackup01-int.chi.nighthawkrad.net: Command timed out
Failed to suspend lv_NRS_20090405
With no progress 8 hours later (and no I/O to the relevant devices), I decided to kill the pvmove process and reboot the host. On reboot, starting clvmd caused a kernel panic. After some experimentation, and after setting locking_type = 0 in /etc/lvm/lvm.conf to avoid having to start up all the clustering infrastructure, I discovered that activating the VG was what triggered the panic.
I assume that something in the metadata has been corrupted and is crashing the system when it is read. I have backup and "archive" copies of the metadata that were created during the pvmove process, i.e.:
[CHI (UTC+0000) root@dicombackup01 ~]# head -20 /etc/lvm/backup/vg_dicomstore
# Generated by LVM2 version 2.02.46-RHEL5 (2009-09-15): Sun Feb 7 08:18:09 2010
contents = "Text Format Volume Group"
version = 1
description = "Created *after* executing 'pvmove -b /dev/dm-10:0-2000'"
creation_host = "dicombackup01.chi.nighthawkrad.net" # Linux dicombackup01.chi.nighthawkrad.net 2.6.18-164.11.1.el5 #1 SMP Wed Jan 20 07:32:21 EST 2010 x86_64
creation_time = 1265530689 # Sun Feb 7 08:18:09 2010
vg_dicomstore {
id = "w7YvIp-bjYd-sNag-m0DD-t2fL-ShXd-dssXTY"
seqno = 2047
status = ["RESIZEABLE", "READ", "WRITE", "CLUSTERED"]
flags = []
extent_size = 2097152 # 1024 Megabytes
max_lv = 0
max_pv = 0
physical_volumes {
[CHI (UTC+0000) root@dicombackup01 ~]# head -20 /etc/lvm/archive/vg_dicomstore_00820.vg
# Generated by LVM2 version 2.02.46-RHEL5 (2009-09-15): Sun Feb 7 08:18:02 2010
contents = "Text Format Volume Group"
version = 1
description = "Created *before* executing 'pvmove -b /dev/dm-10:0-2000'"
creation_host = "dicombackup01.chi.nighthawkrad.net" # Linux dicombackup01.chi.nighthawkrad.net 2.6.18-164.11.1.el5 #1 SMP Wed Jan 20 07:32:21 EST 2010 x86_64
creation_time = 1265530682 # Sun Feb 7 08:18:02 2010
vg_dicomstore {
id = "w7YvIp-bjYd-sNag-m0DD-t2fL-ShXd-dssXTY"
seqno = 2046
status = ["RESIZEABLE", "READ", "WRITE", "CLUSTERED"]
flags = []
extent_size = 2097152 # 1024 Megabytes
max_lv = 0
max_pv = 0
physical_volumes {
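To make the difference between the two copies easier to see, here is a minimal sketch that pulls the seqno and description fields out of a text-format metadata file. The function name vg_meta_summary is hypothetical (it is not part of LVM2), and it assumes the "key = value" layout seen in the headers above.

```shell
# vg_meta_summary FILE
# Hypothetical helper (not an LVM2 tool): print the seqno and description
# fields of an LVM2 text-format metadata file on a single line.
vg_meta_summary() {
    awk '
        # Strip everything up to and including the first "=" plus blanks.
        /^[ \t]*seqno[ \t]*=/       { sub(/^[^=]*=[ \t]*/, ""); seqno = $0 }
        /^[ \t]*description[ \t]*=/ { sub(/^[^=]*=[ \t]*/, ""); desc = $0 }
        END { printf "seqno=%s description=%s\n", seqno, desc }
    ' "$1"
}
```

Run against /etc/lvm/archive/vg_dicomstore_00820.vg and /etc/lvm/backup/vg_dicomstore, this should report seqno 2046 ("before") and seqno 2047 ("after") respectively, matching the dumps above.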
Should I just try to manually overwrite the metadata on the PVs with /etc/lvm/archive/vg_dicomstore_00820.vg (the one created before running the last pvmove)? Will that make the failed pvmove "disappear" (as a pvmove --abort would)?
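For concreteness, a sketch of what such a restore might look like, assuming vgcfgrestore with -f is the appropriate tool and that a --test dry run is worth doing first. Whether this actually clears the pvmove state is exactly the open question. restore_cmd is a hypothetical wrapper that only prints the command for review; it does not touch the PVs.

```shell
# Hypothetical wrapper: build and print the restore command so it can be
# reviewed before anything is written to the PVs. Nothing is executed.
VG=vg_dicomstore
ARCHIVE=/etc/lvm/archive/vg_dicomstore_00820.vg

restore_cmd() {
    # $1 = "test" prints the dry-run form (--test); anything else prints
    # the form that would really rewrite the metadata.
    if [ "$1" = "test" ]; then
        echo "vgcfgrestore --test -f $ARCHIVE $VG"
    else
        echo "vgcfgrestore -f $ARCHIVE $VG"
    fi
}
```

Calling "restore_cmd test" prints the dry-run command line, and "restore_cmd real" prints the one that would actually write the archived metadata back.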
Any help would be greatly appreciated.
--
Christopher Smith
UNIX Team Leader
NightHawk Radiology Services