[linux-lvm] Full DRBD device on LVM is now unusable
Seb A
4jngrhkk3c at snkmail.com
Tue Jan 28 22:54:11 UTC 2014
I'm not sure if this is more of an LVM issue or a DRBD issue, but maybe someone here can help me...
My DRBD device on LVM filled up with data and is now unusable after a power cycle. The DRBD device that is not on LVM is fine (but it did not fill up).
I configured two DRBD nodes running Openfiler with corosync and pacemaker over two years ago, following the instructions here [http://www.howtoforge.com/openfiler-2.99-active-passive-with-corosync-pacemaker-and-drbd]. At one point it swapped over to what was originally the secondary node, "Openfiler2"; I left it like that and all was fine (AFAIK). (I did have a few issues in the early days with it losing sync on reboot / power failure, but that's ancient history.)
Eventually the DRBD data partition filled up, as there are processes that ftp files onto it. Lots of proftpd processes were stuck trying to do a CWD into the data partition, so the CPU load went really high. I tried to start a process to delete old files and it got stuck too: it wasn't doing anything, and I couldn't cancel or kill it. kill -9 <pid> did not work on that process or on any of the stuck proftpd processes. So I could not unmount the drive, and when I tried fuser it just killed my ssh session and failed to kill the proftpd processes.
I restarted sshd via the console and, as there had been some kernel panics, I decided to reboot, hardly expecting it to succeed. It didn't: it got stuck and I had to kill the virtual power. When it came back up it could not mount the DRBD data partition (the one that uses LVM). Both DRBD partitions were synchronized before and after the reboot: they reconnected and the primary stayed on 'Openfiler2'.
The first errors after the reboot were in here:
daemon.info<30>: Jan 28 15:26:34 Openfiler2 LVM[3228]: INFO: Activating volume group vg0drbd
daemon.info<30>: Jan 28 15:26:34 Openfiler2 LVM[3228]: INFO: Reading all physical volumes. This may take a while... Found volume group "localvg" using metadata type lvm2 Found volume group "vg0drbd" using metadata type lvm2
kern.err<3>: Jan 28 15:26:34 Openfiler2 kernel: device-mapper: table: 253:1: linear: dm-linear: Device lookup failed
kern.warn<4>: Jan 28 15:26:34 Openfiler2 kernel: device-mapper: ioctl: error adding target to table
daemon.err<27>: Jan 28 15:26:34 Openfiler2 LVM[3228]: ERROR: device-mapper: reload ioctl failed: Invalid argument 1 logical volume(s) in volume group "vg0drbd" now active
daemon.info<30>: Jan 28 15:26:34 Openfiler2 crmd: [1284]: info: process_lrm_event: LRM operation lvmdata_start_0 (call=26, rc=1, cib-update=31, confirmed=true) unknown error
I tried to mount it manually but the device is missing. Any suggestions on how I can get this volume mounted? Thanks!
For reference:
kern.info<6>: Jan 28 17:28:11 Openfiler2 kernel: device-mapper: uevent: version 1.0.3
kern.info<6>: Jan 28 17:28:11 Openfiler2 kernel: device-mapper: ioctl: 4.17.0-ioctl (2010-03-05) initialised: dm-devel at redhat.com
Linux Openfiler2 2.6.32-71.18.1.el6-0.20.smp.gcc4.1.x86_64 #1 SMP Fri Mar 25 23:12:47 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
My last line of /etc/fstab is commented out as it is controlled by pacemaker:
#/dev/vg0drbd/filer /mnt/vg0drbd/filer xfs defaults,usrquota,grpquota 0 0
Right now I have:
[root at Openfiler2 ~]# service drbd status
drbd driver loaded OK; device status:
version: 8.3.10 (api:88/proto:86-96)
GIT-hash: 5c0b0469666682443d4785d90a2c603378f9017b build by phil at fat-tyre, 2011-01-28 12:17:35
m:res cs ro ds p mounted fstype
0:cluster_metadata Connected Primary/Secondary UpToDate/UpToDate C /cluster_metadata ext3
1:vg0_drbd Connected Primary/Secondary UpToDate/UpToDate C
[root at Openfiler2 ~]# crm status
============
Last updated: Tue Jan 28 20:11:19 2014
Stack: openais
Current DC: Openfiler1 - partition with quorum
Version: 1.1.2-c6b59218ee949eebff30e837ff6f3824ed0ab86b
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Online: [ Openfiler1 Openfiler2 ]
Resource Group: g_services
MetaFS (ocf::heartbeat:Filesystem): Started Openfiler2
lvmdata (ocf::heartbeat:LVM): Stopped
DataFS (ocf::heartbeat:Filesystem): Stopped
openfiler (lsb:openfiler): Stopped
ClusterIP (ocf::heartbeat:IPaddr2): Stopped
iscsi (lsb:iscsi-target): Stopped
ldap (lsb:ldap): Stopped
samba (lsb:smb): Stopped
nfs (lsb:nfs): Stopped
nfslock (lsb:nfslock): Stopped
ftp (lsb:proftpd): Stopped
Master/Slave Set: ms_g_drbd
Masters: [ Openfiler2 ]
Slaves: [ Openfiler1 ]
Failed actions:
lvmdata_start_0 (node=Openfiler2, call=28, rc=1, status=complete): unknown error
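The status above is consistent with how a Pacemaker resource group behaves: members are started in the order they are listed, and a member that fails to start holds back every member after it. A minimal Python sketch of that rule, using the g_services order from the `crm configure show` output further down in this message (the helper name `blocked_by` is made up for illustration and models only in-order start, not the full policy engine):

```python
# Order of the g_services group, copied from `crm configure show`.
G_SERVICES = ["MetaFS", "lvmdata", "DataFS", "openfiler", "ClusterIP",
              "iscsi", "ldap", "samba", "nfs", "nfslock", "ftp"]


def blocked_by(group, failed):
    """Return the group members listed after `failed`.

    Hypothetical helper: it only models the in-order start of a group.
    """
    idx = group.index(failed)
    return group[idx + 1:]


# Everything listed before the failed member can still start.
started = G_SERVICES[:G_SERVICES.index("lvmdata")]
print("started before the failure:", started)
print("held back by lvmdata:", blocked_by(G_SERVICES, "lvmdata"))
```

Only MetaFS sits ahead of lvmdata in the group, which matches it being the one resource reported as Started.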
######
More reference:
[root at Openfiler2 ~]# pvdisplay
--- Physical volume ---
PV Name /dev/sdc1
VG Name localvg
PV Size 975.93 GiB / not usable 2.32 MiB
Allocatable yes (but full)
PE Size 4.00 MiB
Total PE 249837
Free PE 0
Allocated PE 249837
PV UUID OPFfsk-LXkz-3Voc-CQbj-Qf8d-YmHs-cR4Xjt
--- Physical volume ---
PV Name /dev/sdb2
VG Name localvg
PV Size 975.44 GiB / not usable 3.32 MiB
Allocatable yes
PE Size 4.00 MiB
Total PE 249712
Free PE 25600
Allocated PE 224112
PV UUID yG1gfI-1HRb-AdCS-RqUV-Cm2j-pdqe-ZcB10j
--- Physical volume ---
PV Name /dev/dm-0
VG Name vg0drbd
PV Size 1.81 TiB / not usable 1.11 MiB
Allocatable yes (but full)
PE Size 4.00 MiB
Total PE 473934
Free PE 0
Allocated PE 473934
PV UUID u8Au1m-U1pJ-RMik-bZGk-7NPA-3EOL-P21MHW
[root at Openfiler2 ~]# pvscan
PV /dev/sdc1 VG localvg lvm2 [975.93 GiB / 0 free]
PV /dev/sdb2 VG localvg lvm2 [975.44 GiB / 100.00 GiB free]
PV /dev/localvg/r1 VG vg0drbd lvm2 [1.81 TiB / 0 free]
Total: 3 [3.71 TiB] / in use: 3 [3.71 TiB] / in no VG: 0 [0 ]
[root at Openfiler2 ~]# vgdisplay
--- Volume group ---
VG Name localvg
System ID
Format lvm2
Metadata Areas 2
Metadata Sequence No 23
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 1
Open LV 1
Max PV 0
Cur PV 2
Act PV 2
VG Size 1.91 TiB
PE Size 4.00 MiB
Total PE 499549
Alloc PE / Size 473949 / 1.81 TiB
Free PE / Size 25600 / 100.00 GiB
VG UUID 5knbwX-LaJ5-1fEd-OD1R-59jZ-Otmy-8IKtVl
--- Volume group ---
VG Name vg0drbd
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 7
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 1
Open LV 0
Max PV 0
Cur PV 1
Act PV 1
VG Size 1.81 TiB
PE Size 4.00 MiB
Total PE 473934
Alloc PE / Size 473934 / 1.81 TiB
Free PE / Size 0 / 0
VG UUID 4pgyVr-Eduj-2CVD-rUhf-Sr7L-Q814-45BE2N
[root at Openfiler2 ~]# vgscan
Reading all physical volumes. This may take a while...
Found volume group "localvg" using metadata type lvm2
Found volume group "vg0drbd" using metadata type lvm2
[root at Openfiler2 ~]# lvdisplay
--- Logical volume ---
LV Name /dev/localvg/r1
VG Name localvg
LV UUID eSuNJr-yFDC-WCET-sIgi-IgTf-JRYz-Ack7oe
LV Write Access read/write
LV Status available
# open 2
LV Size 1.81 TiB
Current LE 473949
Segments 2
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:0
--- Logical volume ---
LV Name /dev/vg0drbd/filer
VG Name vg0drbd
LV UUID eSuNJr-yFDC-WCET-sIgi-IgTf-JRYz-Ack7oe
LV Write Access read/write
LV Status NOT available
LV Size 1.81 TiB
Current LE 473934
Segments 1
Allocation inherit
Read ahead sectors auto
[root at Openfiler2 ~]# lvscan
ACTIVE '/dev/localvg/r1' [1.81 TiB] inherit
inactive '/dev/vg0drbd/filer' [1.81 TiB] inherit
[root at Openfiler2 ~]# lvchange -ay /dev/vg0drbd/filer
device-mapper: reload ioctl failed: Invalid argument
[root at Openfiler2 ~]# lvdisplay
--- Logical volume ---
LV Name /dev/localvg/r1
VG Name localvg
LV UUID eSuNJr-yFDC-WCET-sIgi-IgTf-JRYz-Ack7oe
LV Write Access read/write
LV Status available
# open 2
LV Size 1.81 TiB
Current LE 473949
Segments 2
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:0
--- Logical volume ---
LV Name /dev/vg0drbd/filer
VG Name vg0drbd
LV UUID eSuNJr-yFDC-WCET-sIgi-IgTf-JRYz-Ack7oe
LV Write Access read/write
LV Status available
# open 0
LV Size 1.81 TiB
Current LE 473934
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:1
[root at Openfiler2 ~]# lvscan
ACTIVE '/dev/localvg/r1' [1.81 TiB] inherit
ACTIVE '/dev/vg0drbd/filer' [1.81 TiB] inherit
[root at Openfiler2 ~]# ls -l /dev/dm-*
brw-rw---- 1 root disk 253, 0 Jan 28 17:39 /dev/dm-0
brw-rw---- 1 root disk 253, 1 Jan 28 21:00 /dev/dm-1
[root at Openfiler2 ~]# dmsetup ls
localvg-r1 (253, 0)
vg0drbd-filer (253, 1)
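The kernel error logged right after the reboot names the failing table only by its major:minor pair (253:1). The `dmsetup ls` output above ties that pair to vg0drbd-filer. A small Python sketch of that lookup, run against text pasted from this message (the function names are made up for illustration):

```python
import re

KERNEL_LINE = ("kern.err<3>: Jan 28 15:26:34 Openfiler2 kernel: device-mapper: "
               "table: 253:1: linear: dm-linear: Device lookup failed")

DMSETUP_LS = """\
localvg-r1 (253, 0)
vg0drbd-filer (253, 1)
"""


def dm_names(ls_text):
    """Map (major, minor) -> device-mapper name from `dmsetup ls` text."""
    names = {}
    for m in re.finditer(r"^(\S+)\s+\((\d+),\s*(\d+)\)", ls_text, re.M):
        names[(int(m.group(2)), int(m.group(3)))] = m.group(1)
    return names


def failing_device(line):
    """Extract the (major, minor) pair from a device-mapper table error."""
    m = re.search(r"device-mapper: table: (\d+):(\d+):", line)
    return (int(m.group(1)), int(m.group(2))) if m else None


dev = failing_device(KERNEL_LINE)
print(dev, "->", dm_names(DMSETUP_LS).get(dev))
```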
[root at Openfiler2 ~]# dmsetup info
Name: localvg-r1
State: ACTIVE
Read Ahead: 256
Tables present: LIVE
Open count: 2
Event number: 0
Major, minor: 253, 0
Number of targets: 2
UUID: LVM-5knbwXLaJ51fEdOD1R59jZOtmy8IKtVleSuNJryFDCWCETsIgiIgTfJRYzAck7oe
Name: vg0drbd-filer
State: ACTIVE
Read Ahead: 256
Tables present: None
Open count: 0
Event number: 0
Major, minor: 253, 1
Number of targets: 0
UUID: LVM-4pgyVrEduj2CVDrUhfSr7LQ81445BE2NeSuNJryFDCWCETsIgiIgTfJRYzAck7oe
[root at Openfiler2 ~]# dmsetup deps
localvg-r1: 2 dependencies : (8, 18) (8, 33)
vg0drbd-filer: 0 dependencies :
[root at Openfiler2 ~]# dmsetup table
localvg-r1: 0 2046664704 linear 8:33 2048
localvg-r1: 2046664704 1835925504 linear 8:18 2048
vg0drbd-filer:
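As a sanity check on the numbers above, the live table for localvg-r1 can be compared with the extent counts that lvdisplay reports. The sketch below assumes 512-byte sectors in the `dmsetup table` output and takes the 4.00 MiB PE size from vgdisplay. All other figures are copied from this message.

```python
SECTOR = 512                              # bytes per sector (assumed)
PE_SECTORS = 4 * 1024 * 1024 // SECTOR    # 4.00 MiB PE size -> sectors per extent

# (start, length) of the two linear segments of localvg-r1, from `dmsetup table`.
segments = [(0, 2046664704), (2046664704, 1835925504)]
table_sectors = sum(length for _, length in segments)

r1_le = 473949       # "Current LE" of /dev/localvg/r1
filer_le = 473934    # "Current LE" of /dev/vg0drbd/filer

# The loaded table covers exactly the extents lvdisplay reports for r1.
assert table_sectors == r1_le * PE_SECTORS

gap_pe = r1_le - filer_le
print(table_sectors, "sectors in the localvg-r1 table")
print(gap_pe, "extents =", gap_pe * 4, "MiB between r1 and filer")
```

The localvg-r1 table adds up to exactly the LV's 473949 extents, so the backing volume itself looks intact; it is only vg0drbd-filer that has no table loaded. The 15-extent (60 MiB) difference between the two volumes is plausibly DRBD's internal metadata plus LVM's own overhead, but that is an assumption, not something the output proves.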
[root at Openfiler2 ~]# drbdsetup /dev/drbd1 show
disk {
size 0s _is_default; # bytes
on-io-error detach;
fencing dont-care _is_default;
max-bio-bvecs 0 _is_default;
}
net {
timeout 60 _is_default; # 1/10 seconds
max-epoch-size 2048 _is_default;
max-buffers 2048 _is_default;
unplug-watermark 128 _is_default;
connect-int 10 _is_default; # seconds
ping-int 10 _is_default; # seconds
sndbuf-size 0 _is_default; # bytes
rcvbuf-size 0 _is_default; # bytes
ko-count 0 _is_default;
after-sb-0pri disconnect _is_default;
after-sb-1pri disconnect _is_default;
after-sb-2pri disconnect _is_default;
rr-conflict disconnect _is_default;
ping-timeout 5 _is_default; # 1/10 seconds
on-congestion block _is_default;
congestion-fill 0s _is_default; # byte
congestion-extents 127 _is_default;
}
syncer {
rate 112640k; # bytes/second
after 0;
al-extents 127 _is_default;
on-no-data-accessible io-error _is_default;
c-plan-ahead 0 _is_default; # 1/10 seconds
c-delay-target 10 _is_default; # 1/10 seconds
c-fill-target 0s _is_default; # bytes
c-max-rate 102400k _is_default; # bytes/second
c-min-rate 4096k _is_default; # bytes/second
}
protocol C;
_this_host {
device minor 1;
disk "/dev/localvg/r1";
meta-disk internal;
address ipv4 192.168.100.159:7789;
}
_remote_host {
address ipv4 192.168.100.158:7789;
}
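One detail worth cross-checking in the output so far: pvscan lists the PV of vg0drbd as /dev/localvg/r1, which is the same path drbdsetup shows as the backing disk of /dev/drbd1. The Python sketch below only automates that comparison on text pasted from this message. The helper names are made up, and the idea that LVM would normally be expected to see this PV through /dev/drbd1 is an assumption based on the usual LVM-on-DRBD layout, not something confirmed here.

```python
import re

PVSCAN = """\
PV /dev/sdc1 VG localvg lvm2 [975.93 GiB / 0 free]
PV /dev/sdb2 VG localvg lvm2 [975.44 GiB / 100.00 GiB free]
PV /dev/localvg/r1 VG vg0drbd lvm2 [1.81 TiB / 0 free]
"""

DRBD_SHOW = """\
_this_host {
device minor 1;
disk "/dev/localvg/r1";
meta-disk internal;
}
"""


def pvs_by_vg(pvscan_text):
    """Map VG name -> list of PV paths from `pvscan` text."""
    result = {}
    for m in re.finditer(r"^PV (\S+)\s+VG (\S+)", pvscan_text, re.M):
        result.setdefault(m.group(2), []).append(m.group(1))
    return result


def drbd_backing_disk(show_text):
    """Return the quoted `disk` path from `drbdsetup ... show` text."""
    m = re.search(r'^\s*disk\s+"([^"]+)";', show_text, re.M)
    return m.group(1) if m else None


pvs = pvs_by_vg(PVSCAN)
backing = drbd_backing_disk(DRBD_SHOW)
for pv in pvs["vg0drbd"]:
    if pv == backing:
        print(f"{pv} is the DRBD backing disk, not /dev/drbd1")
```

If that reading is right, the device filter in lvm.conf would be the first place to look, since it decides which block devices LVM scans for PV labels.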
[root at Openfiler2 ~]# crm configure show
node Openfiler1 \
attributes standby="off"
node Openfiler2 \
attributes standby="off"
primitive ClusterIP ocf:heartbeat:IPaddr2 \
params ip="192.168.4.157" cidr_netmask="32" \
op monitor interval="30s"
primitive DataFS ocf:heartbeat:Filesystem \
params device="/dev/vg0drbd/filer" directory="/mnt/vg0drbd/filer" fstype="xfs" \
meta target-role="started"
primitive MetaFS ocf:heartbeat:Filesystem \
params device="/dev/drbd0" directory="/cluster_metadata" fstype="ext3" \
meta target-role="started"
primitive drbd_data ocf:linbit:drbd \
params drbd_resource="vg0_drbd" \
op monitor interval="15s"
primitive drbd_meta ocf:linbit:drbd \
params drbd_resource="cluster_metadata" \
op monitor interval="15s"
primitive ftp lsb:proftpd \
meta target-role="stopped"
primitive iscsi lsb:iscsi-target
primitive ldap lsb:ldap
primitive lvmdata ocf:heartbeat:LVM \
params volgrpname="vg0drbd" \
meta target-role="started"
primitive nfs lsb:nfs
primitive nfslock lsb:nfslock
primitive openfiler lsb:openfiler
primitive samba lsb:smb
group g_drbd drbd_meta drbd_data
group g_services MetaFS lvmdata DataFS openfiler ClusterIP iscsi ldap samba nfs nfslock ftp
ms ms_g_drbd g_drbd \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" clone-max="2" clone-node-max="1" notify="true"
location cli-prefer-ClusterIP ClusterIP \
rule $id="cli-prefer-rule-ClusterIP" inf: #uname eq Openfiler1
location cli-standby-g_services g_services \
rule $id="cli-standby-rule-g_services" -inf: #uname eq Openfiler1
location cli-standby-ms_g_drbd ms_g_drbd \
rule $id="cli-standby-ms_g_drbd-rule" $role="Master" -inf: #uname eq Openfiler1
colocation c_g_services_on_g_drbd inf: g_services ms_g_drbd:Master
order o_g_servicesafter_g_drbd inf: ms_g_drbd:promote g_services:start
property $id="cib-bootstrap-options" \
dc-version="1.1.2-c6b59218ee949eebff30e837ff6f3824ed0ab86b" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore" \
last-lrm-refresh="1390944138"
rsc_defaults $id="rsc-options" \
resource-stickiness="100"
In case you are wondering why ftp shows as stopped above: I intentionally stopped proftpd (ftp) via the Linux Cluster Management Console 1.5.14 so that no more proftpd processes would start up.
Many thanks and regards,
Seb A