[vdo-devel] dmsetup stuck for more than one day
Sweet Tea Dorminy
sweettea at redhat.com
Thu Nov 5 16:50:45 UTC 2020
No, I believe you'd need to update the kernel as well, to go along with
the updated kmod-kvdo.
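
For reference, the upgrade might look something like this with yum (a
sketch, assuming the newer kernel and the matching kmod-kvdo/vdo
packages are available in your configured repositories):

    # pull the newer kernel together with the matching VDO packages
    yum update kernel kmod-kvdo vdo
    # the new kernel and kvdo module only take effect after a reboot
    reboot
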
On Thu, Nov 5, 2020 at 10:21 AM Łukasz Michalski <lm at zork.pl> wrote:
> Hi,
>
> Is it possible to upgrade only vdo and stick with CentOS 7.5.1804 for the
> rest of the packages?
>
> Regards,
> Łukasz
>
> On 05/11/2020 16.17, Sweet Tea Dorminy wrote:
>
> Greetings Łukasz;
>
> I think this may be an instance of BZ 1821275
> <https://bugzilla.redhat.com/show_bug.cgi?id=1821275>, fixed in 6.1.3.23.
> Is it feasible to restart the machine (unfortunately there's no other way
> to stop a presumably hung attempt to start VDO), upgrade to at least that
> version, and try again?
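>
> Once it's back up and upgraded, a quick way to confirm what is actually
> running (the dmesg line is the same one kvdo prints at module load, as
> in your logs):
>
>     rpm -q vdo kmod-kvdo           # installed package versions
>     dmesg | grep 'loaded version'  # version of the loaded kvdo module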
>
> Thanks!
>
> Sweet Tea Dorminy
>
>
> On Thu, Nov 5, 2020 at 9:54 AM Łukasz Michalski <lm at zork.pl> wrote:
>
>> Details below.
>>
>> Now I see that I was looking at the wrong block device. My VDO is on
>> /dev/sda, and atop shows no activity for it.
>>
>> Thanks,
>> Łukasz
>>
>> On 05/11/2020 15.26, Andrew Walsh wrote:
>>
>> Hi Lukasz,
>>
>> Can you please confirm a few details? These will help us understand what
>> may be going on. We may end up needing additional information, but this
>> will help us identify a starting point for the investigation.
>>
>> **Storage Stack Configuration:**
>> High Level Configuration: [e.g. SSD -> MD RAID 5 -> VDO -> XFS]
>>
>> Two servers, each with:
>> Hardware RAID6, 54 TB -> LVM -> VDO -> GlusterFS (XFS for bricks) -> Samba
>> shares.
>> Samba and Gluster are currently disabled.
>>
>> Output of `blockdev --report`:
>>
>> [root@ixmed1 /]# blockdev --report
>>
>> RO RA SSZ BSZ StartSec Size Device
>> rw 256 512 4096 0 59999990579200 /dev/sda
>> rw 256 512 4096 0 238999830528 /dev/sdb
>> rw 256 512 4096 2048 1073741824 /dev/sdb1
>> rw 256 512 4096 2099200 216446009344 /dev/sdb2
>> rw 256 512 4096 424845312 21479030784 /dev/sdb3
>> rw 256 512 4096 0 119810293760 /dev/dm-0
>> rw 256 512 4096 0 21470642176 /dev/dm-1
>> rw 256 512 4096 0 32212254720 /dev/dm-2
>> rw 256 512 4096 0 42949672960 /dev/dm-3
>> rw 256 512 4096 0 21474836480 /dev/dm-4
>> rw 256 512 4096 0 21990232555520 /dev/dm-5
>> rw 256 512 4096 0 21474144256 /dev/drbd999
>>
>> Output of `lsblk -o name,maj:min,kname,type,fstype,state,sched,uuid`:
>>
>> [root@ixmed1 /]# lsblk -o name,maj:min,kname,type,fstype,state,sched,uuid
>> lsblk: dm-6: failed to get device path
>> lsblk: dm-6: failed to get device path
>> NAME                            MAJ:MIN KNAME   TYPE FSTYPE   STATE SCHED    UUID
>> sda                             8:0     sda     disk LVM2_mem runni deadline ggCzji-1O8d-BWCa-XwLe-BJ94-fwHa-cOseC0
>> └─vgStorage-LV_vdo_Rada--ixmed  253:5   dm-5    lvm  vdo      runni          b668b2d9-96bf-4840-a43d-6b7ab0a7f235
>> sdb                             8:16    sdb     disk          runni deadline
>> ├─sdb1                          8:17    sdb1    part xfs            deadline f89ef6d8-d9f4-4061-8f48-3ffae8e23b1e
>> ├─sdb2                          8:18    sdb2    part LVM2_mem       deadline pHO0UQ-aGWu-Hg6g-siiq-TGPT-kw4B-gD0fgs
>> │ ├─vgSys-root                  253:0   dm-0    lvm  xfs      runni          4f48e2c7-6324-4465-953a-c1a9512ab782
>> │ ├─vgSys-swap                  253:1   dm-1    lvm  swap     runni          97234c91-7804-43b2-944f-0122c90fc962
>> │ ├─vgSys-cluster               253:2   dm-2    lvm  xfs      runni          97b4c285-4bfe-4d4f-8c3c-ca716157bf52
>> │ └─vgSys-var                   253:3   dm-3    lvm  xfs      runni          6f5c860b-88e0-4d28-bc09-2e365299f86e
>> └─sdb3                          8:19    sdb3    part LVM2_mem       deadline nvBfNi-qm2u-bt5T-dyCL-3FgQ-DSic-z8dUDq
>>   └─vgSys-pgsql                 253:4   dm-4    lvm  xfs      runni          5c3e18cc-9e0f-4c81-906b-3e68f196cafe
>>     └─drbd999                   147:999 drbd999 disk xfs                     5c3e18cc-9e0f-4c81-906b-3e68f196cafe
>>
>>
>> **Hardware Information:**
>> - CPU: [e.g. 2x Intel Xeon E5-1650 v2 @ 3.5GHz]
>> - Memory: [e.g. 128G]
>> - Storage: [e.g. Intel Optane SSD 900P]
>> - Other: [e.g. iSCSI backed storage]
>>
>> Huawei 5288 V5
>> 64GB RAM
>> 2 X Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz
>> RAID: Symbios Logic MegaRAID SAS-3 3008 [Fury] (rev 02) (from lspci,
>> megaraid_sas driver)
>>
>>
>> **Distro Information:**
>> - OS: [e.g. RHEL-7.5]
>>
>> CentOS Linux release 7.5.1804 (Core)
>>
>> - Architecture: [e.g. x86_64]
>>
>> x86_64
>>
>> - Kernel: [e.g. kernel-3.10.0-862.el7]
>>
>> 3.10.0-862.el7
>>
>> - VDO Version: [e.g. vdo-6.2.0.168-18.el7, or a commit hash]
>> - KVDO Version: [e.g. kmod-kvdo-6.2.0.153-15.el7, or a commit hash]
>>
>> [root@ixmed1 /]# yum list |grep vdo
>> kmod-kvdo.x86_64    6.1.0.168-16.el7_5    @updates
>> vdo.x86_64          6.1.0.168-18          @updates
>>
>> - LVM Version: [e.g. 2.02.177-4.el7]
>>
>> 2.02.177(2)-RHEL7 (2018-01-22)
>>
>> - Output of `uname -a`: [e.g. Linux localhost.localdomain
>> 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64
>> x86_64 GNU/Linux]
>>
>> Linux ixmed1 3.10.0-862.el7.x86_64 #1 SMP Fri Apr 20 16:44:24 UTC 2018
>> x86_64 x86_64 x86_64 GNU/Linux
>>
>>
>> On Thu, Nov 5, 2020 at 6:49 AM Łukasz Michalski <lm at zork.pl> wrote:
>>
>>> Hi,
>>>
>>> I have two 20T devices that crashed during a power outage, one on each of
>>> two servers.
>>>
>>> After restarting the servers, I see the following in the logs on the first
>>> server:
>>>
>>> [root@ixmed1 /]# dmesg |grep vdo
>>> [ 11.223770] kvdo: modprobe: loaded version 6.1.0.168
>>> [ 11.904949] kvdo0:dmsetup: starting device 'vdo_test' device
>>> instantiation 0 write policy auto
>>> [ 11.904979] kvdo0:dmsetup: underlying device, REQ_FLUSH: not
>>> supported, REQ_FUA: not supported
>>> [ 11.904985] kvdo0:dmsetup: Using mode sync automatically.
>>> [ 11.905017] kvdo0:dmsetup: zones: 1 logical, 1 physical, 1 hash; base
>>> threads: 5
>>> [ 11.966414] kvdo0:journalQ: Device was dirty, rebuilding reference
>>> counts
>>> [ 12.452589] kvdo0:logQ0: Finished reading recovery journal
>>> [ 12.458550] kvdo0:logQ0: Highest-numbered recovery journal block has
>>> sequence number 70548140, and the highest-numbered usable block is 70548140
>>> [ 12.458556] kvdo0:logQ0: Replaying entries into slab journals
>>> [ 13.538099] kvdo0:logQ0: Replayed 5568767 journal entries into slab
>>> journals
>>> [ 14.174984] kvdo0:logQ0: Recreating missing journal entries
>>> [ 14.175025] kvdo0:journalQ: Synthesized 0 missing journal entries
>>> [ 14.177768] kvdo0:journalQ: Saving recovery progress
>>> [ 14.636416] kvdo0:logQ0: Replaying 2528946 recovery entries into
>>> block map
>>>
>>> [root@ixmed1 /]# uptime
>>> 12:41:33 up 1 day, 4:07, 2 users, load average: 1.06, 1.05, 1.16
>>>
>>> [root@ixmed1 /]# ps ax |grep vdo
>>> 1135 ? Ss 0:00 /usr/bin/python /usr/bin/vdo start --all
>>> --confFile /etc/vdoconf.yml
>>> 1210 ? R 21114668:39 dmsetup create vdo_Rada-ixmed --uuid
>>> VDO-b668b2d9-96bf-4840-a43d-6b7ab0a7f235 --table 0 72301908952 vdo
>>> /dev/disk/by-id/dm-name-vgStorage-LV_test 4096 disabled 0 32768 16380 on
>>> auto vdo_test
>>> ack=1,bio=4,bioRotationInterval=64,cpu=2,hash=1,logical=1,physical=1
>>> 1213 ? S 1:51 [kvdo0:dedupeQ]
>>> 1214 ? S 1:51 [kvdo0:journalQ]
>>> 1215 ? S 1:51 [kvdo0:packerQ]
>>> 1216 ? S 1:51 [kvdo0:logQ0]
>>> 1217 ? S 1:51 [kvdo0:physQ0]
>>> 1218 ? S 1:50 [kvdo0:hashQ0]
>>> 1219 ? S 1:52 [kvdo0:bioQ0]
>>> 1220 ? S 1:51 [kvdo0:bioQ1]
>>> 1221 ? S 1:51 [kvdo0:bioQ2]
>>> 1222 ? S 1:51 [kvdo0:bioQ3]
>>> 1223 ? S 1:48 [kvdo0:ackQ]
>>> 1224 ? S 1:49 [kvdo0:cpuQ0]
>>> 1225 ? S 1:49 [kvdo0:cpuQ1]
>>>
>>> The only activity I see is small writes, shown in 'atop', to the device
>>> underlying VDO.
>>>
>>> On the first server, dmsetup takes 100% CPU (one core); on the second
>>> server, dmsetup appears idle.
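>>>
>>> If a kernel-side view helps, I can capture where that dmsetup is
>>> spinning (a sketch; assuming /proc/<pid>/stack is available on this
>>> kernel, with 1210 being the pid from the ps output above):
>>>
>>>     cat /proc/1210/stack     # kernel stack of the busy dmsetup
>>>     top -b -n1 -H -p 1210    # which thread is consuming the CPU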
>>>
>>> What should I do in this situation?
>>>
>>> Regards,
>>> Łukasz
>>>
>>>
>>>
>>> _______________________________________________
>>> vdo-devel mailing list
>>> vdo-devel at redhat.com
>>> https://www.redhat.com/mailman/listinfo/vdo-devel
>>>
>>
>>
>
>