[vdo-devel] dmsetup stuck for more than one day

Sweet Tea Dorminy sweettea at redhat.com
Thu Nov 5 16:50:45 UTC 2020


No, I believe you'd also need to update the kernel to go along with the
updated kmod-kvdo.
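
A minimal sketch of such an upgrade, assuming the CentOS updates repository
carries a matching kernel/kmod-kvdo pair (exact versions will differ):

    yum update kernel kmod-kvdo vdo
    reboot
    rpm -q kernel kmod-kvdo vdo   # confirm the installed versions agree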

On Thu, Nov 5, 2020 at 10:21 AM Łukasz Michalski <lm at zork.pl> wrote:

> Hi,
>
> Is it possible to upgrade only VDO and stick with CentOS 7.5.1804 for the
> rest of the packages?
>
> Regards,
> Łukasz
>
> On 05/11/2020 16.17, Sweet Tea Dorminy wrote:
>
> Greetings Łukasz;
>
> I think this may be an instance of BZ 1821275
> <https://bugzilla.redhat.com/show_bug.cgi?id=1821275>, fixed in 6.1.3.23.
> Is it feasible to restart the machine (unfortunately there's no other way
> to stop a presumably hung attempt to start VDO), upgrade to at least that
> version, and try again?
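>
> (A sketch of confirming the running module version before and after the
> upgrade, assuming the module is loaded; the exact log wording may vary:)
>
>     dmesg | grep 'loaded version'
>     modinfo kvdo | grep -i '^version'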
>
> Thanks!
>
> Sweet Tea Dorminy
>
>
> On Thu, Nov 5, 2020 at 9:54 AM Łukasz Michalski <lm at zork.pl> wrote:
>
>> Details below.
>>
>> Now I see that I was looking at the wrong block device. My VDO is on
>> /dev/sda, and atop shows no activity for it.
>>
>> Thanks,
>> Łukasz
>>
>> On 05/11/2020 15.26, Andrew Walsh wrote:
>>
>> Hi Lukasz,
>>
>> Can you please confirm a few details?  These will help us understand what
>> may be going on.  We may end up needing additional information, but this
>> will help us identify a starting point for the investigation.
>>
>> **Storage Stack Configuration:**
>> High Level Configuration: [e.g. SSD -> MD RAID 5 -> VDO -> XFS]
>>
>> Two servers, on each:
>> Hardware RAID6, 54 TB -> LVM -> VDO -> GlusterFS (XFS for bricks) -> Samba
>> shares.
>> Currently Samba and Gluster are disabled.
>>
>> Output of `blockdev --report`:
>>
>> [root at ixmed1 /]# blockdev --report
>>
>> RO    RA   SSZ   BSZ   StartSec            Size   Device
>> rw   256   512  4096          0  59999990579200   /dev/sda
>> rw   256   512  4096          0    238999830528   /dev/sdb
>> rw   256   512  4096       2048      1073741824   /dev/sdb1
>> rw   256   512  4096    2099200    216446009344   /dev/sdb2
>> rw   256   512  4096  424845312     21479030784   /dev/sdb3
>> rw   256   512  4096          0    119810293760   /dev/dm-0
>> rw   256   512  4096          0     21470642176   /dev/dm-1
>> rw   256   512  4096          0     32212254720   /dev/dm-2
>> rw   256   512  4096          0     42949672960   /dev/dm-3
>> rw   256   512  4096          0     21474836480   /dev/dm-4
>> rw   256   512  4096          0  21990232555520   /dev/dm-5
>> rw   256   512  4096          0     21474144256   /dev/drbd999
>>
>> Output of `lsblk -o name,maj:min,kname,type,fstype,state,sched,uuid`:
>>
>> [root at ixmed1 /]# lsblk -o name,maj:min,kname,type,fstype,state,sched,uuid
>> lsblk: dm-6: failed to get device path
>> lsblk: dm-6: failed to get device path
>> NAME              MAJ:MIN KNAME   TYPE FSTYPE   STATE SCHED    UUID
>> sda                 8:0   sda     disk LVM2_mem runni deadline ggCzji-1O8d-BWCa-XwLe-BJ94-fwHa-cOseC0
>> └─vgStorage-LV_vdo_Rada--ixmed
>>                   253:5   dm-5    lvm  vdo      runni          b668b2d9-96bf-4840-a43d-6b7ab0a7f235
>> sdb                 8:16  sdb     disk          runni deadline
>> ├─sdb1              8:17  sdb1    part xfs            deadline f89ef6d8-d9f4-4061-8f48-3ffae8e23b1e
>> ├─sdb2              8:18  sdb2    part LVM2_mem       deadline pHO0UQ-aGWu-Hg6g-siiq-TGPT-kw4B-gD0fgs
>> │ ├─vgSys-root    253:0   dm-0    lvm  xfs      runni          4f48e2c7-6324-4465-953a-c1a9512ab782
>> │ ├─vgSys-swap    253:1   dm-1    lvm  swap     runni          97234c91-7804-43b2-944f-0122c90fc962
>> │ ├─vgSys-cluster 253:2   dm-2    lvm  xfs      runni          97b4c285-4bfe-4d4f-8c3c-ca716157bf52
>> │ └─vgSys-var     253:3   dm-3    lvm  xfs      runni          6f5c860b-88e0-4d28-bc09-2e365299f86e
>> └─sdb3              8:19  sdb3    part LVM2_mem       deadline nvBfNi-qm2u-bt5T-dyCL-3FgQ-DSic-z8dUDq
>>   └─vgSys-pgsql   253:4   dm-4    lvm  xfs      runni          5c3e18cc-9e0f-4c81-906b-3e68f196cafe
>>     └─drbd999     147:999 drbd999 disk xfs                     5c3e18cc-9e0f-4c81-906b-3e68f196cafe
>>
>>
>> **Hardware Information:**
>>  - CPU: [e.g. 2x Intel Xeon E5-1650 v2 @ 3.5GHz]
>>  - Memory: [e.g. 128G]
>>  - Storage: [e.g. Intel Optane SSD 900P]
>>  - Other: [e.g. iSCSI backed storage]
>>
>> Huawei 5288 V5
>> 64GB RAM
>> 2 X Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz
>> RAID: Symbios Logic MegaRAID SAS-3 3008 [Fury] (rev 02) (from lspci,
>> megaraid_sas driver)
>>
>>
>> **Distro Information:**
>>  - OS: [e.g. RHEL-7.5]
>>
>> CentOS Linux release 7.5.1804 (Core)
>>
>>  - Architecture: [e.g. x86_64]
>>
>> x86_64
>>
>>  - Kernel: [e.g. kernel-3.10.0-862.el7]
>>
>> 3.10.0-862.el7
>>
>>  - VDO Version: [e.g. vdo-6.2.0.168-18.el7, or a commit hash]
>>  - KVDO Version: [e.g. kmod-kvdo6.2.0.153-15.el7, or a commit hash]
>>
>> [root at ixmed1 /]# yum list |grep vdo
>> kmod-kvdo.x86_64                          6.1.0.168-16.el7_5    @updates
>> vdo.x86_64                                6.1.0.168-18          @updates
>>
>>  - LVM Version: [e.g. 2.02.177-4.el7]
>>
>> 2.02.177(2)-RHEL7 (2018-01-22)
>>
>>  - Output of `uname -a`: [e.g. Linux localhost.localdomain
>> 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64
>> x86_64 GNU/Linux]
>>
>> Linux ixmed1 3.10.0-862.el7.x86_64 #1 SMP Fri Apr 20 16:44:24 UTC 2018
>> x86_64 x86_64 x86_64 GNU/Linux
>>
>>
>> On Thu, Nov 5, 2020 at 6:49 AM Łukasz Michalski <lm at zork.pl> wrote:
>>
>>> Hi,
>>>
>>> I have two 20T VDO devices, one on each of two servers, that crashed
>>> during a power outage.
>>>
>>> After server restart I see in logs on the first server:
>>>
>>> [root at ixmed1 /]# dmesg |grep vdo
>>> [   11.223770] kvdo: modprobe: loaded version 6.1.0.168
>>> [   11.904949] kvdo0:dmsetup: starting device 'vdo_test' device
>>> instantiation 0 write policy auto
>>> [   11.904979] kvdo0:dmsetup: underlying device, REQ_FLUSH: not
>>> supported, REQ_FUA: not supported
>>> [   11.904985] kvdo0:dmsetup: Using mode sync automatically.
>>> [   11.905017] kvdo0:dmsetup: zones: 1 logical, 1 physical, 1 hash; base
>>> threads: 5
>>> [   11.966414] kvdo0:journalQ: Device was dirty, rebuilding reference
>>> counts
>>> [   12.452589] kvdo0:logQ0: Finished reading recovery journal
>>> [   12.458550] kvdo0:logQ0: Highest-numbered recovery journal block has
>>> sequence number 70548140, and the highest-numbered usable block is 70548140
>>> [   12.458556] kvdo0:logQ0: Replaying entries into slab journals
>>> [   13.538099] kvdo0:logQ0: Replayed 5568767 journal entries into slab
>>> journals
>>> [   14.174984] kvdo0:logQ0: Recreating missing journal entries
>>> [   14.175025] kvdo0:journalQ: Synthesized 0 missing journal entries
>>> [   14.177768] kvdo0:journalQ: Saving recovery progress
>>> [   14.636416] kvdo0:logQ0: Replaying 2528946 recovery entries into
>>> block map
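>>>
>>> (The log stops at the block-map replay phase. A sketch of watching for
>>> further progress, assuming EL7's dmesg supports --follow:)
>>>
>>>     dmesg -w | grep kvdo0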
>>>
>>> [root at ixmed1 /]# uptime
>>>  12:41:33 up 1 day,  4:07,  2 users,  load average: 1.06, 1.05, 1.16
>>>
>>> [root at ixmed1 /]# ps ax |grep vdo
>>>   1135 ?        Ss     0:00 /usr/bin/python /usr/bin/vdo start --all
>>> --confFile /etc/vdoconf.yml
>>>   1210 ?        R    21114668:39 dmsetup create vdo_Rada-ixmed --uuid
>>> VDO-b668b2d9-96bf-4840-a43d-6b7ab0a7f235 --table 0 72301908952 vdo
>>> /dev/disk/by-id/dm-name-vgStorage-LV_test 4096 disabled 0 32768 16380 on
>>> auto vdo_test
>>> ack=1,bio=4,bioRotationInterval=64,cpu=2,hash=1,logical=1,physical=1
>>>   1213 ?        S      1:51 [kvdo0:dedupeQ]
>>>   1214 ?        S      1:51 [kvdo0:journalQ]
>>>   1215 ?        S      1:51 [kvdo0:packerQ]
>>>   1216 ?        S      1:51 [kvdo0:logQ0]
>>>   1217 ?        S      1:51 [kvdo0:physQ0]
>>>   1218 ?        S      1:50 [kvdo0:hashQ0]
>>>   1219 ?        S      1:52 [kvdo0:bioQ0]
>>>   1220 ?        S      1:51 [kvdo0:bioQ1]
>>>   1221 ?        S      1:51 [kvdo0:bioQ2]
>>>   1222 ?        S      1:51 [kvdo0:bioQ3]
>>>   1223 ?        S      1:48 [kvdo0:ackQ]
>>>   1224 ?        S      1:49 [kvdo0:cpuQ0]
>>>   1225 ?        S      1:49 [kvdo0:cpuQ1]
>>>
>>> The only activity I see in 'atop' is small writes to the VDO underlying
>>> device.
>>>
>>> On the first server dmsetup takes 100% CPU (one core); on the second
>>> server dmsetup seems to be idle.
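>>>
>>> (A sketch of how the spinning task could be inspected, using the PIDs
>>> from the ps output above; reading /proc/<pid>/stack needs root:)
>>>
>>>     cat /proc/1210/stack   # kernel stack of the busy dmsetup
>>>     top -H -p 1210         # per-thread CPU usage for that task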
>>>
>>> What should I do in this situation?
>>>
>>> Regards,
>>> Łukasz
>>>