[vdo-devel] dmsetup stuck for more than one day

Andrew Walsh awalsh at redhat.com
Thu Nov 5 17:55:51 UTC 2020


Hi Lukasz,

Version 6.1.3.7 is the latest available as of RHEL-7.8, and 6.1.3.23 is the
latest available as of RHEL-7.9.  Perhaps the CentOS repos haven't been
updated to include RHEL-7.9 content just yet.

Unfortunately, the fix for the issue you encountered isn't available in
6.1.3.7; it was fixed in 6.1.3.23.
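
For anyone hitting the same thing: which kmod-kvdo builds a configured repo
actually offers can be checked along these lines (a sketch; assumes yum with
the updates repo enabled):

    # every kmod-kvdo build visible in the enabled repos
    yum --showduplicates list kmod-kvdo
    # versions currently installed
    rpm -q kmod-kvdo vdo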

Andy Walsh


On Thu, Nov 5, 2020 at 11:57 AM Łukasz Michalski <lm at zork.pl> wrote:

> Hmmm, looking at http://mirror.centos.org/centos/7/os/x86_64/Packages/ I
> see kmod-kvdo-6.1.3.7-5.el7.x86_64.rpm
>
> Is 6.1.3.23 available somewhere?
>
>
> On 05/11/2020 17.50, Sweet Tea Dorminy wrote:
>
> No, I believe you'd need to update the kernel as well to go along with the
> updated kmod-kvdo.
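>
> Something along these lines should keep the two in step (a sketch, assuming
> the newer packages are reachable in your configured repos):
>
>     # update the kernel and VDO packages together, then boot the new kernel
>     yum update kernel kmod-kvdo vdo
>     reboot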
>
> On Thu, Nov 5, 2020 at 10:21 AM Łukasz Michalski <lm at zork.pl> wrote:
>
>> Hi,
>>
>> Is it possible to upgrade only VDO and stick with CentOS 7.5.1804 for the
>> rest of the packages?
>>
>> Regards,
>> Łukasz
>>
>> On 05/11/2020 16.17, Sweet Tea Dorminy wrote:
>>
>> Greetings Łukasz;
>>
>> I think this may be an instance of BZ 1821275
>> <https://bugzilla.redhat.com/show_bug.cgi?id=1821275>, fixed in
>> 6.1.3.23. Is it feasible to restart the machine (unfortunately there's no
>> other way to stop a presumably hung attempt to start VDO), upgrade to at
>> least that version, and try again?
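>>
>> Once upgraded, the loaded module version can be confirmed before retrying
>> (a sketch; the version string is baked into the kvdo module):
>>
>>     modinfo -F version kvdo    # should report at least 6.1.3.23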
>>
>> Thanks!
>>
>> Sweet Tea Dorminy
>>
>>
>> On Thu, Nov 5, 2020 at 9:54 AM Łukasz Michalski <lm at zork.pl> wrote:
>>
>>> Details below.
>>>
>>> Now I see that I was looking at the wrong block device. My VDO is on
>>> /dev/sda, and atop shows no activity for it.
>>>
>>> Thanks,
>>> Łukasz
>>>
>>> On 05/11/2020 15.26, Andrew Walsh wrote:
>>>
>>> Hi Lukasz,
>>>
>>> Can you please confirm a few details?  These will help us understand
>>> what may be going on.  We may end up needing additional information, but
>>> this will help us identify a starting point for the investigation.
>>>
>>> **Storage Stack Configuration:**
>>> High Level Configuration: [e.g. SSD -> MD RAID 5 -> VDO -> XFS]
>>>
>>> Two servers, on each:
>>> Hardware RAID6, 54TB -> LVM -> VDO -> GlusterFS (XFS for bricks) ->
>>> Samba shares.
>>> Currently Samba and Gluster are disabled.
>>>
>>> Output of `blockdev --report`:
>>>
>>> [root at ixmed1 /]# blockdev --report
>>>
>>> RO    RA   SSZ   BSZ   StartSec            Size   Device
>>> rw   256   512  4096          0  59999990579200   /dev/sda
>>> rw   256   512  4096          0    238999830528   /dev/sdb
>>> rw   256   512  4096       2048      1073741824   /dev/sdb1
>>> rw   256   512  4096    2099200    216446009344   /dev/sdb2
>>> rw   256   512  4096  424845312     21479030784   /dev/sdb3
>>> rw   256   512  4096          0    119810293760   /dev/dm-0
>>> rw   256   512  4096          0     21470642176   /dev/dm-1
>>> rw   256   512  4096          0     32212254720   /dev/dm-2
>>> rw   256   512  4096          0     42949672960   /dev/dm-3
>>> rw   256   512  4096          0     21474836480   /dev/dm-4
>>> rw   256   512  4096          0  21990232555520   /dev/dm-5
>>> rw   256   512  4096          0     21474144256   /dev/drbd999
>>>
>>> Output of `lsblk -o name,maj:min,kname,type,fstype,state,sched,uuid`:
>>>
>>> [root at ixmed1 /]# lsblk -o name,maj:min,kname,type,fstype,state,sched,uuid
>>> lsblk: dm-6: failed to get device path
>>> lsblk: dm-6: failed to get device path
>>> NAME                           MAJ:MIN KNAME   TYPE FSTYPE   STATE SCHED    UUID
>>> sda                              8:0   sda     disk LVM2_mem runni deadline ggCzji-1O8d-BWCa-XwLe-BJ94-fwHa-cOseC0
>>> └─vgStorage-LV_vdo_Rada--ixmed 253:5   dm-5    lvm  vdo      runni          b668b2d9-96bf-4840-a43d-6b7ab0a7f235
>>> sdb                              8:16  sdb     disk          runni deadline
>>> ├─sdb1                           8:17  sdb1    part xfs            deadline f89ef6d8-d9f4-4061-8f48-3ffae8e23b1e
>>> ├─sdb2                           8:18  sdb2    part LVM2_mem       deadline pHO0UQ-aGWu-Hg6g-siiq-TGPT-kw4B-gD0fgs
>>> │ ├─vgSys-root                 253:0   dm-0    lvm  xfs      runni          4f48e2c7-6324-4465-953a-c1a9512ab782
>>> │ ├─vgSys-swap                 253:1   dm-1    lvm  swap     runni          97234c91-7804-43b2-944f-0122c90fc962
>>> │ ├─vgSys-cluster              253:2   dm-2    lvm  xfs      runni          97b4c285-4bfe-4d4f-8c3c-ca716157bf52
>>> │ └─vgSys-var                  253:3   dm-3    lvm  xfs      runni          6f5c860b-88e0-4d28-bc09-2e365299f86e
>>> └─sdb3                           8:19  sdb3    part LVM2_mem       deadline nvBfNi-qm2u-bt5T-dyCL-3FgQ-DSic-z8dUDq
>>>   └─vgSys-pgsql                253:4   dm-4    lvm  xfs      runni          5c3e18cc-9e0f-4c81-906b-3e68f196cafe
>>>     └─drbd999                  147:999 drbd999 disk xfs                     5c3e18cc-9e0f-4c81-906b-3e68f196cafe
>>>
>>>
>>> **Hardware Information:**
>>>  - CPU: [e.g. 2x Intel Xeon E5-1650 v2 @ 3.5GHz]
>>>  - Memory: [e.g. 128G]
>>>  - Storage: [e.g. Intel Optane SSD 900P]
>>>  - Other: [e.g. iSCSI backed storage]
>>>
>>> Huawei 5288 V5
>>> 64GB RAM
>>> 2 X Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz
>>> RAID: Symbios Logic MegaRAID SAS-3 3008 [Fury] (rev 02) (from lspci,
>>> megaraid_sas driver)
>>>
>>>
>>> **Distro Information:**
>>>  - OS: [e.g. RHEL-7.5]
>>>
>>> CentOS Linux release 7.5.1804 (Core)
>>>
>>>  - Architecture: [e.g. x86_64]
>>>
>>> x86_64
>>>
>>>  - Kernel: [e.g. kernel-3.10.0-862.el7]
>>>
>>> 3.10.0-862.el7
>>>
>>>  - VDO Version: [e.g. vdo-6.2.0.168-18.el7, or a commit hash]
>>>  - KVDO Version: [e.g. kmod-kvdo-6.2.0.153-15.el7, or a commit hash]
>>>
>>> [root at ixmed1 /]# yum list | grep vdo
>>> kmod-kvdo.x86_64                6.1.0.168-16.el7_5    @updates
>>> vdo.x86_64                      6.1.0.168-18          @updates
>>>
>>>  - LVM Version: [e.g. 2.02.177-4.el7]
>>>
>>> 2.02.177(2)-RHEL7 (2018-01-22)
>>>
>>>  - Output of `uname -a`: [e.g. Linux localhost.localdomain
>>> 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64
>>> x86_64 GNU/Linux]
>>>
>>> Linux ixmed1 3.10.0-862.el7.x86_64 #1 SMP Fri Apr 20 16:44:24 UTC 2018
>>> x86_64 x86_64 x86_64 GNU/Linux
>>>
>>>
>>> On Thu, Nov 5, 2020 at 6:49 AM Łukasz Michalski <lm at zork.pl> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have two 20T VDO devices that crashed during a power outage - one on
>>>> each of two servers.
>>>>
>>>> After the server restart I see in the logs on the first server:
>>>>
>>>> [root at ixmed1 /]# dmesg |grep vdo
>>>> [   11.223770] kvdo: modprobe: loaded version 6.1.0.168
>>>> [   11.904949] kvdo0:dmsetup: starting device 'vdo_test' device
>>>> instantiation 0 write policy auto
>>>> [   11.904979] kvdo0:dmsetup: underlying device, REQ_FLUSH: not
>>>> supported, REQ_FUA: not supported
>>>> [   11.904985] kvdo0:dmsetup: Using mode sync automatically.
>>>> [   11.905017] kvdo0:dmsetup: zones: 1 logical, 1 physical, 1 hash;
>>>> base threads: 5
>>>> [   11.966414] kvdo0:journalQ: Device was dirty, rebuilding reference
>>>> counts
>>>> [   12.452589] kvdo0:logQ0: Finished reading recovery journal
>>>> [   12.458550] kvdo0:logQ0: Highest-numbered recovery journal block has
>>>> sequence number 70548140, and the highest-numbered usable block is 70548140
>>>> [   12.458556] kvdo0:logQ0: Replaying entries into slab journals
>>>> [   13.538099] kvdo0:logQ0: Replayed 5568767 journal entries into slab
>>>> journals
>>>> [   14.174984] kvdo0:logQ0: Recreating missing journal entries
>>>> [   14.175025] kvdo0:journalQ: Synthesized 0 missing journal entries
>>>> [   14.177768] kvdo0:journalQ: Saving recovery progress
>>>> [   14.636416] kvdo0:logQ0: Replaying 2528946 recovery entries into
>>>> block map
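>>>>
>>>> (These replay messages can be followed live while the rebuild runs; a
>>>> small sketch, using standard tools on this kernel:
>>>>
>>>>     dmesg -w | grep kvdo    # follow new kernel messages as they appear
>>>>     journalctl -kf          # equivalent, via the journal
>>>> )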
>>>>
>>>> [root at ixmed1 /]# uptime
>>>>  12:41:33 up 1 day,  4:07,  2 users,  load average: 1.06, 1.05, 1.16
>>>>
>>>> [root at ixmed1 /]# ps ax |grep vdo
>>>>   1135 ?        Ss     0:00 /usr/bin/python /usr/bin/vdo start --all
>>>> --confFile /etc/vdoconf.yml
>>>>   1210 ?        R    21114668:39 dmsetup create vdo_Rada-ixmed --uuid
>>>> VDO-b668b2d9-96bf-4840-a43d-6b7ab0a7f235 --table 0 72301908952 vdo
>>>> /dev/disk/by-id/dm-name-vgStorage-LV_test 4096 disabled 0 32768 16380 on
>>>> auto vdo_test
>>>> ack=1,bio=4,bioRotationInterval=64,cpu=2,hash=1,logical=1,physical=1
>>>>   1213 ?        S      1:51 [kvdo0:dedupeQ]
>>>>   1214 ?        S      1:51 [kvdo0:journalQ]
>>>>   1215 ?        S      1:51 [kvdo0:packerQ]
>>>>   1216 ?        S      1:51 [kvdo0:logQ0]
>>>>   1217 ?        S      1:51 [kvdo0:physQ0]
>>>>   1218 ?        S      1:50 [kvdo0:hashQ0]
>>>>   1219 ?        S      1:52 [kvdo0:bioQ0]
>>>>   1220 ?        S      1:51 [kvdo0:bioQ1]
>>>>   1221 ?        S      1:51 [kvdo0:bioQ2]
>>>>   1222 ?        S      1:51 [kvdo0:bioQ3]
>>>>   1223 ?        S      1:48 [kvdo0:ackQ]
>>>>   1224 ?        S      1:49 [kvdo0:cpuQ0]
>>>>   1225 ?        S      1:49 [kvdo0:cpuQ1]
>>>>
>>>> The only activity I see is small writes, shown in 'atop', to the VDO's
>>>> underlying device.
>>>>
>>>> On the first server dmsetup takes 100% CPU (one core); on the second
>>>> server dmsetup seems to be idle.
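>>>>
>>>> (For reference, where the spinning process, pid 1210 here, is spending
>>>> its time can be inspected with, e.g.:
>>>>
>>>>     cat /proc/1210/stack    # kernel-side stack, if it is stuck in the kernel
>>>>     top -H -p 1210          # per-thread CPU usage of the process
>>>> )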
>>>>
>>>> What should I do in this situation?
>>>>
>>>> Regards,
>>>> Łukasz
>>>>
>>>>
>>>>