[linux-lvm] lvm metadata sequence number reverts

Aaron Young aaron.young at ctl.io
Wed Sep 16 22:31:03 UTC 2015


Yes, I have lots of data to share; I thought I would first open at a high level.
This is all happening inside a single VM. The archive is enabled and I will post
the relevant files shortly. No lvmetad. No errors that I can see (at least not
on the console or in syslog).

root@VA1CTLT-SRN2-03:/etc/lvm/archive# grep seqno test_dvol-13-vg_00*
test_dvol-13-vg_00261-1410850844.vg: seqno = 0   <-- before vgcreate
test_dvol-13-vg_00262-1188507802.vg: seqno = 1   <-- before lvcreate 1
test_dvol-13-vg_00263-1818746321.vg: seqno = 2   <-- before lvcreate 2
test_dvol-13-vg_00264-1122545952.vg: seqno = 3   <-- before lvcreate 3
test_dvol-13-vg_00265-1497145254.vg: seqno = 4   <-- before lvcreate 4
test_dvol-13-vg_00266-1300493675.vg: seqno = 5   <-- before lvs
test_dvol-13-vg_00267-490193445.vg: seqno = 4    <-- device cache disabled, lvs
test_dvol-13-vg_00268-2051497792.vg: seqno = 4   <-- device cache disabled, lvs
test_dvol-13-vg_00269-370016695.vg: seqno = 5    <-- device cache enabled, lvs
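When the listing gets long (the problem only shows up after dozens to hundreds of iterations), a small script can flag every place where the archived seqno goes backwards. This is a minimal sketch, assuming the input is `grep seqno` output in the form shown above; the function name and the script name in the usage comment are hypothetical. Because the archive file names carry a zero-padded counter (00261, 00262, ...), the shell's alphabetical glob order is also chronological here.

```python
import re
import sys

# Matches grep output such as
#   "test_dvol-13-vg_00267-490193445.vg: seqno = 4"
# (grep prefixes each match with the archive file name; the separator after
# the colon may be a space or a tab).
SEQNO_LINE = re.compile(r"^(?P<file>\S+\.vg):\s*seqno\s*=\s*(?P<seqno>\d+)")


def find_seqno_regressions(lines):
    """Return (prev_file, prev_seqno, file, seqno) tuples for every archive
    entry whose seqno is lower than that of the entry listed before it."""
    regressions = []
    prev = None
    for line in lines:
        m = SEQNO_LINE.match(line.strip())
        if not m:
            continue  # skip blank or unrelated lines
        cur = (m.group("file"), int(m.group("seqno")))
        if prev is not None and cur[1] < prev[1]:
            regressions.append((prev[0], prev[1], cur[0], cur[1]))
        prev = cur
    return regressions


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: grep seqno test_dvol-13-vg_00* > seqnos.txt
    #                     python3 seqno_check.py seqnos.txt
    with open(sys.argv[1]) as f:
        for p_file, p_seq, c_file, c_seq in find_seqno_regressions(f):
            print(f"seqno went back from {p_seq} ({p_file}) to {c_seq} ({c_file})")
```

On the listing above this would report a single regression, from seqno 5 in the 00266 archive to seqno 4 in the 00267 archive.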

The contents of the metadata area seem to be the same in both states (both
contain seqno 5):

dd if=/dev/sbd13 bs=1M count=1 skip=1 of=sbd13.nocache   # device cache disabled
dd if=/dev/sbd13 bs=1M count=1 skip=1 of=sbd13.cache     # device cache enabled

cmp sbd13.nocache sbd13.cache
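Plain `cmp` stops at the first differing byte. If the two dumps ever do differ, it may help to list every differing 512-byte sector so the differences can be lined up against the byte offsets lvm prints (e.g. 8704). A minimal sketch, assuming 512-byte sectors; the script name is hypothetical. Note that the dumps start 1 MiB into /dev/sbd13, so their offsets only match offsets on the PV if the partition starts at 1 MiB, which is an assumption here.

```python
import sys

SECTOR_SIZE = 512  # assumed logical sector size


def differing_sectors(a: bytes, b: bytes, sector_size: int = SECTOR_SIZE):
    """Return the indexes of the sectors whose contents differ between two
    dumps; a length mismatch shows up as differing trailing sectors."""
    diffs = []
    for start in range(0, max(len(a), len(b)), sector_size):
        if a[start:start + sector_size] != b[start:start + sector_size]:
            diffs.append(start // sector_size)
    return diffs


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python3 sector_diff.py sbd13.nocache sbd13.cache
    with open(sys.argv[1], "rb") as fa, open(sys.argv[2], "rb") as fb:
        diffs = differing_sectors(fa.read(), fb.read())
    if diffs:
        for s in diffs:
            print(f"sector {s} (byte offset {s * SECTOR_SIZE}) differs")
    else:
        print("dumps are identical")
```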

I located these sectors by running strace on pvcreate, vgcreate and
lvcreate. As far as I can tell, all the sectors involved are written
correctly.

Random facts:
1. Device-mapper still correctly lists the logical volume that is missing
   from lvs.
2. Kernel 3.13.0-44-generic, Ubuntu 14.04.
3. lvm version output:
     LVM version:     2.02.98(2) (2012-10-15)
     Library version: 1.02.77 (2012-10-15)
     Driver version:  4.27.0

A suspicious snippet from the lvscan -vvv output:

/dev/mapper/sbd13p1: lvm2 label detected at sector 1
lvmcache: /dev/mapper/sbd13p1: now in VG #orphans_lvm2 (#orphans_lvm2) with 1 mdas
/dev/mapper/sbd13p1: Found metadata at 8704 size 1749 (in area at 4096 size 1044480) for test_dvol-13-vg (DFvQDG-nYVS-QQlT-Uv35-aPr4-2pY0-zMQ0dr)
lvmcache: /dev/mapper/sbd13p1: now in VG test_dvol-13-vg with 1 mdas
lvmcache: /dev/mapper/sbd13p1: setting test_dvol-13-vg VGID to DFvQDGnYVSQQlTUv35aPr42pY0zMQ0dr
lvmcache: /dev/mapper/sbd13p1: VG test_dvol-13-vg: Set creation host to VA1CTLT-SRN2-03.
Allocated VG test_dvol-13-vg at 0x257bc00.
Using cached label for /dev/mapper/sbd13p1
Read test_dvol-13-vg metadata (4) from /dev/mapper/sbd13p1 at 8704 size 1749
/dev/mapper/sbd13p1 0:      0     19: VM-test_dvol-13-0-hard-drive-0(0:0)
/dev/mapper/sbd13p1 1:     19     19: VM-test_dvol-13-0-hard-drive-1(0:0)
/dev/mapper/sbd13p1 2:     38     19: VM-test_dvol-13-1-hard-drive-0(0:0)
/dev/mapper/sbd13p1 3:     57     42: NULL(0:0)   <---- missing logical volume
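To compare the "right" and "wrong" runs, it may help to pull every "Read ... metadata (N) ..." line out of the -vvv output and see which seqno and offset each run actually used. A minimal sketch; the regular expression assumes the wording of the line shown above (lvm2 2.02.98) and may need adjusting for other versions, and the function and script names are hypothetical.

```python
import re
import sys

# Assumed to match lvm2 2.02.98 -vvv lines such as
#   "Read test_dvol-13-vg metadata (4) from /dev/mapper/sbd13p1 at 8704 size 1749"
# \s+ between tokens also tolerates lines wrapped by a mail client.
READ_LINE = re.compile(
    r"Read\s+(?P<vg>\S+)\s+metadata\s+\((?P<seqno>\d+)\)\s+"
    r"from\s+(?P<dev>\S+)\s+at\s+(?P<offset>\d+)\s+size\s+(?P<size>\d+)"
)


def metadata_reads(log_text):
    """Return one dict per metadata read reported in the -vvv output."""
    return [
        {
            "vg": m["vg"],
            "seqno": int(m["seqno"]),
            "dev": m["dev"],
            "offset": int(m["offset"]),
            "size": int(m["size"]),
        }
        for m in READ_LINE.finditer(log_text)
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: lvscan -vvv 2> lvscan.log; python3 reads.py lvscan.log
    with open(sys.argv[1]) as f:
        for r in metadata_reads(f.read()):
            print(f"{r['vg']}: seqno {r['seqno']} from {r['dev']} at {r['offset']}")
```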

I don't understand how this is possible if the data at that offset (byte 8704,
per the log) is identical in both cases.
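As far as I understand lvm2's text metadata format (lib/format_text/layout.h), the reader does not pick the highest seqno it can find in the metadata area. It first reads the small mda_header at the start of the area ("in area at 4096" above), whose raw_locn entries point at the current copy of the metadata inside a circular buffer; older copies can remain in the buffer. Dumping that header in the "right" and "wrong" states could show which location it points at. A minimal sketch under that assumed layout: the field order below is my reading of layout.h, checksums are not verified, metadata that wraps around the end of the buffer is not handled, and the script name is hypothetical.

```python
import struct
import sys

# Assumed on-disk layout of the LVM2 text-format mda_header, as defined in
# lvm2's lib/format_text/layout.h (all fields little-endian):
#   uint32 checksum_xl; char magic[16]; uint32 version;
#   uint64 start; uint64 size; then a zero-terminated list of raw_locn
#   entries {uint64 offset; uint64 size; uint32 checksum; uint32 flags}.
FMTT_MAGIC = b" LVM2 x[5A%r0N*>"
HEADER = struct.Struct("<I16sIQQ")  # 40 bytes
RAW_LOCN = struct.Struct("<QQII")  # 24 bytes
MDA_HEADER_SIZE = 512


def parse_mda_header(buf: bytes):
    """Decode an mda_header; checksums are not verified in this sketch."""
    _checksum, magic, version, start, size = HEADER.unpack_from(buf, 0)
    if magic != FMTT_MAGIC:
        raise ValueError("no LVM2 text-format magic at this offset")
    locns = []
    pos = HEADER.size
    while pos + RAW_LOCN.size <= len(buf):
        offset, lsize, checksum, flags = RAW_LOCN.unpack_from(buf, pos)
        if offset == 0:  # a zeroed entry terminates the list
            break
        locns.append({"offset": offset, "size": lsize,
                      "checksum": checksum, "flags": flags})
        pos += RAW_LOCN.size
    return {"version": version, "start": start, "size": size, "raw_locns": locns}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 mda_header.py /dev/mapper/sbd13p1
    area_start = 4096  # "in area at 4096" in the lvscan -vvv output above
    with open(sys.argv[1], "rb") as dev:
        dev.seek(area_start)
        hdr = parse_mda_header(dev.read(MDA_HEADER_SIZE))
    for loc in hdr["raw_locns"]:
        # raw_locn offsets are relative to the start of the metadata area
        print(f"current metadata at {hdr['start'] + loc['offset']} size {loc['size']}")
```

Under this assumed layout, the log line "Found metadata at 8704 ... (in area at 4096 ...)" would correspond to a header with start 4096 and a first raw_locn offset of 4608.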

Attached are two verbose straces of vgdisplay: one that discovers 3 logical
volumes and one that discovers 4.
I am looking for insight into which on-disk contents this discovery depends
on. Thank you very much.

Aaron



On Wed, 16 Sep 2015 at 03:05 Zdenek Kabelac <zkabelac at redhat.com> wrote:

> On 15.9.2015 at 23:18, Aaron Young wrote:
> > Hello, I'm deep into debugging an issue we have with a disk driver of ours
> > and LVM.
> >
> > Long story short:
> >
> > create vg -> seqno 1
> > create lv1 -> seqno 2
> > create lv2 -> seqno 3
> > create lv3 -> seqno 4
> > create lv4 -> seqno 5
> > <clear our device cache> (note: this generates no I/O)
> > vgdisplay: seqno = 4, lv4 is missing
> >
> > * This happens only after dozens to hundreds of iterations. Most of the
> > time it is fine.
> >
> > I dd'd all the metadata blocks off the PV, and yes, seqno 5 is in the
> > on-disk metadata area, perfectly fine. But the system believes 4 is the
> > current version. Shouldn't the system be using the highest value? Or is it
> > stored somewhere? What mechanism is responsible for changing the seqno, and
> > where does it change it? (Not the metadata contents, just the number.)
>
>
> Hi
>
> Your email is quite 'mystic'; I'd need a lot of crystal balls to see your
> surrounding conditions.
>
>
> 1.) Is this a 'clustered' environment or a 'single' host setup?
>
> 2.) Do you have 'archive' backup enabled? Can you check what the last
> operations in the history were before the problem happens?
>
> 3.) Are you using 'lvmetad'? (If so, try use_lvmetad=0.)
>
> 4.) Kernel version, lvm2 version?
>
> 5.) Was there any lvm2 command error?
> (vgdisplay may just make a backup of the most recent metadata in case it is
> missing after some command failure.)
>
> Zdenek
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vgdisplay.strace.right
Type: application/octet-stream
Size: 1761794 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-lvm/attachments/20150916/b56700ea/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vgdisplay.strace.wrong
Type: application/octet-stream
Size: 1750096 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-lvm/attachments/20150916/b56700ea/attachment-0001.obj>

