<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
Months ago, I worked on a NULL pointer dereference crash in the dm
mirror target. I worked out two patches<br>
to fix the crash, but when I was about to submit them, I found that
upstream had already "fixed" the crash by<br>
reverting the original commit. You can find the discussion here:<br>
<br>
- <a class="moz-txt-link-freetext" href="https://patchwork.kernel.org/patch/9808897/">https://patchwork.kernel.org/patch/9808897/</a><br>
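<br>
If you want to check whether a given kernel tree carries the original
fix or its revert, a quick way - just a sketch, assuming you have a
kernel git checkout and adjusting the version range to your tree - is
to grep the history for the commit subject:<br>
<pre class="content"># the revert's subject quotes the original one, so this matches both
git log --oneline --grep='dm mirror: use all available legs' v4.4..v4.14
</pre>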
<br>
<br>
Zdenek did voice his doubt, but nobody responded:<br>
"""<br>
<pre class="content"><span class="quote">>> Which kernel version is this ?</span>
<span class="quote">>></span>
<span class="quote">>> I'd thought we've already fixed this BZ for old mirrors:</span>
<span class="quote">>> <a class="moz-txt-link-freetext" href="https://bugzilla.redhat.com/show_bug.cgi?id=1382382">https://bugzilla.redhat.com/show_bug.cgi?id=1382382</a></span>
<span class="quote">>></span>
<span class="quote">>> There similar BZ for md-raid based mirrors (--type raid1)</span>
<span class="quote">>> <a class="moz-txt-link-freetext" href="https://bugzilla.redhat.com/show_bug.cgi?id=1416099">https://bugzilla.redhat.com/show_bug.cgi?id=1416099</a></span>
<span class="quote">> My base kernel version is 4.4.68, but with this 2 latest fixes applied:</span>
<span class="quote">> </span>
<span class="quote">> """</span>
<span class="quote">> Revert "dm mirror: use all available legs on multiple failures"</span>
Ohh - I've -rc6 - while this 'revert' patch went to 4.12-rc7.
I'm now starting to wonder why?
It's been a real fix for a real issue - and 'revert' message states
there is no such problem ??
I'm confused....
Mike - have you tried the sequence from BZ ?
Zdenek
</pre>
"""<br>
<br>
I was wrong to simply accept the situation:<br>
<br>
1. the crash did indeed disappear with the revert;<br>
2. the revert is likely the wrong fix, but I did not pursue it
further because<br>
people now mainly use raid1 instead of mirror - my fault for thinking
that way.<br>
<br>
But I felt it would be hard to persuade the maintainer to revert the
"reverting fixes"<br>
and try my fix instead.<br>
<br>
Anyway, why are you using mirror? Why not raid1?<br>
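<br>
If there is no strong reason to stay on the old mirror target,
converting the LV to raid1 is usually a one-step operation, and raid1
also gives you the scrubbing you asked about. A minimal sketch with
placeholder VG/LV names (please check the man pages on your LVM
version before running it):<br>
<pre class="content"># convert the existing dm-mirror LV to the md-raid based raid1 type
lvconvert --type raid1 vg00/lv_data

# raid1 LVs can be scrubbed: "check" only counts mismatches,
# "repair" rewrites them from a good copy
lvchange --syncaction check vg00/lv_data
lvchange --syncaction repair vg00/lv_data

# watch segment type, sync progress and mismatch count
lvs -a -o name,segtype,copy_percent,raid_mismatch_count,devices vg00
</pre>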
<br>
Eric<br>
<br>
<br>
<div class="moz-cite-prefix">On 02/05/2018 03:42 PM, Liwei wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAPE0SYxR2NtM_vdrqSMBsy==YT2MF8we_Q+HJ9Upeb6an2PLpQ@mail.gmail.com">
<div dir="auto">Hi Eric,
<div dir="auto"> Thanks for answering! Here are the details:</div>
<div dir="auto"><br>
</div>
<div dir="auto"># lvm version
<div dir="auto"> LVM version: 2.02.176(2) (2017-11-03)</div>
<div dir="auto"> Library version: 1.02.145 (2017-11-03)</div>
<div dir="auto"> Driver version: 4.37.0</div>
<div dir="auto"> Configuration: ./configure
--build=x86_64-linux-gnu --prefix=/usr
--includedir=${prefix}/include --mandir=${prefix}/share/man
--infodir=${prefix}/share/info --sysconfdir=/etc
--localstatedir=/var --disable-silent-rules
--libdir=${prefix}/lib/x86_64-linux-gnu
--libexecdir=${prefix}/lib/x86_64-linux-gnu
--runstatedir=/run --disable-maintainer-mode
--disable-dependency-tracking --exec-prefix= --bindir=/bin
--libdir=/lib/x86_64-linux-gnu --sbindir=/sbin
--with-usrlibdir=/usr/lib/x86_64-linux-gnu
--with-optimisation=-O2 --with-cache=internal
--with-clvmd=corosync --with-cluster=internal
--with-device-uid=0 --with-device-gid=6
--with-device-mode=0660 --with-default-pid-dir=/run
--with-default-run-dir=/run/lvm
--with-default-locking-dir=/run/lock/lvm
--with-thin=internal --with-thin-check=/usr/sbin/thin_check
--with-thin-dump=/usr/sbin/thin_dump
--with-thin-repair=/usr/sbin/thin_repair --enable-applib
--enable-blkid_wiping --enable-cmdlib --enable-cmirrord
--enable-dmeventd --enable-dbus-service --enable-lvmetad
--enable-lvmlockd-dlm --enable-lvmlockd-sanlock
--enable-lvmpolld --enable-notify-dbus --enable-pkgconfig
--enable-readline --enable-udev_rules --enable-udev_sync</div>
<div dir="auto"><br>
</div>
<div dir="auto"># uname -a<br>
</div>
<div dir="auto">Linux dataserv 4.14.0-3-amd64 #1 SMP Debian
4.14.13-1 (2018-01-14) x86_64 GNU/Linux</div>
<div dir="auto"><br>
</div>
<div dir="auto">Warm regards, </div>
<div dir="auto">Liwei</div>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On 5 Feb 2018 15:27, "Eric Ren" <<a
href="mailto:zren@suse.com" moz-do-not-send="true">zren@suse.com</a>>
wrote:<br type="attribution">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>
<br>
Your LVM version and kernel version please?<br>
<br>
like:<br>
""""<br>
# lvm version<br>
LVM version: 2.02.177(2) (2017-12-18)<br>
Library version: 1.03.01 (2017-12-18)<br>
Driver version: 4.35.0<br>
<br>
# uname -a<br>
Linux sle15-c1-n1 4.12.14-9.1-default #1 SMP Fri Jan 19
09:13:51 UTC 2018 (849a2fe) x86_64 x86_64 x86_64 GNU/Linux<br>
"""<br>
<br>
Eric<br>
<br>
On 02/03/2018 05:43 PM, Liwei wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi list,<br>
I had a LV that I was converting from linear to
mirrored (not<br>
raid1) whose source device failed partway-through during
the initial<br>
sync.<br>
<br>
I've since recovered the source device, but it seems
like the<br>
mirror is still acting as if some blocks are not readable?
I'm getting<br>
this in my logs, and the FS is full of errors:<br>
<br>
[ +1.613126] device-mapper: raid1: Unable to read primary
mirror<br>
during recovery<br>
[ +0.000278] device-mapper: raid1: Primary mirror
(253:25) failed<br>
while out-of-sync: Reads may fail.<br>
[ +0.085916] device-mapper: raid1: Mirror read failed.<br>
[ +0.196562] device-mapper: raid1: Mirror read failed.<br>
[ +0.000237] Buffer I/O error on dev dm-27, logical block
5371800560,<br>
async page read<br>
[ +0.592135] device-mapper: raid1: Unable to read primary
mirror<br>
during recovery<br>
[ +0.082882] device-mapper: raid1: Unable to read primary
mirror<br>
during recovery<br>
[ +0.246945] device-mapper: raid1: Unable to read primary
mirror<br>
during recovery<br>
[ +0.107374] device-mapper: raid1: Unable to read primary
mirror<br>
during recovery<br>
[ +0.083344] device-mapper: raid1: Unable to read primary
mirror<br>
during recovery<br>
[ +0.114949] device-mapper: raid1: Unable to read primary
mirror<br>
during recovery<br>
[ +0.085056] device-mapper: raid1: Unable to read primary
mirror<br>
during recovery<br>
[ +0.203929] device-mapper: raid1: Unable to read primary
mirror<br>
during recovery<br>
[ +0.157953] device-mapper: raid1: Unable to read primary
mirror<br>
during recovery<br>
[ +3.065247] recovery_complete: 23 callbacks suppressed<br>
[ +0.000001] device-mapper: raid1: Unable to read primary
mirror<br>
during recovery<br>
[ +0.128064] device-mapper: raid1: Unable to read primary
mirror<br>
during recovery<br>
[ +0.103100] device-mapper: raid1: Unable to read primary
mirror<br>
during recovery<br>
[ +0.107827] device-mapper: raid1: Unable to read primary
mirror<br>
during recovery<br>
[ +0.140871] device-mapper: raid1: Unable to read primary
mirror<br>
during recovery<br>
[ +0.132844] device-mapper: raid1: Unable to read primary
mirror<br>
during recovery<br>
[ +0.124698] device-mapper: raid1: Unable to read primary
mirror<br>
during recovery<br>
[ +0.138502] device-mapper: raid1: Unable to read primary
mirror<br>
during recovery<br>
[ +0.117827] device-mapper: raid1: Unable to read primary
mirror<br>
during recovery<br>
[ +0.125705] device-mapper: raid1: Unable to read primary
mirror<br>
during recovery<br>
[Feb 3 17:09] device-mapper: raid1: Mirror read failed.<br>
[ +0.167553] device-mapper: raid1: Mirror read failed.<br>
[ +0.000268] Buffer I/O error on dev dm-27, logical block
5367765816,<br>
async page read<br>
[ +0.135138] device-mapper: raid1: Mirror read failed.<br>
[ +0.000238] Buffer I/O error on dev dm-27, logical block
5367765816,<br>
async page read<br>
[ +0.000365] device-mapper: raid1: Mirror read failed.<br>
[ +0.000315] device-mapper: raid1: Mirror read failed.<br>
[ +0.000213] Buffer I/O error on dev dm-27, logical block
5367896888,<br>
async page read<br>
[ +0.000276] device-mapper: raid1: Mirror read failed.<br>
[ +0.000199] Buffer I/O error on dev dm-27, logical block
5367765816,<br>
async page read<br>
<br>
However, if I take down the destination device and
restart the LV<br>
with --activateoption partial, I can read my data and
everything<br>
checks out.<br>
<br>
My theory (and what I observed) is that lvm continued
the initial<br>
sync even after the source drive stopped responding, and
has now<br>
mapped the blocks that it 'synced' as dead. How can I make
lvm retry<br>
those blocks again?<br>
<br>
In fact, I don't trust the mirror anymore, is there a
way I can<br>
conduct a scrub of the mirror after the initial sync is
done? I read<br>
about --syncaction check, but seems like it only notes the
number of<br>
inconsistencies. Can I have lvm re-mirror the
inconsistencies from the<br>
source to destination device? I trust the source device
because we ran<br>
a btrfs scrub on it and it reported that all checksums are
valid.<br>
<br>
It took months for the mirror sync to get to this
stage (actually,<br>
why does it take months to mirror 20TB?), I don't want to
start it all<br>
over again.<br>
<br>
Warm regards,<br>
Liwei<br>
<br>
_______________________________________________<br>
linux-lvm mailing list<br>
<a href="mailto:linux-lvm@redhat.com" target="_blank"
moz-do-not-send="true">linux-lvm@redhat.com</a><br>
<a
href="https://www.redhat.com/mailman/listinfo/linux-lvm"
rel="noreferrer" target="_blank" moz-do-not-send="true">https://www.redhat.com/mailman<wbr>/listinfo/linux-lvm</a><br>
read the LVM HOW-TO at <a
href="http://tldp.org/HOWTO/LVM-HOWTO/" rel="noreferrer"
target="_blank" moz-do-not-send="true">http://tldp.org/HOWTO/LVM-HOWT<wbr>O/</a><br>
<br>
</blockquote>
<br>
</blockquote>
</div>
</div>
</blockquote>
<br>
</body>
</html>