[dm-devel] fstrim discarding too many or wrong blocks on Linux 5.1, leading to data loss

Andrea Gelmini andrea.gelmini at gmail.com
Mon May 20 13:53:30 UTC 2019


I had the same issue on a similar (well, almost exactly the same) setup, on
a machine in production.
It holds more than 4 TB of data, so in the end I dd'ed the image back and
restarted; sticking to the 5.0.y branch, I never had the problem again.
I was able to replicate it, on a more recent Samsung SSD model.
Not with btrfs but with ext4, by the way.
I saw that a large initial part of the LVM partition had been discarded: I
can't find any superblock copies in the first half, only towards the end of
the logical volume.

Sorry, I can't experiment with it again, but I have the whole (4 TB) dd
image with the bug.


Ciao,
Gelma

On Mon, May 20, 2019 at 02:38, Michael Laß <bevan at bi-co.net> wrote:

> CC'ing dm-devel, as this seems to be a dm-related issue. Short summary for
> new readers:
>
> On Linux 5.1 (tested up to 5.1.3), fstrim may discard too many blocks,
> leading to data loss. I have the following storage stack:
>
> btrfs
> dm-crypt (LUKS)
> LVM logical volume
> LVM single physical volume
> MBR partition
> Samsung 830 SSD
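>
> For reference, a stack like this can be assembled roughly as follows
> (device names and sizes here are placeholders, not my actual values):
>
>   pvcreate /dev/sdX1                          # PV on an MBR partition
>   vgcreate vg_system /dev/sdX1
>   lvcreate -L 100G -n btrfs vg_system         # LVM logical volume
>   cryptsetup luksFormat /dev/vg_system/btrfs  # dm-crypt (LUKS)
>   cryptsetup open /dev/vg_system/btrfs root
>   mkfs.btrfs /dev/mapper/root                 # btrfs on top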
>
> The mapping between logical volumes and physical segments is a bit mixed
> up; see below for the output of “pvdisplay -m”. When I issue fstrim on the
> mounted btrfs volume, I get the following kernel messages:
>
> attempt to access beyond end of device
> sda1: rw=16387, want=252755893, limit=250067632
> BTRFS warning (device dm-5): failed to trim 1 device(s), last error -5
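>
> A quick check of those numbers shows how far past the device the request
> reaches (sectors are 512 bytes):
>
>   echo $((252755893 - 250067632))   # 2688261 sectors beyond the end
>   echo $((2688261 * 512))           # 1376389632 bytes, roughly 1.3 GiB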
>
> At the same time, other logical volumes on the same physical volume are
> destroyed. The btrfs volume itself may also be damaged (this seems to
> depend on the actual usage).
>
> I can easily reproduce this issue locally and I’m currently bisecting. So
> far I have been able to narrow the range of commits down to:
> Good: 92fff53b7191cae566be9ca6752069426c7f8241
> Bad: 225557446856448039a9e495da37b72c20071ef2
>
> In this range of commits, there are only dm-related changes.
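>
> For anyone following along, the bisection itself is the standard
> procedure, with the commits above as bounds:
>
>   git bisect start
>   git bisect bad  225557446856448039a9e495da37b72c20071ef2
>   git bisect good 92fff53b7191cae566be9ca6752069426c7f8241
>   # build and boot each suggested commit, run fstrim on a disposable
>   # test volume, then mark it "git bisect good" or "git bisect bad"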
>
> So far, I have not reproduced the issue with other file systems or a
> simplified stack. I first want to continue bisecting but this may take
> another day.
>
>
> > On 18.05.2019 at 12:26, Qu Wenruo <quwenruo.btrfs at gmx.com> wrote:
> > On 2019/5/18 5:18 PM, Michael Laß wrote:
> >>
> >>> On 18.05.2019 at 06:09, Chris Murphy <lists at colorremedies.com> wrote:
> >>>
> >>> On Fri, May 17, 2019 at 11:37 AM Michael Laß <bevan at bi-co.net> wrote:
> >>>>
> >>>>
> >>>> I tried to reproduce this issue: I recreated the btrfs file system,
> >>>> set up a minimal system and issued fstrim again. It printed the
> >>>> following error message:
> >>>>
> >>>> fstrim: /: FITRIM ioctl failed: Input/output error
> >>>
> >>> Huh. Any kernel message at the same time? I would expect any fstrim
> >>> user space error message to also have a kernel message. Any i/o error
> >>> suggests some kind of storage stack failure - which could be hardware
> >>> or software, you can't know without seeing the kernel messages.
> >>
> >> I missed that. The kernel messages are:
> >>
> >> attempt to access beyond end of device
> >> sda1: rw=16387, want=252755893, limit=250067632
> >> BTRFS warning (device dm-5): failed to trim 1 device(s), last error -5
> >>
> >> Here is some more information on the partitions and the LVM physical
> >> segments:
> >>
> >> fdisk -l /dev/sda:
> >>
> >> Device     Boot Start       End   Sectors   Size Id Type
> >> /dev/sda1  *     2048 250069679 250067632 119.2G 8e Linux LVM
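> >>
> >> Note that the sector count of sda1 matches the limit in the kernel
> >> message above. It can also be read directly (in 512-byte sectors) with:
> >>
> >>   blockdev --getsz /dev/sda1   # 250067632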
> >>
> >> pvdisplay -m:
> >>
> >>  --- Physical volume ---
> >>  PV Name               /dev/sda1
> >>  VG Name               vg_system
> >>  PV Size               119.24 GiB / not usable <22.34 MiB
> >>  Allocatable           yes (but full)
> >>  PE Size               32.00 MiB
> >>  Total PE              3815
> >>  Free PE               0
> >>  Allocated PE          3815
> >>  PV UUID               mqCLFy-iDnt-NfdC-lfSv-Maor-V1Ih-RlG8lP
> >>
> >>  --- Physical Segments ---
> >>  Physical extent 0 to 1248:
> >>    Logical volume    /dev/vg_system/btrfs
> >>    Logical extents   2231 to 3479
> >>  Physical extent 1249 to 1728:
> >>    Logical volume    /dev/vg_system/btrfs
> >>    Logical extents   640 to 1119
> >>  Physical extent 1729 to 1760:
> >>    Logical volume    /dev/vg_system/grml-images
> >>    Logical extents   0 to 31
> >>  Physical extent 1761 to 2016:
> >>    Logical volume    /dev/vg_system/swap
> >>    Logical extents   0 to 255
> >>  Physical extent 2017 to 2047:
> >>    Logical volume    /dev/vg_system/btrfs
> >>    Logical extents   3480 to 3510
> >>  Physical extent 2048 to 2687:
> >>    Logical volume    /dev/vg_system/btrfs
> >>    Logical extents   0 to 639
> >>  Physical extent 2688 to 3007:
> >>    Logical volume    /dev/vg_system/btrfs
> >>    Logical extents   1911 to 2230
> >>  Physical extent 3008 to 3320:
> >>    Logical volume    /dev/vg_system/btrfs
> >>    Logical extents   1120 to 1432
> >>  Physical extent 3321 to 3336:
> >>    Logical volume    /dev/vg_system/boot
> >>    Logical extents   0 to 15
> >>  Physical extent 3337 to 3814:
> >>    Logical volume    /dev/vg_system/btrfs
> >>    Logical extents   1433 to 1910
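> >>
> >> A quick sanity check on the numbers above: all 3815 PEs are allocated,
> >> and they account for the full PV:
> >>
> >>   echo $((3815 * 32))   # 122080 MiB of 32 MiB extents, ~119.22 GiB
> >>
> >> The same mapping can also be inspected from the LV side, e.g. with:
> >>
> >>   lvs --segments -o +seg_pe_ranges vg_system
> >>   lvdisplay --maps /dev/vg_system/btrfs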
> >>
> >>
> >> Would btrfs even be able to accidentally trim parts of other LVs, or
> >> does this clearly hint towards an LVM/dm issue?
> >
> > I can't say for sure, but (at least for the latest kernel) btrfs has a
> > lot of extra mount-time self checks, including a chunk stripe check
> > against the underlying device, so the chance that btrfs is at fault
> > shouldn't be that high.
>
> Indeed, bisecting the issue led me to a range of commits that only
> contains dm-related and no btrfs-related changes. So I assume this is a bug
> in dm.
>
> >> Is there an easy way to somehow trace the trim through the different
> >> layers so one can see where it goes wrong?
> >
> > Sure, you could use dm-log-writes.
> > It will record all reads and writes (including trims) for later replay.
> >
> > So in your case, you can build the storage stack like:
> >
> > Btrfs
> > <dm-log-writes>
> > LUKS/dmcrypt
> > LVM
> > MBR partition
> > Samsung SSD
> >
> > Then replay the log (using src/log-writes/replay-log in fstests) with
> > verbose output; you can then verify every trim operation against the
> > dmcrypt device size.
> >
> > If all trims are fine, then move dm-log-writes one layer lower, until
> > you find which layer is causing the problem.
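> >
> > Roughly, something like this (device names here are placeholders, and
> > replay-log is built from the fstests source tree):
> >
> >   # insert log-writes between btrfs and the dm-crypt device
> >   SZ=$(blockdev --getsz /dev/mapper/root)
> >   dmsetup create logwrites --table "0 $SZ log-writes /dev/mapper/root /dev/sdLOG"
> >   mount /dev/mapper/logwrites /mnt && fstrim /mnt && umount /mnt
> >   dmsetup remove logwrites
> >
> >   # replay the recorded operations verbosely onto a scratch device
> >   ./src/log-writes/replay-log -v --log /dev/sdLOG --replay /dev/scratch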
>
> That sounds like a plan! However, I first want to continue bisecting as I
> am afraid to lose my reproducer by changing parts of my storage stack.
>
> Cheers,
> Michael
>
> >
> > Thanks,
> > Qu
> >>
> >> Cheers,
> >> Michael
> >>
> >> PS: Current state of bisection: It looks like the error was introduced
> >> somewhere between b5dd0c658c31b469ccff1b637e5124851e7a4a1c and v5.1.
>
>