[dm-devel] can we fix some dm snapshot crashes please? :)

Fri May 7 18:42:16 UTC 2021

07.05.2021 18:20, Mike Snitzer wrote:
> On Fri, May 07 2021 at 10:10P -0400,
> Michael Tokarev <mjt at tls.msk.ru> wrote:
> 
>> 07.05.2021 15:31, Zdenek Kabelac wrote:
>>> Dne 07. 05. 21 v 12:31 Michael Tokarev napsal(a):
>> ...
>>>>    sz=$(blockdev --getsize /dev/loop0)
>>>>    dmsetup create base --table "0 $sz snapshot-origin /dev/loop0"
>>>>    # and now the crash
>>>>    mkfs.ext4 /dev/mapper/base
>>
>>> Yes reproducible - can you please open BZ report here:
>>>
>>> https://bugzilla.redhat.com/enter_bug.cgi?product=LVM%20and%20device-mapper
>>
>> Ok, will do, thanks!
> 
> Thanks.  But when you do, please tone down the lamenting of how slow
> snapshot crashes have been fixed -- or don't.  But that line of
> rhetoric shows you're full of it.

That's a joke really - note the smile at the end of my statement.

However, this is true: this particular crash has been here for a long time,
someone else asked about it earlier too, and I asked about it here 1.5 years
ago when trying snapshots the previous time (actually when just trying to
_understand_ how to do it properly).

Not really: it is just a joke, and I know full well how it all is done.

..
> Yes, it is dangerous to stab in the dark like you clearly are doing.

I'd LOVE to know how to do it properly. Unfortunately I found no
information about it. There's one usage scenario outlined in a few
places which talk about the topic, and all of them do it similarly to
how LVM does it. But there are actually numerous usage scenarios for
the feature, and it can definitely be used in numerous ways. The few
mentions of dm snapshots offer one particular scenario but do not show
even the basic rules. There's an email (on this list IIRC) from about a
decade ago which defines a similar scenario and which, apparently, is
the only actual source of information about the whole thing, and it
again does not describe the game _rules_ :)

What you see as "I'm full of it" is a laugh actually: there are no
defined rules (for which many users asked in various places), there
are crashes, and each time one points to a crash someone answers
"you're using the wrong rules". Which turns into a vicious circle... :)

My actual intention is two-fold: besides trying to make the thing
less dangerous, I'd *love* to write some clearer docs, describing
not only one particular scenario but the actual rules, so to say.

There's a significant gap between developers understanding how this
layer and this particular feature of this layer work, and the rest
of the world. Somehow this gap seems to be a bit more significant
than it is for many other areas (I've been a (positive! :) ) sysadmin
and system developer for 20+ years and know how things work).

Just to give an example: I for one was puzzled by dmsetup reload:
why is it not actually replacing the table *and* yet returning success??
I started digging into the kernel sources and doing some experiments,
and just by chance discovered that the table *will* be replaced
by the next "dmsetup resume", even for a non-suspended device!
This might be as obvious as air to people who know some of the
internals, but it is non-obvious even to some users who have been
doing things in this area for years. I asked several friends of mine
who have worked with dmsetup and definitely know it: no, they never
*tried* to reload a table without an explicit suspend-resume, so the
operation is always 3-fold instead of a single step.  While the docs
mention it somehow, one has to already have the question or a guess in
order to notice or find this in the docs. Also the docs mention, like,
"should be suspended or else something can go wrong if the device is
in active use" - note the "if". So I assumed that nothing can go
wrong if the device is not in use. And finally, maybe this is
just somewhat awkwardly designed software, when it requires the
same 3 operations for what is a single operation?
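
A minimal dry-run sketch of the 3-fold table replacement described
above. The device name and table passed to it are placeholders, and the
helper names (run, swap_table) and the DRY_RUN switch are made up for
illustration. By default it only prints the dmsetup commands; it is a
sketch of the sequence, not a statement of how it must be done.

```shell
#!/bin/sh
# Dry-run by default: print commands instead of executing them.
# Set DRY_RUN=0 to really run them (needs root and an existing dm device).
DRY_RUN=${DRY_RUN:-1}

# run CMD ARGS...: hypothetical helper, prints or executes a command.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# swap_table NAME TABLE: hypothetical helper, replaces the table of NAME.
swap_table() {
    name=$1
    table=$2
    run dmsetup suspend "$name"                  # quiesce I/O on the device
    run dmsetup reload "$name" --table "$table"  # load into the inactive slot
    run dmsetup table --inactive "$name"         # show what is staged
    run dmsetup resume "$name"                   # inactive table becomes live
}
```

For example, swap_table base "0 2048 linear /dev/loop0 0" prints the
four commands in order. The point is that "reload" alone only stages
the new table; it is the following "resume" that makes it live.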

Back to this snapshot-origin thing, to its *usage*. The usual
example given in a few places says:

  1. create a linear 1:1 mapping of the base volume
  2. mount it
  3. create a copy of the mapping from step 1
  4. suspend the mapping from step 1
  5. create snapshot device of the device from step 3
  6. *replace* the original mapping from step 1 with snapshot-origin
     device based on the mapping from step 3
  7. resume the snapshot-origin device
  8. the result is that the filesystem mounted in step 2
     is now based on the snapshot-origin instead of the linear mapping.

(I can have something backwards or slightly wrong but the concept
should be the same).
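
The steps above can be sketched roughly as follows. This is a dry-run
sketch under assumptions, not a verified recipe: the dm names (base,
base-real, snap), the /mnt mount point, the separate COW device, the
persistent flag "P" and the chunk size of 16 sectors are all picked for
illustration, and the helper names are made up. It prints the commands
by default instead of running them.

```shell
#!/bin/sh
# Dry-run by default: print commands instead of executing them.
# Set DRY_RUN=0 to really run them (needs root; will touch the devices).
DRY_RUN=${DRY_RUN:-1}

# run CMD ARGS...: hypothetical helper, prints or executes a command.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# snapshot_setup ORIGIN_DEV COW_DEV SECTORS: hypothetical helper.
# SECTORS is the origin size in 512-byte sectors (blockdev --getsize).
snapshot_setup() {
    origin=$1
    cow=$2
    sz=$3
    # step 1: linear 1:1 mapping of the base volume
    run dmsetup create base --table "0 $sz linear $origin 0"
    # step 2: mount it (mount point is an assumption)
    run mount /dev/mapper/base /mnt
    # step 3: a copy of the mapping from step 1
    run dmsetup create base-real --table "0 $sz linear $origin 0"
    # step 4: suspend the mapping from step 1
    run dmsetup suspend base
    # step 5: snapshot of the device from step 3, with a COW device
    run dmsetup create snap --table "0 $sz snapshot /dev/mapper/base-real $cow P 16"
    # step 6: replace the table of "base" with snapshot-origin
    run dmsetup reload base --table "0 $sz snapshot-origin /dev/mapper/base-real"
    # step 7: resume; the mounted filesystem now sits on snapshot-origin
    run dmsetup resume base
}
```

Calling snapshot_setup /dev/loop0 /dev/loop1 2048 prints the seven
commands, one per step, which makes it easy to compare the sequence
against what LVM sets up before trying anything on real devices.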

This raises tons of questions. The first question is why we need
to *replace* the table, and why we need to mount it before replacing?
Why can't we create the snapshot-origin device directly and mount that
(if it needs to be mounted, and if that's the thing which ends up
mounted anyway) instead of replacing a linear mapping?

...hmm. After re-reading the docs in the Gentoo wiki, I think I see
the issue in my (mis)understanding, even if it is not explicitly clear.
It *might* be that snapshot-origin is not a separate device which is
useful on its own, but is a "side-device", a "marker" to detect writes
to the base/origin device for one or more snapshots?? And the mount
should be done of the original linear mapping created in step 1?

If that's the case, please count me twice.. because 1.5 years ago
when I tried this for the first time, I made the same exact mistake.
Because it is still unclear to me how it is actually used...

I'll try it a bit later...

Maybe my confusion is not sufficient to refine the docs, I dunno..
But this is an example of an attempted usage after actually *trying*
to understand how it works... twice :)

Thanks!

/mjt