[linux-lvm] about the lying nature of thin

Xen list at xenhideout.nl
Tue May 3 13:03:37 UTC 2016


Mark Mielke wrote on 30-04-2016 6:46:

> Lots of interesting ideas in this thread.

Thank you for your sane response.

> There was some discussion about how data is presented to the higher
> layers. I didn't follow the suggestion exactly (communicating layout
> information?), but I did have these thoughts:
> 
> 	* When the storage runs out, it clearly communicates layout
> information to the caller in the form of a boolean "does it work or
> not?"
> 	* There are other ways that information does get communicated, such
> as if a device becomes read only. For example, an iSCSI LUN.
> 
> I didn't follow communication of specific layout information as this
> didn't really make sense to me when it comes to dynamic allocation.
> But, if the intent is to provide early warning of the likelihood of
> failure, compared to waiting to the very last minute where it has
> already failed, it seems like early warning would be useful. I did
> have a question about the performance of this type of communication,
> however, as I wouldn't want the host to be constantly polling the
> storage to recalculate the up-to-date storage space available.

Zdenek alluded to the fact that this continuous polling would either be 
required or be very unkind to the hardware, in the sense of being hugely 
expensive. Of course I do not know everything about a system before I 
start thinking about it. If I have an idea, it is usually possible to 
implement it, but I only find out further down the road whether that is 
actually so and whether it needs amending. I could not make progress if 
every idea had to be 100% certain before I could start on it, because 
then the starting and the learning would never happen.

I didn't know thin (or LVM) doesn't maintain maps of used blocks.

Of course for regular LVM it would make no sense, since how the blocks 
you have allocated to a system get used is none of its concern.

The recent DISCARD improvements apparently just signal some special case 
(?) but SSDs DO maintain maps or it wouldn't even work (?).

I don't know, it would seem that having a map of used extents in a thin 
pool is in some way deeply important in being able to allocate unused 
ones?

I would have to dig into it of course but I am sure I would be able to 
find some information (and not lies ;-))).

I guess continuous polling would be deeply disrespectful of the hardware 
and software resources.

In the theoretical system I proposed there would be constant 
communication between the systems, bogging down resources. But we must 
agree we are typically talking about 4MB blocks here (and changes to 
them). You could easily increase that to 16MB, or 32MB, or whatever.
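
To put rough numbers on that, here is a minimal sketch of how the size of 
such a map shrinks as the region size grows. The 1 TiB volume and the 
helper name region_count are assumptions for illustration only, not 
anything LVM defines.

```python
# Hypothetical sizing helper: how many map entries a volume needs at a
# given region size. Volume size and names are illustrative assumptions.
MiB = 1024 ** 2
TiB = 1024 ** 4


def region_count(volume_bytes: int, region_bytes: int) -> int:
    """Number of regions needed to cover the volume, rounding up."""
    return -(-volume_bytes // region_bytes)


# Map entries needed for an assumed 1 TiB volume at three region sizes.
counts = {size: region_count(1 * TiB, size * MiB) for size in (4, 16, 32)}

for size, entries in counts.items():
    print(f"{size:>2} MiB regions -> {entries} map entries")
```

Going from 4 MiB to 32 MiB regions cuts the number of entries by a factor 
of eight, at the price of coarser information.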

You could even update a filesystem only after a thousand gigabytes of 
changes have happened.
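
As an alternative to polling, this could be sketched as the thin layer 
accumulating changes and only notifying once a threshold is crossed. The 
MutationNotifier class, its callback and the 1000 GiB threshold are all 
hypothetical; this is an illustration of the idea, not an existing LVM or 
device-mapper interface.

```python
from typing import Callable, List

GiB = 1024 ** 3


class MutationNotifier:
    """Hypothetical: batch up changes and notify only past a threshold."""

    def __init__(self, threshold_bytes: int, notify: Callable[[int], None]):
        self.threshold_bytes = threshold_bytes
        self.notify = notify
        self.pending = 0  # bytes changed since the last notification

    def record(self, changed_bytes: int) -> None:
        """Account for a change; fire the callback when enough piled up."""
        self.pending += changed_bytes
        if self.pending >= self.threshold_bytes:
            self.notify(self.pending)
            self.pending = 0


# Usage: 2500 changes of 1 GiB each, with a notification every 1000 GiB.
events: List[int] = []
notifier = MutationNotifier(1000 * GiB, events.append)
for _ in range(2500):
    notifier.record(1 * GiB)

print(len(events), "notifications,", notifier.pending // GiB, "GiB pending")
```

The filesystem is then told something only a couple of times over 
thousands of allocations, instead of being asked to poll.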

We are talking about a map of regions and these regions can be as large 
as you want.

It would say to a filesystem: these regions are currently unavailable.

You would even get more flags:

- this region is entirely unavailable
- this region is now more expensive to allocate to
- this region is the preferred place

When you allocate memory in the kernel (like with kmalloc) you specify 
what kind of requirements you have.

This is more of the same kind, I guess.
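
The flags listed above, together with the kmalloc analogy, might be 
sketched like this. RegionHint and pick_region are made-up names, and the 
cost ordering (preferred, then plain, then expensive) is an assumption; 
it is only meant to show how a filesystem could take such hints into 
account when choosing where to allocate.

```python
from enum import Flag
from typing import Dict, Optional


class RegionHint(Flag):
    """Hypothetical per-region hints a thin volume could hand upward."""
    NONE = 0
    UNAVAILABLE = 1  # this region is entirely unavailable
    EXPENSIVE = 2    # this region is now more expensive to allocate to
    PREFERRED = 4    # this region is the preferred place


def pick_region(hints: Dict[int, RegionHint],
                allow_expensive: bool = True) -> Optional[int]:
    """Return the index of the cheapest usable region, or None."""

    def cost(hint: RegionHint) -> int:
        if RegionHint.PREFERRED in hint:
            return 0
        if RegionHint.EXPENSIVE in hint:
            return 2
        return 1

    candidates = [
        (cost(hint), index)
        for index, hint in hints.items()
        if RegionHint.UNAVAILABLE not in hint
        and (allow_expensive or RegionHint.EXPENSIVE not in hint)
    ]
    # Lowest cost wins; ties go to the lowest region index.
    return min(candidates)[1] if candidates else None


hints = {
    0: RegionHint.UNAVAILABLE,
    1: RegionHint.EXPENSIVE,
    2: RegionHint.NONE,
    3: RegionHint.PREFERRED,
}
print(pick_region(hints))
```

The allow_expensive argument plays the role of the caller's requirements, 
in the way the flags passed to kmalloc do.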

Typically a thin system is a system of extent allocation, the way we 
have it.

It is the thin volume that allocates this space, but the filesystem that 
causes it.

The thin volume would be able to say "don't use these parts".

Or "all parts are equal, but don't use more than X currently".

Actually the latter would be a false statement; you need real information.

I know that in ext filesystems the inodes (and the tables) are scattered 
everywhere, so the blocks are already getting used, in that sense. And 
if you had very large blocks that you wanted to make totally 
unavailable, you would get weird issues. "That's funny, I'm already 
using it".

So in order to make sense they would have to be contiguous regions (in 
the virtual space) that are really not used yet.
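
Finding such regions could be sketched as a scan for runs of unused 
entries. The used list stands in for whatever per-region usage map would 
actually exist, and free_runs and min_len are hypothetical names.

```python
from typing import List, Tuple


def free_runs(used: List[bool], min_len: int = 1) -> List[Tuple[int, int]]:
    """Return (start, length) for each run of unused regions >= min_len."""
    runs: List[Tuple[int, int]] = []
    start = None  # index where the current unused run began, if any
    for index, is_used in enumerate(used):
        if not is_used and start is None:
            start = index
        elif is_used and start is not None:
            if index - start >= min_len:
                runs.append((start, index - start))
            start = None
    # Close a run that extends to the end of the map.
    if start is not None and len(used) - start >= min_len:
        runs.append((start, len(used) - start))
    return runs


# One region in use, two free, one in use, then four free.
used = [True, False, False, True, False, False, False, False]
print(free_runs(used, min_len=3))
```

With min_len set to 3, only the trailing run of four regions qualifies; 
the short gap between two used regions is ignored.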

I don't know, it seems fun to make something like that. Maybe I'll do it 
some day.



