<div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Apr 29, 2016 at 7:23 AM, Zdenek Kabelac <span dir="ltr">&lt;<a href="mailto:zkabelac@redhat.com" target="_blank">zkabelac@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5"><span style="color:rgb(34,34,34)">Thin-provisioning is NOT about providing device to the upper</span><br></div></div> system levels and inform THEM about this lie in-progress.<br> That's complete misunderstanding of the purpose.<br></blockquote><div><br></div><div>I think this line of thought is a bit of a strawman.</div><div><br></div><div>Thin provisioning is entirely about presenting the upper layer with a logical view which does not match the physical view, including the possibility for such things as over-provisioning. How much of this detail is presented to the higher layer is an implementation detail and has nothing to do with "purpose". The purpose or objective is to allow volumes that are not fully allocated in advance. This is what "thin" means, as compared to "thick".</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> If you seek for a filesystem with over-provisioning - look at btrfs, zfs and other variants...<br></blockquote><div><br></div><div>I have to say that I am disappointed with this view, particularly if this is a view held by Red Hat. To me this represents a misunderstanding of the purpose of over-provisioning, and a misunderstanding of why thin volumes are required. It seems there is a focus on "filesystem" in the above statement, and that this may be the point of debate.</div><div><br></div><div>When a storage provider (EMC, NetApp, ...) provides a block device and a snapshot capability, I expect to be able to take snapshots with low overhead. 
The previous LVM model for snapshots was really bad, in that it was not low overhead. We use this capability for many purposes including:</div><div><br></div><div>1) Instantiating test environments or dev environments from a snapshot of production, with copy-on-write to allow for very large full-scale environments to be constructed quickly and with low overhead. In one example, we have about 1 TByte of JIRA and Confluence attachments collected over several years. It is exposed over NFS by the NetApp device, but in the backend it is a volume. This volume is snapshotted and then exposed as a different volume with copy-on-write characteristics. The storage allocation is monitored, and if it is exceeded, the resulting behaviour is known in advance. I believe in our case, the behaviour is that the snapshot becomes unusable.</div><div><br></div><div>2) Frequent snapshots. In many of our use cases, we may take snapshots every 15 minutes, every hour, and every day, keeping 3 or more of each. If this storage had to be allocated in full, this amounts to at least 10X the storage cost. Using snapshots, and understanding the rate of churn, we can use closer to 1X or 2X the storage overhead, instead of 10X the storage overhead. </div><div><br></div><div>3) Snapshot as a means of achieving a consistent backup at low cost of outage or storage overhead. If we "quiesce" the application (flush buffers, put new requests on hold, etc.), take the snapshot, and then "resume" the application, this can be achieved in a matter of seconds or less. Then, we can mount the snapshot at a separate mount point and proceed with a more intensive backup process against a particular consistent point-in-time. This can be fast and require closer to 1X the storage overhead, instead of 2X the storage overhead.</div><div><br></div><div>In all of these cases - we'll buy more storage if we need more storage. 
But, we're not going to use BTRFS or ZFS to provide the above capabilities, just because this is your opinion on the matter. Storage vendors of reputation and market presence sell these capabilities as features, and we pay a lot of money to have access to these features.</div><div><br></div><div>In the case of LVM... which is really the point of this discussion... LVM is not necessarily going to be used or available on a storage appliance. The LVM use case, at least for us, is for storage which is thinly provisioned by the compute host instead of the backend storage appliance. This includes:</div><div><br></div><div>1) Local disks, particularly local flash drives, which are local in order to achieve higher levels of performance than can normally be achieved with a remote storage appliance.</div><div><br></div><div>2) Local file systems, on remote storage appliances, using a protocol such as iSCSI to access the backend block device. This might be the case where we need better control of the snapshot process, or to abstract the management of the snapshots from the backend block device. In our case, we previously used an EMC over iSCSI for one of these use cases, and we are switching to NetApp. However, instead of embedding NetApp-specific logic into our code, we want to use LVM on top of iSCSI, and re-use the LVM thin pool capabilities from the host, such that we don't care what storage is used on the backend. The management scripts will work the same whether the storage is local (the first case above) or not (the case we are looking into now).</div><div><br></div><div>In both of these cases, we have a need to take snapshots and manage them locally on the host, instead of managing them on a storage appliance. In both cases, we want to take many light-weight snapshots of the block device. You could argue that we should use BTRFS or ZFS, but you should know full well that both of these have caveats as well. 
We want to use XFS or EXT4 as our needs require, and still have the ability to take light-weight snapshots.</div><div><br></div><div>Generally, I've seen that the people who argue that thin provisioning is a "lie" tend not to be talking about snapshots. I have a sense that you are talking more as storage providers for customers, and talking more about thinly provisioning content for your customers. In this case - I think I would agree that it is a "lie" if you don't make sure to have the storage by the time it is required. But, I think this is a very small use case in reality. I think large service providers would use Ceph or EMC or NetApp, or some such technology to provision large amounts of storage per customer, and LVM would be used more at the level of a single customer, or a single machine. In these cases, I would expect that LVM thin volumes should not be used across multiple customers without understanding the exact type of churn expected, to understand the maximum allocation that would be required. In the case of our IT team and EMC or NetApp, they mostly avoid the use of thin volumes for "cross customer" purposes, and instead use thin volumes for a specific customer, for a specific need. In the case of Amazon EC2, for example... I would use EBS for storage, and expect that even if it is "thin", Amazon would make sure to have enough storage to meet my requirements when I need it. 
But, I would use LVM on my Amazon EC2 instance, and I would expect to be able to use LVM thin pool snapshots to over-provision my own per-machine storage requirements by creating multiple snapshots of the underlying storage, with a full understanding of the amount of churn that I expect to occur, and a full understanding of the need to monitor.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> Device target is definitely not here to solve filesystem troubles.<br> Thinp is about 'promising' - you as admin promised you will provide<br> space - we could here discuss maybe that LVM may possibly maintain<br> max growth size we can promise to user - meanwhile - it's still the admin<br> who creates thin-volume and gets WARNING if VG is not big enough when all thin volumes would be fully provisioned.<br> And THAT'S IT - nothing more.<br> So please avoid making thinp target to be answer to ultimate question of life, the universe, and everything - as we all know it's 42...</blockquote><div><br></div><div>The WARNING is a cover-your-ass type warning that is showing up inappropriately for us. It is warning me about something that I should already know, and it is training me to ignore warnings. Thinp doesn't have to be the answer to everything. It does, however, need to provide a block device visible to the file system layer, and it isn't invalid for the file system layer to be able to query about the nature of the block device, such as "how much space do you *really* have left?"</div><div><br></div><div>This seems to be a crux of this debate between you and the other people. You think the block storage should be as transparent as possible, as if the storage was not thin. 
Others, including me, think that this theory is impractical, as it leads to edge cases where the file system could choose to fail in a cleaner way, but today it gets too far, leading to a more dangerous failure when it allocates some blocks but not others.</div><div><br></div><div>Exaggerating this to say that thinp would become everything, and the answer to the ultimate question of life, weakens your point to me, as it means that you are seeing things as far too black + white, whereas real life is often not black + white.</div><div><br></div><div>It is your opinion that extending thin volumes to allow the file system to have more information is breaking some fundamental law. But, in practice, this sort of thing is done all of the time. "Size", "Read only", "Discard/Trim Support", "Physical vs Logical Sector Size", ... are all information queried from the device, and used by the file system. If it is a general concept that applies to many different device targets, and it will help the file system make better and smarter choices, why *shouldn't* it be communicated? Who decides which ones are valid and which ones are not?</div><div><br></div><div>I didn't disagree with all of your points. But, enough of them seemed to be directly contradicting my perspective on the matter that I felt it important to respond to them.</div><div><br></div><div>Mostly, I think everybody has a set of opinions and use cases in mind when they come to their conclusions. Please don't ignore mine. If there is something unreasonable above, please let me know.</div></div><br clear="all"><div><br></div>-- <br><div class="gmail_signature">Mark Mielke &lt;<a href="mailto:mark.mielke@gmail.com" target="_blank">mark.mielke@gmail.com</a>&gt;<br><br></div> </div></div>