[linux-lvm] Network-attached block storage and local SSDs for dm-cache

Tue Apr 23 13:58:58 UTC 2019

On Mon, Apr 22, 2019 at 02:25:44PM -0400, Mike Snitzer wrote:
>> I know it's possible to set up dm-cache to combine network-attached
>> block devices and local SSDs, but I'm having a hard time finding any
>> first-hand evidence of this being done anywhere -- so I'm wondering
>> if it's because there are reasons why this is a Bad Idea, or merely
>> because there aren't many reasons for folks to do that.
>>
>> The reason why I'm trying to do it, in particular, is for
>> mirrors.kernel.org systems where we already rely on dm-cache to
>> combine large slow spinning disks with SSDs to great advantage.
>> Most hits on those systems are to the same set of files (latest
>> distro package updates), so the dm-cache hit-to-miss ratio is very
>> favorable. However, we need to build the newest iterations of those
>> systems, and being able to use network-attached storage at providers
>> like Packet with local SSD drives would remove the need for us to
>> purchase and host huge drive arrays.
>>
>> Thanks for any insights you may offer.
>
>The only thing that could present itself as a new challenge is the
>reliability of the network-attached block devices (e.g. do network
>outages compromise dm-cache's ability to function).

I expect them to be *reasonably* reliable, but of course the chances of 
network-attached block storage becoming unavailable are higher than for 
directly-attached storage.

>I've not done any focused testing for, or thinking about, the impact
>unreliable block devices might have on dm-cache (or dm-thinp, etc).
>Usually we advise people to ensure the devices that they layer upon are
>adequately robust/reliable.  Short of that you'll need to create your
>own luck by engineering a solution that provides network storage
>recovery.

I expect that in writethrough mode the worst kind of recovery we'd have
to deal with is rebuilding the dm-cache setup. Writes are not completed
until they reach the origin, so the SSD never holds the only copy of a
block, and even if the underlying slow storage becomes unavailable, that
shouldn't result in FS corruption on it. Even though mirrors.kernel.org
data is just that, mirrors, we certainly would like to avoid situations
where we have to re-sync 40TB all over again, as that usually means a
week-long outage.
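
To make the writethrough setup concrete, here is a minimal sketch of the
device-mapper table line such a cache would use. The device paths, the
name "mirror_cache", and the 40 TiB origin size are placeholders I made
up for illustration, not our actual configuration; the field order
follows the kernel's dm-cache target documentation.

```python
SECTOR_BYTES = 512  # device-mapper counts in 512-byte sectors


def cache_table(origin_bytes, metadata_dev, cache_dev, origin_dev,
                block_sectors=512, mode="writethrough", policy="default"):
    """Build a dm-cache table line with a single feature argument (the mode)."""
    if mode not in ("writethrough", "writeback", "passthrough"):
        raise ValueError(f"unknown cache mode: {mode}")
    # Cache block size must be a multiple of 64 sectors (32KB), up to 1GB.
    if block_sectors % 64 or not 64 <= block_sectors <= 2097152:
        raise ValueError("block size must be a multiple of 64 sectors")
    origin_sectors = origin_bytes // SECTOR_BYTES
    # "1 <mode>" = one feature argument; "<policy> 0" = no policy arguments.
    return (f"0 {origin_sectors} cache {metadata_dev} {cache_dev} "
            f"{origin_dev} {block_sectors} 1 {mode} {policy} 0")


if __name__ == "__main__":
    # Hypothetical devices: local SSD partitions plus a network-attached origin.
    table = cache_table(40 * 2**40,
                        "/dev/mapper/ssd-meta",
                        "/dev/mapper/ssd-cache",
                        "/dev/mapper/nas-origin")
    # The resulting line would be passed to:
    #   dmsetup create mirror_cache --table "<line>"
    print(table)
```

In practice we would most likely let LVM generate this through lvconvert
with --cachemode writethrough rather than writing tables by hand, but
the table shows which piece is which: metadata and cache on the local
SSD, origin on the network volume.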

>If the "origin" device is network-attached and proves unreliable you
>can expect to see dm-cache experience errors.  dm-cache is not
>raid.  So if concerned about network outages you might want to (ab)use
>dm-multipath's "queue_if_no_path" mode to queue IO for retry once the
>network-based device is available again (dm-multipath isn't raid
>either, but for your purposes you need some way to isolate the potential
>for network-based faults).  Or do you think you might be able to RAID1 or
>RAID5 N of these network attached drives together?

I don't think that makes sense, as these volumes would likely be coming 
from the same NAS array, so we'd be increasing complexity without 
necessarily hedging any risks.
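
For the queue_if_no_path suggestion quoted above, a rough sketch of what
we might try is below. It renders a multipath.conf fragment using
no_path_retry queue, which is the multipath.conf setting that keeps IO
queued while no path is available. The WWID and the alias "nas_origin"
are hypothetical placeholders, and I haven't tested this against a real
network volume.

```python
def multipath_fragment(wwid, alias="nas_origin"):
    """Render a multipath.conf 'multipaths' section that queues IO
    indefinitely while no path to the device is available."""
    return (
        "multipaths {\n"
        "    multipath {\n"
        f"        wwid {wwid}\n"
        f"        alias {alias}\n"
        "        no_path_retry queue\n"
        "    }\n"
        "}\n"
    )


if __name__ == "__main__":
    # Placeholder WWID; the real one would come from the attached volume.
    print(multipath_fragment("EXAMPLE-WWID"), end="")
```

The dm-cache origin would then be /dev/mapper/nas_origin rather than the
raw network device. The trade-off is that anything touching uncached
blocks will hang for as long as the network volume is gone, instead of
getting an IO error.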

Thanks for your help -- I think we're going to try this out as an
experimental setup and then see what kinds of issues we run into.

Best,
-K