[vdo-devel] Dedup performance / Writes

Gionatan Danti g.danti at assyoma.it
Fri Jan 29 12:36:13 UTC 2021


On 2021-01-29 08:25, corwin wrote:
> Unfortunately, this issue stems from the fact that VDO was not
> originally designed to optimize spinning disk storage. VDO consumes a
> fair amount of memory and CPU resources, and in general, it is not the
> case that the cost of storage saved is equal to the resource cost of
> running VDO.
> 
> The actual problem stems from the fact that in order to get reasonable
> rates of deduplication, VDO has to operate with a 4K block size.
> Furthermore, because of the nature of deduplication, and the amount of
> metadata VDO has to update, the writes it does to storage below it are
> effectively 4K random writes.
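As a toy illustration of why block size matters so much for deduplication, the sketch below splits a buffer into fixed-size, aligned blocks, fingerprints each one and reports logical blocks divided by unique blocks. It is not VDO's implementation, which uses its own hashing and the UDS index. The helper names (`dedup_ratio`, `make_shifted_data`) are hypothetical and the data is synthetic: the same sixteen 4 KiB chunks recur throughout, but each 64 KiB run is rotated by one chunk.

```python
import hashlib


def dedup_ratio(data: bytes, block_size: int) -> float:
    """Logical blocks divided by unique blocks, using fixed, aligned blocks."""
    seen = set()
    total = 0
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        seen.add(hashlib.sha256(block).digest())  # content fingerprint
        total += 1
    return total / len(seen)


def make_shifted_data() -> bytes:
    """Synthetic data: 16 distinct 4 KiB chunks, each 64 KiB run rotated by one chunk."""
    # 32-byte digest repeated 128 times = one distinct 4 KiB chunk per i
    chunks = [hashlib.sha256(bytes([i])).digest() * 128 for i in range(16)]
    # Run k holds chunks k, k+1, ..., k+15 (mod 16): the 4 KiB content repeats,
    # but no two 64 KiB runs are identical.
    return b"".join(chunks[(i + i // 16) % 16] for i in range(256))


data = make_shifted_data()
for size in (4096, 65536):
    print(f"{size // 1024:>3} KiB blocks: dedup ratio {dedup_ratio(data, size):.1f}")
```

On this 1 MiB buffer the 4 KiB pass finds a ratio of 16.0, while the 64 KiB pass finds no duplicates at all (1.0). The flip side is that tracking every 4 KiB block individually is what generates the metadata updates and small random I/O described above.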

I agree: looking at iostat, it is clear that many 4K I/Os are in flight 
when %util reaches >99%.
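Because iostat derives its figures from /proc/diskstats, the same 4K pattern can also be checked directly from two samples of that file. The following is a minimal sketch. It assumes the documented field layout (major, minor, device name, then reads completed, reads merged, sectors read, ms reading, writes completed, writes merged, sectors written, ms writing, I/Os in progress, ms doing I/O, ...) and 512-byte sector units. The helper names are hypothetical, and the two sample lines are made-up numbers, not measurements from the OP's system. In real use you would read the device's line twice, a known interval apart.

```python
from dataclasses import dataclass

SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


@dataclass
class DiskSample:
    reads: int
    sectors_read: int
    writes: int
    sectors_written: int
    io_ticks_ms: int  # time the device had at least one I/O in flight


def parse_diskstats_line(line: str) -> DiskSample:
    """Parse one /proc/diskstats line: major, minor, name, then counters."""
    f = line.split()
    return DiskSample(
        reads=int(f[3]),
        sectors_read=int(f[5]),
        writes=int(f[7]),
        sectors_written=int(f[9]),
        io_ticks_ms=int(f[12]),
    )


def summarize(before: DiskSample, after: DiskSample, interval_s: float) -> dict:
    """iostat-style average request sizes (KiB) and %util between two samples."""
    reads = after.reads - before.reads
    writes = after.writes - before.writes
    rd_bytes = (after.sectors_read - before.sectors_read) * SECTOR_BYTES
    wr_bytes = (after.sectors_written - before.sectors_written) * SECTOR_BYTES
    busy_ms = after.io_ticks_ms - before.io_ticks_ms
    return {
        "avg_read_kib": rd_bytes / reads / 1024 if reads else 0.0,
        "avg_write_kib": wr_bytes / writes / 1024 if writes else 0.0,
        "util_pct": min(100.0, 100.0 * busy_ms / (interval_s * 1000)),
    }


# Hypothetical samples taken one second apart
before = parse_diskstats_line("8 0 sda 1000 0 8000 500 2000 0 16000 900 0 1000 1400")
after = parse_diskstats_line("8 0 sda 1500 0 12000 800 4000 0 32000 1800 4 1995 2900")
print(summarize(before, after, interval_s=1.0))
```

With these made-up samples both average request sizes come out at 4.0 KiB and %util at 99.5, which is the kind of pattern described above.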

> The best suggestion is to use faster storage. If the space savings is
> truly significant, the amount of storage required is hopefully reduced
> enough to offset that cost. If this is not an option, is it possible
> to add a smaller amount of fast storage to the system? If so, setting
> up a dm-writecache underneath VDO which uses the faster storage should
> help.

I am not sure dm-writecache would be of any help here: BBU-equipped DELL 
PERC controllers are already very fast at absorbing 4K writes. However, 
the weak point is 4K random *reads*, which would not benefit from 
dm-writecache (or from the RAID controller cache).

As a side note, in the past I did some testing with a dm-cache layer 
both "under" and "on top of" a VDO device. Random I/O improved in 
different ways, but for the OP (who already has a fast write cache) I 
think dm-cache on top of the VDO volume would be the better strategy.

> Beyond that, some improvement might be seen if you can configure a
> smaller stripe size for the RAID. For many hardware RAID controllers,
> this isn't an option, but for some it is. Alternately, you could
> consider switching to software RAID which is more configurable, or
> even rethinking whether a box that is just for backups needs the RAID
> at all.

RAID10 should be the fastest, but it carries a massive space penalty 
(50%) which many people (myself included) are not prepared to pay for 
backups. I can see why one would use RAID5 for backups: it maximizes the 
available space while providing reasonable redundancy (by the way, I use 
RAID6/RAIDZ2 for backups). No redundancy at all is generally a tough 
sell, even for backup machines.
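The space/performance trade-off can be put in rough numbers. This sketch uses the usual textbook figures for small random writes: 2 back-end I/Os for RAID10 (both mirrors), 4 for RAID5 (read-modify-write of data and parity) and 6 for RAID6 (two parity blocks). It also assumes each random read costs one back-end I/O at every level. It ignores controller write-back caching, which on a BBU-equipped PERC can absorb much of the write penalty and leaves reads as the costly part. The 8-disk array and the 500 read / 500 write IOPS workload are hypothetical.

```python
from fractions import Fraction


def usable_fraction(level: str, disks: int) -> Fraction:
    """Fraction of raw capacity available for data."""
    if level == "raid10":
        return Fraction(1, 2)  # every block is mirrored
    if level == "raid5":
        return Fraction(disks - 1, disks)  # one disk's worth of parity
    if level == "raid6":
        return Fraction(disks - 2, disks)  # two disks' worth of parity
    raise ValueError(f"unknown level: {level}")


# Back-end I/Os per small random write, assuming no cache coalescing
WRITE_PENALTY = {"raid10": 2, "raid5": 4, "raid6": 6}


def backend_iops(level: str, read_iops: float, write_iops: float) -> float:
    """Disk I/Os per second needed for a 4K random front-end workload."""
    return read_iops + write_iops * WRITE_PENALTY[level]


for level in ("raid10", "raid5", "raid6"):
    frac = usable_fraction(level, disks=8)
    print(f"{level}: {float(frac):.1%} usable, "
          f"{backend_iops(level, 500, 500):.0f} back-end IOPS")
```

Under these assumptions RAID6 on 8 disks keeps 75% of raw capacity versus 50% for RAID10, but needs 3500 back-end IOPS instead of 1500 for the same front-end load.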

Regards.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti at assyoma.it - info at assyoma.it
GPG public key ID: FF5F32A8
