[vdo-devel] Dedup performance / Writes
Gionatan Danti
g.danti at assyoma.it
Fri Jan 29 12:36:13 UTC 2021
On 2021-01-29 08:25 corwin wrote:
> Unfortunately, this issue stems from the fact that VDO was not
> originally designed to optimize spinning disk storage. VDO consumes a
> fair amount of memory and CPU resources, and in general, it is not the
> case that the cost of storage saved is equal to the resource cost of
> running VDO.
>
> The actual problem stems from the fact that in order to get reasonable
> rates of deduplication, VDO has to operate with a 4K block size.
> Furthermore, because of the nature of deduplication, and the amount of
> metadata VDO has to update, the writes it does to storage below it are
> effectively 4K random writes.
I agree: looking at iostat, it is clear that many 4K IOs are in flight
when %util exceeds 99%.
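To put a rough number on why 4K random IO saturates spinning disks so
quickly, here is a back-of-the-envelope sketch in Python. The per-disk
IOPS figure is my own assumption for illustration, not a measurement
from the OP's system.

```python
BLOCK_SIZE = 4096  # VDO's 4K block size, in bytes


def random_io_throughput_mib(iops: float, block_size: int = BLOCK_SIZE) -> float:
    """Throughput in MiB/s when every IO is one random block of block_size bytes."""
    return iops * block_size / (1024 * 1024)


# Hypothetical figure: roughly 150 random IOPS for a single spinning disk.
ASSUMED_HDD_IOPS = 150

print(f"{random_io_throughput_mib(ASSUMED_HDD_IOPS):.2f} MiB/s per disk")
```

With that assumed figure, a single disk moves well under 1 MiB/s of 4K
random IO, so %util can sit near 100% while the actual throughput stays
tiny.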
> The best suggestion is to use faster storage. If the space savings is
> truly significant, the amount of storage required is hopefully reduced
> enough to offset that cost. If this is not an option, is it possible
> to add a smaller amount of fast storage to the system? If so, setting
> up a dm-writecache underneath VDO which uses the faster storage should
> help.
I am not sure dm-writecache would be of any help here: BBU-equipped Dell
PERC controllers are already very fast at absorbing 4K writes. However,
the weak point is 4K random *reads*, which would get no benefit from
dm-writecache (or the RAID cache).
As a side note, in the past I did some testing with a dm-cache layer
both "under" and "on top of" a VDO device. Random IO improved in
different ways, but for the OP (who already has a fast write cache) I
think dm-cache on top of the VDO volume would be the better strategy.
> Beyond that, some improvement might be seen if you can configure a
> smaller stripe size for the RAID. For many hardware RAID controllers,
> this isn't an option, but for some it is. Alternately, you could
> consider switching to software RAID which is more configurable, or
> even rethinking whether a box that is just for backups needs the RAID
> at all.
RAID10 should be the fastest, but it carries a massive space penalty
(50%) that many (myself included) are not prepared to pay for backups. I
can see why one uses RAID5 for backups: to maximize the available space
while keeping reasonable redundancy (I use RAID6/RAIDZ2 for backups, by
the way). No redundancy at all is generally a tough sell, even for
backup machines.
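For reference, the space trade-off above can be sketched with a few
lines of Python. The 8-disk array is a hypothetical example, not the
OP's configuration, and the function name is mine; RAIDZ2 is treated
like RAID6 here, ignoring ZFS-specific overhead.

```python
def usable_fraction(level: str, disks: int) -> float:
    """Fraction of raw capacity left for data, assuming equal-size disks."""
    if level == "raid10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID10 needs an even number of disks, at least 4")
        return 0.5  # every block is mirrored once
    if level == "raid5":
        if disks < 3:
            raise ValueError("RAID5 needs at least 3 disks")
        return (disks - 1) / disks  # one disk's worth of parity
    if level == "raid6":
        if disks < 4:
            raise ValueError("RAID6 needs at least 4 disks")
        return (disks - 2) / disks  # two disks' worth of parity
    raise ValueError(f"unsupported RAID level: {level}")


DISKS = 8  # hypothetical array size, for illustration only
for level in ("raid10", "raid5", "raid6"):
    print(f"{level}: {usable_fraction(level, DISKS):.1%} usable")
```

The gap between RAID10 and the parity levels widens as the disk count
grows, which is why parity RAID is the usual choice for backup boxes.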
Regards.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti at assyoma.it - info at assyoma.it
GPG public key ID: FF5F32A8